What is robots.txt?
Robots.txt is a plain text file that lives at the root of your website — for example https://yoursite.com/robots.txt. It tells search engine crawlers like Googlebot which pages or sections of your site they are allowed or not allowed to crawl.
Every website should have a robots.txt file. Without one, crawlers assume they may crawl everything, including areas you'd rather they skipped: admin sections, login pages, or duplicate content.
How robots.txt works
The robots.txt file uses a simple syntax. Each rule set starts with a User-agent line that specifies which bot the rules apply to, followed by Allow and Disallow rules.
# Allow all crawlers to access everything
User-agent: *
Disallow:

# Block all crawlers from the admin area
User-agent: *
Disallow: /admin/

# Point to your sitemap
Sitemap: https://yoursite.com/sitemap.xml
Key syntax rules:
- User-agent: * — applies to all crawlers
- User-agent: Googlebot — applies only to Google
- Disallow: /admin/ — blocks the /admin/ path
- Disallow: (empty) — allows everything
- Allow: /admin/help/ — allows a specific sub-path even if its parent is blocked
- Lines starting with # are comments — ignored by crawlers
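These rules can also be checked programmatically. As a sketch, Python's standard-library urllib.robotparser can parse a rule set and answer whether a given user agent may fetch a URL (the domain and paths below are placeholders):

```python
import urllib.robotparser

# A minimal rule set: block /admin/ for all crawlers, but keep /admin/help/ open.
# Note: this parser applies rules in file order (first match wins), so the more
# specific Allow line is placed first. Googlebot instead honors the most
# specific (longest) matching rule regardless of order.
rules = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://yoursite.com/blog/post"))      # True: no rule matches
print(rp.can_fetch("*", "https://yoursite.com/admin/users"))    # False: /admin/ is blocked
print(rp.can_fetch("Googlebot", "https://yoursite.com/admin/help/faq"))  # True: Allow wins
```

Because Googlebot and Python's parser resolve Allow/Disallow conflicts differently, it's safest to write rules that don't depend on precedence at all.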
What should you block in robots.txt?
Not everything on your site needs to be crawled. Blocking the right pages saves crawl budget for your important content and keeps duplicate or private pages out of Google's index.
Good candidates to block:

- /admin/ — admin control panels
- /wp-admin/ — WordPress admin area
- /login/ and /account/ pages
- /cart/ and /checkout/ pages
- /xmlrpc.php — WordPress XML-RPC
- /search/ — internal search result pages
- /?s= — WordPress search queries
- /tag/ and /author/ archive pages
- Duplicate URL parameter pages

Never block:

- Your main content pages and blog posts
- Product and category pages
- CSS and JavaScript files (Google needs these to render pages)
- Images you want indexed
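A block list like the one above translates mechanically into a robots.txt file. As a sketch (build_robots_txt is a hypothetical helper, not part of any library, and the URLs are placeholders):

```python
# Paths to keep crawlers out of; adjust to match your site.
BLOCKED_PATHS = ["/admin/", "/wp-admin/", "/login/", "/cart/", "/checkout/", "/search/"]

def build_robots_txt(blocked, sitemap_url):
    """Assemble a simple robots.txt: one Disallow per path, plus the sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

print(build_robots_txt(BLOCKED_PATHS, "https://yoursite.com/sitemap.xml"))
```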
Critical mistake: robots.txt does not prevent indexing
This is the most common robots.txt misunderstanding. Blocking a page in robots.txt stops Google from crawling it — but it does NOT guarantee the page won't appear in search results.
If other sites link to a blocked page, Google can still discover the URL and show it in results (without being able to read the content). To truly prevent a page from appearing in search results, use a noindex meta tag instead — and don't block it in robots.txt, otherwise Google can't read the noindex instruction.
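For a page that must stay out of search results, the tag goes in the page's head (and the page stays crawlable in robots.txt so Google can read it):

```html
<!-- Tells crawlers not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the X-Robots-Tag: noindex HTTP response header.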
Never block your CSS and JS files in robots.txt. Google needs to render your pages to understand them. Blocking stylesheets causes Google to see a broken version of your page which can hurt rankings.
Always add your sitemap URL
The robots.txt file is the perfect place to declare your XML sitemap URL. This helps Google find and crawl your sitemap automatically without needing to submit it manually.
A robots.txt template for most websites
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap.xml
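As a sanity check, this template can be fed to Python's standard-library urllib.robotparser (the domain is a placeholder). One caveat: Python's parser applies rules in file order (first match wins), while Googlebot honors the most specific, i.e. longest, matching rule — so the Allow: /wp-admin/admin-ajax.php line is respected by Google but shadowed by the earlier Disallow: /wp-admin/ in this parser.

```python
import urllib.robotparser

# The template above; rules must sit under a User-agent line to be valid.
TEMPLATE = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(TEMPLATE.splitlines())

# Content pages stay crawlable; private areas do not.
print(rp.can_fetch("*", "https://yoursite.com/blog/my-post"))     # True
print(rp.can_fetch("*", "https://yoursite.com/checkout/step-1"))  # False
print(rp.site_maps())  # ['https://yoursite.com/sitemap.xml']
```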
Key takeaways
- robots.txt controls crawling, not indexing; use a noindex meta tag to keep a page out of search results
- Never block CSS, JavaScript, or other files Google needs to render your pages
- Block admin, login, cart, checkout, and internal search pages to save crawl budget
- Always declare your XML sitemap URL in robots.txt
Generate your robots.txt now
Free tool — build a clean robots.txt with sitemap, crawl-delay, and custom rules.
Open Robots.txt Generator →