Technical SEO April 3, 2026 · 5 min read

Robots.txt Explained: How to Control Googlebot

Learn what robots.txt does, how to write correct rules, what to block and what never to block — plus a free generator and tester tool.


What is robots.txt?

Robots.txt is a plain text file that lives at the root of your website — for example https://yoursite.com/robots.txt. It tells search engine crawlers like Googlebot which pages or sections of your site they are allowed or not allowed to crawl.

Every website should have a robots.txt file. Without one, crawlers assume they may crawl every URL they can find, including sections you may not want crawled, such as admin areas, login pages, or duplicate content.
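To see which file governs a given page, here is a minimal Python sketch. It assumes the standard lookup rule (robots.txt is fetched from the root of the same scheme, host and port as the page), and the helper name robots_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL whose rules apply to page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and network location (host plus any port);
    # drop the page's own path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/blog/post-1?ref=home"))
# https://yoursite.com/robots.txt
print(robots_url("https://shop.yoursite.com:8443/cart/"))
# https://shop.yoursite.com:8443/robots.txt
```

The second call shows why a subdomain (or a non-default port) is governed by its own file rather than by the main domain's robots.txt.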

🤖 Free Tools

Generate a clean robots.txt file or test existing rules against any URL.

How robots.txt works

The robots.txt file uses a simple syntax. Each rule set starts with a User-agent line that specifies which bot the rules apply to, followed by Allow and Disallow rules.

# Example 1: allow all crawlers to access everything
User-agent: *
Disallow:

# Example 2 (use instead of example 1): block all crawlers from the admin area
User-agent: *
Disallow: /admin/

# Works in any robots.txt file: point to your sitemap
Sitemap: https://yoursite.com/sitemap.xml
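Python's standard library ships a robots.txt parser, urllib.robotparser, which is handy for trying rules before you upload them. The sketch below parses the admin-blocking example plus its sitemap line; yoursite.com and the sample paths are placeholders, and this parser only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# The "block the admin area" example plus the sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string instead of fetching them

print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/settings"))  # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/robots-guide"))  # True
print(parser.site_maps())  # ['https://yoursite.com/sitemap.xml'] (Python 3.8+)
```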

Key syntax rules:

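  • User-agent: * applies the rules that follow to all crawlers
  • User-agent: Googlebot applies the rules that follow only to Googlebot
  • Disallow: /admin/ blocks every URL whose path starts with /admin/ (paths are case-sensitive)
  • Disallow: (empty) blocks nothing, so everything is allowed
  • Allow: /admin/help/ re-opens a sub-path even if its parent is blocked; for Google, the most specific (longest) matching rule wins
  • Lines starting with # are comments and are ignored by crawlers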
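When Allow and Disallow both match a URL, Google uses the most specific rule, meaning the one with the longest matching path, and prefers Allow on a tie. Here is a minimal Python sketch of that precedence, assuming plain path prefixes only (no * or $ wildcards) and a hypothetical (kind, prefix) rule format:

```python
from typing import List, Tuple

# Hypothetical rule format for this sketch: ("allow" | "disallow", path_prefix).
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Decide access with longest-match precedence over plain path prefixes."""
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for kind, prefix in rules:
        if not prefix:
            continue  # an empty rule value matches nothing
        if path.startswith(prefix):  # case-sensitive prefix match
            length = len(prefix)
            # A longer match wins; on an equal-length tie, Allow wins.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/help/"),
]
print(is_allowed(rules, "/admin/settings"))  # False
print(is_allowed(rules, "/admin/help/faq"))  # True
print(is_allowed(rules, "/blog/"))           # True
```

Because the decision depends only on match length, the order of the two rules in the list does not change the result.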

What should you block in robots.txt?

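Not everything on your site needs to be crawled. Blocking low-value sections focuses crawl budget (mainly a concern on large sites) on your important content and keeps crawlers out of duplicate or private areas. Keep in mind that robots.txt is publicly readable and only honored by well-behaved crawlers, so it is not a security measure: protect genuinely private pages with authentication.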

❌ Always block these
  • /admin/ — admin control panels
  • /wp-admin/ — WordPress admin area
  • /login/ and /account/ pages
  • /cart/ and /checkout/ pages
  • /xmlrpc.php — WordPress XML-RPC
⚠️ Usually block these
  • /search/ — internal search result pages
  • /?s= — WordPress search queries
  • /tag/ and /author/ archive pages
  • Duplicate pages created by URL parameters, such as sort or filter options
✓ Never block these
  • Your main content pages and blog posts
  • Product and category pages
  • CSS and JavaScript files (Google needs these to render pages)
  • Images you want indexed
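The three lists can double as a checklist. This Python sketch feeds a hypothetical set of must-block and must-crawl paths through urllib.robotparser and reports any that land on the wrong side. The paths are illustrative placeholders. Because the standard-library parser applies the first matching rule rather than Google's longest match, the sample rules deliberately avoid overlapping Allow/Disallow pairs.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login/
Disallow: /account/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/
"""

# Hypothetical URL lists; replace them with real paths from your own site.
MUST_BLOCK = ["/wp-admin/options.php", "/login/", "/cart/", "/search/shoes"]
MUST_ALLOW = [
    "/blog/robots-txt-guide",
    "/products/blue-widget",
    "/assets/css/main.css",
    "/assets/js/app.js",
    "/images/blue-widget.jpg",
]


def audit(robots_txt: str, base: str = "https://yoursite.com") -> list:
    """Return a list of problems; an empty list means every path landed as expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for path in MUST_BLOCK:
        if parser.can_fetch("Googlebot", base + path):
            problems.append(f"should be blocked but is crawlable: {path}")
    for path in MUST_ALLOW:
        if not parser.can_fetch("Googlebot", base + path):
            problems.append(f"should be crawlable but is blocked: {path}")
    return problems


print(audit(ROBOTS_TXT))  # []
# Simulate a common mistake: blocking the whole assets directory.
for problem in audit(ROBOTS_TXT + "Disallow: /assets/\n"):
    print(problem)
# should be crawlable but is blocked: /assets/css/main.css
# should be crawlable but is blocked: /assets/js/app.js
```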

Critical mistake: robots.txt does not prevent indexing

This is the most common robots.txt misunderstanding. Blocking a page in robots.txt stops Google from crawling it — but it does NOT guarantee the page won't appear in search results.

If other sites link to a blocked page, Google can still discover the URL and show it in results (without being able to read the content). To truly prevent a page from appearing in search results, use a noindex meta tag instead — and don't block it in robots.txt, otherwise Google can't read the noindex instruction.
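A quick way to catch this conflict is to check both signals together. The Python sketch below, with hypothetical helper names and a made-up /private/ path, reads robots meta tags from a page's HTML and reports whether a noindex would actually be seen by Googlebot. It does not look at X-Robots-Tag HTTP headers, and urllib.robotparser only approximates Google's matching.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            for value in (attrs.get("content") or "").split(","):
                self.directives.add(value.strip().lower())


def noindex_status(robots_txt: str, url: str, html: str) -> str:
    """Report whether Googlebot could see a noindex on this page (hypothetical helper)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" not in finder.directives:
        return "no noindex"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch("Googlebot", url):
        return "noindex visible"
    # Googlebot never fetches the page, so the noindex tag is never read.
    return "noindex hidden by robots.txt"


ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_status(ROBOTS_TXT, "https://yoursite.com/private/old-offer", PAGE))
# noindex hidden by robots.txt
print(noindex_status(ROBOTS_TXT, "https://yoursite.com/old-offer", PAGE))
# noindex visible
```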

⚠️ Common mistake to avoid

Never block your CSS and JavaScript files in robots.txt. Google renders pages to understand them, and blocking stylesheets or scripts can leave it seeing a broken version of your page, which can hurt how that page is indexed and ranked.
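To spot blocked rendering resources on a specific page, you can collect its stylesheet and script URLs and test each against your rules. Here is a minimal Python sketch, assuming you already have the page HTML as a string. The helper names, the Disallow: /wp-includes/ rule and the CDN URL are hypothetical examples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser


class ResourceFinder(HTMLParser):
    """Collects stylesheet and script URLs referenced by a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.resources.append(urljoin(self.page_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.resources.append(urljoin(self.page_url, attrs["src"]))


def blocked_resources(robots_txt: str, page_url: str, html: str) -> list:
    """Return same-host CSS/JS URLs that the given rules block for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    finder = ResourceFinder(page_url)
    finder.feed(html)
    host = urlsplit(page_url).netloc
    return [
        url
        for url in finder.resources
        # Other hosts are governed by their own robots.txt, so skip them here.
        if urlsplit(url).netloc == host and not parser.can_fetch("Googlebot", url)
    ]


ROBOTS_TXT = "User-agent: *\nDisallow: /wp-includes/\n"
PAGE = """
<link rel="stylesheet" href="/wp-content/themes/site/style.css">
<script src="/wp-includes/js/jquery/jquery.min.js"></script>
<script src="https://cdn.example.com/lib.js"></script>
"""
print(blocked_resources(ROBOTS_TXT, "https://yoursite.com/blog/post", PAGE))
# ['https://yoursite.com/wp-includes/js/jquery/jquery.min.js']
```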

Always add your sitemap URL

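The robots.txt file is a convenient place to declare your XML sitemap URL. Google and other search engines that read the Sitemap line can find your sitemap automatically, without you submitting it manually. Use the full absolute URL; the Sitemap line is not tied to any User-agent group, and you can list more than one.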

Sitemap: https://yoursite.com/sitemap.xml
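If you assemble robots.txt in a build script, it helps to reject relative sitemap paths, since the Sitemap line needs a fully qualified URL. Here is a minimal Python sketch with a hypothetical add_sitemaps helper:

```python
from urllib.parse import urlsplit


def add_sitemaps(robots_txt: str, sitemap_urls: list) -> str:
    """Append Sitemap lines to existing robots.txt text (hypothetical helper).

    Raises ValueError for relative paths such as "/sitemap.xml", because
    the Sitemap line expects a fully qualified URL.
    """
    lines = robots_txt.rstrip("\n").splitlines()
    lines.append("")  # blank line so the sitemap stands apart from the rules
    for url in sitemap_urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Sitemap URL must be absolute, got {url!r}")
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


print(add_sitemaps("User-agent: *\nDisallow: /admin/\n", ["https://yoursite.com/sitemap.xml"]))
```

This prints the original two rule lines, a blank line, and then Sitemap: https://yoursite.com/sitemap.xml. Passing "/sitemap.xml" instead raises a ValueError.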

A robots.txt template for most websites

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
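Googlebot ignores the order of rules within a group because the longest match wins, but some testing tools do not. Python's urllib.robotparser, for instance, applies the first rule that matches, so it reports /wp-admin/admin-ajax.php as blocked in the template above. This sketch shows the difference and shows that moving the Allow line ahead of the broader Disallow makes both kinds of parser agree. The check helper and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The template from this article, in its original line order.
TEMPLATE = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
"""


def check(robots_txt: str) -> dict:
    """Map sample paths to can_fetch() results (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/")
    return {p: parser.can_fetch("Googlebot", "https://yoursite.com" + p) for p in paths}


# Same rules, with the Allow exception moved directly under User-agent.
lines = TEMPLATE.splitlines()
allow_lines = [line for line in lines if line.startswith("Allow:")]
other_lines = [line for line in lines if not line.startswith("Allow:")]
REORDERED = "\n".join(other_lines[:1] + allow_lines + other_lines[1:]) + "\n"

print(check(TEMPLATE))
# {'/wp-admin/admin-ajax.php': False, '/wp-admin/options.php': False, '/blog/': True}
print(check(REORDERED))
# {'/wp-admin/admin-ajax.php': True, '/wp-admin/options.php': False, '/blog/': True}
```

Both orders mean the same thing to Googlebot, so the reordered version is a safe choice if you also test with order-sensitive tools.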

Key takeaways

Every website should have a robots.txt file at the root of each host (subdomains need their own)
Block admin, login, cart, and search pages — never block CSS/JS
Robots.txt controls crawling, not indexing — use noindex meta tags to prevent indexing
Always include your full sitemap URL in robots.txt; the end of the file is the usual spot
Test your rules with our free Robots.txt Tester before uploading

Generate your robots.txt now

Free tool: build a clean robots.txt with sitemap, crawl-delay, and custom rules. Note that Googlebot ignores crawl-delay, although some other crawlers, such as Bingbot, honor it.

Open Robots.txt Generator →