What is robots.txt?
Robots.txt is a plain text file that lives at the root of your website — for example https://yoursite.com/robots.txt. It tells search engine crawlers like Googlebot which pages or sections of your site they are allowed or not allowed to crawl.
Every website should have a robots.txt file. Without one, crawlers assume they may crawl everything, including areas you'd rather they skipped: admin sections, login pages, or duplicate content.
How robots.txt works
The robots.txt file uses a simple syntax. Each rule set starts with a User-agent line that specifies which bot the rules apply to, followed by Allow and Disallow rules.
# Allow all crawlers to access everything
User-agent: *
Disallow:

# Block all crawlers from the admin area
User-agent: *
Disallow: /admin/

# Point to your sitemap
Sitemap: https://yoursite.com/sitemap.xml
Key syntax rules:
- User-agent: * — applies to all crawlers
- User-agent: Googlebot — applies only to Google
- Disallow: /admin/ — blocks the /admin/ path
- Disallow: (empty) — allows everything
- Allow: /admin/help/ — allows a specific sub-path even if its parent is blocked
- Lines starting with # are comments — ignored by crawlers
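These rules can also be checked programmatically. As a sketch, Python's standard-library urllib.robotparser can parse a rule set and answer whether a given user agent may fetch a URL (the domain and paths below are placeholders):

```python
import urllib.robotparser

# A minimal rule set: block /admin/ for all crawlers, but keep /admin/help/ open.
# Note: this parser applies rules in file order (first match wins), so the more
# specific Allow line is placed first. Googlebot instead honors the most
# specific (longest) matching rule regardless of order.
rules = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://yoursite.com/blog/post"))      # True: no rule matches
print(rp.can_fetch("*", "https://yoursite.com/admin/users"))    # False: /admin/ is blocked
print(rp.can_fetch("Googlebot", "https://yoursite.com/admin/help/faq"))  # True: Allow wins
```

Because Googlebot and Python's parser resolve Allow/Disallow conflicts differently, it's safest to write rules that don't depend on precedence at all.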
What should you block in robots.txt?
Not everything on your site needs to be crawled. Blocking the right pages saves crawl budget for your important content and keeps duplicate or private pages out of Google's index.
Good candidates to block:

- /admin/ — admin control panels
- /wp-admin/ — WordPress admin area
- /login/ and /account/ pages
- /cart/ and /checkout/ pages
- /xmlrpc.php — WordPress XML-RPC
- /search/ — internal search result pages
- /?s= — WordPress search queries
- /tag/ and /author/ archive pages
- Duplicate URL parameter pages

Never block:

- Your main content pages and blog posts
- Product and category pages
- CSS and JavaScript files (Google needs these to render pages)
- Images you want indexed
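A block list like the one above translates mechanically into a robots.txt file. As a sketch (build_robots_txt is a hypothetical helper, not part of any library, and the URLs are placeholders):

```python
# Paths to keep crawlers out of; adjust to match your site.
BLOCKED_PATHS = ["/admin/", "/wp-admin/", "/login/", "/cart/", "/checkout/", "/search/"]

def build_robots_txt(blocked, sitemap_url):
    """Assemble a simple robots.txt: one Disallow per path, plus the sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

print(build_robots_txt(BLOCKED_PATHS, "https://yoursite.com/sitemap.xml"))
```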
Critical mistake: robots.txt does not prevent indexing
This is the most common robots.txt misunderstanding. Blocking a page in robots.txt stops Google from crawling it — but it does NOT guarantee the page won't appear in search results.
If other sites link to a blocked page, Google can still discover the URL and show it in results (without being able to read the content). To truly prevent a page from appearing in search results, use a noindex meta tag instead — and don't block it in robots.txt, otherwise Google can't read the noindex instruction.
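For a page that must stay out of search results, the tag goes in the page's head (and the page stays crawlable in robots.txt so Google can read it):

```html
<!-- Tells crawlers not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the X-Robots-Tag: noindex HTTP response header.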
Never block your CSS and JS files in robots.txt. Google needs to render your pages to understand them. Blocking stylesheets causes Google to see a broken version of your page which can hurt rankings.
Always add your sitemap URL
The robots.txt file is the perfect place to declare your XML sitemap URL. This helps Google find and crawl your sitemap automatically without needing to submit it manually.
A robots.txt template for most websites
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap.xml
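As a sanity check, this template can be fed to Python's standard-library urllib.robotparser (the domain is a placeholder). One caveat: Python's parser applies rules in file order (first match wins), while Googlebot honors the most specific, i.e. longest, matching rule — so the Allow: /wp-admin/admin-ajax.php line is respected by Google but shadowed by the earlier Disallow: /wp-admin/ in this parser.

```python
import urllib.robotparser

# The template above; rules must sit under a User-agent line to be valid.
TEMPLATE = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(TEMPLATE.splitlines())

# Content pages stay crawlable; private areas do not.
print(rp.can_fetch("*", "https://yoursite.com/blog/my-post"))     # True
print(rp.can_fetch("*", "https://yoursite.com/checkout/step-1"))  # False
print(rp.site_maps())  # ['https://yoursite.com/sitemap.xml']
```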
Key takeaways
- robots.txt controls crawling, not indexing; use a noindex meta tag to keep a page out of search results
- Never block CSS, JavaScript, or other files Google needs to render your pages
- Block admin, login, cart, checkout, and internal search pages to save crawl budget
- Always declare your XML sitemap URL in robots.txt
Generate your robots.txt now
Free tool — build a clean robots.txt with sitemap, crawl-delay, and custom rules.
Open Robots.txt Generator →