Robots.txt: The Full Guide With Examples

About Author

Thibault Besson-Magdelain

Founder of Sorank, 5+ years of experience in SEO, GEO enthusiast.

Summary: Robots.txt is a text file at your site root that tells crawlers such as Googlebot which URLs they may crawl. It helps optimize crawl budget by keeping crawlers away from low-value pages.

Robots.txt is one of the first files Google fetches when crawling your site. A simple robots.txt can keep crawl budget from being wasted on duplicate pages, admin directories, and staging environments.

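Google crawls each site with a finite budget. Google's documentation describes that budget as a combination of how much crawling your server can handle and how much demand there is for your content. Google's official robots.txt documentation describes the file mainly as a way to manage crawler traffic, which makes a well-configured robots.txt a practical tool for crawl budget optimization.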

Robots.txt syntax and rules

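A robots.txt file is made up of groups. Each group starts with one or more User-agent lines, followed by Allow and Disallow rules. For Google, the order of rules within a group does not matter: the most specific matching rule (the one with the longest path) wins, and Allow wins a tie. Some older parsers apply the first matching rule instead, so listing specific rules before general ones remains a safe habit. Disallow: / blocks your entire site, which is rarely what you want.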
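To make the precedence rule concrete, here is a minimal Python sketch of longest-match evaluation as described in RFC 9309 and Google's documentation. It is a simplification, not Google's parser: it only handles plain path prefixes (no * or $ wildcards), and the function name and the /admin/ paths are hypothetical.

```python
from typing import List, Tuple

# (directive, path prefix), e.g. ("disallow", "/admin/")
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    Simplified longest-match logic: the matching rule with the longest
    path wins, and a tie goes to Allow. Wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule path matches nothing
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public/"),
]

print(is_allowed(rules, "/admin/settings"))                      # False
print(is_allowed(rules, "/admin/public/help"))                   # True
print(is_allowed(list(reversed(rules)), "/admin/public/help"))   # True
print(is_allowed(rules, "/blog/post"))                           # True
```

Reversing the rule list gives the same answer, which is the point: under longest-match evaluation, the rule's specificity decides the outcome, not its position in the file.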

Common robots.txt patterns

Block admin panels, internal search, and parameter-driven duplicates. Always add your sitemap directive: Sitemap: https://example.com/sitemap.xml.
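As an illustration, the short Python sketch below assembles a robots.txt with these patterns. The helper name, the paths, and the sort parameter are hypothetical placeholders, so substitute the paths your own site actually uses.

```python
from typing import Iterable


def build_robots_txt(disallow: Iterable[str], sitemap_url: str) -> str:
    """Render one `User-agent: *` group followed by a Sitemap line."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    disallow=[
        "/admin/",   # admin panel (hypothetical path)
        "/search",   # internal search results (hypothetical path)
        "/*?sort=",  # parameter-driven duplicates (hypothetical parameter)
    ],
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)
```

The Sitemap line takes an absolute URL and sits outside any User-agent group. The /*?sort= pattern relies on the * wildcard, which Google supports, and it only matches URLs whose query string starts with sort=; a parameter that appears later in the query string would need its own pattern, such as /*&sort=.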

What robots.txt does and does not do

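Robots.txt prevents Google from crawling a page. It does NOT prevent indexing: a blocked URL can still appear in search results, usually without a description, if other pages link to it. To keep a page out of the index, use a meta noindex tag (or an X-Robots-Tag response header) instead, and leave the page crawlable, because Google has to fetch the page to see the noindex. Crawl budget is limited. Save it for pages you want to rank.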
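Because a crawler must fetch a page to see its noindex tag, a page that is both blocked in robots.txt and marked noindex is a common conflict. The Python sketch below flags that case using only the standard library. The function and class names, the /drafts/ path, and the sample HTML are assumptions for illustration, and urllib.robotparser only does simple prefix matching, so treat the result as an approximation of Google's behavior.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",")}


def noindex_is_hidden(robots_txt, url, html, agent="Googlebot"):
    """True when the page carries noindex but robots.txt blocks crawling it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    meta = MetaRobotsParser()
    meta.feed(html)
    return "noindex" in meta.directives and not rules.can_fetch(agent, url)


robots_txt = """\
User-agent: *
Disallow: /drafts/
"""
html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_is_hidden(robots_txt, "https://example.com/drafts/old-page", html))  # True
print(noindex_is_hidden(robots_txt, "https://example.com/blog/old-page", html))    # False
```

When the check returns True, the usual fix is to remove the Disallow rule for that URL so the noindex can be read.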

Testing and monitoring robots.txt

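Check your robots.txt in Google Search Console. Go to Settings and open the robots.txt report, which replaced the legacy Robots.txt Tester and shows the robots.txt files Google found, when they were last fetched, and any errors or warnings. To see whether a specific URL is blocked, use the URL Inspection tool. Check monthly to confirm the file still parses cleanly and your rules behave as intended. Document your robots.txt logic in comments (lines starting with #) for your team.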
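Between Search Console checks, a small script can catch accidental rule changes. The Python sketch below compares a robots.txt against a list of expected outcomes using the standard library's urllib.robotparser. The file contents, URLs, and names are hypothetical, and Python's parser does not implement wildcards or Google's longest-match precedence, so it complements Search Console rather than replacing it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Keep crawlers out of the admin panel.
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

# URL -> should a crawler be allowed to fetch it?
EXPECTATIONS = {
    "https://example.com/": True,
    "https://example.com/blog/robots-txt-guide": True,
    "https://example.com/admin/login": False,
}


def find_mismatches(robots_txt, expectations, agent="Googlebot"):
    """Return the URLs whose crawlability differs from what we expect."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, expected in expectations.items()
        if parser.can_fetch(agent, url) != expected
    ]


mismatches = find_mismatches(ROBOTS_TXT, EXPECTATIONS)
print(mismatches)  # []
```

To run the same check against a live file, RobotFileParser can fetch it with set_url() followed by read() instead of parsing a string.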

Robots.txt mistakes to avoid

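Do not block the CSS, JavaScript, or image files Google needs to render your pages. Do not use robots.txt to hide private content: the file is publicly readable and blocked URLs can still be indexed, so protect private content with authentication instead. Pair robots.txt with a sitemap to guide Google to your priority pages.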

Conclusion

A well-configured robots.txt is invisible to users but critical for SEO. It redirects crawl budget from duplicate pages to your money pages. Start by blocking admin, search, and staging directories, add your sitemap, and test in Search Console. Our GEO audit flags robots.txt issues and shows you what Google sees when it crawls your site.