Robots.txt

Robots.txt controls what Google crawls. Learn syntax, examples, and best practices to protect crawl budget and keep low-value pages out of Google's crawl.


About Author

Thibault Besson-Magdelain

Founder of Sorank, 5+ years of experience in SEO, GEO enthusiast.

Summary: Robots.txt is a text file at your site root that tells crawlers which parts of your site not to crawl. It optimizes crawl budget and keeps Google away from low-value pages.

Robots.txt is one of the first files Google fetches when crawling your site. A simple robots.txt can save you hundreds of hours of crawl budget wasted on duplicate pages, admin directories, and staging environments.

Google crawls each site with a finite budget, determined by how much crawling your server can handle and how much demand Google has for your content. Google's official robots.txt documentation describes how to block entire sections of a site, which makes a well-configured robots.txt your primary tool for crawl budget optimization.

Robots.txt syntax and rules

A robots.txt file groups rules by User-agent, each group containing Allow and Disallow directives. Order does not decide conflicts: Google applies the most specific (longest) matching rule, and on a tie the less restrictive rule wins. Disallow: / would block your entire site, which is rarely what you want.
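As a minimal sketch (the paths here are illustrative, not a recommendation for any specific site):

```txt
# Rules for all crawlers
User-agent: *
# Block the admin area
Disallow: /admin/
# Longest match wins for Google: this Allow overrides the shorter Disallow above
Allow: /admin/public/
```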

Common robots.txt patterns

Block admin panels, internal search, and parameter-driven duplicates. Always add your sitemap directive: Sitemap: https://example.com/sitemap.xml.
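Putting those patterns together, a hedged sketch (the domain and paths are illustrative and should be adapted to your own site):

```txt
User-agent: *
Disallow: /wp-admin/     # admin panel
Disallow: /search        # internal search results
Disallow: /*?sort=       # parameter-driven duplicates (Google supports * wildcards)
Disallow: /staging/      # staging environment

Sitemap: https://example.com/sitemap.xml
```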

What robots.txt does and does not do

Robots.txt prevents Google from crawling a page. It does NOT prevent indexing: a blocked URL can still appear in search results if other sites link to it. To keep a page out of the index, use a meta noindex tag instead, and leave that page crawlable so Google can see the tag. Crawl budget is limited; save it for pages you want to rank.
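The noindex alternative is a single tag in the page's head, for example:

```html
<!-- Tells crawlers not to index this page.
     The page must NOT be blocked in robots.txt, or Google never sees this tag. -->
<meta name="robots" content="noindex">
```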

Testing and monitoring robots.txt

Test your robots.txt in Google Search Console: the robots.txt report under Settings shows the file Google last fetched and any parsing errors. Check it monthly to ensure Google is reading the rules you intended, and document your robots.txt logic in comments for your team.
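You can also sanity-check rules locally before deploying. A sketch using Python's standard urllib.robotparser (note: it only handles simple prefix rules, not Google's * and $ wildcard extensions, and the rules below are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; parse() accepts the file's lines
# directly, so no network fetch is needed for a local check.
rules = """
User-agent: *
Disallow: /admin/
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check whether a generic crawler may fetch specific URLs
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
```

Running this before each deploy catches accidental over-blocking, such as a Disallow rule that also matches your money pages.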

Robots.txt mistakes to avoid

Do not block CSS, JavaScript, or image files: Google needs them to render your pages. Do not use robots.txt to hide private content, since the file itself is publicly readable and blocked URLs can still be indexed. Pair robots.txt with a sitemap to guide Google to your priority pages.

Conclusion

A well-configured robots.txt is invisible to users but critical for SEO. It redirects crawl budget from duplicate pages to your money pages. Start by blocking admin, search, and staging directories, add your sitemap, and test in Search Console. Our GEO audit flags robots.txt issues and shows you what Google sees when it crawls your site.

Frequently asked questions

Do I need a robots.txt file?

Not strictly required, but highly recommended. Without it, Google crawls everything, wasting crawl budget on duplicate and low-value pages.

Can robots.txt block Google from ranking my pages?

No. Robots.txt prevents crawling, not ranking. A page blocked in robots.txt can still rank if other sites link to it.

Where does robots.txt go on my site?

Always at the root: example.com/robots.txt. Google looks there first. If robots.txt is anywhere else, Google will not find it.
