Crawl Budget: How Often Search Engines Visit Your Site

Crawl budget is the number of pages a search engine will crawl on your site in a given time. Learn how it works, who needs it, and how to optimize it in 2026.

A chart showing how crawl budget is allocated across high-value pages versus low-value pages.

About Author

Thibault Besson-Magdelain

Founder of Sorank, 5+ years of experience in SEO, GEO enthusiast.

Summary: Crawl budget is the number of URLs a search engine can and wants to crawl on your site within a given timeframe. Google sets it from two factors, crawl capacity limit and crawl demand, and it matters most for large or frequently updated sites.

Crawl budget describes how much crawling attention a search engine allocates to your website. Googlebot does not crawl every page on the internet equally or constantly; it makes economic choices about where to spend its finite resources. Crawl budget is the practical result of those choices for your specific domain: the set of URLs Google can fetch and wants to fetch in a window of time.

Google does not publish a number for your crawl budget or let you set one manually. Instead, Google's crawl budget documentation explains that it emerges from two underlying factors. Understanding those factors is the key to influencing how thoroughly and how often your pages get crawled, especially as AI crawlers add new pressure on server resources in 2026.

The Two Factors: Crawl Capacity and Crawl Demand

Crawl capacity limit is the maximum number of simultaneous connections Googlebot will use to crawl your site, plus the delay between fetches. It is governed by your server's health. If your site responds quickly and without errors, Google raises the limit and crawls more aggressively. If your server slows down or returns 5xx errors, Google backs off to avoid overloading you.

Crawl demand is how badly Google wants to crawl your pages in the first place. It rises with popularity (URLs that attract traffic and links), perceived inventory (how many useful pages Google thinks you have), and staleness (pages Google believes need refreshing). A large, frequently updated, authoritative site generates high crawl demand; a small, static site generates low demand.

Your effective crawl budget is the meeting point of these two. High capacity with low demand still means light crawling, and high demand throttled by a slow server means missed pages. Both levers matter, because crawling is held back by whichever one is weaker.

Who Actually Needs to Worry About Crawl Budget

Most websites do not need to think about crawl budget at all. If you have a few hundred or a few thousand pages and your server is reasonably fast, Google will crawl everything important without difficulty. Spending energy on crawl budget optimization for a small site is usually wasted effort.

Google explicitly targets its guidance at three groups: large sites with one million or more unique pages that change at least weekly, medium-to-large sites of 10,000 or more unique pages that change daily, and any site where a large share of URLs show as "Discovered - currently not indexed" in Search Console. If you fall into one of these buckets, crawl budget becomes a real constraint that directly controls which pages get indexed and how fresh they stay.
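As a rough illustration, the three groups above can be written as a simple check. The function name, its parameters, and the 30% cutoff for a "large share" are assumptions made for this sketch; Google does not publish a numeric threshold for that last criterion.

```python
def crawl_budget_matters(unique_pages: int,
                         change_interval_days: float,
                         discovered_not_indexed_share: float,
                         large_share_cutoff: float = 0.30) -> bool:
    """Return True if a site falls into one of the three groups described above.

    change_interval_days: typical number of days between content changes.
    discovered_not_indexed_share: fraction (0..1) of known URLs in that status.
    large_share_cutoff: hypothetical cutoff, not a figure published by Google.
    """
    # Group 1: one million or more unique pages, changing at least weekly.
    large_site = unique_pages >= 1_000_000 and change_interval_days <= 7
    # Group 2: 10,000 or more unique pages, changing daily.
    fast_changing_site = unique_pages >= 10_000 and change_interval_days <= 1
    # Group 3: a large share of URLs discovered but not yet crawled.
    backlog = discovered_not_indexed_share >= large_share_cutoff
    return large_site or fast_changing_site or backlog


print(crawl_budget_matters(3_000, 30, 0.02))   # small, static site
print(crawl_budget_matters(50_000, 1, 0.05))   # mid-size site updated daily
```

Treat the result as a prompt to look at Search Console, not as a verdict.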

How to Tell If Crawl Budget Is a Problem

The clearest signal lives in Google Search Console. Open the Crawl Stats report to see how many requests Googlebot makes per day, the average response time, and any availability errors. A high request count spent on low-value URLs, or rising response times, points to inefficiency.

Watch the Pages report for the "Discovered - currently not indexed" status. It means Google has found a URL but has not crawled it yet, so a large count suggests your important pages are waiting behind a queue of junk. Server log analysis is the most precise method: it shows exactly which URLs Googlebot fetches and how often, revealing where your budget actually goes versus where you want it to go.
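A minimal sketch of that log analysis, assuming your server writes the common "combined" access log format. The regular expression, the function name, and the sample lines are illustrative; adapt them to your own log layout.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format:
# ip - - [time] "METHOD target HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_path(lines):
    """Count requests per URL path for user agents that contain 'Googlebot'.

    The user-agent string can be spoofed, so a production check would also
    verify the requesting IP address.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            # Drop the query string so filtered variants group under one path.
            hits[urlsplit(match.group("target")).path] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /shoes?color=red HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jan/2026:10:00:05 +0000] "GET /shoes?color=blue HTTP/1.1" 200 498 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jan/2026:10:00:09 +0000] "GET /about HTTP/1.1" 200 300 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Jan/2026:10:00:11 +0000] "GET /about HTTP/1.1" 200 300 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = googlebot_hits_by_path(sample_log)
for path, count in hits.most_common():
    print(path, count)
```

Sorting by count puts the most-crawled paths first, which makes it easy to spot low-value sections that absorb a disproportionate share of fetches.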

What Wastes Crawl Budget

The biggest drains are predictable. Faceted navigation and URL parameters can spawn near-infinite combinations of filtered and sorted pages, each a unique URL that Googlebot may try to crawl. Session IDs in URLs create the same explosion of duplicates. Every one of these wasted fetches is a fetch not spent on a real page.
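To see how quickly the combinations add up, here is a small calculation. The facet names and option counts are hypothetical, and it assumes each facet is either unset or set to exactly one value, ignoring parameter order.

```python
from math import prod


def faceted_url_count(options_per_facet):
    """Distinct filter combinations for one listing page.

    Each facet contributes (options + 1) choices: one per option,
    plus the case where the facet is not applied at all.
    """
    return prod(count + 1 for count in options_per_facet.values())


# Hypothetical facets on a single category page.
facets = {"color": 12, "size": 8, "brand": 40, "sort": 4}

print(faceted_url_count(facets))  # 13 * 9 * 41 * 5 = 23985
```

That is one category page. Multiply by the number of categories, and by parameter orderings if your URLs allow them, and the crawlable URL space grows far beyond the number of real products.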

Other common wasters include long redirect chains, soft 404 pages that return a 200 status for missing content, duplicate content across multiple URLs, infinite-scroll or calendar pages that generate endless links, and outdated sitemaps pointing at dead URLs. Each of these consumes fetches that should go to your newest product, article, or landing page, and delays the indexing of pages you care about.

How to Optimize Crawl Budget

Start by blocking what Google should not crawl. Use your robots.txt file to disallow faceted parameters, internal search results, and other low-value URL patterns. Google notes that robots.txt, not noindex, is the right tool here: a noindex page still has to be crawled to be read, which spends budget, whereas a disallowed path is skipped.
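Before deploying new rules, it helps to test them against a handful of URLs. The sketch below uses Python's standard-library robots.txt parser; the rules and example.com URLs are hypothetical. Note the assumption: this parser does plain path-prefix matching and may not honor the `*` wildcard patterns Google supports, so the example sticks to prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search results and cart pages.
robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/search?q=boots",
    "https://example.com/cart/checkout",
    "https://example.com/products/boots",
):
    print(url, is_crawlable(url))
```

A check like this catches the costly mistake of a rule that is broader than intended and blocks pages you want crawled.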

Next, clean up your status codes. Return a 404 or 410 for pages you have permanently removed so Google stops requesting them. Consolidate duplicate content behind canonical tags, and eliminate redirect chains by pointing links straight at the final URL. Each fix recovers fetches for pages that deserve them.
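Redirect chains can be flattened mechanically once you have a map of source to target URLs, for example from a site crawl. This is a minimal sketch; the function name, the `max_hops` limit, and the sample paths are assumptions for illustration.

```python
def resolve_final_url(url, redirects, max_hops=10):
    """Follow a {source: target} redirect map; return (final_url, hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops


# Hypothetical chain: /old -> /interim -> /newer -> /final
redirects = {"/old": "/interim", "/interim": "/newer", "/newer": "/final"}

# Point every source straight at its final destination.
flattened = {source: resolve_final_url(source, redirects)[0] for source in redirects}
print(flattened)
```

After flattening, every old URL needs a single hop, and internal links can be updated to the final URL so that no hop is needed at all.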

Finally, keep your XML sitemap accurate with honest lastmod dates, and improve server speed so Google raises your crawl capacity limit. A faster site is a more thoroughly crawled site, all else equal.
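One way to keep lastmod honest is to generate the sitemap from the same data that records when each page really changed. The sketch below assumes you can supply a list of live, canonical URLs with their true modification dates; the function name and example.com URLs are hypothetical.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as bytes) from (absolute_url, last_modified) pairs.

    Only pass URLs that return 200 and are canonical; dead or redirected
    URLs in a sitemap invite wasted fetches.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Use the date the content actually changed, not the build date.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("https://example.com/", date(2026, 1, 10)),
    ("https://example.com/products/boots", date(2025, 12, 2)),
])
print(sitemap_xml.decode("utf-8"))
```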

Crawl Budget and AI Crawlers in 2026

Googlebot is no longer the only crawler competing for your server's attention. AI engines deploy their own bots, including GPTBot and OAI-SearchBot from OpenAI, ClaudeBot from Anthropic, and PerplexityBot, to gather and refresh the content they cite in answers. These crawlers consume real bandwidth and server cycles.

Server data from 2025 showed AI and search crawler traffic climbing steeply, with several bots growing by hundreds of percent year over year. The practical effect is that the same site-health and efficiency work that protects your Google crawl budget also protects performance for AI crawlers. A fast server and a clean URL structure help every bot, from Googlebot to the engines powering AI search, spend their limited crawling on pages worth fetching.
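To see how much of your own traffic each crawler accounts for, you can tally requests by the bot name in the User-Agent header. This is a sketch: the token list comes from the bots named above, while the sample user-agent strings are shortened, illustrative values rather than the exact strings these crawlers send.

```python
from collections import Counter

# Substrings that identify each crawler in the User-Agent header.
BOT_TOKENS = ["OAI-SearchBot", "GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]


def crawler_name(user_agent):
    """Return the matching bot token, or None for non-bot traffic."""
    lowered = user_agent.lower()
    for token in BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def requests_per_crawler(user_agents):
    """Count requests per known crawler, ignoring everything else."""
    return Counter(name for name in map(crawler_name, user_agents) if name)


# Illustrative, shortened user-agent strings.
sample_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

counts = requests_per_crawler(sample_agents)
print(counts.most_common())
```

Run over a day of real logs, a tally like this shows whether AI crawlers are a marginal or a significant share of your server load.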

Crawl Budget vs. Indexing: A Common Confusion

Crawling and indexing are distinct steps, and conflating them leads to wasted effort. Crawl budget governs whether and how often Google fetches a URL. Indexing is the separate decision about whether a crawled page is worth storing and serving in results. A page can be crawled and then deliberately left unindexed because Google judged it low value.

This matters for diagnosis. If a page shows as "Discovered - currently not indexed", Google knows the URL but has not crawled it yet; that is a crawl budget or discovery issue, and the fixes above apply. If a page shows as "Crawled - currently not indexed", the problem is usually content quality, thin content, or duplication, and no amount of crawl optimization will fix it. Identify which stage is failing before you act.
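The same triage can be expressed as a small lookup, which is handy when working through an exported list of URLs and statuses. The function name and the advice strings are illustrative, and the "other" branch is an assumption added for statuses this rule of thumb does not cover.

```python
NEXT_STEP = {
    "crawl": "Crawl budget or discovery issue: apply the crawl fixes above.",
    "indexing": "Likely content quality, thin content, or duplication.",
    "other": "Not covered by this rule of thumb: inspect the URL individually.",
}


def failing_stage(status):
    """Map a Search Console page status to the stage that is failing."""
    normalized = status.strip().lower()
    if normalized.startswith("discovered"):
        return "crawl"      # found, but not fetched yet
    if normalized.startswith("crawled"):
        return "indexing"   # fetched, but not kept in the index
    return "other"


for status in ("Discovered - currently not indexed",
               "Crawled - currently not indexed"):
    stage = failing_stage(status)
    print(status, "->", stage, "|", NEXT_STEP[stage])
```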

Conclusion

Crawl budget is the number of URLs a search engine can and wants to crawl on your site in a given timeframe, set by crawl capacity limit (your server health) and crawl demand (your popularity, inventory, and freshness). It is a genuine concern for sites with 10,000 or more frequently changing pages and for any site with many "Discovered - currently not indexed" URLs, and largely a non-issue for small ones. To optimize it, block low-value URLs with robots.txt, fix status codes and redirect chains, eliminate duplicates, keep your sitemap accurate, and speed up your server. In 2026, that same hygiene also keeps AI crawlers efficient. Run a Sorank GEO SEO audit to find the crawl waste hurting your indexing.

Frequently asked questions

What is a good crawl budget for my website?

There is no single target number, and Google does not publish or let you set one. Crawl budget is the practical result of your server health and how much Google wants your pages. For most small and medium sites, Google crawls everything important without any intervention, so there is no number to chase. Crawl budget becomes a real factor only at tens of thousands of URLs or when many pages show as "Discovered - currently not indexed".

How do I increase my crawl budget?

You influence it through the two factors Google uses. Raise crawl capacity by making your server fast and reliable, since Google crawls more when responses are quick and error-free. Raise crawl demand by publishing useful content that earns links and traffic, and by updating pages so Google sees them as worth refreshing. Just as important, stop wasting the budget you have by blocking low-value URLs and fixing duplicates, which frees fetches for pages that matter.

Is crawl budget the same as indexing?

No, they are separate stages. Crawl budget controls whether and how often Google fetches a URL, while indexing is the later decision about whether to store and serve that page in search results. A page can be crawled and then left unindexed because Google judged it low value. This distinction matters for diagnosis: a "Discovered - currently not indexed" page is a crawl issue, while a "Crawled - currently not indexed" page is usually a content quality problem.
