Producing high-quality, well-structured content only helps GEO if AI crawlers can actually reach and render that content. A single misplaced robots.txt directive or a JavaScript-heavy rendering stack can silently exclude your site from the training and retrieval pipelines of major AI engines, and a missing llms.txt file forgoes an easy way to guide them to your best pages. The tool above audits a domain you provide and checks whether the main AI crawlers and crawler tokens, including GPTBot, OAI-SearchBot, PerplexityBot, Google-Extended (a robots.txt control token rather than a separate crawler), and ClaudeBot, can access your pages and process them correctly.
What the audit checks
The tool above evaluates four main categories of crawlability:
- Robots.txt directives: the audit reads your robots.txt file and identifies which AI crawler user-agents are explicitly blocked, accidentally blocked by wildcard rules, or left without an explicit Allow rule where the rest of the file blocks everything by default. It also checks whether the file itself is accessible, properly formatted, and does not exceed the 500 KiB size limit that Google and some other crawlers enforce.
- Meta robots and X-Robots-Tag headers: a robots.txt that allows crawling is insufficient if individual pages carry a noindex or noarchive meta tag, or if server response headers instruct bots to skip the page. The audit inspects both sources.
- JavaScript rendering dependency: pages that deliver critical content exclusively through JavaScript are invisible to crawlers that do not execute scripts. The audit detects whether the main content on your pages is available in the raw HTML or only after client-side rendering.
- Sitemaps and llms.txt: a well-maintained sitemap.xml helps AI crawlers discover pages efficiently. The newer llms.txt proposal, served from the site root like robots.txt but written in Markdown for LLMs, lets you summarise your site in a machine-readable way and point to the pages most suitable for AI consumption. It is a proposed convention rather than a ratified standard, and not every AI crawler is known to read it. The audit checks whether both files exist and are properly formatted.
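The first two checks above can be approximated with the Python standard library. The following is a minimal sketch, not the audit tool's actual implementation: the function names blocked_agents and page_level_directives are hypothetical, the robots.txt text is passed in as a string rather than fetched, and only the generic "robots" meta tag and unscoped X-Robots-Tag values are handled.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article. Google-Extended is a robots.txt
# control token rather than a separate crawler, but it is matched the same way.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended", "ClaudeBot")


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def page_level_directives(html: str, headers: dict[str, str]) -> set[str]:
    """Merge robots directives from the HTML meta tag and the X-Robots-Tag header.

    Simplification (assumption): bot-specific meta names and user-agent-scoped
    header values such as "googlebot: noindex" are not interpreted.
    """
    meta = _RobotsMetaParser()
    meta.feed(html)
    meta.close()
    found = set(meta.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found.update(part.strip().lower() for part in value.split(",") if part.strip())
    return found


if __name__ == "__main__":
    # A wildcard block with a single exception: every agent except GPTBot is reported.
    sample = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"
    print(blocked_agents(sample, "https://example.com/pricing"))
```

A page would then be flagged if blocked_agents returns any crawler you want to admit, or if page_level_directives contains noindex or noarchive. Note that urllib.robotparser applies the first matching rule in file order, which can differ from the longest-match rule that some crawlers use, so treat its answer as an approximation.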
How to interpret and act on the results
The tool above flags each issue with a severity level. Here is how to prioritise your remediation:
- Blocked AI crawlers in robots.txt: remove or narrow the directive that blocks the relevant user-agent. If you intentionally block all AI crawlers for licensing reasons, confirm this is a deliberate policy decision rather than an accidental wildcard block inherited from a CMS template.
- Noindex on key pages: review each flagged page. If a page contains valuable content you want cited, remove the noindex directive. If the page appears to be excluded on purpose, confirm that with whoever owns it, since a staging-environment directive left in place after launch looks identical to a deliberate block.
- JavaScript-only content: implement server-side rendering (SSR) or static site generation (SSG) for content you want AI crawlers to index. At minimum, ensure that page titles, headings, and the first 200 words of body text are available in the server-rendered HTML before JavaScript executes.
- Missing or outdated sitemap: generate a fresh sitemap.xml that includes all canonical URLs, excludes redirected or noindex pages, and is referenced in robots.txt. Update it automatically whenever new content is published.
- No llms.txt file: create an llms.txt file at the root of your domain. At minimum, include a brief description of your site, the primary topics covered, and links to your most important pages. This is a low-effort signal that may improve how AI systems that read the file categorise your site, although support for it still varies between engines.
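Two of the fixes above lend themselves to a quick script: confirming that the title, headings, and roughly the first 200 words are present before any JavaScript runs, and generating a minimal llms.txt. The sketch below makes several assumptions: server_rendered_summary and build_llms_txt are hypothetical helper names, the word count includes heading text, the "Key pages" section title is an arbitrary choice, and the llms.txt layout follows the Markdown shape of the public proposal (a title, a quoted summary, then a list of links).

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}  # their contents are never visible text


class _RawHtmlText(HTMLParser):
    """Collects the title, headings, and visible words from unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings: list[str] = []
        self.words: list[str] = []
        self._skip_depth = 0
        self._in_title = False
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        if self._in_title:
            self.title = (self.title + " " + text).strip()
            return  # the title is reported separately, not counted as body words
        if self._in_heading and self.headings:
            self.headings[-1] = (self.headings[-1] + " " + text).strip()
        self.words.extend(text.split())


def server_rendered_summary(raw_html: str, min_words: int = 200) -> dict:
    """Report what a crawler that does not execute scripts would see."""
    parser = _RawHtmlText()
    parser.feed(raw_html)
    parser.close()
    word_count = len(parser.words)
    return {
        "title": parser.title,
        "headings": parser.headings,
        "word_count": word_count,
        "meets_minimum": bool(parser.title) and bool(parser.headings) and word_count >= min_words,
    }


def build_llms_txt(site_name: str, description: str, topics: list[str],
                   pages: list[tuple[str, str]]) -> str:
    """Build a minimal llms.txt body: description, topics, and key page links."""
    lines = [f"# {site_name}", "", f"> {description}", ""]
    lines.append("Primary topics: " + ", ".join(topics))
    lines += ["", "## Key pages", ""]
    lines += [f"- [{title}]({url})" for title, url in pages]
    return "\n".join(lines) + "\n"
```

Feeding server_rendered_summary the response body from a plain HTTP GET (no headless browser) shows whether a client-rendered page is effectively empty; a page whose body is only a mount point and a script tag reports no headings and a word count of zero.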
A benchmark on AI crawl access
AI Overviews now appear on approximately 31% of Google queries, and position-1 pages behind an AI Overview lose up to 58% of expected clicks (Ahrefs, 2025). The pages that capture that displaced traffic are those cited inside the AI answer. Crawlability is the prerequisite: if an AI bot cannot access your content, no amount of on-page optimisation will earn you a citation. Fixing crawl barriers is therefore the highest-leverage starting point for any GEO strategy.
For ongoing monitoring of your AI crawlability and citation performance across all major AI engines, Sorank tracks your GEO visibility and alerts you when access changes.

























