llms.txt tells AI crawlers what your site is about and how to index it efficiently. Learn the format, best practices, and implementation steps for 2026.

For decades, robots.txt was the way to communicate with search engine crawlers. You'd place it at your site root to tell Google and Bing which pages to crawl, which to skip, and where to find your sitemap. Now a new wave of crawlers is emerging: LLM crawlers from OpenAI, Anthropic, Google, and others. These crawlers have different needs, and llms.txt is an emerging convention for communicating with them.
The llms.txt proposal is designed to help AI systems understand what your site is about and how to index your content efficiently. Rather than forcing AI crawlers to guess whether your site is an e-commerce store, a news publication, or a technical documentation site, llms.txt tells them explicitly. That clarity can help AI systems retrieve and cite your content more accurately in AI-generated answers.
AI crawlers face a unique problem. When crawlers run by OpenAI or Google fetch your site to collect training or retrieval data, they don't know what to prioritize. Should they crawl product pages or blog posts? How deeply should they crawl documentation? Which pages are evergreen, and which are outdated? Without guidance, AI crawlers can waste resources on low-value pages or miss important content.
robots.txt helped solve this problem for search engines. It lets you disallow URL paths and point crawlers to your sitemap, and some engines, such as Bing, also honor a nonstandard Crawl-delay directive (Google ignores it). But robots.txt was designed for traditional search engines optimizing for ranking. LLM crawlers have different needs: they care less about ranking position and more about understanding which topics you're an authority on.
llms.txt aims to bridge this gap by communicating site structure and topical focus to AI crawlers. Instead of leaving crawlers to infer that your site is about "SaaS billing solutions," you can tell them directly. In principle, this can speed up discovery, improve indexing accuracy, and raise the likelihood that your content is cited in relevant AI search results.
The format used in this guide is simple and human-readable. The file lives at your domain root (www.example.com/llms.txt) and contains key-value pairs describing your site. Here's a basic example:
Title: Example SaaS Company
Description: We provide billing automation software for B2B SaaS companies. Our content covers pricing strategies, metering, payment processing, and compliance.
Author: Example Company
Updated: 2026-04-01
Url: https://www.example.com
Crawl-Delay: 2
Allow: /blog, /docs, /resources
Disallow: /admin, /user-dashboard, /checkout
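To make the layout concrete, here is a minimal Python sketch of a parser for the key-value format above. It assumes one "Key: value" pair per line and comma-separated values for the list fields (Allow, Disallow, Topics, Entities); the function name parse_llms_txt is hypothetical and not part of any spec or library.

```python
# Hypothetical parser for the key-value llms.txt layout used in this guide.
# Assumption: one "Key: value" pair per line; list fields are comma-separated.
LIST_FIELDS = {"allow", "disallow", "topics", "entities"}


def parse_llms_txt(text: str) -> dict:
    """Return a dict of lowercased field names to values (lists for list fields)."""
    fields: dict = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ":" not in line:
            continue  # skip blank lines and anything without a key
        # Split on the first colon only, so "Url: https://..." keeps its scheme.
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in LIST_FIELDS:
            items = [item.strip() for item in value.split(",") if item.strip()]
            # Repeated Allow/Disallow lines accumulate into a single list.
            fields.setdefault(key, []).extend(items)
        else:
            fields[key] = value  # a repeated scalar field keeps the last value
    return fields


if __name__ == "__main__":
    sample = """Title: Example SaaS Company
Url: https://www.example.com
Crawl-Delay: 2
Allow: /blog, /docs, /resources
Disallow: /admin, /user-dashboard, /checkout"""
    parsed = parse_llms_txt(sample)
    print(parsed["allow"])  # ['/blog', '/docs', '/resources']
    print(parsed["url"])    # https://www.example.com
```

Lowercasing the keys means "Crawl-Delay" and "crawl-delay" are treated as the same field, which is a forgiving choice rather than a rule from any spec.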
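The syntax is intentionally straightforward so crawlers can parse it easily: you state who you are, what your site covers, and which sections are okay to crawl. Treat these directives as requests, not enforced rules. None of the major AI labs has documented support for Allow, Disallow, or Crawl-Delay in llms.txt the way Google documents its robots.txt support, and the original llms.txt proposal (llmstxt.org) describes a Markdown file (an H1 title, a blockquote summary, and H2 sections listing key links) rather than key-value fields. Access rules that crawlers actually enforce still belong in robots.txt.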
Title tells crawlers your site or business name. Keep it concise and descriptive. "Example SaaS Company" is better than "Welcome to our site."
Description is your elevator pitch for what the site covers. Be specific about your topical expertise. Instead of "We write about tech," write "We publish technical guides for Python developers, focusing on async programming, testing, and production deployment." This specificity helps AI systems understand your authority.
Author identifies your organization or personal brand. Use your legal entity name or official brand name.
Updated tells crawlers when you last changed the llms.txt file. Use ISO 8601 format (YYYY-MM-DD). A crawler could use this to decide whether to re-fetch the file.
Url is your site's canonical URL. Use the version you prefer (with or without www).
Allow and Disallow specify which sections of your site LLM crawlers may index. List directories or paths, separated by commas. Crawlers that honor these rules will index allowed paths and skip disallowed ones. You can have multiple Allow and Disallow lines.
Crawl-Delay (optional) specifies how many seconds crawlers should wait between requests. Use this if your server is under load. A value of 1-5 seconds is typical.
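The field rules above are easy to check automatically. Below is a minimal Python sketch of a validator, assuming the fields have already been parsed into a dict with lowercased keys. Which fields count as required is our own assumption (the guide doesn't mandate any), and validate_fields is a hypothetical name.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Assumption: treating these four as required is our choice, not a spec rule.
REQUIRED_FIELDS = ("title", "description", "url", "updated")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_fields(fields: dict) -> list:
    """Check parsed fields (lowercased keys) and return a list of problems."""
    problems = []
    for key in REQUIRED_FIELDS:
        if not fields.get(key):
            problems.append(f"missing required field: {key}")

    updated = fields.get("updated")
    if updated:
        # Enforce the YYYY-MM-DD shape, then confirm it is a real calendar date.
        try:
            if not ISO_DATE.fullmatch(updated):
                raise ValueError
            date.fromisoformat(updated)
        except ValueError:
            problems.append(f"Updated is not a valid YYYY-MM-DD date: {updated!r}")

    url = fields.get("url")
    if url:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"Url is not an absolute http(s) URL: {url!r}")

    delay = fields.get("crawl-delay")
    if delay is not None:
        try:
            seconds = float(delay)
        except ValueError:
            problems.append(f"Crawl-Delay is not a number: {delay!r}")
        else:
            if seconds < 0:
                problems.append("Crawl-Delay must not be negative")
    return problems
```

The regex check comes before date.fromisoformat because newer Python versions accept some compact ISO forms (such as 20260401) that don't match the YYYY-MM-DD style recommended above.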
Beyond basic structure, llms.txt can include topical metadata to guide crawlers toward your areas of expertise. Add a Topics field listing your core topics:
Topics: Machine Learning, Natural Language Processing, Computer Vision, Large Language Models, AI Safety
You can also include an Entities field to define key organizations or people your site covers:
Entities: OpenAI, Anthropic, Google, Meta Platforms, Yann LeCun, Geoffrey Hinton
These fields are meant to signal your topical authority and entity expertise. When a crawler sees "Machine Learning" and "Large Language Models" in your Topics field, it has an explicit hint about which subjects your content covers in depth.
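If you maintain your topics and entities as lists (for example, exported from a keyword tool), you can generate the file instead of hand-editing it. This is a minimal Python sketch under the same key-value assumptions as the rest of this guide; build_llms_txt and the field order are hypothetical choices, and the sample title is invented.

```python
# Hypothetical generator for the key-value layout used in this guide.
FIELD_ORDER = ("Title", "Description", "Author", "Updated", "Url",
               "Topics", "Entities", "Crawl-Delay", "Allow", "Disallow")


def build_llms_txt(fields: dict) -> str:
    """Render fields in a fixed order; list values are joined with ', '."""
    lines = []
    for key in FIELD_ORDER:
        value = fields.get(key)
        if value is None:
            continue  # omit fields that were not supplied
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        value = str(value)
        if "\n" in value:
            # A newline would smuggle an extra "Key: value" line into the file.
            raise ValueError(f"{key} must be a single line")
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(build_llms_txt({
    "Title": "Example AI Research Blog",
    "Topics": ["Machine Learning", "Large Language Models"],
    "Entities": ["OpenAI", "Anthropic"],
}))
```

Rejecting multi-line values keeps a stray newline in a description from silently turning into an extra directive.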
robots.txt is primarily restrictive: you tell crawlers where they're NOT allowed. llms.txt is primarily informative: you tell crawlers what you're about and what matters. robots.txt uses a User-agent line to target specific crawlers; llms.txt has no per-crawler targeting and addresses all LLM crawlers at once.
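robots.txt affects search visibility directly. If you disallow a page, search engines can't crawl its content, so it is unlikely to rank well, although a blocked URL can still be indexed and shown if other pages link to it (to keep a page out of the index, use a noindex directive on a page crawlers are allowed to fetch). llms.txt has no known effect on traditional rankings but may matter more and more for AI discoverability. You should have both files on your site, with complementary rules.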
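Keep the two files consistent. AI crawlers that publish their rules, such as OpenAI's GPTBot and Anthropic's ClaudeBot, follow robots.txt, so an Allow in llms.txt cannot open up a path that robots.txt blocks. Use robots.txt for access rules and llms.txt to point AI crawlers at the sections that best show your expertise. Genuinely sensitive pages, such as dashboards and checkout flows, should also sit behind authentication, since robots.txt is a request, not a security control. For example: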
robots.txt: Disallow /user-dashboard, /checkout, /admin
llms.txt: Allow /blog, /docs, /resources; Disallow /checkout, /admin, /user-dashboard
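A quick way to catch conflicts between the two files is to ask a robots.txt parser whether each llms.txt Allow path is actually fetchable. This minimal sketch uses Python's standard urllib.robotparser; the function name, the default GPTBot user agent, and the example.com base URL are our assumptions. Note that the standard-library parser does simple prefix matching and does not implement every nuance of RFC 9309 (wildcard patterns, for example), so treat its answers as approximate.

```python
from urllib.robotparser import RobotFileParser


def allow_paths_blocked_by_robots(robots_txt: str, allow_paths: list,
                                  user_agent: str = "GPTBot",
                                  site: str = "https://www.example.com") -> list:
    """Return the llms.txt Allow paths that robots.txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [path for path in allow_paths
            if not parser.can_fetch(user_agent, site + path)]


robots = """User-agent: *
Disallow: /user-dashboard
Disallow: /checkout
Disallow: /admin
"""
print(allow_paths_blocked_by_robots(robots, ["/blog", "/docs", "/resources"]))  # []
```

An empty list means every path you highlight in llms.txt is reachable under robots.txt. If you later add a User-agent group just for GPTBot, that group replaces the * rules for GPTBot, which is exactly the kind of surprise this check surfaces.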
Create your llms.txt file and place it at www.example.com/llms.txt. Use plain UTF-8 text encoding, and make sure your web server serves it with a Content-Type header of text/plain (ideally text/plain; charset=utf-8). Test it by visiting the URL directly in your browser; you should see the raw text file. Then check the syntax with a validator or a short script so crawlers can parse it correctly.
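Beyond eyeballing the file in a browser, you can check the HTTP response programmatically. Here is a minimal Python sketch; check_llms_response is a hypothetical helper that takes the status code, Content-Type header, and raw body, so it works with whatever HTTP client you use.

```python
def check_llms_response(status: int, content_type: str, body: bytes) -> list:
    """Return problems with an llms.txt HTTP response; empty means it looks fine."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")

    # Content-Type looks like "text/plain; charset=utf-8": media type, then params.
    media_type, *params = [part.strip() for part in content_type.split(";")]
    if media_type.lower() != "text/plain":
        problems.append(f"expected text/plain, got {media_type!r}")
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
            if charset not in ("utf-8", "utf8"):
                problems.append(f"expected a UTF-8 charset, got {charset!r}")

    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    return problems
```

With the standard library, you could feed it resp.status, resp.headers.get("Content-Type", ""), and resp.read() from urllib.request.urlopen("https://www.example.com/llms.txt"). Keep in mind that urlopen raises an HTTPError for 4xx and 5xx responses rather than returning them.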
Write clear, specific descriptions. Don't just copy your homepage tagline. Be honest about what your site covers: if you publish content on 15 different topics, list them; if you're narrowly focused, say so. Honest, specific metadata gives AI systems more to work with than vague descriptions. Include keywords that describe your vertical or industry. If you're an e-commerce site, mention "e-commerce, products, pricing." If you're a SaaS company, mention "software, billing, integrations."
Update the Updated field whenever you make changes to your llms.txt. This helps crawlers know when to re-fetch and re-parse your configuration. If your site's topical focus changes significantly, update the Description and Topics fields. Set a quarterly reminder to review and refresh your llms.txt, especially if you're creating new content categories or refining your positioning.
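Forgetting to bump the Updated field is the easiest mistake to make, so it's worth scripting. This minimal Python sketch rewrites the first Updated line (or appends one if it's missing) with today's date in ISO 8601 form; bump_updated is a hypothetical helper, and it assumes the field is spelled exactly "Updated:" at the start of a line.

```python
import re
from datetime import date
from typing import Optional

UPDATED_LINE = re.compile(r"^Updated:.*$", re.MULTILINE)


def bump_updated(text: str, today: Optional[date] = None) -> str:
    """Set the Updated field to today's date, adding it if it is missing."""
    stamp = (today or date.today()).isoformat()  # ISO 8601: YYYY-MM-DD
    if UPDATED_LINE.search(text):
        return UPDATED_LINE.sub(f"Updated: {stamp}", text, count=1)
    return text.rstrip("\n") + f"\nUpdated: {stamp}\n"
```

The optional today argument makes the function easy to test and lets a deploy script stamp a fixed release date.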
Monitor llms.txt adoption. As OpenAI, Anthropic, and Google expand their AI crawlers, watch their crawler documentation for any announced llms.txt support. Having the file in place means your site is ready if that support arrives. In 2026, it's increasingly treated as a baseline by sites serious about AI search visibility.
Some people worry that llms.txt enables AI companies to train models on their data without permission. This is a fair concern, and AI companies and privacy advocates are actively debating the ethics of web crawling and model training. Search and crawling standards have evolved over decades to balance access with respect for content creators. llms.txt is part of this evolution, giving site owners another way to state their preferences, though not a way to enforce them.
If you want to signal that your content should not be used for LLM training, you can add a line like this to llms.txt:
Training-Allowed: false
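This field is not part of the llms.txt proposal, and no major AI lab has documented support for it. llms.txt compliance is voluntary; no law requires AI crawlers to respect it. The opt-out mechanisms that OpenAI, Anthropic, and Google do document are robots.txt rules for their crawler user agents (GPTBot, ClaudeBot, and the Google-Extended token, respectively). If you need enforcement rather than a request, block those user agents at the server or CDN level. For now, llms.txt is a best-effort communication tool, not a legal mechanism. As regulation develops and industry standards harden, more robust mechanisms may emerge.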
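Because the documented opt-outs live in robots.txt, it's worth checking that your robots.txt actually blocks the training crawlers you care about. This minimal sketch uses Python's standard urllib.robotparser. The three tokens are the ones OpenAI, Anthropic, and Google document for training-related crawling (Google-Extended is a robots.txt token rather than a separate crawler); the list is not exhaustive, and the function name and example.com URL are our assumptions.

```python
from urllib.robotparser import RobotFileParser

# Training-related user-agent tokens documented by OpenAI, Anthropic, and Google.
# Not exhaustive: extend the tuple for other crawlers you care about.
AI_TRAINING_AGENTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def training_optout_gaps(robots_txt: str,
                         site: str = "https://www.example.com") -> list:
    """Return the agents that robots.txt still allows to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_TRAINING_AGENTS
            if parser.can_fetch(agent, site + "/")]


robots = """User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
"""
print(training_optout_gaps(robots))  # ['Google-Extended']
```

The example output shows a common gap: blocking GPTBot and ClaudeBot but forgetting the Google-Extended token.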
It's early to measure ROI from llms.txt, since adoption is still ramping up, but you can track indicators. Monitor your AI mentions and citations across ChatGPT, Gemini, Claude, and Perplexity. If you implement llms.txt and then see citation growth, that's a correlation, not proof of cause, because content updates or platform changes could also explain it. Comparing your citation growth with that of competitors who haven't implemented llms.txt yet gives you a rough baseline.
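The competitor comparison is simple arithmetic: compute each site's growth rate over the same window and subtract. Here is a minimal Python sketch of that calculation; the function names and the citation counts are hypothetical, and a positive gap is only suggestive, since it can't rule out other causes.

```python
def growth_rate(before: int, after: int) -> float:
    """Fractional change in citation count between two equal-length periods."""
    if before <= 0:
        raise ValueError("need a positive baseline count")
    return (after - before) / before


def growth_gap(yours: tuple, competitor: tuple) -> float:
    """Your growth minus a competitor's over the same window.

    A positive gap is consistent with an llms.txt effect but does not prove it.
    """
    return growth_rate(*yours) - growth_rate(*competitor)


# Hypothetical counts: (citations before, citations after) over the same window.
gap = growth_gap(yours=(40, 50), competitor=(100, 110))
print(f"{gap:.0%}")  # 15%
```

Comparing growth rates rather than raw counts keeps a larger competitor's bigger absolute numbers from swamping the comparison.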
Use AI mention tracking tools to quantify your AI search visibility by tracking how many times major AI engines cite your content. As llms.txt adoption spreads, a well-configured file may produce a measurable improvement in discoverability.
Just as robots.txt eventually became a formal standard (RFC 9309), llms.txt may become a common expectation on the web. It's plausible that by 2027 major AI search platforms will check for llms.txt as a first step in crawling; if they do, sites without it may be crawled less efficiently or deprioritized.
Early adoption can be a competitive advantage. Implementing llms.txt today prepares your site for the new search landscape and may help your content be discovered and indexed more efficiently. As competition for AI visibility increases, having proper configuration will matter more, not less.
llms.txt is a natural companion to robots.txt for sites optimizing for AI search visibility. By placing this simple text file at your domain root, you communicate what your site covers and how AI crawlers should index it. The format is straightforward, implementation takes minutes, and the potential benefit is better AI discoverability. If you're serious about being cited by ChatGPT, Claude, Gemini, and Perplexity, implement llms.txt now. As AI search traffic grows and adoption spreads, proper llms.txt configuration may become a standard expectation, so getting it in place today keeps you ahead of competitors. Use Sorank's keyword research and discovery tools to identify which topics to highlight in your llms.txt file.
llms.txt is a text file placed at your site root (example.com/llms.txt) that tells AI language model crawlers what your site contains and how to index it. Just as robots.txt directs search engine crawlers, llms.txt is aimed at the crawlers behind ChatGPT, Claude, Gemini, and other LLM-powered tools. It's meant to help AI engines discover your content faster and understand your topical focus, which can increase the likelihood of being cited. As more search traffic flows through LLM-powered engines, llms.txt is drawing growing attention as an SEO file.
robots.txt controls which pages crawlers can access, and that includes AI crawlers such as GPTBot and ClaudeBot, not just traditional search engines. llms.txt is designed specifically for LLM and AI crawlers. Where robots.txt focuses on access restrictions, llms.txt communicates site structure, topical expertise, and important content sections. Use both: robots.txt sets the access rules, and llms.txt describes your content to AI crawlers. If you want LLM crawlers to have more permissive access than search bots (or the reverse), express that with separate User-agent groups in robots.txt.
Not mandatory yet, and no major AI lab has publicly committed to honoring llms.txt, but it's gaining traction quickly. Whether sites with llms.txt gain a measurable discoverability advantage in 2026 is still unproven, yet early adoption is low-risk: implementing it takes minutes and costs nothing. If your competitors don't have it yet, you'll be ready before they are if support arrives.