Speakable: Mark Up Content for Voice and Audio Answers in 2026

About Author

Thibault Besson-Magdelain

Founder of Sorank, 5+ years of experience in SEO, GEO enthusiast.

Summary: Speakable is a schema.org property, currently in beta, that marks the sections of a page best suited for text-to-speech playback, letting voice assistants like Google Assistant read those passages aloud in response to spoken queries.

Speakable is a schema.org structured data property that identifies the sections within an article or webpage best suited for audio playback using text-to-speech. By marking up a headline and a short summary, you tell search engines and voice platforms exactly which words to read aloud when answering a spoken question.

It exists because voice and audio answers work differently from a screen. A device cannot practically read a whole article aloud in reply to a spoken question, so speakable points it to the concise, self-contained passages that make sense without any visual context. The property is still in beta and narrow in scope, but it captures an idea that matters more every year: making content consumable by ear, not just by eye.

What is speakable schema?

Speakable is a property in the schema.org vocabulary that flags page sections for text-to-speech. Implemented as part of structured data, usually within a WebPage or Article type, it specifies which elements a voice assistant should prioritize when reading content aloud. The property can be repeated to mark several sections on a page.

It is best understood as a hint, not a command. The markup tells the platform which passages are voice-friendly, but the system still decides algorithmically whether and what to read. As a form of structured content, speakable is part of the broader practice of labeling your page so machines can use it precisely.

How Google Assistant uses speakable

Google Assistant uses speakable structured data to answer topical news queries on smart speakers and voice-enabled devices. When a user asks for news on a subject, the Assistant returns up to three articles from around the web and uses text-to-speech to read the sections marked as speakable, along with attribution to the source.

This makes speakable a route to appearing in spoken answers, not just visual results. Being the source a device reads aloud is a distinct form of visibility, closely tied to voice search, where the single spoken answer carries far more weight than a list of links a user can scan.

How to implement speakable markup

You add speakable as a SpeakableSpecification inside your structured data, then point it at the content to read using one of two content-locator types: CSS selectors (the cssSelector property) or XPaths (the xpath property). A CSS selector targets elements by class or tag, for example a headline class and a summary class, while an XPath addresses nodes by their location in the document tree. Use only one locator type per specification object.

A typical JSON-LD snippet sets the page type, then a speakable object whose cssSelector lists the headline and summary elements. Keep the markup pointed at passages that already read well aloud. Implemented cleanly, this is a small addition that turns existing answer-ready content into something a voice platform can speak directly.
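As a rough illustration of the snippet described above, the Python sketch below builds a WebPage object with a speakable specification and serializes it to JSON-LD. The function names, the example URL, and the `.headline` and `.summary` class names are hypothetical placeholders, not part of any library or of Google's tooling; the property names (`speakable`, `SpeakableSpecification`, `cssSelector`, `xpath`) come from the schema.org vocabulary. The check that refuses both locator types in one specification reflects the one-locator rule mentioned earlier.

```python
import json


def build_speakable_jsonld(name, url, css_selectors=None, xpaths=None):
    """Return a JSON-LD string for a WebPage with one speakable specification.

    Hypothetical helper for illustration; pass either css_selectors or
    xpaths, never both.
    """
    # Exactly one locator type per specification object.
    if bool(css_selectors) == bool(xpaths):
        raise ValueError("Provide exactly one of css_selectors or xpaths")

    spec = {"@type": "SpeakableSpecification"}
    if css_selectors:
        spec["cssSelector"] = list(css_selectors)
    else:
        spec["xpath"] = list(xpaths)

    page = {
        "@context": "https://schema.org/",
        "@type": "WebPage",
        "name": name,
        "speakable": spec,
        "url": url,
    }
    return json.dumps(page, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap the JSON-LD string in the script tag used to embed it in HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    snippet = build_speakable_jsonld(
        name="Example article title",            # placeholder title
        url="https://www.example.com/article",   # placeholder URL
        css_selectors=[".headline", ".summary"], # assumed class names
    )
    print(wrap_in_script_tag(snippet))
```

Generating the markup from a small helper rather than writing it by hand is only one option; the point is that the selectors listed must match elements that actually exist on the page and contain the headline and summary you want spoken.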

Content guidelines for speakable sections

Google recommends around 20 to 30 seconds of content per speakable section, roughly two to three sentences. The goal is concise headlines and summaries that deliver comprehensible, useful information on their own, not entire articles read end to end. Mark only the one or two passages that directly answer the likely question.
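One way to sanity-check a passage against the 20 to 30 second guideline is to estimate its spoken length from its word count. The sketch below assumes a speaking rate of 150 words per minute, which is an illustrative figure and not one published by Google; real text-to-speech voices vary, so treat the result as a rough hint. The function names are hypothetical.

```python
# Assumed average speaking rate; adjust to match the voice you are targeting.
ASSUMED_WORDS_PER_MINUTE = 150


def estimate_speech_seconds(text, words_per_minute=ASSUMED_WORDS_PER_MINUTE):
    """Estimate how long the text takes to read aloud, in seconds."""
    word_count = len(text.split())
    return word_count * 60.0 / words_per_minute


def check_speakable_length(text, min_seconds=20.0, max_seconds=30.0):
    """Return (estimated_seconds, verdict), verdict being 'short', 'ok' or 'long'."""
    seconds = estimate_speech_seconds(text)
    if seconds < min_seconds:
        verdict = "short"
    elif seconds > max_seconds:
        verdict = "long"
    else:
        verdict = "ok"
    return seconds, verdict


if __name__ == "__main__":
    summary = (
        "Speakable is a schema.org property that marks the sections of a page "
        "best suited for text-to-speech playback."
    )
    seconds, verdict = check_speakable_length(summary)
    print(f"{seconds:.1f} seconds: {verdict}")
```

A headline on its own will usually come out as "short", which is expected; the estimate is most useful for the summary passage, where it is easy to drift past a few sentences.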

Crucially, the content must make sense in a voice-only setting. Avoid marking up photo captions, datelines, or attributions that confuse a listener who cannot see the page. Write short, clear sentences in a conversational register, since anything ambiguous on screen becomes worse when heard without visual cues.

Why speakable matters for SEO and GEO

For SEO, speakable is a niche but real opportunity in voice. As audio consumption grows, with a large majority of people listening to online audio each month, being the passage a device reads aloud is valuable hands-free visibility. It also improves accessibility for users with visual impairments and supports multitasking listeners.

For generative engine optimization, the deeper lesson is structural. The same discipline that makes content speakable (concise, self-contained passages that answer a question without surrounding context) is exactly what helps AI assistants extract and cite your content. Optimizing for the ear reinforces good voice search optimization and the passage clarity that AI search rewards.

Limitations and challenges

Speakable is narrow today. It is in beta, limited to news content, and available to users in the US with English-language devices. It works only with Google Assistant, so it does not reach other voice platforms, and Google may choose to read a different section than the one you marked if its algorithms judge it more relevant.

Because of these constraints, speakable should not be the centerpiece of a strategy. Treat it as a low-cost enhancement for eligible content rather than a guaranteed channel. However, the principles behind it, namely clarity and self-contained answers, apply far beyond the property itself and pay off across voice and AI search.

Speakable and the future of voice and audio

The property points toward a broader shift: content increasingly consumed without a screen, through smart speakers, in-car assistants, and AI voices. As more answers are spoken rather than shown, the ability to deliver a clean, standalone passage becomes a core skill, whether or not a specific markup is involved.

Even if speakable stays limited, designing content to be read aloud (short sentences, direct answers, no reliance on visuals) future-proofs it for voice-first and AI-driven discovery. The format teaches a habit worth keeping: write so a machine can speak your best sentences and have them stand on their own.

Conclusion

Speakable is a schema.org property that marks the sections of a page best suited for text-to-speech, used by Google Assistant to read news answers aloud. It is implemented with CSS selectors or XPaths, works best on concise passages of two to three sentences, and remains narrow in scope and in beta. Its real value is the discipline it encourages.

Pair it with broader voice search optimization and clean structured content, and use Sorank's research and content planning tools to build answers that work by ear and on screen. Reference sources: Google Search Central and Productive Shop.

Frequently asked questions

What is speakable schema used for?

Speakable is a schema.org property that marks the sections of a page best suited for text-to-speech playback. Google Assistant uses it to answer topical news queries on smart speakers, reading the marked headline and summary aloud and crediting the source. It lets publishers signal which concise, self-contained passages a voice assistant should speak.

How long should a speakable section be?

Google recommends around 20 to 30 seconds of audio per speakable section, which is roughly two to three sentences. The aim is a concise headline or summary that delivers useful information on its own. Mark only the one or two passages that directly answer the likely question, and make sure they make sense without any visual context.

What are the main limitations of speakable schema?

Speakable is still in beta and narrow in scope. It is limited to news content, available mainly to US users with English-language devices, and works only with Google Assistant rather than all voice platforms. Google may also read a different section than the one you marked if it judges that more relevant, so inclusion is not guaranteed.