Accessibility Snapshots

Overview

The accessibility-snapshot module produces a structured tree of a web document that the server-side UIAgent renders into LLM context as <ui_state>. Shape and filtering are inspired by Playwright’s accessibility snapshot and the Playwright MCP server’s LLM-facing format: the goal is a compact, semantically meaningful tree, not a raw DOM dump. This page documents the web SDK implementation. The RTVI ui-snapshot wire shape is intentionally platform-neutral: native clients can produce the same A11ySnapshot shape from iOS, Android, or other platform accessibility APIs.

import {
  snapshotDocument,
  findElementByRef,
  findRefForElement,
} from "@pipecat-ai/client-js";

Most apps should start streaming through the managed PipecatClient API, or through useUISnapshot in React. Three lower-level functions are also exported. snapshotDocument (the walker) is for apps that need a one-off snapshot. findElementByRef resolves a server-supplied ref back to a live DOM element, for use in command handlers. findRefForElement returns the snapshot ref assigned to a DOM element after it has appeared in a snapshot.

Apps can mark a subtree as PII / sensitive by adding data-a11y-exclude to its root element. The walker skips that element and all of its descendants. Unlike aria-hidden, the attribute does not hide the subtree from screen readers. The values of password inputs are omitted automatically.
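The exclusion rule can be pictured with a minimal sketch over a simplified node shape. SketchElement, SketchNode, and walkSketch are hypothetical names used only for illustration; the real walker reads the live DOM and its internals may differ. The sketch drops a subtree marked data-a11y-exclude and omits the value of a password input.

```typescript
// Hypothetical stand-in for a DOM element; the real walker reads the live DOM.
interface SketchElement {
  tag: string;
  attrs: Record<string, string>;
  value?: string;
  children: SketchElement[];
}

interface SketchNode {
  tag: string;
  value?: string;
  children: SketchNode[];
}

// Returns null for excluded subtrees so the caller can drop them entirely.
function walkSketch(el: SketchElement): SketchNode | null {
  if ("data-a11y-exclude" in el.attrs) return null;

  const node: SketchNode = { tag: el.tag, children: [] };

  // Password values never reach the snapshot.
  const isPassword = el.tag === "input" && el.attrs["type"] === "password";
  if (el.value !== undefined && !isPassword) node.value = el.value;

  for (const child of el.children) {
    const walked = walkSketch(child);
    if (walked !== null) node.children.push(walked);
  }
  return node;
}

const loginForm: SketchElement = {
  tag: "form",
  attrs: {},
  children: [
    { tag: "input", attrs: { type: "text" }, value: "ada", children: [] },
    { tag: "input", attrs: { type: "password" }, value: "hunter2", children: [] },
    {
      tag: "section",
      attrs: { "data-a11y-exclude": "" },
      children: [{ tag: "p", attrs: {}, children: [] }],
    },
  ],
};

const walkedForm = walkSketch(loginForm);
```

In this sketch the excluded section and its paragraph never appear in the output, while the password input is still present as a node without a value.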

A11ySnapshotStreamer

class A11ySnapshotStreamer {
  constructor(
    emitSnapshot: A11ySnapshotEmitter,
    options?: A11ySnapshotStreamerOptions,
  );
  start(): void;
  stop(): void;
}

Low-level, framework-agnostic helper that drives accessibility-snapshot streaming. It wraps the walker, a MutationObserver, and the other triggers (scrollend, resize, focus changes, visibilitychange, selectionchange) into a single object with a start / stop lifecycle. Vanilla web apps should usually use the managed client methods:

import { PipecatClient } from "@pipecat-ai/client-js";

const pcClient = new PipecatClient({
  /* ... */
});

pcClient.startUISnapshotStream({ debounceMs: 200 });

// Later
pcClient.stopUISnapshotStream();

If you need to provide your own transport or batching layer, instantiate the low-level streamer directly and provide an emitter callback:

import { A11ySnapshotStreamer } from "@pipecat-ai/client-js";

// emitUISnapshot stands in for your own transport function (not imported here).
const streamer = new A11ySnapshotStreamer((snapshot) => {
  emitUISnapshot({ tree: snapshot });
});

streamer.start();
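As one illustration of a custom batching layer, the sketch below wraps an emitter so that a snapshot whose tree is unchanged since the last send is not sent again. The types are trimmed local copies so the block stands alone, and dedupeEmitter is a hypothetical helper, not an SDK export.

```typescript
// Trimmed local copies of the documented shapes so the sketch is self-contained.
interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  children?: A11yNode[];
}
interface A11ySnapshot {
  root: A11yNode;
  captured_at: number;
}
type A11ySnapshotEmitter = (snapshot: A11ySnapshot) => void;

// Hypothetical helper: forwards a snapshot only when its tree differs from the
// last one forwarded. captured_at is ignored because it changes on every capture.
function dedupeEmitter(send: A11ySnapshotEmitter): A11ySnapshotEmitter {
  let lastTree: string | null = null;
  return (snapshot) => {
    const tree = JSON.stringify(snapshot.root);
    if (tree === lastTree) return;
    lastTree = tree;
    send(snapshot);
  };
}

// Usage with an in-memory array standing in for a real transport.
const sent: A11ySnapshot[] = [];
const dedupedEmit = dedupeEmitter((s) => {
  sent.push(s);
});

const button: A11yNode = { ref: "e1", role: "button", name: "Save" };
dedupedEmit({ root: button, captured_at: 1 });
dedupedEmit({ root: button, captured_at: 2 }); // same tree, dropped
dedupedEmit({ root: { ...button, name: "Saved" }, captured_at: 3 });
```

A function built this way has the A11ySnapshotEmitter shape, so it could be passed as the first constructor argument of A11ySnapshotStreamer.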

React apps should prefer useUISnapshot, which starts and stops the managed PipecatClient stream.

Triggers

A snapshot is scheduled (debounced via debounceMs) on:

DOM mutations observed by MutationObserver on document.body (childList, subtree, and a curated list of attribute changes including role, aria-*, data-a11y-exclude, disabled, hidden, tabindex, href).
Focus changes (focusin, focusout).
Scroll-end (scrollend, captured at window level so any scrollable ancestor triggers it).
Window resize (a resize changes the viewport rect, and with it which nodes are tagged "offscreen").
Tab visibility transition to visible.
Text selection changes (selectionchange).

An initial snapshot is scheduled immediately on start().

Options

debounceMs (number, default: 300)

Minimum interval between snapshot emissions, in milliseconds. Multiple triggers within the window coalesce into one snapshot.
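To make the coalescing concrete, here is a small model of the timing. It assumes a trailing-edge debounce in which every trigger restarts the timer; the source does not confirm which debounce variant the streamer uses, so treat this as one plausible reading. emissionTimes is a hypothetical helper that works on timestamps only and does not touch real timers.

```typescript
// Assumption: trailing-edge debounce, where every trigger restarts the timer.
// Given trigger timestamps in ascending order (ms), return the times at which
// a snapshot would be emitted.
function emissionTimes(triggers: number[], debounceMs: number): number[] {
  const emissions: number[] = [];
  let pendingAt: number | null = null; // when the currently pending timer fires

  for (const t of triggers) {
    if (pendingAt !== null && t >= pendingAt) {
      emissions.push(pendingAt); // the timer fired before this trigger arrived
    }
    pendingAt = t + debounceMs; // start or restart the timer
  }
  if (pendingAt !== null) emissions.push(pendingAt);
  return emissions;
}

// Three triggers inside one window coalesce into a single emission at 400;
// the isolated trigger at 1000 produces its own emission at 1300.
const times = emissionTimes([0, 50, 100, 1000], 300);
```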

trackViewport (boolean, default: true)

When true, annotate every emitted node with "offscreen" in its state list if its bounding rect sits entirely outside the viewport. Set to false to skip the per-node layout measurement (e.g. on very large pages where layout cost outweighs the viewport signal).
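The "entirely outside the viewport" test can be sketched as a pure function. Box, ViewportSize, and isEntirelyOffscreen are hypothetical names, and treating a box that only touches a viewport edge as offscreen is an assumption of this sketch. In a browser the inputs would typically come from getBoundingClientRect() and window.innerWidth / window.innerHeight, though whether the SDK measures exactly this way is not stated.

```typescript
// Viewport-relative box, in the same coordinate space as getBoundingClientRect().
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface ViewportSize {
  width: number;
  height: number;
}

// True only when the box has no overlap with the viewport at all.
// Assumption: a box that merely touches an edge counts as offscreen.
function isEntirelyOffscreen(box: Box, viewport: ViewportSize): boolean {
  return (
    box.bottom <= 0 ||
    box.top >= viewport.height ||
    box.right <= 0 ||
    box.left >= viewport.width
  );
}

const viewport: ViewportSize = { width: 800, height: 600 };
const visibleBox: Box = { left: 0, top: 0, right: 100, bottom: 50 };
const belowFold: Box = { left: 0, top: 700, right: 100, bottom: 750 };
const partlyVisible: Box = { left: -50, top: 10, right: 20, bottom: 40 };
```

A partly visible box is not offscreen under this rule, which matches the "entirely outside" wording above.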

logSnapshots (boolean, default: false)

When true, log each emitted snapshot to the browser console (node count, rough token estimate, raw tree). Mirrors the server’s log_snapshots flag on UIAgent.

Methods

start

start(): void

Begin streaming. Idempotent: subsequent calls before stop() are no-ops.

stop

stop(): void

Stop streaming. Detaches all observers/listeners and cancels pending timers. Safe to call before start() or multiple times.

snapshotDocument

function snapshotDocument(
  root?: Element,
  options?: SnapshotOptions,
): A11ySnapshot;

Produce a one-off accessibility snapshot of a DOM subtree. Useful for tests or for apps that want to drive snapshot timing manually (e.g. snapshot only on a specific app event).

root (Element)

Element to walk. Defaults to document.body.

options (SnapshotOptions)

Snapshot options, with the following field:

trackViewport (boolean, default: true)

When true, each emitted node gets "offscreen" in its state list if its bounding rect sits entirely outside the viewport.

Returns: An A11ySnapshot with a generic root containing the walked children, plus a client-side capture timestamp.

findElementByRef

function findElementByRef(ref: string): Element | null;

Resolve a ref string like "e42" back to a live DOM element. Returns null if the ref was never assigned or the element has since been garbage-collected. The walker assigns stable refs to every emitted DOM element via a WeakMap (forward) plus a Map<string, WeakRef<Element>> (reverse). Refs persist as long as the element stays mounted and survive across snapshots. Command handlers use this to act on nodes the server referenced from <ui_state>. The useDefault*Handler hooks call this internally.
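The two-way registry described above can be sketched as follows. Plain objects stand in for DOM elements, and refFor, resolveRef, and the counter starting at 1 are assumptions made for illustration; the SDK's internals may differ.

```typescript
// Forward map: target -> ref. Weak keys let unreachable targets be collected.
const refByTarget = new WeakMap<object, string>();
// Reverse map: ref -> weakly held target.
const targetByRef = new Map<string, WeakRef<object>>();
let nextRefId = 1;

// Forward lookup: reuse the existing ref or mint a new one of the form e{N}.
function refFor(target: object): string {
  const existing = refByTarget.get(target);
  if (existing !== undefined) return existing;
  const ref = `e${nextRefId++}`;
  refByTarget.set(target, ref);
  targetByRef.set(ref, new WeakRef(target));
  return ref;
}

// Reverse lookup: null when the ref is unknown or the target was collected.
function resolveRef(ref: string): object | null {
  const weak = targetByRef.get(ref);
  if (weak === undefined) return null;
  const target = weak.deref();
  if (target === undefined) {
    targetByRef.delete(ref); // drop the stale entry
    return null;
  }
  return target;
}

const saveButton = { id: "save" };
const firstRef = refFor(saveButton);
const secondRef = refFor(saveButton); // a later "snapshot" sees the same ref
```

Holding the reverse direction through WeakRef means the registry itself never keeps a removed element alive, which is why a lookup can return null for a ref that was valid earlier.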

findRefForElement

function findRefForElement(el: Element): string | null;

Return the snapshot ref assigned to a DOM element, if any. Returns null for elements the walker has not visited yet. This is useful when an app needs to associate a user interaction with the nearest snapshot-known node.
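Finding the nearest snapshot-known node amounts to walking up the ancestor chain until a lookup succeeds. In the sketch below, HasParent and nearestRef are hypothetical names, and the lookup parameter stands in for findRefForElement so the block runs without a DOM.

```typescript
// Minimal structural type: anything with a parent pointer.
interface HasParent {
  parentElement: HasParent | null;
}

// Walk up from `start` until `lookup` returns a ref, or the chain ends.
function nearestRef(
  start: HasParent | null,
  lookup: (el: HasParent) => string | null,
): string | null {
  for (let el = start; el !== null; el = el.parentElement) {
    const ref = lookup(el);
    if (ref !== null) return ref;
  }
  return null;
}

// A card the walker has seen, containing an icon it has not.
const card: HasParent = { parentElement: null };
const icon: HasParent = { parentElement: card };
const knownRefs = new Map<HasParent, string>([[card, "e7"]]);
const lookupRef = (el: HasParent): string | null => knownRefs.get(el) ?? null;

const nearest = nearestRef(icon, lookupRef);
```

A click handler could use the same walk, starting from the event target, to report which snapshot node the user interacted with.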

Types

A11ySnapshotEmitter

type A11ySnapshotEmitter = (snapshot: A11ySnapshot) => void;

Callback passed to the low-level A11ySnapshotStreamer. The managed PipecatClient stream provides this callback internally and sends each snapshot as a ui-snapshot RTVI message with { tree: snapshot }.

A11yNode

One node in the accessibility snapshot tree.

interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  value?: string;
  state?: string[];
  level?: number;
  colcount?: number;
  rowcount?: number;
  children?: A11yNode[];
}

Field	Type	Description
`ref`	`string`	Stable web reference id of the form `e{N}`. Persists across snapshots while the DOM node is mounted.
`role`	`string`	ARIA role (explicit or tag-derived).
`name`	`string`	Accessible name, truncated to 100 chars.
`value`	`string`	Current value for inputs (omitted for passwords), progress, etc.
`state`	`string[]`	Short state tags. Known values: `"focused"`, `"selected"`, `"expanded"`, `"checked"`, `"disabled"`, `"offscreen"`.
`level`	`number`	Heading level, 1-6.
`colcount`	`number`	Column count for grid-like containers, populated from `aria-colcount`.
`rowcount`	`number`	Row count for grid-like containers, populated from `aria-rowcount`.
`children`	`A11yNode[]`	Child nodes.
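Because the tree is plain data, it can be traversed with ordinary recursion. The sketch below repeats the interface so it stands alone; findNode and countNodes are hypothetical helpers, not SDK exports, and the sample tree is made up for illustration.

```typescript
interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  value?: string;
  state?: string[];
  level?: number;
  colcount?: number;
  rowcount?: number;
  children?: A11yNode[];
}

// Depth-first search for the node carrying a given ref.
function findNode(node: A11yNode, ref: string): A11yNode | null {
  if (node.ref === ref) return node;
  for (const child of node.children ?? []) {
    const found = findNode(child, ref);
    if (found !== null) return found;
  }
  return null;
}

// Total number of nodes in the subtree, including the root.
function countNodes(node: A11yNode): number {
  return 1 + (node.children ?? []).reduce((sum, c) => sum + countNodes(c), 0);
}

const sampleTree: A11yNode = {
  ref: "e1",
  role: "generic",
  children: [
    { ref: "e2", role: "heading", name: "Settings", level: 1 },
    { ref: "e3", role: "checkbox", name: "Dark mode", state: ["checked", "focused"] },
  ],
};
```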

A11ySnapshot

interface A11ySnapshot {
  root: A11yNode;
  captured_at: number;
  selection?: A11ySelection;
}

Shape of the payload inside a ui-snapshot RTVI message. A full tree is sent on each update; the server keeps the latest and renders it into <ui_state>...</ui_state> when an agent injects it. The fields are shared across client platforms; the details on this page describe how the web SDK fills them from the DOM.

Field	Type	Description
`root`	`A11yNode`	Root of the accessibility tree (usually `document.body`’s node).
`captured_at`	`number`	Client-side timestamp (ms since epoch) when captured.
`selection`	`A11ySelection`	The user’s current text selection, when one exists. Omitted when nothing is selected. Optional.

A11ySelection

interface A11ySelection {
  ref: string;
  text: string;
  start_offset?: number;
  end_offset?: number;
}

The user’s current text selection. Lets the agent ground deictic references like “this paragraph” or “what I selected” against actual on-page content rather than re-asking the user. The server renders this as a <selection ref="...">...</selection> block inside <ui_state>.

Field	Type	Description
`ref`	`string`	Ref of the element carrying the selection. For document selections this is the closest common-ancestor element with a ref; for input/textarea it is the input element itself.
`text`	`string`	The selected text. Truncated at 2000 characters with a trailing ellipsis to keep `<ui_state>` injections bounded.
`start_offset`	`number`	Character offset within the input’s `value` where the selection starts. Only set for `<input>` and `<textarea>`. Optional.
`end_offset`	`number`	Character offset where the selection ends. Only set for `<input>` and `<textarea>`. Optional.
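The truncation rule for text can be sketched in a few lines. truncateSelection is a hypothetical helper, and appending the ellipsis after the first 2000 characters (rather than counting it inside the limit) is an assumption of this sketch.

```typescript
const MAX_SELECTION_CHARS = 2000;

// Assumption: the ellipsis is appended after the first `max` characters; the
// SDK may count it inside the limit instead. Lengths are UTF-16 code units.
function truncateSelection(text: string, max: number = MAX_SELECTION_CHARS): string {
  return text.length > max ? text.slice(0, max) + "…" : text;
}

const shortText = truncateSelection("hello");
const longText = truncateSelection("a".repeat(2500));
```

Text at or under the limit passes through unchanged, so the ellipsis only appears when something was actually cut.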
