Overview

The accessibility-snapshot module produces a structured tree of a web document that the server-side UIAgent renders into LLM context as <ui_state>. Shape and filtering are inspired by Playwright’s accessibility snapshot and the Playwright MCP server’s LLM-facing format: the goal is a compact, semantically meaningful tree, not a raw DOM dump. This page documents the web SDK implementation. The RTVI ui-snapshot wire shape is intentionally platform-neutral: native clients can produce the same A11ySnapshot shape from iOS, Android, or other platform accessibility APIs.
import {
  snapshotDocument,
  findElementByRef,
  findRefForElement,
} from "@pipecat-ai/client-js";
Most apps should start streaming through the managed PipecatClient API or useUISnapshot in React. The walker (snapshotDocument) is exposed for apps that need a one-off snapshot, findElementByRef resolves a server-supplied ref back to a live DOM element for command handlers, and findRefForElement returns the snapshot ref assigned to a DOM element after it has appeared in a snapshot.
Apps can mark a subtree as PII or otherwise sensitive by adding the data-a11y-exclude attribute to its root element. The walker skips that element and all of its descendants. Unlike aria-hidden, the attribute does not hide the subtree from screen readers. Password inputs are stripped automatically.
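The exclusion rule can be illustrated with a small, self-contained sketch. It does not use the real DOM or the SDK walker: SketchElement, SketchNode, and walkSketch are hypothetical names, and the sketch assumes that "stripped" for password inputs means the value is omitted, as the A11yNode field table suggests.

```typescript
// Hypothetical, simplified stand-in for a DOM element; not an SDK type.
interface SketchElement {
  tag: string;
  attrs: Record<string, string>;
  value?: string;
  children: SketchElement[];
}

// Hypothetical, simplified output node; not the SDK's A11yNode.
interface SketchNode {
  tag: string;
  value?: string;
  children: SketchNode[];
}

// Returns null for an excluded subtree so the caller drops it entirely.
function walkSketch(el: SketchElement): SketchNode | null {
  if ("data-a11y-exclude" in el.attrs) return null;

  const node: SketchNode = { tag: el.tag, children: [] };

  // Assumption: a password input keeps its node but never exposes a value.
  const isPassword = el.tag === "input" && el.attrs["type"] === "password";
  if (el.value !== undefined && !isPassword) node.value = el.value;

  for (const child of el.children) {
    const walked = walkSketch(child);
    if (walked !== null) node.children.push(walked);
  }
  return node;
}

const form: SketchElement = {
  tag: "form",
  attrs: {},
  children: [
    { tag: "input", attrs: { type: "text" }, value: "ada", children: [] },
    { tag: "input", attrs: { type: "password" }, value: "hunter2", children: [] },
    {
      tag: "section",
      attrs: { "data-a11y-exclude": "" },
      children: [{ tag: "p", attrs: {}, children: [] }],
    },
  ],
};

// The excluded <section> and the password value are both absent here.
const walkedForm = walkSketch(form);
```

In a page, the equivalent marker is simply the attribute on the wrapping element, for example a billing panel rendered inside a container that carries data-a11y-exclude.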

A11ySnapshotStreamer

class A11ySnapshotStreamer {
  constructor(
    emitSnapshot: A11ySnapshotEmitter,
    options?: A11ySnapshotStreamerOptions,
  );
  start(): void;
  stop(): void;
}
Low-level, framework-agnostic helper that drives accessibility-snapshot streaming. It wraps the walker, a MutationObserver, and the other triggers (focusin/focusout, scrollend, resize, visibilitychange, selectionchange) into a single object with a start / stop lifecycle. Vanilla web apps should usually use the managed client methods instead:
import { PipecatClient } from "@pipecat-ai/client-js";

const pcClient = new PipecatClient({
  /* ... */
});

pcClient.startUISnapshotStream({ debounceMs: 200 });

// Later
pcClient.stopUISnapshotStream();
If you need to provide your own transport or batching layer, instantiate the low-level streamer directly and provide an emitter callback:
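import { A11ySnapshotStreamer } from "@pipecat-ai/client-js";

// emitUISnapshot stands in for your own transport or batching layer.
const streamer = new A11ySnapshotStreamer((snapshot) => {
  emitUISnapshot({ tree: snapshot });
});

// Begin streaming; call streamer.stop() to detach observers and timers.
streamer.start();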
React apps should prefer useUISnapshot, which starts and stops the managed PipecatClient stream.

Triggers

A snapshot is scheduled (debounced via debounceMs) on:
  • DOM mutations observed by MutationObserver on document.body (childList, subtree, and a curated list of attribute changes including role, aria-*, data-a11y-exclude, disabled, hidden, tabindex, href).
  • Focus changes (focusin, focusout).
  • Scroll-end (scrollend, captured at window level so any scrollable ancestor triggers it).
  • Window resize (a changed viewport rect shifts which nodes are tagged "offscreen").
  • Tab visibility transition to visible.
  • Text selection changes (selectionchange).
An initial snapshot is scheduled immediately on start().
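The coalescing behaviour described above can be sketched as a small scheduler. This is not the SDK's implementation: DebouncedSchedulerSketch and TimerApi are hypothetical names, and the sketch assumes a trailing-edge debounce in which each new trigger restarts the timer. The timer functions are injected so the example runs without real time passing; in a browser they would typically wrap setTimeout and clearTimeout.

```typescript
type TimerHandle = number;

// Hypothetical timer abstraction so the sketch is testable without waiting.
interface TimerApi {
  set(callback: () => void, delayMs: number): TimerHandle;
  clear(handle: TimerHandle): void;
}

class DebouncedSchedulerSketch {
  private pending: TimerHandle | null = null;
  private readonly takeSnapshot: () => void;
  private readonly debounceMs: number;
  private readonly timers: TimerApi;

  constructor(takeSnapshot: () => void, debounceMs: number, timers: TimerApi) {
    this.takeSnapshot = takeSnapshot;
    this.debounceMs = debounceMs;
    this.timers = timers;
  }

  // Called by every trigger. Assumption: a trigger that arrives while a
  // timer is pending restarts it, so a burst collapses into one snapshot.
  schedule(): void {
    if (this.pending !== null) this.timers.clear(this.pending);
    this.pending = this.timers.set(() => {
      this.pending = null;
      this.takeSnapshot();
    }, this.debounceMs);
  }

  // Drops any pending snapshot, as stop() would.
  cancel(): void {
    if (this.pending !== null) this.timers.clear(this.pending);
    this.pending = null;
  }
}

// Manual fake timers: callbacks run only when fireDueTimers() is called.
const pendingCallbacks = new Map<TimerHandle, () => void>();
let nextHandle = 1;
const fakeTimers: TimerApi = {
  set(callback: () => void): TimerHandle {
    const handle = nextHandle++;
    pendingCallbacks.set(handle, callback);
    return handle;
  },
  clear(handle: TimerHandle): void {
    pendingCallbacks.delete(handle);
  },
};

function fireDueTimers(): void {
  const due = Array.from(pendingCallbacks.values());
  pendingCallbacks.clear();
  for (const callback of due) callback();
}

let snapshotCount = 0;
const scheduler = new DebouncedSchedulerSketch(
  () => {
    snapshotCount += 1;
  },
  300,
  fakeTimers,
);

scheduler.schedule(); // e.g. a DOM mutation
scheduler.schedule(); // e.g. focusin
scheduler.schedule(); // e.g. scrollend
fireDueTimers(); // the three triggers produce a single snapshot
```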

Options

debounceMs (number, default: 300)
Minimum interval between snapshot emissions, in milliseconds. Multiple triggers within the window coalesce into one snapshot.
trackViewport (boolean, default: true)
When true, add "offscreen" to the state list of every emitted node whose bounding rect sits entirely outside the viewport. Set to false to skip the per-node layout measurement (e.g. on very large pages where layout cost outweighs the viewport signal).
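The "entirely outside the viewport" test can be written as a pure function. The sketch below is an illustration, not the SDK's code: RectLike, ViewportSize, and isEntirelyOffscreen are hypothetical names, and treating a rect that only touches the viewport edge as offscreen is an assumption. In a browser the rect would typically come from getBoundingClientRect() and the viewport from window.innerWidth and window.innerHeight.

```typescript
// Viewport-relative rectangle, matching the fields of a DOMRect we need.
interface RectLike {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

interface ViewportSize {
  width: number;
  height: number;
}

// True when no part of the rect overlaps the viewport.
// Assumption: a rect that merely touches an edge counts as offscreen.
function isEntirelyOffscreen(rect: RectLike, viewport: ViewportSize): boolean {
  return (
    rect.bottom <= 0 || // fully above
    rect.right <= 0 || // fully to the left
    rect.top >= viewport.height || // fully below
    rect.left >= viewport.width // fully to the right
  );
}

const viewport: ViewportSize = { width: 1280, height: 720 };

// Scrolled past: sits above the top edge.
const aboveFold = isEntirelyOffscreen({ top: -200, left: 0, bottom: -40, right: 300 }, viewport);
// Straddles the bottom edge, so it is still partly visible.
const partlyVisible = isEntirelyOffscreen({ top: 700, left: 0, bottom: 900, right: 300 }, viewport);
// Far below the fold.
const belowFold = isEntirelyOffscreen({ top: 2000, left: 0, bottom: 2100, right: 300 }, viewport);
```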
logSnapshots (boolean, default: false)
When true, log each emitted snapshot to the browser console (node count, rough token estimate, raw tree). Mirrors the server’s log_snapshots flag on UIAgent.
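For apps that want similar figures from their own emitter callback, the node count and a rough token estimate can be computed directly from the tree. The sketch below uses a local copy of the A11yNode shape documented under Types; countNodes and estimateTokens are hypothetical helpers, and the four-characters-per-token ratio is a common rule of thumb rather than the SDK's formula.

```typescript
// Local copy of the A11yNode shape documented on this page.
interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  value?: string;
  state?: string[];
  level?: number;
  colcount?: number;
  rowcount?: number;
  children?: A11yNode[];
}

// Counts the node itself plus all descendants.
function countNodes(node: A11yNode): number {
  let total = 1;
  for (const child of node.children ?? []) total += countNodes(child);
  return total;
}

// Assumption: roughly 4 characters of serialized JSON per token.
function estimateTokens(node: A11yNode): number {
  return Math.ceil(JSON.stringify(node).length / 4);
}

const sampleRoot: A11yNode = {
  ref: "e1",
  role: "generic",
  children: [
    { ref: "e2", role: "heading", name: "Orders", level: 1 },
    { ref: "e3", role: "button", name: "Refresh" },
  ],
};

const nodeCount = countNodes(sampleRoot); // 3: the root plus two children
const tokenEstimate = estimateTokens(sampleRoot);
```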

Methods

start

start(): void
Begin streaming. Idempotent: subsequent calls before stop() are no-ops.

stop

stop(): void
Stop streaming. Detaches all observers/listeners and cancels pending timers. Safe to call before start() or multiple times.
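The start / stop contract above (idempotent start, stop safe at any time) is commonly implemented with a single running flag. The sketch below shows that pattern under stated assumptions: StreamLifecycleSketch and ListenerHooks are hypothetical names, and the hooks stand in for attaching and detaching the real observers and listeners.

```typescript
// Hypothetical hooks standing in for observer and listener wiring.
interface ListenerHooks {
  attach(): void;
  detach(): void;
}

class StreamLifecycleSketch {
  private running = false;
  private readonly hooks: ListenerHooks;

  constructor(hooks: ListenerHooks) {
    this.hooks = hooks;
  }

  // Repeated calls before stop() do nothing.
  start(): void {
    if (this.running) return;
    this.running = true;
    this.hooks.attach();
  }

  // Safe before start() and safe to call repeatedly.
  stop(): void {
    if (!this.running) return;
    this.running = false;
    this.hooks.detach();
  }
}

let attachCalls = 0;
let detachCalls = 0;
const lifecycle = new StreamLifecycleSketch({
  attach: () => {
    attachCalls += 1;
  },
  detach: () => {
    detachCalls += 1;
  },
});

lifecycle.stop(); // before start(): no effect
lifecycle.start();
lifecycle.start(); // second call is ignored
lifecycle.stop();
lifecycle.stop(); // second call is ignored
```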

snapshotDocument

function snapshotDocument(
  root?: Element,
  options?: SnapshotOptions,
): A11ySnapshot;
Produce a one-off accessibility snapshot of a DOM subtree. Useful for tests or for apps that want to drive snapshot timing manually (e.g. snapshot only on a specific app event).
root (Element): Element to walk. Defaults to document.body.
options (SnapshotOptions): Snapshot options.
Returns: An A11ySnapshot with a generic root containing the walked children, plus a client-side capture timestamp.

findElementByRef

function findElementByRef(ref: string): Element | null;
Resolve a ref string like "e42" back to a live DOM element. Returns null if the ref was never assigned or the element has since been garbage-collected. The walker assigns stable refs to every emitted DOM element via a WeakMap (forward) plus a Map<string, WeakRef<Element>> (reverse). Refs persist as long as the element stays mounted and survive across snapshots. Command handlers use this to act on nodes the server referenced from <ui_state>. The useDefault*Handler hooks call this internally.

findRefForElement

function findRefForElement(el: Element): string | null;
Return the snapshot ref assigned to a DOM element, if any. Returns null for elements the walker has not visited yet. This is useful when an app needs to associate a user interaction with the nearest snapshot-known node.
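The two lookups can be pictured as the two halves of one registry: a WeakMap from element to ref, and a Map from ref to WeakRef for the reverse direction. The sketch below follows that description with plain objects instead of DOM elements. RefRegistrySketch and its method names are hypothetical, and starting the counter at 1 is an assumption.

```typescript
class RefRegistrySketch<T extends object> {
  private nextId = 1; // assumption: refs start at e1
  private readonly forward = new WeakMap<T, string>();
  private readonly reverse = new Map<string, WeakRef<T>>();

  // Assigns a ref on first visit and returns the same ref afterwards.
  refFor(target: T): string {
    const existing = this.forward.get(target);
    if (existing !== undefined) return existing;
    const ref = `e${this.nextId++}`;
    this.forward.set(target, ref);
    this.reverse.set(ref, new WeakRef(target));
    return ref;
  }

  // Analogue of findRefForElement: never assigns, only looks up.
  peek(target: T): string | null {
    return this.forward.get(target) ?? null;
  }

  // Analogue of findElementByRef: null for unknown or collected targets.
  resolve(ref: string): T | null {
    const weak = this.reverse.get(ref);
    if (weak === undefined) return null;
    const target = weak.deref();
    if (target === undefined) {
      this.reverse.delete(ref); // drop the stale entry
      return null;
    }
    return target;
  }
}

const registry = new RefRegistrySketch<{ label: string }>();
const saveButton = { label: "Save" };
const cancelButton = { label: "Cancel" };
const neverVisited = { label: "Help" };

const saveRef = registry.refFor(saveButton);
const saveRefAgain = registry.refFor(saveButton); // stable across visits
const cancelRef = registry.refFor(cancelButton);
```

Because both directions hold the target weakly, the registry does not keep a removed element alive; the reverse entry is cleaned up lazily the next time its ref is resolved.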

Types

A11ySnapshotEmitter

type A11ySnapshotEmitter = (snapshot: A11ySnapshot) => void;
Callback passed to the low-level A11ySnapshotStreamer. The managed PipecatClient stream provides this callback internally and sends each snapshot as a ui-snapshot RTVI message with { tree: snapshot }.

A11yNode

One node in the accessibility snapshot tree.
interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  value?: string;
  state?: string[];
  level?: number;
  colcount?: number;
  rowcount?: number;
  children?: A11yNode[];
}
| Field | Type | Description |
| --- | --- | --- |
| ref | string | Stable web reference id of the form e{N}. Persists across snapshots while the DOM node is mounted. |
| role | string | ARIA role (explicit or tag-derived). |
| name | string | Accessible name, truncated to 100 chars. |
| value | string | Current value for inputs (omitted for passwords), progress, etc. |
| state | string[] | Short state tags. Known values: "focused", "selected", "expanded", "checked", "disabled", "offscreen". |
| level | number | Heading level, 1-6. |
| colcount | number | Column count for grid-like containers, populated from aria-colcount. |
| rowcount | number | Row count for grid-like containers, populated from aria-rowcount. |
| children | A11yNode[] | Child nodes. |
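As an illustration of the shape, here is a hand-written tree for a small sign-in form, plus a helper that collects nodes carrying a given state tag. The tree contents are invented for the example, the interface is a local copy of the one above, and findByState is a hypothetical helper rather than an SDK export.

```typescript
// Local copy of the A11yNode shape documented above.
interface A11yNode {
  ref: string;
  role: string;
  name?: string;
  value?: string;
  state?: string[];
  level?: number;
  colcount?: number;
  rowcount?: number;
  children?: A11yNode[];
}

// Invented example data: a sign-in form with one focused field.
const signInForm: A11yNode = {
  ref: "e1",
  role: "form",
  name: "Sign in",
  children: [
    { ref: "e2", role: "heading", name: "Welcome back", level: 2 },
    {
      ref: "e3",
      role: "textbox",
      name: "Email",
      value: "ada@example.com",
      state: ["focused"],
    },
    { ref: "e4", role: "button", name: "Sign in", state: ["disabled"] },
  ],
};

// Depth-first search for every node whose state list contains the tag.
function findByState(node: A11yNode, tag: string): A11yNode[] {
  const matches: A11yNode[] = [];
  if (node.state?.includes(tag)) matches.push(node);
  for (const child of node.children ?? []) {
    matches.push(...findByState(child, tag));
  }
  return matches;
}

const focusedRefs = findByState(signInForm, "focused").map((n) => n.ref);
const disabledRefs = findByState(signInForm, "disabled").map((n) => n.ref);
```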

A11ySnapshot

interface A11ySnapshot {
  root: A11yNode;
  captured_at: number;
  selection?: A11ySelection;
}
Shape of the payload inside a ui-snapshot RTVI message. A full tree is sent on each update; the server keeps the latest and renders it into <ui_state>...</ui_state> when an agent injects it. The fields are shared across client platforms; details in this page describe how the web SDK fills them from the DOM.
| Field | Type | Description |
| --- | --- | --- |
| root | A11yNode | Root of the accessibility tree (usually document.body’s node). |
| captured_at | number | Client-side timestamp (ms since epoch) when captured. |
| selection | A11ySelection | The user’s current text selection, when one exists. Omitted when nothing is selected. Optional. |

A11ySelection

interface A11ySelection {
  ref: string;
  text: string;
  start_offset?: number;
  end_offset?: number;
}
The user’s current text selection. Lets the agent ground deictic references like “this paragraph” or “what I selected” against actual on-page content rather than re-asking the user. The server renders this as a <selection ref="...">...</selection> block inside <ui_state>.
| Field | Type | Description |
| --- | --- | --- |
| ref | string | Ref of the element carrying the selection. For document selections this is the closest common-ancestor element with a ref; for input/textarea it is the input element itself. |
| text | string | The selected text. Truncated at 2000 characters with a trailing ellipsis to keep <ui_state> injections bounded. |
| start_offset | number | Character offset within the input’s value where the selection starts. Only set for <input> and <textarea>. Optional. |
| end_offset | number | Character offset where the selection ends. Only set for <input> and <textarea>. Optional. |
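A native or custom client that fills this shape for a text input could follow a sketch like the one below. It is an illustration under stated assumptions, not the web SDK's code: truncateSelectionText and selectionFromInput are hypothetical helpers, the ellipsis is assumed to be appended after the first 2000 characters, and offsets are assumed to index the input's value as in the browser's selectionStart and selectionEnd.

```typescript
// Local copy of the A11ySelection shape documented above.
interface A11ySelection {
  ref: string;
  text: string;
  start_offset?: number;
  end_offset?: number;
}

const MAX_SELECTION_CHARS = 2000;

// Assumption: keep the first 2000 characters, then append one ellipsis.
function truncateSelectionText(text: string): string {
  if (text.length <= MAX_SELECTION_CHARS) return text;
  return text.slice(0, MAX_SELECTION_CHARS) + "…";
}

// Builds the selection for an input or textarea, or null when the caret is
// collapsed (the snapshot then omits the selection field).
function selectionFromInput(
  ref: string,
  value: string,
  start: number,
  end: number,
): A11ySelection | null {
  if (start === end) return null;
  const from = Math.min(start, end);
  const to = Math.max(start, end);
  return {
    ref,
    text: truncateSelectionText(value.slice(from, to)),
    start_offset: from,
    end_offset: to,
  };
}

// "world" is selected inside an input whose snapshot ref is e7.
const wordSelection = selectionFromInput("e7", "hello world", 6, 11);
// A collapsed caret yields no selection.
const caretOnly = selectionFromInput("e7", "hello world", 3, 3);
// A long selection is cut down to a bounded size.
const longSelection = selectionFromInput("e8", "a".repeat(2500), 0, 2500);
```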

See also