Documentation Index
Fetch the complete documentation index at: https://daily-mb-ui-agent.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Overview
UIAgent extends LLMContextAgent (an internal LLMAgent subclass that bundles an LLMContext plus the user/assistant aggregator pair) with a UI-aware loop:
- UI events sent from the client via PipecatClient.sendUIEvent(event, payload) are dispatched to methods marked with @on_ui_event(name), and (by default) appended to the LLM context as <ui_event name="...">payload</ui_event> developer messages.
- Accessibility snapshots captured by the client arrive as first-class ui-snapshot RTVI messages; the bridge then republishes them internally under a reserved event name and stores the latest <ui_state>. By default, the snapshot is auto-injected into the LLM context at the start of every task request, so the agent always reasons over the current screen.
- UI commands flow back the other way via send_command(name, payload). The bridge installed by attach_ui_bridge translates each command into an RTVIUICommandFrame (or RTVIUITaskFrame for task-lifecycle traffic) on the root agent's pipeline; the RTVIObserver wraps it into a ui-command / ui-task envelope on the wire, and the client receives it through RTVIEvent.UICommand, onUICommand, or React's useUICommandHandler.
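The loop above can be sketched in miniature. Everything below is a toy stand-in: the decorator, class, and method names mirror the docs, but they are not the real SDK imports, and the real dispatch runs over bus messages and pipeline frames rather than plain lists.

```python
import asyncio
import json

def on_ui_event(name):                      # stand-in for the SDK's @on_ui_event
    def wrap(fn):
        fn._ui_event_name = name
        return fn
    return wrap

class UIAgentSketch:                        # stand-in for UIAgent
    def __init__(self):
        self.context = []                   # the LLM context message list
        self.sent = []                      # commands "sent" to the client
        self._handlers = {
            fn._ui_event_name: fn
            for fn in type(self).__dict__.values()
            if callable(fn) and hasattr(fn, "_ui_event_name")
        }

    async def dispatch_ui_event(self, name, payload):
        # Default behavior: append a <ui_event> developer message first...
        self.context.append(
            f'<ui_event name="{name}">{json.dumps(payload)}</ui_event>')
        # ...then run the matching @on_ui_event handler, if any.
        handler = self._handlers.get(name)
        if handler is not None:
            await handler(self, payload)

    async def send_command(self, name, payload):
        # Real SDK: becomes an RTVIUICommandFrame -> ui-command envelope.
        self.sent.append((name, payload))

class CartAgent(UIAgentSketch):
    @on_ui_event("cart-updated")
    async def on_cart(self, payload):
        await self.send_command("toast", {"text": f"{payload['items']} items"})

agent = CartAgent()
asyncio.run(agent.dispatch_ui_event("cart-updated", {"items": 2}))
```

The ordering matters: the context injection happens before the handler runs, so a handler that triggers an LLM turn already sees the event in context.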
Apps that want a single bundled LLM tool covering the full action vocabulary can inherit ReplyToolMixin alongside UIAgent. Apps with their own tool surface call the action helpers (scroll_to, highlight, select_text, click, set_input_value) directly inside custom @tool methods. See Choosing a tool shape.

Configuration
Inherits all parameters from LLMAgent.

name
Unique name for this agent.

active
Whether the agent starts active. Defaults to True for UIAgent (vs. False on LLMAgent / LLMContextAgent) because the canonical UIAgent role is an always-on delegate that should self-activate as soon as its pipeline starts. Pass active=False only if you have a handoff use case.

bridged
Bridge configuration. See BaseAgent for details. Setting this together with the default auto_inject_ui_state=True raises ValueError — see auto_inject_ui_state below.

context
Optional pre-built LLMContext. Forwarded to LLMContextAgent. Note that any messages seeded here are part of the mutable task history and are cleared before each task when keep_history=False (the default), since the reset replaces the entire message list. For durable UI / app instructions, pass them via the LLM service's system_instruction instead.

user_params
Optional user-aggregator parameters. Forwarded to LLMContextAgent.

assistant_params
Optional assistant-aggregator parameters. Forwarded to LLMContextAgent. Set enable_auto_context_summarization=True here to keep the running context bounded over long sessions when keep_history=True.

inject_events
When True, every UI event received is appended to the LLM context as a <ui_event name="...">payload</ui_event> developer message before the matching @on_ui_event handler (if any) runs. Override render_ui_event to customize the rendered string, or set this to False to disable injection entirely.

auto_inject_ui_state
When True, the latest <ui_state> snapshot is appended to the LLM context at the start of every task request, so the agent always reasons over the current screen. Set to False to call inject_ui_state yourself.

Setting this together with a non-None bridged value raises ValueError: auto-injection fires on on_task_request, but a bridged UIAgent receives user voice frames through the bridge instead of task messages, so the snapshot would never reach the LLM context. The canonical pattern is a non-bridged UIAgent receiving delegated tasks from a separate voice LLMAgent. To use a bridged UIAgent (advanced cases), pass auto_inject_ui_state=False explicitly and call inject_ui_state() yourself.

keep_history
When False (the default), the LLM context is cleared at the start of every task: each task starts from an empty messages list, the current <ui_state> is injected, and the user's query follows. Best for the canonical stateless-delegate role where the voice agent owns dialog state and the UI agent's job is "given the current screen, do something."

When True, conversation history accumulates across tasks (queries, prior <ui_state> blocks, tool calls, responses) so the LLM can reason over multi-turn references like "show me the next one" or "tell me about the Pro version of that." History accumulation grows token usage and can confuse smaller models when multiple <ui_state> snapshots are present at once; opt in only when the dialog continuity is worth the cost. Pair with enable_auto_context_summarization=True on the assistant aggregator (via assistant_params) to keep the running context bounded over long sessions. Apps in keep_history=True mode can call await self.reset_context() to clear manually.

In keep_history=False mode, messages pre-seeded via context= are also cleared on the first task. Durable instructions belong in the LLM service's system_instruction setting (e.g. concatenate with the UI_STATE_PROMPT_GUIDE constant); use keep_history=True if seeded messages genuinely need to live in the conversation history.

Properties
Inherits all properties from LLMAgent.
current_task
The in-flight task message; None when idle. Set when on_task_request runs and cleared by respond_to_task. Lets @tool methods inspect the in-flight task without threading the message through every call.
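The slot's lifecycle can be modeled in a few lines. This is a toy stand-in (the class and message shape are hypothetical, mirroring the docs' names):

```python
class TaskSlotSketch:
    """Toy model of the current_task slot described above."""
    def __init__(self):
        self.current_task = None

    def on_task_request(self, message):
        self.current_task = message          # recorded for the task's duration

    def lookup_tool(self):
        # A @tool body reads self.current_task directly instead of having
        # the task message threaded through its arguments.
        task = self.current_task
        return None if task is None else task["query"]

    def respond_to_task(self, response=None):
        if self.current_task is None:        # second call: no-op
            return
        self.current_task = None             # cleared on response

agent = TaskSlotSketch()
agent.on_task_request({"task_id": "t1", "query": "open settings"})
found = agent.lookup_tool()                  # "open settings"
agent.respond_to_task({"ok": True})
```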
Abstract Methods
build_llm
See LLMAgent.build_llm.
Returns: An LLMService instance.
Lifecycle Hooks
on_bus_message
Override of BaseAgent.on_bus_message that dispatches UI events alongside base lifecycle handling.
When a BusUIEventMessage arrives:
- The reserved UI_SNAPSHOT_EVENT_NAME updates the stored snapshot and returns. No <ui_event> injection, no handler dispatch.
- Other events trigger context injection followed by handler dispatch: a <ui_event> developer message is appended to the LLM context (when inject_events=True), then the matching @on_ui_event(name) handler (if any) runs in its own asyncio task.
Call super() when overriding so base lifecycle handling continues to run.
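The snapshot-vs-event branch can be sketched as follows. The reserved name and class are placeholders (the real constant and message types come from the SDK):

```python
import json

UI_SNAPSHOT_EVENT_NAME = "_ui_snapshot"      # placeholder; real reserved name is SDK-defined

class BusDispatchSketch:
    def __init__(self, inject_events=True):
        self.inject_events = inject_events
        self.snapshot = None
        self.context = []
        self.dispatched = []

    def on_bus_message(self, name, payload):
        if name == UI_SNAPSHOT_EVENT_NAME:
            self.snapshot = payload          # store and return: no injection, no dispatch
            return
        if self.inject_events:               # context injection first...
            self.context.append(
                f'<ui_event name="{name}">{json.dumps(payload)}</ui_event>')
        self.dispatched.append(name)         # ...then handler dispatch

agent = BusDispatchSketch()
agent.on_bus_message(UI_SNAPSHOT_EVENT_NAME, {"tree": []})
agent.on_bus_message("row-selected", {"ref": "e7"})
```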
on_task_request
Override of BaseAgent.on_task_request that:
- Acquires the per-agent single-flight task lock (held until respond_to_task or on_task_cancelled fires).
- Records the in-flight task on current_task for respond_to_task to close out.
- In keep_history=False mode (the default), clears the LLM context so each task starts fresh.
- Auto-injects the latest <ui_state> so the agent reasons over the current screen.
Pass auto_inject_ui_state=False if your app wants to drive injection manually (e.g. inject only on specific task names). Task tracking and lock acquisition happen regardless.
The single-flight lock keeps overlapping requests queued rather than interleaving their context mutations: a second request arrives while the first is in-flight, sits in acquire(), and proceeds only after the first calls respond_to_task (or is cancelled).
| Parameter | Type | Description |
|---|---|---|
| message | BusTaskRequestMessage | The incoming task request from the bus. |
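The queueing behavior reduces to a plain asyncio.Lock. A minimal sketch (a simplification of the SDK's per-agent lock, not its actual implementation):

```python
import asyncio

async def main():
    lock = asyncio.Lock()                   # stand-in for the per-agent task lock
    order = []

    async def handle(task_id, work_time):
        async with lock:                    # on_task_request: acquire()
            order.append(f"start:{task_id}")
            await asyncio.sleep(work_time)  # task work (LLM run, tools, ...)
            order.append(f"done:{task_id}") # respond_to_task: release

    # t2 arrives while t1 is in flight and sits in acquire() until t1 responds.
    await asyncio.gather(handle("t1", 0.01), handle("t2", 0))
    return order

order = asyncio.run(main())
```

The observable effect is strict serialization: t2 never starts until t1's context mutations are complete.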
on_task_cancelled
Override of BaseAgent.on_task_cancelled that releases the single-flight
task lock when the in-flight task is cancelled. Without this hook, a
cancellation would strand the lock (because the framework’s cancel path
sends a CANCELLED response directly via send_task_response, bypassing
respond_to_task and its lock-release) and every subsequent task request
would block forever.
Idempotent: if a tool calls respond_to_task concurrently and clears the
slot first, this hook short-circuits.
| Parameter | Type | Description |
|---|---|---|
| message | BusTaskCancelMessage | The cancellation message from the bus. |
Methods
respond_to_task
Wrapper around send_task_response that looks up the current
task from current_task so @tool methods don’t have to
thread the task_id through every call. Clears current_task and
releases the single-flight task lock (acquired in
on_task_request) so the next queued task can proceed.
Calling a second time is a no-op.
speak is the convention the SDK demos use for “text the voice agent
should hand verbatim to TTS.” When provided, it’s merged into the
response dict as {"speak": speak}. Apps that don’t follow the convention
can pass a fully formed response dict and leave speak unset.
No-op when there is no task in flight (e.g. the tool was invoked outside a
task dispatch).
| Parameter | Type | Default | Description |
|---|---|---|---|
| response | dict \| None | None | Result data dict. Merged with the speak key when speak is provided. |
| speak | str \| None | None | Optional short text for verbatim TTS. Omit (or pass None) to leave the response without a speak key — the voice agent stays silent for the turn. |
| status | TaskStatus | COMPLETED | Completion status. |
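The response/speak merge is simple enough to show directly. A hypothetical helper reproducing the documented behavior (not the SDK's actual internals):

```python
def merge_response(response=None, speak=None):
    # speak, when provided, is folded into the result dict as {"speak": speak};
    # when omitted, no speak key is emitted and the voice agent stays silent.
    out = dict(response or {})
    if speak is not None:
        out["speak"] = speak
    return out

merged = merge_response({"count": 3}, speak="Found three results.")
silent = merge_response()                    # no task data, no speak key
```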
send_command
Sends a BusUICommandMessage,
which the bridge installed by attach_ui_bridge turns into an
RTVIUICommandFrame on the root agent’s pipeline; the RTVIObserver
wraps that into a ui-command envelope on the wire. Client-side handlers
subscribed through RTVIEvent.UICommand, onUICommand, or one of the React
useDefault*Handler hooks dispatch on the command name.
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | App-defined command name (e.g. "toast", "navigate", or any app-specific name). |
| payload | Any | None | Pydantic model, dataclass, dict, or None. See below. |
payload is normalized before sending:
- Pydantic BaseModel instance (including the built-in command models in pipecat.processors.frameworks.rtvi.models): converted to a plain dict via model_dump().
- Dataclass instance: converted to a plain dict via dataclasses.asdict.
- dict: forwarded as-is.
- None: forwarded as an empty dict.
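The normalization rules can be sketched in plain Python. The Pydantic case is duck-typed on model_dump so the sketch stays stdlib-only, and the ScrollTo dataclass here is a stand-in (the real model ships with the SDK and may be a Pydantic model):

```python
import dataclasses

def normalize_payload(payload):
    # Mirror of the four documented cases, checked in order.
    if hasattr(payload, "model_dump"):       # Pydantic BaseModel instance
        return payload.model_dump()
    if dataclasses.is_dataclass(payload) and not isinstance(payload, type):
        return dataclasses.asdict(payload)   # dataclass instance
    if payload is None:
        return {}                            # None -> empty dict
    return payload                           # plain dict: forwarded as-is

@dataclasses.dataclass
class ScrollTo:                              # stand-in payload model
    ref: str

scroll = normalize_payload(ScrollTo(ref="e12"))
```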
Action helpers
Plain instance methods that wrap send_command with the standard payload models. They are NOT LLM tools — subclasses that want them exposed to the LLM either inherit ReplyToolMixin (which calls these helpers under the hood from a single bundled reply(...) tool), or write their own @tool methods that delegate to these helpers.
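The "own tool surface" route looks roughly like this. Everything below is a runnable stand-in: the class, helper bodies, and show_result tool are hypothetical, and in the real SDK show_result would carry the @tool decorator and the helpers would send real frames:

```python
import asyncio

class HelperSketch:
    """Toy stand-in for a UIAgent subclass using the action helpers."""
    def __init__(self):
        self.commands = []

    async def send_command(self, name, payload):
        self.commands.append((name, payload))

    async def scroll_to(self, ref):
        await self.send_command("scroll_to", {"ref": ref})

    async def highlight(self, ref):
        await self.send_command("highlight", {"ref": ref})

    # In the real SDK this method would be decorated with @tool.
    async def show_result(self, ref: str):
        await self.scroll_to(ref)            # bring the node into view...
        await self.highlight(ref)            # ...then call attention to it

agent = HelperSketch()
asyncio.run(agent.show_result("e42"))
```

Composing several helpers inside one tool body like this keeps the LLM's tool surface small while still exercising the full command vocabulary.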
scroll_to
Sends the scroll_to UI command for the given snapshot ref. Convenience
wrapper around send_command("scroll_to", ScrollTo(ref=ref)).
highlight
Sends the highlight UI command for the given snapshot ref. Convenience
wrapper around send_command("highlight", Highlight(ref=ref)).
select_text
Sends the select_text UI command for the given snapshot ref. With
start_offset / end_offset set, the client’s standard handler selects a
sub-range of the target’s text; with both None (the default), the entire
target text is selected. Offsets are character offsets over the target’s
text content.
click
Sends the click UI command for the given snapshot ref. Use inside custom
@tool bodies for state-changing form actions like checkboxes, radio
buttons, and submit buttons. The standard client handler silently no-ops on
disabled targets so the agent can’t bypass UI affordances.
set_input_value
Sends the set_input_value UI command for the given input/textarea ref.
With replace=True (the default) the field’s existing value is overwritten;
with replace=False the value is appended.
user_task_group
Task-group context manager whose lifecycle is mirrored to the client as ui-task envelopes. Behaves exactly like task_group(...), but the client's task reducer (useUITasks on React) consumes the envelopes automatically — group_started on entry, task_update for each worker update, task_completed for each worker response, and group_completed on exit.
Workers don’t need to change. Any send_task_update they emit against the
group’s task_id is forwarded automatically.
Returns a UserTaskGroupContext for use with async with. See the
Async tasks and lifecycle
guide section for a worked example.
| Parameter | Type | Default | Description |
|---|---|---|---|
| *agent_names | str | required | Names of the worker agents to dispatch to. |
| name | str \| None | None | Optional task name for routing to named @task handlers on the workers. |
| payload | dict \| None | None | Optional structured data describing the work. |
| timeout | float \| None | None | Optional timeout in seconds covering both the ready-wait and task execution. |
| cancel_on_error | bool | True | Whether to cancel the group if a worker errors. |
| label | str \| None | None | Optional human-readable label surfaced to the client (titles the in-flight task card). |
| cancellable | bool | True | Whether the client may request cancellation of this group via ui-cancel-task. |
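The envelope sequence the client reducer consumes can be sketched with a toy async context manager. This is not the real UserTaskGroupContext; worker dispatch is faked so the envelope ordering is visible:

```python
import asyncio
import contextlib

envelopes = []                               # what the client's reducer would consume

@contextlib.asynccontextmanager
async def user_task_group_sketch(*agent_names, label=None):
    envelopes.append(("group_started", label))
    try:
        yield agent_names                    # workers run while the context is open
        for name in agent_names:             # one task_completed per worker response
            envelopes.append(("task_completed", name))
    finally:
        envelopes.append(("group_completed", label))

async def main():
    async with user_task_group_sketch("research", "summarize",
                                      label="Researching") as workers:
        for w in workers:                    # worker progress -> task_update
            envelopes.append(("task_update", w))

asyncio.run(main())
```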
start_user_task_group
Fire-and-forget counterpart to user_task_group. Dispatches
the group in the background and returns the task_id. The SDK manages the
asyncio task that holds the context open while workers run, so callers don’t
need to spawn one themselves.
Use this when an LLM @tool body wants to kick off a task group and return
immediately so the voice agent unblocks. Use the user_task_group context
manager when the caller wants to consume worker events inline (async for event in tg) or react to results before returning.
Worker exceptions inside the dispatched context are logged but do not
propagate; cancellation works the same way it does for the context-manager
form (the user clicks Cancel, the SDK turns it into a ui-cancel-task
envelope and the framework cancels the group).
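The fire-and-forget shape, reduced to stdlib asyncio (a toy stand-in; the real method dispatches workers over the bus and manages cancellation):

```python
import asyncio
import uuid

results = {}

async def run_group(task_id):
    await asyncio.sleep(0)                   # workers would run here
    results[task_id] = "done"

def start_group_sketch():
    task_id = str(uuid.uuid4())
    # Dispatch in the background; the caller gets the id back immediately.
    asyncio.get_running_loop().create_task(run_group(task_id))
    return task_id

async def main():
    task_id = start_group_sketch()           # returns before the group finishes
    assert task_id not in results            # tool body can return right away
    await asyncio.sleep(0.01)                # let the background task complete
    return task_id

tid = asyncio.run(main())
```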
reset_context
Clears the mutable LLM context (the task message list); the LLM service's durable system prompt (system_instruction) is unaffected.
Apps in keep_history=True mode call this when they want to deliberately
start over (e.g. the user said “start fresh”). Apps in the default
keep_history=False mode don’t need to call this; the per-task reset runs
automatically.
render_ui_state
Renders the latest accessibility snapshot as a <ui_state> block.
Produces Playwright-MCP-style indented text with stable UI node refs. Each
line is - role "name" [level=N] [cols=N] [rows=N] [state] [ref=eN], with
children nested one indent deeper. Returns an empty string if no snapshot has
been received yet.
Override to customize the rendered form (different formatting, additional
metadata, alternate truncation).
Returns: The rendered <ui_state> string, or an empty string when no snapshot is available.
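A minimal rendering sketch over a toy node shape ({"role", "name", "ref", "children"}); the real A11yNode carries more fields and the real renderer adds the [level]/[cols]/[rows]/[state] annotations where present:

```python
def render_nodes(node, indent=0):
    # One "- role "name" [ref=eN]" line per node, children one indent deeper.
    lines = ["  " * indent +
             f'- {node["role"]} "{node["name"]}" [ref={node["ref"]}]']
    for child in node.get("children", []):
        lines.extend(render_nodes(child, indent + 1))
    return lines

tree = {"role": "list", "name": "Plans", "ref": "e1", "children": [
    {"role": "listitem", "name": "Basic", "ref": "e2"},
    {"role": "listitem", "name": "Pro", "ref": "e3"},
]}
text = "\n".join(render_nodes(tree))
```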
inject_ui_state
Appends the rendered <ui_state> block to the LLM context as a developer
message. No-op when no snapshot has been received. The frame is queued with
run_llm=False so the snapshot is treated as context, not a user turn.
Apps with auto_inject_ui_state=True (the default) get this for free at the
start of every task. Call this manually only when you want extra injections
between tasks (e.g. inside an @on_ui_event handler that performs a chain of
work).
visible_nodes
Returns the snapshot nodes whose state does not contain "offscreen", in depth-first order matching <ui_state>.
Useful for code paths that want to reason about visible interactables without
parsing the rendered text. Returns an empty list when no snapshot has been
received yet.
Returns: Flat list of visible A11yNode-shaped dicts.
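The filter itself is a one-liner over the flat node list. A sketch assuming state is a list of state tokens on A11yNode-shaped dicts (an assumption; the real field shape is SDK-defined):

```python
def visible_nodes_sketch(flat_nodes):
    # flat_nodes: depth-first list of A11yNode-shaped dicts.
    return [n for n in flat_nodes if "offscreen" not in n.get("state", [])]

nodes = [
    {"role": "button", "name": "Submit", "ref": "e5", "state": []},
    {"role": "button", "name": "Later", "ref": "e6", "state": ["offscreen"]},
]
vis = visible_nodes_sketch(nodes)
```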
render_ui_event
Renders a UI event as a <ui_event> XML tag with a name attribute and a JSON-encoded payload as inner text.
| Parameter | Type | Description |
|---|---|---|
| message | BusUIEventMessage | The UI event to render. |
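The default output shape can be sketched directly. Note the real method takes the BusUIEventMessage object and may differ in JSON formatting details; this helper is a hypothetical equivalent over (name, payload):

```python
import json

def render_ui_event_sketch(name, payload):
    # <ui_event> tag with a name attribute and JSON-encoded payload body.
    return f'<ui_event name="{name}">{json.dumps(payload)}</ui_event>'

tag = render_ui_event_sketch("row-selected", {"ref": "e7"})
```

Overriding render_ui_event with a function of this shape is the documented hook for customizing how events appear in the LLM context.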