Overview

ReplyToolMixin is the canonical bundled-tool shape for UIAgent. It exposes a single LLM tool, reply(answer, ...), that combines a required spoken answer with the optional standard UI actions in one turn-terminating call. Apps compose it by listing it alongside UIAgent in their subclass's bases:
from pipecat_subagents.agents import ReplyToolMixin, UIAgent

class MyUIAgent(ReplyToolMixin, UIAgent):
    ...
The bundled-tool design solves a real failure mode: smaller models often omit the spoken terminator when scroll-and-highlight tools are exposed separately and chainable. Making answer a required argument, enforced by the API schema, closes that loop: the model can’t call reply(...) without committing to what it will say.

The mixin covers the canonical app shapes (pointing, reading, form-fill) and any blend of them. Apps don’t pick a mode up front; on each turn the LLM uses whichever fields fit the user’s request and leaves the rest as null.

The host class must provide scroll_to, highlight, select_text, click, set_input_value, and respond_to_task (UIAgent does), and it must be the target of @tool discovery on the LLM pipeline.
Apps with a different shape — multiple per-domain tools (play, navigate_to_artist, etc.) where each tool is the whole turn — should skip the mixin and write their own @tool methods. Use the action helpers (scroll_to, highlight, select_text, click, set_input_value) on UIAgent directly inside those tool bodies. See Choosing a tool shape for the decision.

ReplyToolMixin

Exposes a single bundled reply(...) LLM tool. One call per turn, no chaining; the required answer argument keeps the model from omitting the spoken terminator.

reply

@tool
async def reply(
    self,
    params: FunctionCallParams,
    answer: str,
    scroll_to: str | None = None,
    highlight: list[str] | None = None,
    select_text: str | None = None,
    fills: list[dict] | None = None,
    click: list[str] | None = None,
) -> None
Reply to the user, optionally pointing at content and acting on inputs. Action fields are dispatched in a fixed order: scroll_to, then highlight, then select_text, then fills, then click, and finally the spoken answer.
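The fixed dispatch order can be pictured with a small standalone sketch. It does not use pipecat_subagents at all: ACTION_ORDER and plan_dispatch are hypothetical names invented for illustration, and the real mixin calls the host's action helpers rather than building a list.

```python
# Hypothetical illustration only: ACTION_ORDER and plan_dispatch are not part
# of pipecat_subagents. They model the documented ordering, nothing more.
ACTION_ORDER = ("scroll_to", "highlight", "select_text", "fills", "click")


def plan_dispatch(answer, **actions):
    """Return the steps of one reply(...) call in the documented order."""
    steps = []
    for name in ACTION_ORDER:
        value = actions.get(name)
        if value is None:
            # Fields the model left as null are skipped entirely.
            continue
        steps.append((name, value))
    # The spoken answer always comes last.
    steps.append(("answer", answer))
    return steps


# Keyword order at the call site does not matter; the fixed order wins.
steps = plan_dispatch(
    "Filled in your details.",
    click=["e7"],
    fills=[{"ref": "e3", "value": "Marie Curie"}],
)
print([name for name, _ in steps])  # ['fills', 'click', 'answer']
```

Under this model, fills land before the click on a submit button even when the model happens to list click first.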

Visual / pointing actions

Draw the user’s attention without changing app state.
  • scroll_to brings an element into view (single ref).
  • highlight flashes elements briefly (list of refs). Best for short emphasis like a button or a fact.
  • select_text puts the page’s text selection on an element (single ref). Best for “this paragraph” / “the section about X” so the user sees exactly what was meant. Persists until the user clicks elsewhere.

State-changing actions

Modify form / app state.
  • fills writes values into inputs (list of {"ref", "value"} objects, multi-fill in one turn).
  • click clicks elements (list of refs in order). Use for checkboxes, radios, submit buttons.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| params | FunctionCallParams | | Framework-provided tool invocation context. |
| answer | str | | The spoken reply in plain language. One short sentence. No markdown, no symbols. |
| scroll_to | str \| None | None | Optional snapshot ref. Scrolls the element into view before speaking. |
| highlight | list[str] \| None | None | Optional list of snapshot refs. Visually pulses each element. |
| select_text | str \| None | None | Optional snapshot ref. Places the page’s text selection on that element. |
| fills | list[dict] \| None | None | Optional list of {"ref": "eN", "value": "..."} objects. Writes each value into the input at ref. |
| click | list[str] \| None | None | Optional list of snapshot refs to click in order. Use for checkboxes, radios, submit buttons. |
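To make the "required answer" idea concrete, here is a rough sketch of how the parameters above might map onto a JSON-Schema-style function definition. The exact schema that @tool generates is not shown in this page, so REPLY_SCHEMA and missing_required are assumptions written for illustration. The one grounded detail is that answer is the only required field the model supplies.

```python
# Assumed shape, for illustration only. The real schema is produced by the
# framework from the reply(...) signature and may differ in detail.
REF = {"type": ["string", "null"]}
REF_LIST = {"type": ["array", "null"], "items": {"type": "string"}}

REPLY_SCHEMA = {
    "name": "reply",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "scroll_to": REF,
            "highlight": REF_LIST,
            "select_text": REF,
            "fills": {
                "type": ["array", "null"],
                "items": {
                    "type": "object",
                    "properties": {
                        "ref": {"type": "string"},
                        "value": {"type": "string"},
                    },
                    "required": ["ref", "value"],
                },
            },
            "click": REF_LIST,
        },
        # Only the spoken answer is mandatory; every action is optional.
        "required": ["answer"],
    },
}


def missing_required(call_args):
    """List required fields absent from a proposed tool call."""
    required = REPLY_SCHEMA["parameters"]["required"]
    return [name for name in required if name not in call_args]


print(missing_required({"scroll_to": "e5"}))  # ['answer']
print(missing_required({"answer": "Here it is.", "scroll_to": "e5"}))  # []
```

A call that only scrolls is rejected at the schema level, which is how the bundled shape keeps a turn from ending without speech.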

Examples

A pointing turn:
reply(answer="Here's the iPhone 17.", scroll_to="e5", highlight=["e5"])
A read-side deixis turn:
reply(
    answer="That paragraph explains the rationale.",
    select_text="e42",
)
A form-fill turn:
reply(
    answer="Filled in your details.",
    fills=[
        {"ref": "e3", "value": "Marie Curie"},
        {"ref": "e4", "value": "marie@example.com"},
    ],
    click=["e7"],
)
Defensive guards on the list arguments skip non-conforming entries: a malformed fills element such as None or a bare string is dropped rather than raising. That way a transient model hiccup doesn’t leave the single-flight task lock held until the voice-task timeout fires.
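The guard behaviour described above might look roughly like the following. This is a sketch under assumptions, not the mixin's actual code: clean_fills and clean_refs are hypothetical helper names, and the rule that ref and value must both be strings is a guess at what "conforming" means.

```python
# Hypothetical helpers; the mixin's real validation rules are not documented here.
def clean_fills(fills):
    """Keep only dict entries carrying string 'ref' and 'value' keys."""
    if not isinstance(fills, list):
        return []
    kept = []
    for entry in fills:
        if not isinstance(entry, dict):
            # None, bare strings, numbers: drop instead of raising.
            continue
        ref, value = entry.get("ref"), entry.get("value")
        if isinstance(ref, str) and isinstance(value, str):
            kept.append({"ref": ref, "value": value})
    return kept


def clean_refs(refs):
    """Keep only string refs from a highlight or click list."""
    if not isinstance(refs, list):
        return []
    return [ref for ref in refs if isinstance(ref, str)]


messy = [None, "e3", {"ref": "e3", "value": "Marie Curie"}, {"ref": "e4"}]
print(clean_fills(messy))  # [{'ref': 'e3', 'value': 'Marie Curie'}]
print(clean_refs(["e7", None, 3]))  # ['e7']
```

Dropping bad entries means the valid ones still apply and the turn still ends with the spoken answer, so the lock is released on the normal path.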