From 79bc7918ab238a1f06ca8292a8df56d3f42bb305 Mon Sep 17 00:00:00 2001 From: Chad Bailey Date: Wed, 25 Mar 2026 17:19:53 +0000 Subject: [PATCH 1/2] docs: add comprehensive frame reference pages Add five new documentation pages covering all 128 frame classes in Pipecat: - overview.mdx: frame categories, properties, direction, mixins, common patterns - data-frames.mdx: audio, image, text, transcription, TTS, transport, DTMF frames - control-frames.mdx: lifecycle, pause/resume, LLM boundaries, function calling, TTS state, settings - system-frames.mdx: interruptions, VAD, user/bot state, mute system, errors, metrics - llm-frames.mdx: context management, thinking, tool config, summarization Also updates docs.json navigation to include the new Frames section. Co-Authored-By: Claude Opus 4.6 --- docs.json | 10 + server/frames/control-frames.mdx | 350 +++++++++++++++++++++++ server/frames/data-frames.mdx | 278 ++++++++++++++++++ server/frames/llm-frames.mdx | 303 ++++++++++++++++++++ server/frames/overview.mdx | 302 +++++++++++++++++++ server/frames/system-frames.mdx | 477 +++++++++++++++++++++++++++++++ 6 files changed, 1720 insertions(+) create mode 100644 server/frames/control-frames.mdx create mode 100644 server/frames/data-frames.mdx create mode 100644 server/frames/llm-frames.mdx create mode 100644 server/frames/overview.mdx create mode 100644 server/frames/system-frames.mdx diff --git a/docs.json b/docs.json index d7af84da..1d3a6919 100644 --- a/docs.json +++ b/docs.json @@ -399,6 +399,16 @@ } ] }, + { + "group": "Frames", + "pages": [ + "server/frames/overview", + "server/frames/data-frames", + "server/frames/control-frames", + "server/frames/system-frames", + "server/frames/llm-frames" + ] + }, { "group": "Pipeline", "pages": [ diff --git a/server/frames/control-frames.mdx b/server/frames/control-frames.mdx new file mode 100644 index 00000000..c4cf01f2 --- /dev/null +++ b/server/frames/control-frames.mdx @@ -0,0 +1,350 @@ +--- +title: "Control Frames" 
+description: "Reference for ControlFrame types: pipeline lifecycle, response boundaries, service settings, and runtime configuration" +--- + +ControlFrames are queued and processed in order alongside data frames. They signal boundaries, state changes, and configuration updates within the pipeline. Unlike system frames, control frames respect ordering guarantees — they won't skip ahead of data frames already in the queue. Control frames are cancelled on user interruption unless combined with `UninterruptibleFrame`. See the [frames overview](/server/frames/overview) for base class details and the full frame hierarchy. + +## Pipeline Lifecycle + +### EndFrame + +Signals graceful pipeline shutdown. The transport stops sending, closes its threads, and the pipeline winds down completely. Inherits from both `ControlFrame` and `UninterruptibleFrame`, so it cannot be cancelled by interruption. + + + Optional reason for the shutdown, passed along for logging or inspection. + + +### StopFrame + +Stops the pipeline but keeps processors in a running state. Useful when you need to halt frame flow without tearing down the entire processor graph. Inherits from `ControlFrame` and `UninterruptibleFrame`. + + +### OutputTransportReadyFrame + +Indicates that the output transport is ready to receive frames. Processors waiting on transport availability can use this as their signal to begin sending. + + +### HeartbeatFrame + +Used for pipeline health monitoring. Processors can observe these to detect stalls or measure latency. + + + Timestamp value for the heartbeat. + + +## Processor Pause/Resume + +While a processor is paused, incoming frames accumulate in its internal queue rather than being dropped. Once the processor is resumed, it drains the queue and processes all buffered frames in the order they arrived. + +For example, the TTS service pauses itself while synthesizing a `TTSSpeakFrame`. 
If new text frames arrive during synthesis, they queue up instead of producing overlapping audio. The TTS resumes when `BotStoppedSpeakingFrame` (a system frame) arrives, and the buffered frames are processed in order. + +Internally, each processor has two queues: a high-priority input queue for system frames and a process queue for everything else. Pausing blocks the process queue, but system frames continue to flow through the input queue. This is why the typical pattern is for a processor to pause itself and then resume in response to a system frame. + + +`FrameProcessorResumeFrame` is a control frame, which means it enters the same process queue that pausing blocks. If data frames have already queued up ahead of it, the resume frame will be stuck behind them and the processor will stay paused. To resume a paused processor from outside, use the system frame variant `FrameProcessorResumeUrgentFrame` instead — it bypasses the process queue entirely. See [System Frames](/server/frames/system-frames#processor-pauseresume-urgent). + + +### FrameProcessorPauseFrame + +Pauses a specific processor. Queued in order, so the processor finishes handling any frames ahead of it before pausing. + + + The processor to pause. + + +### FrameProcessorResumeFrame + +Resumes a previously paused processor, releasing all buffered frames for processing. + + + The processor to resume. + + + +Because this is a control frame, it will be blocked behind any data frames that queued up while the processor was paused. Use `FrameProcessorResumeUrgentFrame` if the processor may have buffered frames. + + +## LLM Response Boundaries + +These frames bracket LLM output, letting downstream processors (aggregators, TTS services, transports) know when a response starts and ends. + +### LLMFullResponseStartFrame + +Marks the beginning of an LLM response. Followed by one or more `TextFrame`s and terminated by `LLMFullResponseEndFrame`. + + +### LLMFullResponseEndFrame + +Marks the end of an LLM response. 
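The bracketing contract is easiest to see as a sketch. The classes below are illustrative stand-ins, not the real pipecat frames; the point is the pattern a downstream consumer follows: reset its buffer on the start frame, accumulate text, and commit on the end frame.

```python
from dataclasses import dataclass

# Stand-in frame types for illustration only -- the real pipecat
# classes live in pipecat.frames.frames and carry more fields.
@dataclass
class LLMFullResponseStartFrame:
    pass

@dataclass
class LLMFullResponseEndFrame:
    pass

@dataclass
class TextFrame:
    text: str

class ResponseCollector:
    """Buffers text between the start/end boundary frames."""

    def __init__(self):
        self._parts = []
        self._in_response = False
        self.responses = []

    def process(self, frame):
        if isinstance(frame, LLMFullResponseStartFrame):
            self._in_response = True
            self._parts = []
        elif isinstance(frame, LLMFullResponseEndFrame):
            self._in_response = False
            self.responses.append("".join(self._parts))
        elif isinstance(frame, TextFrame) and self._in_response:
            self._parts.append(frame.text)

collector = ResponseCollector()
for frame in [
    LLMFullResponseStartFrame(),
    TextFrame("Hello, "),
    TextFrame("world."),
    LLMFullResponseEndFrame(),
]:
    collector.process(frame)

assert collector.responses == ["Hello, world."]
```

Pipecat's built-in aggregators and TTS services implement this same pattern internally, which is why out-of-order boundary frames can confuse them.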
+ + +### VisionFullResponseStartFrame + +Beginning of a vision model response. Inherits from `LLMFullResponseStartFrame`. + + +### VisionFullResponseEndFrame + +End of a vision model response. Inherits from `LLMFullResponseEndFrame`. + + +### LLMAssistantPushAggregationFrame + +Forces the assistant aggregator to commit its buffered text to context immediately, rather than waiting for the normal end-of-response boundary. + + +## LLM Context Summarization + +Frames that coordinate context summarization: compressing conversation history to stay within token limits. + +### LLMSummarizeContextFrame + +Triggers manual context summarization. Push this frame to request that the LLM summarize the current conversation context. + + + Optional configuration controlling summarization behavior. + + +### LLMContextSummaryRequestFrame + +Internal request from the aggregator to the LLM service, asking it to produce a summary. You typically won't push this yourself — the aggregator creates it in response to `LLMSummarizeContextFrame` or automatic summarization triggers. + + + Unique identifier for this summarization request. + + + + The conversation context to summarize. + + + + Minimum number of recent messages to preserve after summarization. + + + + Target token count for the summarized context. + + + + Prompt instructing the LLM how to summarize. + + + + Optional timeout in seconds for the summarization request. + + +### LLMContextSummaryResultFrame + +The LLM's summarization result, sent back to the aggregator. Inherits from both `ControlFrame` and `UninterruptibleFrame` to ensure the result is never dropped. + + + Matches the originating request. + + + + The generated summary text. + + + + Index of the last message included in the summary. + + + + Error message if summarization failed, otherwise `None`. + + +## LLM Thought Frames + +Bracket extended thinking output from LLMs that support it (e.g., Claude with extended thinking enabled). 
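The bracket defines a simple state machine: any text chunks arriving between the start and end frames are reasoning, not content to be spoken. A minimal sketch, using plain `(kind, text)` tuples rather than real frames (in practice pipecat also tags thought chunks with a dedicated frame type, documented on the LLM Frames page):

```python
def partition_turn(frames):
    """Split a turn's text chunks into (thought, spoken) using the
    thought bracket state. Frames here are (kind, text) tuples for
    brevity: 'thought_start', 'thought_end', or 'text'."""
    thinking = False
    thought, spoken = [], []
    for kind, text in frames:
        if kind == "thought_start":
            thinking = True
        elif kind == "thought_end":
            thinking = False
        elif kind == "text":
            # Route to the thought buffer while inside the bracket.
            (thought if thinking else spoken).append(text)
    return "".join(thought), "".join(spoken)

turn = [
    ("thought_start", ""),
    ("text", "User wants the weather."),
    ("thought_end", ""),
    ("text", "It's sunny in Lisbon today."),
]
assert partition_turn(turn) == (
    "User wants the weather.",
    "It's sunny in Lisbon today.",
)
```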
+ +### LLMThoughtStartFrame + +Marks the beginning of LLM extended thinking content. + + + Whether to append thought content to the conversation context. Raises `ValueError` if set to `True` without specifying `llm`. + + + + Identifier for the LLM producing the thought. Required when `append_to_context` is `True`. + + +### LLMThoughtEndFrame + +Marks the end of LLM extended thinking content. + + + Thought signature, if provided by the LLM. Anthropic models include a signature that must be preserved when appending thoughts back to context. + + +## Function Calling + +### FunctionCallInProgressFrame + +Indicates that a function call is currently executing. Inherits from `ControlFrame` and `UninterruptibleFrame`, ensuring it reaches downstream processors even during interruption. + + + Name of the function being called. + + + + Unique identifier for this tool call. + + + + Arguments passed to the function. + + + + Whether the function call should be cancelled if the user interrupts. + + +## TTS State + +### TTSStartedFrame + +Signals the beginning of a TTS audio response. + + + Identifier linking this TTS output to its originating context. + + +### TTSStoppedFrame + +Signals the end of a TTS audio response. + + + Identifier linking this TTS output to its originating context. + + +## Service Settings + +Runtime settings updates for LLM, TTS, STT, and other services. These let you change service configuration mid-conversation without rebuilding the pipeline. + +### ServiceUpdateSettingsFrame + +Base frame for runtime service settings updates. Inherits from `ControlFrame` and `UninterruptibleFrame`. + + + Dictionary of settings to update. + + + + Typed settings delta. Takes precedence over the `settings` dict when both are provided. + + + + Target a specific service instance. When `None`, the frame applies to the first matching service in the pipeline. + + +### LLMUpdateSettingsFrame + +Update LLM service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. 
+ + +### TTSUpdateSettingsFrame + +Update TTS service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. + + +### STTUpdateSettingsFrame + +Update STT service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. + + +## Audio Processing + +### VADParamsUpdateFrame + +Update Voice Activity Detection parameters at runtime. + + + New VAD parameters to apply. + + +### FilterControlFrame + +Base frame for audio filter control. Subclass this for custom filter commands. + + +### FilterUpdateSettingsFrame + +Update audio filter settings. Inherits from `FilterControlFrame`. + + + Filter settings to update. + + +### FilterEnableFrame + +Enable or disable an audio filter. Inherits from `FilterControlFrame`. + + + `True` to enable the filter, `False` to disable it. + + +### MixerControlFrame + +Base frame for audio mixer control. + + +### MixerUpdateSettingsFrame + +Update audio mixer settings. Inherits from `MixerControlFrame`. + + + Mixer settings to update. + + +### MixerEnableFrame + +Enable or disable an audio mixer. Inherits from `MixerControlFrame`. + + + `True` to enable the mixer, `False` to disable it. + + +## Service Switching + +### ServiceSwitcherFrame + +Base frame for service switching operations. + + +### ManuallySwitchServiceFrame + +Request a manual switch to a different service instance. Inherits from `ServiceSwitcherFrame`. + + + The service to switch to. + + +### ServiceSwitcherRequestMetadataFrame + +Request that a service re-emit its metadata. Useful after switching services to ensure downstream processors have current configuration. + + + The service to request metadata from. + + +## Task Frames + +Task frames are pushed upstream to the pipeline task, which converts them into the appropriate downstream frame. This indirection lets processors request pipeline-level actions without needing direct access to the pipeline task. + +### TaskFrame + +Base frame for task control. Inherits from `ControlFrame`. 
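The conversion the pipeline task performs can be sketched with stand-in classes. These are illustrative, not pipecat's actual implementation, but they mirror the mapping described below: `EndTaskFrame` becomes `EndFrame`, `StopTaskFrame` becomes `StopFrame`.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-ins for the real pipecat frames, for illustration only.
@dataclass
class EndTaskFrame:
    reason: Optional[str] = None

@dataclass
class StopTaskFrame:
    pass

@dataclass
class EndFrame:
    reason: Optional[str] = None

@dataclass
class StopFrame:
    pass

def convert_task_frame(frame):
    """Mirror the pipeline task's upstream-to-downstream conversion,
    preserving the shutdown reason when one is given."""
    if isinstance(frame, EndTaskFrame):
        return EndFrame(reason=frame.reason)
    if isinstance(frame, StopTaskFrame):
        return StopFrame()
    return frame

converted = convert_task_frame(EndTaskFrame(reason="user hung up"))
assert isinstance(converted, EndFrame) and converted.reason == "user hung up"
```

This indirection is what lets a processor deep in the pipeline request shutdown by pushing upstream, without holding a reference to the pipeline task itself.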
+ + +### EndTaskFrame + +Request graceful pipeline shutdown. The pipeline task converts this into an `EndFrame` and pushes it downstream. Inherits from `TaskFrame` and `UninterruptibleFrame`. + + + Optional reason for the shutdown request. + + +### StopTaskFrame + +Request pipeline stop while keeping processors alive. Converted to a `StopFrame` downstream. Inherits from `TaskFrame` and `UninterruptibleFrame`. diff --git a/server/frames/data-frames.mdx b/server/frames/data-frames.mdx new file mode 100644 index 00000000..2b202da7 --- /dev/null +++ b/server/frames/data-frames.mdx @@ -0,0 +1,278 @@ +--- +title: "Data Frames" +description: "Reference for DataFrame types: audio, image, text, transcription, and transport messages" +--- + +## Overview + +DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order, and any pending data frames are discarded when a user interrupts. See the [Frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. + +```python +from pipecat.frames.frames import TextFrame, OutputAudioRawFrame, TTSSpeakFrame +``` + +## Audio Frames + +These frames carry raw audio through the pipeline toward the output transport. Each inherits the `audio`, `sample_rate`, `num_channels`, and `num_frames` fields from the [`AudioRawFrame`](/server/frames/overview#audiorawframe) mixin. + +### OutputAudioRawFrame + +A chunk of raw audio destined for the output transport. Use the inherited `transport_destination` field when your transport supports multiple audio tracks. + +Inherits from `DataFrame`, `AudioRawFrame`. + + +### TTSAudioRawFrame + +Audio generated by a TTS service, ready for playback. + +Inherits from `OutputAudioRawFrame`. + + + Identifier for the TTS context that generated this audio. + + +### SpeechOutputAudioRawFrame + +Audio from a continuous speech stream. 
The stream may contain silence frames intermixed with speech, so downstream processors may need to distinguish between the two. + +Inherits from `OutputAudioRawFrame`. + + +## Image Frames + +Frames for carrying image data to the output transport. Each inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/server/frames/overview#imagerawframe) mixin. + +### OutputImageRawFrame + +An image for display by the output transport. Supports the `transport_destination` field for transports with multiple video tracks. + +Inherits from `DataFrame`, `ImageRawFrame`. + + + +The `sync_with_audio` field (default `False`) is set internally, not via the constructor. When `True`, the image is queued with audio frames so it displays only after all preceding audio has been sent. When `False`, the transport displays it immediately. + + +### URLImageRawFrame + +An output image with an associated download URL, typically from a third-party image generation service. + +Inherits from `OutputImageRawFrame`. + + + URL where the image can be downloaded. + + +### AssistantImageRawFrame + +An image generated by the assistant for both display and inclusion in LLM context. The superclass handles display; the additional fields here carry the original image data in a format suitable for direct use in LLM context messages. + +Inherits from `OutputImageRawFrame`. + + + Original image data for use in LLM context messages without further encoding. + + + + MIME type of the original image data. + + +### SpriteFrame + +An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport's `camera_out_framerate` parameter. + +Inherits from `DataFrame`. + + + Ordered list of image frames that make up the sprite animation. + + +## Text Frames + +Text content at various stages of processing: raw text, LLM output, aggregated results, and TTS input. + +### TextFrame + +The fundamental text container. 
Emitted by LLM services, consumed by context aggregators, TTS services, and other processors. + +Inherits from `DataFrame`. + + + The text content. + + + +Several non-constructor fields control downstream behavior: +- `skip_tts` (default `None`): when set, tells the TTS service to skip this text +- `includes_inter_frame_spaces` (default `False`): indicates whether leading/trailing spaces are already included +- `append_to_context` (default `True`): whether this text should be appended to the LLM context + + +### LLMTextFrame + +Text generated by an LLM service. Behaves like a `TextFrame` with `includes_inter_frame_spaces` set to `True`, since LLM services include all necessary spacing. + +Inherits from `TextFrame`. + + +### AggregatedTextFrame + +Multiple text frames combined into a single frame for processing or output. + +Inherits from `TextFrame`. + + + Method used to aggregate the text frames. + + + + Identifier for the TTS context associated with this text. + + +### VisionTextFrame + +Text output from a vision model. Functionally identical to `LLMTextFrame` but distinguished by type for routing purposes. + +Inherits from `LLMTextFrame`. + + +### TTSTextFrame + +Text that has been sent to a TTS service for synthesis. + +Inherits from `AggregatedTextFrame`. + + + Identifier for the TTS context that generated this text. + + +## Transcription Frames + +Frames produced by speech-to-text services at different stages of recognition: interim results, final transcriptions, and translations. + +### TranscriptionFrame + +A non-interim transcription result from an STT service: the service's best recognition of what the user said, as opposed to the streaming partial results in `InterimTranscriptionFrame`. + +Inherits from `TextFrame`. + + + Identifier for the user who spoke. + + + + When the transcription occurred. + + + + Detected or specified language of the speech. + + + + Raw result object from the STT service. 
+ + + + Whether the STT service has explicitly committed this transcription via a finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics) support this; others don't, so it defaults to `False`. Turn detection strategies can use this flag to trigger the bot's response immediately rather than waiting for a timeout. + + +### InterimTranscriptionFrame + +A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by a `TranscriptionFrame` once the STT service produces its result. + +Inherits from `TextFrame`. + + + The partial transcription text. + + + + Identifier for the user who spoke. + + + + When the interim transcription occurred. + + + + Detected or specified language of the speech. + + + + Raw result object from the STT service. + + +### TranslationFrame + +A translated transcription, typically placed in the transport's receive queue when a participant speaks in a different language. + +Inherits from `TextFrame`. + + + Identifier for the user who spoke. + + + + When the translation occurred. + + + + Target language of the translation. + + +## LLM Context Timestamp + +### LLMContextAssistantTimestampFrame + +Carries timestamp information for assistant messages in the LLM context. Used internally to track when assistant responses were generated. + +Inherits from `DataFrame`. + + + Timestamp when the assistant message was created. + + +## TTS Frames + +### TTSSpeakFrame + +Sends text to the pipeline's TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for each `TTSSpeakFrame`, whereas `TextFrame`s produced during an LLM response are grouped under the same turn context. + +Inherits from `DataFrame`. + + + The text to be spoken. + + + + Whether to append the spoken text to the LLM context. 
+ + +## Transport Message Frames + +### OutputTransportMessageFrame + +A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation. + +Inherits from `DataFrame`. + + + The transport message payload. + + +## DTMF Frames + +### OutputDTMFFrame + +A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits the `button` field from the `DTMFFrame` mixin, which holds the keypad entry that was pressed. + +Inherits from `DTMFFrame`, `DataFrame`. + + + The DTMF keypad entry to send. + + + +For transports that support multiple dial-out destinations, set the `transport_destination` field (inherited from `Frame`) to specify which destination receives the DTMF tone. + diff --git a/server/frames/llm-frames.mdx b/server/frames/llm-frames.mdx new file mode 100644 index 00000000..47ae8c55 --- /dev/null +++ b/server/frames/llm-frames.mdx @@ -0,0 +1,303 @@ +--- +title: "LLM Frames" +description: "Reference for LLM-related frames: context management, messages, thinking, tool configuration, and function calling" +--- + +This page collects all frames related to LLM operations. These frames span multiple base types: some are DataFrames (message content), some are ControlFrames (response boundaries, settings), and a few are SystemFrames (function call lifecycle). They are grouped here by function rather than by base class. See the [frames overview](/server/frames/overview) for base class behavior, interruption rules, and properties common to all frames. + +```python +from pipecat.frames.frames import ( + LLMContextFrame, + LLMMessagesAppendFrame, + LLMMessagesUpdateFrame, + LLMRunFrame, + FunctionCallResultFrame, +) +``` + +## Context Management + +Frames that create, modify, or trigger processing of the LLM conversation context. + +### LLMContextFrame + +Contains a complete LLM context. Acts as a signal to LLM services to ingest the provided context and generate a response. 
+ +Inherits directly from `Frame` (not `DataFrame`, `ControlFrame`, or `SystemFrame`). + + + The LLM context containing messages, tools, and configuration. + + +### LLMMessagesAppendFrame + +Appends messages to the current conversation context without replacing existing ones. + +Inherits from `DataFrame`. + + + List of message dictionaries to append. + + + + Whether the LLM should process the updated context immediately. When `None`, the default behavior of the context aggregator applies. + + +### LLMMessagesUpdateFrame + +Replaces the current context messages entirely with a new set. + +Inherits from `DataFrame`. + + + List of message dictionaries to replace the current context. + + + + Whether the LLM should process the updated context immediately. When `None`, the default behavior of the context aggregator applies. + + +### LLMRunFrame + +Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled. + +Inherits from `DataFrame`. + + +### LLMContextAssistantTimestampFrame + +Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context. + +Inherits from `DataFrame`. + + + Timestamp when the assistant message was created. + + +## Response Boundaries + +These frames bracket LLM output. Between start and end, the LLM emits `TextFrame`s (or `LLMThoughtTextFrame`s for thinking content). They are fully documented on the control frames page and cross-referenced here. + +### LLMFullResponseStartFrame + +Marks the beginning of an LLM response. See [Control Frames](/server/frames/control-frames#llm-response-boundaries). + +Inherits from `ControlFrame`. + +### LLMFullResponseEndFrame + +Marks the end of an LLM response. See [Control Frames](/server/frames/control-frames#llm-response-boundaries). + +Inherits from `ControlFrame`. 
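Because thought text is deliberately not a `TextFrame`, a consumer can split a turn's output by frame type alone, with no boundary tracking. A sketch with stand-in classes (not the real pipecat frames):

```python
from dataclasses import dataclass

# Stand-ins for illustration; the real frames carry more fields.
@dataclass
class TextFrame:
    text: str

@dataclass
class LLMThoughtTextFrame:
    # Intentionally NOT a TextFrame subclass, mirroring pipecat's design.
    text: str

def split_response(frames):
    """Separate spoken text from reasoning text by frame type alone."""
    spoken = "".join(f.text for f in frames if isinstance(f, TextFrame))
    thought = "".join(f.text for f in frames if isinstance(f, LLMThoughtTextFrame))
    return spoken, thought

frames = [
    LLMThoughtTextFrame("Check the calendar first. "),
    TextFrame("You're free at 3pm "),
    TextFrame("on Thursday."),
]
assert split_response(frames) == (
    "You're free at 3pm on Thursday.",
    "Check the calendar first. ",
)
```

This is the same property that keeps TTS services and text aggregators from ever speaking or capturing reasoning content: their `isinstance` checks against `TextFrame` simply never match.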
+ +## Thinking / Extended Reasoning + +LLMs that support chain-of-thought reasoning (such as Anthropic's extended thinking) emit thought frames between response boundaries. Thought text is intentionally not a `TextFrame` subclass, which prevents it from triggering TTS or being captured by text aggregators. + +### LLMThoughtStartFrame + +Marks the beginning of extended thinking content. See [Control Frames](/server/frames/control-frames#llm-thought-frames) for full parameter documentation. + +Inherits from `ControlFrame`. + +### LLMThoughtTextFrame + +A chunk of thought or reasoning text from the LLM. + +Inherits from `DataFrame`. + + + The text (or text chunk) of the thought. + + + +This is a `DataFrame`, not a `TextFrame` subclass. TTS services and text aggregators will not process it. + + +### LLMThoughtEndFrame + +Marks the end of extended thinking content. See [Control Frames](/server/frames/control-frames#llm-thought-frames) for full parameter documentation. + +Inherits from `ControlFrame`. + +## Tool Configuration + +Frames for configuring LLM function calling behavior and output settings at runtime. + +### LLMSetToolsFrame + +Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider. + +Inherits from `DataFrame`. + + + List of tool/function definitions for the LLM. + + +### LLMSetToolChoiceFrame + +Configures how the LLM selects tools during function calling. + +Inherits from `DataFrame`. + + + Tool choice setting: `"none"` disables tool use, `"auto"` lets the LLM decide, `"required"` forces a tool call, or a dict specifying a particular tool. + + +### LLMEnablePromptCachingFrame + +Toggles prompt caching for LLMs that support it. + +Inherits from `DataFrame`. + + + Whether to enable prompt caching. + + +### LLMConfigureOutputFrame + +Configures how the LLM produces output. 
Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud. + +Inherits from `DataFrame`. + + + When `True`, LLM tokens are added to context but not passed to TTS. + + +## Function Calling + +Function calling involves frames from multiple base types. `FunctionCallInProgressFrame` and `FunctionCallResultFrame` are both uninterruptible: once a function call starts, its progress and result must reach the context aggregator to keep the conversation consistent. + +### Helper Dataclasses + +These are plain dataclasses used as fields within function calling frames, not frames themselves. + +#### FunctionCallFromLLM + +Represents a function call returned by the LLM, ready for execution. + + + The name of the function to call. + + + + A unique identifier for the function call. + + + + The arguments to pass to the function. + + + + The LLM context at the time the function call was made. + + +#### FunctionCallResultProperties + +Configures how a function call result is handled after execution. + + + Whether to run the LLM after receiving this result. + + + + Async callback to execute when the context is updated with the result. + + +### Frames + +#### FunctionCallInProgressFrame + +Indicates that a function call is currently executing. See [Control Frames](/server/frames/control-frames#function-calling) for full parameter documentation. + +Inherits from `ControlFrame`, `UninterruptibleFrame`. + +#### FunctionCallResultFrame + +Contains the result of a completed function call execution. Uninterruptible to ensure the result always reaches the context aggregator. + +Inherits from `DataFrame`, `UninterruptibleFrame`. + + + Name of the function that was executed. + + + + Unique identifier for the function call. + + + + Arguments that were passed to the function. + + + + The result returned by the function. + + + + Whether to run the LLM after this result. Overrides the default behavior. 
+ + + + Additional properties for result handling. + + +#### FunctionCallsStartedFrame + +Signals that one or more function calls are about to begin executing. As a system frame, this is never discarded during interruption. + +Inherits from `SystemFrame`. + + + Sequence of function calls that will be executed. + + +#### FunctionCallCancelFrame + +Signals that a function call was cancelled, typically due to user interruption when the function's `cancel_on_interruption` flag is set. + +Inherits from `SystemFrame`. + + + Name of the function that was cancelled. + + + + Unique identifier for the cancelled function call. + + +## Context Summarization + +Frames that coordinate automatic context compression to stay within token limits. These are fully documented on the control frames page and listed here for cross-reference. + +### LLMSummarizeContextFrame + +Triggers manual context summarization. See [Control Frames](/server/frames/control-frames#llm-context-summarization). + +Inherits from `ControlFrame`. + +### LLMContextSummaryRequestFrame + +Internal request from the aggregator to the LLM service for a summary. See [Control Frames](/server/frames/control-frames#llm-context-summarization). + +Inherits from `ControlFrame`. + +### LLMContextSummaryResultFrame + +The LLM's summarization result, delivered back to the aggregator. See [Control Frames](/server/frames/control-frames#llm-context-summarization). + +Inherits from `ControlFrame`, `UninterruptibleFrame`. + +## Settings + +### LLMUpdateSettingsFrame + +Updates LLM service settings at runtime. Inherits the `settings`, `delta`, and `service` parameters from `ServiceUpdateSettingsFrame`. See [Control Frames](/server/frames/control-frames#service-settings) for parameter details. + +Inherits from `ServiceUpdateSettingsFrame`. + +### LLMAssistantPushAggregationFrame + +Forces the assistant aggregator to commit its buffered text to context immediately, rather than waiting for the normal end-of-response boundary. 
See [Control Frames](/server/frames/control-frames#llm-response-boundaries). + +Inherits from `ControlFrame`. diff --git a/server/frames/overview.mdx b/server/frames/overview.mdx new file mode 100644 index 00000000..506ebae7 --- /dev/null +++ b/server/frames/overview.mdx @@ -0,0 +1,302 @@ +--- +title: "Frames" +description: "Understanding frames — the data containers that carry information through Pipecat pipelines" +--- + +## Overview + +Frames are the fundamental units of data in Pipecat. Every piece of information that moves through a pipeline — audio, text, images, control signals — is wrapped in a frame. Frame processors receive frames, act on them, and push new or modified frames along to the next processor. + +All frames inherit from the base `Frame` class and are Python [dataclasses](https://docs.python.org/3/library/dataclasses.html). + +```python +from pipecat.frames.frames import Frame, TextFrame, TTSAudioRawFrame +``` + +## Frame Categories + +Pipecat has three base frame types, each with different processing behavior: + +| Base Type | Processing | Interruption Behavior | +|-----------|-----------|----------------------| +| `DataFrame` | Queued, processed in order | Cancelled on user interruption | +| `ControlFrame` | Queued, processed in order | Cancelled on user interruption | +| `SystemFrame` | Higher priority, processed in order | **Not** cancelled on user interruption | + +### DataFrame + +Data frames carry the main content flowing through a pipeline: audio chunks, text, images, and LLM messages. They are queued and processed in order. If a user interrupts (starts speaking while the bot is responding), any pending data frames are discarded so the new input can be handled immediately. + +```python +@dataclass +class DataFrame(Frame): + """Processed in order. 
Cancelled by user interruptions.""" + pass +``` + +Examples: `TextFrame`, `OutputAudioRawFrame`, `LLMMessagesFrame`, `TTSSpeakFrame` + +### ControlFrame + +Control frames signal processing boundaries and configuration changes: response start/end markers, settings updates, and state transitions. They follow the same ordering and interruption rules as data frames. + +```python +@dataclass +class ControlFrame(Frame): + """Processed in order. Cancelled by user interruptions.""" + pass +``` + +Examples: `EndFrame`, `LLMFullResponseStartFrame`, `TTSStartedFrame`, `ServiceUpdateSettingsFrame` + +### SystemFrame + +System frames are high-priority signals that must always be delivered: interruptions, user input, error notifications, and pipeline lifecycle events. Unlike data and control frames, they are never discarded when a user interrupts. + +```python +@dataclass +class SystemFrame(Frame): + """Higher priority. Not cancelled by user interruptions.""" + pass +``` + +Examples: `StartFrame`, `CancelFrame`, `StartInterruptionFrame`, `UserStartedSpeakingFrame`, `InputAudioRawFrame` + +## UninterruptibleFrame Mixin + +Occasionally a data or control frame is too important to discard during an interruption. Adding the `UninterruptibleFrame` mixin protects it: the frame stays in internal queues and any task processing it will not be cancelled. + +```python +@dataclass +class FunctionCallResultFrame(DataFrame, UninterruptibleFrame): + """Must be delivered even if the user interrupts.""" + ... +``` + +Examples: `EndFrame`, `StopFrame`, `FunctionCallResultFrame`, `FunctionCallInProgressFrame` + +## Frame Properties + +Every frame has these properties set automatically: + + + Unique identifier for the frame instance. + + + + Human-readable name combining class name and instance count (e.g., `TextFrame#3`). Useful for debugging. + + + + Presentation timestamp in nanoseconds. Used for audio/video synchronization. + + + + Dictionary for arbitrary frame metadata. 
+ + + + Name of the transport source that created this frame. + + + + Name of the transport destination for this frame. Used when a transport supports multiple output tracks. + + +## Frame Direction + +Frames flow through the pipeline in one of two directions: + +```python +from pipecat.processors.frame_processor import FrameDirection + +class FrameDirection(Enum): + DOWNSTREAM = 1 # Input → Output (default) + UPSTREAM = 2 # Output → Input +``` + +**Downstream** is the default. In a typical voice AI pipeline, audio enters from the transport input, gets transcribed, runs through the LLM, converts to speech, and reaches the transport output. + +**Upstream** lets processors send information back toward the start of the pipeline. The most common example: the assistant context aggregator at the end of the pipeline pushes context frames upstream so they flow back to the LLM. + +### Pushing Frames + +Within a frame processor, call `push_frame()` to send a frame to the next processor: + +```python +# Push downstream (default) +await self.push_frame(frame, FrameDirection.DOWNSTREAM) + +# Push upstream +await self.push_frame(frame, FrameDirection.UPSTREAM) +``` + +### Broadcasting Frames + +To send a frame in **both** directions simultaneously, use `broadcast_frame()`: + +```python +# Create and push instances upstream and downstream +await self.broadcast_frame(UserStartedSpeakingFrame) +``` + +Each direction receives its own frame instance, linked by `broadcast_sibling_id`. + +## Mixins + +Beyond `UninterruptibleFrame`, frames use mixins to share common data structures across the hierarchy: + +### AudioRawFrame + +Carries raw audio fields shared by both input and output audio frames. + + + Raw audio bytes in PCM format. + + + + Audio sample rate in Hz (e.g., 16000). + + + + Number of audio channels (e.g., 1 for mono). + + + + Number of audio frames. Calculated automatically from the audio data. 
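The `num_frames` field above can be derived from the raw bytes. Here is a minimal sketch for 16-bit PCM, the common format in voice pipelines; only the field names come from the `AudioRawFrame` mixin documented above, while the 2-bytes-per-sample assumption and the `AudioChunk` stand-in class are illustrative:

```python
from dataclasses import dataclass

BYTES_PER_SAMPLE = 2  # assumes 16-bit PCM

@dataclass
class AudioChunk:
    """Illustrative stand-in, not the real AudioRawFrame."""

    audio: bytes
    sample_rate: int
    num_channels: int

    @property
    def num_frames(self) -> int:
        # One audio frame = one sample per channel.
        return len(self.audio) // (self.num_channels * BYTES_PER_SAMPLE)

    @property
    def duration_ms(self) -> float:
        return 1000 * self.num_frames / self.sample_rate

# 320 mono samples at 16 kHz is a typical 20 ms chunk.
chunk = AudioChunk(audio=b"\x00\x00" * 320, sample_rate=16000, num_channels=1)
print(chunk.num_frames, chunk.duration_ms)  # → 320 20.0
```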
+ + +### ImageRawFrame + +Carries raw image fields shared by both input and output image frames. + + + Raw image bytes. + + + + Image dimensions as (width, height). + + + + Image format (e.g., `"RGB"`, `"RGBA"`). + + +## Common Patterns + +Most frames are produced and consumed by Pipecat's built-in services. The patterns below cover the frames you're most likely to push yourself in application code. + +### Starting a Conversation + +Add an initial message to the context, then push `LLMRunFrame` to kick off processing: + +```python +@transport.event_handler("on_client_connected") +async def on_client_connected(transport, client): + context.add_message({"role": "user", "content": "Please introduce yourself."}) + await task.queue_frames([LLMRunFrame()]) +``` + +### Injecting a Prompt + +`LLMMessagesAppendFrame` adds messages to the context without replacing what's already there. Set `run_llm=True` to trigger a response immediately: + +```python +message = { + "role": "user", + "content": "The user has been quiet. Ask if they're still there.", +} +await aggregator.push_frame(LLMMessagesAppendFrame([message], run_llm=True)) +``` + +### Speaking Without the LLM + +`TTSSpeakFrame` sends text directly to the TTS service as a standalone utterance, bypassing the LLM entirely: + +```python +@llm.event_handler("on_function_calls_started") +async def on_function_calls_started(service, function_calls): + await tts.queue_frame(TTSSpeakFrame("Let me check on that.")) +``` + +### Ending a Conversation + +Push `EndTaskFrame` upstream to gracefully shut down the pipeline. Pair it with a `TTSSpeakFrame` to say goodbye first: + +```python +await aggregator.push_frame( + TTSSpeakFrame("It seems like you're busy. 
Have a nice day!") +) +await aggregator.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM) +``` + +### Changing Service Settings at Runtime + +Push settings frames to adjust LLM, TTS, or STT configuration mid-conversation: + +```python +await task.queue_frame( + LLMUpdateSettingsFrame(delta=OpenAILLMService.Settings(temperature=0.1)) +) +``` + +### Updating Tools at Runtime + +Add or replace available function-calling tools while the conversation is active: + +```python +new_tools = ToolsSchema( + standard_tools=[weather_function, restaurant_function] +) +await task.queue_frames([LLMSetToolsFrame(tools=new_tools)]) +``` + +### Playing Sound Effects + +Load audio files and push `OutputAudioRawFrame` directly from a custom processor: + +```python +with wave.open("ding.wav") as f: + ding = OutputAudioRawFrame(f.readframes(-1), f.getframerate(), f.getnchannels()) + +class SoundEffect(FrameProcessor): + async def process_frame(self, frame, direction): + await super().process_frame(frame, direction) + if isinstance(frame, LLMFullResponseEndFrame): + await self.push_frame(ding) + await self.push_frame(frame, direction) +``` + +### Reacting to LLM Response Boundaries + +`LLMFullResponseStartFrame` and `LLMFullResponseEndFrame` bracket every LLM response. Custom processors can watch for these to trigger side effects: + +```python +class ResponseLogger(FrameProcessor): + async def process_frame(self, frame, direction): + await super().process_frame(frame, direction) + if isinstance(frame, LLMFullResponseStartFrame): + logger.info("LLM response started") + elif isinstance(frame, LLMFullResponseEndFrame): + logger.info("LLM response finished") + await self.push_frame(frame, direction) +``` + +## Frame Type Reference + +The individual reference pages below document every frame class, organized by function: + + + + Audio, image, text, transcription, and transport message frames that carry content through the pipeline. 
+ + + Pipeline lifecycle, LLM response boundaries, TTS state, service settings, and filter/mixer configuration. + + + Interruptions, user/bot speaking state, VAD events, errors, metrics, and raw input frames. + + + LLM context management, message manipulation, thinking/reasoning, tool configuration, and function calling. + + diff --git a/server/frames/system-frames.mdx b/server/frames/system-frames.mdx new file mode 100644 index 00000000..3fdf7ede --- /dev/null +++ b/server/frames/system-frames.mdx @@ -0,0 +1,477 @@ +--- +title: "System Frames" +description: "Reference for SystemFrame types: pipeline lifecycle, interruptions, speaking state, input, and diagnostics" +--- + +SystemFrames have higher priority than data and control frames and are never cancelled during user interruptions. They carry signals that must always be delivered: pipeline startup and teardown, error notifications, user input, and speaking state changes. See the [frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. + +```python +from pipecat.frames.frames import StartFrame, CancelFrame, ErrorFrame, InterruptionFrame +``` + +## Pipeline Lifecycle + +### StartFrame + +The first frame pushed into a pipeline, initializing all processors. Every processor receives this before any data or control frames arrive. + +Inherits from `SystemFrame`. + + + Input audio sample rate in Hz. + + + + Output audio sample rate in Hz. + + + + Whether user interruptions are allowed. Deprecated since 0.0.99: use interruption strategies instead. + + + + Enable performance metrics collection from processors. + + + + Enable tracing for pipeline execution. + + + + Enable usage metrics (token counts, API calls) from services. + + + + List of interruption strategies for the pipeline. Deprecated since 0.0.99. + + + + When `True`, only report time-to-first-byte for the initial response rather than every response. 
+ + + + Optional tracing context for distributed tracing integration. + + +### CancelFrame + +Stops the pipeline immediately, skipping any queued frames. Use this when you need to abort without waiting for pending work to drain. + +Inherits from `SystemFrame`. + + + Optional reason for the cancellation. + + +## Errors + +### ErrorFrame + +Carries an error notification, typically pushed upstream so earlier processors can react. + +Inherits from `SystemFrame`. + + + Human-readable error message. + + + + Whether this error is fatal and requires the bot to shut down. + + + + The processor that raised the error. + + + + The underlying exception, if one was caught. + + +### FatalErrorFrame + +An unrecoverable error requiring the bot to shut down. The `fatal` field is always `True`. + +Inherits from `ErrorFrame`. + + +## Processor Pause/Resume (Urgent) + +These are the system-frame variants of `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`. As system frames, they flow through the high-priority input queue rather than the process queue, so they are not blocked by paused state or buffered frames. This makes `FrameProcessorResumeUrgentFrame` the correct way to resume a processor externally — the control frame variant (`FrameProcessorResumeFrame`) would get stuck behind any data frames that queued up during the pause. See [Control Frames](/server/frames/control-frames#processor-pauseresume) for the full explanation. + +### FrameProcessorPauseUrgentFrame + +Pauses a processor immediately, without waiting for queued frames to drain first. + + + The processor to pause. + + +### FrameProcessorResumeUrgentFrame + +Resumes a paused processor immediately, releasing buffered frames. Use this instead of `FrameProcessorResumeFrame` when the processor may have frames queued up. + + + The processor to resume. + + +## Interruptions + +### InterruptionFrame + +Interrupts the pipeline, discarding pending data and control frames. 
Typically triggered when the user starts speaking during a bot response. + +Inherits from `SystemFrame`. + + +## User Speaking State and Mute + +### User Mute + +User mute is a system for temporarily suppressing user input during specific periods. When muted, the `LLMUserAggregator` drops incoming user frames entirely: `InputAudioRawFrame`, `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, VAD signals, and `InterruptionFrame`. Lifecycle frames (`StartFrame`, `EndFrame`, `CancelFrame`) are never muted. + +Muting is controlled by one or more **strategies** passed to `LLMUserAggregatorParams.user_mute_strategies`. If any active strategy says "mute", the user is muted. All strategies must return `False` for unmuting to occur. Built-in strategies include: + +- **`AlwaysUserMuteStrategy`** — mutes whenever the bot is speaking +- **`FirstSpeechUserMuteStrategy`** — mutes only during the bot's first speaking turn +- **`MuteUntilFirstBotCompleteUserMuteStrategy`** — mutes from pipeline start until the bot finishes its first response +- **`FunctionCallUserMuteStrategy`** — mutes while function calls are in progress + +When mute state changes, `UserMuteStartedFrame` or `UserMuteStoppedFrame` is broadcast in both directions. + +### UserStartedSpeakingFrame + +Indicates that a user turn has begun. By this point, transcriptions are usually already flowing through the pipeline. + +Inherits from `SystemFrame`. + + + Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99. + + +### UserStoppedSpeakingFrame + +Marks the end of a user turn. The bot's response is triggered separately by the turn detection system. + +Inherits from `SystemFrame`. + + + Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99. + + +### UserSpeakingFrame + +Emitted by the VAD processor while the user is actively speaking. Useful for UI feedback or suppressing idle timeouts. 
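As a sketch of that idle-suppression use, a watchdog could treat continuous speaking frames as conversation activity. The frame classes here are hypothetical stand-ins, not the real Pipecat types:

```python
import time

# Hypothetical stand-ins for the real frame types (illustration only).
class UserSpeakingFrame: pass
class BotSpeakingFrame: pass

class IdleWatchdog:
    """Treats any speaking frame, user or bot, as conversation activity."""

    def __init__(self, timeout_secs: float):
        self.timeout_secs = timeout_secs
        self._last_activity = time.monotonic()

    def observe(self, frame) -> None:
        # Continuous speaking frames reset the idle clock.
        if isinstance(frame, (UserSpeakingFrame, BotSpeakingFrame)):
            self._last_activity = time.monotonic()

    def is_idle(self) -> bool:
        return time.monotonic() - self._last_activity >= self.timeout_secs

watchdog = IdleWatchdog(timeout_secs=5.0)
watchdog.observe(UserSpeakingFrame())
print(watchdog.is_idle())  # → False
```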
+ +Inherits from `SystemFrame`. + + +### UserMuteStartedFrame + +Broadcast when one or more [user mute strategies](#user-mute) activate. While muted, the `LLMUserAggregator` silently drops user-originating frames: audio input, transcriptions (interim and final), VAD speaking signals, and interruption attempts. Other frame types continue to flow normally. + +Inherits from `SystemFrame`. + + +### UserMuteStoppedFrame + +Broadcast when all active user mute strategies deactivate, allowing user input to be processed again. + +Inherits from `SystemFrame`. + + +## VAD Events + +These frames are emitted directly by the Voice Activity Detection (VAD) processor and carry timing metadata. Higher-level speaking-state frames (`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`) are derived from these. + +### VADUserStartedSpeakingFrame + +VAD confirmed that speech has started. + +Inherits from `SystemFrame`. + + + Timestamp in seconds when speech onset was detected. + + + + Wall-clock time when the frame was created. + + +### VADUserStoppedSpeakingFrame + +VAD confirmed that speech has ended. + +Inherits from `SystemFrame`. + + + Timestamp in seconds when speech ended. + + + + Wall-clock time when the frame was created. + + +### SpeechControlParamsFrame + +Notifies processors that VAD or turn detection parameters have changed at runtime. + +Inherits from `SystemFrame`. + + + Updated VAD parameters. + + + + Updated turn detection parameters. + + +## Bot Speaking State + +### BotStartedSpeakingFrame + +Emitted by the output transport when the bot begins speaking. Broadcast in both directions so processors on either side of the transport can react. + +Inherits from `SystemFrame`. + + +### BotStoppedSpeakingFrame + +Emitted by the output transport when the bot finishes speaking. Also broadcast in both directions. + +Inherits from `SystemFrame`. + + +### BotSpeakingFrame + +Emitted continuously while the bot is speaking. 
Processors can use this to suppress idle timeouts or drive visual indicators. + +Inherits from `SystemFrame`. + + +## Connection Status + +### BotConnectedFrame + +The bot has joined the transport room. Only relevant for SFU-based transports: Daily, LiveKit, HeyGen, and Tavus. + +Inherits from `SystemFrame`. + + +### ClientConnectedFrame + +A client or participant has connected to the transport. + +Inherits from `SystemFrame`. + + +## Input Frames + +Input frames carry raw data from transport sources into the pipeline. They inherit from `SystemFrame` so they are never discarded during interruptions — incoming user data must always be processed. + +### InputAudioRawFrame + +Raw audio received from the transport. Inherits the `audio`, `sample_rate`, `num_channels`, and `num_frames` fields from the [`AudioRawFrame`](/server/frames/overview#audiorawframe) mixin. + +Inherits from `SystemFrame`, `AudioRawFrame`. + + +### UserAudioRawFrame + +Audio from a specific user in a multi-participant session. + +Inherits from `InputAudioRawFrame`. + + + Identifier for the user who produced this audio. + + +### InputImageRawFrame + +Raw image received from the transport. Inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/server/frames/overview#imagerawframe) mixin. + +Inherits from `SystemFrame`, `ImageRawFrame`. + + +### UserImageRawFrame + +An image from a specific user, optionally tied to a pending image request. + +Inherits from `InputImageRawFrame`. + + + Identifier for the user who produced this image. + + + + Optional text associated with the image. + + + + Whether to append this image to the LLM context. + + + + The original request frame that triggered this image capture. + + +### InputTextRawFrame + +Text received from the transport, such as a user typing in a chat interface. Inherits the `text` field from `TextFrame`. + +Inherits from `SystemFrame`, `TextFrame`. + + +## DTMF Input + +### InputDTMFFrame + +A DTMF keypress received from the transport. 
Inherits the `button` field from the `DTMFFrame` mixin. + +Inherits from `DTMFFrame`, `SystemFrame`. + + +### OutputDTMFUrgentFrame + +A DTMF keypress for immediate output, bypassing the normal frame queue. + +Inherits from `DTMFFrame`, `SystemFrame`. + + +## Transport Messages + +### InputTransportMessageFrame + +A message received from an external transport. The message format is transport-specific. + +Inherits from `SystemFrame`. + + + The transport message payload. + + +### OutputTransportMessageUrgentFrame + +An outbound transport message that bypasses the normal queue for immediate delivery. + +Inherits from `SystemFrame`. + + + The transport message payload. + + +## User Interaction + +### UserImageRequestFrame + +Requests an image from a specific user, typically to capture a camera frame for vision processing. + +Inherits from `SystemFrame`. + + + Identifier for the user to capture from. + + + + Optional text prompt associated with the image request. + + + + Whether to append the resulting image to the LLM context. + + + + Specific video source to capture from. + + + + Function name if this request originated from a tool call. + + + + Tool call identifier if this request originated from a tool call. + + + + Callback to invoke with the captured image result. + + +### STTMuteFrame + +Mutes or unmutes the STT service. While muted, incoming audio is not sent to the STT provider. + +Inherits from `SystemFrame`. + + + `True` to mute, `False` to unmute. + + +### UserIdleTimeoutUpdateFrame + +Updates the user idle timeout at runtime. Set to `0` to disable idle detection entirely. + +Inherits from `SystemFrame`. + + + New idle timeout in seconds. `0` disables detection. + + +## Diagnostics + +### MetricsFrame + +Performance metrics collected from processors. Emitted when metrics reporting is enabled via `StartFrame`. + +Inherits from `SystemFrame`. + + + List of metrics data entries. 
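Downstream consumers can aggregate these entries however they like. A sketch that averages time-to-first-byte per processor; the flat `(processor, metric, value)` tuple shape is an assumption for illustration, not the real `MetricsData` schema:

```python
from collections import defaultdict

# Assumed flat entry shape for illustration: (processor_name, metric_name, value).
entries = [
    ("llm", "ttfb", 0.42),
    ("tts", "ttfb", 0.18),
    ("llm", "ttfb", 0.38),
]

def average_ttfb(metrics):
    samples = defaultdict(list)
    for processor, metric, value in metrics:
        if metric == "ttfb":
            samples[processor].append(value)
    return {name: round(sum(vals) / len(vals), 3) for name, vals in samples.items()}

print(average_ttfb(entries))  # → {'llm': 0.4, 'tts': 0.18}
```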
+ + +## Service Metadata + +### ServiceMetadataFrame + +Base metadata frame broadcast by services at startup, providing information about service capabilities and configuration. + +Inherits from `SystemFrame`. + + + Name of the service that emitted this metadata. + + +### STTMetadataFrame + +Metadata from an STT service, including latency characteristics used for turn detection tuning. + +Inherits from `ServiceMetadataFrame`. + + + P99 latency in seconds for time-to-final-segment. Used by turn detectors to calibrate wait times. + + +## Task Frames + +Task frames provide a system-priority mechanism for requesting pipeline actions from outside the normal frame flow. They are converted into their corresponding standard frames when processed. + +### TaskSystemFrame + +Base class for system-priority task frames. + +Inherits from `SystemFrame`. + + +### CancelTaskFrame + +Requests immediate pipeline cancellation. Converted to a `CancelFrame` when processed by the pipeline. + +Inherits from `TaskSystemFrame`. + + + Optional reason for the cancellation request. + + +### InterruptionTaskFrame + +Requests a pipeline interruption. Converted to an `InterruptionFrame` when processed. + +Inherits from `TaskSystemFrame`. 
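The conversion described above can be sketched with stand-in dataclasses. The `convert` helper is illustrative; the real pipeline task performs this mapping internally:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the real frame types (illustration only).
@dataclass
class CancelFrame:
    reason: Optional[str] = None

@dataclass
class InterruptionFrame:
    pass

@dataclass
class CancelTaskFrame:
    reason: Optional[str] = None

@dataclass
class InterruptionTaskFrame:
    pass

def convert(task_frame):
    """Map a task frame to the standard frame the pipeline would emit."""
    if isinstance(task_frame, CancelTaskFrame):
        return CancelFrame(reason=task_frame.reason)
    if isinstance(task_frame, InterruptionTaskFrame):
        return InterruptionFrame()
    raise TypeError(f"not a task frame: {task_frame!r}")

print(convert(CancelTaskFrame(reason="shutting down")))
# → CancelFrame(reason='shutting down')
```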
From 41b991997d015a19c36a698ea422a371475d5631 Mon Sep 17 00:00:00 2001 From: Mark Backman Date: Wed, 25 Mar 2026 15:56:36 -0400 Subject: [PATCH 2/2] Code review fixes with Claude --- server/frames/control-frames.mdx | 75 +++++---- server/frames/data-frames.mdx | 208 ++++++++++++++++++------ server/frames/llm-frames.mdx | 271 ++----------------------------- server/frames/overview.mdx | 112 +++++++------ server/frames/system-frames.mdx | 211 ++++++++++++------------ 5 files changed, 377 insertions(+), 500 deletions(-) diff --git a/server/frames/control-frames.mdx b/server/frames/control-frames.mdx index c4cf01f2..73b4a987 100644 --- a/server/frames/control-frames.mdx +++ b/server/frames/control-frames.mdx @@ -3,13 +3,15 @@ title: "Control Frames" description: "Reference for ControlFrame types: pipeline lifecycle, response boundaries, service settings, and runtime configuration" --- -ControlFrames are queued and processed in order alongside data frames. They signal boundaries, state changes, and configuration updates within the pipeline. Unlike system frames, control frames respect ordering guarantees — they won't skip ahead of data frames already in the queue. Control frames are cancelled on user interruption unless combined with `UninterruptibleFrame`. See the [frames overview](/server/frames/overview) for base class details and the full frame hierarchy. +ControlFrames signal boundaries, state changes, and configuration updates within the pipeline. They are queued and processed in order alongside DataFrames. ControlFrames are cancelled on `InterruptionFrame` unless combined with `UninterruptibleFrame`. See the [frames overview](/server/frames/overview) for base class details and the full frame hierarchy. ## Pipeline Lifecycle ### EndFrame -Signals graceful pipeline shutdown. The transport stops sending, closes its threads, and the pipeline winds down completely. Inherits from both `ControlFrame` and `UninterruptibleFrame`, so it cannot be cancelled by interruption. 
+Signals graceful pipeline shutdown. `EndFrame` is queued with other non-SystemFrames, which allows FrameProcessors to be shut down in order, allowing queued frames ahead of the `EndFrame` to be processed first. + +Inherits from `UninterruptibleFrame`, meaning it cannot be cancelled by `InterruptionFrame`. Optional reason for the shutdown, passed along for logging or inspection. @@ -17,14 +19,14 @@ Signals graceful pipeline shutdown. The transport stops sending, closes its thre ### StopFrame -Stops the pipeline but keeps processors in a running state. Useful when you need to halt frame flow without tearing down the entire processor graph. Inherits from `ControlFrame` and `UninterruptibleFrame`. +Stops the pipeline but keeps processors in a running state. Like `EndFrame`, `StopFrame` is queued with other non-SystemFrames allowing frames preceding it to be processed first. Useful when you need to halt frame flow without tearing down the entire processor graph. +Inherits from `UninterruptibleFrame`. ### OutputTransportReadyFrame Indicates that the output transport is ready to receive frames. Processors waiting on transport availability can use this as their signal to begin sending. - ### HeartbeatFrame Used for pipeline health monitoring. Processors can observe these to detect stalls or measure latency. @@ -37,12 +39,18 @@ Used for pipeline health monitoring. Processors can observe these to detect stal While a processor is paused, incoming frames accumulate in its internal queue rather than being dropped. Once the processor is resumed, it drains the queue and processes all buffered frames in the order they arrived. -For example, the TTS service pauses itself while synthesizing a `TTSSpeakFrame`. If new text frames arrive during synthesis, they queue up instead of producing overlapping audio. The TTS resumes when `BotStoppedSpeakingFrame` (a system frame) arrives, and the buffered frames are processed in order. 
+For example, the TTS service pauses itself while synthesizing a `TTSSpeakFrame`. If new text frames arrive during synthesis, they queue up instead of producing overlapping audio. The TTS resumes when `BotStoppedSpeakingFrame` (a `SystemFrame`) arrives, and the buffered frames are processed in order. -Internally, each processor has two queues: a high-priority input queue for system frames and a process queue for everything else. Pausing blocks the process queue, but system frames continue to flow through the input queue. This is why the typical pattern is for a processor to pause itself and then resume in response to a system frame. +Internally, each processor has two queues: a high-priority input queue for SystemFrames and a process queue for everything else. Pausing blocks the process queue, but SystemFrames continue to flow through the input queue. This is why the typical pattern is for a processor to pause itself and then resume in response to a `SystemFrame`. -`FrameProcessorResumeFrame` is a control frame, which means it enters the same process queue that pausing blocks. If data frames have already queued up ahead of it, the resume frame will be stuck behind them and the processor will stay paused. To resume a paused processor from outside, use the system frame variant `FrameProcessorResumeUrgentFrame` instead — it bypasses the process queue entirely. See [System Frames](/server/frames/system-frames#processor-pauseresume-urgent). + `FrameProcessorResumeFrame` is a `ControlFrame`, which means it enters the + same process queue that pausing blocks. If DataFrames have already queued up + ahead of it, the resume frame will be stuck behind them and the processor will + stay paused. To resume a paused processor from outside, use the `SystemFrame` + variant `FrameProcessorResumeUrgentFrame` instead — it bypasses the process + queue entirely. See [System + Frames](/server/frames/system-frames#processor-pauseresume-urgent). 
### FrameProcessorPauseFrame @@ -62,7 +70,9 @@ Resumes a previously paused processor, releasing all buffered frames for process -Because this is a control frame, it will be blocked behind any data frames that queued up while the processor was paused. Use `FrameProcessorResumeUrgentFrame` if the processor may have buffered frames. + Because this is a `ControlFrame`, it will be blocked behind any DataFrames + that queued up while the processor was paused. Use + `FrameProcessorResumeUrgentFrame` if the processor may have buffered frames. ## LLM Response Boundaries @@ -73,27 +83,22 @@ These frames bracket LLM output, letting downstream processors (aggregators, TTS Marks the beginning of an LLM response. Followed by one or more `TextFrame`s and terminated by `LLMFullResponseEndFrame`. - ### LLMFullResponseEndFrame Marks the end of an LLM response. - ### VisionFullResponseStartFrame Beginning of a vision model response. Inherits from `LLMFullResponseStartFrame`. - ### VisionFullResponseEndFrame End of a vision model response. Inherits from `LLMFullResponseEndFrame`. - ### LLMAssistantPushAggregationFrame Forces the assistant aggregator to commit its buffered text to context immediately, rather than waiting for the normal end-of-response boundary. - ## LLM Context Summarization Frames that coordinate context summarization: compressing conversation history to stay within token limits. @@ -102,7 +107,11 @@ Frames that coordinate context summarization: compressing conversation history t Triggers manual context summarization. Push this frame to request that the LLM summarize the current conversation context. - + Optional configuration controlling summarization behavior. @@ -136,7 +145,9 @@ Internal request from the aggregator to the LLM service, asking it to produce a ### LLMContextSummaryResultFrame -The LLM's summarization result, sent back to the aggregator. Inherits from both `ControlFrame` and `UninterruptibleFrame` to ensure the result is never dropped. 
+The LLM's summarization result, sent back to the aggregator. + +Inherits from `UninterruptibleFrame` to ensure the result is never dropped. Matches the originating request. @@ -163,11 +174,13 @@ Bracket extended thinking output from LLMs that support it (e.g., Claude with ex Marks the beginning of LLM extended thinking content. - Whether to append thought content to the conversation context. Raises `ValueError` if set to `True` without specifying `llm`. + Whether to append thought content to the conversation context. Raises + `ValueError` if set to `True` without specifying `llm`. - Identifier for the LLM producing the thought. Required when `append_to_context` is `True`. + Identifier for the LLM producing the thought. Required when + `append_to_context` is `True`. ### LLMThoughtEndFrame @@ -175,14 +188,17 @@ Marks the beginning of LLM extended thinking content. Marks the end of LLM extended thinking content. - Thought signature, if provided by the LLM. Anthropic models include a signature that must be preserved when appending thoughts back to context. + Thought signature, if provided by the LLM. Anthropic models include a + signature that must be preserved when appending thoughts back to context. ## Function Calling ### FunctionCallInProgressFrame -Indicates that a function call is currently executing. Inherits from `ControlFrame` and `UninterruptibleFrame`, ensuring it reaches downstream processors even during interruption. +Indicates that a function call is currently executing. + +Inherits from `UninterruptibleFrame`, ensuring it reaches downstream processors even during interruption. Name of the function being called. @@ -220,39 +236,40 @@ Signals the end of a TTS audio response. ## Service Settings -Runtime settings updates for LLM, TTS, STT, and other services. These let you change service configuration mid-conversation without rebuilding the pipeline. +Runtime settings updates for LLM, TTS, STT, and other services. 
These let you change service configuration mid-conversation without rebuilding the pipeline. Push an `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, or `STTUpdateSettingsFrame` to update the corresponding service. See the [Changing Service Settings at Runtime](/server/frames/overview#changing-service-settings-at-runtime) pattern for an example. ### ServiceUpdateSettingsFrame -Base frame for runtime service settings updates. Inherits from `ControlFrame` and `UninterruptibleFrame`. +Base frame for runtime service settings updates. + +Inherits from `UninterruptibleFrame`. Dictionary of settings to update. - Typed settings delta. Takes precedence over the `settings` dict when both are provided. + Typed settings delta. Takes precedence over the `settings` dict when both are + provided. - Target a specific service instance. When `None`, the frame applies to the first matching service in the pipeline. + Target a specific service instance. When `None`, the frame applies to the + first matching service in the pipeline. ### LLMUpdateSettingsFrame Update LLM service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. - ### TTSUpdateSettingsFrame Update TTS service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. - ### STTUpdateSettingsFrame Update STT service settings at runtime. Inherits from `ServiceUpdateSettingsFrame`. - ## Audio Processing ### VADParamsUpdateFrame @@ -267,7 +284,6 @@ Update Voice Activity Detection parameters at runtime. Base frame for audio filter control. Subclass this for custom filter commands. - ### FilterUpdateSettingsFrame Update audio filter settings. Inherits from `FilterControlFrame`. @@ -288,7 +304,6 @@ Enable or disable an audio filter. Inherits from `FilterControlFrame`. Base frame for audio mixer control. - ### MixerUpdateSettingsFrame Update audio mixer settings. Inherits from `MixerControlFrame`. @@ -311,7 +326,6 @@ Enable or disable an audio mixer. Inherits from `MixerControlFrame`. 
Base frame for service switching operations. - ### ManuallySwitchServiceFrame Request a manual switch to a different service instance. Inherits from `ServiceSwitcherFrame`. @@ -334,8 +348,7 @@ Task frames are pushed upstream to the pipeline task, which converts them into t ### TaskFrame -Base frame for task control. Inherits from `ControlFrame`. - +Base frame for task control. ### EndTaskFrame diff --git a/server/frames/data-frames.mdx b/server/frames/data-frames.mdx index 2b202da7..540b67a0 100644 --- a/server/frames/data-frames.mdx +++ b/server/frames/data-frames.mdx @@ -5,11 +5,7 @@ description: "Reference for DataFrame types: audio, image, text, transcription, ## Overview -DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order, and any pending data frames are discarded when a user interrupts. See the [Frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. - -```python -from pipecat.frames.frames import TextFrame, OutputAudioRawFrame, TTSSpeakFrame -``` +DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order with other DataFrames and ControlFrames, and any pending DataFrames are discarded when a user interrupts. See the [Frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. ## Audio Frames @@ -19,8 +15,7 @@ These frames carry raw audio through the pipeline toward the output transport. E A chunk of raw audio destined for the output transport. Use the inherited `transport_destination` field when your transport supports multiple audio tracks. -Inherits from `DataFrame`, `AudioRawFrame`. - +Inherits from `AudioRawFrame`. ### TTSAudioRawFrame @@ -38,7 +33,6 @@ Audio from a continuous speech stream. 
The stream may contain silence frames int Inherits from `OutputAudioRawFrame`. - ## Image Frames Frames for carrying image data to the output transport. Each inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/server/frames/overview#imagerawframe) mixin. @@ -47,11 +41,13 @@ Frames for carrying image data to the output transport. Each inherits `image`, ` An image for display by the output transport. Supports the `transport_destination` field for transports with multiple video tracks. -Inherits from `DataFrame`, `ImageRawFrame`. - +Inherits from `ImageRawFrame`. -The `sync_with_audio` field (default `False`) is set internally, not via the constructor. When `True`, the image is queued with audio frames so it displays only after all preceding audio has been sent. When `False`, the transport displays it immediately. + The `sync_with_audio` field (default `False`) is set internally, not via the + constructor. When `True`, the image is queued with audio frames so it displays + only after all preceding audio has been sent. When `False`, the transport + displays it immediately. ### URLImageRawFrame @@ -82,31 +78,28 @@ Inherits from `OutputImageRawFrame`. An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport's `camera_out_framerate` parameter. -Inherits from `DataFrame`. - Ordered list of image frames that make up the sprite animation. ## Text Frames -Text content at various stages of processing: raw text, LLM output, aggregated results, and TTS input. +Text content at various stages of processing: raw text, LLM output, aggregated results, TTS input, and transcriptions. ### TextFrame The fundamental text container. Emitted by LLM services, consumed by context aggregators, TTS services, and other processors. -Inherits from `DataFrame`. - The text content. 
-Several non-constructor fields control downstream behavior: -- `skip_tts` (default `None`): when set, tells the TTS service to skip this text -- `includes_inter_frame_spaces` (default `False`): indicates whether leading/trailing spaces are already included -- `append_to_context` (default `True`): whether this text should be appended to the LLM context + Several non-constructor fields control downstream behavior: - `skip_tts` + (default `None`): when set, tells the TTS service to skip this text - + `includes_inter_frame_spaces` (default `False`): indicates whether + leading/trailing spaces are already included - `append_to_context` (default + `True`): whether this text should be appended to the LLM context ### LLMTextFrame @@ -115,7 +108,6 @@ Text generated by an LLM service. Behaves like a `TextFrame` with `includes_inte Inherits from `TextFrame`. - ### AggregatedTextFrame Multiple text frames combined into a single frame for processing or output. @@ -136,7 +128,6 @@ Text output from a vision model. Functionally identical to `LLMTextFrame` but di Inherits from `LLMTextFrame`. - ### TTSTextFrame Text that has been sent to a TTS service for synthesis. @@ -147,16 +138,14 @@ Inherits from `AggregatedTextFrame`. Identifier for the TTS context that generated this text. -## Transcription Frames +### Transcriptions -Frames produced by speech-to-text services at different stages of recognition: interim results, final transcriptions, and translations. +Frames produced by speech-to-text services at different stages of recognition. All inherit from `TextFrame`, so they flow through text aggregators and other `TextFrame` handlers. -### TranscriptionFrame +#### TranscriptionFrame A non-interim transcription result from an STT service: the service's best recognition of what the user said, as opposed to the streaming partial results in `InterimTranscriptionFrame`. -Inherits from `TextFrame`. - Identifier for the user who spoke. @@ -174,15 +163,17 @@ Inherits from `TextFrame`. 
- Whether the STT service has explicitly committed this transcription via a finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics) support this; others don't, so it defaults to `False`. Turn detection strategies can use this flag to trigger the bot's response immediately rather than waiting for a timeout. + Whether the STT service has explicitly committed this transcription via a + finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics) + support this; others don't, so it defaults to `False`. Turn detection + strategies can use this flag to trigger the bot's response immediately rather + than waiting for a timeout. -### InterimTranscriptionFrame +#### InterimTranscriptionFrame A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by a `TranscriptionFrame` once the STT service produces its result. -Inherits from `TextFrame`. - The partial transcription text. @@ -203,12 +194,10 @@ Inherits from `TextFrame`. Raw result object from the STT service. -### TranslationFrame +#### TranslationFrame A translated transcription, typically placed in the transport's receive queue when a participant speaks in a different language. -Inherits from `TextFrame`. - Identifier for the user who spoke. @@ -221,26 +210,12 @@ Inherits from `TextFrame`. Target language of the translation. -## LLM Context Timestamp - -### LLMContextAssistantTimestampFrame - -Carries timestamp information for assistant messages in the LLM context. Used internally to track when assistant responses were generated. - -Inherits from `DataFrame`. - - - Timestamp when the assistant message was created. - - ## TTS Frames ### TTSSpeakFrame Sends text to the pipeline's TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for each `TTSSpeakFrame`, whereas `TextFrame`s produced during an LLM response are grouped under the same turn context. 
-Inherits from `DataFrame`. - The text to be spoken. @@ -255,8 +230,6 @@ Inherits from `DataFrame`. A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation. -Inherits from `DataFrame`. - The transport message payload. @@ -267,12 +240,143 @@ Inherits from `DataFrame`. A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits the `button` field from the `DTMFFrame` mixin, which holds the keypad entry that was pressed. -Inherits from `DTMFFrame`, `DataFrame`. +Inherits from `DTMFFrame`. The DTMF keypad entry to send. -For transports that support multiple dial-out destinations, set the `transport_destination` field (inherited from `Frame`) to specify which destination receives the DTMF tone. + For transports that support multiple dial-out destinations, set the + `transport_destination` field (inherited from `Frame`) to specify which + destination receives the DTMF tone. + +## LLM Context Management + +Frames that modify or trigger processing of the LLM conversation context. + +### LLMMessagesAppendFrame + +Appends messages to the current conversation context without replacing existing ones. + + + List of message dictionaries to append. + + + + Whether the LLM should process the updated context immediately. When `None`, + the default behavior of the context aggregator applies. + + +### LLMMessagesUpdateFrame + +Replaces the current context messages entirely with a new set. + + + List of message dictionaries to replace the current context. + + + + Whether the LLM should process the updated context immediately. When `None`, + the default behavior of the context aggregator applies. + + +### LLMRunFrame + +Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled. + +### LLMContextAssistantTimestampFrame + +Records when an assistant message was created. 
Used internally to track timing of assistant responses in the conversation context. + + + Timestamp when the assistant message was created. + + +## LLM Thinking + +### LLMThoughtTextFrame + +A chunk of thought or reasoning text from the LLM. This is a `DataFrame`, not a `TextFrame` subclass — TTS services and text aggregators will not process it. + + + The text (or text chunk) of the thought. + + +## LLM Tool Configuration + +Frames for configuring LLM function calling behavior and output settings at runtime. + +### LLMSetToolsFrame + +Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider. + + + List of tool/function definitions for the LLM. + + +### LLMSetToolChoiceFrame + +Configures how the LLM selects tools during function calling. + + + Tool choice setting: `"none"` disables tool use, `"auto"` lets the LLM decide, + `"required"` forces a tool call, or a dict specifying a particular tool. + + +### LLMEnablePromptCachingFrame + +Toggles prompt caching for LLMs that support it. + + + Whether to enable prompt caching. + + +### LLMConfigureOutputFrame + +Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud. + + + When `True`, LLM tokens are added to context but not passed to TTS. + + +## Function Call Results + +### FunctionCallResultFrame + +Contains the result of a completed function call execution. + +Inherits from `UninterruptibleFrame` to ensure the result always reaches the context aggregator. + + + Name of the function that was executed. + + + + Unique identifier for the function call. + + + + Arguments that were passed to the function. + + + + The result returned by the function. + + + + Whether to run the LLM after this result. Overrides the default behavior. + + + + Additional properties for result handling. 
+ diff --git a/server/frames/llm-frames.mdx b/server/frames/llm-frames.mdx index 47ae8c55..e0664779 100644 --- a/server/frames/llm-frames.mdx +++ b/server/frames/llm-frames.mdx @@ -1,25 +1,11 @@ --- title: "LLM Frames" -description: "Reference for LLM-related frames: context management, messages, thinking, tool configuration, and function calling" +description: "LLM context frame and function calling helper dataclasses" --- -This page collects all frames related to LLM operations. These frames span multiple base types: some are DataFrames (message content), some are ControlFrames (response boundaries, settings), and a few are SystemFrames (function call lifecycle). They are grouped here by function rather than by base class. See the [frames overview](/server/frames/overview) for base class behavior, interruption rules, and properties common to all frames. +This page documents LLM-specific types that don't belong on a base-type page: `LLMContextFrame` (which inherits directly from `Frame`) and the helper dataclasses used by function calling frames. All other LLM-related frames are documented on their base-type pages. See [Related Frames](#related-frames) below for links. -```python -from pipecat.frames.frames import ( - LLMContextFrame, - LLMMessagesAppendFrame, - LLMMessagesUpdateFrame, - LLMRunFrame, - FunctionCallResultFrame, -) -``` - -## Context Management - -Frames that create, modify, or trigger processing of the LLM conversation context. - -### LLMContextFrame +## LLMContextFrame Contains a complete LLM context. Acts as a signal to LLM services to ingest the provided context and generate a response. @@ -29,150 +15,11 @@ Inherits directly from `Frame` (not `DataFrame`, `ControlFrame`, or `SystemFrame The LLM context containing messages, tools, and configuration. -### LLMMessagesAppendFrame - -Appends messages to the current conversation context without replacing existing ones. - -Inherits from `DataFrame`. - - - List of message dictionaries to append. 
- - - - Whether the LLM should process the updated context immediately. When `None`, the default behavior of the context aggregator applies. - - -### LLMMessagesUpdateFrame - -Replaces the current context messages entirely with a new set. - -Inherits from `DataFrame`. - - - List of message dictionaries to replace the current context. - - - - Whether the LLM should process the updated context immediately. When `None`, the default behavior of the context aggregator applies. - - -### LLMRunFrame - -Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled. - -Inherits from `DataFrame`. - - -### LLMContextAssistantTimestampFrame - -Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context. - -Inherits from `DataFrame`. - - - Timestamp when the assistant message was created. - - -## Response Boundaries - -These frames bracket LLM output. Between start and end, the LLM emits `TextFrame`s (or `LLMThoughtTextFrame`s for thinking content). They are fully documented on the control frames page and cross-referenced here. - -### LLMFullResponseStartFrame - -Marks the beginning of an LLM response. See [Control Frames](/server/frames/control-frames#llm-response-boundaries). - -Inherits from `ControlFrame`. - -### LLMFullResponseEndFrame - -Marks the end of an LLM response. See [Control Frames](/server/frames/control-frames#llm-response-boundaries). - -Inherits from `ControlFrame`. - -## Thinking / Extended Reasoning - -LLMs that support chain-of-thought reasoning (such as Anthropic's extended thinking) emit thought frames between response boundaries. Thought text is intentionally not a `TextFrame` subclass, which prevents it from triggering TTS or being captured by text aggregators. - -### LLMThoughtStartFrame - -Marks the beginning of extended thinking content. 
See [Control Frames](/server/frames/control-frames#llm-thought-frames) for full parameter documentation. - -Inherits from `ControlFrame`. - -### LLMThoughtTextFrame - -A chunk of thought or reasoning text from the LLM. - -Inherits from `DataFrame`. - - - The text (or text chunk) of the thought. - - - -This is a `DataFrame`, not a `TextFrame` subclass. TTS services and text aggregators will not process it. - - -### LLMThoughtEndFrame - -Marks the end of extended thinking content. See [Control Frames](/server/frames/control-frames#llm-thought-frames) for full parameter documentation. - -Inherits from `ControlFrame`. - -## Tool Configuration - -Frames for configuring LLM function calling behavior and output settings at runtime. - -### LLMSetToolsFrame - -Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider. - -Inherits from `DataFrame`. - - - List of tool/function definitions for the LLM. - - -### LLMSetToolChoiceFrame - -Configures how the LLM selects tools during function calling. - -Inherits from `DataFrame`. - - - Tool choice setting: `"none"` disables tool use, `"auto"` lets the LLM decide, `"required"` forces a tool call, or a dict specifying a particular tool. - - -### LLMEnablePromptCachingFrame - -Toggles prompt caching for LLMs that support it. - -Inherits from `DataFrame`. - - - Whether to enable prompt caching. - - -### LLMConfigureOutputFrame - -Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud. - -Inherits from `DataFrame`. - - - When `True`, LLM tokens are added to context but not passed to TTS. - - -## Function Calling - -Function calling involves frames from multiple base types. 
`FunctionCallInProgressFrame` and `FunctionCallResultFrame` are both uninterruptible: once a function call starts, its progress and result must reach the context aggregator to keep the conversation consistent. - -### Helper Dataclasses +## Function Calling Helper Dataclasses These are plain dataclasses used as fields within function calling frames, not frames themselves. -#### FunctionCallFromLLM +### FunctionCallFromLLM Represents a function call returned by the LLM, ready for execution. @@ -192,7 +39,7 @@ Represents a function call returned by the LLM, ready for execution. The LLM context at the time the function call was made. -#### FunctionCallResultProperties +### FunctionCallResultProperties Configures how a function call result is handled after execution. @@ -200,104 +47,18 @@ Configures how a function call result is handled after execution. Whether to run the LLM after receiving this result. - + Async callback to execute when the context is updated with the result. -### Frames - -#### FunctionCallInProgressFrame - -Indicates that a function call is currently executing. See [Control Frames](/server/frames/control-frames#function-calling) for full parameter documentation. - -Inherits from `ControlFrame`, `UninterruptibleFrame`. - -#### FunctionCallResultFrame - -Contains the result of a completed function call execution. Uninterruptible to ensure the result always reaches the context aggregator. - -Inherits from `DataFrame`, `UninterruptibleFrame`. - - - Name of the function that was executed. - - - - Unique identifier for the function call. - - - - Arguments that were passed to the function. - - - - The result returned by the function. - - - - Whether to run the LLM after this result. Overrides the default behavior. - - - - Additional properties for result handling. - - -#### FunctionCallsStartedFrame - -Signals that one or more function calls are about to begin executing. As a system frame, this is never discarded during interruption. 
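To make the `FunctionCallResultProperties` fields concrete, here is a minimal sketch using a simplified stand-in dataclass (illustration only, not pipecat's real class definition): the `on_context_updated` callback fires once the context has been updated with the result, and `run_llm` overrides the caller's default only when explicitly set.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


# Simplified stand-in mirroring the documented fields; illustration only,
# not pipecat's actual class definition.
@dataclass
class FunctionCallResultProperties:
    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_result(props: FunctionCallResultProperties, default_run_llm: bool) -> bool:
    # Fire the callback once the context has been updated with the result.
    if props.on_context_updated is not None:
        await props.on_context_updated()
    # run_llm overrides the default behavior only when explicitly set.
    return props.run_llm if props.run_llm is not None else default_run_llm


async def main() -> None:
    events: list[str] = []

    async def on_updated() -> None:
        events.append("context-updated")

    props = FunctionCallResultProperties(run_llm=False, on_context_updated=on_updated)
    run_llm = await apply_result(props, default_run_llm=True)
    print(run_llm, events)


asyncio.run(main())
```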
- -Inherits from `SystemFrame`. - - - Sequence of function calls that will be executed. - - -#### FunctionCallCancelFrame - -Signals that a function call was cancelled, typically due to user interruption when the function's `cancel_on_interruption` flag is set. - -Inherits from `SystemFrame`. - - - Name of the function that was cancelled. - - - - Unique identifier for the cancelled function call. - - -## Context Summarization - -Frames that coordinate automatic context compression to stay within token limits. These are fully documented on the control frames page and listed here for cross-reference. - -### LLMSummarizeContextFrame - -Triggers manual context summarization. See [Control Frames](/server/frames/control-frames#llm-context-summarization). - -Inherits from `ControlFrame`. - -### LLMContextSummaryRequestFrame - -Internal request from the aggregator to the LLM service for a summary. See [Control Frames](/server/frames/control-frames#llm-context-summarization). - -Inherits from `ControlFrame`. - -### LLMContextSummaryResultFrame - -The LLM's summarization result, delivered back to the aggregator. See [Control Frames](/server/frames/control-frames#llm-context-summarization). - -Inherits from `ControlFrame`, `UninterruptibleFrame`. - -## Settings - -### LLMUpdateSettingsFrame - -Updates LLM service settings at runtime. Inherits the `settings`, `delta`, and `service` parameters from `ServiceUpdateSettingsFrame`. See [Control Frames](/server/frames/control-frames#service-settings) for parameter details. - -Inherits from `ServiceUpdateSettingsFrame`. - -### LLMAssistantPushAggregationFrame +## Related Frames -Forces the assistant aggregator to commit its buffered text to context immediately, rather than waiting for the normal end-of-response boundary. See [Control Frames](/server/frames/control-frames#llm-response-boundaries). +LLM-related frames organized by base type: -Inherits from `ControlFrame`. 
+- **Data Frames**: [Context Management](/server/frames/data-frames#llm-context-management), [Thinking](/server/frames/data-frames#llm-thinking), [Tool Configuration](/server/frames/data-frames#llm-tool-configuration), [Function Call Results](/server/frames/data-frames#function-call-results) +- **Control Frames**: [Response Boundaries](/server/frames/control-frames#llm-response-boundaries), [Context Summarization](/server/frames/control-frames#llm-context-summarization), [Thought Frames](/server/frames/control-frames#llm-thought-frames), [Function Calling](/server/frames/control-frames#function-calling), [Service Settings](/server/frames/control-frames#service-settings) +- **System Frames**: [Function Calling](/server/frames/system-frames#function-calling) diff --git a/server/frames/overview.mdx b/server/frames/overview.mdx index 506ebae7..6f86ef6e 100644 --- a/server/frames/overview.mdx +++ b/server/frames/overview.mdx @@ -1,6 +1,6 @@ --- title: "Frames" -description: "Understanding frames — the data containers that carry information through Pipecat pipelines" +description: "Frame categories, processing behavior, and common patterns for Pipecat pipelines" --- ## Overview @@ -9,71 +9,33 @@ Frames are the fundamental units of data in Pipecat. Every piece of information All frames inherit from the base `Frame` class and are Python [dataclasses](https://docs.python.org/3/library/dataclasses.html). 
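Since every frame is a dataclass with an auto-assigned `id` and `name` (see Frame Properties), a minimal sketch of how per-class instance counting could produce names like `TextFrame#3` looks as follows. This is illustrative only — it is not pipecat's actual implementation.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)
_counts: dict = {}


@dataclass
class Frame:
    # Assigned automatically after construction, mirroring the documented
    # auto-set `id` and `name` properties.
    id: int = field(init=False)
    name: str = field(init=False)

    def __post_init__(self) -> None:
        cls = type(self).__name__
        self.id = next(_ids)
        self.name = f"{cls}#{next(_counts.setdefault(cls, itertools.count(1)))}"


@dataclass
class TextFrame(Frame):
    text: str = ""


a = TextFrame(text="hello")
b = TextFrame(text="world")
print(a.name, b.name)  # TextFrame#1 TextFrame#2
```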
-```python -from pipecat.frames.frames import Frame, TextFrame, TTSAudioRawFrame -``` - ## Frame Categories Pipecat has three base frame types, each with different processing behavior: -| Base Type | Processing | Interruption Behavior | -|-----------|-----------|----------------------| -| `DataFrame` | Queued, processed in order | Cancelled on user interruption | -| `ControlFrame` | Queued, processed in order | Cancelled on user interruption | -| `SystemFrame` | Higher priority, processed in order | **Not** cancelled on user interruption | +| Base Type | Processing | Interruption Behavior | +| -------------- | ------------------------------------------------------------- | -------------------------------------- | +| `DataFrame` | Queued, processed in order with non-SystemFrames | Cancelled on user interruption | +| `ControlFrame` | Queued, processed in order with non-SystemFrames | Cancelled on user interruption | +| `SystemFrame` | Higher priority, queued, processed in order with SystemFrames | **Not** cancelled on user interruption | ### DataFrame -Data frames carry the main content flowing through a pipeline: audio chunks, text, images, and LLM messages. They are queued and processed in order. If a user interrupts (starts speaking while the bot is responding), any pending data frames are discarded so the new input can be handled immediately. +Data frames carry the main content flowing through a pipeline: audio chunks, text, images, and LLM messages. They are queued and processed in order with other DataFrames and ControlFrames. If a user interrupts (starts speaking while the bot is responding), any pending data frames are discarded so the new input can be handled immediately. -```python -@dataclass -class DataFrame(Frame): - """Processed in order. 
Cancelled by user interruptions.""" - pass -``` - -Examples: `TextFrame`, `OutputAudioRawFrame`, `LLMMessagesFrame`, `TTSSpeakFrame` +Examples: `TextFrame`, `OutputAudioRawFrame`, `LLMMessagesAppendFrame`, `TTSSpeakFrame` ### ControlFrame -Control frames signal processing boundaries and configuration changes: response start/end markers, settings updates, and state transitions. They follow the same ordering and interruption rules as data frames. - -```python -@dataclass -class ControlFrame(Frame): - """Processed in order. Cancelled by user interruptions.""" - pass -``` +ControlFrames signal processing boundaries and configuration changes: response start/end markers, settings updates, and state transitions. They are queued and processed in order alongside DataFrames, and like DataFrames, any pending ControlFrames are discarded when a user interrupts unless combined with `UninterruptibleFrame`. Examples: `EndFrame`, `LLMFullResponseStartFrame`, `TTSStartedFrame`, `ServiceUpdateSettingsFrame` ### SystemFrame -System frames are high-priority signals that must always be delivered: interruptions, user input, error notifications, and pipeline lifecycle events. Unlike data and control frames, they are never discarded when a user interrupts. - -```python -@dataclass -class SystemFrame(Frame): - """Higher priority. Not cancelled by user interruptions.""" - pass -``` - -Examples: `StartFrame`, `CancelFrame`, `StartInterruptionFrame`, `UserStartedSpeakingFrame`, `InputAudioRawFrame` - -## UninterruptibleFrame Mixin - -Occasionally a data or control frame is too important to discard during an interruption. Adding the `UninterruptibleFrame` mixin protects it: the frame stays in internal queues and any task processing it will not be cancelled. - -```python -@dataclass -class FunctionCallResultFrame(DataFrame, UninterruptibleFrame): - """Must be delivered even if the user interrupts.""" - ... 
-```
+SystemFrames are high-priority signals that must always be delivered: interruptions, user input, error notifications, and pipeline lifecycle events. They are queued and processed in order with other SystemFrames. Unlike DataFrames and ControlFrames, they are never discarded when a user interrupts.

-Examples: `EndFrame`, `StopFrame`, `FunctionCallResultFrame`, `FunctionCallInProgressFrame`
+Examples: `StartFrame`, `CancelFrame`, `InterruptionFrame`, `UserStartedSpeakingFrame`, `InputAudioRawFrame`

## Frame Properties

@@ -84,7 +46,8 @@ Every frame has these properties set automatically:

-  Human-readable name combining class name and instance count (e.g., `TextFrame#3`). Useful for debugging.
+  Human-readable name combining class name and instance count (e.g.,
+  `TextFrame#3`). Useful for debugging.

@@ -100,7 +63,8 @@ Every frame has these properties set automatically:

-  Name of the transport destination for this frame. Used when a transport supports multiple output tracks.
+  Name of the transport destination for this frame. Used when a transport
+  supports multiple output tracks.

## Frame Direction

@@ -142,9 +106,33 @@ await self.broadcast_frame(UserStartedSpeakingFrame)

Each direction receives its own frame instance, linked by `broadcast_sibling_id`.

+To broadcast an existing frame instance (when you are not the original creator of the frame), use `broadcast_frame_instance()`:
+
+```python
+# Broadcast an existing frame instance in both directions
+await self.broadcast_frame_instance(frame)
+```
+
+This creates two new instances by shallow-copying all fields from the original frame except `id` and `name`, which get fresh values.
+
+Prefer `broadcast_frame()` when possible, as it is more efficient.

## Mixins

-Beyond `UninterruptibleFrame`, frames use mixins to share common data structures across the hierarchy:
+Mixins add cross-cutting behavior or shared data fields to frames without changing their base type.
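A minimal sketch of the copy semantics described above, using toy `Frame` classes rather than pipecat's internals: every field is shared via a shallow copy except `id` and `name`, which are regenerated for each direction (the real copies are additionally linked by `broadcast_sibling_id`).

```python
import copy
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)


@dataclass
class Frame:
    id: int = field(init=False)
    name: str = field(init=False)

    def __post_init__(self) -> None:
        self.id = next(_ids)
        self.name = f"{type(self).__name__}#{self.id}"


@dataclass
class TextFrame(Frame):
    text: str = ""


def broadcast_copies(frame: Frame) -> tuple[Frame, Frame]:
    """Make one copy per direction: shallow-copy every field except
    id and name, which get fresh values."""

    def fresh_copy(f: Frame) -> Frame:
        c = copy.copy(f)   # shallow copy shares all field values...
        c.id = next(_ids)  # ...except id and name, which are regenerated
        c.name = f"{type(c).__name__}#{c.id}"
        return c

    return fresh_copy(frame), fresh_copy(frame)


original = TextFrame(text="hi")
upstream, downstream = broadcast_copies(original)
assert upstream.text == downstream.text == "hi"
assert len({original.id, upstream.id, downstream.id}) == 3
```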
+ +### UninterruptibleFrame + +Occasionally a `DataFrame` or `ControlFrame` is too important to discard during an interruption. Adding the `UninterruptibleFrame` mixin protects it: the frame stays in internal queues and any task processing it will not be cancelled. + +```python +@dataclass +class FunctionCallResultFrame(DataFrame, UninterruptibleFrame): + """Must be delivered even if the user interrupts.""" + ... +``` + +Examples: `EndFrame`, `StopFrame`, `FunctionCallResultFrame`, `FunctionCallInProgressFrame` ### AudioRawFrame @@ -184,6 +172,8 @@ Carries raw image fields shared by both input and output image frames. ## Common Patterns +Pipecat prefers pushing frames over calling methods directly between processors. Routing data through the pipeline as frames ensures correct processing order, which is critical for real-time use cases. + Most frames are produced and consumed by Pipecat's built-in services. The patterns below cover the frames you're most likely to push yourself in application code. ### Starting a Conversation @@ -288,15 +278,23 @@ The individual reference pages below document every frame class, organized by fu - Audio, image, text, transcription, and transport message frames that carry content through the pipeline. + Audio, image, text, transcription, and transport message frames that carry + content through the pipeline. - - Pipeline lifecycle, LLM response boundaries, TTS state, service settings, and filter/mixer configuration. + + Pipeline lifecycle, LLM response boundaries, TTS state, service settings, + and filter/mixer configuration. - Interruptions, user/bot speaking state, VAD events, errors, metrics, and raw input frames. + Interruptions, user/bot speaking state, VAD events, errors, metrics, and raw + input frames. - LLM context management, message manipulation, thinking/reasoning, tool configuration, and function calling. + LLM context frame, function calling helper dataclasses, and links to + LLM-related frames on other pages. 
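The three base categories and the `UninterruptibleFrame` mixin reduce to a single rule when an interruption arrives: SystemFrames always survive, and DataFrames/ControlFrames survive only if they also carry the mixin. The sketch below uses toy stubs whose names mirror the real classes but exist only to illustrate that rule.

```python
from dataclasses import dataclass


# Toy stand-ins; names mirror pipecat's classes, but these stubs exist
# only to illustrate the interruption rule.
@dataclass
class Frame:
    pass


@dataclass
class DataFrame(Frame):
    pass


@dataclass
class ControlFrame(Frame):
    pass


@dataclass
class SystemFrame(Frame):
    pass


@dataclass
class UninterruptibleFrame(Frame):
    pass


@dataclass
class TextFrame(DataFrame):
    text: str = ""


@dataclass
class EndFrame(ControlFrame, UninterruptibleFrame):
    pass


@dataclass
class InputAudioRawFrame(SystemFrame):
    pass


def survivors_after_interruption(queue: list[Frame]) -> list[Frame]:
    # SystemFrames are never discarded; Data/Control frames survive only
    # when they also carry the UninterruptibleFrame mixin.
    return [f for f in queue if isinstance(f, (SystemFrame, UninterruptibleFrame))]


queue = [TextFrame(text="pending reply"), EndFrame(), InputAudioRawFrame()]
print([type(f).__name__ for f in survivors_after_interruption(queue)])
# ['EndFrame', 'InputAudioRawFrame']
```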
diff --git a/server/frames/system-frames.mdx b/server/frames/system-frames.mdx index 3fdf7ede..deb3d24e 100644 --- a/server/frames/system-frames.mdx +++ b/server/frames/system-frames.mdx @@ -3,19 +3,13 @@ title: "System Frames" description: "Reference for SystemFrame types: pipeline lifecycle, interruptions, speaking state, input, and diagnostics" --- -SystemFrames have higher priority than data and control frames and are never cancelled during user interruptions. They carry signals that must always be delivered: pipeline startup and teardown, error notifications, user input, and speaking state changes. See the [frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. - -```python -from pipecat.frames.frames import StartFrame, CancelFrame, ErrorFrame, InterruptionFrame -``` +SystemFrames have higher priority than DataFrames and ControlFrames and are never cancelled during user interruptions. They are queued and processed in order with other SystemFrames. They carry signals that must always be delivered: pipeline startup and teardown, error notifications, user input, and speaking state changes. See the [frames overview](/server/frames/overview) for base class details, mixin fields, and frame properties common to all frames. ## Pipeline Lifecycle ### StartFrame -The first frame pushed into a pipeline, initializing all processors. Every processor receives this before any data or control frames arrive. - -Inherits from `SystemFrame`. +The first frame pushed into a pipeline, initializing all processors. Every processor receives this before any DataFrames or ControlFrames arrive. Input audio sample rate in Hz. @@ -26,7 +20,8 @@ Inherits from `SystemFrame`. - Whether user interruptions are allowed. Deprecated since 0.0.99: use interruption strategies instead. + Whether user interruptions are allowed. Deprecated since 0.0.99: use + interruption strategies instead. 
@@ -41,23 +36,30 @@ Inherits from `SystemFrame`. Enable usage metrics (token counts, API calls) from services. - + List of interruption strategies for the pipeline. Deprecated since 0.0.99. - When `True`, only report time-to-first-byte for the initial response rather than every response. + When `True`, only report time-to-first-byte for the initial response rather + than every response. - + Optional tracing context for distributed tracing integration. ### CancelFrame -Stops the pipeline immediately, skipping any queued frames. Use this when you need to abort without waiting for pending work to drain. - -Inherits from `SystemFrame`. +Stops the pipeline immediately, skipping any queued non-SystemFrames. Use this when you need to abort without waiting for pending work to drain. For example, when the user has left the session. Optional reason for the cancellation. @@ -69,8 +71,6 @@ Inherits from `SystemFrame`. Carries an error notification, typically pushed upstream so earlier processors can react. -Inherits from `SystemFrame`. - Human-readable error message. @@ -93,10 +93,9 @@ An unrecoverable error requiring the bot to shut down. The `fatal` field is alwa Inherits from `ErrorFrame`. - ## Processor Pause/Resume (Urgent) -These are the system-frame variants of `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`. As system frames, they flow through the high-priority input queue rather than the process queue, so they are not blocked by paused state or buffered frames. This makes `FrameProcessorResumeUrgentFrame` the correct way to resume a processor externally — the control frame variant (`FrameProcessorResumeFrame`) would get stuck behind any data frames that queued up during the pause. See [Control Frames](/server/frames/control-frames#processor-pauseresume) for the full explanation. +These are the `SystemFrame` variants of `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`. 
As SystemFrames, they flow through the high-priority input queue rather than the process queue, so they are not blocked by paused state or buffered frames. This makes `FrameProcessorResumeUrgentFrame` the correct way to resume a processor externally — the `ControlFrame` variant (`FrameProcessorResumeFrame`) would get stuck behind any DataFrames that queued up during the pause. See [Control Frames](/server/frames/control-frames#processor-pauseresume) for the full explanation. ### FrameProcessorPauseUrgentFrame @@ -118,66 +117,39 @@ Resumes a paused processor immediately, releasing buffered frames. Use this inst ### InterruptionFrame -Interrupts the pipeline, discarding pending data and control frames. Typically triggered when the user starts speaking during a bot response. - -Inherits from `SystemFrame`. - - -## User Speaking State and Mute - -### User Mute - -User mute is a system for temporarily suppressing user input during specific periods. When muted, the `LLMUserAggregator` drops incoming user frames entirely: `InputAudioRawFrame`, `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, VAD signals, and `InterruptionFrame`. Lifecycle frames (`StartFrame`, `EndFrame`, `CancelFrame`) are never muted. +Interrupts the pipeline, discarding pending DataFrames and ControlFrames. Typically triggered when the user starts speaking during a bot response. -Muting is controlled by one or more **strategies** passed to `LLMUserAggregatorParams.user_mute_strategies`. If any active strategy says "mute", the user is muted. All strategies must return `False` for unmuting to occur. 
Built-in strategies include: - -- **`AlwaysUserMuteStrategy`** — mutes whenever the bot is speaking -- **`FirstSpeechUserMuteStrategy`** — mutes only during the bot's first speaking turn -- **`MuteUntilFirstBotCompleteUserMuteStrategy`** — mutes from pipeline start until the bot finishes its first response -- **`FunctionCallUserMuteStrategy`** — mutes while function calls are in progress - -When mute state changes, `UserMuteStartedFrame` or `UserMuteStoppedFrame` is broadcast in both directions. +## User Speaking State ### UserStartedSpeakingFrame Indicates that a user turn has begun. By this point, transcriptions are usually already flowing through the pipeline. -Inherits from `SystemFrame`. - - Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99. + Whether this event was emulated rather than detected by VAD. Deprecated since + 0.0.99. ### UserStoppedSpeakingFrame Marks the end of a user turn. The bot's response is triggered separately by the turn detection system. -Inherits from `SystemFrame`. - - Whether this event was emulated rather than detected by VAD. Deprecated since 0.0.99. + Whether this event was emulated rather than detected by VAD. Deprecated since + 0.0.99. ### UserSpeakingFrame Emitted by the VAD processor while the user is actively speaking. Useful for UI feedback or suppressing idle timeouts. -Inherits from `SystemFrame`. - - ### UserMuteStartedFrame -Broadcast when one or more [user mute strategies](#user-mute) activate. While muted, the `LLMUserAggregator` silently drops user-originating frames: audio input, transcriptions (interim and final), VAD speaking signals, and interruption attempts. Other frame types continue to flow normally. - -Inherits from `SystemFrame`. - +Broadcast when one or more [user mute strategies](/server/utilities/turn-management/user-mute-strategies) activate. User mute temporarily suppresses user input while the bot is speaking to prevent interruptions. 
While muted, the `LLMUserAggregator` drops incoming user frames (`InputAudioRawFrame`, `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, VAD signals, and `InterruptionFrame`). Lifecycle frames (`StartFrame`, `EndFrame`, `CancelFrame`) are never muted. ### UserMuteStoppedFrame -Broadcast when all active user mute strategies deactivate, allowing user input to be processed again. - -Inherits from `SystemFrame`. - +Broadcast when all active [user mute strategies](/server/utilities/turn-management/user-mute-strategies) deactivate, allowing user input to be processed again. ## VAD Events @@ -187,8 +159,6 @@ These frames are emitted directly by the Voice Activity Detection (VAD) processo VAD confirmed that speech has started. -Inherits from `SystemFrame`. - Timestamp in seconds when speech onset was detected. @@ -201,8 +171,6 @@ Inherits from `SystemFrame`. VAD confirmed that speech has ended. -Inherits from `SystemFrame`. - Timestamp in seconds when speech ended. @@ -215,8 +183,6 @@ Inherits from `SystemFrame`. Notifies processors that VAD or turn detection parameters have changed at runtime. -Inherits from `SystemFrame`. - Updated VAD parameters. @@ -231,49 +197,33 @@ Inherits from `SystemFrame`. Emitted by the output transport when the bot begins speaking. Broadcast in both directions so processors on either side of the transport can react. -Inherits from `SystemFrame`. - - ### BotStoppedSpeakingFrame Emitted by the output transport when the bot finishes speaking. Also broadcast in both directions. -Inherits from `SystemFrame`. - - ### BotSpeakingFrame Emitted continuously while the bot is speaking. Processors can use this to suppress idle timeouts or drive visual indicators. -Inherits from `SystemFrame`. - - ## Connection Status ### BotConnectedFrame The bot has joined the transport room. Only relevant for SFU-based transports: Daily, LiveKit, HeyGen, and Tavus. -Inherits from `SystemFrame`. 
- - ### ClientConnectedFrame A client or participant has connected to the transport. -Inherits from `SystemFrame`. - - ## Input Frames -Input frames carry raw data from transport sources into the pipeline. They inherit from `SystemFrame` so they are never discarded during interruptions — incoming user data must always be processed. +Input frames carry raw data from transport sources into the pipeline. As `SystemFrame`s, they are never discarded during interruptions. Incoming user data must always be processed. ### InputAudioRawFrame Raw audio received from the transport. Inherits the `audio`, `sample_rate`, `num_channels`, and `num_frames` fields from the [`AudioRawFrame`](/server/frames/overview#audiorawframe) mixin. -Inherits from `SystemFrame`, `AudioRawFrame`. - +Inherits from `AudioRawFrame`. ### UserAudioRawFrame @@ -289,8 +239,7 @@ Inherits from `InputAudioRawFrame`. Raw image received from the transport. Inherits `image`, `size`, and `format` from the [`ImageRawFrame`](/server/frames/overview#imagerawframe) mixin. -Inherits from `SystemFrame`, `ImageRawFrame`. - +Inherits from `ImageRawFrame`. ### UserImageRawFrame @@ -310,7 +259,11 @@ Inherits from `InputImageRawFrame`. Whether to append this image to the LLM context. - + The original request frame that triggered this image capture. @@ -318,8 +271,7 @@ Inherits from `InputImageRawFrame`. Text received from the transport, such as a user typing in a chat interface. Inherits the `text` field from `TextFrame`. -Inherits from `SystemFrame`, `TextFrame`. - +Inherits from `TextFrame`. ## DTMF Input @@ -327,15 +279,13 @@ Inherits from `SystemFrame`, `TextFrame`. A DTMF keypress received from the transport. Inherits the `button` field from the `DTMFFrame` mixin. -Inherits from `DTMFFrame`, `SystemFrame`. - +Inherits from `DTMFFrame`. ### OutputDTMFUrgentFrame A DTMF keypress for immediate output, bypassing the normal frame queue. -Inherits from `DTMFFrame`, `SystemFrame`. - +Inherits from `DTMFFrame`. 
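+The priority semantics described in this page can be sketched with a small standalone model: SystemFrames bypass the process queue and are handled immediately, while other frames wait in a queue where an interruption can discard them. The class names below mirror Pipecat's frame taxonomy, but the model itself is purely illustrative and is not Pipecat's actual implementation:

```python
from collections import deque
from dataclasses import dataclass


# Minimal frame taxonomy mirroring the docs above. Names match Pipecat's
# base classes, but this is a standalone illustration, not Pipecat code.
@dataclass
class Frame: ...


@dataclass
class SystemFrame(Frame): ...


@dataclass
class DataFrame(Frame): ...


@dataclass
class InputAudioRawFrame(SystemFrame):
    audio: bytes = b""


@dataclass
class TTSAudioRawFrame(DataFrame):
    audio: bytes = b""


class ToyProcessor:
    """Two-queue model: SystemFrames take the high-priority path and are
    handled right away; other frames wait in the process queue, where an
    interruption can discard them."""

    def __init__(self) -> None:
        self.process_queue = deque()
        self.handled = []

    def push(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            self.handled.append(frame)  # never queued, never dropped
        else:
            self.process_queue.append(frame)

    def interrupt(self) -> int:
        """Discard pending non-SystemFrames; return how many were dropped."""
        dropped = len(self.process_queue)
        self.process_queue.clear()
        return dropped


p = ToyProcessor()
p.push(TTSAudioRawFrame(audio=b"bot speech"))  # data: queued
p.push(InputAudioRawFrame(audio=b"user mic"))  # system: handled immediately
print(p.interrupt(), len(p.handled))  # 1 1
```

+This is why input frames such as `InputAudioRawFrame` always survive an interruption: they never sit in the cancellable queue in the first place.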
## Transport Messages @@ -343,8 +293,6 @@ Inherits from `DTMFFrame`, `SystemFrame`. A message received from an external transport. The message format is transport-specific. -Inherits from `SystemFrame`. - The transport message payload. @@ -353,20 +301,38 @@ Inherits from `SystemFrame`. An outbound transport message that bypasses the normal queue for immediate delivery. -Inherits from `SystemFrame`. - The transport message payload. +## Function Calling + +### FunctionCallsStartedFrame + +Signals that one or more function calls are about to begin executing. + + + Sequence of function calls that will be executed. + + +### FunctionCallCancelFrame + +Signals that a function call was cancelled, typically due to user interruption when the function's `cancel_on_interruption` flag is set. + + + Name of the function that was cancelled. + + + + Unique identifier for the cancelled function call. + + ## User Interaction ### UserImageRequestFrame Requests an image from a specific user, typically to capture a camera frame for vision processing. -Inherits from `SystemFrame`. - Identifier for the user to capture from. @@ -399,8 +365,6 @@ Inherits from `SystemFrame`. Mutes or unmutes the STT service. While muted, incoming audio is not sent to the STT provider. -Inherits from `SystemFrame`. - `True` to mute, `False` to unmute. @@ -409,8 +373,6 @@ Inherits from `SystemFrame`. Updates the user idle timeout at runtime. Set to `0` to disable idle detection entirely. -Inherits from `SystemFrame`. - New idle timeout in seconds. `0` disables detection. @@ -421,8 +383,6 @@ Inherits from `SystemFrame`. Performance metrics collected from processors. Emitted when metrics reporting is enabled via `StartFrame`. -Inherits from `SystemFrame`. - List of metrics data entries. @@ -433,8 +393,6 @@ Inherits from `SystemFrame`. Base metadata frame broadcast by services at startup, providing information about service capabilities and configuration. -Inherits from `SystemFrame`. 
- Name of the service that emitted this metadata. @@ -446,7 +404,53 @@ Metadata from an STT service, including latency characteristics used for turn de Inherits from `ServiceMetadataFrame`. - P99 latency in seconds for time-to-final-segment. Used by turn detectors to calibrate wait times. + P99 latency in seconds for time-to-final-segment. Used by turn detectors to + calibrate wait times. + + +## RTVI + +Frames for the [Real-Time Voice Interface (RTVI)](/server/frameworks/rtvi) protocol, which bridges clients and the pipeline. These frames handle custom messaging between the client and server. + +### RTVIServerMessageFrame + +Sends a server message to the connected client. + + + The message data to send to the client. + + +### RTVIClientMessageFrame + +A message received from the client, expecting a server response via `RTVIServerResponseFrame`. + + + Unique identifier for the client message. + + + + The message type. + + + + Optional message data from the client. + + +### RTVIServerResponseFrame + +Responds to an `RTVIClientMessageFrame`. Include the original client message frame to ensure the response is properly correlated. Set the `error` field to respond with an error instead of a normal response. + + + The original client message this response is for. + + + + Response data to send to the client. + + + + Error message. When set, the client receives an `error-response` instead of a + `server-response`. ## Task Frames @@ -457,9 +461,6 @@ Task frames provide a system-priority mechanism for requesting pipeline actions Base class for system-priority task frames. -Inherits from `SystemFrame`. - - ### CancelTaskFrame Requests immediate pipeline cancellation. Converted to a `CancelFrame` when processed by the pipeline.
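+The task-frame flow above can be sketched in a few lines: a `CancelTaskFrame` queued into the pipeline is converted to a `CancelFrame`, which then skips any queued non-SystemFrames. This is a hedged standalone model; the class names mirror Pipecat's, but the real `PipelineTask` is async and differs in detail, so treat this only as an illustration of the documented semantics:

```python
from collections import deque
from dataclasses import dataclass


# Standalone model of the cancellation flow described above. Class names
# mirror Pipecat's, but the logic is illustrative, not Pipecat's code.
@dataclass
class Frame: ...


@dataclass
class SystemFrame(Frame): ...


@dataclass
class SystemTaskFrame(SystemFrame): ...


@dataclass
class CancelTaskFrame(SystemTaskFrame): ...


@dataclass
class CancelFrame(SystemFrame):
    reason: str = ""


@dataclass
class DataFrame(Frame): ...


class ToyPipelineTask:
    def __init__(self) -> None:
        self.queue = deque()
        self.delivered = []

    def queue_frame(self, frame: Frame) -> None:
        self.queue.append(frame)

    def run(self) -> None:
        while self.queue:
            frame = self.queue.popleft()
            if isinstance(frame, CancelTaskFrame):
                # Task frames are converted when the pipeline processes them.
                frame = CancelFrame(reason="cancel requested via task frame")
            if isinstance(frame, CancelFrame):
                # Cancellation skips queued non-SystemFrames.
                self.queue = deque(
                    f for f in self.queue if isinstance(f, SystemFrame)
                )
            self.delivered.append(frame)


task = ToyPipelineTask()
task.queue_frame(CancelTaskFrame())
task.queue_frame(DataFrame())  # skipped once cancellation is processed
task.run()
print([type(f).__name__ for f in task.delivered])  # ['CancelFrame']
```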