Add a real-time, lip-synced video avatar to a voice agent. The avatar consumes the agent’s TTS audio and renders a talking-head video that streams to the browser alongside the voice.
Avatars require the WebRTC transport and the cascade pipeline. The avatar taps the tts stage, which a speech-to-speech (S2S) pipeline doesn’t have. On a phone transport (Twilio/WhatsApp) or an S2S agent, the avatar is skipped with a warning.

Configuration

Set transport: "webrtc", pipeline_mode: "cascade", and an avatar block.
{
  "transport": "webrtc",
  "pipeline_mode": "cascade",
  "avatar": {
    "enabled": true,
    "provider": "heygen",
    "avatar_id": "<liveavatar-id>",
    "is_sandbox": true
  }
}
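
The gating rule described above (avatar only on WebRTC + cascade, otherwise skipped with a warning) can be sketched as a small check. This is an illustrative sketch, not TurnCall's actual code: the function name `avatar_is_active` and the logger name are hypothetical, and only the config keys come from the example above.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("avatar_gate")


def avatar_is_active(config: dict) -> bool:
    """Return True only when the avatar block can actually run.

    Mirrors the documented rule: the avatar needs transport "webrtc"
    and pipeline_mode "cascade"; any other combination is skipped
    with a warning rather than raising an error.
    """
    avatar = config.get("avatar") or {}
    if not avatar.get("enabled", False):
        return False

    transport = config.get("transport")
    if transport != "webrtc":
        logger.warning("avatar skipped: transport %r is not webrtc", transport)
        return False

    mode = config.get("pipeline_mode")
    if mode != "cascade":
        logger.warning("avatar skipped: pipeline_mode %r has no tts stage", mode)
        return False

    return True
```

Returning `False` instead of raising keeps the voice agent running without video, which matches the "skipped with a warning" behavior.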

Options

| Field | Required | Description |
| --- | --- | --- |
| enabled | Yes | Turn the avatar on |
| provider | Yes | heygen or tavus |
| avatar_id | HeyGen | LiveAvatar avatar ID |
| is_sandbox | No | HeyGen sandbox mode (default true); some avatars are production-only |
| replica_id | Tavus | Tavus replica ID |
| persona_id | No | Tavus persona (default pipecat-stream, which lip-syncs your TTS) |
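
To make the per-provider requirements and defaults concrete, here is a minimal sketch that resolves an avatar block into the fields each provider needs. The helper name `resolve_avatar_options` and the error messages are hypothetical; the field names and defaults (`is_sandbox` true, `persona_id` pipecat-stream) are taken from the options listed above.

```python
def resolve_avatar_options(avatar: dict) -> dict:
    """Apply documented defaults and check provider-specific required fields."""
    provider = avatar.get("provider")

    if provider == "heygen":
        if not avatar.get("avatar_id"):
            raise ValueError("heygen requires avatar_id (LiveAvatar avatar ID)")
        return {
            "provider": "heygen",
            "avatar_id": avatar["avatar_id"],
            # Sandbox mode defaults to true when the field is omitted.
            "is_sandbox": avatar.get("is_sandbox", True),
        }

    if provider == "tavus":
        if not avatar.get("replica_id"):
            raise ValueError("tavus requires replica_id")
        return {
            "provider": "tavus",
            "replica_id": avatar["replica_id"],
            # Default persona lip-syncs the agent's own TTS audio.
            "persona_id": avatar.get("persona_id", "pipecat-stream"),
        }

    raise ValueError(f"provider must be 'heygen' or 'tavus', got {provider!r}")
```

For example, `resolve_avatar_options({"provider": "tavus", "replica_id": "r1"})` fills in `persona_id` with the documented default, while a HeyGen block without `avatar_id` is rejected.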

Providers

| Provider | Latency | Quality | Notes |
| --- | --- | --- | --- |
| Tavus | sub-600ms | 1080p, highest fidelity | Recommended for quality + latency |
| HeyGen | ~600ms+ render buffer | good | LiveAvatar platform |
HeyGen needs a LiveAvatar key, not a HeyGen key. Pipecat targets api.liveavatar.com; HeyGen’s old /v1/streaming.* API is sunset. Get the key from app.liveavatar.com.

Required API Keys

| Provider | Environment Variable | Where |
| --- | --- | --- |
| HeyGen | HEYGEN_LIVE_AVATAR_API_KEY | app.liveavatar.com |
| Tavus | TAVUS_API_KEY | platform.tavus.io |
Tavus also requires the tavus extra (pip install -e . pulls daily-python).
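
A startup check for the required key can catch a missing or misnamed variable before the first call. The sketch below is an assumption about how such a check might look, not part of TurnCall or Pipecat: `require_api_key` is a hypothetical helper, and only the two environment variable names come from this page.

```python
import os

# Environment variable per provider, as listed above.
API_KEY_ENV = {
    "heygen": "HEYGEN_LIVE_AVATAR_API_KEY",
    "tavus": "TAVUS_API_KEY",
}


def require_api_key(provider: str, env=None) -> str:
    """Return the provider's API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    if provider not in API_KEY_ENV:
        raise ValueError(f"unknown avatar provider: {provider!r}")

    var = API_KEY_ENV[provider]
    key = (env.get(var) or "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

Note that for HeyGen this deliberately reads `HEYGEN_LIVE_AVATAR_API_KEY` only, since a regular HeyGen key does not work against the LiveAvatar API.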

Pipeline

transport.input → STT → user_agg (VAD + SmartTurn)
  → LLM → TTS → [avatar] → transport.output (audio + video)
  → context_aggregator.assistant → observability
The avatar provider runs its own WebRTC leg to its servers (HeyGen → LiveKit, Tavus → Daily) and emits video frames back into the pipeline; TurnCall’s SmallWebRTC transport carries them to the browser. The provider’s leg is internal — the user-facing transport stays SmallWebRTC.
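
The stage ordering in the diagram can be expressed as a short sketch that places the avatar between TTS and the output transport only when it is active. The stage names are labels copied from the diagram, not real Pipecat class names, and `build_stage_order` is a hypothetical helper for illustration.

```python
def build_stage_order(avatar_active: bool) -> list[str]:
    """Return the pipeline stage labels in order, with the optional avatar stage."""
    stages = ["transport.input", "stt", "user_agg", "llm", "tts"]
    if avatar_active:
        # The avatar consumes TTS audio, so it must sit directly after tts
        # and before the output transport that carries audio + video.
        stages.append("avatar")
    stages += ["transport.output", "context_aggregator.assistant", "observability"]
    return stages
```

With the avatar inactive (for example on a phone transport), the list is identical except that the `avatar` entry is absent, so the rest of the pipeline is unaffected.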

Latency

The avatar adds an inherent render/buffer delay (~600ms+) on top of the cascade response — it cannot be tuned away, only minimized by provider choice. Tavus is currently the lowest-latency option. See examples/video-avatar for a runnable setup script.