Speech-to-Speech - TurnCall

Speech-to-Speech (S2S) mode skips the separate STT and TTS stages: the model consumes and produces audio natively, giving ultra-low latency (~300 ms).

Configuration

Set `pipeline_mode: "s2s"` in the agent config. The `stt`, `llm`, and `tts` fields are ignored in S2S mode.

```json
{
  "pipeline_mode": "s2s",
  "s2s": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
```
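The rule that `stt`, `llm`, and `tts` are ignored in S2S mode can be pictured with a minimal sketch. It assumes the agent config is parsed as plain JSON. `active_stage_configs` is a hypothetical helper for illustration, not part of TurnCall's API, and the non-S2S branch is an assumption.

```python
import json

# Example config, copied from the JSON above.
CONFIG_JSON = """
{
  "pipeline_mode": "s2s",
  "s2s": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
"""


def active_stage_configs(config: dict) -> dict:
    """Return the config sections that drive the pipeline (hypothetical helper).

    In S2S mode only the `s2s` section is used; `stt`, `llm`, and `tts`
    are ignored even if present.
    """
    if config.get("pipeline_mode") == "s2s":
        return {"s2s": config["s2s"]}
    # Assumption: other modes use the separate stt/llm/tts sections.
    return {key: config[key] for key in ("stt", "llm", "tts") if key in config}


config = json.loads(CONFIG_JSON)
config["tts"] = {"provider": "example"}  # present, but ignored in S2S mode
print(active_stage_configs(config))
```

Because the `tts` entry is dropped, a leftover cascade configuration does not affect an S2S agent.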

Options

Field	Required	Description
`provider`	Yes	`openai` or `google`
`model`	No	Model name (defaults per provider)
`voice`	No	Voice name (default: `alloy` for `openai`, `Charon` for `google`)
`turn_detection`	No	`server_vad` (default) or `pipecat_vad`
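The table's validation and defaulting rules can be sketched as below. This is an illustration, assuming the `s2s` block arrives as a dict. `parse_s2s_options` and `S2SOptions` are hypothetical names. The per-provider model defaults are not listed on this page, so the sketch leaves `model` as `None` ("provider default") rather than guessing a name.

```python
from dataclasses import dataclass
from typing import Optional

# Voice defaults from the options table.
DEFAULT_VOICES = {"openai": "alloy", "google": "Charon"}
TURN_DETECTION_MODES = ("server_vad", "pipecat_vad")


@dataclass(frozen=True)
class S2SOptions:
    provider: str
    model: Optional[str]  # None means "use the provider's default model"
    voice: str
    turn_detection: str


def parse_s2s_options(raw: dict) -> S2SOptions:
    """Validate an `s2s` block and fill documented defaults (hypothetical helper)."""
    provider = raw.get("provider")
    if provider not in DEFAULT_VOICES:
        raise ValueError(f"provider must be 'openai' or 'google', got {provider!r}")
    turn_detection = raw.get("turn_detection", "server_vad")
    if turn_detection not in TURN_DETECTION_MODES:
        raise ValueError(f"unsupported turn_detection: {turn_detection!r}")
    return S2SOptions(
        provider=provider,
        model=raw.get("model"),
        voice=raw.get("voice", DEFAULT_VOICES[provider]),
        turn_detection=turn_detection,
    )


print(parse_s2s_options({"provider": "google"}))
```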

Pipeline

```
transport.input → user_agg (VAD)
  → S2S_LLM (OpenAI Realtime / Gemini Live WebSocket)
  → transport.output
  → context_aggregator.assistant → observability
```
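The stage order can be written out as a plain list to show that no STT or TTS stage sits between the transport and the model. The strings are labels copied from the diagram, not importable objects. `s2s_stage_order` is a hypothetical helper; the real pipeline is assembled by TurnCall itself.

```python
from typing import List

# Service behind the S2S_LLM stage for each provider, per the diagram.
S2S_SERVICES = {"openai": "OpenAI Realtime", "google": "Gemini Live"}


def s2s_stage_order(provider: str) -> List[str]:
    """Return S2S pipeline stage labels in frame order (illustrative only)."""
    if provider not in S2S_SERVICES:
        raise ValueError(f"unknown S2S provider: {provider!r}")
    return [
        "transport.input",
        "user_agg (VAD)",
        f"S2S_LLM ({S2S_SERVICES[provider]})",
        "transport.output",
        "context_aggregator.assistant",
        "observability",
    ]


print(" -> ".join(s2s_stage_order("openai")))
```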

Required API Keys

Provider	Environment Variable
OpenAI Realtime	`OPENAI_API_KEY`
Gemini Live	`GOOGLE_API_KEY`
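A startup check for the key table might look like the sketch below. It assumes the `openai` provider maps to OpenAI Realtime and `google` maps to Gemini Live, as the options table implies. `missing_api_key` is a hypothetical helper; an `env` mapping can be passed in so the check is testable without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Environment variable required by each S2S provider, per the table above.
REQUIRED_KEYS = {"openai": "OPENAI_API_KEY", "google": "GOOGLE_API_KEY"}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset key for `provider`, or None if it is set."""
    env = os.environ if env is None else env
    var = REQUIRED_KEYS[provider]
    return None if env.get(var) else var


print(missing_api_key("google", {"OPENAI_API_KEY": "sk-test"}))
```

An empty value counts as missing, which catches keys that are declared but never filled in.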

S2S mode cannot be combined with `voicemail_detection.enabled: true`.
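This incompatibility can be caught before an agent starts, as in the minimal sketch below. It assumes the config is a dict shaped like the JSON example, with an optional `voicemail_detection` object. `check_s2s_compatibility` is a hypothetical validation helper, not a TurnCall function.

```python
def check_s2s_compatibility(config: dict) -> None:
    """Raise ValueError if S2S mode is combined with voicemail detection (hypothetical check)."""
    is_s2s = config.get("pipeline_mode") == "s2s"
    # Treat a missing or null voicemail_detection block as disabled.
    voicemail_on = bool((config.get("voicemail_detection") or {}).get("enabled", False))
    if is_s2s and voicemail_on:
        raise ValueError(
            "pipeline_mode 's2s' cannot be combined with voicemail_detection.enabled: true"
        )


check_s2s_compatibility({"pipeline_mode": "s2s", "voicemail_detection": {"enabled": False}})  # passes
try:
    check_s2s_compatibility({"pipeline_mode": "s2s", "voicemail_detection": {"enabled": True}})
except ValueError as err:
    print(err)
```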
