orchestrator/; every other module is framework-agnostic.
System Overview
Inbound Call Flow (Twilio)
Dynamic routing: if the phone numberβs
routing_target_type is webhook, TurnCall POSTs a call-init request to your server first and applies the returned agent / variables / knowledge context before the pipeline starts. See Pre-Call Init.Other entry points
Outbound call
Outbound call
POST /v1/calls/outbound creates the Call record and initiates the Twilio call β Twilio hits /webhooks/twilio/voice/outbound β the handler resolves the agent from the Call record by CallSid β same pipeline as inbound.Browser (WebRTC)
Browser (WebRTC)
POST /v1/webrtc/connect with an SDP offer β SmallWebRTCRequestHandler creates the connection and returns the SDP answer β PATCH /v1/webrtc/connect trickles ICE candidates β audio flows peer-to-peer at 16kHz into the same pipeline.WhatsApp voice
WhatsApp voice
Meta POSTs
/webhooks/whatsapp (field calls) β signature validated β Pipecat WhatsAppClient handles the WebRTC SDP exchange β 16kHz SmallWebRTCTransport pipeline.SMS / Chat (text)
SMS / Chat (text)
Inbound text β resolve session (24h TTL) β build LLM history β chat completion β reply. No Pipecat pipeline β itβs a text path through
services/.Real-time Pipeline
Two pipeline modes, selected per agent viapipeline_mode.
Cascade (default, ~800ms)
Optional stages (dashed in the code): VoicemailDetector (outbound), KnowledgeRetrieval (auto-mode RAG), video avatar (HeyGen/Tavus, WebRTC + cascade only). Transcript taps sit after STT and after the LLM to record both sides.Speech-to-Speech (~300ms)
A single model handles STT + reasoning + TTS natively over one WebSocket, so thestt/llm/tts config fields are ignored.
Twilio media is 8kHz ΞΌ-law on the wire; the serializer converts to/from PCM16. S2S models run at 24kHz, so an internal resampler bridges the rates.
Call State Machine
Module Responsibilities
| Module | Purpose |
|---|---|
api/ | REST API β projects, agents, phone numbers, calls, webhooks, WebRTC, chat |
auth/ | API key generation (SHA-256, tc_ prefix), RBAC, project scoping |
domain/ | Immutable Pydantic models, enums, call + session state machines |
orchestrator/ | Pipecat pipeline β all Pipecat imports isolated here |
services/ | Call control, SMS/chat, retrieval, analysis, weighted routing, template rendering |
storage/ | SQLAlchemy async models, repository pattern, PostgreSQL + Redis |
adapters/ | Object storage (local filesystem, S3) |
events/ | Webhook delivery, server events, signing |
webhooks/ | Twilio + WhatsApp handlers, media stream WebSocket |
Data Model
| Table | Purpose |
|---|---|
projects | Tenant boundary |
api_keys | Auth (hashed, prefix-indexed) |
agents | Versioned voice agent configs (JSONB config_blob) |
phone_numbers | Twilio number β agent routing (incl. weighted A/B) |
calls | Call records with state machine |
call_events | Event log β transcripts, tools, transfers |
tool_invocations | Tool execution audit (input/output/latency) |
webhook_subscriptions | Outbound webhook config |
knowledge_bases Β· documents Β· document_chunks | RAG β metadata, files, pgvector embeddings |
agent_knowledge_bases | Agent β KB links with retrieval mode |
sms_sessions Β· sms_messages | SMS/chat session + message history |
test_suites Β· test_runs | Agent test scenarios + results |