TurnCall is a modular monolith (FastAPI) that orchestrates real-time voice AI agents over phone calls, browser WebRTC, WhatsApp, and SMS/chat. All Pipecat code is isolated in orchestrator/; every other module is framework-agnostic.

System Overview

Inbound Call Flow (Twilio)

Dynamic routing: if the phone number’s routing_target_type is webhook, TurnCall POSTs a call-init request to your server first and applies the returned agent / variables / knowledge context before the pipeline starts. See Pre-Call Init.
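
As a rough illustration of the dynamic-routing step, the sketch below overlays a call-init response on the number's default context before the pipeline starts. The `CallContext` type, the response keys (`agent_id`, `variables`, `knowledge_context`), and the rule that missing keys fall back to the default are all assumptions made for this example; the real request and response schema is described in Pre-Call Init.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CallContext:
    """Hypothetical container for what the pipeline needs before it starts."""
    agent_id: str
    variables: Mapping[str, Any] = field(default_factory=dict)
    knowledge_context: Optional[str] = None


def apply_call_init(default: CallContext, response: Mapping[str, Any]) -> CallContext:
    """Overlay a call-init webhook response on the number's default context.

    Keys absent from the response leave the default untouched (an assumption).
    """
    return replace(
        default,
        agent_id=response.get("agent_id", default.agent_id),
        variables={**default.variables, **response.get("variables", {})},
        knowledge_context=response.get("knowledge_context", default.knowledge_context),
    )


default = CallContext(agent_id="agent_default", variables={"locale": "en"})
ctx = apply_call_init(default, {"agent_id": "agent_vip", "variables": {"name": "Ada"}})
```

Here `ctx` carries the webhook's agent and the merged variables, while an empty response would leave the default context unchanged.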

Other entry points

Outbound call: POST /v1/calls/outbound creates the Call record and initiates the Twilio call → Twilio hits /webhooks/twilio/voice/outbound → the handler resolves the agent from the Call record by CallSid → same pipeline as inbound.
Browser WebRTC: POST /v1/webrtc/connect with an SDP offer → SmallWebRTCRequestHandler creates the connection and returns the SDP answer → PATCH /v1/webrtc/connect trickles ICE candidates → audio flows peer-to-peer at 16kHz into the same pipeline.
WhatsApp call: Meta POSTs /webhooks/whatsapp (field calls) → signature validated → Pipecat WhatsAppClient handles the WebRTC SDP exchange → 16kHz SmallWebRTCTransport pipeline.
SMS/chat: Inbound text → resolve session (24h TTL) → build LLM history → chat completion → reply. No Pipecat pipeline is involved; it’s a text path through services/.

Real-time Pipeline

Two pipeline modes, selected per agent via pipeline_mode.
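
A minimal sketch of per-agent mode selection follows. The mode identifier `"s2s"`, the stage names, and the `plan_stages` helper are hypothetical; only `pipeline_mode`, the cascade default, and the fact that speech-to-speech ignores the stt/llm/tts fields come from this page.

```python
from typing import Any, Dict, List

CASCADE = "cascade"       # default mode
SPEECH_TO_SPEECH = "s2s"  # hypothetical identifier for speech-to-speech


def plan_stages(agent_config: Dict[str, Any]) -> List[str]:
    """Return an ordered list of stage names for the agent's pipeline mode."""
    mode = agent_config.get("pipeline_mode", CASCADE)
    if mode == CASCADE:
        # Separate STT, LLM, and TTS services chained together.
        return ["transport_in", "stt", "llm", "tts", "transport_out"]
    if mode == SPEECH_TO_SPEECH:
        # One model covers all three roles, so stt/llm/tts config is not read.
        return ["transport_in", "s2s_model", "transport_out"]
    raise ValueError(f"unknown pipeline_mode: {mode!r}")
```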

Cascade (default, ~800ms)

Optional stages (dashed in the code): VoicemailDetector (outbound), KnowledgeRetrieval (auto-mode RAG), video avatar (HeyGen/Tavus, WebRTC + cascade only). Transcript taps sit after STT and after the LLM to record both sides.

Speech-to-Speech (~300ms)

A single model handles STT + reasoning + TTS natively over one WebSocket, so the stt/llm/tts config fields are ignored.
Twilio media is 8kHz μ-law on the wire; the serializer converts to/from PCM16. S2S models run at 24kHz, so an internal resampler bridges the rates.
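
To make the rate bridging concrete, the sketch below decodes G.711 μ-law bytes to PCM16 and upsamples 8kHz to 24kHz by a factor of three. It is an illustration only: the function names are hypothetical, and the linear interpolation is a deliberately naive stand-in for whatever filtered resampler the pipeline actually uses.

```python
from typing import List

ULAW_BIAS = 0x84  # 132, the standard G.711 mu-law bias


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF               # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def upsample_linear(samples: List[int], factor: int) -> List[int]:
    """Naive integer-factor upsampling; the last sample is held."""
    out: List[int] = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        for k in range(factor):
            out.append(cur + (nxt - cur) * k // factor)
    return out


def twilio_to_model_rate(payload: bytes) -> List[int]:
    """8kHz mu-law bytes -> 24kHz PCM16 samples (24000 / 8000 = 3)."""
    pcm_8k = [ulaw_to_pcm16(b) for b in payload]
    return upsample_linear(pcm_8k, 3)
```

A 20ms Twilio frame of 160 μ-law bytes becomes 480 samples at 24kHz under this scheme.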

Call State Machine

Module Responsibilities

| Module | Purpose |
| --- | --- |
| api/ | REST API: projects, agents, phone numbers, calls, webhooks, WebRTC, chat |
| auth/ | API key generation (SHA-256, tc_ prefix), RBAC, project scoping |
| domain/ | Immutable Pydantic models, enums, call + session state machines |
| orchestrator/ | Pipecat pipeline; all Pipecat imports are isolated here |
| services/ | Call control, SMS/chat, retrieval, analysis, weighted routing, template rendering |
| storage/ | SQLAlchemy async models, repository pattern, PostgreSQL + Redis |
| adapters/ | Object storage (local filesystem, S3) |
| events/ | Webhook delivery, server events, signing |
| webhooks/ | Twilio + WhatsApp handlers, media stream WebSocket |
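
The API key handling in auth/ can be approximated as follows. This is a sketch under assumptions: the random-token length, the 8-character lookup prefix, and the helper names are invented for illustration; only the tc_ prefix and the SHA-256 hashing are stated on this page.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

KEY_PREFIX = "tc_"
LOOKUP_PREFIX_LEN = 8  # hypothetical length of the indexed, non-secret prefix


def generate_api_key() -> Tuple[str, str, str]:
    """Return (raw_key, lookup_prefix, sha256_hex).

    Only the prefix and the hash would be stored; the raw key is shown once.
    """
    raw = KEY_PREFIX + secrets.token_urlsafe(32)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, raw[:LOOKUP_PREFIX_LEN], digest


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)
```

Looking a key up by its prefix first and then comparing hashes avoids scanning every stored hash on each request.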

Data Model

| Table | Purpose |
| --- | --- |
| projects | Tenant boundary |
| api_keys | Auth (hashed, prefix-indexed) |
| agents | Versioned voice agent configs (JSONB config_blob) |
| phone_numbers | Twilio number → agent routing (incl. weighted A/B) |
| calls | Call records with state machine |
| call_events | Event log: transcripts, tools, transfers |
| tool_invocations | Tool execution audit (input/output/latency) |
| webhook_subscriptions | Outbound webhook config |
| knowledge_bases · documents · document_chunks | RAG: metadata, files, pgvector embeddings |
| agent_knowledge_bases | Agent ↔ KB links with retrieval mode |
| sms_sessions · sms_messages | SMS/chat session + message history |
| test_suites · test_runs | Agent test scenarios + results |
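
Weighted A/B routing from a phone number to an agent might look like the sketch below. The `(agent_id, weight)` pair format and the `pick_agent` helper are assumptions; how the weights are actually stored on phone_numbers is not described here.

```python
import random
from typing import Optional, Sequence, Tuple


def pick_agent(
    targets: Sequence[Tuple[str, float]],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one agent id with probability proportional to its weight."""
    if not targets:
        raise ValueError("no routing targets configured")
    agent_ids = [agent_id for agent_id, _ in targets]
    weights = [weight for _, weight in targets]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and sum to more than zero")
    rng = rng or random.Random()
    return rng.choices(agent_ids, weights=weights, k=1)[0]
```

Accepting an optional `random.Random` instance lets tests seed the choice while production code uses a fresh generator.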