Why Navan Should Use LiveKit

A comprehensive proposal covering the LiveKit open-source framework, recommended architecture, and the advantages of LiveKit Cloud for hosting your voice AI infrastructure

Current State

Navan Voice Agent Current Architecture

Twilio SIP Trunk + Azure-hosted Voice Agent with OpenAI LLM

[Architecture diagram] End User (PSTN / Mobile) → Twilio SIP Trunk → WebSockets → Voice Agent on Azure (Navan Agent, Azure Container Apps) → WebSockets → OpenAI (GPT-4 / Realtime API)
1. Inbound Call: User dials in via PSTN; Twilio receives the call and converts it to SIP

2. SIP Trunking: The Twilio SIP Trunk bridges audio to Azure over WebSockets

3. Agent Processing: The Voice Agent on Azure handles STT, TTS, and conversation flow

4. LLM Response: OpenAI generates responses, which are streamed back to the user

Disadvantages of This Architecture

1. Limited to OpenAI Realtime API Capabilities

Every new feature request becomes an architecture crisis

Building on an Incomplete and Unstable Foundation

OpenAI's Realtime API handles only real-time voice exchange; everything else requires workarounds

PM wants to add screen sharing to support calls? Can't do it.

Product wants an AI avatar for branding? Impossible.

Customer success needs conversation recording with custom metadata? Not supported.

Sales wants multi-agent handoffs for specialized expertise? Rewrite everything.

You're constantly telling stakeholders 'no' because the API only supports voice-in, voice-out. Each workaround means duct-taping external services together, creating technical debt and maintenance nightmares.

Each Workaround Creates More Problems
  • Separate recording infrastructure (Recall, MeetKay, custom builds)
  • Bolt-on analytics that can't see inside the conversation flow
  • Hacked interruption handling that feels janky to users
  • Zero ability to inject visual context or tools mid-conversation
  • Manual session management code that's fragile and hard to test
2. Building Your Own Framework for Voice Agents vs. Leveraging Open Source

Without an open-source framework, you receive no ongoing updates to voice features: better turn detection, improved barge-in handling, or support for new models and plugins

Building Voice AI Without an Open-Source Framework

Every feature you need, you must build and maintain yourself

  • DIY: STT → LLM → TTS Pipeline
  • DIY: Turn Detection
  • DIY: Barge-in Handling
  • DIY: LLM Orchestration
  • DIY: Multi-modal (Voice/Video/Text)
  • DIY: Tool Use & Function Calling
  • DIY: Multi-agent Handoff
  • DIY: Provider Integrations
The Result
  • Months of engineering time to build core features
  • No community contributions or shared improvements
  • Every new AI model requires custom integration
  • Security patches are your responsibility