Voice AI · Pipecat · Plivo · Deepgram · Sarvam · Gemini Flash · Python · AWS EC2

Wavelength: AI Voice Agent Platform

Built a full voice AI pipeline from the transport layer up. Not an API wrapper. Real telephony, real conversations, real sales outcomes.

1,436 calls

In a single campaign batch

54.7%

Call connection rate

16.4%

Warm/hot qualification rate

11 versions

Prompt iterations from real logs

The Problem

Our own company (Freedom With AI) runs weekly webinars for a community of 480,000+ learners. We needed to qualify leads, recover no-shows, and warm up registrants before each session. Doing this manually with a team of telecallers was expensive, inconsistent, and couldn't scale. We needed AI agents that could have real phone conversations — not chatbots, actual voice calls over telephony.

The existing solutions (Vapi, Retell) were either too expensive at scale or didn't give us the control we needed over conversation design, voice quality, and call flow logic.

So we built our own.

What We Built

A full voice AI pipeline called Wavelength, built from the ground up:

  • Pipecat (open-source voice AI framework) for orchestration
  • Plivo for telephony (SIP trunking, call recording)
  • Deepgram and Sarvam Saaras V3 for speech-to-text (Sarvam specifically chosen for Indian English and Hindi accuracy)
  • Google Gemini Flash for LLM (chosen for speed, evaluated against Groq + Llama 4 Maverick)
  • Gemini Flash TTS for text-to-speech (migrated from Google Cloud Chirp3)

We didn't use a no-code voice AI builder. We built the pipeline from the transport layer up, including writing a custom stateful overlap-save resampler to fix audio chunking discontinuities that were causing garbled speech at chunk boundaries.
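The core of the chunk-boundary fix is carrying resampler state from one chunk to the next, so the stream resamples as if it were continuous. The sketch below is a simplified stateful linear-interpolation resampler, not the production overlap-save DSP; it illustrates why statelessness garbles chunk boundaries.

```python
class StatefulResampler:
    """Linear-interpolation resampler that carries fractional phase and
    the previous chunk's last sample across calls, so chunk boundaries
    stay continuous.

    Simplified sketch: the production fix used an overlap-save design;
    this shows the stateful idea, not the exact filter.
    """

    def __init__(self, src_rate: int, dst_rate: int):
        self.ratio = src_rate / dst_rate  # input samples per output sample
        self.phase = 0.0                  # fractional read position
        self.last = None                  # final sample of previous chunk

    def process(self, chunk: list[float]) -> list[float]:
        # Prepend the previous chunk's last sample so interpolation can
        # cross the boundary instead of resetting (the reset is what
        # causes pops and garbled transitions).
        samples = ([self.last] if self.last is not None else []) + list(chunk)
        out = []
        pos = self.phase
        while pos < len(samples) - 1:
            i = int(pos)
            frac = pos - i
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
            pos += self.ratio
        # Carry leftover phase and the boundary sample into the next call.
        self.phase = pos - (len(samples) - 1)
        self.last = samples[-1]
        return out
```

With this state carried over, resampling a stream chunk-by-chunk produces exactly the same samples as resampling it in one shot, which is the property the stateless version lacked.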

The Numbers

Metric                         Value              Context
Calls in single batch          1,436              Outbound warm-up campaign for one webinar
Call connection rate           54.7%              Industry average for cold outbound is ~30-40%
Engaged beyond 30s             25.9%              Meaning the AI held a real conversation
Engaged beyond 60s             20.3%              Deep conversations with qualification
Warm/Hot qualification rate    16.4%              From total calls, not just connected
Bot personas built             10+                Sales, qualifier, no-show recovery, warm-up, onboarding, attendance
Prompt iterations              v1 through v11     Each version based on transcript analysis of real calls
Bot silence bug                23.9% → under 5%   Diagnosed as concurrency issue: 46% silence at 6 PM peak, 0% at off-peak
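The silence-bug diagnosis in the last row came from slicing call logs by hour of day. A minimal sketch of that kind of breakdown follows; the log field names and the sample numbers are made up for illustration, not the real campaign data.

```python
from collections import defaultdict

def silence_rate_by_hour(calls):
    """Group connected calls by hour and compute the share that went
    silent. `calls` is an iterable of dicts with hypothetical fields
    'hour' (0-23) and 'silent' (bool)."""
    totals = defaultdict(int)
    silent = defaultdict(int)
    for call in calls:
        totals[call["hour"]] += 1
        silent[call["hour"]] += call["silent"]  # bool counts as 0/1
    return {h: silent[h] / totals[h] for h in sorted(totals)}

# Toy log entries shaped like the real pattern (silence spiking at peak):
logs = (
    [{"hour": 17, "silent": False}] * 99 + [{"hour": 17, "silent": True}]
    + [{"hour": 18, "silent": True}] * 46 + [{"hour": 18, "silent": False}] * 54
)
rates = silence_rate_by_hour(logs)
# rates[17] == 0.01, rates[18] == 0.46
```

A rate that tracks load rather than time of day is what pointed away from the prompt or model and toward concurrency.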

Key Engineering Challenges

  1. Audio Frame Loss (BOT_VAD_STOP_SECS): Calls were cutting out mid-sentence. Traced to a Voice Activity Detection timeout that was too aggressive. The bot was interpreting natural speech pauses as end-of-utterance, killing the audio stream. Fixed by tuning VAD parameters and implementing a buffer window.
  2. Audio Chunking Discontinuities: TTS output was arriving in chunks that didn't align at sample boundaries, causing audible pops and garbled transitions. Built a stateful overlap-save resampler that maintains phase continuity across chunk boundaries. This is low-level DSP work, not prompt engineering.
  3. 23.9% Silent Call Rate: Nearly 1 in 4 connected calls had the bot go completely silent after the greeting. Data analysis revealed it was a concurrency problem: at 3 PM (228 calls) silence was 30%, at 6 PM (196 calls) it hit 46%, but at 5 PM (249 calls) it was only 1%. Fixed by implementing call staggering and connection pooling.
  4. STT Aggregation Bug: Roughly 24% of calls were going silent after the initial greeting because partial transcripts were not being assembled into complete utterances. Diagnosed by analyzing transcript patterns across 786 connected calls.

Want to build something like this? Let's talk.