Last episode we talked about the challenges teams face building production voice AI systems. This time, Aidan and Jack share the practical tips, tricks, and hacks they've learned from dozens of teams building voice agents in the real world.

From audio isolation tools to parallel transcription, from thinking noises to turn-taking tuning—these are the techniques that actually work.

What We Cover

Audio Quality & Transcription

Audio isolation tools like Krisp and AI Acoustics that "solve challenges overnight"
Using LLMs to filter out background noise from transcriptions
The simple fix: prompting agents to ask users to repeat themselves
Parallel transcription with fast/slow models running simultaneously
Post-call processing with higher-quality models for summaries and verification
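
For readers who want to try the fast/slow idea, here is a minimal sketch of parallel transcription. It is not the setup of any team mentioned in the episode. The two transcriber functions are hypothetical stand-ins (simulated with sleeps) for whatever speech-to-text clients you use, and the `slow_budget_s` parameter is an assumed name for how long you are willing to wait for the more accurate model after the fast one has answered.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

# A transcriber takes raw audio bytes and returns text.
# Real STT clients would go here; these signatures are assumptions.
Transcriber = Callable[[bytes], Awaitable[str]]


async def transcribe_parallel(
    audio: bytes,
    fast: Transcriber,
    slow: Transcriber,
    slow_budget_s: float = 0.5,
) -> Tuple[str, str]:
    """Run both models at once; prefer the slow result if it arrives in time.

    Returns (transcript, source) where source is "slow" or "fast".
    """
    # Start both transcriptions immediately so they overlap.
    fast_task = asyncio.create_task(fast(audio))
    slow_task = asyncio.create_task(slow(audio))

    # The fast transcript is the fallback, so always wait for it.
    fast_text = await fast_task

    try:
        # Give the slow model a little extra time after the fast one is done.
        slow_text = await asyncio.wait_for(slow_task, timeout=slow_budget_s)
        return slow_text, "slow"
    except asyncio.TimeoutError:
        # wait_for cancels the slow task on timeout; use the fast transcript.
        return fast_text, "fast"


# Hypothetical stand-ins for real models, used only for illustration.
async def fake_fast(audio: bytes) -> str:
    await asyncio.sleep(0.01)
    return "book a table for too"


async def fake_slow(audio: bytes) -> str:
    await asyncio.sleep(0.05)
    return "book a table for two"
```

Calling `asyncio.run(transcribe_parallel(b"", fake_fast, fake_slow, slow_budget_s=1.0))` picks the slow transcript, while a very small budget falls back to the fast one. The same slow model could also be run again after the call for summaries and verification, where latency no longer matters.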

Speed vs. Accuracy Trade-offs

Deliberate pauses to buy time for slower, more accurate transcription
Language-specific model performance (Mistral for German, etc.)
Why teams keep evaluating new models—and why it's so time-consuming
WhatsApp voice memos as a transcription quality hack

Conversation Flow & UX

Thinking noises: why silence breaks conversations
The elevator mirror principle—distraction beats optimization
Proactive agents that ask "can you hear me?" during silence
Turn-taking: why being patient beats being eager
Using IPA (International Phonetic Alphabet) for brand name pronunciation
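
One common way to pass IPA to a text-to-speech engine is the SSML `phoneme` element. The sketch below wraps known brand names in that element before the text is sent for synthesis. Whether your TTS provider accepts SSML and the `ipa` alphabet is an assumption to check against its documentation, and the brand "Acme" with its transcription is a made-up example, not one from the episode.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in an SSML phoneme element carrying its IPA transcription."""
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(word)}</phoneme>"


def to_ssml(text: str, lexicon: Dict[str, str]) -> str:
    """Return SSML where every lexicon word is tagged with its pronunciation.

    `lexicon` maps a brand name to its IPA string (hypothetical helper input).
    """
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"

    # Longest names first so "Acme Cloud" wins over "Acme" if both are listed.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so "&" and "<" stay valid XML.
        parts.append(escape(text[last:match.start()]))
        parts.append(phoneme(match.group(0), lexicon[match.group(0)]))
        last = match.end()
    parts.append(escape(text[last:]))

    return "<speak>" + "".join(parts) + "</speak>"
```

For example, `to_ssml("Welcome to Acme & Co.", {"Acme": "ˈækmi"})` tags only the brand name and leaves the rest of the sentence as escaped plain text. Keeping the lexicon in one place makes it easy to add names as you hear the agent mispronounce them.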

Hybrid Approaches

Text as a fallback when voice isn't working
Boardy.ai as an example of mixing voice and text effectively

Timestamps

— Intro: tips & tricks from production teams
— Audio isolation tools (Krisp, AI Acoustics)
— LLM filtering for background noise
— Prompting agents to ask users to repeat themselves
— Parallel transcription with fast/slow models
— Post-call processing for better transcripts
— Speed vs accuracy trade-offs
— Language-specific model performance
— WhatsApp voice memo hack
— Text as a fallback, Boardy.ai example
— Thinking noises and conversation flow
— The elevator mirror story
— Proactive silence handling
— Turn-taking: patience over eagerness
— IPA for pronunciation + wrap-up

Resources

Mentioned in This Episode

Krisp — AI-powered noise cancellation
AI Acoustics — Audio isolation (powers Sennheiser/Bose)
Boardy.ai — Voice + text hybrid onboarding example
Deepgram Flux — Speech-to-text with embedded turn-taking

Have a topic you'd like us to cover? Reach out on X @uselayercode or email us at podcast@layercode.com

Tips and tricks for reliable voice AI agents
