Enhanced diarization + Dovetail case study + G2 wins | July 18, 2025 Newsletter
AssemblyAI's new speaker diarization model delivers 30% better accuracy in noisy audio. Plus: Build 465ms voice agents & learn how Dovetail improved WER by 36%.



Hey 👋, welcome to the AssemblyAI Weekly Newsletter! Each week, we share our latest product improvements, in-depth tutorials, real-world case studies, and the best content from our developer community.
Revolutionary Speaker Diarization: 30% Accuracy Improvement Now Live
The Challenge of Real-World Audio
If you've worked with audio transcription, you know that clean recording conditions are rare. Conference rooms have background noise, voices overlap during discussions, and remote participants join with varying audio quality. These scenarios have traditionally been challenging for speaker diarization models—which is why we've focused our latest improvements here.
Breaking Down the Numbers
Our team has released new speaker embedding improvements with notable updates for noisy and far-field audio:
- Overall accuracy improvement: Error rates dropped from 29.1% to 20.4%, a relative reduction of about 30%
- Short segment performance: 43% improvement for ultra-short segments (250ms)
- Reverberant environments: 57% improvement in echo-prone spaces
- Industry-leading speaker count accuracy: Just 2.9% error rate in identifying the number of speakers
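The headline figure is a relative reduction, not a drop in percentage points. As a quick check using the two error rates quoted above (the helper name is ours, for illustration only):

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Error rates quoted in the list above: 29.1% before, 20.4% after.
absolute_drop = 29.1 - 20.4                      # percentage points
relative_drop = relative_reduction(29.1, 20.4)   # percent of the original rate

print(f"absolute drop: {absolute_drop:.1f} percentage points")  # 8.7
print(f"relative drop: {relative_drop:.1f}%")                   # 29.9%
```

The 8.7-point drop against a 29.1% baseline works out to roughly 29.9%, which rounds to the 30% in the headline.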
The Technology Behind the Breakthrough
This isn't just an incremental update—it's a fundamental reimagining of how speaker diarization should work. Our new speaker embedding model incorporates several key innovations:
- Updated embedding architecture: Better identification and extraction of acoustic features to differentiate between speakers in challenging conditions
- Higher resolution processing: Increased input sampling frequency from 8 kHz to 16 kHz for capturing more voice characteristics
- Improved noise handling: Better differentiation between speech and background noise
Real-World Impact
For teams processing customer calls, meeting recordings, or field interviews, these improvements can help with:
- More accurate call analytics and coaching insights
- Better meeting transcription with clear speaker attribution
- Reduced manual correction time for transcription services
- Enhanced accessibility for multi-speaker content
These improvements are available now for all customers—no code changes required. Enable speaker diarization in your API calls to use the enhanced model. Read more details here.
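As a rough sketch of what enabling speaker diarization looks like, the snippet below builds (but does not send) a transcription request with the `speaker_labels` option turned on. The endpoint, header, and field names reflect our reading of AssemblyAI's public REST API and should be checked against the current docs. The API key and audio URL are placeholders.

```python
import json
import urllib.request

# Assumed endpoint for submitting a transcription job; verify against the docs.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_diarization_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build a POST request asking for a transcript with speaker labels."""
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,  # turns on speaker diarization
    }
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "authorization": api_key,
            "content-type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_diarization_request("YOUR_API_KEY", "https://example.com/meeting.mp3")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit the job. Existing integrations that already set `speaker_labels` should not need any change to pick up the new model.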
Industry Recognition: G2 Summer 2025 Voice Recognition Reports
Leading the Pack in 10 Categories
AssemblyAI received top rankings across 10 categories in G2's Summer 2025 Voice Recognition Reports. Based on customer reviews, our customers consistently highlight:
- Ease of Use: 9.9/10 rating (industry average: 8.3)
- Quality of Support: 9.6/10 rating (industry average: 8.3)
- Developer Experience: Customers note well-documented APIs and consistent updates
- Accuracy: Strong performance particularly for challenging audio
These rankings are based on peer reviews from actual customers using the platform in production. View the full report here.
Technical Deep Dive: Building the Lowest Latency Voice Agent with Vapi
Achieving ~465ms End-to-End Latency
Conversational AI requires minimal latency to feel natural. We've created a guide with Vapi on building voice agents that achieve approximately 465ms end-to-end latency.
The Latency Breakdown
Understanding where latency comes from is the first step to eliminating it:
- Speech-to-Text (STT): ~90ms with AssemblyAI's Universal-Streaming
- Language Model Processing: ~200ms with optimized LLMs
- Text-to-Speech (TTS): ~150ms with modern TTS engines
- Network overhead: ~25ms (varies by location)
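The four stages above add up to the ~465ms headline figure. A small sketch that totals the budget and shows each stage's share (the dictionary keys are our own labels, and the values are the approximate numbers from the list):

```python
# Approximate per-stage latencies in milliseconds, taken from the breakdown above.
LATENCY_BUDGET_MS = {
    "speech_to_text": 90,
    "language_model": 200,
    "text_to_speech": 150,
    "network_overhead": 25,
}


def summarize_budget(budget: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total latency and each stage's share of it, in percent."""
    total = sum(budget.values())
    shares = {stage: ms / total * 100.0 for stage, ms in budget.items()}
    return total, shares


total_ms, shares = summarize_budget(LATENCY_BUDGET_MS)
print(f"end-to-end: ~{total_ms} ms")
# List stages from largest to smallest contribution.
for stage, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{stage:>16}: {LATENCY_BUDGET_MS[stage]:>3} ms ({share:.0f}%)")
```

With these numbers the language model accounts for about 43% of the total, so it is the stage where a faster option saves the most time.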
Key Optimization Strategies
Our guide reveals several crucial optimizations that many developers miss:
- Disable STT formatting: Setting format_turns: false reduces processing time, as modern LLMs can handle unformatted transcripts
- Configure turn detection: Proper configuration of Vapi's startSpeakingPlan can reduce response delays
- Model selection: AssemblyAI's Universal-Streaming API delivers transcripts in approximately 90ms
- Network optimization: Strategic deployment and edge computing can reduce network latency
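As an illustration of the first optimization, the sketch below assembles a streaming connection URL with formatting turned off. The WebSocket host, path, and the `sample_rate` parameter are assumptions based on our reading of the Universal-Streaming docs, so confirm them before use. Only `format_turns` comes from the list above.

```python
from urllib.parse import urlencode

# Assumed Universal-Streaming WebSocket endpoint; verify against the docs.
STREAMING_ENDPOINT = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate: int = 16000, format_turns: bool = False) -> str:
    """Build a connection URL; format_turns defaults to off to save processing time."""
    params = {
        "sample_rate": sample_rate,                 # assumed parameter name
        "format_turns": str(format_turns).lower(),  # "false" skips transcript formatting
    }
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


print(build_streaming_url())
# wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&format_turns=false
```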
Customer Success Story: How Dovetail Improved WER by 36%
The Challenge
Dovetail, a customer intelligence platform used by companies including Amazon and Canva, processes customer feedback from interviews, support calls, and product reviews. Accurate transcription is essential for their analysis capabilities.
As Doug Rathbone, VP of Engineering at Dovetail, notes: "The accuracy of transcription is critical to the value we deliver to our customers. Humans communicate in a very messy fashion."
The Solution
After conducting an in-depth comparison of transcription providers, Dovetail chose AssemblyAI. The results speak for themselves:
- 36% improvement in Word Error Rate (WER)
- Significant enhancement in speaker diarization accuracy
- Better handling of accents and conversational speech
- Seamless integration through AWS Marketplace
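For context, WER counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal sketch of the metric, using made-up sentences rather than Dovetail's data:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("jumps" -> "jumped") and one deletion ("the").
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # 2 errors / 9 words = 0.222
```

A 36% improvement in WER is a relative change in this ratio between the providers Dovetail compared.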
The Impact
This partnership has enabled Dovetail to:
- Process and analyze customer conversations 3× faster
- Deliver more accurate insights to their 4,000+ customers
- Build advanced AI features like auto-summarization and real-time coaching agents
- Scale their platform while maintaining transcription quality
"These results made it really simple for us to use AssemblyAI as our preferred transcription provider," says Rathbone. The AWS Marketplace integration helped streamline procurement: "It made it easier to just think about AssemblyAI as part of our rolled-up cloud costs, which helps us stay nimble."
Read the complete case study here.
Get Started Today
Ready to experience these improvements for yourself? Here's how to get started:
- Test in the Playground: Try our new speaker diarization model with your own audio files at assemblyai.com/playground
- Read the Documentation: Explore our comprehensive guides at assemblyai.com/docs
- Join the Community: Connect with other developers on our Discord server
- Start Building: Sign up for a free API key and get $50 in credits to experiment at assemblyai.com/dashboard/signup
The advancements we're announcing today—from our 30% improvement in speaker diarization to the ultra-low latency voice agent configurations—represent our ongoing commitment to pushing the boundaries of Speech AI technology. Whether you're building the next generation of customer service tools, creating accessible content, or innovating in voice-first applications, AssemblyAI provides the foundation you need to succeed.
Join thousands of developers and companies who are already building the future with AssemblyAI. The voice revolution is here, and it's more accurate, faster, and more accessible than ever before.
Want to stay updated on the latest in Speech AI? Subscribe to our newsletter and follow us on Twitter and LinkedIn for real-time updates and insights.