Newsletter
July 17, 2025

Enhanced diarization + Dovetail case study + G2 wins | July 18, 2025 Newsletter

AssemblyAI's new speaker diarization model delivers 30% better accuracy in noisy audio. Plus: Build 465ms voice agents & learn how Dovetail improved WER by 36%.

Devon Malloy
Senior Growth Manager

Hey 👋, welcome to the AssemblyAI Weekly Newsletter! Each week, we share our latest product improvements, in-depth tutorials, real-world case studies, and the best content from our developer community.

Revolutionary Speaker Diarization: 30% Accuracy Improvement Now Live

The Challenge of Real-World Audio

If you've worked with audio transcription, you know that clean recording conditions are rare. Conference rooms have background noise, voices overlap during discussions, and remote participants join with varying audio quality. These scenarios have traditionally been challenging for speaker diarization models—which is why we've focused our latest improvements here.

Breaking Down the Numbers

Our team has released new speaker embedding improvements with notable updates for noisy and far-field audio:

  • Overall accuracy improvement: Error rates dropped from 29.1% to 20.4%, a roughly 30% relative reduction
  • Short segment performance: 43% improvement for ultra-short segments (250ms)
  • Reverberant environments: 57% improvement in echo-prone spaces
  • Industry-leading speaker count accuracy: Just 2.9% error rate in identifying the number of speakers
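The headline 30% figure is a relative reduction, not a percentage-point drop. A quick check of the arithmetic from the two error rates quoted above:

```python
# Diarization error rates from the announcement (percent)
before = 29.1
after = 20.4

# Relative reduction: (before - after) / before
relative_reduction = (before - after) / before * 100
print(f"{relative_reduction:.1f}%")  # → 29.9%, which rounds to the quoted 30%
```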

The Technology Behind the Breakthrough

This isn't just an incremental update—it's a fundamental reimagining of how speaker diarization should work. Our new speaker embedding model incorporates several key innovations:

  • Updated embedding architecture: Better identification and extraction of acoustic features to differentiate between speakers in challenging conditions
  • Higher resolution processing: Increased input sampling frequency from 8 kHz to 16 kHz for capturing more voice characteristics
  • Improved noise handling: Better differentiation between speech and background noise

Real-World Impact

For teams processing customer calls, meeting recordings, or field interviews, these improvements can help with:

  • More accurate call analytics and coaching insights
  • Better meeting transcription with clear speaker attribution
  • Reduced manual correction time for transcription services
  • Enhanced accessibility for multi-speaker content

These improvements are available now for all customers—no code changes required. Enable speaker diarization in your API calls to use the enhanced model. Read more details here.
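As a minimal sketch of what "enable speaker diarization in your API calls" looks like, here is the request body for AssemblyAI's `/v2/transcript` endpoint with `speaker_labels` turned on. The `audio_url` is a placeholder, and a real request also needs your API key in the `Authorization` header:

```python
import json

# Request body for AssemblyAI's transcript endpoint; setting
# speaker_labels to True enables speaker diarization (and with it,
# the enhanced model). audio_url below is a placeholder.
payload = {
    "audio_url": "https://example.com/meeting.mp3",
    "speaker_labels": True,
}

body = json.dumps(payload)
print(body)
```

With the official Python SDK, the equivalent is a transcription config with `speaker_labels=True` passed to the transcriber.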

Industry Recognition: G2 Summer 2025 Voice Recognition Reports

Leading the Pack in 10 Categories

AssemblyAI received top rankings across 10 categories in G2's Summer 2025 Voice Recognition Reports. Based on customer reviews, key ratings include:

  • Ease of Use: 9.9/10 rating (industry average: 8.3)
  • Quality of Support: 9.6/10 rating (industry average: 8.3)
  • Developer Experience: Customers note well-documented APIs and consistent updates
  • Accuracy: Strong performance particularly for challenging audio

These rankings are based on peer reviews from actual customers using the platform in production. View the full report here.

Technical Deep Dive: Building the Lowest Latency Voice Agent with Vapi

Achieving ~465ms End-to-End Latency

Conversational AI requires minimal latency to feel natural. We've created a guide with Vapi on building voice agents that achieve approximately 465ms end-to-end latency.

The Latency Breakdown

Understanding where latency comes from is the first step to eliminating it:

  1. Speech-to-Text (STT): ~90ms with AssemblyAI's Universal-Streaming
  2. Language Model Processing: ~200ms with optimized LLMs
  3. Text-to-Speech (TTS): ~150ms with modern TTS engines
  4. Network overhead: ~25ms (varies by location)
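The per-stage numbers above sum to the ~465ms headline figure, which makes the budget easy to sanity-check (and to re-run with your own measured stage latencies):

```python
# Per-stage latency estimates from the guide (milliseconds)
stages = {
    "speech_to_text": 90,   # AssemblyAI Universal-Streaming
    "llm": 200,             # optimized language model
    "text_to_speech": 150,  # modern TTS engine
    "network": 25,          # varies by location
}

total_ms = sum(stages.values())
print(f"End-to-end budget: ~{total_ms}ms")  # → End-to-end budget: ~465ms
```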

Key Optimization Strategies

Our guide reveals several crucial optimizations that many developers miss:

  1. Disable STT formatting: Setting format_turns: false reduces processing time, as modern LLMs can handle unformatted transcripts
  2. Configure turn detection: Proper configuration of Vapi's startSpeakingPlan can reduce response delays
  3. Model selection: AssemblyAI's Universal-Streaming API delivers transcripts in approximately 90ms
  4. Network optimization: Strategic deployment and edge computing can reduce network latency
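To make optimization #1 concrete, here is a sketch of a streaming connection URL with `format_turns` disabled. The endpoint path and `sample_rate` parameter are illustrative assumptions; `format_turns=false` mirrors the setting described in the guide:

```python
from urllib.parse import urlencode

# Illustrative streaming STT connection URL (endpoint path is an
# assumption, not the definitive API). Disabling format_turns skips
# transcript formatting, since modern LLMs handle raw text fine.
params = {
    "sample_rate": 16000,
    "format_turns": "false",
}
url = "wss://streaming.assemblyai.com/v3/ws?" + urlencode(params)
print(url)
```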

Read the complete guide here.

Customer Success Story: How Dovetail Improved WER by 36%

The Challenge

Dovetail, a customer intelligence platform used by companies including Amazon and Canva, processes customer feedback from interviews, support calls, and product reviews. Accurate transcription is essential for their analysis capabilities.

As Doug Rathbone, VP of Engineering at Dovetail, notes: "The accuracy of transcription is critical to the value we deliver to our customers. Humans communicate in a very messy fashion."

The Solution

After conducting an in-depth comparison of transcription providers, Dovetail chose AssemblyAI. The results speak for themselves:

  • 36% improvement in Word Error Rate (WER)
  • Significant enhancement in speaker diarization accuracy
  • Better handling of accents and conversational speech
  • Seamless integration through AWS Marketplace

The Impact

This partnership has enabled Dovetail to:

  • Process and analyze customer conversations 3× faster
  • Deliver more accurate insights to their 4,000+ customers
  • Build advanced AI features like auto-summarization and real-time coaching agents
  • Scale their platform while maintaining transcription quality

"These results made it really simple for us to use AssemblyAI as our preferred transcription provider," says Rathbone. The AWS Marketplace integration helped streamline procurement: "It made it easier to just think about AssemblyAI as part of our rolled-up cloud costs, which helps us stay nimble."

Read the complete case study here.

Get Started Today

Ready to experience these improvements for yourself? Here's how to get started:

  1. Test in the Playground: Try our new speaker diarization model with your own audio files at assemblyai.com/playground
  2. Read the Documentation: Explore our comprehensive guides at assemblyai.com/docs
  3. Join the Community: Connect with other developers on our Discord server
  4. Start Building: Sign up for a free API key and get $50 in credits to experiment at assemblyai.com/dashboard/signup

The advancements we're announcing today—from our 30% improvement in speaker diarization to the ultra-low latency voice agent configurations—represent our ongoing commitment to pushing the boundaries of Speech AI technology. Whether you're building the next generation of customer service tools, creating accessible content, or innovating in voice-first applications, AssemblyAI provides the foundation you need to succeed.

Join thousands of developers and companies who are already building the future with AssemblyAI. The voice revolution is here, and it's more accurate, faster, and more accessible than ever before.

Want to stay updated on the latest in Speech AI? Subscribe to our newsletter and follow us on Twitter and LinkedIn for real-time updates and insights.
