Changelog
Follow along to see weekly accuracy and product improvements.
Automatic Language Detection improvements
We've made improvements to our Automatic Language Detection (ALD) model, yielding increased accuracy, expanded language support, and customizable confidence thresholds.
In particular, we have added support for 10 new languages, including Chinese, Finnish, and Hindi, bringing the total to 17 supported languages in our Best tier. Additionally, we've achieved best-in-class accuracy in 15 of those 17 languages when benchmarked against four leading providers.
Finally, we've added a customizable confidence threshold for ALD, allowing you to set a minimum confidence for the detected language and be alerted if that threshold is not met.
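Here's a minimal sketch of setting a confidence threshold from the Python SDK, assuming your SDK version exposes the language_confidence_threshold setting (the file name is hypothetical):
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Enable ALD and require at least 80% confidence in the detected language
config = aai.TranscriptionConfig(
    language_detection=True,
    language_confidence_threshold=0.8,
)

transcript = aai.Transcriber().transcribe("./multilingual_audio.mp3", config)

# If the detected language falls below the threshold, the transcript errors out
if transcript.status == aai.TranscriptStatus.error:
    print(transcript.error)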
Read more about these recent improvements in our announcement post.
Free Offer improvements
We've made a series of improvements to our Free Offer:
- All new and existing users will get $50 in free credits (equivalent to 135 hours of Best transcription, or 417 hours of Nano transcription)
- All unused free credits will be automatically transferred to a user's account balance upon upgrade to pay-as-you-go pricing
- Free Offer users will now see a tracker in their dashboard showing how many credits they have remaining
- Free Offer users will now have access to the usage dashboard, their billing rates, concurrency limit, and billing alerts
Learn more about our Free Offer on our Pricing page, and then check out our Quickstart in our Docs to get started.
Speaker Diarization improvements
We've made improvements to our Speaker Diarization model, in particular improving robustness when distinguishing between speakers with similar voices.
We've fixed an error in which the last word in a transcript was always attributed to the same speaker as the second-to-last word.
File upload improvements and more
We've made improvements to error handling for file uploads that fail. Now if there is an error, such as a file containing no audio, the following 422 error will be returned:
Upload failed, please try again. If you continue to have issues please reach out to support@assemblyai.com
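For example, here's one way to surface that error when uploading a file directly over HTTP, a sketch using the requests library (the error body shape is assumed to follow our standard error JSON):
import requests

headers = {"authorization": "YOUR-KEY-HERE"}

# Upload a local file to the API
with open("./audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.assemblyai.com/v2/upload",
        headers=headers,
        data=f,
    )

# Failed uploads (e.g. a file containing no audio) now return a 422
if response.status_code == 422:
    print(response.json()["error"])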
We've made scaling improvements that reduce p90 latency for some non-English languages when using the Best tier.
We've made improvements to notifications for auto-refill failures. Now, users will be alerted more rapidly when their automatic payments are unsuccessful.
New endpoints for LeMUR Claude 3
Last month, we announced support for Claude 3 in LeMUR. Today, we are adding support for two new endpoints - Question & Answer and Summary (in addition to the pre-existing Task endpoint) - for these newest models:
- Claude 3 Opus
- Claude 3.5 Sonnet
- Claude 3 Sonnet
- Claude 3 Haiku
Here's how you can use Claude 3.5 Sonnet to summarize a virtual meeting with LeMUR:
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Transcribe the meeting recording
audio_url = "https://storage.googleapis.com/aai-web-samples/meeting.mp4"
transcript = aai.Transcriber().transcribe(audio_url)

# Summarize the transcript with Claude 3.5 Sonnet
result = transcript.lemur.summarize(
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="A GitLab meeting to discuss logistics",
    answer_format="TLDR",
)

print(result.response)
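And here's a sketch of the new Question & Answer endpoint, reusing the transcript from above (the questions are hypothetical):
questions = [
    aai.LemurQuestion(question="What decisions were made in the meeting?"),
    aai.LemurQuestion(question="Were any action items assigned?"),
]

qa_result = transcript.lemur.question(
    questions,
    final_model=aai.LemurModel.claude3_5_sonnet,
)

# Each answer is paired with the question it addresses
for qa in qa_result.response:
    print(f"{qa.question}: {qa.answer}")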
Learn more about these specialized endpoints and how to use them in our Docs.
Enhanced AssemblyAI app for Zapier
We've launched our Zapier integration v2.0, which makes it easy to use our API in a no-code way. The enhanced app is more flexible, supports more Speech AI features, and integrates more closely into the Zap editor.
The Transcribe event (formerly Get Transcript) now supports all of the options available in our transcript API, making all of our Speech Recognition and Audio Intelligence features available to Zapier users, including asynchronous transcription. In addition, we've added 5 new events to the AssemblyAI app for Zapier:
- Get Transcript: Retrieve a transcript that you have previously created.
- Get Transcript Subtitles: Generate SRT or VTT subtitles for the transcript.
- Get Transcript Paragraphs: Retrieve the transcript segmented into paragraphs.
- Get Transcript Sentences: Retrieve the transcript segmented into sentences.
- Get Transcript Redacted Audio Result: Retrieve the result of the PII audio redaction model. The result contains the status and the URL to the redacted audio file.
Read more about how to use the new app in our Docs, or check out our tutorial to see how you can generate subtitles with Zapier and AssemblyAI.
LeMUR browser support
LeMUR can now be used from browsers, either via our JavaScript SDK or fetch.
LeMUR - Claude 3 support
Last week, we added Anthropic's Claude 3 model family to LeMUR, our LLM framework for speech.
- Claude 3.5 Sonnet
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3 Haiku
You can now easily apply any of these models to your audio data. Learn more about how to get started in our docs or try out the new models in a no-code way through our playground.
For more information, check out our blog post about the release.
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Step 1: Transcribe an audio file
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./common_sports_injuries.mp3")

# Step 2: Define a prompt
prompt = "Provide a brief summary of the transcript."

# Step 3: Choose an LLM to use with LeMUR
result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet,
)

print(result.response)
JavaScript SDK fix
We've fixed an issue which was causing the JavaScript SDK to surface the following error when using the SDK in the browser:
Access to fetch at 'https://api.assemblyai.com/v2/transcript' from origin 'https://exampleurl.com' has been blocked by CORS policy: Request header field assemblyai-agent is not allowed by Access-Control-Allow-Headers in preflight response.
Timestamps improvement; bugfixes
We've made significant improvements to the timestamp accuracy of our Speech-to-Text Best tier for English, Spanish, and German. 96% of timestamps are accurate within 200ms, and 86% of timestamps are now accurate within 100ms.
We've fixed a bug in which confidence scores of transcribed words for the Nano tier would sometimes be outside of the range [0, 1].
We've fixed a rare issue in which the speech for only one channel in a short dual-channel file would be transcribed when disfluencies was also enabled.
Streaming (formerly Real-time) improvements
We've made model improvements that significantly improve the accuracy of timestamps when using our Streaming Speech-to-Text service. Most timestamps are now accurate within 100 ms.
Our Streaming Speech-to-Text service will now return a new error 'Audio too small to be transcoded' (code 4034) when a client submits an audio chunk that is too small to be transcoded (less than 10 ms).
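If you use our Python SDK for streaming, the new error surfaces through the regular error callback. A minimal sketch of the callback wiring (microphone and audio plumbing omitted):
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

def on_data(transcript: aai.RealtimeTranscript):
    print(transcript.text)

def on_error(error: aai.RealtimeError):
    # Error code 4034 means an audio chunk was shorter than 10 ms
    print("Streaming error:", error)

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
)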
Variable-bitrate video support; bugfix
We've deployed changes that allow variable-bitrate video files to be submitted to our API.
We've fixed a recent bug in which audio files with a large amount of silence at the beginning of the file would fail to transcribe.
LeMUR improvements
We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage.
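For instance, if you call LeMUR over HTTP, you can read the new keys from the response JSON. A sketch assuming the keys appear at the top level of the response, as described above (the transcript ID is a placeholder):
import requests

headers = {"authorization": "YOUR-KEY-HERE"}

body = {
    "transcript_ids": ["YOUR-TRANSCRIPT-ID"],
    "prompt": "Provide a brief summary of the transcript.",
}

response = requests.post(
    "https://api.assemblyai.com/lemur/v3/generate/task",
    headers=headers,
    json=body,
).json()

# The new keys report token usage for the request
print("input tokens:", response.get("input_tokens"))
print("output tokens:", response.get("output_tokens"))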
We've implemented a new fallback system to further boost the reliability of LeMUR.
We have addressed an edge case issue affecting LeMUR and certain XML tags. In particular, when LeMUR responds with a <question> XML tag, it will now always close it with a </question> tag rather than the erroneous tags that would sometimes be returned (e.g. </answer>).
PII Redaction and Entity Detection improvements
We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts.
We've added 16 new entities, including vehicle_id and account_number, and updated 4 of our existing entities. Users may need to update to the latest version of our SDKs to use these new entities.
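As a sketch, enabling the new entities from the Python SDK might look like this, assuming an SDK version whose PIIRedactionPolicy enum includes the new entity names (the file name is hypothetical):
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Redact the new vehicle_id and account_number entities from the transcript
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_policies=[
        aai.PIIRedactionPolicy.vehicle_id,
        aai.PIIRedactionPolicy.account_number,
    ],
)

transcript = aai.Transcriber().transcribe("./customer_call.mp3", config)
print(transcript.text)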
We've added PII Text Redaction and Entity Detection support in 4 new languages:
- Chinese
- Dutch
- Japanese
- Georgian
PII Text Redaction and Entity Detection now support a total of 47 languages between our Best and Nano tiers.
Usage and spend alerts
Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold.
To set up a billing alert, go to the billing page of your portal, and click Set up a new alert under the Your alerts widget:

You can then set up an alert by specifying whether to alert on monthly spend or account balance, as well as the specific threshold at which to send an alert.
Universal-1 now available in German
Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German.
No special action is needed to utilize Universal-1 on German audio - all requests sent to our /v2/transcript endpoint with German audio files will now use Universal-1 by default. Learn more about how to integrate Universal-1 into your apps in our Getting Started guides.
New languages for Speaker Diarization
Speaker Diarization is now available in five additional languages for both the Best and Nano tiers:
- Chinese
- Hindi
- Japanese
- Korean
- Vietnamese
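Here's a minimal sketch of enabling Speaker Diarization for one of the new languages with the Python SDK (the file name is hypothetical):
import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Speaker labels with one of the newly supported languages (Japanese)
config = aai.TranscriptionConfig(
    speaker_labels=True,
    language_code="ja",
)

transcript = aai.Transcriber().transcribe("./japanese_interview.mp3", config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")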
New API Reference, Timestamps improvements
We’ve released a new version of the API Reference section of our docs for an improved developer experience. Here’s what’s new:
- New API Reference pages with exhaustive endpoint documentation for transcription, LeMUR, and streaming
- cURL examples for every endpoint
- Interactive Playground: Test our API endpoints with the interactive playground. It includes a form-builder for generating requests and corresponding code examples in cURL, Python, and TypeScript
- Always up to date: The new API Reference is autogenerated based on our Open-Source OpenAPI and AsyncAPI specs
We’ve made improvements to Universal-1’s timestamps for both the Best and Nano tiers, yielding improved timestamp accuracy and a reduced incidence of overlapping timestamps.
We’ve fixed an issue in which users could receive an `Unable to create transcription. Developers have been alerted` error that would be surfaced when using long files with Sentiment Analysis.
New codec support; account deletion support
We’ve upgraded our transcoding library and now support the following new codecs:
Bonk
,APAC
,Mi-SC4
,100i
,VQC
,FTR PHM
,WBMP
,XMD ADPCM
,WADY DPCM
,CBD2 DPCM
HEVC
,VP9
,AV1
codec in enhancedflv
format
Users can now delete their accounts by selecting the Delete account option on the Account page of their AssemblyAI Dashboards.
Users will now receive a 400 error when using an invalid tier and language code combination, with an error message such as The selected language_code is supported by the following speech_models: best, conformer-2. See https://www.assemblyai.com/docs/concepts/supported-languages.
We’ve fixed an issue in which nested JSON responses from LeMUR would cause Invalid LLM response, unable to fulfill request. Please try again.
errors.
We’ve fixed a bug in which very long files would sometimes fail to transcribe, leading to timeout errors.
AssemblyAI app for Make.com
Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that bring together many different services.
We've released the AssemblyAI app for Make, which allows Make users to incorporate AssemblyAI into their workflows, or scenarios. In other words, in Make you can now use our AI models to:
- Transcribe audio data with speech recognition models
- Analyze audio data with audio intelligence models
- Build generative features on top of audio data with LLMs
For example, in our tutorial on Redacting PII with Make, we demonstrate how to build a Make scenario that automatically creates a redacted audio file and redacted transcription for any audio file uploaded to a Google Drive folder.
