Changelog

Follow along to see weekly accuracy and product improvements.


September 5, 2024

Automatic Language Detection improvements

We've made improvements to our Automatic Language Detection (ALD) model, yielding increased accuracy, expanded language support, and customizable confidence thresholds.

In particular, we have added support for 10 new languages, including Chinese, Finnish, and Hindi, to support a total of 17 languages in our Best tier. Additionally, we've achieved best-in-class accuracy in 15 of those 17 languages when benchmarked against four leading providers.

Finally, we've added a customizable confidence threshold for ALD, allowing you to set a minimum confidence threshold for the detected language and be alerted if this threshold is not satisfied.
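As a rough illustration, a transcription request that sets a minimum confidence could be assembled as in the sketch below. The field names `language_detection` and `language_confidence_threshold`, the example threshold of 0.8, and the helper name `build_ald_request` are assumptions rather than details stated in this entry, so check them against the API Reference before relying on them.

```python
import json


def build_ald_request(audio_url: str, min_confidence: float) -> dict:
    """Build a JSON-serializable request body that enables language detection.

    The field names are assumed, not taken from this changelog entry.
    """
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    return {
        "audio_url": audio_url,
        "language_detection": True,
        "language_confidence_threshold": min_confidence,
    }


# Hypothetical audio URL, used only to show the shape of the payload.
payload = build_ald_request("https://example.com/audio.mp3", 0.8)
print(json.dumps(payload, indent=2))
```

The sketch only builds the payload; sending it and handling a below-threshold result are left to the client.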

Read more about these recent improvements in our announcement post.

August 29, 2024

Free Offer improvements

We've made a series of improvements to our Free Offer:

All new and existing users will get $50 in free credits (equivalent to 135 hours of Best transcription, or 417 hours of Nano transcription).
All unused free credits will be automatically transferred to a user's account balance after upgrading to pay-as-you-go pricing.
Free Offer users will now see a tracker in their dashboard showing how many credits they have remaining.
Free Offer users will now have access to the usage dashboard, their billing rates, concurrency limit, and billing alerts.
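The hour equivalents quoted for the $50 credit imply approximate per-hour rates, which a little arithmetic can confirm. This sketch derives the rates from the figures in this entry only; it does not quote the Pricing page, and rounding to whole cents is an assumption.

```python
FREE_CREDITS_USD = 50
BEST_HOURS = 135   # hours of Best transcription quoted in this entry
NANO_HOURS = 417   # hours of Nano transcription quoted in this entry


def implied_hourly_rate(credits_usd: float, hours: float) -> float:
    """Return the per-hour rate implied by a credit amount and an hour count."""
    return round(credits_usd / hours, 2)


best_rate = implied_hourly_rate(FREE_CREDITS_USD, BEST_HOURS)
nano_rate = implied_hourly_rate(FREE_CREDITS_USD, NANO_HOURS)
print(f"Best: ~${best_rate:.2f}/hour, Nano: ~${nano_rate:.2f}/hour")
```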

Learn more about our Free Offer on our Pricing page, and then check out our Quickstart in our Docs to get started.

August 29, 2024

Speaker Diarization improvements

We've made improvements to our Speaker Diarization model, especially robustness in distinguishing between speakers with similar voices.

We've fixed an error in which the last word in a transcript was always attributed to the same speaker as the second-to-last word.

August 29, 2024

File upload improvements and more

We've made improvements to error handling for file uploads that fail. Now if there is an error, such as a file containing no audio, the following 422 error will be returned:

Upload failed, please try again. If you continue to have issues please reach out to support@assemblyai.com
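A client might turn this response into a log-friendly message along the lines of the sketch below. The helper name and the assumption that the message arrives under an `error` key in a JSON body are illustrative only; the entry specifies just the 422 status and the message text.

```python
def summarize_upload_response(status_code: int, body: dict) -> str:
    """Turn an upload response into a short, log-friendly message."""
    if 200 <= status_code < 300:
        return "upload ok"
    if status_code == 422:
        # 422 is the status this entry describes for failed uploads.
        # The "error" key is an assumed response shape.
        detail = body.get("error", "no error message in response body")
        return f"upload failed (422): {detail}"
    return f"unexpected status {status_code}"


message = summarize_upload_response(
    422,
    {
        "error": "Upload failed, please try again. If you continue to have "
                 "issues please reach out to support@assemblyai.com"
    },
)
print(message)
```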

We've made scaling improvements that reduce p90 latency for some non-English languages when using the Best tier.

We've made improvements to notifications for auto-refill failures. Now, users will be alerted more rapidly when their automatic payments are unsuccessful.

August 8, 2024

New endpoints for LeMUR Claude 3

Last month, we announced support for Claude 3 in LeMUR. Today, we are adding support for two new endpoints - Question & Answer and Summary (in addition to the pre-existing Task endpoint) - for these newest models:

Claude 3 Opus
Claude 3.5 Sonnet
Claude 3 Sonnet
Claude 3 Haiku

Here's how you can use Claude 3.5 Sonnet to summarize a virtual meeting with LeMUR:

import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

audio_url = "https://storage.googleapis.com/aai-web-samples/meeting.mp4"
transcript = aai.Transcriber().transcribe(audio_url)

result = transcript.lemur.summarize(
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="A GitLab meeting to discuss logistics",
    answer_format="TLDR"
)

print(result.response)

Learn more about these specialized endpoints and how to use them in our Docs.

August 6, 2024

Enhanced AssemblyAI app for Zapier

We've launched our Zapier integration v2.0, which makes it easy to use our API in a no-code way. The enhanced app is more flexible, supports more Speech AI features, and integrates more closely into the Zap editor.

The Transcribe event (formerly Get Transcript) now supports all of the options available in our transcript API, making all of our Speech Recognition and Audio Intelligence features available to Zapier users, including asynchronous transcription. In addition, we've added 5 new events to the AssemblyAI app for Zapier:

Get Transcript: Retrieve a transcript that you have previously created.
Get Transcript Subtitles: Generate SRT or VTT subtitles for the transcript.
Get Transcript Paragraphs: Retrieve the transcript segmented into paragraphs.
Get Transcript Sentences: Retrieve the transcript segmented into sentences.
Get Transcript Redacted Audio Result: Retrieve the result of the PII audio redaction model. The result contains the status and the URL to the redacted audio file.

Read more about how to use the new app in our Docs, or check out our tutorial to see how you can generate subtitles with Zapier and AssemblyAI.

July 29, 2024

LeMUR browser support

LeMUR can now be used from browsers, either via our JavaScript SDK or fetch.

July 16, 2024

LeMUR - Claude 3 support

Last week, we released Anthropic's Claude 3 model family into LeMUR, our LLM framework for speech.

Claude 3.5 Sonnet
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku

You can now easily apply any of these models to your audio data. Learn more about how to get started in our docs or try out the new models in a no-code way through our playground.

For more information, check out our blog post about the release.

import assemblyai as aai

aai.settings.api_key = "YOUR-KEY-HERE"

# Step 1: Transcribe an audio file
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./common_sports_injuries.mp3")

# Step 2: Define a prompt
prompt = "Provide a brief summary of the transcript."

# Step 3: Choose an LLM to use with LeMUR
result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

July 16, 2024

JavaScript SDK fix

We've fixed an issue which was causing the JavaScript SDK to surface the following error when using the SDK in the browser:

Access to fetch at 'https://api.assemblyai.com/v2/transcript' from origin 'https://exampleurl.com' has been blocked by CORS policy: Request header field assemblyai-agent is not allowed by Access-Control-Allow-Headers in preflight response.

July 9, 2024

Timestamps improvement; bugfixes

We've made significant improvements to the timestamp accuracy of our Speech-to-Text Best tier for English, Spanish, and German. 96% of timestamps are accurate within 200ms, and 86% of timestamps are now accurate within 100ms.

We've fixed a bug in which confidence scores of transcribed words for the Nano tier would sometimes be outside of the range [0, 1].

We've fixed a rare issue in which the speech for only one channel in a short dual channel file would be transcribed when disfluencies was also enabled.

June 24, 2024

Streaming (formerly Real-time) improvements

We've made model improvements that significantly improve the accuracy of timestamps when using our Streaming Speech-to-Text service. Most timestamps are now accurate within 100 ms.

Our Streaming Speech-to-Text service will now return a new error 'Audio too small to be transcoded' (code 4034) when a client submits an audio chunk that is too small to be transcoded (less than 10 ms).
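To avoid this error, a client can check a chunk's duration before sending it. The sketch below assumes raw 16-bit mono PCM audio; the helper names and the treatment of a chunk of exactly 10 ms as acceptable are assumptions, while the 10 ms figure comes from this entry.

```python
MIN_CHUNK_MS = 10  # chunks shorter than this are rejected, per this entry


def chunk_duration_ms(num_bytes: int, sample_rate_hz: int,
                      bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Return the duration in milliseconds of a raw PCM chunk."""
    frames = num_bytes / (bytes_per_sample * channels)
    return 1000.0 * frames / sample_rate_hz


def is_chunk_large_enough(num_bytes: int, sample_rate_hz: int) -> bool:
    """Check a 16-bit mono PCM chunk against the 10 ms minimum."""
    return chunk_duration_ms(num_bytes, sample_rate_hz) >= MIN_CHUNK_MS


# At 16 kHz, 16-bit mono audio uses 32 bytes per millisecond.
print(is_chunk_large_enough(320, 16_000))  # 10 ms of audio
print(is_chunk_large_enough(318, 16_000))  # just under 10 ms
```

Buffering audio until the check passes, rather than sending each small read immediately, is one way to stay above the minimum.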

June 20, 2024

Variable-bitrate video support; bugfix

We've deployed changes which now permit variable-bitrate video files to be submitted to our API.

We've fixed a recent bug in which audio files with a large amount of silence at the beginning of the file would fail to transcribe.

June 19, 2024

LeMUR improvements

We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage.
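A small helper can pull these counts out of a parsed response for usage tracking. The exact nesting of the keys is an assumption here, so the sketch checks for a `usage` object first and falls back to the top level; both sample responses are hypothetical.

```python
def extract_token_usage(response: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from a parsed LeMUR response.

    Looks inside a "usage" object if present, otherwise at the top level.
    """
    usage = response.get("usage", response)
    return int(usage["input_tokens"]), int(usage["output_tokens"])


# Hypothetical response bodies, shown only to exercise both layouts.
nested = {"response": "...", "usage": {"input_tokens": 1200, "output_tokens": 85}}
flat = {"response": "...", "input_tokens": 1200, "output_tokens": 85}

for sample in (nested, flat):
    input_tokens, output_tokens = extract_token_usage(sample)
    print(f"input={input_tokens} output={output_tokens} total={input_tokens + output_tokens}")
```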

We've implemented a new fallback system to further boost the reliability of LeMUR.

We have addressed an edge case issue affecting LeMUR and certain XML tags. In particular, when LeMUR responds with a <question> XML tag, it will now always close it with a </question> tag rather than erroneous tags which would sometimes be returned (e.g. </answer>).

June 18, 2024

PII Redaction and Entity Detection improvements

We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts.

We've added 16 new entities, including vehicle_id and account_number, and updated 4 of our existing entities. Users may need to update to the latest version of our SDKs to use these new entities.
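For illustration, a request that redacts the two new entities named above might be built as follows. The field names `redact_pii`, `redact_pii_policies`, and `entity_detection`, along with the helper name, are assumptions not stated in this entry; only the entity names `vehicle_id` and `account_number` come from it.

```python
def build_redaction_request(audio_url: str, policies: list[str]) -> dict:
    """Build a request body that enables PII redaction for the given entities.

    Field names are assumed; verify them against the API Reference.
    """
    if not policies:
        raise ValueError("at least one redaction policy is required")
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        # De-duplicate and sort so the payload is stable across runs.
        "redact_pii_policies": sorted(set(policies)),
        "entity_detection": True,
    }


# Hypothetical audio URL, used only to show the shape of the payload.
request_body = build_redaction_request(
    "https://example.com/call.mp3",
    ["vehicle_id", "account_number", "vehicle_id"],
)
print(request_body["redact_pii_policies"])
```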

We've added PII Text Redaction and Entity Detection support in 4 new languages:

Chinese
Dutch
Japanese
Georgian

PII Text Redaction and Entity Detection now support a total of 47 languages between our Best and Nano tiers.

June 14, 2024

Usage and spend alerts

Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold.

To set up a billing alert, go to the billing page of your portal, and click Set up a new alert under the Your alerts widget.

You can then set up an alert by specifying whether to alert on monthly spend or account balance, as well as the specific threshold at which to send an alert.

June 13, 2024

Universal-1 now available in German

Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German.

No special action is needed to utilize Universal-1 on German audio - all requests sent to our /v2/transcript endpoint with German audio files will now use Universal-1 by default. Learn more about how to integrate Universal-1 into your apps in our Getting Started guides.

May 24, 2024

New languages for Speaker Diarization

Speaker Diarization is now available in five additional languages for both the Best and Nano tiers:

Chinese
Hindi
Japanese
Korean
Vietnamese

May 13, 2024

New API Reference, Timestamps improvements

We’ve released a new version of the API Reference section of our docs for an improved developer experience. Here’s what’s new:

New API Reference pages with exhaustive endpoint documentation for transcription, LeMUR, and streaming
cURL examples for every endpoint
Interactive Playground: Test our API endpoints with the interactive playground. It includes a form-builder for generating requests and corresponding code examples in cURL, Python, and TypeScript
Always up to date: The new API Reference is autogenerated based on our open-source OpenAPI and AsyncAPI specs

We’ve made improvements to Universal-1’s timestamps for both the Best and Nano tiers, yielding improved timestamp accuracy and a reduced incidence of overlapping timestamps.

We’ve fixed an issue in which users could receive an `Unable to create transcription. Developers have been alerted` error when using long files with Sentiment Analysis.

May 3, 2024

New codec support; account deletion support

We’ve upgraded our transcoding library and now support the following new codecs:

Bonk, APAC, Mi-SC4, 100i, VQC, FTR PHM, WBMP, XMD ADPCM, WADY DPCM, CBD2 DPCM
HEVC, VP9, AV1 codec in enhanced flv format

Users can now delete their accounts by selecting the Delete account option on the Account page of their AssemblyAI Dashboards.

Users will now receive a 400 error when using an invalid tier and language code combination, with an error message such as `The selected language_code is supported by the following speech_models: best, conformer-2. See https://www.assemblyai.com/docs/concepts/supported-languages.`

We’ve fixed an issue in which nested JSON responses from LeMUR would cause `Invalid LLM response, unable to fulfill request. Please try again.` errors.

We’ve fixed a bug in which very long files would sometimes fail to transcribe, leading to timeout errors.

May 1, 2024

AssemblyAI app for Make.com

Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that synthesize many different services.

We’ve released the AssemblyAI app for Make, which allows Make users to incorporate AssemblyAI into their workflows, or scenarios. In other words, in Make you can now use our AI models to:

Transcribe audio data with speech recognition models
Analyze audio data with audio intelligence models
Build generative features on top of audio data with LLMs

For example, in our tutorial on Redacting PII with Make, we demonstrate how to build a Make scenario that automatically creates a redacted audio file and redacted transcription for any audio file uploaded to a Google Drive folder.