Changelog

Follow along to see weekly accuracy and product improvements.

Subscribe to updates Follow us on Twitter

July 16, 2024

LeMUR - Claude 3 support

Last week, we released Anthropic's Claude 3 model family into LeMUR, our LLM framework for speech.

Claude 3.5 Sonnet
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku

You can now easily apply any of these models to your audio data. Learn more about how to get started in our docs or try out the new models in a no-code way through our playground.

For more information, check out our blog post about the release.

import assemblyai as aai

# Step 1: Transcribe an audio file
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./common_sports_injuries.mp3")

# Step 2: Define a prompt
prompt = "Provide a brief summary of the transcript."

# Step 3: Choose an LLM to use with LeMUR
result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

July 16, 2024

JavaScript SDK fix

We've fixed an issue which was causing the JavaScript SDK to surface the following error when using the SDK in the browser:

Access to fetch at 'https://api.assemblyai.com/v2/transcript' from origin 'https://exampleurl.com' has been blocked by CORS policy: Request header field assemblyai-agent is not allowed by Access-Control-Allow-Headers in preflight response.

July 9, 2024

Timestamps improvement; bugfixes

We've made significant improvements to the timestamp accuracy of our Speech-to-Text Best tier for English, Spanish, and German. 96% of timestamps are accurate within 200ms, and 86% of timestamps are now accurate within 100ms.

We've fixed a bug in which confidence scores of transcribed words for the Nano tier would sometimes be outside of the range [0, 1]

We've fixed a rare issue in which the speech for only one channel in a short dual channel file would be transcribed when disfluencies was also enabled.

June 24, 2024

Streaming (formerly Real-time) improvements

We've made model improvements that significantly improve the accuracy of timestamps when using our Streaming Speech-to-Text service. Most timestamps are now accurate within 100 ms.

Our Streaming Speech-to-Text service will now return a new error 'Audio too small to be transcoded' (code 4034) when a client submits an audio chunk that is too small to be transcoded (less than 10 ms).

June 20, 2024

Variable-bitrate video support; bugfix

We've deployed changes which now permit variable-bitrate video files to be submitted to our API.

We've fixed a recent bug in which audio files with a large amount of silence at the beginning of the file would fail to transcribe.

June 19, 2024

LeMUR improvements

We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage.

We've implemented a new fallback system to further boost the reliability of LeMUR.

We have addressed an edge case issue affecting LeMUR and certain XML tags. In particular, when LeMUR responds with a <question> XML tag, it will now always close it with a </question> tag rather than erroneous tags which would sometimes be returned (e.g. </answer>).

June 18, 2024

PII Redaction and Entity Detection improvements

We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts.

We've added 16 new entities, including vehicle_id and account_number, and updated 4 of our existing entities. Users may need to update to the latest version of our SDKs to use these new entities.

We've added PII Text Redaction and Entity Detection support in 4 new languages:

Chinese
Dutch
Japanese
Georgian

PII Text Redaction and Entity Detection now support a total of 47 languages between our Best and Nano tiers.

June 14, 2024

Usage and spend alerts

Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold.

To set up a billing alert, go to the billing page of your portal, and click Set up a new alert under the Your alerts widget:

You can then set up an alert by specifying whether to alert on monthly spend or account balance, as well as the specific threshold at which to send an alert.

June 13, 2024

Universal-1 now available in German

Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German.

No special action is needed to utilize Universal-1 on German audio - all requests sent to our /v2/transcript endpoint with German audio files will now use Universal-1 by default. Learn more about how to integrate Universal-1 into your apps in our Getting Started guides.

May 24, 2024

New languages for Speaker Diarization

Speaker Diarization is now available in five additional languages for both the Best and Nano tiers:

Chinese
Hindi
Japanese
Korean
Vietnamese

May 13, 2024

New API Reference, Timestamps improvements

We’ve released a new version of the API Reference section of our docs for an improved developer experience. Here’s what’s new:

New API Reference pages with exhaustive endpoint documentation for transcription, LeMUR, and streaming
cURL examples for every endpoint
Interactive Playground: Test our API endpoints with the interactive playground. It includes a form-builder for generating requests and corresponding code examples in cURL, Python, and TypeScript
Always up to date: The new API Reference is autogenerated based on our Open-Source OpenAPI and AsyncAPI specs

We’ve made improvements to Universal-1’s timestamps for both the Best and Nano tiers, yielding improved timestamp accuracy and a reduced incidence of overlapping timestamps.

We’ve fixed an issue in which users could receive an `Unable to create transcription. Developers have been alerted` error that would be surfaced when using long files with Sentiment Analysis.

May 3, 2024

New codec support; account deletion support

We’ve upgraded our transcoding library and now support the following new codecs:

Bonk, APAC, Mi-SC4, 100i, VQC, FTR PHM, WBMP, XMD ADPCM, WADY DPCM, CBD2 DPCM
HEVC, VP9, AV1 codec in enhanced flv format

Users can now delete their accounts by selecting the Delete account option on the Account page of their AssemblyAI Dashboards.

Users will now receive a 400 error when using an invalid tier and language code combination, with an error message such as The selected language_code is supported by the following speech_models: best, conformer-2. See https://www.assemblyai.com/docs/concepts/supported-languages..

We’ve fixed an issue in which nested JSON responses from LeMUR would cause Invalid LLM response, unable to fulfill request. Please try again. errors.

We’ve fixed a bug in which very long files would sometimes fail to transcribe, leading to timeout errors.

May 1, 2024

AssemblyAI app for Make.com

Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that synthesize many different services.

We’ve released the AssemblyAI app for Make that allows Make users to incorporate AssemblyAI into their workflows, or scenarios. In other words, in Make you can now use our AI models to

Transcribe audio data with speech recognition models
Analyze audio data with audio intelligence models
Build generative features on top of audio data with LLMs

For example, in our tutorial on Redacting PII with Make, we demonstrate how to build a Make scenario that automatically creates a redacted audio file and redacted transcription for any audio file uploaded to a Google Drive folder.

April 18, 2024

GDPR and PCI DSS compliance

AssemblyAI is now officially PCI Compliant. The Payment Card Industry Data Security Standard Requirements and Security Assessment Procedures (PCI DSS) certification is a rigorous assessment that ensures card holder data is being properly and securely handled and stored. You can read more about PCI DSS here.

Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our PCI attestation of compliance, as well as other security-related documents.

AssemblyAI is also GDPR Compliant. The General Data Protection Regulation (GDPR) is regulation regarding privacy and security for the European Union that applies to businesses that serve customers within the EU. You can read more about GDPR here.

Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our GDPR assessment on compliance, as well as other security-related documents.

April 18, 2024

Self-serve invoices; dual-channel improvement

Users of our API can now view and download their self-serve invoices in their dashboards under Billing > Your invoices.

We’ve made readability improvements to the formatting of utterances for dual-channel transcription by combining sequential utterances from the same channel.

We’ve added a patch to improve stability in turnaround times for our async transcription and LeMUR services.

We’ve fixed an issue in which timestamp accuracy would be degraded in certain edge cases when using our async transcription service.

April 11, 2024

Introducing Universal-1

Last week we released Universal-1, a state-of-the-art multimodal speech recognition model. Universal-1 is trained on 12.5M hours of multilingual audio data, yielding impressive performance across the four key languages for which it was trained - English, Spanish, German, and French.

Word Error Rate across four languages for several providers. Lower is better.

Universal-1 is now the default model for English and Spanish audio files sent to our v2/transcript endpoint for async processing, while German and French will be rolled out in the coming weeks.

You can read more about Universal-1 in our announcement blog or research blog, or you can try it out now on our Playground.

April 11, 2024

New Streaming STT features

We’ve added a new message type to our Streaming Speech-to-Text (STT) service. This new message type SessionInformation is sent immediately before the final SessionTerminated message when closing a Streaming session, and it contains a field called audio_duration_seconds which contains the total audio duration processed during the session. This feature allows customers to run end-user-specific billing calculations.

To enable this feature, set the enable_extra_session_information query parameter to true when connecting to a Streaming WebSocket.

endpoint_str = 'wss://api.assemblyai.com
/v2/realtime/ws?sample_rate=8000&enable_extra_session_information=true'

This feature will be rolled out in our SDKs soon.

We’ve added a new feature to our Streaming STT service, allowing users to disable Partial Transcripts in a Streaming session. Our Streaming API sends two types of transcripts - Partial Transcripts (unformatted and unpunctuated) that gradually build up the current utterance, and Final Transcripts which are sent when an utterance is complete, containing the entire utterance punctuated and formatted.

Users can now set the disable_partial_transcripts query parameter to true when connecting to a Streaming WebSocket to disable the sending of Partial Transcript messages.

endpoint_str = 'wss://api.assemblyai.com
/v2/realtime/ws?sample_rate=8000&disable_partial_transcripts=true'

This feature will be rolled out in our SDKs soon.

We have fixed a bug in our async transcription service, eliminating File does not appear to contain audio errors. Previously, this error would be surfaced in edge cases where our transcoding pipeline would not have enough resources to transcode a given file, thus failing due to resource starvation.

March 20, 2024

Dual channel transcription improvements

We’ve made improvements to how utterances are handled during dual-channel transcription. In particular, the transcription service now has elevated sensitivity when detecting utterances, leading to improved utterance insertions when there is overlapping speech on the two channels.

March 14, 2024

LeMUR concurrency fix

We’ve fixed a temporary issue in which users with low account balances would occasionally be rate-limited to a value less than 30 when using LeMUR.

March 6, 2024

Fewer "File does not appear to contain audio" errors

We’ve fixed an edge-case bug in our async API, leading to a significant reduction in errors that say File does not appear to contain audio. Users can expect to see an immediate reduction in this type of error. If this error does occur, users should retry their requests given that retries are generally successful.

We’ve made improvements to our transcription service autoscaling, leading to improved turnaround times for requests that use Word Boost when there is a spike in requests to our API.