Changelog

Follow along to see weekly accuracy and product improvements.

April 4, 2022

Spanish Language Support, Automatic Language Detection, and Custom Spelling Released

Spanish transcription is now publicly available. Check out our documentation for more information on Specifying a Language in your POST request.
Automatic Language Detection is now available for our /v2/transcript endpoint. This feature can identify the dominant language that’s spoken in an audio file and route the file to the appropriate model for the detected language.
Our new Custom Spelling feature gives you the ability to specify how words are spelled or formatted in the transcript text. For example, Custom Spelling could be used to change all instances of "CS 50" to "CS50".
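
As a rough sketch, a request body using these features might be assembled as below. The language_detection and custom_spelling field names, the from/to shape of a spelling rule, and the audio_url field are assumptions that this changelog does not state, so check the documentation before relying on them. Rejecting a request that both names a language and asks for detection is a choice made for this example, not documented API behavior.

```python
import json


def build_transcript_request(audio_url, language_code=None,
                             detect_language=False, spellings=None):
    """Build a JSON-serializable body for POST /v2/transcript.

    Field names other than language_code are assumptions for illustration.
    """
    if language_code and detect_language:
        # Naming a language and asking for detection contradict each other
        # in this sketch, so refuse the combination up front.
        raise ValueError("pass either language_code or detect_language, not both")

    body = {"audio_url": audio_url}
    if language_code:
        body["language_code"] = language_code
    if detect_language:
        body["language_detection"] = True  # assumed field name
    if spellings:
        # spellings maps the desired form to the variants it should replace.
        body["custom_spelling"] = [  # assumed field name and rule shape
            {"from": list(variants), "to": target}
            for target, variants in spellings.items()
        ]
    return body


body = build_transcript_request(
    "https://example.com/lecture.mp3",  # placeholder audio URL
    detect_language=True,
    spellings={"CS50": ["CS 50"]},
)
print(json.dumps(body, indent=2))
```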

March 28, 2022

Auto Chapters v6 Released

Released Auto Chapters v6, improving the summarization of longer chapters.

March 14, 2022

Auto Chapters v5 Released

Auto Chapters v5 released, improving headline and gist generation and quote formatting in the summary key.

Fixed an edge case in Dual-Channel files where initial words in an audio file would occasionally be missed in the transcription.

March 8, 2022

Regional Spelling Improvements

Region-specific spelling improved for en_uk and en_au language codes.
Improved the formatting of “MP3” in transcripts.
Improved Real-Time transcription error handling for corrupted audio files.

February 28, 2022

Real-Time v3 Released

Released v3 of our Real-Time Transcription model, improving overall accuracy by 18% and proper noun recognition by 23% relative to the v2 model.

Improved PII Redaction and Entity Detection for CREDIT_CARD_CVV and LOCATION.

February 22, 2022

Auto Chapters v4 Released, Auto Retry Feature Added

Added an Auto Retry feature, which automatically retries transcripts that fail with a "Server error, developers have been alerted" message. This feature is enabled by default. To disable it, visit the Account tab in your Developer Dashboard.

Auto Chapters v4 released, improving chapter summarization in the summary key.
Added a trailing period for the gist key in the Auto Chapters feature.

February 7, 2022

Auto Chapters v3 Released

Released v3 of our Auto Chapters model, improving the model’s ability to segment audio into chapters and improving chapter boundary detection by 56.3%.
Improved formatting for Auto Chapters summaries. The summary, headline, and gist keys now include better punctuation, casing, and text formatting.

January 31, 2022

Miscellaneous Bug Fixes

Fixed a rare edge case affecting audio duration calculation of a small percentage of multi-channel files that contained no speech.
Miscellaneous bug fixes for Real-Time Transcription.

January 24, 2022

Webhook Status Codes, Entity Detection Improved

POST requests from the API to webhook URLs will now accept any status code from 200 to 299 as a successful HTTP response. Previously only 200 status codes were accepted.
Updated the text key in our Entity Detection feature to return the proper noun rather than the possessive noun. For example, Andrew instead of Andrew’s.
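
The webhook rule above (any status from 200 to 299 counts as success) can be expressed as a one-line check. This is only an illustration of the range; the function name is hypothetical and nothing here is part of the API.

```python
from http import HTTPStatus


def is_successful_webhook_status(status_code: int) -> bool:
    """Return True when a webhook receiver's status code falls in 200-299."""
    return 200 <= status_code <= 299


# A receiver may answer 200 OK or 204 No Content; a redirect or server error
# falls outside the accepted range.
for code in (HTTPStatus.OK, HTTPStatus.NO_CONTENT,
             HTTPStatus.FOUND, HTTPStatus.INTERNAL_SERVER_ERROR):
    print(int(code), is_successful_webhook_status(code))
```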

Fixed an edge case with Entity Detection where under certain contexts, a disfluency could be identified as an entity.

January 17, 2022

Punctuation and Casing Accuracy Improved, Inverse Text Normalization Model Updated

Released v4 of our Punctuation model, increasing punctuation and casing accuracy by ~2%.
Updated our Inverse Text Normalization (ITN) model for our /v2/transcript endpoint, improving web address and email address formatting and fixing the occasional number formatting issue.

Fixed an edge case where multi-channel files would return no text when the two channels were out of phase with each other.

January 10, 2022

Support for Non-English Languages Coming Soon

Our Deep Learning team has been hard at work training our new non-English language models. In the coming weeks, we will be adding support for French, German, Italian, and Spanish.

January 3, 2022

Shorter Summaries Added to Auto Chapters, Improved Filler Word Detection

Added a new gist key to the Auto Chapters feature. This new key provides an ultra-short, usually 3 to 8 word summary of the content spoken during that chapter.
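
One possible use of the gist key is a compact table of contents. The sketch below assumes the completed transcript JSON has a chapters list whose items carry a start time in milliseconds alongside gist; those two details are assumptions, and the sample data is invented.

```python
def table_of_contents(transcript: dict) -> list:
    """Format each chapter as 'MM:SS  gist' (assumes start is in milliseconds)."""
    lines = []
    for chapter in transcript.get("chapters") or []:
        seconds = chapter["start"] // 1000
        lines.append(f"{seconds // 60:02d}:{seconds % 60:02d}  {chapter['gist']}")
    return lines


# Invented sample standing in for a completed transcript response.
sample_transcript = {
    "chapters": [
        {"start": 0, "gist": "Welcome and agenda."},
        {"start": 95000, "gist": "Quarterly results overview."},
    ]
}

for line in table_of_contents(sample_transcript):
    print(line)
```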

Implemented profanity filtering into Auto Chapters, which will prevent the API from generating a summary, headline, or gist that includes profanity.
Improved Filler Word (aka, disfluencies) detection by ~5%.
Improved accuracy for Real-Time Streaming Transcription.

Fixed an edge case where WebSocket connections for Real-Time Transcription sessions would occasionally not close properly after the session was terminated. This resulted in the client receiving a 4031 error code even after sending a session termination message.
Corrected a bug that occasionally attributed disfluencies to the wrong utterance when Speaker Labels or Dual-Channel Transcription was enabled.

December 27, 2021

v8.5 Asynchronous Transcription Model Released

Our Asynchronous Speech Recognition model is now even better with the release of v8.5.
This update improves overall accuracy by 4% relative to our v8 model.
This is achieved by improving the model’s ability to handle noisy or difficult-to-decipher audio.
The v8.5 model also improves Inverse Text Normalization for numbers.

December 20, 2021

New and Improved API Documentation

Launched the new AssemblyAI Docs, with more complete documentation and an easy-to-navigate interface so developers can effectively use and integrate with our API.

Added two new fields to the FinalTranscript response for Real-Time Transcription. The punctuated key is a Boolean value indicating whether punctuation was successful. The text_formatted key is a Boolean value indicating whether Inverse Text Normalization (ITN) was successful.
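
A client might read the two new fields roughly as follows. The message_type discriminator and the sample message are assumptions for illustration; only the punctuated and text_formatted keys come from this entry.

```python
import json


def formatting_status(raw_message: str):
    """Return the punctuation/ITN flags of a FinalTranscript message, else None."""
    message = json.loads(raw_message)
    if message.get("message_type") != "FinalTranscript":  # assumed key name
        return None
    return {
        # Treat a missing flag as "not applied" rather than failing.
        "punctuated": bool(message.get("punctuated", False)),
        "text_formatted": bool(message.get("text_formatted", False)),
    }


# Invented sample message; a real one would arrive over the WebSocket.
sample_message = json.dumps({
    "message_type": "FinalTranscript",
    "text": "The total is $25.",
    "punctuated": True,
    "text_formatted": True,
})

print(formatting_status(sample_message))
```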

December 13, 2021

Inverse Text Normalization Added to Real-Time, Word Boost Accuracy Improved

Inverse Text Normalization (ITN) added for our /v2/realtime and /v2/stream endpoints. ITN improves formatting of entities like numbers, dates, and proper nouns in the transcription text.

Improved accuracy for Custom Vocabulary (aka, Word Boosts) with the Real-Time transcription API.

Fixed an edge case that would sometimes cause transcription errors when disfluencies was set to true and no words were identified in the audio file.

December 4, 2021

Entity Detection Released, Improved Filler Word Detection, Usage Alerts

v1 release of Entity Detection, which automatically detects a wide range of entities such as person and company names, emails, addresses, dates, locations, events, and more.
To include Entity Detection in your transcript, set entity_detection to true in your POST request to /v2/transcript.
When your transcript is complete, the JSON response will include an entities key, towards the bottom, containing the entities detected.
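
As a minimal sketch of the flow described above, the request body sets entity_detection, and the entities key of the finished transcript is then grouped by type. The audio_url field, the entity_type key, and every value in the sample response are assumptions made for illustration.

```python
from collections import defaultdict

# Request body for POST /v2/transcript (audio_url is an assumed field name).
request_body = {
    "audio_url": "https://example.com/interview.mp3",
    "entity_detection": True,
}


def group_entities(transcript: dict) -> dict:
    """Group detected entity texts by their type."""
    grouped = defaultdict(list)
    for entity in transcript.get("entities") or []:
        grouped[entity["entity_type"]].append(entity["text"])
    return dict(grouped)


# Invented sample standing in for a completed transcript response.
sample_response = {
    "entities": [
        {"entity_type": "person_name", "text": "Andrew"},
        {"entity_type": "location", "text": "Boston"},
        {"entity_type": "person_name", "text": "Maria"},
    ]
}

print(group_entities(sample_response))
```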

Read more about Entity Detection in our official documentation.
Usage Alert feature added, allowing customers to set a monthly usage threshold on their account along with a list of email addresses to be notified when that monthly threshold has been exceeded. This feature can be enabled by clicking “Set up alerts” on the “Developers” tab in the Dashboard.

When Content Safety is enabled, a summary of the severity scores detected will now be returned in the API response under the severity_score_summary key, nested inside the content_safety_labels key.
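
Reading the nested summary could look something like the sketch below. The only facts taken from this entry are the two key names and their nesting; the per-label low/medium/high breakdown, the label names, and the scores are assumptions, so adapt the code to the shape your responses actually have.

```python
def labels_with_high_severity(transcript: dict, threshold: float = 0.5) -> list:
    """List labels whose assumed 'high' severity share meets the threshold."""
    labels = transcript.get("content_safety_labels") or {}
    summary = labels.get("severity_score_summary") or {}
    return sorted(
        label
        for label, scores in summary.items()
        if scores.get("high", 0.0) >= threshold
    )


# Invented sample; the inner low/medium/high structure is an assumption.
sample_response = {
    "content_safety_labels": {
        "severity_score_summary": {
            "profanity": {"low": 0.1, "medium": 0.2, "high": 0.7},
            "weapons": {"low": 0.9, "medium": 0.1, "high": 0.0},
        }
    }
}

print(labels_with_high_severity(sample_response))
```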

Improved Filler Word (aka, disfluencies) detection by ~25%.

Fixed a bug in Auto Chapters that would occasionally add an extra space between sentences for headlines and summaries.

November 27, 2021

Additional MIME Type Detection Added for OPUS Files

Added additional MIME type detection to detect a wider variety of OPUS files.

Fixed an issue with word timing calculations that caused issues with speaker labeling for a small number of transcripts.

November 23, 2021

Custom Vocabulary Accuracy Significantly Improved

Significantly improved the accuracy of Custom Vocabulary and the effect of the boost_param field, which controls the weight given to Custom Vocabulary.
Improved precision of word timings.

November 12, 2021

New Auto Chapters, Sentiment Analysis, and Disfluencies Features Released

v1 release of Auto Chapters, which provides a "summary over time" by breaking audio/video files into "chapters" based on the topic of conversation. Check out our blog to read more about this new feature. To enable Auto Chapters, set auto_chapters: true in your POST request to /v2/transcript.
v1 release of Sentiment Analysis, which determines the sentiment of sentences in a transcript as "positive", "negative", or "neutral". Sentiment Analysis can be enabled by including the sentiment_analysis: true parameter in your POST request to /v2/transcript.
Filler words like "um" and "uh" can now be included in the transcription text. Simply include disfluencies: true in your POST request to /v2/transcript.
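
The three parameters above can be sent together. The sketch below only builds the POST request with the standard library and does not send it; the host name, the authorization header, and the audio_url field are assumptions not stated in this changelog, and the API key is a placeholder.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"  # assumed host for /v2/transcript


def build_request(api_key, audio_url, *, auto_chapters=False,
                  sentiment_analysis=False, disfluencies=False):
    """Build (but do not send) a POST request enabling the chosen features."""
    body = {
        "audio_url": audio_url,  # assumed field name
        "auto_chapters": auto_chapters,
        "sentiment_analysis": sentiment_analysis,
        "disfluencies": disfluencies,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    "https://example.com/meeting.mp3",
    auto_chapters=True,
    disfluencies=True,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would submit it; that step is left out here so the example runs without network access or a real key.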

Deployed Speaker Labels version 1.3.0, which improves overall diarization/labeling accuracy.
Improved our internal auto-scaling for asynchronous transcription, to keep turnaround times consistently low during periods of high usage.

November 7, 2021

New Language Code Parameter for English Spelling

Added a new language_code parameter when making requests to /v2/transcript.
Developers can set this to en_us (default), en_uk, or en_au to ensure the correct English spelling is used: US English, British English, or Australian English, respectively.
Quick note: for customers that were historically using the assemblyai_en_au or assemblyai_en_uk acoustic models, the language_code parameter is essentially redundant and doesn't need to be used.
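
A small helper can guard against typos in the language code before a request is made. This sketch covers only the three codes listed in this entry; the helper name and the audio_url field are hypothetical.

```python
SPELLING_VARIANTS = {
    "en_us": "US English",
    "en_uk": "British English",
    "en_au": "Australian English",
}


def with_language_code(body: dict, language_code: str = "en_us") -> dict:
    """Return a copy of the request body with a validated language_code."""
    if language_code not in SPELLING_VARIANTS:
        allowed = ", ".join(sorted(SPELLING_VARIANTS))
        raise ValueError(f"unknown language_code {language_code!r}; expected one of: {allowed}")
    return {**body, "language_code": language_code}


body = with_language_code({"audio_url": "https://example.com/call.mp3"}, "en_uk")
print(body["language_code"], "->", SPELLING_VARIANTS[body["language_code"]])
```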

Fixed an edge case where some files with prolonged silences would occasionally have a single word predicted, such as "you" or "hi."