Changelog
Follow along to see weekly accuracy and product improvements.
Entity Detection Released, Improved Filler Word Detection, Usage Alerts
- v1 release of Entity Detection - automatically detects a wide range of entities like person and company names, emails, addresses, dates, locations, events, and more.
- To include Entity Detection in your transcript, set `entity_detection` to `true` in your POST request to `/v2/transcript`.
- When your transcript is complete, you will see an `entities` key towards the bottom of the JSON response containing the entities detected, as shown here:
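
A minimal sketch of enabling Entity Detection and reading back the `entities` key, assuming the usual submit-then-poll flow against `/v2/transcript` (the API key, audio URL, and example entity values below are placeholders):

```python
import requests

HEADERS = {"authorization": "YOUR_API_KEY"}  # placeholder API key

# Submit a transcript request with Entity Detection enabled
submit = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio file
        "entity_detection": True,
    },
)
transcript_id = submit.json()["id"]

# Once the transcript's status is "completed", the JSON response carries an
# "entities" list; each entry is expected to look roughly like:
#   {"entity_type": "location", "text": "Sydney", "start": 1520, "end": 2110}
result = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
    headers=HEADERS,
).json()
for entity in result.get("entities") or []:
    print(entity["entity_type"], "->", entity["text"])
```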

- Read more about Entity Detection in our official documentation.
- Usage Alert feature added, allowing customers to set a monthly usage threshold on their account along with a list of email addresses to be notified when that monthly threshold has been exceeded. This feature can be enabled by clicking “Set up alerts” on the “Developers” tab in the Dashboard.

- When Content Safety is enabled, a summary of the severity scores detected will now be returned in the API response under the `severity_score_summary` key, nested inside of the `content_safety_labels` key, as shown below.
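
An illustrative sketch of reading the new summary, assuming a per-label breakdown of low/medium/high severity (the label name and values here are invented for the example):

```python
# Hypothetical excerpt of the "content_safety_labels" key from a
# completed transcript's JSON response
content_safety_labels = {
    "severity_score_summary": {
        # invented example values: share of detected content at each severity
        "disasters": {"low": 0.61, "medium": 0.27, "high": 0.12},
    },
}

for label, scores in content_safety_labels["severity_score_summary"].items():
    print(label, scores)
```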

- Improved Filler Word (aka disfluencies) detection by ~25%.
- Fixed a bug in Auto Chapters that would occasionally add an extra space between sentences for headlines and summaries.
Additional MIME Type Detection Added for OPUS Files
- Added additional MIME type detection to detect a wider variety of OPUS files.
- Fixed an issue with word timing calculations that caused issues with speaker labeling for a small number of transcripts.
Custom Vocabulary Accuracy Significantly Improved
- Significantly improved the accuracy of Custom Vocabulary, and the impact of the `boost_param` field to control the weight for Custom Vocabulary. See the sketch below.
- Improved precision of word timings.
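
A minimal sketch of a transcript request combining a custom vocabulary list with `boost_param`, assuming the `word_boost` field documented for `/v2/transcript` (the API key and audio URL are placeholders):

```python
import requests

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": "YOUR_API_KEY"},  # placeholder API key
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio file
        "word_boost": ["AssemblyAI", "diarization"],   # custom vocabulary terms
        "boost_param": "high",  # weight applied to the boosted terms
    },
)
print(response.json()["id"])
```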
New Auto Chapters, Sentiment Analysis, and Disfluencies Features Released
- v1 release of Auto Chapters - which provides a "summary over time" by breaking audio/video files into "chapters" based on the topic of conversation. Check out our blog to read more about this new feature. To enable Auto Chapters in your request, you can set `auto_chapters: true` in your POST request to `/v2/transcript`.
- v1 release of Sentiment Analysis - which determines the sentiment of sentences in a transcript as `"positive"`, `"negative"`, or `"neutral"`. Sentiment Analysis can be enabled by including the `sentiment_analysis: true` parameter in your POST request to `/v2/transcript`.
- Filler words like `"um"` and `"uh"` can now be included in the transcription text. Simply include `disfluencies: true` in your POST request to `/v2/transcript`. See the combined request sketch after this list.
- Deployed Speaker Labels version 1.3.0, which improves overall diarization/labeling accuracy.
- Improved our internal auto-scaling for asynchronous transcription, to keep turnaround times consistently low during periods of high usage.
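
A minimal sketch enabling all three new features in a single request (the API key and audio URL are placeholders):

```python
import requests

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": "YOUR_API_KEY"},  # placeholder API key
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio file
        "auto_chapters": True,       # "summary over time" chapters
        "sentiment_analysis": True,  # per-sentence positive/negative/neutral
        "disfluencies": True,        # keep filler words like "um" and "uh"
    },
)
print(response.json()["id"])
```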
New Language Code Parameter for English Spelling
- Added a new `language_code` parameter when making requests to `/v2/transcript`.
- Developers can set this to `en_us`, `en_uk`, or `en_au`, which will ensure the correct English spelling is used - British English, Australian English, or US English (default). See the usage sketch after this list.
- Quick note: for customers that were historically using the `assemblyai_en_au` or `assemblyai_en_uk` acoustic models, the `language_code` parameter is essentially redundant and doesn't need to be used.
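
A minimal sketch of setting the spelling locale (the API key and audio URL are placeholders):

```python
import requests

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": "YOUR_API_KEY"},  # placeholder API key
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio file
        "language_code": "en_uk",  # en_us (default), en_uk, or en_au
    },
)
print(response.json()["id"])
```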

- Fixed an edge-case where some files with prolonged silences would occasionally have a single word predicted, such as "you" or "hi."
New Features Coming Soon, Bug Fixes
- This week, our engineering team has been hard at work preparing for the release of exciting new features like:
- Chapter Detection: Automatically summarize audio and video files into segments (aka "chapters").
- Sentiment Analysis: Determine the sentiment of sentences in your transcript as `"positive"`, `"negative"`, or `"neutral"`.
- Disfluencies: Detect filler words like `"um"` and `"uh"`.
- Improved average real-time latency by 2.1% and p99 latency by 0.06%.
- Fixed an edge case where confidence scores in the utterances category for dual-channel audio files would occasionally be greater than 1.0.
Improved v8 Model Processing Speed
- Improved the API's ability to handle audio/video files with a duration over 8 hours.
- Further improved transcription processing times by 12%.
- Fixed an edge case in our responses for dual channel audio files where if speaker 2 interrupted speaker 1, the text from speaker 2 would cause the text from speaker 1 to be split into multiple turns, rather than contextually keeping all of speaker 1's text together.
v8 Transcription Model Released
- Today, we're happy to announce the release of our most accurate Speech Recognition model for asynchronous transcription to date—version 8 (v8).
- This new model dramatically improves overall accuracy (up to 19% relative), and proper noun accuracy as well (up to 25% relative).
- You can read more about our v8 model in our blog here.
- Fixed an edge case where a small percentage of short (<60 seconds in length) dual-channel audio files, with the same audio on each channel, resulted in repeated words in the transcription.
v2 Real-Time and v4 Topic Detection Models Released
- Launched our v2 Real-Time Streaming Transcription model (read more on our blog).
- This new model improves accuracy of our Real-Time Streaming Transcription by ~10%.
- Launched our Topic Detection v4 model, with an accuracy boost of ~8.37% over v3 (read more on our blog).
v3 Topic Detection Model, PII Redaction Bug Fixes
- Released our v3 Topic Detection model.
- This model dramatically improves the Topic Detection feature's ability to accurately detect topics based on context.
- For example, in the following text, the model was able to accurately predict `"Rugby"` without the sport being mentioned directly, due to the mention of "Ed Robinson" (a Rugby coach).

- PII Redaction has been improved to better identify (and redact) phone numbers even when they are not explicitly referred to as a phone number.
- Released a fix for PII Redaction that corrects an issue where the model would sometimes detect phone numbers as credit card numbers or social security numbers.
Severity Scores for Content Safety
- The API now returns a severity score along with the `confidence` and `label` keys when using the Content Safety feature.
- The severity score measures how intense a detected Content Safety label is on a scale of 0 to 1.
- For example, a natural disaster that leads to mass casualties will have a score of `1.0`, while a small storm that breaks a mailbox will only be `0.1`. An illustrative result entry is sketched below.
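
An illustrative sketch of a single Content Safety result with the new severity field (the label and values are invented for the example):

```python
# Hypothetical entry from a transcript's Content Safety results
label_entry = {
    "label": "disasters",
    "confidence": 0.97,  # how confident the model is in the label
    "severity": 0.89,    # how intense the detected content is (0 to 1)
}
print(f'{label_entry["label"]}: severity={label_entry["severity"]}')
```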

- Fixed an edge case where a small number of transcripts with Automatic Transcript Highlights turned on were not returning any results.
Real-time Transcription and Streaming Fixes
- Fixed an edge case where higher sample rates would occasionally trigger a `Client sent audio too fast` error from the Real-Time Streaming WebSocket API.
- Fixed an edge case where some streams from the Real-Time Streaming WebSocket API were held open after a customer idled their session.
- Fixed an edge case in the `/v2/stream` endpoint, where long periods of silence would occasionally cause automatic punctuation to fail.
- Improved error handling when non-JSON input is sent to the `/v2/transcript` endpoint.
Punctuation v3, Word Search, Bug Fixes
- v3 Punctuation Model released.
- v3 brings improved accuracy to automatic punctuation and casing for both async (`/v2/transcript`) and real-time (WebSocket API) transcripts.
- Released an all-new Word Search feature that allows developers to search for words in a completed transcript.
- This new feature returns how many times the word was spoken, the index of that word in the transcript's JSON response word list/array, and the associated timestamps for each matched word, as sketched below.
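
A minimal sketch of Word Search, assuming it is exposed as a GET on the completed transcript (the transcript ID, API key, and response field names are assumptions for illustration):

```python
import requests

transcript_id = "TRANSCRIPT_ID"  # placeholder: ID of a completed transcript
response = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}/word-search",
    headers={"authorization": "YOUR_API_KEY"},  # placeholder API key
    params={"words": "hello,world"},  # comma-separated words to search for
)
# Each match is expected to carry the word, its count, the indexes into the
# transcript's word list, and the timestamps of each occurrence
for match in response.json().get("matches", []):
    print(match["text"], match["count"], match["timestamps"])
```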

- Fixed an issue causing a small subset of words not to be filtered when profanity filtering was turned on.
General Improvements
- Fixed a bug with PII Redaction, where sometimes `dollar amount` and `date` tokens were not being properly redacted.
- AssemblyAI now supports even more audio/video file formats thanks to improvements to our audio transcoding pipeline!
- Fixed a rare bug where a small percentage of transcripts (0.01%) would incorrectly sit in a status of "queued" for up to 60 seconds.
ITN Model Update
Today we've released a major improvement to our ITN (Inverse Text Normalization) model. This results in better formatting for entities within the transcription, such as phone numbers, money amounts, and dates.
For example:
Money:
- Spoken: "Hey, do you have five dollars?"
- Model output with ITN: "Hey, do you have $5?"
Years:
- Spoken: "Yes, I believe it was back in two thousand eight"
- Model output with ITN: "Yes, I believe it was back in 2008."
Punctuation Model v2.5 Released
Today we've released an updated Automatic Punctuation and Casing Restoration model (Punctuation v2.5)! This update results in improved capitalization of proper nouns in transcripts, reduces over-capitalization issues where some words were being incorrectly capitalized, and improves some edge cases around words with commas around them. For example:
- "....in the Us" now becomes "....in the US."
- "whatsapp," now becomes "WhatsApp,"
Content Safety Model (v7) Released
We have released an updated Content Safety Model - v7! Performance for 10 of the 19 Content Safety labels has been improved, with the biggest improvements being for the Profanity and Natural Disasters labels.
Real-Time Transcription Model v1.1 Released
We have just released a major real-time update!
Developers will now be able to use the `word_boost` parameter in requests to the real-time API, allowing you to introduce your own custom vocabulary to the model for that given session! This custom vocabulary will lead to improved accuracy for the provided words. A sketch of building the connection URL is below.
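
A minimal sketch of opening a real-time session with boosted vocabulary, assuming `word_boost` is passed as a JSON-encoded list in the WebSocket URL's query string (the endpoint shape and vocabulary terms are assumptions for illustration):

```python
import json
import urllib.parse

# Custom vocabulary, assumed to be sent as a JSON-encoded list in the
# WebSocket URL's query string
word_boost = json.dumps(["AssemblyAI", "diarization"])

url = (
    "wss://api.assemblyai.com/v2/realtime/ws"
    "?sample_rate=16000"
    f"&word_boost={urllib.parse.quote(word_boost)}"
)
print(url)  # open this URL with any WebSocket client, passing your API key for auth
```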
General Improvements
We will now be limiting one websocket connection per real-time session to ensure the integrity of a customer's transcription and prevent multiple users/clients from using the same websocket session.
Note: Developers can still have multiple real-time sessions open in parallel, up to the Concurrency Limit on the account. For example, if an account has a Concurrency Limit of 32, that account could have up to 32 concurrent real-time sessions open.
Topic Detection Model v2 Released
Today we have released v2 of our Topic Detection Model. This new model will predict multiple topics for each paragraph of text, whereas v1 was limited to predicting a single topic. For example, given the text:
"Elon Musk just released a new Tesla that drives itself!"
v1:
Automotive>AutoType>DriverlessCars: 1
v2:
Automotive>AutoType>DriverlessCars: 1
PopCulture: 0.84
PopCulture>CelebrityStyle: 0.56
This improvement will result in the visual output looking significantly better, and containing more informative responses for developers!
Increased Number of Categories Returned for Topic Detection Summary
In this minor improvement, we have increased the number of topics the model can return in the `summary` key of the JSON response from 10 to 20.