Changelog

Follow along to see weekly accuracy and product improvements.

December 2, 2022

New Summarization Models Tailored to Use Cases

We are excited to announce that new Summarization models are now available! Developers can now choose the summary model that best fits their use case and customize the length of the summary output.

The new models are:

  • Informative, which is best for files with a single speaker, like a presentation or lecture
  • Conversational, which is best for any multi-person conversation, like customer/agent phone calls or interviewer/interviewee calls
  • Catchy, which is best for creating video, podcast, or media titles

Developers can use the summary_model parameter in their POST request to specify which of our summary models they would like to use. This new parameter can be combined with the existing summary_type parameter, letting developers customize the summary to their needs.

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_model": "informative",  # conversational | catchy
    "summary_type": "bullets"  # bullets_verbose | gist | headline | paragraph
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())

Check out our latest blog post to learn more about the new Summarization models or head to the AssemblyAI Playground to test Summarization in your browser!

October 31, 2022

Improved Transcription Accuracy for COVID

We’ve made updates to our Core Transcription model to improve the transcription accuracy of the word COVID. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription.

Static IP support for webhooks is now generally available!

Outgoing webhook requests sent from AssemblyAI now originate from the static IP address 44.238.19.20, rather than a dynamic IP address. This makes it easy to validate that incoming requests come from our servers. Optionally, you can whitelist this static IP address to add an additional layer of security to your system.
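
For example, a webhook receiver can check a request's source address against this static IP before processing it. Here is a minimal sketch using Flask; the app, route, and handler names are illustrative assumptions, not part of the AssemblyAI API:

from flask import Flask, abort, request

# AssemblyAI's static webhook IP, per the announcement above
ASSEMBLYAI_WEBHOOK_IP = "44.238.19.20"

app = Flask(__name__)

@app.route("/assemblyai-webhook", methods=["POST"])  # illustrative route
def assemblyai_webhook():
    # If this app sits behind a proxy or load balancer, inspect the
    # X-Forwarded-For header instead of request.remote_addr.
    if request.remote_addr != ASSEMBLYAI_WEBHOOK_IP:
        abort(403)
    print(request.get_json())
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)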

See our walkthrough on how to start receiving webhooks for your transcriptions.

October 25, 2022

New Audio Intelligence Models: Summarization

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_type": "bullets"  # paragraph | headline | gist
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())

Starting today, you can transcribe and summarize entire audio files with a single API call.

To enable our new Summarization models, include the following parameter: "summarization": true in your POST request to /v2/transcript. When the transcription finishes, you will see the summary key in the JSON response containing the summary of your transcribed audio or video file.

By default, summaries will be returned in the style of bullet points. You can customize the style of summary by including the optional summary_type parameter in your POST request along with one of the following values: paragraph, headline, or gist. Here is the full list of summary types we support.

// summary_type = "paragraph"

"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat."

// summary_type = "headline"

"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcomes versus output."

// summary_type = "gist"

"summary": "Outcomes over output"

// summary_type = "bullets"

"summary": "Josh Seiden and Brian Donohue discuss
the topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat.\n- ..."

Examples of use cases for Summarization include:

  • Identify key takeaways from phone calls to speed up post-call review and reduce manual summarization
  • Summarize long podcasts into short descriptions so users can preview before they listen
  • Instantly generate meeting summaries to quickly recap virtual meetings and highlight post-meeting actions
  • Suggest 3-5 word video titles automatically for user-generated content
  • Synthesize long educational courses, lectures, and media broadcasts into their most important points for faster consumption

We're really excited to see what you build with our new Summarization models. To get started, try it out for free in our no-code playground or visit our documentation for more info on how to enable Summarization in your API requests.

October 19, 2022

Automatic Casing / Short Utterances

We’ve improved our Automatic Casing model and fixed a minor bug that caused over-capitalization in English transcripts. The Automatic Casing model is enabled by default with our Core Transcription API to improve transcript readability for video captions (SRT/VTT). See our documentation for more info on Automatic Casing.

Our Core Transcription model has been fine-tuned to better detect short utterances in English transcripts. Examples of short utterances include one-word answers such as “No.” and “Right.” This update will take effect immediately for all customers.

October 14, 2022

Static IP Support for Webhooks

Over the next few weeks, we will begin rolling out Static IP support for webhooks to customers in stages.

Outgoing webhook requests sent from AssemblyAI will now originate from the static IP address 44.238.19.20, rather than a dynamic IP address. This makes it easy to validate that incoming requests come from our servers. Optionally, you can whitelist this static IP address to add an additional layer of security to your system.

See our walkthrough on how to start receiving webhooks for your transcriptions.

October 12, 2022

Improved Number Transcription


We’ve made improvements to our Core Transcription model to better identify and transcribe numbers present in your audio files.

Accurate number transcription is critical for customers that need to redact Personally Identifiable Information (PII) that gets exchanged during phone calls. Examples of PII include credit card numbers, addresses, phone numbers, and social security numbers.

In order to help you handle sensitive user data at scale, our PII Redaction model automatically detects and removes sensitive info from transcriptions. For example, when PII redaction is enabled, a phone number like 412-412-4124 would become ###-###-####.
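
A request enabling PII Redaction might look like the following sketch. The redact_pii_policies list shown here is illustrative, not exhaustive; see our PII Redaction documentation for the full set of supported policies:

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "redact_pii": True,
    # Illustrative subset of policies; the docs list all supported values
    "redact_pii_policies": ["phone_number", "credit_card_number", "us_social_security_number"]
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())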

To learn more, check out our blog that covers all of our PII Redaction Policies or try our PII Redaction model in our Sandbox here!

September 6, 2022

Improved Disfluency Timestamps

We've updated our Disfluency Detection model to improve the accuracy of timestamps for disfluency words.

By default, disfluencies such as "um", "uh", and "hm" are automatically excluded from transcripts. However, customers can include these filler words by simply setting the disfluencies parameter to true in their POST request to /v2/transcript, which enables our Disfluency Detection model.
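
For example, a minimal sketch following the same request pattern as the snippets above:

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "disfluencies": True  # keep filler words like "um" in the transcript
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())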

More info and code examples can be found here.

August 19, 2022

Speaker Label Improvement

We've improved the Speaker Label model’s ability to identify unique speakers for single-word or short utterances.

August 1, 2022

Historical Transcript Bug Fix

We've fixed a bug with the Historical Transcript endpoint that was causing null to appear as the value of the completed key.

July 18, 2022

Japanese Transcription Now Available


Today, we’re releasing our new Japanese transcription model to help you transcribe and analyze your Japanese audio and video files using our cutting-edge AI.

Now you can automatically convert any Japanese audio or video file to text by including "language_code": "ja" in your POST request to our /v2/transcript endpoint.
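
For example, a minimal request might look like this (the audio URL is a placeholder for your own file):

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.com/japanese-audio.mp3",  # placeholder URL
    "language_code": "ja"
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())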

In conjunction with transcription, we’ve also added Japanese support for our AI models including Custom Vocabulary (Word Boost), Custom Spelling, Automatic Punctuation / Casing, Profanity Filtering, and more. This means you can boost transcription accuracy with more granularity based on your use case. See the full list of supported models available for Japanese transcriptions here.

To get started, visit our walkthrough on Specifying a Language on our AssemblyAI documentation page or try it out now in our Sandbox!

July 11, 2022

Hindi Transcription / Custom Webhook Headers


We’ve released our new Hindi transcription model to help you transcribe and analyze your Hindi audio and video files.

Now you can automatically convert any Hindi audio or video file to text by including "language_code": "hi" in your POST request to our /v2/transcript endpoint, just as in the Japanese example above.

We’ve also added Hindi support for our AI models including Custom Vocabulary (Word Boost), Custom Spelling, Automatic Punctuation / Casing, Profanity Filtering, and more. See the full list of supported models available for Hindi transcriptions here.

To get started with Hindi transcription, visit our walkthrough on Specifying a Language on our AssemblyAI documentation page.

Our Webhook service now supports the use of Custom Headers for authentication.

A Custom Header can be used for added security to authenticate webhook requests from AssemblyAI. This feature lets a developer optionally provide a value to be used as an authorization header on the webhook request that AssemblyAI sends back, making it possible to validate incoming webhook requests.

To use a Custom Header, you will include two additional parameters in your POST request to /v2/transcript: webhook_auth_header_name and webhook_auth_header_value. The webhook_auth_header_name parameter accepts a string containing the header's name which will be inserted into the webhook request. The webhook_auth_header_value parameter accepts a string with the value of the header that will be inserted into the webhook request. See our Using Webhooks documentation to learn more and view our code examples.
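
Here is a minimal sketch; the webhook URL, header name, and header value are placeholders you would replace with your own:

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "webhook_url": "https://example.com/assemblyai-webhook",  # placeholder receiver URL
    "webhook_auth_header_name": "X-My-Webhook-Secret",        # example header name
    "webhook_auth_header_value": "my-secret-value"            # example header value
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())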

July 1, 2022

Improved Speaker Labels Accuracy and Speaker Segmentation

  • Improved the overall accuracy of the Speaker Labels feature and the model’s ability to segment speakers.

  • Fixed a small edge case that would occasionally cause some transcripts to complete with NULL as the language_code value.

June 24, 2022

Content Moderation and Topic Detection Available for Portuguese

  • Improved Inverse Text Normalization of money amounts in transcript text.

  • Addressed an issue with Real-Time Transcription that would occasionally cause variance in timestamps over the course of a session.
  • Fixed an edge case with transcripts including Filler Words that would occasionally cause server errors.

June 10, 2022

Automatic Language Detection Available for Dutch and Portuguese

  • Improved the accuracy of the Automatic Language Detection model on files with large amounts of silence.
  • Improved speaker segmentation accuracy for Speaker Labels.

May 27, 2022

Dutch and Portuguese Support Released

  • Dutch and Portuguese transcription is now generally available for our /v2/transcript endpoint. See our documentation for more information on specifying a language in your POST request.

May 20, 2022

Content Moderation and Topic Detection Available for French, German, and Spanish

  • Improved redaction accuracy for credit_card_number, credit_card_expiration, and credit_card_cvv policies in our PII Redaction feature.

  • Fixed an edge case that would occasionally affect the capitalization of words in transcripts when disfluencies was set to true.

May 2, 2022

French, German, and Italian Support Released

  • French, German, and Italian transcription is now publicly available. Check out our documentation for more information on Specifying a Language in your POST request.

  • Released v2 of our Spanish model, improving absolute accuracy by ~4%.
  • Automatic Language Detection now supports French, German, and Italian.
  • Reduced the volume of the beep used to redact PII information in redacted audio files.

April 18, 2022

Miscellaneous Bug Fixes

  • Fixed an edge case that would occasionally affect timestamps for a small number of words when disfluencies was set to true.
  • Fixed an edge case where PII audio redaction would occasionally fail when using local files.

April 12, 2022

New Policies Added for PII Redaction and Entity Detection

April 4, 2022

Spanish Language Support, Automatic Language Detection, and Custom Spelling Released

  • Spanish transcription is now publicly available. Check out our documentation for more information on Specifying a Language in your POST request.
  • Automatic Language Detection is now available for our /v2/transcript endpoint. This feature can identify the dominant language that’s spoken in an audio file and route the file to the appropriate model for the detected language.
  • Our new Custom Spelling feature gives you the ability to specify how words are spelled or formatted in the transcript text. For example, Custom Spelling could be used to change all instances of "CS 50" to "CS50" (see the sketch after this list).
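
Here is a minimal sketch combining Automatic Language Detection with Custom Spelling. The custom_spelling shape shown (a list of rules mapping one or more "from" spellings to a "to" spelling) is our assumption here; consult the documentation for the authoritative format:

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "language_detection": True,  # detect the dominant spoken language
    # Assumed rule shape: map "from" spellings to the desired "to" spelling
    "custom_spelling": [
        {"from": ["CS 50"], "to": "CS50"}
    ]
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())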