Changelog

Follow along to see weekly accuracy and product improvements.

March 6, 2024

Fewer "File does not appear to contain audio" errors

We’ve fixed an edge-case bug in our async API, significantly reducing errors that say "File does not appear to contain audio". Users can expect an immediate reduction in this type of error. If this error does occur, users should retry their requests, since retries are generally successful.
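A minimal sketch of retrying with exponential backoff. It assumes a hypothetical `submit` callable that raises `RuntimeError` containing the API's error message when a request fails. The attempt count and delays are illustrative, not SDK behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error text to treat as transient; match on whatever your client surfaces.
RETRYABLE_MESSAGE = "File does not appear to contain audio"


def retry_on_no_audio(
    submit: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call the hypothetical `submit`, retrying with backoff on the no-audio error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return submit()
        except RuntimeError as exc:
            # Re-raise other errors, or this one once attempts are exhausted.
            if RETRYABLE_MESSAGE not in str(exc) or attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.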

We’ve improved autoscaling for our transcription service, which shortens turnaround times for requests that use Word Boost during spikes in traffic to our API.

February 27, 2024

New developer controls for real-time end-of-utterance

We have released developer controls for real-time end-of-utterance detection, giving developers control over when an utterance is considered complete. Developers can now either force the end of an utterance manually or set how long a silence must last before an utterance is considered complete.
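Over the real-time WebSocket, controls like these are sent as small JSON messages. A minimal sketch that builds them, assuming the field names `force_end_utterance` and `end_utterance_silence_threshold` (in milliseconds). Treat these names as assumptions and confirm them in the real-time API reference before use.

```python
import json


def force_end_utterance_message() -> str:
    # Assumed field name: asks the service to finalize the current utterance now.
    return json.dumps({"force_end_utterance": True})


def silence_threshold_message(threshold_ms: int) -> str:
    # Assumed field name: silence (in ms) after which an utterance is complete.
    if threshold_ms < 0:
        raise ValueError("threshold_ms must be non-negative")
    return json.dumps({"end_utterance_silence_threshold": threshold_ms})


# These strings would be sent as text frames on an open real-time WebSocket.
print(force_end_utterance_message())
print(silence_threshold_message(500))
```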

We have made changes to our English async transcription service that improve sentence segmentation for our Sentiment Analysis, Topic Detection, and Content Moderation models. The changes fix a bug in which these models would sometimes split sentences at titles that end in a period, such as Dr. and Mrs.
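To illustrate the bug class: a naive splitter that breaks after every period treats "Dr." as a sentence end. Below is a generic, abbreviation-aware splitter for comparison. The abbreviation list is a small illustrative assumption, and this is not how our service segments sentences internally.

```python
import re

# Illustrative list only; production segmenters use larger lists or models.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof."}


def naive_split(text: str) -> list[str]:
    # Splits after every '.', '!' or '?' that is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def abbreviation_aware_split(text: str) -> list[str]:
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Only end a sentence on terminal punctuation that isn't a known title.
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "I spoke with Dr. Smith today. Mrs. Jones was there too."
print(naive_split(text))
print(abbreviation_aware_split(text))
```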

We have fixed an issue in which transcription of very long files (8+ hours) with disfluencies enabled would fail.

February 19, 2024

PII Redaction and Entity Detection available in 13 additional languages

We have launched PII Text Redaction and Entity Detection for 13 new languages:

  1. Spanish
  2. Finnish
  3. French
  4. German
  5. Hindi
  6. Italian
  7. Korean
  8. Polish
  9. Portuguese
  10. Russian
  11. Turkish
  12. Ukrainian
  13. Vietnamese
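A minimal sketch of an async transcription request body that enables both features for one of these languages. It assumes the request fields `language_code`, `redact_pii`, `redact_pii_policies`, and `entity_detection` and the policy names shown. Check the API reference for the exact field names and the full list of policies.

```python
import json

# ISO 639-1 codes for the 13 languages listed above.
NEW_PII_LANGUAGES = {
    "es", "fi", "fr", "de", "hi", "it", "ko",
    "pl", "pt", "ru", "tr", "uk", "vi",
}


def pii_request_body(audio_url: str, language_code: str) -> dict:
    """Build a request body with PII Redaction and Entity Detection enabled."""
    # This sketch only covers the newly added languages.
    if language_code not in NEW_PII_LANGUAGES:
        raise ValueError(f"{language_code!r} is not one of the newly added languages")
    return {
        "audio_url": audio_url,
        "language_code": language_code,
        "redact_pii": True,
        # Assumed policy names; pick the entity types you want redacted.
        "redact_pii_policies": ["person_name", "phone_number"],
        "entity_detection": True,
    }


body = pii_request_body("https://example.org/llamada.mp3", "es")
print(json.dumps(body, indent=2))  # body for an async request to /v2/transcript
```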

We have increased the memory of our transcoding service workers, significantly reducing errors that say "File does not appear to contain audio".

February 6, 2024

Fewer LeMUR 500 errors

We’ve made improvements to our LeMUR service to reduce the number of 500 errors.

We’ve improved our real-time service, slightly increasing the accuracy of timestamps in some edge cases.

January 18, 2024

Free tier limit increase; real-time concurrency increase

We have increased the usage limit for our free tier to 100 hours. New users can now use our async API to transcribe up to 100 hours of audio, with a concurrency limit of 5, before needing to upgrade their accounts.

We have rolled out the concurrency limit increase for our real-time service. Users now have access to up to 100 concurrent streams by default when using our real-time service.

Higher concurrency is available on request, with no upper limit to what our API can support. If you need a higher concurrency limit, please contact our Sales team or reach out to us at support@assemblyai.com. Note that our real-time service is only available for upgraded accounts.

January 12, 2024

Latency and cost reductions, concurrency increase

We introduced major improvements to our API’s inference latency: the majority of audio files now complete in well under 45 seconds regardless of audio duration, with a Real-Time Factor (RTF) as low as 0.008.

To put an RTF of 0.008 into perspective, this means you can now transcribe a:

  • 1h3min (75MB) meeting in 35 seconds
  • 3h15min (191MB) podcast in 133 seconds
  • 8h21min (464MB) video course in 300 seconds
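RTF is processing time divided by audio duration, so a lower RTF means faster processing. A short worked check of the three examples above shows they correspond to RTFs of roughly 0.009 to 0.011, with 0.008 as the best case:

```python
# Real-Time Factor: processing time divided by audio duration (lower is faster).
examples = {
    "meeting (1h3min)": (1 * 3600 + 3 * 60, 35),
    "podcast (3h15min)": (3 * 3600 + 15 * 60, 133),
    "video course (8h21min)": (8 * 3600 + 21 * 60, 300),
}


def rtf(audio_seconds: float, processing_seconds: float) -> float:
    return processing_seconds / audio_seconds


for name, (audio_s, proc_s) in examples.items():
    print(f"{name}: RTF = {rtf(audio_s, proc_s):.4f}")
# meeting (1h3min): RTF = 0.0093
# podcast (3h15min): RTF = 0.0114
# video course (8h21min): RTF = 0.0100

# Best case: at RTF 0.008, one hour of audio takes 3600 * 0.008 = 28.8 seconds.
print(f"{3600 * 0.008:.1f} s per hour of audio at RTF 0.008")
```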

In addition to these latency improvements, we have reduced our Speech-to-Text pricing. You can now access our Speech AI models with the following pricing:

  • Async Speech-to-Text for $0.37 per hour (previously $0.65) 
  • Real-time Speech-to-Text for $0.47 per hour (previously $0.75)
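Because pricing is per hour of audio, savings scale linearly with volume. A worked example for a hypothetical monthly workload of 1,000 async hours and 200 real-time hours (the volumes are illustrative), using `Decimal` to avoid floating-point rounding in currency math:

```python
from decimal import Decimal

# Per-hour prices in USD from the list above: (new, previous).
PRICES = {
    "async": (Decimal("0.37"), Decimal("0.65")),
    "real-time": (Decimal("0.47"), Decimal("0.75")),
}

# Hypothetical monthly usage in hours.
usage_hours = {"async": Decimal(1000), "real-time": Decimal(200)}

new_total = sum(PRICES[k][0] * h for k, h in usage_hours.items())
old_total = sum(PRICES[k][1] * h for k, h in usage_hours.items())
print(f"new: ${new_total}, previous: ${old_total}, saved: ${old_total - new_total}")
# new: $464.00, previous: $800.00, saved: $336.00
```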

We’ve also reduced our pricing for the following Audio Intelligence models: Key Phrases, Sentiment Analysis, Summarization, PII Audio Redaction, PII Redaction, Auto Chapters, Entity Detection, Content Moderation, and Topic Detection. You can view the complete list of pricing updates on our Pricing page.

Finally, we've increased the default concurrency limits for both our async and real-time services. The increase takes effect immediately for async and will roll out soon for real-time. The new limits are:

  • 200 for async (up from 32)
  • 100 for real-time (up from 32)
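If you submit many files at once, you can keep your client within these limits by capping the number of in-flight requests. A minimal sketch using `asyncio.Semaphore`, where `transcribe_one` is a hypothetical stand-in for your own API call and the limit should match your account's actual concurrency limit:

```python
import asyncio

ASYNC_LIMIT = 200  # default async concurrency limit listed above


async def transcribe_one(url: str) -> str:
    # Hypothetical placeholder for submitting a file and awaiting its result.
    await asyncio.sleep(0.01)
    return f"transcript for {url}"


async def transcribe_all(urls: list[str], limit: int = ASYNC_LIMIT) -> list[str]:
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # At most `limit` coroutines run transcribe_one at the same time.
        async with semaphore:
            return await transcribe_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.org/{i}.mp3" for i in range(5)]
results = asyncio.run(transcribe_all(urls, limit=2))
print(results)
```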

These changes stem from the efficiencies that our incredible research and engineering teams drive at every level of our inference pipeline, including optimized model compilation, intelligent mini-batching, hardware parallelization, and optimized serving infrastructure.

Learn more about these changes and our inference pipeline in our blog post.

January 12, 2024

Claude 2.1 available through LeMUR

Anthropic’s Claude 2.1 is now generally available through LeMUR. Claude 2.1 is similar to our Default model, with fewer hallucinations, a larger context window, and better performance on citations.

Claude 2.1 can be used by setting the final_model parameter to anthropic/claude-2-1 in API requests to LeMUR. Here's an example of how to do this through our Python SDK:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # replace with your AssemblyAI API key

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.org/customer.mp3")

# Run a LeMUR task with Claude 2.1 as the final model
result = transcript.lemur.task(
    "Summarize the following transcript in three to five sentences.",
    final_model=aai.LemurModel.claude2_1,
)

print(result.response)

You can learn more about setting the model used with LeMUR in our docs.

January 12, 2024

Real-time binary support, improved async timestamps

Our real-time service now supports binary mode for sending audio segments. Users no longer need to encode audio segments as base64 strings inside JSON objects; the raw binary audio segment can now be sent directly to our API.

Going forward, sending audio segments over WebSockets via the audio_data field is deprecated, although it remains the default for now to avoid breaking changes. We plan to support the audio_data field until 2025.
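The difference between the two modes, sketched with the standard library: the deprecated mode wraps base64 text in a JSON object under audio_data, while binary mode sends the raw bytes as a binary WebSocket frame. The WebSocket send call itself is omitted, and the chunk below is dummy data.

```python
import base64
import json

# 100 ms of 16 kHz, 16-bit mono PCM silence (dummy data for illustration).
chunk = bytes(3200)

# Deprecated: base64-encode the chunk and wrap it in JSON (sent as a text frame).
legacy_message = json.dumps({"audio_data": base64.b64encode(chunk).decode("ascii")})

# Binary mode: send the raw bytes unchanged (sent as a binary frame).
binary_message = chunk

# Base64 inflates the payload by about a third, plus the JSON wrapper.
print(len(binary_message), len(legacy_message))
```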

If you are using our SDKs, no changes are required on your end.

We have fixed a bug that degraded timestamp accuracy near the end of very long files with many disfluencies.

December 13, 2023

New Node/JavaScript SDK works in multiple runtimes

We’ve released v4 of our Node/JavaScript SDK. Previously, the SDK was developed specifically for Node, but the latest version works in additional runtimes without any extra steps, including the browser, Deno, Bun, and Cloudflare Workers.

Check out the SDK’s GitHub repository for additional information.

December 13, 2023

New Punctuation Restoration and Truecasing models, PCM Mu-law support

We’ve released new Punctuation Restoration and Truecasing models, achieving significant improvements for acronyms, mixed-case words, and more.

Below is a visual comparison between our previous Punctuation Restoration and Truecasing models (red) and the new models (green):

Going forward, the new Punctuation Restoration and Truecasing models are used automatically for async and real-time transcriptions, with no need to upgrade or request special access. Use the punctuate and format_text parameters, respectively, to enable or disable the models in a request (both are enabled by default).
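For example, a request body that keeps Punctuation Restoration but disables Truecasing might look like the sketch below. The audio URL is a placeholder.

```python
import json

# Both parameters default to true; set one to false to disable that model.
request_body = {
    "audio_url": "https://example.org/call.mp3",  # placeholder URL
    "punctuate": True,     # Punctuation Restoration model enabled
    "format_text": False,  # Truecasing model disabled
}
print(json.dumps(request_body))
```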

Read more about our new models here.

Our real-time transcription service now supports PCM Mu-law, an encoding used primarily in the telephony industry. This encoding is set by using the `encoding` parameter in requests to our API. You can read more about our PCM Mu-law support here.
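Mu-law (G.711) compresses each signed 16-bit linear PCM sample into 8 bits. If your audio arrives as 16-bit PCM, a minimal encoder looks like the sketch below. The connection URL at the end assumes the real-time endpoint accepts `encoding=pcm_mulaw` with an 8 kHz sample rate; confirm the exact parameter values against our real-time docs.

```python
import struct
from urllib.parse import urlencode

BIAS = 0x84   # 132, the standard G.711 mu-law bias
CLIP = 32635  # clip magnitude so that adding BIAS stays within 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Exponent: position of the highest set bit among bits 14..7, mapped to 7..0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM bytes to mu-law bytes."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must have an even number of bytes")
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return bytes(linear_to_ulaw(s) for s in samples)


# Assumed query parameters for a mu-law real-time session.
query = urlencode({"sample_rate": 8000, "encoding": "pcm_mulaw"})
url = f"wss://api.assemblyai.com/v2/realtime/ws?{query}"
print(url)
```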

We have improved internal reporting for our transcription service, which will allow us to better monitor traffic.

November 29, 2023

New LeMUR parameter, reduced hold music hallucinations

Users can now pass custom text directly to LeMUR through the input_text parameter, as an alternative to transcript IDs. This lets users send any information from the async API to LeMUR, formatted however they want, for maximum flexibility.

For example, users can assign action items to each speaker by inputting speaker-labeled transcripts, or pull citations by inputting timestamped transcripts. Learn more about the new input_text parameter in our LeMUR API reference, or check out examples of how to use the input_text parameter in the AssemblyAI Cookbook.
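A minimal sketch of the speaker-labeled case: format utterances as plain text and pass that string as input_text in a LeMUR task request. The utterance shape (`speaker`, `text`) mirrors the async API's speaker-labeled output, while the `prompt` field and the endpoint named in the comment are assumptions to verify against the LeMUR API reference.

```python
import json

# Sample utterances shaped like speaker-labeled async API output.
utterances = [
    {"speaker": "A", "text": "I'll send the contract by Friday."},
    {"speaker": "B", "text": "Great, and I'll schedule the follow-up call."},
]


def to_speaker_labeled_text(utterances: list[dict]) -> str:
    # One line per utterance, prefixed with its speaker label.
    return "\n".join(f"Speaker {u['speaker']}: {u['text']}" for u in utterances)


# Assumed request body for the LeMUR task endpoint (e.g. POST /lemur/v3/generate/task).
task_body = {
    "prompt": "List the action items assigned to each speaker.",
    "input_text": to_speaker_labeled_text(utterances),
}
print(json.dumps(task_body, indent=2))
```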

We’ve reduced hallucinations that sometimes occurred when transcribing hold music on phone calls. This improvement is effective immediately, with no changes required by users.

We’ve fixed an issue in which requests to the LeMUR /task endpoint would sometimes fail when LeMUR returned XML.

October 31, 2023

Reduced latency, improved error messaging

We’ve improved our file downloading pipeline to reduce transcription latency. Latency has been reduced by at least 3 seconds for all audio files, with greater improvements for large audio files provided via external URLs.

We’ve made our error messages clearer in the case of internal server errors.

October 3, 2023

New Dashboard features and LeMUR fix

We have released the beta for our new usage dashboard. You can now see a usage summary broken down by async transcription, real-time transcription, Audio Intelligence, and LeMUR. Additionally, you can see charts of usage over time broken down by model.

We have added support for AWS Marketplace on the dashboard and account management pages of our web application.

We have fixed an issue in which LeMUR would sometimes fail when handling extremely short transcripts.

September 19, 2023

New LeMUR features and other improvements

We have added a new parameter to LeMUR that allows users to specify a temperature for LeMUR generation. Temperature refers to how stochastic the generated text is and can be a value from 0 to 1, inclusive, where 0 corresponds to low creativity and 1 corresponds to high creativity. Lower values are preferred for tasks like multiple choice, and higher values are preferred for tasks like coming up with creative summaries of clips for social media.
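As a generic illustration (not a description of LeMUR's internals), most language models apply temperature by dividing their token scores (logits) by T before the softmax, so lower values concentrate probability on the top choice. The logits below are made up, and T = 0 is treated as always picking the top choice:

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy: all probability on the highest-scoring option.
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.25, 1.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
```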

Here is an example of how to set the temperature parameter with our Python SDK (which is available in version 0.18.0 and up):

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # replace with your AssemblyAI API key

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://storage.googleapis.com/aai-web-samples/meeting.mp4")

# A lower temperature yields more deterministic output
result = transcript.lemur.summarize(
    temperature=0.25
)

print(result.response)

We have added a new endpoint that allows users to delete the data for a previously submitted LeMUR request. The response data as well as any context provided in the original request will be removed. Continuing the example from above, we can see how to delete LeMUR data using our Python SDK:

request_id = result.request_id

deletion_result = aai.Lemur.purge_request_data(request_id)
print(deletion_result)

We have improved the error messaging for our Word Search functionality. Each phrase used in a Word Search request must be 5 words or fewer, and the error message is now clearer when a request contains a phrase that exceeds this limit.
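To catch this before sending a request, a simple client-side check can count the words in each phrase. A minimal sketch; the helper name is hypothetical, and splitting on whitespace is an assumption about how words are counted.

```python
MAX_WORDS_PER_PHRASE = 5  # Word Search limit described above


def invalid_word_search_phrases(phrases: list[str]) -> list[str]:
    """Return the phrases that exceed the per-phrase word limit."""
    # Assumes words are separated by whitespace.
    return [p for p in phrases if len(p.split()) > MAX_WORDS_PER_PHRASE]


phrases = ["refund", "cancel my subscription", "I would like to speak to a manager"]
too_long = invalid_word_search_phrases(phrases)
if too_long:
    print("Shorten these phrases:", too_long)
```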

We have fixed an edge-case error that occurred when both disfluencies and Auto Chapters were enabled for audio files that contained non-fluent English.

September 13, 2023

Improvements: observability, logging, and patches

We have improved logging for our LeMUR service to allow for the surfacing of more detailed errors to users.

We have increased observability into our Speech API internally, allowing for finer-grained usage metrics.

We have fixed a minor bug that would sometimes lead to incorrect timestamps for zero-confidence words.

We have fixed an issue in which requests to LeMUR would occasionally hang during peak usage due to a memory leak.

August 28, 2023

Multi-language speaker labels

We have recently launched Speaker Labels for 10 additional languages:

  • Spanish
  • Portuguese
  • German
  • Dutch
  • Finnish
  • French
  • Italian
  • Polish
  • Russian
  • Turkish

August 28, 2023

Audio Intelligence unbundling and price decreases

We have unbundled and lowered the price for our Audio Intelligence models. Previously, the bundled price for all Audio Intelligence models was $2.10/hr, regardless of the number of models used.

We have made each model accessible at a lower, unbundled, per-model rate:

  • Auto Chapters: $0.30/hr
  • Content Moderation: $0.25/hr
  • Entity Detection: $0.15/hr
  • Key Phrases: $0.06/hr
  • PII Redaction: $0.20/hr
  • Audio Redaction: $0.05/hr
  • Sentiment Analysis: $0.12/hr
  • Summarization: $0.06/hr
  • Topic Detection: $0.20/hr

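Since each model is now billed separately, the hourly Audio Intelligence cost depends on which models a request enables. A worked example using the rates above: enabling all nine models comes to $1.39/hr, compared with the previous $2.10/hr bundle.

```python
from decimal import Decimal

# Per-model rates in USD per hour of audio, from the list above.
RATES = {
    "Auto Chapters": Decimal("0.30"),
    "Content Moderation": Decimal("0.25"),
    "Entity Detection": Decimal("0.15"),
    "Key Phrases": Decimal("0.06"),
    "PII Redaction": Decimal("0.20"),
    "Audio Redaction": Decimal("0.05"),
    "Sentiment Analysis": Decimal("0.12"),
    "Summarization": Decimal("0.06"),
    "Topic Detection": Decimal("0.20"),
}
PREVIOUS_BUNDLE = Decimal("2.10")


def hourly_cost(models: list[str]) -> Decimal:
    # Sum the rates of the selected models (raises KeyError for unknown names).
    return sum((RATES[m] for m in models), Decimal("0"))


print(hourly_cost(["Summarization", "Sentiment Analysis"]))  # 0.18
print(hourly_cost(list(RATES)), "vs", PREVIOUS_BUNDLE)      # 1.39 vs 2.10
```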
August 22, 2023

New language support and improvements to existing languages

We now support the following additional languages for asynchronous transcription through our /v2/transcript endpoint:

  • Chinese
  • Finnish
  • Korean
  • Polish
  • Russian
  • Turkish
  • Ukrainian
  • Vietnamese

Additionally, we've made improvements in accuracy and quality to the following languages:

  • Dutch
  • French
  • German
  • Italian
  • Japanese
  • Portuguese
  • Spanish

You can see a full list of supported languages and features here. You can see how to specify a language in your API request here. Note that not all languages support Automatic Language Detection.
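A minimal sketch of the two ways to choose a language in an async request body: set language_code explicitly, or enable automatic detection with language_detection. The helper is hypothetical, and the exact field names are assumptions to confirm in the API reference.

```python
from typing import Optional


def transcript_request(audio_url: str, language_code: Optional[str] = None) -> dict:
    """Build a hypothetical /v2/transcript request body."""
    body = {"audio_url": audio_url}
    if language_code is None:
        # Let the API detect the language (not every language supports this).
        body["language_detection"] = True
    else:
        # e.g. "ko" for Korean or "vi" for Vietnamese (ISO 639-1 codes).
        body["language_code"] = language_code
    return body


print(transcript_request("https://example.org/interview.mp3", "ko"))
print(transcript_request("https://example.org/interview.mp3"))
```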

August 17, 2023

Pricing decreases

We have decreased the price of Core Transcription from $0.90 per hour to $0.65 per hour, and decreased the price of Real-Time Transcription from $0.90 per hour to $0.75 per hour.

Both decreases took effect on August 3.

August 1, 2023

Significant Summarization model speedups

We’ve implemented changes that increase processing speed for our Summarization models by 43% to 200%, depending on which model is selected, with no measurable impact on the quality of results.

We have standardized the response from our API for automatically detected languages that do not support requested features. In particular, when Automatic Language Detection is used and the detected language does not support a feature requested in the transcript request, our API will return null in the response for that feature.
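Client code can therefore treat a null feature result as "not available for this language" rather than as an empty result. A minimal sketch over a decoded response dict; the feature field names are examples to check against the transcript response schema, and missing keys are treated the same as null.

```python
from typing import Any

# Example feature fields to check; confirm names against the API reference.
FEATURE_FIELDS = ["sentiment_analysis_results", "entities", "auto_highlights_result"]


def unsupported_features(response: dict[str, Any], requested: list[str]) -> list[str]:
    """Return requested feature fields that came back null (None after decoding)."""
    # response.get returns None for missing keys too, so those are also flagged.
    return [f for f in requested if response.get(f) is None]


# Sample decoded response: null became None; an empty list is still a result.
response = {
    "sentiment_analysis_results": None,
    "entities": [],
}
print(unsupported_features(response, FEATURE_FIELDS))
```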