July 15, 2025

Transcribe Twilio Phone Calls in Real-Time with AssemblyAI

Learn how to transcribe Twilio phone calls in real-time using AssemblyAI's Universal-Streaming API.

Ryan O'Connor

Senior Developer Educator

Twilio is a leading platform for customer communication and engagement that makes it easy to generate personalized user experiences, from tailored coupons to individualized appointment reminders. But what if, instead of just sending communication out to customers, you want to record communication coming in from customers?

For example, a doctor might want to automatically transcribe a patient phone call so she can focus on engaging with the patient rather than scrambling to write down notes. After the call, the doctor can review the transcription in order to give a diagnosis, confident she didn't miss any details.

Not only is transcribing phone calls in this way useful, it's also easy to do with AssemblyAI's Universal-Streaming Speech-to-Text API. By the end of this tutorial, you'll be able to make a call and see your words transcribed in front of you in real time! Let's get started.

Prerequisites

To follow along with this tutorial, you'll need a Twilio account. You'll also need an AssemblyAI account with billing enabled to access Universal-Streaming Speech-to-Text. Universal-Streaming costs $0.15/hour and is designed specifically for voice agents and real-time applications.

The commands in this tutorial are for Debian-based systems (Ubuntu 20.04 LTS), so you may need to change some of the commands to suit your OS.

Installations

First, you'll need to make sure you have Node.js, the Node package manager, cURL, and wget installed.

sudo apt install nodejs npm curl wget

Additional Information: You can copy the Bash commands in this tutorial and paste them into the terminal by right-clicking in the terminal window and selecting "Paste".

ngrok

Next, we'll install ngrok, which creates a public URL that forwards HTTP requests from Twilio to a server running on our localhost.

curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null \
  && echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list \
  && sudo apt update && sudo apt install ngrok

Twilio CLI tools

Finally, we'll install the Twilio CLI. We'll use it to configure our Twilio number so that it sends its incoming-call POST request to our ngrok URL.

wget -qO- https://twilio-cli-prod.s3.amazonaws.com/twilio_pub.asc \
| sudo apt-key add -
sudo touch /etc/apt/sources.list.d/twilio.list
echo 'deb https://twilio-cli-prod.s3.amazonaws.com/apt/ /' \
| sudo tee /etc/apt/sources.list.d/twilio.list
sudo apt update
sudo apt install -y twilio

Next, run twilio login. You'll be prompted for your Twilio credentials and then asked to set a shorthand identifier for your account. Finally, run twilio profiles:use $IDENTIFIER to use your account, replacing $IDENTIFIER with the shorthand identifier you just set.

Creating ngrok tunnel

Now that we're done with setup, we can move on to the fun stuff! The first thing we'll do is create an ngrok tunnel. Open a new terminal and create an ngrok tunnel with

ngrok http 8080

Here, 8080 is the localhost port that ngrok forwards requests to, so it must match the port your server listens on. After you run this command, ngrok displays session details in the terminal. Copy the HTTPS Forwarding URL, like the one circled in red below.

Setting up Twilio URL

Leave ngrok running and open a new terminal. We need to tell Twilio where to send its request when someone calls our Twilio number:

twilio phone-numbers:update $TWILIO_NUMBER --voice-url $NGROK_HTTP_URL

Replace $TWILIO_NUMBER with your Twilio number, which you can find in the Twilio console (the number under "Trial Number"). Replace $NGROK_HTTP_URL with the ngrok URL you copied in the previous section.
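
When someone calls your number, Twilio sends a request to this voice URL and expects TwiML instructions in response. To transcribe the call, those instructions must tell Twilio to stream the call audio over a WebSocket. The sketch below assumes one way such a response can look; it is not a copy of transcribe.js. It uses Twilio's <Start><Stream> verb to send the audio to a wss:// version of your ngrok URL, since ngrok forwards WebSocket traffic on the same host. It then keeps the call open with <Pause>. The helper name buildStreamTwiml, the spoken prompt, and the 60-second pause are hypothetical choices.

```javascript
// Hypothetical helper: build TwiML that streams the call audio to a WebSocket.
// Assumes the WebSocket server is reachable on the same host as the ngrok HTTPS URL.
function buildStreamTwiml(ngrokHttpsUrl, pauseSeconds = 60) {
  const url = new URL(ngrokHttpsUrl);
  url.protocol = "wss:"; // Twilio media streams connect over secure WebSockets
  const streamUrl = url.toString();
  return [
    "<Response>",
    "  <Start>",
    `    <Stream url="${streamUrl}" />`,
    "  </Start>",
    "  <Say>Start speaking to see your words transcribed.</Say>",
    // Keep the call open so there is audio to stream
    `  <Pause length="${pauseSeconds}" />`,
    "</Response>",
  ].join("\n");
}

console.log(buildStreamTwiml("https://abc123.ngrok-free.app"));
```

In an Express app, a POST handler could return this string with res.type("text/xml").send(...). The ngrok URL used here is only a placeholder.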

Exporting AssemblyAI API key

Next, go to your AssemblyAI dashboard and copy your API key, which appears where the red box is in the screenshot below. Then export the key to an environment variable:

export ASSEMBLYAI_API_KEY=$YOUR_KEY

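Replace $YOUR_KEY with the AssemblyAI API key you just copied. Run this command in the same terminal you'll use to start the server, because an exported variable only applies to the current shell session.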
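
A Node.js process reads the exported key through process.env. The minimal sketch below reads it and fails with a clear message when the variable is missing, for example when the server is started from a different terminal. The helper getApiKey is hypothetical and not part of transcribe.js.

```javascript
// Hypothetical helper: read the AssemblyAI key exported in the shell.
// Throws early so a missing key isn't discovered only when the WebSocket is rejected.
function getApiKey(env = process.env) {
  const key = env.ASSEMBLYAI_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error(
      "ASSEMBLYAI_API_KEY is not set. Run `export ASSEMBLYAI_API_KEY=...` in this terminal first."
    );
  }
  return key.trim();
}
```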

Spinning up your localhost

Next, navigate to the directory from which you want to run your transcription server, and install the necessary packages:

npm install --save assemblyai express ws

Now run

wget https://raw.githubusercontent.com/AssemblyAI/twilio-realtime-tutorial/master/transcribe.js

to download transcribe.js from AssemblyAI's GitHub. The script connects to the Universal-Streaming (v3) API at wss://streaming.assemblyai.com/v3/ws, using connection parameters that match Twilio's µ-law audio format.

The script configures the connection as follows:

Endpoint: wss://streaming.assemblyai.com/v3/ws (Universal-Streaming v3)
Encoding: pcm_mulaw (matches Twilio's 8-bit µ-law audio)
Sample rate: 8000 Hz (the sample rate of Twilio's streamed call audio)
Format turns: true (enables punctuated, cased transcripts)
Turn handling: listens for Turn objects with immutable transcripts
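
The settings above map onto a WebSocket connection URL. The sketch below builds that URL and assumes that sample_rate, encoding, and format_turns are passed as query parameters and that the API key goes in the Authorization header. It is an illustration and may differ from how transcribe.js does it. The function name buildStreamingConnection is hypothetical.

```javascript
// Sketch: assemble the Universal-Streaming (v3) connection settings listed above.
// Assumption: parameters go in the query string and the API key in the Authorization header.
function buildStreamingConnection(apiKey) {
  const url = new URL("wss://streaming.assemblyai.com/v3/ws");
  url.searchParams.set("sample_rate", "8000");   // Twilio streams 8 kHz audio
  url.searchParams.set("encoding", "pcm_mulaw"); // Twilio's audio is 8-bit µ-law
  url.searchParams.set("format_turns", "true");  // request punctuated, cased turns
  return {
    url: url.toString(),
    headers: { Authorization: apiKey },
  };
}

console.log(buildStreamingConnection("YOUR_KEY").url);
```

With the ws package, you could open the connection with new WebSocket(conn.url, { headers: conn.headers }).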

Finally, spin up the server with

node transcribe.js

and then call your Twilio number and begin speaking. You will see the transcription from AssemblyAI in the console!
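
During the call, Twilio sends the audio to the server over a WebSocket as JSON messages. Media messages carry base64-encoded 8 kHz µ-law audio in small frames, typically 20 ms (160 bytes) each, and a stop message arrives when the stream ends. The sketch below shows one possible way to handle those messages; it is not the code in transcribe.js. Both createTwilioAudioBatcher and the 100 ms chunk size are hypothetical choices. They assume the streaming API works better with chunks longer than Twilio's individual frames.

```javascript
// Sketch: batch Twilio media-stream audio into larger chunks for AssemblyAI.
// µ-law uses 1 byte per sample, so 100 ms at 8000 Hz is 800 bytes.
const SAMPLE_RATE = 8000; // samples per second in Twilio's stream
const CHUNK_MS = 100;     // assumed target chunk duration
const CHUNK_BYTES = (SAMPLE_RATE * CHUNK_MS) / 1000;

function createTwilioAudioBatcher(sendChunk) {
  let pending = Buffer.alloc(0);
  return function handleTwilioMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.event === "media") {
      // Decode the base64 payload into raw µ-law bytes and queue them
      pending = Buffer.concat([pending, Buffer.from(msg.media.payload, "base64")]);
      while (pending.length >= CHUNK_BYTES) {
        sendChunk(pending.subarray(0, CHUNK_BYTES));
        pending = pending.subarray(CHUNK_BYTES);
      }
    } else if (msg.event === "stop" && pending.length > 0) {
      // Flush leftover audio when the call ends; this last chunk may be shorter
      sendChunk(pending);
      pending = Buffer.alloc(0);
    }
    // Other events (such as "connected" and "start") are ignored in this sketch
  };
}
```

In a ws server, you would call the returned function from the Twilio connection's message event. The sendChunk callback would forward each Buffer to the AssemblyAI WebSocket with send(). Check AssemblyAI's streaming documentation for the current limits on chunk duration before choosing a size.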

Understanding the Universal-Streaming API

The code example uses AssemblyAI's Universal-Streaming (v3) API, which returns Turn objects with immutable transcripts. Traditional streaming APIs send partial transcripts that can be revised as more audio arrives. With Universal-Streaming, words are not changed once they have been sent, and they are delivered in approximately 300 ms. Each Turn object includes:

transcript - The transcribed text of the turn so far (words already sent are not revised)
end_of_turn - Boolean indicating whether the speaker has finished the turn
turn_order - Sequential turn number
end_of_turn_confidence - Confidence score (0-1) that the turn has ended
turn_is_formatted - Boolean indicating whether the text includes punctuation and casing (when format_turns=true)
words - Array of word objects with timestamps and confidence scores

Because sent words are never rewritten, you don't need logic to reconcile changing partial transcripts. This makes the model well suited to voice agents and other real-time applications.
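
The sketch below shows one way to consume these Turn messages. It assumes each message from the streaming WebSocket is JSON with a type field ("Turn" for transcript updates) and the fields listed above; createTurnHandler is a hypothetical name. The handler prints a turn only once it is complete and formatted. With format_turns enabled, the formatted version of a finished turn may arrive in a separate message, and this check keeps the same turn from being printed twice.

```javascript
// Sketch: print each completed, formatted turn from the Universal-Streaming WebSocket.
// Assumes JSON messages with a `type` field and the Turn fields described above.
function createTurnHandler(log = console.log) {
  return function handleMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.type !== "Turn") return null; // ignore session messages
    // Skip in-progress updates and the unformatted end-of-turn message
    if (!msg.end_of_turn || !msg.turn_is_formatted) return null;
    const line = `[turn ${msg.turn_order}] ${msg.transcript}`;
    log(line);
    return line;
  };
}
```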

Feel free to adapt transcribe.js to suit your needs. You can also check out this video, which explains in depth how the approach above works and shows how to display your transcription in the browser rather than in the console.

Use case and benefits

This Twilio integration is particularly valuable for call tracking, telecom infrastructure, and customer service applications. Universal-Streaming's turn detection (endpointing) and accuracy make it a good fit for:

Call Centers: Real-time agent assistance and quality monitoring
Sales Intelligence: Automated call documentation and analysis
Marketing Attribution: Campaign tracking through phone call analysis
Healthcare: Patient call transcription for better care documentation

Next steps

Now that you have real-time phone call transcription working, you can build upon this foundation to create more sophisticated voice applications:

Voice Agents: Combine with LLMs to build interactive voice assistants
Call Analytics: Use AssemblyAI's Audio Intelligence features for sentiment analysis, topic detection, and key phrase extraction
Compliance Monitoring: Implement PII redaction and content moderation for regulated industries
Real-time Coaching: Provide live feedback to call center agents based on conversation analysis

To learn more about AssemblyAI's Universal-Streaming Speech-to-Text capabilities, check out our documentation and the Universal-Streaming announcement, which covers detailed performance benchmarks and use cases.
