This tutorial shows you how to build a real-time medical transcription system that captures patient conversations, separates speakers, and automatically generates SOAP notes. You'll create a streaming application that processes audio as it happens, giving healthcare providers instant transcription with proper clinical documentation format.
You'll use AssemblyAI's Python SDK for real-time speech-to-text with multichannel speaker separation, combined with OpenAI's GPT-4 API to transform raw transcripts into structured clinical notes. The implementation includes microphone audio capture, streaming transcription processing, and FHIR integration for Electronic Health Records systems.
What is AI medical transcription and when to use streaming
AI medical transcription is software that converts doctor-patient conversations into written clinical notes automatically. This means you record a conversation, the AI transcribes the speech, then structures it into proper medical documentation like SOAP notes without manual typing.
The technology works in four steps. First, it records audio from patient visits in real-time. Second, speech-to-text models convert spoken words into text using medical vocabulary. Third, AI models identify medical information and organize it into clinical note formats. Finally, doctors review and approve the generated notes before adding them to patient records.
Streaming transcription processes audio as it happens, returning text within a few hundred milliseconds. You'd use this during live patient visits when you need immediate feedback and real-time documentation.
| Feature | Streaming Transcription | Async Transcription |
|---|---|---|
| Use Case | Live patient visits, telehealth calls | Dictations, recorded consultations |
| Latency | Sub-second results | Minutes to process |
| Best For | Interactive sessions | Post-visit documentation |
| Example Scenarios | Emergency consultations, therapy sessions | Radiology reports, surgical notes |
Choose streaming when you need instant results during patient interactions. Choose async when you're processing recorded dictations after appointments.
How AI medical transcription works in practice
The workflow transforms spoken conversations into structured clinical notes through a specific process. Your system captures audio from patient-clinician conversations using secure recording technology. Speech recognition then converts these spoken words into text using specialized medical vocabulary trained on clinical conversations.
AI models process this raw text to identify medical entities like symptoms, medications, and diagnoses. The system then organizes this information into standard formats like SOAP notes with proper medical structure. This automation delivers several practical benefits:
- Time savings: Eliminates manual note-taking during patient visits
- Reduced documentation burden: Cuts evening charting sessions that cause burnout
- Improved accuracy: Creates consistent notes with proper medical terminology
- Better patient focus: Lets you maintain eye contact instead of typing
Major healthcare systems like Kaiser Permanente and UC San Francisco have already implemented AI transcription, but the technology has limitations you need to understand. Some AI transcription systems create "hallucinations"—they invent text that was never spoken. This happens during pauses, with background noise, or when processing unclear speech.
Key considerations for your implementation:
- Accuracy verification: Always require human review before finalizing notes
- HIPAA compliance: Use systems with proper encryption and Business Associate Agreements
- Patient consent: Get explicit permission before recording medical conversations
- Bias prevention: Choose AI models trained on diverse patient populations
Build real-time medical transcription with Python SDK
You'll build a streaming system that captures patient conversations, processes multichannel audio for speaker separation, and generates SOAP notes. This implementation uses AssemblyAI's Python SDK for real-time transcription with OpenAI's GPT-4 API to transform transcripts into clinical documentation.
Set up Python environment and streaming session
Create a new Python project and install the required dependencies. You need Python 3.8 or higher for this implementation.
# Create project directory
mkdir medical-transcription-app
cd medical-transcription-app
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install assemblyai pyaudio numpy openai python-dotenv
Create a .env file to store your API keys securely:
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
Set up the basic streaming configuration with medical-optimized settings:
import os
import threading
from queue import Queue

import assemblyai as aai
import numpy as np
import pyaudio
from assemblyai.streaming.v3 import StreamingClient, StreamingParameters, TurnEvent
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure AssemblyAI with your API key
aai.settings.api_key = os.getenv('ASSEMBLYAI_API_KEY')

# Audio configuration for medical conversations
SAMPLE_RATE = 16000  # 16 kHz for optimal speech recognition
CHANNELS = 1
CHUNK_SIZE = 3200    # 200 ms chunks for low latency
AUDIO_FORMAT = pyaudio.paInt16

class MedicalTranscriptionStream:
    def __init__(self):
        self.audio_queue = Queue()
        self.transcript_queue = Queue()
        self.is_running = False
        self.pyaudio = pyaudio.PyAudio()

    def start_streaming(self):
        """Initialize a streaming session with medical-specific settings."""
        # Configure streaming parameters
        params = StreamingParameters(
            encoding='pcm_s16le',
            sample_rate=SAMPLE_RATE,
            channels=CHANNELS,
            # Medical terminology optimization
            keyterms_prompt=["hypertension", "diabetes", "metformin", "systolic", "diastolic"]
        )

        # Create the streaming client
        self.transcriber = StreamingClient(
            on_turn=self.on_transcription_turn,
            on_error=self.on_error
        )

        # Connect to the AssemblyAI streaming service
        self.transcriber.connect(params)
        self.is_running = True

    def on_error(self, error: Exception):
        """Log streaming errors without crashing the session."""
        print(f"Streaming error: {error}")

    def stop_streaming(self):
        """Close the streaming session (called from capture and app shutdown)."""
        self.is_running = False
        if hasattr(self, 'transcriber'):
            self.transcriber.disconnect()
This configuration optimizes for medical conversations with 16kHz sampling. The keyterms_prompt feature improves accuracy for common medical terms you specify. Note that real-time speaker separation requires multichannel audio where each channel represents a different speaker.
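If you do want per-speaker channels, a two-channel capture is one option. Here is a minimal sketch, assuming a setup where the clinician and patient each speak into a dedicated microphone channel; the channel mapping is hypothetical and depends on your audio hardware:

import numpy as np
import pyaudio

pa = pyaudio.PyAudio()
stereo_stream = pa.open(
    format=pyaudio.paInt16,
    channels=2,  # one speaker per channel (assumed hardware setup)
    rate=16000,
    input=True,
    frames_per_buffer=3200,
)

# PyAudio returns interleaved int16 frames: [ch0, ch1, ch0, ch1, ...]
frames = stereo_stream.read(3200, exception_on_overflow=False)
samples = np.frombuffer(frames, dtype=np.int16)
clinician, patient = samples[0::2], samples[1::2]  # de-interleave per channel

Whether you send the interleaved stream or per-channel audio downstream depends on your transcription configuration; the de-interleaving above is also useful for per-speaker level metering.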
Capture and stream audio with speaker separation
Implement audio capture from your microphone with real-time streaming to AssemblyAI. The two methods below continue the MedicalTranscriptionStream class defined above.
    def capture_microphone_audio(self):
        """Capture microphone audio and send it to the transcription service."""
        stream = self.pyaudio.open(
            format=AUDIO_FORMAT,
            channels=CHANNELS,
            rate=SAMPLE_RATE,
            input=True,
            frames_per_buffer=CHUNK_SIZE
        )

        print("🎤 Listening... Press Ctrl+C to stop")

        try:
            while self.is_running:
                # Read one audio chunk from the microphone
                audio_data = stream.read(CHUNK_SIZE, exception_on_overflow=False)
                # Send raw PCM bytes to AssemblyAI for transcription
                self.transcriber.stream(audio_data)
        except KeyboardInterrupt:
            print("\n⏹️ Stopping transcription...")
        finally:
            stream.stop_stream()
            stream.close()
            self.stop_streaming()

    def on_transcription_turn(self, turn: TurnEvent):
        """Handle incoming transcription turns."""
        if turn.text:
            # With multichannel audio, the speaker is identified by channel;
            # single-channel audio arrives without speaker separation.
            transcript_data = {
                'text': turn.text,
                'timestamp': turn.start_time,
                'confidence': turn.confidence,
                'is_final': True  # TurnEvents represent finalized segments
            }
            self.transcript_queue.put(transcript_data)

            # Display the transcription in real time
            print(f"\nTranscript: {turn.text}")
For single-channel audio, all transcription appears without speaker labels. To achieve speaker separation in real-time, configure multichannel audio where each participant uses a separate audio channel. Post-processing with async transcription can provide speaker diarization from single-channel recordings.
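If you record the visit and process it afterwards, AssemblyAI's async API can label speakers from a single channel. A minimal sketch, assuming a local recording named visit_recording.wav:

import os
import assemblyai as aai

aai.settings.api_key = os.getenv('ASSEMBLYAI_API_KEY')

# speaker_labels enables diarization on single-channel recordings
config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("visit_recording.wav", config=config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

This trades latency for speaker labels: the diarized transcript arrives after processing completes rather than during the visit.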
Process transcripts and generate SOAP notes
Transform raw transcripts into structured SOAP notes using OpenAI's GPT-4 API. This combines AssemblyAI's transcription accuracy with GPT-4's medical knowledge.
import json
import openai
from datetime import datetime

class SOAPNoteGenerator:
    def __init__(self):
        self.client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
        self.conversation_buffer = []

    def process_transcript(self, transcript_data):
        """Add a finalized transcript segment to the conversation buffer."""
        if transcript_data['is_final']:
            self.conversation_buffer.append({
                'text': transcript_data['text'],
                'timestamp': transcript_data['timestamp']
            })

    def generate_soap_note(self):
        """Convert the buffered conversation into SOAP format using an LLM."""
        if not self.conversation_buffer:
            return None

        # Prepare the conversation for LLM processing
        conversation_text = self.format_conversation()

        prompt = f"""
Convert the following medical conversation into a SOAP note format.
Identify the healthcare provider and patient speakers.

Conversation:
{conversation_text}

Generate a SOAP note with these sections:
- SUBJECTIVE: Patient's complaints, symptoms, and history
- OBJECTIVE: Vital signs, physical exam findings, test results
- ASSESSMENT: Diagnosis or differential diagnoses
- PLAN: Treatment plan, medications, follow-up

Format as JSON with keys: subjective, objective, assessment, plan
"""

        try:
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a medical documentation assistant. Convert conversations into accurate SOAP notes."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,  # Lower temperature for consistent output
                max_tokens=1000
            )

            soap_content = response.choices[0].message.content
            soap_json = json.loads(soap_content)
            return self.format_soap_note(soap_json)
        except Exception as e:
            print(f"Error generating SOAP note: {e}")
            return None

    def format_conversation(self):
        """Format the buffered conversation for LLM processing."""
        return "\n".join(entry['text'] for entry in self.conversation_buffer)

    def format_soap_note(self, soap_json):
        """Format the SOAP note for display."""
        return f"""
SOAP NOTE - {datetime.now().strftime('%Y-%m-%d %H:%M')}
{'=' * 50}

SUBJECTIVE:
{soap_json.get('subjective', 'No subjective data recorded')}

OBJECTIVE:
{soap_json.get('objective', 'No objective data recorded')}

ASSESSMENT:
{soap_json.get('assessment', 'No assessment recorded')}

PLAN:
{soap_json.get('plan', 'No plan recorded')}

{'=' * 50}
Generated by AI - Requires physician review
"""
The OpenAI API processes the conversation flow and extracts medical information for each SOAP section: symptoms and history for Subjective, vital signs and exam findings for Objective, diagnoses for Assessment, and treatments and follow-up for Plan.
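To test the generator in isolation, you can feed it transcript segments by hand. A quick sketch with made-up conversation data in the same shape the streaming handler produces:

generator = SOAPNoteGenerator()

# Simulated transcript segments (hypothetical conversation content)
sample_segments = [
    {'text': "I've had a headache for three days.", 'timestamp': 0.0, 'is_final': True},
    {'text': "Blood pressure is 150 over 95.", 'timestamp': 12.4, 'is_final': True},
    {'text': "Let's start you on lisinopril and recheck in two weeks.", 'timestamp': 30.1, 'is_final': True},
]

for segment in sample_segments:
    generator.process_transcript(segment)

print(generator.generate_soap_note())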
Integrate with EHR systems
Connect your transcription system to Electronic Health Records through standard FHIR APIs. Most modern EHRs support FHIR for seamless data exchange.
import base64
import requests
from typing import Optional

class EHRIntegration:
    def __init__(self, ehr_base_url: str, api_key: str):
        self.base_url = ehr_base_url
        self.headers = {
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        }

    @staticmethod
    def encode_base64(text: str) -> str:
        """FHIR attachments carry their payload as base64-encoded data."""
        return base64.b64encode(text.encode('utf-8')).decode('ascii')

    def create_clinical_note(self, patient_id: str, soap_note: str, provider_id: str) -> Optional[str]:
        """Submit a SOAP note to the EHR system via the FHIR API."""
        fhir_document = {
            "resourceType": "DocumentReference",
            "status": "current",
            "type": {
                "coding": [{
                    "system": "http://loinc.org",
                    "code": "11488-4",
                    "display": "Consultation note"
                }]
            },
            "subject": {"reference": f"Patient/{patient_id}"},
            "author": [{"reference": f"Practitioner/{provider_id}"}],
            "date": datetime.now().isoformat(),
            "content": [{
                "attachment": {
                    "contentType": "text/plain",
                    "data": self.encode_base64(soap_note)
                }
            }]
        }

        try:
            response = requests.post(
                f"{self.base_url}/DocumentReference",
                json=fhir_document,
                headers=self.headers
            )
            # FHIR servers return 201 Created on successful resource creation
            if response.status_code == 201:
                return response.json().get('id')
            return None
        except Exception as e:
            print(f"Error submitting to EHR: {e}")
            return None
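Usage looks like this; the base URL, IDs, and API key below are placeholders for your EHR vendor's FHIR endpoint:

ehr = EHRIntegration(
    ehr_base_url="https://fhir.example-ehr.com/R4",  # placeholder endpoint
    api_key=os.getenv('EHR_API_KEY')                 # placeholder credential
)

document_id = ehr.create_clinical_note(
    patient_id="12345",        # placeholder FHIR Patient ID
    soap_note=soap_note_text,  # output of SOAPNoteGenerator
    provider_id="67890"        # placeholder FHIR Practitioner ID
)
print(f"Created DocumentReference: {document_id}")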
Here's the complete application that ties everything together:
import time

class MedicalTranscriptionApp:
    def __init__(self):
        self.stream = MedicalTranscriptionStream()
        self.soap_generator = SOAPNoteGenerator()

    def run(self):
        """Main application loop."""
        print("🏥 Medical Transcription System Started")

        # Start the streaming transcription session
        self.stream.start_streaming()

        # Capture microphone audio in a separate thread
        audio_thread = threading.Thread(target=self.stream.capture_microphone_audio)
        audio_thread.start()

        try:
            while self.stream.is_running:
                if not self.stream.transcript_queue.empty():
                    transcript = self.stream.transcript_queue.get()
                    self.soap_generator.process_transcript(transcript)
                else:
                    time.sleep(0.05)  # avoid busy-waiting on an empty queue
        except KeyboardInterrupt:
            print("\n📝 Generating SOAP note...")
            soap_note = self.soap_generator.generate_soap_note()
            if soap_note:
                print(soap_note)
        finally:
            self.stream.stop_streaming()
            audio_thread.join()

if __name__ == "__main__":
    app = MedicalTranscriptionApp()
    app.run()
Run this script and it will capture audio from your microphone, transcribe it in real time, and generate a SOAP note when you stop recording. For speaker separation, supply multichannel audio as described earlier.
Security and compliance requirements
Healthcare transcription systems must meet strict HIPAA requirements to protect patient information, especially as healthcare data breaches affected 276+ million patients in 2024 with an average cost of $9.77 million per incident. HIPAA mandates technical, physical, and administrative safeguards for any system handling Protected Health Information (PHI).
Your AI transcription system needs specific security measures:
| HIPAA Requirement | Implementation | What You Need |
|---|---|---|
| Encryption | TLS 1.2+ for data in transit, AES-256 for storage | Secure API connections |
| Access Control | API key authentication, role-based permissions | Restricted system access |
| Audit Logs | Track all access and modifications | Request logging enabled |
| Business Associate Agreement | Signed BAA with all vendors | Legal agreements |
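Python's requests library negotiates TLS automatically, but you can refuse anything below TLS 1.2 explicitly. A minimal sketch using a custom transport adapter; routing your EHR and API traffic through a session like this is one way to satisfy the encryption-in-transit row above:

import ssl
import requests
from requests.adapters import HTTPAdapter

class TLS12Adapter(HTTPAdapter):
    """Reject connections that negotiate below TLS 1.2."""

    def init_poolmanager(self, *args, **kwargs):
        context = ssl.create_default_context()
        context.minimum_version = ssl.TLSVersion.TLSv1_2
        kwargs['ssl_context'] = context
        return super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount('https://', TLS12Adapter())
# Route all PHI-bearing HTTP traffic through this session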
Patient consent is another critical requirement. You must get explicit permission before recording medical conversations.
- Document consent: Store signed consent forms with patient records
- Allow withdrawal: Patients can revoke permission at any time
- Clear disclosure: Explain how recordings will be used and stored
- State compliance: Check local laws for additional consent requirements
The accuracy challenge needs special attention. Some AI transcription systems hallucinate—they invent text that was never spoken. This risk increases during pauses, background noise, or unclear speech.
Safeguards you should implement (a minimal confidence check is sketched after this list):
- Confidence scoring: Flag transcriptions with low confidence scores
- Human review: Require physician approval before finalizing documentation
- Quality monitoring: Regular audits of transcription accuracy
- Fallback procedures: Manual documentation when AI fails
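Confidence flagging can be as simple as a threshold check on the values the streaming handler already stores. A minimal sketch; the 0.7 cutoff is an assumption you should tune against your own accuracy audits:

LOW_CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune against your own audits

def flag_for_review(transcript_data: dict) -> bool:
    """Return True when a segment needs human verification before charting."""
    confidence = transcript_data.get('confidence')
    if confidence is None or confidence < LOW_CONFIDENCE_THRESHOLD:
        print(f"⚠️ Review needed: {transcript_data['text']!r} "
              f"(confidence: {confidence})")
        return True
    return False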
Final words
This real-time medical transcription system captures patient conversations, processes them with streaming speech-to-text, separates speakers using multichannel audio, and transforms raw transcripts into structured SOAP notes using AI models. The workflow eliminates manual note-taking during visits and reduces post-appointment documentation time through automated clinical note generation.
FAQ
How accurate is AI medical transcription for complex medical terminology?
Modern AI medical transcription achieves high accuracy through specialized vocabulary models trained on clinical conversations and custom word boosting for medical terms. Systems handle complex drug names, medical abbreviations, and technical procedures when optimized for healthcare vocabulary.
What happens if AI transcription creates hallucinated text in medical notes?
Implement confidence scoring to flag uncertain transcriptions and require mandatory physician review before finalizing any clinical documentation. Monitor for common hallucination patterns during pauses or background noise and establish verification protocols for critical medical information.
Which EHR systems support FHIR integration for AI transcription notes?
Major EHR systems like Epic, Cerner, Allscripts, and athenahealth support FHIR R4 standards for document integration. Use DocumentReference resources to submit clinical notes with proper patient linking and encounter context for seamless workflow integration.
How do I get proper HIPAA compliance for AI medical transcription?
Sign Business Associate Agreements with all AI vendors, implement end-to-end encryption for data transmission and storage, maintain comprehensive audit logs of all system access, and establish patient consent workflows before recording any medical conversations.