This tutorial shows you how to build a real-time medical transcription system that captures patient conversations, separates speakers, and automatically generates SOAP notes. You'll create a streaming application that processes audio as it happens, giving healthcare providers instant transcription with proper clinical documentation format.
You'll use AssemblyAI's Python SDK for real-time speech-to-text with multichannel speaker separation, combined with OpenAI's GPT-4 API to transform raw transcripts into structured clinical notes. The implementation includes microphone audio capture, streaming transcription processing, and FHIR integration for Electronic Health Records systems.
What is AI medical transcription and when to use streaming
AI medical transcription is software that converts doctor-patient conversations into written clinical notes automatically. This means you record a conversation, the AI transcribes the speech, then structures it into proper medical documentation like SOAP notes without manual typing.
The technology works in four steps. First, it records audio from patient visits in real-time. Second, speech-to-text models convert spoken words into text using medical vocabulary. Third, AI models identify medical information and organize it into clinical note formats. Finally, doctors review and approve the generated notes before adding them to patient records.
Streaming transcription processes audio as it happens, returning text within a few hundred milliseconds. You'd use this during live patient visits when you need immediate feedback and real-time documentation.
| Feature | Streaming Transcription | Async Transcription |
|---|---|---|
| Use Case | Live patient visits, telehealth calls | Dictations, recorded consultations |
| Latency | Sub-second results | Minutes to process |
| Best For | Interactive sessions | Post-visit documentation |
| Example Scenarios | Emergency consultations, therapy sessions | Radiology reports, surgical notes |
Choose streaming when you need instant results during patient interactions. Choose async when you're processing recorded dictations after appointments.
How AI medical transcription works in practice
The workflow transforms spoken conversations into structured clinical notes through a specific process. Your system captures audio from patient-clinician conversations using secure recording technology. Speech recognition then converts these spoken words into text using specialized medical vocabulary trained on clinical conversations.
AI models process this raw text to identify medical entities like symptoms, medications, and diagnoses. The system then organizes this information into standard formats like SOAP notes with proper medical structure. This automation delivers several practical benefits:
- Time savings: Eliminates manual note-taking during patient visits
- Reduced documentation burden: Cuts evening charting sessions that cause burnout
- Improved accuracy: Creates consistent notes with proper medical terminology
- Better patient focus: Lets you maintain eye contact instead of typing
Major healthcare systems like Kaiser Permanente and UC San Francisco have already implemented AI transcription, but the technology has limitations you need to understand. Some AI transcription systems create "hallucinations"—they invent text that was never spoken. This happens during pauses, with background noise, or when processing unclear speech.
Key considerations for your implementation:
- Accuracy verification: Always require human review before finalizing notes
- HIPAA compliance: Use systems with proper encryption and Business Associate Agreements
- Patient consent: Get explicit permission before recording medical conversations
- Bias prevention: Choose AI models trained on diverse patient populations
Build real-time medical transcription with Python SDK
You'll build a streaming system that captures patient conversations, processes multichannel audio for speaker separation, and generates SOAP notes. This implementation uses AssemblyAI's Python SDK for real-time transcription with OpenAI's GPT-4 API to transform transcripts into clinical documentation.
Set up Python environment and streaming session
Create a new Python project and install the required dependencies. You need Python 3.8 or higher for this implementation.
# Create project directory
mkdir medical-transcription-app
cd medical-transcription-app
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install assemblyai pyaudio numpy openai python-dotenv
Create a .env file to store your API keys securely:
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
Set up the basic streaming configuration with medical-optimized settings:
import os
import threading
from queue import Queue

import assemblyai as aai
import numpy as np
import pyaudio
from assemblyai.streaming.v3 import StreamingClient, StreamingParameters, TurnEvent
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure AssemblyAI with your API key
aai.settings.api_key = os.getenv('ASSEMBLYAI_API_KEY')

# Audio configuration for medical conversations
SAMPLE_RATE = 16000  # 16 kHz for optimal speech recognition
CHANNELS = 1
CHUNK_SIZE = 3200    # 200 ms chunks for low latency
AUDIO_FORMAT = pyaudio.paInt16

class MedicalTranscriptionStream:
    def __init__(self):
        self.audio_queue = Queue()
        self.transcript_queue = Queue()
        self.is_running = False
        self.pyaudio = pyaudio.PyAudio()

    def start_streaming(self):
        """Initialize a streaming session with medical-specific settings."""
        # Configure streaming parameters
        params = StreamingParameters(
            encoding='pcm_s16le',
            sample_rate=SAMPLE_RATE,
            channels=CHANNELS,
            # Medical terminology optimization
            keyterms_prompt=["hypertension", "diabetes", "metformin", "systolic", "diastolic"]
        )

        # Create the streaming client
        self.transcriber = StreamingClient(
            on_turn=self.on_transcription_turn,
            on_error=self.on_error
        )

        # Connect to the AssemblyAI streaming service
        self.transcriber.connect(params)
        self.is_running = True

    def on_error(self, error: Exception):
        """Log streaming errors without crashing the session."""
        print(f"Streaming error: {error}")

    def stop_streaming(self):
        """Close the streaming session (called from capture and app shutdown)."""
        self.is_running = False
        if hasattr(self, 'transcriber'):
            self.transcriber.disconnect()
This configuration optimizes for medical conversations with 16kHz sampling. The keyterms_prompt feature improves accuracy for common medical terms you specify. Note that real-time speaker separation requires multichannel audio where each channel represents a different speaker.
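If you do want per-speaker channels, a two-channel capture is one option. Here is a minimal sketch, assuming a setup where the clinician and patient each speak into a dedicated microphone channel; the channel mapping is hypothetical and depends on your audio hardware:

import numpy as np
import pyaudio

pa = pyaudio.PyAudio()
stereo_stream = pa.open(
    format=pyaudio.paInt16,
    channels=2,  # one speaker per channel (assumed hardware setup)
    rate=16000,
    input=True,
    frames_per_buffer=3200,
)

# PyAudio returns interleaved int16 frames: [ch0, ch1, ch0, ch1, ...]
frames = stereo_stream.read(3200, exception_on_overflow=False)
samples = np.frombuffer(frames, dtype=np.int16)
clinician, patient = samples[0::2], samples[1::2]  # de-interleave per channel

Whether you send the interleaved stream or per-channel audio downstream depends on your transcription configuration; the de-interleaving above is also useful for per-speaker level metering.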
Capture and stream audio with speaker separation
Implement audio capture from your microphone with real-time streaming to AssemblyAI. The two methods below continue the MedicalTranscriptionStream class defined above.
    def capture_microphone_audio(self):
        """Capture microphone audio and send it to the transcription service."""
        stream = self.pyaudio.open(
            format=AUDIO_FORMAT,
            channels=CHANNELS,
            rate=SAMPLE_RATE,
            input=True,
            frames_per_buffer=CHUNK_SIZE
        )

        print("🎤 Listening... Press Ctrl+C to stop")

        try:
            while self.is_running:
                # Read one audio chunk from the microphone
                audio_data = stream.read(CHUNK_SIZE, exception_on_overflow=False)
                # Send raw PCM bytes to AssemblyAI for transcription
                self.transcriber.stream(audio_data)
        except KeyboardInterrupt:
            print("\n⏹️ Stopping transcription...")
        finally:
            stream.stop_stream()
            stream.close()
            self.stop_streaming()

    def on_transcription_turn(self, turn: TurnEvent):
        """Handle incoming transcription turns."""
        if turn.text:
            # With multichannel audio, the speaker is identified by channel;
            # single-channel audio arrives without speaker separation.
            transcript_data = {
                'text': turn.text,
                'timestamp': turn.start_time,
                'confidence': turn.confidence,
                'is_final': True  # TurnEvents represent finalized segments
            }
            self.transcript_queue.put(transcript_data)

            # Display the transcription in real time
            print(f"\nTranscript: {turn.text}")
For single-channel audio, all transcription appears without speaker labels. To achieve speaker separation in real-time, configure multichannel audio where each participant uses a separate audio channel. Post-processing with async transcription can provide speaker diarization from single-channel recordings.
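If you record the visit and process it afterwards, AssemblyAI's async API can label speakers from a single channel. A minimal sketch, assuming a local recording named visit_recording.wav:

import os
import assemblyai as aai

aai.settings.api_key = os.getenv('ASSEMBLYAI_API_KEY')

# speaker_labels enables diarization on single-channel recordings
config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("visit_recording.wav", config=config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

This trades latency for speaker labels: the diarized transcript arrives after processing completes rather than during the visit.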
Process transcripts and generate SOAP notes
Transform raw transcripts into structured SOAP notes using OpenAI's GPT-4 API. This combines AssemblyAI's transcription accuracy with GPT-4's medical knowledge.
import json
import openai
from datetime import datetime

class SOAPNoteGenerator:
    def __init__(self):
        self.client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
        self.conversation_buffer = []

    def process_transcript(self, transcript_data):
        """Add a finalized transcript segment to the conversation buffer."""
        if transcript_data['is_final']:
            self.conversation_buffer.append({
                'text': transcript_data['text'],
                'timestamp': transcript_data['timestamp']
            })

    def generate_soap_note(self):
        """Convert the buffered conversation into SOAP format using an LLM."""
        if not self.conversation_buffer:
            return None

        # Prepare the conversation for LLM processing
        conversation_text = self.format_conversation()

        prompt = f"""
Convert the following medical conversation into a SOAP note format.
Identify the healthcare provider and patient speakers.

Conversation:
{conversation_text}

Generate a SOAP note with these sections:
- SUBJECTIVE: Patient's complaints, symptoms, and history
- OBJECTIVE: Vital signs, physical exam findings, test results
- ASSESSMENT: Diagnosis or differential diagnoses
- PLAN: Treatment plan, medications, follow-up

Format as JSON with keys: subjective, objective, assessment, plan
"""

        try:
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a medical documentation assistant. Convert conversations into accurate SOAP notes."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,  # Lower temperature for consistent output
                max_tokens=1000
            )

            soap_content = response.choices[0].message.content
            soap_json = json.loads(soap_content)
            return self.format_soap_note(soap_json)
        except Exception as e:
            print(f"Error generating SOAP note: {e}")
            return None

    def format_conversation(self):
        """Format the buffered conversation for LLM processing."""
        return "\n".join(entry['text'] for entry in self.conversation_buffer)

    def format_soap_note(self, soap_json):
        """Format the SOAP note for display."""
        return f"""
SOAP NOTE - {datetime.now().strftime('%Y-%m-%d %H:%M')}
{'=' * 50}

SUBJECTIVE:
{soap_json.get('subjective', 'No subjective data recorded')}

OBJECTIVE:
{soap_json.get('objective', 'No objective data recorded')}

ASSESSMENT:
{soap_json.get('assessment', 'No assessment recorded')}

PLAN:
{soap_json.get('plan', 'No plan recorded')}

{'=' * 50}
Generated by AI - Requires physician review
"""
The OpenAI API processes the conversation flow and extracts medical information for each SOAP section: symptoms and history for Subjective, vital signs and exam findings for Objective, diagnoses for Assessment, and treatments and follow-up for Plan.
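To test the generator in isolation, you can feed it transcript segments by hand. A quick sketch with made-up conversation data in the same shape the streaming handler produces:

generator = SOAPNoteGenerator()

# Simulated transcript segments (hypothetical conversation content)
sample_segments = [
    {'text': "I've had a headache for three days.", 'timestamp': 0.0, 'is_final': True},
    {'text': "Blood pressure is 150 over 95.", 'timestamp': 12.4, 'is_final': True},
    {'text': "Let's start you on lisinopril and recheck in two weeks.", 'timestamp': 30.1, 'is_final': True},
]

for segment in sample_segments:
    generator.process_transcript(segment)

print(generator.generate_soap_note())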
Integrate with EHR systems
Connect your transcription system to Electronic Health Records through standard FHIR APIs. Most modern EHRs support FHIR for seamless data exchange.
import base64
import requests
from typing import Optional

class EHRIntegration:
    def __init__(self, ehr_base_url: str, api_key: str):
        self.base_url = ehr_base_url
        self.headers = {
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        }

    @staticmethod
    def encode_base64(text: str) -> str:
        """FHIR attachments carry their payload as base64-encoded data."""
        return base64.b64encode(text.encode('utf-8')).decode('ascii')

    def create_clinical_note(self, patient_id: str, soap_note: str, provider_id: str) -> Optional[str]:
        """Submit a SOAP note to the EHR system via the FHIR API."""
        fhir_document = {
            "resourceType": "DocumentReference",
            "status": "current",
            "type": {
                "coding": [{
                    "system": "http://loinc.org",
                    "code": "11488-4",
                    "display": "Consultation note"
                }]
            },
            "subject": {"reference": f"Patient/{patient_id}"},
            "author": [{"reference": f"Practitioner/{provider_id}"}],
            "date": datetime.now().isoformat(),
            "content": [{
                "attachment": {
                    "contentType": "text/plain",
                    "data": self.encode_base64(soap_note)
                }
            }]
        }

        try:
            response = requests.post(
                f"{self.base_url}/DocumentReference",
                json=fhir_document,
                headers=self.headers
            )
            # FHIR servers return 201 Created on successful resource creation
            if response.status_code == 201:
                return response.json().get('id')
            return None
        except Exception as e:
            print(f"Error submitting to EHR: {e}")
            return None
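Usage looks like this; the base URL, IDs, and API key below are placeholders for your EHR vendor's FHIR endpoint:

ehr = EHRIntegration(
    ehr_base_url="https://fhir.example-ehr.com/R4",  # placeholder endpoint
    api_key=os.getenv('EHR_API_KEY')                 # placeholder credential
)

document_id = ehr.create_clinical_note(
    patient_id="12345",        # placeholder FHIR Patient ID
    soap_note=soap_note_text,  # output of SOAPNoteGenerator
    provider_id="67890"        # placeholder FHIR Practitioner ID
)
print(f"Created DocumentReference: {document_id}")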
Here's the complete application that ties everything together:
import time

class MedicalTranscriptionApp:
    def __init__(self):
        self.stream = MedicalTranscriptionStream()
        self.soap_generator = SOAPNoteGenerator()

    def run(self):
        """Main application loop."""
        print("🏥 Medical Transcription System Started")

        # Start the streaming transcription session
        self.stream.start_streaming()

        # Capture microphone audio in a separate thread
        audio_thread = threading.Thread(target=self.stream.capture_microphone_audio)
        audio_thread.start()

        try:
            while self.stream.is_running:
                if not self.stream.transcript_queue.empty():
                    transcript = self.stream.transcript_queue.get()
                    self.soap_generator.process_transcript(transcript)
                else:
                    time.sleep(0.05)  # avoid busy-waiting on an empty queue
        except KeyboardInterrupt:
            print("\n📝 Generating SOAP note...")
            soap_note = self.soap_generator.generate_soap_note()
            if soap_note:
                print(soap_note)
        finally:
            self.stream.stop_streaming()
            audio_thread.join()

if __name__ == "__main__":
    app = MedicalTranscriptionApp()
    app.run()
Run this script and it will capture audio from your microphone, transcribe it in real time, and generate a SOAP note when you stop recording. For speaker separation, supply multichannel audio as described earlier.
Security and compliance requirements
Healthcare transcription systems must meet strict HIPAA requirements to protect patient information, especially as healthcare data breaches affected 276+ million patients in 2024 with an average cost of $9.77 million per incident. HIPAA mandates technical, physical, and administrative safeguards for any system handling Protected Health Information (PHI).
Your AI transcription system needs specific security measures:
| HIPAA Requirement | Implementation | What You Need |
|---|---|---|
| Encryption | TLS 1.2+ for data in transit, AES-256 for storage | Secure API connections |
| Access Control | API key authentication, role-based permissions | Restricted system access |
| Audit Logs | Track all access and modifications | Request logging enabled |
| Business Associate Agreement | Signed BAA with all vendors | Legal agreements |
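Python's requests library negotiates TLS automatically, but you can refuse anything below TLS 1.2 explicitly. A minimal sketch using a custom transport adapter; routing your EHR and API traffic through a session like this is one way to satisfy the encryption-in-transit row above:

import ssl
import requests
from requests.adapters import HTTPAdapter

class TLS12Adapter(HTTPAdapter):
    """Reject connections that negotiate below TLS 1.2."""

    def init_poolmanager(self, *args, **kwargs):
        context = ssl.create_default_context()
        context.minimum_version = ssl.TLSVersion.TLSv1_2
        kwargs['ssl_context'] = context
        return super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount('https://', TLS12Adapter())
# Route all PHI-bearing HTTP traffic through this session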
Patient consent is another critical requirement. You must get explicit permission before recording medical conversations.
- Document consent: Store signed consent forms with patient records
- Allow withdrawal: Patients can revoke permission at any time
- Clear disclosure: Explain how recordings will be used and stored
- State compliance: Check local laws for additional consent requirements
The accuracy challenge needs special attention. Some AI transcription systems hallucinate—they invent text that was never spoken. This risk increases during pauses, background noise, or unclear speech.
Safeguards you should implement (a minimal confidence check is sketched after this list):
- Confidence scoring: Flag transcriptions with low confidence scores
- Human review: Require physician approval before finalizing documentation
- Quality monitoring: Regular audits of transcription accuracy
- Fallback procedures: Manual documentation when AI fails
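Confidence flagging can be as simple as a threshold check on the values the streaming handler already stores. A minimal sketch; the 0.7 cutoff is an assumption you should tune against your own accuracy audits:

LOW_CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune against your own audits

def flag_for_review(transcript_data: dict) -> bool:
    """Return True when a segment needs human verification before charting."""
    confidence = transcript_data.get('confidence')
    if confidence is None or confidence < LOW_CONFIDENCE_THRESHOLD:
        print(f"⚠️ Review needed: {transcript_data['text']!r} "
              f"(confidence: {confidence})")
        return True
    return False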
Final words
This real-time medical transcription system captures patient conversations, processes them with streaming speech-to-text, separates speakers using multichannel audio, and transforms raw transcripts into structured SOAP notes using AI models. The workflow eliminates manual note-taking during visits and reduces post-appointment documentation time through automated clinical note generation.
FAQ
How accurate is AI medical transcription for complex medical terminology?
Modern AI medical transcription achieves high accuracy through specialized vocabulary models trained on clinical conversations and custom word boosting for medical terms. Systems handle complex drug names, medical abbreviations, and technical procedures when optimized for healthcare vocabulary.
What happens if AI transcription creates hallucinated text in medical notes?
Implement confidence scoring to flag uncertain transcriptions and require mandatory physician review before finalizing any clinical documentation. Monitor for common hallucination patterns during pauses or background noise and establish verification protocols for critical medical information.
Which EHR systems support FHIR integration for AI transcription notes?
Major EHR systems like Epic, Cerner, Allscripts, and athenahealth support FHIR R4 standards for document integration. Use DocumentReference resources to submit clinical notes with proper patient linking and encounter context for seamless workflow integration.
How do I get proper HIPAA compliance for AI medical transcription?
Sign Business Associate Agreements with all AI vendors, implement end-to-end encryption for data transmission and storage, maintain comprehensive audit logs of all system access, and establish patient consent workflows before recording any medical conversations.