Azure OpenAI SDK for real-time audio processing with GPT-4o
This repository provides resources for leveraging Azure OpenAI's GPT-4o real-time audio capabilities via the new /realtime WebSocket API. It targets developers building low-latency, speech-to-speech conversational applications such as support agents, assistants, and translators, offering a more responsive interaction model than traditional request-response APIs.
How It Works
The /realtime API uses WebSockets for asynchronous, bi-directional streaming between a client application and the Azure OpenAI service. It supports text, function calling, and audio input/output. A key feature is flexible turn detection: server-side Voice Activity Detection (VAD) can trigger responses automatically, or manual response.create calls can provide explicit control, which suits push-to-talk scenarios. The architecture involves an intermediate service managing user connections and communication with the model endpoint. A minimal session sketch follows.
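As a rough illustration of the event flow, here is a minimal Python sketch using the third-party websockets package. The endpoint path, header name, environment variable names, and event payloads (session.update, input_audio_buffer.append, response.audio.delta, response.done) reflect the public preview documentation but are assumptions here and may change while the API is in preview.

```python
# Minimal sketch of a /realtime turn with server VAD (public preview; details may change).
# Assumes: the `websockets` package, API-key auth, PCM16 input audio, and the
# preview event names (session.update, input_audio_buffer.append, response.audio.delta).
import asyncio
import base64
import json
import os

import websockets  # pip install websockets

RESOURCE = os.environ["AZURE_OPENAI_RESOURCE"]  # e.g. "my-eastus2-resource" (placeholder)
API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
URL = (
    f"wss://{RESOURCE}.openai.azure.com/openai/realtime"
    "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview"
)


async def run_turn(pcm16_audio: bytes) -> None:
    # `additional_headers` is the websockets>=14 name; older releases use `extra_headers`.
    async with websockets.connect(URL, additional_headers={"api-key": API_KEY}) as ws:
        # Enable server-side VAD so the service detects end of speech and
        # triggers the response automatically.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "turn_detection": {"type": "server_vad"},
            },
        }))

        # Stream base64-encoded PCM16 audio into the input buffer.
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16_audio).decode("ascii"),
        }))

        # Consume server events until the model finishes its reply.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.audio.delta":
                audio_chunk = base64.b64decode(event["delta"])  # play or buffer this
            elif event["type"] == "response.done":
                break


# asyncio.run(run_turn(open("question.pcm", "rb").read()))
```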
Quick Start & Requirements
Requires an Azure OpenAI resource in the eastus2 or swedencentral region, a deployment of the gpt-4o-realtime-preview model (version 2024-10-01), and the 2024-10-01-preview API version.
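For orientation, the region, deployment, and API version above plug into the WebSocket URL roughly as in the sketch below; the resource name is a placeholder and the exact path is an assumption taken from the preview documentation.

```python
# Sketch of how the quick-start requirements map onto the connection URL.
# RESOURCE is a placeholder; it must name an Azure OpenAI resource in eastus2
# or swedencentral that has a gpt-4o-realtime-preview (2024-10-01) deployment.
RESOURCE = "my-eastus2-resource"
DEPLOYMENT = "gpt-4o-realtime-preview"
API_VERSION = "2024-10-01-preview"

REALTIME_URL = (
    f"wss://{RESOURCE}.openai.azure.com/openai/realtime"
    f"?api-version={API_VERSION}&deployment={DEPLOYMENT}"
)
```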
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The /realtime API is in public preview, so API contracts and behavior may change. It is not designed for direct use from untrusted end-user devices; an intermediate service should broker the connection. With server VAD, lengthy audio inputs can trigger rapid, potentially unreliable responses, so manual turn control is recommended for such cases (see the sketch below).
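A push-to-talk turn might look like the following sketch, which disables server VAD and commits the audio buffer explicitly; the event names and the null turn_detection value are assumptions carried over from the preview event schema.

```python
import json


async def push_to_talk_turn(ws, pcm16_b64_chunks):
    """Sketch of manual turn control over an already-open /realtime WebSocket `ws`."""
    # Disable automatic turn detection so the service never responds mid-stream.
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {"turn_detection": None},
    }))

    # Append audio for as long as the user holds the talk button.
    for chunk in pcm16_b64_chunks:
        await ws.send(json.dumps({"type": "input_audio_buffer.append", "audio": chunk}))

    # On release, commit the buffered audio and explicitly request a response.
    await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
    await ws.send(json.dumps({"type": "response.create"}))
```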