Open-source platform for building voice-driven multimodal agents
Top 71.0% on sourcepulse
Bolna is an end-to-end, open-source framework for building voice-first, multimodal conversational AI agents. It targets developers and researchers looking to quickly create production-ready voice applications, enabling features like initiating phone calls, real-time transcription, LLM-driven conversations, and text-to-speech synthesis.
How It Works
Bolna orchestrates a pipeline of specialized components for voice interactions. It leverages providers for telephony (e.g., Twilio), Automatic Speech Recognition (ASR) (e.g., Deepgram), Large Language Models (LLMs) (e.g., OpenAI, Mistral via LiteLLM), and Text-to-Speech (TTS) (e.g., ElevenLabs, AWS Polly). Agents are configured via JSON, defining task flows, toolchains (parallel or sequential processing), and specific provider configurations, allowing for flexible and modular voice agent development.
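The flow described above can be illustrated with a minimal sketch of a JSON-configured, sequential ASR to LLM to TTS pipeline. This is not Bolna's actual API or configuration schema: the JSON keys, the stub provider names, and the helper functions (`build_pipeline`, `run_turn`) are all hypothetical and exist only to show how a config can select providers and order the stages.

```python
import json
from typing import Callable, Dict, List

# Hypothetical agent config. The keys are illustrative only and are
# NOT Bolna's real schema.
AGENT_CONFIG_JSON = """
{
  "agent_name": "demo-agent",
  "toolchain": {
    "execution": "sequential",
    "pipeline": ["transcriber", "llm", "synthesizer"]
  },
  "providers": {
    "transcriber": "stub-asr",
    "llm": "stub-llm",
    "synthesizer": "stub-tts"
  }
}
"""

Stage = Callable[[str], str]


def stub_asr(audio: str) -> str:
    # Stand-in for a speech recognizer: strips a fake "audio:" tag.
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio


def stub_llm(text: str) -> str:
    # Stand-in for an LLM call: echoes the transcript.
    return f"You said: {text}"


def stub_tts(text: str) -> str:
    # Stand-in for speech synthesis: tags the reply as "speech".
    return f"speech:{text}"


# Hypothetical registry mapping provider names to stage functions.
REGISTRY: Dict[str, Stage] = {
    "stub-asr": stub_asr,
    "stub-llm": stub_llm,
    "stub-tts": stub_tts,
}


def build_pipeline(config: dict) -> List[Stage]:
    """Resolve each pipeline step to the provider named in the config."""
    toolchain = config["toolchain"]
    if toolchain["execution"] != "sequential":
        raise NotImplementedError("this sketch only covers sequential toolchains")
    return [REGISTRY[config["providers"][step]] for step in toolchain["pipeline"]]


def run_turn(stages: List[Stage], audio: str) -> str:
    """Pass one conversational turn through every stage in order."""
    data = audio
    for stage in stages:
        data = stage(data)
    return data


if __name__ == "__main__":
    stages = build_pipeline(json.loads(AGENT_CONFIG_JSON))
    print(run_turn(stages, "audio:hello"))  # prints "speech:You said: hello"
```

Because the stages are looked up by name, swapping a provider only requires changing the config, which is the kind of modularity the description attributes to the framework. A parallel toolchain would need a different executor and is left out of this sketch.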
Quick Start & Requirements
Build with docker-compose build --no-cache <twilio-app | plivo-app> and run with docker-compose up <twilio-app | plivo-app>.
Requires a .env file with provider API keys (Twilio/Plivo, Deepgram, LLM provider, TTS provider), and ngrok for tunneling.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
ngrok for external access.