transcribe by vivekuppal

Real-time transcription and AI conversation platform

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

Transcribe is a real-time transcription and conversation platform designed for language learning and interactive communication. It provides live transcripts from both microphone and speaker audio, leveraging OpenAI's GPT API (or compatible providers) to generate suggested conversation responses. The platform aims to simulate natural, live conversations, offering multilingual support and streaming LLM responses for a more dynamic user experience.

How It Works

The core of Transcribe is real-time speech-to-text (STT) processing, with free offline transcription as well as online options. It integrates with several Large Language Model (LLM) providers, including OpenAI's GPT series, Together, Perplexity, and Azure-hosted OpenAI. A key advantage is streaming of LLM responses: text appears as it is generated rather than after the full completion finishes, which keeps language practice and conversation simulation interactive.
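The streaming pattern described above can be sketched in a few lines. This is an illustrative example, not the project's actual code: it assumes the official `openai` Python package (>= 1.0), and the `collect_stream` and `stream_chat` names are hypothetical.

```python
def collect_stream(deltas):
    """Join streamed text deltas into the full response while
    printing each piece as soon as it arrives."""
    parts = []
    for delta in deltas:
        if delta:  # skip empty keep-alive chunks
            parts.append(delta)
            print(delta, end="", flush=True)
    return "".join(parts)


def stream_chat(client, prompt, model="gpt-4o"):
    """Request a streamed chat completion and hand the text deltas
    to collect_stream. Requires a paid OpenAI API key; `client` is
    an openai.OpenAI instance."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    deltas = (chunk.choices[0].delta.content for chunk in stream)
    return collect_stream(deltas)
```

Because the deltas are consumed incrementally, the user sees partial output immediately instead of waiting for the whole generation.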

Quick Start & Requirements

  • Primary install / run command:
    • Download and unzip the latest binary from Google Drive.
    • Alternatively, clone the repository (git clone https://github.com/vivekuppal/transcribe) and run setup.bat, then python main.py from app/transcribe/.
  • Non-default prerequisites and dependencies:
    • Windows OS (primary focus).
    • FFmpeg (install via Chocolatey: choco install ffmpeg).
    • Python >= 3.11.0 (for code installation).
    • OpenAI API key (paid account required for response generation).
    • CUDA libraries (for GPU acceleration, download from NVIDIA).
  • Estimated setup time or resource footprint: GPU acceleration yields roughly 2-3x faster response times. Setup is limited to downloading a binary, or cloning the repository and running the setup script.
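Taken together, a source install on Windows can be sketched as the following command sequence (paths follow the steps above; this is a sketch, not an official script):

```shell
# Prerequisite: FFmpeg via Chocolatey
choco install ffmpeg

# Clone and set up (requires Python >= 3.11.0)
git clone https://github.com/vivekuppal/transcribe
cd transcribe
setup.bat

# Run the app
cd app\transcribe
python main.py
```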

Highlighted Details

  • Free core functionality for real-time transcription.
  • Supports multiple LLM providers (OpenAI, Together, Perplexity, Azure) and models (GPT-4o, GPT-4, GPT-3.5).
  • Streaming LLM responses for interactive use.
  • Offline Speech-to-Text capability.
  • Security scanning in the development pipeline via GitGuardian, Bandit, and Snyk.
  • GPU acceleration for improved performance.

Maintenance & Community

The project acknowledges contributions from Fahd Mirza and Lappu AI. Users can join the community by emailing for an invite or sharing their email in an issue. On-demand feature development is available via GitHub issues or direct LinkedIn contact. The project was forked from ecoute but has diverged significantly.

Licensing & Compatibility

This project is licensed under the MIT License, permitting broad use and modification. It is primarily tested on Windows, with no explicit compatibility notes for other operating systems or closed-source linking beyond standard MIT terms.

Limitations & Caveats

Response generation and advanced features require a paid API key from an OpenAI-compatible provider. Azure-hosted OpenAI integration may require custom code modifications. Development and testing focus primarily on Windows, and published binaries may lag behind the latest codebase. Effective LLM response generation needs at least 1-2 minutes of prior conversation to build sufficient context.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 star in the last 30 days
