obs-localvocal by royshil

OBS plugin for local speech recognition and captioning

Created 2 years ago

1,375 stars

Top 28.9% on SourcePulse

Project Summary

LocalVocal is an OBS Studio plugin that provides real-time, local speech-to-text transcription and translation using AI models. It targets streamers, content creators, and accessibility users who need on-device captioning and translation without relying on cloud services, ensuring privacy and eliminating ongoing costs.

How It Works

The plugin uses Whisper.cpp to convert audio to text. Processing runs on the CPU by default, with optional GPU acceleration via CUDA, ROCm, Vulkan, or Metal. Translation is handled by CTranslate2. Because both stages run locally, the plugin can support a wide range of languages and lets users choose their model, including custom GGML models.
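The two-stage flow described above (speech-to-text, then optional translation) can be sketched as follows. This is a minimal illustration, not the plugin's code: `caption_pipeline`, `Caption`, and the `transcribe` and `translate` callables are hypothetical stand-ins for the Whisper.cpp and CTranslate2 stages, and no real library API is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Caption:
    """One caption line, with an optional translated version."""
    text: str
    translation: Optional[str] = None


def caption_pipeline(
    audio_chunk: bytes,
    transcribe: Callable[[bytes], str],
    translate: Optional[Callable[[str], str]] = None,
) -> Caption:
    """Run a chunk of audio through transcription, then optional translation.

    `transcribe` stands in for the speech-to-text stage and `translate`
    for the translation stage; both are assumptions for illustration.
    """
    text = transcribe(audio_chunk).strip()
    # Skip translation for silence/empty output or when no translator is set.
    if not text or translate is None:
        return Caption(text=text)
    return Caption(text=text, translation=translate(text))


if __name__ == "__main__":
    # Stub stages so the sketch runs without any model files.
    fake_transcribe = lambda audio: " hello world "
    fake_translate = lambda text: text.upper()
    print(caption_pipeline(b"\x00\x01", fake_transcribe, fake_translate))
```

Passing the stages in as callables keeps the sketch independent of any particular model or backend, which mirrors the summary's point that models are swappable.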

Quick Start & Requirements

Installation: Download a pre-built release for Windows, macOS, or Linux from the releases page. To build from source, use the platform-specific build scripts (.github/scripts/build-macos, .github/scripts/build-linux, Build-Windows.ps1).
Prerequisites:
- OBS Studio installed.
- For GPU acceleration: NVIDIA GPU with CUDA drivers, AMD GPU with ROCm, or Vulkan-compatible GPU.
- macOS requires specific builds for Intel or Apple Silicon.
- Linux requires libssl-dev.
Resources: The plugin ships with the tiny.en Whisper model; larger models can be downloaded. Performance depends heavily on CPU/GPU capabilities.
Links: Releases, Usage Tutorials.
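Since macOS needs a different build for Intel and Apple Silicon, it can help to check the host before downloading. The sketch below uses Python's standard `platform` module; the function name `pick_build` and the returned labels are hypothetical and do not correspond to actual release asset names.

```python
import platform
from typing import Optional


def pick_build(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return a label for the build family that matches this host.

    The labels are illustrative only; check the releases page for the
    real file names.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()

    if system == "windows":
        return "windows"
    if system == "darwin":
        # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
        return "macos-apple-silicon" if machine in ("arm64", "aarch64") else "macos-intel"
    if system == "linux":
        return "linux"
    raise ValueError(f"no pre-built release listed for {system!r}")


if __name__ == "__main__":
    print(pick_build())
```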

Highlighted Details

Supports real-time transcription in 100 languages and translation to major languages.
Outputs captions to screen, text files, or directly to RTMP streams.
Offers various acceleration options: CUDA, hipBLAS (AMD ROCm), Apple Arm64, Vulkan, AVX, SSE.
Allows users to bring their own GGML Whisper models.
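To illustrate what caption output to a text file might look like, here is a minimal sketch that formats timed segments in the SubRip (SRT) layout. The summary does not say which file format the plugin writes, so the choice of SRT is an assumption, and `format_timestamp` and `to_srt` are hypothetical helpers rather than part of the plugin.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [(0.0, 1.25, "Hello"), (1.25, 3.0, "and welcome to the stream")]
    print(to_srt(demo))
```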

Maintenance & Community

Actively maintained with regular releases.
Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under the MIT License.
Permits commercial use and integration with closed-source applications.

Limitations & Caveats

AMD ROCm and Vulkan acceleration are noted as experimental.
Building on Linux for non-Ubuntu distributions may require manual dependency management and CMake configuration.
The README mentions potential folder name mismatches when packaging macOS builds.

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 9

Star History

68 stars in the last 30 days

Explore Similar Projects

Synthalingua by cyberofficial

Real-time translation tool using AI for audio transcription and translation

Created 2 years ago

Updated 6 days ago

AI-Video-Translation by pranauv1

AI-powered video translation and lip-syncing

Created 2 years ago

Updated 11 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

babelfish.ai by supabase-community

Realtime transcription/translation app using browser-based models

Created 1 year ago

Updated 1 year ago

Auralis by astramind-ai

TTS engine for fast voice cloning

Created 1 year ago

Updated 1 year ago

Speech-Translate by Dadangdut33

Speech-to-text app using Whisper for transcription and translation

Created 3 years ago

Updated 2 years ago

generate-subtitles by mayeaux

Web app for audio/video transcription and translation

Created 3 years ago

Updated 2 years ago

LanguageLeapAI by SociallyIneptWeeb

Real-time AI translator for cross-lingual online communication

Created 3 years ago

Updated 2 years ago

stt by jianchang512

Offline speech-to-text tool for local audio/video transcription

Created 2 years ago

Updated 1 month ago

Starred by Abubakar Abid (Cofounder of Gradio).

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 2 months ago

VideoCaptioner by WEIFENG2333

Subtitle tool for video transcription, translation, and editing using LLMs

Created 1 year ago

Updated 2 weeks ago

Starred by Elvis Saravia (Founder of DAIR.AI), Georgi Gerganov (Author of llama.cpp, whisper.cpp), and 2 more.

buzz by chidiwilliams

Desktop app for offline audio transcription and translation

Created 3 years ago

Updated 3 days ago

pyvideotrans by jianchang512

Video translation CLI tool

Created 2 years ago

Updated 22 hours ago
