chaplin by amanvirparhar

Real-time silent speech recognition tool

Created 7 months ago
539 stars

Top 59.1% on SourcePulse

Project Summary

Chaplin is a real-time, fully local visual speech recognition (VSR) tool that transcribes silently mouthed words by reading the user's lips. It is aimed at users interested in silent communication or in experimenting with VSR technology.

How It Works

Chaplin uses a VSR model pre-trained by the Auto-AVSR project on the Lip Reading Sentences 3 (LRS3) dataset. It uses the MediaPipe framework to detect the lips in each video frame and a language model served through Ollama, which together enable real-time transcription of lip movements.
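Lip-reading models operate on a fixed-size crop around the mouth rather than on the full camera frame. MediaPipe's face detector reports normalized (0 to 1) keypoints, one of which is the mouth center. The sketch below shows one way to turn such a keypoint into a pixel crop box. The function name `mouth_crop_box` and the 96-pixel default size are assumptions for illustration, not Chaplin's actual preprocessing code.

```python
def mouth_crop_box(mouth_x, mouth_y, frame_w, frame_h, size=96):
    """Return a (left, top, right, bottom) pixel box of side `size`, centered
    on a normalized mouth keypoint and shifted so it stays inside the frame.

    Hypothetical helper; the default crop size is an assumption.
    """
    if size > frame_w or size > frame_h:
        raise ValueError("crop size exceeds frame dimensions")
    # Convert normalized keypoint coordinates to pixel coordinates.
    cx = round(mouth_x * frame_w)
    cy = round(mouth_y * frame_h)
    half = size // 2
    # Clamp the top-left corner so the whole box lies within the frame.
    left = min(max(cx - half, 0), frame_w - size)
    top = min(max(cy - half, 0), frame_h - size)
    return left, top, left + size, top + size


if __name__ == "__main__":
    # A mouth keypoint in the lower middle of a 640x480 frame.
    print(mouth_crop_box(0.5, 0.75, 640, 480))
```

For a NumPy image, the returned box can slice the frame directly as `frame[top:bottom, left:right]`. Shifting the box instead of shrinking it keeps every crop the same size, which a model expecting fixed input dimensions requires.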

Quick Start & Requirements

  • Install and run Ollama, then pull the llama3.2 model.
  • Install uv (Python package manager).
  • Download the LRS3_V_WER19.1 and lm_en_subword model components and place them in the directory structure specified in the project README.
  • Run: sudo uv run --with-requirements requirements.txt --python 3.12 main.py config_filename=./configs/LRS3_V_WER19.1.ini detector=mediapipe
  • Requires Python 3.12, which the command above requests through uv's --python flag.
  • Official demo: Watch a demo of Chaplin here
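Before running the command, a small preflight script can confirm the prerequisites listed above. This is a hypothetical helper, not part of Chaplin. It assumes Ollama is listening on its default address (http://localhost:11434) and queries Ollama's /api/tags endpoint, which lists locally pulled models. Python itself is not checked, because uv selects the interpreter from the --python flag.

```python
import json
import shutil
import sys
import urllib.request

# Ollama's default local address (assumption: not changed via OLLAMA_HOST).
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags, name="llama3.2"):
    """Return True if `name` appears in an /api/tags response.

    Ollama reports names with a tag suffix (e.g. "llama3.2:latest"),
    so a bare name matches any tag of that model.
    """
    for model in tags.get("models", []):
        full = model.get("name", "")
        if full == name or full.split(":", 1)[0] == name:
            return True
    return False


def preflight(tags=None):
    """Collect pass/fail results for each prerequisite."""
    checks = {
        "uv_on_path": shutil.which("uv") is not None,
        "ollama_on_path": shutil.which("ollama") is not None,
    }
    if tags is None:
        try:
            with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=3) as resp:
                tags = json.load(resp)
        except (OSError, ValueError):
            # Server not running, unreachable, or returned invalid JSON.
            tags = {}
    checks["llama3.2_pulled"] = model_is_pulled(tags)
    return checks


if __name__ == "__main__":
    results = preflight()
    for check, ok in results.items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8}{check}")
    sys.exit(0 if all(results.values()) else 1)
```

A failing `llama3.2_pulled` check usually means either that the Ollama server is not running or that `ollama pull llama3.2` has not been run yet. The model-file placement step is not checked because the required directory layout is defined in the project README.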

Highlighted Details

  • Real-time silent speech recognition.
  • Fully local execution, no cloud dependencies.
  • Utilizes MediaPipe for lip detection.
  • Integrates with Ollama for language modeling.
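The README summary does not describe Chaplin's exact prompt or request code. As a minimal sketch, assuming the model is served by a local Ollama instance at its default address, raw lip-reading output could be sent to Ollama's /api/generate endpoint for cleanup. The prompt wording and the `build_request` and `correct_transcript` names are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: default host and port).
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt; Chaplin's actual prompt may differ.
PROMPT_TEMPLATE = (
    "The following text was produced by a lip-reading model and may contain "
    "recognition errors. Return only the corrected sentence.\n\nText: {raw}"
)


def build_request(raw_text, model="llama3.2"):
    """Build the JSON body for a non-streaming /api/generate call."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(raw=raw_text),
        "stream": False,  # request one JSON object instead of streamed chunks
    }


def correct_transcript(raw_text, model="llama3.2", timeout=30):
    """Send the raw transcript to Ollama and return the model's reply text."""
    body = json.dumps(build_request(raw_text, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With stream=False, the generated text is in the "response" field.
        return json.load(resp)["response"].strip()


if __name__ == "__main__":
    # Requires a running Ollama server with llama3.2 pulled.
    print(correct_transcript("i wont to go to the stor"))
```

Setting `stream` to false keeps the client simple at the cost of waiting for the full reply; a streaming client would instead read the newline-delimited JSON chunks Ollama returns by default.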

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

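The project requires manually downloading specific model files and placing them in the expected directories. The run command uses sudo, which suggests the tool needs elevated privileges, possibly for system-level integration; the README does not explain why. No benchmarks or accuracy metrics are provided for Chaplin itself, although the model name LRS3_V_WER19.1 suggests a 19.1% word error rate on LRS3.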

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Travis Fischer (Founder of Agentic).

RealtimeSTT by KoljaB

Speech-to-text library for realtime applications
Top 0.5% on SourcePulse
9k stars
Created 2 years ago, updated 2 months ago

Starred by Shane Thomas (Cofounder of Mastra), Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), and 2 more.

Wav2Lip by Rudrabha

Lip-syncing tool for generating videos from speech
Top 0.2% on SourcePulse
12k stars
Created 5 years ago, updated 2 months ago