chaplin by amanvirparhar

Real-time silent speech recognition tool

Created 11 months ago

641 stars

Top 51.9% on SourcePulse

Project Summary

Chaplin is a real-time, fully local visual speech recognition (VSR) tool that transcribes silently mouthed words by reading lips. It is designed for users interested in silent communication or exploring advanced VSR technologies.

How It Works

Chaplin uses a pre-trained model from the Auto-AVSR project, trained on the Lip Reading Sentences 3 (LRS3) dataset. It uses the MediaPipe framework for lip detection and Ollama for language modeling, which together enable real-time transcription of lip movements.
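
To make the Ollama step more concrete, here is a minimal sketch of how raw lip-reading output could be handed to a locally running Ollama server. This is not Chaplin's actual code. It assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and the llama3.2 model named in the Quick Start. The prompt wording and the helper names build_payload and correct_transcript are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # model named in the Quick Start


def build_payload(raw_text: str, model: str = MODEL) -> dict:
    """Build a non-streaming generate request. The prompt wording is hypothetical."""
    prompt = (
        "Fix casing and punctuation in this lip-reading transcript. "
        "Reply with the corrected sentence only.\n\n" + raw_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def correct_transcript(raw_text: str, url: str = OLLAMA_URL, timeout: float = 30.0) -> str:
    """Send the raw transcript to the local Ollama server and return its reply."""
    data = json.dumps(build_payload(raw_text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Supplying a request body makes urllib issue a POST.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"].strip()
```

Calling correct_transcript("HELLO HOW ARE YOU") needs a running Ollama server with llama3.2 pulled. build_payload can be inspected without one. Because the request never leaves localhost, this kind of hand-off is consistent with the fully local design described above.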

Quick Start & Requirements

Install and run Ollama, then pull the llama3.2 model.
Install uv (Python package manager).
Run: sudo uv run --with-requirements requirements.txt --python 3.12 main.py config_filename=./configs/LRS3_V_WER19.1.ini detector=mediapipe
Requires Python 3.12.
Download the LRS3_V_WER19.1 and lm_en_subword model components and place them in the directory structure the README specifies.
Official demo: the project links to a demo of Chaplin.
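
The steps above can be checked before launching. The following is a rough pre-flight sketch, not part of Chaplin itself: it verifies that ollama and uv are on PATH, that the files named in the run command exist, and then assembles that command as an argument list. The helper names (missing_tools, missing_paths, build_command, preflight) are hypothetical. The model download locations are deliberately not checked, because the summary does not state them.

```python
import shutil
from pathlib import Path

# Files that appear literally in the Quick Start run command.
COMMAND_FILES = ("requirements.txt", "main.py", "./configs/LRS3_V_WER19.1.ini")


def missing_tools(tools=("ollama", "uv")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_paths(paths):
    """Return the given paths that do not exist on disk."""
    return [str(path) for path in paths if not Path(path).exists()]


def build_command(config="./configs/LRS3_V_WER19.1.ini", detector="mediapipe", use_sudo=True):
    """Assemble the Quick Start run command as an argument list."""
    cmd = [
        "uv", "run",
        "--with-requirements", "requirements.txt",
        "--python", "3.12",
        "main.py",
        f"config_filename={config}",
        f"detector={detector}",
    ]
    return ["sudo", *cmd] if use_sudo else cmd


def preflight(files=COMMAND_FILES):
    """Return a list of everything that appears to be missing (empty if all is well)."""
    return missing_tools() + missing_paths(files)
```

Run from the repository root, preflight() returns an empty list when nothing is missing. After that, " ".join(build_command()) reproduces the run command shown above, and the list form can be passed to subprocess.run directly.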

Highlighted Details

Real-time silent speech recognition.
Fully local execution, no cloud dependencies.
Utilizes MediaPipe for lip detection.
Integrates with Ollama for language modeling.

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires downloading specific models and placing them manually. The run command uses sudo, which suggests it needs elevated permissions or system-level access. No performance benchmarks or accuracy metrics are provided.
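
Since no accuracy figures are given, readers may want to measure accuracy themselves. The "WER19.1" in the model file name presumably refers to a word error rate, though that reading is an assumption based on the name alone. The sketch below computes word error rate in the standard way: word-level edit distance divided by the number of reference words. The function name word_error_rate is hypothetical and is not taken from Chaplin or Auto-AVSR.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, word_error_rate("the cat sat on the mat", "the cat sat on a mat") counts one substitution over six reference words, giving roughly 0.167. Averaging this over a set of mouthed sentences with known text would give a rough accuracy figure for a particular camera and lighting setup.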

Explore Similar Projects

praises by ElmTran

Transcribro by soupslurpr

edgedict by theblackcat102

LiveWhisper by Nikorasu

LLaSM by LinkSoul-AI

speech_course by yandexdataschool

av_hubert by facebookresearch

ASR-LLM-TTS by ABexit

rhubarb-lip-sync by DanielSWolf

RealtimeSTT by KoljaB

sherpa-onnx by k2-fsa

Wav2Lip by Rudrabha