Fun demo app for generating David Attenborough-style narrations
This project provides a Python application that captures video frames from a webcam, has an AI model describe what it sees, and then uses text-to-speech to narrate the user's life in the style of David Attenborough. It is aimed at users interested in building AI-powered applications and experimenting with novel AI interactions.
How It Works
The application chains two AI services. Frames saved by capture.py are sent to a vision-capable OpenAI language model, which interprets the scene and writes the narration text directly, so no separate object-detection model (such as YOLO) is needed. The generated text is then converted to speech with ElevenLabs' text-to-speech API, using a configured voice ID to mimic the documentary narration style.
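As a rough illustration of that pipeline (a sketch, not the repository's actual code), the example below base64-encodes a saved frame, asks an OpenAI vision-capable model for a narration, and sends the result to ElevenLabs' text-to-speech REST endpoint. The model name, prompt, frame path, and output filename are assumptions made for this example.

import base64
import os
import requests
from openai import OpenAI  # official OpenAI Python SDK (>=1.0)

# Assumption: capture.py has already saved a webcam frame here.
FRAME_PATH = "frames/frame.jpg"

# 1. Encode the latest webcam frame for the vision model.
with open(FRAME_PATH, "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

# 2. Ask a vision-capable OpenAI model to write the narration.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
chat = client.chat.completions.create(
    model="gpt-4o",  # assumption; the repo may pin a different vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Narrate this image as if you were a nature documentary presenter."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
        ],
    }],
)
narration = chat.choices[0].message.content

# 3. Convert the narration to speech via ElevenLabs' TTS endpoint.
voice_id = os.environ["ELEVENLABS_VOICE_ID"]
tts = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": narration},
    timeout=60,
)
tts.raise_for_status()
with open("narration.mp3", "wb") as f:
    f.write(tts.content)

Running the sketch end to end requires the OPENAI_API_KEY, ELEVENLABS_API_KEY, and ELEVENLABS_VOICE_ID environment variables described in the quick start below.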
Quick Start & Requirements
pip install -r requirements.txt
export OPENAI_API_KEY=<token>
export ELEVENLABS_API_KEY=<token>
export ELEVENLABS_VOICE_ID=<voice-id>
python capture.py
python narrator.py
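capture.py and narrator.py are typically run in separate terminals: one keeps sampling the webcam while the other narrates the most recent frame. A minimal sketch of what such a capture loop could look like, assuming OpenCV and a frames/frame.jpg output path (both assumptions, not confirmed by this summary):

import os
import time
import cv2  # opencv-python

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture(0)  # default webcam
try:
    while True:
        ok, frame = cap.read()
        if ok:
            # Overwrite the latest frame so the narrator always reads a fresh image.
            cv2.imwrite("frames/frame.jpg", frame)
        time.sleep(2)  # capture interval (assumption)
finally:
    cap.release()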
Maintenance & Community
The project is maintained by cbh123. Further community or maintenance details are not provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.
Limitations & Caveats
The project requires API keys for two third-party services (OpenAI and ElevenLabs), both of which may incur usage costs. Narration quality depends on the chosen ElevenLabs voice and on how accurately the vision model interprets the webcam frames.