smol-podcaster by FanaHOVA

Podcast production agent

Created 2 years ago

406 stars

Top 71.6% on SourcePulse

7 Experts Love This Project

Nathan Lambert

Research Scientist at AI2

Paul Copplestone

Cofounder of Supabase

Project Summary

smol-podcaster automates podcast production tasks, including transcription, chapter generation, and content ideation for social media. It targets podcast creators and researchers seeking to streamline post-production workflows. The primary benefit is the automated generation of structured content from raw audio files.

How It Works

The tool uses large language models (LLMs) for transcription, speaker diarization, chapter creation, and content summarization. It processes audio files, extracts key information, and formats it into show notes, chapter lists, and promotional content. It can be run from a CLI or through a web UI; the web UI hands work to Celery background tasks, which allows parallel execution and remote operation.
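The formatting step described above can be illustrated with a minimal sketch. The Segment shape, the function names, and the output layout are assumptions made for illustration and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical shape of one diarized transcript segment."""
    speaker: str   # speaker label, e.g. "Host" or the guest's name
    start: float   # start time in seconds
    text: str      # transcribed text for this segment


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(segments: List[Segment]) -> str:
    """Join segments into a readable transcript, one paragraph per segment."""
    lines = [
        f"{seg.speaker} [{format_timestamp(seg.start)}]: {seg.text.strip()}"
        for seg in segments
    ]
    return "\n\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment("Host", 0.0, "Welcome to the show."),
        Segment("Guest", 7.5, "Thanks for having me."),
    ]
    print(format_transcript(demo))
```

Keeping timestamps as plain seconds until the final rendering step makes it easier to reuse the same segments for chapter lists.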

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Configure API keys in .env file.
Run via CLI: python smol_podcaster.py AUDIO_FILE_URL GUEST_NAME NUMBER_OF_SPEAKERS
Run with Web UI + Celery: Requires a message broker (e.g., RabbitMQ). Either start both processes with honcho start, or run the worker (celery -A tasks worker --loglevel=INFO) and the web server (flask --app web.py --debug run) in separate terminals. The UI is then available at localhost:5000.
Prerequisites: Python 3.x, a message broker for Celery, and API keys for LLM services.
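For scripted use, the CLI invocation listed above could be wrapped as in the sketch below. Only the script name and the order of the three positional arguments come from the quick start; the helper names, the validation, and the example URL are assumptions.

```python
import subprocess
import sys
from typing import List


def build_command(audio_url: str, guest_name: str, num_speakers: int) -> List[str]:
    """Build the argument list for: python smol_podcaster.py URL GUEST SPEAKERS."""
    if num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")
    # Passing a list (not a shell string) keeps a guest name with spaces
    # as a single argument.
    return [sys.executable, "smol_podcaster.py", audio_url, guest_name, str(num_speakers)]


def run_episode(audio_url: str, guest_name: str, num_speakers: int) -> None:
    """Run the CLI; assumes the current directory is a checkout of the repo
    with dependencies installed and API keys set in .env."""
    subprocess.run(build_command(audio_url, guest_name, num_speakers), check=True)


if __name__ == "__main__":
    # Placeholder URL and guest name, for illustration only.
    print(build_command("https://example.com/episode.mp3", "Jane Doe", 2)[1:])
```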

Highlighted Details

Generates diarized transcripts with speaker labels and timestamps.
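Creates chapter lists with timestamps. When the audio and video versions of an episode are out of sync, chapter timestamps can be re-aligned by matching text between the two through string similarity.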
Provides title ideas and tweet suggestions based on podcast content.
Offers an "Edit Show Notes" feature for consolidating and refining LLM-generated content.
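The string-similarity alignment mentioned in the chapter feature can be sketched as follows. This is not the project's implementation: it uses difflib from the Python standard library as a stand-in, and the data shapes, the function names, and the 0.6 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def realign_chapters(
    chapters: List[Tuple[str, str]],
    target_segments: List[Tuple[float, str]],
    min_ratio: float = 0.6,
) -> List[Tuple[str, Optional[float]]]:
    """Map each chapter to the start time of its best-matching segment.

    chapters: (title, anchor_text) pairs, where anchor_text is the line
        that opens the chapter in the original transcript.
    target_segments: (start_seconds, text) pairs from the other recording.
    Chapters whose best match scores below min_ratio get None, so a poor
    match is surfaced instead of silently misplaced.
    """
    aligned = []
    for title, anchor in chapters:
        best_start, best_score = None, 0.0
        for start, text in target_segments:
            score = similarity(anchor, text)
            if score > best_score:
                best_start, best_score = start, score
        if best_score < min_ratio:
            best_start = None
        aligned.append((title, best_start))
    return aligned
```

Returning None below the threshold reflects the caveat that heavy transcription errors can make a match unreliable; such chapters can then be fixed by hand.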

Maintenance & Community

The project is maintained by FanaHOVA. No specific community channels or roadmap details are provided in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project requires external LLM API keys, and the web UI additionally requires a message broker so that Celery can run background tasks. Audio/video timestamp synchronization relies on string similarity, so chapters may not align correctly when the transcript contains significant transcription errors.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days