video-podcast-maker by Agents365-ai

Automated video podcast generation skill

Created 1 month ago
374 stars

Top 75.8% on SourcePulse

Project Summary

This project automates video podcast creation, streamlining the process from topic generation to a professional video podcast, with output specifically optimized for Bilibili. It targets content creators and other users who want to produce video podcasts without deep technical expertise, and its main benefit is simplified, AI-assisted production.

How It Works

This project uses an AI-driven workflow in which an agent such as Claude Code guides the user through natural-language prompts. It integrates four key components: web-based research for content gathering, structured scriptwriting with chapter markers, multi-engine Text-to-Speech (TTS) synthesis (Azure Speech, CosyVoice, Edge TTS), and React-based video rendering via Remotion. Visual styles are editable in Remotion Studio. Audio is synchronized using FFmpeg, background music is layered in, and SRT subtitles can optionally be burned in. The approach stands out for its end-to-end automation and its Bilibili-specific optimizations, including AI-generated covers and chapter timestamps.
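
To make the optional subtitle step more concrete, here is a minimal sketch of how timed script lines could be serialized to the SRT format before being burned in. It is not taken from the project: the `Cue` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names, and the cue timings are assumed to come from the TTS stage.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure, not the project's own)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt([Cue(0.0, 2.5, "Hello"), Cue(2.5, 5.0, "World")])` yields two numbered blocks that a subtitle-aware tool such as FFmpeg could then consume.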

Quick Start & Requirements

  • Primary Install/Run: The workflow is primarily designed for AI agents. For manual setup and preview:
    1. Create a Remotion project: npx create-video@latest my-video-project
    2. Navigate: cd my-video-project
    3. Install Remotion dependencies: npm i
    4. Install Python dependencies: pip install azure-cognitiveservices-speech dashscope edge-tts requests
    5. Run Remotion Studio for preview: npx remotion studio src/remotion/index.ts
  • Prerequisites:
    • OS: macOS / Linux (Windows via WSL has not yet been fully validated)
    • Python: 3.8+
    • Node.js: 18+
    • FFmpeg: 4.0+
    • API Keys: Azure Speech, Aliyun CosyVoice, Google Gemini (optional). Edge TTS is free.
  • Environment Variables: Configure TTS backend (TTS_BACKEND), Azure keys/region, and other API keys via ~/.zshrc or ~/.bashrc.
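
As an illustration of the environment-variable configuration described above, the following sketch resolves a TTS backend and checks that its credentials are present. Only the `TTS_BACKEND` variable name comes from this summary. The backend values (`azure`, `cosyvoice`, `edge`), the credential variable names, and the `resolve_tts_backend` helper are assumptions made for the example, so check the repository for the names it actually reads.

```python
import os
from typing import Mapping

# Assumed backend names and the variables each one needs.
# These names are illustrative, not confirmed by the project.
REQUIRED_VARS = {
    "azure": ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"),
    "cosyvoice": ("DASHSCOPE_API_KEY",),
    "edge": (),  # Edge TTS is free, so no key is checked here
}


def resolve_tts_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return the selected backend name, defaulting to Azure Speech."""
    backend = env.get("TTS_BACKEND", "azure").strip().lower()
    if backend not in REQUIRED_VARS:
        raise ValueError(f"Unknown TTS_BACKEND: {backend!r}")
    # Report every missing variable at once rather than failing one by one.
    missing = [name for name in REQUIRED_VARS[backend] if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing variables for {backend}: {', '.join(missing)}")
    return backend
```

Passing the environment as a mapping keeps the check easy to exercise without touching the real shell profile, for example `resolve_tts_backend({"TTS_BACKEND": "edge"})`.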

Highlighted Details

  • Bilibili Optimization: Includes an optimized script structure, automatic chapter timestamps (MM:SS), and AI- or Remotion-generated thumbnails in two versions (16:9 and 4:3).
  • Multi-TTS Support: Integrates Azure Speech (default, supports mixed-language), Aliyun CosyVoice, and a free Edge TTS option.
  • High-Quality Output: Supports 4K resolution (3840x2160) and includes features like chapter progress bars and optional SRT subtitle embedding.
  • Visual Editing & Learning: Real-time preview and visual style editing via Remotion Studio, with preference learning to adapt to user styles over time.
  • Vertical Video: Supports 9:16 aspect ratio for mobile-optimized playback.
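
The chapter timestamps mentioned above can be pictured with a short sketch that turns per-chapter audio durations into cumulative MM:SS markers. This is an assumed approach rather than the project's code: `format_mmss` and `chapter_timestamps` are hypothetical helpers, and the durations are presumed to be measured from the synthesized audio.

```python
from typing import Iterable, List, Tuple


def format_mmss(seconds: float) -> str:
    """Format a duration as MM:SS, truncating fractional seconds.

    Minutes are not wrapped at 60, so long videos give values like 75:03.
    """
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def chapter_timestamps(chapters: Iterable[Tuple[str, float]]) -> List[str]:
    """Build one 'MM:SS title' line per chapter from (title, duration) pairs."""
    lines = []
    elapsed = 0.0
    for title, duration in chapters:
        # Each chapter starts where the previous chapters' audio ends.
        lines.append(f"{format_mmss(elapsed)} {title}")
        elapsed += duration
    return lines
```

For chapters lasting 42.0, 95.5, and 30.0 seconds, the markers start at 00:00, 00:42, and 02:17.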

Maintenance & Community

The project is actively developed by Agents365-ai, and its roadmap indicates ongoing improvements. Financial support is accepted via WeChat Pay, Alipay, and Buy Me a Coffee. The GitHub repository is open for contributions and issue tracking.

Licensing & Compatibility

The project is released under the MIT license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under continuous iteration, and some features may not yet be fully mature. The quality of the initial video generation depends on the underlying AI model used (e.g., Codex, Claude Code, GLM-5). Windows support via WSL has not yet been fully validated.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 4
  • Star History: 255 stars in the last 30 days
