pdf-to-podcast  by NVIDIA-AI-Blueprints

AI blueprint for converting PDFs into podcast-style audio

created 7 months ago
722 stars

Top 48.7% on sourcepulse

Project Summary

This NVIDIA AI Blueprint transforms PDFs into AI-generated podcasts, targeting developers and researchers who need to create engaging audio content from documents. It leverages NVIDIA NIM microservices for flexible, private deployment, enabling on-demand audio insights without data sharing.

How It Works

The system ingests a primary PDF and optional context PDFs, using a guide prompt to focus the AI's output. It employs Docling for document parsing, NVIDIA NIM microservices (e.g., Llama 3.1 models) for response generation, and ElevenLabs for text-to-speech. This modular, microservices-based architecture allows for customization, such as swapping LLMs or disabling GPU usage for specific components to manage resource requirements.
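The three-stage flow described above (parse, generate, synthesize) can be sketched as a pipeline of swappable callables. This is an illustration only, not the blueprint's code: `PodcastPipeline` and its field names are hypothetical, and each stage is a local stand-in for the corresponding service (Docling, an LLM NIM, ElevenLabs).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PodcastPipeline:
    """Hypothetical sketch: each stage is a plain callable so it can be swapped."""

    parse_pdf: Callable[[str], str]        # stand-in for the document parsing service
    generate_script: Callable[[str], str]  # stand-in for the LLM service
    synthesize: Callable[[str], bytes]     # stand-in for the text-to-speech service

    def run(
        self,
        target: str,
        context: Optional[List[str]] = None,
        guide: str = "",
    ) -> bytes:
        # Parse the primary PDF first, then any optional context PDFs.
        docs = [self.parse_pdf(target)]
        docs += [self.parse_pdf(path) for path in (context or [])]
        # The guide prompt is placed ahead of the parsed text to steer the output.
        prompt = guide + "\n\n" + "\n\n".join(docs)
        return self.synthesize(self.generate_script(prompt))
```

Because each stage is just a function, replacing the LLM or the speech backend in this sketch means passing a different callable, which mirrors the kind of component swapping the modular design is meant to allow.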

Quick Start & Requirements

  • Install: Clone the repository, set environment variables (ELEVENLABS_API_KEY, NVIDIA_API_KEY), and run make uv to install dependencies.
  • Prerequisites: Docker Engine & Compose (v2.29.1+), NVIDIA Container Toolkit (for GPU), git, Ubuntu 20.04/22.04, NVIDIA AI Enterprise developer license for local NIM hosting, API keys for NVIDIA NIM and ElevenLabs.
  • Run: Execute make all-services to start all microservices. Generate podcasts with python tests/test.py --target <pdf1.pdf> [--context <pdf2.pdf>] [--monologue].
  • Docs: Swagger UI available at localhost:8002/docs.
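Before running the steps above, it can help to confirm that both API keys are set and to assemble the test command programmatically. The following is a minimal sketch: the environment variable names and the `--target`, `--context`, and `--monologue` flags come from the steps above, while the helper names `missing_keys` and `build_test_command` are hypothetical.

```python
from typing import List, Mapping, Optional

# Environment variables named in the install step.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "NVIDIA_API_KEY")


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are unset or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def build_test_command(
    target: str,
    context: Optional[str] = None,
    monologue: bool = False,
) -> List[str]:
    """Build the argument list for the tests/test.py invocation."""
    cmd = ["python", "tests/test.py", "--target", target]
    if context:
        cmd += ["--context", context]
    if monologue:
        cmd.append("--monologue")
    return cmd
```

A caller might pass `os.environ` to `missing_keys`, stop early if the returned list is not empty, and otherwise hand the result of `build_test_command` to `subprocess.run` from the repository root.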

Highlighted Details

  • Utilizes NVIDIA NIM microservices for LLM inference and ElevenLabs for TTS, offering flexibility in model choice (e.g., Llama 3.1 variants).
  • Supports private network deployment for secure handling of sensitive data.
  • Allows customization of components and GPU assignments to optimize resource usage.
  • Includes tracing capabilities via Jaeger for debugging.

Maintenance & Community

The project is an NVIDIA AI Blueprint, indicating official support and development from NVIDIA. Contributions are managed via standard GitHub pull requests.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The provided setup uses HTTP and is not intended for production; SSL/TLS encryption and security headers are recommended for production deployments. Initial service startup, particularly Docling, can take 10-15 minutes.
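Given the long startup time, a script that submits PDFs right after launching the services may fail until they are ready. One way to handle this is to poll an endpoint until it responds. The sketch below assumes that a successful HTTP response from the Swagger UI address means the API service is up; that is an assumption, not a documented health check, and the helper names and the default 20-minute limit are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    max_wait_s: float = 20 * 60,   # assumed margin above the 10-15 minute startup
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` repeatedly until it succeeds or `max_wait_s` has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8002/docs"))` would block until the Swagger UI page loads or the time limit passes. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.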

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 93 stars in the last 90 days
