PDF2Audio by lamm-mit

PDF-to-audio conversion tool

Created 1 year ago

1,364 stars

Top 29.1% on SourcePulse

Project Summary

This project converts PDF documents into audio content such as podcasts, lectures, or summaries, for users who want to listen to or repurpose document content. It uses OpenAI models for both text generation and speech synthesis, and lets users customize and iteratively refine the transcript before the audio is produced.

How It Works

The system extracts text from the uploaded PDF files, then uses OpenAI's GPT models to generate a transcript based on a user-selected template (e.g., podcast, lecture, summary). Users can refine the generated transcript through iterative feedback and edits before it is converted to speech, so both the content and the final audio can be tailored to the listener.
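The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the template wording, the function names build_prompt and pdfs_to_audio, and the generate and synthesize callables (stand-ins for the GPT and text-to-speech calls) are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical instruction templates; the project's real template text differs.
TEMPLATES = {
    "podcast": "Rewrite the source material as a conversational podcast script.",
    "lecture": "Rewrite the source material as a structured lecture.",
    "summary": "Summarize the source material concisely.",
}


def build_prompt(template: str, documents: Sequence[str]) -> str:
    """Combine the chosen template with the text extracted from each PDF."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    body = "\n\n".join(
        f"[Document {i}]\n{text.strip()}"
        for i, text in enumerate(documents, start=1)
    )
    return f"{TEMPLATES[template]}\n\n{body}"


def pdfs_to_audio(
    documents: Sequence[str],
    template: str,
    generate: Callable[[str], str],      # assumed wrapper around a text model
    synthesize: Callable[[str], bytes],  # assumed wrapper around a speech model
) -> bytes:
    """Extracted text -> prompt -> transcript -> audio bytes."""
    transcript = generate(build_prompt(template, documents))
    return synthesize(transcript)
```

Passing the model calls in as plain callables keeps the sketch runnable without an API key and makes it easy to swap in different text or audio models.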

Quick Start & Requirements

Install: Clone the repository, create and activate a Conda environment (conda create -n pdf2audio python=3.9, then conda activate pdf2audio), and install dependencies (pip install -r requirements.txt).
Prerequisites: An OpenAI API key, stored in a .env file as OPENAI_API_KEY=your_api_key_here.
Running: Execute python app.py to launch the Gradio interface.
Docs: Hugging Face Spaces
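
Before launching the app, it can help to confirm that the .env file is readable and that the placeholder key has been replaced. The following is a standard-library sketch of such a check; load_env_file and require_api_key are hypothetical helpers written for this example, and the project itself may load the file differently (for instance with a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict) -> str:
    """Return the key from the parsed file, falling back to the process environment."""
    key = env.get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set OPENAI_API_KEY in .env before running app.py")
    return key
```

Calling require_api_key(load_env_file()) from the repository root fails early with a readable message instead of surfacing an authentication error at the first API request.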

Highlighted Details

Supports uploading multiple PDF files.
Offers various instruction templates for content generation.
Allows customization of text generation and audio models.
Enables iterative editing and feedback on generated transcripts.
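
The iterative editing and feedback feature can be pictured as a loop that feeds each round of feedback, together with the latest transcript, back into the text model. The sketch below is an assumption about the general shape of such a loop, not the project's implementation; refine_transcript and the revise callable are hypothetical names.

```python
from typing import Callable, Iterable


def refine_transcript(
    draft: str,
    feedback_rounds: Iterable[str],
    revise: Callable[[str, str], str],  # assumed: (transcript, feedback) -> new transcript
) -> list:
    """Return every version of the transcript, from first draft to final."""
    history = [draft]
    for feedback in feedback_rounds:
        if not feedback.strip():
            continue  # an empty feedback box means "no change requested"
        history.append(revise(history[-1], feedback))
    return history
```

Keeping the full history rather than only the latest version lets a user step back to an earlier transcript if a round of feedback makes things worse.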

Maintenance & Community

This project is inspired by and based on code from knowsuchagency/pdf-to-podcast and knowsuchagency/promptic. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of BibTeX entries suggests a research-oriented origin, but commercial use or compatibility with closed-source projects is not specified.

Limitations & Caveats

The application requires an OpenAI API key, and usage of the OpenAI API is billed. The README does not provide performance benchmarks, describe which PDF layouts are handled well, or state limits on document length or content.

Explore Similar Projects

yt-transcriber by pmarreck

Pandrator by lukaszliniewicz

PodCastLM by YOYZHANG

WavJourney by Audio-AGI

Podcast by artnoage

Qwen3-Audiobook-Converter by WhiskeyCoder

ArxivPapers by imelnyk

audiobook_maker by JarodMica

MOSS-TTSD by OpenMOSS

whisper-plus by kadirnar

pdf-to-podcast by NVIDIA-AI-Blueprints

Amphion by open-mmlab