PDF2Audio  by lamm-mit

PDF-to-audio conversion tool

Created 1 year ago
1,331 stars

Top 30.1% on SourcePulse

GitHubView on GitHub
Project Summary

This project converts PDF documents into audio content such as podcasts, lectures, or summaries, targeting users who need to consume or repurpose document information audibly. It leverages OpenAI's GPT models for text processing and speech synthesis, offering customization and iterative refinement of the generated audio.

How It Works

The system processes uploaded PDF files, extracts text, and then utilizes OpenAI's GPT models to generate content based on user-selected templates (e.g., podcast, lecture, summary). Users can further refine the generated transcript through iterative feedback and edits before text-to-speech conversion. This approach allows for flexible content creation and personalized audio output.

Quick Start & Requirements

  • Install: Clone the repository, create and activate a Conda environment (conda create -n pdf2audio python=3.9, conda activate pdf2audio), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: OpenAI API key (placed in a .env file as OPENAI_API_KEY=your_api_key_here).
  • Running: Execute python app.py to launch the Gradio interface.
  • Docs: Hugging Face Spaces

Highlighted Details

  • Supports uploading multiple PDF files.
  • Offers various instruction templates for content generation.
  • Allows customization of text generation and audio models.
  • Enables iterative editing and feedback on generated transcripts.

Maintenance & Community

This project is inspired by and based on code from knowsuchagency/pdf-to-podcast and knowsuchagency/promptic. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of BibTeX entries suggests a research-oriented origin, but commercial use or compatibility with closed-source projects is not specified.

Limitations & Caveats

The application strictly requires an OpenAI API key, which incurs costs. The README does not detail performance benchmarks, supported PDF complexities, or potential limitations on document length or content.

Health Check
Last Commit

5 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
12 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte Christian Laforte(Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
1 more.

Amphion by open-mmlab

0.2%
9k
Toolkit for audio, music, and speech generation research
Created 1 year ago
Updated 3 months ago
Feedback? Help us improve.