PDF2Audio by lamm-mit

PDF-to-audio conversion tool

created 10 months ago
1,291 stars

Top 31.5% on sourcepulse

Project Summary

This project converts PDF documents into audio content such as podcasts, lectures, or summaries, for users who want to listen to or repurpose a document's content. It uses OpenAI models for both text generation (GPT) and speech synthesis (text-to-speech), and lets users customize and iteratively refine the generated audio.

How It Works

The system extracts text from the uploaded PDF files and then uses OpenAI's GPT models to generate content according to a user-selected template (e.g., podcast, lecture, summary). Before text-to-speech conversion, users can refine the generated transcript through iterative feedback and direct edits, so both the content and the style of the final audio can be tailored.
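The flow above can be sketched as two pure steps: combining an instruction template with the extracted text, and splitting the model's transcript into per-speaker segments that a TTS model could voice separately. This is a minimal sketch under stated assumptions, not PDF2Audio's code: the template wording, the Host/Guest labels, and the names `TEMPLATES`, `build_prompt`, and `parse_transcript` are hypothetical, and PDF extraction and the OpenAI calls are omitted so it runs on the standard library alone.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical instruction templates; the real app defines its own wording.
TEMPLATES = {
    "podcast": "Rewrite the document below as a dialogue between a Host and a Guest.",
    "lecture": "Rewrite the document below as a lecture given by a single speaker.",
    "summary": "Summarize the document below in a few spoken paragraphs.",
}


@dataclass
class Segment:
    speaker: str
    text: str


def build_prompt(template: str, document_text: str) -> str:
    """Prepend the selected instruction template to the extracted PDF text."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return f"{TEMPLATES[template]}\n\n<document>\n{document_text}\n</document>"


def parse_transcript(
    raw: str,
    speakers: Tuple[str, ...] = ("Host", "Guest"),
    default_speaker: str = "Narrator",
) -> List[Segment]:
    """Split a generated transcript into speaker-labelled segments.

    A line starting with one of the `speakers` labels followed by ':' opens
    a new segment; any other non-empty line is appended to the current one.
    """
    segments: List[Segment] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        label, sep, rest = line.partition(":")
        if sep and label.strip() in speakers:
            segments.append(Segment(label.strip(), rest.strip()))
        elif segments:
            segments[-1].text = f"{segments[-1].text} {line}".strip()
        else:
            # Text before any speaker label is assigned to a default narrator.
            segments.append(Segment(default_speaker, line))
    return segments
```

Each segment could then be sent to a text-to-speech model with a per-speaker voice. Keeping parsing separate from the API call means the transcript can be edited freely before any audio is generated.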

Quick Start & Requirements

  • Install: Clone the repository, create and activate a Conda environment (conda create -n pdf2audio python=3.9, conda activate pdf2audio), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: OpenAI API key (placed in a .env file as OPENAI_API_KEY=your_api_key_here).
  • Running: Execute python app.py to launch the Gradio interface.
  • Docs: Hugging Face Spaces
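The prerequisite above only requires that `OPENAI_API_KEY` reach the process environment; the official `openai` Python client reads that variable by default. Projects often load `.env` files with the python-dotenv package, and the README does not say how PDF2Audio does it. The stdlib-only sketch below is an assumption that illustrates the idea; `parse_env` and `load_openai_key` are hypothetical names.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_openai_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            key = parse_env(fh.read()).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is missing; add it to .env")
    return key
```

Checking the real environment first means a key exported in the shell takes precedence over the file.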

Highlighted Details

  • Supports uploading multiple PDF files.
  • Offers various instruction templates for content generation.
  • Allows customization of text generation and audio models.
  • Enables iterative editing and feedback on generated transcripts.
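Iterative editing and feedback can be modelled as a history of transcript versions, where each revision, whether typed by the user or produced by a model from feedback, appends a new version. This is a hypothetical sketch, not the app's implementation: `TranscriptSession` is an illustrative name, and the `revise` callable stands in for whatever model call turns feedback into a new transcript.

```python
from typing import Callable, List


class TranscriptSession:
    """Keeps every transcript version so edits can be compared or undone."""

    def __init__(self, initial: str) -> None:
        self.history: List[str] = [initial]

    @property
    def current(self) -> str:
        return self.history[-1]

    def manual_edit(self, new_text: str) -> None:
        """Record a transcript the user edited directly."""
        self.history.append(new_text)

    def apply_feedback(self, feedback: str, revise: Callable[[str, str], str]) -> str:
        """Ask `revise` (e.g. an LLM call) for a new version based on feedback."""
        revised = revise(self.current, feedback)
        self.history.append(revised)
        return revised

    def undo(self) -> str:
        """Drop the latest version, never removing the original transcript."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current
```

In use, `revise` could be a stub during testing, for example `lambda text, fb: text.upper()`, and a real model call in the app; only the approved `current` version would go to text-to-speech.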

Maintenance & Community

This project is inspired by and based on code from knowsuchagency/pdf-to-podcast and knowsuchagency/promptic. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of BibTeX entries suggests a research-oriented origin, but commercial use or compatibility with closed-source projects is not specified.

Limitations & Caveats

The application requires an OpenAI API key, and API usage is billed. The README does not report performance benchmarks, which PDF layouts are handled well, or limits on document length or content.
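Document length matters because OpenAI's text-to-speech endpoint caps the input size of each request, so a long transcript likely has to be split before synthesis. The sketch below assumes a 4,096-character limit (check the current API reference) and splits at sentence boundaries where possible. `chunk_text` is a hypothetical helper; the README does not say how PDF2Audio handles long inputs.

```python
import re
from typing import List

MAX_CHARS = 4096  # assumed per-request TTS input limit; verify against current docs


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio joined in order.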

Health Check

Last commit: 3 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 48 stars in the last 90 days
