ArxivPapers by imelnyk

ArXiv paper to video/audio converter

Created 1 year ago

539 stars

Top 58.9% on SourcePulse

2 Experts Love This Project

Andy Konwinski

Cofounder of Perplexity, Databricks

Paul Gauthier

Founder of Aider

Project Summary

This project provides an end-to-end pipeline that converts ArXiv papers into video and audio. It targets researchers, students, and content creators who want to consume or distribute scientific literature in more accessible formats, and it automates extracting, simplifying, and presenting the technical content.

How It Works

The pipeline first downloads a paper's LaTeX source from ArXiv and converts it to HTML using either latex2html or LaTeXML. It then parses the HTML to extract plain text and equations, discarding tables and figures. The extracted text is sent to OpenAI's GPT API, which paraphrases, simplifies, and explains it, producing both a detailed and a summarized version. Finally, audio is generated with Google's Text-to-Speech API and video is assembled with ffmpeg; the summarized version also gets automatically generated slides.
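The HTML parsing step can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` and is not the project's actual code: the class name `PaperTextExtractor`, the helper `extract_text`, and the choice of `table` and `figure` as the tags to drop are assumptions made for illustration, and equation handling is left out entirely.

```python
from html.parser import HTMLParser


class PaperTextExtractor(HTMLParser):
    """Collect visible text, skipping anything nested inside skipped tags.

    Hypothetical helper for illustration; the real project may parse
    the converter output differently.
    """

    SKIPPED_TAGS = {"table", "figure"}  # assumed tag names

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside tables and figures.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the plain text of an HTML string without tables or figures."""
    parser = PaperTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Tracking a depth counter rather than a boolean keeps the sketch correct when a skipped element is nested inside another one, such as a table inside a figure.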

Quick Start & Requirements

Install: pip install openai PyPDF2 spacy tiktoken pyperclip google-cloud-texttospeech pydrive2 pdflatex
Prerequisites: LaTeXML or latex2html, OpenAI API key, ffmpeg, Google Cloud SDK with Application Default Credentials (ADC) setup, Google Text-to-Speech, and optionally Google Drive setup.
Run: python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_id>
Docs: LaTeXML, Google Cloud SDK, PyDrive2
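For scripted or batch use, the run command above can be assembled as an argument list instead of a shell string. This is a sketch only: the function name `build_command` is hypothetical, the flags are the ones shown in the Run line, and the script assumes it is launched from the repository root where `main.py` lives.

```python
import shlex
import sys


def build_command(paper_id, openai_key, python=sys.executable):
    """Build the argument list for one pipeline run (hypothetical helper).

    The flags mirror the Run line above; nothing else is assumed about
    main.py's command-line interface.
    """
    return [
        python, "main.py",
        "--verbose",
        "--include_summary",
        "--create_short",
        "--create_video",
        "--openai_key", openai_key,
        "--paperid", paper_id,
    ]


def describe_command(cmd):
    """Return a shell-quoted string, useful for logging before execution."""
    return shlex.join(cmd)
```

Passing the returned list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with keys or IDs that contain special characters.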

Highlighted Details

Generates both long, detailed videos with section summaries and short, summarized videos with auto-generated slides.
Creates audio versions of papers for on-the-go listening.
Optionally uploads generated audio files to Google Drive.
Allows stopping processing at a specified keyword (e.g., "experiments").
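The keyword-based stop in the last item could work along the following lines. The summary does not say how the keyword is matched, so this sketch makes its own assumptions: sections arrive as `(title, body)` pairs, the match is a case-insensitive substring test on the title, and the matching section itself is excluded. The function name `truncate_at_keyword` is hypothetical.

```python
def truncate_at_keyword(sections, stop_word):
    """Keep sections up to, but not including, the first title that matches.

    sections: list of (title, body) tuples in document order.
    stop_word: keyword such as "experiments"; matched case-insensitively
    against titles (an assumption, not confirmed by the project).
    """
    needle = stop_word.lower()
    kept = []
    for title, body in sections:
        if needle in title.lower():
            break  # stop processing at the first matching section
        kept.append((title, body))
    return kept
```

Stopping early like this also reduces the amount of text sent to the paid GPT and Text-to-Speech APIs.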

Maintenance & Community

The project is associated with YouTube, TikTok, and Apple Podcasts channels under the "Arxiv Papers" brand, which suggests the tool is used to produce published content. The README provides no contributor or roadmap information.

Licensing & Compatibility

The README does not explicitly state a license. Given the use of external APIs (OpenAI, Google Cloud) and tools, users must adhere to their respective terms of service. Compatibility for commercial use depends on the licensing of these underlying services and the ArXiv content itself.

Limitations & Caveats

The default latex2html converter sometimes fails, in which case switching to latexmlc is required. The pipeline relies heavily on external, potentially costly APIs (OpenAI, Google Cloud TTS), and setup involves several complex dependencies. Output quality depends on the GPT model used for paraphrasing and on ffmpeg for video generation.
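The converter switch can be automated with a simple try-then-fall-back wrapper. This is a sketch under stated assumptions, not the project's behavior: `convert_with_fallback` is a hypothetical helper, the caller supplies the command lines, and the `run` parameter exists only so the function can be exercised without the tools installed.

```python
import subprocess


def convert_with_fallback(commands, run=subprocess.run):
    """Try each converter command in order and return the name of the first
    one that succeeds.

    commands: list of argument lists, e.g. one for latex2html followed by
    one for latexmlc (the exact arguments are up to the caller).
    """
    errors = []
    for cmd in commands:
        try:
            run(cmd, check=True)  # raises on a non-zero exit status
            return cmd[0]
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            errors.append(f"{cmd[0]}: {exc}")
    raise RuntimeError("all converters failed: " + "; ".join(errors))
```

An illustrative call would be `convert_with_fallback([["latex2html", "main.tex"], ["latexmlc", "--dest=main.html", "main.tex"]])`, where `main.tex` is a placeholder file name.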

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days