ArXiv paper to video/audio converter
This project provides an end-to-end pipeline for converting ArXiv papers into engaging video and audio formats, targeting researchers, students, and content creators who want to consume or distribute scientific literature in more accessible ways. It automates the complex process of extracting, simplifying, and presenting technical content.
How It Works
The pipeline first downloads the paper's LaTeX source from ArXiv. It then converts the LaTeX to HTML using either latex2html or LaTeXML and parses the HTML to extract plain text and equations, discarding tables and figures. Next, it uses OpenAI's GPT API to paraphrase, simplify, and explain the extracted text, producing both a detailed and a summarized version. Finally, it generates audio with Google's Text-to-Speech API and assembles the video with ffmpeg, including automatically generated slides for the summarized version.
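The extraction step described above (keep text and equations, drop tables and figures) can be sketched with Python's standard html.parser module. This is an illustrative sketch, not the project's actual parser: the names PaperTextExtractor and extract_text are hypothetical, and it assumes equations arrive as <math> elements carrying their TeX source in an alttext attribute, as LaTeXML output typically does.

```python
from html.parser import HTMLParser


class PaperTextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping tables and figures."""

    # Elements whose contents are dropped entirely.
    DISCARDED = {"table", "figure"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a discarded or math element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            # Assumption: the TeX source is stored in the alttext attribute.
            alttext = dict(attrs).get("alttext")
            if alttext and self.skip_depth == 0:
                self.chunks.append(f"${alttext}$")
            self.skip_depth += 1  # ignore the MathML children
        elif tag in self.DISCARDED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if (tag == "math" or tag in self.DISCARDED) and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the plain text and inline equations found in an HTML string."""
    parser = PaperTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting string is the kind of input that would then be sent to the GPT API for paraphrasing. A real paper would also need handling for section headings, citations, and footnotes, which this sketch ignores.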
Quick Start & Requirements
pip install openai PyPDF2 spacy tiktoken pyperclip google-cloud-texttospeech pydrive2 pdflatex
python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_id>
Highlighted Details
Maintenance & Community
The project is associated with YouTube, TikTok, and Apple Podcasts channels under the "Arxiv Papers" brand, suggesting active use and potential community engagement around the content generated by this tool. No specific contributor or roadmap information is provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the use of external APIs (OpenAI, Google Cloud) and tools, users must adhere to their respective terms of service. Compatibility for commercial use depends on the licensing of these underlying services and the ArXiv content itself.
Limitations & Caveats
The default latex2html tool sometimes fails, in which case the conversion has to be switched to latexmlc. The process relies heavily on external, potentially costly APIs (OpenAI, Google Cloud TTS), and setup involves multiple complex dependencies. The quality of the paraphrasing depends on the GPT model used, and the quality of the video depends on what ffmpeg can produce.
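The fallback from latex2html to latexmlc can be automated. The following is a minimal sketch under stated assumptions, not the project's own logic: the function convert_with_fallback and its run parameter are hypothetical, both tools are assumed to be on the PATH, and the output file name is derived from the input name for illustration only.

```python
import subprocess
from pathlib import Path


def convert_with_fallback(tex_path, run=subprocess.run):
    """Try latex2html first, then latexmlc; return the tool that succeeded.

    `run` is injectable so the logic can be exercised without either tool.
    """
    # Assumption: write the latexmlc output next to the source file.
    html_path = str(Path(tex_path).with_suffix(".html"))
    commands = [
        ["latex2html", tex_path],
        ["latexmlc", f"--dest={html_path}", tex_path],
    ]
    failures = []
    for cmd in commands:
        try:
            result = run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # The executable is not installed; move on to the next tool.
            failures.append(f"{cmd[0]}: not installed")
            continue
        if result.returncode == 0:
            return cmd[0]
        failures.append(f"{cmd[0]}: exit code {result.returncode}")
    raise RuntimeError("LaTeX to HTML conversion failed: " + "; ".join(failures))
```

A non-zero exit code is treated as failure here. A converter can also exit cleanly while producing unusable HTML, so a real pipeline would likely want to inspect the output as well.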