Generate academic presentation videos from papers
Top 36.7% on SourcePulse
Summary
Paper2Video automates the creation of presentation videos from scientific papers, targeting researchers and academics. It offers an agent (PaperTalker) for video generation and a benchmark (Paper2Video) for evaluating video quality, aiming to streamline academic communication and boost scholarly visibility.
How It Works
The project comprises two core components. PaperTalker is an agent that takes a paper, a reference image, and reference audio, and synthesizes a video by combining slide generation, subtitles, cursor grounding, speech synthesis, and talking-head rendering. Paper2Video is a new benchmark that evaluates how well a video conveys the paper's core ideas, how accessible it is to the audience, how well it reflects the author's identity, and how much it raises research visibility, using metrics such as Meta Similarity, PresentArena, PresentQuiz, and IP Memory.
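As a rough mental model of PaperTalker's data flow, the Python sketch below chains the five stages in the order the summary lists them. Everything in it is hypothetical: the class names, function names, and placeholder outputs stand in for real model calls and are not the project's API. The only assumption it encodes is that each stage reads the inputs plus whatever earlier stages produced.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Inputs:
    """The three inputs named in the summary (field names are hypothetical)."""
    paper: str      # path to the paper, e.g. a LaTeX project
    ref_image: str  # reference image of the presenter
    ref_audio: str  # reference voice sample


@dataclass
class Artifacts:
    """Outputs collected as each stage runs (placeholder strings only)."""
    slides: List[str] = field(default_factory=list)
    subtitles: List[str] = field(default_factory=list)
    cursor: List[Tuple[float, float]] = field(default_factory=list)
    speech: List[str] = field(default_factory=list)
    talking_head: str = ""


def make_slides(inp: Inputs, out: Artifacts) -> None:
    # Stand-in for slide generation from the paper: three fake slides.
    out.slides = [f"{inp.paper}#slide{i}" for i in range(3)]


def make_subtitles(inp: Inputs, out: Artifacts) -> None:
    # One narration script per slide.
    out.subtitles = [f"narration for {s}" for s in out.slides]


def ground_cursor(inp: Inputs, out: Artifacts) -> None:
    # One normalized (x, y) cursor target per subtitle; a fixed centre point here.
    out.cursor = [(0.5, 0.5) for _ in out.subtitles]


def synthesize_speech(inp: Inputs, out: Artifacts) -> None:
    # One speech clip per subtitle, nominally conditioned on the reference voice.
    out.speech = [f"{inp.ref_audio}:clip{i}" for i in range(len(out.subtitles))]


def render_talking_head(inp: Inputs, out: Artifacts) -> None:
    # Talking head driven by the reference image and the synthesized speech.
    out.talking_head = f"{inp.ref_image} speaking {len(out.speech)} clips"


# Stages run in the order listed in the summary; each reads earlier outputs.
STAGES: List[Callable[[Inputs, Artifacts], None]] = [
    make_slides,
    make_subtitles,
    ground_cursor,
    synthesize_speech,
    render_talking_head,
]


def run_pipeline(inp: Inputs) -> Artifacts:
    out = Artifacts()
    for stage in STAGES:
        stage(inp, out)
    return out
```

For example, run_pipeline(Inputs("paper/", "me.png", "me.wav")).talking_head evaluates to "me.png speaking 3 clips". Keeping the stages in a list makes the ordering explicit and lets a single stage be swapped out (for instance, a different talking-head model) without touching the others.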
Quick Start & Requirements
Install: run pip install -r requirements.txt in the src/ directory. Separate setup is needed for hallo2 (talking-head generation) and Paper2Poster, including cloning their repositories and creating distinct Conda environments (e.g., hallo).
Prerequisites: API keys for Gemini (GEMINI_API_KEY) and OpenAI (OPENAI_API_KEY) are mandatory; GPT-4.1 or Gemini 2.5 Pro are the recommended models. Local models are supported via Paper2Poster.
Hardware: substantial GPU resources (NVIDIA A6000, 48GB) are needed to run hallo2.
Usage: the pipeline.py script automates generation from a LaTeX paper, a reference image, and reference audio:
python pipeline.py --model_name_t gpt-4.1 --model_name_v gpt-4.1 --model_name_talking hallo2 --result_dir /path/to/output --paper_latex_root /path/to/latex_proj --ref_img /path/to/ref_img.png --ref_audio /path/to/ref_audio.wav --talking_head_env /path/to/hallo2_env --gpu_list [0,1,2,3,4,5,6,7]
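For scripted runs, a small Python wrapper can verify the API keys before launching and assemble the command shown above. This is a hypothetical helper, not part of the repository. The function names, the defaults (copied from the example command), and the assumption that both keys must be set for every run are illustrative. Only the command-line flags come from the example.

```python
import os
import shlex
from typing import List, Mapping, Optional, Sequence

# The two keys the summary lists as mandatory.
REQUIRED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required key names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def build_pipeline_cmd(
    result_dir: str,
    paper_latex_root: str,
    ref_img: str,
    ref_audio: str,
    talking_head_env: str,
    gpus: Sequence[int],
    model_t: str = "gpt-4.1",
    model_v: str = "gpt-4.1",
    model_talking: str = "hallo2",
) -> List[str]:
    """Assemble the pipeline.py argument list using only the flags from the example."""
    gpu_list = "[" + ",".join(str(g) for g in gpus) + "]"  # e.g. "[0,1]"
    return [
        "python", "pipeline.py",
        "--model_name_t", model_t,
        "--model_name_v", model_v,
        "--model_name_talking", model_talking,
        "--result_dir", result_dir,
        "--paper_latex_root", paper_latex_root,
        "--ref_img", ref_img,
        "--ref_audio", ref_audio,
        "--talking_head_env", talking_head_env,
        "--gpu_list", gpu_list,
    ]


def describe(cmd: Sequence[str]) -> str:
    """Shell-quoted rendering of the command, useful for logging (Python 3.8+)."""
    return shlex.join(cmd)
```

Once missing_api_keys() returns an empty list, the argument list can be passed to subprocess.run(cmd, check=True) from the directory that contains pipeline.py. Passing a list rather than a shell string means the bracketed GPU list reaches the script verbatim, with no shell quoting or glob expansion to worry about.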
Highlighted Details
Maintenance & Community
Developed by Show Lab at the National University of Singapore. The project acknowledges contributions from the CAMEL multi-agent framework. No specific community channels (Discord, Slack) or roadmap links are provided in the README snippet.
Licensing & Compatibility
The provided README snippet does not specify a software license, preventing assessment of commercial use or closed-source linking compatibility.
Limitations & Caveats
Setup is complex: the core components and the hallo2 talking-head module require separate environment configurations, and package conflicts are possible. Running the pipeline requires significant GPU resources (NVIDIA A6000, 48GB). Reliance on external LLM APIs means API keys must be obtained and managed. The future-dated updates (October 2025) may indicate placeholder information or an unusual project timeline.