Paper2Video by showlab

Generate academic presentation videos from papers

Created 2 weeks ago

1,019 stars

Top 36.7% on SourcePulse

View on GitHub
Project Summary

Paper2Video automates the creation of presentation videos from scientific papers, targeting researchers and academics. It offers an agent (PaperTalker) for video generation and a benchmark (Paper2Video) for evaluating video quality, aiming to streamline academic communication and boost scholarly visibility.

How It Works

The project comprises two core components. PaperTalker is an agent that synthesizes a presentation video from a paper, a reference image, and a reference audio clip by integrating slide generation, subtitling, cursor grounding, speech synthesis, and talking-head rendering. Paper2Video is a companion benchmark that evaluates how well a generated video conveys the paper's core ideas, remains accessible to its audience, and preserves author identity and research visibility, using metrics such as Meta Similarity, PresentArena, PresentQuiz, and IP Memory.
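The PaperTalker stages described above can be sketched as a simple sequential driver. This is a minimal illustration only; every function and field name here is hypothetical and does not reflect the project's actual API:

```python
from dataclasses import dataclass, field


@dataclass
class PaperInputs:
    latex_root: str  # path to the paper's LaTeX project
    ref_image: str   # reference portrait for talking-head rendering
    ref_audio: str   # reference voice clip for speech synthesis


@dataclass
class VideoDraft:
    # Artifacts accumulated as the pipeline runs (hypothetical structure).
    artifacts: dict = field(default_factory=dict)


def run_paper_talker(inputs: PaperInputs) -> VideoDraft:
    """Sequential sketch of the five PaperTalker stages named above."""
    draft = VideoDraft()
    draft.artifacts["slides"] = f"slides built from {inputs.latex_root}"
    draft.artifacts["subtitles"] = "subtitles derived from slide content"
    draft.artifacts["cursor"] = "cursor positions grounded to slide regions"
    draft.artifacts["speech"] = f"speech cloned from {inputs.ref_audio}"
    draft.artifacts["talking_head"] = f"talking head rendered from {inputs.ref_image}"
    return draft
```

The point of the sketch is the stage ordering: each later stage consumes artifacts produced by earlier ones, which is why the agent runs them in sequence.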

Quick Start & Requirements

  • Installation: Requires Python 3.10 and Conda. Core dependencies are installed via pip install -r requirements.txt in the src/ directory. Separate setups are needed for hallo2 (talking-head generation) and Paper2Poster: clone their repositories and create distinct Conda environments (e.g., hallo).
  • Prerequisites:
    • GPU: NVIDIA A6000 with 48GB VRAM is the minimum recommended.
    • LLMs: API keys for Gemini (GEMINI_API_KEY) and OpenAI (OPENAI_API_KEY) are mandatory; GPT-4.1 or Gemini 2.5 Pro are recommended. Local models via Paper2Poster are also supported.
    • Model Weights: Download required weights for hallo2.
  • Inference: Use the pipeline.py script for an automated generation process from LaTeX papers, a reference image, and reference audio.
  • Example Command: python pipeline.py --model_name_t gpt-4.1 --model_name_v gpt-4.1 --model_name_talking hallo2 --result_dir /path/to/output --paper_latex_root /path/to/latex_proj --ref_img /path/to/ref_img.png --ref_audio /path/to/ref_audio.wav --talking_head_env /path/to/hallo2_env --gpu_list [0,1,2,3,4,5,6,7]
  • Links: Project Website, Hallo2 repository, Paper2Poster repository.
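The example command above can also be assembled programmatically, which is convenient for batch runs over many papers. The flag names below are taken verbatim from the command shown; the helper function itself and its parameter names are hypothetical:

```python
import subprocess


def build_pipeline_cmd(result_dir: str, latex_root: str, ref_img: str,
                       ref_audio: str, hallo2_env: str,
                       gpus: str = "[0,1,2,3,4,5,6,7]") -> list[str]:
    """Assemble a pipeline.py invocation using the flags documented above."""
    return [
        "python", "pipeline.py",
        "--model_name_t", "gpt-4.1",
        "--model_name_v", "gpt-4.1",
        "--model_name_talking", "hallo2",
        "--result_dir", result_dir,
        "--paper_latex_root", latex_root,
        "--ref_img", ref_img,
        "--ref_audio", ref_audio,
        "--talking_head_env", hallo2_env,
        "--gpu_list", gpus,
    ]


# subprocess.run(build_pipeline_cmd(...), check=True) would launch one run.
```

Passing the command as a list (rather than a shell string) avoids quoting issues with paths that contain spaces.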

Highlighted Details

  • Automates the creation of academic presentation videos from LaTeX papers, integrating slides, subtitles, speech, and talking heads.
  • Introduces a specialized evaluation benchmark (Paper2Video) with metrics focused on scholarly communication effectiveness.
  • Leverages advanced LLMs (GPT-4.1 / Gemini 2.5 Pro) and a dedicated talking-head model (Hallo2).
  • The README lists updates dated October 2025; whether these reflect recent activity or placeholder dates is unclear from the snippet.

Maintenance & Community

Developed by Show Lab at the National University of Singapore. The project acknowledges contributions from the CAMEL multi-agent framework. No specific community channels (Discord, Slack) or roadmap links are provided in the README snippet.

Licensing & Compatibility

The provided README snippet does not specify a software license, preventing assessment of commercial use or closed-source linking compatibility.

Limitations & Caveats

Setup is complex, requiring separate environment configurations for the core components and the hallo2 talking-head module, which can lead to package conflicts. Significant GPU resources (NVIDIA A6000, 48GB VRAM) are necessary to run the pipeline. Reliance on external LLM APIs requires API key management and incurs usage costs. The October 2025 update dates may indicate placeholder information or an unusual project timeline.

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
11
Star History
1,037 stars in the last 14 days
