Paper2Slides by HKUDS

Generate presentations from documents instantly

Created 1 month ago
2,779 stars

Top 17.0% on SourcePulse


Summary

Paper2Slides addresses the time-consuming task of turning research papers and other documents into presentations. Aimed at researchers, engineers, and academics, it automates the generation of professional slides and posters. This saves significant time, and its retrieval-augmented generation (RAG) approach keeps the content grounded in the source. The tool also supports custom styling and rapid iteration.

How It Works

The project uses a four-stage pipeline:

  • RAG parses and indexes the document for retrieval.
  • Analysis extracts key content such as figures and tables.
  • Planning structures the presentation.
  • Creation renders the final visuals.

Because generation draws on retrieved passages, extracted content stays linked to its source, so each slide remains traceable to the original document. Users can customize styling, including through natural-language prompts. They also get fast generation with instant previews, and per-stage checkpointing lets them pause and resume sessions.

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python 3.12 Conda environment, and installing dependencies via pip install -r requirements.txt. API keys are required and should be configured in a .env file. A web interface is available alongside the command-line interface.
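A .env file usually holds plain KEY=VALUE lines, which are loaded into the process environment at startup. The sketch below shows that mechanism using only the standard library; many projects use the python-dotenv package for the same job. It is an illustration, not the project's loader. The helpers load_env_file and require_keys are hypothetical, and so are the variable names EXAMPLE_LLM_API_KEY and EXAMPLE_IMAGE_API_KEY. The key names Paper2Slides actually expects are defined by its repository.

```python
import os
import tempfile
from pathlib import Path

def load_env_file(path: Path, override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    loaded: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without '='
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()  # tolerate shell-style exports
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip one pair of matching quotes
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value  # existing environment wins unless override=True
    return loaded

def require_keys(names: list[str]) -> None:
    """Fail early with a clear message if any required key is unset or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        env = Path(d) / ".env"
        # Hypothetical key names; use the ones the project documents.
        env.write_text('EXAMPLE_LLM_API_KEY="sk-placeholder"\nEXAMPLE_IMAGE_API_KEY=img-placeholder\n')
        load_env_file(env)
        require_keys(["EXAMPLE_LLM_API_KEY", "EXAMPLE_IMAGE_API_KEY"])
        print("keys loaded")
```

Checking for required keys up front turns a missing key into one clear error at startup, rather than a failure partway through a long generation run.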

Highlighted Details

  • Universal Document Support: Processes PDF, Word, Excel, PowerPoint, and Markdown files.
  • RAG-Powered Extraction: Captures critical insights, figures, and data points with source-linked accuracy.
  • Custom Styling: Offers built-in themes (e.g., 'doraemon', 'academic') and allows custom styles via natural language descriptions.
  • Performance Features: Includes a 'fast' mode that skips RAG indexing for quicker previews, and parallel generation (--parallel) for multi-worker processing.
  • Checkpoint & Resume: Automatically saves progress at each stage, enabling seamless resumption or modification of specific pipeline steps.
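The --parallel option spreads generation across several workers. Each slide render mostly waits on a remote image model, so a thread pool is one plausible way to implement this. The sketch below assumes that design and is not taken from the project. The functions render_slide and render_all are hypothetical, and so is the parallel parameter, which is only named after the --parallel flag. The style argument stands in for either a built-in theme name such as 'academic' or a free-form style description.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def render_slide(slide: dict, style: str) -> str:
    """Hypothetical stand-in for one remote image-model call that renders a slide."""
    time.sleep(0.05)  # simulate network latency
    return f"<{style}> {slide['title']}"

def render_all(slides: list[dict], style: str = "academic", parallel: int = 1) -> list[str]:
    """Render every slide with up to `parallel` worker threads.

    Executor.map yields results in input order, so slide order is preserved
    even when later slides finish first.
    """
    if parallel <= 1:
        return [render_slide(s, style) for s in slides]
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda s: render_slide(s, style), slides))

if __name__ == "__main__":
    slides = [{"title": f"Slide {i + 1}"} for i in range(8)]
    for workers in (1, 4):
        start = time.perf_counter()
        out = render_all(slides, style="doraemon", parallel=workers)
        print(workers, "worker(s):", round(time.perf_counter() - start, 2), "s,", out[0])
```

Threads suit this case because the work is I/O-bound. CPU-heavy local rendering would be better served by a process pool.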

Maintenance & Community

The project has recently been open-sourced (Dec 2025). Communication channels are available via Feishu and WeChat groups.

Licensing & Compatibility

The project is released under the MIT License, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The 'fast' mode bypasses RAG indexing, which may reduce the depth of context retrieval for complex or lengthy documents. Image generation depends on a specific preview model (gemini-3-pro-image-preview). Custom styles written as natural-language prompts may need careful wording to produce the intended look. Valid API keys must be configured before the tool will run.

Health Check
Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 15
Star History: 1,087 stars in the last 30 days
