Auto-Slides by Westlake-AGI-Lab

Automatic academic presentation generation from research papers

Created 5 months ago
265 stars

Top 96.4% on SourcePulse

Project Summary

Auto-Slides is a system that automatically converts academic research papers into structured, pedagogically optimized presentation slides. Aimed at researchers and academics, it uses large language models and cognitive science principles to generate multimodal presentations with interactive customization, reducing the effort of building slides from lengthy papers.

How It Works

The system utilizes a multi-agent framework with specialized agents for PDF content extraction (OCR, layout analysis), presentation planning (informed by cognitive science), verification, and repair. This approach, combined with LLMs, generates presentation-oriented narratives optimized for learning. Multimodal output ensures proper formatting for figures, tables, and code, while an interactive dialogue interface allows real-time slide refinement.
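The extract, plan, verify, and repair flow described above can be illustrated with a minimal sketch. Everything here is hypothetical: the function names, the Slide type, and the MAX_BULLETS rule are stand-ins chosen for illustration, and the real agents rely on an LLM and a PDF parser rather than string splitting. The point is only the control flow, in which verification findings drive a bounded repair loop.

```python
from dataclasses import dataclass, field
from typing import List

MAX_BULLETS = 3  # assumed verification rule, not taken from the project


@dataclass
class Slide:
    title: str
    bullets: List[str] = field(default_factory=list)


def extract(paper_text: str) -> List[str]:
    """Stand-in for the extraction agent: split text into sections."""
    return [s.strip() for s in paper_text.split("\n\n") if s.strip()]


def plan(sections: List[str]) -> List[Slide]:
    """Stand-in for the planning agent: one slide per section."""
    slides = []
    for section in sections:
        lines = section.splitlines()
        slides.append(Slide(title=lines[0], bullets=lines[1:]))
    return slides


def verify(slide: Slide) -> List[str]:
    """Stand-in for the verification agent: return a list of problems."""
    problems = []
    if not slide.bullets:
        problems.append("empty")
    if len(slide.bullets) > MAX_BULLETS:
        problems.append("too_long")
    return problems


def repair(slide: Slide, problems: List[str]) -> Slide:
    """Stand-in for the repair agent: fix each reported problem."""
    bullets = list(slide.bullets)
    if "empty" in problems:
        bullets = ["(content to be added)"]
    if "too_long" in problems:
        bullets = bullets[:MAX_BULLETS]
    return Slide(slide.title, bullets)


def run_pipeline(paper_text: str, max_rounds: int = 2) -> List[Slide]:
    """Plan slides, then verify and repair until clean or out of rounds."""
    slides = plan(extract(paper_text))
    for _ in range(max_rounds):
        issues = [verify(s) for s in slides]
        if not any(issues):
            break
        slides = [repair(s, p) if p else s for s, p in zip(slides, issues)]
    return slides
```

Bounding the loop with max_rounds keeps a repair step that cannot satisfy the verifier from running indefinitely, which matters when each round would cost an LLM call.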

Quick Start & Requirements

Installation involves cloning the repository, setting up a virtual environment, and running pip install -r requirements.txt. A critical step is downloading the marker-pdf model by running python down_model.py (a download of about 2GB). Prerequisites include Python 3.8+, a LaTeX environment, and an OpenAI API key. The marker-pdf model requires 8GB+ RAM. Key resources are the project page (auto-slides.github.io) and the arXiv paper (arXiv:2509.11062).
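Before installing, it can help to confirm the listed prerequisites are present. The following is a rough preflight sketch, not part of Auto-Slides. It assumes that pdflatex is the LaTeX binary to look for and that the API key is supplied through the OPENAI_API_KEY environment variable; the project may expect a different LaTeX engine or read the key from a config file instead. The 8GB RAM requirement is not checked, since the standard library has no portable way to query it.

```python
import os
import shutil
import sys
from typing import Dict, Mapping, Optional


def check_prerequisites(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # Assumption: pdflatex on PATH indicates a usable LaTeX environment.
        "latex": shutil.which("pdflatex") is not None,
        # Assumption: the key is provided via this environment variable.
        "openai_api_key": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```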

Highlighted Details

  • Intelligent PDF Processing: Extracts text, figures, tables, and structure using OCR and layout analysis.
  • Multi-Agent Framework: Specialized agents for extraction, planning, verification, and repair.
  • Interactive Customization: Real-time refinement via natural language dialogue.
  • Pedagogical Optimization: Creates learning-enhancing presentation narratives.
  • Multimodal Output: Generates LaTeX Beamer slides with proper formatting for figures, tables, and code.
  • Thematic & Language Support: Multiple Beamer themes (e.g., Madrid) and bilingual (English/Chinese) support, with optional speech generation.
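To make the Beamer output concrete, here is a small sketch of how slide content could be turned into Beamer source with a selectable theme. The render_beamer function and its arguments are hypothetical and are not the project's API. The sketch also does not escape LaTeX special characters such as % or &, which a real generator would need to handle.

```python
from typing import List, Tuple


def render_beamer(
    title: str,
    frames: List[Tuple[str, List[str]]],
    theme: str = "Madrid",
) -> str:
    """Build a Beamer document from (frame title, bullets) pairs."""
    lines = [
        r"\documentclass{beamer}",
        rf"\usetheme{{{theme}}}",
        rf"\title{{{title}}}",
        r"\begin{document}",
        r"\frame{\titlepage}",
    ]
    for frame_title, bullets in frames:
        lines.append(rf"\begin{{frame}}{{{frame_title}}}")
        if bullets:  # an empty itemize environment is a LaTeX error
            lines.append(r"\begin{itemize}")
            lines.extend(rf"\item {b}" for b in bullets)
            lines.append(r"\end{itemize}")
        lines.append(r"\end{frame}")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

The returned string can be written to a .tex file and compiled with a LaTeX engine; changing the theme argument is enough to switch the look of the whole deck.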

Maintenance & Community

The project is associated with the AGI Lab at Westlake University and with UC Merced. Community interaction happens through GitHub Issues. A project page and contact details are provided.

Licensing & Compatibility

Released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The PDF processing model (marker-pdf) requires 8GB+ RAM. The system relies on the OpenAI API, which may incur usage costs. A LaTeX installation is mandatory. Achieving acceptable performance may require disabling verification or interactive features.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 6
  • Star History: 233 stars in the last 30 days
