Auto-Slides by Westlake-AGI-Lab

Automatic academic presentation generation from research papers

Created 9 months ago

424 stars

Top 69.6% on SourcePulse

Project Summary

Auto-Slides automatically converts academic research papers into structured, pedagogically optimized presentation slides. It targets researchers and academics, and uses large language models and cognitive science principles to generate multimodal presentations that can be customized interactively, reducing the manual work of turning a long paper into slides.

How It Works

The system uses a multi-agent framework with specialized agents for PDF content extraction (OCR and layout analysis), presentation planning (informed by cognitive science), verification, and repair. Combined with LLMs, these agents generate presentation-oriented narratives optimized for learning. Multimodal output keeps figures, tables, and code properly formatted, and an interactive dialogue interface allows real-time slide refinement.
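The extract, plan, verify, and repair stages described above can be pictured as a simple pipeline. The following is a minimal sketch only, not the project's implementation: every name in it (Slide, extract, plan, verify, repair, run_pipeline) is hypothetical, and each "agent" is a plain rule-based function standing in for what the real system does with OCR, layout analysis, and LLM calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    title: str
    bullets: List[str] = field(default_factory=list)


def extract(paper_text: str) -> List[str]:
    """Stand-in for the extraction agent: split text into non-empty sections."""
    return [p.strip() for p in paper_text.split("\n\n") if p.strip()]


def plan(sections: List[str]) -> List[Slide]:
    """Stand-in for the planning agent: one slide per section, first line as title."""
    slides = []
    for section in sections:
        lines = section.splitlines()
        slides.append(Slide(title=lines[0], bullets=lines[1:]))
    return slides


def verify(slide: Slide, max_bullets: int = 3) -> bool:
    """Stand-in for the verification agent: reject overcrowded slides."""
    return len(slide.bullets) <= max_bullets


def repair(slide: Slide, max_bullets: int = 3) -> List[Slide]:
    """Stand-in for the repair agent: split an overcrowded slide into several."""
    chunks = [
        slide.bullets[i:i + max_bullets]
        for i in range(0, len(slide.bullets), max_bullets)
    ]
    return [
        Slide(title=slide.title if i == 0 else f"{slide.title} (cont.)", bullets=chunk)
        for i, chunk in enumerate(chunks)
    ]


def run_pipeline(paper_text: str) -> List[Slide]:
    """Plan slides from extracted sections, repairing any that fail verification."""
    deck: List[Slide] = []
    for slide in plan(extract(paper_text)):
        deck.extend([slide] if verify(slide) else repair(slide))
    return deck
```

The point of the sketch is the control flow: verification sits between planning and output, and only slides that fail it are sent to repair. In the real system the checks and fixes are presumably far richer than a bullet-count limit.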

Quick Start & Requirements

Installation involves cloning the repository, setting up a virtual environment, and running pip install -r requirements.txt. A required step is downloading the marker-pdf model with python down_model.py (about 2 GB). Prerequisites are Python 3.8+, a LaTeX environment, and an OpenAI API key; the marker-pdf model needs at least 8 GB of RAM. Key resources are the project page (auto-slides.github.io) and the arXiv paper (arXiv:2509.11062).
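Before installing, it can help to confirm that the prerequisites listed above are in place. The script below is a small sketch that is not part of the project. It assumes the LaTeX compiler is available as pdflatex and that the API key is supplied through the OPENAI_API_KEY environment variable; the project may use a different compiler or read the key from a configuration file. It does not check available RAM.

```python
import os
import shutil
import sys
from typing import Dict, Mapping, Optional


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    version_info=None,
    which=shutil.which,
) -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied.

    The parameters exist only so the check can be exercised without
    touching the real environment; the defaults inspect the running system.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    return {
        "python_3_8_plus": tuple(version_info[:2]) >= (3, 8),
        # Assumption: pdflatex is the compiler used for the Beamer output.
        "latex_on_path": which("pdflatex") is not None,
        # Assumption: the key is read from this environment variable.
        "openai_api_key_set": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Running the file directly prints one line per prerequisite, which makes a missing LaTeX installation or API key visible before the roughly 2 GB model download starts.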

Highlighted Details

Intelligent PDF Processing: Extracts text, figures, tables, and structure using OCR and layout analysis.
Multi-Agent Framework: Specialized agents for extraction, planning, verification, and repair.
Interactive Customization: Real-time refinement via natural language dialogue.
Pedagogical Optimization: Creates learning-enhancing presentation narratives.
Multimodal Output: Generates LaTeX Beamer slides with proper formatting for figures, tables, and code.
Thematic & Language Support: Multiple Beamer themes (e.g., Madrid) and bilingual (English/Chinese) support, with optional speech generation.
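To make the LaTeX Beamer output concrete, here is a minimal sketch of how slide content might be rendered into a Beamer document using the Madrid theme mentioned above. The helper names (escape_latex, render_deck) are hypothetical and do not come from the project. The sketch covers only titles and bullet lists, not figures, tables, or code.

```python
from typing import List, Sequence, Tuple

# Characters that have special meaning in LaTeX and their safe replacements.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one at a time to avoid double escaping."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_deck(
    title: str,
    frames: Sequence[Tuple[str, List[str]]],
    theme: str = "Madrid",
) -> str:
    """Render (frame_title, bullets) pairs as a Beamer document string."""
    lines = [
        r"\documentclass{beamer}",
        rf"\usetheme{{{theme}}}",
        rf"\title{{{escape_latex(title)}}}",
        r"\begin{document}",
        r"\frame{\titlepage}",
    ]
    for frame_title, bullets in frames:
        lines.append(rf"\begin{{frame}}{{{escape_latex(frame_title)}}}")
        if bullets:  # an itemize environment with no items is a LaTeX error
            lines.append(r"\begin{itemize}")
            lines.extend(rf"\item {escape_latex(b)}" for b in bullets)
            lines.append(r"\end{itemize}")
        lines.append(r"\end{frame}")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

Writing the returned string to a .tex file and compiling it with a LaTeX installation that includes Beamer yields a title page plus one frame per entry. Escaping matters here because text lifted from papers routinely contains characters such as %, &, and _.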

Maintenance & Community

Associated with AGI Lab, Westlake University, and UC Merced. Community interaction is via GitHub Issues. Project page and contact details are provided.

Licensing & Compatibility

Released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The PDF processing model requires at least 8 GB of RAM. Generation relies on the OpenAI API, which may incur usage costs. A LaTeX installation is mandatory. Acceptable performance may require disabling verification or interactive features.
