AutoSurvey by AutoSurveys

Framework for automated literature surveys (NeurIPS 2024 paper)

Created 1 year ago
418 stars

Top 70.3% on SourcePulse

Project Summary

AutoSurvey is an automated framework for generating comprehensive literature surveys with large language models. It is aimed at researchers and academics who want to streamline the synthesis of existing research on a given topic, and its authors report high citation and content quality for the generated surveys.

How It Works

AutoSurvey uses LLMs to automate survey creation through a structured process. It takes a Retrieval-Augmented Generation (RAG) approach, drawing on a large database of arXiv paper abstracts to ground the generated text. Parameters control the survey length, the section structure, and the number of references used for outline generation and for RAG, so the output can be tailored to the topic.
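The retrieval step of a RAG pipeline like the one described above can be sketched as follows. This is a minimal illustration, not AutoSurvey's code: the `embed`, `retrieve`, and `build_prompt` helpers and the three sample abstracts are hypothetical, and a toy bag-of-words vector stands in for a real embedding model so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; a real run would draw on the arXiv abstract database.
ABSTRACTS = {
    "Paper A": "Retrieval augmented generation grounds language models in retrieved documents.",
    "Paper B": "Convolutional networks for image classification on large datasets.",
    "Paper C": "Language models generate text; retrieval improves factual accuracy of generation.",
}

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, abstracts, k=2):
    """Return the titles of the k abstracts most similar to the query."""
    q = embed(query)
    ranked = sorted(abstracts, key=lambda t: cosine(q, embed(abstracts[t])), reverse=True)
    return ranked[:k]

def build_prompt(topic, abstracts, k=2):
    """Assemble an LLM prompt that restricts citations to the retrieved papers."""
    titles = retrieve(topic, abstracts, k)
    context = "\n".join(f"[{i + 1}] {t}: {abstracts[t]}" for i, t in enumerate(titles))
    return f"Write a survey section on '{topic}'. Cite only these papers:\n{context}"
```

Calling `build_prompt("retrieval augmented generation for language models", ABSTRACTS)` produces a prompt whose context holds only the two most relevant abstracts. The `k` argument plays the role of the "number of references" parameter mentioned above.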

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.10.x, a database of arXiv paper abstracts (provided via a OneDrive link), an OpenAI API key, and a GPU.
  • Setup: Clone the repository, install the dependencies, then download and unzip the database.
  • Docs: NeurIPS 2024 Paper
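A small preflight check for the prerequisites listed above might look like the sketch below. It makes several assumptions that the listing does not confirm: the `check_prereqs` helper is hypothetical, the API key is assumed to live in the `OPENAI_API_KEY` environment variable, and the unzipped database is assumed to sit in a directory named `database`. The GPU check is left out because it would need a third-party library.

```python
import os
import sys

def check_prereqs(version=None, env=None, db_path="database"):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Assumptions (not confirmed by the project): the key is read from
    OPENAI_API_KEY and the database is unzipped into `db_path`.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    env = os.environ if env is None else env
    problems = []
    if version != (3, 10):
        problems.append(f"Python 3.10.x expected, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not os.path.isdir(db_path):
        problems.append(f"database directory not found: {db_path}")
    return problems
```

Running `check_prereqs()` before the first generation run surfaces a missing key or database up front rather than partway through a long job.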

Highlighted Details

  • Demonstrated high citation and content quality across various survey lengths (8k, 16k, 32k, 64k tokens).
  • Supports multiple LLMs, including gpt-4o-2024-05-13.
  • Uses nomic-ai/nomic-embed-text-v1 for embedding.
  • Includes an evaluation script to assess generated surveys.

Maintenance & Community

The project is associated with authors from Westlake University, Peking University, Nanjing University, Harbin Institute of Technology (Shenzhen), and Squirrel AI. Contributions are welcome via GitHub issues.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The framework requires an OpenAI API key and relies on a specific database of arXiv abstracts, which may not cover all research domains. The quality of the generated survey depends on the chosen LLM and on the quality of the underlying data.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days
