storm by stanford-oval

LLM system for automated knowledge curation and article generation

created 1 year ago
27,054 stars

Top 1.4% on sourcepulse

Project Summary

STORM is an LLM-powered system for automated knowledge curation: given a topic, it researches it and generates a comprehensive, citation-backed report. It targets researchers, content creators, and anyone who needs to synthesize information, and it gives them a head start on the pre-writing phase of article generation. Its main contributions are multi-perspective question asking and simulated conversations, both intended to improve the depth and breadth of the research.

How It Works

STORM breaks article generation into a research (pre-writing) stage and a writing stage. In the research stage it uses two techniques. "Perspective-Guided Question Asking" surveys similar topics to discover perspectives that steer the questions it asks. "Simulated Conversation" has an LLM play both a writer and a topic expert, with the expert's answers grounded in retrieved sources, so that the writer can refine its understanding and ask follow-up questions. Co-STORM extends this with a collaborative protocol involving a moderator, question-answering agents, and human users, and it maintains a shared mind map to keep the collected concepts organized.
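The research stage described above can be sketched as a loop over perspectives, each driving a short question-and-answer exchange. This is a minimal illustration, not the project's implementation: `simulate_conversation`, `research`, `stub_ask`, and `stub_answer` are hypothetical names, and the stubs stand in for the LLM writer and the retrieval-grounded expert.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (question, answer) pairs collected during one simulated conversation.
Turn = Tuple[str, str]


@dataclass
class Conversation:
    perspective: str
    turns: List[Turn] = field(default_factory=list)


def simulate_conversation(
    topic: str,
    perspective: str,
    ask: Callable[[str, str, List[Turn]], str],
    answer: Callable[[str], str],
    max_turns: int = 3,
) -> Conversation:
    """Alternate writer questions and expert answers for one perspective."""
    convo = Conversation(perspective)
    for _ in range(max_turns):
        # The writer sees the history, so it can ask follow-up questions.
        question = ask(topic, perspective, convo.turns)
        if not question:  # an empty question ends the conversation early
            break
        convo.turns.append((question, answer(question)))
    return convo


def research(topic, perspectives, ask, answer, max_turns=3):
    """Run one simulated conversation per perspective."""
    return [
        simulate_conversation(topic, p, ask, answer, max_turns)
        for p in perspectives
    ]


# Hypothetical stand-ins: a real system would call an LLM in stub_ask and
# a search-backed, citation-producing expert in stub_answer.
def stub_ask(topic, perspective, history):
    return f"[{perspective}] question {len(history) + 1} about {topic}"


def stub_answer(question):
    return f"answer to: {question} (source: placeholder)"


if __name__ == "__main__":
    for convo in research("solar sails", ["engineer", "historian"],
                          stub_ask, stub_answer, max_turns=2):
        print(convo.perspective, len(convo.turns))
```

Passing the conversation history to `ask` is the design choice that matters here: it is what lets each new question build on earlier answers rather than repeat them.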

Quick Start & Requirements

  • Install: pip install knowledge-storm
  • Prerequisites: Python 3.11+, API keys for LLMs (e.g., OpenAI, Azure) and search engines (e.g., Bing, You.com).
  • Setup: Requires configuring API keys in secrets.toml. Example scripts are provided for STORM and Co-STORM.
  • Docs: STORM Paper, Co-STORM Paper, Website

Highlighted Details

  • Supports numerous LLM and embedding models via litellm integration.
  • Extensive retrieval module support including YouRM, BingSearch, VectorRM, and more.
  • Co-STORM enables human-AI collaborative knowledge curation with a discourse protocol and dynamic mind map.
  • Includes datasets like FreshWiki and WildSeek for research and evaluation.
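To make the idea of a dynamic mind map more concrete, here is a small sketch of a hierarchical concept tree that grows as new information arrives. It is an assumption about what such a structure could look like, not Co-STORM's actual data model; `MindMapNode`, `insert`, and `outline` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MindMapNode:
    """One concept in the tree, with attached notes and child concepts."""
    concept: str
    notes: List[str] = field(default_factory=list)
    children: Dict[str, "MindMapNode"] = field(default_factory=dict)

    def insert(self, path: List[str], note: str) -> "MindMapNode":
        """Attach a note under the given concept path, creating nodes as needed."""
        node = self
        for concept in path:
            if concept not in node.children:
                node.children[concept] = MindMapNode(concept)
            node = node.children[concept]
        node.notes.append(note)
        return node

    def outline(self, depth: int = 0) -> List[str]:
        """Return an indented outline, one line per concept."""
        lines = ["  " * depth + f"{self.concept} ({len(self.notes)} notes)"]
        for child in self.children.values():
            lines.extend(child.outline(depth + 1))
        return lines


if __name__ == "__main__":
    root = MindMapNode("solar sails")
    root.insert(["propulsion", "photon pressure"], "placeholder note A")
    root.insert(["propulsion"], "placeholder note B")
    print("\n".join(root.outline()))
```

A shared tree like this gives every participant, human or agent, the same view of which concepts have been covered and where new information belongs.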

Maintenance & Community

The project is actively developed; recent updates include litellm integration and the Co-STORM release. Contributions are welcome via issues and pull requests. The listed contacts are Yijia Shao and Yucheng Jiang.

Licensing & Compatibility

The FreshWiki dataset is licensed under CC BY-SA. The code's license is not explicitly stated in the README, but it is a research preview from Stanford. Commercial use compatibility requires careful review of the underlying model licenses and any explicit project licensing.

Limitations & Caveats

The system produces articles that are helpful in a pre-writing stage but may require significant edits to be publication-ready. The README mentions specific branches for replicating paper results, indicating potential differences between the main branch and historical experimental setups.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 3
  • Star History: 3,098 stars in the last 90 days
