storm by stanford-oval

LLM system for automated knowledge curation and article generation

Created 1 year ago

27,928 stars

Top 1.4% on SourcePulse

10 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Casper Hansen

Author of AutoAWQ

Pawel Garbacki

Cofounder of Fireworks AI

Gabriel Almeida

Cofounder of Langflow

and 6 more!

Project Summary

STORM is an LLM-powered system for automated knowledge curation, designed to research a topic and generate comprehensive, citation-backed reports. It targets researchers, content creators, and anyone needing to synthesize information, offering a significant head start in the pre-writing phase of article generation. The system's novelty lies in its multi-perspective question-asking and simulated conversation approaches to improve research depth and breadth.

How It Works

STORM splits article generation into a research (pre-writing) stage and a writing stage. The research stage relies on two techniques. "Perspective-Guided Question Asking" analyzes existing articles on similar topics to discover perspectives, which then steer the questions the system asks. "Simulated Conversation" has an LLM play both a writer and a topic expert: the expert's answers are grounded in retrieved sources, and the writer uses them to refine its understanding and ask follow-up questions. Co-STORM extends this with a collaborative discourse protocol involving a moderator, question-answering agents, and human users, and it maintains a shared mind map for conceptual clarity.
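The research loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not STORM's actual implementation: `simulate_conversations`, `Turn`, and the three stub callables are hypothetical names, and the stubs stand in for real LLM and search calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    perspective: str
    question: str
    answer: str
    sources: list[str]


def simulate_conversations(
    topic: str,
    perspectives: list[str],
    ask: Callable[[str, str, list[Turn]], str],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    max_turns: int = 2,
) -> list[Turn]:
    """For each perspective, alternate writer questions and grounded expert answers."""
    transcript: list[Turn] = []
    for perspective in perspectives:
        history: list[Turn] = []  # each perspective gets its own conversation
        for _ in range(max_turns):
            question = ask(topic, perspective, history)  # writer role
            sources = retrieve(question)                 # grounding step
            reply = answer(question, sources)            # expert role
            turn = Turn(perspective, question, reply, sources)
            history.append(turn)
            transcript.append(turn)
    return transcript


# Hypothetical stubs; a real system would call an LLM and a search engine here.
def stub_ask(topic: str, perspective: str, history: list[Turn]) -> str:
    if not history:
        return f"As a {perspective}, what should I know about {topic}?"
    return f"As a {perspective}, can you expand on: {history[-1].answer}"


def stub_retrieve(question: str) -> list[str]:
    return ["https://example.com/source"]


def stub_answer(question: str, sources: list[str]) -> str:
    return f"Summary drawn from {len(sources)} source(s)"


turns = simulate_conversations(
    "solar sails", ["historian", "engineer"], stub_ask, stub_retrieve, stub_answer
)
for t in turns:
    print(f"[{t.perspective}] Q: {t.question}")
```

Keeping a separate history per perspective is a design choice of this sketch: each follow-up question builds only on the answers given within that perspective's conversation.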

Quick Start & Requirements

Install: pip install knowledge-storm
Prerequisites: Python 3.11+, API keys for LLMs (e.g., OpenAI, Azure) and search engines (e.g., Bing, You.com).
Setup: Requires configuring API keys in secrets.toml. Example scripts are provided for STORM and Co-STORM.
Docs: STORM Paper, Co-STORM Paper, Website

Highlighted Details

Supports numerous LLM and embedding models via litellm integration.
Extensive retrieval module support including YouRM, BingSearch, VectorRM, and more.
Co-STORM enables human-AI collaborative knowledge curation with a discourse protocol and dynamic mind map.
Includes datasets like FreshWiki and WildSeek for research and evaluation.
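To make the "dynamic mind map" idea more concrete, here is a minimal sketch of a hierarchical concept tree that files collected snippets under nested concepts. It assumes a simple tree structure for illustration; `ConceptNode`, `insert`, and `outline` are hypothetical names and do not reflect Co-STORM's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    name: str
    snippets: list[str] = field(default_factory=list)
    children: dict[str, "ConceptNode"] = field(default_factory=dict)

    def insert(self, path: list[str], snippet: str) -> None:
        """File a snippet under the concept at path, creating nodes as needed."""
        node = self
        for name in path:
            if name not in node.children:
                node.children[name] = ConceptNode(name)
            node = node.children[name]
        node.snippets.append(snippet)

    def outline(self, depth: int = 0) -> list[str]:
        """Return an indented outline with the snippet count per concept."""
        lines = [f"{'  ' * depth}{self.name} ({len(self.snippets)})"]
        for child in self.children.values():
            lines.extend(child.outline(depth + 1))
        return lines


root = ConceptNode("solar sails")
root.insert(["history"], "Early proposals for light-driven propulsion.")
root.insert(["engineering", "materials"], "Thin reflective films are a common choice.")
root.insert(["engineering", "materials"], "Sail mass strongly affects acceleration.")

mind_map_outline = root.outline()
print("\n".join(mind_map_outline))
```

A tree keyed by concept name keeps related findings together as the discussion grows, which is one plausible way a shared map could help participants keep track of what has been covered.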

Maintenance & Community

The project is actively developed; recent updates include litellm integration and the Co-STORM release. Contributions are welcome via issues and pull requests. The listed contacts are Yijia Shao and Yucheng Jiang.

Licensing & Compatibility

The FreshWiki dataset is licensed under CC BY-SA. The README does not explicitly state the code's license; the project is a research preview from Stanford. Before any commercial use, review the licenses of the underlying models and any license the project itself specifies.

Limitations & Caveats

The generated articles are useful at the pre-writing stage but may need significant editing before they are publication-ready. The README points to specific branches for replicating the paper results, which suggests that the main branch may differ from the original experimental setups.

Health Check

Last Commit: 4 months ago
Responsiveness: 1 week

Star History

166 stars in the last 30 days