POPPER  by snap-stanford

Automated validation of free-form hypotheses

Created 1 year ago
250 stars

Top 100.0% on SourcePulse

Summary

POPPER is an agentic framework for automated validation of free-form hypotheses, particularly those generated by LLMs. It addresses the impracticality of manually validating voluminous, abstract, or potentially hallucinated hypotheses by using LLM agents to design and execute falsification experiments. This offers researchers a scalable, rigorous, and significantly faster method for hypothesis validation across scientific domains, ensuring strict Type-I error control.

How It Works

POPPER leverages Karl Popper's principle of falsification. LLM agents design and execute experiments to disprove hypotheses by targeting their measurable implications. A novel sequential testing framework ensures strict Type-I error control while actively gathering evidence from diverse data sources, enabling robust validation of complex, abstract hypotheses.
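The sequential testing idea can be illustrated with a generic e-value sketch. This is an illustration of the statistical principle only, not POPPER's actual implementation; the function name and the exact stopping rule are assumptions. Multiplying independent e-values forms a test supermartingale, and by Ville's inequality, rejecting once the running product reaches 1/alpha keeps the Type-I error at or below alpha at any stopping time:

```python
# Generic sequential e-value test (illustrative sketch, not POPPER's code).
# Each falsification experiment returns an e-value: >1 is evidence against
# the null hypothesis that the tested implication is absent.

def sequential_e_test(e_values, alpha=0.1):
    """Return (reject, experiments_used) for a stream of e-values.

    Rejects as soon as the running product of e-values reaches 1/alpha,
    which bounds the Type-I error by alpha (Ville's inequality).
    """
    product = 1.0
    for i, e in enumerate(e_values, start=1):
        product *= e  # accumulate evidence across experiments
        if product >= 1.0 / alpha:
            return True, i  # enough evidence: hypothesis falsified
    return False, len(e_values)


# Three experiments, each mildly contradicting the hypothesis' implication:
reject, n_used = sequential_e_test([2.0, 3.0, 2.5], alpha=0.1)
```

The appeal of this style of test is that experiments can be run one at a time and stopped early without inflating the false-positive rate, which is what lets an agent gather evidence adaptively.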

Quick Start & Requirements

Installation is recommended within a Python virtual environment (e.g., conda create -n popper_env python=3.10). Install via pip (pip install popper_agent) or from source. Prerequisites include setting OpenAI/Anthropic API keys as environment variables. POPPER supports inference with locally served LLMs (vLLM, SGLang, llama.cpp) via OpenAI-compatible APIs. Datasets are auto-downloaded or specified via data_path. A paper (arXiv:2502.09858) and a demo are referenced by the project but not linked here.
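Collected into a script, the setup steps read roughly as follows. The conda and pip commands come from the summary above; the API-key variable names are the standard OpenAI/Anthropic ones, and the key values are placeholders:

```shell
# Create and activate an isolated environment (as recommended)
conda create -n popper_env python=3.10
conda activate popper_env

# Install from PyPI (or install from source instead)
pip install popper_agent

# Cloud LLM access: set whichever key applies (placeholder values)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```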

Highlighted Details

  • Achieved performance comparable to human scientists in biological hypothesis validation while reducing validation time roughly 10-fold.
  • Offers robust Type-I error control and high statistical power across biology, economics, and sociology.
  • Scalable architecture for handling large hypothesis/data volumes.
  • Includes a Gradio UI for interactive validation (agent.launch_UI()).

Maintenance & Community

Hosted by snap-stanford. Contact Kexin Huang (kexinh@cs.stanford.edu) or raise GitHub issues. No specific community channels (Discord/Slack) are listed.

Licensing & Compatibility

The specific open-source license is not explicitly stated, potentially impacting commercial use or integration. The framework is compatible with commercial LLM APIs and locally hosted models.

Limitations & Caveats

The absence of a declared license is a significant caveat for adoption. The system requires API keys for cloud LLMs or local LLM server setup. The README advises running benchmarks in containerized environments due to the agent's filesystem access capabilities.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

Explore Similar Projects

  • long-form-factuality by google-deepmind — Benchmark for long-form factuality in LLMs. Top 0.5%, 670 stars. Created 1 year ago; updated 1 month ago.