data-to-paper by Technion-Kishony-lab

AI framework for end-to-end scientific research, from data to paper

Created 2 years ago

758 stars

Top 45.9% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

This project provides an AI-driven framework for automating scientific research, from raw data to a complete, human-verifiable research paper. It targets researchers and scientists seeking to accelerate discovery and enhance transparency in AI-assisted research, offering end-to-end, field-agnostic capabilities with backward traceability.

How It Works

The framework uses interacting AI agents to carry out each stage of the scientific process: data exploration, literature review, hypothesis generation, data analysis, interpretation, and paper writing. Its core innovation is "data-chaining," which produces backward-traceable manuscripts: each numerical value in the paper can be traced to the specific lines of code that generated it, so readers can verify results. Users can run the framework fully autonomously or use the Copilot App to oversee, guide, and review the process.
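To make the data-chaining idea concrete, here is a minimal Python sketch of how a computed number might carry a pointer back to the line that produced it. This is an illustration only, not the data-to-paper implementation: `TraceRegistry`, `TracedValue`, `record`, `cite`, and `lookup` are hypothetical names invented for this example, and the sample data is made up.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedValue:
    """A number paired with the location of the code that produced it."""
    label: str
    value: float
    filename: str
    lineno: int


class TraceRegistry:
    """Hypothetical registry for illustration; not part of the data-to-paper API."""

    def __init__(self):
        self._values = {}

    def record(self, label, value):
        # Capture the caller's frame so the value points back to its source line.
        caller = inspect.stack()[1]
        self._values[label] = TracedValue(label, value, caller.filename, caller.lineno)
        return value

    def cite(self, label, fmt="{:.2f}"):
        # Render the value for the manuscript with a pointer to its origin.
        traced = self._values[label]
        return f"{fmt.format(traced.value)} [{label} @ line {traced.lineno}]"

    def lookup(self, label):
        # Return the full provenance record for a label.
        return self._values[label]


registry = TraceRegistry()
ages = [34, 41, 29, 52]  # made-up sample data
mean_age = registry.record("mean_age", sum(ages) / len(ages))
sentence = f"The mean age was {registry.cite('mean_age', '{:.1f}')} years."
```

In this sketch, every number that reaches the text goes through `cite`, so a reader can follow the label and line number back to the analysis code. A real system would likely render the pointer as a hyperlink rather than inline text.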

Quick Start & Requirements

Install via pip: pip install data-to-paper
Run: data-to-paper
Dependencies: Refer to INSTALL for detailed requirements.
Additional resources: Example AI-created paper, Copilot App DEMO, data-chaining DEMO

Highlighted Details

End-to-end, field-agnostic research automation.
Backward-traceable manuscripts with click-to-code lineage.
Autonomous or human-guided (Copilot App) research modes.
Built-in coding guardrails to minimize LLM errors.
Implemented based on the NEJM AI paper "Autonomous LLM-Driven Research — from Data to Human-Verifiable Research Papers."

Maintenance & Community

Developed by the Technion-Kishony-lab.
Open to contributions and feedback for extending the framework.
Currently designed for relatively simple research goals and datasets.

Licensing & Compatibility

License details are not explicitly stated in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Users assume all risks associated with using the software, including LLM-generated code execution and potential data loss. Accountability for manuscript rigor, quality, and ethics rests solely with the user, requiring human oversight and expert vetting. The process is not error-proof, and users are responsible for API token costs. AI-created manuscripts are watermarked and should not be altered.

Health Check

Last Commit

7 months ago

Responsiveness

1 day

Star History

9 stars in the last 30 days