BioReason by bowang-lab

DNA-LLM for biological reasoning

Created 3 months ago
280 stars

Top 93.0% on SourcePulse

Project Summary

BioReason addresses the challenge of deep, interpretable biological reasoning from genomic data by integrating a DNA foundation model with a large language model (LLM). This multimodal architecture lets the LLM process genomic information directly and produce biologically intuitive explanations for its deductions.

How It Works

BioReason trains the LLM for multi-step reasoning by combining supervised fine-tuning with targeted reinforcement learning. This training incentivizes the LLM to generate logical, biologically coherent deductions while treating genomic data as a fundamental input. Pairing a DNA foundation model with an LLM is reported to yield performance gains over single-modality baselines.
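
The summary does not say how the DNA model's output reaches the LLM. One common pattern in multimodal models, assumed here purely for illustration, is to project the DNA embeddings to the LLM's embedding width and splice them into the text token sequence. The sketch below uses toy sizes and plain Python lists; every name in it (linear_project, fuse_inputs, dna_position) is hypothetical and not taken from the BioReason codebase.

```python
# Illustrative sketch only: one hypothetical way to hand DNA embeddings to an LLM.
# None of these names or sizes come from the BioReason codebase.
import random


def linear_project(vec, weights):
    """Map a vector to a new width with a plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse_inputs(text_embeds, dna_embeds, weights, dna_position):
    """Project DNA embeddings to the LLM width and splice them into the text sequence."""
    projected = [linear_project(v, weights) for v in dna_embeds]
    return text_embeds[:dna_position] + projected + text_embeds[dna_position:]


random.seed(0)
DNA_DIM, LLM_DIM = 4, 6  # toy sizes; real models are far wider
weights = [[random.uniform(-1, 1) for _ in range(DNA_DIM)] for _ in range(LLM_DIM)]
text_embeds = [[0.0] * LLM_DIM for _ in range(3)]  # stand-in for 3 text tokens
dna_embeds = [[1.0] * DNA_DIM for _ in range(2)]   # stand-in for 2 DNA tokens

fused = fuse_inputs(text_embeds, dna_embeds, weights, dna_position=1)
print(len(fused), len(fused[0]))  # prints: 5 6
```

After fusion the sequence holds the three text positions plus the two projected DNA positions, all at the LLM width, so a downstream model could attend over both modalities in one pass.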

Quick Start & Requirements

  • Installation: clone the repository, then run pip install -e . inside the cloned directory.
  • Prerequisites: Python 3.11+; a CUDA-capable GPU is recommended for optimal performance.
  • Datasets and checkpoints are available on Hugging Face.
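
Before installing, it can help to confirm that the local environment meets the prerequisites listed above. The following is a minimal sketch using only the Python standard library. The helper name check_environment is hypothetical, and looking for nvidia-smi on PATH is a heuristic assumption about how a GPU driver shows up, not a check the project documents.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # taken from the prerequisites listed above


def check_environment(version_info=sys.version_info):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        warnings.append(
            f"Python {version_info[0]}.{version_info[1]} found; 3.11+ is required."
        )
    # Heuristic assumption: an NVIDIA driver install usually puts nvidia-smi on PATH.
    if shutil.which("nvidia-smi") is None:
        warnings.append("nvidia-smi not found; a CUDA GPU may not be available.")
    return warnings


for message in check_environment():
    print("warning:", message)
```

A missing nvidia-smi does not prove that no GPU is present, so the second warning is a hint to investigate rather than a hard failure.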

Highlighted Details

  • Achieves 97% accuracy on KEGG-based disease pathway prediction, up from an 88% baseline.
  • Demonstrates average performance gains of 15% or more over strong single-modality DNA foundation models and LLMs.
  • Generates interpretable, step-by-step biological reasoning traces for enhanced scientific insight.
  • Evaluated on novel benchmarks for gene pathway and disease prediction, and variant effect prediction.
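
To put the KEGG figures above in context, the short calculation below expresses the move from 88% to 97% accuracy as an absolute gain, a relative gain, and a reduction in error rate. It uses only the two numbers quoted above; the function name compare_accuracy is hypothetical.

```python
def compare_accuracy(model_pct, baseline_pct):
    """Summarize an accuracy improvement given two percentages."""
    absolute_points = model_pct - baseline_pct
    relative_pct = 100 * absolute_points / baseline_pct
    baseline_error = 100 - baseline_pct
    model_error = 100 - model_pct
    error_reduction_pct = 100 * (baseline_error - model_error) / baseline_error
    return {
        "absolute_points": absolute_points,
        "relative_pct": relative_pct,
        "error_reduction_pct": error_reduction_pct,
    }


result = compare_accuracy(model_pct=97, baseline_pct=88)
print(f"absolute gain: {result['absolute_points']} points")           # 9 points
print(f"relative gain: {result['relative_pct']:.1f}%")                # 10.2%
print(f"error-rate reduction: {result['error_reduction_pct']:.1f}%")  # 75.0%
```

The 9-point absolute gain corresponds to the error rate falling from 12% to 3%, which is why the error-rate reduction looks much larger than the relative accuracy gain.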

Maintenance & Community

The project is associated with researchers from the University of Toronto, Vector Institute, and University Health Network. Notable affiliations include Cohere, Arc Institute, University of California, San Francisco, and Google DeepMind.

Licensing & Compatibility

The repository does not explicitly state a license. The provided BibTeX citation points to a research paper (arXiv:2505.23579). Users should verify licensing before commercial or closed-source use.

Limitations & Caveats

The README indicates that checkpoints and vLLM integration are expected to be released soon, suggesting the project may still be under active development or in a pre-release state. Performance gains are reported against specific baseline models; broader applicability may require further validation.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
