OmniGenBench by COLA-Laboratory

Genomic foundation model toolkit

Created 11 months ago
352 stars

Top 79.1% on SourcePulse

View on GitHub
Project Summary

OmniGenBench provides a unified platform for fine-tuning, inference, and benchmarking of genomic foundation models (GFMs). It targets researchers and developers in genomics, enabling reproducible evaluations and efficient workflows for diverse DNA and RNA sequence modeling tasks.

How It Works

OmniGenBench employs a modular design, allowing users to easily integrate and evaluate over 30 GFMs. It supports automated benchmarking across five curated suites (RGB, BEACON, PGB, GUE, GB), covering sequence- and structure-level tasks. The platform facilitates fine-tuning and inference through both CLI and Python APIs, leveraging libraries like Hugging Face Transformers and Accelerate for efficient execution.
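
As an illustration of the Python API mentioned above, here is a minimal, hedged sketch of the documented AutoBench(...).run(...) flow; the keyword argument names and the model identifier are assumptions for illustration and may differ from the released API.

    # Minimal sketch of automated benchmarking via the Python API.
    # Keyword names below are assumed; consult the OmniGenBench docs for the exact signature.
    from omnigenbench import AutoBench

    bench = AutoBench(
        benchmark="RGB",                                # one of the five curated suites (assumed kwarg name)
        model_name_or_path="yangheng/OmniGenome-186M",  # hypothetical Hugging Face model id
    )
    bench.run()  # pulls datasets and weights from the Hugging Face Hub, then evaluates the model

The documented CLI entry point (autobench --model_name_or_path ...) runs the same automated benchmarking pipeline without writing Python.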

Quick Start & Requirements

  • Installation: pip install omnigenbench -U or install from source.
  • Prerequisites: Python 3.10+, PyTorch 2.5+, Transformers 4.46.0+ (a quick version check is sketched after this list).
  • Usage: Automated benchmarking via CLI (autobench --model_name_or_path ...) or Python API (AutoBench(...).run(...)).
  • Resources: Requires downloading models and benchmark datasets from Hugging Face Hub.
  • Links: Installation, Getting Started, Tutorials.
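
As referenced in the prerequisites above, a small illustrative check of the documented minimum versions before launching a benchmark; this snippet is not part of OmniGenBench, and packaging is assumed to be available as a dependency of Transformers.

    # Illustrative sanity check for the documented prerequisites; not part of the OmniGenBench API.
    import sys

    import torch
    import transformers
    from packaging import version

    assert sys.version_info >= (3, 10), "OmniGenBench documents Python 3.10+"
    assert version.parse(torch.__version__) >= version.parse("2.5"), "OmniGenBench documents PyTorch 2.5+"
    assert version.parse(transformers.__version__) >= version.parse("4.46.0"), "OmniGenBench documents Transformers 4.46.0+"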

Highlighted Details

  • Supports over 30 genomic foundation models for both RNA and DNA modalities.
  • Features five benchmark suites covering sequence- and structure-level genomics tasks.
  • Includes tutorials for RNA design and secondary structure prediction.
  • Offers plug-and-play evaluation for models like OmniGenome, HyenaDNA, and DNABERT-2.

Maintenance & Community

The project is actively maintained by the COLA-Laboratory. Contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project requires recent versions of PyTorch (2.5+) and Transformers (4.46.0+), which may necessitate careful environment management. Benchmark datasets and model weights must be downloaded separately from the Hugging Face Hub.

Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
15
Star History
1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Vincent Weisser (cofounder of Prime Intellect), and 2 more.

evo by evo-design

Top 0.3% on SourcePulse
1k stars

DNA foundation model for long-context biological sequence modeling and design
Created 1 year ago
Updated 1 day ago