DNA-Diffusion by pinellolab

Generative modeling of regulatory DNA sequences using diffusion

Created 3 years ago
452 stars

Top 66.6% on SourcePulse

Project Summary

DNA-Diffusion is a Python library for generating synthetic regulatory DNA sequences using diffusion probabilistic models. It is designed for researchers and bioinformaticians working with genomics and synthetic biology, enabling the creation of cell-type-specific DNA elements for experimental validation or design.

How It Works

The project uses diffusion probabilistic models, a class of generative models that learn to reverse a diffusion process. During training, noise is gradually added to real sequences and a neural network learns to undo that corruption step by step. At generation time, sampling starts from pure noise and is denoised into a new sequence. This approach produces realistic DNA sequences that capture the complex patterns found in regulatory elements. The model is trained on chromatin accessibility data to learn cell-type-specific sequence characteristics.
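
The noising and denoising idea can be illustrated with a small, dependency-free sketch of the standard DDPM forward process applied to DNA. It assumes one-hot encoding in A, C, G, T order, a linear beta schedule with 50 steps, and the closed-form step x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps. These are illustrative assumptions, not necessarily DNA-Diffusion's exact encoding or schedule, and the helper names (`one_hot`, `linear_alpha_bars`, `q_sample`, `predict_x0`) are hypothetical.

```python
import math
import random

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return abar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    abars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        abars.append(prod)
    return abars


def q_sample(x0, abar_t, rng):
    """Closed-form forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    xt = [[a * v + s * e for v, e in zip(row, erow)] for row, erow in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, abar_t):
    """Recover an estimate of x0 from x_t and a predicted noise eps_hat."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [[(v - s * e) / a for v, e in zip(row, erow)] for row, erow in zip(xt, eps_hat)]


rng = random.Random(0)
x0 = one_hot("ACGTTGCA")
abars = linear_alpha_bars(50)
xt, eps = q_sample(x0, abars[-1], rng)
# A trained denoiser would supply eps_hat; here the true noise stands in as an oracle.
x0_hat = predict_x0(xt, eps, abars[-1])
```

In a real diffusion model, the network is trained to predict `eps` from `x_t` and the step index. Passing the true noise here only shows that the forward step is exactly invertible when the noise is known.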

Quick Start & Requirements

  • Install via uv sync after cloning the repository.
  • Recommended: Linux with a recent GPU (e.g., A100).
  • Compatible with CPU, but GPU is preferred for performance.
  • Documentation: https://pinellolab.github.io/DNA-Diffusion

Highlighted Details

  • Generates 200bp cell-type-specific synthetic regulatory elements.
  • Provides scripts for both training and sequence generation.
  • Supports debugging with a single-sequence training configuration.
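
Producing 200bp elements implies a final step that turns the model's continuous per-position output back into a DNA string. The sketch below assumes a 200 x 4 score matrix in A, C, G, T order and decodes it by taking the argmax at each position. It then runs two quick sanity checks. The helpers `decode`, `gc_content`, and `is_valid_element` are hypothetical illustrations, not part of DNA-Diffusion's API, and random scores stand in for real model output.

```python
import random

BASES = "ACGT"  # assumed channel order, matching a one-hot (A, C, G, T) layout


def decode(scores):
    """Pick the highest-scoring base at each position (argmax over 4 channels)."""
    return "".join(BASES[max(range(len(BASES)), key=row.__getitem__)] for row in scores)


def gc_content(seq):
    """Fraction of G and C bases, a quick sanity statistic for generated elements."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_valid_element(seq, length=200):
    """Check the expected length and that only A, C, G, T appear."""
    return len(seq) == length and set(seq) <= set(BASES)


# Stand-in for model output: random scores with shape 200 x 4.
rng = random.Random(1)
scores = [[rng.random() for _ in BASES] for _ in range(200)]
element = decode(scores)
print(is_valid_element(element), round(gc_content(element), 2))
```

Argmax decoding is the simplest way to map continuous outputs to bases. A real pipeline might also compare statistics such as GC content or motif occurrences against the training set for the target cell type.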

Maintenance & Community

  • Key contributors include Lucas Ferreira da Silva and Luca Pinello.
  • Follows the all-contributors specification, welcoming contributions.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • While compatible with CPU, performance is significantly better on a recent GPU.
  • The README does not specify the exact license, which may impact commercial use or integration into closed-source projects.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 30 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 2 more.

evo by evo-design

0.3%
1k
DNA foundation model for long-context biological sequence modeling and design
Created 1 year ago
Updated 1 month ago