DNA-Diffusion by pinellolab

Generative modeling of regulatory DNA sequences using diffusion

Created 3 years ago
411 stars

Top 71.1% on SourcePulse

Project Summary

DNA-Diffusion is a Python library for generating synthetic regulatory DNA sequences using diffusion probabilistic models. It is designed for researchers and bioinformaticians working with genomics and synthetic biology, enabling the creation of cell-type-specific DNA elements for experimental validation or design.

How It Works

The project uses diffusion probabilistic models, a class of generative models trained to reverse a forward diffusion process that gradually adds noise to data; new samples are then produced by starting from noise and denoising step by step. This approach is intended to generate realistic DNA sequences that capture the complex patterns found in regulatory elements. The model is trained on chromatin accessibility data to learn cell-type-specific sequence characteristics.
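
The noising half of that process can be illustrated with a minimal NumPy sketch. This is not DNA-Diffusion's own code or API: the helper names (one_hot, decode, linear_alpha_bar, q_sample), the {-1, +1} encoding, the linear noise schedule, and the 1000-step horizon are all assumptions chosen for illustration, following the standard Gaussian (DDPM-style) forward process. A trained model would learn to undo these steps; that part is omitted here.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed nucleotide order for the illustrative encoding


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (4, L) array with values in {-1, +1}."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    x = -np.ones((4, len(seq)))
    x[idx, np.arange(len(seq))] = 1.0
    return x


def decode(x: np.ndarray) -> str:
    """Map a (4, L) array back to a DNA string by taking the argmax per position."""
    return "".join(ALPHABET[i] for i in x.argmax(axis=0))


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
             rng: np.random.Generator):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


rng = np.random.default_rng(0)
seq = "".join(rng.choice(list(ALPHABET), size=200))  # random 200 bp toy sequence
x0 = one_hot(seq)
alpha_bar = linear_alpha_bar(1000)

x_early, _ = q_sample(x0, 10, alpha_bar, rng)   # lightly noised
x_late, _ = q_sample(x0, 999, alpha_bar, rng)   # almost pure noise

# Fraction of positions where the noised array still decodes to the original base.
match_early = np.mean([a == b for a, b in zip(decode(x_early), seq)])
match_late = np.mean([a == b for a, b in zip(decode(x_late), seq)])
print(match_early, match_late)
```

At an early step the sequence is still recoverable by argmax, while at the final step the signal is essentially gone and agreement with the original falls to roughly chance level. Generation runs the other way: a denoising network starts from noise and is applied step by step, optionally conditioned on a cell-type label.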

Quick Start & Requirements

  • Install by cloning the repository and running uv sync in its root directory.
  • Recommended: Linux with a recent GPU (e.g., A100).
  • Compatible with CPU, but GPU is preferred for performance.
  • Documentation: https://pinellolab.github.io/DNA-Diffusion

Highlighted Details

  • Generates 200bp cell-type-specific synthetic regulatory elements.
  • Provides scripts for both training and sequence generation.
  • Supports debugging with a single-sequence training configuration.

Maintenance & Community

  • Key contributors include Lucas Ferreira da Silva and Luca Pinello.
  • Follows the all-contributors specification, welcoming contributions.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • While compatible with CPU, performance is significantly better on a recent GPU.
  • The README does not specify the exact license, which may impact commercial use or integration into closed-source projects.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 1
  • Star history: 1 star in the last 30 days
