lamindb  by laminlabs

Data framework for scalable biological R&D

Created 4 years ago
260 stars

Top 97.4% on SourcePulse

Project Summary

LaminDB is an open-source data framework for biological R&D that addresses the need for reproducible, traceable, and validated datasets and models at scale. It targets scientists and engineers in academia and biotech, giving complex biological data the context and memory needed to turn fragmented research into a scalable, compounding process.

How It Works

The project implements a lineage-native lakehouse architecture: metadata is kept in Postgres or SQLite, with support for bio-formats such as AnnData and .zarr. It integrates directly with the pydata stack and offers a unified API for querying, tracing, and validating data. Code, compute environments, and data lineage are tracked automatically with minimal code changes, which simplifies biological data management and supports agentic R&D.
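To make the idea of lineage metadata in a SQL database concrete, here is a minimal sketch using only Python's standard library. It is not LaminDB's API or schema: the table names (run, artifact, run_input) and helper functions (open_store, start_run, save_artifact, use_artifact, upstream) are hypothetical and chosen for illustration. The sketch records which run created each artifact and which artifacts each run read, then walks that graph backwards.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory metadata store (hypothetical schema, not LaminDB's)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE run (id INTEGER PRIMARY KEY, script TEXT NOT NULL);
        CREATE TABLE artifact (
            id INTEGER PRIMARY KEY,
            key TEXT NOT NULL,
            sha256 TEXT NOT NULL,
            created_by INTEGER REFERENCES run(id)
        );
        CREATE TABLE run_input (
            run_id INTEGER REFERENCES run(id),
            artifact_id INTEGER REFERENCES artifact(id)
        );
        """
    )
    return con


def start_run(con: sqlite3.Connection, script: str) -> int:
    """Register one execution of a script and return its run id."""
    return con.execute("INSERT INTO run (script) VALUES (?)", (script,)).lastrowid


def save_artifact(con: sqlite3.Connection, key: str, content: bytes, run_id: int) -> int:
    """Store an artifact's key and content hash, linked to the run that produced it."""
    digest = hashlib.sha256(content).hexdigest()
    cur = con.execute(
        "INSERT INTO artifact (key, sha256, created_by) VALUES (?, ?, ?)",
        (key, digest, run_id),
    )
    return cur.lastrowid


def use_artifact(con: sqlite3.Connection, run_id: int, artifact_id: int) -> None:
    """Record that a run read an existing artifact as input."""
    con.execute("INSERT INTO run_input VALUES (?, ?)", (run_id, artifact_id))


def upstream(con: sqlite3.Connection, artifact_id: int) -> list[str]:
    """Return the keys of all ancestor artifacts, found with a recursive query."""
    rows = con.execute(
        """
        WITH RECURSIVE ancestors(id) AS (
            SELECT ri.artifact_id
            FROM artifact a JOIN run_input ri ON ri.run_id = a.created_by
            WHERE a.id = ?
            UNION
            SELECT ri.artifact_id
            FROM ancestors anc
            JOIN artifact a ON a.id = anc.id
            JOIN run_input ri ON ri.run_id = a.created_by
        )
        SELECT key FROM artifact WHERE id IN (SELECT id FROM ancestors) ORDER BY id
        """,
        (artifact_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Example pipeline: ingest -> qc -> train (file names are illustrative).
con = open_store()
ingest = start_run(con, "ingest.py")
raw = save_artifact(con, "raw.h5ad", b"raw counts", ingest)
qc = start_run(con, "qc.py")
use_artifact(con, qc, raw)
filtered = save_artifact(con, "filtered.h5ad", b"filtered counts", qc)
train = start_run(con, "train.py")
use_artifact(con, train, filtered)
model = save_artifact(con, "model.pt", b"weights", train)
print(upstream(con, model))
```

Because every artifact points to its producing run and every run lists its inputs, a single recursive query is enough to answer "where did this model come from?". A real system would also need to capture code versions and environments, which this sketch leaves out.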

Quick Start & Requirements

  • Install: pip install lamindb (full dependencies) or pip install lamindb-core (minimal).
  • Docs: docs.lamin.ai/setup

Highlighted Details

  • Lineage Tracking: Automatically captures inputs, outputs, code, and environments for scripts, notebooks, functions, and workflows, providing a Git-like audit trail.
  • FAIR Data: Supports validation and annotation for DataFrames, AnnData, SpatialData, parquet, and zarr, ensuring data discoverability and integrity.
  • Bio-Registries & Ontologies: Integrates with >20 public ontologies via the bionty plugin for programmatic experimental design and semantic data management.
  • Unified Access: Direct, zero-copy access to data across local, cloud storage (S3, GCP), and databases (Postgres, SQLite), avoiding REST API bottlenecks.
  • Versioning & Collaboration: Features Git-like branching and merging for change management and lineage-aware data sharing.
  • Extensibility: Allows custom plugins built on the Django ORM for tailored registries and features.
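The validation and ontology points above can be illustrated with a small, self-contained sketch. It does not use lamindb or bionty: the Registry class and curate function are hypothetical stand-ins for an ontology-backed registry, and the cell-type labels are illustrative values only. The sketch standardizes known synonyms to canonical names and reports any values that still fail validation.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy controlled vocabulary (hypothetical, not the bionty API)."""

    names: set[str]
    synonyms: dict[str, str] = field(default_factory=dict)

    def standardize(self, value: str) -> str:
        """Map a known synonym to its canonical name; leave other values unchanged."""
        return self.synonyms.get(value, value)

    def validate(self, values: list[str]) -> list[bool]:
        """Return one flag per value: True if it is a canonical name."""
        return [value in self.names for value in values]


def curate(records: list[dict], column: str, registry: Registry) -> tuple[list[dict], list[str]]:
    """Standardize one column and collect the values that remain unvalidated."""
    curated = []
    unknown = set()
    for record in records:
        value = registry.standardize(record[column])
        if value not in registry.names:
            unknown.add(value)
        # Copy the record so the caller's input is not mutated.
        curated.append({**record, column: value})
    return curated, sorted(unknown)


# Illustrative vocabulary and records.
cell_types = Registry(
    names={"T cell", "B cell"},
    synonyms={"T lymphocyte": "T cell"},
)
records = [
    {"sample": "s1", "cell_type": "T lymphocyte"},
    {"sample": "s2", "cell_type": "B cell"},
    {"sample": "s3", "cell_type": "Tcell"},
]
curated, unknown = curate(records, "cell_type", cell_types)
print([r["cell_type"] for r in curated])
print(unknown)
```

Separating standardization from validation keeps typos such as "Tcell" visible instead of silently accepting or dropping them, so a curator can decide whether to fix the value or register a new term.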

Maintenance & Community

LaminDB is used by researchers at organizations such as Pfizer, scverse, Harvard, and MIT. LaminHub serves as a collaboration platform. The README does not provide community links (Discord, Slack) or roadmap details.

Licensing & Compatibility

The README does not specify a software license. This omission requires clarification for commercial use or integration into closed-source projects.

Limitations & Caveats

No explicit limitations, alpha status, or known bugs are detailed in the provided README content.

Health Check

  • Last Commit: 9 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 45
  • Issues (30d): 4

Star History

23 stars in the last 30 days
