pixels by databricks-industry-solutions

Accelerating medical image processing and AI analysis in the Lakehouse

Created 3 years ago
363 stars

Top 77.8% on SourcePulse

Project Summary

This project is a Databricks Lakehouse solution accelerator for large-scale processing of DICOM medical images and related documents. It targets engineers and researchers who need to ingest, index, and analyze DICOM metadata and to run AI-driven image segmentation and interactive labeling, with the resulting data available for SQL and ML analytics inside a secure Databricks environment.

How It Works

The solution ingests DICOM files from cloud storage (ADLS, S3, GCS) through Unity Catalog Volumes, extracts their metadata into indexed Databricks tables, and applies PHI redaction. It integrates the OHIF Viewer for interactive segmentation and labeling, with NVIDIA's MONAI providing AI-driven segmentation and custom model training. Data can be processed in batch, incremental, or streaming mode, and results are accessible through SQL, BI dashboards, and real-time inference endpoints.
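
The summary does not describe how pixels parses DICOM files internally. To make "extracts metadata" concrete, here is a minimal, stdlib-only sketch that walks the data elements of a DICOM Part 10 file and flattens a few of them into a row that could be written to a table. It assumes the whole file is explicit VR little endian, stops at Pixel Data, and gives up on undefined-length elements. `parse_explicit_vr_le`, `to_record`, `encode_element`, and the `KEYWORDS` subset are illustrative names, not the pixels API.

```python
import struct

# VRs that, in explicit VR little endian, use 2 reserved bytes plus a 4-byte
# length instead of a 2-byte length (DICOM PS3.5, section 7.1.2).
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ", b"SV",
            b"UC", b"UN", b"UR", b"UT", b"UV"}

# Illustrative subset of (group, element) tags mapped to DICOM keywords.
KEYWORDS = {
    (0x0008, 0x0018): "SOPInstanceUID",
    (0x0008, 0x0060): "Modality",
    (0x0010, 0x0010): "PatientName",
    (0x0020, 0x000D): "StudyInstanceUID",
    (0x0020, 0x000E): "SeriesInstanceUID",
}


def encode_element(group: int, elem: int, vr: str, value: bytes) -> bytes:
    """Encode one explicit-VR little-endian element (handy for test fixtures)."""
    if len(value) % 2:
        # Values must have even length: UIDs pad with NUL, text VRs with a space.
        value += b"\x00" if vr == "UI" else b" "
    vr_bytes = vr.encode("ascii")
    if vr_bytes in LONG_VRS:
        return struct.pack("<HH2s2xI", group, elem, vr_bytes, len(value)) + value
    return struct.pack("<HH2sH", group, elem, vr_bytes, len(value)) + value


def parse_explicit_vr_le(data: bytes) -> dict:
    """Map (group, element) -> (VR, raw bytes), stopping at Pixel Data."""
    if len(data) < 132 or data[128:132] != b"DICM":
        raise ValueError("missing 'DICM' magic at byte 128; not a DICOM Part 10 file")
    pos, out = 132, {}
    while pos + 8 <= len(data):
        group, elem = struct.unpack_from("<HH", data, pos)
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            if pos + 12 > len(data):
                break
            (length,) = struct.unpack_from("<I", data, pos + 8)
            pos += 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            pos += 8
        # Stop at Pixel Data, or at undefined-length items this sketch can't walk.
        if (group, elem) == (0x7FE0, 0x0010) or length == 0xFFFFFFFF:
            break
        out[(group, elem)] = (vr.decode("ascii", "replace"), data[pos:pos + length])
        pos += length
    return out


def to_record(elements: dict) -> dict:
    """Flatten known string tags into a {keyword: str} row for a metadata table."""
    return {
        KEYWORDS[tag]: raw.rstrip(b"\x00 ").decode("ascii", "replace")
        for tag, (vr, raw) in elements.items()
        if tag in KEYWORDS
    }
```

Rows like the output of `to_record` are the kind of record that would land in a Databricks table and be queried with SQL. A production parser also has to handle implicit VR, big-endian and compressed transfer syntaxes, and nested sequences.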

Quick Start & Requirements

To set up, clone the repository into a Databricks workspace. Attach a notebook to Serverless Compute or to a cluster running Databricks Runtime (DBR) 14.3 LTS or later, then run config/setup.py to install the pixels package. Next, run the RUNME notebook as a Databricks job. GPU-enabled compute is recommended for best performance. Official quick-start examples are in the repository's notebooks.
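
Before running setup, it can help to fail fast on an unsupported runtime. The pre-flight sketch below assumes the `DATABRICKS_RUNTIME_VERSION` environment variable that Databricks sets on cluster nodes (a value such as `14.3`) and the cluster `spark_version` naming style (for example `15.4.x-gpu-ml-scala2.12`). The helper names are hypothetical and not part of pixels. Because Serverless Compute is listed as a separate option, the check is only meaningful on classic clusters.

```python
import os
import re

MIN_DBR = (14, 3)  # minimum from the README: DBR 14.3 LTS or later


def parse_dbr_version(version: str) -> tuple:
    """Extract (major, minor) from strings like '14.3' or '15.4.x-gpu-ml-scala2.12'."""
    match = re.match(r"^\s*(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized Databricks Runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_min_dbr(version: str, minimum: tuple = MIN_DBR) -> bool:
    """True if the runtime version is at least `minimum`, compared numerically."""
    return parse_dbr_version(version) >= minimum


def check_current_runtime() -> bool:
    """Check the runtime this code runs on, if DATABRICKS_RUNTIME_VERSION is set."""
    version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if version is None:
        raise RuntimeError("DATABRICKS_RUNTIME_VERSION is not set; not on a Databricks cluster?")
    return meets_min_dbr(version)
```

Versions are compared as integer tuples, so a minor version such as `.10` correctly ranks above `.3`, which a plain string comparison would get wrong.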

Highlighted Details

  • Comprehensive DICOM metadata extraction, indexing, and SQL/ML analysis.
  • Integrated OHIF Viewer for secure, interactive DICOM visualization, segmentation, and labeling.
  • NVIDIA MONAI integration for automated segmentation and custom model training, deployable via Databricks Model Serving.
  • Built-in features for automatic zip file extraction and metadata anonymization/PHI redaction.
  • Support for incremental and streaming data ingestion using Databricks Auto Loader.
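
How pixels implements zip extraction and PHI redaction is not described here. As a rough illustration of the two steps, the stdlib-only sketch below pulls DICOM files out of a zip archive by checking the `DICM` magic bytes rather than file extensions, then pseudonymizes selected metadata fields with a salted SHA-256 hash. The `PHI_KEYWORDS` list, the salt handling, and all function names are assumptions. This is pseudonymization of metadata only, not a complete de-identification profile, and it does not touch identifiers burned into pixel data.

```python
import hashlib
import io
import zipfile

# Hypothetical, deliberately short list of PHI-bearing DICOM keywords. A real
# de-identification profile (e.g. DICOM PS3.15 Annex E) covers many more.
PHI_KEYWORDS = {"PatientName", "PatientID", "PatientBirthDate",
                "PatientAddress", "ReferringPhysicianName"}


def is_dicom_part10(payload: bytes) -> bool:
    """DICOM Part 10 files have a 128-byte preamble followed by b'DICM'."""
    return payload[128:132] == b"DICM"


def iter_dicom_members(zip_bytes: bytes):
    """Yield (member_name, payload) for each DICOM file inside a zip archive.

    Checks magic bytes instead of the extension, since DICOM files often have
    no '.dcm' suffix.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            payload = zf.read(info)
            if is_dicom_part10(payload):
                yield info.filename, payload


def anonymize(record: dict, salt: str) -> dict:
    """Replace PHI values with a salted SHA-256 pseudonym; keep other fields.

    The same value and salt always yield the same pseudonym, so studies from
    one patient stay linkable without exposing the original identifier.
    """
    out = {}
    for key, value in record.items():
        if key in PHI_KEYWORDS and value:
            digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
            out[key] = digest[:16]
        else:
            out[key] = value
    return out
```

Keeping the salt secret matters: without it, low-entropy fields such as birth dates could be recovered by hashing every candidate value.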

Maintenance & Community

The project is developed by Databricks, with listed contributors from Databricks and Prominence Advisors. No explicit community channels (e.g., Discord, Slack) or public roadmaps are detailed in the provided README.

Licensing & Compatibility

The core dbx.pixels library is provided under a Databricks license. It integrates several third-party libraries with permissive licenses (MIT, Apache-2.0, BSD). While third-party components are broadly compatible, the primary Databricks license terms should be reviewed for specific commercial use or closed-source integration requirements.

Limitations & Caveats

The solution is designed for, and requires, a Databricks workspace. NIfTI file ingestion and robust pixel-level PHI redaction are listed on the roadmap as future work. Users are responsible for the associated Databricks compute costs.

Health Check
  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 6
  • Issues (30d): 0
  • Star history: 156 stars in the last 30 days
