TITAN by mahmoodlab

Multimodal foundation model for pathology image and text analysis

Created 1 year ago
302 stars

Top 88.5% on SourcePulse

View on GitHub
Project Summary

TITAN is a multimodal whole-slide foundation model that addresses the scarcity of labeled clinical data in computational pathology. It supports patient- and slide-level analysis, especially for rare diseases, by extracting general-purpose slide representations and generating pathology reports. Pretrained via visual self-supervised learning and vision-language alignment, it generalizes to resource-limited clinical scenarios without fine-tuning and outperforms existing slide foundation models.

How It Works

TITAN (Transformer-based pathology Image and Text Alignment Network) combines visual self-supervised learning (SSL) with vision-language alignment during pretraining. It is trained on 335,645 whole-slide images (WSIs), more than 182,000 pathology reports, and more than 423,000 synthetic captions. This dual strategy captures richer morphological semantics than single-approach models. Crucially, TITAN avoids pretraining on large public histology collections (e.g., TCGA) to prevent benchmark data contamination.
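As a rough illustration of the vision-language alignment objective (a CLIP/CoCa-style contrastive loss sketch, not TITAN's exact training recipe), each slide embedding is pulled toward the embedding of its paired report or caption and pushed away from the other pairs in the batch; the PyTorch snippet below assumes batched, paired slide and text embeddings:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(slide_embs: torch.Tensor,
                               text_embs: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over paired (slide, text) embeddings of shape (B, D)."""
    slide_embs = F.normalize(slide_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = slide_embs @ text_embs.t() / temperature         # (B, B) similarity matrix
    targets = torch.arange(slide_embs.size(0), device=logits.device)
    loss_slide_to_text = F.cross_entropy(logits, targets)     # match each slide to its report
    loss_text_to_slide = F.cross_entropy(logits.t(), targets) # and each report to its slide
    return 0.5 * (loss_slide_to_text + loss_text_to_slide)
```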

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a Python 3.9 conda environment, then run pip install -e . from the repository root.
  • Prerequisites: Python 3.9, Conda, and the huggingface_hub library. A GPU is recommended for inference.
  • Model Access: Requires requesting access on the Hugging Face model page and logging in via huggingface_hub (see the loading sketch below).
  • Resources: Demo notebooks are available for slide embedding extraction, zero-shot classification, and linear probing. Links to the GitHub repository and Hugging Face model page are provided in the README.
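A minimal access-and-loading sketch in Python, assuming the gated weights are published as MahmoodLab/TITAN on the Hugging Face Hub and loaded through transformers with remote code enabled (the exact slide-embedding calls are demonstrated in the repo's notebooks):

```python
from huggingface_hub import login
from transformers import AutoModel

# Authenticate once access to the gated model has been granted on the
# Hugging Face model page (alternatively, run `huggingface-cli login`).
login(token="hf_...")  # placeholder token

# Load the TITAN slide encoder; trust_remote_code pulls in the model
# code shipped alongside the weights (repo id assumed here).
model = AutoModel.from_pretrained("MahmoodLab/TITAN", trust_remote_code=True)
model.eval()

# Slide-level embeddings are produced from pre-extracted patch features
# and their coordinates; see the demo notebooks for the exact calls.
```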

Highlighted Details

  • Achieves state-of-the-art performance on linear probing, few-shot/zero-shot classification, rare cancer retrieval, cross-modal retrieval, and report generation (a zero-shot sketch follows this list).
  • Outperforms other slide foundation models on benchmarks like TCGA-UT-8K and TCGA-OT.
  • Pretrained without common public histology datasets to avoid benchmark contamination.
  • Feature extraction integrated into TRIDENT and CLAM.
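For intuition on the zero-shot classification workflow noted above (a generic CLIP-style sketch, not TITAN's exact API; the slide and prompt embeddings are assumed to come from the aligned vision and text encoders), classification reduces to cosine similarity between a slide embedding and the embeddings of class-describing text prompts:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(slide_emb: torch.Tensor,
                       text_embs: torch.Tensor,
                       class_names: list[str]) -> str:
    """Return the class whose prompt embedding is most similar to the slide embedding.

    slide_emb: (D,) slide-level embedding from the vision encoder.
    text_embs: (C, D) embeddings of one text prompt per class.
    """
    slide_emb = F.normalize(slide_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    scores = text_embs @ slide_emb               # (C,) cosine similarities
    return class_names[int(scores.argmax())]
```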

Maintenance & Community

Recent updates include the integration of TITAN slide feature extraction into TRIDENT (February 2025) and CONCHv1.5 feature extraction into CLAM (December 2024). The preprint and model weights were released in December 2024. The GitHub repository is the primary community hub.

Licensing & Compatibility

Released under CC-BY-NC-ND 4.0, strictly for non-commercial, academic research use with attribution. Commercial use, sale, or monetization requires prior written approval. Distribution and reproduction are restricted, necessitating individual registration and agreement to terms.

Limitations & Caveats

This is a "TITAN-preview" release. Decoder weights are withheld from the public release as a precaution against PHI (protected health information) leakage, though encoder performance is unaffected. Model access requires a Hugging Face login and agreement to specific terms, including individual user registration and distribution restrictions.

Health Check

  • Last Commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830 (375 stars)
Multimodal framework for vision-and-language transformer research
Created 4 years ago; updated 3 years ago