lang-seg  by isl-org

Semantic segmentation model using language

Created 3 years ago
807 stars

Top 43.8% on SourcePulse

Project Summary

LSeg (Language-driven Semantic Segmentation) segments images using natural language descriptions as class labels. Because the label set is supplied as text, the model can generalize zero-shot to unseen categories without retraining. This makes it useful for computer vision and NLP researchers and practitioners who need segmentation models whose label sets can change after training.

How It Works

LSeg uses a transformer-based image encoder to compute dense, per-pixel embeddings and a text encoder to embed descriptive labels (e.g., "grass"). A contrastive training objective aligns each pixel embedding with the text embedding of its class. Because the text encoder places semantically similar labels close together in embedding space, the model can segment novel classes at test time when given only their labels.
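The matching step described above can be illustrated with a toy, dependency-free sketch. It is not the repository's code: the 3-dimensional embeddings are hand-made stand-ins for encoder outputs, and the use of cosine similarity followed by an argmax is an assumption made for illustration. The names `normalize`, `segment`, and `TEXT` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def segment(pixel_embeddings, label_embeddings):
    """Assign each pixel the index of its most similar label embedding.

    pixel_embeddings: H x W grid of C-dimensional vectors (stand-in for image encoder output).
    label_embeddings: N vectors of dimension C (stand-in for text encoder output).
    """
    labels = [normalize(t) for t in label_embeddings]
    mask = []
    for row in pixel_embeddings:
        mask_row = []
        for pixel in row:
            p = normalize(pixel)
            scores = [sum(a * b for a, b in zip(p, t)) for t in labels]
            # Index of the best-scoring label for this pixel.
            mask_row.append(max(range(len(scores)), key=scores.__getitem__))
        mask.append(mask_row)
    return mask


# Hypothetical text embeddings, chosen by hand (not real encoder output).
TEXT = {
    "grass": [1.0, 0.1, 0.0],
    "sky": [0.0, 0.1, 1.0],
    "road": [0.1, 1.0, 0.1],
}

# Hypothetical 2 x 2 grid of per-pixel embeddings.
pixels = [
    [[0.1, 0.0, 0.9], [0.0, 0.2, 0.8]],
    [[0.9, 0.2, 0.0], [0.2, 0.9, 0.1]],
]

# Same pixel embeddings, two different label sets: no retraining involved.
for names in (["grass", "sky"], ["grass", "sky", "road"]):
    mask = segment(pixels, [TEXT[n] for n in names])
    print(names, [[names[i] for i in row] for row in mask])
```

With only "grass" and "sky" available, the bottom-right pixel falls back to the closest of the two; once "road" is added to the label list, that pixel is relabelled while the others are unchanged. This is the sense in which the label set can be extended at test time.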

Quick Start & Requirements

  • Installation: Run pip install -r requirements.txt, then install the remaining packages individually: PyTorch, PyTorch-Encoding, PyTorch-Lightning, OpenCV, imageio, ftfy, regex, tqdm, CLIP, altair, streamlit, protobuf, timm, tensorboardX, matplotlib, test-tube, and wandb.
  • Data Preparation: Requires the ADE20K dataset; prepare it with python prepare_ade20k.py.
  • Demo: Download the demo model (checkpoints/demo_e200.ckpt), then either run streamlit run lseg_app.py or open lseg_demo.ipynb.
  • Dependencies: PyTorch (v1.7.1), CLIP, PyTorch-Lightning (v1.3.5), Streamlit.
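Since the dependency list pins specific versions, it can help to check an existing environment against those pins before running the demo. The following is a minimal sketch using only the Python standard library; the distribution names "torch" and "pytorch-lightning" are assumed to be the PyPI names of the pinned packages, and `check_pins` is a hypothetical helper, not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions come from the dependency list above; the distribution
# names are assumptions (the usual PyPI names for these packages).
PINS = {"torch": "1.7.1", "pytorch-lightning": "1.3.5"}


def check_pins(pins, lookup=version):
    """Return {name: (expected, found)} for packages that do not match their pin.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu110" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The script prints nothing when every pinned package matches. The `lookup` parameter exists only so the check can be exercised without installing anything.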

Highlighted Details

  • Achieves zero-shot performance competitive with existing zero-shot and few-shot semantic segmentation methods.
  • Generalizes to unseen categories without retraining or additional training samples.
  • Matches the accuracy of traditional segmentation models when a fixed label set is provided.
  • Provides interactive demo applications via Streamlit.

Maintenance & Community

This project is NOT UNDER ACTIVE MANAGEMENT by Intel. Intel has ceased development, maintenance, bug fixes, and contributions. Users are encouraged to fork the project for ongoing needs.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it acknowledges codebases from DPT, PyTorch-Lightning, CLIP, PyTorch Encoding, Streamlit, and Wandb, which have various open-source licenses. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

Intel no longer maintains the project, so no further updates, bug fixes, or support are expected. Users who need ongoing development or maintenance will have to fork the repository.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
