Semantic segmentation model using language
LSeg (Language-driven Semantic Segmentation) segments images using natural language descriptions as class labels. This enables zero-shot generalization to unseen categories without retraining, which makes it useful for researchers and practitioners in computer vision and NLP who need flexible, adaptable segmentation models.
How It Works
LSeg employs a transformer-based image encoder to generate dense, per-pixel embeddings and a text encoder to create embeddings for descriptive labels (e.g., "grass"). A contrastive objective aligns these embeddings, allowing semantically similar labels to map to similar image regions. This design facilitates generalization to novel classes at test time by exploiting the semantic relationships captured in the text embeddings.
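The matching step described above can be illustrated with a minimal sketch. It is not the LSeg implementation: the function name `segment_by_labels`, the array shapes, and the toy embedding values are all assumptions for illustration, standing in for real encoder outputs. Each pixel is assigned the label whose text embedding has the highest cosine similarity with that pixel's embedding.

```python
import numpy as np

def segment_by_labels(pixel_emb, text_emb):
    """Assign each pixel the index of the most similar label embedding.

    pixel_emb: (H, W, C) array of per-pixel image embeddings.
    text_emb:  (N, C) array, one embedding per label.
    Returns an (H, W) integer array of label indices.
    """
    # L2-normalize so the dot product equals cosine similarity.
    pixel_emb = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    # (H, W, C) @ (C, N) -> (H, W, N) similarity scores.
    scores = pixel_emb @ text_emb.T
    return scores.argmax(axis=-1)

# Toy stand-ins for encoder outputs (hypothetical values, not real LSeg embeddings).
labels = ["grass", "sky"]
text_emb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pixel_emb = np.array([
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]],
    [[0.7, 0.3, 0.2], [0.1, 0.9, 0.0]],
])
mask = segment_by_labels(pixel_emb, text_emb)
print([[labels[i] for i in row] for row in mask])
```

In this sketch, supporting a new class at test time only requires appending another row to `text_emb`; the pixel embeddings are left unchanged.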
Quick Start & Requirements
Install: pip install -r requirements.txt, followed by specific installs for PyTorch, PyTorch-Encoding, PyTorch-Lightning, OpenCV, imageio, ftfy, regex, tqdm, CLIP, altair, streamlit, protobuf, timm, tensorboardX, matplotlib, test-tube, and wandb.
Data: prepare the ADE20K dataset (python prepare_ade20k.py).
Demo: with the demo checkpoint (checkpoints/demo_e200.ckpt) available, run streamlit run lseg_app.py or use lseg_demo.ipynb.
Highlighted Details
Maintenance & Community
This project is NOT UNDER ACTIVE MANAGEMENT by Intel. Intel has ceased development, maintenance, bug fixes, and contributions. Users are encouraged to fork the project for ongoing needs.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it acknowledges codebases from DPT, PyTorch-Lightning, CLIP, PyTorch Encoding, Streamlit, and Wandb, which have various open-source licenses. Users should verify compatibility for commercial or closed-source use.
Limitations & Caveats
Because Intel no longer maintains the project, no future updates, bug fixes, or support should be expected. Users who need ongoing development or maintenance will have to fork the repository.