LISA  by dvlab-research

Reasoning segmentation assistant via LLM

created 2 years ago
2,332 stars

Top 20.0% on sourcepulse

Project Summary

LISA (Large Language Instructed Segmentation Assistant) introduces a novel "reasoning segmentation" task, enabling segmentation models to interpret complex, implicit text queries requiring world knowledge and reasoning. It targets researchers and developers in computer vision and multimodal AI, offering advanced segmentation capabilities beyond simple object identification.

How It Works

LISA uses a multi-modal Large Language Model (LLM) architecture that combines visual understanding with language generation. It is trained on a mixture of semantic segmentation, referring segmentation, and visual question answering data, plus its custom "ReasonSeg" dataset. This lets LISA generate segmentation masks from nuanced instructions, often with an explanation of its reasoning, and supports multi-turn conversations.
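To make the idea of a mixed training set concrete, here is a minimal sketch of drawing training examples from the four task families named above in proportion to a weight. The weights, the task keys, and the function name `sample_tasks` are illustrative assumptions, not values or code taken from the LISA repository.

```python
import random
from collections import Counter

# The four task families named in the summary.
# ASSUMPTION: these weights are made up for illustration only.
TASK_WEIGHTS = {
    "semantic_segmentation": 9,
    "referring_segmentation": 3,
    "visual_question_answering": 3,
    "reason_seg": 1,
}


def sample_tasks(n, weights=TASK_WEIGHTS, seed=0):
    """Draw n task names, each with probability proportional to its weight."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


if __name__ == "__main__":
    counts = Counter(sample_tasks(1600))
    for name, count in counts.most_common():
        print(f"{name}: {count}")
```

With weights like these, the small reasoning-specific dataset is seen far less often than the general segmentation data, which is one simple way a training loop could balance a large generic corpus against a small specialised one.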

Quick Start & Requirements

  • Install: pip install -r requirements.txt and pip install flash-attn --no-build-isolation.
  • Prerequisites: Requires LLaVA and SAM pre-trained weights. Datasets (ADE20K, COCO, LLaVA-Instruct-150k, ReasonSeg, etc.) must be downloaded and organized.
  • Inference: CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' (supports 4-bit, 8-bit, bf16, fp16 precision).
  • Deployment: CUDA_VISIBLE_DEVICES=0 python app.py --version='xinlai/LISA-13B-llama2-v1' --load_in_4bit
  • Resources: 13B model inference requires ~30GB VRAM (16-bit), ~16GB (8-bit), or ~9GB (4-bit). Training requires significant data and compute resources.
  • Docs: Paper, LISA++ Paper, Online Demo
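The VRAM figures in the Resources bullet can be sanity-checked with simple arithmetic: a nominal 13 billion parameters times the bytes per parameter gives the memory for the weights alone. The sketch below does that calculation and picks the highest-precision mode that fits a given GPU. The names `weight_gb` and `pick_mode` are hypothetical helpers, and the 13e9 parameter count is a nominal assumption based on the model name.

```python
# ASSUMPTION: nominal parameter count implied by the "13B" model name.
PARAMS = 13e9

# Approximate VRAM figures quoted in the Resources bullet, in GB.
REPORTED_GB = {"16-bit": 30, "8-bit": 16, "4-bit": 9}
BITS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_gb(params, bits):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def pick_mode(available_gb):
    """Return the highest-precision mode whose reported VRAM fits, or None."""
    for mode in ("16-bit", "8-bit", "4-bit"):
        if REPORTED_GB[mode] <= available_gb:
            return mode
    return None


if __name__ == "__main__":
    for mode, bits in BITS.items():
        w = weight_gb(PARAMS, bits)
        print(f"{mode}: weights ~{w:.1f} GB, reported ~{REPORTED_GB[mode]} GB")
    print("24 GB card:", pick_mode(24))
```

The weights-only estimates (26, 13, and 6.5 GB) sit a few GB below the reported figures in each case. The remainder presumably covers the vision components, activations, and framework overhead, so the reported numbers are the safer guide when choosing a mode.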

Highlighted Details

  • Handles complex reasoning and world knowledge for segmentation.
  • Provides explanatory answers alongside segmentation masks.
  • Supports multi-turn conversational segmentation.
  • Shows zero-shot capability when trained only on reasoning-free datasets, with further gains after fine-tuning on a small number of reasoning segmentation samples.
  • Released LISA++ model and datasets for enhanced global understanding.

Maintenance & Community

  • Project is actively developed, with recent updates including LISA++ release (Dec 2024) and CVPR 2024 Oral Presentation (June 2024).
  • Several model versions (7B, 13B, explanatory variants) have been released.
  • Built upon LLaVA and SAM projects.

Licensing & Compatibility

  • The specific license is not explicitly stated in the README. However, its reliance on LLaVA and SAM suggests potential licensing considerations from those projects. Commercial use should be verified.

Limitations & Caveats

  • Older model versions (v0) are not supported by the current chat.py script.
  • Reproducing the validation results requires the v0 models rather than v1, which in turn means checking out a specific legacy commit.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2

Star History

  • 164 stars in the last 90 days
