LISA by dvlab-research

Reasoning segmentation assistant via LLM

Created 2 years ago
2,409 stars

Top 19.1% on SourcePulse

Project Summary

LISA (Large Language Instructed Segmentation Assistant) introduces a novel "reasoning segmentation" task, enabling segmentation models to interpret complex, implicit text queries requiring world knowledge and reasoning. It targets researchers and developers in computer vision and multimodal AI, offering advanced segmentation capabilities beyond simple object identification.

How It Works

LISA leverages a multi-modal Large Language Model (LLM) architecture, integrating visual understanding with language generation. It's trained on a diverse dataset including semantic segmentation, referring segmentation, visual question answering, and its custom "ReasonSeg" dataset. This approach allows LISA to generate segmentation masks based on nuanced instructions, often accompanied by explanatory reasoning, and supports multi-turn conversations.

Quick Start & Requirements

  • Install: pip install -r requirements.txt and pip install flash-attn --no-build-isolation.
  • Prerequisites: Requires LLaVA and SAM pre-trained weights. Datasets (ADE20K, COCO, LLaVA-Instruct-150k, ReasonSeg, etc.) must be downloaded and organized.
  • Inference: CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' (supports 4-bit, 8-bit, bf16, fp16 precision).
  • Deployment: CUDA_VISIBLE_DEVICES=0 python app.py --version='xinlai/LISA-13B-llama2-v1' --load_in_4bit
  • Resources: 13B model inference requires ~30GB VRAM (16-bit), ~16GB (8-bit), or ~9GB (4-bit). Training requires significant data and compute resources.
  • Docs: Paper, LISA++ Paper, Online Demo
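
The precision options and VRAM figures above can be turned into a small launcher helper. This is a minimal sketch, not part of the LISA repository: the function names are hypothetical, the thresholds are the approximate 13B figures quoted above, and the --load_in_8bit flag is assumed by analogy with the --load_in_4bit flag shown in the deployment command.

```python
import shlex

# Approximate VRAM needs in GB for the 13B model, copied from the
# Resources bullet above. Treat them as rough guidance, not hard limits.
VRAM_GB_13B = {"16bit": 30, "8bit": 16, "4bit": 9}

# Flags per mode. --load_in_4bit appears in the commands above;
# --load_in_8bit is an assumption made by analogy.
MODE_FLAGS = {"16bit": [], "8bit": ["--load_in_8bit"], "4bit": ["--load_in_4bit"]}


def pick_mode(free_vram_gb):
    """Return the highest-precision mode that fits, or None if none fits."""
    for mode in ("16bit", "8bit", "4bit"):
        if free_vram_gb >= VRAM_GB_13B[mode]:
            return mode
    return None


def build_command(script, version, mode, gpu="0"):
    """Build the shell command line for chat.py or app.py as a string."""
    args = ["python", script, f"--version={version}", *MODE_FLAGS[mode]]
    return f"CUDA_VISIBLE_DEVICES={gpu} " + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    mode = pick_mode(24)  # e.g. a single 24 GB GPU
    print(build_command("app.py", "xinlai/LISA-13B-llama2-v1", mode))
```

Building the argument list first and quoting each item separately keeps every flag outside the --version value, so a misplaced quote cannot absorb a flag into the model name.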

Highlighted Details

  • Handles complex reasoning and world knowledge for segmentation.
  • Provides explanatory answers alongside segmentation masks.
  • Supports multi-turn conversational segmentation.
  • Shows zero-shot capability when trained only on reasoning-free datasets, with further gains after fine-tuning on a small set of reasoning segmentation samples (239 in the paper).
  • Released LISA++ model and datasets for enhanced global understanding.

Maintenance & Community

  • Most recent listed updates: the LISA++ release (Dec 2024) and a CVPR 2024 oral presentation (June 2024).
  • Several model versions (7B, 13B, explanatory variants) have been released.
  • Built upon LLaVA and SAM projects.

Licensing & Compatibility

  • The README does not state a license. Because LISA builds on LLaVA and SAM, the licenses of those projects may apply. Verify the terms before any commercial use.

Limitations & Caveats

  • Older model versions (v0) are not supported by the current chat.py script.
  • Reproducing validation results for v1 requires using v0 models and checking out a specific legacy commit.
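
A wrapper script could guard against the v0 limitation before launching chat.py. The sketch below is illustrative only: the function name is hypothetical, and it assumes that legacy models carry a "-v0" marker in their name, which is a naming convention to verify against the actual model list.

```python
import re


def is_legacy_v0(version):
    """Return True if the model name looks like a v0 release.

    Assumes a '-v0' marker at the end of the name or before a further
    suffix. This convention is an assumption, not a documented rule.
    """
    name = version.rstrip("/").split("/")[-1]
    return re.search(r"-v0(?:-|$)", name) is not None


if __name__ == "__main__":
    # Example names are illustrative.
    for candidate in ("xinlai/LISA-13B-llama2-v1", "xinlai/LISA-13B-llama2-v0"):
        status = "legacy, needs the older code" if is_legacy_v0(candidate) else "ok"
        print(f"{candidate}: {status}")
```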

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 53 stars in the last 30 days
