LISA by JIA-Lab-research

Reasoning segmentation assistant via LLM

Created 2 years ago

2,550 stars

Top 18.2% on SourcePulse


Project Summary

LISA (Large Language Instructed Segmentation Assistant) introduces a novel "reasoning segmentation" task, enabling segmentation models to interpret complex, implicit text queries requiring world knowledge and reasoning. It targets researchers and developers in computer vision and multimodal AI, offering advanced segmentation capabilities beyond simple object identification.

How It Works

LISA builds on a multimodal Large Language Model (LLM) architecture that combines visual understanding with language generation. It is trained on a mixture of data covering semantic segmentation, referring segmentation, and visual question answering, plus its custom "ReasonSeg" dataset. As a result, LISA can generate segmentation masks from nuanced instructions, often with an accompanying explanation, and it supports multi-turn conversations.

Quick Start & Requirements

Install: pip install -r requirements.txt and pip install flash-attn --no-build-isolation.
Prerequisites: Requires LLaVA and SAM pre-trained weights. Datasets (ADE20K, COCO, LLaVA-Instruct-150k, ReasonSeg, etc.) must be downloaded and organized.
Inference: CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' (supports 4-bit, 8-bit, bf16, fp16 precision).
Deployment: CUDA_VISIBLE_DEVICES=0 python app.py --version='xinlai/LISA-13B-llama2-v1' --load_in_4bit
Resources: 13B model inference requires ~30GB VRAM (16-bit), ~16GB (8-bit), or ~9GB (4-bit). Training requires significant data and compute resources.
Docs: Paper, LISA++ Paper, Online Demo
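
The VRAM figures above can be sanity-checked with simple arithmetic, and they are enough to choose a loading mode for a given GPU. The following is a minimal sketch, not part of the LISA codebase: the function names are hypothetical, the thresholds are the approximate figures quoted in the Resources line, and the --load_in_8bit flag is assumed by analogy with --load_in_4bit (check python chat.py --help before relying on it).

```python
import shlex

# Approximate VRAM needs (GB) for the 13B model, as quoted in the Resources line.
VRAM_NEEDED_GB = {"16bit": 30, "8bit": 16, "4bit": 9}

# Extra command-line flags per mode. --load_in_4bit appears in the commands
# above; --load_in_8bit is an ASSUMPTION by analogy and should be verified.
MODE_FLAGS = {"16bit": [], "8bit": ["--load_in_8bit"], "4bit": ["--load_in_4bit"]}


def weight_gb(params_billion: float, bits: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


def pick_mode(free_vram_gb: float):
    """Return the highest-precision mode that fits, or None if none fits."""
    for mode in ("16bit", "8bit", "4bit"):
        if free_vram_gb >= VRAM_NEEDED_GB[mode]:
            return mode
    return None


def build_command(free_vram_gb: float,
                  script: str = "chat.py",
                  version: str = "xinlai/LISA-13B-llama2-v1") -> str:
    """Build the launch command for the best mode that fits in free_vram_gb."""
    mode = pick_mode(free_vram_gb)
    if mode is None:
        raise ValueError("Less VRAM than the quoted 4-bit requirement")
    argv = ["python", script, f"--version={version}", *MODE_FLAGS[mode]]
    return shlex.join(argv)


if __name__ == "__main__":
    # Weights alone: 26.0, 13.0 and 6.5 GB at 16, 8 and 4 bits. The quoted
    # requirements are a few GB higher, which is consistent with activations
    # and the vision components also needing memory.
    for bits in (16, 8, 4):
        print(bits, "bit weights:", weight_gb(13, bits), "GB")
    print(build_command(24))
```

The returned string omits the CUDA_VISIBLE_DEVICES=0 prefix; prepend it in the shell to select a GPU, as in the commands above.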

Highlighted Details

Handles complex reasoning and world knowledge for segmentation.
Provides explanatory answers alongside segmentation masks.
Supports multi-turn conversational segmentation.
Demonstrates robust zero-shot capabilities and significant performance gains with minimal reasoning-specific fine-tuning.
Released LISA++ model and datasets for enhanced global understanding.

Maintenance & Community

The most recent updates noted are the CVPR 2024 oral presentation (June 2024) and the LISA++ release (Dec 2024).
Several model versions (7B, 13B, explanatory variants) have been released.
Built upon LLaVA and SAM projects.

Licensing & Compatibility

The README does not state a license. Because LISA builds on LLaVA and SAM, the license terms of those projects may also apply. Verify the terms before any commercial use.

Limitations & Caveats

Older model versions (v0) are not supported by the current chat.py script.
Reproducing validation results for v1 requires using v0 models and checking out a specific legacy commit.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 3
Star History: 34 stars in the last 30 days
