Reasoning segmentation assistant via LLM
Top 20.0% on sourcepulse
LISA (Large Language Instructed Segmentation Assistant) introduces a novel "reasoning segmentation" task, enabling segmentation models to interpret complex, implicit text queries requiring world knowledge and reasoning. It targets researchers and developers in computer vision and multimodal AI, offering advanced segmentation capabilities beyond simple object identification.
How It Works
LISA builds on a multimodal Large Language Model (LLM), combining visual understanding with language generation. It is trained on a mix of semantic segmentation, referring segmentation, and visual question answering data, plus its custom "ReasonSeg" dataset. As a result, LISA can generate segmentation masks from nuanced instructions, often with an explanation of its reasoning, and it supports multi-turn conversations.
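To make the idea of training on several data types concrete, here is a minimal sketch of weighted sampling across the four sources named above. The weights, the dictionary keys, and the function name sample_sources are hypothetical; the summary does not say how the project actually balances its datasets.

```python
import random
from collections import Counter

# Hypothetical mixing weights: the summary names the four data types but does
# not say how often each one is sampled during training.
DATASET_WEIGHTS = {
    "semantic_segmentation": 9,
    "referring_segmentation": 3,
    "visual_question_answering": 3,
    "reason_seg": 1,
}


def sample_sources(num_samples, weights=DATASET_WEIGHTS, seed=0):
    """Pick a dataset name for each training sample, proportional to its weight."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_samples)


if __name__ == "__main__":
    counts = Counter(sample_sources(16_000))
    for name, count in counts.most_common():
        print(f"{name}: {count}")
```

Raising the weight of one source makes it appear more often in each epoch without changing the size of the underlying datasets.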
Quick Start & Requirements
Install: pip install -r requirements.txt and pip install flash-attn --no-build-isolation
Chat inference: CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' (supports 4-bit, 8-bit, bf16, and fp16 precision)
App: CUDA_VISIBLE_DEVICES=0 python app.py --version='xinlai/LISA-13B-llama2-v1' --load_in_4bit
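The launch commands above can also be wrapped in a small Python helper, for example to start runs from another script. This is only a sketch: build_command and launch are hypothetical names, and it assumes --load_in_4bit is a separate flag rather than part of the version string.

```python
import os
import subprocess
import sys


def build_command(script, version, load_in_4bit=False):
    """Build the argument list for chat.py or app.py (helper name is hypothetical)."""
    cmd = [sys.executable, script, f"--version={version}"]
    if load_in_4bit:
        # Assumption: 4-bit loading is its own flag, not part of --version.
        cmd.append("--load_in_4bit")
    return cmd


def launch(script, version, gpu="0", load_in_4bit=False):
    """Run the script with CUDA_VISIBLE_DEVICES restricted to one GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    cmd = build_command(script, version, load_in_4bit)
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so no GPU or checkpoint is needed.
    print(" ".join(build_command("app.py", "xinlai/LISA-13B-llama2-v1", True)))
```

Passing the arguments as a list avoids shell quoting entirely, so a flag cannot accidentally end up inside the quoted model name.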
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
chat.py script.