LLaVA-Interactive-Demo by LLaVA-VL

All-in-one demo for image chat, segmentation, generation, and editing

created 1 year ago
375 stars

Top 76.8% on sourcepulse

Project Summary

This project provides an all-in-one demonstration for LLaVA, enabling interactive image chat, segmentation, and generation/editing. It targets researchers and users interested in multimodal AI capabilities, offering a unified interface for complex visual tasks.

How It Works

LLaVA-Interactive combines three models: LLaVA for vision-language understanding, SEEM for segmentation, and GLIGEN for grounded text-to-image generation. Within a single interface, users can chat about an image, segment objects in it, and generate or edit images from text prompts.
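
As a rough illustration of how one interface can route work to three models, here is a minimal sketch. Everything in it is hypothetical: the `Request` type, the `handle` function, and the three stub back ends are stand-ins invented for this example. They do not call LLaVA, SEEM, or GLIGEN and do not reflect the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    task: str    # one of "chat", "segment", "generate"
    image: str   # placeholder for image data (just a label in this sketch)
    prompt: str  # user text: a question, an object name, or a description


# Hypothetical stand-ins for the three models; none of them runs a real model.
def llava_chat(req: Request) -> str:
    return f"LLaVA answer about {req.image}: {req.prompt}"


def seem_segment(req: Request) -> str:
    return f"SEEM mask for '{req.prompt}' in {req.image}"


def gligen_generate(req: Request) -> str:
    return f"GLIGEN image for '{req.prompt}' based on {req.image}"


# One table maps each task name to the back end that serves it.
BACKENDS: Dict[str, Callable[[Request], str]] = {
    "chat": llava_chat,
    "segment": seem_segment,
    "generate": gligen_generate,
}


def handle(req: Request) -> str:
    """Dispatch a request to the back end registered for its task."""
    try:
        backend = BACKENDS[req.task]
    except KeyError:
        raise ValueError(f"unknown task: {req.task!r}") from None
    return backend(req)
```

For example, `handle(Request("segment", "cat.png", "the cat"))` is served by the SEEM stub, while an unregistered task name raises `ValueError`.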

Quick Start & Requirements

  • Install via conda and pip.
  • Requires CUDA 11.7 or above.
  • Recommended for desktop computers.
  • Project Page
  • Demo (Note: Live demo disabled as of June 10, 2024)
  • Paper
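
Before installing, it may help to confirm that the local CUDA toolkit meets the 11.7 minimum listed above. The sketch below is one possible check, not part of the project: it assumes `nvcc` is on the `PATH` and reads the toolkit release from `nvcc --version`. The function names and the `MIN_CUDA` constant are made up for this example. Note that `nvcc` reports the installed toolkit, which can differ from the CUDA version supported by the GPU driver.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (11, 7)  # minimum stated in the requirements list


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 11.7, V11.7.99'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_requirement(version: Optional[Tuple[int, int]],
                      minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """True when a version was found and it is at least the minimum."""
    return version is not None and version >= minimum


def detect_cuda_release() -> Optional[Tuple[int, int]]:
    """Run 'nvcc --version' if nvcc is available; return None otherwise."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"],
                            capture_output=True, text=True, check=False)
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    found = detect_cuda_release()
    if meets_requirement(found):
        print(f"CUDA toolkit {found[0]}.{found[1]} meets the minimum.")
    else:
        print("CUDA toolkit 11.7 or above was not detected via nvcc.")
```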

Highlighted Details

  • Unified interface for image chat, segmentation, and generation/editing.
  • Integrates LLaVA, SEEM, and GLIGEN models.
  • Utilizes LaMa for background hole filling.

Maintenance & Community

  • The project is associated with LLaVA-VL.
  • Related projects include LLaVA, SEEM, and GLIGEN.

Licensing & Compatibility

  • Licensed under the Apache License.
  • GLIGEN is MIT licensed.
  • Intended for non-commercial use only.
  • Subject to LLaMA model license, OpenAI Terms of Use, and ShareGPT Privacy Practices.

Limitations & Caveats

The live demo website is currently disabled. The service is a research preview with limited safety measures and may generate offensive content. It is not intended for commercial use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days

Explore Similar Projects

Starred by Travis Fischer (Founder of Agentic), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 9 more.

LLaVA by haotian-liu

0.2%
23k
Multimodal assistant with GPT-4 level capabilities
created 2 years ago
updated 11 months ago