LLaVA-Interactive-Demo by LLaVA-VL

All-in-one demo for image chat, segmentation, generation, and editing

Created 1 year ago
379 stars

Top 75.1% on SourcePulse

Project Summary

This project provides an all-in-one demonstration for LLaVA, enabling interactive image chat, segmentation, and generation/editing. It targets researchers and users interested in multimodal AI capabilities, offering a unified interface for complex visual tasks.

How It Works

LLaVA-Interactive combines three existing models: LLaVA for vision-language understanding, SEEM for segmentation, and GLIGEN for grounded text-to-image generation. Together they give users a single workflow in which they can chat about an image, segment objects in it, and generate or edit images from text prompts.
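
The way one interface can route work to three separate models can be sketched as a small dispatcher. This is only an illustration under stated assumptions: `Session`, `run_step`, and the three `*_backend` functions are hypothetical names, the backends are string-returning stubs rather than calls into LLaVA, SEEM, or GLIGEN, and nothing here reflects the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Session:
    """Shared state passed between steps (hypothetical structure)."""
    image: str  # placeholder for image data, e.g. a file path
    history: List[str] = field(default_factory=list)


# Each backend is a plain callable so a real model could be swapped in.
Backend = Callable[[Session, str], str]


def chat_backend(session: Session, prompt: str) -> str:
    # Stub standing in for a vision-language model such as LLaVA.
    return f"[chat] answer about {session.image}: {prompt}"


def segment_backend(session: Session, prompt: str) -> str:
    # Stub standing in for a segmentation model such as SEEM.
    return f"[segment] mask for '{prompt}' in {session.image}"


def generate_backend(session: Session, prompt: str) -> str:
    # Stub standing in for a grounded generation model such as GLIGEN.
    session.image = f"{session.image}+edit({prompt})"
    return f"[generate] produced {session.image}"


BACKENDS: Dict[str, Backend] = {
    "chat": chat_backend,
    "segment": segment_backend,
    "generate": generate_backend,
}


def run_step(session: Session, task: str, prompt: str) -> str:
    """Route one user action to the matching backend and log the result."""
    if task not in BACKENDS:
        raise ValueError(f"unknown task: {task}")
    result = BACKENDS[task](session, prompt)
    session.history.append(result)
    return result


if __name__ == "__main__":
    s = Session(image="cat.png")
    print(run_step(s, "chat", "what is this?"))
    print(run_step(s, "segment", "the cat"))
    print(run_step(s, "generate", "add a hat"))
```

Keeping the image in shared session state means an edit made in the generation step is what later chat or segmentation steps operate on, which is the kind of hand-off a unified interface needs.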

Quick Start & Requirements

  • Install via conda and pip.
  • Requires CUDA 11.7 or above.
  • Recommended for desktop computers.
  • Project Page
  • Demo (Note: Live demo disabled as of June 10, 2024)
  • Paper
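
The CUDA requirement above can be checked before installation with a short version comparison. The sketch below assumes the CUDA version is already available as a string such as "11.8"; `parse_version`, `meets_cuda_requirement`, and `MIN_CUDA` are illustrative names, not part of the project.

```python
from typing import Tuple

# Minimum CUDA version stated in the requirements list.
MIN_CUDA: Tuple[int, int] = (11, 7)


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a string like '11.8' or '12' into a (major, minor) tuple."""
    parts = text.strip().split(".")
    major = int(parts[0])
    # A missing minor component is treated as 0.
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def meets_cuda_requirement(version: str) -> bool:
    """Return True if the given CUDA version is 11.7 or above."""
    return parse_version(version) >= MIN_CUDA


if __name__ == "__main__":
    for candidate in ("11.6", "11.7", "12.1"):
        print(candidate, meets_cuda_requirement(candidate))
```

Comparing integer tuples avoids the mistake of comparing version strings or floats, where "11.10" would sort below "11.7". If PyTorch is installed, `torch.version.cuda` is one place to read the version string it was built against; it is `None` on CPU-only builds, so that case needs handling before calling the function.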

Highlighted Details

  • Unified interface for image chat, segmentation, and generation/editing.
  • Integrates LLaVA, SEEM, and GLIGEN models.
  • Utilizes LaMa for background hole filling.

Maintenance & Community

  • The project is associated with LLaVA-VL.
  • Related projects include LLaVA, SEEM, and GLIGEN.

Licensing & Compatibility

  • Licensed under the Apache License.
  • GLIGEN is MIT licensed.
  • Intended for non-commercial use only.
  • Subject to LLaMA model license, OpenAI Terms of Use, and ShareGPT Privacy Practices.

Limitations & Caveats

The live demo website is currently disabled. The service is a research preview with limited safety measures and may generate offensive content. It is not intended for commercial use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
