LLaVA-Interactive-Demo by LLaVA-VL

All-in-one demo for image chat, segmentation, generation, and editing

Created 2 years ago
380 stars

Top 74.9% on SourcePulse

Project Summary

LLaVA-Interactive is an all-in-one demo that brings interactive image chat, segmentation, and image generation/editing into a single interface. It targets researchers and users interested in multimodal AI.

How It Works

LLaVA-Interactive combines several existing models: LLaVA for vision-language understanding, SEEM for segmentation, and GLIGEN for grounded text-to-image generation. Within one session, users can converse about an image, segment objects in it, and generate or edit images from text prompts.
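
The division of labor described above can be pictured as a small lookup from capability to model. The sketch below is illustrative only: the `Task` enum, `MODEL_FOR_TASK` table, and `models_for` helper are hypothetical names and are not taken from the project's code. It also includes LaMa, which this summary lists under Highlighted Details as the background hole-filling model.

```python
from enum import Enum


class Task(Enum):
    """Capabilities named in this summary (hypothetical enum, not project code)."""
    CHAT = "chat"
    SEGMENT = "segment"
    GENERATE = "generate"
    FILL = "fill"


# Mapping taken from this summary: which model backs which capability.
MODEL_FOR_TASK = {
    Task.CHAT: "LLaVA",
    Task.SEGMENT: "SEEM",
    Task.GENERATE: "GLIGEN",
    Task.FILL: "LaMa",
}


def models_for(workflow):
    """Return the model names a sequence of tasks would touch, in order."""
    return [MODEL_FOR_TASK[task] for task in workflow]


# Hypothetical session: select an object, fill the hole it leaves, then chat.
remove_and_ask = [Task.SEGMENT, Task.FILL, Task.CHAT]
print(models_for(remove_and_ask))  # ['SEEM', 'LaMa', 'LLaVA']
```

How the actual demo wires these models together is not described here, so treat the ordering in `remove_and_ask` as one plausible session rather than the project's documented flow.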

Quick Start & Requirements

  • Install via conda and pip.
  • Requires CUDA 11.7 or above.
  • Recommended for desktop computers.
  • Project Page
  • Demo (Note: Live demo disabled as of June 10, 2024)
  • Paper
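
When checking the CUDA requirement, compare versions numerically rather than as strings, since "11.10" sorts before "11.7" as text. The sketch below is a minimal, assumed helper (`parse_version`, `MIN_CUDA`, and `meets_cuda_requirement` are hypothetical names, not part of the project); the version string itself could come from, for example, `nvcc --version` or PyTorch's `torch.version.cuda`.

```python
def parse_version(text):
    """Turn '11.7' or '12.1.0' into a tuple of ints, e.g. (11, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum version taken from the requirement listed above.
MIN_CUDA = (11, 7)


def meets_cuda_requirement(version_text):
    """Return True if the given CUDA version string is 11.7 or newer."""
    return parse_version(version_text) >= MIN_CUDA


print(meets_cuda_requirement("11.10"))  # True: (11, 10) >= (11, 7)
print(meets_cuda_requirement("11.6"))   # False
```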

Highlighted Details

  • Unified interface for image chat, segmentation, and generation/editing.
  • Integrates LLaVA, SEEM, and GLIGEN models.
  • Utilizes LaMa for background hole filling.

Maintenance & Community

  • The project is associated with LLaVA-VL.
  • Related projects include LLaVA, SEEM, and GLIGEN.

Licensing & Compatibility

  • Licensed under the Apache License.
  • GLIGEN is MIT licensed.
  • Intended for non-commercial use only.
  • Subject to LLaMA model license, OpenAI Terms of Use, and ShareGPT Privacy Practices.

Limitations & Caveats

The live demo website has been disabled as of June 10, 2024. The service is a research preview with limited safety measures and may generate offensive content. It is not intended for commercial use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
