LLaVA-Interactive-Demo by LLaVA-VL

All-in-one demo for image chat, segmentation, generation, and editing

Created 2 years ago

380 stars

Top 74.6% on SourcePulse

Project Summary

This project provides an all-in-one demonstration for LLaVA, enabling interactive image chat, segmentation, and generation/editing. It targets researchers and users interested in multimodal AI capabilities, offering a unified interface for complex visual tasks.

How It Works

LLaVA-Interactive combines several existing models: LLaVA for vision-language understanding (image chat), SEEM for segmentation, and GLIGEN for grounded text-to-image generation. Within one interface, users can converse about an image, segment objects in it, and generate or edit images from text prompts.

Quick Start & Requirements

Install via conda and pip.
Requires CUDA 11.7 or above.
Recommended for desktop computers.
Project Page
Demo (Note: Live demo disabled as of June 10, 2024)
Paper
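The only concrete install requirement listed is CUDA 11.7 or above, so a quick check before setup can save a failed install. The sketch below assumes the environment uses PyTorch, which the listing does not state explicitly. It reads `torch.version.cuda`, the CUDA version the installed PyTorch build was compiled against. That value is not the same as the driver version reported by `nvidia-smi`. The helper names are illustrative and are not part of the project.

```python
from typing import Optional, Tuple

# Minimum CUDA version taken from the listing's requirements (11.7 or above).
MIN_CUDA: Tuple[int, int] = (11, 7)


def parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string like "11.8" or "12.1" into (major, minor); None if unusable."""
    if not version:
        return None
    parts = version.split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        return None
    return (major, minor)


def meets_cuda_requirement(version: Optional[str],
                           minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """True if the version string is at least `minimum` (tuple comparison)."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed >= minimum


def installed_torch_cuda() -> Optional[str]:
    """CUDA version PyTorch was built against (assumes PyTorch is the framework).

    Returns None if torch is not installed or is a CPU-only build.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    found = installed_torch_cuda()
    print(f"PyTorch CUDA build: {found}; meets 11.7+: {meets_cuda_requirement(found)}")
```

Run it inside the conda environment created for the demo. A result of `None` means PyTorch is missing or was installed as a CPU-only build.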

Highlighted Details

Unified interface for image chat, segmentation, and generation/editing.
Integrates LLaVA, SEEM, and GLIGEN models.
Uses LaMa to fill holes in the image background (inpainting).

Maintenance & Community

The project is associated with LLaVA-VL.
Related projects include LLaVA, SEEM, and GLIGEN.

Licensing & Compatibility

Licensed under the Apache License.
GLIGEN is MIT licensed.
Intended for non-commercial use only.
Usage is also subject to the LLaMA model license, the OpenAI Terms of Use, and the ShareGPT Privacy Practices.

Limitations & Caveats

The live demo website is currently disabled. The service is a research preview with limited safety measures and may generate offensive content. It is not intended for commercial use.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

0 stars in the last 30 days

Explore Similar Projects

Comfyui_Comfly by ainewsto

ComfyUI extension for AI image/video generation workflows

Created 2 years ago

Updated 4 weeks ago

UltraPixel by catcathh

Research paper implementation for ultra-high-resolution image synthesis

Created 2 years ago

Updated 1 year ago

Starred by Jiaming Song (Chief Scientist at Luma AI).

Lumina-mGPT by Alpha-VLLM

Multimodal autoregressive model for vision and language tasks

Created 1 year ago

Updated 8 months ago

Osprey by CircleRadon

Research paper for pixel understanding via visual instruction tuning

Created 2 years ago

Updated 10 months ago

ima2-gen by lidge-jun

Iterative AI image generation studio

Created 2 months ago

Updated 1 day ago

comfyui-tooling-nodes by Acly

ComfyUI extension for external tooling integration

Created 2 years ago

Updated 2 weeks ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

RPG-DiffusionMaster by YangLing0818

Training-free paradigm for text-to-image generation/editing

Created 2 years ago

Updated 1 year ago

Starred by Pawel Garbacki (Cofounder of Fireworks AI), Andreas Jansson (Cofounder of Replicate), and 1 more.

Emu3 by baaivision

Multimodal model for vision-language understanding and generation

Created 1 year ago

Updated 6 months ago

Starred by Max Howell (Author of Homebrew), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

big-sleep by lucidrains

CLI tool for text-to-image generation

Created 5 years ago

Updated 4 years ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Shyamal Anadkat (Research Scientist at OpenAI), and 4 more.

VQGAN-CLIP by nerdyrodent

Local VQGAN+CLIP tool for text-to-image generation

Created 5 years ago

Updated 3 years ago

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Jianwei Yang (Research Scientist at Meta Superintelligence Lab), and 1 more.

Segment-Everything-Everywhere-All-At-Once by UX-Decoder

Multi-modal segmentation research paper

Created 3 years ago

Updated 1 year ago

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen) and Lysandre Debut (Chief Open-Source Officer at Hugging Face).

Qwen-Image by QwenLM

Image generation model with advanced text rendering

Created 11 months ago

Updated 5 months ago
