InternGPT by OpenGVLab

Interactive demo platform for showcasing AI models

created 2 years ago
3,216 stars

Top 15.3% on sourcepulse

Project Summary

InternGPT (iGPT) is an open-source platform designed for showcasing and interacting with various AI models, particularly in vision-centric tasks. It targets researchers and developers who want to easily demonstrate multimodal AI capabilities, offering an intuitive interface that combines language-based interaction with direct visual manipulation.

How It Works

iGPT uses a pointing-language-driven approach: users interact with chatbots such as ChatGPT not only through text but also by clicking, dragging, and drawing on images. This hybrid interaction model aims to improve communication efficiency and accuracy in vision-centric tasks. iGPT also adds an auxiliary control mechanism to improve the LLM's control capability, and it fine-tunes a large vision-language model (Husky) for high-quality multimodal dialogue.
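
To make the hybrid interaction idea concrete, here is a minimal sketch of how a single user turn might pair a text instruction with pointing gestures. The names `PointerEvent`, `Turn`, and `describe` are hypothetical and chosen only for illustration; they are not taken from the iGPT codebase, and the real project may represent gestures quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PointerEvent:
    """One gesture on the image (hypothetical structure, not iGPT's own)."""
    kind: str                       # "click", "drag", or "draw"
    points: list[tuple[int, int]]   # pixel coordinates, in gesture order


@dataclass
class Turn:
    """A user turn: free text plus zero or more pointing gestures."""
    text: str
    pointers: list[PointerEvent] = field(default_factory=list)

    def describe(self) -> str:
        # Flatten the turn into one string a language model could read.
        parts = [self.text]
        for event in self.pointers:
            coords = " -> ".join(f"({x}, {y})" for x, y in event.points)
            parts.append(f"[{event.kind} at {coords}]")
        return " ".join(parts)


turn = Turn("Remove this object", [PointerEvent("click", [(120, 84)])])
print(turn.describe())  # Remove this object [click at (120, 84)]
```

The point of the sketch is that a click resolves "this object" without the user having to describe the object's position in words.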

Quick Start & Requirements

  • Install/Run: python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 -e
  • Prerequisites: CUDA-enabled GPUs are required for most features. Models can be loaded selectively; for example, to run only DragGAN: python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https -e
  • Resources: Running all features requires loading numerous models, demanding substantial GPU memory. Selective loading is recommended for specific tasks.
  • Docs: INSTALL.md
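
The --load value in the commands above appears to be a comma-separated list of NAME_DEVICE entries. The sketch below shows one way to split such a string into a model-to-device mapping and to rebuild a selective-loading command from it. Both `parse_load_spec` and `build_command` are hypothetical helpers written for this summary, and the NAME_DEVICE reading is inferred from the two example commands rather than from the project's source.

```python
from __future__ import annotations


def parse_load_spec(spec: str) -> dict[str, str]:
    """Split 'Name_device,Name_device' into {name: device}.

    Assumes the device is everything after the last underscore,
    as in 'HuskyVQA_cuda:0'. This is an inference from the examples.
    """
    models: dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected NAME_DEVICE, got {entry!r}")
        models[name] = device
    return models


def build_command(models: dict[str, str], port: int = 3456) -> list[str]:
    """Rebuild an argument list using only flags shown in the examples."""
    load = ",".join(f"{name}_{device}" for name, device in models.items())
    return ["python", "-u", "app.py", "--load", load, "--port", str(port), "-e"]


spec = "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0"
models = parse_load_spec(spec)
# Keep only the models needed for one task to reduce GPU memory use.
subset = {name: dev for name, dev in models.items() if name == "HuskyVQA"}
print(build_command(subset))
```

Filtering the mapping before building the command mirrors the selective-loading advice: fewer entries in --load means fewer models resident in GPU memory.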

Highlighted Details

  • Supports DragGAN for interactive image editing.
  • Integrates ImageBind for audio-conditioned image generation.
  • Enables multimodal dialogue with images, including visual question answering and object manipulation.
  • Features include interactive image editing, generation from scribbles, OCR, and action recognition.

Maintenance & Community

The project describes itself as under construction, with ongoing updates, and welcomes community contributions. A WeChat group is linked for discussion.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The online demo is suspended for what the project calls emergency reasons, so full functionality requires local deployment. The project is still under construction, and its roadmap lists planned features and model integrations.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 90 days
