Intelligent system for visual foundation model control via LLM
GPT4Tools is an intelligent system designed to enable conversational interaction with images by automatically selecting, controlling, and utilizing various visual foundation models. It targets users who need to perform image-related tasks within a conversational context, offering a unified interface for diverse visual operations.
How It Works
GPT4Tools leverages a Vicuna-based Large Language Model (LLM) fine-tuned on 71K instruction-following samples built via self-instruction. The core approach has the LLM analyze the conversational content to decide, on the fly, which visual foundation model (tool) to invoke and how to control it. This fine-tuning teaches the LLM to operate a suite of 22 integrated visual tools, enabling image manipulation and analysis directly within a conversation.
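In practice, tool use reduces to a dispatch step: the fine-tuned LLM emits a response that names a tool and its input, and the surrounding system parses that response and routes it to the corresponding visual model. The sketch below only illustrates the idea; the tool names, the Action/Action Input format, and the parsing logic are simplified assumptions, not the project's actual code.

import re

# Hypothetical tool registry: callables stand in for visual foundation
# models such as image captioning or text-to-image generation.
TOOLS = {
    "Image Captioning": lambda image_path: f"a caption for {image_path}",
    "Generate Image From Text": lambda prompt: "generated.png",
}

def dispatch(llm_output: str) -> str:
    # Parse an action line such as:
    #   "Action: Image Captioning, Action Input: image/demo.png"
    match = re.search(r"Action:\s*(.+?),\s*Action Input:\s*(.+)", llm_output)
    if not match:
        return llm_output  # no tool call; answer the user directly
    tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"
    return tool(tool_input)

print(dispatch("Action: Image Captioning, Action Input: image/demo.png"))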
Quick Start & Requirements
pip install -r requirements.txt
Download the Vicuna base weights (e.g., lmsys/vicuna-13b-v1.5) and the GPT4Tools LoRA weights. Additional visual model weights (e.g., Stable Diffusion, BLIP, ControlNet) may need to be downloaded separately.
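As a rough illustration of how the released LoRA weights relate to the base model, they can be applied on top of Vicuna with Hugging Face transformers and peft along these lines; the paths below are placeholders, and the repository's own loading script should be treated as authoritative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "lmsys/vicuna-13b-v1.5"        # base Vicuna weights
lora_weights_path = "path/to/gpt4tools-lora"   # downloaded LoRA adapter (placeholder)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the instruction-tuned GPT4Tools adapter to the base model.
model = PeftModel.from_pretrained(base_model, lora_weights_path)
model.eval()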
Maintenance & Community
Releases added support for Vicuna-v1.5 and new demos, and the key contributors are listed as authors of the associated paper; however, the repository has seen no updates for roughly a year and currently appears inactive.
Licensing & Compatibility
The project releases LoRA weights to comply with the LLaMA model license. Compatibility with commercial or closed-source applications would depend on the underlying LLaMA and Vicuna licenses.
Limitations & Caveats
The system relies on specific versions of Vicuna and requires careful management of model and tool weight downloads. Running the 13B LLM alongside multiple visual tools is memory-intensive, and the project's multi-GPU setup guidance points to significant hardware requirements for good performance.
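One common way to cope with that memory footprint is to spread the language model and the individual visual tools across devices. The snippet below is only an illustrative placement plan, not the project's configuration format; the tool names and device indices are assumptions.

import torch

def pick_device(index: int) -> str:
    # Fall back to CPU when the requested GPU does not exist.
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return f"cuda:{index}"
    return "cpu"

# Illustrative plan: keep the 13B LLM on one GPU and push heavier
# visual tools (e.g., Stable Diffusion, ControlNet) onto others.
device_plan = {
    "llm": pick_device(0),
    "image_captioning": pick_device(1),
    "text_to_image": pick_device(2),
    "controlnet": pick_device(3),
}
print(device_plan)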