GPT4Tools by AILab-CVC

Intelligent system for visual foundation model control via LLM

Created 2 years ago
773 stars

Top 45.2% on SourcePulse

Project Summary

GPT4Tools is an intelligent system designed to enable conversational interaction with images by automatically selecting, controlling, and utilizing various visual foundation models. It targets users who need to perform image-related tasks within a conversational context, offering a unified interface for diverse visual operations.

How It Works

GPT4Tools uses a Vicuna-based large language model (LLM) fine-tuned on 71K self-built instruction-following examples. The LLM analyzes the conversation to decide which visual foundation model (tool) to invoke and what input to pass to it. Fine-tuning on this self-instruct data teaches the LLM to use a suite of 22 integrated visual tools, so image manipulation and analysis can happen inside the conversation.
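
The select-then-invoke loop described above can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes the LLM replies in a ReAct-style format with "Action:" and "Action Input:" lines, and the tool names and functions in the registry are hypothetical placeholders.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical tool registry: these names and functions are placeholders,
# not the real GPT4Tools tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Image Captioning": lambda path: f"caption for {path}",
    "Edge Detection": lambda path: f"edges of {path}",
}

# Assumed reply format (ReAct-style); the actual prompt template may differ.
ACTION_RE = re.compile(r"Action:\s*(?P<tool>.+?)\s*\nAction Input:\s*(?P<arg>.+)")


def parse_action(reply: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input) if the reply requests a tool, else None."""
    match = ACTION_RE.search(reply)
    if match is None:
        return None
    return match.group("tool").strip(), match.group("arg").strip()


def run_turn(reply: str) -> str:
    """Dispatch to the requested tool, or pass a plain answer through unchanged."""
    action = parse_action(reply)
    if action is None:
        return reply.strip()
    tool_name, tool_input = action
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"
    return tool(tool_input)
```

In this sketch the LLM's only job is to emit a tool name and an input string. Everything after that is an ordinary dictionary lookup, which is why swapping or adding tools does not require changing the dispatch code.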

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Requires downloading Vicuna base models (e.g., lmsys/vicuna-13b-v1.5) and GPT4Tools LoRA weights. Additional visual model weights (e.g., Stable Diffusion, BLIP, ControlNet) may need to be downloaded.
  • Resources: The demo script suggests configurations for 1 or 4 GPUs, with specific tool assignments to CUDA devices. Fine-tuning requires DeepSpeed and significant computational resources.
  • Links: Project Page, Online Demo, Dataset.
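
The per-tool GPU assignment mentioned under Resources can be sketched as a small parser. This is an illustration under assumptions, not the demo script's code: it assumes assignments are written as a comma-separated string of Tool_device entries (for example "ImageCaptioning_cuda:0"), and the function name and tool names are hypothetical.

```python
from typing import Dict


def parse_tool_devices(spec: str) -> Dict[str, str]:
    """Parse an assumed 'Tool_device,Tool_device' string into {tool: device}."""
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        # Split on the last underscore so the device keeps its 'cuda:N' form.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected 'Tool_device', got {entry!r}")
        assignments[name] = device
    return assignments
```

With such a mapping, a single-GPU setup would place every tool on the same device, while a four-GPU setup could spread the heavier models across devices.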

Highlighted Details

  • Supports 22 integrated visual tools, including image captioning, VQA, segmentation, inpainting, and ControlNet variations.
  • Offers a flexible and extensible architecture allowing users to add new tools or replace existing LLMs.
  • Provides 71K self-instruct examples for fine-tuning and for adapting models via LoRA.
  • Paper accepted at NeurIPS 2023.

Maintenance & Community

Past releases added support for Vicuna-v1.5 and new demos, but the health check data for this project shows no recent activity. Key contributors are listed as authors of the associated paper.

Licensing & Compatibility

The project distributes its fine-tuned model as LoRA weights to comply with the LLaMA model license. Whether it can be used in commercial or closed-source applications depends on the underlying LLaMA and Vicuna licenses.

Limitations & Caveats

The system relies on specific versions of Vicuna and requires careful management of model and tool weight downloads. The multi-GPU setup advice indicates a significant hardware requirement for optimal performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
