TaskMatrix by chenfei-wu

Visual ChatGPT connects LLMs to visual foundation models

Created 3 years ago

34,255 stars

Top 1.0% on SourcePulse

Project Summary

TaskMatrix enables ChatGPT to interact with visual foundation models, allowing users to send and receive images during conversations. It targets researchers and power users seeking to integrate diverse AI capabilities, offering a unified interface for complex visual tasks.

How It Works

TaskMatrix uses a template system: each template is a pre-defined execution flow that chains several foundation models. ChatGPT serves as the general conversational interface, while specialized models handle the domain-specific visual tasks. Because the models are composed rather than modified, complex operations such as image editing (detection, then segmentation, then inpainting) or image extension work without retraining any individual model.
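
To make the template idea concrete, here is a minimal sketch, not the project's actual code. It assumes a template can be modeled as an ordered list of step names, and it uses hypothetical stand-in functions (`detect`, `segment`, `inpaint`) that pass a string along instead of real image data. The names `MODELS`, `IMAGE_EDITING_TEMPLATE`, and `run_template` are illustrative only.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for visual foundation models. Each one receives a
# string describing the current state and appends the step it performed.
def detect(state: str) -> str:
    return state + " -> detected"

def segment(state: str) -> str:
    return state + " -> segmented"

def inpaint(state: str) -> str:
    return state + " -> inpainted"

# Registry mapping a step name to the callable that performs it (assumed design).
MODELS: Dict[str, Callable[[str], str]] = {
    "detect": detect,
    "segment": segment,
    "inpaint": inpaint,
}

# A "template" is modeled here as an ordered list of step names.
IMAGE_EDITING_TEMPLATE: List[str] = ["detect", "segment", "inpaint"]

def run_template(template: List[str], state: str) -> str:
    """Run each step in order, feeding every output into the next step."""
    for step in template:
        state = MODELS[step](state)
    return state

result = run_template(IMAGE_EDITING_TEMPLATE, "image.png")
print(result)  # image.png -> detected -> segmented -> inpainted
```

In this sketch, adding a new flow means defining another list of step names; none of the stand-in models changes, which mirrors the "no retraining" point above.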

Quick Start & Requirements

Install: Clone the repository, create a conda environment (conda create -n visgpt python=3.8), activate it (conda activate visgpt), install requirements (pip install -r requirements.txt), and install GroundingDINO and segment-anything from their respective Git repositories.
Prerequisites: Python 3.8, Conda, and an OpenAI API key. Loading many models at once may require multiple GPUs; CUDA 12 is recommended for various models.
Start: Run python visual_chatgpt.py --load <model_name>_<device>[,...], where each comma-separated entry assigns one model to one device.
Docs: System Architecture, Demo
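
The --load argument format can be illustrated with a short parsing sketch. This is an assumption about how such a string could be interpreted, not the script's confirmed implementation: `parse_load_spec` is a hypothetical helper, and the model names and devices in the example string are illustrative. It splits each entry at the last underscore, which assumes device names (such as cpu or cuda:0) never contain one.

```python
from typing import Dict

def parse_load_spec(spec: str) -> Dict[str, str]:
    """Turn 'ModelA_device,ModelB_device' into {'ModelA': 'device', ...}."""
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        # Split at the LAST underscore: left side is the model, right the device.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected <model_name>_<device>, got {entry!r}")
        assignments[name] = device
    return assignments

# Illustrative values only; check the README for the supported model names.
assignments = parse_load_spec("ImageCaptioning_cuda:0,Text2Image_cpu")
print(assignments)
```

A mapping like this makes it easy to see at a glance which models share a GPU before committing memory to them.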

Highlighted Details

Supports Chinese language input.
Enables custom template creation for new execution flows.
Integrates with models like GroundingDINO, Segment Anything, and Stable Diffusion.
Provides GPU memory usage estimates for various foundation models.

Maintenance & Community

The README credits several open-source projects, including Hugging Face, LangChain, and Stable Diffusion. Report problems through GitHub issues; for other matters, contact Chenfei WU or Nan DUAN.

Licensing & Compatibility

The README does not state the project's license explicitly. It acknowledges dependencies on various open-source projects and notes that users must comply with the licenses of the recommended models.

Limitations & Caveats

The README presents the recommended models as examples for research use. Users must comply with each model's license, and Microsoft accepts no liability for infringement of third-party rights.

Health Check

Last Commit: 2 years ago

Responsiveness: Inactive

Star History

11 stars in the last 30 days