TaskMatrix by chenfei-wu

Visual ChatGPT connects LLMs to visual foundation models

created 2 years ago
34,411 stars

Top 1.0% on sourcepulse

Project Summary

TaskMatrix connects ChatGPT to visual foundation models so that users can send and receive images during a conversation. It targets researchers and power users who want a single chat interface for complex, multi-step visual tasks.

How It Works

TaskMatrix uses a template system in which pre-defined execution flows orchestrate multiple foundation models. ChatGPT serves as the general-purpose interface, and specialized models handle domain-specific visual tasks. Complex operations such as image editing (detection, segmentation, inpainting) or image extension can therefore be composed without retraining any individual model.
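
The idea of an execution flow can be illustrated with a minimal sketch. It assumes a template is simply an ordered list of tool names and that each tool takes and returns a file path. The names Tool, TOOLS, EDIT_TEMPLATE, and run_template are hypothetical and do not come from the TaskMatrix code base, and the three functions are stand-ins for real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A foundation model wrapped behind a path-in, path-out call (hypothetical)."""
    name: str
    description: str
    run: Callable[[str], str]


def detect(image_path: str) -> str:
    # Stand-in for a detection model: pretend to write a bounding-box file.
    return image_path.replace(".png", "_box.png")


def segment(box_path: str) -> str:
    # Stand-in for a segmentation model: pretend to turn the box into a mask.
    return box_path.replace("_box.png", "_mask.png")


def inpaint(mask_path: str) -> str:
    # Stand-in for an inpainting model: pretend to fill the masked region.
    return mask_path.replace("_mask.png", "_edited.png")


TOOLS: Dict[str, Tool] = {
    "detect": Tool("detect", "Locate the target object", detect),
    "segment": Tool("segment", "Produce a mask for the object", segment),
    "inpaint": Tool("inpaint", "Fill the masked region", inpaint),
}

# Assumption: a template is an ordered list of tool names.
EDIT_TEMPLATE: List[str] = ["detect", "segment", "inpaint"]


def run_template(template: List[str], first_input: str) -> str:
    """Feed each tool's output into the next tool named in the template."""
    value = first_input
    for tool_name in template:
        value = TOOLS[tool_name].run(value)
    return value


result = run_template(EDIT_TEMPLATE, "image/cat.png")
print(result)  # image/cat_edited.png
```

In this sketch the models themselves are never modified; only the template decides the order in which they are chained, which is the property the paragraph above describes.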

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n visgpt python=3.8), activate it (conda activate visgpt), and install the requirements (pip install -r requirements.txt). Then install GroundingDINO and segment-anything from their respective Git repositories.
  • Prerequisites: Python 3.8, Conda, and an OpenAI API key. Loading many models at once may require multiple GPUs; CUDA 12 is recommended for several of the models.
  • Start: Run python visual_chatgpt.py --load <model_name>_<device>[,...], where each comma-separated entry assigns one model to one device.
  • Docs: System Architecture, Demo
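
The --load value follows the pattern <model_name>_<device>[,...]. The sketch below shows one way such a string could be split into model-to-device assignments. The helper parse_load_spec is hypothetical and is not taken from visual_chatgpt.py, and the model names in the example are illustrative.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Turn 'ModelA_cuda:0,ModelB_cpu' into {'ModelA': 'cuda:0', 'ModelB': 'cpu'}."""
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the last underscore so the device part stays intact.
        model_name, sep, device = entry.rpartition("_")
        if not sep or not model_name or not device:
            raise ValueError(f"expected <model_name>_<device>, got {entry!r}")
        assignments[model_name] = device
    return assignments


# Illustrative model names; substitute the ones you actually want to load.
plan = parse_load_spec("ImageCaptioning_cuda:0,Text2Image_cpu")
print(plan)  # {'ImageCaptioning': 'cuda:0', 'Text2Image': 'cpu'}
```

Validating the string before launch makes a malformed entry fail immediately rather than after the first models have already been loaded.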

Highlighted Details

  • Supports Chinese language input.
  • Enables custom template creation for new execution flows.
  • Integrates with models like GroundingDINO, Segment Anything, and Stable Diffusion.
  • Provides GPU memory usage estimates for various foundation models.
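
Per-model memory estimates can be used to decide which model goes on which device. The sketch below is a simple first-fit placement that emits a string in the <model_name>_<device> form. The function assign_devices is hypothetical, and the model names and memory figures are placeholders rather than the README's values.

```python
from typing import Dict, List


def assign_devices(model_mem_mb: Dict[str, int], gpu_capacity_mb: List[int]) -> str:
    """Place the largest models first; anything that fits on no GPU goes to the CPU."""
    free = list(gpu_capacity_mb)  # remaining memory per GPU, in MB
    parts = []
    ordered = sorted(model_mem_mb.items(), key=lambda kv: kv[1], reverse=True)
    for name, need in ordered:
        device = "cpu"
        for gpu_index, available in enumerate(free):
            if need <= available:
                free[gpu_index] -= need
                device = f"cuda:{gpu_index}"
                break
        parts.append(f"{name}_{device}")
    return ",".join(parts)


# Placeholder estimates in MB (assumed values, not measured ones).
estimates = {"ModelA": 4000, "ModelB": 3000, "ModelC": 1500}
load_arg = assign_devices(estimates, gpu_capacity_mb=[5000, 3500])
print(load_arg)  # ModelA_cuda:0,ModelB_cuda:1,ModelC_cpu
```

First-fit is not optimal, but it is easy to reason about, and it leaves headroom decisions to the capacities passed in.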

Maintenance & Community

The project acknowledges contributions from various individuals and projects, including Hugging Face, LangChain, and Stable Diffusion. Bug reports and questions should go through GitHub issues; for other matters, contact Chenfei WU or Nan DUAN.

Licensing & Compatibility

The README does not explicitly state the project's license. It does acknowledge dependencies on several open-source projects and notes that users must comply with the licenses of the recommended models.

Limitations & Caveats

The README presents the recommended models as research examples only. Users must comply with each model's license, and Microsoft accepts no liability for infringement of third-party rights.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 48 stars in the last 90 days
