Visual ChatGPT connects LLMs to visual foundation models
TaskMatrix enables ChatGPT to interact with visual foundation models, allowing users to send and receive images during conversations. It targets researchers and power users seeking to integrate diverse AI capabilities, offering a unified interface for complex visual tasks.
How It Works
TaskMatrix leverages a template system where pre-defined execution flows orchestrate multiple foundation models. ChatGPT acts as a general interface, while specialized models handle domain-specific visual tasks. This approach allows for complex operations like image editing (detection, segmentation, inpainting) or image extension without retraining individual models.
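The orchestration idea can be illustrated with a minimal sketch. Everything here is hypothetical: the tool functions, the `TOOLS` and `TEMPLATES` tables, and `run_template` are stand-ins invented for illustration, not TaskMatrix's actual classes or API. Strings stand in for images so the sketch runs without any model weights.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for visual foundation models. Each takes and returns
# a string describing the image state instead of real pixel data.
def detect(image: str) -> str:
    return f"{image}|boxes"

def segment(image: str) -> str:
    return f"{image}|mask"

def inpaint(image: str) -> str:
    return f"{image}|inpainted"

# Registry mapping tool names to callables (assumed structure).
TOOLS: Dict[str, Callable[[str], str]] = {
    "detect": detect,
    "segment": segment,
    "inpaint": inpaint,
}

# A "template" in this sketch is an ordered list of tool names (an assumption).
TEMPLATES: Dict[str, List[str]] = {
    "replace_object": ["detect", "segment", "inpaint"],
}

def run_template(name: str, image: str) -> str:
    """Run each step of the named template in order, feeding outputs forward."""
    result = image
    for step in TEMPLATES[name]:
        result = TOOLS[step](result)
    return result

output = run_template("replace_object", "cat.png")
print(output)
```

Because each step only consumes the previous step's output, a new flow can be composed by adding an entry to the template table, with no change to the individual models.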
Quick Start & Requirements
Create a conda environment (conda create -n visgpt python=3.8), activate it (conda activate visgpt), install requirements (pip install -r requirements.txt), and install GroundingDINO and segment-anything from their respective Git repositories.
Run python visual_chatgpt.py --load <model_name>_<device>[,...] to specify model and device assignments.
Highlighted Details
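The value passed to --load is a comma-separated list of <model_name>_<device> pairs. The following is a minimal sketch of how such a string could be split into a mapping; `parse_load_spec` is a hypothetical helper, not a function confirmed by the README, and the model names in the example are placeholders. It assumes model names contain no underscore, so the first underscore separates the name from the device.

```python
from typing import Dict

def parse_load_spec(spec: str) -> Dict[str, str]:
    """Split '<model_name>_<device>[,...]' into {model_name: device}.

    Hypothetical helper: assumes model names contain no underscore.
    """
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or empty segments
        name, sep, device = entry.partition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected <model_name>_<device>, got {entry!r}")
        assignments[name] = device
    return assignments

# Placeholder model names and devices, chosen only for illustration.
example = parse_load_spec("ImageCaptioning_cuda:0, Text2Image_cpu")
print(example)
```

Splitting on the first underscore keeps device strings such as cuda:0 intact, since the colon is never treated as a separator.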
Maintenance & Community
The project acknowledges contributions from various individuals and projects, including Hugging Face, LangChain, and Stable Diffusion. Problems should be reported through GitHub issues; for other communications, contact Chenfei WU or Nan DUAN.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. However, it acknowledges dependencies on various open-source projects and notes that users must comply with the licenses of the recommended models. Microsoft disclaims liability for third-party rights infringement.
Limitations & Caveats
The README states that recommended models are examples for research and users must comply with individual model licenses. Microsoft is not liable for infringement of third-party rights.