AI FaceTime chat with LLM-enhanced personas (research paper)
ChatAnything enables interactive FaceTime-like chats with LLM-driven personas, allowing users to assign visual appearances to any concept. This project targets users interested in creative AI applications and offers a novel way to visualize and interact with AI agents.
How It Works
The pipeline integrates multiple open-source models for animation and chat. Based on the user-defined concept and desired persona, an LLM selects an initial image generator (a Stable Diffusion derivative with ControlNet) and a text-to-speech (TTS) voice. The selected models then generate a visual representation and a voice for the AI agent, enabling interactive conversations.
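A minimal sketch of what this selection step could look like, assuming a hypothetical pick_persona helper and model registry; the identifiers, prompt wording, and model names below are illustrative, not ChatAnything's actual API:

```python
# Illustrative sketch of the LLM-driven selection step. The registry keys,
# prompt wording, and helper name are assumptions, not ChatAnything's API.
import json
from openai import OpenAI  # any chat-capable LLM client would work here

IMAGE_GENERATORS = {
    "realistic": "sd-1.5-realistic + face-landmark ControlNet",
    "anime": "sd-1.5-anime + face-landmark ControlNet",
}
TTS_VOICES = {"warm": "en-US-AriaNeural", "deep": "en-US-GuyNeural"}

def pick_persona(concept: str) -> dict:
    """Ask the LLM to map a free-form concept to an image style and a voice."""
    client = OpenAI()
    prompt = (
        f"Concept: {concept}\n"
        f"Pick one style from {list(IMAGE_GENERATORS)} and one voice from "
        f'{list(TTS_VOICES)}. Answer as JSON: {{"style": ..., "voice": ...}}'
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    choice = json.loads(reply.choices[0].message.content)
    return {
        "image_model": IMAGE_GENERATORS[choice["style"]],
        "tts_voice": TTS_VOICES[choice["voice"]],
    }
```

In the actual pipeline, the chosen generator is paired with a face-landmark ControlNet so the produced portrait remains animatable.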
Quick Start & Requirements
Create the environment with conda env create -f environment.yaml, and keep it current with conda env update --name chatanything --file environment.yaml. Then run python python_scripts/prepare_models.py to download the necessary models.
Highlighted Details
Uses edge-tts for voice generation, with experimental support for custom voice cloning pipelines.
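Since the voice module builds on edge-tts, a few lines of that package's standard API suffice to synthesize a spoken reply. This is a generic usage sketch; the voice name and output path are examples, not the project's actual wiring:

```python
# Generic edge-tts usage; the voice and filename here are examples,
# not values taken from ChatAnything itself.
import asyncio
import edge_tts

async def speak(text: str, voice: str = "en-US-AriaNeural") -> str:
    out_path = "persona_reply.mp3"
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)  # streams audio from the Edge TTS service
    return out_path

if __name__ == "__main__":
    asyncio.run(speak("Hello! I am your talking persona."))
```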
Maintenance & Community
The project is actively developed, with recent updates adding support for open-source LLMs such as LLaMA. Milestone logs indicate ongoing work on the face-rendering and TTS modules. Community contributions are encouraged.
Licensing & Compatibility
The project relies on various open-source components, each with its own license. The primary license for the ChatAnything code itself is not explicitly stated in the README, but it heavily leverages and acknowledges numerous open-source libraries. Compatibility for commercial use would require careful review of all underlying dependencies.
Limitations & Caveats
Currently, image generation is limited to Stable Diffusion v1.5 derivatives; SDXL support is under consideration but not yet implemented due to the lack of a suitable face-landmark ControlNet. The Docker build is noted as not fully tested.