ChatAnything by zhoudaquan

AI Facetime chat with LLM-enhanced personas (research paper)

created 1 year ago
382 stars

Top 75.9% on sourcepulse

Project Summary

ChatAnything enables interactive FaceTime-like chats with LLM-driven personas, allowing users to assign visual appearances to any concept. This project targets users interested in creative AI applications and offers a novel way to visualize and interact with AI agents.

How It Works

The pipeline integrates multiple open-source models for animation and chat. It uses an LLM to select an initial image generator (Stable Diffusion derivatives with ControlNet) and a Text-to-Speech (TTS) model based on user-defined concepts and desired persona. The selected models then generate a visual representation and voice for the AI agent, enabling interactive conversations.
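The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the model catalogues, the prompt wording, the Persona class, and the helper names (build_prompt, choose, select_persona, fake_llm) are all hypothetical, and the LLM is stood in for by a plain callable so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical catalogues: names and descriptions are illustrative only,
# not the models ChatAnything actually ships with.
IMAGE_MODELS: Dict[str, str] = {
    "realistic": "photorealistic portraits",
    "cartoon": "stylised cartoon characters",
}
TTS_VOICES: Dict[str, str] = {
    "en-US-female": "warm female English voice",
    "en-US-male": "calm male English voice",
}


@dataclass
class Persona:
    concept: str
    image_model: str
    voice: str


def build_prompt(concept: str, options: Dict[str, str], kind: str) -> str:
    """Build a prompt asking the LLM to pick one option by name."""
    lines = [f"Pick the best {kind} for the concept '{concept}'.", "Options:"]
    lines += [f"- {name}: {desc}" for name, desc in options.items()]
    lines.append("Answer with the option name only.")
    return "\n".join(lines)


def choose(llm: Callable[[str], str], concept: str,
           options: Dict[str, str], kind: str) -> str:
    """Ask the LLM to choose; fall back to the first option on a bad reply."""
    reply = llm(build_prompt(concept, options, kind)).strip()
    return reply if reply in options else next(iter(options))


def select_persona(llm: Callable[[str], str], concept: str) -> Persona:
    """Select an image model and a voice for the given concept."""
    return Persona(
        concept=concept,
        image_model=choose(llm, concept, IMAGE_MODELS, "image model"),
        voice=choose(llm, concept, TTS_VOICES, "voice"),
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: replies with a bare name)."""
    return "cartoon" if "image model" in prompt else "en-US-male"


persona = select_persona(fake_llm, "a talking teapot")
print(persona)
```

Validating the reply against the known option names, with a fallback, keeps an unexpected LLM answer from propagating into the image or voice stage.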

Quick Start & Requirements

  • Install with conda: run conda env create -f environment.yaml, then conda env update --name chatanything --file environment.yaml.
  • Run python python_scripts/prepare_models.py to download necessary models.
  • Supports local LLMs via FastChat, which requires a separate environment because its Python version is incompatible with ChatAnything's Python 3.8.10. Running a 7B model locally needs about 14GB of GPU memory.
  • Official documentation and technical report are available.
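
The ~14GB figure for a 7B model is consistent with storing the weights at 2 bytes per parameter (16-bit precision). That precision is an assumption here, since the source does not state it, and the helper name weights_memory_gb is hypothetical. A back-of-the-envelope check:

```python
def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed for model weights alone, in GB (10^9 bytes).

    Assumption: 2 bytes per parameter (fp16/bf16). Activations, the KV cache,
    and framework overhead are not included, so real usage will be higher.
    """
    return num_params * bytes_per_param / 1e9


print(weights_memory_gb(7e9))     # 14.0 (16-bit weights)
print(weights_memory_gb(7e9, 4))  # 28.0 (32-bit weights)
```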

Highlighted Details

  • Supports custom image generation models (Stable Diffusion v1.5 derivatives) and LoRAs via configuration.
  • Integrates edge-tts for voice generation, with experimental support for custom voice cloning pipelines.
  • Allows configuration of LLM prompts for model selection and persona generation.
  • Offers Docker build instructions for easier deployment.

Maintenance & Community

The project is actively developed, with recent updates adding support for open-source LLMs like LLaMA. Milestone logs indicate ongoing work on face rendering and TTS modules. Community contributions are encouraged.

Licensing & Compatibility

The project relies on various open-source components, each with its own license. The README does not explicitly state a license for the ChatAnything code itself, although it acknowledges the numerous open-source libraries it builds on. Commercial use would require a careful review of the licenses of all underlying dependencies.

Limitations & Caveats

Currently, image generation is limited to Stable Diffusion v1.5 derivatives; SDXL support is under consideration but not yet implemented due to the lack of a suitable face-landmark ControlNet. The Docker build is noted as not fully tested.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
