marketing_creator_pro_max_backend by libn-net

Backend API for AI digital human cloning and short video generation

Created 7 months ago · 454 stars · Top 67.5% on sourcepulse

Project Summary

This project provides a backend API for an AI digital human creation platform, aimed at individuals and businesses that want to use AI for marketing, customer acquisition, and content generation. It enables high-fidelity cloning of digital humans and voices, short video generation, AI dubbing, and AI subtitles, with the stated goal of helping users avoid the pitfalls of costly marketing agencies.

How It Works

The backend integrates several open-source AI models for its core functionalities. It utilizes a digital human cloning module (likely based on Ultralight) for visual replication, a voice cloning module (likely based on fish-speech) for audio replication, and various text-to-video and text-to-speech pipelines. The architecture supports modularity, allowing different AI models (e.g., Wav2Lip for lip-sync) to be plugged in, and is designed to be scalable for various deployment scenarios including web, H5, and mini-programs.
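
The plug-in arrangement described above can be pictured with a small interface sketch. This is purely illustrative: the class and function names (LipSyncModel, make_lipsync_backend) are hypothetical and not the project's actual API.

```python
from abc import ABC, abstractmethod


class LipSyncModel(ABC):
    """Common interface so different lip-sync backends are interchangeable."""

    @abstractmethod
    def synthesize(self, face_video: str, audio: str, output: str) -> str:
        """Render a talking-head video from a face video and an audio track."""


class UltralightBackend(LipSyncModel):
    def synthesize(self, face_video: str, audio: str, output: str) -> str:
        # Delegate to the Ultralight digital-human module here.
        raise NotImplementedError


class Wav2LipBackend(LipSyncModel):
    def synthesize(self, face_video: str, audio: str, output: str) -> str:
        # Delegate to the wav2lip-onnx module here.
        raise NotImplementedError


def make_lipsync_backend(name: str) -> LipSyncModel:
    """Pick a backend by name, e.g. from a config or .env setting."""
    backends = {"ultralight": UltralightBackend, "wav2lip": Wav2LipBackend}
    return backends[name]()
```

Under a design like this, swapping Ultralight for Wav2Lip becomes a configuration choice rather than a code change.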

Quick Start & Requirements

  • Install/Run: Start the backend API with uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload from within the backend conda environment.
  • Prerequisites:
    • NVIDIA GPU with drivers, CUDA (>=11.7 implied by PyTorch install), and cuDNN.
    • Python 3.9 for backend, 3.10 for external modules (fish-speech, ultralight), 3.8 for wav2lip-onnx.
    • The project relies on Conda to manage its multiple environments.
    • FFmpeg installed and added to PATH.
    • Specific model checkpoints and data files need to be downloaded and placed in designated directories.
    • .env file configuration for project paths and cloud storage (OSS).
  • Setup: Requires setting up multiple conda environments and downloading external model weights, which can be time-consuming.
  • Docs: API documentation is available at http://127.0.0.1:8000/docs after startup (see the smoke-test sketch after this list).
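
Once the environments, model weights, and .env file are in place, a minimal smoke test can confirm the configuration loads and the API responds. The sketch below assumes python-dotenv and requests are installed; PROJECT_PATH is a hypothetical key standing in for whatever variables your .env actually defines.

```python
import os

import requests
from dotenv import load_dotenv  # pip install python-dotenv requests

# Load the project's .env file (paths, OSS credentials, etc.) into the
# process environment.
load_dotenv()

# "PROJECT_PATH" is a hypothetical key; check whatever keys your .env defines.
print("PROJECT_PATH set:", bool(os.getenv("PROJECT_PATH")))

# With the backend running (uvicorn app.main:app --host 0.0.0.0 --port 8000),
# the auto-generated Swagger UI should respond at /docs.
resp = requests.get("http://127.0.0.1:8000/docs", timeout=5)
print("API reachable:", resp.status_code == 200)
```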

Highlighted Details

  • Supports high-fidelity digital human and voice cloning.
  • Offers a comprehensive suite of AI content generation tools: video, dubbing, subtitles, copywriting.
  • Modular design allows integration of different AI models (Ultralight, fish-speech, Wav2Lip).
  • Includes features for short video publishing to platforms like Douyin, Kuaishou, etc.
  • AI digital human live streaming is in development (20% complete).

Maintenance & Community

The project acknowledges several open-source contributors and lists specific GitHub users. It encourages questions via group chats or issues, and prioritizes support for veterans, the unemployed, and stay-at-home moms.

Licensing & Compatibility

The README does not explicitly state a license for the backend API itself. However, it heavily relies on and integrates other open-source projects, each with its own license (e.g., LGPL/GPL for FFmpeg, MIT for some Python libraries). Compatibility for commercial use or closed-source linking would require careful review of all constituent project licenses.

Limitations & Caveats

Several features are marked as "to be released" or in early development (e.g., live streaming, AI private domain transactions, AI super sales). The setup process involves managing multiple complex environments and downloading large model files, which can be challenging. The project's origin story highlights potential instability and the need for technical expertise.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 115 stars in the last 90 days
