FunAudioLLM-APP by FunAudioLLM

AI app for voice chat and translation

Created 1 year ago

377 stars

Top 75.5% on SourcePulse

Project Summary

This project provides two applications: Voice Chat for interactive AI dialogues and Voice Translation for real-time language conversion. It targets users seeking to integrate advanced speech understanding and generation models into their workflows, offering a more natural and accessible way to interact with AI and overcome language barriers.

How It Works

The applications use pre-trained models from CosyVoice and SenseVoice, which are included as Git submodules. Voice Chat enables natural conversations, while Voice Translation converts spoken language on the fly. Both applications need an external API token (a DashScope token, passed through the DS_API_TOKEN environment variable) and a GPU selected through CUDA_VISIBLE_DEVICES.
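As a rough illustration of the configuration described above, the following Python sketch checks that the two environment variables used in the usage commands are set before anything is launched. The helper name `missing_settings` is hypothetical and not part of the project. Treating CUDA_VISIBLE_DEVICES as mandatory is also an assumption made here for clarity: CUDA itself makes all GPUs visible when the variable is unset.

```python
from __future__ import annotations

from typing import Mapping

# Variable names as they appear in the usage commands of this summary.
# Requiring CUDA_VISIBLE_DEVICES is an assumption of this sketch.
REQUIRED_VARS = ("DS_API_TOKEN", "CUDA_VISIBLE_DEVICES")


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or blank in `env`.

    Hypothetical pre-flight helper; not part of FunAudioLLM-APP.
    """
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings(os.environ)` before starting an app gives a list that can be printed as a readable error instead of failing later inside the model or API code.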

Quick Start & Requirements

Install: Clone the repository with submodules (git clone --recursive URL) and run pip install -r requirements.txt.
Prerequisites: a DashScope API token, a .pem file, and a CUDA-enabled GPU (selected with, e.g., CUDA_VISIBLE_DEVICES="0"). The CosyVoice and SenseVoice submodules also need their own environments set up.
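Usage (the two commands are identical as listed; each is presumably run from its own application's directory):
- Voice Chat: sudo CUDA_VISIBLE_DEVICES="0" DS_API_TOKEN="YOUR-DS-API-TOKEN" python app.py
- Voice Translation: sudo CUDA_VISIBLE_DEVICES="0" DS_API_TOKEN="YOUR-DS-API-TOKEN" python app.py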
Links: FunAudioLLM Homepage, CosyVoice Paper, CosyVoice repo, SenseVoice repo
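The usage commands above set two environment variables and then run app.py. A minimal Python sketch of the same launch is shown below, assuming the app is started from a directory the caller supplies. The function names `build_launch` and `launch` and the `app_dir` parameter are hypothetical, and the sketch leaves out the `sudo` prefix used in the original commands.

```python
from __future__ import annotations

import os
import subprocess
import sys
from typing import Mapping, Optional


def build_launch(
    token: str,
    device: str = "0",
    base_env: Optional[Mapping[str, str]] = None,
) -> tuple[list[str], dict[str, str]]:
    """Build the command and environment for starting app.py.

    Hypothetical helper: mirrors the variables in the usage commands,
    without the `sudo` prefix.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = device  # GPU index to expose to the app
    env["DS_API_TOKEN"] = token           # DashScope API token
    return [sys.executable, "app.py"], env


def launch(app_dir: str, token: str, device: str = "0") -> subprocess.CompletedProcess:
    """Run app.py inside `app_dir` (a path the caller must supply)."""
    cmd, env = build_launch(token, device)
    return subprocess.run(cmd, cwd=app_dir, env=env, check=True)
```

Keeping `build_launch` separate from `launch` makes the environment easy to inspect or log (with the token masked) before any process is started.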

Highlighted Details

Real-time voice chat and translation capabilities.
Integration with CosyVoice and SenseVoice models.
Requires an external API token (DashScope) to run.

Maintenance & Community

Information on maintainers, community channels, or roadmaps is not detailed in the provided README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not addressed.

Limitations & Caveats

The project depends on external services and API tokens, and its setup involves managing Git submodules, which can be complex. The README lacks details on licensing, community support, and limitations of the underlying models.
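One common submodule pitfall is cloning without `--recursive`, which leaves the submodule directories empty. The sketch below, offered as an illustration rather than as part of the project, parses the output of `git submodule status`, in which a leading `-` marks a submodule that has not been initialized. The function names are hypothetical, and the parser assumes submodule paths contain no spaces.

```python
from __future__ import annotations

import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths of submodules that `git submodule status` marks with '-'.

    Each status line is: one state character, the commit hash, the path,
    and optionally a description in parentheses.
    Assumes paths contain no spaces.
    """
    paths = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        fields = line[1:].split()
        if line[0] == "-" and len(fields) >= 2:
            paths.append(fields[1])
    return paths


def read_status(repo_dir: str = ".") -> str:
    """Run `git submodule status` in `repo_dir` and return its output."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If `uninitialized_submodules(read_status())` returns any paths, running `git submodule update --init --recursive` in the repository fetches the missing submodules.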

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Star History: 6 stars in the last 30 days