vui by fluxions-ai

Conversational speech models for on-device use

Created 3 months ago
632 stars

Top 52.4% on SourcePulse

Project Summary

Vui provides small, on-device conversational speech models for researchers and developers. Running speech interaction locally reduces reliance on cloud services and lends itself to real-time applications.

How It Works

Vui is a Llama-based transformer that predicts audio tokens. It uses Fluac, an audio tokenizer derived from the Descript Audio Codec, which quantizes audio at 21.53 Hz (a 4x reduction from 86 Hz). This design aims for efficient, on-device models capable of contextual speech generation.
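
To make the frame rate concrete, here is a back-of-envelope sketch of what 21.53 Hz implies for sequence length. The per-frame codebook count is an assumption for illustration only, not a figure from the project.

```python
# Back-of-envelope token budget for Fluac-style audio tokenization.
# Grounded figures: 21.53 frames/second, a 4x reduction from 86 Hz (per the summary above).
# ASSUMPTION: the number of codebooks per frame is illustrative; check the repo for the real value.

FRAME_RATE_HZ = 21.53   # quantized audio frames per second
BASE_RATE_HZ = 86.0     # pre-reduction frame rate
N_CODEBOOKS = 4         # hypothetical residual codebooks per frame

def tokens_for(seconds: float, codebooks: int = N_CODEBOOKS) -> int:
    """Approximate number of audio tokens the transformer must predict."""
    frames = seconds * FRAME_RATE_HZ
    return round(frames * codebooks)

if __name__ == "__main__":
    print(f"reduction factor: {BASE_RATE_HZ / FRAME_RATE_HZ:.2f}x")  # ~3.99x
    for secs in (1, 10, 60):
        print(f"{secs:>3} s of audio -> ~{tokens_for(secs)} tokens")
```

The 4x reduction in frame rate is what keeps generated sequences short enough for a small transformer to be practical on-device.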

Quick Start & Requirements

  • Install: pip install -e . (Linux/Windows, with uv)
  • Prerequisites: Requires accepting the Hugging Face model terms for the VAD and segmentation models.
  • Demo: Run python demo.py for a Gradio interface; a hypothetical scripted-use sketch follows this list.
  • Hardware: Developed on two NVIDIA RTX 4090 GPUs.
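
This summary only lists commands, not Python entry points. The snippet below is a hypothetical sketch of scripted use: the module paths, the Vui.from_pretrained loader, the render helper, and the output sample rate are all assumptions, so check the repository for the actual API.

```python
# Hypothetical scripted usage -- NOT a confirmed vui API.
# ASSUMPTIONS: the module paths (vui.model, vui.inference), the
# Vui.from_pretrained loader, the render() helper, and the 22050 Hz
# sample rate are illustrative; consult the repository for the real API.
import torchaudio

from vui.model import Vui          # assumed module layout
from vui.inference import render   # assumed generation helper

model = Vui.from_pretrained().cuda()   # assumed: defaults to the Vui.BASE checkpoint
text = "Hey, so how does on-device speech generation actually work?"
waveform = render(model, text)         # assumed: returns a waveform tensor

torchaudio.save("out.wav", waveform.cpu().reshape(1, -1), 22050)
```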

Highlighted Details

  • Models include Vui.BASE (trained on 40k hours of audio), Vui.ABRAHAM (single speaker, context-aware), and Vui.COHOST (two speakers interacting).
  • Voice cloning is supported with the base model, though results are imperfect.
  • The project builds on Whisper, Audiocraft, and the Descript Audio Codec.

Maintenance & Community

  • Primary developer: Harry Coultas Blum.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The model is known to hallucinate, and the author notes that quality is constrained by limited resources. Voice Activity Detection (VAD) is used to remove silence but can slow down processing; a generic illustration of such a step follows.
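
For context, the sketch below shows what a silence-removal step does in principle. It is a generic energy-threshold trimmer written for this summary, not the project's actual VAD, which relies on a gated Hugging Face model (see Prerequisites above).

```python
# Generic energy-based silence trimming -- an illustration of what a VAD step
# accomplishes, not vui's actual VAD pipeline.
import numpy as np

def trim_silence(wav: np.ndarray, sr: int, frame_ms: int = 30,
                 threshold_db: float = -40.0) -> np.ndarray:
    """Keep only frames whose RMS energy exceeds a dB threshold."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(wav) // frame_len
    frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
    db = 20 * np.log10(rms + 1e-12)
    return frames[db > threshold_db].reshape(-1)

if __name__ == "__main__":
    sr = 16000
    # one second of silence followed by one second of a 440 Hz tone
    t = np.linspace(0, 1, sr, endpoint=False)
    wav = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
    trimmed = trim_silence(wav.astype(np.float32), sr)
    print(f"{len(wav)/sr:.2f}s -> {len(trimmed)/sr:.2f}s after trimming")
```

A model-based VAD is typically more robust to background noise than a plain energy threshold like this, which helps explain the extra processing cost noted above.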

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Jinze Bai (Research Scientist at Alibaba Qwen), and 1 more.

Qwen-Audio by QwenLM

Audio-language model for audio understanding and chat
2k stars · Top 0.4% on SourcePulse
Created 1 year ago · Updated 1 year ago