personaplex by NVIDIA

Full-duplex conversational speech model with real-time persona control

Created 2 months ago
5,857 stars

Top 8.6% on SourcePulse

Project Summary

Summary

PersonaPlex provides real-time, full-duplex conversational speech models with fine-grained control over persona and voice. It targets developers building advanced conversational AI, enabling natural, low-latency interactions in which text and audio conditioning keep the persona consistent.

How It Works

Built on the Moshi architecture and the Helium LLM, PersonaPlex is full-duplex: it listens and speaks at the same time, processing both directions of the conversation in real time. A text-based role prompt defines the persona, and an audio-based voice prompt conditions the voice so that it stays consistent. Training on a mix of synthetic and real dialogues gives it natural, low-latency spoken interactions.
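To make "full-duplex" concrete, here is a toy sketch (not PersonaPlex code) of a frame-synchronous loop: on every tick the model consumes one frame of user audio and emits one frame of its own audio, so listening and speaking overlap instead of alternating turns. The 80 ms frame size (12.5 Hz, the frame rate of Moshi's Mimi codec), the ToyDuplexAgent class, and its hand-written turn-taking policy are all assumptions for illustration; a real model learns this behavior.

```python
FRAME_MS = 80  # assumption: 12.5 Hz frames, the rate of Moshi's Mimi codec


class ToyDuplexAgent:
    """Hypothetical stand-in for a full-duplex model; not the PersonaPlex API."""

    def __init__(self):
        self.user_frames = []

    def step(self, user_frame):
        # One user frame in, one agent frame out, on every tick: the agent
        # keeps listening while it speaks instead of waiting for its turn.
        self.user_frames.append(user_frame)
        user_spoke = "speech" in self.user_frames
        paused = self.user_frames[-2:] == ["silence", "silence"]
        # Toy policy: talk after a two-frame pause; stop as soon as the
        # user speaks again (barge-in), because the pause condition breaks.
        return "speech" if user_spoke and paused else "silence"


def run_duplex(agent, user_stream):
    """Advance both streams in lockstep; returns (elapsed_ms, agent_stream)."""
    agent_stream = [agent.step(frame) for frame in user_stream]
    return len(user_stream) * FRAME_MS, agent_stream


user = ["speech", "speech", "silence", "silence", "silence", "speech", "silence"]
elapsed, agent = run_duplex(ToyDuplexAgent(), user)
print(elapsed)  # 560
print(agent)
# ['silence', 'silence', 'silence', 'speech', 'speech', 'silence', 'silence']
```

The point of the lockstep loop is that an interruption (frame 6) is handled on the very next frame without a separate turn-taking module; in an actual model the learned network replaces the toy policy.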

Quick Start & Requirements

  • Install: after cloning the repository, run pip install moshi/. from the repository root.
  • Authentication: accept the PersonaPlex model license on Hugging Face, then set export HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>.
  • Web UI: launch a server with python -m moshi.server --ssl <SSL_DIR>.
  • Offline evaluation: use moshi.offline with a voice prompt and an input WAV file.
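A minimal sketch of scripting these steps from Python. It only checks for HF_TOKEN, builds the documented server command with a temporary SSL directory, and writes a silent placeholder WAV for offline tests; it calls no moshi API. The helper names are hypothetical, and the 24 kHz mono 16-bit WAV format is an assumption based on Moshi's audio codec, so confirm the expected input format in the repository.

```python
import os
import sys
import tempfile
import wave


def check_hf_token() -> bool:
    """True if HF_TOKEN is set (needed after accepting the model license)."""
    return bool(os.environ.get("HF_TOKEN"))


def build_server_cmd(ssl_dir: str) -> list:
    """Argument list for: python -m moshi.server --ssl <SSL_DIR>."""
    return [sys.executable, "-m", "moshi.server", "--ssl", ssl_dir]


def write_silent_wav(path: str, seconds: float = 1.0, sample_rate: int = 24_000) -> None:
    """Write mono 16-bit PCM silence (24 kHz is an assumed default)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


ssl_dir = tempfile.mkdtemp()  # temporary directory passed to --ssl
cmd = build_server_cmd(ssl_dir)
wav_path = os.path.join(tempfile.mkdtemp(), "input.wav")
write_silent_wav(wav_path, seconds=2.0)

if not check_hf_token():
    print("Set HF_TOKEN before launching the server.")
print("Launch with:", " ".join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```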

Highlighted Details

  • Voice Variety: Includes 18 pre-packaged voice embeddings in Natural (NAT) and Variety (VAR) categories, each with female and male voices: NATF0-3, NATM0-3, VARF0-4, VARM0-4.
  • Flexible Prompting: Supports detailed role-play prompts for assistant, customer service (e.g., CitySan, Jerusalem Shakshuka), and casual conversation scenarios, the last drawing on corpora such as Fisher English.
  • Emergent Generalization: Because it builds on the Helium LLM, the model can give plausible responses to out-of-distribution prompts, which invites experimental use cases.
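The voice names follow a regular scheme: a category prefix (NAT or VAR), a gender letter (F or M), and an index. This small sketch enumerates the ranges listed above, which yields 18 names, and filters them by category or gender. The helper names are hypothetical, and mapping a name to a file such as NATF2.pt is an assumption about how voice prompts are stored.

```python
# Number of voices per (category, gender), from the ranges listed above:
# NATF0-3, NATM0-3, VARF0-4, VARM0-4.
RANGES = {("NAT", "F"): 4, ("NAT", "M"): 4, ("VAR", "F"): 5, ("VAR", "M"): 5}


def voice_names(category=None, gender=None):
    """Voice names, optionally filtered by category ("NAT"/"VAR") and gender ("F"/"M")."""
    names = []
    for (cat, gen), count in RANGES.items():
        if category not in (None, cat) or gender not in (None, gen):
            continue  # skip groups that do not match the filter
        names.extend(f"{cat}{gen}{i}" for i in range(count))
    return names


def voice_prompt_file(name):
    # Assumption: each voice prompt is stored as "<NAME>.pt".
    if name not in voice_names():
        raise ValueError(f"unknown voice: {name}")
    return f"{name}.pt"


print(len(voice_names()))          # 18
print(voice_names("NAT", "F"))     # ['NATF0', 'NATF1', 'NATF2', 'NATF3']
print(voice_prompt_file("VARM4"))  # VARM4.pt
```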
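Role prompts are plain text, so persona definitions are easy to template. A hypothetical sketch of one way to assemble an assistant or customer-service prompt; RolePrompt, its fields, and the fictional "Example Bikes" business are illustrative assumptions, not the prompt format shipped with PersonaPlex. The rendered string would go wherever you supply the text role prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RolePrompt:
    """Hypothetical container for a text role prompt (not a PersonaPlex class)."""
    role: str
    name: Optional[str] = None
    facts: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # Join role description, optional persona name, and key facts
        # into a single prompt string.
        parts = [self.role.strip()]
        if self.name:
            parts.append(f"Your name is {self.name}.")
        if self.facts:
            info = " ".join(f"{k}: {v}." for k, v in self.facts.items())
            parts.append(f"Information: {info}")
        return " ".join(parts)


assistant = RolePrompt("You are a helpful assistant.")
support = RolePrompt(
    "You work in customer service for Example Bikes.",  # fictional business
    name="Sam",
    facts={"Opening hours": "9am to 5pm", "Repair price": "$40"},
)
print(support.render())
# You work in customer service for Example Bikes. Your name is Sam.
# Information: Opening hours: 9am to 5pm. Repair price: $40.
```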

Maintenance & Community

The README provides no details on community channels, active maintainers, or a public roadmap.

Licensing & Compatibility

Code is MIT licensed. Model weights are released under the NVIDIA Open Model License, which may restrict commercial use; review its terms carefully before deploying.

Limitations & Caveats

Setup requires accepting the PersonaPlex model license on Hugging Face and configuring authentication with HF_TOKEN. Behavior on highly novel or out-of-distribution prompts is left to user experimentation and is not guaranteed.

Health Check
Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 5
Issues (30d): 4
Star History: 424 stars in the last 30 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 3 more.

ChatTTS by 2noise
Generative speech model for daily dialogue
0.1%, 39k stars, created 1 year ago, updated 2 months ago