personaplex by NVIDIA

Full-duplex conversational speech model with real-time persona control

Created 3 weeks ago

3,801 stars

Top 12.7% on SourcePulse


Summary

PersonaPlex offers real-time, full-duplex conversational speech models with precise persona and voice control. It targets developers building advanced conversational AI, enabling natural, low-latency interactions with consistent characterization via text and audio conditioning.

How It Works

Built on the Moshi architecture and the Helium LLM, PersonaPlex is full duplex: it listens and speaks at the same time, in real time. A text role prompt sets the persona and audio-based voice conditioning sets the voice, which keeps the character consistent across a conversation. It is trained on a mix of synthetic and real dialogues to produce natural-sounding, low-latency spoken interaction.

Quick Start & Requirements

After cloning the repository, install the package with pip install moshi/. (the trailing dot is part of the path). Accept the PersonaPlex model license on Hugging Face, then set your token with export HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>. Launch the server with python -m moshi.server --ssl <SSL_DIR> to get web UI access. For offline evaluation, run the moshi.offline module with a voice prompt and an input WAV file.
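The launch step can be wrapped in a small pre-flight check. The sketch below is illustrative only: the helper names check_hf_token and build_server_command are hypothetical and not part of the project, and the only command-line pieces it uses are the ones quoted above (python -m moshi.server --ssl <SSL_DIR>). It assumes the package is already installed and that you supply your own SSL directory.

```python
import os
import subprocess
import sys
from typing import List, Mapping, Optional


def check_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, or raise if HF_TOKEN is missing or empty."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; accept the model license on Hugging Face "
            "and export HF_TOKEN before launching."
        )
    return token


def build_server_command(ssl_dir: str) -> List[str]:
    """Build the argv for the server launch command quoted in the text."""
    # Uses the current interpreter so the server runs in the same environment
    # where the package was installed.
    return [sys.executable, "-m", "moshi.server", "--ssl", ssl_dir]


def launch(ssl_dir: str) -> None:
    """Hypothetical wrapper: verify the token, then start the server."""
    check_hf_token()
    subprocess.run(build_server_command(ssl_dir), check=True)
```

Calling launch with a directory of your choice starts the server only when a token is present, which surfaces the authentication requirement before any model download is attempted.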

Highlighted Details

  • Voice Variety: Ships 18 pre-packaged voice embeddings in Natural (NAT) and Variety (VAR) categories, each with female and male voices (NATF0-3, NATM0-3, VARF0-4, VARM0-4).
  • Flexible Prompting: Supports detailed role-playing for assistant, customer service (e.g., CitySan, Jerusalem Shakshuka), and casual conversations, leveraging corpora like Fisher English.
  • Emergent Generalization: Leverages the Helium LLM for plausible responses to out-of-distribution prompts, encouraging experimental use cases.
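The voice ranges in the list above can be expanded into individual names. The following is a minimal sketch that only enumerates the identifiers implied by those ranges. The VOICE_RANGES table and both function names are hypothetical, and how a name maps to an embedding file is not covered here. Expanding the four ranges yields 18 names.

```python
from typing import Dict, List, Tuple

# Inclusive index ranges taken from the list above:
# NATF0-3, NATM0-3, VARF0-4, VARM0-4.
VOICE_RANGES: Dict[str, Tuple[int, int]] = {
    "NATF": (0, 3),  # Natural, female
    "NATM": (0, 3),  # Natural, male
    "VARF": (0, 4),  # Variety, female
    "VARM": (0, 4),  # Variety, male
}


def expand_voice_range(prefix: str, first: int, last: int) -> List[str]:
    """Expand an inclusive range such as NATF0-3 into individual names."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


def all_voice_names() -> List[str]:
    """Return every voice name implied by VOICE_RANGES, in table order."""
    names: List[str] = []
    for prefix, (first, last) in VOICE_RANGES.items():
        names.extend(expand_voice_range(prefix, first, last))
    return names


if __name__ == "__main__":
    for name in all_voice_names():
        print(name)
```

A list like this is handy for validating a user-supplied voice name before passing it on to the model.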

Maintenance & Community

The README provides no details on community channels, active maintainers, or a public roadmap.

Licensing & Compatibility

Code is MIT licensed. Model weights use the NVIDIA Open Model license, which may restrict commercial use. Users must review the NVIDIA Open Model license terms carefully.

Limitations & Caveats

Setup requires accepting the model license on Hugging Face and configuring token authentication. Behavior on highly novel or out-of-distribution prompts is left to user experimentation and is not a guaranteed feature.

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Pull Requests (30d): 16
Issues (30d): 23

Star History

3,826 stars in the last 23 days
