xzf-thu
Unified model for real-time, always-on audio interaction
Top 68.5% on SourcePulse
This project introduces the Audio Interaction Model (AIM), which targets a limitation of current Large Audio Language Models (LALMs): they run only offline or handle a single task. AIM is a unified, always-on model that covers offline tasks, real-time streaming, and general instruction following. Developers get continuous, proactive, context-aware audio processing within a single system.
How It Works
AudioInteraction operates as a unified, always-on model that continuously processes audio frames and decides when to speak. It stays in a ⟨Silent⟩ state and transitions to ⟨Speak⟩ based on the task or the acoustic context. This design integrates automatic speech recognition (ASR), speech-to-text translation (S2TT), and audio question answering (AQA) into a single, proactive perceive-decide-respond loop, moving beyond single-task or offline paradigms.
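The perceive-decide-respond loop can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the project's code: the InteractionLoop class, the toy_decide and toy_respond functions, and the energy threshold are stand-ins for whatever the model actually learns. The sketch only shows the control flow of staying ⟨Silent⟩ until a frame triggers ⟨Speak⟩.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple

Frame = List[float]


class State(Enum):
    SILENT = "⟨Silent⟩"
    SPEAK = "⟨Speak⟩"


@dataclass
class InteractionLoop:
    # Hypothetical stand-ins for the model: `decide` maps the context seen so
    # far to a state, and `respond` produces text once the state is SPEAK.
    decide: Callable[[List[Frame]], State]
    respond: Callable[[List[Frame]], str]
    context: List[Frame] = field(default_factory=list)

    def step(self, frame: Frame) -> Tuple[State, Optional[str]]:
        """Perceive one frame, decide, and respond only when speaking."""
        self.context.append(frame)             # perceive
        state = self.decide(self.context)      # decide
        if state is State.SPEAK:
            return state, self.respond(self.context)  # respond
        return state, None


def energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def toy_decide(context: List[Frame], threshold: float = 0.1) -> State:
    # Assumption: a plain energy threshold replaces the model's learned decision.
    return State.SPEAK if energy(context[-1]) > threshold else State.SILENT


def toy_respond(context: List[Frame]) -> str:
    return f"heard {len(context)} frames"


def run(frames: List[Frame]) -> List[Tuple[State, Optional[str]]]:
    loop = InteractionLoop(toy_decide, toy_respond)
    return [loop.step(f) for f in frames]
```

Calling run() on a list of frames returns one (state, response) pair per frame, with the response set to None while the loop stays silent. In the real model the decision would presumably depend on the task and the accumulated acoustic context rather than on a single frame's energy.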
Quick Start & Requirements
Installation requires cloning the repo, creating a Python 3.12 Conda environment, and running pip install -r requirements.txt. PyTorch with CUDA and ffmpeg are prerequisites. Model weights are downloaded with python download.py. Inference runs either offline (infer_offline.py) or in real time (infer_online.py). A WebUI demo is available via web/server.py. Links to technical reports and demos are in the README.
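Before installing, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch and not part of the repository: check_prerequisites is a hypothetical helper, and treating Python 3.12 as a minimum rather than an exact version is an assumption. It uses only the standard library and does not verify that the installed PyTorch build has CUDA support.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 12)):
    """Return a dict of prerequisite name -> bool, without importing torch."""
    return {
        # Assumption: 3.12 is treated as a minimum version here.
        "python": sys.version_info[:2] >= min_python,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only checks that a torch package is installed, not its CUDA support.
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running the script inside the Conda environment prints one line per prerequisite, so a missing ffmpeg or PyTorch install shows up before the weights are downloaded.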
Highlighted Details
Maintenance & Community
The project was released recently (May 2026). The README gives no explicit details on maintainers, community channels, sponsorships, or a public roadmap.
Licensing & Compatibility
Released under the Apache-2.0 License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
Given its recent release, the project may be experimental. Specific hardware requirements (e.g., VRAM) are not detailed. Fine-tuning requires specific checkpoints (QWEN_OMNI_CKPT, AUDIO_TOWER_CKPT) that may need to be obtained separately. Behavior on edge cases is not exhaustively documented.