FluidAudio by FluidInference

Native Swift audio processing for Apple devices

Created 9 months ago
1,816 stars

Top 23.4% on SourcePulse

Project Summary

FluidAudio is a Swift framework for on-device, low-latency audio processing on Apple platforms, targeting developers building real-time applications. It offers speaker diarization, voice activity detection (VAD), and automatic speech recognition (ASR) using open-source models converted to Apple's Core ML format, optimized for efficient background processing on Apple Silicon.
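To make the VAD task concrete: a detector labels short frames of audio as speech or non-speech, then merges runs of speech frames into segments. The sketch below uses plain RMS-energy thresholding. That is a much simpler technique than the Core ML models FluidAudio ships, and none of these functions are the framework's API. The frame length (512 samples) and threshold (0.02) are hypothetical defaults chosen only for illustration.

```swift
/// Frame-level voice activity detection by RMS energy thresholding.
/// Illustrative only: not FluidAudio's model-based VAD.
/// - samples: mono PCM in [-1, 1]
/// - frameLength: samples per frame (hypothetical default)
/// - threshold: RMS above which a frame counts as speech (hypothetical default)
func energyVAD(_ samples: [Float], frameLength: Int = 512, threshold: Float = 0.02) -> [Bool] {
    precondition(frameLength > 0, "frame length must be positive")
    var decisions: [Bool] = []
    var start = 0
    while start < samples.count {
        let end = min(start + frameLength, samples.count)
        var sumSquares: Float = 0
        for s in samples[start..<end] { sumSquares += s * s }
        let rms = (sumSquares / Float(end - start)).squareRoot()
        decisions.append(rms > threshold)
        start = end
    }
    return decisions
}

/// Merges consecutive speech frames into (start, end) sample ranges.
func speechSegments(_ decisions: [Bool], frameLength: Int, totalSamples: Int) -> [(start: Int, end: Int)] {
    var segments: [(start: Int, end: Int)] = []
    var segmentStart: Int? = nil
    for (i, isSpeech) in decisions.enumerated() {
        if isSpeech, segmentStart == nil { segmentStart = i * frameLength }
        if !isSpeech, let s = segmentStart {
            segments.append((start: s, end: i * frameLength))
            segmentStart = nil
        }
    }
    // Close a segment that runs to the end of the buffer.
    if let s = segmentStart { segments.append((start: s, end: totalSamples)) }
    return segments
}

// Synthetic input: silence, a loud square wave, then silence again.
var audio = [Float](repeating: 0, count: 1024)
for i in 0..<1024 { audio.append(i % 2 == 0 ? 0.5 : -0.5) }
audio += [Float](repeating: 0, count: 1024)

let decisions = energyVAD(audio)
let segments = speechSegments(decisions, frameLength: 512, totalSamples: audio.count)
print(segments)
```

A learned VAD replaces the energy test with a model's per-frame speech probability, but the framing and segment-merging steps have the same shape.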

How It Works

FluidAudio uses native Swift and Core ML for all audio processing, so everything runs locally with minimal latency. It ships custom-converted, optimized versions of state-of-the-art models such as Parakeet TDT for ASR and Pyannote for speaker diarization. Models run on the CPU and the Apple Neural Engine (ANE) rather than the GPU (no MPS or shaders), which aims to keep performance consistent and battery use low on Apple devices.
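In Core ML, this CPU-plus-ANE placement corresponds to a model configuration's compute units. The following is a minimal sketch that assumes you load a compiled model yourself. FluidAudio's own model loaders may configure this differently, and the model path in the comment is hypothetical. `.cpuAndNeuralEngine` requires macOS 13 or iOS 16 or later, which falls within the framework's stated minimums.

```swift
#if canImport(CoreML)
import CoreML

/// Builds a Core ML configuration that restricts inference to the CPU and
/// Apple Neural Engine, excluding the GPU. (Assumption: this mirrors the
/// placement policy described above, not FluidAudio's internal code.)
@available(macOS 13.0, iOS 16.0, *)
func cpuAndNeuralEngineConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine
    return config
}

// Hypothetical usage with a compiled model bundle:
// let model = try MLModel(contentsOf: URL(fileURLWithPath: "Model.mlmodelc"),
//                         configuration: cpuAndNeuralEngineConfiguration())
#endif
```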

Quick Start & Requirements

  • Installation: Add the package via Swift Package Manager using https://github.com/FluidInference/FluidAudio.git, and link the FluidAudio library product (not the executable product) to your target.
  • Requirements: macOS 14.0+ or iOS 17.0+. Apple Silicon devices recommended.
  • Documentation: https://deepwiki.com/FluidInference/FluidAudio
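For a Swift package, the installation step amounts to a manifest entry like the following sketch. The package and target name `MyTranscriber` are hypothetical. Pinning to the `main` branch is an assumption made only because this summary names no release, so pin a tagged version in practice. The library product name `FluidAudio` is also assumed. The platform minimums match the stated requirements.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyTranscriber", // hypothetical package name
    platforms: [.macOS(.v14), .iOS(.v17)], // stated minimum OS versions
    dependencies: [
        // Assumption: branch pin for illustration; prefer a tagged release.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyTranscriber",
            dependencies: [
                // Link the library product, not the package's executable.
                .product(name: "FluidAudio", package: "FluidAudio"),
            ]
        ),
    ]
)
```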

Highlighted Details

  • Performance: Achieves a real-time factor (RTF) of 0.02x for ASR (about 50x faster than real time) and competitive diarization error rate (DER) and Jaccard error rate (JER) scores on speaker-diarization benchmarks.
  • Core ML Native: All models are converted and optimized for Apple's Core ML framework.
  • Real-time Focus: Designed for near real-time workloads with streaming support for ASR.
  • Cross-Platform: Supports both macOS and iOS.
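RTF and DER are simple ratios, so a quick calculation shows what the headline figures mean. This minimal sketch uses the standard definitions. RTF is processing time divided by audio duration. DER is the sum of false-alarm, missed, and speaker-confusion time divided by total reference speech time. The durations are made up for illustration and are not FluidAudio benchmark results. Real scoring tools also handle collars and overlapping speech, and JER (a per-speaker Jaccard measure) is omitted here.

```swift
/// Real-time factor: processing time divided by the duration of the audio processed.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

/// Diarization error rate: the fraction of reference speech time that is
/// falsely detected, missed, or attributed to the wrong speaker.
func diarizationErrorRate(falseAlarm: Double, missed: Double, confusion: Double,
                          totalReferenceSpeech: Double) -> Double {
    precondition(totalReferenceSpeech > 0, "reference speech duration must be positive")
    return (falseAlarm + missed + confusion) / totalReferenceSpeech
}

// One hour of audio processed in 72 s gives RTF 0.02, i.e. 1 / 0.02 = 50x real time.
let rtf = realTimeFactor(processingSeconds: 72, audioSeconds: 3600)
let speedUp = 1 / rtf

// Illustrative numbers only: 600 s of reference speech with
// 30 s false alarm, 24 s missed, and 18 s speaker confusion.
let der = diarizationErrorRate(falseAlarm: 30, missed: 24, confusion: 18, totalReferenceSpeech: 600)
print(rtf, speedUp, der)
```

Because speed-up is the reciprocal of RTF, a lower RTF is better. Every halving of RTF doubles throughput.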

Maintenance & Community

  • Community: Discord server available for custom use cases and feedback.
  • Roadmap: Includes planned system audio access via CoreAudio.

Licensing & Compatibility

  • License: Apache 2.0. Models are also permissively licensed (MIT/Apache 2.0).
  • Compatibility: Suitable for commercial and closed-source applications due to permissive licensing and local processing.

Limitations & Caveats

  • Voice Activity Detection (VAD): the APIs are described as complex to use in production and are a lower maintenance priority.
  • CLI tools are macOS-only; iOS applications must use the library programmatically.

Health Check

  • Last Commit: 14 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 117
  • Issues (30d): 25
  • Star History: 205 stars in the last 30 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 3 more.

voxtral.c by antirez

Top 1.1% on SourcePulse
2k stars
Pure C speech-to-text inference engine for Mistral Voxtral Realtime 4B
Created 2 months ago
Updated 1 month ago