FluidAudio by FluidInference

Native Swift audio processing for Apple devices

Created 1 year ago

2,413 stars

Top 18.3% on SourcePulse

1 Expert Loves This Project

joewalnes

Head of Experimental Projects at Stripe

Project Summary

FluidAudio is a Swift framework for on-device, low-latency audio processing on Apple platforms, targeting developers building real-time applications. It offers speaker diarization, voice activity detection (VAD), and automatic speech recognition (ASR) using open-source models converted to Apple's Core ML format, optimized for efficient background processing on Apple Silicon.

How It Works

FluidAudio runs all audio processing locally in native Swift and Core ML, which keeps latency low and audio on the device. It uses custom-converted, optimized versions of state-of-the-art models such as Parakeet TDT for ASR and Pyannote for speaker diarization. Inference targets the CPU and the Apple Neural Engine and avoids GPU/MPS/shader code paths, with the aim of consistent performance and battery efficiency on Apple devices.
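The compute-unit choice described above can be expressed with Core ML's standard configuration API. The following is a minimal sketch, not FluidAudio's own code: the helper names `makeCPUAndANEConfiguration` and `loadModel(at:)` are hypothetical, and the model URL is a placeholder for any compiled Core ML model.

```swift
import CoreML

// Hypothetical helper for illustration; not part of FluidAudio's API.
// Builds a Core ML configuration that permits the CPU and the Apple
// Neural Engine and excludes the GPU.
func makeCPUAndANEConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    return configuration
}

// Hypothetical helper: loads a compiled model (.mlmodelc) from a
// caller-supplied URL using the configuration above.
func loadModel(at modelURL: URL) throws -> MLModel {
    try MLModel(contentsOf: modelURL, configuration: makeCPUAndANEConfiguration())
}
```

Core ML treats `computeUnits` as the set of units it is allowed to use, so the framework still decides per layer whether work lands on the CPU or the Neural Engine.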

Quick Start & Requirements

Installation: Add the package via Swift Package Manager using https://github.com/FluidInference/FluidAudio.git. Add the library product to your app target, not the package's executable.
Requirements: macOS 14.0+ or iOS 17.0+. Apple Silicon devices recommended.
Documentation: https://deepwiki.com/FluidInference/FluidAudio
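
For projects that declare dependencies in a manifest rather than through Xcode, the installation step might look like the sketch below. The package name `MyApp`, the `branch: "main"` pin, and the product name `FluidAudio` are assumptions for illustration; check the repository for the actual product name and a tagged release before relying on them.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [
        // Minimum OS versions from the requirements listed above.
        .macOS(.v14),
        .iOS(.v17),
    ],
    dependencies: [
        // Assumption: tracking the main branch; prefer a tagged version if one exists.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumption: the library product is named "FluidAudio".
                // Only the library is linked here, not the package's executable.
                .product(name: "FluidAudio", package: "FluidAudio"),
            ]
        ),
    ]
)
```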

Highlighted Details

Performance: Achieves a real-time factor (RTF) of 0.02 (about 50x faster than real time) for ASR, and competitive diarization error rate (DER) and Jaccard error rate (JER) for speaker diarization on benchmarks.
Core ML Native: All models are converted and optimized for Apple's Core ML framework.
Real-time Focus: Designed for near real-time workloads with streaming support for ASR.
Cross-Platform: Supports both macOS and iOS.
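
To make the performance figure concrete, the relationship between RTF and speedup can be worked through as follows. The 60-minute recording is an illustrative input, not a benchmark from the project.

```latex
\[
\mathrm{RTF} = \frac{T_{\text{processing}}}{T_{\text{audio}}},
\qquad
\text{speedup} = \frac{1}{\mathrm{RTF}} = \frac{1}{0.02} = 50
\]
\[
T_{\text{audio}} = 60\ \text{min} = 3600\ \text{s}
\;\Longrightarrow\;
T_{\text{processing}} = 0.02 \times 3600\ \text{s} = 72\ \text{s}
\]
```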

Maintenance & Community

Community: Discord server available for custom use cases and feedback.
Roadmap: Includes planned system audio access via CoreAudio.

Licensing & Compatibility

License: Apache 2.0. Models are also permissively licensed (MIT/Apache 2.0).
Compatibility: Suitable for commercial and closed-source applications due to permissive licensing and local processing.

Limitations & Caveats

The project notes that the Voice Activity Detection (VAD) APIs are complex to use in production and are a lower maintenance priority.
CLI tools are macOS-only; iOS applications must use the library programmatically.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 64
Issues (30d): 38

Star History

249 stars in the last 30 days
