whisper.unity by Macoron

Unity3d bindings for local speech-to-text inference

Created 2 years ago

680 stars

Top 49.9% on SourcePulse

1 Expert Loves This Project

Georgi Gerganov

Author of llama.cpp, whisper.cpp

Project Summary

This project provides Unity3D bindings for whisper.cpp, enabling local, offline speech-to-text inference within Unity applications. It targets game developers and researchers who want to integrate automatic speech recognition (ASR) directly into their projects without relying on cloud services. The primary benefit is fast, multilingual transcription and translation that runs entirely on the user's device.

How It Works

The project builds on whisper.cpp, a performance-oriented C/C++ implementation of OpenAI's Whisper model. whisper.cpp uses the GGML tensor library for CPU inference and for GPU inference through its Vulkan and Metal backends. The Unity bindings expose this functionality through C# scripts, allowing developers to load models, feed in audio from microphones or audio files, and receive transcribed text or translations. Because inference runs locally, there is no network round-trip, and audio data never leaves the device.
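whisper.cpp consumes 16 kHz mono 32-bit float PCM (the `WHISPER_SAMPLE_RATE` constant in `whisper.h` is 16000), whereas Unity microphone recordings and audio clips are commonly captured at 44.1 or 48 kHz and may have several interleaved channels. Any binding therefore has to downmix and resample before calling the model. The sketch below shows that conversion in Python; it is an illustration, not the package's actual C# code, and all function names are hypothetical. Linear interpolation is used as a simple stand-in resampler and does not apply an anti-aliasing filter.

```python
WHISPER_SAMPLE_RATE = 16_000  # whisper.cpp expects 16 kHz mono float PCM


def downmix_to_mono(interleaved: list[float], channels: int) -> list[float]:
    """Average interleaved multi-channel frames into a single mono channel."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    if len(interleaved) % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    frames = len(interleaved) // channels
    return [
        sum(interleaved[f * channels:(f + 1) * channels]) / channels
        for f in range(frames)
    ]


def resample_linear(samples: list[float], src_rate: int,
                    dst_rate: int = WHISPER_SAMPLE_RATE) -> list[float]:
    """Resample by linear interpolation (hypothetical helper, no anti-aliasing)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    ratio = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = min(int(pos), len(samples) - 1)  # left neighbour, clamped
        frac = pos - j
        # Repeat the last sample when the right neighbour is past the end.
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def to_whisper_input(interleaved: list[float], channels: int,
                     src_rate: int) -> list[float]:
    """Convert interleaved capture data into 16 kHz mono float samples."""
    return resample_linear(downmix_to_mono(interleaved, channels), src_rate)
```

For example, one second of 48 kHz stereo audio (96,000 interleaved floats) becomes 16,000 mono samples. A production binding would likely use a filtered resampler to avoid aliasing.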

Quick Start & Requirements

Install via Unity Package Manager using the Git URL: https://github.com/Macoron/whisper.unity.git?path=/Packages/com.whisper.unity
Requires Unity.
Optional GPU acceleration via Vulkan (Windows/Linux) or Metal (macOS/iOS).
The ggml-tiny.bin model weights are bundled; larger models can be downloaded and placed in StreamingAssets.
Official examples are included within the repository.

Highlighted Details

Supports approximately 60 languages for transcription, and can translate speech in those languages into English text.
Offers optional GPU acceleration via Vulkan and Metal, which can substantially speed up inference compared with CPU-only execution.
Runs entirely offline on the user's local machine, ensuring privacy and reducing latency.
Supports multiple Whisper model sizes, trading speed against accuracy (only the tiny model is bundled; larger ones must be downloaded separately).
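The speed/accuracy trade-off often comes down to how large a model a target device can hold. The sketch below picks the largest whisper.cpp model that fits a given disk budget. The sizes are approximate on-disk figures for the ggml models as listed in the whisper.cpp README, and the `ggml-<name>.bin` filenames follow the naming used in the whisper.cpp model repository. The helper functions and the `models_dir` layout are hypothetical, not part of whisper.unity.

```python
from pathlib import Path

# Approximate on-disk sizes (MiB) of whisper.cpp ggml models, smallest first.
MODEL_DISK_MIB = {
    "tiny": 75,
    "base": 142,
    "small": 466,
    "medium": 1536,     # ~1.5 GiB
    "large-v3": 2970,   # ~2.9 GiB
}


def pick_model(budget_mib: float) -> str:
    """Return the largest model whose file fits in budget_mib (hypothetical helper)."""
    chosen = None
    for name, size in MODEL_DISK_MIB.items():  # dicts preserve insertion order
        if size <= budget_mib:
            chosen = name
    if chosen is None:
        raise ValueError(f"no model fits in {budget_mib} MiB")
    return chosen


def model_path(models_dir: str, name: str) -> Path:
    """Build the expected weights path, e.g. <models_dir>/ggml-tiny.bin."""
    return Path(models_dir) / f"ggml-{name}.bin"
```

Disk size is only a rough proxy: runtime memory use is higher than the file size, and larger models are also slower, so a real selection policy would weigh latency as well.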

Maintenance & Community

The project is maintained by Macoron. Further community engagement details (Discord, Slack, roadmap) are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the MIT License. This license permits commercial use and integration into closed-source projects. The underlying whisper.cpp and OpenAI Whisper code/weights are also MIT licensed.

Limitations & Caveats

The WebGL platform is not currently supported. CUDA acceleration is deprecated in favor of Vulkan; users who require CUDA must use older releases. Metal acceleration requires Apple Silicon (M1 or newer); on older hardware, inference falls back to the CPU.
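The platform rules above can be encoded as a small backend-selection function. This is a sketch that only mirrors the constraints stated in this summary. The platform strings, the `metal_capable` flag, and the function itself are hypothetical and do not reflect how whisper.unity actually configures its GPU backends. Because CUDA is deprecated, it has no branch here.

```python
from enum import Enum


class Backend(Enum):
    CPU = "cpu"
    VULKAN = "vulkan"
    METAL = "metal"


def choose_backend(platform: str, gpu_requested: bool,
                   metal_capable: bool = False) -> Backend:
    """Pick an inference backend from the stated platform rules (hypothetical)."""
    p = platform.lower()
    if p == "webgl":
        raise ValueError("WebGL is not supported")
    if not gpu_requested:
        return Backend.CPU
    if p in ("windows", "linux"):
        return Backend.VULKAN
    if p in ("macos", "ios"):
        # Metal needs Apple Silicon-class GPUs; otherwise fall back to CPU.
        return Backend.METAL if metal_capable else Backend.CPU
    return Backend.CPU  # other platforms: CPU inference
```

Treating the CPU as the default in every branch keeps the function total: an unknown platform or unsupported GPU still yields a working (if slower) configuration.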

Health Check

Last Commit

8 months ago

Responsiveness

1 day

Star History

16 stars in the last 30 days