useful-transformers by moonshine-ai

Library for efficient Transformer model inference on edge devices

Created 2 years ago
459 stars

Top 65.9% on SourcePulse

Project Summary

This library provides efficient inference for Transformer models, specifically targeting low-cost, low-energy edge processors. It aims to enable high-speed speech-to-text transcription using OpenAI's Whisper model on devices like RK3588-based single-board computers, offering significant speedups over existing implementations.

How It Works

The core innovation is offloading FP16 matrix multiplication to the NPU (Neural Processing Unit) on RK3588 processors. This accelerates the large matrix multiplications inside the Transformer encoder, which dominate inference time. The library's initial focus is the Whisper model, particularly the tiny.en variant.
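
As a rough illustration of why matmul offload matters, the sketch below (plain NumPy, not the library's actual API) shows that a single encoder self-attention layer is essentially a chain of large matrix multiplications; the dimensions mirror Whisper tiny.en (model width 384, 1500 encoder frames for 30 s of audio):

```python
# Conceptual sketch in plain NumPy -- not useful-transformers' API.
# It shows that encoder self-attention is a chain of large matmuls,
# the kind of work the library routes to the RK3588 NPU in FP16.
import numpy as np

def self_attention(x, w_q, w_k, w_v, w_o):
    # Every @ below is a large matmul and a candidate for NPU offload.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.float16(np.sqrt(q.shape[-1]))
    scores = scores.astype(np.float32)          # softmax in higher precision
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return (weights.astype(np.float16) @ v) @ w_o

d = 384                                          # Whisper tiny.en model width
x = np.random.randn(1500, d).astype(np.float16)  # 30 s of audio -> 1500 frames
w = [np.random.randn(d, d).astype(np.float16) for _ in range(4)]
print(self_attention(x, *w).shape)               # (1500, 384)
```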

Quick Start & Requirements

  • Install via wheel package: python -m pip install https://github.com/usefulsensors/useful-transformers/releases/download/0.1_rk3588/useful_transformers-0.1-cp310-cp310-linux_aarch64.whl
  • Requires an RK3588 processor and a 64-bit (aarch64) Linux environment; the cp310 tag in the wheel name indicates a CPython 3.10 build.
  • Example transcription: taskset -c 4-7 python -m useful_transformers.transcribe_wav <wav_file> (taskset -c 4-7 pins the process to the RK3588's big Cortex-A76 cores; a batch sketch follows this list).
  • See GitHub Releases for the wheel package.
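
For batch use, the documented CLI entry point can be wrapped in a short script. A minimal sketch, assuming a recordings/ directory of WAV files (the directory name is illustrative):

```python
# Minimal batch-transcription sketch built on the documented CLI.
# The "recordings" directory is an assumption for illustration.
import pathlib
import subprocess

for wav in sorted(pathlib.Path("recordings").glob("*.wav")):
    # Pin to cores 4-7 (the RK3588's Cortex-A76 cores), as the README suggests.
    subprocess.run(
        ["taskset", "-c", "4-7",
         "python", "-m", "useful_transformers.transcribe_wav", str(wav)],
        check=True,
    )
```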

Highlighted Details

  • Achieves roughly 30x real-time transcription for Whisper tiny.en (see the arithmetic after this list).
  • Demonstrates 2x speed improvement over faster-whisper's int8 implementation.
  • Utilizes FP16 matrix multiplication on the RK3588 NPU for performance gains.
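
To make the 30x figure concrete, a quick back-of-the-envelope calculation (the 60 s clip length is an arbitrary example):

```python
# What a 30x real-time factor means in wall-clock terms.
audio_seconds = 60.0   # arbitrary example clip length
speedup = 30.0         # reported real-time factor for tiny.en
print(f"{audio_seconds:.0f} s of audio transcribed in ~{audio_seconds / speedup:.0f} s")
```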

Maintenance & Community

  • Active contributors include Nat Jeffries, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden, and Ali Zartash.
  • TODO list indicates plans for larger Whisper models, int8/int4 matmuls, and asynchronous kernel launches.

Licensing & Compatibility

  • The license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore unclear.

Limitations & Caveats

The current implementation is limited to the tiny.en and base.en Whisper models; larger models are not yet supported. Further optimizations (int8/int4 matmuls, asynchronous kernel launches) appear on the TODO list, though the Health Check below shows little recent activity.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 30 days
