usls by jamjamjon

Efficient Rust inference for vision and vision-language models

Created 2 years ago
407 stars


Project Summary

A Rust library powered by ONNX Runtime, usls provides efficient, cross-platform inference for state-of-the-art vision and vision-language models, particularly those under 1 billion parameters. It targets engineers and researchers needing high performance and a unified API for diverse hardware and operating systems, simplifying complex model deployments.

How It Works

The library leverages Rust for performance and ONNX Runtime for accelerated inference. It employs multi-threading, SIMD instructions, and optional CUDA acceleration. A key design feature is a unified API with consistent methods like run(), forward(), encode_images(), and encode_texts() across all supported models. Automatic model downloading from Hugging Face/GitHub, caching, and path resolution streamline the development workflow.
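The value of a unified API is that callers use one entry point no matter which model sits behind it. The following minimal, self-contained sketch illustrates that idea. It assumes a three-stage split (preprocess, forward, postprocess) chained by a shared run() method. The trait VisionModel, the Image struct, and the toy MeanIntensity model are hypothetical and are not usls's actual Model trait or signatures. The forward stage is an identity placeholder where a real library would call ONNX Runtime.

```rust
/// Hypothetical image type: a flat interleaved RGB buffer plus dimensions.
#[derive(Debug, Clone)]
pub struct Image {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u8>, // expected length: width * height * 3
}

/// Hypothetical unified interface (NOT usls's real trait): every model
/// implements the same three stages, and `run()` chains them.
pub trait VisionModel {
    type Output;

    /// Turn raw images into a model-ready buffer.
    fn preprocess(&self, images: &[Image]) -> Vec<f32>;

    /// Execute the network; a real library would call ONNX Runtime here.
    fn forward(&self, input: &[f32]) -> Vec<f32>;

    /// Decode raw outputs into task-specific results.
    fn postprocess(&self, raw: &[f32]) -> Self::Output;

    /// Shared driver, identical for every implementor.
    fn run(&self, images: &[Image]) -> Self::Output {
        let input = self.preprocess(images);
        let raw = self.forward(&input);
        self.postprocess(&raw)
    }
}

/// Toy "model" that reports the mean normalized intensity of each image.
pub struct MeanIntensity;

impl VisionModel for MeanIntensity {
    type Output = Vec<f32>;

    fn preprocess(&self, images: &[Image]) -> Vec<f32> {
        images
            .iter()
            .map(|img| {
                // Scale u8 pixels to [0, 1] and average over all channels.
                let sum: f32 = img.pixels.iter().map(|&p| p as f32 / 255.0).sum();
                sum / (img.width * img.height * 3).max(1) as f32
            })
            .collect()
    }

    fn forward(&self, input: &[f32]) -> Vec<f32> {
        // Identity placeholder standing in for an inference session.
        input.to_vec()
    }

    fn postprocess(&self, raw: &[f32]) -> Vec<f32> {
        raw.to_vec()
    }
}

fn main() {
    let white = Image { width: 2, height: 1, pixels: vec![255; 6] };
    let black = Image { width: 2, height: 1, pixels: vec![0; 6] };
    let out = MeanIntensity.run(&[white, black]);
    println!("{:?}", out); // prints [1.0, 0.0]
}
```

Because run() is a provided method on the trait, each new model only supplies its own stages, and every caller keeps the same calling convention.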

Quick Start & Requirements

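Installation is managed through Cargo, Rust's package manager. The commands below run the bundled examples from a checkout of the repository: -r builds in release mode, -F cuda enables the cuda Cargo feature, and everything after -- is passed to the example itself: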

# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16

# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cuda:0

The only base prerequisite is a Rust toolchain. Hardware acceleration requires compatible hardware and drivers (e.g., NVIDIA CUDA, CoreML on Apple Silicon, Intel OpenVINO). The repository links to the API documentation, the examples, and a model zoo.

Highlighted Details

  • Performance: Achieves high throughput via multi-threading, SIMD, and ONNX Runtime execution providers like TensorRT, CUDA, CoreML, and OpenVINO. Benchmarks show low latency for models like YOLOv8n.
  • Extensive Model Support: Features over 50 SOTA models across categories including YOLO-series, image classification, object detection, segmentation, vision-language models (VLMs), and embedding models.
  • Cross-Platform Compatibility: Runs on Linux, macOS, and Windows, supporting a wide range of hardware acceleration backends.
  • Unified API & Auto-Management: Provides a consistent Model trait interface and automates model downloading, caching, and path resolution from Hugging Face/GitHub.
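Automatic path resolution generally means checking for an explicit local file first, then a cache of earlier downloads, and only then fetching the file. The following std-only sketch illustrates that lookup order. The function resolve_model_path, the Resolved enum, and the "owner__file" cache naming scheme are all hypothetical. usls's real cache location, naming, and download logic are not shown, and the download step is left as a NeedsDownload result rather than implemented.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical result of resolving a model identifier.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// The caller passed a path that already exists on disk.
    Local(PathBuf),
    /// The file was downloaded earlier and sits in the cache directory.
    Cached(PathBuf),
    /// Not available locally; a real implementation would download it to this path.
    NeedsDownload(PathBuf),
}

/// Hypothetical resolver: explicit local path, then cache, else download target.
pub fn resolve_model_path(spec: &str, cache_dir: &Path) -> Resolved {
    let direct = PathBuf::from(spec);
    if direct.is_file() {
        return Resolved::Local(direct);
    }
    // Flatten "owner/file.onnx"-style identifiers into one cache file name.
    let cached = cache_dir.join(spec.replace('/', "__"));
    if cached.is_file() {
        Resolved::Cached(cached)
    } else {
        Resolved::NeedsDownload(cached)
    }
}

fn main() -> std::io::Result<()> {
    let cache = std::env::temp_dir().join("model-cache-demo");
    std::fs::create_dir_all(&cache)?;

    // Simulate an earlier download by placing a stub file in the cache.
    std::fs::write(cache.join("yolo__v8-n.onnx"), b"stub")?;

    println!("{:?}", resolve_model_path("yolo/v8-n.onnx", &cache)); // Cached(...)
    println!("{:?}", resolve_model_path("yolo/v11-m.onnx", &cache)); // NeedsDownload(...)
    Ok(())
}
```

Returning the intended cache path even when the file is missing lets a downloader write directly to the location that later lookups will find.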

Maintenance & Community

This project is maintained as a personal effort in the author's spare time. Community contributions are strongly welcomed, particularly PRs that optimize models. Users can report issues or open discussions on the GitHub repository.

Licensing & Compatibility

The project is released under an open-source license; see the LICENSE file in the repository for its terms. Whether it suits commercial use depends on those terms, and the licenses of individual pretrained models may differ from the library's own.

Limitations & Caveats

The library focuses on vision models and VLMs under 1B parameters and explicitly excludes large language models, which are better served by dedicated inference engines. As a personal project, the pace of new model integration and performance optimization may vary. Some models may still need interface or post-processing tuning.

Health Check
Last commit: 3 weeks ago
Responsiveness: Inactive
Pull requests (last 30 days): 1
Issues (last 30 days): 2
Star history: 8 stars in the last 30 days
