usls by jamjamjon

Efficient Rust inference for vision and vision-language models

Created 1 year ago
358 stars

Top 78.5% on SourcePulse

View on GitHub
Project Summary

usls is a Rust library, powered by ONNX Runtime, that provides efficient, cross-platform inference for state-of-the-art vision and vision-language models, particularly those under 1 billion parameters. It targets engineers and researchers who need high performance and a unified API across diverse hardware and operating systems, simplifying complex model deployments.

How It Works

The library leverages Rust for performance and ONNX Runtime for accelerated inference. It employs multi-threading, SIMD instructions, and optional CUDA acceleration. A key design feature is a unified API with consistent methods like run(), forward(), encode_images(), and encode_texts() across all supported models. Automatic model downloading from Hugging Face/GitHub, caching, and path resolution streamline the development workflow.
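The unified-interface idea can be sketched with a small, self-contained example. The trait and model types below are hypothetical stand-ins to illustrate the design, not usls's actual `Model` trait or method signatures:

```rust
// Sketch of a unified model interface: every model type exposes the
// same method, so calling code stays unchanged when the model changes.
// `Model`, `Detector`, and `Classifier` here are illustrative only.

trait Model {
    fn run(&self, input: &str) -> String;
}

struct Detector;
struct Classifier;

impl Model for Detector {
    fn run(&self, input: &str) -> String {
        format!("detect:{input}")
    }
}

impl Model for Classifier {
    fn run(&self, input: &str) -> String {
        format!("classify:{input}")
    }
}

fn main() {
    // Different models are interchangeable behind the shared trait.
    let models: Vec<Box<dyn Model>> = vec![Box::new(Detector), Box::new(Classifier)];
    for m in &models {
        println!("{}", m.run("image.jpg"));
    }
}
```

The benefit of this pattern is that swapping, say, a detector for a segmenter changes only the construction site, not the inference loop.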

Quick Start & Requirements

Installation is managed via Rust's package manager (cargo). Examples demonstrate running models with specific configurations:

# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16

# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cuda:0

Prerequisites include a Rust toolchain. GPU acceleration requires compatible hardware and drivers (e.g., NVIDIA CUDA, Apple Silicon CoreML, Intel OpenVINO). Links to API Documentation, Examples, and the Model Zoo are provided.
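For library use (rather than running the bundled examples), a dependency entry might look like the following sketch. The `cuda` feature name is inferred from the `-F cuda` flag in the example above; the version is a placeholder to check against crates.io:

```toml
[dependencies]
# Enable the CUDA execution provider via the crate feature used
# in the example invocations above.
usls = { version = "0.1", features = ["cuda"] }
```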

Highlighted Details

  • Performance: Achieves high throughput via multi-threading, SIMD, and ONNX Runtime execution providers like TensorRT, CUDA, CoreML, and OpenVINO. Benchmarks show low latency for models like YOLOv8n.
  • Extensive Model Support: Features over 50 SOTA models across categories including YOLO-series, image classification, object detection, segmentation, vision-language models (VLMs), and embedding models.
  • Cross-Platform Compatibility: Runs on Linux, macOS, and Windows, supporting a wide range of hardware acceleration backends.
  • Unified API & Auto-Management: Provides a consistent Model trait interface and automates model downloading, caching, and path resolution from Hugging Face/GitHub.

Maintenance & Community

This project is maintained as a personal effort in the author's spare time; community contributions are warmly welcomed, particularly PRs for model optimization. Users can report issues or open discussions on the GitHub repository.

Licensing & Compatibility

The project is licensed under a standard open-source license (refer to the LICENSE file). Specific compatibility for commercial use depends on the exact license terms.

Limitations & Caveats

The library focuses on vision and VLM models under 1B parameters, explicitly excluding large language models due to their specialized inference engines. As a personal project, the pace of new model integration and performance optimization may vary. Some models may require further interface or post-processing tuning.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 13
  • Star History: 28 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

Toolkit for easy model parallelization
791 stars · Created 4 years ago · Updated 2 years ago