usls by jamjamjon

Efficient Rust inference for vision and vision-language models

Created 1 year ago
358 stars

Top 78.5% on SourcePulse

View on GitHub
Project Summary

usls is a Rust library, powered by ONNX Runtime, that provides efficient, cross-platform inference for state-of-the-art vision and vision-language models, particularly those under 1 billion parameters. It targets engineers and researchers who need high performance and a unified API across diverse hardware and operating systems, simplifying complex model deployments.

How It Works

The library leverages Rust for performance and ONNX Runtime for accelerated inference. It employs multi-threading, SIMD instructions, and optional CUDA acceleration. A key design feature is a unified API with consistent methods like run(), forward(), encode_images(), and encode_texts() across all supported models. Automatic model downloading from Hugging Face/GitHub, caching, and path resolution streamline the development workflow.
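The unified-interface idea can be sketched with a small, self-contained example. The trait and model types below are hypothetical stand-ins to illustrate the design, not usls's actual `Model` trait or method signatures:

```rust
// Sketch of a unified model interface: every model type exposes the
// same method, so calling code stays unchanged when the model changes.
// `Model`, `Detector`, and `Classifier` here are illustrative only.

trait Model {
    fn run(&self, input: &str) -> String;
}

struct Detector;
struct Classifier;

impl Model for Detector {
    fn run(&self, input: &str) -> String {
        format!("detect:{input}")
    }
}

impl Model for Classifier {
    fn run(&self, input: &str) -> String {
        format!("classify:{input}")
    }
}

fn main() {
    // Different models are interchangeable behind the shared trait.
    let models: Vec<Box<dyn Model>> = vec![Box::new(Detector), Box::new(Classifier)];
    for m in &models {
        println!("{}", m.run("image.jpg"));
    }
}
```

The benefit of this pattern is that swapping, say, a detector for a segmenter changes only the construction site, not the inference loop.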

Quick Start & Requirements

Installation is managed via Rust's package manager (cargo). Examples demonstrate running models with specific configurations:

# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16

# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cuda:0

Prerequisites include a Rust toolchain. GPU acceleration requires compatible hardware and drivers (e.g., NVIDIA CUDA, Apple Silicon CoreML, Intel OpenVINO). Links to API Documentation, Examples, and the Model Zoo are provided.
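For library use (rather than running the bundled examples), a dependency entry might look like the following sketch. The `cuda` feature name is inferred from the `-F cuda` flag in the example above; the version is a placeholder to check against crates.io:

```toml
[dependencies]
# Enable the CUDA execution provider via the crate feature used
# in the example invocations above.
usls = { version = "0.1", features = ["cuda"] }
```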

Highlighted Details

  • Performance: Achieves high throughput via multi-threading, SIMD, and ONNX Runtime execution providers like TensorRT, CUDA, CoreML, and OpenVINO. Benchmarks show low latency for models like YOLOv8n.
  • Extensive Model Support: Features over 50 SOTA models across categories including YOLO-series, image classification, object detection, segmentation, vision-language models (VLMs), and embedding models.
  • Cross-Platform Compatibility: Runs on Linux, macOS, and Windows, supporting a wide range of hardware acceleration backends.
  • Unified API & Auto-Management: Provides a consistent Model trait interface and automates model downloading, caching, and path resolution from Hugging Face/GitHub.

Maintenance & Community

This project is maintained as a personal effort in the author's spare time; community contributions are warmly welcomed, particularly PRs for model optimization. Users can report issues or open discussions on the GitHub repository.

Licensing & Compatibility

The project is licensed under a standard open-source license (refer to the LICENSE file). Specific compatibility for commercial use depends on the exact license terms.

Limitations & Caveats

The library focuses on vision and VLM models under 1B parameters, explicitly excluding large language models due to their specialized inference engines. As a personal project, the pace of new model integration and performance optimization may vary. Some models may require further interface or post-processing tuning.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 13
  • Star History: 28 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

Toolkit for easy model parallelization
791 stars · Created 4 years ago · Updated 2 years ago