jamjamjon/usls: Efficient Rust inference for vision and vision-language models
Top 78.5% on SourcePulse
usls is a Rust library, powered by ONNX Runtime, that provides efficient, cross-platform inference for state-of-the-art vision and vision-language models, particularly those under 1 billion parameters. It targets engineers and researchers who need high performance and a unified API across diverse hardware and operating systems, simplifying otherwise complex model deployments.
How It Works
The library leverages Rust for performance and ONNX Runtime for accelerated inference. It employs multi-threading, SIMD instructions, and optional CUDA acceleration. A key design feature is a unified API with consistent methods like run(), forward(), encode_images(), and encode_texts() across all supported models. Automatic model downloading from Hugging Face/GitHub, caching, and path resolution streamline the development workflow.
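To illustrate the unified-API design described above, here is a minimal, self-contained Rust sketch of how a shared model trait can expose the same entry points across different architectures. The method names mirror those the library documents (run(), forward()), but the types and implementations here are illustrative placeholders, not the actual usls API.

```rust
/// Illustrative placeholder types; usls works with real decoded
/// images and structured predictions.
struct Image(Vec<u8>);
struct Prediction(String);

/// A unified interface: every model, whatever its architecture,
/// exposes the same methods.
trait Model {
    /// Preprocess, run inference, and postprocess in one call.
    fn run(&self, inputs: &[Image]) -> Vec<Prediction>;
    /// Raw forward pass without postprocessing.
    fn forward(&self, inputs: &[Image]) -> Vec<f32>;
}

struct Detector;

impl Model for Detector {
    fn run(&self, inputs: &[Image]) -> Vec<Prediction> {
        // A real detector would decode boxes and labels here.
        inputs.iter().map(|_| Prediction("bbox".into())).collect()
    }
    fn forward(&self, inputs: &[Image]) -> Vec<f32> {
        // One dummy logit per input image.
        vec![0.0; inputs.len()]
    }
}

fn main() {
    let model = Detector;
    let preds = model.run(&[Image(vec![0u8; 4])]);
    println!("predictions: {}", preds.len());
}
```

The benefit of this pattern is that calling code stays identical when swapping a detector for a segmenter or a VLM encoder; only the concrete type behind the trait changes.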
Quick Start & Requirements
Installation is managed via Rust's package manager (cargo). Examples demonstrate running models with specific configurations:
# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16
# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cuda:0
Prerequisites include a Rust toolchain. GPU acceleration requires compatible hardware and drivers (e.g., NVIDIA CUDA, Apple Silicon CoreML, Intel OpenVINO). Links to API Documentation, Examples, and the Model Zoo are provided.
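As a setup sketch, adding the crate to an existing project would typically look like the following; the `cuda` feature name is taken from the `-F cuda` flag in the example above, but check the crate's documentation for the exact feature flags your platform needs.

```shell
# Add usls as a dependency; enable the CUDA feature only on
# machines with a compatible NVIDIA GPU and drivers.
cargo add usls -F cuda
```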
Highlighted Details
Unified Model trait interface; automates model downloading, caching, and path resolution from Hugging Face/GitHub.
Maintenance & Community
This project is maintained as a personal effort in spare time. Community contributions are welcome, particularly PRs for model optimization. Users can report issues or open discussions on the GitHub repository.
Licensing & Compatibility
The project is licensed under a standard open-source license (refer to the LICENSE file). Specific compatibility for commercial use depends on the exact license terms.
Limitations & Caveats
The library focuses on vision and VLM models under 1B parameters, explicitly excluding large language models due to their specialized inference engines. As a personal project, the pace of new model integration and performance optimization may vary. Some models may require further interface or post-processing tuning.