vit.cpp by staghado

C/C++ inference engine for Vision Transformer (ViT) models

Created 1 year ago
294 stars

Top 89.9% on SourcePulse

Project Summary

This project provides a C/C++ inference engine for Vision Transformer (ViT) models, leveraging the ggml library for optimized performance on edge devices. It targets developers and researchers needing a lightweight, dependency-free solution for ViT inference, offering significantly faster startup times and lower memory footprints compared to traditional deep learning frameworks.

How It Works

The implementation is a direct C/C++ translation of the ViT architecture, utilizing ggml for efficient tensor operations and memory management. This approach enables aggressive quantization (4-bit, 5-bit, 8-bit) and allows for per-device optimizations via compiler flags like -march=native, leading to substantial speedups and reduced memory usage, particularly on CPUs.
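
For example, the -march=native flag mentioned above can be passed at configure time through standard CMake variables (a generic CMake sketch; the project's own CMakeLists may already apply such flags by default):

    # configure a release build that targets the host CPU's native instruction set
    cmake -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_C_FLAGS="-march=native" \
          -DCMAKE_CXX_FLAGS="-march=native" ..
    make -j4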

Quick Start & Requirements

  • Install: Clone the repository with its submodules (git clone --recurse-submodules), install the Python dependencies (pip install torch timm), convert a PyTorch model to GGUF format with convert-pth-to-ggml.py, then build the C++ inference engine (mkdir build && cd build && cmake .. && make -j4); the full sequence is sketched after this list.
  • Prerequisites: Python 3.x, PyTorch, timm, CMake, and a C/C++ compiler toolchain.
  • Run: ./bin/vit -t <threads> -m <model_path.gguf> -i <image_path.jpeg>
  • Docs: https://github.com/staghado/vit.cpp
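
The install steps above, collected into one command sequence (a minimal sketch; the arguments to convert-pth-to-ggml.py depend on the chosen timm model and are omitted here, so consult the repository README for the exact invocation):

    # clone the repo together with the ggml submodule
    git clone --recurse-submodules https://github.com/staghado/vit.cpp
    cd vit.cpp

    # Python dependencies for the conversion script
    pip install torch timm

    # convert a timm ViT checkpoint to GGUF (model-specific arguments omitted; see the repo README)
    python convert-pth-to-ggml.py

    # build the C++ inference engine
    mkdir build && cd build
    cmake .. && make -j4

    # run inference on an image
    ./bin/vit -t <threads> -m <model_path.gguf> -i <image_path.jpeg>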

Highlighted Details

  • Up to 6x faster inference compared to native PyTorch on Apple M1.
  • Significantly lower memory usage: ViT-base uses ~179 MB vs ~1.61 GB in PyTorch.
  • Supports various ggml quantization types (q4_0, q4_1, q5_0, q5_1, q8_0); see the quantization sketch after this list.
  • Fast startup times suitable for serverless deployments.
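
ggml-based projects typically ship a separate quantize tool that converts an F16 GGUF file into one of these lower-precision types. Assuming vit.cpp follows the same convention as whisper.cpp and llama.cpp (the binary name and argument order below are assumptions, so verify them against the repository), the step would look roughly like:

    # convert an f16 GGUF model to 4-bit q4_1 (binary name and argument order are assumptions)
    ./bin/quantize <model-f16.gguf> <model-q4_1.gguf> q4_1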

Maintenance & Community

The project is inspired by whisper.cpp and llama.cpp, so its approach will be familiar to users of those widely adopted projects. The README provides no community links (Discord/Slack) and no information about active maintainers.

Licensing & Compatibility

The README does not explicitly state a license; check the repository for a LICENSE file before any commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not specify which ViT variants are supported beyond "timm ViTs with different variants out of the box." Evaluation on standard datasets such as ImageNet-1k is still listed as a to-do, indicating ongoing development and a remaining need for accuracy validation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai), Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech), and 1 more.

GPTQ-triton by fpgaminer

0%
307 stars
Triton kernel for GPTQ inference, improving context scaling
Created 2 years ago
Updated 2 years ago
Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project

0%
456 stars
C++ example for StarCoder inference
Created 2 years ago
Updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

0.2%
6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago
Updated 3 months ago