vit-pytorch by lucidrains

PyTorch library for Vision Transformer variants and related techniques

created 4 years ago
23,514 stars

Top 1.8% on sourcepulse

Project Summary

This repository provides PyTorch implementations of various Vision Transformer (ViT) architectures and related techniques for image classification. It serves researchers and practitioners looking to experiment with or deploy state-of-the-art ViT variants, offering a comprehensive collection of models and training strategies.

How It Works

The library implements numerous ViT variants, including architectural modifications for efficiency, improved performance, and specific tasks such as masked image modeling. It uses PyTorch's modularity to make it easy to swap in different attention mechanisms and positional encodings. The core approach splits an image into fixed-size patches, embeds them as a token sequence, processes that sequence through transformer blocks, and predicts a class from the resulting representation.
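For orientation, here is a minimal sketch of that flow, modeled on the basic usage example in the project README; exact constructor arguments may differ between versions.

    import torch
    from vit_pytorch import ViT

    # A plain ViT: the image is split into 32x32 patches, embedded as a
    # token sequence, and passed through a 6-layer transformer encoder.
    model = ViT(
        image_size = 256,    # input resolution (square images assumed)
        patch_size = 32,     # side length of each patch
        num_classes = 1000,  # size of the classification head
        dim = 1024,          # token embedding dimension
        depth = 6,           # number of transformer blocks
        heads = 16,          # attention heads per block
        mlp_dim = 2048,      # hidden size of the feed-forward sublayer
        dropout = 0.1,
        emb_dropout = 0.1
    )

    img = torch.randn(1, 3, 256, 256)  # one RGB image
    logits = model(img)                # (1, 1000) class predictions

The forward pass returns raw logits; applying a loss such as cross-entropy for training is left to the user.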

Quick Start & Requirements

  • Primary install: pip install vit-pytorch
  • Requirements: PyTorch. Specific models may have additional dependencies (e.g., nystrom-attention, x-transformers); see the sketch after this list for how such a module is plugged in.
  • Usage: Import ViT or specific model classes and instantiate with desired parameters. See README for detailed examples.
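As an illustration of one such extra dependency, the sketch below follows the pattern of the README's efficient-attention example, where an external attention module is handed to a ViT wrapper. The module and argument names are taken from that example and should be treated as assumptions that may vary between releases.

    import torch
    from nystrom_attention import Nystromformer  # pip install nystrom-attention
    from vit_pytorch.efficient import ViT        # wrapper that accepts a custom transformer

    # Linear-complexity attention makes very large images tractable.
    efficient_transformer = Nystromformer(
        dim = 512,
        depth = 12,
        heads = 8,
        num_landmarks = 256
    )

    model = ViT(
        dim = 512,
        image_size = 2048,
        patch_size = 32,
        num_classes = 1000,
        transformer = efficient_transformer
    )

    img = torch.randn(1, 3, 2048, 2048)
    preds = model(img)  # (1, 1000)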

Highlighted Details

  • Extensive collection of ViT variants (e.g., NaViT, CvT, LeViT, MaxViT, MobileViT).
  • Implementations for self-supervised learning (DINO, SimMIM, MAE); a minimal MAE sketch follows this list.
  • Support for 3D video processing.
  • Utilities for accessing attention maps and embeddings.
  • Modules for efficient attention mechanisms.
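As one example from the self-supervised side, here is a sketch of masked-autoencoder pretraining in the style of the README's MAE example; the argument names follow that example and are assumptions that may differ by version.

    import torch
    from vit_pytorch import ViT, MAE

    # The ViT acts as the encoder to be pretrained.
    encoder = ViT(
        image_size = 256,
        patch_size = 32,
        num_classes = 1000,
        dim = 1024,
        depth = 6,
        heads = 8,
        mlp_dim = 2048
    )

    # MAE masks a large fraction of the patches and reconstructs them with a
    # lightweight decoder; the call returns the reconstruction loss.
    mae = MAE(
        encoder = encoder,
        masking_ratio = 0.75,  # mask 75% of patches, as in the MAE paper
        decoder_dim = 512,
        decoder_depth = 6
    )

    images = torch.randn(8, 3, 256, 256)
    loss = mae(images)
    loss.backward()
    # After pretraining, `encoder` can be fine-tuned for classification.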

Maintenance & Community

The repository is actively maintained by lucidrains, with contributions from the broader PyTorch and computer vision community. Links to resources and citations for each model are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README is extensive, covering many models, which can be overwhelming. Some advanced features or specific model configurations might require careful reading of the associated papers and code. Pre-trained weights are not directly provided within this repository but are often linked to external sources.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 901 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

  • open_flamingo by mlfoundations: open-source framework for training large multimodal models (top 0.1%, ~4k stars; created 2 years ago, updated 11 months ago)