vit-pytorch by lucidrains

PyTorch library for Vision Transformer variants and related techniques

created 4 years ago
23,514 stars

Top 1.8% on sourcepulse

Project Summary

This repository provides PyTorch implementations of various Vision Transformer (ViT) architectures and related techniques for image classification. It serves researchers and practitioners looking to experiment with or deploy state-of-the-art ViT variants, offering a comprehensive collection of models and training strategies.

How It Works

The library implements numerous ViT variants, including architectural modifications for efficiency, improved performance, and specific tasks such as masked image modeling. It uses PyTorch's modularity to make it easy to swap in different attention mechanisms and positional encodings. The core approach splits an image into fixed-size patches, embeds them as a token sequence, processes that sequence through transformer blocks, and predicts a class from the resulting representation.
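For orientation, here is a minimal sketch of that flow, modeled on the basic usage example in the project README; exact constructor arguments may differ between versions.

    import torch
    from vit_pytorch import ViT

    # A plain ViT: the image is split into 32x32 patches, embedded as a
    # token sequence, and passed through a 6-layer transformer encoder.
    model = ViT(
        image_size = 256,    # input resolution (square images assumed)
        patch_size = 32,     # side length of each patch
        num_classes = 1000,  # size of the classification head
        dim = 1024,          # token embedding dimension
        depth = 6,           # number of transformer blocks
        heads = 16,          # attention heads per block
        mlp_dim = 2048,      # hidden size of the feed-forward sublayer
        dropout = 0.1,
        emb_dropout = 0.1
    )

    img = torch.randn(1, 3, 256, 256)  # one RGB image
    logits = model(img)                # (1, 1000) class predictions

The forward pass returns raw logits; applying a loss such as cross-entropy for training is left to the user.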

Quick Start & Requirements

  • Primary install: pip install vit-pytorch
  • Requirements: PyTorch. Specific models may have additional dependencies (e.g., nystrom-attention, x-transformers); see the sketch after this list for how such a module is plugged in.
  • Usage: Import ViT or specific model classes and instantiate with desired parameters. See README for detailed examples.
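As an illustration of one such extra dependency, the sketch below follows the pattern of the README's efficient-attention example, where an external attention module is handed to a ViT wrapper. The module and argument names are taken from that example and should be treated as assumptions that may vary between releases.

    import torch
    from nystrom_attention import Nystromformer  # pip install nystrom-attention
    from vit_pytorch.efficient import ViT        # wrapper that accepts a custom transformer

    # Linear-complexity attention makes very large images tractable.
    efficient_transformer = Nystromformer(
        dim = 512,
        depth = 12,
        heads = 8,
        num_landmarks = 256
    )

    model = ViT(
        dim = 512,
        image_size = 2048,
        patch_size = 32,
        num_classes = 1000,
        transformer = efficient_transformer
    )

    img = torch.randn(1, 3, 2048, 2048)
    preds = model(img)  # (1, 1000)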

Highlighted Details

  • Extensive collection of ViT variants (e.g., NaViT, CvT, LeViT, MaxViT, MobileViT).
  • Implementations for self-supervised learning (DINO, SimMIM, MAE); a minimal MAE sketch follows this list.
  • Support for 3D video processing.
  • Utilities for accessing attention maps and embeddings.
  • Modules for efficient attention mechanisms.
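As one example from the self-supervised side, here is a sketch of masked-autoencoder pretraining in the style of the README's MAE example; the argument names follow that example and are assumptions that may differ by version.

    import torch
    from vit_pytorch import ViT, MAE

    # The ViT acts as the encoder to be pretrained.
    encoder = ViT(
        image_size = 256,
        patch_size = 32,
        num_classes = 1000,
        dim = 1024,
        depth = 6,
        heads = 8,
        mlp_dim = 2048
    )

    # MAE masks a large fraction of the patches and reconstructs them with a
    # lightweight decoder; the call returns the reconstruction loss.
    mae = MAE(
        encoder = encoder,
        masking_ratio = 0.75,  # mask 75% of patches, as in the MAE paper
        decoder_dim = 512,
        decoder_depth = 6
    )

    images = torch.randn(8, 3, 256, 256)
    loss = mae(images)
    loss.backward()
    # After pretraining, `encoder` can be fine-tuned for classification.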

Maintenance & Community

The repository is actively maintained by lucidrains, with contributions from the broader PyTorch and computer vision community. Links to resources and citations for each model are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README is extensive, covering many models, which can be overwhelming. Some advanced features or specific model configurations might require careful reading of the associated papers and code. Pre-trained weights are not directly provided within this repository but are often linked to external sources.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 901 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

  • open_flamingo by mlfoundations: open-source framework for training large multimodal models (top 0.1%, ~4k stars; created 2 years ago, updated 11 months ago)