PaddleViT by BR-IDL

PaddleViT: Vision models for PaddlePaddle

Created 4 years ago

1,238 stars

Top 31.7% on SourcePulse

Project Summary

PaddleViT is a library of state-of-the-art Vision Transformer (ViT) and MLP models for the PaddlePaddle deep learning framework. It provides implementations, pre-trained weights, and training/validation scripts for image classification, object detection, semantic segmentation, and GANs, and targets researchers and practitioners who want to apply recent computer vision techniques.

How It Works

PaddleViT has a modular design: each model architecture is implemented in a standalone Python module, which makes it easy to modify and experiment with. The library bundles common layers, utilities, optimizers, schedulers, and data augmentations, so users can reproduce state-of-the-art results and fine-tune models on custom datasets. It supports distributed data-parallel (DDP) training to scale across multiple GPUs and mixed-precision training (AMP) to reduce memory use and speed up computation.
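Mixed-precision training is usually paired with loss scaling, because small gradients can underflow to zero in half precision. The sketch below illustrates that effect with the Python standard library only. It is not PaddleViT or PaddlePaddle code; the helper `to_fp16`, the scale factor 1024, and the gradient value 1e-8 are assumptions chosen for illustration.

```python
import struct


def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back to a float."""
    return struct.unpack("e", struct.pack("e", value))[0]


LOSS_SCALE = 1024.0  # illustrative value, not taken from PaddleViT
tiny_grad = 1e-8     # hypothetical gradient, below the fp16 representable range

# Stored directly in half precision, the gradient underflows to 0.0.
unscaled = to_fp16(tiny_grad)

# Scaling the loss scales the gradient too, so it survives the fp16 round trip.
scaled = to_fp16(tiny_grad * LOSS_SCALE)

# Dividing by the scale in higher precision recovers a value close to 1e-8.
recovered = scaled / LOSS_SCALE

print("without scaling:", unscaled)
print("with scaling:   ", recovered)
```

In a real framework the scaling and unscaling are handled by the AMP utilities, typically with a scale factor that is adjusted dynamically during training.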

Quick Start & Requirements

Installation: Clone the repository and install dependencies via pip. A conda environment is recommended.
Prerequisites: Linux/macOS/Windows, Python 3.6+, PaddlePaddle 2.1.0+ with CUDA 10.2+.
Usage: Pre-trained weights can be downloaded and loaded directly into models using provided configuration files and Python scripts for evaluation and training.
Links: Docs, Image Classification, Object Detection, Semantic Segmentation, GANs.
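The version prerequisites above can be checked before installing anything else. The following is a minimal sketch, assuming PaddlePaddle exposes its version as `paddle.__version__`; the helper names `parse_version`, `installed_paddle_version`, and `check_environment` are hypothetical and not part of PaddleViT. It does not check the CUDA version.

```python
import re
import sys

MIN_PYTHON = (3, 6)     # from the prerequisites above
MIN_PADDLE = (2, 1, 0)  # from the prerequisites above


def parse_version(text):
    """Return (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("not a version string: %r" % (text,))
    # Missing components (e.g. "2.0") are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def installed_paddle_version():
    """Return the installed PaddlePaddle version string, or None if absent."""
    try:
        import paddle
    except ImportError:
        return None
    return paddle.__version__


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    found = installed_paddle_version()
    if found is None:
        problems.append("PaddlePaddle is not installed")
    elif parse_version(found) < MIN_PADDLE:
        problems.append(
            "PaddlePaddle %s is older than %d.%d.%d" % ((found,) + MIN_PADDLE)
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet prerequisite and nothing when both version checks pass.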

Highlighted Details

Implements a wide array of ViT and MLP architectures including ViT, DeiT, Swin Transformer, VOLO, CSWin Transformer, CaiT, PVTv2, Shuffle Transformer, T2T-ViT, CrossViT, BEiT, Focal Transformer, MobileViT, ViP, XCiT, PiT, HaloNet, PoolFormer, BoTNet, CvT, HVT, TopFormer, ConvNeXt, CoaT, ResT, ResTV2, MLP-Mixer, ResMLP, gMLP, FF-Only, RepMLP, CycleMLP, ConvMixer, ConvMLP, RepLKNet, MobileOne, DETR, SegFormer, and more.
Provides extensive benchmark results for ImageNet classification, COCO object detection, and Pascal Context/Cityscapes/ADE20K semantic segmentation.
Includes scripts for both single-GPU and multi-GPU training and evaluation.
Offers model export capabilities for production deployment.
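Multi-GPU data-parallel training, mentioned in the list above, generally gives each process a different slice of the dataset in every epoch. The sketch below shows one common sharding scheme in plain Python. It is a generic illustration, not the sampler PaddleViT or PaddlePaddle uses; the function name `shard_indices` and the wrap-around padding are assumptions.

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices that process `rank` handles in one epoch.

    The index list is padded by wrapping around to the start so that every
    process receives the same number of samples.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    # Each process takes every world_size-th index, starting at its rank.
    return padded[rank:total:world_size]


# Example: 10 samples split across 4 processes.
shards = [shard_indices(10, 4, rank) for rank in range(4)]
for rank, shard in enumerate(shards):
    print("rank", rank, "->", shard)
```

Equal shard sizes keep the processes in step when gradients are averaged, at the cost of a few samples being seen twice per epoch when the dataset size is not divisible by the number of processes.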

Maintenance & Community

The project is maintained by BR-IDL, but it no longer appears to be active: the last commit was 3 years ago.
Contributions are encouraged via CONTRIBUTING.md.
Contact is via GitHub issues.

Licensing & Compatibility

Licensed under the Apache-2.0 license.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

Some model checkpoints are marked as "TODO".
The README is primarily in English and Chinese, with some tutorials noted as Chinese-only.

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days
