PaddleViT by BR-IDL

PaddleViT: Vision models for PaddlePaddle

created 3 years ago
1,237 stars

Top 32.5% on sourcepulse

Project Summary

PaddleViT is a library of Vision Transformer (ViT) and MLP models for the PaddlePaddle deep learning framework. It provides implementations, pre-trained weights, and training/validation scripts for image classification, object detection, semantic segmentation, and GANs. It targets researchers and practitioners who want to apply recent computer vision models.

How It Works

PaddleViT uses a modular design: each model architecture is implemented in a standalone Python module, so a model can be modified or experimented with in isolation. The library bundles commonly used layers, utilities, optimizers, schedulers, and data augmentations, so users can reproduce published results and fine-tune models on custom datasets. It supports distributed data-parallel (DDP) training and automatic mixed-precision (AMP) training to speed up training.
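To make the data-parallel idea concrete, here is a minimal, framework-free sketch of how a dataset can be split so that each worker sees a distinct shard per epoch. It is not PaddleViT's code, and the source does not show how the library partitions data; the function name `shard_indices` and its parameters are hypothetical.

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices one worker handles in a data-parallel epoch.

    Indices are padded by wrapping around to the start so that every rank
    receives the same number of samples, which keeps workers in step when
    gradients are averaged across them.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    # Wrap around so the padded list reuses the first few samples.
    padded = [i % num_samples for i in range(total)]
    # Interleave: rank r takes every world_size-th index starting at r.
    return padded[rank::world_size]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
```

With 10 samples and 4 workers, each worker gets 3 indices, and the last two workers reuse samples 0 and 1 as padding.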

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip. A conda environment is recommended.
  • Prerequisites: Linux/macOS/Windows, Python 3.6+, PaddlePaddle 2.1.0+ with CUDA 10.2+.
  • Usage: Pre-trained weights can be downloaded and loaded directly into models using provided configuration files and Python scripts for evaluation and training.
  • Links: Docs, Image Classification, Object Detection, Semantic Segmentation, GANs.
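The prerequisites above can be checked before installing anything else. The following is a rough sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of PaddleViT, and the version strings are passed in by the caller rather than detected.

```python
import re
import sys

# Minimum versions taken from the prerequisites listed above.
MIN_PYTHON = (3, 6)
MIN_PADDLE = (2, 1, 0)
MIN_CUDA = (10, 2)


def parse_version(text):
    """Turn '2.1.0', '2.4.2.post117' or '10.2' into a tuple of leading ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def _pad(version, width):
    """Right-pad a version tuple with zeros to the given width."""
    return tuple(version) + (0,) * (width - len(version))


def meets_minimum(found, minimum):
    """Compare two version tuples after padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    return _pad(found, width) >= _pad(minimum, width)


def check_environment(paddle_version, cuda_version, python_version=None):
    """Report which prerequisites are satisfied, keyed by component name."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    return {
        "python": meets_minimum(tuple(python_version), MIN_PYTHON),
        "paddle": meets_minimum(parse_version(paddle_version), MIN_PADDLE),
        "cuda": meets_minimum(parse_version(cuda_version), MIN_CUDA),
    }


if __name__ == "__main__":
    # Example values only; substitute the versions found on your machine.
    print(check_environment("2.1.0", "10.2"))
```

In an environment where PaddlePaddle is installed, `paddle.__version__` would be a natural source for the first argument; the CUDA version string has to come from whatever tool reports it on your system.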

Highlighted Details

  • Implements a wide array of ViT and MLP architectures including ViT, DeiT, Swin Transformer, VOLO, CSwin Transformer, CaiT, PVTv2, Shuffle Transformer, T2T-ViT, CrossViT, BEiT, Focal Transformer, Mobile-ViT, ViP, XCiT, PiT, HaloNet, PoolFormer, BoTNet, CvT, HvT, TopFormer, ConvNeXt, CoaT, ResT, ResTV2, MLP-Mixer, ResMLP, gMLP, FF-Only, RepMLP, CycleMLP, ConvMixer, ConvMLP, RepLKNet, MobileOne, DETR, SegFormer, and more.
  • Provides extensive benchmark results for ImageNet classification, COCO object detection, and Pascal Context/Cityscapes/ADE20K semantic segmentation.
  • Includes scripts for both single-GPU and multi-GPU training and evaluation.
  • Offers model export capabilities for production deployment.

Maintenance & Community

  • The project is maintained by BR-IDL; the last commit was about 2 years ago.
  • Contributions are encouraged via CONTRIBUTING.md.
  • Contact is via GitHub issues.

Licensing & Compatibility

  • Licensed under the Apache-2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Some model checkpoints are marked as "TODO".
  • The README is primarily in English and Chinese, with some tutorials noted as Chinese-only.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 90 days
