PaddleViT by BR-IDL

PaddleViT: Vision models for PaddlePaddle

Created 4 years ago
1,231 stars

Top 32.0% on SourcePulse

Project Summary

PaddleViT is a library of Vision Transformer (ViT) and MLP-based vision models for the PaddlePaddle deep learning framework. It provides model implementations, pre-trained weights, and training/validation scripts for image classification, object detection, semantic segmentation, and GANs. It targets researchers and practitioners who want to apply recent computer vision models.

How It Works

PaddleViT has a modular design: each model architecture is implemented in a standalone Python module, which makes it easy to modify and experiment with. The library bundles commonly used layers, utilities, optimizers, schedulers, and data augmentations, so users can reproduce published results and fine-tune models on custom datasets. It supports distributed data-parallel training (DDP) and automatic mixed-precision training (AMP) to speed up training.
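
Mixed-precision training usually pairs half-precision arithmetic with loss scaling, because very small gradients underflow to zero in fp16. The following is a minimal standard-library sketch of that idea only. It does not use PaddlePaddle or PaddleViT code, and the helper names `to_fp16` and `fp16_gradient` are hypothetical.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_gradient(true_gradient, loss_scale=1.0):
    """Store a scaled gradient in fp16, then unscale it at full precision."""
    stored = to_fp16(true_gradient * loss_scale)
    return stored / loss_scale


tiny = 1e-8  # smaller than anything fp16 can represent

unscaled = fp16_gradient(tiny)                   # underflows in fp16
scaled = fp16_gradient(tiny, loss_scale=1024.0)  # survives the fp16 round trip

print("without loss scaling:", unscaled)
print("with loss scaling:   ", scaled)
```

Without scaling the gradient is lost entirely. With a scale of 1024 it is recovered to within fp16 rounding error, which is why AMP implementations typically scale the loss before the backward pass and unscale the gradients before the optimizer step.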

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip. A conda environment is recommended.
  • Prerequisites: Linux/macOS/Windows, Python 3.6+, PaddlePaddle 2.1.0+ with CUDA 10.2+.
  • Usage: Pre-trained weights can be downloaded and loaded directly into models using provided configuration files and Python scripts for evaluation and training.
  • Links: Docs, Image Classification, Object Detection, Semantic Segmentation, GANs.
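
The version prerequisites above can be checked before installing anything else. The following is a minimal sketch and not part of PaddleViT: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only assumption about PaddlePaddle is that the installed package exposes `paddle.__version__`.

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string, padded to 3."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad so that "2.1" compares equal to "2.1.0".
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (numeric components only)."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Report whether this interpreter satisfies the stated prerequisites."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    try:
        import paddle  # only importable once PaddlePaddle is installed
    except ImportError:
        report["paddle"] = False
    else:
        report["paddle"] = meets_minimum(paddle.__version__, "2.1.0")
    return report


if __name__ == "__main__":
    print(check_environment())
```

The check covers only the Python and PaddlePaddle versions. The CUDA requirement is not inspected here.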

Highlighted Details

  • Implements a wide array of ViT and MLP architectures including ViT, DeiT, Swin Transformer, VOLO, CSWin Transformer, CaiT, PVTv2, Shuffle Transformer, T2T-ViT, CrossViT, BEiT, Focal Transformer, MobileViT, ViP, XCiT, PiT, HaloNet, PoolFormer, BoTNet, CvT, HvT, TopFormer, ConvNeXt, CoaT, ResT, ResTV2, MLP-Mixer, ResMLP, gMLP, FF-Only, RepMLP, CycleMLP, ConvMixer, ConvMLP, RepLKNet, MobileOne, DETR, SegFormer, and more.
  • Provides extensive benchmark results for ImageNet classification, COCO object detection, and Pascal Context/Cityscapes/ADE20K semantic segmentation.
  • Includes scripts for both single-GPU and multi-GPU training and evaluation.
  • Offers model export capabilities for production deployment.

Maintenance & Community

  • The project is maintained by BR-IDL, but it is no longer active: the Health Check below reports the last commit as 3 years ago.
  • Contributions are encouraged via CONTRIBUTING.md.
  • Contact is via GitHub issues.

Licensing & Compatibility

  • Licensed under the Apache-2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Some model checkpoints are marked as "TODO".
  • The README is primarily in English and Chinese, with some tutorials noted as Chinese-only.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Stars (30d): 0
