dropout by facebookresearch

PyTorch implementation for "Dropout Reduces Underfitting" research paper

Created 2 years ago

316 stars

Top 85.6% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation for the "Dropout Reduces Underfitting" paper, which introduces two techniques: "early dropout" and "late dropout." It is aimed at deep learning researchers and practitioners, particularly those working with vision transformers and convolutional networks, who want to improve model performance when a model is either underfitting or overfitting.

How It Works

The project implements two distinct dropout schedules. "Early dropout" is active only during the early phase of training and then switched off, which helps underfitting models reach a lower training loss. "Late dropout" is inactive at the start and switched on later in training, which improves generalization for models that overfit. Choosing when dropout is active, rather than only how strong it is, gives finer control over training and convergence.
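The two schedules can be sketched as a function that returns the dropout probability for a given epoch. This is a minimal illustration and not the repository's code: the helper name `drop_rate_at`, its parameters (`mode`, `rate`, `cutoff_epoch`), and the hard on/off switch at the cutoff are assumptions made for the example.

```python
def drop_rate_at(epoch, total_epochs, mode, rate=0.1, cutoff_epoch=50):
    """Return the dropout probability to use at `epoch`.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not the repository's API.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError(f"epoch {epoch} outside [0, {total_epochs})")
    if mode == "standard":
        # Conventional dropout: same rate for the whole run.
        return rate
    if mode == "early":
        # Early dropout: active before the cutoff, off afterwards.
        return rate if epoch < cutoff_epoch else 0.0
    if mode == "late":
        # Late dropout: off before the cutoff, active afterwards.
        return 0.0 if epoch < cutoff_epoch else rate
    raise ValueError(f"unknown mode: {mode!r}")


# Six-epoch toy run with the switch at epoch 2.
early = [drop_rate_at(e, 6, "early", rate=0.1, cutoff_epoch=2) for e in range(6)]
late = [drop_rate_at(e, 6, "late", rate=0.1, cutoff_epoch=2) for e in range(6)]
print(early)  # [0.1, 0.1, 0.0, 0.0, 0.0, 0.0]
print(late)   # [0.0, 0.0, 0.1, 0.1, 0.1, 0.1]
```

In a PyTorch training loop, the returned value would typically be written to the `p` attribute of each `torch.nn.Dropout` module at the start of every epoch.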

Quick Start & Requirements

Install: Follow instructions in INSTALL.md.
Prerequisites: PyTorch, timm library, ConvNeXt codebase. Training commands suggest multi-node (4 nodes, 8 GPUs each) or single-machine (8 GPUs) setups.
Resources: Requires significant GPU resources for training.
Links: INSTALL.md (referenced, but no link is provided in the README), timm library, ConvNeXt codebase.

Highlighted Details

Reports ImageNet-1K accuracy improvements over baseline training for several models, including ViT, Mixer, and ConvNeXt.
Demonstrates performance gains with both "basic" and "improved" training recipes.
Provides example commands for training and evaluation on both multi-node and single-machine configurations.
Codebase built upon the established timm and ConvNeXt libraries.

Maintenance & Community

Authors are affiliated with Meta AI, UC Berkeley, and MBZUAI.
No explicit community links (Discord, Slack) or roadmap are provided in the README.

Licensing & Compatibility

License: CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
Restrictions: Non-commercial use only. The NC clause rules out use in commercial products, whether open- or closed-source.

Limitations & Caveats

The CC-BY-NC 4.0 license prohibits commercial use, which limits adoption in industry. The example training commands in the README assume 8 to 32 GPUs, which may be a barrier for users without access to a large GPU cluster.
