TencentPretrain by Tencent

PyTorch framework for multimodal pre-training and fine-tuning

created 2 years ago
1,078 stars

Top 35.8% on sourcepulse

Project Summary

TencentPretrain is a modular PyTorch framework for pre-training and fine-tuning large-scale models across multiple modalities (text, vision, audio). It aims to make it easier to reproduce existing models such as BERT and GPT-2 and to develop new multimodal architectures. Its target users are researchers and engineers in NLP and computer vision.

How It Works

The framework employs a modular design, separating models into distinct components: embedding, encoder, decoder, and target layers. This allows users to easily combine various pre-implemented modules to construct custom pre-training models. It supports a wide range of pre-training objectives and downstream tasks, offering flexibility for experimentation and adaptation.
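The four-part split described above can be illustrated with a minimal sketch in plain Python. It is not TencentPretrain's actual API: the registries, the component names ("double", "reverse", "sum", and so on), and the `build_model` helper are all hypothetical, and the stand-in functions operate on lists of integers rather than tensors. The point is only to show how naming one component per stage yields a composed model.

```python
from typing import Callable, Dict, List

# A stage maps a list of numbers to a list of numbers.
Stage = Callable[[List[int]], List[int]]

# Hypothetical registries, one per stage. The names and stand-in functions
# are illustrative only; they are NOT TencentPretrain's module names.
EMBEDDINGS: Dict[str, Stage] = {"double": lambda xs: [x * 2 for x in xs]}
ENCODERS: Dict[str, Stage] = {
    "identity": lambda xs: list(xs),
    "reverse": lambda xs: list(reversed(xs)),
}
DECODERS: Dict[str, Stage] = {"none": lambda xs: list(xs)}
TARGETS: Dict[str, Stage] = {
    "sum": lambda xs: [sum(xs)],
    "max": lambda xs: [max(xs)],
}


def build_model(embedding: str, encoder: str, decoder: str, target: str) -> Stage:
    """Look up one component per stage and chain them in a fixed order."""
    stages = [
        EMBEDDINGS[embedding],
        ENCODERS[encoder],
        DECODERS[decoder],
        TARGETS[target],
    ]

    def model(xs: List[int]) -> List[int]:
        # Run the input through embedding, encoder, decoder, target in turn.
        for stage in stages:
            xs = stage(xs)
        return xs

    return model


if __name__ == "__main__":
    model = build_model("double", "reverse", "none", "sum")
    print(model([1, 2, 3]))
```

Swapping a single name (for example "sum" for "max") changes one stage without touching the others, which is the kind of recombination the modular design is meant to allow.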

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.6, PyTorch >= 1.1, SentencePiece, DeepSpeed (for gigantic models), torchvision/torchaudio (for vision/audio), jieba (for whole word masking), pytorch-crf (for sequence labeling).
  • Setup: Pre-processing can be time-consuming; distributed training and DeepSpeed are supported for large-scale training.
  • Docs: Full Documentation
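
Before installing, it can help to see which of the prerequisites above are already present. The following is a minimal sketch using only the standard library. The import names are assumptions about how each package is imported (for instance, pytorch-crf is assumed to import as `torchcrf`), and the helper names `python_ok` and `missing_modules` are hypothetical. It checks only the Python version and whether a module can be found, not the PyTorch >= 1.1 version constraint.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages listed in the prerequisites.
REQUIRED = {"torch": "PyTorch", "sentencepiece": "SentencePiece"}
OPTIONAL = {
    "deepspeed": "DeepSpeed (very large models)",
    "torchvision": "torchvision (vision)",
    "torchaudio": "torchaudio (audio)",
    "jieba": "jieba (whole word masking)",
    "torchcrf": "pytorch-crf (sequence labeling)",  # assumed import name
}


def python_ok(min_version=(3, 6)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def missing_modules(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    for name in missing_modules(REQUIRED):
        print("missing (required):", REQUIRED[name])
    for name in missing_modules(OPTIONAL):
        print("missing (optional):", OPTIONAL[name])
```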

Highlighted Details

  • Reproduces SOTA results for models like BERT, GPT-2, ELMo, T5, and CLIP.
  • Supports multimodal pre-training (text, vision, audio).
  • Offers a model zoo with various pre-trained models.
  • Includes winning solutions for competitions like CLUE.
  • Supports distributed training and DeepSpeed for massive model training.

Maintenance & Community

The project is associated with Tencent and has a published ACL 2023 paper. Community links are not explicitly provided in the README.

Licensing & Compatibility

The repository states that it is released under the Apache 2.0 license, a permissive license that generally allows commercial use and linking from closed-source code.

Limitations & Caveats

The README describes broad functionality but gives no hardware guidance beyond the GPUs needed for distributed training. Some specialized functions require additional libraries such as opencv-python and editdistance.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days
