PyTorch framework for multimodal pre-training and fine-tuning
Top 35.8% on sourcepulse
TencentPretrain is a modular PyTorch framework for pre-training and fine-tuning large-scale models across multiple modalities (text, vision, audio). It aims to simplify the process of reproducing existing models like BERT and GPT-2, and to facilitate the development of new multimodal architectures, targeting researchers and engineers in NLP and computer vision.
How It Works
The framework employs a modular design, separating models into distinct components: embedding, encoder, decoder, and target layers. This allows users to easily combine various pre-implemented modules to construct custom pre-training models. It supports a wide range of pre-training objectives and downstream tasks, offering flexibility for experimentation and adaptation.
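To make the composition pattern concrete, here is a minimal PyTorch sketch of the embedding → encoder → target decomposition. All class names, signatures, and shapes below are hypothetical illustrations of the idea, not TencentPretrain's actual API.

# Conceptual sketch of the embedding -> encoder -> target decomposition.
# Class names and shapes are hypothetical, not TencentPretrain's real modules.
import torch
import torch.nn as nn

class WordEmbedding(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, token_ids):
        return self.emb(token_ids)

class TransformerEncoder(nn.Module):
    def __init__(self, hidden_size, num_heads, num_layers):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, hidden):
        return self.encoder(hidden)

class MlmTarget(nn.Module):
    # Masked-language-modeling head: project hidden states back to the vocabulary.
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, hidden, labels):
        logits = self.proj(hidden)
        return self.loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))

class PretrainModel(nn.Module):
    # Swapping any component yields a different architecture, e.g. a decoder-only
    # GPT-style model or a vision encoder in place of the text encoder.
    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding, self.encoder, self.target = embedding, encoder, target

    def forward(self, token_ids, labels):
        return self.target(self.encoder(self.embedding(token_ids)), labels)

# Assemble a toy BERT-like model and run one forward pass.
model = PretrainModel(WordEmbedding(1000, 64), TransformerEncoder(64, 4, 2), MlmTarget(64, 1000))
tokens = torch.randint(0, 1000, (2, 16))
labels = torch.full((2, 16), -100, dtype=torch.long)
labels[:, 3] = tokens[:, 3]  # pretend position 3 was masked
print(model(tokens, labels))  # scalar MLM loss

In the actual framework this composition is driven by configuration files rather than hand-written classes, which is what makes reproducing or modifying a model a matter of swapping module names.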
Quick Start & Requirements
Clone the repository, then install the Python dependencies:
pip install -r requirements.txt
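Pre-training typically follows a preprocess-then-pretrain workflow driven by the repository's scripts. The commands below are an illustrative sketch of that workflow; the corpus paths, config names, and flags are assumptions and should be checked against the project's README before use.

python3 preprocess.py --corpus_path corpora/book_review_bert.txt \
                      --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 \
                      --data_processor bert

python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt \
                    --config_path models/bert/base_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 1000 --batch_size 32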
Maintenance & Community
The project is associated with Tencent and has a published ACL 2023 paper. Community links are not explicitly provided in the README.
Licensing & Compatibility
The repository states that it is licensed under Apache 2.0, a permissive license generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The README documents extensive functionality but gives few details on hardware requirements beyond the GPUs needed for distributed training. Some specialized functions require additional libraries such as opencv-python and editdistance.