PyTorch framework for multimodal pre-training and fine-tuning
Top 35.8% on sourcepulse
TencentPretrain is a modular PyTorch framework for pre-training and fine-tuning large-scale models across multiple modalities (text, vision, audio). It aims to simplify the process of reproducing existing models like BERT and GPT-2, and to facilitate the development of new multimodal architectures, targeting researchers and engineers in NLP and computer vision.
How It Works
The framework employs a modular design, separating models into distinct components: embedding, encoder, decoder, and target layers. This allows users to easily combine various pre-implemented modules to construct custom pre-training models. It supports a wide range of pre-training objectives and downstream tasks, offering flexibility for experimentation and adaptation.
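To make the composition pattern concrete, here is a minimal PyTorch sketch of the embedding → encoder → target decomposition. All class names, signatures, and shapes below are hypothetical illustrations of the idea, not TencentPretrain's actual API.

# Conceptual sketch of the embedding -> encoder -> target decomposition.
# Class names and shapes are hypothetical, not TencentPretrain's real modules.
import torch
import torch.nn as nn

class WordEmbedding(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, token_ids):
        return self.emb(token_ids)

class TransformerEncoder(nn.Module):
    def __init__(self, hidden_size, num_heads, num_layers):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, hidden):
        return self.encoder(hidden)

class MlmTarget(nn.Module):
    # Masked-language-modeling head: project hidden states back to the vocabulary.
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, hidden, labels):
        logits = self.proj(hidden)
        return self.loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))

class PretrainModel(nn.Module):
    # Swapping any component yields a different architecture, e.g. a decoder-only
    # GPT-style model or a vision encoder in place of the text encoder.
    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding, self.encoder, self.target = embedding, encoder, target

    def forward(self, token_ids, labels):
        return self.target(self.encoder(self.embedding(token_ids)), labels)

# Assemble a toy BERT-like model and run one forward pass.
model = PretrainModel(WordEmbedding(1000, 64), TransformerEncoder(64, 4, 2), MlmTarget(64, 1000))
tokens = torch.randint(0, 1000, (2, 16))
labels = torch.full((2, 16), -100, dtype=torch.long)
labels[:, 3] = tokens[:, 3]  # pretend position 3 was masked
print(model(tokens, labels))  # scalar MLM loss

In the actual framework this composition is driven by configuration files rather than hand-written classes, which is what makes reproducing or modifying a model a matter of swapping module names.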
Quick Start & Requirements
Clone the repository, then install the Python dependencies:
pip install -r requirements.txt
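Pre-training typically follows a preprocess-then-pretrain workflow driven by the repository's scripts. The commands below are an illustrative sketch of that workflow; the corpus paths, config names, and flags are assumptions and should be checked against the project's README before use.

python3 preprocess.py --corpus_path corpora/book_review_bert.txt \
                      --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 \
                      --data_processor bert

python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt \
                    --config_path models/bert/base_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 1000 --batch_size 32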
Maintenance & Community
The project is associated with Tencent and has a published ACL 2023 paper. Community links are not explicitly provided in the README.
Licensing & Compatibility
The repository states that it is licensed under Apache 2.0, a permissive license generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The README documents extensive functionality but gives few details on hardware requirements beyond the GPUs needed for distributed training. Some specialized functions require additional libraries such as opencv-python and editdistance.