Large-scale Chinese language models and optimization techniques
This repository offers a collection of advanced pretrained language models and optimization techniques from Huawei Noah's Ark Lab, targeting NLP researchers and engineers. It provides access to state-of-the-art Chinese language models, efficient model compression techniques, and novel architectural approaches for various NLP tasks.
How It Works
The project showcases a diverse range of models, from large-scale autoregressive models such as PanGu-α (up to 200B parameters) to efficient compressed models such as TinyBERT (7.5x smaller and 9.4x faster at inference than BERT-base). It also features dynamic models (DynaBERT), byte-level tokenization tools (BBPE), and novel approaches such as probabilistically masked language models (PMLM) and weight ternarization/binarization (TernaryBERT, BinaryBERT). The models are developed across multiple frameworks, including MindSpore, TensorFlow, and PyTorch.
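Each model ships with its own training and inference scripts in its subdirectory, but for quick experimentation a compressed checkpoint such as TinyBERT can be loaded like any BERT-style encoder. The sketch below uses the Hugging Face transformers library and assumes the general-distillation TinyBERT weights are published on the Hub under the identifier huawei-noah/TinyBERT_General_4L_312D; this is an assumption for illustration, not the repository's official usage path, which lives in the per-model subdirectories.

```python
# Minimal sketch: using a compressed TinyBERT checkpoint as a drop-in
# BERT-style encoder via Hugging Face transformers.
# Assumption: the checkpoint is available on the Hub as
# "huawei-noah/TinyBERT_General_4L_312D" (check the TinyBERT subdirectory
# for the officially released weights).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "huawei-noah/TinyBERT_General_4L_312D"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

# Encode a short sentence and extract contextual features.
inputs = tokenizer(
    "TinyBERT trades a small accuracy drop for much faster inference.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Sentence-level representation: mean-pool the last hidden states.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 312]) for the 4L-312D model
```

The same pattern applies to other BERT-derived checkpoints in the collection; the frameworks and entry points differ per subproject, so consult each model's README before relying on this shortcut.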
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Last updated about 1 year ago; the repository is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
The repository contains a wide array of models with varying dependencies and development frameworks (MindSpore, TensorFlow, PyTorch), requiring users to navigate individual subdirectories for specific setup and usage instructions. The lack of a unified quick-start guide or explicit licensing information across all models may hinder rapid adoption.