ERNIE by PaddlePaddle

PaddlePaddle implementations for ERNIE family pre-training models

Created 7 years ago

7,681 stars

Top 6.7% on SourcePulse

3 Experts Love This Project

Pawel Garbacki

Cofounder of Fireworks AI

Jordan Burgess

Cofounder of Humanloop

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

This repository provides official implementations for the ERNIE family of pre-trained models, developed by Baidu. It targets researchers and developers working on Natural Language Processing (NLP) and multimodal understanding and generation tasks, offering a comprehensive suite for building and deploying advanced language models.

How It Works

ERNIE models are knowledge-enhanced large language models that integrate external knowledge into pre-training. This approach aims to improve language understanding and generation capabilities by explicitly modeling relationships between entities and concepts. The framework supports both dynamic and static graph training, allowing for flexibility in model development and deployment.
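One way to picture "integrating knowledge" is the entity- and phrase-level masking described for ERNIE 1.0. Instead of masking individual tokens, the model masks whole entities or phrases together, so it has to predict them from context. The Python sketch below illustrates that idea under stated assumptions. It is not the repository's implementation: the function name `knowledge_mask`, the half-open span format, and the default masking probability are all hypothetical.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def knowledge_mask(
    tokens: List[str],
    spans: List[Tuple[int, int]],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> List[str]:
    """Mask whole spans (entities/phrases) atomically instead of single tokens.

    Hypothetical sketch: `spans` are non-overlapping half-open [start, end)
    index ranges over `tokens`, assumed to come from an entity/phrase tagger.
    """
    rng = random.Random(seed)   # seeded RNG so the example is reproducible
    masked = list(tokens)       # copy so the caller's list is not modified
    for start, end in spans:
        # One coin flip per span: either every token in it is masked or none is.
        if rng.random() < mask_prob:
            for i in range(start, end):
                masked[i] = MASK
    return masked


tokens = ["Harry", "Potter", "was", "written", "by", "J.", "K.", "Rowling"]
spans = [(0, 2), (5, 8)]  # "Harry Potter" and "J. K. Rowling" as entity spans
print(knowledge_mask(tokens, spans, mask_prob=1.0))
```

With `mask_prob=1.0` both entities are masked in full, while "was written by" stays visible, so the model must recover the entity names from the surrounding context.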

Quick Start & Requirements

Install: git clone https://github.com/PaddlePaddle/ERNIE.git
Prerequisites: PaddlePaddle framework. Specific model versions may have additional requirements.
Setup: Download pre-trained models (e.g., sh download_ernie_3.0_base_ch.sh) and configure JSON files for training/inference.
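The setup step relies on JSON files to configure training and inference. The sketch below shows one defensive way to load such a file in Python, rejecting unknown keys and obviously invalid values. All field names (`model_dir`, `max_seq_len`, `batch_size`, `learning_rate`, `epochs`) and defaults are hypothetical; check the configuration files shipped in the repository for the keys it actually uses.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical fields for illustration; the real ERNIE configs define their own keys.
    model_dir: str
    max_seq_len: int = 128
    batch_size: int = 32
    learning_rate: float = 5e-5
    epochs: int = 3


def load_config(text: str) -> TrainConfig:
    """Parse a JSON config string into a TrainConfig, failing loudly on typos."""
    raw = json.loads(text)
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = TrainConfig(**raw)  # raises TypeError if the required model_dir is missing
    if cfg.max_seq_len <= 0 or cfg.batch_size <= 0:
        raise ValueError("max_seq_len and batch_size must be positive")
    return cfg


cfg = load_config('{"model_dir": "./ernie_model", "batch_size": 16}')
print(cfg)
```

Rejecting unknown keys catches misspelled options early, which would otherwise be silently ignored and leave a default value in place.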
Docs: ERNIE Model Introduction

Highlighted Details

Supports a wide range of NLP tasks including text classification, sequence labeling, information extraction, and text generation.
Includes multimodal models like ERNIE-ViL for vision-language understanding.
Offers data preprocessing tools for cleaning, augmentation, and format conversion.
Reports state-of-the-art results on benchmarks such as GLUE and SemEval.
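To make the preprocessing bullet concrete, here is a small Python sketch of cleaning plus format conversion: it turns tab-separated `label<TAB>text` lines into JSON Lines records. The input layout and the output schema (`label`, `text`) are hypothetical assumptions, not the format used by the repository's own tools.

```python
import json
import re
from typing import Iterable, Iterator

_WS = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return _WS.sub(" ", text).strip()


def tsv_to_jsonl(lines: Iterable[str]) -> Iterator[str]:
    """Convert 'label<TAB>text' lines into JSON Lines records (hypothetical schema)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        label, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"missing tab separator: {line!r}")
        text = clean_text(text)
        if text:  # drop records whose text is empty after cleaning
            # ensure_ascii=False keeps Chinese and other non-ASCII text readable
            yield json.dumps({"label": label.strip(), "text": text}, ensure_ascii=False)


for record in tsv_to_jsonl(["1\t  hello   world \n", "\n", "0\t你好\n"]):
    print(record)
```

Generating records lazily keeps memory use flat, so the same function works on a large corpus read line by line from a file handle.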

Maintenance & Community

The project is actively maintained by Baidu and has seen contributions from numerous researchers. Information on roadmaps and community channels is available within the repository.

Licensing & Compatibility

The provided README does not explicitly state the repository's license, although the project belongs to the PaddlePaddle ecosystem, whose framework is Apache 2.0-licensed. Verify the repository's own license before commercial use.

Limitations & Caveats

Older versions of the ERNIE code have been moved to a repro branch, which suggests breaking changes or a shift in the primary development focus. The README describes the current code as a "newly upgraded dynamic-static combined ERNIE suite," so users should expect differences between versions.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

32 stars in the last 30 days