Resource for understanding emergent properties of large language models
This repository collects and categorizes phenomena observed while scaling large foundation models, with the aim of distilling them into general principles or laws. It targets researchers and engineers working with large language models, offering insights into both training methodology and emergent model properties.
How It Works
The project organizes findings into two main categories: "How" (training techniques) and "What" (model properties). On the training side, it highlights predictable scaling laws for loss, compute-optimal allocation between model size and training data, batch size considerations, and learning rate schedules (with cosine schedules favored); a minimal sketch of such a scaling law is given below. On the model-property side, it documents emergent abilities, the inverse scaling phenomenon, double descent, grokking, and the emergence of modularity and sparse activations.
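The "predictable scaling laws" entry refers to parametric fits of the form L(N, D) = E + A/N^α + B/D^β over model parameters N and training tokens D. The Python below is a minimal illustrative sketch, not code from this repository: it plugs in approximate coefficient values reported by Hoffmann et al. (2022) and derives a compute-optimal N/D split under the common C ≈ 6ND approximation. The coefficients, budget, and function names are assumptions made here for illustration.

```python
# Minimal sketch of a parametric loss scaling law and compute-optimal allocation.
# Coefficients are approximate values reported by Hoffmann et al. (2022)
# ("Chinchilla") and are used purely for illustration.

E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted constants (approximate)
ALPHA, BETA = 0.34, 0.28       # exponents for parameters (N) and tokens (D)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Closed-form minimizer of L(N, D) subject to C ~= 6 * N * D."""
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (compute_flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = (compute_flops / 6.0) ** (ALPHA / (ALPHA + BETA)) / g
    return n_opt, d_opt

if __name__ == "__main__":
    C = 1e24  # hypothetical training budget in FLOPs
    n_opt, d_opt = compute_optimal_split(C)
    print(f"N ~= {n_opt:.2e} params, D ~= {d_opt:.2e} tokens, "
          f"predicted loss ~= {predicted_loss(n_opt, d_opt):.3f}")
```

The sketch captures the qualitative point the repository catalogs: pretraining loss falls smoothly and predictably with compute, even while individual downstream abilities can appear abruptly ("emergence").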
Quick Start & Requirements
This is a curated collection of research findings and does not involve direct code execution or installation. It serves as a knowledge base.
Highlighted Details
Maintenance & Community
This is a community-driven effort to collect and synthesize knowledge; the README does not describe contribution guidelines or name specific contributors. The repository was last updated roughly two years ago and currently appears inactive.
Licensing & Compatibility
The repository content is presented for informational purposes. Licensing for the collected research papers and data is not specified, but the project itself appears to use a permissive license that allows broad use and contribution.
Limitations & Caveats
The repository is a work in progress: many of the listed phenomena are still under investigation and lack definitive consensus. Some findings, such as quantitative verification of how code data contributes to reasoning ability, remain open questions.