Resource for understanding emergent properties of large language models
This repository collects and categorizes phenomena observed while scaling large foundation models, with the aim of distilling them into general principles or laws. It targets researchers and engineers working with large language models, offering insights into both training methodology and emergent model properties.
How It Works
The project organizes findings into two main categories: "How" (training techniques) and "What" (model properties). On the training side, it highlights predictable scaling laws for loss, compute-optimal allocation between model size and training data, batch size considerations, and learning rate schedules (with cosine schedules favored); a minimal sketch of such a scaling law is given below. On the model-property side, it documents emergent abilities, the inverse scaling phenomenon, double descent, grokking, and the emergence of modularity and sparse activations.
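The "predictable scaling laws" entry refers to parametric fits of the form L(N, D) = E + A/N^α + B/D^β over model parameters N and training tokens D. The Python below is a minimal illustrative sketch, not code from this repository: it plugs in approximate coefficient values reported by Hoffmann et al. (2022) and derives a compute-optimal N/D split under the common C ≈ 6ND approximation. The coefficients, budget, and function names are assumptions made here for illustration.

```python
# Minimal sketch of a parametric loss scaling law and compute-optimal allocation.
# Coefficients are approximate values reported by Hoffmann et al. (2022)
# ("Chinchilla") and are used purely for illustration.

E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted constants (approximate)
ALPHA, BETA = 0.34, 0.28       # exponents for parameters (N) and tokens (D)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Closed-form minimizer of L(N, D) subject to C ~= 6 * N * D."""
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (compute_flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = (compute_flops / 6.0) ** (ALPHA / (ALPHA + BETA)) / g
    return n_opt, d_opt

if __name__ == "__main__":
    C = 1e24  # hypothetical training budget in FLOPs
    n_opt, d_opt = compute_optimal_split(C)
    print(f"N ~= {n_opt:.2e} params, D ~= {d_opt:.2e} tokens, "
          f"predicted loss ~= {predicted_loss(n_opt, d_opt):.3f}")
```

The sketch captures the qualitative point the repository catalogs: pretraining loss falls smoothly and predictably with compute, even while individual downstream abilities can appear abruptly ("emergence").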
Quick Start & Requirements
This is a curated collection of research findings and does not involve direct code execution or installation. It serves as a knowledge base.
Highlighted Details
Maintenance & Community
This is a community-driven effort to collect and synthesize knowledge; the README does not describe contribution guidelines or name specific contributors. The repository was last updated roughly two years ago and currently appears inactive.
Licensing & Compatibility
The repository content is presented for informational purposes. Licensing for the collected research papers and data is not specified, but the project itself appears to use a permissive license that allows broad use and contribution.
Limitations & Caveats
The repository is a work in progress: many of the listed phenomena are still under investigation and lack definitive consensus. Some findings, such as quantitative verification of how code data contributes to reasoning ability, remain open questions.