Curated list of open-source language models and resources
This repository is a curated list of open-source language models and associated resources for AI researchers and practitioners interested in fully reproducible LLM development. It aims to counter the trend toward proprietary models by pointing to projects that release their training code, data, and architectures, supporting scientific study and the development of truly open LMs.
How It Works
The project curates links to components of the LLM development pipeline, including pretraining data, model architectures, training code, and adaptation techniques such as instruction tuning and RLHF. It emphasizes models that open more than just their weights, prioritizing projects that release the complete pipeline for transparency and scientific rigor.
Quick Start & Requirements
This repository is a curated list and has no installation or execution command of its own. It links to external projects, each with its own requirements.
Maintenance & Community
The project is maintained by the Allen Institute for AI (Ai2) and encourages community contributions via pull requests. It was built for a NeurIPS 2024 tutorial.
Licensing & Compatibility
The repository itself is only a list of links. Licensing and compatibility vary across the linked projects and must be checked per project.
Limitations & Caveats
This is a curated list, not a unified framework. Users must navigate to individual linked projects to assess their specific features, maturity, and usability. Some linked projects may be in early stages of development or have specific hardware requirements.