Curated list of LLM practical guide resources (tree, examples, papers)
This repository provides a curated collection of practical resources for Large Language Models (LLMs), aimed at practitioners and researchers. It offers a structured overview of LLM development, applications, data, and ethical considerations, drawing from a comprehensive survey paper.
How It Works
The project organizes information into practical guides covering LLM architectures (BERT-style, GPT-style), data (pretraining, finetuning, test), NLP tasks, scaling abilities, efficiency, trustworthiness, and alignment efforts. It includes an evolutionary tree of modern LLMs and a detailed table of usage and licensing restrictions for various models and their datasets.
Quick Start & Requirements
This is a curated list of resources, not a runnable software project. No installation or execution commands are provided.
Highlighted Details
Maintenance & Community
The repository received updates over time, including new usage and restrictions sections and additional models such as AlexaTM and UniLM. It cites its companion survey paper and welcomes pull requests for refinement.
Licensing & Compatibility
The repository's own content is not explicitly licensed, and the papers and resources it links to carry varying licenses. Its "Usage and Restrictions" table lists licenses (e.g., Apache 2.0, MIT, CC BY-SA 4.0, CC BY-NC 4.0, BigScience RAIL License, TII Falcon LLM License) and commercial-use permissions for many LLMs and their datasets. Some models explicitly prohibit commercial use or impose other specific restrictions.
Limitations & Caveats
This repository is a curated list of links and information, not a functional tool. Users must independently verify the licensing and usage terms for each model and dataset referenced.
1 year ago
Inactive