tiny-llm-zh by wdndev

Small Chinese LLMs built from scratch for learning how large language models work

Created 1 year ago
820 stars

Top 43.3% on SourcePulse

View on GitHub
Project Summary

This project builds small-parameter Chinese Large Language Models (LLMs) from scratch, aimed at engineers and researchers who want to learn LLM internals quickly. It provides a complete, open-source pipeline (code and data) from tokenization through deployment, so the full development process can be followed end to end.

How It Works

The project follows a standard LLM architecture, incorporating components such as RMSNorm and RoPE. Training proceeds in stages: pre-training (PTM), instruction fine-tuning (SFT), and optional human alignment (RLHF, DPO). The implementation builds on the Hugging Face Transformers library and uses DeepSpeed for efficient multi-GPU/multi-node training, supporting several model sizes and an optional Mixture-of-Experts (MoE) architecture.
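
To make the named components concrete, here is a minimal PyTorch sketch of RMSNorm and rotary position embeddings (RoPE). These are generic textbook versions for illustration, not code taken from the repository:

    import torch

    class RMSNorm(torch.nn.Module):
        """Root-mean-square layer norm: scale-only, no mean subtraction or bias."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = torch.nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
            return self.weight * (x * rms)

    def apply_rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        """Rotary position embedding: rotate channel pairs by position-dependent angles.
        x: (..., seq_len, head_dim) with even head_dim; positions: (seq_len,)."""
        head_dim = x.shape[-1]
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
        angles = positions.float()[:, None] * inv_freq[None, :]  # (seq_len, head_dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out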

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+, Transformers 4.37.2+, CUDA 11.4+ (recommended for training).
  • Usage: load models from Hugging Face or ModelScope; the README includes example inference code for both (a hedged loading sketch follows this list).
  • Docs: ModelScope, Hugging Face
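
As a concrete example of the usage step above, here is a minimal Transformers loading sketch. The checkpoint id is an assumption for illustration; substitute whichever model the project actually publishes on Hugging Face or ModelScope:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "wdndev/tiny_llm_sft_92m"  # assumed checkpoint id; check the project's hub page
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    # Generate a short reply to a Chinese prompt ("Hello, please introduce Beijing.")
    inputs = tokenizer("你好，请介绍一下北京。", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))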

Highlighted Details

  • Offers models ranging from 16M to 1.5B parameters.
  • Supports vLLM and a modified llama.cpp for efficient inference (a vLLM sketch follows this list).
  • Includes a Mixture-of-Experts (MoE) variant.
  • Provides a complete pipeline: Tokenizer -> PTM -> SFT -> RLHF/DPO -> Evaluation -> Deployment.
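
A minimal vLLM serving sketch for the inference path mentioned above. The LLM/SamplingParams API is standard vLLM; the checkpoint id is the same assumed, illustrative one as before:

    from vllm import LLM, SamplingParams

    llm = LLM(model="wdndev/tiny_llm_sft_92m", trust_remote_code=True)  # assumed id
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    outputs = llm.generate(["你好，请介绍一下北京。"], params)
    print(outputs[0].outputs[0].text)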

Maintenance & Community

The project is maintained by wdndev. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project prioritizes demonstrating the full LLM pipeline over achieving state-of-the-art performance, resulting in lower evaluation scores and occasional generation errors. The llama.cpp deployment is a modified version and is recommended for Linux environments.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

Top 0.2% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago · Updated 2 days ago