Yi by 01-ai

Open-source bilingual LLMs trained from scratch

created 1 year ago
7,834 stars

Top 6.8% on sourcepulse

View on GitHub
Project Summary

The Yi series models are open-source large language models developed by 01.AI and trained from scratch on a 3T-token multilingual corpus. They are designed for strong language understanding, commonsense reasoning, and reading comprehension, targeting researchers, developers, and businesses seeking high-performing bilingual (English/Chinese) LLMs.

How It Works

Yi models are built on the standard Transformer architecture and follow Llama's design, but they are not Llama derivatives: they were trained from scratch. Reusing this well-proven foundation keeps Yi stable and compatible with the broader AI ecosystem. The key differentiators are 01.AI's proprietary training datasets, efficient training pipelines, and robust infrastructure, which together account for Yi's competitive performance against leading LLMs.
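A quick way to check the Llama-compatible architecture claim is to inspect the model config (a minimal sketch; the Hugging Face repo id 01-ai/Yi-6B and the printed "llama" model type are assumptions based on the project's published checkpoints):

    # pip install transformers
    from transformers import AutoConfig

    # The config declares the standard Llama architecture; Yi differs in
    # training data and pipeline, not in model structure.
    cfg = AutoConfig.from_pretrained("01-ai/Yi-6B")
    print(cfg.model_type)  # expected: "llama"
    print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads)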

Quick Start & Requirements

  • Installation: Options include pip (Python 3.10+), Docker, conda-lock, and llama.cpp for quantized models (see the inference sketch after this list).
  • Dependencies: Python 3.10+, PyTorch, Transformers, DeepSpeed (for fine-tuning), CUDA (for GPU acceleration), Docker, git-lfs.
  • Hardware: Varies by model size; e.g., Yi-6B requires ~15GB VRAM, while Yi-34B requires ~72GB VRAM. Quantized versions (4-bit, 8-bit) significantly reduce VRAM requirements.
  • Resources: Yi Cookbook, Hugging Face, ModelScope.
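A minimal inference sketch via the Transformers path, assuming the Hugging Face repo id 01-ai/Yi-6B-Chat and enough VRAM for half-precision weights (~15 GB); this is one plausible way in, not the project's only supported method:

    # pip install torch transformers accelerate
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "01-ai/Yi-6B-Chat"  # smallest chat model; Yi-34B-Chat needs ~72GB VRAM
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Chat models expect the chat template, not raw text.
    messages = [{"role": "user", "content": "What are the Yi models?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))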

Highlighted Details

  • Performance: Yi-34B-Chat ranked second on AlpacaEval (behind GPT-4 Turbo), and the base Yi-34B ranked first among open-source models on the Hugging Face Open LLM Leaderboard and C-Eval.
  • Context Window: Models like Yi-34B-200K support a 200K context window.
  • Bilingual: Trained on a 3T-token multilingual corpus, excelling in both English and Chinese.
  • Quantization: Supports GPTQ and AWQ for reduced VRAM and faster inference (see the loading sketch after this list).
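For the quantization point, a hedged sketch of loading one of the project's prequantized checkpoints (the repo id 01-ai/Yi-6B-Chat-4bits is taken from the project's model listings and is an assumption here; Transformers loads AWQ weights when autoawq is installed, and a CUDA GPU is required for the AWQ kernels):

    # pip install torch transformers autoawq
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # 4-bit AWQ checkpoint: roughly a quarter of the fp16 VRAM budget.
    model_id = "01-ai/Yi-6B-Chat-4bits"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")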

Maintenance & Community

The project is actively maintained by 01.AI. Community engagement is encouraged via Discord and WeChat. Recent updates include the Yi-1.5 series and the Yi Cookbook.

Licensing & Compatibility

The Yi series models are distributed under the Apache 2.0 license, permitting personal, academic, and commercial use. Derivative works require attribution.

Limitations & Caveats

The chat models' increased response diversity, while beneficial for creative tasks, can raise the incidence of hallucination and non-determinism. Lowering generation parameters such as temperature is recommended when more coherent outputs are needed (see the sketch below).
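Continuing from the quick-start sketch above, one way to trade diversity for coherence is to tighten the sampling parameters; the values here are illustrative starting points, not project recommendations:

    # Reuses `model`, `tokenizer`, and `input_ids` from the quick-start sketch.
    output = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,        # lower -> more focused, less hallucination-prone
        top_p=0.8,              # restrict sampling to the high-probability mass
        repetition_penalty=1.1, # discourage verbatim loops
    )
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))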

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 35 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 2 more.

ChatGLM-6B by zai-org

Bilingual dialogue language model for research

created 2 years ago, updated 1 year ago
41k stars

Top 0.1% on sourcepulse