Taiwan-LLM by MiuLab

LLM for Traditional Mandarin, tailored for Taiwanese culture

created 2 years ago
1,355 stars

Top 30.3% on sourcepulse

View on GitHub
Project Summary

This repository provides TAME (TAiwan Mixture of Experts) LLMs, specifically fine-tuned for Traditional Mandarin and Taiwanese culture. It targets researchers and developers needing robust Mandarin language capabilities, offering state-of-the-art performance on local benchmarks and supporting diverse applications like chatbots, RAG, and structured data generation.

How It Works

The project leverages the Llama-3 architecture, fine-tuning it on a large corpus of Traditional Mandarin and English data. This includes specialized knowledge from legal, manufacturing, medical, and electronics domains. The models are trained using NVIDIA NeMo and Megatron on DGX H100 systems, with inference optimized via NVIDIA TensorRT-LLM, enabling efficient deployment and high performance.

Quick Start & Requirements

  • Fine-tuning: Use Axolotl via Docker (docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest) or direct execution (accelerate launch -m axolotl.cli.train example_training_config_for_finetuning_twllm.yaml).
  • Inference:
    • Hugging Face Transformers: pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct")
    • vLLM: Start server with docker run ... vllm/vllm-openai:v0.4.0.post1 --model "yentinglin/Llama-3-Taiwan-70B-Instruct".
  • Prerequisites: NVIDIA GPUs (multiple recommended for 70B model), CUDA, Docker.
  • Resources: Training and inference of the 70B model require significant GPU memory and compute.
  • Links: Demo Site, Model Collection, Axolotl, vLLM.
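The Transformers route above can be sketched end-to-end as follows. This is a minimal sketch, not the project's official example: the chat-format message list, the Traditional-Mandarin system prompt, and the generation settings are illustrative assumptions, and running the 70B model requires multiple GPUs.

```python
# Minimal sketch of chat inference with Hugging Face Transformers.
# The system prompt and generation settings are illustrative, not the
# project's official defaults.

def build_messages(user_prompt, system_prompt="你是一個來自台灣的AI助理。"):
    """Assemble the chat-format message list the pipeline accepts."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # Heavy import kept here: loading the 70B model needs significant
    # GPU memory, so only pull in transformers when actually running it.
    from transformers import pipeline

    chat = pipeline(
        "text-generation",
        model="yentinglin/Llama-3-Taiwan-70B-Instruct",
        device_map="auto",
    )
    result = chat(build_messages("台灣最高的山是哪一座？"), max_new_tokens=128)
    print(result[0]["generated_text"])
```

Recent Transformers versions let the text-generation pipeline consume a chat-format message list directly and apply the model's chat template for you.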

Highlighted Details

  • State-of-the-art performance on Taiwanese Mandarin NLP benchmarks (TMLU, Taiwan Truthful QA, Legal Eval, TW MT-Bench).
  • Supports up to 128k context length for extended text processing.
  • Demonstrates strong capabilities in multi-turn dialogue, RAG, and function calling.
  • Models are trained on NVIDIA DGX H100 systems and optimized with TensorRT-LLM.
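The multi-turn dialogue capability can be exercised against the vLLM server from Quick Start via any OpenAI-compatible client. A hedged sketch follows: the localhost base URL, port, placeholder API key, and `extend_history` helper are all assumptions for illustration, not part of the project's API.

```python
# Sketch of multi-turn chat against a vLLM OpenAI-compatible server.
# Base URL, port, and API key are placeholders for a local deployment.

def extend_history(history, question):
    """Append a new user turn to an existing chat history (may be None)."""
    return list(history or []) + [{"role": "user", "content": question}]

if __name__ == "__main__":
    from openai import OpenAI  # client for the OpenAI-compatible endpoint

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    messages = extend_history(None, "介紹一下台灣的夜市文化")
    resp = client.chat.completions.create(
        model="yentinglin/Llama-3-Taiwan-70B-Instruct",
        messages=messages,
        max_tokens=256,
    )
    answer = resp.choices[0].message.content
    print(answer)
    # Carry the exchange forward so the next request is multi-turn:
    messages.append({"role": "assistant", "content": answer})
```

Appending each assistant reply back onto the message list is what turns single requests into a multi-turn conversation.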

Maintenance & Community

  • Developed in partnership with multiple Taiwanese institutions and companies including Pegatron, Chang Gung Memorial Hospital, and NVIDIA.
  • Actively developed, with previous Taiwan-LLM releases documented in the repository.
  • Updates are posted on Twitter/X.

Licensing & Compatibility

  • Released under the Llama-3 license.
  • The license permits commercial use, but users should review its specific terms and acceptable-use policy before deployment.

Limitations & Caveats

  • The model is provided "as-is" and is not intended for high-risk applications like medical diagnosis or legal advice. Users are responsible for evaluating output accuracy and suitability.
Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 29 stars in the last 90 days
