Taiwan-LLM by MiuLab

LLM for Traditional Mandarin, tailored for Taiwanese culture

created 2 years ago
1,355 stars

Top 30.3% on sourcepulse

View on GitHub
Project Summary

This repository provides TAME (TAiwan Mixture of Experts) LLMs, specifically fine-tuned for Traditional Mandarin and Taiwanese culture. It targets researchers and developers needing robust Mandarin language capabilities, offering state-of-the-art performance on local benchmarks and supporting diverse applications like chatbots, RAG, and structured data generation.

How It Works

The project leverages the Llama-3 architecture, fine-tuning it on a large corpus of Traditional Mandarin and English data. This includes specialized knowledge from legal, manufacturing, medical, and electronics domains. The models are trained using NVIDIA NeMo and Megatron on DGX H100 systems, with inference optimized via NVIDIA TensorRT-LLM, enabling efficient deployment and high performance.

Quick Start & Requirements

  • Fine-tuning: Use Axolotl via Docker (docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest) or direct execution (accelerate launch -m axolotl.cli.train example_training_config_for_finetuning_twllm.yaml).
  • Inference:
    • Hugging Face Transformers: pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct")
    • vLLM: Start server with docker run ... vllm/vllm-openai:v0.4.0.post1 --model "yentinglin/Llama-3-Taiwan-70B-Instruct".
  • Prerequisites: NVIDIA GPUs (multiple recommended for 70B model), CUDA, Docker.
  • Resources: Training and inference of the 70B model require significant GPU memory and compute.
  • Links: Demo Site, Model Collection, Axolotl, vLLM.
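The Transformers route above can be sketched end-to-end as follows. This is a minimal sketch, not the project's official example: the chat-format message list, the Traditional-Mandarin system prompt, and the generation settings are illustrative assumptions, and running the 70B model requires multiple GPUs.

```python
# Minimal sketch of chat inference with Hugging Face Transformers.
# The system prompt and generation settings are illustrative, not the
# project's official defaults.

def build_messages(user_prompt, system_prompt="你是一個來自台灣的AI助理。"):
    """Assemble the chat-format message list the pipeline accepts."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # Heavy import kept here: loading the 70B model needs significant
    # GPU memory, so only pull in transformers when actually running it.
    from transformers import pipeline

    chat = pipeline(
        "text-generation",
        model="yentinglin/Llama-3-Taiwan-70B-Instruct",
        device_map="auto",
    )
    result = chat(build_messages("台灣最高的山是哪一座？"), max_new_tokens=128)
    print(result[0]["generated_text"])
```

Recent Transformers versions let the text-generation pipeline consume a chat-format message list directly and apply the model's chat template for you.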

Highlighted Details

  • State-of-the-art performance on Taiwanese Mandarin NLP benchmarks (TMLU, Taiwan Truthful QA, Legal Eval, TW MT-Bench).
  • Supports up to 128k context length for extended text processing.
  • Demonstrates strong capabilities in multi-turn dialogue, RAG, and function calling.
  • Models are trained on NVIDIA DGX H100 systems and optimized with TensorRT-LLM.
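The multi-turn dialogue capability can be exercised against the vLLM server from Quick Start via any OpenAI-compatible client. A hedged sketch follows: the localhost base URL, port, placeholder API key, and `extend_history` helper are all assumptions for illustration, not part of the project's API.

```python
# Sketch of multi-turn chat against a vLLM OpenAI-compatible server.
# Base URL, port, and API key are placeholders for a local deployment.

def extend_history(history, question):
    """Append a new user turn to an existing chat history (may be None)."""
    return list(history or []) + [{"role": "user", "content": question}]

if __name__ == "__main__":
    from openai import OpenAI  # client for the OpenAI-compatible endpoint

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    messages = extend_history(None, "介紹一下台灣的夜市文化")
    resp = client.chat.completions.create(
        model="yentinglin/Llama-3-Taiwan-70B-Instruct",
        messages=messages,
        max_tokens=256,
    )
    answer = resp.choices[0].message.content
    print(answer)
    # Carry the exchange forward so the next request is multi-turn:
    messages.append({"role": "assistant", "content": answer})
```

Appending each assistant reply back onto the message list is what turns single requests into a multi-turn conversation.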

Maintenance & Community

  • Developed in partnership with multiple Taiwanese institutions and companies including Pegatron, Chang Gung Memorial Hospital, and NVIDIA.
  • Actively developed, with previous Taiwan-LLM releases documented in the repository.
  • Updates are posted on Twitter/X.

Licensing & Compatibility

  • Released under the Llama-3 license.
  • The license permits commercial use, but users should review its specific terms and acceptable-use policy before deployment.

Limitations & Caveats

  • The model is provided "as-is" and is not intended for high-risk applications like medical diagnosis or legal advice. Users are responsible for evaluating output accuracy and suitability.
Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 29 stars in the last 90 days
