SEA-LION is a family of open-source Large Language Models (LLMs) specifically designed to understand and cater to the diverse linguistic and cultural contexts of Southeast Asia. It targets researchers, developers, and organizations working with or within the region, aiming to improve representation for under-represented populations and low-resource languages.
How It Works
SEA-LION models are built by applying continued pre-training (CPT) and supervised fine-tuning (SFT) to foundation models such as Llama 3.1 and Gemma 2. This approach leverages proven existing architectures while adapting them to the linguistic and cultural nuances of Southeast Asia; performance is evaluated with SEA-HELM, the project's custom benchmark.
Quick Start & Requirements
- Models are available via Hugging Face (links not provided in README).
- Requires standard LLM inference hardware (GPU recommended).
- Specific model variants may inherit licensing restrictions from their base models (e.g., Llama 3.1, Gemma 2).
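A minimal inference sketch with Hugging Face `transformers` is below. The model id is an assumption for illustration; check the official SEA-LION model cards on Hugging Face for exact names, and note that instruct variants expect chat-formatted prompts.

```python
# Minimal sketch: running a SEA-LION instruct model via Hugging Face transformers.
# MODEL_ID is an assumed example id -- verify against the official model cards.
MODEL_ID = "aisingapore/Llama-SEA-LION-v3-8B-IT"


def build_chat(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format used by apply_chat_template."""
    return [{"role": "user", "content": prompt}]


if __name__ == "__main__":
    # Imported lazily so the helper above stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer  # pip install transformers

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer.apply_chat_template(
        build_chat("Terjemahkan ke Bahasa Inggris: Selamat pagi."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the hardware note above suggests, a GPU is recommended; `device_map="auto"` lets `transformers` place the weights on available devices.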
Highlighted Details
- Offers multiple model sizes (3B to 70B) and context lengths (up to 128K).
- v3.5 models are optimized for reasoning tasks.
- Evaluated using SEA-HELM, a custom benchmark focusing on English performance, SEA chat proficiency, instruction-following, and linguistic tasks.
- Models are available in Base, Instruct, and GGUF formats.
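For the GGUF format, one common route is local inference with llama.cpp. The sketch below assumes you have already downloaded a quantized GGUF file from a SEA-LION model page; the file name is hypothetical.

```shell
# Hypothetical file name -- download the actual GGUF from the model's
# Hugging Face page, then run it with llama.cpp's CLI:
./llama-cli -m gemma2-9b-cpt-sea-lionv3-instruct-Q4_K_M.gguf \
  -p "Apa ibu kota Indonesia?" \
  -n 128
```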
Maintenance & Community
- Anchored by AI Singapore's Products Pillar.
- Welcomes community contributions for bug reporting, documentation, evaluation tasks, and model training.
- Contact via GitHub issues or an inquiry form.
Licensing & Compatibility
- Primarily licensed under MIT, but the exact terms for each model depend on its base model.
- Llama-based variants may be subject to the Llama 3.1 Community License, which can restrict commercial use; Gemma-based variants may have different terms. Users must check the individual model cards.
Limitations & Caveats
- Commercial use restrictions may apply depending on the base model. Users must verify licensing for each specific SEA-LION model.