LLM Zoo: data, models, and evaluation benchmarks for large language models
Top 16.6% on sourcepulse
LLM Zoo provides data, models, and evaluation benchmarks for large language models, aiming to democratize ChatGPT-like capabilities across languages. It targets researchers and developers seeking to build, evaluate, and deploy multilingual instruction-following and conversational LLMs. The project offers open-source models and training code, enabling replication and customization.
How It Works
LLM Zoo's models, Phoenix and Chimera, are trained on a combination of instruction data (self-instructed/translated and user-centered) and user-shared conversation data. This dual-data approach aims to imbue models with both instruction adherence and conversational fluency, addressing limitations of models trained on only one data type. Phoenix is a multilingual model based on BLOOMZ, while Chimera is based on LLaMA and focuses on Latin and Cyrillic languages.
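The dual-data idea above can be sketched in code: both data types are normalized into one chat-style training format. This is an illustrative sketch only, not the project's actual pipeline; the field names (`instruction`, `output`, `from`, `value`) are assumptions modeled on common instruction-tuning datasets.

```python
# Sketch (assumed schema, not LLM Zoo's actual code): normalize both
# instruction data and user-shared conversations into chat turns.

def to_chat(instruction_example=None, conversation=None):
    """Return a list of {role, content} turns from either data type."""
    if instruction_example is not None:
        # Instruction data: a single prompt/response pair becomes two turns.
        return [
            {"role": "human", "content": instruction_example["instruction"]},
            {"role": "assistant", "content": instruction_example["output"]},
        ]
    # User-shared conversation data: already multi-turn, just relabel keys.
    return [{"role": t["from"], "content": t["value"]} for t in conversation]

# Hypothetical examples of each data type:
instruction = {"instruction": "Translate 'hello' to French.", "output": "Bonjour."}
conversation = [
    {"from": "human", "value": "Hi!"},
    {"from": "assistant", "value": "Hello, how can I help?"},
]

training_set = [
    to_chat(instruction_example=instruction),
    to_chat(conversation=conversation),
]
```

Mixing both sources into one format lets a single fine-tuning run teach instruction adherence and multi-turn fluency together.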
Quick Start & Requirements
Install the dependencies, then launch the CLI demo:

pip install -r requirements.txt
python -m llmzoo.deploy.cli --model-path <model_name_or_path>

For example, pass FreedomIntelligence/phoenix-inst-chat-7b as the model path.
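Behind a CLI like this, multi-turn chat typically works by flattening the conversation history into a single prompt. The helper below is a minimal sketch of that idea; the "Human:"/"Assistant:" template is an assumption for illustration, not the project's documented prompt format.

```python
# Sketch of multi-turn prompt assembly (template is an assumption,
# not LLM Zoo's documented format).

def build_prompt(history, user_message):
    """Flatten (human, assistant) turn pairs plus a new message into a prompt."""
    lines = []
    for human, assistant in history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"Human: {user_message}")
    lines.append("Assistant:")  # the model continues from here
    return "\n".join(lines)

prompt = build_prompt([("Hi!", "Hello, how can I help?")], "What is LLM Zoo?")
```

The trailing "Assistant:" marker prompts the model to generate the next reply in role.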
Highlighted Details
Maintenance & Community
The project is primarily contributed by researchers from The Chinese University of Hong Kong, Shenzhen. Contributions are welcomed via the GitHub repository.
Licensing & Compatibility
Models are released under a license that permits use and modification. However, Chimera models require original LLaMA weights, which have their own licensing terms.
Limitations & Caveats
The project acknowledges limitations common to LLMs, including lack of common sense, limited knowledge domains, potential biases inherited from training data, and difficulties in understanding emotions or nuanced context. Benchmarking is noted as a challenging task.