Bilingual LLaMA enhances reasoning
BiLLa is an open-source, bilingual (Chinese/English) LLaMA model designed to enhance reasoning capabilities while preserving English performance. It targets researchers and developers seeking improved Chinese understanding and task-solving logic in LLMs, offering a foundation for more capable AI applications.
How It Works
BiLLa is trained in three stages. First, it is pre-trained on a mix of Chinese (Wudao), English (the Pile), and translation corpora to expand its Chinese vocabulary and understanding. The second stage adds task data (math, reading comprehension, code generation, etc.) with ChatGPT-generated explanations, aiming to strengthen the model's grasp of problem-solving logic. The final stage fine-tunes the model on conversational versions of the task data plus additional instruction datasets (such as Alpaca and Dolly 2.0). All three stages update the full set of parameters.
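As a rough illustration of the second stage, a training sample might pair a task input with its ChatGPT-generated explanation ahead of the final answer, so the loss covers the solution steps rather than just the result. The field names and serialization below are invented for illustration, not the repo's actual data format:

# Hypothetical stage-2 sample (field names invented for illustration).
sample = {
    "question": "Tom has 3 apples and buys 5 more. How many does he have?",
    "explanation": "He starts with 3 and adds 5, so 3 + 5 = 8.",  # generated by ChatGPT
    "answer": "8",
}
# One plausible serialization into a single training string:
text = f"{sample['question']}\n{sample['explanation']}\nAnswer: {sample['answer']}"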
Quick Start & Requirements
Run embedding_convert.py to merge the BiLLa weights with the original LLaMA weights:
python3 embedding_convert.py \
--model_dir /path_to/BiLLa-7B-SFT \
--meta_llama_pth_file /path_to/LLaMA/llama-7b/consolidated.00.pth
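After conversion, the model can be loaded with Hugging Face transformers. A minimal loading sketch, assuming embedding_convert.py writes the merged weights back into the model directory (paths are placeholders):

# Load the merged BiLLa weights (path is a placeholder).
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("/path_to/BiLLa-7B-SFT")
model = LlamaForCausalLM.from_pretrained(
    "/path_to/BiLLa-7B-SFT",
    torch_dtype=torch.float16,  # half precision to fit a single GPU
    device_map="auto",          # requires the `accelerate` package
)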
See eval_codes/get_model_answer.py for usage examples.
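A hedged inference sketch, reusing the model and tokenizer loaded above; the Human/Assistant prompt template here is an assumption and should be verified against get_model_answer.py:

# Assumed prompt template -- check eval_codes/get_model_answer.py for the real one.
prompt = "Human: Explain quicksort in one sentence.\nAssistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))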
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
BiLLa has not undergone RLHF, which may limit how well it generalizes. Its focus on reasoning may come at the expense of general knowledge, common sense, and recall of recent information. Because the training data contains little multi-turn dialogue, multi-turn conversation may be weak, and the model can still generate harmful content.