Open-source PaLM implementation for language model research
This repository provides an open-source implementation of Google's PaLM models, targeting researchers and developers interested in large language models. It offers pre-trained models of various sizes (150M to 2.1B parameters) with 8k context length, enabling efficient inference and fine-tuning for custom applications.
How It Works
The implementation leverages several techniques for performance and efficiency. It uses Flash Attention for faster, more memory-efficient attention; xPos rotary embeddings for improved length extrapolation; and multi-query attention (a single shared key-value head) for more efficient decoding. The models are trained with decoupled weight decay (AdamW), with StableAdamW as an option, and ship with distributed training scripts compatible with Accelerate and Slurm.
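As a minimal PyTorch sketch of the multi-query attention pattern described above: several query heads share one key-value head, which shrinks the key-value cache during decoding. The class name, dimensions, and use of scaled_dot_product_attention below are illustrative assumptions, not code taken from the repository; scaled_dot_product_attention dispatches to a Flash Attention kernel when one is available.

```python
# Illustrative sketch only (not the repository's implementation):
# multi-query attention, where all query heads share a single key/value head.
import torch
import torch.nn.functional as F
from torch import nn

class MultiQueryAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.head_dim = dim // heads
        self.to_q = nn.Linear(dim, dim, bias=False)                  # one query projection per head
        self.to_kv = nn.Linear(dim, 2 * self.head_dim, bias=False)   # single shared key and value head
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.heads, self.head_dim).transpose(1, 2)  # (b, heads, n, d)
        k, v = self.to_kv(x).chunk(2, dim=-1)                                   # (b, n, d) each
        # Broadcast the single K/V head across all query heads (no extra memory is copied).
        k = k.unsqueeze(1).expand(b, self.heads, n, self.head_dim)
        v = v.unsqueeze(1).expand(b, self.heads, n, self.head_dim)
        # scaled_dot_product_attention uses a Flash Attention kernel when available.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, n, self.heads * self.head_dim)
        return self.to_out(out)

if __name__ == "__main__":
    attn = MultiQueryAttention(dim=512, heads=8)
    y = attn(torch.randn(2, 128, 512))   # -> (2, 128, 512)
```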
Quick Start & Requirements
Install dependencies with pip3 install -r requirements.txt; requirements include accelerate and deepspeed. An A100 GPU is recommended for reduced-precision (dtype) inference. Load a pre-trained model with torch.hub.load("conceptofmind/PaLM", "palm_410m_8k_v0").cuda(), or load checkpoints directly. Generate text from the command line with python3 inference.py "Your prompt"; a hedged loading sketch follows.
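The sketch below strings the quick-start pieces together. The torch.hub entry point comes from the text above; the tokenizer choice (EleutherAI/gpt-neox-20b via Hugging Face transformers) and the assumption that the model returns next-token logits are illustrative guesses, not documented repository behavior.

```python
# Hedged loading sketch; tokenizer and output shape are assumptions, not repo facts.
import torch
from transformers import AutoTokenizer  # assumed tokenizer source for illustration

model = torch.hub.load("conceptofmind/PaLM", "palm_410m_8k_v0").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # assumption

prompt = "My dog is very cute"
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

with torch.no_grad():
    logits = model(tokens)            # assumed shape: (batch, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()  # greedy pick of the next token
print(tokenizer.decode([next_id.item()]))
```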
Highlighted Details
Supports torch.compile(), Flash Attention, and Hidet for performance; a hedged example follows.
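A short sketch of wrapping the loaded model with torch.compile(). The Hidet line assumes the hidet package is installed and registers a torch.compile backend; the dummy input and vocabulary size are placeholders, not repository values.

```python
# Hedged sketch: compiling the model for faster inference.
import torch

model = torch.hub.load("conceptofmind/PaLM", "palm_410m_8k_v0").cuda().eval()

compiled = torch.compile(model)                     # default Inductor backend
# compiled = torch.compile(model, backend="hidet")  # optional Hidet backend, if installed

x = torch.randint(0, 50_000, (1, 128), device="cuda")  # dummy token ids; vocab size assumed
with torch.no_grad():
    logits = compiled(x)
```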
Maintenance & Community
The project acknowledges contributions from CarperAI, Stability.ai, and Phil Wang (lucidrains). Hugging Face integration is in progress.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The released models are described as baseline versions, with further training planned. Hugging Face integration is a work in progress. Specific hardware (an A100 GPU) is recommended for certain inference features.