BLOOMChat by SambaNova

Training and inference code for BLOOMChat, a 176B multilingual chat model

created 2 years ago
584 stars

Top 56.3% on sourcepulse

View on GitHub
Project Summary

BLOOMChat provides the code and methodology for instruction-tuning the 176 billion parameter BLOOM model into a multilingual conversational AI. It is targeted at researchers and developers interested in replicating or adapting large-scale chat model training and deployment. The project offers a path to a powerful, open-source conversational agent.

How It Works

BLOOMChat is instruction-tuned from the base BLOOM model on a curated mix of conversational datasets, including OpenChatKit's OIG, Dolly 2.0, and OASST1. The training run, detailed in the training directory, was performed on SambaNova DataScale systems built around the company's proprietary Reconfigurable Dataflow Unit (RDU). While the training code is specific to that hardware, the inference code is adapted for standard GPU setups via Hugging Face's transformers-bloom-inference repository.
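The tuned model follows a simple alternating-turn prompt format. The helper below sketches it; the <human>:/<bot>: markers follow the project's published examples, but treat the exact formatting as an assumption to verify against the README:

```python
# Minimal sketch of a BLOOMChat-style prompt builder. The "<human>:" /
# "<bot>:" turn markers follow the project's published examples; the
# exact spacing and newlines are assumptions to verify against the README.
def build_prompt(turns):
    """turns: list of (speaker, text) pairs, speaker in {"human", "bot"}."""
    parts = [f"<{speaker}>: {text}" for speaker, text in turns]
    parts.append("<bot>:")  # cue the model to produce the next reply
    return "\n".join(parts)

print(build_prompt([("human", "What is the capital of France?")]))
# <human>: What is the capital of France?
# <bot>:
```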

Quick Start & Requirements

  • Inference Setup: Clone huggingface/transformers-bloom-inference, install dependencies via pipenv, and modify specific files (hf_accelerate.py, cli.py) as per the README.
  • Prerequisites: Python 3.9, pipenv, deepspeed (for inference), and multiple A100 GPUs (80GB recommended) for efficient inference.
  • Inference Commands: provided for bf16 and int8 precision, with and without sampling; see the README for the exact commands, and the hedged loading sketch after this list.
  • Resources: inference requires significant GPU memory; at bf16, the 176B weights alone occupy roughly 2 bytes × 176B ≈ 352 GB, i.e. at least five 80GB A100s just to hold the model. Training was conducted on SambaNova's RDUs.
  • Links: HF Hosting, Blog Post, Discord
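
For orientation, here is a minimal loading sketch using plain transformers/accelerate rather than the repo's transformers-bloom-inference server; the model ID sambanovasystems/BLOOMChat-176B-v1 and the int8 path are assumptions based on the Hugging Face hosting, not commands from this README:

```python
# Hedged sketch: loading BLOOMChat in bf16 or int8 with Hugging Face
# transformers + accelerate (NOT the repo's transformers-bloom-inference
# server). The model ID below is an assumption based on the HF hosting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sambanovasystems/BLOOMChat-176B-v1"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# bf16: ~352 GB of weights, sharded across all visible GPUs by accelerate.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# int8 alternative (roughly halves weight memory, but the README notes
# quality is worse than bf16); requires the bitsandbytes package:
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, load_in_8bit=True, device_map="auto")
```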

Highlighted Details

  • 176 billion parameter multilingual chat model.
  • Instruction-tuned on OIG, Dolly 2.0, and OASST1 datasets.
  • Inference code adapted for Hugging Face Accelerate and transformers-bloom-inference.
  • Supports multiple languages for conversation, QA, and generative answers (see the sketch after this list).
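
Because the instruction tuning is multilingual, the same turn format works across languages. A hedged end-to-end sketch (the model ID, sampling settings, and Spanish example are illustrative assumptions, not taken from this README):

```python
# Hedged end-to-end sketch: multilingual chat with BLOOMChat via plain
# transformers. Model ID and sampling settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sambanovasystems/BLOOMChat-176B-v1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

# Same <human>:/<bot>: framing, non-English input (Spanish here).
prompt = "<human>: ¿Cuál es la capital de Francia?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64,
                        do_sample=True, temperature=0.8, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```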

Maintenance & Community

The project acknowledges contributions from SambaNova Systems and Together Computer. Further details on community engagement or roadmap are not explicitly provided in the README.

Licensing & Compatibility

The model weights are available via Hugging Face. The code repository itself does not explicitly state a license, though it is associated with SambaNova Systems. Commercial use or closed-source integration would require confirming the licensing terms of both the code and the model weights.

Limitations & Caveats

The training code targets SambaNova's RDU hardware and is not directly usable on standard GPUs. The README notes that the training data mix may not be exactly reproducible with the current OIG dataset from OpenChatKit, with updates promised. Inference with int8 quantization is noted as producing worse results than bf16.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
