StableLM by Stability-AI

Language models by Stability AI

Created 2 years ago
15,794 stars

Top 3.1% on SourcePulse

Project Summary

This repository provides Stability AI's StableLM series of language models, offering a range of sizes and fine-tuned variants for various applications. It targets researchers and developers looking for open-source LLMs, with models such as StableLM-3B-4E1T and StableLM-Alpha v2 that aim for performance competitive with established models.

How It Works

StableLM models are decoder-only transformers, largely based on the LLaMA architecture. Key modifications include Rotary Position Embeddings applied to the first 25% of head embedding dimensions for improved throughput and LayerNorm with learned bias terms instead of RMSNorm. The models are trained on large, filtered datasets including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder, with specific versions trained on up to 4 trillion tokens across multiple epochs to study the impact of repeated data.
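The partial-rotary scheme described above can be sketched as follows. This is an illustrative NumPy version under assumed conventions (interleaved dimension pairs, a `(batch, seq_len, n_heads, head_dim)` layout), not StableLM's actual implementation:

```python
import numpy as np

def apply_partial_rope(x, rotary_pct=0.25, base=10000.0):
    """Rotate only the first `rotary_pct` fraction of each head's
    embedding dimensions; the remaining dimensions pass through
    unchanged. Illustrative sketch, not StableLM's real code.
    x: (batch, seq_len, n_heads, head_dim)."""
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rotary_pct)           # e.g. 16 of 64 dims
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    seq_len = x.shape[1]
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    freqs = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, rot_dim/2)
    cos = np.cos(freqs)[None, :, None, :]           # broadcast over batch/heads
    sin = np.sin(freqs)[None, :, None, :]

    # Rotate interleaved (even, odd) dimension pairs by the position angle.
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = np.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], axis=-1)
    rotated = rotated.reshape(*x_rot.shape[:-1], rot_dim)
    return np.concatenate([rotated, x_pass], axis=-1)
```

Leaving 75% of the head dimensions unrotated reduces the rotation work per token, which is the throughput motivation noted above.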

Quick Start & Requirements

  • Install/Run: Load models via the Hugging Face transformers library.
  • Prerequisites: Python, PyTorch, transformers. GPU recommended for inference.
  • Demo: Hugging Face Spaces available for the 7B model.
  • Docs: Technical reports and configuration files are linked for detailed information.
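
A minimal inference sketch following the steps above, assuming the `stabilityai/stablelm-3b-4e1t` checkpoint on the Hugging Face Hub; the generation settings here are illustrative, so check the model card for recommended values:

```python
# Hedged sketch: model id and sampling parameters are assumptions,
# not the repository's prescribed configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-3b-4e1t"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # place on GPU if available (needs accelerate)
    trust_remote_code=True,  # some StableLM checkpoints ship custom code
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```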

Highlighted Details

  • StableLM-3B-4E1T achieves state-of-the-art performance at the 3B parameter scale and is competitive with many 7B models.
  • StableLM-Alpha v2 models incorporate architectural improvements like SwiGLU and use higher-quality data sources, extending context length to 4096 tokens.
  • StableVicuna-13B is an RLHF fine-tune of Vicuna-13B, intended as an open-source RLHF-trained chatbot.
  • Examples demonstrate capabilities in chit-chat, formal writing, creative writing (rap battles, stories), and humor.
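
The SwiGLU feed-forward variant mentioned above replaces the standard MLP activation with a gated SiLU; a minimal sketch with hypothetical weight matrices:

```python
import numpy as np

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: (silu(x @ W_gate) * (x @ W_up)) @ W_down.
    Illustrative sketch; weight names and shapes are assumptions."""
    def silu(z):
        return z / (1.0 + np.exp(-z))  # x * sigmoid(x)
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down
```

The gate path modulates the up-projection elementwise before the down-projection, which is the property that distinguishes SwiGLU from a plain GELU/ReLU MLP.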

Maintenance & Community

  • The project is actively updated with new checkpoints.
  • Community involvement is encouraged via Discord for contributions and ideas.

Licensing & Compatibility

  • Base models (StableLM-Base-Alpha) are under CC BY-SA-4.0.
  • Fine-tuned models (StableLM-Tuned-Alpha, StableVicuna) are under CC BY-NC-SA-4.0 (Non-Commercial).
  • All code is under Apache License 2.0.

Limitations & Caveats

  • Fine-tuned models are explicitly licensed for non-commercial use.
  • As with any pre-trained LLM, responses may vary in quality and could include offensive content, though this is expected to improve with scale and feedback.
  • StableVicuna-13B delta weights require combining with the original LLaMA model due to its license.
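
Conceptually, combining delta weights with the base model is an elementwise sum over matching tensors. The sketch below illustrates the idea with hypothetical state dicts; the official route is the conversion script referenced on the StableVicuna model card:

```python
import numpy as np

def apply_delta(base_state_dict, delta_state_dict):
    """Recover fine-tuned weights by adding each delta tensor to the
    corresponding base (LLaMA) tensor. Illustrative sketch only; the
    state-dict inputs here are hypothetical stand-ins."""
    merged = {}
    for name, delta in delta_state_dict.items():
        merged[name] = base_state_dict[name] + delta  # elementwise sum
    return merged
```

Releasing deltas instead of full weights lets the fine-tune be distributed without redistributing the original LLaMA weights themselves.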

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 30 days
