starcoder2 by bigcode-project

Code generation model family (3B, 7B, 15B) for code completion

created 1 year ago
1,947 stars

Top 23.0% on sourcepulse

View on GitHub
Project Summary

StarCoder2 is a family of large language models designed for code generation, supporting over 600 programming languages. It targets developers and researchers seeking advanced code completion and generation capabilities. The models offer improved code understanding and generation accuracy over the original StarCoder, owing to larger training data and architectural enhancements.

How It Works

StarCoder2 models use Grouped Query Attention with a 16,384-token context window and 4,096-token sliding-window attention. This architecture lets the models process longer code sequences and capture longer-range dependencies, producing more coherent and contextually relevant completions. The models are trained on over 3 trillion tokens of code and natural-language data, giving them broad coverage of programming languages and software development patterns.
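As a concrete illustration, the snippet below is a minimal sketch of loading a StarCoder2 checkpoint with transformers and generating a completion. The bigcode/starcoder2-3b model ID and the prompt are illustrative assumptions rather than details taken from this summary.

```python
# Minimal sketch: code completion with a StarCoder2 checkpoint via transformers.
# Assumes the bigcode/starcoder2-3b checkpoint ID; 7B/15B follow the same pattern.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# StarCoder2 is a base (completion) model: give it a code prefix, not an instruction.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```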

Quick Start & Requirements

  • Installation: pip install -r requirements.txt and pip install git+https://github.com/huggingface/transformers.git. Requires a Hugging Face Hub token (export HF_TOKEN=xxx).
  • Prerequisites: Python, transformers library (from source), PyTorch (with CUDA 12.1 support recommended for fine-tuning).
  • Resources: StarCoder2-15B in full precision requires ~32GB of VRAM. Quantized versions (8-bit, 4-bit) reduce the memory footprint to roughly 17GB and 9GB respectively (see the quantized-loading sketch after this list).
  • Links: Models & Datasets, Paper, Text-generation-inference.
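To put the memory figures above in context, here is a hedged sketch of loading StarCoder2-15B in 4-bit with bitsandbytes through transformers; the checkpoint ID and quantization settings are assumptions, not commands documented by the repository.

```python
# Sketch: 4-bit loading of StarCoder2-15B with bitsandbytes to fit the ~9GB
# footprint mentioned above. Requires bitsandbytes and accelerate to be installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-15b"  # assumed checkpoint ID
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers on available devices
)
```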

Highlighted Details

  • Available in 3B, 7B, and 15B parameter sizes.
  • Trained on The Stack v2 dataset, covering 600+ programming languages.
  • Supports a 16K context window with sliding-window attention.
  • Fine-tuning examples are provided using PEFT, bitsandbytes, and TRL for efficient adaptation (see the LoRA sketch after this list).
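To illustrate that fine-tuning path, the following is a minimal QLoRA-style sketch using PEFT and bitsandbytes. The checkpoint ID, LoRA hyperparameters, and target module names are assumptions and not the repository's official recipe (which also uses TRL for the training loop).

```python
# Sketch: prepare a StarCoder2 checkpoint for LoRA fine-tuning with PEFT and
# bitsandbytes (QLoRA-style). Hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

checkpoint = "bigcode/starcoder2-3b"  # assumed checkpoint ID
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Assumed attention projection names; adjust to the model's actual modules.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```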

Maintenance & Community

The project is part of the BigCode initiative, a collaboration involving Hugging Face and ServiceNow. Further resources and community discussions can be found via Hugging Face and related GitHub repositories.

Licensing & Compatibility

The models are released under the BigCode OpenRAIL-M license. This license permits commercial use but includes specific use-case restrictions to prevent misuse.

Limitations & Caveats

StarCoder2 models are base models intended for code completion and may not perform well on instruction-following tasks without further fine-tuning. The README notes that some transformers pull requests may still need to be merged for full compatibility.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 58 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%, 6k stars
Open-source code language model comparable to GPT-4 Turbo
created 1 year ago, updated 10 months ago