Code generation model for multilingual programming
Top 6.0% on sourcepulse
CodeGeeX is a 13-billion parameter, open-source, multilingual code generation model designed for tasks like code completion, translation, and summarization. It targets developers and researchers seeking to leverage large language models for programming assistance and evaluation across multiple languages.
How It Works
CodeGeeX is a transformer-based, decoder-only model trained on a code corpus of 158.7 billion tokens spanning 23 programming languages. It uses a vocabulary of 50,400 tokens and treats whitespace sequences as separate tokens. The architecture comprises 40 transformer layers with a hidden size of 5,120 and a feed-forward layer size of 20,480, and it supports a maximum sequence length of 2,048 tokens.
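As a rough sanity check (not part of the original documentation), the dimensions above are consistent with the stated 13-billion-parameter size. The sketch below estimates the total from the embedding, attention, and feed-forward weight matrices, ignoring biases, layer norms, and positional parameters, so it slightly underestimates the true count.
# Back-of-the-envelope parameter estimate from the dimensions stated above.
HIDDEN = 5_120        # hidden size
FFN = 20_480          # feed-forward inner size
LAYERS = 40           # transformer layers
VOCAB = 50_400        # vocabulary size

attention_per_layer = 4 * HIDDEN * HIDDEN   # Q, K, V and output projections
ffn_per_layer = 2 * HIDDEN * FFN            # up- and down-projection matrices
per_layer = attention_per_layer + ffn_per_layer

embedding = VOCAB * HIDDEN                  # token embedding matrix

total = LAYERS * per_layer + embedding
print(f"~{total / 1e9:.2f}B parameters")    # -> ~12.84B, i.e. roughly 13B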
Quick Start & Requirements
Install from source:
pip install -e .
Or use the provided Docker image:
docker pull codegeex/codegeex:latest
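For illustration only, the sketch below shows the usual prompt-completion loop for a decoder-only code model using the Hugging Face transformers API. The checkpoint identifier is hypothetical, and the repository ships its own inference scripts, which are the authoritative entry points; consult its README for exact usage.
# Illustrative sketch; MODEL_ID is a hypothetical identifier, not a published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "codegeex-13b"  # placeholder name for a locally available checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "# language: Python\n# write a function that reverses a string\ndef"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # greedy completion of the prompt
print(tokenizer.decode(outputs[0], skip_special_tokens=True))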
Highlighted Details
Maintenance & Community
Last recorded activity: 11 months ago; the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The model weights license is not explicitly detailed, potentially impacting commercial use. While competitive, performance can vary across language pairs for translation tasks.