codeshell by WisdomShell

Code LLM for code generation, completion, and question answering

Created 2 years ago

1,636 stars

Top 25.6% on SourcePulse


2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Jiaming Song

Chief Scientist at Luma AI

Project Summary

CodeShell is a 7B-parameter multilingual code large language model developed by PKU-KCL and the Sichuan Tianfu Bank AI Team. It targets developers who want coding assistance for code generation and code understanding tasks. The project provides a full-stack solution: models, IDE plugins, and deployment options.

How It Works

CodeShell is built on the GPT-2 architecture, with Grouped-Query Attention and RoPE (rotary) positional embeddings. It was trained on 500 billion tokens drawn from GitHub data and the Stack and StarCoder datasets, after deduplication and filtering. Its tokenizer is optimized to compress Chinese text more efficiently, and the model supports a context window of 8,192 tokens.
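To make the two architectural terms above concrete, here is a minimal pure-Python sketch of rotary position embeddings and of the head mapping used by grouped-query attention. It is an illustration of the general techniques, not CodeShell's actual implementation: the function names, the interleaved pairing of dimensions, the base of 10000, and the head counts in the usage lines are all assumptions chosen for the example.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles.

    Assumption: interleaved pairing and base 10000; real models may pair
    dimensions differently or use another base.
    """
    assert len(vec) % 2 == 0, "RoPE needs an even head dimension"
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    assert n_query_heads % n_kv_heads == 0
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

# Hypothetical sizes, for illustration only.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
# The attention score depends only on the relative offset (here 2) between positions.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
mapping = [kv_head_for(h, 8, 2) for h in range(8)]
```

The two scores agree up to floating-point error, which is the property that lets RoPE encode relative position inside the attention dot product. The mapping shows why grouped-query attention shrinks the key/value cache: eight query heads read from only two key/value heads.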

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.8+, PyTorch 2.0+, Transformers 4.32+, CUDA 11.8+ (recommended for GPU).
Usage: Models are available on Hugging Face. Code examples for generation, chat, and fill-in-the-middle are provided.
Resources: The 4-bit quantized version requires ~6GB of VRAM. The C++ version supports CPU inference.
Demos: Web UI, CLI, and OpenAI-compatible API demos are available.
Docs: CodeShell GitHub, VSCode Plugin, IntelliJ Plugin.
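As a rough sketch of the generation and fill-in-the-middle usage mentioned above, the snippet below builds a fill-in-the-middle prompt and passes it to a Hugging Face causal language model. Several things here are assumptions rather than facts taken from this page: the model identifier `WisdomShell/CodeShell-7B`, the StarCoder-style sentinel tokens `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>`, and the helper names `build_fim_prompt` and `generate`. Check the repository's own examples before relying on any of them.

```python
def build_fim_prompt(prefix, suffix):
    """Wrap the code before and after the gap in fill-in-the-middle sentinels.

    Assumption: StarCoder-style sentinel tokens; confirm against the model's tokenizer.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def generate(prompt, model_id="WisdomShell/CodeShell-7B", max_new_tokens=64):
    """Load the model and complete `prompt`. Downloads several GB on first use.

    Imports are deferred so build_fim_prompt works without torch/transformers.
    The model_id default is an assumed Hugging Face identifier.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])

# The model is asked to fill the gap between the function header and the final line.
prompt = build_fim_prompt("def print_hello_world():\n    ", "\n    print(\"Hello world!\")")
# completion = generate(prompt)  # uncomment to run; requires the model weights
```

Plain left-to-right generation uses the same `generate` helper with an ordinary prompt such as `"def merge_sort():"`.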

Highlighted Details

Reported by the project as state-of-the-art among 7B code models on the HumanEval and MBPP benchmarks.
Offers a complete ecosystem with IDE plugins (VS Code, JetBrains) for seamless integration.
Supports lightweight deployment via C++ for CPU inference, enabling use on standard PCs.
Provides 4-bit quantization for reduced memory footprint and faster inference.
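The memory saving from 4-bit quantization can be estimated with simple arithmetic on the weight storage alone. The sketch below assumes a nominal 7 billion parameters and decimal gigabytes, and the helper name `weight_memory_gb` is made up for this example. It ignores activations, the key/value cache, and quantization metadata, which is presumably why the ~6GB VRAM figure quoted for the 4-bit model is higher than the raw weight size.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 7e9  # nominal parameter count; the exact figure may differ

# Weight storage at common precisions.
estimates = {bits: weight_memory_gb(N_PARAMS, bits) for bits in (16, 8, 4)}
for bits, gb in estimates.items():
    print(f"{bits}-bit weights: {gb:.1f} GB")
```

Under these assumptions the weights take about 14 GB at 16 bits and about 3.5 GB at 4 bits, a fourfold reduction.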

Maintenance & Community

The project is developed by PKU-KCL. Community discussion and support go through GitHub issues on the main repository and the associated plugin repositories.

Licensing & Compatibility

The models are released under a custom license that permits commercial use under specific conditions: daily active users must not exceed 1 million, the entity cannot be a software or cloud service provider, and re-licensing is prohibited without permission. An application process is required for commercial use. The project also references the Apache 2.0 license.

Limitations & Caveats

Commercial use requires explicit permission through an email application process, which may introduce delays or restrictions. Benchmark results are strong, but real-world effectiveness may vary. The project lists a multi-task evaluation system as "coming soon."

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Pull Requests (30d)

Issues (30d)

Star History

2 stars in the last 30 days