CodeTF by salesforce

Transformer library for code LLMs and code intelligence tasks

  • Created 2 years ago
  • 1,477 stars
  • Top 28.4% on sourcepulse

Project Summary

CodeTF is a comprehensive Python library for code Large Language Models (Code LLMs) and code intelligence, targeting researchers and developers. It simplifies training, fine-tuning, and inference for tasks like code generation, summarization, and translation, offering a unified interface to state-of-the-art models and benchmarks.

How It Works

CodeTF leverages the HuggingFace Transformers ecosystem, providing optimized pipelines for serving pre-quantized models (int8, int16, float16) with features like weight sharding for large models. It integrates HuggingFace PEFT for efficient fine-tuning and uses tree-sitter for robust Abstract Syntax Tree (AST) parsing across 15+ programming languages, enabling detailed code attribute extraction and manipulation.
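The AST layer can be illustrated with tree-sitter directly. The sketch below uses the third-party tree_sitter_languages package for a prebuilt Python grammar rather than CodeTF's own code-utility classes, so treat it as a picture of the mechanism CodeTF wraps, not of its API.

```python
# Illustration of the tree-sitter parsing that CodeTF's code utilities build on.
# Uses the third-party tree_sitter_languages package for a prebuilt grammar;
# this is NOT CodeTF's own API.
from tree_sitter_languages import get_parser

source = b"""
def add(a, b):
    # sum two numbers
    return a + b
"""

parser = get_parser("python")   # prebuilt grammar, no manual build step
tree = parser.parse(source)

def walk(node, depth=0):
    # Print each AST node; "comment" nodes are what a comment-removal pass drops.
    flag = "  <- comment" if node.type == "comment" else ""
    print("  " * depth + node.type + flag)
    for child in node.children:
        walk(child, depth + 1)

walk(tree.root_node)
```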

Quick Start & Requirements

  • Install via pip: pip install salesforce-codetf (a first-run sketch follows this list)
  • Additional dependencies for quantization: pip install -U git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/accelerate.git
  • HuggingFace login required for some models (e.g., StarCoder): huggingface-cli login
  • Documentation and runnable examples are linked from the repository README.
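
A minimal first-run sketch, modelled on the pipeline loader shown in the project's README. The exact model_name/model_type strings and keyword arguments may differ in the current release, so cross-check against the linked examples.

```python
# First-run sketch; argument names follow the README's load_model_pipeline
# example and may differ in the current release.
from codetf.models import load_model_pipeline

model = load_model_pipeline(
    model_name="codet5",
    task="pretrained",
    model_type="plus-220M",   # assumed model identifier; see the model zoo in the docs
    is_eval=True,
    load_in_8bit=True,        # serve the pre-quantized int8 variant
    weight_sharding=False,    # enable for models too large for a single GPU
)

print(model.predict(["def print_hello_world():"]))
```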

Highlighted Details

  • Supports 10+ Code LLM architectures (CodeT5, StarCoder, CodeGen, etc.) with various sizes.
  • Offers simplified fine-tuning (~14 LOC vs. ~300 LOC) and evaluation (~14 LOC vs. ~230 LOC) pipelines; see the PEFT sketch after this list.
  • Includes utilities for code manipulation, such as AST parsing and comment removal for multiple languages.
  • Preprocesses popular benchmarks like HumanEval, MBPP, and CodeXGLUE for easy loading.
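
The fine-tuning bullet above refers to CodeTF's wrappers around HuggingFace PEFT. The sketch below shows the underlying LoRA setup with PEFT directly, not CodeTF's trainer API; the base checkpoint and hyperparameters are placeholders.

```python
# LoRA fine-tuning setup via HuggingFace PEFT, the mechanism CodeTF's
# fine-tuning pipelines wrap. Not CodeTF's trainer API; values are placeholders.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

lora = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                        # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],   # attention projections in T5-style models
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only a small fraction of weights will train
# Hand `model` to a transformers Seq2SeqTrainer (or CodeTF's pipeline) as usual.
```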

Licensing & Compatibility

  • License: Apache License Version 2.0
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

CodeTF is designed to complement HuggingFace Transformers; users needing extensive customization may prefer building from scratch. The library does not guarantee infallible code intelligence and advises users to examine models for potential inaccuracies, biases, or security risks before adoption.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

codellama by meta-llama

Inference code for CodeLlama models

  • Top 0.1% on sourcepulse, 16k stars
  • Created 1 year ago, updated 11 months ago