CodeTF by salesforce

Transformer library for code LLMs and code intelligence tasks

Created 2 years ago
1,480 stars

Top 27.8% on SourcePulse

Project Summary

CodeTF is a comprehensive Python library for code Large Language Models (Code LLMs) and code intelligence, targeting researchers and developers. It simplifies training, fine-tuning, and inference for tasks like code generation, summarization, and translation, offering a unified interface to state-of-the-art models and benchmarks.

How It Works

CodeTF leverages the HuggingFace Transformers ecosystem, providing optimized pipelines for serving pre-quantized models (int8, int16, float16) with features like weight sharding for large models. It integrates HuggingFace PEFT for efficient fine-tuning and uses tree-sitter for robust Abstract Syntax Tree (AST) parsing across 15+ programming languages, enabling detailed code attribute extraction and manipulation.
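
To make the PEFT integration concrete, the sketch below shows the kind of parameter-efficient (LoRA) fine-tuning setup CodeTF wraps. It uses plain HuggingFace transformers and peft calls rather than CodeTF's own interface, and the Salesforce/codet5p-220m checkpoint and hyperparameters are illustrative choices only.

    # LoRA fine-tuning via HuggingFace PEFT, the mechanism CodeTF builds on.
    # Not the CodeTF API; checkpoint and hyperparameters are illustrative.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import LoraConfig, TaskType, get_peft_model

    model_id = "Salesforce/codet5p-220m"   # any seq2seq code checkpoint works here
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Wrap the base model so only small low-rank adapter matrices are trained.
    lora_config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()     # typically well under 1% of weights are trainable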

Quick Start & Requirements

  • Install via pip: pip install salesforce-codetf
  • Additional dependencies for quantization: pip install -U git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/accelerate.git
  • HuggingFace login required for some models (e.g., StarCoder): huggingface-cli login
  • Documentation and examples: see the project's GitHub repository (a minimal inference sketch follows this list)
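
After installation, a first run looks roughly like the sketch below. It is modeled on the usage examples in the upstream README rather than taken from this page, so the exact names (load_model_pipeline, model_name, model_type, load_in_8bit, weight_sharding) should be checked against the installed CodeTF version.

    # Minimal inference sketch; names modeled on the upstream README and should be
    # verified against the installed CodeTF version.
    from codetf.models import load_model_pipeline

    model = load_model_pipeline(
        model_name="codet5",       # model family
        model_type="plus-220M",    # size/variant
        task="pretrained",
        is_eval=True,
        load_in_8bit=True,         # serve an int8-quantized checkpoint
        weight_sharding=False,
    )

    result = model.predict(["def print_hello_world():"])
    print(result)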

Highlighted Details

  • Supports 10+ Code LLM architectures (CodeT5, StarCoder, CodeGen, etc.) in a range of sizes.
  • Offers simplified fine-tuning (14 LOCs vs. ~300 LOCs) and evaluation (14 LOCs vs. ~230 LOCs) pipelines.
  • Includes utilities for code manipulation, such as AST parsing and comment removal for multiple languages (see the tree-sitter sketch after this list).
  • Preprocesses popular benchmarks like HumanEval, MBPP, and CodeXGLUE for easy loading.
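
The code utilities sit on top of tree-sitter. The snippet below is not CodeTF's own interface; it is a bare tree-sitter sketch (assuming the tree_sitter >= 0.23 Python bindings plus the tree-sitter-python grammar package) showing the kind of AST parsing that underlies attribute extraction and comment removal.

    # Bare tree-sitter usage, the parsing layer CodeTF's code utilities build on.
    # Assumes: pip install tree-sitter tree-sitter-python (bindings >= 0.23).
    from tree_sitter import Language, Parser
    import tree_sitter_python as tspython

    PY_LANGUAGE = Language(tspython.language())
    parser = Parser(PY_LANGUAGE)

    source = b"def add(a, b):\n    # sum two numbers\n    return a + b\n"
    tree = parser.parse(source)

    # Collect comment nodes, roughly what a comment-removal pass would locate.
    def find_comments(node, acc):
        if node.type == "comment":
            acc.append(source[node.start_byte:node.end_byte].decode())
        for child in node.children:
            find_comments(child, acc)

    comments = []
    find_comments(tree.root_node, comments)
    print(comments)   # ['# sum two numbers']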

Licensing & Compatibility

  • License: Apache License Version 2.0
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

CodeTF is designed to complement HuggingFace Transformers; users needing extensive customization may prefer building from scratch. The library does not guarantee infallible code intelligence and advises users to examine models for potential inaccuracies, biases, or security risks before adoption.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 5 more.

Explore Similar Projects

CodeXGLUE by microsoft

Benchmark for code intelligence tasks

  • Top 0.3% on SourcePulse
  • 2k stars
  • Created 5 years ago; updated 1 year ago