CodeTF by salesforce

Transformer library for code LLMs and code intelligence tasks

Created 2 years ago
1,480 stars

Top 27.8% on SourcePulse

Project Summary

CodeTF is a comprehensive Python library for code Large Language Models (Code LLMs) and code intelligence, targeting researchers and developers. It simplifies training, fine-tuning, and inference for tasks like code generation, summarization, and translation, offering a unified interface to state-of-the-art models and benchmarks.

How It Works

CodeTF leverages the HuggingFace Transformers ecosystem, providing optimized pipelines for serving pre-quantized models (int8, int16, float16) with features like weight sharding for large models. It integrates HuggingFace PEFT for efficient fine-tuning and uses tree-sitter for robust Abstract Syntax Tree (AST) parsing across 15+ programming languages, enabling detailed code attribute extraction and manipulation.
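
To make the PEFT integration concrete, the sketch below shows the kind of parameter-efficient (LoRA) fine-tuning setup CodeTF wraps. It uses plain HuggingFace transformers and peft calls rather than CodeTF's own interface, and the Salesforce/codet5p-220m checkpoint and hyperparameters are illustrative choices only.

    # LoRA fine-tuning via HuggingFace PEFT, the mechanism CodeTF builds on.
    # Not the CodeTF API; checkpoint and hyperparameters are illustrative.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import LoraConfig, TaskType, get_peft_model

    model_id = "Salesforce/codet5p-220m"   # any seq2seq code checkpoint works here
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Wrap the base model so only small low-rank adapter matrices are trained.
    lora_config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()     # typically well under 1% of weights are trainable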

Quick Start & Requirements

  • Install via pip: pip install salesforce-codetf
  • Additional dependencies for quantization: pip install -U git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/accelerate.git
  • HuggingFace login required for some models (e.g., StarCoder): huggingface-cli login
  • Documentation and examples: see the project's GitHub repository (a minimal inference sketch follows this list)
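
After installation, a first run looks roughly like the sketch below. It is modeled on the usage examples in the upstream README rather than taken from this page, so the exact names (load_model_pipeline, model_name, model_type, load_in_8bit, weight_sharding) should be checked against the installed CodeTF version.

    # Minimal inference sketch; names modeled on the upstream README and should be
    # verified against the installed CodeTF version.
    from codetf.models import load_model_pipeline

    model = load_model_pipeline(
        model_name="codet5",       # model family
        model_type="plus-220M",    # size/variant
        task="pretrained",
        is_eval=True,
        load_in_8bit=True,         # serve an int8-quantized checkpoint
        weight_sharding=False,
    )

    result = model.predict(["def print_hello_world():"])
    print(result)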

Highlighted Details

  • Supports 10+ Code LLM architectures (CodeT5, StarCoder, CodeGen, etc.) in a range of sizes.
  • Offers simplified fine-tuning (14 LOCs vs. ~300 LOCs) and evaluation (14 LOCs vs. ~230 LOCs) pipelines.
  • Includes utilities for code manipulation, such as AST parsing and comment removal for multiple languages (see the tree-sitter sketch after this list).
  • Preprocesses popular benchmarks like HumanEval, MBPP, and CodeXGLUE for easy loading.
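
The code utilities sit on top of tree-sitter. The snippet below is not CodeTF's own interface; it is a bare tree-sitter sketch (assuming the tree_sitter >= 0.23 Python bindings plus the tree-sitter-python grammar package) showing the kind of AST parsing that underlies attribute extraction and comment removal.

    # Bare tree-sitter usage, the parsing layer CodeTF's code utilities build on.
    # Assumes: pip install tree-sitter tree-sitter-python (bindings >= 0.23).
    from tree_sitter import Language, Parser
    import tree_sitter_python as tspython

    PY_LANGUAGE = Language(tspython.language())
    parser = Parser(PY_LANGUAGE)

    source = b"def add(a, b):\n    # sum two numbers\n    return a + b\n"
    tree = parser.parse(source)

    # Collect comment nodes, roughly what a comment-removal pass would locate.
    def find_comments(node, acc):
        if node.type == "comment":
            acc.append(source[node.start_byte:node.end_byte].decode())
        for child in node.children:
            find_comments(child, acc)

    comments = []
    find_comments(tree.root_node, comments)
    print(comments)   # ['# sum two numbers']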

Licensing & Compatibility

  • License: Apache License Version 2.0
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

CodeTF is designed to complement HuggingFace Transformers; users needing extensive customization may prefer building from scratch. The library does not guarantee infallible code intelligence and advises users to examine models for potential inaccuracies, biases, or security risks before adoption.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 5 more.

Explore Similar Projects

CodeXGLUE by microsoft

Benchmark for code intelligence tasks

  • Top 0.3% on SourcePulse
  • 2k stars
  • Created 5 years ago; updated 1 year ago