CodeT5 by salesforce

Code LLMs for code understanding and generation research

Created 4 years ago

3,097 stars

Top 15.3% on SourcePulse

8 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Elvis Saravia

Founder of DAIR.AI

Beyang Liu

Cofounder of Sourcegraph

Eugene Yan

AI Scientist at AWS

and 4 more!

Project Summary

CodeT5 and CodeT5+ are open-source large language models from Salesforce Research designed for code understanding and generation. They aim to boost developer productivity by providing capabilities like text-to-code generation, code autocompletion, and code summarization, functioning as an AI-powered coding assistant.

How It Works

CodeT5 uses a unified encoder-decoder architecture based on T5, pre-trained on a large corpus of code and natural language. Its identifier-aware pre-training objectives teach the model to recognize which tokens are identifiers and to recover them when they are masked, which improves its grasp of code structure and semantics. CodeT5+ builds on this foundation with a more flexible architecture and additional training strategies, described in its research paper.
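To make the identifier-aware idea concrete, here is a toy sketch of identifier masking for Python source. It is an illustration only, not the authors' preprocessing pipeline: the function name `mask_identifiers` is hypothetical, and the sketch assumes T5-style sentinel tokens of the form `<extra_id_N>`, with one sentinel reused for every occurrence of the same identifier.

```python
import io
import keyword
import tokenize
from typing import Dict, Tuple


def mask_identifiers(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct identifier in Python source with its own sentinel.

    Hypothetical helper for illustration. Every occurrence of the same name
    maps to the same sentinel. Builtins such as `print` are treated as
    identifiers too, because the tokenizer cannot tell them apart.
    """
    sentinels: Dict[str, str] = {}
    lines = source.splitlines(keepends=True)
    spans = []  # (row, start_col, end_col, sentinel), in source order

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both keywords and identifiers; skip keywords.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            sentinel = sentinels.setdefault(
                tok.string, f"<extra_id_{len(sentinels)}>"
            )
            spans.append((tok.start[0] - 1, tok.start[1], tok.end[1], sentinel))

    # Apply replacements from last to first so earlier column offsets stay valid.
    for row, start, end, sentinel in reversed(spans):
        lines[row] = lines[row][:start] + sentinel + lines[row][end:]

    return "".join(lines), sentinels
```

Given the input `def add(a, b): return a + b`, the sketch maps `add`, `a`, and `b` to three sentinels and leaves `def` and `return` untouched. A model trained to predict the returned mapping from the masked text has to reason about where each name is defined and used, which is the intuition behind identifier-aware training.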

Quick Start & Requirements

Models are available on the Hugging Face Hub. Installation typically involves the transformers library. Specific requirements depend on model size and task, and generally include Python and PyTorch.
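As a minimal sketch of what loading a model through transformers might look like, the snippet below masks a span of code and asks the model to fill it in. The checkpoint name `Salesforce/codet5-base` and the helper names `mask_span` and `fill_span` are assumptions not stated above; check the model card of the checkpoint you choose for its recommended tokenizer and model classes.

```python
MASK = "<extra_id_0>"  # T5-style sentinel token marking the span to predict


def mask_span(code: str, start: int, end: int) -> str:
    """Replace code[start:end] with the sentinel token (hypothetical helper)."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("span must lie within the code string")
    return code[:start] + MASK + code[end:]


def fill_span(masked_code: str,
              checkpoint: str = "Salesforce/codet5-base",  # assumed checkpoint name
              max_length: int = 8) -> str:
    """Ask the model to generate the masked span.

    Requires `pip install transformers torch` and downloads the model
    weights on first use, so the import is deferred until the call.
    """
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)

    input_ids = tokenizer(masked_code, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


# Example usage (not run automatically because it downloads weights):
#   masked = mask_span("print(name)", 6, 10)   # "print(<extra_id_0>)"
#   print(fill_span(masked))
```

Larger checkpoints need more memory and may call for a GPU, so it is worth starting with a small model when experimenting.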

Highlighted Details

Supports text-to-code generation, code autocompletion, and code summarization.
CodeT5+ models released in May 2023.
CodeRL paper and associated checkpoints released in July 2022.
Models are available for various tasks, including multilingual code summarization.

Maintenance & Community

The project is maintained by Salesforce Research, although the most recent commit is about two years old (see Health Check). Users can get involved by opening GitHub issues or submitting pull requests. The maintainers encourage users to email them about applications built with the models.

Licensing & Compatibility

Released under the BSD 3-Clause License, which permits commercial use. The authors additionally ask that the software not be used to promote or profit from harmful activities, and they encourage users to document high-stakes applications.

Limitations & Caveats

The models are research releases rather than production-hardened tools. Their measured performance and known limitations are documented in the associated academic papers. For high-stakes scenarios, users are encouraged to document their applications appropriately and to share them with the maintainers.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days