shuyanzhou
Code generation via documentation retrieval
This project provides the official implementation of "DocPrompting: Generating Code by Retrieving the Docs," an approach to natural-language-to-code generation that explicitly leverages documentation. It addresses the challenge of keeping code generation models current with evolving APIs by retrieving relevant documentation before generating code. It is aimed at researchers and engineers in NLP and software engineering who want to improve the accuracy and relevance of generated code.
How It Works
DocPrompting employs a two-stage pipeline: first, it retrieves relevant documentation snippets based on a natural language intent using either dense retrieval (e.g., CodeT5 with SimCSE) or sparse retrieval (e.g., BM25). Second, a generative model (e.g., FiD T5 or CodeT5) produces code conditioned on both the original natural language intent and the retrieved documentation. This retrieval-augmented generation approach aims to ground code generation in up-to-date API specifications.
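The two stages described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the project relies on Elasticsearch for BM25 and on trained generators, whereas this sketch scores a toy in-memory list with a hand-written BM25 formula. The function names, the `DOCS` list, and the `doc:` / `intent:` / `code:` prompt layout are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use richer analyzers)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many docs each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=2):
    """Stage 1: return the top_k docs ranked by BM25 score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]


def build_prompt(intent, retrieved_docs):
    """Stage 2 input: combine retrieved docs with the intent (hypothetical layout)."""
    doc_block = "\n".join(f"doc: {d}" for d in retrieved_docs)
    return f"{doc_block}\nintent: {intent}\ncode:"


# Toy documentation pool, invented for this example.
DOCS = [
    "tar -x extract files from an archive",
    "tar -c create a new archive",
    "grep -r search recursively through directories",
]

if __name__ == "__main__":
    intent = "extract files from archive"
    print(build_prompt(intent, retrieve(intent, DOCS, top_k=2)))
```

The resulting prompt string is what a generator would be conditioned on; swapping `retrieve` for a dense retriever leaves the second stage unchanged.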
Quick Start & Requirements
Datasets (tldr, conala) and evaluation metrics are available via the Hugging Face datasets and evaluate libraries.
import datasets
tldr = datasets.load_dataset('neulab/tldr')
conala = datasets.load_dataset('neulab/docprompting-conala')
A trained model is available on the Hugging Face Hub as neulab/docprompting-tldr-gpt-neo-1.3B.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("neulab/docprompting-tldr-gpt-neo-1.3B")
model = AutoModelForCausalLM.from_pretrained("neulab/docprompting-tldr-gpt-neo-1.3B")
Requirements: the transformers library (version 3.0.2 is specifically required for FiD), datasets, and evaluate. A GPU is recommended for training and inference. Elasticsearch is needed for BM25 retrieval.
Additional resources: data (docprompting_data.zip) and generator models (docprompting_generator_models.zip).
Highlighted Details
Evaluated on NL-to-Bash (tldr) and NL-to-Python (CoNaLa) benchmarks with unseen function splits.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
FiD requires transformers version 3.0.2, which may hinder reproducibility if not precisely matched.
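One way to catch a mismatched pin before running FiD is a small environment check. The sketch below is an illustration, not part of the project: the helper names `installed_version` and `matches_pin` are hypothetical, and it assumes an exact string match is the right test for the 3.0.2 pin.

```python
from importlib import metadata

# Version pin stated for FiD in the requirements above.
REQUIRED_TRANSFORMERS = "3.0.2"


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, required=REQUIRED_TRANSFORMERS):
    """True only when the installed version equals the pin exactly."""
    return installed is not None and installed.strip() == required


if __name__ == "__main__":
    found = installed_version()
    if matches_pin(found):
        print(f"transformers {found} matches the FiD pin")
    else:
        print(f"transformers {found!r} does not match required {REQUIRED_TRANSFORMERS}")
```

Running the check in a dedicated virtual environment keeps the old pin from conflicting with newer transformers versions needed elsewhere.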