Open-source implementation of the Toolformer research paper
This repository provides an open-source implementation of Toolformer, a language model capable of learning to use external tools via APIs. It targets researchers and developers looking to enhance LLM capabilities with external functionalities like calculators, search engines, and calendars, aiming to combine the broad knowledge of LLMs with the precision of specialized tools.
How It Works
Toolformer is trained in a self-supervised manner to predict API calls and integrate their results. It learns to decide which APIs to call, what arguments to pass, and how to use the returned information when predicting subsequent tokens. This lets the model improve on tasks that require factual lookup or computation without large amounts of human-annotated supervision for each tool.
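The call-and-integrate step can be illustrated with a minimal sketch. It assumes the inline text format described in the paper, where a call is written as `[Tool(args)]` and becomes `[Tool(args) -> result]` once executed; whether this repository's checkpoints emit exactly that format is an assumption. The names `calculator`, `TOOLS`, and `fill_api_results` are hypothetical and are not part of the repository.

```python
import ast
import operator
import re

# Arithmetic operators the illustrative calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate basic arithmetic without eval(); hypothetical stand-in tool."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    value = _eval(ast.parse(expression, mode="eval").body)
    return f"{value:.2f}"


# Hypothetical registry mapping tool names in the text to Python callables.
TOOLS = {"Calculator": calculator}

# A pending call such as "[Calculator(400 / 1400)]". Arguments may not contain
# square brackets, so calls that already carry "-> result" are skipped.
API_CALL = re.compile(r"\[(\w+)\(([^\[\]]*)\)\]")


def fill_api_results(text: str) -> str:
    """Run every pending call in `text` and splice its result back in."""

    def _run(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the call untouched
        return f"[{name}({args}) -> {tool(args)}]"

    return API_CALL.sub(_run, text)


print(fill_api_results("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed."))
# Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed.
```

The result is placed inside the call itself so that the text following the closing bracket can be predicted with the tool output in context.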
Quick Start & Requirements
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the published checkpoint and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("dmayhem93/toolformer_v0_epoch2")
model = AutoModelForCausalLM.from_pretrained(
    "dmayhem93/toolformer_v0_epoch2",
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    low_cpu_mem_usage=True,
).cuda()

# Wrap the model in a text-generation pipeline on the first GPU.
generator = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=0
)
Data generation: the data_generator.py script, with each instance consuming a full GPU.
Training: train_gptj_toolformer.py (a modified run_clm.py) with DeepSpeed.
Highlighted Details
Maintenance & Community
The project is associated with the original Meta AI research paper. No specific community channels or active maintenance signals are detailed in the README.
Licensing & Compatibility
The repository itself does not explicitly state a license. The cited paper's copyright is "arXiv.org perpetual, non-exclusive license." Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Calculation and calendar tool functionalities are marked as work-in-progress with no guarantee of good results. Tool integration into the sampling process is manual, requiring user intervention to feed tool outputs back into the model.
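The manual feedback step can be wrapped in a small loop. The sketch below is not the repository's code: `generate_with_tools`, its parameters, and the `[Tool(args)]` / `[Tool(args) -> result]` text format are assumptions made for illustration. `generate` is any callable that takes the text so far and returns it with a continuation appended.

```python
import re
from typing import Callable, Dict

# A pending call such as "[Calculator(2 + 3)]" that has no "-> result" yet.
PENDING_CALL = re.compile(r"\[(\w+)\(([^\[\]]*)\)\]")


def generate_with_tools(
    generate: Callable[[str], str],
    prompt: str,
    tools: Dict[str, Callable[[str], str]],
    max_calls: int = 3,
) -> str:
    """Alternate between generation and tool execution (illustrative only)."""
    text = prompt
    for _ in range(max_calls):
        text = generate(text)
        match = PENDING_CALL.search(text)
        if match is None or match.group(1) not in tools:
            return text  # nothing left that we know how to execute
        name, args = match.group(1), match.group(2)
        result = tools[name](args)
        # Drop whatever was generated after the call, splice in the result,
        # and let the next round continue from the completed call.
        text = text[: match.start()] + f"[{name}({args}) -> {result}]"
    # Call budget exhausted: generate once more without executing anything.
    return generate(text)
```

With the pipeline from Quick Start, `generate` could plausibly be `lambda t: generator(t, max_new_tokens=64)[0]["generated_text"]`, since the text-generation pipeline returns the prompt together with the continuation by default. The `max_new_tokens` value is an arbitrary choice.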