toolformer by conceptofmind

Open-source implementation of the Toolformer research paper

Created 2 years ago

379 stars

Top 75.3% on SourcePulse

2 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Julien Chaumond

Cofounder of Hugging Face

Project Summary

This repository provides an open-source implementation of Toolformer, a language model capable of learning to use external tools via APIs. It targets researchers and developers looking to enhance LLM capabilities with external functionalities like calculators, search engines, and calendars, aiming to combine the broad knowledge of LLMs with the precision of specialized tools.

How It Works

Toolformer is trained in a self-supervised manner to predict API calls and integrate their results. It learns to decide which APIs to call, what arguments to pass, and how to use the returned information to predict subsequent tokens. This approach allows the model to improve its performance on tasks requiring factual lookup or computation without explicit fine-tuning for each tool.
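The self-supervised signal can be illustrated with a small filtering rule. The sketch below loosely follows the idea described in the Toolformer paper: a sampled API call is kept as training data only if seeing the tool's result lowers the model's loss on the following tokens by some margin. The function name, the example loss values, and the threshold are hypothetical and are not taken from this repository.

```python
def keep_api_call(
    loss_without_call: float,
    loss_call_no_result: float,
    loss_call_with_result: float,
    threshold: float = 1.0,
) -> bool:
    """Return True if the tool's result lowers the loss by at least `threshold`.

    All three losses are assumed to be measured on the same continuation tokens.
    """
    # Best loss the model reaches without access to the tool's result.
    baseline = min(loss_without_call, loss_call_no_result)
    return baseline - loss_call_with_result >= threshold


# Hypothetical candidates with made-up loss values, for illustration only.
candidates = [
    {"call": "Calculator(400 / 1400)", "no_call": 3.2, "no_result": 3.4, "with_result": 1.1},
    {"call": "Calendar()", "no_call": 2.0, "no_result": 2.1, "with_result": 1.9},
]

kept = [
    c["call"]
    for c in candidates
    if keep_api_call(c["no_call"], c["no_result"], c["with_result"])
]
print(kept)
```

In this toy run only the calculator call passes the filter, because the calendar result barely changes the loss.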

Quick Start & Requirements

Inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Checkpoint published on the Hugging Face Hub.
model_name = "dmayhem93/toolformer_v0_epoch2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in half precision and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).cuda()
# Text-generation pipeline running on the first CUDA device.
generator = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=0
)

Prerequisites: Python, PyTorch, Hugging Face Transformers, DeepSpeed. Requires a GPU with CUDA.
Tool Integration: Manual integration is required for tool outputs. A retrieval script is provided for setting up data.
Data Generation: Uses the data_generator.py script; each running instance occupies a full GPU.
Training: Utilizes train_gptj_toolformer.py (a modified run_clm.py) with DeepSpeed.
Links: Hugging Face Models
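Because tool outputs have to be fed back by hand, a small helper can take over the bookkeeping. The following is a minimal sketch, assuming the model emits calls in a bracketed form such as "[Calculator(2 + 3)" and that the result is appended as " -> result]". That marker format, the TOOLS registry, and the function names are assumptions for illustration; the checkpoint's actual call syntax should be checked against the repository before relying on this.

```python
import ast
import operator
import re

# Hypothetical format: generated text ends with an unanswered call like "[Calculator(2 + 3)".
PENDING_CALL = re.compile(r"\[(?P<tool>\w+)\((?P<args>[^()\[\]]*)\)$")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a flat arithmetic expression without calling eval()."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return str(round(evaluate(ast.parse(expression, mode="eval").body), 2))


# Illustrative registry; a retrieval or calendar tool would be added the same way.
TOOLS = {"Calculator": calculator}


def fill_pending_call(text: str) -> str:
    """Append the tool result if `text` ends with a known, unanswered call."""
    stripped = text.rstrip()
    match = PENDING_CALL.search(stripped)
    if match is None or match["tool"] not in TOOLS:
        return text
    result = TOOLS[match["tool"]](match["args"])
    return f"{stripped} -> {result}]"


print(fill_pending_call("The total is [Calculator(2 + 3)"))
```

The extended text could then be passed back to the generator as the next prompt, so the model continues with the tool result in its context.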

Highlighted Details

According to the original paper, the Toolformer approach improves zero-shot performance on a range of downstream tasks.
Supports retrieval functionality; calculation and calendar tools are in progress.
Requires manual integration of tool outputs during inference.
Data generation and training scripts are provided, leveraging DeepSpeed for distributed training.
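For the DeepSpeed side of training, a configuration file is typically passed to the script. The sketch below builds one possible configuration as a Python dict and serializes it to JSON. The keys are standard DeepSpeed options, and the "auto" values rely on the Hugging Face Trainer integration filling them in. The specific choices (ZeRO stage 2, CPU optimizer offload) are assumptions for illustration, not the repository's actual settings.

```python
import json

# Illustrative DeepSpeed configuration; values are assumptions, not the repo's settings.
ds_config = {
    # "auto" lets the Hugging Face Trainer copy these from its own arguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    # ZeRO stage 2 shards optimizer state and gradients across GPUs.
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}

# Serialize to JSON; the text can be saved to a file such as ds_config.json.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

Since train_gptj_toolformer.py is described as a modified run_clm.py, it may accept the file through the Trainer's --deepspeed argument, but that should be confirmed against the script itself.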

Maintenance & Community

The project implements the method described in the original Meta AI research paper. The README does not list any community channels or signs of active maintenance.

Licensing & Compatibility

The repository does not explicitly state a license. The cited paper is distributed under the "arXiv.org perpetual, non-exclusive license." Whether the code may be used commercially or linked into closed-source software is not specified.

Limitations & Caveats

Calculation and calendar tool functionalities are marked as work-in-progress with no guarantee of good results. Tool integration into the sampling process is manual, requiring user intervention to feed tool outputs back into the model.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days