toolformer by conceptofmind

Open-source implementation of Toolformer research paper

created 2 years ago
366 stars

Top 78.1% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides an open-source implementation of Toolformer, a language model that learns to use external tools via simple API calls. It targets researchers and developers who want to augment LLMs with tools such as calculators, search engines, and calendars, combining the broad knowledge of a language model with the precision of specialized tools.

How It Works

Toolformer is trained in a self-supervised manner to predict API calls and integrate their results: candidate calls are sampled in-context, executed, and kept as training data only when conditioning on the returned result reduces the model's loss on the subsequent tokens. The model thereby learns which APIs to call, what arguments to pass, and how to use the returned information when predicting following tokens, improving performance on tasks that require factual lookup or computation without task-specific fine-tuning for each tool.
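
For intuition, the paper encodes tool use inline as text of the form [API(args) → result]. Below is a minimal sketch of how such a call could be parsed and executed; the tool registry and regex are illustrative assumptions, not this repository's API:

    import re

    # Hypothetical tool registry; the paper's tools include a calculator,
    # retrieval/search, and a calendar.
    TOOLS = {
        "Calculator": lambda expr: str(eval(expr)),  # demo only; unsafe on untrusted input
    }

    # Calls are emitted as "[API(args)]"; after execution the result is
    # spliced back in as "[API(args) -> result]" (ASCII "->" for the arrow).
    CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

    def execute_calls(text: str) -> str:
        """Replace each inline API call with a call -> result span."""
        def run(m: re.Match) -> str:
            name, args = m.group(1), m.group(2)
            return f"[{name}({args}) -> {TOOLS[name](args)}]"
        return CALL.sub(run, text)

    print(execute_calls("Out of 1400 participants, 400 passed [Calculator(400/1400)]."))
    # Out of 1400 participants, 400 passed [Calculator(400/1400) -> 0.2857142857142857].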

Quick Start & Requirements

  • Inference (loads the fine-tuned GPT-J checkpoint from Hugging Face; requires a CUDA GPU):
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

    # Load the released Toolformer checkpoint in half precision to save memory.
    tokenizer = AutoTokenizer.from_pretrained("dmayhem93/toolformer_v0_epoch2")
    model = AutoModelForCausalLM.from_pretrained(
        "dmayhem93/toolformer_v0_epoch2",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    ).cuda()

    # Wrap the model and tokenizer in a text-generation pipeline on GPU 0.
    generator = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, device=0
    )
    # Example call (prompt is illustrative):
    print(generator("The square root of 144 is", max_new_tokens=16)[0]["generated_text"])

  • Prerequisites: Python, PyTorch, Hugging Face Transformers, DeepSpeed. Requires a GPU with CUDA.
  • Tool Integration: Tool outputs must be integrated into generation manually; a retrieval script is provided for setting up the retrieval data.
  • Data Generation: Uses the data_generator.py script; each running instance consumes a full GPU.
  • Training: Uses train_gptj_toolformer.py (a modified run_clm.py) with DeepSpeed; see the sketch after this list.
  • Links: Hugging Face Models
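
For orientation, a plausible training launch, assuming train_gptj_toolformer.py retains run_clm.py's standard Hugging Face arguments; the model name, file paths, and DeepSpeed config below are illustrative assumptions, not values from this repository:

    deepspeed train_gptj_toolformer.py \
        --deepspeed ds_config.json \
        --model_name_or_path EleutherAI/gpt-j-6B \
        --train_file data/toolformer_train.json \
        --per_device_train_batch_size 1 \
        --num_train_epochs 2 \
        --fp16 --do_train \
        --output_dir checkpoints/toolformer-gptj

DeepSpeed handles the multi-GPU launch; ZeRO and optimizer settings live in the JSON config passed via --deepspeed.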

Highlighted Details

  • The paper reports improved zero-shot performance across a range of downstream tasks, often competitive with much larger models.
  • Supports retrieval functionality; calculation and calendar tools are in progress.
  • Requires manual integration of tool outputs during inference.
  • Data generation and training scripts are provided, leveraging DeepSpeed for distributed training.

Maintenance & Community

The project is an independent implementation of the Meta AI research paper, not affiliated with the original authors. No community channels or active-maintenance signals are detailed in the README.

Licensing & Compatibility

The repository itself does not state a license. The cited paper is distributed under the arXiv.org perpetual, non-exclusive license. Suitability for commercial use or closed-source linking is therefore unspecified.

Limitations & Caveats

Calculation and calendar tools are marked as work-in-progress, with no guarantee of good results. Tool integration into the sampling process is manual: the user must execute each tool call and feed its output back into the model, as in the sketch below.
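
To make the manual step concrete, here is a minimal sketch of such a loop, reusing the generator pipeline from Quick Start. The inline call format follows the paper; the truncation and stopping logic are assumptions, not an interface this repository provides:

    def generate_with_tools(prompt: str, rounds: int = 3) -> str:
        text = prompt
        for _ in range(rounds):
            text = generator(text, max_new_tokens=32)[0]["generated_text"]
            start = text.rfind("[Calculator(")
            if start == -1 or "->" in text[start:]:
                break  # no pending tool call left to resolve
            args = text[start + len("[Calculator("):].split(")", 1)[0]
            result = eval(args)  # demo only; never eval untrusted model output
            # Truncate at the call, splice in the tool output, resume generation.
            text = text[:start] + f"[Calculator({args}) -> {result}]"
        return text

In practice one would stop generation at the closing bracket of a call (e.g., with a stopping criterion on "]"), execute the tool, and resume from the spliced text.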

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 2 more.

ToolBench by OpenBMB
Top 0.1% on sourcepulse · 5k stars
Open platform for LLM tool learning (ICLR'24 spotlight)
created 2 years ago · updated 2 months ago