ReAct by ysymyth

GPT-3 prompting code for ReAct research paper

Created 2 years ago · 2,873 stars · Top 16.9% on sourcepulse

Project Summary

This repository provides the code for ReAct prompting, a method that synergizes reasoning and acting in language models. It targets researchers and practitioners who want to improve LLM performance on complex tasks that require multi-step decision-making and interaction with external tools or environments. The primary benefit is more reliable task completion, because the model's reasoning is grounded in observations from its own actions.

How It Works

ReAct combines the strengths of Chain-of-Thought (CoT) prompting for reasoning with action generation for interacting with an environment. The model produces intermediate reasoning traces (as in CoT), takes actions based on those traces, observes the results, and iterates. This loop lets models query environments, search for information, or use tools, leading to more grounded and effective decision-making.

Quick Start & Requirements

  • Install the openai package.
  • Install alfworld following its own installation instructions (needed for the ALFWorld experiments).
  • Set the OPENAI_API_KEY environment variable.
  • Run experiments via the .ipynb notebooks (e.g., hotpotqa.ipynb); a minimal smoke test is sketched below.

Highlighted Details

  • Implements ReAct prompting for tasks including HotpotQA, FEVER, ALFWorld, and WebShop.
  • Benchmarks show GPT-3 (text-davinci-002) outperforming PaLM-540B on ALFWorld and HotpotQA (with a smaller sample size).
  • The paper was published at ICLR 2023.
  • Links to the arXiv paper for the detailed methodology.

Maintenance & Community

The project is associated with the ICLR 2023 paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. For further development and broader adoption, the README points to LangChain's zero-shot ReAct agent.

Licensing & Compatibility

The repository's license is not explicitly stated in the README, so suitability for commercial use or closed-source linking would need to be clarified with the authors.

Limitations & Caveats

The README notes that, due to dataset size, the HotpotQA and FEVER experiments use only 500 randomly sampled validation examples. Performance may therefore vary with different model versions or a full-dataset evaluation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0

Star History

  • 281 stars in the last 90 days
