reflexion by noahshinn

Language agent research paper using verbal reinforcement learning

Created 2 years ago

3,023 stars

Top 15.7% on SourcePulse

3 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Shyamal Anadkat

Research Scientist at OpenAI

Simon Willison

Coauthor of Django

Project Summary

Reflexion provides a framework for language agents that learn from their mistakes through verbal reinforcement learning, enhancing performance on complex reasoning and decision-making tasks. It is targeted at AI researchers and developers building advanced language agents.

How It Works

Reflexion agents augment standard language models with self-reflection and memory. After an attempt, the agent generates a verbal "reflection" on its errors, which is stored and supplied as context for subsequent attempts. This iterative process lets the agent learn from past failures and improve its strategy across trials, much as a person learns from mistakes.
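
The loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `run_with_reflection` and the three callables are hypothetical names, and the toy functions stand in for real language-model calls.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    attempt: Callable[[str, List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, feeding verbal reflections on failures into later attempts."""
    reflections: List[str] = []  # memory carried across trials
    answer = ""
    for _ in range(max_trials):
        answer = attempt(task, reflections)  # reflections act as extra context
        if evaluate(answer):
            break
        # On failure, store a verbal note about what went wrong.
        reflections.append(reflect(task, answer))
    return answer, reflections


# Toy stand-ins for model calls (assumptions for illustration only).
def toy_attempt(task: str, reflections: List[str]) -> str:
    # Gives the right answer only once a reflection is available.
    return "Paris" if reflections else "Lyon"


def toy_evaluate(answer: str) -> bool:
    return answer == "Paris"


def toy_reflect(task: str, answer: str) -> str:
    return f"'{answer}' was wrong for '{task}'; reconsider the capital city."


answer, memory = run_with_reflection(
    toy_attempt, toy_evaluate, toy_reflect, "Capital of France?"
)
print(answer)       # Paris
print(len(memory))  # 1
```

The model itself is never modified here; the only thing that changes between trials is the list of reflections passed back in as context.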

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt (within hotpotqa_runs or alfworld_runs directories).
Set OPENAI_API_KEY environment variable.
For decision-making tasks, edit the parameters in run_reflexion.sh, then run ./run_reflexion.sh.
For reasoning tasks, run notebooks in ./hotpotqa_runs/notebooks/.
Requires Python and OpenAI API access (GPT-4 recommended).
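
Because every run depends on the OPENAI_API_KEY environment variable, it can help to fail early when it is missing. The helper below is a hypothetical convenience, not part of the repository; it only reads the variable and raises a clear error if it is unset or empty.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    if env is None:
        env = os.environ  # default to the real process environment
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the experiments."
        )
    return key
```

Calling `get_openai_api_key()` at the top of a notebook or script surfaces a missing key before any API request is attempted.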

Highlighted Details

Implements ReAct and CoT agent types, the latter with or without supporting context.
Supports multiple reflection strategies: NONE, LAST_ATTEMPT, REFLEXION, LAST_ATTEMPT_AND_REFLEXION.
Includes pre-computed logs for reasoning (HotPotQA), decision-making (AlfWorld), and programming tasks.
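
One plausible reading of the four reflection strategies listed above is that each controls what is carried into the next attempt's prompt. The sketch below encodes that reading; the `ReflectionStrategy` enum, `build_context`, and the prompt wording are assumptions for illustration, and the repository's own code may differ.

```python
from enum import Enum
from typing import List


class ReflectionStrategy(Enum):
    # Names mirror the strategies listed in the summary; values are illustrative.
    NONE = "none"
    LAST_ATTEMPT = "last_attempt"
    REFLEXION = "reflexion"
    LAST_ATTEMPT_AND_REFLEXION = "last_attempt_and_reflexion"


def build_context(
    strategy: ReflectionStrategy, last_attempt: str, reflections: List[str]
) -> str:
    """Assemble the extra context passed to the next attempt (assumed behavior)."""
    S = ReflectionStrategy
    parts: List[str] = []
    if strategy in (S.LAST_ATTEMPT, S.LAST_ATTEMPT_AND_REFLEXION):
        parts.append("Previous attempt:\n" + last_attempt)
    if strategy in (S.REFLEXION, S.LAST_ATTEMPT_AND_REFLEXION) and reflections:
        parts.append("Reflections:\n" + "\n".join(f"- {r}" for r in reflections))
    return "\n\n".join(parts)  # empty string for NONE


print(build_context(ReflectionStrategy.REFLEXION, "Lyon", ["Check the capital."]))
```

Under this reading, NONE is a plain retry, while the other three differ only in whether the raw previous attempt, the verbal reflections, or both are shown to the agent.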

Maintenance & Community

The accompanying paper was published at NeurIPS 2023.
Contact: noahrshinn@gmail.com.

Licensing & Compatibility

License not explicitly stated in the README.
Requires OpenAI API, which has its own terms of service and costs.

Limitations & Caveats

Rerunning experiments may be infeasible for individual developers due to GPT-4 access limitations and significant API costs. The project focuses on specific benchmarks and may require adaptation for other tasks.

Health Check

Last Commit

1 year ago

Responsiveness

1 week

Star History

40 stars in the last 30 days