ChainForge by ianarawjo

Visual environment for LLM prompt battle-testing

Created 2 years ago
2,767 stars

Top 17.2% on SourcePulse

Project Summary

ChainForge is an open-source visual programming environment designed for prompt engineering and evaluating Large Language Models (LLMs). It allows users to rapidly test and compare prompt variations, models, and settings, facilitating systematic analysis and optimization of LLM interactions. The target audience includes researchers, developers, and anyone needing to rigorously assess LLM performance.

How It Works

ChainForge uses a data flow paradigm, with a ReactFlow front end and a Flask back end, to construct LLM interaction pipelines. Users visually connect nodes representing prompts, LLM providers, and evaluation metrics. This architecture enables combinatorial testing: ChainForge takes the cross-product of prompt inputs and model configurations, so a single flow can query and compare many prompt and model combinations at once.
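The cross-product idea can be illustrated outside the tool. The following is a minimal sketch in plain Python, not ChainForge's own code: the function `build_queries`, the model names `model-a` and `model-b`, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
from itertools import product


def build_queries(template, variable_values, model_configs):
    """Expand a prompt template over every combination of variable values,
    then pair each filled prompt with every model configuration."""
    names = list(variable_values)
    queries = []
    # Cross-product over the values of each template variable.
    for combo in product(*(variable_values[name] for name in names)):
        prompt = template.format(**dict(zip(names, combo)))
        # Cross-product of filled prompts with model configurations.
        for config in model_configs:
            queries.append({"prompt": prompt, **config})
    return queries


# Hypothetical inputs: 2 phrases x 3 languages x 2 model configurations.
template = "Translate '{phrase}' into {language}."
variable_values = {
    "phrase": ["good morning", "thank you"],
    "language": ["French", "German", "Spanish"],
}
model_configs = [
    {"model": "model-a", "temperature": 0.0},
    {"model": "model-b", "temperature": 0.7},
]

queries = build_queries(template, variable_values, model_configs)
print(len(queries))  # 2 * 3 * 2 = 12 queries
```

The number of queries grows multiplicatively with each added variable or model configuration, which is worth keeping in mind when providers bill per request.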

Quick Start & Requirements

  • Install via pip: pip install chainforge
  • Run locally: chainforge serve
  • Access at: localhost:8000
  • Requires Python 3.8+
  • Supported providers include OpenAI, Anthropic, Google Gemini/PaLM2, HuggingFace, Ollama, and more.
  • Docker installation is also available.
  • Documentation: https://chainforge.ai/
  • Web Play version: https://chainforge.ai/play/
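Before opening the browser, it can help to confirm the two prerequisites listed above: a Python version of at least 3.8 and a server listening on port 8000. The following is a rough sketch using only the standard library; the helper names `python_ok` and `server_reachable` are made up for this example and are not part of ChainForge.

```python
import socket
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def python_ok(version=None):
    """Return True if the given (or current) Python version is 3.8 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def server_reachable(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Python version OK:", python_ok())
print("Server reachable on localhost:8000:", server_reachable())
```

A successful TCP connection only shows that some process is listening on the port; it does not prove that the process is ChainForge.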

Highlighted Details

  • Visual prompt engineering and LLM hypothesis testing.
  • Supports comparison across prompts, prompt parameters, and models.
  • Features include prompt permutations, model setting variations, and evaluation nodes.
  • Built-in AI features for synthetic data generation and accelerated prompt engineering.
  • Interactive response inspector with exportable data.
  • Chat turn functionality for multi-turn conversations.
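To make the evaluation step concrete, here is a small sketch of what scoring responses and comparing models might look like. It is written in plain Python under stated assumptions: the `Response` class, its fields, and the sample responses are invented for illustration, and the shape of a real ChainForge evaluator may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for one LLM response and its metadata."""
    text: str
    llm: str
    var: dict = field(default_factory=dict)  # template variables used


def evaluate(response):
    """Boolean metric: does the response contain the expected answer?"""
    expected = response.var.get("expected", "")
    return bool(expected) and expected.lower() in response.text.lower()


def pass_rate_by_model(responses):
    """Aggregate the boolean metric into a pass rate per model."""
    totals = {}
    for r in responses:
        passed, count = totals.get(r.llm, (0, 0))
        totals[r.llm] = (passed + int(evaluate(r)), count + 1)
    return {llm: passed / count for llm, (passed, count) in totals.items()}


# Made-up responses, used only to exercise the functions above.
responses = [
    Response("Bonjour", "model-a", {"expected": "bonjour"}),
    Response("Salut", "model-b", {"expected": "bonjour"}),
    Response("Merci beaucoup", "model-a", {"expected": "merci"}),
    Response("Merci", "model-b", {"expected": "merci"}),
]

print(pass_rate_by_model(responses))
```

Returning a boolean or a number from the metric keeps the result easy to aggregate and plot per model or per prompt variant.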

Maintenance & Community

Developed by Ian Arawjo and collaborators from Harvard HCI's Glassman Lab, with support from NSF grants. Open to collaborators via GitHub Issues and Pull Requests.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The web version has a limited feature set. At most 10 flows can be shared at a time, each under 5MB; exceeding that limit breaks older links. Visualization nodes currently support only numeric and boolean metrics.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 4
  • Issues (30d): 1
  • Star history: 45 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

0.1%
3k
LLM evaluation framework
Created 2 years ago
Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 2 more.

YiVal by YiVal

0.1%
2k
Prompt engineering assistant for GenAI apps
Created 2 years ago
Updated 1 year ago