auto-j by GAIR-NLP

Generative judge for evaluating LLM alignment

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

Generative Judge for Evaluating Alignment (Auto-J) is an open-source tool designed to evaluate Large Language Models (LLMs) on their alignment with human preferences. It addresses the need for reliable, interpretable LLM assessment by providing a generative judge capable of handling diverse real-world scenarios. Auto-J is beneficial for researchers and developers seeking to benchmark LLM performance and identify areas for improvement in alignment.

How It Works

Auto-J operates as a generative model trained on a broad dataset of real-world user queries and LLM responses across 58 scenarios. It offers flexibility by supporting both pairwise response comparison (determining which of two responses is superior) and single-response evaluation (providing critiques and ratings). The core innovation lies in its ability to generate detailed, natural language critiques, which enhance the transparency and reliability of the evaluation process and facilitate human involvement.
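The two protocols can be sketched in code. The template strings and the verdict parser below are simplified assumptions for illustration, not the exact prompt format shipped in the GAIR-NLP/auto-j repository.

```python
# Illustrative sketch of Auto-J's two evaluation protocols: pairwise
# comparison and single-response critique. The templates and the parser
# are assumptions, not the repository's actual prompt format.

PAIRWISE_TEMPLATE = """[Query]: {query}
[Response 1]: {response_a}
[Response 2]: {response_b}
Compare the two responses, write a critique, and end with your final decision."""

SINGLE_TEMPLATE = """[Query]: {query}
[Response]: {response}
Write a critique of the response, then give a rating from 1 to 10."""


def build_pairwise_prompt(query: str, response_a: str, response_b: str) -> str:
    """Fill the pairwise template; the judge decides which response is better."""
    return PAIRWISE_TEMPLATE.format(
        query=query, response_a=response_a, response_b=response_b
    )


def parse_pairwise_verdict(generation: str) -> str:
    """Map the judge's final decision to 'A', 'B', or 'tie'.

    Assumes the generation ends with a line such as
    'final decision is response 1' (or 2, or a tie) -- an assumed
    output convention, not a documented guarantee.
    """
    tail = generation.lower().rsplit("final decision", 1)[-1]
    if "response 1" in tail:
        return "A"
    if "response 2" in tail:
        return "B"
    return "tie"
```

The filled prompt would then be sent to the Auto-J model (e.g., served via vLLM), and the natural-language critique preceding the decision is what makes the verdict interpretable.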

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using pip install -r requirements.txt. Creating a virtual environment (e.g., with conda) is recommended.
  • Prerequisites: Python 3.10 is required. A compatible PyTorch version for your CUDA setup (e.g., torch>=2.0.1+cu118) is necessary, and GPU(s) are essential for running the models.
  • Model Access: Pre-trained models are available on Hugging Face Hub.
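The installation steps above can be collected into a short setup sequence. The repository URL follows the GitHub project name; the conda environment name and the cu118 wheel index are examples to adjust for your setup.

```shell
# Environment setup for Auto-J, following the Quick Start notes above.
# Environment name and the cu118 index are examples; match your CUDA version.
git clone https://github.com/GAIR-NLP/auto-j.git
cd auto-j

conda create -n autoj python=3.10 -y
conda activate autoj

# Install a CUDA-matched PyTorch first (cu118 shown here), then the rest.
pip install "torch>=2.0.1" --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```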

Highlighted Details

  • Auto-J demonstrates strong performance in pairwise response comparison, achieving 62.28% agreement with human preference labels, surpassing many other evaluated models.
  • In critique generation tasks, Auto-J achieves a 73.7% win rate against a reference model (ChatGPT), as judged by GPT-4.
  • A bilingual (Chinese/English) 6B model (autoj-bilingual-6b) is available for multilingual evaluation.
  • A 4-bit quantized version (autoj-13b-GPTQ-4bits) is provided, reducing VRAM requirements to approximately 8GB.

Maintenance & Community

The project acknowledges computing resource support from Shanghai AI Lab, along with external contributions to the bilingual version and the human annotation effort. It builds upon the PKU-Alignment/safe-rlhf and vllm-project/vllm projects. No explicit community channels (e.g., Discord, Slack) or a public roadmap are detailed in the provided information.

Licensing & Compatibility

  • The primary Auto-J (13B) and Auto-J-Scenario-Classifier (13B) models are licensed under Llama 2.
  • The Auto-J-Bilingual (6B) model is released under the Yi License.
  • Commercial use compatibility is contingent on the specific terms of the Llama 2 and Yi licenses.

Limitations & Caveats

The 4-bit quantized version may exhibit behavioral differences compared to the original model. The bilingual version has known issues, including occasional code-switching and limitations in mathematical and coding capabilities. Model deployment requires specific tensor_parallel_size configurations (e.g., 1, 2, 4, 8) due to the underlying vLLM implementation.
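The tensor_parallel_size constraint can be enforced before handing the value to vLLM. A minimal sketch, assuming the Hugging Face model id GAIR/autoj-13b and the standard vLLM LLM API; the check_tp_size helper is illustrative, not part of the project.

```python
# Validate tensor_parallel_size against the values the vLLM-based
# deployment supports (1, 2, 4, or 8), per the caveat above.
VALID_TP_SIZES = {1, 2, 4, 8}


def check_tp_size(n_gpus: int) -> int:
    """Raise early on an unsupported tensor_parallel_size value."""
    if n_gpus not in VALID_TP_SIZES:
        raise ValueError(
            f"tensor_parallel_size must be one of {sorted(VALID_TP_SIZES)}, got {n_gpus}"
        )
    return n_gpus


# Actual deployment requires GPUs and the model weights; shown for illustration:
# from vllm import LLM, SamplingParams
# llm = LLM(model="GAIR/autoj-13b", tensor_parallel_size=check_tp_size(2))
# out = llm.generate(["<filled prompt>"], SamplingParams(temperature=0.0, max_tokens=1024))
```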

Health Check
Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 star in the last 30 days
