arcprize
CLI tool for benchmarking LLMs on ARC-AGI tasks
This repository provides a framework for benchmarking Large Language Models (LLMs) on the ARC-AGI (Abstraction and Reasoning Corpus - Artificial General Intelligence) dataset. It enables researchers and developers to systematically evaluate and compare the performance of various LLMs across different configurations and tasks within the ARC-AGI benchmark.
How It Works
The framework uses a modular adapter system to interface with different LLM providers. Users define model configurations, including provider details, model names, and API parameters, in a models.yml file. The core execution script (main.py) takes a data directory, a model configuration, and optionally a specific task ID, and runs predictions. It supports single-task testing, batch processing with concurrency (using GNU parallel), and submission management for uploading results to Hugging Face.
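The adapter system described above can be sketched as follows. This is an illustrative, minimal sketch only: the class names (ModelConfig, ProviderAdapter), the config fields, and the registry are assumptions about how such a design typically looks, not the repository's actual API.

```python
# Hypothetical sketch of a modular provider-adapter design.
# All names here are illustrative assumptions, not the repo's real API.
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    """One model entry, as it might appear in a models.yml file."""
    name: str
    provider: str
    api_params: dict = field(default_factory=dict)


class ProviderAdapter:
    """Base interface that each LLM provider adapter implements."""
    def __init__(self, config: ModelConfig):
        self.config = config

    def predict(self, task_prompt: str) -> str:
        raise NotImplementedError


class EchoAdapter(ProviderAdapter):
    """Stand-in provider so this sketch runs offline, without API keys."""
    def predict(self, task_prompt: str) -> str:
        return f"{self.config.name}: {task_prompt}"


# Registry mapping provider names (from the config) to adapter classes.
ADAPTERS = {"echo": EchoAdapter}


def run_task(config: ModelConfig, task_prompt: str) -> str:
    """Look up the adapter for the configured provider and run one prediction."""
    adapter = ADAPTERS[config.provider](config)
    return adapter.predict(task_prompt)


if __name__ == "__main__":
    cfg = ModelConfig(name="demo-model", provider="echo")
    print(run_task(cfg, "describe the grid transformation"))
```

A real adapter would wrap a provider SDK call in predict() and pass api_params through; the registry lookup is what lets main.py stay provider-agnostic.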
Quick Start & Requirements
Clone the repository and install its dependencies:

git clone https://github.com/arcprizeorg/model_baseline.git
cd model_baseline
pip install -r requirements.txt

Requirements: git; GNU parallel (optional, for concurrent batch runs).

Highlighted Details

- Modular model configuration (models.yml) and adapter pattern for adding new LLM providers.
- test_providers.sh script for validating new provider implementations.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Submitting results to Hugging Face requires prior authentication via huggingface-cli login.