SandboxFusion by ByteDance

Secure code sandbox for LLM-generated code execution and evaluation

Created 11 months ago
597 stars

Top 54.7% on SourcePulse

Project Summary

SandboxFusion provides a secure, containerized environment for executing and evaluating code generated by Large Language Models (LLMs). It is designed for researchers and developers working with LLM-based code generation, offering support for numerous programming languages and popular code evaluation benchmarks.

How It Works

The system uses Docker containers to isolate code execution for security and reproducibility. It supports a wide range of languages, including Python, C++, Java, Go, and JavaScript (Node.js), as well as CUDA for GPU-accelerated code. SandboxFusion also integrates with code evaluation datasets such as HumanEval, MultiPL-E, and MBPP, which supports systematic benchmarking of LLM-generated code.
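To make the isolation idea concrete, here is a minimal sketch of a much weaker stand-in: it runs an untrusted Python snippet in a child process with a wall-clock timeout and a throwaway working directory. This is not how SandboxFusion works (it uses Docker containers), and it provides no filesystem, network, or memory isolation. The `run_untrusted` helper, the `RunResult` type, and its status labels are hypothetical.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    status: str                 # "Finished" or "TimeLimitExceeded" (labels invented for this sketch)
    return_code: Optional[int]  # None when the process was killed on timeout
    stdout: str
    stderr: str


def run_untrusted(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run a Python snippet in a child process inside a temporary directory.

    Hypothetical helper: a process boundary plus a timeout, NOT a container.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I (isolated mode) ignores PYTHON* env vars and the user site directory.
                [sys.executable, "-I", "-c", code],
                cwd=workdir,            # files the snippet writes land in a throwaway dir
                capture_output=True,
                text=True,
                timeout=timeout_s,      # subprocess.run kills the child when this expires
            )
        except subprocess.TimeoutExpired:
            # Partial output is discarded to keep the sketch simple.
            return RunResult("TimeLimitExceeded", None, "", "")
    return RunResult("Finished", proc.returncode, proc.stdout, proc.stderr)
```

For example, `run_untrusted("print(1 + 1)").stdout` is `"2\n"`, while a snippet that sleeps past the timeout comes back as `TimeLimitExceeded`. Closing the remaining gaps (memory limits, network access, filesystem visibility) is what the container-based design above is for.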

Quick Start & Requirements

  • Installation: via Docker, or manually with conda and Poetry.
  • Prerequisites: Docker, or conda and Poetry with a Python 3.12 environment. For Docker, a base image is provided, along with instructions for customizing the server image.
  • Documentation: https://bytedance.github.io/SandboxFusion/
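Because the Docker setup runs a server image, code is submitted to the sandbox over the network. As a minimal sketch, assuming the server listens on `http://localhost:8080` and exposes a `POST /run_code` endpoint that accepts JSON with `code` and `language` fields (both assumptions to verify against the documentation linked above), a standard-library client could look like this. The helper names `build_run_code_request` and `run_code` are hypothetical, and the response is returned as raw JSON without assuming its fields.

```python
import json
import urllib.request

# ASSUMPTIONS (verify in the docs): default host/port and the /run_code path.
DEFAULT_BASE_URL = "http://localhost:8080"


def build_run_code_request(code: str, language: str = "python",
                           base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the assumed /run_code endpoint."""
    payload = json.dumps({"code": code, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/run_code",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_code(code: str, language: str = "python",
             base_url: str = DEFAULT_BASE_URL, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the JSON body; requires a running server."""
    req = build_run_code_request(code, language, base_url)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `run_code('print("Hello, world!")')` would return the decoded response. Keeping request construction separate from sending makes the payload easy to inspect or reuse with another HTTP client.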

Highlighted Details

  • Supports 20+ programming languages, from Python to CUDA (with GPU acceleration).
  • Includes implementations for numerous LLM code evaluation benchmarks (HumanEval, MBPP, etc.).
  • Offers both a code runner and an online judge, usable with evaluation benchmarks and reinforcement learning (RL) datasets.
  • Provides comprehensive testing utilities for development and validation.
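The online-judge idea can be sketched in a few lines: run a submission once per test case, feed it the case's input on stdin, and accept it only if it exits cleanly and its trimmed stdout matches the expected output. This is a hypothetical illustration (the `judge` function and `JudgeCase` type are not SandboxFusion's API), and it executes code directly on the host without any sandboxing.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class JudgeCase:
    stdin: str
    expected_stdout: str


def judge(code: str, cases: list[JudgeCase], timeout_s: float = 5.0) -> list[bool]:
    """Hypothetical judge: run `code` once per case and compare trimmed stdout."""
    verdicts = []
    for case in cases:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],
                input=case.stdin,       # test input is piped to the program's stdin
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            verdicts.append(False)      # time limit exceeded counts as a failure
            continue
        # Accept only a clean exit whose output matches modulo surrounding whitespace.
        ok = proc.returncode == 0 and proc.stdout.strip() == case.expected_stdout.strip()
        verdicts.append(ok)
    return verdicts
```

The fraction of `True` verdicts gives a per-problem pass rate. Exact matching after stripping whitespace is a deliberately simple choice; judges often normalize output further or use custom checkers.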

Maintenance & Community

The project lists several contributors from ByteDance. The README does not list any community channels (e.g., Discord, Slack).

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known issues. Setting up individual language runtimes requires manually running the provided shell scripts.

Health Check

Last Commit: 2 months ago
Responsiveness: 1+ week
Pull Requests (30d): 1
Issues (30d): 0
Star History: 53 stars in the last 30 days
