testpilot by githubnext

Unit test generator for npm packages using LLMs

Created 1 year ago
557 stars

Top 57.5% on SourcePulse

Project Summary

TestPilot is a research prototype for automatically generating unit tests for JavaScript/TypeScript npm packages using large language models (LLMs). It targets researchers and developers exploring LLM-based test generation, offering a framework that requires no additional training data.

How It Works

TestPilot prompts an LLM with a test skeleton that includes the target function's signature and body, along with usage examples mined from the package's documentation. The LLM's completion is parsed into a runnable unit test. Optionally, a failing test triggers re-prompting with the failure details so the model can refine it. This approach needs neither example test-function pairs nor reinforcement learning.
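
As a rough illustration of the approach (the names, fields, and prompt layout here are assumptions for the sketch, not TestPilot's actual internals), a prompt skeleton for an exported function might be assembled like this, with the LLM asked to complete the test body:

```typescript
// Hypothetical sketch of TestPilot-style prompt assembly. The real prompt
// format differs; this only illustrates the idea of combining signature,
// body, and doc examples into a test skeleton for the LLM to complete.

interface FunctionInfo {
  name: string;           // e.g. "slugify"
  signature: string;      // e.g. "slugify(input: string): string"
  body: string;           // source text of the function
  docExamples: string[];  // usage snippets mined from the package docs
}

function buildPrompt(pkg: string, fn: FunctionInfo): string {
  return [
    `// Unit test for ${pkg}.${fn.name}`,
    `// Signature: ${fn.signature}`,
    fn.body,
    ...fn.docExamples.map((ex) => `// Usage example: ${ex}`),
    `const assert = require('assert');`,
    `const { ${fn.name} } = require('${pkg}');`,
    `describe('${fn.name}', () => {`,
    `  it('', () => {`, // the LLM completes the test from here
  ].join("\n");
}
```

The model's completion is appended to the skeleton and parsed as JavaScript; if the resulting test fails, the failure message can be fed back into a follow-up prompt for refinement.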

Quick Start & Requirements

  • Install: from a tarball (TestPilot is not published to the npm registry) or from source (npm install and npm run build in the repository root).
  • Prerequisites: access to a Codex-style LLM exposing a completion API. Set the TESTPILOT_LLM_API_ENDPOINT and TESTPILOT_LLM_AUTH_HEADERS environment variables (see the client sketch after this list).
  • Dependencies: mocha is needed to run the generated tests if the target package does not already depend on it.
  • Docs: the accompanying research paper is available on arXiv and IEEE Xplore.
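
A minimal sketch of a completion client driven by these two environment variables, assuming the auth headers are supplied as a JSON object and the endpoint speaks a Codex-style completion schema (the request and response field names are assumptions):

```typescript
// Minimal sketch of an LLM completion client configured via the two
// environment variables TestPilot expects. The wire format shown here
// (prompt/max_tokens in, choices[0].text out) is an assumption.

async function complete(prompt: string): Promise<string> {
  const endpoint = process.env.TESTPILOT_LLM_API_ENDPOINT;
  if (!endpoint) throw new Error("TESTPILOT_LLM_API_ENDPOINT is not set");
  const auth = JSON.parse(process.env.TESTPILOT_LLM_AUTH_HEADERS ?? "{}");

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...auth },
    body: JSON.stringify({ prompt, max_tokens: 256 }),
  });
  if (!res.ok) throw new Error(`completion request failed: ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.text ?? "";
}
```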

Highlighted Details

  • Generates tests for exported functions in npm packages.
  • Supports re-prompting for test refinement upon failure.
  • Includes a benchmarking harness for evaluating performance on multiple packages.
  • Offers a reproduction mode that replays benchmark runs from recorded API requests and responses (sketched after this list).
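
The record-and-replay idea behind reproduction mode can be sketched as a response cache keyed on the request payload (a hypothetical illustration, not TestPilot's actual on-disk format):

```typescript
// Hypothetical record/replay wrapper: a recording run stores each LLM
// response keyed by a hash of the prompt; a replay run answers from the
// cache instead of calling the API, making benchmark runs repeatable.

import { createHash } from "node:crypto";

async function completeWithReplay(
  prompt: string,
  cache: Map<string, string>, // persisted to disk in a real harness
  mode: "record" | "replay",
  callApi: (prompt: string) => Promise<string>
): Promise<string> {
  const key = createHash("sha256").update(prompt).digest("hex");
  if (mode === "replay") {
    const hit = cache.get(key);
    if (hit === undefined) throw new Error("no recorded response for prompt");
    return hit;
  }
  const response = await callApi(prompt);
  cache.set(key, response);
  return response;
}
```

Note that a refinement prompt embeds the observed failure message, so a replay on a different system can compute a different key and miss the cache; this is the caveat noted under Limitations below.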

Maintenance & Community

  • Archived project; refer to neu-se/testpilot2.
  • Maintained by Max Schaefer, Frank Tip, and Sarah Nadi.
  • Not officially supported; file issues for questions or feedback.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

This version is archived and intended for research use; for everyday test generation, Copilot Chat is recommended instead. Reproduction mode may fail to replay refined tests because the failure messages embedded in refinement prompts are system-specific.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Travis Fischer (Founder of Agentic), and 1 more.

evalite by mattpocock

Top 0.8% · 864 stars
TypeScript testing framework for LLM apps
Created 10 months ago · Updated 3 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Edward Z. Yang (Research Engineer at Meta; Maintainer of PyTorch), and 5 more.

yet-another-applied-llm-benchmark by carlini

Top 0.2% · 1k stars
LLM benchmark for evaluating models on previously asked programming questions
Created 1 year ago · Updated 4 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

human-eval by openai

Top 0.4% · 3k stars
Evaluation harness for LLMs trained on code
Created 4 years ago · Updated 8 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Meng Zhang (Cofounder of TabbyML), and 3 more.

qodo-cover by qodo-ai

Top 0.2% · 5k stars
CLI tool for AI-powered test generation and code coverage enhancement
Created 1 year ago · Updated 2 months ago