sketch by approximatelabs

AI code-writing assistant for Pandas dataframes

Created 3 years ago
2,290 stars

Top 19.8% on SourcePulse

Project Summary

Sketch is an AI-powered code-writing assistant for pandas users. It uses the context of the data itself to make its suggestions more relevant. It targets data analysts and engineers who want to streamline data cataloging, cleaning, analysis, and visualization without installing an IDE plugin.

How It Works

Sketch leverages data sketches, a technique for summarizing large datasets efficiently, to provide context to large language models (LLMs). It summarizes dataframe columns and feeds these statistics into prompts for code generation. The project plans to integrate these sketches directly into custom "data + language" foundation models for improved accuracy.
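The idea of turning column summaries into prompt context can be illustrated with a minimal sketch. This is not Sketch's actual implementation: the function names (summarize_column, build_prompt), the prompt wording, and the chosen statistics are all hypothetical, and the distinct count here is exact where a real data sketch would typically use an approximate estimator. Columns are represented as plain Python lists to keep the example dependency-free.

```python
from collections import Counter


def summarize_column(name, values):
    """Return a small summary of one column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    summary = {
        "name": name,
        "count": len(values),
        "nulls": len(values) - len(present),
        # Exact distinct count; a real data sketch would estimate this.
        "distinct": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        # Fully numeric column: record its range.
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
    else:
        # Otherwise record up to three of the most frequent values.
        summary["top"] = [v for v, _ in Counter(present).most_common(3)]
    return summary


def build_prompt(columns, question):
    """Combine per-column summaries and a task into one prompt string."""
    lines = ["The dataframe `df` has these columns:"]
    for name, values in columns.items():
        stats = summarize_column(name, values)
        details = ", ".join(f"{k}={v}" for k, v in stats.items() if k != "name")
        lines.append(f"- {name}: {details}")
    lines.append(f"Task: {question}")
    lines.append("Answer with pandas code.")
    return "\n".join(lines)


# Toy data, for illustration only.
columns = {
    "region": ["north", "south", "north", None],
    "sales": [120.5, 98.0, 143.2, 110.0],
}
prompt = build_prompt(columns, "Plot total sales per region")
print(prompt)
```

The summaries depend only on per-column statistics rather than on the full data, so the prompt stays short however many rows the dataframe has.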

Quick Start & Requirements

  • Install via pip: pip install sketch
  • Requires Python.
  • For apply functionality, an OpenAI API key is needed (OPENAI_API_KEY environment variable).
  • Local execution with Hugging Face models (MPT-7B, StarCoder) requires setting LAMBDAPROMPT_BACKEND, SKETCH_USE_REMOTE_LAMBDAPROMPT='False', and HF_ACCESS_TOKEN.
  • Demo available on Colab.
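The local-model settings listed above could be applied from Python roughly as follows. This is a sketch under assumptions: the helper name configure_local_backend is hypothetical, the backend value "StarCoder" is assumed to be an accepted setting for LAMBDAPROMPT_BACKEND, and the token is a placeholder. The variables would presumably need to be set before sketch is imported, so the import is left as a comment.

```python
import os


def configure_local_backend(backend, hf_token):
    """Set the environment variables named in the list above (hypothetical helper)."""
    settings = {
        "LAMBDAPROMPT_BACKEND": backend,           # assumed value, e.g. "StarCoder"
        "SKETCH_USE_REMOTE_LAMBDAPROMPT": "False",  # string, not a boolean
        "HF_ACCESS_TOKEN": hf_token,                # placeholder token
    }
    os.environ.update(settings)
    return settings


settings = configure_local_backend("StarCoder", "hf_placeholder_token")

# import sketch  # import only after the variables above are set
```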

Highlighted Details

  • Integrates seamlessly with pandas DataFrames via a .sketch extension.
  • Offers ask for data understanding and howto for code generation.
  • apply enables data generation and feature engineering, built on LambdaPrompt.
  • Supports remote (prompts.approx.dev) and local LLM backends.
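The three entry points named above might be used as in the following sketch. It assumes that both sketch and pandas are installed, that importing sketch registers the .sketch accessor, and that apply accepts a template referencing column names in double braces; the wrapper function run_sketch_demo, the dataframe contents, and the questions are illustrative only. The calls contact an LLM backend, so nothing runs until the function is invoked.

```python
def run_sketch_demo():
    """Illustrative use of ask, howto, and apply (hypothetical wrapper)."""
    # Imports live inside the function so defining it needs neither package.
    import pandas as pd
    import sketch  # noqa: F401  (assumed to register the .sketch accessor)

    # Toy data, for illustration only.
    df = pd.DataFrame({
        "State": ["Colorado", "Kansas", "California"],
        "Visits": [3, 5, 8],
    })

    # Ask a natural-language question about the data.
    df.sketch.ask("Which columns are integer type?")

    # Request generated pandas code for a task.
    df.sketch.howto("Plot visits by state as a bar chart")

    # Generate a new column row by row; template syntax is an assumption.
    df["Capital"] = df.sketch.apply("What is the capital of [{{ State }}]?")
    return df
```

In a notebook, calling run_sketch_demo() would be expected to display the answers from ask and howto and return the dataframe with the generated column, provided a backend is reachable.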

Maintenance & Community

  • Primarily developed by Approximate Labs.
  • Links to a demo are provided.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The project is described as under active development, but the repository's last commit was about a year ago. The apply feature requires an OpenAI API key unless a local model is configured. The README also implies that future work will integrate data sketches more deeply with foundation models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
