sketch by approximatelabs

AI code-writing assistant for pandas dataframes

created 3 years ago
2,280 stars

Top 20.3% on sourcepulse

Project Summary

Sketch is an AI-powered code-writing assistant for pandas users. It uses the contents of a dataframe as context, so its suggestions are relevant to the data at hand. It targets data analysts and engineers who want to streamline data cataloging, cleaning, analysis, and visualization without installing an IDE plugin.

How It Works

Sketch uses data sketches (compact, approximate summaries of large datasets) to give large language models (LLMs) context about the data. It summarizes each dataframe column and includes these statistics in the prompts it uses for code generation. The project plans to build these sketches directly into custom "data + language" foundation models to improve accuracy.
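The summarize-then-prompt idea can be illustrated with a small, self-contained sketch. This is not Sketch's actual implementation: the helper names (`kmv_distinct_estimate`, `summarize_column`, `build_prompt`) and the prompt wording are hypothetical, and the distinct-count estimator is a K-Minimum-Values (KMV) sketch chosen for brevity, not necessarily the sketch type the project uses.

```python
import hashlib
import heapq

import pandas as pd


def _hash01(value) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using a stable hash."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_distinct_estimate(values, k: int = 64) -> float:
    """K-Minimum-Values sketch: estimate how many distinct values there are.

    Only the k smallest hashes are kept, so memory stays O(k) no matter how
    long the column is. With fewer than k distinct hashes the count is exact.
    """
    heap = []    # negated hashes, so heap[0] is minus the largest kept hash
    kept = set()  # hashes currently in the heap, used to skip duplicates
    for v in values:
        h = _hash01(v)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest


def summarize_column(series: pd.Series, k: int = 64) -> dict:
    """Hypothetical per-column summary of the kind an LLM prompt could use."""
    non_null = series.dropna()
    return {
        "name": str(series.name),
        "dtype": str(series.dtype),
        "rows": int(len(series)),
        "nulls": int(series.isna().sum()),
        "approx_unique": round(kmv_distinct_estimate(non_null.tolist(), k)),
        "examples": non_null.head(3).tolist(),
    }


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Hypothetical prompt: one summary line per column, then the question."""
    lines = ["You are helping a pandas user. Dataframe columns:"]
    for col in df.columns:
        s = summarize_column(df[col])
        lines.append(
            f"- {s['name']} ({s['dtype']}): {s['rows']} rows, {s['nulls']} nulls, "
            f"~{s['approx_unique']} unique, e.g. {s['examples']}"
        )
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The point of a sketch here is that the summary stays small even for very large columns, so it fits in a prompt; the exact statistics a real tool includes would be a design choice.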

Quick Start & Requirements

  • Install via pip: pip install sketch
  • Requires Python.
  • The apply feature needs an OpenAI API key, set in the OPENAI_API_KEY environment variable.
  • Local execution with Hugging Face models (MPT-7B, StarCoder) requires setting LAMBDAPROMPT_BACKEND, SKETCH_USE_REMOTE_LAMBDAPROMPT='False', and HF_ACCESS_TOKEN.
  • Demo available on Colab.
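The environment variables listed above can be set from Python before Sketch is used. The helper below is a hypothetical convenience, not part of the package. The backend name "StarCoder" is an assumption about what LAMBDAPROMPT_BACKEND accepts (check the project README for the exact values), and since it is not stated when the variables are read, setting them before `import sketch` is the safe choice.

```python
import os


def configure_sketch_env(local_backend=None, hf_token=None, openai_key=None) -> dict:
    """Hypothetical helper: set the environment variables named in the summary.

    - openai_key -> OPENAI_API_KEY (needed for apply with the remote setup)
    - local_backend + hf_token -> local Hugging Face execution
    Returns the variables that were set.
    """
    env = {}
    if openai_key is not None:
        env["OPENAI_API_KEY"] = openai_key
    if local_backend is not None:
        if not hf_token:
            raise ValueError("HF_ACCESS_TOKEN is required for local Hugging Face models")
        env["LAMBDAPROMPT_BACKEND"] = local_backend  # e.g. "StarCoder" (assumed value)
        env["SKETCH_USE_REMOTE_LAMBDAPROMPT"] = "False"
        env["HF_ACCESS_TOKEN"] = hf_token
    os.environ.update(env)
    return env
```

For example, `configure_sketch_env(local_backend="StarCoder", hf_token="hf_...")` switches to a local model, where the token placeholder must be replaced with a real Hugging Face access token.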

Highlighted Details

  • Integrates with pandas DataFrames through a .sketch extension.
  • Provides ask, which answers questions about the data, and howto, which generates code.
  • apply generates data and engineers features; it is built on LambdaPrompt.
  • Supports remote (prompts.approx.dev) and local LLM backends.
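Attaching a `.sketch` attribute to every DataFrame is the kind of integration pandas' accessor API (`pd.api.extensions.register_dataframe_accessor`) provides. The toy accessor below shows that pattern under an invented name, `mini_sketch`, so it cannot clash with the real package. Its methods only build prompt strings and delegate to a caller-supplied `complete` function standing in for an LLM, because Sketch's real prompts and backends are not described here. With the actual library installed, the summary above implies calls of the form `df.sketch.ask(...)`, `df.sketch.howto(...)` and `df.sketch.apply(...)` after `import sketch`.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("mini_sketch")
class MiniSketchAccessor:
    """Toy stand-in for a .sketch-style accessor (hypothetical, not the real API)."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def _context(self) -> str:
        # Minimal data context: row count plus each column's name and dtype.
        parts = [f"{col} ({dtype})" for col, dtype in self._df.dtypes.astype(str).items()]
        return f"{len(self._df)} rows; columns: " + ", ".join(parts)

    def ask(self, question: str, complete=None) -> str:
        # Without a model callable, return the prompt so it can be inspected.
        prompt = f"Data: {self._context()}\nAnswer in plain English: {question}"
        return complete(prompt) if complete else prompt

    def howto(self, task: str, complete=None) -> str:
        prompt = f"Data: {self._context()}\nWrite pandas code for: {task}"
        return complete(prompt) if complete else prompt

    def apply(self, template: str, complete) -> pd.Series:
        # Fill the template with each row's values and send it to the model,
        # producing one generated value per row (a new feature column).
        return self._df.apply(lambda row: complete(template.format(**row.to_dict())), axis=1)
```

Usage: `df.mini_sketch.apply("{name} is {age}", complete=my_model)` yields one model output per row, where `my_model` is any function that maps a prompt string to a completion.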

Maintenance & Community

  • Primarily developed by Approximate Labs.
  • Links to a demo are provided.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The project is in active development. The apply feature requires an OpenAI API key unless local models are configured, and the README points to future work integrating data sketches more deeply with foundation models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 30 stars in the last 90 days
