dataset-generator  by metabase

AI dataset generator for realistic data

Created 3 months ago
676 stars

Top 50.0% on SourcePulse

Project Summary

This project provides an AI-powered tool for generating realistic datasets for demos, learning, and dashboards, aimed at developers and data analysts. A conversational prompt builder simplifies data creation, and a Metabase integration allows immediate data exploration. After an initial low-cost preview, CSV and SQL exports are free.

How It Works

The core of the generator uses OpenAI's GPT-4o to interpret user prompts and create a detailed data specification (schema, business rules). Actual data rows are then generated locally using the Faker library based on this LLM-generated spec. This approach ensures that only the initial preview or schema definition incurs OpenAI costs; subsequent data exports are free and instantaneous.
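
The two-stage flow described above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the project's code: the DatasetSpec shape, the column kinds, and generateRows are hypothetical, and a small seeded random function stands in for Faker so the sketch has no dependencies. In the real tool the spec would come from the GPT-4o call; here it is hard-coded.

```typescript
// Hypothetical spec shape; the project's actual spec format is not shown here.
type ColumnKind = "id" | "name" | "amount" | "date";

interface ColumnSpec {
  name: string;
  kind: ColumnKind;
}

interface DatasetSpec {
  table: string;
  columns: ColumnSpec[];
}

type Row = Record<string, string | number>;

// Small seeded PRNG standing in for Faker: same seed, same sequence.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"];

// Stage 2: expand a spec into rows locally, with no LLM call involved.
function generateRows(spec: DatasetSpec, count: number, seed = 1): Row[] {
  const rand = seededRandom(seed);
  const rows: Row[] = [];
  for (let i = 0; i < count; i++) {
    const row: Row = {};
    for (const col of spec.columns) {
      switch (col.kind) {
        case "id":
          row[col.name] = i + 1; // sequential primary key
          break;
        case "name":
          row[col.name] = FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)];
          break;
        case "amount":
          row[col.name] = Math.round(rand() * 10000) / 100; // 0.00 to 100.00
          break;
        case "date": {
          const day = 1 + Math.floor(rand() * 28);
          row[col.name] = `2024-01-${day < 10 ? "0" + day : String(day)}`;
          break;
        }
      }
    }
    rows.push(row);
  }
  return rows;
}

// Stage 1 stand-in: a spec the LLM might return for an "orders" prompt.
const spec: DatasetSpec = {
  table: "orders",
  columns: [
    { name: "order_id", kind: "id" },
    { name: "customer", kind: "name" },
    { name: "total", kind: "amount" },
    { name: "ordered_on", kind: "date" },
  ],
};

const rows = generateRows(spec, 3, 42);
console.log(rows);
```

Because row generation depends only on the spec and a seed, it can be repeated for any row count without another API call, which is consistent with the summary's claim that exports are free.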

Quick Start & Requirements

  • Prerequisites: Docker and an OpenAI API key.
  • Setup: clone the repo, copy .env.example to .env.local, and add the OpenAI API key.
  • Install dependencies with npm install, then start the app with npm run dev.
  • The application runs at http://localhost:3000.
  • Metabase is launched on-demand via Docker.
  • Official Docs: https://github.com/metabase/dataset-generator

Highlighted Details

  • Conversational prompt builder for defining business type, schema, and row count.
  • Real-time data preview in the browser.
  • Exports data as CSV (single or multi-table ZIP) or SQL inserts.
  • One-click Metabase launch for data exploration.
  • Supports "One Big Table" (OBT) and "Star Schema" data structures.
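
The CSV and SQL export formats listed above can be illustrated with a short TypeScript sketch. The function names (toCsv, toSqlInserts) and the Row type are hypothetical and not taken from the project; the sketch assumes table and column names are already safe SQL identifiers and only handles string, number, and null values.

```typescript
type Cell = string | number | null;
type Row = Record<string, Cell>;

// Quote a CSV field only when it contains a comma, quote, or line break;
// embedded quotes are doubled.
function csvField(value: Cell): string {
  if (value === null) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(columns: string[], rows: Row[]): string {
  const header = columns.map(csvField).join(",");
  const lines = rows.map((row) =>
    columns.map((c) => csvField(row[c] ?? null)).join(",")
  );
  return [header, ...lines].join("\n");
}

// Render a value as a SQL literal; single quotes in strings are doubled.
function sqlLiteral(value: Cell): string {
  if (value === null) return "NULL";
  if (typeof value === "number") return String(value);
  return `'${value.replace(/'/g, "''")}'`;
}

// Assumes table and column names are trusted identifiers (no quoting applied).
function toSqlInserts(table: string, columns: string[], rows: Row[]): string {
  const columnList = columns.join(", ");
  return rows
    .map((row) => {
      const values = columns.map((c) => sqlLiteral(row[c] ?? null)).join(", ");
      return `INSERT INTO ${table} (${columnList}) VALUES (${values});`;
    })
    .join("\n");
}

const customers: Row[] = [
  { id: 1, name: "O'Brien, Pat" },
  { id: 2, name: null },
];

console.log(toCsv(["id", "name"], customers));
console.log(toSqlInserts("customers", ["id", "name"], customers));
```

A multi-table export would call toCsv once per table and bundle the resulting files; the ZIP step is omitted here.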

Maintenance & Community

The project is maintained by Metabase. Further community interaction details are not specified in the README.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Generation requires an OpenAI API key, and each data preview incurs API costs. Exports are free, but the quality and realism of the generated data depend on how the LLM interprets the prompt and on what the Faker library can produce.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 0
  • Star History: 14 stars in the last 30 days
