Streamline-Analyst by Wilson-ZheLin

AI agent for streamlined data analysis

created 1 year ago
436 stars

Top 69.4% on sourcepulse

Project Summary

Streamline Analyst is an AI-powered data analysis agent that automates and simplifies the data analysis workflow for users of all expertise levels. It uses Large Language Models (LLMs) to handle data cleaning, preprocessing, model selection, training, and visualization, with the aim of reaching insights and well-performing models faster.

How It Works

The agent uses LLMs to guide each step of the data analysis process. It automatically identifies the target variable, recommends strategies for handling null values and encoding categorical features, and suggests dimensionality reduction techniques such as PCA. The LLM-driven recommendations also cover data balancing and transformation, and the agent recommends a number of clusters using methods such as the Elbow Rule. It supports a range of classification, clustering, and regression models, trains them automatically, and reports evaluation metrics and visualizations.
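
The Elbow Rule mentioned above can be illustrated with a short sketch. This is not the project's implementation, which this summary does not describe. It assumes the inertia (within-cluster sum of squares) has already been computed for each candidate cluster count, and it uses one common heuristic: after normalizing both axes, pick the point that lies farthest below the straight line joining the first and last points of the curve. The function name find_elbow and the sample inertia values are hypothetical.

```python
def find_elbow(ks, inertias):
    """Return the cluster count at the 'elbow' of an inertia curve.

    ks       -- candidate cluster counts in increasing order
    inertias -- within-cluster sum of squares for each k (expected to decrease)
    """
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    if ks[-1] == ks[0] or inertias[0] <= inertias[-1]:
        raise ValueError("ks must span a range and inertia must decrease overall")

    k_span = ks[-1] - ks[0]
    inertia_span = inertias[0] - inertias[-1]

    best_k, best_gap = ks[0], -1.0
    for k, inertia in zip(ks, inertias):
        # Normalize so the curve runs from (0, 1) to (1, 0).
        x = (k - ks[0]) / k_span
        y = (inertia - inertias[-1]) / inertia_span
        # The chord between the end points is y = 1 - x; measure how far
        # the curve dips below it at this k.
        gap = (1.0 - x) - y
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Illustrative (made-up) inertia values for k = 1..6.
candidate_ks = [1, 2, 3, 4, 5, 6]
sample_inertias = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0]
print(find_elbow(candidate_ks, sample_inertias))
```

With these sample values the curve flattens after k = 3, so the heuristic selects 3. Other elbow criteria (for example, the largest second difference) can give a different answer on noisy curves, so the result is best treated as a suggestion rather than a definitive choice.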

Quick Start & Requirements

  • Install using pip install -r requirements.txt
  • Run with streamlit run app.py
  • Requires Python 3.11.5 and an OpenAI API key. GPT-4 Turbo is recommended; be aware of free quota limitations.
  • A live demo is available: Streamline Analyst Demo

Highlighted Details

  • Automates target variable identification, null value management, data encoding, and dimensionality reduction (PCA).
  • Offers LLM-recommended strategies for data balancing (SMOTE, ADASYN) and dataset proportion adjustment.
  • Supports a broad spectrum of classification, clustering, and regression models, including XGBoost and various ensemble methods.
  • Provides automated calculation and visualization of model performance metrics (e.g., confusion matrix, R-squared, Silhouette score).
  • Includes a visual analysis toolkit with single/multi-attribute plots, 3D plotting, word clouds, and heat maps.
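
Two of the metrics named in the list above, the confusion matrix and R-squared, can be computed in a few lines. The following is a minimal sketch in plain Python for illustration only; it does not show how the project computes these metrics, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true label is labels[i] and predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    return 1 - ss_res / ss_tot


# Classification example with made-up labels.
labels, matrix = confusion_matrix(
    ["cat", "dog", "dog", "cat", "dog"],
    ["cat", "dog", "cat", "cat", "dog"],
)
print(labels, matrix)

# Regression example: predicting the mean everywhere gives R-squared of 0.
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Rows of the matrix correspond to true labels and columns to predicted labels, so off-diagonal entries are misclassifications.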

Maintenance & Community

No specific community channels or notable contributors are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README describes the project as "cutting-edge" and mentions planned enhancements such as NLP and object detection, which suggests the feature set is still evolving. Because the agent relies on an OpenAI API key, using GPT-4 Turbo incurs API costs, and the project's functionality is tied to the availability and performance of that external service.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

30 stars in the last 90 days
