kiteco-public by kiteco

Code analysis tool for Python, leveraging ML for enhanced developer experience

created 3 years ago
754 stars

Top 47.1% on sourcepulse

Project Summary

This repository contains a public, adapted version of Kite's internal codebase, focused primarily on its Python code analysis infrastructure. It shows how Kite analyzed GitHub repositories to build its AI-powered programming assistant, and it is aimed at developers interested in static and dynamic code analysis and in large-scale data processing pipelines.

How It Works

The project documents a multi-stage analysis pipeline for Python code. The stages are a GitHub crawler, package introspection (analyzing imports and attributes), type induction (statistically estimating function return types), dynamic analysis (extracting runtime types from scripts), and extraction of return types from documentation. The stages follow a map-reduce design, and the documentation mentions both local pipelines and AWS EMR for workflow management. The output is a structured "SymbolGraph" plus associated metadata such as argument specifications and popular call patterns.
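
To make the type induction stage more concrete, here is a minimal sketch of a map-reduce style estimate of function return types. It is not Kite's implementation: the record layout (`function`, `return_type`), the helper names, and the use of plain relative frequencies as the statistical estimate are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_observations(call_records):
    """Map step: emit one (function_name, return_type) pair per record.

    The record schema is hypothetical, chosen only for this sketch.
    """
    for record in call_records:
        yield record["function"], record["return_type"]


def reduce_return_types(pairs):
    """Reduce step: group pairs by function and turn counts into frequencies."""
    counts = defaultdict(Counter)
    for func, rtype in pairs:
        counts[func][rtype] += 1

    estimates = {}
    for func, counter in counts.items():
        total = sum(counter.values())
        # Relative frequency of each observed return type (an assumed estimator).
        estimates[func] = {rtype: n / total for rtype, n in counter.items()}
    return estimates


# Hypothetical observations, e.g. gathered from many crawled call sites.
records = [
    {"function": "json.loads", "return_type": "dict"},
    {"function": "json.loads", "return_type": "dict"},
    {"function": "json.loads", "return_type": "dict"},
    {"function": "json.loads", "return_type": "list"},
    {"function": "os.getcwd", "return_type": "str"},
]

estimates = reduce_return_types(map_observations(records))
```

In this sketch the map and reduce steps are ordinary functions run in one process. In a distributed setting such as the EMR workflows mentioned above, the same two steps would presumably run as separate jobs keyed by function name.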

Quick Start & Requirements

  • Installation: Requires Go 1.15.3, Git LFS, and potentially TensorFlow (make install-libtensorflow).
  • Build: The core daemon is written in Go and the sidebar is built with Electron (./osx/build_electron.sh, ./linux/build_electron.sh, ./windows/build_electron.sh). Development builds can be run with make run-standalone.
  • Dependencies: Go Modules are used for dependency management.
  • Access: VPN access to AWS/Azure hosts and SSH credentials for remote machines are required for full functionality.
  • Data: Datasets and ML models are managed via Git LFS and S3, requiring a manual rebuild of datadeps (./scripts/build_datadeps.sh).

Highlighted Details

  • Python Analysis Pipeline: Comprehensive system for static and dynamic code analysis, type inference, and documentation parsing.
  • Go Monorepo: Core backend services are written in Go, with infrastructure managed by Terraform.
  • ML Integration: Utilizes TensorFlow for training models, with details on offline training and incremental updates for client deployments.
  • Custom Parser: Includes a custom Python parser written in Go that is robust to syntax errors.
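
As a rough illustration of what "robust to syntax errors" can mean for a parser, the sketch below uses Python's standard `ast` module with a crude recovery loop: when parsing fails, the offending line is replaced with `pass` and parsing is retried. This is a swapped-in technique for illustration only. Kite's parser is a custom Go implementation whose recovery strategy is not described here, and the function name `parse_tolerant` and the `max_errors` parameter are hypothetical.

```python
import ast


def parse_tolerant(source, max_errors=10):
    """Parse `source`, replacing lines that fail to parse with `pass`.

    Returns (tree, bad_lines). `tree` is None if recovery gave up.
    """
    lines = source.splitlines()
    bad_lines = []
    for _ in range(max_errors + 1):
        try:
            return ast.parse("\n".join(lines)), bad_lines
        except SyntaxError as err:
            lineno = err.lineno
            # Give up if the error has no usable location or we made no progress.
            if lineno is None or not 1 <= lineno <= len(lines):
                break
            if lineno in bad_lines:
                break
            bad_lines.append(lineno)
            original = lines[lineno - 1]
            indent = original[: len(original) - len(original.lstrip())]
            # Keep the indentation so surrounding blocks stay well-formed.
            lines[lineno - 1] = indent + "pass"
    return None, bad_lines


tree, bad_lines = parse_tolerant("x = 1\ny = = 2\nz = 3\n")
```

Here the second line cannot be parsed, so it is recorded in `bad_lines` and the remaining statements still end up in the tree. A production parser would recover at the token or grammar level rather than line by line.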

Maintenance & Community

The repository is a public archive of a formerly commercial product. No active development or community channels are mentioned, but the extensive documentation and detailed pipeline descriptions serve as a historical record.

Licensing & Compatibility

The repository does not explicitly state a license. Given its origin as a public version of a private commercial codebase, commercial use or integration into closed-source projects may be restricted.

Limitations & Caveats

This repository is a snapshot of a past project and is not actively maintained. Many components are adapted with placeholders ("XXXXXXX") and may not function out-of-the-box. Access to proprietary datasets and backend infrastructure is not provided.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
10 stars in the last 90 days