hercules by src-d

CLI tool for Git history analysis, providing insights into repository evolution

created 8 years ago
2,721 stars

Top 17.8% on sourcepulse

Project Summary

Hercules is a Go-based engine for deep Git repository analysis, designed for developers and researchers seeking insights into project evolution, team dynamics, and code contributions. It offers a highly customizable DAG of analyses, generating metrics like project burndown, code churn, and developer collaboration patterns, with a companion Python script for visualization.

How It Works

Hercules processes Git repositories using the go-git library, executing a configurable Directed Acyclic Graph (DAG) of analysis tasks. It supports custom analyses via plugins and can merge results from multiple runs. The labours Python script visualizes the data, offering features like resampling and custom plotting backends. This approach allows for comprehensive, single-pass analysis of complex Git histories.
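The following minimal Python sketch illustrates the DAG idea. It is not Hercules code and does not use its API. The Analysis class, build_order, run, and the toy "diff" and "churn" stages are all hypothetical. Each analysis declares the keys it requires and provides. The analyses are topologically sorted once, and then the whole graph runs on every commit during a single walk over the history.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Analysis:
    """Hypothetical DAG node: consumes named inputs, emits named outputs."""

    def __init__(self, name, requires, provides, fn):
        self.name = name
        self.requires = requires  # keys this node reads
        self.provides = provides  # keys this node writes
        self.fn = fn              # callable(*inputs) -> tuple of outputs


def build_order(analyses):
    """Order analyses so each one runs after every producer of its inputs."""
    producer = {key: a.name for a in analyses for key in a.provides}
    graph = {
        # "commit" has no producer: it is the root input seeded by run()
        a.name: {producer[key] for key in a.requires if key in producer}
        for a in analyses
    }
    return list(TopologicalSorter(graph).static_order())


def run(analyses, commits):
    """Walk the history once, running the whole DAG on every commit."""
    by_name = {a.name: a for a in analyses}
    order = build_order(analyses)
    results = []
    for commit in commits:
        state = {"commit": commit}
        for name in order:
            a = by_name[name]
            outputs = a.fn(*(state[key] for key in a.requires))
            state.update(zip(a.provides, outputs))
        results.append(state)
    return results


# Toy analyses, deliberately listed out of dependency order.
churn = Analysis("churn", ["added", "removed"], ["churn"], lambda a, r: (a + r,))
diff = Analysis("diff", ["commit"], ["added", "removed"],
                lambda c: (c["added"], c["removed"]))
commits = [{"added": 10, "removed": 2}, {"added": 3, "removed": 5}]

print(build_order([churn, diff]))                          # ['diff', 'churn']
print([s["churn"] for s in run([churn, diff], commits)])   # [12, 8]
```

Because the execution order is derived from the declared inputs and outputs, a new analysis can be added without editing the existing ones. That property is roughly what makes a pluggable pipeline of analyses practical.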

Quick Start & Requirements

  • Install labours: pip3 install labours
  • Build Hercules from source: requires Go 1.11 or later and protoc. Clone the repository, run make, then install labours from the checkout with pip3 install -e ./python.
  • Pre-built binaries available on the Releases page.
  • Official GitHub Action available.
  • Documentation: Overview, Installation, Usage Examples

Highlighted Details

  • Performance: Claims to be significantly faster than tools like git-of-theseus for burndown analysis.
  • Customization: Supports custom analyses via plugins and configurable DAGs.
  • Visualization: Integrates with labours for plotting, including TensorFlow Projector support for embeddings.
  • Sentiment Analysis: Can analyze code comments for sentiment using a BiDiSentiment model (requires libtensorflow and building Hercules with TAGS=tensorflow).

Maintenance & Community

  • Development is gradually resuming after a hiatus.
  • Contributions are welcomed. See CONTRIBUTING and code of conduct.
  • Roadmap includes switching to go-git/go-git, updating docs, fixing bugs, and removing the Babelfish dependency.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Burndown analysis can cause Out-Of-Memory errors on large, branching repositories; workarounds include using --first-parent, disk caching, or hibernation features.
  • YAML parsing can be slow for large datasets; Protocol Buffers (--pb) is recommended as an alternative.
  • Dependency on the abandoned Babelfish library for code parsing is noted in the roadmap for removal.
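The --first-parent workaround helps because a first-parent walk never descends into merged side branches. As a result, the analysis has fewer parallel branch states to keep in memory. The hypothetical illustration below uses a toy commit graph. The PARENTS mapping, walk_all, and walk_first_parent are assumptions, not part of Hercules. The sketch only compares which commits each traversal visits.

```python
from collections import deque

# Hypothetical toy history: each commit maps to its parents, first parent first.
#   A -- B ------- M -- C    (mainline)
#    \            /
#     F1 ------ F2           (feature branch merged at M)
PARENTS = {
    "A": [],
    "B": ["A"],
    "F1": ["A"],
    "F2": ["F1"],
    "M": ["B", "F2"],
    "C": ["M"],
}


def walk_all(parents, head):
    """Return every commit reachable from head, following all parents of merges."""
    seen, queue = set(), deque([head])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return seen


def walk_first_parent(parents, head):
    """Return the mainline chain obtained by following only first parents."""
    chain, commit = [], head
    while commit is not None:
        chain.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return chain


print(len(walk_all(PARENTS, "C")))       # 6
print(walk_first_parent(PARENTS, "C"))   # ['C', 'M', 'B', 'A']
```

On this toy graph, the first-parent walk skips the feature-branch commits F1 and F2 entirely. On a heavily branching repository, the same pruning applies to every merged branch.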

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 28 stars in the last 90 days
