Interactive map for GitHub project discovery
This project visualizes the GitHub ecosystem as a map of more than 690,000 repositories, placed according to the stargazers they share. It is aimed at developers and researchers who want to explore GitHub's landscape, and it surfaces relationships between projects and clusters of related communities.
How It Works
The project draws on GitHub activity events from Google BigQuery, processing approximately 500 million stars. It computes the Jaccard similarity between repositories to quantify how much their audiences overlap, applies Leiden clustering to group similar projects, and then runs a force-directed layout algorithm (ngraph.forcelayout) to position them. The final map is rendered with MapLibre, with the data converted to GeoJSON and tiles generated by tippecanoe.
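The similarity step can be illustrated with a minimal sketch. It is not the project's actual code: the function names and the toy `stars` data below are hypothetical, and it assumes each repository is represented by the set of users who starred it. The Jaccard similarity of two repositories is then the size of the intersection of their stargazer sets divided by the size of the union.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Return |A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarity(stars: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score every pair of repositories by the overlap of their stargazers."""
    return {
        (r1, r2): jaccard(stars[r1], stars[r2])
        for r1, r2 in combinations(sorted(stars), 2)
    }


# Hypothetical toy data: repository name -> users who starred it.
stars = {
    "repo/a": {"alice", "bob", "carol"},
    "repo/b": {"bob", "carol", "dave"},
    "repo/c": {"erin"},
}

scores = pairwise_similarity(stars)
for pair, score in scores.items():
    print(pair, score)
```

Comparing every pair this way is quadratic in the number of repositories, so at the scale described above a real pipeline would need to restrict the comparison to repositories that share at least one stargazer. The resulting scores would act as edge weights for the clustering and layout stages.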
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is primarily maintained by the author, anvaka. Support can be sought via GitHub issues or Twitter.
Licensing & Compatibility
Released under the MIT license. Attribution is requested if the data is used in other works.
Limitations & Caveats
The initial data processing phase is resource-intensive and requires substantial RAM and computation time. The map's visual design is noted as an area for potential improvement.