Python script for LLM-driven GitHub repo analysis
This Python script prepares GitHub repositories for analysis by Large Language Models (LLMs). It extracts each repository's README, directory structure, and non-binary file contents, then writes structured output with pre-formatted prompts to support a thorough evaluation of the repo. It targets developers and researchers working with LLMs.
How It Works
The script maps the repository structure with an iterative rather than recursive traversal, so deeply nested directories cannot hit Python's recursion limit. It extracts text content only and skips binary files, which keeps processing efficient and the output focused on analyzable data.
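The two ideas above, stack-based traversal and binary-file skipping, can be sketched as follows. This is a minimal illustration, not the script's actual code: the in-memory `repo` dictionary stands in for a real repository, and `looks_binary` and `walk_repo` are hypothetical names. The binary check (a NUL byte in the first kilobyte, or content that is not valid UTF-8) is one common heuristic and is an assumption here; the script may detect binary files differently.

```python
from typing import Any, Dict, List, Tuple


def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Heuristic (assumed): binary if a NUL byte appears early or UTF-8 decoding fails."""
    if b"\x00" in data[:sample_size]:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def walk_repo(root: Dict[str, Any]) -> Tuple[List[str], Dict[str, str]]:
    """Walk a nested dict (dirs are dicts, files are bytes) without recursion."""
    structure: List[str] = []      # every path seen, directories end with "/"
    contents: Dict[str, str] = {}  # decoded text of non-binary files only
    stack = [("", root)]           # explicit stack replaces the call stack
    while stack:
        prefix, node = stack.pop()
        for name in sorted(node):
            child = node[name]
            path = prefix + name
            if isinstance(child, dict):
                structure.append(path + "/")
                stack.append((path + "/", child))  # visit this directory later
            else:
                structure.append(path)
                if not looks_binary(child):
                    contents[path] = child.decode("utf-8")
    return structure, contents


# Hypothetical repository used only for illustration.
repo = {
    "README.md": b"# Demo\n",
    "src": {
        "main.py": b"print('hi')\n",
        "logo.png": b"\x89PNG\r\n\x1a\n\x00\x00",
    },
}
structure, contents = walk_repo(repo)
```

Because pending directories live in a list rather than on the call stack, nesting depth is limited only by memory. Binary files still appear in `structure`, so the directory map stays complete, but their bytes never reach `contents`.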
Quick Start & Requirements
Install dependencies: pip install PyGithub tqdm
Provide a GitHub Personal Access Token (GITHUB_TOKEN environment variable).
Run python repototxt.py and enter the repository URL.
Highlighted Details
Maintenance & Community
Contributions are welcome via pull requests and issue reports.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Requires a GitHub Personal Access Token to run. The usefulness of the output depends on the quality and format of the repository's files.