RepoToTextForLLMs by Doriandarko

Python script for LLM-driven GitHub repo analysis

created 1 year ago
769 stars

Top 46.3% on sourcepulse

Project Summary

This Python script prepares GitHub repositories for analysis by Large Language Models (LLMs). It extracts the README, the repository structure, and the contents of non-binary files, and produces structured output with pre-formatted prompts that guide the LLM through a comprehensive evaluation of the repository. It is aimed at developers and researchers working with LLMs.

How It Works

The script maps the repository structure with an iterative traversal instead of a recursive one, so deeply nested repositories do not hit Python's recursion limit. It then extracts text content file by file and skips binary files, so processing is spent only on data an LLM can actually analyze.
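The traversal and filtering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's actual code: to stay independent of the GitHub API it takes two hypothetical callables, `list_dir(path)` returning `(name, is_dir)` pairs and `read_file(path)` returning raw bytes, and both the extension list and the NUL-byte/UTF-8 test for spotting binaries are our own heuristics.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumed deny-list of binary extensions; the real script may use a different one.
BINARY_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf",
                     ".zip", ".gz", ".exe", ".so", ".pyc")


def has_binary_extension(name: str) -> bool:
    """Cheap first filter that needs only the file name."""
    return name.lower().endswith(BINARY_EXTENSIONS)


def decode_text(data: bytes) -> Optional[str]:
    """Return UTF-8 text, or None if the bytes look binary (NUL byte or invalid UTF-8)."""
    if b"\x00" in data:
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def walk_repo(
    list_dir: Callable[[str], Iterable[Tuple[str, bool]]],
    read_file: Callable[[str], bytes],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Map the tree with an explicit stack instead of recursion.

    Returns the structure as path lines (directories end in "/")
    and (path, text) pairs for files that decode as text.
    """
    structure: List[str] = []
    texts: List[Tuple[str, str]] = []
    pending: List[str] = [""]  # "" stands for the repository root
    while pending:
        directory = pending.pop()
        for name, is_dir in sorted(list_dir(directory)):
            path = f"{directory}/{name}" if directory else name
            if is_dir:
                structure.append(path + "/")
                pending.append(path)  # visited on a later loop iteration, not via a nested call
                continue
            structure.append(path)
            if has_binary_extension(name):
                continue  # skip without fetching the content at all
            text = decode_text(read_file(path))
            if text is not None:
                texts.append((path, text))
    return structure, texts
```

Because pending directories live on a Python list rather than the call stack, nesting depth is bounded by memory instead of the recursion limit. Checking the extension before calling `read_file` avoids downloading content that would be discarded anyway, which matters when every read is an API request.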

Quick Start & Requirements

  • Install: pip install PyGithub tqdm
  • Prerequisites: Python and a GitHub Personal Access Token, set as the GITHUB_TOKEN environment variable.
  • Usage: Run python repototxt.py and enter the repository URL when prompted.
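Both Quick Start inputs, the token and the repository URL, must be turned into something an API client can use. The sketch below is an assumption about how that step could look, not the script's own code: `parse_repo_url` and `read_token` are hypothetical helpers, and the accepted URL shapes are our choice. With PyGithub, the resulting pair would typically be joined as `"owner/name"` and passed to a `Github` client's `get_repo` method.

```python
import os
import re
from typing import Tuple

# Hypothetical accepted forms: optional scheme and "www.", optional ".git" suffix or trailing slash.
_REPO_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?/?$"
)


def parse_repo_url(url: str) -> Tuple[str, str]:
    """Split a GitHub repository URL into (owner, name)."""
    match = _REPO_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub repository URL: {url!r}")
    return match.group(1), match.group(2)


def read_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Read the Personal Access Token from the environment and fail early if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"set {env_var} to a GitHub Personal Access Token")
    return token
```

Failing before any network call gives a clear message when the token is missing, which is the first limitation listed for this script.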

Highlighted Details

  • README retrieval for initial insights.
  • Structured repository traversal without recursion limits.
  • Selective extraction of text file contents, skipping binaries.
  • Outputs include analysis prompts for LLM guidance.
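The last highlight, output that carries analysis prompts, amounts to concatenating labeled sections into one text an LLM can read in a single pass. A minimal sketch follows; the section labels, their order, and `DEFAULT_PROMPT` are hypothetical placeholders, not the script's actual output format or prompt wording.

```python
from typing import Iterable, Tuple

# Hypothetical prompt text; the script ships its own pre-formatted prompts.
DEFAULT_PROMPT = (
    "Review the repository below. Summarize its purpose, architecture, "
    "and code quality, and point out potential bugs or improvements."
)


def build_llm_input(
    readme: str,
    structure: Iterable[str],
    files: Iterable[Tuple[str, str]],
    prompt: str = DEFAULT_PROMPT,
) -> str:
    """Join the prompt, README, tree listing, and file bodies into one labeled text."""
    parts = [prompt, "", "README:", readme.rstrip(), "", "Repository structure:"]
    parts.extend(structure)
    for path, text in files:
        # Label every file with its path so the model can refer to it by location.
        parts += ["", f"File: {path}", text.rstrip()]
    return "\n".join(parts) + "\n"
```

Putting the instructions first is one common arrangement; placing them after the repository content is an equally reasonable choice for long inputs.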

Maintenance & Community

Contributions are welcomed via pull requests and issue reporting.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Requires a GitHub Personal Access Token to run. How useful the output is depends on the quality and format of the repository's files.

Health Check
Last commit

1 year ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
28 stars in the last 90 days

Explore Similar Projects

Starred by David Cournapeau (Author of scikit-learn), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 6 more.

repomix by yamadashy

0.8%
18k
CLI tool to pack codebases into AI-friendly formats for LLMs
created 1 year ago
updated 5 days ago