git2txt  by addyosmani

CLI tool to convert GitHub repos to text files for LLMs

created 8 months ago
507 stars

Top 62.3% on sourcepulse

Project Summary

This CLI tool converts GitHub repositories into single text files, ideal for LLM training, analysis, or documentation. It targets developers and researchers needing to process codebases efficiently, offering automatic binary exclusion and configurable file size limits.

How It Works

The tool recursively downloads a specified GitHub repository, excluding binary files and files that exceed a configurable size threshold (default 100KB). It then concatenates the contents of the remaining files into a single text file, marking each file's path and size before its content. The result is one file that can be handed to an AI model without walking the repository tree.
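
The filtering and concatenation steps described above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names, the NUL-byte binary check, and the "File:" / "Size:" header layout are all assumptions made for the example, and it operates on a repository that has already been downloaded to a local directory.

```python
import os

# Default threshold taken from the description above (100KB).
THRESHOLD_BYTES = 100 * 1024


def looks_binary(path, sample_size=8192):
    """Assumed heuristic: a file is binary if its first bytes contain NUL."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sample_size)


def repo_to_text(root, threshold=THRESHOLD_BYTES, include_all=False):
    """Concatenate eligible files under root into one string.

    Each file is preceded by a header with its relative path and size.
    The header format here is hypothetical.
    """
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # Skip oversized and binary files unless include_all is set.
            if not include_all and (size > threshold or looks_binary(path)):
                continue
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                body = fh.read()
            parts.append(f"File: {rel}\nSize: {size} bytes\n\n{body}\n")
    return "\n".join(parts)
```

Calling repo_to_text("path/to/checkout") returns the combined text. Passing include_all=True mirrors the role the summary gives to the --include-all flag, under the assumption that it lifts both the size and the binary filter.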

Quick Start & Requirements

  • Primary install: npm install -g git2txt
  • Prerequisites: Node.js and npm.
  • Usage: git2txt username/repository or git2txt https://github.com/username/repository.
  • Options: --output, --threshold, --include-all, --debug.
  • Examples and detailed usage are available in the README.

Highlighted Details

  • Supports HTTPS, short format, and SSH GitHub repository URLs.
  • Automatically excludes binary files and .git directories.
  • Configurable file size threshold to manage output size.
  • Cross-platform compatibility (Windows, macOS, Linux).
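
As an illustration of the three accepted repository formats, the sketch below normalizes each one to an (owner, repository) pair. It is a hedged example, not the tool's own parser: parse_repo and the regular expressions are hypothetical, and the SSH form is assumed to follow the usual git@github.com:owner/repo.git shape.

```python
import re

# One pattern per input style; all names here are illustrative.
_PATTERNS = [
    # HTTPS: https://github.com/owner/repo, optional .git or trailing slash
    re.compile(r"^https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"),
    # SSH: git@github.com:owner/repo.git
    re.compile(r"^git@github\.com:(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?$"),
    # Short format: owner/repo
    re.compile(r"^(?P<owner>[^/\s:@]+)/(?P<repo>[^/\s]+?)(?:\.git)?$"),
]


def parse_repo(ref):
    """Return (owner, repo) for a GitHub reference, or raise ValueError."""
    ref = ref.strip()
    for pattern in _PATTERNS:
        match = pattern.match(ref)
        if match:
            return match.group("owner"), match.group("repo")
    raise ValueError(f"Unrecognized GitHub repository reference: {ref!r}")
```

Under these assumptions, parse_repo("username/repository"), parse_repo("https://github.com/username/repository"), and parse_repo("git@github.com:username/repository.git") all resolve to the same pair, so the download step only needs to handle one canonical form.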

Maintenance & Community

The project accepts contributions and provides a contribution guide.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

By default, binary files and files larger than 100KB are excluded. Use the --include-all flag when the output needs to cover every file.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 16 stars in the last 90 days
