autodoc by context-labs

Toolkit for auto-generating codebase documentation using LLMs

created 2 years ago
2,203 stars

Top 20.9% on sourcepulse

Project Summary

Autodoc is an experimental toolkit for automatically generating codebase documentation using Large Language Models (LLMs) like GPT-4. It indexes Git repositories by traversing files and using LLMs to create documentation, which is stored within the codebase itself. This allows developers to query their codebase for specific information and receive answers with direct code references, aiming to keep documentation synchronized with code changes via CI pipelines.

How It Works

Autodoc performs a depth-first traversal of a Git repository's contents. For each file, it calculates a token count and selects an LLM (currently only OpenAI models are supported) based on cost and context length, prioritizing GPT-4 for better accuracy. The generated documentation is stored locally in the .autodoc folder, which the CLI then uses to answer queries about the codebase.
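
The idea can be pictured with a short sketch. This is not Autodoc's actual implementation: the directory filter, the rough chars/4 token estimate, the context-window figures, and the cheapest-model-that-fits selection policy below are all illustrative assumptions.

    // Sketch: depth-first walk of a repository, picking a model per file
    // by estimated token count (illustrative only, not Autodoc's code).
    import { readdirSync, readFileSync, statSync } from "node:fs";
    import { join } from "node:path";

    // Assumed context windows; real limits depend on the OpenAI model version.
    const MODELS = [
      { name: "gpt-3.5-turbo", contextTokens: 4_096 },
      { name: "gpt-4", contextTokens: 8_192 },
    ];

    // Very rough token estimate (~4 characters per token).
    const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

    // Simplified policy: cheapest model whose context window fits the file.
    const selectModel = (tokens: number) =>
      MODELS.find((m) => tokens < m.contextTokens) ?? null;

    // Depth-first traversal, skipping common non-source directories.
    function walk(dir: string, onFile: (path: string) => void): void {
      for (const entry of readdirSync(dir)) {
        if (entry === ".git" || entry === "node_modules" || entry === ".autodoc") continue;
        const path = join(dir, entry);
        if (statSync(path).isDirectory()) walk(path, onFile);
        else onFile(path);
      }
    }

    walk(process.cwd(), (path) => {
      const tokens = estimateTokens(readFileSync(path, "utf8"));
      const model = selectModel(tokens);
      console.log(`${path}: ~${tokens} tokens -> ${model?.name ?? "too large for one call"}`);
    });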

Quick Start & Requirements

  • Install globally via npm: npm install -g @context-labs/autodoc
  • Requires Node.js v18.0.0+ (v19.0.0+ recommended).
  • Requires an OpenAI API key set as an environment variable: export OPENAI_API_KEY=<YOUR_KEY_HERE>.
  • Indexing command: doc index
  • Querying command: doc q
  • Official documentation: https://github.com/context-labs/autodoc
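
To keep the generated docs in sync with code changes (the CI use case mentioned above), the documented commands can be wired into a small script. Only the CLI commands and the OPENAI_API_KEY variable come from the quick start; the script itself is an assumption, and whether doc index runs fully non-interactively in CI is not covered here.

    // Sketch of automating re-indexing with the documented commands.
    import { execSync } from "node:child_process";

    if (!process.env.OPENAI_API_KEY) {
      throw new Error("OPENAI_API_KEY must be set before indexing");
    }

    // Install the CLI and re-index the repository so .autodoc stays current.
    execSync("npm install -g @context-labs/autodoc", { stdio: "inherit" });
    execSync("doc index", { stdio: "inherit" });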

Highlighted Details

  • Generates documentation stored directly within the codebase.
  • Supports querying the codebase via a CLI tool.
  • Estimates indexing costs before execution.
  • Future support planned for self-hosted models (Llama, Alpaca) and a web version.

Maintenance & Community

  • Active development with a core team.
  • Community channels: Discord, Twitter.
  • Open to contributions.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Primarily targets Node.js environments. Requires OpenAI API access.

Limitations & Caveats

Autodoc is in early development and not production-ready. The README notes that response quality can vary, and a "naive model selection strategy" may use less accurate GPT-3.5 for smaller files. Indexing large projects can be costly, with estimates in the hundreds of dollars.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 83 stars in the last 90 days
