llm-tldr by parcadei

Code analysis and context optimization for LLMs

Created 1 month ago
841 stars

Top 42.1% on SourcePulse

Project Summary

LLM-TLDR extracts structured information from large codebases so that AI agents can query them efficiently, sharply reducing token usage and query latency. It targets developers and AI systems that need to understand, refactor, or debug code without overflowing an LLM's context window. The primary benefit is that LLMs can work with codebases far larger than their native token limits.

How It Works

The project analyzes code in five layers: Abstract Syntax Tree (AST), Call Graph, Control Flow Graph (CFG), Data Flow Graph (DFG), and Program Dependence Graph (PDG). Each task can use only the depth it needs, from simple structure browsing to full data flow tracing. The layers feed a semantic index built from 1024-dimensional embeddings generated by bge-large-en-v1.5 and stored in FAISS, which enables natural language search by code behavior rather than by keyword alone. A background daemon keeps the indexes in memory so that queries return in roughly 100ms.
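As a rough illustration of the first two layers, the sketch below parses a small module into an AST and derives a call graph from it. It uses Python's standard ast module rather than tree-sitter, and the names SOURCE and build_call_graph are hypothetical; this is not the project's own implementation.

```python
import ast

# Hypothetical sample module to analyze.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''


def build_call_graph(source):
    """Map each top-level function to the plain names it calls.

    Method calls such as text.split() are skipped: resolving them
    needs type information that a bare AST does not carry.
    """
    tree = ast.parse(source)  # layer 1: the AST
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls  # layer 2: call graph edges
    return graph


call_graph = build_call_graph(SOURCE)
for caller, callees in call_graph.items():
    print(caller, "->", sorted(callees))
```

A structure like this is far smaller than the source it summarizes, which is the general idea behind sending extracted structure to an LLM instead of raw files.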

Quick Start & Requirements

  • Install: pip install llm-tldr
  • Prerequisites: Python, tree-sitter (handled by pip), bge-large-en-v1.5, faiss-cpu.
  • Setup: Run tldr warm . to index a project. This takes approximately 30-60 seconds for a typical project, after which queries return in under 100ms.
  • Documentation: The README refers to full documentation, but no link is provided.

Highlighted Details

  • Achieves up to 95% token savings for LLM context and 89% for codebase overviews.
  • Reduces query latency from ~30 seconds to ~100ms via an in-memory daemon.
  • Supports integration with AI tools like Claude Desktop and Claude Code via MCP.
  • Enables semantic search for code by behavior, e.g., finding "validate JWT tokens" without exact text matches.
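The behavior-based search in the last bullet comes down to nearest-neighbor lookup over embedding vectors. The sketch below shows the idea with a brute-force cosine similarity search in plain Python. The three-dimensional vectors, the function names in the index, and the query vector are all made up for illustration; the real system is described as using 1024-dimensional bge-large-en-v1.5 embeddings stored in FAISS.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(index, query_vec, k=1):
    """Return the k entry names whose vectors are most similar to the query."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy index: hypothetical function names mapped to invented embeddings.
index = {
    "verify_access_token": normalize([0.9, 0.1, 0.0]),
    "render_template": normalize([0.0, 0.2, 0.9]),
    "parse_config": normalize([0.1, 0.9, 0.1]),
}

# Stand-in for the embedding of the query "validate JWT tokens".
query = [0.8, 0.2, 0.1]

top = search(index, query, k=1)
top_two = search(index, query, k=2)
print(top)
```

The query matches verify_access_token even though the two share no words, because the comparison happens between vectors rather than between strings.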

Maintenance & Community

The README gives no specific details about maintainers, community channels (such as Discord or Slack), or a roadmap.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source linking.

Limitations & Caveats

The daemon must be notified explicitly of file changes (e.g., via git hooks or editor integrations) to keep its index fresh, though it rebuilds automatically after a configurable threshold of changes. Monorepo support requires a specific configuration file (.claude/workspace.json). The README does not mention alpha status, known bugs, or unsupported platforms.
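The notify-then-rebuild behavior described above can be sketched as a small tracker that counts distinct changed files and triggers a rebuild once a threshold is reached. The class name FreshnessTracker, its methods, and the threshold of 3 are assumptions made for illustration; the README summary does not specify the daemon's actual interface or default threshold.

```python
class FreshnessTracker:
    """Hypothetical sketch: track changed files and rebuild past a threshold."""

    def __init__(self, rebuild_threshold, rebuild):
        self.rebuild_threshold = rebuild_threshold
        self.rebuild = rebuild  # callback that receives the changed paths
        self.dirty = set()

    def notify_changed(self, path):
        """Record a changed file; return True if this triggered a rebuild."""
        self.dirty.add(path)  # repeated edits to one file count once
        if len(self.dirty) >= self.rebuild_threshold:
            self.rebuild(sorted(self.dirty))
            self.dirty.clear()
            return True
        return False


rebuilds = []
tracker = FreshnessTracker(rebuild_threshold=3, rebuild=rebuilds.append)

# A git hook or editor integration would make calls like these.
results = [tracker.notify_changed(p) for p in ["a.py", "b.py", "a.py", "c.py"]]
print(results, rebuilds)
```

Without the notifications nothing is marked dirty, so the index silently goes stale, which is why the caveat matters in practice.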

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 2
  • Star History: 138 stars in the last 30 days
