llm-tldr by parcadei

Code analysis and context optimization for LLMs

Created 6 months ago

1,169 stars

Top 32.5% on SourcePulse

1 Expert Loves This Project

Didier Lopes

Founder of OpenBB

Project Summary

LLM-TLDR extracts structured information from large codebases so that AI agents can query them with far fewer tokens and much lower latency. It targets developers and AI systems that need to understand, refactor, or debug code without overflowing an LLM's context window. The primary benefit is that LLMs can work with codebases far larger than their native token limits.

How It Works

The project employs a multi-layered analysis approach, building five distinct layers: Abstract Syntax Tree (AST), Call Graph, Control Flow Graph (CFG), Data Flow Graph (DFG), and Program Dependence Graph (PDG). This granular analysis allows for task-specific depth, from simple structure browsing to complex data flow tracing. These layers feed into a semantic indexing system using 1024-dimensional embeddings generated by bge-large-en-v1.5 and stored in FAISS. This enables natural language search based on code behavior rather than just keywords. A background daemon maintains in-memory indexes for near-instantaneous queries.
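To make the first two layers concrete, here is a minimal sketch of deriving a call graph from an AST. It uses Python's standard-library ast module as a stand-in for the project's tree-sitter parsing, and the function name build_call_graph and the sample source are assumptions made for illustration, not part of llm-tldr.

```python
import ast

# Sample source to analyze (hypothetical, for illustration only).
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Functions that call nothing still get an (empty) entry.
            calls = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only direct name calls such as f(x); method calls like
                # obj.method() are skipped in this simplified sketch.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
    return graph

call_graph = build_call_graph(SOURCE)
```

Here main maps to parse and load, while load maps only to the builtin open. A structured summary like this is far smaller than the source text it describes, which is the general idea behind the token savings, although the project's actual output format may differ.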

Quick Start & Requirements

Install: pip install llm-tldr
Prerequisites: Python, tree-sitter (handled by pip), bge-large-en-v1.5, faiss-cpu.
Setup: Run tldr warm . to index a project. Indexing takes approximately 30-60 seconds for a typical project, after which queries are sub-100ms.
Documentation: The README refers to full documentation, but no link is provided.

Highlighted Details

Achieves up to 95% token savings for LLM context and 89% for codebase overviews.
Reduces query latency from ~30 seconds to ~100ms via an in-memory daemon.
Supports integration with AI tools like Claude Desktop and Claude Code via MCP.
Enables semantic search for code by behavior, e.g., finding "validate JWT tokens" without exact text matches.
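Behavior-based search of this kind comes down to nearest-neighbor lookup over embedding vectors. The sketch below shows exact inner-product search over normalized vectors in plain Python. The 3-dimensional vectors, the function names in the index, and the query vector are all made up for illustration; the real system uses 1024-dimensional bge-large-en-v1.5 embeddings stored in FAISS.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def search(index, query, k=2):
    """Return the k entry names whose vectors score highest against the query."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings; a real index would hold model-generated vectors.
index = {
    "verify_access_token": normalize([0.9, 0.1, 0.0]),
    "render_invoice_pdf": normalize([0.0, 0.2, 0.9]),
    "check_session_expiry": normalize([0.7, 0.6, 0.1]),
}

# Stand-in for the embedding of the query "validate JWT tokens".
query_vec = [1.0, 0.2, 0.0]
results = search(index, query_vec)
```

The top result is verify_access_token even though the query shares no words with that name, because ranking depends only on vector similarity.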

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source linking.

Limitations & Caveats

The daemon requires explicit notification of file changes (e.g., via git hooks or editor integrations) to maintain index freshness, though it auto-rebuilds after a configurable threshold of changes. Monorepo support requires specific configuration files (.claude/workspace.json). The README does not detail any alpha status, known bugs, or unsupported platforms.
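The notify-then-rebuild behavior described above can be sketched as a small tracker that counts distinct changed files and triggers a rebuild once a threshold is reached. The class name IndexFreshness, its methods, and the threshold value are hypothetical; the README summary does not describe the daemon's actual interface.

```python
class IndexFreshness:
    """Hypothetical tracker: collects changed files, rebuilds past a threshold."""

    def __init__(self, rebuild, threshold=3):
        self.rebuild = rebuild      # callback that receives the changed paths
        self.threshold = threshold  # distinct changed files before a rebuild
        self.dirty = set()

    def notify(self, path):
        """Record a changed file; return True if this call triggered a rebuild."""
        self.dirty.add(path)
        if len(self.dirty) >= self.threshold:
            self.rebuild(sorted(self.dirty))
            self.dirty.clear()
            return True
        return False

# Usage: a git hook or editor plugin would call notify() for each saved file.
rebuilds = []
tracker = IndexFreshness(rebuild=rebuilds.append, threshold=3)
outcomes = [tracker.notify(p) for p in ["a.py", "b.py", "a.py", "c.py"]]
```

Repeated edits to the same file count once, so the rebuild fires only on the fourth notification, when a third distinct file changes.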

Health Check

Last Commit

5 months ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days