CodeWiki by FSoft-AI4Code

AI for holistic code documentation

Created 5 months ago
354 stars

Top 78.9% on SourcePulse

Project Summary

CodeWiki is an open-source framework that automates the generation of holistic, architecture-aware documentation for large-scale, multilingual codebases. It captures not only individual functions but also their interdependencies and system-level interactions, giving developers and researchers a comprehensive, structured view of a codebase.

How It Works

CodeWiki employs a three-stage process:

  • Hierarchical Decomposition: partitions the codebase into modules using dynamic programming-inspired algorithms while preserving architectural context.
  • Recursive Multi-Agent System: processes these modules adaptively, delegating tasks dynamically so that the approach scales to large repositories.
  • Multi-Modal Synthesis: combines textual descriptions with visual artifacts such as architecture diagrams, data flows, and sequence diagrams.

Together, these stages are intended to handle codebases of arbitrary size while maintaining documentation quality at repository-level scope.
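
The decomposition and delegation stages can be illustrated with a minimal sketch. This is not CodeWiki's implementation: the Module class, the token budget, and the document function are hypothetical names, and a plain size threshold stands in for the dynamic programming-inspired partitioning mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical node in a module tree: a named group of source files."""
    name: str
    files: dict[str, int] = field(default_factory=dict)  # path -> size in tokens (assumed unit)
    children: list["Module"] = field(default_factory=list)

    def size(self) -> int:
        # Total size of this module, including everything nested below it.
        return sum(self.files.values()) + sum(c.size() for c in self.children)


def document(module: Module, budget: int, depth: int = 0) -> list[str]:
    """Return a log of which (simulated) agent handles which module."""
    indent = "  " * depth
    if module.size() <= budget or not module.children:
        # Fits in one agent's budget, or cannot be split further.
        return [f"{indent}leaf agent documents {module.name} ({module.size()} tokens)"]
    # Too large: delegate each child to a sub-agent, then summarize bottom-up.
    lines: list[str] = []
    for child in module.children:
        lines.extend(document(child, budget, depth + 1))
    lines.append(
        f"{indent}parent agent summarizes {module.name} "
        f"from {len(module.children)} sub-reports"
    )
    return lines


# Toy repository with made-up file sizes.
repo = Module(
    "repo",
    children=[
        Module("core", files={"a.py": 300, "b.py": 500}),
        Module("cli", files={"main.py": 200}),
    ],
)
report = document(repo, budget=900)
print("\n".join(report))
```

With a budget of 900 tokens the toy repository is split into two sub-reports that the parent then summarizes; raising the budget to 1000 lets a single agent handle it without delegation.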

Quick Start & Requirements

  • Install: pip install git+https://github.com/FSoft-AI4Code/CodeWiki.git
  • Prerequisites: Python 3.12+, Node.js (for Mermaid diagram validation), LLM API access (e.g., Anthropic Claude, OpenAI), Git.
  • Resources: Documentation & Guides, Docker Deployment, Development Guide, CodeWikiBench, Live Demo, Academic Paper.
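
Before installing, the local prerequisites listed above can be checked with a short script. This is a sketch and not part of CodeWiki: check_prerequisites is a hypothetical helper, the executable names node and git are assumptions about how those tools appear on PATH, and LLM API access is not checked because the required configuration is not described here.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 12), tools=("node", "git")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the required Python version.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # shutil.which returns None when an executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites found." if not issues else "\n".join(issues))
```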

Highlighted Details

  • Supports seven programming languages: Python, Java, JavaScript, TypeScript, C, C++, C#.
  • Generates comprehensive textual documentation and visual artifacts including system architecture diagrams, data flows, and sequence diagrams.
  • Evaluated on CodeWikiBench: overall average documentation quality score of 68.79%, 4.73 percentage points above DeepWiki, with notable gains in high-level languages.
  • Handles large codebases (tested up to 1.4M LOC) through its hierarchical decomposition and recursive agentic system.
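
To estimate how much of a repository falls within the seven supported languages, files can be grouped by extension. The sketch below is illustrative only: the extension table is an assumption (header files such as .h are left out because they are ambiguous between C and C++), and count_supported is a hypothetical helper, not a CodeWiki function.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed extension-to-language table for the seven languages listed above.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".java": "Java",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".c": "C",
    ".cpp": "C++",
    ".cc": "C++",
    ".cs": "C#",
}


def count_supported(paths):
    """Count files per supported language; unrecognized extensions are skipped."""
    counts = Counter()
    for path in paths:
        language = EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix.lower())
        if language is not None:
            counts[language] += 1
    return counts


sample = ["src/a.py", "src/b.PY", "lib/x.cpp", "README.md"]
print(count_supported(sample))
```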

Licensing & Compatibility

Licensed under the MIT License, which generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

Reported metrics show slightly lower documentation quality for systems languages (C, C++) than a baseline, which suggests room for further optimization in that category.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 7
  • Star history: 354 stars in the last 30 days
