json_repair by mangiucugna

JSON repair tool for LLM outputs

Created 2 years ago

4,303 stars

Top 11.3% on SourcePulse


3 Experts Love This Project

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Philipp Schmid

DevRel at Google DeepMind

Stas Bekman

Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake

Project Summary

This Python module addresses a common problem: Large Language Models (LLMs) often emit malformed JSON. It repairs syntactically invalid JSON strings so that standard parsers, such as Python's json module, can load them. It is aimed at developers who consume LLM-generated JSON and need it to parse reliably, and it offers a lightweight way to handle the errors LLMs typically make.

How It Works

The library takes a heuristic approach. It walks the input following the BNF grammar of JSON and, wherever the text deviates from that grammar (missing quotes, misplaced commas, unescaped characters, incomplete structures), applies a simple fix: adding missing delimiters, quoting unquoted strings, or removing extraneous characters and whitespace.
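To make the idea concrete, here is a small stdlib-only sketch of grammar-aware repair. It is a hypothetical illustration, not the library's implementation: naive_repair and _strip_trailing_comma are invented names, and the sketch handles only three error classes (trailing commas, stray closing brackets, and truncated input), whereas the library covers many more.

```python
import json


def _strip_trailing_comma(out: list[str]) -> None:
    """Remove a trailing comma (and any whitespace after it) from `out`."""
    i = len(out)
    while i > 0 and out[i - 1].isspace():
        i -= 1
    if i > 0 and out[i - 1] == ",":
        del out[i - 1:]


def naive_repair(text: str) -> str:
    """Hypothetical sketch of heuristic JSON repair (not json_repair's code).

    Handles three error classes: trailing commas, stray closing brackets,
    and truncated input (unterminated strings and unclosed brackets).
    """
    out: list[str] = []    # characters of the repaired document
    stack: list[str] = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            _strip_trailing_comma(out)  # e.g. [1, 2,] -> [1, 2]
            if ch not in stack:
                continue  # stray closer that matches nothing: drop it
            while stack[-1] != ch:
                out.append(stack.pop())  # close inner structures left open
            stack.pop()
        out.append(ch)
    if in_string:
        if escaped:
            out.pop()  # drop a dangling backslash so the quote is not escaped
        out.append('"')  # terminate a string cut off mid-value
    _strip_trailing_comma(out)
    while stack:
        out.append(stack.pop())  # close everything still open
    return "".join(out)


for broken in ['{"a": [1, 2,], "b": "x",}', '{"name": "Al', '[1, 2]]']:
    print(json.loads(naive_repair(broken)))
```

Tracking whether the scanner is inside a string is what keeps commas and brackets inside string values from being treated as structure. A real repairer also needs rules for cases such as unquoted keys or missing values after a colon, which this sketch deliberately omits.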

Quick Start & Requirements

Install with: pip install json-repair
Usage: from json_repair import repair_json, then call repair_json() on the malformed string to get back a valid JSON string.
An online demo for trying inputs is available at: https://mangiucugna.github.io/json_repair/

Highlighted Details

Supports fixing syntax errors, malformed arrays/objects, and auto-completing missing values.
Offers drop-in replacements for json.loads() and json.load() via json_repair.loads() and json_repair.load().
Includes CLI support via pipx install json-repair.
Preserves non-Latin characters (for example Chinese, Japanese, or Korean text) in the output when repair_json() is called with ensure_ascii=False.
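repair_json() returns serialized JSON text, and its ensure_ascii flag is described as preserving non-Latin characters, which is the meaning of the same-named parameter in the standard library's json.dumps(). Assuming the two behave alike, this stdlib-only snippet shows the difference the flag makes; it does not call json_repair itself.

```python
import json

data = {"greeting": "统一码", "city": "Zürich"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
# ensure_ascii=False: the original characters are kept in the output text.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object; only the textual form differs, which matters when the repaired JSON is shown to users or stored as human-readable text.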

Maintenance & Community

The project follows strict semantic versioning and TDD, with frequent updates and no breaking changes in minor or patch versions. Users are advised to pin the dependency as json_repair==0.* so that they keep receiving fixes without breaking changes.

Licensing & Compatibility

The library is available under a permissive license, suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library does not cover every possible form of JSON corruption, and users are encouraged to contribute examples or pull requests for unhandled edge cases. The skip_json_loads=True option skips the initial attempt to parse the input with the standard json.loads(), so it should only be used when the input is known to be invalid JSON.
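The reasoning behind skip_json_loads is a "strict parse first, repair only on failure" pattern: valid input takes the fast standard-library path, and the skip flag only saves time when that first attempt is certain to fail. Below is a stdlib-only sketch of the pattern. loads_with_fallback, skip_strict, and toy_repair are hypothetical names, and toy_repair stands in for the library's real repair parser.

```python
import json
from typing import Any, Callable


def loads_with_fallback(
    text: str, repair: Callable[[str], str], skip_strict: bool = False
) -> Any:
    """Hypothetical helper: try the strict parser first, repair only on failure.

    `skip_strict` mimics the idea behind skip_json_loads=True: when the caller
    already knows `text` is invalid, the doomed json.loads() attempt is skipped.
    """
    if not skip_strict:
        try:
            return json.loads(text)  # fast path for already-valid JSON
        except json.JSONDecodeError:
            pass
    return json.loads(repair(text))  # slow path: repair, then parse


calls: list[str] = []


def toy_repair(s: str) -> str:
    """Toy repair for this demo only: append one missing ']'."""
    calls.append(s)
    return s + "]"


print(loads_with_fallback("[1, 2]", toy_repair))  # valid: repair not called
print(loads_with_fallback("[1, 2", toy_repair))   # invalid: repaired
print(len(calls))
```

If skip_strict were set on input that is already valid, the repair step would still run on it, which is wasted work at best; that is why the option only makes sense for input known to be broken.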

Health Check

Last Commit

1 day ago

Responsiveness

1 day

Star History

169 stars in the last 30 days