LaTeXTrans by NiuTrans

LaTeX document translation system

Created 2 months ago
408 stars

Top 71.3% on SourcePulse

Project Summary

LaTeXTrans translates the content of LaTeX documents, particularly arXiv papers, into other natural languages. Traditional PDF translation tools often break formulas and formatting; by working on the LaTeX source instead, LaTeXTrans gives researchers and students translated papers that stay faithful to the original layout and terminology.

How It Works

LaTeXTrans translates LaTeX sources directly using a structured, multi-agent workflow. Large Language Models (LLMs) process pre-parsed LaTeX code through six specialized agents: Parser, Translator, Validator, Summarizer, Terminology Extractor, and Generator. The result is end-to-end conversion from an arXiv paper ID to a translated PDF that preserves mathematical formulas and cross-references and keeps terminology consistent.
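To make the idea of translating pre-parsed LaTeX concrete, here is a minimal sketch of a mask, translate, validate, restore loop. It is not the project's actual code: the function names, the placeholder format, and the way the stages hand data to each other are all assumptions made for illustration. The LLM call is replaced by a toy word-substitution function, and the Summarizer and Terminology Extractor are reduced to a glossary dictionary that the caller supplies.

```python
import re
from typing import Callable, Dict, Tuple

# Spans that must survive translation untouched: inline math and a few
# cross-reference commands. A real parser would cover far more of LaTeX.
PROTECTED = re.compile(r"\$[^$]+\$|\\(?:ref|cite|label)\{[^}]*\}")


def parse(source: str) -> Tuple[str, Dict[str, str]]:
    """Parser stage (hypothetical): swap protected spans for placeholders."""
    table: Dict[str, str] = {}

    def stash(match: re.Match) -> str:
        key = f"<<P{len(table)}>>"
        table[key] = match.group(0)
        return key

    return PROTECTED.sub(stash, source), table


def validate(translated: str, table: Dict[str, str]) -> bool:
    """Validator stage (hypothetical): each placeholder appears exactly once."""
    return all(translated.count(key) == 1 for key in table)


def generate(translated: str, table: Dict[str, str]) -> str:
    """Generator stage (hypothetical): put the original LaTeX spans back."""
    for key, original in table.items():
        translated = translated.replace(key, original)
    return translated


def translate_document(
    source: str,
    translate: Callable[[str, Dict[str, str]], str],
    glossary: Dict[str, str],
) -> str:
    """Run the stages in order; `translate` stands in for the LLM call."""
    masked, table = parse(source)
    translated = translate(masked, glossary)
    if not validate(translated, table):
        raise ValueError("translation dropped or duplicated a protected span")
    return generate(translated, table)


def toy_translate(text: str, glossary: Dict[str, str]) -> str:
    """Stand-in for an LLM: applies the glossary by plain substitution."""
    for term, target in glossary.items():
        text = text.replace(term, target)
    return text


if __name__ == "__main__":
    src = r"The energy $E = mc^2$ is derived in Section \ref{sec:proof}."
    print(translate_document(src, toy_translate, {"energy": "Energie"}))
```

Passing the glossary into every translation call is one simple way to keep a term's rendering the same across sections. The validation step catches a translator that silently drops a formula or reference before the document reaches compilation.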

Quick Start & Requirements

To install, clone the repository, change into its directory, and install the dependencies with pip install -r requirements.txt. PDF compilation requires a LaTeX distribution such as MiKTeX (with Strawberry Perl support) or TeX Live. Users must set the API key and base URL of their chosen LLM in config/default.toml. To start a translation, pass an arXiv paper ID to python main.py --arxiv ${xxxx}, where ${xxxx} stands for the ID.
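A small wrapper can check that the value passed to --arxiv looks like an arXiv identifier before the translation starts. The sketch below is an illustration, not part of LaTeXTrans: the helper names is_arxiv_id and build_command are hypothetical, and whether the tool accepts old-style identifiers or version suffixes is an assumption. Only the main.py --arxiv invocation comes from the README.

```python
import re
import sys
from typing import List

# New-style identifiers: YYMM.NNNN or YYMM.NNNNN, with an optional version.
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style identifiers: archive[.SUBJECT]/YYMMNNN, e.g. hep-th/9901001.
OLD_STYLE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def is_arxiv_id(candidate: str) -> bool:
    """Return True if the string has the shape of an arXiv identifier."""
    return bool(NEW_STYLE.match(candidate) or OLD_STYLE.match(candidate))


def build_command(arxiv_id: str) -> List[str]:
    """Build the argv list for the documented main.py --arxiv invocation."""
    if not is_arxiv_id(arxiv_id):
        raise ValueError(f"not an arXiv identifier: {arxiv_id!r}")
    # sys.executable is used in place of a bare "python" so the command
    # runs under the same interpreter as this script.
    return [sys.executable, "main.py", "--arxiv", arxiv_id]


if __name__ == "__main__":
    # Prints the command; pass the list to subprocess.run from inside the
    # cloned repository to launch the translation.
    print(build_command(sys.argv[1]))
```

Building the command as a list rather than a shell string avoids quoting problems if the identifier contains a slash, as old-style identifiers do.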

Highlighted Details

  • Preserves the integrity of mathematical formulas, document layout, and cross-references during translation.
  • Ensures consistent terminology translation across the document.
  • Provides end-to-end conversion from arXiv paper IDs to fully compiled translated PDFs.
  • Utilizes a novel multi-agent system for structured LaTeX source translation.

Maintenance & Community

The project's authors are listed in the provided citation. No specific community channels (e.g., Discord, Slack) or roadmap details are present in the README.

Licensing & Compatibility

The README does not specify a software license. This omission requires clarification for adoption decisions, especially concerning commercial use or integration with closed-source projects.

Limitations & Caveats

The system's LaTeX compilation pipeline is currently optimized primarily for English-to-Chinese translations; errors may occur in the final PDF output when translating to other languages. Configuration of external LLM API keys and base URLs is mandatory.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5

Star History

127 stars in the last 30 days
