annotateai by neuml

CLI tool for automated paper annotation using LLMs

created 7 months ago
332 stars

Top 83.7% on sourcepulse

Project Summary

This project provides an automated system for annotating research papers using Large Language Models (LLMs). It aims to enhance the reading experience by offering concise topic summaries and highlighting key sections within documents, particularly benefiting researchers and students working with scientific literature.

How It Works

The system processes PDF documents by identifying the paper's title and key concepts. It then iterates through each page, locating sections that best exemplify these concepts. For each relevant section, it generates a brief topic summary, effectively annotating the paper with contextual information. This approach leverages LLMs to distill complex information and provide targeted insights during the reading process.
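The flow described above can be sketched roughly as follows. This is a minimal illustration, not annotateai's actual implementation: the function name `annotate_pages`, the prompts, and the `llm` callable (any function that takes a prompt string and returns text) are all hypothetical, and the PDF is assumed to have already been converted to one text string per page. "Best exemplifies" is approximated here by a naive count of keyword occurrences per paragraph, which the real tool may well do differently.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an LLM call: prompt in, text out.
LLM = Callable[[str], str]
Annotation = Tuple[int, str, str]  # (page number, section text, topic summary)


def annotate_pages(
    pages: List[str], llm: LLM, keywords: Optional[List[str]] = None
) -> Tuple[str, List[Annotation]]:
    """Sketch of the title -> concepts -> per-page annotation loop."""
    text = "\n".join(pages)

    # Step 1: identify the paper's title.
    title = llm(f"Title of this paper:\n{text[:2000]}").strip()

    # Step 2: identify key concepts, unless the caller supplied keywords.
    if keywords is None:
        reply = llm(f"List key concepts, comma separated, for '{title}':\n{text[:2000]}")
        keywords = [k.strip() for k in reply.split(",") if k.strip()]

    annotations: List[Annotation] = []
    for number, page in enumerate(pages, start=1):
        # Treat blank-line-separated paragraphs as candidate sections (assumption).
        sections = [s.strip() for s in page.split("\n\n") if s.strip()]
        for keyword in keywords:
            needle = keyword.lower()
            # Step 3: pick the section with the most mentions of the concept.
            best = max(sections, key=lambda s: s.lower().count(needle), default=None)
            if best is None or needle not in best.lower():
                continue  # concept does not appear on this page
            # Step 4: ask the LLM for a brief topic summary of that section.
            topic = llm(f"Summarize in a few words how this relates to '{keyword}':\n{best}").strip()
            annotations.append((number, best, topic))
    return title, annotations
```

Passing `keywords=[...]` skips the concept-extraction step, which mirrors the custom keyword input the project advertises. The resulting tuples would then be written back onto the PDF as highlights, a step this sketch leaves out.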

Quick Start & Requirements

  • Install via pip: pip install annotateai
  • Python 3.10+ required.
  • For local LLM execution, autoawq[kernels] or llama-cpp-python may be needed depending on the model and OS.
  • Supports various LLMs, including API-based models (GPT-4o, Claude 3.5 Sonnet), Ollama endpoints, and Hugging Face GGUF models.
  • Docker image available: docker run -d --gpus=all -it -p 8501:8501 neuml/annotateai
  • Official documentation: the "Introducing AnnotateAI" article

Highlighted Details

  • Works with any PDF, with optimized performance for medical and scientific papers from sources like arXiv, PubMed, bioRxiv, and medRxiv.
  • Supports custom keyword input for targeted annotation.
  • Offers a Dockerized web application for easy deployment.
  • Integrates with the txtai library, supporting a wide range of LLMs.

Maintenance & Community

The project is maintained by NeuML. Further details on community engagement or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is primarily focused on PDF documents and may not support other formats. Specific LLM compatibility and performance can vary. The README does not detail any known bugs or deprecation plans.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 18 stars in the last 90 days
