paperai  by neuml

AI for scientific paper analysis and report generation

created 5 years ago
1,449 stars

Top 28.8% on sourcepulse

Project Summary

paperai is an AI-powered application for semantic search and workflow automation over medical and scientific papers. It targets researchers and data scientists who need to generate reports and extract insights from large document repositories using LLMs and Retrieval Augmented Generation (RAG).

How It Works

paperai runs a RAG pipeline built on txtai embeddings. It first indexes articles: each one is parsed into sections, and the sections are stored with their metadata. Embeddings are then generated over the entire corpus so the sections can be searched semantically. When a query is run, the system retrieves the relevant document sections, passes them as context to an LLM with a configurable prompt, and generates structured outputs such as reports or annotated PDFs. This makes bulk LLM inference and automated data extraction from research papers possible.
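The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not paperai's or txtai's actual API: word-count cosine similarity stands in for real embeddings, and the names `retrieve`, `build_prompt`, the prompt template, and the sample sections are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, sections, k=2):
    """Return up to k sections ranked by similarity to the query.

    Assumption: word counts stand in for the embeddings a real
    system would use; sections with zero similarity are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s["text"]))), s) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


def build_prompt(question, context_sections):
    """Fill a hypothetical prompt template with the retrieved context."""
    context = "\n".join(
        f"[{s['paper']} / {s['section']}] {s['text']}" for s in context_sections
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sections, as an indexer might store them with metadata.
sections = [
    {"paper": "A", "section": "Methods",
     "text": "Patients received a daily dose of 200 mg for ten days."},
    {"paper": "A", "section": "Results",
     "text": "Median recovery time was eleven days in the treatment group."},
    {"paper": "B", "section": "Introduction",
     "text": "Graph neural networks model relational structure."},
]

question = "What was the recovery time?"
hits = retrieve(question, sections)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real pipeline the resulting prompt would be sent to the configured LLM; that call is left out here because it depends on the chosen backend.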

Quick Start & Requirements

  • Install via pip: pip install paperai
  • Requires Python 3.10+.
  • Docker image available.
  • See examples for notebooks and applications.

Highlighted Details

  • Supports bulk LLM inference and report generation in Markdown, CSV, or PDF annotations.
  • Enables dynamic column generation in reports driven by LLM questions and RAG queries.
  • Integrates txtai for embeddings and RAG pipelines, with configurable LLM backends.
  • Can process large datasets of scientific papers for automated research tasks.
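The idea of report columns driven by questions can be illustrated with a small sketch. It assumes a CSV report in which each column is defined by a name and a question, and it uses a keyword-overlap function as a stand-in for the LLM call. The names `build_report` and `stub_answer`, the column definitions, and the sample paper are hypothetical and are not taken from paperai's configuration format.

```python
import csv
import io
import re


def words(text):
    """Set of lowercase alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def stub_answer(question, text):
    """Hypothetical stand-in for an LLM call.

    Returns the sentence sharing the most words with the question.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(words(question) & words(s)), default="")


def build_report(papers, columns, answer):
    """Build a CSV report with one row per paper and one column per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["paper"] + [name for name, _ in columns])
    for paper in papers:
        row = [paper["title"]]
        row += [answer(question, paper["text"]) for _, question in columns]
        writer.writerow(row)
    return buffer.getvalue()


# Each column pairs a header with the question that fills it.
columns = [
    ("Sample size", "How many patients were enrolled?"),
    ("Recovery", "What was the recovery time?"),
]
papers = [
    {"title": "Trial A",
     "text": "The study enrolled 120 patients. Median recovery time was eleven days."},
]

report = build_report(papers, columns, stub_answer)
print(report)
```

Adding a column then only means adding another (name, question) pair; swapping `stub_answer` for a function that calls an LLM with retrieved context would leave the report-building code unchanged.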

Maintenance & Community

  • Developed by NeuML.
  • Recognized in articles for its application in COVID-19 research.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The annotation feature for PDFs requires the original PDF files to be present and accessible. The project's core functionality relies on the txtai library, and performance may vary based on the chosen LLM and embedding models.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 54 stars in the last 90 days
