primeqa  by primeqa

Open-source repo for multilingual question answering research

created 3 years ago
736 stars

Top 48.0% on sourcepulse

Project Summary

PrimeQA is an open-source toolkit for multilingual question answering (QA) research and development. Researchers and developers can use it to train QA models, replicate published experiments, and deploy the results. It covers information retrieval, machine reading comprehension, question generation, and retrieval-augmented generation, and is aimed at NLP researchers and developers who want to build and experiment with QA systems.

How It Works

PrimeQA combines several QA techniques. For document and passage retrieval it offers both traditional (BM25) and neural (ColBERT, DPR) information retrieval. For multilingual machine reading comprehension it uses models such as XLM-R for answer extraction and generation, and it supports multilingual question generation for domain adaptation. It also supports retrieval-augmented generation (RAG), in which a large language model such as GPT-3 or ChatGPT generates an answer conditioned on the retrieved context. Because the components are modular, pipelines can be assembled and rearranged flexibly for experimentation.
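
The retrieve-then-generate flow described above can be illustrated with a small, self-contained sketch. It does not use PrimeQA's API: the function names (tokenize, bm25_rank, build_prompt), the prompt wording, and the parameter defaults are all assumptions made for illustration, and the scoring is a common textbook form of Okapi BM25 rather than the implementation PrimeQA relies on. No language model is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, passages, k1=1.5, b=0.75):
    """Return passage indices ordered from highest to lowest BM25 score.

    Hypothetical helper: a textbook Okapi BM25 with a non-negative IDF variant.
    """
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    if n == 0:
        return []
    # Average passage length; fall back to 1 if every passage is empty.
    avg_len = (sum(len(d) for d in docs) / n) or 1
    # Document frequency: in how many passages each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def build_prompt(question, passages, top_k=2):
    """Assemble a prompt that conditions generation on the top-ranked passages."""
    top = bm25_rank(question, passages)[:top_k]
    context = "\n".join(f"[{rank + 1}] {passages[i]}" for rank, i in enumerate(top))
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
        "Bananas are yellow.",
    ]
    print(build_prompt("capital of France", corpus, top_k=1))
```

In a real pipeline the lexical ranker would be replaced by one of the retrievers named above (BM25, ColBERT, or DPR) and the prompt would be passed to a reader or generator model; the sketch only shows how the stages connect.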

Quick Start & Requirements

  • Installation (from a source checkout): pip install . for a minimal install, pip install .[gpu] for GPU support, or pip install -e .[all] for an editable, full install.
  • Prerequisites: Python and PyTorch (check that the PyTorch build matches your CUDA version if you use a GPU). BM25 retrieval requires Java 11, which can be installed with conda install -c conda-forge openjdk=11. For better performance, consider installing faiss and faiss-gpu from conda-forge and adjusting setup.py to match; the README notes that such changes to setup.py are not officially supported.
  • Resources: GPU recommended for advanced models.
  • Links: Documentation, Quick Tour, Tutorials, GPT-3/ChatGPT Notebooks, Examples.
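
Before installing, it can help to check which of the prerequisites listed above are already present. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites and the report keys are hypothetical, and the check only confirms that a java executable is on the PATH, not that it is version 11.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which prerequisites appear to be available (hypothetical helper)."""
    return {
        # Interpreter version as "major.minor.micro".
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # find_spec returns None when a top-level module cannot be found.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "faiss_installed": importlib.util.find_spec("faiss") is not None,
        # Only checks that some java binary is on PATH; the version is not verified.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

A missing torch or java entry does not block the minimal install, but according to the prerequisites above, Java is needed for BM25 retrieval and PyTorch for the neural models.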

Highlighted Details

  • Reports top leaderboard positions on XOR-TyDi, TyDiQA-main, OTT-QA, and HybridQA.
  • Enables replication of experiments from leading NLP conferences.
  • Supports end-to-end QA pipelines, including RAG with LLMs.
  • Collaborations with Stanford NLP, IBM Research, and multiple universities.

Maintenance & Community

PrimeQA is a collaborative effort involving Stanford NLP, IBM Research, and numerous universities. Community engagement is encouraged via pull requests.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that modifying the dependencies in setup.py when installing from source is not officially supported. It does not give model sizes, training times, or hardware requirements for reproducing its state-of-the-art results.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
