GeneGPT  by ncbi

Research paper for tool-augmented LLM access to biomedical information

created 2 years ago
409 stars

Top 72.3% on sourcepulse

Project Summary

GeneGPT is a tool-augmented LLM designed to improve the accuracy and reliability of biomedical information retrieval by enabling LLMs to interact with NCBI Web APIs. It targets researchers and professionals in the life sciences who require precise answers to complex biological questions, offering a significant improvement over standard LLMs by reducing hallucinations and providing verifiable, tool-backed responses.

How It Works

GeneGPT uses in-context learning to teach an LLM to call external tools, specifically the NCBI Web APIs. During generation, the model decides when an API call is needed and writes the query; a novel decoding algorithm detects the call, executes it, and integrates the result into the response. This approach leverages the LLM's natural language understanding while grounding its answers in real-time, domain-specific data. On specialized tasks, GeneGPT outperforms both existing biomedical LLMs and general-purpose models.
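The loop described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `[url]->[result]` call syntax, the function `answer_with_tools`, and the `generate` and `fetch` callables are all assumptions made for this sketch. In practice `generate` would wrap an LLM completion call that stops right after `->`, and `fetch` would perform the HTTP request to an NCBI endpoint.

```python
import re
from typing import Callable

# Assumed call syntax (hypothetical): the model writes "[<url>]->" and the
# tool result is appended in brackets right after the arrow.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->$")


def answer_with_tools(
    prompt: str,
    generate: Callable[[str], str],
    fetch: Callable[[str], str],
    max_calls: int = 5,
) -> str:
    """Alternate between generation and API execution until no call is pending."""
    text = prompt
    for _ in range(max_calls):
        # Ask the model to continue; it is expected to stop after "->".
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            # No pending API call at the end of the text: generation is done.
            break
        # Execute the call and splice the result back into the context.
        result = fetch(match.group(1))
        text += f"[{result}]\n"
    # Return only what was produced after the prompt.
    return text[len(prompt):]
```

Passing `generate` and `fetch` as parameters keeps the sketch independent of any particular LLM client or HTTP library, and `max_calls` bounds the number of API round trips per question.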

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires an OpenAI API key, configured in config.py.
  • Run GeneGPT: python main.py <documentation_flags>, where the flags select which demonstrations and documentation go into the prompt (e.g., python main.py 111111 enables all of them).
  • Evaluate results: python evaluate.py ${RESULT_DIRECTORY}.
  • Tested with Python 3.9.13.
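As a rough illustration of how a flag string such as 111111 could be read, the sketch below assumes that each of the six characters switches one prompt component on or off. The component labels and the function `parse_flags` are placeholders invented for this example; the actual meaning and order of each position are defined by the repository.

```python
# Hypothetical component labels; the real order and meaning of each position
# come from the repository, not from this sketch.
COMPONENTS = ["doc_1", "doc_2", "demo_1", "demo_2", "demo_3", "demo_4"]


def parse_flags(mask: str) -> list[str]:
    """Return the labels whose position in the mask is set to '1'."""
    if len(mask) != len(COMPONENTS) or set(mask) - {"0", "1"}:
        raise ValueError(f"expected {len(COMPONENTS)} characters of 0 or 1, got {mask!r}")
    return [name for name, bit in zip(COMPONENTS, mask) if bit == "1"]


if __name__ == "__main__":
    print(parse_flags("111111"))  # every component enabled
    print(parse_flags("001001"))  # only two components enabled
```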

Highlighted Details

  • Achieves state-of-the-art performance on eight GeneTuring tasks with an average score of 0.83, compared with 0.44 for New Bing, 0.12 for ChatGPT, and 0.04 for BioGPT.
  • Demonstrations are more effective than documentation for in-context tool learning.
  • Generalizes to longer API call chains and multi-hop questions.
  • Provides task-specific error analysis for future improvements.

Maintenance & Community

  • Supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.
  • Citation available via provided BibTeX entry.

Licensing & Compatibility

  • The repository does not explicitly state a license. Its disclaimer says that it presents research results from NCBI/NLM and is not intended for direct diagnostic use. Commercial use or linking with closed-source projects may require clarification from the maintainers.

Limitations & Caveats

  • Requires an OpenAI API key, implying costs and dependency on OpenAI's services.
  • The disclaimer highlights that the tool's output is for research purposes and not for direct diagnostic or medical decision-making.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 90 days
