GeneGPT by ncbi

Research paper for tool-augmented LLM access to biomedical information

Created 2 years ago
412 stars

Top 71.0% on SourcePulse

Project Summary

GeneGPT is a tool-augmented LLM that improves the accuracy and reliability of biomedical information retrieval by letting the model call NCBI Web APIs. It targets researchers and professionals in the life sciences who need precise answers to complex biological questions. Compared with a standard LLM, it reduces hallucinations because its answers are backed by tool output that can be checked.

How It Works

GeneGPT uses in-context learning to teach an LLM how to use external tools, specifically the NCBI Web APIs. A novel decoding algorithm detects when the model needs an API call, constructs the query, executes it, and feeds the result back into the text the model is generating. The approach keeps the LLM's natural language understanding while grounding its answers in live, domain-specific data. On specialized tasks, it outperforms both existing biomedical LLMs and general-purpose models.
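
The detect, execute, and integrate loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `[URL]->` call marker, the function names (`augmented_decode`, `fake_generate`, `fake_fetch`), and the canned JSON reply are all assumptions made for this example. The `generate` callable stands in for an LLM completion call that stops right after emitting `->`, and `fetch` stands in for an HTTP GET, so the sketch runs without network access or an API key.

```python
import re
from typing import Callable

# Assumed marker for this sketch: the model requests a tool call by
# writing "[<url>]->" and then stopping.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->\s*$")


def augmented_decode(
    prompt: str,
    generate: Callable[[str], str],
    fetch: Callable[[str], str],
    max_calls: int = 5,
) -> str:
    """Alternate between model generation and API execution."""
    text = prompt
    for _ in range(max_calls):
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            # The model finished without requesting a call: final answer.
            return text
        # Execute the requested call and append the raw result so the
        # next generation step can read it.
        text += "[" + fetch(match.group(1)) + "]\n"
    return text


# Hypothetical stand-ins so the sketch is runnable offline.
def fake_generate(text: str) -> str:
    if "esearch" not in text:
        return ("[https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
                "esearch.fcgi?db=gene&term=LMP10&retmode=json]->")
    return "Answer: PSMB10"


def fake_fetch(url: str) -> str:
    # Canned reply for illustration only, not a real API response.
    return '{"esearchresult": {"idlist": ["5699"]}}'
```

Calling `augmented_decode("Question: What is the official gene symbol of LMP10?\n", fake_generate, fake_fetch)` performs one simulated API call and then returns the transcript ending in the model's answer. The `max_calls` cap is a design choice of this sketch to keep a misbehaving model from looping forever.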

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires an OpenAI API key, configured in config.py.
  • Run GeneGPT: python main.py <documentation_flags> (e.g., python main.py 111111 to use all demonstrations and documentations).
  • Evaluate results: python evaluate.py ${RESULT_DIRECTORY}.
  • Tested with Python 3.9.13.

Highlighted Details

  • Achieves state-of-the-art performance on eight GeneTuring tasks with an average score of 0.83, well ahead of New Bing (0.44), ChatGPT (0.12), and BioGPT (0.04).
  • Demonstrations are more effective than documentation for in-context tool learning.
  • Generalizes to longer API call chains and multi-hop questions.
  • Provides task-specific error analysis for future improvements.
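
To make the idea of an API call chain concrete, here is a small sketch of a two-step lookup against the NCBI E-utilities: a search that returns record IDs, followed by a summary request for those IDs. The helper names (`esearch_url`, `esummary_url`, `ids_from_esearch`) are hypothetical and are not taken from the GeneGPT repository. The sketch only builds URLs and parses an already-decoded search reply, so it makes no network requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Step 1: search a database for a term and ask for JSON output."""
    query = urlencode(
        {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def ids_from_esearch(payload: dict) -> list:
    """Pull the record IDs out of a decoded esearch JSON reply."""
    return payload["esearchresult"]["idlist"]


def esummary_url(db: str, ids: list) -> str:
    """Step 2: request summaries for the IDs found in step 1."""
    # safe="," keeps the comma-separated ID list readable in the URL.
    query = urlencode(
        {"db": db, "id": ",".join(ids), "retmode": "json"}, safe=","
    )
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"
```

In a chained question, the output of the first call (the ID list) becomes the input of the second, and a multi-hop question simply extends the chain with further calls.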

Maintenance & Community

  • Supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.
  • Citation available via provided BibTeX entry.

Licensing & Compatibility

  • The repository does not explicitly state a license. The disclaimer indicates it shows research results from NCBI/NLM and is not intended for direct diagnostic use. Commercial use or linking with closed-source projects may require clarification.

Limitations & Caveats

  • Requires an OpenAI API key, implying costs and dependency on OpenAI's services.
  • The disclaimer highlights that the tool's output is for research purposes and not for direct diagnostic or medical decision-making.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
