NexusRaven  by nexusflowai

Evaluation framework for function-calling LLM, NexusRaven-13B

Created 2 years ago
317 stars

Top 85.2% on SourcePulse

Project Summary

NexusRaven-13B is an open-source LLM designed specifically for function calling, with the aim of surpassing existing state-of-the-art models in this domain. It targets developers and researchers who need robust, efficient API interaction from LLMs, and it is positioned as usable in commercial settings.

How It Works

NexusRaven-13B is trained for function calling: given Python function signatures and their docstrings, it generates a matching API call. It is designed to generalize to tools it has not seen and is compatible with frameworks such as LangChain. The model's output often includes a "reflection" step after the "Initial Call". The authors recommend skipping it by passing the stop criterion ["\nReflection:"], so that generation ends at the initial call, which is cheaper and can be executed directly.
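The flow described above can be sketched in plain Python. This is a minimal illustration, not the project's own code: the prompt layout in `build_prompt`, the example tool `get_weather`, and the sample completion are all hypothetical, since the exact template is not given here. Only the stop sequence `"\nReflection:"` comes from the text.

```python
import inspect

# Stop criterion quoted in the text above.
STOP_SEQUENCES = ["\nReflection:"]


def describe_function(func):
    """Render a function as its signature plus its docstring."""
    signature = f"def {func.__name__}{inspect.signature(func)}:"
    docstring = inspect.getdoc(func) or ""
    return f'{signature}\n    """{docstring}"""'


def build_prompt(functions, query):
    """Assumed prompt layout for illustration; the real template may differ."""
    tool_block = "\n\n".join(describe_function(f) for f in functions)
    return f"{tool_block}\n\nUser Query: {query}\n"


def truncate_at_stop(text, stop_sequences=STOP_SEQUENCES):
    """Cut a completion at the earliest stop sequence, if one occurs."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def get_weather(city: str, unit: str = "celsius"):
    """Return the current weather for a city."""


# Hypothetical raw completion containing a reflection step.
raw_output = (
    "Initial Call: get_weather(city='Paris')"
    "\nReflection: The call matches the query."
)
prompt = build_prompt([get_weather], "What is the weather in Paris?")
initial_call = truncate_at_stop(raw_output)
```

Most generation backends accept stop sequences directly, in which case the reflection is never generated; truncating after the fact, as here, only discards it.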

Quick Start & Requirements

  • Install: pip install transformers accelerate
  • Usage: Requires Hugging Face transformers library. GPU recommended for inference.
  • Demo: Nexusflow HF
  • Documentation: NexusRaven blog post
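As a rough starting point after installing the packages above, inference through the `transformers` text-generation pipeline might look like the sketch below. The model id `Nexusflow/NexusRaven-13B` and the generation settings are assumptions, not values taken from this page, and precision or quantization options are left out; check the model card before relying on them.

```python
# Assumed Hugging Face model id; verify against the model card.
MODEL_ID = "Nexusflow/NexusRaven-13B"

# Assumed settings: greedy decoding, completion only (no echoed prompt).
GENERATION_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": False,
    "return_full_text": False,
}


def load_generator(model_id=MODEL_ID):
    """Build a text-generation pipeline; downloads the model on first use."""
    # Imported lazily so this module can be read without the heavy dependency.
    from transformers import pipeline

    # device_map="auto" relies on the accelerate package to place weights.
    return pipeline("text-generation", model=model_id, device_map="auto")


def generate(generator, prompt):
    """Run one prompt and return only the generated text."""
    outputs = generator(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Calling `generate(load_generator(), prompt)` would then return the raw completion for a prompt string.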

Highlighted Details

  • Achieves a 95% success rate when using cybersecurity tools (CVE/CPE Search, VirusTotal) with a retrieval system, compared with 64% for GPT-4.
  • Generalizes to unseen tools in a zero-shot setting, outperforming other open-source LLMs of similar size.
  • Trained without proprietary LLM data, enabling commercial use.
  • Evaluation framework and data processing code are Apache 2.0 licensed.
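The retrieval system mentioned in the first highlight is not described here. As a stand-in to show the idea, the sketch below ranks candidate functions by simple keyword overlap with the query and keeps only the best matches for the prompt. The scoring method and the three example tools are hypothetical, not the authors' retriever or real API wrappers.

```python
import inspect
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_functions(query, functions, top_k=2):
    """Return up to top_k functions whose name or docstring overlaps the query."""
    query_tokens = tokenize(query)
    scored = []
    for func in functions:
        description = f"{func.__name__} {inspect.getdoc(func) or ''}"
        score = len(query_tokens & tokenize(description))
        scored.append((score, func))
    # Stable sort on the score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [func for score, func in scored[:top_k] if score > 0]


# Hypothetical tools, named for illustration only.
def search_cve(keyword):
    """Search the CVE database for vulnerabilities matching a keyword."""


def scan_file_hash(file_hash):
    """Look up a file hash in a malware scanning service."""


def get_weather(city):
    """Return the current weather for a city."""


tools = [search_cve, scan_file_hash, get_weather]
selected = retrieve_functions("Find CVE vulnerabilities for OpenSSL", tools, top_k=1)
```

A production retriever would more likely use embeddings than token overlap; the point is only that the prompt carries a short list of candidates rather than every available function.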

Maintenance & Community

Licensing & Compatibility

  • Code: Apache 2.0
  • Evaluation Data: CC-BY-NC-4.0 (Non-commercial due to use of GPT-generated data from ToolLLM and ToolAlpaca datasets).

Limitations & Caveats

The model may generate reflections that are not always helpful, so using the stop criterion is recommended. When many functions are available, it performs best with a retriever, because a large number of function definitions can saturate the context window. The model can also generate incorrect calls, so guardrails are needed.
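One possible guardrail, sketched under assumptions, is to parse the generated call instead of passing it to `eval`, and to run it only when it names an allow-listed function, uses literal arguments, and fits that function's signature. The helper `validate_call` and the example tool are hypothetical, and this sketch deliberately rejects nested calls.

```python
import ast
import inspect


def validate_call(call_text, allowed_functions):
    """Return (func, args, kwargs) if the call is safe to run, else raise ValueError."""
    try:
        tree = ast.parse(call_text.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not a valid expression: {exc}") from exc

    node = tree.body
    # Only plain calls such as name(...) pass; attribute access is rejected.
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a plain function call")

    func = allowed_functions.get(node.func.id)
    if func is None:
        raise ValueError(f"unknown function: {node.func.id}")

    try:
        # literal_eval accepts only constants and containers of constants.
        args = [ast.literal_eval(arg) for arg in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    except (ValueError, TypeError) as exc:
        raise ValueError("arguments must be literals") from exc

    try:
        # Check arity and keyword names without running the function.
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as exc:
        raise ValueError(f"arguments do not match signature: {exc}") from exc
    return func, args, kwargs


# Hypothetical tool used to exercise the check.
def get_weather(city, unit="celsius"):
    return f"{city}:{unit}"


allowed = {"get_weather": get_weather}
func, args, kwargs = validate_call("get_weather(city='Paris')", allowed)
result = func(*args, **kwargs)
```

Calls that fail validation raise `ValueError`, which the caller can log or feed back to the model rather than execute.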

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
