NexusRaven by nexusflowai

Evaluation framework for the function-calling LLM NexusRaven-13B

Created 2 years ago

318 stars

Top 85.4% on SourcePulse

3 Experts Love This Project

Simon Willison

Coauthor of Django

Pawel Garbacki

Cofounder of Fireworks AI

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

NexusRaven-13B is an open-source LLM designed specifically for function calling, with the goal of surpassing existing state-of-the-art models at this task. It targets developers and researchers who need robust, efficient API interaction from an LLM, and it reports strong benchmark results while remaining usable commercially.

How It Works

NexusRaven-13B is trained for function calling: given Python function signatures and docstrings, it generates the corresponding API call. It is designed to generalize to unseen tools and is compatible with frameworks such as LangChain. Its output often includes a "reflection" step after the initial call; the authors recommend skipping it by stopping generation at the string ["\nReflection:"], so that only the "Initial Call" is returned for efficient, direct execution.
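Because the model reads tools as Python signatures plus docstrings, ordinary Python functions can be rendered into prompt text with the standard-library inspect module. The sketch below assumes a simple "Function:" / "User Query:" layout; that layout, the tool functions get_weather and search_cve, and the helper names are hypothetical, and the exact template NexusRaven-13B expects is defined in the project's own documentation and may differ.

```python
import inspect
from typing import Callable, Sequence


def get_weather(city: str, unit: str = "celsius") -> str:
    """Return the current weather for a city.

    Args:
        city: Name of the city.
        unit: Either "celsius" or "fahrenheit".
    """
    raise NotImplementedError  # hypothetical tool; only its signature is used here


def search_cve(keyword: str, limit: int = 5) -> list:
    """Search the CVE database for entries matching a keyword."""
    raise NotImplementedError  # hypothetical tool; only its signature is used here


def describe_function(fn: Callable) -> str:
    """Render one tool as its Python signature followed by its indented docstring."""
    header = f"def {fn.__name__}{inspect.signature(fn)}:"
    doc = inspect.getdoc(fn) or ""
    body = "\n".join("    " + line if line else "" for line in doc.splitlines())
    return f'{header}\n    """\n{body}\n    """'


def build_prompt(functions: Sequence[Callable], query: str) -> str:
    """Assemble a prompt listing every tool, then the user query (assumed layout)."""
    blocks = [f"Function:\n{describe_function(fn)}" for fn in functions]
    return "\n\n".join(blocks) + f"\n\nUser Query: {query}\n"


print(build_prompt([get_weather, search_cve], "Is it raining in Paris?"))
```

Deriving the prompt from live function objects keeps the text the model sees in sync with the code that will eventually be called.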

Quick Start & Requirements

Install: pip install transformers accelerate
Usage: Requires the Hugging Face transformers library; a GPU is recommended for inference.
Demo: Nexusflow HF
Documentation: NexusRaven blog post
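When running inference, the recommended stop criterion is normally handed to whichever generation API is in use. The pure-Python sketch below only illustrates the logic such a criterion applies to streamed output, including the case where "\nReflection:" is split across two chunks. The chunk contents and the "Initial Call:" label stripped by initial_call are assumptions about the output format, and both helper names are hypothetical.

```python
from typing import Iterable

# Stop criterion recommended by the authors.
STOP = "\nReflection:"


def take_until_stop(chunks: Iterable[str], stop: str = STOP) -> str:
    """Accumulate streamed text chunks and cut the result at the first `stop`.

    The stop string may straddle a chunk boundary, so each search starts
    just far enough back in the buffer to catch a split occurrence.
    """
    buffer = ""
    for chunk in chunks:
        start = max(0, len(buffer) - len(stop) + 1)
        buffer += chunk
        index = buffer.find(stop, start)
        if index != -1:
            return buffer[:index]  # discard the reflection and anything after it
    return buffer  # generation ended without producing a reflection


def initial_call(text: str, label: str = "Initial Call:") -> str:
    """Strip an assumed "Initial Call:" label, leaving only the call expression."""
    text = text.strip()
    return text[len(label):].strip() if text.startswith(label) else text


chunks = ["Initial Call: get_weather(city='Pa", "ris')\nRefl", "ection: The user asked..."]
print(initial_call(take_until_stop(chunks)))  # get_weather(city='Paris')
```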

Highlighted Details

Achieves a 95% success rate in using cybersecurity tools (CVE/CPE Search, VirusTotal) when paired with a retrieval system, compared with 64% for GPT-4.
Generalizes to unseen tools in a zero-shot setting, outperforming other open-source LLMs of similar size.
Trained without proprietary LLM data, enabling commercial use.
Evaluation framework and data processing code are Apache 2.0 licensed.
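The 95% cybersecurity result relies on a retrieval system that narrows the tool list before prompting. How that retriever works is not described here, so the following is only a hypothetical stand-in: it ranks tools by word overlap between the query and each tool's name and docstring, then keeps the top k. The tool catalogue, stopword list and function names are made up for illustration; a production retriever would more likely use embeddings.

```python
import re
from typing import Dict, List, Set

# Words too common to indicate which tool fits a query.
STOPWORDS = {"a", "an", "the", "for", "by", "or", "of", "to", "in", "this"}

# Hypothetical tool catalogue: function name -> docstring.
TOOLS: Dict[str, str] = {
    "search_cve": "Search the CVE vulnerability database by keyword.",
    "search_cpe": "Look up CPE product identifiers for a vendor or product name.",
    "virustotal_file_report": "Fetch the VirusTotal scan report for a file hash.",
    "get_weather": "Return the current weather for a city.",
}


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens minus stopwords (underscores split names into words)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, tools: Dict[str, str], k: int = 3) -> List[str]:
    """Return up to k tool names sharing the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(f"{name} {doc}")), name) for name, doc in tools.items()]
    # Highest overlap first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:k] if score > 0]


print(retrieve("Get the VirusTotal report for this file hash", TOOLS, k=1))  # ['virustotal_file_report']
```

Only the retrieved subset would then be rendered into the prompt, which keeps long tool catalogues within the context window.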

Maintenance & Community

Contact: info@nexusflow.ai
Blog: NexusRaven blog post

Licensing & Compatibility

Code: Apache 2.0
Evaluation Data: CC-BY-NC-4.0 (non-commercial, because it includes GPT-generated data from the ToolLLM and ToolAlpaca datasets).

Limitations & Caveats

The model may generate reflections that are not always helpful, so using the stop criterion is recommended. When many functions are available it performs best with a retriever, because listing all of them can saturate the context window. The model can also generate incorrect calls, so guardrails are needed before executing its output.
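A guardrail can begin with a structural check before anything is executed. The sketch below parses a generated call with Python's ast module, accepts only a single call to a whitelisted function with literal arguments, and binds those arguments to the function's real signature. It is one possible guardrail under these assumptions, not the project's own mechanism; get_weather and validate_call are hypothetical names.

```python
import ast
import inspect
from typing import Callable, Dict, Tuple


def get_weather(city: str, unit: str = "celsius") -> str:
    """Hypothetical tool used only to illustrate validation."""
    return f"Weather for {city} in {unit}"


def validate_call(call_text: str, allowed: Dict[str, Callable]) -> Tuple[Callable, inspect.BoundArguments]:
    """Check a generated call before running it; raise ValueError if it is malformed or unsafe."""
    try:
        tree = ast.parse(call_text.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not valid Python: {exc}") from exc
    node = tree.body
    # Reject attribute calls such as os.system(...) and anything that is not a call.
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a single call to a named function")
    fn = allowed.get(node.func.id)
    if fn is None:
        raise ValueError(f"unknown function: {node.func.id}")
    try:
        # literal_eval accepts only plain literals: no nested calls, names or starred args.
        args = [ast.literal_eval(arg) for arg in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    except ValueError as exc:
        raise ValueError(f"non-literal argument: {exc}") from exc
    try:
        bound = inspect.signature(fn).bind(*args, **kwargs)
    except TypeError as exc:
        raise ValueError(f"arguments do not match signature: {exc}") from exc
    return fn, bound


fn, bound = validate_call("get_weather(city='Paris')", {"get_weather": get_weather})
print(fn(*bound.args, **bound.kwargs))  # Weather for Paris in celsius
```

Calls naming unknown functions, using misspelled keywords such as town='Paris', or embedding expressions like open('x') raise ValueError instead of being executed.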

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days