Lemur by OpenLemur

Open language model for language agents

Created 2 years ago

555 stars

Top 57.6% on SourcePulse

View on GitHub

5 Experts Love This Project

Omar Sanseviero

DevRel at Google DeepMind

Binyuan Hui

Research Scientist at Alibaba Qwen

Jeremy Howard

Cofounder of fast.ai

Yaowei Zheng

Author of LLaMA-Factory

and 1 more!

Project Summary

Lemur provides open foundation models specifically designed for language agents, balancing strong natural language understanding with coding capabilities. This dual proficiency enables agents to execute tasks, reason effectively, and interact with real-world environments, targeting developers and researchers building sophisticated AI agents.

How It Works

Lemur is built upon Llama-2-70B, enhanced through a two-stage training process. First, it undergoes pretraining on a 90B token corpus with a 10:1 code-to-text ratio, creating Lemur-70B-v1. This is followed by instruction tuning on 300K text and code examples, resulting in Lemur-70B-Chat-v1. This approach aims to achieve state-of-the-art performance across diverse language and coding benchmarks, bridging the gap between open-source and commercial models for agentic tasks.

Quick Start & Requirements

Installation: Clone the repository and install with pip install -e . after setting up a conda environment with Python 3.10 and PyTorch 2.0.1 with CUDA 11.8.
Prerequisites: CUDA 11.8, Python 3.10, PyTorch 2.0.1.
Models: Available on HuggingFace Hub as OpenLemur/lemur-70b-v1 and OpenLemur/lemur-70b-chat-v1.
Serving: vLLM is recommended for serving; a Docker script vllm_lemur.sh is provided.
Evaluation: Forked codebases for MINT, WebArena, and InterCode are available for evaluation.
Docs: Paper, Blog

Highlighted Details

Evaluated on 8 language/code benchmarks (MMLU, HumanEval, etc.) and 13 interactive agent benchmarks.
Achieves state-of-the-art performance among open-source models for agent abilities.
Supports text and code generation, tool usage, and environment interaction.
Models are available in 8-bit for reduced memory footprint.

Maintenance & Community

Project is a collaborative effort between XLang Lab and Salesforce Research.
Repository is actively updated.

Licensing & Compatibility

The specific license is not explicitly stated in the README, but the models are released openly. Compatibility for commercial use or closed-source linking would require clarification on the license terms.

Limitations & Caveats

The README indicates that official FastChat codebase support for Lemur-Chat is not yet available, requiring the use of provided vLLM serving scripts. Some evaluation tasks (Spider, MultiPL-E, DS-1000) are marked as "in progress" (🚧).

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days