adept-inference by persimmon-ai-labs

Inference code for the Persimmon-8B LLM

Created 2 years ago

412 stars

Top 70.9% on SourcePulse

2 Experts Love This Project

Pawel Garbacki

Cofounder of Fireworks AI

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This repository provides inference code for Persimmon-8B, a large language model from Adept AI. It lets users download and run both the base and the chat-fine-tuned versions of the model for text generation.

How It Works

The inference code serves the Persimmon-8B model through a REST API. It uses a Dockerized environment to manage dependencies and simplify deployment. At run time the code loads the model weights, accepts input prompts, which for the chat model must follow a specific format (human: {prompt}\n\nadept:), and generates text outputs.
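The chat format described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own client code: the helper names format_chat_prompt and build_request_body are hypothetical, and the JSON field names ("prompts", "tokens_to_generate") are assumptions that should be checked against the README before use.

```python
import json


def format_chat_prompt(user_message: str) -> str:
    # Wrap the user's text in the chat template quoted above:
    # "human: " + prompt, a blank line, then "adept:" for the model to continue.
    return f"human: {user_message}\n\nadept:"


def build_request_body(user_message: str, max_new_tokens: int = 128) -> str:
    # Hypothetical payload: the field names here are assumptions,
    # not confirmed against the server's actual REST API.
    payload = {
        "prompts": [format_chat_prompt(user_message)],
        "tokens_to_generate": max_new_tokens,
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_request_body("Hello, how are you?"))
```

Keeping the template in one helper means every request sent to the chat model uses the same framing; the base model would presumably be sent the raw prompt instead.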

Quick Start & Requirements

Install/Run: Build the Docker image with docker build -f docker/Dockerfile -t 'adeptdocker' . and then launch it with sh docker_launch.sh.
Prerequisites: Running the code unmodified requires an 80GB GPU. A 40GB GPU may suffice if you remove unused embeddings, reduce the sequence length, or use 8-bit quantization.
Model Download: Checkpoints are available via OCI bucket links provided in the README.
Documentation: User guide and model details are in the README.
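To put the GPU requirement in context, here is a rough weights-only memory estimate. It assumes a nominal 8 billion parameters, taken from the "8B" in the model name; the true parameter count may differ, and activations and other runtime buffers are not counted. The function name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    # Memory needed to hold the weights alone, in GiB (2**30 bytes).
    return num_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "8B" in the model name.
NOMINAL_PARAMS = 8e9

if __name__ == "__main__":
    for label, nbytes in [("16-bit weights", 2), ("8-bit weights", 1)]:
        print(f"{label}: {weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the weights come to about 14.9 GiB at 16 bits and about 7.5 GiB at 8 bits. Both figures are well below 40GB, which suggests that much of the stated requirement comes from memory other than the weights; that would be consistent with reduced sequence length being listed as one of the workarounds.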

Highlighted Details

Offers both a base and a chat-fine-tuned version of Persimmon-8B.
Requires specific prompt formatting for optimal chat model performance.
Runs with a tensor-parallel size of 1 (no sharding across GPUs), so the full model must fit on a single GPU.

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

Base model: Apache 2.0 license.
Chat model: CC-BY-NC 4.0 license (non-commercial use only).

Limitations & Caveats

The chat model's CC-BY-NC 4.0 license prohibits commercial use. Running the code unmodified requires an 80GB GPU; 40GB cards may work with the modifications listed under Quick Start & Requirements.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days