databricks-ml-examples by databricks

ML examples on Databricks

Created 8 years ago

364 stars

Top 77.6% on SourcePulse

1 Expert Loves This Project

Reynold Xin

Cofounder of Databricks

Project Summary

This repository provides example notebooks and scripts for using and fine-tuning state-of-the-art Large Language Models (LLMs) on the Databricks platform. It targets data scientists and ML engineers building generative AI applications, and offers guidance on model selection, performance evaluation, and customization for use cases such as text generation, embeddings, transcription, image generation, and code generation.

How It Works

The repository is organized into two directories, llm-models and llm-fine-tuning. It uses Databricks' managed infrastructure and ML libraries to demonstrate deploying and fine-tuning a range of open-source LLMs. The examples cover practical applications and include performance benchmarks from the Mosaic Eval Gauntlet framework, so users can compare model effectiveness across tasks.

Quick Start & Requirements

Requires a Databricks environment.
Examples are typically run as notebooks inside a Databricks workspace.
Specific model requirements (e.g., GPU, memory) depend on the chosen LLM.
Refer to individual notebook instructions for detailed setup.
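
As a rough illustration of why hardware needs depend on the chosen LLM, the sketch below estimates the memory needed just to hold a model's weights, given its parameter count and numeric precision. The function name weight_memory_gb and the example model sizes are hypothetical and do not come from the repository. The estimate ignores activations, KV cache, and optimizer state, so real requirements will be higher.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Estimate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; not part of the repository.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts, not a list of models from the repository.
    for num_params in (7e9, 13e9, 70e9):
        for precision in ("fp16", "int4"):
            gb = weight_memory_gb(num_params, precision)
            print(f"{num_params / 1e9:.0f}B params @ {precision}: ~{gb:.1f} GB")
```

Comparing the result against the memory of the available GPU gives a first check on whether a model can load at all before consulting the individual notebook's instructions.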

Highlighted Details

Curated list of recommended open-source LLMs for commercial use, organized by use case and by optimization priority (Quality, Balanced, Speed).
Includes examples for text generation, embeddings, transcription, image generation, and code generation.
Features performance benchmarks from the Mosaic Eval Gauntlet framework for various LLMs.
Provides fine-tuning scripts and notebooks, including QLoRA for efficient model tuning.
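
To make the efficiency claim for QLoRA concrete: QLoRA keeps the base model frozen in 4-bit precision and trains only small low-rank adapter matrices. The sketch below shows the parameter-count arithmetic for a single adapted weight matrix. It is not taken from the repository's scripts, and the layer dimensions and rank are illustrative assumptions.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a low-rank adapter for that matrix.

    The adapter is a product B @ A with B of shape (d_out, rank)
    and A of shape (rank, d_in), so it holds rank * (d_out + d_in) values.
    """
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen matrix's parameters."""
    return lora_params(d_out, d_in, rank) / full_params(d_out, d_in)


if __name__ == "__main__":
    # Assumed example: a 4096 x 4096 projection with adapter rank 8.
    d_out, d_in, rank = 4096, 4096, 8
    print("frozen params: ", full_params(d_out, d_in))
    print("adapter params:", lora_params(d_out, d_in, rank))
    print(f"trainable fraction: {trainable_fraction(d_out, d_in, rank):.4%}")
```

Under these assumptions only a fraction of a percent of the matrix's parameters receive gradients, which is why adapter-based tuning needs far less optimizer memory than full fine-tuning.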

Maintenance & Community

Maintained by Databricks.
No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

The repository itself appears to be under an open-source license (likely Apache 2.0, common for Databricks projects, though not explicitly stated).
The included LLMs have their own licenses, with the repository specifically highlighting models "for free commercial use." Users must verify individual model licenses.

Limitations & Caveats

The examples are tailored to the Databricks platform, so using them on other cloud or on-premises environments requires significant adaptation. The README does not state the license for the repository itself.

Explore Similar Projects

llamaduo by deep-diver

mega-data-factory by duoan

OpenAlpaca by yxuansu

ScaleLLM by vectorch-ai

pyllms by kagisearch

Skills by NVIDIA-NeMo

xTuring by stochasticai

data-prep-kit by data-prep-kit

dbrx by databricks

large-language-models by databricks-academy

transformerlab-app by transformerlab

gpt-llm-trainer by mshumer