databricks-ml-examples by databricks

ML examples on Databricks

Created 8 years ago
356 stars

Top 78.3% on SourcePulse

1 Expert Loves This Project
Project Summary

This repository provides example notebooks and scripts for utilizing and fine-tuning state-of-the-art Large Language Models (LLMs) on the Databricks platform. It targets data scientists and ML engineers looking to implement generative AI applications, offering guidance on model selection, performance evaluation, and customization for various use cases like text generation, embeddings, transcription, image, and code generation.

How It Works

The repository is structured into directories for llm-models and llm-fine-tuning. It leverages Databricks' ML capabilities, including its managed infrastructure and libraries, to demonstrate the deployment and fine-tuning of various open-source LLMs. The examples showcase practical applications and provide performance benchmarks using the Mosaic Eval Gauntlet framework, enabling users to compare model effectiveness across different tasks.

Quick Start & Requirements

  • Requires a Databricks environment.
  • The examples are meant to be run as notebooks inside a Databricks workspace.
  • Specific model requirements (e.g., GPU, memory) depend on the chosen LLM.
  • Refer to individual notebook instructions for detailed setup.
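As a rough guide to the GPU and memory point above, the memory needed just to hold a model's weights is about the parameter count times the bytes stored per parameter. The sketch below is an illustrative back-of-envelope estimate, not a sizing tool from the repository. It assumes weights dominate and ignores activations, KV cache, optimizer state, and framework overhead, so real usage will be higher. The 7-billion-parameter model is a hypothetical example.

```python
# Back-of-envelope estimate of the memory needed to hold model weights.
# Assumption: weights dominate; activations, KV cache, optimizer state,
# and framework overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate weight memory in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    params = 7e9  # hypothetical 7-billion-parameter model
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>10}: {weight_memory_gib(params, precision):6.1f} GiB")
```

For the assumed 7B model this prints about 26.1 GiB at fp32, 13.0 GiB at fp16/bf16, 6.5 GiB at int8, and 3.3 GiB at 4-bit, which is why lower-precision loading is a common way to fit larger models on a single GPU.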

Highlighted Details

  • Curated list of recommended open-source LLMs for commercial use, categorized by use case (Quality, Balanced, Speed).
  • Includes examples for text generation, embeddings, transcription, image generation, and code generation.
  • Features performance benchmarks from the Mosaic Eval Gauntlet framework for various LLMs.
  • Provides fine-tuning scripts and notebooks, including QLoRA for efficient model tuning.
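QLoRA pairs a frozen, 4-bit-quantized base model with small trainable low-rank (LoRA) adapters. For a weight matrix of shape d × k, a rank-r LoRA adapter trains two factors, B (d × r) and A (r × k), so it adds r(d + k) parameters instead of updating all d·k. The sketch below only does that arithmetic. The matrix shape, rank, number of adapted matrices per layer, and layer count are hypothetical values chosen for illustration, not settings taken from the repository's fine-tuning scripts.

```python
# Count the trainable parameters added by LoRA adapters versus full fine-tuning.
# Assumption: each adapted weight matrix W (d x k) gets factors B (d x r) and
# A (r x k), so the adapter adds r * (d + k) trainable parameters while W stays frozen.

from dataclasses import dataclass


@dataclass
class AdaptedMatrix:
    d: int  # output dimension of the weight matrix
    k: int  # input dimension of the weight matrix


def full_params(m: AdaptedMatrix) -> int:
    """Parameters updated if the whole matrix is fine-tuned."""
    return m.d * m.k


def lora_params(m: AdaptedMatrix, rank: int) -> int:
    """Trainable parameters in a rank-`rank` LoRA adapter for the matrix."""
    return rank * (m.d + m.k)


if __name__ == "__main__":
    # Hypothetical setup: 4096 x 4096 projections, two adapted per layer, 32 layers.
    matrix = AdaptedMatrix(d=4096, k=4096)
    matrices = [matrix] * (2 * 32)
    rank = 8
    full = sum(full_params(m) for m in matrices)
    lora = sum(lora_params(m, rank) for m in matrices)
    print(f"full fine-tuning: {full:,} params")
    print(f"LoRA (r={rank}):  {lora:,} params ({100 * lora / full:.2f}% of full)")
```

With these assumed numbers the adapters train 4,194,304 parameters, about 0.39% of the 1,073,741,824 in the adapted matrices. In QLoRA the 4-bit quantization shrinks the frozen base weights, while the low-rank adapters keep the trainable set (and its optimizer state) small.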

Maintenance & Community

  • Maintained by Databricks.
  • No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

  • The README does not state the repository's own license. It may be Apache 2.0, which is common for Databricks projects, but this should be confirmed in the repository itself.
  • The included LLMs have their own licenses, with the repository specifically highlighting models "for free commercial use." Users must verify individual model licenses.

Limitations & Caveats

The examples are specifically tailored for the Databricks platform, limiting direct applicability to other cloud or on-premises environments without significant adaptation. The README does not specify the exact license for the repository itself.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 2 years ago
Updated 1 day ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0%
3k
Large language model for research/commercial use
Created 1 year ago
Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), John Yang (Coauthor of SWE-bench, SWE-agent), and 28 more.

stanford_alpaca by tatsu-lab

0.1%
30k
Instruction-following LLaMA model training and data generation
Created 2 years ago
Updated 1 year ago