databricks-ml-examples  by databricks

ML examples on Databricks

created 8 years ago
353 stars

Top 80.0% on sourcepulse

Project Summary

This repository provides example notebooks and scripts for using and fine-tuning state-of-the-art Large Language Models (LLMs) on the Databricks platform. It targets data scientists and ML engineers building generative AI applications, and offers guidance on model selection, performance evaluation, and customization for use cases such as text generation, embeddings, transcription, image generation, and code generation.

How It Works

The repository is organized into the llm-models and llm-fine-tuning directories. The examples run on Databricks' managed infrastructure and libraries to demonstrate deploying and fine-tuning open-source LLMs. They also include performance benchmarks from the Mosaic Eval Gauntlet framework, so users can compare models across tasks.

Quick Start & Requirements

  • Requires a Databricks environment.
  • Examples are typically run as notebooks inside a Databricks workspace.
  • Specific model requirements (e.g., GPU, memory) depend on the chosen LLM.
  • Refer to individual notebook instructions for detailed setup.
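
As a rough guide to the GPU and memory point above, model weights occupy about (number of parameters) × (bytes per parameter). The sketch below is a back-of-the-envelope estimate only: the function name and the precision table are illustrative assumptions, not part of the repository, and the result ignores activations, KV cache, optimizer state, and framework overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Illustrative helper (not from the repository). It ignores activations,
# KV cache, optimizer state, and framework overhead.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the approximate size of the model weights in GB (10**9 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at three precisions.
    for precision in ("fp16", "int8", "4bit"):
        gb = weight_memory_gb(7e9, precision)
        print(f"7B parameters at {precision}: ~{gb:.1f} GB")
```

Treat the result as a lower bound when choosing a GPU instance; the individual notebooks remain the authority on what each model actually needs.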

Highlighted Details

  • Curated list of recommended open-source LLMs for commercial use, grouped by use case and optimization tier (Quality, Balanced, Speed).
  • Includes examples for text generation, embeddings, transcription, image generation, and code generation.
  • Features performance benchmarks from the Mosaic Eval Gauntlet framework for various LLMs.
  • Provides fine-tuning scripts and notebooks, including QLoRA for efficient model tuning.
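
To give a sense of why QLoRA counts as efficient tuning: LoRA freezes a weight matrix of shape d_out × d_in and trains two small matrices of shapes d_out × r and r × d_in instead, and QLoRA additionally keeps the frozen base weights in 4-bit quantized form. The sketch below only does the parameter-count arithmetic; the function names and the layer dimensions are hypothetical and are not taken from the repository's scripts.

```python
# Parameter-count arithmetic for a single LoRA-adapted linear layer.
# Illustrative only: names and dimensions are hypothetical, not taken
# from the repository's fine-tuning scripts.


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix W (d_out x d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # hypothetical projection layer
    full = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"full fine-tuning: {full:,} parameters")
    print(f"LoRA (rank {rank}):   {lora:,} trainable parameters")
    print(f"fraction trained: {lora / full:.2%}")
```

For this hypothetical 4096 × 4096 layer at rank 8, the adapter trains 65,536 parameters against 16,777,216 in the full matrix, which is about 0.39%.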

Maintenance & Community

  • Maintained by Databricks.
  • No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

  • The repository itself appears to be under an open-source license (likely Apache 2.0, common for Databricks projects, though not explicitly stated).
  • The included LLMs have their own licenses, with the repository specifically highlighting models "for free commercial use." Users must verify individual model licenses.

Limitations & Caveats

The examples are specifically tailored for the Databricks platform, limiting direct applicability to other cloud or on-premises environments without significant adaptation. The README does not specify the exact license for the repository itself.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days
