oci-data-science-ai-samples  by oracle-samples

Cloud-native ML development and MLOps examples

Created 4 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This repository collects tutorials and code examples for Oracle Cloud Infrastructure (OCI) Data Science and AI services. It targets data scientists and ML practitioners who want to develop, train, and deploy models on OCI. The examples range from basic SDK usage to MLOps and distributed training.

How It Works

The project uses the Accelerated Data Science (ADS) SDK to streamline ML tasks and OCI integration. Most examples are JupyterLab notebooks meant to run in OCI-provided conda environments. The OCI features covered include:

  • Large Language Model (LLM) integration: fine-tuning, LangChain, and direct coding.
  • Model Catalog artifact creation.
  • OCI Data Science Jobs for scalable ML tasks, including distributed training with Dask, Horovod, TensorFlow, and PyTorch.
  • Automated ML Pipelines.
  • ML Applications, an MLOps platform.
  • Data Labeling Service scripts.
  • Feature Store examples.
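As a rough illustration of how an OCI Data Science Job might be defined with ADS, the sketch below separates a plain job description from the ADS calls that consume it. `JobSpec` and `to_ads_job` are hypothetical helpers, not part of ADS or this repository. The shape name, script path, and conda slug are placeholder assumptions that would need to match your tenancy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Plain description of a job (hypothetical helper, not part of ADS)."""
    name: str
    script: str                    # path to the script the job should run
    conda_slug: str                # slug of a service conda environment (placeholder)
    shape: str = "VM.Standard2.1"  # assumed shape; choose one available to you
    storage_gb: int = 50           # block storage size in GB


def to_ads_job(spec: JobSpec):
    """Translate the spec into an ADS Job. Requires the oracle-ads package."""
    # Imported lazily so the spec can be built and inspected without ADS installed.
    from ads.jobs import Job, DataScienceJob, ScriptRuntime

    infrastructure = (
        DataScienceJob()
        .with_shape_name(spec.shape)
        .with_block_storage_size(spec.storage_gb)
    )
    runtime = (
        ScriptRuntime()
        .with_source(spec.script)
        .with_service_conda(spec.conda_slug)
    )
    return Job(name=spec.name).with_infrastructure(infrastructure).with_runtime(runtime)


spec = JobSpec(name="example-training-job", script="train.py", conda_slug="my-conda-slug")

# Inside a configured OCI environment one would then typically do:
#   job = to_ads_job(spec).create()
#   run = job.run()
#   run.watch()
```

Compartment, project, and logging identifiers are left out here. Outside a notebook session they would likely need to be set explicitly, for example with `with_compartment_id`, `with_project_id`, and `with_log_group_id` on the infrastructure object.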

Quick Start & Requirements

An Oracle Cloud Infrastructure account and OCI Data Science service access are mandatory. Code execution is intended within OCI's managed notebook sessions using pre-configured conda environments. Key documentation links include the ADS user guide and OCI Data Science service guide.
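A minimal sketch of the first cell such a notebook might run: pick an authentication mode, then hand it to ADS. It assumes that managed notebook sessions and job runs expose the `NB_SESSION_OCID` and `JOB_RUN_OCID` environment variables, which is worth verifying in your own environment. `choose_auth_mode` and `configure_ads` are hypothetical helper names.

```python
import os


def choose_auth_mode(environ=None):
    """Return an ADS auth mode based on where the code appears to be running."""
    env = os.environ if environ is None else environ
    # Assumption: these variables are present only inside OCI-managed runtimes.
    if env.get("NB_SESSION_OCID") or env.get("JOB_RUN_OCID"):
        return "resource_principal"
    # Otherwise fall back to API-key auth read from a local OCI config file.
    return "api_key"


def configure_ads(environ=None):
    """Apply the chosen auth mode to ADS. Requires the oracle-ads package."""
    import ads  # imported lazily so choose_auth_mode works without ADS installed

    mode = choose_auth_mode(environ)
    ads.set_auth(auth=mode)
    return mode


if __name__ == "__main__":
    print("Selected auth mode:", choose_auth_mode())
```

Keeping the decision in a pure function makes it easy to check locally before any call reaches OCI.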

Highlighted Details

  • Features the Accelerated Data Science (ADS) SDK for efficient ML workflow management.
  • Provides extensive examples for Large Language Models (LLMs), covering fine-tuning, deployment, and LangChain integration.
  • Demonstrates distributed training using Dask, Horovod, TensorFlow Distributed, and PyTorch Distributed within OCI Jobs.
  • Showcases ML Applications as an MLOps platform for packaging and deploying ML models as services.
  • Includes examples for the OCI Feature Store for centralized feature management.

Maintenance & Community

Community feedback and contributions are actively encouraged via GitHub issues and a contribution guide. Security vulnerability disclosures are handled through the security guide and issue filing.

Licensing & Compatibility

Released under the Universal Permissive License v1.0 (UPL 1.0), which is permissive for commercial use and integration.

Limitations & Caveats

This repository contains examples and tutorials, not a standalone installable tool. Running the code requires an active OCI environment and service configurations. Some content may represent experimental features.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 10
  • Issues (30d): 1

Star History

17 stars in the last 30 days
