MLRun Demos

The mlrun/demos repository provides demos that implement full end-to-end ML use-case applications with MLRun and demonstrate different aspects of working with MLRun.

For more information about the MLRun Hackathon, refer to the hackathon getting-started section.

Overview

The MLRun demos are end-to-end use-case applications that leverage MLRun to implement complete machine-learning (ML) pipelines — including data collection and preparation, model training, and deployment automation.

The demos show how you can

  • Run ML pipelines locally from a web notebook such as Jupyter Notebook.
  • Run some or all tasks on an elastic Kubernetes cluster by using serverless functions.

The demo applications are tested on the Iguazio Data Science Platform ("the platform") and use its shared data fabric, which is accessible via the v3io file-system mount; if you're not already a platform user, request a free trial.

General ML Workflow

The provided demos implement some or all of the ML workflow steps illustrated in the following image:

ML workflow

Prerequisites

To run the MLRun demos, first do the following:

  • Prepare a Kubernetes cluster with preinstalled operators or custom resources (CRDs) for Horovod and/or Nuclio, depending on the demos that you wish to run.
  • Install an MLRun service on your cluster. See the instructions in the MLRun documentation.
  • Ensure that your cluster has a shared file or object storage for storing the data (artifacts).

Getting-started Tutorial

The tutorial covers MLRun fundamentals such as creation of projects and data ingestion and preparation, and demonstrates how to create an end-to-end machine-learning (ML) pipeline. MLRun is integrated as a default (pre-deployed) shared service in the Iguazio Data Science Platform.

You'll learn how to

  • Collect (ingest), prepare, and analyze data
  • Train, deploy, and monitor an ML model

You'll also learn about the basic concepts, components, and APIs that allow you to perform these tasks, including

  • Setting up MLRun
  • Creating and working with projects
  • Creating, deploying and running MLRun functions
  • Using MLRun to run functions, jobs, and full workflows
  • Deploying a model to a serving layer using serverless functions

How-To: Converting Existing ML Code to an MLRun Project

The converting-to-mlrun how-to demo demonstrates how to convert existing ML code to an MLRun project. The demo implements an MLRun project for taxi ride-fare prediction based on a Kaggle notebook with an ML Python script that uses data from the New York City Taxi Fare Prediction competition.

The code includes the following components:

  1. Data ingestion
  2. Data cleaning and preparation
  3. Model training
  4. Model serving
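
A common first move when converting a notebook script is to split it into one function per component, each with explicit inputs and outputs, so that each step can later be registered and run on its own. The following is a minimal plain-Python sketch of that split. It does not use the MLRun API and is not the demo's code; the sample rows, the field names, and the fare-per-kilometer "model" are hypothetical stand-ins for the Kaggle data and the real model.

```python
from statistics import mean

# Hypothetical sample rows standing in for the taxi-fare data set.
RAW_ROWS = [
    {"distance_km": 2.0, "fare": 8.0},
    {"distance_km": 5.0, "fare": 20.0},
    {"distance_km": 0.0, "fare": 3.0},   # zero distance: dropped in cleaning
    {"distance_km": 3.0, "fare": -1.0},  # negative fare: dropped in cleaning
]


def ingest():
    """Step 1: return raw rows (a real step would read from storage)."""
    return list(RAW_ROWS)


def clean(rows):
    """Step 2: keep only rows with a positive distance and a positive fare."""
    return [r for r in rows if r["distance_km"] > 0 and r["fare"] > 0]


def train(rows):
    """Step 3: fit a toy model, the mean fare per kilometer."""
    return {"fare_per_km": mean(r["fare"] / r["distance_km"] for r in rows)}


def serve(model, distance_km):
    """Step 4: predict the fare for a single ride."""
    return model["fare_per_km"] * distance_km


# Chain the steps the way a pipeline would.
model = train(clean(ingest()))
prediction = serve(model, 10.0)
```

Because each step takes its inputs as arguments and returns its outputs, the steps can be tested separately and rearranged into a workflow without touching their internals.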

Pipeline Output

converting-to-mlrun pipeline output

Integrating with CI Pipelines

The CI Pipeline demo demonstrates how to build a full end-to-end automated-ML pipeline using scikit-learn and the UCI Iris data set.

You might want to run your ML pipelines from CI frameworks such as GitHub Actions or GitLab CI/CD. MLRun supports simple, native integration with these CI systems. The following example combines local code (from the repository) with MLRun marketplace functions to build an automated ML pipeline that

  • Runs data preparation
  • Trains a model
  • Tests the trained model
  • Deploys the model into a cluster
  • Tests the deployed model

By default, the demo uses Slack notifications. To send Slack notifications, you need to create a Slack app and enable webhooks. This process is straightforward and should take a few minutes. For more information, see the Slack documentation.
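
For orientation, a Slack incoming webhook accepts an HTTP POST with a JSON body that contains a "text" field. The sketch below builds such a request with the Python standard library, without sending it. It only illustrates what a webhook call looks like, not how the demo itself sends notifications; the `build_slack_request` helper and the placeholder URL are assumptions made for this example.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with the webhook URL generated for your Slack app.
request = build_slack_request(
    "https://example.invalid/slack-webhook", "Pipeline run completed"
)

# To actually send the message, uncomment the next line:
# urllib.request.urlopen(request)
```

Treat the webhook URL as a secret: in a CI system, store it as a protected variable rather than committing it to the repository.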

scikit-learn tress image

Model Deployment Pipeline: Real-Time Operational Pipeline

This demo shows how to deploy a model with streaming information.

The demo consists of several steps:

Model deployment Pipeline Real-time operational Pipeline

Note: this demo uses the multi-model data layer (V3IO), primarily for real-time streaming. Contact Iguazio to get credentials to access a V3IO system. To test access to the V3IO API see the v3io-api test notebook.

While this demo covers the use case of first-day churn, you can easily replace the data, the related features, and the training model, and reuse the same workflow for other business cases.

These steps are covered by the following pipeline:

  1. Data generator: Generates events for training and serving, and creates an enrichment table (lookup values).
  2. Event handler: Receives data from the input stream. This is a common input stream for all the data, so you can replace the event source (in this case, a data generator) without affecting the rest of the flow. The event handler also stores all incoming data in Parquet files.
  3. Stream to features: Enriches the stream using the enrichment table, and updates aggregation features using the incoming events from the event handler.
  4. Optional model-training steps:
     4.1 Get data snapshot: Takes a snapshot of the feature table for training.
     4.2 Describe the data set: Runs common analyses on the data set and produces plots such as histograms, feature importance, and correlation.
     4.3 Training: Runs training with multiple classification models.
     4.4 Testing: Tests the best-performing model.
  5. Serving: Serves the model and processes the data from the enriched stream and the aggregation features.
  6. Inference logger: Reuses the event handler function from step 2, but only for its ability to store incoming data in Parquet files.
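
The "stream to features" step can be pictured as follows. This is a minimal plain-Python sketch, not the demo's implementation: it enriches each incoming event from a lookup table and keeps running per-user aggregates. The `FeatureAggregator` class, the field names, and the sample events are hypothetical, and the demo itself stores this state in V3IO rather than in memory.

```python
from collections import defaultdict

# Hypothetical enrichment (lookup) table keyed by user ID.
ENRICHMENT = {"u1": {"country": "US"}, "u2": {"country": "DE"}}


class FeatureAggregator:
    """Keep running per-user aggregates that are updated on every event."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, event):
        """Update the aggregates for the event's user and return the enriched event."""
        user = event["user_id"]
        self.count[user] += 1
        self.total[user] += event["amount"]
        # Enrich the event with lookup values, then attach the aggregates.
        enriched = {**event, **ENRICHMENT.get(user, {})}
        enriched["event_count"] = self.count[user]
        enriched["amount_avg"] = self.total[user] / self.count[user]
        return enriched


# Hypothetical events, as the data generator might emit them.
events = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u2", "amount": 4.0},
    {"user_id": "u1", "amount": 30.0},
]

aggregator = FeatureAggregator()
features = [aggregator.update(event) for event in events]
```

Each enriched event carries both the static lookup values and the aggregates as of that event, which is the shape that a serving step would consume.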

Healthcare Demo with Feature Store

This demo shows how to use MLRun with the feature store.

Healthcare facilities need to closely monitor their patients and identify early signs that can indicate that medical intervention is necessary. Time is a key factor: the earlier the medical teams can attend to an issue, the better the outcome. An effective system that alerts on issues in real time can therefore save lives.

In this demo, we will learn how to ingest different data sources into our feature store. Specifically, this patient data has been successfully used to treat hospitalized COVID-19 patients prior to their condition becoming severe or critical. To do this, we will use a medical data set that includes three types of data:

  • Healthcare systems: Batch-updated data set containing different lab test results (for example, blood test results).
  • Patient records: Static data set containing general patient details.
  • Real-time sensors: Real-time sensors that monitor patient metrics.
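
To make the three data types concrete, the sketch below merges a static patient record, the latest batch lab result, and the latest sensor reading into one feature vector as of a given time. It is a plain-Python illustration of the idea only and does not use the MLRun feature-store API; the field names (`crp`, `spo2`), the integer timestamps, and the helper functions are hypothetical.

```python
# Hypothetical samples of the three data types, keyed by patient ID.
patient_records = {"p1": {"age": 64, "sex": "F"}}  # static
lab_results = [  # batch updated
    {"patient_id": "p1", "timestamp": 1, "crp": 12.0},
    {"patient_id": "p1", "timestamp": 5, "crp": 48.0},
]
sensor_readings = [  # real time
    {"patient_id": "p1", "timestamp": 6, "spo2": 97},
    {"patient_id": "p1", "timestamp": 7, "spo2": 91},
]


def latest(rows, patient_id, as_of):
    """Return the most recent row for the patient at or before `as_of`."""
    candidates = [
        r for r in rows
        if r["patient_id"] == patient_id and r["timestamp"] <= as_of
    ]
    return max(candidates, key=lambda r: r["timestamp"]) if candidates else {}


def feature_vector(patient_id, as_of):
    """Merge static, batch, and real-time values into one feature vector."""
    vector = dict(patient_records.get(patient_id, {}))
    for rows in (lab_results, sensor_readings):
        row = latest(rows, patient_id, as_of)
        vector.update(
            {k: v for k, v in row.items() if k not in ("patient_id", "timestamp")}
        )
    return vector


vector = feature_vector("p1", as_of=7)
```

Selecting only rows at or before `as_of` keeps the vector consistent with what was known at that moment, which matters when the same features are used for both training and real-time alerts.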

Note: this demo uses the multi-model data layer (V3IO), primarily for real-time streaming. Contact Iguazio to get credentials to access a V3IO system. To test access to the V3IO API see the v3io-api test notebook.
