What is Azure Machine Learning?
Azure Machine Learning is a cloud service for accelerating and managing the machine
learning (ML) project lifecycle. ML professionals, data scientists, and engineers can use it
in their day-to-day workflows to train and deploy models and manage machine learning
operations (MLOps).
You can create a model in Machine Learning or use a model built from an open-source
platform, such as PyTorch, TensorFlow, or scikit-learn. MLOps tools help you monitor,
retrain, and redeploy models.
Tip
Free trial! If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning. You get credits to spend on Azure services. After they're used up, you can keep the account and use free Azure services. Your credit card is never charged unless you explicitly change your settings and ask to be charged.
Data scientists and ML engineers can use tools to accelerate and automate their day-to-
day workflows. Application developers can use tools for integrating models into
applications or services. Platform developers can use a robust set of tools, backed by
durable Azure Resource Manager APIs, for building advanced ML tooling.
Enterprises working in the Microsoft Azure cloud can use familiar security and role-
based access control for infrastructure. You can set up a project to deny access to
protected data and select operations.
You can:

- Develop models for fairness and explainability, with tracking and auditability to fulfill lineage and audit compliance requirements
- Deploy ML models quickly and easily at scale, and manage and govern them efficiently with MLOps
- Run machine learning workloads anywhere with built-in governance, security, and compliance
As you're refining the model and collaborating with others throughout the rest of the
Machine Learning development cycle, you can share and find assets, resources, and
metrics for your projects on the Machine Learning studio UI.
Studio
Machine Learning studio offers multiple authoring experiences depending on the type
of project and the level of your past ML experience, without having to install anything.
- Notebooks: Write and run your own code in managed Jupyter Notebook servers that are directly integrated in the studio.
- Visualize run metrics: Analyze and optimize your experiments with visualization.
- Azure Machine Learning designer: Use the designer to train and deploy ML models without writing any code. Drag and drop datasets and components to create ML pipelines.
- Data labeling: Use Machine Learning data labeling to efficiently coordinate image labeling or text labeling projects.
Important
Machine Learning doesn't store or process your data outside of the region where
you deploy.
Project lifecycle
The project lifecycle can vary by project, but it often looks like this diagram.
A workspace organizes a project and allows for collaboration for many users all working
toward a common objective. Users in a workspace can easily share the results of their
runs from experimentation in the studio user interface. Or they can use versioned assets
for jobs like environments and storage references.
You can deploy models to the managed inferencing solution, for both real-time and
batch deployments, abstracting away the infrastructure management typically required
for deploying models.
Train models
In Machine Learning, you can run your training script in the cloud or build a model from
scratch. Customers often bring models they've built and trained in open-source
frameworks so that they can operationalize them in the cloud.
- PyTorch
- TensorFlow
- scikit-learn
- XGBoost
- LightGBM
- R
- .NET
For more information, see Open-source integration with Azure Machine Learning.
Hyperparameter optimization
Hyperparameter optimization, or hyperparameter tuning, can be a tedious task. Machine
Learning can automate this task for arbitrary parameterized commands with little
modification to your job definition. Results are visualized in the studio.
Supported via Azure Machine Learning Kubernetes, Azure Machine Learning compute
clusters, and serverless compute:
- PyTorch
- TensorFlow
- MPI
You can use MPI distribution for Horovod or custom multinode logic. Apache Spark is
supported via serverless Spark compute and attached Synapse Spark pool that use
Azure Synapse Analytics Spark clusters.
For more information, see Distributed training with Azure Machine Learning.
Deploy models
To bring a model into production, you deploy it. The Machine Learning managed endpoints abstract the required infrastructure for both batch and real-time (online) model scoring (inferencing).
Real-time and batch scoring (inferencing)
Batch scoring, or batch inferencing, involves invoking an endpoint with a reference to
data. The batch endpoint runs jobs asynchronously to process data in parallel on
compute clusters and store the data for further analysis.
Real-time scoring, or online inferencing, involves invoking an endpoint with one or more
model deployments and receiving a response in near real time via HTTPS. Traffic can be
split across multiple deployments, allowing for testing new model versions by diverting
some amount of traffic initially and increasing after confidence in the new model is
established.
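As a sketch of that traffic-splitting pattern with SDK v2 (the endpoint and deployment names are placeholders, and ml_client is assumed to be an authenticated MLClient):

Python

# send 90% of traffic to the established deployment and 10% to the new one
endpoint = ml_client.online_endpoints.get(name="my-endpoint")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()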
ML model lifecycle

Machine Learning's MLOps capabilities for managing the model lifecycle include:

- git integration.
- MLflow integration.
- Machine learning pipeline scheduling.
- Azure Event Grid integration for custom triggers.
- Ease of use with CI/CD tools like GitHub Actions or Azure DevOps.
Next steps
Start using Azure Machine Learning:
Azure Machine Learning CLI and SDK v2

Azure Machine Learning CLI v2 (CLI v2) and Azure Machine Learning Python SDK v2 (SDK v2) introduce a consistency of features and terminology across the interfaces. To create this consistency, the syntax of commands differs, in some cases significantly, from the first versions (v1).
There are no differences in functionality between CLI v2 and SDK v2. The command-line-based CLI might be more convenient in CI/CD and MLOps scenarios, while the SDK might be more convenient for development.
The YAML file defines the configuration of the asset or workflow, such as what it is and where it should run. Any custom logic or intellectual property, such as data preparation, model training, and model scoring, can remain in script files. These files are referred to in the YAML but aren't part of the YAML itself. Machine Learning supports script files in Python, R, Java, Julia, or C#. All you need to learn is the YAML format and command lines to use Machine Learning; you can stick with script files of your choice. (A minimal job YAML sketch appears after this list.)
Using the command line for execution makes deployment and automation simpler, because you can invoke workflows from any offering or platform that allows users to call the command line.
Machine Learning offers endpoints to streamline model deployments for both real-
time and batch inference deployments. This functionality is available only via CLI v2
and SDK v2.
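For illustration, here's a minimal command-job YAML sketch; the script name, data path, environment, and compute names are placeholders:

yml

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ./src
command: python main.py --data ${{inputs.data}}
inputs:
  data:
    type: uri_file
    path: ./data/input.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster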
SDK v2 is on par with CLI v2 functionality and is consistent in how assets (nouns) and
actions (verbs) are used between SDK and CLI. For example, to list an asset, you can use
the list action in both SDK and CLI. You can use the same list action to list a
compute, model, environment, and so on.
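As a sketch of this symmetry (the workspace values are placeholders):

Python

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# the same "list" action works across asset types
for model in ml_client.models.list():
    print(model.name)
for env in ml_client.environments.list():
    print(env.name)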
CLI v2

Azure Machine Learning CLI v1 has been deprecated. We recommend that you use CLI v2 instead.
SDK v2
Azure Machine Learning Python SDK v1 doesn't have a planned deprecation date. If you
have significant investments in Python SDK v1 and don't need any new features offered
by SDK v2, you can continue to use SDK v1. However, you should consider using SDK v2
if:
You want to use new features like reusable components and managed inferencing.
You're starting a new workflow or pipeline. All new features and future investments
will be introduced in v2.
You want to take advantage of the improved usability of Python SDK v2, including the ability to compose jobs and pipelines by using Python functions, with easy evolution from simple to complex tasks.
Next steps
Upgrade from v1 to v2
Azure Machine Learning glossary

The Azure Machine Learning glossary is a short dictionary of terminology for the Machine Learning platform. For general Azure terminology, see also the Microsoft Azure glossary.
Component
A Machine Learning component is a self-contained piece of code that does one step in
a machine learning pipeline. Components are the building blocks of advanced machine
learning pipelines. Components can do tasks such as data processing, model training,
and model scoring. A component is analogous to a function. It has a name and
parameters, expects input, and returns output.
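To make the function analogy concrete, here's a minimal component YAML sketch; the name, paths, and environment are illustrative:

yml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prep data
type: command
inputs:
  raw_data:
    type: uri_file
outputs:
  clean_data:
    type: uri_folder
code: ./prep
command: python prep.py --raw_data ${{inputs.raw_data}} --clean_data ${{outputs.clean_data}}
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest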
Compute
A compute is a designated compute resource where you run your job or host your
endpoint. Machine Learning supports the following types of compute:
Note
Data
Machine Learning allows you to work with different types of data:
Primitives:
string
boolean
number
For most scenarios, you use URIs (uri_folder and uri_file) to identify a location in storage that can be easily mapped to the file system of a compute node in a job by either mounting or downloading the storage to the node.
The mltable parameter is an abstraction for tabular data that's used for automated
machine learning (AutoML) jobs, parallel jobs, and some advanced scenarios. If you're
starting to use Machine Learning and aren't using AutoML, we strongly encourage you
to begin with URIs.
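As a brief sketch, a uri_file input declared with SDK v2 might look like this (the storage path is a placeholder):

Python

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# point a job input at a single file in cloud storage
my_data_input = Input(
    type=AssetTypes.URI_FILE,
    path="https://<account>.blob.core.windows.net/<container>/raw/data.csv",
)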
Datastore
Machine Learning datastores securely keep the connection information to your data storage on Azure so that you don't have to code it in your scripts. You can register and create a datastore to easily connect to your storage account and access the data in your underlying storage service. The Azure Machine Learning CLI v2 and SDK v2 support the following types of cloud-based storage services:

- Azure Blob container
- Azure file share
- Azure Data Lake
- Azure Data Lake Gen2
Environment
Machine Learning environments are an encapsulation of the environment where your
machine learning task happens. They specify the software packages, environment
variables, and software settings around your training and scoring scripts. The
environments are managed and versioned entities within your Machine Learning
workspace. Environments enable reproducible, auditable, and portable machine learning
workflows across various computes.
Types of environment
Machine Learning supports two types of environments: curated and custom.
Curated environments are provided by Machine Learning and are available in your
workspace by default. They're intended to be used as is. They contain collections of
Python packages and settings to help you get started with various machine learning
frameworks. These precreated environments also allow for faster deployment time. For a
full list, see Azure Machine Learning curated environments.
In custom environments, you're responsible for setting up your environment. Make sure
to install the packages and any other dependencies that your training or scoring script
needs on the compute. Machine Learning allows you to create your own environment
by using:
- A Docker image.
- A base Docker image with a conda YAML to customize further.
- A Docker build context.
Model
Machine Learning models consist of the binary files that represent a machine learning
model and any corresponding metadata. You can create models from a local or remote
file or directory. For remote locations, https , wasbs , and azureml locations are
supported. The created model is tracked in the workspace under the specified name and
version. Machine Learning supports three types of storage format for models:
- custom_model
- mlflow_model
- triton_model
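As a sketch, registering a local MLflow-format model with SDK v2 might look like this (the path and name are illustrative, and ml_client is an authenticated MLClient):

Python

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    path="./model",                # local folder containing the MLflow model
    type=AssetTypes.MLFLOW_MODEL,  # one of the three supported formats
    name="my-model",
    description="example model registration",
)
ml_client.models.create_or_update(model)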
Workspace
The workspace is the top-level resource for Machine Learning. It provides a centralized
place to work with all the artifacts you create when you use Machine Learning. The
workspace keeps a history of all jobs, including logs, metrics, output, and a snapshot of
your scripts. The workspace stores references to resources like datastores and compute.
It also holds all assets like models, environments, components, and data assets.
Next steps
What is Azure Machine Learning?
Tutorial: Create resources you need to
get started
Article • 08/17/2023
This article was partially created with the help of AI. An author reviewed and revised
the content as needed. Read more.
In this tutorial, you will create the resources you need to start working with Azure
Machine Learning.
" A workspace. To use Azure Machine Learning, you'll first need a workspace. The
workspace is the central place to view and manage all the artifacts and resources
you create.
" A compute instance. A compute instance is a pre-configured cloud-computing
resource that you can use to train, automate, manage, and track machine learning
models. A compute instance is the quickest way to start using the Azure Machine
Learning SDKs and CLIs. You'll use it to run Jupyter notebooks and Python scripts in
the rest of the tutorials.
This video shows you how to create a workspace and compute instance. The steps are
also described in the sections below.
https://learn-video.azurefd.net/vod/player?id=a0e901d2-e82a-4e96-9c7f-
3b5467859969&locale=en-us&embedUrl=%2Fazure%2Fmachine-
learning%2Fquickstart-create-resources
Prerequisites
An Azure account with an active subscription. Create an account for free .
If you already have a workspace, skip this section and continue to Create a compute
instance.
| Field | Description |
| --- | --- |
| Workspace name | Enter a unique name that identifies your workspace. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others. The workspace name is case-insensitive. |
| Region | Select the Azure region closest to your users and the data resources to create your workspace. |
Note

This creates a workspace along with all required resources. If you would like to reuse resources, such as the storage account, Azure Container Registry, Azure Key Vault, or Application Insights, use the Azure portal instead.
You'll only see this option if you don't yet have a compute instance in your
workspace.
5. Select Create.
The Authoring section of the studio contains multiple ways to get started in
creating machine learning models. You can:
- Notebooks section allows you to create Jupyter Notebooks, copy sample notebooks, and run notebooks and Python scripts.
- Automated ML steps you through creating a machine learning model without writing code.
- Designer gives you a drag-and-drop way to build models using prebuilt components.
The Assets section of the studio helps you keep track of the assets you create as
you run your jobs. If you have a new workspace, there's nothing in any of these
sections yet.
The Manage section of the studio lets you create and manage compute and
external services you link to your workspace. It's also where you can create and
manage a Data labeling project.
But you could also create a new, empty notebook, then copy/paste code from a tutorial
into the notebook. To do so:
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
Next steps
You now have an Azure Machine Learning workspace, which contains a compute
instance to use for your development environment.
Continue on to learn how to use the compute instance to run notebooks and scripts in
the Azure Machine Learning cloud.
Use your compute instance with the following tutorials to train and deploy a model.
| Tutorial | Description |
| --- | --- |
| Upload, access and explore your data in Azure Machine Learning | Store large data in the cloud and retrieve it from notebooks and scripts |
| Train a model in Azure Machine Learning | Dive in to the details of training a model |
| Create production machine learning pipelines | Split a complete machine learning task into a multistep workflow. |
Set up a Python development
environment for Azure Machine
Learning
Article • 04/25/2023
The following table shows each development environment covered in this article, along
with pros and cons.
| Environment | Pros | Cons |
| --- | --- | --- |
| The Data Science Virtual Machine (DSVM) | Similar to the cloud-based compute instance (Python is pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows. | A slower getting-started experience compared to the cloud-based compute instance. |
| Azure Machine Learning compute instance | Easiest way to get started. The SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run. | Lack of control over your development environment and dependencies. Additional cost incurred for Linux VM (VM can be stopped when not in use to avoid charges). See pricing details. |
This article also provides additional usage tips for the following tools:
Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some
extras that you should install.
Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes language support for Python, plus features that make working with Azure Machine Learning much more convenient and productive.
Prerequisites
Azure Machine Learning workspace. If you don't have one, you can create an Azure
Machine Learning workspace through the Azure portal, Azure CLI, and Azure
Resource Manager templates.
JSON
{
"subscription_id": "<subscription-id>",
"resource_group": "<resource-group>",
"workspace_name": "<workspace-name>"
}
This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can be in the same directory, a subdirectory named .azureml, or in a parent directory.
To use this file from your code, use the MLClient.from_config method. This code loads
the information from the file and connects to your workspace.
Create a script to connect to your Azure Machine Learning workspace. Make sure
to replace subscription_id , resource_group , and workspace_name with your own.
Python
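# a sketch of the connection code, assuming the SDK v2 client;
# replace the placeholder values with your own
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription_id>",
    resource_group_name="<resource_group>",
    workspace_name="<workspace_name>",
)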
Note

Although not required, it's recommended that you use Anaconda or Miniconda to manage Python virtual environments and install packages.
Important
If you're on Linux or macOS and use a shell other than bash (for example, zsh)
you might receive errors when you run some commands. To work around this
problem, use the bash command to start a new bash shell and run the
commands there.
Now that you have your local environment set up, you're ready to start working with
Azure Machine Learning. See the Tutorial: Azure Machine Learning in a day to get
started.
Jupyter Notebooks
When running a local Jupyter Notebook server, it's recommended that you create an
IPython kernel for your Python virtual environment. This helps ensure the expected
kernel and package import behavior.
Bash
2. Create a kernel for your Python virtual environment. Make sure to replace <myenv>
with the name of your Python virtual environment.
Bash
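# a sketch of the kernel-creation commands; replace <myenv> with your environment name
conda activate <myenv>
pip install ipykernel
python -m ipykernel install --user --name <myenv> --display-name "Python (<myenv>)"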
Once you have the Visual Studio Code extension installed, use it to:
Create one anytime from within your Azure Machine Learning workspace. Provide just a
name and specify an Azure VM type. Try it now with Create resources to get started.
To learn more about compute instances, including how to install packages, see Create
and manage an Azure Machine Learning compute instance.
Tip
In addition to a Jupyter Notebook server and JupyterLab, you can use compute instances with the integrated notebook feature inside Azure Machine Learning studio.
You can also use the Azure Machine Learning Visual Studio Code extension to connect
to a remote compute instance using VS Code.
For a more comprehensive list of the tools, see the Data Science VM tools guide.
Important
If you plan to use the Data Science VM as a compute target for your training or
inferencing jobs, only Ubuntu is supported.
Azure CLI
Bash
3. Once the environment has been created, activate it and install the SDK
Bash
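# a sketch: activate a conda environment on the DSVM, then install the SDK v2 packages
# (replace <env-name> with the environment you want to use)
conda activate <env-name>
pip install azure-ai-ml azure-identity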
4. To configure the Data Science VM to use your Azure Machine Learning workspace,
create a workspace configuration file or use an existing one.
Tip
Similar to local environments, you can use Visual Studio Code and the Azure
Machine Learning Visual Studio Code extension to interact with Azure
Machine Learning.
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference .
Install and set up the CLI (v2)
Article • 04/04/2023
The ml extension to the Azure CLI is the enhanced interface for Azure Machine Learning.
It enables you to train and deploy models from the command line, with features that
accelerate scaling data science up and out while tracking the model lifecycle.
Prerequisites
To use the CLI, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today.
To use the CLI commands in this document from your local environment, you
need the Azure CLI.
Installation
The new Machine Learning extension requires Azure CLI version >=2.38.0. Ensure this requirement is met:
Azure CLI
az version
Azure CLI
az extension list
Remove any existing installation of the ml extension and also the CLI v1 azure-cli-ml
extension:
Azure CLI
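# remove the extensions if they're present
az extension remove -n ml
az extension remove -n azure-cli-ml

Now install the ml extension: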
Azure CLI
az extension add -n ml
Run the help command to verify your installation and see available subcommands:
Azure CLI
az ml -h
Azure CLI
az extension update -n ml
Installation on Linux
If you're using Linux, the fastest way to install the necessary CLI version and the Machine
Learning extension is:
Bash
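# assuming a Debian-based distribution such as Ubuntu
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az extension add -n ml -y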
Set up
Login:
Azure CLI
az login
If you have access to multiple Azure subscriptions, you can set your active subscription:
Azure CLI
az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>"
Optionally, set up common variables in your shell for use in subsequent commands:
Azure CLI
GROUP="azureml-examples"
LOCATION="eastus"
WORKSPACE="main"
Warning
This uses Bash syntax for setting variables -- adjust as needed for your shell. You
can also replace the values in commands below inline rather than using variables.
If it doesn't already exist, you can create the Azure resource group:
Azure CLI
Azure CLI
Azure CLI
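# a sketch of the three commands these blocks contain, using the variables above
az group create -n $GROUP -l $LOCATION
az ml workspace create -n $WORKSPACE -g $GROUP
az configure --defaults group=$GROUP workspace=$WORKSPACE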
Tip
Most code examples assume you have set a default workspace and resource group.
You can override these on the command line.
Azure CLI
az configure -l -o table
Secure communications
The ml CLI extension (sometimes called 'CLI v2') for Azure Machine Learning sends
operational data (YAML parameters and metadata) over the public internet. All the ml
CLI extension commands communicate with the Azure Resource Manager. This
communication is secured using HTTPS/TLS 1.2.
Data in a datastore that's secured in a virtual network isn't sent over the public internet. For example, if your training data is located in the default storage account for the workspace, and the storage account is in a virtual network, the training data isn't sent over the public internet.
Note
With the previous extension ( azure-cli-ml , sometimes called 'CLI v1'), only some of
the commands communicate with the Azure Resource Manager. Specifically,
commands that create, update, delete, list, or show Azure resources. Operations
such as submitting a training job communicate directly with the Azure Machine
Learning workspace. If your workspace is secured with a private endpoint, that is
enough to secure commands provided by the azure-cli-ml extension.
Public workspace
If your Azure Machine Learning workspace is public (that is, not behind a virtual network), then no additional configuration is required. Communications are secured using HTTPS/TLS 1.2.
Next steps
Train models using CLI (v2)
Set up the Visual Studio Code Azure Machine Learning extension
Train an image classification TensorFlow model using the Azure Machine Learning
Visual Studio Code extension
Explore Azure Machine Learning with examples
Set up Visual Studio Code desktop with
the Azure Machine Learning extension
(preview)
Article • 06/15/2023
Learn how to set up the Azure Machine Learning Visual Studio Code extension for your
machine learning workflows. You only need to do this setup when using the VS Code
desktop application. If you use VS Code for the Web, this is handled for you.
The Azure Machine Learning extension for VS Code provides a user interface to:
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning .
Visual Studio Code. If you don't have it, install it.
Python
(Optional) To create resources using the extension, you need to install the CLI (v2).
For setup instructions, see Install, set up, and use the CLI (v2).
Clone the community-driven repository
Bash
git clone https://github.com/Azure/azureml-examples.git --depth 1
2. Select Extensions icon from the Activity Bar to open the Extensions view.
3. In the Extensions view search bar, type "Azure Machine Learning" and select the
first extension.
4. Select Install.
Note

The Azure Machine Learning VS Code extension uses the CLI (v2) by default. To switch to the 1.0 CLI, set the azureML.CLI Compatibility Mode setting in Visual Studio Code to 1.0. For more information on modifying your settings in Visual Studio Code, see the user and workspace settings documentation.
To sign into your Azure account, select the Azure: Sign In button in the bottom right
corner on the Visual Studio Code status bar to start the sign in process.
For Azure Machine Learning YAML specification files, the extension provides:

- Schema validation
- Autocompletion
- Diagnostics
If you don't have a workspace, create one. For more information, see manage Azure
Machine Learning resources with the VS Code extension.
To choose your default workspace, select the Set Azure Machine Learning Workspace
button on the Visual Studio Code status bar and follow the prompts to set your
workspace.
Alternatively, use the > Azure ML: Set Default Workspace command in the command
palette and follow the prompts to set your workspace.
Next Steps
Manage your Azure Machine Learning resources
Develop on a remote compute instance locally
Train an image classification model using the Visual Studio Code extension
Run and debug machine learning experiments locally (CLI v1)
Quickstart: Get started with Azure
Machine Learning
Article • 10/20/2023
This tutorial is an introduction to some of the most used features of the Azure Machine
Learning service. In it, you will create, register and deploy a model. This tutorial will help
you become familiar with the core concepts of Azure Machine Learning and their most
common usage.
You'll learn how to run a training job on a scalable compute resource, then deploy it,
and finally test the deployment.
You'll create a training script to handle the data preparation, train and register a model.
Once you train the model, you'll deploy it as an endpoint, then call the endpoint for
inferencing.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
3. Open or create a notebook in your workspace:
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
You'll create ml_client for a handle to the workspace. You'll then use ml_client to
manage resources and jobs.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
Note
Creating MLClient will not connect to the workspace. The client initialization is lazy,
it will wait for the first time it needs to make a call (this will happen in the next code
cell).
Python
import os
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)
This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree-based model and return the output model. MLflow is used to log the parameters and metrics during the pipeline run.
The cell below uses IPython magic to write the training script into the directory you just
created.
Python
%%writefile {train_src_dir}/main.py
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
def main():
    """Main function of the script."""

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    ###################
    #<prepare the data>
    ###################
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    # split the data into train and test sets
    train_df, test_df = train_test_split(credit_df, test_size=args.test_train_ratio)

    ##################
    #<train the model>
    ##################
    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    X_train = train_df.values

    y_test = test_df.pop("default payment next month")
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))
    ###################
    #</train the model>
    ###################

    ##########################
    #<save and register model>
    ##########################
    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
As you can see in this script, once the model is trained, the model file is saved and
registered to the workspace. Now you can use the registered model in inferencing
endpoints.
You might need to select Refresh to see the new folder and script in your Files.
Configure the command
Now that you have a script that can perform the desired tasks, and a compute cluster to
run the script, you'll use a general purpose command that can run command line
actions. This command line action can directly call system commands or run a script.
Here, you'll create input variables to specify the input data, split ratio, learning rate and
registered model name. The command script will:
- Use an environment that defines software and runtime libraries needed for the training script. Azure Machine Learning provides many curated or ready-made environments, which are useful for common training and inference scenarios. You'll use one of those environments here. In Tutorial: Train a model in Azure Machine Learning, you'll learn how to create a custom environment.
- Configure the command line action itself - python main.py in this case. The inputs/outputs are accessible in the command via the ${{ ... }} notation.
- In this sample, we access the data from a file on the internet.
- Since a compute resource was not specified, the script will be run on a serverless compute cluster that is automatically created.
Python
from azure.ai.ml import command, Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="credit_default_prediction",
)
Python
ml_client.create_or_update(job)
The output of this job will look like this in the Azure Machine Learning studio. Explore
the tabs for various details like metrics, outputs etc. Once completed, the job will
register a model in your workspace as a result of training.
Important
Wait until the status of the job is complete before returning to this notebook to
continue. The job will take 2 to 3 minutes to run. It could take longer (up to 10
minutes) if the compute cluster has been scaled down to zero nodes and custom
environment is still building.
To deploy a machine learning service, you'll use the model you registered.
Python
import uuid
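from azure.ai.ml.entities import ManagedOnlineEndpoint

# a sketch of the endpoint definition; the name pattern and auth mode
# follow the tutorial's conventions
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]

endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="an online endpoint for the credit defaults model",
    auth_mode="key",
)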
Python
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Note
Once the endpoint has been created, you can retrieve it as below:
Python
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)
You can check the Models page on Azure Machine Learning studio, to identify the latest
version of your registered model. Alternatively, the code below will retrieve the latest
version number for you to use.
Python
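# a sketch: find the latest registered version of the model
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)
print(f"Latest model is version {latest_model_version}")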
Python
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
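from azure.ai.ml.entities import ManagedOnlineDeployment

# a sketch of the deployment definition; the instance type is an assumption
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)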
blue_deployment = ml_client.begin_create_or_update(blue_deployment).result()
Note
Create a sample request file following the design expected in the run method in the
score script.
Python
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)
Python
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}
Python
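# a sketch: test the deployment with the sample request file
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)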
Clean up resources
If you're not going to use the endpoint, delete it to stop using the resource. Make sure
no other deployments are using an endpoint before you delete it.
Note
Python
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Now that you have an idea of what's involved in training and deploying a model, learn
more about the process in these tutorials:
| Tutorial | Description |
| --- | --- |
| Upload, access and explore your data in Azure Machine Learning | Store large data in the cloud and retrieve it from notebooks and scripts |
| Train a model in Azure Machine Learning | Dive in to the details of training a model |
| Create production machine learning pipelines | Split a complete machine learning task into a multistep workflow. |
Tutorial: Upload, access and explore
your data in Azure Machine Learning
Article • 12/27/2023
The start of a machine learning project typically involves exploratory data analysis (EDA),
data-preprocessing (cleaning, feature engineering), and the building of Machine
Learning model prototypes to validate hypotheses. This prototyping project phase is
highly interactive. It lends itself to development in an IDE or a Jupyter notebook, with a
Python interactive console. This tutorial describes these ideas.
This video shows how to get started in Azure Machine Learning studio so that you can
follow the steps in the tutorial. The video shows how to create a notebook, clone the
notebook, create a compute instance, and download the data needed for the tutorial.
The steps are also described in the following sections.
https://learn-video.azurefd.net/vod/player?id=514a29e2-0ae7-4a5d-a537-
8f10681f5545&locale=en-us&embedUrl=%2Fazure%2Fmachine-learning%2Ftutorial-
explore-data
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
Note
This tutorial depends on data placed in an Azure Machine Learning resource folder
location. For this tutorial, 'local' means a folder location in that Azure Machine
Learning resource.
1. Select Open terminal below the three dots.
2. The terminal window opens in a new tab.
3. Make sure you cd to the same folder where this notebook is located. For example,
if the notebook is in a folder named get-started-notebooks:
4. Enter these commands in the terminal window to copy the data to your compute
instance:
mkdir data
cd data  # the sub-folder where you'll store the data
wget https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv
Learn more about this data on the UCI Machine Learning Repository.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()
Note
Creating MLClient will not connect to the workspace. The client initialization is lazy,
it will wait for the first time it needs to make a call (this will happen in the next code
cell).
An Azure Machine Learning data asset is similar to web browser bookmarks (favorites).
Instead of remembering long storage paths (URIs) that point to your most frequently
used data, you can create a data asset, and then access that asset with a friendly name.
Data asset creation also creates a reference to the data source location, along with a
copy of its metadata. Because the data remains in its existing location, you incur no
extra storage cost, and don't risk data source integrity. You can create Data assets from
Azure Machine Learning datastores, Azure Storage, public URLs, and local files.
Tip
For smaller-size data uploads, Azure Machine Learning data asset creation works
well for data uploads from local machine resources to cloud storage. This approach
avoids the need for extra tools or utilities. However, a larger-size data upload might
require a dedicated tool or utility - for example, azcopy. The azcopy command-line
tool moves data to and from Azure Storage. Learn more about azcopy here.
The next notebook cell creates the data asset. The code sample uploads the raw data file
to the designated cloud storage resource.
Each time you create a data asset, you need a unique version for it. If the version already exists, you'll get an error. In this code, we use "initial" as the version for the first read of the data. If that version already exists, we skip creating it again.
You can also omit the version parameter, and a version number is generated for you,
starting with 1 and then incrementing from there.
In this tutorial, we use the name "initial" as the first version. The Create production
machine learning pipelines tutorial will also use this version of the data, so here we are
using a value that you'll see again in that tutorial.
Python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = "./data/default_of_credit_card_clients.csv"
# set the version number of the data asset
v1 = "initial"

my_data = Data(
    name="credit-card",
    version=v1,
    description="Credit card data",
    path=my_path,
    type=AssetTypes.URI_FILE,
)

## create data asset if it doesn't already exist:
try:
    data_asset = ml_client.data.get(name="credit-card", version=v1)
    print(
        f"Data asset already exists. Name: {my_data.name}, version: {my_data.version}"
    )
except:
    ml_client.data.create_or_update(my_data)
    print(f"Data asset created. Name: {my_data.name}, version: {my_data.version}")
You can see the uploaded data by selecting Data on the left. You'll see the data is
uploaded and a data asset is created:
This data is named credit-card, and in the Data assets tab, we can see it in the Name
column. This data uploaded to your workspace's default datastore named
workspaceblobstore, seen in the Data source column.
df = pd.read_csv("azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/<filename>.csv")
You'll want to create data assets for frequently accessed data. Here's an easier way to
access the CSV file in Pandas:
Important
In a notebook cell, execute this code to install the azureml-fsspec Python library in
your Jupyter kernel:
Python
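# install the dependency used to read data asset paths with pandas
%pip install -U azureml-fsspec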
Python
import pandas as pd

# read into pandas - note that you will see 2 headers in your data frame - that is ok, for now
df = pd.read_csv(data_asset.path)
df.head()
See Access data from Azure cloud storage during interactive development to learn more about data access in a notebook.

Notice that the data needs a little cleaning. It has:

- two headers
- a client ID column, which we wouldn't use as a feature in Machine Learning
- spaces in the response variable name
Also, compared to the CSV format, the Parquet file format is a better way to store this data. Parquet offers compression, and it maintains schema. Therefore, to clean the data and store it in Parquet, use:
Python
# read in data again, this time using the 2nd row as the header
df = pd.read_csv(data_asset.path, header=1)
# rename column
df.rename(columns={"default payment next month": "default"}, inplace=True)
# remove ID column
df.drop("ID", axis=1, inplace=True)
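# write the cleaned data to a Parquet file (path matches the v2 asset defined below)
df.to_parquet("./data/cleaned-credit-card.parquet")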
| Column Name(s) | Variable Type | Description |
| --- | --- | --- |
| X1 | Explanatory | Amount of the given credit (NT dollar): it includes both the individual consumer credit and their family (supplementary) credit. |
| X6-X11 | Explanatory | History of past payment. We tracked the past monthly payment records (from April to September 2005). -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. |
| X12-X17 | Explanatory | Amount of bill statement (NT dollar) from April to September 2005. |
| X18-X23 | Explanatory | Amount of previous payment (NT dollar) from April to September 2005. |
Next, create a new version of the data asset (the data automatically uploads to cloud
storage). For this version, we'll add a time value, so that each time this code is run, a
different version number will be created.
Python
import time

# Next, create a new *version* of the data asset (the data is automatically uploaded to cloud storage):
v2 = "cleaned" + time.strftime("%Y.%m.%d.%H%M%S", time.gmtime())
my_path = "./data/cleaned-credit-card.parquet"

# Define the data asset, and use tags to make it clear the asset can be used in training
my_data = Data(
    name="credit-card",
    version=v2,
    description="Default of credit card clients data.",
    tags={"training_data": "true", "format": "parquet"},
    path=my_path,
    type=AssetTypes.URI_FILE,
)

my_data = ml_client.data.create_or_update(my_data)
my_data = ml_client.data.create_or_update(my_data)
The cleaned parquet file is the latest version data source. This code shows the CSV
version result set first, then the Parquet version:
Python
import pandas as pd

# get handles to both versions of the data asset
data_asset_v1 = ml_client.data.get(name="credit-card", version=v1)
data_asset_v2 = ml_client.data.get(name="credit-card", version=v2)

# print the CSV (v1) version
print(f"V1 Data asset URI: {data_asset_v1.path}")
v1df = pd.read_csv(data_asset_v1.path)
print(v1df.head(5))

print(
    "_____________________________________________________________________________________________________________\n"
)

# print the Parquet (v2) version
print(f"V2 Data asset URI: {data_asset_v2.path}")
v2df = pd.read_parquet(data_asset_v2.path)
print(v2df.head(5))
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Read Create data assets for more information about data assets.
Learn how to develop a training script with a notebook on an Azure Machine Learning
cloud workstation. This tutorial covers the basics you need to get started:
" Set up and configuring the cloud workstation. Your cloud workstation is powered by
an Azure Machine Learning compute instance, which is pre-configured with
environments to support your various model development needs.
" Use cloud-based development environments.
" Use MLflow to track your model metrics, all from within a notebook.
Prerequisites
To use Azure Machine Learning, you'll first need a workspace. If you don't have one,
complete Create resources you need to get started to create a workspace and learn
more about using it.
4. If you don't have a compute instance, you'll see Create compute in the middle of
the screen. Select Create compute and fill out the form. You can use all the
defaults. (If you already have a compute instance, you'll instead see Terminal in
that spot. You'll use Terminal later in this tutorial.)
Set up a new environment for prototyping
(optional)
In order for your script to run, you need to be working in an environment configured
with the dependencies and libraries the code expects. This section helps you create an
environment tailored to your code. To create the new Jupyter kernel your notebook
connects to, you'll use a YAML file that defines the dependencies.
Upload a file.
Files you upload are stored in an Azure file share, and these files are mounted to
each compute instance and shared within the workspace.
1. Select Add files, then select Upload files to upload it to your workspace.
2. Select Browse and select file(s).
4. Select Upload.
You'll see the workstation_env.yml file under your username folder in the Files tab.
Select this file to preview it, and see what dependencies it specifies. You'll see
contents like this:
yml
name: workstation_env
# This file serves as an example - you can update packages or versions to fit your use case
dependencies:
- python=3.8
- pip=21.2.4
- scikit-learn=0.24.2
- scipy=1.7.1
- pandas>=1.1,<1.2
- pip:
- mlflow-skinny
- azureml-mlflow
- psutil>=5.8,<5.9
- ipykernel~=6.0
- matplotlib
Create a kernel.
Now use the Azure Machine Learning terminal to create a new Jupyter kernel,
based on the workstation_env.yml file.
1. Select Terminal to open a terminal window. You can also open the terminal
from the left command bar:
2. If the compute instance is stopped, select Start compute and wait until it's
running.
3. Once the compute is running, you see a welcome message in the terminal,
and you can start typing commands.
Bash
6. Create the environment based on the conda file provided. It takes a few
minutes to build this environment.
Bash
Bash
8. Validate the correct environment is active, again looking for the environment
marked with a *.
Bash
Bash
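A minimal sketch of the commands these steps use, assuming the environment name workstation_env from the YAML file above:

Bash

# step 6: create the conda environment from the provided file
conda env create -f workstation_env.yml
# step 7: activate the new environment
conda activate workstation_env
# step 8: confirm it's active (marked with a *)
conda env list
# create a Jupyter kernel from the environment
python -m ipykernel install --user --name workstation_env --display-name "Tutorial Workstation Env"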
You now have a new kernel. Next you'll open a notebook and use this kernel.
Create a notebook
1. Select Add files, and choose Create new file.
2. Name your new notebook develop-tutorial.ipynb (or enter your preferred name).
3. If the compute instance is stopped, select Start compute and wait until it's
running.
4. You'll see the notebook is connected to the default kernel in the top right. Switch
to use the Tutorial Workstation Env kernel if you created the kernel.
This code uses sklearn for training and MLflow for logging the metrics.
1. Start with code that imports the packages and libraries you'll use in the training
script.
Python
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
2. Next, load and process the data for this experiment. In this tutorial, you read the
data from a file on the internet.
Python
"https://azuremlexamples.blob.core.windows.net/datasets/credit_card/def
ault_of_credit_card_clients.csv",
header=1,
index_col=0,
)
Python
# Extracting the label column
y_train = train_df.pop("default payment next month")

# convert the dataframe values to arrays
X_train = train_df.values

y_test = test_df.pop("default payment next month")
X_test = test_df.values
4. Add code to start autologging with MLflow , so that you can track the metrics and
results. With the iterative nature of model development, MLflow helps you log
model parameters and results. Refer back to those runs to compare and
understand how your model performs. The logs also provide context for when
you're ready to move from the development phase to the training phase of your
workflows within Azure Machine Learning.
Python
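# set an experiment name and enable MLflow autologging
# (the experiment name is illustrative)
mlflow.set_experiment("Develop on cloud tutorial")
mlflow.sklearn.autolog()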
5. Train a model.
Python
mlflow.start_run()
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
# Stop logging for this model
mlflow.end_run()
Note

You can ignore the mlflow warnings. You'll still get all the results you need tracked.
Iterate
Now that you have model results, you may want to change something and try again. For
example, try a different classifier technique:
Python
from sklearn.ensemble import AdaBoostClassifier

mlflow.start_run()
ada = AdaBoostClassifier()
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
print(classification_report(y_test, y_pred))
# Stop logging for this model
mlflow.end_run()
Note

You can ignore the mlflow warnings. You'll still get all the results you need tracked.
Examine results
Now that you've tried two different models, use the results tracked by MLflow to decide which model is better. You can reference metrics like accuracy, or other indicators that matter most for your scenarios. You can dive into these results in more detail by looking at the jobs created by MLflow.
3. There are two different jobs shown, one for each of the models you tried. These
names are autogenerated. As you hover over a name, use the pencil tool next to
the name if you want to rename it.
4. Select the link for the first job. The name appears at the top. You can also rename it
here with the pencil tool.
5. The page shows details of the job, such as properties, outputs, tags, and
parameters. Under Tags, you'll see the estimator_name, which describes the type of
model.
6. Select the Metrics tab to view the metrics that were logged by MLflow . (Expect
your results to differ, as you have a different training set.)
7. Select the Images tab to view the images generated by MLflow .
8. Go back and review the metrics and images for the other model.
4. Look through this file and delete the code you don't want in the training script. For
example, keep the code for the model you wish to use, and delete code for the
model you don't want.
You now have a Python script to use for training your preferred model.
Run the Python script
For now, you're running this code on your compute instance, which is your Azure
Machine Learning development environment. Tutorial: Train a model shows you how to
run a training script in a more scalable way on more powerful compute resources.
2. View your current conda environments. The active environment is marked with a *.
Bash
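# list conda environments; the active one is marked with a *
conda env list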
Bash
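# activate the kernel's environment (name assumed from earlier in this tutorial)
conda activate workstation_env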
Bash
python train.py
Note

You can ignore the mlflow warnings. You'll still get all the metrics and images from autologging.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Learn more about:
This tutorial showed you the early steps of creating a model, prototyping on the same
machine where the code resides. For your production training, learn how to use that
training script on more powerful remote compute resources:
Train a model
Tutorial: Train a model in Azure Machine
Learning
Article • 11/15/2023
Learn how a data scientist uses Azure Machine Learning to train a model. In this
example, we use the associated credit card dataset to show how you can use Azure
Machine Learning for a classification problem. The goal is to predict if a customer has a
high likelihood of defaulting on a credit card payment.
The training script handles the data preparation, then trains and registers a model. This
tutorial takes you through steps to submit a cloud-based training job (command job). If
you would like to learn more about how to load your data into Azure, see Tutorial:
Upload, access and explore your data in Azure Machine Learning. The steps are:
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
A command job is a function that allows you to submit a custom training script to train
your model. This can also be defined as a custom training job. A command job in Azure
Machine Learning is a type of job that runs a script or command in a specified
environment. You can use command jobs to train models, process data, or any other
custom code you want to execute in the cloud.
In this tutorial, we'll focus on using a command job to create a custom training job that
we'll use to train a model. For any custom training job, the below items are required:
- environment
- data
- command job
- training script
In this tutorial we'll provide all these items for our example: creating a classifier to
predict customers who have a high likelihood of defaulting on credit card payments.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
Note
Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).
Python
Azure Machine Learning provides many curated or ready-made environments, which are
useful for common training and inference scenarios.
In this example, you'll create a custom conda environment for your jobs, using a conda
yaml file.
Python
import os
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)
The cell below uses IPython magic to write the conda file into the directory you just
created.
Python
%%writefile {dependencies_dir}/conda.yaml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=1.0.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - mlflow==2.8.0
    - mlflow-skinny==2.8.0
    - azureml-mlflow==1.51.0
    - psutil>=5.8,<5.9
    - tqdm>=4.59,<4.60
    - ipykernel~=6.0
    - matplotlib
The specification contains some usual packages that you'll use in your job (numpy, pip). Reference this yaml file to create and register this custom environment in your workspace:
Python
from azure.ai.ml.entities import Environment

custom_env_name = "aml-scikit-learn"

custom_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for Credit Card Defaults job",
    tags={"scikit-learn": "1.0.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_job_env = ml_client.environments.create_or_update(custom_job_env)

print(
    f"Environment with name {custom_job_env.name} is registered to workspace, the environment version is {custom_job_env.version}"
)
The training script handles the data preparation, training and registering of the trained
model. The method train_test_split handles splitting the dataset into test and training
data. In this tutorial, you'll create a Python training script.
Command jobs can be run from CLI, Python SDK, or studio interface. In this tutorial,
you'll use the Azure Machine Learning Python SDK v2 to create and run the command
job.
Python
import os
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)
This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree-based model and return the output model. MLflow is used to log the parameters and metrics during our job. The MLflow package allows you to keep track of metrics and results for each model Azure trains. We'll use MLflow to first get the best model for our data, then we'll view the model's metrics in Azure Machine Learning studio.
Python
%%writefile {train_src_dir}/main.py
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def main():
    """Main function of the script."""

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    ###################
    #<prepare the data>
    ###################
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    # split the data into train and test sets
    train_df, test_df = train_test_split(
        credit_df, test_size=args.test_train_ratio
    )
    ####################
    #</prepare the data>
    ####################

    ##################
    #<train the model>
    ##################
    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    # convert the dataframe values to array
    X_train = train_df.values

    # Extracting the label column
    y_test = test_df.pop("default payment next month")
    # convert the dataframe values to array
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))
    ###################
    #</train the model>
    ###################

    ##########################
    #<save and register model>
    ##########################
    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )
    ###########################
    #</save and register model>
    ###########################

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
In this script, once the model is trained, the model file is saved and registered to the
workspace. Registering your model allows you to store and version your models in the
Azure cloud, in your workspace. Once you register a model, you can find all your registered models in one place in Azure Machine Learning studio, called the model registry. The model registry helps you organize and keep track of your trained models.
Here, create input variables to specify the input data, split ratio, learning rate and
registered model name. The command script will:
Use the environment created earlier - you can use the @latest notation to indicate
the latest version of the environment when the command is run.
Configure the command line action itself - python main.py in this case. The
inputs/outputs are accessible in the command via the ${{ ... }} notation.
Since a compute resource was not specified, the script will be run on a serverless
compute cluster that is automatically created.
Python
from azure.ai.ml import command, Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="aml-scikit-learn@latest",
    display_name="credit_default_prediction",
)
Python
ml_client.create_or_update(job)
) Important
Wait until the status of the job is complete before returning to this notebook to
continue. The job will take 2 to 3 minutes to run. It could take longer (up to 10
minutes) if the compute cluster has been scaled down to zero nodes and custom
environment is still building.
When you run the cell, the notebook output shows a link to the job's details page in Azure Machine Learning studio. Alternatively, you can also select Jobs on the left navigation menu. A job is
a grouping of many runs from a specified script or piece of code. Information for the run
is stored under that job. The details page gives an overview of the job, the time it took
to run, when it was created, etc. The page also has tabs to other information about the
job such as metrics, Outputs + logs, and code. Listed below are the tabs available in the
job's details page:
Overview: The overview section provides basic information about the job, including
its status, start and end times, and the type of job that was run
Inputs: The input section lists the data and code that were used as inputs for the
job. This section can include datasets, scripts, environment configurations, and
other resources that were used during training.
Outputs + logs: The Outputs + logs tab contains logs generated while the job was
running. This tab assists in troubleshooting if anything goes wrong with your
training script or model creation.
Metrics: The metrics tab showcases key performance metrics from your model such
as training score, f1 score, and precision score.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Learn about deploying a model:
Deploy a model.
This tutorial used an online data file. To learn more about other ways to access data, see
Tutorial: Upload, access and explore your data in Azure Machine Learning.
If you would like to learn more about different ways to train models in Azure Machine
Learning, see What is automated machine learning (AutoML)?. Automated ML is a
supplemental tool to reduce the amount of time a data scientist spends finding a model
that works best with their data.
If you would like more examples similar to this tutorial, see the Samples section of studio. These same samples are available on our GitHub examples page. The examples include complete Python notebooks that you can run to train a model. You can modify and run existing scripts from the samples, which contain scenarios including classification, natural language processing, and anomaly detection.
Deploy a model as an online endpoint
Article • 04/20/2023
Learn to deploy a model to an online endpoint, using Azure Machine Learning Python
SDK v2.
In this tutorial, we use a model trained to predict the likelihood of defaulting on a credit
card payment. The goal is to deploy this model and show its use.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
4. View your VM quota and ensure you have enough quota available to create online
deployments. In this tutorial, you will need at least 8 cores of STANDARD_DS3_v2 and
12 cores of STANDARD_F4s_v2 . To view your VM quota usage and request quota
increases, see Manage resource quotas.
1. On the top bar above your opened notebook, create a compute instance if you don't already have one.
2. If the compute instance is stopped, select Start compute and wait until it is running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2 . If not,
use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
) Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
# authenticate
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
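# Get a handle to the workspace, mirroring the training tutorial
# (a sketch; replace the placeholder values with your own)
from azure.ai.ml import MLClient

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)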
7 Note
Creating MLClient will not connect to the workspace. The client initialization is lazy
and will wait for the first time it needs to make a call (this will happen in the next
code cell).
If you didn't complete the training tutorial, you'll need to register the model. Registering
your model before deployment is a recommended best practice.
In this example, we specify the path (where to upload files from) inline. If you cloned the tutorials folder, then run the following code as-is. Otherwise, download the files and metadata for the model to deploy. Update the path to the location on your local computer where you've unzipped the model's files.
The SDK automatically uploads the files and registers the model.
For more information on registering your model as an asset, see Register your model as
an asset in Machine Learning by using the SDK.
Python
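# A sketch of registering the downloaded model files as an MLflow model.
# The local path below is an assumption; point it at the folder where you
# unzipped the model's files.
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

mlflow_model = Model(
    path="./deploy/credit_defaults_model/",
    type=AssetTypes.MLFLOW_MODEL,
    name="credit_defaults_model",
    description="MLflow model created from local files.",
)
ml_client.models.create_or_update(mlflow_model)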
Alternatively, the code below will retrieve the latest version number for you to use.
Python
registered_model_name = "credit_defaults_model"
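# A sketch: pick the highest version among the model's registered versions
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)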
print(latest_model_version)
Now that you have a registered model, you can create an endpoint and deployment.
The next section will briefly cover some key details about these topics.
An endpoint, in this context, is an HTTPS path that provides an interface for clients to
send requests (input data) to a trained model and receive the inferencing (scoring)
results back from the model. An endpoint provides:
A deployment is a set of resources required for hosting the model that does the actual
inferencing.
A single endpoint can contain multiple deployments. Endpoints and deployments are
independent Azure Resource Manager resources that appear in the Azure portal.
Azure Machine Learning allows you to implement online endpoints for real-time
inferencing on client data, and batch endpoints for inferencing on large volumes of data
over a period of time.
In this tutorial, we'll walk you through the steps of implementing a managed online
endpoint. Managed online endpoints work with powerful CPU and GPU machines in
Azure in a scalable, fully managed way that frees you from the overhead of setting up
and managing the underlying deployment infrastructure.
Python
import uuid
Tip
auth_mode : Use key for key-based authentication. Use aml_token for Azure Machine Learning token-based authentication. A key doesn't expire, but aml_token does expire.
Python
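# A sketch of defining the endpoint with the ManagedOnlineEndpoint class;
# the description and tags below are illustrative.
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={"training_dataset": "credit_defaults"},
)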
Using the MLClient created earlier, we'll now create the endpoint in the workspace. This
command will start the endpoint creation and return a confirmation response while the
endpoint creation continues.
7 Note
Expect the endpoint creation to take approximately 2 minutes.
Python
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Python
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)
model - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
scoring_script - Relative path to the scoring file in the source code directory. This script executes the model on a given input request. For an example of a scoring script, see Understand the scoring script in the "Deploy an ML model with an online endpoint" article.
instance_type - The VM size to use for the deployment. For the list of supported sizes, see Managed online endpoints SKU list.
) Important
If you typically deploy models using scoring scripts and custom environments and
want to achieve the same functionality using MLflow models, we recommend
reading Using MLflow models for no-code deployment.
7 Note
Expect this deployment to take approximately 6 to 8 minutes.
Python
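# A sketch of defining the first deployment, "blue", based on the
# ManagedOnlineDeployment class; the VM size follows this tutorial's prerequisites.
from azure.ai.ml.entities import ManagedOnlineDeployment

model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)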
Using the MLClient created earlier, we'll now create the deployment in the workspace.
This command will start the deployment creation and return a confirmation response
while the deployment creation continues.
Python
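# A sketch of creating the deployment and routing traffic, following this
# tutorial's blue/green naming convention.
blue_deployment = ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# the blue deployment takes 100% of the live traffic to start
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Python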
import os

# create a local directory for the deployment files
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)
Now, create the file in the deploy directory. The cell below uses IPython magic to write
the file into the directory you just created.
Python
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}
Using the MLClient created earlier, we'll get a handle to the endpoint. The endpoint can be invoked using the invoke command with the following parameters:
endpoint_name - Name of the endpoint
request_file - File with request data
deployment_name - Name of the specific deployment to test in an endpoint
Python
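# A sketch: test the blue deployment with the sample data written above
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)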
Python
logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)
print(logs)
Python
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
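# A sketch: define and create a second deployment, "green", reusing the model
# fetched above. The Standard_F4s_v2 size follows this tutorial's quota prerequisites.
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
green_deployment = ml_client.online_deployments.begin_create_or_update(green_deployment).result()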
In the following code, you increase the VM instance count manually. However, it's also possible to autoscale online endpoints. Autoscale automatically provisions the right amount of resources to handle the load on your application. Managed online endpoints support autoscaling through integration with the Azure Monitor autoscale feature. To configure autoscaling, see autoscale online endpoints.
Python
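# A sketch: raise the instance count of the green deployment to scale it out
green_deployment.instance_count = 2
ml_client.online_deployments.begin_create_or_update(green_deployment).result()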
Python
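# A sketch: split live traffic between the two deployments
# (the 80/20 split here is an illustrative choice)
endpoint.traffic = {"blue": 80, "green": 20}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()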
You can test traffic allocation by invoking the endpoint several times:
Python
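# A sketch: send several requests; roughly 20% should be served by the green deployment
for i in range(30):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="./deploy/sample-request.json",
    )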
Python
logs = ml_client.online_deployments.get_logs(
    name="green", endpoint_name=online_endpoint_name, lines=50
)
print(logs)
If you open the metrics for the online endpoint, you can set up the page to see metrics
such as the average request latency as shown in the following figure.
For more information on how to view online endpoint metrics, see Monitor online
endpoints.
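If you're satisfied with the new deployment, you can route all traffic to it. The following is a minimal sketch, assuming the "green" deployment created in the previous steps:
Python
# send 100% of the traffic to the green deployment
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Once the blue deployment no longer receives traffic, you can delete it:
Python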
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name=online_endpoint_name
).result()
Clean up resources
If you aren't going to use the endpoint and deployment after completing this tutorial, you should delete them.
7 Note
Python
ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()
Delete everything
Use these steps to delete your Azure Machine Learning workspace and all compute
resources.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Deploy and score a machine learning model by using an online endpoint.
Test the deployment with mirrored traffic
Monitor online endpoints
Autoscale an online endpoint
Customize MLflow model deployments with scoring script
View costs for an Azure Machine Learning managed online endpoint
Tutorial: Create production machine
learning pipelines
Article • 11/15/2023
7 Note
For a tutorial that uses SDK v1 to build a pipeline, see Tutorial: Build an Azure
Machine Learning pipeline for image classification
The core of a machine learning pipeline is to split a complete machine learning task into
a multistep workflow. Each step is a manageable component that can be developed,
optimized, configured, and automated individually. Steps are connected through well-
defined interfaces. The Azure Machine Learning pipeline service automatically
orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are a standardized MLOps practice, scalable team collaboration, and improved training efficiency and reduced cost. To learn more about the benefits of pipelines, see What
are Azure Machine Learning pipelines.
In this tutorial, you use Azure Machine Learning to create a production ready machine
learning project, using Azure Machine Learning Python SDK v2.
This means you will be able to leverage the Azure Machine Learning Python SDK to:
During this tutorial, you create an Azure Machine Learning pipeline to train a model for
credit default prediction. The pipeline handles two steps:
1. Data preparation
2. Training and registering the trained model
The next image shows a simple pipeline as you'll see it in Azure Machine Learning studio once submitted. The two steps are first data preparation and second training.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
3. Complete the tutorial Upload, access and explore your data to create the data
asset you need in this tutorial. Make sure you run all the code to create the initial
data asset. Explore the data and revise it if you wish, but you'll only need the initial
data in this tutorial.
1. On the top bar above your opened notebook, create a compute instance if you don't already have one.
2. If the compute instance is stopped, select Start compute and wait until it is running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2 . If not,
use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
) Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
# authenticate
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
7 Note
Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).
Verify the connection by making a call to ml_client . Since this is the first time that
you're making a call to the workspace, you might be asked to authenticate.
Python
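# A sketch: verify the handle by fetching the workspace details
ws = ml_client.workspaces.get(WS_NAME)
print(ws.location, ":", ws.resource_group)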
Python
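# A sketch: retrieve the data asset registered in the prerequisite tutorial.
# The asset name "credit-card" and version "initial" are assumed from that tutorial.
credit_data = ml_client.data.get(name="credit-card", version="initial")
print(f"Data asset URI: {credit_data.path}")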
In this example, you create a conda environment for your jobs, using a conda yaml file.
First, create a directory to store the file in.
Python
import os
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)
Python
%%writefile {dependencies_dir}/conda.yaml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - xlrd==2.0.1
    - mlflow==2.4.1
    - azureml-mlflow==1.51.0
The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow). The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
Use the yaml file to create and register this custom environment in your workspace:
Python
from azure.ai.ml.entities import Environment

custom_env_name = "aml-scikit-learn"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for Credit Card Defaults pipeline",
    tags={"scikit-learn": "0.24.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    version="0.2.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
Azure Machine Learning pipelines are reusable ML workflows that usually consist of
several components. The typical life of a component is:
Write the yaml specification of the component, or create it programmatically using ComponentMethod.
Optionally, register the component with a name and version in your workspace, to
make it reusable and shareable.
Load that component from the pipeline code.
Implement the pipeline using the component's inputs, outputs and parameters.
Submit the pipeline.
There are two ways to create a component, programmatic and yaml definition. The next
two sections walk you through creating a component both ways. You can either create
the two components trying both options or pick your preferred method.
7 Note
In this tutorial for simplicity we are using the same compute for all components.
However, you can set different computes for each component, for example by
adding a line like train_step.compute = "cpu-cluster" . To view an example of
building a pipeline with different computes for each component, see the Basic
pipeline job section in the cifar-10 pipeline tutorial .
Python
import os
data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)
This script performs the simple task of splitting the data into train and test datasets.
Azure Machine Learning mounts datasets as folders to the computes, therefore, we
created an auxiliary select_first_file function to access the data file inside the
mounted input folder.
MLFlow is used to log the parameters and metrics during our pipeline run.
Python
%%writefile {data_prep_src_dir}/data_prep.py
import os
import argparse
import pandas as pd
from sklearn.model_selection import train_test_split
import logging
import mlflow

def main():
    """Main function of the script."""
    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    credit_train_df, credit_test_df = train_test_split(
        credit_df, test_size=args.test_train_ratio
    )

    # output paths are mounted as folder, therefore we add a filename to the path
    credit_train_df.to_csv(os.path.join(args.train_data, "data.csv"), index=False)
    credit_test_df.to_csv(os.path.join(args.test_data, "data.csv"), index=False)

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
Now that you have a script that can perform the desired task, create an Azure Machine
Learning Component from it.
Use the general purpose CommandComponent that can run command line actions. This
command line action can directly call system commands or run a script. The
inputs/outputs are specified on the command line via the ${{ ... }} notation.
Python
from azure.ai.ml import command, Input, Output

data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    description="reads a .xl input, split the input to train and test",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs=dict(
        train_data=Output(type="uri_folder", mode="rw_mount"),
        test_data=Output(type="uri_folder", mode="rw_mount"),
    ),
    # The source folder of the component
    code=data_prep_src_dir,
    command="""python data_prep.py \
            --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} \
            --train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}} \
            """,
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)
Python
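# A sketch: optionally register the component so it can be shared and reused
data_prep_component = ml_client.create_or_update(data_prep_component.component)

print(
    f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
)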
You used the CommandComponent class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in alongside the code and provides readable history tracking. The programmatic method using CommandComponent can be easier, with built-in class documentation and code completion.
Python
import os
train_src_dir = "./components/train"
os.makedirs(train_src_dir, exist_ok=True)
Python
%%writefile {train_src_dir}/train.py
import argparse
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
import os
import pandas as pd
import mlflow

def select_first_file(path):
    """Selects first file in folder, use under assumption there is only one file in folder
    Args:
        path (str): path to directory or file to choose
    Returns:
        str: full path of selected file
    """
    files = os.listdir(path)
    return os.path.join(path, files[0])

# Start Logging
mlflow.start_run()

# enable autologging
mlflow.sklearn.autolog()

os.makedirs("./outputs", exist_ok=True)

def main():
    """Main function of the script."""
    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    parser.add_argument("--model", type=str, help="path to model file")
    args = parser.parse_args()

    # paths are mounted as folder, therefore, we are selecting the file from folder
    train_df = pd.read_csv(select_first_file(args.train_data))

    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    # convert the dataframe values to array
    X_train = train_df.values

    # paths are mounted as folder, therefore, we are selecting the file from folder
    test_df = pd.read_csv(select_first_file(args.test_data))

    # Extracting the label column
    y_test = test_df.pop("default payment next month")
    # convert the dataframe values to array
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=os.path.join(args.model, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
As you can see in this training script, once the model is trained, the model file is saved
and registered to the workspace. Now you can use the registered model in inferencing
endpoints.
For the environment of this step, you use one of the built-in (curated) Azure Machine Learning environments. The azureml prefix tells the system to look for the name among curated environments. First, create the yaml file describing the component:
Python
%%writefile {train_src_dir}/train.yml
# <component>
name: train_credit_defaults_model
display_name: Train Credit Defaults Model
# version: 1 # Not specifying a version will automatically update the version
type: command
inputs:
  train_data:
    type: uri_folder
  test_data:
    type: uri_folder
  learning_rate:
    type: number
  registered_model_name:
    type: string
outputs:
  model:
    type: uri_folder
code: .
environment:
  # for this step, we'll use an AzureML curated environment
  azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1
command: >-
  python train.py
  --train_data ${{inputs.train_data}}
  --test_data ${{inputs.test_data}}
  --learning_rate ${{inputs.learning_rate}}
  --registered_model_name ${{inputs.registered_model_name}}
  --model ${{outputs.model}}
# </component>
Now create and register the component. Registering it allows you to re-use it in other
pipelines. Also, anyone else with access to your workspace can use the registered
component.
Python
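# A sketch: load the component from its yaml definition and register it
from azure.ai.ml import load_component

train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
train_component = ml_client.create_or_update(train_component)

print(
    f"Component {train_component.name} with Version {train_component.version} is registered"
)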
The Python functions returned by load_component() work like any regular Python function that we use within a pipeline to call each step.
To code the pipeline, you use a specific @dsl.pipeline decorator that identifies the
Azure Machine Learning pipelines. In the decorator, we can specify the pipeline
description and default resources like compute and storage. Like a Python function,
pipelines can have inputs. You can then create multiple instances of a single pipeline
with different inputs.
Here, we used input data, split ratio and registered model name as input variables. We
then call the components and connect them via their inputs/outputs identifiers. The
outputs of each step can be accessed via the .outputs property.
Python
# the dsl decorator tells the sdk that we are defining an Azure Machine
Learning pipeline
from azure.ai.ml import dsl, Input, Output
@dsl.pipeline(
compute="serverless", # "serverless" value runs pipeline on serverless
compute
description="E2E data_perp-train pipeline",
)
def credit_defaults_pipeline(
pipeline_job_data_input,
pipeline_job_test_train_ratio,
pipeline_job_learning_rate,
pipeline_job_registered_model_name,
):
# using data_prep_function like a python call with its own inputs
data_prep_job = data_prep_component(
data=pipeline_job_data_input,
test_train_ratio=pipeline_job_test_train_ratio,
)
Now use your pipeline definition to instantiate a pipeline with your dataset, split rate of
choice and the name you picked for your model.
Python
registered_model_name = "credit_defaults_model"
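# A sketch of instantiating the pipeline: credit_data is the data asset retrieved
# earlier, and the ratio and learning-rate values below are illustrative choices.
pipeline = credit_defaults_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=credit_data.path),
    pipeline_job_test_train_ratio=0.25,
    pipeline_job_learning_rate=0.05,
    pipeline_job_registered_model_name=registered_model_name,
)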
Here you also pass an experiment name. An experiment is a container for all the
iterations one does on a certain project. All the jobs submitted under the same
experiment name would be listed next to each other in Azure Machine Learning studio.
Once completed, the pipeline registers a model in your workspace as a result of training.
Python
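# A sketch: submit the pipeline job and stream its logs until completion.
# The experiment name below is an illustrative choice.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    # all jobs under the same experiment name are listed together in studio
    experiment_name="e2e_registered_components",
)
ml_client.jobs.stream(pipeline_job.name)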
You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
There are two important results you'll want to see about training:
View your metrics: Select the Metrics tab. This section shows different logged metrics. In this example, mlflow autologging has automatically logged the training metrics.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Learn how to Schedule machine learning pipeline jobs
Tutorial: Train an object detection model
with AutoML and Python
Article • 11/07/2023
In this tutorial, you learn how to train an object detection model using Azure Machine
Learning automated ML with the Azure Machine Learning CLI extension v2 or the Azure
Machine Learning Python SDK v2. This object detection model identifies whether the
image contains objects, such as a can, carton, milk bottle, or water bottle.
You write code using the Python SDK in this tutorial and learn the following tasks:
Prerequisites
To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
Download and unzip the odFridgeObjects.zip data file. The dataset is annotated
in Pascal VOC format, where each image corresponds to an xml file. Each xml file
contains information on where its corresponding image file is located and also
contains information about the bounding boxes and the object labels. In order to
use this data, you first need to convert it to the required JSONL format as seen in
the Convert the downloaded data to JSONL section of the notebook.
Use a compute instance to follow this tutorial without further installation. (See how
to create a compute instance.) Or install the CLI/SDK to use your own local
environment.
Azure CLI
7 Note
To try serverless compute (preview), skip this step and proceed to Experiment
setup.
You first need to set up a compute target to use for your automated ML model training.
Automated ML models for image tasks require GPU SKUs.
This tutorial uses the NCsv3-series (with V100 GPUs) as this type of compute target uses
multiple GPUs to speed up training. Additionally, you can set up multiple nodes to take
advantage of parallelism when tuning hyperparameters for your model.
The following code creates a GPU compute of size Standard_NC24s_v3 with four nodes.
Azure CLI
yml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: gpu-cluster
type: amlcompute
size: Standard_NC24s_v3
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 120
To create the compute, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml compute create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
Experiment setup
You can use an Experiment to track your model training jobs.
Azure CLI
YAML
experiment_name: dpv2-cli-automl-image-object-detection-experiment
Python
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image as pil_image
import numpy as np
import json
import os
label_to_color_mapping = {}
for gt in ground_truth_boxes:
    label = gt["label"]
    if label in label_to_color_mapping:
        color = label_to_color_mapping[label]
    else:
        # Generate a random color. If you want to use a specific color, you can use something like "red".
        color = np.random.rand(3)
        label_to_color_mapping[label] = color

    # Display label
    ax.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)

plt.show()
Using the above helper functions, for any given image, you can run the following code
to display the bounding boxes.
Python
image_file = "./odFridgeObjects/images/31.jpg"
jsonl_file = "./odFridgeObjects/train_annotations.jsonl"
plot_ground_truth_boxes_jsonl(image_file, jsonl_file)
Azure CLI
yml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder
To upload the images as a data asset, you run the following CLI v2 command with
the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
The next step is to create an MLTable from your data in JSONL format, as shown below. MLTable packages your data into a consumable object for training.
YAML
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
Azure CLI
The following configuration creates training and validation data from the MLTable.
YAML
target_column_name: label
training_data:
  path: data/training-mltable-folder
  type: mltable
validation_data:
  path: data/validation-mltable-folder
  type: mltable
Azure CLI
APPLIES TO: Azure CLI ml extension v2 (current)
yml
resources:
  instance_type: Standard_NC24s_v3
  instance_count: 4
yml
task: image_object_detection
primary_metric: mean_average_precision
compute: azureml:gpu-cluster
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
In your AutoML job, you can perform an automatic hyperparameter sweep in order to
find the optimal model (we call this functionality AutoMode). You only specify the
number of trials; the hyperparameter search space, sampling method and early
termination policy aren't needed. The system will automatically determine the region of
the hyperparameter space to sweep based on the number of trials. A value between 10
and 20 will likely work well on many datasets.
Azure CLI
limits:
  max_trials: 10
  max_concurrent_trials: 2
Azure CLI
To submit your AutoML job, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml job create --file [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
In this example, we'll train an object detection model with yolov5 and fasterrcnn_resnet50_fpn, both of which are pretrained on COCO, a large-scale object detection dataset.
You can perform a hyperparameter sweep over a defined search space to find the
optimal model.
Job limits
You can control the resources spent on your AutoML Image training job by specifying
the timeout_minutes , max_trials and the max_concurrent_trials for the job in limit settings. Refer to the detailed description of the job limits parameters.
Azure CLI
YAML
limits:
  timeout_minutes: 60
  max_trials: 10
  max_concurrent_trials: 2
The following code defines the search space in preparation for the hyperparameter
sweep for each defined architecture, yolov5 and fasterrcnn_resnet50_fpn . In the search
space, specify the range of values for learning_rate , optimizer , lr_scheduler , etc., for
AutoML to choose from as it attempts to generate a model with the optimal primary
metric. If hyperparameter values aren't specified, then default values are used for each
architecture.
For the tuning settings, use random sampling to pick samples from this parameter space by using the random sampling_algorithm. The job limits configured above tell automated ML to try a total of 10 trials with these different samples, running two trials at a time on our compute target, which was set up using four nodes. The more parameters the search space has, the more trials you need to find optimal models.
The Bandit early termination policy is also used. This policy terminates poorly performing trials; that is, trials that aren't within 20% slack of the best performing trial. Terminating these trials early significantly saves compute resources.
Azure CLI
YAML
sweep:
  sampling_algorithm: random
  early_termination:
    type: bandit
    evaluation_interval: 2
    slack_factor: 0.2
    delay_evaluation: 6
YAML
search_space:
  - model_name:
      type: choice
      values: [yolov5]
    learning_rate:
      type: uniform
      min_value: 0.0001
      max_value: 0.01
    model_size:
      type: choice
      values: [small, medium]
  - model_name:
      type: choice
      values: [fasterrcnn_resnet50_fpn]
    learning_rate:
      type: uniform
      min_value: 0.0001
      max_value: 0.001
    optimizer:
      type: choice
      values: [sgd, adam, adamw]
    min_size:
      type: choice
      values: [600, 800]
Once the search space and sweep settings are defined, you can then submit the job to
train an image model using your training dataset.
Azure CLI
To submit your AutoML job, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml job create --file [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
When doing a hyperparameter sweep, it can be useful to visualize the different trials that were tried using the HyperDrive UI. You can navigate to this UI by going to the 'Child jobs' tab in the UI of the main automl_image_job from above, which is the HyperDrive parent job. Then you can go into its 'Child jobs' tab. Alternatively, you can open the HyperDrive parent job directly and navigate to its 'Child jobs' tab.
After you register the model you want to use, you can deploy it using a managed online endpoint.
Azure CLI
YAML
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: od-fridge-items-endpoint
auth_mode: key
Azure CLI
Azure CLI
We can also create a batch endpoint for batch inferencing on large volumes of data over
a period of time. Check out the object detection batch scoring notebook for batch
inferencing using the batch endpoint.
Configure online deployment
A deployment is a set of resources required for hosting the model that does the actual
inferencing. We create a deployment for our endpoint using the
ManagedOnlineDeployment class. You can use either GPU or CPU VM SKUs for your
deployment cluster.
Azure CLI
YAML
name: od-fridge-items-mlflow-deploy
endpoint_name: od-fridge-items-endpoint
model: azureml:od-fridge-items-mlflow-model@latest
instance_type: Standard_DS3_v2
instance_count: 1
liveness_probe:
  failure_threshold: 30
  success_threshold: 1
  timeout: 2
  period: 10
  initial_delay: 2000
readiness_probe:
  failure_threshold: 10
  success_threshold: 1
  timeout: 10
  period: 10
  initial_delay: 2000
Azure CLI
Update traffic:
By default, the current deployment is set to receive 0% traffic. You can set the traffic percentage that the current deployment should receive. The sum of the traffic percentages of all deployments with one endpoint shouldn't exceed 100%.
Azure CLI
Azure CLI
YAML
Visualize detections
Now that you have scored a test image, you can visualize the bounding boxes for this
image. To do so, be sure you have matplotlib installed.
Azure CLI
Clean up resources
Don't complete this section if you plan on running other Azure Machine Learning
tutorials.
If you don't plan to use the resources you created, delete them, so you don't incur any
charges.
You can also keep the resource group but delete a single workspace. Display the
workspace properties and select Delete.
Next steps
In this automated machine learning tutorial, you did the following tasks:
Learn how to set up AutoML to train computer vision models with Python.
Code examples:
Azure CLI
APPLIES TO: Azure CLI ml extension v2 (current)
Review detailed code examples and use cases in the azureml-examples
repository for automated machine learning samples . Check the folders
with 'cli-automl-image-' prefix for samples specific to building computer
vision models.
7 Note
The fridge objects dataset is available under the MIT License.
Tutorial: Train a classification model with
no-code AutoML in the Azure Machine
Learning studio
Article • 08/09/2023
Learn how to train a classification model with no-code AutoML using Azure Machine
Learning automated ML in the Azure Machine Learning studio. This classification model
predicts if a client will subscribe to a fixed term deposit with a financial institution.
With automated ML, you can automate away time-intensive tasks. Automated machine
learning rapidly iterates over many combinations of algorithms and hyperparameters to
help you find the best model based on a success metric of your choosing.
You won't write any code in this tutorial; you'll use the studio interface to perform training. You'll learn how to do the following tasks:
Also try automated machine learning for these other model types:
For a no-code example of forecasting, see Tutorial: Demand forecasting & AutoML.
For a code-first example of an object detection model, see the Tutorial: Train an object detection model with AutoML and Python.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account .
Create a workspace
An Azure Machine Learning workspace is a foundational resource in the cloud that you
use to experiment, train, and deploy machine learning models. It ties your Azure
subscription and resource group to an easily consumed object in the service.
In this tutorial, complete the following steps to create a workspace and continue the tutorial.
Workspace name: Enter a unique name that identifies your workspace. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others. The workspace name is case-insensitive.
Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. You need contributor or owner role to use an existing resource group. For more information about access, see Manage access to an Azure Machine Learning workspace.
Region: Select the Azure region closest to your users and the data resources to create your workspace.
For more information on Azure resources, refer to the steps in this article: Create resources you need to get started.
For other ways to create a workspace in Azure, Manage Azure Machine Learning
workspaces in the portal or with the Python SDK (v2).
1. Create a new data asset by selecting From local files from the +Create data asset
drop-down.
a. On the Basic info form, give your data asset a name and provide an optional
description. The automated ML interface currently only supports
TabularDatasets, so the dataset type should default to Tabular.
c. On the Datastore and file selection form, select the default datastore that was
automatically set up during your workspace creation, workspaceblobstore
(Azure Blob Storage). This is where you'll upload your data file to make it
available to your workspace.
f. Select Next on the bottom left, to upload it to the default container that was
automatically set up during your workspace creation.
When the upload is complete, the Settings and preview form is pre-populated
based on the file type.
g. Verify that your data is properly formatted via the Schema form. The data
should be populated as follows. After you verify that the data is accurate, select
Next.
File format: Defines the layout and type of data stored in a file. Value for tutorial: Delimited
Column headers: Indicates how the headers of the dataset, if any, will be treated. Value for tutorial: All files have same headers
Skip rows: Indicates how many, if any, rows are skipped in the dataset. Value for tutorial: None
h. The Schema form allows for further configuration of your data for this
experiment. For this example, select the toggle switch for the day_of_week, so
as to not include it. Select Next.
i. On the Confirm details form, verify the information matches what was
previously populated on the Basic info, Datastore and file selection and
Settings and preview forms.
l. Review the data by selecting the data asset and looking at the preview tab that
populates to ensure you didn't include day_of_week then, select Close.
m. Select Next.
Configure job
After you load and configure your data, you can set up your experiment. This setup
includes experiment design tasks such as, selecting the size of your compute
environment and specifying what column you want to predict.
b. Select y as the target column, what you want to predict. This column indicates
whether the client subscribed to a term deposit or not.
c. Select compute cluster as your compute type.
Virtual machine type: Select the virtual machine type for your compute. Value for tutorial: CPU (Central Processing Unit)
Virtual machine size: Select the virtual machine size for your compute. A list of recommended sizes is provided based on your data and experiment type. Value for tutorial: Standard_DS12_V2
Min / Max nodes: To profile data, you must specify one or more nodes. Value for tutorial: Min nodes: 1, Max nodes: 6
Idle seconds before scale down: Idle time before the cluster is automatically scaled down to the minimum node count. Value for tutorial: 120 (default)
iv. After creation, select your new compute target from the drop-down list.
e. Select Next.
3. On the Select task and settings form, complete the setup for your automated ML
experiment by specifying the machine learning task type and configuration
settings.
Additional classification settings: These settings help improve the accuracy of your model. Value for tutorial: Positive class label: None
Select Save.
c. Select Next.
5. Select Finish to run the experiment. The Job Detail screen opens with the Job
status at the top as the experiment preparation begins. This status updates as the
experiment progresses. Notifications also appear in the top right corner of the
studio to inform you of the status of your experiment.
) Important
Experiment preparation takes 10 to 15 minutes. Once the experiment is running, each iteration takes 2 to 3 minutes more.
In production, you'd likely walk away for a bit. But for this tutorial, we suggest you
start exploring the tested algorithms on the Models tab as they complete while the
others are still running.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the
models are ordered by metric score as they complete. For this tutorial, the model that
scores the highest based on the chosen AUC_weighted metric is at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of
a completed model to explore its performance details.
The following navigates through the Details and the Metrics tabs to view the selected
model's properties, metrics, and performance charts.
Model explanations
While you wait for the models to complete, you can also take a look at model
explanations and see which data features (raw or engineered) influenced a particular
model's predictions.
These model explanations can be generated on demand, and are summarized in the
model explanations dashboard that's part of the Explanations (preview) tab.
4. Select the Explain model button at the top. On the right, the Explain model pane
appears.
5. Select the automl-compute that you created previously. This compute cluster
initiates a child job to generate the model explanations.
6. Select Create at the bottom. A green success message appears towards the top of
your screen.
7 Note
7. Select the Explanations (preview) button. This tab populates once the
explainability run completes.
8. On the left hand side, expand the pane and select the row that says raw under
Features.
9. Select the Aggregate feature importance tab on the right. This chart shows which
data features influenced the predictions of the selected model.
In this example, the duration appears to have the most influence on the predictions
of this model.
Deploy the best model
The automated machine learning interface allows you to deploy the best model as a
web service in a few steps. Deployment is the integration of the model so it can predict
on new data and identify potential areas of opportunity.
For this experiment, deployment to a web service means that the financial institution
now has an iterative and scalable web solution for identifying potential fixed term
deposit customers.
Check to see if your experiment run is complete. To do so, navigate back to the parent
job page by selecting Job 1 at the top of your screen. A Completed status is shown on
the top left of the screen.
Once the experiment run is complete, the Details page is populated with a Best model
summary section. In this experiment context, VotingEnsemble is considered the best
model, based on the AUC_weighted metric.
We deploy this model, but be advised, deployment takes about 20 minutes to complete.
The deployment process entails several steps including registering the model,
generating resources, and configuring them for the web service.
2. Select the Deploy menu in the top-left and select Deploy to web service.
Enable authentication: Disable.
Use custom deployments: Disable. Allows for the default driver file (scoring script) and environment file to be auto-generated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy.
A green success message appears at the top of the Job screen, and in the Model
summary pane, a status message appears under Deploy status. Select Refresh
periodically to check the deployment status.
Proceed to the Next steps to learn more about how to consume your new web service, and test your predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store.
Delete only the deployment files to minimize costs to your account, or if you want to
keep your workspace and experiment files. Otherwise, delete the entire resource group,
if you don't plan to use any of the files.
3. Select Proceed.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
In this automated machine learning tutorial, you used Azure Machine Learning's
automated ML interface to create and deploy a classification model. See these articles
for more information and next steps:
7 Note
This Bank Marketing dataset is made available under the Creative Commons (CCO:
Public Domain) License . Any rights in individual contents of the database are
licensed under the Database Contents License and available on Kaggle . This
dataset was originally available within the UCI Machine Learning Database .
Learn how to create a time-series forecasting model without writing a single line of code
using automated machine learning in the Azure Machine Learning studio. This model
predicts rental demand for a bike sharing service.
You don't write any code in this tutorial, you use the studio interface to perform training.
You learn how to do the following tasks:
Also try automated machine learning for these other model types:
Prerequisites
An Azure Machine Learning workspace. See Create workspace resources.
1. On the Select dataset form, select From local files from the +Create dataset drop-
down.
a. On the Basic info form, give your dataset a name and provide an optional
description. The dataset type should default to Tabular, since automated ML in
Azure Machine Learning studio currently only supports tabular datasets.
c. On the Datastore and file selection form, select the default datastore that was
automatically set up during your workspace creation, workspaceblobstore
(Azure Blob Storage). This is the storage location where you upload your data
file.
e. Choose the bike-no.csv file on your local computer. This is the file you downloaded as a prerequisite.
f. Select Next
When the upload is complete, the Settings and preview form is pre-populated
based on the file type.
g. Verify that the Settings and preview form is populated as follows and select
Next.
File format: Defines the layout and type of data stored in a file. Value for tutorial: Delimited
Column headers: Indicates how the headers of the dataset, if any, will be treated. Value for tutorial: Only first file has headers
Skip rows: Indicates how many, if any, rows are skipped in the dataset. Value for tutorial: None
h. The Schema form allows for further configuration of your data for this
experiment.
i. For this example, choose to ignore the casual and registered columns. These columns are a breakdown of the cnt column, so we don't include them.
ii. Also for this example, leave the defaults for the Properties and Type.
i. On the Confirm details form, verify the information matches what was
previously populated on the Basic info and Settings and preview forms.
l. Select Next.
Configure job
After you load and configure your data, set up your remote compute target and select
which column in your data you want to predict.
b. Select cnt as the target column, what you want to predict. This column indicates
the number of total bike share rentals.
c. Select compute cluster as your compute type.
Virtual machine type: Select the virtual machine type for your compute. Value for tutorial: CPU (Central Processing Unit)
Virtual machine size: Select the virtual machine size for your compute. A list of recommended sizes is provided based on your data and experiment type. Value for tutorial: Standard_DS12_V2
Min / Max nodes: To profile data, you must specify one or more nodes. Value for tutorial: Min nodes: 1, Max nodes: 6
Idle seconds before scale down: Idle time before the cluster is automatically scaled down to the minimum node count. Value for tutorial: 120 (default)
iv. After creation, select your new compute target from the drop-down list.
e. Select Next.
1. On the Task type and settings form, select Time series forecasting as the machine
learning task type.
2. Select date as your Time column and leave Time series identifiers blank.
3. The Frequency is how often your historic data is collected. Keep Autodetect
selected.
4. The forecast horizon is the length of time into the future you want to predict.
Deselect Autodetect and type 14 in the field.
5. Select View additional configuration settings and populate the fields as follows.
These settings are to better control the training job and specify settings for your
forecast. Otherwise, defaults are applied based on experiment selection and data.
Additional configurations | Description | Value for tutorial
Exit criterion | If a criterion is met, the training job is stopped. | Training job time (hours): 3; Metric score threshold: None
Select Save.
7. Select Next.
Run experiment
To run your experiment, select Finish. The Job details screen opens with the Job status
at the top next to the job number. This status updates as the experiment progresses.
Notifications also appear in the top right corner of the studio, to inform you of the
status of your experiment.
) Important
It takes 10-15 minutes to prepare the experiment job. Once running, each iteration takes 2-3 minutes more.
In production, you'd likely walk away for a bit as this process takes time. While you
wait, we suggest you start exploring the tested algorithms on the Models tab as
they complete.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the
models are ordered by metric score as they complete. For this tutorial, the model that
scores the highest based on the chosen Normalized root mean squared error metric is
at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of
a completed model to explore its performance details.
In the following example, you select a model from the list of models that the job created. Then, you use the Overview and the Metrics tabs to view the selected model's properties, metrics, and performance charts.
For this experiment, deployment to a web service means that the bike share company
now has an iterative and scalable web solution for forecasting bike share rental demand.
Once the job is complete, navigate back to the parent job page by selecting Job 1 at the top of your screen.
In the Best model summary section, the best model in the context of this experiment is selected based on the Normalized root mean squared error metric.
We deploy this model, but be advised, deployment takes about 20 minutes to complete.
The deployment process entails several steps including registering the model,
generating resources, and configuring them for the web service.
2. Select the Deploy button located in the top-left area of the screen.
Field | Value for tutorial
Use custom deployment assets | Disable. Disabling allows the default driver file (scoring script) and environment file to be autogenerated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy.
A green success message appears at the top of the Job screen stating that the
deployment was started successfully. The progress of the deployment can be
found in the Model summary pane under Deploy status.
Proceed to the Next steps to learn more about how to consume your new web service, and test your predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your workspace and experiment files, delete only the deployment files to minimize costs to your account. Otherwise, delete the entire resource group if you don't plan to use any of the files.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
2. From the list, select the resource group that you created.
Next steps
In this tutorial, you used automated ML in the Azure Machine Learning studio to create
and deploy a time series forecasting model that predicts bike share rental demand.
See this article for steps on how to create a Power BI supported schema to facilitate
consumption of your newly deployed web service:
7 Note
This bike share dataset has been modified for this tutorial. This dataset was made
available as part of a Kaggle competition and was originally available via Capital
Bikeshare . It can also be found within the UCI Machine Learning Database .
Source: Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble
detectors and background knowledge, Progress in Artificial Intelligence (2013): pp.
1-15, Springer Berlin Heidelberg.
Tutorial: Train an image classification
TensorFlow model using the Azure
Machine Learning Visual Studio Code
Extension (preview)
Article • 11/15/2023
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning . If you're using the free subscription, only CPU clusters
are supported.
Install Visual Studio Code , a lightweight, cross-platform code editor.
Azure Machine Learning Visual Studio Code extension. For installation instructions, see the Setup Azure Machine Learning Visual Studio Code extension guide.
CLI (v2). For installation instructions, see Install, set up, and use the CLI (v2)
Clone the community driven repository
Bash
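# The repository URL isn't shown in this extract. A minimal sketch, assuming the
# community-driven repository is Azure/azureml-examples and that the TensorFlow
# MNIST job lives under cli/jobs/single-step/tensorflow/mnist (verify the path
# in your clone before use):
git clone https://github.com/Azure/azureml-examples.git
cd azureml-examples/cli/jobs/single-step/tensorflow/mnist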
Create a workspace
The first thing you have to do to build an application in Azure Machine Learning is to
create a workspace. A workspace contains the resources to train models as well as the
trained models themselves. For more information, see what is a workspace.
2. On the Visual Studio Code activity bar, select the Azure icon to open the Azure
Machine Learning view.
3. In the Azure Machine Learning view, right-click your subscription node and select
Create Workspace.
4. A specification file appears. Configure the specification file with the following
options.
yml
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: TeamWorkspace
location: WestUS2
display_name: team-ml-workspace
description: A workspace for training machine learning models
tags:
purpose: training
team: ml-team
5. Right-click the specification file and select AzureML: Execute YAML. Creating a
resource uses the configuration options defined in the YAML specification file and
submits a job using the CLI (v2). At this point, a request to Azure is made to create
a new workspace and dependent resources in your account. After a few minutes,
the new workspace appears in your subscription node.
6. Set TeamWorkspace as your default workspace. Doing so places resources and jobs
you create in the workspace by default. Select the Set Azure Machine Learning
Workspace button on the Visual Studio Code status bar and follow the prompts to
set TeamWorkspace as your default workspace.
Like workspaces and compute targets, training jobs are defined using resource templates. For this sample, the specification is defined in the job.yml file, which looks like the following:
yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >
python train.py
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:48
resources:
instance_type: Standard_NC12
instance_count: 3
experiment_name: tensorflow-mnist-example
description: Train a basic neural network with TensorFlow on the MNIST dataset.
At this point, a request is sent to Azure to run your experiment on the selected compute
target in your workspace. This process takes several minutes. The amount of time to run
the training job is impacted by several factors like the compute type and training data
size. To track the progress of your experiment, right-click the current run node and
select View Job in Azure portal.
When the dialog requesting to open an external website appears, select Open.
When the model is done training, the status label next to the run node updates to
"Completed".
Next steps
In this tutorial, you learned the following tasks:
Launch Visual Studio Code integrated with Azure Machine Learning (preview)
For a walkthrough of how to edit, run, and debug code locally, see the Python
hello-world tutorial .
Run Jupyter Notebooks in Visual Studio Code using a remote Jupyter server.
For a walkthrough of how to train with Azure Machine Learning outside of Visual
Studio Code, see Tutorial: Train and deploy a model with Azure Machine Learning.
Tutorial 1: Develop and register a feature
set with managed feature store
Article • 11/28/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
You can use Azure Machine Learning managed feature store to discover, create, and
operationalize features. The machine learning lifecycle includes a prototyping phase,
where you experiment with various features. It also involves an operationalization phase,
where models are deployed and inference steps look up feature data. Features serve as
the connective tissue in the machine learning lifecycle. To learn more about basic
concepts for managed feature store, see What is managed feature store? and
Understanding top-level entities in managed feature store.
This tutorial describes how to create a feature set specification with custom
transformations. It then uses that feature set to generate training data, enable
materialization, and perform a backfill. Materialization computes the feature values for a
feature window, and then stores those values in a materialization store. All feature
queries can then use those values from the materialization store.
Without materialization, a feature set query applies the transformations to the source on
the fly, to compute the features before it returns the values. This process works well for
the prototyping phase. However, for training and inference operations in a production
environment, we recommend that you materialize the features, for greater reliability and
availability.
This tutorial is the first part of the managed feature store tutorial series. Here, you learn
how to:
The SDK-only track uses only Python SDKs. Choose this track for pure, Python-
based development and deployment.
The SDK and CLI track uses the Python SDK for feature set development and
testing only, and it uses the CLI for CRUD (create, read, update, and delete)
operations. This track is useful in continuous integration and continuous delivery
(CI/CD) or GitOps scenarios, where CLI/YAML is preferred.
Prerequisites
Before you proceed with this tutorial, be sure to cover these prerequisites:
On your user account, the Owner role for the resource group where the feature
store is created.
If you choose to use a new resource group for this tutorial, you can easily delete all
the resources by deleting the resource group.
1. In the Azure Machine Learning studio environment, select Notebooks on the left
pane, and then select the Samples tab.
2. Browse to the featurestore_sample directory (select Samples > SDK v2 > sdk >
python > featurestore_sample), and then select Clone.
3. The Select target directory panel opens. Select the Users directory, then select
your user name, and finally select Clone.
4. To configure the notebook environment, you must upload the conda.yml file:
a. Select Notebooks on the left pane, and then select the Files tab.
b. Browse to the env directory (select Users > your_user_name >
featurestore_sample > project > env), and then select the conda.yml file.
c. Select Download.
5. In the Azure Machine Learning environment, open the notebook, and then select
Configure session.
8. Select Apply.
# Run this cell to start the Spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
7 Note
You use a feature store to reuse features across projects. You use a project
workspace (an Azure Machine Learning workspace) to train inference models, by
taking advantage of features from feature stores. Many project workspaces can
share and reuse the same feature store.
SDK track
You use the same MLClient (package name azure-ai-ml ) SDK that you use
with the Azure Machine Learning workspace. A feature store is implemented
as a type of workspace. As a result, this SDK is used for CRUD operations for
feature stores, feature sets, and feature store entities.
This tutorial doesn't require explicit installation of those SDKs, because the earlier
conda.yml instructions cover this step.
Python
featurestore_name = "<FEATURESTORE_NAME>"
featurestore_location = "eastus"
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
SDK track
Python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStore
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

ml_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
)

fs = FeatureStore(name=featurestore_name, location=featurestore_location)

# Wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())
3. Initialize a feature store core SDK client for Azure Machine Learning.
As explained earlier in this tutorial, the feature store core SDK client is used to
develop and consume features.
Python
from azureml.featurestore import FeatureStoreClient

featurestore = FeatureStoreClient(
    credential=AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
    name=featurestore_name,
)
4. Grant the "Azure Machine Learning Data Scientist" role on the feature store to your
user identity. Obtain your Microsoft Entra object ID value from the Azure portal, as
described in Find the user object ID.
Assign the AzureML Data Scientist role to your user identity, so that it can create resources in the feature store workspace. The permissions might need some time to propagate.
For more information about access control, see Manage access control for managed feature store.
Python
your_aad_objectid = "<USER_AAD_OBJECTID>"
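# The role-assignment cell itself isn't shown in this extract. A minimal sketch
# with the Azure authorization SDK; the role definition ID below is assumed to
# be the built-in "AzureML Data Scientist" role - verify it in your tenant.
import uuid

from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), featurestore_subscription_id
)
scope = (
    f"/subscriptions/{featurestore_subscription_id}"
    f"/resourceGroups/{featurestore_resource_group_name}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}"
)
role_definition_id = (
    f"/subscriptions/{featurestore_subscription_id}/providers"
    "/Microsoft.Authorization/roleDefinitions/f6c7c914-8db3-469d-8ca1-694a8f32e121"
)
auth_client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # each role assignment needs a unique GUID name
    RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id=your_aad_objectid,
        principal_type="User",
    ),
)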
This notebook uses sample data hosted in a publicly accessible blob container. It
can be read into Spark only through a wasbs driver. When you create feature sets
by using your own source data, host them in an Azure Data Lake Storage Gen2
account, and use an abfss driver in the data path.
Python
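# The data-exploration cell isn't shown in this extract. A minimal sketch that
# loads the sample source data from the public path used later in this tutorial:
transactions_source_data_path = "wasbs://[email protected]/feature-store-prp/datasources/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)
display(transactions_src_df.head(5))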
To learn more about the feature set and transformations, see What is managed
feature store?.
Python
transactions_featureset_code_path = (
    root_dir + "/featurestore/featuresets/transactions/transformation_code"
)

transactions_featureset_spec = create_feature_set_spec(
    source=ParquetFeatureSource(
        path="wasbs://[email protected]/feature-store-prp/datasources/transactions-source/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
    ),
    feature_transformation=TransformationCode(
        path=transactions_featureset_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)
To register the feature set specification with the feature store, you must save that
specification in a specific format.
Review the generated transactions feature set specification. Open this file from the file tree to see the specification: featurestore/featuresets/transactions/spec/FeaturesetSpec.yaml.
The specification contains these elements:
feature_transformation: If you provide transformation code, the code must return a DataFrame that maps to the features and datatypes.
index_columns: The join keys required to access values from the feature set.
Persisting the feature set specification offers another benefit: the feature set
specification can be source controlled.
Python
import os

# Folder for the dumped specification; this path matches the spec file
# referenced above.
transactions_featureset_spec_folder = root_dir + "/featurestore/featuresets/transactions/spec"

transactions_featureset_spec.dump(transactions_featureset_spec_folder, overwrite=False)
SDK track
Python
Create an account entity that has the join key accountID of type string .
Python
from azure.ai.ml.entities import DataColumn, DataColumnType, FeatureStoreEntity

account_entity_config = FeatureStoreEntity(
    name="account",
    version="1",
    index_columns=[DataColumn(name="accountID", type=DataColumnType.STRING)],
    stage="Development",
    description="This entity represents user account index key accountID.",
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_store_entities.begin_create_or_update(account_entity_config)
print(poller.result())
SDK track
Python
from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset_config = FeatureSet(
    name="transactions",
    version="1",
    description="7-day and 3-day rolling aggregation of transactions featureset",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=transactions_featureset_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(transaction_fset_config)
print(poller.result())
SDK track
1. Obtain your Microsoft Entra object ID value from the Azure portal, as
described in Find the user object ID.
2. Obtain information about the offline materialization store from the Feature
Store Overview page in the Feature Store UI. You can find the values for the
storage account subscription ID, storage account resource group name, and
storage account name for offline materialization store in the Offline
materialization store card.
For more information about access control, see Manage access control for
managed feature store.
Execute this code cell for role assignment. The permissions might need some
time to propagate.
Python
your_aad_objectid = "<USER_AAD_OBJECTID>"
storage_subscription_id = "<SUBSCRIPTION_ID>"
storage_resource_group_name = "<RESOURCE_GROUP>"
storage_account_name = "<STORAGE_ACCOUNT_NAME>"
grant_user_aad_storage_data_reader_role(
AzureMLOnBehalfOfCredential(),
your_aad_objectid,
storage_subscription_id,
storage_resource_group_name,
storage_account_name,
)
Generate a training data DataFrame by using
the registered feature set
1. Load observation data.
Observation data typically involves the core data used for training and inferencing.
This data joins with the feature data to create the full training data resource.
Observation data is data captured during the event itself. Here, it has core
transaction data, including transaction ID, account ID, and transaction amount
values. Because you use it for training, it also has an appended target variable
(is_fraud).
Python
observation_data_path = "wasbs://[email protected]/feature-store-prp/observation_data/train/*.parquet"
observation_data_df = spark.read.parquet(observation_data_path)
obs_data_timestamp_column = "timestamp"

display(observation_data_df)
# Note: The timestamp column is displayed in a different format. Optionally, you can call observation_data_df.show() to see a correctly formatted value.
Python
Python
3. Select the features that become part of the training data. Then, use the feature
store SDK to generate the training data itself.
Python
from azureml.featurestore import get_offline_features

# The cell that defines the initial `features` list and `more_features` isn't
# shown here; it selects features from the registered feature sets.
more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)
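The cell that actually generates the training data isn't shown in this extract. A minimal sketch, assuming the observation DataFrame and timestamp column from step 1, and the get_offline_features() point-in-time join that this tutorial describes later:
Python

from azureml.featurestore import get_offline_features

# Point-in-time join of the observation data with the selected features.
training_df = get_offline_features(
    features=features,
    observation_data=observation_data_df,
    timestamp_column=obs_data_timestamp_column,
)
display(training_df)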
SDK track
Set spark.sql.shuffle.partitions in the YAML file according to the feature data size.
7 Note
The sample data used in this notebook is small. Therefore, this parameter is set
to 1 in the featureset_asset_offline_enabled.yaml file.
Python
from azure.ai.ml.entities import MaterializationComputeResource, MaterializationSettings

transactions_fset_config = fs_client._featuresets.get(name="transactions", version="1")

transactions_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
        "spark.sql.shuffle.partitions": 1,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
You can also save the feature set asset as a YAML resource.
SDK track
Python
## uncomment to run
transactions_fset_config.dump(
root_dir
+
"/featurestore/featuresets/transactions/featureset_asset_offline_enabled
.yaml"
)
7 Note
You might need to determine a backfill data window value. The window must
match the window of your training data. For example, to use 18 months of data for
training, you must retrieve features for 18 months. This means you should backfill
for an 18-month window.
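As a sketch, here's how you could compute an 18-month backfill window with the standard library (the exact boundaries are an assumption; match them to your own training data window):
Python

from datetime import datetime, timedelta

# An 18-month window ending now, expressed approximately in days.
et = datetime.now()
st = et - timedelta(days=548)  # roughly 18 months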
SDK track
This code cell materializes data that currently has a status of None or Incomplete for the defined feature window.
Python
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version="1",
feature_window_start_time=st,
feature_window_end_time=et,
data_status=[DataAvailabilityStatus.NONE],
)
print(poller.result().job_ids)
Python
Tip
Print sample data from the feature set. The output information shows that the data was retrieved from the materialization store. The get_offline_features() method retrieves the training and inference data, and it uses the materialization store by default.
Python
# Look up the feature set by providing a name and a version, and display a few records.
transactions_featureset = featurestore.feature_sets.get("transactions", "1")
display(transactions_featureset.to_spark_dataframe().head(5))
3. From the list of accessible feature stores, select the feature store for which you
performed backfill.
The data can have a maximum of 2,000 data intervals. If your data contains more than 2,000 data intervals, create a new feature set version.
You can provide a list of more than one data status value (for example, ["None", "Incomplete"]) in a single backfill job.
During backfill, a new materialization job is submitted for each data interval that
falls within the defined feature window.
If a materialization job is pending, or that job is running for a data interval that
hasn't yet been backfilled, a new job isn't submitted for that data interval.
7 Note
SDK track
Python
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version=version,
job_id="<JOB_ID_OF_FAILED_MATERIALIZATION_JOB>",
)
print(poller.result().job_ids)
This tutorial built the training data with features from the feature store, enabled
materialization to offline feature store, and performed a backfill. Next, you'll run model
training using these features.
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
See the next tutorial in the series: Experiment and train models by using features.
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 2: Experiment and train models
by using features
Article • 11/15/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
The first tutorial showed how to create a feature set specification with custom
transformations, and then use that feature set to generate training data, enable
materialization, and perform a backfill. This tutorial shows how to enable materialization,
and perform a backfill. It also shows how to experiment with features, as a way to
improve model performance.
Prerequisites
Before you proceed with this tutorial, be sure to complete the first tutorial in the series.
Set up
1. Configure the Azure Machine Learning Spark notebook.
You can create a new notebook and execute the instructions in this tutorial step by
step. You can also open and run the existing notebook named 2. Experiment and
train models using features.ipynb from the featurestore_sample/notebooks directory.
You can choose sdk_only or sdk_and_cli. Keep this tutorial open and refer to it for
documentation links and more explanation.
a. On the top menu, in the Compute dropdown list, select Serverless Spark
Compute under Azure Machine Learning Serverless Spark.
Python
# run this cell to start the spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Python SDK
Not applicable.
This is the current workspace, and the tutorial notebook runs in this resource.
Python
### Initialize the MLClient of this project workspace
import os
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]

# Connect to the project workspace
ws_client = MLClient(
    AzureMLOnBehalfOfCredential(), project_ws_sub_id, project_ws_rg, project_ws_name
)
Python
# feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
Python
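# The cell isn't shown in this extract. A likely sketch that creates the feature
# store CRUD client used later in this tutorial, mirroring the project
# workspace client above.
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)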
You need this compute cluster when you run the training/batch inference jobs.
Python
from azure.ai.ml.entities import AmlCompute

cluster_basic = AmlCompute(
    name="cpu-cluster-fs",
    type="amlcompute",
    size="STANDARD_F4S_V2",  # you can replace it with other supported VM SKUs
    location=ws_client.workspaces.get(ws_client.workspace_name).location,
    min_instances=0,
    max_instances=1,
    idle_time_before_scale_down=360,
)
ws_client.begin_create_or_update(cluster_basic).result()
To onboard precomputed features, you can create a feature set specification without
writing any transformation code. You use a feature set specification to develop and test
a feature set in a fully local development environment.
You don't need to connect to a feature store. In this procedure, you create the feature
set specification locally, and then sample the values from it. For capabilities of managed
feature store, you must use a feature asset definition to register the feature set
specification with a feature store. Later steps in this tutorial provide more details.
Python
accounts_data_path = "wasbs://[email protected]/feature-store-prp/datasources/accounts-precalculated/*.parquet"
accounts_df = spark.read.parquet(accounts_data_path)

display(accounts_df.head(5))
2. Create the accounts feature set specification locally, from these precomputed
features.
You don't need any transformation code here, because you reference
precomputed features.
Python
accounts_featureset_spec = create_feature_set_spec(
    source=ParquetFeatureSource(
        path="wasbs://[email protected]/feature-store-prp/datasources/accounts-precalculated/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    # account profiles in the source are updated once a year. Set temporal_join_lookback to 365 days.
    temporal_join_lookback=DateTimeOffset(days=365, hours=0, minutes=0),
    infer_schema=True,
)
To register the feature set specification with the feature store, you must save the
feature set specification in a specific format.
After you run the next cell, inspect the generated accounts feature set
specification. To see the specification, open the
featurestore/featuresets/accounts/spec/FeatureSetSpec.yaml file from the file tree.
feature_transformation: If you provide transformation code, the code must return a DataFrame that maps to the features and datatypes. Without the provided transformation code, the system builds the query to map the features and datatypes to the source. In this case, the generated accounts feature set specification doesn't contain transformation code, because features are precomputed.
index_columns : The join keys required to access values from the feature set.
To learn more, see Understanding top-level entities in managed feature store and
the CLI (v2) feature set specification YAML schema.
You don't need any transformation code here, because you reference
precomputed features.
Python
import os
Python
This step generates training data for illustrative purposes. As an option, you can
locally train models here. Later steps in this tutorial explain how to train a model in
the cloud.
Python
After you locally experiment with feature definitions, and they seem reasonable,
you can register a feature set asset definition with the feature store.
Python
accounts_fset_config = FeatureSet(
    name="accounts",
    version="1",
    description="accounts featureset",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=accounts_featureset_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(accounts_fset_config)
print(poller.result())
Python
The first tutorial covered this step, when you registered the transactions feature
set. Because you also have an accounts feature set, you can browse through the
available features:
a. Go to the Azure Machine Learning global landing page .
b. On the left pane, select Feature stores.
c. In the list of feature stores, select the feature store that you created earlier.
The UI shows the feature sets and entity that you created. Select the feature sets to
browse through the feature definitions. You can use the global search box to
search for feature sets across feature stores.
Python
3. Select features for the model, and export the model as a feature retrieval
specification.
In the previous steps, you selected features from a combination of registered and
unregistered feature sets, for local experimentation and testing. You can now
experiment in the cloud. Your model-shipping agility increases if you save the
selected features as a feature retrieval specification, and then use the specification
in the machine learning operations (MLOps) or continuous integration and
continuous delivery (CI/CD) flow for training and inference.
Python
more_features = [
    transactions_featureset.get_feature("transaction_amount_7d_sum"),
    transactions_featureset.get_feature("transaction_amount_3d_sum"),
]

more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)
The inference phase uses the feature retrieval specification to look up the features. It
integrates all phases of the machine learning lifecycle. Changes to the
training/inference pipeline can stay at a minimum as you experiment and
deploy.
Use of the feature retrieval specification and the built-in feature retrieval
component is optional. You can directly use the get_offline_features() API, as
shown earlier. The name of the specification should be
feature_retrieval_spec.yaml when it's packaged with the model. This way, the
system can recognize it.
Python
# Create feature retrieval spec
feature_retrieval_spec_folder = root_dir + "/project/fraud_model/feature_retrieval_spec"

featurestore.generate_feature_retrieval_spec(feature_retrieval_spec_folder, features)
a. Feature retrieval: For its input, this built-in component takes the feature retrieval
specification, the observation data, and the time-stamp column name. It then
generates the training data as output. It runs these steps as a managed Spark
job.
b. Training: Based on the training data, this step trains the model and then
generates a model (not yet registered).
c. Evaluation: This step validates whether the model performance and quality fall
within a threshold. (In this tutorial, it's a placeholder step for illustration
purposes.)
7 Note
In an earlier tutorial, you ran a backfill job to materialize data for the transactions feature set. The feature retrieval step reads feature values from the offline store for this feature set. The behavior is the same, even if you use the get_offline_features() API.
Python
training_pipeline_path = (
    root_dir + "/project/fraud_model/pipelines/training_pipeline.yaml"
)
training_pipeline_definition = load_job(source=training_pipeline_path)
training_pipeline_job = ws_client.jobs.create_or_update(training_pipeline_definition)

ws_client.jobs.stream(training_pipeline_job.name)
# Note: The first time it runs, each step in the pipeline can take ~15 mins. However, subsequent runs can be faster (assuming the spark pool is warm - default timeout is 30 mins).
To display the pipeline steps, select the hyperlink for the Web View
pipeline, and open it in a new window.
The feature retrieval specification is packaged along with the model. The model
registration step in the training pipeline handled this step. You created the feature
retrieval specification during experimentation. Now it's part of the model
definition. In the next tutorial, you'll see how inferencing uses it.
On the same Models page, select the Feature sets tab. This tab shows both the
transactions and accounts feature sets on which this model depends.
The feature retrieval specification determined this list when the model was
registered.
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Go to the next tutorial in the series: Enable recurrent materialization and run batch
inference.
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 3: Enable recurrent
materialization and run batch inference
Article • 11/28/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
The first tutorial showed how to create a feature set specification with custom
transformations, and then use that feature set to generate training data, enable
materialization, and perform a backfill. The second tutorial showed how to enable
materialization, and perform a backfill. It also showed how to experiment with features,
as a way to improve model performance.
Prerequisites
Before you proceed with this tutorial, be sure to complete the first and second tutorials
in the series.
Set up
1. Configure the Azure Machine Learning Spark notebook.
To run this tutorial, you can create a new notebook and execute the instructions
step by step. You can also open and run the existing notebook named 3. Enable
recurrent materialization and run batch inference. You can find that notebook, and
all the notebooks in this series, in the featurestore_sample/notebooks directory. You
can choose sdk_only or sdk_and_cli. Keep this tutorial open and refer to it for
documentation links and more explanation.
a. In the Compute dropdown list in the top nav, select Serverless Spark Compute
under Azure Machine Learning Serverless Spark.
Python
# run this cell to start the spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Python SDK
Not applicable.
5. Initialize the project workspace CRUD (create, read, update, and delete) client.
Python
Be sure to update the featurestore_name value, to reflect what you created in the
first tutorial.
Python
# feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
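The cell that creates the feature store CRUD client isn't shown in this extract; a minimal sketch, mirroring the earlier tutorials:
Python

from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)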
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
To handle inference of the model in production, you might want to set up recurrent
materialization jobs to keep the materialization store up to date. These jobs run on user-
defined schedules. The recurrent job schedule works this way:
Interval and frequency values define a window. For example, the following values
define a three-hour window:
interval = 3
frequency = Hour
The first window starts at the start_time value defined in RecurrenceTrigger , and
so on.
The first recurrent job is submitted at the start of the next window after the update
time.
Later recurrent jobs are submitted at every window after the first job.
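As a sketch of what this looks like in code (the schedule-definition cell isn't shown in this extract; RecurrenceTrigger comes from azure.ai.ml.entities, and the start time here is an arbitrary example):
Python

from datetime import datetime
from azure.ai.ml.entities import RecurrenceTrigger

# A three-hour recurrence window, matching the example values described above.
recurrence_schedule = RecurrenceTrigger(
    frequency="Hour",
    interval=3,
    start_time=datetime(2023, 4, 15, 0, 0, 0),
)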
Python
transactions_fset_config = fs_client.feature_sets.get(name="transactions", version="1")

# Attach a recurrence schedule (for example, the RecurrenceTrigger sketched above),
# then update the feature set asset.
transactions_fset_config.materialization_settings.schedule = recurrence_schedule

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
Python SDK
Python
1. You use the same built-in feature retrieval component for feature retrieval that you used in the training pipeline (covered in an earlier tutorial). For pipeline training, you provided a feature retrieval specification as a component input. For batch inference, you pass the registered model as the input. The component looks for the feature retrieval specification in the model artifact.
Additionally, for training, the observation data had the target variable. However,
the batch inference observation data doesn't have the target variable. The feature
retrieval step joins the observation data with the features and outputs the data for
batch inference.
2. The pipeline uses the batch inference input data from the previous step, runs inference on the model, and appends the predicted value as output.
7 Note
You use a job for batch inference in this example. You can also use batch
endpoints in Azure Machine Learning.
Python
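# The cell isn't shown in this extract. A sketch that mirrors the training
# pipeline run, assuming a batch_inference_pipeline.yaml under
# project/fraud_model/pipelines (hypothetical path - adjust to your repository).
from azure.ai.ml import load_job

batch_inference_pipeline_path = (
    root_dir + "/project/fraud_model/pipelines/batch_inference_pipeline.yaml"
)
batch_inference_pipeline_definition = load_job(source=batch_inference_pipeline_path)
batch_inference_pipeline_job = ws_client.jobs.create_or_update(
    batch_inference_pipeline_definition
)
ws_client.jobs.stream(batch_inference_pipeline_job.name)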
3. Paste the Data field value into the following cell, splitting it into separate name and version values. (The last character is the version, preceded by a colon :.)
4. Note the predict_is_fraud column that the batch inference pipeline generated.
Python
inf_data_output = ws_client.data.get(
    name="azureml_1c106662-aa5e-4354-b5f9-57c1b0fdb3a7_output_data_data_with_prediction",
    version="1",
)
inf_output_df = spark.read.parquet(inf_data_output.path + "data/*.parquet")
display(inf_output_df.head(5))
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 4: Enable online materialization
and run online inference
Article • 11/28/2023
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see feature store concepts.
Part 1 of this tutorial series showed how to create a feature set specification with custom
transformations, and use that feature set to generate training data. Part 2 of the series
showed how to enable materialization, and perform a backfill. Additionally, Part 2
showed how to experiment with features, as a way to improve model performance. Part
3 showed how a feature store increases agility in the experimentation and training flows.
Part 3 also described how to run batch inference.
Prerequisites
7 Note
This tutorial uses an Azure Machine Learning notebook with Serverless Spark Compute.
Make sure you complete parts 1 through 3 of this tutorial series. This tutorial reuses the feature store and other resources created in the earlier tutorials.
Set up
This tutorial uses the Python feature store core SDK ( azureml-featurestore ). The Python
SDK is used for create, read, update, and delete (CRUD) operations, on feature stores,
feature sets, and feature store entities.
You don't need to explicitly install these resources for this tutorial, because in the set-up
instructions shown here, the online.yml file covers them.
You can create a new notebook and execute the instructions in this tutorial step by
step. You can also open and run the existing notebook
featurestore_sample/notebooks/sdk_only/4. Enable online store and run online
inference.ipynb. Keep this tutorial open and refer to it for documentation links and
more explanation.
a. In the Compute dropdown list in the top nav, select Serverless Spark Compute.
2. This code cell starts the Spark session. It needs about 10 minutes to install all
dependencies and start the Spark session.
Python
# Run this cell to start the spark session (any code block will start the session). This can take approximately 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
4. Initialize the MLClient for the project workspace, where the tutorial notebook runs.
The MLClient is used for the create, read, update, and delete (CRUD) operations.
Python
import os
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]
5. Initialize the MLClient for the feature store workspace, for the create, read, update,
and delete (CRUD) operations on the feature store workspace.
Python
# Feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]

# Connect to the feature store workspace
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)
6. As mentioned earlier, this tutorial uses the Python feature store core SDK ( azureml-
featurestore ). This initialized SDK client is used for create, read, update, and delete
(CRUD) operations, on feature stores, feature sets, and feature store entities.
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
1. Set values for the Azure Cache for Redis resource, to use as online materialization
store. In this code cell, define the name of the Azure Cache for Redis resource to
create or reuse. You can override other default settings.
Python
ws_location = ws_client.workspaces.get(ws_client.workspace_name).location

redis_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
redis_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
redis_name = "<REDIS_NAME>"
redis_location = ws_location
2. You can create a new Redis instance. You would select the Redis Cache tier (basic,
standard, premium, or enterprise). Choose an SKU family available for the cache
tier you select. For more information about tiers and cache performance, see this
resource. For more information about SKU tiers and Azure cache families, see this
resource .
Execute this code cell to create an Azure Cache for Redis with premium tier, SKU
family P , and cache capacity 2. It might take between 5 and 10 minutes to prepare
the Redis instance.
Python
from azure.mgmt.redis import RedisManagementClient
from azure.mgmt.redis.models import RedisCreateParameters, Sku, SkuFamily, SkuName

management_client = RedisManagementClient(
    AzureMLOnBehalfOfCredential(), redis_subscription_id
)

redis_arm_id = (
    management_client.redis.begin_create(
        resource_group_name=redis_resource_group_name,
        name=redis_name,
        parameters=RedisCreateParameters(
            location=redis_location,
            sku=Sku(name=SkuName.PREMIUM, family=SkuFamily.P, capacity=2),
        ),
    )
    .result()
    .id
)

print(redis_arm_id)
3. Optionally, this code cell reuses an existing Redis instance with the previously
defined name.
Python
redis_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Cache/Redis/{name}".format(
    sub_id=redis_subscription_id,
    rg=redis_resource_group_name,
    name=redis_name,
)
Python
ml_client = MLClient(
AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
)
fs = FeatureStore(
name=featurestore_name,
online_store=online_store,
)
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())
Python
accounts_fset_config = fs_client._featuresets.get(name="accounts", version="1")

accounts_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    online_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(accounts_fset_config)
print(fs_poller.result())
Python
from datetime import datetime, timedelta

st = datetime(2020, 1, 1, 0, 0, 0, 0)
et = datetime.now() - timedelta(hours=3)

poller = fs_client.feature_sets.begin_backfill(
    name="accounts",
    version="1",
    feature_window_start_time=st,
    feature_window_end_time=et,
    data_status=["None"],
)
print(poller.result().job_ids)
Tip
This code cell tracks completion of the backfill job. With the Azure Cache for Redis
premium tier provisioned earlier, this step might need approximately 10 minutes to
complete.
Python
1. This code cell enables the transactions feature set online materialization.
Python
transactions_fset_config = fs_client._featuresets.get(name="transactions", version="1")

transactions_fset_config.materialization_settings.online_enabled = True

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
2. This code cell backfills the data to both the online and offline materialization store,
to ensure that both stores have the latest data. The recurrent materialization job,
which you set up in Tutorial 3 of this series, now materializes data to both online
and offline materialization stores.
Python
st = datetime(2020, 1, 1, 0, 0, 0, 0)
et = datetime.now() - timedelta(hours=3)
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version="1",
feature_window_start_time=st,
feature_window_end_time=et,
data_status=[DataAvailabilityStatus.NONE],
)
print(poller.result().job_ids)
This code cell tracks completion of the backfill job. Using the premium tier Azure
Cache for Redis provisioned earlier, this step might need approximately five
minutes to complete.
Python
3. From the list of accessible feature stores, select the feature store for which you
performed the backfill.
The data materialization status can be
Complete (green)
Incomplete (red)
Pending (blue)
None (gray)
A data interval represents a contiguous portion of data with same data
materialization status. For example, the earlier snapshot has 16 data intervals in the
offline materialization store.
Your data can have a maximum of 2,000 data intervals. If your data contains more
than 2,000 data intervals, create a new feature set version.
You can provide a list of more than one data status value (for example, ["None", "Incomplete"]) in a single backfill job.
During backfill, a new materialization job is submitted for each data interval that
falls in the defined feature window.
A new job is not submitted for a data interval if a materialization job is already
pending, or is running for a data interval that hasn't yet been backfilled.
When the first online materialization job is submitted, the data already
materialized in the offline store, if available, is used to calculate online features.
If the data interval for online materialization partially overlaps the data interval
of already materialized data located in the offline store, separate materialization
jobs are submitted for the overlapping and nonoverlapping parts of the data
interval.
Test locally
Now, use your development environment to look up features from the online
materialization store. The tutorial notebook attached to Serverless Spark Compute
serves as the development environment.
This code cell parses the list of features from the existing feature retrieval specification.
Python
features = featurestore.resolve_feature_retrieval_spec(feature_retrieval_spec_folder)

features
This code retrieves feature values from the online materialization store.
Python
Prepare some observation data for testing, and use that data to look up features from the online materialization store. During the online look-up, the keys (accountID) defined in the observation sample data might not exist in Redis (because of the TTL). In this case:
3. Open the console for the Redis instance, and check for existing keys with the KEYS
* command.
4. Replace the accountID values in the sample observation data with the existing
keys.
Python
import pyarrow
from azureml.featurestore import get_online_features

# Online lookup:
# It can happen that the keys defined in the observation sample data above don't exist in Redis (due to TTL).
# If this happens, go to the Azure portal, navigate to the Redis instance, open its console, check for existing keys with the command "KEYS *",
# and replace the sample observation data with the existing keys.
df = get_online_features(features, obs)
df
These steps looked up features from the online store. In the next step, you'll test online
features using an Azure Machine Learning managed online endpoint.
Python
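# The endpoint-definition cell isn't shown in this extract. A minimal sketch;
# the endpoint name is hypothetical, and a system-assigned managed identity is
# created by default.
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint_name = "fraud-model"
endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")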
This code cell creates the managed online endpoint defined in the previous code cell.
Python
ws_client.online_endpoints.begin_create_or_update(endpoint).result()
Python
model_endpoint_msi_principal_id = endpoint.identity.principal_id
model_endpoint_msi_principal_id
This code cell grants the Contributor role to the online endpoint managed identity on
the Redis instance. This RBAC permission is needed to materialize data into the Redis
online store.
Python
auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), redis_subscription_id
)

scope = f"/subscriptions/{redis_subscription_id}/resourceGroups/{redis_resource_group_name}/providers/Microsoft.Cache/Redis/{redis_name}"

# The role definition ID for the "Contributor" role on the Redis cache.
# You can find other built-in role definition IDs in the Azure documentation.
role_definition_id = f"/subscriptions/{redis_subscription_id}/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"

auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), featurestore_subscription_id
)

scope = f"/subscriptions/{featurestore_subscription_id}/resourceGroups/{featurestore_resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}"
1. Loads the feature metadata from the feature retrieval specification packaged with the model during model training (a task covered earlier in this series). The specification has features from both the transactions and accounts feature sets.
2. Looks up the online features using the index keys from the request, when an input
inference request is received. In this case, for both feature sets, the index column is
accountID .
3. Passes the features to the model to perform the inference, and returns the
response. The response is a boolean value that represents the variable is_fraud .
Next, execute this code cell to create a managed online deployment definition for
model deployment.
Python
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
)

deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=endpoint_name,
    model="azureml:fraud_model:1",
    code_configuration=CodeConfiguration(
        code=root_dir + "/project/fraud_model/online_inference/src/",
        scoring_script="scoring.py",
    ),
    environment=Environment(
        conda_file=root_dir + "/project/fraud_model/online_inference/conda.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
Deploy the model to online endpoint with this code cell. The deployment might need
four to five minutes.
Python
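# The deployment cell isn't shown in this extract. A minimal sketch that creates
# the deployment and routes all traffic to it (the traffic routing step is an
# assumption; it reuses the endpoint object from earlier).
ws_client.online_deployments.begin_create_or_update(deployment).result()

endpoint.traffic = {"green": 100}
ws_client.online_endpoints.begin_create_or_update(endpoint).result()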
Python
# Test the online deployment using the mock data.
sample_data = root_dir + "/project/fraud_model/online_inference/test.json"
ws_client.online_endpoints.invoke(
endpoint_name=endpoint_name, request_file=sample_data,
deployment_name="green"
)
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Network isolation with feature store (preview)
Azure Machine Learning feature stores samples repository
Tutorial 5: Develop a feature set with a
custom source
Article • 11/28/2023
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see feature store concepts.
Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization, and perform a backfill. Part 2 showed how to experiment with features in the experimentation and training flows. Part 3 explained recurrent materialization for the transactions feature set, and showed how to run a batch inference pipeline on the registered model. Part 4 described how to enable online materialization and run online inference.
Prerequisites
7 Note
This tutorial uses an Azure Machine Learning notebook with Serverless Spark
Compute.
Make sure you complete the previous tutorials in this series. This tutorial reuses the feature store and other resources created in those earlier tutorials.
Set up
This tutorial uses the Python feature store core SDK ( azureml-featurestore ). The Python
SDK is used for create, read, update, and delete (CRUD) operations, on feature stores,
feature sets, and feature store entities.
You don't need to explicitly install these resources for this tutorial, because in the set-up
instructions shown here, the conda.yml file covers them.
1. On the top menu, in the Compute dropdown list, select Serverless Spark Compute
under Azure Machine Learning Serverless Spark.
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Initialize the CRUD client of the feature store
workspace
Initialize the MLClient for the feature store workspace, to cover the create, read, update,
and delete (CRUD) operations on the feature store workspace.
Python
# Feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name that was used in tutorial #1
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
Custom source definition
You can define your own source loading logic from any data storage that has a custom
source definition. Implement a source processor user-defined function (UDF) class
( CustomSourceTransformer in this tutorial) to use this feature. This class should define an
__init__(self, **kwargs) function, and a process(self, start_time, end_time,
**kwargs) function. The kwargs dictionary is supplied as a part of the feature set
specification definition. This definition is then passed to the UDF. The start_time and
end_time parameters are calculated and passed to the UDF function.
Python
from datetime import datetime

class CustomSourceTransformer:
    def __init__(self, **kwargs):
        self.path = kwargs.get("source_path")
        self.timestamp_column_name = kwargs.get("timestamp_column_name")
        if not self.path:
            raise Exception("`source_path` is not provided")
        if not self.timestamp_column_name:
            raise Exception("`timestamp_column_name` is not provided")

    def process(
        self, start_time: datetime, end_time: datetime, **kwargs
    ) -> "pyspark.sql.DataFrame":
        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, lit, to_timestamp

        spark = SparkSession.builder.getOrCreate()
        df = spark.read.json(self.path)

        if start_time:
            df = df.filter(col(self.timestamp_column_name) >= to_timestamp(lit(start_time)))

        if end_time:
            df = df.filter(col(self.timestamp_column_name) < to_timestamp(lit(end_time)))

        return df
Python
transactions_source_process_code_path = (
    root_dir + "/featurestore/featuresets/transactions_custom_source/source_process_code"
)
transactions_feature_transform_code_path = (
    root_dir + "/featurestore/featuresets/transactions_custom_source/feature_process_code"
)

udf_featureset_spec = create_feature_set_spec(
    source=CustomFeatureSource(
        kwargs={
            "source_path": "wasbs://[email protected]/feature-store-prp/datasources/transactions-source-json/*.json",
            "timestamp_column_name": "timestamp",
        },
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
        source_process_code=SourceProcessCode(
            path=transactions_source_process_code_path,
            process_class="source_process.CustomSourceTransformer",
        ),
    ),
    feature_transformation=TransformationCode(
        path=transactions_feature_transform_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)

udf_featureset_spec
Next, define a feature window, and display the feature values in this feature window.
Python
from datetime import datetime

st = datetime(2023, 1, 1)
et = datetime(2023, 6, 1)

display(
    udf_featureset_spec.to_spark_dataframe(
        feature_window_start_date_time=st, feature_window_end_date_time=et
    )
)
In the feature set specification, index_columns defines the join keys required to access
values from the feature set.
To learn more about the specification, see Understanding top-level entities in managed
feature store and CLI (v2) feature set YAML schema.
Feature set specification persistence offers another benefit: the feature set specification
can be source controlled.
Python
feature_spec_folder = (
root_dir + "/featurestore/featuresets/transactions_custom_source/spec"
)
udf_featureset_spec.dump(feature_spec_folder)
Register the transaction feature set with the
feature store
Use this code to register a feature set asset loaded from the custom source with the
feature store. You can then reuse that asset and easily share it. Registration of a feature
set asset offers managed capabilities, including versioning and materialization.
Python
# FeatureSet and FeatureSetSpecification come from the Azure ML SDK v2 entities
from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset_config = FeatureSet(
    name="transactions_custom_source",
    version="1",
    description="transactions feature set loaded from custom source",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=feature_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(transaction_fset_config)
print(poller.result())
Next, fetch the registered feature set and load it as a Spark dataframe.
Python
# Fetch the registered feature set (a minimal sketch; the accessor pattern matches the earlier tutorials in this series)
transactions_fset_config = featurestore.feature_sets.get("transactions_custom_source", "1")

df = transactions_fset_config.to_spark_dataframe()
display(df)
You should be able to successfully fetch the registered feature set as a Spark dataframe,
and then display it. You can now use these features for a point-in-time join with
observation data, and the subsequent steps in your machine learning pipeline.
Clean up
If you created a resource group for the tutorial, you can delete that resource group,
which deletes all the resources associated with this tutorial. Otherwise, you can delete
the resources individually:
To delete the feature store, open the resource group in the Azure portal, select the
feature store, and delete it.
The user-assigned managed identity (UAI) assigned to the feature store workspace
isn't deleted when you delete the feature store. To delete the UAI, follow these
instructions.
To delete a storage account-type offline store, open the resource group in the
Azure portal, select the storage that you created, and delete it.
To delete an Azure Cache for Redis instance, open the resource group in the Azure
portal, select the instance that you created, and delete it.
Next steps
Network isolation with feature store
Azure Machine Learning feature stores samples repository
Tutorial 6: Network isolation with
feature store (preview)
Article • 09/13/2023
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see the feature store concepts document.
This tutorial describes how to configure secure ingress through a private endpoint, and
secure egress through a managed virtual network.
Part 1 of this tutorial series showed how to create a feature set specification with custom
transformations, and use that feature set to generate training data. Part 2 of the tutorial
series showed how to enable materialization and perform a backfill. Part 3 of this tutorial
series showed how to experiment with features, as a way to improve model
performance. Part 3 also showed how a feature store increases agility in the
experimentation and training flows. Tutorial 4 described how to run batch inference.
Tutorial 5 explained how to use feature store for online/realtime inference use cases.
Tutorial 6 shows how to:
Set up the necessary resources for network isolation of a managed feature store.
Create a new feature store resource.
Set up your feature store to support network isolation scenarios.
Update your project workspace (current workspace) to support network isolation
scenarios.
Prerequisites
7 Note
This tutorial uses Azure Machine Learning notebook with Serverless Spark
Compute.
An Azure Machine Learning workspace, enabled with Managed virtual network for
serverless spark jobs.
If your workspace has an Azure Container Registry, it must use the Premium SKU to
successfully complete the workspace configuration. To configure your project
workspace:
YAML
managed_network:
  isolation_mode: allow_internet_outbound
Your user account must have the Owner or Contributor role assigned to the
resource group where you create the feature store. Your user account also needs
the User Access Administrator role.
Set up
This tutorial uses the Python feature store core SDK (azureml-featurestore). The Python
SDK is used for feature set development and testing only. The CLI is used for create,
read, update, and delete (CRUD) operations on feature stores, feature sets, and feature
store entities. This is useful in continuous integration and continuous delivery (CI/CD) or
GitOps scenarios, where CLI/YAML is preferred.
You don't need to explicitly install these packages for this tutorial, because the
conda.yaml file in the set-up instructions shown here covers them.
1. Clone the azureml-examples repository to your local machine with this
command:
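Python
# A minimal sketch (the repository is the public azureml-examples repo; --depth 1 keeps the clone small)
!git clone --depth 1 https://github.com/Azure/azureml-examples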
You can also download a zip file from the azureml-examples repository. At this
page, first select the code dropdown, and then select Download ZIP . Then, unzip
the contents into a folder on your local device.
Open the Isolation for Feature store.ipynb notebook. You can keep this document open
and refer to it for more information.
4. This code cell starts the Spark session. It needs about 10 minutes to install all
dependencies and start the Spark session.
Python
# Run this cell to start the Spark session (any code block will start the session). This can take around 10 minutes.
print("start spark session")
Python
import os

# Update your alias below (or any custom directory you have uploaded the samples to).
# You can find the name from the directory structure in the left navigation.
root_dir = "./Users/<your user alias>/featurestore_sample"

if os.path.isdir(root_dir):
    print("The folder exists.")
else:
    print("The folder does not exist. Please create or fix the path")
Authenticate
Python
# authenticate
!az login
Python
subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
7 Note
For this tutorial, you create three separate storage containers in the same ADLS Gen2
storage account:
Source data
Offline store
Observation data
1. Create an ADLS Gen2 storage account for source data, offline store, and
observation data.
a. Provide the name of an Azure Data Lake Storage Gen2 storage account in the
following code sample. You can execute the following code cell with the
provided default settings. Optionally, you can override the default settings.
Python
## Default Setting
# We use the subscription, resource group, and region of this active project workspace.
# We hard-coded default resource names for creating new resources.
## Overwrite
# You can replace them if you want to create the resources in a different subscription/resourceGroup, or use existing resources.
# At the minimum, provide an ADLS Gen2 storage account name for `storage_account_name`.

storage_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
storage_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
storage_account_name = "<STORAGE_ACCOUNT_NAME>"

storage_location = "eastus"
storage_file_system_name_offline_store = "offline-store"
storage_file_system_name_source_data = "source-data"
storage_file_system_name_observation_data = "observation-data"
b. This code cell creates the ADLS Gen2 storage account defined in the above
code cell.
Python
c. This code cell creates a new storage container for offline store.
Python
d. This code cell creates a new storage container for source data.
Python
e. This code cell creates a new storage container for observation data.
Python
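# Steps b through e aren't shown individually here. A minimal combined sketch using
# standard az storage commands (exact options in the original notebook may differ):

# b. Create the ADLS Gen2 storage account (hierarchical namespace enabled)
!az storage account create --name $storage_account_name --resource-group $storage_resource_group_name --subscription $storage_subscription_id --location $storage_location --enable-hierarchical-namespace true

# c., d., e. Create the offline store, source data, and observation data containers
!az storage fs create --name $storage_file_system_name_offline_store --account-name $storage_account_name --subscription $storage_subscription_id
!az storage fs create --name $storage_file_system_name_source_data --account-name $storage_account_name --subscription $storage_subscription_id
!az storage fs create --name $storage_file_system_name_observation_data --account-name $storage_account_name --subscription $storage_subscription_id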
2. Copy the sample data required for this tutorial series into the newly created
storage containers.
a. To write data to the storage containers, ensure that Contributor and Storage
Blob Data Contributor roles are assigned to the user identity on the created
ADLS Gen2 storage account in the Azure portal following these steps.
) Important
Once you have ensured that the Contributor and Storage Blob Data
Contributor roles are assigned to the user identity, wait a few minutes
after role assignment to let permissions propagate before proceeding with
the next steps. To learn more about access control, see role-based access
control (RBAC) for Azure storage accounts.
The following code cells copy sample source data for transactions feature set
used in this tutorial from a public storage account to the newly created storage
account.
Python
# Copy sample source data for the transactions feature set used in this tutorial series from the public storage account to the newly created storage account
transactions_source_data_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/datasources/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)

transactions_src_df.write.parquet(
    f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/"
)
b. Copy sample source data for account feature set used in this tutorial from a
public storage account to the newly created storage account.
Python
# Copy sample source data for the account feature set used in this tutorial series from the public storage account to the newly created storage account
accounts_data_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/datasources/accounts-precalculated/*.parquet"
accounts_data_df = spark.read.parquet(accounts_data_path)

accounts_data_df.write.parquet(
    f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/accounts-precalculated/"
)
c. Copy sample observation data used for training from a public storage account
to the newly created storage account.
Python
# Copy sample observation data used for training from the public storage account to the newly created storage account
observation_data_train_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/observation_data/train/*.parquet"
observation_data_train_df = spark.read.parquet(observation_data_train_path)

observation_data_train_df.write.parquet(
    f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/train/"
)
d. Copy sample observation data used for batch inference from a public storage
account to the newly created storage account.
Python
# Copy sample observation data used for batch inference from the public storage account to the newly created storage account
# (the read below follows the same pattern as the training data above; the source path is inferred from that pattern)
observation_data_inference_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/observation_data/batch_inference/*.parquet"
observation_data_inference_df = spark.read.parquet(observation_data_inference_path)

observation_data_inference_df.write.parquet(
    f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/batch_inference/"
)
3. Disable the public network access on the newly created storage account.
a. This code cell disables public network access for the ADLS Gen2 storage
account created earlier.
Python
# Disable the public network access for the above created ADLS Gen2 storage account
!az storage account update --name $storage_account_name --resource-group $storage_resource_group_name --subscription $storage_subscription_id --public-network-access disabled
b. Set ARM IDs for the offline store, source data, and observation data containers.
Python
# The offline store assignment follows the same pattern as the source and observation data assignments below
offline_store_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_offline_store,
)
print(offline_store_gen2_container_arm_id)

source_data_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_source_data,
)
print(source_data_gen2_container_arm_id)

observation_data_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_observation_data,
)
print(observation_data_gen2_container_arm_id)
a. In the following code cell, provide a name for the user-assigned managed
identity that you would like to create.
Python
Python
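# The cells that set the UAI parameters and create the identity aren't shown above.
# A minimal sketch, assuming the identity lives in the same subscription and resource
# group as the project workspace (the name placeholder is illustrative):
uai_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
uai_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
uai_name = "<UAI_NAME>"

# Create the user-assigned managed identity
!az identity create --name $uai_name --resource-group $uai_resource_group_name --subscription $uai_subscription_id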
Python
from azure.mgmt.msi import ManagedServiceIdentityClient

msi_client = ManagedServiceIdentityClient(
    AzureMLOnBehalfOfCredential(), uai_subscription_id
)
managed_identity = msi_client.user_assigned_identities.get(
    resource_name=uai_name, resource_group_name=uai_resource_group_name
)

uai_principal_id = managed_identity.principal_id
uai_client_id = managed_identity.client_id
uai_arm_id = managed_identity.id
The UAI needs this role assignment:
Scope: Storage account of the feature store offline store
Action/Role: Storage Blob Data Contributor role
The next CLI commands will assign the Storage Blob Data Contributor role to the
UAI. In this example, "Storage accounts of source data" doesn't apply because you
read the sample data from a public access blob storage. To use your own data
sources, you must assign the required roles to the UAI. To learn more about access
control, see role-based access control for Azure storage accounts and Azure
Machine Learning workspace.
Python
Python
Python
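# The role-assignment cells aren't shown above. A minimal sketch using the standard
# az role assignment command, scoped to the offline store storage account:
!az role assignment create --role "Storage Blob Data Contributor" --assignee-object-id $uai_principal_id --assignee-principal-type ServicePrincipal --scope "/subscriptions/$storage_subscription_id/resourceGroups/$storage_resource_group_name/providers/Microsoft.Storage/storageAccounts/$storage_account_name"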
feature_store_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{ws_name}".format(
    sub_id=featurestore_subscription_id,
    rg=featurestore_resource_group_name,
    ws_name=featurestore_name,
)
The following code cell generates a YAML specification file for a feature store with
materialization enabled.
Python
config = {
    "$schema": "http://azureml/sdk-2-0/FeatureStore.json",
    "name": featurestore_name,
    "location": featurestore_location,
    "compute_runtime": {"spark_runtime_version": "3.2"},
    "offline_store": {
        "type": "azure_data_lake_gen2",
        "target": offline_store_gen2_container_arm_id,
    },
    "materialization_identity": {"client_id": uai_client_id, "resource_id": uai_arm_id},
}

feature_store_yaml = root_dir + "/featurestore/featurestore_with_offline_setting.yaml"
Python
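# The cell that writes the YAML file and creates the feature store isn't shown above.
# A minimal sketch, assuming the az ml feature-store command group from the ml CLI extension:
import yaml

with open(feature_store_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)

!az ml feature-store create --file $feature_store_yaml --name $featurestore_name --resource-group $featurestore_resource_group_name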
Python
# feature store client
from azureml.featurestore import FeatureStoreClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
Python
Follow these instructions to get the Azure AD object ID for your user identity. Then, use
your Azure AD object ID in the following command to assign the AzureML Data Scientist
role to your user identity on the created feature store.
Python
your_aad_objectid = "<YOUR_AAD_OBJECT_ID>"
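The assignment command itself isn't shown above. A minimal sketch using the standard role-assignment CLI; the scope is the feature store ARM ID defined earlier:
Python
!az role assignment create --role "AzureML Data Scientist" --assignee-object-id $your_aad_objectid --assignee-principal-type User --scope $feature_store_arm_id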
Obtain the default storage account and key vault for the
feature store, and disable public network access to the
corresponding resources
The following code cell gets the feature store object for the next steps.
Python
fs = featurestore.feature_stores.get()
This code cell gets the names of the default storage account and key vault for the feature store.
Python
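# This cell isn't shown above. A minimal sketch; the attribute names below are assumptions
# based on SDK v2 workspace objects, which expose associated resources as full ARM IDs:
default_fs_storage_account_name = fs.storage_account.split("/")[-1]
default_key_vault_name = fs.key_vault.split("/")[-1]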
This code cell disables public network access to the default storage account for the
feature store.
Python
# Disable the public network access for the default ADLS Gen2 storage account created for the feature store
!az storage account update --name $default_fs_storage_account_name --resource-group $featurestore_resource_group_name --subscription $featurestore_subscription_id --public-network-access disabled
The following cell prints the name of the default key vault for the feature store.
Python
print(default_key_vault_name)
Python
# The below code creates a configuration for the managed virtual network for the feature store
import yaml

config = {
    "public_network_access": "disabled",
    "managed_network": {
        "isolation_mode": "allow_internet_outbound",
        "outbound_rules": [
            # You need to add multiple rules here if you have separate storage accounts for source, observation data, and offline store.
            {
                "name": "sourcerulefs",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "dfs",
                    "service_resource_id": f"/subscriptions/{storage_subscription_id}/resourcegroups/{storage_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # This rule is added currently because serverless Spark doesn't automatically create a private endpoint to the default key vault.
            {
                "name": "defaultkeyvault",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "vault",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Keyvault/vaults/{default_key_vault_name}",
                },
                "type": "private_endpoint",
            },
        ],
    },
}

feature_store_managed_vnet_yaml = (
    root_dir + "/featurestore/feature_store_managed_vnet_config.yaml"
)

with open(feature_store_managed_vnet_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)
This code cell updates the feature store using the generated YAML specification file with
the outbound rules.
Python
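# The update command isn't shown above. A minimal sketch, assuming the
# az ml feature-store command group:
!az ml feature-store update --file $feature_store_managed_vnet_yaml --name $featurestore_name --resource-group $featurestore_resource_group_name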
Python
#### Provision network to create necessary private endpoints (it may take approximately 20 minutes)
!az ml workspace provision-network --name $featurestore_name --resource-group $featurestore_resource_group_name --include-spark
This code cell confirms that private endpoints defined by the outbound rules have been
created.
Python
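# The verification command isn't shown above. A minimal sketch, assuming the
# az ml workspace outbound-rule command group:
!az ml workspace outbound-rule list --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name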
Python
# Look up the subscription ID, resource group, and workspace name of the current workspace
project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]
The managed virtual network of the project workspace needs outbound private endpoint
connections to these resources:
Source data
Offline store
Observation data
Feature store
Default storage account of feature store
This code cell generates a YAML specification file with the required outbound rules for
the project workspace.
Python
# The below code creates a configuration for the managed virtual network for the project workspace
import yaml

config = {
    "managed_network": {
        "isolation_mode": "allow_internet_outbound",
        "outbound_rules": [
            # In case you have separate storage accounts for source, observation data, and offline store, you need to add multiple rules here. No action needed otherwise.
            {
                "name": "projectsourcerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "dfs",
                    "service_resource_id": f"/subscriptions/{storage_subscription_id}/resourcegroups/{storage_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the default storage of the feature store
            {
                "name": "defaultfsstoragerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "blob",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{default_fs_storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the default key vault of the feature store
            {
                "name": "defaultfskeyvaultrule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "vault",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Keyvault/vaults/{default_key_vault_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the feature store
            {
                "name": "featurestorerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "amlworkspace",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}",
                },
                "type": "private_endpoint",
            },
        ],
    }
}

project_ws_managed_vnet_yaml = (
    root_dir + "/featurestore/project_ws_managed_vnet_config.yaml"
)

# Write the configuration to the YAML file (this step follows the same pattern as the feature store configuration above)
with open(project_ws_managed_vnet_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)
Python
#### Update the project workspace to create private endpoints for the defined outbound rules (it may take approximately 15 minutes)
!az ml workspace update --file $project_ws_managed_vnet_yaml --name $project_ws_name --resource-group $project_ws_rg
This code cell confirms that private endpoints defined by the outbound rules have been
created.
Python
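# The verification command isn't shown above. A minimal sketch, assuming the
# az ml workspace outbound-rule command group:
!az ml workspace outbound-rule list --workspace-name $project_ws_name --resource-group $project_ws_rg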
You can also verify the outbound rules from the Azure portal: go to Networking in the
left navigation panel for the project workspace, and then open the Workspace managed
outbound access tab.
A publicly accessible blob container hosts the sample data used in this tutorial. It
can only be read in Spark via the wasbs driver. When you create feature sets using your
own source data, host them in an ADLS Gen2 account, and use the abfss driver in the
data path.
Python
# Remove the "." in the root directory path, because we need to generate an absolute path to read from Spark
transactions_source_data_path = f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)

display(transactions_src_df.head(5))
# Note: display(transactions_src_df.head(5)) displays the timestamp column in a different format. You can call transactions_src_df.show() to see a correctly formatted value
This Spark transformer, defined in transaction_transform.py, performs the rolling
aggregation defined for the features. To understand the feature set and transformations
in more detail, see feature store concepts.
Python
from azureml.featurestore import create_feature_set_spec, FeatureSetSpec
from azureml.featurestore.contracts import (
DateTimeOffset,
FeatureSource,
TransformationCode,
Column,
ColumnType,
SourceType,
TimestampColumn,
)
transactions_featureset_code_path = (
root_dir + "/featurestore/featuresets/transactions/transformation_code"
)
transactions_featureset_spec = create_feature_set_spec(
    source=FeatureSource(
        type=SourceType.parquet,
        path=f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
    ),
    transformation_code=TransformationCode(
        path=transactions_featureset_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)
# Generate a spark dataframe from the feature set specification
transactions_fset_df = transactions_featureset_spec.to_spark_dataframe()
# display few records
display(transactions_fset_df.head(5))
To inspect the generated transactions feature set specification, open this file from the
file tree to see the specification:
featurestore/featuresets/transactions/spec/FeaturesetSpec.yaml
The specification contains these elements:
source: a reference to a storage resource
features: a list of features and their datatypes. If you provide transformation code, the code must return a dataframe that maps to the features and datatypes.
index_columns: the join keys required to access values from the feature set
Python
import os

# Folder for the dumped feature set spec (path inferred from the pattern used earlier in this series)
transactions_featureset_spec_folder = root_dir + "/featurestore/featuresets/transactions/spec"

transactions_featureset_spec.dump(transactions_featureset_spec_folder)
This code cell creates an account entity for the feature store.
Python
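# The entity-creation cell isn't shown above. A minimal sketch, assuming the account
# entity YAML ships with the sample repository at the path used in the earlier tutorials:
account_entity_path = root_dir + "/featurestore/entities/account.yaml"
!az ml feature-store-entity create --file $account_entity_path --resource-group $featurestore_resource_group_name --workspace-name $featurestore_name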
The feature set asset references both the feature set spec that you created earlier, and
other properties like version and materialization settings.
Python
transactions_featureset_path = (
    root_dir + "/featurestore/featuresets/transactions/featureset_asset_offline_enabled.yaml"
)
!az ml feature-set create --file $transactions_featureset_path --resource-group $featurestore_resource_group_name --workspace-name $featurestore_name
Python
Python
feature_window_start_time = "2023-02-01T00:00.000Z"
feature_window_end_time = "2023-03-01T00:00.000Z"

!az ml feature-set backfill --name transactions --version 1 --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name --feature-window-start-time $feature_window_start_time --feature-window-end-time $feature_window_end_time
This code cell checks the status of the backfill materialization job, by providing
<JOB_ID_FROM_PREVIOUS_COMMAND> .
Python
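# The status-check cell isn't shown above. A minimal sketch, assuming the backfill job is
# visible through the standard jobs CLI (keep the job ID placeholder from the previous output):
!az ml job show --name <JOB_ID_FROM_PREVIOUS_COMMAND> --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name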
Next, this code cell lists all the materialization jobs for the current feature set.
Python
### List all the materialization jobs for the current feature set
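# A minimal sketch, assuming the az ml feature-set list-materialization-operation command:
!az ml feature-set list-materialization-operation --name transactions --version 1 --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name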
Python
observation_data_path = f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/train/*.parquet"
observation_data_df = spark.read.parquet(observation_data_path)
obs_data_timestamp_column = "timestamp"

display(observation_data_df)
# Note: the timestamp column is displayed in a different format. Optionally, you can call observation_data_df.show() to see a correctly formatted value
Python
Python
Python
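# The feature-selection cells aren't shown above. A minimal sketch; the feature URIs
# below are hypothetical, in the <feature_set>:<version>:<feature_name> form implied
# by resolve_feature_uri's use in the next cell:
features = featurestore.resolve_feature_uri(
    ["transactions:1:transaction_amount_7d_sum"]  # hypothetical feature URI
)
more_features = [
    "transactions:1:transaction_amount_3d_sum",  # hypothetical feature URI
]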
from azureml.featurestore import get_offline_features

more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)

# Generate a training dataframe by using the feature data and observation data
training_df = get_offline_features(
    features=features,
    observation_data=observation_data_df,
    timestamp_column=obs_data_timestamp_column,
)
You can see that a point-in-time join appended the features to the training data.
This tutorial combines steps from tutorials 1 and 2 of this series. For network isolation,
remember to replace the public storage containers used in the other tutorial notebooks
with the ones created in this tutorial notebook.
We have reached the end of the tutorial. Your training data uses features from a feature
store. You can either save it to storage for later use, or directly run model training on it.
Next steps
Part 3: Experiment and train models using features
Part 4: Enable recurrent materialization and run batch inference
How Azure Machine Learning works:
resources and assets
Article • 04/04/2023
This article applies to the second version of the Azure Machine Learning CLI and Python
SDK (v2). For version one (v1), see How Azure Machine Learning works: Architecture and
concepts (v1).
Azure Machine Learning includes several resources and assets to enable you to perform
your machine learning tasks. These resources and assets are needed to run any job.
Workspace
The workspace is the top-level resource for Azure Machine Learning, providing a
centralized place to work with all the artifacts you create when you use Azure Machine
Learning. The workspace keeps a history of all jobs, including logs, metrics, output, and
a snapshot of your scripts. The workspace stores references to resources like datastores
and compute. It also holds all assets like models, environments, components, and data
assets.
Create a workspace
Azure CLI
Bash
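# The original command isn't shown here. A minimal sketch with the v2 CLI (names are illustrative):
az ml workspace create --name my-workspace --resource-group my-resource-group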
Compute
A compute is a designated compute resource where you run your job or host your
endpoint. Azure Machine Learning supports the following types of compute:
7 Note
Attached compute - You can attach your own compute resources to your
workspace and use them for training and inference.
Azure CLI
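# The original command isn't shown here. A minimal sketch that creates an Azure Machine
# Learning compute cluster with the v2 CLI (names and VM size are illustrative):
az ml compute create --name cpu-cluster --type amlcompute --size Standard_DS3_v2 --min-instances 0 --max-instances 4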
Datastore
Azure Machine Learning datastores securely keep the connection information to your
data storage on Azure, so you don't have to code it in your scripts. You can register and
create a datastore to easily connect to your storage account, and access the data in your
underlying storage service. The CLI v2 and SDK v2 support the following types of cloud-
based storage services:
Azure CLI
Bash
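# The original command isn't shown here. A minimal sketch that registers a datastore
# from a YAML definition (the file name and its contents are illustrative):
# my-blob-datastore.yml might contain name, type (azure_blob), account_name, and container_name.
az ml datastore create --file my-blob-datastore.yml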
Model
Azure machine learning models consist of the binary file(s) that represent a machine
learning model and any corresponding metadata. Models can be created from a local or
remote file or directory. For remote locations https , wasbs and azureml locations are
supported. The created model will be tracked in the workspace under the specified
name and version. Azure Machine Learning supports three types of storage format for
models:
custom_model
mlflow_model
triton_model
Creating a model
Azure CLI
Bash
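# The original command isn't shown here. A minimal sketch (names and path are illustrative):
az ml model create --name my-model --version 1 --path ./model --type custom_model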
Environment
Azure Machine Learning environments are an encapsulation of the environment where
your machine learning task happens. They specify the software packages, environment
variables, and software settings around your training and scoring scripts. The
environments are managed and versioned entities within your Machine Learning
workspace. Environments enable reproducible, auditable, and portable machine learning
workflows across a variety of computes.
Types of environment
Azure Machine Learning supports two types of environments: curated and custom.
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Intended to be used as is, they contain collections of Python
packages and settings to help you get started with various machine learning
frameworks. These pre-created environments also allow for faster deployment time. For
a full list, see the curated environments article.
You can create custom environments from:
A Docker image
A base Docker image with a conda YAML to customize further
A Docker build context
Azure CLI
Bash
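# The original command isn't shown here. A minimal sketch that creates an environment
# from a base Docker image (the environment name is illustrative):
az ml environment create --name my-env --version 1 --image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04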
Data
Azure Machine Learning allows you to work with different types of data:
URIs (a location in local or cloud storage): uri_file, uri_folder
Tables (a tabular data abstraction): mltable
Primitives: string, boolean, number
For most scenarios, you'll use URIs ( uri_folder and uri_file ) - a location in storage
that can be easily mapped to the filesystem of a compute node in a job by either
mounting or downloading the storage to the node.
mltable is an abstraction for tabular data that is to be used for AutoML Jobs, Parallel
Jobs, and some advanced scenarios. If you're just starting to use Azure Machine
Learning and aren't using AutoML, we strongly encourage you to begin with URIs.
Component
An Azure Machine Learning component is a self-contained piece of code that does one
step in a machine learning pipeline. Components are the building blocks of advanced
machine learning pipelines. Components can do tasks such as data processing, model
training, model scoring, and so on. A component is analogous to a function - it has a
name, parameters, expects input, and returns output.
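For example, a minimal sketch of registering a component from a YAML definition with the v2 CLI (the file name is illustrative):
Azure CLI
az ml component create --file my-component.yml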
Next steps
How to upgrade from v1 to v2
Train models with the v2 CLI and SDK
What is an Azure Machine Learning
workspace?
Article • 04/12/2023
Create jobs - Jobs are training runs you use to build your models. You can group
jobs into experiments to compare metrics.
Author pipelines - Pipelines are reusable workflows for training and retraining your
model.
Register data assets - Data assets aid in management of the data you use for
model training and pipeline creation.
Register models - Once you have a model you want to deploy, you create a
registered model.
Create online endpoints - Use a registered model and a scoring script to create an
online endpoint.
Besides grouping your machine learning results, workspaces also host resource
configurations:
Organizing workspaces
For machine learning team leads and administrators, workspaces serve as containers for
access management, cost management and data isolation. Below are some tips for
organizing workspaces:
Use user roles for permission management in the workspace between users, for
example a data scientist, a machine learning engineer, or an admin.
Assign access to user groups: By using Azure Active Directory user groups, you
don't have to add individual users to each workspace or to other resources that the
same group of users requires access to.
Create a workspace per project: While a workspace can be used for multiple
projects, limiting it to one project per workspace allows for cost reporting accrued
to a project level. It also allows you to manage configurations like datastores in the
scope of each project.
Share Azure resources: Workspaces require you to create several associated
resources. Share these resources between workspaces to save repetitive setup
steps.
Enable self-serve: Pre-create and secure associated resources as an IT admin, and
use user roles to let data scientists create workspaces on their own.
Share assets: You can share assets between workspaces using Azure Machine
Learning registries.
Associated resources
When you create a new workspace, you're required to bring other Azure resources to
store your data. If you don't provide them, Azure Machine Learning automatically
creates these resources.
Azure Storage account . Stores machine learning artifacts such as job logs. By
default, this storage account is used when you upload data to the workspace.
Jupyter notebooks that are used with your Azure Machine Learning compute
instances are stored here as well.
Azure Container Registry . Stores created docker containers, when you build
custom environments via Azure Machine Learning. Scenarios that trigger creation
of custom environments include AutoML when deploying models and data
profiling.
7 Note
Workspaces can be created without Azure Container Registry as a dependency
if you do not have a need to build custom docker containers. To read
container images, Azure Machine Learning also works with external container
registries. Azure Container Registry is automatically provisioned when you
build custom docker images. Use Azure RBAC to prevent custom docker
containers from being built.
7 Note
If your subscription setting requires adding tags to resources under it, Azure
Container Registry (ACR) created by Azure Machine Learning will fail, since we
cannot set tags to ACR.
Azure Application Insights . Helps you monitor and collect diagnostic information
from your inference endpoints.
Azure Key Vault . Stores secrets that are used by compute targets and other
sensitive information that's needed by the workspace.
Create a workspace
There are multiple ways to create a workspace. To get started use one of the following
options:
The Azure Machine Learning studio lets you quickly create a workspace with
default settings.
Use Azure portal for a point-and-click interface with more security options.
Use the VS Code extension if you work in Visual Studio Code.
Use the Azure Machine Learning CLI or Azure Machine Learning SDK for Python for
prototyping and as part of your MLOps workflows.
On the web:
Azure Machine Learning studio
Azure Machine Learning designer
Workspace management tasks such as creating a workspace are supported in the portal,
studio, Python SDK, Azure CLI, and VS Code.
Sub resources
When you create compute clusters and compute instances in Azure Machine Learning,
sub resources are created.
VMs: provide computing power for compute instances and compute clusters,
which you use to run jobs.
Load Balancer: a network load balancer is created for each compute instance and
compute cluster to manage traffic even while the compute instance/cluster is
stopped.
Virtual Network: these help Azure resources communicate with one another, the
internet, and other on-premises networks.
Bandwidth: encapsulates all outbound data transfers across regions.
Next steps
To learn more about planning a workspace for your organization's requirements, see
Organize and set up Azure Machine Learning.
Use the search bar to find machine learning assets across all workspaces, resource
groups, and subscriptions in your organization. Your search text will be used to find
assets such as:
Jobs
Models
Components
Environments
Data
2. In the top studio titlebar, if a workspace is open, select This workspace or All
workspaces to set the search context.
3. Type your text and press Enter to trigger a 'contains' search. A contains search scans
across all metadata fields for the given asset and sorts results by relevancy score,
which is determined by weightings for different column properties.
Structured search
1. Sign in to Azure Machine Learning studio .
2. In the top studio titlebar, select All workspaces.
3. Click inside the search field to display filters to create more specific search queries.
Job
Model
Component
Tags
SubmittedBy
Environment
Data
If an asset filter (job, model, component, environment, data) is present, results are
scoped to those tabs. Other filters apply to all assets unless an asset filter is also present
in the query. Similarly, free text search can be provided alongside filters, but it's scoped
to the tabs chosen by asset filters, if present.
Tip
Filters search for exact matches of text. Use free text queries for a contains
search.
Quotations are required around values that include spaces or other special
characters.
If duplicate filters are provided, only the first will be recognized in search
results.
Input text of any language is supported but filter strings must match the
provided options (ex. submittedBy:).
The tags filter can accept multiple key:value pairs separated by a comma (ex.
tags:"key1:value1, key2:value2").
If you've used this feature in a previous update, a search result error may occur. Reselect
your preferred workspaces in the Directory + Subscription + Workspace tab.
) Important
Search results may be unexpected for multiword terms in other languages (ex.
Chinese characters).
Customize search results
You can create, save and share different views for your search results.
Edit columns: Add, delete, and re-order columns in the current view's search results table.
Since each tab displays different columns, you customize views separately for each tab.
Next steps
What is an Azure Machine Learning workspace?
Data in Azure Machine Learning
What is an Azure Machine Learning
compute instance?
Article • 09/27/2023
Compute instances make it easy to get started with Azure Machine Learning
development and provide management and enterprise readiness capabilities for IT
administrators.
For compute instance Jupyter functionality to work, ensure that web socket
communication isn't disabled. Ensure your network allows websocket connections to
*.instances.azureml.net and *.instances.azureml.ms.
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Key benefits:
Productivity: You can build and deploy models using integrated notebooks and the
following tools in Azure Machine Learning studio: Jupyter, JupyterLab, and VS Code
(preview). Compute instance is fully integrated with the Azure Machine Learning
workspace and studio, so you can share notebooks and data with other data scientists
in the workspace.
Managed & secure: Reduce your security footprint and add compliance with enterprise
security requirements. Compute instances provide robust management policies and
secure networking configurations.
Preconfigured for ML: Save time on setup tasks with pre-configured and up-to-date ML
packages, deep learning frameworks, and GPU drivers.
Fully customizable: Broad support for Azure VM types, including GPUs, and persisted
low-level customization, such as installing packages and drivers, make advanced
scenarios a breeze. You can also use setup scripts to automate customization.
You can run notebooks from your Azure Machine Learning workspace, Jupyter ,
JupyterLab , or Visual Studio Code. VS Code Desktop can be configured to access your
compute instance. Or use VS Code for the Web, directly from the browser, and without
any required installations or dependencies.
We recommend you try VS Code for the Web to take advantage of the easy integration
and rich development environment it provides. VS Code for the Web gives you many of
the features of VS Code Desktop that you love, including search and syntax highlighting
while browsing and editing. For more information about using VS Code Desktop and VS
Code for the Web, see Launch Visual Studio Code integrated with Azure Machine
Learning (preview) and Work in VS Code remotely connected to a compute instance
(preview).
You can install packages and add kernels to your compute instance.
The following tools and environments are already installed on the compute instance:
Drivers: CUDA, cuDNN, NVIDIA, Blob FUSE
Azure CLI
Docker
Nginx
NCCL 2.0
Protobuf
R kernel (you can add RStudio or Posit Workbench, formerly RStudio Workbench, when
you create the instance)
Anaconda Python
Azure Machine Learning SDK for Python (from PyPI): includes azure-ai-ml and many
common azure extra packages. To see the full list, open a terminal window on your
compute instance and run conda list -n azureml_py310_sdkv2 ^azure
Accessing files
Notebooks and Python scripts are stored in the default storage account of your
workspace in Azure file share. These files are located under your "User files" directory.
This storage makes it easy to share notebooks between compute instances. The storage
account also keeps your notebooks safely preserved when you stop or delete a compute
instance.
The Azure file share account of your workspace is mounted as a drive on the compute
instance. This drive is the default working directory for Jupyter, Jupyter Labs, RStudio,
and Posit Workbench. This means that the notebooks and other files you create in
Jupyter, JupyterLab, VS Code for Web, RStudio, or Posit are automatically stored on the
file share and available to use in other compute instances as well.
The files in the file share are accessible from all compute instances in the same
workspace. Any changes to these files on the compute instance will be reliably persisted
back to the file share.
You can also clone the latest Azure Machine Learning samples to your folder under the
user files directory in the workspace file share.
Writing small files can be slower on network drives than writing to the compute instance
local disk itself. If you're writing many small files, try using a directory directly on the
compute instance, such as the /tmp directory. Note that these files won't be accessible
from other compute instances.
Don't store training data on the notebooks file share. For information on the various
options to store data, see Access data in a job.
You can use the /tmp directory on the compute instance for your temporary data.
However, don't write large files of data on the OS disk of the compute instance. The OS
disk on the compute instance has 128 GB of capacity. You can also store temporary
training data on the temporary disk mounted on /mnt. The temporary disk size is based
on the VM size chosen, so a larger VM size can store larger amounts of data. Any
software packages you install are saved on the OS disk of the compute instance. Note
that customer-managed key encryption is currently not supported for the OS disk. The
OS disk for the compute instance is encrypted with Microsoft-managed keys.
Create
Follow the steps in Create resources you need to get started to create a basic compute
instance.
As an administrator, you can create a compute instance for others in the workspace.
You can also use a setup script for an automated way to customize and configure the
compute instance.
The dedicated cores per region per VM family quota and total regional quota, which
applies to compute instance creation, is unified and shared with the Azure Machine
Learning training compute cluster quota. Stopping the compute instance doesn't release
quota, to ensure you can restart the compute instance. Don't stop the compute instance
through the OS terminal by running sudo shutdown.
Compute instance comes with P10 OS disk. Temp disk type depends on the VM size
chosen. Currently, it isn't possible to change the OS disk type.
Compute target
Compute instances can be used as a training compute target similar to Azure Machine
Learning compute training clusters. But a compute instance has only a single node,
while a compute cluster can have more nodes.
A compute instance:
You can use compute instance as a local inferencing deployment target for test/debug
scenarios.
Tip
The compute instance has a 120-GB OS disk. If you run out of disk space and get into
an unusable state, clear at least 5 GB of disk space on the OS disk (mounted on /)
through the compute instance terminal by removing files or folders, and then run sudo
reboot. The temporary disk is freed after restart; you don't need to clear space on the
temp disk manually. To access the terminal, go to the compute list page or compute
instance details page and select the Terminal link. You can check available disk space
by running df -h on the terminal. Don't stop or restart the compute instance through
the studio until 5 GB of disk space has been cleared. Auto shutdowns, including
scheduled start or stop as well as idle shutdowns, won't work if the CI disk is full.
Next steps
Create resources you need to get started.
Tutorial: Train your first ML model shows how to use a compute instance with an
integrated notebook.
What are compute targets in Azure
Machine Learning?
Article • 12/06/2023
A compute target is a designated compute resource or environment where you run your
training script or host your service deployment. This location might be your local
machine or a cloud-based compute resource. Using compute targets makes it easy for
you to later change your compute environment without having to change your code.
The compute resources you use for your compute targets are attached to a workspace.
Compute resources other than the local machine are shared by users of the workspace.
Compute targets can be reused from one training job to the next. For example, after
you attach a remote VM to your workspace, you can reuse it for multiple jobs. For
machine learning pipelines, use the appropriate pipeline step for each compute target.
You can use any of the following resources for a training compute target for most jobs.
Not all resources can be used for automated machine learning, machine learning
pipelines, or designer. Azure Databricks can be used as a training resource for local runs
and machine learning pipelines, but not as a remote target for other training.
Tip
The compute instance has 120GB OS disk. If you run out of disk space, use the
terminal to clear at least 1-2 GB before you stop or restart the compute instance.
The compute target you use to host your model will affect the cost and availability of
your deployed endpoint. Use this table to choose an appropriate compute target.
7 Note
When choosing a cluster SKU, first scale up and then scale out. Start with a machine
that has 150% of the RAM your model requires, profile the result and find a
machine that has the performance you need. Once you've learned that, increase the
number of machines to fit your need for concurrent inference.
There's no need to create serverless compute. You can create Azure Machine Learning
compute instances or compute clusters from the studio, the Python SDK, or the Azure CLI.
7 Note
Instead of creating a compute cluster, use serverless compute to offload compute
lifecycle management to Azure Machine Learning.
When created, these compute resources are automatically part of your workspace,
unlike other kinds of compute targets.
7 Note
For compute cluster make sure the minimum number of nodes is set to 0, or
use serverless compute.
For a compute instance, enable idle shutdown.
) Important
If your compute instance or compute clusters are based on any of these series,
recreate with another VM size before their retirement date to avoid service
disruption.
Azure NC-series
Azure NCv2-series
Azure ND-series
Azure NV- and NV_Promo series
When you select a node size for a managed compute resource in Azure Machine
Learning, you can choose from among select VM sizes available in Azure. Azure offers a
range of sizes for Linux and Windows for different workloads. To learn more, see VM
types and sizes.
While Azure Machine Learning supports these VM series, they might not be available in
all Azure regions. To check whether VM series are available, see Products available by
region .
7 Note
Azure Machine Learning doesn't support all VM sizes that Azure Compute supports.
To list the available VM sizes, use one of the following methods:
REST API
The Azure CLI extension 2.0 for machine learning command, az ml compute
list-sizes.
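For example, with CLI defaults already configured, a minimal sketch:
Azure CLI
az ml compute list-sizes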
If using the GPU-enabled compute targets, it is important to ensure that the correct
CUDA drivers are installed in the training environment. Use the following table to
determine the correct CUDA version to use:
In addition to ensuring the CUDA version and hardware are compatible, also ensure that
the CUDA version is compatible with the version of the machine learning framework you
are using:
For PyTorch, you can check the compatibility by visiting Pytorch's previous versions
page .
For Tensorflow, you can check the compatibility by visiting Tensorflow's build from
source page .
Compute isolation
Azure Machine Learning compute offers VM sizes that are isolated to a specific
hardware type and dedicated to a single customer. Isolated VM sizes are best suited for
workloads that require a high degree of isolation from other customers' workloads for
reasons that include meeting compliance and regulatory requirements. Utilizing an
isolated size guarantees that your VM will be the only one running on that specific
server instance.
Standard_M128ms
Standard_F72s_v2
Standard_NC24s_v3
Standard_NC24rs_v3*
*RDMA capable
To learn more about isolation, see Isolation in the Azure public cloud.
Unmanaged compute
An unmanaged compute target is not managed by Azure Machine Learning. You create
this type of compute target outside Azure Machine Learning and then attach it to your
workspace. Unmanaged compute resources can require additional steps for you to
maintain or to improve performance for machine learning workloads.
Kubernetes
Next steps
Learn how to:
The following diagram illustrates how you can use a single Environment object in both
your job configuration (for training) and your inference and deployment configuration
(for web service deployments).
The environment, compute target and training script together form the job
configuration: the full specification of a training job.
Types of environments
Environments can broadly be divided into three categories: curated, user-managed, and
system-managed.
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Intended to be used as is, they contain collections of Python
packages and settings to help you get started with various machine learning
frameworks. These pre-created environments also allow for faster deployment time. For
a full list, see the curated environments article.
You use system-managed environments when you want conda to manage the Python
environment for you. A new conda environment is materialized from your conda
specification on top of a base docker image.
For specific code samples, see the "Create an environment" section of How to use
environments.
Environments are also easily managed through your workspace, which allows you to:
Register environments.
Fetch environments from your workspace to use for training or deployment.
Create a new instance of an environment by editing an existing one.
View changes to your environments over time, which ensures reproducibility.
Build Docker images automatically from your environments.
For code samples, see the "Manage environments" section of How to use environments.
For local jobs, a Docker or conda environment is created based on the environment
definition. The scripts are then executed on the target compute - a local runtime
environment or local Docker engine.
The second step is optional, and the environment may instead come from the Docker
build context or base image. In this case you're responsible for installing any Python
packages, by including them in your base image, or specifying custom Docker steps.
You're also responsible for specifying the correct location for the Python executable. It is
also possible to use a custom Docker base image.
To view the details of a cached image, check the Environments page in Azure Machine
Learning studio or use MLClient.environments to get and inspect the environment.
To determine whether to reuse a cached image or build a new one, Azure Machine
Learning computes a hash value from the environment definition and compares it to
the hashes of existing environments. The hash is based on the environment definition's:
Base image
Custom docker steps
Python packages
Spark packages
The hash isn't affected by the environment name or version. If you rename your
environment or create a new one with the same settings and packages as another
environment, then the hash value will remain the same. However, environment
definition changes like adding or removing a Python package or changing a package
version will result cause the resulting hash value to change. Changing the order of
dependencies or channels in an environment will also change the hash and require a
new image build. Similarly, any change to a curated environment will result in the
creation of a new "non-curated" environment.
7 Note
You will not be able to submit any local changes to a curated environment without
changing the name of the environment. The prefixes "AzureML-" and "Microsoft"
are reserved exclusively for curated environments, and your job submission will fail
if the name starts with either of them.
The environment's computed hash value is compared with the hashes in the workspace
and global ACR, or on the compute target (local jobs only). If there's a match, the
cached image is pulled and used; otherwise, an image build is triggered.
The following diagram shows three environment definitions. Two of them have different
names and versions but identical base images and Python packages, which results in the
same hash and corresponding cached image. The third environment has different
Python packages and versions, leading to a different hash and cached image.
Actual cached images in your workspace ACR will have names like
azureml/azureml_e9607b2514b066c851012848913ba19f with the hash appearing at the end.
) Important
The base image is pulled again every time the latest tag is updated. This helps the
image receive the latest patches and system updates.
Image patching
Microsoft is responsible for patching the base images for known security vulnerabilities.
Updates for supported images are released every two weeks, with a commitment of no
unpatched vulnerabilities older than 30 days in the latest version of the image. Patched
images are released with a new immutable tag and the :latest tag is updated to the
latest version of the patched image.
You'll need to update associated Azure Machine Learning assets to use the newly
patched image. For example, when working with a managed online endpoint, you'll
need to redeploy your endpoint to use the patched image.
If you provide your own images, you're responsible for updating them and updating the
Azure Machine Learning assets that use them.
For more information on the base images, see the following links:
Next steps
Learn how to create and use environments in Azure Machine Learning.
See the Python SDK reference documentation for the environment class.
Manage software environments in Azure
Machine Learning studio
Article • 10/01/2023
In this article, learn how to create and manage Azure Machine Learning environments in
the Azure Machine Learning studio. Use the environments to track and reproduce your
projects' software dependencies as they evolve.
For a high-level overview of how environments work in Azure Machine Learning, see
What are ML environments? For information about setting up a development
environment, see How to set up a development environment for Azure Machine Learning.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
An Azure Machine Learning workspace.
Click on an environment to see detailed information about its contents. For more
information, see Azure Machine Learning curated environments.
Create an environment
To create an environment:
You can customize the configuration file, add tags and descriptions, and review the
properties before creating the entity.
Click the pencil icons to edit tags, descriptions, and configuration files under the Context
tab.
Keep in mind that any changes to the Docker or Conda sections will create a new
version of the environment.
View logs
Click on the Build log tab within the details page to view the logs of an environment
version and the environment log analysis. Environment log analysis is a feature that
provides insight and relevant troubleshooting documentation to explain environment
definition issues or image build failures.
Build log contains the bare output from an Azure Container Registry (ACR) task or
an Image Build Compute job.
Image build analysis is an analysis of the build log used to see the cause of the
image build failure.
Environment definition analysis provides information about the environment
definition if it goes against best practices for reproducibility, supportability, or
security.
For an overview of common build failures, see How to troubleshoot for environments .
If you have feedback on the environment log analysis, file a GitHub issue .
Rebuild an environment
In the details page, click on the rebuild button to rebuild the environment. Any
unpinned package versions in your configuration files may be updated to the most
recent version with this action.
Manage Azure Machine Learning
environments with the CLI & SDK (v2)
Article • 01/03/2024
Azure Machine Learning environments define the execution environments for your jobs
or deployments and encapsulate the dependencies for your code. Azure Machine
Learning uses the environment specification to create the Docker container that your
training or scoring code runs in on the specified compute target. You can define an
environment from a conda specification, Docker image, or Docker build context.
In this article, learn how to create and manage Azure Machine Learning environments
using the SDK & CLI (v2).
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
Quickstart: Create workspace resources article to create one.
The Azure CLI and the ml extension or the Azure Machine Learning Python SDK v2:
To install the Azure CLI and extension, see Install, set up, and use the CLI (v2).
) Important
The CLI examples in this article assume that you are using the Bash (or
compatible) shell. For example, from a Linux system or Windows
Subsystem for Linux.
Bash
pip install azure-ai-ml azure-identity
To upgrade an existing installation of the SDK to the latest version:
Bash
pip install --upgrade azure-ai-ml azure-identity
For more information, see Install the Python SDK v2 for Azure Machine
Learning .
Tip
For a full-featured development environment, use Visual Studio Code and the
Azure Machine Learning extension to manage Azure Machine Learning resources
and train machine learning models.
Azure CLI
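The CLI examples in this article are typically run from a local clone of a samples
repository. As a sketch, the public azureml-examples repository can be fetched
like this:
cli
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli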
Note that --depth 1 clones only the latest commit to the repository, which reduces time
to complete the operation.
Tip
Use the tabs below to select the method you want to use to work with
environments. Selecting a tab will automatically switch all the tabs in this article to
the same tab. You can select another tab at any time.
Azure CLI
When using the Azure CLI, you need identifier parameters - a subscription, resource
group, and workspace name. While you can specify these parameters for each
command, you can also set defaults that will be used for all the commands. Use the
following commands to set default values. Replace <subscription ID> , <Azure
Machine Learning workspace name> , and <resource group> with the values for your
configuration:
Azure CLI
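A minimal sketch of those commands, using the placeholders named above:
cli
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>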
Curated environments
There are two types of environments in Azure Machine Learning: curated and custom
environments. Curated environments are predefined environments containing popular
ML frameworks and tooling. Custom environments are user-defined and can be created
via az ml environment create .
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Azure Machine Learning routinely updates these environments
with the latest framework version releases and maintains them for bug fixes and security
patches. They're backed by cached Docker images, which reduce job preparation cost
and model deployment time.
You can use these curated environments out of the box for training or deployment by
referencing a specific environment using the azureml:<curated-environment-name>:
<version> or azureml:<curated-environment-name>@latest syntax. You can also use them
as reference for your own custom environments by modifying the Dockerfiles that back
these curated environments.
You can see the set of available curated environments in the Azure Machine Learning
studio UI, or by using the CLI (v2) via az ml environment list .
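For example, to look at the details of one curated environment from the CLI (a
sketch; the environment name below is illustrative, since the available set of
curated environments changes over time):
cli
az ml environment show --name AzureML-sklearn-1.0-ubuntu20.04-py38-cpu --version 1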
Create an environment
You can define an environment from a Docker image, a Docker build context, or a
conda specification together with a Docker image.
Create an environment from a Docker image
To define an environment from a Docker image, provide the image URI of the image
hosted in a registry such as Docker Hub or Azure Container Registry.
Azure CLI
The following example is a YAML specification file for an environment defined from
a Docker image. An image from the official PyTorch repository on Docker Hub is
specified via the image property in the YAML file.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-example
image: pytorch/pytorch:latest
description: Environment created from a Docker image.
To register this environment from the specification file, you can use az ml
environment create (the file name here assumes the YAML above is saved as
docker-image-example.yml):
cli
az ml environment create --file docker-image-example.yml
Tip
Azure Machine Learning maintains a set of CPU and GPU Ubuntu Linux-based base
images with common system dependencies. For example, the GPU images contain
Miniconda, OpenMPI, CUDA, cuDNN, and NCCL. You can use these images for your
environments, or use their corresponding Dockerfiles as reference when building
your own custom images.
For the set of base images and their corresponding Dockerfiles, see the AzureML-
Containers repo .
Azure CLI
The following example is a YAML specification file for an environment defined from
a build context. The local path to the build context folder is specified in the
build.path field, and the relative path to the Dockerfile within that build context
can be specified in the build.dockerfile_path field. If build.dockerfile_path is
omitted, Azure Machine Learning looks for a Dockerfile named Dockerfile at the
root of the build context. In this example, the build context contains a Dockerfile
named Dockerfile and a requirements.txt file that is referenced within the
Dockerfile for installing Python packages.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-context-example
build:
path: docker-contexts/python-and-pip
To create the environment, pass the specification file to az ml environment
create in the same way (the file name is an assumed one for the YAML above):
cli
az ml environment create --file docker-context-example.yml
Azure Machine Learning will start building the image from the build context when the
environment is created. You can monitor the status of the build and view the build logs
in the studio UI.
You must also specify a base Docker image for this environment. Azure Machine
Learning will build the conda environment on top of the Docker image provided. If you
install Python dependencies in your Docker image, those packages won't be available
in the execution environment, causing runtime failures. By default, Azure Machine
Learning will build a Conda environment with dependencies you specified, and will
execute the job in that environment instead of using any Python libraries that you
installed on the base image.
Azure CLI
The following example is a YAML specification file for an environment defined from
a conda specification. Here the relative path to the conda file from the Azure
Machine Learning environment YAML file is specified via the conda_file property.
You can alternatively define the conda specification inline using the conda_file
property, rather than defining it in a separate file.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: conda-yamls/pydata.yml
description: Environment created from a Docker image plus Conda
environment.
As with the other definitions, the environment can be created from the
specification file (again, the file name is an assumed one):
cli
az ml environment create --file docker-image-plus-conda-example.yml
Azure Machine Learning will build the final Docker image from this environment
specification when the environment is used in a job or deployment. You can also
manually trigger a build of the environment in the studio UI.
Manage environments
The SDK and CLI (v2) also allow you to manage the lifecycle of your Azure Machine
Learning environment assets.
List
List all the environments in your workspace:
Azure CLI
cli
az ml environment list
List all the environment versions under a specific name:
Azure CLI
cli
az ml environment list --name docker-image-example
Show
Get the details of a specific environment:
Azure CLI
cli
az ml environment show --name docker-image-example --version 1
Update
Update mutable properties of a specific environment:
Azure CLI
cli
az ml environment update --name docker-image-example --version 1 --set description="This is an updated description."
For environments, only description and tags can be updated. All other properties
are immutable; if you need to change any of those properties you should create a
new version of the environment.
Archive
Archiving an environment will hide it by default from list queries ( az ml environment
list ). You can continue to reference and use an archived environment in your
workflows. You can archive either all versions of an environment or only a specific
version.
If you don't specify a version, all versions of the environment under that given name will
be archived. If you create a new environment version under an archived environment
container, that new version will automatically be set as archived as well.
Archive all versions of an environment:
Azure CLI
cli
az ml environment archive --name docker-image-example
Archive a specific version of an environment:
Azure CLI
cli
az ml environment archive --name docker-image-example --version 1
When you submit a training job, the building of a new environment can take several
minutes. The duration depends on the size of the required dependencies. The
environments are cached by the service. So as long as the environment definition
remains unchanged, you incur the full setup time only once.
For more information on how to use environments in jobs, see Train models.
You can also use environments for your model deployments for both online and
batch scoring. To do so, specify the environment field in the deployment YAML
configuration.
For more information on how to use environments in deployments, see Deploy and
score a machine learning model by using an online endpoint.
Next steps
Train models (create jobs)
Deploy and score a machine learning model by using an online endpoint
Environment YAML schema reference
Create custom curated Azure Container
for PyTorch (ACPT) environments in
Azure Machine Learning studio
Article • 03/21/2023
If you want to extend a curated environment, for example to add Hugging Face (HF)
transformers, datasets, or any other external packages, Azure Machine Learning lets
you create a new environment from a Docker context that uses the ACPT curated
environment as the base image and installs the additional packages on top of it, as
described below.
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
Quickstart: Create workspace resources article to create one.
Navigate to environments
In the Azure Machine Learning studio , navigate to the "Environments" section by
selecting the "Environments" option.
Paste the docker image name that you copied previously. Configure your
environment by declaring the base image, and add any environment variables you
want to use and the packages that you want to include.
Review your environment settings, add any tags if needed, and select the Create
button to create your custom environment.
That's it! You've now created a custom environment in Azure Machine Learning studio
and can use it to run your machine learning models.
Next steps
Learn more about environment objects:
What are Azure Machine Learning environments? .
Learn more about curated environments.
Learn more about training models in Azure Machine Learning.
Azure Container for PyTorch (ACPT) reference
How to create and manage files in your
workspace
Article • 04/13/2023
Learn how to create and manage the files in your Azure Machine Learning workspace.
These files are stored in the default workspace storage. Files and folders can be shared
with anyone else with read access to the workspace, and can be used from any compute
instances in the workspace.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. Create workspace resources.
Create files
To create a new file in your default folder ( Users > yourname ):
5. Name the file.
7. Select Create.
Notebooks and most text file types display in the preview section. Most other file types
don't have a preview.
Tip
If you don't see the correct preview for a notebook, make sure it has .ipynb as its
extension. Hover over the filename in the list to select ... if you need to rename the
file.
) Important
Content in notebooks and scripts can potentially read data from your sessions and
access data in your organization in Azure without authorization. Only load files from
trusted sources. For more information, see Secure code best practices.
For example, choose "Indent using spaces" if you want your editor to auto-indent with
spaces instead of tabs. Take a few moments to explore the different options you have in
the Command Palette.
Clone samples
Your workspace contains a Samples folder with notebooks designed to help you explore
the SDK and serve as examples for your own machine learning projects. Clone these
notebooks into your own folder to run and edit them.
Share files
Copy and paste the URL to share a file. Only other users of the workspace can access
this URL. Learn more about granting access to your workspace.
Delete a file
You can't delete the Samples files. These files are part of the studio and are updated
each time a new SDK is published.
You can delete files found in your Files section in any of these ways:
In the studio, select the ... at the end of a folder or file. Make sure to use a
supported browser (Microsoft Edge, Chrome, or Firefox).
Use a terminal from any compute instance in your workspace. The folder
~/cloudfiles is mapped to storage on your workspace storage account (see the
example after this list).
In either Jupyter or JupyterLab with their tools.
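For instance, from a compute instance terminal, a file can be removed with a standard
shell command (a sketch; the username and file name are placeholders):
shell
rm ~/cloudfiles/code/Users/<your-username>/old-notebook.ipynb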
Next steps
Run Jupyter notebooks in your workspace
Access a compute instance terminal in your workspace
Run Jupyter notebooks in your
workspace
Article • 09/26/2023
This article shows how to run your Jupyter notebooks inside your workspace of Azure
Machine Learning studio. There are other ways to run the notebook as well: Jupyter ,
JupyterLab , and Visual Studio Code. VS Code Desktop can be configured to access
your compute instance. Or use VS Code for the Web, directly from the browser, and
without any required installations or dependencies.
We recommend you try VS Code for the Web to take advantage of the easy integration
and rich development environment it provides. VS Code for the Web gives you many of
the features of VS Code Desktop that you love, including search and syntax highlighting
while browsing and editing. For more information about using VS Code Desktop and VS
Code for the Web, see Launch Visual Studio Code integrated with Azure Machine
Learning (preview) and Work in VS Code remotely connected to a compute instance
(preview).
No matter which solution you use to run the notebook, you'll have access to all the files
from your workspace. For information on how to create and manage files, including
notebooks, see Create and manage files in your workspace.
The rest of this article shows the experience for running the notebook directly in studio.
) Important
Features marked as (preview) are provided without a service level agreement and
aren't recommended for production workloads. Certain features might not be
supported or might have constrained capabilities. For more information, see
Supplemental Terms of Use for Microsoft Azure Previews .
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. See Create workspace resources.
Your user identity must have access to your workspace's default storage account.
Whether you can read, edit, or create notebooks depends on your access level to
your workspace. For example, a Contributor can edit the notebook, while a Reader
could only view it.
Edit a notebook
To edit a notebook, open any notebook located in the User files section of your
workspace. Select the cell you wish to edit. If you don't have any notebooks in this
section, see Create and manage files in your workspace.
You can edit the notebook without connecting to a compute instance. When you want
to run the cells in the notebook, select or create a compute instance. If you select a
stopped compute instance, it will automatically start when you run the first cell.
When a compute instance is running, you can also use code completion, powered by
Intellisense , in any Python notebook.
You can also launch Jupyter or JupyterLab from the notebook toolbar. Azure Machine
Learning doesn't provide updates or fix bugs for Jupyter or JupyterLab, as they're
open-source products outside the boundary of Microsoft support.
Focus mode
Use focus mode to expand your current view so you can focus on your active tabs.
Focus mode hides the Notebooks file explorer.
1. In the notebook toolbar, select Focus mode to turn on focus mode. Depending
on your window width, the tool may be located under the ... menu item in your
toolbar.
2. While in focus mode, return to the standard view by selecting Standard view.
Code completion (IntelliSense)
IntelliSense is a code-completion aid that includes many features: List Members,
Parameter Info, Quick Info, and Complete Word. With only a few keystrokes, you can:
Share a notebook
Your notebooks are stored in your workspace's storage account, and can be shared with
others, depending on their access level to your workspace. They can open and edit the
notebook as long as they have the appropriate access. For example, a Contributor can
edit the notebook, while a Reader could only view it.
Other users of your workspace can find your notebook in the Notebooks, User files
section of Azure Machine Learning studio. By default, your notebooks are in a folder
with your username, and others can access them there.
You can also copy the URL from your browser when you open a notebook, then send to
others. As long as they have appropriate access to your workspace, they can open the
notebook.
Since you don't share compute instances, other users who run your notebook will do so
on their own compute instance.
Whether the comments pane is visible or not, you can add a comment into any code
cell:
1. Select some text in the code cell. You can only comment on text in a code cell.
2. Use the New comment thread tool to create your comment.
Text that has been commented will appear with a purple highlight in the code. When
you select a comment in the comments pane, your notebook will scroll to the cell that
contains the highlighted text.
7 Note
The new notebook contains only code cells, with all cells required to produce the same
results as the cell you selected for gathering.
In the notebook toolbar, select the menu and then File>Save and checkpoint to
manually save the notebook and it will add a checkpoint file associated with the
notebook.
Every notebook is autosaved every 30 seconds. AutoSave updates only the initial
.ipynb file, not the checkpoint file.
Select Checkpoints in the notebook menu to create a named checkpoint and to revert
the notebook to a saved checkpoint.
Export a notebook
In the notebook toolbar, select the menu and then Export As to export the notebook as
any of the supported types:
Notebook
Python
HTML
LaTeX
The exported file is saved on your computer.
If you don't have a compute instance, use these steps to create one:
Once you're connected to a compute instance, use the toolbar to run all cells in the
notebook, or Control + Enter to run a single selected cell.
Only you can see and use the compute instances you create. Your User files are stored
separately from the VM and are shared among all compute instances in the workspace.
These actions won't change the notebook state or the values of any variables in the
notebook:
Action | Result
Stop the kernel | Stops any running cell. Running a cell will automatically restart the kernel.
These actions will reset the notebook state and will reset all variables in the notebook.
Use the kernel dropdown on the right to change to any of the installed kernels.
Manage packages
Since your compute instance has multiple kernels, make sure to use the %pip or
%conda magic functions, which install packages into the currently running kernel.
Don't use !pip or !conda, which refer to all packages (including packages outside the
currently running kernel).
Status indicators
An indicator next to the Compute dropdown shows its status. The status is also shown
in the dropdown itself.
Shortcut | Description
O | Toggle output
I, I | Interrupt kernel
0, 0 | Restart kernel
Tab | Change focus to next focusable item (when tab trap disabled)
1 | Change to h1
2 | Change to h2
3 | Change to h3
4 | Change to h4
5 | Change to h5
6 | Change to h6
Edit mode shortcuts
Edit mode is indicated by a text cursor prompting you to type in the editor area. When a
cell is in edit mode, you can type into the cell. Enter edit mode by pressing Enter or
select a cell's editor area. The left border of the active cell is green and hatched, and its
Run button is green. You also see the cursor prompt in the cell in Edit mode.
Using the following keystroke shortcuts, you can more easily navigate and run code in
Azure Machine Learning notebooks when in Edit mode.
Shortcut | Description
Control/Command + ] | Indent
Control/Command + [ | Dedent
Control/Command + Z | Undo
Control/Command + Y | Redo
Troubleshooting
Connecting to a notebook: If you can't connect to a notebook, ensure that web
socket communication is not disabled. For compute instance Jupyter functionality
to work, web socket communication must be enabled. Ensure your network allows
websocket connections to *.instances.azureml.net and *.instances.azureml.ms.
Kernel crash: If your kernel crashed and was restarted, you can run the following
command to look at Jupyter log and find out more details: sudo journalctl -u
jupyter . If kernel issues persist, consider using a compute instance with more
memory.
Kernel not found or Kernel operations were disabled: When using the default
Python 3.8 kernel on a compute instance, you may get an error such as "Kernel not
found" or "Kernel operations were disabled". To fix, use one of the following
methods:
Create a new compute instance. This will use a new image where this problem
has been resolved.
Use the Py 3.6 kernel on the existing compute instance.
From a terminal in the default py38 environment, run pip install
ipykernel==6.6.0 OR pip install ipykernel==6.0.3
Expired token: If you run into an expired token issue, sign out of your Azure
Machine Learning studio, sign back in, and then restart the notebook kernel.
File upload limit: When uploading a file through the notebook's file explorer,
you're limited to files that are smaller than 5 TB. If you need to upload a file larger
than this, we recommend that you use the SDK to upload the data to a datastore.
For more information, see Create data assets.
Next steps
Run your first experiment
Backup your file storage with snapshots
Working in secure environments
Access a compute instance terminal in
your workspace
Article • 12/28/2023
Use the terminal of a compute instance in your workspace to:
Use files from Git and version files. These files are stored in your workspace file
system, not restricted to a single compute instance.
Install packages on the compute instance.
Create extra kernels on the compute instance.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. See Create workspace resources.
Access a terminal
To access the terminal:
4. When a compute instance is running, the terminal window for that compute
instance appears.
5. When no compute instance is running, use the Compute section on the right to
start or create a compute instance.
In addition to the steps above, you can also access the terminal from:
7 Note
Add your files and folders anywhere under the ~/cloudfiles/code/Users folder so
they will be visible in all your Jupyter environments.
To integrate Git with your Azure Machine Learning workspace, see Git integration for
Azure Machine Learning.
Install packages
Install packages from a terminal window. Install Python packages into the Python 3.8 -
AzureML environment. Install R packages into the R environment.
Or you can install packages directly in Jupyter Notebook, RStudio, or Posit Workbench
(formerly RStudio Workbench):
7 Note
For package management within a notebook, use the %pip or %conda magic
functions to automatically install packages into the currently running kernel,
rather than !pip or !conda, which refer to all packages (including packages
outside the currently running kernel).
2 Warning
While customizing the compute instance, make sure you do not delete the
azureml_py36 or azureml_py38 conda environments. Also do not delete Python
3.6 - AzureML or Python 3.8 - AzureML kernels. These are needed for
Jupyter/JupyterLab functionality.
1. Use the terminal window to create a new environment. For example, the code
below creates newenv :
shell
conda create --name newenv
2. Activate the new environment:
shell
conda activate newenv
3. Install the pip and ipykernel packages in the new environment, and create a kernel
for that conda env:
shell
conda install pip
conda install ipykernel
python -m ipykernel install --user --name newenv --display-name "Python (newenv)"
1. Use the terminal window to create a new environment. For example, the code
below creates r_env :
shell
conda create -n r_env r-essentials r-base
2. Activate the new environment:
shell
conda activate r_env
It will take a few minutes before the new R kernel is ready to use. If you get an error
saying it is invalid, wait and then try again.
For more information about conda, see Using R language with Anaconda . For more
information about IRkernel, see Native R kernel for Jupyter .
2 Warning
While customizing the compute instance, make sure you do not delete the
azureml_py36 or azureml_py38 conda environments. Also do not delete Python
3.6 - AzureML or Python 3.8 - AzureML kernels. These are needed for
Jupyter/JupyterLab functionality.
To remove an added Jupyter kernel from the compute instance, you must remove the
kernelspec and, optionally, the conda environment. You can choose to keep the conda
environment, but you must remove the kernelspec, or your kernel will still be
selectable and cause unexpected behavior.
1. List the kernels available on the compute instance:
shell
jupyter kernelspec list
2. Remove the kernelspec, replacing UNWANTED_KERNEL with the kernel you'd like
to remove:
shell
jupyter kernelspec remove UNWANTED_KERNEL
1. Use the terminal window to list and find the conda environment:
shell
conda env list
2. Remove the conda environment, replacing ENV_NAME with the name of the
environment you'd like to remove:
shell
conda env remove -n ENV_NAME
Upon refresh, the kernel list in your notebooks view should reflect the changes you have
made.
Select Manage active sessions in the terminal toolbar to see a list of all active terminal
sessions and shut down the sessions you no longer need.
Learn more about how to manage sessions running on your compute at Managing
notebook and terminal sessions.
2 Warning
Make sure you close any sessions you no longer need to preserve your compute
instance's resources and optimize your performance.
Manage notebook and terminal sessions
Article • 01/19/2023
Notebook and terminal sessions run on the compute and maintain your current working
state.
When you reopen a notebook, or reconnect to a terminal session, you can reconnect to
the previous session state (including command history, execution history, and defined
variables). However, too many active sessions may slow down the performance of your
compute. With too many active sessions, you may find that typing in a terminal or
notebook cell lags, or that terminal or notebook command execution feels slower than
expected.
Use the session management panel in Azure Machine Learning studio to help you
manage your active sessions and optimize the performance of your compute instance.
Navigate to this session management panel from the compute toolbar of either a
terminal tab or a notebook tab.
7 Note
For optimal performance, we recommend you don’t keep more than six active
sessions - and the fewer the better.
Notebook sessions
In the session management panel, select a linked notebook name in the notebook
sessions section to reopen a notebook with its previous state.
Notebook sessions are kept active when you close a notebook tab in the Azure Machine
Learning studio. So, when you reopen a notebook you'll have access to previously
defined variables and execution state - in this case, you're benefitting from the active
notebook session.
However, keeping too many active notebook sessions can slow down the performance
of your compute. So, you should use the session management panel to shut down any
notebook sessions you no longer need.
Select Manage active sessions in the terminal toolbar to open the session management
panel and shut down the sessions you no longer need. In the following image, you can
see that the tooltip shows the count of active notebook sessions.
Terminal sessions
In the session management panel, you can select on a terminal link to reopen a terminal
tab connected to that previous terminal session.
In contrast to notebook sessions, terminal sessions are terminated when you close a
terminal tab. However, if you navigate away from the Azure Machine Learning studio
without closing a terminal tab, the session may remain open. Shut down any
terminal sessions you no longer need by using the session management panel.
Select Manage active sessions in the terminal toolbar to open the session management
panel and shut down the sessions you no longer need. In the following image, you can
see that the tooltip shows the count of active terminal sessions.
Next steps
How to create and manage files in your workspace
Run Jupyter notebooks in your workspace
Access a compute instance terminal in your workspace
Launch Visual Studio Code integrated
with Azure Machine Learning (preview)
Article • 06/15/2023
In this article, you learn how to launch Visual Studio Code remotely connected to an
Azure Machine Learning compute instance. Use VS Code as your integrated
development environment (IDE) with the power of Azure Machine Learning resources.
Use VS Code in the browser with VS Code for the Web, or use the VS Code desktop
application.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
There are two ways you can connect to a compute instance from Visual Studio Code. We
recommend the first approach.
1. VS Code integrated with Azure Machine Learning studio. You can open VS Code
from your workspace, either in the browser with VS Code for the Web or in the
desktop application with VS Code Desktop.
We recommend VS Code for the Web, as you can do all your machine learning
work directly from the browser, without any required installations or
dependencies.
2. Remote Jupyter Notebook server. This option allows you to set a compute
instance as a remote Jupyter Notebook server. This option is only available in VS
Code (Desktop).
) Important
2. Sign in to studio and select your workspace if it's not already open.
3. In the Manage preview features panel, scroll down and enable Connect compute
instances to Visual Studio Code for the Web.
VS Code for the Web provides you with a full-featured development environment
for building your machine learning projects, all from the browser and without
required installations or dependencies. And by connecting your Azure Machine
Learning compute instance, you get the rich and integrated development
experience VS Code offers, enhanced by the power of Azure Machine Learning.
Launch VS Code for the Web with one selection from the Azure Machine Learning
studio, and seamlessly continue your work.
Sign in to Azure Machine Learning studio and follow the steps to launch a VS
Code (Web) browser tab, connected to your Azure Machine Learning compute
instance.
You can create the connection from either the Notebooks or Compute section of
Azure Machine Learning studio.
Notebooks
3. If the compute instance is stopped, select Start compute and wait until
it's running.
Compute
If you pick one of the click-out experiences, a new VS Code window opens and a
connection attempt is made to the remote compute instance. When attempting to make
this connection, the following steps take place:
1. Authorization. Some checks are performed to make sure the user attempting to
make a connection is authorized to use the compute instance.
2. VS Code Remote Server is installed on the compute instance.
3. A WebSocket connection is established for real-time interaction.
Once the connection is established, it's persisted. A token is issued at the start of the
session, which gets refreshed automatically to maintain the connection with your
compute instance.
After you connect to your remote compute instance, use the editor to:
Author and manage files on your remote compute instance or file share .
Use the VS Code integrated terminal to run commands and applications on your
remote compute instance.
Debug your scripts and applications
Use VS Code to manage your Git repositories
Azure Machine Learning Visual Studio Code extension. For more information, see
the Azure Machine Learning Visual Studio Code Extension setup guide.
3. Choose Azure ML Compute Instances from the list of Jupyter server options.
4. Select your subscription from the list of subscriptions. If you have previously
configured your default Azure Machine Learning workspace, this step is skipped.
6. Select your compute instance from the list. If you don't have one, select Create
new Azure Machine Learning Compute Instance and follow the prompts to create
one.
7. For the changes to take effect, you have to reload Visual Studio Code.
) Important
At this point, you can continue to run cells in your Jupyter Notebook.
Tip
You can also work with Python script files (.py) containing Jupyter-like code cells.
For more information, see the Visual Studio Code Python interactive
documentation .
Next steps
Now that you've launched Visual Studio Code remotely connected to a compute
instance, you can prep your data, edit and debug your code, and submit training jobs
with the Azure Machine Learning extension.
To learn more about how to make the most of VS Code integrated with Azure Machine
Learning, see Work in VS Code remotely connected to a compute instance (preview).
Work in VS Code remotely connected to
a compute instance (preview)
Article • 05/23/2023
In this article, learn specifics of working within a VS Code remote connection to an Azure
Machine Learning compute instance. Use VS Code as your full-featured integrated
development environment (IDE) with the power of Azure Machine Learning resources.
You can work with a remote connection to your compute instance in the browser with
VS Code for the Web, or the VS Code desktop application.
We recommend VS Code for the Web, as you can do all your machine learning
work directly from the browser, and without any required installations or
dependencies.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
) Important
Prerequisites
Before you get started, you will need:
When you use VS Code for the Web, the latest versions of these extensions are
automatically available to you. If you use the desktop application, you may need to
install them.
When you launch VS Code connected to a compute instance for the first time, make
sure you follow these steps and take a few moments to orient yourself to the tools in
your integrated development environment.
2. Once your subscriptions are listed, you can filter to the ones you use frequently.
You can also pin workspaces you use most often within the subscriptions.
3. The workspace you launched the VS Code remote connection from (the workspace
the compute instance is in) should be automatically set as the default. You can
update the default workspace from the VS Code status bar.
4. If you plan to use the Azure Machine Learning CLI, open a terminal from the menu,
and sign in to the Azure Machine Learning CLI using az login --identity .
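As a sketch, that sign-in is a single command in the integrated terminal; the
--identity flag authenticates with the compute instance's managed identity instead
of an interactive prompt:
Bash
az login --identity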
Subsequent times you connect to this compute instance, you shouldn't have to repeat
these steps.
Connect to a kernel
There are a few ways to connect to a Jupyter kernel from VS Code. It's important to
understand the differences in behavior, and the benefits of the different approaches.
If you have already opened this notebook in Azure Machine Learning, we recommend
you connect to an existing session on the compute instance. This action reconnects to
an existing session you had for this notebook in Azure Machine Learning.
1. Locate the kernel picker in the upper right-hand corner of your notebook and
select it.
2. Choose the 'Azure Machine Learning compute instance' option, and then
'Remote' if you've connected before.
While there are a few ways to connect and manage kernels in VS Code, connecting to an
existing kernel session is the recommended way to enable a seamless transition from
the Azure Machine Learning studio to VS Code. If you plan to mostly work within VS
Code, you can make use of any kernel connection approach that works for you.
Next steps
For more information on managing Jupyter kernels in VS Code, see Jupyter kernel
management .
Manage Azure Machine Learning
resources with the VS Code Extension
(preview)
Article • 04/04/2023
Learn how to manage Azure Machine Learning resources with the VS Code extension.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning .
Visual Studio Code. If you don't have it, install it .
Azure Machine Learning extension. Follow the Azure Machine Learning VS Code
extension installation guide to set up the extension.
Create resources
The quickest way to create resources is using the extension's toolbar.
Version resources
Some resources, like environments and models, allow you to make changes to a
resource and store the different versions.
To version a resource:
1. Use the existing specification file that created the resource or follow the create
resources process to create a new specification file.
2. Increment the version number in the template.
3. Right-click the specification file and select AzureML: Execute YAML.
As long as the name of the updated resource is the same as the previous version, Azure
Machine Learning picks up the changes and creates a new version.
Workspaces
For more information, see workspaces.
Create a workspace
1. In the Azure Machine Learning view, right-click your subscription node and select
Create Workspace.
2. A specification file appears. Configure the specification file.
3. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Workspace command in the command palette.
Remove workspace
1. Expand the subscription node that contains your workspace.
2. Right-click the workspace you want to remove.
3. Select whether you want to remove:
Only the workspace: This option deletes only the workspace Azure resource.
The resource group, storage accounts, and any other resources the
workspace was attached to are still in Azure.
With associated resources: This option deletes the workspace and all
resources associated with it.
Alternatively, use the > Azure ML: Remove Workspace command in the command palette.
Datastores
The extension currently supports datastores of the following types:
Azure Blob
Azure Data Lake Gen 1
Azure Data Lake Gen 2
Azure File
Create a datastore
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the datastore under.
3. Right-click the Datastores node and select Create Datastore.
4. Choose the datastore type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Datastore command in the command palette.
Manage a datastore
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datastores node inside your workspace.
4. Right-click the datastore you want to:
Alternatively, use the > Azure ML: Unregister Datastore and > Azure ML: View
Datastore commands respectively in the command palette.
Environments
For more information, see environments.
Create environment
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the environment under.
3. Right-click the Environments node and select Create Environment.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Environment command in the command
palette.
Alternatively, use the > Azure ML: View Environment command in the command palette.
Create job
The quickest way to create a job is by clicking the Create Job icon in the extension's
activity bar.
Alternatively, use the > Azure ML: Create Job command in the command palette.
View job
To view your job in Azure Machine Learning studio:
Alternatively, use the > Azure ML: View Experiment in Studio command in the
command palette.
Alternatively, use the > Azure ML: Download Outputs and > Azure ML: Download Logs
commands respectively in the command palette.
Compute instances
For more information, see compute instances.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: Stop Compute instance and Restart Compute instance
commands respectively in the command palette.
Alternatively, use the AzureML: View Compute instance Properties command in the
command palette.
Alternatively, use the AzureML: Delete Compute instance command in the command
palette.
Compute clusters
For more information, see training compute targets.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: View Compute Properties command in the command
palette.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Inference Clusters
For more information, see compute targets for inference.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Attached Compute
For more information, see unmanaged compute.
Alternatively, use the > Azure ML: View Compute Properties and > Azure ML: Detach
Compute commands respectively in the command palette.
Models
For more information, see train machine learning models.
Create model
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Models node in your workspace and select Create Model.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Model command in the command palette.
Alternatively, use the > Azure ML: View Model Properties command in the command
palette.
Download model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to download and select Download Model File.
Alternatively, use the > Azure ML: Download Model File command in the command
palette.
Delete a model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to delete and select Remove Model.
4. A prompt appears confirming you want to remove the model. Select Ok.
Alternatively, use the > Azure ML: Remove Model command in the command palette.
Endpoints
For more information, see endpoints.
Create endpoint
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Endpoints node in your workspace and select Create Endpoint.
4. Choose your endpoint type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Endpoint command in the command palette.
Delete endpoint
1. Expand the subscription node that contains your workspace.
2. Expand the Endpoints node inside your workspace.
3. Right-click the deployment you want to remove and select Remove Service.
4. A prompt appears confirming you want to remove the service. Select Ok.
Alternatively, use the > Azure ML: Remove Service command in the command palette.
Alternatively, use the > Azure ML: View Service Properties command in the command
palette.
Next steps
Train an image classification model with the VS Code extension.
MLflow and Azure Machine Learning
Article • 01/10/2024
Azure Machine Learning workspaces are MLflow-compatible, which means that you can
use Azure Machine Learning workspaces in the same way that you'd use an MLflow
server. This compatibility has the following advantages:
Azure Machine Learning doesn't host MLflow server instances under the hood;
rather, the workspace can speak the MLflow API language.
You can use Azure Machine Learning workspaces as your tracking server for any
MLflow code, whether it runs on Azure Machine Learning or not. You only need to
configure MLflow to point to the workspace where the tracking should happen.
You can run any training routine that uses MLflow in Azure Machine Learning
without any change.
Tip
Unlike the Azure Machine Learning SDK v1, there's no logging functionality in the
SDK v2. We recommend that you use MLflow for logging, so that your training
routines are cloud-agnostic and portable—removing any dependency your code
has on Azure Machine Learning.
You can use MLflow in Azure Machine Learning to:
Track machine learning experiments and models running locally or in the cloud.
Track Azure Databricks machine learning experiments.
Track Azure Synapse Analytics machine learning experiments.
Example notebooks
Training and tracking an XGBoost classifier with MLflow : Demonstrates how to
track experiments by using MLflow, log models, and combine multiple flavors into
pipelines.
Training and tracking an XGBoost classifier with MLflow using service principal
authentication : Demonstrates how to track experiments by using MLflow from a
compute that's running outside Azure Machine Learning. The example shows how
to authenticate against Azure Machine Learning services by using a service
principal.
Hyper-parameter optimization using HyperOpt and nested runs in MLflow :
Demonstrates how to use child runs in MLflow to do hyper-parameter optimization
for models by using the popular library Hyperopt . The example shows how to
transfer metrics, parameters, and artifacts from child runs to parent runs.
Logging models with MLflow : Demonstrates how to use the concept of models,
instead of artifacts, with MLflow. The example also shows how to construct custom
models.
Manage runs and experiments with MLflow : Demonstrates how to query
experiments, runs, metrics, parameters, and artifacts from Azure Machine Learning
by using MLflow.
To learn about using the MLflow tracking client with Azure Machine Learning, view the
examples in Train R models using the Azure Machine Learning CLI (v2) .
To learn about using the MLflow tracking client with Azure Machine Learning, view the
Java example that uses the MLflow tracking client with Azure Machine Learning .
To learn more about how to manage models by using the MLflow API in Azure Machine
Learning, view Manage model registries in Azure Machine Learning with MLflow.
Example notebook
Manage model registries with MLflow : Demonstrates how to manage models in
registries by using MLflow.
To learn more about deploying MLflow models to Azure Machine Learning for both real-
time and batch inferencing, see Guidelines for deploying MLflow models.
Example notebooks
Deploy MLflow to online endpoints : Demonstrates how to deploy models in
MLflow format to online endpoints using the MLflow SDK.
Deploy MLflow to online endpoints with safe rollout : Demonstrates how to
deploy models in MLflow format to online endpoints, using the MLflow SDK with
progressive rollout of models. The example also shows deployment of multiple
versions of a model to the same endpoint.
Deploy MLflow to web services (V1) : Demonstrates how to deploy models in
MLflow format to web services (ACI/AKS v1) using the MLflow SDK.
Deploy models trained in Azure Databricks to Azure Machine Learning with
MLflow : Demonstrates how to train models in Azure Databricks and deploy them
in Azure Machine Learning. The example also covers how to handle cases where
you also want to track the experiments with the MLflow instance in Azure
Databricks.
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
You can submit training jobs to Azure Machine Learning by using MLflow projects
(preview). You can submit jobs locally with Azure Machine Learning tracking or migrate
your jobs to the cloud via Azure Machine Learning compute.
To learn how to submit training jobs with MLflow Projects that use Azure Machine
Learning workspaces for tracking, see Train machine learning models with MLflow
projects and Azure Machine Learning.
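As a rough sketch, a local MLflow project can be sent to Azure Machine Learning
compute through the azureml backend that the azureml-mlflow plugin provides; the
backend configuration file and experiment name below are illustrative placeholders:
Bash
mlflow run . --backend azureml --backend-config backend_config.json --experiment-name my-experiment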
Example notebooks
Track an MLflow project in Azure Machine Learning workspaces .
Train and run an MLflow project on Azure Machine Learning jobs .
7 Note
1 Only artifacts and models can be downloaded.
2 Possible by using MLflow projects (preview).
3 Some operations may not be supported. View Manage model registries in Azure
Machine Learning with MLflow for details.
4 Deployment of MLflow models for batch inference by using the MLflow SDK is not
possible at the moment. As an alternative, see Deploy and run MLflow models in
Spark jobs.
Related content
From artifacts to models in MLflow.
Configure MLflow for Azure Machine Learning.
Migrate logging from SDK v1 to MLflow
Track ML experiments and models with MLflow.
Log MLflow models.
Guidelines for deploying MLflow models.
From artifacts to models in MLflow
Article • 12/21/2023
This article explains the differences between an MLflow artifact and an MLflow
model, and how to transition from one to the other. It also explains how Azure Machine
Learning uses the concept of an MLflow model to enable streamlined deployment
workflows.
Artifact
An artifact is any file that's generated (and captured) from an experiment's run or job.
An artifact could represent a model serialized as a pickle file, the weights of a PyTorch or
TensorFlow model, or even a text file containing the coefficients of a linear regression.
Some artifacts could also have nothing to do with the model itself; rather, they could
contain configurations to run the model, or preprocessing information, or sample data,
and so on. Artifacts can come in various formats.
Python
import pickle
import mlflow

# `model` is assumed to be a trained model object from the enclosing run.
filename = 'model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)
mlflow.log_artifact(filename)
Model
A model in MLflow is also an artifact. However, we make stronger assumptions about
this type of artifact. Such assumptions provide a clear contract between the saved files
and what they mean. When you log your models as artifacts (simple files), you need to
know what the model builder meant for each of those files so as to know how to load
the model for inference. On the contrary, MLflow models can be loaded using the
contract specified in the MLmodel format.
Logging models instead of standalone artifacts has the following advantages:
You can deploy them to real-time or batch endpoints without providing a scoring
script or an environment.
When you deploy models, the deployments automatically have Swagger
documentation generated, and the Test feature can be used in Azure Machine
Learning studio.
You can use the models directly as pipeline inputs.
You can use the Responsible AI dashboard with your models.
Python
import mlflow
mlflow.sklearn.log_model(sklearn_estimator, "classifier")
The following screenshot shows a sample MLflow model's folder in the Azure Machine
Learning studio. The model is placed in a folder called credit_defaults_model . There is
no specific requirement on the naming of this folder. The folder contains the MLmodel
file among other model artifacts.
The following code is an example of what the MLmodel file for a computer vision model
trained with fastai might look like:
MLmodel
YAML
artifact_path: classifier
flavors:
fastai:
data: model.fastai
fastai_version: 2.4.1
python_function:
data: model.fastai
env: conda.yaml
loader_module: mlflow.fastai
python_version: 3.8.12
model_uuid: e694c68eba484299976b06ab9058f636
run_id: e13da8ac-b1e6-45d4-a9b2-6a0a5cfac537
signature:
inputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "uint8", "shape": [-1, 300, 300, 3]}
}]'
outputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "float32", "shape": [-1,2]}
}]'
Model flavors
Considering the large number of machine learning frameworks available to use, MLflow
introduced the concept of flavor as a way to provide a unique contract to work across all
machine learning frameworks. A flavor indicates what to expect for a given model that's
created with a specific framework. For instance, TensorFlow has its own flavor, which
specifies how a TensorFlow model should be persisted and loaded. Because each model
flavor indicates how to persist and load the model for a given framework, the MLmodel
format doesn't enforce a single serialization mechanism that all models must support.
This decision allows each flavor to use the methods that provide the best performance
or best support according to their best practices—without compromising compatibility
with the MLmodel standard.
The following code is an example of the flavors section for a fastai model.
YAML
flavors:
fastai:
data: model.fastai
fastai_version: 2.4.1
python_function:
data: model.fastai
env: conda.yaml
loader_module: mlflow.fastai
python_version: 3.8.12
Model signature
A model signature in MLflow is an important part of the model's specification, as it
serves as a data contract between the model and the server running the model. A model
signature is also important for parsing and enforcing a model's input types at
deployment time. If a signature is available, MLflow enforces input types when data is
submitted to your model. For more information, see MLflow signature enforcement .
Signatures are indicated when models get logged, and they're persisted in the
signature section of the MLmodel file. The Autolog feature in MLflow automatically
infers signatures in a best effort way. However, you might have to log the models
manually if the inferred signatures aren't the ones you need. For more information, see
How to log models with signatures .
Column-based signature: This signature operates on tabular data. For models with
this type of signature, MLflow supplies pandas.DataFrame objects as inputs.
Tensor-based signature: This signature operates with n-dimensional arrays or
tensors. For models with this signature, MLflow supplies numpy.ndarray as inputs
(or a dictionary of numpy.ndarray in the case of named-tensors).
The following example corresponds to a computer vision model trained with fastai .
This model receives a batch of images represented as tensors of shape (300, 300, 3)
with the RGB representation of them (unsigned integers). The model outputs batches of
predictions (probabilities) for two classes.
MLmodel
YAML
signature:
inputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "uint8", "shape": [-1, 300, 300, 3]}
}]'
outputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "float32", "shape": [-1,2]}
}]'
Model environment
Requirements for the model to run are specified in the conda.yaml file. MLflow can
automatically detect dependencies, or you can manually indicate them by calling the
mlflow.<flavor>.log_model() method. The latter can be useful if the libraries that
MLflow detects automatically aren't the ones you intend to use.
The following code is an example of an environment used for a model created with the
fastai framework:
conda.yaml
YAML
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip
  - pip:
      - mlflow
      - astunparse==1.6.3
      - cffi==1.15.0
      - configparser==3.7.4
      - defusedxml==0.7.1
      - fastai==2.4.1
      - google-api-core==2.7.1
      - ipython==8.2.0
      - psutil==5.9.0
name: mlflow-env
7 Note
What's the difference between an MLflow environment and an Azure Machine
Learning environment?
While an MLflow environment operates at the level of the model, an Azure Machine
Learning environment operates at the level of the workspace (for registered
environments) or jobs/deployments (for anonymous environments). When you
deploy MLflow models in Azure Machine Learning, the model's environment is built
and used for deployment. Alternatively, you can override this behavior with the
Azure Machine Learning CLI v2 and deploy MLflow models using a specific Azure
Machine Learning environment.
Predict function
All MLflow models contain a predict function. This function is called when a model is
deployed using a no-code-deployment experience. What the predict function returns
(for example, classes, probabilities, or a forecast) depends on the framework (that is, the
flavor) used for training. Read the documentation of each flavor to know what it
returns.
In some cases, you might need to customize this predict function to change the way
inference is executed. In such cases, you need to log models with a different behavior in
the predict method, or log a custom model's flavor.
MLflow provides a consistent way to load these models, regardless of their location:
Load back the same object and types that were logged: You can load models
using the MLflow SDK and obtain an instance of the model with types belonging
to the training library. For example, an ONNX model returns a ModelProto while a
decision tree model trained with scikit-learn returns a DecisionTreeClassifier
object. Use mlflow.<flavor>.load_model() to load back the same model object and
types that were logged.
Load back a model for running inference: You can load models using the MLflow
SDK and obtain a wrapper where MLflow guarantees that there will be a predict
function. It doesn't matter which flavor you're using; every MLflow model has a
predict function. Furthermore, MLflow guarantees that this function can be called,
handling any type conversion to the input type that the model expects. Use
mlflow.pyfunc.load_model() to load back a model for running inference.
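For illustration, here's a minimal sketch contrasting the two loading modes; the run ID and artifact path are placeholders, and the scikit-learn flavor is an assumption:
Python
import mlflow

model_uri = "runs:/<RUN_ID>/classifier"

# Flavor-specific load: returns the native object (for example, a
# DecisionTreeClassifier if the model was logged with the sklearn flavor)
native_model = mlflow.sklearn.load_model(model_uri)

# Generic load: returns a wrapper that always exposes a predict() function
pyfunc_model = mlflow.pyfunc.load_model(model_uri)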
Related content
Configure MLflow for Azure Machine Learning
How to log MLFlow models
Guidelines for deploying MLflow models
Configure MLflow for Azure Machine
Learning
Article • 03/10/2023
Azure Machine Learning workspaces are MLflow-compatible, which means they can act
as an MLflow server without any extra configuration. Each workspace has an MLflow
tracking URI that MLflow can use to connect to the workspace.
However, if you're working outside of Azure Machine Learning (for example, on your local
machine, Azure Synapse Analytics, or Azure Databricks), you need to configure MLflow to
point to the workspace. In this article, you learn how to configure MLflow to connect to
an Azure Machine Learning workspace for tracking, registries, and deployment.
Prerequisites
You need the following prerequisites to follow this tutorial:
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Azure CLI
Bash
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
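If you don't have the tracking URI at hand, one way to retrieve it programmatically is through the Azure Machine Learning SDK v2; the following is a sketch that assumes the azure-ai-ml package is installed and the placeholders are replaced with your own values:
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(credential=DefaultAzureCredential(),
                     subscription_id="<SUBSCRIPTION_ID>",
                     resource_group_name="<RESOURCE_GROUP>",
                     workspace_name="<WORKSPACE_NAME>")

# The workspace object exposes the MLflow tracking URI
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri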
Configure authentication
Once tracking is set, you also need to configure how to authenticate to the associated
workspace. By default, the Azure Machine Learning plugin for MLflow performs
interactive authentication by opening the default browser to prompt for credentials.
The Azure Machine Learning plugin for MLflow supports several authentication
mechanisms through the package azure-identity , which is installed as a dependency
of the plugin azureml-mlflow . Authentication methods are tried one by one until one
of them succeeds.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Tip
If you'd rather use a certificate instead of a secret, you can configure the environment
variables AZURE_CLIENT_CERTIFICATE_PATH to the path to a PEM or PKCS12 certificate file
(including private key) and AZURE_CLIENT_CERTIFICATE_PASSWORD with the password of the
certificate file, if any.
Microsoft.MachineLearningServices/workspaces/jobs/* .
Grant access to your workspace for the service principal or user account you created, as
explained at Grant access.
Troubleshooting authentication
MLflow will try to authenticate to Azure Machine Learning on the first operation
interacting with the service, like mlflow.set_experiment() or mlflow.start_run() . If you
find issues or unexpected authentication prompts during the process, you can increase
the logging level to get more details about the error:
Python
import logging
logging.getLogger("azure").setLevel(logging.DEBUG)
Tip
When submitting jobs using Azure Machine Learning CLI v2, you can set the
experiment name using the property experiment_name in the YAML definition of the
job. You don't have to configure it on your training script. See YAML: display name,
experiment name, description, and tags for details.
MLflow SDK
To configure the experiment you want to work on, use the MLflow command
mlflow.set_experiment() .
Python
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
MLflow SDK
Python
import os
os.environ["AZUREML_CURRENT_CLOUD"] = "AzureChinaCloud"
You can identify the cloud you are using with the following Azure CLI command:
Bash
az cloud list
Next steps
Now that your environment is connected to your workspace in Azure Machine Learning,
you can start to work with it.
Tracking refers to the process of saving all experiment-related information that you might
find relevant for every experiment you run. Such metadata varies based on your project,
but it can include:
" Code
" Environment details (OS version, Python packages)
" Input data
" Parameter configurations
" Models
" Evaluation metrics
" Evaluation visualizations (confusion matrix, importance plots)
" Evaluation results (including some evaluation predictions)
Some of these elements are automatically tracked by Azure Machine Learning when
working with jobs (including code, environment, and input and output data). However,
others, like models, parameters, and metrics, need to be instrumented by the model
builder, as they're specific to the particular scenario.
In this article, you'll learn how to use MLflow for tracking your experiments and runs in
Azure Machine Learning workspaces.
Why MLflow
Azure Machine Learning workspaces are MLflow-compatible, which means you can use
MLflow to track runs, metrics, parameters, and artifacts with your Azure Machine
Learning workspaces. By using MLflow for tracking, you don't need to change your
training routines to work with Azure Machine Learning or inject any cloud-specific
syntax, which is one of the main advantages of the approach.
See MLflow and Azure Machine Learning for all supported MLflow and Azure Machine
Learning functionality including MLflow Project support (preview) and model
deployment.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Working interactively
Python
experiment_name = 'hello-world-example'
mlflow.set_experiment(experiment_name)
Working interactively
When working interactively, MLflow starts tracking your training routine as soon as
you try to log information that requires an active run; for instance, when you log a
metric or a parameter, or when you start a training cycle while MLflow's
autologging functionality is enabled. However, it's usually helpful to start the run
explicitly, especially if you want to capture the total time of your experiment in the
field Duration. To start the run explicitly, use mlflow.start_run() .
Regardless of whether you started the run manually or not, you eventually need to stop
the run to inform MLflow that your experiment run has finished and mark its status as
Completed. To do that, call mlflow.end_run() . We strongly recommend starting runs
manually, so you don't forget to end them when working on notebooks.
Python
mlflow.start_run()
# Your code
mlflow.end_run()
To help you avoid forgetting to end the run, it's usually helpful to use the context
manager paradigm:
Python
with mlflow.start_run() as run:
    # Your code; the run ends automatically when the block exits
    print(run.info.run_id)
Autologging
You can log metrics, parameters, and files with MLflow manually. However, you can also
rely on MLflow's automatic logging capability. Each machine learning framework
supported by MLflow decides what to track automatically for you.
To enable automatic logging, insert the following code before your training code:
Python
mlflow.autolog()
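As a minimal sketch of autologging in action (scikit-learn and its bundled iris dataset are used here purely for illustration):
Python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.autolog()

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    # Parameters, metrics, and the fitted model are captured automatically
    LogisticRegression(max_iter=200).fit(X, y)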
View metrics and artifacts in your workspace
The metrics and artifacts from MLflow logging are tracked in your workspace. To view
them anytime, navigate to your workspace and find the experiment by name in your
workspace in Azure Machine Learning studio .
Select the logged metrics to render charts on the right side. You can customize the
charts by applying smoothing, changing the color, or plotting multiple metrics on a
single graph. You can also resize and rearrange the layout as you wish. Once you've
created your desired view, you can save it for future use and share it with your
teammates using a direct link.
You can also access or query metrics, parameters, and artifacts programmatically using
the MLflow SDK. Use mlflow.get_run() as explained below:
Python
import mlflow
run = mlflow.get_run("<RUN_ID>")
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags
Tip
For metrics, the previous example only returns the last value of a given metric. If
you want to retrieve all the values of a given metric, use the mlflow.get_metric_history
method, as explained at Getting params and metrics from a run.
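For instance, a sketch of retrieving a metric's full history (the metric name loss is illustrative):
Python
from mlflow.tracking import MlflowClient

client = MlflowClient()
# Returns every logged value of the metric, not just the last one
for measurement in client.get_metric_history("<RUN_ID>", "loss"):
    print(measurement.step, measurement.value)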
To download artifacts you've logged, like files and models, you can use
mlflow.artifacts.download_artifacts()
Python
mlflow.artifacts.download_artifacts(run_id="<RUN_ID>",
artifact_path="helloworld.txt")
For more details about how to retrieve or compare information from experiments and
runs in Azure Machine Learning using MLflow, see Query & compare experiments and
runs with MLflow.
Example notebooks
If you're looking for examples of how to use MLflow in Jupyter notebooks, see our
examples repository Using MLflow (Jupyter Notebooks) .
Limitations
Some methods available in the MLflow API may not be available when connected to
Azure Machine Learning. For details about supported and unsupported operations
please read Support matrix for querying runs and experiments.
Next steps
Deploy MLflow models.
Manage models with MLflow.
Track Azure Databricks ML experiments
with MLflow and Azure Machine
Learning
Article • 02/24/2023
MLflow is an open-source library for managing the life cycle of your machine learning
experiments. You can use MLflow to integrate Azure Databricks with Azure Machine
Learning to ensure you get the best from both products. In this article, you'll learn:
" The required libraries needed to use MLflow with Azure Databricks and Azure
Machine Learning.
" How to track Azure Databricks runs with MLflow in Azure Machine Learning.
" How to log models with MLflow to get them registered in Azure Machine Learning.
" How to deploy and consume models registered in Azure Machine Learning.
Prerequisites
Install the azureml-mlflow package, which handles the connectivity with Azure
Machine Learning, including authentication.
An Azure Databricks workspace and cluster.
Create an Azure Machine Learning Workspace.
See which access permissions you need to perform your MLflow operations with
your workspace.
Example notebooks
The notebook Training models in Azure Databricks and deploying them on Azure Machine
Learning demonstrates how to train models in Azure Databricks and deploy them in
Azure Machine Learning. It also covers how to handle cases where you want to
track the experiments and models with the MLflow instance in Azure Databricks while
leveraging Azure Machine Learning for deployment.
Install libraries
To install libraries on your cluster, navigate to the Libraries tab and select Install New.
In the Package field, type azureml-mlflow , and then select Install. Repeat this step as
necessary to install additional packages to your cluster for your experiment.
Track in both Azure Databricks workspace and Azure Machine Learning workspace
(dual-tracking)
Track exclusively on Azure Machine Learning
By default, dual-tracking is configured for you when you link your Azure Databricks
workspace.
Dual-tracking on Azure Databricks and Azure Machine
Learning
Linking your ADB workspace to your Azure Machine Learning workspace enables you to
track your experiment data in the Azure Machine Learning workspace and the Azure
Databricks workspace at the same time. This is referred to as dual-tracking.
To link your ADB workspace to a new or existing Azure Machine Learning workspace,
follow the linking steps in the Azure portal. Once the workspaces are linked, you can
use MLflow in Azure Databricks in the same way you're used to. The following example
sets the experiment name as usually done in Azure Databricks and starts logging some
parameters:
Python
import mlflow
experimentName = "/Users/{user_name}/{experiment_folder}/{experiment_name}"
mlflow.set_experiment(experimentName)
with mlflow.start_run():
mlflow.log_param('epochs', 20)
pass
2 Warning
For private link enabled Azure Machine Learning workspace, you have to deploy
Azure Databricks in your own network (VNet injection) to ensure proper
connectivity.
You have to configure the MLflow tracking URI to point exclusively to Azure Machine
Learning, as demonstrated in the following example:
Azure CLI
Bash
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Then use the set_tracking_uri() method to point the MLflow tracking URI to that
URI.
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
Configure authentication
Once tracking is configured, you also need to configure how to authenticate to the
associated workspace. By default, the Azure Machine Learning plugin for MLflow
performs interactive authentication by opening the default browser to prompt for
credentials. Refer to Configure MLflow for Azure Machine Learning: Configure
authentication for additional ways to configure authentication for MLflow in Azure
Machine Learning workspaces.
For interactive jobs where there's a user connected to the session, you can rely on
Interactive Authentication and hence no further action is required.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Python
mlflow.set_experiment(experiment_name="experiment-name")
Tracking parameters, metrics and artifacts
You can then use MLflow in Azure Databricks in the same way you're used to. For
details, see Log & view metrics and log files.
Logging models with MLflow
After your model is trained, you can log it to the tracking server with the
mlflow.<model_flavor>.log_model method. The flavor to use corresponds to the training
framework associated with the model. Learn what model flavors are supported . In the
following example, a model created with the Spark library MLlib is being registered:
Python
mlflow.spark.log_model(model, artifact_path = "model")
It's worth mentioning that the spark flavor doesn't correspond to the fact that the
model was trained on a Spark cluster, but to the training framework that was used
(you can perfectly train a model using TensorFlow on Spark, in which case the flavor to
use would be tensorflow ).
Models are logged inside of the run being tracked. This means that models are available
either in both Azure Databricks and Azure Machine Learning (default), or exclusively in
Azure Machine Learning if you configured the tracking URI to point to it.
) Important
Notice that here the parameter registered_model_name hasn't been specified.
Read the section Registering models in the registry with MLflow for more details
about the implications of that parameter and how the registry works.
To register a model at the same time it's logged, indicate the registered_model_name parameter:
Python
mlflow.spark.log_model(model,
                       artifact_path = "model",
                       registered_model_name = "model_name")
If a registered model with the name doesn’t exist, the method registers a new
model, creates version 1, and returns a ModelVersion MLflow object.
If a registered model with the name already exists, the method creates a new
model version and returns the version object.
However, if you want to continue using the dual-tracking capabilities but register
models in Azure Machine Learning, you can instruct MLflow to use Azure Machine
Learning for model registries by configuring the MLflow Model Registry URI. This URI
has the exact same format and value as the MLflow tracking URI.
Python
mlflow.set_registry_uri(azureml_mlflow_uri)
7 Note
The value of azureml_mlflow_uri was obtained in the same way as demonstrated
in Set MLflow Tracking to only track in your Azure Machine Learning workspace.
For a complete example of this scenario, see the example Training models in Azure
Databricks and deploying them on Azure Machine Learning .
Deploying and consuming models registered in
Azure Machine Learning
Models registered in Azure Machine Learning Service using MLflow can be consumed
as:
MLflow model objects or Pandas UDFs, which can be used in Azure Databricks
notebooks in streaming or batch pipelines.
) Important
If your model was trained and built with Spark libraries (like MLlib ), use
mlflow.pyfunc.spark_udf to load the model and use it as a Spark Pandas UDF, rather
than loading it on the cluster driver. Notice that, in this way, any parallelization or work
distribution you want to happen in the cluster needs to be orchestrated by you. Also,
notice that MLflow doesn't install any library your model requires to run. Those libraries
need to be installed in the cluster before running it.
The following example shows how to load a model named uci-heart-classifier from
the registry and use it as a Spark Pandas UDF to score new data.
Python
model_name = "uci-heart-classifier"
model_uri = "models:/"+model_name+"/latest"

# Create a Spark Pandas UDF from the registered model; `spark` is the active session
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)
Tip
Check Loading models from registry for more ways to reference models from the
registry.
Once the model is loaded, you can use it to score new data:
Python
#Make Prediction
preds = (scoreDf
           .withColumn('target_column_name', pyfunc_udf('Input_column1', 'Input_column2', 'Input_column3', …))
        )
display(preds)
Clean up resources
If you wish to keep your Azure Databricks workspace, but no longer need the Azure
Machine Learning workspace, you can delete the Azure Machine Learning workspace.
This action results in unlinking your Azure Databricks workspace and the Azure Machine
Learning workspace.
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to
delete them individually is unavailable at this time. Instead, delete the resource group
that contains the storage account and workspace, so you don't incur any charges.
Next steps
Deploy MLflow models as an Azure web service.
Manage your models.
Track experiment jobs with MLflow and Azure Machine Learning.
Learn more about Azure Databricks and MLflow.
Track Azure Synapse Analytics ML
experiments with MLflow and Azure
Machine Learning
Article • 02/24/2023
In this article, learn how to enable MLflow to connect to Azure Machine Learning while
working in an Azure Synapse Analytics workspace. You can leverage this configuration
for tracking, model management and model deployment.
MLflow is an open-source library for managing the life cycle of your machine learning
experiments. MLFlow Tracking is a component of MLflow that logs and tracks your
training run metrics and model artifacts. Learn more about MLflow.
If you have an MLflow Project to train with Azure Machine Learning, see Train ML
models with MLflow Projects and Azure Machine Learning (preview).
Prerequisites
An Azure Synapse Analytics workspace and cluster.
An Azure Machine Learning Workspace.
Install libraries
To install libraries on your dedicated cluster in Azure Synapse Analytics:
1. Create a requirements.txt file with the packages your experiment requires,
making sure it also includes the following packages:
requirements.txt
pip
mlflow
azureml-mlflow
azure-ai-ml
Azure CLI
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Then use the set_tracking_uri() method to point the MLflow tracking URI to that
URI.
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
Configure authentication
Once tracking is configured, you also need to configure how to authenticate to the
associated workspace. By default, the Azure Machine Learning plugin for MLflow
performs interactive authentication by opening the default browser to prompt for
credentials. Refer to Configure MLflow for Azure Machine Learning: Configure
authentication for additional ways to configure authentication for MLflow in Azure
Machine Learning workspaces.
For interactive jobs where there's a user connected to the session, you can rely on
Interactive Authentication and hence no further action is required.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Python
mlflow.set_experiment(experiment_name="experiment-name")
After your model is trained, you can log it to the tracking server with the
mlflow.spark.log_model method and register it at the same time:
Python
mlflow.spark.log_model(model,
artifact_path = "model",
registered_model_name = "model_name")
If a registered model with the name doesn’t exist, the method registers a new
model, creates version 1, and returns a ModelVersion MLflow object.
If a registered model with the name already exists, the method creates a new
model version and returns the version object.
You can manage models registered in Azure Machine Learning using MLflow. View
Manage models registries in Azure Machine Learning with MLflow for more details.
MLFlow model objects or Pandas UDFs, which can be used in Azure Synapse
Analytics notebooks in streaming or batch pipelines.
Deploy models to Azure Machine Learning endpoints
You can leverage the azureml-mlflow plugin to deploy a model to your Azure Machine
Learning workspace. See the How to deploy MLflow models page for complete details
about how to deploy models to the different targets.
Python
#Make Prediction
preds = (scoreDf
           .withColumn('target_column_name', pyfunc_udf('Input_column1', 'Input_column2', 'Input_column3', …))
        )
display(preds)
Clean up resources
If you wish to keep your Azure Synapse Analytics workspace, but no longer need the
Azure Machine Learning workspace, you can delete the Azure Machine Learning
workspace. If you don't plan to use the logged metrics and artifacts in your workspace,
the ability to delete them individually is unavailable at this time. Instead, delete the
resource group that contains the storage account and workspace, so you don't incur any
charges.
Next steps
Track experiment runs with MLflow and Azure Machine Learning.
Deploy MLflow models in Azure Machine Learning.
Manage your models with MLflow.
Train with MLflow Projects in Azure
Machine Learning (preview)
Article • 07/06/2023
In this article, learn how to submit training jobs with MLflow Projects that use Azure
Machine Learning workspaces for tracking. You can submit jobs and only track them
with Azure Machine Learning or migrate your runs to the cloud to run completely on
Azure Machine Learning Compute.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
MLflow Projects allow you to organize and describe your code so that other data
scientists (or automated tools) can run it. MLflow Projects with Azure Machine Learning
enable you to track and manage your training runs in your workspace.
2 Warning
Support for MLflow Projects in Azure Machine Learning will end on September 30,
2023. You'll be able to submit MLflow Projects ( MLproject files) to Azure Machine
Learning until that date.
We recommend that you transition to Azure Machine Learning Jobs, using either
the Azure CLI or the Azure Machine Learning SDK for Python (v2) before September
2026, when MLflow Projects will be fully retired in Azure Machine Learning. For
more information on Azure Machine Learning jobs, see Track ML experiments and
models with MLflow.
Learn more about the MLflow and Azure Machine Learning integration.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Using Azure Machine Learning as the backend for MLflow projects requires the
package azureml-core :
Bash
pip install azureml-core
conda.yaml
YAML
name: mlflow-example
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn
  - pip:
      - mlflow
      - azureml-mlflow
2. Submit the local run and ensure you set the parameter backend = "azureml" , which
adds support for automatic tracking, model capture, log files, snapshots, and
printed errors in your workspace. In this example, we assume the MLflow project
you're trying to run is in the same folder you currently are in ( uri="." ).
MLflow CLI
Bash
mlflow run . --backend azureml -P alpha=0.3
View your runs and metrics in the Azure Machine Learning studio .
1. Create the backend configuration object; in this case, we're going to indicate
COMPUTE . This parameter references the name of the remote compute cluster you
want to use for running your project. If COMPUTE is present, the project is
automatically submitted as an Azure Machine Learning job to the indicated
compute.
MLflow CLI
backend_config.json
JSON
{
"COMPUTE": "cpu-cluster"
}
conda.yaml
YAML
name: mlflow-example
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn
  - pip:
      - mlflow
      - azureml-mlflow
3. Submit the local run and ensure you set the parameter backend = "azureml" , which
adds support for automatic tracking, model capture, log files, snapshots, and
printed errors in your workspace. In this example, we assume the MLflow project
you're trying to run is in the same folder you currently are in ( uri="." ).
MLflow CLI
Bash
mlflow run . --backend azureml --backend-config backend_config.json -P alpha=0.3
7 Note
Since Azure Machine Learning jobs always run in the context of environments,
the parameter env_manager is ignored.
View your runs and metrics in the Azure Machine Learning studio .
Clean up resources
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to
delete them individually is currently unavailable. Instead, delete the resource group that
contains the storage account and workspace, so you don't incur any charges.
Example notebooks
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
Train an MLflow project on a local compute
Train an MLflow project on remote compute .
Next steps
Track Azure Databricks runs with MLflow.
Query & compare experiments and runs with MLflow.
Manage models registries in Azure Machine Learning with MLflow.
Guidelines for deploying MLflow models.
Log metrics, parameters and files with
MLflow
Article • 04/04/2023
Azure Machine Learning supports logging and tracking experiments using MLflow
Tracking . You can log models, metrics, parameters, and artifacts with MLflow, which
supports portability from local runs to the cloud.
) Important
Unlike the Azure Machine Learning SDK v1, there's no logging functionality in the
Azure Machine Learning SDK for Python (v2). See this guidance to learn how to log
with MLflow. If you were using Azure Machine Learning SDK v1 before, we
recommend you start leveraging MLflow for tracking experiments. See Migrate
logging from SDK v1 to MLflow for specific guidance.
Logs can help you diagnose errors and warnings, or track performance metrics like
parameters and model performance. In this article, you learn how to enable logging in
the following scenarios:
Tip
This article shows you how to monitor the model training process. If you're
interested in monitoring resource usage and events from Azure Machine Learning,
such as quotas, completed training jobs, or completed model deployments, see
Monitoring Azure Machine Learning.
Tip
For information on logging metrics in Azure Machine Learning designer, see How
to log metrics in the designer.
Prerequisites
You must have an Azure Machine Learning workspace. Create one if you don't have
any.
You must have the mlflow and azureml-mlflow packages installed. If you don't, use
the following command to install them in your development environment:
Bash
pip install mlflow azureml-mlflow
If you are doing remote tracking (tracking experiments running outside Azure
Machine Learning), configure MLflow to track experiments using Azure Machine
Learning. See Configure MLflow for Azure Machine Learning for more details.
Python
import mlflow
Configuring experiments
MLflow organizes the information in experiments and runs (in Azure Machine Learning,
runs are called Jobs). There are some differences in how to configure them depending
on how you are running your code:
Training interactively
For example, the following code snippet demonstrates configuring the experiment,
and then logging during a job:
Python
import mlflow
# Set the experiment
mlflow.set_experiment("mlflow-experiment")
Tip
Technically you don't have to call start_run() , as a new run is created if one
doesn't exist when you call a logging API. In that case, you can use
mlflow.active_run() to retrieve the run currently being used.
Python
import mlflow
mlflow.set_experiment("mlflow-experiment")
When you start a new run with mlflow.start_run , it can be useful to indicate the
parameter run_name , which then translates to the name of the run in the Azure
Machine Learning user interface and helps you identify the run quicker:
Python
with mlflow.start_run(run_name="hello-world-example"):
    mlflow.log_param("num_epochs", 20)
For more information on MLflow logging APIs, see the MLflow reference .
Logging parameters
MLflow supports logging the parameters used by your experiments. Parameters can be
of any type, and can be logged using the following syntax:
Python
mlflow.log_param("num_epochs", 20)
MLflow also offers a convenient way to log multiple parameters by indicating all of them
using a dictionary. Several frameworks can also pass parameters to models using
dictionaries, so this is a convenient way to log them in the experiment.
Python
params = {
"num_epochs": 20,
"dropout_rate": .6,
"objective": "binary_crossentropy"
}
mlflow.log_params(params)
Logging metrics
Metrics, as opposed to parameters, are always numeric, and can be logged either as
single values or as a series of values over steps.
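For instance, a single value and a per-step series (which renders as a curve in the studio) can be logged as follows; the metric names are illustrative:
Python
import mlflow

with mlflow.start_run():
    # Log a single numeric value
    mlflow.log_metric("accuracy", 0.91)

    # Log a value over steps to build a curve
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)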
) Important
Performance considerations: If you need to log multiple metrics (or multiple values
for the same metric), avoid making calls to mlflow.log_metric in loops. Better
performance can be achieved by logging a batch of metrics. Use the method
mlflow.log_metrics , which accepts a dictionary with all the metrics you want to log
at once, or use MLflowClient.log_batch , which accepts multiple types of elements for
logging. See Logging curves or list of values for an example.
Python
list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]

from mlflow.entities import Metric
from mlflow.tracking import MlflowClient
import time

client = MlflowClient()
client.log_batch(
    mlflow.active_run().info.run_id,
    metrics=[
        Metric(key="sample_list", value=val, timestamp=int(time.time() * 1000), step=0)
        for val in list_to_log
    ],
)
Logging images
MLflow supports two ways of logging images; both of them persist the given image as
an artifact inside of the run.
Log a matplotlib plot: mlflow.log_figure(fig, "figure.png") . Here, figure.png is the
name of the artifact generated inside of the run; it doesn't have to be an existing file.
Logging files
In general, files in MLflow are called artifacts. You can log artifacts in multiple ways in
MLflow:
Log an already existing file: mlflow.log_artifact("path/to/file.pkl") . Files are always
logged in the root of the run. If artifact_path is provided, the file is logged in the
folder indicated by that parameter.
Tip
When logging large files with log_artifact or log_model , you might encounter
time-out errors before the upload of the file is completed. Consider increasing the
time-out value by adjusting the environment variable
AZUREML_ARTIFACTS_DEFAULT_TIMEOUT . Its default value is 300 (seconds).
Logging models
MLflow introduces the concept of "models" as a way to package all the artifacts required
for a given model to function. Models in MLflow are always a folder with an arbitrary
number of files, depending on the framework used to generate the model. Logging
models has the advantage of tracking all the elements of the model as a single entity
that can be registered and then deployed. On top of that, MLflow models enjoy the
benefit of no-code deployment and can be used with the Responsible AI dashboard in
studio. Read the article From artifacts to models in MLflow for more information.
To save the model from a training run, use the log_model() API for the framework
you're working with, for example, mlflow.sklearn.log_model() . For more details about
how to log MLflow models, see Logging MLflow models. For migrating existing models
to MLflow, see Convert custom models to MLflow.
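As a brief sketch (the estimator and the artifact path classifier are illustrative, and X_train/y_train are assumed to exist):
Python
import mlflow
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    model = RandomForestClassifier().fit(X_train, y_train)
    # Logs the model folder (MLmodel file, environment, weights) as a single entity
    mlflow.sklearn.log_model(model, artifact_path="classifier")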
Tip
When logging large models, you might encounter the error Failed to flush the
queue within 300 seconds . Usually, it means the operation is timing out before the
upload of the model artifacts is completed. Consider increasing the time-out value
by adjusting the environment variable AZUREML_ARTIFACTS_DEFAULT_TIMEOUT .
Automatic logging
With Azure Machine Learning and MLflow, users can log metrics, model parameters and
model artifacts automatically when training a model. Each framework decides what to
track automatically for you. A variety of popular machine learning libraries are
supported. Learn more about Automatic logging with MLflow .
To enable automatic logging insert the following code before your training code:
Python
mlflow.autolog()
Tip
You can control what gets automatically logged with autolog. For instance, if you
indicate mlflow.autolog(log_models=False) , MLflow logs everything but models
for you. Such control is useful in cases where you want to log models manually but
still enjoy automatic logging of metrics and parameters. Also notice that some
frameworks might disable automatic logging of models if the trained model goes
beyond specific boundaries. Such behavior depends on the flavor used, and we
recommend that you view its documentation if this is your case.
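For instance, a sketch that combines automatic logging of metrics and parameters with a manual model log (the training routine and names are hypothetical):
Python
import mlflow

mlflow.autolog(log_models=False)

with mlflow.start_run():
    model = train_my_model()  # hypothetical training routine
    # Metrics and parameters were autologged above; the model is logged manually
    mlflow.sklearn.log_model(model, artifact_path="classifier")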
You can retrieve any run's information with mlflow.get_run() :
Python
import mlflow
run = mlflow.get_run(run_id="<RUN_ID>")
You can view the metrics, parameters, and tags for the run in the data field of the run
object.
Python
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags
7 Note
To get all metrics logged for a particular metric name, you can use
MlflowClient.get_metric_history() , as explained in the example Getting params
and metrics from a run.
Tip
MLflow can retrieve metrics and parameters from multiple runs at the same time,
allowing for quick comparisons across multiple trials. Learn about this in Query &
compare experiments and runs with MLflow.
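For instance, here's a sketch of a quick comparison across runs as a pandas DataFrame; the experiment name is illustrative, and the metric/parameter columns depend on what you logged:
Python
import mlflow

runs = mlflow.search_runs(experiment_names=["my-experiment"])
# Each metric and parameter becomes a column, such as metrics.accuracy or params.num_epochs
print(runs[["run_id", "metrics.accuracy", "params.num_epochs"]].head())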
Any artifact logged by a run can be queried with MLflow. Artifacts can't be accessed
using the run object itself; the MLflow client should be used instead:
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("<RUN_ID>")
The preceding method lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = client.download_artifacts("<RUN_ID>",
path="feature_importance_weight.png")
For more information please refer to Getting metrics, parameters, artifacts and models.
In Azure Machine Learning studio, navigate to the Jobs tab. To view all your jobs in your
workspace across experiments, select the All jobs tab. You can drill down on jobs for
specific experiments by applying the Experiment filter in the top menu bar. Select the
job of interest to enter the details view, and then select the Metrics tab.
Select the logged metrics to render charts on the right side. You can customize the
charts by applying smoothing, changing the color, or plotting multiple metrics on a
single graph. You can also resize and rearrange the layout as you wish. Once you have
created your desired view, you can save it for future use and share it with your
teammates using a direct link.
user_logs folder
This folder contains the user-generated logs. This folder is open by
default, and the std_log.txt log is selected. The std_log.txt is where your code's logs (for
example, print statements) show up. This file contains stdout log and stderr logs from
your control script and training script, one per process. In most cases, you'll monitor the
logs here.
system_logs folder
This folder contains the logs generated by Azure Machine Learning, and it's closed
by default. The logs generated by the system are grouped into different folders, based
on the stage of the job in the runtime.
Other folders
For jobs training on multi-compute clusters, logs are present for each node IP. The
structure for each node is the same as single node jobs. There's one more logs folder for
overall execution, stderr, and stdout logs.
Azure Machine Learning logs information from various sources during training, such as
AutoML or the Docker container that runs the training job. Many of these logs aren't
documented. If you encounter problems and contact Microsoft support, they may be
able to use these logs during troubleshooting.
Next steps
Train ML models with MLflow and Azure Machine Learning.
Migrate from SDK v1 logging to MLflow tracking.
Logging MLflow models
Article • 02/24/2023
The following article explains how to start logging your trained models (or artifacts) as
MLflow models. It explores the different methods to customize the way MLflow
packages your models and hence how it runs them.
A model in MLflow is also an artifact, but with a specific structure that serves as a
contract between the person that created the model and the person that intends to use
it. Such a contract helps build the bridge between the artifacts themselves and what they
mean.
There are different ways to start using the model's concept in Azure Machine Learning
with MLflow, as explained in the following sections:
Python
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

mlflow.autolog()

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
Tip
If you are using Machine Learning pipelines, like for instance Scikit-Learn
pipelines , use the autolog functionality of that flavor for logging models. Models
are automatically logged when the fit() method is called on the pipeline object.
The notebook Training and tracking an XGBoost classifier with MLflow
demonstrates how to log a model with preprocessing using pipelines.
" You want to indicate pip packages or a conda environment different from the ones
that are automatically detected.
" You want to include input examples.
" You want to include specific artifacts into the package that will be needed.
" Your signature is not correctly inferred by autolog . This is specifically important
when you deal with inputs that are tensors where the signature needs specific
shapes.
" Somehow the default behavior of autolog doesn't fill your purpose.
Python
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
from mlflow.utils.environment import _mlflow_conda_env

mlflow.autolog(log_models=False)

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Signature
signature = infer_signature(X_test, y_test)

# Conda environment
custom_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["xgboost==1.5.2"],
    additional_conda_channels=None,
)

# Sample
input_example = X_train.sample(n=1)

# Log the model manually with the custom signature, environment, and sample
mlflow.xgboost.log_model(model,
                         artifact_path="classifier",
                         conda_env=custom_env,
                         signature=signature,
                         input_example=input_example)
7 Note
The method _mlflow_conda_env is a private method in the MLflow
SDK, and it may change in the future. This example uses it just for the sake of
simplicity; use it with caution, or generate the YAML definition
manually as a Python dictionary.
The predict method determines how inference is executed and what gets returned by the
model. MLflow doesn't enforce any specific behavior in how predict generates results.
There are scenarios where you probably want to do some pre-processing or
post-processing before and after your model is executed.
A solution to this scenario is to implement machine learning pipelines that move from
inputs to outputs directly. Although this is possible (and sometimes encouraged for
performance considerations), it can be challenging to achieve. For those cases, you
probably want to customize how your model does inference by using a custom model
flavor, as explained in the following section.
For this type of model, MLflow introduces a flavor called pyfunc (standing for Python
function). Basically, this flavor allows you to log any object you want as a model, as long
as it satisfies two conditions:
It implements (at least) a predict method.
The Python object inherits from mlflow.pyfunc.PythonModel .
Tip
Serializable models that implement the Scikit-learn API can use the Scikit-learn
flavor to log the model, regardless of whether the model was built with Scikit-learn.
If your model can be persisted in Pickle format, and the object has the methods
predict() and predict_proba() (at least), then you can use
mlflow.sklearn.log_model() to log it inside an MLflow run.
The simplest way of creating your custom model's flavor is by creating a wrapper
around your existing model object. MLflow will serialize it and package it for you.
Python objects are serializable when the object can be stored in the file system as a
file (generally in Pickle format). During runtime, the object can be materialized from
such file and all the values, properties and methods available when it was saved will
be restored.
The following sample wraps a model created with XGBoost to make it behave
differently from the default implementation of the XGBoost flavor (it returns the
probabilities instead of the classes):
Python
from mlflow.pyfunc import PythonModel, PythonModelContext

class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model

    def predict(self, context: PythonModelContext, data):
        # Return probabilities instead of predicted classes
        return self._model.predict_proba(data)

    # You can even add extra functions if you need to. Since the model is
    # serialized, all of them will be available when you load your model back.
    def predict_batch(self, data):
        pass
Python
import mlflow
from xgboost import XGBClassifier
from mlflow.models import infer_signature

mlflow.xgboost.autolog(log_models=False)

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier().fit(X_train, y_train)

# Log the wrapped model with a signature inferred from the probabilities
y_probs = model.predict_proba(X_test)
signature = infer_signature(X_test, y_probs)
mlflow.pyfunc.log_model("classifier", python_model=ModelWrapper(model), signature=signature)
Tip
Note how the infer_signature method now uses y_probs to infer the
signature. Our target column has the target class, but our model now returns
the two probabilities for each class.
Next steps
Deploy MLflow models
Query & compare experiments and runs
with MLflow
Article • 06/26/2023
Experiments and jobs (or runs) in Azure Machine Learning can be queried using MLflow.
You don't need to install any specific SDK to manage what happens inside of a training
job, creating a more seamless transition between local runs and the cloud by removing
cloud-specific dependencies. In this article, you'll learn how to query and compare
experiments and runs in your workspace using Azure Machine Learning and MLflow SDK
in Python.
See Support matrix for querying runs and experiments in Azure Machine Learning for a
detailed comparison between MLflow Open-Source and MLflow when connected to
Azure Machine Learning.
7 Note
The Azure Machine Learning Python SDK v2 does not provide native logging or
tracking capabilities. This applies not just for logging but also for querying the
metrics logged. Instead, use MLflow to manage experiments and runs. This article
explains how to use MLflow to manage experiments and runs in Azure Machine
Learning.
REST API
Querying and searching experiments and runs is also available using the MLflow REST API.
See Using MLflow REST with Azure Machine Learning for an example of how to
consume it.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
To retrieve all the experiments in the workspace, use mlflow.search_experiments() :
Python
mlflow.search_experiments()
7 Note
By default, only active experiments are returned. To also include archived ones,
indicate view_type=ViewType.ALL :
Python
from mlflow.entities import ViewType

mlflow.search_experiments(view_type=ViewType.ALL)
To get a specific experiment by name:
Python
mlflow.get_experiment_by_name(experiment_name)
Or by experiment ID:
Python
mlflow.get_experiment('1234-5678-90AB-CDEFG')
Searching experiments
The search_experiments() method, available since MLflow 2.0, allows searching for
experiments that match criteria using filter_string .
Python
mlflow.search_experiments(
    filter_string="experiment_id IN ("
                  "'CDEFG-1234-5678-90AB', '1234-5678-90AB-CDEFG', '5678-1234-90AB-CDEFG')"
)
Python
import datetime
Python
mlflow.search_experiments(filter_string=f"tags.framework = 'torch'")
Searching runs
The mlflow.search_runs() method lets you indicate which experiments to search. You
can also indicate search_all_experiments=True if you want to search across all the
experiments in the workspace:
By experiment name:
Python
mlflow.search_runs(experiment_names=[ "my_experiment" ])
By experiment ID:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ])
Python
mlflow.search_runs(filter_string="params.num_boost_round='100'",
search_all_experiments=True)
) Important
All metrics and parameters are also returned when querying runs. However, for metrics
containing multiple values (for instance, a loss curve, or a PR curve), only the last value
of the metric is returned. If you want to retrieve all the values of a given metric, use the
mlflow.get_metric_history method. See Getting params and metrics from a run for an
example.
Ordering runs
By default, experiments are ordered descending by start_time , which is the time the
experiment was queued in Azure Machine Learning. However, you can change this default
by using the parameter order_by .
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.start_time DESC"])
Order runs and limit results. The following example returns the last single run in
the experiment:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   max_results=1, order_by=["attributes.start_time DESC"])
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.duration DESC"])
Tip
Expressions containing metrics.* can't be used in the parameter order_by . Instead,
retrieve the results as a pandas DataFrame and sort them with sort_values :
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG"
]).sort_values("metrics.accuracy", ascending=False)
Filtering runs
You can also look for a run with a specific combination in the hyperparameters using the
parameter filter_string . Use params to access run's parameters, metrics to access
metrics logged in the run, and attributes to access run information details. MLflow
supports expressions joined by the AND keyword (the syntax does not support OR):
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'")
Search runs by a metric value:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="metrics.auc>0.8")
Search runs by a tag:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="tags.framework='torch'")
Search runs created by a given user:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="attributes.user_id = 'John Smith'")
Search runs that have failed. See Filter runs by status for possible values:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
Python
import datetime
Tip
Notice that for the key attributes , values should always be strings, and hence
enclosed in quotes.
Python
duration = 360 * 1000 # duration is in milliseconds
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string=f"attributes.duration > '{duration}'")
To search for specific runs by their IDs:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="attributes.run_id IN ('1234-5678-90AB-CDEFG', '5678-1234-90AB-CDEFG')")
The following shows how some Azure Machine Learning job statuses map to MLflow statuses:
Not started → SCHEDULED: The job/run was just registered in Azure Machine Learning, but it hasn't been processed yet.
Preparing → SCHEDULED: The job/run hasn't started yet, but a compute has been allocated for the execution, and it's in building state.
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
To get runs as detailed Run objects instead of a pandas DataFrame, indicate output_format="list" :
Python
runs = mlflow.search_runs(
experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'",
output_format="list",
)
Details can then be accessed from the info member. The following sample shows how
to get the run_id :
Python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
Python
last_run.data.params
last_run.data.metrics
For metrics that contain multiple values (for instance, a loss curve, or a PR curve), only
the last logged value of the metric is returned. If you want to retrieve all the values of a
given metric, use the mlflow.get_metric_history method. This method requires you to use
the MlflowClient :
Python
client = mlflow.tracking.MlflowClient()
client.get_metric_history("1234-5678-90AB-CDEFG", "log_loss")
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
The preceding method lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG",
artifact_path="feature_importance_weight.png"
)
Models logged in the run can also be downloaded, by indicating the path where the
model was stored:
Python
artifact_path="classifier"
model_local_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG", artifact_path=artifact_path
)
You can then load the model back from the downloaded artifacts using the typical
function load_model in the flavor-specific namespace. The following example uses
xgboost :
Python
model = mlflow.xgboost.load_model(model_local_path)
MLflow also allows you to perform both operations at once, downloading and loading the
model in a single instruction. MLflow downloads the model to a temporary folder and
loads it from there. The method load_model uses a URI format to indicate from where the
model has to be retrieved. In the case of loading a model from a run, the URI structure is
as follows:
Python
model = mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
Tip
For query and loading models registered in the Model Registry, view Manage
models registries in Azure Machine Learning with MLflow.
To get the child (nested) runs of a given parent run, filter by the tag mlflow.parentRunId :
Python
hyperopt_run = mlflow.last_active_run()
child_runs = mlflow.search_runs(
filter_string=f"tags.mlflow.parentRunId='{hyperopt_run.info.run_id}'"
)
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
Next steps
Manage your models with MLflow.
Deploy models with MLflow.
Manage models registries in Azure
Machine Learning with MLflow
Article • 03/21/2023
Azure Machine Learning supports MLflow for model management. This approach
represents a convenient way to support the entire model lifecycle for users familiar with
the MLflow client. The following article describes the different capabilities and how they
compare with other options.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Some operations can be executed directly using the MLflow fluent API ( mlflow.
<method> ). However, others require an MLflow client, which enables communication
with Azure Machine Learning over the MLflow protocol. You can create
an MlflowClient object as follows. This tutorial uses the object client to refer to
such an MLflow client.
Python
import mlflow
client = mlflow.tracking.MlflowClient()
To register a model from an existing run, use a URI of the form runs:/<run-id>/<artifact-path> :
Python
mlflow.register_model(f"runs:/{run_id}/{artifact_path}", model_name)
7 Note
Models can only be registered to the registry in the same workspace where the run
was tracked. Cross-workspace operations aren't supported at the moment in
Azure Machine Learning.
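For instance, a sketch that registers the model logged by the most recent run (the artifact path classifier and the model name are illustrative):
Python
import mlflow

run_id = mlflow.last_active_run().info.run_id
mlflow.register_model(f"runs:/{run_id}/classifier", "my-classifier")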
If you have a model trained outside of a run, you can save it locally before registering it:
Python
import mlflow
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
mlflow.sklearn.save_model(reg, "./regressor")
You can now register the model from the local path:
Python
import os
model_local_path = os.path.abspath("./regressor")
mlflow.register_model(f"file://{model_local_path}", "local-model-test")
You can query all the registered models in the registry:
Python
for model in client.search_registered_models():
print(f"{model.name}")
You can also order the results using order_by :
Python
client.search_registered_models(order_by=["name ASC"])
To get a specific model by name:
Python
client.get_registered_model(model_name)
If you need a specific version of the model, you can indicate so:
Python
client.get_model_version(model_name, version=2)
Model stages
MLflow supports model stages to manage a model's lifecycle. A model version can
transition from one stage to another. Stages are assigned to a model's versions (instead
of models), which means that a given model can have multiple versions in different
stages.
) Important
Stages can only be accessed using the MLflow SDK. They don't show up in the
Azure ML studio portal, and can't be retrieved using the Azure ML SDK,
Azure ML CLI, or Azure ML REST API. Creating a deployment from a given model's
stage isn't supported at the moment.
You can check the stages available for a given model version:
Python
client.get_model_version_stages(model_name, version="latest")
You can see what model's version is on each stage by getting the model from the
registry. The following example gets the model's version currently in the stage Staging .
Python
client.get_latest_versions(model_name, stages=["Staging"])
7 Note
Multiple versions can be in the same stage at the same time in MLflow; however,
this method returns the latest version (the greatest version number) among all of them.
Transitioning models
Transitioning a model's version to a particular stage can be done using the MLflow
client.
Python
client.transition_model_version_stage(model_name, version=3,
stage="Staging")
By default, if there's an existing model version in that particular stage, it remains
there. Hence, it isn't replaced, as multiple model versions can be in the same stage at
the same time. Alternatively, you can indicate archive_existing_versions=True to tell
MLflow to move the existing model version to the stage Archived .
Python
client.transition_model_version_stage(
model_name, version=3, stage="Staging", archive_existing_versions=True
)
You can load a model in a particular stage by indicating the stage in the model URI:
Python
model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")
Editing and deleting models
Editing registered models is supported in both MLflow and Azure ML. However, there are
some important differences to notice:
2 Warning
Renaming models is not supported in Azure Machine Learning, as model objects are
immutable.
Editing models
You can edit a model's description and tags using MLflow. For example, updating the description and setting a tag (the tag key and value here are illustrative):
Python
client.update_model_version(model_name, version=1, description="A new description")
client.set_model_version_tag(model_name, version="1", key="type", value="classification")
Removing a tag:
Python
client.delete_model_version_tag(model_name, version="1", key="type")
To delete a specific model version:
Python
client.delete_model_version(model_name, version="2")
7 Note
Azure Machine Learning doesn't support deleting the entire model container. To
achieve the same result, delete all the versions of a given model.
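As a minimal sketch of that workaround (assuming the client and model_name objects
used earlier in this article), you can iterate over the registered versions and delete
each one:
Python
# Azure ML has no single call to drop a whole model container, so
# remove every version of the model one by one.
for mv in client.search_model_versions(f"name='{model_name}'"):
    client.delete_model_version(model_name, version=mv.version)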
[Partial reconstruction of the support matrix table:]
Feature | MLflow | Azure ML
Renaming registered models | ✓ | Not supported (3)
Deleting a registered model (container) | ✓ | Not supported (3)
7 Note
1. Use URIs with format runs:/<run-id>/<path> .
2. Use URIs with format azureml://jobs/<job-id>/outputs/artifacts/<path> .
3. Registered models are immutable objects in Azure ML.
4. Use the search box in Azure ML studio. Partial match supported.
5. Use registries.
Next steps
Logging MLflow models
Query & compare experiments and runs with MLflow
Guidelines for deploying MLflow models
Query & compare experiments and runs
with MLflow
Article • 06/26/2023
Experiments and jobs (or runs) in Azure Machine Learning can be queried using MLflow.
You don't need to install any specific SDK to manage what happens inside of a training
job, creating a more seamless transition between local runs and the cloud by removing
cloud-specific dependencies. In this article, you'll learn how to query and compare
experiments and runs in your workspace using Azure Machine Learning and MLflow SDK
in Python.
See Support matrix for querying runs and experiments in Azure Machine Learning for a
detailed comparison between MLflow Open-Source and MLflow when connected to
Azure Machine Learning.
7 Note
The Azure Machine Learning Python SDK v2 does not provide native logging or
tracking capabilities. This applies not just for logging but also for querying the
metrics logged. Instead, use MLflow to manage experiments and runs. This article
explains how to use MLflow to manage experiments and runs in Azure Machine
Learning.
REST API
Querying and searching experiments and runs is also available through the MLflow REST API.
See Using MLflow REST with Azure Machine Learning for an example of how to
consume it.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow, azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
Tip
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
You can get all the active experiments in the workspace:
Python
mlflow.search_experiments()
7 Note
By default, only active experiments are returned. To get all of them, including
archived or deleted ones, indicate the view type:
Python
from mlflow.entities import ViewType

mlflow.search_experiments(view_type=ViewType.ALL)
To get a specific experiment by name:
Python
mlflow.get_experiment_by_name(experiment_name)
Or by experiment ID:
Python
mlflow.get_experiment('1234-5678-90AB-CDEFG')
Searching experiments
The search_experiments() method, available since MLflow 2.0, lets you search for
experiments that match criteria using filter_string .
Python
mlflow.search_experiments(filter_string="experiment_id IN ("
"'CDEFG-1234-5678-90AB', '1234-5678-90AB-CDEFG', '5678-1234-90AB-
CDEFG')"
)
Python
import datetime

# Experiments created after a given date; MLflow expects epoch time
# in milliseconds for time comparisons.
test_start = int(datetime.datetime(2022, 1, 1).timestamp() * 1000)
mlflow.search_experiments(filter_string=f"creation_time > {test_start}")
You can also filter experiments by tags:
Python
mlflow.search_experiments(filter_string=f"tags.framework = 'torch'")
When searching runs, you need to indicate which experiments to search. You can also
indicate search_all_experiments=True if you want to search across all the experiments
in the workspace:
By experiment name:
Python
mlflow.search_runs(experiment_names=[ "my_experiment" ])
By experiment ID:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ])
Python
mlflow.search_runs(filter_string="params.num_boost_round='100'",
search_all_experiments=True)
) Important
All metrics and parameters are also returned when querying runs. However, for metrics
containing multiple values (for instance, a loss curve or a PR curve), only the last value
of the metric is returned. If you want to retrieve all the values of a given metric, use the
mlflow.get_metric_history method. See Getting params and metrics from a run for an
example.
Ordering runs
By default, runs are ordered descending by start_time , which is the time the
run was queued in Azure Machine Learning. However, you can change this default
by using the parameter order_by .
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.start_time DESC"])
Order runs and limit results. The following example returns the last single run in
the experiment:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
max_results=1, order_by=["attributes.start_time
DESC"])
Order runs by duration:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.duration DESC"])
Tip
You can also sort the returned DataFrame by any column using Pandas. For example, to sort by a metric:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG"
]).sort_values("metrics.accuracy", ascending=False)
Filtering runs
You can also look for runs with a specific combination of hyperparameters using the
parameter filter_string . Use params to access a run's parameters, metrics to access
metrics logged in the run, and attributes to access run information details. MLflow
supports expressions joined by the AND keyword (the syntax doesn't support OR):
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'")
Search runs by a metric value:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="metrics.auc>0.8")
Search runs by tags:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="tags.framework='torch'")
Search runs created by a specific user:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.user_id = 'John Smith'")
Search runs that have failed. See Filter runs by status for possible values:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
Python
import datetime

# Runs submitted after a given date; note that attribute values must
# be passed as strings.
query_date = datetime.datetime(2023, 1, 1)
query_date_epoch = int(query_date.timestamp() * 1000)
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string=f"attributes.start_time > '{query_date_epoch}'")
Tip
Notice that for the key attributes , values should always be strings and hence
enclosed in quotes.
Search runs that took longer than a given duration:
Python
duration = 360 * 1000 # duration is in milliseconds
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string=f"attributes.duration > '{duration}'")
Tip
You can query a specific set of runs by filtering on their run IDs:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.run_id IN ('1234-5678-
90AB-CDEFG', '5678-1234-90AB-CDEFG')")
Azure Machine Learning status | MLflow status | Meaning
Not started | SCHEDULED | The job/run was just registered in Azure Machine Learning, but it hasn't been processed yet.
Preparing | SCHEDULED | The job/run hasn't started yet, but compute has been allocated for the execution, and it's in building state.
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
By default, search_runs returns a Pandas DataFrame . You can get a list of Run objects instead by indicating output_format :
Python
runs = mlflow.search_runs(
experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'",
output_format="list",
)
Details can then be accessed from the info member. The following sample shows how
to get the run_id :
Python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
Params and metrics can be accessed from the data member:
Python
last_run.data.params
last_run.data.metrics
For metrics that contain multiple values (for instance, a loss curve or a PR curve), only
the last logged value of the metric is returned. If you want to retrieve all the values of a
given metric, use the mlflow.get_metric_history method. This method requires you to
use the MlflowClient :
Python
client = mlflow.tracking.MlflowClient()
client.get_metric_history("1234-5678-90AB-CDEFG", "log_loss")
You can list the artifacts logged in a run using the client:
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
The method above lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG",
artifact_path="feature_importance_weight.png"
)
Models logged in a run can also be downloaded, by indicating the folder where the model was logged as the artifact path:
Python
artifact_path="classifier"
model_local_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG", artifact_path=artifact_path
)
You can then load the model back from the downloaded artifacts using the typical
function load_model in the flavor-specific namespace. The following example uses
xgboost :
Python
model = mlflow.xgboost.load_model(model_local_path)
MLflow also allows you to do both operations at once, downloading and loading the model in
a single instruction. MLflow downloads the model to a temporary folder and loads it
from there. The method load_model uses a URI format to indicate where the model
should be retrieved from. In the case of loading a model from a run, the URI structure is
as follows:
Python
model =
mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
Tip
To query and load models registered in the model registry, see Manage
model registries in Azure Machine Learning with MLflow.
You can also query the child (nested) runs of a given run, for example the trials of a hyperparameter sweep, by filtering on the tag mlflow.parentRunId :
Python
hyperopt_run = mlflow.last_active_run()
child_runs = mlflow.search_runs(
filter_string=f"tags.mlflow.parentRunId='{hyperopt_run.info.run_id}'"
)
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
[Support matrix excerpt; only the following row and notes survive:]
Renaming experiments: ✓
7 Note
1. Check the section Ordering runs for instructions and examples on how to achieve the same functionality in Azure Machine Learning.
2. The != operator isn't supported for tags.
Next steps
Manage your models with MLflow.
Deploy models with MLflow.
Guidelines for deploying MLflow
models
Article • 10/18/2023
In this article, learn how to deploy your MLflow model to Azure Machine Learning for
both real-time and batch inference. You'll also learn about the different tools you can
use to manage the deployment.
When you deploy MLflow models without a scoring script, Azure Machine Learning:
Ensures all the package dependencies indicated in the MLflow model are satisfied.
Provides an MLflow base image/curated environment that contains the following
items:
Packages required for Azure Machine Learning to perform inference, including
mlflow-skinny .
A scoring script to perform inference.
Tip
Workspaces without public network access: Before you can deploy MLflow models
to online endpoints without egress connectivity, you have to package the models
(preview). By using model packaging, you can avoid the need for an internet
connection, which Azure Machine Learning would otherwise require to dynamically
install necessary Python packages for the MLflow models.
conda.yaml
YAML
channels:
- conda-forge
dependencies:
- python=3.7.11
- pip
- pip:
- mlflow
- scikit-learn==0.24.1
- cloudpickle==2.0.0
- psutil==5.8.0
name: mlflow-env
2 Warning
MLflow performs automatic package detection when logging models and pins
their versions in the conda dependencies of the model. However, this detection is
best-effort, and there might be cases where it doesn't reflect your intentions or
requirements. In those cases, consider logging models with a custom conda
dependencies definition.
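A minimal sketch of logging with an explicit conda definition (the environment contents
and the model object here are illustrative assumptions, not requirements of any
particular model):
Python
import mlflow

# An explicit conda definition overrides MLflow's automatic detection.
custom_env = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.9",
        "pip",
        {"pip": ["mlflow", "scikit-learn==1.2.2"]},
    ],
    "name": "custom-env",
}

mlflow.sklearn.log_model(model, artifact_path="model", conda_env=custom_env)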
MLmodel
YAML
artifact_path: model
flavors:
python_function:
env: conda.yaml
loader_module: mlflow.sklearn
model_path: model.pkl
python_version: 3.7.11
sklearn:
pickled_model: model.pkl
serialization_format: cloudpickle
sklearn_version: 0.24.1
run_id: f1e06708-641d-4a49-8f36-e9dcd8d34346
signature:
inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type":
"double"},
{"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"},
{"name":
"s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name":
"s3", "type":
"double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type":
"double"},
{"name": "s6", "type": "double"}]'
outputs: '[{"type": "double"}]'
utc_time_created: '2022-03-17 01:56:03.706848'
You can inspect your model's signature by opening the MLmodel file associated with
your MLflow model. For more information on how signatures work in MLflow, see
Signatures in MLflow.
Tip
Signatures in MLflow models are optional, but they're highly encouraged, as they
provide a convenient way to detect data compatibility issues early. For more
information about how to log models with signatures, read Logging models with a
custom signature, environment or samples.
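A minimal sketch of logging a model with an inferred signature, assuming a trained
scikit-learn model model and sample training data X_train :
Python
import mlflow
from mlflow.models import infer_signature

# Infer the input/output schema from sample data and the model's predictions.
signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)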
Azure Machine Learning supports deploying models to both online and batch endpoints.
These endpoints run different inferencing technologies, which might have different
features. Read this section to understand their differences.
The rest of this section mostly applies to online endpoints, but you can learn more about
batch endpoints and MLflow models at Use MLflow models in batch deployments.
Input formats
7 Note
1. We suggest you explore batch inference for processing files. See Deploy
MLflow models to Batch Endpoints.
Input structure
Regardless of the input type used, Azure Machine Learning requires inputs to be
provided in a JSON payload, within a dictionary key input_data . The following section
shows different payload examples and the differences between MLflow built-in server
and Azure Machine Learning inferencing server.
2 Warning
This key isn't required when serving models using the command
mlflow models serve , and hence payloads can't be used interchangeably.
) Important
MLflow 2.0 advisory: Notice that the payload's structure has changed in MLflow
2.0.
JSON
{
"input_data": {
"columns": [
"age", "sex", "trestbps", "chol", "fbs", "restecg",
"thalach", "exang", "oldpeak", "slope", "ca", "thal"
],
"index": [1],
"data": [
[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
]
}
}
JSON
{
    "input_data": [
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}
JSON
{
"input_data": {
"tokens": [
[0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
],
"mask": [
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
]
}
}
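As a hedged illustration of sending one of these payloads to an online endpoint (the
URL and key below are placeholders you'd take from your own endpoint):
Python
import requests

payload = {
    "input_data": {
        "columns": ["age", "sex", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal"],
        "index": [1],
        "data": [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    }
}

response = requests.post(
    "https://<endpoint-name>.<region>.inference.ml.azure.com/score",  # placeholder
    headers={"Authorization": "Bearer <endpoint-key>"},  # placeholder
    json=payload,
)
print(response.json())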
For more information about MLflow built-in deployment tools, see MLflow
documentation section .
If you need to change how inference of an MLflow model is executed, you can either
change how your model is being logged in the training routine, or customize inference
with a scoring script at deployment time.
The predict() function of the model determines how inference is executed and what
gets returned by the model. MLflow doesn't enforce any specific behavior in how the
predict() function generates results. However, there are scenarios where you probably
want to do some preprocessing or postprocessing before and after your model executes.
In other scenarios, you might want to change what's returned, like probabilities versus
classes.
A solution to this scenario is to implement machine learning pipelines that move from
inputs to outputs directly. For instance, sklearn.pipeline.Pipeline or pyspark.ml.Pipeline
are popular (and sometimes encouraged for performance reasons) ways to do so.
Another alternative is to customize how your model does inference using a custom
model flavor.
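For instance, a minimal scikit-learn sketch where preprocessing and the estimator are
packaged together, so predict() goes straight from raw inputs to outputs (the steps
here are illustrative):
Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Log this pipeline as the MLflow model; inference then applies both steps.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)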
) Important
When you opt to specify a scoring script for an MLflow model deployment, you
also need to provide an environment for it.
Deployment tools
Azure Machine Learning offers many ways to deploy MLflow models to online and batch
endpoints. You can deploy models using the following tools:
" MLflow SDK
" Azure Machine Learning CLI and Azure Machine Learning SDK for Python
" Azure Machine Learning studio
Each workflow has different capabilities, particularly around which type of compute it
can target. The following table shows them.
Scenario | MLflow SDK | Azure Machine Learning CLI/SDK | Azure Machine Learning studio
7 Note
1. Deployment to online endpoints that are in workspaces with private link
enabled requires you to package models before deployment (preview).
2. We recommend switching to managed online endpoints instead.
3. MLflow (OSS) doesn't have the concept of a scoring script and doesn't
currently support batch execution.
However, if you're more familiar with the Azure Machine Learning CLI v2, you want to
automate deployments using automation pipelines, or you want to keep deployment
configuration in a git repository, we recommend that you use the Azure Machine
Learning CLI v2.
If you want to quickly deploy and test models trained with MLflow, you can use the
Azure Machine Learning studio UI deployment.
Next steps
To learn more, review these articles:
In this article, learn how to deploy your MLflow model to an online endpoint for real-
time inference. When you deploy your MLflow model to an online endpoint, you don't
need to indicate a scoring script or an environment. This characteristic is referred to as
no-code deployment.
Tip
Workspaces without public network access: Before you can deploy MLflow models
to online endpoints without egress connectivity, you have to package the models
(preview). By using model packaging, you can avoid the need for an internet
connection, which Azure Machine Learning would otherwise require to dynamically
install necessary Python packages for the MLflow models.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste YAML
and other files, clone the repo, and then change directories to cli/endpoints/online
if you're using the Azure CLI, or sdk/endpoints/online if you're using the SDK for
Python.
Azure CLI
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
Azure CLI
Install the Azure CLI and the ml extension to the Azure CLI. For more
information, see Install, set up, and use the CLI (v2).
Azure CLI
MODEL_NAME='sklearn-diabetes'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"sklearn-diabetes/model"
Alternatively, if your model was logged inside of a run, you can register it directly.
Tip
To register the model, you'll need to know the location where the model has
been stored. If you're using the autolog feature of MLflow, the path depends on
the type and framework of the model being used. We recommend checking the
jobs output to identify the name of this folder. Look for the folder
that contains a file named MLmodel . If you're logging your models manually using
log_model , then the path is the argument you pass to that method. As an example,
if you log the model using mlflow.sklearn.log_model(my_model, "classifier") , then
the path where the model is stored is classifier .
Azure CLI
Use the Azure Machine Learning CLI v2 to create a model from a training job
output. In the following example, a model named $MODEL_NAME is registered using
the artifacts of a job with ID $RUN_ID . The path where the model is stored is
$MODEL_PATH .
Bash
az ml model create --name $MODEL_NAME --type "mlflow_model" --path azureml://jobs/$RUN_ID/outputs/artifacts/$MODEL_PATH
7 Note
The path $MODEL_PATH is the location where the model has been stored in the
run.
Azure CLI
endpoint.yaml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.s
chema.json
name: my-endpoint
auth_mode: key
2. Let's create the endpoint:
Azure CLI
sklearn-deployment.yaml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: sklearn-deployment
endpoint_name: my-endpoint
model:
name: mir-sample-sklearn-ncd-model
version: 1
path: sklearn-diabetes/model
type: mlflow_model
instance_type: Standard_DS3_v2
instance_count: 1
Azure CLI
az ml online-deployment create --name sklearn-deployment --endpoint
$ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --
all-traffic
Azure CLI
5. Assign all the traffic to the deployment: So far, the endpoint has one deployment,
but none of its traffic is assigned to it. Let's assign it.
Azure CLI
This step is not required in the Azure CLI, since we used the --all-traffic
flag during creation. If you need to change traffic, you can use the command az ml
online-endpoint update --traffic , as explained at Progressively update traffic.
sample-request-sklearn.json
JSON
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
7 Note
Notice how the key input_data is used in this example instead of inputs , as
used in MLflow serving. This is because Azure Machine Learning requires a different
input format to be able to automatically generate the swagger contracts for the
endpoints. See Differences between models deployed in Azure Machine Learning
and MLflow built-in server for details about the expected input format.
Azure CLI
JSON
[
11633.100167144921,
8522.117402884991
]
) Important
If you choose to indicate a scoring script for an MLflow model deployment, you'll
also have to specify the environment where the deployment will run.
Steps
Use the following steps to deploy an MLflow model with a custom scoring script.
c. Select the model you are trying to deploy and click on the tab Artifacts.
d. Take note of the folder that is displayed. This folder was indicated when the
model was registered.
2. Create a scoring script. Notice how the folder name model that you identified before
is included in the init() function.
score.py
Python
import logging
import os
import json
import mlflow
from io import StringIO
from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, predictions_to_json


def init():
    global model
    global input_schema
    # "model" is the path of the mlflow artifacts when the model was registered.
    # For automl models, this is generally "mlflow-model".
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model")
    model = mlflow.pyfunc.load_model(model_path)
    input_schema = model.metadata.get_input_schema()


def run(raw_data):
    json_data = json.loads(raw_data)
    if "input_data" not in json_data.keys():
        raise Exception("Request must contain a top level key named 'input_data'")

    serving_input = json.dumps(json_data["input_data"])
    data = infer_and_parse_json_input(serving_input, input_schema)
    predictions = model.predict(data)

    result = StringIO()
    predictions_to_json(predictions, result)
    return result.getvalue()
2 Warning
MLflow 2.0 advisory: The provided scoring script will work with both MLflow
1.X and MLflow 2.X. However, be advised that the expected input/output
formats on those versions may vary. Check the environment definition used to
ensure you are using the expected MLflow version. Notice that MLflow 2.0 is
only supported in Python 3.8+.
3. Let's create an environment where the scoring script can be executed. Since our
model is MLflow, the conda requirements are also specified in the model package
(for more details about MLflow models and the files included in them, see The
MLmodel format). We'll then build the environment using the conda
dependencies from the file. However, we also need to include the package
azureml-inference-server-http , which is required for online deployments in Azure
Machine Learning.
conda.yml
YAML
channels:
- conda-forge
dependencies:
- python=3.9
- pip
- pip:
- mlflow
- scikit-learn==1.2.2
- cloudpickle==2.2.1
- psutil==5.9.4
- pandas==2.0.0
- azureml-inference-server-http
name: mlflow-env
4. Create the deployment, indicating the scoring script and the environment:
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: sklearn-diabetes-custom
endpoint_name: my-endpoint
model: azureml:sklearn-diabetes@latest
environment:
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
conda_file: sklearn-diabetes/environment/conda.yml
code_configuration:
code: sklearn-diabetes/src
scoring_script: score.py
instance_type: Standard_F2s_v2
instance_count: 1
Azure CLI
az ml online-deployment create -f deployment.yml
5. Once your deployment completes, it's ready to serve requests. One
of the easier ways to test the deployment is by using a sample request file along
with the invoke method.
sample-request-sklearn.json
JSON
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
Azure CLI
JSON
{
"predictions": [
11633.100167144921,
8522.117402884991
]
}
2 Warning
MLflow 2.0 advisory: In MLflow 1.X, the key predictions will be missing.
Clean up resources
Once you're done with the endpoint, you can delete the associated resources:
Azure CLI
Next steps
To learn more, review these articles:
In this article, you'll learn how you can progressively update and deploy MLflow models
to Online Endpoints without causing service disruption. You'll use blue-green
deployment, also known as a safe rollout strategy, to introduce a new version of a web
service to production. This strategy will allow you to roll out your new version of the
web service to a small subset of users or requests before rolling it out completely.
The model we will deploy is based on the UCI Heart Disease Data Set . The database
contains 76 attributes, but we're using a subset of 14 of them. The model tries to
predict the presence of heart disease in a patient. It's integer valued from 0 (no
presence) to 1 (presence). It has been trained using an XGBoost classifier, and all the
required preprocessing has been packaged as a scikit-learn pipeline, making this
model an end-to-end pipeline that goes from raw data to predictions.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste files,
clone the repo, and then change directories to sdk/using-mlflow/deploy .
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning .
Azure role-based access controls (Azure RBAC) are used to grant access to
operations in Azure Machine Learning. To perform the steps in this article, your
user account must be assigned the owner or contributor role for the Azure
Machine Learning workspace, or a custom role allowing
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more
information, see Manage access to an Azure Machine Learning workspace.
Azure CLI
Install the Azure CLI and the ml extension to the Azure CLI. For more
information, see Install, set up, and use the CLI (v2).
Azure CLI
MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
We're going to exploit this functionality by deploying multiple versions of the same
model under the same endpoint. However, the new deployment will receive 0% of the
traffic at the beginning. Once we're sure the new model works correctly, we'll
progressively move traffic from one deployment to the other.
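As a hedged sketch of such a traffic shift using the Azure Machine Learning SDK v2
(assuming an MLClient named ml_client and the endpoint and deployment names
used in this article):
Python
# Send 10% of requests to the new deployment; keep 90% on the old one.
endpoint = ml_client.online_endpoints.get("heart-classifier-edp")
endpoint.traffic = {"default": 90, "xgboost-model": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()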
1. Endpoints require a name, which needs to be unique in the same region. Let's
create one that doesn't already exist:
Azure CLI
endpoint.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.s
chema.json
name: heart-classifier-edp
auth_mode: key
Azure CLI
blue-deployment.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1
Azure CLI
Tip
We set the flag --all-traffic in the create command, which will assign
all the traffic to the new deployment.
So far, the endpoint has one deployment, but none of its traffic is assigned to it.
Let's assign it.
Azure CLI
This step is not required in the Azure CLI, since we used the --all-traffic
flag during creation.
Azure CLI
sample.yml
YAML
{
"input_data": {
"columns": [
"age",
"sex",
"cp",
"trestbps",
"chol",
"fbs",
"restecg",
"thalach",
"exang",
"oldpeak",
"slope",
"ca",
"thal"
],
"data": [
[ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal"
]
]
}
}
Azure CLI
MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
Azure CLI
green-deployment.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1
Azure CLI
GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"
Azure CLI
az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --
endpoint-name $ENDPOINT_NAME -f green-deployment.yml
Azure CLI
Tip
Notice how we now indicate the name of the deployment we want to
invoke.
Azure CLI
3. If you decide to switch the entire traffic to the new deployment, update all the
traffic:
Azure CLI
5. Since the old deployment doesn't receive any traffic, you can safely delete it:
Azure CLI
Tip
Notice that at this point, the former "blue deployment" has been deleted, and
the new "green deployment" has taken its place.
Clean up resources
Azure CLI
) Important
Notice that deleting an endpoint also deletes all the deployments under it.
Next steps
Deploy MLflow models to Batch Endpoints
Using MLflow models for no-code deployment
Deploy MLflow models in batch
deployments
Article • 05/15/2023
In this article, learn how to deploy MLflow models to Azure Machine Learning for
batch inference using batch endpoints. When deploying MLflow models to batch
endpoints, Azure Machine Learning:
7 Note
For more information about the supported input file types in model deployments
with MLflow, view Considerations when deploying to batch inference.
The model has been trained using an XGBoost classifier, and all the required
preprocessing has been packaged as a scikit-learn pipeline, making this model an
end-to-end pipeline that goes from raw data to predictions.
The example in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste YAML
and other files, first clone the repo and then change directories to the folder:
Azure CLI
cd endpoints/batch/deploy-models/heart-classifier-mlflow
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
How to manage workspaces article to create one.
Create ARM deployments in the workspace resource group: Use roles Owner,
contributor, or custom role allowing Microsoft.Resources/deployments/write in
the resource group where the workspace is deployed.
You will need to install the following software to work with Azure Machine
Learning:
Azure CLI
The Azure CLI and the ml extension for Azure Machine Learning.
Azure CLI
az extension add -n ml
Azure CLI
Pass in the values for your subscription ID, workspace, location, and resource group
in the following code:
Azure CLI
Steps
Follow these steps to deploy an MLflow model to a batch endpoint for running batch
inference over new data:
1. Batch Endpoint can only deploy registered models. In this case, we already have a
local copy of the model in the repository, so we only need to publish the model to
the registry in the workspace. You can skip this step if the model you are trying to
deploy is already registered.
Azure CLI
MODEL_NAME='heart-classifier-mlflow'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
2. Before moving forward, we need to make sure the batch deployments we're
about to create can run on some infrastructure (compute). Batch deployments can
run on any Azure Machine Learning compute that already exists in the workspace.
That means that multiple batch deployments can share the same compute
infrastructure. In this example, we're going to work on an Azure Machine Learning
compute cluster called batch-cluster . Let's verify that the compute exists in the
workspace, or create it otherwise.
Azure CLI
az ml compute create -n batch-cluster --type amlcompute --min-instances 0 --max-instances 5
3. Now it's time to create the batch endpoint and deployment. Let's start with the
endpoint first. Endpoints only require a name and a description to be created. The
name of the endpoint ends up in the URI associated with your endpoint.
Because of that, batch endpoint names need to be unique within an Azure
region. For example, there can be only one batch endpoint with the name
mybatchendpoint in westus2 .
Azure CLI
In this case, let's place the name of the endpoint in a variable so we can easily
reference it later.
Azure CLI
ENDPOINT_NAME="heart-classifier"
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.js
on
name: heart-classifier-batch
description: A heart condition classifier for batch inference
auth_mode: aad_token
Azure CLI
5. Now, let's create the deployment. MLflow models don't require you to indicate an
environment or a scoring script when creating the deployment, as they're created for
you. However, you can specify them if you want to customize how the deployment
does inference.
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.
json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-mlflow
description: A heart condition classifier based on XGBoost
type: model
model: azureml:heart-classifier-mlflow@latest
compute: azureml:batch-cluster
resources:
instance_count: 2
settings:
max_concurrency_per_instance: 2
mini_batch_size: 2
output_action: append_row
output_file_name: predictions.csv
retry_settings:
max_retries: 3
timeout: 300
error_threshold: -1
logging_level: info
Azure CLI
6. Although you can invoke a specific deployment inside of an endpoint, you'll
usually want to invoke the endpoint itself and let the endpoint decide which
deployment to use. This deployment is named the "default" deployment. This
gives you the possibility of changing the default deployment, and hence the model
serving it, without changing the contract with the user invoking the endpoint. Use
the following instruction to update the default deployment:
Azure CLI
DEPLOYMENT_NAME="classifier-xgboost-mlflow"
az ml batch-endpoint update --name $ENDPOINT_NAME --set
defaults.deployment_name=$DEPLOYMENT_NAME
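A hedged SDK v2 equivalent of the same update (assuming an MLClient named
ml_client ; the names match the CLI example above):
Python
endpoint = ml_client.batch_endpoints.get("heart-classifier-batch")
endpoint.defaults.deployment_name = "classifier-xgboost-mlflow"
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()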
7. At this point, our batch endpoint is ready to be used.
1. Let's create the data asset first. This data asset consists of a folder with multiple
CSV files that we want to process in parallel using batch endpoints. You can skip
this step if your data is already registered as a data asset, or if you want to use a
different input type.
Azure CLI
heart-dataset-unlabeled.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/data.schema.json
name: heart-dataset-unlabeled
description: An unlabeled dataset for heart classification.
type: uri_folder
path: data
Azure CLI
2. Now that the data is uploaded and ready to be used, let's invoke the endpoint:
Azure CLI
7 Note
The utility jq might not be installed on every system. You can find
installation instructions in this link .
Tip
Notice how we're not indicating the deployment name in the invoke
operation. That's because the endpoint automatically routes the job to the
default deployment. Since our endpoint only has one deployment, that
one is the default. You can target a specific deployment by indicating
the argument/parameter deployment_name .
3. A batch job is started as soon as the command returns. You can monitor the status
of the job until it finishes:
Azure CLI
There is one row for each data point that was sent to the model. For tabular data,
this means that one row is generated for each row in the input files, and hence the
number of rows in the generated file ( predictions.csv ) equals the sum of all the
rows in all the processed files. For other data types, there is one row per
processed file.
You can download the results of the job by using the job name:
Azure CLI
Once the file is downloaded, you can open it using your favorite tool. The following
example loads the predictions using Pandas dataframe.
Python
2 Warning
The file predictions.csv might not be a regular CSV file and can't always be read
correctly using the pandas.read_csv() method.
The output looks as follows:
file prediction
heart-unlabeled-0.csv 0
heart-unlabeled-0.csv 1
... 1
heart-unlabeled-3.csv 0
Tip
Notice that in this example, the input data was tabular data in CSV format, and there
were 4 different input files (heart-unlabeled-0.csv, heart-unlabeled-1.csv,
heart-unlabeled-2.csv and heart-unlabeled-3.csv).
2 Warning
Nested folder structures aren't explored during inference. If you're partitioning
your data using folders, make sure to flatten the structure beforehand.
2 Warning
Batch deployments call the predict function of the MLflow model once per file.
For CSV files containing multiple rows, this can impose memory pressure on the
underlying compute. When sizing your compute, take into account not only the
memory consumption of the data being read, but also the memory footprint of the
model itself. This is especially true for models that process text, like transformer-
based models, where the memory consumption isn't linear with the size of the
input. If you encounter several out-of-memory exceptions, consider splitting the
data into smaller files with fewer rows, or implement batching at the row level inside
the model/scoring script.
2 Warning
Be advised that any unsupported file present in the input data will
make the job fail. You'll see an error entry like: "ERROR:azureml:Error
processing input file: '/mnt/batch/tasks/.../a-given-file.avro'. File type 'avro' is not
supported.".
Tip
If you'd like to process a different file type, or execute inference in a different way
than batch endpoints do by default, you can always create the deployment with a
scoring script, as explained in Using MLflow models with a scoring script.
Tip
Signatures in MLflow models are optional, but they're highly encouraged, as they
provide a convenient way to detect data compatibility issues early. For more
information about how to log models with signatures, read Logging models with a
custom signature, environment or samples.
You can inspect your model's signature by opening the MLmodel file
associated with your MLflow model. For more details about how signatures work in
MLflow, see Signatures in MLflow.
Flavor support
Batch deployments only support deploying MLflow models with a pyfunc flavor. If you
need to deploy a different flavor, see Using MLflow models with a scoring script.
" You need to process a file type not supported by batch deployments MLflow
deployments.
" You need to customize the way the model is run, for instance, use an specific flavor
to load it with mlflow.<flavor>.load() .
" You need to do pre/pos processing in your scoring routine when it is not done by
the model itself.
" The output of the model can't be nicely represented in tabular data. For instance, it
is a tensor representing an image.
" You model can't process each file at once because of memory constrains and it
needs to read it in chunks.
) Important
If you choose to indicate a scoring script for an MLflow model deployment, you'll
also have to specify the environment where the deployment will run.
2 Warning
Customizing the scoring script for MLflow deployments is only available from the
Azure CLI or the SDK for Python. If you're creating a deployment using the Azure
Machine Learning studio UI, switch to the CLI or the SDK.
Steps
Use the following steps to deploy an MLflow model with a custom scoring script.
c. Select the model you are trying to deploy and click on the tab Artifacts.
d. Take note of the folder that is displayed. This folder was indicated when the
model was registered.
2. Create a scoring script. Notice how the folder name model that you identified before
is included in the init() function.
deployment-custom/code/batch_driver.py
Python
import os
import mlflow
import pandas as pd


def init():
    global model
    global model_input_types
    global model_output_names

    # The folder name "model" was indicated when the model was registered.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)

    # Column types from the model signature (None if the model has no signature).
    input_schema = model.metadata.get_input_schema()
    model_input_types = (
        dict(zip(input_schema.input_names(), input_schema.pandas_types()))
        if input_schema else None
    )
    model_output_names = ["prediction"]


def run(mini_batch):
    print(f"run method start: {__file__}, run({len(mini_batch)} files)")

    # Read every CSV file in the mini-batch, keeping the originating file name.
    data = pd.concat(
        map(lambda fp: pd.read_csv(fp).assign(filename=os.path.basename(fp)),
            mini_batch)
    )
    if model_input_types:
        data = data.astype(model_input_types)

    # Score and return one row per input row, tagged with its source file.
    pred = model.predict(data.drop("filename", axis=1))
    return pd.DataFrame({"file": data["filename"], model_output_names[0]: pred})
3. Let's create an environment where the scoring script can be executed. Since our
model is MLflow, the conda requirements are also specified in the model package
(for more details about MLflow models and the files included in them, see The
MLmodel format). We'll then build the environment using the conda
dependencies from the file. However, we also need to include the package
azureml-core , which is required for batch deployments.
) Important
This example uses a conda environment specified at /heart-classifier-
mlflow/environment/conda.yaml . This file was created by combining the
original MLflow conda dependencies file and adding the package azureml-
core . You can't use the conda.yml file from the model directly.
Azure CLI
YAML
environment:
name: batch-mlflow-xgboost
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
conda_file: environment/conda.yaml
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.
json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-custom
description: A heart condition classifier based on XGBoost
type: model
model: azureml:heart-classifier-mlflow@latest
environment:
name: batch-mlflow-xgboost
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
conda_file: environment/conda.yaml
code_configuration:
code: code
scoring_script: batch_driver.py
compute: azureml:batch-cluster
resources:
instance_count: 2
settings:
max_concurrency_per_instance: 2
mini_batch_size: 2
output_action: append_row
output_file_name: predictions.csv
retry_settings:
max_retries: 3
timeout: 300
error_threshold: -1
logging_level: info
Azure CLI
Clean up resources
Azure CLI
Run the following code to delete the batch endpoint and all the underlying
deployments. Batch scoring jobs won't be deleted.
Azure CLI
Next steps
Customize outputs in batch deployments
Deploy and run MLflow models in Spark
jobs
Article • 01/03/2023
In this article, learn how to deploy and run your MLflow model in Spark jobs to
perform inference over large amounts of data or as part of data wrangling jobs.
The model is based on the UCI Heart Disease Data Set . The database contains 76
attributes, but we're using a subset of 14 of them. The model tries to predict the
presence of heart disease in a patient. It's integer valued from 0 (no presence) to 1
(presence). It has been trained using an XGBoost classifier, and all the required
preprocessing has been packaged as a scikit-learn pipeline, making this model an
end-to-end pipeline that goes from raw data to predictions.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste files,
clone the repo, and then change directories to sdk/using-mlflow/deploy .
Azure CLI
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow, azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
You must have an MLflow model registered in your workspace. Particularly, this
example registers a model trained for the Diabetes dataset .
Tracking is already configured for you. Your default credentials will also be used
when working with MLflow.
Python
import mlflow

mlflow_client = mlflow.tracking.MlflowClient()

model_name = 'heart-classifier'
model_local_path = "model"
registered_model = mlflow_client.create_model_version(
name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version
Alternatively, if your model was logged inside of a run, you can register it directly.
Tip
To register the model, you'll need to know the location where the model has been
stored. If you're using the autolog feature of MLflow, the path depends on the
type and framework of the model being used. We recommend checking the jobs
output to identify the name of this folder. Look for the folder that
contains a file named MLmodel . If you're logging your models manually using
log_model , then the path is the argument you pass to that method. As an example,
if you log the model using mlflow.sklearn.log_model(my_model, "classifier") , then
the path where the model is stored is classifier .
Python
model_name = 'heart-classifier'
registered_model = mlflow_client.create_model_version(
name=model_name, source=f"runs:/{RUN_ID}/{MODEL_PATH}"
)
version = registered_model.version
7 Note
The path MODEL_PATH is the location where the model has been stored in the run.
Python
import urllib

urllib.request.urlretrieve("https://azuremlexampledata.blob.core.windows.net/data/heart-disease-uci/data/heart.csv", "/tmp/data")
Move the data to a mounted storage account available to the entire cluster.
Python
dbutils.fs.mv("file:/tmp/data", "dbfs:/")
) Important
The previous code uses dbutils , which is a tool available in Azure Databricks
clusters. Use the appropriate tool, depending on the platform you're using.
Python
input_data_path = "dbfs:/data"
YAML
- mlflow<3,>=2.1
- cloudpickle==2.2.0
- scikit-learn==1.2.0
- xgboost==1.7.2
Python
import mlflow
import pyspark.sql.functions as f
4. Configure the model URI. The following URI refers to the latest version of a model
named heart-classifier .
Python
model_uri = "models:/heart-classifier/latest"
5. Load the model as a UDF function using mlflow.pyfunc.spark_udf :
Python
predict_function = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="conda")
Tip
Use the argument result_type to control the type returned by the predict()
function.
6. Read the data you want to score:
Python
df = spark.read.option("header", "true").option("inferSchema", "true").csv(input_data_path).drop("target")
In our case, the input data is on CSV format and placed in the folder dbfs:/data/ .
We're also dropping the column target as this dataset contains the target variable
to predict. In production scenarios, your data won't have this column.
7. Run the function predict_function and place the predictions in a new column. In
this case, we're placing the predictions in the column predictions .
Python
scored_data = df.withColumn("predictions", predict_function(*df.columns))
8. Save the scored data to an output location:
Python
scored_data_path = "dbfs:/scored-data"
scored_data.write.csv(scored_data_path, header=True)
7 Note
To learn more about Spark jobs in Azure Machine Learning, see Submit Spark jobs
in Azure Machine Learning (preview).
1. A Spark job requires a Python script that takes arguments. Create a scoring script:
score.py
Python
import argparse
import mlflow
from pyspark.sql import SparkSession

parser = argparse.ArgumentParser()
parser.add_argument("--model")
parser.add_argument("--input_data")
parser.add_argument("--scored_data")
args = parser.parse_args()
print(args.model)
print(args.input_data)

spark = SparkSession.builder.getOrCreate()

# Load the MLflow model as a Spark UDF, restoring its packages with conda.
predict_function = mlflow.pyfunc.spark_udf(spark, args.model, env_manager="conda")

# Read the input data, score it, and write the predictions to the output folder.
df = spark.read.option("header", "true").option("inferSchema", "true").csv(args.input_data)
df.withColumn("predictions", predict_function(*df.columns)).write.csv(args.scored_data)
The above script takes three arguments: --model , --input_data and --scored_data .
The first two are inputs and represent the model we want to run and the input
data; the last one is an output, the folder where predictions will be placed.
Tip
Installation of Python packages: The previous scoring script loads the MLflow
model into a UDF function, indicating the parameter
env_manager="conda" . When this parameter is set, MLflow restores the
required packages, as specified in the model definition, in an isolated
environment where only the UDF function runs. For more details, see the
mlflow.pyfunc.spark_udf documentation.
mlflow-score-spark-job.yml
yml
$schema: http://azureml/sdk-2-0/SparkJob.json
type: spark
code: ./src
entry:
file: score.py
conf:
spark.driver.cores: 1
spark.driver.memory: 2g
spark.executor.cores: 2
spark.executor.memory: 2g
spark.executor.instances: 2
inputs:
model:
type: mlflow_model
path: azureml:heart-classifier@latest
input_data:
type: uri_file
path: https://azuremlexampledata.blob.core.windows.net/data/heart-
disease-uci/data/heart.csv
mode: direct
outputs:
scored_data:
type: uri_folder
args: >-
--model ${{inputs.model}}
--input_data ${{inputs.input_data}}
--scored_data ${{outputs.scored_data}}
identity:
type: user_identity
resources:
instance_type: standard_e4s_v3
runtime_version: "3.2"
3. The YAML files shown above can be used in the az ml job create command, with
the --file parameter, to create a standalone Spark job as shown:
Azure CLI
az ml job create --file mlflow-score-spark-job.yml
Next steps
Deploy MLflow models to batch endpoints
Deploy MLflow models to online endpoint
Using MLflow models for no-code deployment
Bring your R workloads
Article • 02/24/2023
There's no Azure Machine Learning SDK for R. Instead, you'll use either the CLI or a
Python control script to run your R scripts.
This article outlines the key scenarios for R that are supported in Azure Machine
Learning and known limitations.
Typical R workflow
A typical workflow for using R with Azure Machine Learning:
Submit remote asynchronous R jobs (you submit jobs via the CLI or Python SDK,
not R)
Build an environment
Log job artifacts, parameters, tags and models
Limitation | Workaround
RStudio running as a custom application (such as Posit or RStudio) within a container on the compute instance can't access workspace assets or MLflow. | Use Jupyter Notebooks with the R kernel on the compute instance.
Parallel job step isn't supported. | Run a script in parallel n times using different input parameters. But you'll have to meta-program to generate n YAML or CLI calls to do it.
Zero code deployment (that is, automatic deployment) of an R MLflow model is currently not supported. | Create a custom container with plumber for deployment.
Azure Machine Learning online deployment yml can only use image URIs directly from the registry for the environment specification; not pre-built environments from the same Dockerfile. | Follow the steps in How to deploy a registered R model to an online (real time) endpoint for the correct way to deploy.
Next steps
Learn more about R in Azure Machine Learning:
Interactive R development
Adapt your R script to run in production
How to train R models in Azure Machine Learning
How to deploy an R model to an online (real time) endpoint
Interactive R development
Article • 06/01/2023
This article shows how to use R on a compute instance in Azure Machine Learning
studio that runs an R kernel in a Jupyter notebook.
The popular RStudio IDE also works. You can install RStudio or Posit Workbench in a
custom container on a compute instance. However, this has limitations in reading and
writing to your Azure Machine Learning workspace.
) Important
The code shown in this article works on an Azure Machine Learning compute
instance. The compute instance has an environment and configuration file
necessary for the code to run successfully.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today
An Azure Machine Learning workspace and a compute instance
A basic understanding of using Jupyter notebooks in Azure Machine Learning studio.
See Model development on a cloud workstation for more information.
If you're not sure how to create and work with notebooks in studio, review
Run Jupyter notebooks in your workspace
6. On the notebook toolbar, make sure your compute instance is running. If not, start
it now.
Access data
You can upload files to your workspace file storage resource, and then access those files
in R. However, for files stored in Azure data assets or data from datastores, you must
install some packages.
This section describes how to use Python and the reticulate package to load your data
assets and datastores into R, from an interactive session. You use the azureml-fsspec
Python package and the reticulate R package to read tabular data as Pandas
DataFrames. This section also includes an example of reading data assets and datastores
into an R data.frame .
Bash
#!/bin/bash
set -e
Installs azureml-fsspec in the default conda environment for the compute
instance
Installs the R reticulate package, if necessary (version must be 1.26 or greater)
1. Ensure you have the correct version of reticulate . For a version less than 1.26, try
to use a newer compute instance.
packageVersion("reticulate")
2. Load reticulate and set the conda environment where azureml-fsspec was
installed
R
library(reticulate)
use_condaenv("azureml_py310_sdkv2")
print("Environment is set")
py_run_string(py_code)
print("ml_client is configured")
b. Use this code to retrieve the asset. Make sure to replace <DATA_NAME> and
<VERSION_NUMBER> with the name and number of your data asset.
Tip
In studio, select Data in the left navigation to find the name and version
number of your data asset.
py_run_string(py_code)
print(paste("URI path is", py$data_uri))
4. Use Pandas read functions to read the file(s) into the R environment
R
pd <- import("pandas")
cc <- pd$read_csv(py$data_uri)
head(cc)
You can also use a Datastore URI to access different files on a registered Datastore, and
read these resources into an R data.frame .
Install R packages
A compute instance has many preinstalled R packages.
To install other packages, you must explicitly state the location and dependencies.
Tip
When you create or use a different compute instance, you must re-install any
packages you've installed.
install.packages("tsibble",
dependencies = TRUE,
lib = "/home/azureuser")
Load R libraries
Add /home/azureuser to the R library path.
.libPaths("/home/azureuser")
Tip
You must update the .libPaths in each interactive R script to access user-installed
libraries. Add this code to the top of each interactive R script or notebook.
library('tsibble')
7 Note
From an interactive R session, you can only write to the workspace file system.
From an interactive R session, you cannot interact with MLflow (such as log
model or query registry).
Next steps
Adapt your R script to run in production
Adapt your R script to run in production
Article • 02/26/2023
This article explains how to take an existing R script and make the appropriate changes
to run it as a job in Azure Machine Learning.
You'll have to make most, if not all, of the changes described in detail in this article.
Add parsing
If your script requires any sort of input parameter (most scripts do), pass the inputs into
the script via the Rscript call.
Bash
Rscript <name-of-r-script>.R
--data_file ${{inputs.<name-of-yaml-input-1>}}
--brand ${{inputs.<name-of-yaml-input-2>}}
In your R script, parse the inputs and make the proper type conversions. We recommend
that you use the optparse package.
You can also add defaults, which are handy for testing. We recommend that you add an
--output parameter with a default value of ./outputs so that any output of the script
will be stored.
library(optparse)
parser <- OptionParser()
# add_option() calls for each input go here, then:
args <- parse_args(parser)
args is a named list. You can use any of these parameters later in your script.
library(mlflow)
library(httr)
library(later)
library(tcltk2)
if (response$status_code != 200){
error_response = paste("Error fetching token will try again
after sometime: ", str(response), sep = " ")
warning(error_response)
}
if (response$status_code == 200){
text <- content(response, "text", encoding = "UTF-8")
json_resp <-jsonlite::fromJSON(text, simplifyVector = FALSE)
json_resp$token
Sys.setenv(MLFLOW_TRACKING_TOKEN = json_resp$token)
message("Refreshing token done")
}
}
clean_tracking_uri()
tcltk2::tclTaskSchedule(as.integer(Sys.getenv("MLFLOW_TOKEN_REFRESH_INT
ERVAL_SECONDS", 30))*1000, fetch_token_from_aml(), id =
"fetch_token_from_aml", redo = TRUE)
R
source("azureml_utils.R")
Define the input parameter as shown in the parameters section. Use the parameter
data-file to specify a whole path, so that you can use read_csv(args$data_file) to
read the data file.
) Important
This section does not apply to models. See the following two sections for model
specific saving and logging instructions.
You can store arbitrary script outputs (data files, images, serialized R objects, and so
on) generated by the R script in Azure Machine Learning. Create a ./outputs directory
to store any generated artifacts. Any files saved to ./outputs are automatically
included in the run and uploaded to the experiment at the end of the run. Because you
added a default value for the --output parameter in the input parameters section,
include the following code snippet in your R script to create the output directory:
R
if (!dir.exists(args$output)) {
dir.create(args$output)
}
After you create the directory, save your artifacts to that directory. For example:
R
# create and save a plot
library(ggplot2)
ggsave(myplot,
filename = file.path(args$output,"forecast-plot.png"))
If your R script trains a model and produces a model object, you'll need to crate it
(with the carrier package) so that you can deploy it later with Azure Machine Learning.
When using the crate function, use explicit namespaces when calling any package
function you need.
Let's say you have a timeseries model object called my_ts_model created with the fable
package. In order to make this model callable when it's deployed, create a crate where
you'll pass in the model object and a forecasting horizon in number of periods:
R
library(carrier)
crated_model <- crate(function(x)
{
fabletools::forecast(!!my_ts_model, h = x)
})
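As a quick sanity check, a crate is callable like a function; for example, assuming
my_ts_model exists in your session:
R
# forecast the next two periods with the crated model
crated_model(2)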
7 Note
When you log a model, the model is also saved and added to the run artifacts.
There is no need to explicitly save a model unless you did not log it.
For example, to log the crated_model object as created in the previous section, you
would include the following code in your R script:
Tip
Use models as the value for artifact_path when logging a model. This is a best
practice, even though you can name it something else.
R
mlflow_start_run()
mlflow_log_model(
model = crated_model, # the crate model object
artifact_path = "models" # a path to save the model object to
)
To log parameters or metrics for the run, use the corresponding MLflow functions, for
example:
R
mlflow_log_param(<key-name>, <value>)
Putting it all together, the skeleton of an adapted R script looks like this (portions
elided):
R
# BEGIN R SCRIPT
# source the azureml_utils.R script which is needed to use the MLflow back end
# with R
source("azureml_utils.R")

# load your packages here. Make sure that they are installed in the container.
library(...)

mlflow_log_param(<key-name>, <value>)
Create an environment
To run your R script, you'll use the ml extension for Azure CLI, also referred to as CLI v2.
The ml command uses a YAML job definitions file. For more information about
submitting jobs with az ml , see Train models with Azure Machine Learning CLI.
The YAML job file specifies an environment. You'll need to create this environment in
your workspace before you can run the job.
You can create the environment in Azure Machine Learning studio or with the Azure CLI.
Whatever method you use, you'll use a Dockerfile. All Docker context files for R
environments must have the following specification in order to work on Azure Machine
Learning:
Dockerfile
FROM rocker/tidyverse:latest

# Install python
RUN apt-get update -qq && \
 apt-get install -y python3-pip tcl tk libz-dev libpng-dev

# Install azureml-mlflow and mlflow
RUN pip install azureml-mlflow
RUN pip install mlflow

# Install R packages required for logging with MLflow (these are necessary)
RUN R -e "install.packages('mlflow', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('carrier', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('optparse', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('tcltk2', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
The base image is rocker/tidyverse:latest , which has many R packages and their
dependencies already installed.
) Important
You must install any R packages your script needs in advance. Add more lines to the
Docker context file as needed, for example (the package names here are illustrative):
Dockerfile
RUN R -e "install.packages('tsibble', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('fable', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
Additional suggestions
Some additional suggestions you may want to consider:
Next steps
How to train R models in Azure Machine Learning
Run an R job to train a model
Article • 07/13/2023
This article explains how to take the R script that you adapted to run in production and
set it up to run as an R job using the Azure Machine Learning CLI V2.
7 Note
Although the title of this article refers to training a model, you can actually run any
kind of R script as long as it meets the requirements listed in the adapting article.
Prerequisites
An Azure Machine Learning workspace.
A registered data asset that your training job will use.
Azure CLI and ml extension installed. Or use a compute instance in your
workspace, which has the CLI preinstalled.
A compute cluster or compute instance to run your training job.
An R environment for the compute cluster to use to run the job.
📁 r-job-azureml
├─ src
│ ├─ azureml_utils.R
│ ├─ r-source.R
├─ job.yml
) Important
The r-source.R file is the R script that you adapted to run in production.
The azureml_utils.R file is necessary; its source code is shown in the adapting article.
You'll need to gather specific pieces of information to put into the YAML:
The name of the registered data asset you'll use as the data input (with version):
azureml:<REGISTERED-DATA-ASSET>:<VERSION>
Tip
For Azure Machine Learning artifacts that require versions (data assets,
environments), you can use the shortcut URI azureml:<AZUREML-ASSET>@latest to
get the latest version of that artifact if you don't need to set a specific version.
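For example, both of these forms are valid for a data asset path (the asset name is a
placeholder):
yml
path: azureml:my-training-data:3        # a specific, pinned version
path: azureml:my-training-data@latest   # the latest version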
Create a job.yml file with contents like the following:
yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json

# the Rscript command goes in the command key below. Here you also specify
# which parameters are passed into the R script and can reference the input
# keys and values further below
# Modify any value shown below <IN-BRACKETS-AND-CAPS> (remove the brackets)
command: >
  Rscript <NAME-OF-R-SCRIPT>.R
  --data_file ${{inputs.datafile}}
  --other_input_parameter ${{inputs.other}}
code: src # this is the code directory
inputs:
  datafile: # this is a registered data asset
    type: uri_file
    path: azureml:<REGISTERED-DATA-ASSET>@latest
  other: 1 # this is a sample parameter, which is the number 1 (as text)
environment: azureml:<R-ENVIRONMENT-NAME>@latest
compute: azureml:<COMPUTE-CLUSTER-OR-INSTANCE-NAME>
experiment_name: <NAME-OF-EXPERIMENT>
description: <DESCRIPTION>
1. In a terminal window, navigate to the folder that contains job.yml:
Bash
cd r-job-azureml
2. Sign in to Azure. If you're doing this from an Azure Machine Learning compute
instance, use:
Azure CLI
az login --identity
If you're not on the compute instance, omit --identity and follow the prompt to
open a browser window to authenticate.
3. Make sure you have the most recent versions of the CLI and the ml extension:
Azure CLI
az upgrade
4. If you have multiple Azure subscriptions, set the active subscription to the one
you're using for your workspace. (You can skip this step if you only have access to
a single subscription.) Replace <SUBSCRIPTION-NAME> with your subscription name.
Also remove the brackets <> .
Azure CLI
az account set --subscription "<SUBSCRIPTION-NAME>"
5. Now use the CLI to submit the job. If you're doing this on a compute instance in your
workspace, you can use environment variables for the workspace name and
resource group, as shown in the following code. If you aren't on a compute instance,
replace these values with your workspace name and resource group.
Azure CLI
az ml job create -f job.yml --workspace-name $CI_WORKSPACE --resource-group $CI_RESOURCE_GROUP
Once you've submitted the job, you can check the status and results in studio.
Register model
Finally, once the training job is complete, register your model if you want to deploy it.
Start in the studio from the page showing your job details.
1. Once your job completes, select Outputs + logs to view the outputs of the job.
2. Open the models folder to verify that crate.bin and MLmodel are present. If not,
check the logs to see if there was an error.
3. Select + Register model.
4. For Model type, change the default from MLflow to Unspecified type.
5. For Job output, select models, the folder that contains the model.
6. Select Next.
7. Supply the name you wish to use for your model. Add Description, Version, and
Tags if you wish.
8. Select Next.
At the top of the page, you'll see a confirmation that the model is registered. The
confirmation looks similar to this:
Select Click here to go to this model if you wish to view the registered model details.
Next steps
Now that you have a registered model, learn How to deploy an R model to an online
(real time) endpoint.
How to deploy a registered R model to
an online (real time) endpoint
Article • 02/24/2023
In this article, you'll learn how to deploy an R model to a managed endpoint (Web API)
so that your application can score new data against the model in near real-time.
Prerequisites
An Azure Machine Learning workspace.
Azure CLI and ml extension installed. Or use a compute instance in your
workspace, which has the CLI pre-installed.
At least one custom environment associated with your workspace. Create an R
environment, or any other custom environment if you don't have one.
An understanding of the R plumber package
A model that you've trained and packaged with crate, and registered into your
workspace
📂 r-deploy-azureml
├─📂 docker-context
│ ├─ Dockerfile
│ └─ start_plumber.R
├─📂 src
│ └─ plumber.R
├─ deployment.yml
├─ endpoint.yml
The contents of each of these files is shown and explained in this article.
Dockerfile
This is the file that defines the container environment. You'll also define the installation
of any additional R packages here.
A sample Dockerfile will look like this:
Dockerfile
# OPTIONAL: Install any additional R packages you may need for your model crate to run
RUN R -e "install.packages('<PACKAGE-NAME>', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('<PACKAGE-NAME>', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"

# REQUIRED
ENTRYPOINT []
Modify the file to add the packages you need for your scoring script.
plumber.R
) Important
This section shows how to structure the plumber.R script. For detailed information
about the plumber package, see plumber documentation .
The file plumber.R is the R script where you'll define the function for scoring. This script
also performs tasks that are necessary to make your endpoint work. The script:
Gets the path where the model is mounted from the AZUREML_MODEL_DIR
environment variable in the container.
Loads a model object created with the crate function from the carrier package,
which was saved as crate.bin when it was packaged.
Unserializes the model object
Defines the scoring function
Tip
Make sure that whatever your scoring function produces can be converted back to
JSON. Some R objects are not easily converted.
R
# plumber.R
# This script will be deployed to a managed endpoint to do the model scoring

# REQUIRED
# When you deploy a model as an online endpoint, Azure Machine Learning mounts your model
# to your endpoint. Model mounting enables you to deploy new versions of the model without
# having to create a new Docker image.
model_dir <- Sys.getenv("AZUREML_MODEL_DIR")

# REQUIRED
# This reads the serialized model with its respective predict/score method you
# registered. The loaded load_model object is a raw binary object.
load_model <- readRDS(paste0(model_dir, "/models/crate.bin"))

# REQUIRED
# You have to unserialize the load_model object to get back the scoring function
scoring_function <- unserialize(load_model)

# REQUIRED
# << Readiness route vs. liveness route >>
# An HTTP server defines paths for both liveness and readiness. A liveness route is used to
# check whether the server is running. A readiness route is used to check whether the
# server's ready to do work. In machine learning inference, a server could respond 200 OK
# to a liveness request before loading a model. The server could respond 200 OK to a
# readiness request only after the model has been loaded into memory.

#* Liveness check
#* @get /live
function() {
  "alive"
}

#* Readiness check
#* @get /ready
function() {
  "ready"
}

# << The scoring function >>
# This is the function that is deployed as a web API that will score the model.
# Make sure that whatever you are producing as a score can be converted
# to JSON to be sent back as the API response.
# In the example here, forecast_horizon (the number of time units to forecast)
# is the input to scoring_function. The output is a tibble; we convert some of
# the output types so they work in JSON.

#* @param forecast_horizon
#* @post /score
function(forecast_horizon) {
  scoring_function(as.numeric(forecast_horizon)) |>
    tibble::as_tibble() |>
    dplyr::transmute(period = as.character(yr_wk),
                     dist = as.character(logmove),
                     forecast = .mean) |>
    jsonlite::toJSON()
}
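Before building the container, you can sanity-check the scoring logic in a local R
session; a sketch, assuming you've downloaded the registered model's crate.bin to a
local models folder:
R
# load and call the crated model the same way the endpoint will
load_model <- readRDS("models/crate.bin")  # placeholder local path
scoring_function <- unserialize(load_model)
scoring_function(2)  # forecast two periods ahead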
start_plumber.R
The file start_plumber.R is the R script that runs when the container starts, and it
calls your plumber.R script. Use the script as-is; its final lines, which start the
plumber server, look like this:
R
pr <- plumber::plumb(entry_script_path)
do.call(pr$run, args)
Build container
These steps assume you have an Azure Container Registry associated with your
workspace, which is created when you create your first custom environment. To see if
you have a custom environment, select Environments in the left navigation of studio,
and then select the Custom environments tab.
Once you have verified that you have at least one custom environment, use the
following steps to build a container.
1. Open a terminal window and sign in to Azure. If you're doing this from an Azure
Machine Learning compute instance, use:
Azure CLI
az login --identity
If you're not on the compute instance, omit --identity and follow the prompt to
open a browser window to authenticate.
2. Make sure you have the most recent versions of the CLI and the ml extension:
Azure CLI
az upgrade
3. If you have multiple Azure subscriptions, set the active subscription to the one
you're using for your workspace. (You can skip this step if you only have access to
a single subscription.) Replace <SUBSCRIPTION-NAME> with your subscription name.
Also remove the brackets <> .
Azure CLI
az account set --subscription "<SUBSCRIPTION-NAME>"
4. Set the default workspace. If you're doing this from a compute instance, you can
use the following command as is. If you're on any other computer, substitute your
resource group and workspace name instead. (You can find these values in Azure
Machine Learning studio.)
Azure CLI
az configure --defaults group=$CI_RESOURCE_GROUP workspace=$CI_WORKSPACE
5. Navigate to the folder that contains your deployment files:
Bash
cd r-deploy-azureml
6. To build the image in the cloud, execute the following bash commands in your
terminal. Replace <IMAGE-NAME> with the name you want to give the image, and
<ACR-NAME> with the name of the Azure Container Registry attached to your workspace.
(The commands here are a sketch; the IMAGE_TAG variable captures the full image tag
for use later.)
If your workspace is in a virtual network, see Enable Azure Container Registry (ACR)
for additional steps to add --image-build-compute to the az acr build command
in the last line of this code.
Azure CLI
IMAGE_TAG=<ACR-NAME>.azurecr.io/<IMAGE-NAME>:latest
az acr build ./docker-context --image <IMAGE-NAME>:latest --registry <ACR-NAME>
) Important
It will take a few minutes for the image to build. Wait until the build process is
complete before proceeding to the next section. Don't close this terminal; you'll use
it next to create the deployment.
The az acr build command automatically uploads your docker-context folder - which
contains the artifacts needed to build the image - to the cloud, where the image is
built and hosted in an Azure Container Registry.
Deploy model
In this section of the article, you'll define and create an endpoint and deployment to
deploy the model and image built in the previous steps to a managed online endpoint.
A deployment is a set of resources required for hosting the model that does the actual
scoring. A single endpoint can contain multiple deployments. The load balancing
capabilities of Azure Machine Learning managed endpoints allow you to give any
percentage of traffic to each deployment. Traffic allocation can be used to do safe
rollout of blue/green deployments by balancing requests between different instances.
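For example, a sketch of shifting traffic between two existing deployments on an
endpoint (the deployment names blue and green here are placeholders):
Azure CLI
az ml online-endpoint update --name <ENDPOINT-NAME> --traffic "blue=90 green=10"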
1. Define the endpoint in the endpoint.yml file. Replace <ENDPOINT-NAME> with the
name you want to give the endpoint:
yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: <ENDPOINT-NAME>
auth_mode: aml_token
2. Using the same terminal where you built the image, execute the following CLI
command to create an endpoint:
Azure CLI
az ml online-endpoint create -f endpoint.yml
Create deployment
1. To create your deployment, add the following code to the deployment.yml file.
Replace <ENDPOINT-NAME> with the endpoint name you defined in the
endpoint.yml file
Replace <DEPLOYMENT-NAME> with the name you want to give the deployment
Replace <MODEL-URI> and <IMAGE-TAG> with the values for your registered model
and the image you built. If you set the IMAGE_TAG variable when building the
image, you can print it in the same terminal:
Bash
echo $IMAGE_TAG
yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: <DEPLOYMENT-NAME>
endpoint_name: <ENDPOINT-NAME>
code_configuration:
  code: ./src
  scoring_script: plumber.R
model: <MODEL-URI>
environment:
  image: <IMAGE-TAG>
  inference_config:
    liveness_route:
      port: 8000
      path: /live
    readiness_route:
      port: 8000
      path: /ready
    scoring_route:
      port: 8000
      path: /score
instance_type: Standard_DS2_v2
instance_count: 1
2. Next, in your terminal execute the following CLI command to create the
deployment (notice that you're setting 100% of the traffic to this model):
Azure CLI
az ml online-deployment create -f deployment.yml --all-traffic
It may take several minutes for the service to be deployed. Wait until deployment is
finished before proceeding to the next section.
Test
Once your deployment has been successfully created, you can test the endpoint using
studio or the CLI:
Studio
Navigate to the Azure Machine Learning studio and select Endpoints from the
left-hand menu. Next, select the r-endpoint-forecast endpoint you created earlier.
Enter the following JSON into the Input data to test real-time endpoint textbox:
JSON
{
"forecast_horizon" : [2]
}
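If you prefer the CLI, a minimal sketch of the same test, assuming you saved the JSON
above to a file named sample_request.json:
Azure CLI
az ml online-endpoint invoke --name r-endpoint-forecast --request-file sample_request.json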
Clean up resources
Now that you've successfully scored with your endpoint, you can delete it so you don't
incur ongoing cost:
Azure CLI
az ml online-endpoint delete --name r-endpoint-forecast
Next steps
For more information about using R with Azure Machine Learning, see Overview of R
capabilities in Azure Machine Learning
Run Azure Machine Learning models
from Fabric, using batch endpoints
(preview)
Article • 11/15/2023
In this article, you learn how to consume Azure Machine Learning batch deployments
from Microsoft Fabric. Although the workflow uses models that are deployed to batch
endpoints, it also supports the use of batch pipeline deployments from Fabric.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Get a Microsoft Fabric subscription. Or sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning .
An Azure Machine Learning workspace. If you don't have one, use the steps in How
to manage workspaces to create one.
Ensure that you have the following permissions in the workspace:
Create/manage batch endpoints and deployments: Use the Owner or Contributor
roles, or a custom role allowing
Microsoft.MachineLearningServices/workspaces/batchEndpoints/* .
Create ARM deployments in the workspace resource group: Use the Owner or
Contributor roles, or a custom role allowing Microsoft.Resources/deployments/write
in the resource group where the workspace is deployed.
A model deployed to a batch endpoint. If you don't have one, use the steps in
Deploy models for scoring in batch endpoints to create one.
Download the heart-unlabeled.csv sample dataset to use for scoring.
Architecture
Azure Machine Learning can't directly access data stored in Fabric's OneLake. However,
you can use OneLake's capability to create shortcuts within a Lakehouse to read and
write data stored in Azure Data Lake Gen2. Since Azure Machine Learning supports
Azure Data Lake Gen2 storage, this setup allows you to use Fabric and Azure Machine
Learning together. The data architecture is as follows:
In this section, you create or identify a storage account to use for storing the
information that the batch endpoint will consume and that Fabric users will see in
OneLake. Fabric only supports storage accounts with hierarchical namespaces enabled,
such as Azure Data Lake Gen2.
2. From the left-side panel, select your Fabric workspace to open it.
3. Open the lakehouse that you'll use to configure the connection. If you don't have a
lakehouse already, go to the Data Engineering experience to create a lakehouse. In
this example, you use a lakehouse named trusted.
4. In the left-side navigation bar, open more options for Files, and then select New
shortcut to bring up the wizard.
6. In the Connection settings section, paste the URL associated with the Azure Data
Lake Gen2 storage account.
8. Select Next.
9. Configure the path to the shortcut, relative to the storage account, if needed. Use
this setting to configure the folder that the shortcut will point to.
10. Configure the Name of the shortcut. This name will be a path inside the lakehouse.
In this example, name the shortcut datasets.
11. Select Create.
Tip
Why should you configure Azure Blob Storage instead of Azure Data Lake
Gen2? Batch endpoints can only write predictions to Blob Storage
accounts. However, every Azure Data Lake Gen2 storage account is also a
blob storage account; therefore, they can be used interchangeably.
c. Select the storage account from the wizard, using the Subscription ID, Storage
account, and Blob container (file system).
d. Select Create.
7. Ensure that the compute where the batch endpoint is running has permission to
mount the data in this storage account. Although access is still granted by the
identity that invokes the endpoint, the compute where the batch endpoint runs
needs to have permission to mount the storage account that you provide. For
more information, see Accessing storage services.
4. Create a folder to store the sample dataset that you want to score. Name the
folder uci-heart-unlabeled.
5. Use the Get data option and select Upload files to upload the sample dataset
heart-unlabeled.csv.
7. The sample file is ready to be consumed. Note the path to the location where you
saved it.
1. Return to the Data Engineering experience (if you already navigated away from it)
by using the experience selector icon in the lower left corner of your home page.
5. Select the Activities tab from the toolbar in the designer canvas.
6. Select more options at the end of the tab and select Azure Machine Learning.
b. In the Connection settings section of the creation wizard, specify the values of
the subscription ID, Resource group name, and Workspace name, where your
endpoint is deployed.
d. Save the connection. Once the connection is selected, Fabric automatically
populates the available batch endpoints in the selected workspace.
8. For Batch endpoint, select the batch endpoint you want to call. In this example,
select heart-classifier-....