What is Azure Machine Learning?
Azure Machine Learning is a cloud service for accelerating and managing the machine
learning (ML) project lifecycle. ML professionals, data scientists, and engineers can use it
in their day-to-day workflows to train and deploy models and manage machine learning
operations (MLOps).
You can create a model in Machine Learning or use a model built from an open-source
platform, such as PyTorch, TensorFlow, or scikit-learn. MLOps tools help you monitor,
retrain, and redeploy models.
Tip
Free trial! If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning. You get credits to spend on Azure services. After they're used up, you can keep the account and use free Azure services. Your credit card is never charged unless you explicitly change your settings and ask to be charged.
Data scientists and ML engineers can use tools to accelerate and automate their day-to-
day workflows. Application developers can use tools for integrating models into
applications or services. Platform developers can use a robust set of tools, backed by
durable Azure Resource Manager APIs, for building advanced ML tooling.
Enterprises working in the Microsoft Azure cloud can use familiar security and role-
based access control for infrastructure. You can set up a project to deny access to
protected data and select operations.
You can:

- Develop models for fairness and explainability, with tracking and auditability to fulfill lineage and audit compliance requirements
- Deploy ML models quickly and easily at scale, and manage and govern them efficiently with MLOps
- Run machine learning workloads anywhere with built-in governance, security, and compliance
As you're refining the model and collaborating with others throughout the rest of the
Machine Learning development cycle, you can share and find assets, resources, and
metrics for your projects on the Machine Learning studio UI.
Studio
Machine Learning studio offers multiple authoring experiences depending on the type
of project and the level of your past ML experience, without having to install anything.
- Notebooks: Write and run your own code in managed Jupyter Notebook servers that are directly integrated in the studio.
- Visualize run metrics: Analyze and optimize your experiments with visualization.
- Azure Machine Learning designer: Use the designer to train and deploy ML models without writing any code. Drag and drop datasets and components to create ML pipelines.
- Data labeling: Use Machine Learning data labeling to efficiently coordinate image labeling or text labeling projects.
Important
Machine Learning doesn't store or process your data outside of the region where
you deploy.
Project lifecycle
The project lifecycle can vary by project, but it often looks like this diagram.
A workspace organizes a project and allows for collaboration for many users all working
toward a common objective. Users in a workspace can easily share the results of their
runs from experimentation in the studio user interface. Or they can use versioned assets
for jobs like environments and storage references.
You can deploy models to the managed inferencing solution, for both real-time and
batch deployments, abstracting away the infrastructure management typically required
for deploying models.
Train models
In Machine Learning, you can run your training script in the cloud or build a model from
scratch. Customers often bring models they've built and trained in open-source
frameworks so that they can operationalize them in the cloud.
- PyTorch
- TensorFlow
- scikit-learn
- XGBoost
- LightGBM
- R
- .NET
For more information, see Open-source integration with Azure Machine Learning.
Hyperparameter optimization
Hyperparameter optimization, or hyperparameter tuning, can be a tedious task. Machine
Learning can automate this task for arbitrary parameterized commands with little
modification to your job definition. Results are visualized in the studio.
Supported via Azure Machine Learning Kubernetes, Azure Machine Learning compute
clusters, and serverless compute:
- PyTorch
- TensorFlow
- MPI
You can use MPI distribution for Horovod or custom multinode logic. Apache Spark is
supported via serverless Spark compute and attached Synapse Spark pool that use
Azure Synapse Analytics Spark clusters.
For more information, see Distributed training with Azure Machine Learning.
Deploy models
To bring a model into production, you deploy it. The Machine Learning managed endpoints abstract the required infrastructure for both batch and real-time (online) model scoring (inferencing).
Real-time and batch scoring (inferencing)
Batch scoring, or batch inferencing, involves invoking an endpoint with a reference to
data. The batch endpoint runs jobs asynchronously to process data in parallel on
compute clusters and store the data for further analysis.
Real-time scoring, or online inferencing, involves invoking an endpoint with one or more
model deployments and receiving a response in near real time via HTTPS. Traffic can be
split across multiple deployments, allowing for testing new model versions by diverting
some amount of traffic initially and increasing after confidence in the new model is
established.
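As a sketch of that traffic-splitting pattern with SDK v2 (the endpoint and deployment names are placeholders, and ml_client is assumed to be an authenticated MLClient):

Python

# send 90% of traffic to the established deployment and 10% to the new one
endpoint = ml_client.online_endpoints.get(name="my-endpoint")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()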
ML model lifecycle

Machine Learning's MLOps capabilities for managing the model lifecycle include:

- git integration.
- MLflow integration.
- Machine learning pipeline scheduling.
- Azure Event Grid integration for custom triggers.
- Ease of use with CI/CD tools like GitHub Actions or Azure DevOps.
Next steps
Start using Azure Machine Learning:
Azure Machine Learning CLI and SDK v2

Azure Machine Learning CLI v2 (CLI v2) and Azure Machine Learning Python SDK v2 (SDK v2) introduce a consistency of features and terminology across the interfaces. To create this consistency, the syntax of commands differs, in some cases significantly, from the first versions (v1).
There are no differences in functionality between CLI v2 and SDK v2. The command-line-based CLI might be more convenient in CI/CD and MLOps scenarios, while the SDK might be more convenient for development.
The YAML file defines the configuration of the asset or workflow, such as what it is and where it should run. Any custom logic or intellectual property, such as data preparation, model training, and model scoring, can remain in script files. These files are referred to in the YAML but aren't part of the YAML itself. Machine Learning supports script files in Python, R, Java, Julia, or C#. All you need to learn is the YAML format and command lines to use Machine Learning; you can stick with script files of your choice. (A minimal job YAML sketch appears after this list.)
Using the command line for execution makes deployment and automation simpler, because you can invoke workflows from any offering or platform that allows users to call the command line.
Machine Learning offers endpoints to streamline model deployments for both real-
time and batch inference deployments. This functionality is available only via CLI v2
and SDK v2.
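For illustration, here's a minimal command-job YAML sketch; the script name, data path, environment, and compute names are placeholders:

yml

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ./src
command: python main.py --data ${{inputs.data}}
inputs:
  data:
    type: uri_file
    path: ./data/input.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster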
SDK v2 is on par with CLI v2 functionality and is consistent in how assets (nouns) and
actions (verbs) are used between SDK and CLI. For example, to list an asset, you can use
the list action in both SDK and CLI. You can use the same list action to list a
compute, model, environment, and so on.
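As a sketch of this symmetry (the workspace values are placeholders):

Python

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# the same "list" action works across asset types
for model in ml_client.models.list():
    print(model.name)
for env in ml_client.environments.list():
    print(env.name)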
CLI v2

Azure Machine Learning CLI v1 has been deprecated. We recommend that you use CLI v2 instead.
SDK v2
Azure Machine Learning Python SDK v1 doesn't have a planned deprecation date. If you
have significant investments in Python SDK v1 and don't need any new features offered
by SDK v2, you can continue to use SDK v1. However, you should consider using SDK v2
if:
You want to use new features like reusable components and managed inferencing.
You're starting a new workflow or pipeline. All new features and future investments
will be introduced in v2.
You want to take advantage of the improved usability of Python SDK v2, including the ability to compose jobs and pipelines by using Python functions, with easy evolution from simple to complex tasks.
Next steps
Upgrade from v1 to v2
Azure Machine Learning glossary

The Azure Machine Learning glossary is a short dictionary of terminology for the Machine Learning platform. For general Azure terminology, see also the Microsoft Azure glossary.
Component
A Machine Learning component is a self-contained piece of code that does one step in
a machine learning pipeline. Components are the building blocks of advanced machine
learning pipelines. Components can do tasks such as data processing, model training,
and model scoring. A component is analogous to a function. It has a name and
parameters, expects input, and returns output.
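To make the function analogy concrete, here's a minimal component YAML sketch; the name, paths, and environment are illustrative:

yml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prep data
type: command
inputs:
  raw_data:
    type: uri_file
outputs:
  clean_data:
    type: uri_folder
code: ./prep
command: python prep.py --raw_data ${{inputs.raw_data}} --clean_data ${{outputs.clean_data}}
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest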
Compute
A compute is a designated compute resource where you run your job or host your
endpoint. Machine Learning supports the following types of compute:
Note
Data
Machine Learning allows you to work with different types of data:
Primitives:
string
boolean
number
For most scenarios, you use URIs (uri_folder and uri_file) to identify a location in storage that can be easily mapped to the file system of a compute node in a job by either mounting or downloading the storage to the node.
The mltable parameter is an abstraction for tabular data that's used for automated
machine learning (AutoML) jobs, parallel jobs, and some advanced scenarios. If you're
starting to use Machine Learning and aren't using AutoML, we strongly encourage you
to begin with URIs.
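As a brief sketch, a uri_file input declared with SDK v2 might look like this (the storage path is a placeholder):

Python

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# point a job input at a single file in cloud storage
my_data_input = Input(
    type=AssetTypes.URI_FILE,
    path="https://<account>.blob.core.windows.net/<container>/raw/data.csv",
)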
Datastore
Machine Learning datastores securely keep the connection information to your data storage on Azure so that you don't have to code it in your scripts. You can register and create a datastore to easily connect to your storage account and access the data in your underlying storage service. The Azure Machine Learning CLI v2 and SDK v2 support the following types of cloud-based storage services:

- Azure Blob container
- Azure file share
- Azure Data Lake
- Azure Data Lake Gen2
Environment
Machine Learning environments are an encapsulation of the environment where your
machine learning task happens. They specify the software packages, environment
variables, and software settings around your training and scoring scripts. The
environments are managed and versioned entities within your Machine Learning
workspace. Environments enable reproducible, auditable, and portable machine learning
workflows across various computes.
Types of environment
Machine Learning supports two types of environments: curated and custom.
Curated environments are provided by Machine Learning and are available in your
workspace by default. They're intended to be used as is. They contain collections of
Python packages and settings to help you get started with various machine learning
frameworks. These precreated environments also allow for faster deployment time. For a
full list, see Azure Machine Learning curated environments.
In custom environments, you're responsible for setting up your environment. Make sure
to install the packages and any other dependencies that your training or scoring script
needs on the compute. Machine Learning allows you to create your own environment
by using:
- A Docker image.
- A base Docker image with a conda YAML to customize further.
- A Docker build context.
Model
Machine Learning models consist of the binary files that represent a machine learning
model and any corresponding metadata. You can create models from a local or remote
file or directory. For remote locations, https , wasbs , and azureml locations are
supported. The created model is tracked in the workspace under the specified name and
version. Machine Learning supports three types of storage format for models:
- custom_model
- mlflow_model
- triton_model
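As a sketch, registering a local MLflow-format model with SDK v2 might look like this (the path and name are illustrative, and ml_client is an authenticated MLClient):

Python

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    path="./model",                # local folder containing the MLflow model
    type=AssetTypes.MLFLOW_MODEL,  # one of the three supported formats
    name="my-model",
    description="example model registration",
)
ml_client.models.create_or_update(model)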
Workspace
The workspace is the top-level resource for Machine Learning. It provides a centralized
place to work with all the artifacts you create when you use Machine Learning. The
workspace keeps a history of all jobs, including logs, metrics, output, and a snapshot of
your scripts. The workspace stores references to resources like datastores and compute.
It also holds all assets like models, environments, components, and data assets.
Next steps
What is Azure Machine Learning?
Tutorial: Create resources you need to
get started
Article • 08/17/2023
This article was partially created with the help of AI. An author reviewed and revised
the content as needed. Read more.
In this tutorial, you will create the resources you need to start working with Azure
Machine Learning.
" A workspace. To use Azure Machine Learning, you'll first need a workspace. The
workspace is the central place to view and manage all the artifacts and resources
you create.
" A compute instance. A compute instance is a pre-configured cloud-computing
resource that you can use to train, automate, manage, and track machine learning
models. A compute instance is the quickest way to start using the Azure Machine
Learning SDKs and CLIs. You'll use it to run Jupyter notebooks and Python scripts in
the rest of the tutorials.
This video shows you how to create a workspace and compute instance. The steps are
also described in the sections below.
https://learn-video.azurefd.net/vod/player?id=a0e901d2-e82a-4e96-9c7f-
3b5467859969&locale=en-us&embedUrl=%2Fazure%2Fmachine-
learning%2Fquickstart-create-resources
Prerequisites
An Azure account with an active subscription. Create an account for free .
If you already have a workspace, skip this section and continue to Create a compute
instance.
| Field | Description |
| --- | --- |
| Workspace name | Enter a unique name that identifies your workspace. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others. The workspace name is case-insensitive. |
| Region | Select the Azure region closest to your users and the data resources to create your workspace. |
Note

This creates a workspace along with all required resources. If you would like to reuse resources, such as the storage account, Azure Container Registry, Azure Key Vault, or Application Insights, use the Azure portal instead.
You'll only see this option if you don't yet have a compute instance in your
workspace.
5. Select Create.
The Authoring section of the studio contains multiple ways to get started in
creating machine learning models. You can:
- Notebooks section allows you to create Jupyter Notebooks, copy sample notebooks, and run notebooks and Python scripts.
- Automated ML steps you through creating a machine learning model without writing code.
- Designer gives you a drag-and-drop way to build models using prebuilt components.
The Assets section of the studio helps you keep track of the assets you create as
you run your jobs. If you have a new workspace, there's nothing in any of these
sections yet.
The Manage section of the studio lets you create and manage compute and
external services you link to your workspace. It's also where you can create and
manage a Data labeling project.
But you could also create a new, empty notebook, then copy/paste code from a tutorial
into the notebook. To do so:
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
Next steps
You now have an Azure Machine Learning workspace, which contains a compute
instance to use for your development environment.
Continue on to learn how to use the compute instance to run notebooks and scripts in
the Azure Machine Learning cloud.
Use your compute instance with the following tutorials to train and deploy a model.
| Tutorial | Description |
| --- | --- |
| Upload, access and explore your data in Azure Machine Learning | Store large data in the cloud and retrieve it from notebooks and scripts |
| Train a model in Azure Machine Learning | Dive in to the details of training a model |
| Create production machine learning pipelines | Split a complete machine learning task into a multistep workflow. |
Set up a Python development
environment for Azure Machine
Learning
Article • 04/25/2023
The following table shows each development environment covered in this article, along
with pros and cons.
| Environment | Pros | Cons |
| --- | --- | --- |
| The Data Science Virtual Machine (DSVM) | Similar to the cloud-based compute instance (Python is pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows. | A slower getting-started experience compared to the cloud-based compute instance. |
| Azure Machine Learning compute instance | Easiest way to get started. The SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run. | Lack of control over your development environment and dependencies. Additional cost incurred for Linux VM (VM can be stopped when not in use to avoid charges). See pricing details. |
This article also provides additional usage tips for the following tools:
Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some
extras that you should install.
Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes language support for Python, plus features that make working with Azure Machine Learning much more convenient and productive.
Prerequisites
Azure Machine Learning workspace. If you don't have one, you can create an Azure
Machine Learning workspace through the Azure portal, Azure CLI, and Azure
Resource Manager templates.
JSON
{
"subscription_id": "<subscription-id>",
"resource_group": "<resource-group>",
"workspace_name": "<workspace-name>"
}
This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can be in the same directory, a subdirectory named .azureml, or in a parent directory.
To use this file from your code, use the MLClient.from_config method. This code loads
the information from the file and connects to your workspace.
Create a script to connect to your Azure Machine Learning workspace. Make sure
to replace subscription_id , resource_group , and workspace_name with your own.
Python
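# a sketch of the connection code, assuming the SDK v2 client;
# replace the placeholder values with your own
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription_id>",
    resource_group_name="<resource_group>",
    workspace_name="<workspace_name>",
)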
Note

Although not required, it's recommended that you use Anaconda or Miniconda to manage Python virtual environments and install packages.
Important
If you're on Linux or macOS and use a shell other than bash (for example, zsh)
you might receive errors when you run some commands. To work around this
problem, use the bash command to start a new bash shell and run the
commands there.
Now that you have your local environment set up, you're ready to start working with
Azure Machine Learning. See the Tutorial: Azure Machine Learning in a day to get
started.
Jupyter Notebooks
When running a local Jupyter Notebook server, it's recommended that you create an
IPython kernel for your Python virtual environment. This helps ensure the expected
kernel and package import behavior.
Bash
2. Create a kernel for your Python virtual environment. Make sure to replace <myenv>
with the name of your Python virtual environment.
Bash
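# a sketch of the kernel-creation commands; replace <myenv> with your environment name
conda activate <myenv>
pip install ipykernel
python -m ipykernel install --user --name <myenv> --display-name "Python (<myenv>)"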
Once you have the Visual Studio Code extension installed, use it to:
Create one anytime from within your Azure Machine Learning workspace. Provide just a
name and specify an Azure VM type. Try it now with Create resources to get started.
To learn more about compute instances, including how to install packages, see Create
and manage an Azure Machine Learning compute instance.
Tip
In addition to a Jupyter Notebook server and JupyterLab, you can use compute instances with the integrated notebook feature inside Azure Machine Learning studio.
You can also use the Azure Machine Learning Visual Studio Code extension to connect
to a remote compute instance using VS Code.
For a more comprehensive list of the tools, see the Data Science VM tools guide.
Important
If you plan to use the Data Science VM as a compute target for your training or
inferencing jobs, only Ubuntu is supported.
Azure CLI
Bash
3. Once the environment has been created, activate it and install the SDK
Bash
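# a sketch: activate a conda environment on the DSVM, then install the SDK v2 packages
# (replace <env-name> with the environment you want to use)
conda activate <env-name>
pip install azure-ai-ml azure-identity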
4. To configure the Data Science VM to use your Azure Machine Learning workspace,
create a workspace configuration file or use an existing one.
Tip
Similar to local environments, you can use Visual Studio Code and the Azure
Machine Learning Visual Studio Code extension to interact with Azure
Machine Learning.
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference .
Install and set up the CLI (v2)
Article • 04/04/2023
The ml extension to the Azure CLI is the enhanced interface for Azure Machine Learning.
It enables you to train and deploy models from the command line, with features that
accelerate scaling data science up and out while tracking the model lifecycle.
Prerequisites
To use the CLI, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today.
To use the CLI commands in this document from your local environment, you
need the Azure CLI.
Installation
The new Machine Learning extension requires Azure CLI version >=2.38.0. Ensure this requirement is met:
Azure CLI
az version
Azure CLI
az extension list
Remove any existing installation of the ml extension and also the CLI v1 azure-cli-ml
extension:
Azure CLI
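# remove the extensions if they're present
az extension remove -n ml
az extension remove -n azure-cli-ml

Now install the ml extension: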
Azure CLI
az extension add -n ml
Run the help command to verify your installation and see available subcommands:
Azure CLI
az ml -h
Azure CLI
az extension update -n ml
Installation on Linux
If you're using Linux, the fastest way to install the necessary CLI version and the Machine
Learning extension is:
Bash
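# assuming a Debian-based distribution such as Ubuntu
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az extension add -n ml -y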
Set up
Login:
Azure CLI
az login
If you have access to multiple Azure subscriptions, you can set your active subscription:
Azure CLI
az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>"
Optionally, set up common variables in your shell for use in subsequent commands:
Azure CLI
GROUP="azureml-examples"
LOCATION="eastus"
WORKSPACE="main"
Warning
This uses Bash syntax for setting variables -- adjust as needed for your shell. You
can also replace the values in commands below inline rather than using variables.
If it doesn't already exist, you can create the Azure resource group:
Azure CLI
Azure CLI
Azure CLI
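# a sketch of the three commands these blocks contain, using the variables above
az group create -n $GROUP -l $LOCATION
az ml workspace create -n $WORKSPACE -g $GROUP
az configure --defaults group=$GROUP workspace=$WORKSPACE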
Tip
Most code examples assume you have set a default workspace and resource group.
You can override these on the command line.
Azure CLI
az configure -l -o table
Secure communications
The ml CLI extension (sometimes called 'CLI v2') for Azure Machine Learning sends
operational data (YAML parameters and metadata) over the public internet. All the ml
CLI extension commands communicate with the Azure Resource Manager. This
communication is secured using HTTPS/TLS 1.2.
Data in a datastore that's secured in a virtual network isn't sent over the public internet. For example, if your training data is located in the default storage account for the workspace, and the storage account is in a virtual network, the training data isn't sent over the public internet.
Note
With the previous extension ( azure-cli-ml , sometimes called 'CLI v1'), only some of
the commands communicate with the Azure Resource Manager. Specifically,
commands that create, update, delete, list, or show Azure resources. Operations
such as submitting a training job communicate directly with the Azure Machine
Learning workspace. If your workspace is secured with a private endpoint, that is
enough to secure commands provided by the azure-cli-ml extension.
Public workspace
If your Azure Machine Learning workspace is public (that is, not behind a virtual network), then no additional configuration is required. Communications are secured using HTTPS/TLS 1.2.
Next steps
Train models using CLI (v2)
Set up the Visual Studio Code Azure Machine Learning extension
Train an image classification TensorFlow model using the Azure Machine Learning
Visual Studio Code extension
Explore Azure Machine Learning with examples
Set up Visual Studio Code desktop with
the Azure Machine Learning extension
(preview)
Article • 06/15/2023
Learn how to set up the Azure Machine Learning Visual Studio Code extension for your
machine learning workflows. You only need to do this setup when using the VS Code
desktop application. If you use VS Code for the Web, this is handled for you.
The Azure Machine Learning extension for VS Code provides a user interface to:
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning .
Visual Studio Code. If you don't have it, install it.
Python
(Optional) To create resources using the extension, you need to install the CLI (v2).
For setup instructions, see Install, set up, and use the CLI (v2).
Clone the community-driven repository
Bash
git clone https://github.com/Azure/azureml-examples.git --depth 1
2. Select Extensions icon from the Activity Bar to open the Extensions view.
3. In the Extensions view search bar, type "Azure Machine Learning" and select the
first extension.
4. Select Install.
Note

The Azure Machine Learning VS Code extension uses the CLI (v2) by default. To switch to the 1.0 CLI, set the azureML.CLI Compatibility Mode setting in Visual Studio Code to 1.0. For more information on modifying your settings in Visual Studio Code, see the user and workspace settings documentation.
To sign into your Azure account, select the Azure: Sign In button in the bottom right
corner on the Visual Studio Code status bar to start the sign in process.
For Azure Machine Learning YAML specification files, the extension provides:

- Schema validation
- Autocompletion
- Diagnostics
If you don't have a workspace, create one. For more information, see manage Azure
Machine Learning resources with the VS Code extension.
To choose your default workspace, select the Set Azure Machine Learning Workspace
button on the Visual Studio Code status bar and follow the prompts to set your
workspace.
Alternatively, use the > Azure ML: Set Default Workspace command in the command
palette and follow the prompts to set your workspace.
Next Steps
Manage your Azure Machine Learning resources
Develop on a remote compute instance locally
Train an image classification model using the Visual Studio Code extension
Run and debug machine learning experiments locally (CLI v1)
Quickstart: Get started with Azure
Machine Learning
Article • 10/20/2023
This tutorial is an introduction to some of the most used features of the Azure Machine
Learning service. In it, you will create, register and deploy a model. This tutorial will help
you become familiar with the core concepts of Azure Machine Learning and their most
common usage.
You'll learn how to run a training job on a scalable compute resource, then deploy it,
and finally test the deployment.
You'll create a training script to handle the data preparation, train and register a model.
Once you train the model, you'll deploy it as an endpoint, then call the endpoint for
inferencing.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
3. Open or create a notebook in your workspace:
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
You'll create ml_client for a handle to the workspace. You'll then use ml_client to
manage resources and jobs.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
Note
Creating MLClient will not connect to the workspace. The client initialization is lazy,
it will wait for the first time it needs to make a call (this will happen in the next code
cell).
Python
import os
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)
This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree-based model and return the output model. MLflow is used to log the parameters and metrics during the pipeline run.
The cell below uses IPython magic to write the training script into the directory you just
created.
Python
%%writefile {train_src_dir}/main.py
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
def main():
    """Main function of the script."""

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    ###################
    #<prepare the data>
    ###################
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    # split the data into train and test sets
    train_df, test_df = train_test_split(credit_df, test_size=args.test_train_ratio)

    ##################
    #<train the model>
    ##################
    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    X_train = train_df.values

    y_test = test_df.pop("default payment next month")
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))
    ###################
    #</train the model>
    ###################

    ##########################
    #<save and register model>
    ##########################
    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
As you can see in this script, once the model is trained, the model file is saved and
registered to the workspace. Now you can use the registered model in inferencing
endpoints.
You might need to select Refresh to see the new folder and script in your Files.
Configure the command
Now that you have a script that can perform the desired tasks, and a compute cluster to
run the script, you'll use a general purpose command that can run command line
actions. This command line action can directly call system commands or run a script.
Here, you'll create input variables to specify the input data, split ratio, learning rate and
registered model name. The command script will:
- Use an environment that defines software and runtime libraries needed for the training script. Azure Machine Learning provides many curated or ready-made environments, which are useful for common training and inference scenarios. You'll use one of those environments here. In Tutorial: Train a model in Azure Machine Learning, you'll learn how to create a custom environment.
- Configure the command line action itself - python main.py in this case. The inputs/outputs are accessible in the command via the ${{ ... }} notation.
- In this sample, we access the data from a file on the internet.
- Since a compute resource was not specified, the script will be run on a serverless compute cluster that is automatically created.
Python
from azure.ai.ml import command, Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="credit_default_prediction",
)
Python
ml_client.create_or_update(job)
The output of this job will look like this in the Azure Machine Learning studio. Explore
the tabs for various details like metrics, outputs etc. Once completed, the job will
register a model in your workspace as a result of training.
Important
Wait until the status of the job is complete before returning to this notebook to
continue. The job will take 2 to 3 minutes to run. It could take longer (up to 10
minutes) if the compute cluster has been scaled down to zero nodes and custom
environment is still building.
To deploy a machine learning service, you'll use the model you registered.
Python
import uuid
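from azure.ai.ml.entities import ManagedOnlineEndpoint

# a sketch of the endpoint definition; the name pattern and auth mode
# follow the tutorial's conventions
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]

endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="an online endpoint for the credit defaults model",
    auth_mode="key",
)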
Python
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Note
Once the endpoint has been created, you can retrieve it as below:
Python
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)
You can check the Models page on Azure Machine Learning studio, to identify the latest
version of your registered model. Alternatively, the code below will retrieve the latest
version number for you to use.
Python
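# a sketch: find the latest registered version of the model
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)
print(f"Latest model is version {latest_model_version}")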
Python
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
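from azure.ai.ml.entities import ManagedOnlineDeployment

# a sketch of the deployment definition; the instance type is an assumption
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)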
blue_deployment = ml_client.begin_create_or_update(blue_deployment).result()
Note
Create a sample request file following the design expected in the run method in the
score script.
Python
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)
Python
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}
Python
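# a sketch: test the deployment with the sample request file
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)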
Clean up resources
If you're not going to use the endpoint, delete it to stop using the resource. Make sure
no other deployments are using an endpoint before you delete it.
Note
Python
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Now that you have an idea of what's involved in training and deploying a model, learn
more about the process in these tutorials:
| Tutorial | Description |
| --- | --- |
| Upload, access and explore your data in Azure Machine Learning | Store large data in the cloud and retrieve it from notebooks and scripts |
| Train a model in Azure Machine Learning | Dive in to the details of training a model |
| Create production machine learning pipelines | Split a complete machine learning task into a multistep workflow. |
Tutorial: Upload, access and explore
your data in Azure Machine Learning
Article • 12/27/2023
The start of a machine learning project typically involves exploratory data analysis (EDA),
data-preprocessing (cleaning, feature engineering), and the building of Machine
Learning model prototypes to validate hypotheses. This prototyping project phase is
highly interactive. It lends itself to development in an IDE or a Jupyter notebook, with a
Python interactive console. This tutorial describes these ideas.
This video shows how to get started in Azure Machine Learning studio so that you can
follow the steps in the tutorial. The video shows how to create a notebook, clone the
notebook, create a compute instance, and download the data needed for the tutorial.
The steps are also described in the following sections.
https://learn-video.azurefd.net/vod/player?id=514a29e2-0ae7-4a5d-a537-
8f10681f5545&locale=en-us&embedUrl=%2Fazure%2Fmachine-learning%2Ftutorial-
explore-data
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
Note
This tutorial depends on data placed in an Azure Machine Learning resource folder
location. For this tutorial, 'local' means a folder location in that Azure Machine
Learning resource.
1. Select Open terminal below the three dots.
2. The terminal window opens in a new tab.
3. Make sure you cd to the same folder where this notebook is located. For example,
if the notebook is in a folder named get-started-notebooks:
4. Enter these commands in the terminal window to copy the data to your compute
instance:
mkdir data
cd data  # the sub-folder where you'll store the data
wget https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv
Learn more about this data on the UCI Machine Learning Repository.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()
Note
Creating MLClient will not connect to the workspace. The client initialization is lazy,
it will wait for the first time it needs to make a call (this will happen in the next code
cell).
An Azure Machine Learning data asset is similar to web browser bookmarks (favorites).
Instead of remembering long storage paths (URIs) that point to your most frequently
used data, you can create a data asset, and then access that asset with a friendly name.
Data asset creation also creates a reference to the data source location, along with a
copy of its metadata. Because the data remains in its existing location, you incur no
extra storage cost, and don't risk data source integrity. You can create Data assets from
Azure Machine Learning datastores, Azure Storage, public URLs, and local files.
Tip
For smaller-size data uploads, Azure Machine Learning data asset creation works
well for data uploads from local machine resources to cloud storage. This approach
avoids the need for extra tools or utilities. However, a larger-size data upload might
require a dedicated tool or utility - for example, azcopy. The azcopy command-line
tool moves data to and from Azure Storage. Learn more about azcopy here.
The next notebook cell creates the data asset. The code sample uploads the raw data file
to the designated cloud storage resource.
Each time you create a data asset, you need a unique version for it. If the version already exists, you'll get an error. In this code, we use "initial" as the version for the first read of the data. If that version already exists, we skip creating it again.
You can also omit the version parameter, and a version number is generated for you,
starting with 1 and then incrementing from there.
In this tutorial, we use the name "initial" as the first version. The Create production
machine learning pipelines tutorial will also use this version of the data, so here we are
using a value that you'll see again in that tutorial.
Python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = "./data/default_of_credit_card_clients.csv"
# set the version number of the data asset
v1 = "initial"

my_data = Data(
    name="credit-card",
    version=v1,
    description="Credit card data",
    path=my_path,
    type=AssetTypes.URI_FILE,
)

## create data asset if it doesn't already exist:
try:
    data_asset = ml_client.data.get(name="credit-card", version=v1)
    print(
        f"Data asset already exists. Name: {my_data.name}, version: {my_data.version}"
    )
except:
    ml_client.data.create_or_update(my_data)
    print(f"Data asset created. Name: {my_data.name}, version: {my_data.version}")
You can see the uploaded data by selecting Data on the left. You'll see the data is
uploaded and a data asset is created:
This data is named credit-card, and in the Data assets tab, we can see it in the Name
column. This data uploaded to your workspace's default datastore named
workspaceblobstore, seen in the Data source column.
df = pd.read_csv("azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/<filename>.csv")
You'll want to create data assets for frequently accessed data. Here's an easier way to
access the CSV file in Pandas:
Important
In a notebook cell, execute this code to install the azureml-fsspec Python library in
your Jupyter kernel:
Python
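# install the dependency used to read data asset paths with pandas
%pip install -U azureml-fsspec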
Python
import pandas as pd

# read into pandas - note that you will see 2 headers in your data frame - that is ok, for now
df = pd.read_csv(data_asset.path)
df.head()
See Access data from Azure cloud storage during interactive development to learn more about data access in a notebook.

Notice that the data needs a little cleaning. It has:

- two headers
- a client ID column, which we wouldn't use as a feature in Machine Learning
- spaces in the response variable name
Also, compared to the CSV format, the Parquet file format is a better way to store this data. Parquet offers compression, and it maintains schema. Therefore, to clean the data and store it in Parquet, use:
Python
# read in data again, this time using the 2nd row as the header
df = pd.read_csv(data_asset.path, header=1)
# rename column
df.rename(columns={"default payment next month": "default"}, inplace=True)
# remove ID column
df.drop("ID", axis=1, inplace=True)
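# write the cleaned data to a Parquet file (path matches the v2 asset defined below)
df.to_parquet("./data/cleaned-credit-card.parquet")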
| Column Name(s) | Variable Type | Description |
| --- | --- | --- |
| X1 | Explanatory | Amount of the given credit (NT dollar): it includes both the individual consumer credit and their family (supplementary) credit. |
| X6-X11 | Explanatory | History of past payment. We tracked the past monthly payment records (from April to September 2005). -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. |
| X12-X17 | Explanatory | Amount of bill statement (NT dollar) from April to September 2005. |
| X18-X23 | Explanatory | Amount of previous payment (NT dollar) from April to September 2005. |
Next, create a new version of the data asset (the data automatically uploads to cloud
storage). For this version, we'll add a time value, so that each time this code is run, a
different version number will be created.
Python
import time

# Next, create a new *version* of the data asset (the data is automatically uploaded to cloud storage):
v2 = "cleaned" + time.strftime("%Y.%m.%d.%H%M%S", time.gmtime())
my_path = "./data/cleaned-credit-card.parquet"

# Define the data asset, and use tags to make it clear the asset can be used in training
my_data = Data(
    name="credit-card",
    version=v2,
    description="Default of credit card clients data.",
    tags={"training_data": "true", "format": "parquet"},
    path=my_path,
    type=AssetTypes.URI_FILE,
)

my_data = ml_client.data.create_or_update(my_data)
my_data = ml_client.data.create_or_update(my_data)
The cleaned parquet file is the latest version data source. This code shows the CSV
version result set first, then the Parquet version:
Python
import pandas as pd

# get handles to both versions of the data asset
data_asset_v1 = ml_client.data.get(name="credit-card", version=v1)
data_asset_v2 = ml_client.data.get(name="credit-card", version=v2)

# print the CSV (v1) version
print(f"V1 Data asset URI: {data_asset_v1.path}")
v1df = pd.read_csv(data_asset_v1.path)
print(v1df.head(5))

print(
    "_____________________________________________________________________________________________________________\n"
)

# print the Parquet (v2) version
print(f"V2 Data asset URI: {data_asset_v2.path}")
v2df = pd.read_parquet(data_asset_v2.path)
print(v2df.head(5))
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Read Create data assets for more information about data assets.
Learn how to develop a training script with a notebook on an Azure Machine Learning
cloud workstation. This tutorial covers the basics you need to get started:
" Set up and configuring the cloud workstation. Your cloud workstation is powered by
an Azure Machine Learning compute instance, which is pre-configured with
environments to support your various model development needs.
" Use cloud-based development environments.
" Use MLflow to track your model metrics, all from within a notebook.
Prerequisites
To use Azure Machine Learning, you'll first need a workspace. If you don't have one,
complete Create resources you need to get started to create a workspace and learn
more about using it.
4. If you don't have a compute instance, you'll see Create compute in the middle of
the screen. Select Create compute and fill out the form. You can use all the
defaults. (If you already have a compute instance, you'll instead see Terminal in
that spot. You'll use Terminal later in this tutorial.)
Set up a new environment for prototyping
(optional)
In order for your script to run, you need to be working in an environment configured
with the dependencies and libraries the code expects. This section helps you create an
environment tailored to your code. To create the new Jupyter kernel your notebook
connects to, you'll use a YAML file that defines the dependencies.
Upload a file.
Files you upload are stored in an Azure file share, and these files are mounted to
each compute instance and shared within the workspace.
1. Select Add files, then select Upload files to upload it to your workspace.
2. Select Browse and select file(s).
4. Select Upload.
You'll see the workstation_env.yml file under your username folder in the Files tab.
Select this file to preview it, and see what dependencies it specifies. You'll see
contents like this:
yml
name: workstation_env
# This file serves as an example - you can update packages or versions to fit your use case
dependencies:
- python=3.8
- pip=21.2.4
- scikit-learn=0.24.2
- scipy=1.7.1
- pandas>=1.1,<1.2
- pip:
- mlflow-skinny
- azureml-mlflow
- psutil>=5.8,<5.9
- ipykernel~=6.0
- matplotlib
Create a kernel.
Now use the Azure Machine Learning terminal to create a new Jupyter kernel,
based on the workstation_env.yml file.
1. Select Terminal to open a terminal window. You can also open the terminal
from the left command bar:
2. If the compute instance is stopped, select Start compute and wait until it's
running.
3. Once the compute is running, you see a welcome message in the terminal,
and you can start typing commands.
Bash
6. Create the environment based on the conda file provided. It takes a few
minutes to build this environment.
Bash
Bash
8. Validate the correct environment is active, again looking for the environment
marked with a *.
Bash
Bash
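A minimal sketch of the commands these steps use, assuming the environment name workstation_env from the YAML file above:

Bash

# step 6: create the conda environment from the provided file
conda env create -f workstation_env.yml
# step 7: activate the new environment
conda activate workstation_env
# step 8: confirm it's active (marked with a *)
conda env list
# create a Jupyter kernel from the environment
python -m ipykernel install --user --name workstation_env --display-name "Tutorial Workstation Env"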
You now have a new kernel. Next you'll open a notebook and use this kernel.
Create a notebook
1. Select Add files, and choose Create new file.
2. Name your new notebook develop-tutorial.ipynb (or enter your preferred name).
3. If the compute instance is stopped, select Start compute and wait until it's
running.
4. You'll see the notebook is connected to the default kernel in the top right. Switch
to use the Tutorial Workstation Env kernel if you created the kernel.
This code uses sklearn for training and MLflow for logging the metrics.
1. Start with code that imports the packages and libraries you'll use in the training
script.
Python
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
2. Next, load and process the data for this experiment. In this tutorial, you read the
data from a file on the internet.
Python
"https://azuremlexamples.blob.core.windows.net/datasets/credit_card/def
ault_of_credit_card_clients.csv",
header=1,
index_col=0,
)
Python
# Extracting the label column
y_train = train_df.pop("default payment next month")

# convert the dataframe values to arrays
X_train = train_df.values

y_test = test_df.pop("default payment next month")
X_test = test_df.values
4. Add code to start autologging with MLflow , so that you can track the metrics and
results. With the iterative nature of model development, MLflow helps you log
model parameters and results. Refer back to those runs to compare and
understand how your model performs. The logs also provide context for when
you're ready to move from the development phase to the training phase of your
workflows within Azure Machine Learning.
Python
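# set an experiment name and enable MLflow autologging
# (the experiment name is illustrative)
mlflow.set_experiment("Develop on cloud tutorial")
mlflow.sklearn.autolog()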
5. Train a model.
Python
mlflow.start_run()
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
# Stop logging for this model
mlflow.end_run()
Note

You can ignore the mlflow warnings. You'll still get all the results you need tracked.
Iterate
Now that you have model results, you may want to change something and try again. For
example, try a different classifier technique:
Python
from sklearn.ensemble import AdaBoostClassifier

mlflow.start_run()
ada = AdaBoostClassifier()
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
print(classification_report(y_test, y_pred))
# Stop logging for this model
mlflow.end_run()
Note

You can ignore the mlflow warnings. You'll still get all the results you need tracked.
Examine results
Now that you've tried two different models, use the results tracked by MLflow to decide which model is better. You can reference metrics like accuracy, or other indicators that matter most for your scenarios. You can dive into these results in more detail by looking at the jobs created by MLflow.
3. There are two different jobs shown, one for each of the models you tried. These
names are autogenerated. As you hover over a name, use the pencil tool next to
the name if you want to rename it.
4. Select the link for the first job. The name appears at the top. You can also rename it
here with the pencil tool.
5. The page shows details of the job, such as properties, outputs, tags, and
parameters. Under Tags, you'll see the estimator_name, which describes the type of
model.
6. Select the Metrics tab to view the metrics that were logged by MLflow . (Expect
your results to differ, as you have a different training set.)
7. Select the Images tab to view the images generated by MLflow .
8. Go back and review the metrics and images for the other model.
4. Look through this file and delete the code you don't want in the training script. For
example, keep the code for the model you wish to use, and delete code for the
model you don't want.
You now have a Python script to use for training your preferred model.
Run the Python script
For now, you're running this code on your compute instance, which is your Azure
Machine Learning development environment. Tutorial: Train a model shows you how to
run a training script in a more scalable way on more powerful compute resources.
2. View your current conda environments. The active environment is marked with a *.
Bash
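# list conda environments; the active one is marked with a *
conda env list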
Bash
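# activate the kernel's environment (name assumed from earlier in this tutorial)
conda activate workstation_env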
Bash
python train.py
Note

You can ignore the mlflow warnings. You'll still get all the metrics and images from autologging.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
Next steps
Learn more about:
This tutorial showed you the early steps of creating a model, prototyping on the same
machine where the code resides. For your production training, learn how to use that
training script on more powerful remote compute resources:
Train a model
Tutorial: Train a model in Azure Machine
Learning
Article • 11/15/2023
Learn how a data scientist uses Azure Machine Learning to train a model. In this
example, we use the associated credit card dataset to show how you can use Azure
Machine Learning for a classification problem. The goal is to predict if a customer has a
high likelihood of defaulting on a credit card payment.
The training script handles the data preparation, then trains and registers a model. This
tutorial takes you through steps to submit a cloud-based training job (command job). If
you would like to learn more about how to load your data into Azure, see Tutorial:
Upload, access and explore your data in Azure Machine Learning. The steps are:
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
2. If the compute instance is stopped, select Start compute and wait until it is
running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2. If not, use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
A command job is a function that allows you to submit a custom training script to train
your model. This can also be defined as a custom training job. A command job in Azure
Machine Learning is a type of job that runs a script or command in a specified
environment. You can use command jobs to train models, process data, or any other
custom code you want to execute in the cloud.
In this tutorial, we'll focus on using a command job to create a custom training job that
we'll use to train a model. For any custom training job, the below items are required:
- environment
- data
- command job
- training script
In this tutorial we'll provide all these items for our example: creating a classifier to
predict customers who have a high likelihood of defaulting on credit card payments.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
Note
Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).
Python
Azure Machine Learning provides many curated or ready-made environments, which are
useful for common training and inference scenarios.
In this example, you'll create a custom conda environment for your jobs, using a conda
yaml file.
Python
import os
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)
The cell below uses IPython magic to write the conda file into the directory you just
created.
Python
%%writefile {dependencies_dir}/conda.yaml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=1.0.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - mlflow==2.8.0
    - mlflow-skinny==2.8.0
    - azureml-mlflow==1.51.0
    - psutil>=5.8,<5.9
    - tqdm>=4.59,<4.60
    - ipykernel~=6.0
    - matplotlib
The specification contains some usual packages that you'll use in your job (numpy, pip). Reference this yaml file to create and register this custom environment in your workspace:
Python
from azure.ai.ml.entities import Environment

custom_env_name = "aml-scikit-learn"

custom_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for Credit Card Defaults job",
    tags={"scikit-learn": "1.0.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_job_env = ml_client.environments.create_or_update(custom_job_env)

print(
    f"Environment with name {custom_job_env.name} is registered to workspace, the environment version is {custom_job_env.version}"
)
The training script handles the data preparation, training and registering of the trained
model. The method train_test_split handles splitting the dataset into test and training
data. In this tutorial, you'll create a Python training script.
Command jobs can be run from CLI, Python SDK, or studio interface. In this tutorial,
you'll use the Azure Machine Learning Python SDK v2 to create and run the command
job.
Python
import os
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)
This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree-based model and return the output model. MLflow is used to log the parameters and metrics during our job. The MLflow package allows you to keep track of metrics and results for each model Azure trains. We'll use MLflow to first get the best model for our data, then we'll view the model's metrics in Azure Machine Learning studio.
Python
%%writefile {train_src_dir}/main.py
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def main():
    """Main function of the script."""

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    ###################
    #<prepare the data>
    ###################
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    # split the data into train and test sets
    train_df, test_df = train_test_split(
        credit_df, test_size=args.test_train_ratio
    )
    ####################
    #</prepare the data>
    ####################

    ##################
    #<train the model>
    ##################
    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    # convert the dataframe values to array
    X_train = train_df.values

    # Extracting the label column
    y_test = test_df.pop("default payment next month")
    # convert the dataframe values to array
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))
    ###################
    #</train the model>
    ###################

    ##########################
    #<save and register model>
    ##########################
    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )
    ###########################
    #</save and register model>
    ###########################

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
In this script, once the model is trained, the model file is saved and registered to the
workspace. Registering your model allows you to store and version your models in the
Azure cloud, in your workspace. Once you register a model, you can find all your registered models in one place in Azure Machine Learning studio, called the model registry. The model registry helps you organize and keep track of your trained models.
Here, create input variables to specify the input data, split ratio, learning rate and
registered model name. The command script will:
Use the environment created earlier - you can use the @latest notation to indicate
the latest version of the environment when the command is run.
Configure the command line action itself - python main.py in this case. The
inputs/outputs are accessible in the command via the ${{ ... }} notation.
Since a compute resource was not specified, the script will be run on a serverless
compute cluster that is automatically created.
Python
from azure.ai.ml import command, Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="aml-scikit-learn@latest",
    display_name="credit_default_prediction",
)
Python
ml_client.create_or_update(job)
) Important
Wait until the status of the job is complete before returning to this notebook to
continue. The job will take 2 to 3 minutes to run. It could take longer (up to 10
minutes) if the compute cluster has been scaled down to zero nodes and custom
environment is still building.
When you run the cell, the notebook output shows a link to the job's details page in Azure Machine Learning studio. Alternatively, you can also select Jobs on the left navigation menu. A job is
a grouping of many runs from a specified script or piece of code. Information for the run
is stored under that job. The details page gives an overview of the job, the time it took
to run, when it was created, etc. The page also has tabs to other information about the
job such as metrics, Outputs + logs, and code. Listed below are the tabs available in the
job's details page:
Overview: The overview section provides basic information about the job, including
its status, start and end times, and the type of job that was run
Inputs: The input section lists the data and code that were used as inputs for the
job. This section can include datasets, scripts, environment configurations, and
other resources that were used during training.
Outputs + logs: The Outputs + logs tab contains logs generated while the job was
running. This tab assists in troubleshooting if anything goes wrong with your
training script or model creation.
Metrics: The metrics tab showcases key performance metrics from your model such
as training score, f1 score, and precision score.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Learn about deploying a model:
Deploy a model.
This tutorial used an online data file. To learn more about other ways to access data, see
Tutorial: Upload, access and explore your data in Azure Machine Learning.
If you would like to learn more about different ways to train models in Azure Machine
Learning, see What is automated machine learning (AutoML)?. Automated ML is a
supplemental tool to reduce the amount of time a data scientist spends finding a model
that works best with their data.
If you would like more examples similar to this tutorial, see the Samples section of studio. These same samples are available on our GitHub examples page. The examples include complete Python notebooks that you can run to train a model. You can modify and run existing scripts from the samples, which contain scenarios including classification, natural language processing, and anomaly detection.
Deploy a model as an online endpoint
Article • 04/20/2023
Learn to deploy a model to an online endpoint, using Azure Machine Learning Python
SDK v2.
In this tutorial, we use a model trained to predict the likelihood of defaulting on a credit
card payment. The goal is to deploy this model and show its use.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
4. View your VM quota and ensure you have enough quota available to create online
deployments. In this tutorial, you will need at least 8 cores of STANDARD_DS3_v2 and
12 cores of STANDARD_F4s_v2 . To view your VM quota usage and request quota
increases, see Manage resource quotas.
1. On the top bar above your opened notebook, create a compute instance if you don't already have one.
2. If the compute instance is stopped, select Start compute and wait until it is running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2 . If not,
use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
) Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
# authenticate
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
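# Get a handle to the workspace, mirroring the training tutorial
# (a sketch; replace the placeholder values with your own)
from azure.ai.ml import MLClient

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)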
7 Note
Creating MLClient will not connect to the workspace. The client initialization is lazy
and will wait for the first time it needs to make a call (this will happen in the next
code cell).
If you didn't complete the training tutorial, you'll need to register the model. Registering
your model before deployment is a recommended best practice.
In this example, we specify the path (where to upload files from) inline. If you cloned the tutorials folder, then run the following code as-is. Otherwise, download the files and metadata for the model to deploy. Update the path to the location on your local computer where you've unzipped the model's files.
The SDK automatically uploads the files and registers the model.
For more information on registering your model as an asset, see Register your model as
an asset in Machine Learning by using the SDK.
Python
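# A sketch of registering the downloaded model files as an MLflow model.
# The local path below is an assumption; point it at the folder where you
# unzipped the model's files.
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

mlflow_model = Model(
    path="./deploy/credit_defaults_model/",
    type=AssetTypes.MLFLOW_MODEL,
    name="credit_defaults_model",
    description="MLflow model created from local files.",
)
ml_client.models.create_or_update(mlflow_model)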
Alternatively, the code below will retrieve the latest version number for you to use.
Python
registered_model_name = "credit_defaults_model"
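# A sketch: pick the highest version among the model's registered versions
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)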
print(latest_model_version)
Now that you have a registered model, you can create an endpoint and deployment.
The next section will briefly cover some key details about these topics.
An endpoint, in this context, is an HTTPS path that provides an interface for clients to
send requests (input data) to a trained model and receive the inferencing (scoring)
results back from the model. An endpoint provides:
A deployment is a set of resources required for hosting the model that does the actual
inferencing.
A single endpoint can contain multiple deployments. Endpoints and deployments are
independent Azure Resource Manager resources that appear in the Azure portal.
Azure Machine Learning allows you to implement online endpoints for real-time
inferencing on client data, and batch endpoints for inferencing on large volumes of data
over a period of time.
In this tutorial, we'll walk you through the steps of implementing a managed online
endpoint. Managed online endpoints work with powerful CPU and GPU machines in
Azure in a scalable, fully managed way that frees you from the overhead of setting up
and managing the underlying deployment infrastructure.
Python
import uuid
Tip
auth_mode : Use key for key-based authentication. Use aml_token for Azure Machine Learning token-based authentication. A key doesn't expire, but aml_token does expire.
Python
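# A sketch of defining the endpoint with the ManagedOnlineEndpoint class;
# the description and tags below are illustrative.
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={"training_dataset": "credit_defaults"},
)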
Using the MLClient created earlier, we'll now create the endpoint in the workspace. This
command will start the endpoint creation and return a confirmation response while the
endpoint creation continues.
7 Note
Expect the endpoint creation to take approximately 2 minutes.
Python
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Python
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)
model - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
scoring_script - Relative path to the scoring file in the source code directory. This script executes the model on a given input request. For an example of a scoring script, see Understand the scoring script in the "Deploy an ML model with an online endpoint" article.
instance_type - The VM size to use for the deployment. For the list of supported sizes, see Managed online endpoints SKU list.
) Important
If you typically deploy models using scoring scripts and custom environments and
want to achieve the same functionality using MLflow models, we recommend
reading Using MLflow models for no-code deployment.
7 Note
Expect this deployment to take approximately 6 to 8 minutes.
Python
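# A sketch of defining the first deployment, "blue", based on the
# ManagedOnlineDeployment class; the VM size follows this tutorial's prerequisites.
from azure.ai.ml.entities import ManagedOnlineDeployment

model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)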
Using the MLClient created earlier, we'll now create the deployment in the workspace.
This command will start the deployment creation and return a confirmation response
while the deployment creation continues.
Python
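# A sketch of creating the deployment and routing traffic, following this
# tutorial's blue/green naming convention.
blue_deployment = ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# the blue deployment takes 100% of the live traffic to start
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Python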
import os

# create a local directory for the deployment files
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)
Now, create the file in the deploy directory. The cell below uses IPython magic to write
the file into the directory you just created.
Python
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}
Using the MLClient created earlier, we'll get a handle to the endpoint. The endpoint can be invoked using the invoke command with the following parameters:
endpoint_name - Name of the endpoint
request_file - File with request data
deployment_name - Name of the specific deployment to test in an endpoint
Python
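# A sketch: test the blue deployment with the sample data written above
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)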
Python
logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)
print(logs)
Python
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
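# A sketch: define and create a second deployment, "green", reusing the model
# fetched above. The Standard_F4s_v2 size follows this tutorial's quota prerequisites.
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
green_deployment = ml_client.online_deployments.begin_create_or_update(green_deployment).result()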
In the following code, you increase the VM instance count manually. However, it's also possible to autoscale online endpoints. Autoscale automatically provisions the right amount of resources to handle the load on your application. Managed online endpoints support autoscaling through integration with the Azure Monitor autoscale feature. To configure autoscaling, see autoscale online endpoints.
Python
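# A sketch: raise the instance count of the green deployment to scale it out
green_deployment.instance_count = 2
ml_client.online_deployments.begin_create_or_update(green_deployment).result()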
Python
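# A sketch: split live traffic between the two deployments
# (the 80/20 split here is an illustrative choice)
endpoint.traffic = {"blue": 80, "green": 20}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()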
You can test traffic allocation by invoking the endpoint several times:
Python
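# A sketch: send several requests; roughly 20% should be served by the green deployment
for i in range(30):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="./deploy/sample-request.json",
    )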
Python
logs = ml_client.online_deployments.get_logs(
    name="green", endpoint_name=online_endpoint_name, lines=50
)
print(logs)
If you open the metrics for the online endpoint, you can set up the page to see metrics
such as the average request latency as shown in the following figure.
For more information on how to view online endpoint metrics, see Monitor online
endpoints.
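If you're satisfied with the new deployment, you can route all traffic to it. The following is a minimal sketch, assuming the "green" deployment created in the previous steps:
Python
# send 100% of the traffic to the green deployment
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Once the blue deployment no longer receives traffic, you can delete it:
Python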
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name=online_endpoint_name
).result()
Clean up resources
If you aren't going to use the endpoint and deployment after completing this tutorial, you should delete them.
7 Note
Python
ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()
Delete everything
Use these steps to delete your Azure Machine Learning workspace and all compute
resources.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Deploy and score a machine learning model by using an online endpoint.
Test the deployment with mirrored traffic
Monitor online endpoints
Autoscale an online endpoint
Customize MLflow model deployments with scoring script
View costs for an Azure Machine Learning managed online endpoint
Tutorial: Create production machine
learning pipelines
Article • 11/15/2023
7 Note
For a tutorial that uses SDK v1 to build a pipeline, see Tutorial: Build an Azure
Machine Learning pipeline for image classification
The core of a machine learning pipeline is to split a complete machine learning task into
a multistep workflow. Each step is a manageable component that can be developed,
optimized, configured, and automated individually. Steps are connected through well-
defined interfaces. The Azure Machine Learning pipeline service automatically
orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are a standardized MLOps practice, scalable team collaboration, and improved training efficiency and reduced cost. To learn more about the benefits of pipelines, see What
are Azure Machine Learning pipelines.
In this tutorial, you use Azure Machine Learning to create a production ready machine
learning project, using Azure Machine Learning Python SDK v2.
This means you will be able to leverage the Azure Machine Learning Python SDK to:
During this tutorial, you create an Azure Machine Learning pipeline to train a model for
credit default prediction. The pipeline handles two steps:
1. Data preparation
2. Training and registering the trained model
The next image shows a simple pipeline as you'll see it in Azure Machine Learning studio once submitted. The two steps are first data preparation and second training.
Prerequisites
1. To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
2. Sign in to studio and select your workspace if it's not already open.
3. Complete the tutorial Upload, access and explore your data to create the data
asset you need in this tutorial. Make sure you run all the code to create the initial
data asset. Explore the data and revise it if you wish, but you'll only need the initial
data in this tutorial.
1. On the top bar above your opened notebook, create a compute instance if you don't already have one.
2. If the compute instance is stopped, select Start compute and wait until it is running.
3. Make sure that the kernel, found on the top right, is Python 3.10 - SDK v2 . If not,
use the dropdown to select this kernel.
4. If you see a banner that says you need to be authenticated, select Authenticate.
) Important
The rest of this tutorial contains cells of the tutorial notebook. Copy/paste them
into your new notebook, or switch to the notebook now if you cloned it.
In the next cell, enter your Subscription ID, Resource Group name and Workspace name.
To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace
name.
2. Copy the value for workspace, resource group and subscription ID into the code.
3. You'll need to copy one value, close the area and paste, then come back for the
next one.
Python
# authenticate
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

credential = DefaultAzureCredential()

SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
7 Note
Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).
Verify the connection by making a call to ml_client . Since this is the first time that
you're making a call to the workspace, you might be asked to authenticate.
Python
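# A sketch: verify the handle by fetching the workspace details
ws = ml_client.workspaces.get(WS_NAME)
print(ws.location, ":", ws.resource_group)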
Python
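# A sketch: retrieve the data asset registered in the prerequisite tutorial.
# The asset name "credit-card" and version "initial" are assumed from that tutorial.
credit_data = ml_client.data.get(name="credit-card", version="initial")
print(f"Data asset URI: {credit_data.path}")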
In this example, you create a conda environment for your jobs, using a conda yaml file.
First, create a directory to store the file in.
Python
import os
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)
Python
%%writefile {dependencies_dir}/conda.yaml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - xlrd==2.0.1
    - mlflow==2.4.1
    - azureml-mlflow==1.51.0
The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow). The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
Use the yaml file to create and register this custom environment in your workspace:
Python
from azure.ai.ml.entities import Environment

custom_env_name = "aml-scikit-learn"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for Credit Card Defaults pipeline",
    tags={"scikit-learn": "0.24.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    version="0.2.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
Azure Machine Learning pipelines are reusable ML workflows that usually consist of
several components. The typical life of a component is:
Write the yaml specification of the component, or create it programmatically using ComponentMethod.
Optionally, register the component with a name and version in your workspace, to
make it reusable and shareable.
Load that component from the pipeline code.
Implement the pipeline using the component's inputs, outputs and parameters.
Submit the pipeline.
There are two ways to create a component, programmatic and yaml definition. The next
two sections walk you through creating a component both ways. You can either create
the two components trying both options or pick your preferred method.
7 Note
In this tutorial for simplicity we are using the same compute for all components.
However, you can set different computes for each component, for example by
adding a line like train_step.compute = "cpu-cluster" . To view an example of
building a pipeline with different computes for each component, see the Basic
pipeline job section in the cifar-10 pipeline tutorial .
Python
import os
data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)
This script performs the simple task of splitting the data into train and test datasets.
Azure Machine Learning mounts datasets as folders to the computes, therefore, we
created an auxiliary select_first_file function to access the data file inside the
mounted input folder.
MLFlow is used to log the parameters and metrics during our pipeline run.
Python
%%writefile {data_prep_src_dir}/data_prep.py
import os
import argparse
import pandas as pd
from sklearn.model_selection import train_test_split
import logging
import mlflow

def main():
    """Main function of the script."""
    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    credit_df = pd.read_csv(args.data, header=1, index_col=0)

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    credit_train_df, credit_test_df = train_test_split(
        credit_df, test_size=args.test_train_ratio
    )

    # output paths are mounted as folder, therefore we add a filename to the path
    credit_train_df.to_csv(os.path.join(args.train_data, "data.csv"), index=False)
    credit_test_df.to_csv(os.path.join(args.test_data, "data.csv"), index=False)

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
Now that you have a script that can perform the desired task, create an Azure Machine
Learning Component from it.
Use the general purpose CommandComponent that can run command line actions. This
command line action can directly call system commands or run a script. The
inputs/outputs are specified on the command line via the ${{ ... }} notation.
Python
from azure.ai.ml import command, Input, Output

data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    description="reads a .xl input, split the input to train and test",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs=dict(
        train_data=Output(type="uri_folder", mode="rw_mount"),
        test_data=Output(type="uri_folder", mode="rw_mount"),
    ),
    # The source folder of the component
    code=data_prep_src_dir,
    command="""python data_prep.py \
            --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} \
            --train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}} \
            """,
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)
Python
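# A sketch: optionally register the component so it can be shared and reused
data_prep_component = ml_client.create_or_update(data_prep_component.component)

print(
    f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
)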
You used the CommandComponent class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in alongside the code and provides readable history tracking. The programmatic method using CommandComponent can be easier, with built-in class documentation and code completion.
Python
import os
train_src_dir = "./components/train"
os.makedirs(train_src_dir, exist_ok=True)
Python
%%writefile {train_src_dir}/train.py
import argparse
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
import os
import pandas as pd
import mlflow

def select_first_file(path):
    """Selects first file in folder, use under assumption there is only one file in folder
    Args:
        path (str): path to directory or file to choose
    Returns:
        str: full path of selected file
    """
    files = os.listdir(path)
    return os.path.join(path, files[0])

# Start Logging
mlflow.start_run()

# enable autologging
mlflow.sklearn.autolog()

os.makedirs("./outputs", exist_ok=True)

def main():
    """Main function of the script."""
    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    parser.add_argument("--model", type=str, help="path to model file")
    args = parser.parse_args()

    # paths are mounted as folder, therefore, we are selecting the file from folder
    train_df = pd.read_csv(select_first_file(args.train_data))

    # Extracting the label column
    y_train = train_df.pop("default payment next month")
    # convert the dataframe values to array
    X_train = train_df.values

    # paths are mounted as folder, therefore, we are selecting the file from folder
    test_df = pd.read_csv(select_first_file(args.test_data))

    # Extracting the label column
    y_test = test_df.pop("default payment next month")
    # convert the dataframe values to array
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=os.path.join(args.model, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()
As you can see in this training script, once the model is trained, the model file is saved
and registered to the workspace. Now you can use the registered model in inferencing
endpoints.
For the environment of this step, you use one of the built-in (curated) Azure Machine Learning environments. The azureml prefix tells the system to look for the name among curated environments. First, create the yaml file describing the component:
Python
%%writefile {train_src_dir}/train.yml
# <component>
name: train_credit_defaults_model
display_name: Train Credit Defaults Model
# version: 1 # Not specifying a version will automatically update the version
type: command
inputs:
  train_data:
    type: uri_folder
  test_data:
    type: uri_folder
  learning_rate:
    type: number
  registered_model_name:
    type: string
outputs:
  model:
    type: uri_folder
code: .
environment:
  # for this step, we'll use an AzureML curated environment
  azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1
command: >-
  python train.py
  --train_data ${{inputs.train_data}}
  --test_data ${{inputs.test_data}}
  --learning_rate ${{inputs.learning_rate}}
  --registered_model_name ${{inputs.registered_model_name}}
  --model ${{outputs.model}}
# </component>
Now create and register the component. Registering it allows you to re-use it in other
pipelines. Also, anyone else with access to your workspace can use the registered
component.
Python
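# A sketch: load the component from its yaml definition and register it
from azure.ai.ml import load_component

train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
train_component = ml_client.create_or_update(train_component)

print(
    f"Component {train_component.name} with Version {train_component.version} is registered"
)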
The Python functions returned by load_component() work like any regular Python function that we use within a pipeline to call each step.
To code the pipeline, you use a specific @dsl.pipeline decorator that identifies the
Azure Machine Learning pipelines. In the decorator, we can specify the pipeline
description and default resources like compute and storage. Like a Python function,
pipelines can have inputs. You can then create multiple instances of a single pipeline
with different inputs.
Here, we used input data, split ratio and registered model name as input variables. We
then call the components and connect them via their inputs/outputs identifiers. The
outputs of each step can be accessed via the .outputs property.
Python
# the dsl decorator tells the sdk that we are defining an Azure Machine
Learning pipeline
from azure.ai.ml import dsl, Input, Output
@dsl.pipeline(
compute="serverless", # "serverless" value runs pipeline on serverless
compute
description="E2E data_perp-train pipeline",
)
def credit_defaults_pipeline(
pipeline_job_data_input,
pipeline_job_test_train_ratio,
pipeline_job_learning_rate,
pipeline_job_registered_model_name,
):
# using data_prep_function like a python call with its own inputs
data_prep_job = data_prep_component(
data=pipeline_job_data_input,
test_train_ratio=pipeline_job_test_train_ratio,
)
Now use your pipeline definition to instantiate a pipeline with your dataset, split rate of
choice and the name you picked for your model.
Python
registered_model_name = "credit_defaults_model"
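# A sketch of instantiating the pipeline: credit_data is the data asset retrieved
# earlier, and the ratio and learning-rate values below are illustrative choices.
pipeline = credit_defaults_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=credit_data.path),
    pipeline_job_test_train_ratio=0.25,
    pipeline_job_learning_rate=0.05,
    pipeline_job_registered_model_name=registered_model_name,
)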
Here you also pass an experiment name. An experiment is a container for all the
iterations one does on a certain project. All the jobs submitted under the same
experiment name would be listed next to each other in Azure Machine Learning studio.
Once completed, the pipeline registers a model in your workspace as a result of training.
Python
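# A sketch: submit the pipeline job and stream its logs until completion.
# The experiment name below is an illustrative choice.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    # all jobs under the same experiment name are listed together in studio
    experiment_name="e2e_registered_components",
)
ml_client.jobs.stream(pipeline_job.name)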
You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
There are two important results you'll want to see about training:
View your metrics: Select the Metrics tab. This section shows different logged metrics. In this example, mlflow autologging has automatically logged the training metrics.
Clean up resources
If you plan to continue now to other tutorials, skip to Next steps.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Learn how to Schedule machine learning pipeline jobs
Tutorial: Train an object detection model
with AutoML and Python
Article • 11/07/2023
In this tutorial, you learn how to train an object detection model using Azure Machine
Learning automated ML with the Azure Machine Learning CLI extension v2 or the Azure
Machine Learning Python SDK v2. This object detection model identifies whether the
image contains objects, such as a can, carton, milk bottle, or water bottle.
You write code using the Python SDK in this tutorial and learn the following tasks:
Prerequisites
To use Azure Machine Learning, you'll first need a workspace. If you don't have
one, complete Create resources you need to get started to create a workspace and
learn more about using it.
Download and unzip the odFridgeObjects.zip data file. The dataset is annotated
in Pascal VOC format, where each image corresponds to an xml file. Each xml file
contains information on where its corresponding image file is located and also
contains information about the bounding boxes and the object labels. In order to
use this data, you first need to convert it to the required JSONL format as seen in
the Convert the downloaded data to JSONL section of the notebook.
Use a compute instance to follow this tutorial without further installation. (See how
to create a compute instance.) Or install the CLI/SDK to use your own local
environment.
Azure CLI
7 Note
To try serverless compute (preview), skip this step and proceed to Experiment
setup.
You first need to set up a compute target to use for your automated ML model training.
Automated ML models for image tasks require GPU SKUs.
This tutorial uses the NCsv3-series (with V100 GPUs) as this type of compute target uses
multiple GPUs to speed up training. Additionally, you can set up multiple nodes to take
advantage of parallelism when tuning hyperparameters for your model.
The following code creates a GPU compute of size Standard_NC24s_v3 with four nodes.
Azure CLI
yml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: gpu-cluster
type: amlcompute
size: Standard_NC24s_v3
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 120
To create the compute, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml compute create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
Experiment setup
You can use an Experiment to track your model training jobs.
Azure CLI
YAML
experiment_name: dpv2-cli-automl-image-object-detection-experiment
Python
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image as pil_image
import numpy as np
import json
import os
label_to_color_mapping = {}
for gt in ground_truth_boxes:
    label = gt["label"]
    if label in label_to_color_mapping:
        color = label_to_color_mapping[label]
    else:
        # Generate a random color. If you want to use a specific color, you can use something like "red".
        color = np.random.rand(3)
        label_to_color_mapping[label] = color

    # Display label
    ax.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)

plt.show()
Using the above helper functions, for any given image, you can run the following code
to display the bounding boxes.
Python
image_file = "./odFridgeObjects/images/31.jpg"
jsonl_file = "./odFridgeObjects/train_annotations.jsonl"
plot_ground_truth_boxes_jsonl(image_file, jsonl_file)
Azure CLI
yml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder
To upload the images as a data asset, you run the following CLI v2 command with
the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
The next step is to create an MLTable from your data in JSONL format, as shown below. MLTable packages your data into a consumable object for training.
YAML
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
Azure CLI
The following configuration creates training and validation data from the MLTable.
YAML
target_column_name: label
training_data:
  path: data/training-mltable-folder
  type: mltable
validation_data:
  path: data/validation-mltable-folder
  type: mltable
Azure CLI
APPLIES TO: Azure CLI ml extension v2 (current)
yml
resources:
  instance_type: Standard_NC24s_v3
  instance_count: 4
yml
task: image_object_detection
primary_metric: mean_average_precision
compute: azureml:gpu-cluster
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
In your AutoML job, you can perform an automatic hyperparameter sweep in order to
find the optimal model (we call this functionality AutoMode). You only specify the
number of trials; the hyperparameter search space, sampling method and early
termination policy aren't needed. The system will automatically determine the region of
the hyperparameter space to sweep based on the number of trials. A value between 10
and 20 will likely work well on many datasets.
Azure CLI
limits:
  max_trials: 10
  max_concurrent_trials: 2
Azure CLI
To submit your AutoML job, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml job create --file [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
In this example, we'll train an object detection model with yolov5 and fasterrcnn_resnet50_fpn, both of which are pretrained on COCO, a large-scale object detection dataset.
You can perform a hyperparameter sweep over a defined search space to find the
optimal model.
Job limits
You can control the resources spent on your AutoML Image training job by specifying
the timeout_minutes , max_trials and the max_concurrent_trials for the job in limit settings. Refer to the detailed description of the job limits parameters.
Azure CLI
YAML
limits:
  timeout_minutes: 60
  max_trials: 10
  max_concurrent_trials: 2
The following code defines the search space in preparation for the hyperparameter
sweep for each defined architecture, yolov5 and fasterrcnn_resnet50_fpn . In the search
space, specify the range of values for learning_rate , optimizer , lr_scheduler , etc., for
AutoML to choose from as it attempts to generate a model with the optimal primary
metric. If hyperparameter values aren't specified, then default values are used for each
architecture.
For the tuning settings, use random sampling to pick samples from this parameter space by using the random sampling_algorithm. The job limits configured above tell automated ML to try a total of 10 trials with these different samples, running two trials at a time on our compute target, which was set up using four nodes. The more parameters the search space has, the more trials you need to find optimal models.
The Bandit early termination policy is also used. This policy terminates poorly performing trials; that is, trials that aren't within 20% slack of the best performing trial. Terminating these trials early significantly saves compute resources.
Azure CLI
YAML
sweep:
  sampling_algorithm: random
  early_termination:
    type: bandit
    evaluation_interval: 2
    slack_factor: 0.2
    delay_evaluation: 6
YAML
search_space:
  - model_name:
      type: choice
      values: [yolov5]
    learning_rate:
      type: uniform
      min_value: 0.0001
      max_value: 0.01
    model_size:
      type: choice
      values: [small, medium]
  - model_name:
      type: choice
      values: [fasterrcnn_resnet50_fpn]
    learning_rate:
      type: uniform
      min_value: 0.0001
      max_value: 0.001
    optimizer:
      type: choice
      values: [sgd, adam, adamw]
    min_size:
      type: choice
      values: [600, 800]
Once the search space and sweep settings are defined, you can then submit the job to
train an image model using your training dataset.
Azure CLI
To submit your AutoML job, you run the following CLI v2 command with the path to your .yml file, workspace name, resource group and subscription ID.
Azure CLI
az ml job create --file [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
When doing a hyperparameter sweep, it can be useful to visualize the different trials that were tried using the HyperDrive UI. You can navigate to this UI by going to the 'Child jobs' tab in the UI of the main automl_image_job from above, which is the HyperDrive parent job. Then you can go into its 'Child jobs' tab. Alternatively, you can open the HyperDrive parent job directly and navigate to its 'Child jobs' tab.
After you register the model you want to use, you can deploy it using a managed online endpoint.
Azure CLI
YAML
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: od-fridge-items-endpoint
auth_mode: key
Azure CLI
Azure CLI
We can also create a batch endpoint for batch inferencing on large volumes of data over
a period of time. Check out the object detection batch scoring notebook for batch
inferencing using the batch endpoint.
Configure online deployment
A deployment is a set of resources required for hosting the model that does the actual
inferencing. We create a deployment for our endpoint using the
ManagedOnlineDeployment class. You can use either GPU or CPU VM SKUs for your
deployment cluster.
Azure CLI
YAML
name: od-fridge-items-mlflow-deploy
endpoint_name: od-fridge-items-endpoint
model: azureml:od-fridge-items-mlflow-model@latest
instance_type: Standard_DS3_v2
instance_count: 1
liveness_probe:
  failure_threshold: 30
  success_threshold: 1
  timeout: 2
  period: 10
  initial_delay: 2000
readiness_probe:
  failure_threshold: 10
  success_threshold: 1
  timeout: 10
  period: 10
  initial_delay: 2000
Azure CLI
Update traffic:
By default, the current deployment is set to receive 0% traffic. You can set the traffic percentage that the current deployment should receive. The sum of the traffic percentages of all deployments with one endpoint shouldn't exceed 100%.
Azure CLI
Azure CLI
YAML
Visualize detections
Now that you have scored a test image, you can visualize the bounding boxes for this
image. To do so, be sure you have matplotlib installed.
Azure CLI
Clean up resources
Don't complete this section if you plan on running other Azure Machine Learning
tutorials.
If you don't plan to use the resources you created, delete them, so you don't incur any
charges.
You can also keep the resource group but delete a single workspace. Display the
workspace properties and select Delete.
Next steps
In this automated machine learning tutorial, you did the following tasks:
Learn how to set up AutoML to train computer vision models with Python.
Code examples:
Azure CLI
APPLIES TO: Azure CLI ml extension v2 (current)
Review detailed code examples and use cases in the azureml-examples
repository for automated machine learning samples . Check the folders
with 'cli-automl-image-' prefix for samples specific to building computer
vision models.
7 Note
The fridge objects dataset is available under the MIT License.
Tutorial: Train a classification model with
no-code AutoML in the Azure Machine
Learning studio
Article • 08/09/2023
Learn how to train a classification model with no-code AutoML using Azure Machine
Learning automated ML in the Azure Machine Learning studio. This classification model
predicts if a client will subscribe to a fixed term deposit with a financial institution.
With automated ML, you can automate away time-intensive tasks. Automated machine
learning rapidly iterates over many combinations of algorithms and hyperparameters to
help you find the best model based on a success metric of your choosing.
You won't write any code in this tutorial; you'll use the studio interface to perform training. You'll learn how to do the following tasks:
Also try automated machine learning for these other model types:
For a no-code example of forecasting, see Tutorial: Demand forecasting & AutoML.
For a code-first example of an object detection model, see the Tutorial: Train an object detection model with AutoML and Python.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account .
Create a workspace
An Azure Machine Learning workspace is a foundational resource in the cloud that you
use to experiment, train, and deploy machine learning models. It ties your Azure
subscription and resource group to an easily consumed object in the service.
In this tutorial, complete the following steps to create a workspace and continue the tutorial.
Workspace name: Enter a unique name that identifies your workspace. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others. The workspace name is case-insensitive.
Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. You need contributor or owner role to use an existing resource group. For more information about access, see Manage access to an Azure Machine Learning workspace.
Region: Select the Azure region closest to your users and the data resources to create your workspace.
For more information on Azure resources, refer to the steps in this article: Create resources you need to get started.
For other ways to create a workspace in Azure, Manage Azure Machine Learning
workspaces in the portal or with the Python SDK (v2).
1. Create a new data asset by selecting From local files from the +Create data asset
drop-down.
a. On the Basic info form, give your data asset a name and provide an optional
description. The automated ML interface currently only supports
TabularDatasets, so the dataset type should default to Tabular.
c. On the Datastore and file selection form, select the default datastore that was
automatically set up during your workspace creation, workspaceblobstore
(Azure Blob Storage). This is where you'll upload your data file to make it
available to your workspace.
f. Select Next on the bottom left, to upload it to the default container that was
automatically set up during your workspace creation.
When the upload is complete, the Settings and preview form is pre-populated
based on the file type.
g. Verify that your data is properly formatted via the Schema form. The data
should be populated as follows. After you verify that the data is accurate, select
Next.
File format: Defines the layout and type of data stored in a file. Value for tutorial: Delimited
Column headers: Indicates how the headers of the dataset, if any, will be treated. Value for tutorial: All files have same headers
Skip rows: Indicates how many, if any, rows are skipped in the dataset. Value for tutorial: None
h. The Schema form allows for further configuration of your data for this
experiment. For this example, select the toggle switch for the day_of_week, so
as to not include it. Select Next.
i. On the Confirm details form, verify the information matches what was
previously populated on the Basic info, Datastore and file selection and
Settings and preview forms.
l. Review the data by selecting the data asset and looking at the preview tab that
populates to ensure you didn't include day_of_week then, select Close.
m. Select Next.
Configure job
After you load and configure your data, you can set up your experiment. This setup
includes experiment design tasks such as, selecting the size of your compute
environment and specifying what column you want to predict.
b. Select y as the target column, what you want to predict. This column indicates
whether the client subscribed to a term deposit or not.
c. Select compute cluster as your compute type.
Virtual machine type: Select the virtual machine type for your compute. Value for tutorial: CPU (Central Processing Unit)
Virtual machine size: Select the virtual machine size for your compute. A list of recommended sizes is provided based on your data and experiment type. Value for tutorial: Standard_DS12_V2
Min / Max nodes: To profile data, you must specify one or more nodes. Value for tutorial: Min nodes: 1, Max nodes: 6
Idle seconds before scale down: Idle time before the cluster is automatically scaled down to the minimum node count. Value for tutorial: 120 (default)
iv. After creation, select your new compute target from the drop-down list.
e. Select Next.
3. On the Select task and settings form, complete the setup for your automated ML
experiment by specifying the machine learning task type and configuration
settings.
Additional classification settings: These settings help improve the accuracy of your model. Value for tutorial: Positive class label: None
Select Save.
c. Select Next.
5. Select Finish to run the experiment. The Job Detail screen opens with the Job
status at the top as the experiment preparation begins. This status updates as the
experiment progresses. Notifications also appear in the top right corner of the
studio to inform you of the status of your experiment.
) Important
Experiment preparation takes 10 to 15 minutes. Once the experiment is running, each iteration takes 2 to 3 minutes more.
In production, you'd likely walk away for a bit. But for this tutorial, we suggest you
start exploring the tested algorithms on the Models tab as they complete while the
others are still running.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the
models are ordered by metric score as they complete. For this tutorial, the model that
scores the highest based on the chosen AUC_weighted metric is at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of
a completed model to explore its performance details.
The following navigates through the Details and the Metrics tabs to view the selected
model's properties, metrics, and performance charts.
Model explanations
While you wait for the models to complete, you can also take a look at model
explanations and see which data features (raw or engineered) influenced a particular
model's predictions.
These model explanations can be generated on demand, and are summarized in the
model explanations dashboard that's part of the Explanations (preview) tab.
4. Select the Explain model button at the top. On the right, the Explain model pane
appears.
5. Select the automl-compute that you created previously. This compute cluster
initiates a child job to generate the model explanations.
6. Select Create at the bottom. A green success message appears towards the top of
your screen.
7 Note
7. Select the Explanations (preview) button. This tab populates once the
explainability run completes.
8. On the left hand side, expand the pane and select the row that says raw under
Features.
9. Select the Aggregate feature importance tab on the right. This chart shows which
data features influenced the predictions of the selected model.
In this example, the duration appears to have the most influence on the predictions
of this model.
Deploy the best model
The automated machine learning interface allows you to deploy the best model as a
web service in a few steps. Deployment is the integration of the model so it can predict
on new data and identify potential areas of opportunity.
For this experiment, deployment to a web service means that the financial institution
now has an iterative and scalable web solution for identifying potential fixed term
deposit customers.
Check to see if your experiment run is complete. To do so, navigate back to the parent
job page by selecting Job 1 at the top of your screen. A Completed status is shown on
the top left of the screen.
Once the experiment run is complete, the Details page is populated with a Best model
summary section. In this experiment context, VotingEnsemble is considered the best
model, based on the AUC_weighted metric.
We deploy this model, but be advised, deployment takes about 20 minutes to complete.
The deployment process entails several steps including registering the model,
generating resources, and configuring them for the web service.
2. Select the Deploy menu in the top-left and select Deploy to web service.
Enable authentication: Disable.
Use custom deployments: Disable. Allows for the default driver file (scoring script) and environment file to be auto-generated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy.
A green success message appears at the top of the Job screen, and in the Model
summary pane, a status message appears under Deploy status. Select Refresh
periodically to check the deployment status.
Proceed to the Next steps to learn more about how to consume your new web service, and test your predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store.
Delete only the deployment files to minimize costs to your account, or if you want to
keep your workspace and experiment files. Otherwise, delete the entire resource group,
if you don't plan to use any of the files.
3. Select Proceed.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
In this automated machine learning tutorial, you used Azure Machine Learning's
automated ML interface to create and deploy a classification model. See these articles
for more information and next steps:
7 Note
This Bank Marketing dataset is made available under the Creative Commons (CCO:
Public Domain) License . Any rights in individual contents of the database are
licensed under the Database Contents License and available on Kaggle . This
dataset was originally available within the UCI Machine Learning Database .
Learn how to create a time-series forecasting model without writing a single line of code
using automated machine learning in the Azure Machine Learning studio. This model
predicts rental demand for a bike sharing service.
You don't write any code in this tutorial, you use the studio interface to perform training.
You learn how to do the following tasks:
Also try automated machine learning for these other model types:
Prerequisites
An Azure Machine Learning workspace. See Create workspace resources.
1. On the Select dataset form, select From local files from the +Create dataset drop-
down.
a. On the Basic info form, give your dataset a name and provide an optional
description. The dataset type should default to Tabular, since automated ML in
Azure Machine Learning studio currently only supports tabular datasets.
c. On the Datastore and file selection form, select the default datastore that was
automatically set up during your workspace creation, workspaceblobstore
(Azure Blob Storage). This is the storage location where you upload your data
file.
e. Choose the bike-no.csv file on your local computer. This is the file you downloaded as a prerequisite.
f. Select Next
When the upload is complete, the Settings and preview form is pre-populated
based on the file type.
g. Verify that the Settings and preview form is populated as follows and select
Next.
File format: Defines the layout and type of data stored in a file. Value for tutorial: Delimited
Column headers: Indicates how the headers of the dataset, if any, will be treated. Value for tutorial: Only first file has headers
Skip rows: Indicates how many, if any, rows are skipped in the dataset. Value for tutorial: None
h. The Schema form allows for further configuration of your data for this
experiment.
i. For this example, choose to ignore the casual and registered columns. These columns are a breakdown of the cnt column, so we don't include them.
ii. Also for this example, leave the defaults for the Properties and Type.
i. On the Confirm details form, verify the information matches what was
previously populated on the Basic info and Settings and preview forms.
l. Select Next.
Configure job
After you load and configure your data, set up your remote compute target and select
which column in your data you want to predict.
b. Select cnt as the target column, what you want to predict. This column indicates
the number of total bike share rentals.
c. Select compute cluster as your compute type.
Virtual machine type: Select the virtual machine type for your compute. Value for tutorial: CPU (Central Processing Unit)
Virtual machine size: Select the virtual machine size for your compute. A list of recommended sizes is provided based on your data and experiment type. Value for tutorial: Standard_DS12_V2
Min / Max nodes: To profile data, you must specify one or more nodes. Value for tutorial: Min nodes: 1, Max nodes: 6
Idle seconds before scale down: Idle time before the cluster is automatically scaled down to the minimum node count. Value for tutorial: 120 (default)
iv. After creation, select your new compute target from the drop-down list.
e. Select Next.
1. On the Task type and settings form, select Time series forecasting as the machine
learning task type.
2. Select date as your Time column and leave Time series identifiers blank.
3. The Frequency is how often your historic data is collected. Keep Autodetect
selected.
4. The forecast horizon is the length of time into the future you want to predict.
Deselect Autodetect and type 14 in the field.
5. Select View additional configuration settings and populate the fields as follows.
These settings are to better control the training job and specify settings for your
forecast. Otherwise, defaults are applied based on experiment selection and data.
Additional configurations | Description | Value for tutorial
Exit criterion | If a criterion is met, the training job is stopped. | Training job time (hours): 3; Metric score threshold: None
Select Save.
7. Select Next.
Run experiment
To run your experiment, select Finish. The Job details screen opens with the Job status
at the top next to the job number. This status updates as the experiment progresses.
Notifications also appear in the top right corner of the studio, to inform you of the
status of your experiment.
) Important
It takes 10-15 minutes to prepare the experiment job. Once running, each iteration takes 2-3 minutes more.
In production, you'd likely walk away for a bit as this process takes time. While you
wait, we suggest you start exploring the tested algorithms on the Models tab as
they complete.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the
models are ordered by metric score as they complete. For this tutorial, the model that
scores the highest based on the chosen Normalized root mean squared error metric is
at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of
a completed model to explore its performance details.
In the following example, you select a model from the list of models that the job created. Then, you use the Overview and the Metrics tabs to view the selected model's properties, metrics, and performance charts.
For this experiment, deployment to a web service means that the bike share company
now has an iterative and scalable web solution for forecasting bike share rental demand.
Once the job is complete, navigate back to the parent job page by selecting Job 1 at the top of your screen.
In the Best model summary section, the best model in the context of this experiment is selected based on the Normalized root mean squared error metric.
We deploy this model, but be advised, deployment takes about 20 minutes to complete.
The deployment process entails several steps including registering the model,
generating resources, and configuring them for the web service.
2. Select the Deploy button located in the top-left area of the screen.
Field | Value for tutorial
Use custom deployment assets | Disable. Disabling allows the default driver file (scoring script) and environment file to be autogenerated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy.
A green success message appears at the top of the Job screen stating that the
deployment was started successfully. The progress of the deployment can be
found in the Model summary pane under Deploy status.
Proceed to the Next steps to learn more about how to consume your new web service, and test your predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your workspace and experiment files, delete only the deployment files to minimize costs to your account. Otherwise, delete the entire resource group if you don't plan to use any of the files.
) Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
2. From the list, select the resource group that you created.
Next steps
In this tutorial, you used automated ML in the Azure Machine Learning studio to create
and deploy a time series forecasting model that predicts bike share rental demand.
See this article for steps on how to create a Power BI supported schema to facilitate
consumption of your newly deployed web service:
7 Note
This bike share dataset has been modified for this tutorial. This dataset was made
available as part of a Kaggle competition and was originally available via Capital
Bikeshare . It can also be found within the UCI Machine Learning Database .
Source: Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble
detectors and background knowledge, Progress in Artificial Intelligence (2013): pp.
1-15, Springer Berlin Heidelberg.
Tutorial: Train an image classification
TensorFlow model using the Azure
Machine Learning Visual Studio Code
Extension (preview)
Article • 11/15/2023
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning . If you're using the free subscription, only CPU clusters
are supported.
Install Visual Studio Code , a lightweight, cross-platform code editor.
Azure Machine Learning Visual Studio Code extension. For installation instructions, see the Setup Azure Machine Learning Visual Studio Code extension guide.
CLI (v2). For installation instructions, see Install, set up, and use the CLI (v2)
Clone the community driven repository
Bash
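# The repository URL isn't shown in this extract. A minimal sketch, assuming the
# community-driven repository is Azure/azureml-examples and that the TensorFlow
# MNIST job lives under cli/jobs/single-step/tensorflow/mnist (verify the path
# in your clone before use):
git clone https://github.com/Azure/azureml-examples.git
cd azureml-examples/cli/jobs/single-step/tensorflow/mnist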
Create a workspace
The first thing you have to do to build an application in Azure Machine Learning is to
create a workspace. A workspace contains the resources to train models as well as the
trained models themselves. For more information, see what is a workspace.
2. On the Visual Studio Code activity bar, select the Azure icon to open the Azure
Machine Learning view.
3. In the Azure Machine Learning view, right-click your subscription node and select
Create Workspace.
4. A specification file appears. Configure the specification file with the following
options.
yml
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: TeamWorkspace
location: WestUS2
display_name: team-ml-workspace
description: A workspace for training machine learning models
tags:
purpose: training
team: ml-team
5. Right-click the specification file and select AzureML: Execute YAML. Creating a
resource uses the configuration options defined in the YAML specification file and
submits a job using the CLI (v2). At this point, a request to Azure is made to create
a new workspace and dependent resources in your account. After a few minutes,
the new workspace appears in your subscription node.
6. Set TeamWorkspace as your default workspace. Doing so places resources and jobs
you create in the workspace by default. Select the Set Azure Machine Learning
Workspace button on the Visual Studio Code status bar and follow the prompts to
set TeamWorkspace as your default workspace.
Like workspaces and compute targets, training jobs are defined using resource templates. For this sample, the specification is defined in the job.yml file, which looks like the following:
yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >
python train.py
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:48
resources:
instance_type: Standard_NC12
instance_count: 3
experiment_name: tensorflow-mnist-example
description: Train a basic neural network with TensorFlow on the MNIST dataset.
At this point, a request is sent to Azure to run your experiment on the selected compute
target in your workspace. This process takes several minutes. The amount of time to run
the training job is impacted by several factors like the compute type and training data
size. To track the progress of your experiment, right-click the current run node and
select View Job in Azure portal.
When the dialog requesting to open an external website appears, select Open.
When the model is done training, the status label next to the run node updates to
"Completed".
Next steps
In this tutorial, you learned the following tasks:
Launch Visual Studio Code integrated with Azure Machine Learning (preview)
For a walkthrough of how to edit, run, and debug code locally, see the Python
hello-world tutorial .
Run Jupyter Notebooks in Visual Studio Code using a remote Jupyter server.
For a walkthrough of how to train with Azure Machine Learning outside of Visual
Studio Code, see Tutorial: Train and deploy a model with Azure Machine Learning.
Tutorial 1: Develop and register a feature
set with managed feature store
Article • 11/28/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
You can use Azure Machine Learning managed feature store to discover, create, and
operationalize features. The machine learning lifecycle includes a prototyping phase,
where you experiment with various features. It also involves an operationalization phase,
where models are deployed and inference steps look up feature data. Features serve as
the connective tissue in the machine learning lifecycle. To learn more about basic
concepts for managed feature store, see What is managed feature store? and
Understanding top-level entities in managed feature store.
This tutorial describes how to create a feature set specification with custom
transformations. It then uses that feature set to generate training data, enable
materialization, and perform a backfill. Materialization computes the feature values for a
feature window, and then stores those values in a materialization store. All feature
queries can then use those values from the materialization store.
Without materialization, a feature set query applies the transformations to the source on
the fly, to compute the features before it returns the values. This process works well for
the prototyping phase. However, for training and inference operations in a production
environment, we recommend that you materialize the features, for greater reliability and
availability.
This tutorial is the first part of the managed feature store tutorial series. Here, you learn
how to:
The SDK-only track uses only Python SDKs. Choose this track for pure, Python-
based development and deployment.
The SDK and CLI track uses the Python SDK for feature set development and
testing only, and it uses the CLI for CRUD (create, read, update, and delete)
operations. This track is useful in continuous integration and continuous delivery
(CI/CD) or GitOps scenarios, where CLI/YAML is preferred.
Prerequisites
Before you proceed with this tutorial, be sure to cover these prerequisites:
On your user account, the Owner role for the resource group where the feature
store is created.
If you choose to use a new resource group for this tutorial, you can easily delete all
the resources by deleting the resource group.
1. In the Azure Machine Learning studio environment, select Notebooks on the left
pane, and then select the Samples tab.
2. Browse to the featurestore_sample directory (select Samples > SDK v2 > sdk >
python > featurestore_sample), and then select Clone.
3. The Select target directory panel opens. Select the Users directory, then select
your user name, and finally select Clone.
4. To configure the notebook environment, you must upload the conda.yml file:
a. Select Notebooks on the left pane, and then select the Files tab.
b. Browse to the env directory (select Users > your_user_name >
featurestore_sample > project > env), and then select the conda.yml file.
c. Select Download.
5. In the Azure Machine Learning environment, open the notebook, and then select
Configure session.
8. Select Apply.
# Run this cell to start the Spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
7 Note
You use a feature store to reuse features across projects. You use a project
workspace (an Azure Machine Learning workspace) to train inference models, by
taking advantage of features from feature stores. Many project workspaces can
share and reuse the same feature store.
SDK track
You use the same MLClient (package name azure-ai-ml ) SDK that you use
with the Azure Machine Learning workspace. A feature store is implemented
as a type of workspace. As a result, this SDK is used for CRUD operations for
feature stores, feature sets, and feature store entities.
This tutorial doesn't require explicit installation of those SDKs, because the earlier
conda.yml instructions cover this step.
Python
featurestore_name = "<FEATURESTORE_NAME>"
featurestore_location = "eastus"
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
SDK track
Python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStore
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

ml_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
)

fs = FeatureStore(name=featurestore_name, location=featurestore_location)

# Wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())
3. Initialize a feature store core SDK client for Azure Machine Learning.
As explained earlier in this tutorial, the feature store core SDK client is used to
develop and consume features.
Python
from azureml.featurestore import FeatureStoreClient

featurestore = FeatureStoreClient(
    credential=AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
    name=featurestore_name,
)
4. Grant the "Azure Machine Learning Data Scientist" role on the feature store to your
user identity. Obtain your Microsoft Entra object ID value from the Azure portal, as
described in Find the user object ID.
Assign the AzureML Data Scientist role to your user identity, so that it can create resources in the feature store workspace. The permissions might need some time to propagate.
For more information about access control, see Manage access control for managed feature store.
Python
your_aad_objectid = "<USER_AAD_OBJECTID>"
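# The role-assignment cell itself isn't shown in this extract. A minimal sketch
# with the Azure authorization SDK; the role definition ID below is assumed to
# be the built-in "AzureML Data Scientist" role - verify it in your tenant.
import uuid

from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), featurestore_subscription_id
)
scope = (
    f"/subscriptions/{featurestore_subscription_id}"
    f"/resourceGroups/{featurestore_resource_group_name}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}"
)
role_definition_id = (
    f"/subscriptions/{featurestore_subscription_id}/providers"
    "/Microsoft.Authorization/roleDefinitions/f6c7c914-8db3-469d-8ca1-694a8f32e121"
)
auth_client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # each role assignment needs a unique GUID name
    RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id=your_aad_objectid,
        principal_type="User",
    ),
)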
This notebook uses sample data hosted in a publicly accessible blob container. It
can be read into Spark only through a wasbs driver. When you create feature sets
by using your own source data, host them in an Azure Data Lake Storage Gen2
account, and use an abfss driver in the data path.
Python
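# The data-exploration cell isn't shown in this extract. A minimal sketch that
# loads the sample source data from the public path used later in this tutorial:
transactions_source_data_path = "wasbs://[email protected]/feature-store-prp/datasources/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)
display(transactions_src_df.head(5))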
To learn more about the feature set and transformations, see What is managed
feature store?.
Python
transactions_featureset_code_path = (
    root_dir + "/featurestore/featuresets/transactions/transformation_code"
)

transactions_featureset_spec = create_feature_set_spec(
    source=ParquetFeatureSource(
        path="wasbs://[email protected]/feature-store-prp/datasources/transactions-source/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
    ),
    feature_transformation=TransformationCode(
        path=transactions_featureset_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)
To register the feature set specification with the feature store, you must save that
specification in a specific format.
Review the generated transactions feature set specification. Open this file from the file tree to see the specification: featurestore/featuresets/transactions/spec/FeaturesetSpec.yaml.
The specification contains these elements:
feature_transformation: If you provide transformation code, the code must return a DataFrame that maps to the features and datatypes.
index_columns: The join keys required to access values from the feature set.
Persisting the feature set specification offers another benefit: the feature set
specification can be source controlled.
Python
import os

# Folder for the dumped specification; this path matches the spec file
# referenced above.
transactions_featureset_spec_folder = root_dir + "/featurestore/featuresets/transactions/spec"

transactions_featureset_spec.dump(transactions_featureset_spec_folder, overwrite=False)
SDK track
Python
Create an account entity that has the join key accountID of type string .
Python
from azure.ai.ml.entities import DataColumn, DataColumnType, FeatureStoreEntity

account_entity_config = FeatureStoreEntity(
    name="account",
    version="1",
    index_columns=[DataColumn(name="accountID", type=DataColumnType.STRING)],
    stage="Development",
    description="This entity represents user account index key accountID.",
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_store_entities.begin_create_or_update(account_entity_config)
print(poller.result())
SDK track
Python
from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset_config = FeatureSet(
    name="transactions",
    version="1",
    description="7-day and 3-day rolling aggregation of transactions featureset",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=transactions_featureset_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(transaction_fset_config)
print(poller.result())
SDK track
1. Obtain your Microsoft Entra object ID value from the Azure portal, as
described in Find the user object ID.
2. Obtain information about the offline materialization store from the Feature
Store Overview page in the Feature Store UI. You can find the values for the
storage account subscription ID, storage account resource group name, and
storage account name for offline materialization store in the Offline
materialization store card.
For more information about access control, see Manage access control for
managed feature store.
Execute this code cell for role assignment. The permissions might need some
time to propagate.
Python
your_aad_objectid = "<USER_AAD_OBJECTID>"
storage_subscription_id = "<SUBSCRIPTION_ID>"
storage_resource_group_name = "<RESOURCE_GROUP>"
storage_account_name = "<STORAGE_ACCOUNT_NAME>"
grant_user_aad_storage_data_reader_role(
AzureMLOnBehalfOfCredential(),
your_aad_objectid,
storage_subscription_id,
storage_resource_group_name,
storage_account_name,
)
Generate a training data DataFrame by using
the registered feature set
1. Load observation data.
Observation data typically involves the core data used for training and inferencing.
This data joins with the feature data to create the full training data resource.
Observation data is data captured during the event itself. Here, it has core
transaction data, including transaction ID, account ID, and transaction amount
values. Because you use it for training, it also has an appended target variable
(is_fraud).
Python
observation_data_path = "wasbs://[email protected]/feature-store-prp/observation_data/train/*.parquet"
observation_data_df = spark.read.parquet(observation_data_path)
obs_data_timestamp_column = "timestamp"

display(observation_data_df)
# Note: The timestamp column is displayed in a different format. Optionally, you can call observation_data_df.show() to see a correctly formatted value.
Python
Python
3. Select the features that become part of the training data. Then, use the feature
store SDK to generate the training data itself.
Python
from azureml.featurestore import get_offline_features

# The cell that defines the initial `features` list and `more_features` isn't
# shown here; it selects features from the registered feature sets.
more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)
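The cell that actually generates the training data isn't shown in this extract. A minimal sketch, assuming the observation DataFrame and timestamp column from step 1, and the get_offline_features() point-in-time join that this tutorial describes later:
Python

from azureml.featurestore import get_offline_features

# Point-in-time join of the observation data with the selected features.
training_df = get_offline_features(
    features=features,
    observation_data=observation_data_df,
    timestamp_column=obs_data_timestamp_column,
)
display(training_df)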
SDK track
Set spark.sql.shuffle.partitions in the YAML file according to the feature data size.
7 Note
The sample data used in this notebook is small. Therefore, this parameter is set
to 1 in the featureset_asset_offline_enabled.yaml file.
Python
from azure.ai.ml.entities import MaterializationComputeResource, MaterializationSettings

transactions_fset_config = fs_client._featuresets.get(name="transactions", version="1")

transactions_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
        "spark.sql.shuffle.partitions": 1,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
You can also save the feature set asset as a YAML resource.
SDK track
Python
## uncomment to run
transactions_fset_config.dump(
root_dir
+
"/featurestore/featuresets/transactions/featureset_asset_offline_enabled
.yaml"
)
7 Note
You might need to determine a backfill data window value. The window must
match the window of your training data. For example, to use 18 months of data for
training, you must retrieve features for 18 months. This means you should backfill
for an 18-month window.
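As a sketch, here's how you could compute an 18-month backfill window with the standard library (the exact boundaries are an assumption; match them to your own training data window):
Python

from datetime import datetime, timedelta

# An 18-month window ending now, expressed approximately in days.
et = datetime.now()
st = et - timedelta(days=548)  # roughly 18 months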
SDK track
This code cell materializes data that currently has a status of None or Incomplete for the defined feature window.
Python
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version="1",
feature_window_start_time=st,
feature_window_end_time=et,
data_status=[DataAvailabilityStatus.NONE],
)
print(poller.result().job_ids)
Python
Tip
Print sample data from the feature set. The output information shows that the data was retrieved from the materialization store. The get_offline_features() method retrieves the training and inference data, and it uses the materialization store by default.
Python
# Look up the feature set by providing a name and a version, and display a few records.
transactions_featureset = featurestore.feature_sets.get("transactions", "1")
display(transactions_featureset.to_spark_dataframe().head(5))
3. From the list of accessible feature stores, select the feature store for which you
performed backfill.
The data can have a maximum of 2,000 data intervals. If your data contains more than 2,000 data intervals, create a new feature set version.
You can provide a list of more than one data status value (for example, ["None", "Incomplete"]) in a single backfill job.
During backfill, a new materialization job is submitted for each data interval that
falls within the defined feature window.
If a materialization job is pending, or that job is running for a data interval that
hasn't yet been backfilled, a new job isn't submitted for that data interval.
7 Note
SDK track
Python
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version=version,
job_id="<JOB_ID_OF_FAILED_MATERIALIZATION_JOB>",
)
print(poller.result().job_ids)
This tutorial built the training data with features from the feature store, enabled
materialization to offline feature store, and performed a backfill. Next, you'll run model
training using these features.
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
See the next tutorial in the series: Experiment and train models by using features.
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 2: Experiment and train models
by using features
Article • 11/15/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
The first tutorial showed how to create a feature set specification with custom
transformations, and then use that feature set to generate training data, enable
materialization, and perform a backfill. This tutorial shows how to enable materialization,
and perform a backfill. It also shows how to experiment with features, as a way to
improve model performance.
Prerequisites
Before you proceed with this tutorial, be sure to complete the first tutorial in the series.
Set up
1. Configure the Azure Machine Learning Spark notebook.
You can create a new notebook and execute the instructions in this tutorial step by
step. You can also open and run the existing notebook named 2. Experiment and
train models using features.ipynb from the featurestore_sample/notebooks directory.
You can choose sdk_only or sdk_and_cli. Keep this tutorial open and refer to it for
documentation links and more explanation.
a. On the top menu, in the Compute dropdown list, select Serverless Spark
Compute under Azure Machine Learning Serverless Spark.
Python
# run this cell to start the spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Python SDK
Not applicable.
This is the current workspace, and the tutorial notebook runs in this resource.
Python
### Initialize the MLClient of this project workspace
import os
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]

# Connect to the project workspace
ws_client = MLClient(
    AzureMLOnBehalfOfCredential(), project_ws_sub_id, project_ws_rg, project_ws_name
)
Python
# feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
Python
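# The cell isn't shown in this extract. A likely sketch that creates the feature
# store CRUD client used later in this tutorial, mirroring the project
# workspace client above.
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)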
You need this compute cluster when you run the training/batch inference jobs.
Python
from azure.ai.ml.entities import AmlCompute

cluster_basic = AmlCompute(
    name="cpu-cluster-fs",
    type="amlcompute",
    size="STANDARD_F4S_V2",  # you can replace it with other supported VM SKUs
    location=ws_client.workspaces.get(ws_client.workspace_name).location,
    min_instances=0,
    max_instances=1,
    idle_time_before_scale_down=360,
)
ws_client.begin_create_or_update(cluster_basic).result()
To onboard precomputed features, you can create a feature set specification without
writing any transformation code. You use a feature set specification to develop and test
a feature set in a fully local development environment.
You don't need to connect to a feature store. In this procedure, you create the feature
set specification locally, and then sample the values from it. For capabilities of managed
feature store, you must use a feature asset definition to register the feature set
specification with a feature store. Later steps in this tutorial provide more details.
Python
accounts_data_path = "wasbs://[email protected]/feature-store-prp/datasources/accounts-precalculated/*.parquet"
accounts_df = spark.read.parquet(accounts_data_path)

display(accounts_df.head(5))
2. Create the accounts feature set specification locally, from these precomputed
features.
You don't need any transformation code here, because you reference
precomputed features.
Python
accounts_featureset_spec = create_feature_set_spec(
    source=ParquetFeatureSource(
        path="wasbs://[email protected]/feature-store-prp/datasources/accounts-precalculated/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    # account profiles in the source are updated once a year. Set temporal_join_lookback to 365 days.
    temporal_join_lookback=DateTimeOffset(days=365, hours=0, minutes=0),
    infer_schema=True,
)
To register the feature set specification with the feature store, you must save the
feature set specification in a specific format.
After you run the next cell, inspect the generated accounts feature set
specification. To see the specification, open the
featurestore/featuresets/accounts/spec/FeatureSetSpec.yaml file from the file tree.
feature_transformation: If you provide transformation code, the code must return a DataFrame that maps to the features and datatypes. Without the provided transformation code, the system builds the query to map the features and datatypes to the source. In this case, the generated accounts feature set specification doesn't contain transformation code, because features are precomputed.
index_columns : The join keys required to access values from the feature set.
To learn more, see Understanding top-level entities in managed feature store and
the CLI (v2) feature set specification YAML schema.
You don't need any transformation code here, because you reference
precomputed features.
Python
import os
Python
This step generates training data for illustrative purposes. As an option, you can
locally train models here. Later steps in this tutorial explain how to train a model in
the cloud.
Python
After you locally experiment with feature definitions, and they seem reasonable,
you can register a feature set asset definition with the feature store.
Python
accounts_fset_config = FeatureSet(
    name="accounts",
    version="1",
    description="accounts featureset",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=accounts_featureset_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(accounts_fset_config)
print(poller.result())
Python
The first tutorial covered this step, when you registered the transactions feature
set. Because you also have an accounts feature set, you can browse through the
available features:
a. Go to the Azure Machine Learning global landing page .
b. On the left pane, select Feature stores.
c. In the list of feature stores, select the feature store that you created earlier.
The UI shows the feature sets and entity that you created. Select the feature sets to
browse through the feature definitions. You can use the global search box to
search for feature sets across feature stores.
Python
3. Select features for the model, and export the model as a feature retrieval
specification.
In the previous steps, you selected features from a combination of registered and
unregistered feature sets, for local experimentation and testing. You can now
experiment in the cloud. Your model-shipping agility increases if you save the
selected features as a feature retrieval specification, and then use the specification
in the machine learning operations (MLOps) or continuous integration and
continuous delivery (CI/CD) flow for training and inference.
Python
more_features = [
    transactions_featureset.get_feature("transaction_amount_7d_sum"),
    transactions_featureset.get_feature("transaction_amount_3d_sum"),
]

more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)
The inference phase uses the feature retrieval specification to look up the features. It
integrates all phases of the machine learning lifecycle. Changes to the
training/inference pipeline can stay at a minimum as you experiment and
deploy.
Use of the feature retrieval specification and the built-in feature retrieval
component is optional. You can directly use the get_offline_features() API, as
shown earlier. The name of the specification should be
feature_retrieval_spec.yaml when it's packaged with the model. This way, the
system can recognize it.
Python
# Create feature retrieval spec
feature_retrieval_spec_folder = root_dir + "/project/fraud_model/feature_retrieval_spec"

featurestore.generate_feature_retrieval_spec(feature_retrieval_spec_folder, features)
a. Feature retrieval: For its input, this built-in component takes the feature retrieval
specification, the observation data, and the time-stamp column name. It then
generates the training data as output. It runs these steps as a managed Spark
job.
b. Training: Based on the training data, this step trains the model and then
generates a model (not yet registered).
c. Evaluation: This step validates whether the model performance and quality fall
within a threshold. (In this tutorial, it's a placeholder step for illustration
purposes.)
7 Note
In an earlier tutorial, you ran a backfill job to materialize data for the transactions feature set. The feature retrieval step reads feature values from the offline store for this feature set. The behavior is the same, even if you use the get_offline_features() API.
Python
training_pipeline_path = (
    root_dir + "/project/fraud_model/pipelines/training_pipeline.yaml"
)
training_pipeline_definition = load_job(source=training_pipeline_path)
training_pipeline_job = ws_client.jobs.create_or_update(training_pipeline_definition)

ws_client.jobs.stream(training_pipeline_job.name)
# Note: The first time it runs, each step in the pipeline can take ~15 mins. However, subsequent runs can be faster (assuming the spark pool is warm - default timeout is 30 mins).
To display the pipeline steps, select the hyperlink for the Web View
pipeline, and open it in a new window.
The feature retrieval specification is packaged along with the model. The model
registration step in the training pipeline handled this step. You created the feature
retrieval specification during experimentation. Now it's part of the model
definition. In the next tutorial, you'll see how inferencing uses it.
On the same Models page, select the Feature sets tab. This tab shows both the
transactions and accounts feature sets on which this model depends.
The feature retrieval specification determined this list when the model was
registered.
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Go to the next tutorial in the series: Enable recurrent materialization and run batch
inference.
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 3: Enable recurrent
materialization and run batch inference
Article • 11/28/2023
This tutorial series shows how features seamlessly integrate all phases of the machine
learning lifecycle: prototyping, training, and operationalization.
The first tutorial showed how to create a feature set specification with custom
transformations, and then use that feature set to generate training data, enable
materialization, and perform a backfill. The second tutorial showed how to enable
materialization, and perform a backfill. It also showed how to experiment with features,
as a way to improve model performance.
Prerequisites
Before you proceed with this tutorial, be sure to complete the first and second tutorials
in the series.
Set up
1. Configure the Azure Machine Learning Spark notebook.
To run this tutorial, you can create a new notebook and execute the instructions
step by step. You can also open and run the existing notebook named 3. Enable
recurrent materialization and run batch inference. You can find that notebook, and
all the notebooks in this series, in the featurestore_sample/notebooks directory. You
can choose sdk_only or sdk_and_cli. Keep this tutorial open and refer to it for
documentation links and more explanation.
a. In the Compute dropdown list in the top nav, select Serverless Spark Compute
under Azure Machine Learning Serverless Spark.
Python
# run this cell to start the spark session (any code block will start the session). This can take around 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Python SDK
Not applicable.
5. Initialize the project workspace CRUD (create, read, update, and delete) client.
Python
Be sure to update the featurestore_name value, to reflect what you created in the
first tutorial.
Python
# feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
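The cell that creates the feature store CRUD client isn't shown in this extract; a minimal sketch, mirroring the earlier tutorials:
Python

from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)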
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
To handle inference of the model in production, you might want to set up recurrent
materialization jobs to keep the materialization store up to date. These jobs run on user-
defined schedules. The recurrent job schedule works this way:
Interval and frequency values define a window. For example, the following values
define a three-hour window:
interval = 3
frequency = Hour
The first window starts at the start_time value defined in RecurrenceTrigger , and
so on.
The first recurrent job is submitted at the start of the next window after the update
time.
Later recurrent jobs are submitted at every window after the first job.
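As a sketch of what this looks like in code (the schedule-definition cell isn't shown in this extract; RecurrenceTrigger comes from azure.ai.ml.entities, and the start time here is an arbitrary example):
Python

from datetime import datetime
from azure.ai.ml.entities import RecurrenceTrigger

# A three-hour recurrence window, matching the example values described above.
recurrence_schedule = RecurrenceTrigger(
    frequency="Hour",
    interval=3,
    start_time=datetime(2023, 4, 15, 0, 0, 0),
)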
Python
transactions_fset_config = fs_client.feature_sets.get(name="transactions", version="1")

# Attach a recurrence schedule (for example, the RecurrenceTrigger sketched above),
# then update the feature set asset.
transactions_fset_config.materialization_settings.schedule = recurrence_schedule

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
Python SDK
Python
1. You use the same built-in feature retrieval component for feature retrieval that you used in the training pipeline (covered in an earlier tutorial). For pipeline training, you provided a feature retrieval specification as a component input. For batch inference, you pass the registered model as the input. The component looks for the feature retrieval specification in the model artifact.
Additionally, for training, the observation data had the target variable. However,
the batch inference observation data doesn't have the target variable. The feature
retrieval step joins the observation data with the features and outputs the data for
batch inference.
2. The pipeline uses the batch inference input data from the previous step, runs inference on the model, and appends the predicted value as output.
7 Note
You use a job for batch inference in this example. You can also use batch
endpoints in Azure Machine Learning.
Python
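# The cell isn't shown in this extract. A sketch that mirrors the training
# pipeline run, assuming a batch_inference_pipeline.yaml under
# project/fraud_model/pipelines (hypothetical path - adjust to your repository).
from azure.ai.ml import load_job

batch_inference_pipeline_path = (
    root_dir + "/project/fraud_model/pipelines/batch_inference_pipeline.yaml"
)
batch_inference_pipeline_definition = load_job(source=batch_inference_pipeline_path)
batch_inference_pipeline_job = ws_client.jobs.create_or_update(
    batch_inference_pipeline_definition
)
ws_client.jobs.stream(batch_inference_pipeline_job.name)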
3. Paste the Data field value into the following cell, splitting it into separate name and version values. (The last character is the version, preceded by a colon :.)
4. Note the predict_is_fraud column that the batch inference pipeline generated.
Python
inf_data_output = ws_client.data.get(
    name="azureml_1c106662-aa5e-4354-b5f9-57c1b0fdb3a7_output_data_data_with_prediction",
    version="1",
)
inf_output_df = spark.read.parquet(inf_data_output.path + "data/*.parquet")
display(inf_output_df.head(5))
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Learn about feature store concepts and top-level entities in managed feature store.
Learn about identity and access control for managed feature store.
View the troubleshooting guide for managed feature store.
View the YAML reference.
Tutorial 4: Enable online materialization
and run online inference
Article • 11/28/2023
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see feature store concepts.
Part 1 of this tutorial series showed how to create a feature set specification with custom
transformations, and use that feature set to generate training data. Part 2 of the series
showed how to enable materialization, and perform a backfill. Additionally, Part 2
showed how to experiment with features, as a way to improve model performance. Part
3 showed how a feature store increases agility in the experimentation and training flows.
Part 3 also described how to run batch inference.
Prerequisites
7 Note
This tutorial uses an Azure Machine Learning notebook with Serverless Spark Compute.
Make sure you complete parts 1 through 3 of this tutorial series. This tutorial reuses the feature store and other resources created in the earlier tutorials.
Set up
This tutorial uses the Python feature store core SDK ( azureml-featurestore ). The Python
SDK is used for create, read, update, and delete (CRUD) operations, on feature stores,
feature sets, and feature store entities.
You don't need to explicitly install these resources for this tutorial, because in the set-up
instructions shown here, the online.yml file covers them.
You can create a new notebook and execute the instructions in this tutorial step by
step. You can also open and run the existing notebook
featurestore_sample/notebooks/sdk_only/4. Enable online store and run online
inference.ipynb. Keep this tutorial open and refer to it for documentation links and
more explanation.
a. In the Compute dropdown list in the top nav, select Serverless Spark Compute.
2. This code cell starts the Spark session. It needs about 10 minutes to install all
dependencies and start the Spark session.
Python
# Run this cell to start the spark session (any code block will start the session). This can take approximately 10 mins.
print("start spark session")
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
4. Initialize the MLClient for the project workspace, where the tutorial notebook runs.
The MLClient is used for the create, read, update, and delete (CRUD) operations.
Python
import os
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]
5. Initialize the MLClient for the feature store workspace, for the create, read, update,
and delete (CRUD) operations on the feature store workspace.
Python
# Feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name from part #1 of the tutorial
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]

# Connect to the feature store workspace
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)
6. As mentioned earlier, this tutorial uses the Python feature store core SDK ( azureml-
featurestore ). This initialized SDK client is used for create, read, update, and delete
(CRUD) operations, on feature stores, feature sets, and feature store entities.
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
1. Set values for the Azure Cache for Redis resource, to use as online materialization
store. In this code cell, define the name of the Azure Cache for Redis resource to
create or reuse. You can override other default settings.
Python
ws_location = ws_client.workspaces.get(ws_client.workspace_name).location

redis_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
redis_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
redis_name = "<REDIS_NAME>"
redis_location = ws_location
2. You can create a new Redis instance. You would select the Redis Cache tier (basic,
standard, premium, or enterprise). Choose an SKU family available for the cache
tier you select. For more information about tiers and cache performance, see this
resource. For more information about SKU tiers and Azure cache families, see this
resource .
Execute this code cell to create an Azure Cache for Redis with premium tier, SKU
family P , and cache capacity 2. It might take between 5 and 10 minutes to prepare
the Redis instance.
Python
from azure.mgmt.redis import RedisManagementClient
from azure.mgmt.redis.models import RedisCreateParameters, Sku, SkuFamily, SkuName

management_client = RedisManagementClient(
    AzureMLOnBehalfOfCredential(), redis_subscription_id
)

redis_arm_id = (
    management_client.redis.begin_create(
        resource_group_name=redis_resource_group_name,
        name=redis_name,
        parameters=RedisCreateParameters(
            location=redis_location,
            sku=Sku(name=SkuName.PREMIUM, family=SkuFamily.P, capacity=2),
        ),
    )
    .result()
    .id
)

print(redis_arm_id)
3. Optionally, this code cell reuses an existing Redis instance with the previously
defined name.
Python
redis_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Cache/Redis/{name}".format(
    sub_id=redis_subscription_id,
    rg=redis_resource_group_name,
    name=redis_name,
)
Python
ml_client = MLClient(
AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
)
fs = FeatureStore(
name=featurestore_name,
online_store=online_store,
)
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())
Python
accounts_fset_config = fs_client._featuresets.get(name="accounts", version="1")

accounts_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    online_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(accounts_fset_config)
print(fs_poller.result())
Python
from datetime import datetime, timedelta

st = datetime(2020, 1, 1, 0, 0, 0, 0)
et = datetime.now() - timedelta(hours=3)

poller = fs_client.feature_sets.begin_backfill(
    name="accounts",
    version="1",
    feature_window_start_time=st,
    feature_window_end_time=et,
    data_status=["None"],
)
print(poller.result().job_ids)
Tip
This code cell tracks completion of the backfill job. With the Azure Cache for Redis
premium tier provisioned earlier, this step might need approximately 10 minutes to
complete.
Python
1. This code cell enables the transactions feature set online materialization.
Python
transactions_fset_config = fs_client._featuresets.get(name="transactions", version="1")

transactions_fset_config.materialization_settings.online_enabled = True

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
2. This code cell backfills the data to both the online and offline materialization store,
to ensure that both stores have the latest data. The recurrent materialization job,
which you set up in Tutorial 3 of this series, now materializes data to both online
and offline materialization stores.
Python
st = datetime(2020, 1, 1, 0, 0, 0, 0)
et = datetime.now() - timedelta(hours=3)
poller = fs_client.feature_sets.begin_backfill(
name="transactions",
version="1",
feature_window_start_time=st,
feature_window_end_time=et,
data_status=[DataAvailabilityStatus.NONE],
)
print(poller.result().job_ids)
This code cell tracks completion of the backfill job. Using the premium tier Azure
Cache for Redis provisioned earlier, this step might need approximately five
minutes to complete.
Python
3. From the list of accessible feature stores, select the feature store for which you
performed the backfill.
The data materialization status can be
Complete (green)
Incomplete (red)
Pending (blue)
None (gray)
A data interval represents a contiguous portion of data with same data
materialization status. For example, the earlier snapshot has 16 data intervals in the
offline materialization store.
Your data can have a maximum of 2,000 data intervals. If your data contains more
than 2,000 data intervals, create a new feature set version.
You can provide a list of more than one data status value (for example, ["None", "Incomplete"]) in a single backfill job.
During backfill, a new materialization job is submitted for each data interval that
falls in the defined feature window.
A new job is not submitted for a data interval if a materialization job is already
pending, or is running for a data interval that hasn't yet been backfilled.
When the first online materialization job is submitted, the data already
materialized in the offline store, if available, is used to calculate online features.
If the data interval for online materialization partially overlaps the data interval
of already materialized data located in the offline store, separate materialization
jobs are submitted for the overlapping and nonoverlapping parts of the data
interval.
Test locally
Now, use your development environment to look up features from the online
materialization store. The tutorial notebook attached to Serverless Spark Compute
serves as the development environment.
This code cell parses the list of features from the existing feature retrieval specification.
Python
features = featurestore.resolve_feature_retrieval_spec(feature_retrieval_spec_folder)

features
This code retrieves feature values from the online materialization store.
Python
Prepare some observation data for testing, and use that data to look up features from the online materialization store. During the online look-up, the keys (accountID) defined in the observation sample data might not exist in Redis (because of the TTL). In this case:
3. Open the console for the Redis instance, and check for existing keys with the KEYS
* command.
4. Replace the accountID values in the sample observation data with the existing
keys.
Python
import pyarrow
from azureml.featurestore import get_online_features

# Online lookup:
# It can happen that the keys defined in the observation sample data above don't exist in Redis (due to TTL).
# If this happens, go to the Azure portal, navigate to the Redis instance, open its console, check for existing keys with the command "KEYS *",
# and replace the sample observation data with the existing keys.
df = get_online_features(features, obs)
df
These steps looked up features from the online store. In the next step, you'll test online
features using an Azure Machine Learning managed online endpoint.
Python
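# The endpoint-definition cell isn't shown in this extract. A minimal sketch;
# the endpoint name is hypothetical, and a system-assigned managed identity is
# created by default.
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint_name = "fraud-model"
endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")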
This code cell creates the managed online endpoint defined in the previous code cell.
Python
ws_client.online_endpoints.begin_create_or_update(endpoint).result()
Python
model_endpoint_msi_principal_id = endpoint.identity.principal_id
model_endpoint_msi_principal_id
This code cell grants the Contributor role to the online endpoint managed identity on
the Redis instance. This RBAC permission is needed to materialize data into the Redis
online store.
Python
auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), redis_subscription_id
)

scope = f"/subscriptions/{redis_subscription_id}/resourceGroups/{redis_resource_group_name}/providers/Microsoft.Cache/Redis/{redis_name}"

# The role definition ID for the "Contributor" role on the Redis cache.
# You can find other built-in role definition IDs in the Azure documentation.
role_definition_id = f"/subscriptions/{redis_subscription_id}/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"

auth_client = AuthorizationManagementClient(
    AzureMLOnBehalfOfCredential(), featurestore_subscription_id
)

scope = f"/subscriptions/{featurestore_subscription_id}/resourceGroups/{featurestore_resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}"
1. Loads the feature metadata from the feature retrieval specification packaged with the model during model training (a task covered earlier in this series). The specification has features from both the transactions and accounts feature sets.
2. Looks up the online features using the index keys from the request, when an input
inference request is received. In this case, for both feature sets, the index column is
accountID .
3. Passes the features to the model to perform the inference, and returns the
response. The response is a boolean value that represents the variable is_fraud .
Next, execute this code cell to create a managed online deployment definition for
model deployment.
Python
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
)

deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=endpoint_name,
    model="azureml:fraud_model:1",
    code_configuration=CodeConfiguration(
        code=root_dir + "/project/fraud_model/online_inference/src/",
        scoring_script="scoring.py",
    ),
    environment=Environment(
        conda_file=root_dir + "/project/fraud_model/online_inference/conda.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
Deploy the model to online endpoint with this code cell. The deployment might need
four to five minutes.
Python
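# The deployment cell isn't shown in this extract. A minimal sketch that creates
# the deployment and routes all traffic to it (the traffic routing step is an
# assumption; it reuses the endpoint object from earlier).
ws_client.online_deployments.begin_create_or_update(deployment).result()

endpoint.traffic = {"green": 100}
ws_client.online_endpoints.begin_create_or_update(endpoint).result()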
Python
# Test the online deployment using the mock data.
sample_data = root_dir + "/project/fraud_model/online_inference/test.json"
ws_client.online_endpoints.invoke(
endpoint_name=endpoint_name, request_file=sample_data,
deployment_name="green"
)
Clean up
The fifth tutorial in the series describes how to delete the resources.
Next steps
Network isolation with feature store (preview)
Azure Machine Learning feature stores samples repository
Tutorial 5: Develop a feature set with a
custom source
Article • 11/28/2023
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see feature store concepts.
Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization, and perform a backfill. Part 2 showed how to experiment with features in the experimentation and training flows. Part 3 explained recurrent materialization for the transactions feature set, and showed how to run a batch inference pipeline on the registered model. Part 4 described how to enable online materialization and run online inference.
Prerequisites
7 Note
This tutorial uses an Azure Machine Learning notebook with Serverless Spark
Compute.
Make sure you complete the previous tutorials in this series. This tutorial reuses the feature store and other resources created in those earlier tutorials.
Set up
This tutorial uses the Python feature store core SDK ( azureml-featurestore ). The Python
SDK is used for create, read, update, and delete (CRUD) operations, on feature stores,
feature sets, and feature store entities.
You don't need to explicitly install these resources for this tutorial, because in the set-up
instructions shown here, the conda.yml file covers them.
1. On the top menu, in the Compute dropdown list, select Serverless Spark Compute
under Azure Machine Learning Serverless Spark.
Python
import os
if os.path.isdir(root_dir):
print("The folder exists.")
else:
print("The folder does not exist. Please create or fix the path")
Initialize the CRUD client of the feature store
workspace
Initialize the MLClient for the feature store workspace, to cover the create, read, update,
and delete (CRUD) operations on the feature store workspace.
Python
# Feature store
featurestore_name = (
    "<FEATURESTORE_NAME>"  # use the same name that was used in tutorial #1
)
featurestore_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
featurestore_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
Python
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
Custom source definition
You can define your own source loading logic from any data storage that has a custom
source definition. Implement a source processor user-defined function (UDF) class
( CustomSourceTransformer in this tutorial) to use this feature. This class should define an
__init__(self, **kwargs) function, and a process(self, start_time, end_time,
**kwargs) function. The kwargs dictionary is supplied as a part of the feature set
specification definition. This definition is then passed to the UDF. The start_time and
end_time parameters are calculated and passed to the UDF function.
Python
from datetime import datetime

class CustomSourceTransformer:
    def __init__(self, **kwargs):
        self.path = kwargs.get("source_path")
        self.timestamp_column_name = kwargs.get("timestamp_column_name")
        if not self.path:
            raise Exception("`source_path` is not provided")
        if not self.timestamp_column_name:
            raise Exception("`timestamp_column_name` is not provided")

    def process(
        self, start_time: datetime, end_time: datetime, **kwargs
    ) -> "pyspark.sql.DataFrame":
        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, lit, to_timestamp

        spark = SparkSession.builder.getOrCreate()
        df = spark.read.json(self.path)

        if start_time:
            df = df.filter(col(self.timestamp_column_name) >= to_timestamp(lit(start_time)))

        if end_time:
            df = df.filter(col(self.timestamp_column_name) < to_timestamp(lit(end_time)))

        return df
Python
transactions_source_process_code_path = (
    root_dir + "/featurestore/featuresets/transactions_custom_source/source_process_code"
)
transactions_feature_transform_code_path = (
    root_dir + "/featurestore/featuresets/transactions_custom_source/feature_process_code"
)

udf_featureset_spec = create_feature_set_spec(
    source=CustomFeatureSource(
        kwargs={
            "source_path": "wasbs://[email protected]/feature-store-prp/datasources/transactions-source-json/*.json",
            "timestamp_column_name": "timestamp",
        },
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
        source_process_code=SourceProcessCode(
            path=transactions_source_process_code_path,
            process_class="source_process.CustomSourceTransformer",
        ),
    ),
    feature_transformation=TransformationCode(
        path=transactions_feature_transform_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)

udf_featureset_spec
Next, define a feature window, and display the feature values in this feature window.
Python
from datetime import datetime

st = datetime(2023, 1, 1)
et = datetime(2023, 6, 1)

display(
    udf_featureset_spec.to_spark_dataframe(
        feature_window_start_date_time=st, feature_window_end_date_time=et
    )
)
In the feature set specification, index_columns defines the join keys required to access
values from the feature set.
To learn more about the specification, see Understanding top-level entities in managed
feature store and CLI (v2) feature set YAML schema.
Feature set specification persistence offers another benefit: the feature set specification
can be source controlled.
Python
feature_spec_folder = (
root_dir + "/featurestore/featuresets/transactions_custom_source/spec"
)
udf_featureset_spec.dump(feature_spec_folder)
Register the transaction feature set with the
feature store
Use this code to register a feature set asset loaded from the custom source with the
feature store. You can then reuse that asset and easily share it. Registration of a feature
set asset offers managed capabilities, including versioning and materialization.
Python
# FeatureSet and FeatureSetSpecification come from the Azure ML SDK v2 entities
from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset_config = FeatureSet(
    name="transactions_custom_source",
    version="1",
    description="transactions feature set loaded from custom source",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path=feature_spec_folder),
    tags={"data_type": "nonPII"},
)

poller = fs_client.feature_sets.begin_create_or_update(transaction_fset_config)
print(poller.result())
Next, fetch the registered feature set and load it as a Spark dataframe.
Python
# Fetch the registered feature set (a minimal sketch; the accessor pattern matches the earlier tutorials in this series)
transactions_fset_config = featurestore.feature_sets.get("transactions_custom_source", "1")

df = transactions_fset_config.to_spark_dataframe()
display(df)
You should be able to successfully fetch the registered feature set as a Spark dataframe,
and then display it. You can now use these features for a point-in-time join with
observation data, and the subsequent steps in your machine learning pipeline.
Clean up
If you created a resource group for the tutorial, you can delete that resource group,
which deletes all the resources associated with this tutorial. Otherwise, you can delete
the resources individually:
To delete the feature store, open the resource group in the Azure portal, select the
feature store, and delete it.
The user-assigned managed identity (UAI) assigned to the feature store workspace
isn't deleted when you delete the feature store. To delete the UAI, follow these
instructions.
To delete a storage account-type offline store, open the resource group in the
Azure portal, select the storage that you created, and delete it.
To delete an Azure Cache for Redis instance, open the resource group in the Azure
portal, select the instance that you created, and delete it.
Next steps
Network isolation with feature store
Azure Machine Learning feature stores samples repository
Tutorial 6: Network isolation with
feature store (preview)
Article • 09/13/2023
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
An Azure Machine Learning managed feature store lets you discover, create, and
operationalize features. Features serve as the connective tissue in the machine learning
lifecycle, starting from the prototyping phase, where you experiment with various
features. That lifecycle continues to the operationalization phase, where you deploy your
models, and inference steps look up the feature data. For more information about
feature stores, see the feature store concepts document.
This tutorial describes how to configure secure ingress through a private endpoint, and
secure egress through a managed virtual network.
Part 1 of this tutorial series showed how to create a feature set specification with custom
transformations, and use that feature set to generate training data. Part 2 of the tutorial
series showed how to enable materialization and perform a backfill. Part 3 of this tutorial
series showed how to experiment with features, as a way to improve model
performance. Part 3 also showed how a feature store increases agility in the
experimentation and training flows. Tutorial 4 described how to run batch inference.
Tutorial 5 explained how to use feature store for online/realtime inference use cases.
Tutorial 6 shows how to:
Set up the necessary resources for network isolation of a managed feature store.
Create a new feature store resource.
Set up your feature store to support network isolation scenarios.
Update your project workspace (current workspace) to support network isolation
scenarios.
Prerequisites
7 Note
This tutorial uses Azure Machine Learning notebook with Serverless Spark
Compute.
An Azure Machine Learning workspace, enabled with Managed virtual network for
serverless spark jobs.
If your workspace has an Azure Container Registry, it must use the Premium SKU to
successfully complete the workspace configuration. To configure your project
workspace:
YAML
managed_network:
  isolation_mode: allow_internet_outbound
Your user account must have the Owner or Contributor role assigned to the
resource group where you create the feature store. Your user account also needs
the User Access Administrator role.
Set up
This tutorial uses the Python feature store core SDK (azureml-featurestore). The Python
SDK is used for feature set development and testing only. The CLI is used for create,
read, update, and delete (CRUD) operations on feature stores, feature sets, and feature
store entities. This is useful in continuous integration and continuous delivery (CI/CD) or
GitOps scenarios, where CLI/YAML is preferred.
You don't need to explicitly install these packages for this tutorial, because the
conda.yaml file in the set-up instructions shown here covers them.
1. Clone the azureml-examples repository to your local machine with this
command:
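Python
# A minimal sketch (the repository is the public azureml-examples repo; --depth 1 keeps the clone small)
!git clone --depth 1 https://github.com/Azure/azureml-examples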
You can also download a zip file from the azureml-examples repository. At this
page, first select the code dropdown, and then select Download ZIP . Then, unzip
the contents into a folder on your local device.
Open the Isolation for Feature store.ipynb notebook. You can keep this document open
and refer to it for more information.
4. This code cell starts the Spark session. It needs about 10 minutes to install all
dependencies and start the Spark session.
Python
# Run this cell to start the Spark session (any code block will start the session). This can take around 10 minutes.
print("start spark session")
Python
import os

# Update your alias below (or any custom directory you have uploaded the samples to).
# You can find the name from the directory structure in the left navigation.
root_dir = "./Users/<your user alias>/featurestore_sample"

if os.path.isdir(root_dir):
    print("The folder exists.")
else:
    print("The folder does not exist. Please create or fix the path")
Authenticate
Python
# authenticate
!az login
Python
subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
7 Note
For this tutorial, you create three separate storage containers in the same ADLS Gen2
storage account:
Source data
Offline store
Observation data
1. Create an ADLS Gen2 storage account for source data, offline store, and
observation data.
a. Provide the name of an Azure Data Lake Storage Gen2 storage account in the
following code sample. You can execute the following code cell with the
provided default settings. Optionally, you can override the default settings.
Python
## Default Setting
# We use the subscription, resource group, and region of this active project workspace.
# We hard-coded default resource names for creating new resources.
## Overwrite
# You can replace them if you want to create the resources in a different subscription/resourceGroup, or use existing resources.
# At the minimum, provide an ADLS Gen2 storage account name for `storage_account_name`.

storage_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
storage_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
storage_account_name = "<STORAGE_ACCOUNT_NAME>"

storage_location = "eastus"
storage_file_system_name_offline_store = "offline-store"
storage_file_system_name_source_data = "source-data"
storage_file_system_name_observation_data = "observation-data"
b. This code cell creates the ADLS Gen2 storage account defined in the above
code cell.
Python
c. This code cell creates a new storage container for offline store.
Python
d. This code cell creates a new storage container for source data.
Python
e. This code cell creates a new storage container for observation data.
Python
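# Steps b through e aren't shown individually here. A minimal combined sketch using
# standard az storage commands (exact options in the original notebook may differ):

# b. Create the ADLS Gen2 storage account (hierarchical namespace enabled)
!az storage account create --name $storage_account_name --resource-group $storage_resource_group_name --subscription $storage_subscription_id --location $storage_location --enable-hierarchical-namespace true

# c., d., e. Create the offline store, source data, and observation data containers
!az storage fs create --name $storage_file_system_name_offline_store --account-name $storage_account_name --subscription $storage_subscription_id
!az storage fs create --name $storage_file_system_name_source_data --account-name $storage_account_name --subscription $storage_subscription_id
!az storage fs create --name $storage_file_system_name_observation_data --account-name $storage_account_name --subscription $storage_subscription_id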
2. Copy the sample data required for this tutorial series into the newly created
storage containers.
a. To write data to the storage containers, ensure that Contributor and Storage
Blob Data Contributor roles are assigned to the user identity on the created
ADLS Gen2 storage account in the Azure portal following these steps.
) Important
Once you have ensured that the Contributor and Storage Blob Data
Contributor roles are assigned to the user identity, wait a few minutes
after role assignment to let permissions propagate before proceeding with
the next steps. To learn more about access control, see role-based access
control (RBAC) for Azure storage accounts.
The following code cells copy sample source data for transactions feature set
used in this tutorial from a public storage account to the newly created storage
account.
Python
# Copy sample source data for the transactions feature set used in this tutorial series from the public storage account to the newly created storage account
transactions_source_data_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/datasources/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)

transactions_src_df.write.parquet(
    f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/"
)
b. Copy sample source data for account feature set used in this tutorial from a
public storage account to the newly created storage account.
Python
# Copy sample source data for the account feature set used in this tutorial series from the public storage account to the newly created storage account
accounts_data_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/datasources/accounts-precalculated/*.parquet"
accounts_data_df = spark.read.parquet(accounts_data_path)

accounts_data_df.write.parquet(
    f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/accounts-precalculated/"
)
c. Copy sample observation data used for training from a public storage account
to the newly created storage account.
Python
# Copy sample observation data used for training from the public storage account to the newly created storage account
observation_data_train_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/observation_data/train/*.parquet"
observation_data_train_df = spark.read.parquet(observation_data_train_path)

observation_data_train_df.write.parquet(
    f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/train/"
)
d. Copy sample observation data used for batch inference from a public storage
account to the newly created storage account.
Python
# Copy sample observation data used for batch inference from the public storage account to the newly created storage account
# (the read below follows the same pattern as the training data above; the source path is inferred from that pattern)
observation_data_inference_path = "wasbs://data@azuremlexampledata.blob.core.windows.net/feature-store-prp/observation_data/batch_inference/*.parquet"
observation_data_inference_df = spark.read.parquet(observation_data_inference_path)

observation_data_inference_df.write.parquet(
    f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/batch_inference/"
)
3. Disable the public network access on the newly created storage account.
a. This code cell disables public network access for the ADLS Gen2 storage
account created earlier.
Python
# Disable the public network access for the above created ADLS Gen2 storage account
!az storage account update --name $storage_account_name --resource-group $storage_resource_group_name --subscription $storage_subscription_id --public-network-access disabled
b. Set ARM IDs for the offline store, source data, and observation data containers.
Python
# The offline store assignment follows the same pattern as the source and observation data assignments below
offline_store_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_offline_store,
)
print(offline_store_gen2_container_arm_id)

source_data_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_source_data,
)
print(source_data_gen2_container_arm_id)

observation_data_gen2_container_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}/blobServices/default/containers/{container}".format(
    sub_id=storage_subscription_id,
    rg=storage_resource_group_name,
    account=storage_account_name,
    container=storage_file_system_name_observation_data,
)
print(observation_data_gen2_container_arm_id)
a. In the following code cell, provide a name for the user-assigned managed
identity that you would like to create.
Python
Python
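# The cells that set the UAI parameters and create the identity aren't shown above.
# A minimal sketch, assuming the identity lives in the same subscription and resource
# group as the project workspace (the name placeholder is illustrative):
uai_subscription_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
uai_resource_group_name = os.environ["AZUREML_ARM_RESOURCEGROUP"]
uai_name = "<UAI_NAME>"

# Create the user-assigned managed identity
!az identity create --name $uai_name --resource-group $uai_resource_group_name --subscription $uai_subscription_id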
Python
from azure.mgmt.msi import ManagedServiceIdentityClient

msi_client = ManagedServiceIdentityClient(
    AzureMLOnBehalfOfCredential(), uai_subscription_id
)
managed_identity = msi_client.user_assigned_identities.get(
    resource_name=uai_name, resource_group_name=uai_resource_group_name
)

uai_principal_id = managed_identity.principal_id
uai_client_id = managed_identity.client_id
uai_arm_id = managed_identity.id
The UAI needs this role assignment:
Scope: Storage account of the feature store offline store
Action/Role: Storage Blob Data Contributor role
The next CLI commands will assign the Storage Blob Data Contributor role to the
UAI. In this example, "Storage accounts of source data" doesn't apply because you
read the sample data from a public access blob storage. To use your own data
sources, you must assign the required roles to the UAI. To learn more about access
control, see role-based access control for Azure storage accounts and Azure
Machine Learning workspace.
Python
Python
Python
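# The role-assignment cells aren't shown above. A minimal sketch using the standard
# az role assignment command, scoped to the offline store storage account:
!az role assignment create --role "Storage Blob Data Contributor" --assignee-object-id $uai_principal_id --assignee-principal-type ServicePrincipal --scope "/subscriptions/$storage_subscription_id/resourceGroups/$storage_resource_group_name/providers/Microsoft.Storage/storageAccounts/$storage_account_name"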
feature_store_arm_id = "/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{ws_name}".format(
    sub_id=featurestore_subscription_id,
    rg=featurestore_resource_group_name,
    ws_name=featurestore_name,
)
The following code cell generates a YAML specification file for a feature store with
materialization enabled.
Python
config = {
    "$schema": "http://azureml/sdk-2-0/FeatureStore.json",
    "name": featurestore_name,
    "location": featurestore_location,
    "compute_runtime": {"spark_runtime_version": "3.2"},
    "offline_store": {
        "type": "azure_data_lake_gen2",
        "target": offline_store_gen2_container_arm_id,
    },
    "materialization_identity": {"client_id": uai_client_id, "resource_id": uai_arm_id},
}

feature_store_yaml = root_dir + "/featurestore/featurestore_with_offline_setting.yaml"
Python
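# The cell that writes the YAML file and creates the feature store isn't shown above.
# A minimal sketch, assuming the az ml feature-store command group from the ml CLI extension:
import yaml

with open(feature_store_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)

!az ml feature-store create --file $feature_store_yaml --name $featurestore_name --resource-group $featurestore_resource_group_name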
Python
# feature store client
from azureml.featurestore import FeatureStoreClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
featurestore = FeatureStoreClient(
credential=AzureMLOnBehalfOfCredential(),
subscription_id=featurestore_subscription_id,
resource_group_name=featurestore_resource_group_name,
name=featurestore_name,
)
Python
Follow these instructions to get the Azure AD object ID for your user identity. Then, use
your Azure AD object ID in the following command to assign the AzureML Data Scientist
role to your user identity on the created feature store.
Python
your_aad_objectid = "<YOUR_AAD_OBJECT_ID>"
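The assignment command itself isn't shown above. A minimal sketch using the standard role-assignment CLI; the scope is the feature store ARM ID defined earlier:
Python
!az role assignment create --role "AzureML Data Scientist" --assignee-object-id $your_aad_objectid --assignee-principal-type User --scope $feature_store_arm_id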
Obtain the default storage account and key vault for the
feature store, and disable public network access to the
corresponding resources
The following code cell gets the feature store object for the next steps.
Python
fs = featurestore.feature_stores.get()
This code cell gets the names of the default storage account and key vault for the feature store.
Python
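# This cell isn't shown above. A minimal sketch; the attribute names below are assumptions
# based on SDK v2 workspace objects, which expose associated resources as full ARM IDs:
default_fs_storage_account_name = fs.storage_account.split("/")[-1]
default_key_vault_name = fs.key_vault.split("/")[-1]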
This code cell disables public network access to the default storage account for the
feature store.
Python
# Disable the public network access for the default ADLS Gen2 storage account created for the feature store
!az storage account update --name $default_fs_storage_account_name --resource-group $featurestore_resource_group_name --subscription $featurestore_subscription_id --public-network-access disabled
The following cell prints the name of the default key vault for the feature store.
Python
print(default_key_vault_name)
Python
# The below code creates a configuration for the managed virtual network for the feature store
import yaml

config = {
    "public_network_access": "disabled",
    "managed_network": {
        "isolation_mode": "allow_internet_outbound",
        "outbound_rules": [
            # You need to add multiple rules here if you have separate storage accounts for source, observation data, and offline store.
            {
                "name": "sourcerulefs",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "dfs",
                    "service_resource_id": f"/subscriptions/{storage_subscription_id}/resourcegroups/{storage_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # This rule is added currently because serverless Spark doesn't automatically create a private endpoint to the default key vault.
            {
                "name": "defaultkeyvault",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "vault",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Keyvault/vaults/{default_key_vault_name}",
                },
                "type": "private_endpoint",
            },
        ],
    },
}

feature_store_managed_vnet_yaml = (
    root_dir + "/featurestore/feature_store_managed_vnet_config.yaml"
)

with open(feature_store_managed_vnet_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)
This code cell updates the feature store using the generated YAML specification file with
the outbound rules.
Python
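# The update command isn't shown above. A minimal sketch, assuming the
# az ml feature-store command group:
!az ml feature-store update --file $feature_store_managed_vnet_yaml --name $featurestore_name --resource-group $featurestore_resource_group_name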
Python
#### Provision network to create necessary private endpoints (it may take approximately 20 minutes)
!az ml workspace provision-network --name $featurestore_name --resource-group $featurestore_resource_group_name --include-spark
This code cell confirms that private endpoints defined by the outbound rules have been
created.
Python
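# The verification command isn't shown above. A minimal sketch, assuming the
# az ml workspace outbound-rule command group:
!az ml workspace outbound-rule list --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name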
Python
# Look up the subscription ID, resource group, and workspace name of the current workspace
project_ws_sub_id = os.environ["AZUREML_ARM_SUBSCRIPTION"]
project_ws_rg = os.environ["AZUREML_ARM_RESOURCEGROUP"]
project_ws_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]
The managed virtual network of the project workspace needs outbound private endpoint
connections to these resources:
Source data
Offline store
Observation data
Feature store
Default storage account of feature store
This code cell generates a YAML specification file with the required outbound rules for
the project workspace.
Python
# The below code creates a configuration for the managed virtual network for the project workspace
import yaml

config = {
    "managed_network": {
        "isolation_mode": "allow_internet_outbound",
        "outbound_rules": [
            # In case you have separate storage accounts for source, observation data, and offline store, you need to add multiple rules here. No action needed otherwise.
            {
                "name": "projectsourcerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "dfs",
                    "service_resource_id": f"/subscriptions/{storage_subscription_id}/resourcegroups/{storage_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the default storage of the feature store
            {
                "name": "defaultfsstoragerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "blob",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Storage/storageAccounts/{default_fs_storage_account_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the default key vault of the feature store
            {
                "name": "defaultfskeyvaultrule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "vault",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.Keyvault/vaults/{default_key_vault_name}",
                },
                "type": "private_endpoint",
            },
            # Rule to create a private endpoint to the feature store
            {
                "name": "featurestorerule",
                "destination": {
                    "spark_enabled": "true",
                    "subresource_target": "amlworkspace",
                    "service_resource_id": f"/subscriptions/{featurestore_subscription_id}/resourcegroups/{featurestore_resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}",
                },
                "type": "private_endpoint",
            },
        ],
    }
}

project_ws_managed_vnet_yaml = (
    root_dir + "/featurestore/project_ws_managed_vnet_config.yaml"
)

# Write the configuration to the YAML file (this step follows the same pattern as the feature store configuration above)
with open(project_ws_managed_vnet_yaml, "w") as outfile:
    yaml.dump(config, outfile, default_flow_style=False)
Python
#### Update the project workspace to create private endpoints for the defined outbound rules (it may take approximately 15 minutes)
!az ml workspace update --file $project_ws_managed_vnet_yaml --name $project_ws_name --resource-group $project_ws_rg
This code cell confirms that private endpoints defined by the outbound rules have been
created.
Python
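# The verification command isn't shown above. A minimal sketch, assuming the
# az ml workspace outbound-rule command group:
!az ml workspace outbound-rule list --workspace-name $project_ws_name --resource-group $project_ws_rg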
You can also verify the outbound rules from the Azure portal: go to Networking in the
left navigation panel for the project workspace, and then open the Workspace managed
outbound access tab.
A publicly accessible blob container hosts the sample data used in this tutorial. It
can only be read in Spark via the wasbs driver. When you create feature sets using your
own source data, host them in an ADLS Gen2 account, and use the abfss driver in the
data path.
Python
# Remove the "." in the root directory path, because we need to generate an absolute path to read from Spark
transactions_source_data_path = f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/*.parquet"
transactions_src_df = spark.read.parquet(transactions_source_data_path)

display(transactions_src_df.head(5))
# Note: display(transactions_src_df.head(5)) displays the timestamp column in a different format. You can call transactions_src_df.show() to see a correctly formatted value
This Spark transformer, defined in transaction_transform.py, performs the rolling
aggregation defined for the features. To understand the feature set and transformations
in more detail, see feature store concepts.
Python
from azureml.featurestore import create_feature_set_spec, FeatureSetSpec
from azureml.featurestore.contracts import (
DateTimeOffset,
FeatureSource,
TransformationCode,
Column,
ColumnType,
SourceType,
TimestampColumn,
)
transactions_featureset_code_path = (
root_dir + "/featurestore/featuresets/transactions/transformation_code"
)
transactions_featureset_spec = create_feature_set_spec(
    source=FeatureSource(
        type=SourceType.parquet,
        path=f"abfss://{storage_file_system_name_source_data}@{storage_account_name}.dfs.core.windows.net/transactions-source/*.parquet",
        timestamp_column=TimestampColumn(name="timestamp"),
        source_delay=DateTimeOffset(days=0, hours=0, minutes=20),
    ),
    transformation_code=TransformationCode(
        path=transactions_featureset_code_path,
        transformer_class="transaction_transform.TransactionFeatureTransformer",
    ),
    index_columns=[Column(name="accountID", type=ColumnType.string)],
    source_lookback=DateTimeOffset(days=7, hours=0, minutes=0),
    temporal_join_lookback=DateTimeOffset(days=1, hours=0, minutes=0),
    infer_schema=True,
)
# Generate a spark dataframe from the feature set specification
transactions_fset_df = transactions_featureset_spec.to_spark_dataframe()
# display few records
display(transactions_fset_df.head(5))
To inspect the generated transactions feature set specification, open this file from the
file tree to see the specification:
featurestore/featuresets/transactions/spec/FeaturesetSpec.yaml
The specification contains these elements:
source: a reference to a storage resource
features: a list of features and their datatypes. If you provide transformation code, the code must return a dataframe that maps to the features and datatypes.
index_columns: the join keys required to access values from the feature set
Python
import os

# Folder for the dumped feature set spec (path inferred from the pattern used earlier in this series)
transactions_featureset_spec_folder = root_dir + "/featurestore/featuresets/transactions/spec"

transactions_featureset_spec.dump(transactions_featureset_spec_folder)
This code cell creates an account entity for the feature store.
Python
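# The entity-creation cell isn't shown above. A minimal sketch, assuming the account
# entity YAML ships with the sample repository at the path used in the earlier tutorials:
account_entity_path = root_dir + "/featurestore/entities/account.yaml"
!az ml feature-store-entity create --file $account_entity_path --resource-group $featurestore_resource_group_name --workspace-name $featurestore_name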
The feature set asset references both the feature set spec that you created earlier, and
other properties like version and materialization settings.
Python
transactions_featureset_path = (
    root_dir + "/featurestore/featuresets/transactions/featureset_asset_offline_enabled.yaml"
)
!az ml feature-set create --file $transactions_featureset_path --resource-group $featurestore_resource_group_name --workspace-name $featurestore_name
Python
Python
feature_window_start_time = "2023-02-01T00:00.000Z"
feature_window_end_time = "2023-03-01T00:00.000Z"

!az ml feature-set backfill --name transactions --version 1 --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name --feature-window-start-time $feature_window_start_time --feature-window-end-time $feature_window_end_time
This code cell checks the status of the backfill materialization job, by providing
<JOB_ID_FROM_PREVIOUS_COMMAND> .
Python
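# The status-check cell isn't shown above. A minimal sketch, assuming the backfill job is
# visible through the standard jobs CLI (keep the job ID placeholder from the previous output):
!az ml job show --name <JOB_ID_FROM_PREVIOUS_COMMAND> --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name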
Next, this code cell lists all the materialization jobs for the current feature set.
Python
### List all the materialization jobs for the current feature set
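# A minimal sketch, assuming the az ml feature-set list-materialization-operation command:
!az ml feature-set list-materialization-operation --name transactions --version 1 --workspace-name $featurestore_name --resource-group $featurestore_resource_group_name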
Python
observation_data_path = f"abfss://{storage_file_system_name_observation_data}@{storage_account_name}.dfs.core.windows.net/train/*.parquet"
observation_data_df = spark.read.parquet(observation_data_path)
obs_data_timestamp_column = "timestamp"

display(observation_data_df)
# Note: the timestamp column is displayed in a different format. Optionally, you can call observation_data_df.show() to see a correctly formatted value
Python
Python
Python
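# The feature-selection cells aren't shown above. A minimal sketch; the feature URIs
# below are hypothetical, in the <feature_set>:<version>:<feature_name> form implied
# by resolve_feature_uri's use in the next cell:
features = featurestore.resolve_feature_uri(
    ["transactions:1:transaction_amount_7d_sum"]  # hypothetical feature URI
)
more_features = [
    "transactions:1:transaction_amount_3d_sum",  # hypothetical feature URI
]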
from azureml.featurestore import get_offline_features

more_features = featurestore.resolve_feature_uri(more_features)
features.extend(more_features)

# Generate a training dataframe by using the feature data and observation data
training_df = get_offline_features(
    features=features,
    observation_data=observation_data_df,
    timestamp_column=obs_data_timestamp_column,
)
You can see that a point-in-time join appended the features to the training data.
This tutorial combines steps from tutorials 1 and 2 of this series. For network isolation,
remember to replace the public storage containers used in the other tutorial notebooks
with the ones created in this tutorial notebook.
We have reached the end of the tutorial. Your training data uses features from a feature
store. You can either save it to storage for later use, or directly run model training on it.
Next steps
Part 3: Experiment and train models using features
Part 4: Enable recurrent materialization and run batch inference
How Azure Machine Learning works:
resources and assets
Article • 04/04/2023
This article applies to the second version of the Azure Machine Learning CLI and Python
SDK (v2). For version one (v1), see How Azure Machine Learning works: Architecture and
concepts (v1).
Azure Machine Learning includes several resources and assets to enable you to perform
your machine learning tasks. These resources and assets are needed to run any job.
Workspace
The workspace is the top-level resource for Azure Machine Learning, providing a
centralized place to work with all the artifacts you create when you use Azure Machine
Learning. The workspace keeps a history of all jobs, including logs, metrics, output, and
a snapshot of your scripts. The workspace stores references to resources like datastores
and compute. It also holds all assets like models, environments, components, and data
assets.
Create a workspace
Azure CLI
Bash
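# The original command isn't shown here. A minimal sketch with the v2 CLI (names are illustrative):
az ml workspace create --name my-workspace --resource-group my-resource-group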
Compute
A compute is a designated compute resource where you run your job or host your
endpoint. Azure Machine Learning supports the following types of compute:
7 Note
Attached compute - You can attach your own compute resources to your
workspace and use them for training and inference.
Azure CLI
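# The original command isn't shown here. A minimal sketch that creates an Azure Machine
# Learning compute cluster with the v2 CLI (names and VM size are illustrative):
az ml compute create --name cpu-cluster --type amlcompute --size Standard_DS3_v2 --min-instances 0 --max-instances 4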
Datastore
Azure Machine Learning datastores securely keep the connection information to your
data storage on Azure, so you don't have to code it in your scripts. You can register and
create a datastore to easily connect to your storage account, and access the data in your
underlying storage service. The CLI v2 and SDK v2 support the following types of cloud-
based storage services:
Azure CLI
Bash
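# The original command isn't shown here. A minimal sketch that registers a datastore
# from a YAML definition (the file name and its contents are illustrative):
# my-blob-datastore.yml might contain name, type (azure_blob), account_name, and container_name.
az ml datastore create --file my-blob-datastore.yml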
Model
Azure machine learning models consist of the binary file(s) that represent a machine
learning model and any corresponding metadata. Models can be created from a local or
remote file or directory. For remote locations https , wasbs and azureml locations are
supported. The created model will be tracked in the workspace under the specified
name and version. Azure Machine Learning supports three types of storage format for
models:
custom_model
mlflow_model
triton_model
Creating a model
Azure CLI
Bash
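# The original command isn't shown here. A minimal sketch (names and path are illustrative):
az ml model create --name my-model --version 1 --path ./model --type custom_model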
Environment
Azure Machine Learning environments are an encapsulation of the environment where
your machine learning task happens. They specify the software packages, environment
variables, and software settings around your training and scoring scripts. The
environments are managed and versioned entities within your Machine Learning
workspace. Environments enable reproducible, auditable, and portable machine learning
workflows across a variety of computes.
Types of environment
Azure Machine Learning supports two types of environments: curated and custom.
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Intended to be used as is, they contain collections of Python
packages and settings to help you get started with various machine learning
frameworks. These pre-created environments also allow for faster deployment time. For
a full list, see the curated environments article.
You can create custom environments from:
A Docker image
A base Docker image with a conda YAML to customize further
A Docker build context
Azure CLI
Bash
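# The original command isn't shown here. A minimal sketch that creates an environment
# from a base Docker image (the environment name is illustrative):
az ml environment create --name my-env --version 1 --image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04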
Data
Azure Machine Learning allows you to work with different types of data:
URIs (a location in local or cloud storage): uri_file, uri_folder
Tables (a tabular data abstraction): mltable
Primitives: string, boolean, number
For most scenarios, you'll use URIs ( uri_folder and uri_file ) - a location in storage
that can be easily mapped to the filesystem of a compute node in a job by either
mounting or downloading the storage to the node.
mltable is an abstraction for tabular data that is to be used for AutoML Jobs, Parallel
Jobs, and some advanced scenarios. If you're just starting to use Azure Machine
Learning and aren't using AutoML, we strongly encourage you to begin with URIs.
Component
An Azure Machine Learning component is a self-contained piece of code that does one
step in a machine learning pipeline. Components are the building blocks of advanced
machine learning pipelines. Components can do tasks such as data processing, model
training, model scoring, and so on. A component is analogous to a function - it has a
name, parameters, expects input, and returns output.
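For example, a minimal sketch of registering a component from a YAML definition with the v2 CLI (the file name is illustrative):
Azure CLI
az ml component create --file my-component.yml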
Next steps
How to upgrade from v1 to v2
Train models with the v2 CLI and SDK
What is an Azure Machine Learning
workspace?
Article • 04/12/2023
Create jobs - Jobs are training runs you use to build your models. You can group
jobs into experiments to compare metrics.
Author pipelines - Pipelines are reusable workflows for training and retraining your
model.
Register data assets - Data assets aid in management of the data you use for
model training and pipeline creation.
Register models - Once you have a model you want to deploy, you create a
registered model.
Create online endpoints - Use a registered model and a scoring script to create an
online endpoint.
Besides grouping your machine learning results, workspaces also host resource
configurations:
Organizing workspaces
For machine learning team leads and administrators, workspaces serve as containers for
access management, cost management and data isolation. Below are some tips for
organizing workspaces:
Use user roles for permission management in the workspace between users, for
example a data scientist, a machine learning engineer, or an admin.
Assign access to user groups: By using Azure Active Directory user groups, you
don't have to add individual users to each workspace or to other resources that the
same group of users requires access to.
Create a workspace per project: While a workspace can be used for multiple
projects, limiting it to one project per workspace allows for cost reporting accrued
to a project level. It also allows you to manage configurations like datastores in the
scope of each project.
Share Azure resources: Workspaces require you to create several associated
resources. Share these resources between workspaces to save repetitive setup
steps.
Enable self-serve: Pre-create and secure associated resources as an IT admin, and
use user roles to let data scientists create workspaces on their own.
Share assets: You can share assets between workspaces using Azure Machine
Learning registries.
Associated resources
When you create a new workspace, you're required to bring other Azure resources to
store your data. If you don't provide them, Azure Machine Learning automatically
creates these resources.
Azure Storage account . Stores machine learning artifacts such as job logs. By
default, this storage account is used when you upload data to the workspace.
Jupyter notebooks that are used with your Azure Machine Learning compute
instances are stored here as well.
Azure Container Registry . Stores created docker containers, when you build
custom environments via Azure Machine Learning. Scenarios that trigger creation
of custom environments include AutoML when deploying models and data
profiling.
7 Note
Workspaces can be created without Azure Container Registry as a dependency
if you do not have a need to build custom docker containers. To read
container images, Azure Machine Learning also works with external container
registries. Azure Container Registry is automatically provisioned when you
build custom docker images. Use Azure RBAC to prevent custom docker
containers from being built.
7 Note
If your subscription setting requires adding tags to resources under it, Azure
Container Registry (ACR) created by Azure Machine Learning will fail, since we
cannot set tags to ACR.
Azure Application Insights . Helps you monitor and collect diagnostic information
from your inference endpoints.
Azure Key Vault . Stores secrets that are used by compute targets and other
sensitive information that's needed by the workspace.
Create a workspace
There are multiple ways to create a workspace. To get started use one of the following
options:
The Azure Machine Learning studio lets you quickly create a workspace with
default settings.
Use Azure portal for a point-and-click interface with more security options.
Use the VS Code extension if you work in Visual Studio Code.
Use the Azure Machine Learning CLI or Azure Machine Learning SDK for Python for
prototyping and as part of your MLOps workflows.
On the web:
Azure Machine Learning studio
Azure Machine Learning designer
Workspace management tasks such as creating a workspace are supported in the portal,
studio, Python SDK, Azure CLI, and VS Code.
Sub resources
When you create compute clusters and compute instances in Azure Machine Learning,
sub resources are created.
VMs: provide computing power for compute instances and compute clusters,
which you use to run jobs.
Load Balancer: a network load balancer is created for each compute instance and
compute cluster to manage traffic even while the compute instance/cluster is
stopped.
Virtual Network: these help Azure resources communicate with one another, the
internet, and other on-premises networks.
Bandwidth: encapsulates all outbound data transfers across regions.
Next steps
To learn more about planning a workspace for your organization's requirements, see
Organize and set up Azure Machine Learning.
Use the search bar to find machine learning assets across all workspaces, resource
groups, and subscriptions in your organization. Your search text will be used to find
assets such as:
Jobs
Models
Components
Environments
Data
2. In the top studio titlebar, if a workspace is open, select This workspace or All
workspaces to set the search context.
3. Type your text and press Enter to trigger a 'contains' search. A contains search scans
across all metadata fields for the given asset and sorts results by relevancy score,
which is determined by weightings for different column properties.
Structured search
1. Sign in to Azure Machine Learning studio .
2. In the top studio titlebar, select All workspaces.
3. Click inside the search field to display filters to create more specific search queries.
Job
Model
Component
Tags
SubmittedBy
Environment
Data
If an asset filter (job, model, component, environment, data) is present, results are
scoped to those tabs. Other filters apply to all assets unless an asset filter is also present
in the query. Similarly, free text search can be provided alongside filters, but it's scoped
to the tabs chosen by asset filters, if present.
Tip
Filters search for exact matches of text. Use free text queries for a contains
search.
Quotations are required around values that include spaces or other special
characters.
If duplicate filters are provided, only the first will be recognized in search
results.
Input text of any language is supported but filter strings must match the
provided options (ex. submittedBy:).
The tags filter can accept multiple key:value pairs separated by a comma (ex.
tags:"key1:value1, key2:value2").
If you've used this feature in a previous update, a search result error may occur. Reselect
your preferred workspaces in the Directory + Subscription + Workspace tab.
) Important
Search results may be unexpected for multiword terms in other languages (ex.
Chinese characters).
Customize search results
You can create, save and share different views for your search results.
Edit columns: Add, delete, and re-order columns in the current view's search results table.
Since each tab displays different columns, you customize views separately for each tab.
Next steps
What is an Azure Machine Learning workspace?
Data in Azure Machine Learning
What is an Azure Machine Learning
compute instance?
Article • 09/27/2023
Compute instances make it easy to get started with Azure Machine Learning
development and provide management and enterprise readiness capabilities for IT
administrators.
For compute instance Jupyter functionality to work, ensure that web socket
communication isn't disabled. Ensure your network allows websocket connections to
*.instances.azureml.net and *.instances.azureml.ms.
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Key benefits:
Productivity: You can build and deploy models using integrated notebooks and the
following tools in Azure Machine Learning studio: Jupyter, JupyterLab, and VS Code
(preview). Compute instance is fully integrated with the Azure Machine Learning
workspace and studio, so you can share notebooks and data with other data scientists
in the workspace.
Managed & secure: Reduce your security footprint and add compliance with enterprise
security requirements. Compute instances provide robust management policies and
secure networking configurations.
Preconfigured for ML: Save time on setup tasks with pre-configured and up-to-date ML
packages, deep learning frameworks, and GPU drivers.
Fully customizable: Broad support for Azure VM types, including GPUs, and persisted
low-level customization, such as installing packages and drivers, make advanced
scenarios a breeze. You can also use setup scripts to automate customization.
You can run notebooks from your Azure Machine Learning workspace, Jupyter ,
JupyterLab , or Visual Studio Code. VS Code Desktop can be configured to access your
compute instance. Or use VS Code for the Web, directly from the browser, and without
any required installations or dependencies.
We recommend you try VS Code for the Web to take advantage of the easy integration
and rich development environment it provides. VS Code for the Web gives you many of
the features of VS Code Desktop that you love, including search and syntax highlighting
while browsing and editing. For more information about using VS Code Desktop and VS
Code for the Web, see Launch Visual Studio Code integrated with Azure Machine
Learning (preview) and Work in VS Code remotely connected to a compute instance
(preview).
You can install packages and add kernels to your compute instance.
The following tools and environments are already installed on the compute instance:
Drivers: CUDA, cuDNN, NVIDIA, Blob FUSE
Azure CLI
Docker
Nginx
NCCL 2.0
Protobuf
R kernel (you can add RStudio or Posit Workbench, formerly RStudio Workbench, when
you create the instance)
Anaconda Python
Azure Machine Learning SDK for Python (from PyPI): includes azure-ai-ml and many
common azure extra packages. To see the full list, open a terminal window on your
compute instance and run conda list -n azureml_py310_sdkv2 ^azure
Accessing files
Notebooks and Python scripts are stored in the default storage account of your
workspace in Azure file share. These files are located under your "User files" directory.
This storage makes it easy to share notebooks between compute instances. The storage
account also keeps your notebooks safely preserved when you stop or delete a compute
instance.
The Azure file share account of your workspace is mounted as a drive on the compute
instance. This drive is the default working directory for Jupyter, Jupyter Labs, RStudio,
and Posit Workbench. This means that the notebooks and other files you create in
Jupyter, JupyterLab, VS Code for Web, RStudio, or Posit are automatically stored on the
file share and available to use in other compute instances as well.
The files in the file share are accessible from all compute instances in the same
workspace. Any changes to these files on the compute instance will be reliably persisted
back to the file share.
You can also clone the latest Azure Machine Learning samples to your folder under the
user files directory in the workspace file share.
Writing small files can be slower on network drives than writing to the compute instance
local disk itself. If you're writing many small files, try using a directory directly on the
compute instance, such as the /tmp directory. Note that these files won't be accessible
from other compute instances.
Don't store training data on the notebooks file share. For information on the various
options to store data, see Access data in a job.
You can use the /tmp directory on the compute instance for your temporary data.
However, don't write large files of data on the OS disk of the compute instance. The OS
disk on the compute instance has 128 GB of capacity. You can also store temporary
training data on the temporary disk mounted on /mnt. The temporary disk size is based
on the VM size chosen, so a larger VM size can store larger amounts of data. Any
software packages you install are saved on the OS disk of the compute instance. Note
that customer-managed key encryption is currently not supported for the OS disk. The
OS disk for the compute instance is encrypted with Microsoft-managed keys.
Create
Follow the steps in Create resources you need to get started to create a basic compute
instance.
As an administrator, you can create a compute instance for others in the workspace.
You can also use a setup script for an automated way to customize and configure the
compute instance.
The dedicated cores per region per VM family quota and total regional quota, which
applies to compute instance creation, is unified and shared with the Azure Machine
Learning training compute cluster quota. Stopping the compute instance doesn't release
quota, to ensure you can restart the compute instance. Don't stop the compute instance
through the OS terminal by running sudo shutdown.
Compute instance comes with P10 OS disk. Temp disk type depends on the VM size
chosen. Currently, it isn't possible to change the OS disk type.
Compute target
Compute instances can be used as a training compute target similar to Azure Machine
Learning compute training clusters. But a compute instance has only a single node,
while a compute cluster can have more nodes.
A compute instance:
You can use compute instance as a local inferencing deployment target for test/debug
scenarios.
Tip
The compute instance has a 120-GB OS disk. If you run out of disk space and get into
an unusable state, clear at least 5 GB of disk space on the OS disk (mounted on /)
through the compute instance terminal by removing files or folders, and then run sudo
reboot. The temporary disk is freed after restart; you don't need to clear space on the
temp disk manually. To access the terminal, go to the compute list page or compute
instance details page and select the Terminal link. You can check available disk space
by running df -h on the terminal. Don't stop or restart the compute instance through
the studio until 5 GB of disk space has been cleared. Auto shutdowns, including
scheduled start or stop as well as idle shutdowns, won't work if the CI disk is full.
Next steps
Create resources you need to get started.
Tutorial: Train your first ML model shows how to use a compute instance with an
integrated notebook.
What are compute targets in Azure
Machine Learning?
Article • 12/06/2023
A compute target is a designated compute resource or environment where you run your
training script or host your service deployment. This location might be your local
machine or a cloud-based compute resource. Using compute targets makes it easy for
you to later change your compute environment without having to change your code.
The compute resources you use for your compute targets are attached to a workspace.
Compute resources other than the local machine are shared by users of the workspace.
Compute targets can be reused from one training job to the next. For example, after
you attach a remote VM to your workspace, you can reuse it for multiple jobs. For
machine learning pipelines, use the appropriate pipeline step for each compute target.
You can use any of the following resources for a training compute target for most jobs.
Not all resources can be used for automated machine learning, machine learning
pipelines, or designer. Azure Databricks can be used as a training resource for local runs
and machine learning pipelines, but not as a remote target for other training.
Tip
The compute instance has 120GB OS disk. If you run out of disk space, use the
terminal to clear at least 1-2 GB before you stop or restart the compute instance.
The compute target you use to host your model will affect the cost and availability of
your deployed endpoint. Use this table to choose an appropriate compute target.
7 Note
When choosing a cluster SKU, first scale up and then scale out. Start with a machine
that has 150% of the RAM your model requires, profile the result and find a
machine that has the performance you need. Once you've learned that, increase the
number of machines to fit your need for concurrent inference.
There's no need to create serverless compute. You can create Azure Machine Learning
compute instances or compute clusters from the studio, the Python SDK, or the Azure CLI.
7 Note
Instead of creating a compute cluster, use serverless compute to offload compute
lifecycle management to Azure Machine Learning.
When created, these compute resources are automatically part of your workspace,
unlike other kinds of compute targets.
7 Note
For compute cluster make sure the minimum number of nodes is set to 0, or
use serverless compute.
For a compute instance, enable idle shutdown.
) Important
If your compute instance or compute clusters are based on any of these series,
recreate with another VM size before their retirement date to avoid service
disruption.
Azure NC-series
Azure NCv2-series
Azure ND-series
Azure NV- and NV_Promo series
When you select a node size for a managed compute resource in Azure Machine
Learning, you can choose from among select VM sizes available in Azure. Azure offers a
range of sizes for Linux and Windows for different workloads. To learn more, see VM
types and sizes.
While Azure Machine Learning supports these VM series, they might not be available in
all Azure regions. To check whether VM series are available, see Products available by
region .
7 Note
Azure Machine Learning doesn't support all VM sizes that Azure Compute supports.
To list the available VM sizes, use one of the following methods:
REST API
The Azure CLI extension 2.0 for machine learning command, az ml compute
list-sizes.
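For example, with CLI defaults already configured, a minimal sketch:
Azure CLI
az ml compute list-sizes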
If using the GPU-enabled compute targets, it is important to ensure that the correct
CUDA drivers are installed in the training environment. Use the following table to
determine the correct CUDA version to use:
In addition to ensuring the CUDA version and hardware are compatible, also ensure that
the CUDA version is compatible with the version of the machine learning framework you
are using:
For PyTorch, you can check the compatibility by visiting Pytorch's previous versions
page .
For Tensorflow, you can check the compatibility by visiting Tensorflow's build from
source page .
Compute isolation
Azure Machine Learning compute offers VM sizes that are isolated to a specific
hardware type and dedicated to a single customer. Isolated VM sizes are best suited for
workloads that require a high degree of isolation from other customers' workloads for
reasons that include meeting compliance and regulatory requirements. Utilizing an
isolated size guarantees that your VM will be the only one running on that specific
server instance.
Standard_M128ms
Standard_F72s_v2
Standard_NC24s_v3
Standard_NC24rs_v3*
*RDMA capable
To learn more about isolation, see Isolation in the Azure public cloud.
Unmanaged compute
An unmanaged compute target is not managed by Azure Machine Learning. You create
this type of compute target outside Azure Machine Learning and then attach it to your
workspace. Unmanaged compute resources can require additional steps for you to
maintain or to improve performance for machine learning workloads.
Kubernetes
Next steps
Learn how to:
The following diagram illustrates how you can use a single Environment object in both
your job configuration (for training) and your inference and deployment configuration
(for web service deployments).
The environment, compute target and training script together form the job
configuration: the full specification of a training job.
Types of environments
Environments can broadly be divided into three categories: curated, user-managed, and
system-managed.
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Intended to be used as is, they contain collections of Python
packages and settings to help you get started with various machine learning
frameworks. These pre-created environments also allow for faster deployment time. For
a full list, see the curated environments article.
You use system-managed environments when you want conda to manage the Python
environment for you. A new conda environment is materialized from your conda
specification on top of a base docker image.
For specific code samples, see the "Create an environment" section of How to use
environments.
Environments are also easily managed through your workspace, which allows you to:
Register environments.
Fetch environments from your workspace to use for training or deployment.
Create a new instance of an environment by editing an existing one.
View changes to your environments over time, which ensures reproducibility.
Build Docker images automatically from your environments.
For code samples, see the "Manage environments" section of How to use environments.
For local jobs, a Docker or conda environment is created based on the environment
definition. The scripts are then executed on the target compute - a local runtime
environment or local Docker engine.
The second step is optional, and the environment may instead come from the Docker
build context or base image. In this case you're responsible for installing any Python
packages, by including them in your base image, or specifying custom Docker steps.
You're also responsible for specifying the correct location for the Python executable. It is
also possible to use a custom Docker base image.
To view the details of a cached image, check the Environments page in Azure Machine
Learning studio or use MLClient.environments to get and inspect the environment.
To determine whether to reuse a cached image or build a new one, Azure Machine
Learning computes a hash value from the environment definition and compares it to
the hashes of existing environments. The hash is based on the environment definition's:
Base image
Custom docker steps
Python packages
Spark packages
The hash isn't affected by the environment name or version. If you rename your
environment or create a new one with the same settings and packages as another
environment, then the hash value will remain the same. However, environment
definition changes like adding or removing a Python package or changing a package
version will result cause the resulting hash value to change. Changing the order of
dependencies or channels in an environment will also change the hash and require a
new image build. Similarly, any change to a curated environment will result in the
creation of a new "non-curated" environment.
7 Note
You will not be able to submit any local changes to a curated environment without
changing the name of the environment. The prefixes "AzureML-" and "Microsoft"
are reserved exclusively for curated environments, and your job submission will fail
if the name starts with either of them.
The environment's computed hash value is compared with the hashes in the workspace
and global ACR, or on the compute target (local jobs only). If there's a match, the
cached image is pulled and used; otherwise, an image build is triggered.
The following diagram shows three environment definitions. Two of them have different
names and versions but identical base images and Python packages, which results in the
same hash and corresponding cached image. The third environment has different
Python packages and versions, leading to a different hash and cached image.
Actual cached images in your workspace ACR will have names like
azureml/azureml_e9607b2514b066c851012848913ba19f with the hash appearing at the end.
) Important
The base image is pulled again every time the latest tag is updated. This helps the
image receive the latest patches and system updates.
Image patching
Microsoft is responsible for patching the base images for known security vulnerabilities.
Updates for supported images are released every two weeks, with a commitment of no
unpatched vulnerabilities older than 30 days in the latest version of the image. Patched
images are released with a new immutable tag and the :latest tag is updated to the
latest version of the patched image.
You'll need to update associated Azure Machine Learning assets to use the newly
patched image. For example, when working with a managed online endpoint, you'll
need to redeploy your endpoint to use the patched image.
If you provide your own images, you're responsible for updating them and updating the
Azure Machine Learning assets that use them.
For more information on the base images, see the following links:
Next steps
Learn how to create and use environments in Azure Machine Learning.
See the Python SDK reference documentation for the environment class.
Manage software environments in Azure
Machine Learning studio
Article • 10/01/2023
In this article, learn how to create and manage Azure Machine Learning environments in
the Azure Machine Learning studio. Use the environments to track and reproduce your
projects' software dependencies as they evolve.
For a high-level overview of how environments work in Azure Machine Learning, see
What are ML environments? For information about setting up a development
environment, see How to set up a development environment for Azure Machine Learning.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
An Azure Machine Learning workspace.
Click on an environment to see detailed information about its contents. For more
information, see Azure Machine Learning curated environments.
Create an environment
To create an environment:
You can customize the configuration file, add tags and descriptions, and review the
properties before creating the entity.
Click the pencil icons to edit tags, descriptions, and configuration files under the Context
tab.
Keep in mind that any changes to the Docker or Conda sections will create a new
version of the environment.
View logs
Click on the Build log tab within the details page to view the logs of an environment
version and the environment log analysis. Environment log analysis is a feature that
provides insight and relevant troubleshooting documentation to explain environment
definition issues or image build failures.
Build log contains the bare output from an Azure Container Registry (ACR) task or
an Image Build Compute job.
Image build analysis is an analysis of the build log used to see the cause of the
image build failure.
Environment definition analysis provides information about the environment
definition if it goes against best practices for reproducibility, supportability, or
security.
For an overview of common build failures, see How to troubleshoot for environments .
If you have feedback on the environment log analysis, file a GitHub issue .
Rebuild an environment
In the details page, click on the rebuild button to rebuild the environment. Any
unpinned package versions in your configuration files may be updated to the most
recent version with this action.
Manage Azure Machine Learning
environments with the CLI & SDK (v2)
Article • 01/03/2024
Azure Machine Learning environments define the execution environments for your jobs
or deployments and encapsulate the dependencies for your code. Azure Machine
Learning uses the environment specification to create the Docker container that your
training or scoring code runs in on the specified compute target. You can define an
environment from a conda specification, Docker image, or Docker build context.
In this article, learn how to create and manage Azure Machine Learning environments
using the SDK & CLI (v2).
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
Quickstart: Create workspace resources article to create one.
The Azure CLI and the ml extension or the Azure Machine Learning Python SDK v2:
To install the Azure CLI and extension, see Install, set up, and use the CLI (v2).
) Important
The CLI examples in this article assume that you are using the Bash (or
compatible) shell. For example, from a Linux system or Windows
Subsystem for Linux.
Bash
pip install azure-ai-ml azure-identity
To upgrade an existing installation of the SDK to the latest version:
Bash
pip install --upgrade azure-ai-ml azure-identity
For more information, see Install the Python SDK v2 for Azure Machine
Learning .
Tip
For a full-featured development environment, use Visual Studio Code and the
Azure Machine Learning extension to manage Azure Machine Learning resources
and train machine learning models.
Azure CLI
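The CLI examples in this article are typically run from a local clone of a samples
repository. As a sketch, the public azureml-examples repository can be fetched
like this:
cli
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli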
Note that --depth 1 clones only the latest commit to the repository, which reduces time
to complete the operation.
Tip
Use the tabs below to select the method you want to use to work with
environments. Selecting a tab will automatically switch all the tabs in this article to
the same tab. You can select another tab at any time.
Azure CLI
When using the Azure CLI, you need identifier parameters - a subscription, resource
group, and workspace name. While you can specify these parameters for each
command, you can also set defaults that will be used for all the commands. Use the
following commands to set default values. Replace <subscription ID> , <Azure
Machine Learning workspace name> , and <resource group> with the values for your
configuration:
Azure CLI
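A minimal sketch of those commands, using the placeholders named above:
cli
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>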
Curated environments
There are two types of environments in Azure Machine Learning: curated and custom
environments. Curated environments are predefined environments containing popular
ML frameworks and tooling. Custom environments are user-defined and can be created
via az ml environment create .
Curated environments are provided by Azure Machine Learning and are available in your
workspace by default. Azure Machine Learning routinely updates these environments
with the latest framework version releases and maintains them for bug fixes and security
patches. They're backed by cached Docker images, which reduce job preparation cost
and model deployment time.
You can use these curated environments out of the box for training or deployment by
referencing a specific environment using the azureml:<curated-environment-name>:
<version> or azureml:<curated-environment-name>@latest syntax. You can also use them
as reference for your own custom environments by modifying the Dockerfiles that back
these curated environments.
You can see the set of available curated environments in the Azure Machine Learning
studio UI, or by using the CLI (v2) via az ml environment list .
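For example, to look at the details of one curated environment from the CLI (a
sketch; the environment name below is illustrative, since the available set of
curated environments changes over time):
cli
az ml environment show --name AzureML-sklearn-1.0-ubuntu20.04-py38-cpu --version 1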
Create an environment
You can define an environment from a Docker image, a Docker build context, or a
conda specification together with a Docker image.
Create an environment from a Docker image
To define an environment from a Docker image, provide the image URI of the image
hosted in a registry such as Docker Hub or Azure Container Registry.
Azure CLI
The following example is a YAML specification file for an environment defined from
a Docker image. An image from the official PyTorch repository on Docker Hub is
specified via the image property in the YAML file.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-example
image: pytorch/pytorch:latest
description: Environment created from a Docker image.
To register this environment from the specification file, you can use az ml
environment create (the file name here assumes the YAML above is saved as
docker-image-example.yml):
cli
az ml environment create --file docker-image-example.yml
Tip
Azure Machine Learning maintains a set of CPU and GPU Ubuntu Linux-based base
images with common system dependencies. For example, the GPU images contain
Miniconda, OpenMPI, CUDA, cuDNN, and NCCL. You can use these images for your
environments, or use their corresponding Dockerfiles as reference when building
your own custom images.
For the set of base images and their corresponding Dockerfiles, see the AzureML-
Containers repo .
Azure CLI
The following example is a YAML specification file for an environment defined from
a build context. The local path to the build context folder is specified in the
build.path field, and the relative path to the Dockerfile within that build context
can be specified in the build.dockerfile_path field. If build.dockerfile_path is
omitted, Azure Machine Learning looks for a Dockerfile named Dockerfile at the
root of the build context. In this example, the build context contains a Dockerfile
named Dockerfile and a requirements.txt file that is referenced within the
Dockerfile for installing Python packages.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-context-example
build:
path: docker-contexts/python-and-pip
To create the environment, pass the specification file to az ml environment
create in the same way (the file name is an assumed one for the YAML above):
cli
az ml environment create --file docker-context-example.yml
Azure Machine Learning will start building the image from the build context when the
environment is created. You can monitor the status of the build and view the build logs
in the studio UI.
You must also specify a base Docker image for this environment. Azure Machine
Learning will build the conda environment on top of the Docker image provided. If you
install Python dependencies in your Docker image, those packages won't be available
in the execution environment, causing runtime failures. By default, Azure Machine
Learning will build a Conda environment with dependencies you specified, and will
execute the job in that environment instead of using any Python libraries that you
installed on the base image.
Azure CLI
The following example is a YAML specification file for an environment defined from
a conda specification. Here the relative path to the conda file from the Azure
Machine Learning environment YAML file is specified via the conda_file property.
You can alternatively define the conda specification inline using the conda_file
property, rather than defining it in a separate file.
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: conda-yamls/pydata.yml
description: Environment created from a Docker image plus Conda
environment.
As with the other definitions, the environment can be created from the
specification file (again, the file name is an assumed one):
cli
az ml environment create --file docker-image-plus-conda-example.yml
Azure Machine Learning will build the final Docker image from this environment
specification when the environment is used in a job or deployment. You can also
manually trigger a build of the environment in the studio UI.
Manage environments
The SDK and CLI (v2) also allow you to manage the lifecycle of your Azure Machine
Learning environment assets.
List
List all the environments in your workspace:
Azure CLI
cli
az ml environment list
List all the environment versions under a specific name:
Azure CLI
cli
az ml environment list --name docker-image-example
Show
Get the details of a specific environment:
Azure CLI
cli
az ml environment show --name docker-image-example --version 1
Update
Update mutable properties of a specific environment:
Azure CLI
cli
az ml environment update --name docker-image-example --version 1 --set description="This is an updated description."
For environments, only description and tags can be updated. All other properties
are immutable; if you need to change any of those properties you should create a
new version of the environment.
Archive
Archiving an environment will hide it by default from list queries ( az ml environment
list ). You can continue to reference and use an archived environment in your
workflows. You can archive either all versions of an environment or only a specific
version.
If you don't specify a version, all versions of the environment under that given name will
be archived. If you create a new environment version under an archived environment
container, that new version will automatically be set as archived as well.
Archive all versions of an environment:
Azure CLI
cli
az ml environment archive --name docker-image-example
Archive a specific version of an environment:
Azure CLI
cli
az ml environment archive --name docker-image-example --version 1
When you submit a training job, the building of a new environment can take several
minutes. The duration depends on the size of the required dependencies. The
environments are cached by the service. So as long as the environment definition
remains unchanged, you incur the full setup time only once.
For more information on how to use environments in jobs, see Train models.
You can also use environments for your model deployments for both online and
batch scoring. To do so, specify the environment field in the deployment YAML
configuration.
For more information on how to use environments in deployments, see Deploy and
score a machine learning model by using an online endpoint.
Next steps
Train models (create jobs)
Deploy and score a machine learning model by using an online endpoint
Environment YAML schema reference
Create custom curated Azure Container
for PyTorch (ACPT) environments in
Azure Machine Learning studio
Article • 03/21/2023
If you want to extend a curated environment, for example to add Hugging Face (HF)
transformers, datasets, or any other external packages, Azure Machine Learning lets
you create a new environment from a Docker context that uses the ACPT curated
environment as the base image and installs the additional packages on top of it, as
described below.
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
Quickstart: Create workspace resources article to create one.
Navigate to environments
In the Azure Machine Learning studio , navigate to the "Environments" section by
selecting the "Environments" option.
Paste the docker image name that you copied previously. Configure your
environment by declaring the base image, and add any environment variables you
want to use and the packages that you want to include.
Review your environment settings, add any tags if needed, and select the Create
button to create your custom environment.
That's it! You've now created a custom environment in Azure Machine Learning studio
and can use it to run your machine learning models.
Next steps
Learn more about environment objects:
What are Azure Machine Learning environments? .
Learn more about curated environments.
Learn more about training models in Azure Machine Learning.
Azure Container for PyTorch (ACPT) reference
How to create and manage files in your
workspace
Article • 04/13/2023
Learn how to create and manage the files in your Azure Machine Learning workspace.
These files are stored in the default workspace storage. Files and folders can be shared
with anyone else with read access to the workspace, and can be used from any compute
instances in the workspace.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. Create workspace resources.
Create files
To create a new file in your default folder ( Users > yourname ):
5. Name the file.
7. Select Create.
Notebooks and most text file types display in the preview section. Most other file types
don't have a preview.
Tip
If you don't see the correct preview for a notebook, make sure it has .ipynb as its
extension. Hover over the filename in the list to select ... if you need to rename the
file.
) Important
Content in notebooks and scripts can potentially read data from your sessions and
access data in your organization in Azure without authorization. Only load files from
trusted sources. For more information, see Secure code best practices.
For example, choose "Indent using spaces" if you want your editor to auto-indent with
spaces instead of tabs. Take a few moments to explore the different options you have in
the Command Palette.
Clone samples
Your workspace contains a Samples folder with notebooks designed to help you explore
the SDK and serve as examples for your own machine learning projects. Clone these
notebooks into your own folder to run and edit them.
Share files
Copy and paste the URL to share a file. Only other users of the workspace can access
this URL. Learn more about granting access to your workspace.
Delete a file
You can't delete the Samples files. These files are part of the studio and are updated
each time a new SDK is published.
You can delete files found in your Files section in any of these ways:
In the studio, select the ... at the end of a folder or file. Make sure to use a
supported browser (Microsoft Edge, Chrome, or Firefox).
Use a terminal from any compute instance in your workspace. The folder
~/cloudfiles is mapped to storage on your workspace storage account (see the
example after this list).
In either Jupyter or JupyterLab with their tools.
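For instance, from a compute instance terminal, a file can be removed with a standard
shell command (a sketch; the username and file name are placeholders):
shell
rm ~/cloudfiles/code/Users/<your-username>/old-notebook.ipynb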
Next steps
Run Jupyter notebooks in your workspace
Access a compute instance terminal in your workspace
Run Jupyter notebooks in your
workspace
Article • 09/26/2023
This article shows how to run your Jupyter notebooks inside your workspace of Azure
Machine Learning studio. There are other ways to run the notebook as well: Jupyter ,
JupyterLab , and Visual Studio Code. VS Code Desktop can be configured to access
your compute instance. Or use VS Code for the Web, directly from the browser, and
without any required installations or dependencies.
We recommend you try VS Code for the Web to take advantage of the easy integration
and rich development environment it provides. VS Code for the Web gives you many of
the features of VS Code Desktop that you love, including search and syntax highlighting
while browsing and editing. For more information about using VS Code Desktop and VS
Code for the Web, see Launch Visual Studio Code integrated with Azure Machine
Learning (preview) and Work in VS Code remotely connected to a compute instance
(preview).
No matter which solution you use to run the notebook, you'll have access to all the files
from your workspace. For information on how to create and manage files, including
notebooks, see Create and manage files in your workspace.
The rest of this article shows the experience for running the notebook directly in studio.
) Important
Features marked as (preview) are provided without a service level agreement and
aren't recommended for production workloads. Certain features might not be
supported or might have constrained capabilities. For more information, see
Supplemental Terms of Use for Microsoft Azure Previews .
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. See Create workspace resources.
Your user identity must have access to your workspace's default storage account.
Whether you can read, edit, or create notebooks depends on your access level to
your workspace. For example, a Contributor can edit the notebook, while a Reader
could only view it.
Edit a notebook
To edit a notebook, open any notebook located in the User files section of your
workspace. Select the cell you wish to edit. If you don't have any notebooks in this
section, see Create and manage files in your workspace.
You can edit the notebook without connecting to a compute instance. When you want
to run the cells in the notebook, select or create a compute instance. If you select a
stopped compute instance, it will automatically start when you run the first cell.
When a compute instance is running, you can also use code completion, powered by
Intellisense , in any Python notebook.
You can also launch Jupyter or JupyterLab from the notebook toolbar. Azure Machine
Learning doesn't provide updates or fix bugs for Jupyter or JupyterLab, as they're
open-source products outside the boundary of Microsoft support.
Focus mode
Use focus mode to expand your current view so you can focus on your active tabs.
Focus mode hides the Notebooks file explorer.
1. In the notebook toolbar, select Focus mode to turn on focus mode. Depending
on your window width, the tool may be located under the ... menu item in your
toolbar.
2. While in focus mode, return to the standard view by selecting Standard view.
Code completion (IntelliSense)
IntelliSense is a code-completion aid that includes many features: List Members,
Parameter Info, Quick Info, and Complete Word. With only a few keystrokes, you can:
Share a notebook
Your notebooks are stored in your workspace's storage account, and can be shared with
others, depending on their access level to your workspace. They can open and edit the
notebook as long as they have the appropriate access. For example, a Contributor can
edit the notebook, while a Reader could only view it.
Other users of your workspace can find your notebook in the Notebooks, User files
section of Azure Machine Learning studio. By default, your notebooks are in a folder
with your username, and others can access them there.
You can also copy the URL from your browser when you open a notebook, then send to
others. As long as they have appropriate access to your workspace, they can open the
notebook.
Since you don't share compute instances, other users who run your notebook will do so
on their own compute instance.
Whether the comments pane is visible or not, you can add a comment into any code
cell:
1. Select some text in the code cell. You can only comment on text in a code cell.
2. Use the New comment thread tool to create your comment.
Text that has been commented will appear with a purple highlight in the code. When
you select a comment in the comments pane, your notebook will scroll to the cell that
contains the highlighted text.
7 Note
The new notebook contains only code cells, with all cells required to produce the same
results as the cell you selected for gathering.
In the notebook toolbar, select the menu and then File>Save and checkpoint to
manually save the notebook and it will add a checkpoint file associated with the
notebook.
Every notebook is autosaved every 30 seconds. AutoSave updates only the initial
.ipynb file, not the checkpoint file.
Select Checkpoints in the notebook menu to create a named checkpoint and to revert
the notebook to a saved checkpoint.
Export a notebook
In the notebook toolbar, select the menu and then Export As to export the notebook as
any of the supported types:
Notebook
Python
HTML
LaTeX
The exported file is saved on your computer.
If you don't have a compute instance, use these steps to create one:
Once you're connected to a compute instance, use the toolbar to run all cells in the
notebook, or Control + Enter to run a single selected cell.
Only you can see and use the compute instances you create. Your User files are stored
separately from the VM and are shared among all compute instances in the workspace.
These actions won't change the notebook state or the values of any variables in the
notebook:
Action | Result
Stop the kernel | Stops any running cell. Running a cell will automatically restart the kernel.
These actions will reset the notebook state and will reset all variables in the notebook.
Use the kernel dropdown on the right to change to any of the installed kernels.
Manage packages
Since your compute instance has multiple kernels, make sure to use the %pip or
%conda magic functions, which install packages into the currently running kernel.
Don't use !pip or !conda, which refer to all packages (including packages outside the
currently running kernel).
Status indicators
An indicator next to the Compute dropdown shows its status. The status is also shown
in the dropdown itself.
Shortcut | Description
O | Toggle output
I, I | Interrupt kernel
0, 0 | Restart kernel
Tab | Change focus to next focusable item (when tab trap disabled)
1 | Change to h1
2 | Change to h2
3 | Change to h3
4 | Change to h4
5 | Change to h5
6 | Change to h6
Edit mode shortcuts
Edit mode is indicated by a text cursor prompting you to type in the editor area. When a
cell is in edit mode, you can type into the cell. Enter edit mode by pressing Enter or
select a cell's editor area. The left border of the active cell is green and hatched, and its
Run button is green. You also see the cursor prompt in the cell in Edit mode.
Using the following keystroke shortcuts, you can more easily navigate and run code in
Azure Machine Learning notebooks when in Edit mode.
Shortcut | Description
Control/Command + ] | Indent
Control/Command + [ | Dedent
Control/Command + Z | Undo
Control/Command + Y | Redo
Troubleshooting
Connecting to a notebook: If you can't connect to a notebook, ensure that web
socket communication is not disabled. For compute instance Jupyter functionality
to work, web socket communication must be enabled. Ensure your network allows
websocket connections to *.instances.azureml.net and *.instances.azureml.ms.
Kernel crash: If your kernel crashed and was restarted, you can run the following
command to look at Jupyter log and find out more details: sudo journalctl -u
jupyter . If kernel issues persist, consider using a compute instance with more
memory.
Kernel not found or Kernel operations were disabled: When using the default
Python 3.8 kernel on a compute instance, you may get an error such as "Kernel not
found" or "Kernel operations were disabled". To fix, use one of the following
methods:
Create a new compute instance. This will use a new image where this problem
has been resolved.
Use the Py 3.6 kernel on the existing compute instance.
From a terminal in the default py38 environment, run pip install
ipykernel==6.6.0 OR pip install ipykernel==6.0.3
Expired token: If you run into an expired token issue, sign out of your Azure
Machine Learning studio, sign back in, and then restart the notebook kernel.
File upload limit: When uploading a file through the notebook's file explorer,
you're limited to files that are smaller than 5 TB. If you need to upload a file larger
than this, we recommend that you use the SDK to upload the data to a datastore.
For more information, see Create data assets.
Next steps
Run your first experiment
Backup your file storage with snapshots
Working in secure environments
Access a compute instance terminal in
your workspace
Article • 12/28/2023
Use the terminal of a compute instance in your workspace to:
Use files from Git and version files. These files are stored in your workspace file
system, not restricted to a single compute instance.
Install packages on the compute instance.
Create extra kernels on the compute instance.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
A Machine Learning workspace. See Create workspace resources.
Access a terminal
To access the terminal:
4. When a compute instance is running, the terminal window for that compute
instance appears.
5. When no compute instance is running, use the Compute section on the right to
start or create a compute instance.
In addition to the steps above, you can also access the terminal from:
7 Note
Add your files and folders anywhere under the ~/cloudfiles/code/Users folder so
they will be visible in all your Jupyter environments.
To integrate Git with your Azure Machine Learning workspace, see Git integration for
Azure Machine Learning.
Install packages
Install packages from a terminal window. Install Python packages into the Python 3.8 -
AzureML environment. Install R packages into the R environment.
Or you can install packages directly in Jupyter Notebook, RStudio, or Posit Workbench
(formerly RStudio Workbench):
7 Note
For package management within a notebook, use the %pip or %conda magic
functions to automatically install packages into the currently running kernel,
rather than !pip or !conda, which refer to all packages (including packages
outside the currently running kernel).
2 Warning
While customizing the compute instance, make sure you do not delete the
azureml_py36 or azureml_py38 conda environments. Also do not delete Python
3.6 - AzureML or Python 3.8 - AzureML kernels. These are needed for
Jupyter/JupyterLab functionality.
1. Use the terminal window to create a new environment. For example, the code
below creates newenv :
shell
conda create --name newenv
2. Activate the new environment:
shell
conda activate newenv
3. Install the pip and ipykernel packages in the new environment, and create a kernel
for that conda env:
shell
conda install pip
conda install ipykernel
python -m ipykernel install --user --name newenv --display-name "Python (newenv)"
1. Use the terminal window to create a new environment. For example, the code
below creates r_env :
shell
conda create -n r_env r-essentials r-base
2. Activate the new environment:
shell
conda activate r_env
It will take a few minutes before the new R kernel is ready to use. If you get an error
saying it is invalid, wait and then try again.
For more information about conda, see Using R language with Anaconda . For more
information about IRkernel, see Native R kernel for Jupyter .
2 Warning
While customizing the compute instance, make sure you do not delete the
azureml_py36 or azureml_py38 conda environments. Also do not delete Python
3.6 - AzureML or Python 3.8 - AzureML kernels. These are needed for
Jupyter/JupyterLab functionality.
To remove an added Jupyter kernel from the compute instance, you must remove the
kernelspec and, optionally, the conda environment. You can choose to keep the conda
environment, but you must remove the kernelspec, or your kernel will still be
selectable and cause unexpected behavior.
1. List the kernels available on the compute instance:
shell
jupyter kernelspec list
2. Remove the kernelspec, replacing UNWANTED_KERNEL with the kernel you'd like
to remove:
shell
jupyter kernelspec remove UNWANTED_KERNEL
1. Use the terminal window to list and find the conda environment:
shell
conda env list
2. Remove the conda environment, replacing ENV_NAME with the name of the
environment you'd like to remove:
shell
conda env remove -n ENV_NAME
Upon refresh, the kernel list in your notebooks view should reflect the changes you have
made.
Select Manage active sessions in the terminal toolbar to see a list of all active terminal
sessions and shut down the sessions you no longer need.
Learn more about how to manage sessions running on your compute at Managing
notebook and terminal sessions.
2 Warning
Make sure you close any sessions you no longer need to preserve your compute
instance's resources and optimize your performance.
Manage notebook and terminal sessions
Article • 01/19/2023
Notebook and terminal sessions run on the compute and maintain your current working
state.
When you reopen a notebook, or reconnect to a terminal session, you can reconnect to
the previous session state (including command history, execution history, and defined
variables). However, too many active sessions may slow down the performance of your
compute. With too many active sessions, you may find that typing in a terminal or
notebook cell lags, or that terminal or notebook command execution feels slower than
expected.
Use the session management panel in Azure Machine Learning studio to help you
manage your active sessions and optimize the performance of your compute instance.
Navigate to this session management panel from the compute toolbar of either a
terminal tab or a notebook tab.
7 Note
For optimal performance, we recommend you don’t keep more than six active
sessions - and the fewer the better.
Notebook sessions
In the session management panel, select a linked notebook name in the notebook
sessions section to reopen a notebook with its previous state.
Notebook sessions are kept active when you close a notebook tab in the Azure Machine
Learning studio. So, when you reopen a notebook you'll have access to previously
defined variables and execution state - in this case, you're benefitting from the active
notebook session.
However, keeping too many active notebook sessions can slow down the performance
of your compute. So, you should use the session management panel to shut down any
notebook sessions you no longer need.
Select Manage active sessions in the terminal toolbar to open the session management
panel and shut down the sessions you no longer need. In the following image, you can
see that the tooltip shows the count of active notebook sessions.
Terminal sessions
In the session management panel, you can select on a terminal link to reopen a terminal
tab connected to that previous terminal session.
In contrast to notebook sessions, terminal sessions are terminated when you close a
terminal tab. However, if you navigate away from the Azure Machine Learning studio
without closing a terminal tab, the session may remain open. Shut down any
terminal sessions you no longer need by using the session management panel.
Select Manage active sessions in the terminal toolbar to open the session management
panel and shut down the sessions you no longer need. In the following image, you can
see that the tooltip shows the count of active terminal sessions.
Next steps
How to create and manage files in your workspace
Run Jupyter notebooks in your workspace
Access a compute instance terminal in your workspace
Launch Visual Studio Code integrated
with Azure Machine Learning (preview)
Article • 06/15/2023
In this article, you learn how to launch Visual Studio Code remotely connected to an
Azure Machine Learning compute instance. Use VS Code as your integrated
development environment (IDE) with the power of Azure Machine Learning resources.
Use VS Code in the browser with VS Code for the Web, or use the VS Code desktop
application.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
There are two ways you can connect to a compute instance from Visual Studio Code. We
recommend the first approach.
1. VS Code integrated with Azure Machine Learning studio. You can open VS Code
from your workspace, either in the browser with VS Code for the Web or in the
desktop application with VS Code Desktop.
We recommend VS Code for the Web, as you can do all your machine learning
work directly from the browser, without any required installations or
dependencies.
2. Remote Jupyter Notebook server. This option allows you to set a compute
instance as a remote Jupyter Notebook server. This option is only available in VS
Code (Desktop).
) Important
2. Sign in to studio and select your workspace if it's not already open.
3. In the Manage preview features panel, scroll down and enable Connect compute
instances to Visual Studio Code for the Web.
VS Code for the Web provides you with a full-featured development environment
for building your machine learning projects, all from the browser and without
required installations or dependencies. And by connecting your Azure Machine
Learning compute instance, you get the rich and integrated development
experience VS Code offers, enhanced by the power of Azure Machine Learning.
Launch VS Code for the Web with one selection from the Azure Machine Learning
studio, and seamlessly continue your work.
Sign in to Azure Machine Learning studio and follow the steps to launch a VS
Code (Web) browser tab, connected to your Azure Machine Learning compute
instance.
You can create the connection from either the Notebooks or Compute section of
Azure Machine Learning studio.
Notebooks
3. If the compute instance is stopped, select Start compute and wait until
it's running.
Compute
If you pick one of the click-out experiences, a new VS Code window opens and a
connection attempt is made to the remote compute instance. When attempting to make
this connection, the following steps take place:
1. Authorization. Some checks are performed to make sure the user attempting to
make a connection is authorized to use the compute instance.
2. VS Code Remote Server is installed on the compute instance.
3. A WebSocket connection is established for real-time interaction.
Once the connection is established, it's persisted. A token is issued at the start of the
session, which gets refreshed automatically to maintain the connection with your
compute instance.
After you connect to your remote compute instance, use the editor to:
Author and manage files on your remote compute instance or file share .
Use the VS Code integrated terminal to run commands and applications on your
remote compute instance.
Debug your scripts and applications
Use VS Code to manage your Git repositories
Azure Machine Learning Visual Studio Code extension. For more information, see
the Azure Machine Learning Visual Studio Code Extension setup guide.
3. Choose Azure ML Compute Instances from the list of Jupyter server options.
4. Select your subscription from the list of subscriptions. If you have previously
configured your default Azure Machine Learning workspace, this step is skipped.
6. Select your compute instance from the list. If you don't have one, select Create
new Azure Machine Learning Compute Instance and follow the prompts to create
one.
7. For the changes to take effect, you have to reload Visual Studio Code.
) Important
At this point, you can continue to run cells in your Jupyter Notebook.
Tip
You can also work with Python script files (.py) containing Jupyter-like code cells.
For more information, see the Visual Studio Code Python interactive
documentation .
Next steps
Now that you've launched Visual Studio Code remotely connected to a compute
instance, you can prep your data, edit and debug your code, and submit training jobs
with the Azure Machine Learning extension.
To learn more about how to make the most of VS Code integrated with Azure Machine
Learning, see Work in VS Code remotely connected to a compute instance (preview).
Work in VS Code remotely connected to
a compute instance (preview)
Article • 05/23/2023
In this article, learn specifics of working within a VS Code remote connection to an Azure
Machine Learning compute instance. Use VS Code as your full-featured integrated
development environment (IDE) with the power of Azure Machine Learning resources.
You can work with a remote connection to your compute instance in the browser with
VS Code for the Web, or the VS Code desktop application.
We recommend VS Code for the Web, as you can do all your machine learning
work directly from the browser, and without any required installations or
dependencies.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
) Important
Prerequisites
Before you get started, you will need:
When you use VS Code for the Web, the latest versions of these extensions are
automatically available to you. If you use the desktop application, you may need to
install them.
When you launch VS Code connected to a compute instance for the first time, make
sure you follow these steps and take a few moments to orient yourself to the tools in
your integrated development environment.
2. Once your subscriptions are listed, you can filter to the ones you use frequently.
You can also pin workspaces you use most often within the subscriptions.
3. The workspace you launched the VS Code remote connection from (the workspace
the compute instance is in) should be automatically set as the default. You can
update the default workspace from the VS Code status bar.
4. If you plan to use the Azure Machine Learning CLI, open a terminal from the menu,
and sign in to the Azure Machine Learning CLI using az login --identity .
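As a sketch, that sign-in is a single command in the integrated terminal; the
--identity flag authenticates with the compute instance's managed identity instead
of an interactive prompt:
Bash
az login --identity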
Subsequent times you connect to this compute instance, you shouldn't have to repeat
these steps.
Connect to a kernel
There are a few ways to connect to a Jupyter kernel from VS Code. It's important to
understand the differences in behavior, and the benefits of the different approaches.
If you have already opened this notebook in Azure Machine Learning, we recommend
you connect to an existing session on the compute instance. This action reconnects to
an existing session you had for this notebook in Azure Machine Learning.
1. Locate the kernel picker in the upper right-hand corner of your notebook and
select it.
2. Choose the 'Azure Machine Learning compute instance' option, and then
'Remote' if you've connected before.
While there are a few ways to connect and manage kernels in VS Code, connecting to an
existing kernel session is the recommended way to enable a seamless transition from
the Azure Machine Learning studio to VS Code. If you plan to mostly work within VS
Code, you can make use of any kernel connection approach that works for you.
Next steps
For more information on managing Jupyter kernels in VS Code, see Jupyter kernel
management .
Manage Azure Machine Learning
resources with the VS Code Extension
(preview)
Article • 04/04/2023
Learn how to manage Azure Machine Learning resources with the VS Code extension.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning .
Visual Studio Code. If you don't have it, install it .
Azure Machine Learning extension. Follow the Azure Machine Learning VS Code
extension installation guide to set up the extension.
Create resources
The quickest way to create resources is using the extension's toolbar.
Version resources
Some resources, like environments and models, allow you to make changes to a
resource and store the different versions.
To version a resource:
1. Use the existing specification file that created the resource or follow the create
resources process to create a new specification file.
2. Increment the version number in the template.
3. Right-click the specification file and select AzureML: Execute YAML.
As long as the name of the updated resource is the same as the previous version, Azure
Machine Learning picks up the changes and creates a new version.
Workspaces
For more information, see workspaces.
Create a workspace
1. In the Azure Machine Learning view, right-click your subscription node and select
Create Workspace.
2. A specification file appears. Configure the specification file.
3. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Workspace command in the command palette.
Remove workspace
1. Expand the subscription node that contains your workspace.
2. Right-click the workspace you want to remove.
3. Select whether you want to remove:
Only the workspace: This option deletes only the workspace Azure resource.
The resource group, storage accounts, and any other resources the
workspace was attached to are still in Azure.
With associated resources: This option deletes the workspace and all
resources associated with it.
Alternatively, use the > Azure ML: Remove Workspace command in the command palette.
Datastores
The extension currently supports datastores of the following types:
Azure Blob
Azure Data Lake Gen 1
Azure Data Lake Gen 2
Azure File
Create a datastore
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the datastore under.
3. Right-click the Datastores node and select Create Datastore.
4. Choose the datastore type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Datastore command in the command palette.
Manage a datastore
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datastores node inside your workspace.
4. Right-click the datastore you want to:
Alternatively, use the > Azure ML: Unregister Datastore and > Azure ML: View
Datastore commands respectively in the command palette.
Environments
For more information, see environments.
Create environment
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the environment under.
3. Right-click the Environments node and select Create Environment.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Environment command in the command
palette.
Alternatively, use the > Azure ML: View Environment command in the command palette.
Create job
The quickest way to create a job is by clicking the Create Job icon in the extension's
activity bar.
Alternatively, use the > Azure ML: Create Job command in the command palette.
View job
To view your job in Azure Machine Learning studio:
Alternatively, use the > Azure ML: View Experiment in Studio command in the
command palette.
Alternatively, use the > Azure ML: Download Outputs and > Azure ML: Download Logs
commands respectively in the command palette.
Compute instances
For more information, see compute instances.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: Stop Compute instance and Restart Compute instance
commands respectively in the command palette.
Alternatively, use the AzureML: View Compute instance Properties command in the
command palette.
Alternatively, use the AzureML: Delete Compute instance command in the command
palette.
Compute clusters
For more information, see training compute targets.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: View Compute Properties command in the command
palette.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Inference Clusters
For more information, see compute targets for inference.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Attached Compute
For more information, see unmanaged compute.
Alternatively, use the > Azure ML: View Compute Properties and > Azure ML: Detach
Compute commands respectively in the command palette.
Models
For more information, see train machine learning models.
Create model
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Models node in your workspace and select Create Model.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Model command in the command palette.
Alternatively, use the > Azure ML: View Model Properties command in the command
palette.
Download model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to download and select Download Model File.
Alternatively, use the > Azure ML: Download Model File command in the command
palette.
Delete a model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to delete and select Remove Model.
4. A prompt appears confirming you want to remove the model. Select Ok.
Alternatively, use the > Azure ML: Remove Model command in the command palette.
Endpoints
For more information, see endpoints.
Create endpoint
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Endpoints node in your workspace and select Create Endpoint.
4. Choose your endpoint type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Endpoint command in the command palette.
Delete endpoint
1. Expand the subscription node that contains your workspace.
2. Expand the Endpoints node inside your workspace.
3. Right-click the deployment you want to remove and select Remove Service.
4. A prompt appears confirming you want to remove the service. Select Ok.
Alternatively, use the > Azure ML: Remove Service command in the command palette.
Alternatively, use the > Azure ML: View Service Properties command in the command
palette.
Next steps
Train an image classification model with the VS Code extension.
MLflow and Azure Machine Learning
Article • 01/10/2024
Azure Machine Learning workspaces are MLflow-compatible, which means that you can
use Azure Machine Learning workspaces in the same way that you'd use an MLflow
server. This compatibility has the following advantages:
Azure Machine Learning doesn't host MLflow server instances under the hood;
rather, the workspace can speak the MLflow API language.
You can use Azure Machine Learning workspaces as your tracking server for any
MLflow code, whether it runs on Azure Machine Learning or not. You only need to
configure MLflow to point to the workspace where the tracking should happen.
You can run any training routine that uses MLflow in Azure Machine Learning
without any change.
Tip
Unlike the Azure Machine Learning SDK v1, there's no logging functionality in the
SDK v2. We recommend that you use MLflow for logging, so that your training
routines are cloud-agnostic and portable—removing any dependency your code
has on Azure Machine Learning.
You can use MLflow in Azure Machine Learning to:
Track machine learning experiments and models running locally or in the cloud.
Track Azure Databricks machine learning experiments.
Track Azure Synapse Analytics machine learning experiments.
Example notebooks
Training and tracking an XGBoost classifier with MLflow : Demonstrates how to
track experiments by using MLflow, log models, and combine multiple flavors into
pipelines.
Training and tracking an XGBoost classifier with MLflow using service principal
authentication : Demonstrates how to track experiments by using MLflow from a
compute that's running outside Azure Machine Learning. The example shows how
to authenticate against Azure Machine Learning services by using a service
principal.
Hyper-parameter optimization using HyperOpt and nested runs in MLflow :
Demonstrates how to use child runs in MLflow to do hyper-parameter optimization
for models by using the popular library Hyperopt . The example shows how to
transfer metrics, parameters, and artifacts from child runs to parent runs.
Logging models with MLflow : Demonstrates how to use the concept of models,
instead of artifacts, with MLflow. The example also shows how to construct custom
models.
Manage runs and experiments with MLflow : Demonstrates how to query
experiments, runs, metrics, parameters, and artifacts from Azure Machine Learning
by using MLflow.
To learn about using the MLflow tracking client with Azure Machine Learning, view the
examples in Train R models using the Azure Machine Learning CLI (v2) .
To learn about using the MLflow tracking client with Azure Machine Learning, view the
Java example that uses the MLflow tracking client with Azure Machine Learning .
To learn more about how to manage models by using the MLflow API in Azure Machine
Learning, view Manage model registries in Azure Machine Learning with MLflow.
Example notebook
Manage model registries with MLflow : Demonstrates how to manage models in
registries by using MLflow.
To learn more about deploying MLflow models to Azure Machine Learning for both real-
time and batch inferencing, see Guidelines for deploying MLflow models.
Example notebooks
Deploy MLflow to online endpoints : Demonstrates how to deploy models in
MLflow format to online endpoints using the MLflow SDK.
Deploy MLflow to online endpoints with safe rollout : Demonstrates how to
deploy models in MLflow format to online endpoints, using the MLflow SDK with
progressive rollout of models. The example also shows deployment of multiple
versions of a model to the same endpoint.
Deploy MLflow to web services (V1) : Demonstrates how to deploy models in
MLflow format to web services (ACI/AKS v1) using the MLflow SDK.
Deploy models trained in Azure Databricks to Azure Machine Learning with
MLflow : Demonstrates how to train models in Azure Databricks and deploy them
in Azure Machine Learning. The example also covers how to handle cases where
you also want to track the experiments with the MLflow instance in Azure
Databricks.
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
You can submit training jobs to Azure Machine Learning by using MLflow projects
(preview). You can submit jobs locally with Azure Machine Learning tracking or migrate
your jobs to the cloud via Azure Machine Learning compute.
To learn how to submit training jobs with MLflow Projects that use Azure Machine
Learning workspaces for tracking, see Train machine learning models with MLflow
projects and Azure Machine Learning.
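As a rough sketch, a local MLflow project can be sent to Azure Machine Learning
compute through the azureml backend that the azureml-mlflow plugin provides; the
backend configuration file and experiment name below are illustrative placeholders:
Bash
mlflow run . --backend azureml --backend-config backend_config.json --experiment-name my-experiment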
Example notebooks
Track an MLflow project in Azure Machine Learning workspaces .
Train and run an MLflow project on Azure Machine Learning jobs .
7 Note
1 Only artifacts and models can be downloaded.
2 Possible by using MLflow projects (preview).
3 Some operations may not be supported. View Manage model registries in Azure
Machine Learning with MLflow for details.
4 Deployment of MLflow models for batch inference by using the MLflow SDK is not
possible at the moment. As an alternative, see Deploy and run MLflow models in
Spark jobs.
Related content
From artifacts to models in MLflow.
Configure MLflow for Azure Machine Learning.
Migrate logging from SDK v1 to MLflow
Track ML experiments and models with MLflow.
Log MLflow models.
Guidelines for deploying MLflow models.
From artifacts to models in MLflow
Article • 12/21/2023
This article explains the differences between an MLflow artifact and an MLflow
model, and how to transition from one to the other. It also explains how Azure Machine
Learning uses the concept of an MLflow model to enable streamlined deployment
workflows.
Artifact
An artifact is any file that's generated (and captured) from an experiment's run or job.
An artifact could represent a model serialized as a pickle file, the weights of a PyTorch or
TensorFlow model, or even a text file containing the coefficients of a linear regression.
Some artifacts could also have nothing to do with the model itself; rather, they could
contain configurations to run the model, or preprocessing information, or sample data,
and so on. Artifacts can come in various formats.
Python
import pickle
import mlflow

# `model` is assumed to be a trained model object from the enclosing run.
filename = 'model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)
mlflow.log_artifact(filename)
Model
A model in MLflow is also an artifact. However, we make stronger assumptions about
this type of artifact. Such assumptions provide a clear contract between the saved files
and what they mean. When you log your models as artifacts (simple files), you need to
know what the model builder meant for each of those files so as to know how to load
the model for inference. On the contrary, MLflow models can be loaded using the
contract specified in the MLmodel format.
Logging models instead of standalone artifacts has the following advantages:
You can deploy them to real-time or batch endpoints without providing a scoring
script or an environment.
When you deploy models, the deployments automatically have Swagger
documentation generated, and the Test feature can be used in Azure Machine
Learning studio.
You can use the models directly as pipeline inputs.
You can use the Responsible AI dashboard with your models.
Python
import mlflow
mlflow.sklearn.log_model(sklearn_estimator, "classifier")
The following screenshot shows a sample MLflow model's folder in the Azure Machine
Learning studio. The model is placed in a folder called credit_defaults_model . There is
no specific requirement on the naming of this folder. The folder contains the MLmodel
file among other model artifacts.
The following code is an example of what the MLmodel file for a computer vision model
trained with fastai might look like:
MLmodel
YAML
artifact_path: classifier
flavors:
fastai:
data: model.fastai
fastai_version: 2.4.1
python_function:
data: model.fastai
env: conda.yaml
loader_module: mlflow.fastai
python_version: 3.8.12
model_uuid: e694c68eba484299976b06ab9058f636
run_id: e13da8ac-b1e6-45d4-a9b2-6a0a5cfac537
signature:
inputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "uint8", "shape": [-1, 300, 300, 3]}
}]'
outputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "float32", "shape": [-1,2]}
}]'
Model flavors
Considering the large number of machine learning frameworks available to use, MLflow
introduced the concept of flavor as a way to provide a unique contract to work across all
machine learning frameworks. A flavor indicates what to expect for a given model that's
created with a specific framework. For instance, TensorFlow has its own flavor, which
specifies how a TensorFlow model should be persisted and loaded. Because each model
flavor indicates how to persist and load the model for a given framework, the MLmodel
format doesn't enforce a single serialization mechanism that all models must support.
This decision allows each flavor to use the methods that provide the best performance
or best support according to their best practices—without compromising compatibility
with the MLmodel standard.
The following code is an example of the flavors section for a fastai model.
YAML
flavors:
fastai:
data: model.fastai
fastai_version: 2.4.1
python_function:
data: model.fastai
env: conda.yaml
loader_module: mlflow.fastai
python_version: 3.8.12
Model signature
A model signature in MLflow is an important part of the model's specification, as it
serves as a data contract between the model and the server running the model. A model
signature is also important for parsing and enforcing a model's input types at
deployment time. If a signature is available, MLflow enforces input types when data is
submitted to your model. For more information, see MLflow signature enforcement .
Signatures are indicated when models get logged, and they're persisted in the
signature section of the MLmodel file. The Autolog feature in MLflow automatically
infers signatures in a best effort way. However, you might have to log the models
manually if the inferred signatures aren't the ones you need. For more information, see
How to log models with signatures .
Column-based signature: This signature operates on tabular data. For models with
this type of signature, MLflow supplies pandas.DataFrame objects as inputs.
Tensor-based signature: This signature operates with n-dimensional arrays or
tensors. For models with this signature, MLflow supplies numpy.ndarray as inputs
(or a dictionary of numpy.ndarray in the case of named-tensors).
The following example corresponds to a computer vision model trained with fastai .
This model receives a batch of images represented as tensors of shape (300, 300, 3)
with the RGB representation of them (unsigned integers). The model outputs batches of
predictions (probabilities) for two classes.
MLmodel
YAML
signature:
inputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "uint8", "shape": [-1, 300, 300, 3]}
}]'
outputs: '[{"type": "tensor",
"tensor-spec":
{"dtype": "float32", "shape": [-1,2]}
}]'
Model environment
Requirements for the model to run are specified in the conda.yaml file. MLflow can
automatically detect dependencies, or you can manually indicate them by calling the
mlflow.<flavor>.log_model() method. The latter can be useful if the libraries that
MLflow detects automatically aren't the ones you intend to use.
The following code is an example of an environment used for a model created with the
fastai framework:
conda.yaml
YAML
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip
  - pip:
      - mlflow
      - astunparse==1.6.3
      - cffi==1.15.0
      - configparser==3.7.4
      - defusedxml==0.7.1
      - fastai==2.4.1
      - google-api-core==2.7.1
      - ipython==8.2.0
      - psutil==5.9.0
name: mlflow-env
7 Note
What's the difference between an MLflow environment and an Azure Machine
Learning environment?
While an MLflow environment operates at the level of the model, an Azure Machine
Learning environment operates at the level of the workspace (for registered
environments) or jobs/deployments (for anonymous environments). When you
deploy MLflow models in Azure Machine Learning, the model's environment is built
and used for deployment. Alternatively, you can override this behavior with the
Azure Machine Learning CLI v2 and deploy MLflow models using a specific Azure
Machine Learning environment.
Predict function
All MLflow models contain a predict function. This function is called when a model is
deployed using a no-code-deployment experience. What the predict function returns
(for example, classes, probabilities, or a forecast) depends on the framework (that is, the
flavor) used for training. Read the documentation of each flavor to know what it
returns.
In some cases, you might need to customize this predict function to change the way
inference is executed. In such cases, you need to log models with a different behavior in
the predict method, or log a custom model's flavor.
MLflow provides a consistent way to load these models, regardless of their location:
Load back the same object and types that were logged: You can load models
using the MLflow SDK and obtain an instance of the model with types belonging
to the training library. For example, an ONNX model returns a ModelProto while a
decision tree model trained with scikit-learn returns a DecisionTreeClassifier
object. Use mlflow.<flavor>.load_model() to load back the same model object and
types that were logged.
Load back a model for running inference: You can load models using the MLflow
SDK and obtain a wrapper where MLflow guarantees that there will be a predict
function. It doesn't matter which flavor you're using; every MLflow model has a
predict function. Furthermore, MLflow guarantees that this function can be called,
handling any type conversion to the input type that the model expects. Use
mlflow.pyfunc.load_model() to load back a model for running inference.
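For illustration, here's a minimal sketch contrasting the two loading modes; the run ID and artifact path are placeholders, and the scikit-learn flavor is an assumption:
Python
import mlflow

model_uri = "runs:/<RUN_ID>/classifier"

# Flavor-specific load: returns the native object (for example, a
# DecisionTreeClassifier if the model was logged with the sklearn flavor)
native_model = mlflow.sklearn.load_model(model_uri)

# Generic load: returns a wrapper that always exposes a predict() function
pyfunc_model = mlflow.pyfunc.load_model(model_uri)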
Related content
Configure MLflow for Azure Machine Learning
How to log MLFlow models
Guidelines for deploying MLflow models
Configure MLflow for Azure Machine
Learning
Article • 03/10/2023
Azure Machine Learning workspaces are MLflow-compatible, which means they can act
as an MLflow server without any extra configuration. Each workspace has an MLflow
tracking URI that MLflow can use to connect to the workspace.
However, if you're working outside of Azure Machine Learning (for example, on your local
machine, Azure Synapse Analytics, or Azure Databricks), you need to configure MLflow to
point to the workspace. In this article, you learn how to configure MLflow to connect to
an Azure Machine Learning workspace for tracking, registries, and deployment.
Prerequisites
You need the following prerequisites to follow this tutorial:
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Azure CLI
Bash
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
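If you don't have the tracking URI at hand, one way to retrieve it programmatically is through the Azure Machine Learning SDK v2; the following is a sketch that assumes the azure-ai-ml package is installed and the placeholders are replaced with your own values:
Python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(credential=DefaultAzureCredential(),
                     subscription_id="<SUBSCRIPTION_ID>",
                     resource_group_name="<RESOURCE_GROUP>",
                     workspace_name="<WORKSPACE_NAME>")

# The workspace object exposes the MLflow tracking URI
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri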
Configure authentication
Once tracking is set, you also need to configure how to authenticate to the associated
workspace. By default, the Azure Machine Learning plugin for MLflow performs
interactive authentication by opening the default browser to prompt for credentials.
The Azure Machine Learning plugin for MLflow supports several authentication
mechanisms through the package azure-identity , which is installed as a dependency
of the plugin azureml-mlflow . Authentication methods are tried one by one until one
of them succeeds.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Tip
If you'd rather use a certificate instead of a secret, you can configure the environment
variables AZURE_CLIENT_CERTIFICATE_PATH to the path to a PEM or PKCS12 certificate file
(including private key) and AZURE_CLIENT_CERTIFICATE_PASSWORD with the password of the
certificate file, if any.
Microsoft.MachineLearningServices/workspaces/jobs/* .
Grant access to your workspace for the service principal or user account you created, as
explained at Grant access.
Troubleshooting authentication
MLflow will try to authenticate to Azure Machine Learning on the first operation
interacting with the service, like mlflow.set_experiment() or mlflow.start_run() . If you
find issues or unexpected authentication prompts during the process, you can increase
the logging level to get more details about the error:
Python
import logging
logging.getLogger("azure").setLevel(logging.DEBUG)
Tip
When submitting jobs using Azure Machine Learning CLI v2, you can set the
experiment name using the property experiment_name in the YAML definition of the
job. You don't have to configure it on your training script. See YAML: display name,
experiment name, description, and tags for details.
MLflow SDK
To configure the experiment you want to work on, use the MLflow command
mlflow.set_experiment() .
Python
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
MLflow SDK
Python
import os
os.environ["AZUREML_CURRENT_CLOUD"] = "AzureChinaCloud"
You can identify the cloud you are using with the following Azure CLI command:
Bash
az cloud list
Next steps
Now that your environment is connected to your workspace in Azure Machine Learning,
you can start to work with it.
Tracking refers to the process of saving all experiment-related information that you might
find relevant for every experiment you run. Such metadata varies based on your project,
but it can include:
" Code
" Environment details (OS version, Python packages)
" Input data
" Parameter configurations
" Models
" Evaluation metrics
" Evaluation visualizations (confusion matrix, importance plots)
" Evaluation results (including some evaluation predictions)
Some of these elements are automatically tracked by Azure Machine Learning when
working with jobs (including code, environment, and input and output data). However,
others, like models, parameters, and metrics, need to be instrumented by the model
builder, as they're specific to the particular scenario.
In this article, you'll learn how to use MLflow for tracking your experiments and runs in
Azure Machine Learning workspaces.
Why MLflow
Azure Machine Learning workspaces are MLflow-compatible, which means you can use
MLflow to track runs, metrics, parameters, and artifacts with your Azure Machine
Learning workspaces. By using MLflow for tracking, you don't need to change your
training routines to work with Azure Machine Learning or inject any cloud-specific
syntax, which is one of the main advantages of the approach.
See MLflow and Azure Machine Learning for all supported MLflow and Azure Machine
Learning functionality including MLflow Project support (preview) and model
deployment.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Working interactively
Python
experiment_name = 'hello-world-example'
mlflow.set_experiment(experiment_name)
Working interactively
When working interactively, MLflow starts tracking your training routine as soon as
you try to log information that requires an active run; for instance, when you log a
metric or a parameter, or when you start a training cycle while MLflow's
autologging functionality is enabled. However, it's usually helpful to start the run
explicitly, especially if you want to capture the total time of your experiment in the
field Duration. To start the run explicitly, use mlflow.start_run() .
Regardless of whether you started the run manually or not, you eventually need to stop
the run to inform MLflow that your experiment run has finished and mark its status as
Completed. To do that, call mlflow.end_run() . We strongly recommend starting runs
manually, so you don't forget to end them when working on notebooks.
Python
mlflow.start_run()
# Your code
mlflow.end_run()
To help you avoid forgetting to end the run, it's usually helpful to use the context
manager paradigm:
Python
with mlflow.start_run() as run:
    # Your code; the run ends automatically when the block exits
    print(run.info.run_id)
Autologging
You can log metrics, parameters, and files with MLflow manually. However, you can also
rely on MLflow's automatic logging capability. Each machine learning framework
supported by MLflow decides what to track automatically for you.
To enable automatic logging, insert the following code before your training code:
Python
mlflow.autolog()
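As a minimal sketch of autologging in action (scikit-learn and its bundled iris dataset are used here purely for illustration):
Python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.autolog()

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    # Parameters, metrics, and the fitted model are captured automatically
    LogisticRegression(max_iter=200).fit(X, y)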
View metrics and artifacts in your workspace
The metrics and artifacts from MLflow logging are tracked in your workspace. To view
them anytime, navigate to your workspace and find the experiment by name in your
workspace in Azure Machine Learning studio .
Select the logged metrics to render charts on the right side. You can customize the
charts by applying smoothing, changing the color, or plotting multiple metrics on a
single graph. You can also resize and rearrange the layout as you wish. Once you've
created your desired view, you can save it for future use and share it with your
teammates using a direct link.
You can also access or query metrics, parameters, and artifacts programmatically using
the MLflow SDK. Use mlflow.get_run() as explained below:
Python
import mlflow
run = mlflow.get_run("<RUN_ID>")
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags
Tip
For metrics, the previous example only returns the last value of a given metric. If
you want to retrieve all the values of a given metric, use the mlflow.get_metric_history
method, as explained at Getting params and metrics from a run.
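For instance, a sketch of retrieving a metric's full history (the metric name loss is illustrative):
Python
from mlflow.tracking import MlflowClient

client = MlflowClient()
# Returns every logged value of the metric, not just the last one
for measurement in client.get_metric_history("<RUN_ID>", "loss"):
    print(measurement.step, measurement.value)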
To download artifacts you've logged, like files and models, you can use
mlflow.artifacts.download_artifacts()
Python
mlflow.artifacts.download_artifacts(run_id="<RUN_ID>",
artifact_path="helloworld.txt")
For more details about how to retrieve or compare information from experiments and
runs in Azure Machine Learning using MLflow, see Query & compare experiments and
runs with MLflow.
Example notebooks
If you're looking for examples of how to use MLflow in Jupyter notebooks, see our
examples repository Using MLflow (Jupyter Notebooks) .
Limitations
Some methods available in the MLflow API may not be available when connected to
Azure Machine Learning. For details about supported and unsupported operations
please read Support matrix for querying runs and experiments.
Next steps
Deploy MLflow models.
Manage models with MLflow.
Track Azure Databricks ML experiments
with MLflow and Azure Machine
Learning
Article • 02/24/2023
MLflow is an open-source library for managing the life cycle of your machine learning
experiments. You can use MLflow to integrate Azure Databricks with Azure Machine
Learning to ensure you get the best from both products. In this article, you'll learn:
" The required libraries needed to use MLflow with Azure Databricks and Azure
Machine Learning.
" How to track Azure Databricks runs with MLflow in Azure Machine Learning.
" How to log models with MLflow to get them registered in Azure Machine Learning.
" How to deploy and consume models registered in Azure Machine Learning.
Prerequisites
Install the azureml-mlflow package, which handles the connectivity with Azure
Machine Learning, including authentication.
An Azure Databricks workspace and cluster.
Create an Azure Machine Learning Workspace.
See which access permissions you need to perform your MLflow operations with
your workspace.
Example notebooks
The notebook Training models in Azure Databricks and deploying them on Azure Machine
Learning demonstrates how to train models in Azure Databricks and deploy them in
Azure Machine Learning. It also covers how to handle cases where you want to
track the experiments and models with the MLflow instance in Azure Databricks while
leveraging Azure Machine Learning for deployment.
Install libraries
To install libraries on your cluster, navigate to the Libraries tab and select Install New.
In the Package field, type azureml-mlflow , and then select Install. Repeat this step as
necessary to install additional packages to your cluster for your experiment.
Track in both Azure Databricks workspace and Azure Machine Learning workspace
(dual-tracking)
Track exclusively on Azure Machine Learning
By default, dual-tracking is configured for you when you link your Azure Databricks
workspace.
Dual-tracking on Azure Databricks and Azure Machine
Learning
Linking your ADB workspace to your Azure Machine Learning workspace enables you to
track your experiment data in the Azure Machine Learning workspace and the Azure
Databricks workspace at the same time. This is referred to as dual-tracking.
To link your ADB workspace to a new or existing Azure Machine Learning workspace,
follow the linking steps in the Azure portal. Once the workspaces are linked, you can
use MLflow in Azure Databricks in the same way you're used to. The following example
sets the experiment name as usually done in Azure Databricks and starts logging some
parameters:
Python
import mlflow
experimentName = "/Users/{user_name}/{experiment_folder}/{experiment_name}"
mlflow.set_experiment(experimentName)
with mlflow.start_run():
mlflow.log_param('epochs', 20)
pass
2 Warning
For private link enabled Azure Machine Learning workspace, you have to deploy
Azure Databricks in your own network (VNet injection) to ensure proper
connectivity.
You have to configure the MLflow tracking URI to point exclusively to Azure Machine
Learning, as demonstrated in the following example:
Azure CLI
Bash
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Then use the set_tracking_uri() method to point the MLflow tracking URI to that
URI.
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
Configure authentication
Once tracking is configured, you also need to configure how to authenticate to the
associated workspace. By default, the Azure Machine Learning plugin for MLflow
performs interactive authentication by opening the default browser to prompt for
credentials. Refer to Configure MLflow for Azure Machine Learning: Configure
authentication for additional ways to configure authentication for MLflow in Azure
Machine Learning workspaces.
For interactive jobs where there's a user connected to the session, you can rely on
Interactive Authentication and hence no further action is required.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Python
mlflow.set_experiment(experiment_name="experiment-name")
Tracking parameters, metrics and artifacts
You can then use MLflow in Azure Databricks in the same way you're used to. For
details, see Log & view metrics and log files.
Logging models with MLflow
After your model is trained, you can log it to the tracking server with the
mlflow.<model_flavor>.log_model method. The flavor to use corresponds to the training
framework associated with the model. Learn what model flavors are supported . In the
following example, a model created with the Spark library MLlib is being registered:
Python
mlflow.spark.log_model(model, artifact_path = "model")
It's worth mentioning that the spark flavor doesn't correspond to the fact that the
model was trained on a Spark cluster, but to the training framework that was used
(you can perfectly train a model using TensorFlow on Spark, in which case the flavor to
use would be tensorflow ).
Models are logged inside of the run being tracked. This means that models are available
either in both Azure Databricks and Azure Machine Learning (default), or exclusively in
Azure Machine Learning if you configured the tracking URI to point to it.
) Important
Notice that here the parameter registered_model_name hasn't been specified.
Read the section Registering models in the registry with MLflow for more details
about the implications of that parameter and how the registry works.
To register a model at the same time it's logged, indicate the registered_model_name parameter:
Python
mlflow.spark.log_model(model,
                       artifact_path = "model",
                       registered_model_name = "model_name")
If a registered model with the name doesn’t exist, the method registers a new
model, creates version 1, and returns a ModelVersion MLflow object.
If a registered model with the name already exists, the method creates a new
model version and returns the version object.
However, if you want to continue using the dual-tracking capabilities but register
models in Azure Machine Learning, you can instruct MLflow to use Azure Machine
Learning for model registries by configuring the MLflow Model Registry URI. This URI
has the exact same format and value as the MLflow tracking URI.
Python
mlflow.set_registry_uri(azureml_mlflow_uri)
7 Note
The value of azureml_mlflow_uri was obtained in the same way as demonstrated
in Set MLflow Tracking to only track in your Azure Machine Learning workspace.
For a complete example of this scenario, see the example Training models in Azure
Databricks and deploying them on Azure Machine Learning .
Deploying and consuming models registered in
Azure Machine Learning
Models registered in Azure Machine Learning Service using MLflow can be consumed
as:
MLflow model objects or Pandas UDFs, which can be used in Azure Databricks
notebooks in streaming or batch pipelines.
) Important
If your model was trained and built with Spark libraries (like MLlib ), use
mlflow.pyfunc.spark_udf to load the model and use it as a Spark Pandas UDF, rather
than loading it on the cluster driver. Notice that, in this way, any parallelization or work
distribution you want to happen in the cluster needs to be orchestrated by you. Also,
notice that MLflow doesn't install any library your model requires to run. Those libraries
need to be installed in the cluster before running it.
The following example shows how to load a model named uci-heart-classifier from
the registry and use it as a Spark Pandas UDF to score new data.
Python
model_name = "uci-heart-classifier"
model_uri = "models:/"+model_name+"/latest"

# Create a Spark Pandas UDF from the registered model; `spark` is the active session
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)
Tip
Check Loading models from registry for more ways to reference models from the
registry.
Once the model is loaded, you can use it to score new data:
Python
#Make Prediction
preds = (scoreDf
           .withColumn('target_column_name', pyfunc_udf('Input_column1', 'Input_column2', 'Input_column3', …))
        )
display(preds)
Clean up resources
If you wish to keep your Azure Databricks workspace, but no longer need the Azure
Machine Learning workspace, you can delete the Azure Machine Learning workspace.
This action results in unlinking your Azure Databricks workspace and the Azure Machine
Learning workspace.
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to
delete them individually is unavailable at this time. Instead, delete the resource group
that contains the storage account and workspace, so you don't incur any charges.
Next steps
Deploy MLflow models as an Azure web service.
Manage your models.
Track experiment jobs with MLflow and Azure Machine Learning.
Learn more about Azure Databricks and MLflow.
Track Azure Synapse Analytics ML
experiments with MLflow and Azure
Machine Learning
Article • 02/24/2023
In this article, learn how to enable MLflow to connect to Azure Machine Learning while
working in an Azure Synapse Analytics workspace. You can leverage this configuration
for tracking, model management and model deployment.
MLflow is an open-source library for managing the life cycle of your machine learning
experiments. MLFlow Tracking is a component of MLflow that logs and tracks your
training run metrics and model artifacts. Learn more about MLflow.
If you have an MLflow Project to train with Azure Machine Learning, see Train ML
models with MLflow Projects and Azure Machine Learning (preview).
Prerequisites
An Azure Synapse Analytics workspace and cluster.
An Azure Machine Learning Workspace.
Install libraries
To install libraries on your dedicated cluster in Azure Synapse Analytics:
1. Create a requirements.txt file with the packages your experiment requires,
making sure it also includes the following packages:
requirements.txt
pip
mlflow
azureml-mlflow
azure-ai-ml
Azure CLI
b. You can get the tracking URI using the az ml workspace command:
Bash
az ml workspace show --query mlflow_tracking_uri
Then use the set_tracking_uri() method to point the MLflow tracking URI to that
URI.
Python
import mlflow
mlflow.set_tracking_uri(mlflow_tracking_uri)
Configure authentication
Once tracking is configured, you also need to configure how to authenticate to the
associated workspace. By default, the Azure Machine Learning plugin for MLflow
performs interactive authentication by opening the default browser to prompt for
credentials. Refer to Configure MLflow for Azure Machine Learning: Configure
authentication for additional ways to configure authentication for MLflow in Azure
Machine Learning workspaces.
For interactive jobs where there's a user connected to the session, you can rely on
Interactive Authentication and hence no further action is required.
2 Warning
Interactive browser authentication will block code execution when prompting for
credentials. It is not a suitable option for authentication in unattended
environments like training jobs. We recommend configuring another authentication
mode.
For those scenarios where unattended execution is required, you'll have to configure a
service principal to communicate with Azure Machine Learning.
MLflow SDK
Python
import os
os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"
Python
mlflow.set_experiment(experiment_name="experiment-name")
After your model is trained, you can log it to the tracking server with the
mlflow.spark.log_model method and register it at the same time:
Python
mlflow.spark.log_model(model,
artifact_path = "model",
registered_model_name = "model_name")
If a registered model with the name doesn’t exist, the method registers a new
model, creates version 1, and returns a ModelVersion MLflow object.
If a registered model with the name already exists, the method creates a new
model version and returns the version object.
You can manage models registered in Azure Machine Learning using MLflow. View
Manage models registries in Azure Machine Learning with MLflow for more details.
MLFlow model objects or Pandas UDFs, which can be used in Azure Synapse
Analytics notebooks in streaming or batch pipelines.
Deploy models to Azure Machine Learning endpoints
You can leverage the azureml-mlflow plugin to deploy a model to your Azure Machine
Learning workspace. See the How to deploy MLflow models page for complete details
about how to deploy models to the different targets.
Python
#Make Prediction
preds = (scoreDf
           .withColumn('target_column_name', pyfunc_udf('Input_column1', 'Input_column2', 'Input_column3', …))
        )
display(preds)
Clean up resources
If you wish to keep your Azure Synapse Analytics workspace, but no longer need the
Azure Machine Learning workspace, you can delete the Azure Machine Learning
workspace. If you don't plan to use the logged metrics and artifacts in your workspace,
the ability to delete them individually is unavailable at this time. Instead, delete the
resource group that contains the storage account and workspace, so you don't incur any
charges.
Next steps
Track experiment runs with MLflow and Azure Machine Learning.
Deploy MLflow models in Azure Machine Learning.
Manage your models with MLflow.
Train with MLflow Projects in Azure
Machine Learning (preview)
Article • 07/06/2023
In this article, learn how to submit training jobs with MLflow Projects that use Azure
Machine Learning workspaces for tracking. You can submit jobs and only track them
with Azure Machine Learning or migrate your runs to the cloud to run completely on
Azure Machine Learning Compute.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
MLflow Projects allow you to organize and describe your code so that other data
scientists (or automated tools) can run it. MLflow Projects with Azure Machine Learning
enable you to track and manage your training runs in your workspace.
2 Warning
Support for MLflow Projects in Azure Machine Learning will end on September 30,
2023. You'll be able to submit MLflow Projects ( MLproject files) to Azure Machine
Learning until that date.
We recommend that you transition to Azure Machine Learning Jobs, using either
the Azure CLI or the Azure Machine Learning SDK for Python (v2) before September
2026, when MLflow Projects will be fully retired in Azure Machine Learning. For
more information on Azure Machine Learning jobs, see Track ML experiments and
models with MLflow.
Learn more about the MLflow and Azure Machine Learning integration.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Using Azure Machine Learning as the backend for MLflow projects requires the
package azureml-core :
Bash
pip install azureml-core
conda.yaml
YAML
name: mlflow-example
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn
  - pip:
      - mlflow
      - azureml-mlflow
2. Submit the local run and ensure you set the parameter backend = "azureml" , which
adds support for automatic tracking, model capture, log files, snapshots, and
printed errors in your workspace. In this example, we assume the MLflow project
you're trying to run is in the same folder you currently are in ( uri="." ).
MLflow CLI
Bash
mlflow run . --backend azureml -P alpha=0.3
View your runs and metrics in the Azure Machine Learning studio .
1. Create the backend configuration object; in this case, we're going to indicate
COMPUTE . This parameter references the name of the remote compute cluster you
want to use for running your project. If COMPUTE is present, the project is
automatically submitted as an Azure Machine Learning job to the indicated
compute.
MLflow CLI
backend_config.json
JSON
{
"COMPUTE": "cpu-cluster"
}
conda.yaml
YAML
name: mlflow-example
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn
  - pip:
      - mlflow
      - azureml-mlflow
3. Submit the local run and ensure you set the parameter backend = "azureml" , which
adds support for automatic tracking, model capture, log files, snapshots, and
printed errors in your workspace. In this example, we assume the MLflow project
you're trying to run is in the same folder you currently are in ( uri="." ).
MLflow CLI
Bash
mlflow run . --backend azureml --backend-config backend_config.json -P alpha=0.3
7 Note
Since Azure Machine Learning jobs always run in the context of environments,
the parameter env_manager is ignored.
View your runs and metrics in the Azure Machine Learning studio .
Clean up resources
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to
delete them individually is currently unavailable. Instead, delete the resource group that
contains the storage account and workspace, so you don't incur any charges.
Example notebooks
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
Train an MLflow project on a local compute
Train an MLflow project on remote compute .
Next steps
Track Azure Databricks runs with MLflow.
Query & compare experiments and runs with MLflow.
Manage models registries in Azure Machine Learning with MLflow.
Guidelines for deploying MLflow models.
Log metrics, parameters and files with
MLflow
Article • 04/04/2023
Azure Machine Learning supports logging and tracking experiments using MLflow
Tracking . You can log models, metrics, parameters, and artifacts with MLflow, which
supports portability from local runs to the cloud.
) Important
Unlike the Azure Machine Learning SDK v1, there's no logging functionality in the
Azure Machine Learning SDK for Python (v2). See this guidance to learn how to log
with MLflow. If you were using Azure Machine Learning SDK v1 before, we
recommend you start leveraging MLflow for tracking experiments. See Migrate
logging from SDK v1 to MLflow for specific guidance.
Logs can help you diagnose errors and warnings, or track performance metrics like
parameters and model performance. In this article, you learn how to enable logging in
the following scenarios:
Tip
This article shows you how to monitor the model training process. If you're
interested in monitoring resource usage and events from Azure Machine Learning,
such as quotas, completed training jobs, or completed model deployments, see
Monitoring Azure Machine Learning.
Tip
For information on logging metrics in Azure Machine Learning designer, see How
to log metrics in the designer.
Prerequisites
You must have an Azure Machine Learning workspace. Create one if you don't have
any.
You must have the mlflow and azureml-mlflow packages installed. If you don't, use
the following command to install them in your development environment:
Bash
pip install mlflow azureml-mlflow
If you are doing remote tracking (tracking experiments running outside Azure
Machine Learning), configure MLflow to track experiments using Azure Machine
Learning. See Configure MLflow for Azure Machine Learning for more details.
Python
import mlflow
Configuring experiments
MLflow organizes the information in experiments and runs (in Azure Machine Learning,
runs are called Jobs). There are some differences in how to configure them depending
on how you are running your code:
Training interactively
For example, the following code snippet demonstrates configuring the experiment,
and then logging during a job:
Python
import mlflow
# Set the experiment
mlflow.set_experiment("mlflow-experiment")
Tip
Technically you don't have to call start_run() , as a new run is created if one
doesn't exist when you call a logging API. In that case, you can use
mlflow.active_run() to retrieve the run currently being used.
Python
import mlflow
mlflow.set_experiment("mlflow-experiment")
When you start a new run with mlflow.start_run , it can be useful to indicate the
parameter run_name , which then translates to the name of the run in the Azure
Machine Learning user interface and helps you identify the run quicker:
Python
with mlflow.start_run(run_name="hello-world-example"):
    mlflow.log_param("num_epochs", 20)
For more information on MLflow logging APIs, see the MLflow reference .
Logging parameters
MLflow supports logging the parameters used by your experiments. Parameters can be
of any type, and can be logged using the following syntax:
Python
mlflow.log_param("num_epochs", 20)
MLflow also offers a convenient way to log multiple parameters by indicating all of them
using a dictionary. Several frameworks can also pass parameters to models using
dictionaries, so this is a convenient way to log them in the experiment.
Python
params = {
"num_epochs": 20,
"dropout_rate": .6,
"objective": "binary_crossentropy"
}
mlflow.log_params(params)
Logging metrics
Metrics, as opposed to parameters, are always numeric, and can be logged either as
single values or as a series of values over steps.
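For instance, a single value and a per-step series (which renders as a curve in the studio) can be logged as follows; the metric names are illustrative:
Python
import mlflow

with mlflow.start_run():
    # Log a single numeric value
    mlflow.log_metric("accuracy", 0.91)

    # Log a value over steps to build a curve
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)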
) Important
Performance considerations: If you need to log multiple metrics (or multiple values
for the same metric), avoid making calls to mlflow.log_metric in loops. Better
performance can be achieved by logging a batch of metrics. Use the method
mlflow.log_metrics , which accepts a dictionary with all the metrics you want to log
at once, or use MLflowClient.log_batch , which accepts multiple types of elements for
logging. See Logging curves or list of values for an example.
Python
list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]

from mlflow.entities import Metric
from mlflow.tracking import MlflowClient
import time

client = MlflowClient()
client.log_batch(
    mlflow.active_run().info.run_id,
    metrics=[
        Metric(key="sample_list", value=val, timestamp=int(time.time() * 1000), step=0)
        for val in list_to_log
    ],
)
Logging images
MLflow supports two ways of logging images; both of them persist the given image as
an artifact inside of the run.
Log a matplotlib plot: mlflow.log_figure(fig, "figure.png") . Here, figure.png is the
name of the artifact generated inside of the run; it doesn't have to be an existing file.
Logging files
In general, files in MLflow are called artifacts. You can log artifacts in multiple ways in
MLflow:
Log an already existing file: mlflow.log_artifact("path/to/file.pkl") . Files are always
logged in the root of the run. If artifact_path is provided, the file is logged in the
folder indicated by that parameter.
Tip
When logging large files with log_artifact or log_model , you might encounter
time-out errors before the upload of the file is completed. Consider increasing the
time-out value by adjusting the environment variable
AZUREML_ARTIFACTS_DEFAULT_TIMEOUT . Its default value is 300 (seconds).
Logging models
MLflow introduces the concept of "models" as a way to package all the artifacts required
for a given model to function. Models in MLflow are always a folder with an arbitrary
number of files, depending on the framework used to generate the model. Logging
models has the advantage of tracking all the elements of the model as a single entity
that can be registered and then deployed. On top of that, MLflow models enjoy the
benefit of no-code deployment and can be used with the Responsible AI dashboard in
studio. Read the article From artifacts to models in MLflow for more information.
To save the model from a training run, use the log_model() API for the framework
you're working with, for example, mlflow.sklearn.log_model() . For more details about
how to log MLflow models, see Logging MLflow models. For migrating existing models
to MLflow, see Convert custom models to MLflow.
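As a brief sketch (the estimator and the artifact path classifier are illustrative, and X_train/y_train are assumed to exist):
Python
import mlflow
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    model = RandomForestClassifier().fit(X_train, y_train)
    # Logs the model folder (MLmodel file, environment, weights) as a single entity
    mlflow.sklearn.log_model(model, artifact_path="classifier")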
Tip
When logging large models, you might encounter the error Failed to flush the
queue within 300 seconds . Usually, it means the operation is timing out before the
upload of the model artifacts is completed. Consider increasing the time-out value
by adjusting the environment variable AZUREML_ARTIFACTS_DEFAULT_TIMEOUT .
Automatic logging
With Azure Machine Learning and MLflow, users can log metrics, model parameters and
model artifacts automatically when training a model. Each framework decides what to
track automatically for you. A variety of popular machine learning libraries are
supported. Learn more about Automatic logging with MLflow .
To enable automatic logging insert the following code before your training code:
Python
mlflow.autolog()
Tip
You can control what gets automatically logged with autolog. For instance, if you
indicate mlflow.autolog(log_models=False) , MLflow logs everything but models
for you. Such control is useful in cases where you want to log models manually but
still enjoy automatic logging of metrics and parameters. Also notice that some
frameworks might disable automatic logging of models if the trained model goes
beyond specific boundaries. Such behavior depends on the flavor used, and we
recommend that you view its documentation if this is your case.
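For instance, a sketch that combines automatic logging of metrics and parameters with a manual model log (the training routine and names are hypothetical):
Python
import mlflow

mlflow.autolog(log_models=False)

with mlflow.start_run():
    model = train_my_model()  # hypothetical training routine
    # Metrics and parameters were autologged above; the model is logged manually
    mlflow.sklearn.log_model(model, artifact_path="classifier")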
You can retrieve any run's information with mlflow.get_run() :
Python
import mlflow
run = mlflow.get_run(run_id="<RUN_ID>")
You can view the metrics, parameters, and tags for the run in the data field of the run
object.
Python
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags
7 Note
To get all metrics logged for a particular metric name, you can use
MlflowClient.get_metric_history() , as explained in the example Getting params
and metrics from a run.
Tip
MLflow can retrieve metrics and parameters from multiple runs at the same time,
allowing for quick comparisons across multiple trials. Learn about this in Query &
compare experiments and runs with MLflow.
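For instance, here's a sketch of a quick comparison across runs as a pandas DataFrame; the experiment name is illustrative, and the metric/parameter columns depend on what you logged:
Python
import mlflow

runs = mlflow.search_runs(experiment_names=["my-experiment"])
# Each metric and parameter becomes a column, such as metrics.accuracy or params.num_epochs
print(runs[["run_id", "metrics.accuracy", "params.num_epochs"]].head())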
Any artifact logged by a run can be queried with MLflow. Artifacts can't be accessed
using the run object itself; the MLflow client should be used instead:
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("<RUN_ID>")
The preceding method lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = client.download_artifacts("<RUN_ID>",
path="feature_importance_weight.png")
For more information please refer to Getting metrics, parameters, artifacts and models.
In Azure Machine Learning studio, navigate to the Jobs tab. To view all your jobs in your
workspace across experiments, select the All jobs tab. You can drill down on jobs for
specific experiments by applying the Experiment filter in the top menu bar. Select the
job of interest to enter the details view, and then select the Metrics tab.
Select the logged metrics to render charts on the right side. You can customize the
charts by applying smoothing, changing the color, or plotting multiple metrics on a
single graph. You can also resize and rearrange the layout as you wish. Once you have
created your desired view, you can save it for future use and share it with your
teammates using a direct link.
user_logs folder
This folder contains the user-generated logs. This folder is open by
default, and the std_log.txt log is selected. The std_log.txt is where your code's logs (for
example, print statements) show up. This file contains stdout log and stderr logs from
your control script and training script, one per process. In most cases, you'll monitor the
logs here.
system_logs folder
This folder contains the logs generated by Azure Machine Learning, and it's closed
by default. The logs generated by the system are grouped into different folders, based
on the stage of the job in the runtime.
Other folders
For jobs training on multi-compute clusters, logs are present for each node IP. The
structure for each node is the same as single node jobs. There's one more logs folder for
overall execution, stderr, and stdout logs.
Azure Machine Learning logs information from various sources during training, such as
AutoML or the Docker container that runs the training job. Many of these logs aren't
documented. If you encounter problems and contact Microsoft support, they may be
able to use these logs during troubleshooting.
Next steps
Train ML models with MLflow and Azure Machine Learning.
Migrate from SDK v1 logging to MLflow tracking.
Logging MLflow models
Article • 02/24/2023
The following article explains how to start logging your trained models (or artifacts) as
MLflow models. It explores the different methods to customize the way MLflow
packages your models and hence how it runs them.
A model in MLflow is also an artifact, but with a specific structure that serves as a
contract between the person that created the model and the person that intends to use
it. Such a contract helps build the bridge between the artifacts themselves and what they
mean.
There are different ways to start using the model's concept in Azure Machine Learning
with MLflow, as explained in the following sections:
Python
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

mlflow.autolog()

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
Tip
If you are using Machine Learning pipelines, like for instance Scikit-Learn
pipelines , use the autolog functionality of that flavor for logging models. Models
are automatically logged when the fit() method is called on the pipeline object.
The notebook Training and tracking an XGBoost classifier with MLflow
demonstrates how to log a model with preprocessing using pipelines.
" You want to indicate pip packages or a conda environment different from the ones
that are automatically detected.
" You want to include input examples.
" You want to include specific artifacts into the package that will be needed.
" Your signature is not correctly inferred by autolog . This is specifically important
when you deal with inputs that are tensors where the signature needs specific
shapes.
" Somehow the default behavior of autolog doesn't fill your purpose.
Python
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
from mlflow.utils.environment import _mlflow_conda_env

mlflow.autolog(log_models=False)

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Signature
signature = infer_signature(X_test, y_test)

# Conda environment
custom_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["xgboost==1.5.2"],
    additional_conda_channels=None,
)

# Sample
input_example = X_train.sample(n=1)

# Log the model manually with the custom signature, environment, and sample
mlflow.xgboost.log_model(model,
                         artifact_path="classifier",
                         conda_env=custom_env,
                         signature=signature,
                         input_example=input_example)
7 Note
The method _mlflow_conda_env is a private method in the MLflow
SDK, and it may change in the future. This example uses it just for the sake of
simplicity; use it with caution, or generate the YAML definition
manually as a Python dictionary.
The predict method determines how inference is executed and what gets returned by the
model. MLflow doesn't enforce any specific behavior in how predict generates results.
There are scenarios where you probably want to do some pre-processing or
post-processing before and after your model is executed.
A solution to this scenario is to implement machine learning pipelines that move from
inputs to outputs directly. Although this is possible (and sometimes encouraged for
performance considerations), it can be challenging to achieve. For those cases, you
probably want to customize how your model does inference by using a custom model
flavor, as explained in the following section.
For this type of model, MLflow introduces a flavor called pyfunc (standing for Python
function). Basically, this flavor allows you to log any object you want as a model, as long
as it satisfies two conditions:
It implements (at least) a predict method.
The Python object inherits from mlflow.pyfunc.PythonModel .
Tip
Serializable models that implement the Scikit-learn API can use the Scikit-learn
flavor to log the model, regardless of whether the model was built with Scikit-learn.
If your model can be persisted in Pickle format, and the object has the methods
predict() and predict_proba() (at least), then you can use
mlflow.sklearn.log_model() to log it inside an MLflow run.
The simplest way of creating your custom model's flavor is by creating a wrapper
around your existing model object. MLflow will serialize it and package it for you.
Python objects are serializable when the object can be stored in the file system as a
file (generally in Pickle format). During runtime, the object can be materialized from
such file and all the values, properties and methods available when it was saved will
be restored.
The following sample wraps a model created with XGBoost to make it behave
differently from the default implementation of the XGBoost flavor (it returns the
probabilities instead of the classes):
Python
from mlflow.pyfunc import PythonModel, PythonModelContext

class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model

    def predict(self, context: PythonModelContext, data):
        # Return probabilities instead of predicted classes
        return self._model.predict_proba(data)

    # You can even add extra functions if you need to. Since the model is
    # serialized, all of them will be available when you load your model back.
    def predict_batch(self, data):
        pass
Python
import mlflow
from xgboost import XGBClassifier
from mlflow.models import infer_signature

mlflow.xgboost.autolog(log_models=False)

# Train as usual; X_train, y_train, X_test, y_test are assumed to exist
model = XGBClassifier().fit(X_train, y_train)

# Log the wrapped model with a signature inferred from the probabilities
y_probs = model.predict_proba(X_test)
signature = infer_signature(X_test, y_probs)
mlflow.pyfunc.log_model("classifier", python_model=ModelWrapper(model), signature=signature)
Tip
Note how the infer_signature method now uses y_probs to infer the
signature. Our target column has the target class, but our model now returns
the two probabilities for each class.
Next steps
Deploy MLflow models
Query & compare experiments and runs
with MLflow
Article • 06/26/2023
Experiments and jobs (or runs) in Azure Machine Learning can be queried using MLflow.
You don't need to install any specific SDK to manage what happens inside of a training
job, creating a more seamless transition between local runs and the cloud by removing
cloud-specific dependencies. In this article, you'll learn how to query and compare
experiments and runs in your workspace using Azure Machine Learning and MLflow SDK
in Python.
See Support matrix for querying runs and experiments in Azure Machine Learning for a
detailed comparison between MLflow Open-Source and MLflow when connected to
Azure Machine Learning.
7 Note
The Azure Machine Learning Python SDK v2 does not provide native logging or
tracking capabilities. This applies not just for logging but also for querying the
metrics logged. Instead, use MLflow to manage experiments and runs. This article
explains how to use MLflow to manage experiments and runs in Azure Machine
Learning.
REST API
Querying and searching experiments and runs is also available using the MLflow REST API.
See Using MLflow REST with Azure Machine Learning for an example of how to
consume it.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
To retrieve all the experiments in the workspace, use mlflow.search_experiments() :
Python
mlflow.search_experiments()
7 Note
By default, only active experiments are returned. To also include archived ones,
indicate view_type=ViewType.ALL :
Python
from mlflow.entities import ViewType

mlflow.search_experiments(view_type=ViewType.ALL)
To get a specific experiment by name:
Python
mlflow.get_experiment_by_name(experiment_name)
Or by experiment ID:
Python
mlflow.get_experiment('1234-5678-90AB-CDEFG')
Searching experiments
The search_experiments() method, available since MLflow 2.0, allows searching for
experiments that match criteria using filter_string .
Python
mlflow.search_experiments(
    filter_string="experiment_id IN ("
                  "'CDEFG-1234-5678-90AB', '1234-5678-90AB-CDEFG', '5678-1234-90AB-CDEFG')"
)
Python
import datetime
Python
mlflow.search_experiments(filter_string=f"tags.framework = 'torch'")
Searching runs
The mlflow.search_runs() method lets you indicate which experiments to search. You
can also indicate search_all_experiments=True if you want to search across all the
experiments in the workspace:
By experiment name:
Python
mlflow.search_runs(experiment_names=[ "my_experiment" ])
By experiment ID:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ])
Python
mlflow.search_runs(filter_string="params.num_boost_round='100'",
search_all_experiments=True)
) Important
All metrics and parameters are also returned when querying runs. However, for metrics
containing multiple values (for instance, a loss curve, or a PR curve), only the last value
of the metric is returned. If you want to retrieve all the values of a given metric, use the
mlflow.get_metric_history method. See Getting params and metrics from a run for an
example.
Ordering runs
By default, experiments are ordered descending by start_time , which is the time the
experiment was queued in Azure Machine Learning. However, you can change this default
by using the parameter order_by .
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.start_time DESC"])
Order runs and limit results. The following example returns the last single run in
the experiment:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   max_results=1, order_by=["attributes.start_time DESC"])
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.duration DESC"])
Tip
Expressions containing metrics.* can't be used in the parameter order_by . Instead,
retrieve the results as a pandas DataFrame and sort them with sort_values :
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG"
]).sort_values("metrics.accuracy", ascending=False)
Filtering runs
You can also look for a run with a specific combination in the hyperparameters using the
parameter filter_string . Use params to access run's parameters, metrics to access
metrics logged in the run, and attributes to access run information details. MLflow
supports expressions joined by the AND keyword (the syntax does not support OR):
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'")
Search runs by a metric value:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="metrics.auc>0.8")
Search runs by a tag:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="tags.framework='torch'")
Search runs created by a given user:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="attributes.user_id = 'John Smith'")
Search runs that have failed. See Filter runs by status for possible values:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
Python
import datetime
Tip
Notice that for the key attributes , values should always be strings, and hence
enclosed in quotes.
Python
duration = 360 * 1000 # duration is in milliseconds
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string=f"attributes.duration > '{duration}'")
To search for specific runs by their IDs:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string="attributes.run_id IN ('1234-5678-90AB-CDEFG', '5678-1234-90AB-CDEFG')")
The following shows how some Azure Machine Learning job statuses map to MLflow statuses:
Not started → SCHEDULED: The job/run was just registered in Azure Machine Learning, but it hasn't been processed yet.
Preparing → SCHEDULED: The job/run hasn't started yet, but a compute has been allocated for the execution, and it's in building state.
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
To get runs as detailed Run objects instead of a pandas DataFrame, indicate output_format="list" :
Python
runs = mlflow.search_runs(
experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'",
output_format="list",
)
Details can then be accessed from the info member. The following sample shows how
to get the run_id :
Python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
Python
last_run.data.params
last_run.data.metrics
For metrics that contain multiple values (for instance, a loss curve, or a PR curve), only
the last logged value of the metric is returned. If you want to retrieve all the values of a
given metric, use the mlflow.get_metric_history method. This method requires you to use
the MlflowClient :
Python
client = mlflow.tracking.MlflowClient()
client.get_metric_history("1234-5678-90AB-CDEFG", "log_loss")
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
The preceding method lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG",
artifact_path="feature_importance_weight.png"
)
Models logged in the run can also be downloaded, by indicating the path where the
model was stored:
Python
artifact_path="classifier"
model_local_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG", artifact_path=artifact_path
)
You can then load the model back from the downloaded artifacts using the typical
function load_model in the flavor-specific namespace. The following example uses
xgboost :
Python
model = mlflow.xgboost.load_model(model_local_path)
MLflow also allows you to perform both operations at once, downloading and loading the
model in a single instruction. MLflow downloads the model to a temporary folder and
loads it from there. The method load_model uses a URI format to indicate from where the
model has to be retrieved. In the case of loading a model from a run, the URI structure is
as follows:
Python
model = mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
Tip
For query and loading models registered in the Model Registry, view Manage
models registries in Azure Machine Learning with MLflow.
To get the child (nested) runs of a given parent run, filter by the tag mlflow.parentRunId :
Python
hyperopt_run = mlflow.last_active_run()
child_runs = mlflow.search_runs(
filter_string=f"tags.mlflow.parentRunId='{hyperopt_run.info.run_id}'"
)
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
Next steps
Manage your models with MLflow.
Deploy models with MLflow.
Manage models registries in Azure
Machine Learning with MLflow
Article • 03/21/2023
Azure Machine Learning supports MLflow for model management. This approach
represents a convenient way to support the entire model lifecycle for users familiar with
the MLflow client. The following article describes the different capabilities and how they
compare with other options.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
Some operations can be executed directly using the MLflow fluent API ( mlflow.
<method> ). However, others require an MLflow client, which enables communication
with Azure Machine Learning over the MLflow protocol. You can create
an MlflowClient object as follows. This tutorial uses the object client to refer to
such an MLflow client.
Python
import mlflow
client = mlflow.tracking.MlflowClient()
To register a model from an existing run, use a URI of the form runs:/<run-id>/<artifact-path> :
Python
mlflow.register_model(f"runs:/{run_id}/{artifact_path}", model_name)
7 Note
Models can only be registered to the registry in the same workspace where the run
was tracked. Cross-workspace operations aren't supported at the moment in
Azure Machine Learning.
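For instance, a sketch that registers the model logged by the most recent run (the artifact path classifier and the model name are illustrative):
Python
import mlflow

run_id = mlflow.last_active_run().info.run_id
mlflow.register_model(f"runs:/{run_id}/classifier", "my-classifier")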
If you have a model trained outside of a run, you can save it locally before registering it:
Python
import mlflow
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
mlflow.sklearn.save_model(reg, "./regressor")
You can now register the model from the local path:
Python
import os
model_local_path = os.path.abspath("./regressor")
mlflow.register_model(f"file://{model_local_path}", "local-model-test")
You can query all the registered models in the registry:
Python
for model in client.search_registered_models():
print(f"{model.name}")
You can also order the results using order_by :
Python
client.search_registered_models(order_by=["name ASC"])
To get a specific model by name:
Python
client.get_registered_model(model_name)
If you need a specific version of the model, you can indicate so:
Python
client.get_model_version(model_name, version=2)
Model stages
MLflow supports model stages to manage a model's lifecycle. A model version can
transition from one stage to another. Stages are assigned to a model's versions (instead
of models), which means that a given model can have multiple versions in different
stages.
) Important
Stages can only be accessed using the MLflow SDK. They don't show up in the
Azure ML studio portal, and can't be retrieved using the Azure ML SDK,
Azure ML CLI, or Azure ML REST API. Creating a deployment from a given model's
stage isn't supported at the moment.
You can check the stages available for a given model version:
Python
client.get_model_version_stages(model_name, version="latest")
You can see what model's version is on each stage by getting the model from the
registry. The following example gets the model's version currently in the stage Staging .
Python
client.get_latest_versions(model_name, stages=["Staging"])
7 Note
Multiple versions can be in the same stage at the same time in MLflow; however,
this method returns the latest version (the greatest version number) among all of them.
Transitioning models
Transitioning a model's version to a particular stage can be done using the MLflow
client.
Python
client.transition_model_version_stage(model_name, version=3,
stage="Staging")
By default, if there's an existing model version in that particular stage, it remains
there. Hence, it isn't replaced, as multiple model versions can be in the same stage at
the same time. Alternatively, you can indicate archive_existing_versions=True to tell
MLflow to move the existing model version to the stage Archived .
Python
client.transition_model_version_stage(
model_name, version=3, stage="Staging", archive_existing_versions=True
)
You can load a model in a particular stage by indicating the stage in the model URI:
Python
model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")
Editing and deleting models
Editing registered models is supported in both MLflow and Azure ML. However, there are
some important differences to notice:
2 Warning
Renaming models is not supported in Azure Machine Learning, as model objects are
immutable.
Editing models
You can edit a model's description and tags using MLflow. For example, updating the description and setting a tag (the tag key and value here are illustrative):
Python
client.update_model_version(model_name, version=1, description="A new description")
client.set_model_version_tag(model_name, version="1", key="type", value="classification")
Removing a tag:
Python
client.delete_model_version_tag(model_name, version="1", key="type")
To delete a specific model version:
Python
client.delete_model_version(model_name, version="2")
7 Note
Azure Machine Learning doesn't support deleting the entire model container. To
achieve the same result, delete all the versions of a given model.
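As a minimal sketch of that workaround (assuming the client and model_name objects
used earlier in this article), you can iterate over the registered versions and delete
each one:
Python
# Azure ML has no single call to drop a whole model container, so
# remove every version of the model one by one.
for mv in client.search_model_versions(f"name='{model_name}'"):
    client.delete_model_version(model_name, version=mv.version)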
[Partial reconstruction of the support matrix table:]
Feature | MLflow | Azure ML
Renaming registered models | ✓ | Not supported (3)
Deleting a registered model (container) | ✓ | Not supported (3)
7 Note
1. Use URIs with format runs:/<run-id>/<path> .
2. Use URIs with format azureml://jobs/<job-id>/outputs/artifacts/<path> .
3. Registered models are immutable objects in Azure ML.
4. Use the search box in Azure ML studio. Partial match supported.
5. Use registries.
Next steps
Logging MLflow models
Query & compare experiments and runs with MLflow
Guidelines for deploying MLflow models
Query & compare experiments and runs
with MLflow
Article • 06/26/2023
Experiments and jobs (or runs) in Azure Machine Learning can be queried using MLflow.
You don't need to install any specific SDK to manage what happens inside of a training
job, creating a more seamless transition between local runs and the cloud by removing
cloud-specific dependencies. In this article, you'll learn how to query and compare
experiments and runs in your workspace using Azure Machine Learning and MLflow SDK
in Python.
See Support matrix for querying runs and experiments in Azure Machine Learning for a
detailed comparison between MLflow Open-Source and MLflow when connected to
Azure Machine Learning.
7 Note
The Azure Machine Learning Python SDK v2 does not provide native logging or
tracking capabilities. This applies not just for logging but also for querying the
metrics logged. Instead, use MLflow to manage experiments and runs. This article
explains how to use MLflow to manage experiments and runs in Azure Machine
Learning.
REST API
Querying and searching experiments and runs is also available through the MLflow REST API.
See Using MLflow REST with Azure Machine Learning for an example of how to
consume it.
Prerequisites
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow, azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
Tip
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
You can get all the active experiments in the workspace:
Python
mlflow.search_experiments()
7 Note
By default, only active experiments are returned. To get all of them, including
archived or deleted ones, indicate the view type:
Python
from mlflow.entities import ViewType

mlflow.search_experiments(view_type=ViewType.ALL)
To get a specific experiment by name:
Python
mlflow.get_experiment_by_name(experiment_name)
Or by experiment ID:
Python
mlflow.get_experiment('1234-5678-90AB-CDEFG')
Searching experiments
The search_experiments() method, available since MLflow 2.0, lets you search for
experiments that match criteria using filter_string .
Python
mlflow.search_experiments(filter_string="experiment_id IN ("
"'CDEFG-1234-5678-90AB', '1234-5678-90AB-CDEFG', '5678-1234-90AB-
CDEFG')"
)
Python
import datetime

# Experiments created after a given date; MLflow expects epoch time
# in milliseconds for time comparisons.
test_start = int(datetime.datetime(2022, 1, 1).timestamp() * 1000)
mlflow.search_experiments(filter_string=f"creation_time > {test_start}")
You can also filter experiments by tags:
Python
mlflow.search_experiments(filter_string=f"tags.framework = 'torch'")
When searching runs, you need to indicate which experiments to search. You can also
indicate search_all_experiments=True if you want to search across all the experiments
in the workspace:
By experiment name:
Python
mlflow.search_runs(experiment_names=[ "my_experiment" ])
By experiment ID:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ])
Python
mlflow.search_runs(filter_string="params.num_boost_round='100'",
search_all_experiments=True)
) Important
All metrics and parameters are also returned when querying runs. However, for metrics
containing multiple values (for instance, a loss curve or a PR curve), only the last value
of the metric is returned. If you want to retrieve all the values of a given metric, use the
mlflow.get_metric_history method. See Getting params and metrics from a run for an
example.
Ordering runs
By default, runs are ordered descending by start_time , which is the time the
run was queued in Azure Machine Learning. However, you can change this default
by using the parameter order_by .
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.start_time DESC"])
Order runs and limit results. The following example returns the last single run in
the experiment:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
max_results=1, order_by=["attributes.start_time
DESC"])
Order runs by duration:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
order_by=["attributes.duration DESC"])
Tip
You can also sort the returned DataFrame by any column using Pandas. For example, to sort by a metric:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG"
]).sort_values("metrics.accuracy", ascending=False)
Filtering runs
You can also look for runs with a specific combination of hyperparameters using the
parameter filter_string . Use params to access a run's parameters, metrics to access
metrics logged in the run, and attributes to access run information details. MLflow
supports expressions joined by the AND keyword (the syntax doesn't support OR):
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'")
Search runs by a metric value:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="metrics.auc>0.8")
Search runs by tags:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="tags.framework='torch'")
Search runs created by a specific user:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.user_id = 'John Smith'")
Search runs that have failed. See Filter runs by status for possible values:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
Python
import datetime

# Runs submitted after a given date; note that attribute values must
# be passed as strings.
query_date = datetime.datetime(2023, 1, 1)
query_date_epoch = int(query_date.timestamp() * 1000)
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
                   filter_string=f"attributes.start_time > '{query_date_epoch}'")
Tip
Notice that for the key attributes , values should always be strings and hence
enclosed in quotes.
Search runs that took longer than a given duration:
Python
duration = 360 * 1000 # duration is in milliseconds
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string=f"attributes.duration > '{duration}'")
Tip
You can query a specific set of runs by filtering on their run IDs:
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.run_id IN ('1234-5678-
90AB-CDEFG', '5678-1234-90AB-CDEFG')")
Azure Machine Learning status | MLflow status | Meaning
Not started | SCHEDULED | The job/run was just registered in Azure Machine Learning, but it hasn't been processed yet.
Preparing | SCHEDULED | The job/run hasn't started yet, but compute has been allocated for the execution, and it's in building state.
Python
mlflow.search_runs(experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="attributes.status = 'Failed'")
By default, search_runs returns a Pandas DataFrame . You can get a list of Run objects instead by indicating output_format :
Python
runs = mlflow.search_runs(
experiment_ids=[ "1234-5678-90AB-CDEFG" ],
filter_string="params.num_boost_round='100'",
output_format="list",
)
Details can then be accessed from the info member. The following sample shows how
to get the run_id :
Python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
Params and metrics can be accessed from the data member:
Python
last_run.data.params
last_run.data.metrics
For metrics that contain multiple values (for instance, a loss curve or a PR curve), only
the last logged value of the metric is returned. If you want to retrieve all the values of a
given metric, use the mlflow.get_metric_history method. This method requires you to
use the MlflowClient :
Python
client = mlflow.tracking.MlflowClient()
client.get_metric_history("1234-5678-90AB-CDEFG", "log_loss")
You can list the artifacts logged in a run using the client:
Python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
The method above lists all the artifacts logged in the run, but they remain stored
in the artifacts store (Azure Machine Learning storage). To download any of them, use
the method download_artifacts :
Python
file_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG",
artifact_path="feature_importance_weight.png"
)
Models logged in a run can also be downloaded, by indicating the folder where the model was logged as the artifact path:
Python
artifact_path="classifier"
model_local_path = mlflow.artifacts.download_artifacts(
run_id="1234-5678-90AB-CDEFG", artifact_path=artifact_path
)
You can then load the model back from the downloaded artifacts using the typical
function load_model in the flavor-specific namespace. The following example uses
xgboost :
Python
model = mlflow.xgboost.load_model(model_local_path)
MLflow also allows you to do both operations at once, downloading and loading the model in
a single instruction. MLflow downloads the model to a temporary folder and loads it
from there. The method load_model uses a URI format to indicate where the model
should be retrieved from. In the case of loading a model from a run, the URI structure is
as follows:
Python
model =
mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
Tip
To query and load models registered in the model registry, see Manage
model registries in Azure Machine Learning with MLflow.
You can also query the child (nested) runs of a given run, for example the trials of a hyperparameter sweep, by filtering on the tag mlflow.parentRunId :
Python
hyperopt_run = mlflow.last_active_run()
child_runs = mlflow.search_runs(
filter_string=f"tags.mlflow.parentRunId='{hyperopt_run.info.run_id}'"
)
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
[Support matrix excerpt; only the following row and notes survive:]
Renaming experiments: ✓
7 Note
1. Check the section Ordering runs for instructions and examples on how to achieve the same functionality in Azure Machine Learning.
2. The != operator isn't supported for tags.
Next steps
Manage your models with MLflow.
Deploy models with MLflow.
Guidelines for deploying MLflow
models
Article • 10/18/2023
In this article, learn how to deploy your MLflow model to Azure Machine Learning for
both real-time and batch inference. You'll also learn about the different tools you can
use to manage the deployment.
When you deploy MLflow models without a scoring script, Azure Machine Learning:
Ensures all the package dependencies indicated in the MLflow model are satisfied.
Provides an MLflow base image/curated environment that contains the following
items:
Packages required for Azure Machine Learning to perform inference, including
mlflow-skinny .
A scoring script to perform inference.
Tip
Workspaces without public network access: Before you can deploy MLflow models
to online endpoints without egress connectivity, you have to package the models
(preview). By using model packaging, you can avoid the need for an internet
connection, which Azure Machine Learning would otherwise require to dynamically
install necessary Python packages for the MLflow models.
conda.yaml
YAML
channels:
- conda-forge
dependencies:
- python=3.7.11
- pip
- pip:
- mlflow
- scikit-learn==0.24.1
- cloudpickle==2.0.0
- psutil==5.8.0
name: mlflow-env
2 Warning
MLflow performs automatic package detection when logging models and pins
their versions in the conda dependencies of the model. However, this detection is
best-effort, and there might be cases where it doesn't reflect your intentions or
requirements. In those cases, consider logging models with a custom conda
dependencies definition.
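A minimal sketch of logging with an explicit conda definition (the environment contents
and the model object here are illustrative assumptions, not requirements of any
particular model):
Python
import mlflow

# An explicit conda definition overrides MLflow's automatic detection.
custom_env = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.9",
        "pip",
        {"pip": ["mlflow", "scikit-learn==1.2.2"]},
    ],
    "name": "custom-env",
}

mlflow.sklearn.log_model(model, artifact_path="model", conda_env=custom_env)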
MLmodel
YAML
artifact_path: model
flavors:
python_function:
env: conda.yaml
loader_module: mlflow.sklearn
model_path: model.pkl
python_version: 3.7.11
sklearn:
pickled_model: model.pkl
serialization_format: cloudpickle
sklearn_version: 0.24.1
run_id: f1e06708-641d-4a49-8f36-e9dcd8d34346
signature:
inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type":
"double"},
{"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"},
{"name":
"s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name":
"s3", "type":
"double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type":
"double"},
{"name": "s6", "type": "double"}]'
outputs: '[{"type": "double"}]'
utc_time_created: '2022-03-17 01:56:03.706848'
You can inspect your model's signature by opening the MLmodel file associated with
your MLflow model. For more information on how signatures work in MLflow, see
Signatures in MLflow.
Tip
Signatures in MLflow models are optional, but they're highly encouraged, as they
provide a convenient way to detect data compatibility issues early. For more
information about how to log models with signatures, read Logging models with a
custom signature, environment or samples.
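A minimal sketch of logging a model with an inferred signature, assuming a trained
scikit-learn model model and sample training data X_train :
Python
import mlflow
from mlflow.models import infer_signature

# Infer the input/output schema from sample data and the model's predictions.
signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)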
Azure Machine Learning supports deploying models to both online and batch endpoints.
These endpoints run different inferencing technologies, which might have different
features. Read this section to understand their differences.
The rest of this section mostly applies to online endpoints, but you can learn more about
batch endpoints and MLflow models at Use MLflow models in batch deployments.
Input formats
7 Note
1. We suggest you explore batch inference for processing files. See Deploy
MLflow models to Batch Endpoints.
Input structure
Regardless of the input type used, Azure Machine Learning requires inputs to be
provided in a JSON payload, within a dictionary key input_data . The following section
shows different payload examples and the differences between MLflow built-in server
and Azure Machine Learning inferencing server.
2 Warning
This key isn't required when serving models using the command
mlflow models serve , and hence payloads can't be used interchangeably.
) Important
MLflow 2.0 advisory: Notice that the payload's structure has changed in MLflow
2.0.
JSON
{
"input_data": {
"columns": [
"age", "sex", "trestbps", "chol", "fbs", "restecg",
"thalach", "exang", "oldpeak", "slope", "ca", "thal"
],
"index": [1],
"data": [
[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
]
}
}
JSON
{
    "input_data": [
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}
JSON
{
"input_data": {
"tokens": [
[0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
],
"mask": [
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
]
}
}
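As a hedged illustration of sending one of these payloads to an online endpoint (the
URL and key below are placeholders you'd take from your own endpoint):
Python
import requests

payload = {
    "input_data": {
        "columns": ["age", "sex", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal"],
        "index": [1],
        "data": [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    }
}

response = requests.post(
    "https://<endpoint-name>.<region>.inference.ml.azure.com/score",  # placeholder
    headers={"Authorization": "Bearer <endpoint-key>"},  # placeholder
    json=payload,
)
print(response.json())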
For more information about MLflow built-in deployment tools, see MLflow
documentation section .
If you need to change how inference of an MLflow model is executed, you can either
change how your model is being logged in the training routine, or customize inference
with a scoring script at deployment time.
The predict() function of the model determines how inference is executed and what
gets returned by the model. MLflow doesn't enforce any specific behavior in how the
predict() function generates results. However, there are scenarios where you probably
want to do some preprocessing or postprocessing before and after your model executes.
In other scenarios, you might want to change what's returned, like probabilities versus
classes.
A solution to this scenario is to implement machine learning pipelines that move from
inputs to outputs directly. For instance, sklearn.pipeline.Pipeline or pyspark.ml.Pipeline
are popular (and sometimes encouraged for performance reasons) ways to do so.
Another alternative is to customize how your model does inference using a custom
model flavor.
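For instance, a minimal scikit-learn sketch where preprocessing and the estimator are
packaged together, so predict() goes straight from raw inputs to outputs (the steps
here are illustrative):
Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Log this pipeline as the MLflow model; inference then applies both steps.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)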
) Important
When you opt to specify a scoring script for an MLflow model deployment, you
also need to provide an environment for it.
Deployment tools
Azure Machine Learning offers many ways to deploy MLflow models to online and batch
endpoints. You can deploy models using the following tools:
" MLflow SDK
" Azure Machine Learning CLI and Azure Machine Learning SDK for Python
" Azure Machine Learning studio
Each workflow has different capabilities, particularly around which type of compute it
can target. The following table shows them.
Scenario | MLflow SDK | Azure Machine Learning CLI/SDK | Azure Machine Learning studio
7 Note
1. Deployment to online endpoints that are in workspaces with private link
enabled requires you to package models before deployment (preview).
2. We recommend switching to managed online endpoints instead.
3. MLflow (OSS) doesn't have the concept of a scoring script and doesn't
currently support batch execution.
However, if you're more familiar with the Azure Machine Learning CLI v2, you want to
automate deployments using automation pipelines, or you want to keep deployment
configuration in a git repository, we recommend that you use the Azure Machine
Learning CLI v2.
If you want to quickly deploy and test models trained with MLflow, you can use the
Azure Machine Learning studio UI deployment.
Next steps
To learn more, review these articles:
In this article, learn how to deploy your MLflow model to an online endpoint for real-
time inference. When you deploy your MLflow model to an online endpoint, you don't
need to indicate a scoring script or an environment. This characteristic is referred to as
no-code deployment.
Tip
Workspaces without public network access: Before you can deploy MLflow models
to online endpoints without egress connectivity, you have to package the models
(preview). By using model packaging, you can avoid the need for an internet
connection, which Azure Machine Learning would otherwise require to dynamically
install necessary Python packages for the MLflow models.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste YAML
and other files, clone the repo, and then change directories to cli/endpoints/online
if you're using the Azure CLI, or sdk/endpoints/online if you're using the SDK for
Python.
Azure CLI
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
Azure CLI
Install the Azure CLI and the ml extension to the Azure CLI. For more
information, see Install, set up, and use the CLI (v2).
Azure CLI
MODEL_NAME='sklearn-diabetes'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"sklearn-diabetes/model"
Alternatively, if your model was logged inside of a run, you can register it directly.
Tip
To register the model, you'll need to know the location where the model has
been stored. If you're using the autolog feature of MLflow, the path depends on
the type and framework of the model being used. We recommend checking the
jobs output to identify the name of this folder. Look for the folder
that contains a file named MLmodel . If you're logging your models manually using
log_model , then the path is the argument you pass to that method. As an example,
if you log the model using mlflow.sklearn.log_model(my_model, "classifier") , then
the path where the model is stored is classifier .
Azure CLI
Use the Azure Machine Learning CLI v2 to create a model from a training job
output. In the following example, a model named $MODEL_NAME is registered using
the artifacts of a job with ID $RUN_ID . The path where the model is stored is
$MODEL_PATH .
Bash
az ml model create --name $MODEL_NAME --type "mlflow_model" --path azureml://jobs/$RUN_ID/outputs/artifacts/$MODEL_PATH
7 Note
The path $MODEL_PATH is the location where the model has been stored in the
run.
Azure CLI
endpoint.yaml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.s
chema.json
name: my-endpoint
auth_mode: key
2. Let's create the endpoint:
Azure CLI
sklearn-deployment.yaml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: sklearn-deployment
endpoint_name: my-endpoint
model:
name: mir-sample-sklearn-ncd-model
version: 1
path: sklearn-diabetes/model
type: mlflow_model
instance_type: Standard_DS3_v2
instance_count: 1
Azure CLI
az ml online-deployment create --name sklearn-deployment --endpoint
$ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --
all-traffic
Azure CLI
5. Assign all the traffic to the deployment: So far, the endpoint has one deployment,
but none of its traffic is assigned to it. Let's assign it.
Azure CLI
This step is not required in the Azure CLI, since we used the --all-traffic
flag during creation. If you need to change traffic, you can use the command az ml
online-endpoint update --traffic , as explained at Progressively update traffic.
sample-request-sklearn.json
JSON
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
7 Note
Notice how the key input_data is used in this example instead of inputs , as
used in MLflow serving. This is because Azure Machine Learning requires a different
input format to be able to automatically generate the swagger contracts for the
endpoints. See Differences between models deployed in Azure Machine Learning
and MLflow built-in server for details about the expected input format.
Azure CLI
JSON
[
11633.100167144921,
8522.117402884991
]
) Important
If you choose to indicate a scoring script for an MLflow model deployment, you'll
also have to specify the environment where the deployment will run.
Steps
Use the following steps to deploy an MLflow model with a custom scoring script.
c. Select the model you are trying to deploy and click on the tab Artifacts.
d. Take note of the folder that is displayed. This folder was indicated when the
model was registered.
2. Create a scoring script. Notice how the folder name model that you identified before
is included in the init() function.
score.py
Python
import logging
import os
import json
import mlflow
from io import StringIO
from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, predictions_to_json


def init():
    global model
    global input_schema
    # "model" is the path of the mlflow artifacts when the model was registered.
    # For automl models, this is generally "mlflow-model".
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model")
    model = mlflow.pyfunc.load_model(model_path)
    input_schema = model.metadata.get_input_schema()


def run(raw_data):
    json_data = json.loads(raw_data)
    if "input_data" not in json_data.keys():
        raise Exception("Request must contain a top level key named 'input_data'")

    serving_input = json.dumps(json_data["input_data"])
    data = infer_and_parse_json_input(serving_input, input_schema)
    predictions = model.predict(data)

    result = StringIO()
    predictions_to_json(predictions, result)
    return result.getvalue()
2 Warning
MLflow 2.0 advisory: The provided scoring script will work with both MLflow
1.X and MLflow 2.X. However, be advised that the expected input/output
formats on those versions may vary. Check the environment definition used to
ensure you are using the expected MLflow version. Notice that MLflow 2.0 is
only supported in Python 3.8+.
3. Let's create an environment where the scoring script can be executed. Since our
model is MLflow, the conda requirements are also specified in the model package
(for more details about MLflow models and the files included in them, see The
MLmodel format). We'll then build the environment using the conda
dependencies from the file. However, we also need to include the package
azureml-inference-server-http , which is required for online deployments in Azure
Machine Learning.
conda.yml
YAML
channels:
- conda-forge
dependencies:
- python=3.9
- pip
- pip:
- mlflow
- scikit-learn==1.2.2
- cloudpickle==2.2.1
- psutil==5.9.4
- pandas==2.0.0
- azureml-inference-server-http
name: mlflow-env
4. Create the deployment, indicating the scoring script and the environment:
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: sklearn-diabetes-custom
endpoint_name: my-endpoint
model: azureml:sklearn-diabetes@latest
environment:
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
conda_file: sklearn-diabetes/environment/conda.yml
code_configuration:
code: sklearn-diabetes/src
scoring_script: score.py
instance_type: Standard_F2s_v2
instance_count: 1
Azure CLI
az ml online-deployment create -f deployment.yml
5. Once your deployment completes, it's ready to serve requests. One
of the easier ways to test the deployment is by using a sample request file along
with the invoke method.
sample-request-sklearn.json
JSON
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
Azure CLI
JSON
{
"predictions": [
11633.100167144921,
8522.117402884991
]
}
2 Warning
MLflow 2.0 advisory: In MLflow 1.X, the key predictions will be missing.
Clean up resources
Once you're done with the endpoint, you can delete the associated resources:
Azure CLI
Next steps
To learn more, review these articles:
In this article, you'll learn how you can progressively update and deploy MLflow models
to Online Endpoints without causing service disruption. You'll use blue-green
deployment, also known as a safe rollout strategy, to introduce a new version of a web
service to production. This strategy will allow you to roll out your new version of the
web service to a small subset of users or requests before rolling it out completely.
The model we will deploy is based on the UCI Heart Disease Data Set . The database
contains 76 attributes, but we're using a subset of 14 of them. The model tries to
predict the presence of heart disease in a patient. It's integer valued from 0 (no
presence) to 1 (presence). It has been trained using an XGBoost classifier, and all the
required preprocessing has been packaged as a scikit-learn pipeline, making this
model an end-to-end pipeline that goes from raw data to predictions.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste files,
clone the repo, and then change directories to sdk/using-mlflow/deploy .
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning .
Azure role-based access controls (Azure RBAC) are used to grant access to
operations in Azure Machine Learning. To perform the steps in this article, your
user account must be assigned the owner or contributor role for the Azure
Machine Learning workspace, or a custom role allowing
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more
information, see Manage access to an Azure Machine Learning workspace.
Azure CLI
Install the Azure CLI and the ml extension to the Azure CLI. For more
information, see Install, set up, and use the CLI (v2).
Azure CLI
MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
We're going to exploit this functionality by deploying multiple versions of the same
model under the same endpoint. However, the new deployment will receive 0% of the
traffic at the beginning. Once we're sure the new model works correctly, we'll
progressively move traffic from one deployment to the other.
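As a hedged sketch of such a traffic shift using the Azure Machine Learning SDK v2
(assuming an MLClient named ml_client and the endpoint and deployment names
used in this article):
Python
# Send 10% of requests to the new deployment; keep 90% on the old one.
endpoint = ml_client.online_endpoints.get("heart-classifier-edp")
endpoint.traffic = {"default": 90, "xgboost-model": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()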
1. Endpoints require a name, which needs to be unique in the same region. Let's
create one that doesn't already exist:
Azure CLI
endpoint.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.s
chema.json
name: heart-classifier-edp
auth_mode: key
Azure CLI
blue-deployment.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1
Azure CLI
Tip
We set the flag --all-traffic in the create command, which will assign
all the traffic to the new deployment.
So far, the endpoint has one deployment, but none of its traffic is assigned to it.
Let's assign it.
Azure CLI
This step is not required in the Azure CLI, since we used the --all-traffic
flag during creation.
Azure CLI
sample.yml
YAML
{
"input_data": {
"columns": [
"age",
"sex",
"cp",
"trestbps",
"chol",
"fbs",
"restecg",
"thalach",
"exang",
"oldpeak",
"slope",
"ca",
"thal"
],
"data": [
[ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal"
]
]
}
}
Azure CLI
MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
Azure CLI
green-deployment.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment
.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1
Azure CLI
GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"
Azure CLI
az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --
endpoint-name $ENDPOINT_NAME -f green-deployment.yml
Azure CLI
Tip
Notice how we now indicate the name of the deployment we want to
invoke.
Azure CLI
3. If you decide to switch the entire traffic to the new deployment, update all the
traffic:
Azure CLI
5. Since the old deployment doesn't receive any traffic, you can safely delete it:
Azure CLI
Tip
Notice that at this point, the former "blue deployment" has been deleted, and
the new "green deployment" has taken its place.
Clean up resources
Azure CLI
) Important
Notice that deleting an endpoint also deletes all the deployments under it.
Next steps
Deploy MLflow models to Batch Endpoints
Using MLflow models for no-code deployment
Deploy MLflow models in batch
deployments
Article • 05/15/2023
In this article, learn how to deploy MLflow models to Azure Machine Learning for
batch inference using batch endpoints. When deploying MLflow models to batch
endpoints, Azure Machine Learning:
7 Note
For more information about the supported input file types in model deployments
with MLflow, view Considerations when deploying to batch inference.
The model has been trained using an XGBoost classifier, and all the required
preprocessing has been packaged as a scikit-learn pipeline, making this model an
end-to-end pipeline that goes from raw data to predictions.
The example in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste YAML
and other files, first clone the repo and then change directories to the folder:
Azure CLI
cd endpoints/batch/deploy-models/heart-classifier-mlflow
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure Machine Learning workspace. If you don't have one, use the steps in the
How to manage workspaces article to create one.
Create ARM deployments in the workspace resource group: Use roles Owner,
contributor, or custom role allowing Microsoft.Resources/deployments/write in
the resource group where the workspace is deployed.
You will need to install the following software to work with Azure Machine
Learning:
Azure CLI
The Azure CLI and the ml extension for Azure Machine Learning.
Azure CLI
az extension add -n ml
Azure CLI
Pass in the values for your subscription ID, workspace, location, and resource group
in the following code:
Azure CLI
Steps
Follow these steps to deploy an MLflow model to a batch endpoint for running batch
inference over new data:
1. Batch Endpoint can only deploy registered models. In this case, we already have a
local copy of the model in the repository, so we only need to publish the model to
the registry in the workspace. You can skip this step if the model you are trying to
deploy is already registered.
Azure CLI
MODEL_NAME='heart-classifier-mlflow'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path
"model"
2. Before moving forward, we need to make sure the batch deployments we're
about to create can run on some infrastructure (compute). Batch deployments can
run on any Azure Machine Learning compute that already exists in the workspace.
That means that multiple batch deployments can share the same compute
infrastructure. In this example, we're going to work on an Azure Machine Learning
compute cluster called batch-cluster . Let's verify that the compute exists in the
workspace, or create it otherwise.
Azure CLI
az ml compute create -n batch-cluster --type amlcompute --min-instances 0 --max-instances 5
3. Now it's time to create the batch endpoint and deployment. Let's start with the
endpoint first. Endpoints only require a name and a description to be created. The
name of the endpoint ends up in the URI associated with your endpoint.
Because of that, batch endpoint names need to be unique within an Azure
region. For example, there can be only one batch endpoint with the name
mybatchendpoint in westus2 .
Azure CLI
In this case, let's place the name of the endpoint in a variable so we can easily
reference it later.
Azure CLI
ENDPOINT_NAME="heart-classifier"
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.js
on
name: heart-classifier-batch
description: A heart condition classifier for batch inference
auth_mode: aad_token
Azure CLI
5. Now, let's create the deployment. MLflow models don't require you to indicate an
environment or a scoring script when creating the deployment, as they're created for
you. However, you can specify them if you want to customize how the deployment
does inference.
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.
json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-mlflow
description: A heart condition classifier based on XGBoost
type: model
model: azureml:heart-classifier-mlflow@latest
compute: azureml:batch-cluster
resources:
instance_count: 2
settings:
max_concurrency_per_instance: 2
mini_batch_size: 2
output_action: append_row
output_file_name: predictions.csv
retry_settings:
max_retries: 3
timeout: 300
error_threshold: -1
logging_level: info
Azure CLI
6. Although you can invoke a specific deployment inside of an endpoint, you'll
usually want to invoke the endpoint itself and let the endpoint decide which
deployment to use. This deployment is named the "default" deployment. This
gives you the possibility of changing the default deployment, and hence the model
serving it, without changing the contract with the user invoking the endpoint. Use
the following instruction to update the default deployment:
Azure CLI
DEPLOYMENT_NAME="classifier-xgboost-mlflow"
az ml batch-endpoint update --name $ENDPOINT_NAME --set
defaults.deployment_name=$DEPLOYMENT_NAME
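A hedged SDK v2 equivalent of the same update (assuming an MLClient named
ml_client ; the names match the CLI example above):
Python
endpoint = ml_client.batch_endpoints.get("heart-classifier-batch")
endpoint.defaults.deployment_name = "classifier-xgboost-mlflow"
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()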
7. At this point, our batch endpoint is ready to be used.
1. Let's create the data asset first. This data asset consists of a folder with multiple
CSV files that we want to process in parallel using batch endpoints. You can skip
this step if your data is already registered as a data asset, or if you want to use a
different input type.
Azure CLI
heart-dataset-unlabeled.yml
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/data.schema.json
name: heart-dataset-unlabeled
description: An unlabeled dataset for heart classification.
type: uri_folder
path: data
Azure CLI
2. Now that the data is uploaded and ready to be used, let's invoke the endpoint:
Azure CLI
7 Note
The utility jq might not be installed on every system. You can find
installation instructions in this link .
Tip
Notice how we're not indicating the deployment name in the invoke
operation. That's because the endpoint automatically routes the job to the
default deployment. Since our endpoint only has one deployment, that
one is the default. You can target a specific deployment by indicating
the argument/parameter deployment_name .
3. A batch job is started as soon as the command returns. You can monitor the status
of the job until it finishes:
Azure CLI
There is one row for each data point that was sent to the model. For tabular data,
this means that one row is generated for each row in the input files, and hence the
number of rows in the generated file ( predictions.csv ) equals the sum of all the
rows in all the processed files. For other data types, there is one row per
processed file.
You can download the results of the job by using the job name:
Azure CLI
Once the file is downloaded, you can open it using your favorite tool. The following
example loads the predictions using Pandas dataframe.
Python
2 Warning
The file predictions.csv might not be a regular CSV file and can't always be read
correctly using the pandas.read_csv() method.
The output looks as follows:
file prediction
heart-unlabeled-0.csv 0
heart-unlabeled-0.csv 1
... 1
heart-unlabeled-3.csv 0
Tip
Notice that in this example, the input data was tabular data in CSV format, and there
were 4 different input files (heart-unlabeled-0.csv, heart-unlabeled-1.csv,
heart-unlabeled-2.csv and heart-unlabeled-3.csv).
2 Warning
Nested folder structures aren't explored during inference. If you're partitioning
your data using folders, make sure to flatten the structure beforehand.
2 Warning
Batch deployments call the predict function of the MLflow model once per file.
For CSV files containing multiple rows, this can impose memory pressure on the
underlying compute. When sizing your compute, take into account not only the
memory consumption of the data being read, but also the memory footprint of the
model itself. This is especially true for models that process text, like transformer-
based models, where the memory consumption isn't linear with the size of the
input. If you encounter several out-of-memory exceptions, consider splitting the
data into smaller files with fewer rows, or implement batching at the row level inside
the model/scoring script.
2 Warning
Be advised that any unsupported file present in the input data will
make the job fail. You'll see an error entry like: "ERROR:azureml:Error
processing input file: '/mnt/batch/tasks/.../a-given-file.avro'. File type 'avro' is not
supported.".
Tip
If you'd like to process a different file type, or execute inference in a different way
than batch endpoints do by default, you can always create the deployment with a
scoring script, as explained in Using MLflow models with a scoring script.
Tip
Signatures in MLflow models are optional, but they're highly encouraged, as they
provide a convenient way to detect data compatibility issues early. For more
information about how to log models with signatures, read Logging models with a
custom signature, environment or samples.
You can inspect your model's signature by opening the MLmodel file
associated with your MLflow model. For more details about how signatures work in
MLflow, see Signatures in MLflow.
Flavor support
Batch deployments only support deploying MLflow models with a pyfunc flavor. If you
need to deploy a different flavor, see Using MLflow models with a scoring script.
" You need to process a file type not supported by batch deployments MLflow
deployments.
" You need to customize the way the model is run, for instance, use an specific flavor
to load it with mlflow.<flavor>.load() .
" You need to do pre/pos processing in your scoring routine when it is not done by
the model itself.
" The output of the model can't be nicely represented in tabular data. For instance, it
is a tensor representing an image.
" You model can't process each file at once because of memory constrains and it
needs to read it in chunks.
) Important
If you choose to indicate a scoring script for an MLflow model deployment, you'll
also have to specify the environment where the deployment will run.
2 Warning
Customizing the scoring script for MLflow deployments is only available from the
Azure CLI or the SDK for Python. If you're creating a deployment using the Azure
Machine Learning studio UI, switch to the CLI or the SDK.
Steps
Use the following steps to deploy an MLflow model with a custom scoring script.
c. Select the model you are trying to deploy and click on the tab Artifacts.
d. Take note of the folder that is displayed. This folder was indicated when the
model was registered.
2. Create a scoring script. Notice how the folder name model that you identified before
is included in the init() function.
deployment-custom/code/batch_driver.py
Python
import os
import mlflow
import pandas as pd


def init():
    global model
    global model_input_types
    global model_output_names

    # The folder name "model" was indicated when the model was registered.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)

    # Column types from the model signature (None if the model has no signature).
    input_schema = model.metadata.get_input_schema()
    model_input_types = (
        dict(zip(input_schema.input_names(), input_schema.pandas_types()))
        if input_schema else None
    )
    model_output_names = ["prediction"]


def run(mini_batch):
    print(f"run method start: {__file__}, run({len(mini_batch)} files)")

    # Read every CSV file in the mini-batch, keeping the originating file name.
    data = pd.concat(
        map(lambda fp: pd.read_csv(fp).assign(filename=os.path.basename(fp)),
            mini_batch)
    )
    if model_input_types:
        data = data.astype(model_input_types)

    # Score and return one row per input row, tagged with its source file.
    pred = model.predict(data.drop("filename", axis=1))
    return pd.DataFrame({"file": data["filename"], model_output_names[0]: pred})
3. Let's create an environment where the scoring script can be executed. Since our
model is MLflow, the conda requirements are also specified in the model package
(for more details about MLflow models and the files included in them, see The
MLmodel format). We'll then build the environment using the conda
dependencies from the file. However, we also need to include the package
azureml-core , which is required for batch deployments.
) Important
This example uses a conda environment specified at /heart-classifier-
mlflow/environment/conda.yaml . This file was created by combining the
original MLflow conda dependencies file and adding the package azureml-
core . You can't use the conda.yml file from the model directly.
Azure CLI
YAML
environment:
name: batch-mlflow-xgboost
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
conda_file: environment/conda.yaml
Azure CLI
YAML
$schema:
https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.
json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-custom
description: A heart condition classifier based on XGBoost
type: model
model: azureml:heart-classifier-mlflow@latest
environment:
name: batch-mlflow-xgboost
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
conda_file: environment/conda.yaml
code_configuration:
code: code
scoring_script: batch_driver.py
compute: azureml:batch-cluster
resources:
instance_count: 2
settings:
max_concurrency_per_instance: 2
mini_batch_size: 2
output_action: append_row
output_file_name: predictions.csv
retry_settings:
max_retries: 3
timeout: 300
error_threshold: -1
logging_level: info
Azure CLI
Clean up resources
Azure CLI
Run the following code to delete the batch endpoint and all the underlying
deployments. Batch scoring jobs won't be deleted.
Azure CLI
Next steps
Customize outputs in batch deployments
Deploy and run MLflow models in Spark
jobs
Article • 01/03/2023
In this article, learn how to deploy and run your MLflow model in Spark jobs to
perform inference over large amounts of data or as part of data wrangling jobs.
The model is based on the UCI Heart Disease Data Set . The database contains 76
attributes, but we're using a subset of 14 of them. The model tries to predict the
presence of heart disease in a patient. It's integer valued from 0 (no presence) to 1
(presence). It has been trained using an XGBoost classifier, and all the required
preprocessing has been packaged as a scikit-learn pipeline, making this model an
end-to-end pipeline that goes from raw data to predictions.
The information in this article is based on code samples contained in the azureml-
examples repository. To run the commands locally without having to copy/paste files,
clone the repo, and then change directories to sdk/using-mlflow/deploy .
Azure CLI
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for
MLflow, azureml-mlflow :
Bash
pip install mlflow azureml-mlflow
You need an Azure Machine Learning workspace. You can create one following this
tutorial.
See which access permissions you need to perform your MLflow operations with
your workspace.
You must have an MLflow model registered in your workspace. Particularly, this
example registers a model trained for the Diabetes dataset .
Tracking is already configured for you. Your default credentials will also be used
when working with MLflow.
Python
import mlflow

mlflow_client = mlflow.tracking.MlflowClient()

model_name = 'heart-classifier'
model_local_path = "model"
registered_model = mlflow_client.create_model_version(
name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version
Alternatively, if your model was logged inside of a run, you can register it directly.
Tip
To register the model, you'll need to know the location where the model has been
stored. If you're using the autolog feature of MLflow, the path depends on the
type and framework of the model being used. We recommend checking the jobs
output to identify the name of this folder. Look for the folder that
contains a file named MLmodel . If you're logging your models manually using
log_model , then the path is the argument you pass to that method. As an example,
if you log the model using mlflow.sklearn.log_model(my_model, "classifier") , then
the path where the model is stored is classifier .
Python
model_name = 'heart-classifier'
registered_model = mlflow_client.create_model_version(
name=model_name, source=f"runs:/{RUN_ID}/{MODEL_PATH}"
)
version = registered_model.version
7 Note
The path MODEL_PATH is the location where the model has been stored in the run.
Python
import urllib

urllib.request.urlretrieve("https://azuremlexampledata.blob.core.windows.net/data/heart-disease-uci/data/heart.csv", "/tmp/data")
Move the data to a mounted storage account available to the entire cluster.
Python
dbutils.fs.mv("file:/tmp/data", "dbfs:/")
) Important
The previous code uses dbutils , which is a tool available in Azure Databricks
clusters. Use the appropriate tool, depending on the platform you're using.
Python
input_data_path = "dbfs:/data"
YAML
- mlflow<3,>=2.1
- cloudpickle==2.2.0
- scikit-learn==1.2.0
- xgboost==1.7.2
Python
import mlflow
import pyspark.sql.functions as f
4. Configure the model URI. The following URI refers to the latest version of a model
named heart-classifier .
Python
model_uri = "models:/heart-classifier/latest"
5. Load the model as a UDF function using mlflow.pyfunc.spark_udf :
Python
predict_function = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="conda")
Tip
Use the argument result_type to control the type returned by the predict()
function.
6. Read the data you want to score:
Python
df = spark.read.option("header", "true").option("inferSchema", "true").csv(input_data_path).drop("target")
In our case, the input data is on CSV format and placed in the folder dbfs:/data/ .
We're also dropping the column target as this dataset contains the target variable
to predict. In production scenarios, your data won't have this column.
7. Run the function predict_function and place the predictions in a new column. In
this case, we're placing the predictions in the column predictions .
Python
scored_data = df.withColumn("predictions", predict_function(*df.columns))
8. Save the scored data to an output location:
Python
scored_data_path = "dbfs:/scored-data"
scored_data.write.csv(scored_data_path, header=True)
7 Note
To learn more about Spark jobs in Azure Machine Learning, see Submit Spark jobs
in Azure Machine Learning (preview).
1. A Spark job requires a Python script that takes arguments. Create a scoring script:
score.py
Python
import argparse
import mlflow
from pyspark.sql import SparkSession

parser = argparse.ArgumentParser()
parser.add_argument("--model")
parser.add_argument("--input_data")
parser.add_argument("--scored_data")
args = parser.parse_args()
print(args.model)
print(args.input_data)

spark = SparkSession.builder.getOrCreate()

# Load the MLflow model as a Spark UDF, restoring its packages with conda.
predict_function = mlflow.pyfunc.spark_udf(spark, args.model, env_manager="conda")

# Read the input data, score it, and write the predictions to the output folder.
df = spark.read.option("header", "true").option("inferSchema", "true").csv(args.input_data)
df.withColumn("predictions", predict_function(*df.columns)).write.csv(args.scored_data)
The above script takes three arguments: --model , --input_data and --scored_data .
The first two are inputs and represent the model we want to run and the input
data; the last one is an output, the folder where predictions will be placed.
Tip
Installation of Python packages: The previous scoring script loads the MLflow
model into a UDF function, indicating the parameter
env_manager="conda" . When this parameter is set, MLflow restores the
required packages, as specified in the model definition, in an isolated
environment where only the UDF function runs. For more details, see the
mlflow.pyfunc.spark_udf documentation.
mlflow-score-spark-job.yml
yml
$schema: http://azureml/sdk-2-0/SparkJob.json
type: spark
code: ./src
entry:
file: score.py
conf:
spark.driver.cores: 1
spark.driver.memory: 2g
spark.executor.cores: 2
spark.executor.memory: 2g
spark.executor.instances: 2
inputs:
model:
type: mlflow_model
path: azureml:heart-classifier@latest
input_data:
type: uri_file
path: https://azuremlexampledata.blob.core.windows.net/data/heart-
disease-uci/data/heart.csv
mode: direct
outputs:
scored_data:
type: uri_folder
args: >-
--model ${{inputs.model}}
--input_data ${{inputs.input_data}}
--scored_data ${{outputs.scored_data}}
identity:
type: user_identity
resources:
instance_type: standard_e4s_v3
runtime_version: "3.2"
3. The YAML files shown above can be used in the az ml job create command, with
the --file parameter, to create a standalone Spark job as shown:
Azure CLI
az ml job create --file mlflow-score-spark-job.yml
Next steps
Deploy MLflow models to batch endpoints
Deploy MLflow models to online endpoint
Using MLflow models for no-code deployment
Bring your R workloads
Article • 02/24/2023
There's no Azure Machine Learning SDK for R. Instead, you'll use either the CLI or a
Python control script to run your R scripts.
This article outlines the key scenarios for R that are supported in Azure Machine
Learning and known limitations.
Typical R workflow
A typical workflow for using R with Azure Machine Learning:
Submit remote asynchronous R jobs (you submit jobs via the CLI or Python SDK,
not R)
Build an environment
Log job artifacts, parameters, tags and models
Limitation | Workaround
RStudio running as a custom application (such as Posit or RStudio) within a container on the compute instance can't access workspace assets or MLflow. | Use Jupyter Notebooks with the R kernel on the compute instance.
Parallel job step isn't supported. | Run a script in parallel n times using different input parameters. But you'll have to meta-program to generate n YAML or CLI calls to do it.
Zero code deployment (that is, automatic deployment) of an R MLflow model is currently not supported. | Create a custom container with plumber for deployment.
Azure Machine Learning online deployment yml can only use image URIs directly from the registry for the environment specification; not pre-built environments from the same Dockerfile. | Follow the steps in How to deploy a registered R model to an online (real time) endpoint for the correct way to deploy.
Next steps
Learn more about R in Azure Machine Learning:
Interactive R development
Adapt your R script to run in production
How to train R models in Azure Machine Learning
How to deploy an R model to an online (real time) endpoint
Interactive R development
Article • 06/01/2023
This article shows how to use R on a compute instance in Azure Machine Learning
studio that runs an R kernel in a Jupyter notebook.
The popular RStudio IDE also works. You can install RStudio or Posit Workbench in a
custom container on a compute instance. However, this has limitations in reading and
writing to your Azure Machine Learning workspace.
) Important
The code shown in this article works on an Azure Machine Learning compute
instance. The compute instance has an environment and configuration file
necessary for the code to run successfully.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today
An Azure Machine Learning workspace and a compute instance
A basic understanding of using Jupyter notebooks in Azure Machine Learning studio.
See Model development on a cloud workstation for more information.
If you're not sure how to create and work with notebooks in studio, review
Run Jupyter notebooks in your workspace
6. On the notebook toolbar, make sure your compute instance is running. If not, start
it now.
Access data
You can upload files to your workspace file storage resource, and then access those files
in R. However, for files stored in Azure data assets or data from datastores, you must
install some packages.
This section describes how to use Python and the reticulate package to load your data
assets and datastores into R, from an interactive session. You use the azureml-fsspec
Python package and the reticulate R package to read tabular data as Pandas
DataFrames. This section also includes an example of reading data assets and datastores
into an R data.frame .
Bash
#!/bin/bash
set -e
Installs azureml-fsspec in the default conda environment for the compute
instance
Installs the R reticulate package, if necessary (version must be 1.26 or greater)
1. Ensure you have the correct version of reticulate . For a version less than 1.26, try
to use a newer compute instance.
packageVersion("reticulate")
2. Load reticulate and set the conda environment where azureml-fsspec was
installed
R
library(reticulate)
use_condaenv("azureml_py310_sdkv2")
print("Environment is set")
py_run_string(py_code)
print("ml_client is configured")
b. Use this code to retrieve the asset. Make sure to replace <DATA_NAME> and
<VERSION_NUMBER> with the name and number of your data asset.
Tip
In studio, select Data in the left navigation to find the name and version
number of your data asset.
py_run_string(py_code)
print(paste("URI path is", py$data_uri))
4. Use Pandas read functions to read the file(s) into the R environment
R
pd <- import("pandas")
cc <- pd$read_csv(py$data_uri)
head(cc)
You can also use a Datastore URI to access different files on a registered Datastore, and
read these resources into an R data.frame .
Install R packages
A compute instance has many preinstalled R packages.
To install other packages, you must explicitly state the location and dependencies.
Tip
When you create or use a different compute instance, you must re-install any
packages you've installed.
install.packages("tsibble",
dependencies = TRUE,
lib = "/home/azureuser")
Load R libraries
Add /home/azureuser to the R library path.
.libPaths("/home/azureuser")
Tip
You must update the .libPaths in each interactive R script to access user-installed
libraries. Add this code to the top of each interactive R script or notebook.
library('tsibble')
7 Note
From an interactive R session, you can only write to the workspace file system.
From an interactive R session, you cannot interact with MLflow (such as log
model or query registry).
Next steps
Adapt your R script to run in production
Adapt your R script to run in production
Article • 02/26/2023
This article explains how to take an existing R script and make the appropriate changes
to run it as a job in Azure Machine Learning.
You'll have to make most, if not all, of the changes described in detail in this article.
Add parsing
If your script requires any sort of input parameter (most scripts do), pass the inputs into
the script via the Rscript call.
Bash
Rscript <name-of-r-script>.R
--data_file ${{inputs.<name-of-yaml-input-1>}}
--brand ${{inputs.<name-of-yaml-input-2>}}
In your R script, parse the inputs and make the proper type conversions. We recommend
that you use the optparse package.
You can also add defaults, which are handy for testing. We recommend that you add an
--output parameter with a default value of ./outputs so that any output of the script
will be stored.
library(optparse)
parser <- OptionParser()
# add_option() calls for each input go here, then:
args <- parse_args(parser)
args is a named list. You can use any of these parameters later in your script.
library(mlflow)
library(httr)
library(later)
library(tcltk2)
if (response$status_code != 200){
error_response = paste("Error fetching token will try again
after sometime: ", str(response), sep = " ")
warning(error_response)
}
if (response$status_code == 200){
text <- content(response, "text", encoding = "UTF-8")
json_resp <-jsonlite::fromJSON(text, simplifyVector = FALSE)
json_resp$token
Sys.setenv(MLFLOW_TRACKING_TOKEN = json_resp$token)
message("Refreshing token done")
}
}
clean_tracking_uri()
tcltk2::tclTaskSchedule(as.integer(Sys.getenv("MLFLOW_TOKEN_REFRESH_INT
ERVAL_SECONDS", 30))*1000, fetch_token_from_aml(), id =
"fetch_token_from_aml", redo = TRUE)
R
source("azureml_utils.R")
Define the input parameter as shown in the parameters section. Use the parameter
data-file to specify a whole path, so that you can use read_csv(args$data_file) to
read the data file.
) Important
This section does not apply to models. See the following two sections for model
specific saving and logging instructions.
You can store arbitrary script outputs (data files, images, serialized R objects, and so
on) generated by the R script in Azure Machine Learning. Create a ./outputs directory
to store any generated artifacts. Any files saved to ./outputs are automatically
included in the run and uploaded to the experiment at the end of the run. Because you
added a default value for the --output parameter in the input parameters section,
include the following code snippet in your R script to create the output directory:
R
if (!dir.exists(args$output)) {
dir.create(args$output)
}
After you create the directory, save your artifacts to that directory. For example:
R
# create and save a plot
library(ggplot2)
ggsave(myplot,
filename = file.path(args$output,"forecast-plot.png"))
If your R script trains a model and produces a model object, you'll need to crate it
(with the carrier package) so that you can deploy it later with Azure Machine Learning.
When using the crate function, use explicit namespaces when calling any package
function you need.
Let's say you have a timeseries model object called my_ts_model created with the fable
package. In order to make this model callable when it's deployed, create a crate where
you'll pass in the model object and a forecasting horizon in number of periods:
R
library(carrier)
crated_model <- crate(function(x)
{
fabletools::forecast(!!my_ts_model, h = x)
})
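As a quick sanity check, a crate is callable like a function; for example, assuming
my_ts_model exists in your session:
R
# forecast the next two periods with the crated model
crated_model(2)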
7 Note
When you log a model, the model is also saved and added to the run artifacts.
There is no need to explicitly save a model unless you did not log it.
For example, to log the crated_model object as created in the previous section, you
would include the following code in your R script:
Tip
Use models as the value for artifact_path when logging a model. This is a best
practice, even though you can name it something else.
R
mlflow_start_run()
mlflow_log_model(
model = crated_model, # the crate model object
artifact_path = "models" # a path to save the model object to
)
To log parameters or metrics for the run, use the corresponding MLflow functions, for
example:
R
mlflow_log_param(<key-name>, <value>)
Putting it all together, the skeleton of an adapted R script looks like this (portions
elided):
R
# BEGIN R SCRIPT
# source the azureml_utils.R script which is needed to use the MLflow back end
# with R
source("azureml_utils.R")

# load your packages here. Make sure that they are installed in the container.
library(...)

mlflow_log_param(<key-name>, <value>)
Create an environment
To run your R script, you'll use the ml extension for Azure CLI, also referred to as CLI v2.
The ml command uses a YAML job definitions file. For more information about
submitting jobs with az ml , see Train models with Azure Machine Learning CLI.
The YAML job file specifies an environment. You'll need to create this environment in
your workspace before you can run the job.
You can create the environment in Azure Machine Learning studio or with the Azure CLI.
Whatever method you use, you'll use a Dockerfile. All Docker context files for R
environments must have the following specification in order to work on Azure Machine
Learning:
Dockerfile
FROM rocker/tidyverse:latest

# Install python
RUN apt-get update -qq && \
 apt-get install -y python3-pip tcl tk libz-dev libpng-dev

# Install azureml-mlflow and mlflow
RUN pip install azureml-mlflow
RUN pip install mlflow

# Install R packages required for logging with MLflow (these are necessary)
RUN R -e "install.packages('mlflow', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('carrier', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('optparse', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('tcltk2', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
The base image is rocker/tidyverse:latest , which has many R packages and their
dependencies already installed.
) Important
You must install any R packages your script needs in advance. Add more lines to the
Docker context file as needed, for example (the package names here are illustrative):
Dockerfile
RUN R -e "install.packages('tsibble', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('fable', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
Additional suggestions
Some additional suggestions you may want to consider:
Next steps
How to train R models in Azure Machine Learning
Run an R job to train a model
Article • 07/13/2023
This article explains how to take the R script that you adapted to run in production and
set it up to run as an R job using the Azure Machine Learning CLI V2.
7 Note
Although the title of this article refers to training a model, you can actually run any
kind of R script as long as it meets the requirements listed in the adapting article.
Prerequisites
An Azure Machine Learning workspace.
A registered data asset that your training job will use.
Azure CLI and ml extension installed. Or use a compute instance in your
workspace, which has the CLI preinstalled.
A compute cluster or compute instance to run your training job.
An R environment for the compute cluster to use to run the job.
📁 r-job-azureml
├─ src
│ ├─ azureml_utils.R
│ ├─ r-source.R
├─ job.yml
) Important
The r-source.R file is the R script that you adapted to run in production.
The azureml_utils.R file is necessary; its source code is shown in the adapting article.
You'll need to gather specific pieces of information to put into the YAML:
The name of the registered data asset you'll use as the data input (with version):
azureml:<REGISTERED-DATA-ASSET>:<VERSION>
Tip
For Azure Machine Learning artifacts that require versions (data assets,
environments), you can use the shortcut URI azureml:<AZUREML-ASSET>@latest to
get the latest version of that artifact if you don't need to set a specific version.
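For example, both of these forms are valid for a data asset path (the asset name is a
placeholder):
yml
path: azureml:my-training-data:3        # a specific, pinned version
path: azureml:my-training-data@latest   # the latest version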
Create a job.yml file with contents like the following:
yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json

# the Rscript command goes in the command key below. Here you also specify
# which parameters are passed into the R script and can reference the input
# keys and values further below
# Modify any value shown below <IN-BRACKETS-AND-CAPS> (remove the brackets)
command: >
  Rscript <NAME-OF-R-SCRIPT>.R
  --data_file ${{inputs.datafile}}
  --other_input_parameter ${{inputs.other}}
code: src # this is the code directory
inputs:
  datafile: # this is a registered data asset
    type: uri_file
    path: azureml:<REGISTERED-DATA-ASSET>@latest
  other: 1 # this is a sample parameter, which is the number 1 (as text)
environment: azureml:<R-ENVIRONMENT-NAME>@latest
compute: azureml:<COMPUTE-CLUSTER-OR-INSTANCE-NAME>
experiment_name: <NAME-OF-EXPERIMENT>
description: <DESCRIPTION>
1. In a terminal window, navigate to the folder that contains job.yml:
Bash
cd r-job-azureml
2. Sign in to Azure. If you're doing this from an Azure Machine Learning compute
instance, use:
Azure CLI
az login --identity
If you're not on the compute instance, omit --identity and follow the prompt to
open a browser window to authenticate.
3. Make sure you have the most recent versions of the CLI and the ml extension:
Azure CLI
az upgrade
4. If you have multiple Azure subscriptions, set the active subscription to the one
you're using for your workspace. (You can skip this step if you only have access to
a single subscription.) Replace <SUBSCRIPTION-NAME> with your subscription name.
Also remove the brackets <> .
Azure CLI
az account set --subscription "<SUBSCRIPTION-NAME>"
5. Now use the CLI to submit the job. If you're doing this on a compute instance in your
workspace, you can use environment variables for the workspace name and
resource group, as shown in the following code. If you aren't on a compute instance,
replace these values with your workspace name and resource group.
Azure CLI
az ml job create -f job.yml --workspace-name $CI_WORKSPACE --resource-group $CI_RESOURCE_GROUP
Once you've submitted the job, you can check the status and results in studio.
Register model
Finally, once the training job is complete, register your model if you want to deploy it.
Start in the studio from the page showing your job details.
1. Once your job completes, select Outputs + logs to view the outputs of the job.
2. Open the models folder to verify that crate.bin and MLmodel are present. If not,
check the logs to see if there was an error.
3. Select + Register model.
4. For Model type, change the default from MLflow to Unspecified type.
5. For Job output, select models, the folder that contains the model.
6. Select Next.
7. Supply the name you wish to use for your model. Add Description, Version, and
Tags if you wish.
8. Select Next.
At the top of the page, you'll see a confirmation that the model is registered. The
confirmation looks similar to this:
Select Click here to go to this model if you wish to view the registered model details.
Next steps
Now that you have a registered model, learn How to deploy an R model to an online
(real time) endpoint.
How to deploy a registered R model to
an online (real time) endpoint
Article • 02/24/2023
In this article, you'll learn how to deploy an R model to a managed endpoint (Web API)
so that your application can score new data against the model in near real-time.
Prerequisites
An Azure Machine Learning workspace.
Azure CLI and ml extension installed. Or use a compute instance in your
workspace, which has the CLI pre-installed.
At least one custom environment associated with your workspace. Create an R
environment, or any other custom environment if you don't have one.
An understanding of the R plumber package
A model that you've trained and packaged with crate, and registered into your
workspace
📂 r-deploy-azureml
├─📂 docker-context
│ ├─ Dockerfile
│ └─ start_plumber.R
├─📂 src
│ └─ plumber.R
├─ deployment.yml
├─ endpoint.yml
The contents of each of these files is shown and explained in this article.
Dockerfile
This is the file that defines the container environment. You'll also define the installation
of any additional R packages here.
A sample Dockerfile will look like this:
Dockerfile
# OPTIONAL: Install any additional R packages you may need for your model crate to run
RUN R -e "install.packages('<PACKAGE-NAME>', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('<PACKAGE-NAME>', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"

# REQUIRED
ENTRYPOINT []
Modify the file to add the packages you need for your scoring script.
plumber.R
) Important
This section shows how to structure the plumber.R script. For detailed information
about the plumber package, see plumber documentation .
The file plumber.R is the R script where you'll define the function for scoring. This script
also performs tasks that are necessary to make your endpoint work. The script:
Gets the path where the model is mounted from the AZUREML_MODEL_DIR
environment variable in the container.
Loads a model object created with the crate function from the carrier package,
which was saved as crate.bin when it was packaged.
Unserializes the model object
Defines the scoring function
Tip
Make sure that whatever your scoring function produces can be converted back to
JSON. Some R objects are not easily converted.
R
# plumber.R
# This script will be deployed to a managed endpoint to do the model scoring

# REQUIRED
# When you deploy a model as an online endpoint, Azure Machine Learning mounts your model
# to your endpoint. Model mounting enables you to deploy new versions of the model without
# having to create a new Docker image.
model_dir <- Sys.getenv("AZUREML_MODEL_DIR")

# REQUIRED
# This reads the serialized model with its respective predict/score method you
# registered. The loaded load_model object is a raw binary object.
load_model <- readRDS(paste0(model_dir, "/models/crate.bin"))

# REQUIRED
# You have to unserialize the load_model object to get back the scoring function
scoring_function <- unserialize(load_model)

# REQUIRED
# << Readiness route vs. liveness route >>
# An HTTP server defines paths for both liveness and readiness. A liveness route is used to
# check whether the server is running. A readiness route is used to check whether the
# server's ready to do work. In machine learning inference, a server could respond 200 OK
# to a liveness request before loading a model. The server could respond 200 OK to a
# readiness request only after the model has been loaded into memory.

#* Liveness check
#* @get /live
function() {
  "alive"
}

#* Readiness check
#* @get /ready
function() {
  "ready"
}

# << The scoring function >>
# This is the function that is deployed as a web API that will score the model.
# Make sure that whatever you are producing as a score can be converted
# to JSON to be sent back as the API response.
# In the example here, forecast_horizon (the number of time units to forecast)
# is the input to scoring_function. The output is a tibble; we convert some of
# the output types so they work in JSON.

#* @param forecast_horizon
#* @post /score
function(forecast_horizon) {
  scoring_function(as.numeric(forecast_horizon)) |>
    tibble::as_tibble() |>
    dplyr::transmute(period = as.character(yr_wk),
                     dist = as.character(logmove),
                     forecast = .mean) |>
    jsonlite::toJSON()
}
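Before building the container, you can sanity-check the scoring logic in a local R
session; a sketch, assuming you've downloaded the registered model's crate.bin to a
local models folder:
R
# load and call the crated model the same way the endpoint will
load_model <- readRDS("models/crate.bin")  # placeholder local path
scoring_function <- unserialize(load_model)
scoring_function(2)  # forecast two periods ahead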
start_plumber.R
The file start_plumber.R is the R script that runs when the container starts, and it
calls your plumber.R script. Use the script as-is; its final lines, which start the
plumber server, look like this:
R
pr <- plumber::plumb(entry_script_path)
do.call(pr$run, args)
Build container
These steps assume you have an Azure Container Registry associated with your
workspace, which is created when you create your first custom environment. To see if
you have a custom environment, select Environments in the left navigation of studio,
and then select the Custom environments tab.
Once you have verified that you have at least one custom environment, use the
following steps to build a container.
1. Open a terminal window and sign in to Azure. If you're doing this from an Azure
Machine Learning compute instance, use:
Azure CLI
az login --identity
If you're not on the compute instance, omit --identity and follow the prompt to
open a browser window to authenticate.
2. Make sure you have the most recent versions of the CLI and the ml extension:
Azure CLI
az upgrade
3. If you have multiple Azure subscriptions, set the active subscription to the one
you're using for your workspace. (You can skip this step if you only have access to
a single subscription.) Replace <SUBSCRIPTION-NAME> with your subscription name.
Also remove the brackets <> .
Azure CLI
az account set --subscription "<SUBSCRIPTION-NAME>"
4. Set the default workspace. If you're doing this from a compute instance, you can
use the following command as is. If you're on any other computer, substitute your
resource group and workspace name instead. (You can find these values in Azure
Machine Learning studio.)
Azure CLI
az configure --defaults group=$CI_RESOURCE_GROUP workspace=$CI_WORKSPACE
5. Navigate to the folder that contains your deployment files:
Bash
cd r-deploy-azureml
6. To build the image in the cloud, execute the following bash commands in your
terminal. Replace <IMAGE-NAME> with the name you want to give the image, and
<ACR-NAME> with the name of the Azure Container Registry attached to your workspace.
(The commands here are a sketch; the IMAGE_TAG variable captures the full image tag
for use later.)
If your workspace is in a virtual network, see Enable Azure Container Registry (ACR)
for additional steps to add --image-build-compute to the az acr build command
in the last line of this code.
Azure CLI
IMAGE_TAG=<ACR-NAME>.azurecr.io/<IMAGE-NAME>:latest
az acr build ./docker-context --image <IMAGE-NAME>:latest --registry <ACR-NAME>
) Important
It will take a few minutes for the image to build. Wait until the build process is
complete before proceeding to the next section. Don't close this terminal; you'll use
it next to create the deployment.
The az acr build command automatically uploads your docker-context folder - which
contains the artifacts needed to build the image - to the cloud, where the image is
built and hosted in an Azure Container Registry.
Deploy model
In this section of the article, you'll define and create an endpoint and deployment to
deploy the model and image built in the previous steps to a managed online endpoint.
A deployment is a set of resources required for hosting the model that does the actual
scoring. A single endpoint can contain multiple deployments. The load balancing
capabilities of Azure Machine Learning managed endpoints allow you to give any
percentage of traffic to each deployment. Traffic allocation can be used to do safe
rollout of blue/green deployments by balancing requests between different instances.
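For example, a sketch of shifting traffic between two existing deployments on an
endpoint (the deployment names blue and green here are placeholders):
Azure CLI
az ml online-endpoint update --name <ENDPOINT-NAME> --traffic "blue=90 green=10"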
1. Define the endpoint in the endpoint.yml file. Replace <ENDPOINT-NAME> with the
name you want to give the endpoint:
yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: <ENDPOINT-NAME>
auth_mode: aml_token
2. Using the same terminal where you built the image, execute the following CLI
command to create an endpoint:
Azure CLI
az ml online-endpoint create -f endpoint.yml
Create deployment
1. To create your deployment, add the following code to the deployment.yml file.
Replace <ENDPOINT-NAME> with the endpoint name you defined in the
endpoint.yml file
Replace <DEPLOYMENT-NAME> with the name you want to give the deployment
Replace <MODEL-URI> and <IMAGE-TAG> with the values for your registered model
and the image you built. If you set the IMAGE_TAG variable when building the
image, you can print it in the same terminal:
Bash
echo $IMAGE_TAG
yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: <DEPLOYMENT-NAME>
endpoint_name: <ENDPOINT-NAME>
code_configuration:
  code: ./src
  scoring_script: plumber.R
model: <MODEL-URI>
environment:
  image: <IMAGE-TAG>
  inference_config:
    liveness_route:
      port: 8000
      path: /live
    readiness_route:
      port: 8000
      path: /ready
    scoring_route:
      port: 8000
      path: /score
instance_type: Standard_DS2_v2
instance_count: 1
2. Next, in your terminal execute the following CLI command to create the
deployment (notice that you're setting 100% of the traffic to this model):
Azure CLI
az ml online-deployment create -f deployment.yml --all-traffic
It may take several minutes for the service to be deployed. Wait until deployment is
finished before proceeding to the next section.
Test
Once your deployment has been successfully created, you can test the endpoint using
studio or the CLI:
Studio
Navigate to the Azure Machine Learning studio and select Endpoints from the
left-hand menu. Next, select the r-endpoint-forecast endpoint you created earlier.
Enter the following JSON into the Input data to test real-time endpoint textbox:
JSON
{
"forecast_horizon" : [2]
}
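If you prefer the CLI, a minimal sketch of the same test, assuming you saved the JSON
above to a file named sample_request.json:
Azure CLI
az ml online-endpoint invoke --name r-endpoint-forecast --request-file sample_request.json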
Clean up resources
Now that you've successfully scored with your endpoint, you can delete it so you don't
incur ongoing cost:
Azure CLI
az ml online-endpoint delete --name r-endpoint-forecast
Next steps
For more information about using R with Azure Machine Learning, see Overview of R
capabilities in Azure Machine Learning
Run Azure Machine Learning models
from Fabric, using batch endpoints
(preview)
Article • 11/15/2023
In this article, you learn how to consume Azure Machine Learning batch deployments
from Microsoft Fabric. Although the workflow uses models that are deployed to batch
endpoints, it also supports the use of batch pipeline deployments from Fabric.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and we don't recommend it for production workloads.
Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Get a Microsoft Fabric subscription. Or sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning .
An Azure Machine Learning workspace. If you don't have one, use the steps in How
to manage workspaces to create one.
Ensure that you have the following permissions in the workspace:
Create/manage batch endpoints and deployments: Use the Owner or Contributor
roles, or a custom role allowing
Microsoft.MachineLearningServices/workspaces/batchEndpoints/* .
Create ARM deployments in the workspace resource group: Use the Owner or
Contributor roles, or a custom role allowing Microsoft.Resources/deployments/write
in the resource group where the workspace is deployed.
A model deployed to a batch endpoint. If you don't have one, use the steps in
Deploy models for scoring in batch endpoints to create one.
Download the heart-unlabeled.csv sample dataset to use for scoring.
Architecture
Azure Machine Learning can't directly access data stored in Fabric's OneLake. However,
you can use OneLake's capability to create shortcuts within a Lakehouse to read and
write data stored in Azure Data Lake Gen2. Since Azure Machine Learning supports
Azure Data Lake Gen2 storage, this setup allows you to use Fabric and Azure Machine
Learning together. The data architecture is as follows:
In this section, you create or identify a storage account to use for storing the
information that the batch endpoint will consume and that Fabric users will see in
OneLake. Fabric only supports storage accounts with hierarchical namespaces enabled,
such as Azure Data Lake Gen2.
2. From the left-side panel, select your Fabric workspace to open it.
3. Open the lakehouse that you'll use to configure the connection. If you don't have a
lakehouse already, go to the Data Engineering experience to create a lakehouse. In
this example, you use a lakehouse named trusted.
4. In the left-side navigation bar, open more options for Files, and then select New
shortcut to bring up the wizard.
6. In the Connection settings section, paste the URL associated with the Azure Data
Lake Gen2 storage account.
8. Select Next.
9. Configure the path to the shortcut, relative to the storage account, if needed. Use
this setting to configure the folder that the shortcut will point to.
10. Configure the Name of the shortcut. This name will be a path inside the lakehouse.
In this example, name the shortcut datasets.
11. Select Create.
Tip
Why should you configure Azure Blob Storage instead of Azure Data Lake
Gen2? Batch endpoints can only write predictions to Blob Storage
accounts. However, every Azure Data Lake Gen2 storage account is also a
blob storage account; therefore, they can be used interchangeably.
c. Select the storage account from the wizard, using the Subscription ID, Storage
account, and Blob container (file system).
d. Select Create.
7. Ensure that the compute where the batch endpoint is running has permission to
mount the data in this storage account. Although access is still granted by the
identity that invokes the endpoint, the compute where the batch endpoint runs
needs to have permission to mount the storage account that you provide. For
more information, see Accessing storage services.
4. Create a folder to store the sample dataset that you want to score. Name the
folder uci-heart-unlabeled.
5. Use the Get data option and select Upload files to upload the sample dataset
heart-unlabeled.csv.
7. The sample file is ready to be consumed. Note the path to the location where you
saved it.
1. Return to the Data Engineering experience (if you already navigated away from it)
by using the experience selector icon in the lower left corner of your home page.
5. Select the Activities tab from the toolbar in the designer canvas.
6. Select more options at the end of the tab and select Azure Machine Learning.
b. In the Connection settings section of the creation wizard, specify the values of
the subscription ID, Resource group name, and Workspace name, where your
endpoint is deployed.
d. Save the connection. Once the connection is selected, Fabric automatically
populates the available batch endpoints in the selected workspace.
8. For Batch endpoint, select the batch endpoint you want to call. In this example,
select heart-classifier-....