End-to-End: Azure ML and Big Data
Case Study – Students Guide
Overview
Margie's Travel (MT) provides concierge services for business travelers. In an
increasingly crowded market, they are always looking for ways to differentiate
themselves, and provide added value to their corporate customers.
They are looking to pilot a web app that their internal customer service agents can
use to provide additional information useful to the traveler during the flight booking
process. They want to enable their agents to enter the flight information and
produce a prediction as to whether the departing flight will encounter a delay of
15 minutes or longer, given the weather forecast for the departure hour.
Solution architecture
Below is a diagram of the solution architecture you will build in this lab. Please study
it carefully so you understand the solution as a whole while you work on its various
components.
Exercise 1: Retrieve lab environment information and
create a Databricks cluster
Duration: 10 minutes
In this exercise, you will retrieve your Azure Storage account name and access key
and your Azure Subscription Id and record the values to use later within the lab. You
will also create a new Azure Databricks cluster.
Task 1: Retrieve Azure Storage account information and
Subscription Id
You will need to have the Azure Storage account name and access key when you
create your Azure Databricks cluster during the lab. You will also need to create
storage containers in which you will store your flight and weather data files.
1. From the side menu in the Azure portal, choose Resource groups, then enter
your resource group name into the filter box, and select it from the list.
2. Next, select your lab Azure Storage account from the list.
3. On the left menu, select Overview, then locate and copy your Azure
Subscription ID and save it to a text editor such as Notepad for later.
4. Select Access keys (1) from the menu. Copy the storage account name (2)
and the key1 value (3) to a text editor such as Notepad for later.
Task 2: Create an Azure Databricks cluster
You have provisioned an Azure Databricks workspace, and now you need to create a
new cluster within the workspace. Part of the cluster configuration includes setting
up an account access key to your Azure Storage account, using the Spark Config
within the new cluster form. This will allow your cluster to access the lab files.
1. From the side menu in the Azure portal, select Resource groups, then enter
your resource group name into the filter box, and select it from the list.
2. Next, select your Azure Databricks service from the list.
3. In the Overview pane of the Azure Databricks service, select Launch
Workspace.
Azure Databricks will automatically log you in using Azure Active Directory
Single Sign On.
4. Select Compute from the side menu and then Create Compute.
5. On the New Cluster form, provide the following:
o Cluster Name: lab
o Policy: Unrestricted
o Single node: Selected
o Databricks Runtime Version: 9.1 LTS ML (Scala 2.12, Spark 3.1.2)
o Node type: Compute Optimized → Standard_F4
o Terminate after: Check the box and enter 120
o Advanced options:
o Spark Config: Expand Advanced Options and edit the Spark Config by
entering the connection information for the Azure Storage account that you
copied in Task 1. This will allow your cluster to access the lab files. Enter the
following:
[Link].<STORAGE_ACCOUNT_NAME>.[Link] <ACCESS_KEY>
where <STORAGE_ACCOUNT_NAME> is your Azure Storage account name and
<ACCESS_KEY> is your storage access key.
Example: [Link] HD+91Y77b+TezEu1lh9QXXU2Va6Cjg9bu0RRpb/KtBj8lWQa6jwyA0OGTDmSNVFr8iSlkytIFONEHLdl67Fgxg==
6. Select Create Cluster.
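Once the cluster is running, you can optionally confirm that the Spark Config above took effect by listing a container in your storage account from a notebook attached to the cluster. This is only a sketch: the container name sparkcontainer is borrowed from a later exercise, so substitute any container that already exists in your account (the call fails if none exists yet).

# Optional check, run in a notebook cell attached to the lab cluster:
# list the blobs in one container of the storage account configured above.
storage_account = "<STORAGE_ACCOUNT_NAME>"   # the account name recorded in Task 1
container = "sparkcontainer"                 # placeholder -- use a container that exists

path = "wasbs://{}@{}.blob.core.windows.net/".format(container, storage_account)
display(dbutils.fs.ls(path))                 # dbutils and display are built into Databricks notebooks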
Exercise 2: Load Sample Data and Databricks
Notebooks
Duration: 60 minutes
In this exercise, you will implement a classification experiment. You will load the
training data from your local machine into a dataset. Then, you will explore the data
to identify the primary components you should use for prediction and use two
different algorithms for predicting the classification. You will then evaluate the
performance of both algorithms and choose the algorithm that performs best. The
model selected will be exposed as a web service that is integrated with the optional
sample web app at the end.
Task 1: Upload the Sample Datasets
1. Before you begin working with machine learning services, there are three
datasets you need to load.
2. Download the three CSV sample datasets from here: [Link] (If
you get an error, or the page won't open, try pasting the URL into a new
browser window and verify the case-sensitive URL is exactly as shown.) If you
are still having trouble, a zip file called [Link] is
included in the assignment resources folder on Blackboard.
3. Extract the ZIP and verify you have the following files:
o [Link]
o [Link]
o [Link]
4. Open your Azure Databricks workspace. Before continuing to the next step,
verify that your new cluster is running. Do this by navigating to Clusters on
the left-hand menu and ensuring that the state of your cluster is Running.
5. Select Catalog from the menu and choose your cluster from the dropdown.
Next, select default under Databases (if this does not appear, you may need
to start your cluster). Finally, select +Add -> Add data above the Tables
header.
6. Select the native integration DBFS from the listed Data source options.
7. Drag and drop the [Link] file into the file
area, then select Create Table with UI.
8. Select your lab cluster to preview the table, then select Preview Table.
9. Change the Table Name to flight_delays_with_airport_codes and select the
checkmark for First row is header. Select Create Table.
10. Repeat steps 5 through 9 for the [Link] and
[Link] files, setting the name for each dataset in a
similar fashion:
o flightweatherwithairportcode_csv renamed
to flight_weather_with_airport_code
o airportcodelocationlookupclean_csv renamed
to airport_code_location_lookup_clean
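As an optional sanity check after creating the three tables, you can run the following in a new notebook cell attached to your lab cluster. It is only a sketch; the table names assume you used the names given in the steps above.

# Confirm the three tables are registered and readable.
for table_name in ["flight_delays_with_airport_codes",
                   "flight_weather_with_airport_code",
                   "airport_code_location_lookup_clean"]:
    row_count = spark.table(table_name).count()
    print("{}: {} rows".format(table_name, row_count))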
Task 2: Install Azure ML library on the cluster
1. Select Compute on the left-hand menu, then select your lab cluster to open it.
2. Select the Libraries tab. If you do not see the Azure ML library already
installed on the cluster, continue to the next step. Otherwise, continue to Task
3.
3. Select Install New.
4. In the Install Library dialog, select PyPi for the Library Source, then enter the
following in the Package field: azureml-sdk[databricks]. Select Install.
5. Wait until the library's status shows as Installed before continuing.
6. Repeat steps 3 through 5 to install the mlflow library.
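Once both libraries show as Installed, you can optionally confirm they resolve on the cluster by running the following in a notebook cell (the printed versions will vary with the runtime):

# Verify the azureml-sdk and mlflow libraries installed on the cluster can be imported.
import azureml.core
import mlflow

print("azureml-sdk version:", azureml.core.VERSION)
print("mlflow version:", mlflow.__version__)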
Task 3: Open Azure Databricks and complete lab notebooks
1. Within Azure Databricks, select Workspace on the menu, then Users, then
select the ellipses next to your username. Select Import.
2. Within the Import Notebooks dialog, select Import from: URL, then paste the
following into the URL textbox. (Note: you can also directly upload the .dbc file
provided on Blackboard.)
[Link]
[Link]
files/[Link]
Hands-on%20lab/lab-files/[Link]
3. After importing, expand the new BigDataVis folder.
4. Within the BigDataVis folder there should be 3 subfolders, for Exercises 2, 5
and 6, containing a total of 5 notebooks. Open the Exercise 2 subfolder and
explore the notebooks, starting at 01 Data Preparation.
5. Run each cell of the notebooks located in the Exercise 2 folder (01, 02 and 03)
individually by selecting within the cell, then pressing Ctrl+Enter on your
keyboard. Pay close attention to the instructions within the notebooks so you
understand each step of the data preparation process. Make sure to connect
to your compute cluster using the dropdown menu pinned to the top of each
notebook.
6. During Exercise 2 – 03 Deploy as a Web Service, use the Machine Learning
group to explore and serve your model (an example REST call is sketched at the
end of this task):
NOTE: you may not currently be able to serve your model while your lab cluster is running due
to CPU quotas imposed on free or trial accounts. If this is the case, stop your cluster to
temporarily serve your model and execute the REST call in a local Python notebook to test its
functionality.
7. Do NOT run the Clean up part of Notebook 3 (i.e., this command: [Link]()).
You will need the URL of your deployed Machine Learning model later in
Exercise 8: Deploy intelligent web app (Optional Lab).
Note: you can get this URL by adding the line print(service.scoring_uri) to your
notebook, or by going to your Azure Machine Learning workspace via the Azure
portal and opening the "Deployments" blade.
8. Do NOT run any notebooks within the Exercise 5 or 6 folders. They will be
discussed later in the lab.
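For step 6 above, the following is a sketch of what the REST call from a local Python notebook might look like. The scoring URI is the value printed by print(service.scoring_uri); the payload shown is only a placeholder, so copy the exact column names, ordering, and sample request from the 03 Deploy as a Web Service notebook.

# Sketch of calling the deployed model's scoring endpoint from a local Python notebook.
import json
import requests

scoring_uri = "<YOUR_SCORING_URI>"           # from print(service.scoring_uri)

payload = json.dumps({
    "data": [
        # one record of flight and weather features, in the order the model expects
        [1, 1, 1, 10, "SEA", "DEN", 30.0, 0.0, 10.0, 25.0]
    ]
})

headers = {"Content-Type": "application/json"}
response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.status_code, response.text)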
Exercise 3: Setup Azure Data Factory
Duration: 20 minutes
In this exercise, you will create a baseline environment for Azure Data Factory
development to further operationalize data movement and processing. You
will create a Data Factory service, and then install the self-hosted Integration
Runtime (formerly the Data Management Gateway), the agent that facilitates data
movement from on-premises to Microsoft Azure.
Task 1: Download and stage data to be processed
1. Open a web browser.
2. Download the AdventureWorks sample data from [Link]
3. Extract it to a new folder called C:\Data.
Task 2: Install and configure Azure Data Factory Integration
Runtime on your machine
1. To download the latest version of Azure Data Factory Integration Runtime, go
to [Link]
2. Select Download, then choose the download you want from the next screen.
3. Once downloaded, run the installer.
4. When you see the following screen, select Next.
5. Check the box to accept the terms and select Next.
6. Accept the default Destination Folder and select Next.
7. Choose Install to complete the installation.
8. Select Finish once the installation has completed.
9. After selecting Finish, the following screen will appear. Keep it open for now.
You will come back to this screen once the Data Factory in Azure has been
provisioned to obtain the gateway key, so you can connect Data Factory to
this "on-premises" server.
Task 3: Configure Azure Data Factory
1. Launch a new browser window and navigate to the Azure portal
([Link]). Once prompted, log in with your Microsoft Azure
credentials. If prompted, choose whether your account is an organization
account or a Microsoft account; this will depend on which account was used
to provision the Azure subscription being used for this lab.
2. From the side menu in the Azure portal, choose Resource groups, then enter
your resource group name into the filter box, and select it from the list.
3. Next, select your Azure Data Factory service from the list.
4. On the Data Factory Overview screen, select Launch studio.
5. A new page will open in another tab or new window. Within the Azure Data
Factory site, select Manage on the menu.
6. Now, select Integration runtimes in the menu beneath Connections (1), then
select + New (2).
7. In the Integration Runtime Setup blade that appears, select Azure, Self-
Hosted, then select Continue.
8. Select Self-Hosted then select Continue.
9. Enter a Name, such as bigdatagateway-[initials], and select Create.
10. Under Option 2: Manual setup, copy the Key1 authentication key value by
selecting the Copy button, then select Close.
11. Don't close the current screen or browser session.
12. Paste the Key1 value into the box in the middle of the Microsoft Integration
Runtime Configuration Manager screen.
13. Select Register.
14. It can take up to a minute or two to register. If it takes more than a couple of
minutes, and the screen does not respond or returns an error message, close
the screen by selecting the Cancel button.
15. The next screen will be New Integration Runtime (Self-hosted) Node. Select
Finish.
16. You will then get a screen with a confirmation message. Select the Launch
Configuration Manager button to view the connection details.
17. You can now return to the Azure Data Factory page and view the Integration
Runtime you just configured. You may need to select Refresh to view the
Running status for the IR.
18. Select the Azure Data Factory Overview button on the menu. Leave this open
for the next exercise.
Exercise 4: Develop a data factory pipeline for data
movement
Duration: 20 minutes
In this exercise, you will create an Azure Data Factory pipeline to copy data (.csv
files) from an on-premises server (your machine) to Azure Blob Storage. The goal of
the exercise is to demonstrate data movement from an on-premises location to
Azure Storage (via the Integration Runtime).
Task 1: Create copy pipeline using the Copy Data Wizard
1. Within the Azure Data Factory overview page, select Ingest.
2. In the Copy Data properties, enter the following:
o Task Type: Built-in copy task
o Task cadence or task schedule: Select Schedule
o Start Date (UTC): Enter 03/01/2018 12:00 AM
o Recurrence: Every 1, and select Month(s)
o Under the Advanced recurrence options, make sure you have a value
of 0 in the textboxes for Hours (UTC) and Minutes (UTC), otherwise it
will fail later during Publishing.
o End: No End
3. Select Next.
4. On the Source data store screen, select + Create new connection.
5. Scroll through the options and select File System, then select Continue.
6. In the New Linked Service form, enter the following:
o Name: OnPremServer
o Connect via integration runtime: Select the Integration runtime
created previously in this exercise.
o Host: C:\Data
o Username: Use your machine's login username.
o Password: Use your machine's login password.
7. Select Test connection to verify you correctly entered the values. Finally,
select Create.
Debugging: Connection failed
If your connection fails, check the following:
1) Your account details (your username might be the email address associated with your
Microsoft account rather than the display name shown). You can confirm your username by
running the command: whoami.
2) Your firewall settings (non-standard settings may cause issues).
3) Your Integration Runtime configuration: by default, new installations disable access to local
storage. You can enable it by opening a new PowerShell window with Admin permissions,
navigating to [Link] in "C:\Program Files\Microsoft Integration Runtime\5.0\Shared\", and
issuing the command with the arguments "-elma" and "-DisableLocalFolderPathValidation".
See here for more details.
If you see the error message "Cannot connect to '\\Laptop123\C'. Detail Message: The system
could not find the environment option that was entered", see the solution at [Link].
8. On the Source data store page, on the Choose the input file or folder screen, select Browse,
then select the FlightsAndWeather folder. Next, select Load all files under File loading behavior,
check Recursively, then select Next.
9. On the File format settings page, select the following options:
o File format: Delimited Text
o Column delimiter: Comma (,)
o Row delimiter: Auto detect (\r, \n, or \r\n)
o First row as header: Checked
10. Select Next.
11. On the Destination data store screen, select + Create new connection.
12. Select Azure Blob Storage within the New Connection blade, then
select Continue.
13. On the New Connection (Azure Blob Storage) account screen, enter the
following, test your connection, and then select Create.
o Name: BlobStorageOutput
o Connect via integration runtime: Select your Integration Runtime.
o Authentication method: Select Account key
o Account selection method: From Azure subscription
o Storage account name: Select the blob storage account you
provisioned in the before-the-lab section.
14. On the Destination data store page, select Next.
15. From the Choose the output file or folder tab, enter the following:
o Folder path: sparkcontainer/FlightsAndWeather/{Year}/{Month}/
o Filename: [Link]
o Year: yyyy
o Month: MM
o Copy behavior: Merge files
o Select Next.
16. On the File format settings screen, select the Delimited Text file format,
and check the Add header to file checkbox, then select Next.
17. On the Settings screen, enter:
o Task name: CopyOnPrem2AzurePipeline
o Task description: (Optional) This pipeline copies time-sliced CSV files from
on-premises C:\\Data to Azure Blob Storage as a continuous job.
o Select Skip incompatible rows under Fault tolerance,
o Uncheck Enable logging. Expand Advanced settings
o Set Degree of copy parallelism to 10
o Then select Next.
18. Review settings on the Summary tab, but DO NOT choose Next.
19. Scroll down on the summary page until you see the Copy Settings section.
Select Edit next to Copy Settings.
20. Change the following Copy setting:
o Timeout: 7.[Link]
o Retry: 3
o Select Save.
21. After saving the Copy settings, select Next on the Summary tab.
22. On the Deployment screen you will see a message that the deployment is in
progress, and after a minute or two, that the deployment completed.
Select Edit Pipeline to close the wizard and navigate to the pipeline
editing blade.
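Once the pipeline has actually run (you will trigger it in Exercise 5), you can optionally verify the copied files from a local Python session using the azure-storage-blob package. This is a sketch under the assumption that the sink settings above were used unchanged; the account name and key are the values recorded in Exercise 1.

# Optional local check (pip install azure-storage-blob) that the pipeline wrote
# files under sparkcontainer/FlightsAndWeather/{Year}/{Month}/.
from azure.storage.blob import BlobServiceClient

account_name = "<STORAGE_ACCOUNT_NAME>"
account_key = "<ACCESS_KEY>"

service = BlobServiceClient(
    account_url="https://{}.blob.core.windows.net".format(account_name),
    credential=account_key,
)
container = service.get_container_client("sparkcontainer")

for blob in container.list_blobs(name_starts_with="FlightsAndWeather/"):
    print(blob.name)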
Exercise 5: Operationalize ML scoring with Azure
Databricks and Data Factory
Duration: 20 minutes
In this exercise, you will extend the Data Factory to operationalize the scoring of data
using the previously created machine learning model within an Azure Databricks
notebook.
Task 1: Create Azure Databricks Linked Service
1. Return to, or reopen, the Author & Monitor page for your Azure Data Factory
in a web browser, navigate to the Author view, and select the pipeline.
2. Once there, expand Databricks under Activities.
3. Drag the Notebook activity onto the design surface to the side of the Copy
activity.
4. Select the Notebook activity on the design surface to display tabs containing
its properties and settings at the bottom of the screen. On the General tab,
enter BatchScore into the Name field.
5. Select the Azure Databricks tab, and select + New next to the Databricks
Linked service drop down. Here, you will configure a new linked service which
will serve as the connection to your Databricks cluster.
6. On the New Linked Service dialog, enter the following:
o Name: AzureDatabricks
o Connect via integration runtime: Leave set to Default.
o Account selection method: From Azure subscription
o Azure subscription: Choose your Azure Subscription.
o Databricks workspace: Pick your Databricks workspace to populate
the Domain automatically.
o Select cluster: Existing interactive cluster
7. Leave the form open and open your Azure Databricks workspace in another
browser tab. You will generate and retrieve the Access token here.
8. In Azure Databricks, select the Account icon in the top corner of the window,
then select User Settings.
9. Select Generate New Token under the Access Tokens tab. Enter ADF
access for the comment and leave the lifetime at 90 days. Select Generate.
10. Copy the generated token and paste it into a text editor such as Notepad
for a later step.
11. Switch back to your Azure Data Factory screen and paste the generated token
into the Access token field within the form. After a moment, select your
cluster underneath Choose from existing clusters. Select Create.
12. Switch back to Azure Databricks. Select Workspace in the menu. Select
the Exercise 5 folder, then open notebook 01 Deploy for Batch Scoring.
Examine the content, but don't run any of the cells yet. In cmd 4, you need to
replace STORAGE-ACCOUNT-NAME with the name of the blob storage
account you copied in Exercise 1.
13. Switch back to your Azure Data Factory screen. Select the Settings tab, then
browse to your Exercise 5/01 Deploy for Batch Score notebook to populate
the Notebook path field.
14. The final step is to connect the Copy activity with the Notebook activity.
Select the small green box on the side of the copy activity and drag the arrow
onto the Notebook activity on the design surface. This means that the
copy activity must complete processing and generate its files in your storage
account before the Notebook activity runs, ensuring that the files required by
the BatchScore notebook are in place at the time of execution. After making
the connection, select Publish All, then Publish.
Task 2: Trigger workflow
1. Switch back to Azure Data Factory. Select your pipeline if it is not already
opened.
2. Select Trigger, then Trigger Now located above the pipeline design surface.
3. Enter 3/1/2017 into the windowStart parameter, then select OK.
4. Select Monitor in the menu. You will be able to see your pipeline activity in
progress as well as the status of past runs.
Note: You may need to restart your Azure Databricks cluster if it has
automatically terminated due to inactivity.
Exercise 6: Summarize data using Azure Databricks
Duration: 10 minutes
In this exercise, you will prepare a summary of flight delay data using Spark SQL.
Task 1: Summarize delays by airport
1. Open your Azure Databricks workspace, expand the Exercise 6 folder and
open the final notebook called 01 Explore Data.
2. Execute each cell and follow the instructions in the notebook, which explain
each step.
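For orientation before opening the notebook, the sketch below illustrates the kind of Spark SQL aggregation the 01 Explore Data notebook builds: counting flights and 15-minute-or-longer delays per origin airport. The table and column names (flight_delays_with_airport_codes, OriginAirportCode, DepDel15) are assumptions based on the data loaded earlier in the lab; the notebook itself contains the authoritative queries.

# Illustrative only -- run the notebook for the real summary used later in Power BI.
summary = spark.sql("""
    SELECT OriginAirportCode,
           COUNT(*) AS NumFlights,
           SUM(CAST(DepDel15 AS INT)) AS NumDelays
    FROM flight_delays_with_airport_codes
    GROUP BY OriginAirportCode
    ORDER BY NumDelays DESC
""")
display(summary)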
Exercise 7: Visualizing in Power BI Desktop
Duration: 20 minutes
In this exercise, you will create visualizations in Power BI Desktop.
Task 1: Obtain the JDBC connection string to your Azure
Databricks cluster
Before you begin, you must first obtain the JDBC connection string to your Azure
Databricks cluster.
1. In Azure Databricks, go to Clusters and select your cluster.
2. On the cluster edit page, scroll down to the bottom of the page,
expand Advanced Options, then select the JDBC/ODBC tab.
3. On the JDBC/ODBC tab, copy and save the first JDBC URL.
o Construct the JDBC server address that you will use when you set up
your Spark cluster connection in Power BI Desktop.
o Take the JDBC URL and do the following:
o Replace jdbc:spark with https.
o Remove everything in the path between the port number and sql,
retaining the components indicated by the boxes in the image below.
Also remove ;AuthMech=3;UID=token;PWD=<personal-access-token> from the
end of the string.
o In our example, the server address would be:
[Link]3592075405/0615-225254-need937 (see the sketch below).
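If you prefer to script the transformation instead of editing the string by hand, the following Python sketch derives the server address, assuming the JDBC URL follows the common shape with an httpPath parameter. The hostname in the example is an illustrative placeholder; the organization and cluster IDs mirror the example above.

# Sketch: convert a Databricks JDBC URL into the HTTPS server address Power BI expects.
def jdbc_to_powerbi_server(jdbc_url):
    host_part, _, params = jdbc_url.partition(";")            # split host from ;key=value options
    host = host_part.replace("jdbc:spark://", "https://")     # swap the scheme
    host = host.rsplit("/", 1)[0]                             # drop the trailing /default
    options = dict(p.split("=", 1) for p in params.split(";") if "=" in p)
    return host + "/" + options["httpPath"]                   # keep only host:port plus the httpPath

example = ("jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;"
           "transportMode=http;ssl=1;"
           "httpPath=sql/protocolv1/o/3592075405/0615-225254-need937;"
           "AuthMech=3;UID=token;PWD=<personal-access-token>")
print(jdbc_to_powerbi_server(example))
# -> https://adb-1234567890123456.7.azuredatabricks.net:443/sql/protocolv1/o/3592075405/0615-225254-need937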
Task 2: Connect to Azure Databricks using Power BI Desktop
1. If you did not already do so during the before-the-lab setup, download Power
BI Desktop from [Link]us/desktop/.
2. When Power BI Desktop starts, you will need to enter your personal
information, or Sign in if you already have an account.
3. Select Get data on the screen that is displayed next.
4. Select Spark from the list of available data sources. You may enter Spark into
the search field to find it faster.
5. Select Connect.
6. On the next screen, you will be prompted for your Spark cluster information.
7. Paste the JDBC connection string you constructed into the Server field.
8. Select the HTTP protocol.
9. Select DirectQuery for the Data Connectivity mode and select OK. This option
will offload query tasks to the Azure Databricks Spark cluster, providing near-
real time querying.
10. Enter your credentials on the next screen as follows:
o Username: token
o Password: Remember that ADF Access token we generated for the
Azure Data Factory notebook activity? Paste the same value here for
the password.
11. Select Connect.
12. In the Navigator dialog, check the box next to flight_delays_summary, and
select Load.
Task 3: Create Power BI report
1. Once the data finishes loading, you will see the fields appear on the far side of
the Power BI Desktop client window.
2. From the Visualizations area, next to Fields, select the Globe icon to add a
Map visualization to the report design surface.
3. With the Map visualization still selected, drag the OriginLatLong field to
the Location field under Visualizations. Next, drag the NumDelays field
to the Bubble Size field under Visualizations.
4. You should now see a map that looks similar to the following (resize and
zoom on your map if necessary):
5. Unselect the Map visualization by selecting the white space next to the map in
the report area.
6. From the Visualizations area, select the Stacked Column Chart icon to add a
bar chart visual to the report's design surface.
7. With the Stacked Column Chart still selected, drag the DayofMonth field and
drop it into the Axis field located under Visualizations.
8. Next, drag the NumDelays field over, and drop it into the Value field.
9. Grab the corner of the new Stacked Column Chart visual on the report design
surface and drag it out to make it as wide as the bottom of your report
design surface. It should look something like the following.
10. Unselect the Stacked Column Chart visual by selecting the white space next
to it on the design surface.
11. From the Visualizations area, select the Treemap icon to add this
visualization to the report.
12. With the Treemap visualization selected, drag the OriginAirportCode field
into the Group field under Visualizations.
13. Next, drag the NumDelays field over, and drop it into the Values field.
14. Grab the corner of the Treemap visual on the report design surface and
expand it to fill the area between the map and the side edge of the design
surface. The report should now look similar to the following.
15. You can cross-filter any of the visualizations on the report by selecting one of
the other visuals within the report, as shown below. (This may take a few
seconds to update, as the data is loaded.)
16. You can save the report by choosing Save from the File menu and entering a
name and location for the file.