Hands-On Practice for Assignments – Part 1
Hands-On Practice for Assignments – Part 1
page 1
Hands-On Practice for Assignments – Part 1
Log in to SASViya for Learners 4 by following steps below:
1. Go to https://www.sas.com/en_us/software/viya-for-learners.html
2. Click on Access for Students and, if asked, accept the T&C agreement.
3. Click on Launch SAS Viya for Learners 4.
First screen you will see is the SAS Drive – Share and Collaborate – a collaborative interface for organizing and
sharing content
From the applications menu on top left corner select Explore and Visualize Data to open SAS Visual
Analytics Application.
Click on New Report– and click Add data to select the INSIGHTTOY_SALES data source.
page 2
Hands-On Practice for Assignments – Part 1
To find more information about the dataset, from the applications menu on top left corner select Manage
Data, select your dataset – Insight Toy_Sales
page 3
Hands-On Practice for Assignments – Part 1
To find even more information on this dataset, click Actions and select Discover information assets. You can
also access this section by clicking Follow the prompts to analyse data.
page 4
Hands-On Practice for Assignments – Part 1
Click Column Analysis to find more information on your measures. Metadata Measures and Data Quality
Measures Tabs give you more information on completeness and uniqueness of data items.
Hands-on Exercise 1-1 (Exploration):
1) How many observations are there in the INSIGHTTOY_SALES data set?
2) How many data items (or variables, or columns) are there?
3) What do Sample Data and Column Analysis tabs tell about the data?
page 5
Hands-On Practice for Assignments – Part 1
After exploring data set, go back to Explore and Visualise by clicking or click Actions and select Explore and
Visualize from the dropdown list to conduct the steps below.
page 6
Hands-On Practice for Assignments – Part 1
Unique Row Identifier, Hierarchies and Geographies
Set up Unique Row Identifier
Right click on Order data item and Set Order as Unique Row Identifier
page 7
Hands-On Practice for Assignments – Part 1
Create Product Hierarchy
Click New data item to create Hierarchy
page 8
Hands-On Practice for Assignments – Part 1
Create Geographies
Select Facility Country/Region and select Geography as a Classification
Facility Country/Region (using Country or Region ISO 2-Letter Codes). Make sure you see the
‘100% mapped’ above the map.
Facility (custom coordinates, using: xyFacility Lat and xyFacility Lon), Facility City (custom geography,
using: xyFacility City Lat and xyFacility City Lon), and repeat the same steps for the following data items:
Facility Continents, Facility State/Province, and Manufacturing Facility.
page 9
Hands-On Practice for Assignments – Part 1
Hide the xy Lat and Lon Data items
Create a hierarchy of the geographies
Click New data item to create Hierarchy
Facility Geo Hierarchy: Facility Continents, Facility Country/Region, Facility State/Province, Facility City,
and Facility.
page 10
Hands-On Practice for Assignments – Part 1
Create a Location Hierarchy
Click New data item to create Hierarchy
Country Hierarchy: Facility Country/Region, Facility State/Province and Facility City.
Create Order Margin as a Calculated Item
Click New data item to create Calculated Item
1.
2.
- Select Operators – Subtraction
- Select Data
page 11
Hands-On Practice for Assignments – Part 1
Create Gross Profit Margin as an Aggregated Measure
Click New data item to create Calculated Item as Aggregated Measure
Select Operators – Division
Select Functions – Aggregated (simple) - Sum
Select Data
Create Satisfaction as Custom Category
Click New data item to create Custom Category
Note: Enter 0 and 0.55 in Intervals
page 12
Hands-On Practice for Assignments – Part 1
Exploration
SAS Visual Analytics Explorer and Visualise enables you to explore your data sources (in-memory tables
from the SAS CAS Analytic Server) by using interactive visualisations such as charts, histograms, and
tables. You can also apply data analysis such as forecasting, correlation, and fit lines. You can also
perform statistical modeling tasks in the explorer.
Verify the data set details
Now you are ready to start your exploration
At the left side menu the data items on the data set are displayed, the numbers indicate the number of
different occurrences, and the properties of the data items are represented by icons, as showed
below.
Aggregated Measure or Time Period Calculation
Calculated (Category, Date, or Measure)
Category
Category with a user-defined format
Date and Time
Geography
or Hierarchy
Measure
Unique identifier
You can manage the properties for individual data items by using the Data pane. The changes that you
make are saved as part of the exploration, but do not affect the original data source.
page 13
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 1-2:
From the applications menu on top left corner select Discover Information Assets
Select Column Analysis and verify the distributions and main statistics of each measure variable
page 14
Hands-On Practice for Assignments – Part 1
Analyse the Sites
We can see that Insight Toys manufacture 779 different products (Product SKU) at 130 facilities
(Facilities).
Where do our revenues and profits come from?
To display data on a map, the locations to be displayed must be provided with longitude and latitude
information.
Let’s create a new visualization with Country Hierarchy, Order Total and Gross Profit Margin.
Go to Objects – Geo Maps - Geo Coordinate
The size of the dots represents the Revenues (sum of Order Total) at the respective countries, while the
color (gradient) shows the Gross Profit Margin.
It quickly becomes clear that the largest proportion of sales comes from the United States (above 537
million). In fact this is so high, that makes it difficult to visualise sales in the other countries. We can
filter out the United States to make it easier to compare the rest of the world. For this purpose we
select the section Filter from the right pane, select New Filter and select the Facility
Country/Region, unselect US
Note: Under the Facility Country/Region filter, a bar is displayed for each data item value showing
number of rows for corresponding value. As you can see US has the highest percentage of data.
page 15
Hands-On Practice for Assignments – Part 1
Now let's look at the United States in detail. Select the United States again (or delete the filter) and drill
down one level (State/Province) on the United States (double click on the bubble of the United States).
page 16
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 2:
1) Which city in California has the highest Order Total?
2) Which Brazilian state has the highest Gross Profit Margin?
3) What is the Order Total and Gross Profit Margin in the city of Beijing?
4) What is the Order Total and Gross Profit Margin in the province of Gauteng (South Africa)?
5) Which countries have the lowest Gross Profit Margin in 2013?
6) Which are the top 10 cities with highest Order Total in 2013? Tip: You may need to create a different
visualization to check this, e.g.: a Crosstab.
Geo Region
Note 1: Region maps (also known as a choropleth map) requires a geography variable with the role type
of geography and it does not support custom geography data items. Our Facility Country/Region
geography variable was created using Country or Region Names (on the dataset it was provided as the
standard ISO country names).
page 17
Hands-On Practice for Assignments – Part 1
Analyse the Products
Now let’s verify the sales and profit margins generated by the products on our portfolio.
Select Treemap
Then, assign the data items: Tile - Product Hierarchy, Size - Order Total and Color Order Margin.
You can change the Color Gradient in Options
We see the sum of revenues and margins for 2 Product Brands: Toy and Novelty. Now drilldown on
Toy to see these values on the Product Lines of the Toy brand. Drilldown again on Game, and you see its
Product Make. You can observe big differences here.
page 18
Hands-On Practice for Assignments – Part 1
A box plot can help visualise the distribution of our revenues and profits.
Make a copy (Duplicate) of the current visualization TreeMap and then change the visualization to a
Box plot.
Next, drill down again on Toy -> Games by double clicking on x axis titles.
Notice: using the Options menu you can select to Show Outliers and Averages.
Mousing over the outliers of the box plots shows more detailed distribution of outlier values.
page 19
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 3:
1) Which Super Hero Figure toy generated the highest Order Total and Order Margin? And the lowest?
2) What are Order Total and Order Margin of small cats plush?
3) Apparently our Novelty Thrift Clip(s) are not generating much revenues. Are there some styles
generating higher revenues?
4) We are doing very well with our third generation of Board Games. Which one generated the highest
Order Total and Order Margin?
5) Which are the top 5 products (Product SKU) in terms of revenue (Order Total) in 2013? And the
bottom 5 products? Tip: Create a Bar chart.
page 20
Hands-On Practice for Assignments – Part 1
Analyse the Sales Evolution
Now let’s verify how sales have developed in recent years. Create a new page and drag and drop a Line
Chart from the Objects tab on the left. Select Transaction Date Year (you can create it as a new
calculated, category item or change a format of the Transaction Date data item) as Category, and
Order Total & Order Product Cost as Measures.
We can see how these two financial metrics evolved in the last few decades. But, what if we want to
see them in a monthly basis?
To do this we need to use a Time Series Plot object; we also need to Duplicate the Transaction Date
data item and Format it as MMMYYYY. We call this new item Transaction Date Month Year.
page 21
Hands-On Practice for Assignments – Part 1
We then create our Time Series Plot with Transaction Date Month Year and same measures.
We can now see a strong seasonality. To be easier to visualise it, only a certain portion of the time
series is displayed. In our specific case, the period from Jan 2000 to Jan 2014.
With the scroll bar you can zoom in or out the displayed time series. Move the left or right scrollbars
back and forth to see the periods you are interested in. In order to move both controllers
simultaneously, you can rotate the mouse wheel. Try it out!
page 22
Hands-On Practice for Assignments – Part 1
We also want to know the forecast for the next months, so change the timeseries plot to a Forecasting
chart, by default it generates the forecast for the next 6 months.
As you can see it added a line with predicted values to your visualization and a colored band that
represents the confidence interval, in this case 95% confidence interval.
Notice: The explorer automatically tests multiple forecasting models for your data, and then selects the
best model. To see which forecasting model was used, click on to expand the time series plot and
explore ‘Dependent Variable Results’ tab.
page 23
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 4:
1) What is the forecast of Order Total for the next 24 months? Tip: Remove Order Product Cost and
change the Forecast horizon on Options -> Forecast.
2) Which method was used to generate this forecast?
3) According to this forecast, what would be the Order Total in December 2013? And December
2014?
We can try to improve accuracy of the forecast by adding other measures as Underlying factors. The
forecasting model evaluates the additional measures to determine whether they contribute to the
accuracy of the forecast.
Now, include Order Marketing Cost, Sales Rep Rating and Vendor Satisfaction in our forecasting as
Underlying Factors. It is for a forecast horizon of 24 months.
Note: Check the Aggregation for Vendor Satisfaction and Sales Rep Rating
page 24
Hands-On Practice for Assignments – Part 1
Order Marketing Cost and Vendor Satisfaction contributed to the forecast while Vendor Satisfaction
didn’t.
Now that our forecast includes underlying factors, we can apply scenario analysis and goal seeking to this
forecast. Click on What If … to select Scenario Analysis to forecast hypothetical scenarios by specifying
future values for underlying factors that contribute to the forecast.
Suppose we would increase Marketing investments by 20%, what would be our sales forecast?
New window Scenario Analysis pops up, click Adjust and select Factor: Order Marketing Cost, Adjustment:
Add percentage to all, Value: 0.20%, then click OK
page 25
Hands-On Practice for Assignments – Part 1
Then click Apply
We can observe that the sales forecast would be higher.
Let’s reset it for now. Click Reset All
Now, suppose we had a target for Order Total of $ 8 million in December 2014 (make sure you adjust
the forecasting to 24 month), how much we should invest in Marketing in order to achieve this goal?
Select Goal Seeking and enter our target: 8000000 for December 2014.
page 26
Hands-On Practice for Assignments – Part 1
Click Apply. Go back to the table view.
You can see the underlying factor requirements, in order to achieve our $8 million goal in December
2014.
Please reset all before moving on.
Hands-on Exercise 5:
1) We want to increase our forecast of Order Total by $ 1 million every month for the next 24 months.
How much we should invest in Marketing in order to achieve this goal?
page 27
Hands-On Practice for Assignments – Part 1
Analyse Customer Satisfaction
Now let’s verify how our Customer Satisfaction has developed in recent years. Create a new
Visualization, select Transaction Month Year and Vendor Satisfaction and drag them to the visualization
area.
We can see that the satisfaction is not great (around 57%), we can also forecast it for the next 12
months, and it wouldn’t be better.
page 28
Hands-On Practice for Assignments – Part 1
Our marketing department is convinced that Vendor Satisfaction is key to improving our sales, but it is
necessary to understand what impacts customer satisfaction.
Let’s verify how Vendor Satisfaction is correlated with the other measures. Create a new visualization,
select all measures and drag them into the visualization. It automatically decided to create a Correlation
Matrix.
page 29
Hands-On Practice for Assignments – Part 1
We can focus our attention to the second last row (Vendor Satisfaction), we can see that its strongest
relationship is with Sales Rep Rating. Double click on that cell and a new heat map visualization will
display the two measures from our selected cell. We can see it is a strong positive linear relationship
(Correlation: 0.973).
Hands-on Exercise 6:
1) Which are the 3 measures with the strongest correlations with Vendor Satisfaction? What are their
correlations?
page 30
Hands-On Practice for Assignments – Part 1
On a different approach, we can use a predictive model, like a Decision Tree, to better understand what
impacts customer satisfaction, but also to verify the characteristics of our most satisfied customers.
Previously, a Custom Category named Satisfaction was created, by categorizing as “Good” the cases
where Vendor Satisfaction is between 55% and 100%. We can see it using two visualizations, one for
Satisfaction and another one for Vendor Satisfaction. Then using the Automatic Actions and selecting
the “Good” or “Bad” on Satisfaction object we can see on the Distribution of Vendor Satisfaction object
where they are.
page 31
Hands-On Practice for Assignments – Part 1
A decision tree creates a segmentation of the data based on a series of rules. Each rule assigns an
observation to a node based on the value of one predictor. Rules are applied sequentially, which results in
a hierarchy called a tree. The initial node contains the entire data set and is called the root node. A node
and all of its successors form a branch. The final nodes are called leaves. For each leaf, a decision is made
about the response variable.
A decision tree is a supervised Predictive Model, in the sense that it requires one Target variable and at
least one predictor. The Target can be a category variable (Classification Tree) or a measure variable
(Regression Tree). A predictor can be a category or measure variable. A decision tree will be automatically
built, but you can also manually train it and prune nodes in the interactive mode.
Create a new visualization and select the Decision Tree:
Drag Satisfaction to the visualization or to Response in the roles tab. Now we need to add variables that
may influence it, let’s include the ones with highest correlations, but also some other variables (not only
measures) that we suspect would influence it, accordingly to our domain expertise. Select: Market
Penetration, Order Sales Cost, Sales Rep Rating, Vendor Distance, Vendor Rating, Vendor Loyalty Program
and Facility Continents and drag them to Predictors.
We can see the variables importance selected by the Decision Tree, interestingly Vendor Loyalty
Program and Market penetration are least important factors.
page 32
Hands-On Practice for Assignments – Part 1
As we zoom in our Decision Tree, we can see the first branch used Sales Rep Rating and split it at 51%.
Those greater or equal to 51% present “Good” satisfaction rates in 97.65% of the cases. Those with less
than 51% present “Bad” rates in 87.19% of the cases. Growing the Tree, by further subdividing the
cases, we can obtain a more complete and accurate model. By looking at the leaves (mouse over) we
can quickly check its rules and statistics.
In Options set Plot Type to Leaf statistics and change Statistics to Show to Percent.
With the Diagnostics Plots we can see Leaf Statistics and Assessment windows.
The Leaf Statistics window plots the frequency or percentage of each observation in each leaf node.
The Assessment window plots lift, ROC, and misclassification rates, we can use those to determine how
well the model fits the data. Lift is the ratio of the percent of captured responses within each percentile
bin to the average percent of responses. A receiver operating characteristic (ROC) chart displays the
ability of a model to avoid false positive and false negative classifications. The misclassification plot
displays how many observations were correctly and incorrectly classified for each value of the response
variable.
page 33
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 7:
1) How many leaves do we have?
2) View the Leaf Statistics. Which is the purest leaf? What is rule associated with that leave?
3) View the ROC chart. What is the KS Statistic?
4) View the Misclassification Chart. How many false negatives are there?
In a different attempt to understand Customer Satisfaction, Insight Toy has been collecting an Order
Note for each order. When the customers express some feedback or comment, the Sales Representative
enter them on those notes, in a free text format. The Marketing department intended to categorize
those notes and then search for potential problems they should look into, but also to verify the
customer sentiment. Unfortunately, due to the high volume of notes, the Marketing department has
been unable to analyse them, until now.
SAS Visual Analytics also provides the Word Cloud, to display a set of words from a character data item,
or a document collection. Depending on the type of word cloud and your data roles, the size of each
word in the cloud can indicate the importance (topic term weight) of the word, the frequency of the
word, or the value of a measure.
Text Topics object use text analytics to analyze each value in a document collection as a text document
that can contain multiple words. Words that often appear together in the document collection are
identified as topics. For a selected topic, the word cloud displays the terms with the greatest topic term
weight values (importance of the term within the topic). A word cloud can also display whether the
documents in a topic express positive, negative, or neutral sentiment.
Let’s look at the Order Notes of 2013, in a new visualization select Text Topics .
Assign data items
page 34
Hands-On Practice for Assignments – Part 1
Drag Transaction Date Year into Filter and select 2013 (only).
We can see in the initial results that the Text Analytics categorisation algorithm identified 6 topics by
grouping important terms in this document collection.
Clicking on each topic shows document Count, it will also update the word cloud/terms. Topic Term
Weights are also shown. We can read the Order Notes with this term on the tab Documents.
page 35
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 8:
1) Verify the term dissatisfy. What is its Topic Term Weight on the first topic? View some Documents
where it is present.
2) Is this term dissatisfy also presented on other topics? If so, what is its Topic Term Weight
Based on our primary interest on Satisfaction, so let’s incorporate Sentiment Analysis.
In Options -> Text Topics -> General, Text Analytics Setting Check Analyze document sentiment. Set the
Maximum topics to 5, with these settings it would identify fewer topics. Alternatively, you can select
Analyse sentiment from the drop-down list located at the top of the page
page 36
Hands-On Practice for Assignments – Part 1
page 37
Hands-On Practice for Assignments – Part 1
Hands-on Exercise 9:
1) How many topics were found?
2) Which topic present more documents with negative sentiments? Which are the top 5 terms
(regarding Topic Term Weight) on this topic?
3) Select the term dissatisfy and inspect the documents with higher negative sentiments.
We can select some cases for further investigation. From the last exercise, with the term dissatisfy
selected, choose some documents, and Create list table from the Selected Documents.
On this new visualisation we can add other information, like: Vendor, Facility Country, Facility
State/Province
page 38
Additional Hands-On Information for Assignment 1
We can also present the frequency of this problematic Order Notes, involving the term dissatisfy. Just
duplicate this Visualization, transform it into a Crosstab, remove Order Notes and Click Frequency to
order it.
page 39