ASSIGNMENT 1
Basic Data Description and Data Distributions
Basic Data Descriptors
1. The total number of orders in the “OrderList.xlsx” file is:
Ans- 3,000
2. The median of “Total Sale $” is larger than the mean. By how much? Round to 2 decimal
places.
Ans- 2.81
3. What is the standard deviation of Total Sale? Round to 2 decimal places.
Ans- 120.26
4. What percentage of orders fell within the interquartile range of Total Sale?
Ans- 50%
5. What is the approximate shape of the distribution of total sales? (Hint: Create a
histogram to see, or use what you know about the mean/median relationship and the
rule of thumb.)
The distribution is uniform.
The distribution is symmetric.
The distribution is skewed right.
The distribution is skewed left.
Ans- The distribution is skewed left.
6. Given the limited information you have, your boss wants you to group customers in a
meaningful way. You decided to take a look at how the order region impacts things.
Calculate the average total sales from the North region only. What is the difference
between the North region average total sales and the average total sales across all
regions (including the North)? Round to 2 decimal places.
Ans- 1.44
7. What is the absolute value of the difference between the North region median total sales
and all orders median total sales (across all regions including the North)?
Round your answer to two decimal places.
Ans- 0.58
8. Next, take a look at customer age. Create 3 age groups: 21-30, 31-40, 41-50. What is
the average total sales for the age group with the highest average? Round to 2 decimal
places.
Ans- 494.62 in the Age Group of 21-30
9. What is the median total sales of the age group with the highest average Total Sales?
Round to 2 decimal places.
Ans- 479.50
10. Given the mean and median of the group with the highest average sales, what can you
say about the distribution of total sales within that group?
The distribution is uniform.
The distribution is normal.
The distribution is skewed right.
The distribution is skewed left.
11. Based on this data, what would you recommend to your boss?
We should separate customers by region and target the North region as our main
customer segment. That segment has historically brought in higher average total sales.
We should separate customers by age group and target 21-30 as our main
customer segment. That segment has historically brought in higher average total
sales.
We should separate customers by age group and target 31-40 as our main customer
segment. That segment has historically brought in higher average total sales.
We should separate customers by age group and target 41-50 as our main customer
segment. That segment has historically brought in higher average total sales.
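The calculations behind questions 2 to 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows below are made-up stand-ins for "OrderList.xlsx" (the real file is not reproduced here), and the names orders, age_group, and skew_direction are hypothetical helpers, not part of the assignment. The sketch shows the mean/median rule of thumb for skew, the share of orders inside the interquartile range, and averages by region and by age group.

```python
import statistics
from collections import defaultdict

# Hypothetical rows standing in for OrderList.xlsx: (region, age, total_sale).
# These values are invented for illustration and are NOT the assignment data.
orders = [
    ("North", 24, 510.0),
    ("North", 37, 450.0),
    ("South", 29, 495.0),
    ("South", 45, 300.0),
    ("East", 33, 470.0),
    ("East", 48, 420.0),
    ("West", 22, 480.0),
    ("West", 41, 460.0),
]

sales = [sale for _, _, sale in orders]

mean_all = statistics.mean(sales)
median_all = statistics.median(sales)
stdev_all = statistics.stdev(sales)  # sample standard deviation (divides by n - 1)

# Quartiles with linear interpolation that includes the endpoints.
q1, _, q3 = statistics.quantiles(sales, n=4, method="inclusive")
share_in_iqr = sum(q1 <= s <= q3 for s in sales) / len(sales)


def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean < median:
        return "skewed left"
    if mean > median:
        return "skewed right"
    return "roughly symmetric"


def age_group(age):
    """Map an age to one of the three bins used in question 8."""
    if 21 <= age <= 30:
        return "21-30"
    if 31 <= age <= 40:
        return "31-40"
    if 41 <= age <= 50:
        return "41-50"
    return "other"


by_region = defaultdict(list)
by_age = defaultdict(list)
for region, age, sale in orders:
    by_region[region].append(sale)
    by_age[age_group(age)].append(sale)

# Difference between the North average and the overall average.
north_gap = abs(statistics.mean(by_region["North"]) - mean_all)

# Age group with the highest average total sale, and its median.
best_group = max(by_age, key=lambda g: statistics.mean(by_age[g]))
best_group_median = statistics.median(by_age[best_group])

print(skew_direction(mean_all, median_all), best_group, round(north_gap, 2))
```

Note that statistics.stdev is the sample standard deviation; statistics.pstdev gives the population version, and the two differ slightly on real data, so it is worth checking which one the spreadsheet function in use corresponds to.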
ASSIGNMENT 2
Measures of Association, Probability, and Data Distributions
Descriptive Measures of Association, Probability, and Statistical Distributions
1. How many rows of data are included in the dataset given?
Ans- 6,000
2. What is the covariance of Datasets A and B? Round to 2 decimal places.
Ans- 81379.81
3. Which dataset pair has the highest covariance?
A&B
B&C
A&C
Cannot be determined with the information given
4. Which dataset pair has the strongest relationship?
A&B
B&C
A&C
Cannot be determined with the information given
5. Given that Dataset A outcomes always occur before Dataset B outcomes (and no other
information), can you conclude that A causes B?
Yes, because all requirements for causation are met.
Yes, because covariance and correlation are both positive.
No, the variables are not correlated.
No, there is no control for external variables.
6. Create a histogram of Dataset A. Based on the shape of the distribution of outcomes,
which of the following is most likely true?
Higher values are much more likely to occur than lower values.
Negative values are much more likely to occur than positive values.
All values in the range have a relatively equivalent chance of occurring, with a slightly
lower probability on the high end.
No information can be used from this dataset.
7. Create a histogram of Dataset B. Based on the shape of the distribution of outcomes,
select the range below that appears to have the highest probability of occurrence.
-729 to -350 (Upper Limit then Lower Limit)
-250 to 200
400 to 800
1100 to 1500
8. Consider 4 sets of data:
Set W: set of all real numbers over the range 1 to 100.
Set X: set of all integers over the range 1 to 100.
Set Y: set of all real numbers over the range 1 to 3.
Set Z: set of all whole numbers over the range 1 to 10,000.
Which set has the LEAST numbers?
Set W.
Set X.
Set Y.
Set Z.
Cannot be determined from the information given.
9. Assume that Datasets Y and Z have a covariance of -500. Which of the following do you
know to be true? Select all that apply.
Dataset Y and Z have a strong relationship.
Dataset Y and Z have a negative relationship.
The result may be affected by the units of measurement.
Dataset Y and Z have a causal relationship.
10. Select all the examples of discrete data below:
Bees in a beehive
Honey in a beehive
Fish in the sea
Voltage level of a battery
Your dog’s weight
Time you wake up in the morning
Languages spoken
Voters for a particular candidate in an election
Cooking oil used in recipe
Animals on a farm
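Questions 2 to 4 and 9 of this assignment turn on the difference between covariance and correlation. A minimal sketch follows, assuming two short made-up lists (y and z are hypothetical, not the assignment's datasets) and hand-written covariance and correlation helpers. It shows that the sign of the covariance gives the direction of the relationship, that its size changes when the units change, and that correlation is unaffected by such rescaling.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical values, invented for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [10.0, 8.0, 9.0, 4.0, 5.0]

cov_yz = covariance(y, z)
corr_yz = correlation(y, z)

# Express z in a different unit (for example, cents instead of dollars).
z_scaled = [v * 100 for v in z]
cov_scaled = covariance(y, z_scaled)
corr_scaled = correlation(y, z_scaled)

print("covariance:", cov_yz, "after rescaling:", cov_scaled)
print("correlation:", round(corr_yz, 4), "after rescaling:", round(corr_scaled, 4))
```

Because covariance grows or shrinks with the measurement scale, a value such as -500 says the relationship is negative but says nothing by itself about how strong it is; comparing strength across pairs needs the correlation.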