
Assignment Questions

The document contains two assignments related to analyzing data distributions and descriptive statistics: 1) The first assignment involves calculating basic data descriptors like count, mean, median, standard deviation for total sales and analyzing the distribution shape and how measures differ by region and age group. 2) The second assignment involves calculating covariance and correlation between datasets, analyzing histogram shapes to understand variable distributions, identifying discrete vs. continuous data, and understanding how covariance relates to relationship strength.

ASSIGNMENT 1

Basic Data Description and Data Distributions

Basic Data Descriptors

1. The total number of orders in the “OrderList.xlsx” file is:


Ans- 3,000

2. The median of “Total Sale $” is larger than mean. By how much? Round to 2 decimal
places.
Ans- 2.81

3. What is the standard deviation of Total Sale? Round to 2 decimal places.


Ans- 120.26

4. What percentage of orders fell within the interquartile range of Total Sale?
Ans- 50%
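
The descriptors asked for in questions 1 to 4 can be sketched in a few lines of Python. This is only an illustration: the `total_sales` values are made up (the real figures live in "OrderList.xlsx"), and the assignment does not say whether the sample or the population standard deviation is expected, so the sample version is an assumption here.

```python
import statistics

# Hypothetical order totals for illustration; not taken from OrderList.xlsx.
total_sales = [120.0, 250.5, 310.0, 415.25, 480.0, 505.75, 530.0, 610.0]

count = len(total_sales)
mean = statistics.mean(total_sales)
median = statistics.median(total_sales)

# Sample standard deviation (n - 1 denominator). Use statistics.pstdev
# instead if the population standard deviation is wanted.
std_dev = statistics.stdev(total_sales)

# Quartiles. method="inclusive" interpolates between the smallest and
# largest observations, so Q1 and Q3 always fall inside the data range.
q1, _, q3 = statistics.quantiles(total_sales, n=4, method="inclusive")

# Share of orders between Q1 and Q3. By definition this is about half.
inside_iqr = [x for x in total_sales if q1 <= x <= q3]
share_inside = len(inside_iqr) / count

print(count, round(mean, 2), round(median, 2), round(std_dev, 2), share_inside)
```

Because the quartiles cut the sorted data into four equal parts, the middle two parts always hold roughly 50% of the observations, whatever the shape of the distribution.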

5. What is the approximate shape of the distribution of total sales? (Hint: Create a
histogram to see, or use what you know about the mean/median relationship and the
rule of thumb.)

• The distribution is uniform.
• The distribution is symmetric.
• The distribution is skewed right.
• The distribution is skewed left.

Ans- The distribution is skewed left.
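
The mean/median rule of thumb from the hint can be written as a small helper. This is a sketch of the heuristic only, not a formal skewness test; the function name `skew_direction` and the sample lists are hypothetical.

```python
import statistics

def skew_direction(values):
    """Rule of thumb: a long tail pulls the mean toward it, past the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "roughly symmetric"

# Hypothetical data: one very low value drags the mean below the median.
low_tail = [1, 8, 9, 9, 10, 10, 10]
# Hypothetical data: one very high value drags the mean above the median.
high_tail = [1, 1, 1, 2, 2, 3, 10]

print(skew_direction(low_tail))   # median exceeds mean
print(skew_direction(high_tail))  # mean exceeds median
```

When the median is larger than the mean, as in question 2, the same comparison points to a left-skewed distribution. A histogram remains the more reliable check.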
6. Given the limited information you have, your boss wants you to group customers in a
meaningful way. You decided to take a look at how the order region impacts things.
Calculate the average total sales from the North region only. What is the difference
between the North region average total sales and the average total sales across all
regions (including the North)? Round to 2 decimal places.
Ans- 1.44

7. What is the absolute value of difference between the North region median total sales
and all orders median total sales (across all regions including the North)?
Round your answer to two decimal places.
Ans- 0.58
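
The regional comparison in questions 6 and 7 amounts to filtering one group and comparing its mean and median with the overall figures. A minimal sketch, assuming each order is a (region, total sale) pair; the rows below are invented for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical (region, total sale) rows; not taken from OrderList.xlsx.
orders = [
    ("North", 100.0), ("North", 140.0), ("North", 120.0),
    ("South", 90.0), ("East", 130.0), ("West", 110.0),
]

# Collect the sales of each region into its own list.
sales_by_region = defaultdict(list)
for region, sale in orders:
    sales_by_region[region].append(sale)

all_sales = [sale for _, sale in orders]  # every region, North included
north = sales_by_region["North"]

mean_gap = round(statistics.mean(north) - statistics.mean(all_sales), 2)
median_gap = round(abs(statistics.median(north) - statistics.median(all_sales)), 2)

print(mean_gap, median_gap)
```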
8. Next, take a look at customer age. Create 3 age groups: 21-30, 31-40, 41-50. What is
the average total sales for the age group with the highest average? Round to 2 decimal
places.
Ans- 494.62 in the Age Group of 21-30

9. What is the median total sales of the age group with the highest average Total Sales?
Round to 2 decimal places.
Ans- 479.50
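
Questions 8 and 9 follow the same pattern with age bins instead of regions. The sketch below assumes each customer is an (age, total sale) pair and uses made-up values; `age_group` is a hypothetical helper.

```python
import statistics

def age_group(age):
    """Map an age to one of the three groups named in the question."""
    if 21 <= age <= 30:
        return "21-30"
    if 31 <= age <= 40:
        return "31-40"
    if 41 <= age <= 50:
        return "41-50"
    return None  # outside the ranges the assignment defines

# Hypothetical (age, total sale) rows for illustration only.
customers = [
    (22, 50.0), (29, 46.0), (25, 54.0), (21, 48.0),
    (35, 30.0), (38, 34.0), (44, 41.0), (50, 39.0),
]

groups = {}
for age, sale in customers:
    label = age_group(age)
    if label is not None:
        groups.setdefault(label, []).append(sale)

# Mean per group, then the group whose mean is highest.
group_means = {label: statistics.mean(sales) for label, sales in groups.items()}
top_group = max(group_means, key=group_means.get)
top_mean = round(group_means[top_group], 2)
top_median = round(statistics.median(groups[top_group]), 2)

print(top_group, top_mean, top_median)
```

Comparing `top_mean` with `top_median` then gives the same mean/median reading of skew that question 10 asks about.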

10. Given the mean and median of the group with the highest average sales, what can you
say about the distribution of total sales within that group?
• The distribution is uniform.
• The distribution is normal.
• The distribution is skewed right.
• The distribution is skewed left.

11. Based on this data, what would you recommend to your boss?

• We should separate customers by region and target the North region as our main
customer segment. That segment has historically brought in higher average total sales.
• We should separate customers by age group and target 21-30 as our main
customer segment. That segment has historically brought in higher average total
sales.
• We should separate customers by age group and target 31-40 as our main customer
segment. That segment has historically brought in higher average total sales.
• We should separate customers by age group and target 41-50 as our main customer
segment. That segment has historically brought in higher average total sales.

ASSIGNMENT 2

Measures of Association, Probability, and Data Distributions

Descriptive Measures of Association, Probability, and Statistical Distribution

1. How many rows of data are included in the dataset given?
Ans- 6,000

2. What is the covariance of Datasets A and B? Round to 2 decimal places.


Ans- 81379.81

3. Which dataset pair has the highest covariance?


• A&B
• B&C
• A&C
• Cannot be determined with the information given

4. Which dataset pair has the strongest relationship?


• A&B
• B&C
• A&C
• Cannot be determined with the information given
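
Questions 3 and 4 ask about two different things: covariance depends on the scale of the data, while correlation rescales it to the range -1 to 1 so that strength can be compared across pairs. The sketch below uses sample formulas (n - 1 denominator, an assumption) and three invented datasets to show that the pair with the highest covariance need not be the pair with the strongest relationship.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    var_x = sample_covariance(xs, xs)
    var_y = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / math.sqrt(var_x * var_y)

# Hypothetical datasets; not the A, B, C columns from the assignment file.
a = [1, 2, 3, 4, 5]
b = [10, 40, 20, 50, 30]       # large values, loosely related to a
c = [1.1, 2.0, 3.2, 3.9, 5.0]  # small values, tightly related to a

cov_ab, cov_ac = sample_covariance(a, b), sample_covariance(a, c)
corr_ab, corr_ac = pearson_correlation(a, b), pearson_correlation(a, c)

print(round(cov_ab, 2), round(cov_ac, 2))
print(round(corr_ab, 2), round(corr_ac, 2))
```

Here the pair (a, b) has the larger covariance only because b is measured on a larger scale; the pair (a, c) has the correlation closer to 1.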

5. Given that dataset A outcomes always occur before dataset B outcomes (and no other
information), can you conclude that A causes B?

• Yes, because all requirements for causation are met.
• Yes, because covariance and correlation are both positive.
• No, the variables are not correlated.
• No, there is no control for external variables.

6. Create a histogram of Dataset A. Based on the shape of the distribution of outcomes,
which of the following is most likely true?
• Higher values are much more likely to occur than lower values.
• Negative values are much more likely to occur than positive values.
• All values in the range have a relatively equivalent chance of occurring, with a slightly
lower probability on the high end.
• No information can be used from this dataset.

7. Create a histogram of Dataset B. Based on the shape of the distribution of outcomes,
select the range below that appears to have the highest probability of occurrence.
• -729 to -350 (Upper Limit then Lower Limit)
• -250 to 200
• 400 to 800
• 1100 to 1500
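
Reading a probability off a histogram means counting how many observations land in each bin and dividing by the total. A minimal sketch, assuming fixed-width bins; the `histogram` helper, the bin width, and the values are all hypothetical and unrelated to Dataset B.

```python
import math
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin covers [lower, lower + bin_width)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {
        (index * bin_width, (index + 1) * bin_width): n
        for index, n in sorted(counts.items())
    }

# Hypothetical outcomes for illustration only.
values = [-300, -120, 50, 60, 140, 180, 420, 450, 900, 1200]

bins = histogram(values, bin_width=200)
most_likely_bin = max(bins, key=bins.get)
probability = bins[most_likely_bin] / len(values)

print(most_likely_bin, probability)
```

The tallest bar of the histogram marks the range with the highest empirical probability, which is the comparison the question asks for.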
8. Consider 4 sets of data:
• Set W: set of all real numbers over the range 1 to 100.
• Set X: set of all integers over the range 1 to 100.
• Set Y: set of all real numbers over the range 1 to 3.
• Set Z: set of all whole numbers over the range 1 to 10,000.

Which set has the LEAST numbers?

• Set W.
• Set X.
• Set Y.
• Set Z.
• Cannot be determined from the information given.

9. Assume datasets Y and Z have a covariance of -500. Which of the following do you
know to be true? Select all that apply.
• Datasets Y and Z have a strong relationship.
• Datasets Y and Z have a negative relationship.
• The result may be affected by the units of measurement.
• Datasets Y and Z have a causal relationship.
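
The point about units can be checked numerically: rescaling one variable changes the covariance by the same factor, while its sign and the correlation stay put. A sketch with invented values, assuming sample formulas; the dollars-to-cents conversion is only an example of a unit change.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    spread = math.sqrt(sample_covariance(xs, xs) * sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / spread

# Hypothetical data: y in dollars, z in arbitrary units.
y_dollars = [1.0, 2.0, 3.0, 4.0]
z = [8.0, 6.0, 5.0, 1.0]
y_cents = [value * 100 for value in y_dollars]  # same quantity, new unit

cov_dollars = sample_covariance(y_dollars, z)
cov_cents = sample_covariance(y_cents, z)
corr_dollars = pearson_correlation(y_dollars, z)
corr_cents = pearson_correlation(y_cents, z)

print(round(cov_dollars, 2), round(cov_cents, 2))
print(round(corr_dollars, 4), round(corr_cents, 4))
```

A covariance of -500 therefore establishes the direction of the relationship (negative) but not its strength, since the magnitude could come from the units alone.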

10. Select all the examples of discrete data below:

• Bees in a beehive
• Honey in a beehive
• Fish in the sea
• Voltage level of a battery
• Your dog’s weight
• Time you wake up in the morning
• Languages spoken
• Voters for a particular candidate in an election
• Cooking oil used in recipe
• Animals on a farm
