Assignment

The document outlines the course 'Probability and Statistics for Data Analysis' with various assignments focusing on real-world applications of statistical methods. It includes tasks related to probability distributions, correlation and regression analysis, measures of central tendency, hypothesis testing, and experimental design. Each unit contains specific questions and instructions for data analysis using R Studio and other statistical tools.

Uploaded by

anaikutti329

Course Code : 23MA2201

Course Name : PROBABILITY AND STATISTICS FOR DATA ANALYSIS


Semester/Year : II/ I (IT)
Subject In-charge : Dr. A. Mohammed Shapique
Q.No Unit No. Assignment Task/Questions Marks CO PO
1. Apply the relevant distribution and compute the solution for the real-time situation.
2. Compute correlation and regression for the real-time dataset using R Studio.
3. Evaluate measures of central tendency, dispersion, skewness, and kurtosis to analyze univariate data distributions.
4. Calculate sampling parameters and apply hypothesis tests for means, variances, and goodness of fit.
5. Analyze one-way and two-way classifications and evaluate experimental designs.
Q.No 1.1 | Unit 1 | Marks: 16 | CO 1 | PO 1, PO 2, PO 3, PO 4
For each scenario, use the appropriate probability distribution and calculate the relevant probabilities. Compare and discuss how each distribution fits the given scenario.
a) Calculate the probability of getting exactly 3 heads in 5 tosses of a fair coin.
b) Model the number of calls received by a call center in an hour if the average rate is 4 calls per hour.
c) Calculate the probability of a basketball player making their first successful shot on the 4th attempt.
d) Given that a die is rolled, calculate the probability of rolling a number between 2 and 5.
e) Calculate the probability that a customer waits less than 5 minutes in line at a bank if the average waiting time is 10 minutes.
f) If the height of adult males follows a normal distribution with a mean of 175 cm and a standard deviation of 7 cm, calculate the probability that a randomly selected male is between 170 cm and 180 cm tall.
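As a starting point for question 1.1, the formulas below sketch one reasonable choice of distribution for each scenario. Two points are assumptions rather than part of the question: scenario c) gives no success probability, so it is left as the symbol p, and scenario d) reads "between 2 and 5" inclusively (a strict reading would give 2/6 instead).

```latex
% a) Binomial, n = 5, p = 1/2
P(X = 3) = \binom{5}{3}\left(\tfrac12\right)^{3}\left(\tfrac12\right)^{2} = \tfrac{10}{32} = 0.3125

% b) Poisson, \lambda = 4 calls per hour
P(X = k) = \frac{e^{-4}\,4^{k}}{k!}, \qquad k = 0, 1, 2, \dots

% c) Geometric, success probability p per shot (p is not given in the question)
P(X = 4) = (1 - p)^{3}\,p

% d) Discrete uniform on \{1, \dots, 6\}; inclusive reading assumed
P(2 \le X \le 5) = \tfrac{4}{6} = \tfrac{2}{3}

% e) Exponential with mean 10 minutes, rate \lambda = 1/10
P(T < 5) = 1 - e^{-5/10} \approx 0.3935

% f) Normal, \mu = 175, \sigma = 7
P(170 < X < 180) = \Phi\!\left(\tfrac{5}{7}\right) - \Phi\!\left(-\tfrac{5}{7}\right) = 2\,\Phi(0.714) - 1 \approx 0.525
```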
Q.No 1.2 | Unit 1 | Marks: 16 | CO 1 | PO 1, PO 2, PO 3, PO 4
Instructions: Select a real-world scenario for each of the following distributions and apply the relevant distribution:
a) In a factory, 80% of items pass the quality control inspection. You randomly select 6 items from the production line.
b) Calculate the probability that exactly 5 items pass the inspection.
c) A grocery store experiences an average of 3 customers arriving per 10-minute interval.
d) Calculate the probability of receiving exactly 2 customers in the next 10-minute interval.
e) A technician tests a machine, and the probability of it passing a test is 0.7. The technician continues testing until the first success. Calculate the probability that the technician needs exactly 4 tests to get the first successful result.
f) You are planning a random meeting time between 10:00 AM and 11:00 AM. The time of the meeting is uniformly distributed between these two hours. Calculate the probability that the meeting occurs between 10:15 AM and 10:30 AM.
g) A delivery service has an average delivery time of 15 minutes. Assume that the delivery times are exponentially distributed. Calculate the probability that a delivery takes more than 20 minutes.
h) The test scores in a class follow a normal distribution with a mean of 75 and a standard deviation of 10. Calculate the probability that a randomly selected student scores between 70 and 80.
For each scenario, use the appropriate probability distribution and calculate the relevant probabilities. Interpret the results and explain the practical implications of these probabilities.
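The scenarios in question 1.2 all come with explicit parameters, so their probabilities can be checked numerically. The following is a minimal Python sketch using only the standard library. The helper names (`binomial_pmf`, `poisson_pmf`, `geometric_pmf`, `exponential_sf`) are illustrative, and the pairing of a) with b) and c) with d) is an assumption based on how the items read.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def geometric_pmf(k, p):
    """P(first success occurs on trial k), k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


def exponential_sf(t, mean):
    """P(T > t) for an exponential variable with the given mean."""
    return math.exp(-t / mean)


# a) and b): exactly 5 of 6 items pass when each passes with probability 0.8
p_binomial = binomial_pmf(5, 6, 0.8)
# c) and d): exactly 2 arrivals when the mean is 3 per 10-minute interval
p_poisson = poisson_pmf(2, 3)
# e): first passing test is the 4th one, pass probability 0.7
p_geometric = geometric_pmf(4, 0.7)
# f): uniform over 60 minutes, window from minute 15 to minute 30
p_uniform = (30 - 15) / 60
# g): delivery takes more than 20 minutes, mean delivery time 15 minutes
p_exponential = exponential_sf(20, 15)
# h): score between 70 and 80 under Normal(75, 10)
scores = NormalDist(mu=75, sigma=10)
p_normal = scores.cdf(80) - scores.cdf(70)

for label, prob in [
    ("binomial", p_binomial),
    ("poisson", p_poisson),
    ("geometric", p_geometric),
    ("uniform", p_uniform),
    ("exponential", p_exponential),
    ("normal", p_normal),
]:
    print(f"{label}: {prob:.4f}")
```

The numbers are only the computational part of the task; the interpretation and practical implications still have to be written for each scenario.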
Q.No 2.1 | Unit 2 | Marks: 16 | CO 2 | PO 1, PO 2, PO 3, PO 4
Collect two-wheeler sales data from the following website: https://www.autopunditz.com/two-wheeler-sales-figures
Analyze your data using the following techniques:
a) Correlation
b) Rank correlation
c) Linear regression
d) Computation of the correlation coefficient and trend equation for the given data using R Studio
The above website lists two-wheeler sales from Jan 2024 to Dec 2024. Choose any one month for computing the statistical methods above.
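Before running the analysis in R Studio, it can help to see what the two correlation coefficients in question 2.1 actually compute. The sketch below is a hand-rolled Python version using only the standard library. The lists `x` and `y` are placeholder numbers, not figures from the website, and which two columns to pair (for example, two sales columns per manufacturer) is an assumption left to the student.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Placeholder values for illustration only (NOT taken from the website).
x = [10, 20, 30, 40, 50]
y = [12, 25, 31, 48, 45]

print(f"Pearson r    = {pearson_r(x, y):.4f}")
print(f"Spearman rho = {spearman_rho(x, y):.4f}")
```

In R, the built-in `cor()` function with `method = "pearson"` or `method = "spearman"` covers the same two coefficients.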
Q.No 2.2 | Unit 2 | Marks: 16 | CO 2 | PO 1, PO 2, PO 3, PO 4
Choose a dataset with a temporal or sequential aspect, such as sales data over several months, population growth over years, or temperature changes over days.
a) Plot the data and fit a linear trend equation using the least squares method.
b) Calculate the slope and intercept of the line.
c) Present the linear trend equation.
d) Interpret the slope and intercept in the context of the data.
e) Discuss how well the linear model fits the data and its potential predictive power.
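Parts a) to e) of question 2.2 can be sketched in a few lines. The example below fits y = intercept + slope * t by ordinary least squares and reports R squared as one possible measure of fit. The `months` and `sales` lists are hypothetical values for illustration, and the function names are not from any library.

```python
def fit_linear_trend(t, y):
    """Least-squares slope and intercept for the line y = intercept + slope * t."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_ty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept


def r_squared(t, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical monthly sales, for illustration only.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 108, 121, 127, 140, 148]

slope, intercept = fit_linear_trend(months, sales)
fit = r_squared(months, sales, slope, intercept)
print(f"Trend equation: y = {intercept:.2f} + {slope:.2f} * t")
print(f"R squared: {fit:.4f}")
```

Here the slope is the estimated change in the series per time step and the intercept is the fitted value at t = 0. A high R squared on past data does not by itself guarantee predictive power outside the observed range.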
Q.No 3.1 | Unit 3 | Marks: 16 | CO 3 | PO 1, PO 2, PO 3, PO 4
Find two online datasets (e.g., from government databases, Kaggle, or open data repositories) that contain numerical data such as average income, age, or exam scores.
For each dataset, calculate the following:
a) Mean, median, and mode.
b) Identify any differences between the mean, median, and mode in each dataset.
c) Interpret which measure of central tendency best represents the data in each case.
d) Calculate the skewness and kurtosis.
e) Plot the histogram of the above data using R Studio.
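The measures requested in question 3.1 can be sketched as follows. This uses the moment-based definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, with population moments), which is an assumption: statistical packages often apply bias corrections or report excess kurtosis (kurtosis minus 3), so results from R may differ slightly. The `scores` list is hypothetical.

```python
from statistics import mean, median, multimode


def central_moment(data, order):
    """Population central moment of the given order."""
    m = mean(data)
    return sum((x - m) ** order for x in data) / len(data)


def skewness(data):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2^2 (equals 3 for a normal curve)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


# Hypothetical exam scores, for illustration only.
scores = [52, 61, 61, 67, 70, 74, 88, 95]

print("mean    :", mean(scores))
print("median  :", median(scores))
print("mode(s) :", multimode(scores))  # returns every most-frequent value
print(f"skewness: {skewness(scores):.4f}")
print(f"kurtosis: {kurtosis(scores):.4f}")
```

A positive skewness suggests a longer right tail, in which case the median is often a better summary than the mean.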
Q.No 3.2 | Unit 3 | Marks: 16 | CO 3 | PO 1, PO 2, PO 3, PO 4
Create an online survey using Google Forms or any survey platform. Ask at least 10-15 participants to rate their satisfaction with a product (e.g., a smartphone) on a scale of 1-10.
a) Classification and Tabulation: Organize the survey data into a frequency distribution table, classifying the responses into appropriate intervals or categories.
b) Graphical Representation: Plot the data using bar graphs, histograms, or pie charts to visually represent the distribution of responses.
c) Present the frequency distribution table and graphical representations.
d) Discuss any patterns or trends observed in the data.
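For the tabulation step in question 3.2, one possible approach is to group the 1-10 ratings into equal-width classes and count each class. The sketch below assumes a class width of 2 (1-2, 3-4, ..., 9-10). Both the width and the `ratings` list are illustrative choices, not survey results.

```python
from collections import Counter


def frequency_table(ratings, width=2, low=1, high=10):
    """Return (class label, count) pairs for equal-width classes from low to high."""
    counts = Counter((r - low) // width for r in ratings)
    table = []
    for idx, start in enumerate(range(low, high + 1, width)):
        end = min(start + width - 1, high)
        table.append((f"{start}-{end}", counts.get(idx, 0)))
    return table


# Hypothetical responses from 12 participants, for illustration only.
ratings = [7, 8, 6, 9, 10, 4, 7, 8, 5, 7, 9, 3]

total = len(ratings)
print("Class  Freq  Rel.freq  Bar")
for label, count in frequency_table(ratings):
    # The column of '#' characters acts as a rough text bar chart.
    print(f"{label:>5}  {count:>4}  {count / total:>8.2f}  {'#' * count}")
```

The same counts can then be passed to any plotting tool to draw the bar graph, histogram, or pie chart.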
Q.No 4.1 | Unit 4 | Marks: 16 | CO 4 | PO 1, PO 2, PO 3, PO 4
Task 1: Collect categorical data from any credible source (such as https://www.kaggle.com/datasets/syedanwarafridi/vehicle-sales-data or statistical databases).
Example: The distribution of types of vehicles sold in a region (SUV, Sedan, Truck, etc.).
a) Set the hypothesis.
b) Choose an expected distribution based on prior knowledge or theory.
c) Perform the Chi-Square Goodness of Fit test.
d) Interpret the Chi-Square statistic and p-value.
Task 2: Collect data involving two categorical variables from any credible online source.
Example: Relationship between smoking habits (Smoker/Non-Smoker) and the presence of a health condition (Yes/No).
a) Create a contingency table from the collected data.
b) Perform the Chi-Square Test for Independence.
c) Interpret the Chi-Square statistic and p-value.
d) Submit a brief report including the following sections for each task:
e) Introduction: Purpose of the study and hypothesis.
f) Data Description: Source of data and key variables.
g) Methodology: Explanation of the Chi-Square test applied.
h) Results: Tables, calculations, and interpretation of p-values.
i) Conclusion: Findings and their real-world implications.
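Both chi-square tests in question 4.1 reduce to comparing observed counts with expected counts. The following is a minimal standard-library Python sketch. The counts are hypothetical, the function names are illustrative, and the p-value helper relies on a closed-form recurrence for the chi-square upper tail that holds for integer degrees of freedom.

```python
import math


def chi_square_gof(observed, expected_props):
    """Goodness-of-fit statistic and degrees of freedom."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_independence(table):
    """Test-of-independence statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi_square_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for integer df."""
    z = stat / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)            # df = 2 base case
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1 base case
    while a < df / 2:
        q += z**a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


# Task 1 (hypothetical counts): SUV, Sedan, Truck vs expected shares 40/40/20 %.
stat1, df1 = chi_square_gof([48, 35, 17], [0.4, 0.4, 0.2])
print(f"GOF: chi2 = {stat1:.3f}, df = {df1}, p = {chi_square_p_value(stat1, df1):.4f}")

# Task 2 (hypothetical 2 x 2 table): rows smoker / non-smoker, cols condition yes / no.
stat2, df2 = chi_square_independence([[30, 20], [25, 45]])
print(f"Independence: chi2 = {stat2:.3f}, df = {df2}, p = {chi_square_p_value(stat2, df2):.4f}")
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting the null hypothesis. Note that the usual rule of thumb asks for expected counts of at least 5 in each cell.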

Q.No 4.2 | Unit 4 | Marks: 16 | CO 4 | PO 1, PO 2, PO 3, PO 4
Choose a dataset that includes at least two groups or categories for comparison.
Example datasets could be related to education (test scores), health (blood pressure levels), economics (income levels), or sports (player performance).
a) Perform an independent t-test and interpret the results.
b) Test whether the variances of the two groups are equal. Ensure you calculate and interpret the F-statistic properly.
c) Submit a brief report including the following sections:
d) Purpose of the study and hypothesis.
e) Source of data and key variables.
f) Explanation of the t-test and F-test applied.
g) Tables, calculations, and interpretation of p-values.
h) Findings and their real-world implications.
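The test statistics for question 4.2 can be sketched as below. The code computes the variance-ratio F statistic and both the pooled (equal-variance) and Welch (unequal-variance) t statistics with their degrees of freedom. It deliberately stops short of p-values, which need the t and F distributions from a table or a statistics package. The two groups are hypothetical data.

```python
import math
from statistics import mean, variance


def f_statistic(a, b):
    """Ratio of sample variances (larger over smaller) with its (df1, df2)."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t_statistic(a, b):
    """Two-sample t statistic without the equal-variance assumption."""
    na, nb = len(a), len(b)
    ua, ub = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(ua + ub)
    df = (ua + ub) ** 2 / (ua**2 / (na - 1) + ub**2 / (nb - 1))
    return t, df


# Hypothetical test scores for two groups, for illustration only.
group_a = [72, 85, 78, 90, 66, 81]
group_b = [68, 74, 70, 79, 65, 71]

f, df1, df2 = f_statistic(group_a, group_b)
t_pooled, df_pooled = pooled_t_statistic(group_a, group_b)
t_welch, df_welch = welch_t_statistic(group_a, group_b)
print(f"F = {f:.3f} with df ({df1}, {df2})")
print(f"pooled t = {t_pooled:.3f} with df {df_pooled}")
print(f"Welch t  = {t_welch:.3f} with df {df_welch:.2f}")
```

A common workflow is to run the F-test first and use the pooled t-test only if equal variances are not rejected; otherwise the Welch version is the safer choice.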
Q.No 5.1 | Unit 5 | Marks: 16 | CO 5 | PO 1, PO 2, PO 3, PO 4
Search for studies or online datasets that compare plant growth across different soil types (e.g., soil A, soil B, soil C) in different farm locations (e.g., urban, rural, forest).
Organize the data into three blocks based on farm locations:
- Farm Location 1
- Farm Location 2
- Farm Location 3
a) Measure the plant growth (e.g., height, number of leaves) for each soil type within each location.
b) Perform a randomized block design analysis to test if the soil type has a significant effect on plant growth.
c) State the null and alternative hypotheses.
d) Perform the ANOVA test using the randomized block design.
e) Analyze the effect of soil type on plant growth.
f) Conclude the findings and discuss the importance of soil types in farming practices.
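The randomized block design in question 5.1 splits the total variation into soil type (treatment), farm location (block), and error. A minimal Python sketch of that decomposition follows, assuming one observation per soil type per location. The `growth` table is hypothetical and `rbd_anova` is an illustrative name.

```python
def rbd_anova(data):
    """ANOVA table for a randomized block design.

    data[i][j] is the response for block i (row) and treatment j (column).
    """
    b, k = len(data), len(data[0])
    n = b * k
    grand_total = sum(sum(row) for row in data)
    correction = grand_total**2 / n

    ss_total = sum(x * x for row in data for x in row) - correction
    ss_blocks = sum(sum(row) ** 2 for row in data) / k - correction
    ss_treat = sum(sum(col) ** 2 for col in zip(*data)) / b - correction
    ss_error = ss_total - ss_blocks - ss_treat

    df_treat, df_blocks = k - 1, b - 1
    df_error = df_treat * df_blocks
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_blocks": ss_blocks,
        "ss_treat": ss_treat,
        "ss_error": ss_error,
        "df_treat": df_treat,
        "df_error": df_error,
        "f_treat": ms_treat / ms_error,
    }


# Hypothetical plant heights (cm): rows = farm locations, columns = soil A, B, C.
growth = [
    [10, 12, 14],
    [11, 14, 15],
    [9, 13, 16],
]

result = rbd_anova(growth)
print(f"SS treatments = {result['ss_treat']:.3f}, SS blocks = {result['ss_blocks']:.3f}")
print(f"SS error = {result['ss_error']:.3f}")
print(f"F (soil type) = {result['f_treat']:.3f} on "
      f"({result['df_treat']}, {result['df_error']}) df")
```

The null hypothesis of equal mean growth across soil types is rejected when the computed F exceeds the tabulated critical value for the stated degrees of freedom at the chosen significance level.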
Q.No 5.2 | Unit 5 | Marks: 16 | CO 5 | PO 1, PO 2, PO 3, PO 4
Analyse how product colour affects consumer preferences using a Latin Square Design.
Instructions:
Collect Data: Find consumer preference data from online surveys or studies related to product colour choices. If such data is not available, you may conduct a small survey among your friends/family asking them to rank three colours (e.g., red, blue, green) for a product (e.g., a smartphone case, car colour, etc.).
Rank the preference for the product in each colour from 1 to 3.
Use a Latin Square Design to assign the order in which participants view each colour to avoid order bias (each colour must appear in every row and column).
Analyse the data to determine if there is a significant preference for any colour.
Report:
a) Apply the Latin square design for analysis.
b) Calculate and interpret the results.
c) Conclude which colour is most preferred based on the data.
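The viewing orders required by question 5.2 can be generated and checked with a short sketch. The cyclic construction below is one simple way to build a Latin square, not the only one. Treating rows as participants (or participant groups) and columns as viewing positions is an assumption about how the survey is run.

```python
def cyclic_latin_square(treatments):
    """Build an n x n Latin square by cyclically shifting the treatment list."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


colours = ["red", "blue", "green"]
design = cyclic_latin_square(colours)

# Row = participant group, column = 1st, 2nd, 3rd colour shown.
for group, order in enumerate(design, start=1):
    print(f"Group {group}: {' -> '.join(order)}")
print("Valid Latin square:", is_latin_square(design))
```

Once the rankings are collected under this design, the analysis partitions the variation into rows, columns, colours, and error in the same spirit as other ANOVA layouts.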
