MALNAD COLLEGE OF ENGINEERING
(An Autonomous Institution under Visvesvaraya Technological University, Belagavi)
Department of Computer Science and Engineering
MATHEMATICS FOR COMPUTER SCIENCE – IV
COURSE CODE: 23MACS401
Submitted By
RITISH SHARMA - 4MC23CS137
SWAROOP K VASISTA - 4MC23CS166
SYED AHMED - 4MC23CS168
ULLAS GOWDA JS - 4MC23CS175
UTKARSH KASHYAP - 4MC23CS178
Under the guidance of
Mr. ADITHYA G N
Assistant Professor
Department of Computer Science and Engineering
Table of Contents
1. Application of Multiple Regression
2. Applications of Continuous Random Variables and Sampling Theory
3. Hypothesis Analysis of Defects in Printed Circuit Boards
4. Application of Markov Chain
5. Application Problem: Using the t-Distribution
Application of Multiple Regression
Abstract:
Multiple regression analysis is a powerful statistical tool used to model the relationship
between one dependent variable and several independent variables. This report explores the
application of multiple regression when the expected output depends on three, four, or five
inputs. Through practical problems and their solutions, the report demonstrates how multiple
regression helps in understanding complex relationships and making predictions in fields such
as agriculture, finance, and healthcare.
Introduction:
In real-world scenarios, outcomes often depend on several factors rather than just one.
Multiple regression extends simple linear regression by allowing for the inclusion of multiple
independent variables, making it possible to analyze and predict outcomes in complex
systems. This method is widely used in various domains, including economics, engineering,
and the social sciences, to quantify the effect of several predictors on a single response
variable.
Objectives of Multiple Regression with Multiple Inputs:
1. Quantify the relationship between multiple predictors and a single output: understand how several independent variables jointly influence a dependent variable.
2. Estimate the individual impact of each input variable: determine the unique effect of each predictor on the outcome while controlling for the others.
3. Make accurate predictions in complex, real-world scenarios: use the regression model to forecast outcomes based on new sets of input values.
4. Identify and assess the significance of important factors: detect which variables have statistically significant effects on the output.
5. Support data-driven decision making in various fields: provide actionable insights for fields like economics, healthcare, engineering, and business by modeling and interpreting multifactorial relationships.
Page | 3
Analysis, Results & Discussion:

Problem 1: Predicting Crop Yield (3 Inputs)

Scenario:
An agricultural scientist wants to predict crop yield (Y) based on rainfall (X₁), temperature (X₂), and fertilizer usage (X₃).
Regression Model:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ε
Sample Data:

Observation   Rainfall (X₁)   Temperature (X₂)   Fertilizer (X₃)   Yield (Y)
1             100             25                 50                2.0
2             120             23                 55                2.5
3             90              27                 45                1.8
4             110             26                 60                2.7
5             105             24                 52                2.2

Solution:
State the Problem
We want to predict crop yield (Y) based on three factors: rainfall (X₁), temperature (X₂), and fertilizer usage (X₃).

Collect and Organize Data
Prepare the data in a table, as shown above.

Write the System in Matrix Form
Write the regression equation for each observation:
2.0 = β₀ + 100β₁ + 25β₂ + 50β₃
2.5 = β₀ + 120β₁ + 23β₂ + 55β₃
1.8 = β₀ + 90β₁ + 27β₂ + 45β₃
2.7 = β₀ + 110β₁ + 26β₂ + 60β₃
2.2 = β₀ + 105β₁ + 24β₂ + 52β₃
Or, in matrix notation:
Y = Xβ + ε
Where:
Y is the vector of observed yields,
X is the matrix of predictors (including a column of 1's for the intercept),
β is the vector of coefficients, and
ε is the vector of errors.
Calculate the Coefficients Mathematically
The least squares solution is:
β̂ = (XᵀX)⁻¹XᵀY

Write the Final Regression Equation
Substituting the estimated coefficients into Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ gives the fitted prediction equation.
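The least-squares step above can be reproduced numerically. A minimal NumPy sketch using the five observations (the array names are ours):

```python
import numpy as np

# Design matrix: a column of 1's for the intercept, then rainfall X1,
# temperature X2, and fertilizer X3 for each of the five observations.
X = np.array([
    [1, 100, 25, 50],
    [1, 120, 23, 55],
    [1,  90, 27, 45],
    [1, 110, 26, 60],
    [1, 105, 24, 52],
], dtype=float)
Y = np.array([2.0, 2.5, 1.8, 2.7, 2.2])

# Least squares estimate of beta = (b0, b1, b2, b3); lstsq solves the
# normal equations (X^T X) beta = X^T Y without forming the inverse.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("beta_hat =", beta_hat)
print("fitted yields =", X @ beta_hat)
```

The printed coefficients are the β̂ of the least squares formula, and the fitted values X @ β̂ should track the observed yields closely.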
Problem 2: Predicting Stock Price (4 Inputs)
Step 1: State the Problem
Predict the stock price (Y) using:
S&P 500 index (X₁)
Oil price (X₂)
Interest rate (X₃)
Oil futures price (X₄)
Problem 3: Predicting Patient Recovery Time (5 Inputs)
Step 1: State the Problem
Predict recovery time (Y) using:
Age (X₁)
Dosage (X₂)
Pre-existing condition (X₃; 1 = yes, 0 = no)
Blood pressure (X₄)
Heart rate (X₅)
Conclusion
Multiple regression analysis is a vital statistical tool for
understanding and modeling the relationship between a dependent
variable and several independent variables. By incorporating
multiple inputs, this method allows researchers and professionals
to capture the complexity of real-world phenomena, estimate the
individual and combined effects of various factors, and make
accurate predictions. Whether applied in agriculture, finance,
healthcare, or engineering, multiple regression not only enhances
our ability to interpret data but also supports data-driven decision-
making. Mastery of this technique is essential for anyone seeking
to analyze multifactorial systems and derive actionable insights
from complex datasets.
Applications of Continuous Random Variables and Sampling Theory
Abstract
1. Continuous Random Variables (Current Measurement): Measurements such as
electrical current often follow a normal distribution, allowing precise probability
calculations for expected ranges. By standardizing the variable using the Z-score
transformation, one can compute the likelihood that a measurement falls within a
specified interval, offering insights into sensor accuracy and system stability.
2. Confidence Interval Estimation (Propellant Burning Rate): In process testing, the
confidence interval quantifies the range within which the true mean value (e.g., burn
time) likely lies, based on sample data. When the sample size is large, the interval is
computed using the sample mean, standard deviation, and the standard normal critical
value, enabling engineers to estimate operational characteristics with controlled
uncertainty.
Introduction
In engineering and applied sciences, understanding and managing uncertainty is essential for
accurate measurements, efficient processes, and reliable outcomes. Statistical methods rooted
in probability theory play a critical role in analyzing and interpreting real-world data.
Continuous random variables, such as current measurements, often follow normal
distributions, enabling precise predictions through Z-scores. Confidence interval estimation
provides a range in which true process parameters, like propellant burn time, are expected to
lie, accounting for data variability. Hypothesis testing further aids in decision-making by
evaluating whether observed outcomes—such as mean processing time or drying
proportions—meet expected standards. In comparative studies, two-sample hypothesis testing
allows engineers to determine whether differences between interventions (e.g., different
catalysts) are statistically significant. Together, these techniques form the foundation for
modeling uncertainty, validating performance, and guiding improvements in both
measurement systems and experimental processes.
Analysis, Results & Discussion:

Problem 1: Current Measurement Problem (Based on Continuous Random Variable)
The current I measured in a conductor is modeled as a continuous random variable with a normal distribution, mean μ = 12 A and standard deviation σ = 1.5 A. What is the probability that a randomly taken measurement lies between 10 A and 14 A?

Solution:
Standardize the endpoints with Z = (I − μ)/σ:
P(10 < I < 14) = P((10 − 12)/1.5 < Z < (14 − 12)/1.5) = P(−1.33 < Z < 1.33)
= 2Φ(1.33) − 1 = 2(0.9082) − 1 = 0.8164

The probability that the current lies between 10 A and 14 A is 81.64%.
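The same probability can be obtained directly from the normal CDF; a short sketch (the exact value, ≈ 0.8176, differs slightly from the two-decimal table result 0.8164):

```python
from scipy.stats import norm

mu, sigma = 12.0, 1.5  # mean and standard deviation of the current I, in amperes

# P(10 < I < 14) as a difference of normal CDF values,
# equivalent to standardizing with z = (x - mu) / sigma.
p = norm.cdf(14, loc=mu, scale=sigma) - norm.cdf(10, loc=mu, scale=sigma)
print(f"P(10 < I < 14) = {p:.4f}")  # P(10 < I < 14) = 0.8176
```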
Problem 2: Propellant Burning Rate Problem (Confidence Interval from Sampling Theory)
A researcher measures the burning time (in seconds) of 36 propellant samples. The sample mean is 8.4 seconds with a sample standard deviation of 1.2 seconds. Construct a 95% confidence interval for the true mean burning time.

Solution:
Since n = 36 is large, the standard normal critical value z₀.₀₂₅ = 1.96 applies:
x̄ ± z₀.₀₂₅ (s/√n) = 8.4 ± 1.96 (1.2/√36) = 8.4 ± 0.392

With 95% confidence, the true mean burning time lies between 8.008 s and 8.792 s.
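The interval can be reproduced in a few lines; here z₀.₀₂₅ is taken from scipy rather than a table:

```python
import math
from scipy.stats import norm

n, xbar, s = 36, 8.4, 1.2
z = norm.ppf(0.975)                 # two-sided 95% critical value, ~1.96
half_width = z * s / math.sqrt(n)   # ~1.96 * 1.2 / 6 = 0.392
lo, hi = xbar - half_width, xbar + half_width
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # 95% CI: (8.008, 8.792)
```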
Hypothesis Analysis of Defects in Printed Circuit Boards
Introduction
Printed Circuit Boards (PCBs) are essential components in nearly all electronic devices. High
reliability and quality are critical in PCB manufacturing, as defects can cause product failure
and increase manufacturing costs. Manufacturers typically aim for a very low defect rate to
ensure quality. This report uses hypothesis testing to analyze whether the defect rate observed
in a sample of PCBs deviates significantly from the manufacturer's claimed standard.
Problem 1
A PCB manufacturing company claims that no more than 5% of their products are defective.
To verify this, a quality control team randomly selected 100 PCBs from a recent production
batch and found that 10 of them were defective. This report aims to determine, using
hypothesis testing, whether this sample provides statistically significant evidence that the
defect rate is higher than claimed.
Data Collection
Of the n = 100 sampled PCBs, 10 were defective, giving a sample proportion p̂ = 10/100 = 0.10.
Hypothesis Formulation
We will perform a one-sample z-test for proportions.
• Null Hypothesis (H₀): The true proportion of defects is 5% (p = 0.05)
• Alternative Hypothesis (H₁): The true proportion of defects is not 5% (p ≠ 0.05)
• Significance Level (α): 0.05
Statistical Analysis
We use the z-test statistic for proportions:
z = (p̂ − p₀) / √(p₀(1 − p₀)/n) = (0.10 − 0.05) / √(0.05 × 0.95 / 100) ≈ 2.29
The corresponding two-tailed p-value is 2(1 − Φ(2.29)) ≈ 0.022.
Since the p-value (0.022) < α (0.05), we reject the null hypothesis.
Conclusion: There is statistically significant evidence that the defect rate in the PCBs is
different from the claimed 5%. In this case, the sample suggests a higher defect rate (10%).
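The statistic and two-tailed p-value can be checked numerically (a sketch, not part of the original solution):

```python
import math
from scipy.stats import norm

n, defects, p0 = 100, 10, 0.05
p_hat = defects / n                                  # 0.10
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)      # ~2.29
p_value = 2 * (1 - norm.cdf(abs(z)))                 # two-tailed
print(f"z = {z:.2f}, p-value = {p_value:.3f}")       # z = 2.29, p-value = 0.022
```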
Problem 2
A PCB supplier claims that no more than 2% of their boards are defective. A customer tests 400 boards and finds 12 defective ones. Is there sufficient evidence at the 0.05 significance level to suggest the defect rate is higher than claimed?
Solution:
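One way to carry out the one-tailed proportion test this problem calls for, under the stated numbers (the results here are our own computation):

```python
import math
from scipy.stats import norm

n, defects, p0 = 400, 12, 0.02
p_hat = defects / n                                  # 0.03
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)      # ~1.43
z_crit = norm.ppf(0.95)                              # one-tailed critical value, ~1.645
print(f"z = {z:.2f}, reject H0: {z > z_crit}")       # z = 1.43, reject H0: False
```

Since z ≈ 1.43 falls below 1.645, this sample does not give statistically significant evidence that the defect rate exceeds 2%.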
Problem 3
The average number of defects per PCB is claimed to be 1.2. A new supplier sends a sample of 30 boards, which shows a sample mean of 1.5 defects and a standard deviation of 0.6. Test at the 0.05 level whether this supplier has a higher average defect rate.
Solution:
Test statistic: t = (x̄ − μ₀) / (s/√n) = (1.5 − 1.2) / (0.6/√30) ≈ 2.74
Degrees of freedom: df = 29
Critical t (0.05 level, one-tailed): ≈ 1.699
Conclusion: Since 2.74 > 1.699, we reject H₀. There is significant evidence that the supplier's average defect rate is higher than 1.2.
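The same one-sample t-test from the summary statistics, sketched with scipy supplying the critical value:

```python
import math
from scipy.stats import t as t_dist

n, xbar, mu0, s = 30, 1.5, 1.2, 0.6
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # ~2.74
t_crit = t_dist.ppf(0.95, df=n - 1)          # one-tailed, df = 29: ~1.699
print(f"t = {t_stat:.2f}, reject H0: {t_stat > t_crit}")  # t = 2.74, reject H0: True
```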
Problem 4
Two PCB production lines yield different defect counts:
• Line A: 300 boards, 9 defects
• Line B: 300 boards, 3 defects
Is there a significant difference in defect proportions at the 0.05 level?
Solution:
Sample proportions: p̂A = 9/300 = 0.03, p̂B = 3/300 = 0.01; pooled proportion p̂ = 12/600 = 0.02.
z = (p̂A − p̂B) / √(p̂(1 − p̂)(1/300 + 1/300)) = 0.02 / 0.0114 ≈ 1.75
Critical z (two-tailed, α = 0.05): ±1.96
Conclusion: 1.75 < 1.96 → fail to reject H₀. There is no statistically significant difference between the two lines.
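The pooled two-proportion z-test behind this comparison, as a sketch:

```python
import math

nA, xA = 300, 9     # Line A: 9 defects in 300 boards
nB, xB = 300, 3     # Line B: 3 defects in 300 boards
pA, pB = xA / nA, xB / nB
p_pool = (xA + xB) / (nA + nB)   # pooled proportion under H0: 0.02
se = math.sqrt(p_pool * (1 - p_pool) * (1 / nA + 1 / nB))
z = (pA - pB) / se
print(f"z = {z:.2f}, significant at 0.05: {abs(z) > 1.96}")  # z = 1.75, significant at 0.05: False
```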
Conclusion of Hypothesis Analysis
Hypothesis analysis is a powerful statistical method used to make decisions or inferences
about population parameters based on sample data. It provides a structured approach to test
assumptions—such as defect rates in manufacturing—by comparing observed data to
expected outcomes under a stated null hypothesis.
Through the process of hypothesis testing, we determine whether the observed differences are
due to random chance or reflect a statistically significant effect. This involves formulating
null and alternative hypotheses, choosing a significance level, calculating a test statistic, and
making a decision based on critical values or p-values.
• Rejecting the null hypothesis suggests strong evidence for a real effect or difference.
• Failing to reject the null implies insufficient evidence to support a change from the
assumed condition.
Hypothesis analysis supports data-driven decision-making, helping businesses, researchers,
and engineers validate claims, identify quality issues, and improve processes. In quality
control contexts such as PCB manufacturing, it ensures product standards are met and helps
maintain reliability and customer trust.
Application of Markov Chain
Abstract:
A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
Introduction:
Markov Chains are powerful mathematical models used to describe systems that transition from one
state to another in a probabilistic manner. Named after Russian mathematician Andrey Markov, these
models assume the “memoryless” property—meaning the probability of transitioning to the next state
depends only on the current state and not on the sequence of events that preceded it. This
characteristic makes Markov Chains highly suitable for analyzing dynamic systems in a wide range of
real-world applications.
In the field of computer science, Markov Chains are commonly used in algorithms such as Google's
PageRank and in modeling web page navigation patterns. In economics and finance, they are applied
to model market trends, credit risk, and economic forecasting. In biology, they help simulate genetic
sequences and model population dynamics. Additionally, they are valuable in queueing theory, speech
recognition, game theory, and reliability engineering.
One of the main advantages of using Markov Chains is their ability to simplify complex systems into
manageable probabilistic models. They provide insight into long-term behavior through steady-state
analysis and can be used to predict future outcomes with a known degree of uncertainty. This makes
them an essential tool in both theoretical research and practical decision-making.
Markov Chain Problems: Population Distribution due to Migration
Problem 1: Migration Between Three Cities
Context: Three cities – A, B, and C – experience population shifts every year. The transition probability matrix (rows = current city, columns = next city) is:

        A     B     C
A     0.6   0.3   0.1
B     0.2   0.5   0.3
C     0.1   0.2   0.7

with initial population x(0) = [4000, 3000, 3000].

Tasks:
1. Compute the population distribution after one year.
2. Compute the population distribution after two years.
3. Determine whether a steady-state distribution exists.
Step 1: Population after 1 year
x(1) = x(0) ⋅ P = [4000, 3000, 3000] ⋅ P
Now calculate each component:
City A:
4000(0.6) + 3000(0.2) + 3000(0.1) = 2400 + 600 + 300 = 3300
City B:
4000(0.3) + 3000(0.5) + 3000(0.2) = 1200 + 1500 + 600 = 3300
City C:
4000(0.1) + 3000(0.3) + 3000(0.7) = 400 + 900 + 2100 = 3400
x(1) = [3300, 3300, 3400]
Step 2: Population after 2 years
Now use x(1) and multiply again:
x(2) = x(1) ⋅ P = [3300, 3300, 3400] ⋅ P
Calculate:
City A:
3300(0.6) + 3300(0.2) + 3400(0.1) = 1980 + 660 + 340 = 2980
City B:
3300(0.3) + 3300(0.5) + 3400(0.2) = 990 + 1650 + 680 = 3320
City C:
3300(0.1) + 3300(0.3) + 3400(0.7) = 330 + 990 + 2380 = 3700
x(2) = [2980, 3320, 3700]
Step 3: Steady-state check
Every entry of P is positive, so the chain is regular and a unique steady-state distribution exists. The yearly distributions x(n) converge to this fixed vector, so in the long run the populations stop changing from year to year.
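The year-by-year products and the steady state can be checked with NumPy; the transition matrix below is the one implied by the component calculations above:

```python
import numpy as np

# Transition matrix: rows = current city (A, B, C), columns = next city.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
x0 = np.array([4000.0, 3000.0, 3000.0])

x1 = x0 @ P                      # after one year,  ≈ [3300, 3300, 3400]
x2 = x1 @ P                      # after two years, ≈ [2980, 3320, 3700]
print("x(1) =", x1)
print("x(2) =", x2)

# Steady state: left eigenvector of P for eigenvalue 1,
# rescaled so it sums to the total population of 10000.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum() * x0.sum()
print("steady state ≈", pi)
```

Iterating x ← x ⋅ P drives the distribution toward this steady-state vector, which is why the year-to-year changes shrink.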
Problem 2: Urban-Rural Migration
Context: A region has two areas: Urban (U) and Rural (R). Each year, 15% of rural residents move to
urban areas, and 10% of urban residents move to rural areas.
Tasks:
1. Find the population distribution after 3 years.
2. Compute the long-term (steady-state) population distribution.
3. Interpret the steady-state.
Given:
Since 10% of urban residents leave each year (90% stay) and 15% of rural residents leave (85% stay), the transition matrix (rows = current area, columns = next area) is:

        U      R
U    0.90   0.10
R    0.15   0.85

Initial population:
x(0) = [6000, 4000]

Step 1: Population after 1 year
x(1) = x(0) ⋅ P = [6000, 4000] ⋅ P
Urban:
6000(0.9) + 4000(0.15) = 5400 + 600 = 6000
Rural:
6000(0.1) + 4000(0.85) = 600 + 3400 = 4000
x(1) = [6000, 4000]

Step 2: Repeat for 3 years
Since x(1) = x(0), the distribution no longer changes:
x(2) = x(1) ⋅ P = [6000, 4000]
x(3) = x(2) ⋅ P = [6000, 4000]

Step 3: Steady-state interpretation
The initial distribution is already the steady state x* = [6000, 4000]: the yearly flow out of urban areas, 0.10 × 6000 = 600, exactly balances the flow out of rural areas, 0.15 × 4000 = 600. The population therefore stays in a fixed 60:40 urban-rural proportion.
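A quick NumPy check of the urban-rural chain, using the matrix implied by the stated migration rates (10% of urban and 15% of rural residents move each year, so 90% and 85% stay):

```python
import numpy as np

# Rows = current area (Urban, Rural), columns = next area.
P = np.array([
    [0.90, 0.10],
    [0.15, 0.85],
])
x = np.array([6000.0, 4000.0])

for year in (1, 2, 3):
    x = x @ P
    print(f"x({year}) =", x)   # stays ≈ [6000, 4000] every year

# Balance check at the steady state: the two migration flows cancel,
# 0.10 * 6000 = 0.15 * 4000 = 600 residents in each direction.
print("flows balance:", abs(0.10 * 6000 - 0.15 * 4000) < 1e-9)
```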
Application Problem: Using the t-Distribution
Introduction
In statistical analysis, making inferences about a population based on sample data is a
common practice. When the population standard deviation is unknown and the sample size is
relatively small, the t-distribution becomes an essential tool. The t-distribution allows for
more accurate estimation and hypothesis testing under these conditions by accounting for the
added uncertainty in smaller samples. This topic explores how the t-distribution is applied to
real-world problems involving sample means, including confidence intervals and hypothesis
testing, highlighting its importance in practical statistical decision-making.
Problem 1:
A teacher wants to determine whether the average test score of her class is significantly
different from the national average of 75. She randomly selects a sample of 10 students and
obtains the following scores:
78, 74, 69, 80, 72, 77, 73, 76, 71, 75
Using a 5% significance level, test whether the class mean is significantly different from the
national average.
Solution:
Step 1: State the Hypotheses
Null Hypothesis (H₀): μ = 75
Alternative Hypothesis (H₁): μ ≠ 75
Step 2: Set the Significance Level
α = 0.05 (5%)
Step 3: Calculate the Sample Mean and Standard Deviation
Sample scores: 78, 74, 69, 80, 72, 77, 73, 76, 71, 75
Sample mean (x̄) = (78 + 74 + 69 + 80 + 72 + 77 + 73 + 76 + 71 + 75) / 10 = 74.5
Sample standard deviation (s) = √[Σ(xi − x̄)² / (n − 1)] = √(102.5 / 9) ≈ 3.37
Step 4: Calculate the t-Statistic
t = (x̄ − μ) / (s / √n) = (74.5 − 75) / (3.37 / √10) ≈ −0.47
Step 5: Determine the Degrees of Freedom and Critical Value
Degrees of freedom (df) = n − 1 = 9
From the t-table at df = 9 and α = 0.05 (two-tailed), critical t ≈ ±2.262
|t| = 0.47 < 2.262 → fail to reject H₀
Step 6: Conclusion
There is not enough evidence to say the class mean is significantly different from the national
average at 5% level of significance.
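scipy can run the same test directly on the raw scores; it reports the unrounded statistic, t ≈ −0.47:

```python
from scipy import stats

scores = [78, 74, 69, 80, 72, 77, 73, 76, 71, 75]
t_stat, p_value = stats.ttest_1samp(scores, popmean=75)  # two-sided by default
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

With the p-value well above 0.05, we fail to reject H₀, the same conclusion as the hand calculation.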
Problem 2:
A pharmaceutical company wants to test the effectiveness of a new drug designed to lower
blood pressure. A sample of 15 patients is selected, and their systolic blood pressure levels
are measured after administering the drug. The sample mean decrease in blood pressure is 8
mmHg with a sample standard deviation of 4 mmHg. Assuming the decrease in blood
pressure is normally distributed, test at the 5% significance level whether the drug is effective
(i.e., whether the mean decrease is significantly greater than 0).
Solution:
Step 1: Define the Hypotheses
Null hypothesis (H₀): μ = 0 (the drug has no effect)
Alternative hypothesis (H₁): μ > 0 (the drug is effective)

Step 2: Identify the Test Statistic
Since the population standard deviation is unknown and the sample size is small (n < 30), we use the t-distribution:
t = (x̄ − μ₀) / (s / √n) = (8 − 0) / (4 / √15) ≈ 7.75

Step 3: Determine the Degrees of Freedom and Critical Value
Degrees of freedom (df) = n − 1 = 14. Using a t-table at α = 0.05 for a one-tailed test and df = 14, the critical value is t ≈ 1.761.

Step 4: Make a Decision
Calculated t ≈ 7.75 and critical t = 1.761. Since 7.75 > 1.761, we reject the null hypothesis.

Step 5: Conclusion
There is sufficient evidence at the 5% significance level to conclude that the drug is effective in lowering blood pressure.
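The test statistic and critical value can be recomputed from the summary figures:

```python
import math
from scipy.stats import t as t_dist

n, xbar, mu0, s = 15, 8.0, 0.0, 4.0          # mean decrease in blood pressure, mmHg
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # ~7.75
t_crit = t_dist.ppf(0.95, df=n - 1)          # one-tailed, df = 14: ~1.761
print(f"t = {t_stat:.2f}, reject H0: {t_stat > t_crit}")  # t = 7.75, reject H0: True
```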
Conclusion
The t-distribution plays a crucial role in statistical inference, particularly when working with
small samples and unknown population parameters. Through its application, researchers and
analysts can draw meaningful conclusions and make confident decisions even in the absence
of complete population data. Understanding when and how to use the t-distribution enables
more accurate and reliable analysis, ensuring that conclusions drawn from data are both valid
and scientifically sound.