Analyzing data using Python

Eric Marsden
<[Link]@[Link]>

The purpose of computing is insight, not numbers.


– Richard Hamming
Where does this fit into risk engineering?

[Diagram: data feeds, via curve fitting, into a probabilistic model, which
produces event probabilities; a consequence model produces event
consequences; together these determine risks and costs, which are weighed
against criteria to support decision-making. These slides cover the first
step, from data to a probabilistic model.]
Descriptive statistics
Descriptive statistics

▷ Descriptive statistics allow you to summarize information about
  observations
  • organize and simplify data to help understand it

▷ Inferential statistics use observations (data from a sample) to make
  inferences about the total population
  • generalize from a sample to a population
Measures of central tendency

▷ Central tendency (the “middle” of your data) is measured either by the
  median or the mean

▷ The median is the point in the distribution where half the values are
  lower, and half are higher
  • it’s the 0.5 quantile

▷ The (arithmetic) mean (also called the average or the mathematical
  expectation) is the “center of mass” of the distribution
  • continuous case: 𝔼(𝑋) = ∫ₐᵇ 𝑥 𝑓(𝑥) 𝑑𝑥
  • discrete case: 𝔼(𝑋) = ∑ᵢ 𝑥ᵢ 𝑃(𝑥ᵢ)

▷ The mode is the element that occurs most frequently in your data
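As a quick sketch of computing all three measures (the data here is a
made-up example, and statistics.mode is one of several ways to obtain the
mode):

> import numpy, statistics
> data = numpy.array([1, 2, 2, 3, 3, 3, 4, 90])
> numpy.mean(data)       # center of mass, pulled up by the large value 90
13.5
> numpy.median(data)     # 0.5 quantile
3.0
> statistics.mode(data)  # most frequent element
3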
Illustration: fatigue life of aluminium sheeting

Measurements of fatigue life (thousands of cycles until rupture) of strips
of 6061-T6 aluminium sheeting, subjected to loads of 21 000 PSI. Data from
Birnbaum and Saunders (1958).

> import numpy
> cycles = numpy.array([370, 1016, 1235, [...] 1560, 1792])
> cycles.mean()
1400.9108910891089
> numpy.mean(cycles)
1400.9108910891089
> numpy.median(cycles)
1416.0

Source: Birnbaum, Z. W. and Saunders, S. C. (1958), A statistical model for
life-length of materials, Journal of the American Statistical Association,
53(281)
Aside: sensitivity to outliers

Note: the mean is quite sensitive to outliers; the median much less so.
▷ the median is what’s called a robust measure of central tendency

> import numpy
> weights = numpy.random.normal(80, 10, 1000)
> numpy.mean(weights)
79.83294314806949
> numpy.median(weights)
79.69717178759265
> numpy.percentile(weights, 50)
79.69717178759265   # 50th percentile = 0.5 quantile = median
> weights = numpy.append(weights, [10001, 101010])   # add two outliers
> numpy.mean(weights)
190.4630171138418   # <-- big change
> numpy.median(weights)
79.70768232050916   # <-- almost unchanged
Measures of central tendency

▷ If the distribution of data is symmetrical, then the mean is equal to the
  median.

▷ If the distribution is asymmetric (skewed), the mean is generally pulled
  further toward the long tail than the median.

▷ The degree of asymmetry is measured by skewness (Python:
  scipy.stats.skew())

[Figure: density curves illustrating negative skew and positive skew]
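A one-line sketch, reusing the cycles array from the fatigue-life slide
(scipy.stats.skew is the function mentioned above):

> import scipy.stats
> scipy.stats.skew(cycles)   # a positive value indicates a longer right tail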


Measures of variability

▷ Variance measures the dispersion (spread) of observations around the mean
  • Var(𝑋) = 𝔼[(𝑋 − 𝔼[𝑋])²]
  • continuous case: σ² = ∫(𝑥 − μ)² 𝑓(𝑥) 𝑑𝑥, where 𝑓(𝑥) is the probability
    density function of 𝑋
  • discrete case: σ² = 1/(𝑛−1) ∑ᵢ₌₁ⁿ (𝑥ᵢ − μ)²
  • note: if observations are in metres, variance is measured in m²
  • Python: array.var() or numpy.var(array)

▷ Standard deviation is the square root of the variance
  • it has the same units as the mean
  • Python: array.std() or numpy.std(array)
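One caveat worth noting (my addition, not in the original slides): the
discrete formula above divides by 𝑛−1 (the unbiased sample variance),
whereas array.var() and numpy.var() divide by 𝑛 by default; pass ddof=1 to
match the formula.

> import numpy
> x = numpy.array([1.0, 2.0, 3.0, 4.0])
> numpy.var(x)          # divides by n
1.25
> numpy.var(x, ddof=1)  # divides by n - 1, matching the formula above
1.6666666666666667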
Exercise: Simple descriptive statistics

Task: Choose randomly 1000 integers from a uniform distribution between 100
and 200. Calculate the mean, min, max, variance and standard deviation of
this sample.

> import numpy
> obs = numpy.random.randint(100, 201, 1000)
> obs.mean()
149.49199999999999
> obs.min()
100
> obs.max()
200
> obs.var()
823.99793599999998
> obs.std()
28.705364237368595
Histograms: plots of variability

Histograms are a sort of bar graph that shows the distribution of data
values. The vertical axis displays raw counts or proportions.

To build a histogram:

1 Subdivide the observations into several equal classes or intervals
  (called “bins”)

2 Count the number of observations in each interval

3 Plot the number of observations in each interval

Note: the width of the bins is important to obtain a “reasonable” histogram,
but its choice is subjective (see the sketch below).

import matplotlib.pyplot as plt
# our Birnbaum and Saunders failure data
plt.hist(cycles)
plt.xlabel("Cycles until failure")

[Figure: histogram of cycles until failure]
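To see that subjectivity concretely, here is a sketch that draws the same
data with three arbitrary bin counts (the values are my choice):

import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, bins in zip(axes, [5, 10, 30]):
    # same data, very different visual impression
    ax.hist(cycles, bins=bins)
    ax.set_title("bins={}".format(bins))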
Quartiles

▷ A quartile is a value that marks one of the divisions that breaks a
  dataset into four equal parts

▷ The first quartile, at the 25th percentile, divides the first ¼ of cases
  from the latter ¾

▷ The second quartile, the median, divides the dataset in half

▷ The third quartile, at the 75th percentile, divides the first ¾ of cases
  from the latter ¼

▷ The interquartile range (IQR) is the distance between the first and third
  quartiles (the 25th and 75th percentiles)

[Figure: a distribution divided into four parts of 25% of observations each,
with the interquartile range spanning the middle two]
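A sketch of computing the quartiles and IQR with numpy, again on the cycles
data:

> import numpy
> q1, q2, q3 = numpy.percentile(cycles, [25, 50, 75])
> iqr = q3 - q1   # interquartile range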
Box and whisker plot

A “box and whisker” plot or boxplot shows the spread of the data:

▷ the median (horizontal line)

▷ lower and upper quartiles Q1 and Q3 (the box)

▷ upper whisker: last datum < Q3 + 1.5×IQR

▷ lower whisker: first datum > Q1 − 1.5×IQR

▷ any data beyond the whiskers are typically called outliers

import matplotlib.pyplot as plt
plt.boxplot(cycles)
plt.ylabel("Cycles until failure")

[Figure: boxplot of the cycles data]

Note that some people plot whiskers differently, to represent the 5th and
95th percentiles for example, or even the min and max values…
Violin plot

A violin plot adds a kernel density estimation to a boxplot.

import seaborn as sns
sns.violinplot(cycles, orient="v")
plt.xlabel("Cycles until failure")

[Figure: violin plot of the cycles data]
Bias and precision

[Figure: 2×2 grid of targets illustrating the combinations precise/imprecise
and biased/unbiased]

A good estimator should be unbiased, precise and consistent (converge as
sample size increases).
Estimating values

▷ In engineering, providing a point estimate is not enough: we also need to
  know the associated uncertainty
  • especially for risk engineering!

▷ One option is to report the standard error 𝜎̂/√𝑛, where 𝜎̂ is the sample
  standard deviation (an estimator for the population standard deviation)
  and 𝑛 is the size of the sample (see the sketch below)
  • difficult to interpret without making assumptions about the distribution
    of the error (often assumed to be normal)

▷ Alternatively, we might report a confidence interval

Confidence intervals

▷ A two-sided confidence interval is an interval [𝐿, 𝑈] such that C% of the
  time, the parameter of interest will be included in that interval
  • most commonly, 95% confidence intervals are used

▷ Confidence intervals are used to describe the uncertainty in a point
  estimate
  • a wider confidence interval means greater uncertainty
Interpreting confidence intervals

[Figure: the population mean (vertical line) and two-sided confidence
intervals computed from repeated samples; roughly one interval in ten does
not cover the mean]

A 90% confidence interval means that 10% of the time, the parameter of
interest will not be included in that interval. Here, for a two-sided
confidence interval.

Interpreting confidence intervals

[Figure: the same construction with one-sided intervals]

A 90% confidence interval means that 10% of the time, the parameter of
interest will not be included in that interval. Here, for a one-sided
confidence interval.
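This “repeated sampling” interpretation can be checked by simulation. A
sketch, assuming normally distributed observations and a two-sided 90%
interval built from normal quantiles (all parameter values are arbitrary
choices of mine):

import numpy, scipy.stats

rng = numpy.random.default_rng(42)
runs = 10000
covered = 0
for _ in range(runs):
    sample = rng.normal(loc=50, scale=5, size=30)
    sem = sample.std(ddof=1) / numpy.sqrt(len(sample))
    low, high = scipy.stats.norm.interval(0.90, loc=sample.mean(), scale=sem)
    if low <= 50 <= high:
        covered += 1
print(covered / runs)   # close to 0.90 (slightly less, since σ is estimated)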
Illustration: fatigue life of aluminium sheeting

Confidence intervals can be displayed graphically on a barplot, as “error
lines”. Note however that this graphical presentation is ambiguous, because
some authors represent the standard deviation on error bars. The caption
should always state what the error bars represent.

import seaborn as sns
sns.barplot(cycles, ci=95, capsize=0.1)
plt.xlabel("Cycles until failure (95% CI)")

[Figure: barplot of cycles until failure, with 95% confidence interval.
Data from Birnbaum and Saunders (1958)]


Statistical inference

[Figure: a small sample drawn from a larger population]

Statistical inference means deducing information about a population by
examining only a subset of the population (the sample).

We use a sample statistic to estimate a population parameter.
How to determine the confidence interval?
▷ If you make assumptions about the distribution of your data (eg. “the
observations are normally distributed”), you can calculate the confidence
interval for your estimation of the parameter of interest (eg. the mean)
using analytical quantile measures

▷ If you don’t want to make too many assumptions, a technique called the
bootstrap can help

▷ General idea:
• I want to determine how much uncertainty is generated by the fact that I only
have a limited sample of my full population
• If I had a “full” sample, I could extract a large number of limited samples and
examine the amount of variability in those samples (how much does my
parameter of interest vary across these samples?)
• I only have a limited sample, but I can look at a large number of limited
samples from my own limited sample, by sampling with replacement
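For the first (analytical) route, a sketch for the mean of the cycles data,
assuming the observations are roughly normal; Student's t quantiles account
for the estimated standard deviation:

import numpy, scipy.stats

sem = cycles.std(ddof=1) / numpy.sqrt(len(cycles))
low, high = scipy.stats.t.interval(0.95, df=len(cycles) - 1,
                                   loc=cycles.mean(), scale=sem)
print(low, high)   # two-sided 95% confidence interval for the mean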
Bootstrap methods

▷ “Bootstrapping” means resampling your data with replacement

▷ Instead of fitting your model to the original 𝑋 and 𝑦, you fit your model to
resampled versions of 𝑋 and 𝑦 many times

▷ Provides 𝑛 slightly different models from which we create a confidence


interval

▷ Hypothesis: observations are the result of a model plus noise


Bootstrapping confidence intervals in Python

1 Take a large number of samples of the same size as our original dataset,
  by sampling with replacement

2 For each sample, calculate the parameter of interest (eg. the mean)

3 Calculate the relevant percentiles from the distribution of the parameter
  of interest

def bootstrap_confidence_intervals(data, estimator, percentiles, runs=1000):
    replicates = numpy.empty(runs)
    for i in range(runs):
        # resample with replacement and apply the estimator to the replicate
        replicates[i] = estimator(numpy.random.choice(data, len(data), replace=True))
    est = numpy.mean(replicates)
    ci = numpy.percentile(numpy.sort(replicates), percentiles)
    return (est, ci)
Illustration with Excel

Import your data into the first row of a spreadsheet (here we have 10
observations). Calculate the sample mean (last column) using function
AVERAGE.

Illustration with Excel

The first replicate, placed in the row under the sample, is obtained by
resampling with replacement from the original sample. Each cell contains
the formula INDEX(A1:J1, RANDBETWEEN(1, 10)), meaning “choose a random
element from the first row”. Calculate the mean of the first replicate
(last column) using function AVERAGE.

Illustration with Excel

Generate a large number of replicates (here 18), in successive rows of the
spreadsheet. The mean of all the replicate means (here in bold) is the
bootstrapped mean. Other estimates, such as confidence intervals, can be
obtained from the column of replicate means.
Illustration: bootstrapped mean of Birnbaum data

Let’s calculate the bootstrapped mean and an associated 90% confidence
interval (the 5th and 95th percentiles of the replicates) for the Birnbaum
and Saunders (1958) fatigue data.

> import numpy
> cycles = numpy.array([370, 1016, 1235, [...] 1560, 1792])
> cycles.mean()
1400.9108910891089
> est, ci = bootstrap_confidence_intervals(cycles, numpy.mean, [5, 95])
> print("Bootstrapped mean and CI90: {:.1f} [{:.1f}, {:.1f}]".format(est, ci[0], ci[1]))
Bootstrapped mean and CI90: 1403.7 [1332.8, 1478.9]
Fitting models
Fitting a distribution to observations

▷ The probability distributions defined in scipy.stats have a fit()
  method to find the parameters that provide the “best” fit
  • more precisely, the maximum likelihood of being the best fit

▷ For example
  • scipy.stats.norm.fit(data)
  • scipy.stats.lognorm.fit(data)
  • scipy.stats.expon.fit(data)
  • scipy.stats.weibull_min.fit(data)

▷ They return different parameters depending on the distribution, with
  some common parameters
  • loc for the mean
  • scale for the standard deviation
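A sketch of what fit() returns for a normal distribution, on synthetic data
where we know the true parameters:

import numpy, scipy.stats

data = numpy.random.normal(loc=10, scale=2, size=10000)
mu, sigma = scipy.stats.norm.fit(data)
print(mu, sigma)   # maximum-likelihood estimates, close to 10 and 2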
Example: material strength data

Let us examine material strength data collected by the US NIST. We plot a
histogram to examine the distribution of the observations.

import pandas
import matplotlib.pyplot as plt
data = pandas.read_csv("[Link]", header=None)
vangel = data[0]
N = len(vangel)
plt.hist(vangel, density=True, alpha=0.5)
plt.title("Vangel material data (n={})".format(N))
plt.xlabel("Specimen strength")

[Figure: histogram titled “Vangel material data (n=100)”, x axis “Specimen
strength”]

Data source: [Link] from NIST dataplot datasets


Example: material strength data

There are no obvious outliers in this dataset. The distribution is
asymmetric (the right tail is longer than the left tail) and the literature
suggests that material strength data can often be well modeled using a
Weibull distribution, so we fit a Weibull distribution to the data, which
we plot superimposed on the histogram.

The weibull_min.fit() function returns the parameters for the Weibull
distribution that best fits the data (using a technique called maximum
likelihood estimation that we will not describe here).

from scipy.stats import weibull_min
plt.hist(vangel, density=True, alpha=0.5)
shape, loc, scale = weibull_min.fit(vangel, floc=0)
x = numpy.linspace(vangel.min(), vangel.max(), 100)
plt.plot(x, weibull_min(shape, loc, scale).pdf(x))
plt.title("Weibull fit on Vangel data")
plt.xlabel("Specimen strength")

[Figure: histogram with fitted Weibull density, titled “Weibull fit on
Vangel data”]

Data source: [Link] from NIST dataplot datasets


Example: material strength data

It may be more revealing to plot the cumulative failure-intensity data, the
CDF of the dataset. We superimpose the empirical CDF generated directly
from the observations, and the analytical CDF of the fitted Weibull
distribution.

import statsmodels.api as sm
ecdf = sm.distributions.ECDF(vangel)
plt.plot(x, ecdf(x), label="Empirical CDF")
plt.plot(x, weibull_min(shape, loc, scale).cdf(x), label="Weibull fit")
plt.title("Vangel cumulative failure intensity")
plt.xlabel("Specimen strength")
plt.legend()

[Figure: empirical CDF and fitted Weibull CDF, titled “Vangel cumulative
failure data”]
Example: material strength data

To assess whether the Weibull distribution is a good fit for this dataset,
we examine a probability plot of the quantiles of the empirical dataset
against the quantiles of the Weibull distribution. If the data comes from a
Weibull distribution, the points will be close to a diagonal line.

Distributions generally differ the most in the tails, so the position of
the points at the left and right extremes of the plot are the most
important to examine. In this case, the fit with a Weibull distribution is
good.

from scipy.stats import probplot, weibull_min
probplot(vangel, \
         dist=weibull_min(shape, loc, scale), \
         plot=plt.figure().add_subplot(111))
plt.title("Weibull probability plot of Vangel data")

[Figure: Weibull probability plot, ordered values against theoretical
quantiles]
Example: material strength data

We can use the fitted distribution (the model for our observations) to
estimate quantities concerning the material strength. For example, what is
the minimal strength of this material, with a 99% confidence level? It’s
the 0.01 quantile, or the first percentile of our distribution.

> scipy.stats.weibull_min(shape, loc, scale).ppf(0.01)
7.039123866878374

Photo: [Link]/2048, CC BY-NC licence
Assessing goodness of fit

The Kolmogorov-Smirnov test provides a measure of goodness of fit.

It returns a distance 𝐷, the maximum distance between the CDFs of the two
samples. The closer this number is to 0, the more likely it is that the two
samples were drawn from the same distribution.

Python: scipy.stats.kstest(obs, distribution)

(The K-S test also returns a p-value, which describes the statistical
significance of the 𝐷 statistic. However, the p-value is not valid when
testing against a model that has been fitted to the data, so we will ignore
it here.)
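A sketch of applying it to the Weibull fit from the previous slides (the
fitted CDF is passed as the reference distribution):

> from scipy.stats import kstest
> D, pvalue = kstest(vangel, weibull_min(shape, loc, scale).cdf)
> D   # a small distance suggests a good fit; we ignore the p-value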
Exercise

Problem
We have the following field data for time to failure of a pump (in hours):
3568 2599 3681 3100 2772 3272 3529 1770 2951 3024 3761 3671 2977 3110 2567
3341 2825 3921 2498 2447 3615 2601 2185 3324 2892 2942 3992 2924 3544 3359
2702 3658 3089 3272 2833 3090 1879 2151 2371 2936
What is the probability that the pump will fail before it has worked 2000
hours? Provide a 95% confidence interval for your estimate.
Exercise

Solution (1/3)

import scipy.stats
import matplotlib.pyplot as plt

obs = [3568, 2599, 3681, 3100, 2772, 3272, 3529, 1770, 2951, \
       3024, 3761, 3671, 2977, 3110, 2567, 3341, 2825, 3921, 2498, \
       2447, 3615, 2601, 2185, 3324, 2892, 2942, 3992, 2924, 3544, \
       3359, 2702, 3658, 3089, 3272, 2833, 3090, 1879, 2151, 2371, 2936]

# start with some exploratory data analysis to look at the “shape”
# of the data
plt.hist(obs, density=True)
plt.title("Distribution of failure times (hours)")

[Figure: histogram of the distribution of failure times (hours)]
Exercise

Solution (2/3)

# From the histogram, a normal distribution looks like a
# reasonable model. We fit a distribution and plot an
# overlay of the fitted distribution on the histogram,
# as well as a probability plot.
import numpy

mu, sigma = scipy.stats.norm.fit(obs)
fitted = scipy.stats.norm(mu, sigma)
plt.hist(obs, density=True, alpha=0.5)
support = numpy.linspace(min(obs), max(obs), 100)
plt.plot(support, fitted.pdf(support), lw=3)
plt.title("Distribution of failure times (hours)")

fig = plt.figure().add_subplot(111)
scipy.stats.probplot(obs, dist=fitted, plot=fig)

[Figures: histogram of failure times with fitted normal density; normal
probability plot of the observations]


Exercise

Solution (3/3)

def failure_prob(observations):
    mu, sigma = scipy.stats.norm.fit(observations)
    return scipy.stats.norm(mu, sigma).cdf(2000)

# Estimate confidence intervals using the bootstrap method. This is
# estimating the amount of uncertainty in our estimated failure probability
# that is caused by the limited number of observations.
est, ci = bootstrap_confidence_intervals(obs, failure_prob, [2.5, 97.5])
print("Estimate {:.5f}, CI95=[{:.5f}, {:.5f}]".format(est, ci[0], ci[1]))

Results are something like 0.01075, CI95 = [0.00206, 0.02429].


Illustration: fitting a distribution to wind speed data

Let us examine a histogram of wind speed data from TLS (Toulouse) airport,
in 2013.

data = pandas.read_csv("[Link]")
wind = data["Mean Wind SpeedKm/h"]
plt.hist(wind, density=True, alpha=0.5)
plt.xlabel("Wind speed (km/h)")
plt.title("TLS 2013 wind speed data")

[Figure: histogram titled “TLS 2013 wind speed data”]
Illustration: fitting a distribution to wind speed data

We can attempt to fit a normal distribution to the data, and examine a
probability plot. (Note that the data distribution looks skewed and the
normal distribution is symmetric, so it’s not really a very good choice.)

mu, sigma = scipy.stats.norm.fit(wind)
fitted = scipy.stats.norm(mu, sigma)
plt.hist(wind, density=True, alpha=0.5)
x = numpy.linspace(wind.min(), wind.max(), 100)
plt.plot(x, fitted.pdf(x))
plt.title("Normal fit on TLS 2013 wind speed data")
plt.xlabel("Wind speed (km/h)")

scipy.stats.probplot(wind, dist=fitted,
                     plot=plt.figure().add_subplot(111))
plt.title("Normal probability plot of wind speed")

Indeed, the probability plot shows quite a poor fit for the normal
distribution, in particular in the tails of the distributions.

[Figures: histogram with normal fit, annotated “K-S test: D=1.0, p=0.0”;
normal probability plot of wind speed]
Illustration: fitting a distribution to wind speed data

We can attempt to fit a lognormal distribution to the data, and examine a
probability plot.

shape, loc, scale = scipy.stats.lognorm.fit(wind)
fitted = scipy.stats.lognorm(shape, loc, scale)
plt.hist(wind, density=True, alpha=0.5)
x = numpy.linspace(wind.min(), wind.max(), 100)
plt.plot(x, fitted.pdf(x))
plt.title("Lognormal fit on TLS 2013 wind speed data")
plt.xlabel("Wind speed (km/h)")

scipy.stats.probplot(wind, dist=fitted,
                     plot=plt.figure().add_subplot(111))
plt.title("Lognormal probability plot of wind speed")

The probability plot gives much better results here.

[Figures: histogram with lognormal fit, annotated “K-S test: D=0.075,
p=0.03”; lognormal probability plot of wind speed]
Exercise

▷ Download heat flow meter data collected by B. Zarr (NIST, 1990)
  → https://[Link]/div898/handbook/eda/section4/[Link]

▷ Plot a histogram for the data

▷ Generate a normal probability plot to check whether the measurements fit
  a normal (Gaussian) distribution

▷ Fit a normal distribution to the data

▷ Calculate the sample mean and the estimated population mean using the
  bootstrap technique

▷ Calculate the standard deviation

▷ Estimate the 95% confidence interval for the population mean, using the
  bootstrap technique
Analyzing data

Analyzing data: wind speed

▷ Import wind speed data for Toulouse airport

▷ Find the mean of the distribution

▷ Plot a histogram of the data

▷ Does the data seem to follow a normal distribution?
  • use a probability plot to check

▷ Check whether a Weibull distribution fits better

▷ Predict the highest wind speed expected in a 10-year interval (one
  possible approach is sketched below)

[Figures: histogram of TLS 2013 wind speed data; normal probability plot of
wind speed; Weibull probability plot of wind speed]

Data downloaded from [Link]/history/monthly/fr/blagnac/LFBO
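For the last item, one possible sketch (my approach, not spelled out in the
slides): fit a Weibull distribution to the daily mean wind speeds, then
read off the quantile exceeded on average once in ten years of daily
observations.

import scipy.stats

shape, loc, scale = scipy.stats.weibull_min.fit(wind, floc=0)
fitted = scipy.stats.weibull_min(shape, loc, scale)
# ten years ≈ 3652 daily observations, so the highest value expected in
# that interval is roughly the 1 - 1/3652 quantile
print(fitted.ppf(1 - 1/3652))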


Analyze HadCRUT4 data on global temperature change

▷ HadCRUT4 is a gridded dataset of global historical surface temperature
  anomalies relative to a 1961-1990 reference period

▷ Data available from [Link]/hadobs/hadcrut4/

▷ Exercise: import and plot the northern hemisphere ensemble median time
  series data, including uncertainty intervals (see the sketch below)

[Figure: average temperature anomaly per year, 1840–2020]
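A sketch of one way to tackle this exercise; the filename and column names
here are assumptions about the downloaded annual CSV, not taken from the
slides:

import pandas
import matplotlib.pyplot as plt

# hypothetical filename for the northern hemisphere annual ensemble medians
df = pandas.read_csv("hadcrut4_nh_annual.csv")
plt.plot(df["year"], df["median"], label="ensemble median")
plt.fill_between(df["year"], df["lower"], df["upper"], alpha=0.3,
                 label="uncertainty interval")
plt.xlabel("Year")
plt.ylabel("Average anomaly (°C)")
plt.legend()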


Image credits

▷ Microscope on slide 41 adapted from [Link]/p/aeh1J5, CC BY licence

For more free content on risk engineering, visit [Link]

Feedback welcome! Was some of the content unclear? Which parts were most
useful to you? Your comments to feedback@[Link] (email) or @LearnRiskEng
(Twitter) will help us to improve these materials. Thanks!

This presentation is distributed under the terms of the
Creative Commons Attribution – Share Alike licence.

@LearnRiskEng

[Link]/RiskEngineering
