
Using and Handling Data

This document provides an overview of data, including what data is, different types of data, and methods of collecting data. It discusses qualitative vs quantitative data, discrete vs continuous data, and analog vs digital data. Common methods of collecting data include censuses, sample surveys, and experiments. The document also covers topics like showing data through various graphs, conducting surveys, measures of central tendency and spread, and probability.

Uploaded by

Mangala Semage
Copyright
© All Rights Reserved

Using and Handling Data

What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations or even just
descriptions of things.

Qualitative vs Quantitative
Data can be qualitative or quantitative.

Qualitative data is descriptive information (it describes something)

Quantitative data is numerical information (numbers).

And Quantitative data can also be Discrete or Continuous:


Discrete data can only take certain values (like whole numbers)

Continuous data can take any value (within a range)

Put simply: Discrete data is counted, Continuous data is measured

Example: What do we know about Arrow the Dog?

Qualitative:

He is brown and black

He has long hair

He has lots of energy

Quantitative:

Discrete:

He has 4 legs

He has 2 brothers

Continuous:

He weighs 25.5 kg

He is 565 mm tall

To help you remember think "Quantitative is about Quantity"

More Examples
Qualitative:

Your friends' favorite holiday destination

The most common given names in your town

How people describe the smell of a new perfume

Quantitative:

Height (Continuous)

Weight (Continuous)

Petals on a flower (Discrete)

Customers in a shop (Discrete)

Collecting
Data can be collected in many ways. The simplest way is direct observation.

Example: you want to find how many cars pass by a certain point on a road in a 10-minute
interval.

So: stand at that point on the road, and count the cars that pass by in that interval.

We can also collect data by doing a Survey.

Census or Sample
A Census is when we collect data for every member of the group (the whole "population").

A Sample is when we collect data just for selected members of the group.

Example: there are 120 people in your local football club.

You can ask everyone (all 120) what their age is. That is a census.

Or you could just choose the people that are there this afternoon. That is a sample.

A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and is
a lot easier.
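As a rough illustration of the football club example, the sketch below compares a census with a sample. The 120 ages are invented for illustration, and the sample is 20 members picked at random, which is an assumption (the text's sample is simply whoever is there this afternoon).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical ages for the 120 club members (invented for illustration)
ages = [random.randint(16, 60) for _ in range(120)]

# Census: use every member of the population
census_mean = statistics.mean(ages)

# Sample: use only some members (here 20 picked at random, an assumption)
sample = random.sample(ages, 20)
sample_mean = statistics.mean(sample)

print(f"Census mean age ({len(ages)} people): {census_mean:.1f}")
print(f"Sample mean age ({len(sample)} people): {sample_mean:.1f}")
```

The two means will usually be close but not identical, which is the trade-off described above: the sample is easier to collect but less accurate.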
Language
Data or Datum?

The singular form is "datum", so we say "that datum is very high".

"Data" is the plural so we say "the data are available", but it is also a collection of facts, so "the
data is available" is fine too.

Discrete and Continuous Data


Data can be Descriptive (like "high" or "fast") or Numerical (numbers).

And Numerical Data can be Discrete or Continuous:

Discrete data is counted,


Continuous data is measured

Discrete Data
Discrete Data can only take certain values.

Example: the number of students in a class (you can't have half a student).

Example: the total from rolling 2 dice

can only have the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12
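The dice example can be checked by listing every possible roll. This is a small sketch, assuming two ordinary six-sided dice and taking the "result" to be the total of the two faces.

```python
from itertools import product

# Every possible (first die, second die) pair: 6 x 6 = 36 outcomes
outcomes = list(product(range(1, 7), repeat=2))

# The total of each pair; a set keeps each distinct value once
totals = sorted({a + b for a, b in outcomes})

print(totals)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

No total such as 2.5 can ever appear, which is what makes this data discrete.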

Continuous Data
Continuous Data can take any value (within a range)

Examples:

A person's height: could be any value (within the range of human heights), not just
certain fixed heights,

Time in a race: you could even measure it to fractions of a second,

A dog's weight,

The length of a leaf,

Lots more!

Analog and Digital


Analog: something physical with continuous change.
Digital: made up of numbers.
Arrow Barks!
Let's record him barking:


Arrow's bark is analog. It is actual pressure waves in the air, so it is physical with
continuous change.
Continuous change: changes smoothly ... no sudden breaks.
And the microphone converts that pressure into an electrical signal. It is still analog (the
electricity is physical, and has continuous change).
But when it gets to your computer or phone it gets
converted to digits!

Thousands of times a second the analog


signal is measured by special electronics ...
and is then saved as numbers.

So the "sound" is now "12, 25, 39, 52, 68, 71, 78, 82, 82, 79, 70, 59, ..." (in fact it would
be in binary, so would be something like "000011000001100100100111...")
It is now digital!
Notice the digital data has sudden jumps up and down ... it does not change
continuously.
It is Discrete Data: that means it can only be certain values (such as 1, 2, 3, etc).
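The measuring step can be sketched in a few lines. This is only an illustration: the 440 Hz sine wave standing in for the bark, the rate of 8000 measurements per second, and the 8-bit number range are all assumptions, and the function names are made up for this example.

```python
import math

def sample_signal(signal, sample_rate, count):
    """Measure a continuous signal at `count` evenly spaced times."""
    return [signal(i / sample_rate) for i in range(count)]

def quantize(value, levels=256):
    """Map a value in [-1, 1] to a whole number from 0 to levels - 1."""
    scaled = (value + 1) / 2 * (levels - 1)
    return round(scaled)

def to_binary(numbers, bits=8):
    """Write each whole number as a fixed-width binary string and join them."""
    return "".join(format(n, f"0{bits}b") for n in numbers)

# Hypothetical "analog" signal: a 440 Hz sine wave standing in for the bark
def tone(t):
    return math.sin(2 * math.pi * 440 * t)

samples = sample_signal(tone, sample_rate=8000, count=8)
digital = [quantize(s) for s in samples]
print(digital)  # whole numbers only: discrete data

# The first three values from the text, written as 8-bit binary
print(to_binary([12, 25, 39]))  # 000011000001100100100111
```

The last line reproduces the binary string quoted above, which suggests that example uses 8 binary digits per number.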

Digital data is very easy for computers and phones to use. It can be saved, shared
electronically, sent all over the world quickly and more.
How can we hear Digits?
Easy! The numbers are used to control the size of an electrical signal, which is analog.

The electricity can be sent to a


speaker ...

... to make sound waves again!

Digital becomes Analog

It should sound very much like the original bark (but not perfectly so!)
Digital Pictures
A similar thing happens when you take a picture.
Light (which is analog) gets projected onto a grid of millions of little sensors inside the
camera:

The camera measures the light at each point and produces numbers.
The picture is now digital!
So the "picture" is now "A1DDF9, ADE3FF, B5E7FE, AFE4F8, ...", which are
hexadecimal color numbers, (that are used internally in binary, so would be something
like "101000011101110111111001...")
Look really closely at a digital picture ... it is made up of millions of little squares called
"pixels":


Each "pixel" is made using a hexadecimal color number.
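To see how a hexadecimal color number relates to the binary digits quoted above, here is a short sketch. It assumes each color number is six hexadecimal digits, read as two digits each for red, green and blue; the function names are made up for this example.

```python
def hex_to_rgb(color):
    """Split a 6-digit hexadecimal color into red, green and blue (0-255)."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def hex_to_binary(color):
    """Write a 6-digit hexadecimal color as 24 binary digits."""
    return format(int(color, 16), "024b")

pixel = "A1DDF9"  # first color number from the text
print(hex_to_rgb(pixel))     # (161, 221, 249)
print(hex_to_binary(pixel))  # 101000011101110111111001
```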
Digital IS Numbers
So digital pictures, music, videos etc are actually stored on your computer or phone as
numbers.
Numbers rule!

Data Collection Methods

To derive conclusions from data, we need to know how the data were collected; that is, we need
to know the method(s) of data collection.

Methods of Data Collection


For this tutorial, we will cover four methods of data collection.
Census. A census is a study that obtains data from every member of a population. In most
studies, a census is not practical, because of the cost and/or time required.

Sample survey. A sample survey is a study that obtains data from a subset of a
population, in order to estimate population attributes.

Experiment. An experiment is a controlled study in which the researcher attempts to
understand cause-and-effect relationships. The study is "controlled" in the sense that the
researcher controls (1) how subjects are assigned to groups and (2) which treatments each
group receives.

In the analysis phase, the researcher compares group scores on some dependent variable.
Based on the analysis, the researcher draws a conclusion about whether the treatment
(independent variable) had a causal effect on the dependent variable.

Observational study. Like experiments, observational studies attempt to understand
cause-and-effect relationships. However, unlike experiments, the researcher is not able to
control (1) how subjects are assigned to groups and/or (2) which treatments each group
receives.

Data Collection Methods: Pros and Cons


Each method of data collection has advantages and disadvantages.

Resources. When the population is large, a sample survey has a big resource advantage
over a census. A well-designed sample survey can provide very precise estimates of
population parameters - quicker, cheaper, and with less manpower than a census.

Generalizability. Generalizability refers to the appropriateness of applying findings from
a study to a larger population. Generalizability requires random selection. If participants
in a study are randomly selected from a larger population, it is appropriate to generalize
study results to the larger population; if not, it is not appropriate to generalize.

Observational studies do not feature random selection; so generalizing from the results of
an observational study to a larger population can be a problem.

Causal inference. Cause-and-effect relationships can be teased out when subjects are
randomly assigned to groups. Therefore, experiments, which allow the researcher to
control assignment of subjects to treatment groups, are the best method for investigating
causal relationships.
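Random assignment of subjects to groups can be sketched as follows. The subject IDs, the two-group design (treatment and control), and the function name are assumptions made for illustration.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not changed
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical IDs
treatment, control = assign_groups(subjects, seed=42)
print(len(treatment), len(control))  # 10 10
```

Because chance alone decides who lands in each group, the groups should differ systematically only in the treatment they receive.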

Test Your Understanding


Problem
Which of the following statements are true?

I. A sample survey is an example of an experimental study.
II. An observational study requires fewer resources than an experiment.
III. The best method for investigating causal relationships is an observational study.

(A) I only
(B) II only
(C) III only
(D) All of the above.
(E) None of the above.

Solution

The correct answer is (E). Unlike an experiment, a sample survey does not require the researcher
to assign treatments to survey respondents. Therefore, a sample survey is not an experimental
study. An observational study may or may not require fewer resources (time, money, manpower)
than an experiment. The best method for investigating causal relationships is an experiment - not
an observational study - because an experiment features randomized assignment of subjects to
treatment groups.

What Are Variables?

In statistics, a variable has two defining characteristics:

A variable is an attribute that describes a person, place, thing, or idea.

The value of the variable can "vary" from one entity to another.

For example, a person's hair color is a potential variable, which could have the value of "blond"
for one person and "brunette" for another.

Qualitative vs. Quantitative Variables


Variables can be classified as qualitative (aka, categorical) or quantitative (aka, numeric).

Qualitative. Qualitative variables take on values that are names or labels. The color of a
ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be
examples of qualitative or categorical variables.
Quantitative. Quantitative variables are numeric. They represent a measurable quantity.
For example, when we speak of the population of a city, we are talking about the number
of people in the city - a measurable attribute of the city. Therefore, population would be a
quantitative variable.

In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).

Discrete vs. Continuous Variables


Quantitative variables can be further classified as discrete or continuous. If a variable can take
on any value between its minimum value and its maximum value, it is called a continuous
variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

Suppose the fire department mandates that all fire fighters must weigh between 150 and
250 pounds. The weight of a fire fighter would be an example of a continuous variable;
since a fire fighter's weight could take on any value between 150 and 250 pounds.

Suppose we flip a coin and count the number of heads. The number of heads could be any
integer value between 0 and plus infinity. However, it could not be any number between 0
and plus infinity. We could not, for example, get 2.3 heads. Therefore, the number of
heads must be a discrete variable.

Univariate vs. Bivariate Data


Statistical data are often classified according to the number of variables being studied.

Univariate data. When we conduct a study that looks at only one variable, we say that
we are working with univariate data. Suppose, for example, that we conducted a survey
to estimate the average weight of high school students. Since we are only working with
one variable (weight), we would be working with univariate data.

Bivariate data. When we conduct a study that examines the relationship between two
variables, we are working with bivariate data. Suppose we conducted a study to see if
there were a relationship between the height and weight of high school students. Since we
are working with two variables (height and weight), we would be working with bivariate
data.
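The difference can be illustrated with a small sketch. The six height and weight values are invented for illustration, and the Pearson correlation coefficient is used here as one common way to measure a relationship between two variables (an assumption; the text does not name a method).

```python
import math
import statistics

# Hypothetical measurements for six students (invented for illustration)
heights_cm = [150, 160, 165, 170, 175, 180]
weights_kg = [45, 52, 58, 63, 70, 74]

# Univariate: one variable (weight) summarised on its own
mean_weight = statistics.mean(weights_kg)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Bivariate: two variables (height and weight) examined together
r = pearson_r(heights_cm, weights_kg)
print(f"Mean weight: {mean_weight:.1f} kg")
print(f"Height-weight correlation: {r:.3f}")
```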

Test Your Understanding


Problem 1

Which of the following statements are true?


I. All variables can be classified as quantitative or categorical variables.
II. Categorical variables can be continuous variables.
III. Quantitative variables can be discrete variables.

(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III

Solution

The correct answer is (E). All variables can be classified as quantitative or categorical variables.
Discrete variables are indeed a category of quantitative variables. Categorical variables, however,
are not numeric. Therefore, they cannot be classified as continuous variables.

Populations and Samples

The study of statistics revolves around the study of data sets. This lesson describes two important
types of data sets - populations and samples. Along the way, we introduce simple random
sampling, the main method used in this tutorial to select samples.

Population vs Sample
The main difference between a population and sample has to do with how observations are
assigned to the data set.

A population includes all of the elements from a set of data.

A sample consists of one or more observations from the population.

Depending on the sampling method, a sample can have fewer observations than the population,
the same number of observations, or more observations. More than one sample can be derived
from the same population.

Other differences have to do with nomenclature, notation, and computations. For example,

A measurable characteristic of a population, such as a mean or standard
deviation, is called a parameter; but a measurable characteristic of a
sample is called a statistic.

We will see in future lessons that the mean of a population is denoted by the
symbol μ; but the mean of a sample is denoted by the symbol x̄.

We will also learn in future lessons that the formula for the standard deviation
of a population is different from the formula for the standard deviation of a
sample.
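As a sketch of these differences, Python's standard statistics module provides separate functions for the two standard deviations: pstdev divides by the number of observations N, while stdev divides by n - 1. The population values below are invented for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1000 values (invented for illustration)
population = [random.gauss(50, 10) for _ in range(1000)]
sample = random.sample(population, 30)

# Parameters describe the population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # divides by N

# Statistics describe the sample
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # divides by n - 1

print(f"Population: mean = {mu:.2f}, standard deviation = {sigma:.2f}")
print(f"Sample:     mean = {x_bar:.2f}, standard deviation = {s:.2f}")
```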
What is Simple Random Sampling?
A sampling method is a procedure for selecting sample elements from a population. Simple
random sampling refers to a sampling method that has the following properties.

The population consists of N objects.

The sample consists of n objects.

All possible samples of n objects are equally likely to occur.

An important benefit of simple random sampling is that it allows researchers to use statistical
methods to analyze sample results. For example, given a simple random sample, researchers can
use statistical methods to define a confidence interval around a sample mean. Statistical analysis
is not appropriate when non-random sampling methods are used.

There are many ways to obtain a simple random sample. One way would be the lottery method.
Each of the N population members is assigned a unique number. The numbers are placed in a
bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers. Population
members having the selected numbers are included in the sample.
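The lottery method can be imitated in code. This is a minimal sketch, assuming a population of N = 50 and a sample of n = 5 (both values are made up); random.sample draws without putting numbers back.

```python
import math
import random

N = 50   # population size (hypothetical)
n = 5    # sample size (hypothetical)

# Assign each population member a unique number, as in the lottery method
numbers = list(range(1, N + 1))

# Draw n distinct numbers; random.sample gives every subset of size n
# the same chance of being chosen
rng = random.Random(2024)
selected = rng.sample(numbers, n)

# Number of possible samples of n objects from N, ignoring order
possible_samples = math.comb(N, n)
print(sorted(selected))
print(possible_samples)  # 2118760
```

Each of those 2,118,760 possible samples is equally likely, which is the defining property listed above.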

Random Number Generator


In practice, the lottery method described above can be cumbersome, particularly with large
sample sizes. As an alternative, use Stat Trek's Random Number Generator, a free tool that
can select up to 1000 random numbers quickly and easily. It can be found under the Stat
Tools tab, which appears in the header of every Stat Trek web page.

Sampling With Replacement and Without Replacement


Suppose we use the lottery method described above to select a simple random sample. After we
pick a number from the bowl, we can put the number aside or we can put it back into the bowl. If
we put the number back in the bowl, it may be selected more than once; if we put it aside, it can
be selected only one time.

When a population element can be selected more than one time, we are sampling with
replacement. When a population element can be selected only one time, we are sampling
without replacement.
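Both kinds of sampling are available in Python's standard random module, shown here as a sketch with a made-up bowl of five numbers: random.sample draws without replacement, and random.choices draws with replacement.

```python
import random

rng = random.Random(0)
bowl = [1, 2, 3, 4, 5]  # hypothetical numbered population

# Without replacement: each number can appear at most once,
# so the sample cannot be larger than the population
without = rng.sample(bowl, 3)

# With replacement: each draw goes back in the bowl, so repeats are
# possible and the sample can be larger than the population
with_repl = rng.choices(bowl, k=8)

print(without)
print(with_repl)
```

Drawing 8 numbers from a bowl of 5 only works with replacement; asking random.sample for more items than the bowl holds raises a ValueError.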

Test Your Understanding


Problem 1

Which of the following statements are true?


I. The mean of a population is denoted by x̄.
II. Sample size is never bigger than population size.
III. The population mean is a statistic.

(A) I only.
(B) II only.
(C) III only.
(D) All of the above.
(E) None of the above.

Solution

The correct answer is (E), none of the above.

The mean of a population is denoted by μ, not x̄. When sampling with replacement, sample size
can be greater than population size. And the population mean is a parameter; the sample mean is
a statistic.
