INTRODUCTION TO DATA ANALYSIS
TOPIC 1: BASIC CONCEPTS AND DATA ORGANIZATION
DATA ANALYSIS
Data analysis in psychology is a methodological tool of an essentially statistical nature. Before starting, a series of basic concepts must be clear:
Statistics: it deals with the systematization, collection, organization and presentation of data related to a phenomenon that shows variability or uncertainty, so that it can be studied methodically in order to make forecasts, make decisions or draw conclusions. There are two types:
Descriptive statistics. It includes the procedures aimed at organizing and describing a dataset.
Inferential statistics. It is oriented towards making inferences about a population based on the known characteristics of a sample drawn from it.
Population: the set of all elements that share a specific characteristic under study.
Sample: a subset of a population that is representative of it. To be representative, it must be selected by probabilistic sampling methods, resulting in a probabilistic sample.
Parameter: descriptive property of a population.
Statistic: descriptive property of a sample.
THE SCALES OF MEASUREMENT
The rigorous measurement of psychological variables is an unavoidable preliminary step for any subsequent use that will be made of them. Measurement is the process by which numbers are assigned to objects or characteristics (properties of the objects or people to be studied) according to specific rules. Since the number and type of modalities (the different ways in which a characteristic can present itself) vary widely from one characteristic to another, it is not surprising that a different type of scale is used in each case. These scales are:
Nominal: consists of the arbitrary assignment of numbers or symbols to each of the modalities of a characteristic. With this scale, only relationships of equality or inequality between objects with respect to the characteristic can be inferred; no other property is implied.
Ordinal: the numbers indicate the relative positions of objects with respect to a certain attribute; that is, the objects or events measured can be arranged in ascending or descending order. Therefore, in addition to indicating that something is different from something else, this scale also indicates whether it is greater or lesser, but it says nothing about the distance between measurements, since no unit of measurement is used.
Interval: this scale is characterized by a common and constant unit of measurement. The assigned numbers therefore represent the degree to which an object possesses the attribute studied, which allows objects to be ordered with respect to it and, moreover, the exact numerical distance between them to be calculated. There is no absolute zero, since the origin of measurement is arbitrary.
Ratio: this level adds to the properties of interval measurement the existence of an absolute zero, which indicates the total absence of the characteristic being studied. Thanks to this, equality or inequality of ratios can also be established.
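The practical difference between the interval and ratio levels can be checked numerically. The sketch below is a minimal illustration, assuming Celsius/Fahrenheit temperature as an interval variable and centimetre/inch length as a ratio variable (neither example comes from these notes). It applies a permissible change of unit and checks which comparisons survive: equal differences stay equal on both scales, but a statement such as "twice as much" is only preserved on the ratio scale, because only there is the zero fixed.

```python
def celsius_to_fahrenheit(c):
    # Interval-scale change of unit: a linear transformation that also moves the origin.
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    # Ratio-scale change of unit: a pure multiplication, so zero stays at zero.
    return cm / 2.54

temps_c = [10.0, 20.0, 30.0]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]  # [50.0, 68.0, 86.0]

# Equality of differences survives the interval transformation...
print(temps_c[1] - temps_c[0] == temps_c[2] - temps_c[1])  # True
print(temps_f[1] - temps_f[0] == temps_f[2] - temps_f[1])  # True

# ...but the ratio between two values does not: 20 °C is not "twice" 10 °C.
print(temps_c[1] / temps_c[0])  # 2.0
print(temps_f[1] / temps_f[0])  # 1.36

# On a ratio scale the ratio is unchanged by the change of unit.
lengths_cm = [10.0, 20.0]
lengths_in = [cm_to_inches(x) for x in lengths_cm]
print(round(lengths_cm[1] / lengths_cm[0], 6), round(lengths_in[1] / lengths_in[0], 6))  # 2.0 2.0
```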
THE VARIABLES
A variable is a numerical representation of a characteristic that presents more than one modality (value) within a given set. If it presents only one modality, the characteristic is a constant. There are three types of variables:
Qualitative: this type includes nominal variables (sex, marital status, race...). Depending on the number of categories, they are classified as dichotomous (two modalities) or polytomous (more than two modalities). Variables that could be measured at a higher level can also be dichotomized or polytomized to simplify the work.
Quasi-quantitative: this type comprises ordinal variables (hardness, level of satisfaction, ranking position...).
Quantitative: this type includes interval and ratio variables. Depending on the numerical values they can take, they are classified as continuous (between any two values an intermediate value can always be found) or discrete (between two consecutive values there is no intermediate one).
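The remark that variables measurable at a higher level can be dichotomized can be illustrated with a short recoding. This is only a sketch: the `dichotomize` helper, the ages and the cut-off of 45 are all hypothetical. After recoding, every age on the same side of the cut-off becomes indistinguishable, which is the information given up in exchange for a simpler variable.

```python
def dichotomize(values, cutoff):
    """Recode a quantitative variable into two categories.

    Hypothetical helper: values below `cutoff` become "low", the rest "high".
    """
    return ["low" if v < cutoff else "high" for v in values]

ages = [27, 39, 46, 52, 61, 70]  # hypothetical quantitative (ratio) data
print(dichotomize(ages, 45))
# ['low', 'low', 'high', 'high', 'high', 'high']
```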
DATA ORGANIZATION
Matrix of cases by variables
Once the data from a study have been collected, the information must be organized in a table in order to describe the phenomenon studied; this is the first step in the data analysis of any research. Each row of the table is occupied by a case, while the recorded variables occupy the columns. This table is called a matrix of cases by variables. Frequency distributions (representations of the relationship between a set of exhaustive and mutually exclusive measures and the frequency of each of them) are then usually built for the variables of interest, to make the data easier to visualize, plot and use in statistical calculations.
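A matrix of cases by variables maps naturally onto a list of rows. The minimal sketch below assumes three hypothetical respondents and three of the variables used in this topic (sex, marital status, age); the `column` helper is an illustrative name. Extracting one column yields the raw data from which that variable's frequency distribution is built.

```python
# Each row is a case; each key other than "case" is a variable (column).
# All values are hypothetical.
matrix = [
    {"case": 1, "sex": "Man", "marital_status": "Married", "age": 34},
    {"case": 2, "sex": "Woman", "marital_status": "Single", "age": 47},
    {"case": 3, "sex": "Woman", "marital_status": "Married", "age": 58},
]

def column(matrix, variable):
    """Return all the values recorded for one variable, in case order."""
    return [row[variable] for row in matrix]

print(column(matrix, "sex"))  # ['Man', 'Woman', 'Woman']
print(len(matrix), "cases x", len(matrix[0]) - 1, "variables")  # 3 cases x 3 variables
```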
Frequency distribution: qualitative variable
To build a basic frequency distribution table, count the number of cases for each value of the variable; other information can be added as well.
X        ni       pi    Pi
Man      24       0.6   60
Woman    16       0.4   40
Total    n = 40   1     100
The first column of the table lists the values that the variable (X) can take. The second column shows the absolute frequency (ni), that is, the number of times each value appears. The sum of the frequencies of all the values of the variable equals the total size of the sample (n). The third column shows the relative frequency (pi) or proportion, which is the quotient of the absolute frequency divided by the sample size (pi = ni / n); the sum of all proportions is 1. The fourth column shows the values as percentages (Pi), obtained by multiplying the relative frequencies by one hundred (Pi = pi × 100); the sum of all percentages is therefore 100.
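These columns can be computed directly from the raw data. A sketch assuming the data are stored as a plain Python list (the `frequency_table` name is illustrative); the list reproduces the 24 men and 16 women of the table.

```python
from collections import Counter

def frequency_table(values):
    """Return (X, ni, pi, Pi) rows for a qualitative variable."""
    n = len(values)
    counts = Counter(values)  # ni for each observed value, in order of first appearance
    rows = []
    for x, ni in counts.items():
        pi = ni / n          # relative frequency (proportion)
        Pi = ni * 100 / n    # percentage
        rows.append((x, ni, pi, Pi))
    return rows

sex = ["Man"] * 24 + ["Woman"] * 16
for row in frequency_table(sex):
    print(row)
# ('Man', 24, 0.6, 60.0)
# ('Woman', 16, 0.4, 40.0)
```

The percentage is computed as ni × 100 / n rather than pi × 100 only to avoid floating-point noise; both expressions are mathematically equal.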
Frequency distribution and graphical representation: quasi-quantitative variable and quantitative variable with few modalities
X                    ni   pi     Pi    na   pa     Pa
Primary              13   0.32   32    13   0.32   32
ESO                  11   0.28   28    24   0.6    60
FP or Bachillerato    7   0.18   18    31   0.78   78
Diplomatura           4   0.1    10    35   0.88   88
Licenciatura          5   0.12   12    40   1      100
Total                40   1      100
Since this is an ordinal variable, its values (X) must be placed in ascending or descending order. The cumulative absolute frequency (na) indicates the number of times a modality appears plus the number of times all lower modalities appear. The cumulative proportion (pa) is the quotient of the cumulative absolute frequency of each modality and the total number of observations (pa = na / n). Finally, the cumulative percentages (Pa) are the result of multiplying the cumulative proportions by one hundred (Pa = pa × 100).
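The cumulative columns follow from a running sum over the modalities taken in order. In this sketch the `cumulative_table` name is illustrative, and the order of the levels must be supplied explicitly because an ordinal variable's order cannot be inferred from the data; the counts are those of the education-level table.

```python
def cumulative_table(values, order):
    """Return (X, ni, na, pa, Pa) rows; `order` lists the modalities from lowest to highest."""
    n = len(values)
    rows = []
    na = 0
    for x in order:
        ni = values.count(x)
        na += ni  # cumulative absolute frequency: this modality plus all lower ones
        rows.append((x, ni, na, na / n, na * 100 / n))
    return rows

levels = ["Primary", "ESO", "FP or Bachillerato", "Diplomatura", "Licenciatura"]
counts = [13, 11, 7, 4, 5]  # absolute frequencies from the education table
education = [x for x, k in zip(levels, counts) for _ in range(k)]

for row in cumulative_table(education, levels):
    print(row)
```

The code keeps full precision (for example pa = 0.325 and Pa = 32.5 for Primary), whereas the table above rounds the proportions to two decimals.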
Frequency distribution and graphical representation: quantitative variable
For variables such as age, which can take a multitude of distinct values, the data are grouped into intervals so that they are easier to handle. Nevertheless, not every frequency distribution of a quantitative variable requires intervals (for example, when the variable has few possible values). Note that grouping data into intervals entails a loss of information.
Apparent limits   Exact limits   Midpoint   ni
26-35             25.5-35.5      30.5       10
36-45             35.5-45.5      40.5        3
46-55             45.5-55.5      50.5       13
56-65             55.5-65.5      60.5        7
66-75             65.5-75.5      70.5        7
Total                                       40
The first column of each row shows a range of values. These are the apparent (or reported) limits of the interval; each interval has a lower limit and an upper limit. Between the upper limit of one interval and the lower limit of the next there may be intermediate values, so the column of exact (or real) limits (exact limit = reported value ± 0.5 × I, where I is the unit of measurement of the instrument) shows exactly where one interval ends and the next one begins. The midpoint of an interval is the semisum of its exact limits or of its apparent limits. The last column shows the absolute frequency (ni). The amplitude of an interval is the difference between its exact upper limit and its exact lower limit.
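The exact limits, midpoints and amplitudes can be derived mechanically from the apparent limits. A minimal sketch, assuming a measurement unit I = 1 (ages recorded in whole years), which matches the table; `interval_summary` is an illustrative name.

```python
def interval_summary(apparent_limits, unit=1.0):
    """Return (exact lower, exact upper, midpoint, amplitude) for each interval.

    `unit` is the measurement unit I of the instrument (assumed 1 year for ages).
    """
    rows = []
    for lower, upper in apparent_limits:
        exact_lower = lower - 0.5 * unit
        exact_upper = upper + 0.5 * unit
        midpoint = (exact_lower + exact_upper) / 2  # semisum of the exact limits
        amplitude = exact_upper - exact_lower
        rows.append((exact_lower, exact_upper, midpoint, amplitude))
    return rows

apparent = [(26, 35), (36, 45), (46, 55), (56, 65), (66, 75)]
for row in interval_summary(apparent):
    print(row)
# (25.5, 35.5, 30.5, 10.0)
# (35.5, 45.5, 40.5, 10.0)
# ...
```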
Graphic representations
A graph is a quick and intuitive way to visualize a frequency distribution. There are many types, some more appropriate than others depending on the number of variables and the level of measurement of the variable to be represented.
For graphical representations of a variable, the following are used:
Bar chart: it is commonly used for nominal, ordinal and discrete quantitative variables (for the latter two, a cumulative bar chart can also be built from the cumulative frequencies). The values of the variable are placed on the x-axis (horizontal) and the frequencies on the vertical axis. Above each modality of the variable a rectangle is drawn whose height equals its frequency, whether absolute or relative.
Pie chart (sector diagram): it is used for qualitative and quasi-quantitative variables and consists of a circle divided into sectors whose area is proportional to the frequency of the corresponding modality. It is common to indicate the percentage of each value of the variable.
Pictograms: they are used for qualitative variables and express the frequencies of the modalities of the variable through drawings, symbols, etc. They use the same drawing at different sizes, so that the area of each one is proportional to the frequency.
The histogram: it is used for continuous quantitative variables with data grouped into intervals. The exact limits of the intervals into which the data have been grouped (or their midpoints) are placed on the x-axis, and rectangles are raised over them as in the bar chart, bearing in mind that the base of each bar coincides with the exact limits of its interval and that the order of the intervals is not arbitrary. This graph can be constructed for absolute or relative frequencies, either non-cumulative or cumulative.
Frequency polygon: it differs from the histogram in that it is drawn by joining with a line the points located at the midpoint of each interval, at a height equal to that interval's frequency.
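The geometry behind these graphs reduces to simple arithmetic. The sketch below computes only the coordinates that a plotting library would then draw; the function names are hypothetical and no particular plotting library is assumed. Because a sector's area is proportional to its central angle, giving each sector an angle of pi × 360° makes its area proportional to the frequency; histogram bars span the exact limits, and the polygon vertices sit at (midpoint, frequency).

```python
def pie_angles(frequencies):
    """Central angle (degrees) of each sector, proportional to its frequency."""
    n = sum(frequencies.values())
    return {x: ni * 360 / n for x, ni in frequencies.items()}

def histogram_bars(exact_limits, frequencies):
    """Each bar spans its interval's exact limits and is as tall as its frequency."""
    return [(lo, hi, ni) for (lo, hi), ni in zip(exact_limits, frequencies)]

def polygon_points(midpoints, frequencies):
    """Vertices of the frequency polygon: (midpoint, frequency), joined in order."""
    return list(zip(midpoints, frequencies))

print(pie_angles({"Man": 24, "Woman": 16}))  # {'Man': 216.0, 'Woman': 144.0}

exact = [(25.5, 35.5), (35.5, 45.5), (45.5, 55.5), (55.5, 65.5), (65.5, 75.5)]
ni = [10, 3, 13, 7, 7]
print(histogram_bars(exact, ni)[0])  # (25.5, 35.5, 10)
print(polygon_points([30.5, 40.5, 50.5, 60.5, 70.5], ni))
```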
For graphical representations of two variables, the following are used:
Clustered bar chart: suitable when at least one of the variables is qualitative. If both are qualitative, it is advisable to build a two-way table that crosses the two variables, like this:
Marital status \ Sex   Man   Woman   Total
Married                12    12      24
Divorced                4     2       6
Single                  4     2       6
Widowed                 4     0       4
Total                  24    16      40
One of the variables is placed on the x-axis, and the other is represented beside it or stacked on top of it in a different color, offering a quick visual comparison.
Scatter plot: it is used for two quantitative variables and can suggest a possible linear relationship between them. One variable is placed on the abscissa (x-axis) and the other on the ordinate (y-axis). For each pair of values, a point is drawn at the intersection of the two coordinates.
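The two-way table that precedes a clustered bar chart is built by counting each combination of values. A sketch with plain dictionaries, where the `crosstab` function name is illustrative; the paired data are rebuilt from the cell counts of the marital status by sex table, one tuple per case.

```python
from collections import Counter

def crosstab(pairs, rows, cols):
    """Two-way frequency table with row and column totals.

    `pairs` holds one (row_value, col_value) tuple per case.
    """
    counts = Counter(pairs)
    table = {}
    for r in rows:
        table[r] = {c: counts[(r, c)] for c in cols}  # Counter returns 0 for absent pairs
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = sum(table["Total"][c] for c in cols)
    return table

statuses = ["Married", "Divorced", "Single", "Widowed"]
sexes = ["Man", "Woman"]
# Rebuild one (status, sex) pair per case from the cell counts of the table.
cells = {("Married", "Man"): 12, ("Married", "Woman"): 12,
         ("Divorced", "Man"): 4, ("Divorced", "Woman"): 2,
         ("Single", "Man"): 4, ("Single", "Woman"): 2,
         ("Widowed", "Man"): 4, ("Widowed", "Woman"): 0}
pairs = [cell for cell, k in cells.items() for _ in range(k)]

table = crosstab(pairs, statuses, sexes)
for r in statuses + ["Total"]:
    print(r, table[r])
```

With pandas available, `pandas.crosstab` with `margins=True` produces the same kind of table; plain Python is used here to keep the counting explicit.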
Properties of frequency distributions
Frequency distributions have three basic properties:
Central tendency. It refers to the value around which a particular frequency distribution is centered. This general magnitude can be quantified by means of indices known as measures of central tendency or averages.
Variability or dispersion. It refers to the degree of concentration of the observations around the average. A frequency distribution is homogeneous (low variability) if the data differ little from each other, and heterogeneous (high variability) if the data are widely dispersed around the average. This property is independent of the previous one.
Asymmetry or skewness. It refers to the degree to which the data are distributed evenly above and below the central tendency. A distribution is symmetric if, when divided in two at its center, the two halves overlap; it is positively skewed if the scores are concentrated at the lower end of the scale, and negatively skewed if they are concentrated at the upper end.
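The three properties can be made concrete in a few lines. This is a sketch under stated assumptions: the arithmetic mean, the population standard deviation (`statistics.pstdev`) and the moment coefficient of skewness are common indices chosen here for illustration, not ones defined in this topic, and all scores are hypothetical. The first pair of datasets shares the same average but differs in dispersion, showing that variability is independent of central tendency. The skewness index is positive when scores pile up at the low end with a tail toward high values.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: > 0 right-skewed, < 0 left-skewed, 0 symmetric."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Same central tendency, different variability (hypothetical scores).
homogeneous = [4, 5, 6]
heterogeneous = [0, 5, 10]
print(statistics.mean(homogeneous), statistics.mean(heterogeneous))  # both equal 5
print(statistics.pstdev(homogeneous) < statistics.pstdev(heterogeneous))  # True

# Scores concentrated at the low end of the scale, with a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
print(skewness([1, 2, 3, 4, 5]))   # 0.0
print(skewness(right_skewed) > 0)  # True
```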