BASIC CONCEPTS
Before examining the different types of data that permeate an epidemiological study, it is worth
discussing three key concepts: data, observations, and variables.
Data - the information collected by means of questions, systematic observations, and imaging or
laboratory tests. All the information gathered during research is generically named "data." A set
of individual data makes it possible to perform statistical analysis. If the quality of the data is
good, i.e., if the information was gathered appropriately, the next stages of database
preparation, which set the ground for analysis and presentation of results, can be properly
conducted.
Observations - are measurements carried out in one or more individuals, based on one or more
variables. For instance, if one is working with the variable "sex" in a sample of 20 individuals
and knows the exact number of men and women in this sample (10 in each group), it can be said
that this variable has 20 observations.
Variables - are constituted by data. For instance, an individual may be male or female. In the
example above, there are 10 observations for each sex, but "sex" is the variable that is referred
to as a whole. Another example of a variable is "age" in complete years, in which the observations
are the values 1 year, 2 years, 3 years, and so forth. In other words, variables are
characteristics or attributes that can be measured and that assume different values, such as sex,
skin type, eye color, age of the individuals under study, laboratory results, or the presence of a
given lesion/disease.
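The relationship between data, variables, and observations can be sketched in a few lines of Python. This is only an illustration: the sample, the field names `sex` and `age_years`, and the ages are invented, and a real study would normally hold such records in a database or a statistical package.

```python
from collections import Counter

# Hypothetical sample of 20 individuals; each record holds one individual's data.
# All values are invented purely for illustration.
sample = (
    [{"sex": "male", "age_years": 30 + i} for i in range(10)]
    + [{"sex": "female", "age_years": 25 + i} for i in range(10)]
)

# A variable is one characteristic viewed across the whole sample.
sex = [person["sex"] for person in sample]

# Each recorded value of that variable is one observation.
n_observations = len(sex)
counts = Counter(sex)

print(n_observations)  # 20
print(dict(counts))    # {'male': 10, 'female': 10}
```

Here the list `sample` plays the role of the data, `sex` is a variable, and its 20 entries are the observations, 10 for each sex.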
Variables are divided into two large groups: (a) categorical or qualitative variables, which are
subdivided into dichotomous, nominal, and ordinal variables; and (b) numerical or quantitative
variables, which are subdivided into continuous and discrete variables.
Categorical variables
a. Dichotomous variables, also known as binary variables: are those that have only two
categories, i.e., only two response options. Typical examples of this type of variable are
sex (male and female) and presence of skin cancer (yes or no).
b. Ordinal variables: are those that have three or more categories with an obvious ordering
of the categories (whether in ascending or descending order). For example, the Fitzpatrick
skin classification into types I, II, III, IV, V, and VI.¹
c. Nominal variables: are those that have three or more categories with no apparent ordering
of the categories. Example: blood types A, B, AB, and O, or brown, blue or green eye
colors.
Numerical variables
a. Discrete variables: are those that can only take certain numerical values, typically whole
numbers. Examples of this type of variable are subjects' age, when assessed in complete years
of life (1 year, 2 years, 3 years, 4 years, etc.), and the number of times a patient visited the
dermatologist in a year.
b. Continuous variables: are those measured on a continuous scale, i.e., which have as many
decimal places as the measuring instrument can record. For instance: blood pressure,
birth weight, height, or even age, when measured on a continuous scale.
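The classification above can be summarized as a small decision rule. The function below is a rough heuristic sketch, not a standard routine: the name `classify_variable` and its `ordered` parameter are assumptions made for this example, and it presumes that categories are stored as text and numerical values as numbers.

```python
def classify_variable(values, ordered=False):
    """Name the variable type suggested by the recorded values (heuristic).

    `ordered` must be supplied by the analyst, because whether categories
    have a natural order cannot be read off the values themselves.
    """
    distinct = set(values)
    # Text (or True/False) values are treated as categories.
    if all(isinstance(v, (bool, str)) for v in distinct):
        if len(distinct) <= 2:
            return "dichotomous"
        return "ordinal" if ordered else "nominal"
    # Whole numbers only: counts or ages in complete years.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in distinct):
        return "discrete"
    # Anything else numerical is treated as measured on a continuous scale.
    return "continuous"


print(classify_variable(["male", "female", "female"]))           # dichotomous
print(classify_variable(["I", "III", "II", "VI"], ordered=True))  # ordinal
print(classify_variable(["A", "B", "AB", "O"]))                   # nominal
print(classify_variable([3, 0, 5, 2]))                            # discrete
print(classify_variable([3.42, 2.95, 3.10]))                      # continuous
```

The rule looks only at how values are stored, so a categorical variable coded with numbers (for example 1 = male, 2 = female) would be mislabelled as discrete. The final decision rests on what the variable means, not on its storage format.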
Depending on the objectives of the study, data may be collected as discrete or continuous
variables and subsequently transformed into categorical variables to suit the purpose of the
research and/or make interpretation easier. However, variables measured on a numerical scale
(whether discrete or continuous) are richer in information and should be preferred for
statistical analyses. Figure 1 shows a diagram that makes it easier to understand, identify and
classify the abovementioned variables.
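To make the transformation from a numerical to a categorical variable concrete, the sketch below groups ages into ordinal age bands. The cut-points, labels, and ages are hypothetical; in practice they would be chosen according to the research question.

```python
from bisect import bisect_right

# Hypothetical cut-points (in years) and labels, chosen only for illustration.
CUT_POINTS = [18, 40, 65]
LABELS = ["<18", "18-39", "40-64", ">=65"]


def age_group(age_years):
    """Map a numerical age to an ordinal age group."""
    # bisect_right counts how many cut-points are <= age_years,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUT_POINTS, age_years)]


ages = [17.9, 18.0, 39.5, 40, 64.99, 80]
groups = [age_group(a) for a in ages]
print(groups)
# ['<18', '18-39', '18-39', '40-64', '40-64', '>=65']
```

Ages of 18.0 and 39.5 years fall into the same group, so the difference between them is no longer visible after the transformation, and the original values cannot be recovered from the groups. This is one way to see why the numerical version carries more information.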