Business Analytics: Data Measurement & Classification
LEVELS OF MEASUREMENT
The level of measurement refers to the relationship among the values that are assigned to
the attributes of a variable. Each scale of measurement has certain properties, which in
turn determine which statistical analyses are appropriate for it. It is important
for the researcher to understand the different levels of measurement, because these levels,
together with how the research question is phrased, dictate what statistical
analysis is appropriate.
The first level of measurement is the NOMINAL Level of Measurement, in which values
merely name or categorise observations (for example, sex or religion) and imply no
ordering at all.
The second level of measurement is the ORDINAL Level of Measurement. This level of
measurement depicts an ordered relationship among the variable's
observations. Suppose a student scores the highest grade of 100 in the class; he
would be assigned the first rank. Another classmate scores the second highest
grade of 92; she would be assigned the second rank. A third student scores an 81 and
would be assigned the third rank, and so on. The ordinal level of measurement indicates
an ordering of the measurements.
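The ranking described above can be sketched in a few lines of Python; the helper name and the sample scores are illustrative:

```python
def assign_ranks(scores):
    """Return a dict mapping each score to its ordinal rank (1 = highest).

    Ordinal ranks preserve order only: the gap between rank 1 and rank 2
    says nothing about the gap between the underlying scores.
    """
    ordered = sorted(scores, reverse=True)
    return {score: rank for rank, score in enumerate(ordered, start=1)}

ranks = assign_ranks([100, 92, 81])
```

Note that the ranks carry no distance information: the difference between ranks 1 and 2 is not comparable to the 8-point gap between the grades 100 and 92.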
The third level of measurement is the INTERVAL Level of Measurement. The interval
level of measurement not only classifies and orders the measurements, but it also specifies
that the distances between each interval on the scale are equivalent along the scale from
low interval to high interval. For example, an interval level of measurement could be the
measurement of anxiety in a student between the score of 10 and 11; this interval is the
same as that of a student who scores between 40 and 41. A popular example of this level
of measurement is temperature in centigrade, where, for example, the distance between
94°C and 96°C is the same as the distance between 100°C and 102°C.
Compiled by: Dr. I. J. Raghavendra, Associate Professor, SMS, GIET University, Odisha
MBA I Year II SEM (Academic Lecture Material) Unit-II: Business Analytics
The fourth level of measurement is the RATIO Level of Measurement. In this level of
measurement, the observations, in addition to having equal intervals, can take a true
value of zero. The meaningful zero in the scale makes this type of measurement unlike
the other types of measurement, although its properties are otherwise similar to those
of the interval level. In the ratio level of measurement, the divisions between the
points on the scale have an equivalent distance between them, and ratios of values
(for example, "twice as much") are meaningful.
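A short Python sketch illustrates the practical difference between the interval and ratio levels: ratio statements ("twice as heavy") are valid only on scales with a true zero, while the same arithmetic on an interval scale such as Celsius misleads. The sample values are illustrative:

```python
# Ratio scale: weight has a true zero, so ratios are meaningful.
weight_a, weight_b = 40.0, 80.0           # kg
weight_ratio = weight_b / weight_a        # 2.0 -> "twice as heavy" is valid

# Interval scale: Celsius has an arbitrary zero, so ratios mislead.
temp_a, temp_b = 10.0, 20.0               # degrees Celsius
misleading = temp_b / temp_a              # 2.0, but 20 C is NOT "twice as hot"

# Converting to Kelvin (a true ratio scale) reveals the real ratio.
kelvin_ratio = (temp_b + 273.15) / (temp_a + 273.15)   # about 1.035
```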
CLASSIFICATION OF DATA
The process of grouping data into different classes or sub-classes according to some
characteristics is known as classification. Tabulation is concerned with the systematic
arrangement and presentation of classified data; thus classification is the first step in
tabulation. For example, letters in the post office are classified according to their
destinations, viz., Delhi, Madurai, Bangalore, Mumbai, etc.
c) Qualitative Classification: In this type of classification, data are classified on the
basis of some attribute or quality such as sex, literacy, religion or employment; such
attributes cannot be measured along a scale. For example, if the population is to be
classified in respect of one attribute, say sex, then we can classify it into two classes,
namely males and females. Similarly, the population can also be classified into 'employed'
and 'unemployed' on the basis of another attribute, 'employment'. Thus, when the
classification is done with respect to one attribute which is dichotomous in nature, two
classes are formed, one possessing the attribute and the other not possessing
it. This type of classification is called simple or dichotomous classification.
The classification, where two or more attributes are considered and several
classes are formed, is called a manifold classification. For example, if we classify the
population simultaneously with respect to two attributes, e.g. sex and area of residence,
the population is first classified with respect to 'sex' into 'males' and 'females'. Each of
these classes may then be further classified into 'Urban', 'Semi-Urban' and 'Rural' on
the basis of the attribute 'area of residence', and as such the population is classified
into six classes, namely:
(i) Male in Urban Area
(ii) Male in Semi-Urban Area
(iii) Male in Rural Area
(iv) Female in Urban Area
(v) Female in Semi-Urban Area
(vi) Female in Rural Area
Still, the classification may be further extended by considering other attributes such as
marital status, etc.
Weight (in lbs)    90-100   100-110   110-120   120-130   130-140   140-150   Total
No. of Students      50       200       260       360        90        40      1000
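A frequency table like the one above can be built from raw data with a few lines of Python; the function name, class boundaries and sample weights below are illustrative:

```python
from collections import Counter

def frequency_table(values, class_width=10, start=90):
    """Group raw values into class intervals such as 90-100, 100-110, ...

    A value equal to an upper boundary falls into the next class,
    following the usual exclusive convention for continuous data.
    """
    counts = Counter((int(v) - start) // class_width for v in values)
    return {
        f"{start + i * class_width}-{start + (i + 1) * class_width}": n
        for i, n in sorted(counts.items())
    }

weights = [95, 105, 105, 115, 125, 135, 145]   # illustrative raw data
table = frequency_table(weights)
```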
Univariate Analysis
Univariate analysis explores each variable in a data set separately. It looks at the range
of values as well as the central tendency of the values, describes the pattern of response
to the variable, and describes each variable on its own.
Univariate analysis is the simplest form of analysing data. "Uni" means "one"; in other
words, the data have only one variable. It does not deal with causes or relationships
(unlike regression), and its major purpose is to describe: it takes data, summarises those
data and finds patterns in them.
The most common univariate analysis is checking the central tendency (mean, median and
mode), the range, the maximum and minimum values, and standard deviation of a variable.
The most common visual technique used for univariate analysis is the histogram, which is a
frequency-distribution graph.
Univariate analysis is conducted in many ways, most of them descriptive in nature:
frequency distribution tables, frequency polygons, histograms, bar charts and pie charts.
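The descriptive measures listed above can be computed with Python's standard statistics module; the sample marks are illustrative:

```python
import statistics

marks = [55, 60, 60, 70, 85]   # illustrative sample of one variable

summary = {
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    "range": max(marks) - min(marks),
    "std_dev": statistics.stdev(marks),   # sample standard deviation
}
```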
Bivariate Analysis
Bivariate analysis compares two variables in order to study their relationship. These
variables may be dependent on or independent of each other. In bivariate analysis there
is always a Y-value for each X-value.
The most common visual technique for bivariate analysis is a scatter plot, where one
variable is on the x-axis and the other on the y-axis.
In addition to the scatter plot, regression plot and correlation coefficient are also
frequently used to study the relationship of the variables.
For example, in the well-known iris dataset you can compare "sepal length" vs "sepal
width", or "sepal length" vs "petal length", to see whether there is a relationship.
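As a sketch, the correlation coefficient mentioned above can be computed directly from its definition; the paired values below are illustrative, not the actual iris measurements:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative paired measurements (not real iris values).
sepal_length = [5.1, 4.9, 6.4, 7.0, 5.8]
petal_length = [1.4, 1.4, 4.5, 4.7, 4.0]
r = pearson_r(sepal_length, petal_length)
```

A value of r near +1 or -1 suggests a strong linear relationship; a value near 0 suggests none.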
Multivariate Analysis
Multivariate analysis is similar to bivariate analysis, but more than two variables are
compared. For three variables, a 3-D model can be created to study the relationship (also
known as trivariate analysis).
Multivariate analysis takes a whole host of variables into consideration, which makes it a
complicated as well as essential tool. The greatest virtue of such a model is that it takes
as many factors into consideration as possible. This results in a tremendous
reduction of bias and gives a result closest to reality.
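Extending the bivariate idea, a pairwise correlation matrix summarizes the relationships among three or more variables at once. A minimal sketch, with illustrative values:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Three illustrative variables measured on the same five units.
data = {
    "sepal_length": [5.1, 4.9, 6.4, 7.0, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 2.7],
    "petal_length": [1.4, 1.4, 4.5, 4.7, 4.0],
}

names = list(data)
corr_matrix = {
    a: {b: round(pearson_r(data[a], data[b]), 3) for b in names}
    for a in names
}
```

The matrix is symmetric, with 1.0 on the diagonal; scanning it shows at a glance which pairs of variables move together.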
DATA CATEGORIES
Master Data
Master Data is key business information that supports the transactions.
Master Data describes the customers, products, parts, employees, materials, suppliers,
sites, etc. involved in the transactions.
It is commonly referred to as Places (locations, geography, sites,
etc.), Parties (persons, customers, suppliers, employees, etc.), and Things (products,
items, material, vehicles, etc.).
Master data already exists and is used in the operational systems, though with some issues.
Master data in these systems is:
not of high quality,
scattered and duplicated, and
not truly managed.
Master Data is usually authored and used in the normal course of operations by existing
business processes. Unfortunately, these operational business processes are tailored
for an “application-specific” use case of this master data and therefore fail in achieving
the overall enterprise requirement that mandates commonly used master data across
applications with high-quality standards and common governance.
Reference Data
Reference data are sets of values or classification schemas that are referred to by
systems, applications, data stores, processes, and reports, as well as by transactional
and master records.
Reference data may be used to differentiate one type of record from another for
categorization and analysis, or they may be a significant fact such as country, which
appears within a larger information set such as address.
It is data that is referenced and shared by a number of systems.
Examples include lists of valid values, code lists, status codes, state abbreviations,
demographic fields, flags, product types, gender, chart of accounts, and product
hierarchy.
Most reference data refer to concepts that either impact business processes, e.g. order
status (CREATED | APPROVED | REJECTED | etc.), or serve as an additional standardized
semantic that further clarifies the interpretation of a data record, e.g. employee job
position (JUNIOR | SENIOR | VP | etc.).
Some of the reference data can be universal and/or standardized (e.g. Countries – ISO
3166-1). Other reference data may be “agreed on” within the enterprise (customer
status), or within a given business domain (product classifications).
Reference Data is frequently considered as a subset of master data. The full name for
this data category is Master Reference Data.
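A code list such as the order-status example above can be represented as a small reference set against which transactional records are validated. A sketch in Python; the class and function names are illustrative:

```python
from enum import Enum

class OrderStatus(Enum):
    """Reference data: the agreed list of valid order-status codes."""
    CREATED = "CREATED"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"

def validate_status(code):
    """Reject transactional records whose status is not in the reference list."""
    try:
        return OrderStatus(code)
    except ValueError:
        raise ValueError(f"unknown status code: {code!r}")

ok = validate_status("APPROVED")
```

Keeping the valid codes in one shared place is exactly what makes this reference data: many systems refer to it, but none of them redefines it.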
Transactional Data
Transactional data describe an internal or external event or transaction that takes
place as an organization conducts its business.
Transactional data describes business events. It is the largest volume of data in the
enterprise.
Examples of business events include:
Buying products from suppliers,
Selling products to customers,
Shipping items to customer sites,
Hiring employees, managing their vacations or changing their positions.
Examples include sales orders, invoices, purchase orders, shipping documents, passport
applications, credit card payments, and insurance claims.
These data are typically grouped into transactional records, which include associated
master and reference data.
Transactional Data is typically handled in operational applications, known under the
CRM, ERP, SCM, HR, etc. acronyms.
Metadata:
Metadata literally means “data about data.”
Metadata label, describe, or characterize other data and make it easier to retrieve,
interpret, or use information.
Technical metadata are metadata used to describe technology and data structures.
Examples of technical metadata are field names, length, type, lineage, and database
table layouts.
Business metadata describe the nontechnical aspects of data and their usage.
Examples are field definitions, report names, headings in reports and on Web pages,
application screen names, data quality statistics, and the parties accountable for data
quality for a particular field.
Audit trail metadata are a specific type of metadata, typically stored in a record and
protected from alteration, that capture how, when, and by whom the data were created,
accessed, updated, or deleted.
Audit trail metadata are used for security, compliance, or forensic purposes.
Examples include timestamp, creator, create date, and update date.
Although audit trail metadata are typically stored in a record, technical metadata and
business metadata are usually stored separately from the data they describe.
These are the most common types of metadata, but it could be argued that there are
other types of metadata that make it easier to retrieve, interpret, or use information.
The label for any metadata may not be as important as the fact that it is being
deliberately used to support data goals.
Any discipline or activity that uses data is likely to have associated metadata.
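The audit-trail fields described above (who created or changed a record, and when) can be sketched as metadata attached to a business record; the field names and values below are illustrative, not a standard schema:

```python
from datetime import datetime, timezone

def with_audit_trail(record, user):
    """Attach audit-trail metadata (who and when) to a business record.

    Real systems typically store this in a protected, append-only form.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {**record, "created_by": user, "created_at": now, "updated_at": now}

rec = with_audit_trail({"customer": "C-1001", "amount": 250.0}, user="analyst1")
```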
Historical Data
Historical data contain significant facts, as of a certain point in time, that should not
be altered except to correct an error.
They are important to security and compliance.
Operational systems can also contain history tables for reporting or analysis purposes.
Examples include point-in-time reports, database snapshots, and version information.
Temporary Data
Temporary data are kept in memory to speed up processing.
They are not viewed by humans and are used for technical purposes.
Examples include a copy of a table that is created during a processing session to speed
up lookups.
3 V’S OF DATA
Volume
In big data, volume refers to the sheer size of the data set.
Volume describes huge sets of data that are very complex to process further
in order to extract valuable information from them.
Volume does not prescribe an exact size at which data qualify as big data; the data are
simply relatively large. The size could be in terabytes, exabytes or even zettabytes.
Velocity
a. In big data, velocity mainly demonstrates two things: the speed of growth of data and
the speed of transmission of data.
b. Velocity refers to data being generated, increased and shared at a particular speed
through various resources.
c. Speed of growth of data:
The data increase day by day through various resources. Some of these resources
are explained below.
Internet of Things (IoT): IoT is a prominent contributor to big data. It
generates data through IoT devices placed in automated vehicles, digital IoT
bulbs, IoT-based robots, etc.
Social media: users on social media are increasing day by day, and they
generate huge batches of data.
Many other resources likewise generate data at high speed.
d. Speed of transmission of data:
Speed also plays a major role in identifying big data.
Big data grow at such a rapid pace that they are very complex to process
quickly and difficult to transmit through fibre-optic or electromagnetic means
of transmission.
Therefore, this aspect is very important in demonstrating velocity.
For example: Twitter generates 500 million tweets per day, so both the rate of
data generation and the rate of data transmission are very high.
Variety
a. In big data, variety is nothing but the different types of data.
b. This term covers various types of data such as text, audio, video, XML files,
data in rows and columns, etc.
c. Each type of data has to be processed in its own way; therefore, it is necessary to
categorize the different types of data.
d. In big data, data are mainly categorized into three types, as follows:
Structured Data: Data that are in the format of a relational database, properly
structured in rows and columns, are known as structured data.
Unstructured Data: Data that include various types such as audio, video, XML files,
word files, etc. and are not organized in a proper format are said to be
unstructured data.
Semi-structured Data: As the name suggests, semi-structured data are neither fully
structured nor fully unstructured; the data are partially structured and mixed
with data in an unstructured format.
For example, social media contain photos, videos and text of people in huge numbers.
This data is nothing but big data, and it can be well structured, unstructured or
semi-structured.
Veracity
Not all data that come in for processing are valuable. So, unless the data are cleansed
correctly, it is not wise to store or process all of them. Especially when the volume is
so massive, this dimension of big data, veracity, becomes important. This characteristic
also helps determine whether the data come from a reliable source and are the right fit
for the analytic model.
Value
The primary interest in big data is probably its business value. This is perhaps the
most crucial characteristic of big data, because unless you get business insights out
of it, the other big data characteristics have no meaning.
Variability
In big data analysis, data inconsistency is a common scenario because the data are
sourced from different places and contain different data types. Hence, to get
meaningful information from that enormous amount of data, anomaly and outlier detection
are essential. This is why variability is considered one of the characteristics of big data.
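The outlier detection mentioned above can be sketched with a simple z-score rule; the threshold and readings are illustrative, and production pipelines typically use more robust detectors:

```python
import math

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 11, 9, 10, 12, 11, 10, 9, 11, 250]   # 250 is an anomaly
outliers = zscore_outliers(readings, threshold=2.0)
```

A limitation worth noting: a single extreme value inflates the standard deviation itself, which is why robust alternatives (e.g. median-based rules) are often preferred in practice.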
Visualization
Processing big data is not by itself enough to obtain a meaningful result: unless the data
are represented or visualized in a meaningful way, there is no point in analysing them.
Hence, big data must be visualized with appropriate tools that expose different parameters
and help data scientists or analysts understand the data better.
However, plotting billions of data points is not an easy task, and it involves techniques
such as tree maps, network diagrams, cone trees, etc.
Validity
Validity has some similarities with veracity. As the meaning of the word suggests, the
validity of big data means how correct the data are for their purpose. Interestingly, a
considerable portion of big data remains unused; this is referred to as 'dark data'.
The remaining part of the collected unstructured data is cleansed first for analysis.
Volatility
Volatility refers to the time considerations placed on a particular data set. It involves
considering whether data acquired a year ago would still be relevant for predictive
modelling today; this is specific to the analyses being performed. Similarly, volatility
also means gauging whether a particular data set is historic or not. Usually, data
volatility comes under data governance and is assessed by data engineers.
Vulnerability
Big data is often about consumers. We often overlook the potential harm in sharing our
shopping data, but the reality is that it can be used to uncover confidential information
about an individual. For instance, Target accurately predicted a teenage girl’s pregnancy
before her own parents knew it. To avoid such consequences, it’s important to be mindful
of the information we share online.
Virality
This describes how quickly information is dispersed across person-to-person networks.
Virality measures how quickly data are spread and shared to each unique node. Time is a
determinant factor, along with the rate of spread.
Viscosity
Viscosity measures the resistance to flow in the volume of data. This resistance can come
from different data sources, friction from integration flow rates, and the processing
required to turn the data into insights. Technologies for dealing with viscosity include
improved streaming, agile integration buses and complex event processing. This is all about
whether the big data sticks with you or calls for action.
Polls
A poll consists of a single question, either single-choice or multiple-choice.
When you need a quick pulse of the audience's sentiments, you can go for polls.
Because polls are short, it is easier to get responses from people.
Similar to surveys, online polls can be embedded into various platforms.
Once the respondents answer the question, they can also be shown how they stand
compared with others' responses.
Interviews
In this method, the interviewer asks the respondents questions either face-to-face or
over the telephone.
In face-to-face interviews, the interviewer asks the interviewee a series of questions
in person and notes down the responses.
When it is not feasible to meet the person, the interviewer can opt for a telephonic
interview.
This form of data collection is suitable when there are only a few respondents; it is
too time-consuming and tedious to repeat the same process with many participants.
Delphi Technique
In this method, market experts are provided with the estimates and assumptions of
forecasts made by other experts in the industry.
Experts may reconsider and revise their estimates and assumptions based on the
information provided by other experts.
The consensus of all experts on demand forecasts constitutes the final demand
forecast.
Focus Groups
In a focus group, a small group of people, around 8-10 members, discuss the common
areas of the problem.
Each individual provides his insights on the issue concerned.
A moderator regulates the discussion among the group members.
At the end of the discussion, the group reaches a consensus.
Questionnaire
A questionnaire is a printed set of questions, either open-ended or closed-ended.
The respondents are required to answer based on their knowledge and experience with
the issue concerned.
A questionnaire may form part of a survey, but the end-goal of a questionnaire may or
may not be a survey.
ROLE OF COMPETENCIES
Data analytics competencies help define success in any data analytics role. The skills and
abilities collected below include specific behaviors and technical skills that are consistently
exhibited by professionals in the data analytics field.