Reference Notes
Business analytics is the practice of using data and statistical methods to analyze business
performance, identify patterns, and make data-driven decisions. Business analytics combines the
fields of statistics, data science, and business management to provide insights and solutions to
business problems. Here is an example of how business analytics can be used in a retail business:
A retail business wants to improve their sales performance and decides to use business analytics to
help identify areas of improvement. They collect data on customer purchases, such as the type of
products purchased, the time of day, and the amount spent. The retail business then uses statistical
methods to analyze the data and identify patterns and trends.
Using the data analysis, the retail business identifies that the most popular products are purchased
during specific hours of the day. They also find that customers who purchase certain products are
more likely to purchase related products in the same transaction. Armed with this information, the
business decides to:
• Increase the availability of popular products during peak hours
• Bundle related products together to increase the likelihood of a higher transaction value
• Create targeted marketing campaigns to promote popular products during peak hours
As a result of implementing these changes based on data analysis, the retail business sees an increase
in sales performance and customer satisfaction.
In this example, business analytics was used to analyze customer data and make data-driven decisions
to improve business performance. By leveraging data and statistical methods, businesses can gain
valuable insights and make more informed decisions.
There are four types of business analytics: descriptive, predictive, diagnostic, and prescriptive. Here
is an example of how each type of analytics can be used in a manufacturing business:
1. Descriptive Analytics: Descriptive analytics is used to summarize and describe data to gain a
better understanding of what has happened in the past. An example of descriptive analytics
in a manufacturing business is the analysis of historical data to identify production trends. A
manufacturer might use descriptive analytics to analyze the total units produced in the last
year, the number of units produced per month, and the average time taken to produce each
unit.
2. Predictive Analytics: Predictive analytics is used to forecast future outcomes based on
historical data. An example of predictive analytics in a manufacturing business is the analysis
of production data to predict future demand. A manufacturer might use predictive analytics
to identify the future demand for a specific product, based on historical sales data, industry
trends, and seasonality.
3. Diagnostic Analytics: Diagnostic analytics is used to determine the cause of a specific problem
or issue. An example of diagnostic analytics in a manufacturing business is the analysis of
production data to identify the cause of a decrease in production efficiency. A manufacturer
might use diagnostic analytics to identify the root cause of a decrease in production efficiency,
such as a malfunctioning machine or an inadequate supply of raw materials.
4. Prescriptive Analytics: Prescriptive analytics is used to provide recommendations on the best
course of action to achieve a desired outcome. An example of prescriptive analytics in a
manufacturing business is the optimization of the production process to minimize costs and
maximize output. A manufacturer might use prescriptive analytics to identify the optimal mix
of raw materials, production methods, and machine settings to minimize costs while
maximizing production output.
In summary, each type of business analytics plays a unique role in the analysis of data to drive business
decisions. By leveraging data analytics, businesses can gain valuable insights, make more informed
decisions, and ultimately improve performance.
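As a rough illustration of the descriptive type, the sketch below summarizes a year of production
records in Python. The monthly figures and variable names are invented for illustration and do not
come from any real manufacturer.

```python
from statistics import mean

# Hypothetical units produced per month (illustrative numbers only).
monthly_units = {
    "Jan": 1200, "Feb": 1100, "Mar": 1350, "Apr": 1300,
    "May": 1250, "Jun": 1400, "Jul": 1380, "Aug": 1320,
    "Sep": 1290, "Oct": 1410, "Nov": 1360, "Dec": 1240,
}

total_units = sum(monthly_units.values())          # total produced in the year
average_per_month = mean(monthly_units.values())   # typical monthly output
best_month = max(monthly_units, key=monthly_units.get)  # month with highest output

print(f"Total units: {total_units}")
print(f"Average per month: {average_per_month:.1f}")
print(f"Highest-output month: {best_month}")
```

The same records could feed the other three types, for example as history for a forecast
(predictive) or as a baseline when investigating a drop in output (diagnostic).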
The process of business analytics typically involves the following steps:
1. Define the business problem or question: The first step is to clearly define the problem or
question that needs to be addressed. This can involve identifying the key business objectives,
understanding the business processes, and defining the key performance indicators (KPIs) that
will be used to measure success.
2. Collect data: Once the business problem has been defined, the next step is to collect the
relevant data. This can involve gathering data from various sources, such as internal
databases, external sources, and public data repositories. The data may be in structured or
unstructured form.
3. Cleanse and prepare the data: After the data has been collected, it needs to be cleaned and
prepared for analysis. This involves removing any inconsistencies or errors, transforming the
data into a consistent format, and ensuring that the data is ready for analysis.
4. Analyze the data: With the data cleaned and prepared, the next step is to analyze the data.
This can involve using various statistical and data mining techniques to identify patterns,
trends, and insights. Common analytical techniques include regression analysis, clustering,
decision trees, and time series analysis.
5. Interpret the results: Once the analysis is complete, the next step is to interpret the results.
This involves using the insights gained from the analysis to make informed business decisions.
The results may be visualized using charts, graphs, or other visualization tools to make them
more accessible to decision-makers.
6. Communicate the findings: The final step is to communicate the findings to key stakeholders
in the business. This may involve creating reports, dashboards, or presentations that provide
an overview of the analysis and its implications for the business. Effective communication is
essential to ensure that the insights gained from the analysis are acted upon and that the
business benefits from the investment in analytics.
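Step 3 is often the least visible part of the process. The following is a minimal sketch of cleansing
in Python, assuming a small list of hypothetical sales records with stray spaces, inconsistent
capitalization, and one missing amount. The field names are assumptions made for this example.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"product": " Laptop ", "amount": "55000"},
    {"product": "laptop", "amount": "54000"},
    {"product": "Mouse", "amount": ""},        # missing amount
    {"product": "MOUSE", "amount": "700"},
]

def clean(records):
    """Return records with normalized product names and numeric amounts.

    Records whose amount is missing or not a number are dropped.
    """
    cleaned = []
    for record in records:
        product = record["product"].strip().title()
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows that cannot be analyzed
        cleaned.append({"product": product, "amount": amount})
    return cleaned

clean_records = clean(raw_records)
for record in clean_records:
    print(record)
```

Dropping incomplete rows is only one possible policy; depending on the business question, missing
values might instead be corrected at the source or estimated.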
Business intelligence
Put simply, BI combines technology, processes, and strategies to transform raw data into meaningful
insights, helping individuals and organizations make informed decisions and drive business success.
BI is not to be confused with business analytics. Business intelligence takes a descriptive approach to
give clear insight into how a business is performing. Business analytics, on the other hand, is a
predictive effort to describe what an organization might do to achieve greater outcomes.
BI is a useful compass for organizations, as it can guide people through enormous amounts of data
and help them achieve their business goals. It empowers business leaders, executives, and analysts to
gather, analyze, and interpret data from sources across the enterprise for a comprehensive view of
operations, customers, market trends, and more.
2. Uncovering Valuable Customer Insights: The customer is king! The importance of understanding
customer preferences and behavior in today's competitive market can't be overstated. BI tools analyze
information such as customer data, purchase history, and interactions to reveal valuable insights. A
customer's purchasing history, for example, holds a wealth of knowledge. By studying past
transactions, business analysts can gain a better understanding of what products or services resonate
with their customers the most. This knowledge enables businesses to optimize their product offerings,
fine-tune marketing strategies, and even develop personalized recommendations for each customer,
leading to increased customer satisfaction and loyalty. Similarly, information gathered from customer
interactions with a company is crucial. It allows companies to assess the quality of customer service,
identify pain points, and uncover opportunities for improvement. Understanding how customers
engage with a company's products or services helps businesses to better meet customer needs,
ensuring a positive and seamless customer experience.
3. Identifying Needs for New Products or Services: BI insights can spark new ideas. When analyzing
market trends and consumer demand, business analysts and their colleagues often identify gaps in
the market and opportunities for new products or services. Using BI, they can confidently develop
offerings that align with customer needs, ensuring a greater chance of success. The ability to innovate
and offer products or services that cater to specific market demands is a valuable means toward
boosting revenue and contributing significantly to ROI.
Data and information
Data and information are two concepts that are closely related but different in meaning. Data refers
to raw, unprocessed facts or figures that have no meaning or context on their own. Information, on
the other hand, is the processed, organized, and meaningful data that can be used for decision making,
analysis, or communication.
To better understand the difference between data and information, let's look at an example of data
and how it can be transformed into information.
Suppose we have a dataset of sales figures for a company's products over the past year. The dataset
might include columns for product name, date of sale, and sale price. This raw data is just a collection
of numbers and text with no meaning or context. We can say that this is data.
Now, if we sort this data by product name and create a chart that shows the total sales for each
product, we have transformed the raw data into meaningful information. This information can help
us understand which products are most popular and which ones are not selling as well. We can use
this information to make decisions about which products to promote or discontinue.
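The transformation described above can be sketched in a few lines of Python. The sales rows are
hypothetical, and a printed ranking stands in for the chart.

```python
from collections import defaultdict

# Hypothetical raw data: (product name, date of sale, sale price).
sales = [
    ("Notebook", "2023-01-05", 120.0),
    ("Pen", "2023-01-05", 15.0),
    ("Notebook", "2023-02-11", 120.0),
    ("Bag", "2023-02-20", 800.0),
    ("Pen", "2023-03-02", 15.0),
    ("Pen", "2023-03-09", 15.0),
]

# Group by product and add up the sale prices.
totals = defaultdict(float)
for product, _date, price in sales:
    totals[product] += price

# Sort from highest to lowest total so the best sellers come first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for product, total in ranking:
    print(f"{product}: {total:.2f}")
```

The list of rows is data; the ranked totals are information, because they answer a question
(which products bring in the most revenue).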
Another example of data and information can be found in weather data. Suppose we have a dataset
of temperature readings for a city over the past month. This raw data is just a collection of numbers
with no meaning or context. We can say that this is data.
Now, if we organize this data by date and time and create a graph that shows the temperature trends
over the month, we have transformed the raw data into meaningful information. This information can
help us understand how the temperature is changing over time, and we can use it to plan for weather
conditions in the future.
In summary, data is the raw and unprocessed facts and figures, while information is the processed and
organized data that has meaning and context. To transform data into information, we need to analyze,
sort, organize, and visualize the data in a way that provides insights and helps us make decisions.
Understanding the difference between data and information is crucial in today's world, where data is
abundant and valuable. Businesses, governments, and individuals need to be able to distinguish
between the two to make the most out of the data they collect and use it to their advantage.
Here are five key differences between data and information:
1. Meaning: Data refers to raw, unprocessed facts or figures that have no meaning on their own.
Information, on the other hand, is data that has been processed and organized to provide
context and meaning.
2. Form: Data can be in various forms, such as numbers, text, images, or sounds. Information,
however, is usually presented in a more organized and structured form, such as a report,
graph, or chart.
3. Usefulness: Data on its own is not very useful as it does not provide any insights or help us
make decisions. Information, on the other hand, is useful as it provides insights, allows for
analysis, and can be used to make informed decisions.
4. Context: Data is often meaningless without context. Information provides context by
organizing and interpreting the data in a way that is relevant and meaningful to the user.
5. Processing: Data is often raw and unprocessed, whereas information is the result of
processing the data. Processing involves sorting, organizing, analyzing, and presenting the
data in a way that provides insights and helps us make decisions.
Data processing
Data collected for a study cannot reveal much on its own. Because it is raw, it must be processed
before it can be analyzed to produce the desired result. Data processing is the intermediate stage
between the collection of data and its analysis and interpretation, and it includes checking, editing,
coding, and tabulation. Data processing is a crucial stage in research. After collecting the data from
the field, the researcher has to process and analyze it in order to arrive at conclusions that may
confirm or invalidate the hypothesis formulated at the beginning of the research work. The mass of
data collected during field work has to be reduced to manageable proportions. Only through such
careful and systematic processing does the data lend itself to statistical treatment and to meaningful
interpretation and conclusions. The collected data should be organized so that tables and charts can
be prepared for presentation. Processing is also necessary because the collected data must be
examined and any errors rectified, so that no difficulty is experienced at the analysis stage. The steps
involved in processing data are editing, coding, classification, and tabulation.
EDITING: Editing means to rectify, to set in order, to correct, or to establish sequence. Editing is
the process of examining the data collected in a questionnaire or interview schedule to detect errors
and omissions and to correct them where possible. When data collection is complete, a final and
thorough check is made before processing. It is better to verify the collected data before the data
analysis is carried out, and editing is the first step in this process. Editing is done to ensure that
the collected data are accurate, consistent with other facts gathered, uniformly entered, and as
complete as possible. For example, imagine how the news would appear if we received the newspaper
unedited. Similarly, an unedited film would have no sequence of events, so the story could not be
understood at all.
CODING: Coding is the process of organizing the data or responses into classes or categories and
assigning numerical or other symbols to responses according to the class or category in which they
fall. Hence coding is considered a classification process. Coding is necessary for efficient analysis:
it compresses many replies into a small number of classes that contain the critical information
required for analysis. The first step in coding is the study of the answers, and the last step is the
transfer of information from the schedule to a separate sheet called the transcription sheet. The
transcription sheet is a large summary sheet that contains the answers or codes of all the
respondents. Transcription may not be necessary when only simple tables are required and the
number of respondents is small. Coding is done with the help of set rules:
• The classes or categories should be reasonable and appropriate to the research problem under
study.
• The coding must be exhaustive, meaning there should be a class for each item of the data.
• Each answer should be assigned its own number.
• The coding must be mutually exclusive, meaning a specific answer can be placed in only one
category.
• The coding must observe the rule of single dimension, meaning every class in the category set is
defined in terms of only one concept.
Coding provides the basis for analysis. It can be simplified by using a pre-coded questionnaire, which
also saves the investigators' time. The decision on coding should be taken well in advance, at the
stage of designing the questionnaire. A standard method should be used in the case of hand coding.
Coding is a hard task, but it becomes easier if the coding scheme is prepared before the schedule or
questionnaire is designed. Whatever method is adopted, the most important point is that coding
errors should be kept to a minimum.
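The coding rules can be sketched in Python. The code book below is hypothetical: the categories,
the numeric codes, and the catch-all code 9 are assumptions made for this example, not part of any
standard scheme.

```python
# Hypothetical code book: each answer category gets exactly one numeric code,
# so the categories are mutually exclusive.
CODE_BOOK = {
    "strongly agree": 1,
    "agree": 2,
    "neutral": 3,
    "disagree": 4,
    "strongly disagree": 5,
}
OTHER = 9  # catch-all code so the scheme stays exhaustive

def code_response(answer):
    """Return the numeric code for one answer, ignoring case and spacing."""
    return CODE_BOOK.get(answer.strip().lower(), OTHER)

# One respondent's answers become one row of the transcription sheet.
responses = ["Agree", "neutral ", "Strongly Agree", "no opinion"]
transcription_row = [code_response(r) for r in responses]
print(transcription_row)
```

Fixing the code book before the questionnaire goes out corresponds to the pre-coded
questionnaire mentioned above.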
CLASSIFICATION: Classification is the process of reducing a large body of data into homogeneous
groups so that meaningful relationships can be seen. It is the process of arranging data into groups
based on common characteristics, and it can be done either according to attributes or according to
class intervals.
Types of Classification:
• Geographical: In this type of classification, data are classified on the basis of geographical
location, e.g. countries, states, cities, villages, areas, etc.
• Chronological: When data are observed over a period of time the type of classification is
known as chronological classification.
• Qualitative: In qualitative classification, data are classified on the basis of some attribute or
quality such as sex, colour of hair, literacy, religion, etc. The point to note in this type of
classification is that the attribute under study cannot be measured; one can only find out
whether it is present or absent in the units of the population under study.
• Quantitative: In quantitative classification, data are classified based on measurable quantities
or numerical values. These values represent the magnitude or extent of a particular
characteristic, allowing for precise measurement and comparison. Examples include age,
height, weight, income, marks obtained, or the number of family members. Unlike qualitative
classification, where attributes are non-measurable, quantitative classification deals with data
that can be expressed in numerical terms, either as discrete (countable) or continuous
(measurable) variables. This type of classification enables statistical analysis, such as
calculating averages, percentages, and standard deviations.
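Classification according to class intervals can be sketched in Python as follows. The ages and the
interval width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

def class_interval(value, width=10):
    """Return the label of the interval of the given width that contains value."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages of respondents.
ages = [23, 35, 31, 47, 29, 38, 41, 22, 36, 30]

# Count how many observations fall into each class interval.
frequency = Counter(class_interval(age) for age in ages)
for label in sorted(frequency):
    print(label, frequency[label])
```

Each age falls into exactly one interval, so the groups are homogeneous with respect to the
chosen width and the frequencies add up to the number of observations.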
TABULATION: Tabulation is the process of summarizing raw data and displaying it in a compact form
of vertical columns and horizontal rows of numbers for further analysis. Analysis of data is made
possible through tables. Tabulation may be done manually, mechanically, or electronically. Tabulation
is the orderly presentation of classified data in a table. In other words, it is a method of presenting
the summarized data. Tabulation is very important because:
• It helps to conserve space
• It avoids any need for explanation
• Computation of the data is made easier
• Comparison of data becomes very simple
• Adequacy or inadequacy of the data is clearly visible.
A table contains columns and rows. These columns and rows create small boxes which are called cells.
Tables are classified as
• One-way table: A one-way frequency table presents the distribution of cases on only a single
dimension or variable.
Female 84 42%
Male 15 25 18
Female 20 16 22
Total 35 41 40
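The counting behind such tables can be sketched in Python. The survey records below are
hypothetical and are not the figures shown above. The first counter builds a one-way table; the
second cross-tabulates two variables, which is a common next step beyond a one-way table.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred drink).
records = [
    ("Male", "Tea"), ("Female", "Coffee"), ("Female", "Tea"),
    ("Male", "Coffee"), ("Female", "Coffee"), ("Male", "Tea"),
]

# One-way table: distribution of cases over a single variable.
one_way = Counter(gender for gender, _ in records)

# Two-way table: each cell counts one (row, column) combination.
two_way = Counter(records)

# Print the two-way table as rows and columns of cells.
columns = ["Tea", "Coffee"]
print("Gender".ljust(8) + "".join(c.rjust(8) for c in columns))
for row in ["Male", "Female"]:
    cells = "".join(str(two_way[(row, c)]).rjust(8) for c in columns)
    print(row.ljust(8) + cells)
```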
Data Mining
Data mining is a process of discovering patterns, trends, and insights in large datasets. It is a set of
techniques and algorithms that allow users to extract useful information from raw data. Data mining
techniques are used in various fields, such as business, finance, healthcare, and marketing.
In the context of online shopping, data mining can help you find the right product by filtering out
irrelevant data. For example, if you want to buy a camera online, you can use various filters to refine
your search. You can filter the search results by brand, price, features, customer ratings, and other
criteria. By doing so, you can eliminate irrelevant products and focus on the ones that meet your
specific requirements.
Here's an example of how data mining can be used in online shopping:
Suppose you want to buy a digital camera online. You start your search on a popular e-commerce
website and enter the keyword "digital camera" in the search bar. The website returns thousands of
search results, which can be overwhelming.
To narrow down your search, you use filters such as:
• Brand: You want a camera from a reputable brand such as Canon, Nikon, or Sony.
• Price range: You have a budget of Rs. 30,000, so you filter out cameras that are too expensive.
• Resolution: You want a camera with at least 16 megapixels.
• Features: You want a camera with a zoom lens, image stabilization, and video recording
capabilities.
• Customer ratings: You want to see cameras with high ratings from other customers.
By applying these filters, you get a smaller set of search results that meet your criteria. You can then
compare the products based on their features, prices, and ratings to make an informed decision. This
process of refining and filtering data based on specific criteria is an example of data mining in action.
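The filtering step in this example can be sketched in Python. The catalogue entries are invented,
and the minimum rating of 4.0 is an assumption, since the example above only asks for "high
ratings".

```python
# Hypothetical catalogue entries; none of these are real listings.
cameras = [
    {"brand": "Canon", "price": 28000, "megapixels": 20, "rating": 4.5},
    {"brand": "Nikon", "price": 34000, "megapixels": 24, "rating": 4.7},
    {"brand": "Sony", "price": 29500, "megapixels": 12, "rating": 4.2},
    {"brand": "Acme", "price": 15000, "megapixels": 16, "rating": 3.1},
]

def matches(camera, brands, max_price, min_megapixels, min_rating):
    """Return True when a camera passes every filter."""
    return (camera["brand"] in brands
            and camera["price"] <= max_price
            and camera["megapixels"] >= min_megapixels
            and camera["rating"] >= min_rating)

# Apply the brand, budget (Rs. 30,000), resolution, and rating filters together.
shortlist = [
    c for c in cameras
    if matches(c, {"Canon", "Nikon", "Sony"}, 30000, 16, 4.0)
]
for camera in shortlist:
    print(camera["brand"], camera["price"])
```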
The end goal of data mining is to take raw bits of information and determine if there is cohesion or
correlation among the data. This benefit of data mining allows a company to create value with the
information they have on hand that would otherwise not be overly apparent. Though data models can
be complex, they can also yield fascinating results, unearth hidden trends, and suggest unique
strategies.
Overall, data mining can help organizations gain a competitive advantage by leveraging data to make
better decisions, improve efficiency, and enhance customer satisfaction.
These are just a few examples of how data mining is being used in various industries. As the amount
of data generated continues to grow, data mining is becoming increasingly important for organizations
looking to gain insights and make better decisions.
OLTP Benefits
• Meets the challenge of managing daily transactions
• Simplifies individual procedures and complex duties
• Offers fast transactions
OLAP
OLAP stands for Online Analytical Processing, and its primary objective is the analysis of data. It is
generally described as a category of software tools used to provide data analysis for business decisions.
With the help of OLAP, data analysts can gain insight into information held in multiple databases and
analyze it all at once. The main emphasis of OLAP is the response time to complex queries.
OLAP Example
Online Analytical Processing, or OLAP, is a computer approach that allows users to extract and query
data conveniently and selectively to examine it from many perspectives. OLAP business intelligence
queries are frequently used for financial reporting, trend analysis, budgeting, sales forecasting, and
other types of planning.
For instance, a user may request that data be analysed to present a spreadsheet exhibiting all of an
enterprise's clothing products sold in Kolkata in December, compare revenue figures with the ones for
the same items in February, and then see a comparison of other product sales in Kolkata during the
same period.
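A query of this kind can be sketched in Python as an aggregation over a few dimensions. The
revenue figures are invented, and a real OLAP tool would run such a query against a data
warehouse rather than an in-memory list. Restricting the data to one city is commonly called
taking a "slice".

```python
from collections import defaultdict

# Hypothetical fact rows: (city, month, category, revenue).
facts = [
    ("Kolkata", "Dec", "Clothing", 500000),
    ("Kolkata", "Feb", "Clothing", 320000),
    ("Kolkata", "Dec", "Footwear", 150000),
    ("Kolkata", "Feb", "Footwear", 140000),
    ("Mumbai", "Dec", "Clothing", 610000),
]

def revenue_by(rows, city):
    """Total revenue per (category, month) for one city."""
    cube = defaultdict(int)
    for row_city, month, category, revenue in rows:
        if row_city == city:
            cube[(category, month)] += revenue
    return cube

kolkata = revenue_by(facts, "Kolkata")

# Compare December with February for each product category.
for category in ("Clothing", "Footwear"):
    dec, feb = kolkata[(category, "Dec")], kolkata[(category, "Feb")]
    print(f"{category}: Dec {dec}, Feb {feb}, change {dec - feb}")
```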
Some more OLAP examples are:
Market basket analysis is a data mining technique used to identify associations between different
items or products that are frequently purchased together. It is based on the idea that if a customer
buys one item, they are likely to buy another related item as well. This technique is often used in retail,
e-commerce, and marketing to gain insights into customer behavior and preferences.
The analysis works by analyzing transactional data, such as point of sale data or online purchase
history, to identify patterns of co-occurrence between different items. The output of market basket
analysis is a set of rules that describe the relationships between different items. These rules are
often expressed in the form of "if X is purchased, then Y is also likely to be purchased."
Let's consider an example of market basket analysis using bread and butter as the products.
Assume that a grocery store has transaction data that shows the items purchased by customers. From
this data, the store can perform market basket analysis to identify which items are frequently
purchased together.
If we consider the example of bread and butter, the grocery store may find that when customers
purchase bread, they are more likely to also purchase butter. This is because bread and butter are
often consumed together and are considered complementary products.
Using market basket analysis, the grocery store can generate rules to describe the relationship
between bread and butter, such as "If a customer buys bread, there is a high likelihood they will
also buy butter."
Based on these rules, the grocery store can optimize the placement of bread and butter in the store,
such as placing them next to each other on the shelf or in a promotion deal. This can increase sales
and improve customer satisfaction.
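The "high likelihood" in such a rule is usually quantified with two measures that are not named
above: support (how often the items appear together) and confidence (how often the second item
appears given the first). A minimal sketch in Python, using invented transactions:

```python
# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "butter"})          # 3 of the 5 baskets
rule_confidence = confidence({"bread"}, {"butter"})  # 3 of the 4 bread baskets

print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

A store would typically keep only the rules whose support and confidence exceed thresholds it
chooses for itself.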
Filtering and sorting are essential functions in Excel for organizing and analyzing data. Here's a basic
overview of how to use them:
Filtering: Filtering allows you to display only the rows that meet certain criteria while temporarily
hiding the other rows. This is useful for focusing on specific subsets of data. Here's how to apply
filtering:
SUM: The SUM function in Excel is used to add up a range of numbers in a worksheet. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the sum of all the values in the specified range.
=SUM(A2:A10)
This formula will add up all the values in the range A2:A10 and return the sum.
AVERAGE: The AVERAGE function in Excel is used to calculate the arithmetic mean of a range of
numbers in a worksheet. The function takes one or more arguments, which can be cell references or
numerical values. The function then returns the average of all the values in the specified range.
=AVERAGE(B2:B10)
This formula will calculate the average of all the values in the range B2:B10 and return the result.
IF: The IF function in Excel is used to perform a logical test and return one value if the test is true, and
another value if the test is false. The function requires three arguments: the logical test, the value to
return if the test is true, and the value to return if the test is false.
=IF(A2>B2,"Yes","No")
This formula will perform a logical test to determine if the value in cell A2 is greater than the value in
cell B2. If the test is true, it will return "Yes", and if the test is false, it will return "No".
COUNT: The COUNT function in Excel is used to count the number of cells in a range that contain
numerical values. The function takes one or more arguments, which can be cell references or
numerical values. The function then returns the total count of all the values in the specified range.
=COUNT(A2:A10)
This formula will count the number of cells in the range A2:A10 that contain numerical values and
return the total count.
MIN: The MIN function in Excel is used to find the minimum value in a range of cells. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the smallest value in the specified range.
=MIN(A2:A10)
This formula will find the smallest value in the range A2:A10 and return the result.
MAX: The MAX function in Excel is used to find the maximum value in a range of cells. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the largest value in the specified range.
=MAX(A2:A10)
This formula will find the largest value in the range A2:A10 and return the result.
VLOOKUP: VLOOKUP is a function in Excel that stands for "Vertical Lookup". It searches for a
specific value in the first (leftmost) column of a range and retrieves the corresponding value from
the same row in another column of that range. The function takes four arguments: the value to
search for, the range of cells to search in, the column number (counted from the left of the range)
of the value to return, and whether the search should be an exact match (FALSE) or an approximate
match (TRUE). The fourth argument is optional and defaults to an approximate match.
=VLOOKUP(B2,A2:D10,3,FALSE)
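This formula will search for the value of cell B2 in the first column of the range A2:D10 (column A)
and return the value from the third column of the range (column C) in the matching row. Because
the last argument is FALSE, only an exact match is accepted; if none is found, the formula returns
#N/A.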
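The exact-match behaviour can be mimicked in Python to show what the function does. This is only
an analogy under simplified assumptions, not how Excel implements it; the table contents and the
function name vlookup are hypothetical.

```python
# Hypothetical table; the first element of each row is the lookup key.
table = [
    ("P001", "Keyboard", 1200, "Accessories"),
    ("P002", "Monitor", 9500, "Displays"),
    ("P003", "Mouse", 600, "Accessories"),
]

def vlookup(lookup_value, rows, col_index):
    """Exact-match lookup over the first column of rows.

    Returns the value in the 1-based column col_index of the first
    matching row, or None when no row matches.
    """
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A in this case

price = vlookup("P002", table, 3)     # third column of the matching row
missing = vlookup("P999", table, 3)   # no such key in the first column
print(price, missing)
```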
Illustration 1:
Student Name    Exam Score
John            85
Mary            90
Alice           75
Bob             88
Sarah           92
SUM: This function adds up all the numbers in a range. For example, if you want to find the total exam
scores:
=SUM(B2:B6)
This will give you the sum of all the exam scores, which is 85 + 90 + 75 + 88 + 92 = 430.
SUMIF: Suppose we want to find the total exam score for students who scored above 80. We can use
the SUMIF function to sum the scores based on a specific condition.
=SUMIF(B2:B6, ">80")
This formula will sum all the exam scores (in column B) where the score is greater than 80, which is
85 + 90 + 88 + 92 = 355.
COUNT: This function counts the number of cells that contain numbers within a range. For example,
if you want to count how many students took the exam:
=COUNT(B2:B6)
This will give you the count of cells with exam scores, which is 5.
COUNTIF: If we want to count the number of students who scored above 80, we can use the COUNTIF
function.
=COUNTIF(B2:B6, ">80")
This formula will count the number of exam scores (in column B) that are greater than 80, which is 4.
MIN: This function returns the smallest number in a range. For example, if you want to find the lowest
exam score:
=MIN(B2:B6)
This will give you the minimum exam score, which is 75.
MAX: This function returns the largest number in a range. For example, if you want to find the highest
exam score:
=MAX(B2:B6)
This will give you the maximum exam score, which is 92.
AVERAGE: This function calculates the average of numbers in a range. For example, if you want to find
the average exam score:
=AVERAGE(B2:B6)
This will give you the average exam score, which is (85 + 90 + 75 + 88 + 92) / 5 = 86.
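As a cross-check, the results in Illustration 1 can be reproduced outside Excel. The sketch below
uses Python with the same five scores; the variable names are chosen for this example only.

```python
from statistics import mean

# Scores from Illustration 1, in the same order as rows 2 to 6.
scores = {"John": 85, "Mary": 90, "Alice": 75, "Bob": 88, "Sarah": 92}
values = list(scores.values())

total = sum(values)                                  # =SUM(B2:B6)
total_above_80 = sum(v for v in values if v > 80)    # =SUMIF(B2:B6, ">80")
count = len(values)                                  # =COUNT(B2:B6)
count_above_80 = sum(1 for v in values if v > 80)    # =COUNTIF(B2:B6, ">80")
lowest, highest = min(values), max(values)           # =MIN(B2:B6), =MAX(B2:B6)
average = mean(values)                               # =AVERAGE(B2:B6)

print(total, total_above_80, count, count_above_80, lowest, highest, average)
```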
Illustration 2. (Use SUM, COUNT, MIN, AVERAGE and IF function)