
Reference Notes

Business analytics involves using data and statistical methods to analyze business performance and make data-driven decisions, with applications in various sectors such as retail and manufacturing. It encompasses four types of analytics: descriptive, predictive, diagnostic, and prescriptive, each serving a unique purpose in data analysis. Business intelligence, while related, focuses on descriptive insights to improve operational efficiency and ROI, distinguishing it from the predictive nature of business analytics.

Uploaded by

mahe424
Copyright
© © All Rights Reserved

Business analytics

Business analytics is the practice of using data and statistical methods to analyze business
performance, identify patterns, and make data-driven decisions. Business analytics combines the
fields of statistics, data science, and business management to provide insights and solutions to
business problems. Here is an example of how business analytics can be used in a retail business:
A retail business wants to improve its sales performance and decides to use business analytics to
help identify areas of improvement. They collect data on customer purchases, such as the type of
products purchased, the time of day, and the amount spent. The retail business then uses statistical
methods to analyze the data and identify patterns and trends.
Using the data analysis, the retail business identifies that the most popular products are purchased
during specific hours of the day. They also find that customers who purchase certain products are
more likely to purchase related products in the same transaction. Armed with this information, the
business decides to:
• Increase the availability of popular products during peak hours
• Bundle related products together to increase the likelihood of a higher transaction value
• Create targeted marketing campaigns to promote popular products during peak hours
As a result of implementing these changes based on data analysis, the retail business sees an increase
in sales performance and customer satisfaction.
In this example, business analytics was used to analyze customer data and make data-driven decisions
to improve business performance. By leveraging data and statistical methods, businesses can gain
valuable insights and make more informed decisions.
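The retail analysis above can be sketched in a few lines of Python. This is only an illustration: the transaction records, product names, and function names below are invented, and a real retailer would load the data from its own sales system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (hour of day, set of products bought together).
transactions = [
    (10, {"bread", "butter"}),
    (10, {"bread", "jam"}),
    (18, {"bread", "butter", "milk"}),
    (18, {"milk"}),
    (18, {"bread", "butter"}),
]

def peak_hours(records):
    """Count transactions per hour of day, busiest hour first."""
    return Counter(hour for hour, _ in records).most_common()

def frequent_pairs(records):
    """Count how often each pair of products appears in the same transaction."""
    pairs = Counter()
    for _, basket in records:
        # Sort the basket so ("bread", "butter") and ("butter", "bread") match.
        pairs.update(combinations(sorted(basket), 2))
    return pairs.most_common()

print(peak_hours(transactions))
print(frequent_pairs(transactions)[0])
```

The busiest hours suggest when to stock popular products, and the most frequent pairs are candidates for bundling.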

There are four types of business analytics: descriptive, predictive, diagnostic, and prescriptive. Here
is an example of how each type of analytics can be used in a manufacturing business:
1. Descriptive Analytics: Descriptive analytics is used to summarize and describe data to gain a
better understanding of what has happened in the past. An example of descriptive analytics
in a manufacturing business is the analysis of historical data to identify production trends. A
manufacturer might use descriptive analytics to analyze the total units produced in the last
year, the number of units produced per month, and the average time taken to produce each
unit.
2. Predictive Analytics: Predictive analytics is used to forecast future outcomes based on
historical data. An example of predictive analytics in a manufacturing business is the analysis
of production data to predict future demand. A manufacturer might use predictive analytics
to identify the future demand for a specific product, based on historical sales data, industry
trends, and seasonality.
3. Diagnostic Analytics: Diagnostic analytics is used to determine the cause of a specific problem
or issue. An example of diagnostic analytics in a manufacturing business is the analysis of
production data to identify the cause of a decrease in production efficiency. A manufacturer
might use diagnostic analytics to identify the root cause of a decrease in production efficiency,
such as a malfunctioning machine or an inadequate supply of raw materials.
4. Prescriptive Analytics: Prescriptive analytics is used to provide recommendations on the best
course of action to achieve a desired outcome. An example of prescriptive analytics in a
manufacturing business is the optimization of the production process to minimize costs and
maximize output. A manufacturer might use prescriptive analytics to identify the optimal mix
of raw materials, production methods, and machine settings to minimize costs while
maximizing production output.
In summary, each type of business analytics plays a unique role in the analysis of data to drive business
decisions. By leveraging data analytics, businesses can gain valuable insights, make more informed
decisions, and ultimately improve performance.
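As a rough illustration of the first two types, the sketch below summarizes past production (descriptive) and extends a straight-line trend by one month (predictive). The monthly figures and function names are assumptions made for this example, and a straight-line fit is only one simple forecasting choice; real demand forecasting would also account for seasonality and industry trends.

```python
from statistics import mean

# Hypothetical monthly production figures (units), oldest month first.
units = [100, 110, 120, 130, 140, 150]

def describe(series):
    """Descriptive analytics: summarize what has already happened."""
    return {"total": sum(series), "monthly_mean": mean(series)}

def forecast_next(series):
    """Predictive analytics: extend a least-squares straight line by one period."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # value of the fitted line at the next period

print(describe(units))
print(forecast_next(units))
```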
The process of business analytics typically involves the following steps:
1. Define the business problem or question: The first step is to clearly define the problem or
question that needs to be addressed. This can involve identifying the key business objectives,
understanding the business processes, and defining the key performance indicators (KPIs) that
will be used to measure success.
2. Collect data: Once the business problem has been defined, the next step is to collect the
relevant data. This can involve gathering data from various sources, such as internal
databases, external sources, and public data repositories. The data may be in structured or
unstructured form.
3. Cleanse and prepare the data: After the data has been collected, it needs to be cleaned and
prepared for analysis. This involves removing any inconsistencies or errors, transforming the
data into a consistent format, and ensuring that the data is ready for analysis.
4. Analyze the data: With the data cleaned and prepared, the next step is to analyze the data.
This can involve using various statistical and data mining techniques to identify patterns,
trends, and insights. Common analytical techniques include regression analysis, clustering,
decision trees, and time series analysis.
5. Interpret the results: Once the analysis is complete, the next step is to interpret the results.
This involves using the insights gained from the analysis to make informed business decisions.
The results may be visualized using charts, graphs, or other visualization tools to make them
more accessible to decision-makers.
6. Communicate the findings: The final step is to communicate the findings to key stakeholders
in the business. This may involve creating reports, dashboards, or presentations that provide
an overview of the analysis and its implications for the business. Effective communication is
essential to ensure that the insights gained from the analysis are acted upon and that the
business benefits from the investment in analytics.
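Step 3 (cleansing and preparing the data) is often the most mechanical part of the process. A minimal sketch in Python follows, assuming each record is a dictionary with hypothetical `order_id`, `product`, and `amount` fields; the cleaning rules shown are examples, not a complete checklist.

```python
# Hypothetical raw records with typical problems: stray spaces, mixed case,
# a duplicated order, and an amount that is not a number.
raw_rows = [
    {"order_id": 1, "product": " Tea ", "amount": "120.50"},
    {"order_id": 1, "product": "tea", "amount": "120.50"},
    {"order_id": 2, "product": "Coffee", "amount": "n/a"},
    {"order_id": 3, "product": "COFFEE", "amount": 200},
]

def clean_records(rows):
    """Return rows in a consistent format, dropping unusable and duplicate ones."""
    seen_orders = set()
    cleaned = []
    for row in rows:
        order_id = row.get("order_id")
        product = str(row.get("product", "")).strip().lower()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # amount is missing or cannot be parsed as a number
        if not product or order_id in seen_orders:
            continue  # no product name, or this order was already recorded
        seen_orders.add(order_id)
        cleaned.append({"order_id": order_id, "product": product, "amount": amount})
    return cleaned

print(clean_records(raw_rows))
```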

There are several key advantages of business analytics:


• Improved Decision Making: Business analytics provides insights that enable businesses to
make better-informed decisions. By analyzing data from different sources and identifying
patterns and trends, businesses can identify opportunities and risks, and make data-driven
decisions.
• Increased Efficiency and Effectiveness: By using business analytics tools, businesses can
automate repetitive tasks and streamline their operations, which can lead to increased
efficiency and effectiveness. This can help businesses save time and money, and improve their
bottom line.
• Competitive Advantage: By using business analytics to gain insights into customer behavior,
market trends, and competitive intelligence, businesses can gain a competitive advantage.
This can help businesses stay ahead of their competitors and identify new opportunities for
growth.
• Improved Customer Satisfaction: By analyzing customer data, businesses can gain insights into
customer behavior and preferences, which can help them improve their products and services
and better meet the needs of their customers. This can lead to increased customer satisfaction
and loyalty.
• Better Risk Management: By using business analytics to identify potential risks and
opportunities, businesses can better manage their risks and make more informed decisions.
This can help businesses avoid potential losses and improve their overall risk management
strategy.
Overall, the key advantages of business analytics include improved decision-making, increased
efficiency and effectiveness, competitive advantage, improved customer satisfaction, and better risk
management.

Business intelligence
Put simply, BI combines technology, processes, and strategies to transform raw data into meaningful
insights,1 helping individuals and organizations make informed decisions and drive business success.
BI is not to be confused with business analytics. Business intelligence takes a descriptive approach to
give clear insight into how a business is performing. Business analytics, on the other hand, is a
predictive effort to describe what an organization might do to achieve greater outcomes.
BI is a useful compass for organizations, as it can guide people through enormous amounts of data
and help them achieve their business goals. It empowers business leaders, executives, and analysts to
gather, analyze, and interpret data from sources across the enterprise for a comprehensive view of
operations, customers, market trends, and more.

The Importance of BI: Improving Return on Investment


Business intelligence is a powerful tool with which organizations can improve their return on
investment (ROI). Applying appropriate BI concepts increases ROI in three essential ways:1
1. Revealing Operational Inefficiencies and Opportunities for Enhancement: BI gives a comprehensive
view of an organization’s business operations, allowing leaders to flag inefficiencies and possible areas
for improvement. After thorough data analysis and review of performance metrics, professionals
within each company can go on to streamline processes, optimize resource allocation, reduce
operational costs, and more. When used effectively, business intelligence can help people eliminate
workflow bottlenecks, enhance productivity, and maximize efficiency.

2. Uncovering Valuable Customer Insights: The customer is king! The importance of understanding
customer preferences and behavior in today's competitive market can't be overstated. BI tools analyze
information such as customer data, purchase history, and interactions to reveal valuable insights. A
customer's purchasing history, for example, holds a wealth of knowledge. By studying past
transactions, business analysts can gain a better understanding of what products or services resonate
with their customers the most. This knowledge enables businesses to optimize their product offerings,
fine-tune marketing strategies, and even develop personalized recommendations for each customer,
leading to increased customer satisfaction and loyalty. Similarly, information gathered from customer
interactions with a company is crucial. It allows companies to assess the quality of customer service,
identify pain points, and uncover opportunities for improvement. Understanding how customers
engage with a company's products or services helps businesses to better meet customer needs,
ensuring a positive and seamless customer experience.

3. Identifying Needs for New Products or Services: BI insights can spark new ideas. When analyzing
market trends and consumer demand, business analysts and their colleagues often identify gaps in
the market and opportunities for new products or services. Using BI, they can confidently develop
offerings that align with customer needs, ensuring a greater chance of success. The ability to innovate
and offer products or services that cater to specific market demands is a valuable means toward
boosting revenue and contributing significantly to ROI.
Data and information
Data and information are two concepts that are closely related but different in meaning. Data refers
to raw, unprocessed facts or figures that have no meaning or context on their own. Information, on
the other hand, is the processed, organized, and meaningful data that can be used for decision making,
analysis, or communication.
To better understand the difference between data and information, let's look at an example of data
and how it can be transformed into information.
Suppose we have a dataset of sales figures for a company's products over the past year. The dataset
might include columns for product name, date of sale, and sale price. This raw data is just a collection
of numbers and text with no meaning or context. We can say that this is data.
Now, if we sort this data by product name and create a chart that shows the total sales for each
product, we have transformed the raw data into meaningful information. This information can help
us understand which products are most popular and which ones are not selling as well. We can use
this information to make decisions about which products to promote or discontinue.
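The transformation described in this sales example can be sketched in Python. The rows below are invented sample values; the columns follow the ones named above (product name, date of sale, sale price).

```python
from collections import defaultdict

# Raw data: (product name, date of sale, sale price). Values are invented.
sales = [
    ("Pen", "2024-01-05", 20.0),
    ("Notebook", "2024-01-05", 60.0),
    ("Pen", "2024-01-06", 20.0),
    ("Notebook", "2024-01-07", 60.0),
    ("Stapler", "2024-01-07", 150.0),
]

def total_sales_by_product(rows):
    """Turn raw sale rows into information: total sales per product, highest first."""
    totals = defaultdict(float)
    for product, _date, price in rows:
        totals[product] += price
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for product, total in total_sales_by_product(sales):
    print(product, total)
```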
Another example of data and information can be found in weather data. Suppose we have a dataset
of temperature readings for a city over the past month. This raw data is just a collection of numbers
with no meaning or context. We can say that this is data.
Now, if we organize this data by date and time and create a graph that shows the temperature trends
over the month, we have transformed the raw data into meaningful information. This information can
help us understand how the temperature is changing over time, and we can use it to plan for weather
conditions in the future.
In summary, data is the raw and unprocessed facts and figures, while information is the processed and
organized data that has meaning and context. To transform data into information, we need to analyze,
sort, organize, and visualize the data in a way that provides insights and helps us make decisions.
Understanding the difference between data and information is crucial in today's world, where data is
abundant and valuable. Businesses, governments, and individuals need to be able to distinguish
between the two to make the most out of the data they collect and use it to their advantage.
Here are five key differences between data and information:
1. Meaning: Data refers to raw, unprocessed facts or figures that have no meaning on their own.
Information, on the other hand, is data that has been processed and organized to provide
context and meaning.
2. Form: Data can be in various forms, such as numbers, text, images, or sounds. Information,
however, is usually presented in a more organized and structured form, such as a report,
graph, or chart.
3. Usefulness: Data on its own is not very useful as it does not provide any insights or help us
make decisions. Information, on the other hand, is useful as it provides insights, allows for
analysis, and can be used to make informed decisions.
4. Context: Data is often meaningless without context. Information provides context by
organizing and interpreting the data in a way that is relevant and meaningful to the user.
5. Processing: Data is often raw and unprocessed, whereas information is the result of
processing the data. Processing involves sorting, organizing, analyzing, and presenting the
data in a way that provides insights and helps us make decisions.

Data processing
Raw data collected for a study cannot reveal much on its own and cannot be used directly for analysis;
it must first be processed. Data processing is the intermediate stage between the collection of data
and its analysis and interpretation, and it is a crucial stage in research. After collecting data from the
field, the researcher processes and analyzes it in order to arrive at conclusions that may confirm or
invalidate the hypothesis formulated at the beginning of the research work. The mass of data
collected during field work has to be reduced to manageable proportions. Only through careful and
systematic processing does the data lend itself to statistical treatment and to meaningful
interpretation and conclusions. The collected data should also be organized so that tables and charts
can be prepared for presentation. Processing is necessary because the collected data must be checked
and any errors rectified, so that no difficulty is experienced at the analysis stage. The steps involved
in processing data are Editing, Coding, Classification and Tabulation.

EDITING: Editing means to rectify, to set in order, to correct, or to establish sequence. It is the
process of examining the data collected in questionnaires or interview schedules to detect errors and
omissions and to correct them where possible. Editing is the first step in data processing: once data
collection is over, a final and thorough check is made, and it is better to verify the collected data
before any analysis is carried out. Editing is done to ensure that the collected data are accurate,
consistent with other facts gathered, uniformly entered, and as complete as possible. For example,
imagine how the news would appear if a newspaper were printed unedited. Similarly, an unedited film
would have no sequence of events, so the story could not be understood at all.

CODING: Coding is the process of organizing data or responses into classes or categories and
assigning numerals or other symbols to responses according to the class or category in which they
fall. Coding is therefore considered a classification process, and it is necessary for efficient analysis:
it condenses a large number of replies into a small number of classes that contain the critical
information required for analysis. The first step in coding is the study of the answers, and the last step
is the transfer of information from the schedule to a separate sheet called the transcription sheet. A
transcription sheet is a large summary sheet that contains the answers or codes of all the respondents.
Transcription may not be necessary when only simple tables are required and the respondents are
few. Coding is done with the help of set rules:
• The classes or categories should be reasonable and appropriate to the research problem under
study.
• The coding must be exhaustive: there should be a class for every item of data.
• Each answer category should be assigned a separate number.
• The coding must be mutually exclusive: a specific answer can be placed in only one category.
• The coding must observe the rule of single dimension: every class in the category set is defined
in terms of only one concept.
Coding provides the base for analysis. It can be simplified by using a pre-coded questionnaire, which
also saves the investigators' time. Coding decisions should therefore be taken well in advance, at the
stage of designing the schedule or questionnaire. A standard method should be used for hand coding.
Whatever method is adopted, the most important point is that coding errors should be kept to a
minimum.
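The coding rules above (exhaustive, mutually exclusive, one number per category) can be illustrated with a small Python sketch. The survey question, the codebook values, and the function names are hypothetical and chosen only for this example.

```python
# Hypothetical codebook for the question "What is your employment status?"
CODEBOOK = {"employed": 1, "self-employed": 2, "unemployed": 3, "student": 4}
OTHER = 9  # catch-all code, so every response falls into some class (exhaustive)

def code_response(answer):
    """Map one answer to exactly one numeric code (mutually exclusive)."""
    return CODEBOOK.get(answer.strip().lower(), OTHER)

def transcription_sheet(responses):
    """Build a summary sheet: one (respondent number, code) row per respondent."""
    return [(number, code_response(answer))
            for number, answer in enumerate(responses, start=1)]

print(transcription_sheet(["Employed", " student ", "Retired"]))
```

Because a dictionary lookup returns a single value, each answer receives exactly one code, and the catch-all code keeps the scheme exhaustive.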
CLASSIFICATION: Classification is the process of reducing a large mass of data into homogeneous
groups so that meaningful relationships can be seen. It arranges data into groups based on common
characteristics, and it can be done either according to attributes or according to class intervals.
Types of Classification:
• Geographical: Data are classified on the basis of geographical location, e.g. countries, states,
cities, villages, areas, etc.
• Chronological: When data are observed over a period of time, the classification is known as
chronological classification.
• Qualitative: Data are classified on the basis of some attribute or quality such as sex, colour of
hair, literacy, religion, etc. The point to note in this type of classification is that the attribute
under study cannot be measured; one can only find out whether it is present or absent in the
units of the population under study.
• Quantitative: In quantitative classification, data are classified based on measurable quantities
or numerical values. These values represent the magnitude or extent of a particular
characteristic, allowing for precise measurement and comparison. Examples include age,
height, weight, income, marks obtained, or the number of family members. Unlike qualitative
classification, where attributes are non-measurable, quantitative classification deals with data
that can be expressed in numerical terms, either as discrete (countable) or continuous
(measurable) variables. This type of classification enables statistical analysis, such as
calculating averages, percentages, and standard deviations.
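Classification by class intervals, the usual form of quantitative classification, can be sketched as follows. The ages, the interval width of 10, and the function names are assumptions for illustration; intervals here include their lower bound and exclude their upper bound.

```python
from collections import Counter

def class_interval(value, width=10, start=0):
    """Return the (lower, upper) bounds of the class interval containing value."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

def classify(values, width=10, start=0):
    """Count how many values fall into each class interval, in ascending order."""
    counts = Counter(class_interval(v, width, start) for v in values)
    return sorted(counts.items())

ages = [23, 27, 31, 35, 38, 42, 29]  # hypothetical respondent ages
for (lower, upper), frequency in classify(ages):
    print(f"{lower}-{upper}: {frequency}")
```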

TABULATION: Tabulation is the process of summarizing raw data and displaying it in a compact form
of vertical columns and horizontal rows of numbers for further analysis. In other words, it is a method
of presenting classified data in an orderly manner in a table. Analysis of data is made possible through
tables. Tabulation may be done manually, mechanically, or electronically. Tabulation is very important
because:
• It helps to conserve space
• It reduces the need for explanation
• Computation of the data is made easier
• Comparison of data becomes very simple
• Adequacy or inadequacy of the data is clearly visible.
A table contains columns and rows. These columns and rows create small boxes, which are called cells.
Tables are classified as:
• One-way table: A one-way frequency table presents the distribution of cases on only a single
dimension or variable.

Gender     No. of Respondents     %
Male       116                    58%
Female     84                     42%
Total      200                    100%


• Two-way table: Two-way tables show distributions in terms of two or more variables and the
relationship between the two variables.

Gender     Education level
           Primary     Secondary     Graduation
Male       15          25            18
Female     20          16            22
Total      35          41            40
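Both kinds of table can be produced electronically with a few lines of Python. The sketch below rebuilds the counts shown in the two tables above from individual responses; the function names and the way the response lists are constructed are assumptions made for illustration.

```python
from collections import Counter

def one_way_table(values):
    """Frequency and rounded percentage for each category of a single variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, round(100 * n / total)) for category, n in counts.items()}

def two_way_table(pairs):
    """Cell counts for each (row category, column category) combination."""
    return Counter(pairs)

def column_totals(table):
    """Add up the cells of a two-way table by column category."""
    totals = Counter()
    for (_row, column), n in table.items():
        totals[column] += n
    return totals

# Responses reconstructed from the counts in the tables above.
genders = ["Male"] * 116 + ["Female"] * 84
education = (
    [("Male", "Primary")] * 15 + [("Male", "Secondary")] * 25
    + [("Male", "Graduation")] * 18 + [("Female", "Primary")] * 20
    + [("Female", "Secondary")] * 16 + [("Female", "Graduation")] * 22
)

print(one_way_table(genders))
print(column_totals(two_way_table(education)))
```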

Sources of data (Primary and secondary)


Primary and secondary sources of data refer to the origin of the data being collected, and whether it
is being collected for the first time or has already been collected by someone else. Here are the
definitions and meanings of primary and secondary sources of data:
Primary sources of data: Primary data is collected for the first time by the researcher, usually to
answer a specific research question or to address a particular problem. Primary sources of data are
original and firsthand, and may include surveys, experiments, observations, interviews, and focus
groups. Primary data collection involves directly engaging with the subjects being studied to obtain
new and unique data. This type of data is typically more accurate, relevant, and specific to the research
question being investigated. However, primary data collection can be time-consuming and costly.
Secondary sources of data: Secondary data has already been collected and published by someone
else for a different purpose. Secondary sources of data can be obtained from various sources,
including academic literature, government records, commercial databases, and social media
platforms. This type of data can be used to answer different research questions or to support existing
research. Secondary data collection is often less expensive and less time-consuming than primary data
collection, but the data may be less specific and accurate than primary data. It is important to evaluate
the quality and reliability of secondary sources of data before using them for research purposes.
In summary, primary sources of data are original and collected for the first time to answer a specific
research question, while secondary sources of data have already been collected and can be used for
different research purposes. Both primary and secondary sources of data have their advantages and
disadvantages, and the choice of which to use will depend on the research question, resources, and
other factors.
Primary Sources of Data:
1. Surveys: Surveys can be conducted to collect primary data from individuals or groups. Surveys
can be conducted in person, over the phone, or online.
2. Interviews: Interviews can be conducted in person, over the phone, or online to collect
primary data from individuals.
3. Observations: Observations can be conducted to collect primary data by watching and
recording behavior, events, or processes.
4. Experiments: Experiments can be conducted to collect primary data by manipulating one or
more variables and recording the outcomes.
5. Focus groups: Focus groups can be conducted to collect primary data by bringing together a
group of people to discuss a topic.
Secondary Sources of Data:
1. Government records: Government records, such as census data, crime statistics, and
economic data, are examples of secondary sources of data.
2. Academic literature: Scholarly articles, books, and reports are examples of secondary sources
of data. These sources can provide information on previous research and data collected by
other researchers.
3. Commercial databases: Commercial databases, such as market research reports, financial
data, and company profiles, are examples of secondary sources of data.
4. Social media: Social media platforms, such as Twitter, Facebook, and Instagram, can provide
secondary data on consumer behavior, sentiment, and trends.
5. Websites: Websites can provide secondary data on topics such as health, education, and
politics. For example, government websites may provide data on public health, while
educational websites may provide data on student performance.
In summary, primary sources of data include surveys, interviews, observations, experiments, and
focus groups, while secondary sources of data include government records, academic literature,
commercial databases, social media, and websites.

Data Mining
Data mining is a process of discovering patterns, trends, and insights in large datasets. It is a set of
techniques and algorithms that allow users to extract useful information from raw data. Data mining
techniques are used in various fields, such as business, finance, healthcare, and marketing.
In the context of online shopping, data mining can help you find the right product by filtering out
irrelevant data. For example, if you want to buy a camera online, you can use various filters to refine
your search. You can filter the search results by brand, price, features, customer ratings, and other
criteria. By doing so, you can eliminate irrelevant products and focus on the ones that meet your
specific requirements.
Here's an example of how data mining can be used in online shopping:
Suppose you want to buy a digital camera online. You start your search on a popular e-commerce
website and enter the keyword "digital camera" in the search bar. The website returns thousands of
search results, which can be overwhelming.
To narrow down your search, you use filters such as:
• Brand: You want a camera from a reputable brand such as Canon, Nikon, or Sony.
• Price range: You have a budget of Rs. 30,000, so you filter out cameras that are too expensive.
• Resolution: You want a camera with at least 16 megapixels.
• Features: You want a camera with a zoom lens, image stabilization, and video recording
capabilities.
• Customer ratings: You want to see cameras with high ratings from other customers.

By applying these filters, you get a smaller set of search results that meet your criteria. You can then
compare the products based on their features, prices, and ratings to make an informed decision. This
process of refining and filtering data based on specific criteria is an example of data mining in action.
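The filtering steps in this camera example can be sketched in Python. The catalog entries, model names, and the "Acme" brand are invented for illustration, and only the brand, price, resolution, and rating filters are modelled.

```python
from dataclasses import dataclass

@dataclass
class Camera:
    name: str
    brand: str
    price_rs: int
    megapixels: int
    rating: float

# Hypothetical catalog; none of these are real products.
catalog = [
    Camera("Model A", "Canon", 28000, 20, 4.4),
    Camera("Model B", "Nikon", 35000, 24, 4.6),
    Camera("Model C", "Sony", 25000, 12, 4.1),
    Camera("Model D", "Acme", 15000, 18, 3.2),
]

def shortlist(cameras, brands, max_price_rs, min_megapixels, min_rating):
    """Keep only the cameras that satisfy every filter."""
    return [
        c for c in cameras
        if c.brand in brands
        and c.price_rs <= max_price_rs
        and c.megapixels >= min_megapixels
        and c.rating >= min_rating
    ]

matches = shortlist(catalog, {"Canon", "Nikon", "Sony"}, 30000, 16, 4.0)
print([c.name for c in matches])
```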
The end goal of data mining is to take raw bits of information and determine whether there is
cohesion or correlation among the data. This allows a company to create value from information it
already has on hand, value that would otherwise not be apparent. Though data models can be
complex, they can also yield fascinating results, unearth hidden trends, and suggest unique
strategies.

Some benefits of Data Mining


Here are some of the key benefits of data mining:
• Improved decision-making: By uncovering hidden patterns and trends in data, data mining can
help organizations make better decisions. For example, a retailer can use data mining to
analyze sales data and identify which products are selling well, which products are not selling,
and what factors influence customer buying behavior.
• Increased efficiency: Data mining can help organizations streamline their operations and
reduce costs. For example, a manufacturer can use data mining to identify inefficiencies in its
production process and optimize its supply chain.
• Enhanced customer insights: By analyzing customer data, organizations can gain insights into
customer preferences, behavior, and needs. This can help organizations tailor their products
and services to meet customer demands and improve customer satisfaction.
• Improved marketing effectiveness: Data mining can help organizations identify the most
effective marketing strategies and channels. For example, a retailer can use data mining to
analyze customer data and identify which marketing campaigns are driving the most sales.
• Fraud detection and prevention: Data mining can be used to detect and prevent fraud in
various industries, such as banking and insurance. By analyzing transaction data, data mining
algorithms can identify patterns and anomalies that may indicate fraudulent activity.

Overall, data mining can help organizations gain a competitive advantage by leveraging data to make
better decisions, improve efficiency, and enhance customer satisfaction.
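As a very simple illustration of spotting anomalies in transaction data, the sketch below flags amounts that lie far from the average, measured in standard deviations (a z-score rule). This is a swapped-in teaching example rather than a description of how real fraud systems work; the amounts, the threshold of 2.0, and the function name are all assumptions.

```python
from statistics import mean, pstdev

def flag_anomalies(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts are identical, so nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical card transactions in rupees; one is far larger than the rest.
amounts = [500, 520, 480, 510, 490, 505, 495, 5000]
print(flag_anomalies(amounts))
```

A flagged amount is only a candidate for review; an unusual transaction is not necessarily fraudulent.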

Various applications of Data mining in different industries:


Some examples of the applications of data mining in various industries:
• Retail: Retailers use data mining to analyze sales data and customer behavior to identify
buying patterns, preferences, and trends. This information can be used to optimize product
offerings, pricing strategies, and marketing campaigns. For example, Amazon uses data mining
algorithms to recommend products to customers based on their browsing and purchase
history.
• Healthcare: Healthcare organizations use data mining to analyze patient data and identify
trends and patterns that can improve diagnosis, treatment, and patient outcomes. For
example, data mining algorithms can be used to analyze electronic health records (EHRs) to
identify risk factors for certain diseases or conditions.
• Finance: Financial institutions use data mining to analyze transaction data and identify
patterns or anomalies that may indicate fraud, money laundering, or other financial crimes.
Data mining algorithms can also be used to identify profitable investment opportunities or to
analyze credit risk.
• Manufacturing: Manufacturers use data mining to optimize production processes and reduce
costs. For example, data mining algorithms can be used to identify inefficiencies in the
manufacturing process, such as bottlenecks or waste, and to optimize production schedules.
• Marketing: Marketers use data mining to analyze customer data and identify patterns and
trends that can be used to improve marketing campaigns. For example, data mining
algorithms can be used to analyze social media data to identify customer sentiment and
preferences, or to analyze customer browsing and purchase history to identify opportunities
for cross-selling or up-selling.
• Transportation: Transportation companies use data mining to optimize routes and schedules,
reduce costs, and improve safety. For example, data mining algorithms can be used to analyze
traffic patterns and road conditions to optimize delivery routes, or to analyze driver behavior
to identify opportunities for safety improvements.

These are just a few examples of how data mining is being used in various industries. As the amount
of data generated continues to grow, data mining is becoming increasingly important for organizations
looking to gain insights and make better decisions.

Data Mining Techniques:


Data mining techniques can broadly be categorized into two types: predictive and descriptive. Here is
a brief overview of some of the most commonly used techniques within each category:
Predictive Data Mining Techniques:
a. Classification: This technique is used to assign data to predefined groups (classes) based on a
set of attributes. For example, a bank might use classification to categorize customers as
high risk, medium risk, or low risk based on their credit score, income, and other factors.
b. Regression: Regression is used to predict a numerical value based on a set of input variables.
For example, a retailer might use regression to predict future sales based on historical sales
data and other factors such as advertising spend and seasonality.
c. Prediction: This technique is used to make predictions about future events or outcomes based
on historical data. For example, a stock trader might use prediction to forecast future stock
prices based on historical market trends.
d. Time series: Time series analysis is used to analyze data that is collected over time, such as
stock prices or weather patterns. This technique is used to identify trends and patterns in the
data and to make predictions about future values.
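As an illustration of the regression technique described above, here is a minimal sketch in Python. The advertising and sales figures are hypothetical and were constructed to lie on a straight line, and the helper name fit_line is an assumption made for this example. It fits a least-squares line with one input variable and uses it to predict sales for a new spend level.

```python
# Hypothetical data: monthly advertising spend and sales (both in thousands).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]  # constructed as sales = 2 * spend + 5


def fit_line(xs, ys):
    """Ordinary least squares with a single input variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)

# Predict sales for a hypothetical spend of 60 (thousand).
predicted = slope * 60.0 + intercept
print(slope, intercept, predicted)
```

With real data the points will not fall exactly on a line, so the prediction is an estimate rather than an exact value.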
Descriptive Data Mining Techniques:
a. Clustering: Clustering is used to group similar data points together based on their attributes.
For example, a retailer might use clustering to group customers based on their purchasing
behavior, such as customers who frequently purchase electronics.
b. Summarization: Summarization is used to generate summary statistics about a dataset, such
as mean, median, and mode. For example, a healthcare organization might use summarization
to calculate the average length of stay for patients in a hospital.
c. Association: Association is used to identify relationships between variables in a dataset. For
example, a retailer might use association to identify which products are frequently purchased
together, such as milk and bread.
d. Sequence: Sequence analysis is used to analyze data that is collected in a sequence, such as
clickstream data or customer browsing behavior. This technique is used to identify patterns
and trends in the data and to make predictions about future behavior.
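The summarization technique above can be sketched with Python's standard statistics module. The length-of-stay values below are hypothetical and serve only to show how mean, median, and mode are obtained.

```python
import statistics

# Hypothetical length of stay, in days, for seven hospital patients.
length_of_stay = [2, 3, 3, 4, 5, 3, 8]

mean_stay = statistics.mean(length_of_stay)      # arithmetic average
median_stay = statistics.median(length_of_stay)  # middle value when sorted
mode_stay = statistics.mode(length_of_stay)      # most frequent value

print(mean_stay, median_stay, mode_stay)
```

Here the mean (4 days) is pulled above the median (3 days) by the single long stay of 8 days, which is why summaries often report more than one statistic.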

OLTP & OLAP


OLAP stands for Online Analytical Processing. OLAP systems can analyze data drawn from multiple
database systems at the same time. The primary goal of OLAP is data analysis, not data
processing.
OLTP stands for Online Transaction Processing. OLTP systems administer the day-to-day
transactions of an organization. The main goal of OLTP is data processing, not data analysis.

Online Analytical Processing (OLAP)


Online Analytical Processing (OLAP) is a category of software tools used to analyze data for
business decisions. OLAP provides an environment for gaining insights from data retrieved from
multiple database systems at one time.
OLTP Example
An ATM network is an example of an OLTP system. Assume that a couple has a joint bank account. One
day, they arrive at different ATMs at the same time and each tries to withdraw the entire balance
of the account.
The user who completes the authentication procedure first will receive the money. In this
situation, the OLTP system ensures that the total amount withdrawn never exceeds the balance in
the account. The critical thing to remember here is that OLTP systems are designed for reliable
transaction processing rather than data analysis.
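The balance check in the ATM example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the accounts table, the withdraw helper, and the opening balance of 10000 are all hypothetical, and the two withdrawal requests are processed one after the other rather than from two real ATMs. The point it shows is that the check and the debit happen in a single atomic statement, so the second request cannot overdraw the account.

```python
import sqlite3

# In-memory database with one hypothetical joint account.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 10000)")
conn.commit()


def withdraw(connection, account_id, amount):
    """Debit the account only if the balance covers the amount.

    Returns True if the withdrawal succeeded, False otherwise.
    """
    # "with connection" commits on success and rolls back on an exception.
    with connection:
        cursor = connection.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        # rowcount is 1 when the row matched the balance condition, else 0.
        return cursor.rowcount == 1


first = withdraw(conn, 1, 10000)   # first user withdraws the full balance
second = withdraw(conn, 1, 10000)  # second user tries the same withdrawal
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

The first request succeeds and the second is refused, leaving a balance of 0 rather than a negative amount.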

OLTP Benefits
• Manages and maintains an organization's daily transactions
• Simplifies individual procedures and complex duties
• Offers fast transactions

OLAP
OLAP stands for Online Analytical Processing and its primary objective is the analysis of data. It is
generally described as a category of software tools used to provide data analysis for business decisions.
With the help of OLAP, data analysts can gain insight into information held in multiple databases and
analyze it together. The main emphasis of OLAP is fast response times for complex queries.
OLAP Example
Online Analytical Processing, or OLAP, is a computer approach that allows users to extract and query
data conveniently and selectively to examine it from many perspectives. OLAP business intelligence
queries are frequently used for financial reporting, trend analysis, budgeting, sales forecasting, and
other types of planning.
For instance, a user may request that data be analysed to present a spreadsheet exhibiting all of an
enterprise's clothing products sold in Kolkata in December, compare revenue figures with the ones for
the same items in February, and then see a comparison of other product sales in Kolkata during the
same period.
Some more OLAP examples are:
• Personalized homepage for different customers (Netflix, Amazon)
• Comparison of sales in different months stored in separate databases.
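An analytical query of the kind described in the Kolkata example can be sketched as a grouped aggregation. The sketch below uses Python's built-in sqlite3 module as a stand-in for an OLAP tool; the sales table, its columns, and every revenue figure are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (city TEXT, month TEXT, category TEXT, revenue INTEGER)"
)

# Hypothetical sales records.
rows = [
    ("Kolkata", "December", "Clothing", 50000),
    ("Kolkata", "December", "Clothing", 30000),
    ("Kolkata", "February", "Clothing", 45000),
    ("Kolkata", "December", "Footwear", 20000),
    ("Kolkata", "February", "Footwear", 25000),
    ("Mumbai", "December", "Clothing", 70000),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Total revenue per product category and month for one city.
query = """
    SELECT category, month, SUM(revenue)
    FROM sales
    WHERE city = ?
    GROUP BY category, month
    ORDER BY category, month
"""
kolkata = conn.execute(query, ("Kolkata",)).fetchall()

for category, month, revenue in kolkata:
    print(category, month, revenue)
```

The result places December and February revenue side by side for each category, which is the comparison the example describes.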

Market Basket Analysis:

Market basket analysis is a data mining technique used to identify associations between different
items or products that are frequently purchased together. It is based on the idea that if a customer
buys one item, they are likely to buy another related item as well. This technique is often used in retail,
e-commerce, and marketing to gain insights into customer behavior and preferences.
The analysis works by analyzing transactional data, such as point of sale data or online purchase
history, to identify patterns of co-occurrence between different items. The output of market basket
analysis is a set of rules that describe the relationships between different items. These rules are
often expressed in the form of "if X is purchased, then Y is also likely to be purchased."
Let's consider an example of market basket analysis using bread and butter as the products.

Assume that a grocery store has transaction data that shows the items purchased by customers. From
this data, the store can perform market basket analysis to identify which items are frequently
purchased together.
If we consider the example of bread and butter, the grocery store may find that when customers
purchase bread, they are more likely to also purchase butter. This is because bread and butter are
often consumed together and are considered complementary products.
Using market basket analysis, the grocery store can generate rules to describe the relationship
between bread and butter, such as "If a customer buys bread, there is a high likelihood they will
also buy butter."
Based on these rules, the grocery store can optimize the placement of bread and butter in the store,
such as placing them next to each other on the shelf or in a promotion deal. This can increase sales
and improve customer satisfaction.
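The strength of a rule such as "if bread, then butter" is commonly measured with support, confidence, and lift. These three measures are not named in the notes above, so treat this as a supplementary sketch; the five transactions are hypothetical.

```python
# Hypothetical transactions: each set holds the items in one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    matches = sum(1 for basket in transactions if itemset <= basket)
    return matches / len(transactions)


support_bread = support({"bread"})
support_butter = support({"butter"})
support_both = support({"bread", "butter"})

# Confidence: of the baskets with bread, what share also contain butter?
confidence = support_both / support_bread

# Lift: how much more often bread and butter occur together than they
# would if the two purchases were independent (values above 1 suggest
# a positive association).
lift = confidence / support_butter

print(round(confidence, 2), round(lift, 2))
```

In this hypothetical data the rule has a confidence of about 0.75 and a lift of about 1.25, meaning butter appears more often in baskets with bread than in baskets overall.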

5 Step Data Mining Process:


1. Problem Definition: Define the problem you want to solve and identify the specific data mining
goals you want to achieve.
2. Data Collection and Preparation: Collect and prepare the relevant data for analysis by cleaning
and transforming it.
3. Data Exploration: Explore the data using descriptive statistics and data visualization to gain a
better understanding of the data and identify any patterns or trends.
4. Data Modeling: Choose an appropriate data mining technique and build a model using the
prepared data.
5. Model Evaluation and Deployment: Evaluate the model using statistical measures to
determine how well it performs, and deploy the model into a production environment where
it can be used to solve the problem.

Filtering and Sorting in Excel:

Filtering and sorting are essential functions in Excel for organizing and analyzing data. Here's a basic
overview of how to use them:

Filtering: Filtering allows you to display only the rows that meet certain criteria while temporarily
hiding the other rows. This is useful for focusing on specific subsets of data. Here's how to apply
filtering:

• Select the range of cells that contains your data.
• Go to the "Data" tab on the Excel ribbon.
• Click on the "Filter" button. You'll see small drop-down arrows appear next to each column
header.
• Click on the drop-down arrow next to the column you want to filter.
• You can then select specific values to display or use text filters, number filters, date filters,
etc., depending on the data type in the column.
• Once you've selected your filter criteria, Excel will hide the rows that don't meet those criteria,
displaying only the rows that match.

Sorting: Sorting allows you to rearrange your data based on the values in one or more columns. This
makes it easier to analyze and find information. Here's how to sort data:

• Select the range of cells that contains your data.
• Go to the "Data" tab on the Excel ribbon.
• Click on the "Sort A to Z" or "Sort Z to A" button to sort the selected column in ascending or
descending order, respectively. Alternatively, you can click on "Sort" and choose "Custom
Sort" to sort by multiple columns or specify a custom sorting order.
• Excel will rearrange the rows based on the values in the selected column(s), with the lowest
(or highest) values at the top.
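For readers who also work outside Excel, the same filtering and sorting ideas can be sketched in Python. The rows below are hypothetical; the filter keeps only matching rows and the sorts rearrange rows by one or more columns, mirroring the Filter, Sort, and Custom Sort steps above.

```python
# Hypothetical data: one dictionary per worksheet row.
rows = [
    {"name": "Asha", "region": "East", "sales": 300},
    {"name": "Ben", "region": "West", "sales": 150},
    {"name": "Chitra", "region": "East", "sales": 500},
    {"name": "Dev", "region": "North", "sales": 200},
]

# Filtering: keep only the rows whose region is "East".
east_rows = [row for row in rows if row["region"] == "East"]

# Sorting by one column, largest sales first (like "Sort Z to A").
by_sales = sorted(rows, key=lambda row: row["sales"], reverse=True)

# Sorting by two columns (like "Custom Sort"): region ascending,
# then sales descending within each region.
by_region_then_sales = sorted(rows, key=lambda row: (row["region"], -row["sales"]))

print([row["name"] for row in east_rows])
print([row["name"] for row in by_sales])
print([row["name"] for row in by_region_then_sales])
```

As in Excel, filtering does not change the underlying data: the original rows list is left untouched and each operation produces a new view of it.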

Basic Excel Functions:

SUM: The SUM function in Excel is used to add up a range of numbers in a worksheet. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the sum of all the values in the specified range.

=SUM(A2:A10)

This formula will add up all the values in the range A2:A10 and return the sum.

AVERAGE: The AVERAGE function in Excel is used to calculate the arithmetic mean of a range of
numbers in a worksheet. The function takes one or more arguments, which can be cell references or
numerical values. The function then returns the average of all the values in the specified range.

=AVERAGE(B2:B10)

This formula will calculate the average of all the values in the range B2:B10 and return the result.

IF: The IF function in Excel is used to perform a logical test and return one value if the test is true, and
another value if the test is false. The function requires three arguments: the logical test, the value to
return if the test is true, and the value to return if the test is false.

=IF(A2>B2,"Yes","No")

This formula will perform a logical test to determine if the value in cell A2 is greater than the value in
cell B2. If the test is true, it will return "Yes", and if the test is false, it will return "No".

COUNT: The COUNT function in Excel is used to count the number of cells in a range that contain
numerical values. The function takes one or more arguments, which can be cell references or
numerical values. The function then returns how many of those values are numbers; empty cells and
text are not counted.

=COUNT(A2:A10)

This formula will count the number of cells in the range A2:A10 that contain numerical values and
return the total count.

MIN: The MIN function in Excel is used to find the minimum value in a range of cells. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the smallest value in the specified range.

=MIN(A2:A10)

This formula will find the smallest value in the range A2:A10 and return the result.

MAX: The MAX function in Excel is used to find the maximum value in a range of cells. The function
takes one or more arguments, which can be cell references or numerical values. The function then
returns the largest value in the specified range.

=MAX(A2:A10)

This formula will find the largest value in the range A2:A10 and return the result.

VLOOKUP: VLOOKUP is a function in Excel that stands for "Vertical Lookup". It searches for a
specific value in the first column of a range and retrieves the corresponding value from the same
row in another column. The function takes four arguments: the value to search for, the range of
cells to search in, the column number (within that range) of the value to return, and whether the
match should be exact (FALSE) or approximate (TRUE).

=VLOOKUP(B2,A2:D10,3,FALSE)

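This formula will search for the value in cell B2 within the first column of the range A2:D10 and
return the value from the third column of the range (column C) in the matching row. FALSE requests
an exact match; if no match is found, the formula returns #N/A.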
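The exact-match behaviour of VLOOKUP can be sketched in Python to make the logic explicit. The vlookup helper, the product codes, and the table contents below are hypothetical; the sketch returns None where Excel would show #N/A.

```python
# Hypothetical lookup table: each tuple is one row, first column is the key.
table = [
    ("P001", "Jacket", 2000),
    ("P002", "Detergent", 110),
    ("P003", "Juices", 210),
]


def vlookup(lookup_value, rows, col_index):
    """Exact-match vertical lookup, similar to VLOOKUP(..., FALSE).

    Searches the first column of rows for lookup_value and returns the
    value in column col_index (1-based) of the first matching row.
    Returns None when nothing matches.
    """
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


rate = vlookup("P002", table, 3)     # third column of the matching row
missing = vlookup("P999", table, 3)  # no such key in the first column
print(rate, missing)
```

Like VLOOKUP, the helper only ever searches the first column, so the key column must be the leftmost one in the table.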

Illustration 1:

Student Name    Exam Score
John            85
Mary            90
Alice           75
Bob             88
Sarah           92

SUM: This function adds up all the numbers in a range. For example, if you want to find the total exam
scores:

=SUM(B2:B6)

This will give you the sum of all the exam scores, which is 85 + 90 + 75 + 88 + 92 = 430.

SUMIF: Suppose we want to find the total exam score for students who scored above 80. We can use
the SUMIF function to sum the scores based on a specific condition.

=SUMIF(B2:B6, ">80")

This formula will sum all the exam scores (in column B) where the score is greater than 80, which is
85 + 90 + 88 + 92 = 355.

COUNT: This function counts the number of cells that contain numbers within a range. For example,
if you want to count how many students took the exam:

=COUNT(B2:B6)

This will give you the count of cells with exam scores, which is 5.

COUNTIF: If we want to count the number of students who scored above 80, we can use the COUNTIF
function.

=COUNTIF(B2:B6, ">80")

This formula will count the number of exam scores (in column B) that are greater than 80, which is 4.

MIN: This function returns the smallest number in a range. For example, if you want to find the lowest
exam score:

=MIN(B2:B6)

This will give you the minimum exam score, which is 75.

MAX: This function returns the largest number in a range. For example, if you want to find the highest
exam score:

=MAX(B2:B6)

This will give you the maximum exam score, which is 92.

AVERAGE: This function calculates the average of numbers in a range. For example, if you want to find
the average exam score:

=AVERAGE(B2:B6)

This will give you the average exam score, which is (85 + 90 + 75 + 88 + 92) / 5 = 86.
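The results in Illustration 1 can be cross-checked outside Excel. The short Python sketch below uses the five scores from the table and reproduces each formula's result; the variable names are chosen for this example only.

```python
# Exam scores from Illustration 1.
scores = {"John": 85, "Mary": 90, "Alice": 75, "Bob": 88, "Sarah": 92}
values = list(scores.values())

total = sum(values)                                  # =SUM(B2:B6)
total_above_80 = sum(v for v in values if v > 80)    # =SUMIF(B2:B6, ">80")
count = len(values)                                  # =COUNT(B2:B6)
count_above_80 = sum(1 for v in values if v > 80)    # =COUNTIF(B2:B6, ">80")
lowest = min(values)                                 # =MIN(B2:B6)
highest = max(values)                                # =MAX(B2:B6)
average = total / count                              # =AVERAGE(B2:B6)

print(total, total_above_80, count, count_above_80, lowest, highest, average)
```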
Illustration 2. (Use SUM, COUNT, MIN, AVERAGE and IF function)

Illustration 2. (Use VLOOKUP function)


Illustration 3.
Date         Name      Product         Rate   Qty   Total
01-03-2023   Rahul     Jacket          2000   1     2000
02-03-2023   Aman      Detergent       110    3     330
03-03-2023   Manoj     Pair of shoes   4000   6     24000
04-03-2023   Vijay     Detergent       110          0
05-03-2023   Anant     Detergent       110          0
06-03-2023   Hemant    Jacket          2000   5     10000
07-03-2023   Roshan    Pair of shoes   4000   1     4000
08-03-2023   Rahul     Jacket          2000   2     4000
09-03-2023   Suresh    Juices          210    1     210
10-03-2023   Mandeep   Jacket          2000   1     2000
06-04-2023   Anant     Pair of shoes   4000         0
07-04-2023   Rahul     Detergent       110    1     110
08-04-2023   Rahul     Juices          210    5     1050
09-04-2023   Sandeep   Jacket          2000   1     2000
10-04-2023   Rahul     Pair of shoes   4000   2     8000
11-04-2023   Aman      Juices          210          0
12-04-2023   Manoj     Pair of shoes   4000   1     4000

Perform the following actions on the above data:

• Month-wise total sales
• Month-wise average sales
• Who visited the most
• Who bought the most
• Most purchased item
• Most sold item
• Most expensive item
• Least expensive item
• Number of items that remained unsold in April
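Some of the tasks above can be sketched in Python as a way to check answers worked out in Excel. The sketch rests on several assumptions that the exercise does not state: a blank Qty cell is treated as a visit with no purchase (quantity 0), the monthly average is taken over all rows in the month including those with a Total of 0, and "visited the most" is read as the name that appears in the most rows. Other readings of the tasks are possible and would give different results.

```python
from collections import Counter, defaultdict

# (date as DD-MM-YYYY, name, product, rate, quantity).
# None marks a blank Qty cell in the table.
records = [
    ("01-03-2023", "Rahul", "Jacket", 2000, 1),
    ("02-03-2023", "Aman", "Detergent", 110, 3),
    ("03-03-2023", "Manoj", "Pair of shoes", 4000, 6),
    ("04-03-2023", "Vijay", "Detergent", 110, None),
    ("05-03-2023", "Anant", "Detergent", 110, None),
    ("06-03-2023", "Hemant", "Jacket", 2000, 5),
    ("07-03-2023", "Roshan", "Pair of shoes", 4000, 1),
    ("08-03-2023", "Rahul", "Jacket", 2000, 2),
    ("09-03-2023", "Suresh", "Juices", 210, 1),
    ("10-03-2023", "Mandeep", "Jacket", 2000, 1),
    ("06-04-2023", "Anant", "Pair of shoes", 4000, None),
    ("07-04-2023", "Rahul", "Detergent", 110, 1),
    ("08-04-2023", "Rahul", "Juices", 210, 5),
    ("09-04-2023", "Sandeep", "Jacket", 2000, 1),
    ("10-04-2023", "Rahul", "Pair of shoes", 4000, 2),
    ("11-04-2023", "Aman", "Juices", 210, None),
    ("12-04-2023", "Manoj", "Pair of shoes", 4000, 1),
]

monthly_total = defaultdict(int)  # month -> sum of Total
monthly_rows = Counter()          # month -> number of rows
for date, name, product, rate, qty in records:
    month = date[3:5]  # "03" for March, "04" for April
    monthly_total[month] += rate * (qty or 0)  # assumption: blank Qty counts as 0
    monthly_rows[month] += 1

# Assumption: average over every row in the month, including zero totals.
monthly_average = {m: monthly_total[m] / monthly_rows[m] for m in monthly_total}

# Assumption: each row is one visit.
visits = Counter(name for _, name, _, _, _ in records)
most_frequent_visitor = visits.most_common(1)[0][0]

# Rate per product, used for the most and least expensive items.
rates = {product: rate for _, _, product, rate, _ in records}
most_expensive = max(rates, key=rates.get)
least_expensive = min(rates, key=rates.get)

print(dict(monthly_total), monthly_average)
print(most_frequent_visitor, most_expensive, least_expensive)
```

The remaining tasks depend on how "bought the most", "most purchased", and "most sold" are defined (by quantity, by value, or by number of transactions), so they are left for the reader to work out under a stated definition.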
