Business Analytics Notes
1.1 Introduction
Businesses today generate an overwhelming amount of data from various sources like
social media, customer interactions, financial transactions, and more. This is often
referred to as 'big data'.
Business analytics is a valuable practice that uses data and statistical methods to analyse
business operations and decisions.
The primary role of business analytics is to transform raw data into meaningful or
actionable insights.
It allows businesses to understand their past performance, predict future outcomes, and
prescribe optimal strategies for success.
The area of business analytics includes a wide range of methodologies and techniques. It
involves the collection, cleaning, and transformation of data from various sources into
one integrated document.
Various statistical analysis and data mining techniques are then applied to explore
correlations, patterns, and trends within the data.
These findings are often visualised through data visualization tools, which
enable stakeholders to understand complex data easily and effectively.
Business analytics also leverages machine learning algorithms and predictive models to
forecast future outcomes. In this process, historical data is analysed
to identify existing patterns, thereby enabling businesses to anticipate market
opportunities and the risks involved.
Its roots can be traced back to earlier studies such as scientific management and related
methodologies. These ideas and practices paved the way for the development of
modern business analytics.
Modern business analytics now leverages machine learning, advanced data analysis
techniques, and AI to obtain insights from data.
Figure 1 in the source illustrates the roots of the term 'Business analytics'.
The architectural framework of business analytics represents the structure and flow of
data from sources, moving through data analysis tools and technologies involved in the
analytics ecosystem.
1. Data Sources: Data can be obtained from both internal and external sources.
Internal sources refer to information obtained from within the organisation, such
as from the firm's internal data repositories, transactional systems, databases,
Enterprise Resource Planning (ERP) systems, Customer Relationship Management
(CRM) systems, and other records. This provides valuable insights into the
organisation's operations, performance, customer interactions, and more.
External sources refer to publicly available data or data available from other
stakeholders like government agencies, research institutions, industry
associations, social media platforms, market research firms, news outlets,
customers, suppliers, manufacturers, and logistics partners.
2. Business Data Transformation: Through the stages of data transformation, both
internal and external data can be organised and transformed into a suitable
format for analytics. Structured and standardized data provides a solid
foundation for applying various analytical techniques.
3. Platform and Tools: After data transformation, various tools and techniques are
used to perform the three types of analytics. These include programming
languages, statistical tools, business intelligence tools for visualisation, machine
learning and data mining tools, data integration tools, and cloud-based analytics
tools. Commonly used examples are programming languages like R
and Python; statistical tools like MATLAB, Stata, SPSS, and RStudio; and business
intelligence tools like Microsoft Power BI, Tableau, and QlikView.
1.2 Definition
Davenport and Harris (2007) referred to analytics as "extensive use of data, statistical
and quantitative analysis, exploratory and predictive models, and fact-based
management to drive decisions and actions".
Wilder and Ozgur (2015) defined analytics as "the application of processes and
techniques that transform raw data into meaningful information to improve decision
making".
Boyd (2012) in his study said "analytics is a scientific process of transforming data into
insight for making better decisions".
The terms 'Analysis' and 'Analytics' are often used interchangeably, but there are
significant differences.
Analysis refers to the process of examining data to get insights, understand existing
patterns, and make logical decisions. It breaks down information, identifies trends, and
makes meaningful inferences about the data. Analysis focuses on answering the "what
and why" of a given problem or situation by investigating historical data or current
observations. Analysis involves applying various techniques such as data visualization,
qualitative reasoning, and other statistical analysis to draw conclusions based on the
available data.
Analytics encompasses the extensive use of data, statistical methods, algorithms, and
computational tools to extract insights, predict outcomes, and prescribe actions. It
incorporates a more systematic and rigorous approach to decision-making by leveraging
advanced techniques like machine learning, predictive modelling, and optimization.
Analytics is often regarded as forward-looking, aiming to identify patterns and trends in
data that can be used to make predictions and optimize future outcomes.
Comparison of Analysis and Analytics:
Steps involved
o Analysis: Data gathering, data validation, interpretation, analysis, results.
o Analytics: Identifying the problem, finding the data, data filtering, data
validation, data cleaning, data visualization, data analysis, inference, prediction.
Forecasting
o Analysis: It provides the required insights from the past to understand what
happened so far.
o Analytics: It explores data from the past to make appropriate decisions in the
future.
Tools
o Analysis: Tables, Excel, Spark, Google Fusion Tables, NodeXL, etc.
o Analytics: R, Python, SAS, Google Analytics, Excel, etc.
Descriptive analytics is not used to predict trends or draw inferences about the future.
The results of descriptive analytics are typically displayed using various visual data
representations like bar, line, and pie charts. These charts provide useful insights into the
data and serve as a foundation for further analysis.
Examples of descriptive analytics include: Summarising past data such as sales data,
production data, revenue data; Analysing social media statistics of users such as
engagement, penetration, activities like posts, tweets, shares; Reporting general trends
over time like the number of publications over the years or share price over a period;
Collating survey results like election polls, customer satisfaction surveys.
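As a minimal sketch of descriptive analytics in R, the snippet below summarises a small set of monthly sales figures. The data frame `sales` and all of its values are hypothetical and serve only as an illustration of summarising past data.

```r
# Hypothetical monthly sales data (illustrative values only)
sales <- data.frame(
  month   = c("Jan", "Feb", "Mar", "Apr"),
  revenue = c(120, 150, 130, 160)
)

total_revenue   <- sum(sales$revenue)                      # revenue over the whole period
average_revenue <- mean(sales$revenue)                     # typical monthly revenue
best_month      <- sales$month[which.max(sales$revenue)]   # month with the highest revenue

print(summary(sales$revenue))  # minimum, quartiles, mean, and maximum in one call
```

Every figure here describes what has already happened; nothing is projected forward.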
Predictive analytics is a more advanced method of data analysis that uses probabilities
to draw inferences or make predictions about the future.
This type of analytics is concerned with predicting future value or performance, and is
also known as 'Forward-looking analytics'.
In addition, it uses various other statistical modelling and machine learning techniques
to foresee the occurrence or likelihood of an event.
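One common statistical modelling technique for this purpose is linear regression. The sketch below is only an illustration: the data frame `history`, its columns, and its values are hypothetical, and a real model would need far more data and proper validation.

```r
# Hypothetical history: advertising spend and resulting sales (illustrative only).
# The values are constructed so that sales = 5 + 2 * ad_spend.
history <- data.frame(
  ad_spend = c(10, 20, 30, 40, 50),
  sales    = c(25, 45, 65, 85, 105)
)

# Fit a simple linear regression of sales on advertising spend
model <- lm(sales ~ ad_spend, data = history)

# Predict sales for a spend level that has not been observed yet
forecast <- predict(model, newdata = data.frame(ad_spend = 60))
print(forecast)
```

The fitted model learns the pattern in the historical data and extends it to a new input, which is the core idea behind forward-looking analytics.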
Prescriptive analytics tries to simulate the future under the set of assumptions provided by the user.
It applies the gathered information from descriptive and predictive analytics to the
decision making process. Prescriptive analytics is considered more sophisticated than
descriptive and predictive analytics.
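A minimal sketch of the prescriptive idea, assuming a made-up linear demand curve: given the user's assumption about how demand responds to price, base R's `optimize()` searches for the price that maximises revenue. The functions `demand` and `revenue` and their coefficients are hypothetical.

```r
# Assumed (hypothetical) demand curve: units sold fall linearly as price rises
demand  <- function(price) 100 - 2 * price
revenue <- function(price) price * demand(price)

# Search the price range 0 to 50 for the price that maximises revenue
best <- optimize(revenue, interval = c(0, 50), maximum = TRUE)

print(best$maximum)    # recommended price
print(best$objective)  # revenue expected at that price
```

The recommendation is only as good as the assumed demand curve, which in practice would come from descriptive and predictive analysis.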
By employing all three types of analytics, organizations can move from understanding
the past (descriptive) to predicting the future (predictive) and prescribing optimal
actions (prescriptive). Descriptive analytics lays the foundation by providing insights into
historical data, predictive analytics helps anticipate future trends, and prescriptive
analytics guides decision-makers to make optimal choices based on data-driven
recommendations. These types of analytics work together to enable organizations to
gain a comprehensive understanding of their operations, make more accurate forecasts,
and drive effective decision-making in a data-driven world.
1.5 Application of analytics
The application of business analytics is diverse and spans across multiple functional
areas within an organization.
1.5-1 Finance
The significance of data analytics in finance is growing rapidly, with ample data flooding
the financial services industry over the past five years.
From the accounting team of a small business to a finance giant, everyone has more
financial data than they know how to use well.
Business analytics enables companies to identify existing trends or problems and make
inferences for the future.
Finance firms apply business analytics to three major areas: consumer insights,
algorithmic trading, and fraud detection.
Businesses globally use data analytics to improve their internal functions or operations.
Some instances are: Data analytics has reduced human error from daily financial
transactions; it enables finance executives to turn structured or unstructured data into
insights that promote better decision making; it helps finance teams gather needed data
to gain a clear view of Key Performance Indicators (KPIs) like revenue generated, net
income, payroll costs; it allows finance teams to scrutinise and comprehend vital
metrics, and detect fraud in revenue turnover.
Additionally, business analytics has improved stock markets and upgraded investment-
related decision making.
Finance data analysts often are knowledgeable and proficient in skills related to various
topics like Data mining, Financial analytics, Understanding business models, Financial
forecasting, Creating financial models, Risk management, Big data analytics, Advanced
analytics, Data management, Predictive analytics, Microsoft Excel, Algorithms and
algorithmic trading, Python, Automation, Data science, Business intelligence, Machine
learning, Artificial intelligence, and Real-time data flows.
1.5-2 Marketing
o Optimising prices: Analytics keeps track of competitor prices and inflation rates
to predict the purchasing power of the customer, enabling marketers to
optimise their prices accordingly and suggesting measures to justify those
prices.
Benefits of HR analytics include:
o Identifying Skill Gaps: HR analytics plays a crucial role in identifying skill gaps
within the organization by utilising data visualization and automation tools. HR
teams can assess skills, highlight areas of strength and weakness, identify areas
requiring upskilling or training, and develop targeted learning programs to bridge
gaps.
Analytics plays a significant role in managing patients in the healthcare sector. It broadly
covers areas like:
o Disease prevention: On the basis of genetics and past history, analytics can
recognise or predict probable issues before they actually arise. By recognising
early signs of a disease, analytics can help prevent it from becoming incurable.
The application of analytics is not limited to the abovementioned areas; it has much
wider applicability.
In the expanded model, Silverwind is not only guaranteeing the performance of the
product but also helping their customers learn how to best use it.
Descriptive analytics helped Silverwind drive up the sale of windmills and their parts by
looking at transaction data.
1.7 Summary
The concept of 'Business' comes from the sixteenth century, meaning any commercial
activity of making one's living or making money by producing or buying and selling
products.
The term 'business analytics' dates back to the late 1990s and has its
roots in scientific management, especially the ideas of Taylor, Gantt, and the Gilbreths;
the Western Electric Hawthorne studies of Mayo and Roethlisberger; and operations
research by Dantzig, Koopmans, von Neumann, and many others.
The architectural framework of business analytics contains four key elements: Data
Sources, Business Data Transformation, Platform and Tools, and Business Analytics
Applications.
The terms 'Analysis' and 'Analytics' are often used interchangeably, but there are
significant differences.
Prescriptive analytics is a method that advises the user on all possible actions to
optimize the overall objective.
Business analytics is applicable to numerous areas. The most significant areas mentioned
are Finance, Human resources, Marketing, and Health care.
UNIT 3
4.1 INTRODUCTION TO R
R is open-source software, freely available under the GNU General Public License.
It also offers integrated software facilities for data manipulation, data visualisation, data
storage, and handling.
R is often viewed as more than just a statistical system; it's an environment that can be
easily extended through packages.
It initially came with about eight packages, but many more can be made available
through CRAN (Comprehensive R Archive Network).
RStudio makes it easier for statisticians and data scientists to work with R.
RStudio is broadly divided into four panes: Source Editor, Console, Environment, and
Plots.
o Source Editor: Located in the top left corner, users can open, edit, and execute
various code-related or data files. Additional opened files will be added as a new
tab.
o Console: Located at the bottom left, this is the command line of RStudio where
code is executed immediately. It is the input window of RStudio.
o Environment: Located in the top right, this pane shows the various objects such
as data frames, arrays, and variables that a user has in their workspace. It also
displays the values for objects.
o Plots: Located at the bottom right, this is the output window where graphs and
plots created in RStudio are displayed. This pane also has tabs for Files, Packages,
Help, Viewer, and Presentation.
R is the programming language itself, used for statistical computing and graphics. It is
platform-independent and runs on the major operating systems, including Windows,
macOS, and Linux.
RStudio is an integrated development environment (IDE) built around R. R on its own is
the less elaborate of the two; RStudio adds a more elaborate, user-friendly environment.
Versatile Data Handling: R can perform operations on various data structures like arrays,
matrices, vectors, and other data objects.
Data Wrangling: R can collect data, perform data cleansing (detecting and
removing/correcting inaccurate or corrupt records), and convert raw data into a desired
format.
Powerful Graphics: It has a large collection of graphical libraries that produce high-
quality static or dynamic graphs.
Highly Active Community: Being an open-source language, R has a large and active
community that produces many ideas and helps users.
Machine learning capabilities: R can be useful for machine learning, sentiment analysis,
and predictive modelling.
4.4 GETTING STARTED WITH R
To work with R, users must install the following software and packages:
In a package, pre-written functions can be imported into a program, saving the user from
writing the entire code.
The term 'package' is distinct from 'code' (an initial command or instruction) and 'library'
(a collection of packages).
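As a brief sketch of how packages are used in practice: `install.packages()` downloads a package from CRAN once, and `library()` attaches an installed package for the current session. The package names below are examples only; the install line is commented out because it needs an internet connection.

```r
# One-time download of a package from CRAN (needs an internet connection):
# install.packages("ggplot2")

# Attach an installed package so that its functions can be called by name.
# 'stats' ships with R, so this line works without any download.
library(stats)

# A pre-written function from the package, used instead of hand-written code
middle <- median(c(7, 3, 5))
print(middle)
```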
To import data, click 'Import Dataset' in the 'Environment' tab on the upper right side of
RStudio. A drop-down menu will appear showing multiple file type options (text, Excel,
SPSS, SAS, Stata, etc.).
Data structures facilitate data analysis operations and help in manipulating data efficiently.
R has various types of data structures, some of which require the same type of data
(homogeneous) while others accept multiple data types (heterogeneous).
The main data structures in R are Vectors, Matrices, Arrays, Lists, Factors, and Data
Frames.
4.5.1 Vector
o Examples are shown for creating vectors with numeric, logical, and character
data. Creating a vector with mixed data types does not raise an error; instead, R
coerces all elements to a single common type (for example, character).
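A minimal sketch of vector creation with `c()`; the variable names and values are illustrative. Note that in R, combining different types in one vector silently converts every element to a common type, so the result stays homogeneous.

```r
numeric_vec   <- c(1.5, 2, 3.25)        # numeric (double) vector
logical_vec   <- c(TRUE, FALSE, TRUE)   # logical vector
character_vec <- c("red", "green")      # character vector

# Mixing types: R coerces all elements to one common type
mixed_vec <- c(1, "a", TRUE)

print(class(numeric_vec))
print(class(mixed_vec))   # every element has become a character string
```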
4.5.2 Matrix
o Matrices can be created using the matrix() syntax, specifying data, number of
rows (nrow), number of columns (ncol), and arrangement direction (byrow).
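The arguments named above can be sketched as follows; the values 1 to 6 are arbitrary illustration data.

```r
# Fill a 2 x 3 matrix row by row with the values 1 to 6
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m)

# With byrow = FALSE (the default) the same values fill column by column
m_col <- matrix(data = 1:6, nrow = 2, ncol = 3)
print(m_col)
```

Comparing the two printed matrices shows how `byrow` changes only the arrangement of the values, not the dimensions.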
4.5.3 Array
o An array is a data structure designed to hold multiple items of the same type
together.
o In the context of R programming, arrays are objects capable of storing data with
two or more dimensions.
o Arrays can store values having only a similar kind of data type, thus they are
homogeneous.
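A short sketch of a three-dimensional array built with `array()`; the dimension sizes and values are arbitrary choices for illustration.

```r
# A 3-dimensional array: 2 rows x 3 columns x 2 layers, filled with 1 to 12
a <- array(1:12, dim = c(2, 3, 2))

print(dim(a))       # the three dimension sizes
print(a[1, 2, 2])   # element in row 1, column 2 of the second layer
print(typeof(a))    # all elements share one type
```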
4.5.4 Lists
o Unlike atomic vectors, the contents of a list are not restricted to a single mode
and can encompass any mixture of data types.
o The elements of a list can be of any type of R object, even lists containing further
lists.
o Examples show creating lists with mixed data types (numeric, character, logical,
complex) and lists containing named elements.
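The following sketch illustrates both kinds of list described above; the names and values (such as `employee` and "Asha") are hypothetical.

```r
# A list mixing numeric, character, logical, and complex values
mixed <- list(42, "analytics", TRUE, 1+2i)

# A list with named elements, including a nested list
employee <- list(
  name   = "Asha",
  age    = 30,
  skills = list("R", "SQL")
)

print(class(mixed[[4]]))      # type of the fourth element
print(employee$name)          # access an element by name
print(employee$skills[[2]])   # reach into the nested list
```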
4.5.5 Factors
o They are useful in columns that have a limited number of unique values.
o Factors can be created using the factor() function, which takes vectors as inputs.
o An example shows creating a factor from a vector of genders and children labels,
displaying the unique categories as 'Levels'.
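A minimal sketch along the lines of that example; the exact labels used in the source are not shown, so the vector `labels` below is an assumption.

```r
# Hypothetical category labels (the source's exact values are not known)
labels <- c("male", "female", "children", "female", "male")

f <- factor(labels)   # factor() takes a vector as input

print(levels(f))      # the unique categories, stored as 'Levels'
print(table(f))       # how often each category occurs
```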
4.5.6 Data Frames
o In a data frame, each column holds the values of a specific variable, while
each row holds one value from every column, together forming a single record.
o A data frame is a special type of list where every element has the same length.
o An example shows creating a data frame for employee information (id, name,
department) using the data.frame() function.
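A sketch of such a data frame; the employee names and departments are made up for illustration.

```r
# Hypothetical employee records: one column per variable, one row per employee
employees <- data.frame(
  id         = c(1, 2, 3),
  name       = c("Asha", "Ben", "Chen"),
  department = c("Finance", "HR", "Marketing")
)

print(employees)
print(nrow(employees))            # number of records
print(employees$department[2])    # one value from one column
print(is.list(employees))         # a data frame is a special kind of list
```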
Company XYZ, a retail company, wants to analyse customer purchase data to gain
insights into customer behaviour and optimise marketing strategies.
They collect transaction data including date, customer ID, product ID, average quantity
purchased, and average price.
To perform the analysis, the company decides to use R and its data structures.
The first step is to import the customer purchase data into R using functions like
read.csv() or read.table() to read data from CSV or text files into a data frame. The
example shows loading the data into a data frame named purchase_data.
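The import step might look like the sketch below. To keep it self-contained, the CSV content is supplied as text; the column names, the values, and the file name `purchases.csv` are all hypothetical.

```r
# Stand-in for the contents of a CSV file (hypothetical values)
csv_text <- "date,customer_id,product_id,average_quantity,average_price
2023-01-05,C001,P10,2,19.99
2023-01-06,C002,P11,1,5.50
2023-01-07,C001,P10,3,19.99"

# With a real file this would be read.csv("purchases.csv")
purchase_data <- read.csv(text = csv_text)

str(purchase_data)   # column names and types of the resulting data frame
```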
Exploratory Data Analysis (EDA): The company wants to understand the structure of the
data and explore some basic statistics using R functions and data structures. Examples
include checking the dimensions (dim()), viewing the first few rows (head()), and
calculating summary statistics (summary()).
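The three functions can be tried on a small stand-in for `purchase_data`; the rows below are hypothetical.

```r
# Hypothetical stand-in for the imported purchase data
purchase_data <- data.frame(
  product_id       = c("P10", "P11", "P10", "P12"),
  average_quantity = c(2, 1, 3, 5),
  average_price    = c(19.99, 5.50, 19.99, 2.25)
)

print(dim(purchase_data))           # number of rows and columns
print(head(purchase_data, n = 2))   # first two rows
print(summary(purchase_data))       # per-column summary statistics
```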
Data Manipulation: The company wants to calculate total sales and average quantity
purchased for each product. R's data manipulation capabilities like subsetting, grouping,
and aggregation can be used. An example shows subsetting the data frame to select
'product_id', 'average_price', and 'average_quantity'.
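A minimal sketch of the subsetting, grouping, and aggregation steps, using base R. The data are hypothetical, and treating each row's sales value as `average_price * average_quantity` is an assumption made for illustration, not something the source specifies.

```r
# Hypothetical stand-in for the imported purchase data
purchase_data <- data.frame(
  customer_id      = c("C001", "C002", "C001", "C003"),
  product_id       = c("P10", "P11", "P10", "P12"),
  average_quantity = c(2, 1, 3, 5),
  average_price    = c(20, 5, 20, 2)
)

# Subsetting: keep only the columns needed for the product analysis
product_data <- purchase_data[, c("product_id", "average_price", "average_quantity")]

# Assumption: sales value of a row = average price x average quantity
product_data$sales <- product_data$average_price * product_data$average_quantity

# Grouping and aggregation: one result row per product
total_sales  <- aggregate(sales ~ product_id, data = product_data, FUN = sum)
avg_quantity <- aggregate(average_quantity ~ product_id, data = product_data, FUN = mean)

print(total_sales)
print(avg_quantity)
```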
Data Visualization: To gain insights, the company wants to create visualisations using R
packages and data structures. The ggplot2 package can be used to create plots.
Examples show creating histograms of 'average_price' and 'average_quantity'.
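The sketch below draws the two histograms with base R's `hist()` rather than ggplot2, so that it runs without installing any extra package; the ggplot2 equivalent is given in a comment. The data are hypothetical.

```r
# Hypothetical stand-in for the imported purchase data
purchase_data <- data.frame(
  average_quantity = c(2, 1, 3, 5, 2, 4),
  average_price    = c(20, 5, 20, 2, 8, 12)
)

# Base R histograms (hist() also returns the bin counts it plotted)
price_hist <- hist(purchase_data$average_price,
                   main = "Distribution of average price",
                   xlab = "Average price")
quantity_hist <- hist(purchase_data$average_quantity,
                      main = "Distribution of average quantity",
                      xlab = "Average quantity")

# Equivalent with ggplot2, if the package is installed:
# library(ggplot2)
# ggplot(purchase_data, aes(x = average_price)) + geom_histogram(bins = 5)
```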
Insights and Decision Making: Based on the analysis and visualisations, the company
can gain insights into customer purchasing patterns and make data-driven decisions. This
can help identify top-selling products, understand purchasing patterns, and strategise
marketing efforts.