
Business Analytics Notes

Business analytics involves using data and statistical methods to analyze business operations, transforming raw data into actionable insights that enhance decision-making and performance. It encompasses various methodologies, including descriptive, predictive, and prescriptive analytics, which help organizations understand past performance, predict future outcomes, and optimize strategies. The field has evolved from early management theories to incorporate advanced techniques like machine learning and data visualization, making it accessible to businesses of all sizes.

Uploaded by

pranjal.pp22536
Copyright © All Rights Reserved

Chapter 1: Introduction to Business Analytics

1.1 Introduction

• In general, 'business analytics' is the application of analytics to business operations and problems.

• Businesses today generate an overwhelming amount of data from various sources such as social media, customer interactions, and financial transactions. This is often referred to as 'big data'.

• Big data presents both opportunities and challenges for businesses.

• Business analytics is a valuable practice that uses data and statistical methods to analyse business operations and decisions.

• It comprises the systematic exploration, interpretation, and communication of significant trends and patterns within data to improve operational efficiency, drive business performance, and gain a competitive advantage.

• The primary role of business analytics is to transform raw data into meaningful, actionable insights.

• This transformation enables businesses to make data-driven decisions that improve customer experience, optimise processes, enhance profitability, and minimise risks.

• It allows businesses to understand their past performance, predict future outcomes, and prescribe optimal strategies for success.

• Business analytics spans a wide range of methodologies and techniques. It involves collecting, cleaning, and transforming data from various sources into one integrated dataset.

• Statistical analysis and data mining techniques are then applied to explore correlations, patterns, and trends within the data.

• These findings are often visualised with data visualisation tools, which enable stakeholders to understand complex data easily and effectively.

• Business analytics also leverages machine learning algorithms and predictive models to forecast future outcomes. Here, historical data is analysed to identify existing patterns, enabling businesses to anticipate market opportunities and the risks involved.

• These insights empower businesses to devise their strategies proactively.

• The application of business analytics is not limited to large corporations; it is also relevant to small and medium-sized enterprises (SMEs), which can benefit from using data to drive their decision-making.

• Technological advancements and the availability of user-friendly analytics tools have made it easier for businesses of all sizes to harness data-driven insights.

1.1-1 History of the term 'Business analytics'

• The term "business analytics" emerged in the late 1990s.

• Its roots can be traced back to earlier fields such as scientific management and related methodologies, whose ideas and practices paved the way for modern business analytics.

• Modern business analytics leverages machine learning, advanced data analysis techniques, and AI to obtain insights from data.

• Figure 1 in the source illustrates the roots of the term 'Business analytics'. These include:

o Scientific management by F.W. Taylor. Scientific management, especially the ideas of Taylor, Gantt, and the Gilbreths, focused on increasing efficiency through the scientific analysis of work. Taylor believed that even a small activity, such as loading paper sheets into boxcars, could be scientifically planned to save time and human energy. His recommendation to use scientific analysis to improve organisational efficiency and effectiveness can be seen as a precursor to the data-driven approach of modern business analytics.

o Hawthorne studies by Elton Mayo. The Western Electric Hawthorne studies of Mayo and Roethlisberger explored the relationship between productivity and lighting conditions, and this work bears a significant relation to the discipline of business analytics. The researchers found that the mere act of observing and studying individuals in an experimental setting had a significant impact on their behaviour and performance. Insights from these experiments prompted practitioners to explore how various factors influence human behaviour and productivity. Drawing on the principles of the Hawthorne studies, the field of business analytics emerged to systematically analyse and interpret large amounts of data, gain valuable insights into organisational performance, and drive logical decisions.

o Operations research by George B. Dantzig, Koopmans, von Neumann, and many others. Pioneers such as Dantzig laid the groundwork for business analytics by developing various algorithms and methodologies. Dantzig is renowned for developing the simplex algorithm, which revolutionised the field of optimisation. It provided a systematic and efficient approach to solving complex optimisation problems, enabling businesses to make better decisions about production planning, resource allocation, and logistics. The simplex algorithm, along with other optimisation techniques developed by Dantzig and his contemporaries, formed the basis for applying quantitative analysis to business problems. Using mathematical algorithms and models, researchers were able to tackle a wide range of challenges, including inventory management, production scheduling, and transportation optimisation.

1.1-2 Architectural framework of business analytics

• The architectural framework of business analytics represents the structure and flow of data, from its sources through the data analysis tools and technologies involved in the analytics ecosystem.

• Using this framework, organisations can develop a comprehensive and well-structured approach to harnessing the power of analytics.

• The framework contains four key elements:

1. Data Sources: Data can be obtained from both internal and external sources. Internal sources refer to information obtained from within the organisation, such as the firm's internal data repositories, transactional systems, databases, Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) systems, and other records. These provide valuable insights into the organisation's operations, performance, customer interactions, and more. External sources refer to publicly available data or data available from other stakeholders such as government agencies, research institutions, industry associations, social media platforms, market research firms, news outlets, customers, suppliers, manufacturers, industry, media, and logistics partners.

2. Business Data Transformation: To perform analytics, data needs to be pooled and organised in a structured manner. Data extracted from various sources goes through several stages of preparation before it can be used for analytics. These stages are:

   • Data Integration: Combining and consolidating data from different sources into a unified dataset by extracting data from various systems or files, transforming it into a consistent format, and merging it to create a comprehensive view.
   • Data Cleaning: Identifying and resolving inconsistencies, errors, missing values, and outliers to ensure data quality and accuracy and to make the data reliable for analysis. Techniques include data validation, outlier detection, and error correction.
   • Data Codification: Assigning codes or categories to the data to enable easier analysis and comparison, such as coding survey responses or categorising text data for statistical or sentiment analysis.
   • Data Transcription: Organising data into a structured, standardised format of tables, rows, and columns, ensuring consistent data types and a clear structure for analysis. Standardised data integrates more easily with analytics tools, databases, and external systems.

   Through these stages, both internal and external data can be organised and transformed into a format suitable for analytics. Structured and standardised data provides a solid foundation for applying various analytical techniques.

3. Platform and Tools: After data transformation, various tools and techniques are used to perform the three types of analytics (descriptive, predictive, and prescriptive). These include programming languages, statistical tools, business intelligence tools for visualisation, machine learning and data mining tools, data integration tools, and cloud-based analytics tools. Commonly used platforms and tools include programming languages such as R and Python; statistical tools such as MATLAB, STATA, SPSS, and RStudio; and business intelligence tools such as Microsoft Power BI, Tableau, and QlikView.

4. Business Analytics Applications: These applications use analytical tools such as data mining, OLAP (online analytical processing), queries, and reports to give data-driven suggestions that facilitate managerial decision-making. Examples include:
   • Data Mining: exploring large databases, for example to identify companies with a high turnover ratio, using techniques such as clustering, classification algorithms, and association rules.
   • Queries and OLAP: retrieving specific information, analysing the extracted data from different perspectives, and generating visualisations and reports across various dimensions.
   • Reporting: creating reports that highlight the identified companies, compare their turnover ratios, and include performance indicators such as revenue generated, net income, and payroll costs.
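The flow through elements 2 to 4 can be sketched in Python, one of the languages named under Platform and Tools. This is a minimal illustration, not a production pipeline. The two data sources, the sector codes, and the use of asset turnover (revenue divided by total assets) as the 'turnover ratio' are all hypothetical assumptions, and only the standard library is used.

```python
# A minimal sketch of the framework's data flow (elements 2-4), assuming two
# hypothetical sources: an internal ERP extract and an external reference file.

# Hypothetical internal records (raw strings, as often exported from files)
erp_rows = [
    {"company": "Alpha", "sector": "mfg", "revenue": "1000", "assets": "800"},
    {"company": "Beta", "sector": "ret", "revenue": "900", "assets": ""},  # missing value
    {"company": "Gamma", "sector": " RET", "revenue": "2500", "assets": "1000"},
]
# Hypothetical external records keyed by company name
external_rows = {"Alpha": {"region": "North"}, "Beta": {"region": "North"}, "Gamma": {"region": "South"}}

# Illustrative (hypothetical) codification scheme for sectors
SECTOR_CODES = {"mfg": 1, "ret": 2}


def integrate(internal, external):
    """Data integration: merge internal and external records into one dataset."""
    return [{**row, **external.get(row["company"], {})} for row in internal]


def clean(rows):
    """Data cleaning: drop incomplete records and normalise inconsistent text."""
    cleaned = []
    for row in rows:
        if not row["revenue"] or not row["assets"]:
            continue  # skip the record rather than invent a value
        cleaned.append({**row, "sector": row["sector"].strip().lower()})
    return cleaned


def codify_and_structure(rows):
    """Codification and transcription: assign sector codes and fix column types."""
    return [
        {
            "company": r["company"],
            "region": r.get("region", "Unknown"),
            "sector_code": SECTOR_CODES[r["sector"]],
            "revenue": float(r["revenue"]),
            "assets": float(r["assets"]),
        }
        for r in rows
    ]


def high_turnover(rows, threshold=1.5):
    """Query/report: companies whose asset turnover (revenue / assets) exceeds a threshold."""
    report = []
    for r in rows:
        ratio = r["revenue"] / r["assets"]
        if ratio > threshold:
            report.append((r["company"], round(ratio, 2)))
    return report


dataset = codify_and_structure(clean(integrate(erp_rows, external_rows)))
report = high_turnover(dataset)
for company, ratio in report:
    print(f"{company}: turnover ratio {ratio}")
```

Here Beta is dropped during cleaning because its assets value is missing, and Gamma's inconsistent sector label " RET" is normalised before coding. A real pipeline would more likely use a library such as pandas and decide case by case whether to drop or impute missing values.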

1.2 Definition

• Various authors have defined analytics differently.

• Jalali & Park (2018) defined it as the "application of models, methods, and tools to the analysis of data to gain insight to make informed decisions".

• Barga et al. (2015) defined analytics as "the use of data, information technology, statistical analysis, quantitative methods, and mathematical computer-based models and visualization to help decision-makers gain improved insight about stakeholders (such as customers, suppliers, etc.) and make better, fact-based decisions".

• Fitz-enz & Mattox (2014) defined analytics as a "communications device, bringing together information from multiple sources to provide an actionable representation of a current state and a likely future".

• Analytics is a mental framework, a logical progression, and a set of statistical tools.

• Davenport and Harris (2007) described analytics as the "extensive use of data, statistical and quantitative analysis, exploratory and predictive models, and fact-based management to drive decisions and actions".

• Wilder and Ozgur (2015) defined analytics as "the application of processes and techniques that transform raw data into meaningful information to improve decision making".

• Boyd (2012) stated that "analytics is a scientific process of transforming data into insight for making better decisions".

• Nelson (2017) added that analytics is a "scientific process or discipline of fact-based problem-solving".

• Based on these definitions, business analytics can be summarised as the application of methods, models, and tools to data to gain insight and make informed, fact-based decisions.

1.3 Analysis and analytics

• The terms 'analysis' and 'analytics' are often used interchangeably, but there are significant differences between them.

• Analysis refers to the process of examining data to gain insights, understand existing patterns, and make logical decisions. It breaks down information, identifies trends, and draws meaningful inferences from the data. Analysis focuses on answering the "what" and "why" of a given problem or situation by investigating historical data or current observations. It applies techniques such as data visualisation, qualitative reasoning, and statistical analysis to draw conclusions from the available data.

• Analytics encompasses the extensive use of data, statistical methods, algorithms, and computational tools to extract insights, predict outcomes, and prescribe actions. It takes a more systematic and rigorous approach to decision-making by leveraging advanced techniques such as machine learning, predictive modelling, and optimisation. Analytics is often regarded as forward-looking, aiming to identify patterns and trends in data that can be used to make predictions and optimise future outcomes.

| Basis | Analysis | Analytics |
|---|---|---|
| Meaning | Refers to a detailed examination of the elements or structure of data or information. | Refers to a systematic computational analysis of data or statistics using mathematics, logic, and science. |
| Scope | It is a wider term. | It is a narrower term. |
| Subset | Analysis includes analytics. | Analytics is a subset of analysis. |
| Steps involved | Data gathering, data validation, interpretation, analysis, results. | Identifying the problem, finding the data, data filtering, data validation, data cleaning, data visualisation, data analysis, inference, prediction. |
| Forecasting | Provides insights from the past to understand what has happened so far. | Explores past data to make appropriate decisions for the future. |
| Tools | Tables, Excel, Spark, Google Fusion Tables, NodeXL, etc. | R, Python, SAS, Google Analytics, Excel, etc. |

• Understanding the distinction is important because it enables organisations and individuals to use both concepts effectively in decision-making. By recognising the distinctive capabilities and aims of each approach, businesses can exploit the power of both analysis and analytics to optimise operations, drive innovation, and achieve their strategic goals.

1.4 Types of analytics

• Analytics is categorised into three main types: descriptive, predictive, and prescriptive analytics. Each type serves a distinct purpose in extracting insights and informing decision-making.

1.4-1 Descriptive analytics


• Descriptive analytics is about describing something. It answers "What has happened?" or "What is going on?".

• It is the process of summarising or describing an existing situation or data as it is, using various statistical or business intelligence tools and techniques.

• It uses historical or current data to identify trends and patterns.

• Descriptive analytics is used to gain a better understanding of the current situation.

• This type of analytics is not used to predict trends or draw inferences about the future.

• Descriptive analytics is often described as the simplest form of data analytics.

• The results of descriptive analytics are typically displayed using visual representations such as bar, line, and pie charts. These charts provide useful insights into the data and serve as a foundation for further analysis.

• Examples of descriptive analytics include:
  o summarising past data such as sales, production, and revenue data;
  o analysing users' social media statistics, such as engagement, penetration, and activity (posts, tweets, shares);
  o reporting general trends over time, such as the number of publications over the years or a share price over a period;
  o collating survey results, such as election polls and customer satisfaction surveys.
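As a minimal sketch of the first example (summarising past sales data), the snippet below uses Python's standard `statistics` module. The sales figures, months, and regions are hypothetical, and the text "bar chart" merely stands in for the visual charts mentioned above. Note that it only describes what happened; it makes no prediction.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical monthly sales records: (month, region, units sold)
sales = [
    ("2024-01", "North", 120), ("2024-01", "South", 80),
    ("2024-02", "North", 150), ("2024-02", "South", 95),
    ("2024-03", "North", 130), ("2024-03", "South", 110),
]


def summarise(records):
    """Describe what has happened: totals per month and region, plus central tendency."""
    by_month = defaultdict(int)
    by_region = defaultdict(int)
    for month, region, units in records:
        by_month[month] += units
        by_region[region] += units
    units = [u for _, _, u in records]
    return {
        "total": sum(units),
        "mean": mean(units),
        "median": median(units),
        "by_month": dict(by_month),
        "by_region": dict(by_region),
    }


summary = summarise(sales)
for month, total in summary["by_month"].items():
    print(month, "#" * (total // 20), total)  # crude text bar chart, one '#' per 20 units
```

Here the summary reports 685 units in total, a median of 115 units per record, and February as the strongest month.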

1.4-2 Predictive analytics

• Predictive analytics is concerned with predicting future outcomes based on historical or current data. It answers "What will happen?" or "What would happen?".

• It is a more advanced method of data analysis that uses probabilities to draw inferences or make predictions about the future.

• Because it is concerned with predicting future values or performance, it is also known as 'forward-looking analytics'.

• Like descriptive analytics, predictive analytics uses data mining techniques.

• In addition, it uses various statistical modelling and machine learning techniques to foresee the occurrence or likelihood of an event.

• Descriptive analytics acts as a base for predictive analytics.

• Examples of predictive analytics applications include:
  o predicting customer preferences based on shopping history;
  o predicting whether employees will stay or quit;
  o helping predict staff and resource requirements in hospitals;
  o predicting the prices of various stocks.
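A very simple instance of statistical modelling for prediction is fitting a straight-line trend to past data and extending it forward. The sketch below does this with an ordinary least-squares fit written out by hand. The quarterly sales history is hypothetical, and a linear trend is an assumed model chosen for illustration; real predictive analytics would typically test richer models.

```python
# Hypothetical history: (quarter number, sales)
history = [(1, 100.0), (2, 110.0), (3, 125.0), (4, 130.0), (5, 145.0)]


def fit_linear_trend(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = fit_linear_trend(history)
forecast_q6 = a + b * 6  # extrapolate the trend one quarter ahead
print(f"trend: sales = {a:.1f} + {b:.1f} * quarter; forecast for quarter 6: {forecast_q6:.1f}")
```

The fitted trend is sales = 89 + 11 × quarter, which gives a forecast of 155 for quarter 6. On Python 3.10 and later, `statistics.linear_regression` computes the same slope and intercept.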

1.4-3 Prescriptive analytics


• Prescriptive analytics advises the user on the possible actions that could optimise the overall objective. It answers "How can we make it happen?" or "What should the business do?".

• Prescriptive analytics uses the outputs of descriptive and predictive analytics to suggest alternative courses of action that maximise efficiency.

• It uses a combination of statistical models, algorithms, mathematical models, and similar methods to recommend the best course of action to the user.

• It tries to simulate the future under the set of assumptions provided by the user.

• Unlike predictive analytics, which indicates what is likely to happen, prescriptive analytics suggests the best or optimal action among a given set of alternatives.

• It applies the information gathered from descriptive and predictive analytics to the decision-making process. Prescriptive analytics is considered more sophisticated than descriptive and predictive analytics.

• Prescriptive analytics quantifies the consequences of each alternative under different future scenarios, which helps identify the best alternative.

• Prescriptive analytics can be invaluable for optimising operations, growing sales, and managing risk. To operate effectively, its models and algorithms need a solid data pipeline that keeps the data up to date and accurate.

• Examples of prescriptive analytics applications include:
  o helping doctors and healthcare decision-makers by suggesting the best course of action for patients;
  o helping marketing firms and salespeople be more precise in customer outreach by suggesting optimal marketing strategies, such as how to price products and design advertising campaigns;
  o helping shippers in the transportation industry, which generate large amounts of data on routes, times, and other constraints, by suggesting optimal transportation routes;
  o helping financial firms and traders use historical stock data and measures of risk to select the best investment option.

• By employing all three types of analytics, organisations can move from understanding the past (descriptive) to predicting the future (predictive) and prescribing optimal actions (prescriptive). Descriptive analytics lays the foundation by providing insights into historical data, predictive analytics helps anticipate future trends, and prescriptive analytics guides decision-makers towards optimal choices based on data-driven recommendations. Together, these types of analytics enable organisations to gain a comprehensive understanding of their operations, make more accurate forecasts, and drive effective decision-making in a data-driven world.

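The idea that prescriptive analytics "quantifies the consequences of each alternative under different future scenarios" can be sketched with an expected-value decision rule. This is one common approach, not the only one. All alternatives, payoffs, and scenario probabilities below are hypothetical, and the probabilities are assumed to come from a predictive model. The worst case for each alternative is also printed so that risk can be weighed alongside the expected payoff.

```python
# Hypothetical profit (in thousands) of each alternative under each future scenario
payoffs = {
    "expand capacity": {"high": 500, "normal": 200, "low": -150},
    "run overtime": {"high": 350, "normal": 180, "low": 20},
    "do nothing": {"high": 100, "normal": 100, "low": 60},
}
# Assumed scenario probabilities, e.g. supplied by a predictive model
probabilities = {"high": 0.3, "normal": 0.5, "low": 0.2}


def expected_value(outcomes, probs):
    """Probability-weighted payoff of one alternative across all scenarios."""
    return sum(probs[scenario] * value for scenario, value in outcomes.items())


def recommend(alternatives, probs):
    """Score every alternative and return the one with the highest expected payoff."""
    scores = {name: expected_value(outcomes, probs) for name, outcomes in alternatives.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = recommend(payoffs, probabilities)
worst_cases = {name: min(outcomes.values()) for name, outcomes in payoffs.items()}
print("Recommended:", best)
for name in scores:
    print(f"{name}: expected {scores[name]:.1f}, worst case {worst_cases[name]}")
```

Expanding capacity has the highest expected payoff (220 versus 199 and 92), but it is also the only option that can make a loss. A risk-averse decision-maker might therefore prefer overtime. Larger problems with many constrained decisions are usually handled with optimisation methods such as linear programming.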
1.5 Application of analytics

• The application of business analytics is diverse and spans multiple functional areas within an organisation.

• In marketing, it can be used to identify target segments, personalise campaigns, and optimise advertising spend. In operations, it can optimise supply chain management, improve demand forecasting, and streamline production processes. In finance, it can support financial planning, risk assessment, and fraud detection. Business analytics can also aid human resources by optimising workforce planning, employee engagement, and talent management.

Detailed applications are discussed below for finance, marketing, human resources, and healthcare.

1.5-1 Finance

• The significance of data analytics in finance is growing rapidly, with large volumes of data flooding the financial services industry over the past five years.

• From the accounting team of a small business to a financial giant, everyone has more financial data than knowledge of how best to use it.

• Business analytics enables companies to identify existing trends or problems and make inferences about the future.

• Finance firms apply business analytics to three major areas: consumer insights, algorithmic trading, and fraud detection.

• Businesses globally use data analytics to improve their internal functions and operations.

• Data analytics has revolutionised the finance industry.

• Some instances are:
  o it has reduced human error in daily financial transactions;
  o it enables finance executives to turn structured or unstructured data into insights that promote better decision-making;
  o it helps finance teams gather the data needed to gain a clear view of Key Performance Indicators (KPIs) such as revenue generated, net income, and payroll costs;
  o it allows finance teams to scrutinise and comprehend vital metrics and to detect fraud in revenue turnover.

• Business analytics has also improved stock market analysis and investment-related decision-making.

• Finance data analysts are often knowledgeable and proficient in topics such as data mining, financial analytics, understanding business models, financial forecasting, creating financial models, risk management, big data analytics, advanced analytics, data management, predictive analytics, Microsoft Excel, algorithms and algorithmic trading, Python, automation, data science, business intelligence, machine learning, artificial intelligence, and real-time data flows.
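Fraud detection often starts by flagging transactions that deviate sharply from normal behaviour. The sketch below applies a simple z-score screen using Python's `statistics` module. The transaction amounts and the threshold of 2.0 are hypothetical. A z-score rule is only an illustrative first filter, not a complete fraud-detection method, and flagged items would normally be reviewed by a person or passed to a more sophisticated model.

```python
from statistics import mean, stdev

# Hypothetical transaction amounts for one account over a period
amounts = [120.0, 95.5, 130.0, 110.0, 105.0, 2500.0, 115.0, 99.0]


def flag_outliers(values, z_threshold=2.0):
    """Return (position, value) pairs lying more than z_threshold sample
    standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


for position, value in flag_outliers(amounts):
    print(f"Review transaction #{position}: {value:.2f}")
```

Only the 2,500.00 transaction is flagged. With eight values, no point can have a sample z-score above (n - 1)/√n ≈ 2.47, so the common threshold of 3 would flag nothing here. This is why small samples need lower thresholds or more robust measures such as the median absolute deviation.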

1.5-2 Marketing

• Marketing and analytics go hand in hand.

• Some important applications of analytics for marketing firms include:

o CRM (Customer Relationship Management): Analytics helps marketing firms know their customers and understand their tastes and preferences. This lets marketers understand the customer's decision-making before purchase, which in turn improves the customer's journey.

o Brand positioning: Using growth statistics and customer base data, analytics enables marketers to position their brand appropriately in the market. For example, analytics can suggest a niche marketing strategy for a given popular product.

o Optimising prices: Analytics keeps track of competitor prices and inflation rates to predict customers' purchasing power, enabling marketers to optimise their prices accordingly and providing measures to justify those prices.

o Designing campaigns and advertisements: Business analytics helps marketing firms predict consumer behaviour, improve decision-making, and determine the ROI of marketing efforts. Data science and analytics provide an in-depth understanding of the target audience, identifying minor details of behaviour, preferences, cultural context, and other influencing factors that a marketer might miss.
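Determining the ROI of marketing efforts can be as simple as comparing the revenue a campaign brought in with what it cost. The sketch below ranks campaigns by ROI = (attributed revenue − spend) / spend. The campaign names and figures are hypothetical. Attributing revenue to a single campaign is itself an assumption, and some firms use gross margin rather than revenue in the numerator.

```python
# Hypothetical campaign results: (name, spend, attributed revenue)
campaigns = [
    ("Email", 2_000.0, 9_000.0),
    ("Social", 5_000.0, 12_000.0),
    ("Print", 4_000.0, 3_000.0),
]


def roi(spend, revenue):
    """Marketing ROI = (attributed revenue - spend) / spend."""
    return (revenue - spend) / spend


# Rank campaigns from highest to lowest ROI
ranked = sorted(((name, roi(s, r)) for name, s, r in campaigns), key=lambda t: t[1], reverse=True)
for name, value in ranked:
    print(f"{name}: ROI {value:.0%}")
```

This prints Email at 350%, Social at 140%, and Print at -25%, which suggests shifting budget away from print in this hypothetical case.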

1.5-3 Human resources

• Analytics offers benefits to HR firms, empowering them to make data-driven decisions to optimise their operations.

• Benefits include:

o Improving Talent Acquisition: Analytics enables HR managers to make informed decisions about talent acquisition by analysing data on recruitment channels, candidate profiles, and hiring outcomes. HR teams can identify effective sourcing strategies, optimise job postings, and select candidates better, leading to more efficient recruitment and a greater ability to attract top talent.

o Preventing Workplace Misconduct: Using analytics, HR firms can identify trends or common occurrences related to misconduct by analysing employee data and monitoring indicators such as complaints, performance issues, or disciplinary actions. HR teams can then proactively identify potential misconduct risks and take preventive measures to maintain a positive and compliant work environment.

o Staff Retention and Turnover Management: HR analytics allows organisations to calculate crucial metrics such as the staff turnover rate. By analysing historical turnover data and identifying patterns or factors contributing to high turnover, HR managers can implement strategies to increase staff retention, address underlying issues, improve employee engagement, and design targeted retention programmes based on data-driven insights.

o Measuring Key Performance Indicators (KPIs): Analytics enables HR firms to measure and track employees' key performance indicators. Using data on individual and team performance, HR managers can assess productivity, goal achievement, and other relevant metrics. This helps evaluate the effectiveness of performance management systems, identify top performers, and align performance with organisational goals. It also becomes possible to estimate the Return on Investment (ROI) of each employee.

o Identifying Skill Gaps: HR analytics plays a crucial role in identifying skill gaps within the organisation by using data visualisation and automation tools. HR teams can assess skills, highlight areas of strength and weakness, identify areas requiring upskilling or training, and develop targeted learning programmes to bridge the gaps.

o Creating an Engaging Work Environment: By analysing employee data, analytics provides insights into the workforce, including types of individuals, cultural dynamics, and work preferences. HR teams can better understand employee needs, preferences, and engagement levels, and use this information to design initiatives that foster a more engaging work environment, tailor employee experiences, and promote a positive company culture.

o Identifying Upskilling Opportunities: HR analytics allows HR managers to collect data on areas where employees might need upskilling. By analysing training data, performance metrics, and individual development plans, HR teams can identify gaps in skills and knowledge and offer targeted upskilling opportunities, ensuring that employees have up-to-date training and stay competitive.

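The staff turnover rate mentioned under Staff Retention and Turnover Management can be computed directly. The sketch below uses one common definition: separations during the period divided by average headcount, expressed as a percentage, with average headcount taken as the mean of the opening and closing headcounts. Organisations vary in how they define both terms. The department figures and the 20% alert threshold are hypothetical.

```python
def turnover_rate(separations, headcount_start, headcount_end):
    """Staff turnover rate (%) for a period, using one common definition:
    separations divided by average headcount."""
    average_headcount = (headcount_start + headcount_end) / 2
    return separations / average_headcount * 100


# Hypothetical figures for one year: (separations, headcount at start, headcount at end)
departments = {"Sales": (12, 50, 46), "Engineering": (4, 40, 44), "Support": (9, 30, 30)}

rates = {dept: round(turnover_rate(*figures), 1) for dept, figures in departments.items()}
high_turnover_depts = [dept for dept, rate in rates.items() if rate > 20]  # assumed alert level
print(rates)
print("Investigate:", high_turnover_depts)
```

Sales (25.0%) and Support (30.0%) exceed the assumed 20% level, so these two departments would be where HR managers look first for contributing factors.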
1.5-4 Healthcare

• Analytics plays a significant role in managing patients in the healthcare sector. It broadly covers areas such as:

o Diagnosis and treatment: A significant application is medical imaging used to diagnose disease and prescribe appropriate treatment. Algorithms interpret MRIs, X-rays, and similar scans to identify patterns or detect tumours and organ anomalies, and some have been reported to detect irregular heartbeats faster than a cardiologist.

o Drug discovery: Analytics helps pharmaceutical companies discover drugs for diseases that are hard to detect. For example, the AI company BenevolentAI uses artificial intelligence to help identify potential medicines for serious diseases.

o Disease prevention: Based on genetics and past medical history, analytics can recognise or predict probable issues before they arise. By recognising early signs of a disease, analytics can help prevent it from becoming incurable.

o Miscellaneous: Other applications include post-care monitoring, hospital operations, prescription auditing, predicting disease outcomes, tracking patient prescriptions, and identifying the risk of substance abuse.

• The application of analytics is not limited to the areas mentioned above; it has much wider applicability.

1.6 Case study on the Silverwind company

• In the expanded model, Silverwind not only guarantees the performance of its product but also helps its customers learn how best to use it.

• Descriptive analytics helped Silverwind drive up sales of windmills and their parts by examining transaction data.

• Predictive analytics helped Silverwind predict the performance level of a given piece of equipment under the contractual model, specifically which conditions would cause the equipment to perform up to the mark or underperform.

• Prescriptive analytics helped Silverwind prescribe alternative ways of operating the windmills to its customers to maximise the value derived.

1.7 Summary
• The concept of 'business' dates from the sixteenth century, meaning any commercial activity of making one's living or making money by producing, or buying and selling, products.

• The term 'business analytics' dates back to the late 1990s and has its roots in scientific management (especially the ideas of Taylor, Gantt, and the Gilbreths), the Western Electric Hawthorne studies of Mayo and Roethlisberger, and operations research by Dantzig, Koopmans, von Neumann, and many others.

• Various definitions of analytics exist; they generally describe it as the application of models, methods, and tools to data to gain insight and make informed, fact-based decisions.

• The architectural framework of business analytics contains four key elements: Data Sources, Business Data Transformation, Platform and Tools, and Business Analytics Applications.

• The terms 'analysis' and 'analytics' are often used interchangeably, but there are significant differences between them.

• Analytics is of three types: descriptive, predictive, and prescriptive.

• Descriptive analytics studies 'What has happened?' or 'What is going on?'.

• Predictive analytics is concerned with predicting future outcomes based on historical or current data.

• Prescriptive analytics advises the user on the possible actions that optimise the overall objective.

• Business analytics is applicable to numerous areas. The most significant areas mentioned are finance, human resources, marketing, and healthcare.


UNIT 3
4.1 INTRODUCTION TO R

• R is a programming language designed for statistical computing and graphics.

• It is open-source software, freely available under the GNU General Public License.

• R runs on various operating systems, including Windows, macOS, and Linux.

• R provides many statistical and graphical techniques, such as linear and nonlinear modelling, classical statistical tests, classification, and clustering.

• It also offers integrated software facilities for data manipulation, data storage and handling, and data visualisation.

• The term 'R environment' signifies a fully planned and coherent system rather than a loose collection of tools.

• R is often viewed as more than just a statistical system; it is an environment that can be easily extended through packages.

• The R distribution comes with about eight packages, and many more are available through CRAN (the Comprehensive R Archive Network).

4.2 INTRODUCTION TO R STUDIO

 RStudio is an integrated development environment (IDE) specifically designed to work


with R language.

 It provides a user-friendly interface.

 RStudio makes it easier for statisticians and data scientists to work with R.

 RStudio complements R by providing an efficient working environment for project


organisation, simplifying code writing, package management, and data visualisation.

 RStudio is broadly divided into four panes: Source Editor, Console, Environment, and
Plots.

o Source Editor: Located in the top left corner, this pane lets users open, edit, and execute code or data files. Each additional file opens in a new tab.

o Console: Located at the bottom left, this is RStudio's command line, where code is executed immediately. It is the main input window of RStudio.

o Environment: Located in the top right, this pane shows the objects, such as data frames, arrays, and variables, that a user has in the workspace, along with their values.

o Plots: Located at the bottom right, this is the output window where graphs and
plots created in RStudio are displayed. This pane also has tabs for Files, Packages,
Help, Viewer, and Presentation.

4.2.1 Understanding the Distinction: R vs. R Studio


 'R' and 'RStudio' are related terms often used interchangeably, but they serve different
purposes.

 R is the programming language itself, used for statistical computing and graphics. It is platform-independent and runs on any operating system for which an R distribution is available.

 RStudio is an interface, or Integrated Development Environment (IDE), for writing and running R code. It is designed specifically for the R language and requires R to be installed separately.

 R on its own offers a basic, less elaborate interface. RStudio is more elaborate and provides a more user-friendly environment on top of R.

4.3 ADVANTAGES OF R

R offers several advantages:

 Statistical Excellence: R is designed for statistics and is widely used by statisticians for statistical computation.

 Open-source: It can be used freely, without paying any licence fee.

 Extensive Libraries: R has a large collection of libraries offering numerous applications, including graphical libraries.

 Cross-platform Support: R is machine-independent, which facilitates cross-platform operations.

 Versatile Data Handling: R can perform operations on various data structures like arrays,
matrices, vectors, and other data objects.

 Data Wrangling: R can collect data, perform data cleansing (detecting and
removing/correcting inaccurate or corrupt records), and convert raw data into a desired
format.

 Powerful Graphics: It has a large collection of graphical libraries that produce high-
quality static or dynamic graphs.

 Highly Active Community: Being an open-source language, R has a large and active
community that produces many ideas and helps users.

 Parallel and Distributed Computing: R supports parallel computing (performing tasks simultaneously across multiple processors) and distributed computing (connecting multiple computers to act together).

 Interpreter-based execution: R is an interpreted language, so commands can be run one at a time without a separate compilation step.


 Compatible with other Programming Languages: R can work together with other languages such as Python and Java.

 Machine learning capabilities: R can be used for machine learning, sentiment analysis, and predictive modelling.

 Interaction with Databases: R has packages that enable interaction with databases such as Oracle and MySQL (e.g., the RMySQL package) and with data sources accessed through the Open Database Connectivity (ODBC) protocol.

4.4 GETTING STARTED WITH R

To work with R, users must install the following software and packages:

 R environment: Installation steps for Windows involve going to cran.r-project.org, clicking on "Download R for Windows", and following the download process.

 RStudio: Installation steps involve going to rstudio.com/download, scrolling down, and clicking on the download option for RStudio Desktop (Open-Source License).

4.4.3 Installing R Packages

 A package is a collection of code (functions, often with data and documentation) bundled together to provide specific features or functionality.

 Packages extend the capabilities of the software or programming language.

 A package's pre-written functions can be loaded into a program, saving the user from writing all of that code themselves.

 The term 'package' is distinct from 'code' (a command or instruction) and 'library' (a collection of installed packages; in R, the directory where installed packages are stored).

 Examples of R packages include ggplot2, shiny, reshape2, and dplyr.

 In RStudio, a package can be installed by typing code in the Console (e.g., install.packages("packagename") to install it, then library(packagename) to load it) or by using the 'Packages' tab in the bottom-right pane (the pane that also holds 'Plots').
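A minimal sketch of the Console route described above. The helper name ensure_package() is hypothetical (it is not part of base R): it installs a package from CRAN only when it is missing and then loads it with library(). The demonstration call uses "stats", which ships with R, so nothing is downloaded.

```r
# ensure_package() is a hypothetical helper, not a base R function.
# It installs a CRAN package only if it is missing, then loads it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs internet access
  }
  # character.only = TRUE lets library() accept the name stored in `pkg`
  library(pkg, character.only = TRUE)
  # Return TRUE (invisibly) if the package is now attached
  invisible(pkg %in% .packages())
}

# "stats" ships with R, so no download happens here
ensure_package("stats")

# The same call would install and then load a CRAN package such as ggplot2:
# ensure_package("ggplot2")
```

Although library() is called inside a function, the package stays attached for the rest of the session, because attaching modifies the session's search path.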

4.4.4 Importing Data in RStudio

 To import data, click on the 'Environment' tab on the upper right side of RStudio.

 Click on the 'Import Dataset' button.

 A drop-down menu will appear showing multiple file type options (text/CSV, Excel, SPSS, SAS, Stata, etc.).

 Select the option according to the file type.

 Browse to the file's location, select it, and click 'Import'.
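The same import can be done in code. For a CSV file, the base-R function is read.csv(). The sketch below assumes a small, made-up CSV written to a temporary file so that it runs anywhere; with real data, pass the path to your own file instead of csv_path.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Asha,82",
             "2,Ben,75",
             "3,Chen,91"), csv_path)

# Read the file into a data frame (replace csv_path with your own file path)
scores <- read.csv(csv_path)

str(scores)   # column names and the types read.csv() detected
head(scores)  # first rows of the imported data
```

For other file types the dialog relies on add-on packages (for example, readxl for Excel files), so those packages must be installed first.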

4.5 DATA STRUCTURES IN R

 Data structures are objects used to store data in an organised manner.

 They facilitate data analysis operations and help in manipulating data efficiently.

 R has various types of data structures, some of which require the same type of data
(homogeneous) while others accept multiple data types (heterogeneous).

 The main data structures in R are Vectors, Matrices, Arrays, Lists, Factors, and Data
Frames.

 4.5.1 Vector

o A vector is a homogeneous data structure, meaning it accepts data of the same kind.

o Data can be integer, numeric, logical, character, or complex.

o A vector will only allow one type of data.

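o Examples are shown for creating vectors with numeric, logical, and character data. Combining mixed data types in one vector does not produce an error; instead, R coerces all elements to a single common type (for example, numbers and logicals become character strings when mixed with text).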
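A short sketch of the vector behaviour described above, using c() to combine values and typeof() to inspect the stored type. The values are illustrative; the last lines show that mixing types is silently coerced rather than rejected.

```r
# Vectors holding a single type, created with c()
num_vec <- c(10.5, 20, 30)           # numeric (stored as double)
int_vec <- c(1L, 2L, 3L)             # integer (L suffix)
log_vec <- c(TRUE, FALSE, TRUE)      # logical
chr_vec <- c("red", "green", "blue") # character
cpx_vec <- c(1 + 2i, 3 - 1i)         # complex

typeof(num_vec)  # "double"
typeof(log_vec)  # "logical"

# Mixing types does not raise an error: R coerces every element
# to the most flexible common type
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
typeof(mixed)    # "character"

c(1, TRUE, FALSE)  # logicals become numbers: 1 1 0
```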

 4.5.2 Matrix

o A matrix is a two-dimensional data structure.

o It is homogeneous in nature, allowing the same type of data.

o If elements of multiple types are forced into a matrix, R coerces them all to a single common type.

o Matrices can be created using the matrix() syntax, specifying data, number of
rows (nrow), number of columns (ncol), and arrangement direction (byrow).

o Examples show creating matrices with elements arranged row-wise (byrow=TRUE) and column-wise (byrow=FALSE).
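The matrix() arguments listed above can be seen in a small sketch; the data 1:6 are illustrative.

```r
# 2 x 3 matrix filled row by row
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Same data filled column by column (the default, byrow = FALSE)
m_col <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Mixing types coerces every element to character
m_mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(m_mixed)  # "character"
```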

 4.5.3 Array

o An array is a data structure designed to hold multiple items of the same type
together.

o In the context of R programming, arrays are objects capable of storing data with two or more dimensions.

o Arrays can store values of only one data type, so they are homogeneous.

o An array consists of Array Index (location of element), Array Element (items stored), and Array Length (number of elements).

o An example demonstrates creating a 3x3x2 array using the array() syntax, combining two vectors.
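A sketch of the 3x3x2 array example, assuming two illustrative vectors of nine values each, so the 18 slots are filled exactly without recycling. Values fill the array column by column, one layer at a time.

```r
# Two vectors of nine values each (illustrative data)
v1 <- 1:9
v2 <- 10:18

# Combine them into a 3 x 3 x 2 array: two 3 x 3 matrices ("layers")
arr <- array(c(v1, v2), dim = c(3, 3, 2))

dim(arr)      # 3 3 2
length(arr)   # 18 elements in total (the array length)
arr[, , 2]    # the second 3 x 3 layer, holding 10 to 18
arr[2, 3, 1]  # element at row 2, column 3, layer 1 (array index): 8
```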

 4.5.4 Lists

o In R, lists act as containers.

o Unlike atomic vectors, the contents of a list are not restricted to a single mode
and can encompass any mixture of data types.

o Lists are considered heterogeneous.

o The elements of a list can be of any type of R object, even lists containing further
lists.

o The syntax to create a list is list().

o Examples show creating lists with mixed data types (numeric, character, logical,
complex) and lists containing named elements.
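A minimal sketch of the list examples described above; the element names and values are illustrative. It shows a list with mixed types, named elements, and a list nested inside another list.

```r
# A list can mix numeric, character, logical, and complex values
mixed_list <- list(42, "hello", TRUE, 2 + 3i)
str(mixed_list)

# Named elements, including a vector and a nested list
employee <- list(
  name    = "Asha",
  age     = 29,
  skills  = c("R", "SQL"),
  address = list(city = "Pune", pin = "411001")
)

employee$name            # access by name: "Asha"
employee[["skills"]][2]  # second skill: "SQL"
employee$address$city    # nested access: "Pune"
```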

 4.5.5 Factors

o Factors are used to categorise unique values in columns, such as "Male", "Female", "TRUE", "FALSE", etc., and store them as levels.

o They are useful in columns that have a limited number of unique values.

o Factors can be created from both strings and integers; internally, R stores each value as an integer code that refers to its level.

o Factors can be created using the factor() function, which takes vectors as inputs.

o An example shows creating a factor from a vector of gender and child labels, displaying the unique categories as 'Levels'.
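A sketch of the factor example; the label vector is made up for illustration. By default factor() sorts the levels alphabetically unless a levels argument is supplied.

```r
# A made-up vector of category labels
people <- c("Male", "Female", "Child", "Male", "Female", "Male")

people_f <- factor(people)
people_f              # prints the values, then "Levels: Child Female Male"

levels(people_f)      # unique categories: "Child" "Female" "Male"
nlevels(people_f)     # number of categories: 3
table(people_f)       # count of each level
as.integer(people_f)  # internal integer codes: 3 2 1 3 2 3
```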

 4.5.6 Data Frame

o A data frame is a structure similar to a two-dimensional array, representing a table with columns and rows.

o In a data frame, each column holds values for a specific variable, while each row contains one value from each column (a single observation).

o A data frame is a special type of list where every element has the same length.

o Characteristics of a data frame include:

 Column labels cannot be empty.

 Row labels should be unique.

 It can store multiple types of data, including factor, numeric, or character types, making it heterogeneous.

 Each column should contain the same number of data items.

o A data frame is created by importing data (e.g., using read.csv() or read.table()) or using the data.frame() function. The number of rows and columns can be calculated using nrow() and ncol().

o An example shows creating a data frame for employee information (id, name,
department) using the data.frame() function.

o To extract a specific column from a data frame, the syntax data_frame_name$column_name is used.
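A sketch of the employee data frame described above; the IDs, names, and departments are illustrative. stringsAsFactors = FALSE has been the default since R 4.0.0 and is stated here only so that older R versions behave the same way.

```r
# Employee information (illustrative values)
employees <- data.frame(
  id         = c(101, 102, 103),
  name       = c("Asha", "Ben", "Chen"),
  department = c("Finance", "HR", "Marketing"),
  stringsAsFactors = FALSE  # keep text columns as character
)

employees
nrow(employees)  # 3 rows
ncol(employees)  # 3 columns

# Extract one column with $ (returns a vector)
employees$name   # "Asha" "Ben" "Chen"

# Every column has the same length, but each can have its own type
sapply(employees, class)
```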

4.6 APPLICATION: BUSINESS CASE STUDY ON XYZ RETAIL COMPANY

 Company XYZ, a retail company, wants to analyse customer purchase data to gain
insights into customer behaviour and optimise marketing strategies.

 They collect transaction data including date, customer ID, product ID, average quantity
purchased, and average price.

 To perform the analysis, the company decides to use R and its data structures.

 The first step is to import the customer purchase data into R using functions like
read.csv() or read.table() to read data from CSV or text files into a data frame. The
example shows loading the data into a data frame named purchase_data.

 Exploratory Data Analysis (EDA): The company wants to understand the structure of the
data and explore some basic statistics using R functions and data structures. Examples
include checking the dimensions (dim()), viewing the first few rows (head()), and
calculating summary statistics (summary()).

 Data Manipulation: The company wants to calculate total sales and average quantity
purchased for each product. R's data manipulation capabilities like subsetting, grouping,
and aggregation can be used. An example shows subsetting the data frame to select
'product_id', 'average_price', and 'average_quantity'.

 Data Visualization: To gain insights, the company wants to create visualisations using R
packages and data structures. The ggplot2 package can be used to create plots.
Examples show creating histograms of 'average_price' and 'average_quantity'.

 Insights and Decision Making: Based on the analysis and visualisations, the company
can gain insights into customer purchasing patterns and make data-driven decisions. This
can help identify top-selling products, understand purchasing patterns, and strategise
marketing efforts.
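The case-study workflow can be sketched end to end in base R plus ggplot2. This is a minimal sketch under stated assumptions: the six transactions are made up, the column names (date, customer_id, product_id, average_quantity, average_price) are guessed from the description, and sales per row are assumed to be average_price multiplied by average_quantity. read.csv(text = ...) reads inline text; with a real file, pass its path instead.

```r
# 1. Import: made-up sample of XYZ's transactions (column names are assumptions)
purchase_data <- read.csv(text = "date,customer_id,product_id,average_quantity,average_price
2024-01-05,C01,P1,2,250
2024-01-05,C02,P2,1,1200
2024-01-06,C01,P3,5,40
2024-01-07,C03,P1,3,250
2024-01-07,C04,P2,2,1150
2024-01-08,C02,P3,4,45")

# 2. Exploratory data analysis
dim(purchase_data)      # 6 rows, 5 columns
head(purchase_data, 3)  # first three rows
summary(purchase_data)  # summary statistics for each column

# 3. Data manipulation: subset the three columns of interest
subset_data <- purchase_data[, c("product_id", "average_price", "average_quantity")]

# Assumed definition of sales for each transaction
subset_data$sales <- subset_data$average_price * subset_data$average_quantity

# Group by product: total sales and mean quantity
total_sales <- aggregate(sales ~ product_id, data = subset_data, FUN = sum)
avg_qty <- aggregate(average_quantity ~ product_id, data = subset_data, FUN = mean)
product_summary <- merge(total_sales, avg_qty, by = "product_id")

# Rank products by total sales (top seller first)
ranked <- product_summary[order(-product_summary$sales), ]
ranked

# 4. Visualisation: histograms with ggplot2, drawn only if it is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_price <- ggplot(purchase_data, aes(x = average_price)) +
    geom_histogram(bins = 5) +
    labs(title = "Distribution of average price", x = "Average price", y = "Count")
  p_qty <- ggplot(purchase_data, aes(x = average_quantity)) +
    geom_histogram(bins = 5) +
    labs(title = "Distribution of average quantity", x = "Average quantity", y = "Count")
  print(p_price)
  print(p_qty)
}
```

With this sample data, P2 comes out as the top-selling product by total sales. The guard around the plotting step keeps the rest of the script usable on a machine where ggplot2 has not been installed yet.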
