MODULE 7:
INTRODUCTION TO
DATA MINING
FUNDAMENTALS OF BUSINESS ANALYTICS
Learning Objectives:
At the end of this module, you should be able to:
1. Define data mining
2. Explain the importance of data mining
3. Identify some common approaches used in data mining
4. Understand the data mining process
5. Identify the benefits of data mining
6. Identify the different tools used in data mining
Data Mining (Predictive Analytics)
Definition:
Non-trivial extraction of implicit, previously unknown, and potentially
useful information from data.
Exploration and analysis, by automatic or semi-automatic means, of
large quantities of data in order to discover meaningful patterns.
Data mining is about explaining the past and predicting the future
by means of data analysis.
Data Mining
Data mining is the process of discovering insights from data.
It is the step of the analysis that follows data gathering and
preparation, and it involves a search of the data to identify
patterns and meaning.
Why is Data Mining important?
Data mining is a crucial component of successful analytics
initiatives in organizations. The information it generates can
be used in business intelligence (BI) and advanced
analytics applications that involve analysis of historical
data, as well as real-time analytics applications that
examine streaming data as it's created or collected.
Why is Data Mining important?
Effective data mining aids in various aspects of planning business strategies and
managing operations. That includes customer-facing functions such as marketing,
advertising, sales and customer support, plus manufacturing, supply chain
management, finance and HR.
Data mining supports fraud detection, risk management, cybersecurity planning
and many other critical business use cases. It also plays an important role in
healthcare, government, scientific research, mathematics, sports and more.
Approaches in Data Mining
Data Exploration and Reduction. This often involves identifying
groups in which the elements of the groups are in some way
similar. This approach is often used to understand differences
among customers and segment them into homogeneous groups.
Classification. Classification is the process of analyzing data to
predict how to classify a new data element. An example of
classification is spam filtering in an e-mail client.
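The spam filtering example above can be sketched with a tiny naive Bayes classifier. This is only an illustration under stated assumptions: the training messages, the `train` and `classify` helper names, and the choice of naive Bayes with Laplace smoothing are all hypothetical and are not taken from the module's sources.

```python
import math
from collections import Counter

# Hypothetical labeled training messages (invented for illustration).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the quarterly report", "ham"),
]


def train(examples):
    """Count how often each word appears in each class, and how many
    messages belong to each class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the highest log-probability for the text,
    using Laplace (add-one) smoothing for unseen words."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the prior probability of the class.
        score = math.log(class_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, class_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, class_counts))
print(classify("agenda for the report", word_counts, class_counts))
```

The model learns from messages whose class is already known and then predicts the class of a new message, which is the "classify a new data element" step in the definition. A real filter would need far more training data and better text handling.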
Approaches in Data Mining
Association. Association is the process of analyzing databases to
identify natural associations among variables and create rules for
target marketing or buying recommendations.
Cause-and-effect modeling. Cause-and-effect modeling is the
process of developing analytic models to describe the relationship
between metrics that drive business performance, such as
profitability, customer satisfaction, or employee satisfaction.
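The association approach described above is often illustrated with shopping baskets. The sketch below is a minimal example under stated assumptions: the baskets, the `pair_rules` helper, and the support and confidence thresholds are hypothetical. Here, support is the share of baskets containing both items, and confidence is the share of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (invented for illustration).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for item pairs that meet both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in pair_rules(BASKETS):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "customers who buy butter also buy bread" is the kind of output that could feed a buying recommendation. This sketch only considers pairs of items; production tools typically search larger item sets as well.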
Primary Stages in Data Mining Process
Data gathering. Relevant data for an analytics application is identified and
assembled. The data may be located in different source systems, a data warehouse
or a data lake, an increasingly common repository in big data environments that
contains a mix of structured and unstructured data. External data sources may also
be used. Wherever the data comes from, a data scientist often moves it to a data
lake for the remaining steps in the process.
Data preparation. This stage includes a set of steps to get the data ready to be
mined. It starts with data exploration, profiling and pre-processing, followed by data
cleansing work to fix errors and other data quality issues. Data transformation is also
done to make data sets consistent, unless a data scientist is looking to analyze
unfiltered raw data for a particular application.
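The data preparation stage above can be sketched in a few lines. This is an illustration under stated assumptions: the raw records, the `profile`, `to_float`, and `prepare` helpers, and the choice to fill missing values with the mean are all hypothetical, and a real project would choose its cleansing rules case by case.

```python
# Hypothetical raw customer records with typical data quality problems.
RAW = [
    {"id": "1", "region": " north ", "spend": "120.50"},
    {"id": "2", "region": "South", "spend": ""},       # missing value
    {"id": "2", "region": "South", "spend": ""},       # duplicate row
    {"id": "3", "region": "NORTH", "spend": "n/a"},    # unparseable value
    {"id": "4", "region": "south", "spend": "80"},
]


def to_float(value):
    """Parse a number, returning None when the text is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile(rows):
    """Profiling: count rows whose spend is missing or unparseable."""
    return sum(1 for row in rows if to_float(row["spend"]) is None)


def prepare(rows):
    """Cleanse and transform: drop duplicate ids, normalize region text,
    parse numbers, and fill missing spend with the mean (an assumption)."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": int(row["id"]),
            "region": row["region"].strip().lower(),
            "spend": to_float(row["spend"]),
        })
    known = [r["spend"] for r in cleaned if r["spend"] is not None]
    mean_spend = sum(known) / len(known)
    for r in cleaned:
        if r["spend"] is None:
            r["spend"] = mean_spend
    return cleaned


print("rows with unusable spend:", profile(RAW))
for record in prepare(RAW):
    print(record)
```

Profiling shows how much of the data is unusable before any cleansing is done, and the transformation step makes the region values consistent so that later analysis does not treat "NORTH" and " north " as different groups.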
Primary Stages in Data Mining Process
Mining the data. Once the data is prepared, a data scientist chooses the
appropriate data mining technique and then implements one or more algorithms
to do the mining. In machine learning applications, the algorithms typically must be
trained on sample data sets to look for the information being sought before they're
run against the full set of data.
Data analysis and interpretation. The data mining results are used to create
analytical models that can help drive decision-making and other business actions.
The data scientist or another member of a data science team also must
communicate the findings to business executives and users, often through data
visualization and the use of data storytelling techniques.
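The last two stages above can be sketched together: an algorithm is trained on a small sample, run against the full data set, and the results are summarized for reporting. This is only one possible illustration. The customer data, the segment labels, the `train_centroids` and `predict` helpers, and the choice of a nearest-centroid classifier are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled sample: (monthly_visits, monthly_spend) -> segment.
SAMPLE = [
    ((1, 20.0), "occasional"),
    ((2, 35.0), "occasional"),
    ((8, 150.0), "loyal"),
    ((10, 180.0), "loyal"),
]

# Hypothetical full data set, without labels.
FULL = [(1, 25.0), (9, 160.0), (3, 40.0), (7, 140.0), (2, 30.0)]


def train_centroids(sample):
    """Training: compute the average feature vector of each segment."""
    groups = defaultdict(list)
    for features, label in sample:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(features, centroids):
    """Assign the segment whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Mining the data: train on the sample, then score the full data set.
centroids = train_centroids(SAMPLE)
predictions = [predict(row, centroids) for row in FULL]

# Analysis and interpretation: summarize the results for reporting.
summary = Counter(predictions)
for segment, count in summary.items():
    print(f"{segment}: {count} customers")
```

The features here are left unscaled, so spend dominates the distance calculation; rescaling the features would normally be handled during data preparation. In practice the summary would usually be presented as a chart rather than printed counts.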
Benefits of Data Mining
More effective marketing and sales
Better customer service
Improved supply chain management
Increased production uptime
Stronger risk management
Lower costs
Together, these benefits can lead to higher revenue and profits.
Data Mining Tools and Software
Data mining tools are software programs that help frame and execute data mining techniques in order
to create and test data models. (Data models are abstract models that organize elements of data and
standardize how they relate to one another and to the properties of real-world entities.)
Data mining tools are available from a large number of vendors, typically as part of software
platforms that also include other types of data science and advanced analytics tools.
Key features provided by data mining software include data preparation capabilities, built-in
algorithms, predictive modeling support, a GUI-based development environment, and tools for
deploying models and scoring how they perform.
References:
Evans, J. (2016). Business Analytics (2nd ed.). Pearson.
Stedman, C. (n.d.). Data mining. TechTarget. Retrieved
January 20, 2023, from
https://www.techtarget.com/searchbusinessanalytics/definition/data-mining