
Data Mining and Business Analytics

The document discusses predictive analytics, emphasizing its role in predicting future trends and behaviors using measurable variables known as predictors. It outlines the knowledge discovery process in data mining and highlights various tools, including open-source options like R, Weka, Orange, and KNIME, for performing predictive data analysis. The paper aims to enhance understanding of predictive analytics to facilitate quick decision-making across different sectors.

Uploaded by

rnjagi12

Finding predictive information in large databases

11-19-2022
Abstract
Predictive analytics is the branch of data mining concerned with the prediction of future
probabilities and trends. The central element of predictive analytics is the predictor, a variable
that can be measured for an individual or other entity to predict future behavior. For example, an
insurance company is likely to take into account potential driving safety predictors such as age,
gender, and driving record when issuing car insurance policies.

Performing predictive data analytics on large data sets supports quick decision making and
forecasting based on results obtained from live or sample data. Data mining, the extraction of
hidden predictive information from large databases, is a powerful technology with great potential
to help companies and individuals focus on the most important information in their data
warehouses. Predictive data analytics can be applied in areas such as medicine, agriculture,
prediction of children's behavior, and customer behavior in a particular business. To that end,
this paper elaborates on the tools available for performing predictive data analytics and
introduces the importance of data mining. Predictive data analysis is done using variables, or
attributes, known as predictors.

Scope of the project


This paper describes how to perform predictive data analytics in data mining using various tools.
Several open source tools support predictive analytics, such as R (often used through the RStudio
IDE), Weka, and KNIME. The paper lists these predictive analytics tools and specifies their
features and usage, so that readers can compare them and choose a specific tool based on their
requirements.

The main aim is to advance the study of predictive data analysis and to support quick decision
making in any important application area.

Data mining derives its name from the similarities between searching for valuable business
information in a large database — for example, finding linked products in gigabytes of store
scanner data — and mining a mountain for a vein of valuable ore. Both processes require either
sifting through an immense amount of material, or intelligently probing it to find exactly where
the value resides. Given databases of sufficient size and quality, data mining technology can
generate new business opportunities by providing these capabilities:

Data mining for business analytics
A. Automated prediction of trends and behaviors. Data mining automates the process of
finding predictive information in large databases. Questions that traditionally required
extensive hands-on analysis can now be answered directly from the data — quickly. A
typical example of a predictive problem is targeted marketing. Data mining uses data on
past promotional mailings to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include forecasting bankruptcy
and other forms of default, and identifying segments of a population likely to respond
similarly to given events.
B. Automated discovery of previously unknown patterns. Data mining tools sweep through
databases and identify previously hidden patterns in one step. An example of pattern
discovery is the analysis of retail sales data to identify seemingly unrelated products that
are often purchased together. Other pattern discovery problems include detecting
fraudulent credit card transactions and identifying anomalous data that could represent
data entry keying errors.
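
The retail example under B, finding products that are often purchased together, can be sketched in a few lines. This is only a minimal illustration, not the method of any particular data mining tool: the function name `frequent_pairs`, the `min_support` threshold, and the basket contents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support=2):
    """Return product pairs that appear together in at least min_support baskets."""
    pair_counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so each pair is counted once per basket,
        # regardless of the order in which items were scanned.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical scanner data: each inner list is one customer's basket.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "chips"],
    ["bread", "milk"],
    ["beer", "diapers", "milk"],
]

pairs = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Counting every pair is only practical for small baskets. Real tools typically prune the search, but the idea of counting co-occurrences against a support threshold is the same.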

KNOWLEDGE DISCOVERY PROCESS


Knowledge Discovery in Databases (KDD) is an automatic, exploratory analysis and modeling
of large data repositories. KDD is the organized process of identifying valid, novel, useful, and
understandable patterns from large and complex data sets.

The knowledge discovery process consists of six stages:

1. Data Selection
2. Cleaning
3. Enrichment
4. Coding
5. Data mining
6. Reporting
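
The six stages above can be sketched as a chain of small functions. The paper does not define what each stage does, so the behavior given to each one here is an assumption, and the records, field names, and `REGION` lookup table are invented for illustration.

```python
from collections import Counter

# Hypothetical raw records; all fields and values are invented.
RAW = [
    {"id": 1, "age": "34", "store": "S1", "bought": "yes", "note": "walk-in"},
    {"id": 2, "age": "",   "store": "S2", "bought": "no",  "note": ""},
    {"id": 3, "age": "51", "store": "S1", "bought": "yes", "note": "online"},
    {"id": 3, "age": "51", "store": "S1", "bought": "yes", "note": "online"},
    {"id": 4, "age": "27", "store": "S3", "bought": "no",  "note": ""},
]
REGION = {"S1": "north", "S2": "south", "S3": "west"}  # assumed external lookup


def select(rows):
    # Data selection: keep only the fields relevant to the question.
    return [{k: r[k] for k in ("id", "age", "store", "bought")} for r in rows]


def clean(rows):
    # Cleaning: drop duplicate ids and records with a missing age.
    seen, out = set(), []
    for r in rows:
        if r["id"] in seen or not r["age"]:
            continue
        seen.add(r["id"])
        out.append(r)
    return out


def enrich(rows):
    # Enrichment: add information from another source (here, the region).
    return [{**r, "region": REGION[r["store"]]} for r in rows]


def code(rows):
    # Coding: turn raw values into categories and booleans for analysis.
    return [
        {
            "age_band": "under_40" if int(r["age"]) < 40 else "40_plus",
            "region": r["region"],
            "bought": r["bought"] == "yes",
        }
        for r in rows
    ]


def mine(rows):
    # Data mining: a very simple pattern, the purchase rate per age band.
    totals, buyers = Counter(), Counter()
    for r in rows:
        totals[r["age_band"]] += 1
        buyers[r["age_band"]] += r["bought"]
    return {band: buyers[band] / totals[band] for band in totals}


def report(rates):
    # Reporting: present the result in readable form.
    return [f"{band}: {rate:.0%} purchase rate" for band, rate in sorted(rates.items())]


lines = report(mine(code(enrich(clean(select(RAW))))))
print("\n".join(lines))
```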

TOOLS FOR PREDICTIVE DATA ANALYSIS


Data Mining Tools
There are various effective software tools for data mining that help find relationships, clusters,
and patterns in large data sets, and that categorize and summarize the data. Such tools can help a
business make more accurate, and therefore more profitable, decisions.

Categories of Data Mining Tools

There are many tools used for data mining. They are broadly classified into three categories:
traditional data mining tools, dashboards, and text-mining tools.

Traditional Data Mining Tools


Traditional data mining programs help companies establish data patterns and trends by using
various complex algorithms and techniques. Some of these tools are installed on desktop computers
to monitor the data and highlight trends, while others capture information residing outside a
database. Most of these programs are available in Windows and UNIX versions, although some
software supports only one operating system, and some works with only one database type. Most of
the software, however, can handle any data using online analytical processing or a similar
technology.

Dashboards

Dashboards are normally installed on computers to monitor information in a database. They reflect
data changes and updates on screen in the form of a chart or table, which lets the user see how
the business is performing. Historical data can be referenced and checked against the current
status to see how the business has changed. Dashboards are therefore easy to use and appeal to
managers who need an overview of the company's performance.
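
Checking historical data against the current status amounts to computing a change per metric. The sketch below shows one way to do that, assuming two snapshots stored as dictionaries. The metric names, the figures, and the function `metric_changes` are hypothetical.

```python
def metric_changes(historical, current):
    """Return the percent change of each metric between two snapshots."""
    changes = {}
    for name, old in historical.items():
        new = current.get(name)
        if new is None or old == 0:
            continue  # skip metrics that cannot be compared
        changes[name] = round((new - old) / old * 100, 1)
    return changes


# Hypothetical snapshots of the same metrics at two points in time.
last_quarter = {"revenue": 120000, "orders": 800, "returns": 40}
this_quarter = {"revenue": 138000, "orders": 760, "returns": 40}

changes = metric_changes(last_quarter, this_quarter)
for name, pct in changes.items():
    print(f"{name:<10}{pct:+.1f}%")
```

A real dashboard would draw these values as a chart or table and refresh them as the database changes. The comparison itself stays this simple.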

Text-Mining Tools
The third type of data mining tool is the text-mining tool, so called because of its ability to
mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple
text files. These tools scan the content and convert the selected data into a format that is
compatible with the tool's database, without the user having to open different applications.
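
As a rough illustration of converting scanned content into a database-compatible format, the sketch below turns plain text into term-count records. It assumes the text has already been extracted from its source file, since reading Word or PDF documents needs third-party libraries not covered here. The record layout, the stop-word list, and the function `text_to_records` are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = frozenset({"the", "a", "of", "and", "to", "in"})  # assumed minimal list


def text_to_records(doc_id, text, stop_words=STOP_WORDS):
    """Convert free text into rows of (document, term, count)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return [
        {"doc": doc_id, "term": term, "count": n}
        for term, n in sorted(counts.items())
    ]


sample = "The customer returned the product. The product was damaged in transit."
records = text_to_records("note-001", sample)
for row in records:
    print(row)
```

Each row can be inserted into a table with the columns `doc`, `term`, and `count`, where it can be queried like any other structured data.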

OPEN SOURCE TOOLS FOR DATA MINING


R

R is an open source programming language and environment for statistical computing and
graphics. R provides a wide variety of statistical and graphical techniques, such as linear and
nonlinear modeling, classical statistical tests, time-series analysis, classification, and
clustering, and is highly extensible. Researchers in various fields of applied statistics have
adopted R for statistical software development and data analysis. Extensibility and strong data
visualization are the two main reasons for the success of R.

Weka

Weka is a collection of machine learning algorithms for data mining tasks and is well suited for
developing new machine learning schemes. Weka is Java-based software that runs on various
operating systems and contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. The algorithms can either be applied directly to
a dataset or called from a user's Java code. Weka is probably the most successful open source data
mining software, and it has inspired the development of other programs with more sophisticated
graphical user interfaces and better visualization methods.

Orange

Orange is open source data mining and visualization software with an active community, aimed at
both novices and experts. It runs on Windows, Mac OS X, and GNU/Linux, and it is packed with data
analytics features. Data analysis workflows can be designed through user-friendly visual
programming or Python scripting. It has specialized add-ons, such as Bio orange for
bioinformatics. Python is growing in popularity because it is simple and easy to learn yet
powerful, so Python developers looking for a data mining tool may find Orange, a Python-based
open source tool, a natural fit.

KNIME

Data preprocessing has three main components: extraction, transformation, and loading. KNIME
does all three. It provides a graphical user interface for assembling nodes into data-processing
workflows. It is an open source data analytics, reporting, and integration platform. KNIME also
integrates various components for machine learning and data mining through its modular data
pipelining concept, and it has attracted attention in business intelligence and financial data
analysis. Written in Java and based on Eclipse, KNIME is easy to extend with plug-ins, so
additional functionality can be added as needed. Many data integration modules are already
included in the core version.
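
The three preprocessing components named above (extraction, transformation, loading) can be illustrated outside KNIME with a short script. This is a plain Python stand-in that uses only the standard library. It is not KNIME's API, since KNIME assembles these steps visually as nodes. The CSV content, table name, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or service.
CSV_TEXT = """customer,amount
alice, 120.50
bob,80
alice,30.25
"""


def extract(text):
    # Extraction: read raw rows from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transformation: normalize names and convert amounts to numbers.
    return [(r["customer"].strip().title(), float(r["amount"])) for r in rows]


def load(rows, conn):
    # Loading: write the cleaned rows into the target database.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(CSV_TEXT)), conn)

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Each function corresponds loosely to one node in a visual pipeline, which is why the steps are kept separate rather than merged into one loop.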
