
23PCOAE24-1 DATA MINING AND DATA WAREHOUSING

Unit V: Data Mining Tools & Techniques


Implementation Process - Data Mining Architecture - Clustering in Data Mining - Different types
of Clustering - Text Data Mining - Bitcoin Data Mining - Data Mining Vs Big Data - Data Mining
Models - Trends in Data Mining.

5.1 Introduction
Data mining is the process of extracting previously unknown and potentially useful information from vast amounts of data. The data mining process involves several components, and together these components constitute the data mining system architecture.

Data Mining Architecture


The significant components of a data mining system are the data source, database or data warehouse server, data mining engine, pattern evaluation module, graphical user interface, and knowledge base.

Data Source:
The actual sources of data are databases, data warehouses, the World Wide Web (WWW), text files, and other documents. A large amount of historical data is needed for data mining to be successful. Organizations typically store this data in databases or data warehouses. A data warehouse may comprise one or more databases, text files, spreadsheets, or other repositories of data. Another primary source of data is the World Wide Web, or the internet.

Different processes:
Before being passed to the database or data warehouse server, the data must be cleaned, integrated, and selected. Because the information comes from various sources and in different formats, it cannot be used directly for data mining; it may be incomplete or inaccurate. So the data first needs to be cleaned and unified. Since more information is collected from the various sources than is needed, only the data of interest is selected and passed to the server. These procedures are not as easy as they sound: several methods may be applied to the data as part of selection, integration, and cleaning.

Database or Data Warehouse Server:

The database or data warehouse server contains the actual data that is ready to be processed. The server is responsible for retrieving the relevant data based on the user's data mining request.

Data Mining Engine:

The data mining engine is the major component of any data mining system. It contains several modules for performing data mining tasks, including association, characterization, classification, clustering, prediction, time-series analysis, etc.

In other words, the data mining engine is the core of the data mining architecture. It comprises the instruments and software used to obtain insights and knowledge from the data collected from various sources and stored within the data warehouse.

Pattern Evaluation Module:

The pattern evaluation module is primarily responsible for measuring how interesting a discovered pattern is, using a threshold value. It collaborates with the data mining engine to focus the search on interesting patterns.

This module commonly employs interestingness measures that cooperate with the data mining modules to direct the search towards interesting patterns. It may use an interestingness threshold to filter out discovered patterns. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on how the data mining techniques are implemented. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining procedure so as to confine the search to only the interesting patterns.

Graphical User Interface:

The graphical user interface (GUI) module communicates between the data mining system and the user. It helps the user to use the system easily and efficiently without knowing the complexity of the process. When the user specifies a query or a task, this module passes it to the data mining system and displays the results.

Knowledge Base:
The knowledge base supports the entire data mining process. It may be used to guide the search or to evaluate the interestingness of the resulting patterns. The knowledge base may even contain user beliefs and data from user experiences that can be helpful in the data mining process. The data mining engine may receive inputs from the knowledge base to make the results more accurate and reliable, and the pattern evaluation module interacts with the knowledge base regularly to get inputs and to update it.

5.2 Data Mining Implementation Process

Many different sectors take advantage of data mining to boost their business efficiency, including manufacturing, chemicals, marketing, aerospace, etc. Therefore, the need for a standard data mining process grew. Data mining techniques must be reliable and repeatable by business people with little or no knowledge of the data mining context. As a result, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed, starting in 1996, through many workshops and contributions from more than 300 organizations.

Data mining is described as a process of finding hidden, precious information by evaluating the huge quantity of data stored in data warehouses, using multiple data mining techniques such as Artificial Intelligence (AI), machine learning, and statistics.

Let's examine the implementation process for data mining in detail:

The Cross-Industry Standard Process for Data Mining (CRISP-DM)

CRISP-DM comprises six phases arranged as a cyclical process, as shown in the given figure:

1. Business understanding:
This phase focuses on understanding the project goals and requirements from a business point of view, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

Tasks:

o Determine business objectives


o Assess situation
o Determine data mining goals
o Produce a project plan

Determine business objectives:

o Understand the project targets and requirements from a business point of view.
o Thoroughly understand what the customer really wants to achieve.
o Uncover, at the start, the significant factors that can influence the outcome of the project.

Assess situation:

o It requires a more detailed analysis of facts about all the resources, constraints, assumptions, and other factors that ought to be considered.

Determine data mining goals:

o A business goal states the objective in business terminology. For example: increase catalog sales to existing customers.
o A data mining goal states the project objective in technical terms. For example: predict how many items a customer will buy, given their demographic details (age, salary, and city) and the price of the item over the past three years.

Produce a project plan:

o It states the intended plan for achieving the business and data mining goals.
o The project plan should specify the expected set of steps to be performed during the rest of the project, including the initial selection of techniques and tools.

2. Data Understanding:
Data understanding starts with an initial data collection and proceeds with activities to get familiar with the data, to identify data quality issues, to discover first insights into the data, or to detect interesting subsets and form hypotheses about hidden information.

Tasks:

o Collect initial data


o Describe data
o Explore data
o Verify data quality

Collect initial data:

o It acquires the data listed in the project resources.
o It includes data loading, if needed, for data understanding.
o It may lead to initial data preparation steps.
o If various data sources are acquired, then integration is an additional issue, either here or in the subsequent data preparation stage.

Describe data:

o It examines the "gross" or "surface" characteristics of the acquired data.
o It reports on the outcomes.

Explore data:

o It addresses data mining questions that can be answered by querying, visualizing, and reporting, including:
o Distribution of key attributes and results of simple aggregations.
o Relationships between small numbers of attributes.
o Properties of important sub-populations and simple statistical analyses.
o It may refine the data mining objectives.
o It may contribute to or refine the data description and quality reports.
o It may feed into the transformation and other necessary data preparation steps.

Verify data quality:

o It examines the quality of the data and addresses questions such as: Is the data complete? Does it contain errors or missing values?

3. Data Preparation:
o It usually takes more than 90 percent of the project time.
o It covers all activities needed to build the final data set from the initial raw data.
o Data preparation tasks are likely to be performed several times and not in any prescribed order.

Tasks:

o Select data
o Clean data
o Construct data
o Integrate data
o Format data

Select data:

o It decides which data will be used for analysis.
o The data selection criteria include relevance to the data mining objectives, quality, and technical constraints such as limits on data volume or data types.
o It covers the selection of attributes (columns) as well as the selection of records (rows) in a table.

Clean data:

o It may involve the selection of clean subsets of the data, the insertion of suitable defaults, or more ambitious techniques such as estimating missing data by modeling.

Construct data:

o It comprises constructive data preparation operations, such as generating derived attributes, creating entirely new records, or transforming the values of existing attributes.

Integrate data:

o Integrating data refers to the methods whereby data is combined from multiple tables or records to create new records or values.

Format data:

o Formatting data refers mainly to syntactic modifications made to the data that do not change its meaning but may be required by the modeling tool.

4. Modeling:
In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Some techniques have specific requirements on the form of the data, so stepping back to the data preparation phase is often necessary.

Tasks:

o Select modeling technique


o Generate test design
o Build model
o Assess model

Select modeling technique:

o It selects the actual modeling technique that is to be used, for example, a decision tree or a neural network.
o If multiple techniques are applied, then this task is performed separately for each technique.

Generate test design:

o Generate a procedure or mechanism for testing the model's validity and quality before the model is built. For example, in classification, error rates are commonly used as quality measures for data mining models. Therefore, the data set is typically separated into a train set and a test set; the model is built on the train set and its quality is assessed on the separate test set, as in the sketch below.
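The following is a minimal sketch of this task, assuming Python with scikit-learn and using its built-in iris data set as a stand-in for the prepared data; the 30 percent hold-out size and the decision tree model are illustrative choices, not part of CRISP-DM itself.

# Split the prepared data into train and test sets, build the model on the
# train set, and use the test-set error rate as the quality measure.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)              # stand-in for the prepared data set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)      # hold out 30% of the data for testing

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
error_rate = 1 - accuracy_score(y_test, model.predict(X_test))
print("Test-set error rate:", round(error_rate, 3))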

Build model:

o To create one or more models, we need to run the modeling tool on the
prepared data set.

Assess model:

o It interprets the models according to domain knowledge, the data mining success criteria, and the desired test design.
o It judges the success of the modeling and discovery techniques technically.
o It contacts business analysts and domain experts later to discuss the data mining results in the business context.

5. Evaluation:
o It evaluates the model thoroughly and reviews the steps executed to build it, to ensure that the business objectives are properly achieved.
o A main objective of the evaluation is to determine whether there is some important business issue that has not been sufficiently considered.
o At the end of this phase, a decision on the use of the data mining results should be reached.

Tasks:

o Evaluate results
o Review process
o Determine next steps

Evaluate results:

o It assesses the degree to which the model meets the organization's business objectives.
o It tests the model on test applications in the real application, if time and budget constraints permit, and also assesses any other data mining results produced.
o It unveils additional challenges, suggestions, or information for future directions.

Review process:

o It performs a more thorough review of the data mining engagement to determine whether any important factor or task has somehow been overlooked.
o It reviews quality assurance issues.

Determine next steps:

o It decides how to proceed at this stage.
o It decides whether to finish the project and move on to deployment, if appropriate, or whether to initiate further iterations or set up new data mining projects. This includes analysis of the remaining resources and budget, which influences the decision.

6. Deployment:

o Deployment refers to how the results need to be utilized.
o Deploying data mining results can include scoring a database, using the results as business rules, or interactive internet scoring.
o The knowledge gained will need to be organized and presented in a way that the customer can use. Depending on the requirements, the deployment phase may be as simple as generating a report or as complex as implementing a repeatable data mining process across the organization.

Tasks:

o Plan deployment
o Plan monitoring and maintenance
o Produce final report
o Review project

Plan deployment:

o To deploy the data mining results into the business, this task takes the evaluation results and determines a strategy for deployment.
o It documents the procedure for later deployment.

Plan monitoring and maintenance:


o It is important when the data mining results become part of the day-to-day
business and its environment.
o It helps to avoid unnecessarily long periods of misuse of data mining results.
o It needs a detailed analysis of the monitoring process.

Produce final report:

o A final report can be drawn up by the project leader and his team.
o It may only be a summary of the project and its experience.
o It may be a final and comprehensive presentation of the data mining results.

Review project:

o The project review assesses what went right and what went wrong, what was done well and what needs to be improved.

5.3 Clustering in Data Mining

Clustering is an unsupervised machine learning technique that groups data points into clusters so that objects in the same group are similar to one another.

Clustering helps to split data into several subsets. Each of these subsets contains data similar to each other, and these subsets are called clusters.

Let's understand this with an example. Suppose we are a marketing manager, and we have a new, tempting product to sell. We are sure that the product would bring enormous profit, as long as it is sold to the right people. So, how can we tell who is best suited for the product from our company's huge customer base? Once the data from our customer base is divided into clusters, we can make an informed decision about who we think is best suited for this product.

Clustering falls under the category of unsupervised machine learning and is one of the problems that machine learning algorithms solve.

Clustering uses only the input data to determine patterns, anomalies, or similarities in that data.

A good clustering algorithm aims to obtain clusters in which (a minimal sketch follows this list):

o The intra-cluster similarity is high, which implies that the data inside a cluster is similar to one another.
o The inter-cluster similarity is low, which means each cluster holds data that is not similar to the data in other clusters.
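A minimal sketch of this idea, assuming Python with scikit-learn and NumPy: a hypothetical customer base described by (age, annual spend) is grouped with K-means, and the silhouette score is used as one combined measure of high intra-cluster and low inter-cluster similarity. The customer data is synthetic.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# hypothetical customers described by (age, annual spend), three rough groups
customers = np.vstack([
    rng.normal([25, 200], [3, 30], (50, 2)),
    rng.normal([40, 800], [4, 60], (50, 2)),
    rng.normal([60, 350], [5, 40], (50, 2)),
])

# group the customer base into three clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("cluster sizes:", np.bincount(kmeans.labels_))
# silhouette score close to 1 means tight, well-separated clusters
print("silhouette score:", round(silhouette_score(customers, kmeans.labels_), 3))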

What is a Cluster?
o A cluster is a subset of similar objects.
o A cluster is a subset of objects such that the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object not located inside it.
o A cluster is a connected region of a multidimensional space with a comparatively high density of objects.

What is clustering in Data Mining?

o Clustering is the method of converting a group of abstract objects into classes of similar objects.
o Clustering is a method of partitioning a set of data or objects into a set of meaningful subclasses called clusters.
o It helps users to understand the structure or natural grouping in a data set, and it is used either as a stand-alone tool to get better insight into the data distribution or as a pre-processing step for other algorithms.

Important points:
o The data objects of a cluster can be treated as one group.
o While doing cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups.
o The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.

Applications of cluster analysis in data mining:

o Clustering analysis is widely used in many applications, such as data analysis, market research, pattern recognition, and image processing.
o It assists marketers in finding different groups in their customer base; based on the purchasing patterns, they can characterize their customer groups.
o It helps in categorizing documents on the web for information discovery.
o Clustering is also used in outlier-detection applications such as the detection of credit card fraud.
o As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.
o In biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
o It helps in the identification of areas of similar land use in an earth observation database and in the identification of groups of houses in a city according to house type, value, and geographic location.

Why is clustering used in data mining?

Clustering analysis has been an evolving problem in data mining because of its variety of applications. The advent of various data clustering tools in the last few years and their comprehensive use in a broad range of applications, including image processing, computational biology, mobile communication, medicine, and economics, has contributed to the popularity of these algorithms. The main issue with data clustering algorithms is that they cannot be standardized. An algorithm may give the best results with one type of data set but may fail or perform poorly with other kinds of data sets. Although many efforts have been made to design algorithms that perform well in all situations, no significant breakthrough has been achieved so far. Many clustering tools have been proposed, but each algorithm has its advantages and disadvantages and cannot work in all real situations. The main requirements of clustering in data mining are as follows:

1. Scalability:

Scalability in clustering implies that as we increase the number of data objects, the time to perform clustering should grow approximately in line with the complexity order of the algorithm. For example, K-means clustering is roughly O(n), where n is the number of objects in the data. If we increase the number of data objects tenfold, the time taken to cluster them should also increase by approximately ten times; that is, there should be a linear relationship. If that is not the case, then there is some problem with our implementation. The algorithm must be scalable; if it is not, we cannot obtain appropriate results on large data sets. A rough check of this linear relationship is sketched below.
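A rough, minimal sketch of this check, assuming Python with scikit-learn and NumPy: K-means is timed on n objects and then on 10 x n objects, and the ratio of the runtimes should be roughly 10. The data, the value of k, and the timings are illustrative only; actual numbers depend on the machine and on how many iterations K-means happens to run.

import time
import numpy as np
from sklearn.cluster import KMeans

def cluster_time(n_objects, k=5, seed=0):
    # generate n_objects random 4-dimensional points and time K-means on them
    data = np.random.default_rng(seed).normal(size=(n_objects, 4))
    start = time.perf_counter()
    KMeans(n_clusters=k, n_init=5, random_state=seed).fit(data)
    return time.perf_counter() - start

t_small = cluster_time(10_000)
t_large = cluster_time(100_000)
print("10x more objects -> runtime grew by a factor of about", round(t_large / t_small, 1))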
23PCOAE24-1 DATA MINING AND DATA WAREHOUSING

2. Interpretability:

The outcomes of clustering should be interpretable, comprehensible, and usable.

3. Discovery of clusters with arbitrary shape:

The clustering algorithm should be able to find clusters of arbitrary shape. It should not be limited to distance measures that tend to discover only small spherical clusters.

4. Ability to deal with different types of attributes:

Algorithms should be capable of being applied to any data such as data based on
intervals (numeric), binary data, and categorical data.

5. Ability to deal with noisy data:

Databases contain data that is noisy, missing, or erroneous. Some algorithms are sensitive to such data and may produce poor quality clusters.

6. High dimensionality:

The clustering tools should be able to handle not only low-dimensional data but also high-dimensional data spaces.

5.4 Text Data Mining

Text data mining can be described as the process of extracting essential information from natural language text. All the data that we generate via text messages, documents, emails, and files is written in common language text. Text mining is primarily used to draw useful insights or patterns from such data.

The text mining market has experienced exponential growth and adoption over the last few years and is also expected to see significant growth and adoption in the coming future. One of the primary reasons behind the adoption of text mining is higher competition in the business market, with many organizations seeking value-added solutions to compete with other organizations. With increasing competition in business and changing customer perspectives, organizations are making huge investments to find solutions that are capable of analyzing customer and competitor data to improve competitiveness. The primary sources of data are e-commerce websites, social media platforms, published articles, surveys, and many more. The larger part of the generated data is unstructured, which makes it challenging and expensive for organizations to analyze using people alone. This challenge, combined with the exponential growth in data generation, has led to the growth of analytical tools that are not only able to handle large volumes of text data but also help in decision making. Text mining software empowers a user to draw useful information from a huge set of available data sources.

Areas of text mining in data mining:

The following are the areas of text mining:

o Information Extraction:
The automatic extraction of structured data, such as entities, entity relationships, and attributes describing entities, from an unstructured source is called information extraction.
o Natural Language Processing:
NLP stands for Natural Language Processing. It enables computer software to understand human language as it is spoken. NLP is primarily a component of artificial intelligence (AI). Developing NLP applications is difficult because computers generally expect humans to "speak" to them in a programming language that is precise, unambiguous, and highly structured. Human speech, however, is not always precise; it depends on many complex variables, including slang, social context, and regional dialects.
o Data Mining:
Data mining refers to the extraction of useful data, hidden patterns from large
data sets. Data mining tools can predict behaviors and future trends that allow
businesses to make a better data-driven decision. Data mining tools can be used
to resolve many business problems that have traditionally been too time-
consuming.
o Information Retrieval:
Information retrieval deals with retrieving useful data from the data that is stored in our systems. As an analogy, we can view the search engines on websites such as e-commerce sites, or any other sites, as part of information retrieval.

Text Mining Process:


The text mining process incorporates the following steps to extract the data from the
document.

o Text transformation
A text transformation is a technique used to control how the text is represented, including its capitalization. The two major ways of representing a document are given below (a minimal sketch of both follows this list):
a. Bag of words
b. Vector space
o Text Pre-processing
Pre-processing is a significant task and a critical step in text mining, Natural Language Processing (NLP), and Information Retrieval (IR). In the field of text mining, data pre-processing is used for extracting useful information and knowledge from unstructured text data. Information Retrieval (IR) is a matter of choosing which documents in a collection should be retrieved to fulfill the user's need.
o Feature selection:
Feature selection is a significant part of data mining. Feature selection can be
defined as the process of reducing the input of processing or finding the
essential information sources. The feature selection is also called variable
selection.
o Data Mining:
At this point, the text mining procedure merges with the conventional data mining process. Classic data mining techniques are applied to the structured database that results from the previous steps.
o Evaluate:
Afterward, the results are evaluated. Once a result has been evaluated, it is either used or abandoned.
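A minimal sketch of the two document representations mentioned under "Text transformation", assuming Python with a recent scikit-learn (get_feature_names_out requires version 1.0 or later); the three short documents are hypothetical.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "Data mining extracts useful patterns from text data.",
    "Text mining draws insights from unstructured text.",
    "Customers send feedback by email and text messages.",
]

bow = CountVectorizer(lowercase=True, stop_words="english")    # bag of words
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")  # vector space model

print("vocabulary:", bow.fit(documents).get_feature_names_out())
print("bag-of-words matrix:")
print(bow.transform(documents).toarray())
print("TF-IDF matrix shape:", tfidf.fit_transform(documents).shape)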
Text Mining Applications:

The following are text mining applications:
o Risk Management:
Risk management is a systematic and logical procedure of identifying, analyzing, treating, and monitoring the risks involved in any action or process in an organization. Insufficient risk analysis is usually a leading cause of failure. This is particularly true in financial organizations, where the adoption of risk management software based on text mining technology can effectively enhance the ability to reduce risk. It enables the administration of millions of sources and petabytes of text documents and gives the ability to connect the data, helping to access the appropriate data at the right time.
o Customer Care Service:
Text mining methods, particularly NLP, are finding increasing significance in the field of customer care. Organizations are investing in text analytics software to improve their overall customer experience by accessing textual data from different sources such as customer feedback, surveys, customer calls, etc. The primary objective of text analysis here is to reduce the response time of the organization and help address customer complaints rapidly and productively.
o Business Intelligence:
Companies and business firms have started to use text mining strategies as a major aspect of their business intelligence. Besides providing significant insights into customer behavior and trends, text mining strategies also help organizations analyze the strengths and weaknesses of their competitors, giving them a competitive advantage in the market.
o Social Media Analysis:
Social media analysis helps to track online data, and there are numerous text mining tools designed particularly for performance analysis of social media sites. These tools help to monitor and interpret the text generated on the internet from news, emails, blogs, etc. Text mining tools can precisely analyze the total number of posts, followers, and likes of your brand on a social media platform, which enables you to understand the response of the individuals who are interacting with your brand and content.

Text Mining Approaches in Data Mining:

The following text mining approaches are used in data mining.

1. Keyword-based Association Analysis:

It collects sets of keywords or terms that often occur together and then discovers the association relationships among them. First, it pre-processes the text data by parsing, stemming, removing stop words, etc. Once the data has been pre-processed, it runs association mining algorithms. Here, human effort is not required, so the number of unwanted results and the execution time are reduced. A minimal sketch of this idea follows.
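A minimal sketch of this approach in plain Python: the documents are pre-processed (lower-casing, tokenizing, and dropping a small hypothetical stop-word list), and pairs of keywords that occur together in at least two documents are reported. The documents and the stop-word list are illustrative only.

import re
from itertools import combinations
from collections import Counter

documents = [
    "Cheap flight tickets and hotel deals for summer holidays",
    "Hotel booking with free flight upgrade",
    "Summer holidays: flight and hotel packages",
]
stop_words = {"and", "for", "with", "the", "a", "of"}  # hypothetical short list

pair_counts = Counter()
for doc in documents:
    # pre-processing: lowercase, keep alphabetic tokens, remove stop words
    terms = {w for w in re.findall(r"[a-z]+", doc.lower()) if w not in stop_words}
    # count every pair of keywords that co-occurs in this document
    pair_counts.update(combinations(sorted(terms), 2))

# keyword pairs that appear together in at least two documents
print([pair for pair, count in pair_counts.items() if count >= 2])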

2. Document Classification Analysis:

Automatic document classification:

This analysis is used for the automatic classification of huge numbers of online text documents, such as web pages, emails, etc. Text document classification differs from the classification of relational data, as document databases are not organized according to attribute-value pairs.

Numericizing text:
o Stemming algorithms
A significant pre-processing step before the indexing of input documents is the stemming of words. The term "stemming" can be defined as the reduction of words to their roots, so that different grammatical forms of a word are treated as the same term. The primary purpose of stemming is to ensure that similar words are recognized as the same by the text mining program.
o Support for different languages:
There are some highly language-dependent operations such as stemming,
synonyms, the letters that are allowed in words. Therefore, support for various
languages is important.
o Exclude certain characters:
Excluding numbers, specific characters, sequences of characters, or words that are shorter or longer than a specific number of letters can be done before the indexing of the input documents.
o Include lists, exclude lists (stop-words):
A particular list of words to be indexed can be defined, which is useful when we want to search for specific words and classify the input documents based on the frequencies with which those words occur. Additionally, "stop words," i.e., terms that are to be excluded from the indexing, can be defined. Typically, a default list of English stop words includes "the," "a," "since," etc. These words are used very often in the respective language but communicate very little information in the document.

5.5 Bitcoin Data Mining

Bitcoin mining refers to the process of authenticating and adding transaction records to the public ledger. The public ledger is known as the blockchain because it comprises a chain of blocks.

Before we look at the Bitcoin mining concept, we should understand what Bitcoin is. Bitcoin is virtual money that has some value, and its value is not static; it varies over time. There is no regulatory body that regulates Bitcoin transactions.

Let's understand the Bitcoin concept with an example. A company manager takes a dummy item and announces that whoever gets it will be the happiest employee of the organization and will receive an international holiday ticket. So everyone tries to buy that dummy item, which by itself has no value, and in this way the dummy item acquires some value, which may lie between $10 and $20, or anything. We can relate this to Bitcoin: if the number of purchasers of Bitcoin increases, then the value of Bitcoin also increases until it reaches a saturation value, after which it stops.

Bitcoin was created under the pseudonym (false name) Satoshi Nakamoto, who announced the invention, and it was later implemented as open-source code. A peer-to-peer version of electronic money enables online payments to be sent directly from one person to another without the interference of a financial institution. Bitcoin is a network protocol that empowers people to transfer ownership rights of account units called bitcoins, which are created in limited quantity. When an individual sends a couple of bitcoins to another individual, this data is communicated to the peer-to-peer Bitcoin network.

This remains similar to purchasing something with virtual currency. However, one advantage of Bitcoin is that the arrangement remains unidentified: the personal identities of the sender and the beneficiary (receiver) remain encrypted. This is the primary reason why it has become a trusted form of money transaction on the web. Conventionally, the difficulty in creating distributed money is the need for a scheme to avoid double-spending: one individual may simultaneously transmit two transactions, sending the same coins to two distinct parties on the network. Bitcoin settles this difficulty and ensures agreement of ownership by keeping a community ledger of all transactions, called the blockchain. New transactions are grouped together and checked against the existing record to make sure all new transactions are valid. Bitcoin's accuracy is ensured by individuals, known as miners, who contribute computational power to the system to validate and append transactions to the public ledger.

Bitcoins do not exist physically and are only an arrangement of virtual data. They can be exchanged for real money and are broadly accepted in most countries around the globe. There is no central authority for Bitcoin, similar to a central bank (the RBI in India) that controls monetary policy. Instead, miners solve complex puzzles to support Bitcoin transactions. This process is called Bitcoin mining.

How to Mine Bitcoins:

It is quite a complex process, but here, in brief, is how it works. You need a CPU (Central Processing Unit) with excellent processing power and a fast internet connection. Next, there are numerous online networks that list the latest Bitcoin transactions taking place in real time. Then you sign in with a Bitcoin client and attempt to approve those transactions by computing hashes over blocks of data. The communication goes through several systems, called nodes, and since the data is encoded, a miner is needed to check whether the answers are accurate.

How Bitcoin Mining Works:

Bitcoin mining requires a task that is exceptionally hard to perform but simple to verify. It uses cryptography, with a hash function called double SHA-256 (a one-way function that converts text of any size into a 256-bit string). A hash accepts a portion of data as input and reduces it down to a small hash value (256 bits). With a cryptographic hash, there is no way to obtain a desired hash value without trying a huge number of inputs; but once an input that gives the desired value is found, it is a simple task for anybody to validate the hash. So cryptographic hashing becomes a decent way to implement the Bitcoin "proof-of-work" (data that is hard to produce but easy for others to verify).

To mine a block, we first collect the new transactions into a block and then hash the block to form a 256-bit block hash value. If the hash starts with enough zeros, the block has been successfully mined; it is broadcast to the Bitcoin network, and the hash becomes the identifier for the block. In most cases the hash is not successful, so we alter the block slightly (for example, by changing a nonce) and try again and again. A toy sketch of this loop follows.
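A toy sketch of this loop in Python, using hashlib for the double SHA-256. The block data is a made-up string, and the difficulty (four leading hex zeros) is far below Bitcoin's real difficulty; the sketch only illustrates the hard-to-produce, easy-to-verify property.

import hashlib

def double_sha256(data: bytes) -> str:
    # SHA-256 applied twice, as in Bitcoin's proof-of-work
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()

def mine(block_data: str, difficulty: int = 4):
    # try nonces until the hash starts with `difficulty` zeros
    nonce = 0
    while True:
        digest = double_sha256(f"{block_data}|{nonce}".encode())
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, block_hash = mine("sender->receiver:2.5 BTC")
print("nonce:", nonce)
print("block hash:", block_hash)
# Verification is a single hash computation:
assert double_sha256(f"sender->receiver:2.5 BTC|{nonce}".encode()) == block_hash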

Bitcoin Transaction:
A Bitcoin transaction is a section of data that is broadcast to the network and, if valid, ends up in a block in the blockchain. The purpose of a Bitcoin transaction is to transfer ownership of an amount of Bitcoin to a Bitcoin address.

When we send Bitcoin, an individual data structure, namely a Bitcoin transaction, is created by our wallet client and then broadcast to the network, where nodes rebroadcast the transaction. If the transaction is valid, nodes will include it in the block they are mining, and within 10-20 minutes the transaction will be included, along with other transactions, in a block in the blockchain. Finally, the receiver can see the transaction amount in their wallet.

Some facts about transactions:

o The Bitcoin amount that we send is always sent to a particular address.
o The Bitcoin amount we receive is locked to the receiving address, which is associated with our wallet.
o Every time we spend Bitcoin, the amount we spend will always come from funds received earlier and currently present in our wallet.
o Addresses receive Bitcoin, but they don't send Bitcoin; Bitcoin is sent from a wallet.

Bitcoin Wallets:
Bitcoin wallets store the private keys through which we access a Bitcoin address and spend our funds. They come in different forms, designed for different types of devices. We can even use paper to store the keys and avoid having them on a computer. It is important to secure and back up our Bitcoin wallet. Bitcoins are the latest technology of cash, and more and more merchants are starting to accept them as payment.

We know how the Bitcoin transaction mechanism works and how transactions are created, but how are they stored? We store money in a physical wallet, and Bitcoin works similarly, except the wallet is generally digital. In brief, we don't need to store the bitcoins themselves anywhere; what we store are the secured digital keys used to access our public Bitcoin address and sign transactions.

There are mainly five types of wallets, described below:

Desktop Wallets:
First, we need to install the original Bitcoin client (Bitcoin Core). If we have already installed it, then we are running a wallet, but may not know it. In addition to relaying transactions on the network, this software also empowers us to create a Bitcoin address for sending and receiving the virtual currency. MultiBit (a Bitcoin wallet) runs on Mac OS X, Windows, and Linux. Hive is an OS X-based wallet with some particular features, including an application store that connects directly to Bitcoin services.

Mobile Wallets:
As an application on our cell phone, the wallet can store the private keys for our Bitcoin addresses and enable us to pay for things directly with our phone. In many cases, a Bitcoin wallet will even take advantage of a cell phone's near-field communication (NFC) feature, empowering us to tap the phone against a reader and pay with bitcoins without entering any data at all. A full Bitcoin client has to download the entire Bitcoin blockchain, which is always growing and is multiple gigabytes in size. Many mobile phones would not be able to hold the blockchain in their memory. In such a case, they can use alternative options; these mobile clients are frequently designed with simplified payment verification (SPV) in mind. They download a small subset of the blockchain and depend on other trusted nodes in the Bitcoin network to ensure that they have the correct data. Mycelium is an example of a mobile wallet; it is an Android-based Bitcoin wallet.

Online Wallets:
Web-based wallets store our private keys online, on a computer controlled by someone else and connected to the Internet. Various online services are available, and some link to mobile and desktop wallets, replicating our addresses among the various devices that we own. One significant advantage of online wallets is that we can access them from anywhere, regardless of which device we are using.

Hardware Wallets:
Hardware wallets are currently limited in number. These are dedicated devices that can hold private keys electronically and facilitate payments. The compact Ledger USB Bitcoin wallet uses smartcard security and is available at a reasonable cost.

Paper Wallets:
One of the cheapest options for keeping our bitcoins safe and sound is what is called a paper wallet. There are various sites offering paper Bitcoin wallet services. They generate a Bitcoin address for us and create an image containing two QR codes: one is the public address that we can use to receive bitcoins, and the other is the private key that we use to spend bitcoins stored at that address. The primary advantage of a paper wallet is that the private keys are not stored digitally anywhere, which secures the wallet from cyber attacks.

5.7 Data Mining Vs Big Data

Data mining uses tools such as statistical models, machine learning, and visualization to "mine" (extract) useful data and patterns from Big Data, whereas Big Data concerns the processing of high-volume and high-velocity data, which is challenging to do with older databases and analysis programs.

Big Data:
Big Data refers to vast amounts of structured, semi-structured, and unstructured data, ranging into terabytes. It is challenging to process such a huge amount of data on a single system, because the RAM of the computer has to store the intermediate calculations during processing and analysis. When we try to process such a huge amount of data on a single system, it takes a very long time, and the computer may not work correctly due to overload.

Here we will understand how much data is produced with a live example. We all know about Big Bazaar; as customers, we go to Big Bazaar at least once a month. These stores monitor each product that customers purchase from them, and from which store location around the world, and they have a live information feed that stores all the data on huge central servers. The number of Big Bazaar stores in India alone is around 250, and monitoring every single item purchased by every customer, along with the item description, makes the data grow to around 1 TB a month.

What does Big Bazaar do with that data?

We know some promotions run in Big Bazaar on certain items. Do we genuinely believe Big Bazaar would run those promotions without any analysis to back up whether they would increase sales and generate a surplus? That is where Big Data analysis plays a vital role. Using data analysis techniques, Big Bazaar targets new customers as well as existing customers to purchase more from its stores.

Big Data is characterized by the 5 Vs: Volume, Variety, Velocity, Veracity, and Value.

Volume: In Big Data, volume refers to the amount of data, which is huge when it comes to big data.

Variety: In Big Data, variety refers to the various types of data, such as web server logs, social media data, and company data.

Velocity: In Big Data, velocity refers to how fast the data is growing with respect to time. In general, data is increasing exponentially at a very fast rate.

Veracity: In Big Data, veracity refers to the uncertainty of the data.

Value: In Big Data, value refers to whether the data we are storing and processing is valuable, and how we gain an advantage from these huge data sets.

How to Process Big Data:

A very efficient framework, known as Hadoop, is primarily used for Big Data processing. It is open-source software that works on a distributed, parallel processing model.

The Apache Hadoop framework comprises the following modules:

Hadoop Common:
It contains the libraries and utilities required by the other Hadoop modules.

Hadoop Distributed File System (HDFS):

A distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.

Hadoop YARN:
It is a resource-management platform responsible for managing compute resources in clusters and using them for scheduling users' applications.

Hadoop MapReduce:
It is a programming model for large-scale data processing.
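A minimal sketch of the MapReduce programming model, simulated in plain Python rather than on a Hadoop cluster: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums the counts. On a real cluster these phases run in parallel across many machines.

from collections import defaultdict

lines = [
    "big data needs distributed processing",
    "hadoop processes big data with mapreduce",
]

# Map: emit (key, value) pairs
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group values by key
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)   # e.g. {'big': 2, 'data': 2, ...}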

Data Mining:
As the name suggests, data mining refers to the mining of huge data sets to identify trends and patterns and to extract useful information.

In data mining, we look for hidden information, but without any fixed idea of what exact type of pattern we are looking for or what we plan to use it for once we find it. When we discover interesting information, we start thinking about how to make use of it to boost the business.

We will understand the data mining concept with an example:

A data miner starts exploring the call records of a mobile network operator without any specific target from his manager. The manager probably gives him a broad objective, such as discovering at least a few new patterns in a month. As he begins extracting the data, he discovers a pattern: there are fewer international calls on Fridays (for example) compared to all other days. Now he shares this finding with management, and they come up with a plan to lower international call rates on Fridays and start a campaign. Call duration goes up, customers are happy with the lower call rates, more customers join, and the organization makes more profit because the utilization percentage has increased.

The various steps involved in data mining are as follows:

Data Integration:
In the first step, data is collected and integrated from various sources.
Data Selection:
We may not collect all the data in the first step, so in this step we select only the data that we think is useful for data mining.

Data Cleaning:
In this step, the information we have collected is not clean and may contain errors, noisy or inconsistent data, and missing values, so we need to apply various strategies to get rid of such problems.

Data Transformation:
Even after cleaning, the data is not ready for mining, so we need to transform it into forms suitable for mining. The techniques used to achieve this are aggregation, normalization, smoothing, etc. (a minimal sketch of cleaning and transformation follows).
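A minimal sketch of the cleaning and transformation steps, assuming Python with pandas; the column names and values are hypothetical.

import pandas as pd

raw = pd.DataFrame({
    "age":       [25, None, 40, 35],
    "call_mins": [120, 300, None, 90],
})

# Data cleaning: replace missing values with the column mean
clean = raw.fillna(raw.mean(numeric_only=True))

# Data transformation: min-max normalisation to the range [0, 1]
normalised = (clean - clean.min()) / (clean.max() - clean.min())
print(normalised)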

Data Mining:
Once the data has been transformed, we are ready to apply data mining techniques to it to extract useful information and patterns. Techniques such as clustering and association rules are among the many techniques used for data mining.

Pattern Evaluation:
Pattern evaluation includes visualizing the patterns we generated, removing random patterns, applying transformations, etc.

Decision:
It is the last step in data mining. It helps users make use of the acquired knowledge to make better data-driven decisions.

Difference Between Data Mining and Big Data:

Data Mining: It primarily targets the analysis of data to extract useful information.
Big Data: It primarily targets the relationships within the data.

Data Mining: It can be used for large-volume as well as low-volume data.
Big Data: It deals with huge volumes of data.

Data Mining: It is a technique primarily used for data analysis.
Big Data: It is a whole concept rather than a precise term.

Data Mining: It is primarily based on statistical analysis, generally targeting prediction and finding business factors on a small scale.
Big Data: It is primarily based on data analysis, generally targeting prediction and finding business factors on a large scale.

Data Mining: It uses data types such as structured, relational, and dimensional databases.
Big Data: It uses data types such as structured, semi-structured, and unstructured data.

Data Mining: It expresses the "what" of the data.
Big Data: It refers to the "why" of the data.

Data Mining: It is the closest view of the data.
Big Data: It is a broad view of the data.

Data Mining: It is primarily used for strategic decision-making purposes.
Big Data: It is primarily used for dashboards and predictive measures.

5.8 Data Mining Models

Data mining uses raw data to extract information and present it in a usable form. The data mining process is found in a diverse range of applications, including business intelligence studies, political forecasting, web ranking forecasting, weather pattern forecasting, etc. In business intelligence studies, business experts mine huge data sets related to a business operation or a market and try to discover previously unrecognized trends and relationships. Data mining is also used in organizations that utilize big data as a raw data source to extract the required information.

The following describes the data mining models with examples.

What are data mining models?

A data mining model refers to a method used to present information and the various ways in which that information can be applied to specific questions and problems. According to specialists, the regression model is the most commonly used data mining model. In this approach, a mining expert first analyzes the data sets and creates a formula that describes them. Financial market analysts use this model to make predictions related to prices and market trends.

Another significant data mining model is based on association rules. First, the data mining analysts analyze the data sets to find which components usually appear together. When two components are frequently paired, it is assumed that some relationship exists between them. For instance, an electronics shop might find that consumers often purchase a marker and a pen at the same time as they purchase a notebook. A shop manager can use this information from the data mining model to increase sales by placing all related products in the same place.

Types of data mining models

1. Predictive data mining models


2. Descriptive data mining models

Predictive data mining models

A predictive data mining model predicts the values of data using known results gathered from different data sets. Predictive modeling cannot be classified as a separate discipline; it occurs in all organizations and industries across all disciplines. The main objective of predictive data mining models is to predict the future based on past data, generally but not always using statistical modeling.

Predictive modeling is used in the healthcare industry to identify high-risk patients with congestive heart failure, high blood pressure, diabetes, infections, cancer, etc. It is also used by vehicle insurance companies to assign a risk of accidents to policyholders.

A predictive data mining model comprises classification, regression, prediction, and time series analysis. The predictive model of data mining is also called statistical regression. It refers to a supervised learning technique that includes an explanation of the dependency of some attributes' values on the values of other attributes in the same item, and the development of a model that can predict these attribute values for new cases.

Classification:

In data mining, classification refers to a form of data analysis where a machine learning model assigns a specific category to a new observation, based on what the model has learned from the data sets. In other words, classification is the act of assigning objects to one of several predefined categories.

One example of classification in the banking and financial services industry is identifying whether transactions are fraudulent or not. In the same way, machine learning can also be used to predict whether a loan application will be approved or not. A minimal sketch of the fraud example follows.
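A minimal sketch of the fraud-classification example, assuming Python with scikit-learn; the transaction features and labels are synthetic, and a real system would use many more features and proper scaling.

import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical features: (transaction amount, hour of day), label 1 = fraudulent
X = np.array([[20, 14], [15, 10], [900, 3], [25, 16], [1200, 2], [30, 12]])
y = np.array([0, 0, 1, 0, 1, 0])

model = LogisticRegression(max_iter=1000).fit(X, y)

new_transaction = np.array([[1000, 4]])
print("predicted class:", model.predict(new_transaction)[0])                  # 1 = flag as fraud
print("fraud probability:", round(model.predict_proba(new_transaction)[0, 1], 2))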

Regression:

Regression refers to a method that estimates a numeric target value from the data; generally, it is used for continuous data.

A linear regression model, in the context of machine learning or statistics, is basically a linear approach for modeling the relationship between the dependent variable, known as the result, and the independent variables, known as features.

If the model has only one independent variable, it is called simple linear regression; otherwise it is called multiple linear regression.

Types of regression

1. Linear regression:

Linear regression involves the search for the optimal line that fits two attributes, so that one attribute can be used to predict the other (a minimal sketch of this follows below).

2. Multi-linear regression:

Multi-linear regression involves two or more attributes, and the data are fit to a multi-dimensional space.
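A minimal sketch of simple linear regression, assuming Python with scikit-learn; the advertising-spend and sales numbers are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[10], [20], [30], [40], [50]])   # independent variable (feature)
sales    = np.array([25, 45, 62, 84, 105])            # dependent variable (result)

model = LinearRegression().fit(ad_spend, sales)
print("slope:", round(model.coef_[0], 2), "intercept:", round(model.intercept_, 2))
print("predicted sales for spend 60:", round(model.predict([[60]])[0], 1))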

Prediction:

In data mining, prediction is used to identify a data value based on the description of other corresponding data values. Prediction of numeric values is known as numeric prediction, and regression analysis is generally used for it. For example, in credit card fraud detection, the history of a particular person's credit card usage has to be analyzed; if any abnormal pattern is detected, it should be reported as a 'fraudulent action'.

Time series analysis:

Time series analysis deals with data sets ordered in time. Time serves as the independent variable used to predict the dependent variable.

Descriptive model
A descriptive model identifies the patterns and relationships in data. A descriptive model does not attempt to generalize to a statistical population or random process, whereas a predictive model does attempt such generalization. Predictive models should give prediction intervals and must be cross-validated; that is, they must show that they can make predictions on data that was not used in constructing the model.

Descriptive analytics focuses on the summarization and conversion of the data into
useful information for reporting and monitoring.

Clustering:

Clustering is the grouping of a set of objects so that objects in the same group, called a cluster, are more similar to each other than to those in other groups (clusters).

Association rules:

Association rules capture co-occurrence relationships between items in large sets of data objects. For example, given the list of items you have purchased at the grocery store over the past six months, the algorithm calculates the percentage of baskets in which particular items are purchased together: what are the chances of you buying milk together with cereal? A minimal sketch follows.
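A minimal sketch in plain Python of how the rule "milk -> cereal" would be scored with support and confidence; the grocery baskets are hypothetical.

baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "cereal", "eggs"},
]

both = sum(1 for b in baskets if {"milk", "cereal"} <= b)   # baskets with both items
milk = sum(1 for b in baskets if "milk" in b)               # baskets with milk

support = both / len(baskets)      # fraction of all baskets containing milk and cereal
confidence = both / milk           # fraction of milk baskets that also contain cereal
print(f"support = {support:.2f}, confidence = {confidence:.2f}")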

Sequence:

Sequence analysis refers to the discovery of useful patterns in ordered data, judged by how interesting they are with respect to some objective.

Summarization:

Summarization presents a data set in a compact form that is easy to understand.

5.9 Trends in Data Mining

Data mining is one of the most widely used methods to extract information from different sources and organize it for better use. Despite the availability of different commercial data mining systems, many challenges come up when they are actually implemented. With the rapid evolution in the field of data mining, companies are expected to stay abreast of all the new developments.

Complex algorithms form the basis for data mining, as they allow data segmentation to identify trends and patterns, detect variations, and predict the probabilities of various events. The raw data may come in both analog and digital formats and is inherently dependent on the source of the data. Companies need to keep track of the latest data mining trends and stay updated to do well in the industry and overcome challenging competition.

Corporations can use data mining to discover customers' preferences, build good relationships with customers, increase revenue, and reduce risks.

Types of Mining Sequence in Data Mining


Here are the following types of mining sequences in data mining, such as:

1. Mining Time Series


In mining time series, a specified number of data points are recorded at specific times, or events are obtained over repeated measurements of time. The values are typically measured at equal time intervals, such as hourly, daily, or weekly. The characteristic components of a time series are the trend, seasonal, cyclic, and irregular movements.

Application of Time Series

o Financial: Stock market analysis


o Industry: Power consumption
o Scientific: Experiment result
o Meteorological: Precipitation

Time Series Analysis Methods

Trend Analysis: Categories of Time Series movements:



o Long-term or Trend Movements: The general direction in which a time series moves over a long time interval.
o Cyclic Movements: Long-term oscillations about a trend line or curve.
o Seasonal Movements: A time series appears to follow substantially identical patterns during the corresponding months of subsequent years.
o Irregular or Random Movements: Changes that occur randomly due to unplanned events.
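
A simple way to expose the trend and seasonal movements listed above is to smooth the series with a moving average; the sketch below uses pandas on an invented monthly series (the trend, seasonality, and noise terms are all assumed).

```python
import numpy as np
import pandas as pd

# Invented monthly series: upward trend + yearly seasonality + noise.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
t = np.arange(36)
values = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 2, 36)
series = pd.Series(values, index=idx)

# Trend movement: a 12-month centred moving average smooths out seasonality.
trend = series.rolling(window=12, center=True).mean()

# Seasonal movement: average deviation from the trend for each calendar month.
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).mean()

print(trend.dropna().head())
print(seasonal)
```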

Similarity Search:

o Data Reduction
o Indexing Methods
o Similarity Search Methods
o Query Languages

2. Mining Symbolic Sequence


A symbolic sequence comprises an ordered list of elements that can be recorded with or
without a sense of time. This sequence can be used in various ways, including consumer
shopping sequences, web clickstreams, software execution sequences, biological
sequences, etc.

Mining sequential patterns entails identifying the subsequences that frequently appear in one or more sequences. As a result of substantial research in this area, many scalable algorithms have been developed. Alternatively, we can mine only the set of closed sequential patterns, where a sequential pattern s is closed if there is no proper supersequence s' of s that has the same support as s.
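
As a small sketch of the core operation behind sequential pattern mining, the function below counts the support of a candidate subsequence (item order must be preserved, gaps are allowed) in a set of invented shopping sequences.

```python
def is_subsequence(pattern, sequence):
    """True if all items of `pattern` appear in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

# Hypothetical customer shopping sequences (items bought over successive visits).
sequences = [
    ["bread", "milk", "cereal", "eggs"],
    ["milk", "bread", "cereal"],
    ["bread", "cereal"],
    ["milk", "eggs"],
]

pattern = ["bread", "cereal"]
support = sum(is_subsequence(pattern, s) for s in sequences)
print(f"support of {pattern}: {support}/{len(sequences)}")  # 3/4
```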

3. Mining Biological Sequence


Biological sequences are made up of nucleotide or amino acid sequences. Biological sequence analysis, which compares, aligns, indexes, and analyzes such sequences, plays a crucial role in bioinformatics and modern biology. This analysis can be partitioned into pairwise sequence alignment and multiple sequence alignment.
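
As an informal illustration of global pairwise alignment, the sketch below computes the Needleman-Wunsch score for two short nucleotide strings, using simple assumed scores (match +1, mismatch -1, gap -2).

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,    # gap in b
                dp[i][j - 1] + gap,    # gap in a
            )
    return dp[n][m]

print(needleman_wunsch_score("GATTACA", "GCATGCU"))
```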

Biological Sequence Methods:



i. Alignment of Biological Sequences:


o Pairwise Alignment
o The BLAST Local Alignment Algorithm
o Multiple Sequence Alignment Methods
ii. Biological Sequence Analysis Using a Hidden Markov Model:
o Markov Chain
o Hidden Markov Model
o Forward Algorithm
o Viterbi Algorithm
o Baum-Welch Algorithm
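
To make the hidden Markov model items above concrete, here is a compact, illustrative Viterbi decoder for a toy two-state model; the states, symbols, and probabilities are all assumed for the example.

```python
import numpy as np

# Toy HMM: hidden states emit nucleotide symbols with different biases.
states = ["CG-rich", "AT-rich"]
symbols = {"A": 0, "C": 1, "G": 2, "T": 3}

start = np.array([0.5, 0.5])            # initial state probabilities
trans = np.array([[0.9, 0.1],           # transition probabilities
                  [0.1, 0.9]])
emit = np.array([[0.1, 0.4, 0.4, 0.1],  # emission probabilities per state
                 [0.4, 0.1, 0.1, 0.4]])

def viterbi(observed):
    obs = [symbols[c] for c in observed]
    T, N = len(obs), len(states)
    # Work in log space to avoid underflow on long sequences.
    logv = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logv[:, None] + np.log(trans)  # scores[i, j]: state i -> state j
        back[t] = scores.argmax(axis=0)
        logv = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Trace back the most likely state path.
    path = [int(logv.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi("GCGCATATAT"))
```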

Application of Data Mining:

i. Financial Information Analysis:


o Loan payment prediction/consumer credit policy analysis
o Design and construction of information warehouse
o Financial information collected in banks and financial institutions is typically relatively complete, reliable, and of high quality.
ii. Retail Industry:
o Multidimensional analysis (sales, customers, products, time, etc.)
o Sales campaign analysis
o Customer retention
o Product recommendation
o Using visualization tools for data analysis
iii. Science and Engineering:
o Data processing and data warehouse
o Mining complex data types
o Network-based mining
o Graph-based mining

Trends in Data Mining


Businesses that have been slow in adopting the process of data mining are now
catching up with the others. Extracting important information through the process of
data mining is widely used to make critical business decisions. We can expect data
mining to become as ubiquitous as some of the more prevalent technologies used
today in the coming decade. Data mining concepts are still evolving, and the following are some of the latest trends:

1. Application exploration

Data mining is increasingly used to explore applications in other areas, such as financial
analysis, telecommunications, biomedicine, wireless security, and science.

2. Multimedia Data Mining

This is one of the latest methods which is catching up because of the growing ability to
capture useful data accurately. It involves data extraction from different kinds of
multimedia sources such as audio, text, hypertext, video, images, etc. The data is
converted into a numerical representation in different formats. This method can be used
in clustering and classifications, performing similarity checks, and identifying
associations.
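
As a hedged example of turning raw content (here, short text snippets standing in for extracted multimedia content) into a numerical representation that supports similarity checks, the sketch below uses scikit-learn's bag-of-words vectorizer and cosine similarity; the snippets are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented text snippets standing in for extracted multimedia content.
docs = [
    "customer buys milk and cereal every week",
    "weekly purchase of cereal and milk by the customer",
    "satellite image of coastal topology",
]

# Convert each document into a numerical bag-of-words vector.
vectors = CountVectorizer().fit_transform(docs)

# Similarity check: documents 0 and 1 should be far closer than 0 and 2.
sim = cosine_similarity(vectors)
print(sim.round(2))
```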

3. Ubiquitous Data Mining

This method involves mining data from mobile devices to get information about individuals. Despite several challenges of this type, such as complexity, privacy, and cost, it holds enormous potential across various industries, especially in studying human-computer interaction.

4. Distributed Data Mining

This type of data mining is gaining popularity as it involves mining a huge amount of
information stored in different company locations or at different organizations. Highly
sophisticated algorithms are used to extract data from different locations and provide
proper insights and reports based on them.

5. Embedded Data Mining

Data mining features are increasingly finding their way into many enterprise software
use cases, from sales forecasting in CRM SaaS platforms to cyber threat detection in
intrusion detection/prevention systems. The embedding of data mining into vertical

market software applications enables prediction capabilities for any number of industries and opens up new realms of possibilities for unique value creation.

6. Spatial and Geographic Data Mining

This new trending type of data mining includes extracting information from
environmental, astronomical, and geographical data, including images taken from outer
space. This type of data mining can reveal various aspects such as distance and
topology, which are mainly used in geographic information systems and other
navigation applications.

7. Time Series and Sequence Data Mining

The primary application of this type of data mining is the study of cyclical and seasonal
trends. This practice is also helpful in analyzing even random events which occur outside
the normal series of events. Retail companies mainly use this method to assess customers' buying patterns and behaviors.

8. Data Mining Dominance in the Pharmaceutical And Health Care Industries

Both the pharmaceutical and health care industries have long been innovators in the
category of data mining. The recent rapid development of coronavirus vaccines is
directly attributed to advances in pharmaceutical testing data mining techniques,
specifically signal detection during the clinical trial process for new drugs. In health care,
specialized data mining techniques are being used to analyze DNA sequences for
creating custom therapies, make better-informed diagnoses, and more.

9. Increasing Automation In Data Mining

Today's data mining solutions typically integrate ML and big data stores to provide
advanced data management functionality alongside sophisticated data analysis
techniques. Earlier incarnations of data mining involved manual coding by specialists
with a deep background in statistics and programming. Modern techniques are highly
automated, with AI/ML replacing most of these previously manual processes for
developing pattern-discovering algorithms.

10. Data Mining Vendor Consolidation

If history is any indication, significant product consolidation in the data mining space is
imminent as larger database vendors acquire data mining tooling startups to augment
their offerings with new features. The current fragmented market and a broad range of

data mining players resemble the adjacent big data vendor landscape that continues to
undergo consolidation.

11. Biological data mining

Mining DNA and protein sequences, mining high dimensional microarray data,
biological pathway and network analysis, link analysis across heterogeneous biological
data, and information integration of biological data by data mining are interesting
topics for biological data mining research.
