DATA MINING ARCHITECTURE
Introduction to Data Mining Architecture
Data mining is the process of discovering or extracting interesting
knowledge from large amounts of data stored in multiple data sources
such as file systems, databases, and data warehouses. This knowledge
benefits business strategy, scientific and medical research,
governments, and individuals.
Business data accumulates rapidly with every business transaction and
is stored in relational database systems. To provide insight into
business processes, data warehouse systems have been built to produce
analytical reports that help business users make decisions.
Because data already resides in databases and/or data warehouse
systems, the design question is whether a data mining system should be
decoupled from or coupled with those systems. This question leads to
four possible architectures for a data mining system:
No-coupling: in this architecture, the data mining system does not
utilize any functionality of a database or data warehouse system. A
no-coupling data mining system retrieves data from a particular
data source such as a file system, processes the data using data
mining algorithms, and stores the results in the file system. The
no-coupling architecture takes no advantage of a database or data
warehouse, which is already very efficient at organizing, storing,
accessing, and retrieving data. It is therefore considered a poor
architecture for a data mining system; however, it is used for
simple data mining processes.
Loose coupling: in this architecture, the data mining system uses a
database or data warehouse for data retrieval. It retrieves data
from the database or data warehouse, processes the data using data
mining algorithms, and stores the results back in those systems.
This architecture is mainly for memory-based data mining systems
that do not require high scalability or high performance.
Semi-tight coupling: in this architecture, besides linking to a
database or data warehouse system, the data mining system uses
several features of that system to perform some data mining tasks,
including sorting, indexing, and aggregation. Some intermediate
results can be stored in the database or data warehouse system for
better performance.
Tight coupling: in this architecture, the database or data warehouse
is treated as an information retrieval component of the data mining
system through integration. All the features of the database or
data warehouse are used to perform data mining tasks. This
architecture provides system scalability, high performance, and
integrated information.
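The difference between these coupling levels can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the in-memory SQLite database, the purchases table, and the function
names are hypothetical and not part of any particular data mining
product. Both functions count how often two items appear in the same
basket. The first treats the database purely as a row source (loose
coupling), while the second pushes the indexing, join, and aggregation
into the database, in the spirit of semi-tight or tight coupling.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def build_demo_db():
    """Create an in-memory database with a hypothetical `purchases` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
    rows = [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
    ]
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    return conn


def pair_counts_loose(conn):
    """Loose coupling: the database only serves rows; counting runs in memory."""
    baskets = {}
    for basket_id, item in conn.execute("SELECT basket_id, item FROM purchases"):
        baskets.setdefault(basket_id, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        # Sort so each pair is always stored in the same (a, b) order.
        counts.update(combinations(sorted(items), 2))
    return dict(counts)


def pair_counts_in_database(conn):
    """Semi-tight/tight style: indexing, join, and aggregation run in the database."""
    conn.execute("CREATE INDEX IF NOT EXISTS idx_basket ON purchases (basket_id)")
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM purchases AS a
        JOIN purchases AS b
          ON a.basket_id = b.basket_id AND a.item < b.item
        GROUP BY a.item, b.item
    """
    return {(first, second): support for first, second, support in conn.execute(query)}


if __name__ == "__main__":
    conn = build_demo_db()
    print(pair_counts_loose(conn))
    print(pair_counts_in_database(conn))
```

Both functions return the same pair counts on this small table. The
in-database version assumes each item appears at most once per basket.
With a large table, the loose version has to pull every row into
application memory, which is the scalability limit described above.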
Let’s examine the tight-coupling data mining architecture in greater
detail.
Tight-coupling data mining architecture
Figure: Data mining architecture
There are three tiers in the tight-coupling data mining architecture:
1. Data layer: as mentioned above, the data layer can be a database
and/or data warehouse system. This layer is an interface to all data
sources. Data mining results are stored in the data layer so that
they can be presented to the end user as reports or other kinds of
visualization.
2. Data mining application layer: this layer retrieves data from the
database. Transformation routines can be run here to convert the
data into the desired format. The data is then processed using
various data mining algorithms.
3. Front-end layer: this layer provides an intuitive, friendly user
interface for the end user to interact with the data mining system.
Data mining results are presented to the user in visual form in the
front-end layer.
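The three tiers can be outlined as three small classes. This is a
minimal sketch, not a reference implementation: the class names, the
sales and mining_results tables, and the per-region average used as a
stand-in for a real mining algorithm are all assumptions made for
illustration. An in-memory SQLite database plays the role of the data
layer.

```python
import sqlite3


class DataLayer:
    """Data layer: one interface to the source data and the stored results."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        self.conn.execute(
            "CREATE TABLE mining_results (region TEXT, avg_amount REAL)"
        )

    def load(self, rows):
        self.conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    def fetch_sales(self):
        return self.conn.execute("SELECT region, amount FROM sales").fetchall()

    def save_results(self, results):
        self.conn.executemany("INSERT INTO mining_results VALUES (?, ?)", results)

    def fetch_results(self):
        return self.conn.execute(
            "SELECT region, avg_amount FROM mining_results ORDER BY region"
        ).fetchall()


class MiningApplicationLayer:
    """Application layer: retrieve, transform, process, and store results."""

    def __init__(self, data_layer):
        self.data = data_layer

    def run(self):
        rows = self.data.fetch_sales()
        # Transformation routine: normalise region names to a common format.
        cleaned = [(region.strip().lower(), amount) for region, amount in rows]
        # Stand-in for a mining algorithm: average amount per region.
        totals = {}
        for region, amount in cleaned:
            running_sum, count = totals.get(region, (0.0, 0))
            totals[region] = (running_sum + amount, count + 1)
        results = [(region, s / n) for region, (s, n) in totals.items()]
        # Results go back into the data layer so the front end can read them.
        self.data.save_results(results)


class FrontEndLayer:
    """Front-end layer: present stored results to the end user."""

    def __init__(self, data_layer):
        self.data = data_layer

    def render(self):
        return "\n".join(
            f"{region}: {avg:.2f}" for region, avg in self.data.fetch_results()
        )


if __name__ == "__main__":
    data = DataLayer()
    data.load([(" North", 10.0), ("north", 20.0), ("South ", 5.0)])
    MiningApplicationLayer(data).run()
    print(FrontEndLayer(data).render())
```

The front end never touches the raw sales table. It reads only the
results that the application layer stored in the data layer, which
mirrors the flow described in the three tiers above.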
In this article, we’ve discussed the various data mining architectures
and their advantages and disadvantages. We then looked into the
tight-coupling data mining architecture, the most desirable,
high-performance, and scalable of the four.