
Data Mining & Warehousing

Unit 1 – Part 2
Dr. VIDHYA B
ASSISTANT PROFESSOR & HEAD
Department of Computer Technology
Sri Ramakrishna College of Arts and Science
Coimbatore - 641 006
Tamil Nadu, India

Agenda
■ Technology Used

■ Kind of Applications
■ Major Issues in Data Mining
■ Summary

Sri Ramakrishna College of Arts & Science


Data Mining - Technologies Used

[Figure: Data mining at the center, drawing on statistics, machine learning, pattern recognition, database systems, data warehouses, information retrieval, visualization, algorithms, high-performance computing, and applications.]



Data Mining - Technologies Used

1. Statistics - The collection, analysis, interpretation or explanation, and presentation of data.
■ A statistical model is a set of mathematical functions that describe the behavior of the objects in a target class in terms of random variables and their associated probability distributions.
■ Statistics research develops tools for prediction and forecasting using data and statistical models. Statistical methods can be used to summarize or describe a collection of data.
■ A statistical hypothesis test (sometimes called confirmatory data analysis) makes statistical decisions using experimental data.
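
As a small illustration of using statistical methods to summarize a collection of data, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The `daily_sales` values are made-up numbers, not data from the slides.

```python
import statistics

# Hypothetical sample: daily sales counts (made-up values for illustration).
daily_sales = [12, 15, 11, 19, 14, 13, 40]

# Summarize the collection with a few descriptive statistics.
summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
}
print(summary)
```

The single large value (40) pulls the mean above the median, which is the kind of behavior a summary is meant to reveal.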



Data Mining - Technologies Used

2. Machine learning is a technique for computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data.



Data Mining - Technologies Used
■ Supervised learning is essentially a synonym for classification. The supervision in the learning comes from the labeled examples in the training data set.
■ Unsupervised learning is essentially a synonym for clustering. The learning process is unsupervised because the input examples are not class labeled; clustering is used to discover classes within the data.
■ Semi-supervised learning is a class of machine learning techniques that make use of both labeled and unlabeled examples when learning a model.
■ Active learning is a machine learning approach that lets users play an active role in the learning process. The goal is to optimize the model quality by actively acquiring knowledge from human users, given a constraint on how many examples they can be asked to label.
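
The contrast between supervised and unsupervised learning can be sketched on toy one-dimensional data. This is only an illustration: the 1-nearest-neighbor classifier and the two-centroid clustering loop are simple stand-ins chosen here, and the data values and function names are hypothetical.

```python
# Toy 1-D data (hypothetical values chosen for illustration).
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
unlabeled = [1.2, 0.8, 8.5, 9.5, 1.1]

def classify_1nn(x, training):
    """Supervised: predict the label of the nearest labeled example."""
    _, nearest_label = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def two_means(points, iterations=10):
    """Unsupervised: split points into two clusters without using labels.

    Assumes the points contain at least two distinct values.
    """
    c0, c1 = min(points), max(points)  # initial centroids
    for _ in range(iterations):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)

print(classify_1nn(8.7, labeled))
print(two_means(unlabeled))
```

The classifier needs the class labels in `labeled`, while `two_means` discovers the two groups from the values alone.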



Data Mining - Technologies Used
▪ For classification and clustering tasks, machine learning research often focuses on the accuracy of the model.
▪ In addition to accuracy, data mining research places strong emphasis on the efficiency and scalability of mining methods on large data sets.
▪ Data mining research also looks for ways to handle complex types of data and explores new, alternative methods.



Data Mining - Technologies Used
3. Database Systems and Data Warehouses:
■ Database systems research focuses on the creation, maintenance, and use of
databases for organizations and end-users.
■ A data warehouse integrates data originating from multiple sources and various
timeframes. It consolidates data in multidimensional space to form partially
materialized data cubes.
■ The data cube model not only facilitates OLAP in multidimensional databases
but also promotes multidimensional data mining
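
A data cube can be pictured as the same measure aggregated over every combination of dimensions. The sketch below is a minimal illustration under stated assumptions: the fact table, the `build_cube` function, and the `"*"` roll-up marker are hypothetical, and the cube is fully materialized here for simplicity, whereas a warehouse would typically materialize only part of it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (city, product, quarter, sales).
facts = [
    ("Coimbatore", "laptop", "Q1", 10),
    ("Coimbatore", "phone", "Q1", 7),
    ("Chennai", "laptop", "Q1", 4),
    ("Chennai", "laptop", "Q2", 6),
]

def build_cube(rows):
    """Aggregate sales for every subset of dimensions (every cuboid)."""
    cube = defaultdict(int)
    for *dims, sales in rows:
        n = len(dims)
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                # "*" marks a dimension that has been rolled up.
                key = tuple(dims[i] if i in kept else "*" for i in range(n))
                cube[key] += sales
    return cube

cube = build_cube(facts)
print(cube[("*", "*", "*")])             # grand total over all dimensions
print(cube[("Chennai", "laptop", "*")])  # rolled up over quarter
```

Looking up a key with `"*"` in some positions corresponds to an OLAP roll-up over those dimensions.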



Data Mining - Technologies Used
4. Information retrieval (IR):
■ It is the science of searching for documents or information in documents.
■ Documents can be text or multimedia, and may reside on the Web.
■ Differences between traditional information retrieval and database systems:
(1) the data under search are unstructured;
(2) the queries are formed mainly by keywords, which do not have complex structures.
■ Digital libraries, digital governments, and health care information systems hold huge amounts of data; searching and analyzing them effectively has raised many challenging issues in data mining.
■ Hence text mining and multimedia data mining, integrated with information retrieval methods, have become increasingly important.
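
Keyword search over unstructured text is commonly supported by an inverted index, which maps each word to the documents that contain it. The sketch below is one minimal way to do this, not a method described in the slides; the sample documents and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical unstructured documents, keyed by document id.
docs = {
    1: "data mining finds patterns in data",
    2: "information retrieval searches documents",
    3: "text mining combines data mining and information retrieval",
}

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(docs)
print(search(index, "mining retrieval"))
```

The query is just a bag of keywords with no structure, in contrast to a structured database query.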



Applications of Data Mining
■ Data mining has seen great successes in many applications.
■ To demonstrate the importance of applications as a major dimension in data mining research and development, two highly successful and popular application examples are discussed: business intelligence and web search engines.

Applications of Data Mining
1. Business Intelligence:
■ Business intelligence (BI) technologies provide historical, current, and
predictive views of business operations.
Examples:
■ Reporting
■ Online analytical processing
■ Business performance management
■ Competitive intelligence
■ Benchmarking
■ Data mining enables businesses to perform effective market analysis, compare customer feedback on similar products, discover the strengths and weaknesses of their competitors, retain highly valuable customers, and make smart business decisions.
■ Online analytical processing tools in business intelligence rely on data warehousing and multidimensional data mining.



Applications of Data Mining

■ Classification and prediction techniques are the core of predictive analytics in business intelligence.
■ Clustering plays a central role in customer relationship management, grouping customers based on their similarities.
■ Characterization mining techniques help to understand the features of each customer group and to develop customized customer reward programs.



Applications of Data Mining
2. Web Search Engines:
■ A web search engine is a specialized computer server that searches for information on the Web. The search results may contain web pages, images, and other types of files.
■ Search engines operate algorithmically or by a mixture of algorithmic and human input.
■ Web search engines use data mining techniques in:
■ crawling (e.g., deciding which pages should be crawled and the crawling frequencies)
■ indexing (e.g., selecting pages to be indexed and deciding to which extent the index
should be constructed), and
■ searching (e.g., deciding how pages should be ranked)



Applications of Data Mining
Challenges of Web Search Engines:
1. Web search engines must handle a huge and ever-growing amount of data.
■ This often requires computer clouds, consisting of thousands or even hundreds of thousands of computers that collaboratively mine the huge amount of data.
2. Web search engines often have to deal with online data.
■ A search engine may be able to afford constructing a model offline on huge data sets, for example a query classifier that assigns a search query to predefined categories based on the query topic (e.g., whether the query "apple" refers to a fruit or a brand of computers).
■ Maintaining and incrementally updating a model on fast-growing data streams remains a challenge.
3. Web search engines deal with queries that are asked only a very small number of times.
■ Although the total number of queries asked can be huge, most of the queries may be asked only once or a few times. Such severely skewed data are challenging for many data mining and machine learning methods.
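
Two of the challenges above, incremental updating and skewed query frequencies, can be illustrated with a very small streaming model. This is a sketch under assumptions, not how any real search engine works: the `QueryStats` class and the sample queries are hypothetical.

```python
from collections import Counter

class QueryStats:
    """Incrementally maintained query-frequency model (illustrative only)."""

    def __init__(self):
        self.counts = Counter()

    def update(self, query):
        # Fold one new query into the model without reprocessing old data.
        self.counts[query.strip().lower()] += 1

    def singleton_fraction(self):
        """Fraction of distinct queries that were asked exactly once."""
        if not self.counts:
            return 0.0
        once = sum(1 for c in self.counts.values() if c == 1)
        return once / len(self.counts)

# Hypothetical query stream.
stats = QueryStats()
for q in ["apple", "Apple", "weather", "apple pie recipe", "data cube olap"]:
    stats.update(q)

print(stats.counts["apple"], stats.singleton_fraction())
```

Even in this tiny stream most distinct queries occur only once, so a model has very little evidence per query to learn from.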



Major issues of Data Mining

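The major issues in data mining research can be partitioned into five groups: mining methodology, user interaction, efficiency and scalability, diversity of data types, and data mining and society.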


Major issues of Data Mining
1. Mining Methodology:
■ Mining various and new kinds of knowledge:
Due to the diversity of applications, new mining tasks continue to emerge,
making data mining a dynamic and fast-growing field.
■ Mining knowledge in multidimensional space:
Interesting patterns can be searched among combinations of dimensions
(attributes) at varying levels of abstraction. Such mining is known as
(exploratory) multidimensional data mining
■ Data mining as an interdisciplinary effort:
■ To mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing.
■ The mining of software bugs in large programs, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.

Major issues of Data Mining
Mining Methodology:
■ Boosting the power of discovery in a networked environment:
Semantic links across multiple data objects can be used to advantage in data mining. Knowledge derived in one set of objects can be used to boost the discovery of knowledge in a “related” or semantically linked set of objects.
■ Handling uncertainty, noise, or incompleteness of data:
Errors and noise may confuse the data mining process, leading to the derivation
of erroneous patterns.
■ Pattern evaluation and pattern- or constraint-guided mining:
Techniques are needed to assess the interestingness of discovered patterns
based on subjective measures.



Major issues of Data Mining
2. User Interaction

■ Flexible user interfaces and an exploratory mining environment: sample, explore, estimate, dynamic change.
■ Constraints and rules: used in pattern evaluation to guide the search toward interesting patterns.
■ Query languages: allow users to pose ad hoc queries; optimization of the processing.
■ Adopt expressive knowledge representations, user-friendly interfaces, and visualization techniques.



Major issues of Data Mining

3. Efficiency and Scalability
■ Extract information from huge amounts of data: efficiency, scalability, performance, optimization.
■ First partition the data into “pieces.” Each piece is processed, in parallel, by searching for patterns.
■ Mine in a distributed and collaborative way, and promote incremental data mining.
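
The partition-then-process-in-parallel idea can be sketched as follows. This is a minimal illustration, assuming a simple item-counting task as the "pattern search"; the function names and transactions are hypothetical, and threads are used only to keep the example self-contained (CPU-bound mining would more likely use separate processes or machines).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(items, n_pieces):
    """Split items into n_pieces roughly equal slices."""
    return [items[i::n_pieces] for i in range(n_pieces)]

def count_items(piece):
    """Local pattern search on one piece: count item occurrences."""
    counts = Counter()
    for transaction in piece:
        counts.update(set(transaction))
    return counts

def parallel_item_counts(transactions, n_pieces=2):
    pieces = partition(transactions, n_pieces)
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        partial = list(pool.map(count_items, pieces))
    total = Counter()
    for counts in partial:
        total += counts  # merge the per-piece results
    return total

# Hypothetical market-basket transactions.
transactions = [["milk", "bread"], ["milk"], ["bread", "butter"], ["milk", "butter"]]
result = parallel_item_counts(transactions)
print(result)
```

Because the per-piece counts are merged by simple addition, the result is the same as counting over the whole data set at once.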



Major issues of Data Mining
4. Diversity of Data Types



Major issues of Data Mining
5. Data Mining and Society
■ The improper disclosure or use of data and the potential violation of individual privacy and data protection rights are areas of concern that need to be addressed.
■ Data mining poses the risk of disclosing an individual’s personal information. Studies on privacy-preserving data publishing and data mining are ongoing.
■ Invisible data mining: data mining results are obtained simply through mouse clicking. Intelligent search engines and Internet-based stores perform such invisible data mining.



Summary
■ Data mining: Discovering interesting patterns and knowledge from
massive amounts of data
■ A natural evolution of database technology, in great demand, with
wide applications
■ A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
■ Mining can be performed on a variety of data
■ Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
■ Data mining technologies and applications
■ Major issues in data mining
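
The KDD steps listed in the summary can be sketched as a chain of small functions. This is only a toy illustration under assumptions: the records, the field names, and every function below are hypothetical, and the "mining" step is reduced to a frequency count.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"id": 1, "age": 25, "city": "Coimbatore"},
            {"id": 2, "age": None, "city": "Chennai"}]
source_b = [{"id": 3, "age": 41, "city": "coimbatore "},
            {"id": 4, "age": 37, "city": "Chennai"}]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine records from multiple sources."""
    return [r for src in sources for r in src]

def select(records, fields):
    """Data selection: keep only task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Transformation: normalize city names into a consistent form."""
    return [{**r, "city": r["city"].strip().title()} for r in records]

def mine(records):
    """Data mining (here just a frequency count per city)."""
    return Counter(r["city"] for r in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns that meet a threshold."""
    return {k: v for k, v in patterns.items() if v >= min_count}

data = transform(select(integrate(clean(source_a), clean(source_b)), ["city"]))
interesting = evaluate(mine(data))
print(interesting)  # knowledge presentation
```

Each function corresponds to one step of the process, so the final expression reads in the same order as the list of KDD steps.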
