ECS766P Data Mining
Week 11: Data Mining Applications & Data Ethics
Emmanouil Benetos
[email protected]
December 2022
School of EECS, Queen Mary University of London
Last week: Web Mining
• Six paradigms for today’s Internet
• Technology review
• Internet Mining Applications
• Ingesting Internet data
• Search Engine Indexing & Ranking
1
This week’s contents
1. Mining Text Data
2. Mining Timeseries Data
3. Data Ethics
2
Reading
• Chapters 13, 14, 16, and 20 of C. C. Aggarwal, “Data Mining: The
Textbook”, Springer, 2015 [non-essential reading]
Data Ethics content adapted from material by Dr Usman Naeem and the
Institute of Coding (IoC)
http://eecs.qmul.ac.uk/ioc/
3
Mining Text Data
Mining Text Data: Introduction
Mining Text Data
The text domain is sometimes challenging for mining purposes
because of its sparse and high-dimensional nature. Therefore,
specialised algorithms need to be designed. The first step is the
construction of a bag-of-words representation for text data.
Several preprocessing steps need to be applied, such as stop-word
removal, stemming, and the removal of digits from the
representation.
Algorithms for problems such as clustering and classification need
to be modified as well. The k-means method, hierarchical methods,
and probabilistic methods can be suitably modified to work for text
data.
4
Mining Text Data: Introduction
Text data are found in many domains:
• Digital libraries
• Web and Web-enabled applications
• News services
Modeling of Text:
• A sequence (string)
• A multidimensional record
5
Mining Text Data: Multidimensional Representations
Some terminology:
• Data point: document
• Data set: corpus
• Feature: word/term
• The set of features: lexicon
Vector Space Representation:
• Common words are removed
• Variations of the same word are consolidated
• Displays frequencies of individual words
6
Mining Text Data: Vector Space Representation
Figure: vector space representation for a collection of documents.
This particular representation is also called a document-term matrix.
7
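To make the document-term matrix concrete, here is a minimal Python sketch that builds a bag-of-words representation for a small toy corpus; the example documents and variable names are illustrative and not taken from the figure.

from collections import Counter

corpus = ["the lion is the king of the jungle",
          "the tiger hunts in the jungle",
          "stock markets fell sharply today"]

# Build the lexicon (set of features) over the whole corpus
lexicon = sorted({word for doc in corpus for word in doc.split()})

# Each row of the document-term matrix holds the word frequencies of one document
doc_term_matrix = []
for doc in corpus:
    counts = Counter(doc.split())
    doc_term_matrix.append([counts.get(word, 0) for word in lexicon])

print(lexicon)
for row in doc_term_matrix:
    print(row)  # sparse rows: most entries are 0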
Mining Text Data: Specific Characteristics of Text
Number of “Zero” Attributes (Sparsity):
• Most attribute values in a document are 0. This phenomenon is
referred to as high-dimensional sparsity.
• Affects many fundamental aspects of text mining, such as
distance computation.
Nonnegativity:
• Frequencies are nonnegative.
• The presence of a word is statistically more significant than its
absence.
Side Information:
• Hyperlinks or other metadata associated with a document.
8
Mining Text Data: Data Preprocessing
Stop Word Removal:
• Words in a language that are not very discriminative for mining
• Articles, prepositions, and conjunctions
Stemming:
• Consolidate variations of the same word
• Singular and plural representations, different tenses, common
root extraction
Punctuation Marks:
• Commas, semicolons, digits, hyphens
9
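As an illustration of these preprocessing steps, the sketch below applies punctuation/digit removal, stop-word removal, and stemming, assuming the NLTK library and its 'stopwords' corpus are available; any other stemmer or stop-word list would work equally well.

import re
from nltk.corpus import stopwords        # requires nltk.download('stopwords')
from nltk.stem import PorterStemmer

stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

def preprocess(text):
    # Remove punctuation marks and digits
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    tokens = text.split()
    # Stop-word removal followed by stemming
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

print(preprocess("The 3 runners were running quickly, in 2022!"))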
Mining Text Data: tf–idf representation
Inverse Document Frequency:
idf(w) = log₁₀(|D| / |D_w|)
where |D_w| is the number of documents in which the word w occurs,
and |D| is the total number of documents.
Term Frequency:
tf(w, d) = f_{w,d} / Σ_{w′∈d} f_{w′,d}
i.e. the number of appearances f_{w,d} of word w in document d,
divided by the total number of words in document d.
Term frequency–Inverse document frequency (tf–idf):
tfidf(w, d) = tf(w, d) · idf(w)
10
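A minimal sketch of the tf–idf weighting defined above, using plain Python on a toy corpus (the documents are illustrative):

import math

corpus = [["data", "mining", "text", "data"],
          ["time", "series", "mining"],
          ["text", "classification"]]

def tf(word, doc):
    # Fraction of the document's words that are `word`
    return doc.count(word) / len(doc)

def idf(word, corpus):
    n_docs_with_word = sum(1 for doc in corpus if word in doc)
    return math.log10(len(corpus) / n_docs_with_word)

def tfidf(word, doc, corpus):
    return tf(word, doc) * idf(word, corpus)

print(tfidf("data", corpus[0], corpus))    # frequent locally, rare globally -> high weight
print(tfidf("mining", corpus[0], corpus))  # appears in many documents -> lower idf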
Mining Text Data: Representative-Based Algorithms
Most clustering algorithms can be extended to text data, following
some modifications.
Representative-Based Algorithms: Since the vector space
representation of text is also a multidimensional data point,
algorithms such as k-means can be used for text data.
Modifications:
• Choice of Similarity Function: Cosine similarity
• Computation of the cluster centroids:
• Low-frequency words in the cluster are not retained
• A representative set of words are retained for each cluster
(200 to 400 words)
• This truncated representation has significant effectiveness
advantages
11
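The two modifications above can be sketched in a few lines of NumPy; the tf–idf vectors are assumed to come from an earlier step, and the keep-200-to-400-words rule is applied by zeroing out low-weight centroid entries.

import numpy as np

def cosine_similarity(x, y):
    # Cosine of the angle between two (tf-idf) document vectors
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def truncated_centroid(cluster_vectors, keep=200):
    centroid = np.mean(cluster_vectors, axis=0)
    # Keep only the `keep` highest-weight words; zero out the low-frequency rest
    cutoff = np.sort(centroid)[-keep] if centroid.size > keep else 0.0
    return np.where(centroid >= cutoff, centroid, 0.0)

doc_a = np.array([0.0, 2.0, 1.0, 0.0, 3.0])  # illustrative sparse tf-idf vectors
doc_b = np.array([0.0, 1.0, 0.0, 0.0, 2.0])
print(cosine_similarity(doc_a, doc_b))
print(truncated_centroid(np.vstack([doc_a, doc_b]), keep=3))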
Mining Text Data: Scatter/Gather Approach
The scatter/gather approach is effective because of its ability to
combine hierarchical and k-means algorithms.
• While the k-means algorithm scales as O(k · n), it is sensitive to
initialisation.
• While hierarchical partitioning algorithms are very robust, they
typically do not scale well.
• A Two-phase Approach:
1. Apply a procedure to create a robust set of initial seeds
(buckshot or fractionation procedure)
2. Apply a k-means approach on the resulting set of seeds
12
Mining Text Data: Scatter/Gather Approach
Buckshot
• Select a seed set (a sample of documents) of size √(k · n)
• k is the number of clusters
• n is the number of documents
• Apply agglomerative hierarchical clustering to this initial sample
of seeds
• The time complexity is O(k · n)
• Agglomerative clustering methods
• The individual data points are successively merged into
higher-level clusters.
13
Mining Text Data: Scatter/Gather Approach
Fractionation
• Break up the corpus into n/m buckets, each of size m
• An agglomerative algorithm is applied to each bucket to reduce
it by a factor ν ∈ (0, 1)
• Then, we obtain ν · n agglomerated documents over all buckets
• An “agglomerated document” is defined as the
concatenation of the documents in a cluster.
• Repeat the above process until only k agglomerated documents remain
14
Mining Text Data: Scatter/Gather Approach
Fractionation
• Types of Partition
• Random partitioning
• Sort the documents by the index of the jth most common
word in the document. Contiguous groups of m documents
in this sort order are mapped to clusters.
• Time Complexity
• O(n · m · (1 + ν + ν² + ...)) = O(n · m)
15
Mining Text Data: Scatter/Gather Approach
k-means algorithm
When the initial cluster centers have been determined with the use
of the buckshot or fractionation algorithms, one can apply the
k-means algorithm with the seeds obtained in the first step.
• Each document is assigned to the nearest of the k cluster
centers
• The centroid of each such cluster is determined as the
concatenation of the documents in that cluster
• Furthermore, the less frequent words of each centroid are
removed
16
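A high-level sketch of the two-phase scatter/gather idea (buckshot seeding followed by k-means), assuming scikit-learn is available; the real buckshot procedure applies agglomerative clustering to a sample of size √(k · n), which is what the sketch mimics, and the data below is a stand-in for a tf–idf matrix.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def buckshot_kmeans(X, k, random_state=0):
    n = X.shape[0]
    rng = np.random.default_rng(random_state)

    # Phase 1 (buckshot): hierarchical clustering on a sample of size sqrt(k*n)
    sample_size = max(k, int(np.sqrt(k * n)))
    sample = X[rng.choice(n, size=sample_size, replace=False)]
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(sample)
    seeds = np.array([sample[labels == c].mean(axis=0) for c in range(k)])

    # Phase 2: k-means initialised with the robust seeds
    km = KMeans(n_clusters=k, init=seeds, n_init=1, random_state=random_state)
    return km.fit_predict(X)

X = np.random.rand(500, 20)      # stand-in for a tf-idf matrix
print(buckshot_kmeans(X, k=5)[:10])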
Mining Timeseries Data
Mining Timeseries Data: Introduction
Mining Timeseries Data
Timeseries data is common in many domains, such as sensor
networking, healthcare, and financial markets.
Typically, timeseries data needs to be normalised, and missing
values need to be imputed for effective processing. Numerous data
reduction techniques such as Fourier and wavelet transforms are
used in timeseries analysis. The choice of similarity function is the
most crucial aspect of time series analysis.
Forecasting is an important problem in timeseries analysis because
it can be used to make predictions about data points in the future.
Most timeseries applications use either point-wise or shape-wise
analysis.
17
Mining Timeseries Data: Introduction
• Temporal data may be either discrete or continuous:
• Continuous temporal data sets are timeseries
• Discrete temporal data sets are sequences
• Time series data are viewed as contextual data representations,
with contextual and behavioural attributes.
• Two types of models:
• Real-time analysis
• Retrospective analysis
18
Mining Timeseries Data: Data Preparation
Multivariate Time Series Data
A time series of length n and dimensionality d contains d numeric
features at each of n timestamps t_1, ..., t_n. Each timestamp contains a
component for each of the d series. Therefore, the set of values
received at timestamp t_i is Ȳ_i = (y_{1i}, ..., y_{di}). The value of the jth series
at timestamp t_i is y_{ji}.
In a univariate time series, the value of d is 1. In such cases, a series
of length n is represented as a set of scalar behavioural values
y_1, ..., y_n, associated with the timestamps t_1, ..., t_n.
19
Mining Timeseries Data: Data Preprocessing
Handling Missing Values
The most common methodology used for handling missing,
unequally spaced, or unsynchronised values is linear interpolation.
Let y_i and y_j be values of the timeseries at times t_i and t_j,
respectively, where i < j. Let t be a time drawn from the interval
(t_i, t_j). Then, the interpolated value of the series is given by:
y = y_i + ((t − t_i) / (t_j − t_i)) · (y_j − y_i)
Polynomial interpolation or spline interpolation are also possible.
20
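A minimal sketch of the linear interpolation formula above, used to fill a missing value (np.interp would achieve the same):

def interpolate(t, t_i, t_j, y_i, y_j):
    # Linear interpolation between (t_i, y_i) and (t_j, y_j) for t_i <= t <= t_j
    return y_i + ((t - t_i) / (t_j - t_i)) * (y_j - y_i)

# Value missing at t = 3, known at t = 2 and t = 5
print(interpolate(3, 2, 5, 10.0, 16.0))  # -> 12.0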
Mining Timeseries Data: Data Preprocessing
Noise Removal
• Binning
• Grouping data into time intervals of size k
• Averaging value of data points in each interval
• Let y_{i·k+1}, ..., y_{i·k+k} be the values at timestamps t_{i·k+1}, ..., t_{i·k+k}.
The new binned value is:
y′_{i+1} = ( Σ_{r=1}^{k} y_{i·k+r} ) / k
• Moving-Average Smoothing: Moving-average (rolling averages)
methods reduce the loss in binning by using overlapping bins,
over which the averages are computed. Here a bin is
constructed starting at each timestamp in the series.
21
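The two smoothing schemes can be sketched as follows (pure Python, illustrative values):

def bin_smooth(y, k):
    # Non-overlapping bins of size k, each replaced by its average
    return [sum(y[i:i + k]) / k for i in range(0, len(y) - k + 1, k)]

def moving_average(y, k):
    # Overlapping bins: one window starting at every timestamp
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]

y = [3, 5, 4, 8, 7, 9]
print(bin_smooth(y, 2))       # [4.0, 6.0, 8.0]
print(moving_average(y, 2))   # [4.0, 4.5, 6.0, 7.5, 8.0]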
Mining Timeseries Data: Data Preprocessing
• Exponential Smoothing
The smoothed value y′_i is defined as a linear combination of the
current value y_i and the previously smoothed value y′_{i−1}.
Parameter α ∈ (0, 1) controls the smoothing:
y′_i = α · y_i + (1 − α) · y′_{i−1}
22
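A direct implementation of the exponential smoothing recurrence above; initialising the smoothed series with the first observation is an assumption of this sketch.

def exponential_smoothing(y, alpha=0.3):
    smoothed = [y[0]]                     # assume y'_0 = y_0
    for value in y[1:]:
        # y'_i = alpha * y_i + (1 - alpha) * y'_{i-1}
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([3, 5, 4, 8, 7, 9], alpha=0.5))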
Mining Timeseries Data: Data Preprocessing
Normalisation
• Minmax normalisation to (0,1)
Let the minimum and maximum value of the time series be min
and max, respectively. Then, the time series value y_i is mapped
to the new value y′_i in the range (0, 1) as:
y′_i = (y_i − min) / (max − min)
• z-score normalisation
Let µ and σ represent the mean and standard deviation of the
values in the timeseries. Then, the timeseries value y_i is mapped
to a new value z_i as:
z_i = (y_i − µ) / σ
23
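Both normalisations in a short NumPy sketch (the series is illustrative):

import numpy as np

y = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 9.0])

minmax = (y - y.min()) / (y.max() - y.min())  # values mapped into the unit interval
zscore = (y - y.mean()) / y.std()             # zero mean, unit standard deviation

print(minmax)
print(zscore)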
Mining Timeseries Data: Data Transformation
Discrete Wavelet Transform (DWT)
• DWT converts a timeseries to multidimensional data.
• A key advantage is that the DWT can capture both frequency and
temporal information.
24
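As a sketch, the Haar DWT of a short series can be computed with the PyWavelets package, assuming it is installed; the wavelet choice and decomposition level here are illustrative.

import pywt  # PyWavelets, assumed installed (pip install PyWavelets)

series = [8, 6, 2, 3, 4, 6, 6, 5]

# Two-level Haar decomposition: approximation coefficients plus detail
# coefficients at each level (coarse-to-fine temporal information)
coeffs = pywt.wavedec(series, 'haar', level=2)
for c in coeffs:
    print(c)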
Mining Timeseries Data: Data Transformation
Discrete Fourier Transform (DFT)
Idea: Decompose a given signal into a superposition of sinusoids
(elementary signals).
• The magnitude reflects the intensity at which the sinusoid of a
specific frequency appears in the signal.
• The phase reflects how the sinusoid has to be shifted to best
correlate with the signal.
25
Mining Timeseries Data: Data Transformation
Discrete Fourier Transform (DFT)
Any series of length n can be expressed as a linear combination of
smooth periodic sinusoidal series. Consider a time series x_0, ..., x_{n−1}.
Each coefficient X_k of the Fourier transform is a complex value which
is defined as follows:
X_k = Σ_{r=0}^{n−1} x_r · e^{−i·r·ω·k}
    = Σ_{r=0}^{n−1} x_r · cos(rωk) − i · Σ_{r=0}^{n−1} x_r · sin(rωk),   ∀k ∈ {0, ..., n−1}
where ω is set to 2π/n radians, and the notation i denotes the
imaginary number √−1.
26
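The coefficients X_k defined above can be computed directly, or with np.fft.fft, which uses the same convention; a minimal check:

import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0])
n = len(x)
omega = 2 * np.pi / n

# Direct evaluation of X_k = sum_r x_r * exp(-i * r * omega * k)
X_direct = np.array([sum(x[r] * np.exp(-1j * r * omega * k) for r in range(n))
                     for k in range(n)])

X_fft = np.fft.fft(x)                 # fast O(n log n) implementation, same convention
print(np.allclose(X_direct, X_fft))   # True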
Mining Timeseries Data: Forecasting
The prediction of future trends has applications in:
• Retail sales
• Stock markets
• Weather forecasting
• Medicine and health
27
Mining Timeseries Data: Forecasting
Timeseries can be either stationary or nonstationary:
• A stationary stochastic process is one whose parameters, such
as the mean and variance, do not change with time.
• A nonstationary process is one whose parameters change with
time.
In forecasting, we typically convert or assume timeseries to be
stationary and use statistical parameters for forecasting.
Statistical methods for timeseries forecasting:
• Autoregressive (AR) models
• Moving average (MA) models
• Autoregressive Moving Average (ARMA) models
• Autoregressive Integrated Moving Average (ARIMA) models
28
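As a sketch of statistical forecasting, an ARIMA model can be fitted with the statsmodels package, assuming it is installed; the generated series and the (p, d, q) order below are purely illustrative.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA  # statsmodels assumed installed

# Illustrative nonstationary series: random walk with drift
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.5, 1.0, size=100))

# ARIMA(1, 1, 1): one autoregressive term, first-order differencing to make
# the series stationary, and one moving-average term
model = ARIMA(series, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))  # predictions for the next 5 timestamps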
Data Ethics
Is our underlying data fit for purpose?
The objective of this section is to provide students with an
understanding of the key ethical and legal issues as well as
challenges that they might face when working on data mining. The
lecture will also provide insights on how to address these issues
based on the UK’s Data Ethics Framework.
Fundamental questions:
• “Does my analysis of the dataset infringe on a user’s privacy?”
• “Does the use of a particular dataset lead to ethical issues?”
• “Is the dataset accurate and fit for purpose?”
29
What is Data Ethics?
In simple terms, ethics can be considered as conducting an activity
in a ‘good’, ‘acceptable’ or ‘right’ way. But, how can we determine
what is ‘good’, ‘acceptable’ or ‘right’? This can be subjective, as it
depends on the values that are the norm within different groups of
people, values that are in turn shaped by factors such as culture.
The moral philosophy discipline categorises
ethics into the following two perspectives:
• Kantian: The ethical action is driven by moral values and
principles of the individual. This perspective is not concerned
about the consequence of an individual’s actions.
• Utilitarian: The action is ethical if the intention is to maximise
positive outcomes for a larger population of individuals. This
perspective is concerned about the consequence of an
individual’s actions.
30
What is Data Ethics?
Both Kantian and Utilitarian perspectives have their advantages and
disadvantages:
• Under the Kantian perspective, it can be difficult to recognise the
moral (good) values of an individual.
• The Utilitarian perspective can overlook minority groups, as it
only considers positive outcomes for the larger group of
individuals.
Data Ethics is concerned with the values and methods that are
adopted when we generate, analyse and disseminate data. Hence, a
fundamental objective of data ethics is to ensure that you consider
the social and legal implications of how and for what purpose you
use the data and algorithms as a data scientist.
31
Data Ethics - Suggested Reading
• Mingers, J., & Walsham, G. (2010). Towards ethical information systems:
The contribution of discourse ethics. MIS Quarterly, 34(4), 833–854.
• Pasquale, Frank & Citron, Danielle Keats (2014). Promoting Innovation
While Preventing Discrimination: Policy Goals for the Scored Society.
Washington Law Review, 89:1413.
• Newell, S., & Marabelli, M. (2015). Strategic opportunities (and
challenges) of algorithmic decision making: A call for action on the
long-term societal effects of ‘datification’. The Journal of Strategic
Information Systems.
• Vallor, S. (2016). Technology and the virtues: A philosophical guide to a
future worth wanting. Oxford University Press.
• Gumbus, A., & Grodzinsky, F. (2016). Era of big data: Danger of
discrimination. ACM SIGCAS Computers and Society, 45(3), 118–125.
32
Case study
“We also should be worried about misdirection of the innovation of scoring
in the employment context—particularly if firms can effectively hide
misconduct via scores. Existing laws prohibit some discriminatory uses of
the data. For example, an employer cannot fire workers simply because they
have an illness. But Big Data methods are able to predict diabetes from a
totally innocuous data set (including items like eating habits, drugstore
visits, magazine subscriptions, and the like). [...] For example, a firm could
conclude a worker is likely to be diabetic and that they are likely to be a
“high cost worker” given the significant monthly costs of diabetic medical
care.”
(from Pasquale and Citron, 2014)
33
What is the Data Ethics Framework?
The Data Ethics Framework, developed by the UK government,
prescribes appropriate data use and is aimed at statisticians,
analysts and data scientists working directly or indirectly within
the public sector. The objective of the framework is to encourage
ethical data use to build better services, and it is based on the
following values of the Civil Service Code:
• Integrity
• Honesty
• Objectivity
• Impartiality
34
Resources:
• Data Ethics Framework
https://www.gov.uk/government/publications/
data-ethics-framework/data-ethics-framework
• Data Ethics Workbook
https://www.gov.uk/government/publications/
data-ethics-workbook
35
Which data are we allowed to use?
Quantitative secondary research sources, which include datasets such
as census data, birth/death rates and unemployment rates, are a type
of data normally generated by governments, organisations and
charities.
Are we allowed to make use of this data? The answer is ‘yes’;
however, we need to be aware of the legislation governing the usage
of data. According to gov.uk, this includes how we:
• Produce statistics
• Protect privacy by design
• Minimise the data needed to meet our needs
• Keep personal and non-personal data secure
36
Personal Data Protection
If you intend to use personal data, then you must ensure that you
comply with the principles of the General Data Protection Regulation
(GDPR) and Data Protection Act 2018 (DPA 2018).
The importance of GDPR cannot be overstated, as it aims to
improve the protection of data subjects’ rights within Europe. In
addition, GDPR clearly articulates what companies must do to
protect personal data.
37
Personal Data Protection
Data scientists also need to take into consideration the
interpretability of data, as this is also a GDPR requirement.
There are two aspects to the interpretability of data (this legal
definition also includes models), which are transparency and post
hoc explanations.
38
Personal Data Protection
Transparency is based on how your model works, while post hoc
explanations are based on the information derived from your model.
From a GDPR perspective, this is important as a user has the legal
right to find out how an algorithmic decision was made about them.
Resources:
• General Data Protection Regulation (GDPR)
https://gdpr-info.eu
• Data Protection Act 2018 (DPA 2018)
https://www.legislation.gov.uk/ukpga/2018/12/enacted
39
Case Study: Autonomous Vehicles
Decision making models are dependent on data that is generated
given a particular scenario. One such example is the series of
decisions that have to be made given the data captured by the
multiple sensors in Autonomous Vehicles (AVs). The questions that
we need to think about are:
• “Who makes these decisions?”
• “Are there any legal liabilities for these decisions?”
The advent of AVs is seen as a progressive step towards a smart city
infrastructure, where the motivation is to provide safe roads by
reducing traffic accidents. However, decision-making models will
likely have to make a series of difficult moral decisions if the vehicle
is involved in a crash.
40
Case Study: Autonomous Vehicles
Let us consider the following scenarios:
• Scenario A:
The vehicle will keep on driving straight on the road and kill a group of
pedestrians or
The vehicle will swerve to the right and kill one person walking on the
pavement
• Scenario B:
The vehicle will keep on driving straight on the road and kill one
pedestrian or
The vehicle will swerve to the right onto the pavement and kill the
passenger in the vehicle
• Scenario C:
The vehicle will keep on driving straight on the road and kill a group of
pedestrians or
The vehicle will swerve to the right onto the pavement and kill the
passenger in the vehicle
41
Case Study: Autonomous Vehicles
These scenarios clearly illustrate why the design of these models
can lead to ethical dilemmas. However, will the legal liabilities be the
same for a pre-programmed AV and a human-driven car?
42
Readings:
• Contissa, G., Lagioia, F., & Sartor, G. (2017). The Ethical Knob:
ethically-customisable automated vehicles and the law. Artificial
Intelligence and Law, 25(3), 365-378.
• Ethics guidelines for trustworthy AI
https://ec.europa.eu/digital-single-market/
en/news/ethics-guidelines-trustworthy-ai
43
Data Reliability
Limitations with datasets can lead to data analysis being misleading
and unreliable. Hence, this is seen as an ethical concern.
How do we determine if a dataset is reliable?
We need to take into consideration the lineage of the dataset, as this
will allow us to trace back any errors or discrepancies to the
beginning of the data analysis process.
44
Data Reliability
Identification of the data lineage can be done by answering the
following set of questions:
• What is the source of the data?
• How was the data collected? Was it by humans? Or automated
systems?
• Why was the data collected?
• Does the data reflect its target population?
• Are there any patterns in the data?
• Is data likely to change over time?
• Are there omissions from the dataset?
• What was the sampling method used to collect this data?
45
Data Bias
Bias within datasets can be caused by:
• Datasets that do not accurately represent the cohort that the
insights will be based on.
• Datasets produced by humans, such as curated news articles or
social media content, that carry bias against a group of people.
Bias within datasets has huge ethical implications, as it can lead to
biased models that are prejudiced and harmful towards people.
An example of this is the case study on the “COMPAS Recidivism
Algorithm”, which was used to predict a defendant’s likelihood of
reoffending.
46
Data Bias - Types of Biases
Selection Bias
This type of bias occurs when the dataset does not reflect the
population or cohort that the insights or decisions will be based on.
This is very common with surveys, which tend to lead to a
situation where you only end up with willing participants who are a
small subset of the population and do not reflect the characteristics
of an average person. This bias typically arises from the need to
work with data that is easily accessible.
Self-Selection Bias
This is a subcategory of selection bias, where the subjects of the
analysis select themselves. For example, suppose you are running an
online poll on how many people in a town can use an email client.
The results will not represent the entire town, as only the
participants who received the poll via email are likely to respond.
47
Data Bias - Types of Biases
Omitted Variable Bias
This bias occurs when variables or features are omitted from the
dataset because, given existing beliefs, they are assumed not to be
relevant to the output.
Observer Bias
This type of bias occurs when the data scientist subconsciously
influences the outcome of their research by:
• Having previous knowledge or subjective feelings about a
sample of people being studied.
• Unintentional manipulation of participants during surveys or
interviews.
• Cherry picking a group of people who have characteristics that
will support the data scientist’s hypothesis.
48
Data Bias - Types of Biases
Social Bias
Social bias can be positive or negative and refers to being in favour
of, or against, individuals or groups based on their social identities.
It commonly occurs in data science when using data collected from
the web, news, and social media.
The following case study illustrates an example of this, where text
features trained on Google News articles exhibited female and male
gender stereotypes:
• T. Bolukbasi et al, “Man is to Computer Programmer as Woman is
to Homemaker? Debiasing Word Embeddings”, 30th Conference
on Neural Information Processing Systems, 2016.
49
Personal vs Sensitive Data
What is the difference between personal and sensitive data?
Personal Data
Personal data is information that can be used to identify an
individual. Typical examples are name (first/middle/last name),
address, email address, national insurance number, location data, IP
address, signature, date of birth and bank account details.
Typically, datasets are made up of multiple pieces of personal data
which can be combined to identify an individual.
50
Personal vs Sensitive Data
Sensitive Data
Sensitive data is a category of personal information that may lead to
harm or discrimination if not treated with extra care and security. For
example, sensitive information about an individual could concern:
• ethnicity
• religious beliefs
• political views and opinions
• sexual orientation
• trade union membership
• biometric data
• health records
• criminal records
This type of data should be encrypted or pseudonymised and stored
separately from other personal data.
51
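As a sketch of pseudonymisation (not a substitute for proper encryption or legal advice), direct identifiers can be replaced by keyed hashes so that the sensitive attributes can be stored separately; the key handling and field names below are purely illustrative.

import hmac, hashlib

SECRET_KEY = b"store-this-key-separately"   # illustrative; manage keys securely in practice

def pseudonymise(identifier: str) -> str:
    # Keyed hash: stable pseudonym, not reversible without the key
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "health_condition": "diabetes"}
stored_sensitive = {"subject_id": pseudonymise(record["name"]),
                    "health_condition": record["health_condition"]}
print(stored_sensitive)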
Resources:
• https://www.itgovernance.co.uk/data-protection-dpa-and-eu-
data-protection-regulation
• https://ico.org.uk/media/for-
organisations/documents/1554/determining-what-is-personal-
data.pdf
52
Quiz
Question 1. Which of the following is considered personal data?
A Salary/wages
B Religious beliefs
C Sexual orientation
D Philosophical beliefs
Question 2. Which of the following is considered sensitive data?
A Hours of employment
B Emergency contact person details
C IP address
D Religious affiliation
53
Research Ethics at QMUL
All projects that involve human participants or personal data require ethics
approval from the university - including MSc projects!
Most projects which involve surveys/questionnaires can be approved
through the EECS ‘low risk’ ethics approval process:
https://qmulprod.sharepoint.com/sites/EECS-
DevolvedSchoolResearchEthicsCommittee/
Medium/high risk ethics applications are submitted to the Queen Mary
Ethics of Research Committee: http://www.jrmo.org.uk/performing-
research/conducting-research-with-human-participants-outside-the-nhs/
54
Summary
Mining Text Data is the process of deriving high-quality information
from text datasets.
Mining Timeseries Data comprises methods for analysing
timeseries data in order to extract meaningful statistics and other
characteristics of the datasets.
Data Ethics evaluates moral problems related to data, algorithms
and corresponding practices in order to formulate and support
morally good solutions.
Data Reliability refers to the assurance of the accuracy and
consistency of datasets.
Data Bias results in skewed outcomes, low accuracy levels, and
analytical errors.
Personal Data is information on an individual. Sensitive Data is
specific personal information that can cause discrimination.
55
Questions?
also please use the forum on QM+
56