Data Discretization II

Data discretization is a technique in data science that simplifies large datasets by converting continuous data values into discrete intervals while minimizing information loss. It can be performed using supervised or unsupervised methods, with techniques such as decision tree analysis, binning, and cluster analysis. Discretization is important for improving feature interpretation and reducing noise in data, making it easier for machine learning algorithms to process continuous attributes.

Uploaded by

Tejovanth .D


What is Data Discretization?

Data discretization is a data science technique for reducing a large number of data values
to a smaller, more manageable set. In short, it converts the values of a continuous
attribute into a finite set of intervals while minimizing the amount of information that is
lost in the process.

Discretization methods can be classified in two ways. The first is by whether they use
class information: supervised discretization uses class labels as part of its analysis,
whereas unsupervised discretization does not. The second is by the direction in which the
operation is carried out: the top-down technique of splitting or the bottom-up technique
of merging.

What is Data Discretization in Data Mining?

Data discretization is the process of transforming the values of a continuous attribute
into a limited set of intervals while sacrificing as little information as possible.
Replacing numeric values with interval labels makes the data easier to handle and to
interpret. For example, the values of an 'age' attribute can be replaced with interval
labels such as (0-10, 11-20, ...) or with conceptual labels such as (kid, youth, adult,
senior). Discretization falls into two categories according to whether class information is
used: supervised discretization uses it, and unsupervised discretization does not. Methods
are also described by the direction in which they proceed, either a "top-down splitting
strategy" or a "bottom-up merging strategy."

Continuous attributes are common in real-world data mining projects. However, many data
mining algorithms cannot handle attributes of this kind directly. In addition, even when a
machine learning task can handle a continuous attribute, its output can benefit
substantially if the continuous values are replaced with their quantized values, because a
small set of discrete values is easier to process. Data discretization therefore converts
continuous data into intervals and designates the value (label) to use for each interval.
For time-valued data, this amounts to recording the time interval in which an observation
falls rather than its exact value.
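
As a minimal sketch of the label substitution described above, the snippet below maps ages to interval labels and to conceptual labels. The sample ages, the cut points (10, 20, 60), and the label for each interval are illustrative assumptions, not values given in the text.

```python
from bisect import bisect_left

# Hypothetical ages; cut points and labels are illustrative assumptions.
ages = [4, 15, 23, 37, 52, 68, 81]

# Upper bounds of the first three intervals; the fourth interval is open-ended.
cut_points = [10, 20, 60]
interval_labels = ["0-10", "11-20", "21-60", "61+"]
concept_labels = ["kid", "youth", "adult", "senior"]

# bisect_left places a value equal to a cut point in the lower interval.
bin_index = [bisect_left(cut_points, age) for age in ages]

as_intervals = [interval_labels[i] for i in bin_index]
as_concepts = [concept_labels[i] for i in bin_index]

print(as_intervals)
print(as_concepts)
```

Both label sets describe the same partition; the conceptual labels simply give each interval a name.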

The intervals of the discretized domain do not each have to contain an observed value, but
they must impose an ordering on the domain of the attribute. As a consequence,
discretization can considerably improve the consistency of the knowledge that is
discovered and reduce the time required to complete data mining tasks such as the
discovery of association rules, classification, and prediction. The improvement is steady
for domains that have a modest number of continuous attributes, and it usually holds as
the number of attributes rises.

Discretization from The Top-down

The process is referred to as top-down discretization, or splitting, if it begins by
locating one or a few points (referred to as split points or cut points) that divide the
entire attribute range, and then repeats this recursively on the intervals that result.

Discretization from The Bottom-up

The process is referred to as bottom-up discretization, or merging, if it begins by
considering all of the continuous values as possible split points and then removes some of
them by combining neighboring values into intervals, repeating this recursively on the
resulting intervals.
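
The merging idea can be sketched in a few lines. The text does not say which neighbors to merge first, so this sketch assumes a simple unsupervised rule: always merge the two neighboring intervals separated by the smallest gap. The function name `merge_bottom_up` and the sample values are hypothetical.

```python
def merge_bottom_up(values, n_intervals):
    """Merge neighboring intervals until only n_intervals remain.

    Assumption: the pair of neighbors with the smallest gap is merged first.
    """
    # Start with every distinct value as its own interval [low, high].
    intervals = [[v, v] for v in sorted(set(values))]
    while len(intervals) > n_intervals:
        # Gap between the top of interval i and the bottom of interval i + 1.
        gaps = [intervals[i + 1][0] - intervals[i][1]
                for i in range(len(intervals) - 1)]
        i = gaps.index(min(gaps))
        # Merge the closest pair into a single interval.
        intervals[i] = [intervals[i][0], intervals[i + 1][1]]
        del intervals[i + 1]
    return [tuple(interval) for interval in intervals]


values = [1, 2, 3, 10, 11, 12, 30, 31]
print(merge_bottom_up(values, 3))
```

A supervised merge criterion can be substituted for the gap rule without changing the overall loop.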

Discretization can be applied recursively to an attribute, which yields what is known as a
concept hierarchy: a hierarchical partitioning of the attribute values.

What are Some Famous Techniques of Data Discretization?

Data Discretization Using Decision Tree Analysis - Decision tree analysis discretizes data
with a supervised, top-down splitting approach. To discretize a numeric attribute, you
first select the split point that yields the lowest entropy, that is, the purest class
distribution in the resulting intervals. You then apply the same splitting criterion
recursively to each resulting interval, which breaks the attribute up into several
disjoint intervals, one level below the other.
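
The following is a rough sketch of entropy-based top-down splitting, not a full decision tree learner. It assumes a fixed recursion depth (`max_depth`) as the stopping rule, which the text does not specify; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_split(pairs):
    """Return (weighted_entropy, cut_point) of the best boundary, or None."""
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or weighted < best[0]:
            best = (weighted, cut)
    return best


def entropy_cut_points(values, labels, max_depth=2):
    """Recursively split the value range at the minimum-entropy cut point."""
    def recurse(pairs, depth):
        # Stop at the depth limit (an assumed rule) or when the interval is pure.
        if depth == 0 or entropy([label for _, label in pairs]) == 0:
            return []
        found = best_split(pairs)
        if found is None:
            return []
        cut = found[1]
        left = [p for p in pairs if p[0] <= cut]
        right = [p for p in pairs if p[0] > cut]
        return recurse(left, depth - 1) + [cut] + recurse(right, depth - 1)

    return recurse(sorted(zip(values, labels)), max_depth)


values = [1, 2, 3, 10, 11, 12, 20, 21]
labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
print(entropy_cut_points(values, labels))
```

On this sample the two cut points separate the three runs of class labels, so every resulting interval contains a single class.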

Binning - This method can be used for the discretization of data and also for the
construction of concept hierarchies. The values of an attribute are distributed into a set
of bins of either equal width or equal frequency. The values are then smoothed by
replacing each one with the mean or the median of its bin. Applying this approach
recursively to the resulting partitions produces a concept hierarchy. Binning is an
unsupervised discretization technique because it does not make use of any class
information.
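
A small sketch of equal-frequency binning followed by smoothing, using only the standard library. The price values, the choice of three bins, and the helper names are assumptions made for illustration.

```python
from statistics import mean, median


def equal_frequency_bins(sorted_values, n_bins):
    """Split sorted values into n_bins bins holding (nearly) equal counts."""
    size, extra = divmod(len(sorted_values), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(sorted_values[start:end])
        start = end
    return bins


def smooth(bins, summary):
    """Replace every value in a bin with the bin's summary statistic."""
    return [[summary(b)] * len(b) for b in bins]


# Hypothetical sample values.
prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])

bins = equal_frequency_bins(prices, 3)
by_mean = smooth(bins, mean)
by_median = smooth(bins, median)

print(bins)
print(by_mean)
print(by_median)
```

Each bin keeps its number of values, but the values inside it collapse to a single representative, which is the smoothing effect.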

Histogram Analysis - A histogram partitions the observed values of an attribute into
disjoint ranges, which are referred to as buckets or bins. Like binning, it is an
unsupervised technique because it does not use class information.
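
As an illustration, an equal-width histogram can be built by hand as below. The helper `equal_width_histogram`, the sample values, and the choice of three buckets are assumptions; the sketch also assumes the values are not all identical.

```python
def equal_width_histogram(values, n_buckets):
    """Return (edges, counts, bucket_index) for an equal-width histogram."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be equal")
    width = (high - low) / n_buckets
    edges = [low + i * width for i in range(n_buckets + 1)]
    counts = [0] * n_buckets
    index = []
    for v in values:
        # The last bucket is closed on the right so the maximum is included.
        i = min(int((v - low) / width), n_buckets - 1)
        counts[i] += 1
        index.append(i)
    return edges, counts, index


values = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]
edges, counts, index = equal_width_histogram(values, 3)

print(edges)
print(counts)
print(index)
```

The bucket index is the discretized form of each value, and the counts show how the values are distributed over the buckets.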

Cluster Analysis - Cluster analysis is a popular way to discretize data. A clustering
algorithm can be applied to discretize a numeric attribute A by partitioning the values of
A into clusters or groups.

Each initial cluster or partition can be broken down further into several subclusters,
producing a hierarchy level that is lower than the first one.
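
The text does not name a clustering algorithm, so the sketch below assumes k-means (Lloyd's algorithm) on a single attribute, with the initial centers spread evenly over the sorted values. The function `kmeans_1d` and the sample values are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on one attribute; returns (centers, labels)."""
    ordered = sorted(values)
    # Assumption: spread the initial centers evenly across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: move each center to the mean of its members.
        new_centers = [
            mean(v for v, lab in zip(values, labels) if lab == c)
            if c in labels else centers[c]
            for c in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0]
centers, labels = kmeans_1d(values, 3)

print(centers)
print(labels)
```

The cluster label of each value serves as its discrete value. Running the same procedure again inside one cluster would give the lower hierarchy level mentioned above.
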

Data Discretization Using Correlation Analysis - This is a bottom-up approach: the best
neighboring intervals are identified and then merged to form larger intervals, and the
process is repeated until a final set of intervals is obtained. ChiMerge, which uses the
chi-square statistic to decide which neighbors to merge, is a well-known example. It is a
supervised technique.
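
One well-known method of this kind is ChiMerge, which merges the pair of neighboring intervals whose class distributions are most similar according to the chi-square statistic. The sketch below is a simplified version: it assumes a target number of intervals (`max_intervals`) as the stopping rule instead of a chi-square significance threshold, and the sample data are hypothetical.

```python
def chi2(counts_a, counts_b):
    """Pearson chi-square statistic for the class counts of two intervals."""
    total = sum(counts_a) + sum(counts_b)
    stat = 0.0
    for row in (counts_a, counts_b):
        for j in range(len(row)):
            column_total = counts_a[j] + counts_b[j]
            expected = sum(row) * column_total / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, max_intervals):
    """Return the lower bounds of the intervals left after merging."""
    classes = sorted(set(labels))
    # One initial interval per distinct value: [lower_bound, class_counts].
    intervals = []
    for v in sorted(set(values)):
        counts = [sum(1 for x, y in zip(values, labels) if x == v and y == c)
                  for c in classes]
        intervals.append([v, counts])
    while len(intervals) > max_intervals:
        stats = [chi2(intervals[i][1], intervals[i + 1][1])
                 for i in range(len(intervals) - 1)]
        i = stats.index(min(stats))
        # Merge the most similar neighbors: keep the lower bound, add the counts.
        merged = [a + b for a, b in zip(intervals[i][1], intervals[i + 1][1])]
        intervals[i] = [intervals[i][0], merged]
        del intervals[i + 1]
    return [interval[0] for interval in intervals]


values = [1, 2, 3, 10, 11, 12, 20, 21]
labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
print(chimerge(values, labels, 3))
```

A low chi-square value means the two intervals have similar class distributions, so merging them loses little class information.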

Concept Hierarchy Generation for Nominal Data - A nominal attribute is one that has a
limited number of distinct values with no ordering between them. Examples include
employment category, age category, geographic location, and item category. A concept
hierarchy can be formed from a group of nominal attributes. For example, street, city,
state, and country together define a hierarchy for location.

The concept hierarchy transforms the data into several levels of abstraction. It can be
defined at the schema level by specifying a partial or total ordering among the
attributes.
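
A schema-level ordering can be represented as a simple list, as in the sketch below. The attribute names, the records, and the `roll_up` helper are all hypothetical and serve only to show how values are generalized to a higher level.

```python
from collections import Counter

# Schema-level total order, from the most specific to the most general level.
hierarchy = ["street", "city", "state", "country"]

# Hypothetical records.
records = [
    {"street": "12 Main St", "city": "Springfield", "state": "Illinois", "country": "USA"},
    {"street": "9 Oak Ave", "city": "Chicago", "state": "Illinois", "country": "USA"},
    {"street": "4 King St", "city": "Toronto", "state": "Ontario", "country": "Canada"},
]


def roll_up(record, level):
    """Generalize a record by dropping every level more specific than `level`."""
    keep = hierarchy[hierarchy.index(level):]
    return tuple(record[name] for name in keep)


# Count records at the state level and at the country level.
by_state = Counter(roll_up(r, "state") for r in records)
by_country = Counter(roll_up(r, "country") for r in records)

print(by_state)
print(by_country)
```

Moving up the list of levels groups more records together, which is how the hierarchy presents the data at several levels.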


Why is Discretization Important?

Continuous data has, mathematically, an unlimited number of degrees of freedom (DoF),
which makes it difficult to work with. Data scientists apply discretization for several
reasons.

▪ Feature Interpretation - Because a continuous feature has unlimited degrees of
freedom, it is less likely to correlate cleanly with the target variable and may
have a complicated non-linear relationship with it. As a result, such a feature can
be harder to understand. After a variable is discretized, it is possible to see
groups that correspond to the target.

▪ Signal-to-Noise Ratio - When we discretize a variable, we fit its values into bins
and lessen the impact of small fluctuations in the data. Such slight deviations are
often referred to as "noise." Discretization reduces this noise by smoothing out the
variations within each bin, which is why the approach is known as "smoothing."
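
To make the interpretation point concrete, the sketch below groups a continuous feature into bins and computes the target rate per bin. The data, the cut points at 30 and 50, and the names `ages`, `bought`, and `bin_label` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: a continuous feature and a binary target.
ages = [19, 22, 24, 31, 35, 38, 44, 47, 52, 58, 63, 67]
bought = [0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]


def bin_label(age):
    """Assign an age to one of three bins (assumed cut points: 30 and 50)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


# Collect the target values that fall into each bin.
groups = defaultdict(list)
for age, target in zip(ages, bought):
    groups[bin_label(age)].append(target)

# Average target value per bin.
rate_per_bin = {label: mean(targets) for label, targets in groups.items()}
print(rate_per_bin)
```

The per-bin rates summarize a relationship that is hard to read from the twelve raw values, and small differences in age within a bin no longer affect the result.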

Examples of Discretization in Data Science?

The process of transforming continuous attributes into discrete attributes is referred to
as "data discretization" in the field of data mining. This technique may also be used to
create binary attributes from other data types.

Example:
# demonstration of the discretization transform
from numpy.random import randn
from sklearn.preprocessing import KBinsDiscretizer
from matplotlib import pyplot

# generate gaussian data sample
data = randn(1000)

# histogram of the raw data
pyplot.hist(data, bins=25)
pyplot.show()

# reshape data to have rows and columns
data = data.reshape((len(data), 1))

# discretization transform the raw data
kbins = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')
data_trans = kbins.fit_transform(data)

# summarize first few rows
print(data_trans[:10, :])

# histogram of the transformed data
pyplot.hist(data_trans, bins=10)
pyplot.show()
