UNIT III: Introduction to Machine Learning

The Origins of Machine Learning. Uses and Abuses of Machine Learning. How do
Machines Learn? - Abstraction and Knowledge Representation, Generalization.
Assessing the Success of Learning. 4 Steps to Apply Machine Learning to Data.
Choosing a Machine Learning Algorithm - Thinking about the Input Data,
Thinking about Types of Machine Learning Algorithms, Matching Data to an
Appropriate Algorithm.

Machine Learning is the study of algorithms that improve their performance at some task with experience.

• Optimize a performance criterion using example data or past experience.

• Role of Statistics: inference from a sample.

• Role of Computer Science: efficient algorithms to solve the optimization problem and to represent and evaluate the model for inference.

The Origins of Machine Learning

● Roots in Artificial Intelligence (1950s–1980s):

o Inspired by early AI research (Turing’s ideas on intelligent machines, perceptrons by Rosenblatt in 1957).

o Symbolic AI tried to represent knowledge with explicit rules but struggled with complexity.

● Statistical Learning & Algorithms (1980s–1990s):

o Development of decision trees, Bayesian models, and support vector machines.

o Shift from rule-based AI to data-driven approaches.

● Modern ML (2000s–present):

o Explosion of data + computational power (GPUs, cloud computing).

o Neural networks, deep learning, and reinforcement learning powering applications from speech recognition to autonomous systems.

2. Uses and Abuses of Machine Learning

Uses (Benefits)

● Healthcare: Disease prediction, medical imaging, drug discovery.

● Finance: Fraud detection, algorithmic trading, credit risk scoring.


● Transportation: Self-driving cars, traffic prediction.

● Natural Language Processing (NLP): Translation, chatbots, sentiment analysis.

● Personalization: Recommender systems (Netflix, Amazon, Spotify).

Risks / Misuses

● Bias and Discrimination: Models can reflect or amplify biases in training data.

● Privacy Violations: Overuse of personal data without consent.

● Overfitting & Misinterpretation: Models perform well on training data but fail in
real-world settings.

● Over-Reliance on Black-Box Systems: Blind trust in ML without transparency or accountability.

● Unethical Applications: Deepfakes, surveillance misuse, manipulative recommendation systems.

How Do Machines Learn?

Machines learn by identifying patterns and relationships in data through algorithms. One
common approach is supervised learning, where a model learns from labeled data to map
inputs to outputs—for instance, predicting house prices using historical records. In contrast,
unsupervised learning deals with unlabeled data, where the model discovers hidden
structures such as customer groups through clustering. Another important method is
reinforcement learning, where a model interacts with its environment and improves its
performance based on rewards or penalties, as seen in AlphaGo mastering the game of Go.
A powerful subset of machine learning is deep learning, which employs multi-layered
neural networks to capture high-level abstractions, making it effective in tasks like speech
recognition and image classification. Overall, the process of machine learning involves
feeding input data into the system, extracting meaningful features, training a model,
evaluating its performance, and finally using it for prediction or decision-making.
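Example (a minimal Python sketch of supervised learning, assuming scikit-learn is available; the house data below is made up for illustration):

# Supervised learning: fit a model on labeled examples, then predict on a new input
from sklearn.linear_model import LinearRegression

X_train = [[800, 2], [1000, 3], [1500, 3], [2000, 4]]   # features: [area in sq. ft, rooms]
y_train = [100000, 130000, 180000, 240000]              # labels: price (hypothetical values)

model = LinearRegression()
model.fit(X_train, y_train)            # learn the input-to-output mapping
print(model.predict([[1200, 3]]))      # predict the price of an unseen house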

Abstraction and Knowledge Representation

Abstraction: The process of reducing complexity by focusing on the essential features of data
or a problem while ignoring irrelevant details.

Example: Representing an image as “edges, corners, and objects” instead of raw pixels.

Knowledge Representation (KR): How knowledge is formally stored and structured in AI systems so that algorithms can reason with it.

▪ Logical Representation (rules, facts, ontologies)

▪ Semantic Networks/Graphs (nodes & relationships)


▪ Feature Vectors / Embeddings (numerical representation of
knowledge for ML models)

Together, abstraction + KR help AI systems reason, learn, and make decisions effectively.
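Example (a small Python sketch of a feature-vector representation of text, assuming scikit-learn; the sentences are illustrative):

# Bag-of-words: each document becomes a numeric vector of word counts
from sklearn.feature_extraction.text import CountVectorizer

docs = ["machine learning learns from data", "data processing extracts knowledge"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())   # the vocabulary (an abstraction of the raw text)
print(X.toarray())                          # numerical representation usable by ML models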

2. Generalization

The ability of a machine learning model to apply what it learned from training data to new,
unseen data. Good generalization means the model captures underlying patterns rather than
memorizing (overfitting).

Example: A sentiment analysis model trained on movie reviews should correctly classify
unseen product reviews.
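Example (a Python sketch of checking generalization with a held-out test set, assuming scikit-learn and its built-in iris dataset):

# Hold out part of the data; compare performance on seen vs. unseen examples
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print("Training accuracy:", clf.score(X_train, y_train))   # fit on seen data
print("Test accuracy    :", clf.score(X_test, y_test))     # generalization to unseen data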

3. Assessing the Success of Learning

Evaluating how well a machine learning system has learned is a crucial step to ensure its
effectiveness. The process begins with measuring training performance, where the model is
checked to see how well it fits the given training data. However, good training results alone
are not enough, so the next step is to evaluate validation accuracy using unseen validation
data; this helps in fine-tuning hyperparameters and preventing overfitting. After validation,
the model’s testing or deployment performance is assessed on completely independent test
data using metrics such as accuracy, precision, recall, F1-score, or RMSE, depending on the
type of problem. Finally, a generalization check is performed to confirm that the model can
handle real-world data or cross-domain scenarios effectively, rather than only performing
well on the dataset it was trained on. Together, these steps ensure that the learning process
results in a robust and reliable machine learning model.
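Example (a Python sketch of the common classification metrics named above, assuming scikit-learn; the labels are made up):

# Accuracy, precision, recall, and F1-score for a binary classifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]    # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]    # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))

For regression problems, RMSE can be obtained by taking the square root of sklearn.metrics.mean_squared_error.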

4. Steps to Apply Machine Learning to Data

1. Data Collection & Preparation

o Gather raw data, clean it (remove noise, handle missing values), and
transform into usable format.

2. Feature Engineering & Representation

o Select important features, normalize/encode them, and represent data in numerical form.

3. Model Selection & Training

o Choose suitable algorithms (e.g., regression, decision tree, clustering) and train the model on the dataset.

4. Evaluation & Deployment

o Assess model performance using metrics, optimize it, and deploy into real-
world systems for use.
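Example (a compact Python sketch of these four steps on scikit-learn's built-in iris data; the pipeline details are illustrative, not prescriptive):

# 1. Data collection & preparation
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Feature representation (scaling) and 3. Model selection & training
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# 4. Evaluation (deployment would follow, e.g. saving the fitted model for a production system)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))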
Input Data:

Database Data

• Data Warehouse - a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.

• Transactional Data - each record in a transactional database captures a transaction.

• Other Kinds of Data– say, time-related or sequence data, data streams, spatial data,
engineering design data, hypertext and multimedia data, graph and network data, the Web

Read Data from Various Sources

• CSV Files

• Excel Files

• JSON Files

• SQL Databases

• Web APIs

• Web Scraping

• Big Data Sources

• Streaming Data
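Example (a Python sketch of reading from some of these sources with pandas; all file names, table names, and the URL are hypothetical):

import pandas as pd
import sqlite3

df_csv  = pd.read_csv("sales.csv")                 # CSV file
df_xlsx = pd.read_excel("sales.xlsx")              # Excel file (needs an engine such as openpyxl)
df_json = pd.read_json("sales.json")               # JSON file

conn   = sqlite3.connect("sales.db")               # SQL database
df_sql = pd.read_sql_query("SELECT * FROM orders", conn)

df_api = pd.read_json("https://example.com/api/orders")   # a Web API that returns JSON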

Common Types of Data Visualizations


• Bar Charts: Used to compare categories or discrete data.

• Line Charts: Show trends or changes over time.

• Scatter Plots: Display relationships between two variables.

• Histograms: Visualize the distribution of a continuous variable.

• Pie Charts: Show parts of a whole (use with caution due to limited accuracy).

• Heatmaps: Display relationships in a matrix-like format.

• Box Plots: Summarize data distribution and identify outliers.

• Area Charts: Similar to line charts, often used to show cumulative values.
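Example (a Python sketch of three of these chart types with matplotlib; the data values are made up):

import matplotlib.pyplot as plt

categories, counts = ["A", "B", "C"], [10, 24, 17]
months, sales = [1, 2, 3, 4, 5], [100, 120, 90, 140, 160]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(categories, counts)      # bar chart: compare categories
axes[1].plot(months, sales)          # line chart: trend over time
axes[2].scatter(months, sales)       # scatter plot: relationship between two variables
plt.tight_layout()
plt.show()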

INPUT DATA PROCESSING:

Data Processing is defined as the procedure of extracting information from huge sets of data. In other words, data Processing is the process of discovering knowledge from data.

• Data Processing is one of the most useful techniques that help entrepreneurs,
researchers, and individuals to extract valuable information from huge sets of data. The
knowledge discovery process includes Data cleaning, Data integration, Data selection, Data
transformation, Data Processing, Pattern evaluation, and Knowledge presentation.

• Data Processing is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures.

• Data Processing uses complex mathematical algorithms to segment data and evaluate the probability of future events. Data Processing is also called Knowledge Discovery of Data (KDD).

DATA PROCESSING IMPLEMENTATION PROCESS:


Business understanding:

In this phase, business and data-Processing goals are established.

• First, you need to understand the business and client objectives. You need to define what your client wants (which, many times, even they do not know themselves).

• Take stock of the current data Processing scenario. Factor resources, assumptions, constraints, and other significant factors into your assessment.

• Using the business objectives and the current scenario, define your data Processing goals.

• A good data Processing plan is very detailed and should be developed to accomplish both business and data Processing goals.

Data understanding:

In this phase, a sanity check on the data is performed to verify whether it is appropriate for the data Processing goals.

• First, data is collected from multiple data sources available in the organization.

• These data sources may include multiple databases, flat files, or data cubes. Issues like object matching and schema integration can arise during the data integration process. It is a quite complex and tricky process, as data from various sources is unlikely to match easily. For example, table A contains an attribute named cust_no whereas table B contains an attribute named cust-id.

• Therefore, it is quite difficult to ensure that both of these objects refer to the same entity. Metadata should be used here to reduce errors in the data integration process.

• The next step is to explore the properties of the acquired data. A good way to explore the data is to answer the data Processing questions (decided in the business phase) using query, reporting, and visualization tools.

• Based on the results of the queries, the data quality should be ascertained. Missing data, if any, should be acquired.

Data preparation:

• In this phase, data is made production ready.

• The data preparation process consumes about 90% of the time of the project.

• The data from different sources should be selected, cleaned, transformed, formatted,
anonymized, and constructed (if required).

• Data cleaning is a process to “clean” the data by smoothing noisy data and filling in

missing values.

• For example, in a customer demographics profile, the age data may be missing. The data is incomplete and should be filled in. In some cases, there could be data outliers; for instance, age has a value of 300. Data could also be inconsistent; for instance, the name of the customer is spelled differently in different tables.

• Data transformation operations change the data to make it useful for data Processing. The following transformations can be applied:

Data transformation:

Data transformation operations would contribute toward the success of the Processing
process.

Smoothing: It helps to remove noise from the data.

Aggregation: Summary or aggregation operations are applied to the data. For example, weekly sales data is aggregated to calculate the monthly and yearly totals.

Generalization: In this step, Low-level data is replaced by higher-level concepts with the
help of concept hierarchies. For example, the city is replaced by the county.

Normalization: Normalization is performed when the attribute data are scaled up or scaled down. Example: data should fall in the range -2.0 to 2.0 after normalization.
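Example (a Python sketch of aggregation and normalization with pandas; the weekly sales figures are made up):

import pandas as pd

weekly = pd.DataFrame({"month": ["Jan", "Jan", "Feb", "Feb"],
                       "sales": [100, 150, 120, 180]})

# Aggregation: weekly sales rolled up into monthly totals
print(weekly.groupby("month")["sales"].sum())

# Normalization: rescale sales into the range -2.0 to 2.0
s = weekly["sales"]
weekly["sales_scaled"] = (s - s.min()) / (s.max() - s.min()) * 4.0 - 2.0
print(weekly)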

Modeling

In this phase, mathematical models are used to determine data patterns.

• Based on the business objectives, suitable modeling techniques should be selected for
the prepared dataset.

• Create a scenario to test the quality and validity of the model.

• Run the model on the prepared dataset.


• Results should be assessed by all stakeholders to make sure that the model can meet the data Processing objectives.

Evaluation:

In this phase, patterns identified are evaluated against the business objectives.

• Results generated by the data Processing model should be evaluated against the
business objectives.

• Gaining business understanding is an iterative process. In fact, new business requirements may be raised while understanding the data Processing results.

• A go or no-go decision is taken to move the model into the deployment phase.

Deployment:

1. Classification:

2. Clustering:

3. Regression:

4. Association Rules:

5. Outlier detection:

This data Processing technique refers to the observation of data items in the dataset which do not match an expected pattern or expected behavior. The technique can be used in a variety of domains, such as intrusion detection, fraud or fault detection, etc. Outlier detection is also called Outlier Analysis or Outlier Processing; a minimal sketch is given after this list.

6. Sequential Patterns:

7. Prediction:
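Example (the outlier-detection sketch referred to in item 5 above: a Python illustration using the simple interquartile-range rule; the age values are made up):

import numpy as np

ages = np.array([22, 25, 27, 30, 31, 33, 35, 300])      # 300 is an implausible age
q1, q3 = np.percentile(ages, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print("Outliers:", ages[(ages < lower) | (ages > upper)])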

The significant components of data Processing systems are a data source, data Processing
engine, data warehouse server, the pattern evaluation module, graphical user interface, and
knowledge base.

Data Source:

• The actual source of data is the Database, data warehouse, World Wide Web
(WWW), text files, and other documents. You need a huge amount of historical data for data
Processing to be successful. Organizations typically store data in databases or data
warehouses.

• Data warehouses may comprise one or more databases, text files, spreadsheets, or other repositories of data. Sometimes, even plain text files or spreadsheets may contain information. Another primary source of data is the World Wide Web or the internet.

Different processes:

• Before passing the data to the database or data warehouse server, the data must be
cleaned, integrated, and selected. As the information comes from various sources and in
different formats, it can't be used directly for the data Processing procedure because the data
may not be complete and accurate.

• So, the data first needs to be cleaned and unified. More information than needed will be collected from various data sources, and only the data of interest has to be selected and passed to the server.

• These procedures are not as easy as they might seem. Several methods may be performed on the data as part of selection, integration, and cleaning.

Data Processing Engine:

• The data Processing engine is a major component of any data Processing system. It
contains several modules for operating data Processing tasks, including association,
characterization, classification, clustering, prediction, time-series analysis, etc.

• In other words, the data Processing engine is the core of the data Processing architecture. It comprises instruments and software used to obtain insights and knowledge from data collected from various data sources and stored within the data warehouse.

Pattern Evaluation Module:

• The pattern evaluation module is primarily responsible for measuring how interesting a discovered pattern is, using a threshold value. It collaborates with the data Processing engine to focus the search on interesting patterns.

1. Building up an understanding of the application domain

This is the initial preliminary step. It sets the scene for understanding what should be done with the various decisions (transformation, algorithms, representation, etc.), the end user, and the environment in which the knowledge discovery process will occur, and it involves relevant prior knowledge.

2. Choosing and creating a data set on which discovery will be performed

Once the objectives are defined, the data that will be utilized for the knowledge discovery process should be determined. This incorporates discovering what data is accessible, obtaining important data, and afterwards integrating all the data for knowledge discovery into one data set, including the attributes that will be considered for the process. This step is important because data Processing learns and discovers from the accessible data; this is the evidence base for building the models. If some significant attributes are missing, the entire study may be unsuccessful; from this respect, the more attributes are considered, the better. On the other hand, organizing, collecting, and operating advanced data repositories is expensive, so there is a trade-off with the opportunity for best understanding the phenomena. This trade-off is one place where the interactive and iterative nature of KDD shows itself: the process begins with the best available data sets and later expands, observing the impact in terms of knowledge discovery and modeling.

3. Preprocessing and cleansing

In this step, data reliability is improved. It incorporates data cleaning, for example, handling missing values and removing noise or outliers. It might involve complex statistical techniques or the use of a data Processing algorithm in this context. For example, when one suspects that a specific attribute is of insufficient reliability or has many missing values, this attribute could become the target of a supervised data Processing algorithm: a prediction model for the attribute is created, and the missing data can then be predicted. The extent to which one pays attention to this step depends on numerous factors. Regardless, studying these aspects is significant and is regularly revealing in itself with respect to enterprise data frameworks.
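Example (a small Python sketch of this preprocessing step with pandas: filling a missing value and removing an implausible outlier; the column names and values are hypothetical):

import pandas as pd

df = pd.DataFrame({"cust_id": [1, 2, 3, 4],
                   "age": [25, None, 300, 41]})          # one missing value, one outlier

df["age"] = df["age"].fillna(df["age"].median())         # handle the missing value
df = df[df["age"].between(0, 120)]                       # remove the noisy/outlying record
print(df)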

4. Data Transformation

In this stage, appropriate data for data Processing is prepared and developed. Techniques here incorporate dimension reduction (for example, feature selection and extraction, and record sampling) and attribute transformation (for example, discretization of numerical attributes and functional transformations). This step can be essential for the success of the entire KDD project, and it is typically very project-specific. For example, in medical assessments, the ratio of attributes may often be the most significant factor, rather than each attribute by itself. In business, we may need to think about impacts beyond our control as well as efforts and transient issues, for example, studying the effect of advertising accumulation. However, even if we do not use the right transformation at the start, we may obtain a surprising effect that hints at the transformation required in the next iteration. Thus, the KDD process feeds back into itself and prompts an understanding of the transformation required.
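Example (a Python sketch of two such transformations, dimension reduction and discretization, assuming scikit-learn; the data is synthetic):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import KBinsDiscretizer

X = np.random.rand(100, 10)                        # 100 records with 10 numerical attributes

X_reduced = PCA(n_components=3).fit_transform(X)   # dimension reduction: 10 -> 3 features
X_binned = KBinsDiscretizer(n_bins=4, encode="ordinal").fit_transform(X)  # discretize attributes

print(X_reduced.shape, X_binned.shape)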

5. Prediction and description

We are now prepared to decide which kind of data Processing to use, for example, classification, regression, or clustering. Most data Processing techniques depend on inductive learning, where a model is built explicitly or implicitly by generalizing from an adequate number of training examples. The fundamental assumption of the inductive approach is that the trained model applies to future cases. The technique also takes into account the level of meta-learning for the specific set of accessible data.

6. Selecting the Data Processing algorithm

Having chosen the technique, we now decide on the strategy. This stage incorporates selecting a particular method to be used for searching patterns, possibly from among multiple inducers. For example, considering precision versus understandability, the former is better with neural networks, while the latter is better with decision trees. For each strategy of meta-learning, there are several possibilities for how it can be applied. Meta-learning focuses on clarifying what causes a data Processing algorithm to be fruitful or not on a specific problem. Thus, this methodology attempts to understand the situations under which a data Processing algorithm is most suitable. Each algorithm has parameters and tactics of learning, such as ten-fold cross-validation or another division for training and testing.

7. Employing the Data Processing algorithm

At last, the implementation of the algorithm is reached. In this stage, we may need to utilize the algorithm several times until a satisfying outcome is obtained, for example, by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.
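Example (a Python sketch of ten-fold cross-validation and of tuning the minimum-leaf-size parameter of a decision tree, assuming scikit-learn and its iris data):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Ten-fold cross-validation for one candidate algorithm
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("Mean CV accuracy:", scores.mean())

# Tuning a control parameter: the minimum number of instances in a single leaf
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"min_samples_leaf": [1, 3, 5, 10]}, cv=10)
search.fit(X, y)
print("Best parameter:", search.best_params_)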
8. Evaluation

In this step, we assess and interpret the mined patterns and rules, and their reliability, with respect to the objectives defined in the first step. Here we also consider the preprocessing steps in terms of their impact on the data Processing results, for example, adding a feature in step 4 and repeating from there. This step focuses on the comprehensibility and utility of the induced model. The identified knowledge is also recorded for further use. The last step is the use of, and overall feedback on, the discovery results acquired by data Processing.

In summary, the data Processing implementation process has six sequential phases:

1. Business understanding – What does the business need?

2. Data understanding – What data do we have / need? Is it clean?

3. Data preparation – How do we organize the data for modeling?

4. Modeling – What modeling techniques should we apply?

5. Evaluation – Which model best meets the business objectives?

6. Deployment – How do stakeholders access the results?

DATA PROCESSING APPLICATIONS

• Sequence analysis in bioinformatics

• Classification of astronomical objects

• Medical decision support.

• Detect security violations

• Misuse Detection

• Anomaly Detection

• Direct mail targeting

• Stock trading

• Customer segmentation

• Churn prediction (Churn prediction is one of the most popular Big Data use cases in
business)

• Data Processing concepts are used in sales and marketing to provide better customer service, to improve cross-selling opportunities, and to increase direct mail response rates.

• Customer retention, in the form of pattern identification and prediction of likely defections, is possible with data Processing.

• The risk assessment and fraud areas also use data Processing concepts for identifying inappropriate or unusual behaviour, etc.

Education: For analyzing the education sector, data Processing uses the Educational Data Processing (EDM) method.

• Predicting student admission in higher education

• Student profiling

• Predicting student performance

• Evaluating teachers' teaching performance

• Curriculum development

• Predicting student placement opportunities

Data Processing Applications

Here is the list of areas where data Processing is widely used −

• Financial Data Analysis

• Retail Industry

• Telecommunication Industry

• Biological Data Analysis

• Other Scientific Applications

• Intrusion Detection

Financial Data Analysis

The financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data Processing. Some of the typical cases are as follows −

• Design and construction of data warehouses for multidimensional data analysis and
data Processing.

• Loan payment prediction and customer credit policy analysis.

• Classification and clustering of customers for targeted marketing.

• Detection of money laundering and other financial crimes.

Retail Industry

Data Processing has great application in the retail industry because retail collects large amounts of data on sales, customer purchasing history, goods transportation, consumption, and services. It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability, and popularity of the web.
Data Processing in retail industry helps in identifying customer buying patterns and trends
that lead to improved quality of customer service and good customer retention and
satisfaction. Here is the list of examples of data Processing in the retail industry −

• Design and construction of data warehouses based on the benefits of data Processing.

• Multidimensional analysis of sales, customers, products, time and region.

• Analysis of effectiveness of sales campaigns.

• Customer Retention.

• Product recommendation and cross-referencing of items.

Telecommunication Industry

Today the telecommunication industry is one of the fastest-emerging industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is why data Processing has become very important in helping to understand the business.

Data Processing in the telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service. Here is the list of examples for which data Processing improves telecommunication services −

• Multidimensional Analysis of Telecommunication data.

• Fraudulent pattern analysis.

• Identification of unusual patterns.

• Multidimensional association and sequential patterns analysis.

• Mobile Telecommunication services.

• Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics, and biomedical research. Biological data Processing is a very important part of bioinformatics. Following are the aspects in which data Processing contributes to biological data analysis −

• Semantic integration of heterogeneous, distributed genomic and proteomic databases.

• Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide sequences.

• Discovery of structural patterns and analysis of genetic networks and protein pathways.

• Association and path analysis.

• Visualization tools in genetic data analysis.

Other Scientific Applications

The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical techniques are appropriate. Huge amounts of data have been collected from scientific domains such as geosciences, astronomy, etc. Large data sets are also being generated by fast numerical simulations in fields such as climate and ecosystem modeling, chemical engineering, fluid dynamics, etc. Following are the applications of data Processing in the field of scientific applications −

• Data Warehouses and data preprocessing.

• Graph-based Processing.

• Visualization and domain-specific knowledge.

Intrusion Detection

Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In this world of connectivity, security has become a major issue. The increased usage of the internet and the availability of tools and tricks for intruding and attacking networks have prompted intrusion detection to become a critical component of network administration. Here is the list of areas in which data Processing technology may be applied for intrusion detection −

• Development of data Processing algorithm for intrusion detection.

• Association and correlation analysis, and aggregation to help select and build discriminating attributes.

• Analysis of Stream data.

• Distributed data Processing.

• Visualization and query tools.

CHALLENGES OF DATA HANDLING

Nowadays, knowledge discovery is evolving into a crucial technology for business and for researchers in many domains. Although data Processing is developing into an established and trusted discipline, many still-pending challenges have to be solved. Some of these challenges are given below.

Security and Social Challenges:

Decision-making strategies are carried out through data collection and sharing, so they require considerable security. Private information about individuals and other sensitive information is collected for customer profiles and for understanding user behaviour patterns. Illegal access to information and the confidential nature of information are becoming important issues.

User Interface: The knowledge discovered using data Processing tools is useful only if it is interesting and, above all, understandable by the user. Good visualization eases the interpretation of data Processing results and helps users better understand their requirements. To obtain good visualization, much research is carried out on big data sets so that mined knowledge can be displayed and manipulated.

(i) Processing based on Level of Abstraction: The data Processing process needs to be interactive, because this allows users to concentrate on pattern finding and on presenting and refining data Processing requests based on the returned results.

(ii) Integration of Background Knowledge: Prior information may be used to direct the exploration process and to express the discovered patterns.

(iii) Processing Methodology Challenges: These challenges are related to data Processing approaches and their limitations. The aspects of Processing approaches that cause problems are:

(i) Versatility of the Processing approaches,

(ii) Diversity of data available,

(iii) Dimensionality of the domain,

(iv) Control and handling of noise in data, etc.

Different approaches may perform differently depending on the data under consideration. Some algorithms require noise-free data, yet most data sets contain exceptions or invalid or incomplete information, which complicates the analysis process and in some cases compromises the precision of the results.

Complex Data: Real-world data is heterogeneous, and it could be multimedia data containing images, audio, and video, complex data, temporal data, spatial data, time series, natural language text, etc. It is difficult to handle these various kinds of data and extract the required information.

New tools and methodologies are developing to extract relevant information.

(i) Complex data types: The database can include complex data elements, objects with graphical data, spatial data, and temporal data. It is not practical to process all these kinds of data on one device.
(ii) Processing from Varied Sources: The data is gathered from different sources on a network. The data sources may be of different kinds depending on how the data is stored, such as structured, semi-structured, or unstructured.

Performance: The performance of the data Processing system depends on the efficiency of the algorithms and techniques used. Algorithms and techniques that are not designed well will adversely affect the performance of the data Processing process.

(i) Efficiency and Scalability of the Algorithms: The data Processing algorithm must be
efficient and scalable to extract information from huge amounts of data in the database.

(ii) Improvement of Processing Algorithms: Factors such as the enormous size of the
database, the entire data flow and the difficulty of data Processing approaches inspire the
creation of parallel & distributed data Processing algorithms.

Matching the correct algorithm with the data:

Choosing an appropriate algorithm depends mainly on two factors:

1. Nature of the data (size, dimensionality, linear/non-linear patterns, labeled/unlabeled).

2. Learning objective (predicting labels, estimating continuous values, or discovering hidden structure).

Let’s examine with examples across three problem types:

1. Classification (Learning objective: predict discrete labels)

Data characteristics: categorical/label-based outputs. Example: predicting whether an email is spam or not spam.

Algorithm choice: If data is linearly separable → Logistic Regression or Linear SVM works
well. If data has non-linear boundaries → Decision Trees or Random Forests perform better.
If dataset is very large and complex → Neural Networks (Deep Learning) may be chosen.
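Example (a Python sketch comparing a linear and a non-linear classifier on data with a non-linear boundary, assuming scikit-learn; the dataset is synthetic):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)    # non-linear class boundary
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

print("Logistic Regression:", LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te))
print("Random Forest      :", RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te))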

2. Regression (Learning objective: predict continuous values)

Data characteristics: numerical outputs, which may have linear or non-linear relationships. Example: predicting house prices based on features (size, location, number of rooms).

Algorithm choice: If relationship is linear → Linear Regression suffices. If data shows non-
linear trends → Polynomial Regression or Support Vector Regression (SVR). If high-
dimensional and complex → Ensemble methods (Gradient Boosting, Random Forest
Regression).
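Example (a Python sketch contrasting linear regression with a non-linear ensemble on a synthetic house-size feature, assuming scikit-learn; the data-generating formula is invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(500, 3000, size=(200, 1))                    # house size in sq. ft (synthetic)
y = 50 * X.ravel() + 20000 * np.sin(X.ravel() / 300) + rng.normal(0, 5000, 200)

print("Linear Regression R^2:", LinearRegression().fit(X, y).score(X, y))
print("Gradient Boosting R^2:", GradientBoostingRegressor(random_state=0).fit(X, y).score(X, y))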

3. Clustering (Learning objective: discover hidden structure, no labels)

Data characteristics: unlabeled datasets. Example: Customer segmentation in marketing.

Algorithm choice: If clusters are spherical and similar size → K-Means is effective. If clusters
are arbitrary shapes → DBSCAN is better. If data is hierarchical in nature → Hierarchical
Clustering gives better insight.
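Example (a Python sketch of K-Means on spherical clusters and DBSCAN on arbitrarily shaped clusters, assuming scikit-learn; both datasets are synthetic):

from sklearn.datasets import make_blobs, make_moons
from sklearn.cluster import KMeans, DBSCAN

X_blobs, _ = make_blobs(n_samples=300, centers=3, random_state=0)    # spherical clusters
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=0)   # arbitrary shapes

print("K-Means labels:", set(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_blobs)))
print("DBSCAN labels :", set(DBSCAN(eps=0.3).fit_predict(X_moons)))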
For the remaining topics, refer to the handwritten material.

Explain the algorithms wherever the applications require it.
