Machine Learning Practical File
CSE(AI&ML)
Roll. No.:
Group/Branch:
LIST OF EXPERIMENTS
S No Program Name Dates Signature
PROGRAM 1
PML1: Text Pre-processing using NLTK, which focuses on preparing and cleaning text data.
1. Why is text pre-processing important in NLP?
Answer: Text pre-processing is crucial in NLP because it transforms raw text into a clean, structured
format that models can understand. This step reduces noise, improves data quality, and enhances the
efficiency and accuracy of machine learning algorithms.
Explanation: Pre-processing techniques like tokenization, stop word removal, and
stemming/lemmatization help standardize text and reduce the size of the feature space.
2. What is tokenization, and why is it necessary?
Answer: Tokenization is the process of breaking text down into smaller pieces, such as words, phrases,
or sentences. It is necessary because most NLP algorithms require input as individual tokens in order
to analyze the text and extract meaningful information.
Explanation: For example, in the sentence "Machine learning is powerful," tokenization would split it
into ["Machine", "learning", "is", "powerful"] for further processing.
4. What is the role of the Natural Language Toolkit (NLTK) in text pre-processing?
Answer: NLTK is a Python library that provides tools for text processing and analysis, including
tokenization, stop word removal, stemming, lemmatization, and more. It simplifies the process of
preparing text data for machine learning models.
Explanation: NLTK comes with pre-built functions and corpora that can be leveraged to create efficient
text processing pipelines without having to write extensive code from scratch.
5. What is the difference between stemming and lemmatization, and when would you use each?
Answer: Stemming reduces words to their base or root form by stripping affixes, and the result may not
always be a valid word (e.g., "studies" becomes "studi"). Lemmatization reduces words to their canonical
dictionary form (lemma), ensuring the result is a meaningful word (e.g., "running" becomes "run,"
"better" becomes "good").
Explanation: Stemming is faster and less computationally intensive but may lead to less accurate results.
Lemmatization is more accurate as it considers the context of the word but is computationally more
expensive. You would use stemming when you need quick results and accuracy is not the priority, while
lemmatization is used when precision is essential.
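A minimal Python sketch of these pre-processing steps with NLTK; it assumes the punkt, stopwords, and wordnet resources have already been downloaded, and the sample sentence is only illustrative.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads (uncomment on first run):
# nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

text = "Machine learning is powerful and running models is fun."

tokens = word_tokenize(text.lower())                         # tokenization
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in filtered])                   # stemming
print([lemmatizer.lemmatize(t, pos="v") for t in filtered])  # lemmatization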
PROGRAM 2
PML2: Bias and Variance, examining the sources of model error and the bias-variance trade-off.
1. What is bias in machine learning, and how does it affect model performance?
Answer: In machine learning, bias is the error introduced by approximating a real-world problem,
which may be complex, with a simpler model. High bias leads to underfitting, where the model fails to
capture the underlying patterns in the data.
Explanation: Models with high bias make strong assumptions about the data and are unable to learn
adequately from it. Examples include linear models applied to non-linear data.
2. What is variance in machine learning, and how does it impact model performance?
Answer: Variance refers to the model's sensitivity to small changes in the training data. High variance
means that the model fits the training data too closely (overfitting) and may not generalize well to unseen
data.
Explanation: A model with high variance captures noise in the training data, leading to poor
performance on test data. This happens when the model is too complex and fits every detail of the
training set.
3. What is the bias-variance trade-off in machine learning?
Answer: The bias-variance trade-off is the balance between the two sources of error that affect model
performance: bias and variance. Low bias usually comes with high variance (overfitting), and low
variance often comes with high bias (underfitting). The goal is to find the balance point where the
combined error is as small as possible.
Explanation: An ideal model achieves a balance that minimizes overall error on new data. Too simple
models underfit, while overly complex models overfit.
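As an illustration, the sketch below fits polynomial models of increasing degree to noisy synthetic data: the degree-1 model underfits (high bias) and the degree-15 model overfits (high variance). The dataset and degrees are arbitrary choices for demonstration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 80))[:, None]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):                       # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    test_err = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")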
PROGRAM 3
PML3: Dataset Splitting and Cross-Validation, covering training, validation, and test sets and hyperparameter tuning.
1. What is the difference between training, validation, and test sets in machine learning?
Solution:
Training Set: This is the dataset used to train the machine learning model. The model learns patterns and
relationships in the data during this phase.
Validation Set: This set is used to tune the hyperparameters of the model and evaluate its performance
during training. It helps to avoid overfitting by providing an unbiased evaluation of the model during
training.
Test Set: This dataset is used to assess the model's final performance after training. It provides an
unbiased estimate of the model's generalization ability on unseen data.
2. Why is it important to separate the data into training, validation, and test sets?
Solution:
Separating the data helps to ensure that the model is able to generalize to unseen data. If the same data is
used for training and testing, the model may memorize the data (overfitting) and fail to perform well on
new, unseen data. The validation set is crucial for fine-tuning the model and ensuring it does not overfit,
while the test set provides a final, unbiased evaluation of model performance.
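A short sketch of such a split (roughly 60/20/20), using the Iris dataset from Scikit-learn as an assumed stand-in for the lab data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% as the final test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
# Carve a validation set out of the remainder (25% of 80% = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval)
print(len(X_train), len(X_val), len(X_test))    # 90 / 30 / 30 samples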
3. What is cross-validation, and why is it used?
Solution:
Cross-validation is a technique where the dataset is split into several subsets (folds). The model is trained
on some folds and validated on the remaining fold. This process is repeated for each fold, and the
performance metrics are averaged to give a more reliable estimate of model performance. Cross-
validation helps reduce variance in the model evaluation and prevents overfitting, especially when the
dataset is small.
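A minimal 5-fold cross-validation sketch, again using the Iris dataset as an assumed example.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# 5-fold cross-validation: train on 4 folds, validate on the 5th, repeat.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())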
4. What is overfitting, and how can you detect it using validation and test sets?
Solution:
Overfitting occurs when a model learns the noise or random fluctuations in the training data rather than
the underlying patterns, leading to poor performance on unseen data. To detect overfitting:
Validation Set: If the model performs well on the training set but poorly on the validation set, it may be
overfitting.
Test Set: Similarly, if the model shows good performance on the training set but poor performance on
the test set, it suggests that the model is not generalizing well to new data.
5. What is hyperparameter tuning, and how is it performed using the validation set?
Solution:
Hyperparameter tuning involves adjusting the model's hyperparameters (e.g., learning rate, regularization
strength) to improve its performance. This can be done using the validation set:
Grid Search: Test a range of hyperparameter values to find the combination that gives the best
performance on the validation set.
Random Search: Randomly sample hyperparameters and evaluate the model on the validation set.
Bayesian Optimization: Use probabilistic models to guide the search for optimal hyperparameters.
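A sketch of grid search using Scikit-learn's GridSearchCV, which uses internal cross-validation in place of a fixed validation set; the SVC model and the parameter grid here are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}   # illustrative grid
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # CV replaces a fixed validation set
search.fit(X_train, y_train)
print("best hyperparameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))       # final unbiased estimate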
PROGRAM 4
PML4(A): Classification Loss, detailing loss functions used in classification tasks.
PROGRAM 4
PML4(B): Regression Loss, describing loss functions relevant to regression tasks.
2. What is the difference between binary cross-entropy and categorical cross-entropy loss?
Solution:
Binary cross-entropy is used for binary classification tasks, while categorical cross-entropy is used for
multi-class classification.
Binary cross-entropy compares predicted probabilities for a single class (0 or 1), while categorical cross-
entropy compares predicted probabilities across multiple classes.
Explanation:
Binary cross-entropy handles two classes with a single predicted probability per sample, while categorical
cross-entropy compares the full predicted probability distribution with the true class for each sample and
averages the loss over all samples.
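A small sketch computing both losses with sklearn.metrics.log_loss on made-up labels and predicted probabilities.

from sklearn.metrics import log_loss

# Binary cross-entropy: one positive-class probability per sample.
y_true_bin = [0, 1, 1, 0]
y_prob_bin = [0.1, 0.8, 0.7, 0.3]
print("binary cross-entropy:", log_loss(y_true_bin, y_prob_bin))

# Categorical cross-entropy: a full probability distribution per sample.
y_true_multi = [0, 2, 1]
y_prob_multi = [[0.7, 0.2, 0.1],
                [0.1, 0.2, 0.7],
                [0.2, 0.6, 0.2]]
print("categorical cross-entropy:", log_loss(y_true_multi, y_prob_multi, labels=[0, 1, 2]))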
5. How can class imbalance affect classification loss, and how can it be addressed?
Solution:
Class imbalance can cause the model to favor the majority class, leading to biased predictions. It can be
addressed using weighted loss, resampling, or focal loss.
Explanation:
Weighted loss assigns higher penalties to misclassifications of the minority class, helping balance the
model's performance across all classes.
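A sketch of the weighted-loss approach using class_weight="balanced" in Scikit-learn on an artificially imbalanced synthetic dataset (95%/5% split chosen for illustration).

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic 95%/5% imbalanced binary dataset.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" gives minority-class errors a larger penalty in the loss.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))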
REGRESSION LOSS:
2. What is mean squared error (MSE) loss and why is it commonly used in regression?
Solution:
MSE is the average of the squared differences between predicted and actual values. It is commonly used
because it penalizes large errors more heavily.
3. What is Huber loss, and why is it useful?
Solution:
Huber loss combines MSE and MAE. It uses MSE for small errors and switches to MAE for large errors,
making it less sensitive to outliers than MSE.
Explanation:
By minimizing regression loss (e.g., using gradient descent), the model adjusts its parameters to reduce
the error between predicted and actual values, improving prediction accuracy over time.
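A short NumPy sketch comparing MSE, MAE, and Huber loss on toy values; the predictions and the Huber threshold delta = 1.0 are arbitrary choices for demonstration.

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 12.0])       # the last prediction is an outlier
err = y_true - y_pred
delta = 1.0                                    # Huber threshold (arbitrary)

mse = np.mean(err ** 2)
mae = np.mean(np.abs(err))
huber = np.mean(np.where(np.abs(err) <= delta,
                         0.5 * err ** 2,                        # quadratic for small errors
                         delta * (np.abs(err) - 0.5 * delta)))  # linear for large errors
print(f"MSE={mse:.3f}  MAE={mae:.3f}  Huber={huber:.3f}")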
PROGRAM 5
PML5: K-Nearest Neighbors (KNN), an introduction to the KNN algorithm for classification
and regression.
PROGRAM 6
3. Which distance metric is commonly used in KNN?
Explanation:
Euclidean distance is straightforward and works well for many problems. It measures the straight-line distance
between two points in a multi-dimensional space. Other distance metrics like Manhattan or Minkowski can also
be used depending on the problem.
4.What are the main advantages and disadvantages of the KNN algorithm?
Solution:
Advantages:
o Simple to implement and understand.
o No training phase (instance-based learning).
o Effective for non-linear decision boundaries.
Disadvantages:
o Slow prediction phase for large datasets since it requires computing distances to all points.
o Sensitive to the choice of k and to irrelevant features.
o Struggles with high-dimensional data (curse of dimensionality).
Explanation:
KNN is intuitive and does not require a training phase, but its performance can degrade with large datasets and
high-dimensional spaces due to the need to compute distances for each test point.
5. How does KNN handle ties in classification (when two or more classes have the same number of nearest
neighbors)?
Solution:
KNN typically handles ties by:
Selecting the class of the nearest neighbor (if there's a tie in the number of neighbors).
Choosing a class based on distance-weighted voting, where closer neighbors have more influence.
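A minimal KNN classification sketch on the Iris dataset (an assumed example) with k = 5 and distance-weighted voting; both settings are illustrative.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# k = 5 with distance-weighted voting (closer neighbours count for more).
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))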
PROGRAM 7
PML7: SVM Binary Classification, covering the basics of Support Vector Machine for binary
classification.
1. What is a Support Vector Machine (SVM) for binary classification?
Solution:
SVM for binary classification is a supervised machine learning algorithm that finds the optimal hyperplane
that separates two classes in a high-dimensional feature space, maximizing the margin between them.
Explanation:
SVM works by finding the decision boundary (hyperplane) that best separates the data into two classes while
maximizing the margin (distance between the hyperplane and the nearest points of each class, called support
vectors). This results in a classifier that generalizes well to unseen data.
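A minimal sketch of a linear SVM for binary classification; the breast cancer dataset and C = 1.0 are assumed examples.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)      # two classes: malignant / benign
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling followed by a maximum-margin linear SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))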
2. How does SVM handle data that is not linearly separable?
Solution:
SVM handles non-linearly separable data by using the kernel trick. The kernel function maps the input data
into a higher-dimensional space where a linear hyperplane can separate the classes.
Explanation:
For data that isn't linearly separable, SVM uses kernels (e.g., polynomial, radial basis function) to transform
the data into a higher-dimensional space where it becomes easier to find a linear separating hyperplane.
3. What is the difference between binary and multiclass classification?
Solution:
Binary Classification: Involves classifying data into two distinct classes (e.g., class 0 vs. class 1).
Multiclass Classification: Involves classifying data into more than two classes (e.g., class 0, class 1, and
class 2).
Explanation:
Binary classification deals with two classes, and models are optimized to distinguish between them. In
multiclass classification, the model must distinguish between more than two classes, requiring different
strategies such as one-vs-rest (OvR) or one-vs-one (OvO).
4. How does SVM handle multiclass classification?
Solution:
SVM handles multiclass classification using strategies like One-vs-Rest (OvR) or One-vs-One (OvO),
where multiple binary classifiers are trained to handle different class combinations.
Explanation:
One-vs-Rest (OvR): A binary classifier is trained for each class, distinguishing that class from all others.
One-vs-One (OvO): A binary classifier is trained for every pair of classes, and the class with the most
votes from these classifiers is chosen.
5. What are the key differences in the approach of binary classification and multiclass classification with
SVM?
Solution:
In binary classification, SVM aims to find a single hyperplane that separates two classes.
In multiclass classification, SVM requires combining multiple binary classifiers using strategies like One-vs-
Rest (OvR) or One-vs-One (OvO).
Explanation:
For binary classification, a single optimal hyperplane is sufficient. In multiclass classification, since there are
more than two classes, additional classifiers or methods are needed to handle multiple classes. The SVM
framework needs adaptations to manage the increased complexity of multiclass problems.
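A short sketch comparing the two strategies on the three-class Iris dataset (an assumed example); SVC trains one-vs-one classifiers internally, while OneVsRestClassifier makes the one-vs-rest scheme explicit.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)               # three classes

ovo = SVC(kernel="rbf")                         # SVC trains one-vs-one pairs internally
ovr = OneVsRestClassifier(SVC(kernel="rbf"))    # explicit one-vs-rest wrapper

print("OvO mean accuracy:", cross_val_score(ovo, X, y, cv=5).mean())
print("OvR mean accuracy:", cross_val_score(ovr, X, y, cv=5).mean())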
PROGRAM 8
PML8: Naive Bayes Classifier, exploring the Naive Bayes algorithm for classification tasks
and showing Confusion Matrix parameters
6. What does precision measure in a Confusion Matrix?
Explanation:
It measures the proportion of predicted positives that are actually correct, indicating the accuracy of positive
predictions.
7. What is the difference between precision and recall in terms of a Confusion Matrix?
Solution:
Precision measures the accuracy of positive predictions, while recall measures the ability to correctly identify all
actual positive instances.
Explanation:
Precision focuses on the correctness of the positive predictions, and recall focuses on capturing all actual positive
cases, even at the expense of false positives.
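A minimal Naive Bayes sketch that also prints the Confusion Matrix, precision, and recall; the breast cancer dataset is an assumed example.

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

y_pred = GaussianNB().fit(X_tr, y_tr).predict(X_te)
print(confusion_matrix(y_te, y_pred))           # rows: actual, columns: predicted
print("precision:", precision_score(y_te, y_pred))
print("recall:", recall_score(y_te, y_pred))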
PROGRAM 9
PML9: Principal Component Analysis (PCA), demonstrating dimensionality reduction and data visualization with Scikit-learn.
1. What is Principal Component Analysis (PCA), and why is it used?
Solution:
PCA is a linear technique that transforms high-dimensional data into a lower-dimensional form by finding the
principal components that capture the most variance in the data.
Explanation:
By reducing the number of dimensions, PCA helps simplify the dataset, improving computational efficiency and
mitigating the curse of dimensionality while preserving essential patterns in the data.
2. How does PCA reduce the dimensionality of data in Scikit-learn?
Solution:
PCA in Scikit-learn reduces dimensionality by projecting the data onto a set of orthogonal axes (principal
components) that maximize variance, using the PCA class.
Explanation:
The principal components are ordered by the amount of variance they capture, and by selecting a subset of these
components, PCA reduces the number of features while retaining most of the original data's variability.
3. How do you apply PCA to a dataset using Scikit-learn?
Solution:
To apply PCA in Scikit-learn, you use the PCA class, fit the model to the data using fit() and transform it with
transform() or fit_transform() for dimensionality reduction.
Explanation:
The fit() method computes the principal components, and transform() reduces the data to the selected number of
components, which you can specify by setting the n_components parameter.
4. What parameter in Scikit-learn's PCA class controls the number of principal components?
Solution:
The n_components parameter in the PCA class controls the number of principal components to keep after
dimensionality reduction.
Explanation:
Setting n_components to an integer smaller than the original number of features reduces the data's dimensionality, while
setting it to a float between 0 and 1 keeps enough components to preserve that fraction of the total variance.
5. How can PCA be used for data visualization?
Solution:
PCA can project high-dimensional data into 2 or 3 dimensions, allowing easier visualization of complex datasets
while retaining most of the original variance.
Explanation:
By reducing the dimensionality to 2 or 3 principal components, PCA helps visualize the data in scatter plots,
revealing patterns, clusters, or relationships that are not visible in higher-dimensional spaces.
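A short PCA sketch that reduces the four-dimensional Iris data (an assumed example) to two components and plots them; matplotlib is assumed to be available.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)    # standardize features before PCA

pca = PCA(n_components=2)                       # keep the top 2 principal components
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()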
PROGRAM 10
PML10: Decision Trees (DT), explaining how decision tree algorithms function for
classification and regression.
1. How does a Decision Tree build a model for classification?
Explanation:
In classification, the tree is built by recursively splitting the dataset using the feature that best separates the
classes, typically based on metrics like Gini impurity or Information Gain.
2. How does a Decision Tree handle continuous features?
Explanation:
Continuous features are divided into intervals, and the tree determines the best split by minimizing a criterion such
as Gini impurity or entropy, which helps distinguish the classes at each node.
3. What criteria are used to decide the splits in a Decision Tree?
Explanation:
The criteria measure how well a split divides the data, with the goal of maximizing the homogeneity of the target
variable within each branch after the split.
4. How does a Decision Tree perform regression?
Explanation:
Unlike classification, where the most frequent class is assigned to a leaf, in regression, the Decision Tree predicts
a continuous value by averaging the target variable within each subset (leaf).
5. What is overfitting in Decision Trees, and how can it be prevented?
Explanation:
Overfitting happens when a tree perfectly fits the training data, including noise, which reduces its ability to
generalize to new data. Regularization techniques like pruning or limiting depth help prevent this.
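A minimal sketch of Decision Trees for classification and regression, with max_depth used as a simple guard against overfitting; the Iris and Diabetes datasets and the depth values are assumed examples.

from sklearn.datasets import load_diabetes, load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: splits chosen by Gini impurity, depth limited to reduce overfitting.
X_c, y_c = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
print("classification accuracy:", cross_val_score(clf, X_c, y_c, cv=5).mean())

# Regression: each leaf predicts the mean target value of its samples.
X_r, y_r = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
print("regression R^2:", cross_val_score(reg, X_r, y_r, cv=5).mean())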
PEO1 – ANALYTICAL SKILLS
To facilitate the graduates with the ability to visualize, gather information, articulate, analyze, solve complex problems, and make decisions. These are essential to address the challenges of complex and computation-intensive problems, increasing their productivity.
PEO2 – TECHNICAL SKILLS
To facilitate the graduates with the technical skills that prepare them for immediate employment and to pursue certification, providing a deeper understanding of the technology in advanced areas of computer science and related fields, thus encouraging them to pursue higher education and research based on their interest.
PEO3 – SOFT SKILLS
To facilitate the graduates with the soft skills that include fulfilling the mission, setting goals, showing self-confidence by communicating effectively, having a positive attitude, getting involved in teamwork, being a leader, and managing their career and their life.
PEO4 – PROFESSIONAL ETHICS
To facilitate the graduates with the knowledge of professional and ethical responsibilities by paying attention to grooming, being conservative with style, following dress codes and safety codes, and adapting themselves to technological advancements.