
Introduction to Machine Learning

Test 1 Machine Learning

Ayoub Asri

2023-09-30

Contents
Rules
Questions
  1. KNN for regression
  2. KNN for classification
  3. KNN
  4. SVM classification plot
  5. SVM for classification

Rules
1. The student must answer all the questions and provide one file (or more than one file) containing code,
results and comments. The comments must be about the results, not the code itself.
2. Only tidymodels is accepted for the modeling stage. Any other tool is allowed in the other steps.
3. The deadline is 10 days from the moment this document was sent, which makes the deadline Thursday,
November 23rd, 2023 at 23:59:59.
4. Any new idea for commenting on the results, for pre-processing, or for improving the results or the
performance of the model will be highly rewarded.
5. Any cheating will have consequences on the final mark.

Questions
1. KNN for regression
The goal of this exercise is to fit a KNN model for a regression problem. We want to determine the price of a
car based on its characteristics.
Q1.
Split the dataset, keeping 90% for the training set.
Create a nearest-neighbor model, tuning the "neighbors" hyperparameter and using the information
contained in all the variables (if possible).
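The tidymodels workflow for this kind of task can be sketched as follows. This is only a sketch, not a full answer: the data frame name `cars_df`, the target column `price`, and the grid bounds are placeholders to be adapted to the actual dataset.

```r
library(tidymodels)

# Placeholder: replace cars_df / price with the real dataset and target
set.seed(123)
split <- initial_split(cars_df, prop = 0.9)
train <- training(split)

# KNN regression spec with the "neighbors" hyperparameter marked for tuning
knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("regression")

# Recipe using all variables: encode factors, then normalize (KNN is
# distance-based, so scaling matters)
knn_rec <- recipe(price ~ ., data = train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

knn_wf <- workflow() %>%
  add_recipe(knn_rec) %>%
  add_model(knn_spec)

folds <- vfold_cv(train, v = 5)
knn_res <- tune_grid(knn_wf, resamples = folds,
                     grid = tibble(neighbors = 1:30))
show_best(knn_res, metric = "rmse")
```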

2. KNN for classification
We will try to fit the attrition variable using a KNN model. Load the dataset "hrt.csv".
Q1. Inspect the dataset and split it (80/20 split).
Q2. Create a KNN model to estimate the attrition variable. The student must tune the hyperparameter
"neighbors". The use of a recipe is mandatory, but the choice of recipe can vary.
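A possible starting point for the split and the recipe is sketched below. It assumes the target column in "hrt.csv" is called `attrition`; the exact column names must be checked against the file.

```r
library(tidymodels)
library(readr)

# Assumption: the target column is named "attrition"
hrt <- read_csv("hrt.csv") %>%
  mutate(attrition = as.factor(attrition))

set.seed(123)
split <- initial_split(hrt, prop = 0.8, strata = attrition)
train <- training(split)
test  <- testing(split)

# One reasonable recipe: dummy-encode factors, normalize numeric predictors
rec <- recipe(attrition ~ ., data = train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")
```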

3. KNN
In this last part on KNN, we will study the effect of changing the value of "neighbors" on the resulting
estimate in a KNN regression problem.
To illustrate this aspect, we will use the data set from the file "sacramento.csv".
Q1. Plot the scatterplot of the price (target variable) vs sqft (the feature variable). Is there any relationship?
Q2. Split the data (3/4 for training).
Q3. Tune the "neighbors" hyperparameter of the KNN model using a grid of all values between 1 and 200.
Q4. Plot the values of the performance measure for the different values of the hyperparameter.
Q5. Create a function that takes an input value of "neighbors", estimates a model, and returns
(outputs) a plot containing the scatterplot of the y vs x variable plus a line plot of the estimated y (ŷ) vs x.
Use this function to try different values of "neighbors". What does this plot illustrate when we use higher
values of "neighbors"? Smaller values?
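The function asked for in Q5 can be sketched as below. It assumes a `sacramento` data frame with `price` and `sqft` columns is already loaded; column names should be adapted to the actual file.

```r
library(tidymodels)

# Assumption: `sacramento` has numeric columns `price` and `sqft`
plot_knn_fit <- function(k, data = sacramento) {
  # Fit a KNN regression with a fixed number of neighbors
  knn_fit <- nearest_neighbor(neighbors = k) %>%
    set_engine("kknn") %>%
    set_mode("regression") %>%
    fit(price ~ sqft, data = data)

  # Predict on the data sorted by sqft so the line plot is well ordered
  ordered <- arrange(data, sqft)
  preds <- bind_cols(ordered, predict(knn_fit, new_data = ordered))

  ggplot(preds, aes(sqft, price)) +
    geom_point(alpha = 0.4) +
    geom_line(aes(y = .pred), colour = "blue")
}

plot_knn_fit(1)    # small k: wiggly fit (low bias, high variance)
plot_knn_fit(200)  # large k: nearly flat fit (high bias, low variance)
```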

4. SVM classification plot


Run this code to create this fictional data set.
library(tidyverse)
library(tidymodels)

x <- rbind(matrix(rnorm(200), 100, 2), matrix(rnorm(200, mean = 3), 100, 2))
y <- matrix(c(rep(1, 100), rep(-1, 100)))

df <- x %>%
  bind_cols(y)

names(df) <- c("x1", "x2", "y")

df <- df %>%
  mutate(y = as.factor(y))

Q1. Plot the scatterplot of x2 vs x1 and colour the points by the target variable (yellow for -1 and
green for +1).
Q2. Create an SVM model with an RBF kernel, tune its parameters, and finally draw a classification plot
using the final model.
Q3. Interpret each colour and shape in the resulting classification plot.
Q4. State the performance measures of this model (on the training set!).
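Using the `df` created above, the tuning and a decision-region plot could be sketched like this. The grid resolution and the choice of accuracy as the selection metric are illustrative choices, not requirements.

```r
library(tidymodels)

# RBF-kernel SVM with both hyperparameters marked for tuning
svm_spec <- svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab") %>%
  set_mode("classification")

svm_wf <- workflow() %>%
  add_formula(y ~ x1 + x2) %>%
  add_model(svm_spec)

set.seed(123)
folds   <- vfold_cv(df, v = 5)
svm_res <- tune_grid(svm_wf, resamples = folds, grid = 10)

best  <- select_best(svm_res, metric = "accuracy")
final <- finalize_workflow(svm_wf, best) %>% fit(df)

# One way to draw a classification plot: predict over a dense grid
# of (x1, x2) values and shade by predicted class
grid <- crossing(x1 = seq(min(df$x1), max(df$x1), length.out = 100),
                 x2 = seq(min(df$x2), max(df$x2), length.out = 100))
grid <- bind_cols(grid, predict(final, new_data = grid))

ggplot() +
  geom_tile(data = grid, aes(x1, x2, fill = .pred_class), alpha = 0.3) +
  geom_point(data = df, aes(x1, x2, colour = y))
```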

5. SVM for classification


Use the same data set "hrt.csv" from exercise 2.
Q1. Split the data (80/20 split).
Q2. Define an appropriate recipe.
Q3. Fit 3 SVM models with different kernels (use arbitrary hyperparameter values for each model).
Which is the best model?
Q4. Use an SVM model with an RBF kernel and tune its hyperparameters. Present the predictions and the
performance measures for the best model.
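For Q3, one way to compare the three kernels is sketched below. It assumes the split and recipe from Q1–Q2 are stored as `train`, `test` and `rec`, the target is `attrition`, and the fixed hyperparameter values are arbitrary placeholders.

```r
library(tidymodels)

# Three SVM specs with different kernels and arbitrary fixed hyperparameters
specs <- list(
  linear = svm_linear(cost = 1),
  poly   = svm_poly(cost = 1, degree = 2),
  rbf    = svm_rbf(cost = 1, rbf_sigma = 0.1)
)

# Assumption: `rec` is the recipe from Q2, `train`/`test` the split from Q1
fit_one <- function(spec) {
  workflow() %>%
    add_recipe(rec) %>%
    add_model(spec %>% set_engine("kernlab") %>% set_mode("classification")) %>%
    fit(train)
}

fits <- map(specs, fit_one)

# Compare test-set accuracy across kernels
map_dfr(fits,
        ~ augment(.x, new_data = test) %>%
            accuracy(truth = attrition, estimate = .pred_class),
        .id = "kernel")
```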
