LAB 04 REPORT
Code:
Explanations:
Importing the libraries that are useful for our project. numpy, matplotlib, and pandas are imported under their conventional short names np, plt, and pd.
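The import cell itself is not shown in the report; a typical version looks like this:

```python
import numpy as np                # numerical arrays and math
import matplotlib.pyplot as plt  # plotting
import pandas as pd              # tabular data handling
```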
Code & Output:
Explanations:
Importing the dataset through pandas and displaying it.
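The loading cell is not shown; a sketch is below. The report's CSV file is not available, so a small inline sample with illustrative column names (4 independent columns plus 1 dependent column, matching the report's description) stands in for it:

```python
import pandas as pd
from io import StringIO

# In the lab this would be pd.read_csv('<filename>.csv'); the inline CSV
# below is a hypothetical stand-in with illustrative column names.
csv_text = """F1,F2,F3,Category,Target
1.0,2.0,3.0,A,10.0
4.0,5.0,6.0,B,20.0
7.0,8.0,9.0,A,30.0
"""
dataset = pd.read_csv(StringIO(csv_text))
print(dataset)
```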
Code & Output:
Explanations:
There are 5 columns, of which 4 are independent and 1 is dependent. So we store the independent columns in the X frame and the dependent column in y, then display the elements of y.
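The slicing step can be sketched with iloc; the dataset below is a hypothetical stand-in mirroring the 4-independent/1-dependent layout described above:

```python
import pandas as pd

# Hypothetical dataset: 4 independent columns, then 1 dependent column.
dataset = pd.DataFrame({
    'F1': [1.0, 4.0, 7.0],
    'F2': [2.0, 5.0, 8.0],
    'F3': [3.0, 6.0, 9.0],
    'Category': ['A', 'B', 'A'],
    'Target': [10.0, 20.0, 30.0],
})

X = dataset.iloc[:, :-1].values  # every column except the last -> independent
y = dataset.iloc[:, -1].values   # the last column -> dependent
print(y)
```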
Code & Output:
Here we check whether there are any null or NaN values in the dataset. isnull() is a pandas method that flags every null value, and sum() then adds up those flags for each column. The result shows no null values in any column, and therefore none in the whole dataset.
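The check itself is one line; a sketch with a hypothetical, null-free DataFrame:

```python
import pandas as pd

# Hypothetical dataset with no missing values.
dataset = pd.DataFrame({
    'F1': [1.0, 4.0],
    'Category': ['A', 'B'],
    'Target': [10.0, 20.0],
})

null_counts = dataset.isnull().sum()  # NaN count per column
print(null_counts)
print('total missing:', null_counts.sum())
```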
Code & Output:
There is a text/string column in the dataset, so we need to encode its values into numbers that our models can work with. Before encoding, we check whether the column holds independent or dependent values: for an independent column we apply OneHotEncoder, while for a dependent column we apply LabelEncoder. For one-hot encoding we import ColumnTransformer and OneHotEncoder from sklearn, pass the encoder and the index of the string column to the ColumnTransformer object, and then call fit_transform() to replace the text values with their encoded form.
Code & Output:
Explanation:
Here we split the dataset into two parts, a training set and a testing set. The usual ratio is 80:20 (80% of the data used for training and 20% for testing). To split, we import train_test_split from sklearn and pass it the arguments: test_size=0.2 means 20% of the data is reserved for testing. train_test_split shuffles the data before splitting, and random_state fixes the seed of that shuffle so the same split is produced on every run.
What if i don’t use Random State ?
Ans: It would be a risk for my model. Suppose I have 100 rows and split them 80:20 for training and testing. Without shuffling, the model would take the first 80 rows for training and the remaining 20 for testing, and those 20 rows could all belong to the same value/category. If we only test the model on one category, we never see how it reacts to the other categories, so it is hard to tell whether the model is actually good or not. Shuffling (which train_test_split does by default) draws the test rows from anywhere in the dataset, and setting random_state makes that shuffle reproducible.
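The 80:20 split described above can be sketched on 100 hypothetical samples:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 hypothetical samples, matching the 80:20 example in the text.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # fixed seed -> reproducible shuffle
print(len(X_train), len(X_test))  # 80 training rows, 20 testing rows
```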
Now we import LinearRegression from sklearn and train our model on x_train and y_train.
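The training step is two calls; here is a sketch on hypothetical data with a known linear relationship (y = 3x + 1), so the learned coefficients are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 3*x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 3.0 * X_train.ravel() + 1.0

regressor = LinearRegression()
regressor.fit(X_train, y_train)  # learn slope and intercept from the training set
print(regressor.coef_, regressor.intercept_)
```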
We want to see the predictions of our model. The model was trained on x_train and y_train, and the prediction results are stored in y_pred. To display two digits after the decimal point we set precision=2 in numpy's print options (for n digits, precision=n). The next line shows the actual results side by side with our predicted results.
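A sketch of that prediction-and-display step, using a hypothetical model trained on y = 2x so the predictions are easy to verify:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical model trained on y = 2*x.
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])
X_test = np.array([[4.0], [5.0]])
y_test = np.array([8.0, 10.0])

regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

np.set_printoptions(precision=2)  # show 2 digits after the decimal point
# Predicted value next to actual value, one test sample per row.
print(np.concatenate((y_pred.reshape(-1, 1), y_test.reshape(-1, 1)), axis=1))
```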