Feature Engg Code

Feature engineering for machine learning

Uploaded by

promodkumarsahu7

Scaler DSML Feature Engg Class Code - Jupyter Notebook

In [ ]: # Home Loan decision automation

In [8]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns

In [9]: # Raw string, so the backslashes in the Windows path are not read as escapes
        local_data_path = r"E:\DATA_SCIENCE\Scaler\Data\loan-prediction\train.csv"

In [10]: # https://drive.google.com/drive/folders/1QFDGIHCZPqS5kD7_Uo8SCH9BSAZAVBQj?usp=st

In [12]: # Step 1: Data exploration (Basic)

In [14]: data = pd.read_csv(local_data_path)

In [15]: data.info()

RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype
 0   Loan_ID            614 non-null    object
 1   Gender             601 non-null    object
 2   Married            611 non-null    object
 3   Dependents         599 non-null    object
 4   Education          614 non-null    object
 5   Self_Employed      582 non-null    object
 6   ApplicantIncome    614 non-null    int64
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object
 12  Loan_Status        614 non-null    object
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB

In [16]: # Loan_ID is only a row identifier, so drop it
         data = data.drop("Loan_ID", axis=1)
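The Windows path assigned to `local_data_path` is full of backslashes, and in a regular Python string a backslash followed by certain letters is an escape sequence. The sketch below uses a hypothetical path (not the one from the notebook), chosen so that every backslash is followed by `t` or `n`, to show how a plain string silently changes the path while a raw string keeps it intact.

```python
from pathlib import PureWindowsPath

# Hypothetical path for illustration only.
plain = "E:\temp\new\train.csv"   # "\t" becomes a TAB, "\n" becomes a newline
raw = r"E:\temp\new\train.csv"    # raw string: backslashes are kept literally

# The plain string is shorter because each escape collapsed to one character.
print(len(plain), len(raw))
print("\t" in plain, "\t" in raw)

# The raw string still parses as a proper Windows path.
path_parts = PureWindowsPath(raw).parts
print(path_parts)
```

Forward slashes, which pandas also accepts on Windows, are another way to avoid the problem.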
In [17]: data.describe()

Out[17]:
       ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History
count       614.000000         614.000000  592.000000         600.00000      564.000000
mean       5403.459283        1621.245798  148.412162         342.00000        0.842199
std        6109.041673        2926.248369   85.587325          65.12041        0.364878
min         150.000000           0.000000    9.000000          12.00000        0.000000
25%        2677.500000           0.000000  100.000000         360.00000        1.000000
50%        3812.500000        1188.500000  128.000000         360.00000        1.000000
75%        5796.000000        2297.250000  168.000000         360.00000        1.000000
max       81000.000000       41667.000000  700.000000         480.00000        1.000000

In [19]: data.describe(include=["object"]).transpose()

Out[19]:
               count  unique  top        freq
Gender         601    2       Male       489
Married        611    2       Yes        398
Dependents     599    4       0          345
Education      614    2       Graduate   480
Self_Employed  582    2       No         500
Property_Area  614    3       Semiurban  233
Loan_Status    614    2       Y          422

In [25]: data.Loan_Status

Out[25]:
       ...
609    Y
610    Y
611    Y
612    Y
613    N
Name: Loan_Status, Length: 614, dtype: object

In [26]: # Step 2: Brainstorming
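The categorical summary above reports `Y` as the most frequent `Loan_Status`, with 422 of 614 rows. A quick way to put that in context is the majority-class baseline: the accuracy obtained by always predicting approval. The sketch below rebuilds only those two class counts as a stand-in series; it is an illustration and does not read the actual file.

```python
import pandas as pd

# Stand-in series built from the reported class counts (422 "Y", 192 "N").
# The row order is arbitrary and purely illustrative.
loan_status = pd.Series(["Y"] * 422 + ["N"] * 192, name="Loan_Status")

# Share of each class; the largest share is the majority-class baseline.
class_share = loan_status.value_counts(normalize=True)
baseline_accuracy = class_share.max()

print(class_share.round(3))
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any model built on the engineered features should beat this baseline (about 0.69) to be worth keeping.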
In [27]: data.info()

RangeIndex: 614 entries, 0 to 613
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype
 0   Gender             601 non-null    object
 1   Married            611 non-null    object
 2   Dependents         599 non-null    object
 3   Education          614 non-null    object
 4   Self_Employed      582 non-null    object
 5   ApplicantIncome    614 non-null    int64
 6   CoapplicantIncome  614 non-null    float64
 7   LoanAmount         592 non-null    float64
 8   Loan_Amount_Term   600 non-null    float64
 9   Credit_History     564 non-null    float64
 10  Property_Area      614 non-null    object
 11  Loan_Status        614 non-null    object
dtypes: float64(4), int64(1), object(7)
memory usage: 57.7+ KB

In [28]: # Step 3: Look at basic distributions (univariates)
         # Step 4: Handle missing values

In [29]: data.isna().sum()

Out[29]:
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

In [33]: def missing_to_df(df):
             # Count and share of missing values per column, largest first
             total_missing_df = df.isnull().sum().sort_values(ascending=False)
             percent_missing_df = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)
             missing_data_df = pd.concat(
                 [total_missing_df, percent_missing_df],
                 axis=1,
                 keys=["Total", "Percent"]
             )
             return missing_data_df

In [34]: missing_df = missing_to_df(data)
         missing_df[missing_df["Total"] > 0]

Out[34]:
                  Total   Percent
Credit_History       50  0.081433
Self_Employed        32  0.052117
LoanAmount           22  0.035831
Dependents           15  0.024430
Loan_Amount_Term     14  0.022801
Gender               13  0.021173
Married               3  0.004886

In [36]: data["Credit_History"] = data["Credit_History"].fillna(2)

In [38]: data["Self_Employed"].unique()

Out[38]: array(['No', 'Yes', nan], dtype=object)

In [39]: data["Self_Employed"] = data["Self_Employed"].fillna("Other")

In [40]: from sklearn.impute import SimpleImputer

In [42]: num_missing = ["LoanAmount", "Loan_Amount_Term"]
         median_imputer = SimpleImputer(strategy="median")
         for col in num_missing:
             data[col] = pd.DataFrame(median_imputer.fit_transform(pd.DataFrame(data[col])))

In [46]: cat_missing = ["Gender", "Married", "Dependents"]
         freq_imputer = SimpleImputer(strategy="most_frequent")
         for col in cat_missing:
             data[col] = pd.DataFrame(freq_imputer.fit_transform(pd.DataFrame(data[col])))

In [47]: data.isnull().sum()

Out[47]:
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

In [48]: # Removing or replacing redundant or erroneous values
         # if income was negative
         # married had some number

In [49]: # detect and handle outliers

In [ ]: # more EDA, univariates and bivariates

In [50]: sns.countplot(data=data, x="Loan_Status")

Out[50]: [Figure: count plot of Loan_Status; y-axis: count]

In [51]: sns.distplot(data["ApplicantIncome"])

C:\Users\nitis\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

Out[51]: [Figure: density plot of ApplicantIncome]

In [52]: data.boxplot(column="ApplicantIncome", by="Education")
         plt.show()

[Figure: box plot of ApplicantIncome grouped by Education]

In [53]: sns.distplot(data["CoapplicantIncome"])

C:\Users\nitis\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

Out[53]: [Figure: density plot of CoapplicantIncome; y-axis: Density]

In [54]: # Select the column before aggregating so that non-numeric columns are not averaged
         data.groupby("Loan_Status")["ApplicantIncome"].mean()

Out[54]:
Loan_Status
N    5446.078125
Y    5384.068720
Name: ApplicantIncome, dtype: float64

In [58]: bins = [0, 2500, 4000, 6000, 81000]
         group_name = ["Low", "Average", "High", "Very High"]
         data["Income_bin"] = pd.cut(data["ApplicantIncome"], bins=bins, labels=group_name)
         data["Income_bin"]

Out[58]:
0           High
1           High
2        Average
3        Average
4           High
         ...
609      Average
610         High
611    Very High
612    Very High
613         High
Name: Income_bin, Length: 614, dtype: category
Categories (4, object): [Low < Average < High < Very High]

In [61]: Income_bin = pd.crosstab(data["Income_bin"], data["Loan_Status"])

In [62]: Income_bin.div(Income_bin.sum(axis=1), axis=0).plot(kind="bar", figsize=(4,4))
         plt.xlabel("ApplicantIncome")
         plt.ylabel("Percentage")
         plt.show()

[Figure: bar chart of Loan_Status share per ApplicantIncome bin (Low, Average, High, Very High); y-axis: Percentage]
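The imputation cells earlier in the notebook (In [42] and In [46]) wrap each column in a new `pd.DataFrame` before and after `SimpleImputer`. The values line up with `data` on assignment because both frames have the default 0..n-1 index. As a hedged alternative, the same median and most-frequent strategies can be sketched with pandas alone, which keeps whatever index the frame has. The toy values below are hypothetical; only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; only the column names match the notebook.
toy = pd.DataFrame({
    "LoanAmount": [120.0, np.nan, 66.0, 141.0],
    "Loan_Amount_Term": [360.0, 360.0, np.nan, 180.0],
    "Gender": ["Male", None, "Female", "Male"],
    "Married": ["Yes", "No", None, "Yes"],
})

num_missing = ["LoanAmount", "Loan_Amount_Term"]
cat_missing = ["Gender", "Married"]

# Median per numeric column; fillna matches the medians to columns by name.
toy[num_missing] = toy[num_missing].fillna(toy[num_missing].median())

# Most frequent value per categorical column; mode() can return ties,
# so take the first row of the result.
toy[cat_missing] = toy[cat_missing].fillna(toy[cat_missing].mode().iloc[0])

print(toy)
```

When a train/test split is involved, the medians and modes would normally be computed on the training rows only and then reused for the test rows.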
In [ ]: # above is also not useful, because the approval rate across income bins is very
        # Feature Engineering

In [63]: data["TotalIncome"] = data["ApplicantIncome"] + data["CoapplicantIncome"]

In [64]: data["TotalIncome_bin"] = pd.cut(data["TotalIncome"], bins=bins, labels=group_name)
         data["TotalIncome_bin"]

Out[64]:
0           High
1      Very High
2        Average
3           High
4           High
         ...
609      Average
610         High
611    Very High
612    Very High
613         High
Name: TotalIncome_bin, Length: 614, dtype: category
Categories (4, object): [Low < Average < High < Very High]

In [66]: TotalIncome_bin = pd.crosstab(data["TotalIncome_bin"], data["Loan_Status"])
         TotalIncome_bin.div(TotalIncome_bin.sum(axis=1), axis=0).plot(kind="bar", stacked=True)
         plt.xlabel("TotalIncome")
         plt.ylabel("Percentage")
         plt.show()

[Figure: stacked bar chart of Loan_Status share per TotalIncome bin (Low, Average, High, Very High); y-axis: Percentage]

In [67]: data = data.drop(["Income_bin", "TotalIncome_bin"], axis=1)

In [68]: data["Loan_Amount_Term"].nunique()

Out[68]: 10

In [69]: data["Loan_Amount_Term"].value_counts()

Out[69]:
360.0    526
180.0     44
480.0     15
300.0     13
84.0       4
240.0      4
120.0      3
36.0       2
60.0       2
12.0       1
Name: Loan_Amount_Term, dtype: int64

In [70]: # Convert the term from months to years
         data["Loan_Amount_Term"] = (data["Loan_Amount_Term"]/12).astype('float')
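The `bins` list defined for `ApplicantIncome` is reused for `TotalIncome`. By default `pd.cut` builds right-closed intervals and leaves values outside the outer edges unbinned, so a total above 81000 (or an income of exactly 0) would become `NaN` and drop out of the crosstab. The sketch below uses hypothetical incomes placed on and around the edges to show that behaviour, and uses `pd.crosstab(..., normalize="index")` as a shorter way of computing the row shares that the notebook computes with `.div(...)`.

```python
import pandas as pd

bins = [0, 2500, 4000, 6000, 81000]
group_name = ["Low", "Average", "High", "Very High"]

# Hypothetical incomes and outcomes, chosen to sit on and around the bin edges.
income = pd.Series([0, 2500, 2501, 6000, 6091, 90000], name="TotalIncome")
status = pd.Series(["Y", "N", "Y", "Y", "N", "Y"], name="Loan_Status")

# Intervals are (0, 2500], (2500, 4000], (4000, 6000], (6000, 81000].
binned = pd.cut(income, bins=bins, labels=group_name)
print(binned.tolist())

# Row-wise shares per bin; rows whose bin is NaN are left out.
share = pd.crosstab(binned, status, normalize="index")
print(share)
```

If unbinned rows are not wanted, one option is to set the last edge to `float("inf")` and pass `include_lowest=True`; whether that suits this dataset is a judgment call.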
In [71]: pd.crosstab(data["Loan_Amount_Term"], data["Loan_Status"])

Out[71]:
Loan_Status         N    Y
Loan_Amount_Term
1.0                 0    1
3.0                 2    0
5.0                 0    2
7.0                 1    3
10.0                0    3
15.0               15   29
20.0                1    3
25.0                5    8
30.0              159  367
40.0                9    6

In [72]: data["Loan_Amount_per_year"] = data["LoanAmount"]/data["Loan_Amount_Term"]

In [75]: data["EMI"] = data["Loan_Amount_per_year"]*1000/12

In [76]: data

Out[76]:
     Gender  Married  Dependents  Education     Self_Employed  ApplicantIncome  CoapplicantIncome  ...
0    Male    No       0           Graduate      No             5849             0.0                ...
1    Male    Yes      1           Graduate      No             4583             1508.0             ...
2    Male    Yes      0           Graduate      Yes            3000             0.0                ...
3    Male    Yes      0           Not Graduate  No             2583             2358.0             ...
4    Male    No       0           Graduate      No             6000             0.0                ...
...
609  Female  No       0           Graduate      No             2900             0.0                ...
610  Male    Yes      3+          Graduate      No             4106             0.0                ...
611  Male    Yes      1           Graduate      No             8072             240.0              ...
612  Male    Yes      2           Graduate      No             7583             0.0                ...
613  Female  No       0           Graduate      Yes            4583             0.0                ...

614 rows × 15 columns

In [77]: data["able_to_pay_EMI"] = (data["TotalIncome"]/10 > data["EMI"]).astype('int')

In [78]: sns.countplot(x='able_to_pay_EMI', data=data, hue="Loan_Status")

Out[78]: [Figure: count plot of able_to_pay_EMI (0, 1) split by Loan_Status; y-axis: count]

In [79]: data["Dependents"].value_counts()

Out[79]:
0     360
1     102
2     101
3+     51
Name: Dependents, dtype: int64

In [85]: # Assign the result back instead of calling replace(..., inplace=True) on the column
         data["Dependents"] = data["Dependents"].replace('3+', 3)

In [87]: data["Dependents"] = data["Dependents"].astype("float")

In [88]: sns.countplot(data=data, x="Dependents", hue="Loan_Status")

Out[88]: [Figure: count plot of Dependents (0.0, 1.0, 2.0, 3.0) split by Loan_Status; y-axis: count]

In [89]: # bivariate with credit_history, you will find that better credit history has bet

In [90]: data.info()

RangeIndex: 614 entries, 0 to 613
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype
 0   Gender                614 non-null    object
 1   Married               614 non-null    object
 2   Dependents            614 non-null    float64
 3   Education             614 non-null    object
 4   Self_Employed         614 non-null    object
 5   ApplicantIncome       614 non-null    int64
 6   CoapplicantIncome     614 non-null    float64
 7   LoanAmount            614 non-null    float64
 8   Loan_Amount_Term      614 non-null    float64
 9   Credit_History        614 non-null    float64
 10  Property_Area         614 non-null    object
 11  Loan_Status           614 non-null    object
 12  TotalIncome           614 non-null    float64
 13  Loan_Amount_per_year  614 non-null    float64
 14  EMI                   614 non-null    float64
 15  able_to_pay_EMI       614 non-null    int32
dtypes: float64(8), int32(1), int64(1), object(6)
memory usage: 74.5+ KB

In [91]: data["Self_Employed"].value_counts()

Out[91]:
No       500
Yes       82
Other     32
Name: Self_Employed, dtype: int64

In [95]: data = pd.get_dummies(data, drop_first=True)

In [96]: data.info()

RangeIndex: 614 entries, 0 to 613
Data columns (total 18 columns):
 #   Column                   Non-Null Count  Dtype
 0   Dependents               614 non-null    float64
 1   ApplicantIncome          614 non-null    int64
 2   CoapplicantIncome        614 non-null    float64
 3   LoanAmount               614 non-null    float64
 4   Loan_Amount_Term         614 non-null    float64
 5   Credit_History           614 non-null    float64
 6   TotalIncome              614 non-null    float64
 7   Loan_Amount_per_year     614 non-null    float64
 8   EMI                      614 non-null    float64
 9   able_to_pay_EMI          614 non-null    int32
 10  Gender_Male              614 non-null    uint8
 11  Married_Yes              614 non-null    uint8
 12  Education_Not Graduate   614 non-null    uint8
 13  Self_Employed_Other      614 non-null    uint8
 14  Self_Employed_Yes        614 non-null    uint8
 15  Property_Area_Semiurban  614 non-null    uint8
 16  Property_Area_Urban      614 non-null    uint8
 17  Loan_Status_Y            614 non-null    uint8
dtypes: float64(8), int32(1), int64(1), uint8(8)
memory usage: 50.5 KB

Feature engineering = feature transformation + new features + one-hot encoding, and so on.

In [ ]: # Dimensionality reduction
        # removing unwanted features
        # check corr and remove features

In [97]: plt.figure(figsize=(20,20))
         sns.heatmap(data.corr(), annot=True)
         plt.show()

[Figure: annotated heatmap of the Pearson correlation matrix]

In [98]: # Spearman's rank correlation coefficient

In [99]: plt.figure(figsize=(20,20))
         sns.heatmap(data.corr(method="spearman"), annot=True)
         plt.show()

[Figure: annotated heatmap of the Spearman correlation matrix]

In [100]: # feature scaling

In [101]: from sklearn.preprocessing import StandardScaler, MinMaxScaler

In [102]: normalizer = MinMaxScaler()
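The one-hot encoding step uses `pd.get_dummies(data, drop_first=True)`, which drops the alphabetically first level of each categorical column and treats it as the baseline. For `Self_Employed` that baseline is `No`, which is why only the `Other` and `Yes` indicator columns appear in the `info()` listing. A minimal sketch on a hypothetical four-row frame (values invented for illustration):

```python
import pandas as pd

# Hypothetical rows; the column names and category labels follow the notebook.
toy = pd.DataFrame({
    "Self_Employed": ["No", "Yes", "Other", "No"],
    "LoanAmount": [128.0, 66.0, 120.0, 141.0],
})

# dtype="uint8" reproduces the 0/1 integer columns shown in the notebook;
# recent pandas versions return boolean columns unless a dtype is given.
encoded = pd.get_dummies(toy, drop_first=True, dtype="uint8")

print(encoded.columns.tolist())
print(encoded)
```

A row with zeros in both indicator columns therefore means `Self_Employed == "No"`.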
In [104]: pd.DataFrame(normalizer.fit_transform(data), columns=data.columns)

Out[104]:
     Dependents  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  ...
0      0.000000         0.070489           0.000000    0.172214          0.743590  ...
1      0.333333         0.054830           0.036192    0.172214          0.743590  ...
2      0.000000         0.035250           0.000000    0.082489          0.743590  ...
3      0.000000         0.030093           0.056592    0.160637          0.743590  ...
4      0.000000         0.072356           0.000000    0.191027          0.743590  ...
...
609    0.000000         0.034014           0.000000    0.089725          0.743590  ...
610    1.000000         0.048930           0.000000    0.044863          0.358974  ...
611    0.333333         0.097984           0.005760    0.353111          0.743590  ...
612    0.666667         0.091936           0.000000    0.257598          0.743590  ...
613    0.000000         0.054830           0.000000    0.179450          0.743590  ...

614 rows × 18 columns
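`MinMaxScaler` maps each column to the range [0, 1] with (x - min) / (max - min). The sketch below recomputes a few of the scaled values by hand. It assumes a column minimum of 150 and maximum of 81000 for `ApplicantIncome`, and 1 and 40 years for `Loan_Amount_Term`, which are the values consistent with the scaled output shown. The helper name `min_max_scale` is made up for this example.

```python
import pandas as pd

def min_max_scale(values, lower, upper):
    """Rescale values linearly so that lower maps to 0 and upper maps to 1."""
    return (values - lower) / (upper - lower)

# ApplicantIncome of the first three rows shown in the notebook.
applicant_income = pd.Series([5849, 4583, 3000], name="ApplicantIncome")
scaled_income = min_max_scale(applicant_income, lower=150, upper=81000)

# A 360-month loan is 30 years after the division by 12.
scaled_term = min_max_scale(30.0, lower=1.0, upper=40.0)

print(scaled_income.round(6).tolist())
print(round(scaled_term, 6))
```

The results can be compared with the `ApplicantIncome` and `Loan_Amount_Term` columns of the scaled table. In a modelling workflow the scaler would usually be fitted on the training split only, so that test rows do not influence the minimum and maximum.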
