MajorProject.ipynb - Colaboratory

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
from collections import Counter
from sklearn import preprocessing
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, classification_report
from ydata_profiling import ProfileReport

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = [12, 8]
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings('ignore')

#!pip install ydata_profiling

from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive

df=pd.read_csv("/content/drive/MyDrive/Major project/train.csv")

df.head()

   ID  A1_Score  A2_Score  A3_Score  A4_Score  A5_Score  A6_Score  A7_Score  A8_Score
0   1         1         0         1         0         1         0         1         0
1   2         0         0         0         0         0         0         0         0
2   3         1         1         1         1         1         1         1         1
3   4         0         0         0         0         0         0         0         0
4   5         0         0         0         0         0         0         0         0

df.describe()

             ID    A1_Score    A2_Score    A3_Score   A4_Score    A5_Score    A6_Score
count  800.0000  800.000000  800.000000  800.000000  800.00000  800.000000  800.000000
mean   400.5000    0.560000    0.530000    0.450000    0.41500    0.395000    0.303750
std    231.0844    0.496697    0.499411    0.497805    0.49303    0.489157    0.460164
min      1.0000    0.000000    0.000000    0.000000    0.00000    0.000000    0.000000
25%    200.7500    0.000000    0.000000    0.000000    0.00000    0.000000    0.000000
50%    400.5000    1.000000    1.000000    0.000000    0.00000    0.000000    0.000000
75%    600.2500    1.000000    1.000000    1.000000    1.00000    1.000000    1.000000
max    800.0000    1.000000    1.000000    1.000000    1.00000    1.000000    1.000000

df.shape

(800, 22)

df.size

17600

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 1/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 800 non-null int64
1 A1_Score 800 non-null int64
2 A2_Score 800 non-null int64
3 A3_Score 800 non-null int64
4 A4_Score 800 non-null int64
5 A5_Score 800 non-null int64
6 A6_Score 800 non-null int64
7 A7_Score 800 non-null int64
8 A8_Score 800 non-null int64
9 A9_Score 800 non-null int64
10 A10_Score 800 non-null int64
11 age 800 non-null float64
12 gender 800 non-null object
13 ethnicity 800 non-null object
14 jaundice 800 non-null object
15 austim 800 non-null object
16 contry_of_res 800 non-null object
17 used_app_before 800 non-null object
18 result 800 non-null float64
19 age_desc 800 non-null object
20 relation 800 non-null object
21 Class/ASD 800 non-null int64
dtypes: float64(2), int64(12), object(8)
memory usage: 137.6+ KB

df.hist(figsize=(15, 15),grid=False)

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 2/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

array([[<Axes: title={'center': 'ID'}>,
        <Axes: title={'center': 'A1_Score'}>,
        <Axes: title={'center': 'A2_Score'}>,
        <Axes: title={'center': 'A3_Score'}>],
       [<Axes: title={'center': 'A4_Score'}>,
        <Axes: title={'center': 'A5_Score'}>,
        <Axes: title={'center': 'A6_Score'}>,
        <Axes: title={'center': 'A7_Score'}>],
       [<Axes: title={'center': 'A8_Score'}>,
        <Axes: title={'center': 'A9_Score'}>,
        <Axes: title={'center': 'A10_Score'}>,
        <Axes: title={'center': 'age'}>],
       [<Axes: title={'center': 'result'}>,
        <Axes: title={'center': 'Class/ASD'}>, <Axes: >, <Axes: >]],
      dtype=object)

df.isnull().sum()

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 3/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

ID 0
A1_Score 0
A2_Score 0
A3_Score 0
A4_Score 0
A5_Score 0
A6_Score 0
A7_Score 0
A8_Score 0
A9_Score 0
A10_Score 0
age 0
gender 0
ethnicity 0
jaundice 0
austim 0
contry_of_res 0
used_app_before 0
result 0
age_desc 0
relation 0
Class/ASD 0
dtype: int64

df.dtypes

ID int64
A1_Score int64
A2_Score int64
A3_Score int64
A4_Score int64
A5_Score int64
A6_Score int64
A7_Score int64
A8_Score int64
A9_Score int64
A10_Score int64
age float64
gender object
ethnicity object
jaundice object
austim object
contry_of_res object
used_app_before object
result float64
age_desc object
relation object
Class/ASD int64
dtype: object

df.duplicated().sum()

0
df["ethnicity"].value_counts()

White-European 257
? 203
Middle Eastern 97
Asian 67
Black 47
South Asian 34
Pasifika 32
Others 29
Latino 17
Hispanic 9
Turkish 5
others 3
Name: ethnicity, dtype: int64
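The value counts above show 203 rows with a '?' placeholder and both 'Others' and 'others' spellings. A minimal cleanup sketch (not part of the original notebook, assuming these are the only inconsistent labels) would merge them into one category before modelling:

# Merge the '?' placeholder and the lowercase 'others' into a single 'Others' label
df['ethnicity'] = df['ethnicity'].replace({'?': 'Others', 'others': 'Others'})
df['ethnicity'].value_counts()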

df["austim"].value_counts()

no 669
yes 131
Name: austim, dtype: int64

ProfileReport(df)

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 4/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

Summarize dataset: 100% 40/40 [00:07<00:00, 2.35it/s, Completed]

Generate report structure: 100% 1/1 [00:11<00:00, 11.68s/it]

Render HTML: 100% 1/1 [00:02<00:00, 2.04s/it]

Overview

Dataset statistics
    Number of variables              22
    Number of observations           800
    Missing cells                    0
    Missing cells (%)                0.0%
    Duplicate rows                   0
    Duplicate rows (%)               0.0%
    Total size in memory             137.6 KiB
    Average record size in memory    176.2 B

Variable types
    Numeric        3
    Categorical    15
    Boolean        3
    Text           1

Alerts
    age_desc has constant value ""    [Constant]
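Since the profiling alert flags age_desc as constant across all 800 rows, it carries no predictive information. An optional follow-up (a sketch, not part of the original notebook) is to drop it before modelling:

# age_desc has the same value for every row, so dropping it loses nothing
df = df.drop(columns=['age_desc'])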

score_features = df.filter(regex='A[0-9]+_', axis=1).columns.tolist()  # 'A[0-9]+_' so A10_Score is also matched

# Group df by 'Class/ASD' and plot the mean of each A1-A10 screening-question score
df.groupby('Class/ASD')[score_features].mean().T.plot.bar()
plt.title('Mean score - Autism Spectrum Quotient (AQ) 10 item screening tool')
plt.xticks(ticks=range(len(score_features)), labels=[x.split('_')[0] for x in score_features], rotation=0);

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 5/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

#Visualization of Ethnicity of the patient

df.groupby('ethnicity')['Class/ASD'].mean().sort_values().plot.bar()
plt.title('Ethnicity of the patient')
plt.xticks();

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 6/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

df['contry_of_res'].unique()

array(['Austria', 'India', 'United States', 'South Africa', 'Jordan',
       'United Kingdom', 'Brazil', 'New Zealand', 'Canada', 'Kazakhstan',
       'United Arab Emirates', 'Australia', 'Ukraine', 'Iraq', 'France',
       'Malaysia', 'Viet Nam', 'Egypt', 'Netherlands', 'Afghanistan',
       'Oman', 'Italy', 'AmericanSamoa', 'Bahamas', 'Saudi Arabia',
       'Ireland', 'Aruba', 'Sri Lanka', 'Russia', 'Bolivia', 'Azerbaijan',
       'Armenia', 'Serbia', 'Ethiopia', 'Sweden', 'Iceland', 'Hong Kong',
       'Angola', 'China', 'Germany', 'Spain', 'Tonga', 'Pakistan', 'Iran',
       'Argentina', 'Japan', 'Mexico', 'Nicaragua', 'Sierra Leone',
       'Czech Republic', 'Niger', 'Romania', 'Cyprus', 'Belgium',
       'Burundi', 'Bangladesh'], dtype=object)

import missingno as msno


msno.bar(df)

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 7/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

<Axes: >

import plotly.express as px

autism_colors = ['pink', 'purple']

# Grouping data to get counts of each Class/ASD label
autism_counts = df['Class/ASD'].value_counts().reset_index()
autism_counts.columns = ['Class/ASD', 'Count']

# Plotting donut chart
fig = px.pie(autism_counts, names='Class/ASD', values='Count', hole=0.7, color_discrete_sequence=autism_colors,
             title='VISUALIZATION OF TOTAL AUTISM SPECTRUM DISORDER (ASD)')

fig.show()

[Donut chart: VISUALIZATION OF TOTAL AUTISM SPECTRUM DISORDER (ASD); class 0: 79.9%, class 1: 20.1%]
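The donut chart shows a roughly 80/20 split between class 0 and class 1, so the dataset is imbalanced. One possible precaution (a sketch, not used in the original notebook) is to stratify the train/test split so both sides keep the same class ratio:

# Hypothetical stratified split; the feature list mirrors the later modelling cell
X_all = df[['age', 'jaundice', 'austim', 'result']]
y_all = df['Class/ASD']
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.5,
                                          random_state=123, stratify=y_all)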

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 8/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

#Visualization of country of residence of the patient - Treemap

fig = px.treemap(df, path=['contry_of_res', 'Class/ASD'], color='Class/ASD',
                 color_continuous_scale='plotly3',
                 )

fig.update_layout(title="<b> COUNTRY OF RESIDENCE OF THE PATIENT - TREEMAP <b>",
                  titlefont={'size': 20, 'family': "Sans Serif"},
                  height=500, width=1000,
                  template='simple_white',
                  autosize=False,
                  margin=dict(l=50, r=50, b=50, t=250),
                  )
# This second call overrides the margins set above
fig.update_layout(margin=dict(t=50, l=50, r=50, b=100))
fig.show()

[Treemap: COUNTRY OF RESIDENCE OF THE PATIENT - TREEMAP; one tile per contry_of_res value, each split by Class/ASD]

import seaborn as sns
import matplotlib.pyplot as plt

# Count of jaundice (yes/no) among the patients in df
sns.countplot(x="jaundice", data=df, palette="Purples")
plt.title('Autism by Jaundice')
plt.show()
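The plot above only counts jaundice values on their own, even though the title says 'Autism by Jaundice'. A small variant (not in the original notebook) splits each jaundice bar by ASD class to show that relationship directly:

# Same countplot, but with each jaundice bar split by the Class/ASD label
sns.countplot(x="jaundice", hue="Class/ASD", data=df, palette="Purples")
plt.title('Autism by Jaundice')
plt.show()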

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 9/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory


features = [
'age',
'jaundice',
'austim',
'result'
]

pip install lazypredict

Collecting lazypredict
Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from lazypredict) (8.1.7)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from lazypredict) (1.2.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from lazypredict) (1.5.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from lazypredict) (4.66.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from lazypredict) (1.3.2)
Requirement already satisfied: lightgbm in /usr/local/lib/python3.10/dist-packages (from lazypredict) (4.1.0)
Requirement already satisfied: xgboost in /usr/local/lib/python3.10/dist-packages (from lazypredict) (2.0.3)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from lightgbm->lazypredict) (1.25.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from lightgbm->lazypredict) (1.11.4)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->lazypredict) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->lazypredict) (2023.4)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->lazypredict) (3.3
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->lazypredict
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12

import lazypredict
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split

features = ['age', 'jaundice', 'austim', 'result']

X = df[features]
y = df['Class/ASD']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=123)

clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)

models, predictions = clf.fit(X_train, X_test, y_train, y_test)

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 10/11
3/14/24, 12:39 AM MajorProject.ipynb - Colaboratory

100%|██████████| 29/29 [00:01<00:00, 19.50it/s][LightGBM] [Info] Number of positive: 76, number of negative: 324
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000236 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 275
[LightGBM] [Info] Number of data points in the train set: 400, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.190000 -> initscore=-1.450010
[LightGBM] [Info] Start training from score -1.450010
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
(the warning line above repeats many times in the original output)
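LogisticRegression and accuracy_score are imported at the top of the notebook but never used. A minimal baseline sketch (not part of the original notebook; it assumes 'jaundice' and 'austim' take only 'yes'/'no' values, as the earlier value counts suggest) on the same four features would look like this:

# Encode the two yes/no columns so LogisticRegression can use them
X_lr = df[features].copy()
X_lr['jaundice'] = X_lr['jaundice'].map({'no': 0, 'yes': 1})
X_lr['austim'] = X_lr['austim'].map({'no': 0, 'yes': 1})
y_lr = df['Class/ASD']

X_tr, X_te, y_tr, y_te = train_test_split(X_lr, y_lr, test_size=0.5, random_state=123)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_tr, y_tr)
print('Baseline accuracy:', accuracy_score(y_te, log_reg.predict(X_te)))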

https://colab.research.google.com/drive/1RUQOzdtCsrC7GDSnMi1R0qacKcno1TK2#scrollTo=7BIqOTJ3rW06&printMode=true 11/11

You might also like