Srivardhan Python

The document summarizes data analysis on Welsh employment statistics from 2008-2019: 1. Public administration employed the most workers over the period while real estate employed the fewest. 2. Real estate saw the highest employment growth rate at 86.7% while retail saw the smallest at 0.64%. 3. 2018 had the highest overall employment of 1.45 million, while 2010 had the lowest employment of 1.33 million.

Uploaded by

Pallavi Pallu


5/22/2020 srivardhan python

Programming for data analysis

Name: katakam srivardhan hruday kuamr

student id: st20166815

Moodle code: CIS7031_S2_19

Moodle leader: Imitiaz Khan

1. Data Preparation
1.1 The dataset for the period 2008 to 2019 was downloaded from the StatsWales data source.

1.2 The data was processed, and no outliers or null values were found in the dataset.

1.3 The industry names in the dataset were changed as specified in the assignment.
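The checks described in 1.2 are not shown in the notebook. The following is a minimal sketch of how they might be done, assuming a wide table of the same shape as the one loaded below. The 1.5 × IQR rule is only one possible outlier definition (the notebook does not say which rule was used), and the `sample` frame and the helper `iqr_outlier_mask` are illustrative names, not part of the original code.

```python
import pandas as pd

# Illustrative subset of the employment table (values taken from the notebook output).
sample = pd.DataFrame(
    {
        2009: [37700, 156700, 96600],
        2010: [38200, 149800, 93200],
        2011: [36100, 158600, 90000],
        2012: [36100, 154400, 91300],
        2013: [36800, 164200, 89300],
    },
    index=["Agriculture", "Production", "Construction"],
)

# Count missing values per column; all zeros means there are no nulls.
null_counts = sample.isnull().sum()


def iqr_outlier_mask(row: pd.Series) -> pd.Series:
    """Flag values outside 1.5 * IQR of the row (hypothetical helper)."""
    q1, q3 = row.quantile(0.25), row.quantile(0.75)
    iqr = q3 - q1
    return (row < q1 - 1.5 * iqr) | (row > q3 + 1.5 * iqr)


# Check each industry against its own time series, since industries differ greatly in size.
outliers = sample.apply(iqr_outlier_mask, axis=1)

print(null_counts.to_dict())
print(bool(outliers.values.any()))
```

On this small subset neither check flags anything; on the full table the result depends on the outlier rule chosen.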

In [1]:

import pandas as pd
import numpy as np
%matplotlib inline

In [2]:

base_data=pd.read_excel('C:\\Users\\Admin\\Downloads\\Python\\[Link]')

The final dataframe below shows the total employment values for Wales.

localhost:8888/nbconvert/html/srivardhan [Link]?download=false 1/25


In [3]:

base_data.head(15)

Out[3]:

   Industry                2009    2010    2011    2012    2013    2014    2015    2016  …
0  Agriculture            37700   38200   36100   36100   36800   42700   40700   43200
1  Production            156700  149800  158600  154400  164200  173300  172300  162500  1…
2  Construction           96600   93200   90000   91300   89300   97000   92600  102700
3  Retail                345400  344500  343100  347300  345100  337300  357700  360200  3…
4  ICT                    27800   27900   26400   27200   26900   35700   24000   34400
5  Finance                33800   29800   33200   31100   32400   32400   30800   31000
6  Real_Estate            13500   14600   17600   18800   18000   22200   19100   22700
7  Professional_Service  144800  145800  143600  137300  149900  152900  166200  161200  1…
8  Public_Adminstration  415600  418600  425600  421000  427000  427600  423200  418500  4…
9  Other_Service          64200   68000   72400   72800   75500   73300   77200   72400

In [4]:

base_data.index=base_data['Industry']
base_data.head()

Out[4]:

              Industry        2009    2010    2011    2012    2013    2014    2015    2016
Industry
Agriculture   Agriculture    37700   38200   36100   36100   36800   42700   40700   43200
Production    Production    156700  149800  158600  154400  164200  173300  172300  162500
Construction  Construction   96600   93200   90000   91300   89300   97000   92600  102700
Retail        Retail        345400  344500  343100  347300  345100  337300  357700  360200
ICT           ICT            27800   27900   26400   27200   26900   35700   24000   34400

In [5]:

del base_data['Industry']


In [6]:

base_data.head()

Out[6]:

                2009    2010    2011    2012    2013    2014    2015    2016    2017  2…
Industry
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  41…
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  165…
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  101…
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  347…
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  31…

In [7]:

base_data['Total_Employees']=base_data.sum(axis=1)

In [8]:

base_data['Total_Employees_Growth']=round(((base_data[2018]/base_data[2009])-1)*100,2)

In [9]:

base_data.head(10)

Out[9]:

                        2009    2010    2011    2012    2013    2014    2015    2016  20…
Industry
Agriculture            37700   38200   36100   36100   36800   42700   40700   43200  402…
Production            156700  149800  158600  154400  164200  173300  172300  162500  1651…
Construction           96600   93200   90000   91300   89300   97000   92600  102700  908…
Retail                345400  344500  343100  347300  345100  337300  357700  360200  3335…
ICT                    27800   27900   26400   27200   26900   35700   24000   34400  589…
Finance                33800   29800   33200   31100   32400   32400   30800   31000  321…
Real_Estate            13500   14600   17600   18800   18000   22200   19100   22700  182…
Professional_Service  144800  145800  143600  137300  149900  152900  166200  161200  1764…
Public_Adminstration  415600  418600  425600  421000  427000  427600  423200  418500  4245…
Other_Service          64200   68000   72400   72800   75500   73300   77200   72400  832…

2. Data Analysis

In [10]:

base_data['Total_Employees'].min()
Out[10]:

189900

In [11]:

base_data[base_data['Total_Employees']==base_data['Total_Employees'].min()].index

Out[11]:

Index(['Real_Estate'], dtype='object', name='Industry')

In [12]:

base_data['Total_Employees'].max()

Out[12]:

4236500

In [13]:

base_data[base_data['Total_Employees']==base_data['Total_Employees'].max()].index

Out[13]:

Index(['Public_Adminstration'], dtype='object', name='Industry')

In [14]:

import plotly.express as px

2.1 Which industry employed highest and lowest workers over the period

We fetched workplace employment data by industry and area (Wales) for 12 years, i.e. from 2008 to
2019. The visualisation below uses Python Plotly Express. It can be observed that public administration has
the highest employment over the period (a summed total of 4,236,500), while real estate has the lowest
(189,900).
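The lookups in cells In [10] to In [13] filter the frame with an equality test to find the matching index. As a sketch of a shorter alternative, `idxmax` and `idxmin` return the index label directly. The `totals` series below is illustrative and holds only the three period totals that are visible in the notebook output.

```python
import pandas as pd

# Period totals reported in the notebook output for three of the ten industries.
totals = pd.Series(
    {"Real_Estate": 189900, "Retail": 3461700, "Public_Adminstration": 4236500},
    name="Total_Employees",
)

# idxmax/idxmin return the index label of the largest/smallest value.
highest = totals.idxmax()
lowest = totals.idxmin()

print(highest, totals.max())
print(lowest, totals.min())
```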


In [15]:

fig = px.bar(base_data, y="Total_Employees", x=base_data.index, color=base_data.index,
             text='Total_Employees')
fig.update_layout(title_text='Industry Employee Numbers')
fig.show()

[Figure: bar chart "Industry Employee Numbers" (y-axis: Total_Employees)]

2.2 Which industry has the highest and lowest overall growth over the period?

The visualisation below shows the percentage growth in employment by industry over the period, computed
from the 2009 and 2018 columns. Real estate shows the highest growth at 86.67%, while retail shows the
lowest at 0.64%.
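The growth figures follow from the formula used in cell In [8], `(last / first - 1) * 100`. A small worked sketch, using the 2009 and 2018 values shown in the notebook output (the function name `growth_pct` is illustrative):

```python
def growth_pct(first: float, last: float) -> float:
    """Percentage change from `first` to `last`, rounded to 2 decimals."""
    return round((last / first - 1) * 100, 2)


# 2009 and 2018 employment figures taken from the notebook output.
real_estate = growth_pct(13500, 25200)
retail = growth_pct(345400, 347600)

print(real_estate)  # 86.67
print(retail)       # 0.64
```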


In [16]:

fig = px.bar(base_data, y="Total_Employees_Growth", x=base_data.index, color=base_data.index,
             text='Total_Employees_Growth')
fig.update_layout(title_text='Industry Employee % Growth')
fig.show()

[Figure: bar chart "Industry Employee % Growth" (y-axis: Total_Employees_Growth)]

2.3 Which years are the best and worst performing year in relation to number of employments. (highest and
lowest employment)

The bar chart shows total employment per year. 2018 is the best performing year, with the highest
employment at 1.45 million, while 2010 is the worst performing year, with the lowest employment at
1.33 million.
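The two totals quoted above can be reproduced by summing the 2010 and 2018 columns over all ten industries. The sketch below uses the values shown in the notebook output; the frame name `by_industry` is illustrative. Note that in the notebook `base_data.T` is taken after the `Total_Employees` and `Total_Employees_Growth` columns were added, so those appear as extra rows in `base_data2`, which is probably why the chart axis reaches about 14M rather than 1.4M.

```python
import pandas as pd

# 2010 and 2018 employment by industry, taken from the notebook output.
by_industry = pd.DataFrame(
    {
        2010: [38200, 149800, 93200, 344500, 27900, 29800, 14600, 145800, 418600, 68000],
        2018: [41100, 165700, 101800, 347600, 31500, 35500, 25200, 187100, 434900, 81800],
    },
    index=[
        "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
        "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
    ],
)

# Sum down each year column; only year columns are present, so no derived
# totals or growth rates leak into the result.
yearly_totals = by_industry.sum(axis=0)

print(int(yearly_totals.loc[2010]))  # 1330400
print(int(yearly_totals.loc[2018]))  # 1452200
```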


In [17]:

base_data2=base_data.T
base_data2.head()

Out[17]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  Pro…
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0

In [18]:

base_data2.head()

Out[18]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  Pro…
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0

In [19]:

base_data2['Yearly_Total_Employees']=base_data2.sum(axis=1)


In [20]:

base_data2.head(10)

Out[20]:

Industry  Agriculture  Production  Construction    Retail      ICT  Finance  Real_Estate  Pro…
2009          37700.0    156700.0       96600.0  345400.0  27800.0  33800.0      13500.0
2010          38200.0    149800.0       93200.0  344500.0  27900.0  29800.0      14600.0
2011          36100.0    158600.0       90000.0  343100.0  26400.0  33200.0      17600.0
2012          36100.0    154400.0       91300.0  347300.0  27200.0  31100.0      18800.0
2013          36800.0    164200.0       89300.0  345100.0  26900.0  32400.0      18000.0
2014          42700.0    173300.0       97000.0  337300.0  35700.0  32400.0      22200.0
2015          40700.0    172300.0       92600.0  357700.0  24000.0  30800.0      19100.0
2016          43200.0    162500.0      102700.0  360200.0  34400.0  31000.0      22700.0
2017          40200.0    165100.0       90800.0  333500.0  58900.0  32100.0      18200.0
2018          41100.0    165700.0      101800.0  347600.0  31500.0  35500.0      25200.0


In [21]:

fig=px.bar(base_data2,x=base_data2.index,y="Yearly_Total_Employees")
fig.update_layout(title='Yearly Total Employee',legend=dict(x=0,y=0.5))
fig.show()

[Figure: bar chart "Yearly Total Employee" (y-axis: Yearly_Total_Employees)]

3. Visual analysis


In [22]:

base_data3=pd.read_excel('C:\\Users\\Admin\\Downloads\\Python\\[Link]')
base_data3.index=base_data3['Industry']
base_data3.head()

Out[22]:

              Industry        2009    2010    2011    2012    2013    2014    2015    2016
Industry
Agriculture   Agriculture    37700   38200   36100   36100   36800   42700   40700   43200
Production    Production    156700  149800  158600  154400  164200  173300  172300  162500
Construction  Construction   96600   93200   90000   91300   89300   97000   92600  102700
Retail        Retail        345400  344500  343100  347300  345100  337300  357700  360200
ICT           ICT            27800   27900   26400   27200   26900   35700   24000   34400

3.1 Create a dynamic scatter/bubble plot showing the change of workforce number over the period using
plotly Express.

To plot the scatter chart, the wide dataframe (one column per year) must first be converted into long
format, with one row per industry and year. The code below performs this conversion.

In [23]:

del base_data3['Industry']

In [24]:

base_data4=base_data3.T

In [25]:

frames = []
for col in base_data4.columns:
    if col!='Yearly_Total_Employees':
        # Build one long-format block per industry
        final_data=pd.DataFrame(columns=['Year','Workforce','Industry','Workforce_Change'])
        final_data['Workforce']=base_data4[col].tolist()
        final_data['Industry']=col
        final_data['Year']=base_data4[col].index
        # Year-on-year change; the first year has no predecessor, so it is filled with 0
        final_data['Workforce_Change']= final_data['Workforce'] - final_data['Workforce'].shift()
        final_data=final_data.fillna(0)
        frames.append(final_data)
# DataFrame.append was removed in pandas 2.0, so the blocks are concatenated instead
Final_df=pd.concat(frames)
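The same wide-to-long conversion can also be written without an explicit loop. The following is a sketch of one alternative, not the notebook's method, using `melt` and a grouped `diff`. The frame names `wide` and `long_df` are illustrative, and the data is a two-industry subset of the notebook output.

```python
import pandas as pd

# Two industries and five years from the notebook output (wide format).
wide = pd.DataFrame(
    {
        2009: [37700, 156700],
        2010: [38200, 149800],
        2011: [36100, 158600],
        2012: [36100, 154400],
        2013: [36800, 164200],
    },
    index=pd.Index(["Agriculture", "Production"], name="Industry"),
)

# melt turns each year column into rows: one row per (Industry, Year).
long_df = (
    wide.reset_index()
    .melt(id_vars="Industry", var_name="Year", value_name="Workforce")
    .astype({"Year": int})
    .sort_values(["Industry", "Year"])
    .reset_index(drop=True)
)

# Year-on-year change within each industry; the first year is filled with 0.
long_df["Workforce_Change"] = (
    long_df.groupby("Industry")["Workforce"].diff().fillna(0)
)

print(long_df)
```

For Agriculture this gives the same changes as the notebook output (0, 500, -2100, 0, 700).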

Final output of the Data Frame to plot scatter chart.


In [26]:

Final_df.head(100)

Out[26]:

    Year  Workforce       Industry  Workforce_Change
0   2009      37700    Agriculture               0.0
1   2010      38200    Agriculture             500.0
2   2011      36100    Agriculture           -2100.0
3   2012      36100    Agriculture               0.0
4   2013      36800    Agriculture             700.0
..   ...        ...            ...               ...
5   2014      73300  Other_Service           -2200.0
6   2015      77200  Other_Service            3900.0
7   2016      72400  Other_Service           -4800.0
8   2017      83200  Other_Service           10800.0
9   2018      81800  Other_Service           -1400.0

100 rows × 4 columns

The dynamic scatter plot below shows the year-on-year change in workforce over the period. The ICT industry
recorded the largest increase, in 2017 (about +24.5k), followed by retail in 2015 (+20.4k). ICT also
recorded the largest decrease, in 2018 (about -27.4k), followed by retail in 2017 (about -26.7k). It can
therefore be concluded that retail and ICT show the largest workforce changes over the period.


In [27]:

fig = px.scatter(Final_df, x="Year", y="Workforce_Change", color="Industry",
                 log_x=True, size_max=60)
fig.update_layout(title='Scatter plot of change in workforce')
fig.show()

[Figure: scatter plot "Scatter plot of change in workforce" (y-axis: Workforce_Change)]

4. PCA/Correlation
PCA (Principal Component Analysis) is a dimensionality reduction method that transforms the variables of a
dataset into a smaller set of components. The code below fits a PCA with two components.
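To make the reduction step concrete, here is a minimal sketch of what a two-component PCA computes, written with NumPy only: centre the columns, take the SVD, and project onto the first two axes. This is an illustration on a five-industry, five-year subset of the notebook data, not the notebook's own code. scikit-learn's `PCA` follows the same centring-plus-SVD approach, although the signs of the components may differ.

```python
import numpy as np

# Rows are industries, columns are the years 2009-2013 (values from the notebook output).
X = np.array(
    [
        [37700, 38200, 36100, 36100, 36800],       # Agriculture
        [156700, 149800, 158600, 154400, 164200],  # Production
        [96600, 93200, 90000, 91300, 89300],       # Construction
        [345400, 344500, 343100, 347300, 345100],  # Retail
        [27800, 27900, 26400, 27200, 26900],       # ICT
    ],
    dtype=float,
)

# 1. Centre each column (year) on its mean.
Xc = X - X.mean(axis=0)

# 2. SVD of the centred data: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Project onto the first two axes to get the 2-D scores per industry.
scores = Xc @ Vt[:2].T

# Share of the total variance captured by each component.
explained = S**2 / np.sum(S**2)

print(scores.round(2))
print(explained[:2].round(4))
```

Because the columns are not scaled, the first component is dominated by the overall size of each industry's workforce.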

In [28]:

from sklearn.decomposition import PCA

pca = PCA()
PCA_base=base_data3[[2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]]


In [29]:

PCA_base.head()

Out[29]:

                2009    2010    2011    2012    2013    2014    2015    2016    2017  2…
Industry
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  41…
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  165…
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  101…
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  347…
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  31…

In [30]:

pca.n_components = 2
X_reduced = pca.fit_transform(PCA_base)
df_X_reduced = pd.DataFrame(X_reduced,columns=['PC1','PC2'], index=PCA_base.index)
df_X_reduced=round(df_X_reduced,2)

In [31]:

df_X_reduced.head(10)

Out[31]:

                             PC1        PC2
Industry
Agriculture           -312091.23   -8151.22
Production              76819.90     912.47
Construction          -137358.52   -8786.80
Retail                 658534.20  -15872.69
ICT                   -335201.17   10284.00
Finance               -334432.79  -10293.86
Real_Estate           -376204.46   -7409.96
Professional_Service    58629.49   33479.53
Public_Adminstration   903354.93    2773.17
Other_Service         -202050.35    3065.36


In [32]:

corr = df_X_reduced.T.corr()
corr.style.background_gradient(cmap='coolwarm')

Out[32]:

Industry              Agriculture  Production  Construction  Retail  ICT  Finance  Real_Estate
Industry
Agriculture                     1          -1             1      -1    1        1
Production                     -1           1            -1       1   -1       -1  -…
Construction                    1          -1             1      -1    1        1
Retail                         -1           1            -1       1   -1       -1  -…
ICT                             1          -1             1      -1    1        1
Finance                         1          -1             1      -1    1        1
Real_Estate                     1          -1             1      -1    1        1
Professional_Service           -1           1            -1       1   -1       -1  -…
Public_Adminstration           -1           1            -1       1   -1       -1  -…
Other_Service                   1          -1             1      -1    1        1

Real estate, finance, agriculture, and ICT have large negative scores on principal component 1, which
places them among the industries with smaller workforces in Wales. Production, public administration,
retail, and professional service have positive scores on component 1, which places them among the
industries with larger workforces.
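The matrix in Out[32] contains only 1 and -1. A likely reason is that each industry contributes just two values to the correlation (its PC1 and PC2 scores), and Pearson's r between any two two-point series is always +1 or -1. A small sketch with scores taken from Out[31]:

```python
import numpy as np

# (PC1, PC2) scores for three industries, taken from the notebook output.
agriculture = [-312091.23, -8151.22]
production = [76819.90, 912.47]
ict = [-335201.17, 10284.00]

# With only two observations per series, the correlation just records whether
# both series move in the same direction from PC1 to PC2.
r_agri_prod = np.corrcoef(agriculture, production)[0, 1]
r_agri_ict = np.corrcoef(agriculture, ict)[0, 1]

print(round(float(r_agri_prod), 6))
print(round(float(r_agri_ict), 6))
```

The signs agree with the table: Agriculture and Production are at -1, Agriculture and ICT at +1. These values therefore say little about how similar the industries are.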


In [33]:

fig = px.scatter(df_X_reduced, x='PC1', y='PC2',color=df_X_reduced.index,
                 hover_name=df_X_reduced.index)
fig.update_layout(title='Principal Component Analysis Scatterplot')
fig.show()

[Figure: scatter plot "Principle Component Analysis Scatterplot" (y-axis: PC2)]

4.2 Correlation for each industry over years

The correlation matrix below shows the correlation between industries from 2009 to 2018. It can be observed
that the agriculture industry is highly correlated with the construction industry (0.727), and other services
is also positively correlated with professional service and public administration, whereas retail and ICT
show a moderate negative relationship (-0.552).
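The industry-by-industry correlations are obtained by transposing the table so that each industry becomes a column of yearly values. A sketch with four industries, using the 2009 to 2018 values shown in the notebook output (the frame name `wide` is illustrative):

```python
import pandas as pd

years = list(range(2009, 2019))

# Yearly employment for four industries, taken from the notebook output.
wide = pd.DataFrame(
    [
        [37700, 38200, 36100, 36100, 36800, 42700, 40700, 43200, 40200, 41100],
        [96600, 93200, 90000, 91300, 89300, 97000, 92600, 102700, 90800, 101800],
        [345400, 344500, 343100, 347300, 345100, 337300, 357700, 360200, 333500, 347600],
        [27800, 27900, 26400, 27200, 26900, 35700, 24000, 34400, 58900, 31500],
    ],
    index=["Agriculture", "Construction", "Retail", "ICT"],
    columns=years,
)

# DataFrame.corr works column-wise, so transpose first: industries become columns.
corr = wide.T.corr().round(3)

print(corr.loc["Agriculture", "Construction"])
print(corr.loc["Retail", "ICT"])
```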


In [34]:

PCA_base.head()

Out[34]:

                2009    2010    2011    2012    2013    2014    2015    2016    2017  2…
Industry
Agriculture    37700   38200   36100   36100   36800   42700   40700   43200   40200  41…
Production    156700  149800  158600  154400  164200  173300  172300  162500  165100  165…
Construction   96600   93200   90000   91300   89300   97000   92600  102700   90800  101…
Retail        345400  344500  343100  347300  345100  337300  357700  360200  333500  347…
ICT            27800   27900   26400   27200   26900   35700   24000   34400   58900  31…

In [35]:

import matplotlib.pyplot as plt

corr = round(PCA_base.T.corr(),3)
corr.style.background_gradient(cmap='coolwarm')

Out[35]:

Industry              Agriculture  Production  Construction  Retail     ICT  Finance  Real_Es…
Industry
Agriculture                     1       0.647         0.727   0.228   0.378   -0.005  0…
Production                  0.647           1         0.188   0.028   0.232    0.225  0…
Construction                0.727       0.188             1   0.414    0.01    0.309  0…
Retail                      0.228       0.028         0.414       1  -0.552   -0.253  0…
ICT                         0.378       0.232          0.01  -0.552       1    0.043  0…
Finance                    -0.005       0.225         0.309  -0.253   0.043        1  0…
Real_Estate                 0.668       0.604         0.598   0.232   0.154    0.316
Professional_Service        0.637        0.56         0.441   0.046   0.503    0.389  0…
Public_Adminstration        0.195       0.547          0.08  -0.258   0.122     0.59  0…
Other_Service               0.333       0.578        -0.031  -0.156   0.543    0.242  0…

5. Clustering (k means & hierarchical)

5.1 Using the employment data from the best and worst performing year columns (2.3), undertake a K-means
clustering analysis (K=2 & 3) and identify which industries cluster together.


The K-means clustering table is shown below. K_2 holds the cluster labels for K=2 and K_3 those for K=3.

In [36]:

cluster_base=base_data3[[2010,2018]]

In [37]:

cluster_base.head()

Out[37]:

                2010    2018
Industry
Agriculture    38200   41100
Production    149800  165700
Construction   93200  101800
Retail        344500  347600
ICT            27900   31500

In [38]:

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
cluster = KMeans(n_clusters=2)
predicted_2 = cluster.fit_predict(cluster_base)

cluster2 = KMeans(n_clusters=3)
predicted_3 = cluster2.fit_predict(cluster_base)
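K-means starts from random centres, so the numeric labels (and occasionally the grouping) can change between runs. The sketch below repeats the K=2 step on the 2010 and 2018 values from the notebook output. `random_state=0` and `n_init=10` are additions made here for reproducibility, not settings used in the notebook, and the `groups` dictionary is an illustrative way to read the result by membership rather than by label number.

```python
import numpy as np
from sklearn.cluster import KMeans

industries = [
    "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
    "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
]

# (2010, 2018) employment per industry, taken from the notebook output.
X = np.array(
    [
        [38200, 41100], [149800, 165700], [93200, 101800], [344500, 347600],
        [27900, 31500], [29800, 35500], [14600, 25200], [145800, 187100],
        [418600, 434900], [68000, 81800],
    ],
    dtype=float,
)

# Fixed seed and several restarts make the result repeatable.
km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels2 = km2.labels_

# Group industry names by cluster label; the label numbers themselves are arbitrary.
groups = {}
for name, label in zip(industries, labels2):
    groups.setdefault(int(label), []).append(name)

print(groups)
```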


In [39]:

cluster_base['K_2']=predicted_2+1
cluster_base['K_3']=predicted_3+1
cluster_base['K_2']=cluster_base['K_2'].astype(str)
cluster_base['K_3']=cluster_base['K_3'].astype(str)

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

(The same warning is raised for lines 2, 3 and 4 of the cell.)
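The SettingWithCopyWarning arises because `cluster_base` was created by selecting columns from `base_data3`, and new columns were then assigned to that selection. One common way to avoid it, sketched here on a small illustrative frame (`base` and `subset` are not names from the notebook), is to take an explicit copy before adding columns:

```python
import pandas as pd

# Small stand-in for base_data3 (values from the notebook output).
base = pd.DataFrame(
    {2010: [38200, 149800], 2011: [36100, 158600], 2018: [41100, 165700]},
    index=["Agriculture", "Production"],
)

# .copy() makes an independent frame, so adding columns does not touch `base`
# and pandas has no ambiguity about views versus copies.
subset = base[[2010, 2018]].copy()
subset["K_2"] = ["1", "1"]

print(list(base.columns))
print(list(subset.columns))
```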


In [40]:

cluster_base.head(10)

Out[40]:

                        2010    2018  K_2  K_3
Industry
Agriculture            38200   41100    1    2
Production            149800  165700    1    1
Construction           93200  101800    1    2
Retail                344500  347600    2    3
ICT                    27900   31500    1    2
Finance                29800   35500    1    2
Real_Estate            14600   25200    1    2
Professional_Service  145800  187100    1    1
Public_Adminstration  418600  434900    2    3
Other_Service          68000   81800    1    2

Scatter plot of K-means clustering with K=2. In the table above, eight industries fall into one cluster,
while the other cluster contains only two industries, Retail and Public_Adminstration, which have the
largest workforces.


In [41]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="K_2",hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of K means clustering with K=2')
fig.show()

[Figure: scatter plot "Scatter plot of K means clustering with K=2" (y-axis: 2018)]

Scatter plot of K-means clustering with K=3. With K=3, six industries fall into one cluster, while the
other two clusters contain two industries each: Production and Professional_Service in one, Retail and
Public_Adminstration in the other.


In [42]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="K_3",hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of K means clustering with K=3')
fig.show()

[Figure: scatter plot "Scatter plot of K means clustering with K=3" (y-axis: 2018)]

5.2 Hierarchical cluster

A dendrogram is used to determine the appropriate number of clusters in hierarchical clustering, and it is
the main output of the method. In the dendrogram produced by the code below, the leaves on the horizontal
axis are the industries, and the vertical axis shows the distance at which clusters are merged (Ward linkage
on Euclidean distances). The number of clusters is found by drawing a horizontal line across the tree and
counting the vertical lines it crosses. Using this approach we have identified 6 clusters.
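Cutting the tree can also be done in code rather than by eye. The sketch below builds the Ward linkage on the 2010 and 2018 values from the notebook output and asks SciPy's `fcluster` for at most six flat clusters. It is an illustration of the cutting step, not the notebook's own procedure, which uses `AgglomerativeClustering`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

industries = [
    "Agriculture", "Production", "Construction", "Retail", "ICT", "Finance",
    "Real_Estate", "Professional_Service", "Public_Adminstration", "Other_Service",
]

# (2010, 2018) employment per industry, taken from the notebook output.
X = np.array(
    [
        [38200, 41100], [149800, 165700], [93200, 101800], [344500, 347600],
        [27900, 31500], [29800, 35500], [14600, 25200], [145800, 187100],
        [418600, 434900], [68000, 81800],
    ],
    dtype=float,
)

# Each row of Z records one merge: the two clusters joined, the merge
# distance (the height in the dendrogram) and the size of the new cluster.
Z = linkage(X, method="ward")

# Cut the tree so that at most six flat clusters remain.
labels = fcluster(Z, t=6, criterion="maxclust")

print(dict(zip(industries, labels.tolist())))
```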


In [43]:

import scipy.cluster.hierarchy as sch

# Build the linkage matrix (the hierarchical clustering itself) with Ward's method
# on the 2010 and 2018 columns, then draw the dendrogram from it.
dendrogram = sch.dendrogram(sch.linkage(base_data3[[2010,2018]], method = "ward"))
plt.title('Dendrogram')
plt.xlabel('Industries')
plt.ylabel('Euclidean distances')
plt.show()

In [ ]:

In [44]:

from sklearn.cluster import AgglomerativeClustering

# Ward linkage always uses Euclidean distances, so no separate distance argument is needed.
hc = AgglomerativeClustering(n_clusters = 6, linkage ='ward')
# Fit the hierarchical clustering to the 2010 and 2018 columns and get,
# for each industry, the cluster it belongs to.
y_hc=hc.fit_predict(base_data3[[2010,2018]])


In [45]:

cluster_base['Hierarchical_clustering']=y_hc
cluster_base['Hierarchical_clustering']=cluster_base['Hierarchical_clustering']+1
cluster_base['Hierarchical_clustering']=cluster_base['Hierarchical_clustering'].astype(str)

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

(The same warning is raised for lines 2 and 3 of the cell.)

The scatter plot below is coloured by the six hierarchical clusters.


In [46]:

fig = px.scatter(cluster_base, x=2010, y=2018, color="Hierarchical_clustering",
                 hover_name=cluster_base.index)
fig.update_layout(title='Scatter plot of Hierarchical clustering with K=6')
fig.show()

[Figure: scatter plot "Scatter plot of Hierarchical clustering with K=6" (y-axis: 2018)]

K-means forms clusters for a predetermined number of clusters. Here we identified industry clusters for
the best and worst performing years of employment with k = 2 and k = 3. Hierarchical clustering, as the
name suggests, builds a hierarchy of clusters, and here it resulted in k = 6 industry clusters for the best
and worst performing years of employment.

6. Discussion
Provide a brief discussion (~ 300 words) on employment landscape of Wales based on the employment data
analysis results.


From the report it can be observed that employment in Wales is highest in public administration services,
followed by retail, production, and professional services, while the smallest workforce is in real estate,
although real estate shows the highest percentage growth in employment from 2008 to 2019. Retail has the
second-largest workforce but the lowest percentage growth over the period. Looking at total Welsh
employment by year, 2018 shows the highest employment, with real estate growing by 38% while ICT shows
negative growth. In 2010, Wales shows the lowest total workforce, with an average negative (-2%) growth.
From the correlation matrix, it can be observed that the agriculture industry is highly correlated with the
construction industry, and other services is also positively correlated with professional service and public
administration, whereas retail and ICT show a moderate negative relationship (-0.552).
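The year-on-year figures quoted for 2018 can be checked with `pct_change`, which compares each value with the one before it. A sketch using the 2016 to 2018 values for real estate and ICT from the notebook output (the frame name `recent` is illustrative):

```python
import pandas as pd

# Employment in 2016-2018 for two industries, taken from the notebook output.
recent = pd.DataFrame(
    {"Real_Estate": [22700, 18200, 25200], "ICT": [34400, 58900, 31500]},
    index=[2016, 2017, 2018],
)

# pct_change gives the fractional change from the previous row (year);
# the first row has no predecessor and is therefore NaN.
yoy = (recent.pct_change() * 100).round(2)

print(yoy.loc[2018, "Real_Estate"])  # 38.46
print(yoy.loc[2018, "ICT"])          # -46.52
```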

In [ ]:

Eicher Motors Limited - Comprehensive Company Report
5 pages
Rice, Richardson, Clark 2012 - Perfeccionismo, Procrastinación y Trastornos Psicologicos PDF
No ratings yet
Rice, Richardson, Clark 2012 - Perfeccionismo, Procrastinación y Trastornos Psicologicos PDF
15 pages
Talking With Your Angels
100% (6)
Talking With Your Angels
39 pages
Development and Evaluation of Portable G
100% (1)
Development and Evaluation of Portable G
4 pages
NDE Level III Service Contract
No ratings yet
NDE Level III Service Contract
2 pages
Tenable OT Security-User Guide
No ratings yet
Tenable OT Security-User Guide
383 pages
Burien Park & Recreation Plan 2011-2025
No ratings yet
Burien Park & Recreation Plan 2011-2025
443 pages
UPSC Mains: Buddhism & Nalanda
No ratings yet
UPSC Mains: Buddhism & Nalanda
25 pages
Experiment Instruction of Proximate Analysis
No ratings yet
Experiment Instruction of Proximate Analysis
7 pages
Ujjayi Pranayama: Benefits and Insights
No ratings yet
Ujjayi Pranayama: Benefits and Insights
5 pages
Factors Influencing Slow Learners' Personality
No ratings yet
Factors Influencing Slow Learners' Personality
4 pages
Psce Conference
No ratings yet
Psce Conference
96 pages
Panasonic CSR and Green Strategy Analysis
No ratings yet
Panasonic CSR and Green Strategy Analysis
9 pages
King
No ratings yet
King
5 pages
English 11 - Unit 1 - Day 1 - Vocabulary Activities
No ratings yet
English 11 - Unit 1 - Day 1 - Vocabulary Activities
2 pages
Getting Start Guide - New Product Development Project Plan - Clickup PDF
No ratings yet
Getting Start Guide - New Product Development Project Plan - Clickup PDF
8 pages
SunshineTT 6
No ratings yet
SunshineTT 6
7 pages
DRV Ind Vol 66 1 Minarik PDF
No ratings yet
DRV Ind Vol 66 1 Minarik PDF
6 pages
Prayer Book
No ratings yet
Prayer Book
74 pages
MITPE - Learning Facilitator - Role Description
No ratings yet
MITPE - Learning Facilitator - Role Description
3 pages
Family Law Post-Judgment Guide
100% (1)
Family Law Post-Judgment Guide
28 pages