

Assignment4 (Score: 3.0 / 3.0)


1. Test cell (Score: 1.0 / 1.0)
2. Test cell (Score: 1.0 / 1.0)
3. Test cell (Score: 1.0 / 1.0)

Assignment 4
In [1]: import networkx as nx
import pandas as pd
import numpy as np
import pickle

Part 1 - Random Graph Identification


For the first part of this assignment you will analyze randomly generated graphs and determine which
algorithm created them.

In [2]: G1 = nx.read_gpickle("assets/A4_P1_G1")
G2 = nx.read_gpickle("assets/A4_P1_G2")
G3 = nx.read_gpickle("assets/A4_P1_G3")
G4 = nx.read_gpickle("assets/A4_P1_G4")
G5 = nx.read_gpickle("assets/A4_P1_G5")
P1_Graphs = [G1, G2, G3, G4, G5]

P1_Graphs is a list containing 5 networkx graphs. Each of these graphs was generated by one of three possible algorithms:

Preferential Attachment ( 'PA' )
Small World with low probability of rewiring ( 'SW_L' )
Small World with high probability of rewiring ( 'SW_H' )

Analyze each of the 5 graphs using any methodology and determine which of the three algorithms generated each graph.

The graph_identification function should return a list of length 5 where each element in the list is either 'PA' , 'SW_L' , or 'SW_H' .
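
For reference, NetworkX can generate all three families directly. The snippet below is only an illustration of what each generator produces; the sizes and parameters are assumptions, not the settings used to create P1_Graphs.

import networkx as nx

n = 1000  # illustrative size only

# Preferential Attachment ('PA'): new nodes attach preferentially to high-degree
# nodes, producing a heavy-tailed degree distribution with a few hubs.
pa_example = nx.barabasi_albert_graph(n, m=4, seed=0)

# Small World ('SW_L' / 'SW_H'): a ring lattice whose edges are rewired with
# probability p; low p preserves high clustering, high p destroys it.
sw_low_example = nx.connected_watts_strogatz_graph(n, k=6, p=0.05, seed=0)   # 'SW_L'
sw_high_example = nx.connected_watts_strogatz_graph(n, k=6, p=0.5, seed=0)   # 'SW_H'

for name, g in [('PA', pa_example), ('SW_L', sw_low_example), ('SW_H', sw_high_example)]:
    print(name,
          round(nx.average_clustering(g), 3),
          round(nx.average_shortest_path_length(g), 3))

Comparing these reference values against the same metrics computed on G1-G5 is one way to decide which label to assign to each graph.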


In [3]: Student's answer (Top)

def graph_identification():
    # Classify each graph by its clustering, path-length, and degree statistics:
    # PA (Preferential Attachment): heavy-tailed (power-law) degree distribution with a few hubs
    # SW_L (Small World, low rewiring): high clustering, relatively low average shortest path
    # SW_H (Small World, high rewiring): low clustering, low average shortest path

    results = []

    for g in P1_Graphs:
        # Calculate key metrics
        avg_clustering = nx.average_clustering(g)
        try:
            avg_path_length = nx.average_shortest_path_length(g)
        except nx.NetworkXError:
            # If the graph is disconnected, use the largest component
            largest_cc = max(nx.connected_components(g), key=len)
            subgraph = g.subgraph(largest_cc)
            avg_path_length = nx.average_shortest_path_length(subgraph)

        # Degree distribution analysis
        degrees = [d for n, d in g.degree()]
        avg_degree = np.mean(degrees)
        std_degree = np.std(degrees)
        max_degree = np.max(degrees)

        # Classification logic:
        # PA: very high degree spread, a few nodes with very high degree
        # SW_L: high clustering coefficient
        # SW_H: lower clustering, low path length
        degree_variance = std_degree / avg_degree if avg_degree > 0 else 0

        # PA has a very high (power-law) spread of degrees
        if degree_variance > 1.5 or max_degree > 3 * avg_degree:
            results.append('PA')
        # SW_L has a high clustering coefficient
        elif avg_clustering > 0.4:
            results.append('SW_L')
        # SW_H has lower clustering
        else:
            results.append('SW_H')

    return results
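
To see how the thresholds above separate the supplied graphs, the same metrics can be printed for each graph; this is only a quick sanity check, and the resulting values are not reproduced here.

for i, g in enumerate(P1_Graphs):
    degrees = [d for n, d in g.degree()]
    print(i,
          round(nx.average_clustering(g), 3),            # clustering coefficient
          round(np.std(degrees) / np.mean(degrees), 2),  # normalized degree spread
          max(degrees))                                  # largest hub degree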


In [4]: Grade cell: cell-efb9da7e1c19accf Score: 1.0 / 1.0 (Top)

ans_one = graph_identification()
assert type(ans_one) == list, "You must return a list"

Part 2 - Company Emails


For the second part of this assignment you will be working with a company's email network, where each node corresponds to a person at the company and each edge indicates that at least one email has been sent between those two people.

The network also contains the node attributes Department and ManagementSalary .

Department indicates the department of the company to which the person belongs, and ManagementSalary indicates whether that person is receiving a management position salary.

In [5]: G = pickle.load(open('assets/email_prediction_NEW.txt', 'rb'))

print(f"Graph with {len(nx.nodes(G))} nodes and {len(nx.edges(G))}


edges")

Graph with 1005 nodes and 16706 edges


Part 2A - Salary Prediction


Using network G , identify the people in the network with missing values for the node attribute ManagementSalary and predict whether or not these individuals are receiving a management position salary.

To accomplish this, you will need to create a matrix of node features of your choice using networkx, train a sklearn classifier on nodes that have ManagementSalary data, and predict a probability of receiving a management salary for nodes where ManagementSalary is missing.

Your predictions will need to be given as the probability that the corresponding employee is receiving a management position salary.

The evaluation metric for this assignment is the Area Under the ROC Curve (AUC).

Your grade will be based on the AUC score computed for your classifier. A model with an AUC of 0.75 or higher will receive full points.

Using your trained classifier, return a Pandas series of length 252 with the data being the probability of receiving a management salary, and the index being the node id.

Example:

1 1.0
2 0.0
5 0.8
8 1.0
...
996 0.7
1000 0.5
1001 0.0
Length: 252, dtype: float64

In [6]: list(G.nodes(data=True))[:5] # print the first 5 nodes

Out[6]: [(0, {'Department': 1, 'ManagementSalary': 0.0}),
         (1, {'Department': 1, 'ManagementSalary': nan}),
         (581, {'Department': 3, 'ManagementSalary': 0.0}),
         (6, {'Department': 25, 'ManagementSalary': 1.0}),
         (65, {'Department': 4, 'ManagementSalary': nan})]
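
As a quick check before building features (a minimal sketch, assuming G has been loaded as above), one can count the nodes whose ManagementSalary attribute is missing; per the assignment this should match the length of the series to be returned, 252.

import pandas as pd

# Nodes with a missing ManagementSalary label are the ones to predict.
missing_nodes = [n for n, data in G.nodes(data=True)
                 if pd.isnull(data['ManagementSalary'])]
print(len(missing_nodes))  # expected to be 252 per the assignment spec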


In [7]: Student's answer (Top)

from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import networkx as nx

def salary_predictions():
    def is_management(node):
        managementSalary = node[1]['ManagementSalary']
        if managementSalary == 0:
            return 0
        elif managementSalary == 1:
            return 1
        else:
            return None

    # Node-level features computed with networkx
    df = pd.DataFrame(index=G.nodes())
    df['clustering'] = pd.Series(nx.clustering(G))
    df['degree'] = [val for node, val in G.degree()]
    df['degree_centrality'] = [val for node, val in nx.degree_centrality(G).items()]
    df['closeness'] = [val for node, val in nx.closeness_centrality(G).items()]
    df['betweenness'] = [val for node, val in nx.betweenness_centrality(G).items()]
    df['pr'] = [val for node, val in nx.pagerank(G).items()]
    df['is_management'] = pd.Series([is_management(node) for node in G.nodes(data=True)],
                                    index=df.index)

    # Train on labelled nodes, predict on nodes with a missing label
    df_train = df[~pd.isnull(df['is_management'])]
    df_test = df[pd.isnull(df['is_management'])]
    features = ['clustering', 'degree', 'degree_centrality',
                'closeness', 'betweenness', 'pr']
    X_train = df_train[features]
    Y_train = df_train['is_management']
    X_test = df_test[features]
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    clf = MLPClassifier(hidden_layer_sizes=[10, 5], alpha=5,
                        random_state=0, solver='lbfgs', verbose=0)
    clf.fit(X_train_scaled, Y_train)
    test_proba = clf.predict_proba(X_test_scaled)[:, 1]
    return pd.Series(test_proba, X_test.index)
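
Because the grade depends on the AUC, it can be worth estimating it locally before submitting. The helper below is a minimal sketch, not part of the assignment: estimate_salary_auc is a hypothetical name, the hold-out fraction is an arbitrary choice, and it assumes the same features and classifier settings as salary_predictions above.

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

def estimate_salary_auc(X, y):
    # Hold out a quarter of the labelled nodes and score the model on it.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
    scaler = MinMaxScaler()
    X_tr_scaled = scaler.fit_transform(X_tr)
    X_val_scaled = scaler.transform(X_val)
    clf = MLPClassifier(hidden_layer_sizes=[10, 5], alpha=5,
                        random_state=0, solver='lbfgs')
    clf.fit(X_tr_scaled, y_tr)
    return roc_auc_score(y_val, clf.predict_proba(X_val_scaled)[:, 1])

# e.g. estimate_salary_auc(df_train[features], df_train['is_management'].astype(int)),
# assuming the feature-building steps above are run at the top level of a cell.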

In [8]: Grade cell: cell-bc9c23e7517908ab Score: 1.0 / 1.0 (Top)

ans_salary_preds = salary_predictions()
assert type(ans_salary_preds) == pd.core.series.Series, "You must return a Pandas series"
assert len(ans_salary_preds) == 252, "The series must be of length 252"


Part 2B - New Connections Prediction


For the last part of this assignment, you will predict future connections between employees of the network.
The future connections information has been loaded into the variable future_connections . The index is
a tuple indicating a pair of nodes that currently do not have a connection, and the Future Connection
column indicates if an edge between those two nodes will exist in the future, where a value of 1.0 indicates
a future connection.

In [9]: future_connections = pd.read_csv('assets/Future_Connections.csv', index_col=0, converters={0: eval})
future_connections.head(10)

Out[9]:
             Future Connection
(6, 840)                   0.0
(4, 197)                   0.0
(620, 979)                 0.0
(519, 872)                 0.0
(382, 423)                 0.0
(97, 226)                  1.0
(349, 905)                 0.0
(429, 860)                 0.0
(309, 989)                 0.0
(468, 880)                 0.0


Using network G and future_connections , identify the edges in future_connections with missing
values and predict whether or not these edges will have a future connection.

To accomplish this, you will need to:

1. Create a matrix of features of your choice for the edges found in future_connections using
Networkx
2. Train a sklearn classifier on those edges in future_connections that have Future Connection
data
3. Predict a probability of the edge being a future connection for those edges in future_connections
where Future Connection is missing.

Your predictions will need to be given as the probability of the corresponding edge being a future
connection.

The evaluation metric for this assignment is the Area Under the ROC Curve (AUC).

Your grade will be based on the AUC score computed for your classifier. A model with an AUC of 0.75 or higher will receive full points.

Using your trained classifier, return a series of length 122112 with the data being the probability of the
edge being a future connection, and the index being the edge as represented by a tuple of nodes.

Example:

(107, 348) 0.35
(542, 751) 0.40
(20, 426) 0.55
(50, 989) 0.35
...
(939, 940) 0.15
(555, 905) 0.35
(75, 101) 0.65
Length: 122112, dtype: float64
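
NetworkX ships several classic link-prediction scores that can serve as edge features for this task. The snippet below is a sketch of computing a few of them for the node pairs in future_connections; which scores to include is a modelling choice, not something the assignment prescribes.

# The candidate node pairs are the index of future_connections.
pairs = list(future_connections.index)

# Each networkx helper yields (u, v, score) triples for the requested pairs.
future_connections['jaccard'] = [score for u, v, score in nx.jaccard_coefficient(G, pairs)]
future_connections['resource_alloc'] = [score for u, v, score in nx.resource_allocation_index(G, pairs)]
future_connections['pref_attach'] = [score for u, v, score in nx.preferential_attachment(G, pairs)]
future_connections['common_nbrs'] = [len(list(nx.common_neighbors(G, u, v))) for u, v in pairs]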


In [10]: Student's answer (Top)

def new_connections_predictions():

    from sklearn.ensemble import GradientBoostingClassifier

    # Edge features: preferential attachment score and number of common neighbours
    future_connections['pref_attachment'] = [list(nx.preferential_attachment(G, [node_pair]))[0][2]
                                             for node_pair in future_connections.index]
    future_connections['comm_neighbors'] = [len(list(nx.common_neighbors(G, node_pair[0], node_pair[1])))
                                            for node_pair in future_connections.index]

    # Train on labelled pairs, predict on pairs where 'Future Connection' is missing
    train_data = future_connections[~future_connections['Future Connection'].isnull()]
    test_data = future_connections[future_connections['Future Connection'].isnull()]
    clf = GradientBoostingClassifier()
    clf.fit(train_data[['pref_attachment', 'comm_neighbors']].values,
            train_data['Future Connection'].values)
    preds = clf.predict_proba(test_data[['pref_attachment', 'comm_neighbors']].values)[:, 1]
    return pd.Series(preds, index=test_data.index)

new_connections_predictions()

Out[10]: (107, 348) 0.031823
(542, 751) 0.012931
(20, 426) 0.543026
(50, 989) 0.013104
(942, 986) 0.013103
...
(165, 923) 0.013183
(673, 755) 0.013103
(939, 940) 0.013103
(555, 905) 0.012931
(75, 101) 0.017730
Length: 122112, dtype: float64

In [11]: Grade cell: cell-979b4a17d794f3d0 Score: 1.0 / 1.0 (Top)

ans_prob_preds = new_connections_predictions()
assert type(ans_prob_preds) == pd.core.series.Series, "You must return a Pandas series"
assert len(ans_prob_preds) == 122112, "The series must be of length 122112"

In [ ]:

This assignment was graded by mooc_adswpy:9154b96e4479, v1.37.030923

