NetMiner Module Reference

Copyright and Trademarks

NetMiner Module Reference


Version 4.4

Copyright 2000-2018 by Cyram Inc.

Companies, names and data used in examples herein are fictitious unless otherwise noted.

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Cyram Inc.

The information in this publication is provided for information only and is subject to change without notice. Cyram Inc. assumes no responsibility or liability for any loss or damage that may arise from the use of any information in this publication. The software described in this book is furnished under license and may only be used or copied in accordance with the terms of that license.

NetMiner is a registered trademark of Cyram Inc. CYRAM and the CYRAM logo are registered trademarks of Cyram Inc.

Cyram Inc.

#904, U-Space 2B, 670, Daewangpangyo-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13494, South Korea

Tel: +82-31-739-8352

Fax: +82-31-739-8354

Electronic access: [Link]

How to reference NetMiner 4


To cite NetMiner 4, please use the following reference, which is also found in the NetMiner menu under “Help >> About NetMiner 4”.

Cyram (2018). NetMiner 4.4. Seoul: Cyram Inc.

Technical Support

The NetMiner Help file will guide you in using NetMiner 4. Press the F1 function key or use the NetMiner menu bar to open Help.

Cyram is committed to providing a reliable, high-quality, yet easy-to-use product. If you have any problems installing or using NetMiner, please contact us by mail, phone, or fax, or preferably by email or the World Wide Web.

Cyram Inc.

#904, U-Space 2B, 670, Daewangpangyo-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13494, South Korea

Tel: +82-31-739-8352

Fax: +82-31-739-8354

Email: netminer@[Link] Web: [Link]

When reporting a problem with NetMiner, please include the following information:

● your name and e-mail address
● the version number of the NetMiner program
● your hardware and software configuration (e.g., OS, JRE version, RAM, CPU)
● a description of the steps to reproduce the problem


Menu Categories
I. Transform

II. Analyze

III. Statistics

IV. Mining

V. Visualize

VI. Chart

Modules List
Menu Categories
Modules List
I. Transform
Transform >> Direction >> Symmetrize
Transform >> Direction >> Transpose
Transform >> Value >> Dichotomize
Transform >> Value >> Reverse
Transform >> Value >> Normalize
Transform >> Value >> Recode
Transform >> Value >> Missing
Transform >> Value >> Diagonal
Transform >> NodeSet >> Ego Network
Transform >> NodeSet >> Reorder
Transform >> LinkSet >> Incidence
Transform >> LinkSet >> Line Graph
Transform >> LinkSet >> Link Reduction
Transform >> LinkSet >> Link Reduction Simulation
Transform >> Matrix >> Vectorize >> 1-mode Network
Transform >> Matrix >> Vectorize >> 2-mode Network
Transform >> Layer >> Split
Transform >> Layer >> Merge
Transform >> Layer >> Multiplex
Transform >> Mode >> 2-mode Network
Transform >> Mode >> 1-mode Network
Transform >> Mode >> Main Node Attribute
Transform >> Mode >> Tree Construction
Transform >> Random >> 1-mode Network >> Erdos-Renyi
Transform >> Random >> 1-mode Network >> Scale-Free
Transform >> Random >> 1-mode Network >> QAP Permutation
Transform >> Random >> 1-mode Network >> MCMC
II. Analyze
Analyze >> Neighbor >> Degree
Analyze >> Neighbor >> Ego Network
Analyze >> Neighbor >> Structural Hole
Analyze >> Neighbor >> Homophily
Analyze >> Neighbor >> Assortativity
Analyze >> Neighbor >> Equicentrality
Analyze >> Subgraph >> Dyad Census
Analyze >> Subgraph >> Triad Census
Analyze >> Subgraph >> Triad Combination
Analyze >> Subgraph >> Motif Search
Analyze >> Connection >> Shortest Path
Analyze >> Connection >> All Path Finding
Analyze >> Connection >> All Cycle Finding
Analyze >> Connection >> Dependency
Analyze >> Connection >> Node Connectivity
Analyze >> Connection >> Link Connectivity
Analyze >> Connection >> Minimum Cutset
Analyze >> Connection >> Maximum Flow
Analyze >> Connection >> Topological Sort
Analyze >> Connection >> PFnet
Analyze >> Connection >> Influence
Analyze >> Connection >> Accessibility
Analyze >> Cohesion >> Component
Analyze >> Cohesion >> Bi-Component
Analyze >> Diffusion >> Influence Network >> Effects
Analyze >> Diffusion >> Influence Network >> Sequence
Analyze >> Diffusion >> Linear Threshold >> Process
Analyze >> Diffusion >> Linear Threshold >> Target
Analyze >> Cohesion >> Clique
Analyze >> Cohesion >> Generalized Clique
Analyze >> Cohesion >> n-Clique
Analyze >> Cohesion >> n-Clan
Analyze >> Cohesion >> k-Plex
Analyze >> Cohesion >> k-Core
Analyze >> Cohesion >> Lambda Set
Analyze >> Cohesion >> Community (Betweenness)
Analyze >> Cohesion >> Community (Modularity)
Analyze >> Cohesion >> Community (Eigenvector)
Analyze >> Cohesion >> Community (Label Propagation)
Analyze >> Cohesion >> Community (Blondel)
Analyze >> Cohesion >> Cohesive Block
Analyze >> Cohesion >> s-Clique
Analyze >> Centrality >> Degree
Analyze >> Centrality >> Coreness
Analyze >> Centrality >> Closeness
Analyze >> Centrality >> Decay
Analyze >> Centrality >> Percolation
Analyze >> Centrality >> Betweenness >> Node
Analyze >> Centrality >> Betweenness >> Link
Analyze >> Centrality >> Flow Betweenness
Analyze >> Centrality >> R.W. Betweenness
Analyze >> Centrality >> Information
Analyze >> Centrality >> Load
Analyze >> Centrality >> Eigenvector
Analyze >> Centrality >> Status
Analyze >> Centrality >> Power
Analyze >> Centrality >> Effects
Analyze >> Centrality >> PageRank
Analyze >> Centrality >> Generalized PageRank
Analyze >> Centrality >> HITS
Analyze >> Centrality >> Community
Analyze >> Equivalence >> Structural >> Profile
Analyze >> Equivalence >> Structural >> CONCOR
Analyze >> Equivalence >> Regular >> REGGE
Analyze >> Equivalence >> Regular >> CatRE
Analyze >> Equivalence >> Role >> Triad
Analyze >> Equivalence >> Role >> Local
Analyze >> Equivalence >> SimRank
Analyze >> Position >> Blockmodel (Conventional)
Analyze >> Position >> Brokerage
Analyze >> Position >> Bow-Tie Model
Analyze >> Position >> Expand/Collapse
Analyze >> Properties >> Network >> Multiple
Analyze >> Properties >> Network >> Modularity
Analyze >> Properties >> Group
Analyze >> Models >> Dyadic Interaction (p1)
Analyze >> Models >> ERGM (p*)
Analyze >> Models >> Blockmodel (Generalized)
Analyze >> Two Mode >> Degree
Analyze >> Two Mode >> Eigenvector Centrality
Analyze >> Two Mode >> Max. Matching
III. Statistics
Statistics >> MDS
Statistics >> Correspondence
Statistics >> Decomposition >> Eigenvector
Statistics >> Decomposition >> Singular
Statistics >> Decomposition >> Spectral
Statistics >> Covariance Matrix
Statistics >> Principal Component
Statistics >> Factor Analysis
Statistics >> Frequency >> Vector
Statistics >> Frequency >> Matrix
Statistics >> Gini Coefficient >> Vector
Statistics >> Gini Coefficient >> Matrix
Statistics >> Power Law >> Vector
Statistics >> Power Law >> Matrix
Statistics >> Descriptives >> Vector
Statistics >> Descriptives >> Matrix
Statistics >> Crosstabs >> Vector
Statistics >> Crosstabs >> Matrix
Statistics >> ANOVA >> Vector
Statistics >> ANOVA >> Matrix
Statistics >> Correlation >> Vector
Statistics >> Correlation >> Matrix
Statistics >> Autocorrelation >> Join-Count
Statistics >> Autocorrelation >> Continuous
Statistics >> Regression >> Vector
Statistics >> Regression >> Matrix
Statistics >> Logistic Regression >> Vector
Statistics >> Logistic Regression >> Matrix
IV. Mining
Mining >> Frequent Subgraph >> GREW >> Undirected Graphs
Mining >> Frequent Subgraph >> GREW >> Directed Graphs
Mining >> Frequent Subgraph >> gSpan >> Multiple Graphs
Mining >> Frequent Subgraph >> gSpan >> Partitioning
Getting Started with Solving Classification Problems using NetMiner
Mining >> Classification >> k-Nearest Neighbor (KNN) >> Matrix
Mining >> Classification >> k-Nearest Neighbor (KNN) >> Vector
Mining >> Classification >> CART
Mining >> Classification >> Naive Bayes
Mining >> Classification >> Discriminant Analysis
Mining >> Classification >> Support Vector Machines (SVMs)
Mining >> Classification >> Multilayer Perceptron
Mining >> Regression >> Classification and Regression Tree (CART)
Mining >> Collaborative Filtering >> Singular Value Decomposition (SVD)
Mining >> Collaborative Filtering >> Singular Value Decomposition++ (SVD++)
Mining >> Collaborative Filtering >> Social Singular Value Decomposition++ (SSVD++)
Mining >> Collaborative Filtering >> Implicit Singular Value Decomposition (ISVD)
Mining >> Collaborative Filtering >> User Based
Mining >> Reduction >> Non-Negative Matrix Factorization (NNMF)
Mining >> Clustering (Common)
Mining >> Clustering >> Hierarchical >> Matrix
Mining >> Clustering >> Hierarchical >> Vector
Mining >> Clustering >> K-means
Mining >> Clustering >> Gaussian Mixture Model (GMM)
Mining >> Clustering >> Partitioning Around Medoids (PAM) >> Matrix
Mining >> Clustering >> Partitioning Around Medoids (PAM) >> Vector
Mining >> Anomaly Detection >> Probability Distribution >> Independent Normal
Mining >> Anomaly Detection >> Probability Distribution >> Multivariate Normal
Mining >> Anomaly Detection >> Local Outlier Factor >> Matrix
Mining >> Anomaly Detection >> Local Outlier Factor >> Vector
Mining >> Anomaly Detection >> Attribute Value Frequency (AVF)
Mining >> Text >> Topic >> Latent Dirichlet Allocation (LDA)
V. Visualize
Visualize >> Layout >> 2D
Visualize >> Layout >> 3D
Visualize >> Drawing >> 2D
Visualize >> Drawing >> 3D
Visualize >> Spring >> 2D
Visualize >> Spring >> 2D >> Kamada & Kawai
Visualize >> Spring >> 2D >> Stress Majorization
Visualize >> Spring >> 2D >> Eades
Visualize >> Spring >> 2D >> Fruchterman & Reingold
Visualize >> Spring >> 2D >> GEM
Visualize >> Spring >> 2D >> HDE
Visualize >> Spring >> 3D
Visualize >> Spring >> 3D >> Kamada & Kawai
Visualize >> Spring >> 3D >> Eades
Visualize >> MDS >> 2D
Visualize >> MDS >> 3D
Visualize >> Clustered >> 2D
Visualize >> Clustered >> 2D >> Clustered-CoLa
Visualize >> Clustered >> 2D >> Clustered Eades
Visualize >> Clustered >> 3D >> Clustered Eades
Visualize >> Layered >> 2D >> Dig-CoLa
Visualize >> Circular >> 2D >> Circumference
Visualize >> Circular >> 2D >> Concentric
Visualize >> Circular >> 2D >> Radial
Visualize >> Simple >> 2D >> Fixed
Visualize >> Simple >> 2D >> Random
Visualize >> Two Mode >> Spring
Visualize >> Link Layout >> Edge Bundling >> Divided Edge Bundling
VI. Chart
Chart >> Pie Chart
Chart >> Matrix Diagram
Chart >> Area Bar
Chart >> Box Plot
Chart >> Scatter Plot
Chart >> Contour Plot
Chart >> Surface Plot
Chart >> Network Contour Plot
Chart >> Network Surface Plot
I. Transform

1. Direction
◆ Symmetrize
◆ Transpose

2. Value
◆ Dichotomize
◆ Reverse
◆ Normalize
◆ Recode
◆ Missing
◆ Diagonal

3. NodeSet
◆ Ego Network
◆ Reorder

4. LinkSet
◆ Incidence
◆ Line Graph
◆ Link Reduction
◆ Link Reduction Simulation

5. Matrix
◆ Vectorize >> 1-mode Network
◆ Vectorize >> 2-mode Network

6. Layer
◆ Split
◆ Merge
◆ Multiplex

7. Mode
◆ 2-mode Network
◆ 1-mode Network
◆ Main Node Attribute
◆ Tree Construction

8. Random
◆ 1-mode Network >> Erdos-Renyi
◆ 1-mode Network >> Scale-Free
◆ 1-mode Network >> QAP Permutation
◆ 1-mode Network >> MCMC

Transform >> Direction >> Symmetrize

■ Menu
Transform >> Direction >> Symmetrize

■ Description
This module transforms directed/asymmetric 1-mode Network data into undirected/symmetric 1-mode Network data.

■ User Options

◻ Input
1-mode Network: select the directed/asymmetric 1-mode Network to transform. You can select multiple networks at once.

◻ Main process
Operator is one of ‘MAX’, ‘MIN’, ‘AVG’, ‘SUM’, ‘PRODUCT’, ‘LOWER’, ‘UPPER’.

MAX: $X'_{i,j} = X'_{j,i} = \max(X_{i,j}, X_{j,i})$

MIN: $X'_{i,j} = X'_{j,i} = \min(X_{i,j}, X_{j,i})$

AVG: $X'_{i,j} = X'_{j,i} = \dfrac{X_{i,j} + X_{j,i}}{2}$

SUM: $X'_{i,j} = X'_{j,i} = X_{i,j} + X_{j,i}$

PRODUCT: $X'_{i,j} = X'_{j,i} = X_{i,j} \times X_{j,i}$

LOWER: $X'_{i,j} = \begin{cases} X_{j,i} & \text{for } i < j \\ X_{i,j} & \text{for } i \ge j \end{cases}$

UPPER: $X'_{i,j} = \begin{cases} X_{j,i} & \text{for } i > j \\ X_{i,j} & \text{for } i \le j \end{cases}$

◻ Output
You can select which outputs should be reported and in which format they should be displayed. The ‘Symmetrize’ module creates a Main Report and Transformed Result Tables.
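
The operators listed under Main process can be sketched in a few lines of plain Python. This is only an illustration of the formulas, not NetMiner's implementation: the function name `symmetrize`, the nested-list matrix representation, and the choice to leave the diagonal untouched are all assumptions made for this sketch (read literally, SUM and PRODUCT would also change the diagonal).

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]

# Pairwise rules for the symmetric operators: (x_ij, x_ji) -> new value.
PAIR_RULES: Dict[str, Callable[[float, float], float]] = {
    "MAX": max,
    "MIN": min,
    "AVG": lambda a, b: (a + b) / 2,
    "SUM": lambda a, b: a + b,
    "PRODUCT": lambda a, b: a * b,
}


def symmetrize(x: Matrix, operator: str = "MAX") -> Matrix:
    """Return a symmetric copy of the square matrix x (hypothetical helper)."""
    n = len(x)
    out = [row[:] for row in x]
    for i in range(n):
        for j in range(i + 1, n):      # j > i, so (i, j) lies in the upper triangle
            if operator == "LOWER":
                value = x[j][i]        # copy the lower triangle upward
            elif operator == "UPPER":
                value = x[i][j]        # copy the upper triangle downward
            else:
                value = PAIR_RULES[operator](x[i][j], x[j][i])
            out[i][j] = out[j][i] = value
    return out                         # assumption: the diagonal is left unchanged


example = [[0, 2, 0],
           [1, 0, 3],
           [4, 0, 0]]
print(symmetrize(example, "MAX"))
```

With the "MAX" operator, each pair of cells takes the larger of the two directed weights, so a link that exists in only one direction is kept in both.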

■ Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

◻ Reports
Main Report
The Main Report presents information about the process and data only.

◻ Tables
Symmetrized 1-mode Network Table
A symmetrized 1-mode Network is created.

■ Example
If the symmetrize method is “Max”:

• Data Expression

• Network Visualization

■ Time Complexity
• O(m)

■ Related Topics

Transform >> Direction >> Transpose

■ Menu
Transform >> Direction >> Transpose

■ Description
This module transposes 1-mode Network data, i.e., it reverses the direction of every edge.

■ User Options

◻ Input
1-mode Network: select the directed/asymmetric 1-mode Network to transform. You can select multiple networks at once.

◻ Output
You can select which outputs should be reported and in which format they should be displayed. The ‘Transpose’ module creates a Main Report and Transformed Result Tables.
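
As a rough illustration of what transposing means, the sketch below reverses every edge of a small edge list in a single pass. The `Link` tuple layout and the function name `transpose` are assumptions for this example only; they do not reflect how NetMiner stores networks.

```python
from typing import List, Tuple

Link = Tuple[str, str, float]  # (source, target, weight); hypothetical layout


def transpose(links: List[Link]) -> List[Link]:
    """Reverse the direction of every link while keeping its weight."""
    return [(target, source, weight) for source, target, weight in links]


links = [("A", "B", 1.0), ("B", "C", 2.5)]
print(transpose(links))
```

Transposing twice returns the original edge list.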

■ Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

◻ Reports
Main Report
The Main Report presents information about the process and data only.

◻ Tables
Transposed 1-mode Network Table
A transposed 1-mode Network is created.

■ Example

■ Time Complexity
• O(m)

■ Related Topics

Transform >> Value >> Dichotomize

■ Menu
Transform >> Value >> Dichotomize

■ Description
For a given dataset, the selected 1-mode Network variable(s) are transformed from weighted/valued data to unweighted/binary data according to a specified criterion.

■ User Options

◻ Input
1-mode Network: select the weighted/valued 1-mode Network to transform. You can select multiple networks at once.

◻ Main process
The criterion is one of ‘>’, ‘>=’, ‘=’, ‘<’, ‘<=’, ‘!=’.

You can specify the criterion value (the default is 0).

For example, if the criterion is ‘>’ and the criterion value is 1, then X’_ij is 1 if X_ij > 1 and 0 otherwise. The same logic applies to the other criteria.
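
The rule above amounts to one comparison per cell. The following sketch shows it in plain Python; the function name `dichotomize` and the nested-list matrix are assumptions for illustration, and using ‘>’ as the default criterion is also an assumption (the text only states that the default criterion value is 0).

```python
import operator
from typing import Callable, Dict, List

Matrix = List[List[float]]

# The six criteria named in the text, mapped to Python comparison functions.
CRITERIA: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    "!=": operator.ne,
}


def dichotomize(x: Matrix, criterion: str = ">", value: float = 0) -> List[List[int]]:
    """Return 1 where the cell satisfies the criterion and 0 elsewhere."""
    test = CRITERIA[criterion]
    return [[1 if test(cell, value) else 0 for cell in row] for row in x]


weights = [[0, 1, 2],
           [3, 0, 1]]
print(dichotomize(weights, ">", 1))
```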

◻ Output
You can select which outputs should be reported and in which format they should be displayed. The ‘Dichotomize’ module creates a Main Report and Transformed Result Tables.

■ Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

◻ Reports
Main Report
The Main Report presents information about the process and data only.

◻ Tables
Dichotomized Data Table
New (completely or partially) dichotomized data is generated.

■ Example
If the cut-off method is Greater Than (GT) 0:

■ Time Complexity
• O(m)

■ Related Topics

Transform >> Value >> Reverse

■ Menu
Transform >> Value >> Reverse

■ Description
For weighted data, weights are transformed so that the maximum weight becomes the minimum weight and the minimum weight becomes the maximum weight. That is, similarity data is converted to dissimilarity data, and dissimilarity data is converted to similarity data.

■ User Options

◻ Input
1-mode Network: select the 1-mode Network to be reversed. You can select multiple networks at once.

2-mode Network: select the 2-mode Network to be reversed. You can select multiple networks at once.

Node Attribute: select the Main Node Attribute to be reversed. You can select multiple vectors at once.

◻ Main process
Diagonal Handling Option: If you select ‘retain’, diagonal values are included in processing and are reversed like any other value. If you select ‘ignore’, diagonal values remain unchanged.

Process 0.0: If you include 0 in processing, 0 is reversed to another value. If you exclude 0, 0 remains 0 in the output.

Reverse Weight Method

- Interval: reverses the selected variable linearly (X’_ij = Max(X) + Min(X) - X_ij).

- Ratio: reverses the selected variable inversely (X’_ij = 1/X_ij).

- Fixed Decay: reverses the selected variable exponentially (X’_ij = beta^X_ij). When you reverse your data by fixed decay, beta should be smaller than 1, so that larger weights map to smaller values.
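
The three reverse methods and the two handling options can be combined as in the sketch below. It is a minimal illustration for a square (1-mode) matrix, not NetMiner's code: the function name `reverse_weights` and its parameters are hypothetical, Max(X) and Min(X) are assumed to be taken over the processed cells only, and zeros are always skipped under the Ratio method because 1/0 is undefined.

```python
from typing import List

Matrix = List[List[float]]


def reverse_weights(x: Matrix, method: str = "interval", beta: float = 0.5,
                    include_zero: bool = False, diagonal: str = "retain") -> Matrix:
    """Reverse the weights of a square matrix (illustrative sketch)."""
    def processed(i: int, j: int) -> bool:
        if diagonal == "ignore" and i == j:
            return False               # 'ignore': the diagonal stays as it is
        if x[i][j] == 0 and (not include_zero or method == "ratio"):
            return False               # excluded zeros stay 0 in the output
        return True

    cells = [(i, j) for i, row in enumerate(x)
             for j in range(len(row)) if processed(i, j)]
    out = [row[:] for row in x]
    if not cells:
        return out
    values = [x[i][j] for i, j in cells]
    low, high = min(values), max(values)   # assumption: processed cells only
    for i, j in cells:
        v = x[i][j]
        if method == "interval":
            out[i][j] = high + low - v     # X'_ij = Max(X) + Min(X) - X_ij
        elif method == "ratio":
            out[i][j] = 1 / v              # X'_ij = 1 / X_ij
        elif method == "fixed_decay":
            out[i][j] = beta ** v          # X'_ij = beta ^ X_ij
        else:
            raise ValueError(f"unknown method: {method}")
    return out


data = [[0, 1, 3],
        [2, 0, 5],
        [4, 1, 0]]
print(reverse_weights(data, "interval"))
```

In this example the processed weights range from 1 to 5, so the Interval method maps each weight v to 6 - v while the excluded zeros stay 0.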

◻ Output
You can select which outputs should be reported and in which format they should be displayed. The ‘Reverse’ module creates a Main Report and Transformed Result Tables.

■ Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

◻ Reports
Main Report
The Main Report presents information about the process and data only.

◻ Tables
Reversed Data Table
New (completely or partially) reversed data is generated.

■ Example
If you set the options to ‘exclude 0, interval, retain’:

■ Time Complexity
• O(n^2)

■ Related Topics

Transform >> Value >> Normalize

■ Menu
Transform >> Value >> Normalize

■ Description
This module normalizes the selected dimension according to the selected criterion. It is useful when you compare multiple datasets.

■ User Options

◻ Input
1-mode Network: select the 1-mode Network to be normalized. You can select multiple networks at once.

2-mode Network: select the 2-mode Network to be normalized. You can select multiple networks at once.

Node Attribute: select the Main Node Attribute to be normalized. You can select multiple vectors at once.

◻ Main process
Diagonal Handling Option: If you select ‘retain’, diagonal values are included in processing and are normalized like any other value. If you select ‘ignore’, diagonal values remain unchanged.

Dimension

- Rows: Each row is normalized according to the criterion.

- Columns: Each column is normalized according to the criterion.

- Matrix: The matrix as a whole is normalized according to the criterion.

- Rows & Columns: Rows are normalized, then columns are normalized, then rows are normalized again, and so on. These steps are repeated according to the criterion until the stop condition is satisfied.

Stop Condition
- Rows & Columns normalizing stops when the number of iterations >= the user-specified value, or when Delta (the sum of the absolute changes of all elements in the matrix caused by one step of normalizing) <= the user-specified value.
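
The Rows & Columns option and its stop condition can be sketched as an alternating loop. The example below uses the Sum criterion and is only an illustration under stated assumptions: the function names are hypothetical, one "iteration" is taken to be a row step followed by a column step, and Delta is measured over that whole pass (the text does not say whether a step is a single row or column normalization).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def scale_rows(x: Matrix, target: float) -> Matrix:
    """Rescale each row so that it sums to target (all-zero rows are kept)."""
    out = []
    for row in x:
        total = sum(row)
        out.append([v * target / total for v in row] if total else row[:])
    return out


def scale_columns(x: Matrix, target: float) -> Matrix:
    """Rescale each column by transposing, scaling rows, and transposing back."""
    transposed = [list(col) for col in zip(*x)]
    return [list(col) for col in zip(*scale_rows(transposed, target))]


def normalize_rows_and_columns(x: Matrix, target: float = 1.0,
                               max_iterations: int = 100,
                               min_delta: float = 1e-9) -> Tuple[Matrix, int]:
    """Alternate row and column normalization until a stop condition holds."""
    current = [row[:] for row in x]
    iterations = 0
    while iterations < max_iterations:          # stop: iteration limit reached
        updated = scale_columns(scale_rows(current, target), target)
        delta = sum(abs(a - b)
                    for row_a, row_b in zip(updated, current)
                    for a, b in zip(row_a, row_b))
        current = updated
        iterations += 1
        if delta <= min_delta:                  # stop: the matrix barely changed
            break
    return current, iterations


balanced, used = normalize_rows_and_columns([[1.0, 2.0], [3.0, 4.0]],
                                            max_iterations=1000, min_delta=1e-12)
print(used, balanced)
```

For a square matrix with positive entries, the row sums and column sums both approach the target as the loop proceeds.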

Criterion
- Sum: The object is normalized so that the sum of its elements equals the value you assign.

- Avg.: The object is normalized so that the average of its elements equals the value you assign.

- Std. dev.: The object is normalized so that the standard deviation of its elements equals the value you assign.

- Z-score: The object is normalized so that the average of its elements equals the value you assign and the standard deviation of its elements is 1.

- Euclidean: The object is normalized so that the Euclidean norm of its elements equals the value you assign.

- Absolute Maximum: The object is normalized so that the absolute maximum of its elements equals the value you assign.
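
Four of these criteria are illustrated below for a single normalized object, such as one row. The sketch assumes multiplicative rescaling for Sum, Euclidean, and Absolute Maximum, and a population standard deviation for Z-score; the text does not state which procedure NetMiner uses, and the function name `normalize` is hypothetical. Avg. and Std. dev. are left out because the text does not make clear whether they shift or rescale the values.

```python
import math
import statistics
from typing import List


def normalize(values: List[float], criterion: str, target: float = 1.0) -> List[float]:
    """Normalize one object (for example, a row) to the assigned target value."""
    if criterion == "z-score":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)        # assumption: population std. dev.
        if std == 0:
            raise ValueError("z-score is undefined for constant values")
        return [(v - mean) / std + target for v in values]   # mean = target, std = 1
    if criterion == "sum":
        size = sum(values)
    elif criterion == "euclidean":
        size = math.sqrt(sum(v * v for v in values))
    elif criterion == "absolute maximum":
        size = max(abs(v) for v in values)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    if size == 0:
        raise ValueError("cannot rescale an object whose size is 0")
    return [v * target / size for v in values]   # assumption: multiplicative rescaling


row = [1.0, 2.0, 3.0, 4.0]
print(normalize(row, "sum"))
```

Passing `target=0` with the "z-score" criterion gives the usual standard score with mean 0 and standard deviation 1.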

◻ Output
You can select which outputs should be reported and in which format they should be displayed. The ‘Normalize’ module creates a Main Report and Transformed Result Tables.

■ Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

◻ Reports
Main Report
The Main Report presents information about the process and data only.

◻ Tables
Normalized Data Table
New (completely or partially) normalized data is generated.

■ Example
If the normalization method for the weights of the selected 1-mode Network variable is ‘The sum of rows = 1 including diagonal values’, you can see in the output below that the row sums are all 1.

■ Time Complexity
• O(n^2)

■ Related Topics

Transform >> Value >> Recode

 Menu
Transform >> Value >> Recode

 Description
For selected variable(s) of a given dataset, ranges of values are changed to new values. Thus, only

numerical values can be recoded.

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

2-mode Network: select 2-mode Network to transform. You can


select multiple networks at once.

Node Attribute: select Main Node Attribute to transform. You can


select multiple vectors at once.

 Main process
Diagonal Handling Option: If you select ‘retain’, diagonal values are also recoded to new values. If you select ‘ignore’, diagonal values remain unchanged.

Recoding Rules: Rules are processed in their input order. Elements larger than or equal to Start Value and smaller than or equal to End Value are replaced with the new value.
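A recoding pass of this kind can be sketched in Python as follows. The sketch assumes that a rule matches every element between Start Value and End Value inclusive, and that the first matching rule wins; both points, as well as the function names, are assumptions for illustration rather than documented NetMiner behavior.

```python
def recode(value, rules):
    """rules: list of (start, end, new_value) tuples, checked in input order."""
    for start, end, new_value in rules:
        if start <= value <= end:
            return new_value  # assumption: the first matching rule wins
    return value  # no rule matched, so the value is left unchanged


def recode_matrix(matrix, rules, recode_diagonal=True):
    """Apply the rules to every cell, optionally leaving diagonal cells unchanged."""
    result = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, value in enumerate(row):
            if i == j and not recode_diagonal:
                new_row.append(value)
            else:
                new_row.append(recode(value, rules))
        result.append(new_row)
    return result


rules = [(0, 2, 10), (3, 5, 20)]  # 0~2 = 10, 3~5 = 20
recoded = recode_matrix([[0, 4], [2, 9]], rules)
diagonal_kept = recode_matrix([[0, 4], [2, 9]], rules, recode_diagonal=False)
```

Here `recoded` is `[[10, 20], [10, 9]]`, and `diagonal_kept` is `[[0, 20], [10, 9]]` because the diagonal cells are skipped.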


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Recode’ module, Main

Report and Transformed Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Recoded Data Table


New (completely or partially) recoded data is generated.

 Example
If the weight of the selected 1-mode Network variable is recoded as follows: 0~2 = 10, 3~5 = 20


 Time Complexity
 O(n^2)

 Related Topics


Transform >> Value >> Missing

 Menu
Transform >> Value >> Missing

 Description
For selected variable(s) of a given DataSet, user missing values or system missing values are recoded

to a new value. Only numerical values can be recoded.

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

2-mode Network: select 2-mode Network to transform. You can


select multiple networks at once.

Node Attribute: select Main Node Attribute to transform. You can


select multiple vectors at once.

 Main process
Recode user missing values or system missing values to a new value.

*The system missing value is displayed as -999999 in the data display in an editing session.
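As a rough Python sketch of this recoding step, the example below uses `None` as a stand-in for the system missing value and a user-supplied tuple for user missing values. Both representations and the function name are assumptions for illustration; the internal representation used by NetMiner is not described here.

```python
SYSTEM_MISSING = None  # assumption: stand-in for the system missing value


def recode_missing(values, new_value, user_missing=()):
    """Replace system missing and user missing entries with new_value."""
    return [
        new_value if (v is SYSTEM_MISSING or v in user_missing) else v
        for v in values
    ]


cleaned = recode_missing([1, None, 9, 3], new_value=0, user_missing=(9,))
```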

 Output


You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Missing’ module,

Main Report and Transformed Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output

Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Missing Recoded Data Table


New recoded data is generated.

 Time Complexity
 O(n^2)

 Related Topics


Transform >> Value >> Diagonal

 Menu
Transform >> Value >> Diagonal

 Description
For the selected 1-mode Network variable of a given DataSet, diagonal values are replaced by a constant or by a main node attribute vector.

 User Options

 Input
1-mode Network: select directed/asymmetric 1-mode Network to
transform. You can select multiple networks at once.

 Main process
- With one value: replace diagonal values with a user-specified value.

- With a vector: replace diagonal values with the values of the selected Main Node Attribute vector.
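Both replacement modes can be sketched in a few lines of Python. The function name and the list-of-lists matrix representation are assumptions for illustration only.

```python
def replace_diagonal(matrix, value=None, vector=None):
    """Return a copy of the matrix whose diagonal is a constant or a node attribute vector."""
    if (value is None) == (vector is None):
        raise ValueError("pass exactly one of value or vector")
    result = [list(row) for row in matrix]
    for i in range(len(result)):
        # 'With one value' uses the constant; 'With a vector' uses the i-th attribute value.
        result[i][i] = value if vector is None else vector[i]
    return result


network = [[0, 1, 2], [3, 0, 4], [5, 6, 0]]
with_constant = replace_diagonal(network, value=9)
with_vector = replace_diagonal(network, vector=[7, 8, 9])
```

Only the n diagonal cells are visited, which matches the O(n) time complexity listed for this module.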

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Diagonal’ module, Main Report and Transformed Result Tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Diagonal Replaced Data Table


New diagonal replaced data is generated.

 Time Complexity
 O(n)


 Related Topics


Transform >> NodeSet >> Ego Network

 Menu
Transform >> NodeSet >> Ego Network

 Description
The ego network of focal nodes is the network that includes only the nodes reachable from the focal nodes, based upon the specified distance and the in/out direction of edges. This function makes a new graph consisting of only the ego network for the specified focal nodes. The Ego Network transform treats every edge as one unit of distance regardless of its weight value; therefore, you should dichotomize your data before running this module.

 User Options

 Input

1-mode Network: select 1-mode Network to transform. You can


select multiple networks at once.

Node: select focal nodes (egos). You can select multiple nodes at
once.

 Pre-process
- Dichotomize: You should dichotomize your data before running

module. By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.


 Main process

Distance: Specify the distance. Neighbors within the specified distance are included in the extracted network.

Include
- Include focal nodes: If ‘no’ is selected, extracted network doesn’t

have focal nodes.

- Include links between alters: If ‘no’ is selected, extracted network doesn’t have links between alters.

(Alter means the node that is not focal node but neighbor of some focal nodes)

Direction (Direct Neighbor): When determining direct neighbors of focal nodes, if you select
- ‘In’: then only in-neighbors of focal nodes are selected.

- ‘Out’: then only out-neighbors of focal nodes are selected.

- ‘In And Out’: then only nodes in both in-neighbors and out-neighbors of focal nodes are selected.

- ‘In Or Out’: then nodes in one of in-neighbors and out-neighbors of focal nodes are selected.

Direction (Indirect Neighbor): When determining indirect neighbors (nodes at distance 2 or more from focal nodes), this criterion is used, following the same logic as ‘Direction (Direct Neighbor)’.
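The Distance and Direction options can be sketched as a breadth-first search in Python. This is an illustrative sketch, not NetMiner's code: the dictionary-of-sets graph representation, the option strings, and the function names are assumptions, the input is assumed to be dichotomized, and the 'Include links between alters' option is not covered.

```python
def neighbors(adj, node, direction):
    """adj maps each node to the set of its out-neighbors (dichotomized, directed)."""
    out_nb = set(adj.get(node, ()))
    in_nb = {source for source, targets in adj.items() if node in targets}
    if direction == "in":
        return in_nb
    if direction == "out":
        return out_nb
    if direction == "in_and_out":
        return in_nb & out_nb
    if direction == "in_or_out":
        return in_nb | out_nb
    raise ValueError(direction)


def ego_nodes(adj, focal, distance, direct="out", indirect="out", include_focal=True):
    """Return {node: distance from the nearest focal node} for the extracted ego network."""
    dist = {f: 0 for f in focal}
    frontier = set(focal)
    for step in range(1, distance + 1):
        # Step 1 finds direct neighbors; later steps find indirect neighbors.
        direction = direct if step == 1 else indirect
        next_frontier = set()
        for node in frontier:
            for nb in neighbors(adj, node, direction):
                if nb not in dist:
                    dist[nb] = step
                    next_frontier.add(nb)
        frontier = next_frontier
    if not include_focal:
        for f in focal:
            del dist[f]
    return dist


adj = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
out_two_steps = ego_nodes(adj, ["a"], distance=2, direct="out", indirect="out")
in_one_step = ego_nodes(adj, ["a"], distance=1, direct="in")
```

In this toy graph, `out_two_steps` reaches b at distance 1 and c at distance 2, while `in_one_step` selects c and d, the in-neighbors of a.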

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Ego Network’ module,

Main Report and Transformed Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.


 Tables

Result Data Table


Neighbors of the focal node are indicated as "Included".

You can compose a new DataSet by 'included' nodes using Data

menu.

You can check a generated new Workfile in Workfile Tree as follows.

Distance Vector Table


Distance vector of the neighbors of the focal node.

 Example
If selected Focal Node is a, diagonal values are included, selected direction is ‘in’, including links

between alters.

 Time Complexity
 O(n)

 Related Topics


Transform >> NodeSet >> Reorder

 Menu
Transform >> NodeSet >> Reorder

 Description
This function reorders the nodes of the Main Nodeset. You can sort the nodes in increasing or decreasing order according to user-specified options.

 User Options

 Input

Select Nodeset: select a Nodeset to reorder.

Attributes: You may reorder the Nodeset by criterion vectors. At most three attributes can be used as criterion vectors. If the checkbox is selected, the Nodeset is reordered in increasing order.
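Sorting by up to three criterion vectors can be sketched in Python with a stable sort. The sketch assumes that the criteria are applied hierarchically (the first attribute is the most significant) and that each attribute has its own increasing/decreasing flag; the function name and data layout are hypothetical.

```python
def reorder(nodes, criteria):
    """criteria: list of (attribute_dict, increasing) pairs, most significant first."""
    if len(criteria) > 3:
        raise ValueError("at most three criterion vectors can be used")
    ordered = list(nodes)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a hierarchical ordering.
    for attribute, increasing in reversed(criteria):
        ordered.sort(key=lambda node: attribute[node], reverse=not increasing)
    return ordered


nodes = ["n1", "n2", "n3", "n4"]
dept = {"n1": "Sales", "n2": "HR", "n3": "Sales", "n4": "HR"}
age = {"n1": 30, "n2": 45, "n3": 25, "n4": 50}
new_order = reorder(nodes, [(dept, True), (age, False)])
```

Here the nodes are ordered by department in increasing order and, within each department, by age in decreasing order.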

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Reorder’ module,

Main Report and Transformed Result Tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Order Vector Table


Orders by criterion vector are presented. The new data

reordered by this order may be saved as a new Workfile.

 Example
If the criterion vector for sorting is Dept. and the order of sorting is increasing,

 Time Complexity
 O(n)

 Related Topics


Transform >> LinkSet >> Incidence

 Menu
Transform >> LinkSet >> Incidence

 Description
We say that node a and node b are incident to link (a, b). This module makes a 2-mode network, which is an incidence matrix, from a selected 1-mode matrix. The rows of the result represent nodes and the columns represent links. The dimension of the result is [# Nodes x # Links], and its (i, j) element is 1 when node i is incident to link j, and 0 otherwise. By definition, the row sum of the incidence matrix is equal to the total degree of the node corresponding to that row, and the column sum is 2 for all columns except self-loop links. If you retain diagonal cells, self-loop links are presented in the matrix. The column sum of each self-loop link is 1.
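The construction of the incidence matrix can be sketched in Python as follows. The function name, the link-list input, and the `retain_diagonal` flag are assumptions made for illustration.

```python
def incidence_matrix(nodes, links, retain_diagonal=False):
    """Return (kept_links, matrix) where matrix[i][j] is 1 if node i is incident to link j."""
    # Self-loop links become columns only when diagonal cells are retained.
    kept = [(s, t) for s, t in links if retain_diagonal or s != t]
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(kept) for _ in nodes]
    for j, (s, t) in enumerate(kept):
        matrix[index[s]][j] = 1
        matrix[index[t]][j] = 1  # for a self-loop this is the same cell as above
    return kept, matrix


nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c"), ("c", "c")]
kept, matrix = incidence_matrix(nodes, links, retain_diagonal=True)
column_sums = [sum(row[j] for row in matrix) for j in range(len(kept))]
```

The column sums come out as 2 for the two ordinary links and 1 for the self-loop link (c, c), as stated in the description.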

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

 Main process
Select Diagonal Handling Option: If you select ‘retain’, each *self-loop appears in the generated incidence 2-mode Network as a 'column'.

*self-loop: A diagonal cell represents a link whose source node is the same as its target node. We call this link a self-loop link. It is shown as ‘(node a -> node a)’ in the column of the incidence network.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Incidence’ module,

Main Report and Transformed Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Incidence 2-mode Network Data Table


New Incidence Network Data is created.


 Example

 Time Complexity
 O(m)

 Related Topics


Transform >> LinkSet >> Line Graph

 Menu
Transform >> LinkSet >> Line Graph

 Description
‘Line Graph’ transforms a graph into a new graph in which each node represents a link in the input graph. Two nodes in the converted graph, which were originally two links in the input graph, are adjacent if the two links shared a common node in the input graph.

Because the 'adjacency' of two links can be ambiguous when a directed graph is given, we use one of several pertinent definitions: two nodes in the output representing link (a,b) and link (c,d) in the input graph are adjacent if ‘b = c’ or ‘d = a’.
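The adjacency rule for directed input can be sketched in Python. The sketch only lists which pairs of input links become adjacent nodes in the line graph; the function name and the link-list input are hypothetical, and link weights are not covered.

```python
from itertools import combinations


def line_graph_pairs(links):
    """Return the pairs of input links that are adjacent in the line graph."""
    adjacent = []
    for (a, b), (c, d) in combinations(links, 2):
        # Links (a,b) and (c,d) are adjacent if b = c or d = a.
        if b == c or d == a:
            adjacent.append(((a, b), (c, d)))
    return adjacent


links = [("a", "b"), ("b", "c"), ("a", "c")]
pairs = line_graph_pairs(links)
```

In this example only (a,b) and (b,c) are adjacent; (a,b) and (a,c) share the node a, but neither link ends where the other starts.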

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

 Main process

Diagonal Handling Option: If you select ‘retain’, each *self-loop appears in the generated line graph (1-mode Network) as a node.

*self-loop: A diagonal cell represents a link whose source node is the same as its target node. We call this link a self-loop link. It is represented as a node named ‘(node a -> node a)’ in the line graph.

Line Graph Link Weight Option: If you select ‘Dichotomized’, the


Line Graph is generated as a dichotomized network.

Weight Calculation Option: When ‘Weighted’ is selected as the ‘Line Graph Link Weight Option’, the ‘Weight Calculation Option’ is activated. The weight values of the resulting Line Graph are defined by this option.

- Similarity Adjacency Type

If two links of line graph are not adjacent, weight of link((a,b),(c,d)) = weight of link(a,b)/degree of

link(a,b) + weight of link(c,d)/degree of link(c,d)

If two links of line graph are adjacent, weight of link((a,b),(b,c)) = {weight of link(a,b)+weight of

link(b,c)}/degree of node b

- Dissimilarity Adjacency Type: weight of link((a,b),(b,c)) = {weight of link(a,b) + weight of

link(b,c)}/2

- Capacity Adjacency Type: weight of link((a,b),(b,c)) = min(weight of link(a,b), weight of link(b,c))

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Line Graph’

module, Main Report and Line Graph are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Line Graph Table


New transformed 1-mode Network variable (line graph).


 Example

 Time Complexity
 O(m*k) where k is the average degree of original graph

 Reference
 Frank Harary (1969). Graph Theory. Perseus Books. Chapter 8. Line Graphs.

 Related Topics


Transform >> LinkSet >> Link Reduction

 Menu
Transform >> LinkSet >> Link Reduction

 Description
The Link Reduction module reduces the size of the network by extracting only the links selected according to a numerical link attribute and deleting the remaining links.

 User Options

 Input

1-mode Network: select 1-mode Network to reduce the size of the


network.

Link Attribute: select a link attribute which would be used to sort


the links. The chosen link attribute should be a numerical variable.

 Main process

Extract Method:
- Portion (Top/Bottom): The links at the top/bottom of the sorted list are extracted, and the remaining bottom/top links are deleted. The portion of links to extract is defined by the user.

- Number (Top/Bottom): The links at the top/bottom of the sorted list are extracted, and the remaining bottom/top links are deleted. The number of links to extract is defined by the user.

- Value: The links whose attribute value is greater/smaller than or equal to this value are extracted.


Handling tie values at the last tier: This option determines whether the links whose attribute values tie at the last tier (the cut-off) are excluded or included.
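The ‘Number (Top)’ method and the tie option can be sketched in Python. How NetMiner treats a tied last tier is not spelled out above, so the sketch assumes that ‘include’ keeps every link tied with the cut-off value and ‘exclude’ drops the whole tied tier when it does not fit; the function name and data layout are hypothetical.

```python
def reduce_links_top(weights, count, include_ties=True):
    """weights: dict mapping link -> numeric attribute; keep the `count` largest links."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    if count <= 0:
        return []
    if count >= len(ranked):
        return [link for link, _ in ranked]
    cutoff = ranked[count - 1][1]  # attribute value of the last extracted link
    if include_ties:
        # Assumption: every link tied with the cut-off value is also extracted.
        return [link for link, value in ranked if value >= cutoff]
    if ranked[count][1] == cutoff:
        # Assumption: the tied tier straddles the cut-off, so the whole tier is dropped.
        return [link for link, value in ranked if value > cutoff]
    return [link for link, _ in ranked[:count]]


weights = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 3, ("d", "a"): 1}
with_ties = reduce_links_top(weights, 2, include_ties=True)
without_ties = reduce_links_top(weights, 2, include_ties=False)
```

With two links requested and a tie at value 3, `with_ties` contains three links and `without_ties` contains only (a, b). Sorting dominates the cost, in line with the O(m*log m) complexity listed for this module.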

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Link Reduction’

module, Main Report and Reduction Result table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
- Reduction Result: The number of included links, excluded links and total links are presented.

 Tables

Reduction Result
For all the links, the criterion value is presented. You can check which links are included or excluded in the result table.


 Example

 Time Complexity
 O(m*log m)

 Reference

 Related Topics


Transform >> LinkSet >> Link Reduction Simulation

 Menu
Transform >> LinkSet >> Link Reduction Simulation

 Description
This module measures how various network properties, such as ‘Number of Links’ and ‘Density’, change when links are removed. A link is removed if its weight is smaller than the threshold value, which varies over the range that the user specifies. Plots showing the relationship between the threshold value and each property are presented.

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only
choose one 1-mode network.

 Link Merge: When selected data contains multiple

links (two or more links connecting the same source node and target node pair), a user should decide how to merge them into a single link.

 Main process
Minimum Threshold / Maximum Threshold / Interval: The threshold value changes from the ‘Minimum Threshold’ value to the ‘Maximum Threshold’ value in steps of ‘Interval’.
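The threshold sweep can be sketched in Python for three of the properties offered by this module (number of links, density, and inclusiveness). The function name and data layout are hypothetical, and the density denominator n(n-1) assumes a directed network without self-loops.

```python
def simulate_link_reduction(nodes, weighted_links, minimum, maximum, interval):
    """Return one row of property values per threshold value."""
    n = len(nodes)
    steps = int(round((maximum - minimum) / interval))
    rows = []
    for k in range(steps + 1):
        threshold = minimum + k * interval
        # A link is removed if its weight is smaller than the threshold value.
        kept = [
            (s, t) for (s, t), w in weighted_links.items()
            if w >= threshold and s != t
        ]
        connected = {node for link in kept for node in link}
        rows.append({
            "threshold": threshold,
            "links": len(kept),
            "density": len(kept) / (n * (n - 1)),  # assumption: directed, no self-loops
            "inclusiveness": len(connected) / n,
        })
    return rows


nodes = ["a", "b", "c", "d"]
weighted_links = {("a", "b"): 1, ("b", "c"): 2, ("c", "a"): 3}
rows = simulate_link_reduction(nodes, weighted_links, minimum=1, maximum=3, interval=1)
```

Each row corresponds to one point on the XY plot of a property against the threshold value.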

Properties: Select the properties to be measured.

 #of Links: Computes the number of links.


 Density: The proportion of lines that is actually present


in the network. It is the ratio of (the number of lines

present) to (the number of maximum possible lines).

 Average Degree: Average degrees for all nodes.

 # of Components (Weak): A weak component is the


maximal subgraph in which each pair of nodes is

connected by a semi-path.

 # of Components (Strong): A strong component is the


maximal subgraph in which each pair of nodes is

connected by a path in both directions.

 Inclusiveness: The number of connected nodes


expressed as a proportion of the total number of nodes.

Connected nodes are the nodes that are not isolates.

(i.e. inclusiveness = the number of connected nodes /

the number of nodes)

 Reciprocity (Arc Method): The ratio of (the number of links which are the part of
reciprocated relations) to (total number of links)

 Reciprocity (Dyad Method): The ratio of (the number of reciprocated node pairs) to (the
number of connected node pairs)

 Transitivity: The ratio of (total number of transitive triads) to (total number of transitive and
intransitive triads). For digraphs, it is the ratio of (the number of transitive triads) to (the

number of potentially transitive triads).

 Clustering Coefficient: Percentage of the links that are actually present for a node and its
alters. After picking a node, find all of its neighbor nodes. It is a ratio of (the number of

connections observed) to (the number of the maximum possible connections) between its

neighbor nodes. The clustering coefficient of the entire network is the average of the

clustering coefficients for every node.

 Mean Distance: The average geodesic distance between any pair of nodes in a network.

 Diameter: The largest geodesic distance between any pair of nodes in a network.

 Node Connectivity: The minimum number of nodes that must be removed to disconnect the
network.


 Link Connectivity: The minimum number of links that must be removed to disconnect the
network.

 Connectedness: Calculates the ratio of node pairs that can reach each other in the digraph.

 Efficiency: Calculates how efficient the network’s connection is.

 Hierarchy: Measures how much network has hierarchical character.

 LUB: Computes how many roots there are, if the network is regarded as a tree.

 # of Isolated Nodes : Computes the number of isolated nodes.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘Link Reduction Simulation’ analysis, ‘Main

Report’ and ‘XY Plot’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
Network properties: The network measures of the selected properties (when we apply different

threshold values) are reported.

 Charts


XY Plot
For each selected property, the relation between the threshold value and the property value is plotted.

 Time Complexity
 # of Links: O(m)

 Density: O(m)

 # of Components (Weak and Strong): O(m)

 Inclusiveness: O(m)

 Average Degree: O(m)

 Reciprocity: O(m)

 Transitivity: O(n^3)

 Clustering Coefficient: O(m^2)

 Mean Distance: O(n^3)

 Diameter: O(n^3)

 Node Connectivity: O(n^4)

 Link Connectivity: O(n^3)

 Connectedness: O(m)

 Efficiency: O(m)

 Hierarchy: O(n^3)

 LUB: O(n^3)


 # of Isolated Nodes: O(m)

 References
 Inclusiveness: John Scott, Social Network Analysis - a handbook, 2nd edition. 2000. (p.70)

 Reciprocity: Zeggelink, E.P.H. (1993). Strangers into friends. The evolution of friendship

networks using an individual oriented modeling approach. Amsterdam: Thesis Publishers, 1993.

 Transitivity: Frank, O., & Harary, F. (1982). Cluster inference by using transitivity indices in

empirical graphs. Journal of the American Statistical Association, 77, 835-840.

 Clustering Coefficient: Watts D J (1999) Small worlds. Princeton University Press, Princeton,

New Jersey. 32-33.

 Connectedness: Krackhardt, David (1994). Graph theoretical dimensions of informal

organizations. In Kathleen Carley and Michael Prietula, eds. Computational Organizational

Theory, Lawrence Erlbaum Associates, Inc.

 Efficiency: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations.

In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence

Erlbaum Associates, Inc.

 Hierarchy: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations.

In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence

Erlbaum Associates, Inc.

 LUB: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations. In

Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence

Erlbaum Associates, Inc.

 Related Topics
 Analyze >> Properties >> Network >> Multiple


Transform >> Matrix >> Vectorize >> 1-mode Network

 Menu
Transform >> Matrix >> Vectorize >> 1-mode Network

 Description
This module transforms 1-mode network data, which is a matrix, into a vector. The node pairs or links in the input matrix become the nodeset of the transformed data, and the link weight becomes an attribute of the new dataset.

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

 Main process

Include node pairs


- All node pairs: All node pairs are included in the vectorization

procedure.

- Node pairs with links: Only node pairs with links are included in the

vectorization procedure.

Indicate node pairs without a link


- Record zero: If a node pair does not have a link, the weight value of that node pair will be recorded as "zero".

- Record missing value: If a node pair does not have a link, the weight value of that node pair will be

recorded as "missing value".

Diagonal Handling Option: If you select ‘retain’, each *self-loop appears in the generated vector as a node.

*self-loop: A diagonal cell represents a link whose source node is the same as its target node. We call this link a self-loop link. It is shown as ‘(node a -> node a)’ in the generated vector.
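The vectorization options can be sketched in Python as follows. The function name, the option flags, and the use of `None` for the missing value are assumptions for illustration.

```python
def vectorize(nodes, weights, all_pairs=True, record_zero=True, retain_diagonal=False):
    """Turn a 1-mode network into a list of (node_pair, weight) rows."""
    rows = []
    for source in nodes:
        for target in nodes:
            if source == target and not retain_diagonal:
                continue  # diagonal cells (self-loops) are ignored
            pair = (source, target)
            if pair in weights:
                rows.append((pair, weights[pair]))
            elif all_pairs:
                # Node pair without a link: record zero or a missing value (None here).
                rows.append((pair, 0 if record_zero else None))
    return rows


nodes = ["a", "b"]
weights = {("a", "b"): 3}
every_pair = vectorize(nodes, weights)
linked_only = vectorize(nodes, weights, all_pairs=False)
```

Each row of the result plays the role of a node in the new nodeset, with the link weight as its attribute.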

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Vectorization

>> 1-mode Network’ module, Main Report and Vectorized table

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output

Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Vectorized table
Selected 1-mode Network variables are transformed into vectors.


 Example

 Time Complexity
 O(n^2)

 Related Topics
 Transform >> Matrix >> Vectorize >> 2-mode Networks


Transform >> Matrix >> Vectorize >> 2-mode Network

 Menu
Transform >> Matrix >> Vectorize >> 2-mode Network

 Description
This module transforms 2-mode network data, which is a matrix, into a vector. The node pairs or links in the input matrix become the nodeset of the transformed data, and the link weight becomes an attribute of the new dataset.

 User Options

 Input
2-mode Network: select 2-mode Network to transform. You can
select multiple networks at once.

- Nodeset: At first, you should select a Sub Nodeset containing 2-

mode Network you want to transform.

 Main process

Include node pairs


- All node pairs: All node pairs are included in the vectorization

procedure.

- Node pairs with links: Only node pairs with links are included in

the vectorization procedure.

Indicate node pairs without a link


- Record zero: If a node pair does not have a link, the weight value of that node pair will be recorded as "zero".


- Record missing value: If a node pair does not have a link, the weight value of that node pair will be

recorded as "missing value".

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Vectorization >> 2-

mode Network’ module, Main Report and Vectorized table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Vectorized table
Selected 2-mode Network variables are transformed into vectors.

 Example

 Time Complexity
 O(nm)

 Related Topics


 Transform >> Matrix >> Vectorize >> 1-mode Networks


Transform >> Layer >> Split

 Menu
Transform >> Layer >> Split

 Description
This function splits a weighted matrix into multiple binary matrices. First, it collects all weight values in the original data. If you check ‘Regard 0 as valid value’ in ‘Process 0.0 Option’, 0 may exist in that list. Then, for each weight value collected, it adds new data by dichotomizing the original data. The split operator specified by the user is used as the dichotomizing operator, and each weight value is used as the dichotomizing value.

 User Options

 Input

1-mode Network: select 1-mode Network to split. You can select


multiple networks at once.

2-mode Network: select 2-mode Network to split. You can select


multiple networks at once.

 Main process

Split Operator: select split operator which will be used as dichotomizing operator. In mathematical
terms,


$M_{i,j,k}$ = (i, j) element of the k-th matrix.

GT: if $X_{i,j} > w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

GE: if $X_{i,j} \geq w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

EQ: if $X_{i,j} = w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

LE: if $X_{i,j} \leq w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

LT: if $X_{i,j} < w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

NE: if $X_{i,j} \neq w_k$, $M_{i,j,k} = 1$, else $M_{i,j,k} = 0$

Where $w_k$ is the k-th value in the input matrix when ordered in ascending order.

Process 0.0: If ‘Regard 0 as valid’ is selected, data is also dichotomized for 0. (So, ‘number of
weight values +1’ binary matrices are created.) If ‘Regard 0 as valid’ is not selected, 0 is ignored.

Select Diagonal Handling Option: If you select ‘retain’, diagonal values are also split. If you select ‘ignore’, diagonal values are not split.
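The split operators can be sketched in Python with the standard `operator` module. The sketch assumes that, when 0 is not regarded as a valid value, cells holding 0 stay 0 in every resulting matrix; that reading, the function name, and the omission of the diagonal option are assumptions for illustration.

```python
import operator

OPERATORS = {
    "GT": operator.gt, "GE": operator.ge, "EQ": operator.eq,
    "LE": operator.le, "LT": operator.lt, "NE": operator.ne,
}


def split(matrix, operator_name, zero_is_valid=False):
    """Return {weight value: binary matrix}, one binary matrix per collected weight value."""
    compare = OPERATORS[operator_name]
    values = sorted({
        value for row in matrix for value in row
        if zero_is_valid or value != 0
    })
    layers = {}
    for w in values:
        layer = []
        for row in matrix:
            layer.append([
                # Assumption: ignored zero cells are never turned into links.
                1 if (zero_is_valid or x != 0) and compare(x, w) else 0
                for x in row
            ])
        layers[w] = layer
    return layers


weighted = [[0, 1, 2], [3, 0, 1], [2, 3, 0]]
by_value = split(weighted, "EQ")
with_zero = split(weighted, "EQ", zero_is_valid=True)
```

With the EQ operator and 0 not regarded as a value, three matrices are created (for the weights 1, 2, and 3); regarding 0 as valid adds a fourth.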

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Split’ module,

Main Report and Transformed Result Tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Recoded Data Table


Resulting dataset contains the new divided 1-mode Network variables additionally. Each network is a

dichotomized network made by criterion value(weight) and split operator.

 Example
If the Split Operator is EQ and 0 is not regarded as a value, three matrices are created as follows.

 Time Complexity
 O(n^2)

 Related Topics


Transform >> Layer >> Merge

 Menu
Transform >> Layer >> Merge

 Description
This function inserts a new 1-mode Network/relational variable by combining two or more 1-mode

Network/relational variables to Current Dataset. For the purpose of scale standardization, each 1-

mode Network variable may be dichotomized (according to the user-defined cut-off value) prior to

combination.

 User Options

 Input
1-mode Network: select 1-mode Networks to merge. You can
select multiple networks at once.

 Pre-process
Symmetrize: You can symmetrize your data before running module.
By symmetrizing, directed/asymmetric data is transformed to

undirected/symmetric data.

 Main process
Merge options: When merging networks, there can be several links that connect the same source node and target node. To merge links with various weight values into one link, you should decide how to merge them. The ‘And’ and ‘Or’ options are meaningful only for unweighted networks, so when ‘And’ or ‘Or’ is selected, the module dichotomizes your data automatically. When one of the other options below is selected, the module does not dichotomize your data.


- And: If the weights of all links (which connect the same source node and target node) are greater than 0, the weight of the new link is 1. Otherwise, it is 0.

- Or: If any of the links (which connect the same source node and target node) has a weight greater than 0, the weight of the new link is 1. If the weights of all links are 0, the weight of the new link is 0.

- Sum: The weight of the new link is the sum of the weights of all links (which connect the same source node and target node).

- Average: The weight of the new link is the average of the weights of all links (which connect the same source node and target node).

- Max: The weight of the new link is the maximum weight of the links.

- Min: The weight of the new link is the minimum weight of the links.

- Linear Sum: The user can decide a coefficient for each network. Each weight value is multiplied by the coefficient of its network, and the weight of the new link is the sum of the multiplied values.
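The merge options can be sketched in Python as follows. The sketch assumes that a node pair without a link in one of the networks contributes a weight of 0 there (which matters for ‘And’, ‘Average’, and ‘Min’); the function names and the dictionary representation are hypothetical.

```python
def merge_weights(weights, option, coefficients=None):
    """Combine the weights that one node pair has in each input network."""
    if option == "and":
        return 1 if all(w > 0 for w in weights) else 0
    if option == "or":
        return 1 if any(w > 0 for w in weights) else 0
    if option == "sum":
        return sum(weights)
    if option == "average":
        return sum(weights) / len(weights)
    if option == "max":
        return max(weights)
    if option == "min":
        return min(weights)
    if option == "linear_sum":
        return sum(c * w for c, w in zip(coefficients, weights))
    raise ValueError(option)


def merge(networks, option, coefficients=None):
    """networks: list of dicts mapping (source, target) -> weight."""
    pairs = set().union(*networks)
    merged = {}
    for pair in pairs:
        # Assumption: a pair that is absent from a network has weight 0 in it.
        weights = [network.get(pair, 0) for network in networks]
        weight = merge_weights(weights, option, coefficients)
        if weight != 0:
            merged[pair] = weight
    return merged


first = {("a", "b"): 2, ("b", "c"): 1}
second = {("a", "b"): 3}
merged_and = merge([first, second], "and")
merged_or = merge([first, second], "or")
merged_sum = merge([first, second], "sum")
merged_linear = merge([first, second], "linear_sum", coefficients=[2, 0.5])
```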

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Merge’ module, Main

Report and Transformed Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Report

Main Report
Main Report presents information of process and data only.


 Tables

Merged Data Table


Merged 1-mode Network data is created.

 Example
After the inputs are dichotomized and ‘Or’ is selected as the merge option, the result is as follows.

 Time Complexity
 O(m)

 Related Topics


Transform >> Layer >> Multiplex

 Menu
Transform >> Layer >> Multiplex

 Description
This function makes a single relation from multiple relations. Let R_i be the i-th relation and let k be the number of relations. Then each (i, j) pair has (R_1(i,j), R_2(i,j), …, R_k(i,j)), which is called a bundle, in the multiple relations. Some pairs have different bundles, while other pairs can have the same bundle (in other words, they have the same link values in every network). This module counts the patterns of relations and encodes each unique bundle (pattern of relations) as a distinct code. Then it sets the (i, j) element of the multiplex matrix, which is the result, to that bundle’s code. If two node pairs have the same code in the multiplex matrix, they have the same link values in every input network. This routine is mainly for categorical relations.
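The bundle encoding can be sketched in Python. The numbering scheme is an assumption for illustration: the all-zero bundle is given code 0 and the other bundles receive 1, 2, … in order of first appearance; the actual codes assigned by NetMiner may differ. The function name and data layout are hypothetical.

```python
def multiplex(nodes, networks):
    """Return (multiplex matrix as a dict, bundle -> code table)."""
    codes = {}
    matrix = {}
    zero_bundle = (0,) * len(networks)
    for i in nodes:
        for j in nodes:
            # The bundle of pair (i, j): its link value in every input network.
            bundle = tuple(network.get((i, j), 0) for network in networks)
            if bundle == zero_bundle:
                matrix[(i, j)] = 0  # assumption: no link in any network -> code 0
                continue
            if bundle not in codes:
                codes[bundle] = len(codes) + 1
            matrix[(i, j)] = codes[bundle]
    return matrix, codes


nodes = ["a", "b", "c"]
friendship = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1}
advice = {("a", "b"): 1, ("c", "a"): 1}
matrix, codes = multiplex(nodes, [friendship, advice])
```

Pairs (a, b) and (c, a) share the bundle (1, 1) and therefore receive the same code, while (b, c) has the bundle (1, 0) and a different code. The `codes` dictionary corresponds to the Bundle Code Table.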

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Multiplex’

module, Main Report and Transformed Result Tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Multiplex Data Table


New transformed 1-mode Network data is generated.

Bundle Code Table


All Bundle Codes are presented in this table.


 Example

 Time Complexity
 O(m * r) where r is # of 1-mode Networks.

 Reference
 Stephen P. Borgatti, Martin G. Everett. Two Algorithms for computing regular equivalence.

Social Networks, v.15, page 371.

 Related Topics


Transform >> Mode >> 2-mode Network

 Menu
Transform >> Mode >> 2-mode Network

 Description
This module transforms 2-mode Network data to proximity (1-mode Network) data in which each value represents the similarity or distance between a pair of main nodes or sub nodes. The proximity matrix among main nodes is called the co-membership matrix, the proximity matrix among sub nodes is called the overlap matrix, and the relationship matrix between main nodes and sub nodes is called the bipartite matrix. This menu lets users construct a result matrix from the co-membership, overlap, and bipartite sub-matrices.

 User Options

 Input
2-mode Network: select 2-mode Network to transform. You can select
multiple networks at once.

- Nodeset: At first, you should select a Sub Nodeset containing 2-mode

Network you want to transform.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.

 Main process

Output Network
Before running this module, you should decide which sub-matrices to create.

- Co-membership (Main * Main): Decide whether to construct the co-membership sub-matrix by applying the specified proximity measure to the ‘row vectors’ of main nodes.

- Overlap (Sub * Sub): Decide whether to construct the overlap sub-matrix by applying the specified proximity measure to the ‘column vectors’ of sub nodes.

- Bipartite (Main * Sub, Sub * Main): Decide whether to insert the bipartite sub-matrix.

Proximity Measures
To create the Co-membership sub-matrix or the Overlap sub-matrix, row profile vectors (the vectors corresponding to the rows of the input data) or column profile vectors (the vectors corresponding to the columns of the input data) are compared.

NetMiner provides three types of measures for comparing two vectors. The first type is ‘Match’. Match measures check whether the corresponding values of two vectors are identical, and they accept only binary vectors as input. The result usually lies between 0 and 1; the closer it is to 1, the more similar the two subjects are. The second type is ‘Correlation’. The larger the result, the more similar the two subjects represented by the vectors are. The third type is ‘Distance’. The larger the result, the less similar, that is, the more distant, the two subjects are.

To construct the same co-membership matrix or overlap matrix as in NetMiner II 2.6, use the ‘Correlation - Inner Product’ measure.

- Match

For selected two nodes’ row profiles R=(R_1, R_2, …, R_n) and S=(S_1, S_2, …, S_n),

a: The number of i with R_i =1 and S_i = 1

b: The number of i with R_i =1 and S_i = 0

c: The number of i with R_i =0 and S_i = 1

d: The number of i with R_i =0 and S_i = 0

Jaccard coefficient = $\frac{a}{a+b+c}$

Simple Matching = $\frac{a+d}{a+b+c+d}$

Ochiai = $\frac{a}{\{(a+b)(a+c)\}^{1/2}}$

Czekanowski, Sorensen, Dice = $\frac{2a}{2a+b+c}$

Russel, Rao = $\frac{a}{a+b+c+d}$

Simpson = $\frac{a}{\min\{(a+b),(a+c)\}}$

Braun, Blanque = $\frac{a}{\max\{(a+b),(a+c)\}}$

Kulczynski1 = $\frac{a}{b+c}$

Kulczynski2 = $\frac{1}{2}\left(\frac{a}{a+b}+\frac{a}{a+c}\right)$

Equivalence Index = $\left(\frac{C_{ij}}{C_i}\right)\left(\frac{C_{ij}}{C_j}\right)$

Sokal, Sneath, Anderberg = $\frac{a}{a+2(b+c)}$

Mountford = $\frac{2a}{a(b+c)+2bc}$

Yule = $\frac{ad-bc}{ad+bc}$

Phi = $\frac{ad-bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}$

Hamman = $\frac{(a+d)-(b+c)}{a+b+c+d}$

Mozley, Margalef = $\frac{a(a+b+c+d)}{(a+b)(a+c)}$

Roger, Tanimoto = $\frac{a+d}{a+2b+2c+d}$

Michael = $\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}$
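The counts a, b, c and d and a few of the Match measures can be sketched as follows. This is only an illustration with hypothetical binary profiles; the function names are assumptions, and the formulas are the standard forms of these coefficients.

```python
import math


def match_counts(r, s):
    """Count the four agreement/disagreement cases of two binary profiles."""
    a = sum(1 for x, y in zip(r, s) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(r, s) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(r, s) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(r, s) if x == 0 and y == 0)
    return a, b, c, d


def jaccard(a, b, c, d):
    return a / (a + b + c)


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def ochiai(a, b, c, d):
    return a / math.sqrt((a + b) * (a + c))


# Hypothetical row profiles of two nodes.
r = [1, 1, 0, 1, 0, 0]
s = [1, 0, 0, 1, 1, 0]

counts = match_counts(r, s)
print(counts, jaccard(*counts), simple_matching(*counts))
```

Several measures are undefined when their denominator is zero (for example, Jaccard for two all-zero profiles); how NetMiner reports such cases is not stated in this reference, so the sketch does not handle them.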

- Correlation

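$C_{ik}$: the k-th element of the profile vector that represents subject i. $\bar{C}_i$ is the mean of the elements of that vector.

Pearson’s Correlation = $\frac{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)(C_{jk}-\bar{C}_j)}{\sqrt{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)^2 \sum_{k=1}^{n}(C_{jk}-\bar{C}_j)^2}}$

Cosine Similarity = $\frac{\sum_{k=1}^{n} C_{ik}C_{jk}}{\sqrt{\sum_{k=1}^{n} C_{ik}^2 \sum_{k=1}^{n} C_{jk}^2}}$

Inner Product = $\sum_{k=1}^{n} C_{ik}C_{jk}$

Spearman’s rho = $1 - \frac{6\sum_{k=1}^{n}(C_{ik}-C_{jk})^2}{n(n^2-1)}$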
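A minimal sketch of three of the Correlation measures, written from their standard definitions. The profile vectors are hypothetical, and the sketch assumes that neither vector is constant or all zero, since the denominators would otherwise be zero.

```python
import math


def inner_product(ci, cj):
    return sum(x * y for x, y in zip(ci, cj))


def cosine_similarity(ci, cj):
    norm = math.sqrt(inner_product(ci, ci) * inner_product(cj, cj))
    return inner_product(ci, cj) / norm


def pearson(ci, cj):
    mean_i = sum(ci) / len(ci)
    mean_j = sum(cj) / len(cj)
    # Pearson's correlation is the cosine of the mean-centered vectors.
    di = [x - mean_i for x in ci]
    dj = [y - mean_j for y in cj]
    return cosine_similarity(di, dj)


# Hypothetical profile vectors.
ci = [1, 2, 3, 4]
cj = [2, 4, 6, 8]   # proportional to ci
ck = [4, 3, 2, 1]   # reversed ci

print(inner_product(ci, cj), cosine_similarity(ci, cj))
print(pearson(ci, cj), pearson(ci, ck))
```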

- Distance

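$C_{ik}$: the k-th element of the profile vector that represents subject i.

Euclidean Distance = $\left\{\sum_{k}(C_{ik}-C_{jk})^2\right\}^{1/2}$

City Block Metric = $\sum_{k}\left|C_{ik}-C_{jk}\right|$

Minkowski Metric = $\left\{\sum_{k} w_k \left|C_{ik}-C_{jk}\right|^{\lambda}\right\}^{1/\lambda}$

Canberra Metric = $\sum_{k}\frac{\left|C_{ik}-C_{jk}\right|}{C_{ik}+C_{jk}}$

Bray-Curtis = $\frac{1}{p}\,\frac{\sum_{k}\left|C_{ik}-C_{jk}\right|}{\sum_{k}(C_{ik}+C_{jk})}$

Divergence = $\frac{1}{p}\sum_{k}\frac{(C_{ik}-C_{jk})^2}{(C_{ik}+C_{jk})^2}$

Soergel = $\frac{\sum_{k}\left|C_{ik}-C_{jk}\right|}{\sum_{k}\max(C_{ik},C_{jk})}$

Bhattacharyya Distance = $\left\{\sum_{k}\left(C_{ik}^{1/2}-C_{jk}^{1/2}\right)^2\right\}^{1/2}$

Wave-Hedges = $\frac{1}{p}\sum_{k}\left(1-\frac{\min(C_{ik},C_{jk})}{\max(C_{ik},C_{jk})}\right)$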
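Some of the Distance measures can be sketched as below, using their standard definitions. The vectors are hypothetical, and the Minkowski exponent `lam` and the optional `weights` argument are illustrative names; with unit weights, `lam=1` gives the City Block Metric and `lam=2` gives the Euclidean Distance.

```python
import math


def euclidean(ci, cj):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ci, cj)))


def city_block(ci, cj):
    return sum(abs(x - y) for x, y in zip(ci, cj))


def minkowski(ci, cj, lam, weights=None):
    weights = weights or [1] * len(ci)
    total = sum(w * abs(x - y) ** lam for w, x, y in zip(weights, ci, cj))
    return total ** (1 / lam)


def canberra(ci, cj):
    # Assumption: terms whose denominator is zero are skipped.
    return sum(abs(x - y) / (x + y) for x, y in zip(ci, cj) if x + y != 0)


def soergel(ci, cj):
    return city_block(ci, cj) / sum(max(x, y) for x, y in zip(ci, cj))


# Hypothetical non-negative profile vectors.
ci = [1, 2, 3]
cj = [2, 4, 3]

print(euclidean(ci, cj), city_block(ci, cj), canberra(ci, cj), soergel(ci, cj))
```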

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Mode >> 2-mode

Network’ module, Main Report and Transformed Result Tables are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.


 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Co-membership matrix Table


Co-membership Matrix: Proximity Matrix among Main NodeSets

Overlap Matrix Table


Proximity Matrix among Sub NodeSets

Bipartite Matrix Table


 Example

 Time Complexity
 O(n^2) when # categories is much smaller than # nodes.


 Reference
 Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling Second Edition. Monographs

on Statistics and Applied Probability 88. CHAPMAN & HALL/CRC. Table 1.1.

 Related Topics


Transform >> Mode >> 1-mode Network

 Menu
Transform >> Mode >> 1-mode Network

 Description
This module creates a proximity (1-mode Network) matrix between main nodes by comparing the row profiles or the column profiles of the selected matrices. If you compare row profiles, the (i,j) value of the result can be interpreted as the similarity between the out-neighbors of nodes i and j. Similarly, if you compare column profiles, the (i,j) value can be interpreted as the similarity between their in-neighbors.

 User Options

 Input
1-mode Network: select 1-mode Network to transform. You can
select multiple networks at once.

 Main process

Dimension
Depending on the option you choose, nodes are compared by their out-neighbors (select row vectors) or by their in-neighbors (select column vectors). In either case, the comparison is made between the two row vectors or the two column vectors that correspond to the two nodes.
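The difference between the two dimensions can be sketched with a small directed adjacency matrix. The data and function names are hypothetical; the inner product is used here only because, for binary data, it simply counts shared neighbors.

```python
def shared_neighbors(profiles):
    """Inner product of every pair of binary profiles = shared neighbors."""
    return [[sum(x * y for x, y in zip(p, q)) for q in profiles]
            for p in profiles]


# Hypothetical directed network: adj[i][j] == 1 means a link from i to j.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

row_profiles = adj                               # out-neighbors of each node
col_profiles = [list(col) for col in zip(*adj)]  # in-neighbors of each node

out_similarity = shared_neighbors(row_profiles)
in_similarity = shared_neighbors(col_profiles)

print(out_similarity[0][1])  # nodes 0 and 1 both send a link to node 2
print(in_similarity[0][3])   # nodes 0 and 3 both receive a link from node 2
```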


Proximity Measures
NetMiner provides three types of measures for comparing two vectors. The first type is ‘Match’. Match measures check whether the corresponding values of two vectors are identical, and they accept only binary vectors as input. The result usually lies between 0 and 1; the closer it is to 1, the more similar the two subjects are. The second type is ‘Correlation’. The larger the result, the more similar the two subjects represented by the vectors are. The third type is ‘Distance’. The larger the result, the less similar, that is, the more distant, the two subjects are.

To construct the same co-membership matrix or overlap matrix as in NetMiner II 2.6, use the ‘Correlation - Inner Product’ measure.

- Match

For selected two nodes’ row profiles R=(R_1, R_2, …, R_n) and S=(S_1, S_2, …, S_n),

a: The number of i with R_i =1 and S_i = 1

b: The number of i with R_i =1 and S_i = 0

c: The number of i with R_i =0 and S_i = 1

d: The number of i with R_i =0 and S_i = 0

Jaccard coefficient = $\frac{a}{a+b+c}$

Simple Matching = $\frac{a+d}{a+b+c+d}$

Ochiai = $\frac{a}{\{(a+b)(a+c)\}^{1/2}}$

Czekanowski, Sorensen, Dice = $\frac{2a}{2a+b+c}$

Russel, Rao = $\frac{a}{a+b+c+d}$

Simpson = $\frac{a}{\min\{(a+b),(a+c)\}}$

Braun, Blanque = $\frac{a}{\max\{(a+b),(a+c)\}}$

Kulczynski1 = $\frac{a}{b+c}$

Kulczynski2 = $\frac{1}{2}\left(\frac{a}{a+b}+\frac{a}{a+c}\right)$

Equivalence Index = $\left(\frac{C_{ij}}{C_i}\right)\left(\frac{C_{ij}}{C_j}\right)$

Sokal, Sneath, Anderberg = $\frac{a}{a+2(b+c)}$

Mountford = $\frac{2a}{a(b+c)+2bc}$

Yule = $\frac{ad-bc}{ad+bc}$

Phi = $\frac{ad-bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}$

Hamman = $\frac{(a+d)-(b+c)}{a+b+c+d}$

Mozley, Margalef = $\frac{a(a+b+c+d)}{(a+b)(a+c)}$

Roger, Tanimoto = $\frac{a+d}{a+2b+2c+d}$

Michael = $\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}$

- Correlation


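$C_{ik}$: the k-th element of the profile vector that represents subject i. $\bar{C}_i$ is the mean of the elements of that vector.

Pearson’s Correlation = $\frac{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)(C_{jk}-\bar{C}_j)}{\sqrt{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)^2 \sum_{k=1}^{n}(C_{jk}-\bar{C}_j)^2}}$

Cosine Similarity = $\frac{\sum_{k=1}^{n} C_{ik}C_{jk}}{\sqrt{\sum_{k=1}^{n} C_{ik}^2 \sum_{k=1}^{n} C_{jk}^2}}$

Inner Product = $\sum_{k=1}^{n} C_{ik}C_{jk}$

Spearman’s rho = $1 - \frac{6\sum_{k=1}^{n}(C_{ik}-C_{jk})^2}{n(n^2-1)}$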

- Distance

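$C_{ik}$: the k-th element of the profile vector that represents subject i.

Euclidean Distance = $\left\{\sum_{k}(C_{ik}-C_{jk})^2\right\}^{1/2}$

City Block Metric = $\sum_{k}\left|C_{ik}-C_{jk}\right|$

Minkowski Metric = $\left\{\sum_{k} w_k \left|C_{ik}-C_{jk}\right|^{\lambda}\right\}^{1/\lambda}$

Canberra Metric = $\sum_{k}\frac{\left|C_{ik}-C_{jk}\right|}{C_{ik}+C_{jk}}$

Bray-Curtis = $\frac{1}{p}\,\frac{\sum_{k}\left|C_{ik}-C_{jk}\right|}{\sum_{k}(C_{ik}+C_{jk})}$

Divergence = $\frac{1}{p}\sum_{k}\frac{(C_{ik}-C_{jk})^2}{(C_{ik}+C_{jk})^2}$

Soergel = $\frac{\sum_{k}\left|C_{ik}-C_{jk}\right|}{\sum_{k}\max(C_{ik},C_{jk})}$

Bhattacharyya Distance = $\left\{\sum_{k}\left(C_{ik}^{1/2}-C_{jk}^{1/2}\right)^2\right\}^{1/2}$

Wave-Hedges = $\frac{1}{p}\sum_{k}\left(1-\frac{\min(C_{ik},C_{jk})}{\max(C_{ik},C_{jk})}\right)$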

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Mode >> 1-mode Network’ module, Main Report and Transformed Result Tables are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.


 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Proximity Measure Data Table

 Time Complexity
 O(n^3)

 Reference
 Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling Second Edition. Monographs

on Statistics and Applied Probability 88. CHAPMAN & HALL/CRC. Table 1.1.

 Related Topics


Transform >> Mode >> Main Node Attribute

 Menu
Transform >> Mode >> Main Node Attribute

 Description
This module creates a proximity (1-mode Network) matrix between main nodes by comparing the main node attribute values in the selected attribute vector. Three measures are provided.

1) Absolute Difference: the (i,j) element of an output matrix = |x_i-x_j|

2) Squared Difference: the (i,j) element of an output matrix = (x_i-x_j)^2

3) Exact Matching: the (i,j) element of an output matrix = 1 if x_i = x_j, 0 otherwise.
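The three measures can be sketched directly from the definitions above. The attribute values and the function name `attribute_proximity` are hypothetical.

```python
MEASURES = {
    "absolute_difference": lambda xi, xj: abs(xi - xj),
    "squared_difference": lambda xi, xj: (xi - xj) ** 2,
    "exact_matching": lambda xi, xj: 1 if xi == xj else 0,
}


def attribute_proximity(values, measure):
    """Build the node-by-node matrix for one attribute vector."""
    compare = MEASURES[measure]
    return [[compare(xi, xj) for xj in values] for xi in values]


# Hypothetical attribute vector for three main nodes.
x = [3, 5, 3]

for name in MEASURES:
    print(name, attribute_proximity(x, name))
```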

 User Options

 Input
Node Attribute: Select the Main Node Attribute vector. You can select
multiple attributes at once. They are used to create the proximity matrix.

 Main process
Proximity Measures: Select proximity measures. Attribute vectors
are compared by selected measure.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Mode >> Main Node Attribute’ module, Main Report and Transformed Result

Tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Proximity Measure Data Table


Resulting dataset contains the new transformed 1-mode Network variable.

 Time Complexity
 O(n^2)

 Reference


 Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling Second Edition. Monographs

on Statistics and Applied Probability 88. CHAPMAN & HALL/CRC. Table 1.1.

 Related Topics


Transform >> Mode >> Tree Construction

 Menu
Transform >> Mode >> Tree Construction

 Description
This module constructs a Tree Structure and an Inclusion Relationship from the input attributes that describe the Tree Hierarchy. The definitions and concepts regarding Tree Structure in NetMiner can be found in the "Using NetMiner" manual: Concept >> Data Structure >> Data Item >> Representing Tree Structure.

 User Options

 Input

Tree Hierarchy Attributes: Select the node attributes that represent the Tree Hierarchy.

- Attribute List View: The selected attributes are listed in this box. The shallower the level of the hierarchy an attribute represents, the higher it should be placed in the box.

- Combo box: When you click the combo box, the list of node attributes is presented. You can add an attribute to the Tree Hierarchy by selecting it in this combo box.

- "+" button: The attribute shown in the combo box is added to the Attribute List View.

- "-" button: When you select an attribute in the Attribute List View and click this button, the selected attribute is removed from the Attribute List View.

- Arrow buttons: You can set the depths of the attributes by clicking the arrow buttons.


Common Attributes: The attributes selected here are saved as attributes of the Tree Nodes. For example, if the 'Team' attribute is selected here, the Tree Node 'Advertising (section)' will have 'Marketing' as the value of its 'Team' attribute.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Tree

Construction’ module, Main Report, Tree Nodes table and Inclusion

Relationship table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Tree Nodes
A Tree Nodeset, which is a special kind of Sub Nodeset, is generated. For each Tree Node, its Parent node is presented. In addition, the attribute data defined in the 'Input' Control Item is also presented.

Inclusion Relationship
The Inclusion Relationship, which is a special kind of 2-mode Network, is generated. For each Main Node, the affiliated Tree Node is presented.
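The two outputs can be pictured with the following sketch. It is not NetMiner's implementation: the records, the labeling scheme "value (attribute)", and the assumption that each Main Node is affiliated with its deepest Tree Node are all illustrative.

```python
def build_tree(records, hierarchy):
    """Derive Tree Nodes (with parents) and the Inclusion Relationship.

    records:   main node label -> {attribute name: value}
    hierarchy: attribute names ordered from the top level downwards
    """
    parents = {}     # Tree Node -> Parent Tree Node (None at the top level)
    inclusion = {}   # Main Node -> affiliated (deepest) Tree Node
    for main_node, attrs in records.items():
        parent = None
        for attr in hierarchy:
            tree_node = f"{attrs[attr]} ({attr.lower()})"
            parents.setdefault(tree_node, parent)
            parent = tree_node
        inclusion[main_node] = parent
    return parents, inclusion


# Hypothetical main nodes with two Tree Hierarchy Attributes.
people = {
    "Ann": {"Team": "Marketing", "Section": "Advertising"},
    "Bob": {"Team": "Marketing", "Section": "Research"},
    "Cho": {"Team": "Sales", "Section": "Domestic"},
}

parents, inclusion = build_tree(people, ["Team", "Section"])
print(parents)
print(inclusion)
```

The sketch identifies a Tree Node by its label alone, so two sections with the same name under different teams would be merged; a real data set would need a unique key per node.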


 Time Complexity
 O(k*n) where k is # of ‘Tree Hierarchy Attributes’

 Reference

 Related Topics
 Analyze >> Position >> Expand/Collapse

 Using NetMiner >> Concept >> Data Structure >> Data Item >> Representing Tree Structure


Transform >> Random >> 1-mode Network >> Erdos-Renyi

 Menu
Transform >> Random >> 1-mode Network >> Erdos-Renyi

 Description
In the G(n, M) model, a graph is chosen uniformly at random from the collection of all graphs that have n nodes and M edges. For example, in the G(3, 2) model, each of the three possible graphs on three vertices and two edges is chosen with probability 1/3.
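A minimal sketch of G(n, M) sampling: choosing M distinct node pairs uniformly at random is equivalent to choosing one graph uniformly from all graphs with n nodes and M edges. The function name `gnm` and its parameters are illustrative, and the sketch does not model NetMiner's weighting or distribution options.

```python
import itertools
import random


def gnm(n, m, directed=False, seed=None):
    """Sample M distinct links among n nodes uniformly at random."""
    rng = random.Random(seed)
    if directed:
        pairs = list(itertools.permutations(range(n), 2))
    else:
        pairs = list(itertools.combinations(range(n), 2))
    # random.sample draws without replacement, so no link is repeated.
    return rng.sample(pairs, m)


# The G(3, 2) example: enumerate every graph on 3 nodes with 2 edges.
all_pairs = list(itertools.combinations(range(3), 2))
all_g32 = list(itertools.combinations(all_pairs, 2))
print(len(all_g32))  # each of these graphs has probability 1 / len(all_g32)

edges = gnm(10, 15, seed=1)
print(edges)
```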

 User Options

 Input

Input Workfile:
- Current Workfile: The generated 1-mode random network uses the Main Nodeset of the current workfile.

- User Defined: The generated 1-mode random network uses a user-defined number of nodes and label prefix.

- # of Nodes: The number of nodes in the generated network.

- Label Prefix: The prefix placed in front of each node label.

 Main Process

- # of Networks: The number of networks to generate.

- Link Option: Sets the number of links in the generated network.

- Number of Links: Sets the total number of links directly.

- Density: Sets the total number of links from the link density of the network.

- Average Degree: Sets the total number of links from the average degree of the network.

- Directed: Generates a directed network.

- Weighted: Generates a weighted network. Each weight is a random value (0 ~ # of Nodes).

- Distribution:

- Uniform: Sets the degree distribution to a uniform distribution.

- Normal: Sets the degree distribution to a normal distribution.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘Erdos-Renyi’ module creates the Main Report and the Generated Network Result Tables.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Erdos-Renyi


 Time Complexity
 O(n)

 Reference
 Paul Erdős and Alfréd Rényi (1959). On Random Graphs. Publ. Math. Debrecen 6, pp. 290–297.

 Related Topics


Transform >> Random >> 1-mode Network >> Scale-Free

 Menu
Transform >> Random >> 1-mode Network >> Scale-Free

 Description
This module generates a Scale-Free Network. The algorithm starts building the 1-mode Network from a set of Starting Nodes. Additional nodes are then attached to the existing nodes under the rule of Preferential Attachment: the more links a node already has, the more likely it is to receive a new link. In other words, the probability that an additional node links to an existing node is proportional to the Degree of that existing node. You can set the number of Starting Nodes and the number of Links per Node.
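Preferential Attachment can be sketched as follows. This is an illustration rather than NetMiner's algorithm: the function name, the fully connected set of starting nodes, and the restriction that the number of links per node cannot exceed the number of starting nodes are all assumptions of the sketch.

```python
import random


def scale_free(n, n_start, links_per_node, seed=None):
    """Grow an undirected network by degree-proportional attachment."""
    if n_start < 2 or not 1 <= links_per_node <= n_start:
        raise ValueError("need n_start >= 2 and 1 <= links_per_node <= n_start")
    rng = random.Random(seed)
    # Assumption: the starting nodes are all linked to each other.
    edges = [(i, j) for i in range(n_start) for j in range(i + 1, n_start)]
    # Each node appears in `stubs` once per incident link, so a uniform
    # draw from `stubs` picks a node with probability proportional to degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(n_start, n):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = scale_free(n=50, n_start=3, links_per_node=2, seed=7)
print(len(edges))
```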

 User Options

 Input

Input Workfile:
- Current Workfile: The generated 1-mode random network uses the Main Nodeset of the current workfile.

- User Defined: The generated 1-mode random network uses a user-defined number of nodes and label prefix.

- # of Nodes: The number of nodes in the generated network.

- Label Prefix: The prefix placed in front of each node label.


 Main Process

- # of Starting Nodes: To generate a scale-free network, you must set the number of starting nodes.

- # of Link per Node: Each node added to the network attaches preferentially to existing nodes with the number of links you specify here.

- # of Networks: The number of networks to generate.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Scale-Free’ module,

Main Report and Generated network Result Tables are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

Scale-Free


 Time Complexity
 O(n)

 Reference
 Albert R., Barabási A.-L. (2002). "Statistical mechanics of complex networks". Rev.

Mod. Phys. 74: 47–[Link].1103/RevModPhys.74.47

 Related Topics


Transform >> Random >> 1-mode Network >> QAP Permutation

 Menu
Transform >> Random >> 1-mode Network >> QAP Permutation

 Description
QAP Permutation generates a 1-mode Network by randomly reordering the Main Nodeset of the Current Workfile.
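The idea of reordering the nodeset can be sketched as applying the same random permutation to the rows and the columns of the seed matrix, so the structure is kept and only the node labels move. The function name `qap_permute` and the sample matrix are hypothetical.

```python
import random


def qap_permute(matrix, seed=None):
    """Relabel nodes: permute rows and columns with one shared ordering."""
    rng = random.Random(seed)
    n = len(matrix)
    order = list(range(n))
    rng.shuffle(order)
    permuted = [[matrix[order[i]][order[j]] for j in range(n)]
                for i in range(n)]
    return permuted, order


# Hypothetical seed network.
matrix = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]

permuted, order = qap_permute(matrix, seed=3)
print(order)
print(permuted)
```

Because rows and columns move together, the set of out-degrees and in-degrees is unchanged; only their assignment to node positions differs.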

 User Options

 Input

1-mode network: Select the “Seed Network” to permute.

Link Merge: If the selected network has multiple links, you can
select the Merge Option.

 Main Process
- # of Networks: The number of networks to generate.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘QAP Permutation’ module creates the Main Report and the Generated Network Result Tables.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables

QAP Permutation

 Time Complexity
 O(1)

 Reference
 Sensitivity of MRQAP tests to collinearity and autocorrelation conditions. David Dekker, David

Krackhardt, Tom A.B. Snijders, Psychometrika Vol. 72. No 4. 563-581 (2007)

 Related Topics


Transform >> Random >> 1-mode Network >> MCMC

 Menu
Transform >> Random >> 1-mode Network >> MCMC

 Description
MCMC generates a new 1-mode Network that preserves the In-Degree and Out-Degree of each node (the column sums and row sums) of the selected “Seed Network”. There are two MCMC options. MCMC[U(X_i+, X_+j)] generates a network whose In-Degrees and Out-Degrees are the same as those of the “Seed Network”. MCMC[U(X_i+, X_+j, MAN)] generates a network under the stricter condition that both the In-Degrees and Out-Degrees and the MAN (Mutual, Asymmetric, Null) dyad census are the same as those of the “Seed Network”.

 User Options

 Input

1-mode network: Select the “Seed Network” to use as input.

- Link Merge: If the selected network has multiple links, you can select the Merge Option.

 Pre-Process
- Dichotomize: The MCMC algorithm requires dichotomized data, so the Dichotomize option is always applied and cannot be unchecked.

 Main Process


- # of Networks: The number of networks to generate.

- Max Iteration: The maximum number of rewiring iterations in the algorithm.

- MCMC Option:

- MCMC[U(X_i+, X_+j)]: Generates new matrices whose row marginal totals and column marginal totals are the same as those of the input matrix. Thus, the in-degree and out-degree of each node in the new matrices are the same as in the original matrix. The results of analyzing the new matrices are used for comparison with the result for the original matrix.

- MCMC[U(X_i+, X_+j, MAN)]: Generates matrices that satisfy the condition above and also have the same dyad census as the original matrix. The results of analyzing the new matrices are used for comparison with the result for the original matrix.
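The following sketch shows one simple way to rewire a binary matrix while keeping every row total and column total fixed. It is a plain link-swap chain written for illustration, not necessarily the sampler used by NetMiner or described in the reference below, and it does not enforce the MAN condition. The function name, the seed matrix, and the assumption of a zero diagonal are all hypothetical.

```python
import random


def rewire(matrix, iterations, seed=None):
    """Swap link targets so that all in- and out-degrees stay unchanged."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        i, k = rng.sample(range(n), 2)   # two distinct source nodes
        j, l = rng.sample(range(n), 2)   # two distinct target nodes
        # Replace i->j and k->l with i->l and k->j, but only if the new
        # links do not exist yet and would not create a self-loop.
        if (i != l and k != j
                and m[i][j] == 1 and m[k][l] == 1
                and m[i][l] == 0 and m[k][j] == 0):
            m[i][j] = m[k][l] = 0
            m[i][l] = m[k][j] = 1
    return m


# Hypothetical dichotomized seed network without self-loops.
seed_network = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]

rewired = rewire(seed_network, iterations=1000, seed=42)
print(rewired)
```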

 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘MCMC’ module creates the Main Report and the Generated Network Result Tables.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report
Main Report presents information of process and data only.

 Tables


MCMC

 Time Complexity
 O(i) i: iteration

 Reference
 Simple methods for simulating sociomatrices with given marginal totals, John M. Roberts Jr.

 Related Topics


II. Analyze
1. Neighbor
 Degree

 Ego Network
 Structural Hole

 Homophily

 Assortativity

 Equicentrality

2. Subgraph
 Dyad Census

 Triad Census

 Triad Combination

 Motif Search

3. Connection
 Shortest Path

 All Path Finding

 All Cycle Finding

 Dependency
 Connectivity >> Node

 Connectivity >> Link

 Minimum Cutset

 Maximum Flow

 Topological Sort

 PFnet

 Influence

 Accessibility

4. Cohesion
 Component


 Bi-Component

 Clique

 Generalized Clique

 n-Clique

 n-Clan

 k-Plex

 k-Core

 Lambda Set

 Community (Betweenness)

 Community (Modularity)

 Community (Eigenvector)

 Community (Label Propagation)

 Community (Blondel)

 Cohesive Block

 s-Clique

5. Centrality
 Degree

 Coreness

 Closeness

 Decay

 Percolation

 Betweenness >> Node

 Betweenness >> Link

 Flow Betweenness

 R.W. Betweenness

 Information

 Load

 Eigenvector

 Status

 Power

 Effects


 PageRank

 Generalized PageRank

 HITS

 Community

6. Equivalence
 Structural >> Profile

 Structural >> CONCOR

 Regular >> REGGE

 Regular >> CatRE

 Role >> Triad

 Role >> Local

 SimRank

7. Position
 Blockmodel (Conventional)

 Brokerage

 Bow-Tie Model

 Expand/Collapse

8. Properties
 Network >> Multiple

 Network >> Modularity

 Group

9. Models
 Dyadic Interaction (p1)

 ERGM (p*)

 Blockmodel (Generalized)

 Influence Network >> Effects

 Influence Network >> Sequence

10. Two Mode


 Degree

 Eigenvector Centrality

 Collaborative Filtering


 Max. Matching


Analyze >> Neighbor >> Degree

 Menu
Analyze >> Neighbor >> Degree

 Description
This module analyzes the degrees and types of the nodes in a network.

Two nodes are 'adjacent' if there is a line between them. A node is 'incident' to a line if the node is one of the pair of nodes defining the line. The 'degree' (of connection) of a node is the number of lines that are incident with it; it measures the size of the node's direct neighborhood. In a directed network, the ‘In-Degree’ of a node is the number of lines to which the node is incident as a target, and its ‘Out-Degree’ is the number of lines to which it is incident as a source. Self-loop links are ignored in the calculation.

Nodes in a directed graph can also be categorized into 5 types. An 'isolate' has no links at all. A 'transmitter' has only out links and no in links. A 'receiver' has only in links and no out links. A 'carrier' has both in-degree and out-degree equal to 1. An 'ordinary' node is any node that does not fall into the aforementioned categories.

'Density' and 'Inclusiveness' are also measured from the degrees. Density is the ratio of the number of lines present to the maximum possible number of lines. Inclusiveness is the number of connected nodes expressed as a proportion of the total number of nodes, where connected nodes are all nodes except isolates.
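The definitions above can be sketched for a directed, binary network as follows. The function name and data are hypothetical, and the maximum possible number of lines is assumed to be n(n - 1), that is, every ordered pair of distinct nodes.

```python
def degree_summary(adj):
    """Return in-degrees, out-degrees, node types, density, inclusiveness."""
    n = len(adj)
    # Self-loops (the diagonal) are ignored, as stated in the description.
    out_deg = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    in_deg = [sum(1 for j in range(n) if j != i and adj[j][i]) for i in range(n)]

    def node_type(i):
        if in_deg[i] == 0 and out_deg[i] == 0:
            return "isolate"
        if in_deg[i] == 0:
            return "transmitter"
        if out_deg[i] == 0:
            return "receiver"
        if in_deg[i] == 1 and out_deg[i] == 1:
            return "carrier"
        return "ordinary"

    types = [node_type(i) for i in range(n)]
    density = sum(out_deg) / (n * (n - 1))
    inclusiveness = (n - types.count("isolate")) / n
    return in_deg, out_deg, types, density, inclusiveness


# Hypothetical network with links 0->1, 0->2, 1->2, 2->3; node 4 is isolated.
adj = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

in_deg, out_deg, types, density, inclusiveness = degree_summary(adj)
print(types, density, inclusiveness)
```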

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Main process
Measure
- # of links: The degree of each node is the number of links that are incident from the node.

- Sum of weight: The degree of each node is the sum of the weights of the links that are incident from the node.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Degree’ analysis,

Main Report, Degree Table, Node Type and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Network Density: The density of network is reported.

- Distribution of Degree: Sum, Mean, Std. Dev, Min, Max, the number of isolates and pendants, and inclusiveness are reported for both In-Degree and Out-Degree.

- Number of Node Type: The number of nodes is reported for each node type.


 Tables
Degree Table
In-Degree value and Out-Degree value are presented for each node.

Node Type Vector


The node type is presented for each node.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Height of a node represents In-Degree, and width of a node represents Out-Degree.

The color of a node means its node type.


 Inspect
Considering the direction of links, this module explores the neighbors of the selected focal node.

 Choose Direction
Selected Direction is the criterion for showing the degree of each

node.

 Select Neighbor
Focal Node
After a focal node is selected, the styles of the focal node and its neighbors change to the node styles predefined in the global options, as follows.

- Focal Node: Node >> Focus Node >> Focal Node

- Neighbors: Node >> Focus Node >> Related Node(s)

- Other Nodes: Node >> Focus Node >> Other Node(s)

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

Direction
The selected Direction is the criterion for finding the neighbors of the focal node. A change of the selected item is reflected on the network map when you click the Submit button.


<Example Screen shot>

 Time Complexity
 O(m)

 Reference
 Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications.
Cambridge University Press.

 Related Topics


Analyze >> Neighbor >> Ego Network

 Menu
Analyze >> Neighbor >> Ego Network

 Description
This module analyzes the local connection structure of each node in a network. An ego network consists of a focal node and the set of alter nodes adjacent to or from the focal node. Basic Ego Network measures include the size and density of each ego network.

If the input data is a weighted network, the density of an ego network can be calculated in two different ways, depending on whether the network is dichotomized. In a dichotomized network, egonet density is ‘(number of lines present) / (maximum possible number of lines)’. In an un-dichotomized network, egonet density is ‘(weight sum) / (maximum possible number of lines)’.

Egonet size is the number of nodes adjacent to or from a focal node.
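Egonet size and density can be sketched as below. The reference does not spell out which lines count toward the density, so the sketch assumes that density is computed among the alters only (excluding the focal node) and that the maximum possible number of lines is k(k - 1) for k alters in a directed network. The function name and data are hypothetical.

```python
def ego_network(adj, ego, direction="both"):
    """Return (egonet size, egonet density) for one focal node."""
    n = len(adj)
    alters = set()
    for v in range(n):
        if v == ego:
            continue
        if direction in ("out", "both") and adj[ego][v]:
            alters.add(v)
        if direction in ("in", "both") and adj[v][ego]:
            alters.add(v)
    size = len(alters)
    possible = size * (size - 1)
    # For binary data this counts lines; for weighted data it sums weights.
    weight_sum = sum(adj[u][v] for u in alters for v in alters if u != v)
    density = weight_sum / possible if possible else 0.0
    return size, density


# Hypothetical network with links 0->1, 0->2, 1->2, 3->0.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

for direction in ("in", "out", "both"):
    print(direction, ego_network(adj, 0, direction))
```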

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running analysis.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.


 Main process
Direction
- In: The links in which the focal node is the target node are included in the ego network.

- Out: The links in which the focal node is the source node are included in the ego network.

- Both: The union of the in-neighbor ego network and the out-neighbor ego network is analyzed.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Ego Network’ analysis,

Main Report, Egonet Details and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Egonet measures: The mean, standard deviation, minimum and maximum value of the egonet measures are reported.

 Tables
Ego Network Details
- Egonet Size: The number of alter nodes (nodes that are adjacent to the focal node). The focal node itself is not included in the egonet size.

- Egonet Density: The density of the Ego Network.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Node with bigger EgoNet size is presented bigger on the map.

 Inspect
This module explores the k-neighbors of the selected focal node by given distance and direction.

 k-Neighbor
Focal Node
After a focal node is selected, the styles of the focal node and its neighbors change to the node styles predefined in the global options, as follows.

- Focal Node: Node >> Focus Node >> Focal Node

- Neighbors: Node >> Focus Node >> Related Node(s)

- Other Nodes: Node >> Focus Node >> Other Node(s)

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

Distance
Selected Distance (k) is a criterion for finding the k-neighbors of the focal node.

Direction
Selected Direction is the criterion for finding k-neighbors of the focal node.

A change of the selected item is reflected on the network map when you click the Submit button.

<Example Screen shot: Distance=2, In-direction>

 Time Complexity
 O(m)

 Reference
 Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications.
Cambridge University Press.

 Related Topics


Analyze >> Neighbor >> Structural Hole

 Menu
Analyze >> Neighbor >> Structural Hole

 Description
Burt’s (1992) Structural Hole analysis examines the local connection structure of each node in a network. Under the assumption that a non-redundant relation is an efficient relation, it measures the redundancy, efficiency, effective size, constraint and hierarchy of each node.

- Redundancy: For node i, the portion of node i’s relationship with node j that is redundant with node i's relationships with other primary contacts that are also connected to j. If the information node i receives from node j is taken to be 1, redundancy is the amount of that information that node i can also receive from other nodes. High redundancy of i therefore means that i is not managing its network efficiently.

- Effective Size: the sum of (1 – redundancy) over all alters.

- Efficiency: the effective size divided by the number of alters.

- Constraint: a measure of the extent to which ego is invested in alters who are themselves invested in ego's other alters.

- Hierarchy: the extent to which the constraint on ego is concentrated in a single alter. It ranges from 0 to 1; if hierarchy is 1, the constraint on node i is concentrated in only one alter.
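The measures can be sketched from Burt's (1992) standard definitions, which may differ in detail from NetMiner's implementation. In this sketch, effective size is the sum of (1 - redundancy) over the alters, efficiency is the effective size divided by the number of alters, and the constraint from alter j is the squared sum of the direct and indirect proportional investment in j. Hierarchy is omitted. The function name and data are hypothetical, and a zero diagonal is assumed.

```python
def structural_holes(z, ego):
    """Redundancy, effective size, efficiency and constraint of one ego."""
    n = len(z)

    def tie(a, b):
        return z[a][b] + z[b][a]

    def p(a, b):
        # Proportion of a's total relations that is invested in b.
        total = sum(tie(a, k) for k in range(n) if k != a)
        return tie(a, b) / total if total else 0.0

    def m(a, b):
        # Strength of a's tie to b relative to a's strongest tie.
        strongest = max(tie(a, k) for k in range(n) if k != a)
        return tie(a, b) / strongest if strongest else 0.0

    alters = [j for j in range(n) if j != ego and tie(ego, j) > 0]
    redundancy = {j: sum(p(ego, q) * m(j, q) for q in alters if q != j)
                  for j in alters}
    constraint = {j: (p(ego, j) + sum(p(ego, q) * p(q, j)
                                      for q in alters if q != j)) ** 2
                  for j in alters}
    effective_size = sum(1 - r for r in redundancy.values())
    efficiency = effective_size / len(alters) if alters else 0.0
    return {
        "redundancy": redundancy,
        "effective_size": effective_size,
        "efficiency": efficiency,
        "constraint": constraint,
        "aggregate_constraint": sum(constraint.values()),
    }


# Hypothetical undirected network: triangle 0-1-2 plus a pendant link 0-3.
z = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

result = structural_holes(z, ego=0)
print(result["effective_size"], result["efficiency"], result["aggregate_constraint"])
```

In this example, alters 1 and 2 are linked to each other and are therefore partly redundant for ego 0, while alter 3 has zero redundancy because it reaches no other alter.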

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Structural Hole’

analysis, Main Report, Structural Hole Measures, Redundancy Matrix,

Constraint Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Structural Hole Measures: The mean, standard deviation, minimum and maximum value of the structural hole measures are reported.

 Tables
Structural Hole Measures
Efficiency, Effective Size, Aggregated Constraint and Hierarchy are shown.


 Redundancy Matrix
portion of node i's relationship with node j

that is redundant to node i's relations with

other primary contacts

 Constraint Matrix
measure of the extent to which ego is invested

in people who are invested in other of ego's

alters

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Nodes with a higher Structural Hole score are drawn larger on the map.

 Inspect
Structural Hole inspects Burt's Structural Hole measures (Constraint, Redundancy) for the neighbors of the selected focal node.

 Choose Measure
Structural Hole Measure
The size of each node is set according to the selected Structural Hole measure.

 Structural Hole
Focal Node
Selecting a Focal Node changes the style of the focal node and its neighbor nodes on the network map to the node styles pre-established in the global options, as follows.

- Focal Node: Focus Node - Focal Node

- Neighbors: Focus Node - Related Node(s)

- Other Nodes: Focus Node - Other Node(s)

You can search for a node by typing part of its Node Label in the text box. To select the node, click its Node Label in the search results listed below the text box.

Constraint and Redundancy Selection


Click the Constraint (or Redundancy) radio button to choose which value is displayed next to each neighbor node on the network map. The change is applied to the network map when you click the Submit button.

<Example Screen shot>

You can see the Constraint (or Redundancy) value at the right side of each node label.

 Time Complexity
 O(m)

 Reference
 Burt, R.S. (1992). Structural Holes: The social structure of competition. Cambridge:
Harvard University Press

 Related Topics


Analyze >> Neighbor >> Homophily

 Menu
Analyze >> Neighbor >> Homophily

 Description
Homophily compares the selected attribute between each focal node and its neighbors.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

Select Vector: Select a Main Node Attribute. The selected vector is used as the criterion for comparing the focal node with its neighbors.

 Main process
Type of Neighbor: Decide which type of neighbors should be compared. In-neighbors are the neighbors for which Ego is the target node, and Out-neighbors are the neighbors for which Ego is the source node.

Type of Attribute value: Select the type of the attribute chosen in the Input stage. You can select Categorical variable or Continuous variable.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Homophily’

analysis, Main Report, Homophily table and Link Information table

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Homophily: Categorical
For each node, the composition of its neighbors is presented. This table is created when you select 'Categorical Variable' in the Main process.

– Ego's value: The value of the selected attribute for each node.

– # of Neighbors [A]: The number of in-neighbors or out-neighbors of each node.

– [A]/Total Number of Links (%): The number of links of each node divided by the number of all links.

– # of Neighbors with value="1.0": The number of neighbors whose attribute value is 1.

– "1.0"/[A] (%): The percentage of neighbors whose attribute value is 1 (number of neighbors with attribute value 1 / number of neighbors).

– "1.0"/Total # of nodes with value="1.0": The number of neighbors with attribute value 1 divided by the number of nodes with attribute value 1.

The last three columns (# of Neighbors with value="1.0", "1.0"/[A] (%), "1.0"/Total # of nodes with value="1.0") are generated for each value of the attribute.

Homophily: Continuous
This table is created when you select 'Continuous Variable' in the Main process.

– Ego: The attribute value of each node.

– Neighbor's Mean: The mean of the attribute values of the neighbors.

– Neighbor's Std.Dev: The standard deviation of the attribute values of the neighbors.

– Neighbor's Std.Dev. from Ego: The standard deviation of the attribute values of the neighbors, computed around Ego's attribute value instead of the neighbors' mean.
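The columns of the continuous table can be illustrated with a short sketch. It assumes links are given as (source, target) tuples and that the standard deviations divide by the number of neighbors (population form); the manual does not state which form NetMiner uses, and the function name `continuous_homophily` is hypothetical.

```python
import math


def continuous_homophily(edges, attr, ego, direction="out"):
    """Neighbor statistics of a continuous attribute for one ego."""
    if direction == "out":
        # Out-neighbors: ego is the source node of the link.
        neighbors = [t for s, t in edges if s == ego]
    else:
        # In-neighbors: ego is the target node of the link.
        neighbors = [s for s, t in edges if t == ego]
    values = [attr[v] for v in neighbors]
    if not values:
        return None
    n = len(values)
    mean = sum(values) / n
    # Spread around the neighbors' own mean (population form, assumed).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    # Spread around ego's value instead of the neighbors' mean.
    sd_from_ego = math.sqrt(sum((x - attr[ego]) ** 2 for x in values) / n)
    return {"ego": attr[ego], "mean": mean, "sd": sd, "sd_from_ego": sd_from_ego}


edges = [("a", "b"), ("a", "c"), ("d", "a")]
attr = {"a": 10, "b": 12, "c": 16, "d": 4}
out_stats = continuous_homophily(edges, attr, "a", direction="out")
in_stats = continuous_homophily(edges, attr, "a", direction="in")
```

A small deviation from Ego indicates that the neighbors' values are close to ego's own value, whatever their spread around their own mean.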

Link Information
For each link, the attribute values of source node and target node are presented.

 Time Complexity
 O(m)

 Related Topics


Analyze >> Neighbor >> Assortativity

 Menu
Analyze >> Neighbor >> Assortativity

 Description
Assortative mixing measures the degree to which nodes with a high (low) attribute value (by default, degree) tend to be connected to other nodes with a high (low) attribute value. Given the assortativity coefficient r, a network is assortative if r > 0, disassortative if r < 0, and has no assortative mixing if r = 0.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

Attribute Vector: The selected vector is used to test the association between links and the attribute.

- Degree: Use Degree.

- User Defined: Select Main Node Attribute.

 Pre-process
Dichotomize: You should dichotomize your data before running the analysis. By dichotomizing, weighted/valued data is transformed to unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. By symmetrizing, directed/asymmetric data is transformed to undirected/symmetric data. Symmetrized data also makes the algorithm run faster.

 Main process
Diagonal Handling Option: If you select 'retain', diagonal values are included in the analysis. If you select 'ignore', diagonal values are excluded from the analysis.

 Post-process
Significance Test: You can test the significance of your data. The available method is the 'Quadratic Assignment Procedure (QAP)'. It generates new networks by permuting rows and columns, and computes the expected value over those networks. The 'Iteration' option controls how many networks are generated.
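The permutation idea behind this test can be sketched as follows. This is only an illustration under stated assumptions: the attribute is treated as a fixed vector, so permuting rows and columns of the network together is emulated by shuffling the attribute values over the nodes; `qap_test` and `same_value_links` are hypothetical names, and NetMiner's own procedure may differ in detail.

```python
import random


def qap_test(nodes, edges, attr, statistic, iterations=1000, seed=0):
    """Permutation test of a link/attribute statistic (QAP-style sketch)."""
    rng = random.Random(seed)
    observed = statistic(edges, attr)
    values = [attr[v] for v in nodes]
    samples = []
    for _ in range(iterations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        # Reassign the attribute values to nodes at random.
        samples.append(statistic(edges, dict(zip(nodes, shuffled))))
    return {
        "observed": observed,
        "expected": sum(samples) / iterations,
        # Share of permuted values at least / at most the observed value.
        "p_ge": sum(s >= observed for s in samples) / iterations,
        "p_le": sum(s <= observed for s in samples) / iterations,
    }


def same_value_links(edges, attr):
    # Example statistic: number of links joining equal-valued nodes.
    return sum(1 for s, t in edges if attr[s] == attr[t])


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d")]
attr = {"a": 1, "b": 1, "c": 2, "d": 2}
result = qap_test(nodes, edges, attr, same_value_links, iterations=1000)
```

Any statistic with the same signature, such as an assortativity coefficient, can be passed in place of `same_value_links`.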

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Assortativity’ analysis,

Main Report and Assortativity Table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Result of Autocorrelation Assortativity: Observed, Expected, Std.Dev., P(>=Obs.), P(==Obs.) and P(<=Obs.) are reported. P(>=Obs.) is the proportion of permuted networks whose value is greater than or equal to the observed value.

 Tables
Assortativity Table
For each link, the attribute values of the source node and the target node are reported. The Pearson correlation between these two columns is the assortativity (degree assortativity when Degree is used as the attribute).
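The correlation described here can be sketched directly from a link list. The example assumes an undirected network, so each link contributes both (source, target) and (target, source) value pairs, as in Newman (2002); the function name `assortativity` is hypothetical.

```python
import math


def assortativity(edges, attr, symmetric=True):
    """Pearson correlation of attribute values at the two ends of links."""
    pairs = [(attr[s], attr[t]) for s, t in edges]
    if symmetric:
        # Undirected link: count it once in each direction.
        pairs += [(y, x) for x, y in pairs]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined without variation
    return cov / math.sqrt(var_x * var_y)


# Star network with degree as the attribute: the hub links only to leaves.
star_edges = [(0, 1), (0, 2), (0, 3)]
star_degree = {0: 3, 1: 1, 2: 1, 3: 1}
r_star = assortativity(star_edges, star_degree)

# Path 0-1-2-3 with degree as the attribute.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_degree = {0: 1, 1: 2, 2: 2, 3: 1}
r_path = assortativity(path_edges, path_degree)
```

Both toy networks are disassortative by degree, since high-degree nodes attach mainly to low-degree nodes.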

 Time Complexity
 O(m)

 Reference
 M. E. J. Newman (2002). Assortative mixing in networks. Phys. Rev. Lett. 89, 208701.

 Related Topics


Analyze >> Neighbor >> Equicentrality

 Menu
Analyze >> Neighbor >> Equicentrality

 Description
Equicentrality is a measure that quantifies the degree of similarity between the centralities of connected nodes. A high Equicentrality for a network means that nodes with high centrality tend to connect to other nodes with high centrality, and similarly for low centrality.

This is similar to the measure 'Assortativity' (Newman 2002), in that both tell how similar linked nodes are to each other in the network. However, Assortativity is based on the Pearson correlation, whereas Equicentrality is based on the Euclidean distance.

This difference has two notable consequences (S. M. Kang 2007). First, with Assortativity, a network whose nodes are connected to very similar nodes, but whose values are not linearly related, may have a low correlation value (and thus low Assortativity). Equicentrality addresses this problem by using the Euclidean distance instead.

Second, because of the intrinsic characteristics of the two similarity measures (Pearson correlation and Euclidean distance), Equicentrality is more robust to missing links. Particularly for small and dense networks (N < 500), Equicentrality outperforms Assortativity in terms of the error caused by missing data (S. M. Kang 2007).

Equicentrality is calculated by the expression presented below.

Notations

E : set of links in the network

ci,e : centrality of one node incident to link e

cj,e : centrality of the other node incident to link e

The denominator in the second term (max{…}) is determined by assuming the most unequally connected network (a network with a star topology). As the differences between the centralities of connected nodes become large, the second term in the expression approaches 1, so the Ec value approaches zero. In the opposite case, the second term is small, so the Ec value is close to 1.

Three kinds of centrality are available: Degree Centrality, Betweenness Centrality and Closeness Centrality.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

Symmetrize: You can symmetrize your data before running module.


By symmetrizing, directed/asymmetric data is transformed to undirected/symmetric data.


 Main process
Centrality Measure: Select which centrality measure should be used
to compute Equicentrality measure. For the Closeness centrality, user

should decide how to handle unreachable nodes.

 Post-process
Significance Test: You can test the significance of your data. The available method is the 'Quadratic Assignment Procedure (QAP)'. It generates new networks by permuting rows and columns, and computes the expected value over those networks. The 'Iteration' option controls how many networks are generated.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Equicentrality’ analysis,

Main Report and Centrality vector are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Result of Equicentrality: Observed, Expected, Std.Dev., P(>=Obs.), P(==Obs.) and P(<=Obs.) are reported. P(>=Obs.) is the proportion of permuted networks whose value is greater than or equal to the observed value.

 Tables
Centrality Vector
The centrality values of selected measure are reported.

 Time Complexity
 Degree centrality: O(m)

 Betweenness centrality: O(n^3)

 Closeness centrality: O(n^3)

 Reference
 Soong Moon Kang, A note on measures of similarity based on centrality. Social Networks
29(2007) 137-142.

 Related Topics
 Analyze >> Neighbor >> Assortativity

 Analyze >> Centrality >> Degree

 Analyze >> Centrality >> Closeness

 Analyze >> Centrality >> Betweenness >> Node


Analyze >> Subgraph >> Dyad Census

 Menu
Analyze >> Subgraph >> Dyad Census

 Description
A dyad is composed of two nodes and the possible links between them. There are three types of dyad, abbreviated as MAN, and every relation pattern between two nodes can be represented by one of these types. The 'Dyad Census' module computes the MAN indices, which are:

a) Number of mutually connected pairs (Mutual, ‘M’ of ‘MAN’),

b) Number of asymmetrically connected pairs (Asymmetric, ‘A’ of ‘MAN’),

c) Number of not connected pairs (Null, ‘N’ of ‘MAN’).
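A minimal sketch of the three counts, assuming a directed network given as a node list and a list of (source, target) links; the function name `dyad_census` is chosen for this example only.

```python
from itertools import combinations


def dyad_census(nodes, edges):
    """Return the (Mutual, Asymmetric, Null) counts of a directed network."""
    links = {(s, t) for s, t in edges if s != t}  # ignore self-loops
    mutual = asymmetric = null = 0
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in links, (v, u) in links
        if forward and backward:
            mutual += 1
        elif forward or backward:
            asymmetric += 1
        else:
            null += 1
    return mutual, asymmetric, null


nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
census = dyad_census(nodes, edges)
```

The three counts always sum to n(n - 1)/2, the number of unordered node pairs.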

Moreover, for the significance test, the algorithm simulates the dyad census on uniformly randomly generated matrices that have the same row marginal totals and column marginal totals as the given sociomatrix, and reports summary statistics such as the expected value and standard deviation. This simulation mechanism is called MCMC[U(X_{i+},X_{+j})].

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Post-process
Significance Test: You can test the significance of your data. When the significance test is run, it generates new networks with the same properties as the input network and counts the Mutual, Asymmetric and Null dyads in those networks. The available method for the significance test is Markov Chain Monte Carlo (MCMC). In the 'Iterations' option, you can decide how many matrices are generated for the significance test. More iterations give a more reliable result at the cost of more computing time.

- MCMC[U(Xi+, X+j)]: Generates new matrices whose row marginal totals and column marginal totals are the same as those of the input matrix. Thus, the out-degree and in-degree of every node in the new matrices are the same as in the original matrix. The MAN counts of the new matrices are compared with the MAN counts of the original matrix.
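One common way to generate matrices with fixed row and column totals is repeated link swapping. The sketch below is a simplified illustration of that idea only: it uses pairwise swaps (a→b, c→d become a→d, c→b), which keep every out-degree and in-degree unchanged, and it does not claim to reproduce the method of Roberts (2000) or to sample uniformly. The names `randomize_fixed_degrees` and `degree_sequences` are hypothetical.

```python
import random


def randomize_fixed_degrees(edges, swaps=1000, seed=0):
    """Shuffle a directed network while keeping in- and out-degrees."""
    rng = random.Random(seed)
    links = list({(s, t) for s, t in edges if s != t})
    present = set(links)
    if len(links) < 2:
        return links
    for _ in range(swaps):
        i, j = rng.sample(range(len(links)), 2)
        (a, b), (c, d) = links[i], links[j]
        # Proposed swap: a->b, c->d  becomes  a->d, c->b.
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in present or (c, b) in present:
            continue  # would create a duplicate link
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        links[i], links[j] = (a, d), (c, b)
    return links


def degree_sequences(links):
    """Out-degree and in-degree of every node that has links."""
    out_deg, in_deg = {}, {}
    for s, t in links:
        out_deg[s] = out_deg.get(s, 0) + 1
        in_deg[t] = in_deg.get(t, 0) + 1
    return out_deg, in_deg


original = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
randomized = randomize_fixed_degrees(original, swaps=200, seed=1)
```

Computing the dyad census on many such randomized networks gives the expected value and standard deviation against which the observed census is compared.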

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Dyad Census’

analysis, Main Report and Dyad Census Table are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Dyad Census: Observed values and statistics resulting from the Dyad Census simulation are shown in the Main Report: observed, expected, standard deviation, standard error and variance. If the significance test is not performed, only the observed value is displayed.

When you interpret the result, compare the observed and the expected values. For example, assume that the observed number of Mutuals is 15, the expected number is 10, and the standard deviation is 0.5. The observed value is then (15 - 10) / 0.5 = 10 standard deviations above the expected value, so mutuality in the observed network is much higher than expected.

 Tables
Dyad Census Table: The observed value, expected value, standard deviation, standard error and variance are reported. If you do not run the significance test, only the observed value is reported.

 Time Complexity
 O(n^2 * number of iterations)


 Reference
 John M. Roberts Jr., 2000. Simple methods for simulating sociomatrices with given marginal
totals. Social Networks 22, 273-283

 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications,
Cambridge, 1994, p. 510, 13.3 Dyads

 Related Topics


Analyze >> Subgraph >> Triad Census

 Menu
Analyze >> Subgraph >> Triad Census

 Description
A triad is composed of three nodes and the links between them. There are 16 types of triad, and every relation pattern between three nodes can be represented by one of these types. The name and shape of each triad are presented in the following figure.

'Triad Census' analysis counts the 16 triad isomorphism classes. For the significance test, the algorithm computes the triad census on randomly generated matrices that have the same row marginal totals, column marginal totals and, optionally, dyad census as the given sociomatrix, and reports summary statistics such as the expected value and standard deviation.

 Process Flow

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Post-process
Significance Test: You can test the significance of your data. When the significance test is run, it generates new networks with the same properties as the input network and counts the 16 triad isomorphism classes in those networks. The available method for the significance test is Markov Chain Monte Carlo (MCMC). In the 'Iterations' option, you can decide how many matrices are generated for the significance test. More iterations give a more reliable result at the cost of more computing time.

- MCMC[U(Xi+, X+j)]: Generates new matrices whose row marginal totals and column marginal totals are the same as those of the input matrix. Thus, the out-degree and in-degree of every node in the new matrices are the same as in the original matrix. The result for the new matrices is compared with the result for the original matrix.

- MCMC[U(Xi+, X+j, MAN)]: Generates matrices that satisfy the above condition and also have the same dyad census as the original matrix. The result for the new matrices is compared with the result for the original matrix.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Triad Census’

analysis, Main Report and Triad Census Table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Observed values and statistics resulting from the Triad Census simulation are reported. For the 16 triad isomorphism classes, Observed, Expected, Standard Deviation, Standard Error and Variance are presented. If the significance test is not performed, only Observed is presented.

When you interpret the result, compare the observed and the expected values. For example, assume that the observed number of 111D triads is 120, the expected number is 100, and the standard deviation is 5. The observed value is then (120 - 100) / 5 = 4 standard deviations above the expected value, so 111D triads occur more often than expected.

 Tables
Triad Census Table
Observed values and statistics resulting from the Triad Census simulation are presented. For the 16 triad isomorphism classes, observed, expected, standard deviation, standard error and variance are reported. If you do not run the significance test, these statistics cannot be computed, so only the observed value is reported.

 Time Complexity
 O(n^2 * number of iterations)

 Reference
 Holland, P.W., and Leinhardt, S. (1970). A method for detecting structure in sociometric data. American Journal of Sociology. 76, 492-513.

 Davis, J.A., and Leinhardt, S. (1968). The structure of positive interpersonal relations in small
groups. In Berger, J. (ed.), Sociological Theories in Progress. Volume 2, pages 218-251. Boston:

Houghton Mifflin.

 John M. Roberts Jr., 2000. Simple methods for simulating sociomatrices with given marginal
totals. Social Networks 22, 273-283

 Related Topics
Analyze >> Subgraph >> Dyad Census


Analyze >> Subgraph >> Triad Combination

 Menu
Analyze >> Subgraph >> Triad Combination

 Description
'Triad Combination' analysis lets users work with the Triad Census result operationally. The algorithm assigns a coefficient to each Triad Census count and combines the counts linearly. That is, you can weight the result of Triad Census with real numbers. Triad Combination reports the observed value, expected value, standard deviation and Tau value of the linear combination of a triad census.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Main process
Weight Vector: Set the linear combination coefficient of each triad pattern. The Triad Combination module multiplies the count of each triad type by its coefficient. For example, a triad type whose coefficient is set to 0 does not contribute to the linear combination.

The default vector (0,0,0,0,0,0,0,0,1,0,0,2,2,1,3,6) represents transitivity-like measures.

 Post-process
Significance Test: You can test the significance of your data. When the significance test is run, it generates new networks with the same properties as the input network and counts the 16 triad isomorphism classes in those networks. The available method for the significance test is Markov Chain Monte Carlo (MCMC). In the 'Iterations' option, you can decide how many matrices are generated for the significance test. More iterations give a more reliable result at the cost of more computing time.

- MCMC[U(Xi+, X+j)]: Generates new matrices whose row marginal totals and column marginal totals are the same as those of the input matrix. Thus, the out-degree and in-degree of every node in the new matrices are the same as in the original matrix. The result for the new matrices is compared with the result for the original matrix.

- MCMC[U(Xi+, X+j, MAN)]: Generates matrices that satisfy the above condition and also have the same dyad census as the original matrix. The result for the new matrices is compared with the result for the original matrix.

- MCMC[U(MAN)]: Generates matrices that have the same dyad census as the given matrix. The result for the new matrices is compared with the result for the original matrix.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Triad

Combination’ analysis, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Observed values and statistics resulting from the linear combination of the triad census are reported. Observed, Expected, Standard Deviation, Standard Error and Variance are presented.

- Tau: The normalized value of the linear combination defined by the weight vector. Tau is usually assumed to follow an approximately normal distribution with mean 0 and variance 1.

- p-value of Tau: The p-value of Tau for a two-tailed test, computed under the above condition (normal distribution with mean 0 and variance 1).
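The weighted combination and its Tau value can be sketched as follows. Two points are assumptions of this example rather than statements from the manual: the weight vector is taken to follow the conventional triad order (003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300), and Tau is computed as (observed - expected) / standard deviation over the simulated networks. The function names are hypothetical.

```python
import math

# Conventional ordering of the 16 triad types (assumed for this sketch).
TRIAD_TYPES = ["003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300"]
DEFAULT_WEIGHTS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 2, 1, 3, 6]


def combine(census, weights=DEFAULT_WEIGHTS):
    """Weighted sum of triad counts; missing types count as zero."""
    return sum(w * census.get(t, 0) for t, w in zip(TRIAD_TYPES, weights))


def tau_statistic(observed_census, simulated_censuses, weights=DEFAULT_WEIGHTS):
    """Standardize the observed combination against simulated ones."""
    observed = combine(observed_census, weights)
    sims = [combine(c, weights) for c in simulated_censuses]
    expected = sum(sims) / len(sims)
    variance = sum((x - expected) ** 2 for x in sims) / len(sims)
    if variance == 0:
        raise ValueError("Tau is undefined when the simulations do not vary")
    tau = (observed - expected) / math.sqrt(variance)
    # Two-tailed p-value under a standard normal distribution.
    p_value = math.erfc(abs(tau) / math.sqrt(2))
    return {"observed": observed, "expected": expected,
            "variance": variance, "tau": tau, "p_value": p_value}


observed_census = {"030T": 4, "300": 1}
simulated = [{"030T": 6}, {"030T": 8}]
stats = tau_statistic(observed_census, simulated)
```

With the default weights, each triad type is weighted by the number of transitive triples it contains, which is why the combination behaves like a transitivity measure.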


 Time Complexity
 O(n^2 * number of iterations)

 Reference
 Holland, P.W., and Leinhardt, S. (1970). A method for detecting structure in sociometric data. American Journal of Sociology. 76, 492-513.

 Davis, J.A., and Leinhardt, S. (1968). The structure of positive interpersonal relations in small
groups. In Berger, J. (ed.), Sociological Theories in Progress. Volume 2, pages 218-251. Boston:

Houghton Mifflin.

 John M. Roberts Jr., 2000. Simple methods for simulating sociomatrices with given marginal
totals. Social Networks 22, 273-283

 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications,
Cambridge, 1994, p. 510, 13.3 Dyads

 Related Topics
 Analyze >> Subgraph >> Dyad Census

 Analyze >> Subgraph >> Triad Census


Analyze >> Subgraph >> Motif Search

 Menu
Analyze >> Subgraph >> Motif Search

 Description
This module performs motif analysis for a user-defined subgraph. You can draw the subgraph structure of interest by using the Graph Editor.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network at a time.
- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Main Process
Search Motif: Draw the motif to search for. To draw a new node, left-click on blank space. To draw a new link, click a node and then click another node. To delete a node or link, click it, then right-click it and choose 'delete' from the menu.

 Post-process
Significance Test: You can test the significance of your data.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Motif Search’ analysis, Main Report, Instance List Table, Frequencies Table,

Instance Affiliation Table and Spring Map are created.


 Outputs
Each output is separated by Inner Tab at the bottom of Output Window.

 Reports
- Main Report

 Tables
- Instance List


- Frequencies:

 Maps
- Spring Map

 Time Complexity
 O(n^2 * number of iterations)


 Reference
 Grochow and Kellis, 2007, Network Motif Discovery Using Subgraph Enumeration and
Symmetry-Breaking

 Related Topics
Analyze >> Subgraph >> Dyad Census

Analyze >> Subgraph >> Triad Census


Analyze >> Connection >> Shortest Path

 Menu
Analyze >> Connection >> Shortest Path

 Description
Shortest Path analyzes indirect connectedness among the nodes in a network. A path is a sequence of nodes and lines in which all nodes and lines are distinct.

A geodesic (or shortest) path between a pair of nodes is the path between those nodes whose length is shortest. The geodesic distance between a pair of nodes is the length of the shortest path between them. If there is a path between two nodes, they are said to be reachable.

In this module, the calculation method depends on whether the input network data is weighted, so select the options that match your intention.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running module.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.


 Main process
Weight Option: Decide how to handle the weight value of each link. If you dichotomized your data at the pre-process step, the weight option has no effect.

- Reverse: Weight is strength. The larger the link weight, the closer the two nodes are. The link weight is therefore transformed to a distance using a linear interval reverse function. For example, if the minimum weight is 1 and the maximum weight is 3, 1 is transformed to 3 and 3 is transformed to 1.

- As is: Weight is cost, so the link weight is used directly as the distance between nodes. The larger the link weight, the farther apart the two nodes are. In this case, the geodesic distance is the smallest sum of link weights along a path between the two nodes.
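The Reverse option can be illustrated with a one-line transform. The formula distance = max + min - weight is an assumption of this sketch; it is chosen because it reproduces the example above (1 becomes 3 and 3 becomes 1), and the exact function NetMiner applies is not stated here. The name `reverse_weights` is hypothetical.

```python
def reverse_weights(weights):
    """Turn strength-like link weights into distances by linear reversal."""
    lo, hi = min(weights.values()), max(weights.values())
    # The strongest link gets the smallest distance, and vice versa.
    return {link: hi + lo - w for link, w in weights.items()}


weights = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3}
distances = reverse_weights(weights)
```

With the As is option no transform is needed, because the weights are already interpreted as distances.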

Closeness transform: converts the distance matrix to a closeness matrix by the selected method. In the formulas below, $D_{ij}$ is the distance between nodes i and j, $D_{\max}$ is the maximum distance between nodes (excluding infinity), $D_{\min}$ is the minimum distance between nodes, and $C_{ij}$ is the resulting closeness value.

- Linear: $C_{ij} = 1$ if $i = j$; $C_{ij} = 0$ if $D_{ij} = \infty$; otherwise $C_{ij} = 1 - (D_{ij} - D_{\min}) / D_{\max}$.

- Division: $C_{ij} = D_{\max} + D_{\min}$ if $i = j$; $C_{ij} = 0$ if $D_{ij} = \infty$; otherwise $C_{ij} = D_{\max} / D_{ij}$.

- Subtract: $C_{ij} = D_{\max} + D_{\min}$ if $i = j$; $C_{ij} = 0$ if $D_{ij} = \infty$; otherwise $C_{ij} = (D_{\max} + D_{\min}) - D_{ij}$.

- Fixed Decay (+Beta): $C_{ij} = 1$ if $i = j$; $C_{ij} = 0$ if $D_{ij} = \infty$; otherwise $C_{ij} = \beta^{D_{ij}}$.
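The four transforms listed above can be sketched as one function. The example assumes the distance matrix is a list of lists with `math.inf` for unreachable pairs, and that the minimum and maximum distances are taken over finite off-diagonal cells; the second point is an interpretation, since the manual does not say whether the diagonal is included. The function name `closeness_transform` is hypothetical.

```python
import math


def closeness_transform(dist, method="linear", beta=0.5):
    """Convert a distance matrix into a closeness matrix."""
    n = len(dist)
    finite = [dist[i][j] for i in range(n) for j in range(n)
              if i != j and math.isfinite(dist[i][j])]
    if not finite:
        raise ValueError("no finite off-diagonal distances")
    d_max, d_min = max(finite), min(finite)
    # Value placed on the diagonal (i == j) for each method.
    diagonal = {"linear": 1.0, "division": d_max + d_min,
                "subtract": d_max + d_min, "fixed_decay": 1.0}
    # Formula applied to finite off-diagonal distances.
    formula = {
        "linear": lambda d: 1.0 - (d - d_min) / d_max,
        "division": lambda d: d_max / d,
        "subtract": lambda d: (d_max + d_min) - d,
        "fixed_decay": lambda d: beta ** d,
    }
    if method not in formula:
        raise ValueError("unknown method: " + method)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            d = dist[i][j]
            if i == j:
                row.append(diagonal[method])
            elif math.isinf(d):
                row.append(0.0)  # unreachable pairs get closeness 0
            else:
                row.append(formula[method](d))
        result.append(row)
    return result


inf = math.inf
dist = [[0, 1, 2, inf], [1, 0, 1, inf], [2, 1, 0, inf], [inf, inf, inf, 0]]
linear = closeness_transform(dist, "linear")
division = closeness_transform(dist, "division")
subtract = closeness_transform(dist, "subtract")
decay = closeness_transform(dist, "fixed_decay", beta=0.5)
```

In every method a shorter distance yields a larger closeness value, and unreachable pairs are mapped to 0.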

Unreachable Option: This option defines the distance between two nodes that have no path between them. It is computed by adding a user-defined value to the diameter. The network diameter is the largest geodesic distance between any pair of nodes.
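A minimal sketch of the distance computation with the unreachable rule applied afterwards. It uses the Floyd-Warshall scheme cited in this module's references, assumes nodes are numbered 0 to n-1 and that link values are already distances, and uses hypothetical names (`geodesic_distances`, `unreachable_extra`).

```python
import math


def geodesic_distances(n, links, unreachable_extra=1.0):
    """All-pairs geodesic distances for nodes 0..n-1.

    `links` maps (source, target) to a non-negative distance.
    Returns (distance matrix, diameter); pairs with no path are
    assigned diameter + unreachable_extra.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (s, t), w in links.items():
        dist[s][t] = min(dist[s][t], w)
    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    # Diameter: largest finite geodesic distance.
    diameter = max(d for row in dist for d in row if math.isfinite(d))
    fill = diameter + unreachable_extra
    filled = [[d if math.isfinite(d) else fill for d in row] for row in dist]
    return filled, diameter


# Directed cycle 0 -> 1 -> 2 -> 0 plus an isolated node 3.
links = {(0, 1): 1, (1, 2): 2, (2, 0): 1}
dist, diameter = geodesic_distances(4, links, unreachable_extra=1)
```

The triple loop is what gives the O(n^3) time complexity.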

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Shortest Path’ analysis, Main Report, Distance Matrix and Spring Map are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Network Diameter: The largest geodesic distance between any pair of nodes.

- Distribution of Distances and Reachables: Mean, standard deviation, minimum and maximum are reported for the geodesic distance and for the number of reachable nodes connected by in-links or out-links.

 Tables
Distance Matrix: It is a 1-mode Network whose cell represents geodesic distance between a pair of
nodes.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module explores the shortest paths between a selected source node and target node.

 k-Shortest Path
Source Node and Target Node
Selecting a Source Node and a Target Node changes the style of the Source Node, the Target Node and the Intermediate Nodes that connect them on the network map to the node styles pre-established in the global options, as follows.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

- Intermediate Node: Node >> Focus Pair >> Related Node(s)

- Other Node: Node >> Focus Pair >> Other Node(s)

You can search for a node by typing part of its Node Label in the text box. To select the node, click its Node Label in the search results listed below the text box.

Path Number

This module shows the shortest path through the k-th shortest path. They are listed in the area below.

 k-Neighbor
Focal Node
After a focal node is selected, the styles of the focal node and its neighbors change to the node styles pre-established in the global options, as follows.

- Focal Node: Node >> Focus Node >> Focal Node

- Neighbors: Node >> Focus Node >> Related Node(s)

- Other Nodes: Node >> Focus Node >> Other Node(s)

You can search for a node by typing part of its Node Label in the text box. To select the node, click its Node Label in the search results listed below the text box.

Distance
Selected Distance (k) is a criterion for finding the k-neighbors of the focal node.

Direction
Selected Direction is the criterion for finding k-neighbors of the focal node.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 (Known as Floyd & Warshall Algorithm). Robert W. Floyd. Algorithm 97 (SHORTEST PATH).
Communications of the ACM, 5(6):345, 1962.

 Finding the K Shortest Loopless Paths in a Network. Jin Y. Yen. Management Science, Vol. 17,
No. 11, Theory Series. (Jul., 1971), pp. 712-716.

 Related Topics


Analyze >> Connection >> All Path Finding

 Menu
Analyze >> Connection >> All Path Finding

 Description
This algorithm finds all simple paths from a starting node (source) to an ending node (target) in a network. The paths can be enumerated with a depth-first search. The search avoids repeating vertices by marking them as they are visited in the recursion, then removing the mark just before returning from the recursive call.
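The marking scheme described above can be sketched as a short recursive search. The example assumes the network is given as a dictionary of successor lists (a hypothetical format); the function name `all_simple_paths` is chosen for this illustration.

```python
def all_simple_paths(adj, source, target):
    """Enumerate all simple paths from source to target by depth-first search."""
    paths, visited, path = [], set(), []

    def visit(node):
        visited.add(node)  # mark the node on entry
        path.append(node)
        if node == target:
            paths.append(list(path))
        else:
            for nxt in adj.get(node, ()):
                if nxt not in visited:
                    visit(nxt)
        path.pop()
        visited.discard(node)  # remove the mark just before returning

    visit(source)
    return paths


adj = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
paths = all_simple_paths(adj, "a", "d")
```

Because the mark is removed on return, a node can appear in several different paths but never twice in the same path.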

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Main process

Select Node: Select a starting node and an ending node.

Intermediate Node(s) Option: When the intermediate node option is activated, you can select intermediate node(s). The results then show only paths that pass through the selected intermediate node(s).

Distance Option: When a distance range is given, the results show only paths whose distance falls within that range.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘All path Finding’

analysis, Main Report, Indexing Label Table, Path Table and Spring

Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Starting Node

- Ending Node

- Intermediate Node(s)

- Min Distance: The shortest distance among all paths found

- Max Distance: The longest distance among all paths found

- # of path: The number of simple paths found


 Tables
Indexing label: A table that maps each index to its node label.

Path: A table that shows the node sequence and distance of each simple path.

 Maps
Spring Map
- Default Layout: A map is drawn by

Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by

Common option in the Preference >>

Node tab.


 Inspect
Select Path
You can see the available pathways in the Select Path control item. The selected pathway is shown on the network map as follows.

<Example Screen shot>

 Time Complexity
 O(N^2 * Max Depth)

 Reference
 Migliore. An Algorithm to find all paths between two nodes in a graph. Journal of
Computational Physics 87, 231-236.


Analyze >> Connection >> All Cycle Finding

 Menu
Analyze >> Connection >> All Cycle Finding

 Description
This module finds all the elementary circuits of a directed graph.
The idea of this algorithm is to enumerate all cycles in a strongly connected component. A further step would be to find the "feedback arc set" of this connected component, that is, the optimal way to break these loops so that the connected component can be partitioned into smaller DAGs.

The algorithm resembles the algorithms by Tiernan and Tarjan, but is faster because it considers each edge at most twice between any one circuit and the next in the output sequence.
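To make "elementary circuit" concrete, the sketch below enumerates each cycle exactly once by rooting it at its smallest node. Note that this is a plain depth-first enumeration, not Johnson's algorithm, so it lacks the blocking mechanism that gives the time bound quoted for this module. The function name, the input format and the example network are assumptions, and every node is assumed to appear as a key of the adjacency dictionary.

```python
def elementary_cycles(adj):
    """Enumerate each elementary cycle once, rooted at its smallest node."""
    nodes = sorted(adj)
    order = {n: i for i, n in enumerate(nodes)}
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in adj.get(node, []):
            if nxt == start:
                # Closing the walk back at the root yields one elementary cycle.
                cycles.append(path + [start])
            elif order[nxt] > order[start] and nxt not in on_path:
                # Only visit nodes larger than the root, and never repeat a node.
                on_path.add(nxt)
                dfs(start, nxt, path + [nxt], on_path)
                on_path.discard(nxt)

    for s in nodes:
        dfs(s, s, [s], {s})
    return cycles


# Hypothetical directed example network.
cycle_network = {"a": ["b"], "b": ["c", "a"], "c": ["a"]}
found_cycles = elementary_cycles(cycle_network)
```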

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only choose
one 1-mode network.

 Link Merge: When selected data contains multiple links, where

two or more links connect the same source node and target

node pair, a user should decide how to merge them into a single

link.

 Pre-process
Dichotomize: A user needs to dichotomize data before running a module. The weighted or valued
data is transformed to unweighted or binary data as a result of

dichotomizing data.

 Main process


Intermediate Node(s) Option: When the intermediate node option is


activated, a user can select an intermediate node(s). If selected, results

show only cycles that contain an intermediate node.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘All Cycle Finding’ analysis, ‘Main Report’, ‘Cycle

Table’ and ‘Spring Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
 # of Cycles: The number of cycles found

 Intermediate Nodes: Selected intermediate node(s)

 Min Distance

 Max Distance


 Tables
Cycle: Contains information about cycle sequence and distance.

 Maps
Spring Map
 Default Layout: Kamada & Kawai algorithm (Spring >> Kamada & Kawai) is used to draw a map by default.

 Default Style: The default style is set according to ‘Common’ option in ‘Preference >> Node’

tab. The size of a node on the map is proportional to its centrality score (e.g. a node with the

highest centrality score will be depicted as the biggest node on a map).


 Inspect
A user can see the available cycle of each node on a map in ‘Select Subgraph Search’ control item.

The selected cycle is represented on a network map as follows.

 Time Complexity
 O( (n + m) * (c + 1) ) where c is the number (#) of cycles.

 References
 Donald B. Johnson: Finding All the Elementary Circuits of a Directed Graph. SIAM Journal

on Computing. Volume 4, No. 1 (1975), pp. 77-84


Analyze >> Connection >> Dependency

 Menu
Analyze >> Connection >> Dependency

 Description
‘Dependency’ analysis measures how much node i depends on node j to reach other nodes. More precisely, the dependency of i on j is proportional to the sum, over each node k in the network, of the fraction [the number of geodesics from i to k via j] / [the number of geodesics from i to k]. The computation is the same as that used for Betweenness Centrality.
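The fraction described above can be computed with one breadth-first search per node that counts geodesics. The following is a minimal, unnormalized sketch for an unweighted network; the function names, the adjacency-list input and the absence of any normalization are assumptions of this example rather than details taken from NetMiner.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return geodesic distances from s and the number of geodesics to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def dependency(adj):
    """dep[i][j] = sum over k of (geodesics i->k via j) / (geodesics i->k)."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    dep = {i: {j: 0.0 for j in nodes} for i in nodes}
    for i in nodes:
        dist_i, sigma_i = info[i]
        for j in nodes:
            if j == i or j not in dist_i:
                continue
            dist_j, sigma_j = info[j]
            for k in nodes:
                if k in (i, j) or k not in dist_i or k not in dist_j:
                    continue
                # j lies on a geodesic from i to k only if the distances add up.
                if dist_i[j] + dist_j[k] == dist_i[k]:
                    dep[i][j] += sigma_i[j] * sigma_j[k] / sigma_i[k]
    return dep


# Hypothetical undirected line network a - b - c (each link stored both ways).
line_network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
dep = dependency(line_network)
```

Here node a can reach c only through b, so a depends fully on b, while b depends on neither a nor c.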

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When selected data contains multiple links, where

two or more links connect the same source node and target node
pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running module.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

 Main process
Direction: Direction of links should be considered to compute
dependency.

- In: In-path dependency. When ‘In’ is selected, geodesics from every reachable node k to node i are considered, instead of geodesics from node i to node k (each node in the network).

- Out: Out-path dependency

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Dependency’

analysis, Main Report, Dependency Matrix and Spring Map are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Distribution of dependency: Mean, Standard deviation, Minimum value, and Maximum value of

dependency are reported.

 Tables
Dependency Matrix
It is a 1-mode Network whose cell represents

dependency between a pair of nodes


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module shows the dependency value of two selected nodes.

 Lookup Dependency
Two Nodes Selection
After a source node and a target node are selected, the node style of the two matching nodes on the network map changes to the pre-established node style in the global option, as follows.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The dependency value between two selected nodes is represented in the text box.

You can search for a node by typing part of its Node Label in the blank text box. You must then click the matching Node Label in the search results shown below the text box.


<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Freeman, L.C. (1980). The gatekeeper, pair-dependency, and structural centrality. Quality and
Quantity. 14, 585-592.

 Related Topics
 Analyze >> Connection >> Shortest Path


Analyze >> Connection >> Node Connectivity

 Menu
Analyze >> Connection >> Connectivity >> Node

 Description
‘Node Connectivity’ module analyzes the vulnerability of a network. The node connectivity of a dyad is the minimum number of nodes whose removal makes the two nodes unreachable from each other. Normally, a higher value means a more robust connection between them. For example, in a network that contains two links (a->b) and (b->c), removing node b alone leaves no path between nodes a and c. Because removing 1 node (b) is enough, the Node Connectivity of (a, c) is 1.

The Node Connectivity of the whole network is the minimum Node Connectivity over all node pairs. That is, it is the minimum number of nodes whose removal results in a disconnected network.
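The (a->b), (b->c) example can be checked with a brute-force sketch that tries ever larger sets of nodes to remove. This is for illustration on small networks only and is not the flow-based algorithm cited in the references of this module; the function names are hypothetical, and returning None for directly linked pairs is an assumption of this sketch.

```python
from collections import deque
from itertools import combinations


def reachable(adj, source, target, removed):
    """Breadth-first search that ignores the removed nodes."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def pair_node_connectivity(adj, source, target):
    """Smallest number of other nodes whose removal leaves no path from source to target."""
    if target in adj.get(source, []):
        return None  # directly linked: no node removal separates the pair (assumption)
    others = [n for n in adj if n not in (source, target)]
    for size in range(len(others) + 1):
        for removed in combinations(others, size):
            if not reachable(adj, source, target, set(removed)):
                return size
    return None


# The two-link example from the text: (a->b), (b->c).
two_link_network = {"a": ["b"], "b": ["c"], "c": []}
connectivity_a_c = pair_node_connectivity(two_link_network, "a", "c")
```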

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When selected data contains multiple links, where

two or more links connect the same source node and target node
pair, you should decide how to merge them into a single link.

 Pre-process
You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data, and the algorithm also runs faster on symmetrized data.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Node Connectivity’

analysis, Main Report, Node Connectivity Matrix and Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Distribution of node connectivity: It shows the distribution of Node Connectivity. It is composed of

Mean, Standard deviation, Min and Max.

 Tables
Node Connectivity Matrix
It is the matrix which shows Node Connectivity for every pair of nodes.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module explores the Node Connectivity between two selected nodes.

 Lookup Node Connectivity


Two Nodes Selection
Selecting a Source Node and a Target Node one by one makes the

node style of the matching two nodes on the network map change as

pre-established node style in the global option as follows

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Node Connectivity Value between two selected nodes is represented in the text box.

You can search for a node by typing part of its Node Label in the blank text box. You must then click the matching Node Label in the search results shown below the text box.

The change of selected item is reflected on the network map when you click the Submit button.

 Bridge / Cutpoint
Checking the Show check box makes the style of the Cut Points and

Bridges of the network change as pre-established style of Cut Point

and Bridge in the global option.

- Bridge: Link >> Path >> Bridge

- Cutpoint: Node >> Focus Node >> Related Node(s)

 Time Complexity
 O(n^3)

 Reference
 (Definition) Harary, F. (1969). Graph Theory. Reading, MA : Addison-Wesley. p. 43

 (Implementation Origin) S. Even, Graph Algorithms, Computer Science Press (1979).

 (Implementation) Abdol-Hossein Esfahanian, On the Evolution of Graph Connectivity Algorithms. - P.10, Algorithm 9

 Related Topics
Analyze >> Connection >> Shortest Path


Analyze >> Connection >> Link Connectivity

 Menu
Analyze >> Connection >> Connectivity >> Link

 Description
‘Link Connectivity’ analyzes the vulnerability of connections among the nodes in a network. The line-connectivity of a pair of nodes is the minimum number of lines that must be removed to leave no path between the two nodes. The minimum connectivity over all pairs of nodes is the network link connectivity (i.e., the minimum number of lines that must be removed to make the network disconnected).

‘Link Connectivity’ module treats 1 link as 1 step. That is, all links are treated equally without regard

to weight value. Thus, before running ‘Link Connectivity’, you should dichotomize your data.

- Bridge: a line of a network whose deletion (leaving its incident nodes in place) would increase the number of connected components.

- Cutpoint (articulation node): a node of a network, which if deleted along with any incident lines

would increase the number of connected components.
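Both definitions can be tested directly by deleting each line or node in turn and recounting the connected components. The sketch below does exactly that for a small undirected network; it is a brute-force illustration with hypothetical function names, not the method used by the module, and it assumes each link is stored in both adjacency lists.

```python
def count_components(adj, skip_node=None, skip_edge=None):
    """Count connected components, optionally ignoring one node or one line."""
    seen, count = set(), 0
    for start in adj:
        if start == skip_node or start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v == skip_node or v in seen:
                    continue
                if skip_edge in ((u, v), (v, u)):
                    continue
                seen.add(v)
                stack.append(v)
    return count


def bridges_and_cutpoints(adj):
    """Return (bridges, cutpoints) of an undirected network."""
    base = count_components(adj)
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    bridges = sorted(e for e in edges if count_components(adj, skip_edge=e) > base)
    cutpoints = sorted(n for n in adj if count_components(adj, skip_node=n) > base)
    return bridges, cutpoints


# Hypothetical example: triangle a-b-c with a pendant node d attached to c.
pendant_network = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
bridges, cutpoints = bridges_and_cutpoints(pendant_network)
```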

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When selected data contains multiple links, where

two or more links connect the same source node and target node
pair, you should decide how to merge them into a single link.


 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

Symmetrize: You should symmetrize your data before running


module. By symmetrizing, directed/asymmetric data is transformed to undirected/symmetric data.

And if you symmetrize your data, algorithm will perform faster.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Link

Connectivity’ analysis, Main Report, Link Connectivity Matrix and

Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Network Line-connectivity

Distribution (Sum, Mean, Std. Dev, Min, Max) of Line-Connectivity

Number & Member of Bridges

Number & Member of Cutpoints


 Tables
Distance Matrix: geodesic distance between all pairs of nodes

 Maps
Spring Map
- Default Layout: A map is drawn by

Spring >> Kamada & Kawai

algorithm.

- Default Styling: Default style is set

by Common option in the Preference

>> Node tab.


 Inspect
This module explores the Link Connectivity between two selected nodes.

 Lookup Link Connectivity


Two Nodes Selection
Selecting a Source Node and a Target Node one by one makes the

node style of the matching two nodes on the network map change as

pre-established node style in the global option as follows.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The Link Connectivity Value between two selected nodes is represented in the text box.

You can search for a node by typing part of its Node Label in the blank text box. You must then click the matching Node Label in the search results shown below the text box.

The change of selected item is reflected on the network map when you click the Submit button.

 Bridge
Checking the Show check box makes the style of the Bridges of the

network change as pre-established style of Bridge in the global option.

- Bridge: Link >> Path >> Bridge

<Example Screen shot>

 Time Complexity
 O(n^3)


 Reference
 (Definition) Harary, F. (1969). Graph Theory. Reading, MA : Addison-Wesley. p. 43

 Related Topics
 Analyze >> Connection >> Shortest Path

 Analyze >> Connection >> Node Connectivity

 Analyze >> Connection >> Maximum Flow


Analyze >> Connection >> Minimum Cutset

 Menu
Analyze >> Connection >> Min. Cutset

 Description
A cutpoint is a node of a network whose deletion, along with any incident lines, would increase the number of connected components. A cutset extends the concept of a cutpoint from one node to a set of nodes. That is, a vertex set V is a cutset if its removal increases the number of components. Many cutsets of the same size can exist in a network. Among all cutsets, those of minimum size are called minimum cutsets. In fact, the size of a minimum cutset is equal to the node connectivity of the network. This analysis finds the minimum-size cutsets.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where two or
more links connect the same source node and target node pair, you
should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data, and the algorithm also runs faster on symmetrized data.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Minimum Cutset’

analysis, Main Report, Minimum Cutset Affiliation Matrix and Spring

Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # Min. Cutset: the number of minimum cutsets is reported.

- Members of Min. Cutset: the list of cutsets and the members of each cutset are reported.

 Tables
Min. Cutset Affiliation Matrix
It is a 2-mode Network matrix whose main nodes are the original nodes and whose sub nodes are the minimum cutsets.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module explores the Cutsets of the network.

 Select Cutset
After a Cutset is selected in the inspect item, the style of nodes on the map changes to the pre-established style in the global option. The corresponding global options are as follows.

- Cutset Nodes: Node >> Subset membership >> Subset member node(s)

- Other Nodes: Node >> Subset membership >> Subset non-member node(s)

<Example Screen shot>


 Time Complexity
 O(2^k * n^3), k: the connectivity of the graph

 Reference
 Arkady Kanevsky. 1993. Finding All Minimum-Size Separating Vertex Sets in a
Graph. Networks, vol. 23, 533-541.

 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications,
Cambridge, 1994, 4.2. Connectivity of Graphs

 Related Topics
 Analyze >> Connection >> Node Connectivity

 Analyze >> Connection >> Link Connectivity

 Analyze >> Cohesion >> Cohesive Block


Analyze >> Connection >> Maximum Flow

 Menu
Analyze >> Connection >> Max. Flow

 Description
‘Max. Flow’ computes the maximum flow for all ordered pairs of nodes. The maximum flow from a source node to a sink node (or target node) is the largest possible total flow using all available paths, subject to the flow capacity (= weight) of each link.

This module analyzes both directed and undirected networks. If you symmetrize the data before running the module, maximum flow is calculated regardless of link direction.
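The definition of maximum flow for a single source and sink pair can be illustrated with the sketch below. It uses the Edmonds-Karp method (augmenting along shortest residual paths) because it is short, not the relabel-to-front algorithm cited in the references of this module. The function name, the nested-dictionary capacity format and the example network are assumptions of this example.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity[u][v] is the weight of link u->v."""
    if source == sink:
        return 0
    # residual[u][v] holds the remaining capacity from u to v.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical weighted directed network.
flow_network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
flow_s_t = max_flow(flow_network, "s", "t")
```

Calling the function for every ordered pair of nodes would fill a matrix like the Max. Flow Matrix described under Outputs.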

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where two or
more links connect the same source node and target node pair, you
should decide how to merge them into a single link.

 Pre-process
Symmetrize: You can symmetrize your data before running module.
By symmetrizing, directed/asymmetric data is transformed to

undirected/symmetric data. And if you symmetrize your data,

algorithm will perform faster.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Maximum Flow’

analysis, Main Report, Max. Flow Matrix and Spring Map are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Max. Flow: Mean, Standard deviation, Minimum and Maximum of the Maximum Flow are reported.

 Tables
Max. Flow Matrix
It is a 1-mode matrix whose cell means maximum flow value between two nodes.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module shows the maximum flow value of two selected nodes.

 Lookup Max. Flow


Two Nodes Selection
After a source node and a target node are selected, node style of the

matching two nodes on the network map changes as pre-established

node style in the global option as follow.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The Maximum Flow value between the two selected nodes is shown in the text box. You can search for a node by typing part of its Node Label in the blank text box. You must then click the matching Node Label in the search results shown below the text box.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 (Implementation) Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein.
(2001). Introduction to Algorithms, Second Edition. The MIT Press. Chapter 26.5 The relabel-to-front algorithm.

 (Origin) Andrew V. Goldberg. Efficient Graph Algorithms for Sequential and Parallel
Computers. PhD thesis, Department of Electrical Engineering and Computer Science, MIT,

1987.

 Related Topics
 Analyze >> Connection >> Link Connectivity


Analyze >> Connection >> Topological Sort

 Menu
Analyze >> Connection >> Topological Sort

 Description
This module gives one possible topological sort of a given graph. A topological sort or topological
ordering of a directed graph is defined as follows:

A topological sort of a directed graph $G = (V, E)$ is a sequence $v_1, v_2, \ldots, v_n$ containing every element in $V$ such that if there exists a path from $v_i$ to $v_j$, then $i < j$.

For a topological ordering to exist, the graph must not contain any cycle. Conversely, any directed graph that contains no cycle has at least one topological ordering. In other words, a directed graph is a directed acyclic graph ('DAG') if and only if it has a topological sort.

The best-known application of a topological sort is scheduling in project management. The jobs are represented by vertices, and there is an edge from x to y if job x must be completed before job y can start. Thus, a topological sort gives an order in which to perform the jobs.

The algorithm used to construct a topological ordering runs in linear time for any DAG.
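One standard linear-time way to build such an ordering is Kahn's algorithm, sketched below on a job-scheduling example. The manual does not state which variant the module uses (the cited textbook describes a depth-first version), so this is an illustration only; the function name and the job names are hypothetical.

```python
from collections import deque


def topological_sort(adj):
    """Return one topological ordering of a DAG given as adjacency lists."""
    indegree = {n: 0 for n in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] = indegree.get(v, 0) + 1
    # Start with the jobs that have no prerequisites.
    queue = deque(n for n in indegree if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; no topological ordering exists")
    return order


# Hypothetical jobs: an edge x -> y means job x must finish before job y starts.
jobs = {"design": ["build"], "build": ["test"], "test": ["ship"], "docs": ["ship"], "ship": []}
job_order = topological_sort(jobs)
```

Each node and each link is handled once, which matches the O(n + m) bound listed under Time Complexity.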

 User Options

 Input


1-mode Network: Select a 1-mode network. A user can only choose


one 1-mode network.

 Link Merge: When selected data contains multiple links,

where two or more links connect the same source node and

target node pair, a user should decide how to merge them

into a single link.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘Topological Sort’ analysis, ‘Main Report’, ‘Topological Order Vector’ and ‘Layered Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report

 Tables
Topological Order Vector: Shows one possible
topological ordering of a given network.

 Maps
Layered Map
 Default Layout: Dig-CoLa algorithm (Layered>>Dig-CoLa) is used to draw a map by default.

 Default Style: The default style is set according to ‘Common’ option in ‘Preference >> Node’

tab.


 Inspect
A user can trace the ordering sequence of the nodes on the map.

Trace (Layered Map): A user can trace how a node was chosen step by step. The map marks three kinds of nodes:

- Previously visited nodes.

- The node chosen at the current step.

- Unvisited nodes.

 Time Complexity
 O(n + m)

 References
 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001),

Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 549–552


Analyze >> Connection >> PFnet

 Menu
Analyze >> Connection >> PFnet

 Description
PFnet Module finds the PathFinder Network (r=∞, q=N-1), which is the union of all existing MST (minimum spanning tree) networks.

An MST is a spanning tree (a tree that includes all nodes in the network) whose sum of link weights is as small as possible; a maximum spanning tree is one whose sum is as large as possible.

The “Link Weight” option selects whether the spanning trees have the minimum or the maximum sum of link weights.

The input network must be symmetrized prior to the Main process of PathFinder Network.
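The union of all minimum spanning trees can be obtained with a Kruskal-style pass that handles links of equal weight together: a link belongs to some MST exactly when strictly lighter links do not already connect its two end nodes. The sketch below illustrates this idea for dissimilarity weights; it is not necessarily the procedure of the cited paper, and the function name and the (u, v, weight) input format are assumptions.

```python
from itertools import groupby


def union_of_msts(nodes, edges):
    """Return the links that appear in at least one minimum spanning tree.

    edges is a list of (u, v, weight) tuples for an undirected network.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    ordered = sorted(edges, key=lambda e: e[2])
    for _, group in groupby(ordered, key=lambda e: e[2]):
        group = list(group)
        # Keep a link if lighter links have not yet joined its end nodes.
        kept.extend(e for e in group if find(e[0]) != find(e[1]))
        # Only then merge the components linked by this weight class.
        for u, v, _w in group:
            parent[find(u)] = find(v)
    return kept


# Hypothetical example: a triangle of equal weights plus two heavier links.
pf_nodes = ["a", "b", "c", "d"]
pf_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1), ("c", "d", 2), ("a", "d", 3)]
pfnet_links = union_of_msts(pf_nodes, pf_edges)
```

For similarity weights (maximum spanning trees), the same sketch could be applied after negating the weights.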

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When selected data contains multiple links, where

two or more links connect the same source node and target node
pair, you should decide how to merge them into a single link.

 Pre-process
Symmetrize: Since the MST on which the PathFinder Network is based is defined for undirected graphs, the Symmetrize Pre-process is always turned on.


 Main process
Link Weight: This is the option for choosing the resulting PathFinder
network to be the union of Maximum spanning trees or that of

Minimum spanning trees.

If “Similarity” is chosen, the PFNET based on the Maximum

spanning tree will be output.

On the other hand, if “Dissimilarity” is chosen, the PFNET based on the Minimum spanning tree will

be output.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘PFnet’ analysis,

Main Report, PFnet Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of Links in the Original Network: This is the number of links in the original Network.

- # of Links in the PFnet: This is the number of links in the PathFinder Network (r=∞, q=N-1).

- Total Weight in the Original Network: This is the sum of weights of existing links in the original

network.

- Total Weight in the PFnet: This is the sum of weights of existing links in the PathFinder Network

(r=∞, q=N-1).


 Tables
pfnet
The Table gives the matrix representation of

the PathFinder Network. Each non-zero value

in the cells is the weight of the link between

the node indexed by the cell’s row and that

indexed by the cell’s column.

 Maps
Spring Map
- Default Layout: A map is drawn by

Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set

by Common option in the Preference

>> Node tab.

 Time Complexity
 O(|E| log(|E|))


 Reference
 A quick MST-based algorithm to obtain Pathfinder networks. Arnaud Quirin, etc. Journal of the

American Society for Information Science and Technology. (2007)

 Related Topics


Analyze >> Connection >> Influence

 Menu
Analyze >> Connection >> Influence

 Description
Influence Matrix is used in computing Katz's or Hubbell's status. When column is selected as the influence direction, the (i, j) element of the matrix represents the influence from j to i. When row is selected, (i, j) represents the influence from i to j.

Katz Influence is an index that sums the influence over every existing walk from i to j. The longer a walk is, the more its influence decreases, exponentially, according to the attenuation factor. Hubbell Influence is the same as Katz's except for one point: when computing influence, Hubbell's also considers influences whose source node and target node are the same. (In other words, the effect of a node on itself is included in the computation.)
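The walk-based idea can be sketched as a truncated power series: walks of length k contribute the k-th power of the adjacency matrix, scaled by beta to the power k. The code below is a sketch under stated assumptions, not NetMiner's exact formula: the function name is hypothetical, the series is cut off at `max_length` instead of being summed to infinity, and the `include_self` flag (which adds each node's effect on itself as an identity term) is only one possible reading of the Hubbell variant.

```python
def influence_matrix(adj_matrix, beta, max_length, include_self=False):
    """Sum beta**k * A**k over walk lengths k = 1..max_length (row = source)."""
    n = len(adj_matrix)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if include_self:
        total = [row[:] for row in identity]  # assumed Hubbell-style self term
    else:
        total = [[0.0] * n for _ in range(n)]
    power = identity
    for _ in range(max_length):
        # After k iterations, power equals beta**k * A**k.
        power = [[beta * sum(power[i][m] * adj_matrix[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total


# Hypothetical directed chain 0 -> 1 -> 2 as an adjacency matrix.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
katz_like = influence_matrix(chain, beta=0.5, max_length=5)
```

With beta = 0.5, the one-step influence from node 0 to node 1 is 0.5 and the two-step influence from node 0 to node 2 is 0.25, which shows the exponential attenuation with walk length.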

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where two or
more links connect the same source node and target node pair, you
should decide how to merge them into a single link.

 Main process
Direction: In or Out
- In: (i, j) value of influence matrix means the influence from j to i.

- Out: (i, j) value of influence matrix means the influence from i to j.


Type of Influence: Select computing method.


- Katz: the effect of a node on itself is ignored in computing procedure.

- Hubbel: the effect of a node on itself is considered in computing procedure.

Attenuation Factor (-1 < beta < 1): default value = 0.5. Ideally, the attenuation factor must be less than the reciprocal of the principal eigenvalue. Because the eigenvalue is not known in advance, the program calculates it and recodes the input accordingly: an input of 1 is recoded to 1/(principal eigenvalue), and an input smaller than 1 is recoded to value/(principal eigenvalue). The recoded attenuation parameter is displayed in the report.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Influence’ analysis,

Main Report, Influence Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Recoded attenuation factor: attenuation factor divided by principal

Eigenvalue.

- Distribution of Influence: Mean, Standard deviation, Minimum value and Maximum value are reported.


 Tables
Influence Matrix
It is a 1-mode matrix whose cell means influence value between two nodes.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module shows the influence value of two selected nodes.

 Lookup Influence
Two Nodes Selection
After a source node and a target node are selected, node style of the

matching two nodes on the network map changes as pre-established

node style in the global option as follow.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The Influence value between two selected nodes is represented in the text box.

You can search for a node by typing part of its Node Label in the blank text box. You must then click the matching Node Label in the search results shown below the text box.

 Time Complexity
 O(n^3)

 Reference
 Hubbell C H (1965). "An input-output approach to clique identification“. Sociometry, 28,
pp377-399

 Katz L (1953). "A new status index derived from sociometric data analysis". Psychometrika,18,
pp34-43.

 Related Topics


Analyze >> Connection >> Accessibility

 Menu
Analyze >> Connection >> Accessibility

 Description
This analysis is one of the methods researchers can use to derive an interdependency network from an information flow network. The (i, j) element of the accessibility matrix is the probability that information is transferred from i to j in at most k steps, when the probability that information is transferred through one link is rhou.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where two or
more links connect the same source node and target node pair, you
should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

 Main process
Number of Steps (k > 1): Number of steps over which to track information spread. Default value = 2, which means that information is assumed not to spread beyond 2 steps in the input network.

Transmit probability (0 < rhou < 1): default value = 0.5. It is the probability that information is
transferred through one link successfully.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Accessibility’ analysis,

Main Report, Accessibility Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Accessibility: Mean, Standard deviation, Minimum and Maximum of accessibility are reported.

 Tables
Accessibility matrix
It is a 1-mode matrix whose cell means accessibility value between two nodes.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default Styling: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module shows the accessibility value of two selected nodes.

 Lookup Accessibility
Two Nodes Selection
After a source node and a target node are selected, node style of the

matching two nodes on the network map changes as pre-established

node style in the global option as follow.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The Accessibility value between the two selected nodes is shown in the text box. You can search for a node by typing part of its Node Label in the blank area; you then need to click the matching Node Label in the search result shown below the text box.


 Time Complexity
 O (k x n^3) where k is # steps.

 Reference
 Noah E. Friedkin, 1991. Theoretical Foundations for Centrality Measures. AJS 96 Number 6,
1496, equation (28)

 Related Topics


Analyze >> Cohesion >> Component

 Menu
Analyze >> Cohesion >> Component

 Description
This module analyzes cohesive structure of a network based on the reachability among nodes. A

Component is a maximal connected sub-graph of a graph.

In a directed network, a Weak Component is a maximal sub-graph in which any pair of nodes is connected by a semi-path (a path in which the direction of links is ignored). A Strong Component (strongly connected component) is a maximal sub-graph in which any pair of nodes is connected by directed paths in both directions.

Please note that directed and undirected networks are handled differently during analysis; therefore, symmetrizing the network is not necessary in ‘Component’ analysis. Also, the analysis considers only reachability and ignores the weight of links.
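The difference between the two component types can be sketched as follows. This is an illustrative sketch that favours readability over the linear-time algorithm behind the O(m) bound stated for this module; the function names are hypothetical and links are given as (source, target) pairs.

```python
from collections import defaultdict


def weak_components(nodes, links):
    """Group nodes connected by semi-paths (link direction is ignored)."""
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(members)
    return components


def reachable(start, successors):
    """Set of nodes reachable from start by following successors."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strong_components(nodes, links):
    """Group nodes that reach each other along directed paths in both directions."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for u, v in links:
        out_links[u].add(v)
        in_links[v].add(u)
    assigned, components = set(), []
    for node in nodes:
        if node in assigned:
            continue
        # Nodes that node can reach AND that can reach node.
        members = reachable(node, out_links) & reachable(node, in_links)
        assigned |= members
        components.append(members)
    return components
```

For nodes a, b, c, d with links a -> b, b -> a and b -> c, the weak components are {a, b, c} and {d}, while the strong components are {a, b}, {c} and {d}.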

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When selected data contains multiple links, where

more than two links connect the same source node and target node

pair, you should decide how to merge them to a single link.

 Main process
Minimum size of Component: Only components with at least ‘minimum size of component’
vertices are reported.

Component Type: Only components of the selected type are reported in the output window. Either weak component or strong component can be selected. This option is used when you analyze a directed network.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Component’ analysis,

Main Report, Component Partition Vector and Clustered Map are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of components: The number of components that exist in the network.

- Members of components: It shows names of nodes affiliated with each component.

- Subgroup Details: Size of a component is the number of nodes affiliated with the component.

Percent is calculated by (number of a component’s nodes/number of nodes). Density value is

calculated by (number of links present / number of maximal possible links) in given component.


 Tables
Component partition vector
This vector shows the component partition value of each node. Nodes in same component have same

partition value.

 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

components.


- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can see the components which are found by analysis on the map.

 Select Component
After a Component Partition Vector is selected, the style of nodes on the map changes to the pre-established style in the global option. The corresponding global options are as follows.

- Nodes of selected component: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected component: Node >> Subset Membership >> Subset Non-member Node(s)


<Example Screen shot>

 Time Complexity
 O(m)

 Reference
 Ellis Horowitz, Sartaj Sahni and Dinesh Mehta. (1999). Fundamentals of Data Structures in C++.
Computer Science Press. (Chap. 6.2.3)

 Related Topics
 Analyze >> Connection >> Node Connectivity

 Analyze >> Connection >> Link Connectivity

 Analyze >> Connection >> Minimum Cutset


Analyze >> Cohesion >> Bi-Component

 Menu
Analyze >> Cohesion >> Bi-Component

 Description
This module analyzes the cohesive structure of a network based on the reachability among nodes. A Bi-component (or bi-connected component) of a graph is a maximal sub-graph that cannot be disconnected by deleting a single node. There are at least two different paths between any two nodes in a Bi-component. ‘Bi-component’ analysis is related to ‘connectivity’ analysis: Bi-components result from removing a cutpoint (articulation node) or a bridge.

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.
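Bi-components and cutpoints can be found with a single depth-first search. The following is a minimal sketch of the classic low-point technique, assuming an undirected, unweighted network given as node pairs; it is not taken from NetMiner, and the function name is hypothetical. It is recursive, so it suits small networks only.

```python
from collections import defaultdict


def bi_components(links):
    """Return (list of bi-component node sets, set of cutpoints)."""
    neighbors = defaultdict(set)
    for u, v in links:
        if u != v:  # self-loops play no role in bi-connectivity
            neighbors[u].add(v)
            neighbors[v].add(u)

    depth, low = {}, {}
    edge_stack, components, cutpoints = [], [], set()

    def visit(node, parent, d):
        depth[node] = low[node] = d
        children = 0
        for nxt in neighbors[node]:
            if nxt == parent:
                continue
            if nxt not in depth:
                children += 1
                edge_stack.append((node, nxt))
                visit(nxt, node, d + 1)
                low[node] = min(low[node], low[nxt])
                if low[nxt] >= depth[node]:
                    # node separates the subtree under nxt from the rest.
                    if parent is not None or children > 1:
                        cutpoints.add(node)
                    members = set()
                    while True:
                        a, b = edge_stack.pop()
                        members.update((a, b))
                        if (a, b) == (node, nxt):
                            break
                    components.append(members)
            elif depth[nxt] < depth[node]:
                # Back edge to an ancestor.
                edge_stack.append((node, nxt))
                low[node] = min(low[node], depth[nxt])

    for start in list(neighbors):
        if start not in depth:
            visit(start, None, 0)
    return components, cutpoints
```

For a triangle a-b-c with a pendant link c-d, the sketch returns the bi-components {a, b, c} and {c, d} and the cutpoint c. A bi-component with exactly two nodes, such as {c, d}, corresponds to a bridge.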

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

non-weighted/binary data.


Symmetrize: You should symmetrize your data before running module. By symmetrizing,
directed/asymmetric data is transformed to undirected/symmetric data.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Bi-Component’

analysis, Main Report, Bi-Component Affiliation Matrix and

Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of bi-components: The number of bi-components that exist in the network.

- Members of bi-components: It shows names of nodes affiliated with each bi-component.

- Size & Density of bi-components: Size of a bi-component is the number of nodes affiliated with the

bi-component. Density value is calculated by (number of links present / number of maximal possible

links) in given bi-component.


 Tables
Bi-Component Affiliation Matrix
It is a 2-mode Network matrix whose main nodes are maintained, and sub nodes are bi-components.

 Maps
Clustered Map
Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

bi-components.

Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can see the bi-components, bridges and cutpoints which are found by analysis on the map.

 Select Bi-Component
After a Bi-Component Partition Vector is selected in the inspect item, the style of nodes on the map changes to the pre-established style in the global option. The corresponding global options are as follows.

Nodes of selected bi-component: Node >> Subset Membership >> Subset Member Node(s)

Nodes of non-selected bi-component: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>

 Bridge / Cutpoint
Checking the Show check box changes the style of the Cutpoints and Bridges of the network to the pre-established style in the global option. The corresponding global options are as follows.

- Bridge: Link >> Path >> Bridge

- Cutpoint: Node >> Focus Node >> Related Node(s)

<Example Screen shot>


 Time Complexity
 O(m)

 Reference
 Ellis Horowitz, Sartaj Sahni and Dinesh Mehta. (1999). Fundamentals of Data Structures in C++.
Computer Science Press. (Chap. 6.2.5)

 Related Topics
 Analyze >> Cohesion >> Component

 Analyze >> Connection >> Connectivity


Analyze >> Diffusion >> Influence Network >> Effects

 Menu
Analyze >> Diffusion >> Influence Network >> Effects

 Description
The Effects matrix is used in computing Noah E. Friedkin’s Effect Centrality. In rare cases, the matrix inverse used in the computation does not exist, and a proper analysis result may then not exist. This case is not handled specially.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

 Pre-process
Normalize: Normalize each row of the matrix in order to normalize
influences from a node to other nodes.

 Main process
Weight Parameter (0 < alpha < 1): Its default value is ‘0.999’. Ideally, the weight parameter must be less than the reciprocal of the principal eigenvalue. If the ‘Row Normalize’ option of Pre-process is selected, the principal eigenvalue is 1. Note that the ‘Row Normalize’ option is selected by default because it is used in most cases, so a proper weight parameter lies between 0 and 1. If the value is near 1, influence is delivered far; if the value is near 0, the range of influence becomes smaller. The reference article recommends the default value (0.999).
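The role of the row normalization, the weight parameter and the matrix inverse can be sketched as follows. The sketch assumes that the total effects matrix is V = (1 - alpha) * (I - alpha * W)^-1, which is the equilibrium of the recursion Y(t) = alpha * W Y(t-1) + (1 - alpha) Y(1) documented for the Sequence module; whether NetMiner computes it exactly this way is an assumption, and all function names are hypothetical.

```python
def row_normalize(matrix):
    """Divide each row by its sum; rows that sum to zero are left unchanged."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total for x in row] if total else list(row))
    return result


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises if no inverse exists."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; the inverse does not exist")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_effects(weights, alpha=0.999):
    """Assumed form: V = (1 - alpha) * (I - alpha * W)^-1 with row-normalized W."""
    w = row_normalize(weights)
    n = len(w)
    system = [[(1.0 if i == j else 0.0) - alpha * w[i][j] for j in range(n)]
              for i in range(n)]
    return [[(1.0 - alpha) * x for x in row] for row in invert(system)]
```

When every row of W sums to 1, every row of V also sums to 1, which matches the reading of each cell as a relative weight. The explicit ValueError shows where a missing inverse would surface in such a computation.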

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Influence Network >>

Effects’ analysis, Main Report, Total Effects Matrix, Immediate

Effects Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Effect Scores: It shows the distribution of Effect Scores, displayed separately for the total effects matrix and the immediate effects matrix (Sum, Mean, Std.Dev., Min, Max).


 Tables

Total Effects Matrix: (i, j) element of this matrix means the relative weight of the initial opinion of
actor j in determining the final opinion of actor i.

Immediate Effects Matrix: (i, j) element of this matrix means the average length of sequences from
j to i. Each sequence is weighted according to the strength of its constituent links.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module explores the Total Effects and Immediate Effects

between two selected nodes.

 Effects

Two Nodes Selection
Selecting a Source Node and a Target Node one by one changes the node style of the two matching nodes on the network map to the pre-established node style in the global option, as follows.

- Source Node: Focus Node - Focal Node

- Target Node: Focus Node - Related Node

The Effects value between the two selected nodes is shown in the text box. You can select Total Effects or Immediate Effects with the radio button. You can search for a node by typing part of its Node Label in the blank area; you then need to click the matching Node Label in the search result shown below the text box.

The change of selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>


 Time Complexity
 O(n^3)

 Reference
 Noah E. Friedkin, 1991. Theoretical Foundations for Centrality Measures. AJS 96 Number 6,
1478-1504

 Related Topics


Analyze >> Diffusion >> Influence Network >> Sequence

 Menu
Analyze >> Diffusion >> Influence Network >> Sequence

 Description
This module is an implementation of an algorithm from Noah E. Friedkin’s Social Influence Theory. The algorithm computes the sequence of Y according to the equation Y(t) = alpha * W * Y(t-1) + (1 - alpha) * Y(1), where ‘t’ denotes the time sequence and ‘alpha’ denotes the rate of influence from one node to another node through link(s). ‘W’ represents the influence matrix between nodes, and ‘Y’ represents the attribute that changes over the time sequence.
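The recursion can be traced by hand on a two-node example. The sketch below iterates the equation as stated; the function name, the argument names and the sample values are hypothetical, and the recoding of the weight parameter performed by the module is not reproduced.

```python
def influence_sequence(weights, initial, alpha, changes):
    """Return [Y(1), Y(2), ..., Y(1 + changes)] for
    Y(t) = alpha * W * Y(t-1) + (1 - alpha) * Y(1)."""
    sequence = [list(initial)]
    for _ in range(changes):
        previous = sequence[-1]
        # W * Y(t-1): each node's weighted sum of the others' current values.
        influenced = [sum(w * y for w, y in zip(row, previous)) for row in weights]
        sequence.append([alpha * s + (1.0 - alpha) * y1
                         for s, y1 in zip(influenced, initial)])
    return sequence


# Hypothetical example: each actor is influenced only by the other one.
weights = [[0.0, 1.0], [1.0, 0.0]]
steps = influence_sequence(weights, initial=[1.0, 0.0], alpha=0.5, changes=2)
```

Here Y(1) = [1, 0], Y(2) = [0.5, 0.5] and Y(3) = [0.75, 0.25]: the first actor keeps being pulled back toward its initial value by the (1 - alpha) term.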

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

Select Vector: select a Main Node Attribute data.

 Main process
Weight Parameter (0 < alpha < 1): Its default value = 0.999. Ideally, the weight parameter must be less than the reciprocal of the principal eigenvalue. Because the eigenvalue is not known in advance, the program calculates it and recodes the weight parameter accordingly (1 is recoded to the principal eigenvalue, and a value smaller than 1 is recoded to value * principal eigenvalue). The recoded weight parameter is displayed in the report.

# Changes (0 < k): default value = 10.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Influence

Network >> Sequence’ analysis, Main Report, Sequence Table and

Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Sequence Table: Simulated opinion changes by influence model.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.


- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module explores the influence flow among the nodes on the network map sequence by sequence.

 Sequence
Selecting a Sequence in the Sequence Combo Box changes the size of the nodes on the network map so that the size of each node represents its influence at the selected sequence.

The sequence selection buttons below the combo box make it convenient to shift from sequence to sequence. They perform the following actions:

- Shift to the first sequence

- Shift to the previous sequence

- Shift to the next sequence

- Shift to the last sequence

The change of selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Noah E. Friedkin, 1991. Theoretical Foundations for Centrality Measures. AJS 96 Number 6,
1478-1504

 Related Topics


Analyze >> Diffusion >> Linear Threshold >> Process

 Menu
Analyze >> Diffusion >> Linear Threshold >> Process

 Description
This module is an implementation of a model that explains the diffusion of innovations based on the direct-benefit effect: each person’s benefit from adopting an innovation increases as more and more people adopt it. It assumes a situation where an innovation emerges at a number of initial adopters and propagates through the links of the social network. Each node adopts the innovation if and only if it is one of the initial adopters or the ratio of its in-neighbors adopting the innovation is greater than or equal to the threshold value of that node. In a valued network, each in-neighbor is weighted by the value of the link to it. Propagation repeats until no further node adopts the innovation (equilibrium state). Nodes not adopting the innovation form clusters, tightly-knit components that make diffusion stop.
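The adoption rule can be sketched as a tick-by-tick simulation. The sketch below makes two assumptions that the text does not settle: all nodes are updated simultaneously within a tick, and a node without in-neighbors never adopts unless it is an initial adopter. The function name and data layout are hypothetical.

```python
def linear_threshold(nodes, links, thresholds, initial_adopters):
    """Return {node: tick of adoption}; nodes that never adopt are absent.

    links maps (source, target) to a positive link value (1 if dichotomized).
    """
    in_links = {node: {} for node in nodes}
    for (source, target), value in links.items():
        in_links[target][source] = value

    adopted_at = {node: 0 for node in initial_adopters}
    tick = 0
    while True:
        tick += 1
        new_adopters = []
        for node in nodes:
            # Assumption: nodes without in-neighbors cannot be persuaded.
            if node in adopted_at or not in_links[node]:
                continue
            total = sum(in_links[node].values())
            adopting = sum(value for source, value in in_links[node].items()
                           if source in adopted_at)
            if adopting / total >= thresholds[node]:
                new_adopters.append(node)
        if not new_adopters:
            return adopted_at  # equilibrium state
        # Adoptions take effect only after every node has been examined.
        for node in new_adopters:
            adopted_at[node] = tick
```

For links a -> b, a -> c, b -> c, c -> d and e -> d, a threshold of 0.6 for every node and initial adopter a, node b adopts at tick 1 and node c at tick 2. Node d never adopts because only half of its in-neighbors adopt, and e has no in-neighbor.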

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one 1-
mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running analysis.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.


 Main process
Threshold (0 <= theta <= 1): In the case of "Homogeneous", every
node has same threshold value typed in. In the case of

"Heterogeneous", nodes have different threshold values which are

selected among node attributes. In the case of "Random", nodes have

different threshold values generated from uniform distribution, U(min,

max) using given seed value.

Initial Adopter: In the case of "Selection", selected nodes are regarded as initial adopters. In the case
of "Attribute", nodes whose selected attribute values are not equal to 0 are regarded as initial

adopters. In the case of "Random", given number of nodes are selected randomly using given seed

value and regarded as initial adopters.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Linear Threshold >>

Process’ analysis, Main Report, Thresholds table, Diffusion Dynamics

table, Diffusion Power table, Diffusion Network table, Diffusion

Process chart and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Number of Ticks to Reach the Equilibrium State : It shows number of

ticks to reach the equilibrium state.

- Number of Initial Adopters / Initial Adoption Rate(%) : It shows number of initial adopters and ratio

of them.

- Number of Final Adopters / Final Adoption Rate(%) : It shows number of adopters in the equilibrium state and ratio of them.


 Tables
Thresholds Vector: nodes’ threshold values are presented.

Diffusion Dynamics Table: Time when each node adopts the innovation (0 for initial adopters and
X for nodes not adopting the innovation) and cluster number each node belongs to (X for adopters)

are presented.

Diffusion Power Table: The number of neighbors to which each node diffuses the innovation directly and the number of nodes to which each node diffuses the innovation directly and indirectly (through one or more steps) are presented.


Diffusion Network Table: A network showing ‘Who diffuses the innovation to whom’ is presented.

 Charts
Diffusion Process Line Graph: The number of new adopters in each tick (Frequency), the number of nodes adopting the innovation before or during each tick (Cumulative) and the total number of nodes (Upperbound) are presented.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: The style of nodes is changed as pre-established node style in the global option as

follows: adopters in the equilibrium state (Node >> Focus Pair >> 1st Node), other nodes (Node >>

Focus Pair >> other Node(s)). The size of nodes means direct diffusion power.


 Inspect
This module explores the diffusion flow among the nodes on the network map tick by tick, diffusion

power of nodes and clusters consisting of nodes not adopting the innovation.

 Diffusion Process
Selecting a tick by moving the time slider changes the style of the nodes and links on the network map to the pre-established style in the global option, as follows.

- Nodes adopting the innovation before the tick : Node >> Focus Pair >> 1st Node

- New adopters in the tick : Node >> Focus Pair >> 2nd Node

- Other nodes : Node >> Focus Pair >> other Node(s)

- Links from nodes adopting the innovation before the tick to new adopters in the tick : Link >> Diffusion >> Diffusion

 Diffusion Power
In the case of "Direct", the size of each node is proportional to the number of neighbors to which the node diffuses the innovation directly. In the case of "Direct & Indirect", the size of each node is proportional to the number of nodes to which the node diffuses the innovation directly and indirectly (through one or more steps). In the case of "None", all nodes have the same size.

 Cluster
If a cluster is selected in the combo box, the nodes belonging to that cluster are selected in the network map.

<Example Screen shot>

 Time Complexity
 Diffusion Dynamics & Diffusion Process : O(n + m)

 Diffusion Power : O(n^2 + m)

 Reference
 M. Granovetter, "Threshold models of collective behavior", The American Journal of Sociology,
vol. 83, no. 6, pp. 1420–1443, May 1978.

 T. C. Schelling, Micromotives and Macrobehavior. W.W. Norton and Company, 1978.

 D. J. Watts, "A simple model of global cascades on random networks", Proceedings of the
National Academy of Sciences of the United States of America, vol. 99, no. 9, pp. 5766–5771,

April 2002.


 David Easley and Jon Kleinberg, ‘Cascading Behavior in Networks’, Networks, Crowds, and
Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010.

 Related Topics
 Analyze >> Diffusion >> Linear Threshold >> Target


Analyze >> Diffusion >> Linear Threshold >> Target

 Menu
Analyze >> Diffusion >> Linear Threshold >> Target

 Description
This module is based on a diffusion model of innovations which assumes a situation where an innovation emerges at a number of initial adopters and propagates through the links of the social network. The module finds a collection of additional adopters (target nodes) that maximizes the diffusion of the innovation. Each node adopts the innovation if and only if it is one of the initial adopters, it is one of the additional adopters, or the ratio of its in-neighbors adopting the innovation is greater than or equal to the threshold value of that node. Propagation repeats until no further node adopts the innovation (equilibrium state).

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running analysis.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.

 Main process


Target Final Adoption Rate(%): The set of target nodes should make the final adoption rate greater than or equal to the "Target Final Adoption Rate". If there is no such set of nodes, the set of target nodes should maximize the final adoption rate.

Max Number of Target Nodes: The size of the set of target nodes should be less than or equal to "Max Number of Target Nodes". If qualifying target node sets have various sizes, only the minimum-size target sets are reported.

Algorithms: In the case of "Basic", it is guaranteed that every target set satisfying the above conditions will be reported. In the case of "Branch and Bound", it is guaranteed that one or more target sets satisfying the above conditions will be reported. In the case of "Greedy", nothing is guaranteed, but decent sets of nodes will be reported. "Greedy" is much faster than the others, and "Branch and Bound" is faster than "Basic".
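A greedy strategy of the kind described above can be sketched as follows. This is only one plausible reading of "Greedy" (repeatedly add the single node that yields the most final adopters) and is not confirmed to be the procedure NetMiner uses; the function names and the unweighted in-neighbor representation are hypothetical.

```python
def final_adopters(in_neighbors, thresholds, seeds):
    """Run the threshold diffusion to equilibrium and return the adopter set.

    Adoption is never undone, so updating nodes one at a time reaches the
    same equilibrium as updating them tick by tick.
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, sources in in_neighbors.items():
            if node in adopted or not sources:
                continue
            if len(sources & adopted) / len(sources) >= thresholds[node]:
                adopted.add(node)
                changed = True
    return adopted


def greedy_targets(in_neighbors, thresholds, initial, target_rate, max_targets):
    """Add, one at a time, the node that raises final adoption the most."""
    targets = []
    n = len(in_neighbors)
    reached = final_adopters(in_neighbors, thresholds, initial)
    while len(reached) / n * 100 < target_rate and len(targets) < max_targets:
        best_node, best_reached = None, reached
        for node in in_neighbors:
            if node in reached:
                continue
            candidate = final_adopters(in_neighbors, thresholds,
                                       set(initial) | set(targets) | {node})
            if len(candidate) > len(best_reached):
                best_node, best_reached = node, candidate
        if best_node is None:
            break
        targets.append(best_node)
        reached = best_reached
    return targets, reached
```

Each round runs one diffusion per remaining candidate, which is in line with the O(n * m * Max number of target nodes) bound listed for "Greedy" under Time Complexity.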

Threshold (0 <= theta <= 1): In the case of "Homogeneous", every node has same threshold value
typed in. In the case of "Heterogeneous", nodes have different threshold values which are selected

among node attributes. In the case of "Random", nodes have different threshold values generated

from uniform distribution, U(min, max) using given seed value.

Initial Adopter: In the case of "Selection", selected nodes are regarded as initial adopters. In the case of "Attribute", nodes whose selected attribute values are not equal to 0 are regarded as initial adopters. In the case of "Random", given number of nodes are selected randomly using given seed value and regarded as initial adopters.

 Output
You can select which outputs should be reported and which format the


outputs should be displayed in. In the result of ‘Linear Threshold >> Target’ analysis, Main Report,

Target Nodes table and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Number of Target Node Sets : It shows number of target node sets.

 Tables
Target Nodes: Information about target node sets (one per column) is presented. For each target node set, the number of target nodes, the number of final adopters, the final adoption rate and the list of target nodes composing the set are presented.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: The style of nodes is changed as pre-established node style in the global option as

follows: initial adopters (Node >> Focus Pair >> 1st Node), other nodes (Node >> Focus Pair >>

other Node(s)).


 Inspect
This module explores the diffusion effects of target node sets or freely selected nodes.

 Target Node Sets


If a target node set is selected in the combo box, the nodes belonging to that set are selected in the network map. If you click the "Diffuse" button, the innovation diffuses from the selected nodes and the initial adopters, and the style of nodes changes to the pre-established style in the global option, as follows.

- Nodes finally adopting the innovation : Node >> Focus Pair >>

1st Node

- Other nodes : Node >> Focus Pair >> other Node(s)

If you click the "Cancel" button, the style of nodes returns to the default style and the node selection is reset.

<Example Screen shot>


 Time Complexity
 Basic : O( C(n, Max number of target nodes) * m )

 Branch & Bound : O( C(n, Max number of target nodes) * m )

 Greedy : O( n * m * Max number of target nodes )

 Reference
 P. Domingos, M. Richardson. Mining the Network Value of Customers. Seventh International
Conference on Knowledge Discovery and Data Mining, 2001.

 D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social
network,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on

Knowledge discovery and data mining. ACM, 2003

 David Easley and Jon Kleinberg, ‘Cascading Behavior in Networks’, Networks, Crowds, and
Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010.

 Related Topics
 Analyze >> Diffusion >> Linear Threshold >> Process


Analyze >> Cohesion >> Clique

 Menu
Analyze >> Cohesion >> Clique

 Description
This module analyzes the cohesive structure of a network based on the cohesiveness among the nodes. A Clique is a maximal complete subgraph composed of three or more nodes. It consists of a subset of nodes, all of which are adjacent to one another, and there are no other nodes in the network that are also adjacent to all of the members of the clique.

Cliques in a network may overlap, i.e. a node can be a member of more than one clique. The overlap structure of cliques can be investigated using the clique co-membership and overlap matrices, and hierarchical clustering of them gives non-overlapping cohesion of nodes and cliques respectively.

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.
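Maximal complete subgraphs can be enumerated with the Bron-Kerbosch procedure. The sketch below uses that well-known technique with pivoting as an illustration; it is not claimed to be NetMiner's basic or Peamc algorithm. The function name is hypothetical, and the network is assumed to be symmetric and free of self-loops.

```python
def maximal_cliques(neighbors, minimum_size=3):
    """Enumerate maximal complete subgraphs with at least minimum_size nodes.

    neighbors maps each node to the set of nodes adjacent to it.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend current, so it is maximal.
            if len(current) >= minimum_size:
                cliques.append(set(current))
            return
        # The pivot limits branching to nodes not adjacent to it.
        pivot = max(candidates | excluded,
                    key=lambda n: len(neighbors[n] & candidates))
        for node in list(candidates - neighbors[pivot]):
            expand(current | {node},
                   candidates & neighbors[node],
                   excluded & neighbors[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(neighbors), set())
    return cliques
```

In a network with links a-b, a-c, b-c, c-d, c-e and d-e, the sketch reports the two cliques {a, b, c} and {c, d, e}, which overlap in node c.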

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When selected data contains multiple links, where more

than two links connect the same source node and target node pair, you

should decide how to merge them to a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running module. By dichotomizing,
weighted/valued data is transformed to non-weighted/binary data.


Symmetrize: You should symmetrize your data before running


module. By symmetrizing, directed/asymmetric data is

transformed to undirected/symmetric data.

 Main process
Minimum size of Clique: Only cliques with at least ‘minimum
size of clique’ vertices are reported.

Algorithm: Peamc algorithm runs much faster than basic


algorithm when the input network is scale-free and has low

clustering coefficient.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Clique’

analysis, Main Report, Clique Affiliation Matrix, Clique Co-

Membership Matrix, Clique Bipartite Matrix, Clique Overlap

Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output

Window.

 Reports
Main Report
- # of Cliques: The number of cliques in the network.

- Members of Cliques: It shows names of nodes affiliated with each clique.

- Subgroup Details: Show size of cliques and cohesion index value. Size of a clique is the number of

nodes affiliated with the clique. Cohesion index is defined only for undirected graph. It is computed

by [the density of internal ties(clique a -> clique a) / the density of external ties(clique a -> external

nodes)].
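The cohesion index can be illustrated with a short computation. The sketch assumes that the density of internal ties is the number of ties inside the subgroup divided by the number of possible member pairs, and that the density of external ties is the number of ties to outsiders divided by the number of possible member-outsider pairs. These denominators, the function name and the handling of a subgroup without external ties are assumptions.

```python
def cohesion_index(neighbors, members):
    """Density of ties inside the subgroup divided by density of ties leaving it.

    neighbors maps each node of an undirected network to its adjacent nodes.
    Assumes the subgroup has at least two members.
    """
    members = set(members)
    outsiders = set(neighbors) - members
    # Each internal tie is seen from both of its endpoints, hence the halving.
    internal = sum(len(neighbors[n] & members) for n in members) / 2
    external = sum(len(neighbors[n] & outsiders) for n in members)
    if external == 0:
        # Assumption: a subgroup with no external ties gets an infinite index.
        return float("inf")
    internal_density = internal / (len(members) * (len(members) - 1) / 2)
    external_density = external / (len(members) * len(outsiders))
    return internal_density / external_density
```

For the clique {a, b, c} in a network with links a-b, a-c, b-c, c-d, c-e and d-e, the internal density is 3/3 and the external density is 2/6, so the index is 3.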


 Tables
Clique Affiliation matrix: It is a 2-mode matrix whose main nodes are maintained, and sub nodes are cliques.

Clique Co-membership matrix: It is a (Main Node by Main Node) 1-mode matrix whose cell represents the number of co-membered cliques between two nodes.

Clique Bipartite matrix: (Main Node + clique) by (Main Node + clique) matrix. It contains clique affiliation matrix as a sub block.

Clique Overlap matrix: It is a (clique by clique) 1-mode matrix whose cell means the number of nodes overlapped by a pair of cliques.
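The relation between the affiliation, co-membership and overlap matrices can be sketched as follows, with matrices held as nested lists. How NetMiner fills the diagonal cells is not stated in this section; in the sketch a diagonal cell of the co-membership matrix counts the cliques of a node and a diagonal cell of the overlap matrix is the clique size. The function name is hypothetical.

```python
def clique_matrices(nodes, cliques):
    """Build affiliation, co-membership and overlap matrices as nested lists."""
    # Rows are nodes, columns are cliques; 1 marks membership.
    affiliation = [[1 if node in clique else 0 for clique in cliques]
                   for node in nodes]
    # Node by node: number of cliques two nodes have in common.
    co_membership = [[sum(a * b for a, b in zip(row_i, row_j))
                      for row_j in affiliation] for row_i in affiliation]
    # Clique by clique: number of nodes two cliques have in common.
    overlap = [[len(c1 & c2) for c2 in cliques] for c1 in cliques]
    return affiliation, co_membership, overlap
```

For nodes a to e and the cliques {a, b, c} and {c, d, e}, nodes a and b share one clique, nodes a and d share none, and the two cliques overlap in one node.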

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
You can see the cliques which are found by analysis on the map.

 Select Clique
After a Clique Partition Vector is selected in the combo box, the style of nodes on the map changes to the pre-established style in the global option. The corresponding global options are as follows.

Nodes of selected clique: Node >> Subset Membership >> Subset Member Node(s)

Nodes of non-selected clique: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>

 Time Complexity
 O(2^n)

 Reference
 Bock, R.D., and Husain, S.Z. (1950). An adaptation of Holzinger's B-coefficients for the
analysis of sociometric data. Sociometry. 13, 146-153.


Analyze >> Cohesion >> Generalized Clique

 Menu
Analyze >> Cohesion >> Generalized Clique

 Description
This module is an ensemble of the n-Clique module and the k-Plex module that takes distance tolerance (d) and degree tolerance (k) as input parameters. A sub-graph S is a generalized clique when S is a maximal sub-graph where each node is close to at least (|S| - k) other nodes in S, where |S| is the number of nodes in S. One node is close to another node if the geodesic distance from one node to the other is no greater than d.

 If d = 1, a generalized clique is same as k-Plex.

 If k = 1, a generalized clique is same as n-Clique.

Generalized cliques in a network may overlap (i.e. a node can be member of more than one

generalized clique). The overlapping structure of generalized cliques can be investigated using

generalized clique 'Co-membership' and 'Overlap' matrices, and hierarchical clustering of them gives

non-overlapping cohesion of nodes and generalized cliques respectively.
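The closeness condition behind a generalized clique can be checked for a given node set as follows. The sketch assumes the reading that each member must be within geodesic distance d of at least (number of members - k) other members, which is consistent with the two special cases d = 1 (k-Plex) and k = 1 (n-Clique). It measures distances in the whole network with unit link length and does not test maximality; the function names are hypothetical.

```python
from collections import deque


def distances_from(source, successors):
    """Geodesic distances from source by breadth-first search (unit link length)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def satisfies_closeness(successors, members, d, k):
    """True if every member is within distance d of at least len(members) - k others."""
    members = set(members)
    for node in members:
        dist = distances_from(node, successors)
        close = sum(1 for other in members
                    if other != node and dist.get(other, float("inf")) <= d)
        if close < len(members) - k:
            return False
    return True
```

On the path a-b-c, the set {a, b, c} fails with d = 1 and k = 1 (it is not a clique), passes with d = 2 and k = 1 (the n-Clique case with n = 2) and passes with d = 1 and k = 2 (the k-Plex case with k = 2).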

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only choose
one 1-mode network.

 Link Merge: When selected data contains multiple links, where

more than two links connect the same source node and target


node pair, a user should decide how to merge them into a

single link.

 Pre-process
Dichotomize: If a user dichotomizes an input network, the
distance between every two adjacent nodes is 1. Otherwise, the distance between two adjacent nodes

is the weight of the link between these nodes.

Symmetrize: If a user symmetrizes an input network, the


direction of links is ignored. Otherwise, the direction of

links is considered thereby calculating a degree and a

geodesic distance based on out-going links.

 Main process
D : Distance Tolerance: The value of the parameter d in the definition of Generalized Clique.

K : Degree Tolerance: The value of the parameter k in the definition of Generalized Clique.

Minimum Size of G-cliques: Only generalized cliques with


at least ‘minimum size of G-Cliques’ vertices are reported.

Maximum Number of G-cliques: If the number of generalized cliques found reaches ‘maximum number of G-cliques’, the algorithm stops and reports these cliques.

Overlap Option: If a user checks ‘Remove Overlapped G-


Cliques’, overlapped cliques are removed so that each node is

a member of at most one generalized clique. Checking this

option also boosts the speed of the algorithm.


 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Generalized

Clique’ analysis, ‘Main Report’, ‘G-Clique Affiliation Matrix’, ‘G-Clique Comembership Matrix’,

‘G-Clique Bipartite Matrix’, ‘G-Clique Overlap Matrix’ and ‘Spring Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
 # of G-Cliques: The number of generalized cliques in a network

 Members of G-Clique: Shows the names of nodes affiliated with each generalized clique.

 Subgroup Details: Shows size and density of generalized cliques and cohesion index value.

 The size of a generalized clique is the number of nodes affiliated with the generalized

clique.

 Density is computed by (the number of links present) divided by (the number of maximal

possible links).

 Cohesion index, which is defined only for an undirected graph, is computed by (the number

of internal ties) divided by (the number of external ties). Here, internal ties are ties between two members of the same generalized clique, and external ties are ties from a member of the generalized clique to a node outside it.
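The three subgroup measures described above can be sketched as follows for an undirected, dichotomized network. This is a minimal sketch, not NetMiner's implementation: the function name and the example network are hypothetical, and returning infinity when a subgroup has no external ties (and 0.0 density for a single-node subgroup) is an assumption of this sketch.

```python
def subgroup_details(adj, members):
    """Return (size, density, cohesion index) of a subgroup of an undirected network."""
    members = set(members)
    size = len(members)
    # each internal tie is seen from both of its end nodes, hence the division by 2
    internal = sum(1 for u in members for v in adj[u] if v in members) // 2
    external = sum(1 for u in members for v in adj[u] if v not in members)
    max_links = size * (size - 1) // 2
    density = internal / max_links if max_links else 0.0          # assumption for size < 2
    cohesion = internal / external if external else float("inf")  # assumption: no external ties
    return size, density, cohesion

# Hypothetical network: triangle a-b-c with one extra node d attached to c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(subgroup_details(adj, {"a", "b", "c"}))
```

For the triangle there are 3 internal ties out of 3 possible and a single external tie (c to d), so the density is 1.0 and the cohesion index is 3.0.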


 Tables
G-Clique Affiliation Matrix: A (# of main nodes by # of generalized cliques) 2-mode matrix whose main nodes are from a main nodeset and whose sub nodes are generalized cliques.

G-Clique Comembership Matrix: A (# of main nodes by # of main nodes) 1-mode matrix where each cell represents the number of generalized cliques that two nodes share.

G-Clique Bipartite Matrix: A [(# of main nodes + # of generalized cliques) by (# of main nodes + # of generalized cliques)] matrix that contains the generalized clique affiliation matrix as a sub block.

G-Clique Overlap Matrix: A (# of generalized cliques by # of generalized cliques) 1-mode matrix where each cell represents the number of nodes shared by a pair of generalized cliques.
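The relationship between the four matrices can be illustrated with a short Python sketch. The function names and the two example groups are hypothetical. The diagonal values (a node's own number of memberships, a group's own size) and the zero diagonal blocks of the bipartite matrix are assumptions of this sketch; NetMiner's output may treat them differently.

```python
def affiliation_matrix(nodes, groups):
    """Rows are nodes, columns are groups; 1 marks membership."""
    return [[1 if node in group else 0 for group in groups] for node in nodes]

def comembership_matrix(aff):
    """Node-by-node counts of shared groups (affiliation times its transpose)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in aff]
            for row_i in aff]

def overlap_matrix(aff):
    """Group-by-group counts of shared nodes (transpose times affiliation)."""
    cols = list(zip(*aff))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def bipartite_matrix(aff):
    """Square (nodes + groups) matrix with the affiliation matrix as off-diagonal blocks."""
    n, g = len(aff), len(aff[0])
    top = [[0] * n + list(row) for row in aff]
    bottom = [[aff[i][j] for i in range(n)] + [0] * g for j in range(g)]
    return top + bottom

nodes = [1, 2, 3, 4, 5, 6]
groups = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}]   # hypothetical overlapping groups
aff = affiliation_matrix(nodes, groups)
print(comembership_matrix(aff))
print(overlap_matrix(aff))
print(len(bipartite_matrix(aff)))
```

In this example nodes 1 and 6 share no group, nodes 2 and 3 share both groups, and the two groups overlap in four nodes.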

 Maps
Spring Map
 Default Layout: Kamada & Kawai algorithm (Spring >>Kamada & Kawai) is used to draw a

map by default.


 Default Style: The default style is set according to ‘Common’ option in ‘Preference >> Node’

tab.

 Inspect
A user can locate generalized cliques on a map.

G-Clique
After selecting a ‘Generalized Clique Partition

Vector’ in a combo box, the style of nodes on a map

is changed with the style pre-established in the

global option:

 Nodes of the selected generalized clique:

Node >> Subset Membership >> Subset

Member Node(s)

 Other nodes: Node >> Subset

Membership >> Subset Non-member

Node(s)


 Time Complexity
 O(2^n)

 Related Topics
 Analyze >> Cohesion >> Clique

 Analyze >> Cohesion >> n-Clique

 Analyze >> Cohesion >> n-Clan

 Analyze >> Cohesion >> k-Plex

 Analyze >> Cohesion >> k-Core


Analyze >> Cohesion >> n-Clique

 Menu
Analyze >> Cohesion >> n-Clique

 Description
This module analyzes the cohesion structure of a network based on cohesive subsets of nodes. If every pair of nodes in a subset is connected within distance ‘n’, the subset is called an n-Clique. In other words, an n-Clique is a maximal sub-graph in which the largest geodesic distance between any two nodes is no greater than n. (That is, a 1-Clique is identical to a clique.)
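The definition can be made concrete with a brute-force Python sketch for very small networks. It is not the module's basic or Peamc algorithm: the function names and the six-node example network are assumptions of this sketch, and the search enumerates every subset of nodes, so it is exponential in the number of nodes. Geodesic distances are measured in the whole network, so a shortest path may pass through nodes outside the n-Clique.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node in the whole network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def maximal_n_cliques(adj, n, min_size=1):
    """Brute-force search, largest candidate subsets first (exponential time)."""
    nodes = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    found = []
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            within_n = all(dist[u].get(v, float("inf")) <= n
                           for u, v in combinations(subset, 2))
            # keep the subset only if no larger n-clique already contains it
            if within_n and not any(set(subset) < f for f in found):
                found.append(set(subset))
    return [sorted(f) for f in found if len(f) >= min_size]

# Hypothetical undirected, unweighted network with six nodes
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = {u: set() for edge in edges for u in edge}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(maximal_n_cliques(adj, n=2))              # 2-Cliques
print(maximal_n_cliques(adj, n=1, min_size=3))  # ordinary cliques with 3 or more nodes
```

In this network nodes 1 and 6 are three steps apart, so no 2-Clique contains both. Nodes 4 and 5 belong to the same 2-Clique {1, 2, 3, 4, 5} only because of the path through node 6, which lies outside that 2-Clique.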

n-Cliques in a network may overlap, i.e. a node can be a member of more than one n-Clique. The overlap structure of n-Cliques can be investigated using the n-Clique Co-membership and Overlap matrices, and hierarchical clustering of these matrices gives non-overlapping cohesive groups of nodes and of n-Cliques, respectively.

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.


 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Main process
Maximum Distance (n): The largest geodesic distance allowed between a pair of nodes in the same n-Clique. (So, Maximum Distance is identical to the n of ‘n-Clique’.)

Minimum size of n-Clique: Only n-cliques with at least ‘minimum


size of n-clique’ vertices are reported.

Algorithm: The Peamc algorithm runs much faster than the basic algorithm when the input network is scale-free and has a low clustering coefficient.

 Output
You can select which outputs are reported and in which format they are displayed. The ‘n-Clique’ analysis creates the Main Report, n-Clique Affiliation Matrix, n-Clique Comembership Matrix, n-Clique Bipartite Matrix, n-Clique Overlap Matrix and Spring Map.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of n-Cliques: The number of n-Cliques in the network.

- Members of n-Cliques: Shows the names of the nodes affiliated with each n-Clique.

- Subgroup Details: Shows the size and density of each n-Clique and its cohesion index value. The size of an n-Clique is the number of nodes affiliated with it. Density is computed as (the number of links present / the maximum number of possible links). The cohesion index is defined only for an undirected graph. It is computed as [the number of internal ties (n-Clique a -> n-Clique a) / the number of external ties (n-Clique a -> external nodes)].


 Tables
n-Clique Affiliation matrix: A 2-mode matrix whose main nodes are those of the main nodeset and whose sub nodes are n-Cliques.

n-Clique Co-membership matrix: A (Main Node by Main Node) 1-mode matrix in which each cell is the number of n-Cliques that two nodes share.

n-Clique Bipartite matrix: A (Main Node + n-Clique) by (Main Node + n-Clique) matrix that contains the n-Clique affiliation matrix as a sub block.

n-Clique Overlap matrix: An (n-Clique by n-Clique) 1-mode matrix in which each cell is the number of nodes shared by a pair of n-Cliques.


 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can locate the n-Cliques found by the analysis on the map.

 Select n-Clique
After an n-Clique Partition Vector is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected n-Clique: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected n-Clique: Node >> Subset Membership >> Subset Non-member Node(s)


<Example Screen shot>

 Time Complexity
 O(2^n)

 Reference
 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and
Applications, Cambridge, 1994: p. 271

 Bock, R.D., and Husain, S.Z. (1950). An adaptation of Holzinger's B-coefficients for the
analysis of sociometric data. Sociometry. 13, 146-153.

 Related Topics
 Analyze >> Cohesion >> Clique


Analyze >> Cohesion >> n-Clan

 Menu
Analyze >> Cohesion >> n-Clan

 Description
This module analyzes the cohesion structure of a network based on cohesive subsets of nodes. An n-Clan is an n-Clique whose diameter is less than or equal to n; that is, the geodesic distance between every pair of nodes in the sub-graph is no greater than n when only paths within the sub-graph are counted.
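The extra diameter condition can be sketched in Python as a filter applied to n-Cliques that have already been found. The function names and the six-node example network are hypothetical, and the two 2-Cliques of that network are supplied by hand rather than computed.

```python
from collections import deque

def induced_diameter(adj, members):
    """Longest geodesic distance using only paths inside `members`
    (infinity if the induced sub-graph is disconnected)."""
    members = set(members)
    worst = 0
    for source in members:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in members and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(members):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst

def n_clans(adj, n_cliques, n):
    """Keep only the n-Cliques whose internal diameter does not exceed n."""
    return [clique for clique in n_cliques if induced_diameter(adj, clique) <= n]

# Hypothetical undirected, unweighted network with six nodes
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = {u: set() for edge in edges for u in edge}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

two_cliques = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]   # the 2-Cliques of this network
print(n_clans(adj, two_cliques, n=2))
```

The 2-Clique {1, 2, 3, 4, 5} is rejected because nodes 4 and 5 are two steps apart only via node 6; inside the sub-graph they are three steps apart, so its diameter is 3.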

n-Clans in a network may overlap, i.e. a node can be a member of more than one n-Clan. The overlap structure of n-Clans can be investigated using the n-Clan Co-membership and Overlap matrices, and hierarchical clustering of these matrices gives non-overlapping cohesive groups of nodes and of n-Clans, respectively.

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Main process
Maximum distance (n): The largest geodesic distance allowed between a pair of nodes in the same n-Clan, measured along paths contained entirely within the n-Clan. The “n” of an n-Clan is therefore the upper bound on its diameter.

Minimum Size of n-Clan: Only n-Clans with at least ‘minimum size of n-Clan’ vertices are reported.

 Output
You can select which outputs are reported and in which format they are displayed. The ‘n-Clan’ analysis creates the Main Report, n-Clan Affiliation matrix, n-Clan Co-membership matrix, n-Clan Bipartite matrix, n-Clan Overlap matrix and Spring Map.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of n-Clans: The number of n-Clans in the network.

- Members of n-Clans: Shows the names of the nodes affiliated with each n-Clan.

- Subgroup Details: Shows the size and density of each n-Clan and its cohesion index value. The size of an n-Clan is the number of nodes affiliated with it. Density is computed as (the number of links present / the maximum number of possible links). The cohesion index is defined only for an undirected graph. It is computed as [the number of internal ties (n-Clan a -> n-Clan a) / the number of external ties (n-Clan a -> external nodes)].

 Tables
n-Clan Affiliation matrix: A 2-mode matrix whose main nodes are those of the main nodeset and whose sub nodes are n-Clans.

n-Clan Co-membership matrix: A (Main Node by Main Node) 1-mode matrix in which each cell is the number of n-Clans that two nodes share.

n-Clan Bipartite matrix: A (Main Node + n-Clan) by (Main Node + n-Clan) matrix that contains the n-Clan affiliation matrix as a sub block.

n-Clan Overlap matrix: An (n-Clan by n-Clan) 1-mode matrix in which each cell is the number of nodes shared by a pair of n-Clans.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
You can locate the n-Clans found by the analysis on the map.

 n-Clan
After an n-Clan Partition Vector is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected n-Clan: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected n-Clan: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>


 Time Complexity
 O(2^n)

 Reference
 Mokken, R.J. (1979). Cliques, clubs and clans. Quality and Quantity. 13:161-173.

 Related Topics
 Analyze >> Cohesion >> Clique

 Analyze >> Cohesion >> n-Clique


Analyze >> Cohesion >> k-Plex

 Menu
Analyze >> Cohesion >> k-Plex

 Description
A sub-graph G_s is a k-Plex when G_s is a maximal sub-graph in which each node is adjacent to all nodes in G_s except for at most k nodes (counting the node itself). For example, a 1-Plex is identical to a clique, and a 2-Plex with 6 nodes is also a 4-Core.
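The degree condition in this definition, and the 6-node example, can be checked with a minimal Python sketch. The function name and the example network (six nodes, each linked to every other node except one) are assumptions of this sketch, and maximality is not tested.

```python
def satisfies_k_plex(adj, members, k):
    """True if every member is adjacent to at least len(members) - k other members."""
    members = set(members)
    return all(len(adj[u] & members) >= len(members) - k for u in members)

# Hypothetical 6-node network: node u is linked to everyone except itself and
# its "partner" u ^ 1 (pairs 0-1, 2-3 and 4-5 are not linked).
adj = {u: {v for v in range(6) if v != u and v != (u ^ 1)} for u in range(6)}

print(satisfies_k_plex(adj, range(6), k=2))   # 2-Plex condition
print(satisfies_k_plex(adj, range(6), k=1))   # clique (1-Plex) condition
print(min(len(adj[u]) for u in adj))          # minimum degree within the sub-graph
```

Every node has exactly 6 - 2 = 4 neighbours in the set, so the set meets the 2-Plex condition and, because its minimum internal degree is 4, also the 4-Core condition.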

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.


 Main process
Maximum k(n): Sets the maximum value of k(n), which is the “k” in k-Plex.

Minimum Size of k-Plex: Only k-Plexes with at least ‘minimum size of k-Plex’ vertices are reported.

Algorithm: The default algorithm is Pemp. In most cases, this algorithm runs faster than the basic algorithm.

 Output
You can select which outputs are reported and in which format they are displayed. The ‘k-Plex’ analysis creates the Main Report, k-Plex Affiliation matrix, k-Plex Comembership matrix, k-Plex Bipartite matrix, k-Plex Overlap matrix and Spring Map.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of k-Plexes: The number of k-Plexes in the network.

- Members of k-Plexes: Shows the names of the nodes affiliated with each k-Plex.

- Subgroup Details: Shows the size and density of each k-Plex and its cohesion index value. The size of a k-Plex is the number of nodes affiliated with it. Density is computed as (the number of links present / the maximum number of possible links). The cohesion index is defined only for an undirected graph. It is computed as [the number of internal ties (k-Plex a -> k-Plex a) / the number of external ties (k-Plex a -> external nodes)].


 Tables
k-Plex Affiliation matrix: A 2-mode matrix whose main nodes are those of the main nodeset and whose sub nodes are k-Plexes.

k-Plex Co-membership matrix: A (Main Node by Main Node) 1-mode matrix in which each cell is the number of k-Plexes that two nodes share.

k-Plex Bipartite matrix: A (Main Node + k-Plex) by (Main Node + k-Plex) matrix that contains the k-Plex affiliation matrix as a sub block.

k-Plex Overlap matrix: A (k-Plex by k-Plex) 1-mode matrix in which each cell is the number of nodes shared by a pair of k-Plexes.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can locate the k-Plexes found by the analysis on the map.


 k-Plex
After a k-Plex Partition Vector is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected k-Plex: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected k-Plex: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>

 Time Complexity
 O(2^n)

 Reference
 Seidman S and Foster B (1978). A graph theoretic generalization of the clique concept. Journal of Mathematical Sociology, 6, 139-154.

 Seidman S and Foster B (1978). A note on the potential for genuine cross-fertilization between
anthropology and mathematics. Social Networks 1, 65-72.

 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications,
Cambridge, 1994: p.265, 7.4.1 K-Plexes

 Related Topics


 Analyze >> Cohesion >> Clique

 Analyze >> Cohesion >> n-Clique

 Analyze >> Cohesion >> n-Clan

 Analyze >> Cohesion >> k-Core


Analyze >> Cohesion >> k-Core

 Menu
Analyze >> Cohesion >> k-Core

 Description
This module finds every k-Core in the selected network. A k-Core is a maximal sub-graph in which each node is adjacent to at least k other nodes in the sub-graph. That is, the minimum nodal degree within the sub-graph is at least k.
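One common way to obtain k-Cores is to peel off the node of lowest remaining degree again and again; the sketch below illustrates that idea in Python. It is an assumption that this resembles what the module does internally, and the function names and example network are hypothetical. The network must be undirected and unweighted.

```python
def core_numbers(adj):
    """Coreness of each node: the largest k such that the node belongs to a k-Core."""
    degree = {u: len(neighbours) for u, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.get)   # node with the lowest remaining degree
        k = max(k, degree[u])                # coreness never decreases while peeling
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core

def k_core(adj, k):
    """Nodes whose coreness is at least k (may form several components)."""
    core = core_numbers(adj)
    return {u for u in adj if core[u] >= k}

# Hypothetical network: triangle a-b-c, pendant node d on c, isolated node e
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c"}, "e": set()}
print(core_numbers(adj))
print(sorted(k_core(adj, 2)))
```

Here the triangle forms the 2-Core, node d only reaches coreness 1, and the isolated node e has coreness 0.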

By definition, this algorithm can analyze only undirected and unweighted networks, so you should dichotomize and symmetrize your network before running it.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.


 Output
You can select which outputs are reported and in which format they are displayed. The ‘k-Core’ analysis creates the Main Report, k-Core Affiliation matrix, k-Core Comembership matrix, k-Core Bipartite matrix, k-Core Overlap matrix and Clustered Map.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- k-Core: For each coreness value (the 'k' of 'k-Core'), the number of included nodes and the number of components (k-Core sub-graphs) are displayed.

 Tables
k-Core Affiliation matrix: A 2-mode matrix whose main nodes are those of the main nodeset and whose sub nodes are coreness values. Each cell value indicates the component of the corresponding coreness.


 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

k-Cores.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can explore the k-Cores at each level.

 Core
Select K
Select level (k for exploring k-Core).

Core
After a k-Core Partition Vector is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected k-Core: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected k-Core: Node >> Subset Membership >> Subset Non-member Node(s)


<Example Screen shot>

 Time Complexity
 O(2^n)

 Reference
 Seidman, S. (1983). "Network structure and minimum degree". Social Networks, 5, pp. 269-
287.

 Related Topics
 Analyze >> Cohesion >> k-Plex


Analyze >> Cohesion >> Lambda Set

 Menu
Analyze >> Cohesion >> Lambda Set

 Description
‘Lambda Set’ analyzes the cohesion structure of a network based on the distribution of the vulnerability of connections (i.e. connectivity) among the nodes. The line-connectivity of a pair of nodes is the minimum number of lines that must be removed to leave no path between the two nodes.

A set of nodes is a Lambda Set if every pair of nodes in the set has greater connectivity than any pair consisting of one node from within the lambda set and one node from outside it. As with link connectivity, you should dichotomize your data before running the module.

Since finding exact lambda sets is very time-consuming, the module produces pseudo-lambda-sets by clustering a link connectivity matrix.
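The line-connectivity of a node pair can be computed as a maximum flow in which every link has capacity 1. The sketch below does this with breadth-first augmenting paths for an undirected, dichotomized network. The function name and the example network are hypothetical, and this is an illustration of the measure rather than NetMiner's own routine.

```python
from collections import deque

def link_connectivity(adj, s, t):
    """Minimum number of links whose removal leaves no path between s and t (s != t).
    `adj` must be symmetric: v in adj[u] whenever u in adj[v]."""
    cap = {(u, v): 1 for u in adj for v in adj[u]}   # residual capacity of each arc
    flow = 0
    while True:
        # breadth-first search for a path with spare capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # push one unit of flow back along the path that was found
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Hypothetical network: triangles a-b-c and d-e-f joined by the single link c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(link_connectivity(adj, "a", "b"))
print(link_connectivity(adj, "a", "f"))
```

Pairs inside a triangle have connectivity 2, while every pair that spans the link c-d has connectivity 1, so each triangle satisfies the lambda set condition in this example.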

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Post-process
Select Clustering Method: Pick one of ‘Single’, ‘Complete’, ‘Average’ or ‘Ward’.

- Single: the distance between two lambda sets is determined by the

distance of the two closest nodes (nearest neighbors) in the different

lambda sets.

- Average: the distance between two lambda sets is calculated as the average distance between all

pairs of nodes in the two different lambda sets.

- Complete: the distances between lambda sets are determined by the greatest distance between any

two nodes in the different lambda sets (i.e., by the "furthest neighbors").

- Ward: Lambda set membership is assessed by calculating the total sum of squared deviations from

the mean of a lambda set. The criterion for fusion is that it should produce the smallest possible

increase in the error sum of squares.
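The first three linkage rules can be written down in a few lines of Python. This sketch assumes that a node-to-node distance table is already available; how the module converts link connectivity into a distance is not stated here, so the table, the function name and the example values are all hypothetical. Ward's criterion is left out because it works on sums of squared deviations rather than on pairwise distances alone.

```python
def linkage_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters from the pairwise node distances dist[u][v]."""
    pairs = [dist[u][v] for u in cluster_a for v in cluster_b]
    if method == "single":      # nearest neighbours
        return min(pairs)
    if method == "complete":    # furthest neighbours
        return max(pairs)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError("unsupported method: " + method)

# Hypothetical distances between the nodes of clusters {a, b} and {c, d}
dist = {"a": {"c": 1, "d": 4}, "b": {"c": 2, "d": 3}}
for method in ("single", "complete", "average"):
    print(method, linkage_distance(dist, ["a", "b"], ["c", "d"], method))
```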


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Lambda Set’ analysis,

Main Report, Lambda Set Affiliation Matrix, Permutation Vector and

Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Cluster Diagram: The columns show (the indices of) the nodes, and the rows show the level of association (similarity or dissimilarity) among nodes within lambda sets. Within a given level, an 'X' between two adjacent columns indicates that the nodes associated with those columns were assigned to the same lambda set.

 Tables
Lambda Set Affiliation Matrix
A 2-mode matrix whose main nodes are those of the main nodeset and whose sub nodes are lambda sets.

- # clusters: The number of clusters at each step.

- Fusion level: The minimum distance between two clusters at each step. The two clusters with the minimum distance are merged in that step.

- Best-cut: A larger value means that the nodes are clustered better. Best-cut scores normally fall into 4 levels:

1. bad if score < 1.25

2. normal if 1.25 ≤ score < 2.75

3. good if 2.75 ≤ score < 3.5

4. excellent if 3.5 ≤ score
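The four best-cut levels listed above amount to a simple threshold lookup, sketched here in Python. The function name is hypothetical; the thresholds are the ones given in the list.

```python
def best_cut_level(score):
    """Map a best-cut score to the level names used in the list above."""
    if score < 1.25:
        return "bad"
    if score < 2.75:
        return "normal"
    if score < 3.5:
        return "good"
    return "excellent"

for score in (1.0, 1.25, 3.0, 3.5):
    print(score, best_cut_level(score))
```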

Permutation Vector
The permutation vector is based on link connectivity. It corresponds to the order of the nodes in the dendrogram.

 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

lambda sets.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module explores the Lambda Sets at the selected fusion level. Also it calculates link

connectivity between two selected nodes at the selected fusion level.

 Lambda Set
Select Level
Select a fusion level. You can select fusion level in consideration

of the Best cut or the number of clusters.

Select Cluster
The available cluster list in the selection box is determined by the

selection of fusion level in the Select Level area.

After a cluster is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected cluster: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected cluster: Node >> Subset Membership >> Subset Non-member Node(s)

Checking Show Border shows the boundary of each cluster. The selections are applied to the map when you click the Submit button.

<Example Screen shot>


 Link Connectivity
Two Nodes Selection
You can search for a node by typing part of its Node Label in the input area next to “Node Label”, and then choose a node from the search results listed below the text box.

After a source node and a target node are selected, the style of the two matching nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Source Node: Node >> Focus Pair >> 1st Node

- Target Node: Node >> Focus Pair >> 2nd Node

The Link Connectivity Value between the two selected nodes is shown in the text box. The selections are applied to the map when you click the Submit button.

<Example Screen shot>


 Time Complexity
 O(2^n)

 Reference
 Borgatti, S.P., M.G. Everett and Shirey, P.R. (1990). "LS sets, lambda sets, and other cohesive
subsets" Social Networks 12: 337-358.

 Related Topics
 Analyze >> Connection >> Link Connectivity

 Statistics >> Cluster


Analyze >> Cohesion >> Community (Betweenness)

 Menu
Analyze >> Cohesion >> Community >> Betweenness

 Description
The community algorithm based on link betweenness (known as the GN algorithm) was proposed by Girvan and Newman. It computes the betweenness centrality of all links in the network, finds the link(s) with the maximum betweenness value and removes them, and then recalculates the betweenness of all remaining links. Repeating this procedure until no links remain gives an ordering of the links. As more links are removed, more components appear, and these components are the communities at each level. The result is the hierarchical structure discovered in this process. As with link betweenness centrality, you should dichotomize your data before running the module.
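The procedure described above can be sketched in Python for a small undirected, unweighted network. This is an illustrative sketch rather than NetMiner's code: the function names and the example network are hypothetical, link betweenness is computed with a breadth-first accumulation in the style of Brandes, all links tied for the maximum are removed together, and self-loops are assumed to be absent.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every link."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {u: 0 for u in adj}          # number of shortest paths from s
        sigma[s] = 1
        preds = {u: [] for u in adj}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):            # accumulate from the farthest nodes back
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    return {edge: value / 2 for edge, value in score.items()}  # each pair was counted twice

def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for s in adj:
        if s not in seen:
            seen.add(s)
            part, stack = set(), [s]
            while stack:
                u = stack.pop()
                part.add(u)
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            parts.append(part)
    return parts

def girvan_newman_levels(adj):
    """Community partitions, one per level at which the number of components grows."""
    adj = {u: set(neighbours) for u, neighbours in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        scores = edge_betweenness(adj)
        top = max(scores.values())
        for edge, value in scores.items():
            if abs(value - top) < 1e-9:      # remove every link tied for the maximum
                u, v = tuple(edge)
                adj[u].discard(v)
                adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels

# Hypothetical network: triangles a-b-c and d-e-f joined by the single link c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
for level in girvan_newman_levels(adj):
    print(sorted(sorted(part) for part in level))
```

In this example the link c-d carries all nine shortest paths between the two triangles, so it is removed first and the network splits into the two triangles before breaking down into single nodes.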

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Community

(Betweenness)’ analysis, Main Report, Community Cluster matrix,

Permutation Vector, Dendrogram and Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Cluster Diagram: The columns show (the indices of) the nodes, and the rows show the level of association (similarity or dissimilarity) among nodes within communities. Within a given level, an 'X' between two adjacent columns indicates that the nodes associated with those columns were assigned to the same community.


 Tables

Community Cluster Matrix

- # of Clusters: The number of clusters at the given level.

- Fusion Level: The minimum distance between two clusters at each step. The two clusters with the minimum distance are merged in that step.

- Best Cut: A larger value means that the nodes are clustered better. Best-cut scores normally fall into 4 levels:

1. bad if score < 1.25

2. normal if 1.25 ≤ score < 2.75

3. good if 2.75 ≤ score < 3.5

4. excellent if 3.5 ≤ score

Permutation Vector
This vector corresponds to the order of the nodes in the dendrogram, i.e. the serial order produced by the dendrogram.

 Charts
Dendrogram


 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

communities.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module explores the communities at the selected fusion level.

 Community
Select Level
Select a fusion level. You can select fusion level in consideration of

the Best Cut or the number of clusters.

Select Cluster
The available cluster list in the selection box is determined by the

selection of fusion level in the Select Level area.

After a cluster is selected in the combo box, the style of the nodes on the map changes to the style pre-established in the global options. The corresponding global options are as follows.

- Nodes of selected cluster: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected cluster: Node >> Subset Membership >> Subset Non-member Node(s)

Checking Show Border shows the boundary of each cluster. The selections are applied to the map when you click the Submit button.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Michelle Girvan and M. E. J. Newman,(2002), "Community structure in social and biological
networks".

 Related Topics
 Analyze >> Connection >> Dependency

 Analyze >> Centrality >> Betweenness Centrality


Analyze >> Cohesion >> Community (Modularity)

 Menu
Analyze >> Cohesion >> Community >> Modularity

 Description
This module supports the widely used CNM algorithm introduced by Clauset, Newman and Moore, which maximizes modularity (a measure also proposed by Newman) with a greedy approach. Its variants HE, HE' and NE (proposed by K. Wakita and T. Tsurumi) are also supported; they run considerably faster with slightly coarser results.
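The greedy idea can be illustrated with a deliberately naive Python sketch: start with every node in its own community and repeatedly merge the pair of linked communities that yields the highest modularity, remembering the best partition seen. This sketch recomputes modularity from scratch at every step, so it does not reproduce the efficient data structures of CNM or the HE, HE' and NE heuristics. The function names and the example network are hypothetical, and the network is assumed to be undirected and unweighted with at least one link.

```python
def modularity(adj, communities):
    """Newman's modularity Q of a partition of an undirected, unweighted network."""
    m = sum(len(neighbours) for neighbours in adj.values()) / 2   # number of links
    q = 0.0
    for community in communities:
        internal = sum(1 for u in community for v in adj[u] if v in community) / 2
        degree = sum(len(adj[u]) for u in community)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def greedy_modularity(adj):
    """Naive agglomeration: returns (best modularity, best partition)."""
    communities = [frozenset([u]) for u in adj]
    best_q, best = modularity(adj, communities), list(communities)
    while len(communities) > 1:
        candidates = []
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                a, b = communities[i], communities[j]
                if any(v in b for u in a for v in adj[u]):   # only linked pairs
                    merged = [c for c in communities if c not in (a, b)] + [a | b]
                    candidates.append((modularity(adj, merged), merged))
        if not candidates:
            break
        q, communities = max(candidates, key=lambda item: item[0])
        if q > best_q:
            best_q, best = q, list(communities)
    return best_q, [sorted(c) for c in best]

# Hypothetical network: triangles a-b-c and d-e-f joined by the single link c-d
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
best_q, partition = greedy_modularity(adj)
print(round(best_q, 4), sorted(partition))
```

For this network the best partition found consists of the two triangles, with modularity 2 * (3/7 - (7/14)^2) = 5/14.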

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.


 Main process
Algorithms: CNM is the most popular algorithm and gives stable results, but it is the slowest of the four. HE' is reported to be more likely to give a better result in less time, but this is not always the case. HE and NE are far faster, with coarser results. You can compare the reliability of the algorithms by comparing the modularity values they actually reach.

Include Nonoptimal Output: By default, the algorithm stops once it finds the community partition with the best modularity. If you also want to see non-optimal results, turn on this checkbox.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Community

(Modularity)’ analysis, Main Report, Community Partition and

Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Best Modularity: Since this module targets the analysis of large networks, time-consuming measures are avoided. The report gives only the best modularity reached, which indicates how successful the partitioning was. The maximum value of modularity is 1.

263
NetMiner Module Reference

 Tables
Community Partition
- # of Communities: The number of communities at the given step.

- Step #: The number of steps the algorithm ran to reach the result.

- Modularity: The measure suggested by Newman, which indicates how successful the community partition is.

- Community Partition: The partition of the nodes at the given step.

 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

community.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module explores the communities at the selected modularity.

 Community
Select Level
Select a level of modularity. You can select this level in

consideration of the Bestcut or the number of clusters.

Select Community
The available cluster list in the selection box is determined by the

selection of modularity in the Select Level area.

After a Cluster is selected in the combo box, the style of the nodes on the map changes to the style pre-set in the global options. The corresponding global options are as follows.

Nodes of selected cluster: Node >> Subset Membership >> Subset Member Node(s)

Nodes of non-selected cluster: Node >> Subset Membership >> Subset Non-member Node(s)

Checking Show Border shows the boundary of each Cluster. The selections are applied to the map when you click the Submit button.

<Example Screen shot>


 Time Complexity
 It relies on a highly heuristic approach, so its exact complexity is not known, but it is certainly faster than the pure CNM algorithm, whose complexity is O(m * n log n).

 Reference
 Ken Wakita and Toshiyuki Tsurumi, "Finding Community Structure in Megascale Social Networks".

 Related Topics
 Analyze >> Cohesion >> Betweenness Community

Analyze >> Cohesion >> Community (Eigenvector)

 Menu
Analyze >> Cohesion >> Community >> Eigenvector

 Description
This module implements a community detection algorithm that successively splits the input network using the leading eigenvector of the modularity matrix. The modularity matrix equals the input matrix minus the matrix of expected values when the degree of each node is held fixed.
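A single split step can be sketched as below. This is only an illustration under stated assumptions, not NetMiner's code: the network is undirected and unweighted, the expected value for a node pair is taken as k_i * k_j / (2m) (the usual degree-constrained null model), and the helper names `modularity_matrix`, `leading_eigenvector` and `split_once` are hypothetical. The eigenvector is found by plain power iteration on a shifted matrix, which is a simplification chosen to keep the sketch dependency-free.

```python
def modularity_matrix(adj):
    """B[i][j] = A[i][j] - k_i * k_j / (2m) for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # every link is counted twice in a symmetric matrix
    return [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
            for i in range(n)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration on (matrix + shift * I).

    The shift (largest absolute row sum) makes all eigenvalues non-negative,
    so the iteration converges to the most positive eigenvalue of `matrix`.
    """
    n = len(matrix)
    shift = max(sum(abs(x) for x in row) for row in matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-symmetric start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


def split_once(adj):
    """Assign each node to one of two groups by the sign of its eigenvector entry."""
    vec = leading_eigenvector(modularity_matrix(adj))
    return [1 if x > 0 else 0 for x in vec]


# Hypothetical sample: two triangles (0-1-2 and 3-4-5) joined by the link 2-3.
links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in links:
    adjacency[u][v] = adjacency[v][u] = 1

sides = split_once(adjacency)
```

The full algorithm would repeat this step on each resulting group and keep a split only when it increases modularity; that bookkeeping is omitted here.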

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data, so the dichotomize option is mandatory.

 Main process
Analysis Option: Select Directional or Undirectional. If you select Undirectional, you should symmetrize your data before running the module.

Include Nonoptimal Output: By default, the algorithm stops once it finds the community partition with the best modularity. If you also want to see nonoptimal results, check this box to include them.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of

‘Community(Eigenvector)’ analysis, Main Report, Community

Partition Table and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Best Modularity: Since this module targets the analysis of large networks, time-consuming measures are avoided. Only the best modularity reached is reported, and this value indicates how successful the operation was. The maximum value of modularity is 1.

 Tables
Community Partition
- # of Communities: The number of communities at the given step.

- Step #: The number of steps the algorithm ran to reach the result.

- Modularity: The measure suggested by Newman, which indicates how successful the community partition is.

- Community Partition: The partition of the nodes at the given step.

 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered-Eades algorithm. Nodes are clustered

by communities.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module explores the communities at the selected modularity.

 Community
Select Level
Select a level of modularity. You can select level in consideration of

the Best-cut or the number of clusters.

Select Community
The available cluster list in the selection box is determined by the

selection of modularity in the Select Level area.

After a Cluster is selected in the combo box, the style of the nodes on the map changes to the style pre-set in the global options. The corresponding global options are as follows.

- Nodes of selected cluster: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected cluster: Node >> Subset Membership >> Subset Non-member Node(s)

Checking Show Border shows the boundary of each Cluster. The selections are applied to the map when you click the Submit button.

<Example Screen shot>

 Time Complexity
 O((m+n)nlogn)

 Reference
 M.E.J. Newman "Finding community structure in networks using the eigenvectors of matrices"

 M.E.J. Newman "Modularity and community structure in networks"

 E. A. Leicht and M. E. J. Newman, 2008. Community Structure in Directed Networks. Phys. Rev. Lett. 100, 118703.

 Related Topics
 Analyze >> Cohesion >> Betweenness Community

 Analyze >> Cohesion >> Community (Modularity)

Analyze >> Cohesion >> Community (Label Propagation)

 Menu
Analyze >> Cohesion >> Community >> Label Propagation

 Description
This community detection algorithm, proposed by Usha Nandini Raghavan, Reka Albert, and Soundar Kumara, uses ‘label propagation’ to detect the communities in a network. It begins by assigning a unique label to every node. Each node then looks for the label that occurs most frequently among its adjacent nodes, and that label becomes the node's new label.

After this process has run once, the labels of some nodes may no longer match the most frequent label among their adjacent nodes. This happens because, after a node takes a new label (the most frequent one at that time), some of its adjacent nodes may change to other labels, so that the newly taken label is no longer the majority at the end of the pass. The process is therefore iterated until no such discrepancy remains for any node.
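The iteration described above can be sketched as follows. This is a simplified, deterministic illustration rather than the module's actual code: nodes are visited in sorted order and ties are broken by taking the largest label, whereas the published algorithm visits nodes in random order and breaks ties at random. The function name `label_propagation` and its return values are hypothetical; the last two values mirror the "# of Iteration executed" and "Is the Stop Criterion Satisfied" items of the Main Report.

```python
from collections import Counter


def label_propagation(neighbors, max_iterations=100):
    """Return (labels, iterations_executed, stop_criterion_satisfied).

    neighbors: dict mapping each node to a list of its adjacent nodes.
    """
    labels = {node: node for node in neighbors}  # unique label per node
    for iteration in range(1, max_iterations + 1):
        changed = False
        for node in sorted(neighbors):
            if not neighbors[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in neighbors[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # the node already holds a most frequent label
            labels[node] = max(candidates)  # assumption: deterministic tie-break
            changed = True
        if not changed:
            return labels, iteration, True
    return labels, max_iterations, False


# Hypothetical sample: two triangles (0-1-2 and 3-4-5) joined by the link 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

labels, executed, satisfied = label_propagation(graph)
_, capped_runs, capped_ok = label_propagation(graph, max_iterations=1)
```

With `max_iterations=1` the run ends before a full pass without changes has been observed, so the stop criterion is reported as not satisfied even though the labels may already be stable.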

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data, so the dichotomize option is mandatory.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Main process
# of Iterations: Sets the maximum number of iterations. The iteration stops when this maximum is reached, even if the stop criterion has not been satisfied. Whether the analysis terminated because the maximum number of iterations was reached can be checked in the “Output Summary” of the Main Report output.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Community (Label

Propagation)’ analysis, Main Report, Community Partition and

Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Modularity: This modularity is calculated on the network partitioned into the resulting communities. The value indicates how successful the operation was. The maximum value of modularity is 1.

- # of Iteration executed: The number of iterations the module actually performed.

- Is the Stop Criterion Satisfied: As described above, the algorithm terminates in one of two ways: either the stop criterion is satisfied, or the maximum # of iterations is reached without the stop criterion being satisfied. In the former case this value is “true”; in the latter, “false”.

 Tables
Community Partition
- # of Communities: The number of communities at the given step.

- Step #: The number of steps the algorithm ran to reach the result.

- Modularity: The measure suggested by Newman, which indicates how successful the community partition is.

- Community Partition: The partition of each node at the given step.

 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered Eades algorithm. Nodes are clustered by

communities.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Time Complexity
 O(m)

 Reference
 U. N. Raghavan, R. Albert, and S. Kumara “Near linear time algorithm to detect community
structures in large-scale networks”

 Related Topics
 Analyze >> Cohesion >> Community


Analyze >> Cohesion >> Community (Blondel)

 Menu
Analyze >> Cohesion >> Community >> Blondel

 Description
This community detection algorithm, proposed by Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte and Etienne Lefebvre, is based on two phases that are repeated iteratively to detect the communities in a network. The first phase is performed iteratively until a local maximum of modularity is reached. In the second phase, once a local maximum has been attained, a new network is built whose nodes are the communities. The algorithm finds high-modularity partitions of large networks in a short time and unfolds a complete hierarchical community structure for the network, thereby giving access to different resolutions of community detection.
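The second phase, building a new network whose nodes are the communities, can be sketched as below. This is a minimal illustration under assumptions, not the module's implementation: links are stored as a dictionary of undirected weighted pairs, links inside a community become a self-loop whose weight is the sum of the internal link weights (conventions for self-loop weights differ between implementations), and the function name `aggregate` is hypothetical.

```python
def aggregate(edges, community):
    """Collapse each community into a single node.

    edges: dict mapping (u, v) -> weight for an undirected network.
    community: dict mapping node -> community id (ids must be sortable).
    Returns a dict mapping (community_a, community_b) -> summed weight.
    """
    merged = {}
    for (u, v), weight in edges.items():
        cu, cv = community[u], community[v]
        key = (min(cu, cv), max(cu, cv))  # undirected: order does not matter
        merged[key] = merged.get(key, 0) + weight
    return merged


# Hypothetical sample: two triangles (0-1-2 and 3-4-5) joined by the link 2-3,
# with the first phase assumed to have found the two triangles.
unit_edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
              (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
found = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

coarse = aggregate(unit_edges, found)
```

Running the first phase again on the coarse network, and aggregating again, produces the successive levels of the hierarchy.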

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data, so the dichotomize option is mandatory.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Main process
First phase Iteration: To decrease the overall running time of the method, you can set the number of iterations at which the first phase stops, rather than letting it run until modularity no longer improves.

Include Non-optimal Output: By default, the algorithm shows only the community partition with the best modularity. If you also want to see non-optimal results, check this box to include them.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Community (Blondel)’

analysis, Main Report, Community Partition and Clustered Map are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Modularity: This modularity is calculated on the network partitioned into the resulting communities. The value indicates how successful the operation was. The maximum value of modularity is 1.

 Tables
Community Partition
- # of Clusters: The number of clusters at the given level.

- Fusion Level: The depth in the clustering hierarchy produced by this module.

- Modularity: A larger value means the nodes are clustered better.

 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered Eades algorithm. Nodes are clustered by

communities.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
The selected cluster will be represented on the network map as follows.


 Time Complexity
 It relies on a highly heuristic approach, so its exact complexity is not known, but it is certainly faster than the pure CNM algorithm, whose complexity is O(m * n log n).

 Reference
 Fast unfolding of communities in large networks, Vincent D. Blondel et al. J. Stat. Mech.
P10008(2008)

 Related Topics
 Analyze >> Cohesion >> Community


Analyze >> Cohesion >> Cohesive Block

 Menu
Analyze >> Cohesion >> Cohesive Block

 Description
This module analyzes hierarchical (nested) cohesive subgroups obtained by recursively removing ‘node cutsets’. Cohesive Block requires the input network to consist of a single component.

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Cohesive Block’ analysis, Main Report, Cohesive Block Member and Spring Map

are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Members of vertex cut are shown.

 Tables
Cohesive Block Member: A 2-mode matrix in which the main nodes are retained and the sub nodes are the cohesive blocks.

 Maps
Spring Map
- Default Layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

communities.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect

 Cohesive Block
It shows the hierarchical cohesive subgroups obtained by removing cutsets recursively. After a Cohesive Block (Component) is selected, the cutsets of that block are shown in the ‘Cutsets’ Inspect item.

 Cutsets
The style of nodes is changed by setting of ‘Edit >> Preference’.

Cutset member: Node tab >> Focus Node >> Related Nodes

Component member: Node tab >> Focus Node >> Focal Node


<Example Screen shot>

 Time Complexity
 Exponential (n)

 Reference
 James Moody, Douglas R. White. 5/9/2001. Structural Cohesion and Embeddedness: A
hierarchical conception of social groups.

 Related Topics
 Analyze >> Connection >> Node Connectivity

 Analyze >> Connection >> Minimum Cutset


Analyze >> Cohesion >> s-Clique

 Menu
Analyze >> Cohesion >> s-Clique

 Description
A group is an s-Clique if it has a locally maximal SMI (Segregation Matrix Index). A group G has a locally maximal SMI when no group with one more or one fewer element than G has a higher SMI value and, in addition, no such group has the same SMI value as G. For more information regarding SMI, please refer to the part about Analyze >> Properties >> Group.

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data.

 Main process
Maximum Size of s-Clique: Only s-Cliques whose number of vertices is equal to or less than the ‘maximum size of s-clique’ are reported.

Minimum size of s-Clique: Only s-Cliques with more vertices than the ‘minimum size of s-clique’ are reported.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed. In the result of ‘s-Clique’ analysis, Main

Report, s-Clique Affiliation Matrix, s-Clique Comembership Matrix,

s-Clique Bipartite Matrix, s-Clique Overlap Matrix and Clustered Map

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of s-Cliques: The number of s-Cliques that exist in the network.

- Members of s-Cliques: It shows names of nodes affiliated with each s-Clique.

- Subgroup Details: Shows the size, density and cohesion index of each s-Clique. The size of an s-Clique is the number of nodes affiliated with it. Density is computed as (the number of links present / the maximum possible number of links). Cohesion index is defined only for undirected graphs; it is computed as [the number of internal ties (s-Clique a -> s-Clique a) / the number of external ties (s-Clique a -> external nodes)].
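The three subgroup measures can be sketched for an undirected, unweighted network as follows. This is an illustration of the definitions given above, not the module's code; the function name `subgroup_details` is hypothetical, and returning infinity when a group has no external ties is an assumption made for this sketch.

```python
def subgroup_details(edges, members):
    """Return (size, density, cohesion_index) of a group of nodes.

    edges: list of undirected (u, v) links; members: iterable of node ids.
    """
    members = set(members)
    size = len(members)
    # Internal ties have both ends in the group, external ties exactly one.
    internal = sum(1 for u, v in edges if u in members and v in members)
    external = sum(1 for u, v in edges if (u in members) != (v in members))
    possible = size * (size - 1) // 2  # maximum number of undirected links
    density = internal / possible if possible else 0.0
    cohesion = internal / external if external else float("inf")
    return size, density, cohesion


# Hypothetical sample: two triangles (0-1-2 and 3-4-5) joined by the link 2-3.
ties = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
details = subgroup_details(ties, [0, 1, 2])
```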


 Tables

s-Clique Affiliation matrix: A 2-mode matrix in which the main nodes are retained and the sub nodes are the s-cliques.

s-Clique Co-membership matrix: A (Main Node by Main Node) 1-mode matrix in which each cell is the number of s-cliques to which both nodes belong.

s-Clique Bipartite matrix: A (Main Node + s-clique) by (Main Node + s-clique) matrix. It contains the s-clique affiliation matrix as a sub block.

s-Clique Overlap matrix: An (s-clique by s-clique) 1-mode matrix in which each cell is the number of nodes shared by a pair of s-cliques.
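The co-membership and overlap matrices can both be derived from the affiliation matrix, as the following sketch shows. It assumes a 0/1 affiliation matrix with one row per node and one column per s-clique; the function names and the sample matrix are hypothetical, and the diagonal values (a node's number of s-cliques, an s-clique's size) follow from the matrix products and may be reported differently by the module.

```python
def comembership(affiliation):
    """Node-by-node matrix: number of groups shared by each pair of nodes."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in affiliation]
            for row_i in affiliation]


def overlap(affiliation):
    """Group-by-group matrix: number of nodes shared by each pair of groups."""
    columns = list(zip(*affiliation))
    return [[sum(a * b for a, b in zip(col_i, col_j)) for col_j in columns]
            for col_i in columns]


# Hypothetical sample: 4 nodes and 2 s-cliques; nodes 1 and 2 belong to both.
membership = [[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]]

shared_groups = comembership(membership)
shared_nodes = overlap(membership)
```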


 Maps
Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

s-Cliques.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can see the s-Cliques which are found by analysis on the map.

 Select s-Clique
After an s-Clique is selected in the combo box, the style of the nodes on the map changes to the style pre-set in the global options. The corresponding global options are as follows.

- Nodes of selected s-Clique: Node >> Subset Membership >> Subset Member Node(s)

- Nodes of non-selected s-Clique: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>

 Time Complexity
 O(nCm) (exponential), n = # nodes in network, m = maximum size of S-Clique

 Reference
 M. Fershtman, 1997. Cohesive group detection in a social network by the segregation matrix index, Social Networks 19, 193-207.

 Related Topics
 Analyze >> Properties >> Group


Analyze >> Centrality >> Degree

 Menu
Analyze >> Centrality >> Degree

 Description
This module measures the centrality of a network structure based on degree (the number of connections). Degree centrality is simply the proportion of nodes that are adjacent to each node.

$$\text{Degree centrality of node} = \frac{\sum [\text{weight of incident links}]}{\#\,\text{nodes} - 1}$$

In a directed network, in-degree centrality is the proportion of nodes that are adjacent to each node, and out-degree centrality is the proportion of nodes that are adjacent from each node. If the 1-mode Network is weighted, weighted degree centrality is computed.

Degree Centralization index is a measure of the variability of individual centrality scores. The larger the degree centralization index, the more centralized the network is. It is computed by the following equation.

$$\text{Degree centralization index} = \frac{\sum_{\text{every node}} (\text{max. centrality} - \text{node's centrality})}{\#\,\text{nodes} - 2}$$

Note that in a weighted 1-mode Network, the degree centralization index may be larger than 1.
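The two formulas can be checked with a short sketch. It assumes an undirected network given as weighted links (weight 1 for an unweighted network) and does not cover the separate in-degree and out-degree cases; the function names and the star-shaped sample are hypothetical.

```python
def degree_centrality(n, edges):
    """Sum of incident link weights divided by (# nodes - 1), per node.

    n: number of nodes (ids 0..n-1); edges: list of (u, v, weight).
    """
    degree = [0.0] * n
    for u, v, weight in edges:
        degree[u] += weight
        degree[v] += weight
    return [d / (n - 1) for d in degree]


def degree_centralization(scores):
    """Sum of (max centrality - node's centrality) divided by (# nodes - 2)."""
    top = max(scores)
    return sum(top - s for s in scores) / (len(scores) - 2)


# Hypothetical sample: a star with centre 0 and four leaves.
star = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
star_scores = degree_centrality(5, star)
star_index = degree_centralization(star_scores)
```

In the star, the centre is adjacent to every other node and the index reaches its unweighted maximum; with link weights above 1 the same code can return centrality values, and hence an index, larger than 1.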

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Main Process
- # of links: The degree of each node is the number of links which are

incident from the node.

- Sum of weight: The degree of each node is weight sum of links which

are incident from the node.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Degree’ analysis,

Main Report, Degree Centrality Vector, Spring Map and Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Degree Centrality Scores: Sum, mean, standard deviation, Minimum, Maximum of

Degree Centrality are reported for each in-degree and out-degree.

- Network Degree Centralization Index: Degree Centralization Index is reported for each In-Degree

and Out-Degree.


 Tables
Degree Centrality Table
This vector shows the degree centrality value for each

node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Nodes with higher centrality scores are drawn larger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher a node's centrality score, the closer to the center it is placed.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
With this item, you can show the Degree Centrality of each node on the map.

 Choose Direction (Spring Map, Concentric Map)

The network map is redrawn according to the selected direction of Degree Centrality.

<Example Screen shot>


 Time Complexity
 O(m)

 Reference
 Freeman L C (1979). "Centrality in Social Networks: Conceptual clarification", Social
Networks 1, 215-239.

 Related Topics
 Analyze >> Neighbor >> Degree


Analyze >> Centrality >> Coreness

 Menu
Analyze >> Centrality >> Coreness

 Description
Coreness Centrality gives the maximal core number of each node, that is, the largest value of k for which the node belongs to a k-core.

An unweighted and undirected network is needed to compute Coreness Centrality, so you should dichotomize and symmetrize your data before running the module.
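Core numbers can be obtained by repeatedly peeling off the node with the smallest remaining degree, as sketched below. This is a simple quadratic-time illustration for an undirected, unweighted network, not the O(m) bucket-based algorithm cited in the Reference; the function name `core_numbers` and the sample network are hypothetical.

```python
def core_numbers(neighbors):
    """Return a dict mapping each node to its maximal core number.

    neighbors: dict mapping each node to a list of its adjacent nodes.
    """
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    remaining = set(neighbors)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for nb in neighbors[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core


# Hypothetical sample: a triangle 0-1-2 with a pendant node 3 attached to 2.
shape = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
coreness = core_numbers(shape)
```

The pendant node belongs only to the 1-core, while the triangle nodes also belong to the 2-core.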

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. Symmetrizing your data also makes the algorithm run faster.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Coreness Centrality’

analysis, Main Report, Coreness Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Coreness Centrality Vector
This vector shows the coreness centrality value for each node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Nodes with higher centrality scores are drawn larger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher a node's centrality score, the closer to the center it is placed.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
Coreness Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(m)

 Reference
 Vladimir Batagelj, Matjaz Zaversnik, An O(m) algorithm for Cores Decomposition of Networks,
2002

 Related Topics
 Analyze >> Cohesion >> k-Core


Analyze >> Centrality >> Closeness

 Menu
Analyze >> Centrality >> Closeness

 Description
This module analyzes the centrality of a network structure based on the geodesic distances among the nodes. Closeness centrality is measured as the inverse of the sum of distances from a node to all the other nodes, normalized by multiplying it by (n-1).

For a directed network, in-closeness centrality and out-closeness centrality are measured separately, with the distances computed along in-paths and out-paths, respectively.

For a disconnected network, closeness centrality cannot be defined. In this case you have the option (which is the default) to ignore unreachable nodes, i.e. to confine the computation to the reachable domain of each node. If you decide not to ignore unreachable pairs of nodes, their distance is assumed to be the maximum geodesic distance between reachable pairs plus 1.

Closeness Centralization Index is a measure of the variability of individual closeness centrality scores. Its value ranges from 0 to 1. If every node has the same closeness centrality value, the closeness centralization index is 0. If one node has an especially large centrality value, as in a star graph, the closeness centralization index is 1.
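The definition can be sketched for an undirected, unweighted network as below. This is an illustration only: when unreachable nodes are ignored, the sketch normalizes by the number of reachable nodes instead of (n-1), which is an assumption about what confining the computation to the reachable domain means; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Geodesic distances from source to every reachable node (including itself)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(neighbors, ignore_unreachable=True):
    """Normalized closeness centrality per node."""
    all_dist = {v: bfs_distances(neighbors, v) for v in neighbors}
    longest = max(max(d.values()) for d in all_dist.values())
    n = len(neighbors)
    scores = {}
    for node, dist in all_dist.items():
        total = sum(dist.values())
        reached = len(dist) - 1  # the node itself is not counted
        if not ignore_unreachable:
            # Unreachable pairs get the maximum geodesic distance plus 1.
            total += (n - 1 - reached) * (longest + 1)
            reached = n - 1
        scores[node] = reached / total if total else 0.0
    return scores


# Hypothetical samples: a 5-node star, and a link 0-1 plus an isolated node 2.
star_net = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
split_net = {0: [1], 1: [0], 2: []}

star_close = closeness(star_net)
filled = closeness(split_net, ignore_unreachable=False)
ignored = closeness(split_net, ignore_unreachable=True)
```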

 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

 Main process
Ignore Unreachable: If you check the ‘Ignore Unreachable’ check box, unreachable nodes are ignored. In this case, closeness centrality is defined only for the biggest component.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Closeness Centrality’

analysis, Main Report, Closeness Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Closeness Centrality Scores: Sum, mean, standard deviation, Minimum, Maximum

of Closeness Centrality are reported for each in-closeness and out-closeness.

- Network Closeness Centralization Index: Closeness Centralization Index is reported for each of In-Closeness and Out-Closeness.

 Tables
Closeness Centrality Vector
This vector shows the closeness centrality value for each

node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Nodes with higher centrality scores are drawn larger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher a node's centrality score, the closer to the center it is placed.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can see the closeness centrality value of each node on the map.

 Choose Direction (Spring Map, Concentric Map)


The network map is redrawn by the selected direction of Closeness Centrality.


<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Freeman L C (1979). "Centrality in Social Networks: Conceptual clarification", Social
Networks 1, 215-239.

 Related Topics
 Analyze >> Connection >> Shortest Path


Analyze >> Centrality >> Decay

 Menu
Analyze >> Centrality >> Decay

 Description
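This module analyzes the centrality of a network structure based on geodesic distances among the nodes. Decay centrality of the $i$-th node is defined as $\sum_{j \in N,\, j \neq i} \delta^{d(i,j)}$, where $N$ means the set of all nodes, $\delta$ is the base (a value between 0 and 1) and $d(i,j)$ means the geodesic distance from the $i$-th node to the $j$-th node. If the $j$-th node is disconnected from the $i$-th node, $d(i,j) = \infty$, hence $\delta^{d(i,j)} = 0$.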

For a directed network, in-decay centrality and out-decay centrality are measured separately since the

distances are computed by in-path and out-path respectively.

Decay Centralization Index is a measure of variability of individual decay centrality scores and can
have a value between 0 and 1. If every node has the same decay centrality value, the decay

centralization index is 0. If the centrality value of one node is significantly larger than that of other

nodes as in, for example, a star graph, the decay centralization index is 1.
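Decay centrality, as defined in Jackson's book cited in the References, sums the base raised to the geodesic distance over all other nodes. The sketch below assumes an undirected, unweighted (dichotomized) network, so distances are link counts found by breadth-first search; unreachable nodes simply contribute nothing. The function name `decay_centrality` and the sample network are hypothetical.

```python
from collections import deque


def decay_centrality(neighbors, delta):
    """Sum of delta ** distance over all nodes reachable from each node.

    neighbors: dict mapping each node to a list of adjacent nodes.
    delta: the base, a value between 0 and 1.
    """
    scores = {}
    for source in neighbors:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        # Unreachable nodes never enter `dist`, so they contribute 0.
        scores[source] = sum(delta ** d for node, d in dist.items()
                             if node != source)
    return scores


# Hypothetical sample: a path 0-1-2 plus an isolated node 3.
path_net = {0: [1], 1: [0, 2], 2: [1], 3: []}
half = decay_centrality(path_net, 0.5)
```

With a base close to 1 every reachable node counts almost equally, so the score approaches the number of reachable nodes; with a base close to 0 only the nearest nodes matter.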

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only choose one 1-mode network.

 Link Merge: When the selected data contains multiple links, that is, when two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Dichotomize: If a user dichotomizes the weight, geodesic
distances are calculated based on the number of links

between two nodes. Otherwise, the weight of links is

regarded as the distance from a source node to a target node.

In this case, geodesic distances are calculated based on the sum of the links’ weights between two

nodes.

 Main process

Base (delta): Designates a value between 0 and 1.

 As δ approaches 1, the distance between nodes becomes unimportant and decay centrality depends on the number of nodes reachable from each node.

 As δ approaches 0, decay centrality gives more weight to closer nodes.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘Decay Centrality’ analysis, ‘Main

Report’, ‘Decay Centrality Vector’, ‘Spring Map’ and

‘Concentric Map’ are reported.

 Outputs
Each output is listed as an inner tab located at the bottom of the output window.

 Reports
Main Report


 Distribution of Decay Centrality Scores: The sum, mean, standard deviation, minimum and maximum of Decay Centrality are reported.

 Network Decay Centralization Index: Decay centralization index is reported for each of in-decay and out-decay.

 Tables
Decay Centrality Vector: This vector
contains the decay centrality value for

each node.

 Maps

Spring Map
 Default Layout: Kamada & Kawai algorithm (Spring >> Kamada & Kawai) is used to draw a map by default.

 Default Style: The default style is set according to ‘Common’ option in ‘Preference >> Node’ tab. The size of a node on the map is proportional to its centrality score (e.g. a node with the highest centrality score will be depicted as the biggest node on a map).


Concentric Map
 Default Layout: Concentric algorithm (Circular >> Concentric) is used to draw a map. The higher the centrality score of a node, the closer the node is to the center.

 Default Style: The default style is set according to ‘Common’ option in ‘Preference >> Node’ tab.

 Inspect
A user can see the decay centrality value of each node on a map.

Choose Direction (Spring Map, Concentric Map)


The network map is re-drawn by considering the selected direction of decay centrality.


 Time Complexity
 O(m * n * log(n))

 References
 Matthew O. Jackson, (2008), Social and Economic Networks, Princeton University Press, p. 39.

 Related Topics
 Analyze >> Connection >> Shortest Path

 Analyze >> Centrality >> Closeness


Analyze >> Centrality >> Percolation

 Menu
Analyze >> Centrality >> Percolation

 Description
The Percolation Centrality is defined for a given node, at a given time, as the proportion of ‘percolated paths’ that go through that node. A ‘percolated path’ is a shortest path between a pair of nodes, where the source node is percolated (e.g., infected). The target node can be percolated, non-percolated, or in a partially percolated state.

$$PC^t(v) = \frac{1}{N-2} \sum_{s \neq v \neq r} \frac{\sigma_{sr}(v)}{\sigma_{sr}} \cdot \frac{x^t_s}{\sum_i x^t_i - x^t_v}$$

where $N$ is the number of nodes, $\sigma_{sr}$ is the total number of shortest paths from node $s$ to node $r$ and $\sigma_{sr}(v)$ is the number of those paths that pass through $v$. The percolation state of node $i$ at time $t$ is denoted by $x^t_i$. Two special cases are $x^t_i = 0$, which indicates a non-percolated state at time $t$, and $x^t_i = 1$, which indicates a fully percolated state at time $t$. The values in between indicate partially percolated states (e.g., in a network of townships, this would be the percentage of people infected in that town).

The weights attached to the percolated paths depend on the percolation levels assigned to the source nodes, based on the premise that the higher the percolation level of a source node, the more important the paths that originate from that node. Nodes that lie on shortest paths originating from highly percolated nodes are therefore potentially more important to the percolation. The definition of PC may also be extended to include target node weights.
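The source-weighted counting described above can be sketched for a single time step as follows. This is an illustrative sketch only: it assumes an undirected, unweighted network with at least three nodes, stored as adjacency lists, and percolation states between 0 and 1. It follows the definition in Piraveenan (2013) as cited in this section, not NetMiner's internal code, and all names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def percolation_centrality(adj, state):
    """Percolation centrality at one time step; state[i] lies in [0, 1]."""
    n = len(adj)  # assumed to be at least 3
    info = {s: bfs_counts(adj, s) for s in adj}
    total = sum(state.values())
    pc = {}
    for v in adj:
        dist_v, sigma_v = info[v]
        weight_sum = total - state[v]
        acc = 0.0
        for s in adj:
            dist_s, sigma_s = info[s]
            if s == v or v not in dist_s:
                continue
            for r in adj:
                if r in (s, v) or r not in dist_s or r not in dist_v:
                    continue
                # v lies on a shortest s-r path only if the distances add up.
                if dist_s[v] + dist_v[r] == dist_s[r]:
                    acc += sigma_s[v] * sigma_v[r] / sigma_s[r] * state[s]
        pc[v] = acc / weight_sum / (n - 2) if weight_sum > 0 else 0.0
    return pc

# Hypothetical path a - b - c in which only node a is percolated.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(percolation_centrality(path, {"a": 1, "b": 0, "c": 0}))
```

When every node has the same non-zero state, the source weights are all equal and the score reduces to normalized betweenness, so the center of a star receives 1.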


 User Options
 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network at a time.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

 Main process
Percolation state: When a user chooses this option, the fully percolated state of each node is set according to the attribute vector selected by the user.

Max Time Step: The maximum number of time steps over which percolation is transmitted.

Transmission probability: A percolated node percolates each node it is linked to with the transmission probability.

Initial Adopter: In the case of "Selection", the selected nodes are regarded as initial adopters. In the case of "Attribute", nodes whose selected attribute value is not equal to 0 are regarded as initial adopters. In the case of "Random", the given number of nodes is selected randomly using the given seed value and regarded as initial adopters.
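How these options interact can be pictured with a small simulation sketch. It assumes one plausible spreading rule: at every time step each percolated node tries once to percolate each non-percolated neighbor, succeeding with the transmission probability. The exact rule NetMiner applies is not stated here, so this rule, the function name and its parameters are all assumptions.

```python
import random

def simulate_percolation(adj, initial, probability, max_steps, seed=0):
    """Return the list of node states (0 or 1) at time steps 0 .. max_steps."""
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    state = {v: (1 if v in initial else 0) for v in adj}
    history = [dict(state)]
    for _ in range(max_steps):
        newly = set()
        for u in adj:
            if state[u] == 1:
                for v in adj[u]:
                    # Assumed rule: one independent attempt per link per step.
                    if state[v] == 0 and rng.random() < probability:
                        newly.add(v)
        for v in newly:
            state[v] = 1
        history.append(dict(state))
    return history

# Hypothetical path a - b - c - d with node a as the only initial adopter.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
for step, snapshot in enumerate(simulate_percolation(path, {"a"}, 1.0, 2)):
    print(step, snapshot)
```

With a transmission probability of 1.0 the percolation advances exactly one link per time step, which makes the role of Max Time Step easy to see.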

 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘Centrality >> Percolation’ analysis creates Main Report, Percolation Centrality Vector, Percolation State Vector and Spring Map.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Number of Timesteps: The number of time steps over which percolation was transmitted.

- Distribution of Percolation Centrality Scores: Sum, mean, standard deviation, minimum and maximum of Percolation Centrality are reported at each timestep.

 Tables


Percolation Centrality Vector


This vector shows the Percolation centrality value for each node at each timestep

Percolation State Vector

This vector shows the Percolation state value for each node at each timestep

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. A node with a higher centrality score is drawn bigger, and its percolation state is shown by its color on the map.


 Inspect
This module explores the influence flow among the nodes on the network by timestep. A user can

recognize the percolation state by a node color.

 Timestep
Selecting a timestep in the Timestep Combo Box redraws the network map so that the size and color of each node represent its influence at the selected timestep.

The timestep selection buttons below the combo box step through the timesteps as follows.

: Shift to the first timestep


: Shift to the previous timestep

: Shift to the next timestep

: Shift to the last timestep

 Time Complexity
 O(n^3), where n is the number of nodes.

 Reference
 Piraveenan, Mahendra (2013). "Percolation Centrality: Quantifying Graph-Theoretic Impact of

Nodes during Percolation in Networks". PLOS ONE 8 (1): e53095.

 Ulrik Brandes, A Faster Algorithm for Betweenness Centrality. Journal of Mathematical

Sociology 25(2):163-177, 2001.

 Related Topics
 Analyze >> Centrality >> Betweenness >> Node


Analyze >> Centrality >> Betweenness >> Node

 Menu
Analyze >> Centrality >> Betweenness >> Node

 Description
This module analyzes the centrality of a network structure based on pair-dependency among its nodes. Betweenness Centrality measures the extent to which a node lies on the geodesic paths between all other pairs of nodes. Therefore, the more often a node appears on these paths, the higher its centrality. Since the geodesic paths used by this algorithm ignore the weight of links, it is necessary to dichotomize your data if it is weighted.

The Betweenness Centralization Index is a measure of the variability of individual betweenness centrality scores. Its value lies between 0 and 1. If every node has the same centrality value, the centralization index is 0. If one node has the maximum possible centrality and all other nodes have none, as in a star graph, the centralization index is 1.
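The two quantities described above can be checked on small networks with the following sketch. It assumes an undirected, unweighted network with at least three nodes, counts each unordered pair of nodes once, and uses Freeman's (1979) normalization for the centralization index. It is a direct pair-by-pair computation for illustration rather than the faster Brandes algorithm, and the function names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Raw node betweenness; each unordered pair (s, t) is counted once."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a geodesic from s to t only if the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

def centralization(score):
    """Sum of differences from the top score, divided by the star-graph maximum."""
    n = len(score)
    top = max(score.values())
    max_sum = (n - 1) ** 2 * (n - 2) / 2
    return sum(top - c for c in score.values()) / max_sum

star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(betweenness(star), centralization(betweenness(star)))
print(betweenness(triangle), centralization(betweenness(triangle)))
```

The star graph gives a centralization index of 1 and the triangle, where every node has the same score, gives 0.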

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You can symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. If you symmetrize your data, the algorithm runs faster.
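The two pre-processing steps can be pictured with the following sketch. It assumes links stored as a dictionary keyed by (source, target) pairs, a dichotomization rule that keeps links whose weight is above a threshold, and a symmetrization rule that keeps the larger of the two directed values. These rules and names are illustrative assumptions; the options NetMiner actually offers for each step may differ.

```python
def dichotomize(weights, threshold=0.0):
    """Keep a link with value 1 when its weight exceeds threshold; drop it otherwise."""
    return {pair: 1 for pair, w in weights.items() if w > threshold}

def symmetrize(links):
    """Make every link reciprocal, keeping the larger value of the two directions."""
    out = {}
    for (u, v), w in links.items():
        best = max(w, links.get((v, u), 0))
        out[(u, v)] = best
        out[(v, u)] = best
    return out

# Hypothetical weighted, directed links.
weighted = {("a", "b"): 3.0, ("b", "a"): 0.0, ("b", "c"): 0.5}
binary = dichotomize(weighted)
undirected = symmetrize(binary)
print(binary)
print(undirected)
```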

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Betweenness

Centrality >> Node’ analysis, Main Report, Node Betweenness

Centrality Vector, Spring Map and Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Betweenness Centrality Scores: Sum, mean, standard deviation, Minimum,

Maximum of Betweenness Centrality are reported for each in-betweenness and out-betweenness.

- Network Betweenness Centralization Index: Betweenness Centralization Index is reported for each

in-betweenness and out-betweenness.

 Tables
Betweenness Centrality Vector
This vector shows the betweenness centrality value for

each node.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
Node Betweenness Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(n^3)

 Reference
 Freeman L C (1979). "Centrality in Social Networks: Conceptual clarification", Social
Networks 1, 215-239.

 Ulrik Brandes, A Faster Algorithm for Betweenness Centrality. Journal of Mathematical


Sociology 25(2):163-177, 2001.

 Related Topics
 Analyze >> Connection >> Dependency


Analyze >> Centrality >> Betweenness >> Link

 Menu
Analyze >> Centrality >> Betweenness >> Link

 Description
Link Betweenness Centrality measures the extent to which a link lies on the geodesic paths between all pairs of nodes. The more often a link appears on these paths, the higher its centrality.

Since the geodesic paths used by this algorithm ignore the weight of links, you should dichotomize your data if it is weighted.
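The counting of links on geodesic paths can be sketched as follows, assuming an undirected, unweighted network in which each unordered pair of nodes is counted once (the end nodes of a link count as a pair too). The function names and the sample network are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def link_betweenness(adj):
    """Raw link betweenness keyed by frozenset({u, v})."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = list(adj)
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue
            dist_t, sigma_t = info[t]
            for u in adj:
                for v in adj[u]:
                    # The link is crossed from u to v on a geodesic s .. u - v .. t.
                    if (u in dist_s and v in dist_t
                            and dist_s[u] + 1 + dist_t[v] == dist_s[t]):
                        share = sigma_s[u] * sigma_t[v] / sigma_s[t]
                        score[frozenset((u, v))] += share
    return score

# Hypothetical 4-cycle a - b - c - d - a.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
for link, value in link_betweenness(cycle).items():
    print(sorted(link), value)
```

In the 4-cycle every link carries its own end-node pair plus half of each of the two opposite pairs, so all four links receive the same score.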

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You can symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. If you symmetrize your data, the algorithm runs faster.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Betweenness

Centrality >> Link’ analysis, Main Report, Link Betweenness

Centrality Matrix and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Link Betweenness Centrality Scores: Sum, mean, standard deviation, Minimum,

Maximum of Link Betweenness Centrality are reported.

 Tables
Link Betweenness Centrality Matrix
This matrix shows the link betweenness centrality value for each link.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.


- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module shows or hides links based on the selected threshold. If the link betweenness centrality of a link is lower than the threshold value, the link is hidden.

 Threshold
Selecting Threshold Level using Threshold slider will show or hide

each link on the network map by comparing the Link Betweenness

Centrality against the selected Threshold Level as follows:

- Link Betweenness Centrality > Threshold: Link >> Common >> Normal

- Link Betweenness Centrality < Threshold: Link >> Line >> Fade State

When the Show Score box is checked, the Link Betweenness Centrality value is shown near each link on the network map.

<Example Screen shot>


 Time Complexity
 O(n^3)

 Reference
 Freeman L C (1979). "Centrality in Social Networks: Conceptual clarification", Social
Networks 1, 215-239.

※ We would like to express special thanks to Professor Hawoong Jeong for his kind help to

implement this algorithm.

 Related Topics
 Analyze >> Connection >> Dependency


Analyze >> Centrality >> Flow Betweenness

 Menu
Analyze >> Centrality >> Flow Betweenness

 Description
The influence of a node 'A' on the flow between two nodes is defined as 1 - [the maximum flow between the two nodes when 'A' is removed from the network / the maximum flow between the two nodes]. The flow betweenness centrality of 'A' is the sum of this value over all pairs of nodes that do not include 'A'.
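The definition above can be turned into a small sketch. It assumes an undirected network given as a symmetric dictionary of link capacities, counts each unordered pair of nodes once, and uses a plain Edmonds-Karp maximum-flow routine. The routine and all names are illustrative choices and are not taken from NetMiner.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the augmenting path, then push it.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def without(capacity, node):
    """Copy of the network with one node and all of its links removed."""
    return {u: {v: c for v, c in nbrs.items() if v != node}
            for u, nbrs in capacity.items() if u != node}

def flow_betweenness(capacity):
    nodes = list(capacity)
    score = {a: 0.0 for a in nodes}
    for a in nodes:
        reduced = without(capacity, a)
        for i, s in enumerate(nodes):
            for t in nodes[i + 1:]:
                if a in (s, t):
                    continue
                full = max_flow(capacity, s, t)
                if full > 0:
                    score[a] += 1 - max_flow(reduced, s, t) / full
    return score

# Hypothetical path a - b - c with unit capacities.
path = {"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1}}
print(flow_betweenness(path))
```

In this path all flow between a and c passes through b, so removing b reduces the maximum flow from 1 to 0 and b receives a score of 1.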

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Flow

Betweenness Centrality’ analysis, Main Report, Flow Betweenness

Centrality Vector, Spring Map and Concentric Map are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Flow Betweenness Centrality Scores: Sum, mean, standard deviation, Minimum,

Maximum of Flow Betweenness Centrality are reported.

 Tables
Flow Betweenness Centrality Vector
This vector shows the flow betweenness centrality

value for each node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.


Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
Flow Betweenness Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(n^4)

 Reference
 Freeman L C, Borgatti S P and White D R (1991). 'Centrality in valued graphs: A measure of
betweenness based on network flow'. Social Networks 13, 141-154.

 Related Topics
 Analyze >> Connection >> Maximum Flow


Analyze >> Centrality >> R.W. Betweenness

 Menu
Analyze >> Centrality >> R.W. Betweenness

 Description
Random-walk betweenness measures how frequently a node lies on the random-walk paths between all other pairs of nodes. It counts how often a node is traversed by a random walk between two other nodes. Whereas Betweenness centrality considers only shortest paths, Random-walk Betweenness centrality counts all walks as potential paths.

By the definition of the algorithm, only undirected and unweighted networks can be analyzed. You should therefore dichotomize and symmetrize your data before running this module.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. If you symmetrize your data, the algorithm runs faster.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Random-Walk

Betweenness Centrality’ analysis, Main Report, Random-Walk

Betweenness Centrality Vector, Spring Map and Concentric Map are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Random-Walk Betweenness Centrality Scores: Sum, mean, standard deviation,

Minimum, Maximum of Random-Walk Betweenness Centrality are reported.

 Tables
Random-Walk Betweenness Centrality Vector
This vector shows the random-walk betweenness centrality value for each node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.


- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
R.W. Betweenness Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(n * m)

 Reference
 [Link], 2003, A measure of betweenness centrality based on random walks.

 Related Topics
 Analyze >> Centrality >> Betweenness >> Node


Analyze >> Centrality >> Information

 Menu
Analyze >> Centrality >> Information

 Description
This module treats link (i, j) as a channel transmitting a signal from i to j, and computes the information centrality of node i, which is the sum of the information along all paths from i to every node j. The information of a path is inversely proportional to the length of the path, because information is normally inversely proportional to the variance of an estimator and the variance is proportional to the length of the path used.

If there are isolated nodes, information centrality is meaningless. In this case, remove the isolated nodes during pre-processing and set their centrality to 0. Similarly, if the network has two or more components, information centrality again becomes meaningless (because the information of a pair with no path is zero and the equations collapse). Therefore, make sure that the input network consists of a single component.
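A compact way to compute this measure is the closed-form expression from Stephenson and Zelen (1989), the paper cited for this module. The sketch below assumes an undirected, unweighted, connected network and uses exact fractions so that the results are easy to check by hand. Whether NetMiner evaluates exactly this expression is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def information_centrality(adj):
    nodes = list(adj)
    n = len(nodes)
    # B has degree + 1 on the diagonal and 1 - a_ij off the diagonal.
    b = []
    for u in nodes:
        row = []
        for v in nodes:
            if u == v:
                row.append(Fraction(len(adj[u]) + 1))
            else:
                row.append(Fraction(0 if v in adj[u] else 1))
        b.append(row)
    c = invert(b)
    trace = sum(c[i][i] for i in range(n))
    row_sum = sum(c[0])  # every row of the inverse has the same sum
    return {u: 1 / (c[i][i] + (trace - 2 * row_sum) / n)
            for i, u in enumerate(nodes)}

# Hypothetical path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(information_centrality(path))
```

If the network is not connected, the matrix cannot be inverted and the sketch fails at the pivot search, which mirrors the remark above that the equations collapse when there are two or more components.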

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. If you symmetrize your data, the algorithm runs faster.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Information Centrality’

analysis, Main Report, Information Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Information Centrality Scores: Sum, mean, standard deviation, Minimum,

Maximum of Information Centrality are reported.

 Tables
Information Centrality Vector
This vector shows the information centrality value for

each node.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
Information Centrality module does not have an Inspect Control Item.

 Time Complexity
 O(n^3)

 Reference
 Stephenson K and Zelen M, 1989. 'Rethinking Centrality: Methods and Examples', Social
Networks 11. pp.1 – 37

 Related Topics


Analyze >> Centrality >> Load

 Menu
Analyze >> Centrality >> Load

 Description
When one unit data packet travels along the shortest path between each pair of nodes, the Load of node k is the total amount of data packets passing through k. In detail, when one unit data packet travels from node A to node B, the path can split at a branching point. For example, if node C has a link to node D and another link to node E, both on shortest paths, the data branches at node C (in this case, C is called a ‘branching point’). The data packet is divided evenly by the number of branches at node C, and the load value of each node increases by the amount of data packet passing through it. The final load value of a node is the sum of its load values over all node pairs. By the definition of this algorithm, you should dichotomize your data before running the algorithm.
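The even splitting at branching points can be sketched as follows. The sketch assumes an undirected, unweighted network, sends one packet from every node toward each other node (so each pair is counted in both directions), and does not credit the two end nodes of a pair. These conventions and the function name are assumptions made for illustration.

```python
from collections import deque

def load_centrality(adj):
    load = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording every shortest-path predecessor.
        dist, preds, order = {s: 0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    preds[v].append(u)
        # Every reachable node sends one packet back toward s.
        packets = {v: 1.0 for v in order if v != s}
        for v in reversed(order):
            if v == s:
                continue
            # Split evenly among the branches at this branching point.
            share = packets[v] / len(preds[v])
            for p in preds[v]:
                if p != s:
                    packets[p] += share
        for v, amount in packets.items():
            load[v] += amount - 1.0  # exclude the node's own packet
    return load

# Hypothetical 4-cycle a - b - c - d - a.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(load_centrality(cycle))
```

In the 4-cycle a packet between opposite nodes splits into two halves at its first hop, so every node carries half a packet for each direction of the opposite pair.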

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Load Centrality’

analysis, Main Report, Load Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Load Centrality Scores: Sum, mean, standard deviation, Minimum, Maximum of

Load Centrality are reported.

 Tables
Load Centrality Vector
This vector shows the Load Centrality value for each node.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.


Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
Load Centrality module doesn’t have an Inspect Control Item.


 Time Complexity
 O(n^3)

 Reference
 K. I. Goh, B. Kahng, and D. Kim. (2001). "Universal load distribution in scale-free networks".
Phys. Rev. Lett. 87, 278701

 K. I. Goh, E.S. Oh, H. Jeong, B. Kahng, and D. Kim, "Classification of scale-free networks",
Proc. Natl. Acad. Sci. U.S.A. 99, 12583-12588 (2002,Oct).

 Related Topics
 Analyze >> Connection >> Shortest Path


Analyze >> Centrality >> Eigenvector

 Menu
Analyze >> Centrality >> Eigenvector

 Description
This module analyzes the centrality structure of a network based on the iteratively weighted degree of the nodes. The Eigenvector Centrality of a node, as defined by Bonacich (1972), is (recursively) proportional to the sum of the eigenvector centralities of the nodes to which it is connected. It is calculated by computing the principal eigenvector (the eigenvector with the largest eigenvalue) of the input 1-mode Network. Since this module computes the eigenvector only for a symmetric matrix, symmetrize the 1-mode network if it is directed.
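The recursive definition can be illustrated with power iteration, a common way to approximate the principal eigenvector. The sketch assumes an undirected, unweighted, connected network and iterates with the adjacency matrix plus the identity, which has the same eigenvectors but avoids oscillation on bipartite networks. The method, the unit-length scaling and the function name are illustrative choices, not a description of NetMiner's solver.

```python
import math

def eigenvector_centrality(adj, iterations=1000, tol=1e-12):
    """Approximate the principal eigenvector, scaled to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Multiply by (A + I): own score plus the scores of the neighbors.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if sum(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(eigenvector_centrality(path))
```

For this path the middle node b ends up with a score that is the square root of 2 times the score of each end node.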

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. If you symmetrize your data, the algorithm runs faster.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Eigenvector

Centrality’ analysis, Main Report, Eigenvector Centrality Vector,

Reflected/Derived/Constant Table, Spring Map and Concentric Map

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Eigenvector Centrality Scores: Sum, mean, standard deviation, Minimum,

Maximum of Eigenvector Centrality are reported.

 Tables
Eigenvector Centrality Vector
This vector shows the eigenvector centrality value for each

node.

Reflected/Derived/Constant Table
The following description of the decomposition of eigenvector centrality into reflected and derived parts is taken from Mizruchi’s paper.

Although part of the centrality that i acquires from j is based on j’s centrality, j’s centrality is also based on i’s centrality. That is, unit i sends some of its own centrality to unit j at step 1 and then receives some of it back at step 2. We call this component of centrality reflected centrality. The remainder of the centrality that unit i receives from unit j is purely a result of j’s centrality. We call this component derived centrality.

The constant part may be understood as the initial scores of units.

As a result, the eigenvector centrality of a node equals ‘reflected part + derived part + constant part’. That is, there are two reasons why the eigenvector centrality of node i can be high. First, the nodes connected to i, directly or indirectly, may have high degree. Second, the degree of i itself may be so high that the eigenvector centrality of i’s neighbors increases, which in turn raises i’s eigenvector centrality. For example, you can use ‘constant part + derived part’ to obtain nodes’ eigenvector centrality values with the recursive influence of their own degree removed.

 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
Eigenvector Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(n^3)

 Reference
 Bonacich P (1972). Factoring and Weighting Approaches to status scores and clique
identification. Journal of Mathematical Sociology 2, 113-120.

 Tam T (1989). "Demarcating the boundaries between self and the social: The anatomy of centrality in social networks". Social Networks 11(4), 387-401.

 Mizruchi M S, Mariolis P, Schwartz M and Mintz B (1986). "Techniques for disaggregating centrality scores in social networks". In Tuma N B (ed.), Sociological Methodology 1986, pp. 26-48. Washington, DC: American Sociological Association.

 Related Topics


Analyze >> Centrality >> Status

 Menu
Analyze >> Centrality >> Status

 Description
This module implements the Katz Status Centrality and Hubbell Status Centrality algorithms.

These centralities consider all walks (even walks of infinite length) between the focal node and every other node. The more walks a focal node has, the larger its centrality value may be. The contribution of each walk is attenuated by (attenuation factor) ^ (length of walk).

Hubbell centrality includes the initial status of a node, whereas Katz centrality includes only the effects of its neighbors’ status.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Main process
Type of Status: Select which algorithm to use, Katz or Hubbell.

Attenuation Factor (0 < beta < 1): The default value is 0.5. Ideally, the attenuation factor must be less than the reciprocal of the principal eigenvalue. Because the eigenvalue is not known in advance, the program calculates it and recodes the input accordingly: an input of 1 is recoded to 1/principal eigenvalue, and a value smaller than 1 is recoded to value/principal eigenvalue. The recoded attenuation factor is displayed in the report.
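The attenuated walk counting behind both status types can be sketched as a truncated series. The sketch assumes an undirected, unweighted network and a beta that has already been recoded, that is, a value below the reciprocal of the principal eigenvalue, so that the series converges. The function name, the `kind` switch and the number of terms are hypothetical.

```python
def status_centrality(adj, beta, kind="katz", terms=200):
    """Sum beta ** k times the number of walks of length k, for k up to terms."""
    term = {v: 1.0 for v in adj}  # k = 0: the initial status of each node
    # Hubbell keeps the initial status; Katz starts from walks of length 1.
    total = dict(term) if kind == "hubbell" else {v: 0.0 for v in adj}
    for _ in range(terms):
        # One more step: beta times the summed previous terms of the neighbors.
        term = {v: beta * sum(term[u] for u in adj[v]) for v in adj}
        for v in adj:
            total[v] += term[v]
    return total

# Hypothetical triangle; its principal eigenvalue is 2, so an input of 0.5
# would be recoded to 0.5 / 2 = 0.25.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(status_centrality(triangle, 0.25, kind="katz"))
print(status_centrality(triangle, 0.25, kind="hubbell"))
```

In this example the Hubbell score of each node exceeds its Katz score by exactly the initial status of 1.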

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Status Centrality’

analysis, Main Report, Status Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Recoded Attenuation Factor: Attenuation factor divided by the principal eigenvalue is reported.

- Distribution of Status Centrality Scores: Sum, mean, standard deviation, Minimum, Maximum of

Status Centrality are reported.

 Tables
Status Centrality Vector
This vector shows the status centrality value for each node.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
You can see the status centrality value of each node on the map.

 Choose Direction (Spring Map, Concentric Map)


Select In Status or Out Status. The size of nodes changes by the selected option.

<Example Screen shot>


 Time Complexity
 O(n^3)

 Reference
 Hubbell C H (1965). "An input-output approach to clique identification". Sociometry 28, 377-399.

 Katz L (1953). "A new status index derived from sociometric analysis". Psychometrika 18, 39-43.

 Related Topics


Analyze >> Centrality >> Power

 Menu
Analyze >> Centrality >> Power

 Description
This module computes Philip Bonacich’s power centrality. In equation form, it computes (I - beta * A^t)^(-1) * A * E, where A is the sociomatrix and E is a vector of ones. If a node connected to powerful nodes (nodes with high centrality) should become less powerful, select a negative beta. In the opposite case, select a positive beta. The sign and magnitude of beta should be decided by the character of the network.
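The effect of the sign of beta can be seen in a short sketch that expands the inverse in the equation above as a series, A*E + beta*A*(A*E) + beta^2*A*A*(A*E) + ... The sketch assumes an undirected network (so A and its transpose coincide) and a beta whose magnitude is below the reciprocal of the principal eigenvalue. The function name and the number of terms are hypothetical.

```python
def power_centrality(adj, beta, terms=200):
    """Truncated series for (I - beta * A)^(-1) * A * E on an undirected network."""
    # First term: A * E, which is the degree of each node.
    term = {v: float(len(adj[v])) for v in adj}
    total = dict(term)
    for _ in range(terms):
        term = {v: beta * sum(term[u] for u in adj[v]) for v in adj}
        for v in adj:
            total[v] += term[v]
    return total

# Hypothetical path a - b - c; its principal eigenvalue is the square root of 2.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(power_centrality(path, 0.0))
print(power_centrality(path, 0.5))
print(power_centrality(path, -0.5))
```

With beta = 0 the score equals the degree. A positive beta raises the scores of the end nodes because they are tied to the well-connected node b, while a negative beta lowers them for the same reason.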

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Main process
Attenuation Factor (-1 < beta < 1): The default value is 0.0. Ideally, the magnitude of the attenuation factor must be less than the reciprocal of the principal eigenvalue. Because the eigenvalue is not known in advance, the program calculates it and recodes the input accordingly (1 is recoded to 1/principal eigenvalue, and a value smaller than 1 is recoded to value/principal eigenvalue). The recoded attenuation factor is displayed in the report. A negative value means that a node with more powerful neighbors becomes less powerful itself (for example, in a market network, because of competition for bargaining power).


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Power Centrality’

analysis, Main Report, Power Centrality Vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- The Recoded Attenuation Factor is shown for row and column.

- Distribution of Power Centrality Scores: Sum, mean, standard deviation, Minimum, Maximum of Power Centrality are reported for in-power centrality and out-power centrality.

 Tables
Power Centrality Vector
This vector shows the power centrality value for each node.


 Maps
Spring Map
- Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality

score a node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
You can see the power centrality value of each node on the map.

 Choose Direction (Spring Map, Concentric Map)


Select in-power or out-power. The size of nodes changes by the selected option.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Bonacich P, 1987. Power and Centrality: A Family of Measures. American Journal of Sociology 92, 1170-1182.

 Related Topics


Analyze >> Centrality >> Effects

 Menu
Analyze >> Centrality >> Effects

 Description
There are three effect centralities: total, immediate, and mediative. A node's total effect centrality measures the strength of the effect from the given node on other nodes through every walk between them. It is similar to Katz status, Hubbell status, and power centrality. Immediate effect centrality measures how immediate a node’s effect on others is; the concept is similar to closeness centrality. Mediative effect centrality measures the degree of mediation; the notion is analogous to betweenness centrality.

The analysis may not be applicable to a particular network. For example, the matrix inverse operation used during the computation requires some conditions on the input network. In general, however, users do not have to worry about this restriction.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Row Normalize: Each row is normalized.


 Main process
Weight Parameter (0 < alpha < 1): default value = 0.999. Ideally, the weight parameter must be less than the reciprocal of the principal eigenvalue. If ‘Row Normalize’ is selected in the Pre-process step (it is selected by default, and in most cases the default can be used), the proper value is between 0 and 1, because the principal eigenvalue of a row-normalized matrix is 1. The closer the value is to 1, the farther the effect of a node is transmitted. The default value follows the recommendation of the reference.
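The statement that the principal eigenvalue of a row-normalized matrix is 1 can be checked with a small sketch. It assumes that row normalization divides each row by its row sum and that the matrix is non-negative with no empty rows; the helper `row_normalize` is a hypothetical name, not a NetMiner function. Each normalized row sums to 1, so the all-ones vector is mapped onto itself, and no eigenvalue of a non-negative matrix can exceed its largest row sum.

```python
def row_normalize(matrix):
    """Divide every row by its sum; rows that sum to zero are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total:
            normalized.append([value / total for value in row])
        else:
            normalized.append([float(value) for value in row])
    return normalized


weights = [
    [0, 2, 2],
    [1, 0, 3],
    [5, 5, 0],
]
w = row_normalize(weights)

# W * 1 = 1: the all-ones vector is an eigenvector with eigenvalue 1.
ones = [1.0] * len(w)
image = [sum(w[i][j] * ones[j] for j in range(len(w))) for i in range(len(w))]
```

Because the eigenvalue is 1, any weight parameter strictly between 0 and 1 stays below its reciprocal, which is why the range 0 < alpha < 1 is safe after row normalization.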

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Effects Centrality’

analysis, Main Report, Effects Centrality vector, Spring Map and

Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Effect Scores: Sum, mean, standard deviation, Minimum, Maximum of Effect

Centrality scores are reported.

 Tables
Effect Centrality Vector: These vectors show the total, immediate, and mediative effect centrality values for each node.


 Maps
Spring Map
Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality score

a node has, the closer to center it is arranged.

Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
With this item, the user can show the Effects Centrality of each node on the map.

 Choose Measure (Spring Map, Concentric Map)


The network map is redrawn by the selected measure of Effects

Centrality.

<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Noah E. Friedkin, 1991. Theoretical Foundations for Centrality Measures. American Journal of Sociology 96(6), 1478-1504.

 Related Topics


Analyze >> Centrality >> PageRank

 Menu
Analyze >> Centrality >> PageRank

 Description
This module is an implementation of Google PageRank algorithm. Among several PageRank
models, a random surfer model will be used as it makes this algorithm easy to understand. Assume

that a user starts from a random initial web page and travels to web pages by clicking links. As time

goes by, the probability that a user will be in each web page converges to one value, which is the

PageRank of that web page (in our case, web pages are the nodes).

However, if a network is not completely connected, some web pages cannot be reached by following links, so we need to assume that, with a certain probability, a user visits a random web page without following a link. The probability that a user visits a web page by clicking a link is called the ‘damping factor’ (set in the Main process panel). Accordingly, the probability that a user visits a random web page without clicking a link is (1 – damping factor). The algorithm is computed iteratively, and in most cases the PageRank values converge within 200 iterations.
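The random surfer model can be sketched as a short iteration in plain Python. This is an illustration under stated assumptions rather than NetMiner's code: the function name `pagerank` is hypothetical, the network is assumed to be dichotomized, every node starts with the same score, and a page without outgoing links is assumed to send the surfer to a uniformly random page.

```python
def pagerank(adj, damping=0.85, iterations=200):
    """Iterate the random surfer model on a binary adjacency matrix."""
    n = len(adj)
    out_degree = [sum(row) for row in adj]
    rank = [1.0 / n] * n  # uniform initial score
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_degree[i] == 0:
                # Assumption: a page without links sends the surfer anywhere.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                # With probability `damping` the surfer follows one of the links.
                for j in range(n):
                    if adj[i][j]:
                        new_rank[j] += damping * rank[i] * adj[i][j] / out_degree[i]
        rank = new_rank
    return rank


# Pages 0 and 1 link to page 2, and page 2 links back to page 0.
links = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(links)
```

In this example page 2 collects the highest score because two pages link to it, and page 1, which no page links to, keeps only the share that comes from random jumps.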

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only
choose one 1-mode network.

 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process


Dichotomize: A user needs to dichotomize data before running a module. The weighted or valued
data is transformed to unweighted or binary data as a result of dichotomizing data.

 Main process
# of Iterations: The default value is 200.

Damping Factor (Alpha): The default value is 0.85. The PageRank algorithm is computed iteratively. At first, each node is assigned an initial score. In each iteration, the new PageRank score of a node includes (damping factor * the aggregated PageRank scores of the node’s neighbors).

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘PageRank’ analysis, ‘Main Report’,

‘PageRank Vector’, ‘Spring Map’ and ‘Concentric Map’ are

reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
 Distribution of PageRank Centrality Scores:‘Sum’, ‘mean’, ‘standard deviation’, ‘minimum’

and ‘maximum of PageRank Centrality’.


 Tables
PageRank Centrality Vector: This vector
contains the PageRank centrality value for

each node.

 Maps
Spring Map
 Default Layout: Kamada&

Kawai algorithm (Spring

>>Kamada& Kawai) is used to

draw a map by default.

 Default Style: The default style

is set according to ‘Common’

option in ‘Preference >> Node’

tab. The size of a node on the

map is proportional to its

centrality score (e.g. a node

with the highest centrality score will be depicted as the biggest node on a map)

Concentric Map
 Default Layout: Concentric

algorithm (Circular >>

Concentric) is used to draw a

map. The higher the

centrality score of a node, the

closer the node to a center.

 Default Style: The default

style is set according to

‘Common’ option in

‘Preference >> Node’ tab.


 Time Complexity
 O(m * k) where k is the number (#) of iterations.

 References
 S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In

proceedings of the 7th International WWW Conference, 1998.


Analyze >> Centrality >> Generalized PageRank

 Menu
Analyze >> Centrality >> Generalized PageRank

 Description
The Generalized PageRank algorithm generalizes Google’s PageRank algorithm (‘PageRank

algorithm’) from two aspects, which are explained using a random surfer model in this document.
First, when two or more hyperlinks exist on a web page, as to the PageRank algorithm, the

probability that a user clicks on a hyperlink is equal for every hyperlink. However, in regard to the

Generalized PageRank algorithm, these probabilities are not equal. Second, when a user surfs from a

certain web page with the probability of (1 – damping factor), as to the PageRank algorithm, the

probability that a web page acts as a starting point is equal for every web page. However, these

probabilities are not equal for the Generalized PageRank algorithm. This algorithm rather uses the

vector provided by a user to set the probability of a web page acting as a starting point for each page.

If the sum of the values of a vector is not equal to 1, the Generalized PageRank algorithm normalizes

or standardizes values of a vector before using it. The value of Generalized PageRank is the product

of (the number of nodes) and (the probability that a user stays at a page for each page).
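The two generalizations can be sketched as follows. This is a hedged illustration, not NetMiner's implementation: the function name `generalized_pagerank` and its parameters are hypothetical, links are followed with probability proportional to their weight, the user-supplied start vector is divided by its sum, and a page without outgoing links is assumed to restart according to the start vector. The returned values are the stay probabilities multiplied by the number of nodes, as described above.

```python
def generalized_pagerank(weights, start=None, damping=0.85, max_iter=None, tol=1e-12):
    """Weighted PageRank with a user-defined start vector, scaled by n."""
    n = len(weights)
    if start is None:
        start = [1.0] * n  # equal start probabilities, as in plain PageRank
    total = float(sum(start))
    start = [value / total for value in start]  # normalize so the vector sums to 1
    out_weight = [float(sum(row)) for row in weights]
    prob = start[:]
    step = 0
    while max_iter is None or step < max_iter:
        # With probability (1 - damping) the surfer restarts from the start vector.
        new_prob = [(1.0 - damping) * value for value in start]
        for i in range(n):
            for j in range(n):
                if out_weight[i] > 0:
                    move = weights[i][j] / out_weight[i]  # proportional to link weight
                else:
                    move = start[j]  # assumption: pages without links restart
                new_prob[j] += damping * prob[i] * move
        change = max(abs(a - b) for a, b in zip(new_prob, prob))
        prob = new_prob
        step += 1
        if change < tol:
            break
    return [n * value for value in prob]


# Page 0 links to page 1 with weight 3 and to page 2 with weight 1.
weighted_links = [
    [0, 3, 1],
    [1, 0, 0],
    [1, 0, 0],
]
weighted = generalized_pagerank(weighted_links)
biased = generalized_pagerank(weighted_links, start=[1, 1, 8])
```

With equal start probabilities, page 1 scores higher than page 2 because its incoming link is heavier; giving page 2 a larger start value raises its score. In both cases the scores sum to the number of nodes.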

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only
choose one 1-mode network.

 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process


Dichotomize: If a user dichotomizes the weights, then when two or more hyperlinks are contained on a web page, the probability that a user selects a hyperlink is the same for every hyperlink. This is the same as how the PageRank algorithm works. If a user chooses not to dichotomize, the probability of each hyperlink being selected is proportional to the weight of that hyperlink.

 Main process
Direction: When a user chooses ‘In’, a user moves from a
target node to a source node. In other words, hyperlinks in a

target web page direct a user to a source web page. When a

user chooses ‘Out’, a user moves from a source node to a

target node. In other words, hyperlinks in a source web page

direct a user to a target web page.

Heterogeneous Beta: When a user chooses this option, this


algorithm rather uses the vector provided by a user to set the

probability of a web page acting as a starting point for each page.

If the sum of the values of a vector is not equal to 1, the

Generalized PageRank algorithm normalizes or standardizes values of a vector before using it. If a

user chooses not to use this option, the probability of a web page acting as a starting point for each

page is equal.

Damping Factor (Alpha): Same as the damping factor used in the normal PageRank algorithm. A
user goes to another web page by clicking a hyperlink with the probability of a damping factor, and

begins surfing on a new web page with the probability of (1 – damping factor).

Maximum # of Iterations: The PageRank algorithm is computed iteratively. With this option, a user can set the maximum number of iterations. When a user chooses not to use this option, the algorithm iterates until the result converges.


 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘Generalized PageRank’ analysis,

‘Main Report’, ‘G. P. R. Vector’, ‘Spring Map’ and

‘Concentric Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
 Distribution of Generalized PageRank Centrality

Scores: ‘Sum’, ‘mean’, ‘standard deviation’, ‘minimum’ and ‘maximum of Generalized

PageRank Centrality’.

 Tables
G. P. R. Vector: This vector
contains the Generalized PageRank

centrality value for each node. The

Generalized PageRank centrality

value is defined as the product of

the number of nodes and the probability that a user will stay at each page.

 Maps


Spring Map
 Default Layout: Kamada&

Kawai algorithm (Spring

>>Kamada& Kawai) is used

to draw a map by default.

 Default Style: The default

style is set according to

‘Common’ option in

‘Preference >> Node’ tab.

The size of a node on the

map is proportional to its

centrality score (e.g. a node

with the highest centrality

score will be depicted as the biggest node on a map)

Concentric Map
 Default Layout: Concentric

algorithm (Circular >>

Concentric) is used to draw

a map. The higher the

centrality score of a node,

the closer the node to a

center.

 Default Style: The default

style is set according to

‘Common’ option in

‘Preference >> Node’ tab.

 Time Complexity
 O(m * k) where k is the number (#) of iterations.


 References
 S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In

proceedings of the 7th International WWW Conference, 1998.

 Related Topics
 Analyze >> Centrality >> PageRank


Analyze >> Centrality >> HITS

 Menu
Analyze >> Centrality >> HITS

 Description
This module is an implementation of the HITS algorithm. The hub score of a node is proportional to the combined authority score of its out-neighbors, and the authority score of a node is proportional to the combined hub score of its in-neighbors. That is, the hub score of a node is initially higher if the node has more out-neighbors, but it is affected by the authority scores of those out-neighbors. So, if a node has many out-neighbors with low authority scores, the hub score of that node will be low. The authority score behaves similarly, but it is affected by in-neighbors: the authority score of a node depends on the hub scores of its in-neighbors.
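The mutual dependence of hub and authority scores can be sketched as an alternating update. This is a minimal illustration, not NetMiner's implementation: the function name `hits`, the fixed number of iterations, and the normalization of each score vector to unit Euclidean length are assumptions made for the example.

```python
import math


def hits(adj, iterations=100):
    """Alternate authority and hub updates on an adjacency matrix."""
    n = len(adj)
    hub = [1.0] * n
    authority = [1.0] * n
    for _ in range(iterations):
        # Authority of j: combined hub score of its in-neighbors.
        authority = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub of i: combined authority score of its out-neighbors.
        hub = [sum(adj[i][j] * authority[j] for j in range(n)) for i in range(n)]
        # Assumption: rescale both vectors to unit Euclidean length.
        a_norm = math.sqrt(sum(a * a for a in authority)) or 1.0
        h_norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        authority = [a / a_norm for a in authority]
        hub = [h / h_norm for h in hub]
    return hub, authority


# Node 0 links to nodes 1 and 2; node 3 links to node 1 only.
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
hub, authority = hits(adj)
```

Node 1 ends up with the highest authority score because both hubs point to it, and node 0 is the stronger hub because it points to both authorities.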

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘HITS’ analysis, Main Report, HITS Centrality Vector, Spring Map and Concentric Map are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Distribution of Authority, Hub Scores: Sum, mean, standard deviation, Minimum, Maximum of

Authority value and Hub value are reported.

 Tables
HITS Centrality Vector
This vector shows the authority score and hub score for each node.


 Maps
Spring Map
Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.

Concentric Map
Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality score

a node has, the closer to center it is arranged.

Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
With this item, users are able to show HITS Centrality of each node on the map.

 Choose Measure (Spring Map, Concentric Map)


The network map is redrawn by the selected measure of HITS Centrality.

<Example Screen shot>

 Time Complexity
 O(m)

 Reference
 J. M. Kleinberg, Authoritative sources in a hyperlinked environment. In proceedings of the
ACM-SIAM Symposium on Discrete Algorithms, 1998.

 Related Topics


Analyze >> Centrality >> Community

 Menu
Analyze >> Centrality >> Community

 Description
This module provides a community centrality measure. The larger the centrality value is, the more influence the node has in forming a community. That is, nodes with high community centrality play a central role in their local neighborhood. For more information, please refer to the reference.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

Symmetrize: You should symmetrize your data before running the module. Symmetrizing transforms directed/asymmetric data into undirected/symmetric data. In addition, the algorithm performs faster on symmetrized data.
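The two pre-processing steps can be illustrated with a short sketch. The rules used here are assumptions chosen for the example, since the text does not state which rules the module applies: a tie counts as present when its value is greater than a threshold of 0, and a symmetric tie takes the larger of the two directed values. The function names are hypothetical.

```python
def dichotomize(matrix, threshold=0):
    """Turn valued ties into binary ties (assumed rule: value > threshold)."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


def symmetrize(matrix):
    """Turn directed ties into undirected ties (assumed rule: maximum of both directions)."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


valued = [
    [0, 3, 0],
    [0, 0, 2],
    [1, 0, 0],
]
binary = dichotomize(valued)
undirected = symmetrize(binary)
```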


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Community Centrality’

analysis, Main Report, Community Centrality Vector, Degree-

Centrality Scatter Plot, Spring Map and Concentric Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Community Centrality Scores: Sum, mean, standard

deviation, Minimum, Maximum of Community Centrality are reported.

 Tables
Community Centrality
This vector shows the community centrality value for each

node.

 Maps
Spring Map
Default Layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

Default style: Default style is set by Common option in the Preference >> Node tab. Node with

higher centrality score is presented bigger on the map.


Concentric Map
Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher the centrality score

a node has, the closer to center it is arranged.

Default style: Default style is set by Common option in the Preference >> Node tab.

 Charts
Degree-Centrality Plot
It is a scatter plot whose x axis is degree and y axis is community centrality.


 Inspect
Community Centrality module doesn’t have an Inspect Control Item.

 Time Complexity
 O(n^3)

 Reference
 M.E.J. Newman "Finding community structure in networks using the eigenvectors of matrices"

 Related Topics


Analyze >> Equivalence >> Structural >> Profile

 Menu
Analyze >> Equivalence >> Structural >> Profile

 Description
This module analyzes the role-set structure of a network based on the similarity of tie profiles among its nodes. For all pairs of nodes, structural equivalence is computed by various measures of tie values from and to all other nodes. The more similar the tie profiles of a pair of nodes are, the greater their structural equivalence is. This module provides various measures for comparing patterns of ties. (Diagonal elements can be ignored or included in the calculation.)

Subsequent hierarchical clustering of the structural equivalence matrix gives a cluster diagram.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose many 1-
mode Networks at once.

 Main process
Direction
- In: Algorithm considers in-neighbor proximity.

- Out: Algorithm considers out-neighbor proximity.

- In And Out: Algorithm considers in-neighbor and out-neighbor

proximity.

Diagonal Handling Option


- Ignore: Comparison is performed excluding diagonal values.

- Reciprocal: replace the comparison of element X(i,i) with X(j,i) and X(i,j) with X(j,j) by the comparisons X(i,i) with X(j,j) and X(i,j) with X(j,i), respectively.

- Retain: Comparison is performed including diagonal values.

Proximity measures
Select measure to compute proximity of neighbor profiles.

- Match

For selected two nodes’ row profiles R=(R_1, R_2, …, R_n) and S=(S_1, S_2, …, S_n),

a: The number of i with R_i =1 and S_i = 1

b: The number of i with R_i =1 and S_i = 0

c: The number of i with R_i =0 and S_i = 1

d: The number of i with R_i =0 and S_i = 0


Jaccard coefficient = $\frac{a}{a+b+c}$

Ochiai = $\frac{a}{\{(a+b)(a+c)\}^{1/2}}$

Czekanowski, Sorensen, Dice = $\frac{2a}{2a+b+c}$

Russel, Rao = $\frac{a}{a+b+c+d}$

Simpson = $\frac{a}{\min\{(a+b),(a+c)\}}$

Braun, Blanque = $\frac{a}{\max\{(a+b),(a+c)\}}$

Kulczynski1 = $\frac{a}{b+c}$

Kulczynski2 = $\frac{1}{2}\left(\frac{a}{a+b}+\frac{a}{a+c}\right)$

Equivalence Index = $\left(\frac{C_{ij}}{C_i}\right)\left(\frac{C_{ij}}{C_j}\right)$

Sokal, Sneath, Anderberg = $\frac{a}{a+2(b+c)}$

Mountford = $\frac{2a}{a(b+c)+2bc}$

Simple Matching = $\frac{a+d}{a+b+c+d}$

Yule = $\frac{ad-bc}{ad+bc}$

Phi = $\frac{ad-bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}$

Hamman = $\frac{(a+d)-(b+c)}{a+b+c+d}$

Mozley, Margalef = $\frac{a(a+b+c+d)}{(a+b)(a+c)}$

Roger, Tanimoto = $\frac{a+d}{a+2b+2c+d}$

Michael = $\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}$
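The counts a, b, c and d defined for the Match measures can be computed directly from two binary row profiles. The sketch below is only an illustration: the function names are hypothetical, and it covers three of the listed measures (Jaccard, Dice and Simple Matching) in their standard forms, without handling the case of a zero denominator.

```python
def match_counts(r, s):
    """Return (a, b, c, d) for two binary profiles of equal length."""
    a = sum(1 for x, y in zip(r, s) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(r, s) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(r, s) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(r, s) if x == 0 and y == 0)
    return a, b, c, d


def jaccard(r, s):
    a, b, c, _ = match_counts(r, s)
    return a / (a + b + c)  # joint presences over all presences


def dice(r, s):
    a, b, c, _ = match_counts(r, s)
    return 2 * a / (2 * a + b + c)  # joint presences counted twice


def simple_matching(r, s):
    a, b, c, d = match_counts(r, s)
    return (a + d) / (a + b + c + d)  # agreements over all positions


row_r = [1, 1, 0, 0, 1]
row_s = [1, 0, 1, 0, 1]
counts = match_counts(row_r, row_s)
```

Measures built only from a, b and c ignore joint absences, while Simple Matching also rewards positions where neither node has a tie, so the choice matters most in sparse networks.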

- Correlation

$C_{ik}$: k-th element of profile vector which represents subject i.

Pearson’s Correlation = $\frac{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)(C_{jk}-\bar{C}_j)}{\sqrt{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)^2}\,\sqrt{\sum_{k=1}^{n}(C_{jk}-\bar{C}_j)^2}}$

Cosine Similarity = $\frac{\sum_{k=1}^{n}C_{ik}C_{jk}}{\sqrt{\sum_{k=1}^{n}C_{ik}^2}\,\sqrt{\sum_{k=1}^{n}C_{jk}^2}}$

Inner Product = $\sum_{k=1}^{n}C_{ik}C_{jk}$

Spearman’s rho = $1-\frac{6\sum_{k=1}^{n}(C_{ik}-C_{jk})^2}{n(n^2-1)}$

- Distance

$C_{ik}$: k-th element of profile vector which represents subject i.

Euclidean Distance = $\{\sum_{k}(C_{ik}-C_{jk})^2\}^{1/2}$

City Block Metric = $\sum_{k}|C_{ik}-C_{jk}|$

Minkowski Metric = $\{\sum_{k}w_k|C_{ik}-C_{jk}|^{\lambda}\}^{1/\lambda}$

Canberra Metric = $\sum_{k}\frac{|C_{ik}-C_{jk}|}{(C_{ik}+C_{jk})}$

Bray-Curtis = $\frac{1}{p}\,\frac{\sum_{k}|C_{ik}-C_{jk}|}{\sum_{k}(C_{ik}+C_{jk})}$

Divergence = $\frac{1}{p}\sum_{k}\frac{(C_{ik}-C_{jk})^2}{(C_{ik}+C_{jk})^2}$

Soergel = $\frac{\sum_{k}|C_{ik}-C_{jk}|}{\sum_{k}\max(C_{ik},C_{jk})}$

Bhattacharyya Distance = $\{\sum_{k}(C_{ik}^{1/2}-C_{jk}^{1/2})^2\}^{1/2}$

Wave-Hedges = $\frac{1}{p}\sum_{k}\left(1-\frac{\min(C_{ik},C_{jk})}{\max(C_{ik},C_{jk})}\right)$
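A few of the correlation and distance measures can be sketched for two profile vectors in plain Python. The function names are hypothetical and the formulas are the standard textbook forms of Pearson's correlation, cosine similarity, Euclidean distance and the city block metric; the sketch does not handle constant or all-zero profiles, for which the two similarity measures are undefined.

```python
import math


def pearson(ci, cj):
    """Pearson's correlation of two profiles (undefined for constant profiles)."""
    n = len(ci)
    mean_i, mean_j = sum(ci) / n, sum(cj) / n
    cov = sum((x - mean_i) * (y - mean_j) for x, y in zip(ci, cj))
    var_i = sum((x - mean_i) ** 2 for x in ci)
    var_j = sum((y - mean_j) ** 2 for y in cj)
    return cov / math.sqrt(var_i * var_j)


def cosine(ci, cj):
    """Cosine similarity (undefined for all-zero profiles)."""
    dot = sum(x * y for x, y in zip(ci, cj))
    return dot / math.sqrt(sum(x * x for x in ci) * sum(y * y for y in cj))


def euclidean(ci, cj):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ci, cj)))


def city_block(ci, cj):
    return sum(abs(x - y) for x, y in zip(ci, cj))


# Tie profiles of two nodes towards four other nodes.
profile_i = [1, 0, 1, 1]
profile_j = [1, 1, 0, 1]
```

For these two profiles the similarity measures and the distance measures tell the same story in opposite directions: the profiles disagree in two of four positions, so the correlation is negative while both distances are greater than zero.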


 Post-process
Clustering Method
- Single: The distance between two clusters is determined by the distance between the two closest nodes (nearest neighbors) that belong to different clusters.

- Complete: The distance between two clusters is determined by the longest distance between any two nodes that belong to different clusters (i.e., by the "furthest neighbors").

- Average: The distance between two clusters is calculated as the average distance between all pairs of nodes in the two different clusters.

- Ward: This method differs somewhat from the previous three methods. The homogeneity of a given cluster C is appraised by the error sum of squares (ESS): the sum of squared deviations of the distance between each actor in C and each actor in the network from the mean distance between the actors in C and that actor in the network. In other words, if all nodes in C have the same distance to every node in the network, the ESS of C equals 0 because all nodes in C are homogeneous. Users need to be careful when this method is used. The criterion for fusion is that it should produce the smallest possible increase in the ESS. In addition, the Ward method tends to make the sizes of clusters similar.
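The first three linkage rules differ only in how they summarize the node-to-node distances between two clusters. The following sketch shows that difference; the function name `cluster_distance` and the small distance matrix are hypothetical, and Ward's method is left out because it relies on the ESS criterion rather than on a pairwise summary.

```python
def cluster_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters under single, complete or average linkage."""
    pairs = [dist[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":
        return min(pairs)  # two closest nodes
    if method == "complete":
        return max(pairs)  # two furthest nodes
    if method == "average":
        return sum(pairs) / len(pairs)  # mean over all cross-cluster pairs
    raise ValueError("unknown method: " + method)


# Symmetric distance matrix for four nodes.
dist = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]
left, right = [0, 1], [2, 3]
single = cluster_distance(dist, left, right, "single")
complete = cluster_distance(dist, left, right, "complete")
average = cluster_distance(dist, left, right, "average")
```

An agglomerative procedure would repeatedly merge the pair of clusters with the smallest such distance, so the choice of rule decides which clusters fuse first.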

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Structural

Equivalence(Profile)’ analysis, Main Report, Profile Matrix, Profile

Cluster Matrix, Permutation Vector, Dendrogram and MDS Map are

created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Structural Equivalence

 Tables
Structural Equivalence Profile Matrix

Structural Equivalence Profile Cluster Matrix


Permutation Vector

 Charts
Dendrogram

 Maps
MDS Map


Node and Link Styling Applied

 Inspect
Explore - Equivalence - Structural - Profile explores the Structural Equivalence Profile between two

selected nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Selecting Fusion Level: You can select Fusion level in consideration

of the Best Cut and the number of Clusters.

Select Cluster
Selecting a cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The available cluster list in the selection box is determined by the selection of fusion level in the

Select Level area.

The change of selected properties will be reflected on the network map just by clicking the Submit

button.

 Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node one by one changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:


- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Structural Equivalence Profile Value between two selected nodes

is represented in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the Node Label shown in the search result below the text box.

The change of the selected item is reflected on the network map by clicking the Submit button.

 Time Complexity
 O(n^3)

 Reference
 Lorrain, F and White, H. C. (1971). Structural equivalence of individuals in social networks.
Journal of Mathematical Sociology. 1, 49-80.

 Burt, R.S. (1976), Positions in networks. Social Forces. 55, 93-122.

 Related Topics


Analyze >> Equivalence >> Structural >> CONCOR

 Menu
Analyze >> Equivalence >> Structural >> CONCOR

 Description
CONCOR stands for CONvergence of iterated CORrelations. It is a procedure based on the observation that repeated calculation of correlations between the rows or columns of a matrix eventually results in a correlation matrix consisting only of +1’s and –1’s. This pattern partitions the nodes into two subsets. A hierarchical clustering result is obtained by repeating the procedure on each subset.
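One split of the procedure can be sketched in plain Python. This is an illustration under stated assumptions, not NetMiner's implementation: the function names are hypothetical, columns (incoming ties) are correlated with diagonal values retained, iteration stops when the largest absolute change of a correlation falls below `criteria`, and the two subsets are read from the signs in the first row of the final matrix. Constant columns, for which the correlation is undefined, are not handled.

```python
import math


def pearson(x, y):
    """Pearson's correlation (undefined for constant vectors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def column_correlations(matrix):
    """Correlate every pair of columns of a square matrix."""
    size = len(matrix)
    columns = [[matrix[r][c] for r in range(size)] for c in range(size)]
    return [[pearson(columns[i], columns[j]) for j in range(size)] for i in range(size)]


def concor_split(matrix, iterations=200, criteria=0.1):
    """Iterate correlations, then split the nodes by the sign pattern."""
    corr = column_correlations(matrix)
    for _ in range(iterations):
        updated = column_correlations(corr)
        change = max(
            abs(updated[i][j] - corr[i][j])
            for i in range(len(corr))
            for j in range(len(corr))
        )
        corr = updated
        if change < criteria:
            break
    first = [i for i in range(len(corr)) if corr[0][i] > 0]
    second = [i for i in range(len(corr)) if corr[0][i] <= 0]
    return first, second


# Nodes 0 and 1 send ties to nodes 2 and 3, and nodes 2 and 3 send ties back.
block = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
first, second = concor_split(block)
```

Applying `concor_split` again to the sub-matrix of each subset would give the next level of the hierarchy, which is how repeated splitting builds a dendrogram.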

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose many
1-mode Networks at once.

 Main process
Direction
In: Algorithm considers in-neighbor proximity.

Out: Algorithm considers out-neighbor proximity.

In And Out: Algorithm considers in-neighbor and out-neighbor

proximity.


# Iterations: default value = 200

Diagonal Handling Option


Ignore: comparison is performed excluding diagonal values

Reciprocal: replace the comparison of element X(i,i) with X(j,i) and X(i,j) with X(j,j) by the comparisons X(i,i) with X(j,j) and X(i,j) with X(j,i), respectively

Retain: comparison is performed including diagonal values

Maximum Depth of split: default value = 3. This decides the height of the dendrogram tree and the cluster diagram. Starting from one cluster, the procedure will eventually produce 2^(depth of split) clusters.

Convergence Criteria: default value = 0.1. During iteration, if the absolute change of the correlations is less than the convergence criterion, iteration stops.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Structural Equivalence(CONCOR)’ analysis, Main Report, CONCOR Matrix, CONCOR Cluster Matrix, Permutation Vector, Dendrogram and MDS Map are created.

 Outputs


Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Structural Equivalence

Structural Equivalence Cluster Diagram

 Tables
Structural Equivalence CONCOR Matrix

Structural Equivalence CONCOR Cluster Matrix


Permutation Vector

 Charts
Dendrogram

 Maps
MDS Map


Node and Link Styling Applied

 Inspect
Explore - Equivalence - Structural - CONCOR explores the Structural Equivalence CONCOR

between two selected nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Selecting Fusion Level. You can select Fusion level in consideration

of the Best Cut and the number of Clusters.

Select Cluster
Selecting a cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)


The available cluster list in the selection box is determined by the selection of fusion level in the

Select Level area.

The change of selected item is reflected on the network map just by clicking the

Submit button

<Example Screen shot>

 Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node one by one changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Structural Equivalence CONCOR Value between two selected nodes is represented in the text

box.


You can search for a node by typing part of its Node Label in the blank area. You then need to click the Node Label shown in the search result below the text box.

The change of selected item is reflected on the network map just by clicking the Submit button.

<Example Screen shot>

 Time Complexity
 O(n^3 * k) where k = # iterations

 Reference
 Breiger R, Boorman S and Arabie P (1975). An algorithm for clustering relational data, with
applications to social network analysis and comparison with multi-dimensional scaling. Journal

of Mathematical Psychology, 12, 328-383

 Related Topics


Analyze >> Equivalence >> Regular >> REGGE

 Menu
Analyze >> Equivalence >> Regular >> REGGE

 Description
This module is an implementation of the REGGE algorithm ("regular resemblance"). The algorithm uses an iterative procedure in which estimates of the degree of regular equivalence between pairs of nodes are adjusted in light of the equivalences of the alters adjacent to and from the members of each pair.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose many 1-
mode Networks at once.

 Main process
# Iterations : default value = 200

 Post-process
Clustering Method
- Single: The distance between two clusters is the distance between their two closest nodes (the "nearest neighbors"), one taken from each cluster.

- Complete: The distance between two clusters is the longest distance between any two nodes, one taken from each cluster (the "furthest neighbors").

- Average: The distance between two clusters is the average distance over all pairs of nodes, one taken from each cluster.

- Ward: This method differs from the previous three. The homogeneity of a cluster C is appraised by its error sum of squares (ESS): the sum, over each node in C and each node in the network, of the squared deviation of their distance from the mean distance between nodes in C and nodes in the network. If all nodes in C have the same distance to every node in the network, the ESS of C is 0, because all nodes in C are homogeneous. At each step, the two clusters whose fusion produces the smallest possible increase in the ESS are merged. Use this method with care: it tends to produce clusters of similar size.
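The Single, Complete and Average rules can be illustrated with a minimal Python sketch. It is not NetMiner's implementation; the function name `linkage_distance` and the 4-node distance matrix are hypothetical and chosen only for illustration.

```python
from itertools import product


def linkage_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters under single, complete or average linkage.

    dist is a square matrix (list of lists) of pairwise node distances;
    cluster_a and cluster_b are lists of node indices.
    """
    # Distances of all pairs with one node taken from each cluster.
    pair_distances = [dist[i][j] for i, j in product(cluster_a, cluster_b)]
    if method == "single":
        return min(pair_distances)      # nearest neighbors
    if method == "complete":
        return max(pair_distances)      # furthest neighbors
    if method == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown method: {method}")


# Hypothetical distances between four nodes (illustrative values only).
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]
a, b = [0, 1], [2, 3]
single = linkage_distance(dist, a, b, "single")
complete = linkage_distance(dist, a, b, "complete")
average = linkage_distance(dist, a, b, "average")
```

With these values the three rules give different distances between the same two clusters, which is why the choice of method changes the resulting dendrogram.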

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

The ‘Regular Equivalence(REGGE)’ analysis creates the Main Report, REGGE Matrix, REGGE Cluster Matrix, Permutation Vector, Dendrogram and MDS Map.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Regular Equivalence


 Tables
Regular Equivalence Cluster Diagram

Regular Equivalence Matrix

Regular Equivalence Cluster Matrix

Permutation Vector


 Charts
Dendrogram

 Maps
MDS Map

Node and Link Styling Applied


 Inspect
Explore - Equivalence - Regular- REGGE explores the Regular Equivalence REGGE between two

selected nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Select the Fusion Level. You can choose it in consideration of the Best Cut and the number of Clusters.

Select Cluster
Selecting a Cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The clusters available in the selection box are determined by the fusion level chosen in the Select Level area.

The change of the selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>


 Lookup Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node, one at a time, changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Regular Equivalence REGGE value between the two selected nodes is shown in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

The change of the selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>

 Time Complexity
 O(n^3 * k) where k = # iterations


 Reference
 Douglas R. White and Karl P. Reitz (1983), "Graph and semigroup homomorphisms on networks of
relations", Social Networks 5, pp 193-234

 Douglas R. White and Karl P. Reitz (1985), "Measuring Role Distance : Structural and
Relational Equivalence"

※ We would like to express special thanks to Professor Douglas R. White for his kind help to

implement this algorithm.

 Related Topics


Analyze >> Equivalence >> Regular >> CatRE

 Menu
Analyze >> Equivalence >> Regular >> CatRE

 Description
This module implements Stephen P. Borgatti and Martin G. Everett’s CatREGE and ExCatREGE algorithms. It computes a multiplexed matrix (whose elements are categorical values) and the equivalence groups (in effect, an equivalence matrix). The input matrix must contain categorical values, not continuous values.

 Process Flow


 User Options
 Input
1-mode Network: Select a 1-mode Network. You can choose many
1-mode Networks at once.

 Main process
Method: CatRE, ExCatRE

 Post-process
Clustering Method
- Single: The distance between two clusters is the distance between their two closest nodes (the "nearest neighbors"), one taken from each cluster.

- Complete: The distance between two clusters is the longest distance between any two nodes, one taken from each cluster (the "furthest neighbors").

- Average: The distance between two clusters is the average distance over all pairs of nodes, one taken from each cluster.

- Ward: This method differs from the previous three. The homogeneity of a cluster C is appraised by its error sum of squares (ESS): the sum, over each node in C and each node in the network, of the squared deviation of their distance from the mean distance between nodes in C and nodes in the network. If all nodes in C have the same distance to every node in the network, the ESS of C is 0, because all nodes in C are homogeneous. At each step, the two clusters whose fusion produces the smallest possible increase in the ESS are merged. Use this method with care: it tends to produce clusters of similar size.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

The ‘Regular Equivalence(CatRE)’ analysis creates the Main Report, CatRE Matrix, CatRE Cluster Matrix, Permutation Vector, Dendrogram and MDS Map.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Regular Equivalence

Regular Equivalence Cluster Diagram


 Tables
Regular Equivalence CatRE Matrix

Regular Equivalence CatRE Cluster Matrix

Permutation Vector

 Charts
Dendrogram


 Maps
MDS Map

Node and Link Styling Applied

 Inspect
Explore - Equivalence - Regular - CatRE explores the Regular Equivalence CatRE between two selected nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Select the Fusion Level. You can choose it in consideration of the Best Cut and the number of Clusters.

Select Cluster
Selecting a Cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The clusters available in the selection box are determined by the fusion level chosen in the Select Level area.

The change of the selected item is reflected on the network map when you click the Submit button.

 Lookup Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node, one at a time, changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Regular Equivalence CatRE value between the two selected nodes is shown in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

The change of the selected item is reflected on the network map when you click the Submit button.

 Time Complexity
 O(n^3) where K = # iterations

 Reference
 Martin G. Everett and Stephen P. Borgatti, 1993. Two algorithms for computing regular
equivalence, Social Networks 15, 361-376.

 Martin G. Everett and Stephen P. Borgatti, 1996. Exact colorations of graphs and digraphs,
Social Networks 18, 319-331.

 Related Topics


Analyze >> Equivalence >> Role >> Triad

 Menu
Analyze >> Equivalence >> Role >> Triad

 Description
This module analyzes the role-set structure of a network based on the similarity of triad patterns among its nodes. It implements Hummel and Sodeur's (1987) Role Equivalence. The adjacency matrix is dichotomized (Xi,j = 1 if Xi,j > 0, otherwise 0), and for each node the frequency of each of the 36 triad types is computed. Role equivalence is then measured, for every pair of nodes, by the Euclidean distance between their triad patterns. Subsequent hierarchical clustering of the role equivalence matrix gives the cluster diagram.
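The dichotomization rule and the distance step can be sketched in Python as follows. This is only an illustration under stated assumptions: the triad census itself is not implemented, and the frequency profiles below are hypothetical vectors truncated to four entries instead of the 36 triad types.

```python
import math


def dichotomize(matrix):
    """Apply the rule Xi,j = 1 if Xi,j > 0, otherwise 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two triad-type frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical weighted adjacency matrix.
weighted = [[0, 3, 0], [0.5, 0, 0], [2, 0, 0]]
binary = dichotomize(weighted)

# Hypothetical triad-type frequencies per node (4 types shown, not 36).
profiles = {"A": [3, 0, 1, 2], "B": [3, 0, 1, 2], "C": [0, 4, 1, 2]}
d_ab = euclidean_distance(profiles["A"], profiles["B"])
d_ac = euclidean_distance(profiles["A"], profiles["C"])
```

Nodes A and B have identical profiles, so their distance is 0 and they would be treated as role equivalent; A and C differ in two triad types and are further apart.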

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can select several 1-mode Networks at once.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

 Post-process
Clustering Method
- Single: The distance between two clusters is the distance between their two closest nodes (the "nearest neighbors"), one taken from each cluster.

- Complete: The distance between two clusters is the longest distance between any two nodes, one taken from each cluster (the "furthest neighbors").

- Average: The distance between two clusters is the average distance over all pairs of nodes, one taken from each cluster.

- Ward: This method differs from the previous three. The homogeneity of a cluster C is appraised by its error sum of squares (ESS): the sum, over each node in C and each node in the network, of the squared deviation of their distance from the mean distance between nodes in C and nodes in the network. If all nodes in C have the same distance to every node in the network, the ESS of C is 0, because all nodes in C are homogeneous. At each step, the two clusters whose fusion produces the smallest possible increase in the ESS are merged. Use this method with care: it tends to produce clusters of similar size.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Role

Equivalence(Triad)’ analysis, Main Report, Triad Role Matrix, Triad

Role Cluster Matrix, Permutation Vector, Dendrogram and MDS

Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Role Equivalence

Role Equivalence Cluster Diagram


 Tables
Triad Role Equivalence Matrix

Triad Role Equivalence Cluster Matrix

Permutation Vector

 Charts
Dendrogram


 Maps
MDS Map

 Inspect
Explore - Equivalence - Role - Triad explores the Role Equivalence Triad between two selected

nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Select the Fusion Level. You can choose it in consideration of the Best Cut and the number of Clusters.

Select Cluster
Selecting a Cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The clusters available in the selection box are determined by the fusion level chosen in the Select Level area.

The change of the selected item is reflected on the network map when you click the Submit button.

 Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node, one at a time, changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Role Equivalence Triad value between the two selected nodes is shown in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

The change of the selected item is reflected on the network map when you click the Submit button.

 Time Complexity
 O(n^3)

 Reference
 Hummel, H. J. and W. Sodeur (1987). Strukturbeschreibung von Positionen in sozialen
Beziehungsnetzen. Methoden der Netzwerkanalyse. F. U. Pappi. Munich, Oldenbourg.

 Related Topics


Analyze >> Equivalence >> Role >> Local

 Menu
Analyze >> Equivalence >> Role >> Local

 Description
This is Winship and Mandel's measure. Nodes i and j are more equivalent in terms of role the more similar the collection of ways in which actor i relates to others is to the collection of ways in which actor j relates to others.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can select several 1-mode Networks at once.

 Pre-process
Dichotomize: You should dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into unweighted/binary data.

 Main process
Maximum Length: Default value = 2

Duplicate Option
-Remove Duplicate Roles: Yes or No.

-Remove Duplicate Relation: Yes or No.

 Post-process
Clustering Method
- Single: The distance between two clusters is the distance between their two closest nodes (the "nearest neighbors"), one taken from each cluster.

- Complete: The distance between two clusters is the longest distance between any two nodes, one taken from each cluster (the "furthest neighbors").

- Average: The distance between two clusters is the average distance over all pairs of nodes, one taken from each cluster.

- Ward: This method differs from the previous three. The homogeneity of a cluster C is appraised by its error sum of squares (ESS): the sum, over each node in C and each node in the network, of the squared deviation of their distance from the mean distance between nodes in C and nodes in the network. If all nodes in C have the same distance to every node in the network, the ESS of C is 0, because all nodes in C are homogeneous. At each step, the two clusters whose fusion produces the smallest possible increase in the ESS are merged. Use this method with care: it tends to produce clusters of similar size.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Role

Equivalence(Local)’ analysis, Main Report, Local Role Matrix, Local

Role Cluster Matrix, Permutation Vector, Dendrogram and MDS Map

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of Role Equivalence

Role Equivalence Cluster Diagram


 Tables
Local Role Equivalence Matrix

Local Role Equivalence Cluster Matrix

Permutation Vector

 Charts
Dendrogram


 Maps

Node and Link Styling Applied

 Inspect
Explore - Equivalence - Role - Local explores the Role Equivalence Local between two selected

nodes and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Select the Fusion Level. You can choose it in consideration of the Best Cut and the number of Clusters.

Select Cluster
Selecting a Cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The clusters available in the selection box are determined by the fusion level chosen in the Select Level area.

The change of the selected item is reflected on the network map when you click the Submit button.

 Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node, one at a time, changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The Role Equivalence Local value between the two selected nodes is shown in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

The change of the selected item is reflected on the network map when you click the Submit button.

 Time Complexity
 O(n^2)

 Reference
 Winship, C., and Mandel, M. (1983). Roles and Positions: A critique and extension of the
blockmodeling approach. In Leinhardt, S. (ed.), Sociological Methodology 1983-1984, pages 314-344.
San Francisco: Jossey-Bass.

 Related Topics


Analyze >> Equivalence >> SimRank

 Menu
Analyze >> Equivalence >> SimRank

 Description
This module implements Glen Jeh and Jennifer Widom’s SimRank. Its basic idea is that two nodes are similar if they are referenced by similar nodes.

A SimRank score is calculated for each pair of nodes whose distance from each other is less than 2 times the user-specified number of iterations.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. You can select several 1-mode Networks at once.

 Main process
Direction: In, Out

# Iterations: default value = 3.

Dampening Parameter
The default value is 0.8. The SimRank algorithm is computed iteratively, after the same initial score has been assigned to each node pair. In each step, every node pair is assigned a new SimRank score, which is [dampening parameter * aggregated SimRank score of the ‘pairs of neighbors of the two nodes’].
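The iteration can be sketched in Python as below. This is not NetMiner's code: it assumes the formulation in Jeh and Widom's report, in which a node is fully similar to itself (score 1), all other pairs start at 0, and the aggregation is the average over all pairs of in-neighbors. The function name `simrank` and the three-node example are hypothetical.

```python
def simrank(in_neighbors, iterations=3, damping=0.8):
    """Iterative SimRank over in-neighbors (corresponds to Direction: In).

    in_neighbors maps every node to the list of nodes that link to it.
    """
    nodes = list(in_neighbors)
    # Assumed initialization: 1 for identical nodes, 0 for all other pairs.
    scores = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        updated = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[a][b] = 1.0
                    continue
                na, nb = in_neighbors[a], in_neighbors[b]
                if not na or not nb:
                    # A node without in-neighbors is similar to no other node.
                    updated[a][b] = 0.0
                    continue
                # Average score of all neighbor pairs, scaled by the damping.
                total = sum(scores[i][j] for i in na for j in nb)
                updated[a][b] = damping * total / (len(na) * len(nb))
        scores = updated
    return scores


# Hypothetical network: "u" links to both "x" and "y".
in_neighbors = {"u": [], "x": ["u"], "y": ["u"]}
scores = simrank(in_neighbors)
```

Here "x" and "y" are referenced by the same node, so their score equals the dampening parameter, while "u" has no in-neighbors and scores 0 against every other node.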

 Post-process
Clustering Method
- Single: The distance between two clusters is the distance between their two closest nodes (the "nearest neighbors"), one taken from each cluster.

- Complete: The distance between two clusters is the longest distance between any two nodes, one taken from each cluster (the "furthest neighbors").

- Average: The distance between two clusters is the average distance over all pairs of nodes, one taken from each cluster.

- Ward: This method differs from the previous three. The homogeneity of a cluster C is appraised by its error sum of squares (ESS): the sum, over each node in C and each node in the network, of the squared deviation of their distance from the mean distance between nodes in C and nodes in the network. If all nodes in C have the same distance to every node in the network, the ESS of C is 0, because all nodes in C are homogeneous. At each step, the two clusters whose fusion produces the smallest possible increase in the ESS are merged. Use this method with care: it tends to produce clusters of similar size.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘SimRank’ analysis,

Main Report, SimRank Equivalence Matrix, SimRank Cluster Matrix,

Permutation Vector, Dendrogram and MDS Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

Distribution (Sum, Mean, Std. Dev, Min, Max) of SimRank Equivalence

SimRank Equivalence Cluster Diagram


 Tables
SimRank Equivalence Matrix

SimRank Cluster Matrix

Permutation Vector

 Charts
Dendrogram


 Maps
MDS Map

Node and Link Styling Applied

 Inspect
Explore - Equivalence - SimRank explores the SimRank Equivalence between two selected nodes

and the clusters of nodes according to the selected fusion level.

 Cluster
Select Level
Select the Fusion Level. You can choose it in consideration of the Best Cut and the number of Clusters.

Select Cluster
Selecting a Cluster changes the style of its nodes on the network map to the node style pre-established in the global option, as follows:

Nodes of Selected Cluster: Subset Membership - Subset Member Node(s)

Nodes of Non-selected Cluster: Subset Membership - Subset Non-member Node(s)

The clusters available in the selection box are determined by the fusion level chosen in the Select Level area.

The change of the selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>

 Equivalence
Two Nodes Selection
Selecting a Source Node and a Target Node, one at a time, changes the style of the two matching nodes on the network map to the node style pre-established in the global option, as follows:

- Source Node: Focus Pair - 1st Node

- Target Node: Focus Pair - 2nd Node

The SimRank Equivalence value between the two selected nodes is shown in the text box.

You can search for a node by typing part of its Node Label in the blank area. You then need to click the matching Node Label in the search result shown below the text box.

The change of the selected item is reflected on the network map when you click the Submit button.

<Example Screen shot>

 Time Complexity
 O(n^2 * k * d^2) where k is # iterations and d is the average degree.

 Reference
 Jeh, Glen; Widom, Jennifer. SimRank: A Measure of Structural-Context similarity. Technical
Report, Computer Science Department, Stanford University, 2001

 Related Topics


Analyze >> Position >> Blockmodel (Conventional)

 Menu
Analyze >> Position >> Blockmodel (Conventional)

 Description
A blockmodel consists of two components: 1) a partition of the nodes into positions, and 2) for each pair of positions, a statement of the presence or absence of ties within or between them. Blockmodeling can be thought of as “blocking nodes” and “making a new sociorelation (1-mode Network) of blocks”. A Main Node Attribute is used to define the blocks.

 Process Flow


 User Options

 Input

1-mode Network: Select a 1-mode Network. You can select only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

Select Vector: Select the Main Node Attribute data. It is used to make the blocks.

 Pre-process
Dichotomize: You can dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into non-weighted/binary data.

 Main process
Goodness of fit Index: You can confirm the goodness of fit index of
the blockmodel. NetMiner provides six indices.

Notations
$x_{ij}$: the (i, j) element of the input 1-mode network; 0 or 1

$g$: the number of nodes

$g_k$: the number of nodes in group k

$g_{kl} = \begin{cases} g_k g_l & \text{if } k \neq l \\ g_k (g_k - 1) & \text{if } k = l \end{cases}$

$b_{kl}$: the (k, l) element of the image matrix; 0 or 1

$\Delta_{kl} = \dfrac{\sum_{i \in \text{group } k} \sum_{j \in \text{group } l} x_{ij}}{g_k g_l}$, where $k \neq l$

$\Delta_{kk} = \dfrac{\sum_{i \in \text{group } k} \sum_{j \in \text{group } k} x_{ij}}{g_k (g_k - 1)}$

$\alpha$: threshold density (the density of the input 1-mode Network is commonly used)

$t_{kl} = \begin{cases} 1 & \text{if } \Delta_{kl} < \alpha \\ \dfrac{1 - \alpha}{\alpha} & \text{otherwise} \end{cases}$

$o_{kl} = g_{kl} \Delta_{kl}$: number of 1's in the (k, l) block

$o^{*}_{kl} = g_{kl} \alpha$: expected number of 1's in the (k, l) block

$x^{(t)}_{ij} = b_{kl}$, where k is the group of i and l is the group of j

$o_{xx^{(t)}}$: number of entries that equal 1 in both $x$ and $x^{(t)}$

$o_{x}$: number of entries in $x$ that equal 1
$o_{x^{(t)}}$: number of entries in $x^{(t)}$ that equal 1
$z_{x}$: number of entries in $x$ that equal 0
$z_{x^{(t)}}$: number of entries in $x^{(t)}$ that equal 0

Goodness of Fit Indices

1. City Block (density) $= \sum_{k} \sum_{l} \left| b_{kl} - \Delta_{kl} \right|$

2. Max. Chi-squared statistics $= \sum_{k} \sum_{l} \dfrac{(\Delta_{kl} - \alpha)^2}{(\alpha t_{kl})^2} \cdot \dfrac{g_{kl}}{g(g-1)} = \dfrac{1}{g(g-1)} \sum_{k} \sum_{l} g_{kl} \dfrac{(o_{kl} - o^{*}_{kl})^2}{(o^{*}_{kl} t_{kl})^2}$

3. City Block (adjacency) $= \sum_{i=1}^{g} \sum_{j=1}^{g} \left| x_{ij} - x^{(t)}_{ij} \right|$

4. Match Coefficient $= 1 - \dfrac{\sum_{i=1}^{g} \sum_{j=1}^{g} \left| x_{ij} - x^{(t)}_{ij} \right|}{g(g-1)}$

5. Matrix Correlation = the Pearson correlation between $x_{ij}$ and $x^{(t)}_{ij}$ $= \dfrac{g(g-1)\, o_{xx^{(t)}} - o_{x}\, o_{x^{(t)}}}{\sqrt{o_{x}\, o_{x^{(t)}}\, z_{x}\, z_{x^{(t)}}}}$

6. Coefficient of Identity $= \dfrac{2 \sum_{i=1}^{g} \sum_{j=1}^{g} x_{ij} x^{(t)}_{ij}}{\sum_{i=1}^{g} \sum_{j=1}^{g} \left[ x_{ij}^{2} + (x^{(t)}_{ij})^{2} \right]}$

# of Iterations: To assess the significance of the observed goodness of fit index, this module builds a distribution of goodness of fit indices by randomly permuting the given node attribute vector. The permutation is repeated as many times as the number of iterations.
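The permutation procedure can be sketched in Python using the Match Coefficient as the index. This is an illustration, not NetMiner's implementation. It assumes that the image matrix is refitted for every permuted vector using the network density as threshold, and that diagonal cells are excluded. The function names and the 4-node network are hypothetical.

```python
import random


def fit_image(adj, groups):
    """Image matrix: 1 where a block's density reaches the network density."""
    g = len(adj)
    alpha = sum(map(sum, adj)) / (g * (g - 1))
    labels = sorted(set(groups))
    image = {}
    for k in labels:
        for l in labels:
            cells = [(i, j) for i in range(g) for j in range(g)
                     if i != j and groups[i] == k and groups[j] == l]
            dens = sum(adj[i][j] for i, j in cells) / len(cells) if cells else 0.0
            image[(k, l)] = 1 if dens >= alpha else 0
    return image


def match_coefficient(adj, groups):
    """1 minus the share of off-diagonal cells where x and x(t) disagree."""
    g = len(adj)
    image = fit_image(adj, groups)
    mismatches = sum(abs(adj[i][j] - image[(groups[i], groups[j])])
                     for i in range(g) for j in range(g) if i != j)
    return 1 - mismatches / (g * (g - 1))


def permutation_test(adj, groups, iterations=200, seed=0):
    """Return the observed index and the share of permutations with index >= observed."""
    rng = random.Random(seed)
    observed = match_coefficient(adj, groups)
    shuffled = list(groups)
    at_least = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if match_coefficient(adj, shuffled) >= observed:
            at_least += 1
    return observed, at_least / iterations


# Hypothetical binary network with two positions "A" and "B".
adj = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
groups = ["A", "A", "B", "B"]
observed, p_at_least = permutation_test(adj, groups)
```

The second return value plays the role of the "P >= obs" figure in the Main Report: a small value suggests the given partition fits better than most random assignments.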

Dichotomize for Image Matrix: The image matrix is an abstraction of the links between positions. The module first computes the density of the links between each pair of positions, then removes the links between positions whose density is lower than a specified threshold. The density of the input network is the default threshold, but users can specify another threshold value.
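The two steps, block densities followed by thresholding, can be sketched in Python. This is a minimal illustration rather than NetMiner's code. The function names and the example network are hypothetical, and diagonal cells are assumed to be excluded, in line with the g_k(g_k - 1) denominator used for within-position blocks.

```python
def block_density(adj, groups):
    """Density of links within and between positions.

    adj is a binary square matrix; groups gives the position label of each node.
    """
    labels = sorted(set(groups))
    members = {k: [i for i, grp in enumerate(groups) if grp == k] for k in labels}
    density = {}
    for k in labels:
        for l in labels:
            ties = sum(adj[i][j] for i in members[k] for j in members[l] if i != j)
            gk, gl = len(members[k]), len(members[l])
            cells = gk * (gk - 1) if k == l else gk * gl
            density[(k, l)] = ties / cells if cells else 0.0
    return density


def image_matrix(density, threshold):
    """Keep a link between positions only if its density is not below the threshold."""
    return {pair: 1 if value >= threshold else 0 for pair, value in density.items()}


# Hypothetical binary network with two positions "A" and "B".
adj = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
groups = ["A", "A", "B", "B"]
g = len(adj)
network_density = sum(map(sum, adj)) / (g * (g - 1))  # default threshold
density = block_density(adj, groups)
image = image_matrix(density, network_density)
```

In this example only the two within-position blocks survive the default threshold, so the image matrix has 1's on its diagonal and 0's elsewhere.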

 Post-process


In Post-process stage, you can handle the image matrix created from the Main process.

Dichotomize: You should dichotomize your image matrix. By dichotomizing, weighted/valued data
is transformed to non-weighted/binary data.

Role Typology Threshold:

Proportion of ties    Proportion of ties      Proportion of ties within position
sent by position      received by position    > expected value      ≤ expected value
>0                    >0                      Primary Position      Broker
>0                    ~0                      Low Status Clique     Sycophant
~0                    >0                      High Status Clique    Snob
~0                    ~0                      Isolated Clique       Isolate
For each position (a set of collapsed nodes), the ‘role typology decision’ assigns a role by the rules below. In these rules, "sent" is the weighted degree sum of the position’s out links (the links whose source node is that position) to other positions, "received" is the weighted degree sum of the position’s in links (the links whose target node is that position) from other positions, and "within" is the weighted degree sum of the links from the position to itself. The ‘expected value’ is the density of the selected layer.

1. If sent is greater than the ‘sending threshold’,

1.1. If received is greater than the ‘receiving threshold’,

1.1.1. If within is greater than the ‘expected value’, the role of the position is ‘Primary Position’.

1.1.2. If within is smaller than or equal to the ‘expected value’, the role of the position is ‘Broker’.

1.2. If received is smaller than or equal to the ‘receiving threshold’,

1.2.1. If within is greater than the ‘expected value’, the role of the position is ‘Low Status Clique’.

1.2.2. If within is smaller than or equal to the ‘expected value’, the role of the position is ‘Sycophant’.

2. If sent is smaller than or equal to the ‘sending threshold’,

2.1. If received is greater than the ‘receiving threshold’,

2.1.1. If within is greater than the ‘expected value’, the role of the position is ‘High Status Clique’.

2.1.2. If within is smaller than or equal to the ‘expected value’, the role of the position is ‘Snob’.

2.2. If received is smaller than or equal to the ‘receiving threshold’,

2.2.1. If within is greater than the ‘expected value’, the role of the position is ‘Isolated Clique’.

2.2.2. If within is smaller than or equal to the ‘expected value’, the role of the position is ‘Isolate’.
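The decision rules amount to three yes/no comparisons, which can be sketched as a lookup in Python. The function and parameter names are hypothetical, and the numeric inputs in the example are illustrative only.

```python
def role_of_position(sent, received, within,
                     sending_threshold, receiving_threshold, expected):
    """Assign a role from the three comparisons used by the decision rules."""
    sends = sent > sending_threshold
    receives = received > receiving_threshold
    cohesive = within > expected
    table = {
        (True, True, True): "Primary Position",
        (True, True, False): "Broker",
        (True, False, True): "Low Status Clique",
        (True, False, False): "Sycophant",
        (False, True, True): "High Status Clique",
        (False, True, False): "Snob",
        (False, False, True): "Isolated Clique",
        (False, False, False): "Isolate",
    }
    return table[(sends, receives, cohesive)]


# Hypothetical position: sends and receives ties, but is sparse internally.
role = role_of_position(sent=0.4, received=0.3, within=0.1,
                        sending_threshold=0.0, receiving_threshold=0.0,
                        expected=0.25)
```

Because each comparison is strict ("greater than"), a value exactly equal to its threshold falls into the "smaller than or equal to" branch.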

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of

‘Blockmodel(Conventional)’ analysis, Main Report, Block Image

Matrix, Block Density Matrix, Block Sum Matrix, Block-Node

Affiliation Matrix, # nodes, Block Role Typology and Clustered Map

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.


 Reports
Main Report
Goodness of fit Index

- Observed: Goodness of fit Index which is observed in the given matrix.

- Expected: Goodness of fit Index which is expected.

- P >= obs, P < obs: It shows the distribution of Goodness of fit Indices.

 Tables
Block Image Matrix
The Block Image Matrix is the dichotomized Block Density Matrix. A ‘1’ in a cell represents the presence of a link between the two positions, and a ‘0’ represents its absence.

Block Density Matrix

It is a 1-mode matrix of positions. Each cell holds the normalized number of links (the density) between the two positions.

Block Sum Matrix

It is a 1-mode matrix of positions. Each cell holds the number of links between the two positions.


Block-Node Affiliation Matrix


It is a 2-mode matrix whose main nodes are positions, and sub nodes are main nodes of input

network. It shows affiliations of nodes with positions.

# Nodes
This vector shows the number of nodes that each position contains.

Block Role Typology


This vector shows the role of each position.


 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. Nodes are clustered by

positions (selected vector value).

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
You can see the position on the map.

 Find
After a Position is selected in the combo box, the style of nodes on

the map is changed as pre-established style in the global option.

Corresponding global option is as follow.

Nodes of selected position: Node >> Subset Membership >> Subset Member Node(s)

Nodes of non-selected position: Node >> Subset Membership >> Subset Non-member Node(s)

<Example Screen shot>


 Time Complexity

 Reference

 Related Topics


Analyze >> Position >> Brokerage

 Menu
Analyze >> Position >> Brokerage

 Description
This module computes Gould & Fernandez's brokerage measure. Given a 1-mode Network and a partition vector, it examines every triad and the role each node plays in it. For each node, it counts the number of times the node is involved in each of five kinds of brokerage relationship (Coordinator, Gatekeeper, Representative, Itinerant and Liaison). With these counts, you can check the role of each node in the input network.

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can select only one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

Partition Vector: Select a main node attribute to make partitions.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Brokerage’ analysis,

Main Report, Brokerage Table, Clustered Map and Concentric Map

are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of brokerage: Mean, Standard deviation, Minimum and Maximum of brokerage scores are reported.

 Tables
Brokerage Table
Brokerage scores for each node are presented.

- Partition Value: Partition value of each node is presented.

- Coordinator: If node ‘a’ receives a link from node ‘b’ in the same partition and sends a link to node ‘c’ in the same partition, 1 is added to the Coordinator score of node ‘a’.

- Gatekeeper: If node ‘a’ receives a link from node ‘b’ in a different partition and sends a link to node ‘c’ in the same partition as ‘a’, 1 is added to the Gatekeeper score of node ‘a’.

- Representative: If node ‘a’ receives a link from node ‘b’ in the same partition and sends a link to node ‘c’ in a different partition, 1 is added to the Representative score of node ‘a’.


- Itinerant (Consultant): If node ‘a’ receives a link from node ‘b’ in a different partition and sends a link to node ‘c’ in that same partition (the partition of ‘b’), 1 is added to the Itinerant score of node ‘a’.

- Liaison: If node ‘a’ receives a link from node ‘b’ in a different partition and sends a link to node ‘c’ in a third partition (different from those of both ‘a’ and ‘b’), 1 is added to the Liaison score of node ‘a’.
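The five scoring rules above can be sketched in Python as follows. This is an illustrative sketch only, not NetMiner's implementation; the function names and the `skip_closed` flag are hypothetical. The flag reflects the original Gould & Fernandez definition, which counts a path b -> a -> c only when there is no direct link b -> c; whether NetMiner applies that condition is not stated here, so it is off by default.

```python
from collections import defaultdict

ROLES = ("Coordinator", "Gatekeeper", "Representative", "Itinerant", "Liaison")


def brokerage_role(g_broker, g_source, g_target):
    """Classify one path source -> broker -> target by partition values."""
    if g_source == g_broker and g_target == g_broker:
        return "Coordinator"
    if g_source != g_broker and g_target == g_broker:
        return "Gatekeeper"
    if g_source == g_broker and g_target != g_broker:
        return "Representative"
    if g_source == g_target:
        return "Itinerant"  # both outsiders belong to the same partition
    return "Liaison"        # three different partitions


def brokerage_scores(links, partition, skip_closed=False):
    """Count brokerage roles per node.

    links: iterable of (source, target) pairs of a directed network.
    partition: dict mapping every node to its partition value.
    skip_closed: hypothetical option; if True, ignore b -> a -> c
                 when a direct link b -> c exists.
    """
    link_set = {(s, t) for s, t in links if s != t}
    preds, succs = defaultdict(set), defaultdict(set)
    for s, t in link_set:
        succs[s].add(t)
        preds[t].add(s)
    scores = {v: dict.fromkeys(ROLES, 0) for v in partition}
    for a in partition:
        for b in preds[a]:
            for c in succs[a]:
                if b == c:
                    continue
                if skip_closed and (b, c) in link_set:
                    continue
                role = brokerage_role(partition[a], partition[b], partition[c])
                scores[a][role] += 1
    return scores


partition = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
links = [("b", "a"), ("a", "c"), ("d", "a"), ("a", "e")]
print(brokerage_scores(links, partition)["a"])
```

In this example node ‘a’ brokers four paths: b -> a -> c (Coordinator), b -> a -> e (Representative), d -> a -> c (Gatekeeper) and d -> a -> e (Liaison).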

 Maps

Clustered Map
- Default Layout: A map is drawn by Clustered >> Clustered-CoLa algorithm. The higher brokerage

score a node has, the bigger it is presented on the map.

- Default style: Default style is set by Common option in the Preference >> Node tab.

Concentric Map
- Default Layout: A map is drawn by Circular >> Concentric algorithm. The higher brokerage score a

node has, the closer to center it is arranged.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This function explores the Brokerage Level of each node according to the Brokerage Type on the

concentric or radial map.

 Brokerage (Clustered Map)

Group
Nodes of the selected group and the other nodes are restyled according to the preset styles in Preference. The nodes and links of the selected group are also automatically selected on the map.


Measure
The size of a node is proportional to its brokerage score. After a Brokerage Type is selected in this item, node sizes change according to the selected Brokerage Type.

 Brokerage (Concentric Map)

Group
Nodes of the Group selected in the Group combo box are included in the concentric or radial map, and nodes of unselected Groups are moved to the corner of the map.

Measure
Selecting a Brokerage Type with the Brokerage Type radio button changes the position of each node according to the Brokerage Level determined by the selected Brokerage Type.

A change of the selected item is reflected on the network map simply by clicking the Submit button.


<Example Screen shot>

 Time Complexity
 O(n^3)

 Reference
 Gould, R. V. and Fernandez, R. M. 1989. Structures of mediation: A formal approach to brokerage in transaction networks. Sociological Methodology 19: 89-126.

 Related Topics


Analyze >> Position >> Bow-Tie Model

 Menu
Analyze >> Position >> Bow-Tie Model

 Description
The bow-tie model is useful for analyzing positional and directional information with respect to the structural properties of each component. Generally, a directed graph with a bow-tie structure consists of several parts. The first is a giant strongly connected component (GSCC), which forms the core of the network. The second is GIN, which consists of nodes from which the GSCC can be reached but which cannot be reached from the GSCC. The third is GOUT, whose nodes can be reached from the GSCC but from which the GSCC cannot be reached. The fourth and fifth are TENDRIL and TUBE; nodes in these parts can neither reach the GSCC nor be reached from it. A TUBE has links with both GIN and GOUT.
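The decomposition described above can be sketched in Python with plain breadth-first reachability. This is an illustrative sketch, not NetMiner's algorithm: the function names are hypothetical, the TENDRIL/TUBE/OTHER rules are one reasonable reading of the description (a TUBE node is reachable from GIN and reaches GOUT, a TENDRIL node satisfies only one of those, OTHER is disconnected from the bow-tie), and the simple SCC search here is slower than the O(m+n) a dedicated algorithm achieves.

```python
from collections import defaultdict, deque


def reach(starts, adj):
    """Return all nodes reachable from `starts` (inclusive) following `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def bow_tie(nodes, links):
    """Assign each node one of GSCC, GIN, GOUT, TUBE, TENDRIL, OTHER."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for s, t in links:
        fwd[s].add(t)
        bwd[t].add(s)

    # Largest strongly connected component: SCC(v) = forward(v) & backward(v).
    gscc, assigned = set(), set()
    for v in nodes:
        if v in assigned:
            continue
        scc = reach([v], fwd) & reach([v], bwd)
        assigned |= scc
        if len(scc) > len(gscc):
            gscc = scc

    g_out = reach(gscc, fwd) - gscc   # reachable from the core
    g_in = reach(gscc, bwd) - gscc    # can reach the core
    from_in = reach(g_in, fwd)        # everything GIN can reach
    to_out = reach(g_out, bwd)        # everything that reaches GOUT

    types = {}
    for v in nodes:
        if v in gscc:
            types[v] = "GSCC"
        elif v in g_in:
            types[v] = "GIN"
        elif v in g_out:
            types[v] = "GOUT"
        elif v in from_in and v in to_out:
            types[v] = "TUBE"
        elif v in from_in or v in to_out:
            types[v] = "TENDRIL"
        else:
            types[v] = "OTHER"
    return types


links = [(1, 2), (2, 3), (3, 1),   # core cycle
         (0, 1), (3, 4),           # GIN -> core -> GOUT
         (0, 5), (5, 4),           # tube bypassing the core
         (0, 6), (7, 4)]           # tendrils
print(bow_tie(range(9), links))
```

Node 8 has no links at all, so it falls outside the bow-tie in this sketch.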

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just


one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Bow-tie model’ analysis, Main Report, Type Partition Table, Type Statistics Table,

Type Matrix Table, Bow-Tie Map and Spring Map are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Type Partition Table
The table shows the type of each node.

Type Statistics Table


The table shows the number of nodes and the proportion for each type.


Type Matrix Table


The matrix shows the relations between groups.

Each group is the set of nodes of the same type.

 Chart
Bow-Tie Map Chart

 Maps
Spring Map


- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
Select Type
The selected type of nodes will be represented on the network map.

<Example Screen shot>

 Time Complexity
 O(m+n)


 Reference
 A. Broder et al., 2000. Graph structure in the web. Computer Networks 33, 309-320

 Related Topics


Analyze >> Position >> Expand/Collapse

 Menu
Analyze >> Position >> Expand/Collapse

 Description
This module constructs a Blockmodel based on the Tree Structure of nodes. If a user selects a Tree Dataset and expands or collapses the Tree Hierarchy in the 'Input' Control Item, a Block Matrix is generated. In addition, visualization outputs such as a Matrix Diagram and a Spring Map are generated.

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just one 1-mode Network.
- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

Select Tree: Select a Tree Dataset among the Sub Nodesets and 2-mode Networks. The Tree Hierarchy is presented below if the selected 2-mode Network is a valid Tree Dataset. The analysis is performed at the expanded depth of the Tree Hierarchy.

- Nodeset: select a Sub Nodeset which represents Tree Nodeset.

- Tree: select a 2-mode Network which represents Inclusion Relationship.

- Tree Hierarchy: Decide which level of Tree Hierarchy should be presented. The analysis is done

regarding the expanded depth of Tree Hierarchy.


- Expand to level: The Tree Hierarchy is presented at a selected level

(depth).

- Expand All: The Tree Hierarchy is presented at the largest level.

- Collapse All: The Tree Hierarchy is presented at level 1.

Common Attributes: The attributes selected here are saved as attributes of the result nodes. For example, if the 'Team' attribute is selected here, the Tree Node 'Advertising (section)' will have 'Marketing' as the value of its 'Team' attribute. If other nodes are expanded to a deeper level, those nodes (for example, Steven) also have the 'Team' attribute value.

 Pre-process

Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.

Dichotomize: You can dichotomize your data before running the module. By dichotomizing, weighted/valued data is transformed to non-weighted/binary data.

 Main process
Block Link: You can define how the link weight values in the result matrix are computed.
- Block Sum: The link weight values in the result matrix are Block Sums.

- Block Density: The link weight values in the result matrix are Block Densities.


 Post-process
Cut-off: You can decide the cut-off value in this option. If a link
weight of the result matrix is smaller than the cut-off value, that link is

removed from the result.

- Dichotomized Option: If this option is checked, the result matrix

which is generated after the cut-off process becomes a binary matrix.
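The Block Link and Cut-off options can be sketched in Python as below. This is a minimal illustration under stated assumptions, not NetMiner's code: the function names are hypothetical, Block Density is assumed to be the block sum divided by the number of possible cells (excluding diagonal cells when the diagonal is ignored), and a link is kept when its weight is greater than or equal to the cut-off.

```python
def block_matrix(adj, block_of, mode="sum", ignore_diagonal=True):
    """Collapse a node-by-node matrix into a block-by-block matrix.

    adj: square list of lists with link weights.
    block_of: list giving the block label of each node index.
    mode: "sum" for Block Sum, "density" for Block Density.
    """
    blocks = sorted(set(block_of))
    size = {b: block_of.count(b) for b in blocks}
    total = {(a, b): 0.0 for a in blocks for b in blocks}
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if ignore_diagonal and i == j:
                continue
            total[(block_of[i], block_of[j])] += adj[i][j]
    if mode == "sum":
        return total
    density = {}
    for (a, b), value in total.items():
        cells = size[a] * size[b]
        if a == b and ignore_diagonal:
            cells -= size[a]  # diagonal cells are not possible links
        density[(a, b)] = value / cells if cells else 0.0
    return density


def apply_cutoff(block_links, cutoff, dichotomize=False):
    """Drop block links whose weight is smaller than the cut-off value."""
    kept = {k: v for k, v in block_links.items() if v >= cutoff}
    return {k: (1 if dichotomize else v) for k, v in kept.items()}


adj = [[0, 1, 2, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 3],
       [1, 0, 0, 0]]
block_of = ["X", "X", "Y", "Y"]
sums = block_matrix(adj, block_of, mode="sum")
print(sums)
print(block_matrix(adj, block_of, mode="density"))
print(apply_cutoff(sums, cutoff=2, dichotomize=True))
```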

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Expand/Collapse’

analysis, Main Report, Block Matrix, Block Node List, Main Node

Type, Matrix Diagram and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Block Matrix
1-mode Network of Block Nodes is generated. The Block Nodes are defined in the Tree Hierarchy of

'Input' Control Item.

Block Node List


The node list of the result matrix is presented.

- Node Type: If the Block Node is Main Node, the Node Type is presented as "Main".

- Parent: This vector shows the 'Parent-Child' relationship or the Inclusion Relationship of the Tree

Hierarchy.

- Containing Node: The number of nodes which are included in the Block Node.


- Common Attributes: Selected Common Attributes are presented.

Main Node Type


You can see that the Main Nodes are collapsed in the result Block Matrix. Block Node name is

presented in the Node Type vector for each Main Node.

Matrix Diagram
The Matrix Diagram which represents the Block Matrix is generated.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: The size of a node represents the number of Containing Nodes. Nodes with the same color have the same parent.

 Inspect
You can see the position on the map.

 Node Color

 Node Size

 Link Thickness

 Link Threshold

 Time Complexity

 Reference

 Related Topics


 Transform >> Mode >> Tree Construction

 Using NetMiner >> Concept >> Data Structure >> Data Item >> Representing Tree Structure


Analyze >> Properties >> Network >> Multiple

 Menu
Analyze >> Properties >> Network >> Multiple

 Description
Various network properties are presented.

 User Options

 Input
1-mode Network: Select a 1-mode Network. Multiple networks can
be chosen at once by checking “Select All” check box.

 Pre-process
Dichotomize: Before running module, your data can be dichotomized
such that weighted/valued data is transformed to non-weighted/binary

data.

 Main process
Number of Links: to compute the number of links.

Density: Density is the proportion of lines that are actually present in


the network. It is the ratio of the number of lines present to the number

of the maximum possible lines.

Average Degree: The average degree over all nodes.

# of Components (Weak): A Weak Component is a maximal subgraph in which each pair of nodes is connected by a semi-path.


# of Components (Strong): A Strong Component is a maximal subgraph in which each pair of nodes is connected by a path in both directions.

Inclusiveness: The number of connected nodes expressed as a proportion of the total number of
nodes. Connected nodes mean the nodes except isolates. (Inclusiveness = number of connected

nodes/ number of nodes)

Reciprocity (Arc Method): the ratio of (the number of links which are the part of reciprocated
relations) to (the total number of links)

Reciprocity (Dyad Method): the ratio of (the number of reciprocated node pairs) to (the number of
connected node pairs)

Transitivity: the ratio of total number of transitive triads to the total number of transitive and
intransitive triads. For digraphs, the ratio of the number of transitive triads to the number of

potentially transitive triads

Clustering Coefficient: For a node, the clustering coefficient is the proportion of possible links among its neighbors (alters) that are actually present. After picking a node, find all its neighbor nodes; the coefficient is the ratio of (the number of connections observed) to (the maximum possible number of connections) between those neighbor nodes. The clustering coefficient of the entire network is the average of the clustering coefficients of all the nodes.

Mean Distance: Mean Distance is the average geodesic distance between any pair of nodes in a
network.

Diameter: Diameter is the largest geodesic distance between any pair of nodes in a network.

Node Connectivity: Node Connectivity is the minimum number of nodes that must be removed to
disconnect the network.

Link Connectivity: Link Connectivity is the minimum number of links that must be removed to
disconnect the network.

Connectedness: The ratio of node pairs that can reach each other in the digraph.

Efficiency: Measures how efficiently the network is connected.

Hierarchy: Measures how hierarchical the network is.

LUB: Computes how many roots there are if the network is regarded as a tree.
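Several of the simpler measures in this list can be sketched directly from their definitions. The Python below is illustrative only, with a hypothetical function name; it assumes a directed, dichotomized network without self-loops, so the maximum possible number of links is n(n-1).

```python
def network_properties(nodes, links):
    """Compute a few network-level measures for a directed binary network.

    nodes: iterable of node identifiers (isolates included).
    links: iterable of (source, target) pairs.
    """
    nodes = list(nodes)
    link_set = {(s, t) for s, t in links if s != t}
    n, m = len(nodes), len(link_set)

    # Nodes with at least one incoming or outgoing link (non-isolates).
    connected_nodes = {v for link in link_set for v in link}

    # Links whose reverse link also exists, and the unordered pairs involved.
    reciprocated_links = sum(1 for s, t in link_set if (t, s) in link_set)
    connected_pairs = {frozenset(link) for link in link_set}
    reciprocated_pairs = reciprocated_links // 2

    return {
        "links": m,
        "density": m / (n * (n - 1)),
        "inclusiveness": len(connected_nodes) / n,
        "reciprocity_arc": reciprocated_links / m,
        "reciprocity_dyad": reciprocated_pairs / len(connected_pairs),
    }


# Node 4 is an isolate; only the pair (0, 1) is reciprocated.
print(network_properties(range(5), [(0, 1), (1, 0), (1, 2), (2, 3)]))
```

The example shows why the two reciprocity methods differ: two of four links are reciprocated (Arc Method 0.5), but only one of three connected pairs is (Dyad Method 1/3).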


 Post-process
Significance Test: You can test the significance of your results using the MCMC (Markov Chain Monte Carlo) method. The “Iterations” option sets how many matrices are generated for the significance test. The more iterations are performed, the more reliable the test result, although at the expense of longer computation times.

- MCMC[U(Xi+, X+j)]: Generates matrices which have the same row and column marginal totals as the input matrix. That is, the in-degree and out-degree of each node are the same as in the input matrix. The results for these matrices are compared to the result for the original matrix.

- MCMC[U(Xi+, X+j, MAN)]: Satisfies the conditions of the option above. In addition, the generated matrices have the same numbers of mutual, asymmetric and null dyads, that is, the same dyad census as the input matrix. The results for these matrices are compared to the result for the original matrix.
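One common way to generate matrices with the same in-degree and out-degree sequence is repeated link swapping, sketched below in Python. This is an assumption for illustration, not a description of NetMiner's sampler: the function name is hypothetical, the swap move shown does not preserve the dyad census required by the MAN option, and this simple move alone may not reach every digraph with the given degrees.

```python
import random


def degree_preserving_shuffle(links, swaps, seed=0):
    """Randomize a directed binary network while keeping every node's
    in-degree and out-degree fixed.

    Each step picks two links a->b and c->d and tries to replace them
    with a->d and c->b. The swap is rejected if it would create a
    self-loop or duplicate an existing link.
    """
    rng = random.Random(seed)
    link_list = sorted(set(links))
    present = set(link_list)
    for _ in range(swaps):
        i, j = rng.sample(range(len(link_list)), 2)
        (a, b), (c, d) = link_list[i], link_list[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in present or (c, b) in present:
            continue  # would create a duplicate link
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        link_list[i], link_list[j] = (a, d), (c, b)
    return sorted(present)


observed = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
print(degree_preserving_shuffle(observed, swaps=200, seed=1))
```

A significance test would compute the measure of interest on many such shuffled networks and compare the observed value with that distribution.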

 Output
NetMiner allows you to select which outputs to be reported in which

format. For the result of ‘Network Properties’ analysis, Main Report,

Observed and MCMC Result are reported.

Table Dimension
- Network * Significance * Measure: Each table represents the network

measure.

- Network * Measure * Significance: Each table represents the result of

significance test.

- Measure * Significance * Network: Each table represents a selected

network.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window. For table dimension option,

‘Measure * Significance * Network’ is selected. (Thus, each table represents a selected network.)

 Reports
Main Report
- Network properties: The network measures of the selected properties for the selected networks are

reported.

 Tables
Network Tables
For the selected networks, the network measures of the selected properties and the results of MCMC (expected properties) are reported.

 Time Complexity
 # Links: O(m)

 Density: O(m)

 # of Components: O(m)

 Inclusiveness: O(m)

 Average Degree: O(m)

 Reciprocity: O(m)

 Transitivity: O(n^3)

 Clustering Coefficient: O(m^2)

 Mean Distance: O(n^3)

 Diameter: O(n^3)


 Node Connectivity: O(n^4)

 Link Connectivity: O(n^3)

 Connectedness: O(m)

 Efficiency: O(m)

 Hierarchy: O(n^3)

 LUB: O(n^3)

 Reference
 Inclusiveness: John Scott, Social Network Analysis - a handbook, 2nd edition. 2000. (p.70)

 Reciprocity: Zeggelink, E.P.H. (1993). Strangers into friends. The evolution of friendship
networks using an individual oriented modeling approach. Amsterdam: Thesis Publishers, 1993.

 Transitivity: Frank, O., & Harary, F. (1982). Cluster inference by using transitivity indices in
empirical graphs. Journal of the American Statistical Association, 77, 835-840.

 Clustering Coefficient: Watts D J (1999) Small worlds. Princeton University Press, Princeton,
New Jersey. 32-33.

 Connectedness: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations. In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence Erlbaum Associates, Inc.

 Efficiency: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations. In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence Erlbaum Associates, Inc.

 Hierarchy: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations. In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence Erlbaum Associates, Inc.

 LUB: Krackhardt, David (1994). Graph theoretical dimensions of informal organizations. In Kathleen Carley and Michael Prietula, eds. Computational Organizational Theory, Lawrence Erlbaum Associates, Inc.

 Related Topics


Analyze >> Properties >> Network >> Modularity

 Menu
Analyze >> Properties >> Network >> Modularity

 Description
This module calculates the modularity of each 1-mode network with the selected partition vectors. Modularity is defined as the fraction of edges that fall within communities minus the expected fraction of such edges. The maximum value of modularity for a network is 1. Values closer to 1 indicate a better partition into communities.
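For an undirected, dichotomized network, the definition above corresponds to the standard formula Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The Python sketch below is illustrative (the function name is hypothetical, and NetMiner's internals may differ); it also returns the per-community terms, whose sum is the network modularity.

```python
def modularity(edges, community):
    """Return (Q, per_community) for an undirected binary network.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its partition value.
    """
    m = len(edges)
    internal = {}   # L_c: edges with both ends in community c
    degree = {}     # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    per_community = {
        c: internal.get(c, 0) / m - (degree.get(c, 0) / (2 * m)) ** 2
        for c in set(community.values())
    }
    return sum(per_community.values()), per_community


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q, parts = modularity(edges, community)
print(q, parts)
```

Here each triangle contributes 3/7 - (7/14)^2 = 5/28, so Q = 5/14.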

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose many


1-mode Networks at once.

Node Attribute: Select partition vectors. You may select more than
one node attribute.

 Pre-process

Dichotomize: You must dichotomize your data before running the module. By dichotomizing, weighted/valued data is transformed to unweighted/binary data. The dichotomize option is therefore mandatory.


Symmetrize: Before running module, your data should be symmetrized such that
directed/asymmetric data is transformed to undirected/symmetric data.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Modularity’ analysis,

Main Report and Modularity table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Modularity : The modularity values are shown in the table below. Each row represents the 1-mode

Network and each column represents its partition vector.

 Tables
Modularity Table
The modularity values are shown in the table below. Each row represents the 1-mode Network and

each column represents its partition vector.

 Time Complexity
 O(m)


 Reference

 F. Chung and L. Lu, Connected components in random graphs with given degree sequences.
Annals of Combinatorics 6, 125–145 (2002)

 M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices. 5-6 (2006)

 Related Topics
 Analyze >> Cohesion >> Community


Analyze >> Properties >> Group

 Menu
Analyze >> Properties >> Group

 Description
Various group properties are measured, where each group is defined by the user-specified vector.

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: If a network contains multiple links, which means that

more than one link exist for a source node and a target node pair, those

multiple links should be merged to a single link. Please select how to

merge the multiple links.

Select Vector: Select a Main Node Attribute vector in order to divide


groups.

 Main process
External-Internal Index: The E-I Index compares the number of links between actors of different types (external) with the number of links between actors of the same type (internal). The index ranges between -1 and 1: -1 indicates that all ties connect nodes of the same type, and 1 indicates that all ties connect nodes of different types.

SMI: SMI stands for Segregation Matrix Index, which was proposed by Fershtman and Chen.

SMI = (D_{A,A} - D_{A,B}) / (D_{A,A} + D_{A,B}), where D_{X,Y} means the density of choices from X to Y. If group A segregates (reveals self-preference), SMI > 0. In the extreme case where A segregates completely, so that its members direct no choices outward, SMI = 1. In contrast, where A's members reveal other-preference, namely direct all their choices outward, SMI = -1.

Cohesion Index: Cohesion Index is the extent to which ties are concentrated within a subgroup,
rather than between subgroups. Cohesion index is defined as:

Density: The density of a group is the proportion of possible lines that are actually present within the group. It is the ratio of the number of lines present to the maximum possible.

Group Modularity: The modularity contribution of each partition group. It is defined as the fraction of edges within the group minus the expected fraction of such edges. The sum of the group modularity values equals the network modularity.
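Two of the group measures above can be sketched from their definitions. The Python below is a hedged illustration with hypothetical function names: the E-I Index is taken as (E - I) / (E + I) over all links, and for SMI the "other" group B is assumed to be every node outside group A, with densities computed over all possible directed pairs. The SMI sketch assumes group A has at least two members, at least one non-member exists, and A sends at least one link.

```python
def ei_index(links, group):
    """E-I Index over a list of (source, target) links.

    Returns -1 when every link is internal and 1 when every link is external.
    """
    internal = sum(1 for s, t in links if group[s] == group[t])
    external = len(links) - internal
    return (external - internal) / (external + internal)


def smi(links, group, a):
    """Segregation Matrix Index of group `a` against all other nodes."""
    n_a = sum(1 for v in group if group[v] == a)
    n_rest = len(group) - n_a
    within = sum(1 for s, t in links if group[s] == a and group[t] == a)
    outward = sum(1 for s, t in links if group[s] == a and group[t] != a)
    d_aa = within / (n_a * (n_a - 1))   # density of choices inside A
    d_ab = outward / (n_a * n_rest)     # density of choices from A outward
    return (d_aa - d_ab) / (d_aa + d_ab)


group = {"a": 1, "b": 1, "c": 2, "d": 2}
links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(ei_index(links, group))   # three internal links, one external
print(smi(links, group, 1))     # group 1 mostly chooses its own members
print(smi(links, group, 2))     # group 2 directs no choice outward
```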

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Group Property’

analysis, Main Report and Group Property Table are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Group properties: E-I Index, SMI, Cohesion Index, Density and Group Modularity are reported.

 Tables
Group Property Table
This table presents group properties for each group considering only selected vector.

 Time Complexity
 E-I Index : O(m)

 SMI : O(m)

 Cohesion Index : O(m)

 Density : O(m)

 Group Modularity: O(m)

 Reference
 E-I Index: Krackhardt, David and Robert Stern (1988). Informal Networks and Organizational
Crises: An Experimental Simulation. Social Psychology Quarterly. 51:123-140.


 SMI: Fershtman, M. and M. Chen, 1993, The segregation matrix: a new index for measuring sociometric segregation, Megamot 34, 563-581 (in Hebrew; an English version is available from the authors).

 Cohesion Index: Bock, R.D., and Husain, S.Z. (1950). An adaptation of Holzinger’s B-coefficients for the analysis of sociometric data. Sociometry. 13, 146-153.

 Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications,
Cambridge, 7.6 Measures of Subgroup Cohesion

 Related Topics


Analyze >> Models >> Dyadic Interaction (p1)

 Menu
Analyze >> Models >> Dyadic Interaction (p1)

 Description
The ‘Dyadic Interaction (p1)’ analysis fits Holland and Leinhardt’s p1 model to a 1-mode Network. It estimates expansiveness (alpha) and popularity (beta) for each node, as well as mutuality (rhou) and overall choice (theta). The G-square statistic and its degrees of freedom are also reported for significance testing.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

non-weighted/binary data.

 Main process
Vector Categorize: If you want to model with Main Node Attribute variable, then check 'Vector
categorize'.


# of Iterations: The p1 model is fitted iteratively. The more iterations are performed, the more reliable the result, although at the expense of longer computation times.

Constraints
- Constraint alphas: hypothesize that every node’s alpha value (expansiveness) is 0.

- Constraint betas: hypothesize that every node’s beta value (popularity) is 0.

- Constraint rhou: hypothesize that there is no mutuality in the selected 1-mode Network.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Dyadic

Interaction(p1)’ analysis, Main Report, Parameter Estimate Table,

Fitted Matrix, Residual Matrix, Line Plot Chart, Matrix Diagram and

Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- G-square: One of goodness-of-fit statistics, well known in the

categorical data analysis field. As a model fits better, G-square value decreases. If the model fits

perfectly, G-square will approach zero.

- G-square’s Degree of Freedom: The degree of freedom value is computed considering alpha, beta

and rhou constraints.

- mutuality (rhou): Positive mutuality indicates reciprocity in networks with fixed out-degree or in-

degree (or both). Negative mutuality means the opposite.


- overall choice (theta): reflects the overall probability that a pair of nodes is linked, so exp(theta) is roughly equal to the density.
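How these parameters combine can be sketched with the dyad probabilities of Holland and Leinhardt's p1 model. In the Python below, the function name and argument names are illustrative; the weights follow the standard p1 form, in which a null dyad has weight 1, an asymmetric link i -> j has weight exp(theta + alpha_i + beta_j), and a mutual dyad has weight exp(rho + 2*theta + alpha_i + alpha_j + beta_i + beta_j).

```python
import math


def p1_dyad_probabilities(theta, rho, alpha_i, beta_i, alpha_j, beta_j):
    """Probabilities of the four states of the dyad (i, j) under p1."""
    w_null = 1.0
    w_ij = math.exp(theta + alpha_i + beta_j)   # only i -> j present
    w_ji = math.exp(theta + alpha_j + beta_i)   # only j -> i present
    w_mutual = math.exp(rho + 2 * theta + alpha_i + alpha_j + beta_i + beta_j)
    total = w_null + w_ij + w_ji + w_mutual
    return {
        "null": w_null / total,
        "i->j": w_ij / total,
        "j->i": w_ji / total,
        "mutual": w_mutual / total,
    }


# With no node effects and no mutuality, the chance that i -> j exists is
# exp(theta) / (1 + exp(theta)), which is close to exp(theta) when the
# network is sparse (theta strongly negative).
p = p1_dyad_probabilities(theta=-2.0, rho=0.0,
                          alpha_i=0.0, beta_i=0.0, alpha_j=0.0, beta_j=0.0)
print(p["i->j"] + p["mutual"], math.exp(-2.0))
```

A positive rho raises the mutual weight relative to the two asymmetric states, which is why positive mutuality indicates reciprocity.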

 Tables
Parameter Estimate Table: Expansiveness (alpha) and Popularity (beta) for all nodes or groups(in
the vector categorized cases) are presented.

Fitted Matrix
Fitted Matrix is Node by node or block by block matrix. Each cell means the cell value expected by

the fitted model.

Residual Matrix: Residual Matrix = Observed Matrix – Fitted Matrix


 Charts
Line Plot Chart
- Blue: Successful Estimate. i.e., the link is observed and is predicted to exist.

(true positive cases)

- Orange: Faulty Guess. i.e., the link is not observed but is predicted to exist. (false positive cases)

- Red: Missing Edge. i.e., the link is observed but is not predicted to exist. (false negative cases)

Matrix Diagram
Each row and each column represents a node. As you change the cut-off value used to dichotomize the fitted matrix, the numbers of dyads classified as successful estimate, faulty guess and missing edge change. Faulty guesses and missing edges trade off against each other. You can use this chart to select the best cut-off value.


 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

 Inspect
This module shows the expectation and the observation of links.

 Threshold
As “Threshold” level is changed with the threshold slider, the color of

links on the map (or the color of each cell in matrix diagram) changes.

The color of each link is decided as a result of comparison between the dichotomized estimated value

and the dichotomized observed value as follows. The cut-off value for the dichotomization of the

estimated value is decided by the Threshold level. (minimum level: 0%, maximum level: 100%)


<Example Screen shot>

Matrix Diagram Network Map

 Time Complexity
 O(k * n^2) where k is # of iterations

 Reference
 Paul W. Holland and Samuel Leinhardt. March 1981. An exponential family of Probability
distributions for directed graphs. Journal of the American Statistical Association, Vol 76,

Number 373, Invited Papers Section.

 Related Topics


Analyze >> Models >> ERGM (p*)

 Menu
Analyze >> Models >> ERGM (p*)

 Description
ERGM (p*) stands for ‘Exponential Random Graph Model’. p1 is also an ERGM, but because it assumes ‘dyadic independence’, it is inherently unrealistic. p* does not make that assumption. Users can model a given network with several network statistics (network properties) and may obtain a parsimonious but adequate model to describe the network.

The idea for the Matrix Diagram visualization is from Multinet. We thank Richard in this respect.

 Process Flow
When not categorized by vector


When categorized by vector


Note that a block structure is applied to the p* model when nodes are categorized by vector data.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links, that is, when more than one link connects the same source node and target node pair, you should decide how to merge them into a single link.

 Main process
Categorize by Attribute: If some factors are to be estimated as group-level indices (e.g. mutuality in the boy-boy block or the girl-girl block), select a group vector (Main Node Attribute).

Node Attribute: The selected Main Node Attribute data is used to group nodes.


# of iterations: The p* module of NetMiner fits the model with logistic regression. This option sets the number of iterations in the logistic regression analysis.

Cutoff Value: The value used to dichotomize the fitted matrix.

Factors: Select the Block factors you want to model. Dyad types and triad types are supported.
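Fitting p* by logistic regression (pseudo-likelihood) can be sketched as follows. This Python example is an illustration under simplifying assumptions, not NetMiner's procedure: it uses only two factors (overall choice and mutuality), the function names are hypothetical, and the fit is a plain Newton iteration. Each ordered pair (i, j) becomes one observation whose response is the link x_ij and whose predictors are the change statistics, i.e. how much each network statistic changes when x_ij goes from 0 to 1. For mutuality that change is x_ji.

```python
import math


def change_statistics(adj):
    """Build one logistic-regression row per ordered pair i != j.

    Returns a list of (y, (choice_change, mutuality_change)).
    """
    n = len(adj)
    rows = []
    for i in range(n):
        for j in range(n):
            if i != j:
                rows.append((adj[i][j], (1.0, float(adj[j][i]))))
    return rows


def fit_logit(rows, iterations=25):
    """Two-parameter logistic regression by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, (x0, x1) in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 * x0 + b1 * x1)))
            w = p * (1.0 - p)
            g0 += (y - p) * x0          # gradient of the log-likelihood
            g1 += (y - p) * x1
            h00 += w * x0 * x0          # information matrix X'WX
            h01 += w * x0 * x1
            h11 += w * x1 * x1
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


adj = [[0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0]]
choice, mutuality = fit_logit(change_statistics(adj))
print(choice, mutuality)
```

In this small network a link is far more likely when the reverse link exists, so the mutuality estimate is positive. The number of Newton steps plays the role of the iteration count in this sketch.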

 Block Structure/Factor Dialog


If the user categorized nodes by attribute and clicks ‘Run Main’, this dialog opens.

- Block Structure: Block identification. If a block has the value 0 in the block structure, that block is not included in the model. Blocks with the same Block ID are included in the model with the same factors.

- Factors: In this option, the user can decide which factors should be considered at the group level or for the whole network. For example, in the following picture, factors 1, 6, 8 and 9 are considered for the whole network, and factors 3, 4 and 5 are considered for block 1.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘ERGM(p*)’ analysis,

Main Report, Classification Table, Parameter Estimate Table,

Parameter Correlation Matrix, Fitted Matrix, Residual Matrix, Line

Plot Chart, Matrix Diagram and Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
-2 Log (Pseudo likelihood): As a model fits better, -2 Log (Pseudo likelihood) decreases. If the model

fits perfectly, -2 Log would be 0. This value is created by comparing input network and fitted matrix.

- Goodness of Fit: This measure is based on the residuals, i.e. the differences between the observed values and the values expected by the model. 0 means that the model is perfect, and larger values mean that the model is worse. Details can be found in the reference.

- Model Chi-squared: (-2 Log(Pseudo likelihood) of Null Model) – (-2 Log(Pseudo likelihood) of the

given model).

 Tables
Classification Table
Value in cells (Observed=0, Predicted=0) means the number of (directed) dyads which are not linked

in observed network and are not linked in predicted network.

Value in cells (Observed=0, Predicted=1) means the number of (directed) dyads which are not linked

in observed network and are linked in predicted network.

Other values may be interpreted similarly.
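The Classification Table can be reproduced from an observed matrix, a fitted matrix and a cut-off value, as in the Python sketch below. The function name is hypothetical, and it is an assumption of this sketch that a link is predicted when the fitted value is greater than or equal to the cut-off.

```python
def classification_table(observed, fitted, cutoff=0.5):
    """Count directed dyads by (observed, predicted) link status.

    observed: square 0/1 matrix; fitted: square matrix of fitted values.
    Returns a dict keyed by (observed, predicted) with dyad counts.
    """
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    n = len(observed)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal is not a dyad
            predicted = 1 if fitted[i][j] >= cutoff else 0
            table[(observed[i][j], predicted)] += 1
    return table


observed = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
fitted = [[0.0, 0.8, 0.6],
          [0.1, 0.0, 0.4],
          [0.7, 0.2, 0.0]]
print(classification_table(observed, fitted, cutoff=0.5))
```

In this example the cell (Observed=1, Predicted=1) holds 2 dyads, (Observed=0, Predicted=1) holds 1, (Observed=1, Predicted=0) holds 1 and (Observed=0, Predicted=0) holds 2. Raising the cut-off moves dyads from the predicted-link column to the predicted-no-link column.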


Parameter Estimate Table


- Estimates: estimated value

- [Link].: Standard Error (sample standard deviation)

- PLWald: Wald statistic

- p(df=1.0): p-value

- Exp(b): exp(estimates)

- Counts: the number of the matched patterns in observed network.

Parameter Correlation Table


Factor’s estimated correlation matrix

Fitted Matrix
Fitted Matrix is node by node or block by block matrix. Each cell means the cell value expected by

the fitted model.


Residual Matrix
Residual Matrix = Observed Matrix – Fitted Matrix

 Charts
Line Plot
- Blue: Successful Estimate. i.e., the link is observed and is predicted to exist. (true positive cases)

- Orange: Faulty Guess. i.e., the link is not observed but is predicted to exist. (false positive cases)

- Red: Missing Edge. i.e., the link is observed but is not predicted to exist. (false negative cases)

Matrix Diagram
Each row and each column represents a node. As you change the cut-off value used to dichotomize the fitted matrix, the numbers of dyads classified as successful estimate, faulty guess and missing edge change. Faulty guesses and missing edges trade off against each other. You can use this chart to select the best cut-off value.


 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.


 Inspect
This module explores the estimation of links between two nodes on the network map according to the

threshold level and the selected pre-established triad relation model factors.

 Threshold
After a Threshold level is selected with the threshold slider, the color of each link on the map (or the color of each cell in the matrix diagram) changes.

The color of each link is decided as a result of comparison between the dichotomized estimated value

and the dichotomized observed value as follows. The cut-off value for the dichotomization of the

estimated value is decided by the Threshold level. (minimum level: 0%, maximum level: 100%)

<Example Screen shot>

Panels: Matrix Diagram, Network Map

 Time Complexity
 O(n^3 x # Iterations)

 Reference


 Bradley Crouch and Stanley Wasserman. A Practical Guide To Fitting p* Social Network
Models Via Logistic Regression

 Wasserman, S. & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61, 401-425.

 Related Topics
Analyze >> Models >> Dyadic Interaction (P1)


Analyze >> Models >> Blockmodel (Generalized)

 Menu
Analyze >> Models >> Blockmodel (Generalized)

 Description
Block-modeling consists of two sub-problems:

1) Partitioning of units - determining the classes (clusters) that form the vertices in a model;

2) Determining the links in a model (and their values).

Generalized Blockmodeling enables users to better reflect the network structure. It unifies and combines different notions of equivalence (structural, regular, etc.), which can be applied simultaneously to the same network. Users can use a predefined partition or obtain the optimized partition for the defined model. Furthermore, because users can define the connection types of the model, they can better understand the network structure.

 Process Flow


 User Options

 Input
1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

- Link Merge: When the selected data contains multiple links, i.e., two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Dichotomize: You should dichotomize your data before running
module. By dichotomizing, weighted/valued data is transformed to

non-weighted/binary data.

 Main process


# of Iterations: Specify how many times the Optimization Engine is repeated for the blockmodeling operation. More iterations give a better-optimized result but take more time.

Error Type: Decide whether the discrepancy between the model and the partition is counted as it is (constant) or weighted by block size.

Weight Vector: You can set a different weight for each type. A larger weight contributes more to the error (the contribution is proportional to the weight), so a type with a large weight is selected less often in the optimized model.


Random Partition Number: The partition is determined randomly based upon the user-specified
number of partitions.

Initial Partition: Make partitions with the user-defined vector, or Main Node Attribute.

Model Specification
- Connection Type: The model, i.e., which shape each block should have.

- Type Weight: Which shapes are preferred.
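To make the idea of block shapes and type weights concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not NetMiner's implementation: it considers only two ideal shapes (null and complete), measures the error of a block as the number of cells deviating from the ideal shape multiplied by the type weight, and selects the shape with the smaller weighted error. The function name, the weights and the example network are hypothetical.

```python
def block_errors(adj, partition, type_weight=None):
    """Return {(ci, cj): (selected_shape, weighted_error)} for each block."""
    type_weight = type_weight or {"null": 1.0, "complete": 1.0}
    clusters = sorted(set(partition))
    selected = {}
    for ci in clusters:
        rows = [i for i, c in enumerate(partition) if c == ci]
        for cj in clusters:
            cols = [j for j, c in enumerate(partition) if c == cj]
            # Diagonal cells (self-links) are ignored.
            cells = [adj[i][j] for i in rows for j in cols if i != j]
            ones = sum(1 for v in cells if v != 0)
            zeros = len(cells) - ones
            candidates = {
                "null": type_weight["null"] * ones,           # links violate a null block
                "complete": type_weight["complete"] * zeros,  # gaps violate a complete block
            }
            best = min(candidates, key=candidates.get)
            selected[(ci, cj)] = (best, candidates[best])
    return selected

# Hypothetical dichotomized network with two clusters and one stray link (0 -> 2).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
partition = [0, 0, 1, 1]
selected = block_errors(adj, partition)
total_error = sum(err for _, err in selected.values())
print(selected, total_error)
```

An optimization engine would repeat this evaluation over many candidate partitions and keep the one with the smallest total error.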

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of

‘Blockmodel(Generalized)’ analysis, Main Report, Optimized

Partition Vector, Error Matrix, and Clustered Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Design Matrix


- Selected Design Matrix: One block can have multiple feasible shapes. The algorithm selects the optimal shape for every block.

 Tables
Optimized Partition Vector

Error Matrix

 Maps


 Inspect

 Time Complexity
 O(n^2)

 Reference


 Vladimir Batagelj. Notes on block-modeling, Social Networks 19, 143-155.

 Related Topics


Analyze >> Two Mode >> Degree

 Menu
Analyze >> Two Mode >> Degree

 Description
This module is the two-mode version of the one-mode degree analysis.

 User Options

 Input
2-mode Network: Select a 2-mode Network. Only one 2-mode
Network can be selected.

- Nodeset: First, a Sub Nodeset containing 2-mode Network of

interest should be selected.

- Link Merge: Determine how multiple links are merged to a single

link.

 Main process
Measure
- # of links: The degree of each node is the number of links which are

incident from the node.

- Sum of weight: The degree of each node is weight sum of links which

are incident from the node.
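The two measures can be sketched in Python as follows, assuming the 2-mode Network is given as a main-node by sub-node matrix of link weights (0 meaning no link). The function name and the example matrix are hypothetical.

```python
def two_mode_degree(matrix, weighted=False):
    """Return (main_degrees, sub_degrees) for a main-by-sub weight matrix."""
    def score(values):
        if weighted:
            return sum(values)                      # "Sum of weight"
        return sum(1 for v in values if v != 0)     # "# of links"
    main = [score(row) for row in matrix]           # one score per main node (row)
    sub = [score(col) for col in zip(*matrix)]      # one score per sub node (column)
    return main, sub

# Hypothetical 2 main nodes x 3 sub nodes.
matrix = [[2, 0, 1], [0, 3, 0]]
link_counts = two_mode_degree(matrix)
weight_sums = two_mode_degree(matrix, weighted=True)
print(link_counts, weight_sums)
```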


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘2-mode Degree’

analysis, Main Report, Main Nodeset Degree, Sub Nodeset Degree and

Spring Map are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Main Nodeset degree: Mean, Std. Dev., Min, and Max of the Main Nodeset degree scores are reported. The distribution is not normalized.

- Distribution of Sub Nodeset degree: Mean, Std. Dev., Min, and Max of the Sub Nodeset degree scores are reported. The distribution is not normalized.

 Tables
Main Nodeset degree vector
For each main node, its degree score is presented.


Sub Nodeset degree vector
For each sub node, its degree score is presented.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Two-Mode option in the Preference >> Node tab.

 Time Complexity
 O(m)

 Reference

 Related Topics
 Analyze >> Neighbor >> Degree


Analyze >> Two Mode >> Eigenvector Centrality

 Menu
Analyze >> Two Mode >> Eigenvector Centrality

 Description
This module analyzes the centrality structure of a network based on a similarity matrix. After a 1-mode similarity matrix is obtained from the 2-mode Network data, eigenvector centrality is calculated in the same way as 1-mode eigenvector centrality. Two kinds of centrality are reported: one comes from the main node-main node similarity matrix and the other from the sub node-sub node similarity matrix.
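The procedure can be sketched in Python as follows. The manual does not say how the similarity matrix is built, so this sketch assumes the simplest choice, the co-occurrence matrix A·A' for main nodes and A'·A for sub nodes, and uses plain power iteration to find the leading eigenvector. All names and the example data are hypothetical.

```python
import math

def gram(rows):
    """Return rows · rowsᵀ: entry (i, j) is the dot product of row i and row j."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in rows] for r1 in rows]

def eigenvector_centrality(sim, iterations=200, tol=1e-10):
    """Leading eigenvector of a symmetric non-negative matrix by power iteration."""
    n = len(sim)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(sim[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return w  # no links at all: every score is zero
        w = [x / norm for x in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v

# Hypothetical 2-mode data: 3 main nodes (rows) x 2 sub nodes (columns).
affiliation = [[1, 1], [1, 0], [0, 1]]
main_sim = gram(affiliation)                                 # main node-main node
sub_sim = gram([list(col) for col in zip(*affiliation)])     # sub node-sub node
main_centrality = eigenvector_centrality(main_sim)
sub_centrality = eigenvector_centrality(sub_sim)
print(main_centrality, sub_centrality)
```

In this example the first main node is linked to both sub nodes and therefore receives the highest main-node score, while the two sub nodes are symmetric and receive equal scores.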

 User Options

 Input
2-mode Network: Select a 2-mode Network. Only one 2-mode
Network can be selected.

- Nodeset: First, a Sub Nodeset containing 2-mode Network of interest

should be selected.

- Link Merge: Determine how multiple links are merged to a single link.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘2-mode Eigenvector Centrality’ analysis, Main Report, Main Node Eigenvector

Centrality Vector, Sub Node Eigenvector Centrality Vector, Concentric Map: Main Node and

Concentric Map: Sub Node are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Distribution of Main Nodeset Eigenvector Centrality: Mean, Std.Dev., Min, and Max of the Main Nodeset centrality scores are reported. (Not normalized)

- Distribution of Sub Nodeset Eigenvector Centrality: Mean, Std.Dev., Min, and Max of the Sub Nodeset centrality scores are reported. (Not normalized)

 Tables
Main node eigenvector centrality vector
For each main node, its eigenvector centrality score is

presented.


Sub node eigenvector centrality vector


For each sub node, its eigenvector centrality score is

presented.

 Maps
Concentric Map: Main Node
- Default layout: A map is drawn by Circular >> Concentric algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

Concentric Map: Sub Node


- Default layout: A map is drawn by Circular >> Concentric algorithm.

- Default style: Default style is set by Two-Mode option in the Preference >> Node tab.


 Time Complexity
 O (n^3 + c^3), where c is the number of sub nodes.

 Reference
 Bonacich P (2002). Hyper-edges and Multi-dimensional Centrality.

 Related Topics
 Analyze >> Centrality >> Eigenvector


Analyze >> Two Mode >> Max. Matching

 Menu
Analyze >> Two Mode >> Max. Matching

 Description
A matching M is a subset of links such that each node (both Main Nodes and Sub Nodes) is incident on at most one link of M. A maximum matching is a matching that contains the largest possible number of links.
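A maximum matching of a 2-mode Network can be sketched in Python with the augmenting-path method shown below. This is an illustrative technique chosen for brevity; the manual does not state which algorithm NetMiner uses, and the function name and the example links are hypothetical.

```python
def max_bipartite_matching(links):
    """links[main] lists the sub nodes linked to that main node.

    Returns a dict {main_node: sub_node} describing one maximum matching.
    """
    main_of_sub = {}  # sub node -> main node currently matched to it

    def try_assign(main, visited):
        for sub in links[main]:
            if sub in visited:
                continue
            visited.add(sub)
            # Take a free sub node, or re-route the main node that holds it.
            if sub not in main_of_sub or try_assign(main_of_sub[sub], visited):
                main_of_sub[sub] = main
                return True
        return False

    for main in links:
        try_assign(main, set())
    return {main: sub for sub, main in main_of_sub.items()}

# Hypothetical 2-mode links: main nodes a, b, c and sub nodes x, y.
links = {"a": ["x", "y"], "b": ["x"], "c": ["y"]}
matching = max_bipartite_matching(links)
print(len(matching), matching)
```

With only two sub nodes, at most two pairs can be matched, so one main node necessarily stays unmatched in this example.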

 User Options

 Input
2-mode Network: Select a 2-mode Network. Only one 2-mode
Network can be selected.

- Nodeset: First, a Sub Nodeset containing 2-mode Network of

interest should be selected.

- Link Merge: Determine how multiple links are merged to a single link.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Max Matching’

analysis, Max Matching Matrix and Spring Map are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- # of Pairs: The number of pairs that satisfy max matching is reported.

- Pairs: The name of each node of the pairs is reported.

 Tables
Max. Matching Matrix: node-by-item matching matrix.

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Two-Mode option in the Preference >> Node tab.


 Inspect
This module explores the Structural Equivalence Profile between two selected nodes and the clusters

of nodes according to the selected fusion level.

 Find
Focal Node
You can search for a node by typing part of its Node Label in the text box. Then click the matching Node Label in the search result shown below the text box.


Select View
<Example Screen shot>

 Time Complexity
 O (n^3 + c^3) where c is # categories.

 Reference
 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to

Algorithms second edition, 26.3 Maximum bipartite matching, p.664.

 Related Topics


III. Statistics
1. MDS
2. Correspondence
3. Decomposition
 Eigenvector
 Singular
 Spectral
4. Covariance Matrix
5. Principal Component
6. Factor Analysis
7. Frequency
 Vector
 Matrix
8. Gini Coefficient
 Vector
 Matrix
9. Power Law
 Vector
 Matrix
10. Descriptives
 Vector
 Matrix
11. Crosstabs
 Vector
 Matrix
12. ANOVA
 Vector
 Matrix
13. Correlation
 Vector
 Matrix
14. Autocorrelation
 Join-Count
 Continuous
15. Regression
 Vector
 Matrix
16. Logistic Regression
 Vector
 Matrix


Statistics >> MDS

 Menu
Statistics >> MDS

 Description
Multidimensional Scaling (MDS) analyzes similarity data (the larger the (i, j) value of the input matrix, the more similar subjects i and j are) or dissimilarity data (the larger the (i, j) value of the input matrix, the more different subjects i and j are) for a given set of subjects. To reflect this information, the module arranges nodes on a 2D or 3D map, so that the user can check the similarity or dissimilarity information visually.

NetMiner provides three MDS algorithms.

- c-MDS: c-MDS implements Torgerson-Gower's classical (metric) Multidimensional Scaling, also known as Principal Coordinate Analysis (PCO). A similarity matrix is first converted to a dissimilarity matrix by a linear transformation. The dissimilarity matrix is squared, double centered (new value of an element = value of input matrix – row mean – column mean + matrix mean) and multiplied by -1/2; then eigenvalue decomposition is used to determine the coordinate values. Only the first two positive eigenvalues, in descending order, and their eigenvectors are used. A scale is displayed on the left and upper sides.

- n-MDS: n-MDS performs non-metric multidimensional scaling of a given ordinal proximity matrix following the ALSCAL (Alternating Least-Squares Scaling) algorithm. The initial configuration is found using classical MDS (c-MDS). Then a disparity matrix is calculated (following Kruskal's least-squares monotonic transformation) and normalized. Using this disparity matrix, the coordinates are estimated one at a time so as to minimize SStress:

\[ \mathrm{SStress} = \frac{\sum_{i,j} \left( d_{ij}^{2} - \hat{d}_{ij}^{2} \right)^{2}}{\sum_{i,j} d_{ij}^{4}} \]

where $d_{ij}$ is the distance between i and j in the normalized disparity matrix, and $\hat{d}_{ij}$ is the distance between i and j displayed on the 2D or 3D map.


- Kn-MDS: Kruskal's approach to non-metric MDS. The details can be found in the reference. It finds the configuration with minimum stress using Kruskal's monotonic least-squares regression and the Newton-Raphson method:

\[ \mathrm{Stress} = \frac{\sum_{i,j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i,j} d_{ij}^{2}} \]

where $d_{ij}$ is the distance between i and j in the input dissimilarity matrix, and $\hat{d}_{ij}$ is the result of the monotone regression of $d_{ij}$, which is proportional to the distance between i and j displayed on the 2D or 3D map.

Because MDS treats proximity as symmetric, only undirected/symmetric data can be used as input.
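The double-centering step of c-MDS described above can be sketched in Python as follows. The sketch covers only the squaring, centering and multiplication by -1/2; the eigenvalue decomposition that follows is omitted. The function name and the three example points are hypothetical.

```python
def double_center(dissim):
    """Square a dissimilarity matrix, double center it and multiply by -1/2."""
    n = len(dissim)
    sq = [[d * d for d in row] for row in dissim]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    matrix_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + matrix_mean) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical example: three points on a line at positions 0, 3 and 4.
positions = [0.0, 3.0, 4.0]
dissim = [[abs(p - q) for q in positions] for p in positions]
b = double_center(dissim)

# For Euclidean distances, b[i][j] equals the product of the centered coordinates.
center = sum(positions) / len(positions)
centered = [p - center for p in positions]
print(b[0][0], centered[0] * centered[0])
```

Because the result is the inner-product matrix of the centered coordinates, its leading eigenvectors, scaled by the square roots of the eigenvalues, recover a configuration of points.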

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e., two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Pre-process
Symmetrize: You should symmetrize your data before running
module. By symmetrizing, directed/asymmetric data is transformed to

undirected/symmetric data. And if you symmetrize your data, algorithm will perform faster.

 Main process

MDS Method: Select an MDS method among c-MDS, n-MDS and Kn-MDS.

Proximity: Decide whether the input 1-mode Network is interpreted as Similarity data or Dissimilarity data.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘MDS’ module, Main

Report, MDS Coordinates Table and MDS Plot are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
1) c-MDS: Proportion explained (Larger value, MDS fits data better)

2) n-MDS, Kn-MDS: Stress (Smaller value, MDS fits data better)

 Tables
MDS Coordinates Table
For each node, its two dimensional coordinates are presented.


 Maps
MDS Map
Default layout: A map is drawn by MDS algorithm.

Default style: none.

 Time Complexity
 O(n^3)

 Reference
 J. C. Gower (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325-338.

 Young, F. W., Takane, Y., & Lewyckyj, R. (1978). Three notes on ALSCAL. Psychometrika, 43, 433-435.

 Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.

 Related Topics
 Visualize >> MDS


Statistics >> Correspondence

 Menu
Statistics >> Correspondence

 Description
For a given 2-mode network, the correspondence structure among main nodes (row items) and sub nodes (column items) is represented in a common two-dimensional space.

Row points (main nodes) that are close together indicate main nodes with similar profiles (conditional distributions) across the columns. Column points (sub nodes) that are close together indicate sub nodes with similar profiles (conditional distributions) down the rows. Finally, row points that are close to column points represent combinations that occur more frequently than would be expected from an independence model, that is, a model in which the row categories are unrelated to the column categories.

Euclidean distance in the two-dimensional plot corresponds to a statistical distance between pairs of row (or column) profiles in the original data. It is important to remember that there is no direct distance relation between a point representing a row profile and a point representing a column profile.

 User Options

 Input

2-mode Network: Select a 2-mode Network. Only one 2-mode Network can be selected.

- Nodeset: First, a Sub Nodeset containing 2-mode Network of

interest should be selected.

- Link Merge: Determine how multiple links are merged to a single

link.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Correspondence’

module, Main Report, Correspondence Coordinates Table and

Correspondence Plot are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Dimension1: The axis that best reflects the structure of statistical distances between main nodes and between sub nodes.

- Dimension2: The axis that reflects this distance structure second best.

- Inertia: The squared singular value of each axis.

- Proportion: The share of the whole statistical distance structure that is reflected by each axis.

- Cumulative Proportion: The share of the whole statistical distance structure that is reflected by the two axes together.
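The relation between the Inertia, Proportion and Cumulative Proportion columns can be sketched in Python as follows. The sketch assumes that proportions are taken relative to the total inertia over all axes, which the manual does not state explicitly; the function name and the singular values are hypothetical.

```python
def inertia_table(singular_values):
    """Return rows of (axis, inertia, proportion, cumulative proportion)."""
    inertias = [s * s for s in singular_values]  # inertia = squared singular value
    total = sum(inertias)
    rows, cumulative = [], 0.0
    for axis, inertia in enumerate(inertias, start=1):
        proportion = inertia / total
        cumulative += proportion
        rows.append((axis, inertia, proportion, cumulative))
    return rows

# Hypothetical singular values, largest first.
rows = inertia_table([0.6, 0.3, 0.1])
for axis, inertia, proportion, cumulative in rows:
    print(f"Dimension{axis}: inertia={inertia:.3f} "
          f"proportion={proportion:.3f} cumulative={cumulative:.3f}")
```

The cumulative proportion after the second row indicates how much of the distance structure the two-dimensional plot retains.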

 Tables
Correspondence Coordinates Table
There are two vectors. X represents dimension1 and

Y represents dimension2.


Correspondence Analysis Plot


Each node is arranged on the map by its correspondence coordinate.

 Time Complexity
 O(n^3)

 Reference
 Sten-Erik Clausen, (1998), Applied Correspondence Analysis, Sage.

 Related Topics


Statistics >> Decomposition >> Eigenvector

 Menu
Statistics >> Decomposition >> Eigenvector

 Description
A symmetric n * n matrix A has an eigen-decomposition. After the eigenvalues of the given matrix A are computed, a diagonal matrix D holding them can be formed such that A = Q D Q’, where Q is an orthonormal matrix whose columns are the eigenvectors.
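For a 2 x 2 symmetric matrix the decomposition can be written in closed form, which makes it easy to check A = Q D Q’ numerically. The Python sketch below uses a single rotation angle to build Q; it is an illustration for the 2 x 2 case only, and the function names and the example matrix are hypothetical.

```python
import math

def eigen_2x2_symmetric(a, b, c):
    """Eigen-decomposition of [[a, b], [b, c]].

    Returns (eigenvalues, q) where the columns of q are the eigenvectors.
    """
    theta = 0.5 * math.atan2(2 * b, a - c)  # rotation that zeroes the off-diagonal
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    lam1 = a * cos_t ** 2 + 2 * b * cos_t * sin_t + c * sin_t ** 2
    lam2 = a * sin_t ** 2 - 2 * b * cos_t * sin_t + c * cos_t ** 2
    q = [[cos_t, -sin_t], [sin_t, cos_t]]
    return [lam1, lam2], q

def reconstruct(eigenvalues, q):
    """Compute Q D Q' from the eigenvalues and the eigenvector matrix."""
    n = len(q)
    return [
        [sum(q[i][k] * eigenvalues[k] * q[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical symmetric input matrix.
a_matrix = [[2.0, 1.0], [1.0, 2.0]]
eigenvalues, q = eigen_2x2_symmetric(2.0, 1.0, 2.0)
rebuilt = reconstruct(eigenvalues, q)
print(eigenvalues, rebuilt)
```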

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

- Link Merge: When the selected data contains multiple links, i.e., two or more links connecting the same source node and target node pair, you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Decomposition’

analysis, Main Report, Q Matrix and Eigenvalues Vector are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.


 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Q Matrix: This matrix contains eigenvectors.

Eigenvalues Vector: Diagonal vector of D

 Time Complexity
 O(n^3)

 Reference
 Ingwer Borg, Patrick Groenen. Modern Multidimensional Scaling (Theory and Applications),
Springer. 117 page.

 Related Topics


Statistics >> Decomposition >> Singular

 Menu
Statistics >> Decomposition >> Singular

 Description
The singular value decomposition (SVD) of a matrix is a decomposition that is closely related to the eigen-decomposition and is equally useful in algebra and for computational purposes. The SVD is also known as the Eckart-Young theorem.

An n x m matrix A can be decomposed into

A = P D Q'

P is an n*m orthonormal matrix whose columns are the left singular vectors (i.e., P’P=I).

D is an m*m diagonal matrix whose diagonal values are the singular values.

Q is an m*m orthonormal matrix whose columns are the right singular vectors (i.e., Q’Q=I).

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

2-mode Network: Select a 2-mode Network. Only one 2-mode Network can be chosen at a time.

- Nodeset: First, select the Sub Nodeset containing the 2-mode Network you want to analyze.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Singular

Decomposition’ module, Main Report, Left Singular Vectors, Singular

Values Vector and Right Singular Vectors are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Left Singular Vectors: P matrix

Singular Values Vector: Diagonal vector of D

Right Singular Vectors: Q matrix


 Time Complexity
 O(n^3)

 Reference
 Ingwer Borg, Patrick Groenen. Modern Multidimensional Scaling (Theory and Applications),
Springer. 122 page.

 Related Topics


Statistics >> Decomposition >> Spectral

 Menu
Statistics >> Decomposition >> Spectral

 Description
A slightly different view of the eigen-decomposition leads to the spectral decomposition:

\[ A = Q D Q' = \lambda_1 q_1 q_1' + \lambda_2 q_2 q_2' + \cdots + \lambda_n q_n q_n' \]

This states that the matrix A is decomposed into a sum of matrices $\lambda_i q_i q_i'$, one for each eigenvalue $\lambda_i$ and its eigenvector $q_i$.
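The sum of component matrices can be checked with a small Python sketch. The eigenvalues and eigenvectors below are those of the hypothetical matrix [[2, 1], [1, 2]] and are written out by hand rather than computed, so that the example stays short; the function name is also hypothetical.

```python
import math

def component(lam, q):
    """Return the matrix lam * q q' for one eigenvalue and its eigenvector."""
    return [[lam * qi * qj for qj in q] for qi in q]

s = 1 / math.sqrt(2)
# Eigenpairs of [[2, 1], [1, 2]], ordered from the largest eigenvalue.
eigenpairs = [(3.0, [s, s]), (1.0, [s, -s])]

components = [component(lam, q) for lam, q in eigenpairs]
total = [
    [sum(c[i][j] for c in components) for j in range(2)]
    for i in range(2)
]
print(components[0])  # component related to the largest eigenvalue
print(total)          # sum of all components, which rebuilds the input matrix
```

Keeping only the components with the K largest eigenvalues, as the # Spectral Matrices option does, gives the dominant part of the matrix.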

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Nodeset: At first, you should select a Sub Nodeset containing 2-

mode Network you want to analyze.

 Main process
# Spectral Matrices: Among n matrices, only K matrices (matrices
with largest, second-largest,.., K-th largest lambda value) are

reported.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Spectral Decomposition’ module, Main Report, Eigenvalues Vectors and Spectral Matrices are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Eigenvalues vector
Eigenvalues computed from input matrix are presented.

Matrix i: component related to i-th largest eigenvalue.


 Time Complexity
 O(n^3)

 Reference
 Ingwer Borg, Patrick Groenen. Modern Multidimensional Scaling (Theory and Applications),
Springer. 118 page.

 Related Topics


Statistics >> Covariance Matrix

 Menu
Statistics >> Covariance Matrix

 Description
This module computes the covariance matrix between all pairs of row vectors, where each row vector corresponds to a main node.
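A covariance matrix between row vectors can be sketched in Python as follows. The sketch assumes the sample covariance (division by the number of columns minus one); the manual does not say whether NetMiner uses the sample or the population form. The function name and the example matrix are hypothetical.

```python
def row_covariance(matrix):
    """Return the covariance matrix between all pairs of rows.

    Assumption: sample covariance, i.e. division by (number of columns - 1).
    """
    k = len(matrix[0])
    means = [sum(row) / k for row in matrix]
    centered = [[x - m for x in row] for row, m in zip(matrix, means)]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (k - 1) for rj in centered]
        for ri in centered
    ]

# Hypothetical data: 3 main nodes (rows) x 3 sub nodes (columns).
matrix = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
cov = row_covariance(matrix)
for row in cov:
    print(row)
```

The diagonal holds the variance of each main node's row vector, and a negative off-diagonal value, as between the first and third rows here, indicates rows that move in opposite directions.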

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

2-mode Network: First, select the Sub Nodeset containing the 2-mode Network you want to analyze. Then, select a 2-mode Network. Only one 2-mode Network can be chosen at a time.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Covariance Matrix’ statistics module, Main Report and Covariance Matrix are

created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Main Report presents information of process and data only.

 Tables
Covariance Matrix

 Time Complexity
 O(n^3)

 Reference

 Related Topics


Statistics >> Principal Component

 Menu
Statistics >> Principal Component

 Description
Principal Component Analysis is a statistical technique that transforms several variables into uncorrelated variables. The first principal component is the linear combination of the variables with maximum variance. The second principal component is the linear combination of the variables with maximum variance among those uncorrelated with the first principal component. It is based on "eigen analysis" of the correlation matrix of the input variables.

When a 2-mode Network is given as input, this module computes the covariance matrix of the row vectors corresponding to main nodes, and then performs principal component analysis on that covariance matrix. When a 1-mode Network is given as input, this module assumes that the 1-mode Network is a covariance matrix and performs principal component analysis on it. Not every 1-mode Network is a proper covariance matrix. (Only a positive definite 1-mode Network can be used as a covariance matrix for principal component analysis.) So, take care when you use a 1-mode Network as input data.

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

2-mode Network: First, select the Sub Nodeset containing the 2-mode Network you want to analyze. Then, select a 2-mode Network. Only one 2-mode Network can be chosen at a time.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.

 Pre-process
Symmetrize: You should symmetrize your data before running
module. By symmetrizing, directed/asymmetric data is transformed

to undirected/symmetric data. And if you symmetrize your data,

algorithm will perform faster.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Principal

Component’ analysis, Main Report, Covariance Matrix, Eigenvalues

Table, Principal Components Table and Y are created.

 Outputs

Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Total Variance is reported.


 Tables

Covariance Matrix
Covariance matrix is computed from given matrix.

Eigenvalues

Principal Component

 Time Complexity
 O(n^3)


 Reference

 Related Topics
Statistics >> Covariance Matrix


Statistics >> Factor Analysis

 Menu
Statistics >> Factor Analysis

 Description
Factor analysis aims to describe, if possible, the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. The model used here is the orthogonal factor model, in which all factors are orthogonal, and the factors are estimated using the principal component method.

When a 2-mode Network is provided as input, the module calculates the covariance matrix of the row vectors of main nodes and then performs factor analysis on that covariance matrix. If a 1-mode Network is provided as input, the module assumes that the input is already a covariance matrix and performs factor analysis on it. However, not every 1-mode Network is a proper covariance matrix. (Only a positive definite 1-mode Network can be used as a covariance matrix.) Therefore, caution is needed when performing factor analysis with a 1-mode Network as input data.

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be chosen.

2-mode Network: First, select the Sub Nodeset containing the 2-mode Network you want to analyze. Then, select a 2-mode Network. Only one 2-mode Network can be chosen at a time.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.

 Pre-process
Symmetrize: You should symmetrize your data before running
module. By symmetrizing, directed/asymmetric data is transformed to

undirected/symmetric data. And if you symmetrize your data,

algorithm will perform faster.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Factor Analysis’

statistics module, Main Report, Covariance Matrix, Residual Matrix,

Eigenvalues Table, Factor Loadings Table and Communality Table are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Total communality: Communality is the proportion of the variance of the observed variables that is explained by the factors.


 Tables
Covariance Matrix

Residual Matrix

Eigenvalues Table

Factor Loadings Table

Communality Table


 Time Complexity
 O(n^3)

 Reference

 Related Topics


Statistics >> Frequency >> Vector

 Menu
Statistics >> Frequency >> Vector

 Description
This module counts how many times a specific value appears in a vector.

 User Options

 Input
Select Vector: Select a Main Node Attribute to count values.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Frequency >>

Vector’ module, Main Report, Frequency Vector and Pie Chart are

created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data count: Missing value information is presented.

- Mode: Most frequent value

- IQV: Index of qualitative variation, a measure of dispersion for nominal variables. A value of 0 means that the distribution has no diversity at all, whereas a value of 1 means that the distribution is maximally diverse.
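The Mode and IQV entries can be sketched in Python as follows. The sketch uses the common textbook form IQV = K / (K - 1) * (1 - sum of squared proportions), with K taken as the number of distinct observed values; the manual does not give NetMiner's exact formula, so both the formula and the choice of K are assumptions. The function name and the example vectors are hypothetical.

```python
from collections import Counter

def iqv(values):
    """Index of qualitative variation of a nominal vector (assumed formula)."""
    counts = Counter(values)
    k = len(counts)  # number of distinct observed values
    if k < 2:
        return 0.0   # a single category has no diversity
    n = len(values)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return (k / (k - 1)) * (1.0 - sum_sq)

# Hypothetical attribute vectors.
even = ["a", "a", "b", "b"]
skewed = ["a", "a", "a", "b"]
uniform = ["a", "a", "a"]

mode_of_skewed = Counter(skewed).most_common(1)[0][0]
print(iqv(even), iqv(skewed), iqv(uniform), mode_of_skewed)
```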

- Result of Frequency: Values of the vector and frequency, proportion, cumulative proportion of each

value are reported.

 Tables
Frequency Vector
For each value of selected vector, frequency is presented.


 Charts
Pie Chart

 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> Frequency >> Matrix

 Menu
Statistics >> Frequency >> Matrix

 Description
This module counts how many times a specific value appears in a matrix.

 User Options

 Input
1-mode Network: Select one and only one 1-mode Network.

2-mode Network: First, select a Sub Nodeset containing 2-mode


Network to be analyzed. Then, select the 2-mode Network. Only one

2-mode Network can be chosen at a time.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), you should decide how to merge them into a single link.

 Main process
Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Frequency >>

Matrix’ module, Main Report, Frequency Vector and Pie Chart are

created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data count: Missing value information is presented.

- Mode: Most frequent weight.

- IQV: Index of qualitative variation, a measure of dispersion for nominal variables. A value of 0 means that the distribution has no diversity at all, whereas a value of 1 means that the distribution is maximally diverse.

- Result of Frequency: Weights of the network and frequency, proportion, cumulative proportion of

each weight are reported.

 Tables
Frequency Vector
For each weight of selected network, frequency is presented.


 Charts
Pie Chart

 Time Complexity
 O(n^2)

 Reference

 Related Topics


Statistics >> Gini Coefficient >> Vector

 Menu
Statistics >> Gini Coefficient >> Vector

 Description
The Gini coefficient is a measure of inequality developed by the Italian statistician Corrado Gini. This module applies it to the inequality of link degrees, or of any other attribute the nodes possess. The coefficient takes a value between 0 and 1, where 0 represents perfect equality (everyone has the same possessions) and 1 represents perfect inequality (one person has all the possessions and everyone else possesses nothing).
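As a rough illustration of the definition, the coefficient and the points of the Lorenz curve can be computed from a sorted list of values. The sketch below uses one standard closed form for the sample Gini coefficient; the function names are hypothetical and NetMiner's exact formula (for example, any small-sample correction) is not stated in this reference.

```python
def gini(values):
    """Sample Gini coefficient of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n on sorted data
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def lorenz_points(values):
    """Cumulative share of nodes versus cumulative share of the total value."""
    xs = sorted(values)
    total = sum(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points

equal = gini([3, 3, 3, 3])          # everyone holds the same amount
concentrated = gini([0, 0, 0, 10])  # one node holds everything
```

With this uncorrected form, a finite sample of n values reaches at most (n - 1) / n, so the fully concentrated four-node example gives 0.75 rather than exactly 1.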

 User Options

 Input
Select Vector: Select a Main Node Attribute to compute Gini coefficient.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘Gini Coefficient >> Vector’ analysis creates a Main Report, an Accumulation Table and a Lorenz Curve.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data count: Missing value information is reported.

- Gini Coefficient: Gini Coefficient for the selected vector is reported.

 Tables
Gini Coefficient Accumulation Table
- Upper: the proportion of the total vector value held by the top x% of nodes.

- Lower: the proportion of the total vector value held by the bottom x% of nodes.


 Charts
Lorenz Curve

 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> Gini Coefficient >> Matrix

 Menu
Statistics >> Gini Coefficient >> Matrix

 Description
The Gini coefficient is a measure of inequality developed by the Italian statistician Corrado Gini. This module applies it to the inequality of the link weights in matrix data. The coefficient takes a value between 0 and 1, where 0 represents perfect equality (everyone has the same possessions) and 1 represents perfect inequality (one person has all the possessions and everyone else possesses nothing).

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

2-mode Network: At first, you should select a Sub Nodeset


containing 2-mode Network you want to analyze. Then, select a 2-

mode Network. You can choose just one 2-mode Network at once.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), decide how to merge them into a single link.


 Main process
Diagonal Handling Option: For ‘retain’ option, diagonal values will
be included for the operation. Upon ‘ignore’ option, diagonal values

will be excluded from the computation.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Gini Coefficient

>> Matrix’ statistics module, Main Report, Accumulation Table and

Lorenz Curve are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data count: Missing value information is presented.

- Gini Coefficient: Gini Coefficient of selected network.

 Tables
Gini Coefficient Accumulation Table
- Upper: the proportion of vector values that x% upper nodes have.

- Lower: the proportion of vector values that x% lower nodes have.


 Charts
Lorenz Curve

 Time Complexity
 O(n^2)

 Reference

 Related Topics


Statistics >> Power Law >> Vector

 Menu
Statistics >> Power Law >> Vector

 Description
NetMiner estimates the best-fitting parameters of a power-law model. Traditionally, regression analysis has been used to fit power-law models, but the risks of that approach are described by Clauset et al. (see Reference). Read the reference paper if you want to understand this module completely.

NetMiner uses a stricter maximum-likelihood approach together with a goodness-of-fit test.

- 'alpha': The estimate of 'alpha' in the power-law model p(x) = C * x^-alpha (x >= x_min)

- 'x_min': The estimate of 'x_min' in the power-law model p(x) = C * x^-alpha (x >= x_min)

- Kolmogorov-Smirnov Statistic: a goodness-of-fit statistic that measures how well the model fits the input data.
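The three quantities above can be illustrated for the continuous case with the maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)) and a Kolmogorov-Smirnov distance between the empirical and fitted distributions of the tail. This is a minimal sketch under those assumptions; the function names are hypothetical, the discrete model and the p-value computation are not covered, and NetMiner's own implementation may differ in detail.

```python
import math

def fit_alpha_continuous(data, x_min):
    """Maximum-likelihood exponent for the continuous model, x >= x_min."""
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

def ks_statistic(data, x_min, alpha):
    """Largest gap between the empirical CDF and the fitted CDF on the tail."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # Compare with the empirical CDF just before and just after x.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst

def best_x_min(data):
    """Try every unique value as lower bound and keep the smallest KS distance."""
    best = None
    # The largest value is skipped: its tail would have a zero log-sum.
    for x_min in sorted(set(data))[:-1]:
        alpha = fit_alpha_continuous(data, x_min)
        ks = ks_statistic(data, x_min, alpha)
        if best is None or ks < best[2]:
            best = (x_min, alpha, ks)
    return best

sample = [1.0, math.e, math.e ** 2]  # tiny hypothetical data set
x_min, alpha, ks = best_x_min(sample)
```

Scanning all unique values corresponds to the 'All Unique Values' choice of the Lower Bound Test option; a sampled subset of candidates would trade accuracy for speed.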

 User Options

 Input
Select Vector: select a Main Node Attribute.

 Main process

Model: Choose between “discrete” and “continuous”. With the “discrete” option, the module assumes that the data contains only discrete values, and the algorithm takes more time. A continuous distribution is the usual basis of power-law modeling, but take care when deciding whether the “continuous” option suits your data. (Please refer to the reference paper for more information.)

Lower Bound Test: NetMiner tests which lower bound (x_min) fits the data best. If ‘All Unique Values’ is selected, NetMiner tests every unique value, which takes longer when the data is large. The ‘Sampling’ option reduces the time required for the test. To specify a particular lower bound yourself, select ‘User Defined’.

Goodness-of-fit Test: This option tests the goodness of fit of the model. Because it takes a long time, use it with caution. The p-value of the power-law model is reported as the result of the goodness-of-fit test.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Power Law >> Vector’

statistics module, Main Report and Log-Log Plot are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Alpha, X_min and the Kolmogorov-Smirnov Statistic are reported. If the goodness-of-fit test option is checked, the p-value is also reported.


 Charts
Log-Log Plot

 Time Complexity
 O(n)

 Reference
 Clauset, A., Shalizi, C. R. and Newman, M. E. J., 2009, 'Power-law distributions in empirical data', SIAM Review 51(4), 661-703.

 Related Topics


Statistics >> Power Law >> Matrix

 Menu
Statistics >> Power Law >> Matrix

 Description
NetMiner estimates the best-fitting parameters of a power-law model. Traditionally, regression analysis has been used to fit power-law models, but the risks of that approach are described by Clauset et al. (see Reference). Read the reference paper if you want to understand this module completely.

NetMiner uses a stricter maximum-likelihood approach together with a goodness-of-fit test.

- 'alpha': The estimate of 'alpha' in the power-law model p(x) = C * x^-alpha (x >= x_min)

- 'x_min': The estimate of 'x_min' in the power-law model p(x) = C * x^-alpha (x >= x_min)

- Kolmogorov-Smirnov Statistic: a goodness-of-fit statistic that measures how well the model fits the input data.

 User Options

 Input
1-mode Network: Select a 1-mode Network. You can choose just
one 1-mode Network.

- Link Merge: When the selected data contains multiple links (two or more links connecting the same source node and target node pair), decide how to merge them into a single link.

 Pre-process
Dichotomize: You can dichotomize your data before running module.
By dichotomizing, weighted/valued data is transformed to

unweighted/binary data.


 Main process

Model: Select discrete or continuous. If you select discrete, the program assumes that the data contains only discrete values, and the algorithm takes more time. A continuous distribution is the usual basis of power-law modeling, but take care when deciding whether the continuous option suits your data. (Read the reference paper for more information.)

Direction: Choose either “In-Degree” or “Out-Degree”.

Lower Bound Test: NetMiner tests which lower bound (x_min) fits the data best. If ‘All Unique Values’ is selected, NetMiner tests every unique value, which takes longer when the data is large. The ‘Sampling’ option reduces the time required for the test. To specify a particular lower bound yourself, select ‘User Defined’.

Goodness-of-fit Test: This option tests the goodness of fit of the model. Because it takes a long time, use it with caution. The p-value of the power-law model is computed from the result of the goodness-of-fit test.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Power Law >>

Matrix’ statistics module, Main Report and Log-Log Plot are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
Alpha, X_min and the Kolmogorov-Smirnov Statistic are reported. If the goodness-of-fit test option is checked, the p-value is also reported.

 Charts
Log-Log Plot

 Time Complexity
 O(n^2)


 Reference
 Clauset, A., Shalizi, C. R. and Newman, M. E. J., 2009, 'Power-law distributions in empirical data', SIAM Review 51(4), 661-703.

 Related Topics


Statistics >> Descriptives >> Vector

 Menu
Statistics >> Descriptives >> Vector

 Description
This module computes the mean, minimum, maximum, variance and standard deviation of the chosen vector.

 User Options

 Input
Select Vector: Select a Main Node Attribute to compute mean,
minimum, maximum, variance and standard deviation.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Descriptives’

analysis, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Descriptives of the selected vector are reported.


 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> Descriptives >> Matrix

 Menu
Statistics >> Descriptives >> Matrix

 Description
This module computes the mean, minimum, maximum, variance and standard deviation of the values of a chosen matrix.

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

2-mode Network: At first, you should select a Sub Nodeset


containing 2-mode Network you want to analyze. Then, select a 2-

mode Network. You can choose just one 2-mode Network at once.

- Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), decide how to merge them into a single link.

 Main process
Diagonal Handling Option: For ‘retain’ option, diagonal values will
be included for the operation. Upon ‘ignore’ option, diagonal values

will be excluded from the computation.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Descriptives’

module, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Descriptives of the selected network are reported.

 Time Complexity
 O(n^2)

 Reference

 Related Topics


Statistics >> Crosstabs >> Vector

 Menu
Statistics >> Crosstabs >> Vector

 Description
This module forms a two-way table from two variables and gives some statistics.

 User Options

 Input

Row Vector: A vector that will be used as row

Column Vector: A vector that will be used as column

 Post-process
Significance Test
- Classical: A parametric significance test. Statistics such as chi-square, degrees of freedom and the p-value are calculated under the assumption that there is no association between the row variable and the column variable.

- Permutation: This test generates randomly permuted vectors or matrices from the original vector or matrix, respectively. A matrix is permuted by node permutation, i.e., the row permutation is the same as the column permutation.

For example :

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3

1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the results is reported.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Crosstabs’

module, Main Report and Crosstable are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report

- Data Count: Missing value information is reported.

- chi square: 0 means no association. The larger the value, the stronger the association. Its value depends on sample size.

- Cramer’s V: A statistic that normalizes chi square. 0 means there is no association. It cannot be greater than 1, which indicates complete association.

- Normality Test

If classical significance test is used:

– chi-square

– degree of freedom

– p-value

If permutation significance test is used:

– Observed value: chi square value in the original vector

– Expected(mean): Average of chi square values in the permuted vectors

– Std. Dev.: Standard Deviation of chi square values in the permuted vectors

– P (>= Obs.): # permuted vectors whose chi square is greater than or equal to the observed value.

– P (<= Obs.): # permuted vectors whose chi square is less than or equal to the observed value.
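For orientation, chi-square, its degrees of freedom and Cramer's V can be computed from two categorical vectors as sketched below. The function name and sample vectors are hypothetical, and the standard Pearson chi-square and Cramer's V formulas are assumed; the reference does not spell out NetMiner's exact formulas or its p-value computation, which is omitted here.

```python
import math
from collections import Counter

def crosstab_stats(rows, cols):
    """Return (chi-square, degrees of freedom, Cramer's V) for two vectors."""
    n = len(rows)
    cell = Counter(zip(rows, cols))   # observed counts of the two-way table
    row_tot = Counter(rows)
    col_tot = Counter(cols)
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (cell[(r, c)] - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    k = min(len(row_tot), len(col_tot))
    cramers_v = math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0
    return chi2, dof, cramers_v

# Perfect association: the row category determines the column category.
perfect = crosstab_stats(["a", "a", "b", "b"], ["x", "x", "y", "y"])
# No association: every cell holds exactly its expected count.
none = crosstab_stats(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

The two examples show why Cramer's V is easier to compare across data sets than chi-square: the perfect case yields V = 1 regardless of how many cases are tabulated, while chi-square itself grows with the sample size.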


 Tables
Crosstable

 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> Crosstabs >> Matrix

 Menu
Statistics >> Crosstabs >> Matrix

 Description
This module forms a two-way table using 2 variables and gives some statistics.

 User Options

 Input

Select Row Matrix: A matrix that will be used as row

Select Column Matrix: A matrix that will be used as column

 Main process

Vectorization Method: Choose either “Use entire cell values” or


“Use upper triangular values”.

Diagonal Handling Option: For ‘retain’ option, diagonal values will


be included for the operation. Upon ‘ignore’ option, diagonal values

will be excluded from the computation.


 Post-process
Significance Test
- Classical: Parametric significance test. Several statistics like chi-

square, degree of freedom, p-value (with the assumption that there is

not any association between row variable and column variable) would

be calculated.

- Permutation: This test generates randomly permuted vectors (or matrices) from original vector (or

matrix). Matrix is permuted by node permutation, i.e, row permutation is same as column

permutation.

For example :

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3

1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the results is reported.
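The node permutation in the example above applies one reordering to both the rows and the columns. A minimal Python sketch (the helper names are hypothetical) reproduces the two permuted matrices shown:

```python
import random

def permute_nodes(matrix, order):
    """Reorder rows and columns with the same node ordering."""
    return [[matrix[i][j] for j in order] for i in order]

def random_node_permutation(matrix, rng):
    """Draw one random node ordering and apply it to the matrix."""
    order = list(range(len(matrix)))
    rng.shuffle(order)
    return permute_nodes(matrix, order)

matrix = [
    [5, 1, 2, 3],
    [7, 4, 4, 3],
    [1, 4, 0, 2],
    [4, 2, 1, 6],
]

# Nodes 1 and 2 exchanged (0-based indices 0 and 1).
swapped_12 = permute_nodes(matrix, [1, 0, 2, 3])
# Nodes 1 and 3 exchanged, and nodes 2 and 4 exchanged.
swapped_13_24 = permute_nodes(matrix, [2, 3, 0, 1])
# One random permutation, with a seeded generator for repeatability.
shuffled = random_node_permutation(matrix, random.Random(0))
```

Because rows and columns move together, diagonal cells stay on the diagonal and each link keeps its pair of end nodes; only the node labels change.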


 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Crosstabs’ module,

Main Report and Crosstable are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- chi square: 0 means no association. The larger the value, the stronger the association. Its value depends on sample size.

- Cramer’s V: A statistic that normalizes chi square. 0 means there is no association. It cannot be greater than 1, which indicates complete association.

- Normality Test

If classical significance test is used:

– chi-square

– degree of freedom

– p-value

If permutation significance test is used :

– Observed value: chi square value in the original matrix

– Expected(mean): Average of chi square values in the permuted matrices

– Std. Dev.: Standard Deviation of chi square values in the permuted matrices

– P (>= Obs.): # permuted matrices whose chi square is greater than or equal to the observed value.

– P (<= Obs.): # permuted matrices whose chi square is less than or equal to the observed value.


 Tables
Crosstable

 Time Complexity
 O(n^2)

 Reference

 Related Topics


Statistics >> ANOVA >> Vector

 Menu
Statistics >> ANOVA >> Vector

 Description
The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative

dependent variable by a single factor (independent) variable. Analysis of variance is used to test the

hypothesis that several means are equal.
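As a sketch of what the procedure computes, the F ratio and eta-square follow from the between-group and within-group sums of squares. The function name and data below are hypothetical, and the textbook one-way ANOVA formulas are assumed.

```python
def one_way_anova(values, groups):
    """Return (F ratio, eta-square) for a one-way layout."""
    n = len(values)
    grand_mean = sum(values) / n
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    ss_between = 0.0
    ss_within = 0.0
    for members in by_group.values():
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in members)
    df_between = len(by_group) - 1
    df_within = n - len(by_group)
    # Assumes some within-group variation; otherwise F is undefined.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_square = ss_between / (ss_between + ss_within)
    return f_ratio, eta_square

# Dependent (interval) values and an independent (categorical) attribute.
values = [1, 2, 3, 4, 5, 6]
groups = ["a", "a", "a", "b", "b", "b"]
f_ratio, eta_square = one_way_anova(values, groups)
```

A large F ratio means the group means differ by more than the spread inside the groups would suggest; eta-square expresses the same comparison as the share of total variation explained by the grouping.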

 User Options

 Input
Dependent Variable (Interval): Select a Main Node Attribute data,
which should be an interval variable.

Independent Variable (Categorical): Select a Main Node Attribute,


which should be a categorical variable.

 Post-process
Significance Test
- Classical: A parametric significance test. Statistics such as the F ratio and the p-value are calculated under the assumption that the group means are equal.

- Permutation: A nonparametric significance test. It generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the row permutation is the same as the column permutation.

For example :

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3

1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the results is reported.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘ANOVA’ module,

Main Report and Box Plot are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

- Data Count: Missing value information is reported.

- Result of ANOVA Vector

- Normality Test:

If classical significance test is used:

– Eta-Square

– F ratio

– p-value

If permutation significance test is used:

– Observed value: F ratio in the original vector

– Expected(mean): Average of F ratio values in the permuted vectors

– Std. Dev.: Standard Deviation of F ratio values in the permuted vectors

– P (>= Obs.): # permuted vectors whose F ratio is greater than or equal to the observed value.

– P (<= Obs.): # permuted vectors whose F ratio is less than or equal to the observed value.
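The permutation report can be mimicked with a generic helper that shuffles the group labels and recomputes a statistic. Everything named here is hypothetical. To keep the sketch short, the statistic is the between-group sum of squares, which rises and falls together with the F ratio when the values are fixed, and the two P entries are returned as proportions of the permutations rather than raw counts.

```python
import random

def between_group_ss(values, groups):
    """Between-group sum of squares (monotone in F for fixed values)."""
    grand_mean = sum(values) / len(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return sum(len(m) * (sum(m) / len(m) - grand_mean) ** 2
               for m in by_group.values())

def permutation_test(statistic, values, groups, iterations=1000, seed=0):
    """Shuffle group labels and summarize the permuted statistics."""
    rng = random.Random(seed)
    observed = statistic(values, groups)
    shuffled = list(groups)
    results = []
    for _ in range(iterations):
        rng.shuffle(shuffled)
        results.append(statistic(values, shuffled))
    mean = sum(results) / iterations
    std = (sum((r - mean) ** 2 for r in results) / iterations) ** 0.5
    return {
        "observed": observed,
        "expected": mean,
        "std_dev": std,
        "p_ge": sum(r >= observed for r in results) / iterations,
        "p_le": sum(r <= observed for r in results) / iterations,
    }

report = permutation_test(between_group_ss,
                          [1, 2, 3, 4, 5, 6],
                          ["a", "a", "a", "b", "b", "b"])
```

A small value of `p_ge` indicates that few random relabelings separate the groups as strongly as the observed grouping does.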


 Charts
Box Plot

 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> ANOVA >> Matrix

 Menu
Statistics >> ANOVA >> Matrix

 Description
The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative

dependent variable by a single factor (independent) variable. Analysis of variance is used to test the

hypothesis that several means are equal.

 User Options

 Input
Dependent Variable (Interval): Select a 1-mode network data,
which should be an interval variable.

Independent Variable (Categorical): Select a 1-mode network


data, which should be a categorical variable.

 Pre-Process
Vectorization Method: You must choose a vectorization option, either entire cell values or upper triangular values.

Diagonal Handling Option: For ‘retain’ option, diagonal values will


be included for the operation. Upon ‘ignore’ option, diagonal values

will be excluded from the computation.


 Post-process
Significance Test
- Classical: A parametric significance test. Statistics such as the F ratio and the p-value are calculated under the assumption that the group means are equal.

- QAP: A nonparametric significance test. This permutation significance test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the row permutation is the same as the column permutation.

For example:

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3

1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the results is reported.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘ANOVA’ module,

Main Report and Box Plot are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output

Window.

 Reports
Main Report

- Data Count: Missing value information is reported.

- Result of ANOVA Vector

- Normality Test:

If classical significance test is used :

– Eta-Square

– F ratio

– p-value

If permutation significance test is used :

– Observed value: F ratio in the original matrix

– Expected(mean): Average of F ratio values in the permuted matrices


– Std. Dev.: Standard Deviation of F ratio values in the permuted matrices

– P (> Obs.): # permuted matrices whose F ratio is greater than the observed value.

– P (= Obs.): # permuted matrices whose F ratio is same as the observed value.

– P (< Obs.): # permuted matrices whose F ratio is less than the observed value.

 Charts
Box Plot

 Time Complexity
 O(n)

 Reference

 Related Topics


Statistics >> Correlation >> Vector

 Menu
Statistics >> Correlation >> Vector

 Description
This module computes various correlation coefficients between pairs of attribute variables.

 User Options

 Input
Select Variables: the variables among which correlations are computed.

 Main process

Proximity Measures
These measure options determine how to compare row profiles. NetMiner provides various measures for comparing two vectors, classified into three major categories: Match, Correlation, and Distance.

1) Match measures check whether the corresponding values of two vectors are identical. Only binary vectors are allowed as input for measures in the ‘Match’ category. For most match measures the result ranges from 0 to 1, where a value closer to 1 indicates higher similarity between the two subjects.

2) In the Correlation category, when two vectors (each representing a subject) are compared, the greater the correlation value, the more similar the two subjects are.

3) In the Distance category, when two vectors (each representing a subject) are compared, the greater the distance value, the more dissimilar the two subjects are.


- Match

For selected two nodes’ row profiles R=(R_1, R_2, …, R_n) and S=(S_1, S_2, …, S_n),

a: The number of i with R_i =1 and S_i = 1

b: The number of i with R_i =1 and S_i = 0

c: The number of i with R_i =0 and S_i = 1

d: The number of i with R_i =0 and S_i = 0

Jaccard coefficient = $\frac{a}{a+b+c}$

Ochiai = $\frac{a}{\{(a+b)(a+c)\}^{1/2}}$

Czekanowski, Sorensen, Dice = $\frac{2a}{2a+b+c}$

Russel, Rao = $\frac{a}{a+b+c+d}$

Simpson = $\frac{a}{\min\{(a+b),(a+c)\}}$

Braun, Blanque = $\frac{a}{\max\{(a+b),(a+c)\}}$

Kulczynski1 = $\frac{a}{b+c}$

Kulczynski2 = $\frac{1}{2}\left(\frac{a}{a+b}+\frac{a}{a+c}\right)$

Equivalence Index = $\left(\frac{C_{ij}}{C_i}\right)\left(\frac{C_{ij}}{C_j}\right)$

Sokal, Sneath, Anderberg = $\frac{a}{a+2(b+c)}$

Mountford = $\frac{2a}{a(b+c)+2bc}$

Simple Matching = $\frac{a+d}{a+b+c+d}$

Yule = $\frac{ad-bc}{ad+bc}$

Phi = $\frac{ad-bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}$

Hamman = $\frac{(a+d)-(b+c)}{a+b+c+d}$

Mozley, Margalef = $\frac{a(a+b+c+d)}{(a+b)(a+c)}$

Roger, Tanimoto = $\frac{a+d}{a+2b+2c+d}$

Michael = $\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}$
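The counts a, b, c and d defined for the Match category can be tallied directly from two binary vectors. The sketch below uses hypothetical function names and the standard definitions of a few of the listed measures (Jaccard, Dice, Simple Matching, Russel-Rao, Yule and Hamman).

```python
def match_counts(r, s):
    """Return (a, b, c, d) for two binary vectors of equal length."""
    a = sum(1 for x, y in zip(r, s) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(r, s) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(r, s) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(r, s) if x == 0 and y == 0)
    return a, b, c, d

def match_measures(r, s):
    """A few match measures; assumes the denominators are non-zero."""
    a, b, c, d = match_counts(r, s)
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),
        "dice": 2 * a / (2 * a + b + c),
        "simple_matching": (a + d) / n,
        "russel_rao": a / n,
        "yule": (a * d - b * c) / (a * d + b * c),
        "hamman": ((a + d) - (b + c)) / n,
    }

r = [1, 1, 0, 0, 1]
s = [1, 0, 1, 0, 1]
counts = match_counts(r, s)
measures = match_measures(r, s)
```

Jaccard and Dice ignore d (joint absences), whereas Simple Matching and Hamman count joint absences as agreement, so the choice matters most for sparse profiles.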

- Correlation

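$C_{ik}$: k-th element of the profile vector that represents subject i; $\bar{C}_i$: mean of that profile vector.

Pearson’s Correlation = $\frac{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)(C_{jk}-\bar{C}_j)}{\sqrt{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)^2}\,\sqrt{\sum_{k=1}^{n}(C_{jk}-\bar{C}_j)^2}}$

Cosine Similarity = $\frac{\sum_{k=1}^{n} C_{ik} C_{jk}}{\sqrt{\sum_{k=1}^{n} C_{ik}^2}\,\sqrt{\sum_{k=1}^{n} C_{jk}^2}}$

Inner Product = $\sum_{k=1}^{n} C_{ik} C_{jk}$

Spearman’s rho = $1-\frac{6\sum_{k=1}^{n}(C_{ik}-C_{jk})^2}{n(n^2-1)}$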

- Distance

$C_{ik}$: k-th element of the profile vector that represents subject i.

Euclidean Distance = $\left\{\sum_k (C_{ik}-C_{jk})^2\right\}^{1/2}$

City Block Metric = $\sum_k |C_{ik}-C_{jk}|$

Minkowski Metric = $\left\{\sum_k w_k |C_{ik}-C_{jk}|^{\lambda}\right\}^{1/\lambda}$

Canberra Metric = $\sum_k \frac{|C_{ik}-C_{jk}|}{C_{ik}+C_{jk}}$

Bray-Curtis = $\frac{1}{p}\,\frac{\sum_k |C_{ik}-C_{jk}|}{\sum_k (C_{ik}+C_{jk})}$

Divergence = $\frac{1}{p}\sum_k \frac{(C_{ik}-C_{jk})^2}{(C_{ik}+C_{jk})^2}$

Soergel = $\frac{\sum_k |C_{ik}-C_{jk}|}{\sum_k \max(C_{ik}, C_{jk})}$

Bhattacharyya Distance = $\left\{\sum_k \left(C_{ik}^{1/2}-C_{jk}^{1/2}\right)^2\right\}^{1/2}$

Wave-Hedges = $\frac{1}{p}\sum_k \left(1-\frac{\min(C_{ik}, C_{jk})}{\max(C_{ik}, C_{jk})}\right)$

Parameter: used when Minkowski metric is selected.
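A few of the Correlation and Distance measures can be sketched as follows. The function names are hypothetical and the standard definitions are assumed; in particular, the Minkowski parameter is treated as the exponent (written `lam` here), so that 1 gives the City Block Metric and 2 gives the Euclidean Distance when all weights are 1.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine(x, y):
    """Cosine similarity: inner product divided by the two vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def minkowski(x, y, lam, weights=None):
    """Weighted Minkowski metric with exponent lam (assumed parameter)."""
    weights = weights or [1.0] * len(x)
    total = sum(w * abs(a - b) ** lam for w, a, b in zip(weights, x, y))
    return total ** (1.0 / lam)

def euclidean(x, y):
    return minkowski(x, y, 2)

def city_block(x, y):
    return minkowski(x, y, 1)

x = [1, 2, 3]
y = [2, 4, 6]
similarity = pearson(x, y)
distance = euclidean(x, y)
```

The example vectors are perfectly correlated yet far apart in distance terms, which illustrates why the two categories answer different questions: correlation compares the shape of two profiles, distance compares their magnitudes as well.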

 Post-process
Significance Test
- Classical: Parametric significance test. Several statistics like chi-

square, degree of freedom, p-value (with the assumption that there is

not any association between row variable and column variable) would

be calculated.

- Permutation: Nonparametric significance test. Permutation

Significance Test generates randomly permuted vectors (or matrices) from original vector (or matrix).

Matrix is permuted by node permutation, i.e, row permutation is same as column permutation.

For example :

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3


1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the results is reported.

# of Iterations

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Correlation >> Vector’ statistics module, Main Report, Correlation Table and

Correlation Significance are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports


Main Report
- Data Count: Missing value information is reported.

- Correlation Table

 Tables
Correlation Table

Significance Tables (P > Obs., P= Obs., P< Obs.)

 Time Complexity
 O(n)


 Reference

 Related Topics


Statistics >> Correlation >> Matrix

 Menu
Statistics >> Correlation >> Matrix

 Description
This module computes various correlation coefficients between pairs of adjacency variables (1-mode square matrices) by comparing their corresponding cells. In addition, the significance level of each correlation coefficient is assessed by a QAP permutation significance test.
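The idea of a QAP test can be sketched as follows: vectorize the two matrices, compute the correlation, then repeatedly permute the nodes of one matrix and recompute. The names, the iteration count, the tolerance used for equality and the choice to ignore the diagonal are all assumptions made for illustration; they are not taken from NetMiner.

```python
import math
import random

def off_diagonal(matrix):
    """Vectorize all cells except the diagonal ('ignore' diagonal handling)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def qap_correlation(m1, m2, iterations=500, seed=0):
    """Return (observed r, P > Obs., P = Obs., P < Obs.) as proportions."""
    rng = random.Random(seed)
    x = off_diagonal(m1)
    observed = pearson(x, off_diagonal(m2))
    order = list(range(len(m2)))
    greater = equal = less = 0
    for _ in range(iterations):
        rng.shuffle(order)
        # Same ordering for rows and columns: a node permutation.
        permuted = [[m2[i][j] for j in order] for i in order]
        r = pearson(x, off_diagonal(permuted))
        if abs(r - observed) < 1e-12:
            equal += 1
        elif r > observed:
            greater += 1
        else:
            less += 1
    return (observed, greater / iterations,
            equal / iterations, less / iterations)

network = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
observed, p_gt, p_eq, p_lt = qap_correlation(network, network)
```

Permuting nodes rather than individual cells keeps each permuted matrix a valid relabeling of the same network, so the row and column dependence among cells is preserved under the null distribution.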

 User Options

 Input
Select Variables: the variables among which correlations are computed.

 Pre-process

Vectorization Method: Choose either “Use entire cell values” or “Use


upper triangular values”.

Diagonal Handling Option: For ‘retain’ option, diagonal values will


be included for the operation. Upon ‘ignore’ option, diagonal values

will be excluded from the computation.

 Main process
These measure options determine how to compare row profiles. NetMiner provides various measures for comparing two vectors, classified into three major categories: Match, Correlation, and Distance.

1) Match measures check whether the corresponding values of two vectors are identical. Only binary vectors are allowed as input for measures in the ‘Match’ category. For most match measures the result ranges from 0 to 1, where a value closer to 1 indicates higher similarity between the two subjects.

2) In the Correlation category, when two vectors (each representing a subject) are compared, the greater the correlation value, the more similar the two subjects are.

3) In the Distance category, when two vectors (each representing a subject) are compared, the greater the distance value, the more dissimilar the two subjects are.

- Match

For selected two nodes’ row profiles R=(R_1, R_2, …, R_n) and S=(S_1, S_2, …, S_n),

a: The number of i with R_i =1 and S_i = 1

b: The number of i with R_i =1 and S_i = 0

c: The number of i with R_i =0 and S_i = 1

d: The number of i with R_i =0 and S_i = 0

Jaccard coefficient = $\frac{a}{a+b+c}$

Ochiai = $\frac{a}{\{(a+b)(a+c)\}^{1/2}}$

Czekanowski, Sorensen, Dice = $\frac{2a}{2a+b+c}$

Russel, Rao = $\frac{a}{a+b+c+d}$

Simpson = $\frac{a}{\min\{(a+b),(a+c)\}}$

Braun, Blanque = $\frac{a}{\max\{(a+b),(a+c)\}}$

Kulczynski1 = $\frac{a}{b+c}$

Kulczynski2 = $\frac{1}{2}\left(\frac{a}{a+b}+\frac{a}{a+c}\right)$

Equivalence Index = $\left(\frac{C_{ij}}{C_i}\right)\left(\frac{C_{ij}}{C_j}\right)$

Sokal, Sneath, Anderberg = $\frac{a}{a+2(b+c)}$

Mountford = $\frac{2a}{a(b+c)+2bc}$

Simple Matching = $\frac{a+d}{a+b+c+d}$

Yule = $\frac{ad-bc}{ad+bc}$

Phi = $\frac{ad-bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}$

Hamman = $\frac{(a+d)-(b+c)}{a+b+c+d}$

Mozley, Margalef = $\frac{a(a+b+c+d)}{(a+b)(a+c)}$

Roger, Tanimoto = $\frac{a+d}{a+2b+2c+d}$

Michael = $\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}$


- Correlation

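$C_{ik}$: k-th element of the profile vector that represents subject i; $\bar{C}_i$: mean of that profile vector.

Pearson’s Correlation = $\frac{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)(C_{jk}-\bar{C}_j)}{\sqrt{\sum_{k=1}^{n}(C_{ik}-\bar{C}_i)^2}\,\sqrt{\sum_{k=1}^{n}(C_{jk}-\bar{C}_j)^2}}$

Cosine Similarity = $\frac{\sum_{k=1}^{n} C_{ik} C_{jk}}{\sqrt{\sum_{k=1}^{n} C_{ik}^2}\,\sqrt{\sum_{k=1}^{n} C_{jk}^2}}$

Inner Product = $\sum_{k=1}^{n} C_{ik} C_{jk}$

Spearman’s rho = $1-\frac{6\sum_{k=1}^{n}(C_{ik}-C_{jk})^2}{n(n^2-1)}$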

- Distance

$C_{ik}$: k-th element of the profile vector that represents subject i.

Euclidean Distance = $\left\{\sum_k (C_{ik}-C_{jk})^2\right\}^{1/2}$

City Block Metric = $\sum_k |C_{ik}-C_{jk}|$

Minkowski Metric = $\left\{\sum_k w_k |C_{ik}-C_{jk}|^{\lambda}\right\}^{1/\lambda}$

Canberra Metric = $\sum_k \frac{|C_{ik}-C_{jk}|}{C_{ik}+C_{jk}}$

Bray-Curtis = $\frac{1}{p}\,\frac{\sum_k |C_{ik}-C_{jk}|}{\sum_k (C_{ik}+C_{jk})}$

Divergence = $\frac{1}{p}\sum_k \frac{(C_{ik}-C_{jk})^2}{(C_{ik}+C_{jk})^2}$

Soergel = $\frac{\sum_k |C_{ik}-C_{jk}|}{\sum_k \max(C_{ik}, C_{jk})}$

Bhattacharyya Distance = $\left\{\sum_k \left(C_{ik}^{1/2}-C_{jk}^{1/2}\right)^2\right\}^{1/2}$

Wave-Hedges = $\frac{1}{p}\sum_k \left(1-\frac{\min(C_{ik}, C_{jk})}{\max(C_{ik}, C_{jk})}\right)$

Parameter: used when Minkowski metric is selected.

 Post-process

Significance
- Classical: Parametric significance test. Several statistics like chi-square, degree of freedom, p-value (with the assumption that there is not any association between row variable and column variable) would be calculated.

- QAP: Nonparametric significance test. The permutation significance test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the row permutation is the same as the column permutation.

For example :

Vector : [1, 3, 5, 2]

Randomly Permuted Vectors : [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix :

5, 1, 2, 3

7, 4, 4, 3

1, 4, 0, 2

4, 2, 1, 6

If node 1, 2 are changed :

4, 7, 4, 3

1, 5, 2, 3

4, 1, 0, 2

2, 4, 1, 6

If node 1, 3 are changed and node 2, 4 are changed :

0, 2, 1, 4

1, 6, 4, 2

2, 3, 5, 1

4, 3, 7, 4

Statistics is performed to generated vectors (or matrices), and distribution of statistics

results would be given.
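The node permutation described above can be sketched in Python as follows. The helper names are hypothetical; the sketch only reproduces the two worked examples and shows how a random node permutation could be drawn, and it makes no claim about how NetMiner generates its permutations internally.

```python
import random

def permute_nodes(matrix, order):
    """Apply the same node order to rows and columns."""
    return [[matrix[r][c] for c in order] for r in order]

def random_node_permutation(matrix, rng=random):
    """Draw one random node permutation of a square matrix."""
    order = list(range(len(matrix)))
    rng.shuffle(order)
    return permute_nodes(matrix, order)

M = [[5, 1, 2, 3],
     [7, 4, 4, 3],
     [1, 4, 0, 2],
     [4, 2, 1, 6]]

# Nodes 1 and 2 swapped (0-based order [1, 0, 2, 3]).
swapped_12 = permute_nodes(M, [1, 0, 2, 3])

# Nodes 1 and 3 swapped and nodes 2 and 4 swapped (order [2, 3, 0, 1]).
swapped_13_24 = permute_nodes(M, [2, 3, 0, 1])
```

Because rows and columns move together, the diagonal entries stay on the diagonal and each link keeps its weight; only the node identities are reassigned.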

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Correlation >> Matrix’

statistics module, Main Report, Correlation Table and Correlation

Significance tables are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Correlation Table

 Tables
Correlation Table

Significance Tables (P > Obs., P= Obs., P< Obs.)

 Time Complexity
 O(n)


 Reference

 Related Topics


Statistics >> Autocorrelation >> Join-Count

 Menu
Statistics >> Autocorrelation >> Join-Count

 Description
Autocorrelation measures the extent to which similar values cluster in a network. This menu handles binary variables.

The calculation is simple. Three counts are reported: BB, BW, and WW. BB is the number of links whose two end nodes both have the value B for the selected attribute variable, BW is the number of links with one end node of value B and the other of value W, and WW is the number of links whose two end nodes both have the value W.

A significance test based on randomly permuting the selected attribute variable is also provided.
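The join-count idea can be sketched in Python as below. This is a hedged illustration, not NetMiner's code: the function names, the edge-list representation, and the use of the letters "B" and "W" as attribute values are assumptions made for the example. The permutation test shuffles the attribute values over the nodes and tallies how often each simulated count is greater than, equal to, or less than the observed count.

```python
import random

def join_counts(edges, value):
    """Count BB, BW and WW links for a binary node attribute."""
    counts = {"BB": 0, "BW": 0, "WW": 0}
    for u, v in edges:
        pair = value[u] + value[v]
        if pair == "BB":
            counts["BB"] += 1
        elif pair == "WW":
            counts["WW"] += 1
        else:
            counts["BW"] += 1
    return counts

def permutation_test(edges, value, iterations=1000, seed=0):
    """Shuffle the attribute over the nodes and compare with the observed counts."""
    rng = random.Random(seed)
    observed = join_counts(edges, value)
    nodes = list(value)
    shuffled = [value[n] for n in nodes]
    tally = {k: {"greater": 0, "equal": 0, "less": 0} for k in observed}
    for _ in range(iterations):
        rng.shuffle(shuffled)
        simulated = join_counts(edges, dict(zip(nodes, shuffled)))
        for k in observed:
            if simulated[k] > observed[k]:
                tally[k]["greater"] += 1
            elif simulated[k] == observed[k]:
                tally[k]["equal"] += 1
            else:
                tally[k]["less"] += 1
    return observed, tally

# A path 0-1-2-3 whose first two nodes are B and last two are W.
edges = [(0, 1), (1, 2), (2, 3)]
value = {0: "B", 1: "B", 2: "W", 3: "W"}
observed, tally = permutation_test(edges, value, iterations=200)
```

Each link is listed once in `edges`, which matches an undirected, unweighted network after symmetrizing and dichotomizing.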

 User Options

 Input

1-mode Network: This menu analyzes how similar the vector values of two connected nodes are.

Select Vector: Select a Main Node Attribute, which should be a binary variable.


 Pre-process

Dichotomize: This analysis requires an unweighted network, so dichotomize a weighted network before processing it.

Vector Dichotomize: This analysis requires a dichotomous attribute variable.

Symmetrize: This analysis requires a symmetric network, so symmetrize a directed network before processing it.

 Main process

Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.

 Post-process

Significance
- Permutation: Nonparametric significance test.

The Permutation Significance Test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the same permutation is applied to both rows and columns.

For example:

Vector: [1, 3, 5, 2]

Randomly Permuted Vectors: [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix:

5, 1, 2, 3
7, 4, 4, 3
1, 4, 0, 2
4, 2, 1, 6

If nodes 1 and 2 are swapped:

4, 7, 4, 3
1, 5, 2, 3
4, 1, 0, 2
2, 4, 1, 6

If nodes 1 and 3 are swapped and nodes 2 and 4 are swapped:

0, 2, 1, 4
1, 6, 4, 2
2, 3, 5, 1
4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the resulting values is reported.

# of Iterations

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Autocorrelation >> Join-

Count’ module, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Result of Autocorrelation Join-Count:

Observed: The number of (BB, BW, WW) links in the original matrix

Expected: Average number of (BB, BW, WW) links in the permuted matrices

Std. Dev.: Standard deviation of the number of (BB, BW, WW) links in the permuted matrices

P (> Obs.): Number of permuted matrices in which the number of (BB, BW, WW) links is greater than the observed value.

P (= Obs.): Number of permuted matrices in which the number of (BB, BW, WW) links is equal to the observed value.

P (< Obs.): Number of permuted matrices in which the number of (BB, BW, WW) links is less than the observed value.

 Time Complexity
 O(m)

 Reference
 Cliff, A D and Ord, J K. (1973) Spatial Autocorrelation. Pion, London.

 Related Topics


Statistics >> Autocorrelation >> Continuous

 Menu
Statistics >> Autocorrelation >> Continuous

 Description
This module measures the extent to which similar values cluster in a network. This menu handles continuous variables. Two methods are available: Moran’s I and Geary’s C.

1) Moran’s I typically ranges from -1 to 1. Uncorrelated data have an I of about 0. Negative values of I indicate negative autocorrelation (a node with a high value tends to be connected to a node with a low value, and vice versa), and positive values indicate positive autocorrelation (a node with a high value tends to be connected to a node with a high value, and likewise for low values).

2) Geary’s C typically ranges from 0 to 3. It cannot be negative. Uncorrelated data have an expected C of 1. Values less than 1 indicate positive autocorrelation, while values greater than 1 indicate negative autocorrelation.

Note: Moran’s I is computed from the covariance of the deviations from the mean over connected pairs, whereas Geary’s C is computed from the differences between the values of the two nodes of each connected pair. Moran’s I is therefore a more global indicator, while Geary’s C focuses on local neighborhoods.
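The two measures can be sketched in Python as follows. The formulas are the standard Cliff and Ord definitions, which is an assumption about what this module computes; the function names are hypothetical, the network is given as an adjacency matrix, and diagonal entries are ignored.

```python
def moran_i(adj, x):
    """Moran's I for node values x on a network with adjacency matrix adj."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_total = sum(adj[i][j] for i, j in pairs)
    cross = sum(adj[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / w_total) * cross / sum(d * d for d in dev)

def geary_c(adj, x):
    """Geary's C for node values x on a network with adjacency matrix adj."""
    n = len(x)
    mean = sum(x) / n
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_total = sum(adj[i][j] for i, j in pairs)
    sq_diff = sum(adj[i][j] * (x[i] - x[j]) ** 2 for i, j in pairs)
    return ((n - 1) / (2 * w_total)) * sq_diff / sum((v - mean) ** 2 for v in x)

# Symmetric adjacency matrix of the path 0-1-2-3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]

smooth = [1, 2, 3, 4]        # neighbors have similar values
alternating = [1, 0, 1, 0]   # neighbors have opposite values
```

On this path, the smooth vector gives a positive I and a C below 1, while the alternating vector gives a negative I and a C above 1, which matches the interpretation given in the description.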

 User Options

 Input

1-mode Network: This menu analyzes how similar the vector values of two connected nodes are.

Select Vector: Select a Main Node Attribute, which should be a continuous variable.


 Main process

Algorithm: Moran’s I or Geary’s C

Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.

 Post-process
Significance: Permutation based significance test.

 Output
You can select which outputs should be reported and which format the

outputs should be displayed in. In the result of ‘Autocorrelation >>

Continuous’ module, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count

- Result of Autocorrelation continuous:

Observed: observed value of selected measure

Expected: expected value of selected measure in permutation simulation

Std. Dev.: Standard Deviation of selected measure in permutation simulation


P (> Obs.): probability simulated measure is higher than observed value.

P (= Obs.): probability simulated measure is same as observed value.

P (< Obs.): probability simulated measure is lower than observed value.

 Time Complexity
 O(m)

 Reference
 Cliff, A D and Ord, J K. (1973) Spatial Autocorrelation. Pion, London.

 Related Topics


Statistics >> Regression >> Vector

 Menu
Statistics >> Regression >> Vector

 Description
This module regresses a dependent attribute variable on one or more independent attribute variables. R-square and the F-value are calculated for the regression model, and the regression coefficients and their standard errors are reported.
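As a hedged illustration of the quantities named above, the following Python sketch fits an ordinary least squares line with a single independent variable and derives R-square and the F-value. The one-predictor restriction and the function name are simplifications chosen for the example; the module itself accepts several independent variables.

```python
def simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x with R-square and F."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_square = 1 - ss_resid / ss_total
    # F-value with 1 and n - 2 degrees of freedom (one predictor).
    if ss_resid == 0:
        f_value = float("inf")
    else:
        f_value = (ss_total - ss_resid) / (ss_resid / (n - 2))
    return intercept, slope, r_square, f_value

intercept, slope, r_square, f_value = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
```

A permutation test would repeat the fit on shuffled copies of the dependent vector and compare the simulated statistics with the observed ones.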

 User Options

 Input

Select Dependent Variable: Select a Main Node Attribute data, which


should be used as the dependent variable.

Select Independent Variables: Select Main Node Attributes as the


independent variables. This option is activated when you select a

dependent variable.

 Pre-process
Standardization Option: By checking 'standardize' option, you can
standardize the input variables.

 Post-process
Significance
- Classical: Parametric significance test. Statistics such as chi-square, degrees of freedom, and p-value are calculated under the assumption that there is no association between the row variable and the column variable.

- Permutation: Nonparametric significance test.

The Permutation Significance Test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the same permutation is applied to both rows and columns.

For example:

Vector: [1, 3, 5, 2]

Randomly Permuted Vectors: [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix:

5, 1, 2, 3
7, 4, 4, 3
1, 4, 0, 2
4, 2, 1, 6

If nodes 1 and 2 are swapped:

4, 7, 4, 3
1, 5, 2, 3
4, 1, 0, 2
2, 4, 1, 6

If nodes 1 and 3 are swapped and nodes 2 and 4 are swapped:

0, 2, 1, 4
1, 6, 4, 2
2, 3, 5, 1
4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the resulting values is reported.


 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Regression >>

Vector’ module, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output

Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Analysis of Variance

If classical significance test is used: R-square, F-value and p-value (classical test) are reported.

If permutation significance test is used: R-square, F-value, p-value (classical test), P (> Obs.), P (=

Obs.), P (< Obs.) are reported.

- Parameter Estimates

If classical significance test is used: Regression Coefficient, standard error and p-value (in classical

test) are reported.

If permutation significance test is used: Regression Coefficient (as observed), Expected, Std. Dev., P

(> Obs.), P (= Obs.), P (< Obs.) are reported.


 Time Complexity
 O(nk^2 + k^3) where k = # vectors

 Reference

 Related Topics


Statistics >> Regression >> Matrix

 Menu
Statistics >> Regression >> Matrix

 Description
This module regresses a dependent adjacency variable (1-mode square matrix) on one or more independent adjacency variables (1-mode square matrices). R-square and the F-value are calculated for the regression model, and the regression coefficients and their standard errors are reported. A QAP permutation significance test of R-square and of each regression coefficient is also performed.

 User Options

 Input

Select Dependent Variable: Select a 1-mode Network data, which


should be used as the dependent variable.

Select Independent Variables: Select 1-mode Networks as the


independent variables. This option is activated when you select a

dependent variable.

 Main process

Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.

Standardization Option: By checking 'standardize' option, you can standardize the input variables.


 Post-process
Significance
- Classical: Parametric significance test. Statistics such as chi-square, degrees of freedom, and p-value are calculated under the assumption that there is no association between the row variable and the column variable.

- Permutation: Nonparametric significance test.

The Permutation Significance Test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the same permutation is applied to both rows and columns.

For example:

Vector: [1, 3, 5, 2]

Randomly Permuted Vectors: [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix:

5, 1, 2, 3
7, 4, 4, 3
1, 4, 0, 2
4, 2, 1, 6

If nodes 1 and 2 are swapped:

4, 7, 4, 3
1, 5, 2, 3
4, 1, 0, 2
2, 4, 1, 6

If nodes 1 and 3 are swapped and nodes 2 and 4 are swapped:

0, 2, 1, 4
1, 6, 4, 2
2, 3, 5, 1
4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the resulting values is reported.

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Regression >>

Matrix’ module, Main Report is created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report
- Data Count: Missing value information is reported.

- Analysis of Variance

If classical significance test is used: R-square, F-value and p-value (classical test) are reported.

If permutation significance test is used: R-square, F-value, p-value (classical test), P (> Obs.), P (=

Obs.), P (< Obs.) are reported.

- Parameter Estimates

If classical significance test is used: Regression Coefficient, standard error and p-value (in classical

test) are reported.

If permutation significance test is used: Regression Coefficient (as observed), Expected, Std. Dev., P

(> Obs.), P (= Obs.), P (< Obs.) are reported.


 Time Complexity
 O(n^2*k^2 + k^3) where k = # vectors

 Reference

 Related Topics


Statistics >> Logistic Regression >> Vector

 Menu
Statistics >> Logistic Regression >> Vector

 Description
The Logistic Regression module can be applied to node attributes (Vector version) and to the link weights of network data (Matrix version). The Vector version uses some attributes of the Main Node Set as independent variables and one attribute as the dependent variable, estimates the beta parameters, and predicts the values of the dependent variable from the estimated beta parameters.

 User Options

 Input

Select a Dependent Variable: Select a Main Node Attribute data,


which should be used as the dependent variable.

Select Independent Variables: Select Main Node Attributes as the


independent variables. This option is activated when you select a

dependent variable.

 Pre-process

Dependent Dichotomize: Dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into non-weighted/binary data.

Standardization: You can select the standardization option before analysis, since the scales of the independent variables may differ from one another.

 Main process

# of Iterations: Set the number of iterations. More iterations generally lead to better convergence of the estimates.

 Post-process
Significance
- Classical: Parametric significance test. Statistics such as chi-square, degrees of freedom, and p-value are calculated under the assumption that there is no association between the row variable and the column variable.

- Permutation: Nonparametric significance test.

The Permutation Significance Test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the same permutation is applied to both rows and columns.

For example:

Vector: [1, 3, 5, 2]

Randomly Permuted Vectors: [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix:

5, 1, 2, 3
7, 4, 4, 3
1, 4, 0, 2
4, 2, 1, 6

If nodes 1 and 2 are swapped:

4, 7, 4, 3
1, 5, 2, 3
4, 1, 0, 2
2, 4, 1, 6

If nodes 1 and 3 are swapped and nodes 2 and 4 are swapped:

0, 2, 1, 4
1, 6, 4, 2
2, 3, 5, 1
4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the resulting values is reported.

# of Iterations

 Output
You can select which outputs should be reported and which format

the outputs should be displayed in. In the result of ‘Logistic

Regression’ module, Main Report, Predicted Values Table, Residual

Values Table are created.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports

Main Report

- Data count: Missing value information is reported.

- -2*Loglikelihood: -2 * the log likelihood for the fitted model.

- Goodness of Fit: A Pearson statistic computed after setting the number of observations in each group to 1.

Ref) Agresti, A. Categorical Data Analysis. Second Edition. 2002. p177.

- Model Chi-Squared: The likelihood-ratio statistic for comparing the fitted model with the null model. This value is 2 * (log(likelihood for the fitted model) - log(likelihood for the null model)).

- Parameter Estimates

If classical significance test is used:

- Estimates: Estimates of the Beta parameters

- Std. Err.: Standard errors of the estimates

- PLWald: Used to test whether a Beta parameter is in fact zero. For example, if PLWald is zero, an estimate of zero is plausible.

- p(df=1.0): The p-value obtained by assuming that PLWald follows a chi-square distribution with one degree of freedom.

- Exp(b): Exponential of each estimate

If permutation significance test is used:

- Observed: Estimates of the Beta parameters in the original vector

- Expected (mean): Average of the estimates in the permuted vectors

- Std.Dev.: Standard deviation of the estimates in the permuted vectors

- P (>=Obs.): Number of permuted vectors whose F ratio is greater than the observed value.

- P (==Obs.): Number of permuted vectors whose F ratio is equal to the observed value.

- P (<=Obs.): Number of permuted vectors whose F ratio is less than the observed value.

- Exp(b): Exponential of each estimate

- Parameter Correlation Matrix:


 Tables

Predicted Values
The values predicted by logistic regressions

Residual Values
Observed value – Predicted value. The closer a residual is to zero, the more accurate the prediction.
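The report quantities above (-2*Loglikelihood, Model Chi-Squared, predicted values, and residual values) can be illustrated with the following Python sketch. The source does not state how NetMiner estimates the beta parameters; plain gradient ascent with a fixed step size is substituted here as an assumption, and the function names, the step size `rate`, and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=5000, rate=0.1):
    """Estimate intercept b0 and slope b1 by gradient ascent on the log likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        grad0 = sum(yi - pi for yi, pi in zip(y, p)) / n
        grad1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) / n
        b0 += rate * grad0
        b1 += rate * grad1
    return b0, b1

def log_likelihood(y, p):
    """Bernoulli log likelihood of binary outcomes y under probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

x = [0, 1, 2, 3, 4, 5]   # one independent variable
y = [0, 0, 1, 0, 1, 1]   # dichotomized dependent variable

b0, b1 = fit_logistic(x, y)
predicted = [sigmoid(b0 + b1 * xi) for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

null_p = sum(y) / len(y)  # null model: intercept only
minus2ll = -2 * log_likelihood(y, predicted)
minus2ll_null = -2 * log_likelihood(y, [null_p] * len(y))
model_chi_squared = minus2ll_null - minus2ll
```

The difference `minus2ll_null - minus2ll` equals 2 * (log likelihood of the fitted model - log likelihood of the null model), which is the Model Chi-Squared definition given in the report description.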


 Time Complexity
 O(k*i), where k and i represent the number of cases and the average number of iterations
required, respectively.

 Reference

 Related Topics


Statistics >> Logistic Regression >> Matrix

 Menu
Statistics >> Logistic Regression >> Matrix

 Description
The Logistic Regression module can be applied to node attributes (Vector version) and to the link weights of network data (Matrix version). The Matrix version uses the link weights of one or more 1-mode networks as independent variables and one 1-mode network as the dependent variable, estimates the beta parameters, and predicts the values of the dependent matrix.

 User Options

 Input

Dependent Variable: Select a 1-mode Network data, which should be


used as the dependent variable.

Independent Variables: Select 1-mode Networks as the independent


variables. This option is activated when you select a dependent

variable.

 Pre-process

Diagonal Handling Option: With the ‘retain’ option, diagonal values are included in the computation. With the ‘ignore’ option, diagonal values are excluded from the computation.

Dependent Dichotomize: Dichotomize your data before running the module. Dichotomizing transforms weighted/valued data into non-weighted/binary data.

Standardization: You can select the standardization option before analysis, since the scales of the independent variables may differ from one another.

 Main process

# of Iterations: Set the number of iterations. More iterations generally lead to better convergence of the estimates.

 Post-process
Significance

- Classical: Parametric significance test. Statistics such as chi-square, degrees of freedom, and p-value are calculated under the assumption that there is no association between the row variable and the column variable.

- QAP: Nonparametric significance test.

The Permutation Significance Test generates randomly permuted vectors (or matrices) from the original vector (or matrix). A matrix is permuted by node permutation, i.e., the same permutation is applied to both rows and columns.

For example:

Vector: [1, 3, 5, 2]

Randomly Permuted Vectors: [2, 3, 5, 1], [5, 3, 1, 2], ...

Matrix:

5, 1, 2, 3
7, 4, 4, 3
1, 4, 0, 2
4, 2, 1, 6

If nodes 1 and 2 are swapped:

4, 7, 4, 3
1, 5, 2, 3
4, 1, 0, 2
2, 4, 1, 6

If nodes 1 and 3 are swapped and nodes 2 and 4 are swapped:

0, 2, 1, 4
1, 6, 4, 2
2, 3, 5, 1
4, 3, 7, 4

The statistic is computed for each generated vector (or matrix), and the distribution of the resulting values is reported.

# of Iterations

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

In the result of ‘Logistic Regression >> Matrix’ module, Main Report, Predicted Values Table,

Residual Values Table are created.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Reports
Main Report

- Data count: Missing value information is reported.

- -2*Loglikelihood: -2 * the log likelihood for the fitted model.

- Goodness of Fit: A Pearson statistic computed after setting the number of observations in each group to 1.

Ref) Agresti, A. Categorical Data Analysis. Second Edition. 2002. p177.

- Model Chi-Squared: The likelihood-ratio statistic for comparing the fitted model with the null model. This value is 2 * (log(likelihood for the fitted model) - log(likelihood for the null model)).

- Parameter Estimates

If classical significance test is used:

- Estimates: Estimates of the Beta parameters

- Std. Err.: Standard errors of the estimates

- PLWald: Used to test whether a Beta parameter is in fact zero. For example, if PLWald is zero, an estimate of zero is plausible.

- p(df=1.0): The p-value obtained by assuming that PLWald follows a chi-square distribution with one degree of freedom.

- Exp(b): Exponential of each estimate

If permutation significance test is used:

- Observed: Estimates of the Beta parameters in the original matrix

- Expected (mean): Average of the estimates in the permuted matrices

- Std.Dev.: Standard deviation of the estimates in the permuted matrices

- P (>=Obs.): Number of permuted matrices whose F ratio is greater than the observed value.

- P (==Obs.): Number of permuted matrices whose F ratio is equal to the observed value.

- P (<=Obs.): Number of permuted matrices whose F ratio is less than the observed value.

- Exp(b): Exponential of each estimate

- Parameter Correlation Matrix:

 Tables
Predicted Values
The values predicted by logistic regressions.

Residual Values
Observed value – Predicted value. The closer a residual is to zero, the more accurate the prediction.


 Time Complexity
 O(k*i), where k and i represent the number of cases and the average number of iterations
required, respectively.


IV. Mining
1. Frequent Subgraph
 GREW (Undirected Graphs)

 GREW (Directed Graphs)

 gSpan (Multiple Graphs)


 gSpan (Partitioning)

2. Getting Started with Solving Classification Problems using NetMiner

3. Classification
 k-Nearest Neighbor (KNN) (Matrix)

 k-Nearest Neighbor (KNN) (Vector)

 Naive Bayes

 Discriminant Analysis

 Support Vector Machines (SVMs)

 Classification And Regression Tree (CART)

 Multilayer Perceptron

4. Regression
 Classification And Regression Tree (CART)

5. Collaborative Filtering
 Singular Value Decomposition (SVD)
 Singular Value Decomposition++ (SVD++)

 Social Singular Value Decomposition (Social SVD++)

 Implicit Singular Value Decomposition (ISVD)

 User Based

6. Reduction
 Non-Negative Matrix Factorization (NNMF)

7. Clustering
 Hierarchical (Matrix)

 Hierarchical (Vector)

 K-means


 Gaussian Mixture Model (GMM)

 Partitioning Around Medoids (PAM) (Matrix)

 Partitioning Around Medoids (PAM) (Vector)

8. Anomaly Detection
 Probability Distribution (Independent Normal)

 Probability Distribution (Multivariate Normal)

 Local Outlier Factor (Matrix)

 Local Outlier Factor (Vector)

 Attribute Value Frequency

9. Text
 Topic (LDA)


Mining >> Frequent Subgraph >> GREW >> Undirected Graphs

 Menu
Mining >> Frequent Subgraph >> GREW >> Undirected Graphs

 Description
This module is the Frequent Subgraph Mining algorithm devised by M. Kuramochi and G. Karypis.
When given a graph database that contains labeled graphs, the algorithm approximately finds a

vertex-disjoint frequent subgraph by employing an Apriori approach.

 User Options
 Input
Undirected 1-mode Networks: Select one or more undirected 1-mode networks to be included in a graph database.

Node Label: Select one or more node attributes to be used as the node label. The algorithm assigns the same label to two nodes only if they have the same values for all chosen attributes. For instance, when both nodes have 18 and Seoul for the AGE and LOCATION attributes respectively, they have the same label. If no node attribute is selected, the algorithm does not consider node labels when finding frequent subgraphs among the selected networks.

Link Label: Among the link attributes that the selected networks have in common, select one or more link attributes. The algorithm assigns the same label to two links only if they have the same values for all chosen attributes. If no link attribute is selected, the algorithm does not consider link labels when finding frequent subgraphs among the selected networks.
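The labeling rule described above (two nodes share a label only if all chosen attribute values match) can be sketched in Python as follows. The function name, the dictionary layout, and the integer label ids are assumptions made for the illustration and do not describe NetMiner's internal representation.

```python
def node_labels(attributes, chosen):
    """Map each node to a label id derived from its chosen attribute values."""
    label_ids = {}
    labels = {}
    for node, attrs in attributes.items():
        key = tuple(attrs[name] for name in chosen)
        # Nodes with identical value tuples receive the same label id.
        labels[node] = label_ids.setdefault(key, len(label_ids))
    return labels

attributes = {
    "n1": {"AGE": 18, "LOCATION": "Seoul"},
    "n2": {"AGE": 18, "LOCATION": "Seoul"},
    "n3": {"AGE": 18, "LOCATION": "Busan"},
}

with_both = node_labels(attributes, ["AGE", "LOCATION"])
with_none = node_labels(attributes, [])
```

With no attribute chosen, every node maps to the same empty tuple, so all nodes share one label and labels play no role in the search. The same rule applies to link labels.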


CAVEAT: Words ‘networks’ and ‘graphs’ will be used interchangeably.

 Main process
Restriction: Allows a user to find frequent subgraphs that satisfy the following conditions.

 Input graphs (partitions) containing subgraph: Sets the minimum and maximum ratio of graphs containing a frequent subgraph among the selected graphs.

 # of occurrences: Sets the minimum and maximum number of occurrences of a frequent subgraph among the selected graphs.

 Size of subgraphs: Sets the minimum and maximum size of a frequent subgraph.
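A hedged sketch of how the three restrictions above could act as a filter on candidate subgraphs is given below. The function and parameter names are hypothetical, and `size` is taken as whatever size measure the module reports, since the source does not say whether size counts nodes or links.

```python
def passes_restriction(occurrences_per_graph, size, *,
                       min_ratio=0.0, max_ratio=1.0,
                       min_occurrences=1, max_occurrences=None,
                       min_size=1, max_size=None):
    """Check one candidate subgraph against the three restriction ranges.

    occurrences_per_graph holds, for each selected graph, how many times
    the candidate occurs in that graph.
    """
    n_graphs = len(occurrences_per_graph)
    containing = sum(1 for count in occurrences_per_graph if count > 0)
    ratio = containing / n_graphs
    total = sum(occurrences_per_graph)

    if not (min_ratio <= ratio <= max_ratio):
        return False
    if total < min_occurrences:
        return False
    if max_occurrences is not None and total > max_occurrences:
        return False
    if size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True
```

For instance, a candidate that occurs twice in the first of four graphs and once in the third has a containing-graph ratio of 0.5 and three occurrences in total.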

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘GREW >> Undirected’ analysis,

‘Main Report’, ‘Subgraphs Table’ and ‘Spring Map’ are

reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
 Number of graphs: The number of graphs.

 Number of subgraphs: The number of subgraphs.

 Ratio of graphs containing subgraph: The number of subgraphs for each ratio range.

 Number of occurrences: The number of subgraphs for each occurrence range.

 Size of subgraphs: The number of subgraphs for each size range.

 Tables


Subgraphs Table: The result for each frequent subgraph is organized in a table form
 A user can right-click onto the name of each subgraph such as S0 to ‘Visualize’ a subgraph.

 A user can right-click onto the title of each column to re-arrange or ‘Sort’ the table.

 Map
Spring: For each selected network, a user can browse as to where a frequent subgraph(s) is located
on a network.

 References
 M. Kuramochi and G. Karypis (2004). GREW: A Scalable frequent subgraph discovery algorithm.


Mining >> Frequent Subgraph >> GREW >> Directed Graphs

 Menu
Mining >> Frequent Subgraph >> GREW >> Directed Graphs

 Description
This module is the Frequent Subgraph Mining algorithm devised by M. Kuramochi and G. Karypis.
When given a graph database that contains labeled graphs, the algorithm approximately finds a

vertex-disjoint frequent subgraph by employing an Apriori approach.

 User Options
 Input
Directed 1-mode Networks: Select one or more directed 1-mode networks to be included in a graph database.

Node Label: Select one or more node attributes to be used as the node label. The algorithm assigns the same label to two nodes only if they have the same values for all chosen attributes. For instance, when both nodes have 18 and Seoul for the AGE and LOCATION attributes respectively, they have the same label. If no node attribute is selected, the algorithm does not consider node labels when finding frequent subgraphs among the selected networks.

Link Label: Among the link attributes that the selected networks have in common, select one or more link attributes. The algorithm assigns the same label to two links only if they have the same values for all chosen attributes. If no link attribute is selected, the algorithm does not consider link labels when finding frequent subgraphs among the selected networks.

CAVEAT: Words ‘networks’ and ‘graphs’ will be used interchangeably.

 Main process
Restriction: Allows a user to find a frequent subgraph(s) that satisfies the following conditions.
 Input graphs (partitions) containing subgraph: Sets the minimum and maximum ratio of

graphs containing a frequent subgraph(s) among the selected graphs.

 # of occurrences: Sets the minimum and maximum number of occurrences of a frequent

subgraph(s) among the selected graphs.

 Size of subgraphs: Sets the minimum and maximum size of a frequent subgraph(s)

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘GREW >> Directed’ analysis, ‘Main

Report’, ‘Subgraphs Table’ and ‘Spring Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
 Number of graphs: The number of graphs.

 Number of subgraphs: The number of subgraphs.

 Ratio of graphs containing subgraph: The number of

subgraphs for each ratio range.

 Number of occurrences: The number of subgraphs for each occurrence range.

 Size of subgraphs: The number of subgraphs for each size range.


 Tables
Subgraphs Table: The result for each frequent subgraph is organized in a table form
 A user can right-click onto the name of each subgraph such as S0 to ‘Visualize’ a subgraph.

 A user can right-click onto the title of each column to re-arrange or ‘Sort’ the table.


 Map
Spring: For each selected network, a user can browse as to where a frequent subgraph(s) is located
on a network.

 References
 M. Kuramochi and G. Karypis (2004). GREW: A Scalable frequent subgraph discovery algorithm.


Mining >> Frequent Subgraph >> gSpan >> Multiple Graphs

 Menu
Mining >> Frequent Subgraph >> gSpan >> Multiple Graphs

 Description
This module is the Frequent Subgraph Mining algorithm devised by X. Yan and J. Han. Given a graph database that contains undirected labeled graphs, the algorithm finds every frequent subgraph by using a pattern growth approach. Because it finds every possible frequent subgraph, the computation takes a long time when the database contains many labeled graphs that are not sparse and whose labels are not diverse.

 User Options
 Input


Undirected 1-mode Networks: Select one undirected 1-mode network to partition this network.

Node Label: Select a node attribute(s), which will be used as a node label. This algorithm sets the
same label for two nodes only if they have the same values for the chosen attributes. For instance,

when both nodes have 18 and Seoul for AGE and LOCATION attributes respectively, they will have

the same label. However, if a user chooses not to select any node attributes, this algorithm will not

consider labels when finding a frequent subgraph(s) among the selected networks.

Link Label: Among the link attributes that selected networks commonly have, select a link
attribute(s). This algorithm sets the same label for two links only if they have the same values for the

chosen attributes. However, if a user chooses not to select any link attributes, this algorithm will not

consider labels when finding a frequent subgraph(s) among the selected networks.

 Main process
Restriction: Allows a user to find a frequent subgraph(s) that satisfies the following conditions.

 Input graphs (partitions) containing subgraph: Sets the minimum and maximum ratio of

graphs containing a frequent subgraph(s) among the selected graphs.


 # of occurrences: Sets the minimum and maximum number of occurrences of a frequent

subgraph(s) among the selected graphs.

 Size of subgraphs: Sets the minimum and maximum size of a frequent subgraph(s)

 Output
A user can select in which format(s) the outputs are to be reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
 Number of graphs: The number of graphs.

 Number of subgraphs: The number of subgraphs.

 Ratio of graphs containing subgraph: The number of subgraphs for each ratio range.

 Number of occurrences: The number of subgraphs for each occurrence range.

 Size of subgraphs: The number of subgraphs for each size range.


 Tables
Subgraphs Table: The result for each frequent subgraph is organized in a table form
 A user can right-click onto the name of each subgraph such as S0 to ‘Visualize’ a subgraph.

 A user can right-click onto the title of each column to re-arrange or ‘Sort’ the table.

 Maps


For each selected network, a user can browse as to where a frequent subgraph(s) is located on a

network. If a user checks ‘Isomorphic Only’, the algorithm searches for a frequent subgraph(s) that is

exactly identical to a network in terms of both edges and nodes.

 References
 X. Yan and J. Han (2002). gSpan: Graph-based substructure pattern mining. In Proceedings of the IEEE International Conference on Data Mining (ICDM).


Mining >> Frequent Subgraph >> gSpan >> Partitioning

 Menu
Mining >> Frequent Subgraph >> gSpan >> Partitioning

 Description
This module is the Frequent Subgraph Mining algorithm devised by X. Yan and J. Han. Given a graph database that contains undirected labeled graphs, this algorithm finds every frequent subgraph by using a pattern growth approach. Because it enumerates every possible frequent subgraph, the computation can take a long time when there are many labeled graphs that are not sparse and whose labels are not diverse.

 User Options
 Input
Undirected 1-mode Networks: Select one undirected 1-mode network to partition this network.

Node Label: Select a node attribute(s), which will be used as a node label. This algorithm sets the same label for two nodes only if they have the same values for the chosen attributes. For instance, when both nodes have 18 and Seoul for AGE and LOCATION attributes respectively, they will have the same label. However, if a user chooses not to select any node attributes, this algorithm will not consider labels when finding a frequent subgraph(s) among the selected networks.

Link Label: Among the link attributes that the selected networks have in common, select a link attribute(s). This algorithm sets the same label for two links only if they have the same values for the chosen attributes. However, if a user chooses not to select any link attributes, this algorithm will not consider labels when finding a frequent subgraph(s) among the selected networks.

Partition: Selects the method used to partition the network selected above.


 Attribute (Partition Vector): Partitions the network based on the attribute designated by a

user.

 2-mode Network (Affiliation): Partitions the network based on a 2-mode network affiliation.

 Component: Partitions the network using a component.

 Ego Networks: Composes a graph database using ego networks.

 Main process
Restriction: Allows a user to find a frequent subgraph(s) that satisfies the following conditions.
 Input graphs (partitions) containing subgraph: Sets the minimum and maximum ratio of

graphs containing a frequent subgraph(s) among the selected graphs.

 # of occurrences: Sets the minimum and maximum number of occurrences of a frequent

subgraph(s) among the selected graphs.

 Size of subgraphs: Sets the minimum and maximum size of a frequent subgraph(s)

 Output
A user can select in which format(s) the outputs are to be reported. As the result of the ‘gSpan >> Partitioning’ analysis, ‘Main Report’, ‘Subgraphs Table’ and ‘Spring Map’ are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
 Number of graphs: The number of graphs.

 Number of subgraphs: The number of subgraphs.

 Ratio of graphs containing subgraph: The number of

subgraphs for each ratio range.

 Number of occurrences: The number of subgraphs for

each occurrence range.

 Size of subgraphs: The number of subgraphs for each

size range.


 Tables
Subgraphs Table: The result for each frequent subgraph is organized in a table form
 A user can right-click onto the name of each subgraph such as S0 to ‘Visualize’ a subgraph.

 A user can right-click onto the title of each column to re-arrange or ‘Sort’ the table.

 Maps
For each selected network, a user can browse as to where a frequent subgraph(s) is located on a

network. If a user checks ‘Isomorphic Only’, the algorithm searches for a frequent subgraph(s) that is

exactly identical to a network in terms of both edges and nodes.

 References
 X. Yan and J. Han (2002). gSpan: Graph-based substructure pattern mining. In Proceedings of the IEEE International Conference on Data Mining (ICDM).


Getting Started with Solving Classification Problems using NetMiner

Introduction to classification problems

[Figure: Labeled training nodes are passed to a classification algorithm, which produces a classifier. The classifier takes unlabeled test nodes and outputs the predicted classification.]

NetMiner handles two broad types of data: node attributes (of either a main nodeset or a sub nodeset) and network data (i.e. 1-mode networks and 2-mode networks).

Classifying nodes in a dataset based on certain attributes is a problem that appears in many domains. To solve it, a classification learning algorithm takes a dataset of node attributes as input and infers classification rules from it. The algorithm then generates a classifier (or a hypothesis) that applies these rules to predict the labels of new, unlabeled examples.


For example, the above tables are the input data for a classification learning algorithm, and we assume that the ‘Job-rank’ column is the label vector. The labeled training nodes are used in the training phase to generate a classifier. Once the classifier has been created, it is asked what the job rank is for Steve and Jake. The classifier answers this question based on the classification rules inferred from the attributes and the ‘Job-rank’ values of the labeled training nodes.

A classification problem can easily be transformed into a regression problem by replacing a nominal label with an ordinal numeric value. For instance, if the ‘Job-rank’ values in the above example are ordinal instead of nominal, this problem becomes a regression problem. As a result, the learning algorithm predicts the job rank using continuous values such as 5.6 and 3.5. In machine learning, both classification and regression algorithms are categorized as supervised learning algorithms.


Classification Analysis Workflow

The process of applying a classification algorithm to a real-world problem is described below:

[Flowchart: Classification analysis workflow]

Define the classification problem from your data.

1. Prepare and import data into NetMiner
2. Choose the classification algorithm
3. Set options for testing and algorithm parameter tuning
4. Training
5. Evaluation
6. Is your classifier OK? (The flowchart also contains a ‘Parameter Tuning’ arrow.)


1. Prepare and Import Data into NetMiner

Classification algorithms in NetMiner work only on data whose attributes are all numerical. Therefore, any non-numerical attributes prepared for a classification problem must be converted into numerical attributes. For example, if an attribute is ordinal (e.g. a ‘Weather’ attribute having {Rainy, Cloudy, Clear} as its values), it can be converted into a numerical attribute (e.g. the ‘Weather’ attribute now has {0, 1, 2} as its values). However, there are several important points that a user needs to be aware of:

 It is difficult to convert nominal attributes into numerical attributes because they have no

order. To alleviate the difficulty, an easy approach is to change the column of a nominal

attribute into several binary columns like below:

 Every entry for the column(s) of an attribute(s) must be present for a classification algorithm.

For this reason, a user needs to fill every missing entry with an appropriate value (e.g. mean

value of a column). However, if an attribute containing missing values is chosen as a label

vector, a classifier is going to fill the missing values with the predicted label after terminating

a training process.
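The three conversions described above (ordinal to numeric, nominal to binary columns, and filling missing entries with the column mean) can be sketched in plain Python. This is only an illustration of the idea, not NetMiner Script; the function names and the ‘Department’ and ‘Age’ columns are hypothetical.

```python
from statistics import mean


def one_hot(values):
    """Turn one nominal column into several 0/1 columns, one per distinct value."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


# Ordinal attribute: map each value to its rank.
weather_order = {"Rainy": 0, "Cloudy": 1, "Clear": 2}
weather = ["Clear", "Rainy", "Cloudy"]
weather_numeric = [weather_order[w] for w in weather]

# Nominal attribute (hypothetical): one binary column per category.
department = ["Sales", "R&D", "Sales"]
department_binary = one_hot(department)

# Numeric attribute (hypothetical) with one missing entry.
age = [30, None, 40]
age_filled = fill_missing_with_mean(age)
```

Binary columns avoid imposing an artificial order on nominal values, which a single integer code would do.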

This data preparation process may be performed using NetMiner Script. After importing a dataset, a
user can choose a label vector and feature vectors in ‘Input’ menu of a control panel.


2. Choose the Classification Algorithm

Researchers in machine learning generally agree that no single learning algorithm outperforms all other algorithms on every dataset. Therefore, choosing an appropriate classification algorithm is crucial for getting the best result.

There are five classification algorithms available in NetMiner:

 Classification And Regression Tree (CART)

 Naïve Bayes Classifier (Naïve Bayes)

 K-Nearest Neighbor (KNN)

 Discriminant Analysis (DA)

 Support Vector Machines (SVMs)

As the performance of each algorithm listed above is dependent upon a dataset, here we propose

some guidelines for choosing an appropriate algorithm by comparing features of classification

algorithms.

Feature | CART | Naïve Bayes | KNN | SVMs | DA
Accuracy in general | ** | * | ** | **** | **
Speed of learning | *** | **** | **** | * | ***
Speed of classification | **** | **** | * | **** | ****
Explanation ability | **** | **** | ** | * | **
Tolerance to highly interdependent attributes | ** | * | * | *** | **
Tolerance to noise | ** | *** | * | ** | ***
Assumption about the distribution of the dataset | No | Yes (Gaussian, multinomial, etc.) | No | No | Yes (Gaussian)
The type of learning model | Binary tree containing the rules | The parameters to compute the posterior | Similarity matrix | Hyperplane to classify | Hyperplane to classify

**** represents the best performance and * represents the worst performance.


3. Setting Options for Testing and Algorithm Parameter Tuning

After choosing a classification algorithm, the next step is setting options for the algorithm. The three

broad categories of options are: pre process, main process and post process. Among these options,

pre process and post process options, which are about evaluating the performance of an algorithm,

are the same across all classification algorithms. In order to evaluate a classifier, a dataset needs to be

split into a training set and a test set.

The classifier can be evaluated by measuring the difference between its predictions and the actual labels in the test set (e.g. the ‘Job-rank’ column of the test set), for example as a misclassification rate (or an error rate).

A user can control how data are to be split into a training set and a test set by setting the options in a

pre process control panel.

Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of

data into a test set and a training set at random.

 Split Test Set (Condition): Divide selected set of


data into test and training sets (with the specified

condition).


Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set
among a whole set of data.

 Random Seed: If a user fixes this parameter with

a certain integer, the same nodes are always selected for the test set.

 Simple Random: Randomly choose the nodes of

a test set from a main nodeset without any

condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.
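The three sampling options above can be sketched as follows. This is a minimal illustration under stated assumptions, not NetMiner's implementation: the function name `split_test_set`, the rounding rule, and the way the ‘equal’ option divides the test set evenly across labels are all assumptions.

```python
import random
from collections import defaultdict


def split_test_set(labels, proportion, seed=None, method="simple"):
    """Return the sorted indices of the nodes allocated to the test set."""
    rng = random.Random(seed)  # a fixed seed always yields the same test set
    n_test = round(len(labels) * proportion)
    if method == "simple":
        return sorted(rng.sample(range(len(labels)), n_test))

    # Group node indices by label for the stratified options.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    chosen = []
    for label in sorted(by_label):
        members = by_label[label]
        if method == "proportional":
            # Keep the label distribution of the whole data set.
            k = round(len(members) * proportion)
        elif method == "equal":
            # Same number of test nodes per label (assumed rule).
            k = min(len(members), n_test // len(by_label))
        else:
            raise ValueError(f"unknown method: {method}")
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


labels = ["red"] * 30 + ["blue"] * 30 + ["green"] * 40
test_idx = split_test_set(labels, 0.3, seed=7, method="proportional")
```

With the 30/30/40 example from the text and a 30% proportion, the proportional option selects 9 red, 9 blue and 12 green nodes.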

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional statement. With this option, a user can designate the test set explicitly. First, a user adds a new attribute column (“Allocation” in the figure) that indicates in which phase each node is used:

 The nodes having a “Test” entry for the “Allocation”

column are allocated to a test set by querying as

shown on the right:


In a post process control panel, a user can perform statistical hypothesis testing for the error rate of a

classifier:

Measure Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

Hypothesis testing: Customize the alternative hypotheses for an error rate.

Confidence Interval: Customize the confidence level.
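A hypothesis test and confidence interval for an error rate can be sketched with a normal approximation. The formulas below are an assumption about what is computed, not a statement of NetMiner's internals; they do appear consistent with the sample KNN report in this reference (30 test instances, 20 errors, p0 = 50%). The function names and the direction of the one-sided alternative (error rate below p0) are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def error_rate_test(n_errors, n_total, p0=0.5):
    """One-sample z-test of the observed error rate p1 against p0.

    Returns (p1, z, p_value), where p_value = P(Z <= z), i.e. the
    assumed alternative is that the true error rate is below p0.
    """
    p1 = n_errors / n_total
    z = (p1 - p0) / sqrt(p0 * (1 - p0) / n_total)
    return p1, z, NormalDist().cdf(z)


def error_rate_interval(n_errors, n_total, level=0.95):
    """Normal-approximation confidence interval of the error rate."""
    p1 = n_errors / n_total
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * sqrt(p1 * (1 - p1) / n_total)
    return p1 - half_width, p1 + half_width


p1, z, p_value = error_rate_test(20, 30)
lower, upper = error_rate_interval(20, 30)
```

A large p-value here means the data give no support for the claim that the error rate is below p0.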

4. Training

Once everything necessary for training a classifier has been prepared, a user can click ‘Run Process’ to start training.

5. Evaluation

After finishing the learning process, NetMiner shows the evaluation results of a classifier in Main

Report ([R] Main tab located at the bottom of a window).

TRAINING ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
70 | 60 (85.71%) | 10 (14.29%) | 0.6762

TEST ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
30 | 21 (70%) | 9 (30%) | 0.2437


The above tables show the error rate and Cohen’s kappa coefficient for both the training set and the test set. Although these figures already indicate how good a classifier is, NetMiner also provides further evaluation results for advanced users, such as precision, recall, AUC and hypothesis testing.
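For readers unfamiliar with Cohen's kappa, the following sketch computes it from a contingency table of original labels (rows) against predicted labels (columns) using the standard definition. The function name and the small 2 x 2 table are illustrative assumptions.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square contingency table (rows: original, columns: predicted)."""
    n = sum(sum(row) for row in table)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class result: 35 of 50 instances classified correctly.
kappa = cohens_kappa([[20, 5], [10, 15]])
```

In this example the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is about 0.4. Kappa discounts the agreement that would occur by chance, which is why it can be much lower than the raw accuracy.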

6. Is your Classifier OK?

After validating a classifier, a user might want to improve its accuracy or speed by tuning the parameters of the learning algorithm or by choosing another classification algorithm:

 To try a different algorithm, follow ‘Choose the Classification Algorithm’ step.


 To tune parameters, consider the table below:

It is important to note that there are tradeoffs between learning accuracy and learning speed.


Mining >> Classification >> k-Nearest Neighbor (KNN) >> Matrix

 Menu
Mining >> Classification >> KNN >> Matrix

 Description
The k-nearest neighbor algorithm (KNN) is a method for classifying a node by voting based on the
label of the closest k nodes in a feature space. More details can be found at [1].
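The voting rule can be sketched for the matrix variant, where proximities between nodes are given directly as an n by n matrix. This is a minimal illustration, not NetMiner's implementation; the function name, the tie-breaking behavior and the sample data are assumptions.

```python
from collections import Counter


def knn_predict_from_matrix(proximity, labels, node, k, similarity=True):
    """Predict the label of `node` by majority vote among its k closest labeled nodes.

    proximity  -- n by n matrix (list of lists) of pairwise proximities
    labels     -- list of labels, with None for nodes whose label is missing
    similarity -- True if larger values mean closer, False for dissimilarity data
    """
    candidates = [j for j in range(len(labels)) if j != node and labels[j] is not None]
    # Closest first: descending for similarity, ascending for dissimilarity.
    candidates.sort(key=lambda j: proximity[node][j], reverse=similarity)
    votes = Counter(labels[j] for j in candidates[:k])
    return votes.most_common(1)[0][0]


# Hypothetical similarity matrix for four nodes; node 3 has no label.
sim = [
    [1.0, 0.7, 0.2, 0.9],
    [0.7, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.1],
    [0.9, 0.8, 0.1, 1.0],
]
labels = ["A", "A", "B", None]
predicted = knn_predict_from_matrix(sim, labels, node=3, k=3)
```

The `similarity` flag plays the role of the ‘Proximity’ option: it decides whether high values or low values count as close.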

 User Options

 Input
1-mode Network: Selects a similarity matrix (i.e. n by n)

Label Vector: Selects a column whose distinct values define the labels of nodes. The KNN algorithm predicts the label for each node whose value is missing in the selected column.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data

into a test set and a training set at random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set among a
whole set of data.

 Random Seed: If a user fixes this parameter with a

certain integer, the same nodes are always selected for the test set.

 Simple Random: Randomly choose the nodes of a test

set from a main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional statement. With this option, a user can designate the test set explicitly. First, a user adds a new attribute column (“Allocation” in the figure) that indicates in which phase each node is used:


 The nodes having a “Test” entry for the

“Allocation” column are allocated to a test set by

querying as shown on the right:

 Main process
# of Neighbors(k): The number of neighbors to be used for
classifying the label of a node.

Proximity: Decides whether the 1-mode network used as an


input is to be interpreted as 'Similarity' data or 'Dissimilarity'

data.

 Post-process
A user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypotheses for an error rate.

Confidence Interval: Customize the confidence level.


 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘KNN (Matrix)’ analysis, ‘Main

Report’, ‘Predicted Class Table’, ‘Contingency Table’,

‘Neighbor List Table’ and ‘Neighbor Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
30 | 10 (33.33%) | 20 (66.67%) | 0.2638

TEST ACCURACY :: ACCURACY PER CLASS

class                 | TP | FP | TN | FN | Precision | Recall | F-Measure
CEO                   | 0  | 1  | 29 | 0  | 0         | 0      | 0
Corp. Finance         | 3  | 1  | 24 | 2  | 0.75      | 0.6    | 0.6667
Customer Mgmt.        | 0  | 1  | 27 | 2  | 0         | 0      | 0
Data Analysis         | 1  | 0  | 28 | 1  | 1         | 0.5    | 0.6667
Demestic Production 2 | 0  | 0  | 29 | 1  | 0         | 0      | 0
Domestic Production 1 | 0  | 1  | 27 | 2  | 0         | 0      | 0
Factory Mgmt.         | 2  | 2  | 24 | 2  | 0.5       | 0.5    | 0.5
Investment            | 2  | 0  | 28 | 0  | 1         | 1      | 1
Manager               | 0  | 3  | 26 | 1  | 0         | 0      | 0
Overseas Marketing    | 0  | 1  | 28 | 1  | 0         | 0      | 0
Overseas Production 1 | 0  | 1  | 27 | 2  | 0         | 0      | 0
Private Finance       | 2  | 9  | 18 | 1  | 0.1818    | 0.6667 | 0.2857
Production Mgmt.      | 0  | 0  | 29 | 1  | 0         | 0      | 0
Strategy              | 0  | 0  | 27 | 3  | 0         | 0      | 0
Transportation        | 0  | 0  | 29 | 1  | 0         | 0      | 0
Weighted Average      | .  | .  | .  | .  | 0.3432    | 0.3333 | 0.3175

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 66.67(%) | 1.8257 | 0.9661

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 66.67(%) | 49.798(%) | 83.5354(%)

TRAINING ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
70 | 32 (45.71%) | 38 (54.29%) | 0.4081

TRAINING ACCURACY :: ACCURACY PER CLASS

class                 | TP | FP | TN | FN | Precision | Recall | F-Measure
CEO                   | 0  | 0  | 69 | 1  | 0         | 0      | 0
Corp. Finance         | 7  | 2  | 58 | 3  | 0.7778    | 0.7    | 0.7368
Customer Mgmt.        | 3  | 3  | 62 | 2  | 0.5       | 0.6    | 0.5455
Data Analysis         | 3  | 6  | 59 | 2  | 0.3333    | 0.6    | 0.4286
Demestic Production 2 | 2  | 0  | 67 | 1  | 1         | 0.6667 | 0.8
Domestic Production 1 | 3  | 1  | 64 | 2  | 0.75      | 0.6    | 0.6667
Factory Mgmt.         | 3  | 2  | 60 | 5  | 0.6       | 0.375  | 0.4615
Investment            | 4  | 2  | 63 | 1  | 0.6667    | 0.8    | 0.7273
Manager               | 0  | 5  | 63 | 2  | 0         | 0      | 0
Overseas Marketing    | 0  | 0  | 67 | 3  | 0         | 0      | 0
Overseas Production 1 | 3  | 3  | 62 | 2  | 0.5       | 0.6    | 0.5455
Private Finance       | 3  | 13 | 51 | 3  | 0.1875    | 0.5    | 0.2727
Production Mgmt.      | 0  | 1  | 68 | 1  | 0         | 0      | 0
Strategy              | 1  | 0  | 62 | 7  | 1         | 0.125  | 0.2222
Transportation        | 0  | 0  | 67 | 3  | 0         | 0      | 0
Weighted Average      | .  | .  | .  | .  | 0.5493    | 0.4571 | 0.4492

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 54.29(%) | 0.7171 | 0.7634

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 54.29(%) | 42.6158(%) | 65.9556(%)
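The per-class columns of the report can be reproduced from the TP, FP and FN counts with the standard definitions of precision, recall and F-measure. The sketch below is illustrative; the function name is hypothetical, and reporting 0 when a denominator is zero is an assumption that appears to match the sample report (e.g. the CEO row).

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, f_measure) for one class.

    A ratio with a zero denominator is reported as 0 (assumed convention).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# 'Corp. Finance' row of the test report: TP = 3, FP = 1, FN = 2.
precision, recall, f_measure = per_class_metrics(3, 1, 2)
```

TN does not enter any of the three measures, which is why classes with many true negatives can still score 0.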

 Tables
Predicted Class: Shows the following label information for each node.


 Allocation: Shows whether a node is included in a training set (Training), a test set (Test) or

no answer set (Missing)

 Original: Shows the label of a node in either a training set or a test set.

 Predicted: Shows the label of a node predicted by a classifier.

 Revised: For nodes in a training set and a test set, the entries are filled with their ‘original’

label. For nodes in a no answer set, their entries are filled with the label ‘predicted’ by a

classifier.

 Matching:If a predicted label and an original label are equal, then the entry will be filled with

‘Y’. Otherwise, ‘N’.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).

Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).


Neighbor List Table: Each node and its neighbors are listed in a table format.

 Maps
Neighbor Map: The links between each classified node and its k neighbors are displayed using a spring layout.


 Time Complexity
 O(m * n * log k), where n is the number of nodes to be classified, k is the number of neighbors used for classifying a node and m is the number of links.

 References
 [1] Mitchell, T. (1997). Machine Learning, (McGraw-Hill).


Mining >> Classification >> k-Nearest Neighbor (KNN) >> Vector

 Menu
Mining >> Classification >> KNN >> Vector

 Description
The k-nearest neighbor algorithm (KNN) is a method for classifying a node by voting based on the
label of the closest k nodes in a feature space. More details can be found at [1].

 User Options
 Input

Label Vector: Selects a column whose distinct values define the labels of nodes. The KNN algorithm predicts the label for each node whose value is missing in the selected column.

Feature Vectors: Selects a column(s) that corresponds to the features of a training set. Only numeric columns can be selected.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data into

a test set and a training set at random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified condition).


Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set among a
whole set of data.

 Random Seed: If a user fixes this parameter with a

certain integer, the same nodes are always selected for the test set.

 Simple Random: Randomly choose the nodes of a test

set from a main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.


Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional statement. With this option, a user can designate the test set explicitly. First, a user adds a new attribute column (“Allocation” in the figure) that indicates in which phase each node is used:

 The nodes having a “Test” entry for the

“Allocation” column are allocated to a test set by

querying as shown on the right:


 Main process
# of Neighbors (k): The number of neighbors to be used
for classifying the label of a node.

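(Dis)similarity Measure: A numerical measure of how alike or different two nodes are. For two feature vectors $x$ and $y$ with $d$ features, the standard definitions of the available measures are:

 Euclidean Distance: $\sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$

 CityBlock Distance: $\sum_{i=1}^{d} |x_i - y_i|$

 Cosine (Similarity): $\dfrac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} y_i^2}}$

 Correlation (Pearson’s): $\dfrac{\sum_{i=1}^{d} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{d} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{d} (y_i - \bar{y})^2}}$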
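The four measures listed above can be written out using their standard textbook definitions. The sketch below is illustrative only; the function names are assumptions, and NetMiner's exact conventions (for example, how a zero vector is handled) are not documented here.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cityblock(x, y):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson_correlation(x, y):
    """Pearson's correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


d_euclid = euclidean([0, 0], [3, 4])
d_city = cityblock([0, 0], [3, 4])
```

The first two are dissimilarities (smaller means closer), while cosine and Pearson's correlation are similarities (larger means closer), so the neighbor ranking must be reversed accordingly.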

 Post-process
A user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypotheses for an error rate.

Confidence Interval: Customize the confidence level.


 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘KNN (Vector)’ analysis, ‘Main

Report’, ‘Predicted Class Table’, ‘Contingency Table’,

‘Neighbor List Table’ and ‘Neighbor Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
30 | 10 (33.33%) | 20 (66.67%) | 0.2638

TEST ACCURACY :: ACCURACY PER CLASS

class                 | TP | FP | TN | FN | Precision | Recall | F-Measure
CEO                   | 0  | 1  | 29 | 0  | 0         | 0      | 0
Corp. Finance         | 3  | 1  | 24 | 2  | 0.75      | 0.6    | 0.6667
Customer Mgmt.        | 0  | 1  | 27 | 2  | 0         | 0      | 0
Data Analysis         | 1  | 0  | 28 | 1  | 1         | 0.5    | 0.6667
Demestic Production 2 | 0  | 0  | 29 | 1  | 0         | 0      | 0
Domestic Production 1 | 0  | 1  | 27 | 2  | 0         | 0      | 0
Factory Mgmt.         | 2  | 2  | 24 | 2  | 0.5       | 0.5    | 0.5
Investment            | 2  | 0  | 28 | 0  | 1         | 1      | 1
Manager               | 0  | 3  | 26 | 1  | 0         | 0      | 0
Overseas Marketing    | 0  | 1  | 28 | 1  | 0         | 0      | 0
Overseas Production 1 | 0  | 1  | 27 | 2  | 0         | 0      | 0
Private Finance       | 2  | 9  | 18 | 1  | 0.1818    | 0.6667 | 0.2857
Production Mgmt.      | 0  | 0  | 29 | 1  | 0         | 0      | 0
Strategy              | 0  | 0  | 27 | 3  | 0         | 0      | 0
Transportation        | 0  | 0  | 29 | 1  | 0         | 0      | 0
Weighted Average      | .  | .  | .  | .  | 0.3432    | 0.3333 | 0.3175

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 66.67(%) | 1.8257 | 0.9661

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 66.67(%) | 49.798(%) | 83.5354(%)

TRAINING ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
70 | 32 (45.71%) | 38 (54.29%) | 0.4081

TRAINING ACCURACY :: ACCURACY PER CLASS

class                 | TP | FP | TN | FN | Precision | Recall | F-Measure
CEO                   | 0  | 0  | 69 | 1  | 0         | 0      | 0
Corp. Finance         | 7  | 2  | 58 | 3  | 0.7778    | 0.7    | 0.7368
Customer Mgmt.        | 3  | 3  | 62 | 2  | 0.5       | 0.6    | 0.5455
Data Analysis         | 3  | 6  | 59 | 2  | 0.3333    | 0.6    | 0.4286
Demestic Production 2 | 2  | 0  | 67 | 1  | 1         | 0.6667 | 0.8
Domestic Production 1 | 3  | 1  | 64 | 2  | 0.75      | 0.6    | 0.6667
Factory Mgmt.         | 3  | 2  | 60 | 5  | 0.6       | 0.375  | 0.4615
Investment            | 4  | 2  | 63 | 1  | 0.6667    | 0.8    | 0.7273
Manager               | 0  | 5  | 63 | 2  | 0         | 0      | 0
Overseas Marketing    | 0  | 0  | 67 | 3  | 0         | 0      | 0
Overseas Production 1 | 3  | 3  | 62 | 2  | 0.5       | 0.6    | 0.5455
Private Finance       | 3  | 13 | 51 | 3  | 0.1875    | 0.5    | 0.2727
Production Mgmt.      | 0  | 1  | 68 | 1  | 0         | 0      | 0
Strategy              | 1  | 0  | 62 | 7  | 1         | 0.125  | 0.2222
Transportation        | 0  | 0  | 67 | 3  | 0         | 0      | 0
Weighted Average      | .  | .  | .  | .  | 0.5493    | 0.4571 | 0.4492

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 54.29(%) | 0.7171 | 0.7634

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 54.29(%) | 42.6158(%) | 65.9556(%)

 Tables
Predicted Class: Shows the following label information for each node.
 Allocation: Shows whether a node is included in a training set (Training), a test set (Test) or

no answer set (Missing)

 Original: Shows the label of a node in either a training set or a test set.

 Predicted: Shows the label of a node predicted by a classifier.

 Revised: For nodes in a training set and a test set, the entries are filled with their ‘original’

label. For nodes in a no answer set, their entries are filled with the label ‘predicted’ by a

classifier.

 Matching:If a predicted label and an original label are equal, then the entry will be filled with

‘Y’. Otherwise, ‘N’.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).


Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

Neighbor List Table: Each node and its neighbors are listed in a table format.

 Maps
Neighbor Map: The links between each classified node and its k neighbors are displayed using a spring layout.


 Time Complexity
 O(m * n * log k), where n is the number of nodes to be classified, k is the number of neighbors used for classifying a node and m is the number of links.

 References
 [1] Mitchell, T. (1997). Machine Learning, (McGraw-Hill).


Mining >> Classification >> CART

 Menu
Mining >> Classification >> CART

 Description
Classification And Regression Tree (CART) is trained from training instances and predicts the missing labels of test instances. To predict a label, the decision process traces the decisions in the tree from the root node to a leaf node.

For instance, this tree predicts the label (one of ‘Team Member Level 02’, ‘CEO’ and ‘Team Manager’) based on two attributes, ‘Age’ and ‘Organization Satisfaction’. Assume there is an instance having the attributes {‘Age’ = 45, ‘Organization Satisfaction’ = 0.1}. The tree first goes right, since ‘Age’ = 45 is greater than 43.5 at the root level. Then, as ‘Organization Satisfaction’ = 0.1 is less than 0.5, the example instance is classified as ‘CEO’.
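The decision process in this example can be sketched as a walk over a nested dictionary. Only the two thresholds (43.5 and 0.5) and the ‘CEO’ leaf come from the text; the placement of the other two labels, the handling of values equal to a threshold, and all names in the code are assumptions made for illustration.

```python
# Hypothetical tree: values below the threshold go left, all others go right.
tree = {
    "attribute": "Age",
    "threshold": 43.5,
    "left": {"label": "Team Member Level 02"},   # assumed leaf
    "right": {
        "attribute": "Organization Satisfaction",
        "threshold": 0.5,
        "left": {"label": "CEO"},
        "right": {"label": "Team Manager"},      # assumed leaf
    },
}


def predict(node, instance):
    """Trace the decisions from the root node down to a leaf and return its label."""
    while "label" not in node:
        value = instance[node["attribute"]]
        node = node["left"] if value < node["threshold"] else node["right"]
    return node["label"]


label = predict(tree, {"Age": 45, "Organization Satisfaction": 0.1})
```

For the instance in the text, the walk goes right at the root (45 is not below 43.5) and then left (0.1 is below 0.5), ending at the ‘CEO’ leaf.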

In the learning phase, CART constructs a tree structure by following the steps below:


1. At the root node of a tree, begin with all training instances.

2. Try every possible binary split of the instances for every attribute, and choose the split with the best criterion value.

 The metric of a criterion can either be ‘Gini impurity’ or ‘entropy’ for a

classification tree.

3. Generate two child nodes corresponding to two parts of the best binary split chosen in step 2

and move the instances to them respectively.

4. Repeat steps 1, 2 and 3 for the two child nodes until the number of instances in the current node reaches the ‘minimum node size to be split’ or the number of instances in the child nodes for every possible split is less than the ‘minimum leaf size’.

 ‘Minimum node size to be split’ and ‘Minimum leaf size’ are the options in main

process panel that a user needs to provide.
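Step 2 of the procedure above can be sketched as an exhaustive search over candidate splits. This is a simplified illustration rather than NetMiner's implementation: candidate thresholds are taken as midpoints between consecutive distinct values, the criterion is the size-weighted Gini impurity of the two children, and the function names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, min_leaf_size=1):
    """Return (weighted impurity, feature index, threshold) of the best binary split.

    Returns None if no split leaves at least min_leaf_size instances in both children.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # assumed: midpoint between neighbors
            left = [lab for row, lab in zip(rows, labels) if row[feature] < threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] >= threshold]
            if len(left) < min_leaf_size or len(right) < min_leaf_size:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical training instances: [Age, Organization Satisfaction].
rows = [[30, 0.9], [40, 0.8], [45, 0.1], [50, 0.2]]
labels = ["Member", "Member", "CEO", "CEO"]
split = best_split(rows, labels)
```

A full tree builder would apply `best_split` recursively to the two children (steps 3 and 4), stopping according to the two size options.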

The above learning process is depicted in the following figure:

 User Options


 Input
Label Vector: Selects a column whose distinct values
define the label of nodes.

Feature Vectors: Selects columns that correspond to


features of a training set. Only a numeric column can be

selected.

 Pre-process
Data Allocation:

 All Training Set: Use whole data as a training

set.

 Split Test Set (Random): Divide selected set of

data into a test set and a training set at random.

 Split Test Set (Condition): Divide selected set of


data into test and training sets (with the

specified condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set
among a whole set of data.
 Random Seed: If a user fixes this parameter with

a certain integer, the same nodes are always selected for the test set.

 Simple Random: Randomly choose the nodes of

a test set from a main nodeset without any

condition.

 Stratified (proportional): Randomly select the

nodes for a test set so that the distribution of labels for the selected nodes in the test set is

similar to that of whole data set. For example, if the whole data set has 30% red labels, 30%


blue labels and 40% green labels, the test set should follow the same distribution by having

30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional statement. With this option, a user can designate the test set explicitly. First, a user adds a new attribute column (“Allocation” in the figure) that indicates in which phase each node is used:

 The nodes having a “Test” entry for the

“Allocation” column are allocated to a test set by

querying as shown on the right:

 Main process
Tree Shape Options:
 Pruning: The “Yes” option reshapes the full tree to prevent overfitting.

 Minimum leaf size: If the number of instances in the child nodes is less than this value for every possible split at the current node, the construction of the tree stops at that node.

 Minimum node size to be split: If the number of instances in the current node reaches this value, the construction of the tree stops at that node.

Criterion: Specifies the criterion metric used to find the best split.

 Gini impurity: $1 - \sum_i p_i^2$

 Entropy: $-\sum_i p_i \log p_i$

where $p_i$ is the fraction of instances with label $i$ among the instances in a node of the tree. For example, a node with only one label has a value of 0 for both Gini impurity and entropy. Otherwise, the value is positive.
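As a rough illustration of the two criteria, the following Python sketch computes both from the labels in a node, using the standard textbook definitions. The function names are hypothetical and the base-2 logarithm is an assumption; the manual does not state which base NetMiner uses.

```python
import math
from collections import Counter

def label_fractions(labels):
    """Fraction of instances carrying each distinct label in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    # 1 minus the sum of squared label fractions
    return 1.0 - sum(p * p for p in label_fractions(labels))

def entropy(labels):
    # Shannon entropy in bits (base 2 is an assumption)
    return sum(-p * math.log2(p) for p in label_fractions(labels))

pure_node = ["setosa"] * 4
mixed_node = ["setosa", "setosa", "virginica", "virginica"]
print(gini_impurity(pure_node), entropy(pure_node))
print(gini_impurity(mixed_node), entropy(mixed_node))
```

A node with a single label scores 0 on both measures, and any mixed node scores above 0, as the text states.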

 Post-process
The user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis test for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypothesis for the error rate.

Confidence Interval: Customize the confidence level.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘CART (Classification)’ analysis,

‘Main Report’, ‘Predicted Class Table’, ‘Contingency Table’

and ‘Tree Diagram Chart’ are reported.

 Outputs


An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
45 | 42 (93.33%) | 3 (6.67%) | 0.9

TEST ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 15 | 0 | 0 | 30 | 1 | 1 | 1
'versicolor' | 15 | 3 | 0 | 27 | 0.8333 | 1 | 0.9091
'virginica' | 12 | 0 | 3 | 30 | 1 | 0.8 | 0.8889
Weighted Average | . | . | . | . | 0.9444 | 0.9333 | 0.9327

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 6.67(%) | -5.8138 | 0

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 6.67(%) | 0(%) | 13.9548(%)

TRAINING ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
105 | 104 (99.05%) | 1 (0.95%) | 0.9857

TRAINING ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 35 | 0 | 0 | 70 | 1 | 1 | 1
'versicolor' | 34 | 0 | 1 | 70 | 1 | 0.9714 | 0.9855
'virginica' | 35 | 1 | 0 | 69 | 0.9722 | 1 | 0.9859
Weighted Average | . | . | . | . | 0.9907 | 0.9905 | 0.9905

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 0.95(%) | -10.0518 | 0

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 0.95(%) | 0(%) | 2.8101(%)

TREE INFO

# of Nodes | # of Non-leaf Nodes | # of Leaf Nodes | Height
9 | 4 | 5 | 4

TREE DESCRIPTION: Shows the tree in a table format. For example, the first row of the table means that if ‘Age <= 44’, go to Node 2; otherwise, go to Node 3. A row without a criterion value is a leaf node.

Node | Att. / Label (Leaf node) | Criterion | ≤ (Left child) | > (Right child)
Node 1 | Age | 44 | Node 2 | Node 3
Node 2 | Team Member Level 02 | . | . | .
Node 3 | Age | 52.5 | Node 4 | Node 5
Node 4 | Age | 51.5 | Node 6 | Node 7
Node 5 | Team Member Level 01 | . | . | .
Node 6 | Performance Level | 3.5 | Node 8 | Node 9
Node 7 | CEO | . | . | .
Node 8 | Team Manager | . | . | .
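To show how such a table is read, the sketch below encodes the rows as a Python dictionary and walks from Node 1 to a leaf. It assumes the leaf labels of Node 2 and Node 5 read “Team Member Level 02” and “Team Member Level 01”. Node 9 does not appear in the table, so its label is a placeholder. The names `TREE` and `predict` are hypothetical.

```python
# Internal node: (attribute, criterion, left child, right child).
# Leaf node: the predicted label as a string.
TREE = {
    1: ("Age", 44, 2, 3),
    2: "Team Member Level 02",
    3: ("Age", 52.5, 4, 5),
    4: ("Age", 51.5, 6, 7),
    5: "Team Member Level 01",
    6: ("Performance Level", 3.5, 8, 9),
    7: "CEO",
    8: "Team Manager",
    9: "(label not shown in the table)",   # placeholder, not a reported value
}

def predict(attributes, tree=TREE, root=1):
    """Follow '<=' to the left child and '>' to the right child until a leaf."""
    node = tree[root]
    while isinstance(node, tuple):
        attribute, criterion, left, right = node
        node = tree[left] if attributes[attribute] <= criterion else tree[right]
    return node

print(predict({"Age": 40, "Performance Level": 2}))
print(predict({"Age": 52, "Performance Level": 2}))
```

The first call stops at Node 2 because 40 <= 44. The second passes through Node 3 and Node 4 and ends at Node 7.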

 Tables
Predicted Class: Shows the following label information for each node.

 Allocation: Shows whether a node belongs to the training set (Training), the test set (Test) or the no-answer set (Missing).

 Original: Shows the label of a node in either the training set or the test set.

 Predicted: Shows the label of a node predicted by the classifier.

 Revised: For nodes in the training set or the test set, the entry is filled with the ‘original’ label. For nodes in the no-answer set, the entry is filled with the label ‘predicted’ by the classifier.

 Matching: If the predicted label and the original label are equal, the entry is filled with ‘Y’; otherwise, ‘N’.


Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).

Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

 Charts
Tree Diagram: Shows a trained tree graphically.


 References
 Murphy, Kevin P. “Machine learning: a probabilistic perspective.” The MIT Press, 2012.

 Fisher, R. A. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of

Eugenics, Vol. 7, pp. 179–188, 1936.


Mining >> Classification >> Naive Bayes

 Menu
Mining >> Classification >> Naïve Bayes

 Description
A Naïve Bayes classifier (‘NB’) assigns a new unlabeled node to the label with the maximum posterior probability, under the assumption that the attributes are conditionally independent given the label. This assumption makes the otherwise intractable computation of the posterior tractable. The naïve assumption often works well in practice even when the attributes are not actually independent. Although an NB can in general use various distributions for the likelihood of attributes, such as Gaussian and multinomial, the NB implemented in NetMiner supports only the Gaussian distribution.
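A minimal sketch of a Gaussian Naïve Bayes classifier is given below to make the description concrete. It is not NetMiner's code. The function names, the empirical prior, the population variance estimate and the small variance floor are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-attribute (mean, variance) for each label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, 1e-9)))   # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)   # empirical prior
    return model

def posterior(model, row):
    """Posterior of each label, multiplying per-attribute Gaussian likelihoods."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())                 # subtract the max for stability
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

rows = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
labels = ["a", "a", "b", "b"]
model = fit_gaussian_nb(rows, labels)
post = posterior(model, [1.1, 1.9])
print(max(post, key=post.get))
```

The independence assumption appears in the inner loop: the log-likelihood of a row is simply the sum of the per-attribute log-likelihoods.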

 User Options

 Input
Label Vector: Selects a column whose distinct values define the labels of nodes.

Feature Vectors: Selects the column(s) that correspond to the features of a training set. Only numeric columns can be selected.

 Pre-process
Data Allocation:
 All Training Set: Use the whole data set as a training set.

 Split Test Set (Random): Randomly divide the selected data into a test set and a training set.

 Split Test Set (Condition): Divide the selected data into a test set and a training set according to a specified condition.

Parameters in Split Test Set (Random):

 Proportion (%): The proportion of the whole data set that is allocated to the test set.

 Random Seed: If the user fixes this parameter to a certain integer, the same nodes are selected for the test set on every run.

 Simple Random: Randomly choose the nodes of the test set from the main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes of the test set so that the distribution of labels in the test set is similar to that of the whole data set. For example, if the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set follows the same distribution, with 30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Randomly choose the nodes of the test set so that the label distribution of the chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of the test set with a conditional statement, which lets the user define a custom test set. First, the user adds a new attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes with a “Test” entry in the “Allocation” column are allocated to the test set by a query, as shown on the right:

 Main process
Prior Policy: Choose the prior probabilities for the labels.
 Uniform: The prior probabilities are equal for all labels.

 Empirical: The prior probabilities are taken from the distribution of the node labels in the training set.
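The difference between the two prior policies can be seen in a few lines of Python. The function `prior_probabilities` and its `policy` argument are hypothetical names chosen for this sketch.

```python
from collections import Counter

def prior_probabilities(train_labels, policy="Empirical"):
    """Return {label: prior} under the 'Uniform' or 'Empirical' policy."""
    counts = Counter(train_labels)
    if policy == "Uniform":
        return {label: 1 / len(counts) for label in counts}
    # Empirical: relative frequency of each label in the training set
    return {label: c / len(train_labels) for label, c in counts.items()}

train_labels = ["red", "red", "red", "blue"]
print(prior_probabilities(train_labels, "Uniform"))
print(prior_probabilities(train_labels, "Empirical"))
```

With three red nodes and one blue node, the uniform prior is 0.5 for each label, while the empirical prior is 0.75 for red and 0.25 for blue.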

 Post-process
The user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis test for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypothesis for the error rate.

Confidence Interval: Customize the confidence level.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Naïve Bayes’ analysis, ‘Main Report’, ‘Predicted Class Table’, ‘Posterior Probability Table’, ‘Contingency Table’ and ‘ROC Curve Chart’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
45 | 42 (93.33%) | 3 (6.67%) | 0.9

TEST ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 15 | 0 | 0 | 30 | 1 | 1 | 1
'versicolor' | 15 | 3 | 0 | 27 | 0.8333 | 1 | 0.9091
'virginica' | 12 | 0 | 3 | 30 | 1 | 0.8 | 0.8889
Weighted Average | . | . | . | . | 0.9444 | 0.9333 | 0.9327

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 6.67(%) | -5.8138 | 0

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 6.67(%) | 0(%) | 13.9548(%)

TRAINING ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
105 | 104 (99.05%) | 1 (0.95%) | 0.9857

TRAINING ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 35 | 0 | 0 | 70 | 1 | 1 | 1
'versicolor' | 34 | 0 | 1 | 70 | 1 | 0.9714 | 0.9855
'virginica' | 35 | 1 | 0 | 69 | 0.9722 | 1 | 0.9859
Weighted Average | . | . | . | . | 0.9907 | 0.9905 | 0.9905

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 0.95(%) | -10.0518 | 0

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 0.95(%) | 0(%) | 2.8101(%)
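The per-class figures in the report can be reproduced from a contingency table with the usual definitions of precision, recall and F-measure. In the Python sketch below, the test contingency table is inferred from the reported counts (three 'virginica' nodes predicted as 'versicolor'); it is not copied from a NetMiner output, and the function name is hypothetical.

```python
def per_class_metrics(confusion, classes):
    """confusion[actual][predicted] -> counts and scores for each class."""
    total = sum(sum(row.values()) for row in confusion.values())
    result = {}
    for c in classes:
        tp = confusion[c][c]
        fp = sum(confusion[a][c] for a in classes if a != c)
        fn = sum(confusion[c][p] for p in classes if p != c)
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        result[c] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
                     "precision": precision, "recall": recall,
                     "f_measure": f_measure}
    return result

classes = ["setosa", "versicolor", "virginica"]
test_confusion = {   # rows: original label, columns: predicted label (inferred)
    "setosa":     {"setosa": 15, "versicolor": 0,  "virginica": 0},
    "versicolor": {"setosa": 0,  "versicolor": 15, "virginica": 0},
    "virginica":  {"setosa": 0,  "versicolor": 3,  "virginica": 12},
}
metrics = per_class_metrics(test_confusion, classes)
for name in classes:
    print(name, {k: round(v, 4) for k, v in metrics[name].items()})
```

For 'versicolor' this yields a precision of 15/18 and a recall of 1, which round to the 0.8333 and 1 shown in the test report.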


 Tables
Predicted Class: Shows the following label information for each node.

 Allocation: Shows whether a node belongs to the training set (Training), the test set (Test) or the no-answer set (Missing).

 Original: Shows the label of a node in either the training set or the test set.

 Predicted: Shows the label of a node predicted by the classifier.

 Revised: For nodes in the training set or the test set, the entry is filled with the ‘original’ label. For nodes in the no-answer set, the entry is filled with the label ‘predicted’ by the classifier.

 Matching: If the predicted label and the original label are equal, the entry is filled with ‘Y’; otherwise, ‘N’.


Posterior Probability Table: Shows a posterior probability of each label for every node.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).

Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

 Charts


ROC Curves (Test & Train): Shows ROC curves that represent the classification performance of a classifier for the training set and the test set. The performance for each label is measured by the Area Under the ROC Curve (‘AUC’). The larger the AUC for a certain label, the higher the likelihood that the classifier predicts that label correctly.
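One common way to compute the AUC for a single label is the rank-based formula: the fraction of (positive, negative) pairs in which the positive node receives the higher score, with ties counted as one half. The Python sketch below uses this formula. Whether NetMiner computes the AUC this way is not stated in the manual, so treat it as an illustration only; the function name and the sample scores are hypothetical.

```python
def auc(scores, is_positive):
    """AUC as the share of positive/negative pairs ranked in the right order."""
    positives = [s for s, y in zip(scores, is_positive) if y]
    negatives = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Posterior probabilities of one label, and whether each node truly has it.
print(auc([0.9, 0.8, 0.3, 0.2], [True, True, False, False]))
print(auc([0.9, 0.3, 0.8, 0.2], [True, True, False, False]))
```

A value of 1.0 means every node with the label is scored above every node without it, and 0.5 corresponds to random ordering.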

 Time Complexity
 O(m * n), where m is the number of training nodes and n is the number of attributes


Mining >> Classification >> Discriminant Analysis

 Menu
Mining >> Classification >> Discriminant Analysis

 Description
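Discriminant analysis (‘DA’) is a classification method. It assumes that each group (i.e. the nodes with the same label) generates data from its own Gaussian distribution. For simplicity, let us reduce the classification problem to two labels {0, 1}. The posterior probability of each label $C_i$ given the attribute vector $x$ of a node is as follows:

$$P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)}$$

Since $p(x)$ is the same for all labels and cannot affect the classification, we eliminate it and set

$$g_i(x) = \log p(x \mid C_i) + \log P(C_i)$$

Assuming $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ and dropping the constant term, we get the following discriminant function:

$$g_i(x) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log P(C_i)$$

Expanding the quadratic term, we can define a quadratic discriminant that can also be written as:

$$g_i(x) = x^T W_i x + w_i^T x + w_{i0}$$

where

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\log|\Sigma_i| + \log P(C_i)$$

In the case of two classes, one discriminant function is sufficient:

$$g(x) = g_1(x) - g_0(x)$$

and we choose label 1 if $g(x) > 0$ and label 0 otherwise.

Assuming that the training sets for all labels share a common covariance matrix $\Sigma$, the term $x^T W_i x$ in the discriminant function is deleted so that the discriminant function becomes linear. Moreover, in order to deal with a multi-label classification problem, a DA creates binary classifiers and classifies new nodes by the weighted sum of posterior probabilities. More details can be found in [1, 2].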


 User Options
 Input
Label Vector: Selects a column whose distinct values define the labels of nodes.

Feature Vectors: Selects the column(s) that correspond to the features of a training set. Only numeric columns can be selected.

 Pre-process

Data Allocation:
 All Training Set: Use the whole data set as a training set.

 Split Test Set (Random): Randomly divide the selected data into a test set and a training set.

 Split Test Set (Condition): Divide the selected data into a test set and a training set according to a specified condition.

Parameters in Split Test Set (Random):

 Proportion (%): The proportion of the whole data set that is allocated to the test set.

 Random Seed: If the user fixes this parameter to a certain integer, the same nodes are selected for the test set on every run.

 Simple Random: Randomly choose the nodes of the test set from the main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes of the test set so that the distribution of labels in the test set is similar to that of the whole data set. For example, if the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set follows the same distribution, with 30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Randomly choose the nodes of the test set so that the label distribution of the chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of the test set with a conditional statement, which lets the user define a custom test set. First, the user adds a new attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes with a “Test” entry in the “Allocation” column are allocated to the test set by a query, as shown on the right:

 Main process
Covariance Policy: The assumption about the shape of the covariance for each group.

 Linear: Assume that the nodes for each label share a common covariance estimate.

 Diagonal Linear: Assume that the nodes for each label share a common diagonal covariance matrix estimate.

 Quadratic: Assume that the nodes for each label have a different covariance estimate.

 Diagonal Quadratic: Assume that the nodes for each label have a different diagonal covariance matrix estimate.

Prior Policy: Choose the prior probabilities for the labels.

 Uniform: The prior probabilities are equal for all labels.

 Empirical: The prior probabilities are taken from the distribution of the node labels in the training set.
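To make the covariance policies more tangible, the Python sketch below evaluates a Gaussian discriminant score for the two diagonal policies, where the covariance matrix reduces to one variance per attribute. It is only an illustration under that simplification. The function names, the size-weighted pooling rule and the sample numbers are assumptions, not NetMiner's implementation.

```python
import math

def diagonal_discriminant(x, mean, variances, prior):
    """g(x) = -0.5*sum(log var_j) - 0.5*sum((x_j - mu_j)^2 / var_j) + log(prior)."""
    log_det = sum(math.log(v) for v in variances)
    distance = sum((xj - mj) ** 2 / v for xj, mj, v in zip(x, mean, variances))
    return -0.5 * log_det - 0.5 * distance + math.log(prior)

def pooled_variances(class_variances, class_sizes):
    """'Diagonal Linear': one shared variance per attribute (size-weighted mean)."""
    total = sum(class_sizes)
    n_attributes = len(class_variances[0])
    return [sum(n * v[j] for n, v in zip(class_sizes, class_variances)) / total
            for j in range(n_attributes)]

means = {0: [0.0, 0.0], 1: [3.0, 3.0]}
variances = {0: [1.0, 4.0], 1: [3.0, 2.0]}     # 'Diagonal Quadratic': per label
shared = pooled_variances([variances[0], variances[1]], [10, 10])

x = [0.5, 0.5]
quadratic = {c: diagonal_discriminant(x, means[c], variances[c], 0.5) for c in means}
linear = {c: diagonal_discriminant(x, means[c], shared, 0.5) for c in means}
print(max(quadratic, key=quadratic.get), max(linear, key=linear.get))
```

With shared variances, the log-variance term and the squared term in x are identical for every label, so only terms linear in x decide the result. This is why the shared-covariance policies are called linear.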

 Post-process
The user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis test for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypothesis for the error rate.

Confidence Interval: Customize the confidence level.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Discriminant Analysis’ analysis, ‘Main Report’, ‘Predicted Class Table’, ‘Posterior Probability Table’, ‘Contingency Table’, ‘ROC Curve Chart’ and ‘Discriminant Function Table’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
45 | 42 (93.33%) | 3 (6.67%) | 0.9

TEST ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 15 | 0 | 0 | 30 | 1 | 1 | 1
'versicolor' | 15 | 3 | 0 | 27 | 0.8333 | 1 | 0.9091
'virginica' | 12 | 0 | 3 | 30 | 1 | 0.8 | 0.8889
Weighted Average | . | . | . | . | 0.9444 | 0.9333 | 0.9327

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 6.67(%) | -5.8138 | 0

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 6.67(%) | 0(%) | 13.9548(%)

TRAINING ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
105 | 104 (99.05%) | 1 (0.95%) | 0.9857

TRAINING ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 35 | 0 | 0 | 70 | 1 | 1 | 1
'versicolor' | 34 | 0 | 1 | 70 | 1 | 0.9714 | 0.9855
'virginica' | 35 | 1 | 0 | 69 | 0.9722 | 1 | 0.9859
Weighted Average | . | . | . | . | 0.9907 | 0.9905 | 0.9905

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 0.95(%) | -10.0518 | 0

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 0.95(%) | 0(%) | 2.8101(%)
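The hypothesis test and confidence interval rows of the report are consistent with a one-sample z-test for a proportion and a normal-approximation (Wald) interval clipped at 0. The Python sketch below reproduces the test-set figures under that assumption. The function names are hypothetical, and the lower-tail alternative (error rate below p0) is an assumed default.

```python
import math
from statistics import NormalDist

def error_rate_z_test(errors, n, p0=0.5):
    """z statistic and lower-tail p-value for H1: error rate < p0."""
    p1 = errors / n
    z = (p1 - p0) / math.sqrt(p0 * (1 - p0) / n)
    return p1, z, NormalDist().cdf(z)

def error_rate_confidence_interval(errors, n, level=0.95):
    """Normal-approximation interval for the error rate, clipped to [0, 1]."""
    p1 = errors / n
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * math.sqrt(p1 * (1 - p1) / n)
    return max(0.0, p1 - half_width), min(1.0, p1 + half_width)

p1, z, p_value = error_rate_z_test(errors=3, n=45)
lower, upper = error_rate_confidence_interval(errors=3, n=45)
print(round(z, 4), round(100 * lower, 4), round(100 * upper, 4))
```

With 3 errors out of 45 test nodes, the z statistic rounds to -5.8138 and the upper bound is about 13.95%, matching the test report above. The p-value is small enough to display as 0.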

 Tables
Predicted Class: Shows the following label information for each node.

 Allocation: Shows whether a node belongs to the training set (Training), the test set (Test) or the no-answer set (Missing).

 Original: Shows the label of a node in either the training set or the test set.

 Predicted: Shows the label of a node predicted by the classifier.

 Revised: For nodes in the training set or the test set, the entry is filled with the ‘original’ label. For nodes in the no-answer set, the entry is filled with the label ‘predicted’ by the classifier.

 Matching: If the predicted label and the original label are equal, the entry is filled with ‘Y’; otherwise, ‘N’.

Posterior Probability Table: Shows a posterior probability of each label for every node.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).


Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

Discriminant Function Table: Shows the coefficient(s) and the constant of each discriminant function.

 Charts
ROC Curves (Test & Train): Shows ROC curves that represent the classification performance of a classifier for the training set and the test set. The performance for each label is measured by the Area Under the ROC Curve (‘AUC’). The larger the AUC for a certain label, the higher the likelihood that the classifier predicts that label correctly.


 References
 [1] Murphy, Kevin P. “Machine learning: a probabilistic perspective.” The MIT Press, 2012.

 [2] Fisher, R. A. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, Vol. 7, pp. 179–188, 1936.


Mining >> Classification >> Support Vector Machines (SVMs)

 Menu
Mining >> Classification >> SVMs

 Description
A support vector machine (SVM) is a widely used classification algorithm applied in various domains. In the learning phase, an SVM finds the best hyperplane to classify data. The best hyperplane for an SVM is the one with the largest margin between two classes. The margin is the distance from the hyperplane to the nearest data points, which are called support vectors. The basic idea underlying an SVM is that a classifier with a maximum margin tends to have the smallest generalization error. This intuition is depicted in the following figure:

From a mathematical point of view, finding a maximum margin is a quadratic programming problem. There are several different algorithms for solving this problem. In NetMiner, the Sequential Minimal Optimization (SMO) algorithm suggested by Platt (1998) [2] is implemented. An SVM is a classification algorithm for a 2-class supervised dataset.


 User Options

 Input
Label Vector: Selects a column whose distinct values define the labels of nodes. An SVM algorithm predicts the label for each missing value in the selected column.

Feature Vectors: Selects the column(s) that correspond to the features of a training set. Only numeric columns can be selected.

 Pre-process
Data Allocation:
 All Training Set: Use the whole data set as a training set.

 Split Test Set (Random): Randomly divide the selected data into a test set and a training set.

 Split Test Set (Condition): Divide the selected data into a test set and a training set according to a specified condition.

Parameters in Split Test Set (Random):

 Proportion (%): The proportion of the whole data set that is allocated to the test set.

 Random Seed: If the user fixes this parameter to a certain integer, the same nodes are selected for the test set on every run.

 Simple Random: Randomly choose the nodes of the test set from the main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes of the test set so that the distribution of labels in the test set is similar to that of the whole data set. For example, if the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set follows the same distribution, with 30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Randomly choose the nodes of the test set so that the label distribution of the chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of the test set with a conditional statement, which lets the user define a custom test set. First, the user adds a new attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes with a “Test” entry in the “Allocation” column are allocated to the test set by a query, as shown on the right:

 Main process
Kernel Functions: Choose the function that projects the
training data into a kernel space.

 Polynomial: Polynomial kernel with order 3.

 Quadratic: Quadratic kernel.

 Radial Basis Function: Gaussian Radial Basis

Function kernel.

 Linear: Linear kernel, which is equivalent to a dot

product.
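The four kernel choices can be written as short Python functions. The manual gives only the kernel names and the polynomial order, so the exact forms below are assumptions: the polynomial kernel is taken as (1 + x·y)^order, the quadratic kernel as the same with order 2, and the radial basis function uses a hypothetical width parameter `sigma`.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)                        # plain dot product

def polynomial_kernel(x, y, order=3):
    return (1 + dot(x, y)) ** order         # assumed form, order 3 by default

def quadratic_kernel(x, y):
    return polynomial_kernel(x, y, order=2)

def rbf_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y), quadratic_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, x))
```

Each function returns the similarity of two feature vectors in the implied kernel space, which is the only quantity the SMO solver needs from the data.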


Learning options:
 Max Iteration: The maximum number of iterations of the SMO algorithm. If the SMO does not converge within this limit, the algorithm stops and returns an error.

 KKT Tolerance: The tolerance for convergence. The SMO checks the Karush-Kuhn-Tucker (KKT) conditions for convergence.

 KKT Violation Level: The proportion of nodes that are allowed to violate the Karush-Kuhn-Tucker (KKT) conditions. If the user sets KKT Violation Level to 0.1, 10% of the nodes can violate the KKT conditions.

 Box Constraint: The box constraint for the soft margin.

 Post-process
The user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measuring Training Accuracy: Turn this on to perform the hypothesis test for the accuracy of both the training set and the test set.

Hypothesis Test: Customize the alternative hypothesis for the error rate.

Confidence Interval: Customize the confidence level.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘SVMs’ analysis, ‘Main Report’,

‘Predicted Class Table’, ‘Contingency Table’, ‘Voting

Summary Table’, ‘Voting Result Table’, ‘Support Vector List

Table’, ‘Alpha Table’, and ‘Bias Table’ are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
45 | 42 (93.33%) | 3 (6.67%) | 0.9

TEST ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 15 | 0 | 0 | 30 | 1 | 1 | 1
'versicolor' | 15 | 3 | 0 | 27 | 0.8333 | 1 | 0.9091
'virginica' | 12 | 0 | 3 | 30 | 1 | 0.8 | 0.8889
Weighted Average | . | . | . | . | 0.9444 | 0.9333 | 0.9327

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 6.67(%) | -5.8138 | 0

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 6.67(%) | 0(%) | 13.9548(%)

TRAINING ACCURACY :: SUMMARY

# of Total Instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
105 | 104 (99.05%) | 1 (0.95%) | 0.9857

TRAINING ACCURACY :: ACCURACY PER CLASS

class | TP | FP | FN | TN | Precision | Recall | F-Measure
'setosa' | 35 | 0 | 0 | 70 | 1 | 1 | 1
'versicolor' | 34 | 0 | 1 | 70 | 1 | 0.9714 | 0.9855
'virginica' | 35 | 1 | 0 | 69 | 0.9722 | 1 | 0.9859
Weighted Average | . | . | . | . | 0.9907 | 0.9905 | 0.9905

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 0.95(%) | -10.0518 | 0

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 0.95(%) | 0(%) | 2.8101(%)
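Cohen's kappa in the summary tables compares the observed agreement between original and predicted labels with the agreement expected by chance. The Python sketch below applies the standard formula to contingency tables inferred from the per-class counts of the report; the tables are a reconstruction rather than NetMiner output, and the function name is hypothetical.

```python
def cohens_kappa(confusion):
    """confusion[i][j]: nodes with original label i and predicted label j."""
    size = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)   # row total * column total
        for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)

# Label order: setosa, versicolor, virginica (inferred tables).
test_table = [[15, 0, 0], [0, 15, 0], [0, 3, 12]]
training_table = [[35, 0, 0], [0, 34, 1], [0, 0, 35]]
print(round(cohens_kappa(test_table), 4), round(cohens_kappa(training_table), 4))
```

For the test table, the observed agreement is 42/45 and the chance agreement is 1/3, which gives the kappa of 0.9 shown in the summary.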

 Tables
Predicted Class: Shows the following label information for each node.

 Allocation: Shows whether a node belongs to the training set (Training), the test set (Test) or the no-answer set (Missing).

 Original: Shows the label of a node in either the training set or the test set.

 Predicted: Shows the label of a node predicted by the classifier.

 Revised: For nodes in the training set or the test set, the entry is filled with the ‘original’ label. For nodes in the no-answer set, the entry is filled with the label ‘predicted’ by the classifier.

 Matching: If the predicted label and the original label are equal, the entry is filled with ‘Y’; otherwise, ‘N’.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).

Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

Voting Summary: An entry (i, j) of this table shows how many SVMs voted for label j for node i.


Voting Result: An entry (i, j) of this table shows the label that the jth SVM voted for node i.
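Since a single SVM separates two classes, a problem with more labels is typically handled by several binary SVMs whose votes are tallied. The Python sketch below assumes a one-vs-one scheme, with one SVM per pair of labels. The manual does not name the scheme, and the function names and example votes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_problems(labels):
    """One binary SVM per unordered pair of labels (one-vs-one assumption)."""
    return list(combinations(labels, 2))

def tally_votes(votes):
    """Return the per-label vote counts and the label with the most votes."""
    summary = Counter(votes)
    winner = max(summary, key=summary.get)   # ties go to the first label seen
    return summary, winner

problems = pairwise_problems(["setosa", "versicolor", "virginica"])
# Hypothetical votes of the three SVMs for one node (one Voting Result row).
votes_for_node = ["setosa", "setosa", "versicolor"]
summary, predicted = tally_votes(votes_for_node)
print(len(problems), dict(summary), predicted)
```

Here the list of votes plays the role of one row of the Voting Result table, and the counts in `summary` correspond to the matching row of the Voting Summary table.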

Support Vector List: An entry (i, j) of this table shows the ith support vector of jth SVM.
 The green entries are support vectors belonging to a first label and

 the pink entries are support vectors belonging to a second label.


Alpha: The alpha vectors for each SVM where an alpha is the weight for each support vector.


Bias: Bias value for each SVM.

 References
 [1] Cristianini, N., and Shawe-Taylor, J. (2000). “An Introduction to Support Vector Machines and Other Kernel-based Learning Methods”, First Edition (Cambridge: Cambridge University Press).

 [2] Platt, John. “Fast Training of Support Vector Machines using Sequential Minimal Optimization”, in Advances in Kernel Methods – Support Vector Learning, B. Schölkopf, C. Burges, A. Smola, eds., MIT Press (1998).


Mining >> Classification >> Multilayer Perceptron

 Menu
Mining >> Classification >> Multilayer Perceptron

 Description
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of

input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a

directed graph, with each layer fully connected to the next one. Except for the input nodes, each node

is a neuron (or processing element) with a nonlinear activation function. MLP utilizes a supervised

learning technique called backpropagation for training the network. MLP is a modification of the

standard linear perceptron and can distinguish data that are not linearly separable.

 User Options

 Input
Label Vector: Selects a column whose distinct values define the labels of nodes.

Feature Vectors: Selects the column(s) that correspond to the features of a training set. Only numeric columns can be selected.

 Pre-process
Data Allocation:

 All Training Set: Use the whole data set as a training set.

 Split Test Set (Random): Randomly divide the selected data into a test set and a training set.

 Split Test Set (Condition): Divide the selected data into a test set and a training set according to a specified condition.

Parameters in Split Test Set (Random):

 Proportion (%): The proportion of the whole data set that is allocated to the test set.

 Random Seed: If the user fixes this parameter to a certain integer, the same nodes are selected for the test set on every run.

 Simple Random: Randomly choose the nodes of the test set from the main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes of the test set so that the distribution of labels in the test set is similar to that of the whole data set. For example, if the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set follows the same distribution, with 30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Randomly choose the nodes of the test set so that the label distribution of the chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of the test set with a conditional statement, which lets the user define a custom test set. First, the user adds a new attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes with a “Test” entry in the “Allocation” column are allocated to the test set by a query, as shown below:

 Main process
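Activation Functions: Select the activation function used in the perceptron.

 Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$

 Hyperbolic Tangent: $f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

 ReLU: $f(x) = \max(0, x)$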
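The three activation functions named above have standard textbook definitions, sketched here in Python. The function names are hypothetical, and nothing is implied about how NetMiner evaluates them internally.

```python
import math

def sigmoid(x):
    # Maps any real input into (0, 1); very large negative x would overflow exp().
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    # Maps any real input into (-1, 1).
    return math.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged and clips negatives to 0.
    return max(0.0, x)

for value in (-2.0, 0.0, 3.0):
    print(value, sigmoid(value), hyperbolic_tangent(value), relu(value))
```

The sigmoid and hyperbolic tangent saturate for large inputs, whereas ReLU grows without bound for positive inputs.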

Learning options:
 Hidden Layer1 node: The number of nodes in the first hidden layer. This value must be at least one.

 Hidden Layer2 node: The number of nodes in the second hidden layer.

 Hidden Layer3 node: The number of nodes in the third hidden layer.

 Learning Rate: Controls how strongly the weights are adjusted at each step of the backpropagation method.

 Max iteration: The maximum number of iterations during training. The greater the value, the longer the computation may take.

 Accuracy: During training, learning is terminated if the error falls below this value.
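The interplay of Learning Rate, Max iteration and Accuracy can be illustrated with a deliberately small Python sketch. To keep it short it trains a single sigmoid neuron by gradient descent rather than a full multilayer network. The function name, the mean squared error measure and the zero initial weights are assumptions for illustration only, and `max_iteration` is assumed to be at least 1.

```python
import math

def train_neuron(samples, learning_rate=0.5, max_iteration=5000, accuracy=0.01):
    """Train one sigmoid neuron; stop at max_iteration or when error < accuracy."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for iteration in range(1, max_iteration + 1):
        error = 0.0
        for x, target in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1.0 / (1.0 + math.exp(-activation))
            # Gradient of 0.5 * (output - target)^2 with respect to the activation
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * delta
            error += 0.5 * (output - target) ** 2
        error /= len(samples)
        if error < accuracy:        # the 'Accuracy' stopping rule
            break
    return weights, bias, iteration, error

# Logical OR as a tiny training set: (features, label).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, used_iterations, final_error = train_neuron(samples)
print(used_iterations, round(final_error, 4))
```

A larger learning rate changes the weights more per step, a larger iteration limit allows more passes over the training set, and a smaller accuracy value demands a lower error before training stops early.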

 Post-process
The user can set the alternative hypothesis and the confidence level for hypothesis testing of the training error.

Measure Training Accuracy: Turn this on to perform the hypothesis test for the accuracy of both the training set and the test set.

Hypothesis testing: Customize the alternative hypothesis for the error rate.

Confidence Interval: Customize the confidence level.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Multilayer Perceptron’ analysis, ‘Main Report’, ‘Predicted Class Table’, ‘Posterior Probability Table’, ‘Contingency Table’ and ‘ROC Curve Chart’ are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
Evaluation results for a classifier are reported in a main report.

TEST ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
45 | 42 (93.33%) | 3 (6.67%) | 0.9

TEST ACCURACY :: ACCURACY PER CLASS

class | TP | FP | TN | FN | Precision | Recall | F-Measure
'setosa' | 15 | 0 | 0 | 30 | 1 | 1 | 1
'versicolor' | 15 | 3 | 0 | 27 | 0.8333 | 1 | 0.9091
'virginica' | 12 | 0 | 3 | 30 | 1 | 0.8 | 0.8889
Weighted Average | . | . | . | . | 0.9444 | 0.9333 | 0.9327
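
The per-class figures can be reproduced from the counts with the usual definitions of precision, recall and F-measure. The sketch below is an illustration rather than NetMiner's code. It assumes that the four count columns of the test-set table are to be read in the order TP, FP, FN, TN, because that reading is the one consistent with the reported precision and recall values.

```python
# (TP, FP, FN) per class, taken from the test-set table under the
# assumption that the count columns are ordered TP, FP, FN, TN.
counts = {
    "setosa": (15, 0, 0),
    "versicolor": (15, 3, 0),
    "virginica": (12, 0, 3),
}

def per_class_scores(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

scores = {label: per_class_scores(*c) for label, c in counts.items()}

# Weighted average: each class is weighted by its number of true instances.
support = {label: tp + fn for label, (tp, fp, fn) in counts.items()}
total = sum(support.values())
weighted = [sum(scores[label][k] * support[label] for label in counts) / total
            for k in range(3)]
```

Here `weighted` holds the weighted precision, recall and F-measure in that order.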

TEST ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 6.67(%) | -5.8138 | 0

TEST ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 6.67(%) | 0(%) | 13.9548(%)
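
The reported Z value and confidence bounds are numerically consistent with a one-sample z-test for a proportion and a normal-approximation interval clipped to the range 0 to 1. This is inferred from the figures, not stated by the manual, so the sketch below is an assumption about the method. The function name `error_rate_test` is hypothetical.

```python
import math
from statistics import NormalDist

def error_rate_test(n, n_errors, p0=0.5, confidence=0.95):
    """z-test of the error rate against p0 plus a normal-approximation interval."""
    p1 = n_errors / n
    # Test statistic under the null hypothesis "error rate = p0".
    z = (p1 - p0) / math.sqrt(p0 * (1 - p0) / n)
    # One-sided p-value for the alternative "error rate < p0".
    p_value = NormalDist().cdf(z)
    half_width = (NormalDist().inv_cdf(0.5 + confidence / 2)
                  * math.sqrt(p1 * (1 - p1) / n))
    lower = max(0.0, p1 - half_width)
    upper = min(1.0, p1 + half_width)
    return z, p_value, lower, upper

# Test set of the example: 45 instances, 3 of them misclassified.
z, p_value, lower, upper = error_rate_test(n=45, n_errors=3)
```

With these inputs `z` is about -5.8138, the lower bound is clipped to 0 and the upper bound is about 13.95%, which matches the tables.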

TRAINING ACCURACY :: SUMMARY

# of Total instances | Correctly Classified Instances | Incorrectly Classified Instances (Error Rate) | Cohen's kappa coefficient
105 | 104 (99.05%) | 1 (0.95%) | 0.9857

TRAINING ACCURACY :: ACCURACY PER CLASS

class | TP | FP | TN | FN | Precision | Recall | F-Measure
'setosa' | 35 | 0 | 0 | 70 | 1 | 1 | 1
'versicolor' | 34 | 0 | 1 | 70 | 1 | 0.9714 | 0.9855
'virginica' | 35 | 1 | 0 | 69 | 0.9722 | 1 | 0.9859
Weighted Average | . | . | . | . | 0.9907 | 0.9905 | 0.9905

TRAINING ACCURACY :: HYPOTHESIS TEST OF ERROR RATE

p0 | p1 | Z | P-value (one-sided)
50(%) | 0.95(%) | -10.0518 | 0

TRAINING ACCURACY :: CONFIDENCE INTERVAL OF ERROR RATE

Confidence Level | Point estimation | Lower bound | Upper bound
95(%) | 0.95(%) | 0(%) | 2.8101(%)

 Tables
Predicted Class: Shows the following label information for each node.

 Allocation: Shows whether a node is included in a training set (Training), a test set (Test)
or no answer set (Missing)

 Original: Shows the label of a node in either a training set or a test set.

 Predicted: Shows the label of a node predicted by a classifier.

 Revised: For nodes in a training set and a test set, the entries are filled with their ‘original’
label. For nodes in a no answer set, their entries are filled with the label ‘predicted’ by a

classifier.


 Matching: If a predicted label and an original label are equal, then the entry will be filled
with ‘Y’. Otherwise, ‘N’.

Posterior Probability Table: Shows a posterior probability of each label for every node.

Contingency Table (Test): Shows a contingency table between original labels (row) and predicted
labels (column).


Contingency Table (Training): Shows a contingency table between original labels (row) and
predicted labels (column).

 Charts
ROC Curves (Test & Train): Shows ROC curves that represent the classifying performance of a classifier for a training set and a test set. The classifying performance for each label is measured by the Area Under the ROC Curve (‘AUC’). The larger the AUC for a certain label, the higher the likelihood that the prediction of label classification is correct.

 Time Complexity
 O((d + m) · P · H · N)

 where d is the number of input nodes, m is the number of output nodes, P is the number of hidden nodes, H is the number of iterations, and N is the number of samples.

 References
Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature 323


Mining >> Regression >> Classification and Regression Tree (CART)

 Menu
Mining >> Regression >> CART

 Description
Classification And Regression Tree (CART), which is trained from training instances, predicts the missing labels of test instances. To predict a label, the decision process traces the decisions in the tree from the root node to a leaf node.

For instance, this tree predicts the label (including ‘Team Member Level 02’, ‘CEO’ and ‘Team Manager’) based on two attributes, ‘Age’ and ‘Organization Satisfaction’. Assume there is an instance having the attributes {‘Age’ = 45, ‘Organization Satisfaction’ = 0.1}. The tree first goes right, since ‘Age’ = 45 is greater than 43.5 at the root level. As ‘Organization Satisfaction’ = 0.1 is less than 0.5, the example instance is classified as CEO.

In the learning phase, CART constructs a tree structure by following the steps below:

1. At the root node of the tree, begin with all training instances.

2. Try every possible binary split of the instances for every attribute. Choose the split with the best criterion value.

 The criterion metric can be either ‘Gini impurity’ or ‘entropy’ for a classification tree.

3. Generate two child nodes corresponding to the two parts of the best binary split chosen in step 2 and move the instances to them respectively.

4. Repeat steps 1, 2 and 3 for the two child nodes until the number of instances in the current node reaches the ‘minimum node size to be split’ or the number of instances in the child nodes for every possible split is less than the ‘minimum leaf size’.

 ‘Minimum node size to be split’ and ‘Minimum leaf size’ are options in the main process panel that a user needs to provide.
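
The learning steps can be sketched as a short recursive procedure. This is a minimal illustration for a classification tree with Gini impurity, not NetMiner's implementation. The function names, the midpoint thresholds and the exact comparisons used for the two size options (`min_split_size`, `min_leaf_size`) are assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, min_leaf_size):
    """Try every binary split of every attribute; return (score, attribute, threshold)."""
    best = None
    for attribute in range(len(rows[0])):
        values = sorted({row[attribute] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[attribute] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attribute] > threshold]
            if min(len(left), len(right)) < min_leaf_size:
                continue  # a child would be smaller than the minimum leaf size
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attribute, threshold)
    return best

def build_tree(rows, labels, min_split_size=2, min_leaf_size=1):
    """Recursively split the instances until a stopping rule applies."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(rows) < min_split_size or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels, min_leaf_size)
    if split is None:
        return {"leaf": majority}
    _, attribute, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[attribute] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attribute] > threshold]
    return {
        "attribute": attribute,
        "threshold": threshold,
        "left": build_tree(*zip(*left), min_split_size, min_leaf_size),
        "right": build_tree(*zip(*right), min_split_size, min_leaf_size),
    }

def predict(tree, row):
    """Trace the decisions from the root node to a leaf node."""
    while "leaf" not in tree:
        side = "left" if row[tree["attribute"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Made-up instances with the attributes (Age, Organization Satisfaction).
rows = [(30, 0.2), (35, 0.8), (50, 0.1), (55, 0.9)]
labels = ["Member", "Member", "CEO", "Manager"]
tree = build_tree(rows, labels)
```

A regression tree follows the same recursion but would score splits with a numeric error measure and store a numeric value in each leaf.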

The above learning process is depicted in the following figure:

 User Options

 Input
Numerical Label Vector: Selects a column whose values define the answer of nodes. The CART algorithm predicts the missing values in the selected column.

Feature Vectors: Selects one or more columns that correspond to the features of a training set. Only numeric columns can be selected.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data into

a test set and a training set by random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set
among a whole set of data.

 Random Seed: If a user fixes this parameter with a certain integer, the nodes of the test set are always the same.

 Simple Random: Randomly choose the nodes of a

test set from a main nodeset without any condition.

 Stratified (proportional): Randomly select the

nodes for a test set so that the distribution of labels for the selected nodes in the test set is

similar to that of whole data set. For example, if the whole data set has 30% red labels, 30%


blue labels and 40% green labels, the test set should follow the same distribution by having

30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.
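
The random allocation options can be illustrated with a short sketch. It covers ‘Simple Random’ and ‘Stratified (proportional)’ only, and the function name `split_test_set`, its parameters and the rounding rule are hypothetical choices made for this example rather than NetMiner's behavior.

```python
import random
from collections import defaultdict

def split_test_set(labels, proportion, seed=None, stratified=False):
    """Return the sorted indices of the nodes placed in the test set."""
    rng = random.Random(seed)  # a fixed seed reproduces the same test set
    indices = list(range(len(labels)))
    if not stratified:
        # Simple Random: sample from all nodes without any condition.
        return sorted(rng.sample(indices, round(len(indices) * proportion)))
    # Stratified (proportional): sample the same fraction within each label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    test = []
    for label in sorted(by_label):
        members = by_label[label]
        test.extend(rng.sample(members, round(len(members) * proportion)))
    return sorted(test)

# 30% red, 30% blue and 40% green labels, as in the example above.
labels = ["red"] * 30 + ["blue"] * 30 + ["green"] * 40
test_nodes = split_test_set(labels, proportion=0.2, seed=7, stratified=True)
test_labels = [labels[i] for i in test_nodes]
```

With a proportion of 20%, the stratified test set contains 6 red, 6 blue and 8 green nodes, so it keeps the 30/30/40 distribution of the whole data set.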

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional
statement. For example, a user can define a custom test set with this option. First, the user adds a new
attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes having a “Test” entry in the “Allocation” column are allocated to the test set by querying as
shown on the right:

 Main process
Tree Shape Options:
 Pruning: The “Yes” option reshapes the full tree to prevent overfitting.

 Minimum leaf size: If the number of instances in the child nodes for all possible splits at the current node is less than this value, the construction of the tree is stopped.

 Minimum node size to be split: If the number of instances in the current node reaches this value, the construction of the tree is stopped.


Criterion: Specifies the criterion metric used to find the best split.

 Gini impurity: $1 - \sum_{i} p_i^2$

 Entropy: $-\sum_{i} p_i \log p_i$

where $p_i$ is the fraction of instances with label $i$ among the instances in the tree node. For example, a node with only one label has a value of 0 in terms of both Gini impurity and entropy. Otherwise, it will have a positive value.
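
As a worked example of the two criteria, consider a hypothetical node that holds three instances of label A and one instance of label B. The standard definitions of Gini impurity and entropy are used, and a base-2 logarithm is assumed for the entropy because the manual does not state the base.

```latex
% Node with 3 instances of label A and 1 instance of label B
p_A = \tfrac{3}{4}, \qquad p_B = \tfrac{1}{4}

\text{Gini} = 1 - \left(p_A^2 + p_B^2\right) = 1 - (0.5625 + 0.0625) = 0.375

\text{Entropy} = -\left(p_A \log_2 p_A + p_B \log_2 p_B\right) \approx 0.3113 + 0.5 = 0.8113
```

Both values are positive because the node mixes two labels. A node holding only label A would give 0 for both.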

 Post-process
Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘CART (Regression)’ analysis, ‘Main

Report’, ‘Predicted Value Table’, ‘Scatter Plot Chart’ and ‘Tree

Diagram Chart’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report
TEST ACCURACY
# of instances | RMSE (Root mean squared error) | MAE (Mean absolute error) | Correlation Coefficient | R-square (Coefficient of determination)
6 | 0.4435 | 0.3333 | 0.976 | 0.9465

TRAINING ACCURACY
# of instances | RMSE (Root mean squared error) | MAE (Mean absolute error) | Correlation Coefficient | R-square (Coefficient of determination)
16 | 0.7202 | 0.5125 | 0.9195 | 0.8456

TREE INFO
# of Nodes | # of Non-leaf Nodes | # of Leaf Nodes | Height
5 | 2 | 3 | 2

TREE DESCRIPTION
Node | Att. / Value (Leaf node) | Criterion | ≤ (Left child) | > (Right child)
Node 1 | Duration | 10.5 | Node 2 | Node 3
Node 2 | Job-ranking | 6.5 | Node 4 | Node 5
Node 3 | 2.8 | . | . | .
Node 4 | 5.5 | . | . | .
Node 5 | 7 | . | . | .
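
The TREE DESCRIPTION rows can be read as a lookup structure: a non-leaf row names an attribute, a threshold and the two child nodes, while a leaf row holds the predicted value. The sketch below transcribes the example rows into a dictionary and traces a prediction. The names `TREE` and `predict` are hypothetical.

```python
# Non-leaf nodes: (attribute, threshold, left child, right child).
# Leaf nodes: predicted value. Transcribed from the example report.
TREE = {
    1: ("Duration", 10.5, 2, 3),
    2: ("Job-ranking", 6.5, 4, 5),
    3: 2.8,
    4: 5.5,
    5: 7.0,
}

def predict(instance, node=1):
    """Follow the left child when the attribute is <= the threshold, else the right child."""
    while isinstance(TREE[node], tuple):
        attribute, threshold, left, right = TREE[node]
        node = left if instance[attribute] <= threshold else right
    return TREE[node]

short_senior = predict({"Duration": 8, "Job-ranking": 7})
long_duration = predict({"Duration": 12, "Job-ranking": 1})
```

An instance with ‘Duration’ = 8 and ‘Job-ranking’ = 7 goes from Node 1 to Node 2 and then to Node 5, so the predicted value is 7.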

 Tables
Predicted Value: Values predicted by models and the difference between predicted values and
original values are shown in this table.
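
The accuracy figures of the main report can be derived from the original and predicted columns of this table. The sketch below uses the common textbook definitions. In particular, R-square is computed as 1 - SSE/SST, which is an assumption because the manual does not give the formula, and the function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(original, predicted):
    """Return (RMSE, MAE, correlation coefficient, R-square)."""
    n = len(original)
    errors = [p - o for o, p in zip(original, predicted)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(original) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(original, predicted))
    sst = sum((o - mean_o) ** 2 for o in original)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(sst * var_p)
    r_square = 1 - sse / sst
    return rmse, mae, correlation, r_square

# Made-up values: only the last prediction is off by one.
rmse, mae, correlation, r_square = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```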


 Charts
Scatter Plot (Test & Training): Scatter plot of original values (x-axis) and predicted values (y-axis).
If dots are located near the diagonal, it suggests that a trained tree has a high accuracy.


Tree Diagram: Shows a trained tree graphically.

 References
 Murphy, Kevin P. “Machine learning: a probabilistic perspective.” The MIT Press, 2012.

 Fisher, R. A. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, Vol. 7, pp. 179–188, 1936.


Mining >> Collaborative Filtering >> Singular Value Decomposition (SVD)

 Menu
Mining >> Collaborative Filtering >> SVD

 Description
The main task of collaborative filtering is to predict a user’s preference for items based on other users’ preferences for the items. The singular value decomposition ('SVD') algorithm for collaborative filtering is a matrix factorization model used to solve a collaborative filtering problem. SVD maps both users and items to a joint latent factor space of dimension k, such that user-item interactions are modeled as inner products in that space. The latent space explains ratings by characterizing both products and users based on factors automatically inferred from user feedback. For instance, if the products are ‘movies’, the factors measure dimensions such as ‘comedy vs. drama’ and ‘amount of action’. SVD assumes that only a small number of factors influence preferences and that a user’s preference for an item is determined by how each factor applies to the user and the item. This problem can be formulated as a matrix factorization (‘MF’) problem. In other words, in a k-factor model, given a preference matrix $Y \in \mathbb{R}^{m \times n}$ (the preference matrix can be converted to a 2-mode network), SVD finds two matrices $U \in \mathbb{R}^{m \times k}$ and $M \in \mathbb{R}^{n \times k}$ such that:

$Y \approx U M^{T}$ [1]

To find matrices U and M, SVD solves the following optimization problem using stochastic gradient descent:

$\min_{U, M} \sum_{(i,j) \in K} \left[ \left( Y_{ij} - U_i M_j^{T} \right)^2 + \lambda \left( \lVert U_i \rVert^2 + \lVert M_j \rVert^2 \right) \right]$ [2]

where $\lambda$ is an overfitting regularization parameter and $K$ is the set of (user, item) pairs whose preference $Y_{ij}$ is known. More details can be found at [1, 2]. The factorization uses an iterative method starting with random initial values for matrices U and M.
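
The iterative procedure can be sketched as follows. This is a minimal illustration of regularized matrix factorization trained by stochastic gradient descent, assuming the usual squared-error objective with an L2 penalty on the rows of U and M. It is not NetMiner's implementation. The function names, default values and the stopping rule based on the change in training RMSE are assumptions chosen to mirror the options of this module.

```python
import math
import random

def predict(U, M, i, j):
    """Predicted preference: inner product of user row i and item row j."""
    return sum(u * m for u, m in zip(U[i], M[j]))

def rmse(ratings, U, M):
    return math.sqrt(sum((y - predict(U, M, i, j)) ** 2
                         for i, j, y in ratings) / len(ratings))

def train_svd(ratings, n_users, n_items, k=2, learning_rate=0.02, reg=0.02,
              min_iter=10, max_iter=300, tolerance=1e-6, seed=0):
    """ratings is a list of (user index, item index, value) triples."""
    rng = random.Random(seed)
    # Random initial values for U (users x k) and M (items x k).
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    M = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    history = []
    for _ in range(max_iter):
        for i, j, y in ratings:
            err = y - predict(U, M, i, j)
            for f in range(k):
                u, m = U[i][f], M[j][f]
                # Gradient step on the squared error plus the L2 penalty.
                U[i][f] += learning_rate * (err * m - reg * u)
                M[j][f] += learning_rate * (err * u - reg * m)
        history.append(rmse(ratings, U, M))
        if len(history) >= min_iter and abs(history[-2] - history[-1]) < tolerance:
            break
    return U, M, history

# Made-up preferences of 4 users for 3 items.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
U, M, history = train_svd(ratings, n_users=4, n_items=3)
```

The list `history` holds the training RMSE after each epoch, which is the quantity a learning curve would plot.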

 User Options

 Input
2-mode Network: Select a 2-mode network. A user can only choose one 2-mode network.
 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data into a test set and a training set by

random.

 Split Test Set (Condition): Divide selected set of data into test and training sets (with the
specified condition).


Parameters in Split Test Set (Random):

 Proportion (%): The proportion of a test set among a whole set of data.
 Random Seed: If a user fixes this parameter with a certain integer, the nodes of the test set are always the same.

 Simple Random: Randomly choose the nodes of a test set from a main nodeset without any

condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.


Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional
statement. For example, a user can define a custom test set with this option. First, the user adds a new
attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes having a “Test” entry in the “Allocation” column are allocated to the test set by querying as
shown on the right:

 Main process
# of Features (rank): Dimensionality of a latent factor space

# of Items to Recommend: The number of items shown in a recommendation table for each user.

Option:
 Min iteration: Minimum number of times to repeat

the learning (optimization) process

 Max iteration: Maximum number of times to

repeat the learning (optimization) process

 Overfitting regularization parameter: The regularization constant in the objective function, which prevents the overfitting of the model matrices U and M.

 Learning rate: Controls the step size when matrices U and M are iteratively adjusted.

 Convergence Tolerance: If | (training error at iteration i) – (training error at iteration i + 1) | < (Convergence Tolerance), the algorithm stops.

 Random Seed: If a user fixes a random seed, one can get the same validation set for the same

dataset.

 Post-process
Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘SVD’ analysis, ‘Main Report’,

‘Recommendation Table’, ‘Predicted Value Table’, ‘U Matrix

Table’, ‘M Matrix Table’, ‘Answer List Table’, ‘Answer Matrix

Table’ and ‘Learning Curve Chart’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report: Presents information on the process, the epoch (the number of iterations performed), the training RMSE and the test RMSE.

 Tables
Predicted Value: Values predicted by the models and difference from original are displayed.

Recommendation Table: A node by item recommendation rank matrix, which shows which item is
recommended with kth rank.


U Matrix: A user by latent factor matrix

M Matrix: An item by latent factor matrix

Answer List and Answer Matrix: Show the prediction for every pair (i.e. user and item).

(Answer List)

(Answer Matrix)


 Charts
Learning Curve: A graphical representation of the decrease in RMSE (vertical axis) with epoch
(horizontal axis)


Mining >> Collaborative Filtering >> Singular Value Decomposition++ (SVD++)

 Menu
Mining >> Collaborative Filtering >> SVD++

 Description
Singular value decomposition++ ('SVD++') improves the prediction accuracy of SVD by considering implicit feedback information. In general, implicit feedback refers to any kind of user history information that indicates users’ preferences. Given a preference matrix Y, SVD++ in NetMiner utilizes two kinds of information from Y, namely:

 a valued 2-mode network

 a dichotomized 2-mode network (indicates which items users rate, regardless of their rating value)

In the SVD++ model, a second set of item factors is added, relating each item j to a factor vector. These new item factors are used to characterize users based on the set of items that the users have rated. The exact model is:

(1)

(2)

The model uses the set of items rated by a user u. To find matrices U and M and the remaining model parameters, SVD++ solves the following optimization problem using stochastic gradient descent:

(3)

More details can be found at [1].
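
To make the role of the second set of item factors concrete, the sketch below computes a single SVD++ prediction in the widely published form of the model: a global mean, a user bias, an item bias, and the inner product of the item factors with the user factors shifted by the normalized sum of the implicit factors of the rated items. The argument names (`mu`, `b_user`, `b_item`, `u_vec`, `m_vec`, `w_rated`) are hypothetical, and the mapping to NetMiner's U, M and W matrices and bias arrays is an assumption.

```python
import math

def predict_svdpp(mu, b_user, b_item, u_vec, m_vec, w_rated):
    """Predict one user-item preference.

    mu      : global mean of the known preferences
    b_user  : bias of the user, b_item: bias of the item
    u_vec   : latent factors of the user
    m_vec   : latent factors of the item
    w_rated : implicit factor vectors of the items the user has rated
    """
    k = len(u_vec)
    implicit = [0.0] * k
    if w_rated:
        # Normalize the summed implicit factors by sqrt(number of rated items).
        norm = 1.0 / math.sqrt(len(w_rated))
        implicit = [norm * sum(w[f] for w in w_rated) for f in range(k)]
    interaction = sum(m * (u + y) for m, u, y in zip(m_vec, u_vec, implicit))
    return mu + b_user + b_item + interaction

# Made-up numbers: a user who has rated four items.
score = predict_svdpp(mu=3.0, b_user=0.5, b_item=-0.2,
                      u_vec=[1.0, 0.0], m_vec=[0.5, 0.5],
                      w_rated=[[0.2, 0.2]] * 4)
```

With an empty `w_rated` list the function reduces to a biased matrix factorization prediction, which shows that the implicit term is the part SVD++ adds.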

 User Options
 Input
2-mode Network: Select a 2-mode network. A user can
only choose one 2-mode network. The link weight of a 2-

mode network must be scaled to a value between 1 and 5.

 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data into

a test set and a training set by random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set among a
whole set of data.

 Random Seed: If a user fixes this parameter with a certain integer, the nodes of the test set are always the same.

 Simple Random: Randomly choose the nodes of a test

set from a main nodeset without any condition.


 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional
statement. For example, a user can define a custom test set with this option. First, the user adds a new
attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes having a “Test” entry in the “Allocation” column are allocated to the test set by querying as
shown on the right:

 Main process
# of Features (rank): Dimensionality of a latent factor space

# of Items to Recommend: The number of items shown in a


recommendation table for each user.

Option:


 Min iteration: Minimum number of times to repeat the learning (optimization) process

 Max iteration: Maximum number of times to repeat the learning (optimization) process

 Overfitting regularization parameter: The regularization constant in (3) that prevents the overfitting of model U, M.

 Learning rate: Controls the step size when matrices U and M are iteratively adjusted.

 W matrix regularization parameter: The regularization constant in (3) that prevents the overfitting of model W.

 W matrix learning rate: Controls the step size when matrix W is iteratively adjusted.

 Bias array regularization parameter: The regularization constant in (3) that prevents the overfitting of the bias model.

 Bias array learning rate: Controls the step size when vectors ub and ib are iteratively adjusted.

 Convergence Tolerance: If | (training error at iteration i) – (training error at iteration i + 1) | < (Convergence Tolerance), the algorithm stops.

 Random Seed: If a user fixes a random seed, one can get the same validation set for the same

dataset.

 Post-process
Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘SVD++’ analysis,

‘Main Report’, ‘Recommendation Table’, ‘Predicted Value Table’, ‘U Matrix Table’, ‘M Matrix

Table’, ‘Answer List Table’, ‘Answer Matrix Table’, ‘C Bias’, ‘M Bias’ and ‘Learning Curve Chart’

are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report: Presents information on the process, the epoch (the number of iterations performed), the training RMSE and the test RMSE.

 Tables


Predicted Value: Values predicted by models and the difference between predicted values and
original values are shown in this table.

Recommendation Table: A node by item recommendation rank matrix, which shows which item is
recommended with kth rank.

U Matrix: A user by latent factor matrix

M Matrix: An item by latent factor matrix

C Bias: Shows C (User) biases learnt as a part of the model

M Bias: Shows M (Item) biases learnt as a part of the model

Answer List and Answer Matrix: Show the prediction for every pair (i.e. user and item).

(Answer List)


(Answer Matrix)

 Charts
Learning Curve: A graphical representation of the decrease in RMSE (vertical axis) with epoch
(horizontal axis)


Mining >> Collaborative Filtering >> Social Singular Value Decomposition++ (SSVD++)

 Menu
Mining >> Collaborative Filtering >> Social SVD++

 Description
Social Singular Value Decomposition++ ('SSVD++') increases the prediction ability of SVD++ using a user’s social relations. Given the preference matrix Y, SSVD++ finds the model U and V that minimizes the following objective function:

$\arg\min_{U,V} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - U_i V_j^{T} \right)^2 + \sum_{i=1}^{m} \sum_{i'=1}^{m} \left( S_{ii'} - U_i U_{i'}^{T} \right)^2 + \sum_{i=1}^{m} \left\lVert U_i - \frac{1}{|friend(i)|} \sum_{i' \in friend(i)} U_{i'} \right\rVert^2 + \sum_{i=1}^{m} \left\lVert U_i - \frac{1}{|items(i)|} \sum_{i' \in items(i)} V_{i'} \right\rVert^2$

where | friend(i) | means the number of user i’s friends and | items(j) | means the number of users who gave feedback for item j.

 User Options

 Input
1-mode Network: Select a 1-mode network that has users’
social relations.

2-mode Network: Select a 2-mode network. A user can only


choose one 2-mode network. The link weight of a 2-mode

network must be scaled to a value between 1 and 5.

 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.


 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data

into a test set and a training set by random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified

condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set
among a whole set of data.

 Random Seed: If a user fixes this parameter with a certain integer, the nodes of the test set are always the same.

 Simple Random: Randomly choose the nodes of a

test set from a main nodeset without any

condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of

labels for the selected nodes in the test set is similar to that of whole data set. For example, if

the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set

should follow the same distribution by having 30% red labels, 30% blue labels and 40%

green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional
statement. For example, a user can define a custom test set with this option. First, the user adds a new
attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes having a “Test” entry in the “Allocation” column are allocated to the test set by querying as
shown on the right:

 Main process
# of Features (rank): Dimensionality of a latent factor space

# of Items to Recommend: The number of items shown in a


recommendation table for each user.

Option:
 Min iteration: Minimum number of times to repeat

the learning (optimization) process

 Max iteration: Maximum number of times to repeat the learning (optimization) process

 Overfitting regularization parameter: A regularization constant to prevent the overfitting of

model U, M.

 Learning rate: Controls the step size when matrices U and M are iteratively adjusted.

 Learning rate discount rate: Discount factor for a learning rate.

 Regularization parameter for adjacency: A regularization constant to prevent the overfitting

for adjacency.

 Convergence Tolerance: If | (training error at iteration i) – (training error at iteration i + 1) | < (Convergence Tolerance), the algorithm stops.

 Random Seed: If a user fixes a random seed, one can get the same validation set for the same

dataset.

 Post-process
Measuring Training Accuracy: Turn this on to perform the hypothesis testing for the accuracy of both the training set and the test set.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘Social SVD++’ analysis, ‘Main Report’,

‘Recommendation Table’, ‘Predicted Value Table’, ‘U Matrix

Table’, ‘M Matrix Table’, ‘Answer List Table’, ‘Answer Matrix

Table’, ‘C Bias’, ‘M Bias’ and ‘Learning Curve Chart’ are

reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report: Presents information on the process, the epoch (the number of iterations performed), the training RMSE and the test RMSE.

 Tables
Predicted Value: Values predicted by models and the difference between predicted values and
original values are shown in this table.

Recommendation Table: A node by item recommendation rank matrix, which shows which item is
recommended with kth rank.

U Matrix: A user by latent factor matrix

M Matrix: An item by latent factor matrix

C Bias: Shows C (User) biases learnt as a part of the model

M Bias: Shows M (Item) biases learnt as a part of the model

Answer List and Answer Matrix: Show the prediction for every pair (i.e. user and item).

(Answer List)


(Answer Matrix)

 Charts
Learning Curve: A graphical representation of the decrease in RMSE (vertical axis) with epoch
(horizontal axis)


Mining >> Collaborative Filtering >> Implicit Singular Value Decomposition (ISVD)

 Menu
Mining >> Collaborative Filtering >> ISVD

 Description
In NetMiner, collaborative filtering (‘CF’) algorithms such as SVD and SVD++ take a user’s explicit feedback as an input, in the form of a scaled rating (e.g. 1 to 5) that represents the user’s interest in items. However, these algorithms cannot be used for implicit feedback datasets such as a purchase history or a browsing history, because the numerical values of implicit feedback indicate a confidence, not a preference. Since users tend to be reluctant to rate products, CF for implicit feedback data is suitable for many practical solutions. For these reasons, Yifan Hu, a researcher at AT&T, proposed a collaborative filtering algorithm for implicit feedback datasets [1]. Hu’s algorithm is called Implicit Singular Value Decomposition (ISVD) because it is a matrix decomposition approach for CF. Unlike SVD and SVD++, which learn a model with stochastic gradient descent, the learning process of ISVD is similar to an alternating least squares method. The objective function of ISVD is:

(1)

where the error term denotes an error between a binary matrix p and an estimation of p by the model. This error term is then weighted by a confidence (i.e. a user u’s implicit feedback value for an item i). Therefore, ISVD penalizes the error of a high-confidence user-item pair more than that of a low-confidence user-item pair.
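
The confidence weighting can be illustrated with a short sketch that evaluates such an objective for given factor matrices. It follows the formulation published by Hu and co-authors, in which the binary preference is 1 when any feedback exists and the confidence is 1 + alpha × (feedback value). Whether NetMiner uses exactly this form is an assumption, and the function name `isvd_loss` and its default values are hypothetical.

```python
def isvd_loss(R, U, M, alpha=1.0, reg=0.0):
    """Confidence-weighted squared error plus an L2 penalty.

    R is a dense user-by-item list of implicit feedback values,
    U a user-by-factor matrix and M an item-by-factor matrix.
    """
    loss = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            p = 1.0 if r > 0 else 0.0      # binary preference
            c = 1.0 + alpha * r            # confidence grows with the feedback
            estimate = sum(a * b for a, b in zip(U[u], M[i]))
            loss += c * (p - estimate) ** 2
    penalty = (sum(x * x for row in U for x in row)
               + sum(x * x for row in M for x in row))
    return loss + reg * penalty

# One user, two items: the first item was used twice, the second never.
R = [[2, 0]]
loss = isvd_loss(R, U=[[1.0]], M=[[0.5], [0.5]], alpha=1.0)
```

In this example both estimates are off by 0.5, but the error on the first item is weighted by 3 and the error on the second by 1, so the first pair contributes three times as much to the loss.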

 User Options


 Input

2-mode Network: Select a 2-mode network. A user can


only choose one 2-mode network. The link weight of a 2-

mode network must be scaled to a value between 1 and 5.

 Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Data Allocation:
 All Training Set: Use whole data as a training set.

 Split Test Set (Random): Divide selected set of data

into a test set and a training set by random.

 Split Test Set (Condition): Divide selected set of data


into test and training sets (with the specified

condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of a test set among a
whole set of data.

 Random Seed: If a user fixes this parameter with a certain integer, the nodes of the test set are always the same.

 Simple Random: Randomly choose the nodes of a test

set from a main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes

for a test set so that the distribution of labels for the selected nodes in the test set is similar to

that of whole data set. For example, if the whole data set has 30% red labels, 30% blue labels


and 40% green labels, the test set should follow the same distribution by having 30% red

labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Choose the nodes of test set randomly so that the labels distribution of

chosen nodes is uniform.

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional
statement. For example, a user can define a custom test set with this option. First, the user adds a new
attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:

 The nodes having a “Test” entry in the “Allocation” column are allocated to the test set by querying as
shown on the right:

 Main process
# of Features (rank): Dimensionality of a latent factor space

# of Items to Recommend: The number of items shown in a


recommendation table for each user.

Option:
 Min iteration: Minimum number of times to repeat the

learning (optimization) process

 Max iteration: Maximum number of times to repeat the learning (optimization) process


 Overfitting regularization parameter: Regularization constant to prevent the overfitting of model U and M.

 Alpha:

 Random Seed: If a user fixes a random seed, one can get the same validation set for the same

dataset.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘ISVD’ analysis, ‘Main Report’,

‘Recommendation Table’, ‘U Matrix Table’, ‘M Matrix Table’,

‘Answer List Table’, ‘Answer Matrix Table’ and ‘Learning Curve

Chart’ are reported.

CAVEAT: By default, ‘Answer List Table’ and ‘Answer Matrix


Table’ are unselected. When dealing with a large dataset, there

may be a memory issue if a user wants to generate ‘Answer List

Table’ and ‘Answer Matrix Table’.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report
 Expected Percentile Ranking: Let $rank_{ui}$ denote the percentile ranking of the program i within the ordered list of every program prepared for a user u. With this measure, $rank_{ui} = 0\%$ indicates that the program i is predicted to be the most desirable for the user u, thereby preceding every other program in the list. On the other hand, $rank_{ui} = 100\%$ indicates that the program i is predicted to be the least desirable for the user u, hence placed at the end of the list. The basic quality measure is the ‘expected percentile ranking’ of a watching unit during the test phase, which is:

$$\overline{rank} = \frac{\sum_{u,i} r^{t}_{ui} \, rank_{ui}}{\sum_{u,i} r^{t}_{ui}}$$

where $r^{t}_{ui}$ is the value observed for the user u and the program i in the test set. An expected percentile ranking above 50% means that the ranks predicted by the algorithm are no better than a random ordering.

A ‘percentile ranking’ is used instead of an ‘absolute ranking’ so that the discussion remains general and independent of the number of programs.
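The measure can be sketched as follows. This is a minimal illustration, not NetMiner's implementation: it assumes that the first item of a user's ranked list has a percentile ranking of 0%, that the last item has 100%, and that each test observation is weighted by its observed value. The function name and the data layout are hypothetical.

```python
def expected_percentile_ranking(ranked_lists, test_values):
    """ranked_lists: {user: [item, ...]}, most desirable item first.
    test_values: {(user, item): value observed in the test set}."""
    weighted_sum = 0.0
    total = 0.0
    for (user, item), value in test_values.items():
        items = ranked_lists[user]
        position = items.index(item)
        # 0% for the first item of the list, 100% for the last one
        if len(items) > 1:
            percentile = 100.0 * position / (len(items) - 1)
        else:
            percentile = 0.0
        weighted_sum += value * percentile
        total += value
    return weighted_sum / total
```

Lower values indicate that the items observed in the test set were placed closer to the top of each user's list.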

 Precision @ Top # (of items to recommend):

 Tables
Recommendation Table: A node by item recommendation rank matrix, which shows the item recommended at the kth rank for each node.

U Matrix: A user by latent factor matrix


M Matrix: An item by latent factor matrix

Answer List and Answer Matrix: Show the prediction for every pair (i.e. user and item).

(Answer List)

(Answer Matrix)


 Charts
Learning Curve: A graphical representation of the decrease in RMSE (vertical axis) with epoch
(horizontal axis)


Mining >> Collaborative Filtering >> User Based

 Menu
Mining >> Collaborative Filtering >> User Based

 Description
The main task of collaborative filtering is to predict a user’s preference for items based on other users’ preferences for those items. In the context of network analysis, it can be understood as 2-mode scoring based on the nodal similarity of 2-mode links.

For a given 2-mode matrix, the similarity between every pair of nodes is computed from their links to the items in the other node set. This similarity is then used as a weight for estimating each node’s preference for each item.
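The similarity-weighted scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the prediction adds the similarity-weighted deviations of other users' votes to the target user's mean vote, which is the common formulation in the collaborative filtering literature and is not confirmed to be NetMiner's exact procedure. The names `mean_vote`, `cosine_similarity` and `predict` are hypothetical.

```python
import math

def mean_vote(votes):
    """Average of the votes a user has given."""
    return sum(votes.values()) / len(votes)

def cosine_similarity(a, b):
    """Vector similarity of two users' vote dictionaries."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(ratings, user, item):
    """ratings: {user: {item: vote}}. Returns the estimated vote of user for item."""
    base = mean_vote(ratings[user])
    numerator = 0.0
    denominator = 0.0
    for other, votes in ratings.items():
        if other == user or item not in votes:
            continue
        weight = cosine_similarity(ratings[user], votes)
        # Deviation of the neighbour's vote from the neighbour's own mean
        numerator += weight * (votes[item] - mean_vote(votes))
        denominator += abs(weight)
    return base if denominator == 0 else base + numerator / denominator
```

When no other user has voted for the item, the sketch falls back to the user's own mean vote.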

 User Options
 Input
2-mode Network: Select a 2-mode network. A user can only
choose one 2-mode network.

 Nodeset: Choose the sub nodeset that contains the 2-mode network to be analyzed.

 Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Data Allocation:
 All Training Set: Use the whole data set as a training set.

 Split Test Set (Random): Randomly divide the selected data into a test set and a training set.


 Split Test Set (Condition): Divide selected set of data into test and training sets (with the
specified condition).

Parameters in Split Test Set (Random):


 Proportion (%): The proportion of the whole data set that is allocated to the test set.

 Random Seed: If a user fixes this parameter with a

certain integer, the nodes of a validation set are

always the same.

 Simple Random: Randomly choose the nodes of a

test set from a main nodeset without any condition.

 Stratified (proportional): Randomly select the nodes for a test set so that the distribution of labels among the selected nodes is similar to that of the whole data set. For example, if the whole data set has 30% red labels, 30% blue labels and 40% green labels, the test set should follow the same distribution, with 30% red labels, 30% blue labels and 40% green labels among the selected nodes.

 Stratified (equal): Randomly choose the nodes of a test set so that the label distribution of the chosen nodes is uniform.
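A proportional stratified split can be sketched as follows. This is a minimal illustration that assumes each node carries exactly one label and that the test size per label is obtained by rounding; the function name and parameters are hypothetical and do not correspond to NetMiner options beyond ‘Proportion’ and ‘Random Seed’.

```python
import random
from collections import defaultdict

def stratified_proportional_split(labels, proportion, seed=None):
    """labels: {node: label}; proportion: fraction of nodes for the test set.
    Returns the sorted list of nodes chosen for the test set."""
    rng = random.Random(seed)  # a fixed seed reproduces the same test set
    by_label = defaultdict(list)
    for node in sorted(labels):
        by_label[labels[node]].append(node)
    test_nodes = []
    for label in sorted(by_label):
        nodes = by_label[label]
        # Sample the same fraction from every label group
        size = round(len(nodes) * proportion)
        test_nodes.extend(rng.sample(nodes, size))
    return sorted(test_nodes)
```

With 30 red, 30 blue and 40 green nodes and a proportion of 0.2, the sketch selects 6 red, 6 blue and 8 green nodes.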

Parameters in Split Test Set (Condition): Choose the nodes of a test set with a conditional statement. For example, a user can build a custom test set as follows. First, add a new attribute column (“Allocation” in the figure) that indicates the phase in which each node is used:


 The nodes having a “Test” entry for the “Allocation”

column are allocated to a test set by querying as shown

on the right:

 Main process
# of Items to Recommend: The number of items shown in a recommendation table for each user.

Collaborative Filtering (Method): The measure used to compute the similarity between two users (i.e. nodes).

 Correlation: Determines the similarity between two

nodes by using the correlation of two rows

corresponding to two nodes.

 Vector Similarity: Determines the similarity between

two nodes by using the vector similarity of two rows

corresponding to two nodes.
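The two measures can be sketched for a pair of rows of the 2-mode matrix as follows. This is an illustration only: it assumes that the correlation is the Pearson correlation and that the vector similarity is the cosine of the angle between the rows, both computed over complete rows. Whether NetMiner restricts the computation to items linked to both nodes is not stated here.

```python
import math

def correlation(row_a, row_b):
    """Pearson correlation of two equally long rows."""
    n = len(row_a)
    mean_a = sum(row_a) / n
    mean_b = sum(row_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(row_a, row_b))
    var_a = sum((x - mean_a) ** 2 for x in row_a)
    var_b = sum((y - mean_b) ** 2 for y in row_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant row has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def vector_similarity(row_a, row_b):
    """Cosine of the angle between two rows."""
    dot = sum(x * y for x, y in zip(row_a, row_b))
    norm_a = math.sqrt(sum(x * x for x in row_a))
    norm_b = math.sqrt(sum(y * y for y in row_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The correlation removes each row's mean before comparing, so it ranges from -1 to 1, while the vector similarity of non-negative rows ranges from 0 to 1.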

Collaborative Filtering (Option):


 Basic: Basic straightforward algorithm.

 Case Amplification: Amplifies the value of a vote.

 Default Voting: Sets a default vote value to every item for every user (to be used when a

dataset is too sparse)

 Inverse User Frequency: If a specific item is voted on by very many users, the item is given a lower weight.

Option:
 Collaborative Filtering Parameters: 'Case Amplification Rho', 'Default Voting k' and

'Default Voting d'

 Random Seed: If a user fixes a random seed, one can get the same validation set for the same

dataset.


 Post-process
Measuring Training Accuracy: Turn this option on to perform hypothesis testing for the accuracy of both the training set and the test set.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘User Based’ analysis, ‘Main Report’,

‘Recommendation Table’, ‘Predicted Value Table’, ‘Similarity

Matrix’, ‘Answer List Table’ and ‘Answer Matrix Table’ are

reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report: Presents process information, the epoch (i.e. the number of iterations performed), the training RMSE and the test RMSE.

 Tables
Recommendation Table: A node by item recommendation rank matrix, which shows the item recommended at the kth rank for each node.


Predicted Value: Displays the values predicted by the model and their differences from the original values.

Similarity Matrix: Shows similarities between nodes.

Answer List and Answer Matrix: Show the prediction for every pair (i.e. user and item).


(Answer List)

(Answer Matrix)

 Time Complexity
 O(n^3 + c^3), where c is the number of categories.

 References
 Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl. (1994). GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186. New York. ACM.


Mining >> Reduction >> Non-Negative Matrix Factorization (NNMF)

 Menu
Mining >> Reduction >> NNMF

 Description
The Non-Negative Matrix Factorization (‘NNMF’) algorithm factors a non-negative n by m matrix A into non-negative factors W (n by k) and H (k by m).

The factors W and H are optimized to minimize the root mean square residual D between A and WH:

$$D = \frac{\lVert A - WH \rVert_F}{\sqrt{n m}}$$

The optimization process uses an iterative method that starts with random initial values for W and H. Because the optimization may converge to a local minimum, repeated factorizations may yield different W and H. More details can be found in [1].
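The iterative optimization can be illustrated with the multiplicative update rule. This is a minimal sketch, not NetMiner's code: it assumes that the objective is the root mean square residual between A and WH (the quantity listed in the Main Report), uses plain Python lists for small matrices, and adds a small constant `eps` to avoid division by zero. All function and parameter names are hypothetical.

```python
import math
import random

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def rms_residual(A, W, H):
    """Root mean square residual between A and W*H."""
    WH = matmul(W, H)
    n, m = len(A), len(A[0])
    squared = sum((A[i][j] - WH[i][j]) ** 2 for i in range(n) for j in range(m))
    return math.sqrt(squared / (n * m))

def nnmf_mult(A, k, max_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative n by m matrix A into W (n by k) and H (k by m)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    # Random non-negative starting values for W and H
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(max_iter):
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        # H <- H * (W'A) / (W'WH), element-wise
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        # W <- W * (AH') / (WHH'), element-wise
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H
```

Because every update multiplies non-negative entries by non-negative ratios, W and H remain non-negative throughout. Running the sketch with different seeds corresponds to the idea behind the ‘Replicate’ option: the factorization with the smallest residual would be kept.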

 User Options

 Input
1-mode Network: Select a 1-mode network. A user can only
choose one 1-mode network.


2-mode Network: Select a 2-mode network. A user can only choose one 2-mode network, and the link weights of the 2-mode network must be non-negative.

 Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source node and target node pair, a user should decide how to merge them into a single link.

 Main process
# of Features (rank): The rank k of the approximation, i.e. the number of columns of W and the number of rows of H.

Replicate: The number of times to repeat the factorization,


using new random starting values for W and H.

Algorithm:

 mult: uses a multiplicative update algorithm.

 als: uses an alternating least-squares algorithm.

Option:

 Tolerance for parameters: Termination tolerance


for the factors W and H (Positive Scalar).

 Tolerance for function value: Termination tolerance


for the objective function value (Positive Scalar)

 Max iteration: Maximum number of iterations


allowed (Positive Integer).

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘NNMF’ analysis, ‘Main Report’, ‘W

Matrix Table’, ‘H Matrix Table’ and ‘WH Matrix Table’ are

reported.


 Outputs

An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

 Root mean square residual:

CAVEAT: The ‘YahooStock’ dataset is used to illustrate what the following outputs look like.

 Tables
W Matrix: Left factor W.

H Matrix: Right factor H.


WH Matrix: Lower-rank approximation to A.

 Time Complexity
 O(m * n * k), where n by m is the size of the matrix A and k is the rank.

 References


 [1] Berry, M. W., et al. "Algorithms and Applications for Approximate Nonnegative Matrix Factorization." Computational Statistics and Data Analysis. Vol. 52, No. 1, 2007, pp. 155–173.


Mining >> Clustering (Common)

 Menu
Mining >> Clustering

 Description
This document contains explanations that are common to every clustering algorithm implemented in

NetMiner 4.

 User Options

 Input
Partition Vector for Evaluation: The partition vector
produced as the result of running this module can be evaluated

using exemplary partition vectors. If this partition vector is to

be saved as a node attribute, check the box. Upon checking

the box, a contingency table and other performance indices

such as ARI, Homogeneity, Completeness and V-measure will be provided.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Mining >>

Clustering’ analysis, ‘Main Report’, ‘Partition Vector Table’, ‘Contingency Table’, ‘Silhouette

Coefficient Table’ and ‘MDS Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

 Clustering Summary:


o # of instances: The number of instances

o # of clusters: The number of clusters

o Average Silhouette Coefficient: The average of each node’s Silhouette coefficient.

The index ranges from -1 to 1 and increases as clusters become denser and well

separated. This index is provided only if a user selects Silhouette Coefficient

table report.

o ARI (Adjusted Rand Index): ARI measures the similarity between an exemplary

partition vector and the partition vector generated as an output, by ignoring

permutations. An ARI with the value 1 means that two vectors are exactly the

same. If the value is close to 0 or negative, it means that the vectors are different

or slightly similar. This index is provided only if a user selects ‘Partition Vector

for Evaluation’.

o Homogeneity: If each partition produced as an output only contains the members

of a single exemplary partition, the value would be close to 1. However, if each

resulting partition contains the members of different exemplary partitions, the

value of this index would be close to 0. This index is provided only if a user

selects ‘Partition Vector for Evaluation’.

o Completeness: If every member of each exemplary partition was assigned to the

same partition produced as an output, its value would be close to 1. However, if

the members of each exemplary partition were assigned to different resulting

partitions, its value would be close to 0.

o V-measure: The harmonic mean of ‘Homogeneity’ and ‘Completeness’. The

index is provided only if a user selects ‘Partition Vector for Evaluation’.

 # of instances per cluster: The number of instances assigned to each partition (i.e. cluster).


 Attribute Distribution: ‘Mean’, ‘standard deviation’, ‘minimum’ and ‘maximum’ of


attribute values of nodes assigned to each cluster.
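The external indices listed in the Clustering Summary can be sketched as follows. This is an illustration under stated assumptions: ARI is computed by pair counting, and Homogeneity, Completeness and V-measure follow the conditional-entropy definitions of the reference cited for these indices. NetMiner's own implementation may differ in edge cases, and the function names are hypothetical.

```python
import math
from collections import Counter

def adjusted_rand_index(exemplary, produced):
    """Pair-counting ARI of two partition vectors of equal length."""
    n = len(exemplary)
    sum_joint = sum(math.comb(c, 2) for c in Counter(zip(exemplary, produced)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(exemplary).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(produced).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0
    return (sum_joint - expected) / (maximum - expected)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, given):
    """Entropy of labels that remains once given is known."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())

def v_measure(exemplary, produced):
    """Returns (homogeneity, completeness, v_measure)."""
    h_exemplary = entropy(exemplary)
    h_produced = entropy(produced)
    homogeneity = 1.0 if h_exemplary == 0 else 1.0 - conditional_entropy(exemplary, produced) / h_exemplary
    completeness = 1.0 if h_produced == 0 else 1.0 - conditional_entropy(produced, exemplary) / h_produced
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    # Harmonic mean of homogeneity and completeness
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v
```

For example, placing every node in its own cluster yields a homogeneity of 1 but a lower completeness, so the V-measure penalizes this degenerate result.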

 Tables
Partition Vector: Shows the cluster to which each node belongs.

Contingency Table: The (i, j)th entry shows the number of instances that are assigned to the ith cluster in an exemplary partition and to the jth cluster in the partition generated as an output.


Silhouette Coefficient: Shows a Silhouette coefficient for each node, which ranges from -1 to 1. The
higher the Silhouette coefficient for a node, the more appropriately clustered the node is.
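The Silhouette coefficient of a node can be sketched as follows, based on the definition in the reference cited for this index: a is the mean distance to the other members of the node's own cluster, b is the smallest mean distance to the members of any other cluster, and the coefficient is (b - a) / max(a, b). The sketch assumes Euclidean distance between attribute vectors and assigns 0 to nodes in single-member clusters; the function name is hypothetical.

```python
import math

def silhouette_coefficients(points, labels):
    """points: list of coordinate tuples; labels: cluster of each point."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if own is None or not by_cluster:
            # Single-member cluster, or only one cluster in total
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return scores
```

The ‘Average Silhouette Coefficient’ of the Main Report corresponds to the mean of the returned values.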

 Maps
MDS: By default, the color of each node is determined by the cluster to which the node is assigned.


 Inspect
Cluster: Upon selecting a cluster in the combo box, the style
of nodes belonging to the cluster will be changed to the style

pre-established in the global option. The corresponding global

option is as follows:

 Nodes of the selected cluster: Node >> Subset


Membership >> Subset Member Node(s)

 Nodes of the non-selected cluster: Node >> Subset Membership >> Subset Non-member
Node(s)


Random shift: If many nodes overlap so much that the map becomes incomprehensible, randomly shifting the nodes can help a user to understand the map. A user can adjust the maximum distance that nodes can move from their original positions using the slider.

 References

 Silhouette coefficient: Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53–65. doi:10.1016/0377-0427(87)90125-7.

 Homogeneity, Completeness, V-measure: Andrew Rosenberg and Julia Hirschberg (2007). “V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure”.


Mining >> Clustering >> Hierarchical >> Matrix

 Menu
Mining >> Clustering >> Hierarchical >> Matrix

 Description
This hierarchical clustering algorithm performs an agglomerative hierarchical clustering of nodes
for a given 1-mode proximity (i.e. similarity or dissimilarity) matrix. To find the best fusion level for

given data, Mojena’s best-cut 2 algorithm is performed. There are four cluster options:

 Single: Determines the distance between two clusters by calculating the distance between the two closest nodes from different clusters (i.e. one node from one cluster and another node from the other cluster).

 Complete: Determines the distance between two clusters by calculating the distance between the two furthest nodes from different clusters.

 Average: Determines the distance between two clusters by calculating the average
distance between all pairs of nodes from two different clusters.

 Ward: This method is somewhat different from the previous three methods. Each cluster's homogeneity is appraised by the error sum of squares (ESS), i.e. the sum of squared deviations of the distances between each node in the given cluster and each node in the network from the mean distance between the nodes in the cluster and the nodes in the network. In other words, if all nodes in the given cluster have the same distance to every node in the network, the ESS of the given cluster is equal to 0 because all nodes in the cluster are homogeneous. Users need to be careful when this method is used. The criterion for fusion is that a merge should produce the smallest possible increase in the ESS. In addition, Ward's method tends to make the sizes of clusters similar.
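The first three cluster options can be sketched on a small dissimilarity matrix as follows. This is a simplified illustration that omits the ‘Ward’ option and the best-cut computation; the function name and the return format are hypothetical.

```python
def agglomerate(dist, linkage="single"):
    """dist: symmetric dissimilarity matrix (list of rows).
    Returns a list of (merged cluster, fusion level) pairs, one per step."""
    combine = {
        "single": min,      # distance of the two closest nodes
        "complete": max,    # distance of the two furthest nodes
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                values = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                level = combine(values)
                if best is None or level < best[0]:
                    best = (level, a, b)
        level, a, b = best
        # Merge the two clusters with the minimum distance
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        merges.append((sorted(clusters[a]), level))
    return merges
```

Each step reduces the number of clusters by one, and the recorded level corresponds to the ‘Fusion Level’ reported in the Cluster Matrix table.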

 User Options

 Input


1-mode Network: Select a 1-mode network. A user can only choose


one 1-mode network.

 Link Merge: When the selected data contains multiple links, i.e. two or more links connecting the same source node and target node pair, a user should decide how to merge them into a single link.

 Pre-process
Symmetrize: A user must symmetrize data before running this
module. In other words, directed / asymmetric data must be

transformed to undirected / symmetric data.

 Main process

Proximity (Dissimilarity or Similarity): Select whether the designated 1-mode network is to be interpreted as similarity or dissimilarity data.

Cluster Option: Select a cluster algorithm among ‘Single’,


‘Complete’, ‘Average’ and ‘Ward’.

 Output
A user can select in which format(s) the outputs are to be reported. As

the result of ‘Hierarchical Clustering (Matrix)’ analysis, ‘Main

Report’, ‘Cluster Matrix Table’, ‘Permutation Vector Table’ and

‘Dendrogram Chart’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output

window.

 Reports


Main Report

 Cluster Diagram: Columns represent the index of nodes while rows represent the level of
association (i.e. similarity or dissimilarity) among nodes within clusters. For each level, an

‘X’ indicates that nodes associated with the columns are assigned to the same cluster.

 Tables
Cluster Matrix: ‘M(i, k) = c’ means that a node i is a member of a cluster c at the aggregation level k (initially, c is the index of each node).

 # of Clusters: The number of


clusters in each step

 Fusion Level: The minimum distance between two clusters in each step. In this step, two
clusters with the minimum distance are merged.

 Modularity: The modularity value is proportional to the quality of a cluster (e.g. large value
suggests that nodes are better clustered than how they are clustered with a lower value).

There are normally four levels of best-cut score:

o If score < 1.25, it is bad.

o If 1.25 ≤ score < 2.75, it is normal.

o If 2.75 ≤ score < 3.5, it is good.

o If 3.5 ≤ score, it is excellent.

Permutation Vector: The order of a node in a leaf level of a


dendrogram.


 Charts
Dendrogram: As the fusion level increases, this chart shows how the number of clusters decreases
(i.e. shows the progress of how each node is being clustered).

 Time Complexity
 O(n^3)

 References
 Ward, Jr., J. H. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association, 58 (1963), 236-244.

 Related Topics
 Mining >> Clustering >> Hierarchical >> Vector


Mining >> Clustering >> Hierarchical >> Vector

 Menu
Mining >> Clustering >> Hierarchical >> Vector

 Description
The hierarchical clustering algorithm performs an agglomerative hierarchical clustering of nodes
for given attributes and distance / similarity measure. To find the best fusion level for the given data,

Mojena’s best-cut 2 algorithm is performed. There are four cluster options:

 Single: Determines the distance between two clusters by calculating the distance between the two closest nodes from different clusters (i.e. one node from one cluster and another node from the other cluster).

 Complete: Determines the distance between two clusters by calculating the distance between the two furthest nodes from different clusters.

 Average: Determines the distance between two clusters by calculating the average
distance between all pairs of nodes from two different clusters.

 Ward: This method is somewhat different from the previous three methods. Each cluster's homogeneity is appraised by the error sum of squares (ESS), i.e. the sum of squared deviations of the distances between each node in the given cluster and each node in the network from the mean distance between the nodes in the cluster and the nodes in the network. In other words, if all nodes in the given cluster have the same distance to every node in the network, the ESS of the given cluster is equal to 0 because all nodes in the cluster are homogeneous. Users need to be careful when this method is used. The criterion for fusion is that a merge should produce the smallest possible increase in the ESS. In addition, Ward's method tends to make the sizes of clusters similar.

 User Options

 Input


Node Attribute: Select a numerical attribute(s), which will be used to calculate distances or
similarities.

 Main process
Proximity Measures: Select a similarity or distance measure
among ‘Euclidean distance’, ‘Manhattan distance’ and ‘Exact

Match’, which will be used to calculate the distance between nodes.
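Two of the listed measures can be sketched for numerical attribute vectors as follows. This is a minimal illustration with hypothetical function names; ‘Exact Match’ is left out because its precise definition is not given in this section.

```python
def euclidean(a, b):
    """Straight-line distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """Sum of absolute attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(vectors, measure):
    """Node by node matrix of pairwise distances."""
    return [[measure(a, b) for b in vectors] for a in vectors]
```

The resulting matrix is symmetric with a zero diagonal, so it can serve as the dissimilarity input of an agglomerative clustering procedure.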

Normalize: Select ‘Yes’ if a user wants to normalize an attribute(s).

Cluster Option: Select a cluster algorithm among ‘Single’,


‘Complete’, ‘Average’ and ‘Ward’.

 Output
A user can select in which format(s) the outputs are to be reported. As

the result of ‘Hierarchical Clustering (Vector)’ analysis, ‘Main Report’,

‘Cluster Matrix Table’, ‘Permutation Vector Table’ and ‘Dendrogram

Chart’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an output

window.

 Reports
Main Report

 Cluster Diagram: Columns represent the index of nodes while rows represent the level of
association (i.e. similarity or dissimilarity) among nodes within clusters. For each level, an

‘X’ indicates that nodes associated with the columns are assigned to the same cluster.


 Tables
Cluster Matrix: ‘M(i, k) = c’ means that a node i is a member of a cluster c at the aggregation level k (initially, c is the index of each node).

 # of Clusters: The number of


clusters in each step

 Fusion Level: The minimum distance between two clusters in each step. In this step, two
clusters with the minimum distance are merged.

 Modularity: The modularity value is proportional to the quality of a cluster (e.g. large value
suggests that nodes are better clustered than how they are clustered with a lower value).

There are normally four levels of best-cut score:

o If score < 1.25, it is bad.

o If 1.25 ≤ score < 2.75, it is normal.

o If 2.75 ≤ score < 3.5, it is good.

o If 3.5 ≤ score, it is excellent.

Permutation Vector: The order of a node in a leaf level of a


dendrogram.

 Charts
Dendrogram: As the fusion level increases, this chart shows how the number of clusters decreases
(i.e. shows the progress of how each node is being clustered).


 Time Complexity
 O(n^3)

 References
 Ward, Jr., J. H. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the
American Statistical Association, 58 (1963), 236-244.

 Related Topics
 Mining >> Clustering >> Hierarchical >> Matrix


Mining >> Clustering >> K-means

 Menu
Mining >> Clustering >> K-means

 Description
The K-means algorithm is a clustering algorithm that assigns each node to its closest cluster. The distance between a node and a cluster is calculated as the squared distance between the node and the mean of the cluster, which is obtained by averaging the attributes (i.e. feature vectors) of the nodes that belong to the cluster. The algorithm proceeds by alternating between two steps: the ‘Assignment’ step and the ‘Update’ step. In the ‘Assignment’ step, each node is assigned to its closest cluster. In the ‘Update’ step, the algorithm re-calculates the mean of each cluster based on the nodes (i.e. members) assigned in the ‘Assignment’ step. These two steps are repeated until the assignments no longer change.
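The alternation of the two steps can be sketched as follows. This is a minimal illustration that takes the initial means as an argument and stops when the assignments no longer change; the function names and the `max_iter` safeguard are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_means, max_iter=100):
    """points: list of attribute tuples; initial_means: one tuple per cluster.
    Returns (assignment, means)."""
    means = list(initial_means)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each node goes to the cluster with the closest mean
        new_assignment = [
            min(range(len(means)), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no change in assignments
        assignment = new_assignment
        # Update step: re-calculate the mean of each cluster from its members
        for c in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, means
```

A cluster that loses all of its members keeps its previous mean in this sketch; other implementations may re-initialize such a cluster instead.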

 User Options

 Input
Node Attribute: Select a numerical attribute(s), which will be
used to calculate distances or similarities.

Partition Vector for Evaluation: The partition vector


produced as the result of running this module can be evaluated

using exemplary partition vectors. If this partition vector is to

be saved as a node attribute, check the box. Upon checking the

box, a contingency table and other performance indices such as

ARI, Homogeneity, Completeness and V-measure will be provided.

 Main process


Number of Clusters: Decides the number of clusters.

Replicate: The number of times to repeat 'k-means' algorithm.

Random Seed: The result of running k-means algorithm


depends on random values generated during initialization. If

the random seed is fixed, the algorithm generates the same

random values so that a user can have the same result every

time he or she runs the algorithm.

Initialization Method: Selects how to create an initial


assignment.

 Forgy: Randomly selects k nodes with uniform probability and uses each of these nodes as the initial ‘mean’ of a cluster. Once the clusters are initialized, each remaining node is assigned to its closest cluster.

 Random Partition: Nodes are assigned randomly to each cluster with uniform probability.

 K-means++: Selects k nodes sequentially. At each stage, nodes that are close to the already selected nodes are selected with lower probability. The feature vector (i.e. attribute values) of each selected node is used as the ‘mean’ of a cluster. The remaining nodes are assigned to the cluster closest to them.
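The ‘K-means++’ selection can be sketched as follows. The sketch assumes the commonly used weighting, in which a node is chosen with probability proportional to its squared distance from the nearest already selected node; whether NetMiner uses exactly this weighting is not stated here. It also assumes that the data contains at least k distinct points. The function name is hypothetical.

```python
import random

def k_means_pp_init(points, k, seed=None):
    """Returns k initial means chosen from points."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # the first node is chosen uniformly
    while len(centers) < k:
        # Squared distance from each point to its nearest selected center
        weights = [
            min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
            for p in points
        ]
        # Points close to a selected center receive a small weight
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

A point that coincides with an already selected center has weight 0 and cannot be selected again, which spreads the initial means across the data.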

Normalize: Select ‘Yes’ if a user wants to normalize an


attribute(s).

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘k-means’ analysis, ‘Main Report’,

‘Partition Vector Table’, ‘Contingency Table’, ‘Silhouette

Coefficient Table’ and ‘MDS Map’ are reported.

 Outputs


An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

 Distance to the Nearest Mean: Shows the ‘sum’, ‘average’ and ‘maximum’ of the distances from each node to the nearest cluster’s mean.

 Means of clusters: The mean of each cluster

 Clustering Summary:
o # of instances: The number of instances

o # of clusters: The number of clusters

o Average Silhouette Coefficient: The average of each node’s Silhouette coefficient.

The index ranges from -1 to 1 and increases as clusters become denser and well

separated. This index is provided only if a user selects Silhouette Coefficient

table report.

o ARI (Adjusted Rand Index): ARI measures the similarity between an exemplary

partition vector and the partition vector generated as an output, by ignoring

permutations. An ARI with the value 1 means that two vectors are exactly the

same. If the value is close to 0 or negative, it means that the vectors are different

or slightly similar. This index is provided only if a user selects ‘Partition Vector

for Evaluation’.

o Homogeneity: If each partition produced as an output only contains the members

of a single exemplary partition, the value would be close to 1. However, if each

resulting partition contains the members of different exemplary partitions, the


value of this index would be close to 0. This index is provided only if a user

selects ‘Partition Vector for Evaluation’.

o Completeness: If every member of each exemplary partition was assigned to the

same partition produced as an output, its value would be close to 1. However, if

the members of each exemplary partition were assigned to different resulting

partitions, its value would be close to 0.

o V-measure: The harmonic mean of ‘Homogeneity’ and ‘Completeness’. The

index is provided only if a user selects ‘Partition Vector for Evaluation’.

 # of instances per cluster: The number of instances assigned to each partition (i.e. cluster).

 Attribute Distribution: ‘Mean’, ‘standard deviation’, ‘minimum’ and ‘maximum’ of


attribute values of nodes assigned to each cluster.

 Tables
Partition Vector: Shows the cluster to which each node belongs.


Contingency Table: The (i, j)th entry shows the number of instances that are assigned to the ith cluster in an exemplary partition and to the jth cluster in the partition generated as an output.

Silhouette Coefficient: Shows a Silhouette coefficient for each node, which ranges from -1 to 1. The
higher the Silhouette coefficient for a node, the more appropriately clustered the node is.


 Maps
MDS: By default, the color of each node is determined by the cluster to which the node is assigned.

 Inspect
Cluster: Upon selecting a cluster in the combo box, the
style of nodes belonging to the cluster will be changed to

the style pre-established in the global option. The

corresponding global option is as follows:

 Nodes of the selected cluster: Node >> Subset Membership >> Subset Member Node(s)

 Nodes of the non-selected cluster: Node >> Subset Membership >> Subset Non-member
Node(s)


 Time Complexity
 O(n^(dk+1) * log n), where n is the number of nodes, d is the dimension and k is the number of clusters.

 References
 Hartigan, J. A.; Wong, M. A. (1979). "Algorithm AS 136: A K-Means Clustering
Algorithm". Journal of the Royal Statistical Society, Series C 28 (1): 100–108. JSTOR

2346830


Mining >> Clustering >> Gaussian Mixture Model (GMM)

 Menu
Mining >> Clustering >> GMM

 Description
The Gaussian Mixture Model (‘GMM’) is a clustering algorithm. Unlike the k-means clustering algorithm, which only provides the mean (i.e. centroid) of each cluster, GMM provides both the mean and the variance of each cluster. In the learning phase, GMM constructs a mixture of Gaussians by following these steps:

1. Initialize means (μ), variances (Σ) and prior (ϕ) parameters:

2. (E-step) Assign a label z to each training data point x based on the posterior computed from the means (μ), variances (Σ) and prior (ϕ).


3. Update means (μ), variances (Σ) and prior (ϕ) parameters of Gaussian models.

4. Re-assign a label to each training data in the same manner as step 2.


5. Repeat until ‘Convergence’ occurs.
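The steps above can be sketched for one-dimensional data as follows. This is a simplified illustration, not NetMiner's implementation: it handles a single attribute, takes the initial parameters as arguments, checks convergence on the change of the log-likelihood, and adds a small constant `reg` to each variance in the spirit of the ‘Regularization value’ option. All names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, priors, max_iter=100, tol=1e-8, reg=1e-6):
    """Returns (means, variances, priors, labels) for 1-D data."""
    means, variances, priors = list(means), list(variances), list(priors)
    k = len(means)
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point
        posteriors = []
        log_likelihood = 0.0
        for x in data:
            joint = [priors[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(joint)
            log_likelihood += math.log(total)
            posteriors.append([j / total for j in joint])
        # M-step: update means, variances and priors from the posteriors
        for c in range(k):
            weight = sum(p[c] for p in posteriors)
            means[c] = sum(p[c] * x for p, x in zip(posteriors, data)) / weight
            variances[c] = sum(p[c] * (x - means[c]) ** 2 for p, x in zip(posteriors, data)) / weight + reg
            priors[c] = weight / len(data)
        if abs(log_likelihood - previous) < tol:
            break  # convergence
        previous = log_likelihood
    # Each point is labelled with the component of largest posterior
    labels = [max(range(k), key=lambda c: p[c]) for p in posteriors]
    return means, variances, priors, labels
```

The posteriors computed in the E-step correspond to the values shown in the ‘Posterior Probability Table’.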

 User Options

 Input
Node Attribute: Select a numerical attribute(s), which will be
used to calculate distances or similarities.

Partition Vector for Evaluation: The partition vector produced


as the result of running this module can be evaluated using

exemplary partition vectors. If this partition vector is to be saved

as a node attribute, check the box. Upon checking the box, a contingency table and other

performance indices such as ARI, Homogeneity, Completeness and V-measure will be provided.


 Main process
Number of Clusters: Decides the number of clusters.

Replicate: The number of times to repeat the EM algorithm.


The Gaussian mixture model with the largest likelihood is

selected as a result.

Covariance Type: Select ‘Diagonal’ if the covariance matrices are assumed to be diagonal. Otherwise, select ‘Full’.

Share Covariance: Select ‘Yes’ if Gaussian models share


a common covariance.

Option:

 Max iteration: The maximum iteration for updating the parameters of Gaussian mixture
model.

 Regularization value: A non-negative regularization number added to the diagonal of


covariance matrices to avoid an ill-conditioned covariance matrix.

 Tolerance for function value: The tolerance value for


convergence check.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘GMM’ analysis, ‘Main Report’,

‘Partition Vector Table’, ‘Contingency Table’, ‘Silhouette

Coefficient Table’, ‘Posterior Probability Table’ and ‘MDS

Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.


 Reports
Main Report

 Clustering Summary:
o # of instances: The number of instances

o # of clusters: The number of clusters

o Average Silhouette Coefficient: The average of each node’s Silhouette coefficient.

The index ranges from -1 to 1 and increases as clusters become denser and well

separated. This index is provided only if a user selects Silhouette Coefficient

table report.

o ARI (Adjusted Rand Index): ARI measures the similarity between an exemplary

partition vector and the partition vector generated as an output, by ignoring

permutations. An ARI with the value 1 means that two vectors are exactly the

same. If the value is close to 0 or negative, it means that the vectors are different

or slightly similar. This index is provided only if a user selects ‘Partition Vector

for Evaluation’.

o Homogeneity: If each partition produced as an output only contains the members

of a single exemplary partition, the value would be close to 1. However, if each

resulting partition contains the members of different exemplary partitions, the

value of this index would be close to 0. This index is provided only if a user

selects ‘Partition Vector for Evaluation’.

o Completeness: If every member of each exemplary partition was assigned to the

same partition produced as an output, its value would be close to 1. However, if

the members of each exemplary partition were assigned to different resulting

partitions, its value would be close to 0.

o V-measure: The harmonic mean of ‘Homogeneity’ and ‘Completeness’. The

index is provided only if a user selects ‘Partition Vector for Evaluation’.


 # of instances per cluster: The number of instances assigned to each partition (i.e. cluster).

 Attribute Distribution: ‘Mean’, ‘standard deviation’, ‘minimum’ and ‘maximum’ of the
attribute values of nodes assigned to each cluster.
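
The Homogeneity, Completeness and V-measure indices of the Clustering Summary can be illustrated with a small sketch. The source does not give the formulas NetMiner uses, so the code below assumes the commonly used entropy-based definitions; the function names (`entropy`, `conditional_entropy`, `v_measure`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a partition vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from the joint counts of two partition vectors."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items())


def v_measure(exemplary, output):
    """Return (homogeneity, completeness, v_measure) under the assumed definitions."""
    h_ex, h_out = entropy(exemplary), entropy(output)
    # Homogeneity: each output cluster contains members of a single exemplary partition.
    homogeneity = 1.0 if h_ex == 0 else 1.0 - conditional_entropy(exemplary, output) / h_ex
    # Completeness: members of one exemplary partition end up in the same output cluster.
    completeness = 1.0 if h_out == 0 else 1.0 - conditional_entropy(output, exemplary) / h_out
    total = homogeneity + completeness
    # V-measure: harmonic mean of homogeneity and completeness.
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v


if __name__ == "__main__":
    print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition with permuted labels
    print(v_measure([0, 0, 1, 1], [0, 0, 0, 0]))  # everything merged into one cluster
```

Because only the grouping matters, relabeling the clusters does not change any of the three values.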

 Tables
Partition Vector: Shows the cluster to which each node belongs.

Contingency Table: The (i, j)th entry shows the number of instances that are assigned to the ith
cluster in the exemplary partition and to the jth cluster in the partition generated as an output.


Silhouette Coefficient: Shows a Silhouette coefficient for each node, which ranges from -1 to 1. The
higher the Silhouette coefficient for a node, the more appropriately clustered the node is.

Posterior Probability Table: Shows posterior probabilities of components.
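
The Silhouette coefficient reported per node can be sketched as follows. This is a minimal illustration that assumes the standard definition, (b - a) / max(a, b), where a is the mean distance to the other members of the node's own cluster and b is the smallest mean distance to any other cluster. The function name `silhouette_coefficients`, the distance-matrix input and the convention of scoring singleton clusters as 0 are assumptions, not NetMiner's documented behavior. At least two clusters are required.

```python
def silhouette_coefficients(dist, labels):
    """Return one Silhouette coefficient per node from a symmetric distance matrix."""
    clusters = {}
    for node, label in enumerate(labels):
        clusters.setdefault(label, []).append(node)

    scores = []
    for i, own_label in enumerate(labels):
        own = [j for j in clusters[own_label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for a singleton cluster
            continue
        # a: mean distance to the other members of the node's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for label, members in clusters.items()
            if label != own_label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


if __name__ == "__main__":
    points = [0.0, 1.0, 10.0, 11.0]
    dist = [[abs(p - q) for q in points] for p in points]
    per_node = silhouette_coefficients(dist, [0, 0, 1, 1])
    print(per_node, sum(per_node) / len(per_node))  # per-node values and their average
```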

 Maps

MDS: By default, nodes’ colors are determined according to the cluster to which each node is assigned.

 Inspect
Cluster: Upon selecting a cluster in the combo box, the style of nodes belonging to the cluster
will be changed to the style pre-established in the global option. The corresponding global
options are as follows:

 Nodes of the selected cluster: Node >> Subset Membership >> Subset Member Node(s)

 Nodes of the non-selected clusters: Node >> Subset Membership >> Subset Non-member Node(s)


 References
 Bishop, Christopher M. "Pattern recognition and machine learning (information science
and statistics)." (2007).

Mining >> Clustering >> Partitioning Around Medoids (PAM) >> Matrix

 Menu
Mining >> Clustering >> PAM >> Matrix

 Description
The k-medoids algorithm is a clustering algorithm that selects a collection of k nodes, called
medoids, so as to minimize the average distance from each node to its closest medoid. Nodes that
share the same closest medoid belong to the same cluster. The Partitioning Around Medoids (‘PAM’)
algorithm is the most common realization of the k-medoids algorithm and consists of two phases:
the ‘Build’ phase and the ‘Swap’ phase. In the ‘Build’ phase, an initial collection of k medoids
is selected in a greedy manner. In the ‘Swap’ phase, the quality of the clustering is improved by
exchanging medoids with other nodes.
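
The two phases can be sketched on a precomputed distance matrix. This is a simplified, unoptimized illustration rather than NetMiner's implementation; the function names `total_cost` and `pam`, and the reading of `max_swaps` as the maximum number of accepted swaps, are assumptions.

```python
from itertools import product


def total_cost(dist, medoids):
    """Sum of distances from every node to its nearest medoid (the PAM objective)."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k, max_swaps=100):
    n = len(dist)

    # Build phase: greedily add the node that lowers the objective the most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))

    # Swap phase: repeatedly apply the best medoid/non-medoid exchange, if any improves.
    cost = total_cost(dist, medoids)
    for _ in range(max_swaps):
        best_cost, best_trial = cost, None
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best_cost:
                best_cost, best_trial = trial_cost, trial
        if best_trial is None:
            break  # no exchange improves the objective
        medoids, cost = best_trial, best_cost

    # Each node joins the cluster of its closest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels, cost


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(p - q) for q in points] for p in points]
    print(pam(dist, k=2))
```

In this toy example the build phase picks a reasonable but not optimal pair of medoids, and a single swap moves one medoid to the center of its group.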

 User Options

 Input
1-mode Network: Select a similarity or dissimilarity network.

 Link Merge: When the selected data contains multiple links, that is, when two or more links
connect the same pair of source and target nodes, a user should decide how to merge them into
a single link.

Partition Vector for Evaluation: The partition vector produced as the result of running this module
can be evaluated using exemplary partition vectors. If this partition vector is to be saved as a node

attribute, check the box. Upon checking the box, a contingency table and other performance indices

such as ARI, Homogeneity, Completeness and V-measure will be provided.


 Pre-process
Symmetrize: A user must symmetrize data before running this module.
In other words, directed / asymmetric data must be transformed to

undirected / symmetric data.

 Main process
Number of Medoids: Decides the number of medoids to be selected, which is equal to the number
of clusters.

Max Number of Swaps: Decides the maximum number of times the swap phase is to be repeated.

Proximity: Decides whether the 1-mode Network selected as an input is to be interpreted as a
Similarity matrix or a Dissimilarity matrix. If ‘Similarity’ is selected, the distance between two
nodes is calculated as 1 / (1.0 + the weight of links between them). If ‘Dissimilarity’ is
selected, the distance is equal to the weight of links between them. The weight is assumed to be
0 when the link between two nodes does not exist. The distance to oneself is also assumed to be 0.
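
The Proximity rule can be written as a short conversion from a link-weight matrix to a distance matrix. The function name `to_distance_matrix` and the list-of-lists representation are assumptions made for illustration.

```python
def to_distance_matrix(weights, proximity="Similarity"):
    """Convert a symmetric link-weight matrix into a distance matrix.

    weights[i][j] is the (merged) link weight between nodes i and j,
    with 0 standing for a missing link.
    """
    n = len(weights)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the distance to oneself is 0
            w = weights[i][j]
            if proximity == "Similarity":
                dist[i][j] = 1.0 / (1.0 + w)  # larger similarity, smaller distance
            else:
                dist[i][j] = float(w)  # 'Dissimilarity': the weight is the distance
    return dist


if __name__ == "__main__":
    weights = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
    print(to_distance_matrix(weights, "Similarity"))
    print(to_distance_matrix(weights, "Dissimilarity"))
```

Note that under ‘Similarity’ a missing link yields the largest possible distance, 1, whereas under ‘Dissimilarity’ a missing link yields a distance of 0.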

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘PAM (Matrix)’ analysis, ‘Main Report’, ‘Partition

Vector Table’, ‘Contingency Table’, ‘Silhouette Coefficient Table’,

‘Medoids Table’ and ‘MDS Map’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report


 Distance to the Nearest Medoid: Shows the ‘sum’, ‘average’ and ‘maximum’ of the distances from
each node to its nearest medoid. The sum of distances is the objective value that the PAM
algorithm tries to minimize.

 Clustering Summary:
o # of instances: The number of instances

o # of clusters: The number of clusters

o Average Silhouette Coefficient: The average of each node’s Silhouette coefficient.

The index ranges from -1 to 1 and increases as clusters become denser and well

separated. This index is provided only if a user selects Silhouette Coefficient

table report.

o ARI (Adjusted Rand Index): ARI measures the similarity between an exemplary

partition vector and the partition vector generated as an output, by ignoring

permutations. An ARI with the value 1 means that two vectors are exactly the

same. If the value is close to 0 or negative, it means that the vectors are different

or slightly similar. This index is provided only if a user selects ‘Partition Vector

for Evaluation’.

o Homogeneity: If each partition produced as an output only contains the members

of a single exemplary partition, the value would be close to 1. However, if each

resulting partition contains the members of different exemplary partitions, the

value of this index would be close to 0. This index is provided only if a user

selects ‘Partition Vector for Evaluation’.

o Completeness: If every member of each exemplary partition was assigned to the

same partition produced as an output, its value would be close to 1. However, if

the members of each exemplary partition were assigned to different resulting

partitions, its value would be close to 0.

o V-measure: The harmonic mean of ‘Homogeneity’ and ‘Completeness’. The

index is provided only if a user selects ‘Partition Vector for Evaluation’.


 # of instances per cluster: The number of instances assigned to each partition (i.e. cluster).
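
The ARI entry of the Clustering Summary can be illustrated with the usual pair-counting formula. The source does not spell out the formula, so this sketch assumes the standard adjusted Rand index; the function name `adjusted_rand_index` is hypothetical, and at least two instances are required.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(exemplary, output):
    """Adjusted Rand index between two partition vectors of equal length (n >= 2)."""
    n = len(exemplary)
    # Pairs of instances that are grouped together in both partitions.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(exemplary, output)).values())
    # Pairs grouped together in each partition taken on its own.
    pairs_ex = sum(comb(c, 2) for c in Counter(exemplary).values())
    pairs_out = sum(comb(c, 2) for c in Counter(output).values())
    expected = pairs_ex * pairs_out / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_ex + pairs_out) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (pairs_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabeling
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # disagreeing partitions
```

Since only co-membership of pairs is counted, permuting the cluster labels leaves the index unchanged.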

 Tables
Partition Vector: Shows clusters to which each node belongs.

Contingency Table: The (i, j)th entry shows the number of instances that are assigned to the ith
cluster in the exemplary partition and to the jth cluster in the partition generated as an output.


Silhouette Coefficient: Shows a Silhouette coefficient for each node, which ranges from -1 to 1. The
higher the Silhouette coefficient for a node, the more appropriately clustered the node is.

Medoids: The medoid node of each cluster.

 Maps
MDS: By default, nodes’ colors are determined according to the cluster to which each node is assigned.

 Inspect
Cluster: Upon selecting a cluster in the combo box, the style of nodes belonging to the cluster
will be changed to the style pre-established in the global option. The corresponding global
options are as follows:

 Nodes of the selected cluster: Node >> Subset Membership >> Subset Member Node(s)

 Nodes of the non-selected clusters: Node >> Subset Membership >> Subset Non-member Node(s)

 Time Complexity
 O( k * (n – k)^2 ) where n is the number of nodes and k is the number of medoids.

 References
 Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical
Data Analysis Based on the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland,
405–416.

 Related Topics
 Mining >> Clustering >> PAM >> Vector

Mining >> Clustering >> Partitioning Around Medoids (PAM) >> Vector

 Menu
Mining >> Clustering >> PAM >> Vector

 Description
The k-medoids algorithm is a clustering algorithm that selects a collection of k nodes, called
medoids, so as to minimize the average distance from each node to its closest medoid. Nodes that
share the same closest medoid belong to the same cluster. The Partitioning Around Medoids (‘PAM’)
algorithm is the most common realization of the k-medoids algorithm and consists of two phases:
the ‘Build’ phase and the ‘Swap’ phase. In the ‘Build’ phase, an initial collection of k medoids
is selected in a greedy manner. In the ‘Swap’ phase, the quality of the clustering is improved by
exchanging medoids with other nodes.

 User Options

 Input
Node Attribute: Select a numerical attribute(s), which will be
used to calculate distances or similarities.

Partition Vector for Evaluation: The partition vector produced


as the result of running this module can be evaluated using

exemplary partition vectors. If this partition vector is to be saved

as a node attribute, check the box. Upon checking the box, a

contingency table and other performance indices such as ARI, Homogeneity, Completeness and V-

measure will be provided.

 Main process


Number of Medoids (Clusters): Decides the number of


medoids to be selected, which is equal to the number of

clusters.

Max Number of Swaps: Decides the maximum number of


times the swap phase is to be repeated.

Proximity Measures: Select a similarity or distance measure


among ‘Euclidean distance’, ‘Manhattan distance’ and ‘Exact

Match’, which will be used to calculate the distance between

nodes.

Normalize: Select ‘Yes’ if a user wants to normalize an attribute(s).

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘PAM (Vector)’ analysis, ‘Main

Report’, ‘Partition Vector Table’, ‘Contingency Table’,

‘Silhouette Coefficient Table’, ‘Medoids Table’ and ‘MDS Map’

are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report

 Distance to the Nearest Medoid: Shows the ‘sum’, ‘average’ and ‘maximum’ of the distances from
each node to its nearest medoid. The sum of distances is the objective value that the PAM
algorithm tries to minimize.

 Clustering Summary:
o # of instances: The number of instances

o # of clusters: The number of clusters

o Average Silhouette Coefficient: The average of each node’s Silhouette coefficient.

The index ranges from -1 to 1 and increases as clusters become denser and well

separated. This index is provided only if a user selects Silhouette Coefficient

table report.

o ARI (Adjusted Rand Index): ARI measures the similarity between an exemplary

partition vector and the partition vector generated as an output, by ignoring

permutations. An ARI with the value 1 means that two vectors are exactly the

same. If the value is close to 0 or negative, it means that the vectors are different

or slightly similar. This index is provided only if a user selects ‘Partition Vector

for Evaluation’.

o Homogeneity: If each partition produced as an output only contains the members

of a single exemplary partition, the value would be close to 1. However, if each

resulting partition contains the members of different exemplary partitions, the

value of this index would be close to 0. This index is provided only if a user

selects ‘Partition Vector for Evaluation’.

o Completeness: If every member of each exemplary partition was assigned to the

same partition produced as an output, its value would be close to 1. However, if

the members of each exemplary partition were assigned to different resulting

partitions, its value would be close to 0.

o V-measure: The harmonic mean of ‘Homogeneity’ and ‘Completeness’. The

index is provided only if a user selects ‘Partition Vector for Evaluation’.


 # of instances per cluster: The number of instances assigned to each partition (i.e. cluster).

 Tables
Partition Vector: Shows the cluster to which each node belongs.

Contingency Table: The (i, j)th entry shows the number of instances that are assigned to the ith
cluster in the exemplary partition and to the jth cluster in the partition generated as an output.

Silhouette Coefficient: Shows a Silhouette coefficient for each node, which ranges from -1 to 1. The
higher the Silhouette coefficient for a node, the more appropriately clustered the node is.


Medoids: The medoid node of each cluster.

 Maps
MDS: By default, nodes’ colors are determined according to the cluster to which each node is assigned.

 Inspect
Cluster: Upon selecting a cluster in the combo box, the style of nodes belonging to the cluster
will be changed to the style pre-established in the global option. The corresponding global
options are as follows:

 Nodes of the selected cluster: Node >> Subset Membership >> Subset Member Node(s)

 Nodes of the non-selected clusters: Node >> Subset Membership >> Subset Non-member Node(s)

 Time Complexity
 O( k * (n – k)^2 ) where n is the number of nodes and k is the number of medoids.

 References
 Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical
Data Analysis Based on the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland,
405–416.

 Related Topics
 Mining >> Clustering >> PAM >> Matrix

Mining >> Anomaly Detection >> Probability Distribution >> Independent Normal

 Menu
Mining >> Anomaly Detection >> Probability Distribution >> Independent Normal

 Description
This algorithm finds anomalous main nodes under the assumption that the values of each selected
attribute follow an independent normal distribution and that the attribute values of anomalies are
unlikely to follow this distribution. The distribution of each attribute is estimated from the
current attribute values by using the maximum likelihood estimation ('MLE') method. After
estimating the distributions, the probability density and the Mahalanobis distance of each node
with regard to each attribute are calculated. If the ith attribute follows $N(\mu_i, \sigma_i^2)$,
the Mahalanobis distance of the jth node regarding the ith attribute is defined as
$d_{ji} = |x_{ji} - \mu_i| / \sigma_i$, where $x_{ji}$ means the jth node’s ith attribute value.
The overall probability density of the jth node is defined as $p_j = \prod_i f_i(x_{ji})$, where
$f_i$ means the probability density function of $N(\mu_i, \sigma_i^2)$, and the overall
Mahalanobis distance of the jth node is defined as $D_j = \sqrt{\sum_i d_{ji}^2}$. In general,
anomalous nodes have a low probability density and a high Mahalanobis distance.
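
The estimation and scoring steps can be sketched in a few lines. This is an illustration only: the function names `fit_independent_normal` and `score` are hypothetical, the overall Mahalanobis distance is taken as the square root of the summed squared per-attribute distances, and every attribute is assumed to have a non-zero standard deviation.

```python
import math


def fit_independent_normal(rows):
    """MLE mean and standard deviation (dividing by n) of each attribute column."""
    n = len(rows)
    params = []
    for column in zip(*rows):
        mu = sum(column) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
        params.append((mu, sigma))
    return params


def score(row, params):
    """Return (overall probability density, overall Mahalanobis distance) of one node."""
    density, squared = 1.0, 0.0
    for x, (mu, sigma) in zip(row, params):
        z = (x - mu) / sigma  # signed per-attribute Mahalanobis distance
        density *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        squared += z * z
    return density, math.sqrt(squared)


if __name__ == "__main__":
    known_normal = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 13.0]]
    params = fit_independent_normal(known_normal)
    for node in ([2.0, 11.5], [9.0, 30.0]):
        print(node, score(node, params))
```

Fitting on a subset of rows and scoring all rows corresponds to the ‘Condition’ choice of the ‘Data to Estimate Distribution’ option.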

 User Options

 Input


Node Attribute: Select a numerical attribute(s), which will be used to calculate anomaly score.

 Main process
Data to Estimate Distribution: Selects main nodes
whose attribute values are used to estimate the

distributions.

 All: Selects all main nodes.


 Condition: Selects main nodes whose attribute value(s) satisfy the specified conditions. If
a user can distinguish a sufficient number of ‘known normal’ nodes from ‘unknown’ nodes that
may be anomalies, estimating the distribution based only on the ‘known normal’ nodes improves
the performance.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘Anomaly detection (Probability

distribution >> Independent Normal)’ analysis, ‘Main Report’,

‘Probability Density Table’, ‘Probability Density (Log Scale)

Table’ and ‘Mahalanobis Distance Table’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report

 Estimated Distribution: Means and standard deviations of the estimated distributions.


 Probability Density Distribution: Mean, standard deviation, minimum and maximum of


each probability density.


 Probability Density (Log Scale) Distribution: Mean, standard deviation, minimum and
maximum of each log scale probability density.


 Mahalanobis Distance Distribution: Mean, standard deviation, minimum and maximum of


each Mahalanobis distance.

 Tables
Probability Density: Shows the probability density value of each node.

Probability Density (Log Scale): Shows the log scale probability density value of each node.


Mahalanobis Distance: Shows the Mahalanobis distance of each node.

 Time Complexity
 O( n * k ) where n is the number of nodes and k is the number of attributes.

Mining >> Anomaly Detection >> Probability Distribution >> Multivariate Normal

 Menu
Mining >> Anomaly Detection >> Probability Distribution >> Multivariate Normal

 Description
This algorithm finds anomalous main nodes under the assumption that the values of the selected
attributes jointly follow a multivariate normal distribution and that the attribute values of
anomalies are unlikely to follow this distribution. The distribution is estimated from the current
attribute values by using the maximum likelihood estimation ('MLE') method. Given the estimated
distribution $N(\mu, \Sigma)$, the probability density and the Mahalanobis distance of each node
are calculated. The Mahalanobis distance of the ith node is defined as
$D_i = \sqrt{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)}$, where $x_i$ is the vector of the ith node’s
attribute values. In general, anomalous nodes have a low probability density and a high
Mahalanobis distance.
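
For intuition, the multivariate case can be sketched for exactly two attributes, where the inverse of the covariance matrix has a closed form. The restriction to two attributes, the function names `fit_bivariate_normal` and `score`, and the assumption of a non-singular covariance matrix are all simplifications made for this illustration.

```python
import math


def fit_bivariate_normal(rows):
    """MLE mean vector and covariance entries (dividing by n) for two attributes."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / n
    syy = sum((r[1] - my) ** 2 for r in rows) / n
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    return (mx, my), (sxx, sxy, syy)


def score(row, mean, cov):
    """Return (probability density, Mahalanobis distance) of one node."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy * sxy  # determinant of the covariance matrix, assumed > 0
    dx, dy = row[0] - mx, row[1] - my
    # (x - mu)^T Sigma^{-1} (x - mu), using the closed-form inverse of a 2x2 matrix
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    density = math.exp(-0.5 * squared) / (2.0 * math.pi * math.sqrt(det))
    return density, math.sqrt(squared)


if __name__ == "__main__":
    rows = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    mean, cov = fit_bivariate_normal(rows)
    print(score([1.0, 1.0], mean, cov))
    print(score([3.0, 1.0], mean, cov))
```

Unlike the independent model, the off-diagonal covariance term lets a node be flagged when its attribute values are individually ordinary but jointly unusual.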

 User Options

 Input
Node Attribute: Select a numerical attribute(s), which will be
used to calculate anomaly score.

 Main process
Data to Estimate Distribution: Selects main nodes
whose attribute values are used to estimate the

distributions.

 All: Selects all main nodes.


 Condition: Selects main nodes whose attribute value(s) satisfy the specified conditions. If
a user can distinguish a sufficient number of ‘known normal’ nodes from ‘unknown’ nodes that
may be anomalies, estimating the distribution based only on the ‘known normal’ nodes improves
the performance.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘Anomaly detection (Probability

distribution >> Multivariate Normal)’ analysis, ‘Main Report’,

‘Probability Density Table’, ‘Probability Density (Log Scale)

Table’ and ‘Mahalanobis Distance Table’ are reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report

 Estimated Distribution: Mean vector and covariance matrix of the estimated distribution.

 Probability Density Distribution: Mean, standard deviation, minimum and maximum of the
probability density.


 Probability Density (Log Scale) Distribution: Mean, standard deviation, minimum and
maximum of the log scale probability density.

 Mahalanobis Distance Distribution: Mean, standard deviation, minimum and maximum of


each Mahalanobis distance.

 Tables
Probability Density: Shows the probability density value of each node.

Probability Density (Log Scale): Shows the log scale probability density value of each node.


Mahalanobis Distance: Shows the Mahalanobis distance of each node.

 Time Complexity
 O( n * k^2 ) where n is the number of nodes and k is the number of attributes.

Mining >> Anomaly Detection >> Local Outlier Factor >> Matrix

 Menu
Mining >> Anomaly Detection >> Local Outlier Factor >> Matrix

 Description
This module finds anomalous nodes under the assumption that the local densities of anomalies are
far smaller than those of their neighbors. The locality of a node A is defined by $N_k(A)$, the
set of k nearest neighbors of node A. The local density of a node is roughly the inverse of the
average distance (or dissimilarity) to its k nearest neighbors, and is measured more accurately
by using a local reachability density. The local reachability density of a node A is defined as

$lrd(A) = 1 / \left( \frac{\sum_{B \in N_k(A)} \max\{ k\text{-distance}(B),\ d(A, B) \}}{|N_k(A)|} \right)$, where:

 k-distance(B) is the distance from a node B to its kth nearest neighbor, that is, the
maximum distance from node B to any of its k nearest neighbors, and

 $d(A, B)$ is the distance from node A to node B.

The local outlier factor (LOF) of a node A, which measures how far the local density of the node
deviates from those of its neighbors, is defined as

$LOF(A) = \frac{\sum_{B \in N_k(A)} lrd(B)}{|N_k(A)| \cdot lrd(A)}$.

 If the LOF of a node is close to 1, the node’s local density is comparable to those of its
neighbors; hence the node is not an anomaly.

 However, if the LOF of a node is far larger than 1, the node’s local density is far smaller
than those of its neighbors; hence the node is an anomaly.
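
The definitions above can be sketched directly on a distance matrix. This follows the formulation of Breunig et al. cited in the References; the function name `local_outlier_factors`, the inclusion of ties at the k-distance in the neighborhood, and the assumption that no two nodes are at distance 0 from each other are choices made for this illustration.

```python
def local_outlier_factors(dist, k):
    """Return (lof, lrd, k_distance) lists computed from a symmetric distance matrix."""
    n = len(dist)
    neighbors, k_distance = [], []
    for a in range(n):
        ranked = sorted((b for b in range(n) if b != a), key=lambda b: dist[a][b])
        kd = dist[a][ranked[k - 1]]  # distance to the kth nearest neighbor
        k_distance.append(kd)
        # Neighborhood of a: every node within the k-distance (ties included).
        neighbors.append([b for b in ranked if dist[a][b] <= kd])

    lrd = []
    for a in range(n):
        # Reachability distance from a to b: max(k-distance(b), d(a, b)).
        reach = [max(k_distance[b], dist[a][b]) for b in neighbors[a]]
        lrd.append(len(reach) / sum(reach))  # inverse of the mean reachability distance

    lof = [
        sum(lrd[b] for b in neighbors[a]) / (len(neighbors[a]) * lrd[a])
        for a in range(n)
    ]
    return lof, lrd, k_distance


if __name__ == "__main__":
    points = [0, 1, 2, 3, 10]
    dist = [[abs(p - q) for q in points] for p in points]
    lof, lrd, kdist = local_outlier_factors(dist, k=2)
    print(lof)  # the isolated point at 10 receives by far the largest value
```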


 User Options

 Input
1-mode Network: Select a similarity or dissimilarity network.

 Link Merge: When the selected data contains multiple links, that is, when two or more links
connect the same pair of source and target nodes, a user should decide how to merge them into
a single link.

 Pre-process

Symmetrize: A user must symmetrize data before running this


module. In other words, directed / asymmetric data must be

transformed to undirected / symmetric data.

 Main process
# of Neighbors(k): The number of neighbors contained in the
locality of a node.

Proximity: Decides whether the 1-mode Network selected as


an input is to be interpreted as Similarity matrix or Dissimilarity matrix. If ‘Similarity’ is selected,

the distance between two nodes is calculated as 1 / (1.0 + the weight of links between them). If

‘Dissimilarity’ is selected, the distance is equal to the weight of links between them. The weight is

assumed to be 0 when the link between two nodes does not exist. The distance to oneself is also

assumed to be 0.

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Anomaly

detection (Local Outlier Factor >> Matrix)’ analysis, ‘Main Report’, ‘Local Outlier Factor Table’,

‘Local Reachability Density Table’, ‘K-Distance Table’ and ‘MDS Map’ are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

 Local Outlier Factor Distribution: Mean, standard deviation, minimum and maximum of
local outlier factors.

 Local Reachability Density Distribution: Mean, standard deviation, minimum and


maximum of local reachability densities.

 K-distance Distribution: Mean, standard deviation, minimum and maximum of k-distances.


 Tables
Local Outlier Factor: Shows the local outlier factor of each node.

Local Reachability Density: Shows the local reachability density of each node.

K-Distance: Shows the k-distance of each node.

 Maps
MDS: The default style is set according to the ‘Common’ option in the ‘Preference >> Node’ tab. A
node with a higher local outlier factor is drawn larger on the map.

 Inspect
Random shift: If many nodes overlap so much that the map becomes incomprehensible, randomly
shifting nodes can help a user to understand the map. A user can adjust the maximum distance
that nodes can move from their original positions using the slider.

 Time Complexity
 O( n^2 * log( k ) ) where n is the number of nodes and k is the number of neighbors.

 References
 Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). "LOF: Identifying Density-based
Local Outliers". Proceedings of the 2000 ACM SIGMOD international conference on Management of
data. SIGMOD '00: 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.

 Related Topics
 Mining >> Anomaly Detection >> Local Outlier Factor >> Vector

Mining >> Anomaly Detection >> Local Outlier Factor >> Vector

 Menu
Mining >> Anomaly Detection >> Local Outlier Factor >> Vector

 Description
This module finds anomalous nodes under the assumption that the local densities of anomalies are
far smaller than those of their neighbors. The locality of a node A is defined by $N_k(A)$, the
set of k nearest neighbors of node A. The local density of a node is roughly the inverse of the
average distance (or dissimilarity) to its k nearest neighbors, and is measured more accurately
by using a local reachability density. The local reachability density of a node A is defined as

$lrd(A) = 1 / \left( \frac{\sum_{B \in N_k(A)} \max\{ k\text{-distance}(B),\ d(A, B) \}}{|N_k(A)|} \right)$, where:

 k-distance(B) is the distance from a node B to its kth nearest neighbor, that is, the
maximum distance from node B to any of its k nearest neighbors, and

 $d(A, B)$ is the distance from node A to node B.

The local outlier factor (LOF) of a node A, which measures how far the local density of the node
deviates from those of its neighbors, is defined as

$LOF(A) = \frac{\sum_{B \in N_k(A)} lrd(B)}{|N_k(A)| \cdot lrd(A)}$.

 If the LOF of a node is close to 1, the node’s local density is comparable to those of its
neighbors; hence the node is not an anomaly.

 However, if the LOF of a node is far larger than 1, the node’s local density is far smaller
than those of its neighbors; hence the node is an anomaly.


 User Options

 Input
Node Attribute: Select a numerical attribute(s), which will be
used to calculate distances.

 Pre-process
Symmetrize: A user must symmetrize data before running this
module. In other words, directed / asymmetric data must be

transformed to undirected / symmetric data.

 Main process
# of Neighbors(k): The number of neighbors contained in
the locality of a node.

Proximity Measures: Select a similarity or distance


measure among ‘Euclidean distance’, ‘Manhattan distance’

and ‘Exact Match’, which will be used to calculate the

distance between nodes.

Normalize: Select ‘Yes’ if a user wants to normalize an attribute(s).

 Output
A user can select in which format(s) the outputs are to be reported. As the result of ‘Anomaly

detection (Local Outlier Factor >> Vector)’ analysis, ‘Main Report’, ‘Local Outlier Factor Table’,

‘Local Reachability Density Table’, ‘K-Distance Table’ and ‘MDS Map’ are reported.


 Outputs
An output(s) is listed as an inner tab located at the bottom of an output window.

 Reports
Main Report

 Local Outlier Factor Distribution: Mean, standard deviation, minimum and maximum of
local outlier factors.

 Local Reachability Density Distribution: Mean, standard deviation, minimum and


maximum of local reachability densities.

 K-distance Distribution: Mean, standard deviation, minimum and maximum of k-distances.


 Tables
Local Outlier Factor: Shows the local outlier factor of each node.


Local Reachability Density: Shows the local reachability density of each node.

K-Distance: Shows the k-distance of each node.

 Maps
MDS: The default style is set according to the ‘Common’ option in the ‘Preference >> Node’ tab. A
node with a higher local outlier factor is drawn larger on the map.

 Inspect
Random shift: If many nodes overlap so much that the map becomes incomprehensible, randomly
shifting nodes can help a user to understand the map. A user can adjust the maximum distance
that nodes can move from their original positions using the slider.

 Time Complexity
 O( n^2 * log( k ) + n^2 * f ) where n is the number of nodes, k is the number of neighbors and
f is the number of input node attributes.

 References
 Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). "LOF: Identifying Density-based
Local Outliers". Proceedings of the 2000 ACM SIGMOD international conference on Management of
data. SIGMOD '00: 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.

 Related Topics
 Mining >> Anomaly Detection >> Local Outlier Factor >> Matrix

Mining >> Anomaly Detection >> Attribute Value Frequency(AVF)

 Menu
Mining >> Anomaly Detection >> Attribute Value Frequency(AVF)

 Description
The AVF method is an efficient method for detecting outliers in categorical data. It calculates
the frequency of each value of each data attribute and the corresponding probability, finds the
attribute value frequency (AVF) score of each record by averaging these values over its
attributes, and selects the top k outliers, that is, the k records with the lowest AVF scores.

Let us assume that the dataset contains n data points $x_i$, $i = 1, \dots, n$. If each data point
has m attributes, we can write $x_i = [x_{i1}, \dots, x_{il}, \dots, x_{im}]$, where $x_{il}$ is
the value of the l-th attribute of $x_i$. Following the reasoning given above, a good indicator or
score to decide if point $x_i$ is an outlier can be defined as the AVF Score below:

$AVF\ Score(x_i) = \frac{1}{m} \sum_{l=1}^{m} f(x_{il})$

where $f(x_{il})$ is the number of times the l-th attribute value of $x_i$ appears in the dataset.
Since we essentially have a sum of m positive numbers, the AVF score is minimized when each of the
summation terms is individually minimized. Thus, the AVF score will be minimum for the `ideal'
outlier as defined earlier.
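
A minimal sketch of the AVF score follows. It uses raw counts for the frequency term, as in the definition of the score above; the function names `avf_scores` and `lowest_k` and the tuple-per-record representation are assumptions for illustration.

```python
from collections import Counter


def avf_scores(rows):
    """AVF score of each record: the mean frequency of its attribute values."""
    m = len(rows[0])
    # counts[l][value] = number of records whose l-th attribute equals value
    counts = [Counter(row[l] for row in rows) for l in range(m)]
    return [sum(counts[l][row[l]] for l in range(m)) / m for row in rows]


def lowest_k(scores, k):
    """Indices of the k records with the lowest AVF scores (the outlier candidates)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
    scores = avf_scores(rows)
    print(scores, lowest_k(scores, 1))
```

The record ("b", "z") is the only one whose values all occur once, so it gets the lowest score and is returned first.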

 User Options


 Input
Node Attribute: Select a numerical attribute(s), which will be used to calculate anomaly score.

 Main process
Outlier detection: Select the condition for detecting outliers.

 # of outliers: The user specifies the number of outliers.

 Less than mean-#stdev: A node whose AVF score is less than μ - n × σ is determined to be an
outlier, where μ and σ are the mean and the standard deviation of the AVF scores. The user
specifies the parameter n, which is the coefficient of the standard deviation.

 Output
A user can select in which format(s) the outputs are to be

reported. As the result of ‘Anomaly detection (Attribute Value

Frequency)’ analysis, ‘Main Report’ and ‘AVF Score Table’ are

reported.

 Outputs
An output(s) is listed as an inner tab located at the bottom of an

output window.

 Reports
Main Report

 AVF Score Distribution: Mean, standard deviation, minimum and maximum of AVF score.

 Number of Outliers: Shows the number of outliers.

 Outliers: The table shows the node names and values of the outliers.

 Tables
AVF score: Shows the AVF score of each node.

 Time Complexity
 O( n * k ), where n is the number of nodes, k is the number of attributes.

 References

 Koufakou, Anna (2009). Scalable and Efficient Outlier Detection in Large Distributed Data Sets
with Mixed-Type Attributes. University of Central Florida.

Mining >> Text >> Topic >> Latent Dirichlet Allocation (LDA)

 Menu
Mining >> Text >> Topic >> LDA

 Description

Latent Dirichlet Allocation (‘LDA’) is one of the most popular topic models, a method for
analyzing a large set of documents. The basic idea is that each document is represented as a topic
distribution, where each topic is characterized by a word distribution. Let $\theta_i$ and
$\phi_{z_{ij}}$ denote the topic distribution of document i and the word distribution of the topic
$z_{ij}$ allocated to the jth word of document i, respectively. In the learning phase, LDA fits
$\theta$ and $\phi$ to a set of documents (i.e. a document-by-word sparse matrix). Given these
distributions, LDA can generate a new document with the following generative process:

for the jth word in the ith document:

Choose a topic $z_{ij}$ ~ $Multinomial(\theta_i)$

Choose a word $w_{ij}$ ~ $Multinomial(\phi_{z_{ij}})$
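
The generative process can be sketched as a short sampling routine. The function name `generate_document`, the toy vocabulary and the hand-written distributions are hypothetical; in practice the topic and word distributions would come from a fitted model.

```python
import random


def generate_document(theta_i, phi, vocabulary, length, rng):
    """Sample `length` words for one document from its topic distribution.

    theta_i: topic distribution of the document (one probability per topic)
    phi:     phi[z] is the word distribution of topic z over `vocabulary`
    """
    words = []
    for _ in range(length):
        # Choose a topic for this word position from the document's topic distribution.
        z = rng.choices(range(len(theta_i)), weights=theta_i)[0]
        # Choose a word from the word distribution of the chosen topic.
        words.append(rng.choices(vocabulary, weights=phi[z])[0])
    return words


if __name__ == "__main__":
    vocabulary = ["network", "node", "cluster", "topic"]
    phi = [
        [0.5, 0.4, 0.1, 0.0],  # topic 0 favors graph vocabulary
        [0.0, 0.1, 0.4, 0.5],  # topic 1 favors mining vocabulary
    ]
    rng = random.Random(42)  # fixed seed so that the sample is reproducible
    print(generate_document([0.7, 0.3], phi, vocabulary, length=8, rng=rng))
```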
Although LDA is designed to model the document generation process, it can be generalized to model
the generation process of a (main nodeset by sub nodeset) 2-mode network. LDA can also be
interpreted as a matrix factorization[3]; that is, a (main nodeset by sub nodeset) sparse matrix is
split into two parts as follows:

(main nodeset by sub nodeset) ≈ (main nodeset by topic) × (topic by sub nodeset)


 User Options

 Input
2-mode Network: Select a 2-mode network. A user can
only choose one 2-mode network. However, the link weight

of a 2-mode network must be a non-negative value.

 Link Merge: When the selected data contains multiple links, that is, when two or more links
connect the same pair of source and target nodes, a user should decide how to merge them into
a single link.

 Main process
# of Topics: Specifies the number of topics (must be larger
than 1).

Random Seed: The seed of the random number generator used by the 'LDA' algorithm. Using the same
seed reproduces the same result.

Learning Method:

 MCMC: Fitting the model with the random walk Markov Chain Monte Carlo (‘MCMC’) method
using Gibbs sampling.[2]

 VEM: Fitting the model with the Variational Expectation-Maximization (‘VEM’) algorithm.[1]

Option (for MCMC):

 alpha: Dirichlet hyperparameter of the topic distribution of each document ($\theta$), set as suggested by [2].

 beta: Dirichlet hyperparameter of the word distribution of each topic ($\phi$). Smaller values of beta will produce more specific topics.

 # of iterations, burn-in, sample-lag: Parameters that control the Gibbs sampler. The first burn-in iterations are discarded, and statistics are then collected every sample-lag iterations for # of iterations.
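The interplay of the three Gibbs sampler parameters can be sketched as below. This is one possible reading, stated as an assumption: the burn-in sweeps run first and are discarded, then `n_iterations` further sweeps run, and every `sample_lag`-th of those is kept. The function name `collected_iterations` is hypothetical.

```python
def collected_iterations(n_iterations, burn_in, sample_lag):
    """Return the 1-based sweep numbers at which statistics are collected.

    Assumption: burn-in sweeps come first and are discarded; after that,
    n_iterations sweeps run and every sample_lag-th of them is kept.
    """
    total = burn_in + n_iterations
    return [t for t in range(burn_in + 1, total + 1)
            if (t - burn_in) % sample_lag == 0]

kept = collected_iterations(n_iterations=1000, burn_in=200, sample_lag=100)
```

Under this reading, a larger sample-lag yields fewer but less correlated samples for the same number of iterations.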

Option (for VEM):

 emmax: Maximum number of iterations for EM step.

 demmax: Maximum number of iterations for updating beta step.

 alphamx: Maximum number of iterations for updating alpha step.

 emtol: Tolerance for convergence for EM step.

 demtol: Tolerance for convergence for updating beta step.

 newtotol: Tolerance for convergence for updating alpha step.

 Output
A user can select in which format(s) the outputs are to be reported.

As the result of ‘LDA’ analysis, ‘Main Report’, ‘Topic Info Table’,

‘Document Classification Table’, ‘Topic Distribution Over

Mainnode Table’ and ‘SubNode Distribution over topic Table’ are

reported.

 Outputs
Each output is listed as an inner tab located at the bottom of the output window.

 Reports


Main Report

 Topic Info: For each topic, the names of top nodes from ‘Topic Distribution over
Mainnode’ are shown in this table.

 Document Classification Statistics: When the classification labels of nodes in a


Subnodeset are assigned to the topic that maximizes the topic proportion from ‘SubNode

Distribution over Topic’, this table shows the number of nodes included for each topic.
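The classification rule described above (assign each node to the topic with the largest proportion, then count nodes per topic) can be sketched as follows. The function `classify_by_topic` and the example proportions are hypothetical and only illustrate the rule.

```python
from collections import Counter

def classify_by_topic(topic_proportions):
    """Label each node with the index of its highest-probability topic.

    topic_proportions : dict mapping node name -> list of topic probabilities
    """
    return {
        node: max(range(len(probs)), key=probs.__getitem__)
        for node, probs in topic_proportions.items()
    }

proportions = {
    "doc1": [0.7, 0.2, 0.1],
    "doc2": [0.1, 0.8, 0.1],
    "doc3": [0.6, 0.3, 0.1],
}
labels = classify_by_topic(proportions)
nodes_per_topic = Counter(labels.values())   # number of nodes assigned to each topic
```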

 Tables
Topic Info: Shows the detailed information for ‘topic info’ in a main report.

Document Classification: Shows the detailed information for ‘document classification statistics’ in
a main report.


MainNode Distribution Over Topic: Shows the probability with which each MainNode (Keyword) falls into each topic.

SubNode Distribution Over Topic: Shows the probability with which each SubNode (Document) falls into each topic.


 Time Complexity
 O(L), where L is the number of links.

 References
 [1] D. Blei, A. Ng, and M. Jordan (2003). "Latent Dirichlet allocation", Journal of Machine

Learning Research, 3:993-1022

 [2] Griffiths TL, Steyvers M (2004). "Finding Scientific Topics", Proceedings of the National

Academy of Sciences of the United States of America, 101, 5228-5235.

 [3] M. Steyvers and T. Griffiths. “Probabilistic topic models.” In T. Landauer, D.S.

McNamara, S. Dennis, and W. Kintsch, editors, Handbook of Latent Semantic Analysis.

Erlbaum, 2007


V. Visualize
1. Layout >> 2D

2. Layout >> 3D

3. Drawing >> 2D

4. Drawing >> 3D

5. Spring >> 2D
 Kamada & Kawai

 Stress Majorization

 Eades

 Fruchterman & Reingold

 GEM

 HDE

6. Spring >> 3D
 Kamada & Kawai

 Eades

7. MDS >> 2D

8. MDS >> 3D

9. Clustered >> 2D
 Clustered-CoLa

 Clustered Eades

10. Clustered >> 3D


 Clustered Eades

11. Layered >> 2D


 Dig-CoLa

12. Circular >> 2D


 Circumference

 Concentric

 Radial

13. Simple >> 2D


 Fixed


 Random

14. Two Mode >> Spring

15. Link Layout >> Edge Bundling >> Divided Edge Bundling


Visualize >> Layout >> 2D

 Menu
Visualize >> Layout >> 2D

 Description
‘Visualize >> Layout >> 2D’ module generates the coordinates of nodes in the network map from the selected input data and visualization options. This module is useful for huge network data, for which generating the coordinates and drawing the network map at the same time takes a long time. You can separate the network drawing process into two steps: first the ‘Layout’ module, then the ‘Drawing’ module.

 User Options

 Input
1-mode Network: Select 1-mode Network.

 Main Process (Node and Link Layout)


You can select a layout algorithm and other options for drawing the network map. These options are the same as those of the other visualization modules.


 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

The ‘Layout’ module can generate the Main Report and Node Position outputs.


 Outputs

 Main Report
In the Main Report, you can check basic information about the process and the data.

 Node Position
In the [T] Node Position table, you can check the X Position and Y Position vectors of the nodes for drawing the network map. These vectors can be added to the node attribute data, for use in drawing the network map, through the right-click menu.

 Related Topics
Using NetMiner >> Task >> Visual Exploration of Network Map >> Drawing Network Map >>

Methods for Drawing Network Map


Visualize >> Layout >> 3D

 Menu
Visualize >> Layout >> 3D

 Description
‘Visualize >> Layout >> 3D’ module generates the coordinates of nodes in the network map from the selected input data and visualization options. This module is useful for huge network data, for which generating the coordinates and drawing the network map at the same time takes a long time. You can separate the network drawing process into two steps: first the ‘Layout’ module, then the ‘Drawing’ module.

 User Options

 Input
1-mode Network: Select 1-mode Network.

 Main Process (Node and Link Layout)


You can select a layout algorithm and other options for drawing the network map. These options are the same as those of the other visualization modules.


 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

The ‘Layout’ module can generate the Main Report and Node Position outputs.


 Outputs

 Main Report
In the Main Report, you can check basic information about the process and the data.

 Node Position
In the [T] Node Position table, you can check the X Position, Y Position and Z Position vectors of the nodes for drawing the network map. These vectors can be added to the node attribute data, for use in drawing the network map, through the right-click menu.

 Related Topics
Using NetMiner >> Task >> Visual Exploration of Network Map >> Drawing Network Map >>

Methods for Drawing Network Map.


Visualize >> Drawing >> 2D

 Menu
Visualize >> Drawing >> 2D

 Description
‘Visualize >> Drawing >> 2D’ module generates a network map from the selected 1-mode network and the selected X, Y Position vectors in the node attribute data. This module is useful for huge network data, for which drawing the network takes a long time: the node coordinates are generated beforehand with the ‘Visualize >> Layout’ module, and this module only draws the map from them.

 User Options

 Input
1-mode Network: Select 1-mode Network.

X Position: Select x position vector for drawing network map in


node attribute data.

Y Position: Select y position vector for drawing network map in


node attribute data.


 Output
You can select which outputs should be reported and which format the outputs should be displayed in. The ‘Drawing’ module can generate the ‘[M] Drawing’ output.

 Outputs

 Map
In the [M] Drawing, you can check the network map with the selected network data and coordinates.

 Related Topics
Using NetMiner >> Task >> Visual Exploration of Network Map >> Drawing Network Map >>

Methods for Drawing Network Map


Visualize >> Drawing >> 3D

 Menu
Visualize >> Drawing >> 3D

 Description
‘Visualize >> Drawing >> 3D’ module generates a network map from the selected 1-mode network and the selected X, Y, Z Position vectors in the node attribute data. This module is useful for huge network data, for which drawing the network takes a long time: the node coordinates are generated beforehand with the ‘Visualize >> Layout’ module, and this module only draws the map from them.

 User Options

 Input
1-mode Network: Select 1-mode Network.

X Position: Select x position vector for drawing network map in


node attribute data.

Y Position: Select y position vector for drawing network map in


node attribute data.

Z Position: Select z position vector for drawing network map in


node attribute data.

 Output
You can select which outputs should be reported and which format the outputs should be displayed in.

The ‘Drawing’ module can generate the ‘[M] Drawing’ output.


 Outputs

 Map
In the [M] Drawing, you can check the network map with the selected network data and coordinates.

 Related Topics
Using NetMiner >> Task >> Visual Exploration of Network Map >> Drawing Network Map >>

Methods for Drawing Network Map


Visualize >> Spring >> 2D

Spring-embedding modules attach virtual springs between each pair of nodes. Attractive forces are applied to pairs that should be close together, and repelling forces to pairs that should be far apart. When nodes are embedded with the spring modules, the direction of links is not considered, and the coordinates of each node do not carry a theoretically strict meaning.

NetMiner includes six 2-dimensional spring-embedding layouts, each of which can be selected in the Layout Control Item in the Display Tab. A detailed description of each Spring 2D layout is given below.

- Kamada & Kawai

- Stress Majorization

- Eades

- Fruchterman & Reingold

- GEM (Graph Embedder)

- HDE (High-Dimensional Embedding)


Visualize >> Spring >> 2D >> Kamada & Kawai

 Menu
Visualize >> Spring >> 2D >> Kamada & Kawai

 Description
This is a straightforward implementation of the spring-embedding algorithm of Kamada and Kawai (1989), which is one of the force-directed graph layout algorithms. The aim of this algorithm is to find a set of coordinates in which, for each pair of nodes, the Euclidean distance is approximately proportional to the geodesic distance between the two nodes. That is, the algorithm tries to realize an ideal distance even between nodes that are not adjacent to each other.

The Kamada & Kawai module sacrifices some computation time in exchange for distributing nodes more evenly than the Eades and Fruchterman & Reingold algorithms.
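The quantity that this kind of layout minimizes can be sketched in Python. The code below computes the spring energy from Kamada and Kawai's paper, in which the spring between nodes i and j has ideal length L * d_ij and strength K / d_ij^2 (d_ij is the geodesic distance). It is only an illustration: the function names, the parameters `natural_length` and `strength`, and the choice to skip pairs in different components are assumptions, not NetMiner's implementation.

```python
import math
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def kamada_kawai_energy(adj, pos, natural_length=1.0, strength=1.0):
    """Sum over node pairs of 0.5 * k_ij * (|p_i - p_j| - l_ij)^2."""
    nodes = sorted(adj)
    energy = 0.0
    for a, i in enumerate(nodes):
        d = shortest_path_lengths(adj, i)
        for j in nodes[a + 1:]:
            if j not in d:
                continue                      # pair lies in different components
            ideal = natural_length * d[j]     # ideal spring length l_ij
            k_ij = strength / d[j] ** 2       # weaker springs for distant pairs
            actual = math.dist(pos[i], pos[j])
            energy += 0.5 * k_ij * (actual - ideal) ** 2
    return energy

path = {0: [1], 1: [0, 2], 2: [1]}                       # path graph 0 - 1 - 2
straight = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
folded = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}
e_straight = kamada_kawai_energy(path, straight)
e_folded = kamada_kawai_energy(path, folded)
```

The straight drawing matches every geodesic distance and therefore has zero energy, while the folded drawing, which places two non-adjacent nodes on top of each other, does not.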

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is bigger than 1, the average distance gets longer.

- Between Component Factor: The default value is 2. If this value is greater than the default, the average distance between disconnected components gets larger.

- Epsilon: The Kamada & Kawai algorithm is iterative. When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) That is, a smaller epsilon value gives a more accurate visualization image at the expense of longer computation time.

- Max Iteration: Specifies the maximum number of iterations to run while the Epsilon convergence criterion above has not been reached. A larger value allows a more accurate visualization image.
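How Epsilon and Max Iteration interact can be sketched with a generic iterative loop. This is an illustration under assumptions, not NetMiner's code: `run_layout` and the `step` callback (which returns the updated positions together with the largest node energy) are hypothetical names.

```python
def run_layout(step, positions, epsilon, max_iterations):
    """Iterate until every node's energy is below epsilon or the budget is used up.

    step(positions) must return (new_positions, max_node_energy).
    Returns (positions, iterations_used, converged).
    """
    for iteration in range(1, max_iterations + 1):
        positions, max_energy = step(positions)
        if max_energy < epsilon:
            return positions, iteration, True      # Epsilon criterion reached
    return positions, max_iterations, False        # stopped by Max Iteration

def halve(positions):
    """Toy step: move a single coordinate halfway towards its target at 0."""
    new_positions = [x / 2 for x in positions]
    return new_positions, max(abs(x) for x in new_positions)

converged_run = run_layout(halve, [8.0], epsilon=0.1, max_iterations=100)
truncated_run = run_layout(halve, [8.0], epsilon=0.1, max_iterations=3)
```

With a generous iteration budget the loop stops as soon as the energy drops below Epsilon; with a small budget it stops early and returns a less refined layout.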

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places nodes at initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.

X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the position given by the X, Y coordinates multiplied by the scale. With this option, you can leave enough space between nodes.

X, Y offset: Nodes are arranged at (X + ‘X offset’, Y + ‘Y offset’). Using this option, you can prevent the nodes from converging on the upper-left part of the map.
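Two of the initial coordinate options can be sketched in Python. The function names are hypothetical, and the order in which scale and offset are combined (scale first, then offset) is an assumption, since the text describes the two options separately.

```python
import math

def circular_layout(nodes, radius=1.0, center=(0.0, 0.0)):
    """Place nodes on a circumference at regular angular intervals."""
    n = len(nodes)
    return {
        node: (center[0] + radius * math.cos(2 * math.pi * k / n),
               center[1] + radius * math.sin(2 * math.pi * k / n))
        for k, node in enumerate(nodes)
    }

def user_defined_layout(xy, scale=1.0, x_offset=0.0, y_offset=0.0):
    """Place nodes at user-supplied coordinates, scaled and then shifted.

    xy : dict mapping node -> (x, y) taken from node attribute data
    """
    return {node: (x * scale + x_offset, y * scale + y_offset)
            for node, (x, y) in xy.items()}

circle = circular_layout(["a", "b", "c", "d"])
placed = user_defined_layout({"a": (1, 2), "b": (3, 4)},
                             scale=10, x_offset=5, y_offset=5)
```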

Arrange Components
The ‘Arrange Components’ option controls how components are arranged. That is, this option is valid when there are more than two components. The available options are as follows:

- Alternate Bisection: Divides the map into two equal parts and arranges the biggest component in the upper part. This process is repeated for the remaining components.

- Polyomino Packing: Arranges components as close to each other as possible.

- Tiling: Divides the map into rectangles fitted to the sizes of the components.

- Vertical: Arranges components vertically in order of size.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. If not, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: Using this option, users can run the layout algorithm while the coordinates of some user-specified nodes are fixed. When fixing nodes’ coordinates, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of the nodes listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click the 'Remove' button, the selected nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Using this option, users can set the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is full transparency.

 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(n^3)

 Reference
 T. Kamada and S. Kawai (1989). An Algorithm for Drawing General Undirected Graphs,
Inform. Process. Lett., 31, 7-15.

 Related Topics


Visualize >> Spring >> 2D >> Stress Majorization

 Menu
Visualize >> Spring >> 2D >> Stress Majorization

 Description
This algorithm is an implementation of 'Graph Drawing by Stress Majorization' by Emden R. Gansner, Yehuda Koren, and Stephen North. Although this algorithm is theoretically expected to produce output that is almost the same as or better than that of the Kamada-Kawai algorithm in a shorter computation time, in experiments Kamada-Kawai has outperformed it in speed in many cases. However, this algorithm has the great virtue of monotonic convergence when refining your layout further.
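The monotonic convergence property can be illustrated with a small Python sketch of the localized majorization update described by Gansner, Koren and North, in which each node is moved in turn to the minimizer of a quadratic bound on the stress. The function names, the weights w_ij = d_ij^-2, and the example graph are assumptions for illustration; this is not NetMiner's implementation.

```python
import math

def stress(pos, dist):
    """Weighted stress: sum over pairs of w_ij * (|p_i - p_j| - d_ij)^2, w_ij = d_ij^-2."""
    total = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            w = dist[i][j] ** -2
            total += w * (math.dist(pos[i], pos[j]) - dist[i][j]) ** 2
    return total

def majorization_sweep(pos, dist):
    """Update every node once with the localized majorization step."""
    n = len(pos)
    pos = [list(p) for p in pos]
    for i in range(n):
        numerator = [0.0, 0.0]
        denominator = 0.0
        for j in range(n):
            if j == i:
                continue
            w = dist[i][j] ** -2
            gap = math.dist(pos[i], pos[j])
            inv = 1.0 / gap if gap > 0 else 0.0
            for c in range(2):
                numerator[c] += w * (pos[j][c]
                                     + dist[i][j] * (pos[i][c] - pos[j][c]) * inv)
            denominator += w
        pos[i] = [numerator[0] / denominator, numerator[1] / denominator]
    return pos

# Geodesic distances of the path graph 0 - 1 - 2 - 3.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
pos = [(0.0, 0.0), (0.3, 1.0), (1.0, 0.2), (0.2, 0.4)]
history = [stress(pos, dist)]
for _ in range(30):
    pos = majorization_sweep(pos, dist)
    history.append(stress(pos, dist))
```

The recorded stress values never increase from one sweep to the next, which is why a layout can be refined further without the risk of making it worse.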

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Epsilon: The Stress Majorization algorithm is iterative. When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) That is, a smaller epsilon value gives a more accurate visualization image at the expense of longer computation time.

- Max Iteration: Specifies the maximum number of iterations to run while the Epsilon convergence criterion above has not been reached. A larger value allows a more accurate visualization image.

- Timeout Limit (in seconds): The maximum time allowed for running the iterations of the algorithm.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places nodes at initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates.

Users should provide attribute data containing nodes’ coordinate


information.

X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the position given by the X, Y coordinates multiplied by the scale. With this option, you can leave enough space between nodes.

X, Y offset: Nodes are arranged at (X + ‘X offset’, Y + ‘Y offset’). Using this option, you can prevent the nodes from converging on the upper-left part of the map.

Arrange Components
The ‘Arrange Components’ option controls how components are arranged. That is, this option is valid when there are more than two components. The available options are as follows:

- Alternate Bisection: Divides the map into two equal parts and arranges the biggest component in one part. Then it divides the remaining part into two parts and arranges the second biggest component in one of them. This process is repeated for all components.

- Polyomino Packing: Arranges components close to each other.

- Tiling: Divides the map into rectangles, each sized to fit one component.

- Vertical: Arranges components vertically in order of size.
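The Alternate Bisection option can be sketched as follows. The sketch assumes that the cuts alternate between vertical and horizontal (as the name suggests) and that the last component receives whatever region remains; the function name and the rectangle format `(x, y, width, height)` are hypothetical.

```python
def alternate_bisection(component_sizes, x, y, width, height):
    """Assign each component a rectangle (x, y, width, height) of the map.

    component_sizes : dict mapping component name -> number of nodes
    """
    order = sorted(component_sizes, key=component_sizes.get, reverse=True)
    regions = {}
    split_vertically = True
    for index, component in enumerate(order):
        if index == len(order) - 1:
            regions[component] = (x, y, width, height)   # last one takes the rest
            break
        if split_vertically:
            width /= 2
            regions[component] = (x, y, width, height)   # one half for this component
            x += width                                   # continue in the other half
        else:
            height /= 2
            regions[component] = (x, y, width, height)
            y += height
        split_vertically = not split_vertically
    return regions

regions = alternate_bisection({"A": 10, "B": 5, "C": 2}, 0, 0, 100, 100)
```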

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. If not, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: Using this option, users can run the layout algorithm while the coordinates of some user-specified nodes are fixed. When fixing nodes’ coordinates, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of the nodes listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click the 'Remove' button, the selected nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Using this option, users can set the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is full transparency.

 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Stress Majorization algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(n^3)

 Reference
 E. R. Gansner, Y. Koren and S. North (2004). Graph Drawing by Stress Majorization.

 Related Topics
Visualize >> Spring >> 2D >> Kamada & Kawai


Visualize >> Spring >> 2D >> Eades

 Menu
Visualize >> Spring >> 2D >> Eades

 Description
This algorithm is a fairly straightforward implementation of Eades' Spring Embedder and is faster than the Kamada & Kawai algorithm. Repelling forces are assigned to every pair of non-adjacent nodes, and attractive forces to every pair of adjacent nodes. Based upon this spring model, nodes are spread well over the plane and adjacent nodes are placed close to one another. That is, this algorithm illustrates the local network structure of each node. However, the arrangement can be uneven for some network structures.
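The spring model described above can be sketched in Python. The force laws used here (a logarithmic spring `c1 * log(d / c2)` between adjacent nodes and an inverse-square repulsion `c3 / d^2` between non-adjacent nodes) are a commonly quoted form of Eades' heuristic and are assumptions of this sketch, as are the constants, the function name, and the fixed number of iterations. How NetMiner's Natural Length, Repulsiveness and Attenuation options map onto such constants is not specified here.

```python
import math

def eades_layout(adj, pos, c1=2.0, c2=1.0, c3=1.0, c4=0.1, iterations=200):
    """Move every node a small step (c4) along its net force, repeatedly.

    adj : dict mapping node -> set of adjacent nodes (undirected)
    pos : dict mapping node -> (x, y) initial coordinates
    """
    pos = {v: list(p) for v, p in pos.items()}
    nodes = list(adj)
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for v in nodes:
            for u in nodes:
                if u == v:
                    continue
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                if u in adj[v]:
                    f = c1 * math.log(d / c2)   # spring: pulls v towards u when d > c2
                else:
                    f = -c3 / d ** 2            # repulsion: pushes v away from u
                force[v][0] += f * dx / d
                force[v][1] += f * dy / d
        for v in nodes:
            pos[v][0] += c4 * force[v][0]
            pos[v][1] += c4 * force[v][1]
    return pos

path = {0: {1}, 1: {0, 2}, 2: {1}}                       # path graph 0 - 1 - 2
start = {0: (0.0, 0.0), 1: (1.0, 0.1), 2: (0.5, 1.0)}
final = eades_layout(path, start)
d01 = math.dist(final[0], final[1])
d12 = math.dist(final[1], final[2])
d02 = math.dist(final[0], final[2])
```

In the resulting layout the two end nodes, which are not adjacent, end up farther apart than either adjacent pair.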

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is bigger than 1, the average distance gets longer.

- Repulsiveness Coefficient: This option controls the repelling forces between two non-adjacent nodes. The default value is 1. If the value is bigger than 1, the repelling forces increase, so non-adjacent nodes and disconnected nodes move farther away from each other.

- Attenuation Factor: The Eades algorithm is iterative. At each stage, the algorithm changes the positions of the nodes. With a large attenuation factor, the rate of change gets smaller as time passes.

- Epsilon: When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) That is, a smaller epsilon value gives a more accurate visualization image at the expense of longer computation time.

- Max Iteration: Specifies the maximum number of iterations to run while the Epsilon convergence criterion above has not been reached. A larger value allows a more accurate visualization image.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places nodes at initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the position given by the X, Y coordinates multiplied by the scale. With this option, you can leave enough space between nodes.

X, Y offset: Nodes are arranged at (X + ‘X offset’, Y + ‘Y offset’). Using this option, you can prevent the nodes from converging on the upper-left part of the map.


Arrange Components
The ‘Arrange Components’ option controls how components are arranged. That is, this option is valid when there are more than two components. The available options are as follows:

- Alternate Bisection: Divides the map into two equal parts and arranges the biggest component in one part. Then it divides the remaining part into two parts and arranges the second biggest component in one of them. This process is repeated for all components.

- Polyomino Packing: Arranges components close to each other.

- Tiling: Divides the map into rectangles, each sized to fit one component.

- Vertical: Arranges components vertically in order of size.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. If not, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: Using this option, users can run the layout algorithm while the coordinates of some user-specified nodes are fixed. When fixing nodes’ coordinates, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of the nodes listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click the 'Remove' button, the selected nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Using this option, users can set the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is full transparency.


 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Eades algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.

 Time Complexity
 O(k * n^2) where k is the number of iterations.

 Reference
 P. Eades (1984). A Heuristic for Graph Drawing, Cong. Numer., 42, 149-160.

 Related Topics


Visualize >> Spring >> 2D >> Fruchterman & Reingold

 Menu
Visualize >> Spring >> 2D >> Fruchterman & Reingold

 Description
This is an implementation of the spring-embedding algorithm of Fruchterman and Reingold. The basic logic of this algorithm is the same as that of Eades's algorithm, but some heuristics, including those for the repelling and attractive forces, are employed to speed up layout computation.
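The force heuristics of Fruchterman and Reingold can be sketched in Python. In their paper the ideal edge length is k = C * sqrt(area / number of nodes), every pair of nodes repels with k^2 / d, adjacent nodes attract with d^2 / k, and each move is capped by a "temperature" that is lowered at every iteration. The sketch below follows that scheme but leaves out the clamping of nodes to the drawing frame; the function name, the geometric cooling rule and the default values are assumptions and do not describe NetMiner's implementation.

```python
import math

def fruchterman_reingold(adj, pos, width, height, c=1.0,
                         t0=None, cooling=0.95, iterations=200):
    """Return (positions, k) after a fixed number of cooled iterations."""
    nodes = list(adj)
    k = c * math.sqrt(width * height / len(nodes))   # ideal edge length
    t = width / 10 if t0 is None else t0             # initial temperature
    pos = {v: list(p) for v, p in pos.items()}
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for v in nodes:
            for u in nodes:
                if u == v:
                    continue
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d                        # repulsion between every pair
                if u in adj[v]:
                    f -= d * d / k                   # attraction along links
                disp[v][0] += f * dx / d
                disp[v][1] += f * dy / d
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, t)                    # temperature caps the move
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        t *= cooling                                 # cool down
    return pos, k

pair = {"a": {"b"}, "b": {"a"}}
final, k = fruchterman_reingold(pair, {"a": (0.0, 0.0), "b": (0.5, 0.0)},
                                width=2.0, height=1.0)
edge_length = math.dist(final["a"], final["b"])
```

For two linked nodes the attraction and repulsion balance when their distance equals k, so the final edge length settles close to k as the temperature falls.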

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is bigger than 1, the average distance gets longer.

- Cooling Coefficient: The Fruchterman & Reingold algorithm is iterative. At each stage, it changes the nodes’ coordinates. With a bigger cooling coefficient, the rate of change gets smaller as time passes.

- Epsilon: When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) That is, a smaller epsilon value gives a more accurate visualization image at the expense of longer computation time.

- Max Iterations: Specifies the maximum number of iterations to run while the Epsilon convergence criterion above has not been reached. A larger value allows a more accurate visualization image.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places nodes at initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates.

Users should provide attribute data containing nodes’ coordinate information.


X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the position given by the X, Y coordinates multiplied by the scale. With this option, you can leave enough space between nodes.

X, Y offset: Nodes are arranged at (X + ‘X offset’, Y + ‘Y offset’). Using this option, you can prevent the nodes from converging on the upper-left part of the map.

Arrange Components
The ‘Arrange Components’ option controls how components are arranged. That is, this option is valid when there are more than two components. The available options are as follows:

- Alternate Bisection: Divides the map into two equal parts and arranges the biggest component in one part. Then it divides the remaining part into two parts and arranges the second biggest component in one of them. This process is repeated for all components.

- Polyomino Packing: Arranges components close to each other.

- Tiling: Divides the map into rectangles, each sized to fit one component.

- Vertical: Arranges components vertically in order of size.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. If not, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: Using this option, users can run the layout algorithm while the coordinates of some user-specified nodes are fixed. When fixing nodes’ coordinates, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of the nodes listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click the 'Remove' button, the selected nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Using this option, users can set the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is full transparency.

 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Fruchterman & Reingold algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(k * n^2) where k is the number of iterations.

 Reference
 Fruchterman and Reingold (1991), "Graph-drawing by force-directed placement", Software-
Practice and Experience, 21(11):1129-1164

 Related Topics
Visualize >> Spring >> 2D >> Eades


Visualize >> Spring >> 2D >> GEM

 Menu
Visualize >> Spring >> 2D >> GEM

 Description
This algorithm is an implementation of the Graph Embedder (GEM) algorithm of Frick, Ludwig and Mehldau. Like the Fruchterman & Reingold algorithm, its logic is similar to that of Eades’ algorithm, but it employs some heuristics to reduce the computation time needed for convergence.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout
Algorithm Options
- Delta rotation: Sets the angle, in degrees, within which a node's movement is classified as 'rotation'.

- Delta oscillation: Sets the angle, in degrees, within which a node's movement is classified as 'oscillation'.

- Alpha rotation: Sets the rate at which a node's temperature (its kinetic energy) decreases when 'rotation' is detected.

- Alpha oscillation: Sets the rate at which a node's temperature decreases when 'oscillation' is detected.

- Start temperature: The initial temperature of the nodes.

- Final temperature: The temperature expected at the end of the algorithm.

- Max temperature: The maximum temperature of each node. (That is, a node's temperature cannot exceed this value.)

- Edge length: The target length of an edge.

- Gravitational constant: Sets the strength of the force that pulls nodes toward the center of gravity.


- Random move range: Sets the degree of randomness applied to node movement.

- Epsilon: The GEM algorithm is iterative. If, at some stage, the energy of every node falls below this user-defined value, the algorithm stops optimizing the node coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run when the Epsilon convergence criterion described above has not been reached. A larger value allows a more accurate visualization.
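The interplay of Epsilon and Max Iteration can be sketched as follows. This is only an illustration of the stopping rule described above, not NetMiner's implementation; the names run_layout, step and make_halving_step are hypothetical.

```python
def run_layout(step, epsilon, max_iteration):
    """Call `step` until every node energy is below epsilon,
    or until max_iteration iterations have been run.

    `step` is a hypothetical callable that moves the nodes once and
    returns the list of per-node energies after the move.
    Returns (iterations_used, converged).
    """
    for iteration in range(1, max_iteration + 1):
        energies = step()
        if max(energies) < epsilon:
            return iteration, True
    return max_iteration, False


def make_halving_step(start_energy):
    """Toy stand-in for a layout step: one node whose energy halves each call."""
    state = {"energy": start_energy}

    def step():
        state["energy"] /= 2
        return [state["energy"]]

    return step


print(run_layout(make_halving_step(1.0), epsilon=0.1, max_iteration=100))
print(run_layout(make_halving_step(1.0), epsilon=0.1, max_iteration=3))
```

In the first call the energy falls below epsilon after four steps; in the second call Max Iteration ends the run before convergence.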

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places the nodes at their initial coordinates and then optimizes those coordinates with the algorithm. You may therefore need to choose how the initial coordinates are determined. The available options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y: Select the attribute data containing the nodes' coordinate information.

Scale: Nodes are placed at the given X, Y coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are placed at (X + 'X offset', Y + 'Y offset'). Use this option to keep the nodes from converging on the upper-left area of the map.
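As a rough illustration of the Circular and User Defined options, the sketch below computes initial coordinates in Python. The function names are hypothetical, and applying the scale before the offset is an assumption; the manual does not state the order in which NetMiner combines them.

```python
import math


def circular_layout(n, radius=1.0, center=(0.0, 0.0)):
    """Place n nodes on a circle at regular angular intervals."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def user_defined_layout(xs, ys, scale=1.0, x_offset=0.0, y_offset=0.0):
    """Scale the user-supplied coordinates, then shift them by the offsets.

    Assumption: scale is applied first, offsets second.
    """
    return [(x * scale + x_offset, y * scale + y_offset) for x, y in zip(xs, ys)]


print(circular_layout(4))
print(user_defined_layout([1, 2], [3, 4], scale=10, x_offset=5, y_offset=-5))
```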


Arrange Components
The 'Arrange Components' option controls how components are arranged, so it takes effect only when the network has two or more components. The available options are as follows:

- Alternate Bisection: Divides the map into two equal parts and places the biggest component in one part. It then divides the remaining part in two and places the second biggest component in one of them. This process is repeated for all components.

- Polyomino Packing: Arranges components close to each other.

- Tiling: Divides the map into rectangles, each sized to fit one component.

- Vertical: Arranges components vertically in order of size.
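The Alternate Bisection option can be pictured with the sketch below. It is a reading of the description above rather than NetMiner's code: the function name is hypothetical, and two details are assumptions, namely that the cut direction alternates between vertical and horizontal and that the last component receives whatever area remains.

```python
def alternate_bisection(width, height, component_sizes):
    """Return {component_index: (x, y, w, h)}, assigning areas biggest first."""
    order = sorted(range(len(component_sizes)), key=lambda i: -component_sizes[i])
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    rects = {}
    cut_vertically = True  # assumption: cut directions alternate
    for rank, index in enumerate(order):
        if rank == len(order) - 1:
            rects[index] = (x, y, w, h)      # last component takes the rest
        elif cut_vertically:
            rects[index] = (x, y, w / 2, h)  # left half for this component
            x, w = x + w / 2, w / 2          # continue with the right half
        else:
            rects[index] = (x, y, w, h / 2)  # top half for this component
            y, h = y + h / 2, h / 2          # continue with the bottom half
        cut_vertically = not cut_vertically
    return rects


print(alternate_bisection(100, 100, [5, 20, 10]))
```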

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is discarded and a new map size is computed to fit the screen.

Fix nodes coordinates: With this option, you can run the layout algorithm while the coordinates of some user-specified nodes stay fixed. When node coordinates are fixed, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of nodes which are listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click 'Remove' button, the selected

nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Sets the transparency of nodes or links, from 0% (no transparency) to 100% (fully transparent).


 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> GEM algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team.

 Time Complexity

 Reference
 Arne Frick, Andreas Ludwig, and Heiko Mehldau. A fast adaptive layout algorithm for
undirected graphs. In Roberto Tamassia and Ioannis G. Tollis, editors, Proc. DIMACS Int.
Work. Graph Drawing, GD, number 894, pages 388–403, Berlin, Germany, 10–12 1994.
Springer-Verlag.

 Related Topics
Visualize >> Spring >> 2D >> Eades


Visualize >> Spring >> 2D >> HDE

 Menu
Visualize >> Spring >> 2D >> HDE

 Description
This is an implementation of the High-Dimensional Embedding (HDE) algorithm introduced by D. Harel and Y. Koren. It gives a rather coarse result compared with the Kamada & Kawai or GEM algorithms, but it can draw a large network within a few seconds.
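The first stage of high-dimensional embedding, as described by Harel and Koren, can be sketched as below: a few pivot nodes are chosen, and each node's coordinate along a pivot's axis is its graph distance to that pivot. The function names are hypothetical, the graph is assumed to be connected and unweighted, and the final projection of these coordinates down to two dimensions is omitted.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def high_dimensional_embedding(adj, num_pivots):
    """Return {node: [distance to pivot 1, distance to pivot 2, ...]}.

    Each new pivot is the node farthest from all pivots chosen so far.
    Assumes a connected graph given as an adjacency dict.
    """
    nodes = list(adj)
    pivot = nodes[0]
    closest = {v: float("inf") for v in nodes}
    coords = {v: [] for v in nodes}
    for _ in range(num_pivots):
        dist = bfs_distances(adj, pivot)
        for v in nodes:
            coords[v].append(dist[v])
            closest[v] = min(closest[v], dist[v])
        pivot = max(nodes, key=lambda v: closest[v])
    return coords


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(high_dimensional_embedding(path, 2))
```

Each pivot costs one breadth-first search, which is why this stage is fast on large networks.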

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout
The HDE algorithm has no layout options.

Transparency Option: Sets the transparency of nodes or links, from 0% (no transparency) to 100% (fully transparent).


 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> HDE algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team.

 Time Complexity
 O(n+m)

 Reference
 'Graph Drawing by High-Dimensional Embedding', D. Harel and Y. Koren, 2002

 Related Topics


Visualize >> Spring >> 3D

Spring-embedding modules attach virtual springs between each pair of nodes. Attractive forces are applied to pairs that should be close together, and repelling forces are applied to pairs that should be far apart. When nodes are embedded with the spring modules, link direction is not considered, and the coordinates of a node carry no strict theoretical meaning.

NetMiner includes two 3-dimensional spring-embedding layouts. A layout is selected in the Layout control item of the Display tab. Each Spring 3D layout is described in detail on the following pages.

- Kamada & Kawai

- Eades


Visualize >> Spring >> 3D >> Kamada & Kawai

 Menu
Visualize >> Spring >> 3D >> Kamada & Kawai

 Description
This is a straightforward implementation of the Kamada & Kawai (1989) spring-embedding algorithm, one of the force-directed graph layout algorithms. The aim of the algorithm is to find a set of coordinates in which, for each pair of nodes, the Euclidean distance is approximately proportional to the geodesic distance between the two nodes. That is, the algorithm also tries to represent the ideal distance between nodes that are not adjacent to each other.

Compared with the Eades and Fruchterman & Reingold algorithms, the Kamada & Kawai visualizing module sacrifices some computation time in exchange for a more even distribution of nodes.
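The quantity that a Kamada & Kawai layout tries to reduce can be written down directly. The sketch below follows the energy function of the 1989 paper (spring strength inversely proportional to the squared geodesic distance); the function and parameter names are hypothetical, and how NetMiner's Natural Length Coefficient maps onto natural_length is an assumption.

```python
import math


def kamada_kawai_energy(positions, geodesic, natural_length=1.0, strength=1.0):
    """Sum of spring energies over all node pairs.

    positions: one coordinate tuple per node (2D or 3D).
    geodesic:  matrix of shortest-path distances between nodes.
    """
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = geodesic[i][j]
            ideal = natural_length * d        # desired Euclidean distance
            k = strength / (d * d)            # weaker springs for distant pairs
            actual = math.dist(positions[i], positions[j])
            energy += 0.5 * k * (actual - ideal) ** 2
    return energy


# A 3-node path 0 - 1 - 2.
geodesic = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(kamada_kawai_energy(straight, geodesic))
print(kamada_kawai_energy(bent, geodesic))
```

The straight placement matches every geodesic distance and has zero energy; the bent one brings nodes 0 and 2 closer than their ideal distance, so its energy is positive.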

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is larger than 1, the average distance gets longer.

- Between Component Factor: The default value is 2. If this value is greater than the default, the average distance between disconnected components gets larger.


- Epsilon: The Kamada & Kawai algorithm is iterative. If, at some stage, the energy of every node falls below this user-defined value, the algorithm stops optimizing the node coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run when the Epsilon convergence criterion described above has not been reached. A larger value allows a more accurate visualization.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places the nodes at their initial coordinates and then optimizes those coordinates with the algorithm. You may therefore need to choose how the initial coordinates are determined. The available options are as follows:

- Random: Arrange nodes on the map randomly.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y, Z: Select the attribute data containing the nodes' coordinate information.

Scale: Nodes are placed at the given X, Y, Z coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are placed at (X + 'X offset', Y + 'Y offset'). Use this option to keep the nodes from converging on the upper-left area of the map.

3D Coloring Option
- Ambient: This option controls the average volume of light that is created by emission of light from

all of the light sources surrounding (or located inside of) the lit area.

- Diffuse: This option controls diffuse light. Diffuse light represents a directional light cast by a light

source.

- Specular: This option controls specular light. Just like Diffuse light, Specular light is a directional

type of light. It comes from one particular direction. The difference between the two is that specular

light reflects off the surface in a sharp and uniform way.

- Shininess: This option controls the brightness and size of the reflection on nodes and links.
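How these four options typically combine can be illustrated with the standard Phong reflection model. The manual does not say which lighting model NetMiner uses, so the sketch below is an assumption, and the function name and parameter values are hypothetical.

```python
import math


def phong_intensity(normal, to_light, to_viewer,
                    ambient, diffuse, specular, shininess):
    """Light intensity at a surface point under the Phong model.

    normal, to_light, to_viewer are 3D direction vectors (any length).
    """
    def normalize(v):
        length = math.sqrt(sum(c * c for c in v))
        return [c / length for c in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0:
        return ambient  # the light is behind the surface: ambient term only
    # Mirror reflection of the light direction about the surface normal.
    r = [2 * n_dot_l * nc - lc for nc, lc in zip(n, l)]
    highlight = max(dot(r, v), 0.0) ** shininess
    return ambient + diffuse * n_dot_l + specular * highlight


print(phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1), 0.2, 0.5, 0.3, 32))
print(phong_intensity((0, 0, 1), (0, 0, -1), (0, 0, 1), 0.2, 0.5, 0.3, 32))
```

In this model a larger shininess exponent narrows the highlight, which corresponds to the description of Shininess as controlling the size of the reflection.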

 Outputs
 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger.


 Time Complexity
 O(n^3)

 Reference
 T. Kamada and S. Kawai (1989). An Algorithm for Drawing General Undirected Graphs,
Inform. Process. Lett., 31, 7-15.

 Related Topics
Visualize >> Spring >> 2D >> Kamada & Kawai


Visualize >> Spring >> 3D >> Eades

 Menu
Visualize >> Spring >> 3D >> Eades

 Description
This algorithm is a fairly straightforward implementation of Eades' Spring Embedder and is faster than the Kamada & Kawai algorithm. Repelling forces are assigned to every pair of non-adjacent nodes, and attractive forces are assigned to every pair of adjacent nodes. Under this spring model, nodes are spread well across the layout space and adjacent nodes are placed close to one another. That is, the algorithm illustrates the local network structure around each node. The arrangement can, however, be uneven for some network structures.
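One iteration of a spring embedder of this kind can be sketched as follows, using the force laws from Eades' 1984 paper: a logarithmic spring between adjacent nodes and an inverse-square repulsion between non-adjacent nodes. The sketch is 2D for brevity, the names are hypothetical, and the way NetMiner's Natural Length and Repulsiveness coefficients enter the forces is an assumption.

```python
import math


def eades_step(positions, edges, natural_length=1.0,
               repulsiveness=1.0, step_size=0.1):
    """Move every node once along the net force acting on it."""
    adjacent = {frozenset(e) for e in edges}
    moved = []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy) or 1e-9      # avoid division by zero
            if frozenset((i, j)) in adjacent:
                # Spring: pulls when d > natural length, pushes when shorter.
                magnitude = math.log(d / natural_length)
            else:
                # Repulsion between non-adjacent nodes.
                magnitude = -repulsiveness / (d * d)
            fx += magnitude * dx / d
            fy += magnitude * dy / d
        moved.append((xi + step_size * fx, yi + step_size * fy))
    return moved


print(eades_step([(0.0, 0.0), (4.0, 0.0)], edges=[(0, 1)]))  # adjacent: move closer
print(eades_step([(0.0, 0.0), (1.0, 0.0)], edges=[]))        # non-adjacent: move apart
```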

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is larger than 1, the average distance gets longer.

- Repulsiveness Coefficient: This option controls the repelling forces between two non-adjacent nodes. The default value is 1. If the value is larger than 1, the repelling forces increase, so non-adjacent nodes and disconnected nodes move farther away from each other.

- Attenuation Factor: The Eades algorithm is iterative, and at each stage it changes the positions of the nodes. With a large attenuation factor, the rate of change gets smaller as time passes.

- Epsilon: The Eades algorithm is iterative. If, at some stage, the energy of every node falls below this user-defined value, the algorithm stops optimizing the node coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run when the Epsilon convergence criterion described above has not been reached. A larger value allows a more accurate visualization.


Initial Coordinate
Before running the core visualization algorithm, the visualizing module places the nodes at their initial coordinates and then optimizes those coordinates with the algorithm. You may therefore need to choose how the initial coordinates are determined. The available options are as follows:

- Random: Arrange nodes on the map randomly.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y, Z: Select the attribute data containing the nodes' coordinate information.

Scale: Nodes are placed at the given X, Y, Z coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are placed at (X + 'X offset', Y + 'Y offset'). Use this option to keep the nodes from converging on the upper-left area of the map.

3D Coloring Option
- Ambient: This option controls the average volume of light that is created by emission of light from

all of the light sources surrounding (or located inside of) the lit area.

- Diffuse: This option controls diffuse light. Diffuse light represents a directional light cast by a light

source.

- Specular: This option controls specular light. Just like Diffuse light, Specular light is a directional

type of light. It comes from one particular direction. The difference between the two is that specular

light reflects off the surface in a sharp and uniform way.

- Shininess: This option controls the brightness and size of the reflection on nodes and links.

 Outputs
 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Eades algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger, and node color represents the team of each node.


 Time Complexity
 O(k * n^2) where k is the number of iterations.

 Reference
 P. Eades (1984). A Heuristic for Graph Drawing, Cong. Numer., 42, 149-160.

 Related Topics
 Visualize >> Spring >> 2D >> Eades


Visualize >> MDS >> 2D

 Menu
Visualize >> MDS >> 2D

 Description

Multidimensional Scaling Layout.


- Classical MDS (c-MDS) implements Torgerson-Gower's classical (metric) Multidimensional Scaling, also known as Principal Coordinate Analysis (PCO). A similarity matrix is first converted to a dissimilarity matrix by a linear transformation. The dissimilarity matrix is squared, double-centered and multiplied by -1/2, after which eigenvalue decomposition is used to determine the coordinate values. Only the two largest positive eigenvalues and their eigenvectors are used. A scale is displayed along the left and upper sides of the map.

- Nonmetric MDS (n-MDS) performs non-metric 2-dimensional scaling of a given ordinal proximity matrix following the ALSCAL (Alternating Least-Squares Scaling) algorithm. The initial configuration is found using Classical MDS (c-MDS). A disparity matrix is then calculated (following Kruskal's least-squares monotonic transformation) and normalized. Using this disparity matrix, the coordinates are then estimated one at a time so as to minimize SStress.

- Kruskal's Nonmetric MDS (Kn-MDS) follows the approach published by Kruskal in 1964. It finds a configuration with minimum stress using Kruskal's monotonic least-squares regression and the Newton-Raphson method.
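The classical MDS steps listed above (square, double-center, multiply by -1/2, eigendecompose) can be sketched in plain Python. This is an illustration under stated assumptions, not NetMiner's code: the function name is hypothetical, the eigenpairs are found by simple power iteration with deflation instead of a full eigenvalue solver, and the input is assumed to be a symmetric dissimilarity matrix whose largest eigenvalues are positive.

```python
import math


def classical_mds(dissimilarity, dims=2, iterations=500):
    """Return one coordinate list per node."""
    n = len(dissimilarity)
    squared = [[d * d for d in row] for row in dissimilarity]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centering and multiplication by -1/2. The matrix is symmetric,
    # so the column means equal the row means.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        v = [float(i + 1) for i in range(n)]   # fixed start vector
        for _step in range(iterations):        # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x * y for x, y in zip(v, bv))
        # Only positive eigenvalues contribute a coordinate axis.
        scale = math.sqrt(eigenvalue) if eigenvalue > 1e-9 else 0.0
        for i in range(n):
            coords[i].append(scale * v[i])
        if scale:
            # Deflation: remove the eigenpair just found.
            for i in range(n):
                for j in range(n):
                    b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Distances of a 3-4-5 right triangle.
d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = classical_mds(d)
print([[round(math.dist(p, q), 6) for q in points] for p in points])
```

For distances that come from points in a plane, as in this example, the recovered coordinates reproduce the input distances up to rotation and reflection.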

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.


 Layout
Proximity
- Dissimilarity: The basic input of the MDS algorithms is the difference between each pair of nodes. If your data represents the 'distance' between nodes, select this option.

- Similarity: If the input data represents the similarity between each pair of nodes, select this option. The data is automatically transformed into a dissimilarity matrix before the MDS algorithm runs.
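One simple linear transformation from similarity to dissimilarity subtracts each value from the largest off-diagonal similarity. The manual only says that the transformation is linear, so the exact formula below is an assumption and the function name is hypothetical.

```python
def similarity_to_dissimilarity(similarity):
    """Map similarity s to dissimilarity (max_s - s); the diagonal becomes 0."""
    n = len(similarity)
    largest = max(similarity[i][j]
                  for i in range(n) for j in range(n) if i != j)
    return [[0.0 if i == j else largest - similarity[i][j]
             for j in range(n)] for i in range(n)]


s = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
print(similarity_to_dissimilarity(s))
```

The most similar pair ends up with dissimilarity 0 and the least similar pair with the largest dissimilarity.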

Transparency Option: Sets the transparency of nodes or links, from 0% (no transparency) to 100% (fully transparent).

 Outputs

 Map
Classical MDS Map
- Default layout: A map is drawn by MDS algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team.


N – MDS
- Default layout: A map is drawn by MDS algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team.

Kn – MDS
- Default layout: A map is drawn by MDS algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team.


 Time Complexity
 O(n^3)

 Reference
 J. C. Gower (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325-338.

 Young, F. W., Takane, Y., & Lewyckyj, R. (1978). Three notes on ALSCAL. Psychometrika, 43, 433-435.

 Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.

 Related Topics


Visualize >> MDS >> 3D

 Menu
Visualize >> MDS >> 3D

 Description

Multidimensional Scaling Layout.


- Classical MDS (c-MDS) implements Torgerson-Gower's classical (metric) Multidimensional Scaling, also known as Principal Coordinate Analysis (PCO). A similarity matrix is first converted to a dissimilarity matrix by a linear transformation. The dissimilarity matrix is squared, double-centered and multiplied by -1/2, after which eigenvalue decomposition is used to determine the coordinate values. For the 3D layout, only the three largest positive eigenvalues and their eigenvectors are used.

- Nonmetric MDS (n-MDS) performs non-metric 3-dimensional scaling of a given ordinal proximity matrix following the ALSCAL (Alternating Least-Squares Scaling) algorithm. The initial configuration is found using Classical MDS (c-MDS). A disparity matrix is then calculated (following Kruskal's least-squares monotonic transformation) and normalized. Using this disparity matrix, the coordinates are then estimated one at a time so as to minimize SStress.

- Kruskal's Nonmetric MDS (Kn-MDS) follows the approach published by Kruskal in 1964. It finds a configuration with minimum stress using Kruskal's monotonic least-squares regression and the Newton-Raphson method.
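The stress that nonmetric MDS minimizes can be computed as sketched below. Monotonic least-squares regression is done with the pool-adjacent-violators procedure, and the stress is Kruskal's stress formula 1. The function names are hypothetical, and the Newton-Raphson search over configurations is not shown.

```python
import math


def monotone_regression(values):
    """Least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress(dissimilarities, distances):
    """Stress-1 for matching lists of pairwise dissimilarities and distances."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    ordered = [distances[k] for k in order]
    disparities = monotone_regression(ordered)
    numerator = sum((d - h) ** 2 for d, h in zip(ordered, disparities))
    denominator = sum(d * d for d in ordered)
    return math.sqrt(numerator / denominator)


print(monotone_regression([1, 3, 2, 4]))
print(kruskal_stress([1, 2, 3], [2.0, 4.0, 6.0]))
```

The stress is zero whenever the layout distances are in the same rank order as the dissimilarities, which is all that a nonmetric method requires.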

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.


 Layout

Proximity
- Dissimilarity: The basic input of the MDS algorithms is the difference between each pair of nodes. If your data represents the 'distance' between nodes, select this option.

- Similarity: If the input data represents the similarity between each pair of nodes, select this option. The data is automatically transformed into a dissimilarity matrix before the MDS algorithm runs.

3D Coloring Option
- Ambient: This option controls the average volume of light that is created by emission of light from

all of the light sources surrounding (or located inside of) the lit area.

- Diffuse: This option controls diffuse light. Diffuse light represents a directional light cast by a light

source.

- Specular: This option controls specular light. Just like Diffuse light, Specular light is a directional

type of light. It comes from one particular direction. The difference between the two is that specular

light reflects off the surface in a sharp and uniform way.

- Shininess: This option controls the brightness and size of the reflection on nodes and links.

 Outputs

 Map
Classical MDS Map
- Default layout: A map is drawn by MDS algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger.


N-MDS
- Default layout: A map is drawn by MDS algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger.

 Time Complexity
 O(n^3)


 Reference
 J. C. Gower (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325-338.

 Young, F. W., Takane, Y., & Lewyckyj, R. (1978). Three notes on ALSCAL. Psychometrika, 43, 433-435.

 Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.

 Related Topics
 Visualize >> MDS >> 2D


Visualize >> Clustered >> 2D

NetMiner includes two 2-dimensional clustered layouts. A layout is selected in the Layout control item of the Display tab. Each Clustered 2D layout is described in detail on the following pages.

- Clustered-CoLa 2D

- Clustered-Eades 2D


Visualize >> Clustered >> 2D >> Clustered-CoLa

 Menu
Visualize >> Clustered >> 2D >> Clustered-CoLa

 Description
This is an implementation of the IPSep-CoLa algorithm applied to clustered graph drawing. The graph is drawn much as in the Kamada & Kawai layout, but nodes of different clusters are separated using a given partition vector (attribute data). An attractive force is applied between nodes of the same cluster, and a repelling force is applied between nodes of different clusters.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Epsilon: The Clustered-CoLa algorithm is iterative. If, at some stage, the energy of every node falls below this user-defined value, the algorithm stops optimizing the node coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run when the Epsilon convergence criterion described above has not been reached. A larger value allows a more accurate visualization.

- Timeout Limit(sec): Sets when the algorithm is stopped. Once the time specified here has passed, the algorithm stops.


- Level Gap: Specify distances between clusters in the unit of pixels.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places the nodes at their initial coordinates and then optimizes those coordinates with the algorithm. You may therefore need to choose how the initial coordinates are determined. The available options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users

should provide attribute data containing nodes’ coordinate information.


X, Y: Select the attribute data containing the nodes' coordinate information.

Scale: Nodes are placed at the given X, Y coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are placed at (X + 'X offset', Y + 'Y offset'). Use this option to keep the nodes from converging on the upper-left area of the map.

Select Vector: Select a Main Node Attribute. The selected vector is used to cluster the nodes.
- Label Groups' name with Att: If checked, the current attribute name is used to label each group.

Fix nodes coordinates: With this option, you can run the layout algorithm while the coordinates of some user-specified nodes stay fixed. When node coordinates are fixed, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of nodes which are listed in this box will be fixed.

- Add Selection: When you select some nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.


- Remove: When you select some nodes in the 'Node list view' and click 'Remove' button, the selected

nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Sets the transparency of nodes or links, from 0% (no transparency) to 100% (fully transparent).

 Outputs

 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered-CoLa algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: Nodes with a higher position are drawn larger. Node shape represents the sex of the node, and node color represents its team. (Selected vector = Team)


 Time Complexity
 O(n^3) approximately

 Reference
 T. Dwyer, Y. Koren, K. Marriott (2006). IPSep-CoLa: An Incremental Procedure for Separation
Constraint Layout of Graphs

 Related Topics
 Visualize >> Spring >> 2D >> Kamada & Kawai


Visualize >> Clustered >> 2D >> Clustered Eades

 Menu
Visualize >> Clustered >> 2D >> Clustered Eades

 Description
This is an extension of the original Eades' Spring Embedding algorithm that integrates clustering. Nodes in the same cluster attract one another more strongly, and nodes in different clusters repel one another more strongly. As a result, the nodes in each cluster gather closely together, and different clusters are placed far apart.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is larger than 1, the average distance gets longer.

- Repulsiveness Coefficient: This option controls the repelling forces between two non-adjacent nodes. The default value is 1. If the value is larger than 1, the repelling forces increase, so non-adjacent nodes and disconnected nodes move farther away from each other.

- Internal Cluster Factor: The weight multiplied by the attractive force of springs between nodes in the same cluster. A larger value pulls the nodes within a cluster closer together.

- External Cluster Factor: The weight multiplied by the attractive force of springs between nodes in different clusters. A smaller value places nodes in different clusters farther apart.

- Between Cluster Factor: The Clustered Eades algorithm creates a virtual node for each cluster. This factor is the weight multiplied by the attractive force of the spring attached between each pair of clusters (in fact, between each pair of virtual nodes). A smaller value places the clusters farther apart.

- Attenuation Factor: The Clustered Eades algorithm is iterative, and at each stage it changes the positions of the nodes. With a large attenuation factor, the rate of change gets smaller as time passes.

- Epsilon: The Clustered Eades algorithm is iterative. If, at some stage, the energy of every node falls below this user-defined value, the algorithm stops optimizing the node coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iterations: The maximum number of iterations to run when the Epsilon convergence criterion described above has not been reached. A larger value allows a more accurate visualization.
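The effect of the Internal and External Cluster Factors can be illustrated with a weighted spring force. The sketch assumes the logarithmic spring of Eades' original paper and uses made-up factor values; the function name is hypothetical, and none of this is taken from NetMiner's code.

```python
import math


def clustered_attraction(distance, same_cluster, natural_length=1.0,
                         internal_factor=2.0, external_factor=0.5):
    """Attractive force between two linked nodes, weighted by cluster membership.

    The factor values are illustrative only, not NetMiner defaults.
    """
    weight = internal_factor if same_cluster else external_factor
    return weight * math.log(distance / natural_length)


# Same spring length, different cluster membership.
print(clustered_attraction(math.e, same_cluster=True))
print(clustered_attraction(math.e, same_cluster=False))
```

With these values a link inside a cluster pulls four times as hard as a link between clusters, so clusters contract internally while drifting apart from one another.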

Initial Coordinate
Before running the core visualization algorithm, the visualizing module places the nodes at their initial coordinates and then optimizes those coordinates with the algorithm. You may therefore need to choose how the initial coordinates are determined. The available options are as follows:


- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y: Select the attribute data containing the nodes' coordinate information.

Scale: Nodes are placed at the given X, Y coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are placed at (X + 'X offset', Y + 'Y offset'). Use this option to keep the nodes from converging on the upper-left area of the map.

Select Vector: Select a Main Node Attribute. The selected vector is used to cluster the nodes.
- Label Groups' name with Att: If checked, the current attribute name is used to label each group.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is discarded and a new map size is computed to fit the screen.

Fix nodes coordinates: With this option, you can run the layout algorithm while the coordinates of some user-specified nodes stay fixed. When node coordinates are fixed, the results of the Eades algorithm are usually better than those of the other spring layout algorithms.

- Node list view: The coordinates of nodes which are listed in this box will be fixed.

- Add Selection: When you select nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click 'Remove' button, the selected

nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered Eades algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node. (Selected vector = Team)

 Time Complexity
 O(k * n^2) where k is the number of iterations

 Reference
 P. Eades (1984). A Heuristic for Graph Drawing, Cong. Numer., 42, 149-160.

 Related Topics
 Visualize >> Spring >> 2D >> Eades

Visualize >> Clustered >> 3D >> Clustered Eades

 Menu
Visualize >> Clustered >> 3D >> Clustered Eades

 Description
This is an extension of Eades' original spring embedding algorithm that integrates clustering in 3-dimensional space. Nodes in the same cluster attract one another more strongly, and nodes in different clusters repel one another more strongly. As a result, the nodes of each cluster gather closely together, and different clusters are placed far apart.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout
Algorithm Options
- Natural Length Coefficient: This option controls the basic distance between connected nodes. The default value is 1. (The default length is proportional to the square root of the canvas size, using an experimentally determined proportionality factor.) If this value is smaller than 1, the average distance between node pairs gets shorter; if it is bigger than 1, the average distance gets longer.

- Repulsiveness Coefficient: This option controls the repelling forces between two non-adjacent nodes. The default value is 1. If the value is bigger than 1, the repelling forces increase, so non-adjacent nodes and disconnected nodes move farther away from each other.

- Internal Cluster Factor: The weight multiplied to the attractive force of springs attached to nodes in an internal cluster. With a bigger value, pairs of nodes in an internal cluster move closer together.

- External Cluster Factor: The weight multiplied to the attractive force of springs attached to nodes in an external cluster. With a smaller value, pairs of nodes in external clusters move farther apart.

- Between Cluster Factor: The Clustered Eades algorithm creates a virtual node for each cluster. This factor is the weight multiplied to the attractive force of the spring between each pair of clusters (in fact, the spring between each pair of virtual nodes). With a smaller value, clusters move farther apart.

- Attenuation Factor: The Clustered Eades algorithm is iterative and changes the positions of the nodes at each stage. With a large attenuation factor, the rate of change gets smaller as time passes.

- Epsilon: The Clustered Eades algorithm is iterative. When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run if the Epsilon convergence criterion described above has not yet been reached. A larger value allows a more accurate visualization at the cost of longer computation time.
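How the algorithm options interact can be sketched in Python as a toy 3D spring embedder. This is not NetMiner's implementation: the function name, the default values, the logarithmic spring and inverse-square repulsion (borrowed from Eades' 1984 heuristic), and the use of the largest net force as a stand-in for the per-node energy test are all assumptions. The Between Cluster Factor and its virtual cluster nodes are left out to keep the sketch short.

```python
import math
import random


def clustered_spring_layout(nodes, edges, cluster, natural_length=1.0,
                            repulsiveness=1.0, internal_factor=2.0,
                            external_factor=0.5, attenuation=0.99,
                            epsilon=1e-3, max_iteration=500, seed=0):
    """Toy clustered spring embedder in 3D; returns (positions, iterations)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-5.0, 5.0) for _ in range(3)] for v in nodes}
    linked = {frozenset(e) for e in edges}
    step = 0.1          # hypothetical initial step size
    iteration = 0
    for iteration in range(1, max_iteration + 1):
        move = {v: [0.0, 0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                delta = [b - a for a, b in zip(pos[u], pos[v])]
                dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
                if frozenset((u, v)) in linked:
                    # Logarithmic spring, weighted by the cluster factor.
                    same = cluster[u] == cluster[v]
                    factor = internal_factor if same else external_factor
                    force = factor * math.log(dist / natural_length)
                else:
                    # Non-adjacent pairs repel each other.
                    force = -repulsiveness / (dist * dist)
                for k in range(3):
                    move[u][k] += force * delta[k] / dist
                    move[v][k] -= force * delta[k] / dist
        largest = 0.0
        for v in nodes:
            for k in range(3):
                pos[v][k] += step * move[v][k]
            largest = max(largest, math.sqrt(sum(m * m for m in move[v])))
        if largest < epsilon:   # stand-in for the per-node energy test
            break
        step *= attenuation     # the rate of change shrinks every stage
    return pos, iteration


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d"), ("b", "c")]
team = {"a": 1, "b": 1, "c": 2, "d": 2}
positions, used = clustered_spring_layout(nodes, edges, team)
print(used, positions["a"])
```

Each stage visits every pair of nodes once, which matches the O(k * n^2) time complexity given for this module.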

Select Vector: Select a Main Node Attribute. The selected vector is used to cluster nodes.

Initial Coordinate
Before running the core visualization algorithm, the visualizing module arranges nodes at their initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of the nodes are determined. The possible options are as follows:

- Random: Arrange nodes on the map randomly.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y, Z: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the positions given by the X, Y, Z coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are arranged at (X+’X offset’, Y+’Y offset’). Use this option to keep the nodes from crowding into the upper-left area of the map.

Select Vector: Select a Main Node Attribute. The selected vector is used to cluster the network.

3D Coloring Option
- Ambient: This option controls the average volume of light that is created by emission of light from

all of the light sources surrounding (or located inside of) the lit area.

- Diffuse: This option controls diffuse light. Diffuse light represents a directional light cast by a light

source.

- Specular: This option controls specular light. Just like Diffuse light, Specular light is a directional

type of light. It comes from one particular direction. The difference between the two is that specular

light reflects off the surface in a sharp and uniform way.

- Shininess: This option controls the brightness and size of the reflection on nodes and links.
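The four coloring options correspond to the terms of a classic Phong-style lighting model. The Python sketch below shows how such terms are commonly combined; whether NetMiner's renderer uses exactly this formula is an assumption, and the function name and default values are hypothetical.

```python
import math


def phong_intensity(normal, to_light, to_viewer, ambient=0.2, diffuse=0.6,
                    specular=0.2, shininess=16.0):
    """Light intensity at a surface point under a Phong-style model."""
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return [c / length for c in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n, l, v = unit(normal), unit(to_light), unit(to_viewer)
    n_dot_l = max(dot(n, l), 0.0)          # diffuse term: depends on the light direction
    if n_dot_l > 0.0:
        # Mirror the light direction about the normal for the highlight.
        r = [2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l)]
        highlight = max(dot(r, v), 0.0) ** shininess
    else:
        highlight = 0.0                     # the surface faces away from the light
    return ambient + diffuse * n_dot_l + specular * highlight


# Light and viewer straight above the surface: all three terms contribute.
print(phong_intensity([0, 0, 1], [0, 0, 1], [0, 0, 1]))
```

In this model a larger shininess exponent gives a smaller, sharper highlight, while the ambient term lights every surface equally regardless of direction.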

 Outputs

 Maps
Clustered Map
- Default layout: A map is drawn by Clustered >> Clustered Eades algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. (Selected vector=Team)


 Time Complexity
 O(k * n^2) where k is the number of iterations.

 Reference
 P. Eades (1984). A Heuristic for Graph Drawing, Cong. Numer., 42, 149-160.

 Related Topics
 Visualize >> Spring >> 3D >> Eades

 Visualize >> Clustered >> 2D >> Clustered Eades


Visualize >> Layered >> 2D >> Dig-CoLa

 Menu
Visualize >> Layered >> 2D >> Dig-CoLa

 Description
This is an implementation of the Dig-CoLa algorithm for layered graph drawing. The algorithm lays out nodes in Kamada-Kawai style (and thus visualizes connectivity), but the nodes are layered to show the overall "flow" of the graph.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Algorithm Options
- Epsilon: The Dig-CoLa algorithm is iterative. When, at some stage, the energy of every node becomes smaller than this user-defined value, the algorithm stops optimizing the nodes’ coordinates. (The energy is proportional to the gap between the ideal distance and the actual distance between nodes.) A smaller epsilon therefore gives a more accurate visualization at the expense of longer computation time.

- Max Iteration: The maximum number of iterations to run if the Epsilon convergence criterion described above has not yet been reached. A larger value allows a more accurate visualization at the cost of longer computation time.

- Timeout Limit(sec): Sets when to stop the algorithm. The algorithm stops once the time specified here has passed.

- Level Gap: Specify distances between layers in the unit of pixels.

- Level Closeness (alpha): It is a parameter used to calculate Hierarchy.

The smaller this value is, the more levels are created and employed.

(For more information, please read the reference.)

- Minimum Level Depth (beta): It sets the minimum level depth for

Hierarchy. "Level closeness" is first used to compute the closeness, but

it cannot be smaller than beta value. (For more information, please

read the reference.)

Initial Coordinate
Before running the core visualization algorithm, the visualizing module arranges nodes at their initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of the nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the positions given by the X, Y coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are arranged at (X+’X offset’, Y+’Y offset’). Use this option to keep the nodes from crowding into the upper-left area of the map.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: With this option, you can run the layout algorithm while the coordinates of user-specified nodes stay fixed. When node coordinates are fixed, the Eades algorithm usually gives better results than the other spring layout algorithms.

- Node list view: The coordinates of nodes which are listed in this box will be fixed.

- Add Selection: When you select nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click 'Remove' button, the selected

nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Layered Map
- Default layout: A map is drawn by Layered >> Dig-CoLa algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the sex of that node. And the color shows the team of each node.

 Time Complexity
 O(n^3) approximately

 Reference
 T. Dwyer, Y. Koren, K. Marriott (2007). Constrained Graph Layout by Stress Majorization and
Gradient Projection

 Related Topics


Visualize >> Circular >> 2D >> Circumference

 Menu
Visualize >> Circular >> 2D >> Circumference

 Description
This layout algorithm simply places each node on a circumference. The order of the nodes can be controlled by selecting a vector as the key variable.

If you select a partition vector, nodes in the same partition are plotted next to one another.
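As a minimal Python sketch of this idea, the function below places nodes at equal angles on a circle after ordering them by a partition vector. The function name, the unit radius, and the starting angle of 0 are assumptions made for illustration.

```python
import math


def circumference_layout(nodes, partition=None, radius=1.0):
    """Place nodes evenly on a circle, grouped by partition value."""
    order = list(nodes)
    if partition is not None:
        # A stable sort keeps the original order inside each partition.
        order.sort(key=lambda v: partition[v])
    n = len(order)
    return {v: (radius * math.cos(2 * math.pi * i / n),
                radius * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(order)}


team = {"a": "x", "b": "y", "c": "x", "d": "y"}
layout = circumference_layout(["a", "b", "c", "d"], partition=team)
print(layout)
```

Because each node is visited once after sorting, the placement itself is linear in the number of nodes.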

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Select Partition Vector: Select a Main Node Attribute. It is used to divide partitions.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is not kept and a new map size is computed to fit the screen.

Fix nodes coordinates: With this option, you can run the layout algorithm while the coordinates of user-specified nodes stay fixed. When node coordinates are fixed, the Eades algorithm usually gives better results than the other spring layout algorithms.

- Node list view: The coordinates of nodes which are listed in this box will be fixed.

- Add Selection: When you select nodes on the map and click the 'Add Selection' button, the selected nodes are shown in the 'Node list view' box.

- Remove: When you select some nodes in the 'Node list view' and click 'Remove' button, the selected

nodes are removed from the 'Node list view' box.

- Select Fixed: The nodes listed in the 'Node list view' box are selected on the map.

Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Circular Map
- Default layout: A map is drawn by Circular >> Circumference algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.


- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.

 Time Complexity
 O(n)

 Reference

 Related Topics


Visualize >> Circular >> 2D >> Concentric

 Menu
Visualize >> Circular >> 2D >> Concentric

 Description
This algorithm accepts any of various centrality vectors as input data and places nodes in a concentric form according to their centrality scores. If the selected Centrality Type is ‘Central’, the most central node(s) are located at the center of the map. The lower the centrality score (scale) of a node, the farther it is from the center. Nodes with a similar centrality scale have the same radius and lie on the same concentric circle.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout

Select Vector: Select a Main Node Attribute vector. Selected vector is used as a Centrality vector.

Centrality Type
- Central: Nodes with the highest values are placed at the center.

- Peripheral: Nodes with the lowest values are placed at the center.

# of Scale: The number of concentric circles.
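The mapping from centrality scores to concentric circles can be sketched in Python as follows. The equal-width score intervals, the function name, and the ring numbering (0 for the innermost circle) are assumptions; the text above does not specify how NetMiner derives its thresholds.

```python
def concentric_rings(scores, n_scale=3, centrality_type="Central"):
    """Map each node to a ring index; ring 0 is the innermost circle."""
    lo, hi = min(scores.values()), max(scores.values())
    # Assumption: the score range is cut into n_scale equal-width intervals.
    width = (hi - lo) / n_scale or 1.0
    rings = {}
    for node, score in scores.items():
        level = min(int((score - lo) / width), n_scale - 1)  # 0 = lowest scores
        if centrality_type == "Central":
            rings[node] = n_scale - 1 - level   # highest scores go to the center
        else:
            rings[node] = level                 # 'Peripheral': lowest scores go to the center
    return rings


degree = {"a": 10, "b": 5, "c": 0}
print(concentric_rings(degree, n_scale=3))
print(concentric_rings(degree, n_scale=3, centrality_type="Peripheral"))
```

Nodes that fall into the same interval share a ring, which corresponds to lying on the same concentric circle.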

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is not kept and a new map size is computed to fit the screen.

Concentric Map Style


- Score: If checked, the centrality score is shown beside each node on the concentric or radial map.

- Grid: If checked, the concentric circles are shown on the concentric or radial map.

- Threshold: If checked, the threshold value of each grid circle is shown.


Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Concentric Map
- Default layout: A map is drawn by Circular >> Concentric algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(n)

 Reference

 Related Topics


Visualize >> Circular >> 2D >> Radial

 Menu
Visualize >> Circular >> 2D >> Radial

 Description
This layout algorithm is an extension of Kamada and Kawai's [1989] spring embedding algorithm. The Radial algorithm is basically similar to the Concentric algorithm. Given a centrality vector, it places nodes with high centrality near the center of the plane and nodes with lower centrality scores far from the center. If there is no centrality vector, the eccentricity of each node can be used as the centrality vector; in this case, nodes with smaller eccentricity are placed closer to the center of the plane. The algorithm then computes the geodesic distance between all pairs of nodes and tries to make the Euclidean distances between nodes proportional to the geodesic distances.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph
will be drawn in consideration of all the networks selected.

 Layout

Select Centrality Vector


- Eccentricity: The eccentricity of a node is the greatest distance between that node and any other node. If you select this option, the eccentricity values are used as the centrality vector.

- Select other vector: The selected Main Node Attribute is used as the centrality vector.
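Eccentricity as defined above can be computed with one breadth-first search per node. The Python sketch below assumes an unweighted, undirected network given as an adjacency dictionary, and it ignores unreachable nodes; both are simplifying assumptions rather than documented NetMiner behavior.

```python
from collections import deque


def eccentricity(adjacency):
    """Greatest geodesic distance from each node to any node it can reach."""
    result = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:          # first visit gives the shortest distance
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[source] = max(dist.values())
    return result


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(eccentricity(path))
```

In the path a-b-c, the middle node has the smallest eccentricity, so under this option it would be the node placed closest to the center.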

# of Scale: The number of concentric circles.

Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is not kept and a new map size is computed to fit the screen.

Concentric Map Style


- Score: If checked, the centrality score is shown beside each node on the concentric or radial map.

- Grid: If checked, the concentric circles are shown on the concentric or radial map.

- Threshold: If checked, the threshold value of each grid circle is shown.


Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Radial Map
- Default layout: A map is drawn by Circular >> Radial algorithm.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(n^3)

 Reference
 T. Kamada and S. Kawai (1989). An Algorithm for Drawing General Undirected Graphs,
Inform. Process. Lett., 31, 7-15.

 Related Topics
 Visualize >> Spring >> 2D >> Kamada & Kawai

 Visualize >> Circular >> 2D >> Concentric


Visualize >> Simple >> 2D >> Fixed

 Menu
Visualize >> Simple >> 2D >> Fixed

 Description
Nodes are placed on the map at user-defined coordinates. (The user should therefore provide attribute data for all the coordinates.)

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will be
drawn in consideration of all the networks selected.

 Layout

Initial Coordinate
Before running the core visualization algorithm, the visualizing module arranges nodes at their initial coordinates and then optimizes the coordinates with the algorithm. You may therefore need to choose how the initial coordinates of the nodes are determined. The possible options are as follows:

- Circular: Arrange nodes on a circumference at regular intervals.

- Random: Arrange nodes on the map randomly.

- Current Position: Arrange nodes on their current position.

- User Defined: Arrange nodes at the user-specified coordinates. Users should provide attribute data

containing nodes’ coordinate information.


X, Y: Select attribute data containing nodes’ coordinate information.

Scale: Nodes are placed at the positions given by the X, Y coordinates multiplied by the scale. Use this option to leave enough space between nodes.

X, Y offset: Nodes are arranged at (X+’X offset’, Y+’Y offset’). Use this option to keep the nodes from crowding into the upper-left area of the map.

Arrange Components
The ‘Arrange Components’ option controls how components are arranged, so it takes effect only when the network has multiple components. The available options are as follows.

- Alternate Bisection: Divides the map into two equal parts and places the biggest component in one part. It then divides the remaining part into two and places the second biggest component in one of them. This process is repeated for all components.

- Polyomino Packing: Arranges components close to each other.

- Tiling: Divides the map into rectangles, each sized to fit one component.

- Vertical: Arranges components vertically in order of size.
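The Alternate Bisection option can be sketched in Python as a repeated halving of the remaining map area. The alternation between vertical and horizontal cuts (suggested only by the option's name), the rule that the last component receives whatever area is left, and the function name are all assumptions made for illustration.

```python
def alternate_bisection(component_sizes, width, height):
    """Return {component index: (x, y, w, h)} with one rectangle per component."""
    order = sorted(range(len(component_sizes)),
                   key=lambda i: component_sizes[i], reverse=True)
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    rects = {}
    vertical_cut = True
    for rank, idx in enumerate(order):
        if rank == len(order) - 1:
            rects[idx] = (x, y, w, h)       # assumption: last one takes the rest
            break
        if vertical_cut:
            rects[idx] = (x, y, w / 2, h)   # left half for this component
            x, w = x + w / 2, w / 2         # keep the right half for the others
        else:
            rects[idx] = (x, y, w, h / 2)   # top half for this component
            y, h = y + h / 2, h / 2         # keep the bottom half for the others
        vertical_cut = not vertical_cut
    return rects


# Three components with 3, 10 and 5 nodes on an 800 x 600 map.
print(alternate_bisection([3, 10, 5], 800, 600))
```

The biggest component receives half of the map, the second biggest a quarter, and so on, so larger components get more room.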

Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs

 Maps
Fixed Map
- Default layout: A map is drawn by Simple >> Fixed layout.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.


 Time Complexity
 O(n)

 Reference

 Related Topics


Visualize >> Simple >> 2D >> Random

 Menu
Visualize >> Simple >> 2D >> Random

 Description
A random coordinate on the map is assigned to each node.

 User Options

 Network
1-mode Network: Select one or more 1-mode Networks. Graph will
be drawn in consideration of all the networks selected.

 Layout
Preserve Map Size: If you select this option before running the layout algorithm, the current map size remains unchanged. Otherwise, the current map size is not kept and a new map size is computed to fit the screen.

Transparency Option: Sets the transparency of nodes or links from 0% to 100%, where 0% is no transparency and 100% is complete transparency.

 Outputs
 Maps
Random Map
- Default layout: A map is drawn by Simple >> Random layout.

- Default style: Default style is set by Common option in the Preference >> Node tab.

- Applied style: The higher position is represented as the bigger node. Shape of node represents the

sex of that node. And the color shows the team of each node.

 Time Complexity
 O(n)

 Reference

 Related Topics


Visualize >> Two Mode >> Spring

 Menu
Visualize >> Two Mode >> Spring

 Description
Spring-embedding layouts are extended to 2-mode Networks. The bipartite graph of the selected Main NodeSet and Sub NodeSet is used as input data.

Spring-embedding modules attach virtual springs between each pair of nodes. Attractive forces are given to pairs that should be close together, and repelling forces to pairs that should be far apart. When nodes are embedded with spring modules, the direction of links is not considered, and the coordinates of each node do not carry a theoretically strict meaning.

NetMiner includes six 2-dimensional spring-embedding layouts, each of which can be selected in the Layout Control Item in the Display Tab. For a detailed description of each Spring 2D layout, see ‘Visualize >> Spring >> 2D’.

 User Options

 Network
2-mode Network: Select a 2-mode Network. Only one 2-mode
Network can be selected.

- Nodeset: First, select the Sub Nodeset that contains the 2-mode Network of interest.

- Link Merge: Determine how multiple links are merged into a single link.

 Outputs

 Maps
Spring Map
- Default layout: A map is drawn by Spring >> Kamada & Kawai algorithm.

- Default style: Default style is set by Two-mode option in the Preference >> Node tab.

 Related Topics
 Visualize >> Spring >> 2D

 Transform >> Mode >> 2-mode Network

Visualize >> Link Layout >> Edge Bundling >> Divided Edge Bundling

 Menu
Visualize >> Link Layout >> Edge Bundling >> Divided Edge Bundling

 Description

Edge bundling is a technique that improves the visibility and succinctness of graph visualizations. Before the advent of divided edge bundling, there were many attempts to improve the effectiveness of graph visualizations, such as force-directed edge bundling. While these techniques have a few salient advantages, they fail to consider direction and weight when bundling edges or links. Divided edge bundling was developed to address these shortcomings: it binds similar edges in a directed graph while clearly differentiating, or ‘dividing’, two groups of edges according to their directions. The algorithm also considers the weights of edges. It therefore offers useful information without sacrificing the benefits of a lucid and intuitive layout.

 User Options

 Cycles: the algorithm performs the specified number of cycles (5.0 by default), thereby creating 2^cycles - 1 control points to render an edge. Note that each edge is treated as a set of control points.

 Iterations, Iteration rate, Simulation time step, time rate for each cycle: in the i-th cycle, the algorithm performs iterations × (iteration rate)^(i-1) steps of n-body simulation. Each step is a discrete physical simulation that assumes a constant force over a time interval of timeStep × (time rate)^(i-1).

 Edge stretchiness, edge attraction: control the ratio of spring force to Coulombic force, which affects the final layout of the control points. Note that the spring force straightens edges and the Coulombic force bends them.

 Attractive force range: because the Coulombic force is proportional to the inverted Lorentzian function, it requires an attractive force range constant.

 Directional lane width: determines the distance between the two groups of edges that run in different directions, so that the groups are separated.

 Friction: after the n-body simulation is performed, the speed is multiplied by the friction value.
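The per-cycle quantities described by the Cycles, Iterations, Iteration rate, Simulation time step, and time rate options can be tabulated with a short Python sketch. The numeric defaults other than the number of cycles are hypothetical, and the assumption that cycle i works with 2^i - 1 control points is an extrapolation from the final count of 2^cycles - 1 stated above.

```python
def bundling_schedule(cycles=5, iterations=50, iteration_rate=0.5,
                      time_step=0.1, time_rate=0.5):
    """List the control points, iterations and time step used in each cycle."""
    schedule = []
    for i in range(1, cycles + 1):
        schedule.append({
            "cycle": i,
            # Assumption: control points grow as 2^i - 1, ending at 2^cycles - 1.
            "control_points": 2 ** i - 1,
            "iterations": round(iterations * iteration_rate ** (i - 1)),
            "time_step": time_step * time_rate ** (i - 1),
        })
    return schedule


for row in bundling_schedule():
    print(row)
```

With rates below 1, later cycles run fewer and finer simulation steps on a larger number of control points.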


 Outputs

The node layout uses the Spring >> 2D >> Fruchterman & Reingold algorithm. All option values are set to their defaults.

 Reference

Selassie D et al., (2011) ‘Divided Edge Bundling for Directional Network Data’,
IEEE Transactions on visualization and computer graphics, vol. 17, no. 12, 2354-
2363.


VI. Chart
1. Pie Chart

2. Matrix Diagram

3. Area Bar

4. Box Plot

5. Scatter Plot

6. Contour Plot

7. Surface Plot

8. Network Contour Plot

9. Network Surface Plot


Chart >> Pie Chart

 Menu
Chart >> Pie Chart

 Description
A pie chart is a circular chart divided into sectors that illustrate relative magnitudes, frequencies, or percentages. In a pie chart, the arc length of each sector is proportional to the quantity it represents.
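The proportionality between sector size and quantity can be shown with a small Python sketch that turns the value frequencies of an attribute vector into sector angles. The function name and the sample data are hypothetical.

```python
from collections import Counter


def pie_sectors(vector):
    """Return the sector angle in degrees for each distinct value."""
    counts = Counter(vector)
    total = sum(counts.values())
    return {value: 360.0 * count / total for value, count in counts.items()}


team = ["A", "A", "B", "C"]
print(pie_sectors(team))
```

Each value's angle is its share of the 360 degrees, so the angles always add up to a full circle.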

 User Options

 Input
Select Vector: Select a Main Node Attribute data to chart.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Pie Chart’ module creates a pie chart.

External File format is not available in Chart modules. Instead, you

can save your chart to an image file in Internal Tab format.


 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Charts
Pie Charts

 Time Complexity
 O(n)

 Reference

 Related Topics
Statistics >> Frequency >> Vector


Chart >> Matrix Diagram

 Menu
Chart >> Matrix Diagram

 Description
Matrix Diagram shows the relations among nodes in matrix format. A dark cell (i, j) means that a link from node i to node j exists. The larger the weight of a link, the darker its cell.

You may partition the matrix using a permutation vector. Nodes in the same partition are arranged next to one another in the Matrix Diagram.
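A minimal Python sketch of the data behind such a diagram is given below. It reorders the nodes by a permutation vector and converts link weights to shades between 0 and 1. The linear scaling by the maximum weight and the function name are assumptions, not documented NetMiner behavior.

```python
def matrix_diagram(nodes, links, permutation=None):
    """Return (node order, shade matrix); shade[i][j] lies in [0, 1]."""
    if permutation:
        # A stable sort keeps nodes of the same partition next to each other.
        order = sorted(nodes, key=lambda v: permutation[v])
    else:
        order = list(nodes)
    index = {v: i for i, v in enumerate(order)}
    max_weight = max((w for _, _, w in links), default=1.0) or 1.0
    shade = [[0.0] * len(order) for _ in order]
    for source, target, weight in links:
        # Row = source node i, column = target node j.
        shade[index[source]][index[target]] = weight / max_weight
    return order, shade


links = [("a", "b", 2), ("c", "a", 4)]
order, shade = matrix_diagram(["a", "b", "c"], links,
                              permutation={"a": 1, "b": 2, "c": 1})
print(order)
print(shade)
```

Filling the matrix touches each link once, which agrees with the O(m) time complexity listed for this module.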

 User Options

 Input

1-mode Network: Select a 1-mode Network. You can choose just one
1-mode Network.

Link Merge: When the selected data contains multiple links (two or more links with the same source node and target node), decide how to merge them into a single link.

Permutation Vector: Select a Main Node Attribute data. Partitions are created by the vector values.

 Main process
Chart Options
- Show Node Label: Show or hide node label.

- Show Borders: If selected, partitions are presented in the matrix diagram.

- Show Border Names: If selected, the names of the vector values are presented beside the partitions in the matrix diagram.

- Show Grid: Show or hide grid.

- Colored Scale: If selected, the shading of each cell turns to blue.

- Color Bar: If selected, a color bar is presented at the right side of the chart.

- Show Values: If selected, the value of each cell is presented.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Matrix Diagram’ module creates a Matrix Diagram.

External File format is not available in Chart modules. Instead, you can save your chart to an image

file in Internal Tab format.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Charts
Matrix Diagram

The font style of node labels can be changed in the Font tab of the Edit >> Preference menu, and the two colors for the maximum and minimum values can be changed in the Other tab of the same menu.

 Time Complexity
 O(m)

 Reference

 Related Topics


Chart >> Area Bar

 Menu
Chart >> Area Bar

 Description
Area bar is one of the most useful representations of 2-mode Network data. Each main node or sub node is represented as a vertical bar (100% stacked bar) divided into segments of different colors. In ‘row major mode’, one bar represents one main node. The colors of a bar show the proportions of the sub nodes that the main node is connected with, and the length of each colored segment reflects the weight of the corresponding link.

In ‘column major mode’, the roles of main node and sub node are exchanged.
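The segment lengths of such 100% stacked bars can be computed as in the Python sketch below, which takes 2-mode links as (main node, sub node, weight) triples. The function name, the input format, and the summing of repeated links are assumptions made for illustration.

```python
def area_bar(links, row_major=True):
    """Return {bar: {segment: share}}; the shares of each bar sum to 1."""
    bars = {}
    for main, sub, weight in links:
        # Row major: one bar per main node. Column major: one bar per sub node.
        bar, segment = (main, sub) if row_major else (sub, main)
        segments = bars.setdefault(bar, {})
        segments[segment] = segments.get(segment, 0.0) + weight
    return {bar: {seg: w / sum(segs.values()) for seg, w in segs.items()}
            for bar, segs in bars.items()}


links = [("m1", "s1", 1), ("m1", "s2", 3), ("m2", "s1", 2)]
print(area_bar(links))
print(area_bar(links, row_major=False))
```

Switching `row_major` off swaps the roles of the two node sets, mirroring the ‘column major mode’ described above.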

 User Options

 Input
2-mode Network: Select a 2-mode Network. Only one 2-mode
Network can be selected.

- Nodeset: First, select the Sub Nodeset that contains the 2-mode Network of interest.

- Link Merge: Determine how multiple links are merged into a single link.

 Main process
Select Direction
- Row Major: Each main node would be represented as one vertical

bar.

- Column Major: Each sub node would be represented as one vertical bar.


 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Area Bar’ module creates an Area Bar.

External File format is not available in Chart modules. Instead, you can

save your chart to an image file in Internal Tab format.

 Outputs
Each output is listed as an Inner Tab at the bottom of Output Window.

 Charts
Area Bar
Org_Net_Tiny1 data is used.

 Time Complexity
 Reference
 Related Topics


Chart >> Box Plot

 Menu
Chart >> Box Plot

 Description
The Box Plot module displays the distribution of a dependent variable conditioned on an independent variable. Each box summarizes the mean, median, first quartile (25% line), third quartile (75% line), and the min and max whiskers. The mean is presented as a black dot in the center of the box, and the median as a black horizontal line in the box. The colored box covers the range from the first quartile to the third quartile. The min whisker is presented as a black line under the box and the max whisker as a black line above the box.

 User Options

 Input

Select Dependent Variable (Interval): Select a Main Node Attribute. It should be an interval variable.

Select Independent Variable (Categorical): Select another Main Node Attribute. It should be a categorical variable.

 Output
You can select which outputs are reported and the format in which they are displayed.

The ‘Box Plot’ module creates a Main Report and a Box Plot.


The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.

 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Reports
Main Report
- Box Plot Data: The Main Report presents the distribution of the dependent variable for each value of the independent variable. The mean, standard deviation, minimum, first quartile, median, third quartile, and maximum are reported.
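As a rough illustration of how such a table can be derived, the following Python sketch groups an interval attribute by a categorical attribute and computes the same seven statistics. The function name box_plot_data, the sample attributes, the quartile convention (method="inclusive"), and the use of the sample standard deviation are all assumptions; the manual does not state which conventions NetMiner applies.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def box_plot_data(values, categories):
    """Summarize an interval attribute for each value of a categorical one."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)

    report = {}
    for category, data in groups.items():
        if len(data) > 1:
            # Assumption: inclusive quartiles; other conventions
            # give slightly different Q1 and Q3.
            q1, median, q3 = quantiles(data, n=4, method="inclusive")
            spread = stdev(data)  # assumption: sample standard deviation
        else:
            q1 = median = q3 = data[0]
            spread = 0.0
        report[category] = {
            "mean": mean(data),
            "std": spread,
            "min": min(data),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(data),
        }
    return report


# Hypothetical node attributes: age (interval) and department (categorical).
ages = [20, 30, 40, 50, 60, 25, 35]
departments = ["A", "A", "A", "A", "A", "B", "B"]
report = box_plot_data(ages, departments)
for department, stats in report.items():
    print(department, stats)
```

A single pass groups the values, so the grouping step is linear in the number of nodes.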

 Charts
Box Plot


 Time Complexity
 O(n)

 Reference
[Link]

 Related Topics
Statistics >> ANOVA >> Vector


Chart >> Scatter Plot

 Menu
Chart >> Scatter Plot

 Description
A scatter plot is a type of display using Cartesian coordinates to display values for the two selected

vectors. The data is displayed as a collection of points, each having the value of one variable

determining the position on the horizontal axis and the value of the other variable determining the

position on the vertical axis.

 User Options

 Input

Select X Axis vector: Select a Main Node Attribute. It is used for the X axis.

Select Y Axis vector: Select a Main Node Attribute. It is used for the Y axis.

 Output
You can select which outputs are reported and the format in which they are displayed.

The ‘Scatter Plot’ module creates a Scatter Plot.

The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.


 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Charts
Scatter Plot

 Time Complexity
 O(n)

 Reference
 Related Topics


Chart >> Contour Plot

 Menu
Chart >> Contour Plot

 Description
A contour plot displays 3D data in a plane, using color to indicate magnitude. Because only a few nodes supply data points, a regression-like analysis is used to fill in the contour: the color of each point is determined by a linear, quadratic, or weighted sum fit.

 User Options

 Input
Select Variable
- X: Select a Main Node Attribute. This attribute is used as the X axis.

- Y: Select a Main Node Attribute. This attribute is used as the Y axis.

- Z: Select a Main Node Attribute. This attribute is used as the Z axis, and it is the dependent variable of the regression-like analysis.

 Main process

Select Fitting Method
- Linear: The value at each (x,y) point in the 2D plane is given by the equation obtained from a linear regression of the nodes' attribute values on their x,y coordinates.

- Quadratic: The value at each (x,y) point in the 2D plane is given by the equation obtained from a quadratic regression of the nodes' attribute values on their x,y coordinates.

- Weighted Sum: The value at each (x,y) point in the 2D plane is the weighted linear sum of every node's attribute value. The weight of a node is 1/[the Euclidean distance between (x,y) and the node].
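The Weighted Sum method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are normalized by their sum (so the estimate stays within the range of the observed values), and a point that coincides with a node simply takes that node's value. Neither detail is specified in this manual, and the function name weighted_sum_value and the sample nodes are hypothetical.

```python
import math


def weighted_sum_value(x, y, nodes):
    """Estimate the value at (x, y) from (node_x, node_y, node_value) triples."""
    if not nodes:
        raise ValueError("at least one node is required")
    numerator = 0.0
    denominator = 0.0
    for node_x, node_y, value in nodes:
        distance = math.hypot(x - node_x, y - node_y)
        if distance == 0.0:
            # Assumption: exactly on a node, use that node's value,
            # because the weight 1/distance is undefined there.
            return float(value)
        weight = 1.0 / distance
        numerator += weight * value
        denominator += weight
    # Assumption: normalize by the total weight.
    return numerator / denominator


# Two hypothetical nodes on the X axis with attribute values 10 and 20.
nodes = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0)]
# Evaluate the estimate on a small grid of (x, y) points.
grid = [
    [weighted_sum_value(gx, gy, nodes) for gx in (0.0, 2.0, 4.0)]
    for gy in (0.0, 1.0)
]
for row in grid:
    print(row)
```

Points close to a node are dominated by that node's value, while points equidistant from all nodes receive the plain average.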

Chart Options
- Show Raw Data Points: Show or hide the raw data points on the chart.

- Show node name: Show or hide node names on the chart.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Contour Plot’ module creates a contour plot.

The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.

 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Charts
Contour Plot


 Time Complexity

 Reference

 Related Topics
Statistics >> Regression


Chart >> Surface Plot

 Menu
Chart >> Surface Plot

 Description
A surface plot is a 3D data plot. For visual clarity, the surface is determined by a regression-like analysis (linear, quadratic, or weighted sum).

 User Options

 Input
Select Variable
- X: Select a Main Node Attribute. This attribute is used as the X axis.

- Y: Select a Main Node Attribute. This attribute is used as the Y axis.

- Z: Select a Main Node Attribute. This attribute is used as the Z axis, and it is the dependent variable of the regression-like analysis.

 Main process

Select Fitting Method
- Linear: The value at each (x,y) point in the 2D plane is given by the equation obtained from a linear regression of the nodes' attribute values on their x,y coordinates.

- Quadratic: The value at each (x,y) point in the 2D plane is given by the equation obtained from a quadratic regression of the nodes' attribute values on their x,y coordinates.

- Weighted Sum: The value at each (x,y) point in the 2D plane is the weighted linear sum of every node's attribute value. The weight of a node is 1/[the Euclidean distance between (x,y) and the node].
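The Linear and Quadratic methods can be sketched as ordinary least-squares fits of the Z attribute on the x,y coordinates. The Python code below is an illustration, not NetMiner's implementation: it assumes the linear model z = a + b*x + c*y and a full second-order quadratic model that includes the x*y cross term, and it solves the normal equations with a small Gaussian elimination. The names solve, basis, and fit_surface and the sample nodes are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate nodes")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def basis(x, y, method):
    """Regression terms evaluated at (x, y)."""
    if method == "linear":
        return [1.0, x, y]
    if method == "quadratic":
        # Assumption: full second-order model with the x*y cross term.
        return [1.0, x, y, x * x, x * y, y * y]
    raise ValueError("method must be 'linear' or 'quadratic'")


def fit_surface(nodes, method="linear"):
    """Fit (x, y, z) triples by least squares; return a function f(x, y)."""
    rows = [basis(x, y, method) for x, y, _ in nodes]
    targets = [z for _, _, z in nodes]
    k = len(rows[0])
    # Normal equations: (F^T F) coef = F^T z
    ftf = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    ftz = [sum(r[i] * z for r, z in zip(rows, targets)) for i in range(k)]
    coef = solve(ftf, ftz)
    return lambda x, y: sum(c * t for c, t in zip(coef, basis(x, y, method)))


# Hypothetical nodes lying exactly on the plane z = 1 + 2x + 3y.
plane_nodes = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
linear = fit_surface(plane_nodes, "linear")
# Hypothetical nodes lying exactly on the bowl z = x^2 + y^2.
bowl_nodes = [(x, y, x * x + y * y) for x in range(3) for y in range(3)]
quadratic = fit_surface(bowl_nodes, "quadratic")
print(linear(3, 2), quadratic(3, 3))
```

Evaluating the returned function over a grid of (x,y) points gives the surface heights. Under these assumptions the linear fit needs at least three non-collinear nodes and the quadratic fit at least six suitably spread nodes; otherwise the system is singular.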


Chart Options
- Show Raw Data Points: Show or hide the raw data points on the chart.

- Show node name: Show or hide node names on the chart.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Surface Plot’ module creates a surface plot.

The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.

 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Charts
Surface Plot


 Time Complexity

 Reference

 Related Topics
Statistics >> Regression


Chart >> Network Contour Plot

 Menu
Chart >> Network Contour Plot

 Description
A Network Contour Plot is a 3D data plot that represents height as color. It is almost the same as the Contour Plot, except that the coordinates in the XY plane are determined by applying a visualization layout (the Kamada & Kawai algorithm) to the selected 1-mode Network. Because only a few nodes supply data points, a regression-like analysis (linear, quadratic, or weighted sum) determines the color of each point.
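Representing height as color can be sketched as a simple linear color ramp. The Python example below maps a fitted value to an RGB color between two end colors. The blue-to-red ramp, the function name value_to_color, and the sample values are assumptions chosen for illustration; the color scheme that NetMiner actually uses is not described in this section.

```python
def value_to_color(value, low, high, start=(0, 0, 255), end=(255, 0, 0)):
    """Map a value in [low, high] to an RGB tuple on a linear color ramp."""
    if high == low:
        t = 0.0  # all values equal: use the start color
    else:
        t = (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


# Hypothetical fitted values on a 2 x 3 grid of (x, y) points.
fitted = [
    [0.0, 5.0, 10.0],
    [2.5, 7.5, 10.0],
]
low = min(min(row) for row in fitted)
high = max(max(row) for row in fitted)
colors = [[value_to_color(v, low, high) for v in row] for row in fitted]
for row in colors:
    print(row)
```

Low values come out blue and high values red, so regions of similar height share similar colors.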

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be selected.

Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

Select Vector: Select a Main Node Attribute. This attribute is used as the Z axis, and it is the dependent variable of the regression-like analysis.


 Main process

Select Fitting Method
- Linear: The value at each (x,y) point in the 2D plane is given by the equation obtained from a linear regression of the nodes' attribute values on their x,y coordinates.

- Quadratic: The value at each (x,y) point in the 2D plane is given by the equation obtained from a quadratic regression of the nodes' attribute values on their x,y coordinates.

- Weighted Sum: The value at each (x,y) point in the 2D plane is the weighted linear sum of every node's attribute value. The weight of a node is 1/[the Euclidean distance between (x,y) and the node].

Chart Options
- Show Raw Data Points: Show or hide the raw data points on the chart.

- Show node name: Show or hide node names on the chart.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Network Contour Plot’ module creates a network contour plot.

The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.

 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Charts
Network Contour Plot


 Time Complexity

 Reference

 Related Topics
Statistics >> Regression

Visualize >> Spring >> Kamada & Kawai


Chart >> Network Surface Plot

 Menu
Chart >> Network Surface Plot

 Description
A Network Surface Plot is a 3D data plot similar to the Surface Plot, except that the coordinates in the XY plane are determined by applying the Kamada & Kawai layout algorithm to the selected 1-mode Network. For visual clarity, the surface is determined by a regression-like analysis (linear, quadratic, or weighted sum).

 User Options

 Input

1-mode Network: Select a 1-mode Network. Only one 1-mode Network can be selected.

- Link Merge: When the selected data contains multiple links, where two or more links connect the same source node and target node pair, you should decide how to merge them into a single link.

Select Vector: Select a Main Node Attribute. This attribute is used as the Z axis, and it is the dependent variable of the regression-like analysis.
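The Link Merge step mentioned above can be pictured with a short Python sketch that collapses parallel links into one weight per source and target pair. The merge options shown here (sum, average, max, min), the function name merge_links, and the sample links are assumptions for illustration; this section does not list the options that NetMiner offers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical merge options; the actual choices in NetMiner may differ.
MERGE_FUNCTIONS = {"sum": sum, "average": mean, "max": max, "min": min}


def merge_links(links, how="sum"):
    """Collapse parallel links into one weight per (source, target) pair."""
    if how not in MERGE_FUNCTIONS:
        raise ValueError("unknown merge option: " + how)
    grouped = defaultdict(list)
    for source, target, weight in links:
        grouped[(source, target)].append(weight)
    merge = MERGE_FUNCTIONS[how]
    return {pair: merge(weights) for pair, weights in grouped.items()}


# Two parallel links from A to B and a single link from B to C.
links = [("A", "B", 1), ("A", "B", 3), ("B", "C", 2)]
merged_sum = merge_links(links, "sum")
merged_max = merge_links(links, "max")
print(merged_sum)
print(merged_max)
```

The choice of merge rule changes the link weights passed to the layout, which in turn can change the XY coordinates of the nodes.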

 Main process

Select Fitting Method
- Linear: The value at each (x,y) point in the 2D plane is given by the equation obtained from a linear regression of the nodes' attribute values on their x,y coordinates.

- Quadratic: The value at each (x,y) point in the 2D plane is given by the equation obtained from a quadratic regression of the nodes' attribute values on their x,y coordinates.

- Weighted Sum: The value at each (x,y) point in the 2D plane is the weighted linear sum of every node's attribute value. The weight of a node is 1/[the Euclidean distance between (x,y) and the node].

Chart Options
- Show Raw Data Points: Show or hide the raw data points on the chart.

- Show node name: Show or hide node names on the chart.

 Output
You can select which outputs are reported and the format in which they are displayed. The ‘Network Surface Plot’ module creates a network surface plot.

The External File format is not available in Chart modules. Instead, in Internal Tab format, you can save the chart to an image file.

 Outputs
Each output is listed as an Inner Tab at the bottom of the Output Window.

 Charts
Network Surface Plot


 Time Complexity

 Reference

 Related Topics
Statistics >> Regression

Visualize >> Spring >> Kamada & Kawai

“Unleashing Hidden Power of Networks”

[Link]

Tel: +82-31-739-8352

Fax: +82-31-739-8354

Email. netminer@[Link]
