0% found this document useful (0 votes)
14 views44 pages

MLC 01 Intro Basics Graphs-Sose2023

The document presents an overview of machine learning applications in analyzing complex data, specifically focusing on networks and sequences. It covers fundamental concepts of complex networks, including their properties, community detection, and centrality measures, while also discussing the significance of scale-free networks and preferential attachment. The content is structured into sections that define key terms, illustrate examples, and highlight various analytical tasks and methodologies used in network analysis.

Uploaded by

Yunru Cheng
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views44 pages

MLC 01 Intro Basics Graphs-Sose2023

The document presents an overview of machine learning applications in analyzing complex data, specifically focusing on networks and sequences. It covers fundamental concepts of complex networks, including their properties, community detection, and centrality measures, while also discussing the significance of scale-free networks and preferential attachment. The content is structured into sections that define key terms, illustrate examples, and highlight various analytical tasks and methodologies used in network analysis.

Uploaded by

Yunru Cheng
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Machine Learning for Complex Data

Networks and Sequences


Summer semester 2023

Prof. Dr. Martin Atzmüller

Osnabrück University & DFKI


Semantic Information Systems Group
[Link]
1 – Introduction: Graphs & Complex Networks
1 Introduction and Overview: Complex Networks and Graphs
2 Basic (Structural) Properties of Complex Networks
3 Definitions & Notation – Networks/Graphs
4 Detecting Groups & Communities
5 Outlook: Link Prediction
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Complex networks
Definition
Graphs abstracting, directly or indirectly, interactions in real-world
systems.

Basic topological
features
• Low Density
• Small Diameter
• Scale-free

• High Clustering
Coefficient

1
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Scale-free

2
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Scale-invariant Networks
• Some nodes have significantly more
connections (hubs) compared to other nodes
• Popular nodes can have millions of links è
The networks appears to have no limit/scale

• Necessary:
– Growth
– Preferential Attachment

3
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Scale-invariant Networks
• Power law: Alpha

4
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Networks [Newman 2003]

Vorlesung
Web
Science

5
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

6
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

7
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Preferential Attachment [Barabasi 1999]


• "The rich getting richer"
• "Preferential Attachment": High probability, that
a new node connects to an existing node having a
high degree
• Examples:
– New member in a social network (linking to others ...)
– Internet router network
– Paper citations

• Which node has


the best chances
to get a new link?

8
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Generating Scale-free Networks


[Barabasi & Albert 1999]

9
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Generating scale-free networks


[Barabasi & Albert 1999]

10
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Generating[Barabasi
Scale-free Networks
& Albert 1999]

11
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Six Degrees of Separation

[Link]

12
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

The Bacon Number

13
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

[Link]

14
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

The Bacon Number [Watts 2002]

15
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Milgram – Small world Experiment

16
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Follow-Up (2008)
è [Link]
• 30 billion conversations among 240 million
people of Microsoft Messenger
• Communication graph with 180 million nodes
and 1.3 billion undirected edges
• Largest social network constructed and
analyzed to date (2008)

17
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Follow-Up/Results
• Approximating the "Degrees of Separation"
– Random sample - 1000 Nodes
– For every node: Computation of the shortest path
to all other nodes
Path length 6.6, Median 7
– ~ "7 degrees of separation"
– Longer paths exist
(up to path length 29)
– Result: On average a random
node pair has a distance of
6.6 links (Milgram: 5.5)
[Link]

18
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Now, more formally ...

19
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Notation

A graph G =< V , E ⊆ V × V >

• V : set of nodes (a.k.a. vertices, actors,


sites)
• E : set of edges (a.k.a. ties, links, bonds)

Notation
• AG Adjacency Matrix d : aij 6= 0 iff (vi , vj ) ∈ E , 0 otherwise.
• n = |V |
• m = |E |. Often m ∼ n
• Γ(v ) : neighbors of node v . Γ(v ) = {x ∈ V : (x, v ) ∈ E }.
• Node degree : d(v ) =k Γ(v ) k

20
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Clustering coefficient

Definitions
a

• Global version :

• Local on node v : # of links between neighbors of v
# Potential links between neighbors of v

• CC = 3×1
8
• CCv =1 = 16

21
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Further Definitions I

• In a directed graph, E denotes a subset of V × V .


• The density of G is the fraction of possible edges that are
actually present.
• In a weighted graph, each edge l ∈ E is given an edge weight
w (l) by some weighting function w : E → R.
• The degree of a node in a network is the number of
connections it has to other nodes.
• The adjacency matrix of a set of nodes S with n = |S|
contained in a (weighted) graph G = (V , E ) is a matrix
A ∈ Rn×n with Aij = 1 (Aij = w (i, j)) iff (i, j) ∈ E for any
nodes i, j in S (assuming some bijective mapping from
1, . . . , n to S). We identify a graph with its according
adjacency matrix where appropriate.

22
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Further Definitions II

• A path v0 →G vn of length n in a graph G is a sequence


v0 , . . . , vn of nodes with n ≥ 1 and (vi , vi+1 ) ∈ E for
i = 0, . . . , n − 1.
• A shortest path between nodes u and v is a path u →G v of
minimal length and the diameter of G is the shortest path of
maximal length.
• The transitive closure of a graph G = (V , E ) is given by
G ∗ = (V , E ∗ ) with (u, v ) ∈ E ∗ iff there exists a path u →G v .
• A strongly connected component (scc) of G is a subset
U ⊆ V , such that u →G ∗ v exists for every u, v ∈ U.
• A (weakly) connected component (wcc) is defined accordingly,
ignoring the direction of edges (u, v ) ∈ E .

23
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Further Definitions III

• Many observations of network properties can be explained just


by the network’s degree distribution
• It is therefore important to contrast the observed property to
the according result obtained on a random graph as a null
model which shares the same degree distribution.
• If a single network G is considered, a corresponding null
model G can be obtained by randomly replacing edges
(u1 , v1 ), (u2 , v2 ) ∈ E with (u1 , v2 ) and (u2 , v1 ), ensuring that
these edges were not present in G beforehand
• This process is typically repeated a multiple of the graph edge
set’s cardinality

24
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Complex networks : Example I

Direct interactions
Greatest connected component of Wikipedia

• Density : 2−3
• Diameter : 28
• Clustering coeff.: 0.47

25
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Complex networks : Example II

Indirect interactions
Co-authorship network - DBLP 1980-1984 (for authors

active for more than 10 years)

• Density : 10−4
• Diameter : 24
• Clustering coeff.: 0.67

26
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Complex network analysis

A lot of interesting results using a simple model: interacting


nodes

• Node related analysis tasks


Centralities, Roles, Similarities
• Community detection
Local, disjoint, overlapping, hierarchical
• Network evolution mining
Link prediction

27
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Centralities

Nodes centralities
• Idea: Estimate importance
and/or influence of a node
• PageRank, HITS algorithms: web
search results ranking
• FolkRank algorithm: Tag
recommendation
• Betweenness, Closeness:
Importance of actors in a social
network
• Applications: E. g.,
Degree centrality example
recommendations, viral marketing

28
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Node Centrality I

• The betweenness centrality bet measures the number of


shortest paths of all node pairs that go through a specific
node.
X σst (v )
bet(v ) = .
σst
s6=v 6=t∈V

Hereby, σst denotes the number of shortest paths between s


and t and σst (v ) is the number of shortest paths between s
and t passing through v . Thus, a vertex has a high
betweenness centrality if it can be found on many shortest
paths between other vertex pairs.

29
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Node Centrality II
• The closeness centrality clos considers the length of these
shortest paths. Then, the shorter its shortest path length to
all other reachable nodes, the higher a vertex ranks:
1
clos(v ) = P .
t∈V \v dG (v , t)

dG (v , t) denotes hereby the geodesic distance (shortest path)


between the vertices v and t.
• The eigenvector centrality eig of a node is an important
measure of its influence, similar to the pagerank measure.
Intuitively, a node is central, if it has many central neighbors.
The eigenvector centrality eig (v ) of node v is defined as
X
eig (v ) = λ eig (u) ,
{u,v }∈E

where λ ∈ R is a constant.
30
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Roles

Node Roles

• Nodes having a similar


structural behavior
• Centers of stars, in cliques,
peripheral nodes, . . .
• Role query, Role outliers,
Role dynamics, Role
transfer, . . .
red nodes are star centers, blues are

in-clique and others are peripherial

31
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Node similarity

• A node is similar to itself


• Neighborhood-based similarity: Two nodes are similar if
they are connected to the same nodes.
• Distance-based similarity: Two nodes are similar if they are
connected by short paths.
• Structural-similarity : Two nodes are similar if each one is
similar to the neighbors of the other.

32
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Community detection

Overview
I Given a network/graph, find “modules”
• Single network
• Multiplex networks
• Attributed networks
I Community structures
• Graph Clustering/disjoint communities
• Hierarchical organization
• Overlapping communities
I Questions:
• What is “a community”?
• What are “good” communities?
• How do we evaluate these?

33
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Community detection

Definitions
I A dense subgraph loosely coupled to other modules in the
network
I A community is a set of nodes seen as “one” by nodes outside
of the community
I A subgraph where almost all nodes are linked to other nodes
in the community.
I ...

34
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Community detection

Applications
• Finding similar items (documents, users, queries, etc.)
• Collaborative filtering
• Network visualisation
• Computation distribution
• ...

35
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Global community detection

• Communities can be defined with respect to whole graph


• Graph has community structure, if its structure is different
from a random graph
• Random graph: Not expected to have community structure
Any two vertices have the same probability to be adjacent
• Define null model; use it for investigating if we can observe
community structure in a graph

36
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Local community detection

• Considers local communities, i. e., “local” model


• Examples:
• Start with a seed-node, which is expanded
• Application of local (connectivity) criteria
• Overlapping, local communities

(Atzmueller et al. 2021)

37
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Attributed Graph I

• Integrates additional attribute data about individual instances


• Example: In a social network, attributes describing each user’s
characteristics might be combined with the underlying
friendship network to form an attributed graph
• Also, here the relationships (friendship) could be labeled with
some information, e. g., specifying the “degree” of friendship
Other terms: “graphs with feature vectors”, “labeled graphs”, or
“annotated graph”, where also the term “network” is often
substituted for “graph”

38
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Attributed Graph II

Definition 1 (Attributed Graph)


An attributed graph is a graph G in which each v ∈ V is
associated to a vector of attribute values x = (x1 , . . . , xd ), and
each edge e ∈ E to a vector y = (y1 , . . . , yt ). We use ai (v ) to
refer to the ith attribute value of a node v , and ai (e) for the edge
e respectively. We denote with AV the set of node attributes,
d = |AV |, and with AE the set of edge attributes, t = |AE |. If
|AV | > 0, |AE | = 0, we refer to G as node-attributed; similarly if
|AV | = 0, |AE | > 0, we call it edge-attributed. If
|AV | = 0, |AE | = 0, then we refer to a plain graph.

39
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Attributed Graph & Local Community

(Atzmueller et al. 2021)

Attributed graph with a community describable by a local,


discriminative description ((A, B), top), and one describable by a
non-discriminative one ((C , D), bottom)

40
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Link Prediction

Link prediction
I Structural:
Find hidden/missing links
in a network
ex. Missing links in
Wikipedia
I Temporal:
Predicting new links to
appear at time tp based on
the network state at time
t < tp

41
Machine Learning for Complex Data, Prof. Dr. M. Atzmüller, Osnabrück University

Summary

What did we learn?


1 Overview on Complex Networks and Graphs
• Structure
• Properties
• Basic Definitions

2 Exemplary Methods: Community Detection & Link Prediction

Outlook: Ranking, Classification, Deep Learning Methods

42

You might also like