Client Clustering for Energy-Efficient Clustered Federated Learning in Wireless Networks

Jieming Bian ([email protected]), University of Miami, Coral Gables, FL, USA
Jie Xu* ([email protected]), University of Miami, Coral Gables, FL, USA
ABSTRACT
Clustered Federated Learning (FL) is a novel approach for handling data heterogeneity in FL and training personalized ML models for clients. However, existing research overlooks network constraints and objectives when implementing clustered FL in a wireless network. Since clients experience varying energy costs when connected to different servers, the cluster formation greatly impacts system energy efficiency. To address this, we present an energy-efficient client clustering problem that optimizes FL performance while minimizing energy costs. We introduce a new metric, the effective number of clients, to predict cluster FL performance based on size and data heterogeneity, guiding the clustering optimization. To find optimal client clusters, we propose an energy-efficient clustering algorithm using parallel Gibbs Sampling. Simulations and data experiments demonstrate that our algorithm achieves tunable trade-offs between data similarity (and thus FL performance) and energy cost.

KEYWORDS
Federated Learning, Wireless Networks

ACM Reference Format:
Jieming Bian and Jie Xu. 2023. Client Clustering for Energy-Efficient Clustered Federated Learning in Wireless Networks. In Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing (UbiComp/ISWC '23 Adjunct), October 08–12, 2023, Cancun, Quintana Roo, Mexico. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3594739.3612913

*This work is partially supported by NSF under grants 2006630, 2033681, 2029858, 2044991 and 2319780.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
UbiComp/ISWC '23 Adjunct, October 08–12, 2023, Cancun, Quintana Roo, Mexico
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0200-6/23/10 . . . $15.00
https://doi.org/10.1145/3594739.3612913

1 INTRODUCTION
Today, an enormous amount of data is generated daily from end-user devices like personal computers, smartphones, and Internet-of-Things (IoT) devices. Leveraging machine learning (ML) algorithms such as deep learning, this data can be harnessed for applications like recommendation systems, image recognition, and natural language processing. As privacy concerns mount, local storage and processing of data on end-users' own devices have become increasingly appealing. Hence, distributed learning, and in particular Federated Learning (FL) [4, 7], has become a significant area of research, aiming to fully exploit on-device storage and computing capabilities.

A salient feature of FL is data heterogeneity, or non-i.i.d. data, which is inherent as data is generated by and stored on individual clients. Exploiting this heterogeneity can greatly aid in building personalized ML models, but it is challenged by the limited volume of data each client has. Even though it is common to train a single ML model via FL using all devices' data, this may not provide personalized predictions or achieve high performance for all clients. To tackle non-i.i.d. data, clustered FL [2, 6, 9] has recently been explored to train personalized ML models using data similarity among clients. However, previous work on clustered FL mainly treated it as a pure ML problem, overlooking network constraints when clustering clients in a real-world wireless network environment.

In this paper, we examine, for the first time, the formation of client clusters in clustered FL in a distributed wireless network. The model considers that both clients and the parameter servers of clusters are distributed over the wireless network, leading to varying wireless channel conditions. Given that clients upload local model updates to associated parameter servers via wireless channels, their transmission energy consumption can vary significantly depending on which server, and hence which cluster, they join. As clients are energy-limited and numerous rounds are often needed for FL to converge, the formation of clusters plays a crucial role in the energy efficiency of the clustered FL system in wireless networks.

The paper focuses on the "one-shot" client clustering problem, where clustering is performed only once before FL starts and remains unchanged throughout the learning course. Even though dynamically adjusting clusters might yield better outcomes, it can be energy-inefficient due to substantial extra data exchange and computation. However, some of the techniques developed here could also be readily applied in a periodic clustering setting. The main contributions of this paper include:

• In the context of one-shot clustering, we introduce a new metric called the effective number of clients that captures both the number of participating clients and their data heterogeneity. This metric, which has been empirically shown to correlate with FL performance on non-i.i.d. data, is used as the basis for client clustering.
• We formulate an energy-efficient client clustering problem considering both the data similarity among clients and the total energy cost. Despite being a variation of the correlation clustering problem and hence NP-hard [1], we develop a new algorithm based on Gibbs sampling that can solve this problem with a performance guarantee.
2 FEDERATED LEARNING AND NON-I.I.D. DATA

2.1 Federated Learning Preliminaries
In a typical FL system, there is a single parameter server and a number of 𝑁 participating clients, denoted by N = {1, ..., 𝑁}, with distributed datasets. Let 𝐷𝑛 be the dataset of client 𝑛 ∈ N. FL aims to solve an empirical risk minimization problem

\min_{w} l(w) = \min_{w} \frac{1}{N} \sum_{n \in \mathcal{N}} l_n(w)    (1)

where 𝑤 is the ML model variable that one would like to optimize, l_n(w) = \frac{1}{|D_n|} \sum_{z \in D_n} l(w; z) is the empirical loss function at client 𝑛, and 𝑙(𝑤; 𝑧) is the loss function evaluated at model 𝑤 and data sample 𝑧 = (𝑥; 𝑦), with 𝑥 being the feature and 𝑦 being the label.

A typical FL algorithm, such as FedAvg [4], runs in an iterative manner. In each iteration 𝑡, the server sends the current global model w^{t-1} to the clients. Each client then updates its local model w_n^t based on the global model w^{t-1} with its local dataset 𝐷𝑛. The updated local models are then sent back to the server. Finally, the server aggregates the received local models, e.g., by averaging them, to obtain a new global model w^t. The iterative procedure continues until a certain termination criterion is met.

2.2 Non-i.i.d. Degree and Effective Client Number
A key challenge in FL is the non-i.i.d. data among the participating clients. In this paper, we focus on label distribution heterogeneity. Suppose there are a total of 𝑑 labels in the label set Y; then client 𝑛's label distribution is denoted by a vector 𝑃𝑛 ∈ R^d, with 𝑃𝑛(𝑦) being the probability that a data sample has label 𝑦.

For two label distributions 𝑃𝑛 and 𝑃𝑚, let Δ(𝑃𝑛, 𝑃𝑚) be a function that measures their difference. This function can have many specific choices. For example, the Kullback-Leibler (KL) divergence [5] is a common function to measure the distribution difference; in this case,

\Delta_{\mathrm{KL}}(P_n, P_m) = \sum_{y \in \mathcal{Y}} P_n(y) \log\left(\frac{P_n(y)}{P_m(y)}\right)    (2)

Alternatively, we can simply calculate the distance between 𝑃𝑛 and 𝑃𝑚 in a norm space as follows:

\Delta_p(P_n, P_m) = \|P_n - P_m\|_p    (3)

where ‖·‖_p denotes the l_p-norm. Clearly, in both cases, the difference between a distribution and itself is zero, i.e., Δ(𝑃𝑛, 𝑃𝑛) = 0. We keep the probability divergence function Δ(·, ·) general in the remainder of this paper, and write Δ(𝑃𝑛, 𝑃𝑚) = Δ𝑛,𝑚 for short. Note that Δ need not be symmetric; for example, for the KL divergence, Δ_KL(𝑃𝑛, 𝑃𝑚) ≠ Δ_KL(𝑃𝑚, 𝑃𝑛) in general. With Δ, we define the non-i.i.d. degree of the datasets of a set of clients.

Definition 1 (Non-I.I.D. Degree). Consider a set of 𝑁 clients N = {1, ..., 𝑁} and their respective datasets {𝐷𝑖}_{𝑖∈N}. The non-i.i.d. degree of this set is defined as the average pair-wise distribution difference:

\Theta_{\mathcal{N}} = \frac{1}{N^2} \sum_{n \in \mathcal{N}} \sum_{m \in \mathcal{N}} \Delta_{n,m}    (4)

Intuitively, a higher degree of non-i.i.d. data negatively impacts the performance of FL. Thus, also taking the number of clients 𝑁 into account, we hypothesize that the FL performance is positively correlated with the following metric:

N \cdot (1 - \alpha \cdot \Theta_{\mathcal{N}}) \triangleq \eta(\mathcal{N})    (5)

where 𝛼 ≥ 0 is a constant depending on the specific FL task and the choice of the function Δ. We call this metric the effective number of clients, and denote it by 𝜂(N) for a client set N. On the one hand, having more clients participating in FL improves the FL performance. On the other hand, a larger non-i.i.d. degree among these clients hampers the FL performance. Just like the number of clients 𝑁 in the i.i.d. data case, the effective number of clients 𝜂(N) can be used to predict the FL performance achieved by a set of clients with non-i.i.d. datasets. This metric will serve as the basis for the client clustering problem to predict the performance of clustered FL. Note that when the data is i.i.d. among the clients, namely Θ_N = 0, the effective number of clients 𝜂(N) reduces to the actual number of clients 𝑁.
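To make the metric concrete, the following Python sketch computes the non-i.i.d. degree in (4) and the effective number of clients in (5) from the clients' label distributions. It only illustrates the definitions above: the function names, the NumPy-based layout, and the epsilon smoothing used to keep the KL divergence in (2) finite are our own assumptions, not code from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Delta_KL(P_n, P_m) from Eq. (2); eps avoids log(0) and division by zero."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def non_iid_degree(label_dists, delta=kl_divergence):
    """Theta_N from Eq. (4): average pairwise distribution difference."""
    N = len(label_dists)
    total = sum(delta(label_dists[n], label_dists[m])
                for n in range(N) for m in range(N))
    return total / N ** 2

def effective_num_clients(label_dists, alpha):
    """eta(N) = N * (1 - alpha * Theta_N) from Eq. (5)."""
    N = len(label_dists)
    return N * (1 - alpha * non_iid_degree(label_dists))

# Example: two clients with identical label distributions plus one very different client.
P = [np.array([0.5, 0.5]), np.array([0.5, 0.5]), np.array([0.99, 0.01])]
print(effective_num_clients(P, alpha=0.5))   # smaller than the raw client count of 3
```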
3 MODEL AND FORMULATION

3.1 System Model
We consider a wireless network consisting of 𝐾 base stations (BSes), indexed by K = {1, ..., 𝐾}, and 𝑁 users, indexed by N = {1, ..., 𝑁}. Users play the role of FL clients while the BSes play the role of FL parameter servers; hence we use users and clients, and BSes and servers, interchangeably hereafter. Each client 𝑛 ∈ N has a local dataset 𝐷𝑛 with size |𝐷𝑛| and data distribution 𝑃𝑛. The clients are distributed over the network at different distances from the servers. Depending on the clients' and servers' locations, each client 𝑛 can access a subset of servers K𝑛 ⊆ K. We assume that clients are in fixed locations during the time of interest, and let ℎ𝑛,𝑘 denote the average channel condition between client 𝑛 and server 𝑘 ∈ K𝑛.

We consider a clustered FL architecture where the clients are divided into (at most) 𝐾 different clusters before FL starts, and clients within the same cluster collaboratively train an ML model via FL. Without loss of generality, we let server 𝑘 be the parameter server for cluster 𝑘. Hence, clients in cluster 𝑘 have to download the global model from server 𝑘 and upload their local models to server 𝑘 in every FL iteration. Let 𝑎𝑛 ∈ K𝑛 denote the server association decision variable for client 𝑛, which represents the server it is associated with and hence the cluster it joins. We collect the server association decisions of all clients in 𝒂 = (𝑎1, ..., 𝑎𝑁). Denote the set of clients in cluster 𝑘 by C𝑘(𝒂), which can be computed from the association decisions as C𝑘(𝒂) = {𝑛 : 𝑎𝑛 = 𝑘}. Note that it is possible that a cluster is empty and no clients are associated with its corresponding server.

3.2 Client Energy Cost
Participating in FL incurs energy costs to the clients, which are often energy-limited. The client energy cost consists of two parts: computation energy cost and (upload) transmission energy cost. Since the computation energy cost does not depend on which cluster the client joins, we consider only the transmission energy cost in this paper.

We consider a synchronized FL setting where the parameter server performs local model aggregation after it receives the local models from all clients in the cluster. We assume a maximal delay constraint 𝑇 on the training of each FL round. The delay constraints can vary among the FL tasks of different clusters, but in this paper we assume a common 𝑇 for all clusters, since the extension to heterogeneous delay constraints is straightforward. We assume that clients have heterogeneous computing capabilities, and denote the computing speed, in terms of the CPU frequency, of client 𝑛 by 𝑓𝑛. Depending on which server a client 𝑛 is connected with, the client can adjust its upload transmission power 𝑝𝑛 to minimize the total energy cost while meeting the delay constraint. In particular, the computation time of client 𝑛 can be calculated as T_n^{cp} = \frac{\beta g |D_n|}{f_n}, where 𝑔 is the number of CPU cycles required for computing one data sample and 𝛽 is the number of local training iterations in one learning round. These two parameters both depend on the specific FL algorithm adopted and its configuration (e.g., mini-batch size and number of epochs). To meet the delay constraint 𝑇, the transmission time for client 𝑛 to upload the local model updates is therefore T_n^{tx} = T - T_n^{cp}. Let 𝑆𝑛 denote the uploading data size; then the minimum required transmission rate is r_n \triangleq \frac{S_n}{T - T_n^{cp}}. Following Shannon's channel capacity equation, if client 𝑛 is associated with server 𝑘, then it must set its transmission power 𝑝𝑛 so that

B \log_2\left(1 + \frac{h_{n,k} p_n}{B N_0}\right) = r_n    (6)

where 𝐵 is the bandwidth for each client and 𝑁0 the noise power spectral density. Solving for 𝑝𝑛 in the above equation, the transmission energy of client 𝑛 is thus

E_n(a_n) = (T - T_n^{cp}) \cdot \frac{B N_0 (2^{r_n / B} - 1)}{h_{n, a_n}}    (7)

The different channel conditions between client 𝑛 and the servers result in different energy costs, which defines a preference order over the servers for the client to connect to. Thus, the wireless network effect plays an indispensable role in forming client clusters.

3.3 Problem Formulation
For a given cluster C ⊆ N, let w^*_{\mathcal{C}} be the optimal solution that solves the empirical risk minimization with respect to the datasets of the clients in C, i.e.,

w^*_{\mathcal{C}} = \arg\min_{w} l_{\mathcal{C}}(w) = \arg\min_{w} \frac{1}{|\mathcal{C}|} \sum_{n \in \mathcal{C}} l_n(w)    (8)

Therefore, for a given clustering outcome {C1, ..., C𝐾}, the total loss of the entire clustered FL system given the optimal model weights {w^*_{\mathcal{C}_1}, ..., w^*_{\mathcal{C}_K}} can be written as

L(w^*_{\mathcal{C}_1}, ..., w^*_{\mathcal{C}_K}) = \sum_{k \in \mathcal{K}} \sum_{n \in \mathcal{C}_k} l_n(w^*_{\mathcal{C}_k}) = \sum_{k \in \mathcal{K}} |\mathcal{C}_k| \frac{1}{|\mathcal{C}_k|} \sum_{n \in \mathcal{C}_k} l_n(w^*_{\mathcal{C}_k}) = \sum_{k \in \mathcal{K}} |\mathcal{C}_k| \cdot l_{\mathcal{C}_k}(w^*_{\mathcal{C}_k})

Considering both the total learning loss and the energy cost, the client clustering problem is formulated as a total cost minimization problem as follows:

\min_{\boldsymbol{a}} \; \gamma L(w^*_{\mathcal{C}_1}, ..., w^*_{\mathcal{C}_K} \mid \boldsymbol{a}) + (1 - \gamma) \sum_{n \in \mathcal{N}} E_n(a_n)    (9)

where 𝛾 ∈ [0, 1] is a tunable weight that trades off between the learning loss and the energy cost.

Unfortunately, the above cost minimization problem cannot be solved directly: there is no way to obtain the optimal model parameters w^*_{\mathcal{C}_1}, ..., w^*_{\mathcal{C}_K} and evaluate the total learning loss unless FL finishes training, yet client clustering has to be performed before FL starts training, which creates a contradiction. To overcome this challenge, we have to somehow predict the FL performance that a given cluster of clients would achieve by performing FL together. Thus, we leverage the effective client number metric, which considers both the cluster size and its non-i.i.d. degree, and formulate an alternative optimization problem as follows:

\max_{\boldsymbol{a}} \; \gamma \sum_{k \in \mathcal{K}} |\mathcal{C}_k(\boldsymbol{a})| \cdot \eta(\mathcal{C}_k(\boldsymbol{a})) - (1 - \gamma) \sum_{n \in \mathcal{N}} E_n(a_n)    (10)

Essentially, we replace the loss function l_{\mathcal{C}_k}(w^*_{\mathcal{C}_k}) with the negative effective number of clients 𝜂(C𝑘(𝒂)). Hence, instead of minimizing the empirical loss function directly, we aim to maximize a weighted sum of the per-cluster effective client numbers while considering the clients' energy costs.
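Putting Sections 3.2 and 3.3 together, the sketch below first computes the per-client transmission energy of Eq. (7) by inverting the rate constraint (6), and then evaluates the surrogate objective (10) for a candidate association vector. It is a minimal illustration under assumed helper names (it reuses `effective_num_clients` from the earlier sketch) and assumed unit conventions; it is not the authors' implementation.

```python
import numpy as np

def transmission_energy(h_nk, T, f_n, D_n, S_n, B, N0, g, beta):
    """E_n(a_n) from Eq. (7): upload energy of client n when it joins server k.

    h_nk : channel gain to server k       B    : per-client bandwidth (Hz)
    T    : per-round delay budget (s)     N0   : noise power spectral density (W/Hz)
    f_n  : CPU frequency (cycles/s)       g    : CPU cycles per data sample
    D_n  : local dataset size (samples)   S_n  : upload size (bits)
    beta : local iterations per round
    """
    T_cp = beta * g * D_n / f_n                   # computation time T_n^cp
    T_tx = T - T_cp                               # time left for the upload
    if T_tx <= 0:
        return np.inf                             # the delay constraint cannot be met
    r_n = S_n / T_tx                              # minimum required rate
    p_n = B * N0 * (2 ** (r_n / B) - 1) / h_nk    # invert the Shannon rate, Eq. (6)
    return p_n * T_tx                             # energy = power x transmission time

def clustering_objective(a, label_dists, energy, gamma, alpha):
    """Surrogate objective of problem (10) for an association vector a.

    a      : a[n] = server/cluster index chosen by client n
    energy : energy[n][k] = E_n(k), e.g. precomputed with transmission_energy()
    """
    a = np.asarray(a)
    similarity = 0.0
    for k in np.unique(a):
        members = np.where(a == k)[0]
        cluster_dists = [label_dists[n] for n in members]
        similarity += len(members) * effective_num_clients(cluster_dists, alpha)
    energy_cost = sum(energy[n][a[n]] for n in range(len(a)))
    return gamma * similarity - (1 - gamma) * energy_cost
```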
4 ENERGY-EFFICIENT CLUSTERING ALGORITHM

4.1 Extreme Cases
When 𝛾 = 0, problem (10) simplifies to

\min_{\boldsymbol{a}} \sum_{n \in \mathcal{N}} E_n(a_n)    (11)

wherein each client 𝑛 connects to the server with the best channel condition, i.e., a_n^* = \arg\max_{k \in \mathcal{K}_n} h_{n,k}. However, this can result in poor FL performance due to grouping clients with diverse data distributions.

At the other extreme, 𝛾 = 1, problem (10) reduces to

\max_{\boldsymbol{a}} \sum_{k \in \mathcal{K}} |\mathcal{C}_k(\boldsymbol{a})| \cdot \eta(\mathcal{C}_k(\boldsymbol{a}))    (12)

By the definition of the effective client number, the objective function can be rewritten as

\sum_{k \in \mathcal{K}} |\mathcal{C}_k(\boldsymbol{a})| \cdot \eta(\mathcal{C}_k(\boldsymbol{a})) = \sum_{k \in \mathcal{K}} |\mathcal{C}_k(\boldsymbol{a})|^2 \left(1 - \alpha \cdot \Theta_{\mathcal{C}_k(\boldsymbol{a})}\right) = \sum_{k \in \mathcal{K}} |\mathcal{C}_k(\boldsymbol{a})|^2 \left(1 - \frac{\alpha}{|\mathcal{C}_k(\boldsymbol{a})|^2} \sum_{n \in \mathcal{C}_k(\boldsymbol{a})} \sum_{m \in \mathcal{C}_k(\boldsymbol{a})} \Delta_{n,m}\right) = \sum_{k \in \mathcal{K}} \sum_{n \in \mathcal{C}_k(\boldsymbol{a})} \sum_{m \in \mathcal{C}_k(\boldsymbol{a})} (1 - \alpha \cdot \Delta_{n,m})

With this conversion, the effective client number maximization problem can be transformed into a correlation clustering problem whose affinity matrix W ∈ R^{N×N} is W_{n,m} = 1 − 𝛼 · Δ_{n,m}, ∀𝑛, 𝑚. In other words, consider that there exists an edge between any pair of nodes (i.e., clients in our case) 𝑛 and 𝑚, and the edge weight is W_{n,m}. The correlation clustering problem aims to partition the set of nodes into 𝐾 clusters so that the intra-cluster weights are maximized. Mathematically, the problem is

\max_{\boldsymbol{v}} \sum_{n, m \in \mathcal{N}} W_{n,m} \sum_{k \in \mathcal{K}} v_{n,k} v_{m,k} \quad \text{s.t.} \quad v_{n,k} \in \{0, 1\}, \forall n, k; \quad \sum_{k} v_{n,k} = 1, \forall n

Note that \sum_{k \in \mathcal{K}} v_{n,k} v_{m,k} = 1 if and only if nodes 𝑛 and 𝑚 belong to the same cluster.

It is well known that optimizing the correlation clustering problem is NP-hard, and classic methods take a continuous and convex relaxation approach to approximate the solution. Specifically, let V = [v_{n,k}]_{n \in \mathcal{N}, k \in \mathcal{K}} denote the partition matrix and define X = VV^T as the binary adjacency matrix, where X_{n,m} = 1 if and only if 𝑛 and 𝑚 belong to the same cluster. The quadratic objective above, namely \sum_{n,m} W_{n,m} [VV^T]_{n,m}, then becomes linear with respect to 𝑋, i.e., \sum_{n,m} W_{n,m} X_{n,m}. To ensure that 𝑋 can be decomposed to recover 𝑉, a semi-definite constraint is imposed, and the problem becomes

\max_{X} \sum_{n,m} W_{n,m} X_{n,m} \quad \text{s.t.} \quad X \succeq 0    (13)

This problem is a semi-definite program and hence easy to solve. Once (the continuous version of) 𝑋 is obtained, it is factorized into VV^T, and (the continuous version of) 𝑉 is then converted to binary values via proper rounding to produce the clustering solution.

4.2 General Case
The general case of the energy-efficient client clustering problem is much more challenging than the two extreme cases. Because of the additional energy cost term in the objective function and the fact that a client may not be able to access all servers, the continuous and convex relaxation approach used in the previous subsection is no longer applicable. In addition, the relaxation approach only produces approximate clustering solutions with no performance guarantee. In this subsection, we propose a new energy-efficient client clustering algorithm.

To describe our algorithm, we define the "utility" of a client 𝑛 as

u_n(\boldsymbol{a}) = \gamma \sum_{m : a_m = a_n} (1 - \alpha \cdot \Delta_{n,m}) - (1 - \gamma) E_n(a_n)    (14)

The client utility takes into account the aggregated data distribution similarity (or dissimilarity) between the client itself and all other clients in the same cluster, as well as the energy cost of connecting to the server of this cluster. It is easy to verify that the objective function in problem (10) can be decomposed as a sum of client utilities. Hence, the energy-efficient client clustering problem is converted into a sum utility maximization problem as follows:

\max_{\boldsymbol{a}} \sum_{n \in \mathcal{N}} u_n(a_n, \boldsymbol{a}_{-n})    (15)

where by convention \boldsymbol{a}_{-n} = (a_1, ..., a_{n-1}, a_{n+1}, ..., a_N) denotes the server association actions of all clients except 𝑛.

Our algorithm is developed based on the Gibbs Sampling (GS) technique [3]. GS generates the probability distribution of the server association actions by sweeping through the clients, sampling each client's server association action from a conditional distribution while the remaining clients keep their server association actions fixed. According to the theory of Markov chain Monte Carlo, it is guaranteed that the probability distribution of the server association action profile 𝒂 approximated by GS is proportional to \exp(\sum_n u_n(\boldsymbol{a}) / \tau), where 𝜏 is an algorithm parameter. Such a probability measure is known as the Gibbs distribution. Furthermore, performing GS while reducing 𝜏 can result in the globally maximum system utility.

Specifically, in every iteration 𝑡, a client 𝑖 is picked according to some prescribed order. GS then evaluates the achievable sum utility \sum_n u_n(a_i, \boldsymbol{a}^{t-1}_{-i}) by varying client 𝑖's action a_i \in \mathcal{K}_i while fixing the other clients' actions \boldsymbol{a}^{t-1}_{-i} from the previous iteration. Next, GS updates client 𝑖's action according to the Gibbs distribution, i.e.,

a_i^t \leftarrow k, \quad \text{with prob.} \quad \frac{\exp\left(\sum_{n \in \mathcal{N}} u_n(k, \boldsymbol{a}^{t-1}_{-i}) / \tau\right)}{\sum_{k' \in \mathcal{K}_i} \exp\left(\sum_{n \in \mathcal{N}} u_n(k', \boldsymbol{a}^{t-1}_{-i}) / \tau\right)}    (16)

Intuitively, an action that yields a higher utility is chosen with higher probability. The other clients keep their actions from the previous iteration, i.e., a_j^t \leftarrow a_j^{t-1}, \forall j \neq i.

While sequential GS converges to the global optimal solution with probability 1 [3], it can be inefficient, as it takes too long to complete one round of action updates when the number of clients is large. Therefore, we design a parallel GS algorithm to speed up convergence by exploiting the structure of the client clustering problem. The key to enabling parallelization is the observation that a client's utility does not depend on far-away clients that do not share common reachable servers. Specifically, we say that two clients 𝑛 and 𝑚 are neighbors if they have at least one common reachable server, i.e., \mathcal{K}_n \cap \mathcal{K}_m \neq \emptyset. Denote the neighborhood of client 𝑛 by \Gamma(n) = \{m \in \mathcal{N} : \mathcal{K}_n \cap \mathcal{K}_m \neq \emptyset\}. Hence, the utility of a client satisfies

u_n(a_n, \boldsymbol{a}_{-n}) = u_n(a_n, \boldsymbol{a}_{\Gamma(n)})    (17)

In other words, the server association actions of clients outside Γ(𝑛) have no impact on client 𝑛's utility, as they will never be in the same cluster as client 𝑛. Likewise, client 𝑛's server association action has no impact on the utilities of clients outside Γ(𝑛). Therefore, by canceling out the utilities of these clients in both the numerator and the denominator of (16), client 𝑖's action can be updated equivalently as

a_i^t \leftarrow k, \quad \text{with prob.} \quad \frac{\exp\left(\sum_{n \in \Gamma(i)} u_n(k, \boldsymbol{a}^{t-1}_{\Gamma(i)}) / \tau\right)}{\sum_{k' \in \mathcal{K}_i} \exp\left(\sum_{n \in \Gamma(i)} u_n(k', \boldsymbol{a}^{t-1}_{\Gamma(i)}) / \tau\right)}    (18)

With this observation, we can divide the clients into 𝑍 < 𝑁 groups so that no two clients in the same group are neighbors, and update all clients in a group at the same time. Finding the minimum value of 𝑍 is equivalent to a graph coloring problem, which we do not discuss in detail in this paper. In our implementation, we use the sequential coloring algorithm [8] to generate the colors and divide the clients into groups. The difference between sequential GS and parallel GS is illustrated in Fig. 1. In Proposition 1, we formally state the convergence and optimality of the proposed algorithm.

Figure 1: Illustration of parallel GS.

Proposition 1. Let {𝜏𝑡} be a decreasing sequence of control parameters over the iterations. As 𝜏𝑡 → 0, the proposed algorithm converges to the global optimal solution of (15) with probability 1.
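A compact sketch of the resulting clustering procedure is given below: a client utility following Eq. (14), the Gibbs update of Eq. (16)/(18), and the parallel variant that first groups non-neighboring clients using a greedy sequential coloring. The structure follows the description above, but the concrete helper names, data layout, and the coloring details are our own assumptions rather than the paper's reference code.

```python
import numpy as np

def utility(n, a, W, energy, gamma):
    """u_n(a) from Eq. (14); W[n, m] = 1 - alpha * Delta_{n,m}."""
    same_cluster = np.where(a == a[n])[0]
    return gamma * W[n, same_cluster].sum() - (1 - gamma) * energy[n][a[n]]

def sample_action(i, a, reachable, W, energy, gamma, tau, scope, rng):
    """Sample a new server for client i from the Gibbs distribution, Eq. (16)/(18).
    `scope` is the set of clients whose utilities enter the sums (all of N, or Gamma(i))."""
    scores = []
    for k in reachable[i]:
        a_try = a.copy()
        a_try[i] = k
        scores.append(sum(utility(n, a_try, W, energy, gamma) for n in scope))
    scores = np.asarray(scores) / tau
    probs = np.exp(scores - scores.max())          # shifted softmax for numerical stability
    probs /= probs.sum()
    return rng.choice(reachable[i], p=probs)

def greedy_coloring(neighbors):
    """Greedy sequential coloring of the neighbor graph; same-colored clients share
    no reachable server and can therefore be updated in parallel."""
    color = {}
    for n in range(len(neighbors)):
        used = {color[m] for m in neighbors[n] if m in color}
        c = 0
        while c in used:
            c += 1
        color[n] = c
    groups = {}
    for n, c in color.items():
        groups.setdefault(c, []).append(n)
    return list(groups.values())

def parallel_gibbs_sweep(a, reachable, neighbors, W, energy, gamma, tau, rng):
    """One sweep of parallel GS: each color group samples new actions via Eq. (18)."""
    for group in greedy_coloring(neighbors):
        new = {i: sample_action(i, a, reachable, W, energy, gamma, tau,
                                scope=list(neighbors[i]) + [i], rng=rng)
               for i in group}
        for i, k in new.items():                   # apply the whole group's updates together
            a[i] = k
    return a

# Sequential GS is the same update applied one client at a time with scope = all clients;
# lowering tau over the sweeps (simulated-annealing style) drives the action profile toward
# the maximizer of the sum utility, as stated in Proposition 1.
```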
5 SIMULATIONS
In this section, we evaluate the proposed client clustering algorithm in simulations. We simulate a wireless network with 9 BSes (servers) deployed in a grid layout. 60 users (clients) are randomly distributed over a 500 m x 500 m area. The server/client transmission range is 200 m. The path loss model is 128.1 + 37.6 log10(𝑑), where 𝑑 is the distance measured in km, and the standard deviation of shadow fading is 8 dB. The noise power spectral density 𝑁0 is set to −174 dBm/Hz.

Algorithm Convergence. We first study the convergence of the proposed parallel GS-based algorithm in Fig. 2 (𝛾 = 0.75 and 𝜏 = 0.1). As illustrated, our parallel GS-based algorithm converges after about 40 iterations and is much faster than the conventional sequential GS-based algorithm.

Figure 2: Sequential/Parallel GS. Figure 3: Trade-off.

Learning vs. Energy Trade-off. We investigate the trade-off between the learning performance and the system energy cost. As can be seen in Fig. 3, by increasing the preference parameter 𝛾, the intra-cluster data similarity (in terms of the weighted sum of per-cluster effective numbers of clients) becomes higher. In other words, clients with similar data distributions tend to be clustered together. However, this is achieved at a higher energy cost, as the clients may have to connect to farther-away servers. On the other hand, decreasing 𝛾 achieves higher energy efficiency, but the data distributions of the clients in the formed clusters may be very diverse, thereby degrading the FL performance. Fig. 4 further shows the obtained model accuracy and learning loss on the Fashion-MNIST dataset for different values of 𝛾. As can be seen, the FL performance is correlated with data similarity and hence exhibits a similar trade-off as before.

Figure 4: Accuracy and Loss. Figure 5: CCR v.s. GS.

Comparison with Continuous Convex Relaxation. We compare the GS-based algorithm with the continuous convex relaxation (CCR) method in (13). Because CCR cannot handle the energy cost or partial access to the server set, we let 𝛾 = 0 and consider a setting where clients can access all servers for a fair comparison. Fig. 5 shows the normalized utility gap as 𝛼 (the parameter in 𝜂(N)) is varied. As can be seen, our algorithm outperforms CCR, as expected. When 𝛼 is small (i.e., data heterogeneity plays a smaller role), the optimal clustering outcome is a grand cluster, which is relatively easy to find; in these cases, the performance of our algorithm and that of CCR is similar.
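For readers who want to reproduce a comparable setup, the snippet below shows one way to turn the simulated path-loss model above into the linear-scale channel gains ℎ𝑛,𝑘 used by the energy model. The log-normal shadowing term and the function name are our assumptions about an otherwise standard dB-to-linear conversion, not code from the paper.

```python
import numpy as np

def channel_gain(distance_km, shadow_std_db=8.0, rng=np.random.default_rng(0)):
    """Channel gain for the simulated model:
    path loss (dB) = 128.1 + 37.6 * log10(d [km]), with 8 dB log-normal shadowing."""
    pl_db = 128.1 + 37.6 * np.log10(distance_km) + rng.normal(0.0, shadow_std_db)
    return 10 ** (-pl_db / 10)      # convert dB attenuation to a linear-scale gain

# Example: gain between a client and a server 150 m (0.15 km) apart.
print(channel_gain(0.15))
```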
6 CONCLUSIONS
This paper made an initial attempt to study client clustering for clustered FL in a wireless network, taking into account the network constraints and objectives. Based on the effective number of clients, we formulated an energy-efficient clustering optimization problem and developed a new algorithm to find the optimal/stable clusters. Several future research directions are worth exploring to further advance this study. First, dynamic client clustering may discover more accurate client correlations based on intermediate learning outcomes; such information may be incorporated into our formulation to perform periodic (re-)clustering. Second, for mobile clients, the energy cost of associating with different servers becomes more difficult to evaluate. Third, when the BSes have limited bandwidth resources, client clustering may have to be performed jointly with resource allocation.

REFERENCES
[1] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. 2004. Correlation clustering. Machine Learning 56, 1 (2004), 89–113.
[2] Avishek Ghosh, Jichan Chung, Dong Yin, and Kannan Ramchandran. 2020. An efficient framework for clustered federated learning. arXiv preprint arXiv:2006.04088 (2020).
[3] John Hammersley. 2013. Monte Carlo Methods. Springer Science & Business Media.
[4] Jakub Konecny, H. Brendan McMahan, Felix X. Yu, Peter Richtarik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
[5] Solomon Kullback. 1997. Information Theory and Statistics. Courier Corporation.
[6] Yishay Mansour, Mehryar Mohri, Jae Ro, and Ananda Theertha Suresh. 2020. Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619 (2020).
[7] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. PMLR, 1273–1282.
[8] John Mitchem. 1976. On various algorithms for estimating the chromatic number of a graph. Comput. J. 19, 2 (1976), 182–183.
[9] Felix Sattler, Klaus-Robert Muller, and Wojciech Samek. 2020. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints. IEEE Transactions on Neural Networks and Learning Systems (2020).