Skip to main content

taeho jung

Followers

2

Following

0

Public Views

Liliana González

Liliana Gonzalez

Florida International University

University of Maryland Baltimore County

Calicut University

Purdue University

Interests

Uploads

Papers by taeho jung

Ensuring Privacy in Location-Based Services: A Model-based Approach

arXiv (Cornell University), Feb 24, 2020

In recent years, the widespread of mobile devices equipped with GPS and communication chips has l... more In recent years, the widespread of mobile devices equipped with GPS and communication chips has led to the growing use of location-based services (LBS) in which a user receives a service based on his current location. The disclosure of user's location, however, can raise serious concerns about user privacy in general, and location privacy in particular which led to the development of various location privacy-preserving mechanisms aiming to enhance the location privacy while using LBS applications. In this paper, we propose to model the user mobility pattern and utility of the LBS as a Markov decision process (MDP), and inspired by probabilistic current state opacity notation, we introduce a new location privacy metric, namely −privacy, that quantifies the adversary belief over the user's current location. We exploit this dynamic model to design a LPPM that while it ensures the utility of service is being fully utilized, independent of the adversary prior knowledge about the user, it can guarantee a user-specified privacy level can be achieved for an infinite time horizon. The overall privacy-preserving framework, including the construction of the user mobility model as a MDP, and design of the proposed LPPM, are demonstrated and validated with real-world experimental data.

Non-Interactive MPC with Trusted Hardware Secure Against Residual Function Attacks

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2019

Secure multiparty computation (MPC) has been repeatedly optimized, and protocols with two communi... more Secure multiparty computation (MPC) has been repeatedly optimized, and protocols with two communication rounds and strong security guarantees have been achieved. While progress has been made constructing non-interactive protocols with just one-round of online communication (i.e., non-interactive MPC or NI-MPC), since correct evaluation must be guaranteed with only one round, these protocols are by their nature vulnerable to the residual function attack in the standard model. This is because a party that receives a garbled circuit may repeatedly evaluate the circuit locally, while varying their own inputs and fixing the inputs of others to learn the values entered by other participants. We present the first MPC protocol with a one-round online phase that is secure against the residual function attack. We also present rigorous proofs of correctness and security in the covert adversary model, a reduction of the malicious model that is stronger than the semi-honest model and better suited for modeling the behaviour of parties in the real world, for our protocol. Furthermore, we rigorously analyze the communication and computational complexity of current state of the art protocols which require two rounds of communication or one round during the online-phase with a reduced security requirement, and demonstrate that our protocol is comparable to or outperforms their complexity.

Federated Dynamic Graph Neural Networks with Secure Aggregation for Video-based Distributed Surveillance

ACM Transactions on Intelligent Systems and Technology, 2022

Distributed surveillance systems have the ability to detect, track, and snapshot objects moving a... more Distributed surveillance systems have the ability to detect, track, and snapshot objects moving around in a certain space. The systems generate video data from multiple personal devices or street cameras. Intelligent video-analysis models are needed to learn dynamic representation of the objects for detection and tracking. Can we exploit the structural and dynamic information without storing the spatiotemporal video data at a central server that leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn the object representations from graph sequences: (1) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. (2) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local m...

Secure Single-Server Nearly-Identical Image Deduplication

2020 29th International Conference on Computer Communications and Networks (ICCCN), 2020

Cloud computing is often utilized for file storage. Clients of cloud storage services want to ens... more Cloud computing is often utilized for file storage. Clients of cloud storage services want to ensure the privacy of their data, and both clients and servers want to use as little storage as possible. Cross-user deduplication is one method to reduce the amount of storage a server uses. Deduplication and privacy are naturally conflicting goals, especially for nearly-identical ("fuzzy") deduplication, as some information about the data must be used to perform deduplication. Prior solutions thus utilize multiple servers, or only function for exact deduplication. In this paper, we present a single-server protocol for cross-user nearly-identical deduplication based on secure locality-sensitive hashing (SLSH). We formally define our ideal security, and rigorously prove our protocol secure against fully malicious, colluding adversaries with a proof by simulation. We show experimentally that the individual parts of the protocol are computationally feasible, and further discuss practical issues of security and efficiency.

VoiceMask: Anonymize and Sanitize Voice Input on Mobile Devices

ArXiv, 2017

Voice input has been tremendously improving the user experience of mobile devices by freeing our ... more Voice input has been tremendously improving the user experience of mobile devices by freeing our hands from typing on the small screen. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, the cloud might compromise users' privacy by identifying their identities by voice, learning their sensitive input content via speech recognition, and then profiling the mobile users based on the content. In this paper, we design an intermediate between users and the cloud, named VoiceMask, to sanitize users' voice data before sending it to the cloud for speech recognition. We analyze the potential privacy risks and aim to protect users' identities and sensitive input content from being disclosed to the cloud. VoiceMask adopts a carefully designed voice conversion mechanism that is resistant to several attacks. Meanwhile, it utilizes an evolution-based keyword substitution technique to sanitize th...

Federated Dynamic GNN with Secure Aggregation

ArXiv, 2020

Given video data from multiple personal devices or street cameras, can we exploit the structural ... more Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representation of objects for applications such as distributed surveillance, without storing data at a central server that leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn the object representations from multi-user graph sequences: i) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. ii) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to server. iii) Studies showed that...

Cryptonite: A Framework for Flexible Time-Series Secure Aggregation with Online Fault Tolerance

IACR Cryptol. ePrint Arch., 2020

Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a... more Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a set of multiple participants’ data while ensuring the data remains private. Existing works rely on a trusted party to enable an aggregator to achieve offline fault tolerance, but in the real world this may not be practical. We develop a new framework that supports PSA in a way that is robust to online user faults, while still supporting a strong guarantee on each individual’s privacy. We first must define a new level of security in the presence of online faults and malicious adversaries because the existing definition does not account for online faults. After this we describe a general framework that allows existing work to reach this new level of security. Furthermore, we develop the first protocol that provably reaches this level of security by leveraging trusted hardware. After we develop a methodology to outsource computationally intensive work to higher performance devices, while s...

Enabling Faster Operations for Deeper Circuits in Full RNS Variants of FV-like Somewhat Homomorphic Encryption

IACR Cryptol. ePrint Arch., 2020

Though Fully Homomorphic Encryption (FHE) has been realized, most practical implementations utili... more Though Fully Homomorphic Encryption (FHE) has been realized, most practical implementations utilize leveled Somewhat Homomorphic Encryption (SHE) schemes, which have limits on the multiplicative depth of the circuits they can evaluate and avoid computationally intensive bootstrapping. Many SHE schemes exist, among which those based on Ring Learning With Error (RLWE) with operations on large polynomial rings are popular. Of these, variants allowing operations to occur fully in Residue Number Systems (RNS) have been constructed. This optimization allows homomorphic operations directly on RNS components without needing to reconstruct numbers from their RNS representation, making SHE implementations faster and highly parallel. In this paper, we present a set of optimizations to a popular RNS variant of the B/FV encryption scheme that allow for the use of significantly larger ciphertext moduli (e.g., thousands of bits) without increased overhead due to excessive numbers of RNS components...

GPS: Integration of Graphene, PALISADE, and SGX for Large-scale Aggregations of Distributed Data

IACR Cryptol. ePrint Arch., 2021

Secure computing methods such as fully homomorphic encryption and hardware solutions such as Inte... more Secure computing methods such as fully homomorphic encryption and hardware solutions such as Intel Software Guard Extension (SGX) have been applied to provide security for user input in privacy-oriented computation outsourcing. Fully homomorphic encryption is amenable to parallelization and hardware acceleration to improve its scalability and latency, but is limited in the complexity of functions it can efficiently evaluate. SGX is capable of arbitrarily complex calculations, but due to expensive memory paging and context switches, computations in SGX are bound by practical limits. These limitations make either of fully homomorphic encryption or SGX alone unsuitable for large-scale multi-user computations with complex intermediate calculations. In this paper, we present GPS, a novel framework integrating the Graphene, PALISADE, and SGX technologies. GPS combines the scalability of homomorphic encryption with the arbitrary computational abilities of SGX, forming a more functional and...

Graph-based privacy-preserving data publication

IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016

We propose a graph-based framework for privacy preserving data publication, which is a systematic... more We propose a graph-based framework for privacy preserving data publication, which is a systematic abstraction of existing anonymity approaches and privacy criteria. Graph is explored for dataset representation, background knowledge specification, anonymity operation design, as well as attack inferring analysis. The framework is designed to accommodate various datasets including social networks, relational tables, temporal and spatial sequences, and even possible unknown data models. The privacy and utility measurements of the anonymity datasets are also quantified in terms of graph features. Our experiments show that the framework is capable of facilitating privacy protection by different anonymity approaches for various datasets with desirable performance.

Computing-in-Memory for Performance and Energy-Efficient Homomorphic Encryption

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020

Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous resear... more Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. In this regard, the enormous size of ciphertexts involved in HE computations degrades computational efficiency. Near-memory Processing (NMP) and Computing-in-memory (CiM)-paradigms where computation is done within the memory boundariesrepresent architectural solutions for reducing latency and energy associated with data transfers in data-intensive applications such as HE. This paper introduces CiM-HE, a Computing-in-memory (CiM) architecture that can support operations for the B/FV scheme, a somewhat homomorphic encryption scheme for general computation. CiM-HE hardware consists of customized peripherals such as sense amplifiers, adders, bit-shifters, and sequencing circuits. The peripherals are based on CMOS technology, and could support computations with memory cells of different technologies. Circuit-level simulations are used to evaluate our CiM-HE framework assuming a 6T-SRAM memory. We compare our CiM-HE implementation against (i) two optimized CPU HE implementations, and (ii) an FPGA-based HE accelerator implementation.When compared to a CPU solution, CiM-HE obtains speedups between 4.6x and 9.1x, and energy savings between 266.4x and 532.8x for homomorphic multiplications (the most expensive HE operation). Also, a set of four end-toend tasks, i.e., mean, variance, linear regression, and inference are up to 1.1x, 7.7x, 7.1x, and 7.5x faster (and 301.1x, 404.6x, 532.3x, and 532.8x more energy efficient). Compared to CPUbased HE in a previous work, CiM-HE obtain 14.3x speed-up and >2600x energy savings. Finally,our design offers 2.2x speed-up with 88.1x energy savings compared to a state-of-the-art FPGAbased accelerator.

Cryptonite: A Framework for Flexible Time-Series Secure Aggregation with Non-interactive Fault Recovery

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2021

Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a... more Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a set of multiple participants' data while ensuring the data remains private. Existing works rely on a trusted third party to enable an aggregator to achieve fault tolerance, that requires interactive recovery, but in the real world this may not be practical or secure. We develop a new formal framework for PSA that accounts for user faults, and can support non-interactive recovery, while still supporting strong individual privacy guarantees. We first must define a new level of security in the presence of faults and malicious adversaries because the existing definitions do not account for faults and the security implications of the recovery. After this we develop the first protocol that provably reaches this level of security, i.e., individual inputs are private even after the aggregator's recovery, and reach new levels of scalability and communication efficiency over existing work seeking to support fault tolerance. The techniques we develop are general, and can be used to augment any PSA scheme to support non-interactive fault recovery.

Cryptonomial: A Framework for Private Time-Series Polynomial Calculations

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2021

In modern times, data collected from multiuser distributed applications must be analyzed on a mas... more In modern times, data collected from multiuser distributed applications must be analyzed on a massive scale to support critical business objectives. While analytics often requires the use of personal data, it may compromise user privacy expectations if this analysis is conducted over plaintext data. Private Stream Aggregation (PSA) allows for the aggregation of time-series data, while still providing strong privacy guarantees, and is significantly more efficient over a network than related techniques (e.g. homomorphic encryption, secure multiparty computation, etc.) due to its asynchronous and efficient protocols. However, PSA protocols face limitations and can only compute basic functions, such as sum, average, etc.. We present Cryptonomial, a framework for converting any PSA scheme amenable to a complex canonical embedding into a secure computation protocol that can compute any function over timeseries data that can be written as a multivariate polynomial, by combining PSA and a Trusted Execution Environment. This design allows us to compute the parallelizable sections of our protocol outside the TEE using advanced hardware, that can take better advantage of parallelism. We show that Cryptonomial inherits the security requirements of PSA, and supports fully malicious security. We simulate our scheme, and show that our techniques enable performance that is orders of magnitude faster than similar work supporting polynomial calculations.

CASTLE: Enhancing the Utility of Inequality Query Auditing Without Denial Threats

IEEE Transactions on Information Forensics and Security, 2018

Scalable Privacy-Preserving Participant Selection for Mobile Crowdsensing Systems: Participant Grouping and Secure Group Bidding

IEEE Transactions on Network Science and Engineering, 2018

Mobile crowdsensing (MCS) has been emerging as a new sensing paradigm where vast numbers of mobil... more Mobile crowdsensing (MCS) has been emerging as a new sensing paradigm where vast numbers of mobile devices are used for sensing and collecting data in various applications. Auction based participant selection has been widely used for current MCS systems to achieve user incentive and task assignment optimization. However, participant selection problems solved with auction-based approaches usually involve participants' privacy concerns because a participant's bids may contain her private information (such as location visiting patterns), and disclosure of participants' bids may disclose their private information as well. In this paper, we study how to protect such bid privacy in a temporally and spatially dynamic MCS system. We assume that both sensing tasks and mobile participants have dynamic characteristics over spatial and temporal domains. Following the classical VCG auction, we carefully design a scalable grouping based privacy-preserving participant selection scheme, where participants are grouped into multiple participant groups and then auctions are organized within groups via secure group bidding. By leveraging Lagrange polynomial interpolation to perturb participants' bids within groups, participants' bid privacy is preserved. In addition, the proposed solution does not affect the operation of current MCS platform since the groups act as regular users to the platform. Both theoretical analysis and real-life tracing data simulations verify the efficiency and security of the proposed solution.

Social Network De-anonymization

ACM Transactions on Internet Technology, 2019

Previous works on social network de-anonymization focus on designing accurate and efficient de-an... more Previous works on social network de-anonymization focus on designing accurate and efficient de-anonymization methods. We attempt to investigate the intrinsic relationship between the attacker’s knowledge and the expected de-anonymization gain. A common intuition is that more knowledge results in more successful de-anonymization. However, our analysis shows this is not necessarily true if the attacker uses the full background knowledge for de-anonymization. Our findings leave intriguing implications for the attacker to make better use of the background knowledge for de-anonymization and for the data owners to better measure the privacy risk when releasing their data to third parties.

PDA: Semantically Secure Time-Series Data Analytics with Dynamic User Groups

IEEE Transactions on Dependable and Secure Computing, 2016

Third-party analysis on private records is becoming increasingly important due to the widespread ... more Third-party analysis on private records is becoming increasingly important due to the widespread data collection for various analysis purposes. However, the data in its original form often contains sensitive information about individuals, and its publication will severely breach their privacy. In this paper, we present a novel Privacy-preserving Data Analytics framework PDA, which allows a third-party aggregator to obliviously conduct many different types of polynomial-based analysis on private data records provided by a dynamic subgroup of users. Notably, every user needs to keep only O(n) keys to join data analysis among O(2 n) different groups of users, and any data analysis that is represented by polynomials is supported by our framework. Besides, a real implementation shows the performance of our framework is comparable to the peer works who present ad-hoc solutions for specific data analysis applications. Despite such nice properties of PDA, it is provably secure against a very powerful attacker (chosen-plaintext attack) even in the Dolev-Yao network model where all communication channels are insecure.

Social Network De-Anonymization and Privacy Inference with Knowledge Graph Model

IEEE Transactions on Dependable and Secure Computing, 2017

Social network data is widely shared, transferred and published for research purposes and busines... more Social network data is widely shared, transferred and published for research purposes and business interests, but it has raised much concern on users' privacy. Even though users' identity information is always removed, attackers can still de-anonymize users with the help of auxiliary information. To protect against de-anonymization attack, various privacy protection techniques for social networks have been proposed. However, most existing approaches assume specific and restrict network structure as background knowledge and ignore semantic level prior belief of attackers, which are not always realistic in practice and do not apply to arbitrary privacy scenarios. Moreover, the privacy inference attack in the presence of semantic background knowledge is barely investigated. To address these shortcomings, in this work, we introduce knowledge graphs to explicitly express arbitrary prior belief of the attacker for any individual user. The processes of de-anonymization and privacy inference are accordingly formulated based on knowledge graphs. Our experiment on data of real social networks shows that knowledge graphs can power de-anonymization and inference attacks, and thus increase the risk of privacy disclosure. This suggests the validity of knowledge graphs as a general effective model of attackers' background knowledge for social network attack and privacy preservation.

PIC: Enable Large-Scale Privacy Preserving Content-Based Image Search on Cloud

2015 44th International Conference on Parallel Processing, 2015

High-resolution cameras produce huge volume of high quality images everyday. It is extremely chal... more High-resolution cameras produce huge volume of high quality images everyday. It is extremely challenging to store, share and especially search those huge images, for which increasing number of cloud services are presented to support such functionalities. However, images tend to contain rich sensitive information (e.g., people, location and event), and people's privacy concerns hinder their readily participation into the services provided by untrusted third parties. In this work, we introduce PIC: a Privacy-preserving large-scale Image search system on Cloud. Our system enables efficient yet secure content-based image search with fine-grained access control, and it also provides privacy-preserving image storage and sharing among users. Users can specify who can/cannot search on their images when using the system, and they can search on others' images if they satisfy the condition specified by the image owners. Majority of the computationally intensive jobs are outsourced to the cloud side, and users only need to submit the query and receive the result throughout the entire image search. Specially, to deal with massive images, we design our system suitable for distributed and parallel computation and introduce several optimizations to further expedite the search process. We implement a prototype of PIC including both cloud side and client side. The cloud side is a cluster of computers with distributed file system (Hadoop HDFS) and MapReduce architecture (Hadoop MapReduce). The client side is built for both Windows OS laptops and Android phones. We evaluate the prototype system with large sets of real-life photos. Our security analysis and evaluation results show that PIC successfully protect the image privacy at a low cost of computation and communication.

A Framework for Optimization in Big Data: Privacy-Preserving Multi-agent Greedy Algorithm

Lecture Notes in Computer Science, 2015

Due to the variety of the data source and the veracity of their trustworthiness, it is challengin... more Due to the variety of the data source and the veracity of their trustworthiness, it is challenging to solve the distributed optimization problems in the big data applications owing to the privacy concerns. We propose a framework for distributed multi-agent greedy algorithms whereby any greedy algorithm that fits our requirement can be converted to a privacy-preserving one. After the conversion, the private information associated with each agent will not be disclosed to anyone else but the owner, and the same output as the plain greedy algorithm is computed by the converted one. Our theoretic analysis shows the security of the framework, and the implementation also shows good performance.

Ensuring Privacy in Location-Based Services: A Model-based Approach

arXiv (Cornell University), Feb 24, 2020

In recent years, the widespread of mobile devices equipped with GPS and communication chips has l... more In recent years, the widespread of mobile devices equipped with GPS and communication chips has led to the growing use of location-based services (LBS) in which a user receives a service based on his current location. The disclosure of user's location, however, can raise serious concerns about user privacy in general, and location privacy in particular which led to the development of various location privacy-preserving mechanisms aiming to enhance the location privacy while using LBS applications. In this paper, we propose to model the user mobility pattern and utility of the LBS as a Markov decision process (MDP), and inspired by probabilistic current state opacity notation, we introduce a new location privacy metric, namely −privacy, that quantifies the adversary belief over the user's current location. We exploit this dynamic model to design a LPPM that while it ensures the utility of service is being fully utilized, independent of the adversary prior knowledge about the user, it can guarantee a user-specified privacy level can be achieved for an infinite time horizon. The overall privacy-preserving framework, including the construction of the user mobility model as a MDP, and design of the proposed LPPM, are demonstrated and validated with real-world experimental data.

Non-Interactive MPC with Trusted Hardware Secure Against Residual Function Attacks

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2019

Secure multiparty computation (MPC) has been repeatedly optimized, and protocols with two communi... more Secure multiparty computation (MPC) has been repeatedly optimized, and protocols with two communication rounds and strong security guarantees have been achieved. While progress has been made constructing non-interactive protocols with just one-round of online communication (i.e., non-interactive MPC or NI-MPC), since correct evaluation must be guaranteed with only one round, these protocols are by their nature vulnerable to the residual function attack in the standard model. This is because a party that receives a garbled circuit may repeatedly evaluate the circuit locally, while varying their own inputs and fixing the inputs of others to learn the values entered by other participants. We present the first MPC protocol with a one-round online phase that is secure against the residual function attack. We also present rigorous proofs of correctness and security in the covert adversary model, a reduction of the malicious model that is stronger than the semi-honest model and better suited for modeling the behaviour of parties in the real world, for our protocol. Furthermore, we rigorously analyze the communication and computational complexity of current state of the art protocols which require two rounds of communication or one round during the online-phase with a reduced security requirement, and demonstrate that our protocol is comparable to or outperforms their complexity.

Federated Dynamic Graph Neural Networks with Secure Aggregation for Video-based Distributed Surveillance

ACM Transactions on Intelligent Systems and Technology, 2022

Distributed surveillance systems have the ability to detect, track, and snapshot objects moving a... more Distributed surveillance systems have the ability to detect, track, and snapshot objects moving around in a certain space. The systems generate video data from multiple personal devices or street cameras. Intelligent video-analysis models are needed to learn dynamic representation of the objects for detection and tracking. Can we exploit the structural and dynamic information without storing the spatiotemporal video data at a central server that leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn the object representations from graph sequences: (1) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. (2) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local m...

Secure Single-Server Nearly-Identical Image Deduplication

2020 29th International Conference on Computer Communications and Networks (ICCCN), 2020

Cloud computing is often utilized for file storage. Clients of cloud storage services want to ens... more Cloud computing is often utilized for file storage. Clients of cloud storage services want to ensure the privacy of their data, and both clients and servers want to use as little storage as possible. Cross-user deduplication is one method to reduce the amount of storage a server uses. Deduplication and privacy are naturally conflicting goals, especially for nearly-identical ("fuzzy") deduplication, as some information about the data must be used to perform deduplication. Prior solutions thus utilize multiple servers, or only function for exact deduplication. In this paper, we present a single-server protocol for cross-user nearly-identical deduplication based on secure locality-sensitive hashing (SLSH). We formally define our ideal security, and rigorously prove our protocol secure against fully malicious, colluding adversaries with a proof by simulation. We show experimentally that the individual parts of the protocol are computationally feasible, and further discuss practical issues of security and efficiency.

VoiceMask: Anonymize and Sanitize Voice Input on Mobile Devices

ArXiv, 2017

Voice input has been tremendously improving the user experience of mobile devices by freeing our ... more Voice input has been tremendously improving the user experience of mobile devices by freeing our hands from typing on the small screen. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, the cloud might compromise users' privacy by identifying their identities by voice, learning their sensitive input content via speech recognition, and then profiling the mobile users based on the content. In this paper, we design an intermediate between users and the cloud, named VoiceMask, to sanitize users' voice data before sending it to the cloud for speech recognition. We analyze the potential privacy risks and aim to protect users' identities and sensitive input content from being disclosed to the cloud. VoiceMask adopts a carefully designed voice conversion mechanism that is resistant to several attacks. Meanwhile, it utilizes an evolution-based keyword substitution technique to sanitize th...

Federated Dynamic GNN with Secure Aggregation

ArXiv, 2020

Given video data from multiple personal devices or street cameras, can we exploit the structural ... more Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representation of objects for applications such as distributed surveillance, without storing data at a central server that leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn the object representations from multi-user graph sequences: i) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. ii) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to server. iii) Studies showed that...

Cryptonite: A Framework for Flexible Time-Series Secure Aggregation with Online Fault Tolerance

IACR Cryptol. ePrint Arch., 2020

Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a... more Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a set of multiple participants’ data while ensuring the data remains private. Existing works rely on a trusted party to enable an aggregator to achieve offline fault tolerance, but in the real world this may not be practical. We develop a new framework that supports PSA in a way that is robust to online user faults, while still supporting a strong guarantee on each individual’s privacy. We first must define a new level of security in the presence of online faults and malicious adversaries because the existing definition does not account for online faults. After this we describe a general framework that allows existing work to reach this new level of security. Furthermore, we develop the first protocol that provably reaches this level of security by leveraging trusted hardware. After we develop a methodology to outsource computationally intensive work to higher performance devices, while s...

Enabling Faster Operations for Deeper Circuits in Full RNS Variants of FV-like Somewhat Homomorphic Encryption

IACR Cryptol. ePrint Arch., 2020

Though Fully Homomorphic Encryption (FHE) has been realized, most practical implementations utili... more Though Fully Homomorphic Encryption (FHE) has been realized, most practical implementations utilize leveled Somewhat Homomorphic Encryption (SHE) schemes, which have limits on the multiplicative depth of the circuits they can evaluate and avoid computationally intensive bootstrapping. Many SHE schemes exist, among which those based on Ring Learning With Error (RLWE) with operations on large polynomial rings are popular. Of these, variants allowing operations to occur fully in Residue Number Systems (RNS) have been constructed. This optimization allows homomorphic operations directly on RNS components without needing to reconstruct numbers from their RNS representation, making SHE implementations faster and highly parallel. In this paper, we present a set of optimizations to a popular RNS variant of the B/FV encryption scheme that allow for the use of significantly larger ciphertext moduli (e.g., thousands of bits) without increased overhead due to excessive numbers of RNS components...

GPS: Integration of Graphene, PALISADE, and SGX for Large-scale Aggregations of Distributed Data

IACR Cryptol. ePrint Arch., 2021

Secure computing methods such as fully homomorphic encryption and hardware solutions such as Inte... more Secure computing methods such as fully homomorphic encryption and hardware solutions such as Intel Software Guard Extension (SGX) have been applied to provide security for user input in privacy-oriented computation outsourcing. Fully homomorphic encryption is amenable to parallelization and hardware acceleration to improve its scalability and latency, but is limited in the complexity of functions it can efficiently evaluate. SGX is capable of arbitrarily complex calculations, but due to expensive memory paging and context switches, computations in SGX are bound by practical limits. These limitations make either of fully homomorphic encryption or SGX alone unsuitable for large-scale multi-user computations with complex intermediate calculations. In this paper, we present GPS, a novel framework integrating the Graphene, PALISADE, and SGX technologies. GPS combines the scalability of homomorphic encryption with the arbitrary computational abilities of SGX, forming a more functional and...

Graph-based privacy-preserving data publication

IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016

We propose a graph-based framework for privacy preserving data publication, which is a systematic... more We propose a graph-based framework for privacy preserving data publication, which is a systematic abstraction of existing anonymity approaches and privacy criteria. Graph is explored for dataset representation, background knowledge specification, anonymity operation design, as well as attack inferring analysis. The framework is designed to accommodate various datasets including social networks, relational tables, temporal and spatial sequences, and even possible unknown data models. The privacy and utility measurements of the anonymity datasets are also quantified in terms of graph features. Our experiments show that the framework is capable of facilitating privacy protection by different anonymity approaches for various datasets with desirable performance.

Computing-in-Memory for Performance and Energy-Efficient Homomorphic Encryption

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020

Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous resear... more Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. In this regard, the enormous size of ciphertexts involved in HE computations degrades computational efficiency. Near-memory Processing (NMP) and Computing-in-memory (CiM)-paradigms where computation is done within the memory boundariesrepresent architectural solutions for reducing latency and energy associated with data transfers in data-intensive applications such as HE. This paper introduces CiM-HE, a Computing-in-memory (CiM) architecture that can support operations for the B/FV scheme, a somewhat homomorphic encryption scheme for general computation. CiM-HE hardware consists of customized peripherals such as sense amplifiers, adders, bit-shifters, and sequencing circuits. The peripherals are based on CMOS technology, and could support computations with memory cells of different technologies. Circuit-level simulations are used to evaluate our CiM-HE framework assuming a 6T-SRAM memory. We compare our CiM-HE implementation against (i) two optimized CPU HE implementations, and (ii) an FPGA-based HE accelerator implementation.When compared to a CPU solution, CiM-HE obtains speedups between 4.6x and 9.1x, and energy savings between 266.4x and 532.8x for homomorphic multiplications (the most expensive HE operation). Also, a set of four end-toend tasks, i.e., mean, variance, linear regression, and inference are up to 1.1x, 7.7x, 7.1x, and 7.5x faster (and 301.1x, 404.6x, 532.3x, and 532.8x more energy efficient). Compared to CPUbased HE in a previous work, CiM-HE obtain 14.3x speed-up and >2600x energy savings. Finally,our design offers 2.2x speed-up with 88.1x energy savings compared to a state-of-the-art FPGAbased accelerator.

Cryptonite: A Framework for Flexible Time-Series Secure Aggregation with Non-interactive Fault Recovery

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2021

Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a... more Private stream aggregation (PSA) allows an untrusted data aggregator to compute statistics over a set of multiple participants' data while ensuring the data remains private. Existing works rely on a trusted third party to enable an aggregator to achieve fault tolerance, that requires interactive recovery, but in the real world this may not be practical or secure. We develop a new formal framework for PSA that accounts for user faults, and can support non-interactive recovery, while still supporting strong individual privacy guarantees. We first must define a new level of security in the presence of faults and malicious adversaries because the existing definitions do not account for faults and the security implications of the recovery. After this we develop the first protocol that provably reaches this level of security, i.e., individual inputs are private even after the aggregator's recovery, and reach new levels of scalability and communication efficiency over existing work seeking to support fault tolerance. The techniques we develop are general, and can be used to augment any PSA scheme to support non-interactive fault recovery.

Cryptonomial: A Framework for Private Time-Series Polynomial Calculations

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2021

In modern times, data collected from multiuser distributed applications must be analyzed on a mas... more In modern times, data collected from multiuser distributed applications must be analyzed on a massive scale to support critical business objectives. While analytics often requires the use of personal data, it may compromise user privacy expectations if this analysis is conducted over plaintext data. Private Stream Aggregation (PSA) allows for the aggregation of time-series data, while still providing strong privacy guarantees, and is significantly more efficient over a network than related techniques (e.g. homomorphic encryption, secure multiparty computation, etc.) due to its asynchronous and efficient protocols. However, PSA protocols face limitations and can only compute basic functions, such as sum, average, etc.. We present Cryptonomial, a framework for converting any PSA scheme amenable to a complex canonical embedding into a secure computation protocol that can compute any function over timeseries data that can be written as a multivariate polynomial, by combining PSA and a Trusted Execution Environment. This design allows us to compute the parallelizable sections of our protocol outside the TEE using advanced hardware, that can take better advantage of parallelism. We show that Cryptonomial inherits the security requirements of PSA, and supports fully malicious security. We simulate our scheme, and show that our techniques enable performance that is orders of magnitude faster than similar work supporting polynomial calculations.

CASTLE: Enhancing the Utility of Inequality Query Auditing Without Denial Threats

IEEE Transactions on Information Forensics and Security, 2018

Scalable Privacy-Preserving Participant Selection for Mobile Crowdsensing Systems: Participant Grouping and Secure Group Bidding

IEEE Transactions on Network Science and Engineering, 2018

Mobile crowdsensing (MCS) has been emerging as a new sensing paradigm where vast numbers of mobil... more Mobile crowdsensing (MCS) has been emerging as a new sensing paradigm where vast numbers of mobile devices are used for sensing and collecting data in various applications. Auction based participant selection has been widely used for current MCS systems to achieve user incentive and task assignment optimization. However, participant selection problems solved with auction-based approaches usually involve participants' privacy concerns because a participant's bids may contain her private information (such as location visiting patterns), and disclosure of participants' bids may disclose their private information as well. In this paper, we study how to protect such bid privacy in a temporally and spatially dynamic MCS system. We assume that both sensing tasks and mobile participants have dynamic characteristics over spatial and temporal domains. Following the classical VCG auction, we carefully design a scalable grouping based privacy-preserving participant selection scheme, where participants are grouped into multiple participant groups and then auctions are organized within groups via secure group bidding. By leveraging Lagrange polynomial interpolation to perturb participants' bids within groups, participants' bid privacy is preserved. In addition, the proposed solution does not affect the operation of current MCS platform since the groups act as regular users to the platform. Both theoretical analysis and real-life tracing data simulations verify the efficiency and security of the proposed solution.

Social Network De-anonymization

ACM Transactions on Internet Technology, 2019

Previous works on social network de-anonymization focus on designing accurate and efficient de-an... more Previous works on social network de-anonymization focus on designing accurate and efficient de-anonymization methods. We attempt to investigate the intrinsic relationship between the attacker’s knowledge and the expected de-anonymization gain. A common intuition is that more knowledge results in more successful de-anonymization. However, our analysis shows this is not necessarily true if the attacker uses the full background knowledge for de-anonymization. Our findings leave intriguing implications for the attacker to make better use of the background knowledge for de-anonymization and for the data owners to better measure the privacy risk when releasing their data to third parties.

PDA: Semantically Secure Time-Series Data Analytics with Dynamic User Groups

IEEE Transactions on Dependable and Secure Computing, 2016

Third-party analysis on private records is becoming increasingly important due to the widespread ... more Third-party analysis on private records is becoming increasingly important due to the widespread data collection for various analysis purposes. However, the data in its original form often contains sensitive information about individuals, and its publication will severely breach their privacy. In this paper, we present a novel Privacy-preserving Data Analytics framework PDA, which allows a third-party aggregator to obliviously conduct many different types of polynomial-based analysis on private data records provided by a dynamic subgroup of users. Notably, every user needs to keep only O(n) keys to join data analysis among O(2 n) different groups of users, and any data analysis that is represented by polynomials is supported by our framework. Besides, a real implementation shows the performance of our framework is comparable to the peer works who present ad-hoc solutions for specific data analysis applications. Despite such nice properties of PDA, it is provably secure against a very powerful attacker (chosen-plaintext attack) even in the Dolev-Yao network model where all communication channels are insecure.

Social Network De-Anonymization and Privacy Inference with Knowledge Graph Model

IEEE Transactions on Dependable and Secure Computing, 2017

Social network data is widely shared, transferred and published for research purposes and busines... more Social network data is widely shared, transferred and published for research purposes and business interests, but it has raised much concern on users' privacy. Even though users' identity information is always removed, attackers can still de-anonymize users with the help of auxiliary information. To protect against de-anonymization attack, various privacy protection techniques for social networks have been proposed. However, most existing approaches assume specific and restrict network structure as background knowledge and ignore semantic level prior belief of attackers, which are not always realistic in practice and do not apply to arbitrary privacy scenarios. Moreover, the privacy inference attack in the presence of semantic background knowledge is barely investigated. To address these shortcomings, in this work, we introduce knowledge graphs to explicitly express arbitrary prior belief of the attacker for any individual user. The processes of de-anonymization and privacy inference are accordingly formulated based on knowledge graphs. Our experiment on data of real social networks shows that knowledge graphs can power de-anonymization and inference attacks, and thus increase the risk of privacy disclosure. This suggests the validity of knowledge graphs as a general effective model of attackers' background knowledge for social network attack and privacy preservation.

PIC: Enable Large-Scale Privacy Preserving Content-Based Image Search on Cloud

2015 44th International Conference on Parallel Processing, 2015

High-resolution cameras produce huge volume of high quality images everyday. It is extremely chal... more High-resolution cameras produce huge volume of high quality images everyday. It is extremely challenging to store, share and especially search those huge images, for which increasing number of cloud services are presented to support such functionalities. However, images tend to contain rich sensitive information (e.g., people, location and event), and people's privacy concerns hinder their readily participation into the services provided by untrusted third parties. In this work, we introduce PIC: a Privacy-preserving large-scale Image search system on Cloud. Our system enables efficient yet secure content-based image search with fine-grained access control, and it also provides privacy-preserving image storage and sharing among users. Users can specify who can/cannot search on their images when using the system, and they can search on others' images if they satisfy the condition specified by the image owners. Majority of the computationally intensive jobs are outsourced to the cloud side, and users only need to submit the query and receive the result throughout the entire image search. Specially, to deal with massive images, we design our system suitable for distributed and parallel computation and introduce several optimizations to further expedite the search process. We implement a prototype of PIC including both cloud side and client side. The cloud side is a cluster of computers with distributed file system (Hadoop HDFS) and MapReduce architecture (Hadoop MapReduce). The client side is built for both Windows OS laptops and Android phones. We evaluate the prototype system with large sets of real-life photos. Our security analysis and evaluation results show that PIC successfully protect the image privacy at a low cost of computation and communication.

A Framework for Optimization in Big Data: Privacy-Preserving Multi-agent Greedy Algorithm

Lecture Notes in Computer Science, 2015

Due to the variety of the data source and the veracity of their trustworthiness, it is challengin... more Due to the variety of the data source and the veracity of their trustworthiness, it is challenging to solve the distributed optimization problems in the big data applications owing to the privacy concerns. We propose a framework for distributed multi-agent greedy algorithms whereby any greedy algorithm that fits our requirement can be converted to a privacy-preserving one. After the conversion, the private information associated with each agent will not be disclosed to anyone else but the owner, and the same output as the plain greedy algorithm is computed by the converted one. Our theoretic analysis shows the security of the framework, and the implementation also shows good performance.