2021
In collaborative learning, multiple parties contribute their datasets to jointly deduce global machine learning models for numerous predictive tasks. Despite its efficacy, this learning paradigm fails to encompass critical application domains that involve highly sensitive data, such as healthcare and security analytics, where privacy risks limit entities to individually train models using only their own datasets. In this work, we target privacy-preserving collaborative hierarchical clustering. We introduce a formal security definition that aims to achieve a balance between utility and privacy and present a two-party protocol that provably satisfies it. We then extend our protocol with: (i) an optimized version for single-linkage clustering, and (ii) scalable approximation variants. We implement all our schemes and experimentally evaluate their performance and accuracy on synthetic and real datasets, obtaining very encouraging results. For example, end-to-end execution of our secure approximate protocol for over 1M 10-dimensional data samples requires 35 seconds of computation and achieves 97.09% accuracy. CCS CONCEPTS • Security and privacy → Cryptography; • Computing methodologies → Unsupervised learning.
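The single-linkage variant the authors optimize is, in the clear, the classic agglomerative procedure: repeatedly merge the two clusters whose closest pair of points is nearest. A minimal plaintext sketch for reference (function and parameter names are hypothetical; the paper's protocol computes this jointly without either party revealing its points):

```python
import numpy as np

def single_linkage(points, num_clusters):
    """Naive agglomerative single-linkage clustering: start with one
    cluster per point and merge the closest pair of clusters until
    only `num_clusters` remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: inter-cluster distance is the minimum
                # pairwise distance across the two clusters.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters
```

This naive version is cubic in the number of points; the paper's optimized and approximate variants exist precisely because this cost is prohibitive under secure computation.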
Scalable Computing: Practice and Experience
Digitalization across all spheres of life has given rise to issues like data ownership and privacy. Privacy-Preserving Machine Learning (PPML), an active area of research, aims to preserve privacy for machine learning (ML) stakeholders like data owners, ML model owners, and inference users. This paper proposes CoTraIn-VPD, private ML inference and training of models for vertically partitioned datasets with Secure Multi-Party Computation (SMPC) and Differential Privacy (DP) techniques. The proposed approach addresses complications linked with the privacy of various ML stakeholders dealing with vertically partitioned datasets. The technique is implemented in Python using open-source libraries such as SyMPC (SMPC functions), PyDP (DP aggregations), and CrypTen (secure and private training). The paper uses information privacy measures, including mutual information and KL-divergence, across different privacy budgets to empirically demonstrate privacy preservation with high ML accuracy and...
2021
When multiple parties that deal with private data aim for a collaborative prediction task such as medical image classification, they are often constrained by data protection regulations and a lack of trust among the collaborating parties. If done in a privacy-preserving manner, predictive analytics can benefit from the collective prediction capability of multiple parties holding complementary datasets on the same machine learning task. This paper presents PRICURE, a system that combines the complementary strengths of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative prediction among multiple model owners. SMPC enables secret-sharing of private models and client inputs with non-colluding secure servers to compute predictions without leaking model parameters and inputs. DP masks true prediction results via noisy aggregation so as to deter a semi-honest client who may mount membership inference attacks. We evaluate PRICURE on neural ne...
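The DP step described above masks aggregated predictions with noise before release. One standard way to do this is report-noisy-max with Laplace noise; the sketch below illustrates that idea (it is not necessarily PRICURE's exact mechanism, and the function name and parameters are illustrative):

```python
import numpy as np

def dp_argmax(vote_counts, epsilon, sensitivity=1.0, seed=None):
    """Release the winning class after adding Laplace(sensitivity/epsilon)
    noise to each aggregated vote count, so the released label does not
    depend too strongly on any single model's contribution."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(vote_counts, dtype=float) + rng.laplace(
        scale=sensitivity / epsilon, size=len(vote_counts))
    # Only the argmax is released, never the noisy counts themselves.
    return int(np.argmax(noisy))
```

Smaller epsilon means larger noise and stronger privacy, at the cost of occasionally releasing the wrong label when votes are close.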
2015 IEEE Global Communications Conference (GLOBECOM), 2014
Advancements in big data analysis offer cost-effective opportunities to improve decision-making in numerous areas such as health care, economic productivity, crime, and resource management. Nowadays, data holders tend to share their data to obtain better outcomes from their aggregated data. However, the current tools and technologies developed to manage big data are often not designed to incorporate adequate security or privacy measures during data sharing. In this paper, we consider a scenario where multiple data holders intend to find predictive models from their joint data without revealing their own data to each other. The data locality property is used as an alternative to secure multi-party computation (SMC) techniques. Specifically, we distribute the centralized learning task to each data holder as local learning tasks in such a way that local learning depends only on local data. In addition, we propose an efficient and secure protocol to reassemble the local results into the final result. The correctness of our scheme is proved theoretically and numerically, and a security analysis is conducted from the perspective of information theory.
Proceeding of The International Conference on …
Nowadays, data mining and machine learning techniques are widely used in electronic applications in different areas such as e-government, e-health, e-business, and so on. One major and very crucial issue in these types of systems, which are normally distributed among two or more parties and deal with sensitive data, is preserving the privacy of individuals' sensitive information. Each party wants to keep its own raw data private while gaining useful knowledge from the whole data owned by all the parties. Privacy-preserving data mining addresses this problem, and many protocols have been introduced for various standard data mining algorithms so far. In this paper, we propose new protocols for two popular techniques, classification and clustering, when data is horizontally or vertically partitioned among two or more parties. For classification, we apply the Gini index to the ID3 algorithm to create decision trees from distributed data. We also propose two protocols for k-means clustering, a prominent clustering algorithm in data mining. Secure two- and multi-party building blocks such as secure comparison, secure multi-party addition, and secure multiplication are also proposed as sub-protocols for our algorithms.
IEEE Access, 2021
Nowadays different entities (such as hospitals, cyber security companies, banks, etc.) collect data of the same nature but often with different statistical properties. It has been shown that if these entities combine their privately collected datasets to train a machine learning model, they end up with a trained model that often outperforms the human experts of the corresponding field(s) in terms of classification accuracy. However, due to judicial, privacy, and cost reasons, no entity is willing to share its data with others. We have the same problem during the classification (inference) stage: the user does not want to reveal any information about his query or its final classification, while the owner of the trained model wants to keep the model private. In this article we overcome these drawbacks by first introducing novel, efficient, general-purpose secure building blocks, which can be used to build privacy-preserving machine learning algorithms for both training and classification (inference) under strict privacy and security requirements. Our theoretical analysis and experimental results show that our building blocks (and hence also the privacy-preserving algorithms built on top of them) are more efficient than most (if not all) state-of-the-art schemes in terms of computation and communication cost, as well as security characteristics in the semi-honest model. Furthermore, and to the best of our knowledge, for the Naïve Bayes model we extend this efficiency for the first time to also deal with actively malicious users, who arbitrarily deviate from the protocol.
2020
Applications of machine learning have become increasingly common in recent years. For instance, navigation systems like Google Maps use machine learning to better predict traffic patterns; Facebook, LinkedIn, and other social media platforms use machine learning to customize users' news feeds. Central to all these systems is user data. However, the sensitive nature of the collected data has also led to a number of privacy concerns. Privacy-preserving machine learning enables systems that can
Privacy has become an important concern in data mining. In this paper, a novel algorithm for privacy preservation in a distributed environment using data clustering has been proposed. The data is clustered locally, and the encrypted aggregated information is transferred to the master site. This aggregated information consists of the centroids of the clusters along with their sizes. On the basis of this local information, global centroids are reconstructed and then transferred back to all sites for updating their local centroids. Additionally, the proposed algorithm is integrated with the Elliptic Curve Cryptography (ECC) public key cryptosystem and Diffie-Hellman key exchange. The proposed distributed encrypted scheme adds no more than 15% to the running time relative to the distributed non-encrypted scheme, while reducing the running time by at least 48% relative to a centralized scheme on a dataset of the same size. Theoretical and experimental analysis illustrates that the proposed algorithm can effectively solve the privacy-preserving problem of clustering over distributed data and achieve the privacy-preservation aim.
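The aggregation step described above — reconstructing global centroids from per-site centroids and cluster sizes — reduces, with the encryption layer omitted, to a weighted average. A minimal sketch (names are hypothetical):

```python
import numpy as np

def merge_local_centroids(local_summaries):
    """Reconstruct global centroids from per-site (centroid, size) pairs.
    `local_summaries` is a list over sites; each site contributes a list
    of (centroid_vector, cluster_size) pairs, one per cluster index."""
    num_clusters = len(local_summaries[0])
    global_centroids = []
    for k in range(num_clusters):
        total = np.zeros_like(np.asarray(local_summaries[0][k][0], dtype=float))
        count = 0
        for site in local_summaries:
            centroid, size = site[k]
            # A centroid weighted by its cluster size recovers the sum
            # of the site's points in that cluster.
            total += np.asarray(centroid, dtype=float) * size
            count += size
        global_centroids.append(total / count)
    return global_centroids
```

Only centroids and counts ever leave a site, which is exactly why the scheme encrypts this compact summary rather than the raw records.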
IEEE access, 2024
Machine learning is a powerful technology for extracting information from data of diverse nature and origin. As its deployment increasingly depends on data from multiple entities, ensuring privacy for these contributors becomes paramount for the integrity and fairness of machine learning endeavors. This review looks into the recent advancements in secure multi-party computation (SMPC) for machine learning, a pivotal technology championing data privacy. We evaluate these applications from various aspects, including security models, requirements, system types, and service models, aligning with the IEEE's recommended practices for SMPC. Broadly, SMPC systems are divided into two categories: homomorphic-based systems, which facilitate computations on encrypted data, ensuring data remains confidential, and secret sharing-based systems, which disseminate data across parties in fragmented shares. Our literature analysis highlights certain gaps, such as security requisites, streamlined information exchange, incentive structures, data authenticity, and operational efficiency. Recognizing these challenges leads to envisioning a holistic SMPC protocol tailored for machine learning applications.
Cornell University - arXiv, 2022
Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and to be susceptible to tampering via poisoning attacks. In this work, we argue that model ensembles, implemented in our framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks. The natural partitioning of data amongst owners in MPC training allows this approach to be highly scalable at training time, provide provable protection from poisoning attacks, and provide provable defense against a number of privacy attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to poisoning on several machine learning datasets and models trained in end-to-end and transfer learning scenarios. For instance, SafeNet reduces backdoor attack success significantly, while achieving 39× faster training and 36× less communication than the four-party MPC framework of Dalskov et al. [28]. Our experiments show that ensembling retains these benefits even in many non-iid settings. The simplicity, cheap setup, and robustness properties of ensembling make it a strong first choice for training ML models privately in MPC.
Computers & Security, 2007
The sharing of data has been proven beneficial in data mining applications. However, privacy regulations and other privacy concerns may prevent data owners from sharing information for data analysis. To resolve this challenging problem, data owners must design a solution that meets privacy requirements and guarantees valid data clustering results. To achieve this dual goal, we introduce a new method for privacy-preserving clustering, called Dimensionality Reduction-Based Transformation (DRBT). This method relies on the intuition behind random projection to protect the underlying attribute values subjected to cluster analysis. The major features of this method are: a) it is independent of distance-based clustering algorithms; b) it has a sound mathematical foundation; and c) it does not require CPU-intensive operations. We show analytically and empirically that, by transforming a dataset using DRBT, a data owner can achieve privacy preservation and accurate clustering with little communication overhead. * Note to referees: A preliminary version of this work appeared in the Workshop on Privacy and Security Aspects of Data Mining in conjunction with ICDM 2004, Brighton, UK, November 2004. The entire paper has been rewritten with additional detail throughout. We substantially improved the paper both theoretically and empirically to emphasize the practicality and feasibility of our approach. In addition, we introduce a new section with a methodology to evaluate the quality of the clusters generated after applying our method, Dimensionality Reduction-Based Transformation (DRBT), to a dataset in which the attributes of objects are either available in a central repository or split across several sites.
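At its core, DRBT relies on random projection: multiplying the data by a random matrix that approximately preserves pairwise distances (by the Johnson-Lindenstrauss lemma) while obscuring the original attribute values. A minimal sketch of that idea (the paper's exact choice of projection matrix and scaling may differ):

```python
import numpy as np

def random_projection(data, target_dim, seed=0):
    """Project d-dimensional records onto a random target_dim-dimensional
    subspace. Pairwise Euclidean distances are approximately preserved
    in expectation, so distance-based clustering still works, but the
    original attribute values are no longer directly exposed."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    # Gaussian projection matrix, scaled so expected squared
    # distances are preserved after projection.
    R = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return data @ R
```

Because any distance-based algorithm can run on the projected data unchanged, the transformation is independent of the clustering method, matching feature (a) claimed above.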
Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, 2021
Clustering is an unsupervised machine learning technique that outputs clusters containing similar data items. In this work, we investigate privacy-preserving density-based clustering, which is used, for example, in financial analytics and medical diagnosis. When (multiple) data owners collaborate or outsource the computation, privacy concerns arise. To address this problem, we design, implement, and evaluate the first practical and fully private density-based clustering scheme based on secure two-party computation. Our protocol privately executes the DBSCAN algorithm without disclosing any information (including the number and size of clusters). It can be used for private clustering between two parties as well as for private outsourcing by an arbitrary number of data owners to two non-colluding servers. Our implementation of the DBSCAN algorithm privately clusters data sets with 400 elements in 7 minutes on commodity hardware. Thereby, it flexibly determines the number of required clusters and is insensitive to outliers, while being only a factor of 19× slower than today's fastest private K-means protocol (Mohassel et al., PETS'20), which can only be used for specific data sets. We then show how to transfer our newly designed protocol to related clustering algorithms by introducing a private approximation of the TRACLUS algorithm for trajectory clustering, which has interesting real-world applications like financial time series forecasts and the investigation of the spread of a disease like COVID-19. CCS CONCEPTS • Security and privacy → Privacy-preserving protocols; • Computing methodologies → Cluster analysis.
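For reference, the DBSCAN algorithm that the protocol above evaluates privately is straightforward in the clear: grow clusters outward from "core" points that have at least min_pts neighbors within radius eps, and mark unreachable points as noise. A naive plaintext sketch (the secure protocol hides all intermediate values, including neighborhood sizes, which this version exposes freely):

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Plaintext DBSCAN. Returns one label per point: a cluster id >= 0,
    or -1 for noise/outliers."""
    n = len(points)
    labels = [None] * n
    # Precompute eps-neighborhoods (each point is its own neighbor).
    nbrs = [[j for j in range(n)
             if np.linalg.norm(points[i] - points[j]) <= eps]
            for i in range(n)]
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1          # tentatively noise
            continue
        labels[i] = cid             # i is a core point: start a cluster
        frontier = list(nbrs[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cid     # reachable noise becomes a border point
            if labels[j] is None:
                labels[j] = cid
                if len(nbrs[j]) >= min_pts:
                    frontier.extend(nbrs[j])  # j is core: keep expanding
        cid += 1
    return labels
```

Note how the number of clusters is never specified up front and outliers get label -1; these are exactly the properties the abstract contrasts with K-means.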
2020 50th Annual IEEE-IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S)
IACR Cryptology ePrint Archive, 2017
Machine learning algorithms, such as neural networks, create better predictive models when having access to larger datasets. In many domains, such as medicine and finance, each institute has only access to limited amounts of data, and creating larger datasets typically requires collaboration. However, there are privacy related constraints on these collaborations for legal, ethical, and competitive reasons. In this work, we present a feasible protocol for learning neural networks in a collaborative way while preserving the privacy of each record. This is achieved by combining Differential Privacy and Secure Multi-Party Computation with Machine Learning.
EURASIP Journal on Information Security, 2013
Clustering is a very important tool in data mining and is widely used in on-line services in medical, financial, and social settings. The main goal of clustering is to create sets of similar objects in a data set. The data set to be used for clustering can be owned by a single entity, or, in some cases, information from different databases is pooled to enrich the data so that the merged database can improve the clustering effort. However, in either case the content of the database may be privacy sensitive and/or commercially valuable, such that the owners may not want to share their data with any other entity, including the service provider. Such privacy concerns lead to trust issues between entities, which clearly damage the functioning of the service and can even block cooperation between entities with similar data sets. To enable joint efforts with private data, we propose a protocol for distributed clustering that limits information leakage to the untrusted service provider that performs the clustering. To achieve this goal, we rely on cryptographic techniques, in particular homomorphic encryption, and further improve the state of the art in processing encrypted data by exploiting the distributed structure of the system and by reducing computation and communication costs through data packing. While our construction can easily be adjusted to a centralized or a distributed computing model, we rely on a set of particular users that help the service provider with the computations. Experimental results clearly indicate that the work we present is an efficient way of deploying a privacy-preserving clustering algorithm in a distributed manner.
2020
With the increasing usage of deep learning algorithms in many applications, new research questions related to privacy and adversarial attacks are emerging. At the same time, improving deep learning algorithms requires more and more data to be shared within the research community. Methodologies like federated learning, differential privacy, and additive secret sharing provide ways to train machine learning models at the edge without moving the data off the edge. However, these approaches are computationally intensive and prone to adversarial attacks. Therefore, this work introduces FedCollabNN, a privacy-preserving framework for training machine learning models at the edge that is computationally efficient and robust against adversarial attacks. Simulation results on the MNIST dataset indicate the effectiveness of the framework.
Proceedings on Privacy Enhancing Technologies, 2020
Privacy-preserving machine learning (PPML) via Secure Multi-party Computation (MPC) has gained momentum in the recent past. Assuming a minimal network of pair-wise private channels, we propose FLASH, an efficient four-party PPML framework over rings ℤ2ℓ, the first of its kind among PPML frameworks, that achieves the strongest security notion of Guaranteed Output Delivery (all parties obtain the output irrespective of the adversary's behaviour). State-of-the-art ML frameworks such as ABY3 by Mohassel et al. (ACM CCS'18) and SecureNN by Wagh et al. (PETS'19) operate in the setting of 3 parties with one malicious corruption but achieve only the weaker security guarantee of abort. We demonstrate PPML with real-time efficiency, using the following custom-made tools that overcome the limitations of the aforementioned state of the art: (a) dot product, which is independent of the vector size, unlike the state-of-the-art ABY3, SecureNN, and ASTRA by Chaudhari et al. (ACM CCSW'19), all of wh...
Data mining has wide use in many fields such as finance, medicine, medical research, and government departments. Classification is one of the most widely applied tasks in data mining applications. Over the past several years, owing to various privacy problems, many theoretical and practical solutions to the classification problem have been proposed under different security models. Meanwhile, with the recent popularity of cloud computing, users now have the ability to outsource their data, in encrypted form, together with the data mining task, to the cloud. Given that the data on the cloud is in encrypted form, existing privacy-preserving classification methods are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-classifier over encrypted data in the cloud. The proposed protocol protects the confidentiality of the data, preserves the privacy of the user's input query, and hides the data access patterns. To the best of our knowledge, ours is the first work to construct a secure k-classifier over encrypted data under the semi-honest model. We also empirically evaluate the performance of our proposed protocol using a real-world dataset under various parameter configurations. Although numerous privacy-preserving classification methods have been proposed over the past several years to protect user privacy, the existing methods are not applicable to outsourced database environments where the data resides in encrypted form on a third-party server.
Data & Knowledge Engineering, 2007
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The aim of privacy-preserving data mining research is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Privacy-preserving techniques for various data mining models have been proposed, initially for classification on centralized data and then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy-preserving manner, which can be used for privacy-preserving clustering as well as database joins, record linkage, and other operations that require pair-wise comparison of individual private data objects horizontally distributed across multiple sites. We analyze the communication and computation complexity of our protocol through experiments on synthetically generated and real datasets. Each experiment is also performed with a baseline protocol that has no privacy guarantees, so that comparing the two reveals the overhead incurred for security and privacy.
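The object of the protocol, the dissimilarity matrix, is simple to state in the clear: entry (i, j) holds the distance between record i at one site and record j at another. A plaintext sketch using the standard squared-norm expansion (the paper's contribution is computing this without either site revealing its records; names here are illustrative):

```python
import numpy as np

def dissimilarity_matrix(records_a, records_b):
    """Pairwise Euclidean dissimilarity matrix between two sites'
    records. Uses ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i.b_j
    so the whole matrix is a few vectorized operations."""
    a = np.asarray(records_a, dtype=float)
    b = np.asarray(records_b, dtype=float)
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :]
    d2 = sq - 2.0 * (a @ b.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))
```

Once this matrix is available (privately or not), any algorithm that only needs pairwise comparisons — hierarchical clustering, record linkage, joins — can run on top of it, which is exactly the generality the abstract claims.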
2016 IEEE Trustcom/BigDataSE/ISPA, 2016
Recent advances in sensing and storage technologies have led to the big data age, where huge amounts of data are distributed across sites to be stored and analysed. Cluster analysis is one of the data mining tasks that aims to discover patterns and knowledge through different algorithmic techniques such as k-means. Nevertheless, running k-means over distributed big data stores has given rise to serious privacy issues. Accordingly, many works have attempted to tackle this concern using cryptographic protocols. However, these cryptographic solutions introduce performance degradation in analysis tasks, which is at odds with the requirements of big data. In this work we propose a novel privacy-preserving k-means algorithm based on a simple yet secure and efficient multiparty additive scheme that is cryptography-free. We design our solution for horizontally partitioned data. Moreover, we demonstrate that our scheme is secure against passive adversaries.
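A multiparty additive scheme of the kind described above can be illustrated with additive secret sharing over a large modulus: each party splits its value into random shares that sum to it, so any proper subset of shares is uniformly random and reveals nothing, yet the shares of all parties combine to the exact total. A toy sketch (the modulus and protocol structure are illustrative, not necessarily the paper's exact scheme):

```python
import random

MODULUS = 2 ** 61 - 1  # any sufficiently large modulus works

def share(value, num_parties):
    """Split `value` into additive shares that sum to it mod MODULUS.
    Any num_parties - 1 of the shares look uniformly random."""
    shares = [random.randrange(MODULUS) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Each party shares its value with every other party; each party
    locally adds the shares it receives, and the partial sums combine
    to reveal only the total, never any individual value."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party p holds the p-th share of every value.
    partials = [sum(s[p] for s in all_shares) % MODULUS for p in range(n)]
    return sum(partials) % MODULUS
```

In distributed k-means, such a secure sum lets sites aggregate per-cluster point sums and counts (and hence global centroids) without any site disclosing its local statistics.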
Proceedings of the International Conference on …, 2007
Extracting meaningful and valuable knowledge from databases is often done by various data mining algorithms. Nowadays, databases are distributed among two or more parties for different reasons, such as physical and geographical restrictions, and the most important issue is privacy. Related data is normally maintained by more than one organization, each of which wants to keep its individual information private. Thus, privacy-preserving techniques and protocols are designed to perform data mining on distributed environments where privacy is a major concern. Cluster analysis is a data mining technique by which data can be divided into meaningful clusters, and it plays an important role in different fields such as bio-informatics, marketing, machine learning, climate, and medicine. k-means clustering is a prominent algorithm in this category, which creates a one-level clustering of data. In this paper we introduce privacy-preserving protocols for this algorithm, along with a protocol for secure comparison, known as the Millionaires' Problem, as a sub-protocol, to handle the clustering of horizontally or vertically partitioned data among two or more parties.