eyalgerman/MIA-EPT

MIA-EPT – Membership Inference Attack via Error Prediction for Tabular Data

🚀 Award-Winning Method:
Our approach won 2nd Place in the Black-box Multi Table track of the
MIDST Challenge (Membership Inference over Diffusion-models-based Synthetic Tabular Data),
hosted by the Vector Institute at the
3rd IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2025).


Overview

MIA-EPT (Membership Inference Attack via Error Prediction for Tabular Data) is a black-box attack method designed to assess the privacy risks of synthetic data generated by tabular diffusion models. Our approach reveals that even in strictly black-box settings—where the attacker only sees the released synthetic data—models can leak membership information about real individuals used in training.

MIA-EPT combines a shadow-model paradigm with attribute prediction models to construct an error profile for each record. A profile captures how accurately models trained on the synthetic data can reconstruct the individual attributes of that record. By learning how these profiles differ between members and non-members, our method predicts the likelihood that a given record was used to train the target generative model.

Method Steps

MIA-EPT follows a five-step pipeline:

1. Shadow Model Training

  • Partition an auxiliary dataset into disjoint member and non-member splits.
  • Train shadow tabular diffusion models on the member data and generate synthetic data.
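The partitioning in this step can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `split_members`, the `member_fraction` parameter, and the toy records are assumptions, and training the shadow diffusion model on the member split is not shown.

```python
import random
from typing import List, Sequence, Tuple


def split_members(
    records: Sequence[dict], member_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Shuffle record indices and cut them into disjoint member / non-member sets."""
    rng = random.Random(seed)  # seeded so each shadow split is reproducible
    indices = list(range(len(records)))
    rng.shuffle(indices)
    cut = int(len(indices) * member_fraction)
    members = [records[i] for i in indices[:cut]]
    non_members = [records[i] for i in indices[cut:]]
    return members, non_members


# Hypothetical auxiliary data; a different seed gives a different shadow split.
auxiliary = [{"id": i, "amount": 10.0 * i} for i in range(10)]
members, non_members = split_members(auxiliary, member_fraction=0.5, seed=1)
```

Because every index lands in exactly one of the two slices, the splits are disjoint by construction, which is what lets the non-member set serve as a clean negative class later.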

2. Attribute Prediction Model Training

  • For each synthetic dataset, train one model per column (H_c) to predict that column based on all others.
  • Use classification for categorical columns and regression for continuous ones.
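To illustrate the one-model-per-column idea, the sketch below fits a predictor for every column from all the others. It deliberately swaps in a dependency-free 1-nearest-neighbour rule as a stand-in for the real H_c models (the README does not say which learners are used); the function names, the `categorical` argument, and the toy rows are assumptions.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Predictor = Callable[[Row], object]


def fit_column_predictor(rows: List[Row], target: str, categorical: Set[str]) -> Predictor:
    """Return a 1-nearest-neighbour predictor for `target` (a stand-in for H_c)."""
    inputs = [c for c in rows[0] if c != target]

    def distance(a: Row, b: Row) -> float:
        total = 0.0
        for c in inputs:
            if c in categorical:
                total += 0.0 if a[c] == b[c] else 1.0  # mismatch penalty
            else:
                total += abs(float(a[c]) - float(b[c]))  # unscaled, for brevity
        return total

    def predict(record: Row) -> object:
        # Categorical target: acts as a classifier; numeric target: as a regressor.
        nearest = min(rows, key=lambda r: distance(r, record))
        return nearest[target]

    return predict


def fit_all_predictors(rows: List[Row], categorical: Set[str]) -> Dict[str, Predictor]:
    """One predictor per column, each trained on the synthetic rows only."""
    return {c: fit_column_predictor(rows, c, categorical) for c in rows[0]}


# Hypothetical synthetic dataset produced by one shadow model.
synthetic = [
    {"age": 30, "job": "nurse", "income": 50.0},
    {"age": 60, "job": "pilot", "income": 120.0},
]
predictors = fit_all_predictors(synthetic, categorical={"job"})
```

In practice each stand-in would be replaced by a proper classifier or regressor, and numeric inputs would be scaled before use.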

3. Feature Extraction (Error Profiles)

  • For each record in the shadow model’s member and non-member sets:
    • Mask each column and predict its value using the respective H_c.
    • Feature Design Based on Task Type:
      • For classification tasks:
        • Actual class
        • Predicted class
        • Binary indicator for correctness (0 = incorrect, 1 = correct)
      • For regression tasks:
        • Actual value
        • Predicted value
        • Absolute error
        • Error ratio (normalized error)
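The feature design above could be assembled per record roughly as follows. This is a sketch under assumptions: the feature names, the `error_profile` function, and in particular the error-ratio formula (absolute error divided by the magnitude of the actual value) are illustrative choices, since the README does not define the normalization.

```python
from typing import Callable, Dict, Set

Row = Dict[str, object]
Predictor = Callable[[Row], object]


def error_profile(
    record: Row,
    predictors: Dict[str, Predictor],
    categorical: Set[str],
    eps: float = 1e-8,
) -> Dict[str, object]:
    """Build the per-column error features for a single record."""
    features: Dict[str, object] = {}
    for column, predict in predictors.items():
        actual = record[column]
        # Mask the target column so the predictor only sees the other attributes.
        masked = {k: v for k, v in record.items() if k != column}
        predicted = predict(masked)
        features[f"{column}_actual"] = actual
        features[f"{column}_predicted"] = predicted
        if column in categorical:
            # Classification task: 1 = correct, 0 = incorrect.
            features[f"{column}_accuracy"] = int(predicted == actual)
        else:
            error = abs(float(actual) - float(predicted))
            features[f"{column}_error"] = error
            # Assumed normalization: error relative to the true value's magnitude.
            features[f"{column}_error_ratio"] = error / (abs(float(actual)) + eps)
    return features


# Hypothetical constant predictors, just to exercise both branches.
toy_predictors = {"job": lambda r: "nurse", "income": lambda r: 40.0}
profile = error_profile({"job": "nurse", "income": 50.0}, toy_predictors, {"job"})
```

The categorical actual and predicted classes would need a numeric encoding before they are passed to a classifier.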

4. Attack Classifier Training

  • Train a classifier (e.g., XGBoost or CatBoost) on the extracted features to distinguish members from non-members.
  • Labels are binary: 1 for member, 0 for non-member.
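The training step can be illustrated with a small stand-in. The method uses gradient-boosted classifiers such as XGBoost or CatBoost; the sketch below instead fits a plain logistic regression by batch gradient descent so that it runs without extra dependencies. The function names, hyperparameters, and toy data are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def membership_score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability that the record with features x is a member."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])


def train_attack_classifier(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 1000,
) -> List[float]:
    """Fit logistic-regression weights (bias stored last) on labeled error profiles."""
    n, d = len(features), len(features[0])
    weights = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(features, labels):
            p = membership_score(weights, x)
            for j in range(d):
                grad[j] += (p - y) * x[j]
            grad[-1] += p - y
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Toy profiles with one feature (absolute error); label 1 = member, 0 = non-member.
X = [[0.1], [0.2], [0.9], [1.0]]
y = [1, 1, 0, 0]
weights = train_attack_classifier(X, y)
```

In this toy data members have lower reconstruction error, so the learned weight on the error feature is negative and low-error records receive higher membership scores.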

5. Membership Prediction on the Challenge Dataset

  • Use the synthetic dataset released by the target model to train attribute predictors.
  • Apply these predictors to the challenge dataset and extract an error profile for each record.
  • Feed the profiles to the trained attack classifier to compute membership scores.
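The three bullets above compose into a short scoring routine. The sketch below only shows the data flow; `membership_scores` and its three callable parameters are hypothetical placeholders for the predictor training, profile extraction, and attack classifier described in the earlier steps.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def membership_scores(
    synthetic_rows: Sequence[Row],
    challenge_rows: Sequence[Row],
    fit_predictors: Callable[[Sequence[Row]], dict],
    extract_profile: Callable[[Row, dict], list],
    attack_score: Callable[[list], float],
) -> List[float]:
    """Score every challenge record using only the target model's synthetic data."""
    # 1. Attribute predictors are trained on the released synthetic data.
    predictors = fit_predictors(synthetic_rows)
    # 2. Each challenge record gets an error profile from those predictors.
    profiles = [extract_profile(row, predictors) for row in challenge_rows]
    # 3. The attack classifier (trained earlier on shadow models) scores each profile.
    return [attack_score(profile) for profile in profiles]
```

Note that the attack classifier is not retrained here: it is fitted once on shadow-model profiles and then reused on the target's challenge records.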

Setup Instructions

Prerequisites

Before running the code, ensure you have Python (>=3.8) installed.

To install the required dependencies, run:

pip install -r requirements.txt

Running Steps

  1. Set up data paths:

    • Update the data_manager.py file with the correct path to the data folder. This folder should store all datasets, extracted features, and evaluation results.
  2. Run the full pipeline:

    • Execute the following command to generate feature vectors for shadow models and the attack classifier, train the attack classifier, evaluate its performance, and train the final attack classifier with all models:
    python main.py
  3. Evaluate the results:

    • The attack classifier is evaluated by training on 25 models and testing on the remaining 5, which serve as a validation set.
    • The evaluation includes computing the AUC-ROC score and TPR@FPR for different false positive rate (FPR) values.
    • The final attack classifier is then trained using all available models.
  4. Generate the Final Prediction File:

    • Running main.py automatically generates a submission file for the model that achieves the highest TPR at 10% FPR on our validation results.
    • To generate a submission file for a different model, execute the following command with the desired parameters:
    python create_prediction_folder.py --type_test <type_test> --classifier_model_name <classifier_model_name> --columns_lst <columns_lst> --time <time>
    • Argument Explanation:
      • --type_test <type_test>: Specifies the type of test being performed (e.g., blackbox_multi_table or blackbox_single_table).
      • --classifier_model_name <classifier_model_name>: Indicates the classifier model selected based on high AUC-ROC and high TPR@FPR_10 from results_summary.csv. Choose from the following: CatBoost, XGBoost, or MLP.
      • --columns_lst <columns_lst>: List of selected feature columns for evaluation, extracted from results_summary.csv (e.g., ['actual', 'error', 'error_ratio', 'accuracy']).
      • --time <time>: A timestamp or identifier that selects the group of models to use. Unlike the other arguments, this value is printed by main.py when it runs (e.g., 20250218_115430_test).
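The two evaluation metrics named in step 3 above, AUC-ROC and TPR@FPR, can be computed from membership scores as sketched below. This is an illustrative pure-Python version, not the code in metrics.py; the function names and toy scores are assumptions, and the quadratic-time AUC is only suitable for small validation sets.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels: Sequence[int], scores: Sequence[float], max_fpr: float = 0.1) -> float:
    """Highest true positive rate over thresholds whose FPR does not exceed max_fpr."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in pos) / len(pos)
            best = max(best, tpr)
    return best


# Toy validation scores: label 1 = member, 0 = non-member.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_roc(labels, scores)
tpr10 = tpr_at_fpr(labels, scores, max_fpr=0.1)
```

TPR at a low FPR matters for membership inference because it measures how many members an attacker can identify while rarely accusing non-members, which AUC alone can hide.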

Repository Structure

MIA-EPT/
│── README.md                    # Project documentation
│── requirements.txt             # List of required dependencies
│── main.py                      # Main pipeline execution file
│── argument_parser.py           # Handles argument parsing
│── create_predictions_folder.py # Generates final submission file
│── data.py                      # Data handling functions
│── data_manager.py              # Manages data paths and preprocessing
│── features_extraction.py       # Extracts feature vectors for attack detection
│── metrics.py                   # Computes evaluation metrics
│── run_features_extraction.py   # Script to extract features
│── run_train_classifier.py      # Script to train the attack classifier
│── train_classifier.py          # Contains attack classifier training logic

Our Team

This project was developed as part of academic research conducted at the
Cyber Security Research Center, Ben-Gurion University of the Negev.

  • Eyal German – MSc Student, Developer
  • Daniel Samira – MSc Student, Developer
  • Prof. Yuval Elovici – Supervisor
  • Prof. Asaf Shabtai – Supervisor
