ronketer/generative-ai-security-forensics

Personalizing Generative Priors: DreamBooth, SDEdit & Forensics

This repository implements an exploratory pipeline for Stable Diffusion v1.5, covering subject personalization, structure-guided image editing, and forensic detection of training-data leakage.


🛠️ Technical Highlights

  • Subject Personalization (DreamBooth + LoRA): Fine-tuned Stable Diffusion models using LoRA (Low-Rank Adaptation) and DreamBooth to reconstruct unique subjects via custom identifiers.
  • Optimization of Training Dynamics: Analyzed loss curves and training dynamics to balance subject fidelity with the prevention of overfitting.
  • Membership Inference Algorithm: Engineered a lightweight forensics probe to detect training data leakage by distinguishing memorized samples from unseen inputs using Reconstruction MSE.
  • Guided Semantic Editing (SDEdit): Benchmarked SDEdit against standard text-based prompting, sweeping the noise strength $S \in \{0.4, 0.6, 0.8\}$ to control the trade-off between structural preservation and prompt adherence.
  • Forensics Thresholding: Established a clear reconstruction error threshold at $\approx 0.085$ (Training MSE $\approx 0.07$ vs. Unseen MSE $\approx 0.11$) to identify data leakage.

🔍 Core Pillars

1. Subject Personalization

Taught the model a unique identifier TOK for a specific subject using diverse background training sets. This approach ensures the model disentangles the subject's identity from its context, allowing for high-fidelity reconstruction in varied prompts.
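The low-rank adaptation used during fine-tuning can be summarized by LoRA's update rule: the frozen base weight is augmented with a trainable low-rank residual. Below is a minimal NumPy sketch of that rule (the dimensions, rank, and scaling factor are illustrative, not the values used in this project):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: one frozen attention-projection weight of the base model.
d, r, alpha = 64, 4, 16                  # hidden dim, LoRA rank, scaling factor
W = rng.standard_normal((d, d))          # frozen during fine-tuning

# LoRA trains only a low-rank residual: W' = W + (alpha / r) * B @ A.
A = rng.standard_normal((r, d)) * 0.01   # down-projection (trainable)
B = np.zeros((d, r))                     # up-projection, zero-initialized

W_adapted = W + (alpha / r) * B @ A

# Because B starts at zero, fine-tuning begins exactly at the base model.
assert np.allclose(W_adapted, W)
```

Zero-initializing `B` is the standard LoRA trick: the adapted model is identical to the base model at step 0, and only `A` and `B` (a tiny fraction of the parameters) receive gradients.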

2. Guided Editing (SDEdit)

Implemented a noise-then-denoise pipeline for semantic image-to-image translation.

  • Low Noise ($S \approx 0.4$): Maximizes structural preservation but may ignore prompt guidance.
  • High Noise ($S \approx 0.8$): Prioritizes prompt adherence at the cost of the original image structure.
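The noise-then-denoise entry point can be sketched with a standard linear DDPM schedule: the input latent is diffused forward to depth $S$, and the denoiser then runs only the last $S \cdot T$ reverse steps. The schedule parameters below are the common DDPM defaults, used here purely for illustration:

```python
import numpy as np

def sdedit_start_latent(x0, S, T=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Noise an input latent x0 to depth S in (0, 1].

    Returns the latent from which SDEdit's reverse (denoising) process
    would start, along with the corresponding timestep index.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal retention
    t = int(S * T) - 1                       # timestep matching strength S
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, t

x0 = np.ones((4, 8, 8))                      # stand-in for an image latent
x_low, _ = sdedit_start_latent(x0, S=0.4)    # retains most of x0's signal
x_high, _ = sdedit_start_latent(x0, S=0.8)   # close to pure noise
```

At $S \approx 0.4$ the signal coefficient $\sqrt{\bar\alpha_t}$ is still large, so the denoiser is anchored to the original structure; at $S \approx 0.8$ it is near zero, leaving the prompt free to dominate.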

3. Membership Inference (Forensics)

Utilized score thresholding based on reconstruction error to audit the model. This is critical for identifying whether specific images were used during the fine-tuning stage, a key aspect of AI security and privacy.
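The probe itself reduces to a one-line decision rule once reconstruction errors are computed. The sketch below uses the threshold reported above ($\approx 0.085$); the score values are illustrative stand-ins, not real measurements from the project:

```python
import numpy as np

def membership_probe(recon_mse, threshold=0.085):
    """Flag samples whose reconstruction MSE falls below the leakage threshold.

    The threshold separates the two empirical clusters reported above:
    training images reconstruct at ~0.07 MSE, unseen images at ~0.11.
    """
    return recon_mse < threshold

# Illustrative scores: low reconstruction error suggests a memorized sample.
scores = np.array([0.068, 0.072, 0.109, 0.115])
flags = membership_probe(scores)
# flags -> [True, True, False, False]
```

In practice the reconstruction MSE for each candidate image would come from re-denoising it with the fine-tuned model; memorized training samples reconstruct with markedly lower error than unseen inputs.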


📂 Repository Layout

  • ex5.ipynb — Core Colab notebook for training, inference, SDEdit, and forensics probe.
  • reports/ — Comprehensive report assets, figures, and generated media.
  • main.py — Entry point placeholder for modular implementations.

⚙️ Setup & Execution

  • Runtime: Google Colab with a GPU ($\ge$ 16 GB VRAM recommended, e.g., T4 or A100).
  • Dependencies: diffusers, transformers, accelerate, peft.
  • Hugging Face: Requires valid access to runwayml/stable-diffusion-v1-5.
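A minimal Colab setup cell for the dependencies listed above might look like the following (versions unpinned; pin them if you need reproducibility):

```shell
# Install the stack listed above.
pip install diffusers transformers accelerate peft

# Authenticate so the runwayml/stable-diffusion-v1-5 weights can be downloaded.
huggingface-cli login
```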

How to Run

  1. Open ex5.ipynb in a GPU-enabled Google Colab environment.
  2. Execute the Hugging Face login cell to authenticate, enabling download of the model weights.
  3. Follow the notebook sections to launch DreamBooth fine-tuning, generate SDEdit comparative grids, and run the Forensics probe.

👤 Author

Ron Keter — Image Processing (67829), The Hebrew University of Jerusalem
