Text-to-image diffusion models have revolutionized the way we generate personalized images from only a few reference photos. However, when misused, these tools can create misleading or harmful content, posing a significant risk to individuals. Current methods attempt to mitigate this by subtly altering user images so they become ineffective for unauthorized use. Yet these poisoning-based defenses often fall short, relying on heuristically crafted perturbations and lacking resilience against even basic image transformations such as Gaussian filtering.
To address these shortcomings, we introduce MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework. MetaCloak leverages a pool of surrogate diffusion models to craft transferable, model-agnostic perturbations, and incorporates an additional transformation sampling process to make these perturbations robust against data transformations. Combined with a denoising-error maximization loss, this induces semantic distortion in the generated images that survives such transformations, degrading the quality of personalized generation.
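For illustration, the perturbation-crafting loop can be sketched as follows. This is a minimal sketch, not the released implementation: `surrogate_models`, `sample_transformation`, and `denoising_error` are hypothetical placeholders, and the projected-gradient ascent update is just one simple way to realize the denoising-error maximization.

```python
import torch

def craft_metacloak_perturbation(images, surrogate_models, steps=100,
                                 eps=16 / 255, alpha=1 / 255):
    """Illustrative sketch: maximize the diffusion denoising error over
    transformed, perturbed images, averaged across a pool of surrogate models."""
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        model = surrogate_models.sample()      # hypothetical: draw one surrogate diffusion model
        transform = sample_transformation()    # hypothetical: e.g., Gaussian blur, crop, resize
        x_adv = transform(torch.clamp(images + delta, 0, 1))
        loss = denoising_error(model, x_adv)   # hypothetical: denoising (noise-prediction) loss
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign() # gradient ascent on the denoising error
            delta.clamp_(-eps, eps)            # keep the perturbation imperceptible
            delta.grad.zero_()
    return torch.clamp(images + delta, 0, 1)
```

In the full bi-level formulation, the surrogate pool would itself be periodically fine-tuned on the poisoned images to mimic the victim's DreamBooth training; that inner loop is omitted here for brevity.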
Our comprehensive tests across the VGGFace2 and CelebA-HQ datasets demonstrate MetaCloak's superior performance over existing solutions. Remarkably, MetaCloak can deceive online training platforms like Replicate in a black-box fashion, showcasing its practical effectiveness in real-world applications. For those interested in further exploration or application, our code is publicly available at https://github.com/liuyixin-louis/MetaCloak.
Following Anti-DreamBooth, our experiments are performed on human subjects from two face datasets: CelebA-HQ and VGGFace2. CelebA-HQ is a high-quality version of the original CelebA dataset consisting of 30,000 celebrity face images. VGGFace2 is a large-scale dataset with over 3.3 million face images from 9,131 unique identities. We select 50 identities from each dataset, randomly pick 8 images per individual, and split them into two subsets for image protection and reference. To show the overall performance of MetaCloak, we provide the main quantitative and qualitative results of our method below.
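A minimal sketch of the per-identity split is shown below; the even 4/4 division between the protection and reference subsets is an assumption, as the exact split sizes are not stated here.

```python
import random

def split_identity_images(image_paths, seed=0):
    """Sketch: pick 8 images of one identity at random and divide them into
    a protection set (to be perturbed) and a reference set (kept clean)."""
    rng = random.Random(seed)
    picked = rng.sample(image_paths, 8)
    return picked[:4], picked[4:]  # (protection set, reference set), assumed 4/4 split
```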
We consider three adversarial purification techniques: JPEG compression, super-resolution transformation (SR), and image reconstruction based on total-variation minimization (TVM). We use a quality factor of 75 for the JPEG defense and a scale factor of 4 for the SR defense.
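For reference, the JPEG and TVM purifications can be reproduced with standard libraries as sketched below; the TVM weight is an assumed value, and the SR defense is omitted since it relies on a pretrained 4x super-resolution network.

```python
import io
import numpy as np
from PIL import Image
from skimage.restoration import denoise_tv_chambolle

def jpeg_purify(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode the image as JPEG (quality 75) to wash out high-frequency perturbations."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def tvm_purify(img: Image.Image, weight: float = 0.1) -> Image.Image:
    """Total-variation minimization (TVM) reconstruction; the weight is an assumed value."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    denoised = denoise_tv_chambolle(arr, weight=weight, channel_axis=-1)
    return Image.fromarray((denoised * 255).astype(np.uint8))
```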
To test the effectiveness of our framework in the wild, we conduct experiments under an online training-as-a-service setting. Compared with local training, attacking online training services is more challenging because of the limited knowledge of their data preprocessing. We showcase the performance of our method under two common DreamBooth fine-tuning scenarios: full fine-tuning (Full-FT) and LoRA fine-tuning (LoRA-FT). We sample data from VGGFace2 and upload the clean and poisoned images to Replicate for DreamBooth training.
@InProceedings{Liu_2024_CVPR,
author = {Liu, Yixin and Fan, Chenrui and Dai, Yutong and Chen, Xun and Zhou, Pan and Sun, Lichao},
title = {MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {24219-24228}
}
We thank Real3D-Portrait and Anti-DreamBooth for their project homepage templates. We also thank Stability AI, RunwayML, and CompVis for releasing different versions of the pretrained Stable Diffusion models.