Text-to-image diffusion models are nothing short of a revolution, allowing anyone, even without design skills, to create realistic images from simple text inputs. With powerful personalization tools like DreamBooth, they can generate images of a specific person from just a few reference images. However, when misused, such a powerful and convenient tool can produce fake news or disturbing content targeting any individual victim, causing severe negative social impact. In this paper, we explore a defense system called Anti-DreamBooth against such malicious use of DreamBooth. The system aims to add subtle noise perturbation to each user's image before publishing in order to disrupt the generation quality of any DreamBooth model trained on these perturbed images. We investigate a wide range of algorithms for perturbation optimization and extensively evaluate them on two facial datasets over various text-to-image model versions. Despite the complicated formulation of DreamBooth and diffusion-based text-to-image models, our methods effectively defend users from the malicious use of those models. Their effectiveness withstands even adverse conditions, such as model or prompt/term mismatching between training and testing.
By fine-tuning a text-to-image model with DreamBooth, a malicious attacker can generate images of a specific person/concept (denoted as sks) in arbitrary contexts, for purposes such as (1) creating highly sensitive content, (2) stealing artistic style, or (3) spreading misinformation. Hence, our motivation is to prevent such scenarios by processing the subject's images before they are released online.
Given a few images of a subject collected online, a malicious attacker can fine-tune a text-to-image model with DreamBooth to create harmful content targeting that individual. Hence, before users release their images online, we protect these photos with our method, Anti-DreamBooth. The entire attack-defense flow is described below:
(a) We optimize imperceptible noise added to the original clean images such that the perturbed images remain nearly identical to the clean ones, while any DreamBooth model fine-tuned on them produces images of poor quality (see the code sketch after this list).
(b) The attacker collects the perturbed images to fine-tune a text-to-image model following the DreamBooth algorithm. Specifically, he fine-tunes the model with these images paired with an instance prompt containing a unique identifier (e.g., “A photo of sks person”); in parallel, a class-specific prior-preservation loss is applied using a class prompt (e.g., “A photo of a person”).
(c) Using the fooled DreamBooth model, the attacker can only generate noisy and corrupted images of the subject, regardless of the prompt used to synthesize the subject. Our method even generalizes to prompts unseen during perturbation training.
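Below is a minimal sketch of the perturbation step in (a), assuming the diffusers library and a frozen surrogate Stable Diffusion checkpoint; the model id, noise budget, and step sizes are illustrative choices rather than our exact configuration, and the full method may additionally update the surrogate between perturbation steps. The idea is simply PGD-style gradient ascent on the same denoising loss that DreamBooth minimizes:

import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

# Assumed surrogate; any public Stable Diffusion checkpoint works the same way.
model_id = "stabilityai/stable-diffusion-2-1-base"
device = "cuda"

vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
for module in (vae, unet, text_encoder):
    module.requires_grad_(False)  # only the perturbation is optimized

def embed(prompt, batch_size):
    ids = tokenizer(prompt, padding="max_length", truncation=True,
                    max_length=tokenizer.model_max_length,
                    return_tensors="pt").input_ids.to(device)
    return text_encoder(ids)[0].expand(batch_size, -1, -1)

@torch.enable_grad()
def perturb(images, prompt="a photo of sks person",
            eps=8 / 255, alpha=1 / 255, steps=50):
    """PGD-style ascent on the denoising loss; `images` are in [-1, 1]."""
    images = images.to(device)
    delta = torch.zeros_like(images, requires_grad=True)
    cond = embed(prompt, images.shape[0])
    for _ in range(steps):
        latents = vae.encode(images + delta).latent_dist.sample()
        latents = latents * vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=device)
        noisy_latents = scheduler.add_noise(latents, noise, t)
        pred = unet(noisy_latents, t, encoder_hidden_states=cond).sample
        loss = F.mse_loss(pred, noise)  # the loss DreamBooth would minimize
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend: make training harder
            delta.clamp_(-eps, eps)             # keep the noise imperceptible
            delta.grad = None
    return (images + delta).clamp(-1, 1).detach()

Because the noise is clipped to a small L-infinity budget, the protected photos stay visually indistinguishable from the originals while degrading any DreamBooth model trained on them.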
Results for protecting instances in a convenient setting. Here, we assume prior knowledge of the pretrained text-to-image generator, the training term (e.g., “sks”), and the training prompt that the attacker will use. We display the conditioning prompts above each image.
As described in the motivation section, our approach defends against the creation of sensitive content that specifically targets women. As depicted in the figure below, such content is significantly distorted, with noticeable artifacts induced by our method.
Our method has also proven effective in preventing the imitation of famous artworks by well-known artists such as Paul Jacoulet and Félix Vallotton, further highlighting its benefits.
In response to the recent spread of false news about Donald Trump's alleged arrest by the police, we demonstrate the effectiveness of our method in preventing the distribution of misinformation.
Results for protecting instances in adverse settings. For example, when we have no prior knowledge of which pretrained text-to-image generator the attacker will use, we train the adversarial noise on an ensemble of three Stable Diffusion versions (v1.4, v1.5, and v2.1) and then validate on DreamBooth models fine-tuned from Stable Diffusion v2.1 and v2.0. We test with two random subjects and denote them in green and red, respectively.
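One hedged way to realize this ensemble idea in code is to average the denoising loss over several surrogate Stable Diffusion versions and run the same gradient ascent on the combined loss. The sketch below builds on the perturbation step shown earlier; the list of (vae, unet, embed_fn, scheduler) tuples is an assumed abstraction, with each version supplying its own text embedding since the text encoders differ across releases:

# Assumed surrogate checkpoints for the ensemble defense.
surrogate_ids = [
    "CompVis/stable-diffusion-v1-4",
    "runwayml/stable-diffusion-v1-5",
    "stabilityai/stable-diffusion-2-1-base",
]

def ensemble_loss(adv_images, prompt, surrogates):
    # `surrogates` is a list of (vae, unet, embed_fn, scheduler) tuples,
    # one per pretrained version.
    losses = []
    for vae, unet, embed_fn, scheduler in surrogates:
        latents = vae.encode(adv_images).latent_dist.sample()
        latents = latents * vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noisy_latents = scheduler.add_noise(latents, noise, t)
        cond = embed_fn(prompt, latents.shape[0])
        pred = unet(noisy_latents, t, encoder_hidden_states=cond).sample
        losses.append(F.mse_loss(pred, noise))
    # Average so no single version dominates the gradient used for the ascent.
    return sum(losses) / len(losses)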
Another possible case is that we have no prior knowledge of the training term or prompt, so we examine the transferability of the learned adversarial noise when the training term/prompt changes.
In the first scenario, the training term is changed from “sks” to “t@t”. In the second scenario, the training prompt is replaced with “a DSLR portrait of sks person” instead of “a photo of sks person”. Here, S* is “t@t” for term mismatching and “sks” for prompt mismatching.
Our proposed defense works not only in laboratory settings but also in real-world scenarios, as it successfully disrupts the personalized generation outputs of a black-box, commercial AI service named Astria. First, we test our method's capability on Astria's default configuration: Stable Diffusion version 1.5 with face detection enabled.
With the increasing availability of public text-to-image models, the attacker may not use Stable Diffusion but instead opt for a different pretrained model. Here, we test another Astria setting: Protogen version 3.4 with Prism and face detection enabled.
As can be seen, our method significantly reduces the quality of the generated images across various complex prompts and on both target models. This highlights the robustness of our approach, which can defend against these services without requiring knowledge of their underlying configurations.
In case the malicious attacker obtains some clean images of the target subject and mixes them with the perturbed images for DreamBooth training, our Anti-DreamBooth can still protect users unless there are too many clean examples. As shown below, our defense remains quite effective when half of the images are perturbed, but its effectiveness drops as more clean images are introduced.
To evaluate the robustness of the proposed Anti-DreamBooth, we conducted extensive experiments with various simple image processing techniques applied to the perturbed images.
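For reference, a minimal sketch of how such processing can be applied to the protected photos before they are handed to DreamBooth training is given below; the specific operations (Gaussian blurring and JPEG re-compression) and their strengths are illustrative assumptions, not an exhaustive list of countermeasures:

from io import BytesIO
from PIL import Image, ImageFilter

def gaussian_blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    # Smooth the perturbed photo before it is used for DreamBooth training.
    return img.filter(ImageFilter.GaussianBlur(radius))

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    # Re-encode through lossy JPEG, a common way to wash out small perturbations.
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")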
Even though the above results indicate that our defense scheme withstands basic image processing techniques, we acknowledge that these methods do not represent all potential attacks that could bypass our protection. Thus, we plan to explore additional algorithms to further improve our defense's robustness in future work, as mentioned in the Conclusions section of our paper.
@InProceedings{le_etal2023antidreambooth,
title={Anti-DreamBooth: Protecting Users from Personalized Text-to-Image Synthesis},
author={Thanh Van Le and Hao Phung and Thuan Hoang Nguyen and Quan Dao and Ngoc Tran and Anh Tran},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023}
}
Acknowledgements: We thank Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman for their wonderful work DreamBooth and its project page. Finally, a special "thank you" to Stability AI, RunwayML, and CompVis for publishing different versions of pretrained Stable Diffusion models.