SwiftBrush

One-Step Text-to-Image Diffusion Model

with Variational Score Distillation

Thuan Hoang Nguyen    Anh Tran

VinAI Research

CVPR 2024

Despite their ability to generate high-resolution and diverse images from text prompts, text-to-image diffusion models often suffer from slow iterative sampling processes. Model distillation is one of the most effective directions to accelerate these models. However, previous distillation methods fail to retain the generation quality while requiring a significant amount of images for training, either from real data or synthetically generated by the teacher model. In response to this limitation, we present a novel image-free distillation scheme named SwiftBrush. Drawing inspiration from text-to-3D synthesis, in which a 3D neural radiance field that aligns with the input prompt can be obtained from a 2D text-to-image diffusion prior via a specialized loss without any 3D ground-truth data, our approach re-purposes that same loss to distill a pretrained multi-step text-to-image model into a student network that can generate high-fidelity images with just a single inference step. Despite its simplicity, our model stands as one of the first one-step text-to-image generators that can produce images of comparable quality to Stable Diffusion without relying on any training image data. Remarkably, SwiftBrush achieves an FID score of 16.67 and a CLIP score of 0.29 on the COCO-30K benchmark, results that are competitive with or even substantially surpass existing state-of-the-art distillation techniques.
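To make the distillation recipe more concrete, below is a minimal PyTorch sketch of an image-free, VSD-style training loop in the spirit described above. Everything in it is a simplified placeholder: the tiny MLP "denoisers" stand in for Stable Diffusion UNets, the noise schedule, weighting, and all module and variable names are illustrative assumptions rather than the actual SwiftBrush implementation.

# Minimal sketch of image-free, VSD-style one-step distillation (placeholder networks).
import torch
import torch.nn as nn
import torch.nn.functional as F

D_LATENT, D_TEXT = 16, 8          # toy latent / text-embedding sizes (placeholders)
T_MAX = 1000                      # assumed number of diffusion timesteps

def make_denoiser():
    # stand-in for a UNet epsilon-predictor conditioned on (noisy latent, timestep, text)
    return nn.Sequential(nn.Linear(D_LATENT + 1 + D_TEXT, 64), nn.SiLU(),
                         nn.Linear(64, D_LATENT))

teacher = make_denoiser()              # frozen pretrained teacher (placeholder)
lora_teacher = make_denoiser()         # trainable copy modeling the student's own distribution
student = nn.Sequential(nn.Linear(D_LATENT + D_TEXT, 64), nn.SiLU(),
                        nn.Linear(64, D_LATENT))   # one-step generator

for p in teacher.parameters():
    p.requires_grad_(False)

opt_student = torch.optim.AdamW(student.parameters(), lr=1e-5)
opt_lora = torch.optim.AdamW(lora_teacher.parameters(), lr=1e-4)

alphas = torch.linspace(0.9999, 0.98, T_MAX).cumprod(dim=0)  # toy noise schedule

def eps_pred(net, x_t, t, text):
    t_feat = (t.float() / T_MAX).unsqueeze(-1)
    return net(torch.cat([x_t, t_feat, text], dim=-1))

for step in range(1000):
    text = torch.randn(4, D_TEXT)                  # stand-in for caption embeddings
    z = torch.randn(4, D_LATENT)                   # input noise for the student
    x0 = student(torch.cat([z, text], dim=-1))     # one-step generation

    # Diffuse the student's output to a random timestep t.
    t = torch.randint(20, 980, (4,))
    a = alphas[t].unsqueeze(-1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise

    # VSD-style gradient: disagreement between the frozen teacher and the
    # trainable copy on the student's output (timestep weighting omitted).
    with torch.no_grad():
        eps_t = eps_pred(teacher, x_t, t, text)
        eps_l = eps_pred(lora_teacher, x_t, t, text)
        grad = eps_t - eps_l
    loss_student = (grad * x0).sum() / x0.shape[0]   # surrogate loss: d/dx0 equals `grad`
    opt_student.zero_grad(); loss_student.backward(); opt_student.step()

    # Update the trainable copy with a standard denoising loss on student samples.
    loss_lora = F.mse_loss(eps_pred(lora_teacher, x_t.detach(), t, text), noise)
    opt_lora.zero_grad(); loss_lora.backward(); opt_lora.step()

The key point the sketch tries to convey is that the student never sees a real or teacher-generated image: its only training signal comes from how the frozen teacher and the trainable copy disagree on the student's own outputs.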

More from the SwiftBrush family:

A hyperrealistic photo of fox astronaut, perfect face, artstation
A DSLR photo of a dog wearing glasses at the beach
A high-resolution photograph of a waterfall in autumn; muted tone
Portrait of a cat astronaut with Japanese samurai helmets
A poodle wearing a baseball cap and holding a dictionary in hand
An oil painting of a vase with yellow roses, style of Frank Auerbach
A photo of one ice cream ball in a luxurious plate, bokeh
A blue Porsche 356 parked in front of a brick wall

SwiftBrush research highlights

Quantitative comparison of our work against existing distillation methods on COCO 2014.

Model                                        FID ↓     CLIP ↑
Image-dependent Distillation
  Guided Distillation (Meng et al., 2023)    37.3      0.27
  LCM (Luo et al., 2023)                     35.56     0.24
  InstaFlow (Liu et al., 2023)               13.27     0.28
Image-free Distillation
  BOOT (Gu et al., 2023)                     17.89     35.49
  SwiftBrush (Our Work)                      16.67     0.29
Stable Diffusion v2.1
  1 sampling step                            202.14    0.06
  25 sampling steps                          13.45     0.23
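For context on the Stable Diffusion v2.1 rows, the sketch below shows how such single-step versus multi-step baselines are typically produced with the diffusers framework by changing only num_inference_steps. The checkpoint id, guidance scale, and default scheduler here are illustrative choices and may differ from the exact evaluation setup behind the table.

# Hedged sketch: Stable Diffusion v2.1 at 1 vs. 25 sampling steps via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "A blue Porsche 356 parked in front of a brick wall"

# Multi-step sampling (the usual setting, 25 steps here).
img_25 = pipe(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]

# Naive single-step sampling with the same multi-step model degrades sharply
# (cf. the FID of 202.14 above); SwiftBrush instead distills a dedicated
# one-step student so that a single forward pass already yields a clean image.
img_1 = pipe(prompt, num_inference_steps=1, guidance_scale=7.5).images[0]

img_25.save("sd21_25steps.png")
img_1.save("sd21_1step.png")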


SwiftBrush

speedy · simple · sublime

A high-resolution photo of a persian cat wearing a sunglasses and a beach hat in Times Square
A pencil sketch of an old man by Milt Kahl
A majestic oil painting of a raccoon Queen wearing red French royal gown
A dslr photo of a turtle sitting in a forest, fisheye lens view

Special Thanks

We thank Uy Dieu Tran for early discussions as well as for providing many helpful comments and suggestions throughout the project. Special thanks to Trung Tuan Dao for valuable feedback and support. Last but not least, we thank Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu for their work on ProlificDreamer, as well as the Hugging Face team for the diffusers framework.

BibTeX

@InProceedings{nguyen2024swiftbrush,
     title={SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation},
     author={Thuan Hoang Nguyen and Anh Tran},
     booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
     year={2024},
}