DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform

Amazon
Duke University. Work done during internship at Amazon
UCLA. Work done during internship at Amazon
* Equal Contribution
Person demonstration
Virtual try-on ability on humans
Furniture demonstration
Virtual try-on ability on furniture
Graphics preservation
Graphics preservation ability
Garment extraction
Extracting garment information from photos

Abstract

Diffusion models enable high-quality virtual try-on (VTO) through their impressive image synthesis abilities. However, current VTO methods rely on extensive end-to-end training of large pre-trained models, while real-world applications often operate under tight training budgets. To address this obstacle, we apply Doob's h-transform efficient fine-tuning (DEFT) to adapt large pre-trained unconditional models for downstream VTO tasks. DEFT freezes the pre-trained model's parameters and trains a small network to learn a conditional h-transform. On top of a 2.4B-parameter pre-trained score network and an 83.7M-parameter pre-trained autoencoder, the proposed framework trains only a 35.3M-parameter h-transform network, roughly 1.42% of the frozen parameters. To further improve DEFT's performance and reduce the inference time of existing models, we additionally propose an adaptive consistency loss. Consistency training distills a slow but well-performing diffusion model into a fast one while retaining performance by enforcing consistency along the inference path. Inspired by constrained optimization, instead of distillation we combine the consistency loss and the denoising score matching loss in a data-adaptive manner, enabling low-cost fine-tuning of existing VTO models. Empirical results show that the proposed DEFT-VTON method achieves state-of-the-art (SOTA) performance on VTO tasks while reducing the number of function evaluations to as few as 15 with competitive quality.
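As a rough illustration of the two ideas in the abstract (not the paper's actual implementation), the sketch below shows (1) a conditional score assembled from a frozen unconditional base score plus a small trainable h-transform term, and (2) a data-adaptive combination of the consistency and denoising score matching (DSM) losses via a Lagrangian-style multiplier, in the spirit of constrained optimization. All function bodies, the `budget`, and the `step` size are hypothetical placeholders standing in for the learned networks and the paper's actual weighting scheme.

```python
import numpy as np

def frozen_base_score(x, t):
    # Stand-in for the frozen 2.4B-parameter pre-trained score network.
    return -x / (1.0 + t)

def h_transform(x, t, cond, w):
    # Stand-in for the small trainable h-transform network (~35.3M
    # parameters in the paper); here just a linear pull toward the condition.
    return w * (cond - x) / (1.0 + t)

def conditional_score(x, t, cond, w):
    # DEFT idea: conditional score = frozen unconditional score + learned
    # h-transform term; only w (the small network) receives gradients.
    return frozen_base_score(x, t) + h_transform(x, t, cond, w)

def update_multiplier(lam, cons_loss, budget=0.1, step=0.05):
    # Dual-ascent-style update: grow lam while the consistency loss
    # exceeds its budget, otherwise let lam relax toward zero.
    return max(0.0, lam + step * (cons_loss - budget))

def combined_loss(dsm_loss, cons_loss, lam):
    # Data-adaptive mixture of the DSM objective and the consistency term.
    return dsm_loss + lam * cons_loss
```

In this reading, DSM acts as the primary objective while the consistency term is treated as a constraint whose weight adapts to how strongly it is violated on the current batch; the adapted weight then scales the consistency term inside the total loss.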

BibTeX

If you find our work useful, please cite our paper:

@misc{DEFT-VTON,
      author    = {Xingzi Xu and Qi Li and Shuwen Qiu and Julien Han and Karim Bouyarmane},
      title     = {DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform},
      publisher = {arXiv},
      year      = {2024}
}