ArtiFade: Learning to Invert Clean Concepts from Unclean Images
Student name: Yang Shuya
Student number: 3035832488
Collaborators: Mr. Hao, Shaozhe; Mr. Cao, Yukang
Supervisor name: Dr. Wong, Kenneth K.Y.
Introduction
Subject-driven image generation has seen significant advances over the past two years. Existing methods can generate images from customized prompts with high fidelity to a given subject. However, these methods typically require clean, high-quality input images, a prerequisite that often does not hold in real-world scenarios, where inputs may be corrupted by watermarks or other artifacts. We introduce ArtiFade, a novel approach designed to address the challenge of generating high-quality, subject-driven images from unclean inputs. ArtiFade partially fine-tunes a diffusion model while simultaneously training an artifact-free embedding. Through extensive experiments, ArtiFade demonstrates remarkable results in removing both in-distribution and out-of-distribution artifacts while preserving high subject fidelity during reconstruction. This website presents only the high-level ideas and representative experiments.
Methodology
Workflow of ArtiFade. On the left, we perform ArtiFade fine-tuning by minimizing the reconstruction loss between the clean image and the image generated from its corresponding blemished embedding. We evaluate our method by using the fine-tuned U-Net with a blemished embedding of a new artifact type, as shown on the right.
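To make the fine-tuning objective concrete, below is a minimal sketch of one ArtiFade training step, assuming a Stable Diffusion-style latent diffusion setup with a diffusers-like API. Names such as artifade_step and the way prompt_embeds is produced are illustrative placeholders, not the exact identifiers of our implementation; the key point is that the denoising target comes from the clean image while conditioning uses the blemished embedding.

```python
# Minimal sketch of one ArtiFade fine-tuning step (illustrative, not our
# exact implementation). Assumes a diffusers-style vae / unet / scheduler.
import torch
import torch.nn.functional as F

def artifade_step(unet, vae, scheduler, clean_image, prompt_embeds, optimizer):
    """One fine-tuning step: the denoising target is derived from the CLEAN
    image, while `prompt_embeds` encodes a prompt containing the BLEMISHED
    embedding inverted from unclean images. Minimizing the reconstruction
    loss teaches the partially fine-tuned U-Net to fade artifacts out."""
    # Encode the clean image into the autoencoder's latent space.
    latents = vae.encode(clean_image).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

    # Sample a diffusion timestep and add the corresponding noise.
    noise = torch.randn_like(latents)
    t = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = scheduler.add_noise(latents, noise, t)

    # Predict the noise conditioned on the blemished embedding and
    # minimize the standard noise-prediction (reconstruction) loss.
    noise_pred = unet(noisy_latents, t, encoder_hidden_states=prompt_embeds).sample
    loss = F.mse_loss(noise_pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the optimizer would cover both the trainable subset of U-Net parameters and the artifact-free embedding, so the two are updated jointly as described above.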
Experiments
Evaluation metrics
(1) the fidelity of subject reconstruction (I_DINO and I_CLIP) – calculated as the feature similarity between the clean datasets and the generated images
(2) the fidelity of text conditioning (T_CLIP) – calculated as the similarity between the text prompts and the generated images
(3) the effectiveness of mitigating the impact of artifacts (R_DINO and R_CLIP) – calculated as the similarity between the unclean datasets and the generated images (a sketch of how the CLIP-based variants are computed follows this list)
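As a reference, the following is a minimal sketch of the CLIP-based similarity computations, using the Hugging Face transformers CLIP API; the checkpoint name and function names are illustrative assumptions rather than our exact evaluation code. The DINO variants (I_DINO and R_DINO) are computed analogously by swapping in a DINO image encoder.

```python
# Illustrative sketch of CLIP-based similarity metrics (not our exact code).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_image_similarity(images_a, images_b):
    """Mean cosine similarity between two image sets.
    Used for I_CLIP (clean vs. generated) and R_CLIP (unclean vs. generated)."""
    feats_a = model.get_image_features(**processor(images=images_a, return_tensors="pt"))
    feats_b = model.get_image_features(**processor(images=images_b, return_tensors="pt"))
    feats_a = feats_a / feats_a.norm(dim=-1, keepdim=True)
    feats_b = feats_b / feats_b.norm(dim=-1, keepdim=True)
    return (feats_a @ feats_b.T).mean().item()

@torch.no_grad()
def clip_text_similarity(images, prompts):
    """Mean cosine similarity between images and text prompts (T_CLIP)."""
    img = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt = model.get_text_features(**processor(text=prompts, return_tensors="pt", padding=True))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).mean().item()
```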
Quantitative Comparisons
We observe that the use of blemished embeddings in Textual Inversion leads to a performance decline across all metrics. In contrast, our method consistently achieves higher scores than Textual Inversion with blemished embeddings across the board, demonstrating the effectiveness of ArtiFade in all these aspects.
Qualitative Comparisons
In-distribution comparisons
Out-of-distribution comparisons
Applications
Our ArtiFade model can be applied to remove various unwanted artifacts from input images for subject-driven image generation, e.g., stickers and glass effects.