Adversarial Text to Continuous Image Generation

King Abdullah University of Science and Technology
HyperCGAN teaser image.

The red rectangles indicate the resolution boundaries our HyperCGAN model was trained on. By design, the model can synthesize meaningful pixels at (x, y) coordinates beyond these boundaries without any explicit training.

Abstract

Existing GAN-based text-to-image models treat images as 2D pixel arrays. In this paper, we approach the text-to-image task from a different perspective, where a 2D image is represented as an implicit neural representation (INR). We show that straightforward conditioning of the unconditional INR-based GAN method on text inputs is not enough to achieve good performance. We propose a word-level attention-based weight modulation operator that controls the generation process of INR-GAN based on hypernetworks. Our experiments on benchmark datasets show that HyperCGAN achieves competitive performance to existing pixel-based methods and retains the properties of continuous generative models.

Architecture

HyperCGAN architecture image.

The architecture of the proposed HyperCGAN: linear layers serve as hypernetworks. Given text embeddings and a noise vector, the hypernetworks generate parameters that modulate the weights of the INR-based decoder.
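The pipeline above can be sketched in a few lines of numpy. This is a toy illustration under our own assumptions (function names, dimensions, and the channel-wise modulation form are hypothetical, not the paper's exact operator): a linear hypernetwork maps the concatenated text embedding and noise vector to per-channel scales, which modulate the base weights of one INR decoder layer before it maps (x, y) coordinates to features.

```python
import numpy as np

def hypernetwork(text_emb, z, w_hyper, b_hyper):
    """Linear hypernetwork: maps [text embedding; noise] to per-channel
    modulation scales for one INR decoder layer (illustrative sketch)."""
    h = np.concatenate([text_emb, z])
    return w_hyper @ h + b_hyper  # one scale per output channel

def inr_layer(coords, w_base, scales):
    """INR decoder layer whose base weights are modulated channel-wise
    by the hypernetwork output before the forward pass."""
    w_mod = w_base * scales[:, None]  # channel-wise weight modulation
    return np.tanh(coords @ w_mod.T)

# Toy dimensions (hypothetical; not the paper's actual sizes).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=16)            # pooled text embedding
z = rng.normal(size=8)                    # noise vector
w_hyper = rng.normal(size=(32, 24)) * 0.1
b_hyper = np.ones(32)                     # bias toward identity modulation
w_base = rng.normal(size=(32, 2)) * 0.1   # base weights for 2D coordinates

coords = np.stack(np.meshgrid(np.linspace(0, 1, 4),
                              np.linspace(0, 1, 4)), axis=-1).reshape(-1, 2)
scales = hypernetwork(text_emb, z, w_hyper, b_hyper)
features = inr_layer(coords, w_base, scales)
print(features.shape)  # (16, 32): one feature vector per (x, y) coordinate
```

Because the text conditioning enters through the generated weights rather than through the input, every coordinate query is influenced by the prompt.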

Qualitative Results

Qualitative results image.

Qualitative results on three datasets: MS-COCO 256², CUB 256², and ArtEmis 256².

Sensitivity to Words

Word sensitivity results image.

Here, the input noise z is kept fixed while the color name in the prompt "a small {color} bird with white and dark gray wingbars and white breast and long tail" is varied, to assess the model's sensitivity to word-level modulation.

Continuous Properties

Continuous synthesis results image.

Demonstration of our model's capabilities for continuous image synthesis: extrapolation and super-resolution.
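Both properties follow from the fact that an INR is queried at arbitrary real-valued coordinates. A minimal sketch, assuming a [0, 1] coordinate convention during training (the helper name and ranges are our own illustration): super-resolution amounts to sampling a denser grid, and extrapolation to sampling ranges outside the trained box.

```python
import numpy as np

def sample_grid(h, w, x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Build an (h*w, 2) grid of (x, y) coordinates to query the INR decoder.
    A denser grid yields super-resolution; ranges outside [0, 1] yield
    extrapolation beyond the trained resolution boundaries."""
    xs = np.linspace(*x_range, w)
    ys = np.linspace(*y_range, h)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1).reshape(-1, 2)

train_grid = sample_grid(256, 256)                         # training resolution
superres = sample_grid(1024, 1024)                         # 4x denser sampling
extrap = sample_grid(256, 384, x_range=(-0.25, 1.25))      # beyond boundaries
print(train_grid.shape, superres.shape, extrap.shape)
```

No retraining is needed in either case; only the queried coordinate grid changes.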

Attention Maps

Attention maps image.

Attention maps corresponding to each word used for weight modulation.

BibTeX

@InProceedings{Haydarov_2024_CVPR,
    author    = {Haydarov, Kilichbek and Muhamed, Aashiq and Shen, Xiaoqian and Lazarevic, Jovana and Skorokhodov, Ivan and Galappaththige, Chamuditha Jayanga and Elhoseiny, Mohamed},
    title     = {Adversarial Text to Continuous Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6316-6326}
}