GANs, Diffusion Ride Dragon in AI Image Generation

Open any social media platform these days and you stumble upon AI-generated images of celebrities and cities, or news of yet another Midjourney feature expanding its capabilities. These diffusion model-based image generators have been among the most prominent showcases of generative AI ever since DALL-E was released last year.

Now, the capabilities of diffusion models have surpassed everyone’s expectations. Meet DragonDiffusion, a model that enables dragging objects within an image to change their shape and orientation. This allows seamless manipulation of images and the objects within them without requiring any fine-tuning of existing models – a Photoshop user’s dream come true.

How does it work?

The fundamental idea behind DragonDiffusion is the construction of a classifier guidance system that utilises the correspondence of intermediate features within the diffusion model. This guidance system translates editing signals into gradients using a feature correspondence loss, allowing for modifications to the intermediate representation of the diffusion model.
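
For intuition, here is a minimal PyTorch sketch of what such guidance can look like, assuming a hook that exposes an intermediate U-Net feature map; the `extract_features` call, the region masks, and the weighting below are hypothetical placeholders, not DragonDiffusion’s actual code. A feature-correspondence loss compares the edited region against reference features from the original image, and its gradient nudges the noisy latent before the next denoising step.

```python
import torch
import torch.nn.functional as F

def guided_update(z_t, t, unet, ref_feats, src_mask, tgt_mask, weight=1.0):
    """One hypothetical guidance step: steer the noisy latent z_t so that
    intermediate U-Net features in the target region match reference
    features taken from the source region of the original image."""
    z_t = z_t.detach().requires_grad_(True)

    # Stand-in for hooking an intermediate layer of the denoising U-Net;
    # this is not a real diffusers API.
    feats = unet.extract_features(z_t, t)            # (B, C, H, W)

    # Feature-correspondence loss: cosine distance between masked regions.
    tgt = (feats * tgt_mask).flatten(1)
    src = (ref_feats * src_mask).flatten(1)
    loss = 1.0 - F.cosine_similarity(tgt, src, dim=1).mean()

    # Translate the editing signal into a gradient on the latent and nudge
    # it before the next denoising step (classifier-style guidance).
    grad = torch.autograd.grad(loss, z_t)[0]
    return (z_t - weight * grad).detach()
```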

By considering both semantic and geometric alignment through a multi-scale guidance approach, DragonDiffusion facilitates various editing modes for both generated and real images. These modes include object moving, object resizing, object appearance replacement, and content dragging.

To ensure consistency between the original image and the editing result, DragonDiffusion incorporates a cross-branch self-attention mechanism. This mechanism maintains the overall coherence of the image throughout the editing process, ensuring that the edited content seamlessly integrates with the original.
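
As a rough illustration of the idea (a simplified stand-in, not the paper’s implementation), the editing branch can compute self-attention using keys and values borrowed from a parallel branch that reconstructs the original image, which keeps the edited output anchored to the source:

```python
import torch

def cross_branch_attention(q_edit, k_orig, v_orig, num_heads=8):
    """Self-attention for the editing branch using keys/values from a
    reconstruction branch of the original image (illustrative sketch)."""
    b, n, d = q_edit.shape
    h, hd = num_heads, d // num_heads

    def split(x):  # (b, n, d) -> (b, h, n, hd)
        return x.view(b, n, h, hd).transpose(1, 2)

    q, k, v = split(q_edit), split(k_orig), split(v_orig)
    attn = torch.softmax(q @ k.transpose(-2, -1) / hd ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, n, d)
```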

Read: Diffusion Models: From Art to State-of-the-art

Extensive experiments have been conducted to evaluate DragonDiffusion, and the results are remarkable. It supports a wide range of image editing applications, including object moving, resizing, appearance replacement, and content dragging, offering a powerful and user-friendly way to interact with diffusion models and harness their creative potential.

The success of DragonDiffusion can be attributed to the inherent properties of diffusion models, which exhibit strong correspondence relationships within their intermediate features. While previous approaches primarily relied on the correspondence between text and image features, DragonDiffusion capitalises on the stable and fine-grained correspondence between image features themselves. This fine-grained image editing scheme opens up new possibilities for precise and intuitive manipulation within diffusion models.

Wait… haven’t we seen this before?

People started questioning the relevance of GANs in the age of diffusion models. But just as this thought was taking shape, researchers made a huge breakthrough with DragGAN, which allows editors to drag and change objects’ orientations and shapes in real time. Ironically, this development made people question the abilities of diffusion model-based image generators.

Read: GANs in The Age of Diffusion Models

Similar to DragonDiffusion, this GAN-based method leverages a pre-trained GAN to synthesise images that not only precisely follow user input, but also stay on the manifold of realistic images.

The researchers have introduced a novel approach that distinguishes itself from previous methods by offering a general framework that does not rely on domain-specific modelling or auxiliary networks. This groundbreaking technique involves optimising latent codes to gradually move multiple handle points towards their desired positions. Additionally, a point tracking procedure is employed to accurately trace the trajectory of these handle points.

By leveraging the discriminative characteristics of intermediate feature maps within the GAN, both components of this approach enable precise pixel-level image deformations while maintaining interactive performance.
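
As a rough sketch of this style of point-based editing (an illustrative simplification, not DragGAN’s actual code, and assuming a `generator` that also returns an intermediate feature map, which is a hypothetical API), one optimisation step pulls the features just ahead of each handle point towards the handle’s current features, dragging the content a small step towards the target; point tracking then relocates the handles before the next iteration:

```python
import torch
import torch.nn.functional as F

def drag_step(w, generator, handles, targets, lr=2e-3):
    """One hypothetical latent-optimisation step for point-based dragging."""
    w = w.detach().requires_grad_(True)
    _, feat = generator(w)                     # feat: (1, C, H, W), assumed

    loss = torch.zeros((), device=w.device)
    for (hy, hx), (ty, tx) in zip(handles, targets):
        # One small pixel step from the handle towards its target.
        sy = hy + (1 if ty > hy else -1 if ty < hy else 0)
        sx = hx + (1 if tx > hx else -1 if tx < hx else 0)
        # Motion supervision: the feature one step ahead should match the
        # (detached) feature currently at the handle point.
        loss = loss + F.l1_loss(feat[0, :, sy, sx], feat[0, :, hy, hx].detach())

    grad = torch.autograd.grad(loss, w)[0]
    new_w = (w - lr * grad).detach()

    # Point tracking (omitted here): relocate each handle by nearest-neighbour
    # search of its original feature in a small window around the old position.
    return new_w
```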

The researchers have confidently asserted that their approach surpasses the current state-of-the-art in GAN-based manipulation, marking a significant advancement in the field of image editing utilising generative priors. Furthermore, they have expressed their intentions to extend this point-based editing technique to 3D generative models in the near future.

It’s been 7 weeks since DragGAN got announced, and one week since the official implementation got released. This week, we got DragonDiffusion. Basically the DragGAN equivalent but for diffusion models. https://t.co/ZyPklGVUNJ pic.twitter.com/dSXpMEgxqV

— Dreaming Tulpa 🥓👑 (@dreamingtulpa) July 9, 2023

Image generation to a whole new level

It was believed that the complexity of the diffusion process would make it difficult to build dragging techniques into these models. Now, with DragonDiffusion, diffusion model research is back on track. At the same time, it is crucial to acknowledge that GANs are proving to be just as capable in this ecosystem.

The rising popularity of diffusion models can be attributed to their unique strengths across various image synthesis scenarios. However, it is important to recognise the enduring significance of GAN models, which continue to demonstrate their efficacy in producing visually appealing results.

The current landscape witnesses a dynamic interplay between these two approaches, with diffusion models resurfacing and reclaiming their position while GANs continue, alongside them, to enhance and complement the image generation domain.

