DALL.E 3 and Midjourney Battle It Out

The cat’s out of the bag! After secretly working on an image generation tool for months, OpenAI finally announced DALL.E 3. Not only that, DALL.E 3 will be integrated with ChatGPT Plus and ChatGPT Enterprise in October – finally fulfilling the multimodal tag for GPT-4. With text and image generation now available on the famous chatbot, does the fate of other image-generation tools such as Midjourney seem murky?

Should Midjourney Be Worried?

When the image generation tool was being tested with Discord users a few months ago, its output was considered far superior to Midjourney’s. A user even mentioned that they have ‘zero interest in using Midjourney after using it.’ To what extent that holds true has been tested through a side-by-side comparison. Nick St. Pierre, a creative director and community developer in AI and art, compared images generated by DALL.E 3 and Midjourney from the same set of prompts.

Prompt: Close-up photograph of a hermit crab nestled in wet sand, with seafoam nearby and the details of its shell and texture of the sand accentuated.

DALL.E 3 (top) and Midjourney (bottom)

Prompt: A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon.

DALL.E 3 (left) and Midjourney (right)

The images generated by DALL.E 3 noticeably follow the minute details of the input prompts. They are more closely aligned with the specific instructions given in each prompt.

Simpler Language Prompts

The USP of DALL.E 3 is the simplicity of its text prompts. With ChatGPT integration, users can write simple, conversational prompts, whether short sentences or detailed paragraphs, and get relevant images in return. Users can continue the conversation with the chatbot to further tweak the generated output.

A user has called out the superiority of DALL.E 3 in terms of image quality, prompt coherence, and an accessible UI. However, Midjourney is working on its latest version (V6), which is said to have a better understanding of natural language. The company is even looking to bring it to web and mobile platforms.

Tool Versatility

Midjourney, a pure generative AI platform for creating images, has a full set of features built for image editing. The output can be tweaked to improve colour, contrast or composition. With features such as Zoom, Pan and Remix, Midjourney V5.2 (the last released version) is on par with any image generation or editing tool and can even be compared to the likes of Adobe. DALL.E 3, however, does not have any of these features.

Multimodality In Pieces

With DALL.E 3 integrated into ChatGPT, multimodality has been addressed, but images can be produced only as an output. In Midjourney, users can upload reference images along with text prompts to get the image they want. This is not available on DALL.E 3, where input is still text only.

Midjourney can be accessed only through Discord, something that has been highly criticised, especially after DALL.E 3’s release. The onboarding process and the use of the application on Discord seem complicated to many, which dissuades people from using it.

Safety First

During the initial testing phase with Discord users, OpenAI’s tool had no control over what it generated. Gore, inappropriate images and trademarked brand logos were all produced. This was, however, addressed before the release of DALL.E 3. In its latest blog post, OpenAI described how safety has been prioritised in the image-generation feature by collaborating with red teamers and removing harmful biases related to visual representation. The model declines requests for public figures, and even requests for images in the style of a living artist.

This is an area where Midjourney has tricky boundaries. In the past, a number of images of public figures such as Pope Francis, Donald Trump, Elon Musk and many more have been created using Midjourney. This led to wide criticism and copyright infringement lawsuits too. However, the issue has not yet been addressed even after five versions of the application. OpenAI, meanwhile, is researching and building an internal tool, a provenance classifier, to help identify whether an image was generated by DALL.E 3.

From an avocado chair to an avocado patient, OpenAI’s use of the fruit to show how far DALL.E has come is quite impressive. Integrating it into a chatbot that has over 100M users is also the best way to extend its reach. However, Midjourney, whose image quality has impressed with every release and whose next version will probably bring a web and mobile presence, will certainly be a game-changer.

The post DALL.E 3 and Midjourney Battle It Out appeared first on Analytics India Magazine.