ImageFX AI Image Generator Test Drive: Is It Better than DALL-E?

On February 1, 2024, Google announced the launch of its AI image creation tool, ImageFX. This text-to-image generator is now available through the Google Labs website for users in the US, Australia, New Zealand, and Kenya.

The company also announced the release of MusicFX, a text-to-music tool that lets users create tracks of up to 70 seconds in length, as well as music loops.

These releases come just months after OpenAI integrated the popular text-to-image model DALL-E 3 with ChatGPT, giving users the ability to generate images from written prompts.

Can Google’s AI image generator rival the best text-to-image generators already available on the market, such as DALL-E and Midjourney?

Learn what ImageFX is and where you can give it a try, explore its major features and capabilities, and take a peek at what lies ahead in the future of AI image creation tools.

Key Takeaways

ImageFX is Google’s newly released AI image creation tool powered by the Imagen 2 text-to-image model, which is designed to create highly realistic images.
Google’s new AI image generator offers ‘expressive chips,’ a set of recommended keywords that users can click to generate other stylistically similar designs.
This feature is the tool’s key differentiator from competitors like DALL-E 3 and Midjourney.
ImageFX is currently available only in English and is limited to a handful of countries, including the United States, Australia, New Zealand, and Kenya, with no global release dates revealed.
In the future, Bard and Imagen 2 have the potential to become a power couple in much the same way that ChatGPT and DALL-E 3 have.

What Is ImageFX?

ImageFX is an AI text-to-image generator based on Imagen 2, a text-to-image diffusion model created by Google DeepMind, which has the ability to create high-quality, photorealistic images.

How to Use ImageFX

Users can access ImageFX by signing up for the AI Test Kitchen program on Google Labs. Then they select ImageFX or go to the ImageFX page directly, where they can sign into their personal Google account and start generating new creatives.

Note that the solution is now available for people in the US, Australia, New Zealand, and Kenya. Google hasn’t revealed the dates of the product’s global release yet.

However, you can also use a virtual private network (VPN) to access the service from other locations.

When using the platform, users can opt to enter their own text prompt and press the generate button to create an image or click on the “I’m feeling lucky” option to create a random prompt and image.

After creating an image, users have the option to download or share it. There is also the option to change a numerical seed to give the solution’s output more variety.
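To illustrate what a numerical seed does in general, here is a minimal Python sketch. It does not reflect ImageFX’s internals, which aren’t described here; it only assumes the common pattern in which a generator starts from random noise and the seed fixes that starting point. The function name `initial_noise` is hypothetical.

```python
import random


def initial_noise(seed: int, size: int = 4) -> list[float]:
    """Return a reproducible list of Gaussian noise values for a given seed.

    Hypothetical helper for illustration only, not part of ImageFX.
    """
    rng = random.Random(seed)  # independent generator seeded explicitly
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed reproduces the same starting noise...
assert initial_noise(7) == initial_noise(7)
# ...while a different seed gives a different starting point.
assert initial_noise(1) != initial_noise(2)
```

Under that assumption, changing the seed while keeping the prompt fixed is a simple way to explore different outputs, and reusing a seed is a way to return to an earlier one.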

Users can also click on expressive chips at the bottom of the screen. Types of keywords recommended during our testing included photorealistic, dramatic, 35mm film, minimal, sketchy, handmade, wide shot, illustration, close-up, and highly detailed.

Testing ImageFX: A Step-By-Step Guide

In this section, we’re going to put ImageFX’s capabilities into action. All you need is a Google account and either a location in one of the supported countries or a VPN.

Opening the ImageFX page will prompt you to sign in, so select the Sign in with Google option and then press Sign in again.

A pop-up will come up, giving you the option to receive marketing emails or research invitations. Check which option you want (if any) and press the Next button.

You will now be shown Google’s Privacy Policy. Read through it and press Next if you agree. This will bring up a Google Terms of Service pop-up. Select the Agree and Continue option. Once the About ImageFX text box comes up, read it and select the Got it button to finish the signup process.

Now you’re all set and ready to experiment with your prompts!

Using ImageFX: The Basics

Generating an AI image in ImageFX (screenshot by Tim Keary)

On the left-hand side of the screen, you will see a text box where you can enter your written prompt and press the Generate button to produce images that will be displayed on the right-hand side of the screen.

Underneath the text prompt box, you will find a button that says More alongside a series of keywords – the expressive chips. Clicking More allows you to generate another set of keywords, and clicking on a keyword adds it to your written prompt.
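The way a chip extends a prompt can be sketched in a few lines of Python. This is only an illustration: the function `add_chip` is hypothetical, and the comma-separated format is an assumption, since we don’t know exactly how ImageFX merges a keyword into the prompt text.

```python
def add_chip(prompt: str, chip: str) -> str:
    """Append a style keyword to a prompt unless it is already present.

    Hypothetical helper; the comma-separated format is an assumption.
    """
    prompt = prompt.strip()
    if chip.lower() in prompt.lower():
        return prompt  # keyword already in the prompt, nothing to add
    return f"{prompt}, {chip}" if prompt else chip


base = "an alien playing football with an ostrich"
print(add_chip(base, "painting"))
# prints: an alien playing football with an ostrich, painting
```

Building prompts this way makes it easy to try the same subject with several style keywords in turn.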

Finally, in the bottom right-hand corner of the screen, there are three buttons: the first lets you set a numerical seed to increase the variety of outputs, the second lets you download an image, and the third lets you share it.

ImageFX in Action

As with any text-to-image tool, the quality of the output will depend largely on your initial prompt.

To get the best results, you’ll want to include as much context as possible. For the purposes of this guide, we decided to go with a surreal image: an alien playing football with an ostrich.

The results were as follows:

The first image was acceptable and looked fairly “realistic,” but the other outputs weren’t so good.

To see if we could get some alternative designs, we pressed the More button to get ImageFX to provide us with more expressive chips to choose from.

From these options, we clicked on the Painting option to see how the image would look as a painting. The results were as follows:

To further test expressive chips, we went looking for an option that would create an animated-style version.

The closest keyword match we could get was Illustration. Here are the results of the prompt:

These results were probably the best of the bunch in terms of matching the intent of the prompt and the overall output quality.

ImageFX vs. DALL-E 3: Which Is Better?

To help evaluate ImageFX, we compared its output against DALL-E 3’s to see which created the better images. While this isn’t an exhaustive test of each tool’s image quality, it does give an idea of how each responds to a barebones prompt.

To start off our test, we instructed DALL-E 3 to create an image of an alien playing football with an ostrich (the same initial prompt we entered into ImageFX). The results were as follows:

During our test, we noticed that DALL-E 3 took longer than ImageFX to generate the image, but we felt the output image it created was much better than any of the designs produced by Google’s solution so far.

That said, it generated only one image.

To build on our comparison, we decided to see how each tool handled a cartoon T-rex. Here are the results:

The images created by ImageFX were all highly detailed, but we felt that DALL-E 3 produced an image that not only better matched the intent of the prompt but also made for a pretty good Disney-style animated character.

First Impressions of ImageFX

Overall, ImageFX was very easy to use.

We found expressive chips to be a welcome addition: they offered a valuable reference point for seeing how prompts could be adapted or improved when creating images. This would be useful for users who struggle to come up with compositional ideas.

The image quality didn’t blow us away, particularly in the alien-ostrich example, but in other tests ImageFX did generate extremely high-quality results.

Here is a decent image it created of an astronaut on the moon:

In this sense, ImageFX is definitely a tool that you can get good results with if you’re willing to take the time to enter the right prompts.

Imagen 2 Explained

At the core of Google’s AI image generator is Imagen 2, the text-to-image diffusion model that enables ImageFX to produce high-quality images. It also powers image generation in Google Bard, letting users create images directly in the chatbot, and is integrated with Google’s Search Generative Experience (SGE).

To enable Imagen 2 to create detailed images, Google added more detailed descriptions to the image captions in the model’s training data so that it could learn to distinguish between different artistic styles.

Using this approach means that the model can better understand the context of user prompts and respond with more relevant output.

Another important differentiator for Imagen 2 is that it’s accessible through Google Cloud, specifically via the Imagen API in Vertex AI.
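For readers who want to try the Vertex AI route, the sketch below shows roughly what a call could look like with the Vertex AI SDK for Python. Treat it as a starting point under stated assumptions rather than a tested recipe: it assumes the `google-cloud-aiplatform` package is installed, that you have a Google Cloud project with Vertex AI enabled and credentials configured, and that you look up the current Imagen model ID in Google’s documentation. The helper names `build_request` and `generate` are ours.

```python
def build_request(prompt: str, number_of_images: int = 1) -> dict:
    """Validate the prompt and assemble keyword arguments for generation."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    return {"prompt": cleaned, "number_of_images": number_of_images}


def generate(project_id: str, location: str, model_id: str,
             prompt: str, out_path: str) -> None:
    """Generate one image with Imagen on Vertex AI and save it to out_path.

    Requires Google Cloud credentials; model_id is an assumption the
    caller must supply from the Vertex AI documentation.
    """
    # Imported lazily so the module loads even without the SDK installed.
    import vertexai
    from vertexai.preview.vision_models import ImageGenerationModel

    vertexai.init(project=project_id, location=location)
    model = ImageGenerationModel.from_pretrained(model_id)
    response = model.generate_images(**build_request(prompt))
    response[0].save(location=out_path)
```

Unlike ImageFX, this path is billed through Google Cloud, so check the pricing before running it.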

In the future, Bard and Imagen 2 have the potential to become a power couple much the same way that ChatGPT and DALL-E 3 have, simply by making image creation technology accessible alongside a free, publicly available research assistant.

This is particularly true when considering the introduction of the more powerful Gemini Pro language model to Bard.

Where Does ImageFX Fit into the Text-to-Image Market?

ImageFX is entering a text-to-image market with several established competitors, including OpenAI’s DALL-E 3 and Midjourney.

Below, we’ve created a high-level overview of what each tool has to offer.

Feature	ImageFX	DALL-E 3	Midjourney
Ease of use	Easy	Easy	Complex
Create images for free	Yes	No	No
Generate random images	Yes	Yes	Yes
Sizes/dimensions	Images fixed to 1536×1536	Images can be 1024×1024, 1024×1792, or 1792×1024	Images can be 1024×1024, 2048×2048, or 4096×4096
Watermark	Yes (SynthID)	Yes (C2PA)	No
Copyright	N/A	Outputs owned by user	Outputs owned by user
Free Plan	Yes	No. Requires a paid plan such as ChatGPT Plus or Enterprise	No
Pricing	Free	Paid plans start at $20 per user per month for ChatGPT Plus, $25 per user per month for ChatGPT Team, and price on request for the Enterprise package	Paid plans start at $10 per month for the Basic Plan, $30 per month for the Standard Plan, $60 per month for the Pro Plan, and $120 per month for the Mega Plan with more fast GPU time and other benefits included
Access	Via Google Labs’ AI Test Kitchen (restricted regions)	Via ChatGPT	Discord
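To put the dimensions in the table into perspective, a quick Python calculation converts each size into megapixels. The figures come straight from the table above; the `megapixels` helper is just for illustration.

```python
# Output sizes as listed in the comparison table (width, height in pixels).
SIZES = {
    "ImageFX": [(1536, 1536)],
    "DALL-E 3": [(1024, 1024), (1024, 1792), (1792, 1024)],
    "Midjourney": [(1024, 1024), (2048, 2048), (4096, 4096)],
}


def megapixels(width: int, height: int) -> float:
    """Return the pixel count of an image in millions of pixels."""
    return width * height / 1_000_000


for tool, sizes in SIZES.items():
    for width, height in sizes:
        print(f"{tool}: {width}x{height} = {megapixels(width, height):.2f} MP")
```

By this measure, ImageFX’s fixed 1536×1536 output (about 2.36 MP) sits above DALL-E 3’s largest size (about 1.84 MP) but well below Midjourney’s 4096×4096 option (about 16.78 MP).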

Google’s AI Safety and Legal Protections

Google has some basic safety protections to help mitigate the risks presented by AI-generated images. One of these protections is content moderation guidelines, which prevent the generation of violent, offensive, or sexually explicit content.

The organization has also made a concerted effort to make it easier for users to identify AI-generated images.

For instance, all images created with ImageFX are embedded with a SynthID digital watermark to make them easier to identify. The images also include IPTC metadata so that users can tell when they encounter AI-generated content.

Using digital watermarks is an attempt to address concerns over deepfakes, digitally-created images of people that are difficult to distinguish from real ones.
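As a rough illustration of how the metadata side could be checked, the Python sketch below scans a file for the IPTC digital source type value `trainedAlgorithmicMedia`, which the IPTC vocabulary uses to label media created by a trained algorithm. That ImageFX writes this exact value is our assumption, and the check is a crude heuristic: metadata can be stripped or stored in a form a plain byte search won’t find, so a negative result proves nothing. SynthID watermarks are embedded in the pixels and cannot be detected this way.

```python
from pathlib import Path

# IPTC digital source type term for media created by a trained algorithm.
AI_SOURCE_MARKER = b"trainedAlgorithmicMedia"


def has_ai_source_marker(path: str) -> bool:
    """Return True if the file's raw bytes contain the IPTC AI marker.

    Heuristic only: it searches the raw bytes rather than parsing metadata.
    """
    return AI_SOURCE_MARKER in Path(path).read_bytes()
```

A call such as `has_ai_source_marker("image.jpg")` (with a file path of your own) returns `True` only when the marker text appears somewhere in the file.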

At the same time, Google also states that “you can request the removal of images under our policies or applicable laws.” This provides a basic mechanism for getting images that violate local laws removed.

It’s also worth noting that, according to Google’s privacy policy, human reviewers annotate and process conversations with ImageFX, so user inputs aren’t completely private.

The Future of AI Image Generation

AI image generation is evolving rapidly at the moment, with vendors like Google and OpenAI looking to build multimodal AI solutions that can respond to inputs, including text, images, audio, and video.

The development of ImageFX and its underlying model, Imagen 2, highlights that Google is attempting to integrate the ability to generate high-quality, photorealistic images into its product ecosystem. This is shown by its use of Imagen 2 to add image-creation functionality to Bard.

As it stands, there is a long way to go to advance image generation technology. While tools like Stable Diffusion and Midjourney have offered users powerful text-to-image generators, they’ve also been difficult to use.

Current technologies have also struggled to produce realistic images: they have difficulty with elements like hands and faces and can create an unsettling uncanny valley effect when attempting to depict lifelike subjects.

The Bottom Line

Image generation is an extremely fast-moving segment of the AI market at the moment. Google’s launch of ImageFX provides a big opportunity for the organization to start competing against DALL-E 3, which has remained one of the most accessible text-to-image generators on offer.