Artificial Intelligence (AI) has blown up in popularity as one of the most exciting and misunderstood technologies of recent times. This year saw the advent of ChatGPT, an AI chatbot that can answer queries, including writing a comedy routine about grocery store cheese. While these types of tools demonstrate AI’s ability to generate text, few are talking about AI’s ability to generate images. That’s why I’ve compiled three of the best image generators for you to experiment with and impress your friends.
The three image generators in question are Midjourney, DALL-E, and Stable Diffusion. Each generator has its inherent advantages and disadvantages, but we recommend not straying away from them if you want to create some high-quality AI art. Or you could just generate an image of Darth Vader smashing out a DJ set at the club as I did below.
My comparison analyzed cost, ease of use, image resolution, dynamic range, composition, creativity, post-processing, and even speed. Along with those factors, I also fed each generator identical prompts to see how they rendered three different scenes.
Before we start, I’d be remiss not to discuss claims that AI Image Generators are “stealing” other artists’ work.
It doesn’t take a lot of research to realize all of these programs understand art through machine learning: algorithms that are merely recognizing patterns and well… not much else. Are they trained using art made by humans? Yes. Are they flat-out stealing it? Not really.
It’s important to note that AI isn’t simply copying and pasting other artists’ work into one giant mashup. The key issue here is generative AI that can replicate artists’ styles, which is already possible using a program called Midjourney. Things are obviously very complicated right now and the matter is actively being investigated by the United States Copyright Office. As of right now, there’s still not a definitive yes or no as to whether these generative AI programs are on the right or wrong side of the law, though we do at least know that AI-generated images cannot be copyrighted.
That being said, resistance to new innovations in artistry is nothing new. This movement of AI taking over reminds me of painters reacting to the advent of photography back in the 1820s; the majority of them were outraged, but it also had a hefty influence on impressionist painters. It pushed them over the edge into accepting photography as the best medium for capturing life’s fleeting moments. This actually allowed them to lean further into the funk zone with their style to complement photography instead of compete with it. Funny to think that this controversial new medium actually opened up opportunities for painters to be more creative.
I fed each image generator with three prompts to compare their ability to create different scenes. However, instead of messing around with random prompts and letting my imagination run loose, I was very purposeful with the prompts I chose. The first aimed to not only evaluate the ability to create human forms, but also trees, forests, and complex lighting scenarios.
It’s important to note that human forms (think of things like your face, hands, arms, and legs) are still very difficult for AI to produce realistically.
➥ Prompt 1: “A close-up of a robot working at a desk in a densely packed forest”
Midjourney: To my eye, Midjourney clearly produced the best image here. Not only is it the only rendition that even attempts to replicate complex human features, but it does so with impressive levels of detail. That’s also not to mention the “densely packed forest” behind our robot, which shows quite a lot of depth and complexity; note that the foreground elements are in focus and the background is nice and soft, just as you would get from a photograph.
Stable Diffusion (Dream Studio): At first glance, this render simply isn’t in the same ballpark as the Midjourney work that shows up first. If you look closely, it doesn’t even make an attempt to render human hands. However, the layering of the image (foreground, midground, and background) is actually very strong, and the forest isn’t all too bad either. I’d even go as far as to say that the composition—or framing of the image—is just as good if not better than the first.
DALL-E: While the first two images each had clear strengths, DALL-E’s rendition of the robot left quite a lot to be desired. Not only is the robot a complete mess (you really have to work to find the arms and legs), but the forest and lighting conditions are also basic, flat, and downright uninteresting.
The goal of this second scene was to showcase each program’s ability to generate a complex lighting situation, i.e., a forest fire. However, while the ability to generate the color, warmth, reflection, and even halation of fire is great, that’s far from everything needed to achieve a photorealistic render. The other goal of this prompt was to demonstrate the ability to render objects that are on fire. Trees change quite a lot as they burn down, leaving sparks, embers, and a colossal amount of smoke. Of the three comparisons, I’d say the differences between the images are most profound in this scene.
➥ Prompt 2: “Massive overgrown forest on fire being extinguished by firefighters”
Midjourney: There are no prizes for guessing which image was produced by Midjourney. Every time I look at this masterpiece I notice something new. The complexity in particular takes it to the next level; the flames themselves are also complemented by sparks and heat haze, which add drama to the image. That’s not to mention that it appears to be quite a windy day—good thing this is a render and not reality.
Stable Diffusion: I actually really like the composition that Stable Diffusion came up with; it’s arguably better than the Midjourney image. Everything else, on the other hand, is well… not as good. Sure, the flames are there, but they look extremely basic compared to the Midjourney image. This is the perfect example of what separates a good image from a great image.
DALL-E: Unlike the other two, DALL-E’s rendition of the forest fire leaves a lot to be desired. Is it a digital rendering of a prompt? Yes. Do I like any one aspect of it? No. It might sound like I’m being harsh, but when you realize DALL-E is still a paid service (like the other two) this level of performance is downright disappointing.
As a photographer, I’m inspired in many of the images I capture by Edward Hopper’s paintings, which are known for their moody commentary on the strangeness of everyday scenarios. So why not use his work as the base for an imitation game between all three of these image generators? Not only will this test the ability to replicate Hopper’s work, but I realized it’s also a great way to test out the compositions that these image generators can come up with; I’m always enamored by the layering of foreground, midground, and background in his work. Instead of rendering the same intimate small-town settings that Hopper would have, I challenged Midjourney, Stable Diffusion, and DALL-E to recreate the Empire State Building.
➥ Prompt 3: “The empire state building, Edward Hopper style”
Midjourney: I was especially blown away by its ability to render Hopper’s style here. The result is super tight as the final image is well put together and hits all the right notes. Along with nailing the style, it’s one of the only images that has a good amount of layering; note the person and staircase in the foreground, more buildings in the midground, and the Empire State Building in the background. It’s also the only image that shows the landmark from ground level.
Stable Diffusion: Keeping with the pattern of previous scenes, Stable Diffusion came close to the Hopper look, but the result was never really in the same ballpark as Midjourney’s version, and the composition is far simpler. I do have to say the dynamic range (the ability to replicate the darkest and brightest bits of the image) is still quite good.
DALL-E: Given the painterly style we’ve seen from DALL-E, I wasn’t super surprised to see that its work was actually quite impressive here. While it doesn’t feature the same definition seen in Midjourney, you’d probably be able to successfully identify it as a painting of the Empire State Building.
🏆 Best Overall: Midjourney
Midjourney is the most advanced AI image generator that we tested. It not only produced the highest visual fidelity, but also delivered the most impressive human anatomy (hands, feet, legs, and arms), dynamic range, textures, and composition of the generators on test. However, these spectacular results were the most difficult to achieve, with a steep but rewarding learning curve to grind through.
The learning curve is steep, but it’s important to mention that the latest version of Midjourney produced stunning results right out of the gate. The key is learning the right commands and keywords to get the last 10 percent from your image. Once you have your prompt figured out, Midjourney allows you to add separate ideas—you’re able to split them using commas—to give you more freedom to create the image you want. For instance, below you’ll find the exact prompt that we used for the first scene that we rendered in Midjourney.
While we were most impressed with Midjourney, it’s not perfect. It was not only the most difficult to learn, but also proved to be the most expensive. Starting out you’ll get 25 free “tokens” before you have to pay for a monthly subscription. Subscriptions come in two tiers: $10 a month gives you around 200 renders per month, while $30 a month gives you unlimited queries.
🧑‍🎨 Easiest to Use: Stable Diffusion
Stable Diffusion (SD) is by far the easiest AI Image Generator to get the hang of. While it isn’t collaborative like Midjourney, we used Dream Studio, which allows you to interact with Stable Diffusion using a visual interface that legitimately makes sense. There aren’t any complex commands or syntax to learn for you to achieve the final image you want.
Stable Diffusion also makes it easy to tweak renders that you’ve already made. Yes, you can do this in Midjourney, but the process is a bit convoluted and doesn’t give you any control of where to take the image. As an example, I’ve taken the image of the Empire State Building below and added some tags to change the time of day from dusk to dawn. As you can see, the composition of the image is largely the same, with just a bit more warmth and light in the second image. In Midjourney it would’ve been quite similar.
We’d be remiss not to mention that we were using the latest SDXL Beta version of Stable Diffusion for this article. Unlike Midjourney, which is paid for as a subscription service, Stable Diffusion uses a token system where $15 will get you approximately 7,500 images.
🏁 Best For Starters: DALL-E
Following our ChatGPT coverage, I came in with high expectations for DALL-E, which is made by the same company (OpenAI). Unfortunately, that was where the hype ran out and the disappointment began. For a start, the images DALL-E produced weren’t all that impressive, especially for a paid service where users burn tokens for each render. You get a limited number of tokens to start but they run out pretty quickly.
The max resolution (1024 x 1024 pixels) is comparable with the best generators out there. However, I was a bit befuddled to see that the final images failed to deliver the same visual fidelity. Most of DALL-E’s work appears more painterly when compared with Midjourney and Stable Diffusion. This means you’re simply out of luck if you want to make a render appear photorealistic when using DALL-E.
Worst of all, I ran into the same capacity issues that I experienced with ChatGPT, getting an error message just about every time I tried to make a rendering. This effectively made the service completely unusable for about 90 percent of my time with it. Yeah, not great. I’d be inclined to look the other way if DALL-E were free for all to use, but that simply isn’t the case. To offer some perspective, $15 in credits gives you approximately 400 DALL-E renders.
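To put the three pricing models side by side, here is a minimal Python sketch that turns the figures quoted in this article into a rough cost per render. The plan labels and function names are my own and purely illustrative, and the render counts are the approximate ones given above, so treat the output as a ballpark rather than official pricing. Midjourney’s $30 unlimited tier is left out because it has no fixed render count to divide by.

```python
# Prices (USD) and approximate render counts as quoted in this article.
# The labels are illustrative, not official plan names.
PLANS = {
    "Midjourney ($10 tier)": (10.00, 200),
    "Stable Diffusion (Dream Studio)": (15.00, 7500),
    "DALL-E": (15.00, 400),
}


def cost_per_render(price_usd: float, renders: int) -> float:
    """Return the approximate cost of a single render in US dollars."""
    return price_usd / renders


def cheapest_first(plans: dict) -> list:
    """Return (name, cost per render) pairs sorted from cheapest to priciest."""
    costs = [
        (name, cost_per_render(price, renders))
        for name, (price, renders) in plans.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in cheapest_first(PLANS):
        print(f"{name}: about ${cost:.4f} per render")
```

On these numbers, Stable Diffusion works out to a fraction of a cent per image, DALL-E to just under four cents, and Midjourney’s entry tier to about five cents.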