Dall-E May Be Awesome, but It Still Can't Count.

Written by EnterpriseSEO | Published 2022/08/01

TL;DR: OpenAI’s Dall-E can be frustratingly terrible at following precise directions. While getting a nice image is relatively easy, it can be very difficult (if not impossible) to get the image you really *want* or need for your projects. Why? Because for all its awesomeness, Dall-E still sucks at a few things. Here are some tips to help get around these issues.

(image generated by Dall-E & me)

If you’ve ever played with OpenAI’s incredible generative AI “Dall-E” you know it’s a game changer.

But Dall-E can also be incredibly frustrating to use for professional developers or creatives.

Why?

Because while getting a nice image is relatively easy, it can be very difficult to get the image you really want or need for your projects.

Put another way, Dall-E doesn’t really help you get an image from your mind onto the screen. If you’re trying to render a very specific scene or concept, Dall-E may be very good at surprising you — but it can be frustratingly terrible at following precise directions.

For example, the image of the sword-wielding robot at the top of the page was created with a single English sentence that I entered into Dall-E’s dashboard. But what you aren’t seeing in the image above is the 20 to 30 failed attempts it took to get there.

Having spent far too much time, and far too many dollars buying additional Dall-E credits, I’ve come to realize that the way to get the most out of Dall-E is to keep in mind all the things that Dall-E just happens to be terrible at.

Knowing what Dall-E can’t do will help you focus your efforts on the things that Dall-E can do.

A list of things that Dall-E (still) sucks at

Here are a few areas where Dall-E frequently fails, and some tips to help get around these issues:

Photographs of people

It’s generally not possible to get Dall-E to make a photo of a famous person (aside from some major historical figures). But even when it’s allowed, it’s not a good idea to generate a photorealistic image of any person.

Why?

Because Dall-E’s generated photorealistic images of people tend to look unnatural — and they often look completely weird.

Sometimes I think that I’ve got a decent generated photo, and then I’ll realize that the person in the photo has 6 fingers on each hand. Or a third arm. Or their eyes are looking in different directions.

These kinds of bizarre touches would be forgivable in an AI-generated painting or an illustration where the odd details can be passed off as “artistic license”. But in anything photorealistic, they tend to look jarring and weird.

TIP: If you need an image of a person doing something to convey a certain point, it’s a better idea to stick with illustrations and avoid generating photorealistic images.

Avoid anything sexy. Even if it involves elephants.

As you may know, trying to get Dall-E to generate “images of a sexual nature” will get your account banned in a hurry.

When I first got access to Dall-E I was having a blast making completely ridiculous images. One of the descriptive sentences I tried entering was:

“A drawing of an elephant laying on the beach, wearing a bikini, sipping a cocktail”

I pressed Enter on my keyboard and prepared myself for what I thought would surely be a hilarious image.

Never in my wildest dreams did I think I was about to trigger a very stern warning from OpenAI — or that I would suddenly be in danger of getting my account banned for attempting to create adult imagery.

But that’s exactly what happened!

So, lesson learned. Better to stay away from that department altogether.

TIP: Avoid using potentially risqué words like “seductive” or “revealing”, and avoid using settings like “bedrooms”, “beaches” and “pools”. Even if what you’re creating is totally innocuous, OpenAI’s safeguards against adult content are extremely sensitive.

Dall-E can’t count

I was recently asked to help produce a feature for a website called iFate. iFate is a Tarot, I Ching, and astrology website, best known for its popular online Tarot reading feature. They asked me to use Dall-E to develop a set of AI-generated Tarot cards.

Easy peasy, I thought. This was going to be a no-brainer for Dall-E.

Wow, was I wrong. It turns out that Dall-E doesn’t really pay much attention to numbers above 3.

So, for example, if you’re trying to get Dall-E to draw the “Seven of Swords” Tarot card, it’s important that, no matter what Dall-E comes up with, the image shows a total of 7 different swords.

No matter how hard I tried, I couldn’t get Dall-E to actually paint me a picture with a total of 7 swords on it. Needless to say, the same problems applied to all the rest of the 10 numbered cards in each of the Tarot suits: The Suit of Swords, Suit of Cups, Suit of Pentacles, and the Suit of Wands.

The project took much, much longer than I had anticipated.

TIP: If you’re using Dall-E to do anything involving specific numbers of objects, bear in mind that you’re probably going to be doing a lot of post-production in Photoshop. Dall-E doesn’t seem to count above 3 objects very well.

Dall-E can’t read or write

It might seem surprising that Dall-E can’t render text. After all, Dall-E’s creator, OpenAI, also created the GPT-3 generative AI, which writes incredible human-readable prose.

But Dall-E is a very different beast.

While working on the Tarot card project I described above, I had to add 100% of the card-name text using Photoshop. It wasn’t something Dall-E could handle.

If you ask for images involving a book or a street sign, expect plenty of garbled, illegible text along the way.

TIP: If you’re creating images that use text, such as street signs, book covers or posters, plan on doing all of your text-writing in post-production using Photoshop or another image editor.
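If you’d rather script that post-production step than open Photoshop every time, a library like Pillow can stamp the text on for you. Here’s a minimal sketch — the blank canvas stands in for a downloaded Dall-E image, and the card name and filename are just placeholders:

```python
from PIL import Image, ImageDraw

# Stand-in for a Dall-E output; in practice you'd Image.open()
# the PNG you downloaded instead of creating a blank canvas.
card = Image.new("RGB", (512, 512), "black")
draw = ImageDraw.Draw(card)

# Stamp the card name near the bottom edge, where Dall-E's
# garbled pseudo-text would otherwise go.
label = "SEVEN OF SWORDS"
draw.text((170, 470), label, fill="white")

card.save("seven_of_swords_labeled.png")
```

For anything beyond a plain label (custom fonts, outlines, drop shadows), Pillow’s `ImageFont.truetype` lets you load a real typeface, but for batch-labeling a deck of 56 cards even the default font saves a lot of clicking.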

Think in broad strokes. Don’t add too much detail.

Dall-E seems to handle 3 or 4 details in an image relatively well. But if you keep on adding additional details to your descriptive sentence, it looks like Dall-E just randomly starts to ignore some of them.

I find it helpful to reduce my instructions to Dall-E to just a few general features. Dall-E will often add details on its own. But it’s very difficult to force Dall-E to include more than a couple of additional features within a composition.

TIP: When describing images, keep the number of details to 3 or 4 for best results. Any more than that, and Dall-E will often ignore your request.
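To keep myself honest about that limit, I use a tiny helper that caps the number of detail phrases before they ever reach Dall-E. The four-detail cap is my own rule of thumb, not anything OpenAI documents:

```python
def capped_prompt(subject: str, details: list[str], max_details: int = 4) -> str:
    """Join a subject with at most `max_details` detail phrases.

    Anything past the cap tends to get silently ignored by Dall-E
    anyway, so it's better to drop it deliberately.
    """
    kept = details[:max_details]
    return ", ".join([subject, *kept])

prompt = capped_prompt(
    "a woodblock print of a fox in a forest",
    ["wearing a scarf", "carrying a lantern", "in falling snow",
     "beside a stone shrine", "with fireflies overhead"],  # 5th detail dropped
)
```

If a dropped detail really matters, it’s usually better to promote it into the subject itself than to tack it on as a fifth clause.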

Some General Dall-E Tips and Tricks

Lastly, here are some general tips and quirky tricks that have helped me coax some truly incredible results out of Dall-E over the past couple of months.

  1. Go with the flow. Dall-E never does exactly what you want it to do. Part of the joy (and frustration) of Dall-E is that it does its own wonderfully weird thing. It can be helpful not to fight it. Let it surprise you, and keep an open mind to what appears before you. It’s often better to use a slightly mismatched image that looks incredible, rather than a poor rendition of the image you were hoping for.

  2. Use Variations. If multiple attempts are getting you nowhere, and then suddenly you get a promising image, remember to use the Variations function. Using Variations can help Dall-E iterate in the right direction. It may take a few rounds of creating similar images, but often this process can help you jump from an image that’s “close” to perfect, to an image that’s spot-on.

  3. Remember to specify the medium. It can be very helpful to say “3d render”, “oil painting”, “sepia photo”, “woodblock print”, “pen & ink drawing”, “cartoon” or “watercolor”. Also, keep in mind that some images just seem to work better in different mediums for whatever reason. If something isn’t working as a “photo”, try “digital art” or “3d render”. You’ll often get superior results.

  4. Try using artists’ names: You can also try using names of artists like “in the style of Rembrandt”, or “in the style of Picasso”. You can even try “in the style of Pixar” or “in the style of Disney”. Sometimes an image that just isn’t working in one style, will look amazing in the style of a different artist.

  5. Adjectives are often not as good as a better noun. For example, I was having a lot of trouble trying to create just the right female sword-fighting robot image seen at the top of this page. After the first few attempts, the images still weren’t looking all that great. I eventually figured out that using specific nouns from the Japanese sword-fighting martial art aikido worked better. Using specific terms like “bokken” and “aikidōgi” (two aikido-specific terms) got me much better results than using long chains of adjectives to describe what I wanted.
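Tips 3 and 4 boil down to appending medium and style modifiers to your base description, which is easy to automate if you drive Dall-E through OpenAI’s Images API instead of the dashboard. The sketch below is hedged: `build_prompt` is my own helper, and the `openai.Image.create` / `create_variation` calls (from OpenAI’s Python library) are shown commented out because they require an API key and spend credits:

```python
def build_prompt(subject: str, medium: str = "", style: str = "") -> str:
    """Compose a Dall-E prompt from a subject plus an optional
    medium ("3d render", "oil painting", ...) and artist style."""
    parts = [subject]
    if medium:
        parts.insert(0, f"{medium} of")  # e.g. "3d render of a ..."
    if style:
        parts.append(f"in the style of {style}")
    return " ".join(parts)

# Sweep a few mediums for the same subject, per tip 3 above.
subject = "a robot holding a bokken on a rooftop"
prompts = [build_prompt(subject, medium=m)
           for m in ("3d render", "oil painting", "watercolor")]

# Hypothetical API usage (needs OPENAI_API_KEY set):
# import openai
# result = openai.Image.create(prompt=prompts[0], n=4, size="1024x1024")
# ...and once one image is "close", iterate on it per tip 2:
# variations = openai.Image.create_variation(
#     image=open("close_but_not_perfect.png", "rb"), n=4, size="1024x1024")
```

Sweeping the same subject across several mediums and styles in one batch is often cheaper, in both credits and patience, than re-typing near-identical prompts into the dashboard.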

Happy creating!


Written by EnterpriseSEO | I like to make beautiful things with code & graphics. I work with AI & SEO. I live somewhere in Asia.
Published by HackerNoon on 2022/08/01