Introduction
I’ve been playing with AI art generators more than with photography lately, and they’ve improved greatly in just a few months. AI can now produce images with amazing detail and good composition, which I wouldn’t have thought possible yet, and still, so many of the images have flaws.
If you haven’t tried one yet but want to, the easiest way is to use one of the many online services. Midjourney is one of the more popular paid services, but some are free. If you want to install and run Stable Diffusion on your own computer, first make sure you have a decent Nvidia gaming card. For beginners, I’d start with this easy-to-use UI: https://github.com/cmdr2/stable-diffusion-ui. Even experts may want to stick with it for its simplicity, though other UIs offer more features; the most popular may be Automatic1111’s: https://github.com/AUTOMATIC1111/stable-diffusion-webui.
For this discussion, I’ll be using the Cmdr2 version of Stable Diffusion.
Generating images
The idea with the new AI art generators is that you describe the image you want, and the AI generates it. That sounds simple enough, but you can only describe so much detail, so the AI is going to bring its own assumptions and ideas. I look at it as a cooperative effort. Note that Stable Diffusion prompts are pretty much limited to 75 tokens, where a word is composed of one or more tokens.
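If you’re curious how your prompt measures up against that limit, you can count the tokens yourself. Here’s a minimal sketch using the Hugging Face transformers package and the CLIP tokenizer that Stable Diffusion 1.x models use (the package and model name are my assumptions; the UI normally handles this for you):

```python
# A minimal sketch: counting prompt tokens with the CLIP tokenizer
# used by Stable Diffusion 1.x. Assumes the Hugging Face
# "transformers" package is installed.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "funny cat, anime, illustration"
token_ids = tokenizer(prompt)["input_ids"]

# Two of the IDs are begin/end markers, so the usable budget is 75.
print(f"{len(token_ids) - 2} of 75 tokens used")
```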
In this example, I ask the AI for a “funny cat”, adding keywords to give some direction on style (such as “anime”, “illustration”, and a few painters’ names). I kind of like the overall look of the result – the cat with the raised paw, the colors, and the composition.
Looking closer, though, there are some odd things about it. Parts of the cat look nice, but there are also some serious problems: there’s some flesh color on the cat’s chin, and what could be the left paw is a big mess of fluff, for example. Let’s ignore the background mess for now.
Working with AI
Most Stable Diffusion models are trained on source images of 512 x 512 pixels, so when you generate an image, you’ll want to stay close to those dimensions. If you start out too large, the AI will “fill in” the extra space with duplicate subjects or oddly stretched objects, rather than render a more detailed subject. In this case, I started with a size of 512 x 704.
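If you’d rather script this step than use a UI, here’s roughly what that first generation looks like with the Hugging Face diffusers library. This is a sketch only; the model name, settings, and file names are my assumptions, not what the cmdr2 UI does internally:

```python
# A sketch: a first txt2img pass at 512 x 704, staying close to the
# model's 512 x 512 training size. Assumes the "diffusers" package,
# a CUDA GPU, and an (assumed) Stable Diffusion 1.5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="funny cat, anime, illustration",  # plus a few painters' names for style
    width=512,   # dimensions must be multiples of 8
    height=704,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("funny_cat_512x704.png")
```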
To get more detail, you feed this image back to the AI (a.k.a. img2img), telling it to generate at a higher resolution. Generating the new image this way will often fix flaws while adding detail. It can add flaws as well, so multiple attempts may be necessary. The amount of change you allow is controlled by the “prompt strength”. I use 0.15 if I want to preserve most of the original image, and up to 0.35 if I want the AI to make some changes but still keep the same basic image structure.
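In scripted form (a sketch, under the same assumptions as before), an img2img pass looks something like this; note that the diffusers library calls the prompt-strength setting `strength`:

```python
# A sketch: feeding the generated image back through img2img at a
# slightly higher resolution. File names are placeholders.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("funny_cat_512x704.png").resize((576, 768))
image = img2img(
    prompt="funny cat, anime, illustration",
    image=init,
    strength=0.15,  # 0.15 preserves most of the original; up to 0.35 allows more change
    num_inference_steps=50,
).images[0]
image.save("funny_cat_576x768.png")
```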
So what’s the plan? Scale up the image to see if the AI will fix things sufficiently, and if not, manually help it by painting in what you want.
The first thing you might try is to see if the AI will fix some of the flaws on its own, given another try with a fresh random seed. This can simply be part of scaling up to a slightly higher resolution. In this case, though, the things I want changed are too severe for minor modifications. We’re going to have to help the AI.
Starting with the original image (using img2img), I used the “inpaint” tool to mask the main things that I wanted to change.
I put a heavy mask over the left arm – I want to redo this entirely. I put less mask on the feet; let’s see how far this takes us. I generated two images with a high prompt strength (0.97) and picked the better of the two.
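Scripted, the equivalent step uses an inpainting pipeline plus a mask image, where white marks the areas to repaint. This is a sketch; the dedicated inpainting checkpoint and the file names are my assumptions, and different UIs inpaint in different ways:

```python
# A sketch: inpainting the masked areas with a high prompt strength
# so they get redone almost entirely.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("funny_cat_512x704.png")
mask = Image.open("mask_arm_and_feet.png")  # white = repaint, black = keep

result = inpaint(
    prompt="funny cat, anime, illustration",
    image=image,
    mask_image=mask,
    width=512,
    height=704,
    strength=0.97,  # high: replace the masked areas almost completely
    num_inference_steps=50,
).images[0]
result.save("funny_cat_inpainted.png")
```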
The result seemed much improved, if still imperfect. To scale up, I increased the resolution while maintaining the same ratio. The next step-up is 576 x 768. I increased further and generated again, but flaws remained. The AI needs more help!
I returned to the earlier step, but this time, before inpainting, I went to the draw tool.
I painted a pupil in the right eye, painted in a better right foot, painted in more of the belly, adjusted the tail, and removed some of the excess fluff. You can see that my edits are rough – we’ll let the AI finish them. The important part is to get the basic shapes the way you want them, using the appropriate colors. The AI will pick up on the shapes and colors to make its modifications, so any help we can give yields a better result.
Now, we just need more inpainting to clean up those areas. Set the resolution to the size of our source image, and select Inpaint.
Paint the mask over areas that you want the AI to change; places not masked will be left unchanged.
Set the prompt strength depending on what you’re trying to do. In this case, I want to keep close to the changes that I painted in, so I used 0.2. If I wanted something changed completely or erased, I’d use a high prompt strength, as I did earlier.
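Reusing the inpainting sketch from earlier, the only real difference at this step is the low strength, which keeps the painted-in shapes and colors while letting the AI refine them (file names are placeholders):

```python
# A sketch, continuing from the inpainting pipeline above: a
# low-strength pass that refines the hand-painted areas rather
# than replacing them.
result = inpaint(
    prompt="funny cat, anime, illustration",
    image=Image.open("funny_cat_painted.png"),        # the roughly painted-over version
    mask_image=Image.open("mask_painted_areas.png"),  # mask only the painted areas
    width=576,
    height=768,
    strength=0.2,  # low: stay close to the painted-in changes
    num_inference_steps=50,
).images[0]
result.save("funny_cat_refined.png")
```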
You may need to try a couple of times to get a result you’re happy with. Once we have a good result, we’re still left with a relatively low-resolution 576 x 768 image. We can use img2img to continue resizing larger. As part of this process, I also increase the steps, to bring out more detail. Be sure to turn off the mask, as you want the entire image to be modified in the img2img process when scaling up the resolution, and set the resolution to something very close to the same ratio as the source image. There are plugins that can make this process much easier, even down to a single click.

As you go to each next size up, the AI will fill in additional detail, but sometimes it will change objects completely, depending on your prompt strength (and your model – some are more prone to changes than others). With a starting point of 512 x 704, our next steps in resolution are 576 x 768, 704 x 960, and 960 x 1280. This is where I run into the limit of my GPU’s 8GB of VRAM. If you want a larger size for printing, you’ll have to use an upscaling tool, but it won’t add detail the way img2img does.
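As a sketch, the whole stepping-up process is just a loop over an img2img pipeline. The resolutions come from the text above; the strength and step counts are my own assumptions:

```python
# A sketch: progressively upscaling through img2img, keeping close
# to the original 512 x 704 aspect ratio.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = Image.open("funny_cat_refined.png")
for w, h in [(576, 768), (704, 960), (960, 1280)]:  # each a multiple of 8
    image = img2img(
        prompt="funny cat, anime, illustration",
        image=image.resize((w, h)),
        strength=0.15,           # low, so each pass adds detail without restructuring
        num_inference_steps=60,  # more steps for finer detail as the image grows
    ).images[0]

image.save("funny_cat_960x1280.png")
```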
Result
With some suggestions to the AI, the result is much better than the original. Sizing up in increments also allows the AI to change and add detail. It also changed some of the odd background items to other odd background items, but I think this, too, is an improvement.
This is a good demonstration of why AI art is not just press-a-button, if you want the best results. One option is to generate a large number of images and pick out only the least flawed, but chances are that many of the most interesting ones need help. Hands, in particular, are usually flawed. Occasionally, you can get something great with no modifications needed, but you’ll still want to use img2img to increase the resolution.
Is it worth it? Manual edits take a lot of time, but if it’s an image that inspires you, why not? It’s not nearly as time-consuming as painting from scratch, so in that respect it’s a positive; the negative is that you could spend that time generating another dozen new images to choose from. Or a hundred. However, I think this is a fun activity and a creative process, one that reminds me more of photography than anything else. With photography, you often don’t have total control of your original image, and there is a lot you can do manually, editing after the photo has been taken. With AI, it’s a fun game between seeing what the AI comes up with and encouraging it to bend to your intended style.