Demonstrating that AI art is art & that AI artists are artists

Introduction

There are a lot of arguments about whether or not AI artists are truly artists, and they stem from a deeper discussion about whether AI art is truly art. In this post I'm going to aim to convince you not only that AI art is art, but also that the people involved in creating that art are artists.

This will be a long post, so get a coffee and hopefully it's informative or at least entertaining.

Defining our terms

Defining 'artist'

The Cambridge dictionary has three definitions for the word 'artist':

someone who paints, draws, or makes sculptures
someone who performs music
someone who creates things with great skill and imagination

I'm going to be working with the third definition, which is mirrored in the first definition in Merriam-Webster:

a person who creates art (such as painting, sculpture, music, or writing) using conscious skill and creative imagination

If we distil these definitions, there are two requirements for someone to be considered an artist:

Skill
Imagination

Defining 'art'

Now, if we look at definitions for the word 'art' (noun), Cambridge provides us with these definitions:

the making of objects, images, music, etc. that are beautiful or that express feelings
the activity of painting, drawing, and making sculpture
paintings, drawings, and sculptures
an activity through which people express particular ideas

It's interesting that most of these define the activity of making art rather than the works that activity produces; only the third describes the objects themselves.

Merriam-Webster defines 'art' (noun) more fluently in my opinion in its fourth definition:

the conscious use of skill and creative imagination especially in the production of aesthetic objects
also: works so produced

The case to be made

Given these definitions, I think it's fair to summarise that:

An artist needs to have used creative imagination in the process
An artist needs to use skills which they have developed to produce aesthetic works
The works resulting from that combination of imagination, expression and skill are art

I don't think that most people would insist that one needs to be skilled at illustration or painting to be an artist. To do so would be to say that photographers, musicians, directors, composers and authors aren't artists.

I will endeavor to demonstrate that skill is involved, but I'm not interested in trying to demonstrate that those specific skills are involved. If the argument is that AI artists are not illustrators or painters, then there is no argument.

I use Stable Diffusion as a tool, and I'll be using it to make my case here. In most examples I will generate eight images with the same parameters and pick the one I consider to be best. Because I have a fairly low-end GPU (3070 Mobile), I will generate fewer images on the more demanding applications using ControlNet.

Demonstrating creative imagination

I think that creative imagination is the easiest point to demonstrate, but for the sake of completeness, I'm going to start at the beginning.

Defining creativity

It may seem pointless to try to define 'creativity', because we all know what it is, but that doesn't make it easy to put it into words.

The Cambridge dictionary describes creativity as "the ability to produce or use original and unusual ideas", and Merriam-Webster describes it as "the ability to create" or "the quality of being creative", which in turn is defined as "having the quality of something created rather than imitated (imaginative)".

To meet this definition, we should be able to demonstrate the following:

It is possible for an AI artist to create something original (i.e. not imitated).
It is possible for an AI artist to create something unusual (i.e. not commonly seen in real life).

Some would (wrongly, I believe) argue that the tool is the creator, but this argument could also be made for a camera, or indeed a drawing tablet or a pencil. In most cases, to create art a human needs to use some form of tool. A director needs actors and a film studio, a conductor needs an orchestra, a musician needs an instrument or a DAW. In all of these cases, as with AI art, the tool doesn't create anything without human input.

Demonstrating originality

We can show that AI can create original concepts pretty easily; we just need to show that the output can be controlled to generate something new. This can be done with a simple prompt.

Model: IvoryV2, Prompt: I forgot the exact prompt but it was something like... (masterpiece, best quality, detailed, detailed face), a painting of an old farmer holding to a tiny blue dragon, conversation, sitting on a bench, in the mountains, concept art; Negative: (worst quality, low quality), EasyNegative; Sampler: DPM 2M Karras; Steps: 100; CFG Scale: 7.

There are several fair arguments to make about this image -- I didn't control the fine details at all, I didn't specify that I wanted a tree, the bench is broken, the hand holding the dragon is broken etc -- which I will address later when I talk about skill & control, but the goal was to demonstrate originality and it is an original picture. Reverse image searching the picture shows nothing particularly similar.

Demonstrating unusual-ness

It would be fair to say that the image above is unusual, but let's go way out there and imagine an entirely different world: a food hall in an ultra-clean space station with metallic walls and floors, filled with greenery.

Model: SciFi Diffusion V10; Prompt: (masterpiece, best quality, detailed, detailed), a photo inside a space station, restaurants, food court, bustling, metal walls, plants, greenery, busy, people, vendor stalls, marketplace; Negative: (worst quality, low quality), EasyNegative; Sampler: Euler a; Sampling steps: 35; CFG Scale: 7.

I think it's fair to say that this is unusual. The juxtaposition of end game capitalism (space market) with all that greenery isn't a normal commentary on the future.

If creativity is the art of creating something original or unusual, I would say that I've done that, but that's the easier of the two points to prove.

Demonstrating skill

I've seen a lot of poor argumentation on this topic from both sides.

On the pro-AI side, arguing that something takes skill because it takes time is a poor argument. Waiting for the paint to dry in my kitchen takes time, but waiting for paint to dry is not a skill. I will therefore not be arguing that time spent waiting equates to skill.

Similarly, I've seen the argument that "this took a lot of effort" bandied around a lot, but effort is not skill. Usually having greater skill reduces, not increases, required effort. I will therefore similarly not be arguing that working hard equates to skill.

On the anti-AI side, those who argue that prompting can't be a skill because "it's just writing what you want in a box" fail to acknowledge that creative writing itself is a skill, and that prompting has a whole bunch of factors which can be learned in order to improve image generation.

I intend to go much deeper than prompting in this post, but I reject the idea that prompting isn't a skill. It may not have a particularly high skill ceiling but those who practice it will achieve better results than those who don't.

Defining the challenge

Merriam-Webster defines skill as follows:

the ability to use one's knowledge effectively and readily in execution or performance
a learned power of doing something competently : a developed aptitude or ability

The Cambridge dictionary has a more basic definition:

an ability to do an activity or job well, especially because you have practised it

In order to demonstrate that skill is involved in creating AI art, I need to be able to show the following things are true:

There are aspects of creating AI art, which when practised and studied, directly result in a higher quality image.
There are aspects of creating AI art, which when practised and studied, directly result in the artist having greater control over the final image.

Skill involved in prompting

Let's talk about prompting. A lot of people think prompting isn't a skill, and while I can't think of any time that I've been able to get what I want from just prompting, I've definitely observed the quality of my generations improving from more practice with prompting.

Let's say that we want to generate an image of a post-apocalyptic city overrun by nature, similar to the effects seen in The Last of Us or parts of the Fallout series.

Let's see what we can get with a very basic prompt:

Model: IvoryV2, Prompt: (masterpiece, best quality, detailed), a post apocalyptic city overrun by nature; Negative: (worst quality, low quality), EasyNegative; Sampler: Euler a; Sampling steps: 35; CFG Scale: 7

This isn't a terrible example, but some of the generations didn't have a lot of nature involved at all, and one had a mushroom cloud, which is more apocalyptic than post-apocalyptic. Let's see if we can get more control over this image with prompting.

Let's say we want a sweeping shot showing more of the city, as if taken by a drone. We want more foliage, to put the apocalyptic event much further in history, and we want fungus taller than the buildings to be visible in the city, giving off a yellow mist of spores.

Model: IvoryV2, Prompt: (masterpiece, best quality, detailed), a photo of a post-apocalyptic city, foliage, moss growing on buildings, vines growing on buildings, cinematic, drone footage, from above, huge colorful mushrooms, sunrise, (yellow mist:1.4) in the streets, realistic lighting, broken windows, [colorful mushrooms :0.1], contemporary, [city | jungle]; Negative: (worst quality, low quality), EasyNegative, futuristic, sci fi; Sampler: Euler a; Sampling steps: 35; CFG Scale: 7.

I'm not the most skilled prompter in the world, and you could probably debate whether you prefer the first or the second image, but I think it's clear that by using a few prompting tricks I achieved far greater control over this image than I had over the first.

Some of the more advanced prompting used here includes:

Weighting: getting a yellow mist was very challenging, but using a higher weight actually made it stick without turning all the buildings yellow;
Delay: by delaying 'colorful mushrooms' to step 4, I was able to avoid the buildings being colourful as well;
Mixing: the mix of city|jungle created a far more natural effect with the foliage;
Concept bleeding: the AI wanted to merge the buildings with the mushrooms, so I had to negative prompt 'futuristic' and 'sci fi' to avoid those strangely shaped buildings.
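For anyone who hasn't seen this syntax before, here's a rough sketch of how the first three tricks can be read. It is a toy model, not the real parser of any UI: the function names are made up for illustration, and the step arithmetic assumes that a fractional delay is multiplied by the total step count and rounded down, and that a `[a | b]` mix swaps option on every step.

```python
import re

# Hypothetical helpers for illustration only; not taken from any real UI's parser.


def parse_weight(token: str) -> tuple[str, float]:
    """Split '(yellow mist:1.4)' into ('yellow mist', 1.4).

    Plain text without an explicit weight is treated as weight 1.0.
    """
    match = re.fullmatch(r"\(\s*(.+?)\s*:\s*([0-9.]+)\s*\)", token.strip())
    if match:
        return match.group(1), float(match.group(2))
    return token.strip(), 1.0


def delayed_start_step(fraction: float, total_steps: int) -> int:
    """First 1-based step at which a '[phrase:fraction]' entry is active.

    Assumption: the switch point is fraction * total_steps, rounded down,
    and the phrase joins the prompt on the step after that.
    """
    return int(fraction * total_steps) + 1


def alternating_term(options: list[str], step: int) -> str:
    """Option used at a 1-based step for '[a | b]'.

    Assumption: the options rotate, one per sampling step.
    """
    return options[(step - 1) % len(options)]


if __name__ == "__main__":
    print(parse_weight("(yellow mist:1.4)"))
    print(delayed_start_step(0.1, 35))
    print([alternating_term(["city", "jungle"], s) for s in range(1, 5)])
```

Under those assumptions, `[colorful mushrooms :0.1]` with 35 sampling steps gives 0.1 × 35 = 3.5, rounded down to 3, so the mushrooms only enter the prompt from step 4, after the broad shapes and colours of the buildings have started to settle.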

I am in no way claiming that prompting is as difficult as learning to draw, only that it is a skill which can be improved to get closer to what you want.

Demonstrating deeper control over images

Prompting alone is a great way to generate ideas, brainstorm, and reach starting points... but what if you want something really specific? Prompting, no matter how well it's done, is not a consistent way to get anything from your mind's eye onto the screen.

Using img2img & basic outlining

So how can we get more control? Let's start by setting a scene using img2img. We want this image set in a beautiful lagoon, hanging vines, waterfall etc. We could try prompting for the exact scene that we're imagining but we're unlikely to get it.

The next step is to open up Photoshop (well, in my case, GIMP since I don't want to pay for Photoshop) and either draw or photobash (or both) the scene we're imagining.

Quick sketch of what I want the scene to look like made with a mouse in GIMP.

For those who can't tell anything from this: I'm looking for a rock face with two caves, a waterfall between the caves, and some foliage in front of and behind the rocks. I'm specifically looking for foliage on the right and over the top to provide a 'cozy' look by vignetting the entire image.

I fed this into img2img with a denoising strength of 0.85, essentially just keeping the basic colors and letting the sampler work with them to generate from the prompt. This is what I got back:

Model: IvoryV2, Prompt: (masterpiece, best quality, photo background, detailed, photorealistic), a photo of a lagoon, jungle, saltwater lake, (caves :1.2), waterfall, sand, realistic lighting; Negative: (worst quality, low quality), tree, EasyNegative, inside cave; Sampler: DPM 3M Exponential; Sampling steps: 150; CFG Scale: 7.

Not bad. The second one is definitely the closest to what I was going for, but...
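To put a number on what a denoising strength of 0.85 means, here's a minimal sketch. It assumes the tool scales the requested sampling steps by the denoising strength, which is a common img2img convention but an assumption here, since the setting can behave differently depending on the UI and its options. The function name is made up for illustration.

```python
def img2img_steps(total_steps: int, denoising_strength: float) -> int:
    """Number of sampling steps actually run on top of the input image.

    Assumption: the step count is scaled by the denoising strength and
    rounded down, so 1.0 behaves like generating from scratch and 0.0
    leaves the input image untouched.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising strength must be between 0 and 1")
    return min(int(total_steps * denoising_strength), total_steps)


if __name__ == "__main__":
    # Settings from the lagoon example: 150 sampling steps at strength 0.85.
    print(img2img_steps(150, 0.85))
```

With the lagoon settings that works out to 127 of the 150 steps, which is why only the rough colours and layout of the sketch survive.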

Using InPainting

I didn't ask for that many caves, or those rocks in the water, so now we go back to GIMP and work on it some more:

Re-edited in GIMP.

Touched up, re-adding the pillar between the caves, removing the water rocks and thickening part of the waterfall.

This time, we go to the inpaint tab. We don't want to change the entire image, so we're going to tell Stable Diffusion to only repaint the parts that we've changed (leaving plenty of space around those areas so that we don't get weird artifacts). We'll turn the denoising strength down to 0.5 because we don't want to lose shapes, only add surface details. Here's what we get from this:

After inpainting, same settings as above.

Obviously, you could do more steps of inpainting to remove or add anything you want, and tweak the image as much as you want, but the point of this is only to demonstrate compositional control, and we've achieved that.
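The "only repaint what changed, with space around it" idea can be sketched as a mask built from the difference between the two versions of the image. This is a toy version on small grids of pixel values, not what the inpaint tab does internally (there the mask is painted by hand), and the function names are hypothetical.

```python
Grid = list[list[int]]
Mask = list[list[bool]]


def changed_mask(before: Grid, after: Grid) -> Mask:
    """Mark every pixel whose value differs between the two images."""
    return [
        [b != a for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def pad_mask(mask: Mask, padding: int) -> Mask:
    """Grow the mask by `padding` pixels in every direction.

    The extra margin gives the model room to blend the repainted area
    into its surroundings instead of leaving a hard seam.
    """
    if not mask:
        return []
    height, width = len(mask), len(mask[0])
    padded = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Mark the square neighbourhood around each changed pixel,
            # clamped to the image borders.
            for ny in range(max(0, y - padding), min(height, y + padding + 1)):
                for nx in range(max(0, x - padding), min(width, x + padding + 1)):
                    padded[ny][nx] = True
    return padded


if __name__ == "__main__":
    before = [[0] * 5 for _ in range(5)]
    after = [row[:] for row in before]
    after[2][2] = 9  # pretend we touched up one pixel in GIMP
    for row in pad_mask(changed_mask(before, after), 1):
        print("".join("#" if cell else "." for cell in row))
```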

Adding characters to the scene using ControlNet

We could attempt to prompt characters into the scene with inpainting, but it's very unlikely that we'd get the characters we want in the pose that we want. Enter ControlNet.

I used a pre-made pose on posemy.art for this (normally I'd pose them myself to get full control of the composition, but this post has taken a few hours to write and I'd like to get the dog out for a walk), and exported depth maps & openpose references:

OpenPose map

Depth map

I then inpainted a large area around where the characters are on the posemy.art references, as well as the leaf in the foreground, because it would overlap the characters (and honestly I really don't like how it looks). I set the denoising strength back up to 0.78 to regenerate the area wholesale, made a simple update to the prompt, and here are the results:

Model: IvoryV2, Prompt: (masterpiece, best quality, photo background, detailed, photorealistic), a photo of a lagoon, a man holding a woman in the air, jungle, saltwater lake, (caves :1.2), waterfall, sand, realistic lighting; Negative: (worst quality, low quality), tree, EasyNegative, inside cave, moss; Sampler: DPM 3M Exponential; Sampling steps: 150; CFG Scale: 7.

Messy, but the second one is looking pretty promising. We'll need to re-add our vignette using inpainting, but that's going to be easier than rolling for the right pose over and over.

What if we want fine-grained control over the characters though? There are two options. If we want to use well-defined existing characters, we can use LORAs (along with the Composable Lora plugin). For this example, though, we're not looking to use any specific characters, so we'll take the second option: Latent Couple.

Demonstrating control of characters using Latent Couple

An artist doesn't want the AI deciding who the characters in their picture are, so let's be a lot more specific using the Latent Couple plugin. We're going to have a white guy in Speedos and a Hispanic woman in a bikini.

You could be a lot more specific if you wanted to (or use LORAs) but this is enough detail to illustrate the point (and my laptop GPU really doesn't love using 2 ControlNet tabs & Latent Couple at the same time). Set the denoising strength down to 0.58 and this is what we get:

Model: IvoryV2, Prompt: (masterpiece, best quality, photo background, detailed, photorealistic), a photo of a lagoon, a man holding a woman in the air, jungle, saltwater lake, (caves :1.2), waterfall, sand, realistic lighting AND a photo of a white man, human, caucasian, short hair, wearing blue speedos AND a photo of a hispanic woman, human, hispanic, long black hair, wearing a (bikini :1.3); Negative: (worst quality, low quality), tree, EasyNegative, inside cave, moss; Sampler: DPM 3M Exponential; Sampling steps: 150; CFG Scale: 7.

That didn't work out great! It added an extra head in most of the pics (my depth map is too small to be read well), and ignored half of the prompts, so I went back into GIMP, manually tidied some stuff, played with ControlNet weights, brushed back in my old vignette leaf and did a couple more rounds of inpainting to get a result I was happier with:

Same settings as above.

The people are still far from perfect. If I had better Photoshop skills, this would be the time when I'd clean them up too. Alas, another skill which would benefit my AI art is... digital painting. And that's really my point: while I have some skills and some understanding of the process, my work is still amateur and has a lot of areas that need improvement.
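For readers unfamiliar with the `AND` syntax in the Latent Couple prompt above, the idea is that each `AND`-separated chunk is its own sub-prompt, and each sub-prompt is applied to its own region of the image. A minimal sketch of that pairing follows; the region labels and function names are hypothetical, and the actual region layout is configured in the plugin and isn't given in this post.

```python
def split_subprompts(prompt: str) -> list[str]:
    """Split a prompt into its AND-separated sub-prompts."""
    return [part.strip() for part in prompt.split(" AND ") if part.strip()]


def assign_regions(prompt: str, regions: list[str]) -> dict[str, str]:
    """Pair each sub-prompt with a region label, in order.

    The labels are placeholders; a real setup would describe each region
    by its position and size in the image.
    """
    subprompts = split_subprompts(prompt)
    if len(subprompts) != len(regions):
        raise ValueError(
            f"{len(subprompts)} sub-prompts but {len(regions)} regions"
        )
    return dict(zip(regions, subprompts))


if __name__ == "__main__":
    # Shortened version of the prompt used above.
    prompt = (
        "a photo of a lagoon, a man holding a woman in the air "
        "AND a photo of a white man, wearing blue speedos "
        "AND a photo of a hispanic woman, wearing a (bikini :1.3)"
    )
    for region, text in assign_regions(
        prompt, ["background", "character 1", "character 2"]
    ).items():
        print(f"{region}: {text}")
```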

Skills needed to succeed with Stable Diffusion

So, what skills have we demonstrated?

We've demonstrated that good prompting allows for more control over an image, but that prompting alone will only take you so far.
We've demonstrated that just like other forms of art, an understanding of good composition is important to create a good final image. This means that skills like depth perception, colour theory and composition theory are still relevant to AI artists.
We've demonstrated that being able to draw/paint to at least a minimal degree enables far better control of how the images look. The better you can draw, the better results you'll be able to get out of SD -- if I could draw a half-decent lagoon scene, I wouldn't have to rely so much on denoising & randomness.
We've demonstrated that a good ability to pose (and make) 3D models enables greater control over generated images. There are a whole bunch of skills around training & improving LORAs that I haven't touched on.
We've demonstrated that I lack the skills necessary to get the best out of Latent Couple, and that someone would need more skill than I have to get quality character designs without LORAs (which makes sense because I usually use LORAs).

In short, there are skills which can be practised and studied which directly lead to better images. AI art is demonstrably a skill-based and creative activity.

In summary

We have demonstrated that originality and unusualness are possible, that Stable Diffusion responds to your creativity, and that understanding its various components and plugins is necessary to get good results with it.

If we define (as we did at the start) art as the intersection of human creativity and human skill, then AI artists are artists, and AI art is art.

If we try to change the definitions to require the skill specifically to include drawing or painting, then digital artists, photographers, sculptors, and musicians aren't artists.

It's fair to say that AI artists aren't illustrators, or painters, or photographers. It's fair to say that some of the skills involved are easier to learn than others (for example, digital painting with an undo button is much easier than traditional painting where you have to improvise when you mess up).

There are differences between the arts, but that doesn't make some of them arts and others not.

Author

realechelon

Subreddit

r/aiwars

Posted: 10 months ago