r/StableDiffusion
Comparisons of various "photorealism" prompts

Comparison
u/fivecanal

I'm dumb. I really can't see any differences in terms of photorealism. They all look pretty realistic to me.

u/Purplekeyboard

That's actually what you should take from this. Most of these tags are placebo tags; they don't really do anything.

u/frequenZphaZe

it'll depend a lot on the model. different tokens will have different strengths and different effects across different models. a lot of models perform well in their niche because they're heavily tuned for it, which can cause finer tokens to get pushed out of relevancy.

when doing a grid comparison like this, it's usually helpful to include various models for comparison as well. that helps to clarify whether specific tokens are wholly irrelevant or if certain models are over-tuned and ignore them more than others. since OP didn't do that here, all we can really conclude is that these tokens don't work well on his model.
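For the curious, a token-by-model grid like the one described is easy to script with diffusers. This is a minimal sketch, not OP's actual setup; the second model ID is a placeholder for whichever checkpoints you want to compare.

```python
# Sketch of a token x model comparison grid (assumes diffusers is installed;
# the second model ID is a placeholder for any SDXL checkpoint under test).
import torch
from diffusers import StableDiffusionXLPipeline

MODELS = ["stabilityai/stable-diffusion-xl-base-1.0",
          "some-org/some-photoreal-xl"]            # placeholder checkpoint
TOKENS = ["", "photo", "RAW photo", "Fujifilm XT3", "masterpiece, best quality"]
BASE, SEED = "woman on the street", 2291976425     # fixed seed so only tokens vary

for model_id in MODELS:
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16).to("cuda")
    for token in TOKENS:
        prompt = f"{BASE}, {token}" if token else BASE
        image = pipe(prompt, num_inference_steps=25, guidance_scale=3.0,
                     generator=torch.Generator("cuda").manual_seed(SEED)).images[0]
        image.save(f"{model_id.split('/')[-1]}_{token.replace(' ', '_') or 'blank'}.png")
```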

This is so true... and not just the model either. The rest of the prompt can change things up as well.

I did keep the rest of the prompt very short to allow the tokens to have the greatest effect, but even in testing, the camera tokens especially were overriding other tokens.


Agreed, there is plenty of exploring to do in this space. Different models will behave very differently with each prompt depending on what tokens they were trained with.

u/Fortyseven

You know how you can stick nonsense like a semicolon (or whatever) in there, or something, and given the same seed you get a slightly different, but still very similar image?

It feels like a lot of folks confuse that non-meaningful volatility with actually useful prompting.

So much of this kind of thing is indistinguishable from tossing a nonsense word in. Just jostles the RNG a bit. ;)

u/raiffuvar

you just don't use enough tags. LOL.
try more... more, more...
but really, WTF is the point of comparing portrait mode?
every fucking model was trained on it. But if you prompt something more complex... good luck getting "photorealism" without placebo tags.

u/Comrade_Derpsky

That's because there isn't a difference. Having "photo" in the prompt is enough to make it do photorealism. Same with specifying a camera model. Really, any photography terminology should push it cleanly into photorealistic territory.

Another take-home here is that the model really doesn't understand f-stop values. A higher f-stop means a narrower aperture, which deepens the depth of field (fun fact: it's the same reason why squinting can bring things into focus). At f16, everything should be in very sharp focus with basically no visible blur or bokeh effect.
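To put rough numbers on that point, here is a back-of-envelope calculation using the standard thin-lens approximation (illustrative values, nothing the model itself computes):

```python
# Depth of field ≈ 2*N*c*s^2 / f^2 when the subject distance s is well inside
# the hyperfocal distance. Illustrative numbers: 50mm lens, subject at 2m,
# full-frame circle of confusion c = 0.03mm.
def dof_m(f_number, focal_mm=50.0, subject_mm=2000.0, coc_mm=0.03):
    return 2 * f_number * coc_mm * subject_mm**2 / focal_mm**2 / 1000

for n in (2, 5.6, 16):
    print(f"f/{n}: ~{dof_m(n):.2f} m in focus")
# f/2: ~0.19 m (strong background blur); f/16: ~1.54 m (mostly sharp)
```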

The ISO results actually make sense, since it's an outdoor photo during the day and there should be sufficient light for a good exposure regardless of how low the ISO speed is. This would mean that including ISO values in the prompt won't have a very clear effect, since they would be very inconsistent across training images. On a camera, I'd expect more motion blur for moving things in the background because of the longer exposure time, but that's not necessarily a given in a photograph.

None of the settings really make sense adjusted in a vacuum. Going from ISO 100 to 800 outside like that would drastically change the light level of the photo without also adjusting shutter speed and/or f-stop. As a parameter, I'm not sure what the goal with ISO was, unless the effect they were looking for was actually shutter speed? Shutter speed is what adjusts exposure time; those are different settings. Regardless, the ISO images equally strike me as nonsense terms here.

I guess in some cases the weight of other prompt words can overwhelm the single "photo" one. But perhaps then it's enough to add explicit weight to "photo" to avoid deprioritization during generation.

If further clarification is needed: if a hypothetical word "ABCD" is very strongly associated in a given model with a specific style of abstract painting, then adding "photo" and "realistic" might not be sufficient to give the expected results. Maybe it's not the best example, but I hope it at least explains the idea.

I was struggling with generating very fantastical combinations not present in any real photos. For example, a photo portrait entirely made of realistic leaves, or a room filled with trees.

But maybe I'm not skilled enough, and misunderstand something :). So I'll be happily corrected.

Yeah, I was quite struck by how little difference the f-stops made. I'd have thought there were enough images on the internet with their EXIF data included that it might have learned what those mean, but I guess not?

u/maxihash

It looks like the tag only changes the ISO of the image (lighting), and I have no idea what Fujifilm XT3 does there. Maybe it's the same tag as 'raw'.


So basically it doesn't do anything. Lol

u/MaverickJonesArt

yea my conclusion here is those words don't really matter haha

Words like "masterpiece" or shutter speed are too vague to do anything specific. If the training data of images tagged with "masterpiece" contain literally any and all kinds of images, what exactly are you telling the generator to show? You might as well put "good picture" in there. The only thing those words add is noise, which sometimes is good and sometimes bad. What you want are words that describe good things like "bokeh" or even "rule of thirds".

Almost everyone makes this mistake, but they'll get mad if you tell them.

When SD was still only on Discord I did the same test with a cat.

https://i.imgur.com/bIt5ORh.png was just a simple prompt "painting of a cat by lilia alvarado"

https://i.imgur.com/y7FdDBx.png adds "8 k, artstation". There's a space between 8 and k because the discord bot would add that.

You'll notice it removes the clown costume from the cat, making the image objectively worse. In multiple variations of the prompt "artstation" resulted in the clown costume always being removed.

u/Fortyseven

adds "8 k, artstation". There's a space between 8 and k because the discord bot would add that.

Dude is THAT why I see that so often!? That's been driving me nuts seeing that. :D

[Comment deleted by user]

[Comment deleted by user]

u/Comrade_Derpsky

"Masterpiece", "best quality", and "high quality" seem to be synonyms as far as stable diffusion is concerned. It will make the picture look prettier and more professional. "High quality" also tends to make images more coherent when generating at high resolutions e.g. 768x1024 which usually have a lot of wonky stuff because of the size.

u/ImJacksLackOfBeetus

yeah, f/2 having the same bokeh as f/16.
Seems legit. 👍

edit: This is the kind of difference one would expect with these f-stops.

Bro you think Stable Diffusion is a camera simulator or what?

u/ImJacksLackOfBeetus

Bro you think a post comparing keywords should have differences between those keywords?

Well yes, I do.

Maybe go and read an article about CLIP and actually try to understand how this model was trained, and then how it is used to guide Stable Diffusion.

u/xantub

It's what I always thought. I stopped using all those extra keywords some time ago, no noticeable difference to me.

It does "wet plate" makes it rain and wet hair

Yeah, because it has no idea what wet plate means, so it just takes the keyword "wet" and makes everything wet lol


It is subtle in what it does, and some stuff does more than others, but some also block others... it'll be a game of mix and match to find what you like

The "wet" in "wet plate" seems to have a much larger effect on the photo than the technique.

have you tried something like "(wet plate:1.3)" in the positive and "(wet:1.2), rain, water" in the negative? Sometimes I have to spend a while playing with the weights and the words/phrases, but I've occasionally gotten it to recognize more esoteric terminology by doing that.

note: I did not do that with the wet plate example so don't know if it works. I don't care that much.
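For anyone unfamiliar with that syntax: in A1111-style prompts, "(text:1.3)" scales the attention paid to those tokens during generation. Below is a toy reader for the convention, a sketch rather than the actual webui parser; it ignores nesting and [] de-emphasis.

```python
import re

def parse_weights(prompt):
    """Toy reader for A1111-style "(text:1.3)" emphasis; no nesting handled."""
    pairs = []
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)|\(([^()]+)\)|([^()]+)", prompt):
        if m.group(1):                        # (text:weight)
            pairs.append((m.group(1).strip(), float(m.group(2))))
        elif m.group(3):                      # bare (text) -> 1.1x by convention
            pairs.append((m.group(3).strip(), 1.1))
        elif m.group(4).strip(", "):          # plain text -> weight 1.0
            pairs.append((m.group(4).strip(", "), 1.0))
    return pairs

print(parse_weights("(wet plate:1.3), woman on the street"))
# -> [('wet plate', 1.3), ('woman on the street', 1.0)]
```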


Thanks for sharing the results 👍

Comment Image

Follow-up: here's one with some combination word salads for your perusal as well.

[deleted]

It's funny that in 'extreme realism' she looks like she slept 4 hours the whole week. I feel you sis

Truth!

[deleted]

Why Fujifilm XT3? I see it a lot. Can I do Sony A7 S3, Canon Rebel, etc and expect similar results?

Comment Image

Here you go.. answer: yes :)

Those look different, but not on the technical side, so it seems that the various camera names don't make an actual difference where they should.

The Hasselblad should have a much shallower depth of field due to its large sensor. There should also be quite a difference between the full-frame Sony and the APS-C Canon Rebel. None of the images reflect that.

The thing is, it comes down to the word associations used when training. If the foundational model didn't associate the camera name with the image, adding it in the prompt isn't going to have a lot of meaning. I think it would be useful to have a model or LoRA that is trained on the actual EXIF data of the images... this would produce some amazing results, like being able to specify specific cameras, lenses, and settings accurately. It works a bit now, but not as well as it could, imo.
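That kind of EXIF-grounded captioning is easy to prototype. A hedged sketch with Pillow follows; the tag IDs are from the standard EXIF spec, and exif_caption is a made-up helper rather than an existing tool.

```python
from PIL import Image

def exif_caption(path, base_caption):
    """Append camera model, f-number, and ISO from a file's EXIF to a caption."""
    exif = Image.open(path).getexif()
    sub = exif.get_ifd(0x8769)                 # Exif sub-IFD holds exposure data
    parts = [base_caption]
    if 0x0110 in exif:                         # 0x0110 = Model (camera name)
        parts.append(str(exif[0x0110]).strip())
    if 0x829D in sub:                          # 0x829D = FNumber
        parts.append(f"f/{float(sub[0x829D]):g}")
    if 0x8827 in sub:                          # 0x8827 = ISOSpeedRatings
        parts.append(f"ISO {sub[0x8827]}")
    return ", ".join(parts)

# exif_caption("portrait.jpg", "woman on the street")
# -> e.g. "woman on the street, X-T3, f/2, ISO 400"
```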

u/lohmatij

In theory, yes; in practice, Hasselblad lenses have quite a slow maximum f-stop, around f/4, so the depth of field is sometimes worse than a full frame (with f/1.4).

u/wolfsolus

it all depends on the model

if the model is trained on photos you will only get photos

u/aspirationless_photo

Because Fujifilm owners love to tell everyone what camera they own as if it matters? Lol jk but fr.

u/Zoanyway

Can confirm.

Source: self, Fujifilm X-T1, X-T3, and X-T5 owner.


The guy who created Realistic Vision used “xt3” in some of his (excellent) portraits, so people continue to use it.

Some models will add some film grain to anything “fuji” just like some will treat any mention of “kodak” as kodak gold film.

u/nDman_sk

Because Fujifilm cameras apply film simulation to JPEGs, which increases contrast so you feel more depth in the image. In these examples you can clearly see it. It is interesting for me to see that the AI knows this and can apply it correctly. I must test whether using film names like Velvia, Provia, or Astia does anything to the output.


Using ICBINP XL v3 with no negative prompt (except on the artist one, which had "nsfw, nudity, naked, nipple" added due to the Tyler Shields photo style)

prompt format of "woman on the street" with various tokens around it that are commonly used in photorealism prompts

Steps: 25, Sampler: DPM++ 2M Karras, CFG scale: 3, Seed: 2291976425, Size: 1024x1024, Model: icbinpXL_v3

My conclusions:

* Your results will vary depending on model, cfg, steps used, and the complexity of initial prompt
* Adding the camera does tend to override a lot of the other prompts
* The "quality" tokens do vary the image, but may or may not be better

u/pendrachken

That's because the "quality" tokens are meant for NAI type drawn / painted models, not models fine tuned for realistic content. The NAI based models are quite literally trained with the "quality" tags.

Really no different than if you tried to steer your realistic model to what you want with booru tags like a NAI based model. It won't do all that much, and if you get something good it will be random.

The same goes for a NAI-based model: using natural language like you usually do with realistic models won't work nearly as well as using booru tags.

What is a NAI model?

NovelAI fine-tuned their own SD model and it leaked early on. It was one of the first fine-tuned models.

Think of it as the grandfather of SD 1.5 anime models.


I believe you are correct (especially highres) about those being Danbooru tags. What is interesting is that most of the prompts floating around, even for realistic models, still have the word salad including the Danbooru tags, so it was good to try them out. They did give some change to the end result, but definitely not as much as if we were to try the same exercise on Anything Diffusion or other NAI-derived models.

u/Beautiful-Musk-Ox

every picture looks like the pictures they trained on, that's just how it works. seems like the models on civitai all work like that; they are very basic, as we can see in your testing

That is how training works: simplifying, it takes an average of the concepts in the images that have that token in the dataset, so you will always get an "average"-looking person.

u/TheDailyDiffusion

Photographer and the type of medium used for the photograph will yield actual differences.

F/16 isn't right, that should have a very wide depth of field

It's really just the "kind of picture where the F stop is advertised in the description of the image"

It does make sense that photography terms would make it produce the kind of picture that photography amateurs would take.

We need a photography LoRA that enforces the correct terms.


Here is a similar test I did a while back on photography-related terms.

Nice work! Apologies for not being as thorough as you were!!

Lol - no need for apologies. Just thought you might like some extra concepts and ideas to try out. Plus in the world of AI this stuff is pretty old. It would be nice to try it all over again with SDXL to see how it looks now.

The ones above are ICBINP XL, but it's not even in the top 5 of realism models for SDXL by monthly downloads, so trying one of the others would be good! I did later try Juggernaut XL and got quite different results, but that is another study for another day....

u/someweirdbanana

You need to keep in mind that these tags aren't actually going to magically affect the image; all they do is tell SD to use the trained data from images that had these in the filename.

So while the Hasselblad, for example, is an outstanding camera, there aren't going to be many portraits taken with it in the training data. You'd have better luck specifying a more common camera like the Nikon D850 or Sony A7R IV.

Also aperture, you're not going to have many portraits with f2 in the training data. You'll have better luck with f1.4 or f2.8. Definitely not f16.

Moreover, I don't think that ISO does what you think it does. Plus, on recent-ish cameras in broad daylight you won't see any difference between ISO 100 and 800, so the photos SD was trained on will be the same in that aspect.

Good to note :)

u/Comrade_Derpsky

Yeah, ISO is film/sensor sensitivity. A lower ISO speed means the film/sensor is less sensitive to light and needs a longer time to sufficiently expose, so you need a slower shutter speed. A very high ISO means the film/sensor is super sensitive; in low-light settings this will result in a certain amount of noise in the picture because of oversensitivity. In broad daylight there is so much light that you won't see any real difference between ISO settings, but with low ISO speeds you might get a bit more blur from moving subjects or from the camera shaking while being held. I doubt that stable diffusion has picked up on this though, since it won't be a super consistent thing. You might see a difference between ISO values if you make the setting indoors or at night.
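The arithmetic behind that: at a fixed aperture, exposure is proportional to ISO times shutter time, so dropping three stops of sensitivity (800 to 100) needs 8x the shutter time. Illustrative numbers below.

```python
# Equivalent exposures at fixed aperture: exposure ∝ ISO * shutter time.
def shutter_s(iso, ref_iso=800, ref_shutter_s=1/200):
    return ref_shutter_s * ref_iso / iso

for iso in (800, 400, 200, 100):
    print(f"ISO {iso:>3}: 1/{round(1 / shutter_s(iso))} s")
# ISO 800: 1/200 s ... ISO 100: 1/25 s -> handheld shake and motion blur show up
```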

Thanks for the info :)


Did SDXL or any fine tunes actually include camera tags?

I'd love to see the juggernaut xl tags. IMO those would be the most effective.


Also aperture, you're not going to have many portraits with f2 in the training data. You'll have better luck with f1.4 or f2.8. Definitely not f16.

Yea but shouldn't it realize that lower number = shallower depth of field?


All this shows is that the seed takes precedence over any new settings. The ISO and f-stop aren't evident in any of the images.

Yep.. that's the conclusion most people have got to. I think it shows that one should find a model they like and try all of these things on the model to understand how it best behaves for the look they want to get and go from there. One day you may want that photoshopped airbrushed look, and another day you may want that vintage grainy film look, and it's good to know what tokens you can push to get what you seek.

Comment Image

Just realised the quality prompt didn't have the blank for comparative purposes, so here you go.

To me, this looks more like the camera type is doing the heavy lifting... and I suspect it's biased by professional photographers both naming the type of camera used and being much more likely to own such a camera.

So the "framing" and focus on the subject is similar to what a professional would do.

yeah, the camera prompt is definitely overpowering the quality prompt!

u/Abject-Recognition-9

i would do this test on base model instead of any biased one

I would do this test

On base model instead of

Any biased one

- Abject-Recognition-9



To be fair, on SDXL-based models even most of the biased ones are still capable of generating Pixar cartoons and sketches and anime with heavy enough prompting.


So hopefully everyone will realize just how incredibly stupid it is to put in "best quality", which isn't even grammatically correct and implies functionality that doesn't exist.

This is one model. "best quality" is a Danbooru tag which is used when tagging anime, and as such, anime models that are trained with Danbooru tags will probably show a much greater effect than seen here.

But yes, the next model card for this model will not contain prompts with "best quality" in it :)


ya know, without knowing the checkpoint/model, this information is rather pointless? Not all models react the same to identical prompts...

There is a comment below with the details, but definitely agree.  This is ICBINP XL v3


What are the blank rows and columns without labels?

Blanks mean that there was no token added to the prompt.


Excellent!! Thank you

u/Delicious-Pilot3331

They seem to make the subject slightly more sad XD

u/vivikto

If you expect any difference between ISO 100 and 800 in a well-lit scene with modern cameras, you probably don't know much about photography. You might see a slight difference in grain on very high-res pictures, but it's impossible you'd see a difference on low-res pictures like these ones.

You would be correct that the finer details of photography are not my forte.


Something about this is eerie, like seeing different existing versions of a lover.

I guess the keyword here is the overcast sky.

Was all “woman on a street”, no location, time of day, or weather prompts


more tags just change the noise, and that's why you see slight differences.
I highly doubt these photography tags actually do anything worthwhile.

As I understand it, the noise is from the seed, and the prompt is turned into vectors which guide the diffusion.
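Concretely, "turned into vectors" means the prompt becomes a fixed-size stack of token embeddings from CLIP's text encoder, which the denoiser attends to at every step. A minimal sketch with the SD 1.x encoder (SDXL uses two text encoders, but the idea is the same):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"          # SD 1.x's text encoder
tok = CLIPTokenizer.from_pretrained(name)
enc = CLIPTextModel.from_pretrained(name)

with torch.no_grad():
    ids = tok("woman on the street, photo", padding="max_length",
              max_length=77, return_tensors="pt").input_ids
    emb = enc(ids).last_hidden_state

print(emb.shape)  # torch.Size([1, 77, 768]) -- what cross-attention "sees"
```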

u/Queasy_Star_3908

"××× Dpi" is another one. In general it's pretty model/CP dependant

Will try that out next GPU run.

Comment Image

Affects it for sure, but not in the way I was expecting..

[deleted]

I can definitely tell the f-stop affects the focal blur, but it is a question of taste. Thank you for these comparisons!

Most welcome :)

u/moahmo88
u/nemo_theoceanborn

I second what one user said: stop using "photorealism", just use "photo" or "photograph".

wow SD is actually still dumb as hell.

What were your expectations?


I get the downvotes, but no offense. "Wet plate" photo actually puts her in a wet environment and makes her wet. I see no change in the focus when the f-stop is changed.

To me, that makes the "I" in "AI" quite a stretch.

That's because SDXL uses CLIP, not an LLM. It has no "understanding" of the prompt.

Through statistical association over the image training set, the AI gives a high probability to linking "wet" with water; it does not "know" that "wet plate" has nothing to do with water.

Understanding this aspect of how SDXL works will make you a better prompter, because then you know how to fix/improve your prompt when it does not work.
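That association is easy to poke at directly: compare pooled CLIP text embeddings and see what "wet plate" actually sits near. A sketch, not a rigorous test:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

name = "openai/clip-vit-large-patch14"
tok = CLIPTokenizer.from_pretrained(name)
enc = CLIPTextModelWithProjection.from_pretrained(name)

def embed(text):
    with torch.no_grad():
        out = enc(**tok(text, return_tensors="pt")).text_embeds[0]
    return out / out.norm()                      # unit vector for cosine sim

wet_plate = embed("wet plate photo")
for probe in ("wet", "rain", "tintype photograph"):
    print(probe, f"{float(wet_plate @ embed(probe)):.3f}")  # cosine similarity
```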

This bleeding is an issue, but we have to work around it. For example, "person, white background" often means the person (who can be anyone) will be white, and their clothes are likely to be white, when all I wanted was a white background.

u/AuryGlenz

So use “tintype” or “ambrotype.”

No AI really “thinks,” although LLM are flirting with it. Keep in mind there would be a lot more images tagged “wet” or “plate” than wet plate.

Yeah, of course. It's just funny that it doesn't get the relation to the prompt. One could think it might already be one step further than mixing words into a picture.

u/organic_bird_posion

Exactly. It's good to know that it doesn't know what f-stop does to a picture.

u/spacekitt3n

f-stops don't seem to matter. I've put "f16" and it made a fighter jet. "bokeh" or "lens blur" in the positive or negative seem to yield better results. just think of how a human would label the photos in the dataset. I highly doubt they are including all the EXIF data with the images; that would be so tedious.

True, but that was my point as well. It's dumb, like in the intelligence way.


I used a prompt with the phrase "sunken cheeks" and it kept putting the subject underwater or on a shipwreck. I understand that the tag might not be in the training data, but it did make for some interesting results.


Ur getting downvoted for no reason. This post is proof that this model is not trained well.

As in, it is not trained with lots of photos of different apertures, ISO settings, and shutter speeds to more accurately reproduce photography. That's fair!

That said, if it gives you a decent image that you are looking for, does it matter if you have to say "a super bright image" instead of specifying f/1.6?

He is being downvoted because his comments strongly suggest that he does not know how the AI works.

Not true at all man. A lot of my projects make use of AI in some way. It's just a random comment from someone who can't find sleep tonight.

But I get the conclusion.


Damn these women look haggard.

u/SDLidster

yup, they look like photos. Good job, even though I personally don’t understand the fascination. I can browse the internet for random people all day long. /shrug.

What is the appeal?

u/raiffuvar

portraits suck. try other stuff.

Sad to see SD doesn't understand aperture. It would be a great way to control DoF. At least it detects some of the focal lengths, but I noticed it only reacts to some standard ones like 16, 35 or 85mm.

I thought I remembered seeing a LoRA to specifically control DoF, focal lengths for fisheye/wide/normal/tele, white balance, etc.

I feel like controlling technical aspects with a LoRA on your preferred model would be the best approach, but I understand the convenience of having it contained in a single checkpoint.

Comment Image

I mean, there is some reaction, but again more to the woman's hair than to the actual quality of the image.

u/Abject-Recognition-9

if you don't see the difference between 16 and 50 then don't use these tags, and don't ever buy a camera either.

I'm not saying there aren't also subtle changes to the background, but I'm definitely not gonna go splash $2k+ on a DSLR anytime soon without doing more research.

There is some difference here, but in reality 16 vs 85mm would be completely different. With 16 it would look sort of like the first picture, but 85mm is totally off; it would be only her face with zero background.

Unless SD deduced it's moving away from the subject to maintain the same view. But still, with 16 as the reference for this shot, 50 and 85 are completely off.


Would trying to get a bad quality image also work, such as "taken on an iPhone 3" or something like that? I haven’t tried…

Comment Image

Not really

Or at least not using ICBINP XL anyways

I’ve tried before with webcam/cctv and it doesn’t work. Not specific models, however... maybe people just don’t talk about bad webcam models or something.

I think if anything this has shown that there is a niche out there for someone to train an SDXL model on the photography triangle concepts (aperture, shutter speed, ISO).


Many times, I find the negative prompt to have a larger effect on the realism of the result than the inclusion of these sorts of 'buzzwords' in the positive prompt.

u/dostler

I’d be interested to see if you made up some camera names and tested the prompts just as they are. Using made-up names, would you get similar variations? In other words, just shifting the noise slightly?

Pin that thought on the wall, I'll get back to you on that one (I've shut the rented GPU down for now)

Comment Image

Make of that what you will :)

u/dostler

I wonder if the checkpoints scrape the metadata of photographs for camera names. If so, I can see that making a difference, in that high-quality training images are more likely to be taken with professional cameras; perhaps it biases the results towards the better data?

From what I understand it is to do with whatever captions the LAION dataset had, so if the concepts are in there they should come through in the base model

u/CSsmrfk

Great work! How long did it take you to generate all of these?

Not actually that long. Each 1024x1024 image took about 20 seconds to render (using an A4000 GPU on Paperspace).


Any prompts which can help with photorealistic celebrity faces?

Any of these would help if the model knew who the celebrity was already. If the model doesn't know, then you'll need a LoRA/embedding.

Any tutorials/resources for doing this? Appreciate the quick response!


Why doesn't it understand aperture at all? It's so straightforward, but there seems to be absolutely no difference in depth of field.

Either not enough training data or it's being overridden, I guess; or at least it seems that way in this model.

u/Queasy_Star_3908

Oh, and there's the negative "3D MAX", which has at least some grounding due to the early anatomy merges.

Can't say I'd ever seen that one!

u/Dangerous-Paper-8293

Hi friend, have you tried "Shot with Kodak Gold 200"? I really like the effect that it produces.

Can't say I have; will try it out next time I fire up a GPU.

Comment Image

Interesting :)

u/Dangerous-Paper-8293

Nice

u/DrySupermarket8830

what model did you use for this?

These are all ICBINP XL v3

u/DrySupermarket8830

Thanks! I am very overwhelmed right now. Lots of models for photorealism. There's Realistic Vision, Epic Realism, Cyber Realistic, etc. But all of them look like they're from a photographer and were color graded very well. Now I realize I want something that looks taken with a crappy phone, no bokeh, and very raw or unprocessed. Have you tried other models?

I have tried the top 6 on Civitai for realism last month, but not to this extent. In terms of the basics, they all make a good image.
