Help me incorporate reference_only into a custom node

Post Body

Hey all! Hopefully I can find some help here. I'm looking for someone with a deep technical understanding of how Stable Diffusion works (both the theory and the Python code) and of how ComfyUI works, who could lend me a hand with a custom node.

I'm trying to implement the reference-only "ControlNet preprocessor". For those who don't know, it is a technique that patches the UNet function so that it makes two passes during the inference loop: one that writes data from the reference image, and another that reads that data during the normal inference on the input image, so the output emulates the reference image's style to an extent.

It's also necessary to patch the self-attention layer in order to write and read that data.
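To make the write/read idea concrete, here is a minimal pure-Python sketch of it. It is a toy, not my node's code and not how any library implements it: `ReferenceSelfAttention`, `denoise_step` and the `bank` attribute are hypothetical names, the query/key/value projections are left out (tokens are used directly), and "read" is assumed to mean appending the stored reference tokens to the keys and values.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention over lists of equal-length vectors.
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return out

class ReferenceSelfAttention:
    # Hypothetical stand-in for a patched self-attention layer.
    def __init__(self):
        self.mode = "off"   # "write", "read" or "off"
        self.bank = []      # reference tokens stored by the write pass

    def __call__(self, tokens):
        if self.mode == "write":
            # Write pass: remember the reference tokens, attend as usual.
            self.bank = [list(t) for t in tokens]
            return attend(tokens, tokens, tokens)
        if self.mode == "read" and self.bank:
            # Read pass: queries come from the image only, but keys and
            # values also include the stored reference tokens.
            context = tokens + self.bank
            return attend(tokens, context, context)
        return attend(tokens, tokens, tokens)

def denoise_step(attn, image_tokens, reference_tokens):
    # Assumed two-pass structure of one patched UNet call.
    attn.mode = "write"
    attn(reference_tokens)      # output discarded, only the bank matters
    attn.mode = "read"
    out = attn(image_tokens)
    attn.bank = []              # clear so nothing leaks into the next call
    attn.mode = "off"
    return out

attn = ReferenceSelfAttention()
image = [[1.0, 0.0], [0.0, 1.0]]
reference = [[5.0, 5.0]]
plain = attn(image)                           # ordinary self-attention
mixed = denoise_step(attn, image, reference)  # pulled toward the reference
```

In this toy, `mixed` differs from `plain` only because the read pass can attend to the reference tokens, which is the effect the patch is supposed to produce inside the real attention layers.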

After several days of intensive study, I finally got my code running without errors. I based it on an example written for diffusers and adapted it to ComfyUI's logic. It's a custom node that takes a latent reference image and the model to patch as inputs.
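For anyone unfamiliar with ComfyUI nodes, the general shape of such a node is roughly the following. This is only a skeleton under assumptions, not my actual source: the class name `ReferenceOnlyPatch`, the `style_fidelity` input, the category string and the `reference_only` attribute are placeholders, and the real UNet and attention patching is left out.

```python
class ReferenceOnlyPatch:
    # Hypothetical node: takes a MODEL and a LATENT reference, returns a MODEL.

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("MODEL",),
                "reference": ("LATENT",),
                "style_fidelity": (
                    "FLOAT",
                    {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01},
                ),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"            # name of the method ComfyUI calls
    CATEGORY = "model_patches"    # placeholder menu category

    def patch(self, model, reference, style_fidelity):
        # Work on a clone so the original model stays unpatched.
        patched = model.clone()
        # ComfyUI passes a LATENT as a dict with the tensor under "samples".
        reference_latent = reference["samples"]
        # Placeholder: a real node would install its UNet and attention
        # hooks here. This sketch only stashes the settings on the clone.
        patched.reference_only = {
            "latent": reference_latent,
            "style_fidelity": style_fidelity,
        }
        return (patched,)

# ComfyUI discovers nodes through this mapping.
NODE_CLASS_MAPPINGS = {"ReferenceOnlyPatch": ReferenceOnlyPatch}
```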

However, I'm not happy with the results. I can tell it's working because some features of the reference image show up in the output. Here are some examples from my testing (all using Dreamshaper):

Input reference:

Input image

Output:

https://preview.redd.it/kct66mts5odb1.png?width=512&format=png&auto=webp&s=fdad42ce8c0632dfe4f1a20abc1ee0ea12740441

Input (with the same prompt):

https://preview.redd.it/vtads2xx5odb1.jpg?width=628&format=pjpg&auto=webp&s=555857827f9cb1ee4c0aa05f5c95438ee13bb64c

Output:

https://preview.redd.it/2j04oyo26odb1.png?width=512&format=png&auto=webp&s=57d9b6d7c880fbb4c46cf223cd9a0dd74afd2d76

Same input reference but with "Dog walking in the park":

https://preview.redd.it/f4j7cr676odb1.png?width=512&format=png&auto=webp&s=27bbaa6f0c07eb22dfd5421de4f3e461b4fd913f

Same prompt without reference:

https://preview.redd.it/7a39gifb6odb1.png?width=512&format=png&auto=webp&s=75aa2a61d6797508d2d156135d113e7df4fdfc71

It's fairly clear that the output picks up slight features of the reference image, but the effect is not strong enough to be considered successful, especially since all the testing was done with a style fidelity of 1 (the maximum value).

Automatic1111's controlnet extension can even copy character features.

So in conclusion, something is missing in my code, but I can't tell what at this point. I admit I made many assumptions while coding, which is why I'm asking for help. I need someone who understands this and can tell me what is wrong with my implementation.

You can check my custom node's source code in my controlnet preprocessors fork. (Original repo by Fannovel16)

Author

RangerRocket09

Subreddit

r/comfyui

Post Details

Posted: 1 year ago