TLDR: The 512x10240 image below was made using the same amount of VRAM as a 512x512 generation. It took the equivalent of 160 images at 10 steps each (20s/it). Each section of the image gets the full attention of SD, which avoids some of the weird multiples you would get if you generated at that size directly.
Ultra-wide panorama with cross fade between prompts
So I haven't made many images with Stable Diffusion despite using it heavily. The reason is that I've been messing with the internals of the diffusion pipeline, interfering with the diffusion process in different ways. Today's fun result is based on omerbt/MultiDiffusion for making panoramas.
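For context, the MultiDiffusion trick (as I understand it) is to denoise one big panorama latent in overlapping 512x512 windows and average the overlapping predictions at every step, so peak VRAM stays at the 512x512 level while each window still gets full attention. A rough sketch of a single step, with illustrative window/stride sizes rather than the repo's actual values:

import torch

def multidiffusion_step(latent, denoise_window, window=64, stride=16):
    """latent: (1, 4, 64, W) panorama latent; denoise_window runs one UNet step on a (1, 4, 64, 64) crop."""
    accum = torch.zeros_like(latent)
    count = torch.zeros_like(latent)
    w = latent.shape[-1]
    starts = list(range(0, w - window, stride)) + [w - window]  # overlapping windows covering the full width
    for x0 in starts:
        crop = latent[:, :, :, x0:x0 + window]
        accum[:, :, :, x0:x0 + window] += denoise_window(crop)  # each crop gets SD's full attention
        count[:, :, :, x0:x0 + window] += 1
    return accum / count  # averaging the overlaps is what hides the window seams

# e.g. multidiffusion_step(torch.randn(1, 4, 64, 1280), lambda c: c)  # 1280 latent columns = 10240 pixels wide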
The original code would not go bigger than 2048x512 on my 2060 (6GB VRAM): it ran out of memory when passing the latents through the VAE. I tried splitting the image up, but the result was not good, as each section was decoded and colour balanced separately.
Not to be deterred, I hacked together some code to blend it all back together after the VAE but before the final colour balance. The pipe code is available on GitHub: thekitchenscientist/sd_lite.
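Roughly, the fix is to decode the latent in overlapping tiles and cross-fade the decoded pixels at the seams, so the colour balance can then be applied to the whole image in one go. A minimal sketch of that idea, not the actual sd_lite code (the tile/overlap sizes and the scaling_factor call are my assumptions):

import torch

def decode_tiled(latent, vae, tile=64, overlap=8):
    """Decode a (1, 4, 64, W) latent in overlapping column tiles and cross-fade the seams."""
    scale = 8  # the VAE upsamples latents 8x in each spatial dimension
    w = latent.shape[-1]
    out, x0 = None, 0
    while True:
        x1 = min(x0 + tile, w)
        piece = vae.decode(latent[:, :, :, x0:x1] / vae.config.scaling_factor).sample
        if out is None:
            out = piece
        else:
            blend = overlap * scale  # overlap width in pixels
            ramp = torch.linspace(0, 1, blend, device=piece.device, dtype=piece.dtype).view(1, 1, 1, -1)
            out[:, :, :, -blend:] = out[:, :, :, -blend:] * (1 - ramp) + piece[:, :, :, :blend] * ramp
            out = torch.cat([out, piece[:, :, :, blend:]], dim=-1)
        if x1 == w:
            return out  # colour balance / post-processing then runs on the full image
        x0 = x1 - overlap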
The sd_lite pipe is just a standard pipe where I've bolted in sections of useful extras. There are no dependencies beyond standard SD, and it should work with both 1.5 and 2.1.
from scripts.pipeline_stable_diffusion_multi import StableDiffusionMultiPipeline
import torch
from diffusers import DDIMScheduler

# Load a local SD 2.1 base checkpoint in fp16 (1.5 should work too)
model_id = r"C:\Models\stable-diffusion-2-1-base"
pipe = StableDiffusionMultiPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")

# The panorama mode seems to need DDIM; xformers keeps VRAM use down
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
The panorama method seems to require DDIM to work, but a low CFG and a low number of steps work fine. With my modification you can also provide a second prompt, and it will cross fade between the two over the middle third of the image.
prompt = "a clear blue sky above a mountain range"
alt_prompt = " gorge opening into a waterfall cascading down a cliff face, with many different pools and ledges"
image = pipe(prompt, height=2048, width=512, num_inference_steps=10, guidance_scale=4, alt_mode="panorama", alt_prompt=alt_prompt).images[0]
display(image)
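A cross fade like this can be done by interpolating the two prompt embeddings per window, based on where the window sits along the long axis. A simplified sketch of that idea (not the exact pipeline code; the linear ramp shape is illustrative):

import torch

def blended_embedding(emb_a, emb_b, window_centre, full_length):
    """emb_a, emb_b: (1, 77, dim) text embeddings for prompt and alt_prompt; window_centre along the long axis."""
    pos = window_centre / full_length          # 0..1 position of the window along the long axis
    t = min(max((pos - 1 / 3) * 3, 0.0), 1.0)  # 0 before the middle third, 1 after it, linear ramp inside
    return torch.lerp(emb_a, emb_b, t)         # feed the blended embedding to the UNet for this window

With a ramp like that, the first third of the image is conditioned only on prompt, the last third only on alt_prompt, and the middle third blends between them.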
There is still optimisation to do that should make the code run 25-50% faster without loss of quality. It is also limited to images that are 512 in one dimension; an arbitrary size in both dimensions is possible, but the time triples for each additional 512 in size. With a bit of supporting code you could make a near-infinite image with prompts that change every short section.