TLDR: The 512x10240 image below was made using the same amount of VRAM as a 512x512 generation. It took the equivalent of 160 images at 10 steps each (20s/it). Each section of the image gets the full attention of SD, which avoids some of the weird multiples you would get if you generated at that size directly.
Ultra-wide panorama with cross fade between prompts
So I haven't made many images with Stable Diffusion despite using it heavily. The reason is that I've been messing with the internals of the diffusion pipeline, interfering with the diffusion process in different ways. Today's fun result is based on omerbt/MultiDiffusion for making panoramas.
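For context, the MultiDiffusion trick (as I understand it) is to denoise one big panorama latent in overlapping 512x512 windows and average the overlapping predictions at every step, so peak VRAM stays at the 512x512 level while each window still gets full attention. A rough sketch of a single step, with illustrative window/stride sizes rather than the repo's actual values:

import torch

def multidiffusion_step(latent, denoise_window, window=64, stride=16):
    """latent: (1, 4, 64, W) panorama latent; denoise_window runs one UNet step on a (1, 4, 64, 64) crop."""
    accum = torch.zeros_like(latent)
    count = torch.zeros_like(latent)
    w = latent.shape[-1]
    starts = list(range(0, w - window, stride)) + [w - window]  # overlapping windows covering the full width
    for x0 in starts:
        crop = latent[:, :, :, x0:x0 + window]
        accum[:, :, :, x0:x0 + window] += denoise_window(crop)  # each crop gets SD's full attention
        count[:, :, :, x0:x0 + window] += 1
    return accum / count  # averaging the overlaps is what hides the window seams

# e.g. multidiffusion_step(torch.randn(1, 4, 64, 1280), lambda c: c)  # 1280 latent columns = 10240 pixels wide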
The original code would not go bigger than 2048x512 on my 2060 (6GB VRAM): it ran out of memory when passing the latents through the VAE. I tried splitting the image up, but the result was not good, as each section was decoded and colour balanced separately.
Not to be deterred, I hacked together some code to blend it all back together after the VAE but before the final colour balance. The pipe code is available on GitHub: thekitchenscientist/sd_lite.
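Roughly, the fix is to decode the latent in overlapping tiles and cross-fade the decoded pixels at the seams, so the colour balance can then be applied to the whole image in one go. A minimal sketch of that idea, not the actual sd_lite code (the tile/overlap sizes and the scaling_factor call are my assumptions):

import torch

def decode_tiled(latent, vae, tile=64, overlap=8):
    """Decode a (1, 4, 64, W) latent in overlapping column tiles and cross-fade the seams."""
    scale = 8  # the VAE upsamples latents 8x in each spatial dimension
    w = latent.shape[-1]
    out, x0 = None, 0
    while True:
        x1 = min(x0 + tile, w)
        piece = vae.decode(latent[:, :, :, x0:x1] / vae.config.scaling_factor).sample
        if out is None:
            out = piece
        else:
            blend = overlap * scale  # overlap width in pixels
            ramp = torch.linspace(0, 1, blend, device=piece.device, dtype=piece.dtype).view(1, 1, 1, -1)
            out[:, :, :, -blend:] = out[:, :, :, -blend:] * (1 - ramp) + piece[:, :, :, :blend] * ramp
            out = torch.cat([out, piece[:, :, :, blend:]], dim=-1)
        if x1 == w:
            return out  # colour balance / post-processing then runs on the full image
        x0 = x1 - overlap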
The sd_lite pipe is just a standard pipe where I've bolted in sections of useful extras. There are no dependencies beyond standard SD, and it should work with both 1.5 and 2.1.
from scripts.pipeline_stable_diffusion_multi import StableDiffusionMultiPipeline
import torch
from diffusers import DDIMScheduler

# Load a local SD 2.1 base checkpoint in fp16 (1.5 should work too)
model_id = r"C:\Models\stable-diffusion-2-1-base"
pipe = StableDiffusionMultiPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")

# The panorama mode seems to need DDIM; xformers keeps VRAM use down
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
The panorama method seems to require DDIM to work, but a low CFG and a low number of steps work fine. With my modification you can also provide a second prompt, and it will cross fade between the two over the middle third of the image.
prompt = "a clear blue sky above a mountain range"
alt_prompt = " gorge opening into a waterfall cascading down a cliff face, with many different pools and ledges"
image = pipe(prompt, height=2048, width=512, num_inference_steps=10, guidance_scale=4, alt_mode="panorama", alt_prompt=alt_prompt).images[0]
display(image)
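A cross fade like this can be done by interpolating the two prompt embeddings per window, based on where the window sits along the long axis. A simplified sketch of that idea (not the exact pipeline code; the linear ramp shape is illustrative):

import torch

def blended_embedding(emb_a, emb_b, window_centre, full_length):
    """emb_a, emb_b: (1, 77, dim) text embeddings for prompt and alt_prompt; window_centre along the long axis."""
    pos = window_centre / full_length          # 0..1 position of the window along the long axis
    t = min(max((pos - 1 / 3) * 3, 0.0), 1.0)  # 0 before the middle third, 1 after it, linear ramp inside
    return torch.lerp(emb_a, emb_b, t)         # feed the blended embedding to the UNet for this window

With a ramp like that, the first third of the image is conditioned only on prompt, the last third only on alt_prompt, and the middle third blends between them.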
There is still optimisation to do that should make the code run 25-50% faster without loss of quality. It is also limited to images that are 512 in one dimension; an arbitrary size in both dimensions is possible, but the time triples for each additional 512 in size. With a bit of supporting code you could make a near-infinite image with prompts that change every short section.