
notes_on_sd_vae

 3 weeks ago
source link: https://gist.github.com/madebyollin/ff6aeadf27b2edbc51d05d5f97a595d9


Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes and decodes images into a compressed "latent" format. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
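The 48x figure counts numbers, not bits. Assuming the standard SD-VAE layout of 8x spatial downsampling into 4 latent channels, each 8x8x3 pixel patch (192 values) becomes 4 latent values. A small sketch of that arithmetic (the function names are illustrative, not from any library):

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, h, w) of the latent for an image of the given size.

    Defaults assume the SD-VAE layout: 8x spatial downsampling into 4 channels.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3, downsample=8, latent_channels=4):
    """Input values per latent value (counts numbers, ignores bit depth)."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

The ratio does not depend on resolution: with these defaults it is always 3 * 8 * 8 / 4 = 48. It also ignores that pixels are usually 8-bit integers while latents are floats.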

(Calling this model a "VAE" is sort of a misnomer: it's an encoder with very slight KL regularization, paired with a conditional GAN decoder.)
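For context on the "slight KL regularization": in a KL-regularized autoencoder the encoder predicts a mean and log-variance per latent element, latents are sampled with the reparameterization trick, and the loss adds a KL-divergence penalty toward a standard normal. A minimal pure-Python sketch; `kl_weight=1e-6` is an illustrative assumption, not a value taken from these notes:

```python
import math
import random


def sample_latent(mean, logvar, rng):
    """Reparameterized sample z = mean + std * eps from a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, logvar)]


def kl_to_standard_normal(mean, logvar):
    """KL(N(mean, exp(logvar)) || N(0, 1)), summed over all latent elements."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar))


def total_loss(recon_loss, mean, logvar, kl_weight=1e-6):
    # kl_weight is an illustrative assumption: with a weight this small the KL
    # term barely constrains the latents relative to the reconstruction loss.
    return recon_loss + kl_weight * kl_to_standard_normal(mean, logvar)


rng = random.Random(0)
mean, logvar = [0.5, -1.0], [-2.0, -2.0]
z = sample_latent(mean, logvar, rng)
kl = kl_to_standard_normal(mean, logvar)
```

With a tiny weight, the encoder is free to produce latents that are far from unit-Gaussian, which is why the model behaves more like a plain autoencoder than a textbook VAE.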

This document is a big pile of various links with more info.

VAE Versions & Lineage

Other SD-VAE-related Codebases

Other Info

Author

[Figure: Diagram of the VAE]

[Animation: How the VAE (decoder) is used during SD generation]
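In a latent diffusion pipeline the denoising loop runs entirely in latent space, and the VAE decoder only turns the final latents (or, for previews, intermediate ones) into pixels. Before decoding, latents are divided by the scaling factor applied after encoding: 0.18215 for the SD 1.x VAE (the SDXL VAE config uses 0.13025). In diffusers this step is `vae.decode(latents / vae.config.scaling_factor).sample`. The sketch below mirrors that data flow with hypothetical placeholder callables for the denoiser and decoder:

```python
SD_SCALING_FACTOR = 0.18215  # latent scaling factor used with the SD 1.x VAE


def sample_then_decode(latents, denoise_step, decode, num_steps,
                       scaling_factor=SD_SCALING_FACTOR, preview_every=0):
    """Toy latent-diffusion loop: all denoising happens in latent space.

    `denoise_step(latents, step)` and `decode(latents)` are hypothetical
    stand-ins for the sampler/UNet step and the VAE decoder.
    """
    previews = []
    for step in range(num_steps):
        latents = denoise_step(latents, step)
        if preview_every and (step + 1) % preview_every == 0:
            # Optional: decode intermediate latents to watch the image form.
            previews.append(decode([z / scaling_factor for z in latents]))
    # Undo the scaling applied after encoding, then decode once at the end.
    image = decode([z / scaling_factor for z in latents])
    return image, previews


image, previews = sample_then_decode(
    latents=[1.0, -1.0],
    denoise_step=lambda z, step: [0.5 * v for v in z],  # toy "denoiser"
    decode=lambda z: z,  # identity "decoder", only to show the data flow
    num_steps=4,
    preview_every=2,
)
```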


[Figure: SD-VAE modification chart]

[Figure: SDXL-VAE modification chart]


[Figures (2): Additive Gaussian noise]
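The figures themselves are not preserved, so this is only a guess at the setup: assuming the experiment adds Gaussian noise of increasing strength to the latents before decoding, a sweep could look like the sketch below (`decode` is a hypothetical stand-in for the VAE decoder):

```python
import random


def add_gaussian_noise(latents, sigma, seed=0):
    """Copy of `latents` with i.i.d. N(0, sigma^2) noise added to each element."""
    rng = random.Random(seed)
    return [z + rng.gauss(0.0, sigma) for z in latents]


def noise_sweep(latents, decode, sigmas, seed=0):
    """Decode a noisy copy of the latents at each noise level.

    `decode` is a hypothetical stand-in for the VAE decoder.
    """
    return {sigma: decode(add_gaussian_noise(latents, sigma, seed)) for sigma in sigmas}


results = noise_sweep([0.0] * 10_000, decode=lambda z: z, sigmas=[0.0, 0.5, 1.0])
```

Using a fixed seed per noise level keeps the noise pattern identical across runs, so differences between decoded outputs come from the noise strength alone.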


[Figure: Effect of input image resolution on the scale of the encoded latents]
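One simple way to quantify the "scale" of encoded latents is their standard deviation. Its reciprocal is the kind of normalizing factor that SD's 0.18215 scaling factor is commonly described as approximating (bringing latents to roughly unit variance). A sketch under that assumption, with `encode` as a hypothetical stand-in for the VAE encoder:

```python
import statistics


def latent_scale(latents):
    """Population standard deviation of all latent values (flat list)."""
    return statistics.pstdev(latents)


def implied_scaling_factor(latents):
    """Multiplier that would bring these latents to unit standard deviation."""
    return 1.0 / latent_scale(latents)


def scale_by_resolution(encode, images_by_resolution):
    """Map each resolution to the standard deviation of its encoded latents.

    `encode` is a hypothetical stand-in for the VAE encoder returning a flat list.
    """
    return {res: latent_scale(encode(img)) for res, img in images_by_resolution.items()}
```

If the latent scale drifts with resolution, a single fixed scaling factor will normalize latents well at some resolutions and poorly at others.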

[Animation: Effect of scaling the "artifact" (brightest spot) of the SD latents up / down]
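A sketch of how such an edit could be made, assuming "brightest spot" means the spatial position holding the largest absolute latent value and that all channels at that position are scaled together (both are assumptions; the helper names are hypothetical):

```python
def find_brightest_spot(latents):
    """Return (y, x) of the spatial position with the largest absolute latent value.

    `latents` is a nested list indexed as [channel][y][x].
    """
    best, best_pos = -1.0, (0, 0)
    for channel in latents:
        for y, row in enumerate(channel):
            for x, value in enumerate(row):
                if abs(value) > best:
                    best, best_pos = abs(value), (y, x)
    return best_pos


def scale_spot(latents, factor, pos=None):
    """Copy of `latents` with every channel at `pos` (default: brightest spot) times `factor`."""
    y0, x0 = find_brightest_spot(latents) if pos is None else pos
    scaled = [[list(row) for row in channel] for channel in latents]
    for channel in scaled:
        channel[y0][x0] *= factor
    return scaled


latents = [
    [[0.1, 0.2], [0.3, -5.0]],  # channel 0
    [[1.0, 0.0], [0.0, 2.0]],   # channel 1
]
print(find_brightest_spot(latents))  # (1, 1)
dimmed = scale_spot(latents, 0.5)
```

Decoding `scale_spot(latents, f)` for a range of factors `f` (below and above 1) would reproduce the kind of up/down sweep the animation describes.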


[Figures (2): Effect of flips]
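One plausible way to measure the "effect of flips" is to check whether flipping commutes with encoding, i.e. compare `encode(flip(x))` against `flip(encode(x))`. The sketch below assumes that interpretation; `encode` is a hypothetical stand-in for the VAE encoder, and a toy average-pooling "encoder" is included only to demonstrate the check:

```python
def hflip(grid):
    """Horizontally flip a nested list indexed as [channel][y][x]."""
    return [[list(reversed(row)) for row in channel] for channel in grid]


def flip_equivariance_error(image, encode):
    """Mean absolute difference between encode(flip(x)) and flip(encode(x)).

    `encode` is a hypothetical stand-in for the VAE encoder. An error of 0
    means horizontal flipping commutes with encoding for this input.
    """
    a = encode(hflip(image))
    b = hflip(encode(image))
    diffs = [abs(p - q)
             for ca, cb in zip(a, b)
             for ra, rb in zip(ca, cb)
             for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def avg_pool2(grid):
    """Toy "encoder": 2x2 average pooling, which commutes with horizontal flips
    for even widths (up to floating-point rounding)."""
    return [[[(ch[2 * y][2 * x] + ch[2 * y][2 * x + 1]
               + ch[2 * y + 1][2 * x] + ch[2 * y + 1][2 * x + 1]) / 4
              for x in range(len(ch[0]) // 2)]
             for y in range(len(ch) // 2)]
            for ch in grid]


image = [[[1, 2, 3, 4], [5, 6, 7, 8]]]  # one channel, 2x4
print(flip_equivariance_error(image, avg_pool2))  # 0.0
```

A learned convolutional encoder has no such built-in guarantee, so a nonzero error would indicate that its latents depend on orientation.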


[Figure: Better latent-max chart with Gaussian baseline]
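The maximum of a set of values grows with its size even for pure noise, so a latent-max chart needs a baseline to be meaningful. A sketch of one way to build it, assuming the "Gaussian baseline" means the expected maximum absolute value of the same number of i.i.d. standard normal samples (the function names are hypothetical):

```python
import random
import statistics


def gaussian_max_baseline(n, trials=100, seed=0):
    """Monte Carlo estimate of E[max |z|] over n i.i.d. standard normal values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
    return total / trials


def latent_max_vs_baseline(latents, trials=100, seed=0):
    """Return (observed, baseline) for a flat list of latent values.

    `observed` is the largest absolute deviation from the mean in units of the
    standard deviation; `baseline` is what pure Gaussian noise of the same size
    would give. An observed value far above the baseline points to an outlier.
    """
    mu = statistics.fmean(latents)
    sd = statistics.pstdev(latents)
    observed = max(abs(z - mu) for z in latents) / sd
    return observed, gaussian_max_baseline(len(latents), trials, seed)


# Toy latent: mostly unit-scale values plus one large spike.
spiky = [1.0, -1.0] * 50 + [50.0]
observed, baseline = latent_max_vs_baseline(spiky)
```

Because the baseline rises slowly with the number of elements, comparing against it separates a genuine spike from the ordinary growth of the maximum at larger input resolutions.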

[Animation: Visualization of the artifact that shows up in the SD-VAE latents for larger input images]

[Animation: The same test with the SDXL-VAE, which doesn't have the artifact]


[Figure: Quick grid for the Wuerstchen (Stable Cascade) Stage A f=4 VQGAN]
