
List of VQGAN+CLIP Implementations

source link: https://ljvmiranda921.github.io/notebook/2021/08/11/vqgan-list/

Aug 11, 2021 • LJ MIRANDA | 3 min read (578 words)  

I’ve been in a VQGAN+CLIP craze lately, so here’s a list of all the VQGAN+CLIP implementations I found on the internet. (The 🔰 symbol marks notebooks that are friendly to non-programmers; if you don’t know where to start, start with these.)

VQGAN+CLIP implementations

| Name | Author | Description / Features |
| --- | --- | --- |
| VQGAN+CLIP (codebook sampling method) | @RiversHaveWings | The original VQGAN+CLIP notebook of Katherine Crowson (@RiversHaveWings). |
| AI Art Machine | @hillelogram | 🔰 Very accessible Colab notebook. Has advanced options that are explained at a beginner-friendly level. |
| Create realistic AI-Generated Images with VQGAN+CLIP | @minimaxir | 🔰 Has good UI affordances and more descriptive explanations of parameters. Has options for deterministic output using icon-based input/target images. |
| VQGAN+CLIP (with pooling and quantize method) | @ak92501 | Has an optional Gradio demo for a more streamlined experience. |
| Zoetrope 5 | @classpectanon | Has advanced parameters for more controlled AI art generation. I haven’t tried this yet, but it may be good for fleshing out your artwork. |
| VQGAN+CLIP Python command-line interface | @nerdyrodent | Not a Google Colab notebook, but a GitHub repo that you can fork and run locally. Provides a command-line interface to generate AI art on the fly. |
| VQGAN+CLIP (z+quantize method with augmentations) | @somewheresy | Seems to be the first English-translated notebook of Katherine Crowson (@RiversHaveWings). |
| CLIPIT PixelDraw | @dribnet | A very interesting fork of the VQGAN+CLIP notebooks that uses PixelDraw to generate pixel art from a prompt. |
| NightCafe Studio | NightCafe Studio | Not a Colab notebook, but a managed service where you need to set up an account. I can’t comment on how different the outputs are compared to the Colab notebooks. |
| Kapwing AI Video Generator | Kapwing | A web-hosted version of VQGAN+CLIP. Generates videos after processing. It’s not as customizable, but the processing time is relatively fast! |

CLIP-guided art generators

These aren’t necessarily VQGAN implementations, but can produce AI art nonetheless:

| Name | Author | Description / Features |
| --- | --- | --- |
| The Big Sleep: BigGAN x CLIP | @advadnoun | Uses a CLIP-guided BigGAN generator. I can’t comment on the quality of the outputs, but this is exciting to try as well! |
| Aleph-Image | @advadnoun | Uses a CLIP-guided DALL-E decoder. Try it out for more interesting results! |
| CLIP Guided Diffusion HQ 512x512 | @RiversHaveWings | Uses OpenAI’s 512x512 class-conditional ImageNet diffusion model with CLIP. It is fixed at 512x512, but it also has a 256x256 version. |

The common denominator across these works is that they are guided by OpenAI’s CLIP so that the image matches the text description. For more CLIP-guided projects, check out this Reddit post from February.
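To make the guidance idea concrete, here is a toy sketch of the optimization loop these systems share: CLIP scores how well an image matches a text prompt, and the generator's latent is nudged by gradient ascent on that score. This is *not* real CLIP or VQGAN; the "embeddings" below are just small random vectors, and the score is plain cosine similarity with a numerical gradient, so the whole thing runs with only NumPy.

```python
# Toy sketch of CLIP-style guidance (stand-in, NOT the real models):
# in the actual notebooks, CLIP embeds the text prompt and the decoded
# VQGAN image into a shared space, and the latent is updated to increase
# their similarity. Here both embeddings are plain 8-dim vectors.
import numpy as np

def cosine_similarity(a, b):
    # CLIP's text-image score is essentially a cosine similarity.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guide(latent, text_embedding, steps=200, lr=0.1, eps=1e-4):
    """Nudge `latent` toward `text_embedding` by numerical gradient
    ascent on cosine similarity -- a stand-in for CLIP guidance."""
    for _ in range(steps):
        base = cosine_similarity(latent, text_embedding)
        grad = np.zeros_like(latent)
        for i in range(latent.size):
            bumped = latent.copy()
            bumped[i] += eps
            grad[i] = (cosine_similarity(bumped, text_embedding) - base) / eps
        latent = latent + lr * grad  # ascend: make the image "match" the text more
    return latent

rng = np.random.default_rng(0)
text_embedding = rng.normal(size=8)  # pretend CLIP text embedding
latent = rng.normal(size=8)          # pretend VQGAN latent
before = cosine_similarity(latent, text_embedding)
after = cosine_similarity(guide(latent, text_embedding), text_embedding)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In the real implementations the gradient flows through VQGAN's decoder and CLIP's image encoder via backpropagation (usually in PyTorch), and tricks like image augmentations and cutouts are added to stabilize the result, but the loop is the same shape: score, ascend, repeat.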

Resources

If you wish to learn more about VQGAN and CLIP, I suggest reading the following:

  • Alien Dreams: An Emerging Art Scene by Charlie Snell: gives a good overview and history of the recent AI Art scene, tracing its roots from the introduction of CLIP to its pairing with VQGAN today.
  • The Illustrated VQGAN: by yours truly, where I try to explain how VQGAN works at a conceptual level. It starts with how images are “perceived” and ends with the whole VQGAN system.

Of course, nothing beats reading the original papers themselves:

  1. Esser, P., Rombach, R. and Ommer, B., 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
  2. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.

Did I miss anything? Just comment below!

Changelog

  • 08-22-2021: Added Kapwing and PixelDraw
  • 08-21-2021: This blogpost was featured in Comet’s newsletter!
