source link: https://github.com/lucidrains/DALLE2-pytorch

DALL-E 2 - Pytorch (wip)

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch. Yannic Kilcher summary

The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding from the text embedding produced by CLIP. This repository will build out only the diffusion prior network, as it is the best performing variant (which, incidentally, uses a causal transformer as the denoising network).
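The two-stage pipeline described above can be sketched with toy stand-in modules. These are hypothetical classes for illustration only, not the dalle2-pytorch API; the embedding dimension of 512 and the module internals are assumptions.

```python
import torch
from torch import nn

class ToyDiffusionPrior(nn.Module):
    """Stand-in for the prior: maps a CLIP text embedding to a
    predicted CLIP image embedding (hypothetical, not the repo's API)."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, text_embed):
        return self.net(text_embed)

class ToyDecoder(nn.Module):
    """Stand-in for the decoder: generates an image conditioned on a
    (predicted) image embedding (hypothetical, not the repo's API)."""
    def __init__(self, dim=512, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Linear(dim, 3 * image_size * image_size)

    def forward(self, image_embed):
        out = self.net(image_embed)
        return out.view(-1, 3, self.image_size, self.image_size)

text_embed = torch.randn(1, 512)               # as if from CLIP's text encoder
image_embed = ToyDiffusionPrior()(text_embed)  # prior: text embed -> image embed
image = ToyDecoder()(image_embed)              # decoder: image embed -> image
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The point of the sketch is the indirection: the decoder never sees the text embedding directly, only the image embedding the prior predicts from it.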

As of this writing, this model is the state of the art for text-to-image synthesis.

The repository may also explore an extension using latent diffusion in the decoder, following Rombach et al.

Please join if you are interested in helping out with the replication

Do let me know if anyone is interested in a Jax version #8

Install

$ pip install dalle2-pytorch

Usage (work in progress)

template

$ dream 'sharing a sunset at the summit of mount everest with my dog'

Once built, images will be saved to the same directory from which the command is invoked.

Training (work in progress, will be offered both in code and via the command line)

template

  • finish off gaussian diffusion class for latent embedding - allow for both prediction of epsilon as well as directly predicting embedding
  • make sure it works end to end
  • augment unet so that it can also be conditioned on text encodings (although in the paper they hinted this didn't make much of a difference)
  • look into Jonathan Ho's cascading DDPM for the decoder, as that seems to be what they are using. get caught up on DDPM literature
  • figure out all the current bag of tricks needed to make DDPMs great (starting with the blur trick mentioned in paper)
  • train on a toy task, offer in colab
  • add attention to unet - apply some personal tricks with efficient attention
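The first item above, supporting both prediction of epsilon and direct prediction of the embedding, can be sketched as follows. This is an illustrative standalone example, not the repository's code: the noise level, dimensions, and zero-valued placeholder network outputs are all assumptions.

```python
import torch

def noise_sample(x0, alpha_bar_t):
    """Forward diffusion step: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    eps = torch.randn_like(x0)
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps
    return x_t, eps

x0 = torch.randn(4, 512)          # clean latent embeddings (illustrative shape)
alpha_bar_t = torch.tensor(0.5)   # illustrative noise level, not a real schedule
x_t, eps = noise_sample(x0, alpha_bar_t)

# Objective 1: train the denoising network to predict the noise (epsilon).
pred_eps = torch.zeros_like(eps)  # placeholder for the network's output
loss_eps = torch.nn.functional.mse_loss(pred_eps, eps)

# Objective 2: train it to predict the clean embedding (x0) directly.
pred_x0 = torch.zeros_like(x0)    # placeholder for the network's output
loss_x0 = torch.nn.functional.mse_loss(pred_x0, x0)

# The two targets are interchangeable: x0 is recoverable from a predicted eps.
x0_from_eps = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
assert torch.allclose(x0_from_eps, x0, atol=1e-4)
```

Since either target determines the other given `x_t`, supporting both is mostly a matter of which quantity the loss is computed against.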

Citations

@misc{ramesh2022,
    title   = {Hierarchical Text-Conditional Image Generation with CLIP Latents},
    author  = {Aditya Ramesh et al},
    year    = {2022},
    eprint  = {2204.06125},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@misc{crowson2022,
    author  = {Katherine Crowson},
    url     = {https://twitter.com/rivershavewings}
}
@misc{rombach2021highresolution,
    title   = {High-Resolution Image Synthesis with Latent Diffusion Models}, 
    author  = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
    year    = {2021},
    eprint  = {2112.10752},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@inproceedings{Liu2022ACF,
    title   = {A ConvNet for the 2020s},
    author  = {Zhuang Liu and Hanzi Mao and Chaozheng Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
    year    = {2022}
}
@misc{zhang2019root,
    title   = {Root Mean Square Layer Normalization},
    author  = {Biao Zhang and Rico Sennrich},
    year    = {2019},
    eprint  = {1910.07467},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
