Google Brain's New Model Imagen is Even More Impressive than Dall-E 2

 2 years ago
source link: https://hackernoon.com/google-brains-new-model-imagen-is-even-more-impressive-than-dall-e-2

If you thought Dall-E 2 had great results, wait until you see what this new model from Google Brain can do. Dall-E is amazing but often lacks realism, and this is what the team attacked with a new model called Imagen. Imagen can not only understand text, it can also understand the images it generates. Learn more in the video, or read the full article: https://www.louisbouchard.ai/google-brain-imagen/

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

If you thought Dall-E 2 had great results, wait until you see what this new model from Google Brain can do.

Dall-E is amazing but often lacks realism, and this is what the team attacked with a new model called Imagen.

They share a lot of results on their project page, as well as a benchmark they introduced for comparing text-to-image models, on which they clearly outperform Dall-E 2 and previous image generation approaches. Learn more in the video...

References

►Read the full article: https://www.louisbouchard.ai/google-brain-imagen/
►Paper: Saharia et al., 2022, Imagen - Google Brain, https://gweb-research-imagen.appspot.com/paper.pdf
►Project link: https://gweb-research-imagen.appspot.com/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video transcript

If you thought Dall-E 2 had great results, wait until you see what this new model from Google Brain can do. Dall-E is amazing but often lacks realism, and this is what the team attacked with this new model called Imagen. They share a lot of results on their project page, as well as a benchmark they introduced for comparing text-to-image models, on which they clearly outperformed Dall-E 2 and previous image generation approaches. This benchmark is also super cool, as we see more and more text-to-image models and it's pretty difficult to compare their results, unless we assume the results are really bad, which we often do. But this model and Dall-E 2 definitely defied the odds.

TLDR: it's a new text-to-image model that you can compare to Dall-E 2, with more realism as per human testers. So just like Dall-E 2, which I covered not even a month ago, this model takes text like "a golden retriever dog wearing a blue checkered beret and a red dotted turtleneck" and tries to generate a photorealistic image out of this weird sentence.

The main point here is that Imagen can not only understand text, but it can also understand the images it generates, since they are more realistic than all previous approaches. Of course, when I say "understand," I mean its own kind of understanding, which is really different from ours. The model doesn't really understand the text or the image it generates. It definitely has some kind of knowledge about them, but it mainly understands how this particular kind of sentence, with these objects, should be represented using pixels in an image. Still, I'll concede that it sure looks like it understands what we send it when we see those results. Obviously, you can trick it with some really weird sentences that couldn't look realistic, but it sometimes beats even your own imagination and just creates something amazing.

What's even more amazing is how it works, using something I never discussed on the channel: a diffusion model. But before using this diffusion model, we first need to understand the text input, and this is also the main difference with Dall-E. They used a huge text model, similar to GPT-3, to understand the text as well as an AI system can. So instead of training a text model along with the image generation model, they simply use a big pre-trained model and freeze it, so that it doesn't change during the training of the image generation model. From their study, this led to much better results, and it seemed like the model understood the text better. This text module is how the model understands text, and this understanding is represented in what we call encodings: the text model has been trained on huge datasets to transform text inputs into a space of information that it can use and understand.
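The frozen-encoder idea can be sketched in a few lines. This is a toy illustration, not the real system: the vocabulary, embedding size, and weights below are made up, and Imagen's actual frozen encoder is a large pre-trained language model, not a lookup table. The one property the sketch does capture is that the weights never change while the image model trains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained text model's weights (hypothetical values).
# They are "frozen": nothing updates them during image-model training.
VOCAB = {"a": 0, "golden": 1, "retriever": 2, "dog": 3}
EMBED_DIM = 8
frozen_weights = rng.normal(size=(len(VOCAB), EMBED_DIM))

def encode(prompt: str) -> np.ndarray:
    """Map a prompt to per-token encodings with the frozen text model."""
    ids = [VOCAB[word] for word in prompt.lower().split()]
    return frozen_weights[ids]  # shape: (num_tokens, EMBED_DIM)

# These encodings are what the diffusion model is conditioned on later.
encodings = encode("a golden retriever dog")
```

The design choice worth noting is that the text model is trained separately, on text alone, and then kept fixed, so a bigger and better text encoder can be swapped in without retraining it alongside the image generator.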

Now we need to use this transformed text data to generate the image, and as I said, they used a diffusion model to achieve that. But what is a diffusion model? Diffusion models are generative models that convert random Gaussian noise into images by learning how to reverse the noising process iteratively. They are powerful models for super-resolution and other image-to-image translations, and in this case they use a modified U-Net architecture, which I covered numerous times in previous videos, so I won't go into the architectural details here.

Basically, the model is trained to denoise an image starting from pure noise, which it orients using the text encodings and a technique called classifier-free guidance, which they say is essential and clearly explain in their paper. I'll let you read it for more information on this technique. So now we have a model able to take random Gaussian noise and our text encodings, and to iteratively denoise the noise, with guidance from the text encodings, into an image.

But it isn't as simple as it sounds. The image we just generated is very small, as a bigger image would require much more computation and a much bigger model, which are not viable. Instead, we first generate a photorealistic image using the diffusion model we just discussed, and then use other diffusion models to improve the quality of the image iteratively. I already covered super-resolution models in past videos, so I won't go into the details here, but let's do a quick overview. Once again, we want to start from noise rather than an image, so we cover up this initially generated low-resolution image with Gaussian noise and train a second diffusion model to take this modified image and improve it. Then we repeat these two steps with another model, but this time using just patches of the image instead of the full image, to do the same upscaling ratio while staying computationally viable. And voilà, we end up with our photorealistic high-resolution image.
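The whole pipeline described above — classifier-free-guided denoising for a small base image, then re-noised super-resolution stages — can be sketched as follows. Everything here is a toy stand-in: the noise predictor is a fake two-branch function rather than a trained U-Net, the update rule ignores the real per-timestep noise schedule, and the resolutions are scaled down from Imagen's 64→256→1024. Only the overall structure matches the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, text_encoding):
    """Toy stand-in for the text-conditioned U-Net noise predictor."""
    if text_encoding is None:                 # unconditional branch
        return 0.1 * x
    return 0.1 * x - 0.01 * text_encoding.mean()

def sample(shape, text_encoding, steps=30, guidance_scale=7.0, start=None):
    """Iteratively denoise Gaussian noise (or a re-noised image) into an image."""
    x = rng.normal(size=shape) if start is None else start
    for _ in range(steps):
        eps_uncond = predict_noise(x, None)
        eps_cond = predict_noise(x, text_encoding)
        # Classifier-free guidance: amplify the text-conditioned direction.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        x = x - eps                           # simplified update rule
    return x

def upscale_and_noise(img, factor=4, noise_scale=0.5):
    """Nearest-neighbour upsample, then re-noise for the next diffusion stage."""
    big = np.kron(img, np.ones((factor, factor, 1)))
    return big + noise_scale * rng.normal(size=big.shape)

text_enc = rng.normal(size=(4, 8))   # encodings from the frozen text model
base = sample((8, 8, 3), text_enc)   # low-res base image (Imagen: 64x64)
mid = sample((32, 32, 3), text_enc, start=upscale_and_noise(base))
# The real final stage runs on image patches to stay memory-efficient;
# this toy version just denoises the full array (Imagen: 1024x1024).
final = sample((128, 128, 3), text_enc, start=upscale_and_noise(mid))
```

The guidance line is the one piece taken directly from the classifier-free guidance formulation: the model predicts noise twice, with and without the text, and the sampler extrapolates past the conditional prediction by the guidance scale.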

Of course, this was just an overview of this exciting new model with really cool results. I definitely invite you to read their great paper for a deeper understanding of their approach and a detailed analysis of the results.

And you, do you think the results are comparable to Dall-E 2's? Are they better or worse? I sure think it is Dall-E's main competitor as of now. Let me know what you think of this new Google Brain publication and of the explanation. I hope you enjoyed this video, and if you did, please take a second to leave a like and subscribe to stay up to date with exciting AI news. If you are subscribed, I will see you next week with another amazing paper!

by Louis Bouchard @whatsai. I explain Artificial Intelligence terms and news to non-experts.
Watch more on YouTube: https://www.youtube.com/c/WhatsAI

Comments


Tue May 24 2022

Thanks for sharing! What do you think the major use case will be for these text-to-image models long term?

Curated St|

