

Google Brain's New Model Imagen Is Even More Impressive than DALL-E 2
source link: https://hackernoon.com/google-brains-new-model-imagen-is-even-more-impressive-than-dall-e-2

I explain Artificial Intelligence terms and news to non-experts.
If you thought DALL-E 2 had great results, wait until you see what this new model from Google Brain can do.
DALL-E 2 is amazing but often lacks realism, and this is what the team attacked with their new model, called Imagen.
They share a lot of results on their project page, as well as a benchmark they introduced for comparing text-to-image models, on which they clearly outperform DALL-E 2 and previous image generation approaches. Learn more in the video...
References
►Read the full article: https://www.louisbouchard.ai/google-brain-imagen/
►Paper: Saharia et al., 2022, Imagen - Google Brain, https://gweb-research-imagen.appspot.com/paper.pdf
►Project link: https://gweb-research-imagen.appspot.com/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Video transcript
If you thought DALL-E 2 had great results, wait until you see what this new model from Google Brain can do. DALL-E 2 is amazing but often lacks realism, and this is what the team attacked with this new model called Imagen. They share a lot of results on their project page, as well as a benchmark they introduced for comparing text-to-image models, on which they clearly outperform DALL-E 2 and previous image generation approaches.
This benchmark is also super cool, as we see more and more text-to-image models and it's pretty difficult to compare their results, unless we assume the results are really bad, which we often do. But this model and DALL-E 2 definitely defied the odds. TL;DR: it's a new text-to-image model that you can compare to DALL-E 2, with more realism, as per human testers.
So just like DALL-E, which I covered not even a month ago, this model takes text like "a golden retriever dog wearing a blue checkered beret and a red dotted turtleneck" and tries to generate a photorealistic image out of this weird sentence. The main point here is that Imagen can not only understand text, but it can also understand the images it generates, since they are more realistic than all previous approaches. Of course, when I say "understand", I mean its own kind of understanding, which is really different from ours. The model doesn't really understand the text or the image it generates. It definitely has some kind of knowledge about it, but it mainly understands how this particular kind of sentence, with these objects, should be represented using pixels in an image. I'll concede that it sure looks like it understands what we send it when we see those results. Obviously, you can trick it with some really weird sentences that couldn't look realistic, like this one, but it sometimes beats even your own imagination and just creates something amazing.
Still, what's even more amazing is how it works, using something I never discussed on the channel: a diffusion model. But before using this diffusion model, we first need to understand the text input, and this is also the main difference from DALL-E. They used a huge text model, similar to GPT-3, to understand the text as well as an AI system can. So instead of training a text model along with the image generation model, they simply use a big pre-trained model and freeze it so that it doesn't change during the training of the image generation model. From their study, this led to much better results, and it seemed like the model understood text better. So this text module is how the model understands text, and this understanding is represented in what we call encodings, which is what the model has been trained to produce on huge datasets: transferring text inputs into a space of information that it can use and understand.
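To make the "frozen pre-trained text encoder" idea concrete, here is a minimal sketch using Hugging Face transformers. The paper uses a very large T5 encoder (T5-XXL); the small "t5-small" checkpoint below is just a lightweight stand-in for illustration, not the actual Imagen setup.

```python
# Minimal sketch: encode a prompt with a frozen pre-trained text model.
# "t5-small" is a stand-in; Imagen uses the much larger T5-XXL encoder.
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

# Freeze the encoder: its weights never change while the image
# generation model is being trained.
encoder.requires_grad_(False)
encoder.eval()

prompt = ("A golden retriever dog wearing a blue checkered beret "
          "and a red dotted turtleneck")
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # One embedding per token: this sequence of "encodings" is what
    # will condition the image generation model.
    text_encodings = encoder(**tokens).last_hidden_state

print(text_encodings.shape)  # (1, num_tokens, hidden_dim)
```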
Now we need to use this transformed text data to generate the image, and, as I said, they used a diffusion model to achieve that. But what is a diffusion model? Diffusion models are generative models that convert random Gaussian noise, like this, into images by learning how to reverse the noising process iteratively. They are powerful models for super-resolution and other image-to-image translations, and in this case they use a modified U-Net architecture, which I covered numerous times in previous videos, so I won't go into the architectural details here. Basically, the model is trained to denoise an image starting from pure noise, which they orient using the text encodings and a technique called classifier-free guidance, which they say is essential and clearly explain in their paper. I'll let you read it for more information on this technique. So now we have a model able to take random Gaussian noise and our text encodings, and denoise the noise with guidance from those text encodings to generate our image.
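Classifier-free guidance is simple at its core: at each denoising step, the model predicts the noise twice, once with the text and once without, and the two predictions are combined. The sketch below shows that standard combination rule (from Ho and Salimans, which the Imagen paper builds on); `unet` and the encoding arguments are placeholders, not Imagen's actual interface.

```python
# One classifier-free-guidance denoising step. `unet` stands in for a
# text-conditional U-Net; the guidance formula itself is the standard one.
import torch

def guided_noise_prediction(unet, x_t, t, text_encodings, null_encodings,
                            guidance_weight=7.0):
    # Predict the noise conditioned on the text, then on a "null"
    # (empty-prompt) encoding.
    eps_cond = unet(x_t, t, text_encodings)
    eps_uncond = unet(x_t, t, null_encodings)
    # Push the prediction away from the unconditional estimate and toward
    # the text-conditional one; weights > 1 strengthen text alignment.
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

# Toy demo with a stub "unet" that ignores its conditioning:
toy_unet = lambda x, t, c: torch.zeros_like(x)
x_t = torch.randn(1, 3, 64, 64)
print(guided_noise_prediction(toy_unet, x_t, t=10,
                              text_encodings=None, null_encodings=None).shape)
```

Large guidance weights improve text alignment but can hurt image fidelity; the paper introduces dynamic thresholding to manage that trade-off.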
But as you see here, it isn't as simple as it sounds. The image we just generated is a very small image, as a bigger image would require much more computation and a much bigger model, which aren't viable. Instead, we first generate a photorealistic image using the diffusion model we just discussed, and then use other diffusion models to improve the quality of the image iteratively. I already covered super-resolution models in past videos, so I won't go into the details here, but let's do a quick overview. Once again, we want to have noise and not an image, so we cover up this initially generated low-resolution image with, again, some Gaussian noise, and we train our second diffusion model to take this modified image and improve it. Then we repeat these two steps with another model, but this time using just patches of the image instead of the full image, to do the same upscaling ratio and stay computationally viable. And voilà, we end up with our photorealistic, high-resolution image.
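Put together, the pipeline is a three-stage cascade: a base diffusion model at 64x64, then two super-resolution diffusion models reaching 256x256 and 1024x1024 (the resolutions used in the paper). The sketch below shows only the data flow; the three samplers are hypothetical stubs that return correctly shaped tensors, not real trained models.

```python
# High-level sketch of the three-stage cascade. The sample_* functions are
# hypothetical stand-ins for separately trained diffusion models.
import torch
import torch.nn.functional as F

def sample_base_model(noise, text_encodings):
    # Real model: iterative text-guided denoising down to a 64x64 image.
    return noise

def sample_sr_model(low_res, text_encodings, size):
    # Real model: diffusion-based super-resolution. At training time the
    # low-res input is itself corrupted with Gaussian noise, so the model
    # learns to repair artifacts rather than trust its input blindly.
    return F.interpolate(low_res, size=(size, size), mode="bilinear")

def generate(text_encodings):
    img_64 = sample_base_model(torch.randn(1, 3, 64, 64), text_encodings)
    img_256 = sample_sr_model(img_64, text_encodings, 256)    # 64 -> 256
    img_1024 = sample_sr_model(img_256, text_encodings, 1024) # 256 -> 1024
    return img_1024

print(generate(text_encodings=None).shape)  # torch.Size([1, 3, 1024, 1024])
```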
Of course, this was just an overview of this exciting new model with really cool results. I definitely invite you to read their great paper for a deeper understanding of their approach and a detailed analysis of the results. And you, do you think the results are comparable to DALL-E 2? Are they better or worse? I sure think it is DALL-E's main competitor as of now. Let me know what you think of this new Google Brain publication and the explanation. I hope you enjoyed this video, and if you did, please take a second to leave a like and subscribe to stay up to date with exciting AI news. If you are subscribed, I will see you next week with another amazing paper.















