

Generate Novel Artistic Artworks with Deep Learning
source link: https://towardsdatascience.com/generate-novel-artistic-artworks-with-deep-learning-f2f61da69e6e?gi=7c62ae386246

1. Problem Statement
In this article, I will walk through using deep learning to compose images in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as neural style transfer! The technique is outlined in Leon A. Gatys’ paper, A Neural Algorithm of Artistic Style, which is a great read and definitely worth checking out.
But, what is neural style transfer?
Neural style transfer is an optimization technique used to take three images, a content image, a style reference image (such as an artwork by a famous painter), and the input image you want to style — and blend them together such that the input image is transformed to look like the content image, but “painted” in the style of the style image, bridging the orbits of deep learning and art!
For example, let’s take an image of this turtle and Katsushika Hokusai’s The Great Wave off Kanagawa :
Now, how would it look if Hokusai decided to paint the picture of this turtle exclusively with this style? Something like this?
Is this magic or just deep learning? Fortunately, this doesn’t involve any witchcraft: style transfer is a fun and interesting technique that showcases the capabilities and internal representations of neural networks.
The principle of neural style transfer is to define two distance functions: one that describes how different the content of two images is, J_content, and one that describes the difference between two images in terms of their style, J_style. Then, given three images (a desired style image, a desired content image, and the input image, initialized with the content image), we transform the input image to minimize its content distance to the content image and its style distance to the style image. In summary, we’ll take the base input image, a content image that we want to match, and the style image that we want to match, and transform the base input image by minimizing the content and style distances (losses) with backpropagation, creating an image that matches the content of the content image and the style of the style image.
In this article, we will be generating an image of the Louvre museum in Paris (content image C), mixed with a painting by Claude Monet, a leader of the impressionist movement (style image S).
2. Transfer Learning
Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning .
Following the original NST paper , I will be using the VGG network. Specifically, VGG-19 , a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low-level features (at the shallower layers) and high-level features (at the deeper layers).
The following code loads parameters from the VGG model (refer to the Github repo for more information):
```python
pp = pprint.PrettyPrinter(indent=4)
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
pp.pprint(model)
```
The model is stored in a Python dictionary containing a key-value pair for each layer, where the ‘key’ is the layer’s variable name and the ‘value’ is the tensor for that layer.
3. Neural Style Transfer (NST)
We will build the Neural Style Transfer (NST) algorithm in three steps:
- Build the content cost function J_content (C, G).
- Build the style cost function J_style (S, G).
- Put it together to obtain J(G) = α * J_content (C, G) + β * J_style (S, G).
3.1 Computing content cost
In our running example, the content image C will be the picture of the Louvre Museum in Paris (scaled to 400 × 300):
```python
content_image = scipy.misc.imread("images/louvre.jpg")
imshow(content_image)
```
The content image (C) shows the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.
3.1.1 Match content of generated image G with image C
As aforementioned, the shallower layers of a ConvNet tend to detect lower-level features such as edges and simple textures; the deeper layers tend to detect higher-level features such as more complex textures as well as object classes.
We would like the generated image G to have similar content as the input image C . Suppose you have chosen some layer’s activations to represent the content of an image. In practice, you’ll get the most visually pleasing results if you choose a layer in the middle of the network — neither too shallow nor too deep.
Note: After you have finished this article’s example, feel free to experiment with different layers, to see how the results vary.
First, we will set the image C as the input to the pre-trained VGG network, and run forward propagation. Let aᶜ be the hidden layer activations in the layer you have chosen. This will be an nH × nW × nC tensor.
Repeat this process with the image G: set G as the input, and run forward propagation. Let aᴳ be the corresponding hidden layer activation.
We will then define the content cost function as:
J_content(C, G) = (1 / (4 × nH × nW × nC)) × Σ (aᶜ − aᴳ)², where the sum runs over all entries of the activation volumes.
Here, nH, nW, and nC are respectively the height, width, and number of channels of the hidden layer you have chosen; they appear in the normalization term of the cost.
For clarity, note that aᶜ and aᴳ are the 3D volumes corresponding to a hidden layer’s activations. In order to compute the cost J_content (C, G), it might also be convenient to unroll these 3D volumes into a 2D matrix, as shown below.
Technically, this unrolling step isn’t needed to compute J_content, but it is good practice for when you do need to carry out a similar operation later for computing the style cost J_style.
Implementing `compute_content_cost()`
The `compute_content_cost()` function computes the content cost using TensorFlow. The 3 steps to implement this function are (see the sketch below):
- Retrieve dimensions from `a_G`.
- Unroll `a_C` and `a_G` as explained in the picture above.
- Compute the content cost.
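Here is a minimal TensorFlow 1.x sketch of these three steps (the exact implementation is in the Github repo):

```python
import tensorflow as tf

def compute_content_cost(a_C, a_G):
    """Computes the content cost J_content(C, G) from two activation tensors."""
    # Step 1: retrieve dimensions from a_G (a (1, n_H, n_W, n_C) tensor)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Step 2: unroll the 3D activation volumes into (n_H * n_W, n_C) matrices
    a_C_unrolled = tf.reshape(a_C, [n_H * n_W, n_C])
    a_G_unrolled = tf.reshape(a_G, [n_H * n_W, n_C])

    # Step 3: sum of squared differences, scaled by the normalization term
    J_content = tf.reduce_sum(tf.square(a_C_unrolled - a_G_unrolled)) / (4 * n_H * n_W * n_C)
    return J_content
```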
In summary, the content cost takes a hidden layer activation of the neural network, and measures how different aᶜ and aᴳ are. When we minimize the content cost later, this will help make sure G has similar content to C.
3.2 Computing style cost
For our running example, we will use the following style image:
By Claude Monet, a leader of the impressionist movement; painted in the style of impressionism.
3.2.1 Style matrix
The style matrix is also called a Gram matrix. In linear algebra, the Gram matrix G of a set of vectors (v₁, …, vₙ) is the matrix of dot products, whose entries are Gᵢⱼ = vᵢᵀ vⱼ = np.dot(vᵢ, vⱼ).
In other words, Gᵢⱼ compares how similar vᵢ is to vⱼ: if they are highly similar, you would expect them to have a large dot product, and thus for Gᵢⱼ to be large.
Note that there is an unfortunate collision in the variable names used here. We are following the common terminology used in the literature: G is used to denote the Style matrix (or Gram matrix), but G also denotes the generated image. For this example, we will use G_gram to refer to the Gram matrix, and G to denote the generated image.
In Neural Style Transfer (NST), you can compute the Style matrix by multiplying the “unrolled” filter matrix A, of shape (nC, nH × nW), with its transpose: G_gram = A × Aᵀ.
G_gram measures the correlation between two filters:
The result is a matrix of dimension (nC, nC), where nC is the number of filters (channels). The value G_gram(i, j) measures how similar the activations of filter i are to the activations of filter j.
G_gram also measures the prevalence of patterns or textures:
The diagonal elements G_gram(i, i) measure how “active” filter i is. For example, suppose filter i is detecting vertical textures in the image. Then G_gram(i, i) measures how common vertical textures are in the image as a whole: if G_gram(i, i) is large, the image has a lot of vertical texture.
Implementing `gram_matrix()`
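Under the convention that the input A is the unrolled filter matrix of shape (nC, nH × nW), a minimal sketch is:

```python
def gram_matrix(A):
    """A: unrolled filter matrix of shape (n_C, n_H * n_W).
    Returns the Gram matrix G_gram = A Aᵀ, of shape (n_C, n_C)."""
    GA = tf.matmul(A, tf.transpose(A))
    return GA
```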
3.2.2 Style cost
The goal will be to minimize the distance between the Gram matrix of the style image S and the Gram matrix of the generated image G.
For now, we are using only a single hidden layer aˡ. The corresponding style cost for this layer is defined as:
J_styleˡ(S, G) = (1 / (4 × nC² × (nH × nW)²)) × Σᵢ Σⱼ (G_gram(S)ᵢⱼ − G_gram(G)ᵢⱼ)²
where G_gram(S) and G_gram(G) are the Gram matrices of the style image’s and the generated image’s activations at that layer.
Implementing `compute_layer_style_cost()`
The steps to implement this function are (see the sketch after this list):
- Retrieve dimensions from the hidden layer activations `a_G`.
- Unroll the hidden layer activations `a_S` and `a_G` into 2D matrices, as explained in the figure above.
- Compute the Style matrix of the images S and G with the `gram_matrix` function we previously wrote.
- Compute the Style cost.
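A minimal sketch following these steps, reusing `gram_matrix()` from above:

```python
def compute_layer_style_cost(a_S, a_G):
    """Computes the style cost for a single layer from two activation tensors."""
    # Retrieve dimensions from a_G (a (1, n_H, n_W, n_C) tensor)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll the activations into (n_C, n_H * n_W) matrices
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Gram matrices of the style image S and the generated image G
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Normalized sum of squared differences between the two Gram matrices
    J_style_layer = tf.reduce_sum(tf.square(GS - GG)) / (4 * n_C**2 * (n_H * n_W)**2)
    return J_style_layer
```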
3.2.3 Style Weights
So far, we have captured the style from only one layer. We would get better results if we “merge” style costs from several different layers. Each layer will be given weights ( λˡ ) that reflect how much each layer will contribute to the style. By default, we’ll give each layer equal weight, and the weights add up to 1. After completing this example, feel free to experiment with different weights to see how it changes the generated image G .
You can combine the style costs for different layers as follows:
J_style(S, G) = Σₗ λˡ × J_styleˡ(S, G)
where the values for λˡ are given in `STYLE_LAYERS`.
```python
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]
```
Implementing `compute_style_cost()`
This function calls the `compute_layer_style_cost(...)` function several times, and weighs their results using the values in `STYLE_LAYERS`.
Description of `compute_style_cost`
For each layer:
- Select the activation (the output tensor) of the current layer.
- Get the style of the style image S from the current layer.
- Get the style of the generated image G from the current layer.
- Compute the style cost for the current layer.
- Add the weighted style cost to the overall style cost (J_style).
Once done with the loop:
- Return the overall style cost.
Note: In the inner loop of the for-loop above, `a_G` is a tensor and hasn't been evaluated yet. It will be evaluated and updated at each iteration when we run the TensorFlow graph in `model_nn()` below.
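A minimal sketch of this loop; it assumes the style image has already been assigned as the model’s input (done in section 4 below) and that `sess` is the active interactive session:

```python
def compute_style_cost(model, STYLE_LAYERS):
    """Computes the overall style cost from several chosen layers."""
    J_style = 0
    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the current layer
        out = model[layer_name]
        # a_S: activations of the style image S at this layer, evaluated now
        a_S = sess.run(out)
        # a_G: same-layer tensor for the generated image G; evaluated later
        a_G = out
        # Add the weighted style cost for this layer
        J_style += coeff * compute_layer_style_cost(a_S, a_G)
    return J_style
```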
In summary, the style of an image can be represented using the Gram matrix of a hidden layer’s activations. We get even better results by combining this representation from multiple different layers. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient. In addition, minimizing the style cost will cause the image G to follow the style of the image S .
3.3 Defining the total cost to optimize
Finally, let’s create a cost function that minimizes both the style and the content cost. The formula is:
J(G) = α × J_content(C, G) + β × J_style(S, G)
Implementing `total_cost()`
The total cost is a linear combination of the content cost J_content (C, G) and the style cost J_style (S, G).
α and β are hyperparameters that control the relative weighting between content and style.
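Following the formula above, a minimal sketch of `total_cost()` is just a weighted sum:

```python
def total_cost(J_content, J_style, alpha=10, beta=40):
    """Total cost J(G) = alpha * J_content(C, G) + beta * J_style(S, G)."""
    J = alpha * J_content + beta * J_style
    return J
```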
4. Solving the optimization problem
Finally, let’s put everything together to implement Neural Style Transfer!
Here’s what the program will have to do:
1. Create an Interactive Session.
2. Load the content image.
3. Load the style image.
4. Randomly initialize the image to be generated.
5. Load the VGG19 model.
6. Build the TensorFlow graph:
   - Run the content image through the VGG19 model and compute the content cost.
   - Run the style image through the VGG19 model and compute the style cost.
   - Compute the total cost.
   - Define the optimizer and the learning rate.
7. Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.
Let's go through the individual steps in detail.
Interactive Sessions
We’ve previously implemented the overall cost J(G) . We’ll now set up TensorFlow to optimize this with respect to G .
To do so, our program has to reset the graph and use an “Interactive Session”. Unlike a regular session, the “Interactive Session” installs itself as the default session to build a graph. This allows us to run variables without constantly needing to refer to the session object (calling `sess.run()`), which simplifies the code.
```python
# Reset the graph
tf.reset_default_graph()

# Start interactive session
sess = tf.InteractiveSession()
```
Content image
Let’s load, reshape, and normalize our content image (the Louvre museum picture):
```python
content_image = scipy.misc.imread("images/w_hotel.jpg")
content_image = reshape_and_normalize_image(content_image)
```
Style image
Let’s load, reshape and normalize our style image (Claude Monet’s painting):
```python
style_image = scipy.misc.imread("images/starry_night.jpg")
style_image = reshape_and_normalize_image(style_image)
```
Generated image correlated with content image
Now, we initialize the generated image as a noisy image created from the `content_image`. The generated image is thus slightly correlated with the content image: by initializing its pixels to be mostly noise but slightly correlated with the content image, we help the content of the generated image match the content of the content image more rapidly. Feel free to look at `nst_utils.py` in the Github repo to see the details of `generate_noise_image(...)`.
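For reference, a minimal sketch of what such an initialization can look like (the noise range and `noise_ratio` default here are assumptions; see `nst_utils.py` for the actual helper):

```python
import numpy as np

def generate_noise_image(content_image, noise_ratio=0.6):
    """Generates a noisy image slightly correlated with the content image."""
    # Random noise with the same shape as the (reshaped) content image
    noise_image = np.random.uniform(-20., 20., content_image.shape).astype('float32')
    # Mostly noise, but slightly correlated with the content image
    return noise_image * noise_ratio + content_image * (1. - noise_ratio)
```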
```python
generated_image = generate_noise_image(content_image)
imshow(generated_image[0])
```
Load pre-trained VGG19 model
Next, as explained before, we shall load the VGG19 model.
```python
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
```
Content Cost
To get the program to compute the content cost, we will now assign `a_C` and `a_G` to be the appropriate hidden layer activations. We will use layer `conv4_2` to compute the content cost. The code below does the following:
- Assign the content image to be the input to the VGG model.
- Set `a_C` to be the tensor giving the hidden layer activation for layer `conv4_2`.
- Set `a_G` to be the tensor giving the hidden layer activation for the same layer.
- Compute the content cost using `a_C` and `a_G`.
Note: At this point, `a_G` is a tensor and hasn’t been evaluated. It will be evaluated and updated at each iteration when we run the TensorFlow graph in `model_nn()` below.
```python
# Assign the content image to be the input of the VGG model
sess.run(model['input'].assign(content_image))

# Select the output tensor of layer conv4_2
out = model['conv4_2']

# Set a_C to be the hidden layer activation from the layer we have selected
a_C = sess.run(out)

# Set a_G to be the hidden layer activation from the same layer. Here, a_G references
# model['conv4_2'] and isn't evaluated yet. Later in the code, we'll assign the image G
# as the model input, so that when we run the session, this will be the activations
# drawn from the appropriate layer, with G as input.
a_G = out

# Compute the content cost
J_content = compute_content_cost(a_C, a_G)
```
Style cost
```python
# Assign the input of the model to be the "style" image
sess.run(model['input'].assign(style_image))

# Compute the style cost
J_style = compute_style_cost(model, STYLE_LAYERS)
```
Total cost
Now that we have the content cost (J_content) and style cost (J_style), compute the total cost J by calling `total_cost()`.
```python
J = total_cost(J_content, J_style, alpha=10, beta=40)
```
Optimizer
Here, I used the Adam optimizer to minimize the total cost `J`.
```python
# Define the optimizer
optimizer = tf.train.AdamOptimizer(2.0)

# Define train_step
train_step = optimizer.minimize(J)
```
Implementing `model_nn()`
The function initializes the variables of the TensorFlow graph, assigns the input image (the initial generated image) as the input of the VGG19 model, and runs the `train_step` tensor (created in the code above) for a large number of steps.
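A minimal sketch of `model_nn()`, assuming `model`, `train_step`, `J`, and the `save_image` helper from `nst_utils.py` are available in scope:

```python
def model_nn(sess, input_image, num_iterations=200):
    # Initialize the graph's variables
    sess.run(tf.global_variables_initializer())

    # Assign the noisy input image to be the model's input
    sess.run(model['input'].assign(input_image))

    for i in range(num_iterations):
        # One optimizer step on the total cost J; this updates model['input'], i.e. G
        sess.run(train_step)
        # Retrieve the current generated image
        generated_image = sess.run(model['input'])
        if i % 20 == 0:
            print('Iteration %d: total cost J = %g' % (i, sess.run(J)))
            save_image('output/' + str(i) + '.png', generated_image)

    save_image('output/generated_image.jpg', generated_image)
    return generated_image
```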
Run the following code snippet to generate an artistic image. Every 20 iterations take about 3 minutes on a CPU, but you start to observe attractive results after roughly 140 iterations. Neural Style Transfer is usually run on GPUs.
```python
model_nn(sess, generated_image)
```
You’re done! After running this, you should see something like the image presented below on the right:
Here are a few other examples:
- The beautiful ruins of the ancient city of Persepolis (Iran) with the style of Van Gogh (The Starry Night)
- The tomb of Cyrus the Great in Pasargadae with the style of a Ceramic Kashi from Ispahan.
- A scientific study of a turbulent fluid with the style of an abstract blue fluid painting.
5. Conclusion
You are now able to use Neural Style Transfer to generate artistic images. Neural Style Transfer is an algorithm that, given a content image C and a style image S, can generate an artistic image.
It uses representations (hidden layer activations) based on a pre-trained ConvNet. The content cost function is computed using one hidden layer’s activations; the style cost function for one layer is computed using the Gram matrix of that layer’s activations. The overall style cost function is obtained using several hidden layers.
Lastly, optimizing the total cost function results in synthesizing new images.
6. Citations & References
Github repo: https://github.com/TheClub4/artwork-neural-style-transfer
Special thanks to deeplearning.ai . Images courtesy of deeplearning.ai .
The Neural Style Transfer algorithm is due to Gatys et al. (2015). Harish Narayanan and Github user “log0” also have highly readable write-ups from which this article drew inspiration. The pre-trained network used in this implementation is a VGG network, due to Simonyan and Zisserman (2015). Pre-trained weights came from the work of the MatConvNet team.
- Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (2015). A Neural Algorithm of Artistic Style.
- Harish Narayanan. Convolutional neural networks for artistic style transfer.
- Log0. TensorFlow Implementation of “A Neural Algorithm of Artistic Style”.
- Karen Simonyan and Andrew Zisserman (2015). Very deep convolutional networks for large-scale image recognition.
- MatConvNet.