
Deep Learning: Solving Problems With TensorFlow

source link: https://towardsdatascience.com/deep-learning-solving-problems-with-tensorflow-3722b8eeccb1?gi=7ca8fedf12b6

Learn how to Solve Optimization Problems and Train your First Neural Network with the MNIST Dataset!


Jan 24 · 10 min read

[Cover image: www.forbes.com]

Introduction

The goal of this article is to define and solve practical use cases with TensorFlow. To do so, we will solve:

  • An optimization problem
  • A linear regression problem, where we will adjust a regression line to a dataset
  • And we will finish by solving the “Hello World” of Deep Learning classification projects with the MNIST dataset.

Optimization Problem

Netflix has decided to place one of its famous posters on a building. The marketing team has decided that the advertising poster has to cover an area of 600 square meters, with a margin of 2 meters above and below and 4 meters to the left and right.

However, they have not been told the dimensions of the building’s facade. We could email the owner and ask, but since we know some mathematics we can solve it easily. How can we find out the dimensions of the building?


The total area of the building is:

Width = 4 + x + 4 = x + 8

Height = 2 + y + 2 = y + 4

Area = Width × Height = (x + 8)·(y + 4)

And there is the constraint: x·y = 600

This allows us to express the area as a function of y alone:

xy = 600 → x = 600/y

S(y) = (600/y + 8)(y + 4) = 600 + 8y + 4·600/y + 32 = 632 + 8y + 2400/y

In an optimization problem, the slope of the function (its derivative) is used to find its minimum. We set the first derivative equal to 0 and then check that the second derivative is positive. So, in this case:

S’(y) = 8 − 2400/y²

S’’(y) = 4800/y³

S’(y) = 0 → 0 = 8 − 2400/y² → 8 = 2400/y² → y² = 2400/8 = 300 → y = sqrt(300) = sqrt(100·3) = sqrt(100)·sqrt(3) = 10·sqrt(3) ≈ 17.32 (we discard the negative root because it has no physical meaning)

Substituting in x:

x = 600 / (10·sqrt(3)) = 60 / sqrt(3) = 60·sqrt(3) / (sqrt(3)·sqrt(3)) = 60·sqrt(3) / 3 = 20·sqrt(3) ≈ 34.64

Since for y = 17.32 we get S’’(y) = 0.9238 > 0, we have indeed found a minimum.

Therefore, the dimensions of the building are:

Width: x + 8 = 42.64 m

Height: y + 4 = 21.32 m

Have you seen how useful derivatives are? We just solved this problem analytically. We were able to do so because it was a simple problem, but for many problems solving them analytically is computationally very expensive, so we use numerical methods instead. One of these methods is Gradient Descent.
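Before turning to TensorFlow, here is what gradient descent boils down to, as a minimal sketch in plain Python (the starting point y = 5 and the 500 iterations are arbitrary choices for illustration; the learning rate 0.05 matches the TensorFlow code below):

def dS_dy(y):
    # Analytical derivative: S'(y) = 8 - 2400 / y^2
    return 8.0 - 2400.0 / y ** 2

y = 5.0                 # arbitrary starting point
learning_rate = 0.05    # same learning rate as the TensorFlow code below
for step in range(500):
    y -= learning_rate * dS_dy(y)   # move against the slope

print(y)                # converges towards 10*sqrt(3) ≈ 17.32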

How about solving this problem numerically with TensorFlow this time? Let’s go!

import numpy as np
import tensorflow as tf

x = tf.Variable(initial_value=tf.random_uniform([1], 34, 35), name='x')
y = tf.Variable(initial_value=tf.random_uniform([1], 0., 50.), name='y')

# Loss function: S(y) = 632 + 8y + 2400/y
s = tf.add(tf.add(632.0, tf.multiply(8.0, y)), tf.divide(2400.0, y), 's')

# Gradient descent optimizer with a learning rate of 0.05
opt = tf.train.GradientDescentOptimizer(0.05)
train = opt.minimize(s)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

old_solution = 0
tolerance = 1e-4
for step in range(500):
    sess.run(train)
    solution = sess.run(y)
    if np.abs(solution - old_solution) < tolerance:
        print("The solution is y = {}".format(old_solution))
        break

    old_solution = solution
    if step % 10 == 0:
        print(step, "y = " + str(old_solution), "s = " + str(sess.run(s)))


We have managed to calculate y using the gradient descent algorithm. Of course, we now need to calculate x by substituting into x = 600/y.

x = 600/old_solution[0]
print(x)

Which matches our results, so it seems to work! Let’s plot the results:

import matplotlib.pyplot as plt

y = np.linspace(0, 400., 500)  # note: y = 0 gives a divide-by-zero warning and an inf in s
s = 632.0 + 8*y + 2400/y
plt.plot(y, s)


print("The function minimum is in {}".format(np.min(s)))
min_s = np.min(s)
s_min_idx = np.nonzero(s==min_s)
y_min = y[s_min_idx]
print("The y value that reaches the minimum is {}".format(y_min[0]))

Let’s See Another Example

In this case, we want to find the minimum of the function y = (log x)², which is reached at x = 1, where log x = 0.

x = tf.Variable(15, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)
init = tf.initialize_all_variables()

def optimize():
  with tf.Session() as session:
    session.run(init)
    print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
    for step in range(100):
      session.run(train)
      print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

optimize()


Let’s plot it!

x_values = np.linspace(0, 10, 100)  # note: x = 0 gives log(0) = -inf and a warning
fx = np.log(x_values)**2
plt.plot(x_values, fx)

print("The function minimum is in {}".format(np.min(fx)))
min_fx = np.min(fx)
fx_min_idx = np.nonzero(fx == min_fx)
x_min_value = x_values[fx_min_idx]
print("The x value that reaches the minimum is {}".format(x_min_value[0]))


Let’s Solve a Linear Regression Problem

Let’s see how to fit a straight line to a dataset that represents the intelligence of every character in The Simpsons, from Ralph Wiggum to Professor Frink.


Let’s plot the distribution of intelligence against age, normalized from 0 to 1, where Maggie is the youngest and Montgomery Burns the oldest:

n_observations = 50
_, ax = plt.subplots(1, 1)
xs = np.linspace(0., 1., n_observations)
ys = 100 * np.sin(xs) + np.random.uniform(0., 50., n_observations)
ax.scatter(xs, ys)
plt.draw()


Now we need two tf.placeholders, one for the input and another for the output of our regression algorithm. Placeholders are variables that do not need to be assigned a value until the graph is executed.

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
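As a quick illustration of how a placeholder only receives its value at execution time, here is a minimal, self-contained sketch (the tensors a and b and the value 3.0 are hypothetical, not part of the regression below):

# A placeholder has no value until we feed one at session.run time
a = tf.placeholder(tf.float32)
b = tf.multiply(a, 2.0)

with tf.Session() as demo_sess:
    print(demo_sess.run(b, feed_dict={a: 3.0}))  # prints 6.0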

Let’s now fit a straight regression line. We need two variables: the weights (W) and the bias (b). Elements of type tf.Variable need an initial value, and their type cannot be changed after being declared. What we can change is their value, using the assign method (see the sketch after the code below).

W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
Y_pred = tf.add(tf.multiply(X, W), b)
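As a side note, here is a minimal sketch of updating a tf.Variable in place with assign (the variable v and the value 3.0 are purely illustrative):

v = tf.Variable(1.0)
update_v = v.assign(3.0)            # an op that overwrites the variable's value

with tf.Session() as demo_sess:
    demo_sess.run(tf.global_variables_initializer())
    print(demo_sess.run(v))         # 1.0
    demo_sess.run(update_v)
    print(demo_sess.run(v))         # 3.0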

Let’s now define the cost function as the mean squared difference between our predictions and the real values.

loss = tf.reduce_mean(tf.pow(Y_pred - Y, 2))

We’ll now define the optimization method; we will use gradient descent. Basically, it computes how the total error changes with respect to each weight (the gradient) and updates each weight so that the total error decreases in subsequent iterations, as sketched below. The learning rate indicates how abruptly the weights are updated.
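Conceptually, every step of the optimizer applies the gradient-descent update rule. Below is a purely illustrative NumPy sketch of a single update for the line Y_pred = W·x + b under the mean squared error loss; W_value, b_value and eta are hypothetical stand-ins, not part of the TensorFlow graph, and xs, ys are the arrays generated above:

eta = 0.01                              # learning rate
W_value, b_value = 0.0, 0.0             # hypothetical current parameters
y_hat = W_value * xs + b_value          # current predictions
dL_dW = np.mean(2 * (y_hat - ys) * xs)  # dLoss/dW for the mean squared error
dL_db = np.mean(2 * (y_hat - ys))       # dLoss/db
W_value -= eta * dL_dW                  # move against the gradient
b_value -= eta * dL_db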

learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Define the number of iterations and run the graph on the GPU
n_epochs = 1000
with tf.Session() as sess:
    with tf.device("/GPU:0"):
        # We initialize all the defined variables
        sess.run(tf.global_variables_initializer())
        # Start fitting
        prev_training_loss = 0.0
        for epoch_i in range(n_epochs):
            for (x, y) in zip(xs, ys):
                sess.run(optimizer, feed_dict={X: x, Y: y})
            W_, b_, training_loss = sess.run([W, b, loss], feed_dict={X: xs, Y: ys})
            # We print the loss every 20 epochs
            if epoch_i % 20 == 0:
                print(training_loss)
            # Stopping condition
            if np.abs(prev_training_loss - training_loss) < 0.000001:
                print(W_, b_)
                break
            prev_training_loss = training_loss
        # Plot of the result
        plt.scatter(xs, ys)
        plt.plot(xs, Y_pred.eval(feed_dict={X: xs}, session=sess))


And there we have it! With this regression line we can predict the intelligence of any Simpsons character from their age.

MNIST Dataset

Let’s now see how to classify images of digits using logistic regression. We will use the “Hello World” of Deep Learning datasets.


Let’s import the relevant libraries and the dataset MNIST:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

We load the dataset encoding the labels with one-hot encoding (it converts each label into a vector of length = N_CLASSES, with all 0s except for the index of the class to which the image belongs, which contains a 1). For example, if we have 10 classes (the digits 0 to 9) and the label is the number 5: label = [0 0 0 0 0 1 0 0 0 0].
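As a quick illustration (plain NumPy, with a hypothetical label), this is all one-hot encoding does:

label = 5
n_classes = 10
one_hot = np.zeros(n_classes)
one_hot[label] = 1
print(one_hot)  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]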

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

print("Train examples: {}".format(mnist.train.num_examples))
print("Test examples: {}".format(mnist.test.num_examples))
print("Validation examples: {}".format(mnist.validation.num_examples))
# Images are stored in a 2D tensor: images_number x image_pixels_vector
# Labels are stored in a 2D tensor: images_number x classes_number (one-hot)

print("Images Size train: {}".format(mnist.train.images.shape))
print("Images Size train: {}".format(mnist.train.labels.shape))
# To see the range of the images values
print("Min value: {}".format(np.min(mnist.train.images)))
print("Max value: {}".format(np.max(mnist.train.images)))
# To see some images we will access a vector of the dataset and reshape it to 28x28
plt.subplot(131)
plt.imshow(np.reshape(mnist.train.images[0, :], (28, 28)), cmap='gray')
plt.subplot(132)
plt.imshow(np.reshape(mnist.train.images[27500, :], (28, 28)), cmap='gray')
plt.subplot(133)
plt.imshow(np.reshape(mnist.train.images[54999, :], (28, 28)), cmap='gray')

We have already seen a little of what the MNIST dataset consists of. Now, let’s create our regressor:

First, we create the placeholder for our input data. In this case, the input is going to be a set of vectors of size 784 (we pass several images at once to our regressor; this way, the gradient is computed over several images, so the estimate is more precise than if we used only one).

n_input = 784  # Number of data features: number of pixels of the image
n_output = 10 # Number of classes: from 0 to 9
net_input = tf.placeholder(tf.float32, [None, n_input]) # We create the placeholder

Let’s define now the regression equation: y = W*x + b

W = tf.Variable(tf.zeros([n_input, n_output]))
b = tf.Variable(tf.zeros([n_output]))

As the output is multiclass, we need a function that returns the probability of an image belonging to each of the possible classes. For example, if we feed in an image of a 5, a possible output would be [0.05 0.05 0.05 0.05 0.05 0.55 0.05 0.05 0.05 0.05], whose probabilities sum to 1 and whose highest probability corresponds to class 5.

We apply the softmax function to normalize the output probabilities:

net_output = tf.nn.softmax(tf.matmul(net_input, W) + b)

SoftMax Function

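For reference, softmax maps a vector of scores z to probabilities: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal NumPy sketch with purely illustrative scores:

z = np.array([1.0, 2.0, 0.5])           # hypothetical scores for 3 classes
probs = np.exp(z) / np.sum(np.exp(z))   # softmax
print(probs, probs.sum())               # probabilities that sum to 1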
And finally, we define our loss function: the cross entropy.

# We also need a placeholder for the image label, with which we will compare our prediction
y_true = tf.placeholder(tf.float32, [None, n_output])
# We check if our prediction matches the label
cross_entropy = -tf.reduce_sum(y_true * tf.log(net_output))
idx_prediction = tf.argmax(net_output, 1)
idx_label = tf.argmax(y_true, 1)
correct_prediction = tf.equal(idx_prediction, idx_label)
# We define our measure of accuracy as the number of hits in relation to the number of predicted samples
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# We now minimize our loss function (the cross entropy) using the gradient descent algorithm with a learning rate of 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Everything is now set up! Let’s execute the graph:

from IPython.display import clear_output

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Let's train the regressor
    batch_size = 10
    for sample_i in range(mnist.train.num_examples):
        sample_x, sample_y = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={net_input: sample_x,
                                       y_true: sample_y})
        # Let's check how the regressor is performing
        if sample_i < 50 or sample_i % 200 == 0:
            val_acc = sess.run(accuracy, feed_dict={net_input: mnist.validation.images, y_true: mnist.validation.labels})
            print("({}/{}) Acc: {}".format(sample_i, mnist.train.num_examples, val_acc))
    # Let's show the final accuracy
    print('Test accuracy: ', sess.run(accuracy, feed_dict={net_input: mnist.test.images, y_true: mnist.test.labels}))


We have just trained our first NEURAL NETWORK with TensorFlow!

Think a little bit about what we just did.

We have implemented a logistic regression with the formula y = G(Wx + b), where G = softmax() instead of the typical G = sigmoid().

If you look at the following diagram, which defines the perceptron (a single-layer neural network), you can see that output = Activation_function(Wx). You see? Only the bias seems to be missing! But notice that the first input is a constant 1, so the weight w0 is not multiplied by any feature. Exactly: the weight w0 is the bias, written this way simply so that everything can be implemented as a single matrix multiplication.

[Diagram of a perceptron: inputs x0 = 1, x1, …, xn with weights w0, …, wn, summed and passed through an activation function]
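In other words, prepending a constant 1 to the input folds the bias into the weight vector; a tiny illustrative check with hypothetical numbers:

x_vec = np.array([1.0, 2.0, 3.0])             # input features
W_vec = np.array([0.4, 0.1, 0.2])             # weights w1..w3
b_scalar = 0.5                                # bias

x_aug = np.concatenate(([1.0], x_vec))        # prepend the constant 1
W_aug = np.concatenate(([b_scalar], W_vec))   # w0 = bias

print(np.dot(W_vec, x_vec) + b_scalar)        # 1.7
print(np.dot(W_aug, x_aug))                   # 1.7, same result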

So, what we have just implemented is a perceptron, with

  • batch_size = 10
  • 1 epoch
  • gradient descent as the optimizer
  • and softmax as activation function.

Final Words

As always, I hope you enjoyed the post, that you have learned how to use TensorFlow to solve linear problems, and that you have successfully trained your first Neural Network!

If you liked this post then you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts!

