Fizz Buzz in Tensorflow

Joel Grus
source link: https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

interviewer: Welcome, can I get you coffee or anything? Do you need a break?

me: No, I've probably had too much coffee already!

interviewer: Great, great. And are you OK with writing code on the whiteboard?

me: It's the only way I code!

interviewer: ...

me: That was a joke.

interviewer: OK, so are you familiar with "fizz buzz"?

me: ...

interviewer: Is that a yes or a no?

me: It's more of a "I can't believe you're asking me that."

interviewer: OK, so I need you to print the numbers from 1 to 100, except that if the number is divisible by 3 print "fizz", if it's divisible by 5 print "buzz", and if it's divisible by 15 print "fizzbuzz".

me: I'm familiar with it.

interviewer: Great, we find that candidates who can't get this right don't do well here.

me: ...

interviewer: Here's a marker and an eraser.

me: [thinks for a couple of minutes]

interviewer: Do you need help getting started?

me: No, no, I'm good. So let's start with some standard imports:

import numpy as np
import tensorflow as tf

interviewer: Um, you understand the problem is fizzbuzz, right?

me: Do I ever. So, now let's talk models. I'm thinking a simple multi-layer-perceptron with one hidden layer.

interviewer: Perceptron?

me: Or neural network, whatever you want to call it. We want the input to be a number, and the output to be the correct "fizzbuzz" representation of that number. In particular, we need to turn each input into a vector of "activations". One simple way would be to convert it to binary.

interviewer: Binary?

me: Yeah, you know, 0's and 1's? Something like:

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])
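
The bits come out least-significant first, by the way. If you were to check this off the whiteboard, you'd expect something like:

print(binary_encode(11, 10))   # 11 is 1011 in binary, LSB first: [1 1 0 1 0 0 0 0 0 0]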

interviewer: [stares at whiteboard for a minute]

me: And our output will be a one-hot encoding of the fizzbuzz representation of the number, where the first position indicates "print as-is", the second indicates "fizz", and so on:

def fizz_buzz_encode(i):
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else:             return np.array([1, 0, 0, 0])
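
(Again off the whiteboard, a quick sanity check of the encoding would look something like:)

print(fizz_buzz_encode(10))   # [0 0 1 0] -> "buzz"
print(fizz_buzz_encode(30))   # [0 0 0 1] -> "fizzbuzz"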

interviewer: OK, that's probably enough.

me: That's enough setup, you're exactly right. Now we need to generate some training data. It would be cheating to use the numbers 1 to 100 in our training data, so let's train it on all the remaining numbers up to 1024:

NUM_DIGITS = 10
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([fizz_buzz_encode(i)          for i in range(101, 2 ** NUM_DIGITS)])
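
For what it's worth, that works out to 923 training examples (the numbers 101 through 1023), so a quick shape check would show:

print(trX.shape, trY.shape)   # (923, 10) (923, 4)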

interviewer: ...

me: Now we need to set up our model in tensorflow. Off the top of my head I'm not sure how many hidden units to use, maybe 10?

interviewer: ...

me: Yeah, possibly 100 is better. We can always change it later.

NUM_HIDDEN = 100

We'll need an input variable with width NUM_DIGITS, and an output variable with width 4:

X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

interviewer: How far are you intending to take this?

me: Oh, just two layers deep -- one hidden layer and one output layer. Let's use randomly-initialized weights for our neurons:

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

And we're ready to define the model. As I said before, one hidden layer, and let's use, I don't know, ReLU activation:

def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)

We can use softmax cross-entropy as our cost function and try to minimize it:

py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

interviewer: ...

me: And, of course, the prediction will just be the largest output:

predict_op = tf.argmax(py_x, 1)

interviewer: Before you get too far astray, the problem you're supposed to be solving is to generate fizz buzz for the numbers from 1 to 100.

me: Oh, great point, the predict_op function will output a number from 0 to 3, but we want a "fizz buzz" output:

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

interviewer: ...

me: So now we're ready to train the model. Let's grab a tensorflow session and initialize the variables:

with tf.Session() as sess:
    tf.initialize_all_variables().run()

Now let's run, say, 1000 epochs of training?

interviewer: ...

me: Yeah, maybe that's not enough -- so let's do 10000 just to be safe.

And our training data are sequential, which I don't like, so let's shuffle them each iteration:

    for epoch in range(10000):
        p = np.random.permutation(range(len(trX)))
        trX, trY = trX[p], trY[p]

And each epoch we'll train in batches of, I don't know, 128 inputs?

BATCH_SIZE = 128

So each training pass looks like

        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

and then we can print the accuracy on the training data, since why not?

        print(epoch, np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY})))

interviewer: Are you serious?

me: Yeah, I find it helpful to see how the training accuracy evolves.

interviewer: ...

me: So, once the model has been trained, it's fizz buzz time. Our input should just be the binary encoding of the numbers 1 to 100:

    numbers = np.arange(1, 101)
    teX = np.transpose(binary_encode(numbers, NUM_DIGITS))

And then our output is just our fizz_buzz function applied to the model output:

    teY = sess.run(predict_op, feed_dict={X: teX})
    output = np.vectorize(fizz_buzz)(numbers, teY)

    print(output)

interviewer: ...

me: And that should be your fizz buzz!

interviewer: Really, that's enough. We'll be in touch.

me: In touch, that sounds promising.

interviewer: ...

Postscript

I didn't get the job. So I tried actually running this (code on GitHub), and it turned out it got some of the outputs wrong! Thanks a lot, machine learning!

In [185]: output
Out[185]:
array(['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz',
       '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19',
       'buzz', '21', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28', '29',
       'fizzbuzz', '31', 'fizz', 'fizz', '34', 'buzz', 'fizz', '37', '38',
       'fizz', 'buzz', '41', '42', '43', '44', 'fizzbuzz', '46', '47',
       'fizz', '49', 'buzz', 'fizz', '52', 'fizz', 'fizz', 'buzz', '56',
       'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz',
       'fizz', '67', '68', '69', 'buzz', '71', 'fizz', '73', '74',
       'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', '81', '82', '83',
       '84', 'buzz', '86', '87', '88', '89', 'fizzbuzz', '91', '92', '93',
       '94', 'buzz', 'fizz', '97', '98', 'fizz', 'fizz'],
      dtype='<U8')

I guess maybe I should have used a deeper network.

