
Implementing our first neural network
Great! Now that you've learned the architecture, basics, and scoping mechanism of TensorFlow, it's high time we moved on and implemented something moderately complex. Let's implement a neural network. Specifically, we will implement the fully connected neural network model that we discussed in Chapter 1, Introduction to Natural Language Processing.
One of the stepping stones to learning neural networks is to implement one that can classify digits. For this task, we will use the famous MNIST dataset, available at http://yann.lecun.com/exdb/mnist/. You might feel a bit skeptical about our using a computer vision task rather than an NLP task. However, vision tasks require less preprocessing and are easier to understand.
As this is our first encounter with neural networks, we will walk through the main parts of the example. However, note that I will only cover the crucial bits of the exercise. To run the example end to end, you can find the full exercise in the tensorflow_introduction.ipynb file in the ch2 folder.
Preparing the data
First, we need to download the dataset with the maybe_download(...) function and preprocess it with the read_mnist(...) function. These two functions are defined in the exercise file. The read_mnist(...) function performs two main steps:
- Reading the byte stream of the dataset and forming it into a proper numpy.ndarray object
- Standardizing the images to have zero mean and unit variance (also known as whitening)
The following code shows the read_mnist(...) function. It takes the name of the file containing images and the name of the file containing labels as input, and produces two NumPy matrices containing all the images and their corresponding labels:
import gzip
import struct
import numpy as np

def read_mnist(fname_img, fname_lbl):
    print('\nReading files %s and %s' % (fname_img, fname_lbl))

    with gzip.open(fname_img) as fimg:
        # The IDX image file starts with a 16-byte header:
        # magic number, image count, row count, and column count
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        print(num, rows, cols)
        img = (np.frombuffer(fimg.read(num * rows * cols), dtype=np.uint8)
               .reshape(num, rows * cols)).astype(np.float32)
        print('(Images) Returned a tensor of shape ', img.shape)

        # Standardizing the images (zero mean, unit variance)
        img = (img - np.mean(img)) / np.std(img)

    with gzip.open(fname_lbl) as flbl:
        # The IDX label file starts with an 8-byte header:
        # magic number and label count
        magic, num = struct.unpack(">II", flbl.read(8))
        lbl = np.frombuffer(flbl.read(num), dtype=np.int8)
        print('(Labels) Returned a tensor of shape: %s' % lbl.shape)
        print('Sample labels: ', lbl[:10])

    return img, lbl
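For context, a typical call site might look like the following. This is only a sketch; it assumes maybe_download(...) has already fetched the standard MNIST archives (file names as published on the MNIST page) into the working directory:

# Hypothetical usage, assuming the archives sit in the working directory
train_inputs, train_labels = read_mnist(
    'train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz')
test_inputs, test_labels = read_mnist(
    't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz')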
Defining the TensorFlow graph
To define the TensorFlow graph, we'll first define placeholders for the input images (tf_inputs) and the corresponding labels (tf_labels):
# Defining inputs and outputs
tf_inputs = tf.placeholder(shape=[batch_size, input_size],
                           dtype=tf.float32, name='inputs')
tf_labels = tf.placeholder(shape=[batch_size, num_labels],
                           dtype=tf.float32, name='labels')
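Remember that placeholders hold no data themselves; a value must be supplied through feed_dict every time the graph is run. As a quick sanity check, here is a sketch (assuming a session has been created, which we do later):

# Feeding a random batch and fetching the placeholder simply echoes the
# batch back; it only verifies the [batch_size, input_size] shape contract
dummy_batch = np.random.rand(batch_size, input_size).astype(np.float32)
fetched = session.run(tf_inputs, feed_dict={tf_inputs: dummy_batch})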
Next, we'll write a Python function that creates the variables for the first time. Note that we use variable scoping to ensure reusability and to make sure that our variables are named properly:
# Defining the TensorFlow variables
def define_net_parameters():
    with tf.variable_scope('layer1'):
        tf.get_variable(WEIGHTS_STRING, shape=[input_size, 500],
                        initializer=tf.random_normal_initializer(0, 0.02))
        tf.get_variable(BIAS_STRING, shape=[500],
                        initializer=tf.random_uniform_initializer(0, 0.01))

    with tf.variable_scope('layer2'):
        tf.get_variable(WEIGHTS_STRING, shape=[500, 250],
                        initializer=tf.random_normal_initializer(0, 0.02))
        tf.get_variable(BIAS_STRING, shape=[250],
                        initializer=tf.random_uniform_initializer(0, 0.01))

    with tf.variable_scope('output'):
        tf.get_variable(WEIGHTS_STRING, shape=[250, 10],
                        initializer=tf.random_normal_initializer(0, 0.02))
        tf.get_variable(BIAS_STRING, shape=[10],
                        initializer=tf.random_uniform_initializer(0, 0.01))
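If you want to confirm that scoping produced the intended names, a quick check (not part of the exercise itself) is to list the graph's variables after creating them:

# Create the variables once, then inspect their fully scoped names
define_net_parameters()
for v in tf.global_variables():
    print(v.name)   # e.g. layer1/weights:0, layer1/bias:0, layer2/weights:0, ...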
Next, we'll define the inference process for the neural network. Note how scoping gives the code in this function a very intuitive flow compared with using unscoped variables. In this network, we have three layers:
- A fully connected layer with ReLU activation (layer1)
- A fully connected layer with ReLU activation (layer2)
- A fully connected softmax layer (output)
By means of scoping, we name the variables (weights and biases) of each layer as layer1/weights, layer1/bias, layer2/weights, layer2/bias, output/weights, and output/bias. Note that in the code, all of them have the same name, but different scopes:
# Defining calculations in the neural network,
# starting from inputs to logits.
# Logits are the values before applying softmax to the final output.
def inference(x):
    # calculations for layer 1
    with tf.variable_scope('layer1', reuse=True):
        w, b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_h1 = tf.nn.relu(tf.matmul(x, w) + b, name='hidden1')

    # calculations for layer 2
    with tf.variable_scope('layer2', reuse=True):
        w, b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_h2 = tf.nn.relu(tf.matmul(tf_h1, w) + b, name='hidden2')

    # calculations for the output layer
    with tf.variable_scope('output', reuse=True):
        w, b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_logits = tf.nn.bias_add(tf.matmul(tf_h2, w), b, name='logits')

    return tf_logits
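One subtlety worth noting: because inference(...) opens every scope with reuse=True, the variables must already exist when it is first called. In our case this is guaranteed by the earlier call to define_net_parameters(); as a sketch:

# The variables were created earlier by define_net_parameters(); calling
# inference(...) before that call would make tf.get_variable raise a
# ValueError complaining that the variable (e.g. layer1/weights) does not exist
tf_logits = inference(tf_inputs)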
Now we'll define a loss function and a loss-minimizing operation. The minimize operation nudges the network parameters in the direction that reduces the loss. There is a diverse collection of optimizers available in TensorFlow. Here, we will use MomentumOptimizer, which gives better final accuracy and convergence than GradientDescentOptimizer:
# defining the loss
tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=inference(tf_inputs), labels=tf_labels))

# defining the optimize function
tf_loss_minimize = tf.train.MomentumOptimizer(
    momentum=0.9, learning_rate=0.01).minimize(tf_loss)
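Swapping optimizers is a one-line change. As an illustration (not part of the original exercise), an Adam-based alternative might look like this; the 0.001 learning rate is a common default, not a tuned value:

# Hypothetical alternative: Adam adapts per-parameter learning rates and
# often converges quickly on MNIST-scale problems
tf_loss_minimize = tf.train.AdamOptimizer(
    learning_rate=0.001).minimize(tf_loss)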
Finally, we'll define an operation to retrieve the predicted softmax probabilities for a given batch of inputs. This in turn will be used to calculate the accuracy of the neural network:
# defining predictions
tf_predictions = tf.nn.softmax(inference(tf_inputs))
Running the neural network
Now we have all the essential operations required to run the neural network and to examine whether it can learn to classify digits successfully.
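The loop below assumes that a session has already been created and that all variables have been initialized. The exercise file handles this; a minimal sketch of that setup might be:

# Minimal setup sketch (the notebook is authoritative)
session = tf.Session()
session.run(tf.global_variables_initializer())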
for epoch in range(NUM_EPOCHS):
    train_loss = []

    # Training Phase
    for step in range(train_inputs.shape[0]//batch_size):
        # Creating one-hot encoded labels from the integer labels.
        # One-hot encoding digit 3 for the 10-class MNIST dataset
        # will result in [0,0,0,1,0,0,0,0,0,0]
        labels_one_hot = np.zeros((batch_size, num_labels), dtype=np.float32)
        labels_one_hot[np.arange(batch_size),
                       train_labels[step*batch_size:(step+1)*batch_size]] = 1.0

        # Running the optimization process
        loss, _ = session.run([tf_loss, tf_loss_minimize], feed_dict={
            tf_inputs: train_inputs[step*batch_size:(step+1)*batch_size, :],
            tf_labels: labels_one_hot})
        train_loss.append(loss)  # Used to average the loss for a single epoch

    # Testing Phase
    test_accuracy = []
    for step in range(test_inputs.shape[0]//batch_size):
        test_predictions = session.run(tf_predictions, feed_dict={
            tf_inputs: test_inputs[step*batch_size:(step+1)*batch_size, :]})
        batch_test_accuracy = accuracy(
            test_predictions, test_labels[step*batch_size:(step+1)*batch_size])
        test_accuracy.append(batch_test_accuracy)

    print('Average train loss for the %d epoch: %.3f\n' % (epoch+1, np.mean(train_loss)))
    print('\tAverage test accuracy for the %d epoch: %.2f\n' % (epoch+1, np.mean(test_accuracy)*100.0))
In this code, accuracy(test_predictions, test_labels) is a function that takes a batch of predictions and labels as inputs and computes the accuracy (the fraction of predictions that match the actual labels). It is defined in the exercise file.
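Although the exercise file's definition is authoritative, a minimal sketch of such a function, assuming predictions is a [batch_size, 10] matrix of softmax probabilities and labels is a vector of integer class IDs, could look like this (returning a fraction in [0, 1], consistent with the *100.0 in the loop above):

def accuracy(predictions, labels):
    # Fraction of rows whose most probable class matches the integer label
    return np.mean(np.argmax(predictions, axis=1) == labels)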
If successful, you should see behavior similar to that shown in Figure 2.10. After 50 epochs, the test accuracy should reach approximately 98%:

Figure 2.10: Training loss and test accuracy for the MNIST digit classification task