
Retraining using the Inception v3 model

In the TensorFlow source that we set up in the previous chapter, there's a Python script, tensorflow/examples/image_retraining/retrain.py, that you can use to retrain the Inception v3 or MobileNet models. Before we run the script to retrain the Inception v3 model for our dog breed recognition, we need to first download the Stanford Dogs Dataset (http://vision.stanford.edu/aditya86/ImageNetDogs), which contains images of 120 dog breeds (you only need to download the Images in the link, not the Annotations).
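For example, from a terminal (this assumes the Images link on that page points to an images.tar file, which it did at the time of writing):

cd ~/Downloads
curl -O http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar
tar xvf images.tar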

After untarring the downloaded images.tar file in ~/Downloads, you should see a list of folders in ~/Downloads/Images, as shown in the following screenshot. Each folder corresponds to one dog breed and contains about 150 images (you don't need to supply explicit labels for the images, as the folder names are used to label the images they contain):

Figure 2.1 Dog dataset images separated by folders, or labels of dog breed
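As a quick sanity check that the folder names will serve as labels, you can list them with a few lines of Python (a minimal sketch; the path assumes the untarred location above):

import os

image_dir = os.path.expanduser('~/Downloads/Images')
breeds = sorted(os.listdir(image_dir))
print(len(breeds), 'breeds')  # expect 120 folders, one per breed
for breed in breeds[:3]:
    count = len(os.listdir(os.path.join(image_dir, breed)))
    print(breed, count, 'images')  # each folder holds roughly 150 images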

You may download the dataset and then run the retrain.py script on a Mac, as the script doesn't take too long (less than an hour) to run on this relatively small dataset (about 20,000 images in total); if you run it on the GPU-powered Ubuntu machine set up in the last chapter, the script can complete in just a few minutes. When retraining with a large image dataset, however, running on a Mac may take hours or days, so it makes sense to run it on a GPU-powered machine.

The command to retrain the model, assuming you have created a /tf_files directory and also a /tf_files/dogs_bottleneck directory, is as follows:

python tensorflow/examples/image_retraining/retrain.py \
--model_dir=/tf_files/inception-v3 \
--output_graph=/tf_files/dog_retrained.pb \
--output_labels=/tf_files/dog_retrained_labels.txt \
--image_dir ~/Downloads/Images \
--bottleneck_dir=/tf_files/dogs_bottleneck

The five parameters need a little explanation here:

  • --model_dir specifies the directory path where the Inception v3 model will be downloaded automatically by retrain.py, unless it's already there.
  • --output_graph specifies the name and path of the retrained model.
  • --output_labels specifies the file that will hold the folder (label) names of the image dataset; it's used later with the retrained model to classify new images.
  • --image_dir is the path to the image dataset used to retrain the Inception v3 model.
  • --bottleneck_dir is used to cache the results generated at the bottleneck, the layer just before the final layer; the final layer performs classification using those results. During retraining, each image is used several times, but its bottleneck values remain the same, even across future reruns of the retrain script. So the first run takes much longer, as it needs to create the bottleneck results; see the sketch after this list.
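To make the bottleneck caching concrete, here's a minimal Python sketch of the idea (the file layout and function name are illustrative assumptions, not retrain.py's actual code):

import os
import numpy as np

def cached_bottleneck(sess, image_path, bottleneck_dir,
                      jpeg_data_tensor, bottleneck_tensor):
    # One cache file per image; hypothetical layout for illustration.
    cache_path = os.path.join(bottleneck_dir,
                              os.path.basename(image_path) + '.txt')
    if os.path.exists(cache_path):
        # Cache hit: reused on every later training step and on reruns.
        return np.loadtxt(cache_path, delimiter=',')
    with open(image_path, 'rb') as f:
        jpeg_data = f.read()
    # Cache miss: one forward pass up to the layer before the final layer.
    values = sess.run(bottleneck_tensor,
                      {jpeg_data_tensor: jpeg_data}).squeeze()
    np.savetxt(cache_path, values, delimiter=',')
    return values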

During retraining, you'll see three values printed every 10 steps, for a default total of 4,000 steps. The output for the first few and last few steps, along with the final accuracy, looks like the following:

INFO:tensorflow:2018-01-03 10:42:53.127219: Step 0: Train accuracy = 21.0% 
INFO:tensorflow:2018-01-03 10:42:53.127414: Step 0: Cross entropy = 4.767182 
INFO:tensorflow:2018-01-03 10:42:55.384347: Step 0: Validation accuracy = 3.0% (N=100) 
INFO:tensorflow:2018-01-03 10:43:11.591877: Step 10: Train accuracy = 34.0% 
INFO:tensorflow:2018-01-03 10:43:11.592048: Step 10: Cross entropy = 4.704726 
INFO:tensorflow:2018-01-03 10:43:12.915417: Step 10: Validation accuracy = 22.0% (N=100) 
... 
... 
INFO:tensorflow:2018-01-03 10:56:16.579971: Step 3990: Train accuracy = 93.0% 
INFO:tensorflow:2018-01-03 10:56:16.580140: Step 3990: Cross entropy = 0.326892 
INFO:tensorflow:2018-01-03 10:56:16.692935: Step 3990: Validation accuracy = 89.0% (N=100) 
INFO:tensorflow:2018-01-03 10:56:17.735986: Step 3999: Train accuracy = 93.0% 
INFO:tensorflow:2018-01-03 10:56:17.736167: Step 3999: Cross entropy = 0.379192 
INFO:tensorflow:2018-01-03 10:56:17.846976: Step 3999: Validation accuracy = 90.0% (N=100) 
INFO:tensorflow:Final test accuracy = 91.0% (N=2109) 
 

The train accuracy is the classification accuracy on the images that the neural network has used for training, while the validation accuracy is measured on images it has not used for training. The validation accuracy is therefore a more reliable measure of how accurate the model is. If the training converges and goes well, that is, if the trained model is neither overfitting nor underfitting, the validation accuracy should normally be a little lower than the train accuracy, but not by too much.

If the train accuracy gets high but the validation accuracy remains low, the model is overfitting; if the train accuracy remains low, the model is underfitting. The cross entropy is the value of the loss function, and if the retraining goes well, it should overall get smaller and smaller. Finally, the test accuracy is measured on images that haven't been used for either training or validation, and it's generally the most reliable measure of how accurate the retrained model is.

As the preceding outputs show, by the end of the retraining, we see the validation accuracy is similar to the train accuracy (90% and 93%, compared to 3% and 21% in the beginning) and the final test accuracy is 91%. The cross entropy also drops from 4.767 in the beginning to 0.379 in the end. So we have a pretty good retrained dog breed recognition model now.

To further improve the accuracy, you can play with retrain.py's other parameters, such as the number of training steps (--how_many_training_steps), the learning rate (--learning_rate), and data augmentation (--flip_left_right, --random_crop, --random_scale, --random_brightness). Generally, this is a tedious process that involves a lot of what Andrew Ng, one of the best-known deep learning experts, calls "dirty work" in his Nuts and Bolts of Applying Deep Learning speech (video available at: https://www.youtube.com/watch?v=F1ka6a13S9I).
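For example, a run with more steps, a smaller learning rate, and some data augmentation might look like this (the values here are illustrative, not tuned recommendations):

python tensorflow/examples/image_retraining/retrain.py \
--model_dir=/tf_files/inception-v3 \
--output_graph=/tf_files/dog_retrained.pb \
--output_labels=/tf_files/dog_retrained_labels.txt \
--image_dir ~/Downloads/Images \
--bottleneck_dir=/tf_files/dogs_bottleneck \
--how_many_training_steps 8000 \
--learning_rate 0.005 \
--flip_left_right \
--random_brightness 10

Note that enabling any of the distortion flags prevents retrain.py from reusing the cached bottleneck values, as each distorted image is different, so training becomes much slower.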

Another tool you can use to give the retrained model a quick test on your own image (for example, a Labrador Retriever image in /tmp/lab1.jpg) is label_image, which you can run after first building it, as follows:

bazel build tensorflow/examples/label_image:label_image

bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tf_files/dog_retrained.pb \
--image=/tmp/lab1.jpg \
--input_layer=Mul \
--output_layer=final_result \
--labels=/tf_files/dog_retrained_labels.txt

You'll see top five classification results similar to (though, since training involves randomness, likely not exactly the same as) the following:

n02099712 labrador retriever (41): 0.75551 
n02099601 golden retriever (64): 0.137506 
n02104029 kuvasz (76): 0.0228538 
n02090379 redbone (32): 0.00943663 
n02088364 beagle (20): 0.00672507 
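If you prefer to run this quick test directly from Python rather than through the label_image binary, here's a minimal sketch of the same inference (it assumes the retrained Inception v3 graph's preprocessing: a 299x299 image scaled to [-1, 1] and fed to the Mul node):

import tensorflow as tf

graph_def = tf.GraphDef()
with open('/tf_files/dog_retrained.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
labels = [line.strip() for line in open('/tf_files/dog_retrained_labels.txt')]

with tf.Graph().as_default():
    tf.import_graph_def(graph_def, name='')
    # Decode, resize, and normalize the test image as Inception v3 expects.
    image = tf.image.decode_jpeg(tf.read_file('/tmp/lab1.jpg'), channels=3)
    image = tf.image.resize_images(tf.cast(image, tf.float32), [299, 299])
    image = tf.expand_dims((image - 128.0) / 128.0, 0)
    with tf.Session() as sess:
        preds = sess.run('final_result:0', {'Mul:0': sess.run(image)})
    # Print the top five breeds and their probabilities.
    for i in preds[0].argsort()[-5:][::-1]:
        print(labels[i], preds[0][i])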

The values of --input_layer (Mul) and --output_layer (final_result) are very important: they have to match the names defined in the model for the classification to work at all. If you wonder how you can get them from the graph (also known as model) file dog_retrained.pb, there are two TensorFlow tools that can help. The first one is the appropriately named summarize_graph. Here's how you can build and run it:

bazel build tensorflow/tools/graph_transforms:summarize_graph

bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/tf_files/dog_retrained.pb

You'll see summary results similar to these:

No inputs spotted.
No variables spotted.
Found 1 possible outputs: (name=final_result, op=Softmax)
Found 22067948 (22.07M) const parameters, 0 (0) variable parameters, and 99 control_edges
Op types used: 489 Const, 101 Identity, 99 CheckNumerics, 94 Relu, 94 BatchNormWithGlobalNormalization, 94 Conv2D, 11 Concat, 9 AvgPool, 5 MaxPool, 1 DecodeJpeg, 1 ExpandDims, 1 Cast, 1 MatMul, 1 Mul, 1 PlaceholderWithDefault, 1 Add, 1 Reshape, 1 ResizeBilinear, 1 Softmax, 1 Sub

There's one possible output, named final_result. Unfortunately, the summarize_graph tool sometimes can't tell us the input name, as it seems to be confused by the nodes used for training. After the nodes used only for training are stripped out, which we'll discuss soon, the summarize_graph tool returns the correct input name. Another tool, TensorBoard, gives us a more complete picture of the model graph. If you installed TensorFlow directly from a binary, you should be able to just run TensorBoard, as by default it's installed in /usr/local/bin. But if you installed TensorFlow from source as we did earlier, you can run the following commands to build TensorBoard:

git clone https://github.com/tensorflow/tensorboard 
cd tensorboard/ 
bazel build //tensorboard 

Now, make sure you have a /tmp/retrain_logs directory, created automatically when you run retrain.py, and run:

bazel-bin/tensorboard/tensorboard --logdir /tmp/retrain_logs

Then open http://localhost:6006 in a browser. You'll first see the accuracy graph, as shown in the following screenshot:

Figure 2.2 Train and validation accuracy of the Inception v3 retrained model

The cross_entropy graph in the following screenshot matches what we described earlier about the output of running retrain.py:

Figure 2.3 Train and validation cross entropy of the Inception v3 retrained model

Now click the GRAPHS tab, and you'll see an operation with the name Mul and another with the name final_result, as shown here:

Figure 2.4 The Mul and final_result nodes in the retrained model

Actually, if you prefer to interact with TensorFlow a little, you can use a few lines of Python code to find out the names of the output and input layers, as shown in this IPython session:

In [1]: import tensorflow as tf 
In [2]: g=tf.GraphDef() 
In [3]: g.ParseFromString(open("/tf_files/dog_retrained.pb", "rb").read()) 
In [4]: x=[n.name for n in g.node] 
In [5]: x[-1:] 
Out[5]: [u'final_result'] 

Note that this code snippet won't always work, as the order of nodes isn't guaranteed, but it often gives you the information or validation you need.
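A slightly more robust sketch is to search by op type instead of position; per the summarize_graph output shown earlier, this particular graph has exactly one Softmax and one Mul op (that's specific to this graph, not a general rule):

import tensorflow as tf

g = tf.GraphDef()
with open('/tf_files/dog_retrained.pb', 'rb') as f:
    g.ParseFromString(f.read())
# The single Softmax is the output; the single Mul is the input we feed.
print([n.name for n in g.node if n.op == 'Softmax'])  # ['final_result']
print([n.name for n in g.node if n.op == 'Mul'])      # ['Mul']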

Now we're ready to discuss how to further modify the retrained model so that it can be deployed and run on mobile devices. The retrained model file dog_retrained.pb is too big, at about 80 MB, so it should go through two optimization steps before being deployed to mobile devices:

  1. Strip unused nodes: remove the nodes in the model that are only used during training and not needed during inference.
  2. Quantize the model: convert all the 32-bit floating-point model parameters to 8-bit values. This reduces the model size to about 25% of its original size while keeping the inference accuracy about the same.

TensorFlow's documentation (https://www.tensorflow.org/performance/quantization) offers more details on quantization and why it works.
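To get an intuition for why this works, here's a minimal sketch of the linear min/max weight-quantization idea (illustrative only; not the exact scheme quantize_graph.py implements):

import numpy as np

def quantize(w):
    # Map 32-bit floats to 8-bit integers over the weights' [min, max] range.
    lo, hi = float(w.min()), float(w.max())
    q = np.round((w - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi

def dequantize(q, lo, hi):
    # Recover approximate floats; the error is at most 1/255 of the range.
    return q.astype(np.float32) / 255 * (hi - lo) + lo

w = np.random.randn(5).astype(np.float32)
q, lo, hi = quantize(w)
print(w)
print(dequantize(q, lo, hi))  # close to w, so accuracy stays about the same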

There are two ways to perform the preceding two tasks: the older way that uses the strip_unused tool and the new way that uses the transform_graph tool.

Let's see how the older method works: first run the following commands to create a model with all the unused nodes stripped:

bazel build tensorflow/python/tools:strip_unused

bazel-bin/tensorflow/python/tools/strip_unused \
--input_graph=/tf_files/dog_retrained.pb \
--output_graph=/tf_files/stripped_dog_retrained.pb \
--input_node_names=Mul \
--output_node_names=final_result \
--input_binary=true

If you run the earlier Python snippet against the stripped graph, you can find the right input layer name:

In [1]: import tensorflow as tf 
In [2]: g=tf.GraphDef() 
In [3]: g.ParseFromString(open("/tf_files/stripped_dog_retrained.pb", "rb").read()) 
In [4]: x=[n.name for n in g.node] 
In [5]: x[0] 
Out[5]: u'Mul' 

Now run the following command to quantize the model:

python tensorflow/tools/quantization/quantize_graph.py \
--input=/tf_files/stripped_dog_retrained.pb \
--output_node_names=final_result \
--output=/tf_files/quantized_stripped_dogs_retrained.pb \
--mode=weights

After this, the model quantized_stripped_dogs_retrained.pb is ready to be deployed and used in iOS and Android apps, which we'll see in the following sections of this chapter.

The other way of stripping unused nodes and quantizing the model is to use a tool called transform_graph. This is the recommended new way in TensorFlow 1.4; it works fine with the Python label_image script, but still causes incorrect recognition results when the model is deployed in iOS and Android apps:

bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tf_files/dog_retrained.pb \
--out_graph=/tf_files/transform_dog_retrained.pb \
--inputs='Mul' \
--outputs='final_result' \
--transforms='strip_unused_nodes(type=float, shape="1,299,299,3")
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights'

Using the label_image script to test both quantized_stripped_dogs_retrained.pb and transform_dog_retrained.pb shows that both classify images correctly, but only the first one works correctly in iOS and Android apps.
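For example, here's the check on the quantized model, using the label_image binary built earlier (swap in transform_dog_retrained.pb to test the other graph):

bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tf_files/quantized_stripped_dogs_retrained.pb \
--image=/tmp/lab1.jpg \
--input_layer=Mul \
--output_layer=final_result \
--labels=/tf_files/dog_retrained_labels.txt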

For detailed documentation on the graph transform tool, see its GitHub README at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md.