Retraining using MobileNet models
The stripped and quantized model generated in the previous section is still over 20 MB in size. This is because the pre-built Inception v3 model used for retraining is a large-scale deep learning model with over 25 million parameters, and it was not created with a mobile-first goal.
In June 2017, Google released MobileNets v1, a total of 16 mobile-first deep learning models for TensorFlow. These models are only a few MB in size, with 0.47 million to 4.24 million parameters, and still achieve decent accuracy (just a bit lower than Inception v3). See its README for more information: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md.
The retrain.py script discussed in the previous section also supports retraining based on MobileNet models. Simply run a command like the following:
python tensorflow/examples/image_retraining/retrain.py --output_graph=/tf_files/dog_retrained_mobilenet10_224.pb --output_labels=/tf_files/dog_retrained_labels_mobilenet.txt --image_dir ~/Downloads/Images --bottleneck_dir=/tf_files/dogs_bottleneck_mobilenet --architecture mobilenet_1.0_224
The generated label file dog_retrained_labels_mobilenet.txt is actually the same as the label file generated during retraining with the Inception v3 model. The --architecture parameter specifies one of the 16 MobileNet models; the value mobilenet_1.0_224 means using the model with 1.0 as the parameter size (the other three possible values are 0.75, 0.50, and 0.25; 1.0 gives the most parameters and the best accuracy but the largest size, 0.25 the opposite) and 224 as the image input size (the other three possible values are 192, 160, and 128). If you add _quantized to the end of the --architecture value, that is, --architecture mobilenet_1.0_224_quantized, the model will also be quantized, resulting in a retrained model of about 5.1 MB; the non-quantized model is about 17 MB.
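Based only on that naming convention (four parameter sizes by four input sizes, plus an optional _quantized suffix), you can enumerate all of the valid --architecture strings with a quick sketch like the following; treat it as an illustration, since retrain.py itself is the authority on which values it accepts:

# Sketch: enumerate MobileNet v1 --architecture values from the naming
# convention described above (retrain.py defines the accepted values).
parameter_sizes = ["1.0", "0.75", "0.50", "0.25"]
input_sizes = ["224", "192", "160", "128"]

architectures = []
for ps in parameter_sizes:
    for size in input_sizes:
        name = "mobilenet_%s_%s" % (ps, size)      # e.g. mobilenet_1.0_224
        architectures.append(name)
        architectures.append(name + "_quantized")  # quantized variant

print(len(architectures))  # 32: the 16 models plus their quantized variants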
You can test the model generated previously with label_image as follows:
bazel-bin/tensorflow/examples/label_image/label_image --graph=/tf_files/dog_retrained_mobilenet10_224.pb --image=/tmp/lab1.jpg --input_layer=input --output_layer=final_result --labels=/tf_files/dog_retrained_labels_mobilenet.txt --input_height=224 --input_width=224 --input_mean=128 --input_std=128

n02099712 labrador retriever (41): 0.824675
n02099601 golden retriever (64): 0.144245
n02104029 kuvasz (76): 0.0103533
n02087394 rhodesian ridgeback (105): 0.00528782
n02090379 redbone (32): 0.0035457
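If you'd rather run the same test from Python than from the label_image binary, here is a minimal TensorFlow 1.x sketch; the graph path, label file, and input preprocessing (224x224, mean 128, std 128) simply mirror the flags above, so treat it as an illustration rather than the book's own code:

import numpy as np
import tensorflow as tf

GRAPH_PATH = "/tf_files/dog_retrained_mobilenet10_224.pb"
LABELS_PATH = "/tf_files/dog_retrained_labels_mobilenet.txt"
IMAGE_PATH = "/tmp/lab1.jpg"

# Load the retrained MobileNet graph produced by retrain.py.
graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default():
    tf.import_graph_def(graph_def, name="")

    # Preprocess the image the same way the label_image flags above do:
    # decode, resize to 224x224, normalize with mean 128 and std 128.
    image_data = tf.gfile.GFile(IMAGE_PATH, "rb").read()
    decoded = tf.image.decode_jpeg(image_data, channels=3)
    resized = tf.image.resize_images(decoded, [224, 224])
    normalized = tf.expand_dims((resized - 128.0) / 128.0, 0)

    with tf.Session() as sess:
        image = sess.run(normalized)
        predictions = sess.run("final_result:0", feed_dict={"input:0": image})

# Print the top-5 predictions with their labels.
labels = [line.strip() for line in open(LABELS_PATH)]
for i in np.argsort(predictions[0])[::-1][:5]:
    print(labels[i], predictions[0][i])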
Notice that when running label_image, the input_layer is named input. We can find this name using interactive IPython code or the summarize_graph tool seen earlier:
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/tf_files/dog_retrained_mobilenet10_224.pb

Found 1 possible inputs: (name=input, type=float(1), shape=[1,224,224,3])
No variables spotted.
Found 1 possible outputs: (name=final_result, op=Softmax)
Found 4348281 (4.35M) const parameters, 0 (0) variable parameters, and 0 control_edges
Op types used: 92 Const, 28 Add, 27 Relu6, 15 Conv2D, 13 Mul, 13 DepthwiseConv2dNative, 10 Dequantize, 3 Identity, 1 MatMul, 1 BiasAdd, 1 Placeholder, 1 PlaceholderWithDefault, 1 AvgPool, 1 Reshape, 1 Softmax, 1 Squeeze
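For the interactive Python route, you can simply load the GraphDef and look for the Placeholder and Softmax nodes; a minimal sketch (same graph path as above) is:

import tensorflow as tf

# Load the retrained MobileNet graph and list the node names we care about:
# the Placeholder node ("input") and the Softmax node ("final_result") are
# the names to pass to label_image as input_layer and output_layer.
graph_def = tf.GraphDef()
with tf.gfile.GFile("/tf_files/dog_retrained_mobilenet10_224.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op in ("Placeholder", "Softmax"):
        print(node.op, node.name)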
So, when should we use an Inception v3 or MobileNet retrained model on mobile devices? In cases where you want to achieve the highest possible accuracy, you can and should use the retrained model based on Inception v3. If speed is your top consideration, you should consider a MobileNet retrained model with the smallest parameter size and image input size, in exchange for some accuracy loss.
One tool to give you an accurate benchmark of a model is benchmark_model. First, build it as follows:
bazel build -c opt tensorflow/tools/benchmark:benchmark_model
Then, run it against the Inception v3- or MobileNet v1-based retrained model:
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=/tf_files/quantized_stripped_dogs_retrained.pb --input_layer="Mul" --input_layer_shape="1,299,299,3" --input_layer_type="float" --output_layer="final_result" --show_run_order=false --show_time=false --show_memory=false --show_summary=true
You'll get a pretty long output, and at the end there's a line like FLOPs estimate: 11.42 B, meaning the Inception v3-based retrained model needs about 11 billion floating point operations (FLOPs) to make a single inference. An iPhone 6 can perform roughly 2 billion FLOPs per second, so it'd take about 5-6 seconds to run the model on an iPhone 6. Other modern smartphones can perform about 10 billion FLOPs per second.
By replacing the graph file with the MobileNet-based retrained model dog_retrained_mobilenet10_224.pb and rerunning the benchmark tool, you'll see the FLOPs estimate drop to about 1.14 B, which is about 10 times fewer operations, so roughly 10 times faster.
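As a rough sanity check on those latency estimates, the arithmetic is just the FLOPs estimate divided by the device's floating point throughput; here is a back-of-the-envelope sketch using the figures quoted above:

# Back-of-the-envelope latency from the benchmark_model FLOPs estimates above,
# assuming an iPhone 6 performs roughly 2 billion floating point ops per second.
IPHONE6_FLOPS_PER_SECOND = 2e9

for model, flops in [("Inception v3 retrained", 11.42e9),
                     ("MobileNet 1.0_224 retrained", 1.14e9)]:
    print("%s: ~%.1f seconds per inference" % (model, flops / IPHONE6_FLOPS_PER_SECOND))
# Inception v3 retrained: ~5.7 seconds per inference
# MobileNet 1.0_224 retrained: ~0.6 seconds per inference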