Intelligent Mobile Projects with TensorFlow

Transfer learning – what and why

We human beings don't learn new things from scratch. Instead, we take advantage of what we have already learned as much as possible, consciously or not. Transfer learning in AI attempts to do the same thing: it's a technique that takes a (typically small) part of a large trained model and reuses it in a new model for a related task, without needing access to the large training dataset and computing resources used to train the original model. Overall, transfer learning is still an open problem in AI, since in many situations, what takes human beings only a few trials and errors to grasp would take AI far more time and data to learn. But in the field of image recognition, transfer learning has proven to be very effective.

Modern deep learning models for image recognition are typically deep neural networks, or more specifically, deep Convolutional Neural Networks (CNNs), with many layers. The lower layers of such a CNN are responsible for learning and recognizing low-level features such as an image's edges, contours, and parts, while the final layer determines the image's category. For a new set of object categories, such as dog breeds or flower types, we don't need to relearn the parameters, or weights, of the network's lower layers. In fact, training a modern image-recognition CNN from scratch to learn all of its weights, typically millions or more, can take many weeks. Transfer learning for image classification allows us to retrain just the last layer of such a CNN with our own set of images, usually in less than an hour, leaving the weights of all the other layers unchanged, and still reach about the same accuracy as if we'd trained the whole network from scratch for weeks.
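To make the idea concrete, here is a minimal numpy sketch of what "retraining only the last layer" means. It assumes the frozen lower layers of a pretrained CNN have already been run once over the training images to produce fixed feature vectors (often called bottleneck values); the data below is synthetic stand-in data, and only the new final layer's weights and bias are trained, with simple gradient descent on a softmax cross-entropy loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for "bottleneck" features: the frozen CNN's output for each
# training image (hypothetical shapes and synthetic values).
num_images, feature_dim, num_classes = 200, 64, 3
features = rng.normal(size=(num_images, feature_dim))
# Synthetic labels generated from a linear rule, so a single new layer
# on top of the fixed features is enough to fit them.
true_w = rng.normal(size=(feature_dim, num_classes))
labels = (features @ true_w).argmax(axis=1)

# The ONLY trainable parameters: the new final layer's weights and bias.
# Everything "below" (the feature extractor) stays frozen.
w = np.zeros((feature_dim, num_classes))
b = np.zeros(num_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

one_hot = np.eye(num_classes)[labels]
lr = 0.5
for _ in range(500):  # full-batch gradient descent on cross-entropy
    probs = softmax(features @ w + b)
    w -= lr * (features.T @ (probs - one_hot)) / num_images
    b -= lr * (probs - one_hot).mean(axis=0)

accuracy = ((features @ w + b).argmax(axis=1) == labels).mean()
```

This is exactly the shape of the retraining we do in the next sections, except that there the fixed features come from Inception v3 or MobileNet rather than random data, and TensorFlow handles the training loop.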

The second main benefit of transfer learning is that we only need a small amount of training data to retrain the last layer of a CNN, whereas training the millions of parameters of a deep CNN from scratch would require a much larger dataset. For example, for our dog breed retraining, we only need 100+ images per dog breed to build a model that classifies dog breeds better than the original image classification model.

If you're unfamiliar with CNNs, check out the videos and notes of one of the best resources on them, the Stanford CS231n course CNN for Visual Recognition (http://cs231n.stanford.edu). Another good resource on CNNs is Chapter 6 of Michael Nielsen's online book, Neural Networks and Deep Learning: http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks.

In the following two sections, we'll use two of the best pretrained CNN models available for TensorFlow, along with a dog breed dataset, to retrain the models and generate better dog breed recognition models. The first is Inception v3, a more accurate successor to Inception v1, optimized for accuracy at the cost of a larger model size. The other is MobileNet, optimized for size and efficiency on mobile devices. A detailed list of pretrained models supported in TensorFlow is available at https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models.