By 苏剑林 | October 13, 2017
In the past two years, Baidu's Big Data competitions focused on natural language processing; this year the style changed markedly to fine-grained image classification, the task being to classify pet dogs into one of 100 categories. The task itself is quite standard and the approach conventional, involving three parts: data augmentation, fine-tuning an ImageNet model, and model ensembling. I am not particularly good at ensembling, so I only did the first two steps. My result was mediocre (accuracy around 80%), but some of the code may be helpful to readers, so I am sharing it here, with explanations interleaved with the code below.
Competition official website (may become invalid at any time): http://js.baidu.com
The model is mainly implemented using TensorFlow and Keras. First, of course, is importing various modules:
# [Code block for importing modules]
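As a rough sketch, the imports for a Keras/TensorFlow pipeline of this kind might look as follows (illustrative only; the actual import list in the linked repository may differ):

```python
import os
import numpy as np
import cv2                                     # image loading and hand-rolled augmentation
from keras.models import Model
from keras.layers import Input, Dense, Embedding, Lambda, multiply
from keras.applications.xception import Xception, preprocess_input
from keras.optimizers import Adam, SGD
import keras.backend as K
```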
Next is the model. The base model is Xception; a GLU (Gated Linear Unit) activation is then used to compress the features, and a final Softmax layer performs the classification. In addition, Center Loss and an auxiliary loss (a direct skip connection to a second classifier) are added; both terms can be viewed as regularizers.
# [Code block for model definition]
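A minimal sketch of such a model might look like the following. The feature dimension and layer names here are illustrative assumptions rather than the competition code: the GLU is written as the elementwise product of a linear projection and a sigmoid gate, and the Center Loss is implemented with an Embedding layer holding one learnable center per class, whose output is the squared distance between a sample's feature and its class center.

```python
from keras.models import Model
from keras.layers import Input, Dense, Embedding, Lambda, multiply
from keras.applications.xception import Xception
import keras.backend as K

num_classes = 100   # 100 dog categories
feat_dim = 256      # illustrative size of the compressed GLU feature

img_input = Input(shape=(299, 299, 3))
base = Xception(include_top=False, weights='imagenet',
                input_tensor=img_input, pooling='avg')
x = base.output                                  # 2048-d Xception feature

# GLU: elementwise product of a linear projection and a sigmoid gate
feat = multiply([Dense(feat_dim)(x),
                 Dense(feat_dim, activation='sigmoid')(x)])

# main classifier on the GLU feature, plus an auxiliary classifier wired
# directly to the Xception feature (the "skip connection" mentioned above)
main_out = Dense(num_classes, activation='softmax', name='main')(feat)
aux_out = Dense(num_classes, activation='softmax', name='aux')(x)

# Center Loss: one learnable center per class, stored in an Embedding layer;
# the output is the squared distance between the feature and its class center
label_input = Input(shape=(1,), dtype='int32')
centers = Embedding(num_classes, feat_dim)(label_input)       # (None, 1, feat_dim)
center_out = Lambda(lambda t: K.sum(K.square(t[0] - t[1][:, 0]),
                                    axis=1, keepdims=True),
                    name='center')([feat, centers])

model = Model([img_input, label_input], [main_out, aux_out, center_out])
```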
Regarding the training strategy, training is divided into three steps:
1. Freeze all parameters of Xception and train only the additionally added fully connected layers and the Center Loss part using the Adam optimizer;
2. Unfreeze two blocks of Xception and switch to SGD for fine-tuning;
3. Remove most of the data augmentation and continue fine-tuning with SGD.
The code is as follows:
# [Code block for training strategies]
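To illustrate the three steps, a sketch along these lines could be used. It assumes the `model` and `base` objects from the model sketch above, plus two hypothetical data generators `train_gen` and `train_gen_no_aug` (defined further down); the epoch counts, learning rates and loss weights are made-up values, not the settings actually used.

```python
from keras.optimizers import Adam, SGD

losses = ['sparse_categorical_crossentropy', 'sparse_categorical_crossentropy',
          lambda y_true, y_pred: y_pred]   # the center output is already a loss value
loss_weights = [1.0, 0.2, 0.01]            # illustrative weights for main / aux / center

# Step 1: freeze the whole Xception backbone; train only the new head with Adam
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer=Adam(1e-3), loss=losses, loss_weights=loss_weights)
model.fit_generator(train_gen, steps_per_epoch=500, epochs=5)

# Step 2: unfreeze two Xception blocks and fine-tune with a small SGD learning rate
for layer in base.layers:
    if layer.name.startswith('block13') or layer.name.startswith('block14'):
        layer.trainable = True
model.compile(optimizer=SGD(1e-4, momentum=0.9), loss=losses, loss_weights=loss_weights)
model.fit_generator(train_gen, steps_per_epoch=500, epochs=10)

# Step 3: keep fine-tuning with SGD, but feed data with most augmentation turned off
model.fit_generator(train_gen_no_aug, steps_per_epoch=500, epochs=5)
```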
Next is the data preparation for this competition. The official organizers eventually provided 18,000 training images.
# [Code block for data preparation]
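A hypothetical version of the data preparation might look like this; the directory name and label-file format below are assumptions made for illustration, not the competition's actual layout:

```python
import os
import numpy as np

# assumed layout: a folder of jpgs plus a label file with lines "<image_id> <class_id>"
train_dir = 'train_images'
paths, labels = [], []
with open('train_labels.txt') as f:
    for line in f:
        img_id, cls = line.split()[:2]
        paths.append(os.path.join(train_dir, img_id + '.jpg'))
        labels.append(int(cls))
paths, labels = np.array(paths), np.array(labels)

# hold out a small validation set from the ~18,000 training images
idx = np.random.permutation(len(paths))
n_val = 2000
val_paths, val_labels = paths[idx[:n_val]], labels[idx[:n_val]]
train_paths, train_labels = paths[idx[n_val:]], labels[idx[n_val:]]
```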
Following this is the data augmentation code, written by hand rather than with an existing library; the advantage is that it is highly customizable. Of course, it is not certain that every one of these augmentation methods actually improves results on this problem.
# [Code block for data augmentation]
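As an example of what such hand-rolled augmentation can look like, here is a small sketch using only NumPy and OpenCV, covering random horizontal flips, random crops, and brightness jitter; the augmentations actually used in the competition code may be different and more extensive.

```python
import numpy as np
import cv2

def random_augment(img, size=299):
    """Augment an HxWx3 uint8 image: random horizontal flip, random crop,
    and random brightness jitter (an illustrative subset only)."""
    # random horizontal flip
    if np.random.rand() < 0.5:
        img = img[:, ::-1]
    # random crop covering 90-100% of the image, then resize back to `size`
    h, w = img.shape[:2]
    scale = np.random.uniform(0.9, 1.0)
    ch, cw = int(h * scale), int(w * scale)
    top = np.random.randint(0, h - ch + 1)
    left = np.random.randint(0, w - cw + 1)
    img = cv2.resize(img[top:top + ch, left:left + cw], (size, size))
    # random brightness jitter
    img = img.astype(np.float32) * np.random.uniform(0.8, 1.2)
    return np.clip(img, 0, 255).astype(np.uint8)
```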
We write an iterator to generate batches of training and test data:
# [Code block for iterator implementation]
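A sketch of such a generator is given below. It relies on the `random_augment` function and the path/label arrays from the earlier sketches, and yields batches in the two-input, three-output format that the model sketch above expects (the third target is a dummy array because the center output is itself a loss value).

```python
import numpy as np
import cv2
from keras.applications.xception import preprocess_input

def train_generator(paths, labels, batch_size=32, size=299, augment=True):
    """Yield ([images, integer labels], [labels, labels, dummy zeros]) batches."""
    n = len(paths)
    while True:
        order = np.random.permutation(n)
        for i in range(0, n, batch_size):
            batch = order[i:i + batch_size]
            imgs = []
            for p in paths[batch]:
                img = cv2.imread(p)[:, :, ::-1]            # BGR -> RGB
                img = cv2.resize(img, (size, size))
                if augment:
                    img = random_augment(img, size)
                imgs.append(preprocess_input(img.astype(np.float32)))
            x = np.array(imgs)
            y = labels[batch].reshape(-1, 1)
            yield [x, y], [y, y, np.zeros((len(y), 1))]

train_gen = train_generator(train_paths, train_labels)
train_gen_no_aug = train_generator(train_paths, train_labels, augment=False)
```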
First we train the model and use it to predict on the test set. We then mix those predictions into the training set as labels, train again, obtain new predictions, and can repeat this mix-and-train cycle. This is the pseudo-labeling idea (a form of transfer learning), and it improves performance by roughly 0.5% to 1%.
Additionally, the model makes two predictions, one from the GLU features and one directly from the Xception features, so we can take a weighted average of the two to improve performance, with the weight chosen on the validation set.
# [Code block for training and prediction process]
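The sketch below illustrates both ideas: the weighted average of the two heads and one round of pseudo-label mixing. The arrays `test_images` and `test_paths` and the weight value 0.7 are illustrative assumptions; in practice the weight would be tuned on the validation set and the mix-and-train cycle repeated a few times.

```python
import numpy as np

# predict on the test set; dummy labels are fed to the label input because the
# center-loss output is ignored at inference time
dummy = np.zeros((len(test_images), 1))
main_pred, aux_pred, _ = model.predict([test_images, dummy], batch_size=32)

# weighted average of the GLU head and the direct Xception head
w = 0.7                                  # placeholder; chosen on the validation set
test_prob = w * main_pred + (1 - w) * aux_pred
pseudo_labels = test_prob.argmax(axis=1)

# pseudo-labeling: append the test images with their predicted labels to the
# training set and fine-tune again; this mix-and-train cycle can be iterated
mixed_paths = np.concatenate([train_paths, test_paths])
mixed_labels = np.concatenate([train_labels, pseudo_labels])
model.fit_generator(train_generator(mixed_paths, mixed_labels),
                    steps_per_epoch=600, epochs=3)
```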
Although this article is quite long, most of it is code. In terms of results it is only a baseline, offered mainly for beginners to learn from; experts are welcome to leave comments and point out improvements. Thank you.
Complete code can be found at: https://github.com/bojone/baidu_dog_classifier