By 苏剑林 | August 06, 2018
Reflecting on the two or three years since I entered the field of machine learning, Keras has been by my side all along. If I hadn't come across a framework as easy to use as Keras when I first fell into this pit, one that let me implement my ideas quickly, I'm not sure I would have had the perseverance to stick with it. After all, back then the world belonged to the likes of Theano, Pylearn, Caffe, and Torch, which still read like indecipherable "books from heaven" to me even today.
Later, to broaden my horizons, I also spent some time learning TensorFlow and wrote several programs in pure TensorFlow, but no matter what, I still couldn't let go of Keras. As my understanding of Keras deepened, especially after spending a little time digging into the source code, I found that Keras is not as "inflexible" as everyone complains it is. In fact, Keras's exquisite encapsulation makes it easy to implement many complex features. I increasingly feel that Keras is a finely crafted work of art, one that fully reflects the depth of its developers' craftsmanship.
This article introduces some aspects of customizing models in Keras. Relatively speaking, this is advanced Keras content; readers who are just getting started can safely skip it for now.
Here we introduce custom layers in Keras and some techniques for applying them; along the way we can appreciate the ingenuity of Keras's layer design.
In Keras, the simplest way to customize a layer is through the Lambda layer:
from keras.layers import *
from keras import backend as K
x_in = Input(shape=(10,))
x = Lambda(lambda x: x+2)(x_in) # Add 2 to the input
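As a small aside (my own sketch, not part of the original example): if the Lambda function changes the tensor's shape and you are on a backend that cannot infer the new shape (such as Theano), you can pass output_shape explicitly:
x_sum = Lambda(lambda x: K.sum(x, axis=1, keepdims=True),
               output_shape=(1,))(x_in) # sum over the feature axis, per-sample output shape is (1,)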
Sometimes we want to distinguish between the training phase and the testing phase, for example adding some noise to the input during training but not during testing. This can be implemented with K.in_train_phase, for instance:
def add_noise_in_train(x):
    x_ = x + K.random_normal(shape=K.shape(x)) # Add standard Gaussian noise
    return K.in_train_phase(x_, x)
x_in = Input(shape=(10,))
x = Lambda(add_noise_in_train)(x_in) # Add Gaussian noise during training, remove it during testing
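As a quick sanity check (a sketch of my own, not from the original post), you can verify that the noise only appears at training time: model.predict runs with learning phase 0, so the layer behaves as the identity there:
from keras.models import Model
import numpy as np

model = Model(x_in, x)
data = np.random.random((3, 10))
print(model.predict(data) - data) # ~ all zeros: the noise branch is skipped at test time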
Of course, the Lambda layer is only suitable for situations where no training parameters need to be added. If the functionality you want to implement requires adding new parameters to the model, then you must use a custom Layer. In fact, this is not complicated; compared to the Lambda layer, it’s just a few more lines of code. The official documentation explains it very clearly: https://keras.io/layers/writing-your-own-keras-layers/
Here is an example carried over from that page:
class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim # You can define custom attributes for easy access
        super(MyLayer, self).__init__(**kwargs) # Mandatory
    def build(self, input_shape):
        # Add trainable parameters
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
    def call(self, x):
        # Define the function, equivalent to the function in a Lambda layer
        return K.dot(x, self.kernel)
    def compute_output_shape(self, input_shape):
        # Compute the output shape. If the input and output shapes are identical, this can be omitted; otherwise it's best to include it.
        return (input_shape[0], self.output_dim)
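As a quick usage sketch (the dimensions here are my own illustrative choice, not from the official example), the custom layer is called just like any built-in layer:
x_in = Input(shape=(10,))
y = MyLayer(16)(x_in) # learns a 10x16 kernel and projects the input to 16 dimensions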
Almost all the layers we usually encounter are single-output, including all the built-in layers in Keras, which take one or more inputs and then return a single result as an output. So, can Keras define a layer with dual outputs? The answer is yes, but you must clearly define the output_shape. For example, the following layer simply splits the input into two halves and returns both simultaneously.
class SplitVector(Layer):
    def __init__(self, **kwargs):
        super(SplitVector, self).__init__(**kwargs)
    def call(self, inputs):
        # Slice the tensor along the second (feature) dimension and return a list
        in_dim = K.int_shape(inputs)[-1]
        return [inputs[:, :in_dim//2], inputs[:, in_dim//2:]]
    def compute_output_shape(self, input_shape):
        # output_shape must be a corresponding list
        in_dim = input_shape[-1]
        return [(None, in_dim//2), (None, in_dim-in_dim//2)]
x1, x2 = SplitVector()(x_in) # Usage
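To make the dual-output usage concrete, here is a minimal sketch of my own (the layer sizes and the concatenation at the end are illustrative assumptions, not part of the original example): each half can go through its own branch and be merged back later.
from keras.models import Model

x_in = Input(shape=(10,))
x1, x2 = SplitVector()(x_in)
y1 = Dense(8, activation='relu')(x1) # process the first half
y2 = Dense(8, activation='relu')(x2) # process the second half
y = Concatenate()([y1, y2])          # merge the two branches back together
model = Model(x_in, y)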
Readers familiar with the article "Customizing Complex Loss Functions in Keras" know that the basic form of a loss in Keras is a function of y_true and y_pred. In more complex situations, however, the loss is not just a function of the predicted and target values; it can also involve the model's weights in more elaborate computations.
Here, using center loss as an example again, we introduce a writing method based on custom layers.
class Dense_with_Center_loss(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(Dense_with_Center_loss, self).__init__(**kwargs)
    def build(self, input_shape):
        # Add trainable parameters
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='glorot_normal',
                                      trainable=True)
        self.bias = self.add_weight(name='bias',
                                    shape=(self.output_dim,),
                                    initializer='zeros',
                                    trainable=True)
        self.centers = self.add_weight(name='centers',
                                       shape=(self.output_dim, input_shape[1]),
                                       initializer='glorot_normal',
                                       trainable=True)
    def call(self, inputs):
        # For center loss, the forward output is still the same as that of Dense,
        # i.e. an ordinary matrix multiplication plus a bias
        self.inputs = inputs
        return K.dot(inputs, self.kernel) + self.bias
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
    def loss(self, y_true, y_pred, lamb=0.5):
        # Define the complete loss
        y_true = K.cast(y_true, 'int32') # Ensure y_true's dtype is int32
        crossentropy = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
        centers = K.gather(self.centers, y_true[:, 0]) # Retrieve the centers of the samples' classes
        center_loss = K.sum(K.square(centers - self.inputs), axis=1) # Compute the center loss
        return crossentropy + lamb * center_loss
from keras.models import Model

f_size = 2

x_in = Input(shape=(784,))
f = Dense(f_size)(x_in)

dense_center = Dense_with_Center_loss(10)
output = dense_center(f)

model = Model(x_in, output)
model.compile(loss=dense_center.loss,
              optimizer='adam',
              metrics=['sparse_categorical_accuracy'])

# Here y_train is the integer class ID; there is no need to convert it to one-hot
model.fit(x_train, y_train, epochs=10)
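One small note: because the loss uses from_logits=True, the model's raw outputs are logits rather than probabilities. A minimal post-processing sketch of my own (not from the original post) for prediction could look like this:
import numpy as np

logits = model.predict(x_test)
logits -= logits.max(axis=1, keepdims=True) # subtract the row max for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True) # softmax
pred_classes = probs.argmax(axis=1) # argmax gives the same result on logits or probabilities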
In addition to modifying the model, we might also do many things during the training process. For example, after each epoch ends, calculate metrics on the validation set, save the best model, or perhaps reduce the learning rate after a certain number of epochs, or modify regularization parameters, etc. All of these can be implemented through callbacks.
Official Callbacks page: https://keras.io/callbacks/
In Keras, the most convenient way to keep the best model based on validation set metrics is through the built-in ModelCheckpoint, for example:
from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(filepath='./best_model.weights',
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)

model.fit(x_train,
          y_train,
          epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint])
However, while this approach is simple, it has a significant drawback: the metrics it can monitor are determined by the metrics passed to compile, and in Keras a custom metric must be written as a tensor operation. In other words, if the metric you care about cannot be expressed as a tensor operation (the BLEU score, for example), then it cannot be written as a metric function and this approach cannot be used.
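For contrast, here is what a metric that can be expressed purely as tensor operations looks like (an illustrative sketch of my own, not from the original post):
def binary_acc(y_true, y_pred):
    # every step is a backend tensor op, so this can be passed directly to metrics=[...]
    return K.mean(K.cast(K.equal(y_true, K.round(y_pred)), K.floatx()))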
Therefore, a universal solution emerges: write your own callback—calculate whatever you want. For example:
from keras.callbacks import Callback
import numpy as np

def evaluate(): # Evaluation function
    pred = model.predict(x_test)
    return np.mean(pred.argmax(axis=1) == y_test) # Calculate whatever you want

# Define the Callback class: compute the validation accuracy and save the best model
class Evaluate(Callback):
    def __init__(self):
        self.accs = []
        self.highest = 0.
    def on_epoch_end(self, epoch, logs=None):
        acc = evaluate()
        self.accs.append(acc)
        if acc >= self.highest: # Save the best model weights
            self.highest = acc
            model.save_weights('best_model.weights')
        # Run whatever else you want here
        print('acc: %s, highest: %s' % (acc, self.highest))

evaluator = Evaluate()
model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[evaluator])
During training, you may also need to fine-tune hyperparameters. For example, a common requirement is to adjust the learning rate based on the epoch. This can be easily implemented through LearningRateScheduler, which is also one of the callbacks.
from keras.callbacks import LearningRateScheduler

def lr_schedule(epoch):
    # Return a different learning rate depending on the epoch
    if epoch < 50:
        lr = 1e-2
    elif epoch < 80:
        lr = 1e-3
    else:
        lr = 1e-4
    return lr

lr_scheduler = LearningRateScheduler(lr_schedule)
model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[evaluator, lr_scheduler])
What if it's some other hyperparameter? For example, the lamb in the previous center loss, or the weight of a similar regularization term. In that case, we need to define the hyperparameter as a backend variable (K.variable) and then write a custom callback that assigns it a new value dynamically. For instance, here is a loss I defined previously:
def mycrossentropy(y_true, y_pred, e=0.1):
    # nb_classes is the total number of classes, assumed to be defined elsewhere
    loss1 = K.categorical_crossentropy(y_true, y_pred)
    loss2 = K.categorical_crossentropy(K.ones_like(y_pred)/nb_classes, y_pred) # cross entropy against the uniform distribution
    return (1-e)*loss1 + e*loss2
If you want to dynamically change the parameter e, it can be changed to:
e = K.variable(0.1)

def mycrossentropy(y_true, y_pred):
    loss1 = K.categorical_crossentropy(y_true, y_pred)
    loss2 = K.categorical_crossentropy(K.ones_like(y_pred)/nb_classes, y_pred)
    return (1-e)*loss1 + e*loss2

model.compile(loss=mycrossentropy,
              optimizer='adam')
class callback4e(Callback):
    def __init__(self, e):
        self.e = e
    def on_epoch_end(self, epoch, logs={}):
        if epoch >= 100: # Set e to 0.01 after 100 epochs
            K.set_value(self.e, 0.01)

model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[callback4e(e)])
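The same pattern applies to the lamb weight of the center loss from earlier. The following is a minimal sketch of my own (the schedule values and the wrapper function are illustrative assumptions), reusing the loss method of the Dense_with_Center_loss layer with a backend variable as the weight:
lamb = K.variable(0.1)

def center_loss_with_dynamic_lamb(y_true, y_pred):
    # Wrap the layer's loss so that lamb is a variable instead of a constant
    return dense_center.loss(y_true, y_pred, lamb=lamb)

class Callback4Lamb(Callback):
    def __init__(self, lamb):
        self.lamb = lamb
    def on_epoch_end(self, epoch, logs=None):
        if epoch >= 5: # Increase the center loss weight after 5 epochs (illustrative schedule)
            K.set_value(self.lamb, 0.5)

model.compile(loss=center_loss_with_dynamic_lamb, optimizer='adam')
model.fit(x_train, y_train, epochs=10, callbacks=[Callback4Lamb(lamb)])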
Note that the Callback class provides six hooks that run at different stages: on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin and on_train_end (the names make it easy to tell when each one fires), and they can be combined to implement very complex behaviour. Take "warmup" as an example: after the default learning rate is set, it is not used for training right away; instead, during the first few epochs the learning rate is increased slowly from zero up to the default value. This can be understood as letting the model settle into a better initialization first. Reference code:
class Evaluate(Callback):
    def __init__(self):
        self.num_passed_batchs = 0
        self.warmup_epochs = 10
    def on_batch_begin(self, batch, logs=None):
        # self.params holds parameters that the model passes to the Callback automatically
        if self.params['steps'] is None:
            self.steps_per_epoch = np.ceil(1. * self.params['samples'] / self.params['batch_size'])
        else:
            self.steps_per_epoch = self.params['steps']
        if self.num_passed_batchs < self.steps_per_epoch * self.warmup_epochs:
            # During the first 10 epochs, the learning rate increases linearly from zero to 0.001
            K.set_value(self.model.optimizer.lr,
                        0.001 * (self.num_passed_batchs + 1) / self.steps_per_epoch / self.warmup_epochs)
        self.num_passed_batchs += 1
There are many other noteworthy techniques in Keras, such as flexibly adding losses via model.add_loss, calling models inside other models (nested models), or using Keras purely as a thin high-level API over TensorFlow, and so on. I won't list them all here; readers who have questions or are interested are welcome to leave a comment for discussion.
Usually we assume that a highly encapsulated library like Keras must lack flexibility, but that's not actually the case. Keras does not simply call existing high-level functions in TensorFlow or Theano; it encapsulates only a set of basic operations through its backend and then reimplements everything else (the various layers, optimizers, and so on) on top of that backend! It is precisely because of this that it can switch between different backends.
Having gone to such lengths, Keras's flexibility is beyond dispute. That flexibility, however, is hard to convey through the documentation and ordinary examples; most of the time you need to read the source code to appreciate how polished Keras's way of writing things is. To me, implementing a complex model in Keras is both a challenge and a kind of artistic creation, and when you succeed, you will be intoxicated by the work of art you have created.