“Make Keras a Bit Cooler!”: Ingenious Layers and Fancy Callbacks

By 苏剑林 | August 06, 2018

Keras, My Companion Over the Years

Reflecting on the past two or three years since I entered the field of machine learning, Keras has always been by my side. If I hadn't come across a framework as easy to use as Keras when I first dove in, one that let me implement my ideas quickly, I'm not sure I would have had the perseverance to stick with it. After all, back then the world belonged to the likes of Theano, Pylearn, Caffe, and Torch, which still read like indecipherable tomes to me even today.

Later, to broaden my horizons, I also spent some time learning TensorFlow and wrote several programs in pure TensorFlow, but no matter what, I still couldn't let go of Keras. As my understanding of Keras deepened, especially after spending a little time studying its source code, I discovered that Keras is not as "inflexible" as everyone complains. In fact, Keras's elegant encapsulation makes it easy to implement many complex features. I increasingly feel that Keras is like an exquisite work of art, one that fully reflects the ingenuity of its developers.

This article introduces how to customize models in Keras in a clean, standard way. Relatively speaking, this is advanced Keras content; readers who have just gotten started can safely skip it for now.

Customizing Layers

Here we introduce custom layers in Keras along with some application techniques, through which we can appreciate the ingenuity of Keras's layer design.

Basic Definition Method

In Keras, the simplest way to customize a layer is through the Lambda layer:

from keras.layers import *
from keras import backend as K

x_in = Input(shape=(10,))
x = Lambda(lambda x: x+2)(x_in) # Add 2 to the input

Sometimes we want to distinguish between the training and testing phases; for example, adding some noise to the input during training but leaving it untouched at test time. This can be implemented with K.in_train_phase, for instance:

def add_noise_in_train(x):
    x_ = x + K.random_normal(shape=K.shape(x)) # Add standard Gaussian noise
    return K.in_train_phase(x_, x)

x_in = Input(shape=(10,))
x = Lambda(add_noise_in_train)(x_in) # Add Gaussian noise during training only; leave the input unchanged at test time

Of course, the Lambda layer is only suitable when no trainable parameters need to be added. If the functionality you want requires adding new weights to the model, then you must write a custom Layer. This is actually not complicated; compared to a Lambda layer it is only a few more lines of code, and the official documentation explains it clearly: https://keras.io/layers/writing-your-own-keras-layers/

Here is an example carried over from that page:

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim # You can define custom attributes for easy access
        super(MyLayer, self).__init__(**kwargs) # Mandatory

    def build(self, input_shape):
        # Add trainable parameters
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape) # Be sure to call this at the end

    def call(self, x):
        # Define the function, equivalent to the function in a Lambda layer
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        # Calculate the output shape. If input and output shapes are identical, this can be omitted, but otherwise, it's best to include it.
        return (input_shape[0], self.output_dim)
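
Such a custom layer is then used just like a built-in one: instantiate it and call it on a tensor. A minimal usage sketch (the shapes here are purely illustrative):

x_in = Input(shape=(10,))
y = MyLayer(output_dim=5)(x_in) # y now has shape (batch_size, 5) and carries the trainable kernel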

Layers with Multiple Outputs

Almost all the layers we normally encounter are single-output, including every built-in layer in Keras: they take one or more inputs and return a single tensor as output. So can a Keras layer have two outputs? The answer is yes, but the output shapes must be declared explicitly via compute_output_shape. For example, the following layer simply splits its input into two halves and returns both at once.

class SplitVector(Layer):

    def __init__(self, **kwargs):
        super(SplitVector, self).__init__(**kwargs)

    def call(self, inputs):
        # Slice the tensor along the second dimension and return a list
        in_dim = K.int_shape(inputs)[-1]
        return [inputs[:, :in_dim//2], inputs[:, in_dim//2:]]

    def compute_output_shape(self, input_shape):
        # output_shape must also be a corresponding list
        in_dim = input_shape[-1]
        return [(None, in_dim//2), (None, in_dim-in_dim//2)]

x1, x2 = SplitVector()(x_in) # Usage
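
The two tensors returned this way can then be routed into separate branches of the model as usual. A small sketch, assuming the 10-dimensional input from the earlier examples (the branch widths are arbitrary):

x_in = Input(shape=(10,))
x1, x2 = SplitVector()(x_in)
h1 = Dense(8, activation='relu')(x1) # process the first half
h2 = Dense(8, activation='relu')(x2) # process the second half
h = Concatenate()([h1, h2]) # merge the two branches again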

Combining Layers with Loss

Readers familiar with the article "Customizing Complex Loss Functions in Keras" know that the basic form of a loss in Keras is a function of y_true and y_pred. In more complex situations, however, the loss is not merely a function of the targets and predictions; it may also involve the model's weights in more elaborate computations.

Here we again take center loss as an example and show how to implement it via a custom layer.

class Dense_with_Center_loss(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(Dense_with_Center_loss, self).__init__(**kwargs)

    def build(self, input_shape):
        # Add trainable parameters
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='glorot_normal',
                                      trainable=True)
        self.bias = self.add_weight(name='bias',
                                    shape=(self.output_dim,),
                                    initializer='zeros',
                                    trainable=True)
        self.centers = self.add_weight(name='centers',
                                       shape=(self.output_dim, input_shape[1]),
                                       initializer='glorot_normal',
                                       trainable=True)

    def call(self, inputs):
        # For center loss, the output of this layer is the same as that of an ordinary Dense layer,
        # i.e. a plain matrix multiplication plus a bias
        self.inputs = inputs # Save the inputs so that the loss below can access the features
        return K.dot(inputs, self.kernel) + self.bias

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

    def loss(self, y_true, y_pred, lamb=0.5):
        # Define the complete loss
        y_true = K.cast(y_true, 'int32') # Ensure y_true's dtype is int32
        crossentropy = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
        centers = K.gather(self.centers, y_true[:, 0]) # Retrieve sample centers
        center_loss = K.sum(K.square(centers - self.inputs), axis=1) # Calculate center loss
        return crossentropy + lamb * center_loss

from keras.models import Model

f_size = 2 # Dimension of the feature layer

x_in = Input(shape=(784,))
f = Dense(f_size)(x_in)

dense_center = Dense_with_Center_loss(10)
output = dense_center(f)

model = Model(x_in, output)
model.compile(loss=dense_center.loss,
              optimizer='adam',
              metrics=['sparse_categorical_accuracy'])

# Here y_train is the integer ID of the class, no need to convert to one-hot
model.fit(x_train, y_train, epochs=10)
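
One thing to keep in mind: since the loss uses from_logits=True, the layer outputs unnormalized logits rather than probabilities, so at prediction time it is enough to take the argmax (or apply a softmax yourself if probabilities are needed). A quick sketch, assuming x_test is available:

logits = model.predict(x_test)
y_pred = logits.argmax(axis=1) # the argmax of the logits equals the argmax of the softmax probabilities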

Fancy Callbacks

In addition to modifying the model, we might also do many things during the training process. For example, after each epoch ends, calculate metrics on the validation set, save the best model, or perhaps reduce the learning rate after a certain number of epochs, or modify regularization parameters, etc. All of these can be implemented through callbacks.

Official Callbacks page: https://keras.io/callbacks/

Saving the Best Model

In Keras, the most convenient way to keep the best model based on validation set metrics is through the built-in ModelCheckpoint, for example:

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(filepath='./best_model.weights',
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)

model.fit(x_train,
          y_train,
          epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint])

However, while this approach is simple, it has a significant drawback: the quantity being monitored is determined by the metrics passed to compile, and a custom metric in Keras must be written as a tensor operation. In other words, if the metric you care about cannot be expressed as a tensor operation (the BLEU score, for example), then it cannot be written as a metric function and this approach does not apply.

Therefore, the universal solution is to write your own callback and compute whatever you want inside it. For example:

import numpy as np
from keras.callbacks import Callback

def evaluate(): # Evaluation function
    pred = model.predict(x_test)
    return np.mean(pred.argmax(axis=1) == y_test) # Calculate whatever you want

# Define the Callback class, calculate validation set acc, and save the best model
class Evaluate(Callback):

    def __init__(self):
        self.accs = []
        self.highest = 0.

    def on_epoch_end(self, epoch, logs=None):
        acc = evaluate()
        self.accs.append(acc)
        if acc >= self.highest: # Save best model weights
            self.highest = acc
            model.save_weights('best_model.weights')

        # Print whatever information you want here
        print('acc: %s, highest: %s' % (acc, self.highest))

evaluator = Evaluate()
model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[evaluator])

Modifying Hyperparameters

During training, you may also need to fine-tune hyperparameters. For example, a common requirement is to adjust the learning rate based on the epoch. This can be easily implemented through LearningRateScheduler, which is also one of the callbacks.

from keras.callbacks import LearningRateScheduler

def lr_schedule(epoch):
    # Return different learning rates based on the epoch
    if epoch < 50:
        lr = 1e-2
    elif epoch < 80:
        lr = 1e-3
    else:
        lr = 1e-4
    return lr

lr_scheduler = LearningRateScheduler(lr_schedule)

model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[evaluator, lr_scheduler])

What about other hyperparameters, such as the lamb in the center loss above, or a similar regularization coefficient? In that case we need to define the hyperparameter as a backend variable and write a custom callback that updates its value dynamically. Take, for instance, a loss I defined previously:

def mycrossentropy(y_true, y_pred, e=0.1):
    loss1 = K.categorical_crossentropy(y_true, y_pred) # Ordinary cross-entropy
    loss2 = K.categorical_crossentropy(K.ones_like(y_pred)/nb_classes, y_pred) # Cross-entropy against the uniform distribution (nb_classes is the number of classes)
    return (1-e)*loss1 + e*loss2

If you want to dynamically change the parameter e, it can be changed to:

e = K.variable(0.1) # e is now a backend variable whose value can be updated during training

def mycrossentropy(y_true, y_pred):
    loss1 = K.categorical_crossentropy(y_true, y_pred)
    loss2 = K.categorical_crossentropy(K.ones_like(y_pred)/nb_classes, y_pred)
    return (1-e)*loss1 + e*loss2

model.compile(loss=mycrossentropy,
              optimizer='adam')

class callback4e(Callback):
    def __init__(self, e):
        self.e = e
    def on_epoch_end(self, epoch, logs=None):
        if epoch >= 100: # Set to 0.01 after 100 epochs
            K.set_value(self.e, 0.01)

model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[callback4e(e)])

Note that the Callback class provides six hooks for different stages of training: on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, and on_train_end. Each is executed at the stage its name suggests, and combining them allows very complex behavior to be implemented. For example, "warmup" refers to not training at the default learning rate right away; instead, during the first few epochs the learning rate is gradually raised from zero to the default value, which can be understood as letting the model settle into a better initialization. Reference code:

class Evaluate(Callback):
    def __init__(self):
        self.num_passed_batchs = 0
        self.warmup_epochs = 10
    def on_batch_begin(self, batch, logs=None):
        # params are some parameters automatically passed to the Callback by the model
        if self.params['steps'] is None:
            self.steps_per_epoch = np.ceil(1. * self.params['samples'] / self.params['batch_size'])
        else:
            self.steps_per_epoch = self.params['steps']
        if self.num_passed_batchs < self.steps_per_epoch * self.warmup_epochs:
            # In the first 10 epochs, the learning rate increases linearly from zero to 0.001
            K.set_value(self.model.optimizer.lr,
                        0.001 * (self.num_passed_batchs + 1) / self.steps_per_epoch / self.warmup_epochs)
        self.num_passed_batchs += 1
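
The warmup callback defined above is then passed to fit like any other callback; a minimal sketch (it can of course be combined with the evaluator and scheduler from earlier):

warmup = Evaluate() # instantiate the warmup callback defined just above
model.fit(x_train,
          y_train,
          epochs=10,
          callbacks=[warmup])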

Infinite Possibilities of Keras

There are many other noteworthy techniques in Keras, such as using model.add_loss to add losses flexibly, nesting model calls, or using Keras purely as a thin high-level API on top of TensorFlow; I won't list them all here. Readers who have questions or are interested are welcome to leave a comment for discussion.
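
That said, to give a flavor of the first of these: model.add_loss lets you attach a loss built directly from internal tensors, without going through y_true and y_pred at all. A minimal sketch in an autoencoder-style setup (the layer sizes and the 1e-3 coefficient are purely illustrative):

from keras.layers import Input, Dense
from keras.models import Model
from keras import backend as K

x_in = Input(shape=(784,))
h = Dense(32, activation='relu')(x_in)
x_out = Dense(784, activation='sigmoid')(h)

model = Model(x_in, x_out)
model.add_loss(K.mean(K.square(x_in - x_out))) # reconstruction loss built purely from tensors
model.add_loss(1e-3 * K.mean(K.abs(h)))        # an extra activity penalty, also added via add_loss
model.compile(optimizer='adam')                # no external loss function is needed

model.fit(x_train, epochs=10) # no targets are required, since all losses were added internally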

Usually we assume that a highly encapsulated library like Keras must lack flexibility, but that is not actually the case. Keras does not simply call high-level functions that already exist in TensorFlow or Theano; it only wraps a set of basic operations through its backend, and everything else (the various layers, optimizers, and so on) is reimplemented on top of that backend. It is precisely because of this that it can switch between different backends.

Given that it goes this far, Keras's flexibility is beyond dispute. That flexibility, however, is hard to convey through documentation and ordinary examples; most of the time you need to read the source code to appreciate how polished Keras's design really is. To me, implementing complex models with Keras is both a challenge and a kind of artistic creation, and when you succeed, you will be intoxicated by the work of art you have created.