By 苏剑林 | January 27, 2019
Continuing the "Making Keras Cooler!" series, let's make Keras even more interesting. This time, we will focus on Keras losses, metrics, weights, and progress bars.
This kind of model has a standard input-output structure, where the loss is computed from the final output and the corresponding targets. However, for more complex models such as Autoencoders, GANs, and Seq2Seq, this setup is sometimes inconvenient, because the loss is not always just a function of the final output. Fortunately, newer versions of Keras support more flexible loss definitions. For example, we can write an Autoencoder like this:
from keras.layers import Input, Dense
from keras.models import Model
from keras import backend as K

x_in = Input(shape=(784,))
x = x_in
x = Dense(100, activation='relu')(x)
x = Dense(784, activation='sigmoid')(x)
model = Model(x_in, x)

# The loss is built directly from the model's tensors and attached
# via add_loss, instead of being passed to compile.
loss = K.mean((x - x_in)**2)
model.add_loss(loss)

model.compile(optimizer='adam')
model.fit(x_train, None, epochs=5)  # x_train: the input data, e.g. flattened MNIST images
The characteristics of the above approach are:
1. When calling compile, no loss is passed. Instead, the loss is defined separately before compile and attached to the model via add_loss. This allows arbitrarily flexible losses: for instance, the loss can depend on the outputs of intermediate layers, on the inputs, and so on (see the sketch after this list).
2. During fit, the original target data is now None, because all inputs and outputs have already been passed through Input. Readers can also refer to my previous article on Seq2Seq: "Playing with Keras: Seq2Seq for Automatic Title Generation". In that example, readers can more fully appreciate the convenience of this approach.
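To make point 1 concrete, here is a minimal sketch of a loss that also depends on an intermediate layer: the autoencoder above with an extra L1 penalty on the hidden activations. The penalty and its 0.01 weight are illustrative choices of mine, not part of the original example:

x_in = Input(shape=(784,))
x_h = Dense(100, activation='relu')(x_in)      # intermediate layer
x_out = Dense(784, activation='sigmoid')(x_h)
model = Model(x_in, x_out)

recon_loss = K.mean((x_out - x_in)**2)         # reconstruction term
sparsity_loss = 0.01 * K.mean(K.abs(x_h))      # illustrative penalty on the hidden layer
model.add_loss(recon_loss + sparsity_loss)

model.compile(optimizer='adam')
model.fit(x_train, None, epochs=5)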
Another kind of output is the metrics used to monitor training. Metrics here are indicators of model performance, such as accuracy or the F1 score. Keras has common metrics built in: as with the accuracy in the opening example, passing their names to model.compile makes them display dynamically during training.
Of course, you can also define new metrics by following the pattern of Keras's built-in ones. The problem is that, in the standard scheme, a metric is a function of the "output layer" and the "target values" only, whereas we often want to watch other quantities during training. For example, if I want to watch how the output of a particular intermediate layer evolves, the standard metric definition cannot express that.
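For comparison, a standard custom metric is just a function of (y_true, y_pred), like the mean_pred example from the Keras documentation; it never sees anything but the output layer and the targets:

from keras import backend as K

def mean_pred(y_true, y_pred):
    # Only the targets and the final output are available here.
    return K.mean(y_pred)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy', mean_pred])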
What can be done? We can look at the Keras source code and trace its metric-related methods. Ultimately, I discovered that metrics are actually defined within two lists. By modifying these two lists, we can flexibly display the metrics we need to observe. For example:
from keras.layers import Input, Dense
from keras.models import Model
from keras import backend as K

x_in = Input(shape=(784,))
x = x_in
x = Dense(100, activation='relu')(x)
x_h = x  # keep a handle on the intermediate layer's output
x = Dense(10, activation='softmax')(x)
model = Model(x_in, x)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# The key part starts here: register an extra quantity to display
model.metrics_names.append('x_h_norm')
model.metrics_tensors.append(K.mean(K.sum(x_h**2, 1)))

model.fit(x_train, y_train, epochs=5)
The code above shows how to monitor the average (squared) norm of an intermediate layer's output during training. As you can see, it revolves around two lists: model.metrics_names holds the metric names (strings), and model.metrics_tensors holds the corresponding tensors. Whatever you append there gets displayed during training. Note, however, that each addition must be a single scalar tensor paired with a single name.
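If you want several extra quantities, append them one pair at a time, before calling fit, and keep the two lists aligned. A sketch building on the model above; the kernel_norm example assumes model.layers[1] is the first Dense layer:

# One name and one scalar tensor per quantity.
model.metrics_names.append('x_h_max')
model.metrics_tensors.append(K.max(x_h))

model.metrics_names.append('kernel_norm')
model.metrics_tensors.append(K.sqrt(K.sum(model.layers[1].kernel**2)))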
Sometimes we need to apply constraints to weights. Common examples include normalization, such as L2-norm normalization, Spectral Normalization, etc., though it could be other constraints as well.
There are generally two ways to implement weight constraints. The first is post-processing, where the weights are directly handled after each gradient descent step:
\begin{equation}\begin{aligned}
&\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \varepsilon\nabla_{\boldsymbol{\theta}}L(\boldsymbol{\theta})\\
&\boldsymbol{\theta} \leftarrow \text{constraint}(\boldsymbol{\theta})
\end{aligned}\end{equation}

Clearly, this kind of processing has to be written into the optimizer's implementation, and it is in fact exactly what Keras does natively. It is simple to use: just set the kernel_constraint or bias_constraint argument when adding a layer. For details, see https://keras.io/constraints/.
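As a quick sketch of that built-in route: a unit-norm constraint can be attached when the layer is created, and a custom constraint is just a callable that maps a weight tensor to its constrained version (the L2Normalize class below is an illustrative custom constraint of mine, not a Keras built-in):

from keras import backend as K
from keras.layers import Dense
from keras.constraints import Constraint, unit_norm

# Built-in constraint: the kernel is renormalized after every update step.
layer = Dense(100, activation='relu', kernel_constraint=unit_norm())

# An illustrative hand-rolled constraint: normalize each kernel column to unit L2 norm.
class L2Normalize(Constraint):
    def __call__(self, w):
        return w / (K.epsilon() + K.sqrt(K.sum(K.square(w), axis=0, keepdims=True)))

layer = Dense(100, activation='relu', kernel_constraint=L2Normalize())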
The second method is pre-processing. We hope to process the weights before they are substituted into subsequent layer calculations. In other words, the constraint is part of the model rather than part of the optimizer. Keras itself does not provide native support for this scheme, but we can implement this requirement ourselves.
This is where the elegance of Keras's design really shows. A layer object works in two steps: build, which creates the weights, and call, which performs the computation. By default both happen automatically the first time the layer is applied to an input, but we can "graft" into this process and run the two steps by hand.
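Concretely, the two steps can be driven manually, which is the trick the wrapper below relies on. A minimal sketch:

from keras.layers import Input, Dense

x = Input(shape=(784,))
layer = Dense(100)

# Step 1: build() creates the weights for the given input shape.
layer.build((None, 784))
layer.built = True

# At this point layer.kernel exists and can be modified or replaced
# before it has been used in any computation.

# Step 2: calling the layer now skips build() and only does the computation.
y = layer(x)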
Below is an implementation of Spectral Normalization using this idea:
import numpy as np
from keras import backend as K


class SpectralNormalization:
    """A wrapper for layers to add Spectral Normalization.
    """
    def __init__(self, layer):
        self.layer = layer

    def spectral_norm(self, w, r=5):
        # Estimate the largest singular value of w with r power-iteration steps.
        w_shape = K.int_shape(w)
        in_dim = np.prod(w_shape[:-1]).astype(int)
        out_dim = w_shape[-1]
        w = K.reshape(w, (in_dim, out_dim))
        u = K.ones((1, in_dim))
        for i in range(r):
            v = K.l2_normalize(K.dot(u, w))
            u = K.l2_normalize(K.dot(v, K.transpose(w)))
        return K.sum(K.dot(K.dot(u, w), K.transpose(v)))

    def spectral_normalization(self, w):
        # Divide the weight by its estimated spectral norm.
        return w / self.spectral_norm(w)

    def __call__(self, inputs):
        with K.name_scope(self.layer.name):
            if not self.layer.built:
                # Step 1: build the weights manually.
                input_shape = K.int_shape(inputs)
                self.layer.build(input_shape)
                self.layer.built = True
                if self.layer._initial_weights is not None:
                    self.layer.set_weights(self.layer._initial_weights)
        if not hasattr(self.layer, 'spectral_normalization'):
            # Replace the weights with their normalized versions (only once).
            if hasattr(self.layer, 'kernel'):
                self.layer.kernel = self.spectral_normalization(self.layer.kernel)
            if hasattr(self.layer, 'gamma'):
                self.layer.gamma = self.spectral_normalization(self.layer.gamma)
            self.layer.spectral_normalization = True
        # Step 2: call the layer, which now computes with the normalized weights.
        return self.layer(inputs)
It is used like this:
x = SpectralNormalization(Dense(100, activation='relu'))(x)
Essentially, you create the layer as usual and then wrap it with SpectralNormalization to modify it. To see how it works, look at __call__: a newly created layer has built=False, so we run its build method by hand, normalize the freshly created weights, and overwrite the originals, as in the line self.layer.kernel = self.spectral_normalization(self.layer.kernel). Only then is the layer actually applied to the inputs.
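Putting it together, a small classifier with spectrally normalized layers might look like this (a sketch; the architecture is just for illustration):

x_in = Input(shape=(784,))
x = SpectralNormalization(Dense(100, activation='relu'))(x_in)
x = SpectralNormalization(Dense(10, activation='softmax'))(x)
model = Model(x_in, x)
model.compile(loss='categorical_crossentropy', optimizer='adam')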
Finally, I'll mention a more interesting gadget: Keras's built-in progress bar. In the early days, this built-in progress bar was one of the features that attracted many new users. Of course, progress bars are no longer a novelty; there is a very useful progress bar tool in Python called tqdm, which I introduced a long time ago: "Two Stunning Python Libraries: tqdm and retry".
However, if you prefer the style of the Keras progress bar or don't want to install tqdm separately, you can call the Keras progress bar in your own designs:
import time
from keras.utils import Progbar

pbar = Progbar(100)
for i in range(100):
    pbar.update(i + 1)
    time.sleep(0.1)
It displays the progress and the estimated time remaining. If you want to include more content in the progress bar, you can pass the values argument to update, for example:
import time
from keras.utils import Progbar

pbar = Progbar(100)
for i in range(100):
    pbar.update(i + 1, values=[('something', i - 10)])
    time.sleep(0.1)
Note, however, that the values here use a moving average because this progress bar was primarily designed by Keras for metrics. If you do not want it to update with a moving average, use:
import time
from keras.utils import Progbar

pbar = Progbar(100, stateful_metrics=['something'])
for i in range(100):
    pbar.update(i + 1, values=[('something', i - 10)])
    time.sleep(0.1)
More parameters can be found in its documentation, or by reading the source code directly. Overall it is far less powerful than tqdm, but it is a neat little utility that is nice to reach for occasionally.
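For instance, in the Keras 2.x versions I have used, the constructor also accepts a width argument controlling the bar's character width; treat the exact signature as version-dependent and check your installed source if in doubt:

import time
from keras.utils import Progbar

# width is assumed from the Keras 2.x signature; verify against your version.
pbar = Progbar(100, width=50, stateful_metrics=['something'])
for i in range(100):
    pbar.update(i + 1, values=[('something', i - 10)])
    time.sleep(0.1)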
I have shared a few more fancy Keras tricks here, and I hope they are helpful. Using Keras flexibly is an interesting exercise in its own right. Keras may not be the "best" deep learning framework, but it is surely the most elegant one (as a wrapper), and quite possibly without equal.
Deep Learning is short, I use Keras~