1. Building Custom Layers
1.1 Building the basic model framework
First, let's define a fully connected layer to get a basic feel for the idea: a layer should encapsulate some weights and some basic computation.
import tensorflow as tf
from tensorflow.keras import layers

class Linear(layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units),
                                                  dtype='float32'),
                             trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(initial_value=b_init(shape=(units,),
                                                  dtype='float32'),
                             trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
y = linear_layer(x)
print(y)
tf.Tensor(
[[ 0.06982724 -0.03397265 -0.10616937 -0.00868062]
 [ 0.06982724 -0.03397265 -0.10616937 -0.00868062]], shape=(2, 4), dtype=float32)
The implementation relies on Layer, the base class of all Keras layers. Here call is the most important method, since it implements the layer's computation: the layer's __call__ magic method receives the inputs and forwards them to call, which does the actual work.
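To make this delegation concrete, here is a quick check (our own addition, reusing linear_layer and x from above); for this simple layer both paths compute the same result, although __call__ also performs Keras-side bookkeeping:

y1 = linear_layer(x)       # the usual way: goes through Layer.__call__
y2 = linear_layer.call(x)  # calls our method directly, skipping the bookkeeping
print(tf.reduce_all(tf.equal(y1, y2)))  # tf.Tensor(True, shape=(), dtype=bool)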
When defining a layer's weights, we more commonly use add_weight, which is more convenient, as shown below:
class Linear(layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(units,),
                                 initializer='zeros',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
y = linear_layer(x)
print(y)
tf.Tensor(
[[ 0.00500566  0.09818187 -0.04121634  0.08368425]
 [ 0.00500566  0.09818187 -0.04121634  0.08368425]], shape=(2, 4), dtype=float32)
When building a layer we can also decide whether a weight should be trained: passing trainable=False to tf.Variable or add_weight excludes that weight from training. The weights can be inspected through layer.weights, layer.trainable_weights, and layer.non_trainable_weights.
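As a minimal sketch of a non-trainable weight (the class name ComputeSum is our own, for illustration), the layer below accumulates the running sum of its inputs in a variable that gradient descent never touches:

class ComputeSum(layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        # trainable=False: this variable is tracked but excluded from training.
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                                 trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

my_sum = ComputeSum(2)
print(my_sum(tf.ones((2, 2))))            # [2. 2.]
print(len(my_sum.weights))                # 1
print(len(my_sum.trainable_weights))      # 0
print(len(my_sum.non_trainable_weights))  # 1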
1.2 Deferring weight creation until the input shape is known
In many cases we do not know the size of the input in advance, and we would like to create the weights lazily, some time after instantiating the layer, once that value becomes known. To do this, we create the layer's weights in the layer's build(input_shape) method. The built-in __call__ method of Keras layers runs build automatically the first time the layer is called.
class Linear(layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

x = tf.ones((2, 2))
linear_layer = Linear(4)
y = linear_layer(x)
print(y)
tf.Tensor(
[[ 0.07487861 -0.06351871  0.02886802 -0.10637227]
 [ 0.07487861 -0.06351871  0.02886802 -0.10637227]], shape=(2, 4), dtype=float32)
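A quick check of the lazy behavior (our own addition): before the first call the layer owns no variables, and the first call triggers build:

linear_layer = Linear(4)
print(linear_layer.weights)        # [] -- build has not run yet
_ = linear_layer(tf.ones((2, 2)))
print(len(linear_layer.weights))   # 2 -- w and b were created by build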
1.3 Composing layers recursively
If you assign a Layer instance as an attribute of another Layer, the outer layer starts tracking the weights of the inner layer. Such sublayers are typically created in the __init__ method.
class MLPBlock(layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(1)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)

mlp = MLPBlock()
y = mlp(tf.ones(shape=(3, 64)))
print('weights:', len(mlp.weights))
print('trainable weights:', len(mlp.trainable_weights))
weights: 6
trainable weights: 6
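The six variables are exactly the w and b pairs of the three nested Linear sublayers, which we can verify (our own check):

# Two variables per sublayer: w and b for each of the three Linear layers.
print([tuple(w.shape) for w in mlp.weights])
# [(64, 32), (32,), (32, 32), (32,), (32, 1), (1,)]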
1.4 Creating loss tensors in a layer
Layers can create losses during the forward pass. We can create loss tensors that will later be used when writing the training loop; this is done by calling self.add_loss(value):
class LossLayer(layers.Layer):
    def __init__(self, rate=1e-2):
        super(LossLayer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

class OuterLayer(layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.loss_fun = LossLayer(1e-2)

    def call(self, inputs):
        return self.loss_fun(inputs)

layer = OuterLayer()
assert len(layer.losses) == 0  # No losses yet since the layer has never been called
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # We created one loss value
# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # This is the loss created during the call above
In addition, the losses property also contains regularization losses created for the weights of any inner layer.
class OuterLayer(layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.dense = layers.Dense(32,
                                  kernel_regularizer=tf.keras.regularizers.l2(1e-3))

    def call(self, inputs):
        return self.dense(inputs)

layer = OuterLayer()
_ = layer(tf.zeros((1, 1)))
# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
print(layer.losses)
[<tf.Tensor: id=43178, shape=(), dtype=float32, numpy=0.0017348012>]
These losses should be taken into account when writing a training loop, as shown below:
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset.
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = layer(x_batch_train)  # Logits for this minibatch
        # Loss value for this minibatch
        loss_value = loss_fn(y_batch_train, logits)
        # Add extra losses created during this forward pass:
        loss_value += sum(layer.losses)
    grads = tape.gradient(loss_value, layer.trainable_weights)
    optimizer.apply_gradients(zip(grads, layer.trainable_weights))
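Note that when training with the built-in compile()/fit() loop instead of a custom loop, Keras sums these add_loss values into the total loss automatically. A minimal sketch (reusing the regularized OuterLayer from above; the input and target shapes are our own choice):

inputs = tf.keras.Input(shape=(1,))
outputs = OuterLayer()(inputs)  # Dense(32) with an l2 kernel regularizer
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
# The regularization loss is added to the training loss by fit() itself.
model.fit(tf.zeros((4, 1)), tf.zeros((4, 32)), epochs=1, verbose=0)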
1.5 Optionally enabling serialization on layers
If you want a custom layer to be serializable as part of a Functional model, you can optionally implement a get_config method:
class Linear(layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        return {'units': self.units}

x = tf.ones((2, 2))
layer = Linear(4)
print(layer(x))

# Now you can recreate the layer from its config:
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)
print(new_layer(x))
tf.Tensor(
[[ 0.04059407  0.03458515 -0.15687762 -0.07868339]
 [ 0.04059407  0.03458515 -0.15687762 -0.07868339]], shape=(2, 4), dtype=float32)
{'units': 4}
tf.Tensor(
[[-0.05122885 -0.01253471 -0.00830858 -0.05356697]
 [-0.05122885 -0.01253471 -0.00830858 -0.05356697]], shape=(2, 4), dtype=float32)
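With get_config in place, the layer can also round-trip as part of a Functional model. A brief sketch (our own usage example; custom_objects tells Keras how to resolve the Linear class when rebuilding from the config):

inputs = tf.keras.Input(shape=(2,))
outputs = Linear(4)(inputs)
model = tf.keras.Model(inputs, outputs)

# Serialize the whole model to a config dict and rebuild it:
config = model.get_config()
new_model = tf.keras.Model.from_config(config, custom_objects={'Linear': Linear})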
Note that the __init__ method of the base Layer class accepts some keyword arguments, in particular name and dtype. It is good practice to pass these arguments on to the parent class in __init__ and to include them in the layer config:
class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super(Linear, self).get_config()
        config.update({'units': self.units})
        return config

layer = Linear(64)
print(layer.get_config())
{'name': 'linear_40', 'trainable': True, 'dtype': 'float32', 'units': 64}
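A short usage sketch of passing these keyword arguments through (our own example; the exact dtype representation in the config may vary between Keras versions):

layer = Linear(units=64, name='my_linear', dtype='float64')
print(layer.get_config())
# e.g. {'name': 'my_linear', 'trainable': True, 'dtype': 'float64', 'units': 64}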