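6. Parameter Settings
tf.data.Dataset.from_tensor_slices: slices the input tensor and the target tensor along their first dimension and combines them into one TensorSliceDataset object, so that each element of the dataset is a matching (input, target) pair.
shuffle: randomly reorders the elements produced by from_tensor_slices. Only the order changes; the number of elements and their contents stay the same. The larger the buffer size, the more thorough the shuffle; a buffer at least as large as the dataset gives a full shuffle.
dataset.batch: groups the shuffled elements into batches so they can be processed together. With a batch size of 1 the contents are unchanged and each element only gains a leading dimension of size 1.
iter(): creates an iterator over an iterable.
next(): returns the next item from an iterator.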
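To make the three steps concrete, the sketch below mimics them with plain Python lists instead of TensorFlow. The helper names (slice_pairs, shuffle_pairs, batch_pairs) and the toy data are made up for illustration and are not part of tf.data; the shuffle here is a full shuffle, which corresponds to a buffer at least as large as the dataset.

```python
import random

def slice_pairs(inputs, targets):
    """Pair the i-th input with the i-th target (slicing along the first dimension)."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same first dimension")
    return list(zip(inputs, targets))

def shuffle_pairs(pairs, seed=None):
    """Reorder the pairs; contents and count are unchanged."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def batch_pairs(pairs, batch_size, drop_remainder=True):
    """Group consecutive pairs into (inputs, targets) batches."""
    batches = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        if len(chunk) < batch_size and drop_remainder:
            break  # discard the final short batch
        batches.append(([p[0] for p in chunk], [p[1] for p in chunk]))
    return batches

# Toy data (assumption): 5 padded sequences per side.
toy_inputs = [[1, 5, 2], [1, 6, 2], [1, 7, 2], [1, 8, 2], [1, 9, 2]]
toy_targets = [[1, 50, 2], [1, 60, 2], [1, 70, 2], [1, 80, 2], [1, 90, 2]]

pairs = shuffle_pairs(slice_pairs(toy_inputs, toy_targets), seed=0)
batches = batch_pairs(pairs, batch_size=2, drop_remainder=True)
steps_per_epoch = len(toy_inputs) // 2
print(len(batches), steps_per_epoch)  # 2 2
```

With 5 samples and a batch size of 2, the last sample is dropped, which is why the number of batches equals len(data) // batch_size.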
    BUFFER_SIZE = len(input_tensor_train)  # 30,000 in this example
    # Use a batch size of 64.
    BATCH_SIZE = 64
    steps_per_epoch = len(input_tensor_train) // BATCH_SIZE

    # Embedding dimension and number of GRU units.
    embedding_dim = 256
    units = 1024

    # Why +1? word_index starts at 1, while embedding rows are indexed from 0
    # (index 0 is used for padding), so the table needs one extra row.
    # Vocabulary size of the input language:
    vocab_inp_size = len(inp_lang.word_index) + 1
    print(vocab_inp_size)
    # Vocabulary size of the target language:
    vocab_tar_size = len(targ_lang.word_index) + 1
    print(vocab_tar_size)

    # Training set: build a dataset of (input, target) pairs by slicing the given tensors.
    # All tensors must have the same size in their first dimension.
    # shuffle() randomly reorders the elements.
    dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
    # Each batch holds 64 samples; drop_remainder=True discards a final short batch.
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

    # iter() creates an iterator; next() returns its next item.
    example_input_batch, example_target_batch = next(iter(dataset))
    example_input_batch, example_target_batch
Output:
9414
4935
(<tf.Tensor: shape=(64, 16), dtype=int32, numpy=
array([[ 1, 17, 116, …, 0, 0, 0],
[ 1, 55, 21, …, 0, 0, 0],
[ 1, 8, 7, …, 0, 0, 0],
…,
[ 1, 23, 958, …, 0, 0, 0],
[ 1, 7, 74, …, 0, 0, 0],
[ 1, 8, 509, …, 0, 0, 0]], dtype=int32)>,
<tf.Tensor: shape=(64, 11), dtype=int32, numpy=
array([[ 1, 14, 2114, 55, 1204, 3, 2, 0, 0, 0, 0],
[ 1, 10, 26, 9, 2586, 3, 2, 0, 0, 0, 0],
[ 1, 14, 90, 12, 21, 936, 3, 2, 0, 0, 0],
[ 1, 4, 194, 35, 9, 2099, 3, 2, 0, 0, 0],
[ 1, 4, 18, 1399, 475, 3, 2, 0, 0, 0, 0],
[ 1, 181, 75, 14, 36, 89, 7, 2, 0, 0, 0],
[ 1, 4, 234, 4, 26, 107, 3, 2, 0, 0, 0],
[ 1, 6, 23, 326, 3, 2, 0, 0, 0, 0, 0],
[ 1, 4, 77, 390, 3, 2, 0, 0, 0, 0, 0],
[ 1, 1366, 8, 847, 3, 2, 0, 0, 0, 0, 0],
[ 1, 20, 11, 21, 940, 3, 2, 0, 0, 0, 0],
[ 1, 4, 26, 39, 285, 3, 2, 0, 0, 0, 0],
[ 1, 5, 8, 4908, 1665, 3, 2, 0, 0, 0, 0],
[ 1, 64, 74, 3, 2, 0, 0, 0, 0, 0, 0],
[ 1, 6, 23, 48, 549, 3, 2, 0, 0, 0, 0],
[ 1, 5, 1001, 17, 3, 2, 0, 0, 0, 0, 0],
[ 1, 5, 76, 33, 61, 83, 3, 2, 0, 0, 0],
[ 1, 5, 51, 15, 36, 3, 2, 0, 0, 0, 0],
[ 1, 16, 118, 15, 13, 1274, 3, 2, 0, 0, 0],
[ 1, 5, 8, 48, 4912, 3, 2, 0, 0, 0, 0],
[ 1, 24, 6, 253, 7, 2, 0, 0, 0, 0, 0],
[ 1, 20, 3766, 11, 1845, 3, 2, 0, 0, 0, 0],
[ 1, 4, 18, 34, 1223, 3, 2, 0, 0, 0, 0],
[ 1, 5, 8, 2118, 3, 2, 0, 0, 0, 0, 0],
[ 1, 22, 6, 29, 9, 920, 7, 2, 0, 0, 0],
[ 1, 4, 87, 12, 245, 109, 3, 2, 0, 0, 0],
[ 1, 5, 199, 17, 33, 971, 3, 2, 0, 0, 0],
[ 1, 497, 20, 83, 3, 2, 0, 0, 0, 0, 0],
[ 1, 64, 17, 9, 301, 3, 2, 0, 0, 0, 0],
[ 1, 4, 222, 13, 83, 3, 2, 0, 0, 0, 0],
[ 1, 569, 50, 49, 5, 3, 2, 0, 0, 0, 0],
[ 1, 4, 47, 15, 53, 33, 3, 2, 0, 0, 0],
[ 1, 16, 38, 268, 3, 2, 0, 0, 0, 0, 0],
[ 1, 796, 31, 344, 3, 2, 0, 0, 0, 0, 0],
[ 1, 60, 38, 36, 7, 2, 0, 0, 0, 0, 0],
[ 1, 25, 14, 127, 139, 7, 2, 0, 0, 0, 0],
[ 1, 188, 1020, 3, 2, 0, 0, 0, 0, 0, 0],
[ 1, 31, 496, 8, 185, 37, 2, 0, 0, 0, 0],
[ 1, 19, 8, 10, 3, 2, 0, 0, 0, 0, 0],
[ 1, 19, 8, 165, 3, 2, 0, 0, 0, 0, 0],
[ 1, 10, 11, 265, 3, 2, 0, 0, 0, 0, 0],
[ 1, 19, 83, 8, 35, 146, 3, 2, 0, 0, 0],
[ 1, 4, 331, 21, 372, 3, 2, 0, 0, 0, 0],
[ 1, 14, 480, 61, 189, 3, 2, 0, 0, 0, 0],
[ 1, 20, 274, 1896, 3, 2, 0, 0, 0, 0, 0],
[ 1, 42, 14, 36, 57, 7, 2, 0, 0, 0, 0],
[ 1, 4, 62, 1062, 561, 3, 2, 0, 0, 0, 0],
[ 1, 10, 11, 308, 110, 3, 2, 0, 0, 0, 0],
[ 1, 6, 25, 12, 73, 5, 3, 2, 0, 0, 0],
[ 1, 88, 55, 20, 3, 2, 0, 0, 0, 0, 0],
[ 1, 4, 77, 21, 669, 3, 2, 0, 0, 0, 0],
[ 1, 5, 270, 9, 733, 3, 2, 0, 0, 0, 0],
[ 1, 464, 31, 995, 3, 2, 0, 0, 0, 0, 0],
[ 1, 16, 23, 626, 3, 2, 0, 0, 0, 0, 0],
[ 1, 4, 47, 20, 3, 2, 0, 0, 0, 0, 0],
[ 1, 4, 26, 1045, 6, 3, 2, 0, 0, 0, 0],
[ 1, 20, 11, 9, 153, 59, 115, 3, 2, 0, 0],
[ 1, 124, 10, 93, 7, 2, 0, 0, 0, 0, 0],
[ 1, 609, 16, 36, 451, 7, 2, 0, 0, 0, 0],
[ 1, 5, 8, 1000, 91, 3, 2, 0, 0, 0, 0],
[ 1, 8, 10, 917, 13, 1345, 7, 2, 0, 0, 0],
[ 1, 4, 290, 6, 69, 703, 3, 2, 0, 0, 0],
[ 1, 20, 8, 67, 3, 2, 0, 0, 0, 0, 0],
[ 1, 6, 30, 12, 29, 15, 36, 3, 2, 0, 0]],
dtype=int32)>)
Inspect a single element produced by tf.data.Dataset.from_tensor_slices (before batching):
    next(iter(tf.data.Dataset.from_tensor_slices((input_tensor, target_tensor)).shuffle(BUFFER_SIZE)))
Output:
(<tf.Tensor: shape=(53,), dtype=int32, numpy=
array([ 1, 12, 9, 8, 312, 917, 6, 170, 18, 133, 44, 11, 2,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0])>,
<tf.Tensor: shape=(51,), dtype=int32, numpy=
array([ 1, 8, 204, 12, 796, 5, 219, 19, 112, 13, 10, 2, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])>)
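7. The Encoder
The Python special method __call__(self, ...) makes an instance callable, so obj(args) runs obj.__call__(args). Keras layers and models define __call__ in their base class, and it in turn invokes the call method that the subclass implements. This is why encoder(x, hidden) ends up running Encoder.call. Calling obj.call(args) directly also works, but it bypasses the extra handling that __call__ performs.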
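The dispatch from obj(args) to call can be illustrated with a toy base class. This is only a sketch of the idea, not Keras's actual implementation; MiniLayer, Doubler and the built flag are hypothetical names chosen for the example.

```python
class MiniLayer:
    """Toy stand-in for a Keras layer: __call__ does bookkeeping, then runs call()."""

    def __init__(self):
        self.built = False

    def build(self):
        # Placeholder for one-time setup such as creating weights.
        self.built = True

    def __call__(self, *args, **kwargs):
        if not self.built:
            self.build()
        return self.call(*args, **kwargs)

    def call(self, *args, **kwargs):
        raise NotImplementedError("subclasses implement call()")


class Doubler(MiniLayer):
    def call(self, x):
        return [2 * v for v in x]


layer = Doubler()
print(layer([1, 2, 3]))  # [2, 4, 6]
print(layer.built)       # True

direct = Doubler()
print(direct.call([1]))  # [2]
print(direct.built)      # False: calling call() directly skipped the setup step
```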
    tf.keras.layers.Embedding(
        input_dim,
        output_dim,
        embeddings_initializer='uniform',
        embeddings_regularizer=None,
        activity_regularizer=None,
        embeddings_constraint=None,
        mask_zero=False,
        input_length=None,
        **kwargs
    )
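tf.keras.layers.Embedding: the embedding layer maps each integer index to a dense vector, turning a sequence of word ids into a sequence of vectors that is easier for the network to process. In this example it maps every token id to a 256-dimensional vector, so each word is represented by a 256-dimensional vector. input_dim is the vocabulary size, i.e. the variable vocab_inp_size defined earlier, and output_dim is the size of each vector (embedding_dim here). Another commonly used parameter is input_length, which fixes the length of the input sequences: if every sequence has 30 tokens, set it to 30. If input_length is not set, the sequence length may vary.
Note: the input to the Embedding layer is a 2-D tensor of shape (batch_size, input_length), and the output is a 3-D tensor of shape (batch_size, input_length, output_dim).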
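Conceptually, the layer is a table lookup. The plain-Python sketch below shows the shape change and why the table needs len(word_index) + 1 rows; the toy vocabulary, the table values and the embed helper are assumptions made for illustration, not part of Keras.

```python
# Hypothetical toy vocabulary; ids start at 1, and 0 is kept for padding.
word_index = {"<start>": 1, "<end>": 2, "hola": 3, "mundo": 4}
vocab_size = len(word_index) + 1  # ids 0..4 need 5 rows
output_dim = 3

# Stand-in for the trainable embedding matrix: one row per id (values are arbitrary).
table = [[float(row)] * output_dim for row in range(vocab_size)]

def embed(batch_ids, table):
    """Replace every id by its row: (batch, length) -> (batch, length, output_dim)."""
    return [[table[token_id] for token_id in row] for row in batch_ids]

batch_ids = [[1, 3, 4, 2, 0, 0],
             [1, 4, 2, 0, 0, 0]]
embedded = embed(batch_ids, table)
shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (2, 6, 3)
```

Without the extra row, the largest id (4 here) would fall outside a table of only len(word_index) rows.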
    tf.keras.layers.GRU(
        units, activation='tanh', recurrent_activation='sigmoid',
        use_bias=True, kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros', kernel_regularizer=None,
        recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None,
        kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
        dropout=0.0, recurrent_dropout=0.0, return_sequences=False, return_state=False,
        go_backwards=False, stateful=False, unroll=False, time_major=False,
        reset_after=True, **kwargs
    )
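tf.keras.layers.GRU:
Commonly used parameters:
units: positive integer, the dimensionality of the output space.
return_sequences: Boolean. Whether to return only the last output of the output sequence or the full sequence. Default: False, i.e. only the output of the last time step is returned. In this example it is set to True, so the full sequence is returned.
return_state: Boolean. Whether to return the last state in addition to the output. Default: False.
recurrent_initializer: initializer for the recurrent_kernel weight matrix, which is used for the linear transformation of the recurrent state.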
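The effect of return_sequences and return_state can be seen on a toy recurrence. The sketch below uses a scalar tanh recurrence with fixed, made-up weights as a stand-in for the GRU (it is not a GRU and not the Keras API); only the way the return values are assembled is meant to mirror the two flags.

```python
import math

def run_toy_rnn(inputs, initial_state=0.0, return_sequences=False, return_state=False):
    """Scalar recurrence state_t = tanh(0.5 * x_t + 0.8 * state_{t-1})."""
    state = initial_state
    outputs = []
    for x in inputs:
        state = math.tanh(0.5 * x + 0.8 * state)
        outputs.append(state)  # the output at each step is the new state
    output = outputs if return_sequences else outputs[-1]
    return (output, state) if return_state else output

sequence = [1.0, 2.0, 3.0]
full_sequence, final_state = run_toy_rnn(sequence, return_sequences=True, return_state=True)
last_only = run_toy_rnn(sequence)

print(len(full_sequence))                 # 3: one output per time step
print(full_sequence[-1] == final_state)   # True
print(last_only == final_state)           # True
```

As in the encoder of this article, asking for both the full sequence and the state yields two values, and the final state coincides with the last element of the sequence.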
    # Same approach as before: subclass tf.keras.Model.
    class Encoder(tf.keras.Model):
        # vocab_size: size of the vocabulary
        # embedding_dim: dimensionality of the embedding
        # encoding_units: number of units in the encoder GRU
        # batch_size: number of samples per batch
        def __init__(self, vocab_size, embedding_dim, encoding_units, batch_size):
            super(Encoder, self).__init__()
            self.batch_size = batch_size
            self.encoding_units = encoding_units
            # Embedding layer: (vocabulary size, embedding dimension).
            self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
            # GRU layer. The GRU is a variant of the LSTM that merges the forget gate
            # and the input gate into a single update gate.
            self.gru = keras.layers.GRU(self.encoding_units,
                                        # True: return the output of every time step
                                        # (the default, False, returns only the last one).
                                        return_sequences=True,
                                        # True: also return the final hidden state
                                        # (the default, False, does not).
                                        return_state=True,
                                        recurrent_initializer='glorot_uniform')

        # Forward pass.
        def call(self, x, hidden):
            # x shape after embedding: (batch_size, sequence_length, embedding_dim)
            x = self.embedding(x)
            # hidden is used as the initial state of the GRU.
            output, state = self.gru(x, initial_state=hidden)
            return output, state

        # All-zero initial hidden state of shape (batch_size, encoding_units).
        def initialize_hidden_state(self):
            return tf.zeros((self.batch_size, self.encoding_units))

    # vocab_inp_size: vocabulary size of the input language
    # embedding_dim: embedding dimension (256)
    # units: number of GRU units (1024)
    encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

    # Initial hidden state: a 2-D tensor of shape (64, 1024),
    # i.e. (batch size, number of encoder units).
    sample_hidden = encoder.initialize_hidden_state()
    # Calling the object runs call(), because Encoder subclasses tf.keras.Model.
    # example_input_batch: one batch of tokenized (integer-encoded) training inputs.
    sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)

    # Shapes: after tokenization and padding each input sentence has length 16;
    # in the encoder output, 16 is the sequence length and 1024 is the state size.
    print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
    print('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
Output:
Encoder output shape: (batch size, sequence length, units) (64, 16, 1024)
Encoder Hidden state shape: (batch size, units) (64, 1024)
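8. The BahdanauAttention Layer
1. The Bahdanau attention formula:
EO (Encoder_Output): the encoder output at every input position.
H (Hidden_State): the decoder hidden state at a given step.
FC: a fully connected layer.
X: the decoder input at a given step.
score = FC(tanh(FC(EO) + FC(H)))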
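As a worked instance of the score formula, the sketch below evaluates it for three encoder positions in plain Python. The weights are hand-picked assumptions (identity matrices and zero biases) so the result can be checked by hand; in the real model they are learned, and the helper names dense and bahdanau_score are made up for this example.

```python
import math

def dense(vector, weights, bias):
    """Fully connected layer without activation: y = W x + b."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def bahdanau_score(eo, h, w1, b1, w2, b2, v, bv):
    """score = FC(tanh(FC(EO) + FC(H))) for one encoder position."""
    summed = [a + b for a, b in zip(dense(eo, w1, b1), dense(h, w2, b2))]
    return dense([math.tanh(s) for s in summed], v, bv)[0]

# Hand-picked weights (assumption): identity projections and a summing output layer.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
v, bv = [[1.0, 1.0]], [0.0]

hidden = [0.0, 0.0]                                   # H: decoder hidden state
encoder_outputs = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]  # EO: one row per position
scores = [bahdanau_score(eo, hidden, identity, zeros, identity, zeros, v, bv)
          for eo in encoder_outputs]
print(scores)  # one scalar score per encoder position
```

With these weights the first score is tanh(1), the second is 0 and the third is tanh(-1), so the first position would receive the largest attention weight after the softmax.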
[Figure: structure of the attention-based seq2seq model]
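2. tf.keras.layers.Dense: a fully connected layer, which computes output = activation(dot(input, kernel) + bias). In this attention layer, three Dense layers without an activation serve as the learned linear projections (W1, W2 and V) in the score formula.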
    tf.keras.layers.Dense(
        units,                                # positive integer, dimensionality of the output space
        activation=None,                      # activation function; none is applied if not specified
        use_bias=True,                        # Boolean, whether the layer uses a bias vector
        kernel_initializer='glorot_uniform',  # initializer for the kernel weights matrix
        bias_initializer='zeros',             # initializer for the bias vector
        kernel_regularizer=None,              # regularizer function applied to the kernel weights matrix
        bias_regularizer=None,                # regularizer function applied to the bias vector
        activity_regularizer=None,            # regularizer function applied to the output of the layer (its "activation")
        kernel_constraint=None,               # constraint function applied to the kernel weights matrix
        bias_constraint=None,                 # constraint function applied to the bias vector
        **kwargs
    )
3. tf.expand_dims(
    input, axis, name=None
)
tf.expand_dims: given a tensor input, this operation inserts a dimension of length 1 at the index axis of input's shape. The index axis starts at zero. Simply put, it adds one dimension.
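The reason the attention layer needs this operation is that a (batch, units) hidden state cannot be added to a (batch, length, units) encoder output until it has a time axis. The nested-list sketch below imitates that step; expand_dims_axis1, add_with_broadcast and shape are hypothetical helpers, and the numbers are arbitrary.

```python
def expand_dims_axis1(matrix):
    """(batch, units) -> (batch, 1, units): wrap every row in a length-1 list."""
    return [[row] for row in matrix]

def add_with_broadcast(values, hidden_with_time_axis):
    """Add (batch, 1, units) to (batch, length, units), reusing the single time step."""
    return [[[v + h for v, h in zip(step, hidden[0])] for step in sequence]
            for sequence, hidden in zip(values, hidden_with_time_axis)]

def shape(nested):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

hidden = [[10, 20], [30, 40]]          # (batch=2, units=2)
values = [[[1, 1], [2, 2], [3, 3]],
          [[4, 4], [5, 5], [6, 6]]]    # (batch=2, length=3, units=2)

expanded = expand_dims_axis1(hidden)
summed = add_with_broadcast(values, expanded)
print(shape(hidden), shape(expanded), shape(summed))  # (2, 2) (2, 1, 2) (2, 3, 2)
print(summed[0])  # [[11, 21], [12, 22], [13, 23]]
```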
    # Implementation of the attention mechanism.
    class BahdanauAttention(tf.keras.Model):
        def __init__(self, units):
            super(BahdanauAttention, self).__init__()
            # Three fully connected layers; by default Dense applies no activation.
            self.W1 = tf.keras.layers.Dense(units)
            self.W2 = tf.keras.layers.Dense(units)
            self.V = tf.keras.layers.Dense(1)

        # Implements the score formula.
        # query: the decoder hidden state (sample_hidden in the call below)
        # values: the encoder outputs
        def call(self, query, values):
            # query (decoder hidden) shape: (batch_size, hidden_size)
            # values (encoder outputs) shape: (batch_size, max_length, hidden_size)
            # Add a time axis so that the two can be added:
            # hidden_with_time_axis shape == (batch_size, 1, hidden_size)
            hidden_with_time_axis = tf.expand_dims(query, 1)

            # Bahdanau-style score.
            # Shape before self.V: (batch_size, max_length, units)
            # score shape == (batch_size, max_length, 1); the last axis is 1
            # because self.V is a Dense layer with a single unit.
            score = self.V(tf.nn.tanh(self.W1(values) + self.W2(hidden_with_time_axis)))

            # attention_weights shape == (batch_size, max_length, 1)
            attention_weights = tf.nn.softmax(score, axis=1)

            # First weight the encoder outputs (values).
            # context_vector shape: (batch_size, max_length, hidden_size)
            context_vector = attention_weights * values
            # Then sum over the length axis to get the weighted sum.
            # context_vector shape after sum == (batch_size, hidden_size)
            context_vector = tf.reduce_sum(context_vector, axis=1)

            return context_vector, attention_weights

    attention_layer = BahdanauAttention(10)
    attention_result, attention_weights = attention_layer(sample_hidden, sample_output)

    print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
    print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))
Output:
Attention result shape: (batch size, units) (64, 1024)
Attention weights shape: (batch_size, sequence_length, 1) (64, 16, 1)
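The last two steps of the layer (softmax over the length axis, then a weighted sum of the encoder outputs) can be checked by hand on a single sample. The following plain-Python sketch uses made-up scores and values; softmax and context_vector are illustrative helpers, not TensorFlow functions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, values):
    """Weighted sum over the length axis: (length,) x (length, units) -> (units,)."""
    units = len(values[0])
    return [sum(w * step[u] for w, step in zip(weights, values)) for u in range(units)]

# Toy sample (assumption): 3 encoder positions, 2 units each.
scores = [2.0, 0.0, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

weights = softmax(scores)
context = context_vector(weights, values)
print(weights)   # sums to 1; the first position dominates
print(context)   # length 2, independent of the sequence length
```

The context vector has as many entries as the encoder output has units, which is why the attention result above has shape (64, 1024) even though the layer was built with units=10.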
9. The Decoder
    # Next, implement the decoder.
    class Decoder(tf.keras.Model):
        # The constructor arguments mirror those of the encoder.
        def __init__(self, vocab_size, embedding_dim, decoding_units, batch_size):
            # The parent constructor must be called here.
            super(Decoder, self).__init__()
            self.batch_size = batch_size
            self.decoding_units = decoding_units
            # Embedding layer.
            self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
            # GRU of the decoder.
            self.gru = keras.layers.GRU(self.decoding_units,
                                        return_sequences=True,
                                        return_state=True,
                                        recurrent_initializer='glorot_uniform')
            # Projects the GRU output to vocabulary-sized logits.
            self.fc = keras.layers.Dense(vocab_size)
            # Attention layer; it is called at every decoding step.
            self.attention = BahdanauAttention(self.decoding_units)

        # Follow the architecture diagram above.
        def call(self, x, hidden, encoding_output):
            # encoding_output shape == (batch_size, max_length, hidden_size)
            # context_vector shape == (batch_size, hidden_size)
            context_vector, attention_weights = self.attention(hidden, encoding_output)

            # x shape before embedding: (batch_size, 1)
            # x shape after embedding: (batch_size, 1, embedding_dim)
            x = self.embedding(x)

            # Concatenate x with context_vector. context_vector is expanded to
            # (batch_size, 1, hidden_size) so that it has the same time axis as x.
            # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
            x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

            # Pass the concatenated vector to the GRU.
            # Note: no initial_state is passed here, so hidden is used only by the attention layer.
            # output shape == (batch_size, 1, decoding_units)
            # state shape == (batch_size, decoding_units)
            output, state = self.gru(x)

            # output shape == (batch_size * 1, hidden_size)
            output = tf.reshape(output, (-1, output.shape[2]))

            # x shape == (batch_size, vocab_size)
            x = self.fc(output)

            return x, state, attention_weights

    decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)
    sample_decoder_output, decoder_hidden, decoder_aw = decoder(tf.random.uniform((64, 1)), sample_hidden, sample_output)

    print('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))
    print("decoder_hidden.shape: ", decoder_hidden.shape)
    print("decoder_attention_weights.shape:", decoder_aw.shape)
Output:
Decoder output shape: (batch_size, vocab size) (64, 4935)
decoder_hidden.shape: (64, 1024)
decoder_attention_weights.shape: (64, 16, 1)
10. Optimizer and Loss Function
    # Use the Adam optimizer.
    optimizer = keras.optimizers.Adam()

    # SparseCategoricalCrossentropy is the usual choice for classification with integer labels.
    # from_logits=True because the fc layer outputs raw logits without a softmax;
    # the loss then applies softmax internally, which is usually more numerically stable.
    # Set it to False only if the model already outputs probabilities.
    # reduction (a tf.keras.losses.Reduction value) controls how the per-element losses
    # are aggregated; the default averages them, while 'none' keeps one loss per element
    # so that padding can be masked out below.
    loss_object = keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

    def loss_function(real, pred):
        # tf.math.equal(real, 0) is True at padding positions and False elsewhere,
        # so negate it with tf.math.logical_not. The resulting mask is a bool tensor
        # that is True for real tokens.
        mask = tf.math.logical_not(tf.math.equal(real, 0))
        loss_ = loss_object(real, pred)
        # Cast the mask to the dtype of the loss (float).
        mask = tf.cast(mask, dtype=loss_.dtype)
        # The mask is 0 at padding positions, so their loss is zeroed out.
        loss_ *= mask
        # Average over the batch; tf.reduce_mean reduces the tensor to its mean.
        return tf.reduce_mean(loss_)

    # Checkpointing.
    checkpoint_dir = './8-1_checkpoints'
    if not os.path.exists(checkpoint_dir):
        os.mkdir(checkpoint_dir)
    checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
    checkpoint = tf.train.Checkpoint(optimizer=optimizer, encoder=encoder, decoder=decoder)
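The masking logic in loss_function can be reproduced on a tiny example without TensorFlow. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the logits are all zero so each unmasked loss equals log(3), and, like tf.reduce_mean in the code above, the mean is taken over all positions including the masked ones.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Cross-entropy of an integer label against raw logits (log-sum-exp form)."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]

def masked_mean_loss(labels, batch_logits, pad_id=0):
    """Zero the loss where the label is padding, then average over all positions."""
    losses = []
    for label, logits in zip(labels, batch_logits):
        mask = 0.0 if label == pad_id else 1.0
        losses.append(sparse_ce_from_logits(label, logits) * mask)
    return sum(losses) / len(losses)

# Toy batch (assumption): vocabulary of 3 ids, the last label is padding.
labels = [2, 1, 0]
batch_logits = [[0.0, 0.0, 0.0] for _ in labels]

loss = masked_mean_loss(labels, batch_logits)
print(loss)  # (log 3 + log 3 + 0) / 3
```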
11. Compiling the Training Step into a Graph for Speed
    # Decorate with tf.function so the step is compiled into a graph and runs faster.
    @tf.function
    def train_step(inp, targ, encoding_hidden):
        loss = 0
        with tf.GradientTape() as tape:
            # Feed the input to the encoder to get encoding_output and encoding_hidden.
            encoding_output, encoding_hidden = encoder(inp, encoding_hidden)
            decoding_hidden = encoding_hidden
            decoding_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)

            # Example: <start> I am here <end>
            #   1. <start> -> I
            #   2. I -> am
            #   3. am -> here
            #   4. here -> <end>
            # At the step for "here", the information about "I am" must be passed along as well.
            # Teacher forcing: feed the target as the next input.
            for t in range(1, targ.shape[1]):
                # As explained earlier, the decoder needs three inputs:
                # the current token, the hidden state, and the encoder output.
                predictions, decoding_hidden, _ = decoder(decoding_input, decoding_hidden, encoding_output)
                loss += loss_function(targ[:, t], predictions)
                # Teacher forcing.
                decoding_input = tf.expand_dims(targ[:, t], 1)

        # Average loss per time step for this batch.
        batch_loss = (loss / int(targ.shape[1]))
        variables = encoder.trainable_variables + decoder.trainable_variables
        # Compute the gradients.
        gradients = tape.gradient(loss, variables)
        # Apply them with the optimizer.
        optimizer.apply_gradients(zip(gradients, variables))
        return batch_loss
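The teacher-forcing loop pairs each target token with the token that precedes it. A minimal plain-Python sketch of that pairing, using the example sentence from the comments above (teacher_forcing_pairs is a hypothetical helper, not part of the training code):

```python
def teacher_forcing_pairs(target_tokens):
    """Return (decoder_input, expected_output) for every decoding step t >= 1."""
    return [(target_tokens[t - 1], target_tokens[t])
            for t in range(1, len(target_tokens))]

pairs = teacher_forcing_pairs(["<start>", "I", "am", "here", "<end>"])
for decoder_input, expected in pairs:
    print(decoder_input, "->", expected)
# <start> -> I
# I -> am
# am -> here
# here -> <end>
```

The decoder input at each step is the ground-truth token, not the model's own previous prediction, which is what distinguishes training here from the inference loop.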
12. Training
    EPOCHS = 10
    # This takes a while to run.
    for epoch in range(EPOCHS):
        start = time.time()
        # Start each epoch from an all-zero hidden state.
        encoding_hidden = encoder.initialize_hidden_state()
        total_loss = 0

        # dataset.take(steps_per_epoch) yields steps_per_epoch batches per epoch.
        for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
            batch_loss = train_step(inp, targ, encoding_hidden)
            total_loss += batch_loss
            # Print the loss every 100 batches.
            if batch % 100 == 0:
                print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, batch_loss.numpy()))

        # Save a checkpoint every 2 epochs.
        if (epoch + 1) % 2 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Epoch {} Loss {:.4f}'.format(epoch + 1, total_loss / steps_per_epoch))
        print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
13. Translating a Given String
    # Take a sentence and translate it.
    def evaluate(sentence):
        attention_plot = np.zeros((max_length_targ, max_length_inp))
        sentence = preprocess_sentence(sentence)
        # Convert text to ids (raises KeyError for words that are not in the vocabulary).
        inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
        # Pad to the maximum input length.
        inputs = keras.preprocessing.sequence.pad_sequences([inputs], maxlen=max_length_inp, padding='post')
        inputs = tf.convert_to_tensor(inputs)
        result = ''
        hidden = [tf.zeros((1, units))]
        encoding_out, encoding_hidden = encoder(inputs, hidden)
        # As in the model, the encoder hidden state initializes the decoder hidden state.
        decoding_hidden = encoding_hidden
        # decoding_input shape: (1, 1)
        decoding_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)

        # Example: <start> -> A, A -> B, B -> C, C -> D
        for t in range(max_length_targ):
            predictions, decoding_hidden, attention_weights = decoder(
                decoding_input, decoding_hidden, encoding_out)
            # attention_weights shape: (batch_size, input_length, 1), here (1, 16, 1);
            # flatten it to a vector of length 16 and store it for plotting later.
            attention_weights = tf.reshape(attention_weights, (-1, ))
            attention_plot[t] = attention_weights.numpy()

            # predictions shape: (batch_size, vocab_size), here (1, 4935).
            # Take the id with the highest score as the next input.
            predicted_id = tf.argmax(predictions[0]).numpy()
            result += targ_lang.index_word[predicted_id] + ' '
            # Stop once <end> is predicted.
            if targ_lang.index_word[predicted_id] == '<end>':
                return result, sentence, attention_plot

            # Feed the predicted id back into the model.
            # decoding_input and decoding_hidden are now both updated.
            decoding_input = tf.expand_dims([predicted_id], 0)
        return result, sentence, attention_plot

    # Plot the attention weights to visualize the attention relationship.
    def plot_attention(attention, sentence, predicted_sentence):
        fig = plt.figure(figsize=(10, 10))
        ax = fig.add_subplot(1, 1, 1)
        ax.matshow(attention, cmap='viridis')
        fontdict = {'fontsize': 14}
        # Add the tick labels; position 0 is left empty (see the plot).
        ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
        ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)
        plt.show()

    # Ties the two functions above together.
    def translate(sentence):
        result, sentence, attention_plot = evaluate(sentence)
        print('Input: %s' % (sentence))
        print('Predicted translation: {}'.format(result))
        # Trim the plot to the actual lengths, since the output and the input
        # are usually shorter than the maximum lengths.
        attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]
        plot_attention(attention_plot, sentence.split(' '), result.split(' '))

    checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
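Stripped of the TensorFlow details, evaluate performs greedy decoding: at every step it picks the highest-scoring id, feeds it back in, and stops at the end token or at the maximum length. The sketch below shows only that control flow. greedy_decode and toy_step are hypothetical; toy_step is a stand-in for the decoder that always prefers the next id, and it has nothing to do with the trained model.

```python
def greedy_decode(step_fn, start_id, end_id, max_length):
    """step_fn(token_id, state) -> (scores, new_state); returns the predicted ids."""
    token_id, state, output_ids = start_id, None, []
    for _ in range(max_length):
        scores, state = step_fn(token_id, state)
        # argmax: index of the highest score becomes the next input
        token_id = max(range(len(scores)), key=scores.__getitem__)
        output_ids.append(token_id)
        if token_id == end_id:
            break
    return output_ids

def toy_step(token_id, state):
    """Stand-in decoder over 5 ids that always scores (token_id + 1) % 5 highest."""
    scores = [0.0] * 5
    scores[(token_id + 1) % 5] = 1.0
    return scores, state

print(greedy_decode(toy_step, start_id=1, end_id=4, max_length=10))  # [2, 3, 4]
print(greedy_decode(toy_step, start_id=1, end_id=4, max_length=2))   # [2, 3]
```

The second call shows the other exit condition: if the end token is never produced, the loop stops after max_length steps, just as evaluate stops after max_length_targ steps.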
Evaluation
    # it is terribly cold here
    translate(u'hace mucho frio aqui.')
References: