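6. Parameter settings

tf.data.Dataset.from_tensor_slices: slices the input tensor and the target tensor along their first dimension and returns a TensorSliceDataset in which each element is one (input, target) pair, so every input sentence stays aligned with its translation.

shuffle: randomly reorders the elements produced by from_tensor_slices. Only the order changes; the number of elements and their contents stay the same.

The larger the buffer_size argument, the more thorough the shuffle. A buffer_size at least as large as the dataset gives a full uniform shuffle.

dataset.batch: groups consecutive elements of the shuffled dataset into batches of batch_size so they can be processed together. With a batch size of 1 every batch holds a single element, so the data is unchanged apart from an extra leading dimension of size 1. With drop_remainder=True, a final batch smaller than batch_size is discarded.

iter(): creates an iterator over the dataset.

next(): returns the next element from the iterator.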
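To make the effect of buffer_size and drop_remainder concrete, here is a minimal plain-Python sketch of a shuffle buffer followed by batching. It imitates the behavior described above rather than calling TensorFlow, and the helper names buffered_shuffle and batch are made up for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in random order using a fixed-size buffer (illustrative only)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer first
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the buffer in random order
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

def batch(items, batch_size, drop_remainder=False):
    """Group consecutive items into lists of batch_size."""
    items = list(items)
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

data = list(range(10))
unshuffled = list(buffered_shuffle(data, buffer_size=1))       # buffer of 1: order unchanged
shuffled = list(buffered_shuffle(data, buffer_size=len(data)))  # full shuffle
print(unshuffled)
print(batch(shuffled, 4, drop_remainder=True))  # two batches of 4, the last 2 items are dropped
```

With 10 elements and a batch size of 4, 10 // 4 = 2 full batches remain, which is the same arithmetic as steps_per_epoch = len(input_tensor_train) // BATCH_SIZE.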

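# Shuffle buffer size: the whole training set (30,000 samples here)
BUFFER_SIZE = len(input_tensor_train)
# Batch size of 64
BATCH_SIZE = 64
steps_per_epoch = len(input_tensor_train) // BATCH_SIZE
# Embedding dimension and number of GRU units
embedding_dim = 256
units = 1024
# Input vocabulary size.
# Why +1? word_index starts at 1 and id 0 is used for padding,
# so the Embedding layer needs len(word_index) + 1 rows to cover ids 0..len(word_index).
vocab_inp_size = len(inp_lang.word_index) + 1
print(vocab_inp_size)
# Target vocabulary size
vocab_tar_size = len(targ_lang.word_index) + 1
print(vocab_tar_size)
# Build the training dataset from the input and target tensors.
# All tensors passed to from_tensor_slices must have the same size in their first dimension.
# shuffle() randomly reorders the samples.
dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
# Each batch holds 64 samples; drop_remainder=True discards a final partial batch.
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
# iter() creates an iterator; next() returns its next element (one batch).
example_input_batch, example_target_batch = next(iter(dataset))
example_input_batch, example_target_batch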

Output:

9414

4935

(<tf.Tensor: shape=(64, 16), dtype=int32, numpy=

array([[ 1, 17, 116, …, 0, 0, 0],

[ 1, 55, 21, …, 0, 0, 0],

[ 1, 8, 7, …, 0, 0, 0],

…,

[ 1, 23, 958, …, 0, 0, 0],

[ 1, 7, 74, …, 0, 0, 0],

[ 1, 8, 509, …, 0, 0, 0]], dtype=int32)>,

<tf.Tensor: shape=(64, 11), dtype=int32, numpy=

array([[ 1, 14, 2114, 55, 1204, 3, 2, 0, 0, 0, 0],

[ 1, 10, 26, 9, 2586, 3, 2, 0, 0, 0, 0],

[ 1, 14, 90, 12, 21, 936, 3, 2, 0, 0, 0],

[ 1, 4, 194, 35, 9, 2099, 3, 2, 0, 0, 0],

[ 1, 4, 18, 1399, 475, 3, 2, 0, 0, 0, 0],

[ 1, 181, 75, 14, 36, 89, 7, 2, 0, 0, 0],

[ 1, 4, 234, 4, 26, 107, 3, 2, 0, 0, 0],

[ 1, 6, 23, 326, 3, 2, 0, 0, 0, 0, 0],

[ 1, 4, 77, 390, 3, 2, 0, 0, 0, 0, 0],

[ 1, 1366, 8, 847, 3, 2, 0, 0, 0, 0, 0],

[ 1, 20, 11, 21, 940, 3, 2, 0, 0, 0, 0],

[ 1, 4, 26, 39, 285, 3, 2, 0, 0, 0, 0],

[ 1, 5, 8, 4908, 1665, 3, 2, 0, 0, 0, 0],

[ 1, 64, 74, 3, 2, 0, 0, 0, 0, 0, 0],

[ 1, 6, 23, 48, 549, 3, 2, 0, 0, 0, 0],

[ 1, 5, 1001, 17, 3, 2, 0, 0, 0, 0, 0],

[ 1, 5, 76, 33, 61, 83, 3, 2, 0, 0, 0],

[ 1, 5, 51, 15, 36, 3, 2, 0, 0, 0, 0],

[ 1, 16, 118, 15, 13, 1274, 3, 2, 0, 0, 0],

[ 1, 5, 8, 48, 4912, 3, 2, 0, 0, 0, 0],

[ 1, 24, 6, 253, 7, 2, 0, 0, 0, 0, 0],

[ 1, 20, 3766, 11, 1845, 3, 2, 0, 0, 0, 0],

[ 1, 4, 18, 34, 1223, 3, 2, 0, 0, 0, 0],

[ 1, 5, 8, 2118, 3, 2, 0, 0, 0, 0, 0],

[ 1, 22, 6, 29, 9, 920, 7, 2, 0, 0, 0],

[ 1, 4, 87, 12, 245, 109, 3, 2, 0, 0, 0],

[ 1, 5, 199, 17, 33, 971, 3, 2, 0, 0, 0],

[ 1, 497, 20, 83, 3, 2, 0, 0, 0, 0, 0],

[ 1, 64, 17, 9, 301, 3, 2, 0, 0, 0, 0],

[ 1, 4, 222, 13, 83, 3, 2, 0, 0, 0, 0],

[ 1, 569, 50, 49, 5, 3, 2, 0, 0, 0, 0],

[ 1, 4, 47, 15, 53, 33, 3, 2, 0, 0, 0],

[ 1, 16, 38, 268, 3, 2, 0, 0, 0, 0, 0],

[ 1, 796, 31, 344, 3, 2, 0, 0, 0, 0, 0],

[ 1, 60, 38, 36, 7, 2, 0, 0, 0, 0, 0],

[ 1, 25, 14, 127, 139, 7, 2, 0, 0, 0, 0],

[ 1, 188, 1020, 3, 2, 0, 0, 0, 0, 0, 0],

[ 1, 31, 496, 8, 185, 37, 2, 0, 0, 0, 0],

[ 1, 19, 8, 10, 3, 2, 0, 0, 0, 0, 0],

[ 1, 19, 8, 165, 3, 2, 0, 0, 0, 0, 0],

[ 1, 10, 11, 265, 3, 2, 0, 0, 0, 0, 0],

[ 1, 19, 83, 8, 35, 146, 3, 2, 0, 0, 0],

[ 1, 4, 331, 21, 372, 3, 2, 0, 0, 0, 0],

[ 1, 14, 480, 61, 189, 3, 2, 0, 0, 0, 0],

[ 1, 20, 274, 1896, 3, 2, 0, 0, 0, 0, 0],

[ 1, 42, 14, 36, 57, 7, 2, 0, 0, 0, 0],

[ 1, 4, 62, 1062, 561, 3, 2, 0, 0, 0, 0],

[ 1, 10, 11, 308, 110, 3, 2, 0, 0, 0, 0],

[ 1, 6, 25, 12, 73, 5, 3, 2, 0, 0, 0],

[ 1, 88, 55, 20, 3, 2, 0, 0, 0, 0, 0],

[ 1, 4, 77, 21, 669, 3, 2, 0, 0, 0, 0],

[ 1, 5, 270, 9, 733, 3, 2, 0, 0, 0, 0],

[ 1, 464, 31, 995, 3, 2, 0, 0, 0, 0, 0],

[ 1, 16, 23, 626, 3, 2, 0, 0, 0, 0, 0],

[ 1, 4, 47, 20, 3, 2, 0, 0, 0, 0, 0],

[ 1, 4, 26, 1045, 6, 3, 2, 0, 0, 0, 0],

[ 1, 20, 11, 9, 153, 59, 115, 3, 2, 0, 0],

[ 1, 124, 10, 93, 7, 2, 0, 0, 0, 0, 0],

[ 1, 609, 16, 36, 451, 7, 2, 0, 0, 0, 0],

[ 1, 5, 8, 1000, 91, 3, 2, 0, 0, 0, 0],

[ 1, 8, 10, 917, 13, 1345, 7, 2, 0, 0, 0],

[ 1, 4, 290, 6, 69, 703, 3, 2, 0, 0, 0],

[ 1, 20, 8, 67, 3, 2, 0, 0, 0, 0, 0],

[ 1, 6, 30, 12, 29, 15, 36, 3, 2, 0, 0]],

dtype=int32)>)
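The +1 in the vocabulary sizes and the trailing zeros in the batches above both come from reserving id 0 for padding. The following plain-Python sketch shows the idea on a toy vocabulary; the word_index contents and the pad_post helper are hypothetical and only stand in for the real tokenizer and padding utilities.

```python
# Hypothetical toy vocabulary; the real word_index comes from the fitted tokenizer
word_index = {"<start>": 1, "<end>": 2, "hola": 3, "mundo": 4}
vocab_size = len(word_index) + 1  # ids 1..4 plus id 0 for padding

def pad_post(ids, maxlen, pad_id=0):
    """Right-pad a list of ids with pad_id up to maxlen."""
    return ids + [pad_id] * (maxlen - len(ids))

sentence = ["<start>", "hola", "mundo", "<end>"]
ids = pad_post([word_index[w] for w in sentence], maxlen=6)
print(ids)         # [1, 3, 4, 2, 0, 0]
print(vocab_size)  # 5: every id in ids is a valid row index 0..4
```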

Inspect a single element produced by tf.data.Dataset.from_tensor_slices before batching:

next(iter(tf.data.Dataset.from_tensor_slices((input_tensor, target_tensor)).shuffle(BUFFER_SIZE)))

Output:

(<tf.Tensor: shape=(53,), dtype=int32, numpy=

array([ 1, 12, 9, 8, 312, 917, 6, 170, 18, 133, 44, 11, 2,

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

0])>,

<tf.Tensor: shape=(51,), dtype=int32, numpy=

array([ 1, 8, 204, 12, 796, 5, 219, 19, 112, 13, 10, 2, 0,

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])>)

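7. Encoder

Python's special method __call__(self, ...) makes an instance callable like a function. tf.keras.Model defines __call__ so that model(inputs) runs the model's call method. model.call(inputs) also works, but calling the object directly is the usual form, because Keras does extra bookkeeping in __call__ such as building the weights on first use.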
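As a rough illustration of why calling the object is preferred, the sketch below uses a hypothetical TinyLayer class. It is not the Keras base class; it only imitates the pattern of __call__ doing one-time setup before delegating to call.

```python
class TinyLayer:
    """Hypothetical stand-in for a Keras layer; not the real Keras base class."""

    def __init__(self):
        self.built = False

    def build(self):
        # One-time setup, comparable to creating weights on first use
        self.scale = 2
        self.built = True

    def call(self, x):
        # The actual computation
        return [self.scale * v for v in x]

    def __call__(self, x):
        # Do the bookkeeping first, then delegate to call()
        if not self.built:
            self.build()
        return self.call(x)

layer = TinyLayer()
print(layer([1, 2, 3]))       # [2, 4, 6], __call__ builds the layer and runs call()
print(layer.call([1, 2, 3]))  # [2, 4, 6], works only because the layer is already built
```

Calling call() directly on a fresh TinyLayer would fail because scale has not been created yet, which is the kind of step that going through __call__ takes care of.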

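tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    embeddings_initializer='uniform',
    embeddings_regularizer=None,
    activity_regularizer=None,
    embeddings_constraint=None,
    mask_zero=False,
    input_length=None,
    **kwargs)

tf.keras.layers.Embedding: the embedding layer maps each integer token id to a dense vector, which turns a sequence of word ids into a sequence of vectors the network can process. In this example it maps every id to a 256-dimensional vector, so each word is represented by 256 numbers. input_dim is the vocabulary size, i.e. the variable vocab_inp_size defined earlier, and output_dim is the size of each vector. Another commonly used argument is input_length, which fixes the length of the input sequences: for sequences of 30 tokens it would be set to 30. If input_length is not set, the sequence length may vary.

Note: the Embedding layer takes a 2-D tensor of shape (batch_size, input_length) and returns a 3-D tensor of shape (batch_size, input_length, output_dim).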
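The shape change can be reproduced with a plain lookup table. This is a minimal sketch using nested Python lists instead of tensors; make_embedding and embed are hypothetical helper names, and the random initial values are only illustrative.

```python
import random

def make_embedding(vocab_size, output_dim, seed=0):
    """Create a vocab_size x output_dim table of small random numbers."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
            for _ in range(vocab_size)]

def embed(table, batch_ids):
    """Replace every token id with its row of the table."""
    return [[table[token_id] for token_id in row] for row in batch_ids]

table = make_embedding(vocab_size=10, output_dim=3)
batch_ids = [[1, 5, 2, 0],
             [1, 7, 9, 2]]            # shape (batch_size=2, input_length=4)
out = embed(table, batch_ids)
print(len(out), len(out[0]), len(out[0][0]))  # 2 4 3 -> (batch_size, input_length, output_dim)
```

The same id always maps to the same vector: both sentences start with id 1, so out[0][0] and out[1][0] are identical.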

tf.keras.layers.GRU(
    units, activation='tanh', recurrent_activation='sigmoid',
    use_bias=True, kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros', kernel_regularizer=None,
    recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
    dropout=0.0, recurrent_dropout=0.0, return_sequences=False, return_state=False,
    go_backwards=False, stateful=False, unroll=False, time_major=False,
    reset_after=True, **kwargs
)

tf.keras.layers.GRU, commonly used arguments:

units: positive integer, the dimensionality of the output space.

return_sequences: boolean. Whether to return only the last output of the output sequence or the full sequence. Default: False, i.e. only the output of the last time step is returned. This example sets it to True to get the full sequence.

return_state: boolean. Whether to return the last state in addition to the output. Default: False.

recurrent_initializer: initializer for the recurrent_kernel weight matrix, which is used for the linear transformation of the recurrent state.
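The difference between return_sequences and return_state is easier to see on a toy recurrent loop. The sketch below is a scalar GRU written in plain Python under simplifying assumptions: one input feature, one hidden unit, no biases, arbitrary weights, and one common formulation of the gates. It is not the Keras implementation; gru_step and run_gru are hypothetical names.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, w):
    """One GRU step for a scalar input x and scalar state h."""
    z = sigmoid(w["wz"] * x + w["uz"] * h)               # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)               # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h))  # candidate state
    return z * h + (1.0 - z) * h_cand                    # blend old state and candidate

def run_gru(xs, h0, w, return_sequences=False, return_state=False):
    h = h0
    outputs = []
    for x in xs:
        h = gru_step(x, h, w)
        outputs.append(h)
    out = outputs if return_sequences else outputs[-1]
    return (out, h) if return_state else out

weights = {"wz": 0.5, "uz": -0.3, "wr": 0.8, "ur": 0.1, "wh": 1.2, "uh": 0.7}  # arbitrary values
xs = [0.1, -0.4, 0.9]
seq, state = run_gru(xs, 0.0, weights, return_sequences=True, return_state=True)
last = run_gru(xs, 0.0, weights)
print(len(seq), seq[-1] == state, last == state)  # 3 True True
```

With return_sequences=True there is one output per time step, and the final state equals the last of those outputs. That matches the encoder shapes in this section: the full output is (64, 16, 1024) and the state is (64, 1024).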

# Same approach as before: subclass tf.keras.Model
class Encoder(tf.keras.Model):
    # vocab_size: size of the vocabulary
    # embedding_dim: dimension of the embedding vectors
    # encoding_units: number of GRU units in the encoder
    # batch_size: number of samples per batch
    def __init__(self, vocab_size, embedding_dim, encoding_units, batch_size):
        super(Encoder, self).__init__()
        self.batch_size = batch_size
        self.encoding_units = encoding_units
        # Embedding layer: (vocabulary size, embedding dimension)
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        # GRU layer. A GRU is an LSTM variant that merges the forget gate and
        # the input gate into a single update gate (the two are tied to sum to 1).
        self.gru = keras.layers.GRU(self.encoding_units,
                                    # False (default): return only the last step's output
                                    # True: return the output of every time step
                                    return_sequences=True,
                                    # False (default): do not return the final hidden state
                                    # True: also return the final hidden state
                                    return_state=True,
                                    recurrent_initializer='glorot_uniform')
    # Forward pass
    def call(self, x, hidden):
        # x: (batch_size, length) -> (batch_size, length, embedding_dim)
        x = self.embedding(x)
        # hidden is the initial state of the GRU
        output, state = self.gru(x, initial_state = hidden)
        return output, state
    # All-zero initial hidden state
    def initialize_hidden_state(self):
        return tf.zeros((self.batch_size, self.encoding_units))
# vocab_inp_size: size of the input vocabulary
# embedding_dim: embedding dimension, 256
# units: number of GRU units, 1024
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# Initial hidden state: a tensor of shape (64, 1024), i.e. (batch size, encoder units)
sample_hidden = encoder.initialize_hidden_state()
# Calling the object runs call(), because Encoder subclasses tf.keras.Model
# example_input_batch: one batch of training sentences already converted to token ids
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
# Shapes: after tokenization and padding, a batch of input sentences is (64, 16).
# In the encoder output, 16 is the sequence length and 1024 is the state size.
print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))

Output:

Encoder output shape: (batch size, sequence length, units) (64, 16, 1024)

Encoder Hidden state shape: (batch size, units) (64, 1024)

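8. BahdanauAttention

1. The Bahdanau attention formula:

EO (Encoder_Output): the encoder outputs at every position.

H (Hidden_State): the decoder's hidden state at a given step.

FC: a fully connected layer.

X: the decoder input at that step.

score = FC(tanh(FC(EO) + FC(H)))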
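The score is only the first step. Written out per encoder position t, the full computation can be sketched as follows. The symbols W_1, W_2 and v are notation introduced here for the three fully connected layers (they correspond to W1, W2 and V in the code of this section), and the biases are omitted for brevity.

```latex
\begin{aligned}
\mathrm{score}_t &= v^{\top} \tanh\left(W_1\,\mathrm{EO}_t + W_2\,H\right) \\
\alpha_t &= \frac{\exp(\mathrm{score}_t)}{\sum_{s} \exp(\mathrm{score}_s)} \\
c &= \sum_{t} \alpha_t\,\mathrm{EO}_t
\end{aligned}
```

Here \alpha_t are the attention weights (a softmax over the encoder positions) and c is the context vector passed to the decoder.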

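Figure: structure of the attention-based seq2seq model.

2. tf.keras.layers.Dense: a fully connected layer. It computes output = activation(dot(input, kernel) + bias). In the attention module, Dense layers without an activation serve as learned linear projections.

tf.keras.layers.Dense(
    units,                                 # positive integer, dimensionality of the output space
    activation=None,                       # activation function; none is applied if not specified
    use_bias=True,                         # boolean, whether the layer uses a bias vector
    kernel_initializer='glorot_uniform',   # initializer for the kernel weight matrix
    bias_initializer='zeros',              # initializer for the bias vector
    kernel_regularizer=None,               # regularizer applied to the kernel weight matrix
    bias_regularizer=None,                 # regularizer applied to the bias vector
    activity_regularizer=None,             # regularizer applied to the output of the layer (its "activation")
    kernel_constraint=None,                # constraint applied to the kernel weight matrix
    bias_constraint=None,                  # constraint applied to the bias vector
    **kwargs
)

3. tf.expand_dims(
    input, axis, name=None
)

tf.expand_dims: returns the tensor input with a dimension of length 1 inserted at index axis of its shape. The axis index starts at zero. Put simply, it adds one dimension.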

# Implementing the attention mechanism
class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        # Three fully connected layers; Dense has no activation by default
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    # Implements the formula above
    # query: the decoder hidden state (sample_hidden in the test call below)
    # values: the encoder outputs
    def call(self, query, values):
        # query (decoder hidden) shape: (batch_size, units)
        # values (encoder outputs) shape: (batch_size, length, units)
        # hidden_with_time_axis shape == (batch_size, 1, units)
        # The time axis is added so that the sum below broadcasts over the length dimension
        hidden_with_time_axis = tf.expand_dims(query, 1)
        # Bahdanau (additive) score
        # shape before self.V: (batch_size, length, attention units)
        # shape after self.V (score): (batch_size, length, 1)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(hidden_with_time_axis)))
        # attention_weights shape == (batch_size, length, 1); softmax over the length axis
        attention_weights = tf.nn.softmax(score, axis=1)
        # Weight the encoder outputs (values)
        # context_vector shape: (batch_size, length, units)
        context_vector = attention_weights * values
        # Sum over the length axis to get the weighted sum
        # context_vector shape after sum == (batch_size, units)
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)
print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))

Output:

Attention result shape: (batch size, units) (64, 1024)

Attention weights shape: (batch_size, sequence_length, 1) (64, 16, 1)
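The same computation can be traced by hand for a single sample. The sketch below is plain Python on nested lists, with biases omitted and arbitrary small weights; bahdanau_attention, matvec and softmax are hypothetical helpers, not TensorFlow functions.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bahdanau_attention(query, values, W1, W2, v):
    """query: [units]; values: length x [units]; W1, W2: [att x units]; v: [att]."""
    q_proj = matvec(W2, query)                      # computed once, reused for every position
    scores = []
    for value in values:
        k_proj = matvec(W1, value)
        hidden = [math.tanh(a + b) for a, b in zip(k_proj, q_proj)]
        scores.append(sum(a * b for a, b in zip(v, hidden)))
    weights = softmax(scores)                       # one weight per encoder position
    units = len(values[0])
    context = [sum(w * value[i] for w, value in zip(weights, values))
               for i in range(units)]               # weighted sum over positions
    return context, weights

values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 3 encoder positions, 2 units
query = [0.5, -0.5]
W1 = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]         # attention size 3, arbitrary numbers
W2 = [[0.1, 0.2], [-0.3, 0.2], [0.0, 0.5]]
v = [1.0, -1.0, 0.5]
context, weights = bahdanau_attention(query, values, W1, W2, v)
print(len(context), len(weights), round(sum(weights), 6))  # 2 3 1.0
```

The context vector has the size of the encoder outputs, not of the attention layer. That is why the result above is (64, 1024) even though the layer was created with BahdanauAttention(10).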

9. Decoder

# Next we implement the decoder
class Decoder(tf.keras.Model):
    # The constructor arguments mirror the encoder's
    def __init__(self, vocab_size, embedding_dim, decoding_units, batch_size):
        # The parent constructor must be called here
        super(Decoder, self).__init__()
        self.batch_size = batch_size
        self.decoding_units = decoding_units
        # Embedding layer
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        # GRU layer of the decoder
        self.gru = keras.layers.GRU(self.decoding_units,
                                    return_sequences=True,
                                    return_state=True,
                                    recurrent_initializer='glorot_uniform')
        # Projects the GRU output to vocabulary logits
        self.fc = keras.layers.Dense(vocab_size)
        # Attention layer, called at every decoding step
        self.attention = BahdanauAttention(self.decoding_units)
    # Follow the architecture diagram from earlier
    def call(self, x, hidden, encoding_output):
        # encoding_output shape == (batch_size, max_length, units)
        # context_vector shape == (batch_size, units)
        context_vector, attention_weights = self.attention(hidden, encoding_output)
        # x shape before embedding == (batch_size, 1)
        # x shape after embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        # Concatenate context_vector and x. context_vector is (batch_size, units), so it
        # needs a time axis of size 1 to match x before concatenating on the last axis.
        # x shape after concatenation == (batch_size, 1, embedding_dim + units)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        # Pass the concatenated vector to the GRU.
        # Note: hidden is used only as the attention query here; the GRU itself starts
        # from its default zero state because no initial_state is passed.
        # output shape == (batch_size, 1, decoding_units)
        # state shape == (batch_size, decoding_units)
        output, state = self.gru(x)
        # output shape == (batch_size * 1, decoding_units)
        output = tf.reshape(output, (-1, output.shape[2]))
        # x shape == (batch_size, vocab_size)
        x = self.fc(output)
        return x, state, attention_weights
decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)
sample_decoder_output, decoder_hidden, decoder_aw = decoder(tf.random.uniform((64, 1)),
                                      sample_hidden, sample_output)
print('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))
print("decoder_hidden.shape: ", decoder_hidden.shape)
print("decoder_attention_weights.shape:", decoder_aw.shape)

Output:

Decoder output shape: (batch_size, vocab size) (64, 4935)
decoder_hidden.shape:  (64, 1024)
decoder_attention_weights.shape: (64, 16, 1)
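To keep track of the tensor shapes inside one decoder step, the following sketch does nothing but shape bookkeeping in plain Python. decoder_step_shapes is a hypothetical helper; it assumes the encoder and decoder use the same number of units, as they do in this example.

```python
def decoder_step_shapes(batch_size, src_len, embedding_dim, units, vocab_size):
    """Return the shape of each intermediate tensor in one decoder step."""
    shapes = {}
    shapes["context_vector"] = (batch_size, units)
    shapes["attention_weights"] = (batch_size, src_len, 1)
    shapes["embedded_x"] = (batch_size, 1, embedding_dim)
    # expand_dims(context_vector, 1) gives (batch_size, 1, units);
    # concatenating on the last axis adds the two feature sizes
    shapes["gru_input"] = (batch_size, 1, units + embedding_dim)
    shapes["gru_output"] = (batch_size, 1, units)
    shapes["reshaped_output"] = (batch_size * 1, units)
    shapes["logits"] = (batch_size, vocab_size)
    return shapes

shapes = decoder_step_shapes(batch_size=64, src_len=16, embedding_dim=256,
                             units=1024, vocab_size=4935)
for name, shape in shapes.items():
    print(name, shape)
```

The last entry, (64, 4935), matches the decoder output shape printed above, and the GRU input has 1024 + 256 = 1280 features per step.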

10. Defining the optimizer and the loss function

# The optimizer is Adam
optimizer = keras.optimizers.Adam()
# SparseCategoricalCrossentropy is the usual loss for classification with integer labels.
# The decoder's fc layer outputs raw logits (no softmax), so from_logits=True;
# if the model already applied softmax, it would be False.
# from_logits=True: the loss applies softmax to y_pred internally, which is usually more numerically stable.
# reduction: a tf.keras.losses.Reduction value that controls how per-sample losses are aggregated.
# The default averages them; 'none' keeps one loss per sample so the padding mask can be applied first.
loss_object = keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
def loss_function(real, pred):
    # tf.math.equal(real, 0) is True at padding positions and False elsewhere,
    # so tf.math.logical_not inverts it: the boolean mask is True for real tokens.
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    # Cast the boolean mask to the dtype of the loss (float)
    mask = tf.cast(mask, dtype=loss_.dtype)
    # Padding positions have mask 0, so their loss is zeroed out
    loss_ *= mask
    # tf.reduce_mean averages over the batch dimension (padding positions count as zeros)
    return tf.reduce_mean(loss_)
# Checkpointing
checkpoint_dir = './8-1_checkpoints'
if not os.path.exists(checkpoint_dir):
    os.mkdir(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)
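The effect of the padding mask can be checked on a few numbers. This is a plain-Python sketch of the same idea, assuming label 0 marks padding; sparse_ce_from_logits and masked_loss are hypothetical names, and the logits are made up.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Cross-entropy of one sample: log-sum-exp of the logits minus the true-class logit."""
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_sum_exp - logits[label]

def masked_loss(real, pred):
    per_sample = [sparse_ce_from_logits(y, logits) for y, logits in zip(real, pred)]
    mask = [0.0 if y == 0 else 1.0 for y in real]     # 0 for padding, 1 for real tokens
    masked = [l * m for l, m in zip(per_sample, mask)]
    # Like tf.reduce_mean: divide by the batch size, padded positions included
    return sum(masked) / len(masked)

real = [2, 0, 1]                                      # the middle sample is padding
pred = [[0.1, 0.2, 3.0], [5.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
base = masked_loss(real, pred)
# Changing the prediction at the padded position does not change the loss
changed = masked_loss(real, [pred[0], [-9.0, 4.0, 1.0], pred[2]])
print(abs(base - changed) < 1e-12)  # True
```

Because the mean still divides by the full batch size, batches with more padding produce a smaller value for the same per-token error.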

11. Speeding up calls by compiling the training step into a graph

# @tf.function compiles train_step into a graph to make calls faster
@tf.function
def train_step(inp, targ, encoding_hidden):
    loss = 0
    with tf.GradientTape() as tape:
        # Feed the input to the encoder to get encoding_output and encoding_hidden
        encoding_output, encoding_hidden = encoder(inp, encoding_hidden)
        decoding_hidden = encoding_hidden
        decoding_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
        # e.g. <start> I am here <end>
        # 1. <start> -> I
        # 2. I -> am
        # 3. am -> here
        # 4. here -> <end>
        # When predicting "here", the information from "I am" has to be passed along as well
        # Teacher forcing: feed the target as the next input
        for t in range(1, targ.shape[1]):
            # The decoder needs three inputs, as described above:
            # the current input token, the hidden state and the encoder output
            predictions, decoding_hidden, _ = decoder(decoding_input, decoding_hidden, encoding_output)
            loss += loss_function(targ[:, t], predictions)
            # Teacher forcing: the ground-truth token becomes the next input
            decoding_input = tf.expand_dims(targ[:, t], 1)
    # Batch loss averaged over the time steps of the target sequence
    batch_loss = (loss / int(targ.shape[1]))
    variables = encoder.trainable_variables + decoder.trainable_variables
    # Compute the gradients
    gradients = tape.gradient(loss, variables)
    # Apply the gradients with the optimizer
    optimizer.apply_gradients(zip(gradients, variables))
    return batch_loss
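The teacher-forcing loop boils down to pairing each ground-truth token with the one that follows it. A minimal sketch for a single unpadded sentence, with hypothetical token ids:

```python
def teacher_forcing_pairs(target_ids):
    """Return (decoder input, expected output) pairs for one target sentence."""
    pairs = []
    decoder_input = target_ids[0]           # the <start> token
    for t in range(1, len(target_ids)):
        expected = target_ids[t]
        pairs.append((decoder_input, expected))
        decoder_input = expected            # feed the ground truth, not the model's guess
    return pairs

# Hypothetical ids for "<start> I am here <end>"
index_word = {1: "<start>", 4: "I", 18: "am", 34: "here", 2: "<end>"}
pairs = teacher_forcing_pairs([1, 4, 18, 34, 2])
for inp, expected in pairs:
    print(index_word[inp], "->", index_word[expected])
# <start> -> I
# I -> am
# am -> here
# here -> <end>
```

In train_step the loop runs over the full padded length of the batch; the steps that fall on padding are neutralized by the mask in loss_function.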

12. Training

EPOCHS = 10
# This takes a while to run
for epoch in range(EPOCHS):
    start = time.time()
    # Start each epoch from an all-zero hidden state
    encoding_hidden = encoder.initialize_hidden_state()
    total_loss = 0
    # dataset.take(steps_per_epoch) yields steps_per_epoch batches per epoch
    for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        batch_loss = train_step(inp, targ, encoding_hidden)
        total_loss += batch_loss
        # Print the loss every 100 batches
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, batch_loss.numpy()))
    # Save a checkpoint every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)
    print('Epoch {} Loss {:.4f}'.format(epoch + 1, total_loss / steps_per_epoch))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

13. Translating a given sentence

# Take a sentence string and translate it
def evaluate(sentence):
    attention_plot = np.zeros((max_length_targ, max_length_inp))
    sentence = preprocess_sentence(sentence)
    # Convert text to token ids
    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
    # Pad to the maximum input length
    inputs = keras.preprocessing.sequence.pad_sequences([inputs], maxlen=max_length_inp, padding='post')
    inputs = tf.convert_to_tensor(inputs)
    result = ''
    hidden = [tf.zeros((1, units))]
    encoding_out, encoding_hidden = encoder(inputs, hidden)
    # As in the model, the encoder's final state initializes the decoder state
    decoding_hidden = encoding_hidden
    decoding_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)
    # e.g. <start> -> A, A -> B, B -> C, C -> D
    # decoding_input shape: (1, 1)
    for t in range(max_length_targ):
        predictions, decoding_hidden, attention_weights = decoder(
            decoding_input, decoding_hidden, encoding_out)
        # attention_weights shape: (batch_size, input_length, 1) == (1, 16, 1);
        # flatten it to a vector of length 16 and store it for plotting later
        attention_weights = tf.reshape(attention_weights, (-1, ))
        attention_plot[t] = attention_weights.numpy()
        # predictions shape: (batch_size, vocab_size) == (1, 4935)
        # Take the id with the highest score as the next input
        predicted_id = tf.argmax(predictions[0]).numpy()
        result += targ_lang.index_word[predicted_id] + ' '
        # Stop once <end> is predicted
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence, attention_plot
        # The predicted id is fed back into the model
        decoding_input = tf.expand_dims([predicted_id], 0)
    # Both decoding_input and decoding_hidden have been updated at every step
    return result, sentence, attention_plot
# Plot the attention weights to visualize the alignment
def plot_attention(attention, sentence, predicted_sentence):
    fig = plt.figure(figsize=(10,10))
    ax = fig.add_subplot(1, 1, 1)
    ax.matshow(attention, cmap='viridis')
    fontdict = {'fontsize': 14}
    # Put one tick on every row and column so each label lines up with its cell
    ax.xaxis.set_major_locator(plt.MultipleLocator(1))
    ax.yaxis.set_major_locator(plt.MultipleLocator(1))
    # The first tick falls before index 0, so the first label is left empty
    ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)
    plt.show()
# translate ties the two functions above together
def translate(sentence):
    result, sentence, attention_plot = evaluate(sentence)
    print('Input: %s' % (sentence))
    print('Predicted translation: {}'.format(result))
    # The sentences can be shorter than the maximum lengths, so trim the plot
    # to the actual output and input lengths
    attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]
    plot_attention(attention_plot, sentence.split(' '), result.split(' '))
# Restore the latest checkpoint before translating
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

Evaluation

# Expected translation: it is terribly cold here
translate(u'hace mucho frio aqui.')
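The loop inside evaluate is a greedy decoder: at every step it takes the highest-scoring id and feeds it back in until <end> appears or the maximum length is reached. The sketch below isolates that control flow in plain Python. The toy_scores table is a hypothetical stand-in for the trained decoder, and greedy_decode is a name made up for this illustration.

```python
def greedy_decode(step_fn, start_id, end_id, max_length):
    """Repeatedly pick the argmax id and feed it back until end_id or max_length."""
    decoded = []
    current = start_id
    for _ in range(max_length):
        scores = step_fn(current)
        predicted = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        decoded.append(predicted)
        if predicted == end_id:
            break
        current = predicted              # the prediction becomes the next input
    return decoded

# Hypothetical stand-in for the decoder: fixed scores per previous id
toy_scores = {
    1: [0.0, 0.0, 0.1, 0.9, 0.0],   # after <start> (id 1) the best id is 3
    3: [0.0, 0.0, 0.2, 0.1, 0.7],   # after id 3 the best id is 4
    4: [0.0, 0.0, 0.8, 0.1, 0.1],   # after id 4 the best id is <end> (id 2)
}
decoded = greedy_decode(lambda prev: toy_scores[prev], start_id=1, end_id=2, max_length=11)
print(decoded)  # [3, 4, 2]
```

Unlike training, no ground-truth token is available here, so an early mistake is carried into every later step.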

