Seq2seq

简介: 机器学习中的 Seq2seq 模型是一种将一个序列映射为另一个序列的模型,其主要应用场景是自然语言处理、机器翻译等领域。Seq2seq 模型通过编码器(encoder)将输入序列(如源语言句子)编码为一个连续的向量,然后通过解码器(decoder)将该向量解码为输出序列(如目标语言句子)。在训练过程中,模型会尽可能地使输出序列与真实目标序列接近,以达到最好的映射效果。

机器学习中的 Seq2seq 模型是一种将一个序列映射为另一个序列的模型,其主要应用场景是自然语言处理、机器翻译等领域。Seq2seq 模型通过编码器(encoder)将输入序列(如源语言句子)编码为一个连续的向量,然后通过解码器(decoder)将该向量解码为输出序列(如目标语言句子)。在训练过程中,模型会尽可能地使输出序列与真实目标序列接近,以达到最好的映射效果。
使用 Seq2seq 模型进行任务时,一般需要进行以下步骤:

  1. 数据准备:收集并预处理源语言和目标语言的语料库,为模型提供训练数据。
  2. 构建模型:搭建 Seq2seq 模型,包括编码器和解码器。编码器通常使用循环神经网络(RNN)或长短时记忆网络(LSTM)等,解码器则使用另一个 RNN 或 LSTM。
  3. 训练模型:利用收集到的数据对模型进行训练,通过优化损失函数(如交叉熵损失)来学习模型参数。
  4. 评估模型:使用验证集对模型进行评估,根据评估结果调整模型参数以提高性能。
  5. 应用模型:将训练好的模型应用于实际任务,例如机器翻译、文本摘要等。
    总之,Seq2seq 模型是一种有效的序列映射方法,广泛应用于自然语言处理等领域。通过数据准备、模型构建、训练和评估等步骤,可以利用 Seq2seq 模型解决实际问题。

import tensorflow as tf
import numpy as np
import os, re
from tensorflow.python.layers.core import Dense
MAX_CHAR_PER_LINE = 20

def load_sentences(path):
    with open(path, 'r', encoding="ISO-8859-1") as f:
        data_raw = f.read().encode('ascii', 'ignore').decode('UTF-8').lower()
        data_alpha = re.sub('[^a-z\n]+', ' ', data_raw)
        data = []
        for line in data_alpha.split('\n'):
            data.append(line[:MAX_CHAR_PER_LINE])
    return data
def extract_character_vocab(data):
    special_symbols = ['<PAD>', '<UNK>', '<GO>',  '<EOS>']
    set_symbols = set([character for line in data for character in line])
    all_symbols = special_symbols + list(set_symbols)
    int_to_symbol = {word_i: word for word_i, word in enumerate(all_symbols)}
    symbol_to_int = {word: word_i for word_i, word in int_to_symbol.items()}
    return int_to_symbol, symbol_to_int

input_sentences = load_sentences('data/words_input.txt')  
output_sentences = load_sentences('data/words_output.txt')  

input_int_to_symbol, input_symbol_to_int = extract_character_vocab(input_sentences)
output_int_to_symbol, output_symbol_to_int = extract_character_vocab(output_sentences)
input_int_to_symbol
{0: '<PAD>',
 1: '<UNK>',
 2: '<GO>',
 3: '<EOS>',
 4: 's',
 5: 'n',
 6: 'q',
 7: 'f',
 8: 'v',
 9: 'g',
 10: 'm',
 11: 'w',
 12: 'd',
 13: 'i',
 14: 'o',
 15: 'a',
 16: 'r',
 17: 'y',
 18: 'j',
 19: 'b',
 20: 'c',
 21: ' ',
 22: 'u',
 23: 'p',
 24: 'e',
 25: 'k',
 26: 'h',
 27: 't',
 28: 'z',
 29: 'l',
 30: 'x'}
output_int_to_symbol
{0: '<PAD>',
 1: '<UNK>',
 2: '<GO>',
 3: '<EOS>',
 4: 's',
 5: 'n',
 6: 'q',
 7: 'f',
 8: 'v',
 9: 'g',
 10: 'm',
 11: 'w',
 12: 'd',
 13: 'i',
 14: 'o',
 15: 'a',
 16: 'r',
 17: 'y',
 18: 'j',
 19: 'b',
 20: 'c',
 21: ' ',
 22: 'u',
 23: 'p',
 24: 'e',
 25: 'k',
 26: 'h',
 27: 't',
 28: 'z',
 29: 'l',
 30: 'x'}
NUM_EPOCS = 300
RNN_STATE_DIM = 512
RNN_NUM_LAYERS = 2
ENCODER_EMBEDDING_DIM = DECODER_EMBEDDING_DIM = 64

BATCH_SIZE = int(32)
LEARNING_RATE = 0.0003

INPUT_NUM_VOCAB = len(input_symbol_to_int)
OUTPUT_NUM_VOCAB = len(output_symbol_to_int)
# Encoder placeholders
encoder_input_seq = tf.placeholder(
    tf.int32, 
    [None, None],  
    name='encoder_input_seq'
)

encoder_seq_len = tf.placeholder(
    tf.int32, 
    (None,), 
    name='encoder_seq_len'
)

# Decoder placeholders
decoder_output_seq = tf.placeholder( 
    tf.int32, 
    [None, None],
    name='decoder_output_seq'
)

decoder_seq_len = tf.placeholder(
    tf.int32,
    (None,), 
    name='decoder_seq_len'
)

max_decoder_seq_len = tf.reduce_max( 
    decoder_seq_len, 
    name='max_decoder_seq_len'
)
def make_cell(state_dim):
    lstm_initializer = tf.random_uniform_initializer(-0.1, 0.1)
    return tf.contrib.rnn.LSTMCell(state_dim, initializer=lstm_initializer)

def make_multi_cell(state_dim, num_layers):
    cells = [make_cell(state_dim) for _ in range(num_layers)]
    return tf.contrib.rnn.MultiRNNCell(cells)
# Encoder embedding

encoder_input_embedded = tf.contrib.layers.embed_sequence(
    encoder_input_seq,     
    INPUT_NUM_VOCAB,        
    ENCODER_EMBEDDING_DIM  
)


# Encodering output

encoder_multi_cell = make_multi_cell(RNN_STATE_DIM, RNN_NUM_LAYERS)

encoder_output, encoder_state = tf.nn.dynamic_rnn(
    encoder_multi_cell, 
    encoder_input_embedded, 
    sequence_length=encoder_seq_len, 
    dtype=tf.float32
)

del(encoder_output)
decoder_raw_seq = decoder_output_seq[:, :-1]  
go_prefixes = tf.fill([BATCH_SIZE, 1], output_symbol_to_int['<GO>'])  
decoder_input_seq = tf.concat([go_prefixes, decoder_raw_seq], 1)  
decoder_embedding = tf.Variable(tf.random_uniform([OUTPUT_NUM_VOCAB, 
                                                   DECODER_EMBEDDING_DIM]))
decoder_input_embedded = tf.nn.embedding_lookup(decoder_embedding, 
                                                decoder_input_seq)

decoder_multi_cell = make_multi_cell(RNN_STATE_DIM, RNN_NUM_LAYERS)

output_layer_kernel_initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.1)
output_layer = Dense(
    OUTPUT_NUM_VOCAB,
    kernel_initializer = output_layer_kernel_initializer
)
with tf.variable_scope("decode"):

    training_helper = tf.contrib.seq2seq.TrainingHelper(
        inputs=decoder_input_embedded,
        sequence_length=decoder_seq_len,
        time_major=False
    )

    training_decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_multi_cell,
        training_helper,
        encoder_state,
        output_layer
    ) 

    training_decoder_output_seq, _, _ = tf.contrib.seq2seq.dynamic_decode(
        training_decoder, 
        impute_finished=True, 
        maximum_iterations=max_decoder_seq_len
    )
with tf.variable_scope("decode", reuse=True):
    start_tokens = tf.tile(
        tf.constant([output_symbol_to_int['<GO>']], 
                    dtype=tf.int32), 
        [BATCH_SIZE], 
        name='start_tokens')

    # Helper for the inference process.
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=decoder_embedding,
        start_tokens=start_tokens,
        end_token=output_symbol_to_int['<EOS>']
    )

    # Basic decoder
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_multi_cell,
        inference_helper,
        encoder_state,
        output_layer
    )

    # Perform dynamic decoding using the decoder
    inference_decoder_output_seq, _, _ = tf.contrib.seq2seq.dynamic_decode(
        inference_decoder,
        impute_finished=True,
        maximum_iterations=max_decoder_seq_len
    )
# rename the tensor for our convenience
training_logits = tf.identity(training_decoder_output_seq.rnn_output, name='logits')
inference_logits = tf.identity(inference_decoder_output_seq.sample_id, name='predictions')

# Create the weights for sequence_loss
masks = tf.sequence_mask(
    decoder_seq_len, 
    max_decoder_seq_len, 
    dtype=tf.float32, 
    name='masks'
)

cost = tf.contrib.seq2seq.sequence_loss(
    training_logits,
    decoder_output_seq,
    masks
)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)

gradients = optimizer.compute_gradients(cost)
capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var)
                        for grad, var in gradients if grad is not None]
train_op = optimizer.apply_gradients(capped_gradients)
def pad(xs, size, pad):
    return xs + [pad] * (size - len(xs))
input_seq = [
    [input_symbol_to_int.get(symbol, input_symbol_to_int['<UNK>']) 
        for symbol in line]  
    for line in input_sentences  
]

output_seq = [
    [output_symbol_to_int.get(symbol, output_symbol_to_int['<UNK>']) 
        for symbol in line] + [output_symbol_to_int['<EOS>']]  
    for line in output_sentences  
]

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()  

for epoch in range(NUM_EPOCS + 1):  
    for batch_idx in range(len(input_sentences) // BATCH_SIZE): 

        input_batch, input_lengths, output_batch, output_lengths = [], [], [], []
        for sentence in input_sentences[batch_idx:batch_idx + BATCH_SIZE]:
            symbol_sent = [input_symbol_to_int[symbol] for symbol in sentence]
            padded_symbol_sent = pad(symbol_sent, MAX_CHAR_PER_LINE, input_symbol_to_int['<PAD>'])
            input_batch.append(padded_symbol_sent)
            input_lengths.append(len(sentence))
        for sentence in output_sentences[batch_idx:batch_idx + BATCH_SIZE]:
            symbol_sent = [output_symbol_to_int[symbol] for symbol in sentence]
            padded_symbol_sent = pad(symbol_sent, MAX_CHAR_PER_LINE, output_symbol_to_int['<PAD>'])
            output_batch.append(padded_symbol_sent)
            output_lengths.append(len(sentence))

        _, cost_val = sess.run( 
            [train_op, cost],
            feed_dict={
                encoder_input_seq: input_batch,
                encoder_seq_len: input_lengths,
                decoder_output_seq: output_batch,
                decoder_seq_len: output_lengths
            }
        )

        if batch_idx % 629 == 0:
            print('Epcoh {}. Batch {}/{}. Cost {}'.format(epoch, batch_idx, len(input_sentences) // BATCH_SIZE, cost_val))

    saver.save(sess, 'model.ckpt')   
sess.close()
sess = tf.InteractiveSession()    
saver.restore(sess, 'model.ckpt')

example_input_sent = "do you want to play games"
example_input_symb = [input_symbol_to_int[symbol] for symbol in example_input_sent]
example_input_batch = [pad(example_input_symb, MAX_CHAR_PER_LINE, input_symbol_to_int['<PAD>'])] * BATCH_SIZE
example_input_lengths = [len(example_input_sent)] * BATCH_SIZE

output_ints = sess.run(inference_logits, feed_dict={
    encoder_input_seq: example_input_batch,
    encoder_seq_len: example_input_lengths,
    decoder_seq_len: example_input_lengths
})[0]

output_str = ''.join([output_int_to_symbol[i] for i in output_ints])
print(output_str)
INFO:tensorflow:Restoring parameters from model.ckpt
indeed just one of that r
目录
相关文章
|
2月前
|
机器学习/深度学习 自然语言处理
序列到序列(Seq2Seq)模型
序列到序列(Seq2Seq)模型
|
2月前
|
机器学习/深度学习 自然语言处理
seq2seq的机制原理
【8月更文挑战第1天】seq2seq的机制原理。
24 1
|
4月前
|
机器学习/深度学习 自然语言处理
使用seq2seq架构实现英译法(二)
**Seq2Seq模型简介** Seq2Seq(Sequence-to-Sequence)模型是自然语言处理中的关键架构,尤其适用于机器翻译、聊天机器人和自动文摘等任务。它由编码器和解码器组成,其中编码器将输入序列转换为固定长度的上下文向量,而解码器则依据该向量生成输出序列。模型能够处理不同长度的输入和输出序列,适应性强。
|
4月前
|
数据采集 自然语言处理 机器人
使用seq2seq架构实现英译法(一)
**Seq2Seq模型简介** Seq2Seq(Sequence-to-Sequence)模型是自然语言处理中的关键架构,尤其适用于机器翻译、聊天机器人和自动文摘等任务。它由编码器和解码器组成,其中编码器将输入序列转换为固定长度的上下文向量,而解码器则依据该向量生成输出序列。模型能够处理不同长度的输入和输出序列,适应性强。
|
5月前
|
机器学习/深度学习 TensorFlow 算法框架/工具
seq2seq:中英文翻译
seq2seq:中英文翻译
43 1
|
5月前
|
机器学习/深度学习 人工智能 自然语言处理
详细介绍Seq2Seq、Attention、Transformer !!
详细介绍Seq2Seq、Attention、Transformer !!
129 0
|
5月前
random.choice(seq)
random.choice(seq)
26 1
|
机器学习/深度学习 移动开发 自然语言处理
经典Seq2Seq与注意力Seq2Seq模型结构详解
经典Seq2Seq与注意力Seq2Seq模型结构详解
237 0
经典Seq2Seq与注意力Seq2Seq模型结构详解
|
机器学习/深度学习 人工智能 自然语言处理
学习笔记——seq2seq模型介绍
学习笔记——seq2seq模型介绍
291 0
学习笔记——seq2seq模型介绍
|
机器学习/深度学习 TensorFlow API
seq2seq与Attention机制(二)
seq2seq与Attention机制(二)
163 0
seq2seq与Attention机制(二)