[Multi-Label Text Classification] A Code Walkthrough of the Seq2Seq Model

Published 2023-02-24 from Liaoning


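·Abstract:

The paper discussed here proposed the classic Seq2Seq model and applied it to machine translation. Seq2Seq also suits many other tasks, including multi-label text classification.

·Reference:

[1] Sequence to Sequence Learning with Neural Networks

[Note 1]: The Seq2Seq model proposed in this paper set off a long line of Seq2Seq-based work. Its standing is comparable to that of TextCNN, published by Kim in 2014, and the Transformer, published by Google in 2017.

[Note 2]: The paper itself is fairly simple and mostly explains how Seq2Seq works. This post walks through a PyTorch implementation and uses the code logic to explain Seq2Seq.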

[1] Seq2Seq Model Diagram

[2] Encoder

The code is as follows:

In the code, the parameter src is the source sequence and the parameter trg is the target sequence.

import random

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout = dropout)
        self.dropout = nn.Dropout(dropout)
    def forward(self, src):
        #src = [src len, batch size]
        embedded = self.dropout(self.embedding(src))
        #embedded = [src len, batch size, emb dim]
        outputs, (hidden, cell) = self.rnn(embedded)
        #outputs = [src len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        #cell = [n layers * n directions, batch size, hid dim]
        #outputs are always from the top hidden layer
        return hidden, cell

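The encoder is an ordinary multi-layer LSTM and is fairly simple. Note that it is unidirectional: nn.LSTM is not given bidirectional=True, so "n directions" in the shape comments is 1.

Normally we would use outputs, the top layer's output at every time step.

Here the encoder instead returns hidden and cell: the hidden state and the cell state of every layer at the final time step. Both are passed to the decoder as its initial state.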
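To make the difference between outputs and hidden concrete, here is a minimal sketch that calls nn.LSTM directly. The sizes are hypothetical and chosen only for illustration, and a random tensor stands in for the embedded source sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
src_len, batch_size, emb_dim, hid_dim, n_layers = 7, 3, 8, 16, 2

rnn = nn.LSTM(emb_dim, hid_dim, n_layers)  # unidirectional by default
embedded = torch.randn(src_len, batch_size, emb_dim)  # stand-in for embedded src

outputs, (hidden, cell) = rnn(embedded)

# outputs: top layer only, one vector per time step
print(outputs.shape)  # torch.Size([7, 3, 16])
# hidden / cell: one vector per layer, final time step only
print(hidden.shape)   # torch.Size([2, 3, 16])
print(cell.shape)     # torch.Size([2, 3, 16])

# For a unidirectional LSTM, the last time step of outputs
# equals the final hidden state of the top layer.
print(torch.allclose(outputs[-1], hidden[-1]))  # True
```

The last check shows why returning only hidden and cell loses nothing the decoder needs here: the top layer's final output is already contained in hidden.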

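[3] Decoder

The code is as follows:

In the code, the parameter input holds the previously generated token, and hidden and cell are the states from the previous step.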

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()     
        self.output_dim = output_dim
        self.hid_dim = hid_dim
        self.n_layers = n_layers   
        self.embedding = nn.Embedding(output_dim, emb_dim) 
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout = dropout)
        self.fc_out = nn.Linear(hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    def forward(self, input, hidden, cell):
        input = input.unsqueeze(0)
        #input = [1, batch size]
        embedded = self.dropout(self.embedding(input))
        #embedded = [1, batch size, emb dim]  
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        #output = [seq len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        #cell = [n layers * n directions, batch size, hid dim]
        prediction = self.fc_out(output.squeeze(0))
        #prediction = [batch size, output dim]
        return prediction, hidden, cell

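The decoder is again an LSTM. Its inputs are the hidden and cell states from the previous step and input, the index of the previously generated word, which the embedding layer turns into a word vector.

[Note 3]: On the first decoder step, the inputs are the encoder's hidden and cell and the start token <SOS>.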
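The first decoder step can be sketched in isolation as follows. This is only an illustration: the sizes and SOS_IDX are hypothetical, and zero tensors stand in for the hidden and cell states that a real encoder would return.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes and token index, for illustration only.
batch_size, emb_dim, hid_dim, n_layers, output_dim = 3, 8, 16, 2, 10
SOS_IDX = 1

# The same three layers the Decoder class uses (dropout omitted).
embedding = nn.Embedding(output_dim, emb_dim)
rnn = nn.LSTM(emb_dim, hid_dim, n_layers)
fc_out = nn.Linear(hid_dim, output_dim)

# Stand-ins for the encoder's final states.
hidden = torch.zeros(n_layers, batch_size, hid_dim)
cell = torch.zeros(n_layers, batch_size, hid_dim)

# First input: one <SOS> index per sequence in the batch.
input = torch.full((batch_size,), SOS_IDX, dtype=torch.long)

embedded = embedding(input.unsqueeze(0))           # [1, batch size, emb dim]
output, (hidden, cell) = rnn(embedded, (hidden, cell))
prediction = fc_out(output.squeeze(0))             # [batch size, output dim]

print(prediction.shape)  # torch.Size([3, 10])
print(hidden.shape)      # torch.Size([2, 3, 16])
```

The returned hidden and cell have the same shape as the ones passed in, which is what allows them to be fed into the next step unchanged.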

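At this point several questions remain:

1. The decoder emits one word at a time, so it has to be driven by a loop. How should that loop be written?

2. How is the decoder's output turned back into a word? That word is fed straight back into the decoder, so the conversion has to happen inside the model rather than being done outside on the model's output.

3. The model should return the output of the fully connected layer, a vector, so that the loss can be computed from it.

4. If the first word the decoder produces is wrong, the rest of the decoded sentence goes badly wrong too. How can this be prevented?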

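[4] Seq2Seq

Only once the Encoder and Decoder are combined is the model complete. The code is as follows:

In the code, the parameter src is the source sequence and the parameter trg is the target sequence.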

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"
        assert encoder.n_layers == decoder.n_layers, \
            "Encoder and decoder must have equal number of layers!"
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):
        #teacher_forcing_ratio is probability to use teacher forcing
        #e.g. if teacher_forcing_ratio is 0.75 we use ground-truth inputs 75% of the time
        batch_size = trg.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        #tensor to store decoder outputs
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)
        #last hidden state of the encoder is used as the initial hidden state of the decoder
        hidden, cell = self.encoder(src)
        #first input to the decoder is the <sos> tokens
        input = trg[0,:]
        for t in range(1, trg_len):  
            #insert input token embedding, previous hidden and previous cell states
            #receive output tensor (predictions) and new hidden and cell states
            output, hidden, cell = self.decoder(input, hidden, cell) 
            #place predictions in a tensor holding predictions for each token
            outputs[t] = output
            #decide if we are going to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio
            #get the highest predicted token from our predictions
            top1 = output.argmax(1) 
            #if teacher forcing, use actual next token as next input
            #if not, use predicted token
            input = trg[t] if teacher_force else top1 
        return outputs

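The model above answers the questions left open in section [3]. The encoder first produces its final states, and a for loop then runs the decoder one step at a time.

[Note 4]: The output of self.decoder has passed through the fully connected layer. Strictly speaking these are unnormalized scores (logits) rather than probabilities, but the largest logit is also the most probable class, so output.argmax(1) gives the index of the most likely token, which can then be looked up in the vocabulary or label set.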
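A small check of why argmax can be applied directly to the fully connected layer's output: softmax preserves the ordering of the scores, so the argmax is the same before and after normalization. The values below are made up for illustration.

```python
import torch

# Made-up scores for a batch of 2 over 3 classes.
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 0.2, 3.0]])

# softmax turns each row into a probability distribution (rows sum to 1)
probs = torch.softmax(logits, dim=1)

print(logits.argmax(1))  # tensor([0, 2])
print(probs.argmax(1))   # tensor([0, 2])
```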

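[Note 5]: Teacher forcing is a very useful mechanism. At each step, with probability teacher_forcing_ratio, the ground-truth token from the target sequence is fed in as the next input instead of the model's own prediction, which stops the decoder from compounding its mistakes. It can be used during training, but using it during validation or testing is wrong; there, set the model's teacher_forcing_ratio parameter to 0.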
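The per-step decision is just `random.random() < teacher_forcing_ratio`. The sketch below simulates that test many times to show how the ratio behaves; the helper name teacher_forcing_rate is hypothetical and not part of the model code.

```python
import random

random.seed(0)

def teacher_forcing_rate(ratio, n_steps=10_000):
    """Fraction of simulated steps on which the ground-truth token would be fed in."""
    hits = sum(random.random() < ratio for _ in range(n_steps))
    return hits / n_steps

# random.random() returns a value in [0.0, 1.0), so:
print(teacher_forcing_rate(0.0))  # 0.0, never (the setting for validation and testing)
print(teacher_forcing_rate(1.0))  # 1.0, always
print(teacher_forcing_rate(0.5))  # roughly 0.5
```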

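[Note 6]: The outputs returned by the model have passed through the fully connected layer. They are per-class scores (logits) and serve as the input to the loss function. Note that outputs[0] is never written by the loop and stays all zeros.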
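One common way to turn the returned tensor into a loss is sketched below, under stated assumptions: a random tensor stands in for the model's return value, PAD_IDX is a hypothetical padding index, and nn.CrossEntropyLoss is used because it accepts raw logits and applies log-softmax internally. Position 0 is dropped because the forward loop starts at t = 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes and padding index, for illustration only.
trg_len, batch_size, output_dim = 5, 2, 6
PAD_IDX = 0

# Stand-in for model(src, trg): [trg len, batch size, output dim]
outputs = torch.randn(trg_len, batch_size, output_dim)
outputs[0] = 0  # mimic the forward loop, which never fills position 0

# Stand-in target indices: [trg len, batch size]
trg = torch.randint(1, output_dim, (trg_len, batch_size))

criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# Drop position 0 (<sos>) and flatten time and batch into one dimension.
logits = outputs[1:].reshape(-1, output_dim)   # [(trg len - 1) * batch size, output dim]
targets = trg[1:].reshape(-1)                  # [(trg len - 1) * batch size]

loss = criterion(logits, targets)
print(logits.shape, targets.shape)
print(loss.item())
```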

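⭐[Note 7]: This code still has a problem. If forward is used as-is for validation and testing, then even with teacher_forcing_ratio=0 the length of the generated sequence is fixed in advance by trg, which is not right.

⭐[Note 8]: During validation and testing, generation should stop when the end token <EOS> appears. That works for machine translation, but it probably does not help for multi-label text classification: the last word of a sentence carries end-of-sequence meaning, whereas labels do not. More paper reading is needed here.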
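A decoding loop that stops at <EOS> instead of relying on the length of trg might look like the sketch below. It is an assumption-laden illustration, not the original author's code: greedy_decode and toy_step are hypothetical names, the loop handles a single sequence (batch size 1), and toy_step is a scripted stand-in for a trained Decoder with the same (input, hidden, cell) signature. A max_len cap guards against a model that never emits <EOS>.

```python
import torch

def greedy_decode(decoder_step, hidden, cell, sos_idx, eos_idx, max_len=20):
    """Decode one sequence (batch size 1) until <eos> or max_len steps."""
    input = torch.tensor([sos_idx])
    tokens = []
    with torch.no_grad():
        for _ in range(max_len):
            logits, hidden, cell = decoder_step(input, hidden, cell)
            input = logits.argmax(1)   # no teacher forcing: feed back the prediction
            token = input.item()
            if token == eos_idx:
                break
            tokens.append(token)
    return tokens

# Toy stand-in for a trained Decoder: after <sos>=1 it emits 3, then 4, then <eos>=2.
def toy_step(input, hidden, cell):
    script = {1: 3, 3: 4, 4: 2}
    logits = torch.zeros(1, 5)
    logits[0, script[input.item()]] = 1.0
    return logits, hidden, cell

print(greedy_decode(toy_step, None, None, sos_idx=1, eos_idx=2))  # [3, 4]
```

With a real model, decoder_step would be the Decoder instance and hidden, cell would come from the Encoder.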

[5] Conclusion

Reading code is sometimes easier than reading the paper; the hard part is finding good code.

Full code: https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb
