Tutorials on training Skip-Thought vectors for sentence feature extraction.


 

  1. Send an email and download the training dataset. 

    The dataset used for Skip-Thought vectors is [BookCorpus]: http://yknzhu.wixsite.com/mbweb 

    First, you should send an email to the authors of the paper and ask for the download link of this dataset. Then you will be able to download the dataset files. 

    Unzip these files into the current folder. 

  2. Download the TensorFlow version of the code.   

    Follow the instructions at this link: https://github.com/tensorflow/models/tree/master/skip_thoughts 

    Then you will see the training process running in your terminal. 

    [Attention]  For this step you need to install Bazel, but do not update it afterwards; otherwise it may show errors in the following operations. 

 

  3. Encoding sentences:   

    (1). First, open a terminal and input "ipython": 

    (2). Input the following code into the terminal: 

ipython  # Launch IPython.

In [0]:

# Imports.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager

In [1]:
# Set paths to the model.
VOCAB_FILE = "/path/to/vocab.txt"
EMBEDDING_MATRIX_FILE = "/path/to/embeddings.npy"
CHECKPOINT_PATH = "/path/to/model.ckpt-9999"
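
Since os.path is imported in the first cell, you can optionally sanity-check these paths before loading the model. A minimal sketch (note that CHECKPOINT_PATH is a TensorFlow checkpoint prefix rather than a literal file, so it is deliberately not checked here):

# Optional: verify the vocabulary and embedding files exist before loading.
# CHECKPOINT_PATH is a checkpoint prefix, not a literal file, so it is
# skipped in this check.
for path in [VOCAB_FILE, EMBEDDING_MATRIX_FILE]:
    assert os.path.exists(path), "Missing file: %s" % path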

 

    At this point, you have defined the environment; next, you also need to do the following: 

In [2]:
# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
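
As the comment above notes, a bidirectional model can be loaded in addition by calling load_model() again. A minimal sketch, assuming you have trained a bidirectional model (the *_bi paths below are placeholders for its files):

# Optional: also load a bidirectional model; the encoder will then use the
# concatenation of all loaded models. The paths below are placeholders.
encoder.load_model(configuration.model_config(bidirectional_encoder=True),
                   vocabulary_file="/path/to/vocab_bi.txt",
                   embedding_matrix_file="/path/to/embeddings_bi.npy",
                   checkpoint_path="/path/to/model_bi.ckpt-9999")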

In [3]:
# Define the input sentence(s) to encode.
data = [' This is my first attempt  to the tensorflow version skip_thought_vectors ... ']

 

  Then, it's time to get the 2400-dimensional features now. 

In [4]:
# Generate Skip-Thought Vectors for each sentence in the dataset.
encodings = encoder.encode(data)
print(encodings)
print(encodings[0]) 
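
The scipy.spatial.distance import (sd) from the first cell is what the official example uses to compare encodings. As a minimal sketch, assuming data contains several sentences rather than just one, you can rank them by cosine distance to the first sentence:

# Sketch: rank all sentences in `data` by cosine distance to the first one.
# Assumes `data` holds several sentences; a smaller distance means the
# sentence is semantically closer to data[0].
scores = sd.cdist([encodings[0]], encodings, "cosine")[0]
sorted_ids = np.argsort(scores)
for i in sorted_ids:
    print("%.3f  %s" % (scores[i], data[i]))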

  

  You can see the results of the algorithm printed in the terminal. 

  

 

  Now that you have obtained the features of the input sentence, you can load your own texts to obtain their encodings. Come on ... 
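
As a hedged sketch of that last step (the file name my_sentences.txt is a placeholder for your own data, one sentence per line):

# Sketch: encode your own texts, one sentence per line.
# "my_sentences.txt" is a placeholder file name.
with open("my_sentences.txt") as f:
    my_data = [line.strip() for line in f if line.strip()]
my_encodings = encoder.encode(my_data)
print(len(my_encodings))  # one 2400-dimensional vector per sentence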

 
