Embedding Lookup - Alibaba Cloud Developer Community


2023-09-05



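Embedding lookup is a machine learning technique that maps input data into a low-dimensional space and then performs lookups in that space. It can speed up search and matching, especially on large datasets. Embedding lookup is commonly used in the following scenarios: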

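Word vectors: convert text into numeric vectors, then search in the word-vector space. This supports natural language processing tasks such as word sense disambiguation and part-of-speech tagging.
Image features: map local image features into a low-dimensional space, then search in that space. This supports computer vision tasks such as image retrieval and object detection.
Speech features: convert speech signals into numeric vectors, then search in that space. This supports speech processing tasks such as speech recognition and speaker identification.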
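Using embedding lookup involves the following steps:
Data preprocessing: preprocess the input data. Text, for example, needs tokenization (word segmentation) and stop-word removal, while images need normalization.
Embedding learning: use a deep learning model (such as a word-vector model or a convolutional neural network) to map the input data into a low-dimensional space. That space may be continuous and real-valued, or a discrete code space.
Index construction: build an index over the low-dimensional space so that lookups are fast, for example with a k-d tree or FLANN (Fast Library for Approximate Nearest Neighbors).
Embedding lookup: use the index to find the stored items closest to a given query in the low-dimensional space.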
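The embedding and lookup steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the vectors in EMBEDDINGS are hand-picked rather than learned, the names cosine_similarity and nearest are hypothetical helpers, and a linear scan stands in for the index (a k-d tree or FLANN would replace the scan on a large dataset).

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
# In practice these would come from a trained model.
EMBEDDINGS = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "car":   [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, table, k=1):
    """Return the k keys whose vectors are most similar to the query.

    This is a brute-force linear scan, not an index.
    """
    ranked = sorted(
        table,
        key=lambda key: cosine_similarity(query, table[key]),
        reverse=True,
    )
    return ranked[:k]

query = [1.0, 0.1, 0.0]          # assumed embedding of some query item
result = nearest(query, EMBEDDINGS, k=2)
print(result)                    # ['cat', 'dog']
```

The scan costs one similarity computation per stored item, which is the cost an index is meant to avoid.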
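In short, embedding lookup maps input data into a low-dimensional space and searches within that space. It is widely used in natural language processing, computer vision, and speech processing. Together, data preprocessing, embedding learning, index construction, and lookup make search and matching fast.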


Ch 11: Concept 02
Embedding Lookup
Import TensorFlow, and begin an interactive session

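import tensorflow as tf
# TensorFlow 1.x API. On TensorFlow 2.x, call tf.compat.v1.disable_eager_execution()
# and use tf.compat.v1.InteractiveSession() instead.
sess = tf.InteractiveSession()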
Let's say we only have 4 words in our vocabulary: "the", "fight", "wind", and "like".

Maybe each word is associated with a number.

Word    Number
'the'    17
'fight'    22
'wind'    35
'like'    51
embeddings_0d = tf.constant([17,22,35,51])
Or maybe, they're associated with one-hot vectors.

Word    Vector
'the'    [1, 0, 0, 0]
'fight'    [0, 1, 0, 0]
'wind'    [0, 0, 1, 0]
'like'    [0, 0, 0, 1]
embeddings_4d = tf.constant([[1, 0, 0, 0],
                             [0, 1, 0, 0],
                             [0, 0, 1, 0],
                             [0, 0, 0, 1]])
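One-hot vectors and table lookups are closely related: multiplying a one-hot row vector by a matrix selects a single row of that matrix, so indexing the row directly gives the same answer without the multiplication. The sketch below checks this in plain Python. The values in embedding_matrix and the helpers one_hot and vector_matrix_product are assumptions made up for illustration, not part of the original notebook.

```python
# Hypothetical 4x3 embedding matrix: one row per vocabulary word.
embedding_matrix = [
    [0.1, 0.2, 0.3],   # 'the'
    [0.4, 0.5, 0.6],   # 'fight'
    [0.7, 0.8, 0.9],   # 'wind'
    [1.0, 1.1, 1.2],   # 'like'
]

def one_hot(index, size):
    """Vector of length `size` with a 1 at `index` and 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]

def vector_matrix_product(vec, matrix):
    """Multiply a row vector by a matrix using plain Python lists."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(n_rows))
        for c in range(n_cols)
    ]

fight = one_hot(1, 4)                                   # [0, 1, 0, 0]
via_matmul = vector_matrix_product(fight, embedding_matrix)
via_index = embedding_matrix[1]                         # direct row lookup

print(via_matmul == via_index)                          # True
```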
This may sound over the top, but you can have any tensor you want, not just numbers or vectors.

Word    Tensor
'the'    [[1, 0], [0, 0]]
'fight'    [[0, 1], [0, 0]]
'wind'    [[0, 0], [1, 0]]
'like'    [[0, 0], [0, 1]]
embeddings_2x2d = tf.constant([[[1, 0], [0, 0]],
                               [[0, 1], [0, 0]],
                               [[0, 0], [1, 0]],
                               [[0, 0], [0, 1]]])
Let's say we want to find the embeddings for the sentence, "fight the wind".

ids = tf.constant([1, 0, 2])  # vocabulary indices of 'fight', 'the', 'wind'
We can use the embedding_lookup function provided by TensorFlow:

lookup_0d = sess.run(tf.nn.embedding_lookup(embeddings_0d, ids))
print(lookup_0d)
[22 17 35]
lookup_4d = sess.run(tf.nn.embedding_lookup(embeddings_4d, ids))
print(lookup_4d)
[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]]
lookup_2x2d = sess.run(tf.nn.embedding_lookup(embeddings_2x2d, ids))
print(lookup_2x2d)
[[[0 1]
  [0 0]]

 [[1 0]
  [0 0]]

 [[0 0]
  [1 0]]]
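In all three lookups above, the result is the entries of the table at positions 1, 0, and 2 along its first axis, whatever the shape of each entry. A plain-Python stand-in makes that explicit. Note that lookup_rows is a hypothetical helper written for this sketch; it mimics only the basic single-table behaviour seen here and ignores the other options of tf.nn.embedding_lookup.

```python
def lookup_rows(params, ids):
    """Pick entries of `params` along its first axis, in the order given by `ids`."""
    return [params[i] for i in ids]

embeddings_0d = [17, 22, 35, 51]
embeddings_4d = [[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]]
embeddings_2x2d = [[[1, 0], [0, 0]],
                   [[0, 1], [0, 0]],
                   [[0, 0], [1, 0]],
                   [[0, 0], [0, 1]]]

ids = [1, 0, 2]  # 'fight', 'the', 'wind'

lookup_0d = lookup_rows(embeddings_0d, ids)
lookup_4d = lookup_rows(embeddings_4d, ids)
lookup_2x2d = lookup_rows(embeddings_2x2d, ids)

print(lookup_0d)    # [22, 17, 35]
print(lookup_4d)    # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
print(lookup_2x2d)  # [[[0, 1], [0, 0]], [[1, 0], [0, 0]], [[0, 0], [1, 0]]]
```

The values match the TensorFlow results shown above; only the printed formatting differs, since these are Python lists rather than arrays.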
