Paper: Watch Your Step: Learning Graph Embeddings Through Attention
The starting point of this paper is to automate the choice of hyperparameters in network representation learning so that the method adapts to different networks. The paper also shows that DeepWalk is in fact equivalent to a form of matrix factorization.
Paper Overview
The paper examines an unavoidable problem in earlier work: the choice of model hyperparameters. The hyperparameters that give the best results differ from network to network, and rerunning trial-and-error searches for every new problem is clearly inefficient. Such hyperparameters include the window size w in DeepWalk and the two random-walk parameters p and q defined in node2vec, and their values largely determine how well the model performs.
These hyperparameters in effect define a probability distribution over the sampled context nodes. Take DeepWalk's w as an example: for each node v on a random walk, the sampled context consists of the 2w nodes within v's w-hop window, and each of them is sampled with probability 1/(2w), i.e. a uniform distribution over the context nodes. The idea of this paper is to treat such hyperparameters as learnable parameters, so that the values best suited to a given network are learned automatically.
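As a toy illustration (not the paper's code; the walk and the helper name are made up), the snippet below enumerates DeepWalk-style context pairs for a single walk: with window size w, every node within w hops of the center is used as context exactly once, so each of the 2w positions carries the same 1/(2w) weight.

```python
from collections import Counter

def context_pairs(walk, w):
  """Returns (center, context) pairs for one walk with window size w."""
  pairs = []
  for i, center in enumerate(walk):
    for j in range(max(0, i - w), min(len(walk), i + w + 1)):
      if j != i:
        pairs.append((center, walk[j]))
  return pairs

walk = ['a', 'b', 'c', 'd', 'e']
print(Counter(context_pairs(walk, w=2)))  # every in-window pair appears exactly once
```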
Concretely, the paper does the following:
- View the context-sampling step of a random walk as an expectation over powers of the transition probability matrix.
- By proving that DeepWalk is equivalent to matrix factorization, show that the target of the sampling process above corresponds to the k-th powers of the transition probability matrix. The coefficients placed in front of the different powers T^k define different ways of sampling context nodes, and an attention model is introduced to learn these coefficients automatically so that the best sampling scheme is found (see the sketch after this list).
- The weights learned by the attention model perform about as well as carefully hand-tuned hyperparameters, which further supports the feasibility of the approach.
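As a minimal illustration of the second bullet (a toy numpy sketch, not the paper's code; the graph and the weights are made up), the expected context matrix can be written as a softmax-weighted combination of powers of the transition matrix T, and it is the logits q, rather than a window size, that would be learned:

```python
import numpy as np

def expected_context_matrix(adj, q):
  """Roughly E[D; q] ~ sum_k softmax(q)_k * T^k (P_0 and scaling omitted)."""
  T = adj / adj.sum(axis=1, keepdims=True)   # row-normalized transition matrix
  Q = np.exp(q) / np.exp(q).sum()            # softmax -> attention weights Q_k
  D = np.zeros_like(T)
  T_pow = np.eye(len(adj))
  for Q_k in Q:
    T_pow = T_pow @ T                        # T^1, T^2, T^3, ...
    D += Q_k * T_pow
  return D

# Toy 4-node path graph; putting most attention mass on the 1-hop term mimics
# a small DeepWalk window, while flatter weights mimic a larger one.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(expected_context_matrix(adj, q=np.array([3.0, 1.0, 0.0])))
```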
In short, the main idea of the paper is to learn these hyperparameters automatically with an attention model: the hyperparameters are interpreted as a probability distribution over the neighbors sampled during random walks, and learning the distribution best suited to each network yields good network representations.
Paper Idea
Original post: [Paper Notes] Watch Your Step
https://zhuanlan.zhihu.com/p/46935910
Paper Code
The complete code is available from google-research: https://github.com/google-research/google-research/tree/master/graph_embedding/watch_your_step. The core graph-construction and objective code is excerpted below.
```python
def GetOrMakeAdjacencyMatrix():
  """Creates Adjacency matrix and caches it on disk with name a.npy."""
  a_file = os.path.join(FLAGS.dataset_dir, 'a.npy')
  if os.path.exists(a_file):
    return numpy.load(open(a_file, 'rb'))

  num_nodes = GetNumNodes()
  a = numpy.zeros(shape=(num_nodes, num_nodes), dtype='float32')
  train_edges = numpy.load(
      open(os.path.join(FLAGS.dataset_dir, 'train.txt.npy'), 'rb'))
  a[train_edges[:, 0], train_edges[:, 1]] = 1.0
  if not IsDirected():
    a[train_edges[:, 1], train_edges[:, 0]] = 1.0

  numpy.save(open(a_file, 'wb'), a)
  return a


def GetPowerTransitionPairs(highest_power):
  return list(IterPowerTransitionPairs(highest_power))


def IterPowerTransitionPairs(highest_power):
  """Yields powers of transition matrix (T, T*T, T*T*T, ...).

  It caches them on disk as t_<i>.npy, where <i> is the power. The first
  power (i = 1) is not cached as it is trivially computed from the adjacency
  matrix.

  Args:
    highest_power: integer representing the highest power of the transition
      matrix. This will be the number of yields.
  """
  num_nodes = GetNumNodes()

  for i in range(highest_power):
    if i == 0:
      a = GetOrMakeAdjacencyMatrix()
      transition = a.T
      degree = transition.sum(axis=0)
      transition /= degree + 0.0000001
      power_array = transition
    else:
      power_filename = os.path.join(FLAGS.dataset_dir, 't_%i.npy' % (i + 1))
      if os.path.exists(power_filename):
        power_array = numpy.load(open(power_filename, 'rb'))
      else:
        power_array = power_array.dot(transition)
        print('Computing T^%i ...' % (i + 1))  # pylint: disable=superfluous-parens
        numpy.save(open(power_filename, 'wb'), power_array)
        print(' ... Saved T^%i' % (i + 1))  # pylint: disable=superfluous-parens

    placeholder = tf.placeholder(tf.float32, shape=(num_nodes, num_nodes))
    yield (placeholder, power_array)


def GetParametrizedExpectation(references):
  r"""Calculates E[D; q_1, q_2, ...]: a parametrized (tensor) matrix D.

  Which is defined as:
    E[D; q] = P_0 * (Q_1*T + Q_2*T^2 + Q_3*T^3 + ...)

  where Q_1, Q_2, ... = softmax(q_1, q_2, ...)
  and vector (q_1, q_2, ...) is created as a "trainable variable".

  Args:
    references: Dict that will be populated as key-value pairs:
      'combination': \sum_j Q_j T^j (i.e. E[D] excluding P_0).
      'normed': The vector Q_1, Q_2, ... (sums to 1).
      'mults': The vector q_1, q_2, ... (Before softmax, does not sum to 1).

  Returns:
    Tuple (E[D; q], feed_dict) where the first entry contains placeholders and
    the feed_dict contains is a dictionary from the placeholders to numpy
    arrays of the transition powers.
  """
  feed_dict = {}
  n = FLAGS.transition_powers
  regularizer = FLAGS.context_regularizer
  a = GetOrMakeAdjacencyMatrix()
  transition = a.T
  degree = transition.sum(axis=0)

  # transition /= degree + 0.0000001
  # transition_pow_n = transition
  convex_combination = []

  # vector q
  mults = tf.Variable(numpy.ones(shape=(n), dtype='float32'))

  # vector Q (output of softmax)
  normed = tf.squeeze(tf.nn.softmax(tf.expand_dims(mults, 0)), 0)

  references['mults'] = mults
  references['normed'] = normed
  transition_powers = GetPowerTransitionPairs(n)
  for k, (placeholder, transition_pow) in enumerate(transition_powers):
    feed_dict[placeholder] = transition_pow
    convex_combination.append(normed[k] * placeholder)
  d_sum = tf.add_n(convex_combination)

  d_sum *= degree

  tf.losses.add_loss(tf.reduce_mean(mults**2) * regularizer)

  references['combination'] = convex_combination
  return tf.transpose(d_sum) * GetNumNodes() * 80, feed_dict
```
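GetOrMakeAdjacencyMatrix and IterPowerTransitionPairs build the adjacency matrix and cache the transition-matrix powers T, T^2, ..., T^n on disk, while GetParametrizedExpectation combines those powers with a trainable vector q through a softmax (the attention weights Q) into the expectation E[D; q], adding an L2 regularizer on q via tf.losses. The helpers below implement the PercentDelta gradient scaling, the embedding variables, and the NLGL/RMSE training objective.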
```python
# Helper function 1/3 for PercentDelta.
def GetPD(target_num_steps):
  global_step = tf.train.get_or_create_global_step()
  global_step = tf.cast(global_step, tf.float32)
  # gs = 0, target = 1
  # gs = num_steps, target = 0.01
  # Solve: y = mx + c
  # gives: c = 1
  # m = dy / dx = (1 - 0.01) / (0 - num_steps) = - 0.99 / num_steps
  # Therefore, y = 1 - (0.99/num_steps) * x
  return -global_step * 0.99 / target_num_steps + 1


# Helper function 2/3 for PercentDelta.
def PlusEpsilon(x, eps=1e-5):
  """Returns x+epsilon, without changing element-wise sign of x."""
  return x + (tf.cast(x < 0, tf.float32) * -eps) + (
      tf.cast(x >= 0, tf.float32) * eps)


# Helper function 3/3 for PercentDelta.
def CreateGradMultipliers(loss):
  """Returns a gradient multiplier so that SGD becomes PercentDelta."""
  variables = tf.trainable_variables()  # tf.global_variables()
  gradients = tf.gradients(loss, variables)
  multipliers = {}
  target_pd = GetPD(FLAGS.max_number_of_steps)
  for v, g in zip(variables, gradients):
    if g is None:
      continue
    multipliers[v] = target_pd / PlusEpsilon(
        tf.reduce_mean(tf.abs(g / PlusEpsilon(v))))
  return multipliers


def CreateEmbeddingDictionary(side, size):
  num_nodes = GetNumNodes()
  embeddings = numpy.array(
      numpy.random.uniform(low=-0.1, high=0.1, size=(num_nodes, size)),
      dtype='float32')
  embeddings = tf.Variable(embeddings, name=side + 'E')
  tf.losses.add_loss(tf.reduce_mean(embeddings**2) * 1e-6)

  return embeddings


def CreateObjective(g, target_matrix):
  """Returns the objective function (can be nlgl or rmse)."""
  if FLAGS.objective == 'nlgl':  # Negative log likelihood
    # target_matrix is E[D; q], which is used in the "positive part" of the
    # likelihood objective. We use true adjacency for the "negative part", as
    # described in our paper.
    true_adjacency = tf.Variable(
        GetOrMakeAdjacencyMatrix(), name='adjacency', trainable=False)
    logistic = tf.sigmoid(g)
    return -tf.reduce_mean(
        tf.multiply(target_matrix, tf.log(PlusEpsilon(logistic))) +
        tf.multiply(1 - true_adjacency, tf.log(PlusEpsilon(1 - logistic))))
  elif FLAGS.objective == 'rmse':  # Root mean squared error
    return tf.reduce_mean((g - target_matrix)**2)
  else:
    logging.fatal('unknown objective "%s".', FLAGS.objective)


def CreateGFn(net_l, net_r):
  return tf.matmul(net_l, tf.transpose(net_r))
```
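For orientation, here is a rough sketch of how the pieces above could be wired together into a TF1 training graph. This is not the repo's actual training script; the embedding size, the learning rate, and the overall loss assembly are assumptions, and FLAGS, GetNumNodes(), etc. are taken from the full file.

```python
# Rough sketch, NOT the repo's main(). Assumes the functions above plus FLAGS,
# GetNumNodes(), etc. from the full file, and TF1-style graph mode.
import tensorflow as tf

def BuildTrainOp(embedding_dim=128):  # embedding_dim: hypothetical value.
  references = {}
  # E[D; q]: softmax-attention combination of transition-matrix powers.
  target_matrix, feed_dict = GetParametrizedExpectation(references)

  # Two embedding matrices: the factors of the decomposition g = L R^T.
  net_l = CreateEmbeddingDictionary('L', embedding_dim)
  net_r = CreateEmbeddingDictionary('R', embedding_dim)
  g = CreateGFn(net_l, net_r)

  # NLGL or RMSE objective against E[D; q]; the helpers above also pushed
  # small L2 regularizers into tf.losses, so sum everything up.
  tf.losses.add_loss(CreateObjective(g, target_matrix))
  loss = tf.losses.get_total_loss()

  # PercentDelta: rescale each variable's gradient so that the mean relative
  # update |grad / value| follows the GetPD() schedule.
  multipliers = CreateGradMultipliers(loss)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
  grads_and_vars = optimizer.compute_gradients(loss)
  scaled = [(grad * multipliers[var], var) if var in multipliers else (grad, var)
            for grad, var in grads_and_vars if grad is not None]
  train_op = optimizer.apply_gradients(
      scaled, global_step=tf.train.get_or_create_global_step())
  return train_op, loss, feed_dict

# Hypothetical usage:
# train_op, loss, feed_dict = BuildTrainOp()
# with tf.Session() as sess:
#   sess.run(tf.global_variables_initializer())
#   for _ in range(FLAGS.max_number_of_steps):
#     sess.run(train_op, feed_dict=feed_dict)
```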