[Graph Neural Networks with DGL] Applying GCN to Karate Club - Alibaba Cloud Developer Community


2022-04-27



1. Problem description

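The Karate club is a social network of 34 members, with pairwise links between members who interacted outside the club. The club later split into two communities, led by the instructor (node 0) and the club president (node 33). The network can be visualized as below, with colors indicating the communities (see the figure below).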

Task: given only the social network itself, predict which community (node 0's or node 33's) each member tends to join.
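Because the real split is known, predictions can be checked against it. A minimal sketch, assuming networkx is installed: its built-in `nx.karate_club_graph()` records each member's final club in the `club` node attribute (`'Mr. Hi'` for the instructor's side, `'Officer'` for the president's side), which maps onto the 0/1 labels used in this task. The helper name `ground_truth_labels` is hypothetical.

```python
import networkx as nx
import numpy as np


def ground_truth_labels():
    """Hypothetical helper: 0 = joined the instructor's side (node 0), 1 = the president's side (node 33)."""
    g = nx.karate_club_graph()
    # The 'club' node attribute is either 'Mr. Hi' or 'Officer'
    return np.array([0 if g.nodes[v]["club"] == "Mr. Hi" else 1
                     for v in range(g.number_of_nodes())])


g = nx.karate_club_graph()
truth = ground_truth_labels()
print(g.number_of_nodes(), g.number_of_edges())  # 34 78
print(truth[0], truth[33])  # 0 1
```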

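2. Steps

2.1 Creating the graph in DGL

This builds on the data encapsulation part of the previous post, [Graph Neural Networks with DGL] Data Encapsulation and Message Passing, which is worth reviewing first.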

# -*- coding: utf-8 -*-
"""
Created on Fri Dec 17 21:16:42 2021
@author: 86493
"""
import dgl
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_karate_club_graph():
    # All 78 edges are stored in two numpy arrays. One for source endpoints
    # while the other for destination endpoints.
    src = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 10,
                    10, 11, 12, 12, 13, 13, 13, 13, 16, 16, 17, 17, 19, 19, 21, 21,
                    25, 25, 27, 27, 27, 28, 29, 29, 30, 30, 31, 31, 31, 31, 32, 32,
                    32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
                    33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
    dst = np.array([0, 0, 1, 0, 1, 2, 0, 0, 0, 4, 5, 0, 1, 2, 3, 0, 2, 2, 0, 4,
                    5, 0, 0, 3, 0, 1, 2, 3, 5, 6, 0, 1, 0, 1, 0, 1, 23, 24, 2, 23,
                    24, 2, 23, 26, 1, 8, 0, 24, 25, 28, 2, 8, 14, 15, 18, 20, 22, 23,
                    29, 30, 31, 8, 9, 13, 14, 15, 18, 19, 20, 22, 23, 26, 27, 28, 29, 30,
                    31, 32])
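    # Edges are directional in DGL; make them bi-directional by adding
    # every edge a second time in the reverse direction.
    u = np.concatenate([src, dst])
    v = np.concatenate([dst, src])
    # Construct the graph (dgl.graph is the recommended constructor in DGL >= 0.5)
    return dgl.graph((torch.tensor(u), torch.tensor(v)))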

G = build_karate_club_graph()
print('We have %d nodes.' % G.number_of_nodes())
print('We have %d edges.' % G.number_of_edges())
# We have 34 nodes.
# We have 156 edges.
# The real relationships are undirected, so drop edge directions for visualization
nx_G = G.to_networkx().to_undirected()
# Use the Kamada-Kawai layout for a cleaner drawing
pos = nx.kamada_kawai_layout(nx_G)
nx.draw(nx_G, pos, with_labels=True, node_color=[[.7, .7, .7]])
plt.show()
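The two `np.concatenate` calls store every undirected edge once in each direction, which is why DGL reports 156 = 2 × 78 edges. A NumPy sketch of the same doubling, using networkx's copy of the Karate Club edge list as a stand-in for the hard-coded `src`/`dst` arrays (the helper name `to_bidirectional` is hypothetical):

```python
import networkx as nx
import numpy as np


def to_bidirectional(src, dst):
    """Turn each undirected edge (s, d) into two directed edges s->d and d->s."""
    u = np.concatenate([src, dst])
    v = np.concatenate([dst, src])
    return u, v


edges = np.array(list(nx.karate_club_graph().edges()))  # shape (78, 2)
u, v = to_bidirectional(edges[:, 0], edges[:, 1])
print(len(edges), len(u))  # 78 156

# Every directed edge now has its reverse in the edge list
pairs = set(zip(u.tolist(), v.tolist()))
print(all((b, a) in pairs for a, b in pairs))  # True
```

A side effect worth knowing: after doubling, each node's in-degree equals its undirected degree, which is what GCN normalization later relies on.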

In the later code, we wrap this drawing snippet in a visual function.
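The post does not show the `visual` function itself. A minimal sketch of what such a wrapper could look like; the name comes from the text, but the signature, the `layout` parameter and the returned axes are assumptions:

```python
import matplotlib.pyplot as plt
import networkx as nx


def visual(nx_G, layout=nx.kamada_kawai_layout, ax=None):
    """Hypothetical wrapper around the drawing code: lay out and draw an undirected graph."""
    if ax is None:
        _, ax = plt.subplots()
    pos = layout(nx_G)  # Kamada-Kawai by default, as in the code above (requires SciPy)
    nx.draw(nx_G, pos, ax=ax, with_labels=True, node_color=[[.7, .7, .7]])
    return ax
```

With the DGL graph built above it would be called as `visual(G.to_networkx().to_undirected())`, followed by `plt.show()`.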

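2.2 Assigning features to nodes or edges

GNNs are trained on features attached to nodes and edges. In this classification task, each node's input feature is a one-hot vector. In DGL, features for all nodes can be assigned at once with a single feature tensor whose first dimension indexes the nodes.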

# 34x34 identity matrix: row i is the one-hot feature of node i
G.ndata['feat'] = torch.eye(34)
print(torch.eye(34))
# Print the feature of node 2
a = G.nodes[2].data['feat']
print(a)
# Print the features of nodes 5 and 6
b = G.nodes[[5, 6]].data['feat']
print(b)

This creates an identity (diagonal) matrix:

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])

The features of node 2 and of nodes 5 and 6 are:

tensor([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

Instead of fixed one-hot vectors, we can also use nn.Embedding to give each node a learnable feature vector:

# Learnable 5-dimensional embedding for each of the 34 nodes
embed = nn.Embedding(34, 5)
print(embed.weight)
G.ndata['feat'] = embed.weight
# Print node 2's input feature
print(G.ndata['feat'][2])
# Print the input features of nodes 10 and 11
print(G.ndata['feat'][[10, 11]])
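Why can a learnable embedding replace the one-hot features? Multiplying the identity matrix by a weight matrix just returns that matrix's rows, so a first layer fed with one-hot vectors is effectively learning one vector per node, which is what an embedding table stores. A NumPy sketch of that identity; the random `W` is only a placeholder for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 34, 5
W = rng.normal(size=(num_nodes, dim))  # placeholder for a learnable weight / embedding table
X = np.eye(num_nodes)                   # one-hot features, like torch.eye(34)

dense = X @ W                # one-hot features through a bias-free linear map
lookup = W[[2, 10, 11]]      # an embedding lookup for nodes 2, 10 and 11
print(np.allclose(dense, W))                    # True
print(np.allclose(dense[[2, 10, 11]], lookup))  # True
```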

2.3 Defining a graph convolutional network

For the theory behind GCN, see the blog post by the GCN paper's author: https://tkipf.github.io/graph-convolutional-networks/

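The graph convolution layer is defined as follows (this is the formulation used by DGL's GraphConv):

$$h_i^{(l+1)} = \sigma\left(b^{(l)} + \sum_{j\in\mathcal{N}(i)} \frac{1}{c_{ji}} h_j^{(l)} W^{(l)}\right), \qquad c_{ji} = \sqrt{|\mathcal{N}(j)|}\sqrt{|\mathcal{N}(i)|}$$

Here $\mathcal{N}(i)$ is the set of neighbors of node $i$, $W^{(l)}$ and $b^{(l)}$ are the layer's weight and bias, and $\sigma$ is an activation function. If every edge carries a scalar weight, the weighted graph convolution is

$$h_i^{(l+1)} = \sigma\left(b^{(l)} + \sum_{j\in\mathcal{N}(i)} \frac{e_{ji}}{c_{ji}} h_j^{(l)} W^{(l)}\right)$$

where:

$e_{ji}$ is the weight of the edge from node $j$ to node $i$;

to customize the normalization term $c_{ji}$, construct the layer with norm='none' and pass the pre-normalized edge weights $e_{ji}$ to the forward computation;

dgl.nn.pytorch.EdgeWeightNorm normalizes scalar edge weights in the same way as the GCN paper.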

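In general, each node sends messages through a message function, and a reduce function then aggregates the incoming messages (in the example below, the aggregation is a sum).

(1) The first layer maps the 34-dimensional input features to a hidden size of 5. When the 5-dimensional nn.Embedding features are used as input instead, the input size is 5, which is why the training code builds GCN(5, 5, 2).

(2) The second layer maps the hidden features to 2-dimensional output features, one per community in the Karate club.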
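To make the formula and the message/reduce description concrete, here is a NumPy sketch of one layer with symmetric normalization $c_{ji} = \sqrt{\deg(j)}\sqrt{\deg(i)}$, no self-loops, no edge weights and no activation, computed two ways: edge by edge (message = normalized $h_j W$, reduce = sum) and as the matrix product $D^{-1/2} A D^{-1/2} H W + b$. It illustrates the math under those assumptions and is not DGL's implementation; the function names are hypothetical.

```python
import networkx as nx
import numpy as np


def gcn_layer_message_passing(src, dst, h, W, b):
    """One GCN layer computed edge by edge: message along j->i, then sum-reduce at i."""
    n = h.shape[0]
    out_deg = np.bincount(src, minlength=n)
    in_deg = np.bincount(dst, minlength=n)
    hw = h @ W
    norm = 1.0 / np.sqrt(out_deg[src] * in_deg[dst])  # 1 / c_ji for every edge
    msgs = hw[src] * norm[:, None]                      # message function
    out = np.zeros((n, W.shape[1]))
    np.add.at(out, dst, msgs)                           # reduce function: sum per destination
    return out + b


def gcn_layer_matrix(A, h, W, b):
    """The same layer in matrix form: D^{-1/2} A D^{-1/2} H W + b (A symmetric)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return A_hat @ h @ W + b


g = nx.karate_club_graph()
edges = np.array(list(g.edges()))
src = np.concatenate([edges[:, 0], edges[:, 1]])
dst = np.concatenate([edges[:, 1], edges[:, 0]])
A = nx.to_numpy_array(g, nodelist=list(range(34)), weight=None)  # unweighted adjacency

rng = np.random.default_rng(0)
h = np.eye(34)                   # one-hot input features
W = rng.normal(size=(34, 5))     # placeholder weights
b = np.zeros(5)
out_mp = gcn_layer_message_passing(src, dst, h, W, b)
out_mat = gcn_layer_matrix(A, h, W, b)
print(out_mp.shape, np.allclose(out_mp, out_mat))  # (34, 5) True
```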

from dgl.nn.pytorch import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super(GCN, self).__init__()
        # in_feats -> hidden_size
        self.conv1 = GraphConv(in_feats, hidden_size)
        # hidden_size -> num_classes (one output per community)
        self.conv2 = GraphConv(hidden_size, num_classes)

    def forward(self, g, inputs):
        h = self.conv1(g, inputs)
        h = torch.relu(h)
        h = self.conv2(g, h)
        return h  # raw logits, shape (num_nodes, num_classes)

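The network structure is simple. Note that the printout below comes from an equivalent hand-written version in which each layer is a GCNLayer wrapping one nn.Linear, with 34-dimensional one-hot inputs; printing the GraphConv-based class above lists its layers as conv1 and conv2 instead: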

GCN(
  (gcn1): GCNLayer(
    (linear): Linear(in_features=34, out_features=5, bias=True)
  )
  (gcn2): GCNLayer(
    (linear): Linear(in_features=5, out_features=2, bias=True)
  )
)
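A minimal PyTorch sketch of a hand-written model that produces a printout of that shape, assuming a precomputed dense normalized adjacency matrix instead of a DGL graph. The class names `GCNLayer` and `DenseGCN`, the helper `normalized_adjacency` and the dense-matrix approach are assumptions for illustration, not the post's or DGL's code:

```python
import networkx as nx
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """Hypothetical layer: aggregate with A_hat, then apply a Linear layer (bias added after aggregation)."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, a_hat, h):
        return self.linear(a_hat @ h)


class DenseGCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        self.gcn1 = GCNLayer(in_feats, hidden_size)
        self.gcn2 = GCNLayer(hidden_size, num_classes)

    def forward(self, a_hat, inputs):
        h = torch.relu(self.gcn1(a_hat, inputs))
        return self.gcn2(a_hat, h)


def normalized_adjacency(g):
    """D^{-1/2} A D^{-1/2} without self-loops, like GraphConv's default norm='both'."""
    a = torch.tensor(nx.to_numpy_array(g, nodelist=sorted(g.nodes()), weight=None),
                     dtype=torch.float32)
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


net = DenseGCN(34, 5, 2)
print(net)  # same layout as the printout above, with DenseGCN as the outer name
a_hat = normalized_adjacency(nx.karate_club_graph())
out = net(a_hat, torch.eye(34))
print(out.shape)  # torch.Size([34, 2])
```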

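2.4 Data preparation and initialization

# Node input features and the semi-supervised labels
inputs = G.ndata['feat']
# Only the instructor (node 0) and the president (node 33) are labeled
labeled_nodes = torch.tensor([0, 33])
labels = torch.tensor([0, 1])  # node 0 -> class 0, node 33 -> class 1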

2.5 Training and visualization

def train(G, inputs, embed, labeled_nodes, labels):
    import itertools
    # Input size 5 matches the nn.Embedding features; hidden size 5; 2 classes
    net = GCN(5, 5, 2)
    # Optimize the GCN weights and the node embeddings jointly
    optimizer = torch.optim.Adam(itertools.chain(net.parameters(), embed.parameters()), lr=0.01)
    all_logits = []
    for epoch in range(30):
        logits = net(G, inputs)
        # Save the logits for visualization later;
        # detach() returns a tensor separated from the current computation graph
        all_logits.append(logits.detach())
        logp = F.log_softmax(logits, 1)
        # Semi-supervised learning: compute the loss only on the labeled nodes
        loss = F.nll_loss(logp[labeled_nodes], labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))
    print(all_logits)
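The loss line is what makes this semi-supervised: `log_softmax` followed by `nll_loss` on `logp[labeled_nodes]` is ordinary cross-entropy restricted to nodes 0 and 33. A small PyTorch sketch with random placeholder logits standing in for `net(G, inputs)` shows that only those two rows receive a gradient directly from the loss; in the real model the other nodes still change, because they share the GCN weights and receive messages from the labeled nodes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(34, 2, requires_grad=True)  # placeholder for net(G, inputs)
labeled_nodes = torch.tensor([0, 33])
labels = torch.tensor([0, 1])

logp = F.log_softmax(logits, dim=1)
loss = F.nll_loss(logp[labeled_nodes], labels)
loss.backward()

# Rows with a nonzero gradient: only the labeled nodes
nonzero_rows = (logits.grad.abs().sum(dim=1) > 0).nonzero().flatten()
print(nonzero_rows.tolist())  # [0, 33]
# log_softmax + nll_loss is the same as cross_entropy on the raw logits
print(torch.allclose(loss, F.cross_entropy(logits[labeled_nodes], labels)))  # True
```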

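For visualization, we add a draw function inside train and use animation.FuncAnimation to produce an animated figure.

Because the model outputs a 2-dimensional feature for each node, we can visualize the outputs by plotting them directly in 2D space. The code below animates the training process, from the initial guess (where the nodes are not classified correctly at all) to the final result (where the nodes are linearly separable).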

    # Continues inside train(), after the training loop
    def draw(i):
        cls1color = '#00FFFF'
        cls2color = '#FF00FF'
        pos = {}
        colors = []
        for v in range(34):
            # Use node v's 2-D logits at epoch i as its position
            pos[v] = all_logits[i][v].numpy()
            cls = pos[v].argmax()  # predicted class: 0 or 1
            colors.append(cls1color if cls else cls2color)
        ax.cla()
        ax.axis('off')
        ax.set_title('Epoch: %d' % i)
        nx.draw_networkx(nx_G, pos, node_color=colors,
                         with_labels=True, node_size=300, ax=ax)
    nx_G = G.to_networkx().to_undirected()
    fig = plt.figure(dpi=150)
    fig.clf()
    ax = fig.subplots()
    # Step through the epochs interactively
    for i in range(len(all_logits)):
        draw(i)
        plt.pause(0.2)
    # Save the same frames as a GIF; the 'pillow' writer needs no ImageMagick install
    ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)
    ani.save('change1.gif', writer='pillow', fps=10)
    plt.show()
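Saving such an animation only needs matplotlib itself when the Pillow-based writer is used. A minimal sketch that also runs without a display, plotting each epoch's raw 2-D logits as a scatter plot (without edges) and saving them with `animation.PillowWriter`; the function name `save_logit_animation` is hypothetical:

```python
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np


def save_logit_animation(all_logits, path, fps=5):
    """Hypothetical helper: animate per-epoch (num_nodes, 2) logits as scatter plots and save a GIF."""
    fig, ax = plt.subplots(dpi=100)

    def draw(i):
        pts = np.asarray(all_logits[i])
        cls = pts.argmax(axis=1)  # predicted community of each node at epoch i
        ax.cla()
        ax.set_title("Epoch: %d" % i)
        ax.scatter(pts[:, 0], pts[:, 1],
                   c=["#00FFFF" if k == 1 else "#FF00FF" for k in cls])

    ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=1000 / fps)
    # PillowWriter relies on Pillow, which matplotlib already depends on
    ani.save(path, writer=animation.PillowWriter(fps=fps))
    plt.close(fig)
```

After training it could be called as `save_logit_animation([l.numpy() for l in all_logits], 'change1.gif')`.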
