开发者社区> 问答> 正文

网络分析



目录



网络分析栏提供的都是基于Graph数据结构的分析算法;下图是使用平台网络分析组件构建的一个分析流程实例:


网络分析栏的算法组件都需要设置运行参数,参数说明如下:进程数:参数代号workerNum,用于设置作业并行执行的节点数;数字越大并行度越高,但框架通讯开销会增大。进程内存:参数代号workerMem,用于设置单个 worker可使用的最大内存量,默认每个worker分配4096内存;实际使用内存超过该值,会抛出OutOfMemory异常。


k-Core



功能介绍

  • 一个图的KCore是指反复去除度小于或等于k的节点后,所剩余的子图。若一个节点存在于KCore,而在(K+1)CORE中被移去,那么此节点的核数(coreness)为k。因此所有度为1的节点的核数必然为0,节点核数的最大值被称为图的核数。


参数设置


k:核数的值,必填,默认3


实例



测试数据


新建数据SQL
<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. drop table if exists KCore_func_test_edge;
  2. create table KCore_func_test_edge as
  3. select * from
  4. (
  5.   select '1' as flow_out_id,'2' as flow_in_id from dual
  6.   union all
  7.   select '1' as flow_out_id,'3' as flow_in_id from dual
  8.   union all
  9.   select '1' as flow_out_id,'4' as flow_in_id from dual
  10.   union all
  11.   select '2' as flow_out_id,'3' as flow_in_id from dual
  12.   union all
  13.   select '2' as flow_out_id,'4' as flow_in_id from dual
  14.   union all
  15.   select '3' as flow_out_id,'4' as flow_in_id from dual
  16.   union all
  17.   select '3' as flow_out_id,'5' as flow_in_id from dual
  18.   union all
  19.   select '3' as flow_out_id,'6' as flow_in_id from dual
  20.   union all
  21.   select '5' as flow_out_id,'6' as flow_in_id from dual
  22. )tmp;

数据对应的graph结构如下图:


运行结果


设定k = 2:运行结果:结果如下:<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. +-------+-------+
  2. | node1 | node2 |
  3. +-------+-------+
  4. | 1     | 2     |
  5. | 1     | 3     |
  6. | 1     | 4     |
  7. | 2     | 1     |
  8. | 2     | 3     |
  9. | 2     | 4     |
  10. | 3     | 1     |
  11. | 3     | 2     |
  12. | 3     | 4     |
  13. | 4     | 1     |
  14. | 4     | 2     |
  15. | 4     | 3     |
  16. +-------+-------+


pai命令示例

<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. pai -name KCore
  2.     -project algo_public
  3.     -DinputEdgeTableName=KCore_func_test_edge
  4.     -DfromVertexCol=flow_out_id
  5.     -DtoVertexCol=flow_in_id
  6.     -DoutputTableName=KCore_func_test_result
  7.     -Dk=2;


算法参数


参数key名称参数描述必/选填默认值
inputEdgeTableName输入边表名必填-
inputEdgeTablePartitions输入边表的分区选填全表读入
fromVertexCol边表中起点所在列必填-
toVertexCol边表中终点所在列必填-
outputTableName输出表名必填-
outputTablePartitions输出表的分区选填-
lifecycle输出表申明周期选填-
workerNum进程数量选填未设置
workerMem进程内存选填4096
splitSize数据切分大小选填64
k核数必填3


单源最短路径



功能介绍

  • 单源最短路径参考Dijkstra算法,本算法中当给定起点,则输出该点和其他所有节点的最短路径。


参数设置


起始节点id:用于计算最短路径的起始节点,必填


实例



测试数据


新建数据的SQL语句:
<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. drop table if exists SSSP_func_test_edge;
  2. create table SSSP_func_test_edge as
  3. select
  4.     flow_out_id,flow_in_id,edge_weight
  5. from
  6. (
  7.     select "a" as flow_out_id,"b" as flow_in_id,1.0 as edge_weight from dual
  8.     union all
  9.     select "b" as flow_out_id,"c" as flow_in_id,2.0 as edge_weight from dual
  10.     union all
  11.     select "c" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
  12.     union all
  13.     select "b" as flow_out_id,"e" as flow_in_id,2.0 as edge_weight from dual
  14.     union all
  15.     select "e" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
  16.     union all
  17.     select "c" as flow_out_id,"e" as flow_in_id,1.0 as edge_weight from dual
  18.     union all
  19.     select "f" as flow_out_id,"g" as flow_in_id,3.0 as edge_weight from dual
  20.     union all
  21.     select "a" as flow_out_id,"d" as flow_in_id,4.0 as edge_weight from dual
  22. ) tmp
  23. ;

数据对应的graph结构:

运行结果

<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. 结果如下:
  2. +------------+------------+------------+--------------+
  3. | start_node | dest_node  | distance   | distance_cnt |
  4. +------------+------------+------------+--------------+
  5. | a          | b          | 1.0        | 1            |
  6. | a          | c          | 3.0        | 1            |
  7. | a          | d          | 4.0        | 3            |
  8. | a          | a          | 0.0        | 0            |
  9. | a          | e          | 3.0        | 1            |
  10. +------------+------------+------------+--------------+


pai命令示例

<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. pai -name SSSP
  2.     -project algo_public
  3.     -DinputEdgeTableName=SSSP_func_test_edge
  4.     -DfromVertexCol=flow_out_id
  5.     -DtoVertexCol=flow_in_id
  6.     -DoutputTableName=SSSP_func_test_result
  7.     -DhasEdgeWeight=true
  8.     -DedgeWeightCol=edge_weight
  9.     -DstartVertex=a;


算法参数


参数key名称参数描述必/选填默认值
inputEdgeTableName输入边表名必填-
inputEdgeTablePartitions输入边表的分区选填全表读入
fromVertexCol输入边表的起点所在列必填-
toVertexCol输入边表的终点所在列必填-
outputTableName输出表名必填-
outputTablePartitions输出表的分区选填-
lifecycle输出表申明周期选填-
workerNum进程数量选填未设置
workerMem进程内存选填4096
splitSize数据切分大小选填64
startVertex起始节点ID必填-
hasEdgeWeight输入边表的边是否有权重选填false
edgeWeightCol输入边表边的权重所在列选填-


PageRank



功能介绍

  • PageRank起于网页的搜索排序,google利用网页的链接结构计算每个网页的等级排名,其基本思路是:如果一个网页被其他多个网页指向,这说明该网页比较重要或者质量较高。除考虑网页的链接数量,还考虑网页本身的权重级别,以及该网页有多少条出链到其它网页。 对于用户构成的人际网络,除了用户本身的影响力之外,边的权重也是重要因素之一。例如:新浪微博的某个用户,会更容易影响粉丝中关系比较亲密的家人、同学、同事等,而对陌生的弱关系粉丝影响较小。在人际网络中,边的权重等价为用户-用户的关系强弱指数。带连接权重的PageRank公式为:其中,w(i)为节点i的权重,c(A,i)为链接权重,d为阻尼系数,算法迭代稳定后的节点权重W即为每个用户的影响力指数。


参数设置


最大迭代次数:算法自身会收敛并停止迭代,选填,默认30


实例



测试数据


新建数据的SQL语句:<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. drop table if exists PageRankWithWeight_func_test_edge;
  2. create table PageRankWithWeight_func_test_edge as
  3. select * from
  4. (
  5.     select 'a' as flow_out_id,'b' as flow_in_id,1.0 as weight from dual
  6.     union all
  7.     select 'a' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
  8.     union all
  9.     select 'b' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
  10.     union all
  11.     select 'b' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
  12.     union all
  13.     select 'c' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
  14. )tmp
  15. ;

对应的graph结构:

运行结果

<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. 结果如下:
  2. +------+------------+
  3. | node | weight     |
  4. +------+------------+
  5. | a    | 0.0375     |
  6. | b    | 0.06938    |
  7. | c    | 0.12834    |
  8. | d    | 0.20556    |
  9. +------+------------+


pai命令示例

<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
  1. pai -name PageRankWithWeight
  2.     -project algo_public
  3.     -DinputEdgeTableName=PageRankWithWeight_func_test_edge
  4.     -DfromVertexCol=flow_out_id
  5.     -DtoVertexCol=flow_in_id
  6.     -DoutputTableName=PageRankWithWeight_func_test_result
  7.     -DhasEdgeWeight=true
  8.     -DedgeWeightCol=weight
  9.     -DmaxIter 100;


算法参数


参数key名称参数描述必/选填默认值
inputEdgeTableName输入边表名必填-
inputEdgeTablePartitions输入边表的分区选填全表读入
fromVertexCol输入边表的起点所在列必填-
toVertexCol输入边表的终点所在列必填-
outputTableName输出表名必填-
outputTablePartitions输出表的分区选填-
lifecycle输出表申明周期选填-
workerNum进程数量选填未设置
workerMem进程内存选填4096
splitSize数据切分大小选填64
hasEdgeWeight输入边表的边是否有权重选填false
edgeWeightCol输入边表边的权重所在列选填-
maxIter最大迭代次数选填30


展开
收起
nicenelly 2017-10-25 10:56:41 1960 0
0 条回答
写回答
取消 提交回答
问答排行榜
最热
最新

相关电子书

更多
关系网络分析(I+) 立即下载
云网分析与可视化——发掘网络数据的真正价值 立即下载
云网分析与可视化-发掘网络数据的真正价值 立即下载