目录
网络分析栏提供的都是基于Graph数据结构的分析算法;下图是使用平台网络分析组件构建的一个分析流程实例:
网络分析栏的算法组件都需要设置运行参数,参数说明如下:进程数:参数代号workerNum,用于设置作业并行执行的节点数;数字越大并行度越高,但框架通讯开销会增大。进程内存:参数代号workerMem,用于设置单个 worker可使用的最大内存量,默认每个worker分配4096内存;实际使用内存超过该值,会抛出OutOfMemory异常。
k-Core
功能介绍
- 一个图的KCore是指反复去除度小于或等于k的节点后,所剩余的子图。若一个节点存在于KCore,而在(K+1)CORE中被移去,那么此节点的核数(coreness)为k。因此所有度为1的节点的核数必然为0,节点核数的最大值被称为图的核数。
参数设置
k:核数的值,必填,默认3
实例
测试数据
新建数据SQL
<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- drop table if exists KCore_func_test_edge;
- create table KCore_func_test_edge as
- select * from
- (
- select '1' as flow_out_id,'2' as flow_in_id from dual
- union all
- select '1' as flow_out_id,'3' as flow_in_id from dual
- union all
- select '1' as flow_out_id,'4' as flow_in_id from dual
- union all
- select '2' as flow_out_id,'3' as flow_in_id from dual
- union all
- select '2' as flow_out_id,'4' as flow_in_id from dual
- union all
- select '3' as flow_out_id,'4' as flow_in_id from dual
- union all
- select '3' as flow_out_id,'5' as flow_in_id from dual
- union all
- select '3' as flow_out_id,'6' as flow_in_id from dual
- union all
- select '5' as flow_out_id,'6' as flow_in_id from dual
- )tmp;
数据对应的graph结构如下图:
运行结果
设定k = 2:运行结果:结果如下:<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- +-------+-------+
- | node1 | node2 |
- +-------+-------+
- | 1 | 2 |
- | 1 | 3 |
- | 1 | 4 |
- | 2 | 1 |
- | 2 | 3 |
- | 2 | 4 |
- | 3 | 1 |
- | 3 | 2 |
- | 3 | 4 |
- | 4 | 1 |
- | 4 | 2 |
- | 4 | 3 |
- +-------+-------+
pai命令示例
<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- pai -name KCore
- -project algo_public
- -DinputEdgeTableName=KCore_func_test_edge
- -DfromVertexCol=flow_out_id
- -DtoVertexCol=flow_in_id
- -DoutputTableName=KCore_func_test_result
- -Dk=2;
算法参数
单源最短路径
功能介绍
- 单源最短路径参考Dijkstra算法,本算法中当给定起点,则输出该点和其他所有节点的最短路径。
参数设置
起始节点id:用于计算最短路径的起始节点,必填
实例
测试数据
新建数据的SQL语句:
<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- drop table if exists SSSP_func_test_edge;
- create table SSSP_func_test_edge as
- select
- flow_out_id,flow_in_id,edge_weight
- from
- (
- select "a" as flow_out_id,"b" as flow_in_id,1.0 as edge_weight from dual
- union all
- select "b" as flow_out_id,"c" as flow_in_id,2.0 as edge_weight from dual
- union all
- select "c" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
- union all
- select "b" as flow_out_id,"e" as flow_in_id,2.0 as edge_weight from dual
- union all
- select "e" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
- union all
- select "c" as flow_out_id,"e" as flow_in_id,1.0 as edge_weight from dual
- union all
- select "f" as flow_out_id,"g" as flow_in_id,3.0 as edge_weight from dual
- union all
- select "a" as flow_out_id,"d" as flow_in_id,4.0 as edge_weight from dual
- ) tmp
- ;
数据对应的graph结构:
运行结果
<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- 结果如下:
- +------------+------------+------------+--------------+
- | start_node | dest_node | distance | distance_cnt |
- +------------+------------+------------+--------------+
- | a | b | 1.0 | 1 |
- | a | c | 3.0 | 1 |
- | a | d | 4.0 | 3 |
- | a | a | 0.0 | 0 |
- | a | e | 3.0 | 1 |
- +------------+------------+------------+--------------+
pai命令示例
<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- pai -name SSSP
- -project algo_public
- -DinputEdgeTableName=SSSP_func_test_edge
- -DfromVertexCol=flow_out_id
- -DtoVertexCol=flow_in_id
- -DoutputTableName=SSSP_func_test_result
- -DhasEdgeWeight=true
- -DedgeWeightCol=edge_weight
- -DstartVertex=a;
算法参数
PageRank
功能介绍
- PageRank起于网页的搜索排序,google利用网页的链接结构计算每个网页的等级排名,其基本思路是:如果一个网页被其他多个网页指向,这说明该网页比较重要或者质量较高。除考虑网页的链接数量,还考虑网页本身的权重级别,以及该网页有多少条出链到其它网页。 对于用户构成的人际网络,除了用户本身的影响力之外,边的权重也是重要因素之一。例如:新浪微博的某个用户,会更容易影响粉丝中关系比较亲密的家人、同学、同事等,而对陌生的弱关系粉丝影响较小。在人际网络中,边的权重等价为用户-用户的关系强弱指数。带连接权重的PageRank公式为:其中,w(i)为节点i的权重,c(A,i)为链接权重,d为阻尼系数,算法迭代稳定后的节点权重W即为每个用户的影响力指数。
参数设置
最大迭代次数:算法自身会收敛并停止迭代,选填,默认30
实例
测试数据
新建数据的SQL语句:<divre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- drop table if exists PageRankWithWeight_func_test_edge;
- create table PageRankWithWeight_func_test_edge as
- select * from
- (
- select 'a' as flow_out_id,'b' as flow_in_id,1.0 as weight from dual
- union all
- select 'a' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
- union all
- select 'b' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
- union all
- select 'b' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
- union all
- select 'c' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
- )tmp
- ;
对应的graph结构:
运行结果
<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- 结果如下:
- +------+------------+
- | node | weight |
- +------+------------+
- | a | 0.0375 |
- | b | 0.06938 |
- | c | 0.12834 |
- | d | 0.20556 |
- +------+------------+
pai命令示例
<pre style='background: rgb(246, 246, 246); font: 12px/1.6 "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin: 0px 0px 16px; padding: 10px; outline: 0px; border-radius: 3px; border: 1px solid rgb(221, 221, 221); color: rgb(51, 51, 51); text-transform: none; text-indent: 0px; letter-spacing: normal; overflow: auto; word-spacing: 0px; white-space: pre-wrap; word-wrap: break-word; box-sizing: border-box; orphans: 2; widows: 2; font-size-adjust: none; font-stretch: normal; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;' prettyprinted?="" linenums="">
- pai -name PageRankWithWeight
- -project algo_public
- -DinputEdgeTableName=PageRankWithWeight_func_test_edge
- -DfromVertexCol=flow_out_id
- -DtoVertexCol=flow_in_id
- -DoutputTableName=PageRankWithWeight_func_test_result
- -DhasEdgeWeight=true
- -DedgeWeightCol=weight
- -DmaxIter 100;
算法参数