Interpreting and Reproducing 数据森麟's "Internal Social Network of the 108 Heroes of 'Liangshan Marsh' Internet Co., Ltd."


STEP1: Analyzing the original code

With Python in place, install the two required packages. The install commands are:

pip install pandas
pip install pyecharts


The PART1 code is as follows:

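with open("水浒传全文.txt", encoding='gb18030') as file:
    shuihu = file.read()  # read the whole novel into one string
shuihu = shuihu.replace('\n','')
# split on spaces to get paragraphs, then drop empty pieces
shuihu_set = shuihu.split(' ')
shuihu_set=[k for k in shuihu_set if k!='']
songjiang_set=[k for k in shuihu_set if '宋江' in k]
# haohan is the heroes DataFrame; its loading is not shown in this excerpt
haohan['出场段落']=0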
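
As a minimal sketch of what the PART1 code above does, the snippet below strips newlines, splits the text on spaces into paragraphs, drops empty pieces, and keeps the paragraphs that mention one name. The sample string and the helper names `split_paragraphs` and `paragraphs_mentioning` are illustrative assumptions and do not appear in the original code.

```python
def split_paragraphs(text):
    """Strip newlines, split on spaces, and drop empty pieces."""
    text = text.replace('\n', '')
    return [p for p in text.split(' ') if p != '']


def paragraphs_mentioning(paragraphs, name):
    """Return the paragraphs that contain the given name as a substring."""
    return [p for p in paragraphs if name in p]


# Hypothetical sample text; the real input is the full novel file.
sample = "宋江见了武松。  武松\n打虎。 林冲夜奔。"
paragraphs = split_paragraphs(sample)
songjiang_set = paragraphs_mentioning(paragraphs, '宋江')
print(len(paragraphs), len(songjiang_set))
```

Matching is by plain substring, so a name that happens to be contained in another name or word would be counted too.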




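PART2 uses the Bar module from the pyecharts package to present the data. This part is fairly easy to follow and is not my strong suit, so I skip it here and come back to it later.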

The PART3 code is as follows:

net_df = pd.DataFrame(columns=['Source','Target','Weight','Source_Ratio','Target_Ratio'])
for i in range(0,107):
    for j in range(i+1,108):
        this_weight = len([k for k in shuihu_set if haohan['使用名'][i] in k and haohan['使用名'][j] in k])
        net_df=net_df.append({'Source':haohan['姓名'][i],'Target':haohan['姓名'][j],
                              'Weight':this_weight,
                              'Source_Ratio':this_weight/haohan['出场段落'][i],
                              'Target_Ratio':this_weight/haohan['出场段落'][j]},
                             ignore_index=True)
        print(str(i)+':'+str(j))
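
The double loop above visits every unordered pair of heroes once and counts the paragraphs in which both names appear. A pandas-free sketch of the same idea follows. The toy paragraphs, the function name `cooccurrence_edges`, and the zero-division guard are assumptions added for illustration and are not in the original.

```python
from itertools import combinations


def cooccurrence_edges(paragraphs, names):
    """Build one edge per unordered pair of names, weighted by shared paragraphs."""
    # Number of paragraphs mentioning each name (the denominator of the ratios).
    appearances = {n: sum(1 for p in paragraphs if n in p) for n in names}
    edges = []
    for source, target in combinations(names, 2):
        weight = sum(1 for p in paragraphs if source in p and target in p)
        edges.append({
            'Source': source,
            'Target': target,
            'Weight': weight,
            # Assumed guard: a name that never appears gets a ratio of 0.0.
            'Source_Ratio': weight / appearances[source] if appearances[source] else 0.0,
            'Target_Ratio': weight / appearances[target] if appearances[target] else 0.0,
        })
    return edges


# Hypothetical toy data standing in for shuihu_set and the name column.
paragraphs = ['宋江见了武松。', '武松打虎。', '宋江与林冲饮酒。', '宋江、武松同行。']
edges = cooccurrence_edges(paragraphs, ['宋江', '武松', '林冲'])
for edge in edges:
    print(edge)
```

Collecting the rows in a list and building the table once, for example with `pd.DataFrame(edges)`, also avoids `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0.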


That's all? Is that really it?

STEP2: Showing my own code

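with open(u'水浒传.txt','r') as f:
    # the lines that read the file and build shuihu_set are missing from this excerpt
    f.close()  # redundant: the with block already closes the file
# for each hero, count the paragraphs that contain the name
haohan['出场段落'] = haohan.apply(lambda x:len([k for k in shuihu_set if x[u'姓名'] in k]), axis = 1)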
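
The `apply` call above computes, for each row, how many paragraphs contain that row's name. The same counting step can be sketched without pandas as below. The names, paragraphs, and the helper `count_appearances` are made-up stand-ins for illustration.

```python
def count_appearances(names, paragraphs):
    """Map each name to the number of paragraphs that contain it."""
    return {name: sum(1 for p in paragraphs if name in p) for name in names}


# Hypothetical stand-ins for the name column and shuihu_set.
paragraphs = ['宋江见了武松。', '武松打虎。', '宋江与林冲饮酒。']
counts = count_appearances(['宋江', '武松', '李逵'], paragraphs)

# Rank by count, highest first; ties keep their input order.
ranking = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```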


STEP3: Reuse the original analysis code to generate the results.

haohan.sort_values('出场段落',ascending=False,inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入TOP10")
bar.add('出场段落', attr, v1)  # assumed: the add call is missing here, and the series name is a guess
bar.render('水泊梁山年收入TOP10.html')

haohan.sort_values('出场段落',ascending=True,inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入BOTTOM10")
bar.add('出场段落', attr, v1)  # assumed, as above
bar.render('水泊梁山年收入BOTTOM10.html')


STEP4: Generate the data and analyze the social network

The exported data has these columns: 'Source','Target','Weight','Source_Ratio','Target_Ratio'

net_df.to_csv('sjwl.csv',index=False,sep=',')
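
One simple thing to do with an edge list in this column layout is to sum the weights per hero (weighted degree) to see who is most connected. The sketch below is an assumption about how such an analysis could start, not a step taken from the original article. The inline sample data and the function name `weighted_degree` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def weighted_degree(rows):
    """Sum edge weights per node, treating the graph as undirected."""
    degree = defaultdict(float)
    for row in rows:
        weight = float(row['Weight'])
        degree[row['Source']] += weight
        degree[row['Target']] += weight
    return dict(degree)


# Hypothetical sample using the same columns as sjwl.csv.
sample_csv = """Source,Target,Weight,Source_Ratio,Target_Ratio
宋江,武松,2,0.5,0.5
宋江,林冲,1,0.25,1.0
武松,林冲,0,0.0,0.0
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
degree = weighted_degree(rows)
print(sorted(degree.items(), key=lambda kv: kv[1], reverse=True))
```

To run it on the real file, replace the `io.StringIO` object with the opened `sjwl.csv`.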


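import pandas as pd
from pyecharts import Bar
import sys
reload(sys)  # Python 2 only: setdefaultencoding is hidden until sys is reloaded
sys.setdefaultencoding('utf-8')

with open(u'水浒传.txt','r') as f:
    # the lines that read the file and build shuihu_set are missing from this excerpt
    f.close()  # redundant: the with block already closes the file
# for each hero, count the paragraphs that contain the name
haohan['出场段落'] = haohan.apply(lambda x:len([k for k in shuihu_set if x[u'姓名'] in k]), axis = 1)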

haohan.to_csv('haohan.csv',index=False,sep=',')

haohan.sort_values('出场段落',ascending=False,inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入TOP10")
bar.add('出场段落', attr, v1)  # assumed: the add call is missing here, and the series name is a guess
bar.render(u'水泊梁山年收入TOP10.html')

haohan.sort_values('出场段落',ascending=True,inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入BOTTOM10")
bar.add('出场段落', attr, v1)  # assumed, as above
bar.render(u'水泊梁山年收入BOTTOM10.html')

net_df = pd.DataFrame(columns=['Source','Target','Weight','Source_Ratio','Target_Ratio'])
for i in range(0,107):
    for j in range(i+1,108):
        this_weight = len([k for k in shuihu_set if haohan['姓名'][i] in k and haohan['姓名'][j] in k])
        net_df=net_df.append({'Source':haohan['姓名'][i],'Target':haohan['姓名'][j],
                              'Weight':this_weight,
                              'Source_Ratio':this_weight/haohan['出场段落'][i],
                              'Target_Ratio':this_weight/haohan['出场段落'][j]},
                             ignore_index=True)
        print(str(i)+':'+str(j))

net_df.to_csv('sjwl.csv',index=False,sep=',')

