Weibo (App) Ranking Crawler and Data Visualization

2018-08-14

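Preface

This post continues the app-crawling series. This time the target is the Weibo ranking (the 24-hour chart). The fields collected are:

User ID
User location
User gender
User follower count
Post text
Post time
Repost, comment, and like counts

The post is organized as follows:

Crawler code
User analysis
Weibo post analysis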

Crawler code

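import csv
import json
import re
import time

import requests

# Request headers that mimic the Weibo iPhone client.
headers = {
    'Host': 'api.weibo.cn',
    'Connection': 'keep-alive',
    'User-Agent': 'Weibo/29278 (iPhone; iOS 11.4.1; Scale/2.00)'
}

# Output file; newline='' keeps the csv module from writing blank rows on Windows.
f = open('1.csv', 'w', encoding='utf-8', newline='')
writer = csv.writer(f)
# Note: the user_id column holds the account's display name (see get_info).
writer.writerow(['user_id','user_location','user_gender','user_follower','text','created_time','reposts_count','comments_count','attitudes_count'])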

def get_info(url):
    """Fetch one page of the ranking and append each post to the CSV."""
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
    print(url)
    # Capture each "mblog" object up to its "weibo_position" key.
    datas = re.findall('"mblog":(.*?),"weibo_position"', res.text, re.S)
    for data in datas:
        # The match cuts the object off before its closing brace, so add it back.
        json_data = json.loads(data + '}')
        user_id = json_data['user']['name']  # display name, used here as the identifier
        user_location = json_data['user']['location']
        user_gender = json_data['user']['gender']
        user_follower = json_data['user']['followers_count']
        text = json_data['text']
        created_time = json_data['created_at']
        reposts_count = json_data['reposts_count']
        comments_count = json_data['comments_count']
        attitudes_count = json_data['attitudes_count']  # likes
        print(user_id,user_location,user_gender,user_follower,text,created_time,reposts_count,comments_count,attitudes_count)
        writer.writerow([user_id,user_location,user_gender,user_follower,text,created_time,reposts_count,comments_count,attitudes_count])
    time.sleep(5)  # pause between pages to avoid hammering the API

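if __name__ == '__main__':
    # Pages 1-15 of the ranking. The gsid, aid and s values in the URL appear to be
    # session-specific parameters captured from the app; replace them with your own.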
    urls = ['https://api.weibo.cn/2/cardlist?gsid=_2A252dh7LDeRxGeNM41oV-S_MzDSIHXVTIhUDrDV6PUJbkdANLVTwkWpNSf8_0j6hqTyDS0clYi-pzwDc2Kd8oj_d&wm=3333_2001&i=b9f7194&b=0&from=1088193010&c=iphone&networktype=wifi&v_p=63&skin=default&v_f=1&s=ef8eeeee&lang=zh_CN&sflag=1&ua=iPhone8,1__weibo__8.8.1__iphone__os11.4.1&ft=11&aid=01AuxGxLabPA7Vzz8ZXBUpkeJqWbJ1woycR3lFBdLhoxgQC1I.&moduleID=pagecard&scenes=0&uicode=10000327&luicode=10000010&count=20&extparam=discover&containerid=102803_ctg1_8999_-_ctg1_8999_home&fid=102803_ctg1_8999_-_ctg1_8999_home&lfid=231091&page={}'.format(str(i)) for i in range(1,16)]
    for url in urls:
        get_info(url)
    f.close()  # flush buffered rows to 1.csv
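
The regular expression above appears to cut each "mblog" object off at its "weibo_position" key and then patch the closing brace back, so it depends on the key order in the response. As a hedged alternative, assuming the response body is valid JSON, the whole document can be parsed once and searched for "mblog" objects. The helper name iter_mblogs and the sample payload below are illustrative assumptions, not the real API layout.

```python
import json


def iter_mblogs(node):
    """Yield every dict stored under an "mblog" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "mblog" and isinstance(value, dict):
                yield value
            else:
                yield from iter_mblogs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_mblogs(item)


# Hypothetical, heavily trimmed payload used only to exercise the function.
sample = json.loads('''
{"cards": [
  {"mblog": {"text": "hello", "user": {"name": "a"}, "weibo_position": 1}},
  {"card_group": [{"mblog": {"text": "world", "user": {"name": "b"}}}]}
]}
''')

texts = [m["text"] for m in iter_mblogs(sample)]
print(texts)
```

Inside get_info, the same idea would be to loop over iter_mblogs(res.json()) instead of the regex matches; each yielded dict can then be read exactly like json_data.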

User analysis

First, visualize a sample of the user IDs. The names in a larger font appeared on the ranking twice (the maximum in this sample).
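
The post does not show how the appearance counts were computed. A minimal sketch, using only the standard library and hypothetical names in place of the user_id column of 1.csv, could count them like this:

```python
from collections import Counter

# Hypothetical values standing in for the user_id column of 1.csv.
names = ["user_a", "user_b", "user_a", "user_c"]

# Number of times each user appears on the ranking.
freq = Counter(names)

# Users that made the ranking more than once.
repeat = {name: n for name, n in freq.items() if n > 1}

print(freq.most_common(3))
print(repeat)
```

A name-to-count mapping like freq is also the kind of input that the wordcloud package's generate_from_frequencies method accepts, which would size each name by its count.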

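Next, clean up the location field and count users per region. Beijing has the most users (the big-V accounts, that is, verified influencers, are concentrated in Beijing).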

df['location'] = df['user_location'].str.split(' ').str[0]
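
The line above keeps only the part of user_location before the first space. A pandas-free sketch of the same split followed by a count is shown here; the sample locations and the helper name first_part are assumptions for illustration.

```python
from collections import Counter


def first_part(location):
    """Return the text before the first space, e.g. '北京 朝阳区' -> '北京'."""
    return location.split(' ')[0]


# Hypothetical values standing in for the user_location column.
locations = ["北京 朝阳区", "北京", "广东 广州", "海外 美国", "北京 海淀区"]

counts = Counter(first_part(loc) for loc in locations)
print(counts.most_common())
```

With a DataFrame, df['location'].value_counts() would give the same tally.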

Next, the gender ratio: male users are the majority.
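
The ratio can be computed as a share per gender value. The sketch below assumes the gender field holds short codes such as "m" and "f"; that encoding and the sample values are assumptions, so check the actual contents of the user_gender column first.

```python
from collections import Counter

# Hypothetical values standing in for the user_gender column.
genders = ["m", "f", "m", "m", "f"]

counts = Counter(genders)
total = sum(counts.values())

# Share of each gender value, rounded to two decimals.
shares = {g: round(n / total, 2) for g, n in counts.items()}
print(shares)
```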

Finally, the top ten ranked big-V accounts by follower count:
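
Because some users appear on the ranking twice, a top-ten list needs one row per user. A minimal sketch under that assumption, with hypothetical (name, follower count) pairs, keeps the largest follower count seen for each user and then sorts:

```python
# Hypothetical (user_id, user_follower) pairs; user_a appears twice.
rows = [("user_a", 120), ("user_b", 300), ("user_a", 125), ("user_c", 50)]

# Keep one entry per user, using the largest follower count observed.
followers = {}
for name, count in rows:
    followers[name] = max(count, followers.get(name, 0))

# Sort by follower count, descending, and keep at most ten users.
top = sorted(followers.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)
```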

Weibo post analysis

First, process the timestamp data and extract the hour of day at which each post was published.
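
The post does not show the timestamp format. The sketch below assumes created_at looks like "Tue Aug 14 09:15:02 +0800 2018"; if the stored values differ, the format string FMT has to change accordingly. The helper name hour_of and the sample timestamps are illustrative.

```python
from collections import Counter
from datetime import datetime

# Assumed layout of the created_at field; adjust if the real data differs.
FMT = "%a %b %d %H:%M:%S %z %Y"


def hour_of(created_at):
    """Parse a timestamp string and return its hour (0-23)."""
    return datetime.strptime(created_at, FMT).hour


# Hypothetical values standing in for the created_time column.
stamps = [
    "Tue Aug 14 09:15:02 +0800 2018",
    "Tue Aug 14 09:48:10 +0800 2018",
    "Mon Aug 13 22:05:33 +0800 2018",
]

per_hour = Counter(hour_of(s) for s in stamps)
print(sorted(per_hour.items()))
```

The hour is taken in the timestamp's own UTC offset, so no time zone conversion is applied.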

Next, look at the top ten users by the number of likes on their posts.
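
One way to get this list straight from the CSV is sketched below. It ranks individual posts by attitudes_count rather than summing likes per user, which is an assumption about what the original chart shows. The in-memory sample stands in for 1.csv and contains only the two columns needed.

```python
import csv
import heapq
import io

# Hypothetical excerpt standing in for 1.csv.
sample_csv = io.StringIO(
    "user_id,attitudes_count\n"
    "user_a,5200\n"
    "user_b,910\n"
    "user_c,13400\n"
)

rows = list(csv.DictReader(sample_csv))

# csv yields strings, so convert the like count to int before comparing.
top = heapq.nlargest(10, rows, key=lambda r: int(r["attitudes_count"]))
top_names = [r["user_id"] for r in top]
print(top_names)
```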

Finally, draw a word cloud of the post text.
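
The post does not say how the text was prepared or which tools drew the cloud. As a hedged sketch of a typical preprocessing step, the function below strips HTML tags, links and @mentions, which post text may contain. The function name clean and the patterns are assumptions, not the author's method.

```python
import re

TAG = re.compile(r"<[^>]+>")        # HTML tags such as <a href="...">
URL = re.compile(r"https?://\S+")   # http and https links
MENTION = re.compile(r"@[\w-]+")    # @username mentions


def clean(text):
    """Remove tags, links and mentions, then collapse runs of whitespace."""
    text = TAG.sub("", text)
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    return " ".join(text.split())


sample = 'Hello <a href="x">link</a> @some_user see http://t.cn/abc now'
print(clean(sample))
```

Note that \w also matches Chinese characters, so a mention with no space after it takes the following text with it. The cleaned text would then typically be segmented with a Chinese tokenizer such as jieba and passed to a word cloud library, with a font that supports Chinese glyphs.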
