This time, we use a Python crawler to scrape the daily money-flow data on Sina Finance. The target page is:
That listing runs to 228 pages.
The crawler code is as follows:
import requests
import json
import pandas as pd
import time

cookies = {
    'U_TRS1': '00000017.b5366def.5ea45f37.86172eca',
    'SINAGLOBAL': '123.182.239.181_1587830584.490634',
    'SCF': 'AkZTN949870BznlRFWgQ7ZHjP02Kx8MKsgY_bhNIMjeNwZNyD1F500JSpsQhh5ZbhGZtTolEKlGwySyFRCDF6Go.',
    'SGUID': '1596948484657_10983414',
    '_ga': 'GA1.3.998103930.1600149268',
    'FINA_V_S_2': 'sz000661,sh601216',
    '__gads': 'ID=53fc2f31a90cde06-2250dc821bc400fc:T=1602842357:RT=1602842357:S=ALNI_Ma73U07NdNtVxsUT_JZJpGpSqPH8A',
    'UOR': ',,',
    'Apache': '112.4.54.55_1625385453.469743',
    'MONEY-FINANCE-SINA-COM-CN-WEB5': '',
    'SFA_version': '2021-04-12%2009%3A00',
    'SUB': '_2A25N5Rm_DeRhGeVO6FoT9SfKyz-IHXVukwx3rDV_PUNbm9AKLXLBkW9NTWYxRkMtTWbvGtexXDZzvpie6uSM7Tj4',
    'SUBP': '0033WrSXqPxfM725Ws9jqgMF55529P9D9WhBSrc87.WA6LWkHooL27Ag5NHD95Q0eheReo-4So50Ws4Dqcjl9NH.qg4Q9PiaP0.cSoM7',
    'ALF': '1656921455',
    'U_TRS2': '00000037.d755cf.60e169f0.8eccbb9a',
    'hqEtagMode': '1',
    'rotatecount': '2',
    'ULV': '1625385478403:10:1:1:112.4.54.55_1625385453.469743:1621428546540',
    'SR_SEL': '1_511',
    'sinaH5EtagStatus': 'y',
}

headers = {
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'Content-type': 'application/x-www-form-urlencoded',
    'Accept': '*/*',
    'Referer': 'http://vip.stock.finance.sina.com.cn/moneyflow/',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
}

df = pd.DataFrame()
for i in range(1, 229):  # 228 pages, 20 records per page
    params = (
        ('page', i),
        ('num', '20'),
        ('sort', 'r0_net'),
        ('asc', '0'),
        ('bankuai', ''),
        ('shichang', ''),
    )
    response = requests.get(
        'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/MoneyFlow.ssl_bkzj_ssggzj',
        headers=headers, params=params, cookies=cookies, verify=False)
    rr = json.loads(response.text)  # the endpoint returns a JSON array of records
    df1 = pd.DataFrame(rr, columns=["symbol", "name", "trade", "changeratio", "turnover",
                                    "amount", "inamount", "outamount", "netamount",
                                    "ratioamount", "r0_in", "r0_out", "r0_net",
                                    "r3_in", "r3_out", "r3_net",
                                    "r0_ratio", "r3_ratio", "r0x_ratio"])
    df = pd.concat([df, df1])
    time.sleep(5)  # pause between requests to avoid hammering the server
df
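To see what the `pd.DataFrame(rr, columns=...)` step does, here is a minimal sketch with one fabricated record shaped like the API's JSON output (all field values are made up for illustration, not real scraped data):

```python
import pandas as pd

# One illustrative record; the real endpoint returns a JSON array of such dicts,
# one per stock. Values here are invented.
sample = [{"symbol": "sz000001", "name": "平安银行", "trade": "10.50",
           "r0_net": "1200000", "unused_field": "dropped"}]

cols = ["symbol", "name", "trade", "r0_net", "r3_net"]
df1 = pd.DataFrame(sample, columns=cols)
# Keys absent from the record (here r3_net) become NaN;
# keys not listed in `cols` (unused_field) are discarded.
```

The `columns` argument therefore both selects and orders the fields, which is why the scraper can pass the raw parsed JSON straight to the constructor.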
The result in a Jupyter notebook:
Save the result to Excel:
df.to_excel('新浪财经资金流向.xls', index=False)
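One caveat: recent pandas releases no longer write the legacy `.xls` format (it depended on the unmaintained `xlwt` package), so on a modern setup you would use `.xlsx` or CSV instead. A small sketch of the CSV route, with a tiny stand-in frame (values are illustrative, not scraped data):

```python
import io
import pandas as pd

# Stand-in for the scraped DataFrame; values are made up.
df = pd.DataFrame({"symbol": ["sz000661"], "name": ["示例"], "r0_net": [1234.5]})

buf = io.StringIO()
df.to_csv(buf, index=False)   # CSV needs no extra engine package
csv_text = buf.getvalue()
```

When writing to an actual file that will be opened in Excel, passing `encoding='utf-8-sig'` to `to_csv` keeps Chinese text displaying correctly on Windows.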
The result looks like this:
Going back to the website afterwards:
What? Access denied. The site caught our crawler. The lesson: scraping is handy, but throttle yourself and use it in moderation; don't bring the site down. Bye!
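The fixed `time.sleep(5)` in the loop above is one throttle; a gentler pattern when a site starts rejecting requests is exponential backoff. Here is a sketch of a hypothetical helper (not part of the original script) that a retry loop could call to decide how long to wait:

```python
import random

def backoff_delay(attempt, base=5.0, cap=60.0):
    """Delay before retry `attempt` (0-based): 5s, 10s, 20s, ..., capped at 60s,
    plus up to 1s of random jitter so parallel clients don't retry in lockstep."""
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, 1)
```

In the scraping loop one would retry a failed page a few times, sleeping `backoff_delay(attempt)` between tries, and give up (or stop entirely) if the block persists.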