Middle schools in Nanjing officially started the new term today, just one day before Teachers' Day. It seems the schedule still gives the kids a chance to wish their teachers well in person.

So today let's share a small web app: a screen full of scrolling praise (danmaku-style) dedicated to our hard-working teachers!
Flask Application Framework
As usual, we use Flask as the basic web framework; about five lines of code are all it takes. One thing to keep in mind: render_template looks for index2.html in a templates/ directory next to the script, which is Flask's default template location.
from flask import Flask
from flask import render_template

app = Flask(__name__)  # create the application instance used by the route below

@app.route('/')
def index():
    return render_template("index2.html")

if __name__ == '__main__':
    app.run(debug=True)
Next, let's write an HTML page with scrolling captions. Scrolling text is generally implemented with the marquee tag (long deprecated, but still supported by mainstream browsers).

Let's hard-code a few phrases first to see the basic effect.
<div class="content", id="datatext"> <marquee behavior="scroll">开学啦</marquee> <marquee behavior="alternate">教师节快乐!</marquee> <marquee direction="up">老师</marquee> <marquee direction="down">辛苦了</marquee> <marquee behavior="scroll">幸福不,哦no!</marquee> </div>
Then add some CSS, and the basic web page is done.
<style>
    marquee {
        font-weight: bolder;
        font-size: 40px;
        color: white;
    }
    .content {
        margin: 100px auto;
        width: 500px;
        height: 300px;
        background: url("https://imgconvert.csdnimg.cn/aHR0cHM6Ly91cGxvYWQtaW1hZ2VzLmppYW5zaHUuaW8vdXBsb2FkX2ltYWdlcy8yMDE5MDY0MS0zYWE5ZDExOWU3ZTVmODhhLmpwZw?x-oss-process=image/format,png");
        border-radius: 24px;
        position: relative;
    }
    ...
</style>
Run the program and open the page to see how it looks.

Good. Now let's fetch the praise data, once again from Zhihu.
Getting the Data
A previous article already walked through the process of scraping Zhihu topics in detail, so I won't repeat it here; straight to the code.

The code to fetch and save the answers:
import requests
import re
import os
import time


def get_zhihu():
    # fetch one page of answers from the Zhihu answers API
    zhihu_url = "https://www.zhihu.com/api/v4/questions/485491358/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Cpaid_info%2Cpaid_info_content%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_recognized%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cvip_info%2Cbadge%5B%2A%5D.topics%3Bdata%5B%2A%5D.settings.table_of_content.enabled&limit=20&offset=5&platform=desktop&sort_by=default"
    zhihu_header = {
        "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"}
    res = requests.get(zhihu_url, headers=zhihu_header)
    return res.json()


def filter_str(desstr, restr=''):
    # strip everything except Chinese characters and common punctuation
    res = re.compile("[^\u4e00-\u9fa5^,^,^.^。^【^】^(^)^(^)^“^”^-^!^!^?^?^]")
    return res.sub(restr, desstr)


def change_comma(datastr):
    # replace half-width commas with full-width ones so they don't break the CSV
    datastr = datastr.replace(',', ',')
    return datastr


def change_time(time_str):
    # convert a Unix timestamp into a readable datetime string
    timeStamp = time_str
    timeArray = time.localtime(timeStamp)
    otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
    return otherStyleTime


def save_answers(data):
    # write the header row on first run, then append one row per answer
    if not os.path.exists(r'teacher_data.csv'):
        with open(r"teacher_data.csv", "a+", encoding='utf-8') as f:
            f.write("用户,回答内容,创建时间,点赞数量,评论数量\n")
            for i in data["data"]:
                user = i["author"]["name"]
                content = change_comma(filter_str(i["content"]))
                created_time = change_time(i["created_time"])
                voteup_count = i["voteup_count"]
                comment_count = i["comment_count"]
                row = '{},{},{},{},{}'.format(user, content, created_time, voteup_count, comment_count)
                f.write(row)
                f.write('\n')
    else:
        with open(r"teacher_data.csv", "a+", encoding='utf-8') as f:
            for i in data["data"]:
                user = i["author"]["name"]
                content = change_comma(filter_str(i["content"]))
                created_time = change_time(i["created_time"])
                voteup_count = i["voteup_count"]
                comment_count = i["comment_count"]
                row = '{},{},{},{},{}'.format(user, content, created_time, voteup_count, comment_count)
                f.write(row)
                f.write('\n')


if __name__ == '__main__':
    zhihu_data = get_zhihu()
    save_answers(zhihu_data)
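The question ID (485491358) and the limit/offset query parameters are baked into zhihu_url above, and the script only fetches a single page. If you wanted more answers, one option is to step the offset in a loop. A minimal sketch, not from the original post, assuming zhihu_url and zhihu_header are hoisted out of get_zhihu to module level and that the API keeps honoring limit/offset as shown:

import time
import requests

def get_zhihu_pages(pages=3, limit=20):
    # Hypothetical helper: walk several pages of answers by rewriting the
    # offset parameter of the same API URL (the original URL has offset=5).
    results = []
    for page in range(pages):
        url = zhihu_url.replace("offset=5", "offset={}".format(page * limit))
        res = requests.get(url, headers=zhihu_header)
        results.append(res.json())
        time.sleep(1)  # be gentle with the API
    return results

Each returned page could then be passed to save_answers just like the single page in the __main__ block.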
And the word-segmentation code:
import jieba
import pandas as pd

font = r'C:\Windows\Fonts\FZSTK.TTF'
STOPWORDS = {"回复", "@", "我", "她", "你", "他", "了", "的", "吧", "吗", "在", "啊", "不", "也", "还", "是", "说", "都", "就", "没", "做", "人", "赵薇", "被", "不是", "现在", "什么", "这", "呢", "知道", "邓", "我们", "他们", "和", "有", "", "要", "就是", "但是", "而", "为", "自己", "中", "问题", "一个", "没有", "到", "这个", "并", "对", "[", "]", "“", "”", ",", "。"}


def gen_words(file):
    df = pd.read_csv(file, usecols=[1])
    df_copy = df.copy()
    df_copy['comment'] = df_copy['回答内容'].apply(lambda x: str(x).split())  # split on whitespace to drop spaces
    df_list = df_copy.values.tolist()
    comment = jieba.cut(str(df_list), cut_all=False)
    outstr = ""
    for word in comment:
        if word not in STOPWORDS:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr


if __name__ == '__main__':
    a = gen_words("teacher_data.csv")
    print(a)
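For a feel of what jieba.cut actually returns (a generator of tokens) before it hits our CSV data, here is a standalone example; the exact segmentation depends on jieba's dictionary, so the output shown is approximate:

import jieba

# jieba.cut returns a generator; list() materializes the tokens
tokens = list(jieba.cut("老师辛苦了,教师节快乐", cut_all=False))
print(tokens)  # roughly: ['老师', '辛苦', '了', ',', '教师节', '快乐']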
That wraps up the Zhihu data collection.
Finishing the Program
Finally, let's complete the program. First, write a view function that serves data to the front end.
import json
import random

from flask import Response

import gen_word  # the segmentation module from the previous section


@app.route('/data')
def getdata():
    teacher_data = gen_word.gen_words(r"C:\Python_project\teacher_day\teacher_data.csv")
    res = {}
    data_list = teacher_data.split(" ")
    random_data_l = random.sample(data_list, 5)  # pick five words at random
    res['data'] = random_data_l
    return Response(json.dumps(res))
This pulls the segmented words from the saved Zhihu data, randomly picks five of them, and returns them to the front end.
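While the Flask app is running, the endpoint can be sanity-checked before any front-end code exists. A quick smoke test (the URL assumes the development server's default address):

import requests

# manual smoke test of the /data endpoint defined above
resp = requests.get("http://127.0.0.1:5000/data")
print(resp.json())  # expected shape: {"data": [<five random words>]}

One design note: Response(json.dumps(res)) goes out with Flask's default text/html content type. The client below still parses it because it sets dataType: 'json', but returning flask.jsonify(res) instead would set Content-Type: application/json automatically.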
On the front end, we call the API with plain AJAX (here via jQuery's $.ajax):
<script type="text/javascript"> function getdata(){ $.ajax({ type: 'GET', url: "http://127.0.0.1:5000/data", dataType: 'json', success: function(data){ var text = ""; var flag = 1; for (var i=0;i<data['data'].length ;i++ ) { if (flag ==1) {text = text+'<marquee behavior="scroll" direction="left" scrollamount="30"><font color="red" size="15px" >'+data['data'][i]+'</font> </marquee>'; } ... flag = flag +1; if (flag ==5) { flag =1; } } document.getElementById("datatext").innerHTML=text; } }); } setInterval("getdata()","5000"); </script>
What the code does: after receiving the data from the back end, it assigns each word a scrolling speed, color, and direction according to its position in the list (that is what the flag counter selects), writes the generated markup into the datatext div, and refreshes every 5 seconds.

Let's take a look at the final result!