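I had never tried simulating a login before, mainly because I wasn't familiar with the flow. After reading a lot of material online I got a rough idea of how it works, so I'm trying it out on Douban first.
As for the captcha: for now the script saves the captcha image locally and I type the code in by hand. Later I'll look into recognizing captchas automatically in Python.
The captcha saved as a local image is not very legible (I'll fix that when I have time). Opening the captcha URL in a browser shows it clearly.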
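One possible cause of an unreadable saved image is writing the downloaded bytes in text mode. Text mode ('w') can rewrite newline bytes on Windows under Python 2 and rejects bytes outright under Python 3, so image data is normally written with 'wb'. A minimal sketch, assuming the captcha bytes have already been downloaded; the helper name `save_captcha` and the stand-in bytes are made up for illustration:

```python
import os
import tempfile


def save_captcha(image_bytes, path):
    """Write raw image bytes to `path` and return the path."""
    # 'wb' writes the bytes exactly as received, with no newline translation
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path


if __name__ == "__main__":
    # Stand-in bytes; in the real script these would be the body of the
    # response fetched from the captcha URL
    fake_image = b"\xff\xd8\xff\xe0 fake jpeg \n data \r\n"
    target = os.path.join(tempfile.gettempdir(), "captcha.jpg")
    save_captcha(fake_image, target)
    with open(target, "rb") as f:
        # Compare what is on disk with what was written
        print(f.read() == fake_image)
```

If the file on disk still differs from what the browser shows, the cause is somewhere else (for example, the server returning a different image per request), and this change alone will not help.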
The goal: log in to Douban and post a one-line status.
# -*- coding:utf-8 -*-
import re

import requests
from bs4 import BeautifulSoup


class DouBan(object):
    def __init__(self):
        self.__username = "your_douban_account"   # Douban account
        self.__password = "your_douban_password"  # Douban password
        self.__main_url = "https://www.douban.com"
        self.__login_url = "https://www.douban.com/accounts/login"
        self.__proxies = {
            "http": "http://172.17.18.80:8080",
            "https": "https://172.17.18.80:8080"
        }
        self.__headers = {
            "Host": "www.douban.com",
            "Origin": self.__main_url,
            "Referer": self.__main_url,
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
        }
        self.__data = {
            "source": "index_nav",
            "redir": "https://www.douban.com",
            "form_email": self.__username,
            "form_password": self.__password,
            "login": u"登录"
        }
        # One session keeps cookies, headers and proxies across requests
        self.__session = requests.session()
        self.__session.headers = self.__headers
        self.__session.proxies = self.__proxies

    def login(self):
        r = self.__session.post(self.__login_url, self.__data)
        if r.status_code != 200:
            print("login fail", r.status_code)
            return
        html = r.text
        soup = BeautifulSoup(html, "lxml")
        # find() returns None when the page has no captcha image
        captcha_img = soup.find('img', id='captcha_image')
        captcha_address = captcha_img['src'] if captcha_img else None
        print(captcha_address)
        if captcha_address:
            # Captcha present: extract its ID with a regular expression
            re_captcha_id = r'<input type="hidden" name="captcha-id" value="(.*?)"/'
            captcha_id = re.findall(re_captcha_id, html)
            print(captcha_id)
            # Save the image locally; 'wb' because the content is binary
            with open('captcha.jpg', 'wb') as f:
                f.write(requests.get(captcha_address, proxies=self.__proxies).content)
            captcha = input('please input the captcha:')
            self.__data['captcha-solution'] = captcha
            # findall() returns a list; the form needs the single ID string
            self.__data['captcha-id'] = captcha_id[0] if captcha_id else ''
            r = self.__session.post(self.__login_url, data=self.__data)
            # A 200 status is treated as a successful login here
            if r.status_code == 200:
                print("login success")
                data = {
                    "ck": "NBJ2",
                    "comment": "模拟登录"
                }
                # Post the status text to the home page
                r = self.__session.post(self.__main_url, data=data)
                print(r.status_code)
        else:
            print("no captcha required for login")
            # The no-captcha logic is the same as the logic above
            # after the captcha has been entered; code omitted here


if __name__ == "__main__":
    t = DouBan()
    t.login()
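The script leaves out the no-captcha branch because it repeats the captcha branch. One way to avoid the duplication is to detect the captcha fields first and then build the form data in a single place. The sketch below uses only the standard library. The helper names `extract_captcha` and `build_login_data` and the sample HTML are assumptions made for illustration, modeled on the field names in the script above rather than on Douban's actual page.

```python
import re


def extract_captcha(html):
    """Return (captcha_id, image_url), or None when no captcha is found."""
    # Locate the two tags first, so attribute order inside them does not matter
    img_tag = re.search(r'<img[^>]*\bid="captcha_image"[^>]*>', html)
    id_tag = re.search(r'<input[^>]*\bname="captcha-id"[^>]*>', html)
    if not img_tag or not id_tag:
        return None
    src = re.search(r'\bsrc="([^"]*)"', img_tag.group(0))
    value = re.search(r'\bvalue="([^"]*)"', id_tag.group(0))
    if not src or not value:
        return None
    return value.group(1), src.group(1)


def build_login_data(username, password, captcha=None, solution=None):
    """Build the login form; captcha fields are added only when needed."""
    data = {
        "source": "index_nav",
        "redir": "https://www.douban.com",
        "form_email": username,
        "form_password": password,
        "login": "登录",
    }
    if captcha is not None:
        captcha_id, _image_url = captcha
        data["captcha-id"] = captcha_id
        data["captcha-solution"] = solution
    return data


if __name__ == "__main__":
    # Hypothetical markup, not copied from a real Douban response
    sample = ('<img id="captcha_image" src="https://example.com/captcha.jpg"/>'
              '<input type="hidden" name="captcha-id" value="abc123"/>')
    found = extract_captcha(sample)
    print(found)
    print(build_login_data("user", "secret", found, "typed words"))
    # Without a captcha the same function yields the plain login form
    print(build_login_data("user", "secret"))
```

With helpers like these, both branches of `login` reduce to one POST of whatever `build_login_data` returns, followed by the same status post.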
Log in to the Douban account and you can see the status that was posted: “模拟登录” ("simulated login").