Example readUrl.txt file
#接龙
1. CSDN-亮点 http://t.csdn.cn/DWodz
2. 不知名白帽 http://t.csdn.cn/YO6Sm
3. 编程爱好者-阿新 http://t.csdn.cn/4suuN
4. 一一哥 https://yiyige.blog.csdn.net/article/details/120990448
5. 执久 http://t.csdn.cn/4UCQf
6. 花神庙码农@CSDN
7. 木木 http://t.csdn.cn/aalnU
8. 挽·烽 http://t.csdn.cn/LaZIz
高质量三连回访
9. 六月暴雪飞梨花 http://t.csdn.cn/VqL0s
10. 风铃听雨~ http://t.csdn.cn/9fkAT
11. 东非不开森 http://t.csdn.cn/fZa8s 开学季征文 如有时间愿意看的,可以指点一下嘿嘿 谢谢啦🥰🥰
12. 小明java问道之路 经验文 | 编程的上帝视角是什么?感兴趣的可以看看
硬核深度文 | 精通内核-CPU控制并发原理CPU中断控制
💖在线求个一键三连💖
13. AKA|布鲁克林欧神仙 https://blog.csdn.net/m0_54594153/article/details/126661839?spm=1001.2014.3001.5501高质量三连回访
14. 阿提说说 http://t.csdn.cn/K3KSU
15. DDD666🍭 http://t.csdn.cn/2zn4R
16. 付文龙(爱吃回锅肉)红目香薰 http://t.csdn.cn/kqcPv
17. Bourne http://t.csdn.cn/ndJvc
18. 秦羽 http://t.csdn.cn/nn0cO
19. 宁采桃花不采臣 http://t.csdn.cn/nqgEK
2.Code For Better
20. CSDN-北极的三哈
21. promise https://blog.csdn.net/m0_71485750/article/details/126427221 互三互粉
22. Beyond https://blog.csdn.net/chuxinchangcun/article/details/126681915
Code example:
import re

import requests

# Read the whole input file into a single string.
with open("readUrl.txt", "r", encoding="utf-8") as file:
    strList = file.read()

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}

# Raw string so the backslashes reach the regex engine unchanged.
rep = r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
listUrl = re.findall(rep, strList)

# Remove duplicates while keeping first-seen order.
list_not_dup = []
for i in listUrl:
    if i not in list_not_dup:
        list_not_dup.append(i)

for item in list_not_dup:
    print(item)

# Follow redirects so short links resolve to the full article URL,
# then drop everything from the "?" onward.
strUrl = ""
for item in list_not_dup:
    # headers must be passed by keyword; the second positional
    # argument of requests.get is params, not headers.
    html = requests.get(item, headers=headers).url
    result = html.split("?")
    strUrl += result[0] + "\n"

with open("newUrl.txt", "w", encoding="utf-8") as file:
    file.write(strUrl)
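The pattern stored in `rep` can be tried on its own, without any network access. The sketch below runs it over three lines taken from the sample readUrl.txt; the names `URL_PATTERN`, `extract_urls`, and `sample` are illustrative and not part of the original script.

```python
import re

# Same pattern as `rep` in the script, compiled once for reuse.
URL_PATTERN = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)


def extract_urls(text):
    """Return every URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


# Three lines from the sample file: a short link, a long link with a
# query string followed directly by Chinese text, and a line with no URL.
sample = (
    "1. CSDN-亮点 http://t.csdn.cn/DWodz\n"
    "13. AKA|布鲁克林欧神仙 https://blog.csdn.net/m0_54594153/article/details/"
    "126661839?spm=1001.2014.3001.5501高质量三连回访\n"
    "6. 花神庙码农@CSDN\n"
)

urls = extract_urls(sample)
for url in urls:
    print(url)
# http://t.csdn.cn/DWodz
# https://blog.csdn.net/m0_54594153/article/details/126661839?spm=1001.2014.3001.5501
```

The match stops at the first character the pattern does not allow, which is why the Chinese text glued to the second URL is left out. One caveat: the range `$-_` inside the character class covers most ASCII punctuation, so a URL followed directly by a comma or closing parenthesis would have that character included in the match.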
Recommended regular expression for extracting web page URLs
"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
Parsed result (newUrl.txt)
https://blog.csdn.net/CSDN_anhl/article/details/126240868
https://blog.csdn.net/m0_63127854/article/details/126682845
https://blog.csdn.net/m0_47419053/article/details/126679490
https://yiyige.blog.csdn.net/article/details/120990448
https://blog.csdn.net/weixin_60719453/article/details/126674166
https://blog.csdn.net/qxhgd/article/details/115391385
https://blog.csdn.net/m0_64102491/article/details/126673956
https://blog.csdn.net/Fire_Cloud_1/article/details/126669683
https://blog.csdn.net/L_Lycos/article/details/126614374
https://blog.csdn.net/muzi_longren/article/details/126654597
https://blog.csdn.net/m0_62159662/article/details/126653214
https://blog.csdn.net/FMC_WBL/article/details/126683043
https://blog.csdn.net/FMC_WBL/article/details/126575914
https://blog.csdn.net/m0_54594153/article/details/126661839
https://blog.csdn.net/weixin_40972073/article/details/126682094
https://blog.csdn.net/BIT_666/article/details/126656554
https://blog.csdn.net/feng8403000/article/details/126674232
https://blog.csdn.net/qq_44631587/article/details/126667516
https://blog.csdn.net/qq_43585922/article/details/126685211
https://blog.csdn.net/m0_65909361/article/details/126599073
https://blog.csdn.net/m0_68744965/article/details/126471630
https://blog.csdn.net/m0_71485750/article/details/126427221
https://blog.csdn.net/chuxinchangcun/article/details/126681915
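Every entry in newUrl.txt is a URL without a query string, and duplicates are dropped before anything is fetched. Those two steps can be checked offline with the standard library. This is a minimal sketch, not the script's own code: `dedupe_keep_order` and `strip_query` are hypothetical helper names, and `urllib.parse` is swapped in for the script's `split("?")`.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_keep_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(items))


def strip_query(url):
    """Return url without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


found = [
    "https://blog.csdn.net/m0_54594153/article/details/126661839?spm=1001.2014.3001.5501",
    "https://yiyige.blog.csdn.net/article/details/120990448",
    "https://yiyige.blog.csdn.net/article/details/120990448",
]

cleaned = [strip_query(u) for u in dedupe_keep_order(found)]
for url in cleaned:
    print(url)
# https://blog.csdn.net/m0_54594153/article/details/126661839
# https://yiyige.blog.csdn.net/article/details/120990448
```

Unlike `split("?")[0]`, `strip_query` also removes a `#fragment` when the URL has no query string. Whether that is desirable depends on how the links are used afterwards.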