GitHub: https://github.com/codelucas/newspaper
Installation
pip3 install newspaper3k
Code example
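```python
# -*- coding: utf-8 -*-
from newspaper import Article

url = "https://news.sina.com.cn/"
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract title, authors, date, image and body text

print(article.title)         # page title
print(article.authors)       # list of author names (may be empty)
print(article.publish_date)  # datetime, or None if no date was found
print(article.top_image)     # URL of the main image
print(article.text[:50])     # first 50 characters of the body text
```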
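The parsed fields (title, authors, publish date, top image, body text) largely match what the news page itself displays, so for simple news processing this should be sufficient.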
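For that kind of simple processing, it can help to collect the parsed fields into one plain record and guard against missing values, since `authors` may be an empty list and `publish_date` may be `None` when the page exposes no date. The helper below is a minimal sketch; `article_to_record` and `preview_chars` are hypothetical names, not part of newspaper3k. It only reads the attributes used in the example above. The `language="zh"` argument follows the Chinese-language usage shown in the newspaper README.

```python
from typing import Any


def article_to_record(article: Any, preview_chars: int = 50) -> dict:
    """Collect the fields printed above into a plain dict (hypothetical helper).

    `article` is expected to be a parsed newspaper `Article`; only the
    attributes title, authors, publish_date, top_image and text are read.
    """
    publish_date = article.publish_date
    return {
        "title": article.title or "",
        # authors is a list and may be empty
        "authors": list(article.authors or []),
        # publish_date may be None (or empty) when no date could be extracted
        "publish_date": publish_date.isoformat() if publish_date else None,
        # top_image is an empty string when no image was found
        "top_image": article.top_image or None,
        "preview": (article.text or "")[:preview_chars],
    }


if __name__ == "__main__":
    from newspaper import Article  # pip3 install newspaper3k

    art = Article("https://news.sina.com.cn/", language="zh")
    art.download()
    art.parse()
    print(article_to_record(art))
```

Because the helper only reads attributes, it can also be exercised with a stand-in object, without network access.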