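Basic syntax:

Attribute selection:
# Find div tags whose class attribute is "song"
//div[@class="song"]

Hierarchy & index selection:
# Find the div whose class is "tang", take its direct child ul, then the second li child of that ul, then that li's direct child a
//div[@class="tang"]/ul/li[2]/a

Logical operators:
# Find a tags whose href attribute is empty and whose class attribute is "du"
//a[@href="" and @class="du"]

Fuzzy matching:
//div[contains(@class, "ng")]
//div[starts-with(@class, "ta")]

Getting text:
# /text() returns only the text directly inside a tag
# //text() returns the text inside a tag plus the text of all its descendant tags
//div[@class="song"]/p[1]/text()
//div[@class="tang"]//text()

Getting attributes:
//div[@class="tang"]//li[2]/a/@href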
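To see what each expression above actually returns, here is a small sketch that runs them with lxml against a made-up HTML snippet. The markup, class names and URLs in `SAMPLE_HTML` are hypothetical, chosen only so that every selector matches something.

```python
from lxml import etree

# Hypothetical sample page, made up so that every selector above matches something
SAMPLE_HTML = """
<html><body>
  <div class="song">
    <p>first paragraph</p>
    <p>second paragraph</p>
  </div>
  <div class="tang">
    <ul>
      <li><a href="http://example.com/1" class="du">one</a></li>
      <li><a href="http://example.com/2">two <b>bold</b></a></li>
      <li><a href="" class="du">three</a></li>
    </ul>
  </div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Attribute selection: div whose class is exactly "song"
song_divs = tree.xpath('//div[@class="song"]')

# Hierarchy & index: 2nd li under the ul of div.tang, then its a (indices start at 1)
second_link = tree.xpath('//div[@class="tang"]/ul/li[2]/a')

# Logical operator: a tags with an empty href AND class "du"
empty_du_links = tree.xpath('//a[@href="" and @class="du"]')

# Fuzzy matching: "song" and "tang" both contain "ng"; only "tang" starts with "ta"
ng_divs = tree.xpath('//div[contains(@class, "ng")]')
ta_divs = tree.xpath('//div[starts-with(@class, "ta")]')

# Text: /text() keeps only the tag's own text, //text() adds descendant text
first_p_text = tree.xpath('//div[@class="song"]/p[1]/text()')
direct_text = tree.xpath('//div[@class="tang"]/ul/li[2]/a/text()')
all_text = tree.xpath('//div[@class="tang"]/ul/li[2]/a//text()')

# //text() on a whole block also returns the whitespace between tags, so drop it
tang_words = [t.strip() for t in tree.xpath('//div[@class="tang"]//text()') if t.strip()]

# Attribute value
second_href = tree.xpath('//div[@class="tang"]//li[2]/a/@href')

print(len(song_divs), second_link[0].text, empty_du_links[0].text)
print(len(ng_divs), len(ta_divs))
print(first_p_text, direct_text, all_text)
print(tang_words, second_href)
```

Note that `xpath()` always returns a list, even when only one node matches, which is why single results are read with `[0]`.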
So how do you use it in practice?
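# Learning XPath
# Install the dependency first: pip install lxml
from lxml import etree
import requests

# Target page: NetEase Cloud Music, playlist category "摇滚" (rock)
url = "https://music.163.com/discover/playlist/?cat=摇滚"
# Browser-like User-Agent header; the site may refuse requests that lack one
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
res = requests.get(url, headers=headers)

# tree = etree.parse("D://a.html", etree.HTMLParser())  # parse a local file
tree = etree.HTML(res.text)  # parse a page fetched from the web

playlist_img_src = tree.xpath("//div[@class='u-cover u-cover-1']/img/@src")
playlist_title = tree.xpath("//div[@class='u-cover u-cover-1']/a/@title")
playlist_href = tree.xpath("//div[@class='u-cover u-cover-1']/a/@href")

# Walk the three lists in step; zip stops at the shortest list instead of raising IndexError
for href, title, img_src in zip(playlist_href, playlist_title, playlist_img_src):
    print("=" * 20)
    print(href)
    print(title)
    print(img_src)
    print("=" * 20)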
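The same extraction can be tried without touching the network. The sketch below runs it against a hypothetical snippet that imitates the `u-cover u-cover-1` card structure the XPath above assumes (the real NetEase page may look different or change over time), and it also exercises the local-file route from the commented-out `etree.parse` line. One change is deliberate: instead of three parallel lists matched up by index, it loops over each card and uses relative XPath (`./img/@src`), so a card missing one field cannot shift the other results out of line. `extract_playlists` and the sample titles and URLs are hypothetical.

```python
import os
import tempfile

from lxml import etree

# Hypothetical markup imitating one playlist card per div; the real page may differ
SAMPLE_HTML = """<html><body>
<div class="u-cover u-cover-1">
  <img src="http://example.com/cover1.jpg"/>
  <a title="Rock Classics" href="/playlist?id=1" class="msk"></a>
</div>
<div class="u-cover u-cover-1">
  <img src="http://example.com/cover2.jpg"/>
  <a title="Indie Rock" href="/playlist?id=2" class="msk"></a>
</div>
</body></html>"""


def extract_playlists(tree):
    """Return one dict per cover div, using XPath relative to each card."""
    playlists = []
    for card in tree.xpath("//div[@class='u-cover u-cover-1']"):
        # Paths starting with ./ search only inside this card
        img_src = card.xpath("./img/@src")
        title = card.xpath("./a/@title")
        href = card.xpath("./a/@href")
        playlists.append({
            "title": title[0] if title else None,
            "href": href[0] if href else None,
            "img_src": img_src[0] if img_src else None,
        })
    return playlists


# Parse from a string, as etree.HTML(res.text) does above
from_string = extract_playlists(etree.HTML(SAMPLE_HTML))

# Parse from a local file: etree.parse uses the XML parser unless an
# HTMLParser is passed, and real-world HTML is usually not valid XML
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_HTML)
    from_file = extract_playlists(etree.parse(path, etree.HTMLParser(encoding="utf-8")))

for playlist in from_string:
    print("=" * 20)
    print(playlist["href"])
    print(playlist["title"])
    print(playlist["img_src"])
```

Swapping `from_string` for the result of `extract_playlists(etree.HTML(res.text))` would apply the same per-card logic to the live page, assuming its markup matches.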