
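How to fetch the HTML content of a given URL with GuzzleHttp

I want to scrape the search results of a page, for example: https://www.digikey.cn/produc...

Calling GuzzleHttp's request method (with a user-agent set in the headers) does not return the correct HTML content.

Testing in Postman does not return the HTML either. But if I switch to a POST request, send it once, and then switch back to GET, the HTML content comes back normally.

What is the reason for this?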

Asked by 问问小秘, 2020-01-09 18:06:22
1 answer
  • Conclusion: the problem is related to cookies.

    Open the page in Chrome, then in the DevTools Network tab right-click the request and choose Copy as cURL to get the equivalent curl command:

    curl 'https://www.digikey.cn/products/zh?WT.z_header=search_go&keywords=LTC4366HTS8-2' \
      -H 'authority: www.digikey.cn' \
      -H 'pragma: no-cache' \
      -H 'cache-control: no-cache' \
      -H 'upgrade-insecure-requests: 1' \
      -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36' \
      -H 'sec-fetch-mode: navigate' \
      -H 'sec-fetch-user: ?1' \
      -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' \
      -H 'sec-fetch-site: none' \
      -H 'accept-encoding: gzip, deflate, br' \
      -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8' \
      -H 'cookie: i10c.eac23=1; aa7988=1x1efe; i10c.ss=1578395029519; i10c.uid=1578395029521:7527; i10c.uservisit=1; WC_SESSION_ESTABLISHED=true; WC_PERSISTENT=vj8bgl7bOI0sNNkv%2b90wLjXUBsU%3d%0a%3b2020%2d01%2d07+05%3a03%3a49%2e994%5f1578395029971%2d839539%5f10001%5f%2d1002%2c%2d7%2cCNY%5f10001; WC_AUTHENTICATION-1002=%2d1002%2cxClFneevJTCQwIkhZMqB6nffX7k%3d; WC_ACTIVEPOINTER=%2d7%2c10001; WC_USERACTIVITY_-1002=%2d1002%2c10001%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cuGrL3%2fz%2bLA6Cj2UpqYs7GTu4hXweHTv4JGiLQRB%2bvVrwuriPkes%2bG0mMa9ATRkj7I%2fp4tRL8YqQl%0a0n6p344egXSIC%2bwTN%2bTU9bGZiWRPIoeJLqi7E8nxudlxjxmlNFtlJZpb9S74pbhjOghVFVTVrA%3d%3d; WC_GENERIC_ACTIVITYDATA=[1586660409%3atrue%3afalse%3a0%3a2fCiAHe9f%2fB3aeigtH2AlMBEOVU%3d][com.ibm.commerce.context.audit.AuditContext|1578395029971%2d839539][com.ibm.commerce.store.facade.server.context.StoreGeoCodeContext|null%26null%26null%26null%26null%26null][com.digikey.commerce.context.UserContext|null][CTXSETNAME|Store][com.ibm.commerce.context.globalization.GlobalizationContext|%2d7%26CNY%26%2d7%26CNY][com.ibm.commerce.catalog.businesscontext.CatalogContext|10001%26null%26false%26false%26false][com.ibm.commerce.context.base.BaseContext|10001%26%2d1002%26%2d1002%26%2d1][com.ibm.commerce.context.experiment.ExperimentContext|null][com.ibm.commerce.context.entitlement.EntitlementContext|10001%2610001%26null%26%2d2000%26null%26null%26null][com.ibm.commerce.giftcenter.context.GiftCenterContext|null%26null%26null]; TS01b442d5=01460246b6283137a975546c1a4de95baeb070ad00264b4abe3ad9ef6da800f97f6b7577eba7550d291b8e5ec0faefae132cf97938; EG-U-ID=E60c408796-9df3-41bd-b048-ab8f2c834fd1; EG-S-ID=D745e8198e-8691-4444-a661-854bad8f4ac4; i10c.bdddb=c2-83ab8zltOVCnSalfX7xEpkOEP0NrxIWUxpUPz2FHs8PpZROIRd0GkUDEPaNqPZP2hhPJtwkMq5PsNWxfSYUMkUDEQYUqQUWPyMKJy7PEs1Hr0VIk2T2Exh8JPvTOSIW4tmPgFL7JswNOQRNKNY2i0PDJK10oKN6PymZynhwvn1MlT4LfS8xEprN5HlJlPNRVWnKJYqCJFGug7RNkNZaHnVJQK02lPNqctmPEzTAEsbHqSrgfSYxFNWJEPaNqPjHnkK0Eyv7KQzHq2RNkrAs9pU8KxvXPKNWu8BfgPqCJn2ulSWLfSB09pU8JzvSqfhkPymKKWqCJqwMTQRNkNYc9pUWZkINqPIX2tmPHtvpHn1MlX7Ik2T2E8Glu1bNqPIX2xhPttvCiFwMqNXvfSYxEqPEsSvSqKS7; JSESSIONID=0003fOGwcYs5Ts4eSNqvd0orAqp:-1I3UP2:-1C0DN88; utag_main=v_id:016f7face8a30010fa5d33aa27c903079004f071009dc$_sn:1$_ss:0$_st:1578396940310$ses_id:1578395035813%3Bexp-session$_pn:3%3Bexp-session; website#lang=zh-CN-RMB; TS01d239f3=01460246b628971bd29ddff30d9207f98d6227e48d87844c2d3a89ac10241168bb816dc8b283e5d5a2b5fc70a92eb4c2e45bcca45f' \
      --compressed

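    The command carries one very long cookie header (-H 'cookie: ...'). With that header removed, the same curl request returns nothing, so I believe the problem is cookie-related. That would also fit the Postman behavior in the question: Postman keeps cookies between requests, so the first request probably received the cookies that the later GET then sent back.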

    GuzzleHttp can send cookies through its cookies request option, so that is worth trying.
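
The suggestion above could be tried along the following lines. This is only a minimal sketch, assuming Guzzle 6 or 7 installed through Composer; the warm-up request and the seeded cookie are assumptions about what the site needs, not something confirmed by the thread, and the site may apply other checks that cookies alone do not satisfy.

```php
<?php
// Sketch only: assumes guzzlehttp/guzzle (6.x or 7.x) is installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

$url = 'https://www.digikey.cn/products/zh?WT.z_header=search_go&keywords=LTC4366HTS8-2';

// Optionally seed the jar with cookies copied from the browser request.
// Only one cookie from the captured command is shown here; which cookies
// the site actually requires is an assumption to verify by experiment.
$jar = CookieJar::fromArray(
    ['WC_SESSION_ESTABLISHED' => 'true'],
    'www.digikey.cn'
);

// The same jar is used for every request made by this client, so cookies
// set by one response are sent back on the next request.
$client = new Client([
    'cookies'     => $jar,
    'http_errors' => false, // inspect 4xx/5xx responses instead of throwing
    'headers'     => [
        'User-Agent'      => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
        'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'zh-CN,zh;q=0.9,en;q=0.8',
    ],
]);

// Warm-up request: its only purpose is to let the server set cookies.
$client->request('GET', $url);

// Second request: sent with whatever cookies the jar now holds.
$response = $client->request('GET', $url);
$html = (string) $response->getBody();

echo 'Status: ', $response->getStatusCode(), PHP_EOL;
echo 'Cookies in jar: ', count($jar), PHP_EOL;
echo 'HTML length: ', strlen($html), PHP_EOL;
```

If the second response is still empty, comparing the cookies held in `$jar` with the cookie header in the captured curl command may show which ones are missing.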

    2020-01-09 18:10:23