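@黄亿华 Hi, I'd like to ask you a question. In the HuxiuProcessor class from your demo, I changed the XPath to //div[@class='article-wrap']/h1[contains(text(),'马云')], and running it now fails with the following error: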
[16/04/27 18:13:31] [ERROR] process request Request{url='http://www.huxiu.com/', method='null', extras={statusCode=200}, priority=0} error
java.lang.NullPointerException
at us.codecraft.xsoup.xevaluator.CombiningEvaluator$And.matches(CombiningEvaluator.java:53)
at us.codecraft.xsoup.xevaluator.CombiningEvaluator$And.matches(CombiningEvaluator.java:53)
at org.jsoup.select.Collector$Accumulator.head(Collector.java:42)
at org.jsoup.select.NodeTraversor.traverse(NodeTraversor.java:31)
at org.jsoup.select.Collector.collect(Collector.java:24)
at us.codecraft.xsoup.xevaluator.DefaultXPathEvaluator.evaluate(DefaultXPathEvaluator.java:29)
at us.codecraft.webmagic.selector.XpathSelector.selectElements(XpathSelector.java:45)
at us.codecraft.webmagic.selector.HtmlNode.selectElements(HtmlNode.java:71)
at us.codecraft.webmagic.selector.HtmlNode.xpath(HtmlNode.java:43)
at cn.bh.webMagic.bh_crawler.pageProcessor.HuxiuProcessor.process(HuxiuProcessor.java:19)
at us.codecraft.webmagic.Spider.processRequest(Spider.java:421)
at us.codecraft.webmagic.Spider$1.run(Spider.java:322)
at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Is this because contains() isn't supported? If so, how would I change the code to support it?
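contains() is one of the core XPath 1.0 string functions, so the expression itself is valid XPath; what matters is whether the engine evaluating it handles contains(text(), ...). As a minimal sketch (not webmagic or Xsoup code), the JDK's built-in javax.xml.xpath engine can evaluate the exact expression from the question. The class name ContainsXPathCheck and the sample markup are hypothetical. javax.xml.xpath only accepts well-formed XML, so this checks the expression's meaning and is not a replacement for Jsoup/Xsoup on real web pages.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsXPathCheck {
    // The name searched for in the question, written as Unicode escapes
    // so the file compiles under any source encoding.
    static final String KEYWORD = "\u9a6c\u4e91";

    // The XPath from the question, unchanged apart from the escaped keyword.
    static final String EXPR =
        "//div[@class='article-wrap']/h1[contains(text(),'" + KEYWORD + "')]";

    // Hypothetical, well-formed sample markup (real pages are usually not valid XML).
    static final String SAMPLE = "<html><body>"
        + "<div class='article-wrap'><h1>Interview: " + KEYWORD + " on e-commerce</h1></div>"
        + "<div class='article-wrap'><h1>Some other headline</h1></div>"
        + "</body></html>";

    // Evaluates expr with the JDK's XPath 1.0 engine and returns the text of each matched node.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Wrap parser/XPath checked exceptions so callers need no throws clause.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> hits = select(SAMPLE, EXPR);
        System.out.println(hits.size()); // 1
    }
}
```

Only the first h1 is matched, which suggests the NullPointerException comes from the evaluator in use rather than from the XPath expression.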
I found a workaround myself: select the title text with XPath and filter it with a regex instead, i.e. page.getHtml().xpath("//div[@class='article-wrap']/h1/text()").regex(".*马云.*").toString(). All roads lead to Rome.
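The workaround keeps XPath for selecting the h1 text and moves the keyword test out of XPath. That filtering step can be sketched in plain Java; this is not webmagic's regex() implementation, and the class TitleFilter, its methods and the sample titles are hypothetical. It shows two options: a plain String.contains check, and regex matching, where a pattern with .* on both sides of the keyword returns the whole title, while a single . on each side captures only the keyword plus one neighbouring character and misses titles that start or end with the keyword.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleFilter {
    // The name searched for in the question, written as Unicode escapes
    // so the file compiles under any source encoding.
    static final String KEYWORD = "\u9a6c\u4e91";

    // Keeps only the titles that contain the keyword; no regex needed.
    static List<String> filterTitles(List<String> titles, String keyword) {
        List<String> kept = new ArrayList<>();
        for (String t : titles) {
            if (t.contains(keyword)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Returns the first substring of text matched by regex, or null if nothing matches.
    static String firstMatch(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String title = "Interview: " + KEYWORD + " on e-commerce";
        List<String> titles = List.of(title, "Some other headline");

        System.out.println(filterTitles(titles, KEYWORD).size()); // 1
        // ".*" on both sides matches the whole title.
        System.out.println(title.equals(firstMatch(title, ".*" + KEYWORD + ".*"))); // true
        // A single "." on each side matches only 4 characters here.
        System.out.println(firstMatch(title, "." + KEYWORD + ".").length()); // 4
        // ...and nothing at all when the title starts with the keyword.
        System.out.println(firstMatch(KEYWORD + " opens event", "." + KEYWORD + ".")); // null
    }
}
```

String.contains is the simpler choice when only a yes/no filter is needed; a regex is only worth it when part of the title has to be extracted.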
Same question here. It does look that way, but only when contains() is applied to text().