bquery
bquery is a Node.js module for fetching web pages. It uses CSS selectors to extract and structure the content of an HTML page.
Last updated 3 years ago by rickjose.
License: BSD
$ cnpm install bquery 

bquery

A quick, simple, and elegant way to fetch a web document and structure its content.

Installation

Latest release:

$ npm install bquery

var bquery = require("bquery");
bquery.query({
  "url": "https://github.com/",
  "selector": "ul.header-nav.left>li",
  "extract": {
    "title":{},
    "url": {
      "selector": "a",
      "extract": "href"
    }
  }
}).then(function(docs){
  console.log(docs);
  //=> {"results":[{"result":[{"title":"Explore","url":"https://github.com/explore"},{"title":"Features","url":"https://github.com/features"},{"title":"Enterprise","url":"https://enterprise.github.com/"},{"title":"Blog","url":"https://github.com/blog"}]}]}
})
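
The sample output above nests the matched items under results[].result, while the preSelect example later in this document prints a flat array. The following is a minimal sketch of a helper that accepts either shape and returns a flat array. `flattenDocs` is a hypothetical name and not part of bquery; both shapes are assumed from the sample comments in this README rather than from the library's source.

```javascript
// Hypothetical helper (not part of bquery): normalize the value passed to
// .then() into a flat array of extracted items.
function flattenDocs(docs) {
  // Shape assumed from the sample comment: {"results":[{"result":[...]}]}
  if (docs && Array.isArray(docs.results)) {
    return docs.results.reduce(function (items, entry) {
      var result = entry && Array.isArray(entry.result) ? entry.result : [];
      return items.concat(result);
    }, []);
  }
  // Already a flat array, or nothing usable.
  return Array.isArray(docs) ? docs : [];
}

// Trimmed copy of the sample output shown in this README.
var sample = {"results":[{"result":[
  {"title":"Explore","url":"https://github.com/explore"},
  {"title":"Blog","url":"https://github.com/blog"}
]}]};

var titles = flattenDocs(sample).map(function (item) { return item.title; });
console.log(titles);
//=> [ 'Explore', 'Blog' ]
```

Inside a real query, the same helper would be called on the `docs` argument of the `.then()` callback.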

Options

bquery detects the web document's charset automatically, but in special circumstances you can also set the document's charset explicitly.

var bquery = require("bquery");
bquery.query({
  "url": "https://github.com/",
  "selector": "ul.header-nav.left>li",
  "charset": "utf-8",
  "extract": {
    "title":{},
    "url": {
      "selector": "a",
      "extract": "href"
    }
  }
}).then(function(docs){
  console.log(docs);
})
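
To illustrate when an explicit "charset" is useful, here is a minimal sketch of one common way to guess a page's charset: check the HTTP Content-Type header first, then a <meta> declaration. This is not bquery's actual detection code, which this README does not describe; `guessCharset` is a hypothetical name. When neither source declares a charset, or the declaration is wrong, setting the option by hand is the fallback.

```javascript
// Illustrative only: NOT bquery's detection logic.
// Returns a lowercase charset name, or null if none is declared.
function guessCharset(contentType, html) {
  var pattern = /charset\s*=\s*["']?([\w-]+)/i;

  // 1. Prefer the HTTP header, e.g. "text/html; charset=gbk".
  var fromHeader = pattern.exec(contentType || "");
  if (fromHeader) return fromHeader[1].toLowerCase();

  // 2. Fall back to a <meta> declaration near the top of the document.
  var fromMeta = pattern.exec((html || "").slice(0, 1024));
  if (fromMeta) return fromMeta[1].toLowerCase();

  // 3. Nothing declared: the caller has to choose a charset.
  return null;
}

console.log(guessCharset("text/html; charset=UTF-8", ""));
//=> utf-8
console.log(guessCharset("text/html", '<meta charset="gb2312">'));
//=> gb2312
```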

You can also set the timeout period for the request.

bquery.query({
  "url": "https://github.com/",
  "selector": "ul.header-nav.left>li>a",
  "timeout": 3000
});
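
When several pages share the same selector and timeout, the option objects can be built from a common base. The sketch below assumes nothing about bquery beyond the option names shown in this README; `makeQuery` is a hypothetical helper, and the unit of "timeout" (presumably milliseconds) is not stated in the original.

```javascript
// Hypothetical helper (not part of bquery): merge per-page options over
// shared defaults without mutating either input.
function makeQuery(defaults, overrides) {
  var query = {};
  [defaults, overrides].forEach(function (source) {
    Object.keys(source || {}).forEach(function (key) {
      query[key] = source[key];
    });
  });
  return query;
}

var base = { "selector": "ul.header-nav.left>li>a", "timeout": 3000 };

// Same selector, but a shorter timeout for this particular page.
var fast = makeQuery(base, { "url": "https://github.com/", "timeout": 1000 });
console.log(fast.timeout);
//=> 1000
```

Each resulting object can then be passed to bquery.query(...) as in the examples above.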

Sometimes you need to modify the page content, such as CSS, JavaScript, or other elements, before extracting the document content. You can use the "preSelect" option for this.

bquery.query({
  "url": "https://github.com/",
  "selector": "ul.header-nav.left>li",
  "preSelect": function($){   // $ is a cheerio object; any cheerio operation can be used here
    $("ul.header-nav.left>li").each(function(i, elem){
      if($("a", elem).text() == "Explore"){
        $(elem).remove()
      }
    });
  },
  "extract": {
    "title":{},
    "url": {
      "selector": "a",
      "extract": "href"
    }
  }
}).then(function(docs){
  console.log(docs); 
  //=>[
  //=>  { title: 'Features', url: 'https://github.com/features' },
  //=>  { title: 'Enterprise', url: 'https://enterprise.github.com/' },
  //=>  { title: 'Blog', url: 'https://github.com/blog' } 
  //=>]
})

You can also use a "callback" to modify the extracted value.

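bquery.query({
  "url": "https://github.com/",
  "selector": "ul.header-nav.left>li",
  "extract": {
    "title":{
      "extract": "text",
      // txt is the extracted text; the returned value is used in its place
      "callback": function(txt){
        return "foo_" + txt;
      }
    },
    "url": {
      "selector": "a",
      "extract": "href"
    }
  }
}).then(function(docs){
  console.log(docs);
});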
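
When a field needs more than one transformation, small functions can be chained into a single "callback". The sketch below assumes, as the example above suggests, that the callback receives the extracted string and that its return value is used in its place. `pipe`, `trim`, and `prefix` are hypothetical helpers, not part of bquery.

```javascript
// Hypothetical helpers (not part of bquery) for building "callback" functions.

// Chain functions left to right: pipe(f, g)(x) is g(f(x)).
function pipe() {
  var steps = Array.prototype.slice.call(arguments);
  return function (value) {
    return steps.reduce(function (current, step) {
      return step(current);
    }, value);
  };
}

// Strip surrounding whitespace from the extracted text.
function trim(txt) {
  return String(txt).trim();
}

// Build a function that prepends a fixed string.
function prefix(p) {
  return function (txt) {
    return p + txt;
  };
}

var titleSpec = {
  "extract": "text",
  "callback": pipe(trim, prefix("foo_"))
};

console.log(titleSpec.callback("  Explore \n"));
//=> foo_Explore
```

`titleSpec` can then stand in for the "title" entry of the "extract" option.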

Current Tags

  • 0.0.2                                ...           0.0.2 (6 years ago)
  • 0.4.0                                ...           latest (3 years ago)

11 Versions

  • 0.4.0                                ...           3 years ago
  • 0.3.0                                ...           3 years ago
  • 0.2.3                                ...           4 years ago
  • 0.2.2                                ...           4 years ago
  • 0.2.1                                ...           4 years ago
  • 0.2.0                                ...           4 years ago
  • 0.1.0                                ...           4 years ago
  • 0.0.4                                ...           4 years ago
  • 0.0.3                                ...           5 years ago
  • 0.0.2                                ...           6 years ago
  • 0.0.1                                ...           6 years ago