A simple DOM crawler based on a JSON scheme.
Last updated 3 years ago by eces.
$ cnpm install dom-collector 

DOM Collector

It transforms the page at a given URL into key-value JSON according to a rule specification.


npm install --save dom-collector


Under the hood, it does the following:

  • Validates the rule specification you pass.

  • Loads the web page with the well-known request library.

  • Parses and selects elements with cheerio, a proven DOM selector library; it might be better than jsdom.

  • Filters values and fills in the configured default value.

  • Places collected values into a JSON object; iterated elements go into a JSON array.

  • Returns a thenable Promise that is resolved asynchronously.
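
The first and last steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's implementation: the names validateRule and fetchJsonSketch are hypothetical, and the specific checks are guesses at what validating a rule might involve.

```javascript
// Hypothetical validation; dom-collector's actual checks may differ.
function validateRule(rule) {
  if (!rule || typeof rule.url !== 'string') {
    throw new TypeError('rule.url must be a string');
  }
  if (!Array.isArray(rule.selector)) {
    throw new TypeError('rule.selector must be an array');
  }
  for (const entry of rule.selector) {
    if (typeof entry.key !== 'string' || typeof entry.value !== 'string') {
      throw new TypeError('each selector needs a string key and value');
    }
  }
  return rule;
}

// Hypothetical entry point: validation runs inside the Promise, so an
// invalid rule surfaces as a rejection that callers handle with .catch.
function fetchJsonSketch(rule) {
  return new Promise((resolve) => resolve(validateRule(rule)))
    .then((validRule) => ({ selectorCount: validRule.selector.length }));
}
```

Running the validation inside the Promise keeps every failure on the same asynchronous path, so callers need only one error handler.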


For this HTML body:

<ul id="content-list">
  <li data-id="1">
    <a href="#"> aaa </a>
  </li>
  <li data-id="2">
    <a href="#"> bbb </a>
  </li>
  <li data-id="3">
    <a href="#"></a>
  </li>
</ul>

add a rule like the one below:

collector = require 'dom-collector'

rule =
  url: ''
  timeout: 15000
  encoding: 'utf8'
  params: []
  # Assumption: request headers are passed under a headers key.
  headers:
    'User-Agent': 'Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10'
  selector: [
    {
      key: 'items[]'
      value: '#content-list li'
      type: 'array'
      default: []
    }
    {
      key: 'items[].label'
      value: 'a'
      type: 'string'
      filter: 'trim'
      default: 'default'
    }
    {
      key: 'items[].src'
      value: '[data-id]'
      type: 'number'
    }
  ]

task = collector.fetch_json rule
task.then (result) ->
  console.log result

The promise then resolves with this result:

{
  "items": [
    { "label": "aaa", "src": 1 },
    { "label": "bbb", "src": 2 },
    { "label": "default", "src": 3 }
  ]
}


fetch_json(rule: Object)


Rule(selector) specification


The value field is the DOM selector used to find values for a key. It accepts querySelector-style and jQuery-like selectors: where you would write $('#content'), set value to #content.


The key field names the property created in the result JSON. If a key ends with the [] array notation, it becomes a parent key, and every key that starts with parent[] becomes a child of that parent. If the parent key matches no elements, its children cannot be resolved and the array stays empty.
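
To illustrate the array notation, the sketch below groups flat child keys under their parent key. It is a minimal sketch, not the library's implementation: the function name groupKeys and the shape of its input (one object of child values per matched parent element) are assumptions made for this example.

```javascript
// Hypothetical helper; not part of dom-collector's API.
// rows: one object per matched parent element, keyed by the full selector key,
//       e.g. { 'items[].label': 'aaa', 'items[].src': 1 }
function groupKeys(parentKey, rows) {
  const prefix = parentKey + '.';       // 'items[]' -> 'items[].'
  const name = parentKey.slice(0, -2);  // 'items[]' -> 'items'
  const result = {};
  result[name] = rows.map((row) => {
    const item = {};
    for (const key of Object.keys(row)) {
      // Only keys that start with the parent prefix become children.
      if (key.startsWith(prefix)) {
        item[key.slice(prefix.length)] = row[key];
      }
    }
    return item;
  });
  return result;
}

const grouped = groupKeys('items[]', [
  { 'items[].label': 'aaa', 'items[].src': 1 },
  { 'items[].label': 'bbb', 'items[].src': 2 },
]);
console.log(JSON.stringify(grouped));
// prints {"items":[{"label":"aaa","src":1},{"label":"bbb","src":2}]}
```

With an empty rows array the parent resolves to an empty array, so no children are produced.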


The type field accepts string, number, or boolean.

Note that the default value is used if type casting fails.


The default field replaces the collected value if no element is found, and also

  • when type is string and the string length is zero.
  • when type is number and isFinite returns false for the value, e.g. NaN, Infinity, undefined.
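
The fallback rules above can be expressed as one small function. This is a sketch under the assumption that a missing element shows up as undefined or null; applyDefault is a hypothetical name, not a function exported by the library.

```javascript
// Hypothetical helper illustrating the default rules; not dom-collector's code.
function applyDefault(value, type, fallback) {
  // No element found (assumed to surface as undefined or null).
  if (value === undefined || value === null) return fallback;
  // Strings fall back when they are empty.
  if (type === 'string' && String(value).length === 0) return fallback;
  // Numbers fall back when isFinite rejects them (NaN, Infinity).
  if (type === 'number' && !isFinite(value)) return fallback;
  return value;
}

console.log(applyDefault('', 'string', 'default')); // prints default
console.log(applyDefault(NaN, 'number', 0));        // prints 0
console.log(applyDefault(3, 'number', 0));          // prints 3
```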


The match field is a regular expression. It is evaluated against the collected value and the first captured value is returned.

For example, 100 can be extracted from <li onclick="contentView(100, 3);"></li> with the matcher below:

match: "contentView\\(([0-9]+)\\,"
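
The matcher can be tried on its own in plain JavaScript. The sketch assumes the pattern string is compiled with new RegExp and that the returned value is the first capture group; which text the library applies the pattern to is not stated here, so the sample string is only an illustration.

```javascript
// The pattern string from the rule, compiled the way a JSON string would be.
const pattern = new RegExp("contentView\\(([0-9]+)\\,");
const sample = '<li onclick="contentView(100, 3);"></li>';

// match() returns null when nothing matches, so guard before indexing.
const found = sample.match(pattern);
const firstValue = found ? found[1] : undefined;
console.log(firstValue); // prints 100
```

The captured value is still a string at this point; turning it into a number is a separate casting step.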


Reference: eces/dom-collector/src/


70.5M to 70500


1,000,000 to 1000000


"\r\n hello. " to "hello."


value to String(value)


value to Number(value)


value to Boolean(value)

Be aware of unintended boolean conversions. From MDN - Boolean:

The value passed as the first parameter is converted to a boolean value, if necessary. If value is omitted or is 0, -0, null, false, NaN, undefined, or the empty string (""), the object has an initial value of false. All other values, including any object or the string "false", create an object with an initial value of true.

Do not confuse the primitive Boolean values true and false with the true and false values of the Boolean object.

Any object whose value is not undefined or null, including a Boolean object whose value is false, evaluates to true when passed to a conditional statement.
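
The pitfalls quoted above are easy to reproduce with scraped text, which always arrives as a string. The snippet uses only built-in JavaScript behavior; the variable names are illustrative.

```javascript
// Casting scraped text with the built-in constructors called as functions.
const casts = {
  emptyString: Boolean(''),         // false
  falseString: Boolean('false'),    // true: any non-empty string is truthy
  zeroString: Boolean('0'),         // true, although Number('0') is 0
  zeroNumber: Boolean(Number('0')), // false
};
console.log(casts);

// A Boolean object is truthy even when it wraps false.
const wrapped = new Boolean(false);
console.log(wrapped ? 'truthy' : 'falsy'); // prints truthy
```

If the page contains the literal text false or 0, compare the string explicitly before casting it to a boolean.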


grunt build
grunt test




Under MIT License.
