A lazy, fluent web crawler with an async/await API.
Last updated 4 years ago by hunterloftis.
License: MIT


$ yarn add itemize


Itemize lists all of the linked files and pages underneath the specified root URL.

const itemize = require('itemize')

// Get a quick Hacker News sitemap
const urls = itemize('', { depth: 2 })
;(async () => {
  while (!urls.done()) console.log(await urls.next())
})()

This is useful for writing mirrors, monitoring a page for new content, and similar tasks. Itemize starts at the root URL you provide and automatically spiders through linked pages. It takes a lazy approach to I/O, making requests only when you ask for more content with next().
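To make the lazy, pull-based model concrete, here is a minimal sketch of a crawler with the same next()/done()/all() shape. It is not Itemize's source: the `LazyCrawler` class and the injected `fetchLinks(url)` function (standing in for an HTTP request plus link extraction) are hypothetical, and the depth rule is an assumption. The key property is that a page is fetched only when next() has run out of already-discovered URLs.

```javascript
// Hypothetical sketch of a lazy, pull-based crawler; not Itemize's implementation.
// fetchLinks(url) is an assumed async function resolving to the URLs linked from `url`.
class LazyCrawler {
  constructor (root, fetchLinks, { depth = 0 } = {}) {
    this.fetchLinks = fetchLinks
    this.maxDepth = depth
    this.toFetch = [{ url: root, depth: 0 }] // pages whose links are not yet known
    this.ready = []                          // discovered URLs not yet handed out
    this.seen = new Set([root])
    this.traversed = []                      // every URL returned by next() so far
  }

  // True once no fetches remain and every discovered URL has been returned.
  done () {
    return this.toFetch.length === 0 && this.ready.length === 0
  }

  // Performs I/O only when the buffer of discovered URLs is empty.
  async next () {
    while (this.ready.length === 0 && this.toFetch.length > 0) {
      const page = this.toFetch.shift()
      for (const link of await this.fetchLinks(page.url)) {
        if (this.seen.has(link)) continue
        this.seen.add(link)
        this.ready.push(link)
        // Assumption: a page at layer d has its links followed only while d < depth.
        if (page.depth < this.maxDepth) {
          this.toFetch.push({ url: link, depth: page.depth + 1 })
        }
      }
    }
    const url = this.ready.shift() // undefined when nothing remains
    if (url !== undefined) this.traversed.push(url)
    return url
  }

  async all () {
    return this.traversed.slice()
  }
}

// Usage with an in-memory link graph instead of real HTTP requests.
const graph = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/c', 'https://example.com/'],
  'https://example.com/b': [],
  'https://example.com/c': ['https://example.com/d']
}
const fakeFetch = async url => graph[url] || []

;(async () => {
  const crawler = new LazyCrawler('https://example.com/', fakeFetch, { depth: 1 })
  while (!crawler.done()) {
    const url = await crawler.next()
    if (url) console.log(url)
  }
})()
```

Buffering discovered URLs and fetching one page at a time keeps the number of outstanding requests tied to how fast the caller consumes results, which is the trade-off a lazy crawler makes against raw throughput.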


itemize(url, options)

Returns an Itemize instance.

  • url: String, the root URL from which to crawl
  • options: Object
    • depth: Number, crawl this many layers deep (default: 0)
const items = itemize('', { depth: 1 })
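The README does not spell out exactly which pages each depth reaches, so the following is only an illustration under an assumption: the root's own links form the first layer (listed even at depth 0), and each extra unit of depth follows links one layer further. The `layers` function and the sample graph are hypothetical; the function runs a breadth-first search over a static link graph and groups newly found URLs by layer.

```javascript
// Hypothetical illustration of crawl "layers" via breadth-first search.
// Assumption (not confirmed by the README): with depth d, the root and pages up
// to d links away are fetched, so their links (up to layer d + 1) are listed.
function layers (graph, root, depth) {
  const seen = new Set([root])
  const result = []
  let frontier = [root] // pages whose links are read at this step
  for (let d = 0; d <= depth && frontier.length > 0; d++) {
    const found = []
    for (const page of frontier) {
      for (const link of graph[page] || []) {
        if (!seen.has(link)) {
          seen.add(link)
          found.push(link)
        }
      }
    }
    if (found.length > 0) result.push(found)
    frontier = found
  }
  return result
}

const graph = {
  '/': ['/a', '/b'],
  '/a': ['/c'],
  '/c': ['/d']
}
console.log(layers(graph, '/', 0)) // [ [ '/a', '/b' ] ]
console.log(layers(graph, '/', 1)) // [ [ '/a', '/b' ], [ '/c' ] ]
```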

items.next()

Returns a Promise for a String, the next linked URL.

If no URLs remain, returns a Promise for undefined.

const url = await items.next()
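Because next() resolves to undefined once nothing remains, a consumer can collect results without calling done() at all. The sketch below relies only on that documented contract; the `drain` helper, its optional `limit` parameter, and the `stub` object used to try it offline are hypothetical. Stopping at a limit leaves the crawl unfinished, which is the situation where cleaning up the instance matters.

```javascript
// Hypothetical helper: relies only on the documented contract that next()
// resolves to a URL string, or to undefined once no URLs remain.
async function drain (items, limit = Infinity) {
  const urls = []
  while (urls.length < limit) {
    const url = await items.next()
    if (url === undefined) break // crawl exhausted
    urls.push(url)
  }
  return urls
}

// Stand-in object with the same next() contract, for trying drain() offline.
function stub (list) {
  const queue = list.slice()
  return { next: async () => queue.shift() } // shift() on an empty array gives undefined
}

;(async () => {
  console.log(await drain(stub(['/a', '/b', '/c']), 2)) // [ '/a', '/b' ]
})()
```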

items.done()

Returns a Boolean indicating whether all crawl routes have been exhausted.

if (items.done()) console.log('crawl complete')
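The done()/next() pair maps naturally onto JavaScript's async iteration protocol. This is a sketch, not part of Itemize's documented API: the `iterate` async generator and the `stub` object are hypothetical. The generator stops either when done() reports completion or when next() resolves to undefined, so it cannot spin forever if the last pending routes yield no new URLs.

```javascript
// Hypothetical adapter: exposes an object with done()/next() as an async
// iterable so it can be consumed with for await...of.
async function * iterate (items) {
  while (!items.done()) {
    const url = await items.next()
    if (url === undefined) return // documented: undefined means no URLs remain
    yield url
  }
}

// Stand-in object with done() and next(), for trying the adapter offline.
function stub (list) {
  const queue = list.slice()
  return {
    done: () => queue.length === 0,
    next: async () => queue.shift()
  }
}

;(async () => {
  for await (const url of iterate(stub(['/a', '/b']))) console.log(url)
})()
```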

items.all()

Returns a Promise for an Array of Strings, all of the previously traversed items.

const all = await items.all()
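One way to use the URL list from all() for the "monitoring a page for new content" case is to compare snapshots taken by two separate crawls. The `diffCrawls` helper and the sample snapshots below are hypothetical; the function simply uses Sets to report which URLs appeared or disappeared between two lists.

```javascript
// Hypothetical helper: compares the URL lists from two crawls (for example,
// the results of all() on two instances created at different times).
function diffCrawls (before, after) {
  const prev = new Set(before)
  const curr = new Set(after)
  return {
    added: after.filter(url => !prev.has(url)),    // present now, absent before
    removed: before.filter(url => !curr.has(url))  // present before, absent now
  }
}

const monday = ['/', '/about', '/post/1']
const tuesday = ['/', '/about', '/post/1', '/post/2']
console.log(diffCrawls(monday, tuesday)) // { added: [ '/post/2' ], removed: [] }
```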

items.close()

Itemize uses a keepalive HTTP/HTTPS agent. close() destroys the existing underlying socket and creates a new Agent with no existing connections.

Use it to clean up after Itemize instances that haven't completed their crawls.
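The close() behaviour described above can be sketched with Node's built-in agents. This is an illustration under assumptions, not Itemize's code: the `AgentHolder` class and its field names are hypothetical. It relies on `http.Agent` / `https.Agent` with `keepAlive: true`, which keep idle sockets open for reuse, and on `agent.destroy()`, which destroys every socket the agent currently holds.

```javascript
const http = require('http')
const https = require('https')

// Hypothetical sketch of close(): Itemize's real fields and internals may differ.
class AgentHolder {
  constructor () {
    this.createAgents()
  }

  createAgents () {
    // keepAlive lets later requests to the same host reuse open sockets.
    this.httpAgent = new http.Agent({ keepAlive: true })
    this.httpsAgent = new https.Agent({ keepAlive: true })
  }

  // Destroy every socket held by the current agents, then swap in fresh agents
  // so any later request opens a new connection.
  close () {
    this.httpAgent.destroy()
    this.httpsAgent.destroy()
    this.createAgents()
  }
}

const holder = new AgentHolder()
const first = holder.httpAgent
holder.close()
console.log(holder.httpAgent === first) // false: a fresh agent replaced the old one
```

In application code, calling items.close() in a finally block is one way to make sure that stopping a crawl early (after an error or a result limit) still releases its idle keepalive connections.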


Tests and Examples

$ yarn test
$ node --harmony examples/hackernews.js
$ node --harmony examples/nodes.js
