easy_web_crawler
Web crawler wrapper around the puppeteer module that simplifies crawling AJAX/JavaScript-enabled pages.
Last updated 2 years ago by vivek13186.
$ cnpm install easy_web_crawler 


A web crawler built on puppeteer for crawling AJAX/JavaScript-enabled pages. See the example folder for usage.

Features

  • Crawling of JavaScript/AJAX pages
  • URL filtering
  • Duplicate URL avoidance
  • Delay before page load
  • Custom data extraction
  • Built-in spider
  • Stop and resume crawling
  • Fast image download
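URL filtering, duplicate avoidance, and stop/resume are usually built around a crawl "frontier": a queue of pending URLs plus a set of URLs already seen. The code below is a minimal, hypothetical sketch of that idea in plain JavaScript. It is not easy_web_crawler's actual implementation. The class name CrawlFrontier and its methods are illustrative assumptions, and it treats URLs that differ only by a #fragment as duplicates.

```javascript
// Hypothetical sketch of a crawl frontier (NOT easy_web_crawler's internals).
// It combines a URL filter, duplicate avoidance, and serialisable state
// so a crawl could be stopped and resumed.
class CrawlFrontier {
  constructor(allowFn = () => true) {
    this.allowFn = allowFn;  // URL filter: return true to crawl the URL
    this.queue = [];         // URLs waiting to be visited, in FIFO order
    this.seen = new Set();   // every URL ever enqueued, for duplicate avoidance
  }

  // Normalise a URL so trivially different spellings count as one page.
  static normalize(url) {
    const u = new URL(url);  // throws on strings that are not absolute URLs
    u.hash = "";             // fragments do not change the fetched page
    return u.href;
  }

  // Enqueue a URL; returns false if it is invalid, filtered, or a duplicate.
  add(url) {
    let key;
    try {
      key = CrawlFrontier.normalize(url);
    } catch {
      return false;
    }
    if (this.seen.has(key) || !this.allowFn(key)) return false;
    this.seen.add(key);
    this.queue.push(key);
    return true;
  }

  // Next URL to visit, or undefined when the frontier is empty.
  next() {
    return this.queue.shift();
  }

  // Called by JSON.stringify: capture state so a later run can resume.
  toJSON() {
    return { queue: this.queue, seen: [...this.seen] };
  }

  // Rebuild a frontier from previously saved state.
  static fromJSON(state, allowFn) {
    const f = new CrawlFrontier(allowFn);
    f.queue = [...state.queue];
    f.seen = new Set(state.seen);
    return f;
  }
}
```

For example, with a filter that accepts only `https://example.com/` URLs, adding `https://example.com/a` succeeds. Adding `https://example.com/a#top` is rejected as a duplicate, and `https://other.org/` is rejected by the filter. Persisting `JSON.stringify(frontier)` to disk is one simple way to support stop-and-resume.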

Documentation

Read full documentation here

USAGE

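var Scraper = require("easy_web_crawler");

async function main() {
    var scraper = new Scraper();
    // Page where crawling begins (replace with your own start URL)
    scraper.startWithURLs("https://example.com/");
    // Return true to crawl a URL, false to skip it
    // (assumes `url` is passed as a string)
    scraper.allowIfMatches(function (url) {
        return url.startsWith("https://example.com/");
    });
    // Presumably turns on the built-in spider that follows discovered links
    scraper.enableAutoCrawler(true);
    // File used to save crawl progress (for stop and resume)
    scraper.saveProgressInFile("hello.db");
    // Delay between page loads; 0 means no delay
    scraper.waitBetweenPageLoad(0);
    scraper.callbackOnPageLoad(async function (page) {
        // Custom extraction logic goes here. Assumption: `page` is a
        // puppeteer Page, so methods such as page.title() are available.
        console.log(await page.title());
    });
    scraper.callbackOnFinish(function (result) {
        console.log(JSON.stringify(result, null, 4));
    });
    await scraper.start();
}

main().catch(console.error);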
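The usage example only hints at what the allowIfMatches predicate should contain. The following is a minimal sketch of a reusable same-site filter. It assumes that the predicate receives each candidate URL as a string and that returning true means "crawl it", as the original placeholder suggests. The helper name sameHostFilter and its skipExtensions parameter are hypothetical and not part of easy_web_crawler.

```javascript
// Hypothetical helper (not part of easy_web_crawler): builds a predicate that
// accepts only http(s) URLs on the same host as startUrl and skips URLs whose
// path ends with one of the given file extensions.
function sameHostFilter(startUrl, skipExtensions = [".pdf", ".zip"]) {
  const host = new URL(startUrl).host;  // host includes the port, if any
  return function (url) {
    let u;
    try {
      u = new URL(url);
    } catch {
      return false;  // not an absolute URL
    }
    if (u.protocol !== "http:" && u.protocol !== "https:") return false;
    if (u.host !== host) return false;  // different host or subdomain
    const path = u.pathname.toLowerCase();
    return !skipExtensions.some((ext) => path.endsWith(ext));
  };
}
```

Under the same assumptions, you would pass it as `scraper.allowIfMatches(sameHostFilter("https://example.com/"))`. Building the predicate from a factory keeps the start URL and the skip list in one place, which is convenient when several crawls share the same rules.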

License

MIT

Current Tags

  • 1.0.6: latest (2 years ago)

7 Versions

  • 1.0.6 (2 years ago)
  • 1.0.5 (2 years ago)
  • 1.0.4 (2 years ago)
  • 1.0.3 (2 years ago)
  • 1.0.2 (2 years ago)
  • 1.0.1 (2 years ago)
  • 1.0.0 (2 years ago)
Maintainers (1)
Dependencies (4)
Dev Dependencies (0)
None
Dependents (0)
None
