@mastixmc/sitemapper
Parser for XML Sitemaps to be used with Robots.txt and web crawlers. (Extended version by mastixmc)
Last updated a year ago by mastixmc.
License: MIT

Sitemapper - Extended version

This is a fork of https://github.com/cabbiepete/sitemapper that adds the following features:

  • Loads gzip-compressed sitemap.xml.gz files
  • Increases the default timeout
  • Filters URLs by their lastmod date
  • Filters all returned URLs with a regular expression (urlFilter)
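The feature list does not describe how .gz sitemaps are handled internally. As an illustration only, the sketch below shows one common way to accept both plain and gzip-compressed sitemap bodies in Node.js, using the built-in zlib module and the gzip magic bytes 0x1f 0x8b. decodeSitemapBody is a hypothetical helper, not the fork's actual code.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of the package): turn a raw response body
// (a Buffer) into sitemap XML text. Gzip streams always start with the bytes
// 0x1f 0x8b, so this works even when no Content-Encoding header is sent.
function decodeSitemapBody(body) {
  const isGzip = body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b;
  const xml = isGzip ? zlib.gunzipSync(body) : body;
  return xml.toString('utf8');
}

// Usage: a compressed and an uncompressed body decode to the same XML.
const xml = '<urlset><url><loc>https://example.com/</loc></url></urlset>';
const gz = zlib.gzipSync(Buffer.from(xml, 'utf8'));
console.log(decodeSitemapBody(gz) === xml); // true
console.log(decodeSitemapBody(Buffer.from(xml, 'utf8')) === xml); // true
```

Sniffing the magic bytes rather than trusting the file extension keeps the helper usable for servers that serve sitemap.xml.gz with a generic content type.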

Original description

Parse a sitemap's XML to get all the URLs for your crawler.

Version 3

Installation

npm install @mastixmc/sitemapper

Simple Example

const Sitemapper = require('@mastixmc/sitemapper');

const sitemap = new Sitemapper();

sitemap.fetch('http://wp.seantburke.com/sitemap.xml').then(function (data) {
  console.log(data.sites); // fetch resolves to an object of the form { url, sites }
});

Examples in ES5

var Sitemapper = require('@mastixmc/sitemapper');

var google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000, // 15 seconds
  lastmod: { // filter based on lastmod (here: only URLs updated in the last 5 days)
    duration: '5',
    measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
  },
  urlFilter: '^https://www\\.mysite\\.com/somepath/' // regular expression, passed as a string
});

google.fetch()
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });


// or


var sitemapper = new Sitemapper();

sitemapper.timeout = 5000;
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });
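The README does not spell out how the lastmod option is evaluated. A reasonable reading, assumed here and not confirmed by the source, is that a URL is kept only if its lastmod falls within the last duration units of measurement. The sketch below implements that reading with plain Date arithmetic in UTC; lastmodCutoff and filterByLastmod are hypothetical helpers, not part of the package, and the real library may treat entries without a lastmod differently.

```javascript
// Hypothetical helper: the earliest lastmod date that should still be kept.
// Month/year subtraction uses Date rollover, so e.g. the 31st may roll into the next month.
function lastmodCutoff({ duration, measurement }, now = new Date()) {
  const n = Number(duration); // the examples pass duration as a string, e.g. '5'
  const cutoff = new Date(now.getTime());
  switch (measurement) {
    case 'years':   cutoff.setUTCFullYear(cutoff.getUTCFullYear() - n); break;
    case 'months':  cutoff.setUTCMonth(cutoff.getUTCMonth() - n); break;
    case 'weeks':   cutoff.setUTCDate(cutoff.getUTCDate() - 7 * n); break;
    case 'days':    cutoff.setUTCDate(cutoff.getUTCDate() - n); break;
    case 'hours':   cutoff.setUTCHours(cutoff.getUTCHours() - n); break;
    case 'minutes': cutoff.setUTCMinutes(cutoff.getUTCMinutes() - n); break;
    case 'seconds': cutoff.setUTCSeconds(cutoff.getUTCSeconds() - n); break;
    default: throw new Error(`Unsupported measurement: ${measurement}`);
  }
  return cutoff;
}

// Hypothetical helper: keep entries whose lastmod is on or after the cutoff.
// Entries with a missing or unparseable lastmod are dropped in this sketch.
function filterByLastmod(entries, lastmod, now) {
  const cutoff = lastmodCutoff(lastmod, now).getTime();
  return entries.filter((entry) => {
    const t = Date.parse(entry.lastmod);
    return !Number.isNaN(t) && t >= cutoff;
  });
}

// Usage with a fixed "now" so the result is reproducible.
const now = new Date('2020-06-15T00:00:00Z');
const entries = [
  { loc: 'https://example.com/new', lastmod: '2020-06-12' },
  { loc: 'https://example.com/old', lastmod: '2020-06-01' },
];
const kept = filterByLastmod(entries, { duration: '5', measurement: 'days' }, now);
console.log(kept.map((entry) => entry.loc)); // [ 'https://example.com/new' ]
```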

Examples in ES6

import Sitemapper from '@mastixmc/sitemapper';

const google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000, // 15 seconds
  lastmod: { // filter based on lastmod (here: only URLs updated in the last 3 days)
    duration: '3',
    measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
  },
  urlFilter: '^https://www\\.mysite\\.com/somepath/' // regular expression, passed as a string
});

google.fetch()
  .then(data => console.log(data.sites))
  .catch(error => console.log(error));


// or


const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.lastmod = { // filter based on lastmod (here: only URLs updated in the last 14 days)
  duration: '14',
  measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
};
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
  .catch(error => console.log(error));
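The urlFilter option is given as a string and marked as a regular expression, which suggests it is compiled with new RegExp; that is an assumption, not something the README states. Under that assumption, escaping matters when writing the pattern: inside a JavaScript string literal a single backslash before "." or "/" is discarded, so '\.' reaches the regex as a bare ".", which matches any character. The sketch below compares the two spellings; applyUrlFilter is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper (not part of the package): keep only URLs matching a pattern string.
function applyUrlFilter(urls, pattern) {
  const re = new RegExp(pattern); // no "g" flag, so test() keeps no state between calls
  return urls.filter((url) => re.test(url));
}

const urls = [
  'https://www.mysite.com/somepath/page',
  'https://wwwXmysiteXcom/somepath/page', // bogus host; a wildcard "." matches each X
  'https://www.mysite.com/other/page',
];

// Single backslashes: the string parser drops them, so each "." is a wildcard.
const loose = '^https:\/\/www\.mysite\.com\/somepath\/';
// Doubled backslashes: "\." survives into the pattern and matches a literal dot.
const strict = '^https://www\\.mysite\\.com/somepath/';

console.log(applyUrlFilter(urls, loose).length);  // 2
console.log(applyUrlFilter(urls, strict).length); // 1
```

Forward slashes need no escaping in a pattern passed to new RegExp, so the strict form drops them and only doubles the backslashes before the dots.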

Current Tags

  • 3.2.0 (latest, a year ago)

3 Versions

  • 3.2.0 (a year ago)
  • 3.1.0 (a year ago)
  • 3.0.2 (a year ago)