sitemap-stream-parser
Get a list of URLs from one or more sitemaps
Last updated 4 years ago by evanderkoogh.
License: Apache-2.0
$ cnpm install sitemap-stream-parser 

node-sitemap-stream-parser

A streaming parser for sitemap files. It can handle gigabytes of deeply nested sitemaps with hundreds of URLs in them. Maximum memory usage is just over 100 MB at any time.
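
To illustrate why streaming keeps memory use flat, here is a minimal sketch of chunk-by-chunk extraction of `<loc>` values. It is not this package's implementation; `createLocExtractor` is a hypothetical helper written for this example, and it skips details a real XML parser handles (entity decoding, CDATA, namespaces).

```javascript
// Hypothetical helper, for illustration only: NOT the parser used by
// sitemap-stream-parser. It shows the streaming idea: emit each URL as soon
// as its <loc> element is complete, and keep only the unfinished tail in memory.
function createLocExtractor(onUrl) {
  var buffer = '';
  return function write(chunk) {
    buffer += chunk;
    var open, close;
    // Emit every complete <loc>...</loc> pair currently in the buffer.
    while ((open = buffer.indexOf('<loc>')) !== -1 &&
           (close = buffer.indexOf('</loc>', open)) !== -1) {
      onUrl(buffer.slice(open + 5, close).trim());
      buffer = buffer.slice(close + 6);
    }
    if (open === -1) {
      // No opening tag yet: keep 4 characters in case '<loc>' is split
      // across two chunks, and drop everything before them.
      buffer = buffer.slice(-4);
    } else {
      // An opening tag without its closing tag: keep it for the next chunk.
      buffer = buffer.slice(open);
    }
  };
}

// Usage: feed chunks as they arrive, for example from a response stream.
var found = [];
var write = createLocExtractor(function (url) { found.push(url); });
write('<urlset><url><lo');
write('c>http://example.com/a</l');
write('oc></url><url><loc>http://example.com/b</loc></url></urlset>');
console.log(found);
```

Because processed text is discarded after every chunk, the buffer never grows beyond one unfinished element, regardless of the size of the sitemap.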

Usage

The main way to extract the URLs for a site is the parseSitemaps(urls, url_cb, done) method. You can call it with either a single URL or an Array of URLs. The url_cb callback is called for every URL that is found. The done callback is passed an error and/or a list of all the sitemaps that were checked.

Examples:

var sitemaps = require('sitemap-stream-parser');

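// Print every URL as it is found, then report completion or failure.
sitemaps.parseSitemaps('http://example.com/sitemap.xml', console.log, function(err, sitemaps) {
    if (err) return console.error(err);
    console.log('All done!');
});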

or

var sitemaps = require('sitemap-stream-parser');

var urls = ['http://example.com/sitemap-posts.xml', 'http://example.com/sitemap-pages.xml'];

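// Declare the array with var so it does not leak as an implicit global.
var all_urls = [];
sitemaps.parseSitemaps(urls, function(url) { all_urls.push(url); }, function(err, sitemaps) {
    if (err) return console.error(err);
    console.log(all_urls);
    console.log('All done!');
});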
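
If the same URL can appear in more than one sitemap, the collection step above can be wrapped in a small helper that removes duplicates. This is a sketch, not part of the package: `collectUrls` is a hypothetical name, and it assumes only the parseSitemaps(urls, url_cb, done) signature described above. The parser is passed in as an argument so the helper can be tried with a stand-in object.

```javascript
// Hypothetical helper (not part of sitemap-stream-parser): collects unique
// URLs and hands them to `done` together with the list of checked sitemaps.
function collectUrls(parser, sitemapUrls, done) {
  var seen = new Set();
  parser.parseSitemaps(sitemapUrls, function (url) {
    seen.add(url); // a Set silently ignores repeated URLs
  }, function (err, checkedSitemaps) {
    if (err) return done(err);
    done(null, Array.from(seen), checkedSitemaps);
  });
}

// Stand-in parser used only to demonstrate the helper without network access.
var fakeParser = {
  parseSitemaps: function (urls, urlCb, done) {
    urlCb('http://example.com/a');
    urlCb('http://example.com/b');
    urlCb('http://example.com/a');
    done(null, [].concat(urls));
  }
};

var collected;
collectUrls(fakeParser, 'http://example.com/sitemap.xml', function (err, urls, checked) {
  collected = { err: err, urls: urls, checked: checked };
});
console.log(collected.urls);
```

With the real module, the first argument would be the object returned by require('sitemap-stream-parser').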

Sometimes sites advertise their sitemaps in their robots.txt file. To check whether that is the case, pass the URL of the robots.txt file to sitemapsInRobots(url, cb). The two methods are easy to combine:

var sitemaps = require('sitemap-stream-parser');

sitemaps.sitemapsInRobots('http://example.com/robots.txt', function(err, urls) {
    if (err || !urls || urls.length === 0)
        return;
    sitemaps.parseSitemaps(urls, console.log, function(err, sitemaps) {
        console.log(sitemaps);
    });
});
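
For reference, sites advertise sitemaps in robots.txt with lines of the form `Sitemap: <url>`. The sketch below shows how such lines can be picked out of the file's text. It illustrates the file format only and is not how sitemapsInRobots is implemented; `extractSitemapUrls` is a hypothetical name.

```javascript
// Hypothetical helper, for illustration: returns the URLs named by
// "Sitemap:" lines in the text of a robots.txt file.
function extractSitemapUrls(robotsTxt) {
  var urls = [];
  robotsTxt.split(/\r?\n/).forEach(function (line) {
    // The field name is matched case-insensitively; commented-out lines
    // start with '#' and therefore do not match.
    var match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) urls.push(match[1]);
  });
  return urls;
}

var robots = [
  'User-agent: *',
  'Disallow: /private',
  'Sitemap: http://example.com/sitemap.xml',
  'sitemap: http://example.com/news.xml',
  '# Sitemap: http://example.com/ignored.xml'
].join('\n');

console.log(extractSitemapUrls(robots));
```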

Current Tags

  • 1.7.0 (latest, a year ago)

14 Versions

  • 1.7.0 (a year ago)
  • 1.6.0 (2 years ago)
  • 1.5.1 (2 years ago)
  • 1.5.0 (2 years ago)
  • 1.4.0 (2 years ago)
  • 1.3.0 (2 years ago)
  • 1.2.2 (4 years ago)
  • 1.2.1 (4 years ago)
  • 1.2.0 (4 years ago)
  • 1.1.3 (4 years ago)
  • 1.1.2 (4 years ago)
  • 1.1.1 (4 years ago)
  • 1.1.0 (4 years ago)
  • 1.0.0 (4 years ago)