hapi-goldwasher
A plugin for Hapi.js to run goldwasher as a scraping API on the web.
Last updated 5 years ago by alexlangberg.
License: MIT

A plugin for hapi to run goldwasher as a scraping API on the web. It acts as a scraper proxy: it fetches the requested page and returns the scraped information in the selected format, defaulting to JSON.

Installation

npm install hapi-goldwasher

If you aren't already running a hapi server, you also need to install hapi to run the example:

npm install hapi

Options

When registering the plugin with hapi, you can pass several options, none of them required:

  • path - the endpoint you mount the plugin on. Defaults to /goldwasher.
  • maxRedirects - the maximum number of redirects the scraper will accept before giving up. Defaults to 5.
  • cors - a CORS object. Defaults to false. See hapi docs for more information.
  • raw - enable raw output mode. When enabled, requests may use output=raw, which returns the raw scraped result, usually HTML.
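
To make the defaults above concrete, here is a minimal sketch of how the effective settings could be worked out. The helper effectiveOptions and the object documentedDefaults are hypothetical names used only for illustration and are not part of hapi-goldwasher. The default for raw is not stated in the list above, so false is an assumption.

```javascript
// Defaults as documented in the Options list.
// The default for `raw` is not documented there; false is an assumption.
var documentedDefaults = {
  path: '/goldwasher',
  maxRedirects: 5,
  cors: false,
  raw: false // assumption
};

// Hypothetical helper (not part of hapi-goldwasher): overlays the options
// you pass on top of the documented defaults to show the effective settings.
function effectiveOptions(userOptions) {
  return Object.assign({}, documentedDefaults, userOptions || {});
}

// Only `raw` and `cors` are overridden; `path` and `maxRedirects` keep
// their documented defaults.
var options = effectiveOptions({ raw: true, cors: { origin: ['*'] } });
console.log(options);
```

The resulting object has the same shape as the options value passed when registering the plugin.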

Parameters

  • url - the URL to scrape. Required.
  • selector - a cheerio (jQuery-style) selector that picks the target tags. Defaults to goldwasher's own default, usually 'h1, h2, h3, h4, h5, h6, p'.
  • search - only return results containing these terms. Matching ignores case and special characters.
  • limit - limit the number of results.
  • output - output format (json, xml, atom, rss or, if enabled, raw).
  • filterTexts - stop texts that should be excluded.
  • filterKeywords - stop words that should be excluded as keywords.
  • filterLocale - stop words from an external JSON file (see the goldwasher documentation).
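
As a rough illustration of how these parameters might be combined into one request, the sketch below builds a request URL. It assumes the parameters are sent as query-string parameters to the plugin's path, which the list above does not state explicitly. The helper buildGoldwasherUrl, the host and port, and the example.com target are all hypothetical.

```javascript
// Hypothetical helper (not part of hapi-goldwasher): builds a request URL
// for a hapi-goldwasher endpoint.
// Assumption: parameters are passed in the query string.
function buildGoldwasherUrl(base, params) {
  var endpoint = new URL(base);
  Object.keys(params).forEach(function (key) {
    // searchParams.set takes care of percent-encoding each value.
    endpoint.searchParams.set(key, String(params[key]));
  });
  return endpoint.toString();
}

var requestUrl = buildGoldwasherUrl('http://localhost:7979/goldwasher', {
  url: 'https://example.com', // page to scrape (required parameter)
  selector: 'h1, h2',         // only headings of level 1 and 2
  output: 'xml',              // instead of the default JSON
  limit: 5                    // at most five results
});

console.log(requestUrl);
```

Building the URL through searchParams rather than string concatenation keeps the nested target URL and the selector correctly encoded.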

Example

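// Uses the callback-style hapi API from before hapi v17
// (server.connection() and callback-based register/start).
var Hapi = require('hapi');
// Inside the hapi-goldwasher repository itself, use require('./index') instead.
var HapiGoldwasher = require('hapi-goldwasher');

var server = new Hapi.Server();
server.connection({ port: 7979 });

// Mount the plugin on /goldwasher and allow cross-origin requests from any origin.
server.register({
  register: HapiGoldwasher,
  options: {
    path: '/goldwasher',
    cors: {
      origin: ['*']
    }
  }
}, function(err) {
  if (err) {
    throw err;
  }

  server.start(function(err) {
    if (err) {
      throw err;
    }
    console.log('Server running at: ' + server.info.uri);
  });
});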

Go to the server URI and you will be presented with a JSON response containing documentation. I recommend using something like the Chrome JSON Formatter extension to make the response easier to read.

Current Tags

  • 1.0.4 - latest (5 years ago)

5 Versions

  • 1.0.4 - 5 years ago
  • 1.0.3 - 5 years ago
  • 1.0.2 - 5 years ago
  • 1.0.1 - 5 years ago
  • 1.0.0 - 5 years ago