article-extractor
Extract metadata and content from web articles.
Last updated 5 years ago by thomastuts.
License: MIT
$ cnpm install article-extractor 

A Node.js module to retrieve article content and metadata from a URL.

This module is under heavy development! Its quality and API are likely to change substantially, so keep an eye out for changes.

To see which features are coming up next, or to suggest one yourself, see https://github.com/thomastuts/article-extractor/issues/3

Demo

You can see article-extractor in action here:

GET http://article-extractor.thomastuts.com/parse?url=AN_ARTICLE_URL
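Since AN_ARTICLE_URL is itself a URL placed inside a query string, it is safest to percent-encode it before sending the request. The sketch below only builds the request URL; buildDemoUrl and DEMO_ENDPOINT are hypothetical names used for illustration, and whether the demo service requires encoding is an assumption.

```javascript
// Hypothetical helper (not part of article-extractor): builds the demo
// request URL shown above for a given article URL.
var DEMO_ENDPOINT = 'http://article-extractor.thomastuts.com/parse';

function buildDemoUrl(articleUrl) {
  // encodeURIComponent escapes ':', '/', '?' and '&' so the article URL
  // is carried as a single query-string value.
  return DEMO_ENDPOINT + '?url=' + encodeURIComponent(articleUrl);
}

console.log(buildDemoUrl('http://paulgraham.com/altair.html'));
// prints http://article-extractor.thomastuts.com/parse?url=http%3A%2F%2Fpaulgraham.com%2Faltair.html
```

The resulting string can be passed to any HTTP client or pasted into a browser.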

Installation

npm install --save article-extractor

Extracting data

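var extractor = require('article-extractor');

// Pass the article URL and a Node-style callback (error first, result second).
extractor.extractData('http://paulgraham.com/altair.html', function (err, data) {
  if (err) {
    // Report the failure and stop; do not touch data in this branch.
    console.error(err);
    return;
  }
  console.log(data);
});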
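If you prefer promises to callbacks, the call above can be wrapped. This is a minimal sketch, not part of the module: extractDataAsync is a hypothetical helper, and it assumes only what the example above shows, namely that extractData takes a URL and an (err, data) callback.

```javascript
// Hypothetical wrapper: turns the callback-style extractData(url, cb)
// into a Promise. The extractor is passed in as an argument so the
// helper works with any object exposing that method.
function extractDataAsync(extractor, url) {
  return new Promise(function (resolve, reject) {
    extractor.extractData(url, function (err, data) {
      if (err) {
        reject(err);
        return;
      }
      resolve(data);
    });
  });
}

// Usage, assuming article-extractor is installed:
// var extractor = require('article-extractor');
// extractDataAsync(extractor, 'http://paulgraham.com/altair.html')
//   .then(function (data) { console.log(data.title); })
//   .catch(function (err) { console.error('Extraction failed:', err); });
```

Passing the extractor in, rather than requiring it inside the helper, also makes it easy to substitute a stub in tests.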

Extract result

The data passed to the callback is an object shaped like this (example values):

{
    "domain": "thomastuts.com",
    "author": "Thomas Tuts",
    "title": "Article Extractor Demo",
    "summary": "A Node.js module to retrieve article content and metadata from a URL.",
    "content": "<p>This is the article content.</p>"
}
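As an illustration of working with these fields, the sketch below turns a result object into a one-line description. Both toPlainText and describeArticle are hypothetical helpers, not part of the module. The tag stripping is a deliberately naive regular expression that does not decode HTML entities, so treat it as a rough word count rather than a real HTML parser.

```javascript
// Hypothetical helper: crude HTML-to-text conversion. It replaces tags
// with spaces and collapses whitespace, and does not decode entities.
function toPlainText(html) {
  return html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: summarizes a result object with the fields shown
// above (domain, author, title, content) in a single line.
function describeArticle(data) {
  var text = toPlainText(data.content || '');
  var words = text ? text.split(' ').length : 0;
  return (data.title || 'Untitled') + ' by ' + (data.author || 'unknown') +
    ' (' + data.domain + ', ' + words + ' words)';
}

var sample = {
  domain: 'thomastuts.com',
  author: 'Thomas Tuts',
  title: 'Article Extractor Demo',
  summary: 'A Node.js module to retrieve article content and metadata from a URL.',
  content: '<p>This is the article content.</p>'
};

console.log(describeArticle(sample));
// prints Article Extractor Demo by Thomas Tuts (thomastuts.com, 5 words)
```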

Current Tags

  • 1.0.2 (latest, 5 years ago)

2 Versions

  • 1.0.2 (5 years ago)
  • 1.0.1 (5 years ago)