context-url-extractor
Methods for extracting URLs from HTML or text strings with surrounding context
Last updated a year ago by njhoran.

When data mining content that contains URLs, it's far easier for a machine to categorise them if they are semantic (or friendly) URLs:

Bad URL

Good URL

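This package provides a starting point for that kind of data mining. For each URL found in the supplied content, it returns the text immediately before and after the URL, which can then be used to categorise URLs that are not semantic.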
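The core idea can be sketched in plain JavaScript. This is a minimal, hypothetical illustration and not the package's own implementation. The extractUrlsWithContext function and its regular expression are assumptions, and the pattern only recognises http and https URLs that end at whitespace, a quote, or an angle bracket. The sketch mirrors the 170-character default windows and the url / contextPre / contextPost shape shown in the example response.

```javascript
// Hypothetical sketch; not the context-url-extractor implementation.
// Matches http(s) URLs up to the first whitespace, quote, or angle bracket.
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

function extractUrlsWithContext(content, before = 170, after = 170) {
  const results = [];
  // matchAll works on a copy of the global regex, so URL_PATTERN keeps no state.
  for (const match of content.matchAll(URL_PATTERN)) {
    const start = match.index;
    const end = start + match[0].length;
    results.push({
      url: match[0],
      // Up to `before` characters immediately preceding the URL.
      contextPre: content.slice(Math.max(0, start - before), start),
      // Up to `after` characters immediately following the URL
      // (slice stops at the end of the string on its own).
      contextPost: content.slice(end, end + after),
    });
  }
  return results;
}

const sample = 'To log in please <a href="https://example.com/profile?id=7">click here</a> today.';
console.log(extractUrlsWithContext(sample, 10, 12));
```

A pattern this simple also captures trailing punctuation (the full stop in "see https://example.com."), and it does not decode HTML entities such as &amp; inside href values, so treat it as a starting point only.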


Install

npm install --save context-url-extractor

Usage

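const ContextUrlExtractor = require('context-url-extractor'); // assumed: the class is the main export
const content = 'To log in please <a href="https://example.com/profile">click here</a>.'; // HTML or plain text
const extractor = new ContextUrlExtractor({ content });
const res = extractor.extractUrls(); // array of { url, contextPre, contextPost } objects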

Custom Context Lengths

By default, contextPre and contextPost each contain up to 170 characters of surrounding text. Either length can be overridden with the contextCharsBefore and contextCharsAfter constructor options:

const extractor = new ContextUrlExtractor({ content, contextCharsBefore: 80, contextCharsAfter: 80 });
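Because both windows are measured in raw characters, their outer edges can fall in the middle of a word; the example response below, for instance, begins its contextPre with "nd.". The trimToWordBoundaries helper here is a hypothetical post-processing step, not part of context-url-extractor. It assumes the caller passes the same lengths given to the constructor, and it only trims a window that reached that full length, since a shorter window already stops at the start or end of the content.

```javascript
// Hypothetical post-processing helper; not part of context-url-extractor.
// charsBefore / charsAfter should match the lengths passed to the constructor
// (170 each by default).
function trimToWordBoundaries(entry, charsBefore = 170, charsAfter = 170) {
  let { contextPre, contextPost } = entry;
  // Only a full-length window can have been cut mid-word; a shorter one
  // already reaches the start or end of the content.
  if (contextPre.length >= charsBefore) {
    // Drop the leading run of non-whitespace (the partial word) and the whitespace after it.
    contextPre = contextPre.replace(/^\S*\s+/, '');
  }
  if (contextPost.length >= charsAfter) {
    // Drop the trailing run of non-whitespace (if any) and the whitespace before it.
    contextPost = contextPost.replace(/\s+\S*$/, '');
  }
  return { ...entry, contextPre, contextPost };
}

const entry = {
  url: 'https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935',
  contextPre: 'nd. To log in to your profile please <a href="',
  contextPost: '">click here</a> and sign in with your email ',
};

// Treat both sample windows as if they were filled to capacity.
console.log(trimToWordBoundaries(entry, entry.contextPre.length, entry.contextPost.length));
```

If a window edge happens to land exactly between two words, this approach drops one whole word, which is usually an acceptable trade-off when the context is only used for mining.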

Example Response

[
	{
		"url": "https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935",
		"contextPre": "nd. To log in to your profile please <a href=\"",
		"contextPost": "\">click here</a> and sign in with your email "
	}
]
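Once the contexts are available, even simple keyword matching can assign a category to a URL whose path and query string carry no meaning on their own, such as the profile.aspx URL above. The categorise function, the category names, and the keyword lists below are hypothetical examples, not part of the package; the sample data is the example response itself.

```javascript
// Hypothetical categoriser; not part of context-url-extractor.
// The category names and keyword lists are illustrative assumptions.
const CATEGORY_KEYWORDS = {
  login: ['log in', 'sign in', 'password'],
  unsubscribe: ['unsubscribe', 'opt out'],
};

function categorise({ url, contextPre, contextPost }) {
  // Search both sides of the URL; lower-case so matching is case-insensitive.
  const context = `${contextPre} ${contextPost}`.toLowerCase();
  // Categories are checked in insertion order; the first match wins.
  for (const [category, keywords] of Object.entries(CATEGORY_KEYWORDS)) {
    if (keywords.some((kw) => context.includes(kw))) {
      return { url, category };
    }
  }
  return { url, category: 'unknown' };
}

const response = [
  {
    url: 'https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935',
    contextPre: 'nd. To log in to your profile please <a href="',
    contextPost: '">click here</a> and sign in with your email ',
  },
];

console.log(response.map(categorise));
```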

Maintainers

@njhoran

Contributing

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT © 2019 njhoran
