scraperscript
ScraperScript is a query language for web scraping.
Last updated a year ago by tiagodanin .

ScraperScript



Installation

The module is available through the npm registry and can be installed globally with the npm or Yarn command-line tools.

# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript

Documentation

Run the scraperscript command with either a script file (myfile) or server as its argument.

An example file:

@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string

This returns JSON:

{
	"error": false,
	"errorsMsg": [],
	"names": [
		{
			"number": 0,
			"text": "Tiago"
		},
		{
			"number": 0,
			"text": "James"
		}
	],
	"hasTitle": true,
	"title": "my string"
}
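
A caller would typically check the error flag before reading the other keys. The following is a minimal Python sketch, assuming the output arrives as a single JSON object with the fields shown above. The function name read_result is hypothetical, and the meaning given to "error" and "errorsMsg" is inferred from their names, not confirmed by the package.

```python
import json

# The example output from above as one JSON object.
RAW = """
{
  "error": false,
  "errorsMsg": [],
  "names": [
    {"number": 0, "text": "Tiago"},
    {"number": 0, "text": "James"}
  ],
  "hasTitle": true,
  "title": "my string"
}
"""


def read_result(raw: str) -> dict:
    """Parse the output and return only the user-defined keys.

    Assumption: "error" signals failure and "errorsMsg" lists the reasons.
    """
    data = json.loads(raw)
    if data.get("error"):
        raise RuntimeError("; ".join(str(m) for m in data.get("errorsMsg", [])))
    return {k: v for k, v in data.items() if k not in ("error", "errorsMsg")}


result = read_result(RAW)
print([item["text"] for item in result["names"]])
print(result["title"])
```

Separating the two status keys from the rest keeps the keys defined in the script file (names, hasTitle, title) apart from the bookkeeping fields.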

Syntax

Place the URL on the first line, prefixed with @: @http://myurl.com

Every other line follows the format: - key: query :type

Note: spaces are significant.
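
To make the line format concrete, here is a rough Python sketch that splits a script into its URL and its key/query/type triples. It is not the package's own parser: the names FIELD, Field and parse_script are hypothetical, and the regular expression simply encodes the single spaces shown in the format above.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# "- key: query :type", with the single spaces shown in the format line.
FIELD = re.compile(r"^- (?P<key>\w+): (?P<query>.+) :(?P<type>\w+)$")


class Field(NamedTuple):
    key: str
    query: str
    type: str


def parse_script(text: str) -> Tuple[Optional[str], List[Field]]:
    """Split a script into its URL and its field definitions."""
    url = None
    fields = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("!!"):
            continue  # blank line or comment
        if line.startswith("@") and url is None and not fields:
            url = line[1:]  # the URL line comes before any field
            continue
        match = FIELD.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        fields.append(Field(**match.groupdict()))
    return url, fields


SCRIPT = """@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string
"""

url, fields = parse_script(SCRIPT)
for field in fields:
    print(field.key, "|", field.query, "|", field.type)
```

Because the query part is matched greedily, the type is always taken from the last " :" on the line, so the query itself may contain quotes and operators.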

Key

The name of the field in the result.

Rules:

  • Use at the beginning of the line
  • Format - key:

Example: - name:

Type

The return type of the field.

Rules:

  • Use at the end of the line
  • Format :type

Types:

  • array
  • object
  • boolean
  • string
  • number

Example: :string
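
As an illustration of how the five types line up with decoded JSON values, the Python sketch below checks a value against a declared type. The mapping is an assumption based on the type names and the example output; PYTHON_TYPES and matches_type are hypothetical names, not part of the package.

```python
# Assumed correspondence between ScraperScript types and the Python values
# produced by json.loads; the README does not spell this out.
PYTHON_TYPES = {
    "array": list,
    "object": dict,
    "boolean": bool,
    "string": str,
    "number": (int, float),
}


def matches_type(value, declared: str) -> bool:
    """Return True if a decoded JSON value fits the declared :type."""
    if declared not in PYTHON_TYPES:
        raise ValueError(f"unknown type: {declared!r}")
    # bool is a subclass of int in Python, so exclude it from "number".
    if declared == "number" and isinstance(value, bool):
        return False
    return isinstance(value, PYTHON_TYPES[declared])


print(matches_type([{"number": 0, "text": "Tiago"}], "array"))
print(matches_type("my string", "string"))
print(matches_type(True, "number"))
```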

Query

String

" my string "

NOTE: "my string" is invalid; a string needs a space after the opening quote and before the closing quote.

Comment

!! my comment in ScraperScript

Elements

nameOfHtmlElementOne >> nameOfHtmlElementTwo
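
One way to read an element path is as a list of element names walked from left to right. The Python sketch below splits such a path and, as a loudly stated assumption, joins it into a CSS descendant selector of the kind a library such as cheerio accepts. Whether ScraperScript treats >> as a descendant or a direct-child step is not documented here, and split_path and to_css_selector are hypothetical helpers.

```python
from typing import List


def split_path(query: str) -> List[str]:
    """Split 'a >> b >> c' into ['a', 'b', 'c'].

    The separator is matched together with its surrounding spaces,
    since the syntax section notes that spacing matters.
    """
    parts = query.split(" >> ")
    if any(not part.strip() for part in parts):
        raise ValueError(f"empty element name in: {query!r}")
    return [part.strip() for part in parts]


def to_css_selector(query: str) -> str:
    """Assumption: each '>>' step selects a descendant, as a CSS space does."""
    return " ".join(split_path(query))


print(split_path("html >> body >> div >> h2"))
print(to_css_selector("html >> head >> title"))
```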

Map elements [String]

nameOfHtmlElementOne @> nameOfSubHtmlElement

Map elements [Array]

nameOfHtmlElementOne @> [nameOfSubHtmlElement]

Map elements [Object]

nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}

Addition

nameOfHtmlElementOne ++ nameOfHtmlElementTwo

Replace

nameOfHtmlElementOne -- nameOfHtmlElementTwo

Equality and inequality comparison

nameOfHtmlElementOne == nameOfHtmlElementTwo

nameOfHtmlElementOne ~= nameOfHtmlElementTwo

OR

nameOfHtmlElementOne || nameOfHtmlElementTwo

Tests

To run the test suite, first install the dependencies, then run the tests:

# NPM
npm install
npm test
# Or Using Yarn
yarn install
yarn test

Dependencies

  • axios: Promise based HTTP client for the browser and node.js
  • cheerio: Tiny, fast, and elegant implementation of core jQuery designed specifically for the server

Dev Dependencies

  • body-parser: Node.js body parsing middleware
  • express: Fast, unopinionated, minimalist web framework
  • mocha: simple, flexible, fun test framework
  • xo: JavaScript happiness style linter ❤️

Contributors

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue. List of all contributors.

License

MIT © Tiago Danin

Current Tags

  • 1.0.2 (latest, a year ago)

3 Versions

  • 1.0.2 (a year ago)
  • 1.0.1 (2 years ago)
  • 0.0.1 (2 years ago)