@ascari/reco
Generic text classifier for Mexican electronic invoices.
Last updated 3 years ago by ascari.
MIT · Original npm · Tarball · package.json

reco

A text recognition engine that classifies concept descriptions found in electronic invoices used in Mexico.

Installation

Command Line Utility

You must install reco globally to use the command line interface.

npm i @ascari/reco -g

Module

npm i @ascari/reco --save

Usage

Command Line Utility

Create reco.json
reco init

A ./reco.json file will be created with default values. By default it uses a sqlite3 database.

You may edit the configuration now.

Scaffold a new project

Requires a valid ./reco.json file to be present.

reco create

A database will be scaffolded in the current directory.

By default it creates a ./database folder where a sqlite3 database file will be stored.

You may edit the first migration, ./database/migrations/0.js, to better accommodate your database structure. Keep in mind that the autogenerated tables and columns are required.


NOTE The following commands can only be called after creating a project.


Add a single invoice

Loads an XML invoice, parses it, and stores the unique suppliers, clients and concepts found.

reco xml path/to/invoice.xml

Note: Ideally, only valid SAT invoices should be fed to reco. However, reco does not verify an invoice's integrity, so non-compliant XML invoices can be loaded as well, as long as they follow a similar structure:

<?xml version="1.0" encoding="utf-8"?>
<Comprobante fecha="{{INVOICE_DATE}}" sello="{{SELLO_DIGITAL}}">
  <Emisor rfc="{{CLIENT_RFC}}" name="{{CLIENT_NAME}}" />
  <Receptor rfc="{{SUPPLIER_RFC}}" name="{{SUPPLIER_NAME}}" />
  <Conceptos>
    <Concepto descripcion="{{CONCEPT_DESCRIPTION_A}}" />
    <Concepto descripcion="{{CONCEPT_DESCRIPTION_B}}" />
  </Conceptos>
</Comprobante>

Add all invoices in a folder

Load and store all invoice files found in a folder.

reco xmls path/to/invoices

Currently, reco cannot read folders recursively.
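Since nested folders are not read, one possible workaround is to walk the tree yourself and hand each file to the module API. The following is a minimal sketch, not part of reco: collectXmlFiles and addInvoicesRecursively are hypothetical helper names, and it assumes that addXmlInvoice (documented under API) takes the XML text rather than a file path.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every *.xml file below `dir` (case-insensitive extension).
function collectXmlFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...collectXmlFiles(fullPath));
    } else if (entry.isFile() && path.extname(entry.name).toLowerCase() === '.xml') {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// Feed each file to a reco-like object, one at a time.
// Assumption: `reco.addXmlInvoice` receives the XML text and returns a Promise.
async function addInvoicesRecursively(reco, dir) {
  const files = collectXmlFiles(dir);
  for (const file of files) {
    await reco.addXmlInvoice(fs.readFileSync(file, 'utf8'));
  }
  return files.length;
}
```

Files are added sequentially here to keep database writes simple; whether reco tolerates concurrent inserts is not stated in this document.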

Label a concept for training

Creates a label that is used to train a classifier.

reco label "LABEL" "CONCEPT"

You may use the --rfc option to scope a label to a supplier.

reco label "LABEL" "CONCEPT" --rfc XXX0123456X7

Scoping a label improves recognition accuracy. The improvement comes from giving a higher weight to classifications that belong to a supplier when the recognition test is also scoped to that supplier. The reasoning is that a supplier generally has its own set of concepts for its products and/or services, so these are more likely to match a label scoped to the same supplier.

In other words, when classifying a concept from a Pizza supplier, labels scoped to that supplier will better identify concepts containing the word "pizza", since its products have "pizza" in their names. Without scoping, a Toy supplier with toy pizza games may rank higher.

One more time: when we know an invoice concept comes from a certain supplier, it is better to test it against a classifier that has been trained only on concepts and labels from that same supplier.
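The effect of scoping can be illustrated with a toy ranking function. This is only a sketch of the idea: reco's real scoring is not documented here, and rankCandidates, the boost factor, the scores and the RFC values are all made up for the example.

```javascript
// Hypothetical illustration only; this is not reco's scoring code.
// Each candidate is { label, score, rfc }, where rfc is null for unscoped labels.
function rankCandidates(candidates, supplierRfc = null, boost = 2) {
  return candidates
    .map((c) => ({
      ...c,
      // Weight a candidate higher when the test is scoped to its own supplier.
      weighted: supplierRfc !== null && c.rfc === supplierRfc ? c.score * boost : c.score,
    }))
    .sort((a, b) => b.weighted - a.weighted);
}

// Made-up candidates for the concept "PIZZA PARTY".
const candidates = [
  { label: 'toys', score: 0.6, rfc: 'TOY0123456X7' },
  { label: 'food', score: 0.5, rfc: 'PIZ0123456X7' },
];

console.log(rankCandidates(candidates)[0].label);                 // toys (unscoped)
console.log(rankCandidates(candidates, 'PIZ0123456X7')[0].label); // food (scoped to the pizza supplier)
```

Without a scope, the toy supplier's label wins on raw score; with the test scoped to the pizza supplier, the matching candidate is boosted past it.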

Add all labels found in a file

Add labels found in a list.

reco labels path/to/labels.lst

Entries are separated by new lines; within each entry, the label and the concept are separated by a colon (:).

Example:

apple:I WANT AN APPLE
orange:ORANGE YOU GLAD?
lemon:EAT SOME LEMON PIE

You may use the -v option to see progress.

You may use the --rfc option to scope labels to a supplier.

You may use the --delim option to specify a different delimiter.

You may use the --no-delim option to specify that the list is not delimited; that is, each line does not have a separate label and concept, and instead the label and the concept are the same.

This is useful for adding a supplier's catalog when identifying their concepts.
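To make the list format concrete, here is a minimal sketch of a parser for it. It is not reco's own code: parseLabelList is a hypothetical name, its delim and noDelim options merely mirror the --delim and --no-delim flags, and splitting each line at the first delimiter is an assumption.

```javascript
// Parse a label list into { label, concept } pairs.
// Assumption: a line is split at the FIRST delimiter; reco's own parsing may differ.
function parseLabelList(text, { delim = ':', noDelim = false } = {}) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      if (noDelim) {
        // Undelimited list: the label and the concept are the same text.
        return { label: line, concept: line };
      }
      const at = line.indexOf(delim);
      if (at === -1) {
        throw new Error(`Missing delimiter "${delim}" in line: ${line}`);
      }
      return { label: line.slice(0, at), concept: line.slice(at + delim.length) };
    });
}

const pairs = parseLabelList('apple:I WANT AN APPLE\norange:ORANGE YOU GLAD?');
console.log(pairs[0]); // { label: 'apple', concept: 'I WANT AN APPLE' }
```

Each resulting pair could then be passed to addLabel (see API), optionally with a supplier RFC.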

Train classifiers

Train classifiers.

Be patient, it may take a while.

reco train

Test an arbitrary concept

Test recognition by classifying a specified concept. Returns the label with the best score.

reco test "CONCEPT"

You may use the --rfc option to scope the test to a supplier.

You may use the -v option to return classification information. 10 rows are returned by default; pass a number (for example, -v 20) to choose how many classification rows are returned, ordered from best match to worst.

Module

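// The package is installed as @ascari/reco, so it is required under that name
const Reco = require('@ascari/reco');

// Reco configuration (see the options listed under API)
const recoConfig = { /* ... */ };

// instantiate
const reco = new Reco(recoConfig);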

API

Reco::constructor(recoConfig);

Where recoConfig can have the following options:

Note: the database property is passed to knex.

{
  database: {
    client: 'sqlite3',
    connection: {
      filename: './database/database.sqlite'
    },
    migrations: {
      tableName: 'migrations',
      directory: './database/migrations',
      stub: './database/stub.migration.js'
    },
    seeds: {
      directory: './database/seeds',
    },
    useNullAsDefault: true,
  },
}

Promise Reco::addLabel(String label, String concept, String [supplierRfc=null]);

Add a label.

Promise Reco::addXmlInvoice(String xml);

Add an invoice.
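For experiments, an invoice string can be generated in code instead of read from disk. The sketch below follows the simplified structure shown under "Add a single invoice"; buildInvoiceXml and escapeAttr are hypothetical helpers, and the result is not a SAT-compliant invoice.

```javascript
// Escape the characters that are not allowed inside a double-quoted XML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an invoice string with the same elements and attributes as the
// structure shown under "Add a single invoice".
function buildInvoiceXml({ fecha, sello, emisor, receptor, conceptos }) {
  const conceptLines = conceptos.map(
    (descripcion) => `    <Concepto descripcion="${escapeAttr(descripcion)}" />`
  );
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    `<Comprobante fecha="${escapeAttr(fecha)}" sello="${escapeAttr(sello)}">`,
    `  <Emisor rfc="${escapeAttr(emisor.rfc)}" name="${escapeAttr(emisor.name)}" />`,
    `  <Receptor rfc="${escapeAttr(receptor.rfc)}" name="${escapeAttr(receptor.name)}" />`,
    '  <Conceptos>',
    ...conceptLines,
    '  </Conceptos>',
    '</Comprobante>',
  ].join('\n');
}

// All values below are made-up placeholders.
const xml = buildInvoiceXml({
  fecha: '2017-01-31T12:00:00',
  sello: 'PLACEHOLDER',
  emisor: { rfc: 'AAA0123456A7', name: 'Emisor de ejemplo' },
  receptor: { rfc: 'BBB0123456B7', name: 'Receptor de ejemplo' },
  conceptos: ['PIZZA GRANDE', 'REFRESCO & HIELO'],
});
console.log(xml);
```

The resulting string could then be passed to addXmlInvoice on a configured instance.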

Promise Reco::train();

Train classifiers.

Promise Reco::test(String input, String [supplierRfc=null]);

Test classifiers.
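Put together, the documented methods would typically be called in the order addLabel, train, test. The sketch below shows that flow against any object exposing those methods. labelTrainAndTest is a hypothetical helper, and FakeReco is an in-memory stand-in (not reco) so the example runs without a database. The shape of the value that the real test() resolves with is not documented here, so the stand-in's return value is an assumption.

```javascript
// Runs the documented calls in order: addLabel for each pair, then train, then test.
async function labelTrainAndTest(reco, labels, concept, supplierRfc = null) {
  for (const { label, concept: labelledConcept } of labels) {
    await reco.addLabel(label, labelledConcept, supplierRfc);
  }
  await reco.train();
  return reco.test(concept, supplierRfc);
}

// In-memory stand-in so the sketch is runnable. It is NOT reco: it "classifies"
// by exact match only, and the shape of its result is made up.
class FakeReco {
  constructor() {
    this.labels = new Map();
    this.trained = false;
  }
  async addLabel(label, concept) {
    this.labels.set(concept, label);
  }
  async train() {
    this.trained = true;
  }
  async test(input) {
    if (!this.trained) throw new Error('train() must be called first');
    return this.labels.get(input) || null;
  }
}

labelTrainAndTest(
  new FakeReco(),
  [{ label: 'apple', concept: 'I WANT AN APPLE' }],
  'I WANT AN APPLE'
).then((result) => console.log(result)); // apple
```

With the real module, an instance created with new Reco(recoConfig) would be passed in place of the stand-in.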

License

See LICENSE in repository.
