@bodiless/migration-tool
Last updated 8 days ago by npmbodiless .
License: Apache-2.0
$ cnpm install @bodiless/migration-tool 

Site Migration tool

The goal of this tool is to take an existing site (independent of its technology) and convert it into a static, flattened HTML site. A flattened HTML site is a non-editable site that can be served without a database and with minimal platform requirements. The tool automates this flattening to reduce the cost of migrating these sites off other technologies.

Given a URL, the tool crawls the existing site to find all pages, generates the HTML, pulls the site's assets, and outputs the static HTML site. The tool is provided as part of the BodilessJS project and is incorporated into its migration process. The output works in coordination with the BodilessJS Starter Kit so that both use the same tooling and infrastructure.

Features

  1. Crawls the website and gets a list of website pages

  2. Scrapes website's page html

  3. Scrapes website's assets:

    1. Scripts
    2. Css styles
    3. Images
  4. Scrapes website's meta tags

  5. Provides html transformation rules

    • rules to remove html elements from the DOM
    • rules to replace html elements with an html snippet
  6. Generates BodilessJS pages

  7. Converts html into JSX (a React syntax extension that lets us write JavaScript that looks like HTML)

  8. Breaks monolithic html into pieces

Getting Started

Prerequisites

  • Node.js version >= 10
  • NPM version 6.13.1+ is required.

Installation & Run

  1. Clone this repository: https://github.com/johnsonandjohnson/Bodiless-JS.git

    git clone https://github.com/johnsonandjohnson/Bodiless-JS.git
    cd Bodiless-JS
    
  2. Let's create a site to migrate the files to.

    The full directions can be found here. We recommend that this path be outside the repository root. The code example uses ../NEW_MIGRATED_SITE.

    npm ci
    npm run new ../NEW_MIGRATED_SITE
    cd ../NEW_MIGRATED_SITE
    
    
  3. Prepare the migration-settings.json file. All settings are described in the Configuration section.

  4. Execute the command to flatten the site

    
    npm run migrate
    
    

Configuration

Using the requirements of the flattened site, prepare a build plan by adjusting migration-settings.json.

The following options control how the site will be flattened.

Options:

  • url

    • Description: URL of the website that should be flattened

    • Accepted Formats: Prefix the URL with http:// or https://

    • Examples: "http://pariet10.ru/" or "https://pariet10.ru/"

    • Restrictions: None

  • isPage404Disabled

    • Description: When this option is false (the default), non-existing pages are not scraped and redirect to the default 404 page instead.

    • Accepted Formats: true, false

  • page404Url

    • Description: A specific URL from which the default "page not found" page should be flattened. When this option is not specified, a default /404 page is used.

    • Accepted Formats: Absolute URL of the page.

    • Examples: "https://pariet10.ru/404-custom-page"

    • Restrictions: The isPage404Disabled option should be set to false.

  • steps

    • Description: Specify a list of steps that should be executed by the tool

    • Recommendation: Specify all steps and set each to true.

    • setup

      • Description: Enable/disable cloning and setting up BodilessJS app locally

      • Accepted Formats: "true" or "false"

    • scrape

      • Description: Enable/disable site scraping and BodilessJS pages generation

      • Accepted Formats: "true" or "false"

    • build

      • Description: Enable/disable building of static site

      • Accepted Formats: "true" or "false"

    • serve

      • Description: Enable/disable serving of static site

      • Accepted Formats: "true" or "false"

  • crawler

    • Description: Specify configuration for the crawler

    • maxDepth

      • Description: Maximum depth for the crawler to follow links automatically

      • Accepted Formats: Number > 0

      • Recommendation: 100, unless there is a specific reason to limit the depth. A higher number allows the tool to crawl the entire site.

    • maxConcurrency

      • Description: Maximum number of pages to open concurrently

      • Accepted Formats: Number > 0

      • Recommendation: Use 1 to avoid impacting production sites

    • ignoreRobotsTxt

      • Description: Whether robots.txt should be ignored, so that URLs disallowed in robots.txt are also scraped. Defaults to false.

      • Accepted Formats: Boolean (true or false)

      • Recommendation: Use the default value when scraping the whole website; set this option to true when a disallowed page needs to be scraped individually.

  • htmltojsx

    • Description: Enable/disable transforming html to jsx

    • Accepted Formats: "true" or "false"

    • Recommendation: "true"

  • transformers

    • Description: Specify rules that should be applied to the scraped page html
  • rule

    • Description: The following rules allow you to manipulate the output and either remove or replace html selector components.

    • replace

      • Description: Replace each element in the set of matched elements with the provided new content and return the set of elements that was removed.
    • replaceString

      • Description: Replace string (or regex pattern) in the source html code before parsing.
    • tocomponent

      • Description: Extract matched elements into React components as separate modules
  • Specific configuration parameters for each rule type:

    • selector

      • Description: Selector for the element(s) that should be processed

      • Accepted Formats: string

    • replacement

      • Description: New html content in replace mode. Name of React component in tojsx mode

      • Accepted Formats: string

      • Restrictions: Escape special characters, such as " with \"

    • disableTailwind

      • Description: Disables the site tailwind theme. The site tailwind theme is disabled by default; set this option to false to enable it. You may want to enable the tailwind theme if new BodilessJS components will be added to the migrated site. Note that enabling the tailwind theme may, in some cases, interfere with the migrated site's styling.

      • Accepted Formats: true or false

      • Default Value: true

    • allowFallbackHtml

      • Description: Optional setting. When enabled, if migration encounters an error in the body section of a page's html, it pushes the original html body into the page component file and reports a message in the output. When disabled (value false), migration skips JSX generation for pages with parsing errors.

      • Accepted Formats: Boolean (true or false)

      • Default Value: true

    • context

      • Description: A list of pages to which the rule should be applied.

      • Accepted Formats: A URL pattern written in minimatch syntax

  • Examples:

{
  "rule": "replace",
  "selector": "script[src*='cdn.cookielaw.org/consent']",
  "replacement": "<script charset=\"UTF-8\" src=\"https://optanon.blob.core.windows.net/consent/086a2433-54aa-4112-8ba6-331eb1d2fda7-test.js\"></script>",
  "context": "**"
}
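
To show how the options described above might fit together, here is a minimal sketch of a complete migration-settings.json. It is an illustration under stated assumptions, not a verified file: the nesting of steps and crawler as objects, transformers as an array of rule objects, the top-level placement of disableTailwind and allowFallbackHtml, and the use of JSON booleans rather than quoted strings are all assumptions inferred from the option descriptions. The example.com URLs and the consent.js script path are placeholders. The files in examples/settings remain the authoritative reference.

```json
{
  "url": "https://example.com/",
  "isPage404Disabled": false,
  "page404Url": "https://example.com/404-custom-page",
  "steps": {
    "setup": true,
    "scrape": true,
    "build": true,
    "serve": true
  },
  "crawler": {
    "maxDepth": 100,
    "maxConcurrency": 1,
    "ignoreRobotsTxt": false
  },
  "htmltojsx": true,
  "disableTailwind": true,
  "allowFallbackHtml": true,
  "transformers": [
    {
      "rule": "replace",
      "selector": "script[src*='cdn.cookielaw.org/consent']",
      "replacement": "<script charset=\"UTF-8\" src=\"https://example.com/consent.js\"></script>",
      "context": "**"
    }
  ]
}
```

The values follow the recommendations given for each option: all steps enabled, a crawl depth of 100, one concurrent page to avoid loading the production site, and robots.txt respected.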

Rehydration (convert html elements into React Components) during Site Migration

The following process rehydrates the site, that is, it replaces specific html elements with named React components.

In packages/bodiless-migration-tool/settings.json:

Specify tocomponent Rules

  1. selector
    • Description: Selector for the element(s) that should be extracted into React components.
  2. replacement
    • Description: Name of the React component that replaces the matched element(s).

Note: Normally, common elements are extracted into shareable React components, for example Header and Footer components. If the extracted component contains dynamic elements, such as an active menu item with highlighted styles inside the Header component, the hydrated component needs further processing to make the menu work.

Example

View examples/settings/to_components.json
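
As a rough illustration of the two parameters above, the fragment below sketches tocomponent rules that would extract a header and a footer into Header and Footer components. The header and footer selectors are hypothetical and depend on the markup of the site being migrated; wrapping the rules in a transformers array and applying them to every page with the "**" context are likewise assumptions. Treat examples/settings/to_components.json as the real reference.

```json
{
  "transformers": [
    {
      "rule": "tocomponent",
      "selector": "header",
      "replacement": "Header",
      "context": "**"
    },
    {
      "rule": "tocomponent",
      "selector": "footer",
      "replacement": "Footer",
      "context": "**"
    }
  ]
}
```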

Usage

To flatten a website using the tool, run:


npm run migrate

Migration Output

Flattened & Build files/assets
  • The individual pages of the site can be found in examples\test-site\src\data\pages

After running npm run build:

  • The output of the built site can be found in examples\test-site\public
  • The assets of the site can be found in examples\test-site\static

View Migrated Site

Full settings.json Examples

Full examples can be found in examples/settings.

Configure no-scroll for selected anchor elements

Sites undergoing flattening may have foldable accordion elements that are implemented with anchor fragments. By default, GatsbyJS navigation detects the change in the URL and scrolls the page to the location indicated by the URL hash.

If this is not the desired behavior for a migrated page, you can override scrolling by configuring no-scroll-settings.json and placing it under the [site]/src/@bodiless/gatsby-theme-bodiless/ folder. Behind the scenes, this file shadows the Gatsby theme configuration file packages/gatsby-theme-bodiless/src/no-scroll-settings.json.

Here's an example of no-scroll-settings.json usage:

{
  "parentSelectors": [
    ".container-classname-1",
    ".container-classname-2"
  ],
  "elementSelectors": [
    ".container-classname .fieldset .field__item > a"
  ],
  "excludeHashes": [
    "hash-to-be-excluded-from-no-scrolling"
  ]
}

Explanation of options:

  • parentSelectors: A list of container selectors. Clicking an anchor inside a matching container does not scroll the page.

  • elementSelectors: A list of anchor selectors. Clicking a matching anchor does not scroll the page.

  • excludeHashes: A custom list of hash strings (without the "#" character) to be excluded from the no-scrolling feature.

Technical Notes

Chosen libraries

Troubleshooting
