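Preface
Anyone who has written a crawler in Node has probably come across Cheerio.js. It is fast and flexible, and anyone who knows jQuery can pick it up right away, which makes it a staple for scraping web page data with Node.
After getting to know cheerio, I had an idea: why not use it to implement bookmark import, which would also be a good way to get familiar with its API? So a while ago I built a first version with cheerio + Node. The bookmarks file exported from the browser was uploaded from a front-end page to the server, the server used cheerio to parse the HTML into JSON, and an API returned the data to the front end.
I wasn't satisfied with it, though. Running a whole Node service for a single endpoint felt like overkill. Could I build a purely front-end bookmark preview with import and export, relying only on local caching?
So I got to work. For import, the front end reads the HTML file with FileReader and uses cheerio to parse the DOM into JSON, which is then rendered as a menu. For export, cheerio generates the corresponding DOM from the JSON data, URL.createObjectURL creates a local URL for the file, and an a tag triggers the download.
Below I share the complete implementation and the source code.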
Dependencies
utils-lib-js
cheerio
vite:3.1
vue:3.2
element-plus:2.0
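Overview
This small demo is a Vue 3 project scaffolded with Vite. Apart from the layout, its core is two classes, FileSystem and HTMLSystem. The former handles file reading and downloading; the latter converts between JSON and HTML. Everything else is ordinary layout and components, so this article focuses on these two classes.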
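Implementation
FileSystem:
Read a file: take the file handed over by Element Plus's el-upload component and read its contents as a string
Download a file: download a resource from a given URL
Convert file contents (a string) into a local blob URL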
import type { UploadFile } from "element-plus/es/components/upload/src/upload.type";
import { defer } from "utils-lib-js";

export type readFileType = 'readAsArrayBuffer' | 'readAsBinaryString' | 'readAsDataURL' | 'readAsText'

export declare interface IFileSystem {
    readFile: (file: UploadFile, type?: readFileType, encoding?: string) => Promise<ProgressEvent<FileReader>>
    downloadFile: (url: string, name?: string) => void
    stringToBlobURL: (fileString: string) => string
}

export class FileSystem implements IFileSystem {
    /**
     * Read a file uploaded from the front end.
     * @param {UploadFile} file the uploaded file
     * @param {readFileType} type which FileReader method to use
     * @param {string} encoding text encoding (only used by readAsText)
     * @return {Promise<ProgressEvent<FileReader>>} resolves with the load event; the content is in event.target.result
     */
    readFile(file: UploadFile, type: readFileType = 'readAsText', encoding: string = 'utf-8') {
        // Nothing to read if the upload entry carries no raw file
        if (!file.raw) return Promise.reject(new Error('No raw file to read'))
        const { promise, resolve, reject } = defer()
        const reader: FileReader = new FileReader();
        // Attach the handlers before starting the read
        reader.onload = resolve
        reader.onerror = reject
        // Only readAsText accepts an encoding argument
        if (type === 'readAsText') {
            reader.readAsText(file.raw, encoding)
        } else {
            reader[type](file.raw)
        }
        return <Promise<any>>promise
    }

    /**
     * Download a file.
     * @param {string} url resource path or URL
     * @param {string} name file name for the download
     */
    downloadFile(url: string, name: string = 'file.txt') {
        const link = document.createElement('a')
        link.href = url
        link.download = name
        // Dispatch a synthetic click to trigger the download
        const _evt = new MouseEvent('click')
        link.dispatchEvent(_evt)
    }

    /**
     * Turn a string into a local blob URL.
     * The URL stays valid until URL.revokeObjectURL is called on it.
     * @param {string} fileString file contents
     * @return {string} blob URL
     */
    stringToBlobURL(fileString: string) {
        return URL.createObjectURL(new Blob([fileString], { type: "application/octet-stream" }))
    }
}
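As a side note, the read and blob-URL steps can also be sketched without FileReader or a deferred promise. The snippet below is only an illustration, not part of the project: readFileText and createBlobURL are hypothetical names, and it assumes an environment where Blob.text() and URL.createObjectURL are available (modern browsers, recent Node versions).

```typescript
// Hypothetical helpers, not taken from the project's FileSystem class.

// Read a Blob (a browser File is a Blob) as text.
// Blob.text() returns a Promise and always decodes as UTF-8.
function readFileText(file: Blob): Promise<string> {
  return file.text();
}

// Create a blob URL for some text and return it together with a
// function that releases it again.
function createBlobURL(content: string, type = "text/html") {
  const url = URL.createObjectURL(new Blob([content], { type }));
  return {
    url,
    // The browser keeps the blob data alive until the URL is revoked.
    revoke: () => URL.revokeObjectURL(url),
  };
}
```

With el-upload, the Blob to pass in would be file.raw. After handing url to the download link, revoke() can be called once the download has started, for example after a short delay.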
HTMLSystem:
HTML to JSON: walk the DOM tree and build the JSON data
JSON to HTML: generate bookmark-format HTML tags from the standardized JSON
import { load } from 'cheerio'
import type { Cheerio, CheerioAPI, CheerioOptions } from 'cheerio'
import { createHtmlFolder, createHtmlFile, createBaseTemp } from '@/config'
import type { File, Folder } from "@/layout/menu/types";

export declare interface IHTMLSystem<F = Folder | File, T = Cheerio<any>, I = CheerioAPI, FolderList = Array<F>> {
    count: number
    resetCount: () => void
    initHTML: (html: string) => FolderList
    htmlToJson: (node: T, bookMarks: FolderList) => void
    addToBookMarks: (node: T, list: FolderList) => unknown
    getNodeTitle: (node: T) => void
    getNodeInfo: (node: T, info: File) => File
    createInitHtml: (temp: string, opt?: CheerioOptions, isDoc?: boolean) => I
    initJSON: (json: FolderList) => string
    jsonToHtml: (bookMarks: FolderList, node: I) => string
    createFolder: (folder: Folder, node: T) => I
    createFile: (file: File, node: T) => I
    createElemChild: (node: T) => (it: F, i: number) => void
    checkIsFileOrFolder: (item: F) => 'folder' | 'file' | 'none'
}

export class HTMLSystem implements IHTMLSystem {
    count = 0

    /**
     * Reset the id counter.
     */
    resetCount = () => {
        this.count = 0;
    };

    /**
     * Return the current id and increment the counter.
     */
    addCount = () => {
        return this.count++
    };

    /**
     * Entry point for HTML to JSON.
     * Only the first <DT> of the first <DL> is used as the root.
     * @param {string} html contents of the bookmarks HTML file
     * @return {Array<Folder | File>}
     */
    initHTML(html: string) {
        const $ = load(html);
        const dl = $("dl").first();
        const dt = dl.children("dt").eq(0);
        return this.htmlToJson(dt, []);
    }

    /**
     * Recursively convert HTML to JSON.
     * @param {Cheerio} node current <DT> node
     * @param {Array} bookMarks list that receives the parsed item
     * @return {Array<Folder | File>}
     */
    htmlToJson = (node: Cheerio<any>, bookMarks: Array<Folder | File> = []) => {
        // <DT> entries one level below this node
        const childrenNodeDL = node.children("dl");
        const childrenNodeDT = childrenNodeDL.children("dt");
        // Add the current node first; its children are collected in dir.children
        const { item: dir } = this.addToBookMarks(node, bookMarks)
        childrenNodeDT.each((i) => {
            this.htmlToJson(childrenNodeDT.eq(i), dir.children);
        });
        return bookMarks;
    };

    /**
     * Add a single item to the JSON data.
     * @param {Cheerio} node node that holds the file or folder
     * @param {Array} list bookmark JSON list to push into
     * @return {{ item, list, dirType }}
     */
    addToBookMarks = (node: Cheerio<any>, list: Array<Folder | File> = []) => {
        const item = this.getNodeTitle(node);
        const dirType = this.checkIsFileOrFolder(item)
        switch (dirType) {
            case "folder":
                item.children = [];
            // intentional fall-through: folders also get an id and are pushed
            case "file":
                item.id = this.addCount().toString()
                list.push(item)
                break;
        }
        return { item, list, dirType }
    }

    /**
     * Decide whether a node is a folder and read its details.
     * @param {Cheerio} node node that holds the file or folder
     */
    getNodeTitle = (node: Cheerio<any>) => {
        const info: any = {};
        const title = node.children("h3");
        // No <H3> means this is not a folder, so read the site name and URL;
        // otherwise it is a folder: read title, add_date and last_modified
        return title.length === 0
            ? this.getNodeInfo(node, info)
            : {
                ...info,
                title: title.text(),
                add_date: title.attr("add_date"),
                last_modified: title.attr("last_modified")
            };
    };

    /**
     * Read the details of a bookmark file (a link).
     * @param {Cheerio} node node that holds the link
     * @return {File}
     */
    getNodeInfo = (node: Cheerio<any>, info: File) => ({
        ...info,
        name: node.children("a").text(),
        href: node.children("a").attr("href") ?? '',
        icon: node.children("a").attr("icon") ?? '',
        add_date: node.children("a").attr("add_date")
    })

    /**
     * Entry point for JSON to HTML.
     * @param {Array} json bookmark JSON produced by initHTML
     * @return {string}
     */
    initJSON(json: Array<Folder | File>) {
        return this.jsonToHtml(json);
    }

    /**
     * Load a markup fragment and return its CheerioAPI.
     * @param {string} temp markup
     * @param {*} opt cheerio options
     * @param {*} isDoc whether to wrap the fragment in a full HTML document
     * @return {CheerioAPI}
     */
    createInitHtml = (temp: string, opt: CheerioOptions = { xml: true, xmlMode: true }, isDoc = false) => {
        const $ = load(temp, opt, isDoc);
        return $
    }

    /**
     * Main function for JSON to bookmark HTML.
     * @param {Array} bookMarks bookmark JSON data
     * @return {string}
     */
    jsonToHtml = (bookMarks: Array<Folder | File> = []) => {
        const root = this.createInitHtml(`<div id="root">${createBaseTemp()}</div>`)("#root")
        bookMarks.forEach(this.createElemChild(root.children().first()))
        return root.children().toString()
    }

    /**
     * Recursively build the DOM tree.
     * @param {Cheerio} node parent node
     * @return {void}
     */
    createElemChild = (node: Cheerio<any>) => (it: Folder | File) => {
        const type = this.checkIsFileOrFolder(it)
        switch (type) {
            case 'folder': {
                const folder = this.createFolder(it as Folder)
                node.append(folder("*"))
                // Always take the last <DL> just appended and put the children
                // into it, so earlier tags are not traversed again
                it.children.forEach(this.createElemChild(node.children("DL").last()))
                break
            }
            case 'file': {
                const file = this.createFile(it as File)
                node.append(file('*'))
                break
            }
            case 'none':
                throw new Error('Item is not Folder or File')
        }
    }

    /**
     * Create the tags for a folder.
     * @param {Folder} folder a single folder item
     * @return {CheerioAPI}
     */
    createFolder = (folder: Folder) => {
        const init = this.createInitHtml(createHtmlFolder(folder))
        return init
    }

    /**
     * Create the tags for a file (a link).
     * @param {File} file a single file item
     * @return {CheerioAPI}
     */
    createFile = (file: File) => {
        const init = this.createInitHtml(createHtmlFile(file))
        return init
    }

    /**
     * Tell whether an item is a folder (has a title) or a file (has a name).
     * @param {Folder | File} item a single item
     */
    checkIsFileOrFolder = (item: Folder | File) => item.title ? 'folder' : item.name ? 'file' : 'none'
}
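To make the JSON shape more concrete, here is a minimal sketch of the kind of tree the parser produces. The real File and Folder types live in @/layout/menu/types and are not shown in this article, so BookmarkFile, BookmarkFolder, isFolder and countLinks are hypothetical names, and the fields are inferred from what HTMLSystem reads and writes.

```typescript
// Assumed shapes, inferred from the fields HTMLSystem sets on each item.
type BookmarkFile = {
  id: string;
  name: string;
  href: string;
  icon: string;
  add_date?: string;
};

type BookmarkFolder = {
  id: string;
  title: string;
  add_date?: string;
  last_modified?: string;
  children: BookmarkNode[];
};

type BookmarkNode = BookmarkFile | BookmarkFolder;

// Type guard: this sketch treats anything with a children array as a folder.
// (The class above distinguishes by title vs. name instead.)
const isFolder = (node: BookmarkNode): node is BookmarkFolder => "children" in node;

// Count the links in a tree by walking it recursively.
const countLinks = (nodes: BookmarkNode[]): number =>
  nodes.reduce(
    (total, node) => total + (isFolder(node) ? countLinks(node.children) : 1),
    0
  );

// Sample data with made-up values; ids follow a pre-order count from "0".
const sample: BookmarkNode[] = [
  {
    id: "0",
    title: "Bookmarks bar",
    children: [
      { id: "1", name: "Example", href: "https://example.com", icon: "" },
      {
        id: "2",
        title: "Docs",
        children: [
          { id: "3", name: "Example docs", href: "https://example.com/docs", icon: "" },
        ],
      },
    ],
  },
];
```

For this sample, countLinks(sample) evaluates to 2: one link directly in the root folder and one in the nested folder.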
html-config:
In addition, generating the HTML requires a few template functions:
import type { File, Folder } from "@/layout/menu/types";

/**
 * Default header of a bookmarks file.
 * @param {string} name bookmark file title
 */
export const createHtmlTemp = (name: string) => `<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!-- This is an automatically generated file.
     It will be read and overwritten.
     DO NOT EDIT! -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>${name}</TITLE>
<H1>${name}</H1>
`

/**
 * Markup for a folder.
 * @param {Folder} folder a single folder item
 */
export const createHtmlFolder = (folder: Folder) => `
<DT/>
<H3 ADD_DATE="${folder.add_date}" LAST_MODIFIED="${folder.last_modified}">${folder.title}</H3>
${createBaseTemp()}
`

/**
 * Markup for a file (a link).
 * @param {File} file a single file item
 */
export const createHtmlFile = (file: File) => `
<DT/>
<A HREF="${file.href}" ICON="${file.icon}" ADD_DATE="${file.add_date}">${file.name}</A>
`

/**
 * Markup for an empty list.
 */
export const createBaseTemp = () => `
<DL><p>
</DL><p>
`
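One thing to watch with string templates like these is that values are interpolated as-is, so a title containing a double quote or an angle bracket could break the generated markup. The sketch below shows one possible way to handle that. It is not how the project does it: it skips cheerio entirely, builds the bookmark markup by plain recursion, and escapeHtml, renderNode and renderList are hypothetical names with simplified node types.

```typescript
// Simplified, assumed node shapes for this sketch only.
type LinkNode = { name: string; href: string; add_date?: string };
type FolderNode = { title: string; add_date?: string; children: TreeNode[] };
type TreeNode = LinkNode | FolderNode;

// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one item: a folder becomes <DT><H3> followed by a nested list,
// a link becomes <DT><A>.
function renderNode(node: TreeNode, indent: string): string {
  const date = escapeHtml(node.add_date ?? "");
  if ("children" in node) {
    const heading = `${indent}<DT><H3 ADD_DATE="${date}">${escapeHtml(node.title)}</H3>`;
    return `${heading}\n${renderList(node.children, indent)}`;
  }
  return `${indent}<DT><A HREF="${escapeHtml(node.href)}" ADD_DATE="${date}">${escapeHtml(node.name)}</A>`;
}

// Render a list of items wrapped in <DL><p> ... </DL><p>.
function renderList(nodes: TreeNode[], indent = ""): string {
  const inner = nodes.map((node) => renderNode(node, indent + "    "));
  return [`${indent}<DL><p>`, ...inner, `${indent}</DL><p>`].join("\n");
}
```

Calling renderList on the parsed tree and prepending the header from createHtmlTemp would give a complete file. Whether escaping is needed in practice depends on where the JSON comes from; data parsed from a browser export and edited by hand is the case I would be most careful with.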
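Closing
Final result: BookMarks
Source code: book_mark: a pure front-end tool that imports and exports HTML bookmarks and generates a bookmark navigation page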