两个数组数据的高效合并方案-阿里云开发者社区

前言

作为一个前端，服务器返回的数据易用，能极大的提升开发效率。

能一个接口提供的数据，就不要用去调用两次或者更多网络请求，然后进行数据合并。

然而，理想和现实两者，现实总是找我，感觉不到理想对的温暖。

残酷的经历

说起来，你可能不信，但是请您相信我，因为我很善良。

我们某个业务，有一个列表数据，需要调用三个接口合并数据，具体如下:

单次返回10条基础信息
再分10次去请求详情信息
从第2步的结果中用某个ID值，再发10次请求，请求用户信息

打断你的施法，别问我为什么，之前就是这么实现的。

我们来做一个简单的算术: 1 + 10 + 10 = 21

我的天，原天堂没有数据合并。

后来在我再三的要求下，终于变成了调用两个接口：

单次返回10条基础信息
用第1步的数据集合中的ID值，再批量查询

我们来做一个更简单的算术： 1 + 1 = 2

虽然比一次返回全部数据仅仅多了一次，但是抛出了一个不可回避的问题，数组的数据合并。

今天，就让我们来一起探索数组数据合并。

demo数据

假设有两组数据，一组是用户基础数据，一组是分值数据，通过uid关联。

export const usersInfo = Array.from({ length: 10 }, (val, index) => {
    return {
        uid: `${index + 1}`,
        name: `user-name-${index}`,
        age: index + 10,
        avatar: `http://www.avatar.com/${index + 1}`
    }
});
export const scoresInfo = Array.from({ length: 8 }, (val, index) => {
    return {
        uid: `${index + 1}`,
        score: ~~(Math.random() * 10000),
        comments: ~~(Math.random() * 10000),
        stars: ~~(Math.random() * 1000)
    }
});
复制代码

基础版本

两层循环，通过某个key进行比较，然后赋值。

我想说，简单好用，XDM，你们不骂我才怪。搞不好还建议掘金搞一个踩的功能。

import * as datas from "./data";
const { usersInfo, scoresInfo } = datas;
console.time("merge data")
for (let i = 0; i < usersInfo.length; i++) {
    var user = usersInfo[i] as any;
    for (let j = 0; j < scoresInfo.length; j++) {
        var score = scoresInfo[j];
        if (user.uid == score.uid) {
            user.score = score.score;
            user.comments = score.comments;
            user.stars = score.stars;
        }
    }
}
console.timeEnd("merge data")
console.log(usersInfo);
复制代码

基础-hash版本

这里的基本思路先把数组转换成对象。当然也可以是转为Map对象。

用hash查找替换数组查找。

核心点：

找到能标记对象唯一的属性键，本文为 uid
遍历数组, 以单条数据属性uid的值作为属性键，单条数据作为值

对象属性的查找相比数组的查询要快得多，这里就能得到不少的性能提升。

你可能要问题，为啥要快得多，因为这里的数组查找不能按照下标查找的，而是链表形式来查找的。

import * as datas from "./data";
const { usersInfo, scoresInfo } = datas;
console.time("merge data")
const scoreMap = scoresInfo.reduce((obj, cur) => {
    obj[cur.uid] = cur;
    return obj;
}, Object.create(null));
for (let i = 0; i < usersInfo.length; i++) {
    const user = usersInfo[i] as any;
    const score = scoreMap[user.uid];
    if(score != null){
        user.score = score.score;
        user.comments = score.comments;
        user.stars = score.stars;
    }
}
console.timeEnd("merge data")
console.log(usersInfo);
复制代码

到这里，你也许会伸个懒腰，喝口水。

一不小心喝多了，你喝多了。

这种实现就存在多遍历的情况，所以我们在达到目标后，跳出。

基础-hash-跳出版

这个版本和上个版本的最大区别，就是增加了计数功能，记录已经合并的次数，达到预期后，跳出。

你也许会笑，这记个数有啥意思。

我们用数据说话：

假如我们的列表里面有100条数据，需要合并的数据只有10条。假设列表中第40-49条是要合并的数据。

我们来算个数：100-50 = 50

此场景会多遍历50次。随着各种场景的不同，多遍历的次数也不一样。

import * as datas from "./data";
const { usersInfo, scoresInfo } = datas;
console.time("merge data")
const scoreMap = scoresInfo.reduce((obj, cur) => {
    obj[cur.uid] = cur;
    return obj;
}, Object.create(null));
const len = scoresInfo.length;
let count = 0;
let  walkCount = 0;
for (let i = 0; i < usersInfo.length; i++) {
    const user = usersInfo[i] as any;
    const score = scoreMap[user.uid];
    walkCount++;
    if(score != null){
        count++
        user.score = score.score;
        user.comments = score.comments;
        user.stars = score.stars;       
        if(count>=len){
            break;
        }
    }
}
console.log(`合并完毕：遍历次数${walkCount}, 实际命中次数${count}, 预期命中次数${len}`)
console.timeEnd("merge data");
console.log(usersInfo);
复制代码

到这里，你微微一笑很迷人，打开掘金，发现自己上了首页的文章，居然从首页跑到第二页去了。

划重点 第二页，是的我们的数据一般是分页加载的。

我们理论上是拉取一页，合并一次数据。也就是说，大多数情况，后面拉取的数据都是追加原列表后面。

新数据都是追加到原列表后，那是不是倒序遍历就会更快呢？答案，肯定是。

当然你要是后拉取的数据，往前放，顺序遍历更快。

基础-hash-跳出-倒序版

倒序遍历和顺序遍历的区别就是一个从前到后，一个从后往前。代码如下

import * as datas from "./data";
const { usersInfo, scoresInfo } = datas;
console.time("merge data")
const scoreMap = scoresInfo.reduce((obj, cur) => {
    obj[cur.uid] = cur;
    return obj;
}, Object.create(null));
const len = scoresInfo.length;
let count = 0;
let  walkCount = 0;
for (let i = usersInfo.length - 1; i>=0 ; i--) {
    const user = usersInfo[i] as any;
    const score = scoreMap[user.uid];
    walkCount++;
    if(score != null){
        count++
        user.score = score.score;
        user.comments = score.comments;
        user.stars = score.stars;       
        if(count>=len){
            break;
        }
    }
}
console.log(`合并完毕：遍历次数${walkCount}, 实际命中次数${count}, 预期命中次数${len}`)
console.timeEnd("merge data");
console.log(usersInfo);
复制代码

到这里，你刚准备走，突然有同事甲围观你，发现你在写数组合并。

同事甲说：哇，写数据合并啊，我这也有需求，你顺便抽象封装一下吧。我的数据是追加型，也就是倒序遍历。

同事甲说完，好几个同事站了起来，说，我这也有需求。

同事乙说：我的是插入头部的，需要顺序遍历。

同事丙说：我要合并的属性是 a.b.c这种。

同事丁说：我要合并的是 a[0].b这种。

此时的你，改怎么做呢？

求助开源

underscore和lodash都有关于数据的操作，但是还达不到我们的需求。当然可以借助其完成。但是如果都依赖lodash了，那你这个可迁移性就有待商榷。

array-union, array-merge-by-key, deep-merger等等有合并的能力，但是都不如我们所期望。

算了，自己写吧。

准备工具类

属性读取和设置

我们先整理一下，数组的合并，具体操作的时候，其实是对象的合并。对象的合并一级属性合并好办，多级呢，是个问题。

其他lodash.get已经完美实现，我们一起来看一下其官方的demo：

var object = { 'a': [{ 'b': { 'c': 3 } }] };
_.get(object, 'a[0].b.c');
// => 3
_.get(object, ['a', '0', 'b', 'c']);
// => 3
_.get(object, 'a.b.c', 'default');
// => 'default'
复制代码

是不是很强大，同理，属性的设置，lodash提供了lodash.set, 我们一睹风采。

var object = { 'a': [{ 'b': { 'c': 3 } }] };
_.set(object, 'a[0].b.c', 4);
console.log(object.a[0].b.c);
// => 4
_.set(object, ['x', '0', 'y', 'z'], 5);
console.log(object.x[0].y.z);
// => 5
复制代码

这里，你可能要说，你不是不依赖lodash嘛，这不都提的都是lodash嘛。

请掘友莫着急，我们要实施 拿来主义。

删掉我们不太关心的，其核心就三个方法

stringToPath：路径转换成数组
getProperty：读属性
setProperty：设属性

const stringToPath = (string: string) => {
    const result = [];
    if (string.charCodeAt(0) === charCodeOfDot) {
        result.push('');
    }
    string.replace(rePropName, ((match, expression, quote, subString) => {
        let key = match;
        if (quote) {
            key = subString.replace(reEscapeChar, '$1');
        }
        else if (expression) {
            key = expression.trim();
        }
        result.push(key);
    }) as any);
    return result;
};
function getProperty(obj: Object, key: string, defaultValue: any = undefined) {
    if (!isObject(obj)) {
        return defaultValue;
    }
    const path = stringToPath(key);
    let index = 0;
    const length = path.length;
    while (obj != null && index < length) {
        obj = obj[path[index++]];
    }
    return (index && index == length) ? obj : undefined || defaultValue;
}
function setProperty(obj: Object, path: string, value: any = undefined) {
    if (!isObject(obj)) {
        return obj;
    }
    const keys = stringToPath(path);
    const length = keys.length;
    const lastIndex = length - 1;
    let index = -1;
    let nested = obj;
    while (nested != null && ++index < length) {
        const key = keys[index];
        let newValue = value;
        if (index != lastIndex) {
            const objValue = nested[key];
            newValue = undefined;
            if (newValue === undefined) {
                newValue = isObject(objValue) ? objValue : (isIndexLike[keys[index + 1]] ? [] : {})
            }
        }
        nested[key] = newValue;
        nested = nested[key];
    }
    return obj;
}
复制代码

到此，多级属性的读取设置问题解决。

准入下一个稍微复杂的问题，就是正序和倒叙的问题。

正序和倒叙-迭代器

不管是正序遍历还是倒叙遍历，其本质都是迭代。

按照传统实现，三种方式

if/else + 两个 for循环
while
数组的forEach, reduce等

但是都逃不出要分开逻辑判断是顺序还是倒叙, 这些判断写在遍历代码里，代码可读性会变差。

我们这里应该想到迭代器，迭代器。

利用高阶函数，封装递归逻辑，外部只关心hasNext和current。

function getStepIter(min: number, max: number, desc: boolean) {
    let start = desc ? max : min;
    let end = desc ? min : max;
    if (desc) {
        return {
            hasNext() {
                return start >= end
            },
            get current() {
                return start;
            },
            next() {
                return --start
            }
        }
    }
    return {
        hasNext() {
            return end >= start
        },
        get current() {
            return start;
        },
        next() {
            return ++start
        }
    }
}
复制代码

这两个大问题解决，就开始撸起袖子， just do it!.

感谢 SSShuai1999 提供的优化方案, 代码行数减少。在 hasNext的时候有一个三目运算。

function getStepIter(min: number, max: number, desc: boolean): any { 
    let [start, end, operator] = desc ? [max, min, -1] : [min, max, +1] 
     return { 
         hasNext() { 
             return desc ? start >= end : end >= start 
         }, 
         get current() { 
             return start; 
         }, 
         next() { 
             return start += operator 
         } 
     } 
 }
复制代码

具体实现

略。

会不会有那个羊驼跑过，哈哈。

篇幅问题，全部代码请移步到 arrayMerge

演示

看代码

data.ts, 为了方便看到结果，我们减少数据量。

这里的usersInfo的uid是 1,2,3

这里的scoresInfo的uid是2,3

这么设计是为了演示倒叙遍历。

// data.ts
export const usersInfo = Array.from({ length: 3 }, (val, index) => {
    return {
        uid: `${index + 1}`,
        name: `user-name-${index}`,
        age: index + 10,
        avatar: `http://www.avatar.com/${index + 1}`
    }
});
export const scoresInfo = Array.from({ length: 2 }, (val, index) => {
    return {
        uid: `${index + 2}`,
        score: ~~(Math.random() * 10000),
        comments: ~~(Math.random() * 10000),
        stars: ~~(Math.random() * 1000)
    }
});
复制代码

test.ts

import { mergeArray } from "../lib/array";
import * as datas from "./data";
const { usersInfo, scoresInfo } = datas;
const arr = mergeArray(usersInfo, scoresInfo, {
    sourceKey: "uid", // 源列表对象的用来比较的属性键
    targetKey: "uid",  // 目标列表对象的用来比较的属性键
    sKMap: {
        "score": "data.score",  // 把源的score属性映射到目标对象的data.score属性上
        "comments": "data.comments"， // 把源的comments属性映射到目标对象的data.comments属性上
        "stars": "stars" // 把源的stars属性映射到目标对象的stars属性上
    }
});
console.log("arr", arr);
复制代码

运行结果:

targetArr(3), sourceArr(2), 统计：遍历次数2, 命中次数2

其表示：列表长度为3，要合入的长度为2，累计遍历2次，命中2次。

mergeArray:: targetArr(3), sourceArr(2), 统计：遍历次数2, 命中次数2
arr [
  {
    uid: '1',
    name: 'user-name-0',
    age: 10,
    avatar: 'http://www.avatar.com/1'
  },
  [Object: null prototype] {
    uid: '2',
    name: 'user-name-1',
    age: 11,
    avatar: 'http://www.avatar.com/2',
    data: { score: 6979, comments: 3644 },
    stars: 434
  },
  [Object: null prototype] {
    uid: '3',
    name: 'user-name-2',
    age: 12,
    avatar: 'http://www.avatar.com/3',
    data: { score: 6348, comments: 320 },
    stars: 267
  }
]
复制代码

到此为止，真可以算是高一段段落。

提问

当前的版本还有什么问题，应该怎么解决。
当前的版本还有哪些改进方案。

写在最后

写作不易，如果觉得还不错，一赞一评，就是我最大的动力。

两个数组数据的高效合并方案

前言

残酷的经历

demo数据

基础版本

基础-hash版本

基础-hash-跳出版

基础-hash-跳出-倒序版

求助开源

准备工具类

属性读取和设置

正序和倒叙-迭代器

具体实现

演示

提问

写在最后

热门文章

最新文章

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

两个数组数据的高效合并方案

前言

残酷的经历

demo数据

基础版本

基础-hash版本

基础-hash-跳出版

基础-hash-跳出-倒序版

求助开源

准备工具类

属性读取和设置

正序和倒叙-迭代器

具体实现

演示

提问

写在最后

热门文章

最新文章

相关电子书

相关实验场景