Syncing MaxCompute data to ES with DataWorks: geo_point type write test - Alibaba Cloud Developer Community

2022-12-20

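I. Problem Background

When DataWorks is used to sync MaxCompute data to Elasticsearch (ES), writing to a target field of type geo_point fails with the following error: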

"error":{"type":"mapper_parsing_exception","reason":"failed to parse field [location] of type [geo_point]","caused_by":{"type":"parse_exception","reason":"latitude must be a number"

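This article uses a test to show how the source and the target should be configured to sync a field of this type.

II. Test Steps

(1) Environment setup

1. Source: the default odps data source.

2. Target: an ES instance running version 6.7.0.

Connectivity of both data sources has been verified.

(2) Data preparation

1. Create the source table in MaxCompute (MC)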

CREATE TABLE toes (idd INT, location1 STRING);

2. Create the target index in ES

PUT /product_info
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "mytype": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
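For orientation, Elasticsearch itself accepts a geo_point value in several shapes, including a "lat,lon" string, an object with lat and lon keys, and a [lon, lat] array. The Python sketch below only builds those three shapes for the first test coordinate; the function name `geo_point_forms` is hypothetical, and the sketch makes no claim about which shapes survive the DataWorks writer (this article only tests string values).

```python
def geo_point_forms(lat: float, lon: float) -> dict:
    """Return one point in three shapes that Elasticsearch accepts for geo_point.

    Hypothetical helper for illustration; not part of DataWorks or Elasticsearch.
    """
    return {
        # String form: latitude first, then longitude.
        "as_string": f"{lat},{lon}",
        # Object form: explicit keys, so order does not matter.
        "as_object": {"lat": lat, "lon": lon},
        # Array form: note the reversed order, longitude first.
        "as_array": [lon, lat],
    }


forms = geo_point_forms(11.55555, 11.11111)
for name, value in forms.items():
    print(name, value)
```

The string form is the one that matches a STRING column on the MaxCompute side, which is why the test below focuses on it.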

3. Configure the sync job

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "odps",
            "parameter": {
                "partition": [],
                "datasource": "odps_first",
                "envType": 0,
                "isSupportThreeModel": false,
                "column": [
                    "location1"
                ],
                "tableComment": "",
                "table": "toes"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "elasticsearch",
            "parameter": {
                "actionType": "index",
                "indexType": "mytype",
                "cleanup": false,
                "datasource": "elastic_test",
                "envType": 0,
                "discovery": false,
                "column": [
                    {
                        "name": "location",
                        "type": "geo_point"
                    }
                ],
                "index": "product_info",
                "primaryKeyInfo": {
                    "type": "nopk",
                    "fieldDelimiter": ","
                },
                "dynamic": false,
                "batchSize": 1024,
                "splitter": ","
            },
            "name": "Writer",
            "category": "writer"
        },
        {
            "copies": 1,
            "parameter": {
                "nodes": [],
                "edges": [],
                "groups": [],
                "version": "2.0"
            },
            "name": "Processor",
            "category": "processor"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": ""
        },
        "locale": "zh",
        "speed": {
            "throttle": false,
            "concurrent": 1
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
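In the job above, the reader lists one column (location1) and the writer lists one column (location). The sketch below assumes that reader and writer columns are matched by position, which is consistent with this configuration but is an assumption here rather than something the article states. The helper `pair_columns` and the trimmed `job` dictionary are hypothetical and keep only the keys needed for the check.

```python
# Trimmed copy of the job configuration: only the fields this check needs.
job = {
    "steps": [
        {"category": "reader", "parameter": {"column": ["location1"]}},
        {
            "category": "writer",
            "parameter": {"column": [{"name": "location", "type": "geo_point"}]},
        },
    ]
}


def pair_columns(job: dict) -> list:
    """Pair reader columns with writer columns by position (assumed rule)."""
    params = {step["category"]: step["parameter"] for step in job["steps"]}
    reader_cols = params["reader"]["column"]
    writer_cols = params["writer"]["column"]
    if len(reader_cols) != len(writer_cols):
        raise ValueError(
            f"reader has {len(reader_cols)} columns, writer has {len(writer_cols)}"
        )
    return [(r, w["name"], w["type"]) for r, w in zip(reader_cols, writer_cols)]


pairs = pair_columns(job)
for source, target, es_type in pairs:
    print(f"{source} -> {target} ({es_type})")
```

A check like this can catch a column-count mismatch before the job is submitted, though it says nothing about whether the values themselves will parse.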

III. Test Results

1. Insert test data at the source

INSERT INTO toes VALUES (1, "11.55555,11.11111"), (2, "[22.55555,22.11111]");
SELECT * FROM toes;

2. Query the data at the target

GET /product_info/_search

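IV. Summary

1. As the test above shows, when the source column is of type STRING and the value written is simply two plain numbers separated by a comma (for example 11.55555,11.11111), the value is recognized as a geo_point.

If the source value carries extra symbols such as [] or "", as in the second test row, recognition fails.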
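The behavior summarized above can be imitated with a small pre-check that could be run on source values before syncing. This is a rough sketch that assumes the string is split on a comma and each half is parsed as a number; it is not Elasticsearch's or DataWorks' actual code, and the names `parse_lat_lon`, `is_plain_lat_lon`, and `strip_wrappers` are hypothetical.

```python
def parse_lat_lon(value: str) -> tuple:
    """Parse a "lat,lon" string, failing the way the reported error suggests."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected exactly two comma-separated values")
    try:
        lat = float(parts[0].strip())
    except ValueError:
        raise ValueError("latitude must be a number") from None
    try:
        lon = float(parts[1].strip())
    except ValueError:
        raise ValueError("longitude must be a number") from None
    return lat, lon


def is_plain_lat_lon(value: str) -> bool:
    """True if the value is two plain numbers separated by a comma."""
    try:
        parse_lat_lon(value)
    except ValueError:
        return False
    return True


def strip_wrappers(value: str) -> str:
    """Drop surrounding brackets, quotes, and spaces.

    Assumes the numbers inside are already in lat,lon order.
    """
    return value.strip().strip("[]\"' ")


# The two rows inserted into the toes table in the test.
rows = ["11.55555,11.11111", "[22.55555,22.11111]"]
results = {row: is_plain_lat_lon(row) for row in rows}
print(results)
print(strip_wrappers(rows[1]))
```

Under these assumptions the bracketed row fails on its first half, "[22.55555", which is not a number. Stripping the wrappers is only safe if the original value was meant as latitude followed by longitude.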

2. Elasticsearch Writer reference: https://help.aliyun.com/document_detail/137770.html
