1. Problem background
When using DataWorks to sync MaxCompute data to Elasticsearch, writing to a geo_point field on the target side fails with:
"error":{"type":"mapper_parsing_exception","reason":"failed to parse field [location] of type [geo_point]","caused_by":{"type":"parse_exception","reason":"latitude must be a number"
This article walks through a test in detail and shows how the source and target should be configured to sync this field type.
2. Test steps
(1) Environment
1. Source: the default ODPS (MaxCompute) data source.
2. Target: an Elasticsearch 6.7.0 instance.
Connectivity for both data sources has already been verified.
(2) Data preparation
1. Create the source table on the MaxCompute side
create table toes (idd INT, location1 STRING);
2. Create the target index on the Elasticsearch side
PUT /product_info
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "mytype": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}
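To make the nesting of the index-creation body explicit (Elasticsearch 6.x still uses a named mapping type, "mytype" here), the same request body can be sketched as a plain Python dict; this is only an illustration of the structure, not part of the sync setup:

```python
import json

# Illustrative only: the same index-creation body as a Python dict.
# "product_info", "mytype" and "location" match the request above.
index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "mytype": {
            "properties": {
                "location": {"type": "geo_point"}  # the field under test
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```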
3. Sync job configuration
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "odps",
      "parameter": {
        "partition": [],
        "datasource": "odps_first",
        "envType": 0,
        "isSupportThreeModel": false,
        "column": ["location1"],
        "tableComment": "",
        "table": "toes"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "elasticsearch",
      "parameter": {
        "actionType": "index",
        "indexType": "mytype",
        "cleanup": false,
        "datasource": "elastic_test",
        "envType": 0,
        "discovery": false,
        "column": [
          { "name": "location", "type": "geo_point" }
        ],
        "index": "product_info",
        "primaryKeyInfo": { "type": "nopk", "fieldDelimiter": "," },
        "dynamic": false,
        "batchSize": 1024,
        "splitter": ","
      },
      "name": "Writer",
      "category": "writer"
    },
    {
      "copies": 1,
      "parameter": { "nodes": [], "edges": [], "groups": [], "version": "2.0" },
      "name": "Processor",
      "category": "processor"
    }
  ],
  "setting": {
    "errorLimit": { "record": "" },
    "locale": "zh",
    "speed": { "throttle": false, "concurrent": 1 }
  },
  "order": {
    "hops": [ { "from": "Reader", "to": "Writer" } ]
  }
}
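The key constraint in this config is that the reader and writer column lists line up positionally: the single source column location1 feeds the single target column location, whose type is declared geo_point. A hypothetical pre-submit sanity check (not a DataWorks feature, just a sketch over a trimmed copy of the config above) could verify that alignment:

```python
# Hypothetical pre-submit check on a trimmed copy of the job config above:
# the reader and writer must declare the same number of columns, and the
# writer column feeding the geo_point field must match the mapping name.
job = {
    "steps": [
        {"stepType": "odps", "category": "reader",
         "parameter": {"table": "toes", "column": ["location1"]}},
        {"stepType": "elasticsearch", "category": "writer",
         "parameter": {"index": "product_info", "indexType": "mytype",
                       "column": [{"name": "location",
                                   "type": "geo_point"}]}},
    ]
}

reader = next(s for s in job["steps"] if s["category"] == "reader")
writer = next(s for s in job["steps"] if s["category"] == "writer")

# Columns map by position, so the counts must match.
assert len(reader["parameter"]["column"]) == len(writer["parameter"]["column"])

# Every geo_point column name should exist in the index mapping ("location").
geo_cols = [c["name"] for c in writer["parameter"]["column"]
            if c["type"] == "geo_point"]
print(geo_cols)
```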
3. Test results
1. Insert test data on the source side
insert into toes values (1, "11.55555,11.11111"), (2, "[22.55555,22.11111]");
select * from toes;
2. Query the data on the target side
GET /product_info/_search
4. Summary
1. As the test above shows, when the source column is of type STRING and the value written is a plain "latitude,longitude" pair of numbers, Elasticsearch recognizes it as a geo_point. Values that wrap the numbers in extra characters such as [] or "" fail to parse: the second test row, "[22.55555,22.11111]", triggers the "latitude must be a number" error quoted in the background section.
2. Elasticsearch Writer reference: https://help.aliyun.com/document_detail/137770.html
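Why the bracketed value fails can be seen by mimicking how a geo_point given as a string is parsed: the string is split on the comma and each half must be a plain number, with the first half taken as latitude. The sketch below is only an illustration of that behavior, not Elasticsearch's actual parser:

```python
def parse_geo_point(value: str) -> dict:
    # Mimics parsing of a geo_point supplied as a "lat,lon" string:
    # split on the comma, then both halves must be plain numbers.
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("geo_point string must be 'lat,lon'")
    try:
        lat = float(parts[0])
    except ValueError:
        # "[22.55555" is not numeric, hence the error seen in the sync job.
        raise ValueError("latitude must be a number") from None
    try:
        lon = float(parts[1])
    except ValueError:
        raise ValueError("longitude must be a number") from None
    return {"lat": lat, "lon": lon}

print(parse_geo_point("11.55555,11.11111"))   # plain numbers: parses fine

try:
    parse_geo_point("[22.55555,22.11111]")    # bracketed: first half fails
except ValueError as e:
    print(e)
```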