
What is incremental MongoDB synchronization in DataWorks?

Solved

Asked by 爱喝咖啡嘿 on 2022-12-09 17:58:08
2 answers
  • Recommended answer
    1. Incremental sync with MongoDB Reader:

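    Scenario: new records are identified by a field named createtime in MongoDB, and each run imports the records whose business time falls between 00:00 and 23:59 of the business date, using this query:

    {createtime :{$gte:ISODate("unknownT00:00:00.000+0800"),$lte:ISODate("unknownT23:59:59.999+0800")}}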
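A minimal Python sketch of that daily window follows. It assumes the business date arrives as a yyyymmdd string (the format of the DataWorks ${bizdate} scheduling parameter), that createtime is stored as a BSON date, and that business time is UTC+8 as in the query above; the helper names daily_window, isodate and daily_window_query are hypothetical. It only builds the query text that would go into the reader's query setting and does not connect to MongoDB.

```python
from datetime import datetime, timedelta, timezone

# Business time zone used by the example query (+0800); an assumption taken from the query text.
CST = timezone(timedelta(hours=8))


def daily_window(bizdate: str) -> tuple[datetime, datetime]:
    """Inclusive [00:00:00.000, 23:59:59.999] window for a yyyymmdd business date."""
    start = datetime.strptime(bizdate, "%Y%m%d").replace(tzinfo=CST)
    end = start + timedelta(days=1) - timedelta(milliseconds=1)
    return start, end


def isodate(dt: datetime) -> str:
    """Render dt as ISODate("YYYY-MM-DDTHH:MM:SS.mmm+0800"), the form used in the query."""
    millis = dt.microsecond // 1000
    return f'ISODate("{dt:%Y-%m-%dT%H:%M:%S}.{millis:03d}{dt:%z}")'


def daily_window_query(bizdate: str, field: str = "createtime") -> str:
    """Build the $gte/$lte filter text for one business day (hypothetical helper)."""
    start, end = daily_window(bizdate)
    return "{" + field + ":{$gte:" + isodate(start) + ",$lte:" + isodate(end) + "}}"


print(daily_window_query("20221209"))
# {createtime:{$gte:ISODate("2022-12-09T00:00:00.000+0800"),$lte:ISODate("2022-12-09T23:59:59.999+0800")}}
```

Because MongoDB stores dates with millisecond precision, ending the window at 23:59:59.999 with $lte covers the whole business day without overlapping the next one.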

    Notes:

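    * Timestamps are not supported.
    
    * The query can be combined with scheduling parameters.
    
    * For details, see the MongoDB query syntax.

    This answer was compiled from the DingTalk group "DataWorks交流群(答疑@机器人)" (DataWorks discussion group, questions answered via @bot).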
    
    2022-12-12 11:40:44
  • Syncing a MongoDB collection incrementally by time:

    {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongo_member",
                    "envType": 0,
                    "cursorTimeoutInMs": "600000",
                    "query": "{'reportTimeStr':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}",
                    "column": [
                        {
                            "name": "_id",
                            "type": "string"
                        },
                        {
                            "name": "_class",
                            "type": "string"
                        },
                        {
                            "name": "did",
                            "type": "string"
                        },
                        {
                            "name": "stage",
                            "type": "string"
                        },
                        {
                            "name": "platform",
                            "type": "string"
                        },
                        {
                            "name": "channel",
                            "type": "string"
                        },
                        {
                            "name": "appVersion",
                            "type": "string"
                        },
                        {
                            "name": "reportWay",
                            "type": "string"
                        },
                        {
                            "name": "reportTime",
                            "type": "long"
                        },
                        {
                            "name": "reportTimeStr",
                            "type": "string"
                        },
                        {
                            "name": "reportDate",
                            "type": "string"
                        }
                    ],
                    "tableComment": "This kind of datasource dosen't support get table comment. This is a comment produced by di.",
                    "batchSize": "1000",
                    "collectionName": "device_install_app_info"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "dt=${bizdate}",
                    "truncate": true,
                    "datasource": "odps_first",
                    "envType": 0,
                    "column": [
                        "id",
                        "class",
                        "did",
                        "stage",
                        "platform",
                        "chanel",
                        "appversion",
                        "reportway",
                        "reporttime",
                        "reporttimestr",
                        "reportdate"
                    ],
                    "emptyAsNull": false,
                    "table": "ods_lx_mg_member_device_install_app_info"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "throttle": false,
                "concurrent": 2
            }
        },
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        }
    }
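To check what this job would actually run for a given day, the ${...} placeholders can be expanded locally before reading the config. The sketch below is a hypothetical helper, not part of DataWorks: it assumes ${bizdate} resolves to the business date as yyyymmdd and that ${last_day} is a custom scheduling parameter resolving to a yyyy-mm-dd date (the format ISODate needs); both values are supplied by hand. It also pairs reader and writer columns by position, since each reader column feeds the writer column at the same index. Only the fields it inspects are copied from the job above.

```python
import json
import re

# Trimmed copy of the job above: only the fields this sketch inspects.
JOB = json.loads("""
{
  "steps": [
    {"category": "reader", "parameter": {
      "query": "{'reportTimeStr':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}",
      "column": [{"name": "_id"}, {"name": "_class"}, {"name": "did"}, {"name": "stage"},
                 {"name": "platform"}, {"name": "channel"}, {"name": "appVersion"},
                 {"name": "reportWay"}, {"name": "reportTime"}, {"name": "reportTimeStr"},
                 {"name": "reportDate"}]}},
    {"category": "writer", "parameter": {
      "partition": "dt=${bizdate}",
      "column": ["id", "class", "did", "stage", "platform", "chanel", "appversion",
                 "reportway", "reporttime", "reporttimestr", "reportdate"]}}
  ]
}
""")

# Matches ${name} only, so MongoDB operators such as $gte are left untouched.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand(text: str, params: dict[str, str]) -> str:
    """Replace every ${name} with params[name]; raises KeyError if a parameter is missing."""
    return PLACEHOLDER.sub(lambda m: params[m.group(1)], text)


def column_pairs(job: dict) -> list[tuple[str, str]]:
    """Pair reader and writer columns by position (reader column i feeds writer column i)."""
    steps = {s["category"]: s["parameter"] for s in job["steps"]}
    src = [c["name"] for c in steps["reader"]["column"]]
    dst = steps["writer"]["column"]
    if len(src) != len(dst):
        raise ValueError(f"{len(src)} reader columns vs {len(dst)} writer columns")
    return list(zip(src, dst))


params = {"bizdate": "20221209", "last_day": "2022-12-09"}  # example values only
reader, writer = JOB["steps"][0]["parameter"], JOB["steps"][1]["parameter"]
print(expand(reader["query"], params))
print(expand(writer["partition"], params))
for src, dst in column_pairs(JOB):
    print(f"{src} -> {dst}")
```

Listing the pairs side by side makes naming differences easy to spot, for example channel -> chanel; whether chanel is the real column name in ods_lx_mg_member_device_install_app_info cannot be told from the config alone. Also note that MongoDB range operators such as $gte only match values of the same BSON type, so the ISODate filter selects documents only if reportTimeStr is stored as a date in MongoDB, even though the reader declares that column as string.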
    
    2022-12-10 23:56:07
