Logstash: Syncing Multiple MySQL Tables to ES in One Go, Explained in Depth - Alibaba Cloud Developer Community


2019-07-05

Summary: Syncing several tables at once is a common development requirement, so I have written it up again.

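Foreword

Syncing several tables in one go is a routine development requirement. Some time ago I found a way to do it after a long search, but I never wrote it up in detail.
The day before yesterday a reader asked about it online, which showed that this part still isn't well understood.
I'm writing it up for two reasons:
first, to solve that reader's problem as quickly as possible;
second, to reinforce my own memory so I can pick it up quickly in future product work.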

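1. Sync principle

This is covered in detail in my earlier ES column, so I won't repeat it here. For details, see my column:
深入详解Elasticsearch (Elasticsearch in Depth)
The configuration below was verified with ES 5.4.0 and Logstash 5.4.1.
It has also been confirmed to work with 2.x versions.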

2. Core configuration file

input {
  stdin {
  }

  jdbc {
    # Event type; the output section routes on this value via its if [type] checks
    type => "cxx_article_info"
    # MySQL JDBC connection string; the trailing cxxwb is the MySQL database name
    jdbc_connection_string => "jdbc:mysql://110.10.15.37:3306/cxxwb"
    # The user to execute the statement as
    jdbc_user => "root"
    jdbc_password => "xxxxx"

    # Incremental sync: track the id column and resume from the last value seen
    record_last_run => true
    use_column_value => true
    tracking_column => "id"
    # File where the last tracked value (:sql_last_value) is persisted
    last_run_metadata_path => "/opt/logstash/bin/logstash_xxy/cxx_info"
    # false: keep the saved :sql_last_value across restarts
    clean_run => false

    # Path to the downloaded JDBC driver
    jdbc_driver_library => "/opt/elasticsearch/lib/mysql-connector-java-5.1.38.jar"
    # Driver class name for MySQL
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Fetch results in pages of 500 rows
    jdbc_paging_enabled => true
    jdbc_page_size => 500
    statement => "select * from cxx_article_info where id > :sql_last_value"
    # Cron-style schedule; fields from left to right: minute, hour, day of month, month, day of week.
    # All * means the query runs once every minute.
    schedule => "* * * * *"
  }

  jdbc {
    # Event type; the output section routes on this value via its if [type] checks
    type => "cxx_user"
    # MySQL JDBC connection string; the trailing cxxwb is the MySQL database name
    jdbc_connection_string => "jdbc:mysql://110.10.15.37:3306/cxxwb"
    # The user to execute the statement as
    jdbc_user => "root"
    jdbc_password => "xxxxxx"

    # Incremental sync: track the id column and resume from the last value seen
    record_last_run => true
    use_column_value => true
    tracking_column => "id"
    # Separate metadata file per table, so each table keeps its own :sql_last_value
    last_run_metadata_path => "/opt/logstash/bin/logstash_xxy/cxx_user_info"
    clean_run => false

    # Path to the downloaded JDBC driver
    jdbc_driver_library => "/opt/elasticsearch/lib/mysql-connector-java-5.1.38.jar"
    # Driver class name for MySQL
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_paging_enabled => true
    jdbc_page_size => 500
    statement => "select * from cxx_user_info where id > :sql_last_value"
    # Alternatively, read the SQL from a file given by its absolute path:
    #statement_filepath => "/opt/logstash/bin/logstash_mysql2es/department.sql"
    # Cron-style schedule: minute, hour, day of month, month, day of week; all * = every minute
    schedule => "* * * * *"
  }

}

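filter {
  # Convert publish_time to a string before the date filter parses it
  mutate {
    convert => [ "publish_time", "string" ]
  }

  # Parse publish_time into @timestamp; timezone is the zone the stored
  # values are interpreted in (adjust it to match your MySQL data)
  date {
    timezone => "Europe/Berlin"
    match => [ "publish_time", "ISO8601", "yyyy-MM-dd HH:mm:ss" ]
  }

  # Alternative (disabled): parse millisecond timestamps and drop the original field
  #date {
  #  match => [ "publish_time", "yyyy-MM-dd HH:mm:ss,SSS" ]
  #  remove_field => [ "publish_time" ]
  #}

  # Parse JSON text in the message field (e.g. lines typed into stdin);
  # JDBC rows have no message field unless a column is named message
  json {
    source => "message"
    remove_field => ["message"]
  }
}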

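output {

  if [type] == "cxx_article_info" {
    elasticsearch {
      # ES host and port
      hosts => "10.100.11.231:9200"
      # ES index name (user-defined)
      index => "cxx_info_index"
      # Uncomment to use the MySQL id as the document _id, so re-synced rows
      # overwrite existing documents instead of creating duplicates
      # document_id => "%{id}"
    }
  }

  if [type] == "cxx_user" {
    elasticsearch {
      # ES host and port
      hosts => "10.100.11.231:9200"
      # ES index name (user-defined)
      index => "cxx_user_index"
      # Uncomment to use the MySQL id as the document _id
      # document_id => "%{id}"
    }
  }

}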
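The output section sends each event to an index according to its `type` field. The Python sketch below is a simplified model of that routing, not Logstash code. It shows why the `type` strings in input and output must match exactly: an event whose type matches no `if` branch reaches no output and is silently not indexed. `ROUTES` and `route` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model of the output conditionals: event type -> ES index name.
ROUTES = {
    "cxx_article_info": "cxx_info_index",
    "cxx_user": "cxx_user_index",
}


def route(events: List[dict], routes: Dict[str, str] = ROUTES) -> Tuple[dict, list]:
    """Group events by target index.

    Events whose type matches no route are returned as dropped, mirroring an
    event that satisfies none of the `if [type] == ...` branches.
    """
    indexed = defaultdict(list)
    dropped = []
    for event in events:
        index = routes.get(event.get("type"))
        if index is None:
            dropped.append(event)
        else:
            indexed[index].append(event)
    return dict(indexed), dropped


if __name__ == "__main__":
    events = [
        {"type": "cxx_article_info", "id": 1},
        {"type": "cxx_user", "id": 1},
        {"message": "typed into stdin"},  # no type is set for stdin in this config
    ]
    indexed, dropped = route(events)
    print(indexed)
    print(dropped)
    # A single misspelled type string is enough to drop every article event:
    print(route(events, {"cxxarticle_info": "cxx_info_index"}))
```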

3. Successful sync output

[2017-07-19T15:08:05,438][INFO ][logstash.pipeline ] Pipeline main started
The stdin plugin is now waiting for input:
[2017-07-19T15:08:05,491][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2017-07-19T15:09:00,721][INFO ][logstash.inputs.jdbc ](0.007000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 0) AS `t1` LIMIT 1
[2017-07-19T15:09:00,721][INFO ][logstash.inputs.jdbc ](0.008000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 0) AS `t1` LIMIT 1
[2017-07-19T15:09:00,730][INFO ][logstash.inputs.jdbc ](0.004000s) SELECT * FROM (select * from cxx_user_info where id > 0) AS `t1` LIMIT 500 OFFSET 0
[2017-07-19T15:09:00,731][INFO ][logstash.inputs.jdbc ](0.007000s) SELECT * FROM (select * from cxx_article_info where id > 0) AS `t1` LIMIT 500 OFFSET 0
[2017-07-19T15:10:00,173][INFO ][logstash.inputs.jdbc ](0.002000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 3) AS `t1` LIMIT 1
[2017-07-19T15:10:00,174][INFO ][logstash.inputs.jdbc ](0.003000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 2) AS `t1` LIMIT 1
[2017-07-19T15:11:00,225][INFO ][logstash.inputs.jdbc ](0.001000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 3) AS `t1` LIMIT 1
[2017-07-19T15:11:00,225][INFO ][logstash.inputs.jdbc ](0.002000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 2) AS `t1` LIMIT 1
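The log lines above show two mechanisms at work. Because `jdbc_paging_enabled` is on, each run first counts the matching rows. It then fetches them `jdbc_page_size` rows at a time with `LIMIT ... OFFSET ...`, and runs no fetch at all when the count is 0, as at 15:10 and 15:11. Because of `use_column_value` and `tracking_column => "id"`, `:sql_last_value` starts at 0 and is then replaced by the last tracked id: 3 for articles and 2 for users.

The Python sketch below replays this against an in-memory list of rows. It is a simplified model under stated assumptions, not the plugin's implementation:

* `run_once` and the sample rows are hypothetical.
* Rows are assumed to come back in ascending id order.
* The SQL strings are formatted only to mirror the logged queries.

```python
from typing import List, Tuple


def run_once(table: str, rows: List[dict], sql_last_value: int,
             page_size: int = 500) -> Tuple[List[str], List[dict], int]:
    """Simulate one scheduled run of a paged, incrementally tracked jdbc input.

    Returns (queries issued, rows fetched, new sql_last_value).
    Assumption: matching rows are fetched in ascending id order.
    """
    statement = f"select * from {table} where id > {sql_last_value}"
    # Paging starts with a count query, as in the log above.
    queries = [f"SELECT count(*) AS `count` FROM ({statement}) AS `t1` LIMIT 1"]
    matching = sorted((r for r in rows if r["id"] > sql_last_value),
                      key=lambda r: r["id"])
    fetched: List[dict] = []
    # One LIMIT/OFFSET query per page; none when nothing matches.
    for offset in range(0, len(matching), page_size):
        queries.append(
            f"SELECT * FROM ({statement}) AS `t1` LIMIT {page_size} OFFSET {offset}")
        fetched.extend(matching[offset:offset + page_size])
    # tracking_column => "id": remember the id of the last row fetched.
    new_last_value = fetched[-1]["id"] if fetched else sql_last_value
    return queries, fetched, new_last_value


if __name__ == "__main__":
    articles = [{"id": i} for i in (1, 2, 3)]  # hypothetical table contents
    last = 0  # a numeric sql_last_value starts at 0 on the first run
    for run in ("first run", "second run"):
        queries, fetched, last = run_once("cxx_article_info", articles, last)
        print(run, len(fetched), "rows, sql_last_value =", last)
        for q in queries:
            print("   ", q)
```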

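4. Extensions

1) Syncing more tables simply means adding one more jdbc block (with its own type) to input for each table, and one more type check to output for each table.
For example:

if [type]=="cxx_user"

2) The type set in input must exactly match the type checked by the if conditional in output. In Logstash 5.x, the elasticsearch output also uses this type as the document type in ES.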
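Each extra table only adds one `jdbc` block and one matching output branch. The config can therefore be generated instead of copied by hand, which also guarantees that the `type` strings in input and output are identical.

Below is a minimal Python sketch, assuming every table is synced the same way as in the config above. `TableSpec`, `render_config` and the metadata file naming are hypothetical and not part of Logstash. Host, database, credentials and paths are placeholders to fill in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableSpec:
    event_type: str  # used in both `type =>` and `if [type] == ...`
    table: str       # MySQL table to read
    index: str       # target Elasticsearch index


# Settings shared by every jdbc block; all values are placeholders.
COMMON = [
    'jdbc_connection_string => "jdbc:mysql://HOST:3306/DB"',
    'jdbc_user => "USER"',
    'jdbc_password => "PASSWORD"',
    'jdbc_driver_library => "/path/to/mysql-connector-java.jar"',
    'jdbc_driver_class => "com.mysql.jdbc.Driver"',
    'jdbc_paging_enabled => true',
    'jdbc_page_size => 500',
    'use_column_value => true',
    'tracking_column => "id"',
    'schedule => "* * * * *"',
]


def render_config(specs: List[TableSpec], es_hosts: str = "localhost:9200") -> str:
    """Emit one jdbc input and one matching output branch per table."""
    out = ["input {"]
    for s in specs:
        out.append("  jdbc {")
        out.append(f'    type => "{s.event_type}"')
        out += [f"    {line}" for line in COMMON]
        # One metadata file per table so each keeps its own :sql_last_value.
        out.append(f'    last_run_metadata_path => "/path/to/last_run_{s.table}"')
        out.append(f'    statement => "select * from {s.table} where id > :sql_last_value"')
        out.append("  }")
    out += ["}", "", "output {"]
    for s in specs:
        out.append(f'  if [type] == "{s.event_type}" {{')
        out.append("    elasticsearch {")
        out.append(f'      hosts => "{es_hosts}"')
        out.append(f'      index => "{s.index}"')
        out.append("    }")
        out.append("  }")
    out.append("}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(render_config([
        TableSpec("cxx_article_info", "cxx_article_info", "cxx_info_index"),
        TableSpec("cxx_user", "cxx_user_info", "cxx_user_index"),
    ]))
```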

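Afterword

死磕ES (digging deep into ES): questions and discussion are welcome!

Author: 铭毅天下
When reposting, please credit the source. Original article:
http://blog.csdn.net/laoyang360/article/details/75452953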
