Logstash: Syncing Multiple MySQL Tables to ES in One Pass, an In-Depth Guide - Alibaba Cloud Developer Community


2021-11-08

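Copyright

The content of this article was contributed voluntarily by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and assumes no related legal liability. For the specific rules, see the Alibaba Cloud Developer Community User Service Agreement and the Alibaba Cloud Developer Community Intellectual Property Protection Guidelines. If you find content in this community that you suspect is plagiarized, report it through the infringement complaint form; once verified, the community will remove the infringing content immediately.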


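Summary: Foreword. Syncing several tables at once is a common requirement in development. I spent a long time researching this before finding a method, but never wrote a detailed summary. A fellow blogger asked about it online the day before yesterday, which shows this topic still isn't well understood. I'm writing it up for two reasons: to answer that question quickly, and to reinforce my own memory so I can get up to speed fast in future product work.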

1. How the sync works

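This is covered in detail in my existing ES column, so I won't repeat it here. For details, see my column:

In-Depth Elasticsearch (深入详解Elasticsearch)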

The configuration below was verified successfully with ES 5.4.0 and Logstash 5.4.1.

It has also been confirmed to work with version 2.x.
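The configuration in section 2 rests on one idea: remember the largest id already synced and ask only for rows above it. As a minimal sketch of that idea (not the jdbc plugin's code), the Python below uses an in-memory SQLite table as a stand-in for MySQL, keeps the remembered value in a variable instead of the last_run_metadata_path file, and binds it as a `?` parameter where Logstash substitutes :sql_last_value. The function name incremental_fetch is hypothetical.

```python
import sqlite3


def incremental_fetch(conn, table, last_value):
    """Return rows with id > last_value and the value to remember for the next run.

    Simplified model of a jdbc input with use_column_value => true and
    tracking_column => "id"; this is not the plugin's actual code.
    The table name is trusted here, so it is formatted into the SQL directly.
    """
    rows = conn.execute(
        f"SELECT id, title FROM {table} WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    # Rows are ordered by id, so the last row holds the largest id.
    # If nothing new arrived, the remembered value stays the same.
    next_value = rows[-1][0] if rows else last_value
    return rows, next_value


# In-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cxx_article_info (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO cxx_article_info (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# First run: the remembered value starts at 0, so all three rows are read.
first_rows, last_value = incremental_fetch(conn, "cxx_article_info", 0)
# Next run: only rows with id > 3 are requested, and none exist yet.
second_rows, last_value_after = incremental_fetch(conn, "cxx_article_info", last_value)
# A new row arrives; the following run picks up only that row.
conn.execute("INSERT INTO cxx_article_info (id, title) VALUES (4, 'd')")
third_rows, final_value = incremental_fetch(conn, "cxx_article_info", last_value_after)
```

The ORDER BY in the sketch guarantees that the last row carries the largest id; with a real statement it is worth checking that the tracked column comes back in ascending order.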

2. Core configuration file

input {
  # stdin is handy for manual testing and can be removed in production
  stdin {
  }
  jdbc {
    # Event type; the output conditionals route on it, and by default it is also the ES document type
    type => "cxx_article_info"
    # MySQL JDBC connection string; "cxxwb" at the end is the MySQL database to read from
    jdbc_connection_string => "jdbc:mysql://110.10.15.37:3306/cxxwb"
    # The user the statement is executed as
    jdbc_user => "root"
    jdbc_password => "xxxxx"
    # Save the last tracked value so a restart resumes where it left off
    record_last_run => true
    # Track a column value (id) instead of the time of the last run
    use_column_value => true
    tracking_column => "id"
    # File holding :sql_last_value; each jdbc input needs its own file
    last_run_metadata_path => "/opt/logstash/bin/logstash_xxy/cxx_info"
    # false keeps the saved value; true would reset it and re-read the whole table
    clean_run => false
    # Path to the downloaded JDBC driver
    jdbc_driver_library => "/opt/elasticsearch/lib/mysql-connector-java-5.1.38.jar"
    # Driver class name for MySQL
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Read results in pages of 500 rows
    jdbc_paging_enabled => true
    jdbc_page_size => 500
    # Only rows whose id is greater than the last recorded value
    statement => "select * from cxx_article_info where id > :sql_last_value"
    # Cron-style schedule; fields from left to right: minute, hour, day of month,
    # month, day of week. All "*" runs the query every minute
    schedule => "* * * * *"
  }
  jdbc {
    type => "cxx_user"
    # Same connection and tracking settings as above; only type,
    # last_run_metadata_path and statement differ
    jdbc_connection_string => "jdbc:mysql://110.10.15.37:3306/cxxwb"
    jdbc_user => "root"
    jdbc_password => "xxxxxx"
    record_last_run => true
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "/opt/logstash/bin/logstash_xxy/cxx_user_info"
    clean_run => false
    jdbc_driver_library => "/opt/elasticsearch/lib/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_paging_enabled => true
    jdbc_page_size => 500
    statement => "select * from cxx_user_info where id > :sql_last_value"
    # Alternatively, read the SQL from a file given by its absolute path:
    #statement_filepath => "/opt/logstash/bin/logstash_mysql2es/department.sql"
    schedule => "* * * * *"
  }
}

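filter {
  # Turn publish_time into a string so the date filter can parse it
  mutate {
    convert => { "publish_time" => "string" }
  }
  # Parse publish_time into @timestamp; timezone applies to values that carry
  # no offset, so set it to the timezone the MySQL values are stored in
  date {
    timezone => "Europe/Berlin"
    match => ["publish_time", "ISO8601", "yyyy-MM-dd HH:mm:ss"]
  }
  # Alternative with milliseconds that also drops the source field:
  #date {
  #  match => [ "publish_time", "yyyy-MM-dd HH:mm:ss,SSS" ]
  #  remove_field => [ "publish_time" ]
  #}
  # Parse a JSON "message" field (e.g. typed into stdin) and drop the raw text;
  # events from the jdbc inputs have no "message" field and pass through unchanged
  json {
    source => "message"
    remove_field => ["message"]
  }
}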

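output {
  # The string compared here must match the type set in the jdbc input exactly
  if [type] == "cxx_article_info" {
    elasticsearch {
      # ES host and port
      hosts => "10.100.11.231:9200"
      # ES index name (user-defined)
      index => "cxx_info_index"
      # Use the table's auto-increment id as the document ID; uncommenting this
      # makes a re-synced row overwrite its document instead of adding a duplicate
      # document_id => "%{id}"
    }
  }
  if [type] == "cxx_user" {
    elasticsearch {
      hosts => "10.100.11.231:9200"
      index => "cxx_user_index"
      # document_id => "%{id}"
    }
  }
}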
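To see why the type strings must match, here is a minimal Python model of the output section (a simplification, not Logstash code): each event carries the type its jdbc input assigned, and the index is chosen by comparing that type against the strings in the if conditions. The helper route and the event dicts are hypothetical. In Logstash, an event that matches no conditional reaches no output at all, so a one-character mismatch such as cxxarticle_info versus cxx_article_info silently keeps that table's rows out of Elasticsearch.

```python
def route(event, branches):
    """Return the index an event is written to, or None if no branch matches.

    branches maps the string compared in `if [type] == "..."` to the index name.
    """
    return branches.get(event.get("type"))


# Branches whose compared strings match the types set by the jdbc inputs.
matching = {"cxx_article_info": "cxx_info_index", "cxx_user": "cxx_user_index"}
# The same branches with the first comparison misspelled as "cxxarticle_info".
misspelled = {"cxxarticle_info": "cxx_info_index", "cxx_user": "cxx_user_index"}

# Events as the two jdbc inputs would tag them (fields trimmed for brevity).
article_event = {"type": "cxx_article_info", "id": 1}
user_event = {"type": "cxx_user", "id": 1}

print(route(article_event, matching))    # the article index
print(route(article_event, misspelled))  # None: the event is written nowhere
```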


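3. Successful sync results

The first scheduled run queries each table with id > 0 and reads the rows in one page. On the following runs the condition becomes id > 3 for cxx_article_info and id > 2 for cxx_user_info, i.e. the last id recorded for each table, so only rows added since the previous run are requested.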

[2017-07-19T15:08:05,438][INFO ][logstash.pipeline ] Pipeline main started

The stdin plugin is now waiting for input:

[2017-07-19T15:08:05,491][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

[2017-07-19T15:09:00,721][INFO ][logstash.inputs.jdbc ] (0.007000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 0) AS `t1` LIMIT 1

[2017-07-19T15:09:00,721][INFO ][logstash.inputs.jdbc ] (0.008000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 0) AS `t1` LIMIT 1

[2017-07-19T15:09:00,730][INFO ][logstash.inputs.jdbc ] (0.004000s) SELECT * FROM (select * from cxx_user_info where id > 0) AS `t1` LIMIT 500 OFFSET 0

[2017-07-19T15:09:00,731][INFO ][logstash.inputs.jdbc ] (0.007000s) SELECT * FROM (select * from cxx_article_info where id > 0) AS `t1` LIMIT 500 OFFSET 0

[2017-07-19T15:10:00,173][INFO ][logstash.inputs.jdbc ] (0.002000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 3) AS `t1` LIMIT 1

[2017-07-19T15:10:00,174][INFO ][logstash.inputs.jdbc ] (0.003000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 2) AS `t1` LIMIT 1

[2017-07-19T15:11:00,225][INFO ][logstash.inputs.jdbc ] (0.001000s) SELECT count(*) AS `count` FROM (select * from cxx_article_info where id > 3) AS `t1` LIMIT 1

[2017-07-19T15:11:00,225][INFO ][logstash.inputs.jdbc ] (0.002000s) SELECT count(*) AS `count` FROM (select * from cxx_user_info where id > 2) AS `t1` LIMIT 1
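With jdbc_paging_enabled => true, each scheduled run in the log above issues a count query and then page queries using LIMIT and OFFSET; the later runs show only the count query, presumably because no new rows were found. The Python sketch below rebuilds the query strings of that pattern so the paging can be reasoned about. It is an assumption-laden model, not the plugin's implementation: paged_queries is a hypothetical helper, the row count is passed in instead of queried, and the last value is pasted in as text rather than bound as a parameter.

```python
import math


def paged_queries(statement, last_value, row_count, page_size=500):
    """Build the SQL strings for one scheduled run with paging enabled.

    row_count stands for what the count query would return; here it is an input.
    """
    # Simplification: substitute the value as text instead of binding it.
    sql = statement.replace(":sql_last_value", str(last_value))
    queries = [f"SELECT count(*) AS `count` FROM ({sql}) AS `t1` LIMIT 1"]
    # One page query per page_size rows, advancing OFFSET each time.
    for page in range(math.ceil(row_count / page_size)):
        queries.append(
            f"SELECT * FROM ({sql}) AS `t1` LIMIT {page_size} OFFSET {page * page_size}"
        )
    return queries


STATEMENT = "select * from cxx_user_info where id > :sql_last_value"

# A first run with two new rows: one count query plus a single page.
for query in paged_queries(STATEMENT, 0, 2):
    print(query)
```

With jdbc_page_size => 500 and, say, 1,200 new rows, the sketch yields page queries at OFFSET 0, 500 and 1000. Each page re-runs the inner select with a larger OFFSET, which is worth keeping in mind for very large tables.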

4. Extensions

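1) Syncing more tables simply means adding more jdbc inputs, each with its own type, to input, and more type-based conditionals to output.

For example: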

if [type]=="cxx_user"

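2) The type set in input and the type tested by the if conditions in output must be **exactly the same**. This type also corresponds to the document type in ES.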
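Both rules can be applied mechanically. The Python sketch below (hypothetical helpers, not part of Logstash) generates the input and output sections from one list of (type, table, index) tuples, so each type string is written once and used on both sides, and check_types flags types that appear on only one side. Assumptions: the connection settings are the ones from section 2, each table gets a metadata file named after it, settings left at their defaults (record_last_run, clean_run) and the filter section are omitted, and check_types uses regular expressions rather than a real config parser, so it ignores types set by other means and would also count a commented-out if line.

```python
import re

# Shared connection settings, copied from the configuration in section 2.
COMMON = {
    "jdbc_connection_string": "jdbc:mysql://110.10.15.37:3306/cxxwb",
    "jdbc_user": "root",
    "jdbc_password": "xxxxx",
    "jdbc_driver_library": "/opt/elasticsearch/lib/mysql-connector-java-5.1.38.jar",
    "jdbc_driver_class": "com.mysql.jdbc.Driver",
}


def render(value):
    """Render a Python value as a Logstash config value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return f'"{value}"'


def build_pipeline(tables, es_host="10.100.11.231:9200"):
    """tables: list of (type_name, table_name, index_name) tuples."""
    inputs, outputs = [], []
    for type_name, table, index in tables:
        settings = dict(COMMON)
        settings.update({
            "type": type_name,
            "use_column_value": True,
            "tracking_column": "id",
            # Hypothetical naming: one metadata file per table.
            "last_run_metadata_path": f"/opt/logstash/bin/logstash_xxy/{table}",
            "jdbc_paging_enabled": True,
            "jdbc_page_size": 500,
            "statement": f"select * from {table} where id > :sql_last_value",
            "schedule": "* * * * *",
        })
        body = "\n".join(f"    {key} => {render(val)}" for key, val in settings.items())
        inputs.append(f"  jdbc {{\n{body}\n  }}")
        # The output branch reuses the same type_name, so the two cannot drift apart.
        outputs.append(
            f'  if [type] == "{type_name}" {{\n'
            "    elasticsearch {\n"
            f'      hosts => "{es_host}"\n'
            f'      index => "{index}"\n'
            "    }\n"
            "  }"
        )
    return "input {\n" + "\n".join(inputs) + "\n}\noutput {\n" + "\n".join(outputs) + "\n}\n"


# Regex-based check: `type => "..."` lines versus `[type] == "..."` comparisons.
INPUT_TYPE = re.compile(r'^\s*type\s*=>\s*"([^"]+)"', re.MULTILINE)
OUTPUT_TYPE = re.compile(r'\[type\]\s*==\s*"([^"]+)"')


def check_types(config_text):
    """Return (input types with no output branch, output branches no input sets)."""
    input_types = set(INPUT_TYPE.findall(config_text))
    output_types = set(OUTPUT_TYPE.findall(config_text))
    return input_types - output_types, output_types - input_types


config = build_pipeline([
    ("cxx_article_info", "cxx_article_info", "cxx_info_index"),
    ("cxx_user", "cxx_user_info", "cxx_user_index"),
])
print(config)
```

Running check_types on a config whose output tests cxxarticle_info while the input sets cxx_article_info returns ({'cxx_article_info'}, {'cxxarticle_info'}), pointing directly at the mismatch.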

Afterword

Digging deep into ES. Questions and discussion are always welcome!
