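1. Download the installation package
(1) Download URL: https://github.com/zendesk/maxwell/releases/download/v1.29.2/maxwell-1.29.2.tar.gz
Note: Maxwell 1.30.0 and later no longer support JDK 1.8.
(2) Upload the package to the /data/ directory on node hadoop10.
Note: this guide uses a teaching edition of the package. It modifies the original release by adding a parameter for customizing the ts timestamp in Maxwell's output. Use the original release in production.
2) Extract the package to /data/module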
[root@hadoop10 data]# tar -zxvf maxwell-1.29.2.tar.gz -C /data/module/
3) Rename the directory
[root@hadoop10 module]# mv maxwell-1.29.2/ maxwell
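2. Enable the MySQL binlog
In MySQL 5.7 and earlier the binlog is disabled by default (MySQL 8.0 enables it by default). It must be enabled before any synchronization can take place.
1) Edit the MySQL configuration file /etc/my.cnf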
[root@hadoop10 module]# sudo vim /etc/my.cnf
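2) Add the following configuration
[mysqld]
# Server ID
server-id = 1
# Enable the binlog; the value is used as the base name of the binlog files
log-bin=mysql-bin
# Binlog format; Maxwell requires row
binlog_format=row
# Database to record in the binlog; change to match your environment
binlog-do-db=gmall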
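Note: MySQL binlog formats
Statement-based: the binlog records the SQL statement of every write operation, including insert, update, and delete.
Advantage: saves space.
Disadvantage: can cause data inconsistency, for example when an insert statement contains the now() function.
Row-based: the binlog records how each affected row changed after every write operation.
Advantage: keeps the data consistent.
Disadvantage: takes up more space.
Mixed: statement-based by default; when a SQL statement could cause inconsistency, MySQL switches to row-based automatically for that statement.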
3) Restart the MySQL service
[root@hadoop10 module]# sudo systemctl restart mysqld
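Before restarting, it can help to confirm that the three settings Maxwell depends on are actually present in the file. The following is a minimal sketch, not part of Maxwell or MySQL: check_binlog_config is a hypothetical helper name, and it only inspects the text of the file, assuming each setting is written as key=value. After the restart, running SHOW VARIABLES LIKE 'log_bin'; in the mysql client remains the authoritative check.

```shell
#!/bin/bash
# Hypothetical helper: verify that a my.cnf-style file contains the
# binlog settings used in this guide (server-id, log-bin, binlog_format=row).
# It checks the file text only, not the running server.
check_binlog_config() {
    local cnf="$1"
    local missing=0
    # Keys may be written with '-' or '_' and optional spaces around '='.
    grep -Eiq '^[[:space:]]*server[-_]id[[:space:]]*=' "$cnf" \
        || { echo "missing: server-id"; missing=1; }
    grep -Eiq '^[[:space:]]*log[-_]bin[[:space:]]*=' "$cnf" \
        || { echo "missing: log-bin"; missing=1; }
    grep -Eiq '^[[:space:]]*binlog[-_]format[[:space:]]*=[[:space:]]*row[[:space:]]*$' "$cnf" \
        || { echo "missing: binlog_format=row"; missing=1; }
    # Exit status 0 means all three settings were found.
    return "$missing"
}
```

Usage: check_binlog_config /etc/my.cnf && echo "binlog settings look complete"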
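3. Create the database and user required by Maxwell
Maxwell stores some of its runtime data in MySQL, including the binlog position it has reached (Maxwell can resume from that position after a restart). A dedicated database and user therefore need to be created for it in MySQL.
1) Create the database
mysql> CREATE DATABASE maxwell;
2) Lower the MySQL password policy level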
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=4;
3) Create the maxwell user and grant it the required privileges
mysql> CREATE USER 'maxwell'@'%' IDENTIFIED BY 'maxwell';
mysql> GRANT ALL ON maxwell.* TO 'maxwell'@'%';
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%';
4. Configure Maxwell
Create the configuration file from the bundled example
[root@hadoop10 module]# cd /data/module/maxwell/
[root@hadoop10 maxwell]# cp config.properties.example config.properties
Edit the configuration file config.properties
[root@hadoop10 maxwell]# vim config.properties
log_level=info
producer=kafka
kafka.bootstrap.servers=hadoop10:9092,hadoop11:9092,hadoop12:9092
# mysql login info
host=hadoop10
user=maxwell
password=maxwell
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai
Start
[root@hadoop10 maxwell]# /data/module/maxwell/bin/maxwell --config /data/module/maxwell/config.properties --daemon
Stop
[root@hadoop10 logs]# ps -ef | grep maxwell | grep -v grep | awk '{print $2}' | xargs kill -9
Start/stop script
[root@hadoop10 data]# vim mxw.sh
#!/bin/bash
# Start, stop, or restart the Maxwell daemon.
MAXWELL_HOME=/data/module/maxwell

# Return the number of running Maxwell processes as the exit status.
status_maxwell(){
    result=$(ps -ef | grep com.zendesk.maxwell.Maxwell | grep -v grep | wc -l)
    return $result
}

start_maxwell(){
    status_maxwell
    if [[ $? -lt 1 ]]; then
        echo "Starting Maxwell"
        $MAXWELL_HOME/bin/maxwell --config $MAXWELL_HOME/config.properties --daemon
    else
        echo "Maxwell is already running"
    fi
}

stop_maxwell(){
    status_maxwell
    if [[ $? -gt 0 ]]; then
        echo "Stopping Maxwell"
        ps -ef | grep com.zendesk.maxwell.Maxwell | grep -v grep | awk '{print $2}' | xargs kill -9
    else
        echo "Maxwell is not running"
    fi
}

case $1 in
    start )
        start_maxwell
    ;;
    stop )
        stop_maxwell
    ;;
    restart )
        stop_maxwell
        start_maxwell
    ;;
esac
Incremental data synchronization
Start a Kafka consumer
[centos@hadoop11 data]$ cd /data/module/kafka/
[centos@hadoop11 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop10:9092 --topic maxwell
Generate mock data
[root@hadoop10 db_log]# java -jar gmall2020-mock-db-2021-11-14.jar
Observe the consumed messages
4:21","expire_time":"2020-06-14 07:29:21","process_status":null,"tracking_no":null,"parent_order_id":null,"img_url":"http://img.gmall.com/561895.jpg","province_id":25,"activity_reduce_amount":0.00,"coupon_reduce_amount":0.00,"original_total_amount":21568.00,"feight_fee":10.00,"feight_fee_reduce":null,"refundable_time":null},"old":{"order_status":"1002"}}
{"database":"gmall","table":"comment_info","type":"delete","ts":1692886461,"xid":910,"xoffset":1513,"data":{"id":1694707750685208578,"user_id":46,"nick_name":null,"head_img":null,"sku_id":35,"spu_id":12,"order_id":4864,"appraise":"1204","comment_txt":"评论内容:96247597671542921464962121958467368391368811462264","create_time":"2020-06-14 06:46:28","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"delete","ts":1692886461,"xid":910,"xoffset":1514,"data":{"id":1694707750685208579,"user_id":29,"nick_name":null,"head_img":null,"sku_id":16,"spu_id":4,"order_id":4864,"appraise":"1204","comment_txt":"评论内容:21524562541519988486139232626344927334755349232734","create_time":"2020-06-14 06:46:28","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"insert","ts":1692886461,"xid":910,"xoffset":1515,"data":{"id":1694714766954663937,"user_id":77,"nick_name":null,"head_img":null,"sku_id":24,"spu_id":8,"order_id":4871,"appraise":"1204","comment_txt":"评论内容:81894832391192465159233347798193373168591456631712","create_time":"2020-06-14 07:14:21","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"insert","ts":1692886461,"xid":910,"xoffset":1516,"data":{"id":1694714766954663938,"user_id":107,"nick_name":null,"head_img":null,"sku_id":13,"spu_id":4,"order_id":4875,"appraise":"1204","comment_txt":"评论内容:98828465676751552667448416351657496127596596476892","create_time":"2020-06-14 07:14:21","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"insert","ts":1692886461,"xid":910,"xoffset":1517,"data":{"id":1694714766954663939,"user_id":31,"nick_name":null,"head_img":null,"sku_id":29,"spu_id":10,"order_id":4875,"appraise":"1201","comment_txt":"评论内容:22618915673433917239222291514974179548798554226583","create_time":"2020-06-14 07:14:21","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"insert","ts":1692886461,"xid":910,"xoffset":1518,"data":{"id":1694714766954663940,"user_id":23,"nick_name":null,"head_img":null,"sku_id":33,"spu_id":11,"order_id":4875,"appraise":"1204","comment_txt":"评论内容:37817814486627785692489891238841683116736616476751","create_time":"2020-06-14 07:14:21","operate_time":null}}
{"database":"gmall","table":"comment_info","type":"insert","ts":1692886461,"xid":910,"commit":true,"data":{"id":1694714766954663941,"user_id":173,"nick_name":null,"head_img":null,"sku_id":34,"spu_id":12,"order_id":4875,"appraise":"1201","comment_txt":"评论内容:11974345169246911299173692813519639725926945399319","create_time":"2020-06-14 07:14:21","operate_time":null}}
^CProcessed a total of 1520 messages
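With more than a thousand messages scrolling past, a per-table summary is easier to read than the raw stream. The sketch below tallies messages by table and type. count_maxwell_events is a hypothetical helper name, and the sed pattern assumes each message is a single line in which "table" is immediately followed by "type", as in the output above. A real JSON parser would be more robust if one is available.

```shell
#!/bin/bash
# Hypothetical helper: read Maxwell JSON messages on stdin (one per line)
# and print "count table type" for each table/type combination.
count_maxwell_events() {
    # Pull out the table and type values, then count identical pairs.
    sed -n 's/.*"table":"\([^"]*\)","type":"\([^"]*\)".*/\1 \2/p' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2, $3}'
}
```

Usage: pipe the console consumer's output into it, for example bin/kafka-console-consumer.sh --bootstrap-server hadoop10:9092 --topic maxwell --from-beginning --timeout-ms 10000 | count_maxwell_events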
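Full synchronization of historical data
The previous section used Maxwell to synchronize MySQL changes incrementally in real time. Incremental data alone is sometimes not enough: we may need the complete data set in the MySQL database, from the earliest records up to now. To get it, run a one-time full synchronization of the historical data before starting the incremental synchronization.
4.4.1 Maxwell-bootstrap
Maxwell provides a bootstrap feature for full synchronization of historical data. The command is: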
[root@hadoop10 db_log]# /data/module/maxwell/bin/maxwell-bootstrap --database gmall --table user_info --config /data/module/maxwell/config.properties
Error
Caused by: com.mysql.cj.exceptions.InvalidConnectionAttributeException:
The server time zone value 'PDT' is unrecognized or represents more than
one time zone. You must configure either the server or JDBC driver
(via the serverTimezone configuration property) to use a more specifc
time zone value if you want to utilize time zone support.
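Solution
1. Permanently disable the only_full_group_by mode
This change is made in the MySQL configuration file and takes effect after MySQL is restarted.
Step 1: Locate the configuration file /etc/my.cnf (or mysql-server.cnf in the associated configuration directory).
Step 2: Append the following line under [mysqld] in that file: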
sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
2. Change the MySQL time zone
show variables like '%time_zone%';
select now();
set time_zone=SYSTEM;
show VARIABLES LIKE 'sql_mode';
set global time_zone = '+8:00';
set time_zone = '+8:00';
flush privileges;
4.4.2 Bootstrap data format
Data synchronized through bootstrap is output in the following format:
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":191,"login_name":"quekbxbh","nick_name":"茂进","passwd":null,"name":"严茂进","phone_num":"13612348943","email":"quekbxbh@3721.net","head_img":null,"user_level":"1","birthday":"2002-09-14","gender":"M","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":192,"login_name":"ux8tisx","nick_name":"莺莺","passwd":null,"name":"苏莺","phone_num":"13248336874","email":"ux8tisx@live.com","head_img":null,"user_level":"1","birthday":"1991-05-14","gender":null,"create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":193,"login_name":"qzg8x19s1f","nick_name":"阿维","passwd":null,"name":"东方维","phone_num":"13127712698","email":"qzg8x19s1f@googlemail.com","head_img":null,"user_level":"1","birthday":"1971-04-14","gender":"M","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":194,"login_name":"0wqmgs","nick_name":"伊亚","passwd":null,"name":"秦伊亚","phone_num":"13168469868","email":"0wqmgs@163.net","head_img":null,"user_level":"1","birthday":"1999-02-14","gender":"F","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":195,"login_name":"1zlsuk0c3wf","nick_name":"武新","passwd":null,"name":"姜武新","phone_num":"13782252383","email":"1zlsuk0c3wf@googlemail.com","head_img":null,"user_level":"2","birthday":"2003-10-14","gender":null,"create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":196,"login_name":"o57dl6u9lc","nick_name":"柔柔","passwd":null,"name":"夏侯柔","phone_num":"13227125372","email":"o57dl6u9lc@163.com","head_img":null,"user_level":"1","birthday":"1985-01-14","gender":"F","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":197,"login_name":"uawtev5","nick_name":"朋斌","passwd":null,"name":"吴朋斌","phone_num":"13697847725","email":"uawtev5@qq.com","head_img":null,"user_level":"1","birthday":"1973-01-14","gender":"M","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":198,"login_name":"vusu98od","nick_name":"芸芸","passwd":null,"name":"范芸","phone_num":"13421517826","email":"vusu98od@0355.net","head_img":null,"user_level":"1","birthday":"2004-11-14","gender":null,"create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":199,"login_name":"vel6gn","nick_name":"阿信","passwd":null,"name":"宇文信","phone_num":"13661339241","email":"vel6gn@126.com","head_img":null,"user_level":"2","birthday":"1997-04-14","gender":"M","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-insert","ts":1692888570,"data":{
"id":200,"login_name":"4enatv","nick_name":"聪聪","passwd":null,"name":"尹聪","phone_num":"13569193752","email":"4enatv@yeah.net","head_img":null,"user_level":"1","birthday":"1998-06-14","gender":"F","create_time":"2020-06-14 07:14:19","operate_time":null,"status":null}}
{
"database":"gmall","table":"user_info","type":"bootstrap-complete","ts":1692888570,"data":{
}}
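Notes:
1) The first record, with type bootstrap-start, and the last record, with type bootstrap-complete, mark the start and end of the bootstrap and carry no data. Only the bootstrap-insert records in between contain data.
2) All records output by a single bootstrap share the same ts, which is the time the bootstrap started.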
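A consumer of this output usually wants two things: the number of data rows, and confirmation that the bootstrap ran to completion. The sketch below reports both for a file of captured messages. summarize_bootstrap is a hypothetical helper name, and it assumes the messages were saved one JSON record per line.

```shell
#!/bin/bash
# Hypothetical helper: summarize a file of captured bootstrap messages.
# Counts bootstrap-insert records (the only ones that carry row data)
# and reports whether the bootstrap-complete marker was seen.
summarize_bootstrap() {
    local file="$1"
    local rows
    # grep -c exits non-zero when the count is 0, so tolerate that.
    rows=$(grep -Fc '"type":"bootstrap-insert"' "$file" || true)
    if grep -Fq '"type":"bootstrap-complete"' "$file"; then
        echo "rows=$rows complete=yes"
    else
        echo "rows=$rows complete=no"
    fi
}
```

Usage: redirect the console consumer's output to a file (bootstrap.jsonl is an illustrative name), then run summarize_bootstrap bootstrap.jsonl. A result of complete=no suggests the capture stopped before the bootstrap finished.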
Maxwell configuration for the collection pipeline
1) Edit the Maxwell configuration file config.properties
[root@hadoop10 data]# vim /data/module/maxwell/config.properties
2) Set the parameters as follows
log_level=info
producer=kafka
kafka.bootstrap.servers=hadoop10:9092,hadoop11:9092,hadoop12:9092
# kafka topic
kafka_topic=topic_db
# mysql login info
host=hadoop10
user=maxwell
password=maxwell
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai