MySQL Error 1048 奇遇记

本文涉及的产品
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
云数据库 RDS MySQL Serverless,价值2615元额度,1个月
简介:

前提


上周遇到次奇葩的同步错误,error 1048 , 看似是简单的not null导致,但是为什么master可以执行,slave不行呢?为什么5.1的slave可以,5.6的slave不行呢? 带着很多疑问,准备来一窥究竟

[ERROR] Slave SQL: Error 'Column 'type_id' cannot be null' on query. Default database: ''. Query: 'insert into if_dw_stats.da_upload_nh_score_rank_result(city_id,city_name,comm_id,region_name,paid,comm_name_nh,region_id_num,region_id,subregion_id_num,subregion_id,vcuv,vcuv_z,call_vcuv,call_vcuv_z,orders_vcuv,orders_vcuv_z,peitao,peitao_z,result_score,rank,type_id,type_name,pinyin,cal_dt) values (N), 其中N>9000;

这里总结一下我遇到过的错误,分三种情况,虽然都是由于null引起,但是1048才是重点。

  • timestamp字段类型,为什么master执行成功,同步到slave报错?
  • int字段类型,5.1(master)<--- 5.6(slave),同步报错?
  • int字段类型,5.6(master)<--- 5.6(slave),同步报错?

接下来,开始进入主题

场景一

* DB架构: Master(5.1) <-- Slave(5.6)

* 表结构如下 :
dbadmin:abc> desc lc_time;
+-------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type      | Null | Key | Default           | Extra                       |
+-------+-----------+------+-----+-------------------+-----------------------------+
| id    | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+-----------+------+-----+-------------------+-----------------------------+
1 row in set (0.00 sec)

* master

dbadmin:abc> select @@global.explicit_defaults_for_timestamp;
+------------------------------------------+
| @@global.explicit_defaults_for_timestamp |
+------------------------------------------+
|                                        0 |
+------------------------------------------+
1 row in set (0.00 sec)


dbadmin:abc> insert into lc_time values(null);
Query OK, 1 row affected (0.02 sec)

dbadmin:abc> select * from lc_time;
+---------------------+
| id                  |
+---------------------+
| 2014-11-25 13:02:14 |
+---------------------+
1 row in set (0.00 sec)

*slave

dbadmin:abc> select @@global.explicit_defaults_for_timestamp;
+------------------------------------------+
| @@global.explicit_defaults_for_timestamp |
+------------------------------------------+
|                                        1 |
+------------------------------------------+
1 row in set (0.00 sec)

dbadmin:abc> insert into lc_time values(null);
ERROR 1048 (23000): Column 'id' cannot be null


  • 结论:master上explicit_defaults_for_timestamp=0,slave上explicit_defaults_for_timestamp=1,会出现这种错误。
  • 解决方案:

    1. 保证master和slave explicit_defaults_for_timestamp 一致。
    2. 前端过滤掉null。

场景二

* DB架构
        master(5.1)
            |
    -------------------------------------
    |                    |
  slave A(5.1)        slave B(5.6)


* 表结构

dbadmin:abc> show create table abc;
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| abc   | CREATE TABLE `abc` (
  `id` int(11) DEFAULT NULL,
  `id2` int(11) NOT NULL DEFAULT '6'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* 核心参数: master 和 slave A,B 的sql_mode 都是 '';

* 症状:在master上执行一条SQL语句 insert into abc values(1,0),(1,null);
  结果 Slave A 正常, Slave B 报error 1048,Error 'Column 'id2' cannot be null' on query, 这是为什么呢?
  Question1:为什么insert into abc values(1,null)失败?insert into abc values(1,0),(1,null);成功?
  Question2:为什么5.1 slave可以,5.6slave 不行?
  Question3:手动去slave B上执行同样的insert,为什么可以执行成功?
  如果你已经知道为什么,可以忽略下面的分析。

* 分析:
    细心的读者已经发现,第一个问题的答案已经在sql_mode链接中。接下来,测试过程中发现:insert into abc values(1,0),(1,null); 在sql_mode=''的时候,不管是5.1还是5.6都会成功执行。那么问题只有一个,sql_mode出了问题。查看master binlog后发现:在insert语句之前,多了这个可以执行的注释:SET @@session.sql_mode=2097152。我们来看看:

dbadmin:abc> SET @@session.sql_mode=2097152;
Query OK, 0 rows affected (0.00 sec)

dbadmin:abc> select @@session.sql_mode;
+---------------------+
| @@session.sql_mode  |
+---------------------+
| STRICT_TRANS_TABLES |
+---------------------+
1 row in set (0.00 sec)

这下,似乎发现了蛛丝马迹,那么问题又来了。

SET @@session.sql_mode=2097152; 从何而来?是程序写的?还是mysql自带的?
经过一番折腾,定位到此SQL来自java jdbc 。

以下代码摘自 java ConnectionIMPL.java

    private void setupServerForTruncationChecks() throws SQLException {
        if (getJdbcCompliantTruncation()) {
            if (versionMeetsMinimum(5, 0, 2)) {
                String currentSqlMode =
                    this.serverVariables.get("sql_mode");

                boolean strictTransTablesIsSet = StringUtils.indexOfIgnoreCase(currentSqlMode, "STRICT_TRANS_TABLES") != -1;

                if (currentSqlMode == null ||
                        currentSqlMode.length() == 0 || !strictTransTablesIsSet) {
                    StringBuffer commandBuf = new StringBuffer("SET sql_mode='");

                    if (currentSqlMode != null && currentSqlMode.length() > 0) {
                        commandBuf.append(currentSqlMode);
                        commandBuf.append(",");
                    }

                    commandBuf.append("STRICT_TRANS_TABLES'");

                    execSQL(null,  commandBuf.toString(), -1, null,
                            DEFAULT_RESULT_SET_TYPE,
                            DEFAULT_RESULT_SET_CONCURRENCY, false,
                            this.database, null, false);

                    setJdbcCompliantTruncation(false); // server's handling this for us now
                } else if (strictTransTablesIsSet) {
                    // We didn't set it, but someone did, so we piggy back on it
                    setJdbcCompliantTruncation(false); // server's handling this for us now
                }

            }
        }
    }



大致的意思就是:如果sql_mode = ‘’,那么java会调高sql_mode的级别,commandBuf.append("STRICT_TRANS_TABLES'");


ok,这下我们已经知道此set来自java,那么问题又来了。即便设置STRICT_TRANS_TABLES,要出问题,master就会报错了,为啥master是好的,Slave A是好的,却Slave B 同步出错呢?

结果已经很明显,因为Slave B是5.6。说的明显一点就是:
在严格模式下,5.1中可以执行,但是5.6不行,这应该算是5.6安全方面的新特性么?
有兴趣的同学可以自己测试下。
  • 解决方案

    1. 配置java或者修改java源码,让其不要更改mysql的sql_mode
    2. 临时解决方案: insert ignore xxx;
    3. sql_mode的规范。

场景三

  • 来自case when的奇葩错误
* DB架构   Master(5.6)  <--- Slave (5.6)

* sql_mode 都是'';

* 报错如下:

  Replicate_Wild_Ignore_Table: mysql.%,test.%
                   Last_Errno: 1048
                   Last_Error: Error 'Column 'referer' cannot be null' on query. Default database: 'action_db'. Query: 'insert into oplogin_log(`cityId`,`userId`,`userName`,`uri
`,`referer`,`logType`,`logDate`,`ip`,`status`)
              values('','','kyqxmxyt','/login.php?rtn=1','http://xx.com:80/login.php?rtn=' RLIKE (SELECT (CASE WHEN (ORD(MID((SELECT IFNULL(CAST(COUNT(DISTINCT(schema_na
me)) AS CHAR),0x20) FROM INFORMATION_SCHEMA.SCHEMATA),1,1))>50) THEN 0x687474703a2f2f6f70746f6f6c732e616e6a756b652e636f6d3a38302f6c6f67696e2e7068703f72746e3d ELSE 0x28 END)) AND 'ae
WZ'='aeWZ','1','1416020259','114.242.250.192','2') #v1:checklogin@login.php (15) 1416020259'

这条奇葩且牛B的SQL,我来稍微翻译一下,如果INFORMATION_SCHEMA.SCHEMATA 去重后,得到的库名的第一个字符如果是1,返回0,否则返回 null。

将这种SQL稍微转换成简单一点的:

master:abc> desc abc;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | YES  |     | NULL    |       |
| id2   | int(11) | NO   |     | 6       |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

master:abc> select * from abc;
+------+-----+
| id   | id2 |
+------+-----+
|    1 |   0 |
|    1 |   0 |
|    2 |   0 |
|    2 |   0 |
|    1 |   1 |
|    1 |   0 |
+------+-----+
6 rows in set (0.00 sec)

master:abc> select * from lc;
Empty set (0.00 sec)


master:abc> insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end );
Query OK, 1 row affected (0.00 sec)

查看master的binlog如下:
*binlog*
# at 1109
#141125 12:44:51 server id 101082106  end_log_pos 1271 CRC32 0x9ec0ca94         Query   thread_id=28    exec_time=0     error_code=0
SET TIMESTAMP=1416890691/*!*/;
insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end )
/*!*/;


slave:abc> select * from abc;
+------+-----+
| id   | id2 |
+------+-----+
|    1 |   0 |
|    1 |   0 |
|    2 |   0 |
|    2 |   0 |
|    1 |   0 |
|    1 |   0 |
+------+-----+
6 rows in set (0.00 sec)

slave:abc> select * from lc;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)

*slave status*
   Last_SQL_Errno: 1048
   Last_SQL_Error: Error 'Column 'id2' cannot be null' on query. Default database: 'abc'. Query: 'insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end )'


  • 结论

    1. 最终binlog并不是RBR,所以会报错。
    2. 临时解决方案: insert ignore xxx. 然后再用pt-table-checksum && pt-sync等修复。
    3. 禁止case when语句。

Mysql reference 5.6 error code

相关实践学习
基于CentOS快速搭建LAMP环境
本教程介绍如何搭建LAMP环境,其中LAMP分别代表Linux、Apache、MySQL和PHP。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助 &nbsp; &nbsp; 相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
5月前
|
关系型数据库 MySQL
MySQL 报错 [ERROR] [FATAL] InnoDB: Table flags are 0 in the data dictionary but the flags in file
MySQL 报错 [ERROR] [FATAL] InnoDB: Table flags are 0 in the data dictionary but the flags in file
407 0
|
1月前
|
关系型数据库 MySQL
mysql: error while loading shared libraries: libncurses.so.5: cannot open shared object file
mysql: error while loading shared libraries: libncurses.so.5: cannot open shared object file
28 0
|
1月前
|
SQL 关系型数据库 MySQL
SQL Error (2013): Lost connection to MySQL server at 'waiting for initial communication packet', sys...
SQL Error (2013): Lost connection to MySQL server at 'waiting for initial communication packet', sys...
|
7月前
|
关系型数据库 MySQL Linux
【Linux环境】centos安装mysql5.7.26报 ./mysqld: error while loading shared libraries: libaio.so.1: cannot op
【Linux环境】centos安装mysql5.7.26报 ./mysqld: error while loading shared libraries: libaio.so.1: cannot op
129 0
|
4月前
|
关系型数据库 MySQL
ERROR 1449 (HY000): The user specified as a definer (‘mysql.infoschema‘@‘localhost‘) does not exist
ERROR 1449 (HY000): The user specified as a definer (‘mysql.infoschema‘@‘localhost‘) does not exist
26 0
|
7月前
|
关系型数据库 MySQL Devops
docker容器刚启动就停止 — 运行mysql 报错 mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
docker容器刚启动就停止 — 运行mysql 报错 mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
274 0
|
4月前
|
SQL 数据采集 关系型数据库
如何解决MySQL报错 You have an error in your SQL syntax; check the manual that corresponds to your MySQL?
如何解决MySQL报错 You have an error in your SQL syntax; check the manual that corresponds to your MySQL?
|
1天前
|
关系型数据库 MySQL 数据库连接
解决 mysql8.0 ERROR 1045 (28000): Access denied for user ‘ODBC‘@‘localhost‘ (using password: NO)用户访问拒绝
解决 mysql8.0 ERROR 1045 (28000): Access denied for user ‘ODBC‘@‘localhost‘ (using password: NO)用户访问拒绝
15 5
解决 mysql8.0 ERROR 1045 (28000): Access denied for user ‘ODBC‘@‘localhost‘ (using password: NO)用户访问拒绝
|
6天前
|
关系型数据库 MySQL 数据库
MySQL 启动 登录报错Job for mysqld.service failed because the control process exited with error code. See
MySQL 启动 登录报错Job for mysqld.service failed because the control process exited with error code. See
|
12天前
|
SQL 关系型数据库 MySQL
MySQL SQL error: #1271 - Illegal mix of collations for operation ‘UNION‘
MySQL SQL error: #1271 - Illegal mix of collations for operation ‘UNION‘