在实际工作开发过程中,常常会遇到数据库表中存在多条数据重复了,此时我们需要删除重复数据,只保留其中一条有效的数据;
针对这种场景,我们用SQL语句该怎么实现呢?
数据准备
建表语句:
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
`id` int(11) NULL DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
INSERT INTO `test` VALUES (1, '张三');
INSERT INTO `test` VALUES (2, '李四');
INSERT INTO `test` VALUES (4, '张三');
INSERT INTO `test` VALUES (5, '王二');
INSERT INTO `test` VALUES (6, '护具');
INSERT INTO `test` VALUES (7, '无极');
INSERT INTO `test` VALUES (8, '护具');
INSERT INTO `test` VALUES (3, '空气');
INSERT INTO `test` VALUES (9, '王二');
INSERT INTO `test` VALUES (10, '几乎');
commit;
查看重复数据,并筛选
select t.name,count(1) from test t where 1=1 group by t.name ;
使用having语句进行筛选
select t.name,count(1) from test t where 1=1 group by t.name HAVING count(1) >1;
对于重复数据,保留一条数据筛选
select t.name,min(id) as id ,count(1) from test t where 1=1 group by t.name;
删除重复数据
delete from test where id not in (
select min(id) from test t where 1=1 group by t.name ) ;
执行上述SQL语句,发现会报错:
delete from test where id not in (
select min(id) from test t where 1=1 group by t.name )
> 1093 - You can't specify target table 'test' for update in FROM clause
> 时间: 0.004s
导致这一原因的问题是:不能在同一表中查询的数据作为同一表的更新数据。
正确参考SQL:
(1) 创建一张表temp_table存储最终保留的数据。
create table temp_table as SELECT min( id ) as id FROM test t WHERE 1 = 1 GROUP BY t.NAME;
(2) 排除表temp_table中的数据,删除即可。
DELETE FROM test WHERE id NOT IN (SELECT * FROM temp_table);
成功删除重复数据!