开发者社区 问答 正文

检索每个组中的最后一条记录-MySQL

有一个messages包含数据的表,如下所示:

Id Name Other_Columns

1 A A_data_1 2 A A_data_2 3 A A_data_3 4 B B_data_1 5 B B_data_2 6 C C_data_1 如果我运行查询select * from messages group by name,我将得到的结果为:

1 A A_data_1 4 B B_data_1 6 C C_data_1 什么查询将返回以下结果?

3 A A_data_3 5 B B_data_2 6 C C_data_1 即,应返回每个组中的最后一条记录。

目前,这是我使用的查询:

SELECT * FROM (SELECT * FROM messages ORDER BY id DESC) AS x GROUP BY name 但这看起来效率很低。还有其他方法可以达到相同的结果吗?

展开
收起
保持可爱mmm 2020-05-08 10:10:58 471 分享 版权
1 条回答
写回答
取消 提交回答
  • MySQL 8.0现在支持窗口功能,就像几乎所有流行的SQL实现一样。使用这种标准语法,我们可以编写每组最多n个查询:

    WITH ranked_messages AS ( SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn FROM messages AS m ) SELECT * FROM ranked_messages WHERE rn = 1; 以下是我在2009年为此问题写的原始答案:

    我这样写解决方案:

    SELECT m1.* FROM messages m1 LEFT JOIN messages m2 ON (m1.name = m2.name AND m1.id < m2.id) WHERE m2.id IS NULL; 关于性能,一种解决方案可能会更好,这取决于数据的性质。因此,您应该测试两个查询,并使用给定数据库性能最好的查询。

    例如,我有一个StackOverflow August数据转储的副本。我将其用于基准测试。该Posts表中有1,114,357行。它在Macbook Pro 2.40GHz的MySQL 5.0.75上运行。

    我将编写查询以查找给定用户ID(我的用户)的最新帖子。

    首先在子查询中使用@Eric 所示的技术GROUP BY:

    SELECT p1.postid FROM Posts p1 INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid FROM Posts pi GROUP BY pi.owneruserid) p2 ON (p1.postid = p2.maxpostid) WHERE p1.owneruserid = 20860;

    1 row in set (1 min 17.89 sec) 甚至EXPLAIN分析也要花费超过16秒的时间:

    +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 76756 | | | 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where | | 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ 3 rows in set (16.09 sec) 现在用产生同样的查询结果我的技术有LEFT JOIN:

    SELECT p1.postid FROM Posts p1 LEFT JOIN posts p2 ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid) WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

    1 row in set (0.28 sec) 该EXPLAIN分析表明,这两个表都能够使用他们的指标:

    +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index | | 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ 2 rows in set (0.00 sec) 这是我的Posts桌子的DDL :

    CREATE TABLE posts ( PostId bigint(20) unsigned NOT NULL auto_increment, PostTypeId bigint(20) unsigned NOT NULL, AcceptedAnswerId bigint(20) unsigned default NULL, ParentId bigint(20) unsigned default NULL, CreationDate datetime NOT NULL, Score int(11) NOT NULL default '0', ViewCount int(11) NOT NULL default '0', Body text NOT NULL, OwnerUserId bigint(20) unsigned NOT NULL, OwnerDisplayName varchar(40) default NULL, LastEditorUserId bigint(20) unsigned default NULL, LastEditDate datetime default NULL, LastActivityDate datetime default NULL, Title varchar(250) NOT NULL default '', Tags varchar(150) NOT NULL default '', AnswerCount int(11) NOT NULL default '0', CommentCount int(11) NOT NULL default '0', FavoriteCount int(11) NOT NULL default '0', ClosedDate datetime default NULL, PRIMARY KEY (PostId), UNIQUE KEY PostId (PostId), KEY PostTypeId (PostTypeId), KEY AcceptedAnswerId (AcceptedAnswerId), KEY OwnerUserId (OwnerUserId), KEY LastEditorUserId (LastEditorUserId), KEY ParentId (ParentId), CONSTRAINT posts_ibfk_1 FOREIGN KEY (PostTypeId) REFERENCES posttypes (PostTypeId) ) ENGINE=InnoDB;

    2020-05-08 10:11:26
    赞同 展开评论