我有一个拥有6000万个条目的数据库。
每个条目包含:
ID 数据源ID 一些数据 约会时间 我需要从特定月份中选择条目。每个月约有200万个条目。
select * from Entries where time between "2010-04-01 00:00:00" and "2010-05-01 00:00:00" (查询大约需要1.5分钟)
我还想从给定的DataSourceID中选择特定月份的数据。(大约需要20秒)
大约有50-100个不同的DataSourceID。
有没有办法使它更快?我有什么选择?如何优化此数据库/查询?
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。
利用innodb集群主键索引。
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
这将是非常出色的:
create table datasources ( year_id smallint unsigned not null, month_id tinyint unsigned not null, datasource_id tinyint unsigned not null, id int unsigned not null, -- needed for uniqueness data int unsigned not null default 0, primary key (year_id, month_id, datasource_id, id) ) engine=innodb;
select * from datasources where year_id = 2011 and month_id between 1 and 3;
select * from datasources where year_id = 2011 and month_id = 4 and datasouce_id = 100;
-- etc.. 编辑2
忘了我正在运行第一个包含3个月数据的测试脚本。这是一个月的结果:0.34和0.69秒。
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 and datasource_id = 100 order by d.id desc limit 10; +---------+----------+---------------+---------+-------+ | year_id | month_id | datasource_id | id | data | +---------+----------+---------------+---------+-------+ | 2010 | 3 | 100 | 3290330 | 38434 | | 2010 | 3 | 100 | 3290329 | 9988 | | 2010 | 3 | 100 | 3290328 | 25680 | | 2010 | 3 | 100 | 3290327 | 17627 | | 2010 | 3 | 100 | 3290326 | 64508 | | 2010 | 3 | 100 | 3290325 | 14257 | | 2010 | 3 | 100 | 3290324 | 45950 | | 2010 | 3 | 100 | 3290323 | 49986 | | 2010 | 3 | 100 | 3290322 | 2459 | | 2010 | 3 | 100 | 3290321 | 52971 | +---------+----------+---------------+---------+-------+ 10 rows in set (0.34 sec)
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 order by d.id desc limit 10; +---------+----------+---------------+---------+-------+ | year_id | month_id | datasource_id | id | data | +---------+----------+---------------+---------+-------+ | 2010 | 3 | 116 | 3450346 | 42455 | | 2010 | 3 | 116 | 3450345 | 64039 | | 2010 | 3 | 116 | 3450344 | 27046 | | 2010 | 3 | 116 | 3450343 | 23730 | | 2010 | 3 | 116 | 3450342 | 52380 | | 2010 | 3 | 116 | 3450341 | 35700 | | 2010 | 3 | 116 | 3450340 | 20195 | | 2010 | 3 | 116 | 3450339 | 21758 | | 2010 | 3 | 116 | 3450338 | 51378 | | 2010 | 3 | 116 | 3450337 | 34687 | +---------+----------+---------------+---------+-------+ 10 rows in set (0.69 sec) 编辑1
决定使用大约。3年内分布了6000万行。每个查询都是冷运行的,即每个运行单独运行,然后重新启动mysql,清除所有缓冲区且没有查询缓存。
完整的测试脚本可以在这里找到:http : //pastie.org/1723506或以下...
如您所见,即使在我不起眼的桌面上,它也是一个非常出色的架构:)
select count() from datasources; +----------+ | count() | +----------+ | 60306030 | +----------+
select count() from datasources where year_id = 2010; +----------+ | count() | +----------+ | 16691669 | +----------+
select year_id, month_id, count(*) as counter from datasources where year_id = 2010 group by year_id, month_id; +---------+----------+---------+ | year_id | month_id | counter | +---------+----------+---------+ | 2010 | 1 | 1080108 | | 2010 | 2 | 1210121 | | 2010 | 3 | 1160116 | | 2010 | 4 | 1300130 | | 2010 | 5 | 1860186 | | 2010 | 6 | 1220122 | | 2010 | 7 | 1250125 | | 2010 | 8 | 1460146 | | 2010 | 9 | 1730173 | | 2010 | 10 | 1490149 | | 2010 | 11 | 1570157 | | 2010 | 12 | 1360136 | +---------+----------+---------+ 12 rows in set (5.92 sec)
select count(*) as counter from datasources d where d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100;
+---------+ | counter | +---------+ | 30003 | +---------+ 1 row in set (1.04 sec)
explain select d.* from datasources d where d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100 order by d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ | 1 | SIMPLE | d | range | PRIMARY | PRIMARY | 4 | NULL |4451372 | Using where; Using filesort | +----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ 1 row in set (0.00 sec)
select d.* from datasources d where d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+ | year_id | month_id | datasource_id | id | data | +---------+----------+---------------+---------+-------+ | 2010 | 3 | 100 | 3290330 | 38434 | | 2010 | 3 | 100 | 3290329 | 9988 | | 2010 | 3 | 100 | 3290328 | 25680 | | 2010 | 3 | 100 | 3290327 | 17627 | | 2010 | 3 | 100 | 3290326 | 64508 | | 2010 | 3 | 100 | 3290325 | 14257 | | 2010 | 3 | 100 | 3290324 | 45950 | | 2010 | 3 | 100 | 3290323 | 49986 | | 2010 | 3 | 100 | 3290322 | 2459 | | 2010 | 3 | 100 | 3290321 | 52971 | +---------+----------+---------------+---------+-------+ 10 rows in set (0.98 sec)
select count(*) as counter from datasources d where d.year_id = 2010 and d.month_id between 1 and 3;
+---------+ | counter | +---------+ | 3450345 | +---------+ 1 row in set (1.64 sec)
explain select d.* from datasources d where d.year_id = 2010 and d.month_id between 1 and 3 order by d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ | 1 | SIMPLE | d | range | PRIMARY | PRIMARY | 3 | NULL |6566916 | Using where; Using filesort | +----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ 1 row in set (0.00 sec)
select d.* from datasources d where d.year_id = 2010 and d.month_id between 1 and 3 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+ | year_id | month_id | datasource_id | id | data | +---------+----------+---------------+---------+-------+ | 2010 | 3 | 116 | 3450346 | 42455 | | 2010 | 3 | 116 | 3450345 | 64039 | | 2010 | 3 | 116 | 3450344 | 27046 | | 2010 | 3 | 116 | 3450343 | 23730 | | 2010 | 3 | 116 | 3450342 | 52380 | | 2010 | 3 | 116 | 3450341 | 35700 | | 2010 | 3 | 116 | 3450340 | 20195 | | 2010 | 3 | 116 | 3450339 | 21758 | | 2010 | 3 | 116 | 3450338 | 51378 | | 2010 | 3 | 116 | 3450337 | 34687 | +---------+----------+---------------+---------+-------+ 10 rows in set (1.98 sec) 希望这可以帮助 :)来源:stack overflow