过去一年间,对优化器相关论文做了个系统性的学习,把过程中阅读的论文笔记记录在这里,和大家分享,欢迎大家和我一起讨论,纠错补差,共同进步 ~
阅读路线基本遵照了pingcap github上的一个Awesome Database Learning的资料,这个资料非常棒,包含了一些基本的课程 + 书籍,还按照内核中不同模块的不同方面做了分类,非常系统化,尤其是SQL层面非常详尽,正好符合需求,因此阅读基本也是按其中的paper来,并扩展到一些没有涉及的内容,总体目录如下(优化器部分),由于内容较多,主要挑选其中影响力较大的或者最有参考意义的。
Planner framework
- 1979, Access Path Selection in a Relational Database Management System, SIGMOD
- 1979, Query Processing in Main Memory Database Management Systems, VLDB
- 1988, Grammar-like Functional Rules for Representing Query Optimization Alternatives, SIGMOD
- 1993, The Volcano Optimizer Generator- Extensibility and Efficient Search, ICDE
- 1995, The Cascades Framework for Query Optimization, IEEE Data engineering Bulltin
- 1998, An Overview of Query Optimization in Relational Systems, PODS
- 2014, Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD
- 2016, The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database, VLDB
- 2012, Testing the Accuracy of Query Optimizers, DBTest
Transformation
- 1995, Eager Aggregation and Lazy Aggregation, VLDB
- 2000, Parameterized Queries and Nesting Equivalences, Microsoft Research
- 2001, Orthogonal Optimization of Subqueries and Aggregation, SIGMOD
- 2003, WinMagic : Subquery Elimination Using Window Aggregation, SIGMOD
- 2006, Cost-based query transformation in Oracle, VLDB
- 2009, Enhanced subquery optimizations in Oracle, VLDB
- 2015, Unnesting Arbitrary Queries, BTW
- 2017, The Complete Story of Joins, BTW
Join Order Optimization
- 2006, Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products, VLDB
- 2008, Dynamic Programming Strikes Back, SIGMOD
- 2013, On the Correct and Complete Enumeration of the Core Search Space, SIGMOD
- 2018, Adaptive Optimization of Very Large Join Queries, SIGMOD
- 2018, Improving Join Reorderability with Compensation Operators, SIGMOD
Functional Dependency & Physical Properties
- 1996, Fundamental Techniques for Order Optimization, SIGMOD
- 2004, An Efficient Framework for Order Optimization, ICDE
- 2010, Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer, ICDE
- 2011, Accelerating Queries with GroupBy and Join by Group join, VLDB
Cost Model
- 1996, Modelling Costs for a MM-DBMS, in Real-Time Databases
- 1996, SEEKing the truth about ad hoc join costs, IBM Almaden Research Center
- 2014, Approximation Schemes for Many-Objective Query Optimization, SIGMOD
Statistics
Histogram
- 1984, Accurate Estimation of the Number of Tuples Satisfying a Condition, SIGMOD
- 1993, Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results, ACM Trans. on Database Systems
- 1993, Universality of Serial Histograms, VLDB
- 1995, Balancing Histogram Optimality and Practicality for Query Result Size Estimation, SIGMOD
- 1996, Improved Histograms for Selectivity Estimation of Range Predicates, SIGMOD
- 1997, SEEKing the truth about ad hoc join costs, VLDB
- 2003, The History of Histograms, VLDB
- 2009, Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors, VLDB
- 2010, Histograms Reloaded: The Merits of Bucket Diversity, SIGMOD
- 2014, Exploiting Ordered Dictionaries to Efficiently Construct Histograms with Q-Error Guarantees in SAP HANA, SIGMOD
- 2015, How Good Are Query Optimizers, Really?, VLDB
Probabilistic Counting
- 2000, Towards Estimation Error Guarantees for Distinct Values, SIGMOD/PODS
- 2001, Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports, VLDB
- 2005, An Improved Data Stream Summary:The Count-Min Sketch and its ApplicationsJournal of Algorithms
- 2007, New Estimation Algorithms for Streaming Data: Count-min Can Do More
- 2013, HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm, ACM
Others
- 2002, Exploiting Statistics on Query Expressions for Optimization, ACM
- 2017, Adaptive Statistics in Oracle 12c, VLDB
- 2017, Statisticum: Data Statistics Management in SAP HANA, VLDB
- 2019, Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities, SIGMOD
- 2019, Deep Unsupervised Cardinality Estimation, VLDB
- 2020, NeuroCard: One Cardinality Estimator for All Tables, VLDB
Query feedback loop
- 2001, LEO – DB2’s LEarning Optimizer, VLDB
- 2004, Robust Query Processing through Progressive Optimization, SIGMOD
- 2004, Automated Statistics Collection in DB2 UDB, VLDB
- 2008, Optimizer plan change management: improved stability and performance in Oracle 11g, VLDB
- 2015, Adaptive Query Processing in the Looking Glass, CIDR
MPP optimization
- 1995, DB2 Parallel Edition, IBM System Journal
- 2004, Parallel SQL execution in Oracle 10g, SIGMOD
- 2012, Query Optimization in Microsoft SQL Server PDW, ACM
- 2013, Adaptive and big data scale parallel execution in Oracle, VLDB
- 2014, Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD
Benchmark
2013, TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark, tpc.org
2020, Quantifying TPCH Choke Points and Their Optimizations, VLDB