【学习资料】第14期快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介: 大家好,这里是快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法

背景

本章大

1. 聚集函数

常用聚合函数

统计类的聚合函

排序聚合

Hypothetical-Set聚合函数

分析

2. 查询

3. union\union all\except\intersect

4.

5.

优化器如何强制连接顺序

6.

左外

右外

全外

7. 窗口查询

第二章:高SQL用法

1. 聚集函数

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

常用聚合函数

Function

Argument Type(s)

Return Type

Description

array_agg(expression)

any

array of the argument type

input values, including nulls, concatenated into an array

avg(expression)

smallint, int, bigint, real, double precision, numeric, or interval

numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type

the average (arithmetic mean) of all input values

bit_and(expression)

smallint, int, bigint, or bit

same as argument data type

the bitwise AND of all non-null input values, or null if none

bit_or(expression)

smallint, int, bigint, or bit

same as argument data type

the bitwise OR of all non-null input values, or null if none

bool_and(expression)

bool

bool

true if all input values are true, otherwise false

bool_or(expression)

bool

bool

true if at least one input value is true, otherwise false

count(*)

-

bigint

number of input rows

count(expression)

any

bigint

number of input rows for which the value of expression is not null

every(expression)

bool

bool

equivalent to bool_and

json_agg(expression)

any

json

aggregates values as a JSON array

json_object_agg(name,value)

(any, any)

json

aggregates name/value pairs as a JSON object

max(expression)

any array, numeric, string, or date/time type

same as argument type

maximum value of expression across all input values

min(expression)

any array, numeric, string, or date/time type

same as argument type

minimum value of expression across all input values

string_agg(expression,delimiter)

(text, text) or (bytea, bytea)

same as argument types

input values concatenated into a string, separated by delimiter

sum(expression)

smallint, int, bigint, real, double precision, numeric, interval, or money

bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type

sum of expression across all input values

xmlagg(expression)

xml

xml

concatenation of XML values (see alsoSection 9.14.1.7)

上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.

使用sum, array_agg, 当没有行输入, 返回NULL可能有点别扭, 可以使用coalesce来替代NULL,coalesce(sum(x), 0)

coalesce(array_agg(x), '{}'::int[])

例子:

聚合后得到数组, null将计入数组元素

postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);    

array_agg    

------------    

{NULL,1,2}  

(1 row)    

算平均值时不计算null

postgres=# select avg(id) from (values(null),(1),(2)) as t(id);    

       avg            

--------------------    

1.5000000000000000    

(1 row)    

bit|时也不计算NULL

postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);    

bit_and    

---------    

      0  

(1 row)    

postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);    

bit_or    

--------    

     3  

(1 row)    

算布尔逻辑时也不计算NULL

postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);    

bool_and    

----------    

f    

(1 row)    

everybool_and的别名, 实际上是SQL标准中定义的.

postgres=# select every(id) from (values(null),(true),(false)) as t(id);    

every    

-------    

f    

(1 row)    

SQL标准中还定义了anysomebool_or的别名, 但是因为anysome还可以被解释为子查询, 所以在PostgreSQLanysome的布尔逻辑聚合不可用.

postgres=# select any(id) from (values(null),(true),(false)) as t(id);    

ERROR:  syntax error at or near "any"    

LINE 1: select any(id) from (values(null),(true),(false)) as t(id);    

              ^    

postgres=# select some(id) from (values(null),(true),(false)) as t(id);    

ERROR:  syntax error at or near "some"    

LINE 1: select some(id) from (values(null),(true),(false)) as t(id);    

              ^    

bool_or的例子

postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);    

bool_or    

---------    

t    

(1 row)    

计算非空的表达式个数, count带表达式时, 不计算null

postgres=# select count(id) from (values(null),(1),(2)) as t(id);    

count    

-------    

    2  

(1 row)    

计算表达式(含空值)的个数, count(*)计算null, 注意count(*)是一个独立的聚合函数. 请和count(express)区分开来.

postgres=# select count(*) from (values(null),(1),(2)) as t(id);    

count    

-------    

    3  

(1 row)    

postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);    

count    

-------    

    4  

(1 row)    

聚合后得到json, 不带keyjson聚合

postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);    

     json_agg          

---------------------    

[null, true, false]    

(1 row)    

聚合后得到json, keyjson聚合, 注意key不能为null, 否则报错.

postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);    

            json_object_agg                

-----------------------------------------    

{ "a" : null, "b" : true, "c" : false }    

(1 row)    

postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);    

ERROR:  22023: field name must not be null    

LOCATION:  json_object_agg_transfn, json.c:1959    

 计算最大最小值, max, min都不计算null

postgres=# select max(id) from (values(null),(1),(2)) as t(id);    

max    

-----    

  2    

(1 row)    

postgres=# select min(id) from (values(null),(1),(2)) as t(id);    

min    

-----    

  1    

(1 row)    

聚合后得到字符串, 字符串聚合

postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);    

string_agg    

------------    

a***b***c  

(1 row)    

postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);    

 string_agg      

---------------    

digoal***zhou    

(1 row)    

计算总和, sum不计算null, 当所有行都是null, 即没有任何行输入, 返回null.

postgres=# select sum(id) from (values(null),(1),(2)) as t(id);    

sum    

-----    

  3    

(1 row)    

postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);    

sum    

-----    

       

(1 row)    

(1 row)    

聚合后得到xml

postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);    

        xmlagg              

-------------------------    

<foo>digoal</foo><bar/>    

(1 row)    

某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?

支持聚合函数中使用order byPostgreSQL版本可以用如下语法:

postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);    

 string_agg      

---------------    

digoal***zhou    

(1 row)    

postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);    

 string_agg      

---------------    

zhou***digoal    

(1 row)    

统计类的聚合函

Function

Argument Type

Return Type

Description

corr(Y, X)

double precision

double precision

correlation coefficient

covar_pop(Y, X)

double precision

double precision

population covariance

covar_samp(Y, X)

double precision

double precision

sample covariance

regr_avgx(Y, X)

double precision

double precision

average of the independent variable (sum(X)/N)

regr_avgy(Y, X)

double precision

double precision

average of the dependent variable (sum(Y)/N)

regr_count(Y, X)

double precision

bigint

number of input rows in which both expressions are nonnull

regr_intercept(Y, X)

double precision

double precision

y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_r2(Y, X)

double precision

double precision

square of the correlation coefficient

regr_slope(Y, X)

double precision

double precision

slope of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_sxx(Y, X)

double precision

double precision

sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)

regr_sxy(Y, X)

double precision

double precision

sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)

regr_syy(Y, X)

double precision

double precision

sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)

stddev(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

historical alias for stddev_samp

stddev_pop(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

population standard deviation of the input values

stddev_samp(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

sample standard deviation of the input values

variance(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

historical alias for var_samp

var_pop(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

population variance of the input values (square of the population standard deviation)

var_samp(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

sample variance of the input values (square of the sample standard deviation)

相关性统计:

corr, regr_r2

总体|样本方差, 标准方差:

variance, var_pop, var_samp

stddev, stddev_pop, stddev_samp

总体协方差, 样本协方差:

covar_pop, covar_samp

线性回归:

regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相关度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.

排序聚合

Function

Direct Argument Type(s)

Aggregated Argument Type(s)

Return Type

Description

mode() WITHIN GROUP (ORDER BYsort_expression)

-

any sortable type

same as sort expression

returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)

percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)

double precision

double precisionor interval

same as sort expression

continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed

percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)

double precision[]

double precisionor interval

array of sort expression's type

multiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile

percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)

double precision

any sortable type

same as sort expression

discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction

percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression)

double precision[]

any sortable type

array of sort expression's type

multiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile

mode比较好理解, 就是取分组中出现频率最高的值或表达式, 如果最高频率的值有多个, 则随机取一个.

postgres=# create table test(id int, info text);    

CREATE TABLE    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (1,'test2');    

INSERT 0 1    

postgres=# insert into test values (1,'test3');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# select * from test;    

id | info    

----+-------    

 1 | test1  

 1 | test1  

 1 | test2  

 1 | test3  

 2 | test1  

 2 | test1  

 2 | test1  

 3 | test4  

 3 | test4  

 3 | test4  

 3 | test4  

 3 | test4  

(12 rows)    

取出所有数据中, 出现频率最高的info, 有可能是test1也有可能是test4, 因为他们的出现频率一致.

mode的返回结果数据类型和order by后面的表达式一致.

postgres=# select mode() within group (order by info) from test;    

mode    

-------    

test1  

(1 row)    

如果按INFO来分组的话, 取出出现频率最高的info, 实际上这个操作是没有任何意义的, 返回值就是所有记录的info的唯一值.

postgres=# select mode() within group (order by info) from test group by info;    

mode    

-------    

test1  

test2  

test3  

test4  

(4 rows)    

id来分组, 取出组内出现频率最高的info, 这个是有意义的.

postgres=# select mode() within group (order by info) from test group by id;    

mode    

-------    

test1  

test1  

test4  

(3 rows)    

id=1 , 出现频率最高的infotest1. 出现2.

如下:

postgres=# select id,info,count(*) from test group by id,info;    

id | info | count    

----+-------+-------    

 1 | test1 |     2  

 1 | test3 |     1  

 3 | test4 |     5  

 1 | test2 |     1  

 2 | test1 |     3  

(5 rows)    

如果要返回mode()并返回频率次数. 可以使用row_number()窗口来实现. 如下.

postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;    

id | info | cnt    

----+-------+-----    

 1 | test1 |  2    

 2 | test1 |  3    

 3 | test4 |  5    

(3 rows)    

其他, mode的返回结果数据类型和order by后面的表达式一致.

postgres=# select mode() within group (order by id) from test;    

mode    

------    

   3    

(1 row)    

postgres=# select mode() within group (order by id+1) from test;    

mode    

------    

   4    

(1 row)    

另外还有4个函数是和数据分布有关的, 需要指定从01的分布位置. 返回排序后, 在指定分布位置的值或表达式的值.

src/backend/utils/adt/orderedsetaggs.c    

   

   

       if (percentile < 0 || percentile > 1 || isnan(percentile))    

              ereport(ERROR,    

                              (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),    

                               errmsg("percentile value %g is not between 0 and 1",    

                                              percentile)));    

同时还需要注意区分连续分布和离散分布.

postgres=# create table test(id int, info text);    

CREATE TABLE    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test2');    

INSERT 0 1    

postgres=# insert into test values (3,'test2');    

INSERT 0 1    

postgres=# insert into test values (4,'test2');    

INSERT 0 1    

postgres=# insert into test values (5,'test2');    

INSERT 0 1    

postgres=# insert into test values (6,'test2');    

INSERT 0 1    

postgres=# insert into test values (7,'test2');    

INSERT 0 1    

postgres=# insert into test values (8,'test3');    

INSERT 0 1    

postgres=# insert into test values (100,'test3');  

INSERT 0 1    

postgres=# insert into test values (1000,'test4');  

INSERT 0 1    

postgres=# select * from test;    

 id  | info      

------+-------    

   1 | test1  

   2 | test2  

   3 | test2  

   4 | test2  

   5 | test2  

   6 | test2  

   7 | test2  

   8 | test3  

 100 | test3  

1000 | test4  

(10 rows)    

取连续分布的中位数可以用percentile_cont(0.5)来获得.

postgres=# select percentile_cont(0.5) within group (order by id) from test;    

percentile_cont    

-----------------    

            5.5    

(1 row)    

这个5.5是怎么计算来的呢? 参考本文末尾:

 If (CRN = FRN = RN) then the result is    

   (value of expression from row at RN)    

 Otherwise the result is    

   (CRN - RN) * (value of expression for row at FRN) +    

   (RN - FRN) * (value of expression for row at CRN)    

解释:

N = 当前分组的行数= 10

RN = (1+传入参数*(N-1)) = (1+0.5*(10-1)) = 5.5

CRN = ceiling(RN) = 6

FRN = floor(RN) = 5

value of expression for row at FRN : 当前分组内第FRN行的值= 5

value of expression for row at CRN : 当前分组内第CRN行的值= 6

所以最终中位数值:

   (CRN - RN) * (value of expression for row at FRN) +    

   (RN - FRN) * (value of expression for row at CRN) =    

(6-5.5)*(5) + (5.5 - 5)*(6) = 5.5;    

使用info分组:

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    

percentile_cont | info      

-----------------+-------    

              1 | test1    

            4.5 | test2    

             54 | test3    

           1000 | test4    

(4 rows)    

验证这个值4.5 | test2 :

   2 | test2  

   3 | test2  

   4 | test2  

   5 | test2  

   6 | test2  

   7 | test2  

N = 当前分组的行数= 6

RN = (1+传入参数*(N-1)) = (1+0.5*(6-1)) = 3.5

CRN = ceiling(RN) = 4

FRN = floor(RN) = 3

value of expression for row at FRN : 当前分组内第FRN行的值= 4

value of expression for row at CRN : 当前分组内第CRN行的值= 5

所以最终中位数值:

   (CRN - RN) * (value of expression for row at FRN) +    

   (RN - FRN) * (value of expression for row at CRN) =    

(4-3.5)*(4) + (3.5 - 3)*(5) = 4.5;    

当输入参数为数组时, 返回值也是数组, 如下:

postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;    

percentile_cont    

-----------------    

{5.5,1000}  

(1 row)    

接下来看一下稀疏分布:

返回行号大于等于指定百分比的值或表达式值.

例如:

postgres=# select id from test;    

 id    

------    

   1    

   2    

   3    

   4    

   5    

   6    

   7    

   8    

 100    

1000    

(10 rows)    

当前组一共10, 取位置在0.5.即行号>=0.5*10的第一行的值或表达式的值.

postgres=# select percentile_disc(0.5) within group (order by id) from test;    

percentile_disc    

-----------------    

              5    

(1 row)    

postgres=# select percentile_disc(0.5) within group (order by id^2) from test;    

percentile_disc    

-----------------    

             25    

(1 row)    

输入0.11, 表示行号返回>=1.1的第一行的值.

postgres=# select percentile_disc(0.11) within group (order by id) from test;    

percentile_disc    

-----------------    

              2    

(1 row)    

再看个例子

postgres=# select id,info,count(*) over (partition by info) from test;    

 id  | info  | count    

------+-------+-------    

   1 | test1 |     1  

   2 | test2 |     6  

   3 | test2 |     6  

   4 | test2 |     6  

   5 | test2 |     6  

   6 | test2 |     6  

   7 | test2 |     6  

   8 | test3 |     2  

 100 | test3 |     2  

1000 | test4 |     1  

(10 rows)    

取分组的数据, 主要看test2 这个组一共有6, 0.3*6=1.8, 所以它需要取第二行的数据.

postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;    

info  | percentile_disc    

-------+-----------------    

test1 |               1    

test2 |              3    

test3 |               8    

test4 |            1000    

(4 rows)    

注意

最终计算的是表达式的分布数, 而不是计算列值的分布数后再计算表达式.

验证如下:

或者你可以看代码:

postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;    

percentile_cont | info      

-----------------+-------    

              1 | test1    

           20.5 | test2    

           5032 | test3    

        1000000 | test4    

(4 rows)    

   

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    

percentile_cont | info      

-----------------+-------    

              1 | test1    

            4.5 | test2    

             54 | test3    

           1000 | test4    

(4 rows)    

   

postgres=# select 4.5^2;    

     ?column?          

---------------------    

20.2500000000000000    

(1 row)    

   

postgres=# select 54^2;    

?column?    

----------    

    2916  

(1 row)    

Hypothetical-Set聚合函数

函数

返回

描述

rank()

bigint

rank of the current row with gaps; same as row_number of its first peer

dense_rank()

bigint

rank of the current row without gaps; this function counts peer groups

percent_rank()

double precision

relative rank of the current row: (rank - 1) / (total rows - 1)

cume_dist()

double precision

relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

the four ranking functions are defined so that they give the same answer for any two peer rows.

rank 返回值在分组内的等级, 如果值有重复的话, 跳级处理.

dense_rank 返回值在分组内的等级, 如果值有重复的话, 不跳级处理.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至当前记录等级一共有多少行除以本组的总行数.

看一个例子比较明白.

postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;    

 id  | info  | row_number | rank | dense_rank | percent_rank |     cume_dist        

------+-------+------------+------+------------+--------------+-------------------    

   1 | test1 |          1 |   1 |          1 |            0 |                 1    

   2 | test2 |          1 |   1 |          1 |            0 | 0.444444444444444    

   2 | test2 |          2 |   1 |          1 |           0 | 0.444444444444444    

   2 | test2 |          3 |   1 |          1 |            0 | 0.444444444444444    

   2 | test2 |          4 |   1 |          1 |            0 | 0.444444444444444    

   3 | test2 |          5 |   5 |          2 |         0.5 | 0.555555555555556    

   4 | test2 |          6 |   6 |          3 |        0.625 | 0.666666666666667    

   5 | test2 |          7 |   7 |          4 |         0.75 | 0.777777777777778    

   6 | test2 |          8 |   8 |          5 |       0.875 | 0.888888888888889    

   7 | test2 |          9 |   9 |          6 |            1 |                 1    

   8 | test3 |          1 |   1 |          1 |            0 |               0.5    

 100 | test3 |          2 |   2 |          2 |            1 |                 1    

1000 | test4 |          1 |   1 |          1 |            0 |                 1    

(13 rows)    

算法:

info='test2'这个组为例:

   2 | test2 |          1 |   1 |          1 |            0 | 0.444444444444444    

   2 | test2 |          2 |   1 |          1 |            0 | 0.444444444444444    

   2 | test2 |          3 |   1 |          1 |            0 | 0.444444444444444    

   2 | test2 |          4 |   1 |          1 |            0 | 0.444444444444444    

id=2 rankdense_rank都是1.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以

percent_rank = (1-1)/(9-1)=0    

cume_dist = (4)/(9) = 0.444444444444444    

   3 | test2 |          5 |   5 |          2 |          0.5 | 0.555555555555556    

rank = 5, 跳级

dense_rank = 2, 不跳级

percent_rank = (5-1)/(9-1)=0.5    

cume_dist = (5)/(9) = 0.555555555555556    

这些窗口函数的另一种用法, 聚合用法.

Function

Direct Argument Type(s)

Aggregated Argument Type(s)

Return Type

Description

rank(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

bigint

rank of the hypothetical row, with gaps for duplicate rows

dense_rank(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

bigint

rank of the hypothetical row, without gaps

percent_rank(args) WITHIN GROUP (ORDER BYsorted_args)

VARIADIC "any"

VARIADIC "any"

double precision

relative rank of the hypothetical row, ranging from 0 to 1

cume_dist(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

double precision

relative rank of the hypothetical row, ranging from 1/N to 1

这些用法比较奇特, 其实是要返回给定参数在集合中的位置.

例如:

1    

2    

3    

4    

5    

如果我们给一个参数值是2.2, 应该排在以上数据中的第三行.

例子:

postgres=# select * from test order by info,id;    

 id  | info      

------+-------    

   1 | test1  

   2 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

   3 | test2  

   4 | test2  

   5 | test2  

   6 | test2  

   7 | test2  

   8 | test3  

 100 | test3  

1000 | test4  

(13 rows)    

   

postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;  

info  | rank | dense_rank    

-------+------+------------    

test1 |   2 |          2    

test2 |   7 |          4    

test3 |   1 |          1    

test4 |   1 |          1    

(4 rows)    

4.9test1这个分组, 排名第2, 并且这个分组只有1个值, 所以没有gap.

重点关注test2这个组, 这个组有9个值, 其中有4个重复值2, 所以4.9在这里排名需要考虑gap.

rank 返回7, 4.9在这里考虑GAP排名第7

dense_rank 返回4, 4.9在这里不考虑GAP排名第4.

又如:

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;  

info  | rank | dense_rank    

-------+------+------------    

test1 |   2 |          2    

test2 |   7 |          4    

test3 |   1 |          1  

test4 |   1 |          1    

(4 rows)    

postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;  

info  | rank | dense_rank    

-------+------+------------    

test1 |   2 |          2    

test2 |   8 |          5    

test3 |   1 |          1    

test4 |   1 |          1    

(4 rows)    

 

最后要看计算0~1代表位置的聚合函数percent_rankcume_dist.

算法

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

例子1 :

postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;    

info  | rank | dense_rank |   percent_rank    |    cume_dist        

-------+------+------------+-------------------+-------------------    

test1 |   2 |          2 |                 1 |                 1    

test2 |   7 |          4 | 0.666666666666667 |               0.7    

test3 |   1 |          1 |                 0 | 0.333333333333333    

test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

同样以test2为分组, 讲解算法. 4.9插入到这个分组后. 数据应该变成:

   2 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

   3 | test2  

   4 | test2  

   4.9 | test2 # 计算位置    

   5 | test2    

   6 | test2  

   7 | test2  

一共10.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以4.9对应的percent_rank cume_dist 分别为:

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667    

cume_dist = (7)/10 = 0.7    

例子2 :

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;  

info  | rank | dense_rank |   percent_rank    |    cume_dist        

-------+------+------------+-------------------+-------------------    

test1 |   2 |          2 |                 1 |                 1    

test2 |   7 |          4 | 0.666666666666667 |               0.8    

test3 |   1 |          1 |                 0 | 0.333333333333333    

test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

插入计算值5, 数据变成

   2 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

   3 | test2  

   4 | test2  

   5 | test2 # 计算位置, 即参数值    

   5 | test2  

   6 | test2  

   7 | test2  

依旧10. 但是截至当前记录等级一共有多少行? 注意是8.

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667    

cume_dist = (8)/10 = 0.8    

例子3 :

postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;    

info  | rank | dense_rank |   percent_rank    |    cume_dist        

-------+------+------------+-------------------+-------------------    

test1 |   2 |          2 |                 1 |                 1    

test2 |   8 |          5 | 0.777777777777778 |               0.8    

test3 |   1 |          1 |                 0 | 0.333333333333333    

test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

插入计算值5.1, 数据变成:

   2 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

   3 | test2  

   4 | test2  

   5 | test2  

   5.1 | test2 # 计算位置, 即参数值    

   6 | test2  

   7 | test2  

例子4 :

postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;    

info  | rank | dense_rank |   percent_rank    | cume_dist    

-------+------+------------+-------------------+-----------    

test1 |   1 |          1 |                 0 |      0.5    

test2 |   3 |          3 | 0.222222222222222 |       0.4    

test3 |   3 |          3 |                 1 |         1  

test4 |   2 |          2 |                 1 |         1  

(4 rows)    

插入计算值5, 数据变成:

   7 | test2  

   6 | test2  

   5 | test2 # 注意, 这才是计算位置, 即插入位置.    

   5 | test2  

   4 | test2  

   3 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

   2 | test2  

 

多维分析

=> SELECT * FROM items_sold;    
 brand | size | sales    
-------+------+-------    
 Foo   | L    |  10    
 Foo   | M    |  20    
 Bar   | M    |  15    
 Bar   | L    |  5    
(4 rows)    
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());    
 brand | size | sum    
-------+------+-----    
 Foo   |      |  30    
 Bar   |      |  20    
       | L    |  15    
       | M    |  35    
       |      |  50    
(5 rows)    

其中GROUP BY GROUPING SETS ((brand), (size), ());

相当于以下三个group byunion all(未在分组的列以NULL代替。)

group by brand    

group by size    

group by ()    

分组集合除了可以用GROUPING SETS来指定,另外还提供了两个特殊的写法rollupcube.

ROLLUP ( e1, e2, e3, ... )    

代表递减分组,一般用于异构结构的分组如国家,省份,城市,乡镇这样的结构查询。

逐级分组汇总结果,它相当于如下写法:

 

 GROUPING SETS (    

   ( e1, e2, e3, ... ),    

   ...  

   ( e1, e2 )    

   ( e1 )  

   ( ) -- 注意包含全集    

)    

还有一种写法是CUBE

CUBE ( a, b, c )    

cube是任意组合,相当于:

GROUPING SETS (    

   ( a, b, c ),    

   ( a, b   ),    

   ( a,   c ),    

   ( a      ),    

   (   b, c ),    

   (   b    ),    

   (      c ),    

   (        )    -- 注意包含全集  

)    

cuberollup中使用括号可以将多个表达式作为单个表达式来处理:

ROLLUP ( a, (b,c), d )    

递减,相当于

GROUPING SETS (    

   ( a, b, c, d ),    

   ( a, b, c   ),    

   ( a         ),    

   (           )    

)    

CUBE ( (a,b), (c,d) )    

相当于:

GROUPING SETS (    

   ( a, b, c, d ),    

   ( a, b      ),    

   (      c, d ),    

   (           )    

)    

同时cube,rollup,grouping sets还可以混合使用:

GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))  

相当于:

GROUP BY GROUPING SETS (    

 (a,b,c,d), (a,b,c,e),    

 (a,b,d),  (a,b,e),    

 (a,c,d),  (a,c,e),    

 (a,d),    (a,e)    

)    

既然分组聚合相当于多个group by union all,那么我们如何区分当前输出的记录是哪个分组group by的输出呢?

grouping(cols)可以表示未参与聚合的表达式的比特位,并转换为INT输出。

例如:

=> SELECT * FROM items_sold;    

make  | model | sales    

-------+-------+-------    

Foo   | GT    | 10    

Foo   | Tour  | 20    

Bar   | City  | 15    

Bar   | Sport |  5    

(4 rows)    

grouping()中必须包含group by后面的任意或所有列。

=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);    

make  | model | grouping | sum    

-------+-------+----------+-----    

Foo   | GT    |        0 | 10    

Foo   | Tour  |        0 | 20    

Bar   | City  |        0 | 15    

Bar   | Sport |        0 | 5    

Foo  |       |        1 | 30    

Bar  |       |        1 | 20    

      |      |        3 | 50    

(7 rows)    

grouping()中必须包含group by后面的任意或所有列,不能包含未参与聚合的列。

grouping()中的每个表达式用1个比特位表示。

 

postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);    
CREATE TABLE    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        7   -- b'111'::int = 7    
(1 row)    
postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        3  b'11'::int = 3    
(1 row)    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        1  b'1'::int = 1    
(1 row)    
postgres=# insert into tbl values (1,2,3,4,5);    
INSERT 0 1    
postgres=# insert into tbl values (1,2,3,4,6);    
INSERT 0 1    
postgres=# insert into tbl values (2,3,4,5,6);    
INSERT 0 1    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        0    
  1 |     2 |        1    
  2 |     1 |        0    
  2 |     1 |        0    
  2 |     1 |        1    
    |     3 |        1    
  1 |     2 |        1    
    |     2 |        1    
  2 |     1 |        1    
    |     1 |        1    
    |     2 |        0    
    |     2 |        0    
    |     1 |        0    
    |     1 |        0    
(15 rows)    

grouping中包含未参与聚合的列将报错:

postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...    
                                    ^    
postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...    
                                             ^    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        1    
  1 |     2 |        3    
  2 |     1 |        0    
  2 |     1 |        1    
  2 |     1 |        3    
    |     3 |        7    
  1 |     2 |        2    
    |     2 |        6    
  2 |     1 |        2    
    |     1 |        6    
    |     2 |        4    
    |     2 |        5    
    |     1 |        4    
    |     1 |        5    
(15 rows)    

2. 查询

select子查询只能返回一列

postgres=# select (select * from (values (1,2),(2,3)) as t(c1,c2)) , relname, relkind from pg_class;    

ERROR:  subquery must return only one column    

LINE 1: select (select * from (values (1,2),(2,3)) as t(c1,c2)) , re...    

              ^    

select子查询只能返回一条记录

postgres=# select (select * from (values (1),(2)) as t(c1)) , relname, relkind from pg_class;    

ERROR:  more than one row returned by a subquery used as an expression    

子查询可以用在select子句也可以用在源中,还可以用在with, update from语句

 

 

postgres=# select (select * from (values (1),(2)) as t(c1) limit 1) , relname, relkind from pg_class;    
 c1 |                    relname                    | relkind     
----+-----------------------------------------------+---------    
  1 | pg_type                                       | r    
  1 | pg_toast_187550                               | t    
  1 | new_type                                      | c    
  1 | pg_toast_187550_index                         | i    
  1 | test                                          | r    
  1 | pg_toast_187556                               | t    
postgres=# select t.relname from (select * from pg_class limit 1) t , pg_class where t.relname=pg_class.relname;    
 relname     
---------    
 pg_type    
(1 row)    

3. union\union all\except\intersect

两条或者多条查询结果的合并、去重合并、相交、求差。

union all(不去重)

postgres=# select * from (values (1),(1)) t(id) union all select * from (values (2),(2)) t(id);  

id  

----  

 1  

 1  

 2  

 2  

(4 rows)  

union(去重)

postgres=# select * from (values (1),(1)) t(id) union select * from (values (2),(2)) t(id);  

id  

----  

 1  

 2  

(2 rows)  

except(去重)

postgres=# select * from (values (1),(1),(2)) t(id) except select * from (values (2),(2)) t(id);  

id  

----  

 1  

(1 row)  

intersect(去重)

postgres=# select * from (values (1),(1),(2)) t(id) intersect select * from (values (2),(2)) t(id);  

id  

----  

 2  

(1 row)  

 

4.

当某个表或者子句的A字段要和B字段进行关联时,可以使用自关联。

 

postgres=# create table tbl8(c1 int, c2 int, info text);  
CREATE TABLE  
postgres=# insert into tbl8 select generate_series(1,100), generate_series(2,101),md5(random()::text) ;  
INSERT 0 100  
postgres=# select t1.* from tbl8 t1, tbl8 t2 where t1.c1=t2.c2 and t2.c2<10 limit 10;  
 c1 | c2 |               info                 
----+----+----------------------------------  
  2 |  3 | b54fbfd843c2343330ceaa1758882ee4  
  3 |  4 | df999680d87435f2d2d8d5604aca2f1e  
  4 |  5 | 497ad2cfa8acd1f7062bb1bd7aa7a646  
  5 |  6 | ef55bcdc85dfa8d54978e4c2085ec55a  
  6 |  7 | 45f60117d6d577389707b22823a513e5  
  7 |  8 | ebde16ccaeced9a400a1608d591a2bf0  
  8 |  9 | fdeb1505dd02aca33abed20531592302  
  9 | 10 | 010d07f3b9b05b2c36b46440b1dd92aa  
(8 rows)  

5.内连接

内连接,仅仅输出符合连接条件的记录。

digoal=# create table tbl_join_1(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_2(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_3(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_4(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_5(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_6(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_7(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_8(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_9(id int primary key,info text);  
CREATE TABLE  
digoal=# insert into tbl_join_1 select generate_series(1,10),md5(random()::text);  
digoal=# insert into tbl_join_2 select generate_series(1,100),md5(random()::text);  
digoal=# insert into tbl_join_3 select generate_series(1,1000),md5(random()::text);  
digoal=# insert into tbl_join_4 select generate_series(1,10000),md5(random()::text);  
digoal=# insert into tbl_join_5 select generate_series(1,100000),md5(random()::text);  
digoal=# insert into tbl_join_6 select generate_series(1,1000000),md5(random()::text);  
digoal=# insert into tbl_join_7 select generate_series(1,2000000),md5(random()::text);  
digoal=# insert into tbl_join_8 select generate_series(1,3000000),md5(random()::text);  
digoal=# insert into tbl_join_9 select generate_series(1,4000000),md5(random()::text);  


非显示JOIN

postgres=# select t1.* from tbl_join_1 t1, tbl_join_2 t2 where t1.id=t2.id and t2.id=1;  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  

显示JOIN

postgres=# select t1.* from tbl_join_1 t1 join tbl_join_2 t2 on(t1.id=t2.id and t2.id=1);  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  


优化器如何强制连接顺序?

当join_collapse_limit=1时, 显示JOIN的关联按SQL写法进行关联. 例如 :

以下写法非显示关联,无法强制JOIN顺序

digoal=# set join_collapse_limit=1;  
digoal=# explain select t1.info, t5.info from tbl_join_1 t1,  
tbl_join_2 t2,  
tbl_join_3 t3,  
tbl_join_4 t4,  
tbl_join_5 t5,  
tbl_join_6 t6,  
tbl_join_7 t7,  
tbl_join_8 t8,  
tbl_join_9 t9  
where   
t1.id=t2.id and  
t2.id=t3.id and  
t3.id=t4.id and  
t4.id=t5.id and  
t5.id=t6.id and  
t6.id=t7.id and  
t7.id=t8.id and  
t8.id=t9.id and  
t9.id=10000;  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
Time: 8.398 ms  

以下写法显示关联,可以强制JOIN顺序

digoal=# set join_collapse_limit=1;  
digoal=# explain select t1.info, t5.info from   
tbl_join_1 t1 join tbl_join_2 t2 on (t1.id=t2.id)  
join tbl_join_3 t3 on (t2.id=t3.id)  
join tbl_join_4 t4 on (t3.id=t4.id)  
join tbl_join_5 t5 on (t4.id=t5.id)  
join tbl_join_6 t6 on (t5.id=t6.id)  
join tbl_join_7 t7 on (t6.id=t7.id)  
join tbl_join_8 t8 on (t7.id=t8.id)  
join tbl_join_9 t9 on (t8.id=t9.id)  
where t9.id=10000;  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
Time: 0.829 ms  

设置join_collapse_limit=1后, 按照SQL写法进行关联. 执行计划的时间也缩短了.

6. 外连接

左外连接

左边的表,即使没有匹配,也输出

postgres=# create table tab1(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 26.056 ms  
postgres=# create table tab2(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 36.215 ms  
postgres=# insert into tab1 values (1,'test',now());  
INSERT 0 1  
Time: 0.520 ms  
postgres=# insert into tab1 values (2,'test',now());  
INSERT 0 1  
Time: 0.297 ms  
postgres=# insert into tab2 values (2,'test',now());  
INSERT 0 1  
Time: 11.325 ms  
postgres=# insert into tab2 values (3,'test',now());  
INSERT 0 1  
Time: 0.352 ms  
postgres=# select * from tab1;  
 id | info |          crt_time            
----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
  2 | test | 2017-04-11 17:48:32.742795  
(2 rows)  
Time: 0.506 ms  
postgres=# select * from tab2;  
 id | info |          crt_time            
----+------+----------------------------  
  2 | test | 2017-04-11 17:48:39.722821  
  3 | test | 2017-04-11 17:48:41.901834  
(2 rows)  
Time: 0.335 ms  

右表没有被匹配时,输出NULL

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id);  
 id | info |          crt_time          | id | info |          crt_time            
----+------+----------------------------+----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  |    |      |   
  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  
(2 rows)  

过滤在A表但是不在B表的记录。

postgres=# select tab1.* from tab1 left join tab2 on (tab1.id=tab2.id) where tab2.* is null;  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  
postgres=# select * from tab1 where id in (select id from tab1 except select id from tab2);  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  

过外部where filter,可以过滤连接=false的记录

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
 id | info |         crt_time          | id | info | crt_time   
----+------+---------------------------+----+------+----------  
  1 | test | 2017-04-11 17:48:29.37083 |    |      |   
(1 row)  
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
                            QUERY PLAN                              
------------------------------------------------------------------  
 Nested Loop Left Join  (cost=0.00..48.81 rows=36 width=88)  
   Join Filter: (tab1.id = tab2.id)  
   ->  Seq Scan on tab1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Materialize  (cost=0.00..24.16 rows=6 width=44)  
         ->  Seq Scan on tab2  (cost=0.00..24.12 rows=6 width=44)  
               Filter: (id = 1)  
(7 rows)  


join内的条件,不会过滤未连接的记录

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  

id | info |          crt_time          | id | info | crt_time  

----+------+----------------------------+----+------+----------  

 1 | test | 2017-04-11 17:48:29.37083  |   |      |  

 2 | test | 2017-04-11 17:48:32.742795 |    |     |  

(2 rows)  

 

postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  

                            QUERY PLAN                                

---------------------------------------------------------------------  

Hash Left Join (cost=35.42..298.96 rows=1130 width=88)  

  Hash Cond: (tab1.id = tab2.id)  

  Join Filter: (tab1.id = 1)  

  -> Seq Scan on tab1 (cost=0.00..21.30 rows=1130 width=44)

  -> Hash  (cost=21.30..21.30 rows=1130 width=44)  

        -> Seq Scan on tab2 (cost=0.00..21.30 rows=1130 width=44)

(6 rows)  

右外

左表没有被匹配时,输出NULL

postgres=# select * from tab1 right join tab2 on (tab1.id=tab2.id);  

id | info |          crt_time          | id | info |          crt_time            

----+------+----------------------------+----+------+----------------------------  

 2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  

   |     |                           |  3 | test | 2017-04-11 17:48:41.901834  

(2 rows)  

全外

没有被匹配时,输出NULL

postgres=# select * from tab1 full join tab2 on (tab1.id=tab2.id);  

id | info |          crt_time          | id | info |          crt_time            

----+------+----------------------------+----+------+----------------------------  

 1 | test | 2017-04-11 17:48:29.37083  |   |      |  

 2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  

   |     |                           |  3 | test | 2017-04-11 17:48:41.901834  

(3 rows)  

 

 7. 窗口查询

https://www.postgresql.org/docs/9.6/static/functions-window.html

常用窗口函数

Function

Return Type

Description

row_number()

bigint

number of the current row within its partition, counting from 1

rank()

bigint

rank of the current row with gaps; same as row_number of its first peer

dense_rank()

bigint

rank of the current row without gaps; this function counts peer groups

percent_rank()

double precision

relative rank of the current row: (rank - 1) / (total rows - 1)

cume_dist()

double precision

relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

ntile(num_buckets integer)

integer

integer ranging from 1 to the argument value, dividing the partition as equally as possible

lag(value anyelement [, offset integer [, default anyelement ]])

same type as value

returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

lead(value anyelement [, offset integer [, default anyelement ]])

same type as value

returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

first_value(value any)

same type as value

returns value evaluated at the row that is the first row of the window frame

last_value(value any)

same type as value

returns value evaluated at the row that is the last row of the window frame

nth_value(value any, nth integer)

same type as value

returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

其他函数也可用在窗口语法中

例子

CREATE TABLE window_test(id int, name text, subject text, score numeric);  

INSERT INTO window_test VALUES(1,'digoal','数学',99.5),  

(2,'digoal','语文',89.5),  

(3,'digoal','英语',79.5),  

(4,'digoal','物理',99.5),  

(5,'digoal','化学',98.5),  

(6,'刘德华','数学',89.5),  

(7,'刘德华','语文',99.5),  

(8,'刘德华','英语',79.5),  

(9,'刘德华','物理',89.5),  

(10,'刘德华','化学',69.5),  

(11,'张学友','数学',89.5),  

(12,'张学友','语文',91.5),  

(13,'张学友','英语',92.5),  

(14,'张学友','物理',93.5),  

(15,'张学友','化学',94.5);  

-- 取出每门课程的第一名.

SELECT id,name,subject,score FROM  

 (SELECT row_number() OVER (PARTITION BY subject ORDER BY score DESC) AS rn,* FROM window_test) AS t  

WHERE rn=1 ORDER BY SUBJECT;  

id | name  | subject | score  

----+--------+---------+-------  

 5 | digoal | 化学    | 98.5  

 1 | digoal | 数学    | 99.5  

 4 | digoal | 物理    | 99.5  

13 | 张学友| 英语    | 92.5  

 7 | 刘德华| 语文    | 99.5  

每个人每门成绩与第一名的差距,第一名,排名

postgres=# SELECT row_number() over(partition by subject order by score desc) as ord,

id,name,subject,score,max(score) over (partition by subject) as top1,

max(score) over (partition by subject) - score as diff from window_test ;  

ord | id | name  | subject | score | top1 | diff  

-----+----+--------+---------+-------+------+------  

  1 |  5 | digoal | 化学    | 98.5 | 98.5 |  0.0  

  2 | 15 | 张学友| 化学    | 94.5 | 98.5 |  4.0  

  3 | 10 | 刘德华| 化学    | 69.5 | 98.5 | 29.0  

  1 |  1 | digoal | 数学    | 99.5 | 99.5 |  0.0  

  2 | 11 | 张学友| 数学    | 89.5 | 99.5 | 10.0  

  3 |  6 | 刘德华| 数学    | 89.5 | 99.5 | 10.0  

  1 |  4 | digoal | 物理    | 99.5 | 99.5 |  0.0  

  2 | 14 | 张学友| 物理    | 93.5 | 99.5 |  6.0  

  3 |  9 | 刘德华| 物理    | 89.5 | 99.5 | 10.0  

  1 | 13 | 张学友| 英语    | 92.5 | 92.5 |  0.0  

  2 |  3 | digoal | 英语    | 79.5 | 92.5 | 13.0  

  3 |  8 | 刘德华| 英语    | 79.5 | 92.5 | 13.0  

  1 |  7 | 刘德华| 语文    | 99.5 | 99.5 |  0.0  

  2 | 12 | 张学友| 语文    | 91.5 | 99.5 |  8.0  

  3 |  2 | digoal | 语文    |  89.5 | 99.5 | 10.0  

(15 rows)  

 

 

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
2月前
|
SQL 数据库 开发者
MSSQL性能调优实战:索引策略、SQL优化与并发管理深度剖析
在Microsoft SQL Server(MSSQL)的性能调优过程中,索引策略、SQL查询优化以及并发管理是关键的三大支柱
|
18天前
|
SQL 关系型数据库 C语言
PostgreSQL SQL扩展 ---- C语言函数(三)
可以用C(或者与C兼容,比如C++)语言编写用户自定义函数(User-defined functions)。这些函数被编译到动态可加载目标文件(也称为共享库)中并被守护进程加载到服务中。“C语言函数”与“内部函数”的区别就在于动态加载这个特性,二者的实际编码约定本质上是相同的(因此,标准的内部函数库为用户自定义C语言函数提供了丰富的示例代码)
|
28天前
|
SQL 存储 关系型数据库
PostgreSQL核心之SQL基础学习
PostgreSQL核心之SQL基础学习
25 3
|
1月前
|
SQL 安全 关系型数据库
PostgreSQL SQL注入漏洞(CVE-2018-10915)--处理
【8月更文挑战第8天】漏洞描述:PostgreSQL是一款自由的对象关系型数据库管理系统,支持多种SQL标准及特性。存在SQL注入漏洞,源于应用未有效验证外部输入的SQL语句,允许攻击者执行非法命令。受影响版本包括10.5及更早版本等。解决方法为升级PostgreSQL
169 2
|
21天前
|
SQL 关系型数据库 MySQL
SQL Server、MySQL、PostgreSQL:主流数据库SQL语法异同比较——深入探讨数据类型、分页查询、表创建与数据插入、函数和索引等关键语法差异,为跨数据库开发提供实用指导
【8月更文挑战第31天】SQL Server、MySQL和PostgreSQL是当今最流行的关系型数据库管理系统,均使用SQL作为查询语言,但在语法和功能实现上存在差异。本文将比较它们在数据类型、分页查询、创建和插入数据以及函数和索引等方面的异同,帮助开发者更好地理解和使用这些数据库。尽管它们共用SQL语言,但每个系统都有独特的语法规则,了解这些差异有助于提升开发效率和项目成功率。
92 0
|
2月前
|
SQL 数据采集 数据管理
SQL数据:探索、管理与优化的全面解析
在信息化时代,数据成为企业核心资产。本文探讨SQL在数据探索、管理与优化中的作用:使用DESC、SELECT了解数据集;评估数据质量;发现数据特征。管理方面,涵盖数据存储、检索、更新与维护。优化则涉及索引、查询及数据库设计,确保高性能和效率。掌握SQL能有效挖掘数据价值,支持企业决策与创新。
85 1
|
2月前
|
SQL 存储 数据挖掘
SQL数据:挖掘、管理与应用的深度探索
在数据驱动的时代, SQL作为数据库管理和查询的基石至关重要。本文探讨了SQL数据的挖掘、管理与应用。数据挖掘包括数据查询、聚合与关联,帮助发现数据模式和趋势以支持决策。数据管理确保数据的完整性、一致性和可用性,涉及存储、检索、更新和维护。而数据的应用则能推动业务发展、优化运营、提升客户体验和促进创新。通过高效利用SQL,企业可以最大化其数据资产的价值并在竞争中脱颖而出。
46 0
|
2月前
|
SQL DataWorks 安全
DataWorks产品使用合集之在进行数据查询和数据处理时,如何通过数据建模与开发模块来创建和管理SQL脚本
DataWorks作为一站式的数据开发与治理平台,提供了从数据采集、清洗、开发、调度、服务化、质量监控到安全管理的全套解决方案,帮助企业构建高效、规范、安全的大数据处理体系。以下是对DataWorks产品使用合集的概述,涵盖数据处理的各个环节。
|
4天前
|
NoSQL 关系型数据库 MySQL
微服务架构下的数据库选择:MySQL、PostgreSQL 还是 NoSQL?
在微服务架构中,数据库的选择至关重要。不同类型的数据库适用于不同的需求和场景。在本文章中,我们将深入探讨传统的关系型数据库(如 MySQL 和 PostgreSQL)与现代 NoSQL 数据库的优劣势,并分析在微服务架构下的最佳实践。
|
6天前
|
存储 SQL 关系型数据库
使用MySQL Workbench进行数据库备份
【9月更文挑战第13天】以下是使用MySQL Workbench进行数据库备份的步骤:启动软件后,通过“Database”菜单中的“管理连接”选项配置并选择要备份的数据库。随后,选择“数据导出”,确认导出的数据库及格式(推荐SQL格式),设置存储路径,点击“开始导出”。完成后,可在指定路径找到备份文件,建议定期备份并存储于安全位置。
66 11

热门文章

最新文章