【学习资料】第14期快速入门PostgreSQL应用开发与管理

背景

本章大纲

1. 聚集函数

常用聚合函数

统计类的聚合函数

分组排序聚合

Hypothetical-Set聚合函数

多维分析

2. 子查询

3. union\union all\except\intersect

4. 自连接

5. 内连接

优化器如何强制连接顺序？

6. 外连接

左外连接

右外连接

全外连接

7. 窗口查询

第二章：高级SQL用法

1. 聚集函数

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

常用聚合函数

Function	Argument Type(s)	Return Type	Description
array_agg(expression)	any	array of the argument type	input values, including nulls, concatenated into an array
avg(expression)	smallint, int, bigint, real, double precision, numeric, or interval	numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type	the average (arithmetic mean) of all input values
bit_and(expression)	smallint, int, bigint, or bit	same as argument data type	the bitwise AND of all non-null input values, or null if none
bit_or(expression)	smallint, int, bigint, or bit	same as argument data type	the bitwise OR of all non-null input values, or null if none
bool_and(expression)	bool	bool	true if all input values are true, otherwise false
bool_or(expression)	bool	bool	true if at least one input value is true, otherwise false
count(*)	-	bigint	number of input rows
count(expression)	any	bigint	number of input rows for which the value of expression is not null
every(expression)	bool	bool	equivalent to bool_and
json_agg(expression)	any	json	aggregates values as a JSON array
json_object_agg(name,value)	(any, any)	json	aggregates name/value pairs as a JSON object
max(expression)	any array, numeric, string, or date/time type	same as argument type	maximum value of expression across all input values
min(expression)	any array, numeric, string, or date/time type	same as argument type	minimum value of expression across all input values
string_agg(expression,delimiter)	(text, text) or (bytea, bytea)	same as argument types	input values concatenated into a string, separated by delimiter
sum(expression)	smallint, int, bigint, real, double precision, numeric, interval, or money	bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type	sum of expression across all input values
xmlagg(expression)	xml	xml	concatenation of XML values (see alsoSection 9.14.1.7)

上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.

使用sum, array_agg时, 当没有行输入, 返回NULL可能有点别扭, 可以使用coalesce来替代NULL,如coalesce(sum(x), 0)

coalesce(array_agg(x), '{}'::int[])

例子:

聚合后得到数组, null将计入数组元素

postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);

array_agg

------------

{NULL,1,2}

(1 row)

算平均值时不计算null

postgres=# select avg(id) from (values(null),(1),(2)) as t(id);

avg

--------------------

1.5000000000000000

(1 row)

算bit与|或时也不计算NULL

postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);

bit_and

---------

(1 row)

postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);

bit_or

--------

(1 row)

算布尔逻辑时也不计算NULL

postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);

bool_and

----------

(1 row)

every是bool_and的别名, 实际上是SQL标准中定义的.

postgres=# select every(id) from (values(null),(true),(false)) as t(id);

every

-------

(1 row)

SQL标准中还定义了any和some为bool_or的别名, 但是因为any和some还可以被解释为子查询, 所以在PostgreSQL中any和some的布尔逻辑聚合不可用.

postgres=# select any(id) from (values(null),(true),(false)) as t(id);

ERROR: syntax error at or near "any"

LINE 1: select any(id) from (values(null),(true),(false)) as t(id);

postgres=# select some(id) from (values(null),(true),(false)) as t(id);

ERROR: syntax error at or near "some"

LINE 1: select some(id) from (values(null),(true),(false)) as t(id);

bool_or的例子

postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);

bool_or

---------

(1 row)

计算非空的表达式个数, count带表达式时, 不计算null

postgres=# select count(id) from (values(null),(1),(2)) as t(id);

count

-------

(1 row)

计算表达式(含空值)的个数, count(*)计算null, 注意count(*)是一个独立的聚合函数. 请和count(express)区分开来.

postgres=# select count(*) from (values(null),(1),(2)) as t(id);

count

-------

(1 row)

postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);

count

-------

(1 row)

聚合后得到json, 不带key的json聚合

postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);

json_agg

---------------------

[null, true, false]

(1 row)

聚合后得到json, 带key的json聚合, 注意key不能为null, 否则报错.

postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);

json_object_agg

-----------------------------------------

{ "a" : null, "b" : true, "c" : false }

(1 row)

postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);

ERROR: 22023: field name must not be null

LOCATION: json_object_agg_transfn, json.c:1959

计算最大最小值, max, min都不计算null

postgres=# select max(id) from (values(null),(1),(2)) as t(id);

max

-----

(1 row)

postgres=# select min(id) from (values(null),(1),(2)) as t(id);

min

-----

(1 row)

聚合后得到字符串, 字符串聚合

postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);

string_agg

------------

a***b***c

(1 row)

postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);

string_agg

---------------

digoal***zhou

(1 row)

计算总和, sum不计算null, 当所有行都是null时, 即没有任何行输入, 返回null.

postgres=# select sum(id) from (values(null),(1),(2)) as t(id);

sum

-----

(1 row)

postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);

sum

-----

(1 row)

聚合后得到xml

postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);

xmlagg

-------------------------

<foo>digoal</foo><bar/>

(1 row)

某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?

支持聚合函数中使用order by的PostgreSQL版本可以用如下语法:

postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);

string_agg

---------------

digoal***zhou

(1 row)

postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);

string_agg

---------------

zhou***digoal

(1 row)

统计类的聚合函数

Function	Argument Type	Return Type	Description
corr(Y, X)	double precision	double precision	correlation coefficient
covar_pop(Y, X)	double precision	double precision	population covariance
covar_samp(Y, X)	double precision	double precision	sample covariance
regr_avgx(Y, X)	double precision	double precision	average of the independent variable (sum(X)/N)
regr_avgy(Y, X)	double precision	double precision	average of the dependent variable (sum(Y)/N)
regr_count(Y, X)	double precision	bigint	number of input rows in which both expressions are nonnull
regr_intercept(Y, X)	double precision	double precision	y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_r2(Y, X)	double precision	double precision	square of the correlation coefficient
regr_slope(Y, X)	double precision	double precision	slope of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_sxx(Y, X)	double precision	double precision	sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)
regr_sxy(Y, X)	double precision	double precision	sum(XY) - sum(X) sum(Y)/N ("sum of products" of independent times dependent variable)
regr_syy(Y, X)	double precision	double precision	sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)
stddev(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	historical alias for stddev_samp
stddev_pop(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	population standard deviation of the input values
stddev_samp(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	sample standard deviation of the input values
variance(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	historical alias for var_samp
var_pop(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	population variance of the input values (square of the population standard deviation)
var_samp(expression)	smallint, int, bigint, real, double precision, or numeric	double precision for floating-point arguments, otherwise numeric	sample variance of the input values (square of the sample standard deviation)

相关性统计:

corr, regr_r2

总体|样本方差, 标准方差:

variance, var_pop, var_samp

stddev, stddev_pop, stddev_samp

总体协方差, 样本协方差:

covar_pop, covar_samp

线性回归:

regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相关度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.

分组排序聚合

Function	Direct Argument Type(s)	Aggregated Argument Type(s)	Return Type	Description
mode() WITHIN GROUP (ORDER BYsort_expression)	-	any sortable type	same as sort expression	returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)	double precision	double precisionor interval	same as sort expression	continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)	double precision[]	double precisionor interval	array of sort expression's type	multiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)	double precision	any sortable type	same as sort expression	discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression)	double precision[]	any sortable type	array of sort expression's type	multiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile

mode比较好理解, 就是取分组中出现频率最高的值或表达式, 如果最高频率的值有多个, 则随机取一个.

postgres=# create table test(id int, info text);

CREATE TABLE

postgres=# insert into test values (1,'test1');

INSERT 0 1

postgres=# insert into test values (1,'test1');

INSERT 0 1

postgres=# insert into test values (1,'test2');

INSERT 0 1

postgres=# insert into test values (1,'test3');

INSERT 0 1

postgres=# insert into test values (2,'test1');

INSERT 0 1

postgres=# insert into test values (2,'test1');

INSERT 0 1

postgres=# insert into test values (2,'test1');

INSERT 0 1