标签
PostgreSQL , substrb , 字节截取
背景
Oracle的substrb函数,用于基于字节流的截取,需要考虑多字节字符串的编码问题,未截取完整字符,则不截取。
https://docs.oracle.com/cd/B12037_01/olap.101/b10339/x_stddev004.htm
substr则用于基于字符串的截取。
PostgreSQL也可以支持类似的功能。
orafce插件
安装orafce插件,里面包含了大量的oracle兼容函数。
postgres=# \df *.*substrb*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------+------------------+----------------------------+--------
pg_catalog | substrb | varchar2 | varchar2, integer | normal
pg_catalog | substrb | varchar2 | varchar2, integer, integer | normal
(2 rows)
实际上这部分代码在PostgreSQL中已经存在,只是没有创建SQL函数。
src/backend/utils/adt/varlena.c
/*
* bytea_substr()
* Return a substring starting at the specified position.
* Cloned from text_substr and modified as required.
*
* Input:
* - string
* - starting position (is one-based)
* - string length (optional)
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL.
* If the length is less than zero, an ERROR is thrown. If no third argument
* (length) is provided, the length to the end of the string is assumed.
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
PG_RETURN_BYTEA_P(bytea_substring(PG_GETARG_DATUM(0),
PG_GETARG_INT32(1),
PG_GETARG_INT32(2),
false));
}
static bytea *
bytea_substring(Datum str,
int S,
int L,
bool length_not_specified)
{
int S1; /* adjusted start position */
int L1; /* adjusted substring length */
S1 = Max(S, 1);
if (length_not_specified)
{
/*
* Not passed a length - DatumGetByteaPSlice() grabs everything to the
* end of the string if we pass it a negative value for length.
*/
L1 = -1;
}
else
{
/* end position */
int E = S + L;
/*
* A negative value for L is the only way for the end position to be
* before the start. SQL99 says to throw an error.
*/
if (E < S)
ereport(ERROR,
(errcode(ERRCODE_SUBSTRING_ERROR),
errmsg("negative substring length not allowed")));
/*
* A zero or negative value for the end position can happen if the
* start was negative or one. SQL99 says to return a zero-length
* string.
*/
if (E < 1)
return PG_STR_GET_BYTEA("");
L1 = E - S1;
}
/*
* If the start position is past the end of the string, SQL99 says to
* return a zero-length string -- DatumGetByteaPSlice() will do that for
* us. Convert to zero-based starting position
*/
return DatumGetByteaPSlice(str, S1 - 1, L1);
}
postgres=# select octet_length(public.substrb('nihao 中国 abc你好'::varchar,2,9));
octet_length
--------------
9
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,9);
substrb
---------
ihao 中
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,10);
substrb
---------
ihao 中
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,8);
substrb
---------
ihao 中
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,6);
substrb
---------
ihao
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,7);
substrb
---------
ihao
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,9);
substrb
---------
ihao 中
(1 row)
postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,10);
substrb
---------
ihao 中
(1 row)