标签
PostgreSQL , substrb , 按字节截断 , 不截断多字节字符
背景
在orafce提供的substrb中,当截断到某个多字节字符时,如果字符被中间截断,会导致乱码。
为了兼容oracle substrb,不截断多字节字符,可以在orafce包的substrb基础之上再写一个plpgsql函数来代替。
兼容oracle substrb
下面是阿里云提供的方法。
create extension orafce;
CREATE OR REPLACE FUNCTION substrb_new(v_text text, start integer, count integer)
RETURNS character varying AS
$BODY$
DECLARE
ret text default '';
len integer default 0;
len1 integer default 0;
start1 integer default 0;
end1 integer;
cc varchar;
BEGIN
-- execute 'select substrb(''123中国123'',3,2)' into v_text;
len = NVL(length(v_text), 0);
end1 = start + count;
for pos in 0..len-1 loop
if len1 >= start - 1 THEN
exit;
end if;
len1 = len1 + octet_length(SUBSTR (v_text, pos+1, 1));
start1 = start1 + 1;
end loop;
for pos in start1..len-1 loop
cc = SUBSTR (v_text, pos+1, 1);
len1 = len1 + octet_length(cc);
if len1 > end1 - 1 THEN
exit;
end if;
ret = ret || cc;
end loop;
return ret;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
UTF8字符集,中不会被截断。
postgres=# select octet_length(substrb_new('123中国123',3,2));
octet_length
--------------
1
(1 row)
postgres=# select octet_length(substrb_new('123中国123',3,3));
octet_length
--------------
1
(1 row)
postgres=# select octet_length(substrb_new('123中国123',3,4));
octet_length
--------------
4
(1 row)
postgres=# select octet_length(substrb_new('123中国123',3,1));
octet_length
--------------
1
(1 row)
postgres=# select substrb_new('123中国123',3,1);
substrb_new
-------------
3
(1 row)
postgres=# select substrb_new('123中国123',3,2);
substrb_new
-------------
3
(1 row)
postgres=# select substrb_new('123中国123',3,3);
substrb_new
-------------
3
(1 row)
postgres=# select substrb_new('123中国123',3,4);
substrb_new
-------------
3中
(1 row)
orafce提供的substrb截断多字节字符时,可能产生乱码或不可见字符。
postgres=# select substrb('123中国123',3,1)::bytea;
substrb
---------
\x33
(1 row)
postgres=# select substrb('123中国123',3,2)::bytea;
substrb
---------
\x33e4
(1 row)
postgres=# select substrb('123中国123',3,3)::bytea;
substrb
----------
\x33e4b8
(1 row)
postgres=# select substrb('123中国123',3,4)::bytea;
substrb
------------
\x33e4b8ad
(1 row)