标签
PostgreSQL , Oracle , nchar , nvarchar , ntext , ms sql , sybase , unicode character , utf8
背景
在Oracle, MS SQL, SYBASE数据库引擎中,有一些这样的字符串类型nchar , nvarchar , ntext。
这些类型是什么含义?在PostgreSQL中应该使用什么类型与之对应?
以MS SQL为例
Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.
nchar [ ( n ) ]
Fixed-length Unicode string data. n defines the string length and must be a value from 1 through 4,000.
实际上,就两层含义
1、存储的是unicode编码的字符串,使用UNICODE UCS-2字符集。
2、长度指的是字符个数,而非字节数。
nchar , nvarchar , ntext 在 PostgreSQL的对应关系
介于以上介绍的两点,只要满足以下条件,char,varchar,text即对应到了nchar , nvarchar , ntext类型。
1、在PostgreSQL中使用UTF8字符集时,实际上就是unicode(别名)。
https://www.postgresql.org/docs/devel/static/multibyte.html
Name | Description | Language | Server? | ICU? | Bytes/Char | Aliases |
---|---|---|---|---|---|---|
UTF8 | Unicode, 8-bit | all | Yes | Yes | 1-4 | Unicode |
满足以上条件即可用char,varchar,text直接替代nchar,nvarchar,ntext,因为在PostgreSQL中char(n),varchar(n),任何时候就是指的字符长度(而不是字节长度)。
如果PG 目标DB不是utf8字符集,又当如何
可以有两种方法:
1、实际上依旧可以使用char, varchar, text类型存储(长度限制与上游保持一致即可),只是建议业务方做一下字符集转换后存入PG。
源端存储的 unicode 字符串转换为 PG目标库的目标字符集字符串。
postgres=# \df convert*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------------+------------------+---------------------+------
pg_catalog | convert | bytea | bytea, name, name | func
pg_catalog | convert_from | text | bytea, name | func
pg_catalog | convert_to | bytea | text, name | func
(3 rows)
dbtest1=# select convert_to(N'你好中国','sqlascii');
convert_to
--------------------
\xc4e3bac3d6d0b9fa
(1 row)
dbtest1=# select convert_to(N'你好中国','utf8');
convert_to
----------------------------
\xe4bda0e5a5bde4b8ade59bbd
(1 row)
2、或者你在PG中可以使用字节流存储 来自源库unicode 字符串的字节流,读取时再转换为当前字符集。
dbtest1=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
dbtest1 | postgres | EUC_CN | C | C |
create table test (id int, content bytea);
insert into test values (1, convert_to(N'你好中国','utf8'));
dbtest1=# select * from test;
id | content
----+----------------------------
1 | \xe4bda0e5a5bde4b8ade59bbd
(1 row)
dbtest1=# select convert_from(content,'utf8') from test;
convert_from
--------------
你好中国
(1 row)
PostgreSQL nchar , nvarchar , ntext 的输入语法
N quote
dbtest1=# select N'abc你好中国';
bpchar
-------------
abc你好中国
(1 row)
参考
1、src/backend/parser/scan.l
/*
* To ensure that {quotecontinue} can be scanned without having to back up
* if the full pattern isn't matched, we include trailing whitespace in
* {quotestop}. This matches all cases where {quotecontinue} fails to match,
* except for {quote} followed by whitespace and just one "-" (not two,
* which would start a {comment}). To cover that we have {quotefail}.
* The actions for {quotestop} and {quotefail} must throw back characters
* beyond the quote proper.
*/
quote '
quotestop {quote}{whitespace}*
quotecontinue {quote}{whitespace_with_newline}{quote}
quotefail {quote}{whitespace}*"-"
/* Bit string
* It is tempting to scan the string for only those characters
* which are allowed. However, this leads to silently swallowed
* characters if illegal characters are included in the string.
* For example, if xbinside is [01] then B'ABCD' is interpreted
* as a zero-length string, and the ABCD' is lost!
* Better to pass the string forward and let the input routines
* validate the contents.
*/
xbstart [bB]{quote}
xbinside [^']*
/* Hexadecimal number */
xhstart [xX]{quote}
xhinside [^']*
输入nchar,nvarchar方法:
/* National character */
xnstart [nN]{quote}
/* Quoted string that allows backslash escapes */
xestart [eE]{quote}
xeinside [^\\']+
xeescape [\\][^0-7]
xeoctesc [\\][0-7]{1,3}
xehexesc [\\]x[0-9A-Fa-f]{1,2}
xeunicode [\\](u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})
xeunicodefail [\\](u[0-9A-Fa-f]{0,3}|U[0-9A-Fa-f]{0,7})
/* Extended quote
* xqdouble implements embedded quote, ''''
*/
xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+
/* $foo$ style quotes ("dollar quoting")
* The quoted string starts with $foo$ where "foo" is an optional string
* in the form of an identifier, except that it may not contain "$",
* and extends to the first occurrence of an identical string.
* There is *no* processing of the quoted text.
*
* {dolqfailed} is an error rule to avoid scanner backup when {dolqdelim}
* fails to match its trailing "$".
*/
dolq_start [A-Za-z\200-\377_]
dolq_cont [A-Za-z\200-\377_0-9]
dolqdelim \$({dolq_start}{dolq_cont}*)?\$
dolqfailed \${dolq_start}{dolq_cont}*
dolqinside [^$]+
/* Double quote
* Allows embedded spaces and other special characters into identifiers.
*/
dquote \"
xdstart {dquote}
xdstop {dquote}
xddouble {dquote}{dquote}
xdinside [^"]+
/* Unicode escapes */
uescape [uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
/* error rule to avoid backup */
uescapefail [uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
/* Quoted identifier with Unicode escapes */
xuistart [uU]&{dquote}
/* Quoted string with Unicode escapes */
xusstart [uU]&{quote}
/* Optional UESCAPE after a quoted string or identifier with Unicode escapes. */
xustop1 {uescapefail}?
xustop2 {uescape}
/* error rule to avoid backup */
xufailed [uU]&
3、https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2016
4、https://www.postgresql.org/docs/devel/static/multibyte.html
5、https://stackoverflow.com/questions/1245217/what-is-the-postgresql-equivalent-to-sql-server-nvarchar
It's varchar and text, assuming your database is in UNICODE encoding.
If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.
6、各种数据库引擎(postgresql, mysql, oracle, ms sql, sql lite等)的WEB版SQL在线测试运行环境