PostgreSQL 利用编译器extension 支持int128,提升聚合性能-阿里云开发者社区

开发者社区> 德哥> 正文

PostgreSQL 利用编译器extension 支持int128,提升聚合性能

简介: 标签 PostgreSQL , int128 , clang , gcc , icc 背景 PostgreSQL 9.4以及以前的版本,在INT,INT2,INT8的聚合计算中,为了保证数据不会溢出,中间结果使用numeric来存储。 numeric是PostgreSQL自己实现的一种数值类型,可以存储非常大的数值(估计是做科学计算的需求),但是牺牲了一定的性能。 为了提高聚合,特别
+关注继续查看

标签

PostgreSQL , int128 , clang , gcc , icc


背景

PostgreSQL 9.4以及以前的版本,在INT,INT2,INT8的聚合计算中,为了保证数据不会溢出,中间结果使用numeric来存储。

numeric是PostgreSQL自己实现的一种数值类型,可以存储非常大的数值(估计是做科学计算的需求),但是牺牲了一定的性能。

为了提高聚合,特别是大数据量的聚合时的性能,社区借用了编译器支持的int128类型,作为数据库int, int2, int8的中间计算结果,从而提升计算性能。

编译器相关的解释

gcc,clang,icc都支持int128

1. gcc

6.8 128-bit Integers

As an extension the integer scalar type __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. 

Simply write __int128 for a signed 128-bit integer, or unsigned __int128 for an unsigned 128-bit integer. 

There is no support in GCC for expressing an integer constant of type __int128 for targets with long long integer less than 128 bits wide.

2. icc

From what I can tell, at least icc 13.0.1+ support __int128_t and __uint128_t. Courtesy of Matt Godbolt's Compiler Explorer:

__int128_t ai (__int128_t x, __int128_t y) {
  return x + y;
}

__int128_t mi (__int128_t x, __int128_t y) {
  return x * y;
}

__int128_t di (__int128_t x, __int128_t y) {
  return x / y;
}

__int128_t ri (__int128_t x, __int128_t y) {
  return x % y;
}
compiles to:

L__routine_start_ai_0:
ai:
        add       rdi, rdx                                      #2.14
        mov       rax, rdi                                      #2.14
        adc       rsi, rcx                                      #2.14
        mov       rdx, rsi                                      #2.14
        ret                                                     #2.14
L__routine_start_mi_1:
mi:
        mov       rax, rdi                                      #6.14
        imul      rsi, rdx                                      #6.14
        imul      rcx, rdi                                      #6.14
        mul       rdx                                           #6.14
        add       rsi, rcx                                      #6.14
        add       rdx, rsi                                      #6.14
        ret                                                     #6.14
L__routine_start_di_2:
di:
        push      rsi                                           #9.44
        call      __divti3                                      #10.14
        pop       rcx                                           #10.14
        ret                                                     #10.14
L__routine_start_ri_3:
ri:
        push      rsi                                           #13.44
        call      __modti3                                      #14.14
        pop       rcx                                           #14.14
        ret                                                     #14.14
with icc 13.0.1 (http://goo.gl/UnxEFt).

PostgreSQL int128支持

编译时根据编译器的特性自动判断是否使用int128特性.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8122e1437e332e156d971a0274879b0ee76e488a

Add, optional, support for 128bit integers.

We will, for the foreseeable future, not expose 128 bit datatypes to
SQL. But being able to use 128bit math will allow us, in a later patch,
to use 128bit accumulators for some aggregates; leading to noticeable
speedups over using numeric.

So far we only detect a gcc/clang extension that supports 128bit math,
but no 128bit literals, and no *printf support. We might want to expand
this in the future to further compilers; if there are any that that
provide similar support.


config/c-compiler.m4        diff | blob | blame | history
configure       diff | blob | blame | history
configure.in        diff | blob | blame | history
src/include/c.h     diff | blob | blame | history
src/include/pg_config.h.in      diff | blob | blame | history
src/include/pg_config.h.win32       diff | blob | blame | history

性能提升测试

There was recently talk about if we should start using 128-bit integers
(where available) to speed up the aggregate functions over integers
which uses numeric for their internal state. So I hacked together a
patch for this to see what the performance gain would be.

Previous thread:
http://www.postgresql.org/message-id/20141017182500.GF2075@alap3.anarazel.de

What the patch does is switching from using numerics in the aggregate
state to int128 and then convert the type from the 128-bit integer in
the final function.

The functions where we can make use of int128 states are:

- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)

The initial benchmark results look very promising. When summing 10
million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10
million int4 I see a speed up of ~3.7x. To me this indicates that it is
worth the extra code. What do you say? Is this worth implementing?

The current patch still requires work. I have not written the detection
of int128 support yet, and the patch needs code cleanup (for example: I
used an int16_ prefix on the added functions, suggestions for better
names are welcome). I also need to decide on what estimate to use for
the size of that state.

The patch should work and pass make check on platforms where __int128_t
is supported.

The simple benchmarks:

CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1, 
10000000) x;

Before:

# SELECT sum(x) FROM test_int8;
       sum
----------------
  50000005000000
(1 row)

Time: 2521.217 ms

After:

# SELECT sum(x) FROM test_int8;
       sum
----------------
  50000005000000
(1 row)

Time: 1022.811 ms

CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1, 
10000000) x;

Before:

# SELECT var_samp(x) FROM test_int4;
       var_samp
--------------------
  8333334166666.6667
(1 row)

Time: 3808.546 ms

After:

# SELECT var_samp(x) FROM test_int4;
       var_samp
--------------------
  8333334166666.6667
(1 row)

Time: 1033.243 ms

参考

1. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8122e1437e332e156d971a0274879b0ee76e488a

2. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=959277a4f579da5243968c750069570a58e92b38

3. https://www.postgresql.org/message-id/flat/544BB5F1.50709%40proxel.se#544BB5F1.50709@proxel.se

4. http://stackoverflow.com/questions/16365840/128-bit-integers-supporting-and-in-the-intel-c-compiler

5. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

相关文章
Spring Cloud Alibaba基础教程:支持的几种服务消费方式(RestTemplate、WebClient、Feign)
通过《Spring Cloud Alibaba基础教程:使用Nacos实现服务注册与发现》一文的学习,我们已经学会如何使用Nacos来实现服务的注册与发现,同时也介绍如何通过LoadBalancerClient接口来获取某个服务的具体实例,并根据实例信息来发起服务接口消费请求。
2585 0
后疫情时代,用数据支持业务恢复创造新的可能性
2020年可以说每一天都在见证历史,新冠疫情的突然造访就如同“黑天鹅”不期而至,而企业现在还不开始数字化转型就如同“灰犀牛”存在潜在风险,当下在黑天鹅和灰犀牛的夹击下,经济和市场都产生了巨大的影响。
837 0
开启irqbalance提升服务器性能
操作系统 性能调休   公司有次压测存在一个问题:CPU资源压不上去,一直在40%已达到了性能瓶颈,后定位到原因,所在的服务器在压测过程中产生的中断都落在CPU0上处理,这种中断并没有均衡到各个CPU,导致单个CPU过载而形成瓶颈。
5676 0
最佳实践:负载均衡SLB支持自定义VPC实例IP地址
1.Greetings 大家好,我是负载均衡SLB产品经理添毅,今天教大家自定义负载均衡SLB VPC实例的IP地址,下面开始 2.负载均衡SLB的VPC实例 阿里云负载均衡SLB同时提供给了公网实例和私网实例,公网SLB可以面向互联网提供服务;私网实例面向阿里云内网提供服务,而私网负载均衡又分类两种:一种分配classic经典网络地址,可面现向阿里云内网所有经典网络提供服务;另外一种分配VPC专有网络地址,面向一个VPC内部提供服务。
4324 0
【产品功能】配置网卡从此与关机无缘,弹性网卡支持热插拔功能
本文主要介绍了弹性网卡的热插拨功能改进,弹性网卡插拔对应的实例不再要求必须为 Stopped 状态,现在只要求是稳定状态( Running Stopped )即可。以及介绍了控制台、OpenAPI如何对弹性网卡进行热插拔操作。
4271 0
+关注
德哥
公益是一辈子的事, I'm digoal, just do it.
2153
文章
245
问答
文章排行榜
最热
最新
相关电子书
更多
《2021云上架构与运维峰会演讲合集》
立即下载
《零基础CSS入门教程》
立即下载
《零基础HTML入门教程》
立即下载