PostgreSQL 11 preview - pgbench 压测软件增强-阿里云开发者社区

\set varname expression  
    Sets variable varname to a value calculated from expression. 
   The expression may contain the NULL constant, boolean 
   constants TRUE and FALSE, integer constants such as 5432, 
   double constants such as 3.14159, references to  
    variables :variablename, operators with their usual SQL 
   precedence and associativity, function calls, SQL CASE 
   generic conditional expressions and parentheses.  

    Functions and most operators return NULL on NULL input.  

    For conditional purposes, non zero numerical values are 
   TRUE, zero numerical values and NULL are FALSE.  

    When no final ELSE clause is provided to a CASE, the default value is NULL.  

    Examples:  

        \set ntellers 10 * :scale  
        \set aid (1021 * random(1, 100000 * :scale)) % \  
                   (100000 * :scale) + 1  
        \set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END

Add \if conditional support to pgbench (Fabien Coelho)

支持if条件控制用法

\if expression  
\elif expression  
\else  
\endif  
  This group of commands implements nestable conditional blocks, 
  similarly to psql's \if expression. Conditional expressions 
  are identical to those with \set, with non-zero values 
  interpreted as true.

类似psql的if用法

\if expression  
\elif expression  
\else  
\endif  
  This group of commands implements nestable conditional blocks. 
  A conditional block must begin with an \if and end with an \endif. 
  In between there may be any number of \elif clauses, which 
  may optionally be followed by a  
  single \else clause. Ordinary queries and other types of backslash 
  commands may (and usually do) appear between the commands forming 
  a conditional block.  
  
  The \if and \elif commands read their argument(s) and evaluate 
  them as a boolean expression. If the expression yields true then 
  processing continues normally; otherwise, lines are skipped until 
  a matching \elif, \else,  
  or \endif is reached. Once an \if or \elif test has succeeded, 
  the arguments of later \elif commands in the same block are not 
  evaluated but are treated as false. Lines following an \else are 
  processed only if no earlier  
  matching \if or \elif succeeded.  
  
  The expression argument of an \if or \elif command is subject to 
  variable interpolation and backquote expansion, just like any 
  other backslash command argument. After that it is evaluated 
  like the value of an on/off  
  option variable. So a valid value is any unambiguous case-insensitive 
  match for one of: true, false, 1, 0, on, off, yes, no. 
  For example, t, T, and tR will all be considered to be true.  
  
  Expressions that do not properly evaluate to true or false will 
  generate a warning and be treated as false.  
  
  Lines being skipped are parsed normally to identify queries and 
  backslash commands, but queries are not sent to the server, 
  and backslash commands other than conditionals (\if, \elif, \else, \endif) are ignored.  
  Conditional commands are checked only for valid nesting.
  Variable references in skipped lines are not expanded, and backquote 
  expansion is not performed either.  
  
  All the backslash commands of a given conditional block must appear 
  in the same source file. If EOF is reached on the main input file 
  or an \include-ed file before all local \if-blocks have been closed, 
  then psql will  
  raise an error.  
  
  Here is an example:  
  
    -- check for the existence of two separate records in the database and store  
    -- the results in separate psql variables  
    SELECT  
        EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,  
        EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee  
    \gset  
    \if :is_customer  
        SELECT * FROM customer WHERE customer_id = 123;  
    \elif :is_employee  
        \echo 'is not a customer but is an employee'  
        SELECT * FROM employee WHERE employee_id = 456;  
    \else  
        \if yes  
            \echo 'not a customer or employee'  
        \else  
            \echo 'this will never print'  
        \endif  
    \endif

Allow the use of non-ASCII characters in pgbench variable names (Fabien Coelho)
Add pgbench option --init-steps to control the initialization steps performed (Masahiko Sawada)

增加前奏操作，目前感觉还只围绕TPC-B来做前奏，建议后面加script用户自定义前奏。

-I init_steps  
--init-steps=init_steps  
    Perform just a selected set of the normal initialization steps.  
   init_steps specifies the initialization steps to be performed, 
   using one character per step. Each step is invoked in the 
   specified order. The default is  
    dtgvp. The available steps are:  

    d (Drop)  
        Drop any existing pgbench tables.  

    t (create Tables)  
        Create the tables used by the standard pgbench scenario, 
       namely pgbench_accounts, pgbench_branches, pgbench_history, 
       and pgbench_tellers.  

    g (Generate data)  
        Generate data and load it into the standard tables, 
       replacing any data already present.  

    v (Vacuum)  
        Invoke VACUUM on the standard tables.  

    p (create Primary keys)  
        Create primary key indexes on the standard tables.  

    f (create Foreign keys)  
        Create foreign key constraints between the standard tables. (Note that this step is not performed by default.)

Add approximated Zipfian-distributed random generator to pgbench (Alik Khilazhev)

增加Zipfian-distributed随机分布值的产生函数，可以生成需要Zipf分布的数据。Zipf分布数据的背景知识：

https://baike.baidu.com/item/Zipf%E5%AE%9A%E5%BE%8B

1935年,哈佛大学的语言学专家Zipf在研究英文单词出现的频率时,发现如果把单词出现的频率按由大到小的顺序排列,则每个单词出现的频率与它的名次的常数次幂存在简单的反比关系，这种分布就称为Zipf定律,它表明在英语单词中,只有极少数的词被经常使用,而绝大多数词很少被使用.实际上,包括汉语在内的许多国家的语言都有这种特点。这个定律后来在很多领域得到了同样的验证，包括网站的访问者数量、城镇的大小和每个国家公司的数量。

相信你一定听过这样的说法：

80%的财富集中在20%的人手中……

80%的用户只使用20%的功能……

20%的用户贡献了80%的访问量……

…………

你知道我在说“二八原则”或“20/80原则”，是的，没错！

如果把所有的单词（字）放在一起看呢？会不会20%的词（字）占了80%的出现次数？答案是肯定的。

早在上个世纪30年代，就有人（Zipf）对此作出了研究，并给出了量化的表达——齐普夫定律（Zipf's Law）：一个词在一个有相当长度的语篇中的等级序号（该词在按出现次数排列的词表中的位置，他称之为rank，简称r）与该词的出现频率（他称为frequency，简称f）的乘积几乎是一个常数（constant，简称C）。用公式表示，就是 r × f = C 。

·   random_zipfian generates an approximated bounded zipfian 
distribution. For parameter in (0, 1), an approximated algorithm 
is taken from "Quickly Generating Billion-Record Synthetic Databases", 
Jim Gray et al, SIGMOD  
    1994. For parameter in (1, 1000), a rejection method is used, 
   based on "Non-Uniform Random Variate Generation", Luc Devroye, 
   p. 550-551, Springer 1986. The distribution is not defined 
   when the parameter's value is 1.0.  
    The drawing performance is poor for parameter values close 
   and above 1.0 and on a small range.  

    parameter defines how skewed the distribution is. The larger 
   the parameter, the more frequently values to the beginning 
   of the interval are drawn. The closer to 0 parameter is, 
   the flatter (more uniform) the access  
    distribution.  

Hash functions hash, hash_murmur2 and hash_fnv1a accept an input 
value and an optional seed parameter. In case the seed isn't 
provided the value of :default_seed is used, which is initialized 
randomly unless set by the  
command-line -D option. Hash functions can be used to scatter the 
distribution of random functions such as random_zipfian or 
random_exponential. For instance, the following pgbench script 
simulates possible real world  
workload typical for social media and blogging platforms where 
few accounts generate excessive load:  

    \set r random_zipfian(0, 100000000, 1.07)

Allow the random seed to be set in pgbench (Fabien Coelho)

--random-seed=SEED  
  Set random generator seed. Seeds the system random number generator, 
  which then produces a sequence of initial generator states, one for 
  each thread. Values for SEED may be: time (the default, the seed is 
  based on the  
  current time), rand (use a strong random source, failing if none is 
  available), or an unsigned decimal integer value. The random generator 
  is invoked explicitly from a pgbench script (random...  functions) 
  or implicitly  
  (for instance option --rate uses it to schedule transactions). When 
  explicitly set, the value used for seeding is shown on the terminal. 
  Any value allowed for SEED may also be provided through the environment variable  
  PGBENCH_RANDOM_SEED. To ensure that the provided seed impacts all 
  possible uses, put this option first or use the environment variable.  
  
  Setting the seed explicitly allows to reproduce a pgbench run exactly, 
  as far as random numbers are concerned. As the random state is managed 
  per thread, this means the exact same pgbench run for an identical invocation  
  if there is one client per thread and there are no external or data 
  dependencies. From a statistical viewpoint reproducing runs exactly 
  is a bad idea because it can hide the performance variability or improve performance  
  unduly, e.g. by hitting the same pages as a previous run. However, 
  it may also be of great help for debugging, for instance re-running 
  a tricky case which leads to an error. Use wisely.

Allow pgbench to do exponentiation with pow() and power() (Raúl Marín Rodríguez)

支持power函数.

Add hashing functions to pgbench (Ildar Musin)

Hash functions hash, hash_murmur2 and hash_fnv1a accept an input value and an 
optional seed parameter. In case the seed isn't provided the value of :default_seed is 
used, which is initialized randomly unless set by the command-line -D option. 
Hash functions can be used to scatter the distribution of random functions such 
as random_zipfian or random_exponential. For instance, the following pgbench 
script simulates possible real world workload typical for social media and 
blogging platforms where few accounts generate excessive load:  

\set r random_zipfian(0, 100000000, 1.07)  
\set k abs(hash(:r)) % 1000000

Make pgbench statistics more accurate when using --latency-limit and --rate (Fabien Coelho)

提高统计精确度

PostgreSQL 11 preview - pgbench 压测软件增强

标签

背景

E.1.3.9.2. pgbench

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像