PostgreSQL: Using CPU "Hardware Decoding" to Double Numeric Computation Performance for Large-Scale Financial Workloads

Posted by dasanhuans2, 9 years ago | 15K reads | Tags: big data, PostgreSQL, database servers

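PostgreSQL supports several number types: integers, floating point, and numeric, a data type that PG implements itself. The relevant source files are: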

src/backend/utils/adt/numeric.c  
src/backend/utils/adt/float.c

 
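numeric can store very large numbers, more than 2^17 digits long. That extra precision comes at a performance cost, because numeric arithmetic cannot take full advantage of the CPU's "hardware decoding", that is, its native arithmetic instructions.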

typedef struct NumericVar  
{  
        int                     ndigits;                /* # of digits in digits[] - can be 0! */  
        int                     weight;                 /* weight of first digit */  
        int                     sign;                   /* NUMERIC_POS, NUMERIC_NEG, or NUMERIC_NAN */  
        int                     dscale;                 /* display scale */  
        NumericDigit *buf;                      /* start of palloc'd space for digits[] */  
        NumericDigit *digits;           /* base-NBASE digits */  
} NumericVar;  
 
  
Floating-point types are much lighter than numeric, so they also perform much better, roughly twice as fast.
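To make the cost difference concrete, here is a minimal sketch in C of what adding two numbers stored as base-NBASE digit arrays involves, compared with adding two hardware integers. The helper names `digits_add` and `native_add` are hypothetical and exist only for this illustration; this is not PostgreSQL's actual implementation in numeric.c, only the same general idea of digit-by-digit arithmetic with carry.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000  /* each array element holds four decimal digits */

/*
 * Add two non-negative numbers stored as base-NBASE digit arrays of equal
 * length n, most significant digit first. Writes n + 1 digits to result.
 * Hypothetical helper for illustration only.
 */
static void
digits_add(const int16_t *a, const int16_t *b, int n, int16_t *result)
{
    int carry = 0;

    assert(n > 0);
    for (int i = n - 1; i >= 0; i--)
    {
        int sum = a[i] + b[i] + carry;

        carry = sum >= NBASE;
        result[i + 1] = (int16_t) (carry ? sum - NBASE : sum);
    }
    result[0] = (int16_t) carry;
}

/* The same addition on a hardware integer needs no loop at all. */
static int64_t
native_add(int64_t a, int64_t b)
{
    return a + b;
}
```

Adding 99999999 and 1 as digit arrays `{9999, 9999}` and `{0, 1}` walks both arrays and propagates two carries to produce `{1, 0, 0}`, while `native_add(99999999, 1)` is a plain integer addition.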

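In big data scenarios, cutting the computation cost in half is a substantial saving, especially in the financial industry, which involves a large amount of numeric calculation.
If you have used greenplum, deepgreen or vitessedb, you may have noticed that the test manuals for these database products suggest replacing numeric with money or float8 when running tests, which gives better performance.
However, money and float8 always have a drawback: once a value exceeds their precision, the result may be inaccurate.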
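The precision limit of float8 can be seen with a small C sketch. A float8 is a C `double` with a 53-bit significand, so integers above 2^53 cannot all be represented exactly. The helper name `through_double` is hypothetical and only serves this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an integer to double (PostgreSQL float8) and back again.
 * Values that need more than 53 significant bits come back changed.
 * Hypothetical helper for illustration only.
 */
static int64_t
through_double(int64_t v)
{
    double d;

    /* Stay well inside the int64 range so the conversion back is defined. */
    assert(v > -(1LL << 62) && v < (1LL << 62));
    d = (double) v;
    return (int64_t) d;
}
```

Under IEEE 754 double precision, `through_double(9007199254740993LL)` (2^53 + 1) returns 9007199254740992: the final unit is silently lost, which is exactly the kind of error a financial calculation cannot tolerate.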
So how can we improve on numeric's performance without getting wrong results?
We can use the fixeddecimal extension:
https://github.com/2ndQuadrant/fixeddecimal
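The principle behind fixeddecimal is simple: it stores values as int8, and the split between integer digits and fractional digits is fixed in the code: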
/*
 * The scale which the number is actually stored.
 * For example: 100 will allow 2 decimal places of precision
 * This must always be a '1' followed by a number of '0's.
 */
#define FIXEDDECIMAL_MULTIPLIER 100LL

/*
 * Number of decimal places to store.
 * This number should be the number of decimal digits that it takes to
 * represent FIXEDDECIMAL_MULTIPLIER - 1
 */
#define FIXEDDECIMAL_SCALE 2

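If FIXEDDECIMAL_SCALE is set to 2, FIXEDDECIMAL_MULTIPLIER must be set to 100; if FIXEDDECIMAL_SCALE is set to 3, FIXEDDECIMAL_MULTIPLIER must be set to 1000. In other words, the value is stored as an integer; for display, dividing by the multiplier gives the integer part and taking the remainder gives the fractional part.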
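The relationship between scale and multiplier, and the divide/remainder split used for display, can be sketched in standalone C as follows. The function names `multiplier_for_scale` and `split_fixed` are hypothetical; the extension itself fixes both constants at compile time rather than computing them.

```c
#include <assert.h>
#include <stdint.h>

/* Derive the multiplier (10^scale) that corresponds to a given scale. */
static int64_t
multiplier_for_scale(int scale)
{
    int64_t m = 1;

    /* 10^18 is the largest power of ten that fits in an int64. */
    assert(scale >= 0 && scale <= 18);
    while (scale-- > 0)
        m *= 10;
    return m;
}

/* Split a stored value into its integer and fractional parts. */
static void
split_fixed(int64_t stored, int scale, int64_t *whole, int64_t *frac)
{
    int64_t m = multiplier_for_scale(scale);

    *whole = stored / m;    /* C integer division truncates toward zero */
    *frac = stored % m;     /* remainder keeps the sign of 'stored' */
}
```

With scale 2, the stored integer 12345 splits into 123 and 45 (that is, 123.45); with scale 3 the same stored integer splits into 12 and 345. For a negative stored value such as -5 the integer part is 0 and the remainder is -5, which is why the display code has to emit the minus sign separately.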
/*
 * fixeddecimal2str
 *              Prints the fixeddecimal 'val' to buffer as a string.
 *              Returns a pointer to the end of the written string.
 */
static char *
fixeddecimal2str(int64 val, char *buffer)
{
        char       *ptr = buffer;
        int64           integralpart = val / FIXEDDECIMAL_MULTIPLIER;
        int64           fractionalpart = val % FIXEDDECIMAL_MULTIPLIER;

        if (val < 0)
        {
                fractionalpart = -fractionalpart;

                /*
                 * Handle special case for negative numbers where the integral part
                 * is zero. pg_int64tostr() won't prefix with "-0" in this case, so
                 * we'll do it manually
                 */
                if (integralpart == 0)
                        *ptr++ = '-';
        }
        ptr = pg_int64tostr(ptr, integralpart);
        *ptr++ = '.';
        ptr = pg_int64tostr_zeropad(ptr, fractionalpart, FIXEDDECIMAL_SCALE);
        return ptr;
}

So the range of values fixeddecimal can store is the INT8 range divided by the multiplier.


postgres=# select 9223372036854775807::int8;
        int8
---------------------
 9223372036854775807
(1 row)

postgres=# select 9223372036854775808::int8;
ERROR:  22003: bigint out of range
LOCATION:  numeric_int8, numeric.c:2955

postgres=# select 92233720368547758.07::fixeddecimal;
     fixeddecimal
----------------------
 92233720368547758.07
(1 row)

postgres=# select 92233720368547758.08::fixeddecimal;
ERROR:  22003: value "92233720368547758.08" is out of range for type fixeddecimal
LOCATION:  scanfixeddecimal, fixeddecimal.c:499
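The upper bound seen in the psql session follows directly from integer arithmetic on INT64_MAX. A minimal sketch, assuming the default multiplier of 100; the names `MULTIPLIER`, `max_whole_part`, `max_fraction_part` and `fits_fixed` are hypothetical and not part of the extension:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MULTIPLIER 100LL   /* assumed to match the default scale of 2 */

/* Largest integer part representable when an int64 holds value * MULTIPLIER. */
static int64_t
max_whole_part(void)
{
    return INT64_MAX / MULTIPLIER;
}

/* Fractional part (in units of 1/MULTIPLIER) of the largest stored value. */
static int64_t
max_fraction_part(void)
{
    return INT64_MAX % MULTIPLIER;
}

/* Does whole.frac (both non-negative) fit into the scaled int64? */
static bool
fits_fixed(int64_t whole, int64_t frac)
{
    assert(whole >= 0 && frac >= 0 && frac < MULTIPLIER);
    if (whole != max_whole_part())
        return whole < max_whole_part();
    return frac <= max_fraction_part();
}
```

Here `max_whole_part()` is 92233720368547758 and `max_fraction_part()` is 7, so 92233720368547758.07 fits while 92233720368547758.08 does not, matching the error above.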
 
 
Also note that compiling fixeddecimal requires a compiler that supports __int128; gcc 4.9.3 does. If you are using an older gcc, upgrade it first.
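One plausible reason a 128-bit integer is useful here (an assumption for illustration, not a description of the extension's source) is multiplication: the product of two scaled int64 values carries the multiplier twice and can overflow 64 bits before it is scaled back down. A minimal sketch with the hypothetical names `MULTIPLIER` and `fixed_mul`, using the `__int128` type provided by gcc and clang:

```c
#include <assert.h>
#include <stdint.h>

#define MULTIPLIER 100LL   /* assumed to match the default scale of 2 */

/*
 * Multiply two fixed-point values stored as value * MULTIPLIER.
 * The raw product is scaled by MULTIPLIER^2, so it is computed in 128 bits
 * and divided by MULTIPLIER once to return to the storage scale.
 * Hypothetical helper; digits beyond the scale are truncated.
 */
static int64_t
fixed_mul(int64_t a, int64_t b)
{
    __int128 wide = (__int128) a * (__int128) b;
    __int128 scaled = wide / MULTIPLIER;

    /* The final result must still fit the 64-bit storage format. */
    assert(scaled <= INT64_MAX && scaled >= INT64_MIN);
    return (int64_t) scaled;
}
```

For example, `fixed_mul(12345, 200)` (123.45 times 2.00) gives 24690, that is, 246.90. With `a = 100000000000000000` and `b = 200` the intermediate product is 2 x 10^19, which does not fit in an int64, yet the final result 200000000000000000 does.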

http://blog.163.com/digoal@126/blog/static/163877040201601313814429/ 
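Next, let's test the performance of fixeddecimal on PostgreSQL 9.5 by running addition, subtraction, multiplication, division and aggregates over 100 million rows, comparing the results and speed of float8, numeric and fixeddecimal. auto_explain is used to record the execution plan and elapsed time for each of the three types.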
psql
\timing
postgres=# load 'auto_explain';
LOAD
Time: 2.328 ms
postgres=# set auto_explain.log_analyze =true;
SET
Time: 0.115 ms
postgres=# set auto_explain.log_buffers =true;
SET
Time: 0.080 ms
postgres=# set auto_explain.log_nested_statements=true;
SET
Time: 0.073 ms
postgres=# set auto_explain.log_timing=true;
SET
Time: 0.089 ms
postgres=# set auto_explain.log_triggers=true;
SET
Time: 0.076 ms
postgres=# set auto_explain.log_verbose=true;
SET
Time: 0.074 ms
postgres=# set auto_explain.log_min_duration=0;
SET
Time: 0.149 ms
postgres=# set client_min_messages ='log';
SET
Time: 0.144 ms
postgres=# set work_mem='8GB';
SET
Time: 0.152 ms
postgres=# select sum(i::numeric),min(i::numeric),max(i::numeric),avg(i::numeric),sum(3.0::numeric*(i::numeric+i::numeric)),avg(i::numeric/3.0::numeric) from generate_series(1,100000000) t(i);
LOG:  duration: 241348.655 ms  plan:
Query Text: select sum(i::numeric),min(i::numeric),max(i::numeric),avg(i::numeric),sum(3.0::numeric*(i::numeric+i::numeric)),avg(i::numeric/3.0::numeric) from generate_series(1,100000000) t(i);
Aggregate  (cost=50.01..50.02 rows=1 width=4) (actual time=241348.631..241348.631 rows=1 loops=1)
  Output: sum((i)::numeric), min((i)::numeric), max((i)::numeric), avg((i)::numeric), sum((3.0 * ((i)::numeric + (i)::numeric))), avg(((i)::numeric / 3.0))
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..10.00 rows=1000 width=4) (actual time=12200.116..22265.586 rows=100000000 loops=1)
        Output: i
        Function Call: generate_series(1, 100000000)
       sum        | min |    max    |          avg          |         sum         |              avg
------------------+-----+-----------+-----------------------+---------------------+-------------------------------
 5000000050000000 |   1 | 100000000 | 50000000.500000000000 | 30000000300000000.0 | 16666666.83333333333333333333
(1 row)
Time: 243149.286 ms
postgres=# select sum(i::float8),min(i::float8),max(i::float8),avg(i::float8),sum(3.0::float8*(i::float8+i::float8)),avg(i::float8/3.0::float8) from generate_series(1,100000000) t(i);
LOG:  duration: 112407.004 ms  plan:
Query Text: select sum(i::float8),min(i::float8),max(i::float8),avg(i::float8),sum(3.0::float8*(i::float8+i::float8)),avg(i::float8/3.0::float8) from generate_series(1,100000000) t(i);
Aggregate  (cost=50.01..50.02 rows=1 width=4) (actual time=112406.967..112406.967 rows=1 loops=1)
  Output: sum((i)::double precision), min((i)::double precision), max((i)::double precision), avg((i)::double precision), sum(('3'::double precision * ((i)::double precision + (i)::double precision))), avg(((i)::double precision / '3'::double precision))
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..10.00 rows=1000 width=4) (actual time=12157.571..20994.444 rows=100000000 loops=1)
        Output: i
        Function Call: generate_series(1, 100000000)
      sum       | min |    max    |    avg     |         sum          |       avg
----------------+-----+-----------+------------+----------------------+------------------
 5.00000005e+15 |   1 | 100000000 | 50000000.5 | 3.00000003225094e+16 | 16666666.8333333
(1 row)
Time: 114208.528 ms
postgres=# select sum(i::fixeddecimal),min(i::fixeddecimal),max(i::fixeddecimal),avg(i::fixeddecimal),sum(3.0::fixeddecimal*(i::fixeddecimal+i::fixeddecimal)),avg(i::fixeddecimal/3.0::fixeddecimal) from generate_series(1,100000000) t(i);
LOG:  duration: 97956.458 ms  plan:
Query Text: select sum(i::fixeddecimal),min(i::fixeddecimal),max(i::fixeddecimal),avg(i::fixeddecimal),sum(3.0::fixeddecimal*(i::fixeddecimal+i::fixeddecimal)),avg(i::fixeddecimal/3.0::fixeddecimal) from generate_series(1,100000000) t(i);
Aggregate  (cost=50.01..50.02 rows=1 width=4) (actual time=97956.431..97956.431 rows=1 loops=1)
  Output: sum((i)::fixeddecimal), min((i)::fixeddecimal), max((i)::fixeddecimal), avg((i)::fixeddecimal), sum(('3.00'::fixeddecimal * ((i)::fixeddecimal + (i)::fixeddecimal))), avg(((i)::fixeddecimal / '3.00'::fixeddecimal))
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..10.00 rows=1000 width=4) (actual time=12168.630..20874.617 rows=100000000 loops=1)
        Output: i
        Function Call: generate_series(1, 100000000)
         sum         | min  |     max      |     avg     |         sum          |     avg
---------------------+------+--------------+-------------+----------------------+-------------
 5000000050000000.00 | 1.00 | 100000000.00 | 50000000.50 | 30000000300000000.00 | 16666666.83
(1 row)
Time: 99763.032 ms
 
 
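  Performance comparison (query duration logged by auto_explain in the runs above):

  numeric        241348.655 ms
  float8         112407.004 ms
  fixeddecimal    97956.458 ms

  In this test numeric takes roughly 2.1 times as long as float8 and about 2.5 times as long as fixeddecimal.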
 
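  A few notes on the test case above.

  The float8 result is already inaccurate: its second sum is 3.00000003225094e+16, while the exact value is 30000000300000000. fixeddecimal uses the default scale of 2, so it keeps two decimal places.

  numeric has higher precision; the output does not show every digit, and how many digits are displayed is decided internally by PG.

  Also note that fixeddecimal truncates anything beyond its scale rather than rounding, so 123.555 is stored as 12355, not 12356.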

 
postgres=# select '123.555'::fixeddecimal;
 fixeddecimal
--------------
 123.55
(1 row)

postgres=# select '123.555'::fixeddecimal/'123.556'::fixeddecimal;
 ?column?
----------
 1.00
(1 row)

postgres=# select '124.555'::fixeddecimal/'123.556'::fixeddecimal;
 ?column?
----------
 1.00
(1 row)

postgres=# select 124.555/123.556;
      ?column?
--------------------
 1.0080854025704944
(1 row)
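The truncation seen in these queries can be reproduced with plain integer arithmetic. The sketch below is not the extension's code; `MULTIPLIER`, `truncate_thousandths`, `round_thousandths` and `fixed_div` are hypothetical names, and the input is assumed to be given in thousandths so that the dropped digit is visible.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIPLIER 100LL   /* assumed to match the default scale of 2 */

/* Convert thousandths to hundredths by dropping the last digit (truncation). */
static int64_t
truncate_thousandths(int64_t thousandths)
{
    return thousandths / 10;
}

/* For comparison: round half up, for non-negative input only. */
static int64_t
round_thousandths(int64_t thousandths)
{
    assert(thousandths >= 0);
    return (thousandths + 5) / 10;
}

/*
 * Divide two fixed-point values stored as value * MULTIPLIER, truncating
 * the quotient to the storage scale. May overflow for very large 'a'.
 */
static int64_t
fixed_div(int64_t a, int64_t b)
{
    assert(b != 0);
    return a * MULTIPLIER / b;
}
```

`truncate_thousandths(123555)` gives 12355 where rounding would give 12356. Both operands of the third query are truncated first (124.55 and 123.55), and `fixed_div(12455, 12355)` gives 100, which displays as 1.00 even though the true quotient is about 1.008.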
 
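 Notice: articles on the Yunqi Community (云棲社區) site may not be reproduced without the author's permission or an explicit statement, but sharing is welcome.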
