Python內存管理模塊的一個奇技淫巧
最近在讀Python源碼中有關內存管理的部分。Python在分配小塊內存(小于256字節)時,采用了內存池,以降低對內核內存分配程序的調用頻次。在內存池的設計上,采用了一個分層的設計,由上到下依次是arena、pool、block。這次我看到的這個比較費解的結構,就來自于分配內存時,對于pool的處理。
謎團
在最主要的分配內存的函數_PyObject_Alloc中,我看到了這么一段代碼:
pool = usable_arenas->freepools;
// some other code ...
init_pool:
/* Frontlink to used pools. */
next = usedpools[size + size]; /* == prev */
pool->nextpool = next;
pool->prevpool = next;
next->nextpool = pool;
next->prevpool = pool;
從注釋和函數命名中也能看出來,這段代碼的意思,大概是從arena的freepools中拿出一個pool,插入到usedpools的開頭。但是,這里有一個奇怪的地方:
如果說usedpools中存儲的不同size的pool鏈表,那么為什么index要用size+size來表示呢?
初探
首先,我在代碼中搜索了一下usedpools,找到了這么一段注釋:
This is involved. For an index i, usedpools[i+i] is the header for a list of
all partially used pools holding small blocks with “size class idx” i. So
usedpools[0] corresponds to blocks of size 8, usedpools[2] to blocks of size
16, and so on: index 2*i <-> blocks of size (i+1)<<ALIGNMENT_SHIFT.
哦,就是說,我要找到保存8字節block的pool,就去usedpools[0+0];找16字節的pool,就去usedpools[1+1]。再看一下上面代碼中的size的由來:
size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
果然可以和注釋里寫的對應起來(注意,注釋里的2*i是usedpools的index,而這里的nbytes是實際需要分配的block的字節大小,所以一個需要左移位,一個需要右移位,互相轉換)。
但是目前為止,好像處在一個“知其然,而不知其所以然”的狀態:這里沒有解釋為什么是這樣一個奇葩的size+size的index。
深入
往下,又有一段注釋(Python源碼的注釋真詳細):
Major obscurity: While the usedpools vector is declared to have poolp
entries, it doesn’t really. It really contains two pointers per (conceptual)
poolp entry, the nextpool and prevpool members of a pool_header. The
excruciating initialization code below fools C so that
usedpool[i+i]
“acts like” a genuine poolp, but only so long as you only reference its
nextpool and prevpool members. The “- 2 sizeof(block )” gibberish is
compensating for that a pool_header’s nextpool and prevpool members
immediately follow a pool_header’s first two members:
union { block *_padding;
uint count; } ref;
block *freeblock;
each of which consume sizeof(block
) bytes. So what usedpools[i+i] really
if* C believes it’s a poolp
pointer, then p->nextpool and p->prevpool are both p (meaning that the headed circular list is empty).
這一大段注釋說了什么呢?大意就是,你以為usedpools里放的是poolp類型?其實里面是兩兩一對的poolp,一個是nextpool指針,一個是prevpool指針。這里還使用了一些trick,讓C語言認為usedpools里放的是pool_header變量,只要使用者能保證只使用它的nextpool、prevpool成員。
直接看usedpools的定義吧:
#define PTA(x) ((poolp )((uint8_t *)&(usedpools[2*(x)]) - 2*sizeof(block *)))
#define PT(x) PTA(x), PTA(x)
static poolp usedpools[2 * ((NB_SMALL_SIZE_CLASSES + 7) / 8) * 8] = {
PT(0), PT(1), PT(2), PT(3), PT(4), PT(5), PT(6), PT(7)
#if NB_SMALL_SIZE_CLASSES > 8
, PT(8), PT(9), PT(10), PT(11), PT(12), PT(13), PT(14), PT(15)
……
};
展開宏定義:
static poolp usedpools[2 * ((NB_SMALL_SIZE_CLASSES + 7) / 8) * 8] = {
PTA(0), PTA(0),PTA(1),PTA(1), PTA(2),PTA(2), PTA(3),PTA(3),...
……
};
而PTA(x)定義為,usedpools[2x]的地址減去2個block*長度的地址。結合pool_header的定義:
struct pool_header {
union { block *_padding;
uint count; } ref; /* number of allocated blocks */
block *freeblock; /* pool's free list head */
struct pool_header *nextpool; /* next pool of this size class */
struct pool_header *prevpool; /* previous pool "" */
uint arenaindex; /* index into arenas of base adr */
uint szidx; /* block size class index */
uint nextoffset; /* bytes to virgin block */
uint maxnextoffset; /* largest valid nextoffset */
};
nextpool和pool_header的地址之間,正好差了2個block*的偏移量。所以,在初始化usedpools之后,當把usedpools[size+size]當作pool_header時,那么usedpools[size+size]->nextpool將正好取到usedpools[size+size]的地址,也就是指向usedpools[size+size];而usedpools[size+size]->prevpool將也正好取到usedpools[size+size]的地址。因此:
next = usedpools[size + size]; /* == prev */
這段代碼+注釋就說得通了。
然后,經過:
pool->nextpool = next;
pool->prevpool = next;
next->nextpool = pool;
next->prevpool = pool;
這么一折騰,usedpools中存儲的一對地址(nextpool, prevpool)就指向了真正的pool_header變量pool。
知其所以然
但是,為什么要這么繞的設計這樣的usedpools結構呢,直接存儲一個完整的pool_header不簡單明了嗎?下面這段注釋給出了答案:
It’s unclear why the usedpools setup is so convoluted. It could be to
minimize the amount of cache required to hold this heavily-referenced table
(which only needs the two interpool pointer members of a pool_header).
就是說,這樣的一個usedpools表,會經常的被使用,使用這種結構,就可以減少放入緩存中的數據量(畢竟,我們只需要放入兩個指針,而不是整個pool_header),從而也就提高了緩存的效率。為了提高效率,還真是無所不用其極啊。
到這里,obmalloc.c中我認為最費解的一個數據結構,就弄清楚啦。
來自:http://blog.guoyb.com/2017/03/15/python-obmalloc-trick/