In my mind, the memory package works like the following:
## Design
### Usage
To allocate 4KB CPU memory:

```cpp
p = memory::Alloc(platform::CPUPlace(), 4*1024);
```
To allocate 4KB memory on the 3rd GPU:

```cpp
p = memory::Alloc(platform::GPUPlace(2), 4*1024);
```
To free memory and check the so-far used amount of memory on a place:

```cpp
auto pl = platform::GPUPlace(0);
p = memory::Alloc(pl, 4*1024);
cout << memory::Used(pl);
memory::Free(pl, p);
```
### The API
In `paddle/memory/memory.h` we have:

```cpp
namespace memory {

template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> size_t Used(Place);

}  // namespace memory
```
These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
```cpp
template<>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}
```
and
```cpp
template<>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}
```
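The corresponding `Free` specializations are not spelled out in this document; presumably they forward to the same singleton allocators. A minimal sketch, assuming `BuddyAllocator` exposes a matching `Free(void*)` method:

```cpp
template<>
void Free<CPUPlace>(CPUPlace p, void* ptr) {
  GetCPUBuddyAllocator()->Free(ptr);  // return the block to the CPU pool
}

template<>
void Free<GPUPlace>(GPUPlace p, void* ptr) {
  GetGPUBuddyAllocator(p.id)->Free(ptr);  // return the block to the pool of GPU p.id
}
```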
### The Implementation
`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons.
```cpp
BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
  }
  return a;
}
```
```cpp
BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  static BuddyAllocator** as = NULL;
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /*backup allocator*/, ...);
    }
  }
  return as[gpu_id];
}
```
#### BuddyAllocator
`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:
```cpp
BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
  ...
}
```
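For readers unfamiliar with the algorithm: a buddy allocator rounds every request up to a power-of-two block size, splits larger free blocks in half (into "buddies") to satisfy small requests, and merges freed buddies back into their parent block. A minimal, illustrative sketch of the size-rounding step only; `RoundUpToPowerOfTwo` is not part of the proposed interface:

```cpp
#include <cstddef>

// Round a request up to the next power of two -- the block size a buddy
// allocator actually hands out (illustrative helper, not the proposed API).
size_t RoundUpToPowerOfTwo(size_t n) {
  size_t size = 1;
  while (size < n) size <<= 1;
  return size;
}
```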
Please be aware that `BuddyAllocator` always allocates aligned memory, aligned on 32 bytes, which can hold a `BuddyAllocator::Block` object:
```cpp
class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block* left;
    Block* right;
  };
  ...
};
```
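A quick illustration of what the 32-byte alignment means in practice; the helper below is hypothetical and only demonstrates the address arithmetic:

```cpp
#include <cstdint>

// Hypothetical helper: round an address up to the next 32-byte boundary,
// so a chunk can start with a Block header at an aligned address.
const uintptr_t kAlignment = 32;  // assumption: matches the 32-byte claim above

char* AlignUp(char* p) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(p);
  addr = (addr + kAlignment - 1) & ~(kAlignment - 1);
  return reinterpret_cast<char*>(addr);
}
```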
#### System Allocators
The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They hold information about the device, including the amount of memory that has been allocated, so that we can call `GPUAllocator::Used` and `CPUAllocator::Used` to get the amount of memory allocated so far.
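The document does not spell these classes out; the sketch below is a guess at their shape, assuming a `malloc`-backed `CPUAllocator` whose only bookkeeping is the byte count behind `Used`:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical system allocator for the CPU: wraps malloc/free and tracks
// how many bytes are currently allocated, to back CPUAllocator::Used.
class CPUAllocator {
 public:
  void* Alloc(size_t size) {
    used_ += size;
    return std::malloc(size);
  }
  void Free(void* ptr, size_t size) {  // assumes the caller knows the size
    used_ -= size;
    std::free(ptr);
  }
  size_t Used() const { return used_; }

 private:
  size_t used_ = 0;
};
```

A `GPUAllocator` would presumably look the same, but with `cudaMalloc`/`cudaFree` and a `gpu_id` member identifying the device.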
## Why Such a Design
I got inspiration from Majel and Caffe2, though the design above looks different from both.
### Caffe2
In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory. In particular, `Tensor<Context>::mutable_data()` calls `Tensor<Context>::raw_mutable_data()`, which in turn calls `Context::New`.

There are two implementations of `Context`:
- `CPUContext`, whose `New` method calls `g_cpu_allocator.get()->New(size_t)` to allocate the memory.
- `CUDAContext`, which has a data member `int gpu_id_`. This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member. `CUDAContext::New(size_t)` calls `g_cub_allocator->DeviceAllocate(&ptr, nbytes)` to allocate the memory.
### Majel
In Majel, there are basically two allocator types:

- `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
- `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.
However, memory allocation does not go through these two allocators directly. Instead, they are defined in hidden namespaces.

In Majel there are hidden global variables like:

- `cpu::SystemAllocator g_cpu_allocator`, and
- `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.
Programs allocate memory via a `BuddyAllocator`, which can take `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its fallback allocator, so that if the `BuddyAllocator` cannot find a block in its memory pool, it extends the pool by calling the fallback allocator's `New(size_t)`.
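The design proposed above adopts the same fallback idea: the buddy allocator owns a pool and grows it through the system allocator only when needed. The sketch below illustrates just that control flow with a trivial bump-pointer pool; all names are hypothetical, and the real `BuddyAllocator` would manage free blocks rather than a bump pointer:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical illustration of the fallback mechanism: when the pool cannot
// satisfy a request, extend it by asking the system allocator for a chunk.
class FallbackPool {
 public:
  typedef void* (*SystemNew)(size_t);

  FallbackPool(SystemNew fallback, size_t chunk_size)
      : fallback_(fallback), chunk_size_(chunk_size) {}

  void* Alloc(size_t size) {  // assumes size <= chunk_size_
    if (cur_ == nullptr || cur_ + size > end_) {
      // Pool exhausted: extend it via the fallback allocator's New(size_t).
      char* chunk = static_cast<char*>(fallback_(chunk_size_));
      cur_ = chunk;
      end_ = chunk + chunk_size_;
    }
    void* p = cur_;
    cur_ += size;
    return p;
  }

 private:
  SystemNew fallback_;
  size_t chunk_size_;
  char* cur_ = nullptr;
  char* end_ = nullptr;
};
```

Usage would look like `FallbackPool pool(std::malloc, 1 << 20);` followed by `pool.Alloc(128);`, with `std::malloc` standing in for a system allocator's `New`.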