"update doc" (#5682)
parent
dc78f3ca7d
commit
0ed5a5513f
@ -1,4 +1,141 @@
|
||||
# Region-based Heterogeneous Memory Management
|
||||
## Design
|
||||
|
||||
Please check out the [design documentation](http://gangliao.me) to find out more details about
|
||||
buddy memory allocator for both CPU and GPU.
|
||||
### Usage
|
||||
|
||||
To allocate 4KB CPU memory:
|
||||
|
||||
```cpp
|
||||
p = memory::Alloc(platform::CPUPlace(), 4*1024);
|
||||
```
|
||||
|
||||
To allocate 4KB memory on the 3rd GPU:
|
||||
|
||||
```cpp
|
||||
p = memory::Alloc(platform::GPUPlace(2), 4*1024);
|
||||
```
|
||||
|
||||
To free memory and check the so-far used amount of memory on a place:
|
||||
|
||||
```cpp
|
||||
auto pl = platform::GPUPlace(0);
|
||||
p = memory::Alloc(pl, 4*1024);
|
||||
cout << memory::Used(pl);
|
||||
memory::Free(pl, p);
|
||||
```
|
||||
|
||||
### API
|
||||
|
||||
In `paddle/memory/memory.h` we have:
|
||||
|
||||
```cpp
|
||||
namespace memory {
|
||||
template <typename Place> void* Alloc(Place, size_t);
|
||||
template <typename Place> void Free(Place, void*);
|
||||
template <typename Place> size_t Used(Place);
|
||||
} // namespace memory
|
||||
```
|
||||
|
||||
These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
|
||||
|
||||
```cpp
|
||||
template<>
|
||||
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
|
||||
return GetCPUBuddyAllocator()->Alloc(size);
|
||||
}
|
||||
```
|
||||
|
||||
and
|
||||
|
||||
```cpp
|
||||
template<>
|
||||
void Alloc<GPUPlace>(GPUPlace p, size_t size) {
|
||||
return GetGPUBuddyAllocator(p.id)->Alloc(size);
|
||||
}
|
||||
```
|
||||
|
||||
Similar specializations exist for `Free` and `Used`.
|
||||
|
||||
### Implementation
|
||||
|
||||
`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletions.
|
||||
|
||||
```cpp
|
||||
BuddyAllocator* GetCPUBuddyAllocator() {
|
||||
static BuddyAllocator* a = NULL;
|
||||
if (a == NULL) {
|
||||
a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
|
||||
}
|
||||
return a;
|
||||
}
|
||||
|
||||
BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
|
||||
static BuddyAllocator* as = NULL;
|
||||
if (as == NULL) {
|
||||
as = new BuddyAllocator*[platform::NumGPUs()];
|
||||
for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
|
||||
as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
|
||||
}
|
||||
}
|
||||
return as[gpu_id);
|
||||
```
|
||||
|
||||
#### `BuddyAllocator`
|
||||
|
||||
`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes parameters only related with the algorithm:
|
||||
|
||||
```cpp
|
||||
BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
Please be aware that **`BuddyAllocator` always allocate aligned memory**, aligned on 32-bytes, which can hold a `BuddyAllocator::Block` object:
|
||||
|
||||
```cpp
|
||||
class BuddyAllocator {
|
||||
private:
|
||||
struct Block {
|
||||
size_t size;
|
||||
Block* left, right;
|
||||
size_t index; // allocator id
|
||||
};
|
||||
...
|
||||
};
|
||||
```
|
||||
|
||||
Because BuddyAllocator has the meta-data of each block, it can trace the used memory -- record the amount returned by `Alloc` freed in `Free`. Instead, `CPUAllocator` and `GPUAllocator` doesn't know the size of freed memory block and cannot do the trace.
|
||||
|
||||
#### System Allocators
|
||||
|
||||
The `GPUAllocator` and `CPUAllocator` are calls *system allocators*. They work as the fallback allocators of `BuddyAllocator`.
|
||||
|
||||
## Justification
|
||||
|
||||
I got inspiration from Majel and Caffe2, though above design look different from both.
|
||||
|
||||
### Caffe2
|
||||
|
||||
In Caffe2, `Tensor<Context>::mutable_data()` allocates the memroy. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
|
||||
|
||||
There are two implementations of `Context`:
|
||||
|
||||
1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.
|
||||
|
||||
1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202). This looks very similar to class `majel::GPUPlace`, who also has an `int id_` data member. `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.
|
||||
|
||||
### Majel
|
||||
|
||||
In Majel, there are basically two allocator types:
|
||||
|
||||
1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
|
||||
1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.
|
||||
|
||||
However, memory allocation is not via these two allocators. Instead, these two allocators are defined in hidden namespaces.
|
||||
|
||||
In Majel there are hidden global variables like:
|
||||
|
||||
1. `cpu::SystemAllocator g_cpu_allocator`, and
|
||||
1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.
|
||||
|
||||
Programs allocate memory via a BuddyAllocator, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if BuddyAllocator cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.
|
||||
|
Loading…
Reference in new issue