In my mind, the memory package works as described below.
## Design
### Usage
To allocate 4KB of CPU memory:
```cpp
p = memory::Alloc(platform::CPUPlace(), 4*1024);
```
To allocate 4KB of memory on the 3rd GPU:
```cpp
p = memory::Alloc(platform::GPUPlace(2), 4*1024);
```
To free the memory and check the amount of memory used so far on a place:
```cpp
auto pl = platform::GPUPlace(0);
p = memory::Alloc(pl, 4*1024);
cout << memory::Used(pl);
memory::Free(pl, p);
```
### The API
In `paddle/memory/memory.h` we have:
```cpp
namespace memory {

template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> size_t Used(Place);

}  // namespace memory
```
These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
```cpp
template<>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}
```
and
```cpp
template<>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}
```
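
This document only spells out the `Alloc` specializations. Presumably, `Free` and `Used` follow the same pattern and delegate to the per-place buddy allocator. The sketch below assumes `BuddyAllocator` exposes `Free` and `Used` methods (only the CPU case is shown); those method names are my assumption, not confirmed by this doc:

```cpp
// Assumed sketch: Free and Used delegate to the same singleton buddy
// allocator. BuddyAllocator::Free and BuddyAllocator::Used are assumed
// method names.
template<>
void Free<CPUPlace>(CPUPlace p, void* ptr) {
  GetCPUBuddyAllocator()->Free(ptr);
}

template<>
size_t Used<CPUPlace>(CPUPlace p) {
  return GetCPUBuddyAllocator()->Used();
}
```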
### The Implementation
`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons.
```cpp
BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
  }
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  static BuddyAllocator** as = NULL;
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /*backup allocator*/, ...);
    }
  }
  return as[gpu_id];
}
```
#### `BuddyAllocator`
`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:
```cpp
BuddyAllocator::BuddyAllocator(size_t initial_pool_size, size_t max_pool_size) {
  ...
}
```
Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned on a 32-byte boundary, which is large enough to hold a `BuddyAllocator::Block` object:
```cpp
class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block* left;
    Block* right;
  };
  ...
};
```
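
To make the 32-byte alignment concrete, here is a small self-contained sketch of the usual round-up arithmetic; `align_up` is an illustrative helper for this doc, not a function from the actual code base:

```cpp
#include <cstddef>
#include <cstdio>

// Round n up to the next multiple of `alignment` (a power of two).
static size_t align_up(size_t n, size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

int main() {
  const size_t kAlignment = 32;  // enough room for a Block header
  // A 100-byte request is padded so the returned block stays 32-byte
  // aligned and the Block metadata kept in front of it fits.
  std::printf("padded request: %zu bytes\n", align_up(100, kAlignment));
  return 0;
}
```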
#### System Allocators
The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They hold information about the device, including the amount of memory that has been allocated, so that we can call

- `GPUAllocator::Used` and
- `CPUAllocator::Used`

to get the amount of memory that has been allocated so far.
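
To illustrate, a CPU system allocator could be as simple as the following sketch. This is only my guess at the shape of the interface, with `std::malloc`/`std::free` standing in for the real CPU (and, analogously, CUDA) calls:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of a CPU system allocator that tracks how much
// memory it has handed out, so Used() can report it.
class CPUAllocator {
 public:
  void* Alloc(size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) used_ += size;
    return p;
  }

  void Free(void* p, size_t size) {
    std::free(p);
    used_ -= size;
  }

  size_t Used() const { return used_; }

 private:
  size_t used_ = 0;  // bytes currently allocated through this allocator
};
```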
## Why Such a Design
I got inspiration from Majel and Caffe2, though the above design looks different from both.
### Caffe2
In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
There are two implementations of `Context`:
1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.
1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202). This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member. `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.
### Majel
In Majel, there are basically two allocator types:
1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.
However, memory is not allocated through these two allocators directly. Instead, they are defined in hidden namespaces.
In Majel there are hidden global variables like:
1. `cpu::SystemAllocator g_cpu_allocator`, and
1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.
Programs allocate memory via a `BuddyAllocator`, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if the `BuddyAllocator` cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.
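
The fallback mechanism is easy to demonstrate with a toy pool allocator. The class below only illustrates the idea, with `std::malloc` playing the role of the system allocator; it is not Majel's or Paddle's actual code:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Toy illustration of the fallback-allocator idea: serve requests from
// an in-memory pool and, when the pool has no room left, extend it by
// asking the "system allocator" (std::malloc here) for a fresh chunk.
class ToyPoolAllocator {
 public:
  void* Alloc(size_t size) {
    if (current_ == nullptr || used_ + size > kChunkSize) {
      // Pool exhausted: fall back to the system allocator for a new chunk.
      current_ = static_cast<char*>(std::malloc(kChunkSize));
      chunks_.push_back(current_);
      used_ = 0;
    }
    void* p = current_ + used_;
    used_ += size;
    return p;
  }

  ~ToyPoolAllocator() {
    for (char* c : chunks_) std::free(c);
  }

 private:
  static constexpr size_t kChunkSize = 1 << 20;  // grow the pool 1MB at a time
  std::vector<char*> chunks_;
  char* current_ = nullptr;
  size_t used_ = 0;
};

int main() {
  ToyPoolAllocator pool;
  void* p = pool.Alloc(4 * 1024);  // the first call triggers a pool extension
  std::printf("allocated 4KB at %p\n", p);
  return 0;
}
```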