Region-based Heterogeneous Memory Management

Design

Usage

To allocate 4KB CPU memory:

p = memory::Alloc(platform::CPUPlace(), 4*1024);

To allocate 4KB memory on the 3rd GPU:

p = memory::Alloc(platform::GPUPlace(2), 4*1024);

To free memory and check the so-far used amount of memory on a place:

auto pl = platform::GPUPlace(0);
p = memory::Alloc(pl, 4*1024);
cout << memory::Used(pl);
memory::Free(pl, p);

API

In paddle/memory/memory.h we have:

namespace memory {
template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> size_t Used(Place);
}  // namespace memory

These function templates have specializations on either platform::CPUPlace or platform::GPUPlace:

template<>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}

and

template<>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}

Similar specializations exist for Free and Used.

Implementation

GetCPUBuddyAllocator and GetGPUBuddyAllocator are singletons.

BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
  }
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  static BuddyAllocator** as = NULL;
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
    }
  }
  return as[gpu_id];
}

BuddyAllocator

BuddyAllocator implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:

BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
  ...
}

Please be aware that BuddyAllocator always allocates memory aligned to 32 bytes, which is enough to hold a BuddyAllocator::Block object:

class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block *left, *right;
    size_t index; // allocator id
  };
  ...
};

Because BuddyAllocator keeps the metadata of each block, it can track used memory: it records the amount handed out by Alloc and subtracts the amount returned in Free. In contrast, CPUAllocator and GPUAllocator do not know the size of a freed memory block and therefore cannot do this tracking.

System Allocators

GPUAllocator and CPUAllocator are called system allocators. They work as the fallback allocators of BuddyAllocator.

Justification

I drew inspiration from Majel and Caffe2, though the design above differs from both.

Caffe2

In Caffe2, Tensor<Context>::mutable_data() allocates the memory. In particular, Tensor<Context>::mutable_data calls Tensor<Context>::raw_mutable_data, which in turn calls Context::New.

There are two implementations of Context:

  1. CPUContext, whose New method calls g_cpu_allocator.get()->New(size_t) to allocate the memory.

  2. CUDAContext, which has a data member int gpu_id_. This looks very similar to class majel::GPUPlace, which also has an int id_ data member. CUDAContext::New(size_t) calls g_cub_allocator->DeviceAllocate(&ptr, nbytes) to allocate the memory.

Majel

In Majel, there are basically two allocator types:

  1. cpu::SystemAllocator, which has similar functionality to caffe2::CPUContext::New/Delete.
  2. gpu::SystemAllocator, which has similar functionality to caffe2::CUDAContext::New/Delete.

However, memory allocation does not go through these two allocators directly. Instead, they are defined in hidden namespaces.

In Majel there are hidden global variables like:

  1. cpu::SystemAllocator g_cpu_allocator, and
  2. vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS).

Programs allocate memory via a BuddyAllocator, which takes g_cpu_allocator or one of g_gpu_allocators[gpu_id] as its fallback allocator: if BuddyAllocator cannot find a block in its memory pool, it extends the pool by calling the fallback allocator's New(size_t).