From 1c710053292394154d41db6c44ea808a8feaf65c Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Sun, 10 Sep 2017 15:03:58 -0700 Subject: [PATCH 01/21] Design Doc: Session --- doc/design/session.md | 62 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 doc/design/session.md diff --git a/doc/design/session.md b/doc/design/session.md new file mode 100644 index 0000000000..2e8c0ece7a --- /dev/null +++ b/doc/design/session.md @@ -0,0 +1,62 @@ +# Design Doc: Session + +## Abstract + +This design doc proposes to have an object called *Session* which +encapsulates the environment in which the computation graph is +executed. + +## Background + +A computation graph is executed in an environment which contains the +[scope](./scope.md) and other states. PaddlePaddle used to only have +an implicit global session on which `paddle.eval()` is executed. + +This has the limitation that the user can not create two independent +environments. For example, in reinforcement learning, the user may +want to have a stale model for inference and a fresh model for +training, and only replace the stale model with the fresh model +periodically. Also, we have no concept that can encapsulate a remote +environment that could execute a computation graph. + +## Session + +Session is an object that owns all runtime states such as scope, +reader OP's file handles, connection to a remote PaddlePaddle cluster, +etc. + +Session has two methods: `eval` and `close`. `eval` executes the +target OP in a given graph, and `close` closes the session and +releases all related resources: + +```Python +a = paddle.constant(1.0) +b = paddle.constant(2.0) +c = a + b +sess = paddle.session() +sess.eval(c) +sess.close() +``` + +### Remote Session + +Paddle Cloud will support user creating a remote session pointing to +the Paddle Cloud cluster. The user can send the computation graph to +be executed on the Paddle Cloud. 
In this way, the user can control a +cluster from her local computer: + +```Python +reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud +image = reader.column(0) +label = reader.column(1) +fc1 = paddle.op.fc(image, size=256, act="sigmoid") +fc2 = paddle.op.fc(fc1, size=10, act="softmax") +cost = paddle.op.cross_entropy(fc2) +opt = paddle.optimizer.sgd(cost) + +remote_config = ... # remote configuration such as endpoint, number of nodes and authentication. +sess = paddle.remoteSession(remote_config) +for i in range(1000): + sess.eval(opt) +sess.close() +``` From 94dfd8649e06108bc0c03e6f53eb43ab13f30332 Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Mon, 25 Sep 2017 16:02:58 -0700 Subject: [PATCH 02/21] fix according to comments --- doc/design/session.md | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/doc/design/session.md b/doc/design/session.md index 2e8c0ece7a..dc034c3906 100644 --- a/doc/design/session.md +++ b/doc/design/session.md @@ -6,26 +6,36 @@ This design doc proposes to have an object called *Session* which encapsulates the environment in which the computation graph is executed. +The session is able to distinguish running a graph locally or +remotely, using CPU only or using one or more GPUs. Different sessions +have different runtime environments such as [scopes](./scope.md) and +device contexts. + + ## Background -A computation graph is executed in an environment which contains the -[scope](./scope.md) and other states. PaddlePaddle used to only have -an implicit global session on which `paddle.eval()` is executed. +A computation graph runs in an environment which contains states such +as the scope and device contexts. The current design has an implicit +global session on which `paddle.eval()` is executed. 
+ +Since the user is not able to explicitly switch between runtime +environments such as the scope and the device contexts, the user +cannot run a topology in two independent environments. For example, in +reinforcement learning, the user may want to have a stale model for +inference and a fresh model for training, and only replace the stale +model with the fresh model periodically. Also, we have no concept that +can encapsulate a remote environment that could execute a computation +graph. -This has the limitation that the user can not create two independent -environments. For example, in reinforcement learning, the user may -want to have a stale model for inference and a fresh model for -training, and only replace the stale model with the fresh model -periodically. Also, we have no concept that can encapsulate a remote -environment that could execute a computation graph. +We need a session concept to address above issues. ## Session -Session is an object that owns all runtime states such as scope, +A session is an object that owns all runtime states such as scope, reader OP's file handles, connection to a remote PaddlePaddle cluster, etc. -Session has two methods: `eval` and `close`. `eval` executes the +The session has two methods: `eval` and `close`. `eval` executes the target OP in a given graph, and `close` closes the session and releases all related resources: @@ -51,7 +61,7 @@ image = reader.column(0) label = reader.column(1) fc1 = paddle.op.fc(image, size=256, act="sigmoid") fc2 = paddle.op.fc(fc1, size=10, act="softmax") -cost = paddle.op.cross_entropy(fc2) +cost = paddle.op.cross_entropy(fc2, label) opt = paddle.optimizer.sgd(cost) remote_config = ... # remote configuration such as endpoint, number of nodes and authentication. 
From f24b5dffc42063ddfe28229fdb242c3df4ec1aa7 Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Tue, 26 Sep 2017 18:03:37 -0700 Subject: [PATCH 03/21] Update Session design doc --- doc/design/refactor/session.md | 160 +++++++++++++++++++++++++++++++++ doc/design/session.md | 72 --------------- 2 files changed, 160 insertions(+), 72 deletions(-) create mode 100644 doc/design/refactor/session.md delete mode 100644 doc/design/session.md diff --git a/doc/design/refactor/session.md b/doc/design/refactor/session.md new file mode 100644 index 0000000000..5f58148f01 --- /dev/null +++ b/doc/design/refactor/session.md @@ -0,0 +1,160 @@ +# Design Doc: Session + +## Abstract + +The *session* object encapsulates the environment in which the +computation graph is executed. + +We will have *local* session and *remote* session, they offer the +same [interface](#interface). The local session encapsulates the local +runtime environment and the remote session encapsulates the cluster +runtime envrionment. + +The local runtime envrionment contains: + +1. computation devices (i.e., CPU, GPU) handles, and +1. the [scope](../scope.md) which holds all variables. + +The remote runtime envrionment contains: + +1. computation devices (i.e., CPU and GPU on node 0, 1) in a cluster, + and +1. the distributed [scope](../scope.md) in a cluster which holds all + variables. + +The user can create a remote session on Paddle Cloud and evaluate the +computation graph with it. In this way, the user can control the +remote computation resource in a cluster from his local computer. + + +## Background + +The current design has an implicit global session on which +`paddle.eval()` is executed. The pain point is: + +Since the user is not able to explicitly switch between runtime +environments such as the scope and the device contexts, the user +cannot run a topology in two independent environments. 
+ +For example, in reinforcement learning, the user may want to have a +stale model for inference and a fresh model for training, and only +replace the stale model with the fresh model periodically. + +Furthermore, we have no concept that encapsulates a remote environment +that executes a computation graph. + +We need the session object to address above issues. + + +## Session + +A session is an object that owns the runtime environment. All +computations are executed through `session.eval`. + + +### Interface + +``` +eval( + targets, + feed_dict=None, +) +``` + +Evaluates the target Operations or Variables in `targets`. + +- *targets*: the evaluation targets. Can be a single Operation or + Variable, or a list with the Operations or Variables as elements. + + The value returned by `eval()` has the same shape as the `target` + argument. + + The computation graph is implicitly inferred from the targets. + +- *feed_dict*: a dictionary that contains the tensors which overrides + the edges of the computation graph. + +``` +close() +``` + +Closes the session. Calling this method releases the scope. + + +### Create a Local Session + +``` +session( + gpu_ids=None +) +``` + +Creates a new session. One session owns one scope, so creating +multiple sessions will create different scopes. + +- *gpu_ids*: a single `int` or a list of `int` of the GPU IDs to be + used as the computation devices. If not specified, all avaiable GPUs + will be used. + + +#### Example + +```Python +a = paddle.constant(1.0) +b = paddle.constant(2.0) +c = a + b +sess = paddle.session(gpu_ids=[0,1]) +sess.eval(c) +sess.close() +``` + +### Create a Remote Session + +``` +create_cloud_job( + name, + num_trainer, + mem_per_trainer, + gpu_per_trainer, + cpu_per_trainer, + num_ps, + mem_per_ps, + cpu_per_ps, +) +``` + +Creates a Paddle Cloud job. Fails if the job name exists. + +``` +get_cloud_job( + name +) +``` + +Gets a Paddle Cloud job. 
+ +``` +remote_session( + job +) +``` + +- *job*: the Paddle Cloud job. + +#### Example + +```Python +reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud +image = reader.column(0) +label = reader.column(1) +fc1 = paddle.op.fc(image, size=256, act="sigmoid") +fc2 = paddle.op.fc(fc1, size=10, act="softmax") +cost = paddle.op.cross_entropy(fc2, label) +opt = paddle.optimizer.sgd(cost) + +job = paddle.create_cloud_job("test", 3, "1G", 1, 1, 2, "1G", 1) +sess = paddle.remote_session(job) +for i in range(1000): + sess.eval(opt) +sess.close() +``` diff --git a/doc/design/session.md b/doc/design/session.md deleted file mode 100644 index dc034c3906..0000000000 --- a/doc/design/session.md +++ /dev/null @@ -1,72 +0,0 @@ -# Design Doc: Session - -## Abstract - -This design doc proposes to have an object called *Session* which -encapsulates the environment in which the computation graph is -executed. - -The session is able to distinguish running a graph locally or -remotely, using CPU only or using one or more GPUs. Different sessions -have different runtime environments such as [scopes](./scope.md) and -device contexts. - - -## Background - -A computation graph runs in an environment which contains states such -as the scope and device contexts. The current design has an implicit -global session on which `paddle.eval()` is executed. - -Since the user is not able to explicitly switch between runtime -environments such as the scope and the device contexts, the user -cannot run a topology in two independent environments. For example, in -reinforcement learning, the user may want to have a stale model for -inference and a fresh model for training, and only replace the stale -model with the fresh model periodically. Also, we have no concept that -can encapsulate a remote environment that could execute a computation -graph. - -We need a session concept to address above issues. 
- -## Session - -A session is an object that owns all runtime states such as scope, -reader OP's file handles, connection to a remote PaddlePaddle cluster, -etc. - -The session has two methods: `eval` and `close`. `eval` executes the -target OP in a given graph, and `close` closes the session and -releases all related resources: - -```Python -a = paddle.constant(1.0) -b = paddle.constant(2.0) -c = a + b -sess = paddle.session() -sess.eval(c) -sess.close() -``` - -### Remote Session - -Paddle Cloud will support user creating a remote session pointing to -the Paddle Cloud cluster. The user can send the computation graph to -be executed on the Paddle Cloud. In this way, the user can control a -cluster from her local computer: - -```Python -reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud -image = reader.column(0) -label = reader.column(1) -fc1 = paddle.op.fc(image, size=256, act="sigmoid") -fc2 = paddle.op.fc(fc1, size=10, act="softmax") -cost = paddle.op.cross_entropy(fc2, label) -opt = paddle.optimizer.sgd(cost) - -remote_config = ... # remote configuration such as endpoint, number of nodes and authentication. -sess = paddle.remoteSession(remote_config) -for i in range(1000): - sess.eval(opt) -sess.close() -``` From 757c76b83f3701c29efc88c546ec90a18952f98a Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Thu, 28 Sep 2017 15:07:17 -0700 Subject: [PATCH 04/21] update according to comments --- doc/design/refactor/session.md | 74 +++++++++++++++++++++------------- 1 file changed, 47 insertions(+), 27 deletions(-) diff --git a/doc/design/refactor/session.md b/doc/design/refactor/session.md index 5f58148f01..9a7451ece5 100644 --- a/doc/design/refactor/session.md +++ b/doc/design/refactor/session.md @@ -5,17 +5,17 @@ The *session* object encapsulates the environment in which the computation graph is executed. 
-We will have *local* session and *remote* session, they offer the +We will have the *local* session and *remote* session, they offer the same [interface](#interface). The local session encapsulates the local runtime environment and the remote session encapsulates the cluster -runtime envrionment. +runtime environment. -The local runtime envrionment contains: +The local runtime environment contains: 1. computation devices (i.e., CPU, GPU) handles, and 1. the [scope](../scope.md) which holds all variables. -The remote runtime envrionment contains: +The remote runtime environment contains: 1. computation devices (i.e., CPU and GPU on node 0, 1) in a cluster, and @@ -29,12 +29,12 @@ remote computation resource in a cluster from his local computer. ## Background -The current design has an implicit global session on which +The current design has an implicit global session in which `paddle.eval()` is executed. The pain point is: Since the user is not able to explicitly switch between runtime -environments such as the scope and the device contexts, the user -cannot run a topology in two independent environments. +environments, the user cannot run a topology in two independent +environments. For example, in reinforcement learning, the user may want to have a stale model for inference and a fresh model for training, and only @@ -49,12 +49,12 @@ We need the session object to address above issues. ## Session A session is an object that owns the runtime environment. All -computations are executed through `session.eval`. +computations are executed through `session.eval()`. ### Interface -``` +```python eval( targets, feed_dict=None, @@ -64,37 +64,57 @@ eval( Evaluates the target Operations or Variables in `targets`. - *targets*: the evaluation targets. Can be a single Operation or - Variable, or a list with the Operations or Variables as elements. + Variable, or a list with the Operations or Variables as + elements. 
The value returned by `eval()` has the same shape as the + `target` argument. + + The PaddlePaddle program is represented by + the [ProgramDesc](../design/program.md), `eval()` will infer the + ProgramDesc from the given targets and run the PaddlePaddle + program. Please + see + [this graph](./distributed_architecture.md#local-training-architecture) for + the detailed illustration for the local session + and + [this graph](./distributed_architecture.md#distributed-training-architecture) for + the detailed illustration for the remote session. + +- *feed_dict*: a dictionary that contains the tensors which override + the edges of the computation graph. - The value returned by `eval()` has the same shape as the `target` - argument. + feed_dict not only can provide the input data, it can override any + OP's input as well: - The computation graph is implicitly inferred from the targets. + ```python + a = pd.constant(1.0, name="a") + b = pd.constant(2.0) + c = pd.mul(a,b) + sess.eval(targets=c, feed_dict={"a":3.0}) # returns 6.0 + ``` -- *feed_dict*: a dictionary that contains the tensors which overrides - the edges of the computation graph. - -``` +```python close() ``` -Closes the session. Calling this method releases the scope. +Closes the session and releases the scope that the session owns. ### Create a Local Session -``` +```python session( - gpu_ids=None + devices=None ) ``` Creates a new session. One session owns one scope, so creating multiple sessions will create different scopes. -- *gpu_ids*: a single `int` or a list of `int` of the GPU IDs to be - used as the computation devices. If not specified, all avaiable GPUs - will be used. +- *devices*: a single `string` or a list of `string` of device names, + the corresponding devices will be the computation devices for + `eval()`. If not specified, all available devices (e.g., all GPUs) + will be used. The user doesn't need to specify the CPU device since + it will be always used. 
#### Example @@ -103,14 +123,14 @@ multiple sessions will create different scopes. a = paddle.constant(1.0) b = paddle.constant(2.0) c = a + b -sess = paddle.session(gpu_ids=[0,1]) +sess = paddle.session(devices=["gpu:0", "gpu:1", "fpga:0"]) sess.eval(c) sess.close() ``` ### Create a Remote Session -``` +```python create_cloud_job( name, num_trainer, @@ -125,7 +145,7 @@ create_cloud_job( Creates a Paddle Cloud job. Fails if the job name exists. -``` +```python get_cloud_job( name ) @@ -133,7 +153,7 @@ get_cloud_job( Gets a Paddle Cloud job. -``` +```python remote_session( job ) From 5f51d0afc49f4bd4c624ec62aa1e3ccd31840aee Mon Sep 17 00:00:00 2001 From: Yi Wang Date: Wed, 4 Oct 2017 10:00:39 -0700 Subject: [PATCH 05/21] Add -D PADDLE_WITH_CUDA in cmake/configure.cmake --- cmake/configure.cmake | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/cmake/configure.cmake b/cmake/configure.cmake index 51c3b918cc..4e044ca421 100644 --- a/cmake/configure.cmake +++ b/cmake/configure.cmake @@ -49,11 +49,16 @@ if(NOT WITH_GOLANG) endif(NOT WITH_GOLANG) if(NOT WITH_GPU) + # Will gradually remove uses of PADDLE_ONLY_CPU in source files, + # so could we remove -DPADDLE_ONLY_CPU. + # c.f. 
https://github.com/PaddlePaddle/Paddle/issues/4588 add_definitions(-DPADDLE_ONLY_CPU) add_definitions(-DHPPL_STUB_FUNC) list(APPEND CMAKE_CXX_SOURCE_FILE_EXTENSIONS cu) else() + add_definitions(-DPADDLE_WITH_CUDA) + FIND_PACKAGE(CUDA REQUIRED) if(${CUDA_VERSION_MAJOR} VERSION_LESS 7) From a9e298bebef29390b815e431e1a475ab1417015a Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Wed, 4 Oct 2017 13:40:32 -0700 Subject: [PATCH 06/21] fix according to comments --- doc/design/refactor/session.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/design/refactor/session.md b/doc/design/refactor/session.md index 9a7451ece5..1d9a26683c 100644 --- a/doc/design/refactor/session.md +++ b/doc/design/refactor/session.md @@ -86,10 +86,10 @@ Evaluates the target Operations or Variables in `targets`. OP's input as well: ```python - a = pd.constant(1.0, name="a") - b = pd.constant(2.0) + a = pd.constant(2.0, name="a") + b = pd.variable(name="b") c = pd.mul(a,b) - sess.eval(targets=c, feed_dict={"a":3.0}) # returns 6.0 + sess.eval(targets=c, feed_dict={"b":3.0}) # returns 6.0 ``` ```python @@ -107,14 +107,14 @@ session( ) ``` -Creates a new session. One session owns one scope, so creating +Creates a new session. One session owns one global scope, so creating multiple sessions will create different scopes. - *devices*: a single `string` or a list of `string` of device names, the corresponding devices will be the computation devices for `eval()`. If not specified, all available devices (e.g., all GPUs) will be used. The user doesn't need to specify the CPU device since - it will be always used. + it will be always used. Multiple sessions can use the same device. 
#### Example From 4558807c48035ee1d279d16975d6e2c607cb35f5 Mon Sep 17 00:00:00 2001 From: Yi Wang Date: Wed, 4 Oct 2017 14:01:34 -0700 Subject: [PATCH 07/21] Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU --- paddle/api/Util.cpp | 2 +- paddle/capi/Matrix.cpp | 2 +- paddle/framework/grad_op_builder_test.cc | 2 +- paddle/framework/lod_tensor.h | 4 +-- paddle/framework/op_proto_maker_test.cc | 2 +- paddle/framework/op_registry.h | 2 +- paddle/framework/op_registry_test.cc | 2 +- paddle/framework/operator.cc | 2 +- paddle/framework/tensor_impl.h | 4 +-- paddle/framework/tensor_test.cc | 8 +++--- paddle/function/BlockExpandOp.cpp | 2 +- paddle/function/ContextProjectionOp.cpp | 2 +- paddle/function/CosSimOp.cpp | 2 +- paddle/function/CropOp.cpp | 2 +- paddle/function/CrossMapNormalOp.cpp | 2 +- paddle/function/DepthwiseConvOp.cpp | 2 +- paddle/function/DepthwiseConvOpTest.cpp | 2 +- paddle/function/GemmConvOp.cpp | 2 +- paddle/function/GemmConvOpTest.cpp | 2 +- paddle/function/Im2ColTest.cpp | 2 +- paddle/function/MulOp.cpp | 2 +- paddle/function/PadOp.cpp | 2 +- paddle/function/RowConvOp.cpp | 2 +- paddle/function/SwitchOp.cpp | 2 +- paddle/gserver/layers/BatchNormBaseLayer.cpp | 2 +- .../layers/BatchNormalizationLayer.cpp | 6 ++--- paddle/gserver/layers/PoolLayer.cpp | 4 +-- paddle/gserver/tests/LayerGradUtil.cpp | 2 +- paddle/gserver/tests/test_BatchNorm.cpp | 2 +- paddle/gserver/tests/test_ConvUnify.cpp | 2 +- paddle/gserver/tests/test_DetectionOutput.cpp | 2 +- paddle/gserver/tests/test_Evaluator.cpp | 2 +- paddle/gserver/tests/test_KmaxSeqScore.cpp | 2 +- paddle/gserver/tests/test_LayerGrad.cpp | 26 +++++++++---------- paddle/gserver/tests/test_NetworkCompare.cpp | 2 +- paddle/gserver/tests/test_PriorBox.cpp | 2 +- .../gserver/tests/test_ProtoDataProvider.cpp | 6 ++--- paddle/gserver/tests/test_PyDataProvider.cpp | 4 +-- .../gserver/tests/test_SelectiveFCLayer.cpp | 8 +++--- .../gserver/tests/test_SeqSliceLayerGrad.cpp | 2 +- 
paddle/gserver/tests/test_WarpCTCLayer.cpp | 2 +- paddle/math/Matrix.cpp | 6 ++--- paddle/math/SparseMatrix.cpp | 2 +- paddle/math/Vector.cpp | 6 ++--- paddle/math/tests/test_Allocator.cpp | 4 +-- paddle/math/tests/test_BaseMatrix.cpp | 2 +- paddle/math/tests/test_CpuGpuVector.cpp | 2 +- paddle/math/tests/test_ExecViaCpu.cpp | 2 +- paddle/math/tests/test_GpuProfiler.cpp | 2 +- paddle/math/tests/test_Matrix.cpp | 2 +- paddle/math/tests/test_SparseMatrix.cpp | 6 ++--- paddle/math/tests/test_TrainingAlgorithm.cpp | 2 +- paddle/math/tests/test_batchTranspose.cpp | 2 +- paddle/math/tests/test_matrixCompare.cpp | 2 +- paddle/math/tests/test_perturbation.cpp | 2 +- .../math/tests/test_sparseMatrixCompare.cpp | 2 +- paddle/memory/detail/buddy_allocator.cc | 2 +- paddle/memory/detail/system_allocator.cc | 2 +- paddle/memory/detail/system_allocator.h | 2 +- paddle/memory/detail/system_allocator_test.cc | 2 +- paddle/memory/memcpy.cc | 2 +- paddle/memory/memcpy.h | 2 +- paddle/memory/memory.cc | 2 +- paddle/memory/memory_test.cc | 2 +- paddle/operators/detail/strided_memcpy.h | 2 +- paddle/operators/math/im2col_test.cc | 4 +-- paddle/operators/math/math_function_test.cc | 2 +- paddle/operators/strided_memcpy_test.cc | 4 +-- paddle/platform/device_context.cc | 2 +- paddle/platform/device_context.h | 4 +-- paddle/platform/enforce.h | 4 +-- paddle/platform/enforce_test.cc | 2 +- paddle/platform/gpu_info.h | 2 +- paddle/platform/variant.h | 2 +- paddle/pserver/test/SocketTest.cpp | 2 +- paddle/pserver/test/test_ProtoServer.cpp | 2 +- paddle/pybind/pybind.cc | 12 ++++----- paddle/pybind/tensor_py.h | 2 +- paddle/string/to_string_test.cc | 2 +- paddle/trainer/MergeModel.cpp | 2 +- paddle/trainer/tests/test_Compare.cpp | 2 +- paddle/trainer/tests/test_CompareSparse.cpp | 4 +-- paddle/trainer/tests/test_Trainer.cpp | 4 +-- paddle/trainer/tests/test_TrainerOnePass.cpp | 6 ++--- .../test_recurrent_machine_generation.cpp | 2 +- paddle/utils/Flags.cpp | 2 +- paddle/utils/Util.h | 2 +- 
paddle/utils/Version.h | 2 +- 88 files changed, 134 insertions(+), 134 deletions(-) diff --git a/paddle/api/Util.cpp b/paddle/api/Util.cpp index 7446d892fd..11bd05c09d 100644 --- a/paddle/api/Util.cpp +++ b/paddle/api/Util.cpp @@ -47,7 +47,7 @@ bool isUsingGpu() { return FLAGS_use_gpu; } void setUseGpu(bool useGpu) { FLAGS_use_gpu = useGpu; } bool isGpuVersion() { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA return false; #else return true; diff --git a/paddle/capi/Matrix.cpp b/paddle/capi/Matrix.cpp index 5b3737a759..4547afaf1d 100644 --- a/paddle/capi/Matrix.cpp +++ b/paddle/capi/Matrix.cpp @@ -46,7 +46,7 @@ paddle_error paddle_matrix_set_row(paddle_matrix mat, if (rowID >= ptr->mat->getHeight()) return kPD_OUT_OF_RANGE; paddle::real* buf = ptr->mat->getRowBuf(rowID); size_t width = ptr->mat->getWidth(); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA hl_memcpy(buf, rowArray, sizeof(paddle::real) * width); #else std::copy(rowArray, rowArray + width, buf); diff --git a/paddle/framework/grad_op_builder_test.cc b/paddle/framework/grad_op_builder_test.cc index 2dbc2e6620..793780ea44 100644 --- a/paddle/framework/grad_op_builder_test.cc +++ b/paddle/framework/grad_op_builder_test.cc @@ -183,4 +183,4 @@ TEST(GradOpDescBuilder, IOIgnoredInGradient) { {f::GradVarName("in3_1"), f::GradVarName("in3_2")})); delete forw_op; delete grad_op; -} \ No newline at end of file +} diff --git a/paddle/framework/lod_tensor.h b/paddle/framework/lod_tensor.h index b12c95b6b7..4db36ee766 100644 --- a/paddle/framework/lod_tensor.h +++ b/paddle/framework/lod_tensor.h @@ -15,7 +15,7 @@ #pragma once #include -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA #include #include #include @@ -29,7 +29,7 @@ namespace paddle { namespace framework { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA template using Vector = std::vector; #else diff --git a/paddle/framework/op_proto_maker_test.cc b/paddle/framework/op_proto_maker_test.cc index b01e30f753..988a14cf4d 100644 --- 
a/paddle/framework/op_proto_maker_test.cc +++ b/paddle/framework/op_proto_maker_test.cc @@ -48,4 +48,4 @@ TEST(ProtoMaker, DuplicatedInOut) { paddle::framework::OpAttrChecker op_checker; auto proto_maker = TestInOutProtoMaker(&op_proto, &op_checker); ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); -} \ No newline at end of file +} diff --git a/paddle/framework/op_registry.h b/paddle/framework/op_registry.h index aca6579f36..958cf581f5 100644 --- a/paddle/framework/op_registry.h +++ b/paddle/framework/op_registry.h @@ -211,7 +211,7 @@ class OpKernelRegistrar : public Registrar { // TODO(fengjiayi): The following macros // seems ugly, do we have better method? -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA #define USE_OP_KERNEL(op_type) USE_OP_DEVICE_KERNEL(op_type, CPU) #else #define USE_OP_KERNEL(op_type) \ diff --git a/paddle/framework/op_registry_test.cc b/paddle/framework/op_registry_test.cc index f89f40b444..b860fe6cac 100644 --- a/paddle/framework/op_registry_test.cc +++ b/paddle/framework/op_registry_test.cc @@ -183,4 +183,4 @@ class CosineOpComplete : public paddle::framework::CosineOp { TEST(OperatorRegistrar, Test) { using namespace paddle::framework; OperatorRegistrar reg("cos"); -} \ No newline at end of file +} diff --git a/paddle/framework/operator.cc b/paddle/framework/operator.cc index 21c1c6f9e6..2ca838f838 100644 --- a/paddle/framework/operator.cc +++ b/paddle/framework/operator.cc @@ -25,7 +25,7 @@ Eigen::DefaultDevice& ExecutionContext::GetEigenDevice< return *device_context_.GetEigenDevice(); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA template <> Eigen::GpuDevice& ExecutionContext::GetEigenDevice() const { diff --git a/paddle/framework/tensor_impl.h b/paddle/framework/tensor_impl.h index 1cde1f74b8..379eac94f9 100644 --- a/paddle/framework/tensor_impl.h +++ b/paddle/framework/tensor_impl.h @@ -65,7 +65,7 @@ inline T* Tensor::mutable_data(platform::Place place) { holder_.reset(new PlaceholderImpl( 
boost::get(place), size)); } else if (platform::is_gpu_place(place)) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA PADDLE_THROW("'GPUPlace' is not supported in CPU only device."); } #else @@ -103,7 +103,7 @@ inline void Tensor::CopyFrom(const Tensor& src, memory::Copy(boost::get(dst_place), dst_ptr, boost::get(src_place), src_ptr, size); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA else if (platform::is_gpu_place(src_place) && platform::is_cpu_place(dst_place)) { memory::Copy(boost::get(dst_place), dst_ptr, diff --git a/paddle/framework/tensor_test.cc b/paddle/framework/tensor_test.cc index 86c6945ab5..58cf0fc3cb 100644 --- a/paddle/framework/tensor_test.cc +++ b/paddle/framework/tensor_test.cc @@ -74,7 +74,7 @@ TEST(Tensor, MutableData) { EXPECT_EQ(p1, p2); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA { Tensor src_tensor; float* p1 = nullptr; @@ -126,7 +126,7 @@ TEST(Tensor, ShareDataWith) { ASSERT_EQ(src_tensor.data(), dst_tensor.data()); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA { Tensor src_tensor; Tensor dst_tensor; @@ -163,7 +163,7 @@ TEST(Tensor, Slice) { EXPECT_EQ(src_data_address + 3 * 4 * 1 * sizeof(int), slice_data_address); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA { Tensor src_tensor; src_tensor.mutable_data(make_ddim({6, 9}), GPUPlace()); @@ -218,7 +218,7 @@ TEST(Tensor, CopyFrom) { EXPECT_EQ(dst_ptr[i], slice_ptr[i]); } } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA { Tensor src_tensor; Tensor gpu_tensor; diff --git a/paddle/function/BlockExpandOp.cpp b/paddle/function/BlockExpandOp.cpp index ad78f5f584..bd0fe119ce 100644 --- a/paddle/function/BlockExpandOp.cpp +++ b/paddle/function/BlockExpandOp.cpp @@ -194,7 +194,7 @@ public: REGISTER_TYPED_FUNC(BlockExpand, CPU, BlockExpandForward); REGISTER_TYPED_FUNC(BlockExpandGrad, CPU, BlockExpandBackward); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(BlockExpand, GPU, BlockExpandForward); REGISTER_TYPED_FUNC(BlockExpandGrad, GPU, 
BlockExpandBackward); #endif diff --git a/paddle/function/ContextProjectionOp.cpp b/paddle/function/ContextProjectionOp.cpp index ab18c39df8..23916c0f4b 100644 --- a/paddle/function/ContextProjectionOp.cpp +++ b/paddle/function/ContextProjectionOp.cpp @@ -395,7 +395,7 @@ REGISTER_TYPED_FUNC(ContextProjectionForward, REGISTER_TYPED_FUNC(ContextProjectionBackward, CPU, ContextProjectionBackwardFunc); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(ContextProjectionForward, GPU, ContextProjectionForwardFunc); diff --git a/paddle/function/CosSimOp.cpp b/paddle/function/CosSimOp.cpp index 4418f144d3..2e5c281f37 100644 --- a/paddle/function/CosSimOp.cpp +++ b/paddle/function/CosSimOp.cpp @@ -233,7 +233,7 @@ private: REGISTER_TYPED_FUNC(CosSimForward, CPU, CosSimForwardFunc); REGISTER_TYPED_FUNC(CosSimBackward, CPU, CosSimBackwardFunc); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(CosSimForward, GPU, CosSimForwardFunc); REGISTER_TYPED_FUNC(CosSimBackward, GPU, CosSimBackwardFunc); #endif diff --git a/paddle/function/CropOp.cpp b/paddle/function/CropOp.cpp index 39504cc2c1..46f98f12c1 100644 --- a/paddle/function/CropOp.cpp +++ b/paddle/function/CropOp.cpp @@ -169,7 +169,7 @@ private: REGISTER_TYPED_FUNC(Crop, CPU, CropFunc); REGISTER_TYPED_FUNC(CropGrad, CPU, CropGradFunc); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(Crop, GPU, CropFunc); REGISTER_TYPED_FUNC(CropGrad, GPU, CropGradFunc); #endif diff --git a/paddle/function/CrossMapNormalOp.cpp b/paddle/function/CrossMapNormalOp.cpp index 1cf0918bed..9e88669d37 100644 --- a/paddle/function/CrossMapNormalOp.cpp +++ b/paddle/function/CrossMapNormalOp.cpp @@ -336,7 +336,7 @@ private: REGISTER_TYPED_FUNC(CrossMapNormal, CPU, CrossMapNormalFunc); REGISTER_TYPED_FUNC(CrossMapNormalGrad, CPU, CrossMapNormalGradFunc); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(CrossMapNormal, GPU, CrossMapNormalFunc); 
REGISTER_TYPED_FUNC(CrossMapNormalGrad, GPU, CrossMapNormalGradFunc); #endif diff --git a/paddle/function/DepthwiseConvOp.cpp b/paddle/function/DepthwiseConvOp.cpp index 7656ab3d0a..9863e3ae1d 100644 --- a/paddle/function/DepthwiseConvOp.cpp +++ b/paddle/function/DepthwiseConvOp.cpp @@ -292,7 +292,7 @@ REGISTER_TYPED_FUNC(DepthwiseConvGradInput, REGISTER_TYPED_FUNC(DepthwiseConvGradFilter, CPU, DepthwiseConvGradFilterFunction); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(DepthwiseConv, GPU, DepthwiseConvFunction); REGISTER_TYPED_FUNC(DepthwiseConvGradInput, GPU, diff --git a/paddle/function/DepthwiseConvOpTest.cpp b/paddle/function/DepthwiseConvOpTest.cpp index 39033ecb2b..b1a90da7db 100644 --- a/paddle/function/DepthwiseConvOpTest.cpp +++ b/paddle/function/DepthwiseConvOpTest.cpp @@ -17,7 +17,7 @@ limitations under the License. */ namespace paddle { -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST(DepthwiseConv, Forward) { DepthwiseConvolution( "GemmConv-CPU", "DepthwiseConv-GPU", forward); diff --git a/paddle/function/GemmConvOp.cpp b/paddle/function/GemmConvOp.cpp index 68e08c1480..bdb56ddac3 100644 --- a/paddle/function/GemmConvOp.cpp +++ b/paddle/function/GemmConvOp.cpp @@ -340,7 +340,7 @@ public: REGISTER_TYPED_FUNC(GemmConv, CPU, GemmConvFunction); REGISTER_TYPED_FUNC(GemmConvGradInput, CPU, GemmConvGradInputFunction); REGISTER_TYPED_FUNC(GemmConvGradFilter, CPU, GemmConvGradFilterFunction); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA REGISTER_TYPED_FUNC(GemmConv, GPU, GemmConvFunction); REGISTER_TYPED_FUNC(GemmConvGradInput, GPU, GemmConvGradInputFunction); REGISTER_TYPED_FUNC(GemmConvGradFilter, GPU, GemmConvGradFilterFunction); diff --git a/paddle/function/GemmConvOpTest.cpp b/paddle/function/GemmConvOpTest.cpp index bd1cf3c6a4..b5b5e1f35b 100644 --- a/paddle/function/GemmConvOpTest.cpp +++ b/paddle/function/GemmConvOpTest.cpp @@ -24,7 +24,7 @@ TEST(GemmConv, NaiveConv) { "NaiveConv-CPU", "GemmConv-CPU", forward); } 
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(GemmConv, Forward) {
   Convolution(
       "GemmConv-CPU", "GemmConv-GPU", forward);
diff --git a/paddle/function/Im2ColTest.cpp b/paddle/function/Im2ColTest.cpp
index 55325e94b5..a0a01a5fc7 100644
--- a/paddle/function/Im2ColTest.cpp
+++ b/paddle/function/Im2ColTest.cpp
@@ -116,7 +116,7 @@ void TestIm2ColFunctor() {
 TEST(Im2ColFunctor, CPU) { TestIm2ColFunctor(); }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(Im2ColFunctor, GPU) { TestIm2ColFunctor(); }
diff --git a/paddle/function/MulOp.cpp b/paddle/function/MulOp.cpp
index 655026320c..704a8c4132 100644
--- a/paddle/function/MulOp.cpp
+++ b/paddle/function/MulOp.cpp
@@ -341,7 +341,7 @@ private:
 };
 REGISTER_TYPED_FUNC(MulOp, CPU, MulFunc);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 REGISTER_TYPED_FUNC(MulOp, GPU, MulFunc);
 #endif
 } // namespace paddle
diff --git a/paddle/function/PadOp.cpp b/paddle/function/PadOp.cpp
index 24c9bf4e72..eed2f2e308 100644
--- a/paddle/function/PadOp.cpp
+++ b/paddle/function/PadOp.cpp
@@ -207,7 +207,7 @@ private:
 REGISTER_TYPED_FUNC(Pad, CPU, PadFunc);
 REGISTER_TYPED_FUNC(PadGrad, CPU, PadGradFunc);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 REGISTER_TYPED_FUNC(Pad, GPU, PadFunc);
 REGISTER_TYPED_FUNC(PadGrad, GPU, PadGradFunc);
 #endif
diff --git a/paddle/function/RowConvOp.cpp b/paddle/function/RowConvOp.cpp
index 09e702f71a..7c802d6627 100644
--- a/paddle/function/RowConvOp.cpp
+++ b/paddle/function/RowConvOp.cpp
@@ -217,7 +217,7 @@ public:
 REGISTER_TYPED_FUNC(RowConv, CPU, RowConvFunc);
 REGISTER_TYPED_FUNC(RowConvGrad, CPU, RowConvGradFunc);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 REGISTER_TYPED_FUNC(RowConv, GPU, RowConvFunc);
 REGISTER_TYPED_FUNC(RowConvGrad, GPU, RowConvGradFunc);
 #endif
diff --git a/paddle/function/SwitchOp.cpp b/paddle/function/SwitchOp.cpp
index db839b5b76..597723a2dd 100644
--- a/paddle/function/SwitchOp.cpp
+++ b/paddle/function/SwitchOp.cpp
@@ -132,7 +132,7 @@ public:
 REGISTER_TYPED_FUNC(NCHW2NHWC, CPU, NCHW2NHWCFunc);
 REGISTER_TYPED_FUNC(NHWC2NCHW, CPU, NHWC2NCHWFunc);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 REGISTER_TYPED_FUNC(NCHW2NHWC, GPU, NCHW2NHWCFunc);
 REGISTER_TYPED_FUNC(NHWC2NCHW, GPU, NHWC2NCHWFunc);
 #endif
diff --git a/paddle/gserver/layers/BatchNormBaseLayer.cpp b/paddle/gserver/layers/BatchNormBaseLayer.cpp
index 55f52816ab..bc7d1c83a4 100644
--- a/paddle/gserver/layers/BatchNormBaseLayer.cpp
+++ b/paddle/gserver/layers/BatchNormBaseLayer.cpp
@@ -16,7 +16,7 @@ limitations under the License. */
 #include "BatchNormalizationLayer.h"
 #include "Layer.h"
 #include "paddle/utils/Stat.h"
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include "CudnnBatchNormLayer.h"
 #endif
diff --git a/paddle/gserver/layers/BatchNormalizationLayer.cpp b/paddle/gserver/layers/BatchNormalizationLayer.cpp
index 33cf24431d..dacff25e59 100644
--- a/paddle/gserver/layers/BatchNormalizationLayer.cpp
+++ b/paddle/gserver/layers/BatchNormalizationLayer.cpp
@@ -13,7 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License.
*/
 #include "paddle/utils/Stat.h"
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include "hl_batch_transpose.h"
 #endif
 #include "BatchNormalizationLayer.h"
@@ -90,7 +90,7 @@ void BatchNormalizationLayer::expandMat(const MatrixPtr& in, MatrixPtr& out) {
   size_t batchSize = in->getHeight();
   CHECK_EQ(out->getHeight(), batchSize * imgPixels_);
   if (useGpu_) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
     LOG(FATAL) << "paddle is compiled only for cpu";
 #else
     batchTranspose(
@@ -127,7 +127,7 @@ void BatchNormalizationLayer::shrinkMat(const MatrixPtr& in, MatrixPtr& out) {
   }
   CHECK_EQ(in->getHeight(), static_cast(batchSize * imgPixels_));
   if (useGpu_) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
     LOG(FATAL) << "paddle is compiled only for cpu";
 #else
     batchTranspose(
diff --git a/paddle/gserver/layers/PoolLayer.cpp b/paddle/gserver/layers/PoolLayer.cpp
index 43ab4e4d47..7b932d5a76 100644
--- a/paddle/gserver/layers/PoolLayer.cpp
+++ b/paddle/gserver/layers/PoolLayer.cpp
@@ -15,7 +15,7 @@ limitations under the License.
*/
 #include "PoolLayer.h"
 #include "PoolProjectionLayer.h"
 #include "paddle/utils/Logging.h"
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include "CudnnPoolLayer.h"
 #endif
 namespace paddle {
@@ -53,7 +53,7 @@ Layer* PoolLayer::create(const LayerConfig& config) {
   const std::string& pool = config.inputs(0).pool_conf().pool_type();
   if (pool == "max-projection" || pool == "avg-projection") {
     return new PoolProjectionLayer(config);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   } else if (CudnnPoolLayer::typeCheck(pool)) {
     return new CudnnPoolLayer(config);
 #endif
diff --git a/paddle/gserver/tests/LayerGradUtil.cpp b/paddle/gserver/tests/LayerGradUtil.cpp
index 59df057a80..cd957c7c0b 100644
--- a/paddle/gserver/tests/LayerGradUtil.cpp
+++ b/paddle/gserver/tests/LayerGradUtil.cpp
@@ -674,7 +674,7 @@ void testLayerGradKernel(TestConfig testConf,
                          bool useGpu,
                          bool useWeight,
                          float epsilon) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
   if (useGpu) return;
 #endif
   FLAGS_use_gpu = useGpu;
diff --git a/paddle/gserver/tests/test_BatchNorm.cpp b/paddle/gserver/tests/test_BatchNorm.cpp
index c1c85f8fac..050fde9d0a 100644
--- a/paddle/gserver/tests/test_BatchNorm.cpp
+++ b/paddle/gserver/tests/test_BatchNorm.cpp
@@ -119,7 +119,7 @@ TEST(Layer, batchNorm) {
   CHECK_EQ(static_cast(convLayer->getOutputValue()->getWidth()), 576);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 void batchNormInference(int n, int c, int h, int w) {
   MatrixPtr input = std::make_shared(n, c * h * w);
   MatrixPtr cudnnOut = std::make_shared(n, c * h * w);
diff --git a/paddle/gserver/tests/test_ConvUnify.cpp b/paddle/gserver/tests/test_ConvUnify.cpp
index 16556469cb..ffcc47e2a8 100644
--- a/paddle/gserver/tests/test_ConvUnify.cpp
+++ b/paddle/gserver/tests/test_ConvUnify.cpp
@@ -117,7 +117,7 @@ MatrixPtr doOneConvTest(size_t imgSize,
 }
 TEST(Layer, convParaUnified) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   MatrixPtr input, resultCpu, resultGpu;
   /// TEST1 for conv ///
diff --git
a/paddle/gserver/tests/test_DetectionOutput.cpp b/paddle/gserver/tests/test_DetectionOutput.cpp
index 1a83f48fae..dc39c97a87 100644
--- a/paddle/gserver/tests/test_DetectionOutput.cpp
+++ b/paddle/gserver/tests/test_DetectionOutput.cpp
@@ -150,7 +150,7 @@ TEST(Layer, detectionOutputLayerFwd) {
       useGpu,
       result2);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   // GPU case 1.
   useGpu = true;
   inputLoc = Matrix::create(1, 16, false, useGpu);
diff --git a/paddle/gserver/tests/test_Evaluator.cpp b/paddle/gserver/tests/test_Evaluator.cpp
index 42bb570572..62a131171f 100644
--- a/paddle/gserver/tests/test_Evaluator.cpp
+++ b/paddle/gserver/tests/test_Evaluator.cpp
@@ -51,7 +51,7 @@ void testEvaluator(TestConfig testConf,
                    string testEvaluatorName,
                    size_t batchSize,
                    bool useGpu) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
   if (useGpu) return;
 #endif
   FLAGS_use_gpu = useGpu;
diff --git a/paddle/gserver/tests/test_KmaxSeqScore.cpp b/paddle/gserver/tests/test_KmaxSeqScore.cpp
index 1594de8502..6386259882 100644
--- a/paddle/gserver/tests/test_KmaxSeqScore.cpp
+++ b/paddle/gserver/tests/test_KmaxSeqScore.cpp
@@ -97,7 +97,7 @@ TEST(Layer, kmaxSeqScoreLayer) {
       Matrix::create(subSeqStartPosition.back(), 1, false, false);
   std::vector mode = {false};
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   mode.push_back(true);
 #endif
diff --git a/paddle/gserver/tests/test_LayerGrad.cpp b/paddle/gserver/tests/test_LayerGrad.cpp
index e887dee5f9..90a3352898 100644
--- a/paddle/gserver/tests/test_LayerGrad.cpp
+++ b/paddle/gserver/tests/test_LayerGrad.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
*/
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include
 #endif
 #include
@@ -258,7 +258,7 @@ void testProjectionConv(size_t groups, bool isDeconv) {
       true);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(Projection, conv) {
   /// test ConvProjection
   testProjectionConv(1, false);
@@ -422,7 +422,7 @@ TEST(Layer, depthwiseConvLayer) {
   // 'depthwise_conv' is a sepecial case of 'exconv' whose
   // groups size equals to the input channels size.
   testDepthwiseConvLayer("exconv", /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testDepthwiseConvLayer("exconv", /* useGpu= */ true);
 #endif
 }
@@ -480,7 +480,7 @@ void testConvLayer(const string& type, bool trans, bool useGpu) {
 TEST(Layer, convLayer) {
   testConvLayer("exconv", /* trans= */ false, /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testConvLayer("exconv", /* trans= */ false, /* useGpu= */ true);
   testConvLayer("cudnn_conv", /* trans= */ false, /* useGpu= */ true);
 #endif
@@ -525,7 +525,7 @@ TEST(Layer, convTransLayer) {
   for (auto useGpu : {false, true}) {
     testConvTransLayer("exconvt", /* trans= */ false, /* useGpu= */ useGpu);
   }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testConvTransLayer("cudnn_convt", /* trans= */ false, /* useGpu= */ true);
 #endif
 }
@@ -638,7 +638,7 @@ TEST(Layer, SelectiveFullyConnectedLayer) {
                 /* trans= */ false,
                 /* useGup= */ false,
                 false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testLayerGrad(config,
                 "selective_fc",
                 100,
@@ -1210,7 +1210,7 @@ void testPoolLayer(const string& poolType, bool trans, bool useGpu) {
   testLayerGrad(config, "pool", 100, trans, useGpu);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 void testPoolLayer2(const string& poolType, bool trans, bool useGpu) {
   TestConfig config;
   config.inputDefs.push_back({INPUT_DATA, "layer_0", 3200, 0});
@@ -1236,7 +1236,7 @@ TEST(Layer, PoolLayer) {
   testPoolLayer("avg-projection", /* trans= */ false, /* useGpu= */ false);
   testPoolLayer("max-projection", /* trans= */
false, /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testPoolLayer("avg-projection", /* trans= */ false, /* useGpu= */ true);
   testPoolLayer("max-projection", /* trans= */ false, /* useGpu= */ true);
   testPoolLayer("cudnn-max-pool", /* trans= */ false, /* useGpu= */ true);
@@ -1309,7 +1309,7 @@ void testPool3DLayer(const string& poolType, bool trans, bool useGpu) {
 TEST(Layer, Pool3DLayer) {
   testPool3DLayer("avg", /* trans= */ false, /* useGpu= */ false);
   testPool3DLayer("max", /* trans= */ false, /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testPool3DLayer("avg", /* trans= */ false, /* useGpu= */ true);
   testPool3DLayer("max", /* trans= */ false, /* useGpu= */ true);
 #endif
@@ -1695,7 +1695,7 @@ void testBatchNormLayer(const string& type, bool trans, bool useGpu) {
 TEST(Layer, BatchNormalizationLayer) {
   testBatchNormLayer("batch_norm", false, false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testBatchNormLayer("batch_norm", false, true);
   if (hl_get_cudnn_lib_version() >= int(4000)) {
     testBatchNormLayer("cudnn_batch_norm", false, true);
@@ -1744,7 +1744,7 @@ void testBatchNorm3DLayer(const string& type, bool trans, bool useGpu) {
 TEST(Layer, testBatchNorm3DLayer) {
   testBatchNorm3DLayer("batch_norm", false, false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testBatchNorm3DLayer("batch_norm", false, true);
   if (hl_get_cudnn_lib_version() >= int(4000)) {
     testBatchNorm3DLayer("cudnn_batch_norm", false, true);
@@ -2262,7 +2262,7 @@ void test3DConvLayer(const string& type, bool trans, bool useGpu) {
 TEST(Layer, test3DConvLayer) {
   test3DConvLayer("conv3d", /* trans= */ false, /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   test3DConvLayer("conv3d", /* trans= */ false, /* useGpu= */ true);
 #endif
 }
@@ -2339,7 +2339,7 @@ void test3DDeConvLayer(const string& type, bool trans, bool useGpu) {
 TEST(Layer, test3DDeConvLayer) {
   test3DDeConvLayer("deconv3d", /* trans= */ false, /* useGpu= */ false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   test3DDeConvLayer("deconv3d", /* trans= */ false, /* useGpu= */ true);
 #endif
 }
diff --git a/paddle/gserver/tests/test_NetworkCompare.cpp b/paddle/gserver/tests/test_NetworkCompare.cpp
index e322fef9a4..2b92211936 100644
--- a/paddle/gserver/tests/test_NetworkCompare.cpp
+++ b/paddle/gserver/tests/test_NetworkCompare.cpp
@@ -243,7 +243,7 @@ TEST(Compare, concat_slice) {
   compareNetwork(config_file_a, config_file_b);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(Compare, img_pool) {
   std::string config_file_a = "./gserver/tests/img_pool_a.conf";
   std::string config_file_b = "./gserver/tests/img_pool_b.conf";
diff --git a/paddle/gserver/tests/test_PriorBox.cpp b/paddle/gserver/tests/test_PriorBox.cpp
index cbc0fff7b8..8dc5568784 100644
--- a/paddle/gserver/tests/test_PriorBox.cpp
+++ b/paddle/gserver/tests/test_PriorBox.cpp
@@ -151,7 +151,7 @@ TEST(Layer, priorBoxLayerFwd) {
       useGpu,
       result);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   // reset the input parameters
   variance[1] = 0.1;
   variance[3] = 0.2;
diff --git a/paddle/gserver/tests/test_ProtoDataProvider.cpp b/paddle/gserver/tests/test_ProtoDataProvider.cpp
index 988dbc2513..af6472619d 100644
--- a/paddle/gserver/tests/test_ProtoDataProvider.cpp
+++ b/paddle/gserver/tests/test_ProtoDataProvider.cpp
@@ -485,7 +485,7 @@ TEST(ProtoDataProvider, test) {
           // Currently in async mode, useGpu is not supported
           continue;
         }
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
         if (useGpu) {
           continue;
         }
@@ -525,7 +525,7 @@ TEST(ProtoDataProvider, constant_slots) {
   for (int numConstantSlots : {1, 2}) {
     for (int useGpu : numTwoArray) {
       for (int dataCompression : numTwoArray) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
         if (useGpu) {
           continue;
         }
@@ -708,7 +708,7 @@ TEST(ProtoSequenceDataProvider, test) {
           // Currently in async mode, useGpu is not supported
           continue;
         }
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
         if (useGpu) {
           continue;
         }
diff --git
a/paddle/gserver/tests/test_PyDataProvider.cpp b/paddle/gserver/tests/test_PyDataProvider.cpp
index f6522febf8..fe54799259 100644
--- a/paddle/gserver/tests/test_PyDataProvider.cpp
+++ b/paddle/gserver/tests/test_PyDataProvider.cpp
@@ -37,7 +37,7 @@ TEST(PyDataProvider, py_fill_slots) {
   config.clear_files();
   std::string dataFile = "gserver/tests/pyDataProvider/pyDataProviderList";
   config.set_files(dataFile);
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
   bool useGpu = false;
 #else
   bool useGpu = true;
@@ -71,7 +71,7 @@ TEST(PyDataProvider, py_fill_nest_slots) {
   std::string dataFile = "gserver/tests/pyDataProvider/pyDataProviderList";
   config.set_files(dataFile);
   EXPECT_EQ(config.IsInitialized(), true);
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
   bool useGpu = false;
 #else
   bool useGpu = true;
diff --git a/paddle/gserver/tests/test_SelectiveFCLayer.cpp b/paddle/gserver/tests/test_SelectiveFCLayer.cpp
index b25d32fb2c..4c87fe1bba 100644
--- a/paddle/gserver/tests/test_SelectiveFCLayer.cpp
+++ b/paddle/gserver/tests/test_SelectiveFCLayer.cpp
@@ -321,7 +321,7 @@ TEST(Layer, SelectiveFcLayer_train_dense_mul) {
       "filelist=gserver/tests/SelectiveFcTest/dense_mul_list";
   for (auto useGpu : {false, true}) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
     if (useGpu) {
       break;
     }
@@ -388,7 +388,7 @@ void testSelectiveFcLayerTrainSparseMul(const LayerConfig& config,
       outMatSelfc->getWidth(),
       outMatSelfc->getElementCnt()));
   cpuOutMatSelfc->copyFrom(*outMatSelfc, HPPL_STREAM_DEFAULT);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   if (useGpu) {
     hl_stream_synchronize(HPPL_STREAM_DEFAULT);
   }
@@ -418,7 +418,7 @@ void testSelectiveFcLayerTrainSparseMul(const LayerConfig& config,
   MatrixPtr cpuOutMatFc(
       new CpuMatrix(outMatFc->getHeight(), outMatFc->getWidth()));
   cpuOutMatFc->copyFrom(*outMatFc, HPPL_STREAM_DEFAULT);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   if (useGpu) {
     hl_stream_synchronize(HPPL_STREAM_DEFAULT);
   }
@@ -443,7 +443,7 @@ TEST(Layer,
SelectiveFcLayer_train_sparse_mul) {
   selLayerConfig.set_size(fcLayerWidth);
   testSelectiveFcLayerTrainSparseMul(selLayerConfig, false);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testSelectiveFcLayerTrainSparseMul(selLayerConfig, true);
 #endif
 }
diff --git a/paddle/gserver/tests/test_SeqSliceLayerGrad.cpp b/paddle/gserver/tests/test_SeqSliceLayerGrad.cpp
index f28149081b..3366002ca1 100644
--- a/paddle/gserver/tests/test_SeqSliceLayerGrad.cpp
+++ b/paddle/gserver/tests/test_SeqSliceLayerGrad.cpp
@@ -195,7 +195,7 @@ TEST(Layer, SeqSliceLayer) {
   vector> ends;
   std::vector mode = {false};
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   mode.push_back(true);
 #endif
   genSeqInfo(seqStartPos, subSeqStartPos);
diff --git a/paddle/gserver/tests/test_WarpCTCLayer.cpp b/paddle/gserver/tests/test_WarpCTCLayer.cpp
index ae5b64257f..da82946006 100644
--- a/paddle/gserver/tests/test_WarpCTCLayer.cpp
+++ b/paddle/gserver/tests/test_WarpCTCLayer.cpp
@@ -199,7 +199,7 @@ TEST(Layer, WarpCTCLayer) {
     for (auto batchSize : {1, 10, 32}) {
       for (auto normByTimes : {false, true}) {
         for (auto useGpu : {false, true}) {
-#ifndef PADDLE_WITH_GPU
+#ifndef PADDLE_WITH_CUDA
           if (useGpu) continue;
 #endif
           LOG(INFO) << "layerSize=" << layerSize << " batchSize=" << batchSize
diff --git a/paddle/math/Matrix.cpp b/paddle/math/Matrix.cpp
index de02f9c0d5..c3e34d5309 100644
--- a/paddle/math/Matrix.cpp
+++ b/paddle/math/Matrix.cpp
@@ -670,7 +670,7 @@ void GpuMatrix::leftMul(Matrix& a, real scaleAB, real scaleT) {
 }
 void GpuMatrix::selectRows(Matrix& table, IVector& ids) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   CHECK(dynamic_cast(&table));
   CHECK(table.useGpu());
   CHECK(ids.useGpu());
@@ -694,7 +694,7 @@ void GpuMatrix::selectRows(Matrix& table, IVector& ids) {
 }
 void GpuMatrix::addToRows(Matrix& table, IVector& ids) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   CHECK(dynamic_cast(&table));
   CHECK(table.useGpu());
   CHECK(ids.useGpu());
@@ -741,7 +741,7 @@ void GpuMatrix::rowMax(Matrix& max) {
 }
 void GpuMatrix::rowMax(IVector& maxIds, Matrix& maxVal) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   CHECK(maxIds.useGpu() && maxVal.useGpu()) << "Matrix type are not equal";
   size_t numSamples = getHeight();
   size_t beam = maxVal.getWidth();
diff --git a/paddle/math/SparseMatrix.cpp b/paddle/math/SparseMatrix.cpp
index 1f31082ae8..284b68d590 100644
--- a/paddle/math/SparseMatrix.cpp
+++ b/paddle/math/SparseMatrix.cpp
@@ -836,7 +836,7 @@ void GpuSparseMatrix::zeroMem() {
 }
 void GpuSparseMatrix::rowMax(IVector& maxIds, Matrix& maxVal) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   CHECK(maxIds.useGpu() && maxVal.useGpu()) << "Matrix type are not equal";
   size_t numSamples = getHeight();
   size_t beam = maxVal.getWidth();
diff --git a/paddle/math/Vector.cpp b/paddle/math/Vector.cpp
index 54e57b255d..ff72672e3a 100644
--- a/paddle/math/Vector.cpp
+++ b/paddle/math/Vector.cpp
@@ -172,7 +172,7 @@ void GpuVectorT::isEqualTo(const VectorT& b, const T& value) {
 template
 void GpuVectorT::selectFrom(const VectorT& src, const VectorT& ids) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   hl_vector_select_from(this->getData(),
                         this->getSize(),
                         src.getData(),
@@ -850,7 +850,7 @@ CpuGpuVectorT::CpuGpuVectorT(CpuGpuVectorT& src,
                              size_t size)
     : sync_(nullptr) {
   CHECK_LE(offset + size, static_cast(src.getSize()));
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   SyncedFlag* flag = src.getSync();
   if (*flag == DATA_AT_CPU) {
     src.copyToGpu(); // will set synchronous data between CPU and GPU
@@ -861,7 +861,7 @@ CpuGpuVectorT::CpuGpuVectorT(CpuGpuVectorT& src,
   auto cMemHandle = (src.getVector(false))->getMemoryHandle();
   cpuVectorT_ = std::make_shared>(
       size, std::dynamic_pointer_cast(cMemHandle), offset);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   auto gMemHandle = (src.getVector(true))->getMemoryHandle();
   gpuVectorT_ = std::make_shared>(
       size, std::dynamic_pointer_cast(gMemHandle), offset);
diff --git a/paddle/math/tests/test_Allocator.cpp
b/paddle/math/tests/test_Allocator.cpp
index cf2f66aea1..1fecf659e5 100644
--- a/paddle/math/tests/test_Allocator.cpp
+++ b/paddle/math/tests/test_Allocator.cpp
@@ -68,7 +68,7 @@ void testPoolAllocator() {
 TEST(Allocator, Pool) {
   testPoolAllocator();
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testPoolAllocator();
 #endif
 }
@@ -92,7 +92,7 @@ TEST(MemoryHandle, Cpu) {
   EXPECT_EQ(ptr1, ptr2);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(MemoryHandle, Gpu) {
   int numGpu = hl_get_device_count();
diff --git a/paddle/math/tests/test_BaseMatrix.cpp b/paddle/math/tests/test_BaseMatrix.cpp
index 730759f3db..1766257860 100644
--- a/paddle/math/tests/test_BaseMatrix.cpp
+++ b/paddle/math/tests/test_BaseMatrix.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 /**
  * This test file use autotest::AutoCompare and cmpWithoutArg to compares the
  * implementation of CPU and GPU member function in
diff --git a/paddle/math/tests/test_CpuGpuVector.cpp b/paddle/math/tests/test_CpuGpuVector.cpp
index ccb4a902b0..c72f89c824 100644
--- a/paddle/math/tests/test_CpuGpuVector.cpp
+++ b/paddle/math/tests/test_CpuGpuVector.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
*/
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include
 #include "paddle/math/Vector.h"
diff --git a/paddle/math/tests/test_ExecViaCpu.cpp b/paddle/math/tests/test_ExecViaCpu.cpp
index 2d439cd060..25e0ba11de 100644
--- a/paddle/math/tests/test_ExecViaCpu.cpp
+++ b/paddle/math/tests/test_ExecViaCpu.cpp
@@ -94,7 +94,7 @@ void testWrapper(F&& f) {
   }
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(ExecViaCpu, test1) {
   testWrapper(f);
   testWrapper(&f);
diff --git a/paddle/math/tests/test_GpuProfiler.cpp b/paddle/math/tests/test_GpuProfiler.cpp
index 6dab187e3e..9402bd3ec4 100644
--- a/paddle/math/tests/test_GpuProfiler.cpp
+++ b/paddle/math/tests/test_GpuProfiler.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include
 #include "paddle/math/Matrix.h"
diff --git a/paddle/math/tests/test_Matrix.cpp b/paddle/math/tests/test_Matrix.cpp
index 7a145eae6a..2f99fa3581 100644
--- a/paddle/math/tests/test_Matrix.cpp
+++ b/paddle/math/tests/test_Matrix.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 /**
  * This test file use autotest::AutoCompare and cmpWithArg to compares the
  * implementation of CPU and GPU member function in Matrix.cpp.
diff --git a/paddle/math/tests/test_SparseMatrix.cpp b/paddle/math/tests/test_SparseMatrix.cpp
index 8151dde106..8abbe8d82e 100644
--- a/paddle/math/tests/test_SparseMatrix.cpp
+++ b/paddle/math/tests/test_SparseMatrix.cpp
@@ -47,7 +47,7 @@ struct MatrixPara {
   SparseFormat format;
 };
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 void test_sparse_matrix_mul(MatrixPara paraA,
                             MatrixPara paraB,
                             MatrixPara paraC) {
@@ -452,7 +452,7 @@ TEST(Matrix, SparseMatrixCSRFormatTrimFrom) {
   matB->trimFrom(*mat);
   checkSMatrixEqual2(matA, matB);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   GpuSparseMatrixPtr matC = std::make_shared(
       height, trimedWidth, height, FLOAT_VALUE, SPARSE_CSR, true);
   matC->trimFrom(*mat);
@@ -546,7 +546,7 @@ TEST(Matrix, SparseMatrixCSCFormatTrimFrom) {
   matB->trimFrom(*mat);
   checkSMatrixEqual2(matA, matB);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   GpuSparseMatrixPtr matC = std::make_shared(
       height, trimedWidth, height, FLOAT_VALUE, SPARSE_CSC, true);
   matC->trimFrom(*mat);
diff --git a/paddle/math/tests/test_TrainingAlgorithm.cpp b/paddle/math/tests/test_TrainingAlgorithm.cpp
index 36ac024007..5ae0aa036f 100644
--- a/paddle/math/tests/test_TrainingAlgorithm.cpp
+++ b/paddle/math/tests/test_TrainingAlgorithm.cpp
@@ -91,7 +91,7 @@ int VectorCheckErr(const VectorPtr& vector1, const VectorPtr& vector2) {
 typedef std::function testMatrixFunc;
 void testCase(testMatrixFunc matrixFunc) {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   for (auto useGpu : {false, true}) {
 #else
   for (auto useGpu : {false}) {
diff --git a/paddle/math/tests/test_batchTranspose.cpp b/paddle/math/tests/test_batchTranspose.cpp
index 0189e534eb..b70a619764 100644
--- a/paddle/math/tests/test_batchTranspose.cpp
+++ b/paddle/math/tests/test_batchTranspose.cpp
@@ -17,7 +17,7 @@ limitations under the License.
*/
 using namespace paddle; // NOLINT
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(MatrixBatchTransTest, test_batch_matrix_transpose) {
   const int nx = 100;
   const int ny = 50;
diff --git a/paddle/math/tests/test_matrixCompare.cpp b/paddle/math/tests/test_matrixCompare.cpp
index 7735877ac8..7e5a1db44a 100644
--- a/paddle/math/tests/test_matrixCompare.cpp
+++ b/paddle/math/tests/test_matrixCompare.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 /// This unittest checks GpuMatrix/CpuMatrix get same result, so disable when
 /// only cpu version.
diff --git a/paddle/math/tests/test_perturbation.cpp b/paddle/math/tests/test_perturbation.cpp
index dff18136ae..c7c07c817a 100644
--- a/paddle/math/tests/test_perturbation.cpp
+++ b/paddle/math/tests/test_perturbation.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 #include
 #include
diff --git a/paddle/math/tests/test_sparseMatrixCompare.cpp b/paddle/math/tests/test_sparseMatrixCompare.cpp
index e39cc0a2f6..2b2a391b9d 100644
--- a/paddle/math/tests/test_sparseMatrixCompare.cpp
+++ b/paddle/math/tests/test_sparseMatrixCompare.cpp
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 /// This unittest checks GpuSparseMatrix/CpuSparseMatrix get same result,
 // so disable when
 /// only cpu version.
diff --git a/paddle/memory/detail/buddy_allocator.cc b/paddle/memory/detail/buddy_allocator.cc
index ed0c3374ff..fdc5ed19dc 100644
--- a/paddle/memory/detail/buddy_allocator.cc
+++ b/paddle/memory/detail/buddy_allocator.cc
@@ -175,7 +175,7 @@ void* BuddyAllocator::SystemAlloc(size_t size) {
 }
 BuddyAllocator::PoolSet::iterator BuddyAllocator::RefillPool() {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   if (system_allocator_->UseGpu()) {
     if ((total_used_ + total_free_) == 0) {
       // Compute the maximum allocation size for the first allocation.
diff --git a/paddle/memory/detail/system_allocator.cc b/paddle/memory/detail/system_allocator.cc
index 64f8182b5c..6c9a46dd09 100644
--- a/paddle/memory/detail/system_allocator.cc
+++ b/paddle/memory/detail/system_allocator.cc
@@ -62,7 +62,7 @@ void CPUAllocator::Free(void* p, size_t size, size_t index) {
 bool CPUAllocator::UseGpu() const { return false; }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 void* GPUAllocator::Alloc(size_t& index, size_t size) {
   // CUDA documentation doesn't explain if cudaMalloc returns nullptr
diff --git a/paddle/memory/detail/system_allocator.h b/paddle/memory/detail/system_allocator.h
index 6b1f40347b..ee9b012f91 100644
--- a/paddle/memory/detail/system_allocator.h
+++ b/paddle/memory/detail/system_allocator.h
@@ -40,7 +40,7 @@ class CPUAllocator : public SystemAllocator {
   virtual bool UseGpu() const;
 };
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 class GPUAllocator : public SystemAllocator {
  public:
   virtual void* Alloc(size_t& index, size_t size);
diff --git a/paddle/memory/detail/system_allocator_test.cc b/paddle/memory/detail/system_allocator_test.cc
index 57d5443d50..cd563844e7 100644
--- a/paddle/memory/detail/system_allocator_test.cc
+++ b/paddle/memory/detail/system_allocator_test.cc
@@ -56,7 +56,7 @@ TEST(CPUAllocator, LockMem) {
   TestAllocator(a, 0);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(GPUAllocator, Alloc) {
   paddle::memory::detail::GPUAllocator a;
   TestAllocator(a, 2048);
diff --git a/paddle/memory/memcpy.cc b/paddle/memory/memcpy.cc
index 184d0f8fa7..790420a8ab 100644
--- a/paddle/memory/memcpy.cc
+++ b/paddle/memory/memcpy.cc
@@ -26,7 +26,7 @@ void Copy(platform::CPUPlace, void* dst,
   std::memcpy(dst, src, num);
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 template <>
 void Copy(platform::CPUPlace dst_place,
           void* dst,
diff --git a/paddle/memory/memcpy.h b/paddle/memory/memcpy.h
index 7142831d43..0bccee58c3 100644
--- a/paddle/memory/memcpy.h
+++ b/paddle/memory/memcpy.h
@@ -33,7 +33,7 @@ namespace memory {
 template
 void Copy(DstPlace, void* dst, SrcPlace, const void* src, size_t num);
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 /**
  * \brief Copy memory from one place to another place.
diff --git a/paddle/memory/memory.cc b/paddle/memory/memory.cc
index 6d5a74dafe..355b6218d0 100644
--- a/paddle/memory/memory.cc
+++ b/paddle/memory/memory.cc
@@ -62,7 +62,7 @@ size_t Used(platform::CPUPlace place) {
   return GetCPUBuddyAllocator()->Used();
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
   using BuddyAllocVec = std::vector;
diff --git a/paddle/memory/memory_test.cc b/paddle/memory/memory_test.cc
index 7a617f04dc..0d402038a0 100644
--- a/paddle/memory/memory_test.cc
+++ b/paddle/memory/memory_test.cc
@@ -80,7 +80,7 @@ TEST(BuddyAllocator, CPUMultAlloc) {
   }
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 size_t align(size_t size, paddle::platform::GPUPlace place) {
   size += sizeof(paddle::memory::detail::Metadata);
diff --git a/paddle/operators/detail/strided_memcpy.h b/paddle/operators/detail/strided_memcpy.h
index 9f05a26322..068c82f399 100644
--- a/paddle/operators/detail/strided_memcpy.h
+++ b/paddle/operators/detail/strided_memcpy.h
@@ -34,7 +34,7 @@ struct StridedMemcpyFunctor {
       auto& cpu_place = boost::get(place);
       memory::Copy(cpu_place, dst, cpu_place, src, sizeof(T) * dst_dim.head);
     } else {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
       auto& gpu_place = boost::get(place);
       auto& cuda_ctx =
           reinterpret_cast(dev_ctx);
diff --git a/paddle/operators/math/im2col_test.cc b/paddle/operators/math/im2col_test.cc
index 3d040ca2b5..40bdbfe733 100644
--- a/paddle/operators/math/im2col_test.cc
+++ b/paddle/operators/math/im2col_test.cc
@@ -71,7 +71,7 @@ void testIm2col() {
     context =
         new paddle::platform::CPUDeviceContext(paddle::platform::CPUPlace());
   } else {
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
     context =
         new paddle::platform::CUDADeviceContext(paddle::platform::GPUPlace());
 #else
@@ -116,7 +116,7 @@ void testIm2col() {
 TEST(math, im2col) {
   testIm2col();
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
   testIm2col();
 #endif
 }
diff --git a/paddle/operators/math/math_function_test.cc b/paddle/operators/math/math_function_test.cc
index 2252268620..9945ba101d 100644
--- a/paddle/operators/math/math_function_test.cc
+++ b/paddle/operators/math/math_function_test.cc
@@ -1,7 +1,7 @@
 #include "paddle/operators/math/math_function.h"
 #include "gtest/gtest.h"
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(math_function, notrans_mul_trans) {
   paddle::framework::Tensor input1;
   paddle::framework::Tensor input1_gpu;
diff --git a/paddle/operators/strided_memcpy_test.cc b/paddle/operators/strided_memcpy_test.cc
index e0dd7b19f1..68f064eaee 100644
--- a/paddle/operators/strided_memcpy_test.cc
+++ b/paddle/operators/strided_memcpy_test.cc
@@ -72,7 +72,7 @@ TEST(StridedMemcpy, CPUConcat) {
   }
 }
-#ifdef PADDLE_WITH_GPU
+#ifdef PADDLE_WITH_CUDA
 TEST(StridedMemcpy, GPUCrop) {
   // clang-format off
   int src[] = {
@@ -157,4 +157,4 @@ TEST(StridedMemcpy, GPUConcat) {
 #endif
 } // namespace operators
-} // namespace paddle
\ No newline at end of file
+} // namespace paddle
diff --git a/paddle/platform/device_context.cc b/paddle/platform/device_context.cc
index 8dcc357a16..a9b6b79903 100644
--- a/paddle/platform/device_context.cc
+++ b/paddle/platform/device_context.cc
@@ -35,7 +35,7 @@ Eigen::DefaultDevice*
CPUDeviceContext::eigen_device() const { Place CPUDeviceContext::GetPlace() const { return CPUPlace(); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA template <> Eigen::GpuDevice* diff --git a/paddle/platform/device_context.h b/paddle/platform/device_context.h index c1c4c7f760..ef5f19214d 100644 --- a/paddle/platform/device_context.h +++ b/paddle/platform/device_context.h @@ -14,7 +14,7 @@ limitations under the License. */ #include "paddle/platform/enforce.h" #include "paddle/platform/place.h" -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA #include "paddle/platform/dynload/cublas.h" #include "paddle/platform/dynload/cudnn.h" #include "paddle/platform/gpu_info.h" @@ -61,7 +61,7 @@ class CPUDeviceContext : public DeviceContext { std::unique_ptr eigen_device_; }; -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA template <> struct EigenDeviceConverter { using EigenDeviceType = Eigen::GpuDevice; diff --git a/paddle/platform/enforce.h b/paddle/platform/enforce.h index f9fe521d50..15d8446cd8 100644 --- a/paddle/platform/enforce.h +++ b/paddle/platform/enforce.h @@ -29,7 +29,7 @@ limitations under the License. 
*/ #include // for __cxa_demangle #endif -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA #include "paddle/platform/dynload/cublas.h" #include "paddle/platform/dynload/cudnn.h" @@ -113,7 +113,7 @@ inline typename std::enable_if::type throw_on_error( } } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA template inline typename std::enable_if::type throw_on_error( diff --git a/paddle/platform/enforce_test.cc b/paddle/platform/enforce_test.cc index 80bdee3d9d..8206a055ea 100644 --- a/paddle/platform/enforce_test.cc +++ b/paddle/platform/enforce_test.cc @@ -213,4 +213,4 @@ TEST(ENFORCE_USER_DEFINED_CLASS, EQ) { TEST(ENFORCE_USER_DEFINED_CLASS, NE) { Dims a{{1, 2, 3, 4}}, b{{5, 6, 7, 8}}; ASSERT_THROW(PADDLE_ENFORCE_EQ(a, b), paddle::platform::EnforceNotMet); -} \ No newline at end of file +} diff --git a/paddle/platform/gpu_info.h b/paddle/platform/gpu_info.h index ac884386dd..e47c9b4a2a 100644 --- a/paddle/platform/gpu_info.h +++ b/paddle/platform/gpu_info.h @@ -14,7 +14,7 @@ limitations under the License. */ #pragma once -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA #include #include diff --git a/paddle/platform/variant.h b/paddle/platform/variant.h index 8145799dfd..619897ca19 100644 --- a/paddle/platform/variant.h +++ b/paddle/platform/variant.h @@ -16,7 +16,7 @@ #include -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA // Because boost's variadic templates has bug on nvcc, boost will disable // variadic template support when GPU enabled on nvcc. 
diff --git a/paddle/pserver/test/SocketTest.cpp b/paddle/pserver/test/SocketTest.cpp index 96724530f5..b43461d61b 100644 --- a/paddle/pserver/test/SocketTest.cpp +++ b/paddle/pserver/test/SocketTest.cpp @@ -215,7 +215,7 @@ int main(int argc, char** argv) { uint64_t dataSize = FLAGS_dim * sizeof(real); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA GpuVector gpuParam(FLAGS_dim); GpuVector gpuGrad(FLAGS_dim); #else diff --git a/paddle/pserver/test/test_ProtoServer.cpp b/paddle/pserver/test/test_ProtoServer.cpp index 74ab1f2f77..ad8ffed9c1 100644 --- a/paddle/pserver/test/test_ProtoServer.cpp +++ b/paddle/pserver/test/test_ProtoServer.cpp @@ -99,7 +99,7 @@ TEST(ProtoServer, regular) { } TEST(ProtoServer, extended) { -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA ProtoClient* client; if (FLAGS_rdma_tcp == "rdma") client = new ProtoClient(FLAGS_server_addr, FLAGS_port, F_RDMA); diff --git a/paddle/pybind/pybind.cc b/paddle/pybind/pybind.cc index 761d82fc4d..cff54b1741 100644 --- a/paddle/pybind/pybind.cc +++ b/paddle/pybind/pybind.cc @@ -34,7 +34,7 @@ static size_t UniqueIntegerGenerator() { } bool IsCompileGPU() { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA return false; #else return true; @@ -78,7 +78,7 @@ PYBIND11_PLUGIN(core) { .def("set", PyCPUTensorSetFromArray) .def("set", PyCPUTensorSetFromArray) .def("set", PyCPUTensorSetFromArray) -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA .def("set", PyCUDATensorSetFromArray) .def("set", PyCUDATensorSetFromArray) .def("set", PyCUDATensorSetFromArray) @@ -96,7 +96,7 @@ PYBIND11_PLUGIN(core) { .def( "__init__", [](LoDTensor &instance, const std::vector> &lod) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA new (&instance) LoDTensor(lod); #else LoD new_lod; @@ -107,7 +107,7 @@ PYBIND11_PLUGIN(core) { }) .def("set_lod", [](LoDTensor &self, const std::vector> &lod) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA self.set_lod(lod); #else LoD new_lod; @@ -117,7 +117,7 @@ PYBIND11_PLUGIN(core) { #endif }) 
.def("lod", [](LoDTensor &self) -> std::vector> { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA return self.lod(); #else auto lod = self.lod(); @@ -203,7 +203,7 @@ All parameter, weight, gradient are variables in Paddle. .def_static("create", [](paddle::platform::GPUPlace& place) -> paddle::platform::DeviceContext* { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA PADDLE_THROW("GPUPlace is not supported in CPU device."); #else return new paddle::platform::CUDADeviceContext(place); diff --git a/paddle/pybind/tensor_py.h b/paddle/pybind/tensor_py.h index 62e85fa54f..9e73f79cbd 100644 --- a/paddle/pybind/tensor_py.h +++ b/paddle/pybind/tensor_py.h @@ -106,7 +106,7 @@ void PyCPUTensorSetFromArray( std::memcpy(dst, array.data(), sizeof(T) * array.size()); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA template void PyCUDATensorSetFromArray( framework::Tensor &self, diff --git a/paddle/string/to_string_test.cc b/paddle/string/to_string_test.cc index 542c771a98..971484dd0c 100644 --- a/paddle/string/to_string_test.cc +++ b/paddle/string/to_string_test.cc @@ -36,4 +36,4 @@ TEST(to_string, user_defined) { using namespace paddle::string; UserDefinedClass instance; ASSERT_EQ(kOutputString, to_string(instance)); -} \ No newline at end of file +} diff --git a/paddle/trainer/MergeModel.cpp b/paddle/trainer/MergeModel.cpp index a37d53bc72..6c52eaf449 100644 --- a/paddle/trainer/MergeModel.cpp +++ b/paddle/trainer/MergeModel.cpp @@ -29,7 +29,7 @@ int main(int argc, char** argv) { initMain(argc, argv); initPython(argc, argv); string confFile = TrainerConfigHelper::getConfigNameFromPath(FLAGS_model_dir); -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA FLAGS_use_gpu = false; #endif auto config = std::make_shared(confFile); diff --git a/paddle/trainer/tests/test_Compare.cpp b/paddle/trainer/tests/test_Compare.cpp index b5d29da45a..f3a964acb6 100644 --- a/paddle/trainer/tests/test_Compare.cpp +++ b/paddle/trainer/tests/test_Compare.cpp @@ -146,7 +146,7 @@ void 
compareGradient(comData& comDataCpu, comData& comDataGpu) { } int main(int argc, char** argv) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA exit(0); #endif paddle::initMain(argc, argv); diff --git a/paddle/trainer/tests/test_CompareSparse.cpp b/paddle/trainer/tests/test_CompareSparse.cpp index 4da9ce20fb..5f1834bd73 100644 --- a/paddle/trainer/tests/test_CompareSparse.cpp +++ b/paddle/trainer/tests/test_CompareSparse.cpp @@ -174,7 +174,7 @@ TEST(compareSparse, multiGradientMachine) { FLAGS_local = local; FLAGS_ports_num_for_sparse = 5; for (bool useGpu : {false, true}) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA if (useGpu) continue; #endif FLAGS_parallel_nn = useGpu; @@ -198,7 +198,7 @@ TEST(compareSparse, NeuralNetwork) { FLAGS_local = local; FLAGS_ports_num_for_sparse = 5; for (bool useGpu : {false, true}) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA if (useGpu) continue; #endif FLAGS_parallel_nn = useGpu; diff --git a/paddle/trainer/tests/test_Trainer.cpp b/paddle/trainer/tests/test_Trainer.cpp index f69e1aafee..425b3d10a3 100644 --- a/paddle/trainer/tests/test_Trainer.cpp +++ b/paddle/trainer/tests/test_Trainer.cpp @@ -51,7 +51,7 @@ void checkGradientTest(const string& configFile, TEST(checkGradient, cpu) { checkGradientTest(configFile1, false, false); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST(checkGradient, gpu) { checkGradientTest(configFile1, true, false); } TEST(checkGradient, multiGpu) { @@ -97,7 +97,7 @@ TEST(checkGradient, hsigmoid) { checkGradientTest(configFile2, false, false); } TEST(checkGradient, chunk) { checkGradientTest(configFile3, false, false); -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA checkGradientTest(configFile3, true, true); #endif } diff --git a/paddle/trainer/tests/test_TrainerOnePass.cpp b/paddle/trainer/tests/test_TrainerOnePass.cpp index 4c4d124fa9..b2a93d4d5e 100644 --- a/paddle/trainer/tests/test_TrainerOnePass.cpp +++ b/paddle/trainer/tests/test_TrainerOnePass.cpp @@ -79,7 +79,7 @@ void 
trainerOnePassTest(const string& configFile, // 1. test trainer (cpu, gpu). TEST(trainerOnePass, cpu) { trainerOnePassTest(configFile1, false, false); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST(trainerOnePass, gpu) { trainerOnePassTest(configFile1, true, false); } TEST(trainerOnePass, gpu2) { trainerOnePassTest(configFile1, true, false, 2); } @@ -94,7 +94,7 @@ TEST(trainerOnePass, parallel) { #endif // 2. test average_window. -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST(average_window, gpu) { trainerOnePassTest(configFile1, true, false, 4, 0.01); } @@ -266,7 +266,7 @@ TEST(checkRemoteUpdater, cpuTrainerOldUpdater) { checkRemoteParameterUpdaterTest(configFile1, false, false, 1, true); } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST(checkRemoteUpdater, gpuTrainer) { checkRemoteParameterUpdaterTest(configFile1, true, false); } diff --git a/paddle/trainer/tests/test_recurrent_machine_generation.cpp b/paddle/trainer/tests/test_recurrent_machine_generation.cpp index 74b4fed7ed..a8fbe31c2b 100644 --- a/paddle/trainer/tests/test_recurrent_machine_generation.cpp +++ b/paddle/trainer/tests/test_recurrent_machine_generation.cpp @@ -113,7 +113,7 @@ void testGeneration(const string& configFile, #ifndef PADDLE_TYPE_DOUBLE TEST(RecurrentGradientMachine, test_generation) { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA const auto useGpuConfs = {false}; #else const auto useGpuConfs = {true, false}; diff --git a/paddle/utils/Flags.cpp b/paddle/utils/Flags.cpp index 32155ded35..8f100f02e9 100644 --- a/paddle/utils/Flags.cpp +++ b/paddle/utils/Flags.cpp @@ -14,7 +14,7 @@ limitations under the License. 
*/ #include "Flags.h" -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA DEFINE_bool(use_gpu, false, "Only support CPU training"); #else DEFINE_bool(use_gpu, true, "Whether to use GPU for training"); diff --git a/paddle/utils/Util.h b/paddle/utils/Util.h index 904d0f5061..9579881ea3 100644 --- a/paddle/utils/Util.h +++ b/paddle/utils/Util.h @@ -218,7 +218,7 @@ protected: * *d2* is peer device to enable direct access to by the d1 device. */ inline void enablePeerAccess(int d1, int d2) { -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA if (hl_device_can_access_peer(d1, d2)) { SetDevice dev(d1); hl_device_enable_peer_access(d2); diff --git a/paddle/utils/Version.h b/paddle/utils/Version.h index 611fda83d9..004d62451c 100644 --- a/paddle/utils/Version.h +++ b/paddle/utils/Version.h @@ -48,7 +48,7 @@ void printVersion(std::ostream& os); * @return return true if paddle compiled with GPU */ constexpr bool isWithGpu() { -#ifndef PADDLE_WITH_GPU +#ifndef PADDLE_WITH_CUDA return false; #else return true; From 2b204f048bf6599bdb9ba799769404dc5fd206a8 Mon Sep 17 00:00:00 2001 From: Yi Wang Date: Wed, 4 Oct 2017 14:09:19 -0700 Subject: [PATCH 08/21] Rename platform::GetDeviceCount into platform::GetCUDADeviceCount --- paddle/memory/memory.cc | 2 +- paddle/platform/device_context_test.cc | 4 ++-- paddle/platform/gpu_info.cc | 4 ++-- paddle/platform/gpu_info.h | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/paddle/memory/memory.cc b/paddle/memory/memory.cc index 6d5a74dafe..f816962890 100644 --- a/paddle/memory/memory.cc +++ b/paddle/memory/memory.cc @@ -77,7 +77,7 @@ BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) { // GPU buddy allocator initialization std::call_once(gpu_allocator_flag, [&]() { - int gpu_num = platform::GetDeviceCount(); + int gpu_num = platform::GetCUDADeviceCount(); allocators.reserve(gpu_num); for (int gpu = 0; gpu < gpu_num; gpu++) { platform::SetDeviceId(gpu); diff --git a/paddle/platform/device_context_test.cc 
b/paddle/platform/device_context_test.cc index f4b00c57de..8bf5174c4a 100644 --- a/paddle/platform/device_context_test.cc +++ b/paddle/platform/device_context_test.cc @@ -20,7 +20,7 @@ TEST(Device, Init) { using paddle::platform::CUDADeviceContext; using paddle::platform::GPUPlace; - int count = paddle::platform::GetDeviceCount(); + int count = paddle::platform::GetCUDADeviceCount(); for (int i = 0; i < count; i++) { DeviceContext* device_context = new CUDADeviceContext(GPUPlace(i)); Eigen::GpuDevice* gpu_device = @@ -34,7 +34,7 @@ TEST(Device, CUDADeviceContext) { using paddle::platform::CUDADeviceContext; using paddle::platform::GPUPlace; - int count = paddle::platform::GetDeviceCount(); + int count = paddle::platform::GetCUDADeviceCount(); for (int i = 0; i < count; i++) { CUDADeviceContext* device_context = new CUDADeviceContext(GPUPlace(i)); Eigen::GpuDevice* gpu_device = device_context->eigen_device(); diff --git a/paddle/platform/gpu_info.cc b/paddle/platform/gpu_info.cc index be381a4e26..70ad611d5d 100644 --- a/paddle/platform/gpu_info.cc +++ b/paddle/platform/gpu_info.cc @@ -26,11 +26,11 @@ DEFINE_double(fraction_of_gpu_memory_to_use, 0.95, namespace paddle { namespace platform { -int GetDeviceCount() { +int GetCUDADeviceCount() { int count; PADDLE_ENFORCE( cudaGetDeviceCount(&count), - "cudaGetDeviceCount failed in paddle::platform::GetDeviceCount"); + "cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCount"); return count; } diff --git a/paddle/platform/gpu_info.h b/paddle/platform/gpu_info.h index ac884386dd..276783bbe4 100644 --- a/paddle/platform/gpu_info.h +++ b/paddle/platform/gpu_info.h @@ -28,7 +28,7 @@ const std::string kEnvFractionGpuMemoryToUse = "PADDLE_FRACTION_GPU_MEMORY_TO_USE"; //! Get the total number of GPU devices in system. -int GetDeviceCount(); +int GetCUDADeviceCount(); //! Get the current GPU device id in system. 
int GetCurrentDeviceId(); From ee7b3ed09e699da191fe238ab409f32318637380 Mon Sep 17 00:00:00 2001 From: qiaolongfei Date: Wed, 4 Oct 2017 15:33:44 -0700 Subject: [PATCH 09/21] use EigenScalar to get learning_rate from GPU device --- paddle/operators/sgd_op.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/paddle/operators/sgd_op.h b/paddle/operators/sgd_op.h index d72d333a9a..954fd48272 100644 --- a/paddle/operators/sgd_op.h +++ b/paddle/operators/sgd_op.h @@ -23,6 +23,9 @@ using Tensor = framework::Tensor; template using EigenVector = framework::EigenVector; +template +using EigenScalar = framework::EigenScalar; template class SGDOpKernel : public framework::OpKernel { @@ -31,13 +34,14 @@ class SGDOpKernel : public framework::OpKernel { auto param = ctx.Input("Param"); auto grad = ctx.Input("Grad"); auto param_out = ctx.Output("ParamOut"); - float lr = ctx.Input("LearningRate")->data()[0]; + auto learning_rate = ctx.Input("LearningRate"); param_out->mutable_data(ctx.GetPlace()); auto p = EigenVector::Flatten(*param); auto g = EigenVector::Flatten(*grad); auto o = EigenVector::Flatten(*param_out); + auto lr = EigenScalar::From(*learning_rate); auto place = ctx.GetEigenDevice(); o.device(place) = p - lr * g; From 775c60246b66469e06f01a50c89b7b39594a3b63 Mon Sep 17 00:00:00 2001 From: qiaolongfei Date: Wed, 4 Oct 2017 16:53:21 -0700 Subject: [PATCH 10/21] remove using in sgd header file --- paddle/operators/sgd_op.h | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/paddle/operators/sgd_op.h b/paddle/operators/sgd_op.h index 954fd48272..b501d244d7 100644 --- a/paddle/operators/sgd_op.h +++ b/paddle/operators/sgd_op.h @@ -19,32 +19,25 @@ limitations under the License. 
*/ namespace paddle { namespace operators { -using Tensor = framework::Tensor; -template -using EigenVector = framework::EigenVector; -template -using EigenScalar = framework::EigenScalar; - template class SGDOpKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { - auto param = ctx.Input("Param"); - auto grad = ctx.Input("Grad"); - auto param_out = ctx.Output("ParamOut"); - auto learning_rate = ctx.Input("LearningRate"); + auto param = ctx.Input("Param"); + auto grad = ctx.Input("Grad"); + auto param_out = ctx.Output("ParamOut"); + auto learning_rate = ctx.Input("LearningRate"); param_out->mutable_data(ctx.GetPlace()); - auto p = EigenVector::Flatten(*param); - auto g = EigenVector::Flatten(*grad); - auto o = EigenVector::Flatten(*param_out); - auto lr = EigenScalar::From(*learning_rate); + auto p = framework::EigenVector::Flatten(*param); + auto g = framework::EigenVector::Flatten(*grad); + auto o = framework::EigenVector::Flatten(*param_out); + auto lr = framework::EigenVector::From(*learning_rate); auto place = ctx.GetEigenDevice(); - o.device(place) = p - lr * g; + Eigen::DSizes grad_dsize(grad->dims()[0], grad->dims()[1]); + o.device(place) = p - lr.broadcast(grad_dsize) * g; } }; From 8ebc31d9358c919fdd6f50d502f4ee071a91d38e Mon Sep 17 00:00:00 2001 From: qiaolongfei Date: Wed, 4 Oct 2017 17:13:02 -0700 Subject: [PATCH 11/21] optimize the dsize --- paddle/operators/sgd_op.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/paddle/operators/sgd_op.h b/paddle/operators/sgd_op.h index b501d244d7..26f4012f25 100644 --- a/paddle/operators/sgd_op.h +++ b/paddle/operators/sgd_op.h @@ -33,10 +33,10 @@ class SGDOpKernel : public framework::OpKernel { auto p = framework::EigenVector::Flatten(*param); auto g = framework::EigenVector::Flatten(*grad); auto o = framework::EigenVector::Flatten(*param_out); - auto lr = framework::EigenVector::From(*learning_rate); + auto lr = 
framework::EigenVector::Flatten(*learning_rate); auto place = ctx.GetEigenDevice(); - Eigen::DSizes grad_dsize(grad->dims()[0], grad->dims()[1]); + Eigen::DSizes grad_dsize(grad->numel()); o.device(place) = p - lr.broadcast(grad_dsize) * g; } }; From 46530f9e666012dd83c8763563f7acae8f5aad30 Mon Sep 17 00:00:00 2001 From: Kavya Srinet Date: Wed, 4 Oct 2017 19:02:22 -0700 Subject: [PATCH 12/21] Added Leaky Relu activation --- paddle/operators/activation_op.cc | 19 ++++++++++++ paddle/operators/activation_op.h | 30 ++++++++++++++++++- .../v2/framework/tests/test_activation_op.py | 17 +++++++++++ 3 files changed, 65 insertions(+), 1 deletion(-) diff --git a/paddle/operators/activation_op.cc b/paddle/operators/activation_op.cc index 7ae4d2f6b6..5f2ecc2673 100644 --- a/paddle/operators/activation_op.cc +++ b/paddle/operators/activation_op.cc @@ -69,6 +69,22 @@ class ReluOpMaker : public framework::OpProtoAndCheckerMaker { } }; +template +class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker { + public: + LeakyReluOpMaker(framework::OpProto *proto, + framework::OpAttrChecker *op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("X", "Input of LeakyRelu operator"); + AddOutput("Y", "Output of LeakyRelu operator"); + AddComment( + "LeakyRelu activation operator, " + "leaky_relu = max(x, alpha * x)"); + AddAttr("alpha", "The small negative slope") + .SetDefault(static_cast(0.02f)); + } +}; + class TanhOpMaker : public framework::OpProtoAndCheckerMaker { public: TanhOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) @@ -240,6 +256,9 @@ REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad, REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker, brelu_grad, ops::ActivationOpGrad); +REGISTER_OP(leaky_relu, ops::ActivationOp, ops::LeakyReluOpMaker, + leaky_relu_grad, ops::ActivationOpGrad); + REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker, soft_relu_grad, ops::ActivationOpGrad); diff 
--git a/paddle/operators/activation_op.h b/paddle/operators/activation_op.h index ff35c2d97e..dae66cc77d 100644 --- a/paddle/operators/activation_op.h +++ b/paddle/operators/activation_op.h @@ -309,6 +309,33 @@ struct SoftReluGradFunctor : public BaseActivationFunctor { } }; +template +struct LeakyReluFunctor : public BaseActivationFunctor { + float alpha; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"alpha", &alpha}}; + } + + template + void operator()(Device d, X x, Y y) const { + y.device(d) = x.cwiseMax(alpha * x); + } +}; + +template +struct LeakyReluGradFunctor : public BaseActivationFunctor { + float alpha; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"alpha", &alpha}}; + } + template + void operator()(Device d, X x, Y y, dY dy, dX dx) const { + auto temp1 = alpha * (x < static_cast(0)).template cast().eval(); + auto temp2 = (x >= static_cast(0)).template cast().eval(); + dx.device(d) = dy * (temp1 + temp2).template cast(); + } +}; + template struct PowFunctor : public BaseActivationFunctor { float factor; @@ -379,4 +406,5 @@ struct STanhGradFunctor : public BaseActivationFunctor { __macro(soft_relu, SoftReluFunctor, SoftReluGradFunctor); \ __macro(pow, PowFunctor, PowGradFunctor); \ __macro(stanh, STanhFunctor, STanhGradFunctor); \ - __macro(softsign, SoftsignFunctor, SoftsignGradFunctor) + __macro(softsign, SoftsignFunctor, SoftsignGradFunctor); \ + __macro(leaky_relu, LeakyReluFunctor, LeakyReluGradFunctor) diff --git a/python/paddle/v2/framework/tests/test_activation_op.py b/python/paddle/v2/framework/tests/test_activation_op.py index c44eb84906..ce6dec7748 100644 --- a/python/paddle/v2/framework/tests/test_activation_op.py +++ b/python/paddle/v2/framework/tests/test_activation_op.py @@ -122,6 +122,23 @@ class TestBRelu(OpTest): self.check_grad(['X'], 'Y', max_relative_error=0.02) +class TestLeakyRelu(OpTest): + def setUp(self): + self.op_type = "leaky_relu" + alpha = 0.02 + self.attrs = {'alpha': alpha} + 
self.inputs = {'X': np.random.uniform(-3, 3, [4, 4]).astype("float32")} + self.outputs = { + 'Y': np.maximum(self.inputs['X'], alpha * self.inputs['X']) + } + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(['X'], 'Y', max_relative_error=0.008) + + class TestSoftRelu(OpTest): def setUp(self): self.op_type = "soft_relu" From 47c994be07d608471734d5d14f980d83f2e0a7a6 Mon Sep 17 00:00:00 2001 From: Kavya Srinet Date: Thu, 5 Oct 2017 08:50:51 -0700 Subject: [PATCH 13/21] Updated the relative error --- python/paddle/v2/framework/tests/test_activation_op.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/paddle/v2/framework/tests/test_activation_op.py b/python/paddle/v2/framework/tests/test_activation_op.py index ce6dec7748..f232996a55 100644 --- a/python/paddle/v2/framework/tests/test_activation_op.py +++ b/python/paddle/v2/framework/tests/test_activation_op.py @@ -136,7 +136,7 @@ class TestLeakyRelu(OpTest): self.check_output() def test_check_grad(self): - self.check_grad(['X'], 'Y', max_relative_error=0.008) + self.check_grad(['X'], 'Y', max_relative_error=0.007) class TestSoftRelu(OpTest): From 60af56c1b8e60240238d877c093bf9c99706fefe Mon Sep 17 00:00:00 2001 From: Kavya Srinet Date: Wed, 4 Oct 2017 19:02:22 -0700 Subject: [PATCH 14/21] Added Leaky Relu activation --- paddle/operators/activation_op.cc | 19 ++++++++++++ paddle/operators/activation_op.h | 30 ++++++++++++++++++- .../v2/framework/tests/test_activation_op.py | 17 +++++++++++ 3 files changed, 65 insertions(+), 1 deletion(-) diff --git a/paddle/operators/activation_op.cc b/paddle/operators/activation_op.cc index 7ae4d2f6b6..5f2ecc2673 100644 --- a/paddle/operators/activation_op.cc +++ b/paddle/operators/activation_op.cc @@ -69,6 +69,22 @@ class ReluOpMaker : public framework::OpProtoAndCheckerMaker { } }; +template +class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker { + public: + LeakyReluOpMaker(framework::OpProto 
*proto, + framework::OpAttrChecker *op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("X", "Input of LeakyRelu operator"); + AddOutput("Y", "Output of LeakyRelu operator"); + AddComment( + "LeakyRelu activation operator, " + "leaky_relu = max(x, alpha * x)"); + AddAttr("alpha", "The small negative slope") + .SetDefault(static_cast(0.02f)); + } +}; + class TanhOpMaker : public framework::OpProtoAndCheckerMaker { public: TanhOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) @@ -240,6 +256,9 @@ REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad, REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker, brelu_grad, ops::ActivationOpGrad); +REGISTER_OP(leaky_relu, ops::ActivationOp, ops::LeakyReluOpMaker, + leaky_relu_grad, ops::ActivationOpGrad); + REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker, soft_relu_grad, ops::ActivationOpGrad); diff --git a/paddle/operators/activation_op.h b/paddle/operators/activation_op.h index ff35c2d97e..dae66cc77d 100644 --- a/paddle/operators/activation_op.h +++ b/paddle/operators/activation_op.h @@ -309,6 +309,33 @@ struct SoftReluGradFunctor : public BaseActivationFunctor { } }; +template +struct LeakyReluFunctor : public BaseActivationFunctor { + float alpha; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"alpha", &alpha}}; + } + + template + void operator()(Device d, X x, Y y) const { + y.device(d) = x.cwiseMax(alpha * x); + } +}; + +template +struct LeakyReluGradFunctor : public BaseActivationFunctor { + float alpha; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"alpha", &alpha}}; + } + template + void operator()(Device d, X x, Y y, dY dy, dX dx) const { + auto temp1 = alpha * (x < static_cast(0)).template cast().eval(); + auto temp2 = (x >= static_cast(0)).template cast().eval(); + dx.device(d) = dy * (temp1 + temp2).template cast(); + } +}; + template struct PowFunctor : public BaseActivationFunctor { float 
factor; @@ -379,4 +406,5 @@ struct STanhGradFunctor : public BaseActivationFunctor { __macro(soft_relu, SoftReluFunctor, SoftReluGradFunctor); \ __macro(pow, PowFunctor, PowGradFunctor); \ __macro(stanh, STanhFunctor, STanhGradFunctor); \ - __macro(softsign, SoftsignFunctor, SoftsignGradFunctor) + __macro(softsign, SoftsignFunctor, SoftsignGradFunctor); \ + __macro(leaky_relu, LeakyReluFunctor, LeakyReluGradFunctor) diff --git a/python/paddle/v2/framework/tests/test_activation_op.py b/python/paddle/v2/framework/tests/test_activation_op.py index c44eb84906..ce6dec7748 100644 --- a/python/paddle/v2/framework/tests/test_activation_op.py +++ b/python/paddle/v2/framework/tests/test_activation_op.py @@ -122,6 +122,23 @@ class TestBRelu(OpTest): self.check_grad(['X'], 'Y', max_relative_error=0.02) +class TestLeakyRelu(OpTest): + def setUp(self): + self.op_type = "leaky_relu" + alpha = 0.02 + self.attrs = {'alpha': alpha} + self.inputs = {'X': np.random.uniform(-3, 3, [4, 4]).astype("float32")} + self.outputs = { + 'Y': np.maximum(self.inputs['X'], alpha * self.inputs['X']) + } + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(['X'], 'Y', max_relative_error=0.008) + + class TestSoftRelu(OpTest): def setUp(self): self.op_type = "soft_relu" From 11070e5f36be24be8da6fa2b70a2dae13212c513 Mon Sep 17 00:00:00 2001 From: Kavya Srinet Date: Thu, 5 Oct 2017 08:50:51 -0700 Subject: [PATCH 15/21] Updated the relative error --- python/paddle/v2/framework/tests/test_activation_op.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/paddle/v2/framework/tests/test_activation_op.py b/python/paddle/v2/framework/tests/test_activation_op.py index ce6dec7748..f232996a55 100644 --- a/python/paddle/v2/framework/tests/test_activation_op.py +++ b/python/paddle/v2/framework/tests/test_activation_op.py @@ -136,7 +136,7 @@ class TestLeakyRelu(OpTest): self.check_output() def test_check_grad(self): - self.check_grad(['X'], 
'Y', max_relative_error=0.008) + self.check_grad(['X'], 'Y', max_relative_error=0.007) class TestSoftRelu(OpTest): From 828c5b3e1dd3c80b955a1c65179ca6d5a27d852d Mon Sep 17 00:00:00 2001 From: Abhinav Arora Date: Thu, 5 Oct 2017 13:07:55 -0700 Subject: [PATCH 16/21] Adding Adadelta optimization operator (#4576) * Adding Adadelta optimization operator * Making inputs and outputs conform to naming convention * Removing type alias from header files * Fixing Adadelta documentation in comments * Addressing code review feedback --- paddle/operators/adadelta_op.cc | 115 ++++++++++++++++++ paddle/operators/adadelta_op.cu | 20 +++ paddle/operators/adadelta_op.h | 69 +++++++++++ .../v2/framework/tests/test_adadelta_op.py | 96 +++++++++++++++ 4 files changed, 300 insertions(+) create mode 100644 paddle/operators/adadelta_op.cc create mode 100644 paddle/operators/adadelta_op.cu create mode 100644 paddle/operators/adadelta_op.h create mode 100644 python/paddle/v2/framework/tests/test_adadelta_op.py diff --git a/paddle/operators/adadelta_op.cc b/paddle/operators/adadelta_op.cc new file mode 100644 index 0000000000..bd8c93b4a1 --- /dev/null +++ b/paddle/operators/adadelta_op.cc @@ -0,0 +1,115 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#include "paddle/operators/adadelta_op.h" + +namespace paddle { +namespace operators { + +class AdadeltaOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + protected: + void InferShape(framework::InferShapeContextBase *ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("Param"), + "Input(Param) of AdadeltaOp should not be null."); + PADDLE_ENFORCE(ctx->HasInput("Grad"), + "Input(Grad) of AdadeltaOp should not be null."); + PADDLE_ENFORCE(ctx->HasInput("AvgSquaredGrad"), + "Input(AvgSquaredGrad) of AdadeltaOp should not be null."); + PADDLE_ENFORCE(ctx->HasInput("AvgSquaredUpdate"), + "Input(AvgSquaredUpdate) of AdadeltaOp should not be null."); + + PADDLE_ENFORCE(ctx->HasOutput("ParamOut"), + "Output(ParamOut) of AdadeltaOp should not be null."); + PADDLE_ENFORCE( + ctx->HasOutput("AvgSquaredGradOut"), + "Output(AvgSquaredGradOut) of AdadeltaOp should not be null."); + PADDLE_ENFORCE( + ctx->HasOutput("AvgSquaredUpdateOut"), + "Output(AvgSquaredUpdateOut) of AdadeltaOp should not be null."); + + auto param_dim = ctx->GetInputDim("Param"); + PADDLE_ENFORCE_EQ( + param_dim, ctx->GetInputDim("Grad"), + "param and grad input of AdadeltaOp should have same dimension"); + PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("AvgSquaredGrad"), + "Param and AvgSquaredGrad input of AdadeltaOp " + "should have same dimension"); + PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("AvgSquaredUpdate"), + "Param and AvgSquaredUpdate input of AdadeltaOp " + "should have same dimension"); + + ctx->SetOutputDim("ParamOut", param_dim); + ctx->SetOutputDim("AvgSquaredGradOut", param_dim); + ctx->SetOutputDim("AvgSquaredUpdateOut", param_dim); + } +}; + +class AdadeltaOpMaker : public framework::OpProtoAndCheckerMaker { + public: + AdadeltaOpMaker(framework::OpProto *proto, + framework::OpAttrChecker *op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("Param", "(Tensor) Input parameter"); + 
AddInput("Grad", "(Tensor) Input gradient"); + AddInput("AvgSquaredGrad", + "(Tensor) Input expectation of squared gradient"); + AddInput("AvgSquaredUpdate", + "(Tensor) Input expectation of squared parameter updates"); + + AddOutput("ParamOut", "(Tensor) Output parameter"); + AddOutput("AvgSquaredGradOut", + "(Tensor) Output expectation of squared gradient"); + AddOutput("AvgSquaredUpdateOut", + "(Tensor) Output expectation of squared parameter updates"); + + AddAttr("rho", + "(float, default 0.95) Exponential decay rate " + "for squared gradients.") + .SetDefault(0.95f); + AddAttr("epsilon", + "(float, default 1.0e-6) Constant for " + "numerical stability") + .SetDefault(1.0e-6f); + AddComment(R"DOC( +Adadelta Updates Operator. + +This implements the Adadelta optimizer[1]. Adadelta is a per-dimension +adaptive learning rate method for gradient descent. + +Adadelta updates: + +avg_squared_grad_out = rho * avg_squared_grad + (1 - rho) * grad * grad +param_update = - sqrt((avg_squared_update + epsilon) / + (avg_squared_grad_out + epsilon)) * grad +avg_squared_update_out = rho * avg_squared_update + (1 - rho) * param_update**2 +param_out = param + param_update + +References: + [1] ADADELTA: An Adaptive Learning Rate Method + https://arxiv.org/abs/1212.5701 + +)DOC"); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP_WITHOUT_GRADIENT(adadelta, ops::AdadeltaOp, ops::AdadeltaOpMaker); +REGISTER_OP_CPU_KERNEL( + adadelta, ops::AdadeltaOpKernel); diff --git a/paddle/operators/adadelta_op.cu b/paddle/operators/adadelta_op.cu new file mode 100644 index 0000000000..3af1c8c8e9 --- /dev/null +++ b/paddle/operators/adadelta_op.cu @@ -0,0 +1,20 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. */ + +#define EIGEN_USE_GPU +#include "paddle/operators/adadelta_op.h" + +namespace ops = paddle::operators; +REGISTER_OP_GPU_KERNEL( + adadelta, ops::AdadeltaOpKernel); diff --git a/paddle/operators/adadelta_op.h b/paddle/operators/adadelta_op.h new file mode 100644 index 0000000000..d29e15c435 --- /dev/null +++ b/paddle/operators/adadelta_op.h @@ -0,0 +1,69 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#pragma once +#include "paddle/framework/eigen.h" +#include "paddle/framework/op_registry.h" + +namespace paddle { +namespace operators { + +template +class AdadeltaOpKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + auto param_out_tensor = ctx.Output("ParamOut"); + auto avg_squared_grad_out_tensor = + ctx.Output("AvgSquaredGradOut"); + auto avg_squared_update_out_tensor = + ctx.Output("AvgSquaredUpdateOut"); + + param_out_tensor->mutable_data(ctx.GetPlace()); + avg_squared_grad_out_tensor->mutable_data(ctx.GetPlace()); + avg_squared_update_out_tensor->mutable_data(ctx.GetPlace()); + + float rho = ctx.Attr("rho"); + float epsilon = ctx.Attr("epsilon"); + + auto param = framework::EigenVector::Flatten( + *ctx.Input("Param")); + auto grad = framework::EigenVector::Flatten( + *ctx.Input("Grad")); + // Squared gradient accumulator + auto avg_squared_grad = framework::EigenVector::Flatten( + *ctx.Input("AvgSquaredGrad")); + // Squared updates accumulator + auto avg_squared_update = framework::EigenVector::Flatten( + *ctx.Input("AvgSquaredUpdate")); + auto param_out = framework::EigenVector::Flatten(*param_out_tensor); + auto avg_squared_grad_out = + framework::EigenVector::Flatten(*avg_squared_grad_out_tensor); + auto avg_squared_update_out = + framework::EigenVector::Flatten(*avg_squared_update_out_tensor); + auto place = ctx.GetEigenDevice(); + + avg_squared_grad_out.device(place) = + rho * avg_squared_grad + (1 - rho) * grad.square(); + auto update = + -((avg_squared_update + epsilon) / (avg_squared_grad_out + epsilon)) + .sqrt() * + grad; + avg_squared_update_out.device(place) = + rho * avg_squared_update + (1 - rho) * update.square(); + param_out.device(place) = param + update; + } +}; + +} // namespace operators +} // namespace paddle diff --git a/python/paddle/v2/framework/tests/test_adadelta_op.py b/python/paddle/v2/framework/tests/test_adadelta_op.py new file mode 100644 index 
0000000000..7105593a98 --- /dev/null +++ b/python/paddle/v2/framework/tests/test_adadelta_op.py @@ -0,0 +1,96 @@ +import unittest +import numpy as np +from op_test import OpTest + + +class TestAdadeltaOp1(OpTest): + def setUp(self): + self.op_type = "adadelta" + param = np.random.uniform(-1, 1, (102, 105)).astype("float32") + grad = np.random.uniform(-1, 1, (102, 105)).astype("float32") + # The squared gradient is positive + avg_squared_grad = np.random.random((102, 105)).astype("float32") + # The squared update is positive + avg_squared_update = np.random.random((102, 105)).astype("float32") + + rho = 0.95 + epsilon = 1e-6 + + self.inputs = { + 'Param': param, + 'Grad': grad, + 'AvgSquaredGrad': avg_squared_grad, + 'AvgSquaredUpdate': avg_squared_update + } + + self.attrs = {'rho': rho, 'epsilon': epsilon} + + avg_squared_grad_out = rho * avg_squared_grad + \ + (1 - rho) * np.square(grad) + update = -np.multiply( + np.sqrt( + np.divide(avg_squared_update + epsilon, avg_squared_grad_out + + epsilon)), grad) + + avg_squared_update_out = rho * avg_squared_update + \ + (1 - rho) * np.square(update) + + param_out = param + update + + self.outputs = { + 'ParamOut': param_out, + 'AvgSquaredGradOut': avg_squared_grad_out, + 'AvgSquaredUpdateOut': avg_squared_update_out + } + + def test_check_output(self): + self.check_output() + + +class TestAdadeltaOp2(OpTest): + '''Test Adadelta op with default attribute values + ''' + + def setUp(self): + self.op_type = "adadelta" + param = np.random.uniform(-1, 1, (102, 105)).astype("float32") + grad = np.random.uniform(-1, 1, (102, 105)).astype("float32") + # The squared gradient is positive + avg_squared_grad = np.random.random((102, 105)).astype("float32") + # The squared update is positive + avg_squared_update = np.random.random((102, 105)).astype("float32") + + rho = 0.95 + epsilon = 1e-6 + + self.inputs = { + 'Param': param, + 'Grad': grad, + 'AvgSquaredGrad': avg_squared_grad, + 'AvgSquaredUpdate': avg_squared_update + } + + 
avg_squared_grad_out = rho * avg_squared_grad + \ + (1 - rho) * np.square(grad) + update = -np.multiply( + np.sqrt( + np.divide(avg_squared_update + epsilon, avg_squared_grad_out + + epsilon)), grad) + + avg_squared_update_out = rho * avg_squared_update + \ + (1 - rho) * np.square(update) + + param_out = param + update + + self.outputs = { + 'ParamOut': param_out, + 'AvgSquaredGradOut': avg_squared_grad_out, + 'AvgSquaredUpdateOut': avg_squared_update_out + } + + def test_check_output(self): + self.check_output() + + +if __name__ == "__main__": + unittest.main() From 45c4dcaabb4cbf140384dcffe3392d2e10b2a6d7 Mon Sep 17 00:00:00 2001 From: qijun Date: Thu, 5 Oct 2017 15:54:44 -0700 Subject: [PATCH 17/21] add fetch operator --- paddle/framework/executor.cc | 18 ++++---- paddle/framework/executor_test.cc | 67 ++++++++++++++++++++++++++++++ paddle/framework/scope.cc | 5 ++- paddle/operators/activation_op.cu | 18 ++++---- paddle/operators/feed_op.cc | 6 +-- paddle/operators/fetch_op.cc | 68 +++++++++++++++++++++++++++++++ paddle/operators/fetch_op.cu | 18 ++++++++ paddle/operators/fetch_op.h | 40 ++++++++++++++++++ 8 files changed, 218 insertions(+), 22 deletions(-) create mode 100644 paddle/operators/fetch_op.cc create mode 100644 paddle/operators/fetch_op.cu create mode 100644 paddle/operators/fetch_op.h diff --git a/paddle/framework/executor.cc b/paddle/framework/executor.cc index aafef12554..51ddb7e58e 100644 --- a/paddle/framework/executor.cc +++ b/paddle/framework/executor.cc @@ -75,15 +75,15 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope) { device_context->Wait(); } // // print tensor value - for (auto& var : block.vars()) { - std::cout << var.name() << std::endl; - auto v = scope->FindVar(var.name()); - const LoDTensor& t = v->Get(); - for (int i = 0; i < t.numel(); ++i) { - std::cout << t.data()[i] << " "; - } - std::cout << std::endl; - } + // for (auto& var : block.vars()) { + // std::cout << var.name() << std::endl; + // auto v = 
scope->FindVar(var.name()); + // const LoDTensor& t = v->Get(); + // for (int i = 0; i < t.numel(); ++i) { + // std::cout << t.data()[i] << " "; + // } + // std::cout << std::endl; + // } } } // namespace framework diff --git a/paddle/framework/executor_test.cc b/paddle/framework/executor_test.cc index 0856d1f32e..980f5f579c 100644 --- a/paddle/framework/executor_test.cc +++ b/paddle/framework/executor_test.cc @@ -25,6 +25,7 @@ limitations under the License. */ USE_OP(elementwise_add); USE_OP(gaussian_random); USE_OP(feed); +USE_OP(fetch); using std::string; using namespace paddle::platform; @@ -94,6 +95,41 @@ void add_feed_op(string var_name, int index, proto_block* block) { Out->add_arguments(var_name); } +void add_fetch_op(string var_name, int index, proto_block* block) { + std::vector dim{3}; + + // insert variable + auto a = block->add_vars(); + a->set_name(var_name); + auto a_lt = a->mutable_lod_tensor(); + a_lt->set_data_type(paddle::framework::DataType::FP32); + for (int i : dim) { + a_lt->add_dims(i); + } + + // insert operation + auto op = block->add_ops(); + op->set_type("fetch"); + + // set dims attr + auto dims = op->add_attrs(); + dims->set_name("dims"); + dims->set_type(paddle::framework::AttrType::INTS); + for (int i : dim) { + dims->add_ints(i); + } + + // set col attr + auto col = op->add_attrs(); + col->set_name("col"); + col->set_type(paddle::framework::AttrType::INT); + col->set_i(index); + + auto Out = op->add_inputs(); + Out->set_parameter("Input"); + Out->add_arguments(var_name); +} + std::once_flag set_variable_flag; template @@ -119,6 +155,27 @@ void set_feed_variable(const std::vector>& inputs) { } } +template +std::vector> get_fetch_variable() { + typedef std::vector FetchOutputs; + Variable* g_fetch_value = GetScope()->FindVar("fetch_value"); + FetchOutputs& fetch_outputs = *(g_fetch_value->GetMutable()); + auto size = fetch_outputs.size(); + + std::vector> result; + result.reserve(size); + + for (size_t i = 0; i < size; i++) { + 
std::vector tmp; + tmp.reserve(fetch_outputs[i].numel()); + memcpy(tmp.data(), fetch_outputs[i].data(), + fetch_outputs[i].numel() * sizeof(T)); + result.push_back(tmp); + } + + return result; +} + class ExecutorTesterRandom : public ::testing::Test { public: virtual void SetUp() override { @@ -181,6 +238,8 @@ class ExecutorTesterFeed : public ::testing::Test { Out->set_parameter("Out"); Out->add_arguments("c"); + add_fetch_op("c", 0, root_block); + std::vector vec1 = {1.0, 2.0, 3.0}; std::vector vec2 = {4.0, 5.0, 6.0}; inputs_.push_back(vec1); @@ -213,8 +272,16 @@ TEST_F(ExecutorTesterFeed, CPU) { // 3 mini-batch for (int i = 0; i < 3; i++) { // need to set feed variable before Executor::Run + std::cout << "start mini-batch " << i << std::endl; set_feed_variable(inputs_); executor->Run(pdesc_, GetScope()); + std::vector> result = get_fetch_variable(); + for (auto& vec : result) { + for (auto& num : vec) { + std::cout << num << " "; + } + std::cout << std::endl; + } } delete executor; diff --git a/paddle/framework/scope.cc b/paddle/framework/scope.cc index b04120abf2..2c416570cf 100644 --- a/paddle/framework/scope.cc +++ b/paddle/framework/scope.cc @@ -74,7 +74,10 @@ std::unique_ptr make_unique(Args&&... args) { framework::Scope* GetScope() { static std::unique_ptr g_scope = make_unique(); - std::call_once(feed_variable_flag, [&]() { g_scope->NewVar("feed_value"); }); + std::call_once(feed_variable_flag, [&]() { + g_scope->NewVar("feed_value"); + g_scope->NewVar("fetch_value"); + }); return g_scope.get(); } diff --git a/paddle/operators/activation_op.cu b/paddle/operators/activation_op.cu index 44a6aaf9cb..93e9f1c694 100644 --- a/paddle/operators/activation_op.cu +++ b/paddle/operators/activation_op.cu @@ -1,16 +1,16 @@ /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at -http://www.apache.org/licenses/LICENSE-2.0 + http://www.apache.org/licenses/LICENSE-2.0 -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. */ + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. */ #define EIGEN_USE_GPU #include "paddle/operators/activation_op.h" diff --git a/paddle/operators/feed_op.cc b/paddle/operators/feed_op.cc index 5ae882bc8a..a61855cb99 100644 --- a/paddle/operators/feed_op.cc +++ b/paddle/operators/feed_op.cc @@ -49,9 +49,9 @@ class FeedOpMaker : public framework::OpProtoAndCheckerMaker { AddAttr("data_type", "output data type") .SetDefault(framework::DataType::FP32); AddAttr("col", "The col in global feed variable").SetDefault(0); - AddAttr>("dims", "The dimension of random tensor."); - AddOutput("Out", "The output of dropout op."); - AddComment(R"DOC(Feed data to global feed variable)DOC"); + AddAttr>("dims", "The dimension of feed tensor."); + AddOutput("Out", "The output of feed op."); + AddComment(R"DOC(Feed data from global feed variable)DOC"); } }; diff --git a/paddle/operators/fetch_op.cc b/paddle/operators/fetch_op.cc new file mode 100644 index 0000000000..68e8d26dbe --- /dev/null +++ b/paddle/operators/fetch_op.cc @@ -0,0 +1,68 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. 
+ +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#include "paddle/operators/fetch_op.h" + +namespace paddle { +namespace operators { + +class FetchOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + protected: + void InferShape(framework::InferShapeContextBase* ctx) const override { + typedef std::vector FetchOutputs; + PADDLE_ENFORCE(ctx->HasInput("Input"), "Input should be not null."); + int col = ctx->Attrs().Get("col"); + framework::Variable* g_fetch_variable = + framework::GetScope()->FindVar("fetch_value"); + + FetchOutputs* tensors = g_fetch_variable->GetMutable(); + if (tensors->size() < col) { + tensors->resize(col); + } + + auto input_dim = ctx->GetInputDim("Input"); + framework::Tensor tmp; + tmp.Resize(input_dim); + (*tensors)[col].Resize(input_dim); + // need to handle LodTensor later + } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return static_cast(Attr("data_type")); + } +}; + +class FetchOpMaker : public framework::OpProtoAndCheckerMaker { + public: + FetchOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddAttr("data_type", "output data type") + .SetDefault(framework::DataType::FP32); + AddAttr("col", "The col in global fetch variable").SetDefault(0); + AddAttr>("dims", "The dimension of fetch tensor."); + AddInput("Input", "The output of fetch op."); + 
AddComment(R"DOC(Fetch data to global fetch variable)DOC"); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP_WITHOUT_GRADIENT(fetch, ops::FetchOp, ops::FetchOpMaker); +REGISTER_OP_CPU_KERNEL(fetch, ops::FetchKernel); diff --git a/paddle/operators/fetch_op.cu b/paddle/operators/fetch_op.cu new file mode 100644 index 0000000000..2e24d3a8ad --- /dev/null +++ b/paddle/operators/fetch_op.cu @@ -0,0 +1,18 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#include "paddle/operators/feed_op.h" + +namespace ops = paddle::operators; +REGISTER_OP_GPU_KERNEL(fetch, ops::FetchKernel); diff --git a/paddle/operators/fetch_op.h b/paddle/operators/fetch_op.h new file mode 100644 index 0000000000..95e7986a22 --- /dev/null +++ b/paddle/operators/fetch_op.h @@ -0,0 +1,40 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#pragma once +#include "paddle/framework/eigen.h" +#include "paddle/framework/op_registry.h" + +namespace paddle { +namespace operators { + +using Tensor = framework::Tensor; + +template +class FetchKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + typedef std::vector FetchOutputs; + Tensor* input = ctx.Output("Input"); + int col = ctx.template Attr("col"); + framework::Variable* g_fetch_variable = + framework::GetScope()->FindVar("fetch_value"); + FetchOutputs tensors = g_fetch_variable->Get(); + tensors[col].mutable_data(platform::CPUPlace()); + tensors[col].CopyFrom(*input, platform::CPUPlace()); + } +}; + +} // namespace operators +} // namespace paddle From 48b080db9fcc4f34535c98878112e6633d6d8d7d Mon Sep 17 00:00:00 2001 From: qijun Date: Thu, 5 Oct 2017 20:48:04 -0700 Subject: [PATCH 18/21] ensure global BuddyAllocator is initialized before global Scope --- paddle/framework/executor_test.cc | 94 +++++++++++++++++-------------- paddle/operators/feed_op.cc | 4 +- paddle/operators/feed_op.h | 2 +- paddle/operators/fetch_op.cc | 7 ++- paddle/operators/fetch_op.h | 8 +-- 5 files changed, 62 insertions(+), 53 deletions(-) diff --git a/paddle/framework/executor_test.cc b/paddle/framework/executor_test.cc index 980f5f579c..d3ea18d154 100644 --- a/paddle/framework/executor_test.cc +++ b/paddle/framework/executor_test.cc @@ -13,8 +13,6 @@ See the License for the specific language governing permissions and limitations under the License. 
*/ #include "paddle/framework/executor.h" -#include // for unique_ptr -#include // for call_once #include #include "gtest/gtest.h" #include "paddle/framework/attribute.h" @@ -34,9 +32,8 @@ using namespace paddle::framework; typedef paddle::framework::BlockDesc proto_block; typedef paddle::framework::OpDesc proto_op; -void add_gaussian_random_op(string var_name, proto_block* block) { - std::vector dim{2, 3}; - +void add_gaussian_random_op(string var_name, std::vector& dim, + proto_block* block) { // insert variable auto a = block->add_vars(); a->set_name(var_name); @@ -60,9 +57,8 @@ void add_gaussian_random_op(string var_name, proto_block* block) { Out->add_arguments(var_name); } -void add_feed_op(string var_name, int index, proto_block* block) { - std::vector dim{3}; - +void add_feed_op(string var_name, std::vector& dim, int index, + proto_block* block) { // insert variable auto a = block->add_vars(); a->set_name(var_name); @@ -95,9 +91,8 @@ void add_feed_op(string var_name, int index, proto_block* block) { Out->add_arguments(var_name); } -void add_fetch_op(string var_name, int index, proto_block* block) { - std::vector dim{3}; - +void add_fetch_op(string var_name, std::vector& dim, int index, + proto_block* block) { // insert variable auto a = block->add_vars(); a->set_name(var_name); @@ -138,20 +133,11 @@ void set_feed_variable(const std::vector>& inputs) { Variable* g_feed_value = GetScope()->FindVar("feed_value"); FeedInputs& feed_inputs = *(g_feed_value->GetMutable()); auto size = inputs.size(); - - std::call_once(set_variable_flag, [&]() { - feed_inputs.reserve(size); - for (size_t i = 0; i < size; i++) { - paddle::framework::Tensor tmp; - tmp.mutable_data(make_ddim({static_cast(inputs[i].size())}), - CPUPlace()); - feed_inputs.push_back(tmp); - } - }); - + feed_inputs.resize(size); for (size_t i = 0; i < size; i++) { - memcpy(feed_inputs[i].data(), inputs[i].data(), - inputs[i].size() * sizeof(T)); + T* dst = feed_inputs[i].mutable_data( + 
make_ddim({static_cast(inputs[i].size())}), CPUPlace()); + memcpy(dst, inputs[i].data(), inputs[i].size() * sizeof(T)); } } @@ -160,19 +146,17 @@ std::vector> get_fetch_variable() { typedef std::vector FetchOutputs; Variable* g_fetch_value = GetScope()->FindVar("fetch_value"); FetchOutputs& fetch_outputs = *(g_fetch_value->GetMutable()); - auto size = fetch_outputs.size(); + auto size = fetch_outputs.size(); std::vector> result; result.reserve(size); - for (size_t i = 0; i < size; i++) { std::vector tmp; - tmp.reserve(fetch_outputs[i].numel()); + tmp.resize(fetch_outputs[i].numel()); memcpy(tmp.data(), fetch_outputs[i].data(), fetch_outputs[i].numel() * sizeof(T)); result.push_back(tmp); } - return result; } @@ -183,8 +167,9 @@ class ExecutorTesterRandom : public ::testing::Test { root_block->set_idx(0); root_block->set_parent_idx(-1); - add_gaussian_random_op("a", root_block); - add_gaussian_random_op("b", root_block); + std::vector dim{2, 3}; + add_gaussian_random_op("a", dim, root_block); + add_gaussian_random_op("b", dim, root_block); auto c = root_block->add_vars(); c->set_name("c"); @@ -203,12 +188,11 @@ class ExecutorTesterRandom : public ::testing::Test { Out->set_parameter("Out"); Out->add_arguments("c"); - scope_ = GetScope(); + add_fetch_op("c", dim, 0, root_block); } protected: ProgramDesc pdesc_; - Scope* scope_; }; class ExecutorTesterFeed : public ::testing::Test { @@ -218,8 +202,10 @@ class ExecutorTesterFeed : public ::testing::Test { root_block->set_idx(0); root_block->set_parent_idx(-1); - add_feed_op("a", 0, root_block); - add_feed_op("b", 1, root_block); + std::vector dim{6}; + + add_feed_op("a", dim, 0, root_block); + add_feed_op("b", dim, 1, root_block); auto c = root_block->add_vars(); c->set_name("c"); @@ -238,10 +224,10 @@ class ExecutorTesterFeed : public ::testing::Test { Out->set_parameter("Out"); Out->add_arguments("c"); - add_fetch_op("c", 0, root_block); + add_fetch_op("c", dim, 0, root_block); - std::vector vec1 = {1.0, 2.0, 3.0}; - 
std::vector vec2 = {4.0, 5.0, 6.0}; + std::vector vec1 = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}; + std::vector vec2 = {4.0, 5.0, 6.0, 7.0, 8.0, 9.0}; inputs_.push_back(vec1); inputs_.push_back(vec2); } @@ -253,12 +239,24 @@ class ExecutorTesterFeed : public ::testing::Test { TEST_F(ExecutorTesterRandom, CPU) { std::vector places; - CPUPlace cpu_place1, cpu_place2; - places.push_back(cpu_place1); - places.push_back(cpu_place2); + CPUPlace cpu_place; + places.push_back(cpu_place); + + // We have a global Scope and BuddyAllocator, and we must ensure + // global BuddyAllocator is initialized before global Scope. Thus, + // global Scope will deconstruct before BuddyAllocator. Otherwise, + // "pointer being freed was not allocated" error will appear. + paddle::memory::Used(cpu_place); Executor* executor = new Executor(places); - executor->Run(pdesc_, scope_); + executor->Run(pdesc_, GetScope()); + std::vector> result = get_fetch_variable(); + for (auto& vec : result) { + for (auto& num : vec) { + std::cout << num << " "; + } + std::cout << std::endl; + } delete executor; } @@ -267,6 +265,12 @@ TEST_F(ExecutorTesterFeed, CPU) { CPUPlace cpu_place; places.push_back(cpu_place); + // We have a global Scope and BuddyAllocator, and we must ensure + // global BuddyAllocator is initialized before global Scope. Thus, + // global Scope will deconstruct before BuddyAllocator. Otherwise, + // "pointer being freed was not allocated" error will appear. 
+ paddle::memory::Used(cpu_place); + Executor* executor = new Executor(places); // 3 mini-batch @@ -293,8 +297,10 @@ TEST_F(ExecutorTesterRandom, GPU) { GPUPlace gpu_place(0); places.push_back(gpu_place); + paddle::memory::Used(gpu_place); + Executor* executor = new Executor(places); - executor->Run(pdesc_, scope_); + executor->Run(pdesc_, GetScope()); delete executor; } @@ -303,11 +309,13 @@ TEST_F(ExecutorTesterFeed, GPU) { GPUPlace gpu_place(0); places.push_back(gpu_place); + paddle::memory::Used(gpu_place); + Executor* executor = new Executor(places); // need to set feed variable before Executor::Run set_feed_variable(inputs_); - executor->Run(pdesc_, scope_); + executor->Run(pdesc_, GetScope()); delete executor; } diff --git a/paddle/operators/feed_op.cc b/paddle/operators/feed_op.cc index a61855cb99..d40db3ff2e 100644 --- a/paddle/operators/feed_op.cc +++ b/paddle/operators/feed_op.cc @@ -29,11 +29,11 @@ class FeedOp : public framework::OperatorWithKernel { framework::Variable* g_feed_variable = framework::GetScope()->FindVar("feed_value"); - FeedInputs tensors = g_feed_variable->Get(); + const FeedInputs& tensors = g_feed_variable->Get(); auto in_dim = tensors[col].dims(); ctx->SetOutputDim("Out", in_dim); - // need to handle LodTensor later + // TODO(qijun) need to handle LodTensor later } framework::DataType IndicateDataType( diff --git a/paddle/operators/feed_op.h b/paddle/operators/feed_op.h index 57781e205f..cf93b6f434 100644 --- a/paddle/operators/feed_op.h +++ b/paddle/operators/feed_op.h @@ -31,7 +31,7 @@ class FeedKernel : public framework::OpKernel { framework::Variable* g_feed_variable = framework::GetScope()->FindVar("feed_value"); int col = ctx.template Attr("col"); - FeedInputs tensors = g_feed_variable->Get(); + const FeedInputs& tensors = g_feed_variable->Get(); out->CopyFrom(tensors[col], ctx.GetPlace()); } }; diff --git a/paddle/operators/fetch_op.cc b/paddle/operators/fetch_op.cc index 68e8d26dbe..a885deacc8 100644 --- 
a/paddle/operators/fetch_op.cc +++ b/paddle/operators/fetch_op.cc @@ -30,15 +30,16 @@ class FetchOp : public framework::OperatorWithKernel { framework::GetScope()->FindVar("fetch_value"); FetchOutputs* tensors = g_fetch_variable->GetMutable(); - if (tensors->size() < col) { - tensors->resize(col); + if (tensors->size() < static_cast(col + 1)) { + tensors->resize(col + 1); } auto input_dim = ctx->GetInputDim("Input"); framework::Tensor tmp; tmp.Resize(input_dim); (*tensors)[col].Resize(input_dim); - // need to handle LodTensor later + + // TODO(qijun) need to handle LodTensor later } framework::DataType IndicateDataType( diff --git a/paddle/operators/fetch_op.h b/paddle/operators/fetch_op.h index 95e7986a22..e8d5e3a9c0 100644 --- a/paddle/operators/fetch_op.h +++ b/paddle/operators/fetch_op.h @@ -26,13 +26,13 @@ class FetchKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { typedef std::vector FetchOutputs; - Tensor* input = ctx.Output("Input"); + const Tensor* input = ctx.Input("Input"); int col = ctx.template Attr("col"); framework::Variable* g_fetch_variable = framework::GetScope()->FindVar("fetch_value"); - FetchOutputs tensors = g_fetch_variable->Get(); - tensors[col].mutable_data(platform::CPUPlace()); - tensors[col].CopyFrom(*input, platform::CPUPlace()); + FetchOutputs* tensors = g_fetch_variable->GetMutable(); + (*tensors)[col].mutable_data(platform::CPUPlace()); + (*tensors)[col].CopyFrom(*input, platform::CPUPlace()); } }; From bbceb72398f23902fae2f011c2b6c7f2a8b7b8e3 Mon Sep 17 00:00:00 2001 From: qijun Date: Thu, 5 Oct 2017 20:54:16 -0700 Subject: [PATCH 19/21] refine some codes --- paddle/framework/executor.cc | 10 ---------- paddle/framework/executor_test.cc | 2 ++ paddle/framework/scope.cc | 9 ++------- paddle/operators/feed_op.cc | 2 +- paddle/operators/fetch_op.cc | 2 +- 5 files changed, 6 insertions(+), 19 deletions(-) diff --git a/paddle/framework/executor.cc 
b/paddle/framework/executor.cc index 51ddb7e58e..ee0df039ac 100644 --- a/paddle/framework/executor.cc +++ b/paddle/framework/executor.cc @@ -74,16 +74,6 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope) { for (auto& device_context : device_contexts_) { device_context->Wait(); } - // // print tensor value - // for (auto& var : block.vars()) { - // std::cout << var.name() << std::endl; - // auto v = scope->FindVar(var.name()); - // const LoDTensor& t = v->Get(); - // for (int i = 0; i < t.numel(); ++i) { - // std::cout << t.data()[i] << " "; - // } - // std::cout << std::endl; - // } } } // namespace framework diff --git a/paddle/framework/executor_test.cc b/paddle/framework/executor_test.cc index d3ea18d154..5e327cc893 100644 --- a/paddle/framework/executor_test.cc +++ b/paddle/framework/executor_test.cc @@ -130,6 +130,7 @@ std::once_flag set_variable_flag; template void set_feed_variable(const std::vector>& inputs) { typedef std::vector FeedInputs; + // Tensors in feed value variable will only be in CPUPlace Variable* g_feed_value = GetScope()->FindVar("feed_value"); FeedInputs& feed_inputs = *(g_feed_value->GetMutable()); auto size = inputs.size(); @@ -144,6 +145,7 @@ void set_feed_variable(const std::vector>& inputs) { template std::vector> get_fetch_variable() { typedef std::vector FetchOutputs; + // Tensors in fetch value variable will only be in CPUPlace Variable* g_fetch_value = GetScope()->FindVar("fetch_value"); FetchOutputs& fetch_outputs = *(g_fetch_value->GetMutable()); diff --git a/paddle/framework/scope.cc b/paddle/framework/scope.cc index 2c416570cf..b6a9d7fbc2 100644 --- a/paddle/framework/scope.cc +++ b/paddle/framework/scope.cc @@ -66,15 +66,10 @@ void Scope::DropKids() { std::once_flag feed_variable_flag; -template -std::unique_ptr make_unique(Args&&... 
args) { - return std::unique_ptr(new T(std::forward(args)...)); -} - framework::Scope* GetScope() { - static std::unique_ptr g_scope = - make_unique(); + static std::unique_ptr g_scope{nullptr}; std::call_once(feed_variable_flag, [&]() { + g_scope.reset(new framework::Scope()); g_scope->NewVar("feed_value"); g_scope->NewVar("fetch_value"); }); diff --git a/paddle/operators/feed_op.cc b/paddle/operators/feed_op.cc index d40db3ff2e..f2c498e2e2 100644 --- a/paddle/operators/feed_op.cc +++ b/paddle/operators/feed_op.cc @@ -33,7 +33,7 @@ class FeedOp : public framework::OperatorWithKernel { auto in_dim = tensors[col].dims(); ctx->SetOutputDim("Out", in_dim); - // TODO(qijun) need to handle LodTensor later + // TODO(qijun): need to handle LodTensor later } framework::DataType IndicateDataType( diff --git a/paddle/operators/fetch_op.cc b/paddle/operators/fetch_op.cc index a885deacc8..f6882cbd03 100644 --- a/paddle/operators/fetch_op.cc +++ b/paddle/operators/fetch_op.cc @@ -39,7 +39,7 @@ class FetchOp : public framework::OperatorWithKernel { tmp.Resize(input_dim); (*tensors)[col].Resize(input_dim); - // TODO(qijun) need to handle LodTensor later + // TODO(qijun): need to handle LodTensor later } framework::DataType IndicateDataType( From 1f5192a27b968a7980c2eead7b6885e66f09575a Mon Sep 17 00:00:00 2001 From: qijun Date: Fri, 6 Oct 2017 11:06:59 -0700 Subject: [PATCH 20/21] fix executor gpu unittest --- paddle/framework/executor.cc | 2 +- paddle/framework/executor_test.cc | 20 +++++++++++++++----- paddle/operators/fetch_op.cu | 2 +- paddle/platform/gpu_info.cc | 3 ++- 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/paddle/framework/executor.cc b/paddle/framework/executor.cc index ee0df039ac..c18ba049c8 100644 --- a/paddle/framework/executor.cc +++ b/paddle/framework/executor.cc @@ -30,7 +30,7 @@ Executor::Executor(const std::vector& places) { device_contexts_[i] = new platform::CPUDeviceContext( boost::get(places[i])); } else if 
(platform::is_gpu_place(places[i])) { -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA device_contexts_[i] = new platform::CUDADeviceContext( boost::get(places[i])); #else diff --git a/paddle/framework/executor_test.cc b/paddle/framework/executor_test.cc index 5e327cc893..55e209628b 100644 --- a/paddle/framework/executor_test.cc +++ b/paddle/framework/executor_test.cc @@ -293,7 +293,7 @@ TEST_F(ExecutorTesterFeed, CPU) { delete executor; } -#ifdef PADDLE_WITH_GPU +#ifdef PADDLE_WITH_CUDA TEST_F(ExecutorTesterRandom, GPU) { std::vector places; GPUPlace gpu_place(0); @@ -315,10 +315,20 @@ TEST_F(ExecutorTesterFeed, GPU) { Executor* executor = new Executor(places); - // need to set feed variable before Executor::Run - set_feed_variable(inputs_); - executor->Run(pdesc_, GetScope()); - + // 3 mini-batch + for (int i = 0; i < 3; i++) { + // need to set feed variable before Executor::Run + std::cout << "start mini-batch " << i << std::endl; + set_feed_variable(inputs_); + executor->Run(pdesc_, GetScope()); + std::vector> result = get_fetch_variable(); + for (auto& vec : result) { + for (auto& num : vec) { + std::cout << num << " "; + } + std::cout << std::endl; + } + } delete executor; } #endif diff --git a/paddle/operators/fetch_op.cu b/paddle/operators/fetch_op.cu index 2e24d3a8ad..ca39d24c79 100644 --- a/paddle/operators/fetch_op.cu +++ b/paddle/operators/fetch_op.cu @@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
*/
-#include "paddle/operators/feed_op.h"
+#include "paddle/operators/fetch_op.h"

 namespace ops = paddle::operators;
 REGISTER_OP_GPU_KERNEL(fetch, ops::FetchKernel);
diff --git a/paddle/platform/gpu_info.cc b/paddle/platform/gpu_info.cc
index 486dcd623a..aa76bb209d 100644
--- a/paddle/platform/gpu_info.cc
+++ b/paddle/platform/gpu_info.cc
@@ -43,7 +43,8 @@ int GetCurrentDeviceId() {
 }

 void SetDeviceId(int id) {
-  PADDLE_ENFORCE(id < GetDeviceCount(), "id must less than GPU count");
+  // TODO(qijun): find a better way to cache the cuda device count
+  PADDLE_ENFORCE(id < GetCUDADeviceCount(), "id must less than GPU count");
   PADDLE_ENFORCE(cudaSetDevice(id),
                  "cudaSetDevice failed in paddle::platform::SetDeviceId");
 }
From e8a678e1eecd11fee219a93c6c586ee24663a506 Mon Sep 17 00:00:00 2001
From: qijun
Date: Fri, 6 Oct 2017 22:46:04 +0000
Subject: [PATCH 21/21] fix executor gpu unittest runtime error

---
 paddle/framework/executor_test.cc | 19 ++++++++++++++++---
 paddle/operators/fetch_op.cc      |  2 --
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/paddle/framework/executor_test.cc b/paddle/framework/executor_test.cc
index 55e209628b..82f9bd6f2d 100644
--- a/paddle/framework/executor_test.cc
+++ b/paddle/framework/executor_test.cc
@@ -239,6 +239,7 @@ class ExecutorTesterFeed : public ::testing::Test {
   std::vector> inputs_;
 };

+#ifndef PADDLE_WITH_CUDA
 TEST_F(ExecutorTesterRandom, CPU) {
   std::vector places;
   CPUPlace cpu_place;
@@ -292,13 +293,19 @@ TEST_F(ExecutorTesterFeed, CPU) {

   delete executor;
 }
-
-#ifdef PADDLE_WITH_CUDA
+#else
 TEST_F(ExecutorTesterRandom, GPU) {
   std::vector places;
   GPUPlace gpu_place(0);
   places.push_back(gpu_place);

+  // We have a global Scope and BuddyAllocator, and we must ensure
+  // global BuddyAllocator is initialized before global Scope. Thus,
+  // global Scope will deconstruct before BuddyAllocator. Otherwise,
+  // "pointer being freed was not allocated" error will appear.
+  // If paddle is compiled with GPU, both CPU and GPU BuddyAllocator
+  // need to be used at first.
+  paddle::memory::Used(CPUPlace());
   paddle::memory::Used(gpu_place);

   Executor* executor = new Executor(places);
@@ -310,7 +317,13 @@ TEST_F(ExecutorTesterFeed, GPU) {
   std::vector places;
   GPUPlace gpu_place(0);
   places.push_back(gpu_place);
-
+  // We have a global Scope and BuddyAllocator, and we must ensure
+  // global BuddyAllocator is initialized before global Scope. Thus,
+  // global Scope will deconstruct before BuddyAllocator. Otherwise,
+  // "pointer being freed was not allocated" error will appear.
+  // If paddle is compiled with GPU, both CPU and GPU BuddyAllocator
+  // need to be used at first.
+  paddle::memory::Used(CPUPlace());
   paddle::memory::Used(gpu_place);

   Executor* executor = new Executor(places);
diff --git a/paddle/operators/fetch_op.cc b/paddle/operators/fetch_op.cc
index f6882cbd03..4b6b3ca85a 100644
--- a/paddle/operators/fetch_op.cc
+++ b/paddle/operators/fetch_op.cc
@@ -35,8 +35,6 @@ class FetchOp : public framework::OperatorWithKernel {
     }

     auto input_dim = ctx->GetInputDim("Input");
-    framework::Tensor tmp;
-    tmp.Resize(input_dim);
     (*tensors)[col].Resize(input_dim);

     // TODO(qijun): need to handle LodTensor later