update to develop branch and resolve conflicts.

8 years ago · dbe0598745
parent ad64ca5da2 6d0d29f645
commit dbe0598745
101 changed files with 2552 additions and 993 deletions
--- a/.travis.yml
+++ b/.travis.yml
@ -21,7 +21,6 @@ addons:
      - python
      - python-pip
      - python2.7-dev
      - python-numpy
      - python-wheel
      - libboost-dev
      - curl
@ -35,8 +34,8 @@ before_install:
  - if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
  # Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python
  # protobuf version.
-  - pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
+  - sudo pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
-  - pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
+  - sudo pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
  - curl https://glide.sh/get | bash
  - eval "$(GIMME_GO_VERSION=1.8.3 gimme)"
  - go get -u github.com/alecthomas/gometalinter
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE)
 endif()
 if(ANDROID)
-    if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "21")
+    if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "16")
-        message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 21")
+        message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 16")
    endif()
    set(WITH_GPU OFF CACHE STRING
--- a/doc/design/functions_operators_layers.md
+++ b/doc/design/functions_operators_layers.md
@ -86,12 +86,13 @@ def layer.fc(X):
 We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`.  So we have the following concepts in above illustrative example:
-```
+
 | C++ functions/functors | mul          | add          |             |          |
 |------------------------|--------------|--------------|-------------|----------|
 | C++ operator class     | mulOp        | addOp        | FCOp        |          |
 | Python binding         | operator.mul | operator.add | operator.fc |          |
 | Python function        |              |              |             | layer.fc |
-```
+
 This is how we differentiate layer and operators in PaddlePaddle:
--- a/doc/design/ops/dist_train.md
+++ b/doc/design/ops/dist_train.md
@ -0,0 +1,106 @@
 # Design Doc: Operation Graph Based Parameter Server
 ## Abstract
 We propose an approach to implement the parameter server. In this
 approach, there is no fundamental difference between the trainer and
 the parameter server: they both run subgraphs, but subgraphs of
 different purposes.
 ## Background
 The previous implementations of the parameter server do not run a
 subgraph. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice, on both the
 trainer and the parameter server.
 It would be great if we could write code once and use it on both the
 trainer and the parameter server: this reduces code duplication and
 improves extensibility. Since the current refactor already represents
 everything as a computing graph on the trainer, representing
 everything as a computing graph on the parameter server becomes a
 natural extension.
 ## Design
 ### Graph Converter
 The *graph converter* converts the user-defined operation (OP) graph
 into subgraphs to be scheduled on different nodes with the following
 steps:
 1. OP placement: the OPs will be placed on different nodes according
   to a heuristic that minimizes the estimated total computation
   time. Currently we will use a simple heuristic that puts parameter
   variables on parameter server workers and everything else on trainer
   workers.
 1. Add communication OPs to enable the communication between nodes.
 We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
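
The simple placement heuristic in step 1 can be sketched as follows. This is a minimal illustration, not the graph converter itself: `Placement`, `VarDesc`, `PlaceVariable` and `VariablesOn` are hypothetical names introduced here, and the sketch assumes each variable already carries a flag saying whether it is a parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical node kinds; the design doc only names trainer workers and
// parameter server workers.
enum class Placement { kTrainer, kParameterServer };

// Hypothetical, simplified description of one variable in the OP graph.
struct VarDesc {
  std::string name;
  bool is_parameter;  // assumed flag: true for trainable parameters such as W
};

// Simple heuristic: parameters go to parameter server workers, everything
// else stays on trainer workers.
inline Placement PlaceVariable(const VarDesc& var) {
  return var.is_parameter ? Placement::kParameterServer : Placement::kTrainer;
}

// Returns the names of all variables assigned to the given placement.
inline std::vector<std::string> VariablesOn(const std::vector<VarDesc>& vars,
                                            Placement where) {
  std::vector<std::string> names;
  for (const auto& v : vars) {
    if (PlaceVariable(v) == where) names.push_back(v.name);
  }
  return names;
}
```

A cost-based heuristic that minimizes estimated computation time would replace `PlaceVariable`; the rest of the partitioning could stay the same.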
 Below is an example of converting the user defined graph to the
 subgraphs for the trainer and the parameter server:
 <img src="src/local-graph.png" width="300"/>
 After converting:
 <img src="src/dist-graph.png" width="700"/>
 1. The parameter variable W and its optimizer subgraph are placed on the parameter server.
 1. Operators are added to the subgraphs.
   - *Send* sends data to the connected *Recv* operator. The
     scheduler on the receive node will only schedule the *Recv* operator
     to run when the *Send* operator has run (the *Send* OP will mark
     the *Recv* OP runnable automatically).
   - *Enqueue* enqueues the input variable; it can block until space
     becomes available in the queue.
   - *Dequeue* outputs a configurable number of tensors from the
     queue. It will block until the queue has the required number of
     tensors.
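
The blocking behavior described for *Enqueue* and *Dequeue* can be sketched with a bounded queue. This is only an illustration under assumptions: `BoundedQueue` is a hypothetical class, the element type is a template parameter rather than a tensor, and `min_count` stands in for the attribute mentioned in the discussion section.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical bounded queue illustrating the Enqueue/Dequeue semantics.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the queue is full, then appends the item.
  void Enqueue(T item) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push_back(std::move(item));
    enough_items_.notify_all();
  }

  // Blocks until at least min_count items are queued, then removes and
  // returns exactly min_count items in FIFO order.
  // Note: min_count must not exceed the capacity, or this never returns.
  std::vector<T> Dequeue(std::size_t min_count) {
    std::unique_lock<std::mutex> lock(mu_);
    enough_items_.wait(lock,
                       [this, min_count] { return items_.size() >= min_count; });
    std::vector<T> out;
    out.reserve(min_count);
    for (std::size_t i = 0; i < min_count; ++i) {
      out.push_back(std::move(items_.front()));
      items_.pop_front();
    }
    not_full_.notify_all();
    return out;
  }

 private:
  std::size_t capacity_;
  std::deque<T> items_;
  std::mutex mu_;
  std::condition_variable not_full_;
  std::condition_variable enough_items_;
};
```

Two condition variables keep producers and consumers from waking each other needlessly; a real OP would additionally have to fit into the scheduler rather than block a thread.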
 ### Benefits
 - Model parallelism becomes easier to implement: it's an extension to
  the trainer - parameter server approach. We already have the
  communication OPs, but need to extend the graph converter's
  placement functionality.
 - A user-defined optimizer is easier to add: the user can now express it as
  a subgraph.
 - No more duplicated logic inside the trainer and the parameter
  server, as mentioned in the background section.
 ### Challenges
 - It might be hard for the graph converter to cut a general graph
  (without any hint for which subgraph is the optimizer). We may need
  to label which subgraph inside the OP graph is the optimizer.
 - It's important to balance the parameter shards across multiple
  parameter servers. If a single parameter is very big (some
  word-embedding, fully connected, or softmax layers), we need to
  automatically partition the single parameter onto different
  parameter servers when possible (that is, when only an element-wise
  optimizer depends on the parameter variable).
 ### Discussion
 - In the "Async SGD" figure, the "W" variable on the parameter server
  could be read and written concurrently. What is our locking strategy?
  E.g., each variable has a lock C++ method to be invoked by every
  OP, or we have a lock OP.
 - Can the Enqueue OP be implemented under our current tensor design
  (puts the input tensor into the queue tensor)?
 - The *Dequeue* OP will have a variable number of outputs (depending on the
  `min_count` attribute). Does our current design support it? (Similar
  question for the *Add* OP.)
 ### References:
 [1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
--- a/doc/design/ops/src/dist-graph.graffle
+++ b/doc/design/ops/src/dist-graph.graffle
--- a/doc/design/ops/src/dist-graph.png
+++ b/doc/design/ops/src/dist-graph.png
--- a/doc/design/ops/src/local-graph.graffle
+++ b/doc/design/ops/src/local-graph.graffle
--- a/doc/design/ops/src/local-graph.png
+++ b/doc/design/ops/src/local-graph.png
--- a/paddle/framework/CMakeLists.txt
+++ b/paddle/framework/CMakeLists.txt
@ -9,6 +9,7 @@ cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)
 cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor)
 cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor)
 nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)
 cc_test(variable_test SRCS variable_test.cc)
--- a/paddle/framework/attribute.h
+++ b/paddle/framework/attribute.h
@ -45,7 +45,19 @@ class GreaterThanChecker {
 public:
  explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
  void operator()(T& value) const {
-    PADDLE_ENFORCE(value > lower_bound_, "larger_than check fail");
+    PADDLE_ENFORCE(value > lower_bound_, "larger_than check fails.");
  }
 private:
  T lower_bound_;
 };
 template <typename T>
 class EqualGreaterThanChecker {
 public:
  explicit EqualGreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
  void operator()(T& value) const {
    PADDLE_ENFORCE_GE(value, lower_bound_, "equal_larger_than check fails.");
  }
 private:
@ -115,6 +127,11 @@ class TypedAttrChecker {
    return *this;
  }
  TypedAttrChecker& EqualGreaterThan(const T& lower_bound) {
    value_checkers_.push_back(EqualGreaterThanChecker<T>(lower_bound));
    return *this;
  }
  // we can add more common limits, like LessThan(), Between()...
  TypedAttrChecker& SetDefault(const T& default_value) {
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@ -2,20 +2,20 @@
 ## Motivation
-In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
+In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the gradient operators/expressions together with the chain rule. Every forward network needs a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
-  
+
 ## Backward Operator Registry
-A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs and output gradients and then calculate its input gradients.
+A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs, and output gradients and then calculate their input gradients.
 |                        | forward operator | backward operator                |
 | ---------------------- | ---------------- | -------------------------------- |
 | **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
 | **Operator::outputs_** | Outputs          | InputGradients                   |
- In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
+ In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
-For example, we have got a `mul_op`, and we can register it's information and corresponding backward operator by the following macro:
+For example, we have got a `mul_op`, and we can register its information and corresponding backward operator by the following macro:
 ```cpp
 REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
@ -27,17 +27,17 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
 ## Backward Operator Creating
-Given a certain forward operator, we can get its corresponding backward opeartor by calling:
+Given a certain forward operator, we can get its corresponding backward operator by calling:
 ```cpp
 OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
-``` 
+```
 The function `BuildGradOp` will sequentially execute following processes:
 1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.
-2. Build two maps named `inputs` and `outputs` to temporary storage backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these are not necessary for gradient computing.
+2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to map `inputs`, except those that are not necessary for gradient computing.
 3. Add forward inputs' gradient variables into map `output`, adding forward outputs' gradient variables into map `input`.
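
The three steps above can be sketched as follows. This is a simplified illustration, not the actual `BuildGradOp`: `OpDesc`, `GradNames` and `BuildGradOpSketch` are hypothetical names, the `@GRAD` suffix for gradient variables is an assumption made here, a plain `std::map` stands in for `OpInfoMap`, and the pruning of unnecessary inputs in step 2 is omitted.

```cpp
#include <map>
#include <string>
#include <vector>

using VarNameMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical stand-in for an operator: a type plus named input/output slots.
struct OpDesc {
  std::string type;
  VarNameMap inputs;
  VarNameMap outputs;
};

// Assumed suffix that marks a gradient variable or gradient slot.
const char kGradSuffix[] = "@GRAD";

// Maps each variable name to the name of its gradient variable.
inline std::vector<std::string> GradNames(
    const std::vector<std::string>& names) {
  std::vector<std::string> grads;
  grads.reserve(names.size());
  for (const auto& n : names) grads.push_back(n + kGradSuffix);
  return grads;
}

// grad_type_of plays the role of the forward-to-backward type lookup.
// Throws std::out_of_range if the forward type is not registered.
inline OpDesc BuildGradOpSketch(
    const OpDesc& fwd, const std::map<std::string, std::string>& grad_type_of) {
  OpDesc bwd;
  bwd.type = grad_type_of.at(fwd.type);  // step 1: look up the backward type
  for (const auto& in : fwd.inputs) {
    bwd.inputs[in.first] = in.second;  // step 2: copy forward inputs
    // step 3: gradients of forward inputs are backward outputs
    bwd.outputs[in.first + kGradSuffix] = GradNames(in.second);
  }
  for (const auto& out : fwd.outputs) {
    bwd.inputs[out.first] = out.second;  // step 2: copy forward outputs
    // step 3: gradients of forward outputs are backward inputs
    bwd.inputs[out.first + kGradSuffix] = GradNames(out.second);
  }
  return bwd;
}
```

For a `mul` operator with inputs `X`, `Y` and output `Out`, the sketch yields a `mul_grad` operator that reads `X`, `Y`, `Out` and `Out@GRAD` and writes `X@GRAD` and `Y@GRAD`.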
@ -49,31 +49,31 @@ A backward network is a series of backward operators. The main idea of building
 In our design, the network itself is also a kind of operator. So the operators contained by a big network may be some small network. 
-given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
+given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`, `InputGradients`.
 1. Op 
-   when the input forward network is a Op, return its gradient Operator Immediately.
+   when the input forward network is an Op, return its gradient Operator Immediately.
 2. NetOp 
-   when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to forward NetOp.
+   when the input forward network is a NetOp, it needs to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to the forward NetOp.
-   **shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwirte their shared input variable.  
+   **shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwrite their shared input variable.  
   <p align="center">
-   <img src="./images/duplicate_op.png" width="70%" ><br/>
+   <img src="./images/duplicate_op.png" width="50%" ><br/>
-   1. shared variable in two operators. 
+   1. Shared variable in operators. 
   </p>
-   Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator replace the overwirte links. 
+   Sharing a variable between operators, or using the same input variable in multiple operators, leads to a duplicate gradient variable. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwrite links. 
   <p align="center">
-   <img src="images/duplicate_op2.png" width="90%" ><br/>
+   <img src="images/duplicate_op2.png" width="50%" ><br/>
-   2. replace shared variable gradient with `Add` Operator
+   2. Replace shared variable's gradient with `Add` operator.
   </p>
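
The rename-and-add idea for shared variables can be sketched as below. This is a simplified illustration under assumptions: `GradOp` and `DeduplicateGradOutputs` are hypothetical names, the `@RENAME@` naming scheme is chosen here for illustration, and the `add` operators are simply appended at the end of the list instead of being placed right after the last writer.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, flattened view of a backward operator.
struct GradOp {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// If several operators write the same gradient variable, rename each write
// to a unique name and append an "add" operator that sums the renamed
// variables back into the original name.
inline std::vector<GradOp> DeduplicateGradOutputs(std::vector<GradOp> ops) {
  // For every output name, record (operator index, output index) of writers.
  std::map<std::string, std::vector<std::pair<std::size_t, std::size_t>>>
      writers;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (std::size_t j = 0; j < ops[i].outputs.size(); ++j) {
      writers[ops[i].outputs[j]].push_back(std::make_pair(i, j));
    }
  }

  std::vector<GradOp> add_ops;
  for (const auto& entry : writers) {
    if (entry.second.size() < 2) continue;  // single writer: nothing to do
    GradOp add;
    add.type = "add";
    add.outputs.push_back(entry.first);
    for (std::size_t k = 0; k < entry.second.size(); ++k) {
      const std::string renamed =
          entry.first + "@RENAME@" + std::to_string(k);
      ops[entry.second[k].first].outputs[entry.second[k].second] = renamed;
      add.inputs.push_back(renamed);
    }
    add_ops.push_back(add);
  }
  ops.insert(ops.end(), add_ops.begin(), add_ops.end());
  return ops;
}
```

Summing the renamed gradients matches the chain rule: a variable used by several operators receives the sum of the gradients flowing back from each use.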
--- a/paddle/framework/ddim.cc
+++ b/paddle/framework/ddim.cc
@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) {
 DDim::DDim(std::initializer_list<int64_t> init_list) {
  *this = make_ddim(init_list);
 }
 DDim flatten_to_2d(const DDim& src, int num_col_dims) {
  int rank = src.size();
  return make_ddim({product(slice_ddim(src, 0, num_col_dims)),
                    product(slice_ddim(src, num_col_dims, rank))});
 }
 DDim flatten_to_1d(const DDim& src) { return make_ddim({product(src)}); }
 }  // namespace framework
 }  // namespace paddle
--- a/paddle/framework/ddim.h
+++ b/paddle/framework/ddim.h
@ -115,6 +115,12 @@ int arity(const DDim& ddim);
 std::ostream& operator<<(std::ostream&, const DDim&);
 // Reshape a tensor to a matrix. The matrix's first dimension(column length)
 // will be the product of tensor's first `num_col_dims` dimensions.
 DDim flatten_to_2d(const DDim& src, int num_col_dims);
 DDim flatten_to_1d(const DDim& src);
 }  // namespace framework
 }  // namespace paddle
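
As a worked example of what `flatten_to_2d` computes, the sketch below reproduces the arithmetic on a plain vector of dimension sizes. It is an illustration only: `Dims`, `Product` and `FlattenTo2D` are hypothetical stand-ins for `DDim`, `product` and `flatten_to_2d`, and no bounds checking on `num_col_dims` is done.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Stand-in for DDim: a plain vector of dimension sizes.
using Dims = std::vector<int64_t>;

// Product of the dimension sizes in [first, last); 1 for an empty range.
inline int64_t Product(Dims::const_iterator first, Dims::const_iterator last) {
  return std::accumulate(first, last, int64_t{1},
                         std::multiplies<int64_t>());
}

// The first num_col_dims dimensions collapse into the matrix's first
// dimension; the remaining dimensions collapse into the second.
inline std::pair<int64_t, int64_t> FlattenTo2D(const Dims& src,
                                               int num_col_dims) {
  return std::make_pair(
      Product(src.begin(), src.begin() + num_col_dims),
      Product(src.begin() + num_col_dims, src.end()));
}
```

With dimensions `{2, 3, 4, 9}` and `num_col_dims = 2`, the result is a `6 x 36` matrix, which is the shape the `ReshapeToMatrix` test in this commit checks.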
--- a/paddle/framework/eigen.h
+++ b/paddle/framework/eigen.h
@ -63,20 +63,35 @@ struct EigenTensor {
 template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
-struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {};
+struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {
  static typename EigenMatrix::Type Reshape(Tensor& tensor, int num_col_dims) {
    int rank = tensor.dims_.size();
    PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
                   "`num_col_dims` must be between (0, rank_of_tensor).");
    return EigenMatrix::From(tensor,
                             flatten_to_2d(tensor.dims(), num_col_dims));
  }
  static typename EigenMatrix::ConstType Reshape(const Tensor& tensor,
                                                 int num_col_dims) {
    int rank = tensor.dims_.size();
    PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
                   "`num_col_dims` must be between (0, rank_of_tensor).");
    return EigenMatrix::From(tensor,
                             flatten_to_2d(tensor.dims(), num_col_dims));
  }
 };
 template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
 struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> {
  // Flatten reshapes a Tensor into an EigenVector.
  static typename EigenVector::Type Flatten(Tensor& tensor) {
-    return EigenVector::From(
-        tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
+    return EigenVector::From(tensor, {product(tensor.dims_)});
  }
  static typename EigenVector::ConstType Flatten(const Tensor& tensor) {
-    return EigenVector::From(
-        tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
+    return EigenVector::From(tensor, {product(tensor.dims_)});
  }
 };
--- a/paddle/framework/eigen_test.cc
+++ b/paddle/framework/eigen_test.cc
@ -108,5 +108,24 @@ TEST(Eigen, Matrix) {
  }
 }
 TEST(Eigen, MatrixReshape) {
  Tensor t;
  float* p = t.mutable_data<float>({2, 3, 6, 4}, platform::CPUPlace());
  for (int i = 0; i < 2 * 3 * 6 * 4; ++i) {
    p[i] = static_cast<float>(i);
  }
  EigenMatrix<float>::Type em = EigenMatrix<float>::Reshape(t, 2);
  ASSERT_EQ(2 * 3, em.dimension(0));
  ASSERT_EQ(6 * 4, em.dimension(1));
  for (int i = 0; i < 2 * 3; i++) {
    for (int j = 0; j < 6 * 4; j++) {
      ASSERT_NEAR(i * 6 * 4 + j, em(i, j), 1e-6f);
    }
  }
 }
 }  // namespace framework
 }  // namespace paddle
--- a/paddle/framework/images/duplicate_op2.graffle
+++ b/paddle/framework/images/duplicate_op2.graffle
--- a/paddle/framework/images/duplicate_op2.png
+++ b/paddle/framework/images/duplicate_op2.png
--- a/paddle/framework/lod_tensor.h
+++ b/paddle/framework/lod_tensor.h
@ -18,8 +18,10 @@
 #ifndef PADDLE_ONLY_CPU
 #include <thrust/device_vector.h>
 #include <thrust/host_vector.h>
 #include <thrust/system/cuda/experimental/pinned_allocator.h>
 #endif
 #include <glog/logging.h>
 #include "paddle/framework/ddim.h"
 #include "paddle/framework/tensor.h"
 #include "paddle/platform/enforce.h"
@ -32,7 +34,8 @@ template <typename T>
 using Vector = std::vector<T>;
 #else
 template <typename T>
-using Vector = thrust::host_vector<T>;
+using Vector = thrust::host_vector<
+    T, thrust::system::cuda::experimental::pinned_allocator<T>>;
 #endif
 using LoD = std::vector<Vector<size_t>>;
--- a/paddle/framework/lod_tensor_test.cu
+++ b/paddle/framework/lod_tensor_test.cu
@ -0,0 +1,52 @@
 /*
  Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 */
 #include <cuda.h>
 #include <cuda_runtime.h>
 #include "paddle/framework/lod_tensor.h"
 #include "paddle/platform/assert.h"
 #include <gtest/gtest.h>
 __global__ void test(size_t* a, int size) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size;
       i += blockDim.x * gridDim.x) {
    a[i] *= 2;
  }
 }
 TEST(LoDTensor, LoDInGPU) {
  paddle::framework::Tensor tensor;
  paddle::framework::LoDTensor lod_tensor;
  paddle::platform::GPUPlace place(0);
  paddle::framework::LoD src_lod;
  src_lod.push_back(std::vector<size_t>{0, 2, 4, 6, 8, 10, 12, 14});
  tensor.Resize({14, 16});
  tensor.mutable_data<float>(place);
  lod_tensor.set_lod(src_lod);
  lod_tensor.set_tensor(&tensor);
  CHECK_EQ(lod_tensor.lod_element(0, 2), 4);
  CHECK_EQ(lod_tensor.lod_element(0, 4), 8);
  auto lod = lod_tensor.lod();
  test<<<1, 8>>>(lod[0].data(), lod[0].size());
  cudaDeviceSynchronize();
  for (size_t i = 0; i < src_lod[0].size(); ++i) {
    CHECK_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
  }
 }
--- a/paddle/framework/operator.cc
+++ b/paddle/framework/operator.cc
@ -123,6 +123,15 @@ OperatorBase::OperatorBase(const std::string& type,
  CheckAllInputOutputSet();
 }
 std::vector<std::string> OperatorBase::InputVars() const {
  std::vector<std::string> ret_val;
   for (auto& o : inputs_) {
    ret_val.reserve(ret_val.size() + o.second.size());
    ret_val.insert(ret_val.end(), o.second.begin(), o.second.end());
  }
  return ret_val;
 }
 std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
  std::vector<std::string> ret_val;
  if (has_intermediate) {
--- a/paddle/framework/operator.h
+++ b/paddle/framework/operator.h
@ -94,11 +94,14 @@ class OperatorBase {
  const VariableNameMap& Inputs() const { return inputs_; }
  const VariableNameMap& Outputs() const { return outputs_; }
  //! Get a input with argument's name described in `op_proto`
  std::string Input(const std::string& name) const;
  //! Get a input which has multiple variables.
  const std::vector<std::string>& Inputs(const std::string& name) const;
  std::vector<std::string> InputVars() const;
  //! Get a output with argument's name described in `op_proto`
  std::string Output(const std::string& name) const;
  //! Get an output which has multiple variables.
@ -311,9 +314,9 @@ class InferShapeContext {
  }
  template <typename T>
-  std::vector<const T*> MultiOutput(const std::string& name) const {
+  std::vector<T*> MultiOutput(const std::string& name) const {
    auto names = op_.Outputs(name);
-    std::vector<const T*> res;
+    std::vector<T*> res;
    res.reserve(names.size());
    std::transform(names.begin(), names.end(), std::back_inserter(res),
                   [&](const std::string& sub_name) {
--- a/paddle/framework/tensor.h
+++ b/paddle/framework/tensor.h
@ -43,6 +43,9 @@ class Tensor {
  template <typename T, size_t D, int MajorType, typename IndexType>
  friend struct EigenTensor;
  template <typename T, int MajorType, typename IndexType>
  friend struct EigenMatrix;
  template <typename T, int MajorType, typename IndexType>
  friend struct EigenVector;
--- a/paddle/framework/tensor_impl.h
+++ b/paddle/framework/tensor_impl.h
@ -151,5 +151,13 @@ inline const DDim& Tensor::dims() const { return dims_; }
 inline int64_t Tensor::numel() const { return numel_; }
 template <typename T>
 inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
  Tensor res;
  res.ShareDataWith<T>(src);
  res.Resize(flatten_to_2d(src.dims(), num_col_dims));
  return res;
 }
 }  // namespace framework
 }  // namespace paddle
--- a/paddle/framework/tensor_test.cc
+++ b/paddle/framework/tensor_test.cc
@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) {
  }
 #endif
 }
 TEST(Tensor, ReshapeToMatrix) {
  using namespace paddle::framework;
  using namespace paddle::platform;
  Tensor src;
  int* src_ptr = src.mutable_data<int>({2, 3, 4, 9}, CPUPlace());
  for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
    src_ptr[i] = i;
  }
  Tensor res = ReshapeToMatrix<int>(src, 2);
  ASSERT_EQ(res.dims()[0], 2 * 3);
  ASSERT_EQ(res.dims()[1], 4 * 9);
 }
--- a/paddle/gserver/layers/BatchNormBaseLayer.cpp
+++ b/paddle/gserver/layers/BatchNormBaseLayer.cpp
@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() {
  const ImageConfig& conf = config_.inputs(0).image_conf();
  imageH_ = inputLayers_[0]->getOutput().getFrameHeight();
  imageW_ = inputLayers_[0]->getOutput().getFrameWidth();
  imageD_ = inputLayers_[0]->getOutput().getFrameDepth();
  if (0 == imageD_) imageD_ = conf.img_size_z();
  if (imageH_ == 0 && imageW_ == 0) {
    imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
    imageW_ = conf.img_size();
  } else {
    getOutput().setFrameHeight(imageH_);
    getOutput().setFrameWidth(imageW_);
    getOutput().setFrameDepth(imageD_);
  }
-  imgPixels_ = imageH_ * imageW_;
+  imgPixels_ = imageH_ * imageW_ * imageD_;
 }
 }  // namespace paddle