update to the develop branch.

Branch: revert-4814-Add_sequence_project_op
Author: dangqingqing · 8 years ago
Commit: cf2608e383
@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/
 RUN apt-get update && \
     apt-get install -y \
-    git python-pip python-dev openssh-server bison \
+    git python-pip python-dev openssh-server bison libnccl-dev \
     wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
     curl sed grep graphviz libjpeg-dev zlib1g-dev \
     python-matplotlib gcc-4.8 g++-4.8 \

@@ -189,7 +189,7 @@ OpDesc {
   inputs = {0} // the index of x in vars of BlockDesc above
   outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
   attrs {
-    "memories" : {1} // the index of h
+    "states" : {1} // the index of h
     "step_net" : <above step net>
   }
 };

@@ -3,17 +3,17 @@
 ## The Problem Posed
-Currently, for each C++ operator class definition, there registers a *gradient operator creator* function, which takes a C++ operator instance and returns the corresponding gradient operator instance.
-However, we noticed two problems with the current deisgn:
-1. As we decided to separate the *compilation* and *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message.
-1. Some operator's gradient computation requires more than one gradient operators. For example, the gradient of *minus* consists of two operators -- an identity operaotr and a scale operator. So we need to make the registration mechanism to support the mapping from an operator to a set of operators for gradient computation.
+Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.
+However, we noticed two problems with the current design:
+1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and insert corresponding `OpDesc` messages into the `ProgramDesc` message.
+1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of the *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
 ## The Current Implementation
-The C++ class `OpInfos` store in a association map which key is the operator type. The `grad_op_type` indicate associated gradient operator type. Operator can create gradient operator by `OpInfo::creator_` of gradient. The pseudo code is
+Instances of the C++ class `OpInfo` are stored in an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows
 ```cpp
 struct OpInfo {
@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) {
 ## Proposed Solution
-The mapping relationship between an operator and its gradient operators is a function. The interface of that function is:
+The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:
 ```cpp
 // (OpDesc) --> vector<OpDesc>
 std::function<std::vector<OpDescBind>(const OpDescBind&)>;
 ```
-The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for protobuf message `OpDesc` to manipulate `OpDesc` fast.
-The `GradOpDescMaker` will be registered in `OpInfo`, to replace `grad_op_type_` field. The `OpInfo` should be
+The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`.
+The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like
 ```cpp
 struct OpInfo {
@@ -49,7 +49,7 @@ struct OpInfo {
 };
 ```
-The `grad_op_maker_ ` is `nullptr` if the operator does not have associated gradient operators.
+The `grad_op_maker_ ` is a `nullptr` if the operator does not have any associated gradient operators.
 We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is
@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) {
 We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator.
-We should chagne register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`.
+We should change the register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just registers one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. This can be done by a macro that contains `__VA_ARGS__`.
 The user interface should be

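To make the proposed maker interface above concrete, here is a minimal sketch of a grad-op maker for the *minus* example from the design doc. It is an illustration only, not the actual Paddle source: the class name and the `SetType`/`SetInput`/`SetOutput`/`SetAttr` and `InputGrad`/`OutputGrad` helpers are assumed to behave roughly as the design describes.

```cpp
// Hypothetical maker: Out = X - Y, so dX = dOut (identity) and
// dY = -1 * dOut (scale). It returns two gradient-op descriptions,
// matching the (OpDesc) --> vector<OpDesc> interface above.
class MinusGradDescMaker : public GradOpDescMakerBase {
 public:
  using GradOpDescMakerBase::GradOpDescMakerBase;

  std::vector<OpDescBind> operator()() const {
    std::vector<OpDescBind> grad_ops(2);

    grad_ops[0].SetType("identity");               // dX = dOut
    grad_ops[0].SetInput("X", OutputGrad("Out"));
    grad_ops[0].SetOutput("Out", InputGrad("X"));

    grad_ops[1].SetType("scale");                  // dY = -1 * dOut
    grad_ops[1].SetInput("X", OutputGrad("Out"));
    grad_ops[1].SetOutput("Out", InputGrad("Y"));
    grad_ops[1].SetAttr("scale", -1.0f);
    return grad_ops;
  }
};
```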
@@ -174,7 +174,7 @@ decoder_inputs = paddle.layer.fc(
 1. Both clip gradients, but at different points: the former is applied when the :code:`optimzier` updates the network parameters, while the latter is invoked during the backward computation of the activation function;
 2. They clip different targets: the former clips the gradients of the learnable parameters, while the latter clips the gradients propagated back to the previous layer;
 Besides these, such problems can also be alleviated by reducing the learning rate or normalizing the input data.
 5. How to call the infer interface to output the prediction results of multiple layers
 -----------------------------------------------

@@ -28,23 +28,37 @@ add_style_check_target(paddle_capi ${CAPI_SOURCES} ${CAPI_HEADER}
 add_dependencies(paddle_capi paddle_proto)
-# combine all paddle static libraries together, into libpaddle_capi_whole.a
-# user should use PaddleCAPI as -lpaddle_capi_whole
+# TODO: paddle_capi_whole will be removed.
+if(MOBILE_INFERENCE)
 set(PADDLE_CAPI_INFER_LIBS
     paddle_utils
     paddle_parameter
     paddle_math
     paddle_cuda
     paddle_function
     paddle_gserver
     paddle_proto)
+else()
+set(PADDLE_CAPI_INFER_LIBS
+    paddle_utils
+    paddle_parameter
+    paddle_math
+    paddle_cuda
+    paddle_function
+    paddle_gserver
+    paddle_proto
+    paddle_pserver
+    paddle_network)
+endif()
 cc_library(paddle_capi_whole DEPS paddle_capi ${PADDLE_CAPI_INFER_LIBS})
-# No shared library for iOS
+# Link the static library for inference
+cc_library(paddle_capi_engine DEPS paddle_capi paddle_utils paddle_parameter paddle_math paddle_cuda paddle_proto)
+cc_library(paddle_capi_layers DEPS paddle_function paddle_gserver)
+
+# Link the shared library for inference
 if(NOT IOS)
-  set(LINK_FLAGS " -Wl,--retain-symbols-file ${CMAKE_CURRENT_SOURCE_DIR}/export.sym -Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/export.map")
+  set(LINK_FLAGS "-Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/paddle_capi.map")
+  # TODO: merge mkl into paddle_capi_shared
   add_library(paddle_capi_shared SHARED ${CAPI_SOURCES})
   set_target_properties(paddle_capi_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}")
   target_include_directories(paddle_capi_shared PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
@@ -53,9 +67,10 @@ endif()
 # install library & headers.
 install(FILES ${CAPI_HEADERS} DESTINATION include/paddle)
+install(FILES paddle_capi.map DESTINATION include/paddle)
 install(FILES ${CMAKE_CURRENT_BINARY_DIR}/config.h DESTINATION include/paddle)
 if(ANDROID)
-  install(TARGETS paddle_capi_whole paddle_capi_shared
+  install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers paddle_capi_shared
           ARCHIVE DESTINATION lib/${ANDROID_ABI}
           LIBRARY DESTINATION lib/${ANDROID_ABI})
   execute_process(
@@ -80,7 +95,7 @@ if(ANDROID)
   )"
   )
 else(ANDROID)
-  install(TARGETS paddle_capi_whole ARCHIVE DESTINATION lib)
+  install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers ARCHIVE DESTINATION lib)
   if(NOT IOS)
     install(TARGETS paddle_capi_shared DESTINATION lib)
   endif()

@@ -19,16 +19,15 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope)
 proto_library(framework_proto SRCS framework.proto)
 cc_library(attribute SRCS attribute.cc DEPS framework_proto)
-cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info)
-cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc
-    device_context)
+cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc)
 cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute)
 cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
 cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
-cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope proto_desc glog)
+cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog)
 cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
-cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog)
+cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info operator)
+cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
 cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
 py_proto_compile(framework_py_proto SRCS framework.proto)
@@ -44,7 +43,7 @@ add_custom_command(TARGET framework_py_proto POST_BUILD
 cc_library(backward SRCS backward.cc DEPS net_op)
 cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context)
-cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward)
+cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward glog)
 cc_library(prune SRCS prune.cc DEPS framework_proto)
 cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)

@@ -21,6 +21,7 @@
 #include "paddle/framework/block_desc.h"
 #include "paddle/framework/op_registry.h"
+#include "paddle/operators/dynamic_recurrent_op.h"
 #include "paddle/operators/net_op.h"
 #include "paddle/operators/recurrent_op.h"
@@ -220,8 +221,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
     // process recurrent gradient op as a special operator.
     if (forwardOp.Type() == "recurrent") {
       // NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
-      // or
-      // this will result in infinite loop.
+      // or this will result in infinite loop.
       const auto& rnnop =
           *static_cast<const operators::RecurrentOp*>(&forwardOp);
       auto rnn_grad_op =
@@ -231,6 +231,18 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
       // create stepnet's gradient op
       rnn_grad_op->set_stepnet(
          BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
+    } else if (forwardOp.Type() == "dynamic_recurrent") {
+      // NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
+      // or this will result in infinite loop.
+      const auto& rnnop =
+          *static_cast<const operators::DynamicRecurrentOp*>(&forwardOp);
+      auto rnn_grad_op =
+          static_cast<operators::DynamicRecurrentGradientOp*>(grad_op.get());
+      const auto& stepnet_op =
+          *static_cast<const OperatorBase*>(&rnnop.rnn.GetStepUnit());
+      // create stepnet's gradient op
+      rnn_grad_op->rnn.SetStepUnit(
+          BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
     }
 
     if (net->ops_.empty()) {  // Current no aux op is added to network

@@ -41,6 +41,19 @@ bool BlockDescBind::HasVar(const std::string &name) const {
   return vars_.find(name) != vars_.end();
 }
 
+VarDescBind *BlockDescBind::FindVarRecursive(const std::string &name) const {
+  auto it = vars_.find(name);
+  if (it == vars_.end()) {
+    return Parent() == kNoneBlockIndex ? nullptr
+                                       : ParentBlock()->FindVarRecursive(name);
+  }
+  return it->second.get();
+}
+
+bool BlockDescBind::HasVarRecursive(const std::string &name) const {
+  return FindVarRecursive(name) != nullptr;
+}
+
 std::vector<VarDescBind *> BlockDescBind::AllVars() const {
   std::vector<VarDescBind *> res;
   for (const auto &p : vars_) {
@@ -97,7 +110,7 @@ void BlockDescBind::Flush() {
 }
 
 BlockDescBind *BlockDescBind::ParentBlock() const {
-  if (this->desc_->parent_idx() == -1) {
+  if (this->desc_->parent_idx() == kNoneBlockIndex) {
     return nullptr;
   }
   return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));

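The recursive lookup added above makes variables declared in ancestor blocks visible from nested blocks. A small usage sketch (hypothetical setup; the `Block`/`AppendBlock`/`Var` calls are assumed from context, not quoted from this commit):

```cpp
// A variable declared in the root block is found from a sub-block via the
// new recursive lookup, while the plain HasVar stays strictly local.
ProgramDescBind program;
BlockDescBind* root = program.Block(0);      // kRootBlockIndex
root->Var("x");                              // declare "x" in the root block

BlockDescBind* sub = program.AppendBlock(*root);
assert(!sub->HasVar("x"));                   // local lookup: not found
assert(sub->HasVarRecursive("x"));           // found in the parent block
```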
@@ -21,6 +21,7 @@ limitations under the License. */
 #include <vector>
 #include "paddle/framework/op_desc.h"
+#include "paddle/framework/proto_desc.h"
 #include "paddle/framework/var_desc.h"
 #include "paddle/platform/macros.h"
@@ -56,6 +57,10 @@ class BlockDescBind {
   bool HasVar(const std::string &var_name) const;
 
+  VarDescBind *FindVarRecursive(const std::string &name_bytes) const;
+
+  bool HasVarRecursive(const std::string &var_name) const;
+
   std::set<std::string> LocalVarNames() const {
     std::set<std::string> var_names;
     for (auto &var : vars_) {

@@ -26,6 +26,8 @@ inline DataType ToDataType(std::type_index type) {
     return DataType::FP64;
   } else if (typeid(int).hash_code() == type.hash_code()) {
     return DataType::INT32;
+  } else if (typeid(int64_t).hash_code() == type.hash_code()) {
+    return DataType::INT64;
   } else {
     PADDLE_THROW("Not supported");
   }

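With the new branch, 64-bit integer types resolve instead of hitting `PADDLE_THROW`. A trivial sketch of the effect:

```cpp
// int64_t now maps to DataType::INT64 (previously "Not supported").
paddle::framework::DataType dt =
    paddle::framework::ToDataType(typeid(int64_t));
// dt == paddle::framework::DataType::INT64
```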
@@ -68,9 +68,13 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
   for (auto& var : block.vars()) {
     if (var.persistable()) {
-      scope->Var(var.name());
+      auto* ptr = scope->Var(var.name());
+      VLOG(3) << "Create Variable " << var.name()
+              << " global, which pointer is " << ptr;
     } else {
-      local_scope.Var(var.name());
+      auto* ptr = local_scope.Var(var.name());
+      VLOG(3) << "Create Variable " << var.name()
+              << " locally, which pointer is " << ptr;
     }
   }
@@ -80,8 +84,7 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
     op->Run(local_scope, *device);
   }
 
-  // TODO(tonyyang-svail):
-  //  - Destroy local_scope
+  scope->DeleteScope(&local_scope);
 }
 
 }  // namespace framework

@@ -13,37 +13,45 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 
 #pragma once
+#include "glog/logging.h"
+#include "paddle/framework/feed_fetch_type.h"
 #include "paddle/framework/scope.h"
 #include "paddle/framework/variable.h"
 
 namespace paddle {
 namespace framework {
 
-template <typename T>
-void SetFeedVariable(const LoDTensor& input, const std::string& var_name,
-                     size_t index) {
+void SetFeedVariable(Scope* scope, const LoDTensor& input,
+                     const std::string& var_name, size_t index) {
   // If var_name Variable is not found in GlobalScope, a new variable will
   // be created.
-  Variable* g_feed_value = GetGlobalScope().Var(var_name);
+  VLOG(3) << "SetFeedVariable name=" << var_name << " index=" << index;
+  Variable* g_feed_value = scope->Var(var_name);
   auto& feed_inputs =
       *(g_feed_value->GetMutable<std::vector<paddle::framework::LoDTensor>>());
   if (index >= feed_inputs.size()) {
     feed_inputs.resize(index + 1);
   }
   // shared data with input tensor
-  feed_inputs[index].ShareDataWith<T>(input);
+  feed_inputs[index].ShareDataWith(input);
   // set lod
   feed_inputs[index].set_lod(input.lod());
 }
 
-LoDTensor& GetFetchVariable(const std::string& var_name, size_t index) {
+LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
+                            size_t index) {
   // Since we want to fetch LodTensor from a variable, the variable must
   // be created alreadly.
-  Variable* g_fetch_value = GetGlobalScope().FindVar(var_name);
-  auto& fetch_outputs =
-      *(g_fetch_value->GetMutable<std::vector<paddle::framework::LoDTensor>>());
+  Variable* g_fetch_value = scope.FindVar(var_name);
+  PADDLE_ENFORCE(g_fetch_value->IsType<FeedFetchList>(),
+                 "Only %s can be invoked by GetFetchVariable",
+                 typeid(FeedFetchList).name());
+  auto& fetch_outputs = *g_fetch_value->GetMutable<FeedFetchList>();
+  auto& tensor = fetch_outputs[index];
+  VLOG(3) << "Fetch " << var_name << " with index " << index
+          << " shape= " << tensor.dims();
   PADDLE_ENFORCE_LT(index, fetch_outputs.size());
-  return fetch_outputs[index];
+  return tensor;
 }
 
 }  // namespace framework

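Both helpers now take an explicit scope instead of the removed `GetGlobalScope()`. A hedged usage sketch (the executor invocation is elided; variable names "feed" and "fetch" are illustrative, matching the new FEED_MINIBATCH / FETCH_LIST variable kinds added in framework.proto below):

```cpp
// Feed into and fetch from a caller-owned scope.
paddle::framework::Scope scope;
paddle::framework::LoDTensor input;
// ... fill `input` with data and set its LoD ...
paddle::framework::SetFeedVariable(&scope, input, "feed", /*index=*/0);
// ... run the program with an Executor against `scope` ...
auto& output = paddle::framework::GetFetchVariable(scope, "fetch", /*index=*/0);
```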
@@ -68,6 +68,7 @@ message OpProto {
     optional bool duplicable = 3 [ default = false ];
     optional bool intermediate = 4 [ default = false ];
+    optional bool dispensable = 5 [ default = false ];
   }
 
   // AttrProto describes the C++ type Attribute.
@@ -112,6 +113,8 @@ message VarDesc {
   enum VarType {
     LOD_TENSOR = 1;
     SELECTED_ROWS = 2;
+    FEED_MINIBATCH = 3;
+    FETCH_LIST = 4;
   }
   required string name = 1;
   required VarType type = 2;

@@ -25,31 +25,50 @@ LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end) {
   for (size_t i = level_begin; i < level_end; i++) {
     new_lod.emplace_back(in.at(i));
   }
+  // transform the lowest level to absolute offset.
+  LoD abs_offset_lod = ToAbsOffset(in);
+  new_lod.back() = abs_offset_lod[level_end - 1];
   return new_lod;
 }
 
 LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
                  size_t elem_end) {
-  // slice the lod.
-  LoD new_lod;
-  new_lod.reserve(in.size() - level);
-  auto start = in.at(level)[elem_begin];
-  auto end = in.at(level)[elem_end];
-  for (auto it = in.begin() + level; it != in.end(); it++) {
-    auto it_begin = std::find(it->begin(), it->end(), start);
-    auto it_end = std::find(it_begin, it->end(), end);
-    PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info");
-    PADDLE_ENFORCE(it_end != it->end(), "error in parsing lod info");
-    new_lod.emplace_back(it_begin, it_end + 1);
-    // reset offset if tensor is copyed and sliced.
-    std::transform(new_lod.back().begin(), new_lod.back().end(),
-                   new_lod.back().begin(),
-                   [start](int v) { return v - start; });
-    PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LoD");
+  PADDLE_ENFORCE_LT(level, in.size());
+  PADDLE_ENFORCE_LT(elem_end, in[level].size());
+
+  LoD res;
+  res.resize(in.size() - level);
+  // copy the first level
+  res[0].assign(in[level].begin() + elem_begin,
+                in[level].begin() + elem_end + 1);
+  for (size_t lvl = 1; lvl < res.size(); lvl++) {
+    const auto& in_level = in[level + lvl];
+    const auto& above_level = res[lvl - 1];
+    auto& out_level = res[lvl];
+    out_level.assign(in_level.begin() + above_level.front(),
+                     in_level.begin() + above_level.back() + 1);
   }
-  PADDLE_ENFORCE_LE(new_lod.size(), in.size());
-  return new_lod;
+  for (size_t lvl = 0; lvl < res.size(); lvl++) {
+    // to make the first offset equals 0, all the elements minus the first
+    // element
+    size_t front = res[lvl].front();
+    for (auto& ele : res[lvl]) {
+      ele -= front;
+    }
+  }
+  return res;
+}
+
+LoD ToAbsOffset(const LoD& in) {
+  // the lowest level stores relative offsets
+  if (in.empty() || in.size() == 1) return in;
+  LoD result = in;
+  for (int level = result.size() - 2; level >= 0; level--) {
+    for (auto& ele : result[level]) {
+      ele = result[level + 1][ele];
+    }
+  }
+  return result;
 }
bool operator==(const LoD& a, const LoD& b) { bool operator==(const LoD& a, const LoD& b) {
@@ -75,17 +94,7 @@ bool operator==(const LoD& a, const LoD& b) {
 size_t LoDTensor::NumElements(size_t level, size_t idx) const {
   PADDLE_ENFORCE_LT(level, NumLevels());
   PADDLE_ENFORCE_LT(idx, NumElements(level));
-  // the last level of LoD, just return number of records in Tensor
-  if (level == NumLevels() - 1) {
-    return lod_[level][idx + 1] - lod_[level][idx];
-  }
-  // high level of LoD, and there is another lower level, return number of
-  // lower-level elements
-  auto tmp = SliceInLevel(lod_, level, idx, idx + 1);
-  PADDLE_ENFORCE_GE(tmp.size(), 2);
-  // there is a 0 as a placeholder stored in LoD, so the number of elements
-  // equals lod.size() - 1
-  return tmp[1].size() - 1;
+  return lod_[level][idx + 1] - lod_[level][idx];
 }
 
 void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {

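Under the relative-offset convention, every level directly counts its own sub-sequences, so `NumElements` reduces to one subtraction. A worked check against the test LoD used later in this commit (a sketch, not test code):

```cpp
// With lod = {{0, 2, 3}, {0, 2, 5, 8}, {0, 2, 5, 7, 10, 12, 15, 17, 20}}:
LoD lod = {{0, 2, 3}, {0, 2, 5, 8}, {0, 2, 5, 7, 10, 12, 15, 17, 20}};
size_t level = 1, idx = 1;
size_t n = lod[level][idx + 1] - lod[level][idx];  // 5 - 2 == 3
// matches the updated ASSERT_EQ(lod_tensor_.NumElements(1, 1), 3UL) below.
```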
@@ -39,23 +39,36 @@ using Vector = thrust::host_vector<
 #endif
 
 /*
- * 3-level LoD stores
+ * LoD is short for Level of Details.
  *
- * 0 10 20
- * 0 5 10 15 20
- * 0 2 5 7 10 12 15 20
- *
- * - in a level, each element indicates offset in the underlying Tensor
+ * - in a level, each element indicates relative offset of the lower level
  * - the first element should be 0 and that indicates that this sequence start
  * from 0
 * - each sequence's begin and end(no-inclusive) is level[id, id+1]
+ *
+ * For example:
+ *    3-level LoD stores
+ *
+ *    0 2 3
+ *    0 2 4 7
+ *    0 2 5 7 10 12 15 20
 */
 using LoD = std::vector<Vector<size_t>>;
 
+/*
+ * Slice levels from a LoD.
+ * NOTE the lowest level should always be the absolute offsets of the underlying
+ * tensor instances. So if higher layers are sliced without the lowest level,
+ * the lower level of the sliced LoD will be transformed to the absolute offset.
+ */
 LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
 
 LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
                  size_t elem_end);
+/*
+ * Transform an LoD from relative offsets to absolute offsets.
+ */
+LoD ToAbsOffset(const LoD& in);
 
 bool operator==(const LoD& a, const LoD& b);

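The comment block above defines every level as relative offsets into the level below, with only the lowest level absolute. Tracing `ToAbsOffset` (implemented in lod_tensor.cc above) through the documented example gives (a worked sketch):

```cpp
// Each level is mapped through the already-converted level beneath it,
// from the second-lowest level upward.
LoD lod = {{0, 2, 3}, {0, 2, 4, 7}, {0, 2, 5, 7, 10, 12, 15, 20}};
LoD abs = ToAbsOffset(lod);
// Level 1 via level 2: {lod[2][0], lod[2][2], lod[2][4], lod[2][7]}
//   -> abs[1] == {0, 5, 10, 20}
// Level 0 via the converted level 1: {abs[1][0], abs[1][2], abs[1][3]}
//   -> abs[0] == {0, 10, 20}
// The lowest level is already absolute: {0, 2, 5, 7, 10, 12, 15, 20}.
```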
@@ -30,8 +30,8 @@ class LoDTensorTester : public ::testing::Test {
     // 0 5 10 15 20
     // 0 2 5 7 10 12 15 20
     LoD lod;
-    lod.push_back(std::vector<size_t>{0, 10, 20});
-    lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});
+    lod.push_back(std::vector<size_t>{0, 2, 3});
+    lod.push_back(std::vector<size_t>{0, 2, 5, 8});
     lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
 
     ASSERT_EQ(lod.size(), 3UL);
@@ -52,14 +52,14 @@ TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); }
 
 TEST_F(LoDTensorTester, NumElements) {
   ASSERT_EQ(lod_tensor_.NumElements(0), 2UL);
-  ASSERT_EQ(lod_tensor_.NumElements(1), 4UL);
+  ASSERT_EQ(lod_tensor_.NumElements(1), 3UL);
   ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
 }
 
 TEST_F(LoDTensorTester, NumElements2) {
   ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
-  ASSERT_EQ(lod_tensor_.NumElements(0, 1), 2UL);
-  ASSERT_EQ(lod_tensor_.NumElements(1, 1), 2UL);
+  ASSERT_EQ(lod_tensor_.NumElements(0, 1), 1UL);
+  ASSERT_EQ(lod_tensor_.NumElements(1, 1), 3UL);
 }
TEST_F(LoDTensorTester, ShrinkLevels) { TEST_F(LoDTensorTester, ShrinkLevels) {
@@ -68,17 +68,16 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
     LoDTensor new_lod_tensor = lod_tensor_;
     new_lod_tensor.ShrinkLevels(level, level + 1);
     ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
-    ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
     ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
   }
   // shrink 2 level
   for (size_t level = 0; level < 2UL; ++level) {
     LoDTensor new_lod_tensor = lod_tensor_;
     new_lod_tensor.ShrinkLevels(level, level + 2);
+    // the lowest level's last element should be the tensor's batch_size.
+    ASSERT_EQ(new_lod_tensor.lod().back().back(),
+              lod_tensor_.lod().back().back());
     ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
-    ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
-    ASSERT_EQ(new_lod_tensor.NumElements(1),
-              lod_tensor_.NumElements(level + 1));
     ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
   }
 }
@@ -86,19 +85,19 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
 TEST_F(LoDTensorTester, ShrinkInLevel) {
   size_t level = 0;
   LoDTensor new_lod_tensor = lod_tensor_;
-  new_lod_tensor.ShrinkInLevel(level, 0, 2);
+  new_lod_tensor.ShrinkInLevel(level, 0, 1);
   EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL);
-  EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL);
-  EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL);
-  EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL);
+  EXPECT_EQ(new_lod_tensor.NumElements(0), 1UL);
+  EXPECT_EQ(new_lod_tensor.NumElements(1), 2UL);
+  EXPECT_EQ(new_lod_tensor.NumElements(2), 5UL);
   ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
 
   level = 1;
   new_lod_tensor = lod_tensor_;
-  new_lod_tensor.ShrinkInLevel(level, 0, 2);
+  new_lod_tensor.ShrinkInLevel(level, 1, 2);
   ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
-  ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL);
-  ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL);
+  ASSERT_EQ(new_lod_tensor.NumElements(0), 1UL);
+  ASSERT_EQ(new_lod_tensor.NumElements(1), 3UL);
   ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
 }

@@ -44,6 +44,11 @@ class OpProtoAndCheckerMaker {
       var_->set_intermediate(true);
       return *this;
     }
+
+    VariableBuilder& AsDispensable() {
+      var_->set_dispensable(true);
+      return *this;
+    }
   };
 
   VariableBuilder AddInput(const std::string& name, const std::string& comment);

@@ -252,5 +252,20 @@ std::ostream& operator<<(std::ostream& os,
   return os;
 }
 
+bool OpSupportGPU(const std::string& op_type) {
+  auto& all_kernels = OperatorWithKernel::AllOpKernels();
+  auto it = all_kernels.find(op_type);
+  if (it == all_kernels.end()) {
+    // All control operator must support GPU
+    return true;
+  }
+  for (auto& kern_pair : it->second) {
+    if (platform::is_gpu_place(kern_pair.first.place_)) {
+      return true;
+    }
+  }
+  return false;
+}
+
 }  // namespace framework
 }  // namespace paddle

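`OpSupportGPU` queries the kernel registry, so a caller can pick an execution place before building a kernel context. One possible use, a sketch only (the executor-side integration is not part of this diff):

```cpp
// Fall back to CPU when no GPU kernel is registered for this op type.
platform::Place place = platform::GPUPlace(0);
if (!framework::OpSupportGPU(op->Type())) {
  place = platform::CPUPlace();
}
```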
@@ -327,37 +327,47 @@ class CompileTimeInferShapeContext : public InferShapeContext {
   bool HasInput(const std::string& name) const override {
     const std::vector<std::string>& input_names = op_.Input(name);
     auto length = input_names.size();
+    if (length == 0) {
+      return false;
+    }
     PADDLE_ENFORCE_EQ(length, 1UL,
                       "Input(%s) should have only one value, "
                       "but it have %d now",
                       name, length);
-    return block_.HasVar(input_names[0]);
+    return block_.HasVarRecursive(input_names[0]);
   }
 
   bool HasOutput(const std::string& name) const override {
     const std::vector<std::string>& output_names = op_.Output(name);
     auto length = output_names.size();
+    if (length == 0) {
+      return false;
+    }
     PADDLE_ENFORCE_EQ(length, 1UL,
                       "Output(%s) should have only one value, "
                       "but it have %d now",
                       name, length);
-    return block_.HasVar(output_names[0]);
+    return block_.HasVarRecursive(output_names[0]);
   }
 
   bool HasInputs(const std::string& name) const override {
     const std::vector<std::string>& input_names = op_.Input(name);
-    PADDLE_ENFORCE(!input_names.empty(), "Inputs(%s) length is 0", name);
+    if (input_names.empty()) {
+      return false;
+    }
     for (auto& input : input_names) {
-      if (!block_.HasVar(input)) return false;
+      if (!block_.HasVarRecursive(input)) return false;
    }
     return true;
   }
 
   bool HasOutputs(const std::string& name) const override {
     const std::vector<std::string>& output_names = op_.Output(name);
-    PADDLE_ENFORCE(!output_names.empty(), "Inputs(%s) length is 0", name);
+    if (output_names.empty()) {
+      return false;
+    }
     for (auto& output : output_names) {
-      if (!block_.HasVar(output)) return false;
+      if (!block_.HasVarRecursive(output)) return false;
     }
     return true;
   }
@@ -404,11 +414,11 @@ class CompileTimeInferShapeContext : public InferShapeContext {
  private:
   DDim GetDim(const std::string& name) const override {
-    return framework::make_ddim(block_.FindVar(name)->Shape());
+    return framework::make_ddim(block_.FindVarRecursive(name)->Shape());
   }
 
   void SetDim(const std::string& name, const DDim& dim) override {
-    block_.FindVar(name)->SetShape(framework::vectorize(dim));
+    block_.FindVarRecursive(name)->SetShape(framework::vectorize(dim));
   }
 
   const OpDescBind& op_;
@@ -421,13 +431,27 @@ class RuntimeInferShapeContext : public InferShapeContext {
       : op_(op), scope_(scope) {}
 
   bool HasInput(const std::string& name) const override {
-    auto ipt = op_.Input(name);
+    auto& ins = Inputs(name);
+    size_t length = ins.size();
+    if (length == 0) {
+      return false;
+    }
+    PADDLE_ENFORCE_EQ(length, 1UL, "Input %s should have more than one inputs",
+                      name);
+    auto ipt = ins[0];
     auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
     return var != nullptr;
   }
 
   bool HasOutput(const std::string& name) const override {
-    auto ipt = op_.Output(name);
+    auto& outs = Outputs(name);
+    size_t length = outs.size();
+    if (length == 0) {
+      return false;
+    }
+    PADDLE_ENFORCE_EQ(length, 1UL, "Output %s should have more than one inputs",
+                      name);
+    auto ipt = outs[0];
     auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
     return var != nullptr;
   }
@@ -649,5 +673,7 @@ class OperatorWithKernel : public OperatorBase {
 std::ostream& operator<<(std::ostream& os,
                          const OperatorWithKernel::OpKernelKey& kernel_key);
 
+extern bool OpSupportGPU(const std::string& op_type);
+
 }  // namespace framework
 }  // namespace paddle

@@ -35,8 +35,8 @@ ProgramDesc *ProgramDescBind::Proto() {
 ProgramDescBind::ProgramDescBind() {
   auto *block = prog_.mutable_blocks()->Add();
-  block->set_idx(0);
-  block->set_parent_idx(-1);
+  block->set_idx(kRootBlockIndex);
+  block->set_parent_idx(kNoneBlockIndex);
   blocks_.emplace_back(new BlockDescBind(this, block));
 }

@@ -17,6 +17,7 @@ limitations under the License. */
 #include <memory>
 #include <vector>
 #include "paddle/framework/framework.pb.h"
+#include "paddle/framework/proto_desc.h"
 #include "paddle/platform/macros.h"
 
 namespace paddle {

@@ -80,4 +80,4 @@ TEST(ProgramDesc, copy_ctor) {
   // different and it is correct.
 }
 }  // namespace framework
 }  // namespace paddle

@@ -0,0 +1,26 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#pragma once
+
+namespace paddle {
+namespace framework {
+
+// The Index of first Block in Program. also called root block.
+constexpr int kRootBlockIndex = 0;
+// The Parent Index of root Block, this block does not exist.
+constexpr int kNoneBlockIndex = -1;
+
+}  // namespace framework
+}  // namespace paddle

Some files were not shown because too many files have changed in this diff.
