Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into quantize_transpiler_update

7 years ago · 8a850e21ad
parent 182b24ce3c ae7fb2a191
commit 8a850e21ad
98 changed files with 2323 additions and 986 deletions
--- a/README.md
+++ b/README.md
@ -19,7 +19,7 @@ Our vision is to enable deep learning for everyone via PaddlePaddle.
 Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
-### Latest PaddlePaddle Release: [Fluid 0.14.0](https://github.com/PaddlePaddle/Paddle/tree/v0.14.0)
+### Latest PaddlePaddle Release: [Fluid 0.15.0](https://github.com/PaddlePaddle/Paddle/tree/v0.15.0)
 ### Install Latest Stable Release:
 ```
 # Linux CPU
@ -76,26 +76,26 @@ pip install paddlepaddle-gpu==0.14.0.post85
 ## Installation
-It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/beginners_guide/install/install_doc.html) on our website.
+It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/0.15.0/new_docs/beginners_guide/install/install_doc.html) on our website.
 ## Documentation
-We provide [English](http://paddlepaddle.org/documentation/docs/en/0.14.0/getstarted/index_en.html) and
+We provide [English](http://paddlepaddle.org/documentation/docs/en/0.15.0/getstarted/index_en.html) and
-[Chinese](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/beginners_guide/index.html) documentation.
+[Chinese](http://paddlepaddle.org/documentation/docs/zh/0.15.0/new_docs/beginners_guide/index.html) documentation.
 - [Deep Learning 101](https://github.com/PaddlePaddle/book)
  You might want to start from this online interactive book that can run in a Jupyter Notebook.
-- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/user_guides/howto/training/cluster_howto.html)
+- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/0.15.0/new_docs/user_guides/howto/training/cluster_howto.html)
  You can run distributed training jobs on MPI clusters.
-- [Python API](http://paddlepaddle.org/documentation/api/zh/0.14.0/fluid.html)
+- [Python API](http://paddlepaddle.org/documentation/api/zh/0.15.0/fluid.html)
   Our new API enables much shorter programs.
-- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/advanced_usage/development/contribute_to_paddle.html)
+- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/0.15.0/new_docs/advanced_usage/development/contribute_to_paddle.html)
   We appreciate your contributions!
--- a/benchmark/fluid/fluid_benchmark.py
+++ b/benchmark/fluid/fluid_benchmark.py
@ -91,7 +91,8 @@ def dist_transpile(trainer_id, args, train_prog, startup_prog):
        program=train_prog,
        pservers=pserver_endpoints,
        trainers=trainers,
-        sync_mode=not args.async_mode)
+        sync_mode=not args.async_mode,
+        startup_program=startup_prog)
    if training_role == "PSERVER":
        pserver_program = t.get_pserver_program(current_endpoint)
        pserver_startup_program = t.get_startup_program(
--- a/benchmark/fluid/models/resnet.py
+++ b/benchmark/fluid/models/resnet.py
--- a/doc/fluid/dev/releasing_process_cn.md
+++ b/doc/fluid/dev/releasing_process_cn.md
@ -1,24 +1,23 @@
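 # PaddlePaddle Release Specification
-PaddlePaddle uses the git-flow branching model for branch management and the [Semantic Versioning](http://semver.org/) standard for PaddlePaddle version numbers.
+PaddlePaddle uses Trunk Based Development and the [Semantic Versioning](http://semver.org/) standard for PaddlePaddle version numbers.
 Each new PaddlePaddle release follows the process below:
 1. Create a new branch from the `develop` branch, named `release/[version]`, e.g. `release/0.10.0`.
-1. Tag the new branch with a tag of the form `[version]rc.[patch]`. The first tag is `0.10.0rc1`, the second is `0.10.0rc2`, and so on.
+2. Tag the new branch with a tag of the form `[version]rc-[patch]`. For example, the first tag is `0.10.0-rc0`.
-1. For commits to this version, do the following:
+3. The new branch generally does not accept new features or optimizations. QA tests on the release branch. Developers keep working on the latest develop.
-  * Use the Regression Test List as a checklist to verify the correctness of this release.
+4. Bugs found by QA and developers are fixed and verified on develop, then cherry-picked to the release branch, until the release branch is reasonably stable.
-	  * If it fails, record all failing cases, fix all bugs on this `release/[version]` branch, increment the patch number by one, and go back to step 2.
+5. If needed, put a new tag on the latest code of the release branch, e.g. `0.10.0-rc1`, so that more users can join the testing. Repeat steps 3-4.
-	* Update the version information in `python/setup.py.in` and set the `istaged` field to `True`.
+6. Once the release branch is stable, create the official release tag, e.g. `0.10.0`.
-	* Publish the python wheel package for this version to pypi.
+7. Publish the python wheel package for this version to pypi.
-	* Update the Docker image (see the operation details below).
+8. Update the Docker image (see the operation details below).
 1. After step 3 is complete, merge the `release/[version]` branch into the master branch and tag the merge commit on master with `[version]`. Then merge the `master` branch into the `develop` branch.
 1. Collaborate on writing the Release Note.
 Notes:
-* Once the `release/[version]` branch is created, merging from `develop` into `release/[version]` is generally not allowed. This keeps the feature set of the `release/[version]` branch closed, which makes it easier for testers to test PaddlePaddle's behavior.
+* Bug fixes must be made on develop first and then brought into the release branch, rather than being developed directly on the release branch.
-* While the `release/[version]` branch exists, any bugfix branch must be merged into all three branches: `master`, `develop` and `release/[version]`.
+
 * In principle, the release branch accepts only fixes, not new features.
 ## Publishing the wheel package to pypi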
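@ -61,24 +60,21 @@ docker push [image]:[version]
 ## PaddlePaddle Branching Specification
-PaddlePaddle development uses the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, with some adaptations for GitHub's features.
+PaddlePaddle development uses the [Trunk Based Development](https://trunkbaseddevelopment.com/) model.
 * The main PaddlePaddle repository follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, in which:
 	* The `master` branch is the stable branch. Every version on `master` has passed unit tests and regression tests.
 	* The `develop` branch is the development branch. Every version on `develop` has passed unit tests but not regression tests.
 	* The `release/[version]` branch is a temporary branch created for each release. Code at this stage is undergoing regression testing.
-* Forks by other users do not need to strictly follow the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, but every branch of every fork is equivalent to a feature branch.
+* The `develop` branch is the development branch. Every version on `develop` passes unit tests and also goes through model regression tests.
-	* Recommendation: a developer's fork should use its `develop` branch to sync with the main repository's `develop` branch.
+* The `release/[version]` branch is a temporary branch created for each release. Release branches are mainly used for testing, bug fixes and the final release.
-	* Recommendation: in the developer's fork, create feature branches off the `develop` branch.
+* The `master` branch has been deprecated for historical reasons.
 	* When a feature branch is finished, submit a `Pull Request` to the main PaddlePaddle repository for code review.
 		* During review, developers can keep committing changes to their own feature branch.
-* BugFix branches are also maintained in the developer's own fork. Unlike feature branches, a BugFix branch must open a `Pull Request` against each of the main repository's `master`, `develop` and any existing `release/[version]` branches.
+* Feature branches in other developers' forks.
 	* Recommendation: a developer's feature branch should stay in sync with the main repository's `develop` branch.
 	* Recommendation: a developer's feature branch should be based on the `develop` branch of the main repository.
 	* When a feature branch is finished, submit a `Pull Request` to the main PaddlePaddle repository for code review.
 		* During review, developers can keep committing changes to their own feature branch.
 ## PaddlePaddle Regression Test List
-This list describes the features that must be tested before a PaddlePaddle release.
+TODO
 ### All Chapters of PaddlePaddle Book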
--- a/doc/fluid/dev/releasing_process_en.md
+++ b/doc/fluid/dev/releasing_process_en.md
@ -4,26 +4,21 @@ PaddlePaddle manages its branches using "git-flow branching model", and [Semanti
 Each time we release a new PaddlePaddle version, we should follow the below steps:
-1. Fork a new branch from `develop` named `release/[version]`, e.g. `release/0.10.0`.
+1. Create a new release branch from `develop`, named `release/[version]`, e.g. `release/0.10.0`.
-1. Push a new tag on the release branch, the tag name should be like `[version]rc.patch`. The
+2. Create a new tag for the release branch, with the tag format `version-rc.Patch`. E.g. the first tag is `0.10.0-rc0`.
-   first tag should be `0.10.0rc1`, and the second should be `0.10.0.rc2` and so on.
+3. The new release branch normally doesn't accept new features or optimizations. QA will test on the release branch. Developers should develop based on the `develop` branch.
-1. After that, we should do:
+4. If QA or developers find bugs, they should first fix and verify them on the `develop` branch, then cherry-pick the fix to the release branch. Wait until the release branch is stable.
-  * Run all regression test on the Regression Test List (see PaddlePaddle TeamCity CI), to confirm
+5. If necessary, create a new tag on the release branch, e.g. `0.10.0-rc1`. Involve more users to try it and repeat steps 3-4.
-      that this release has no major bugs.
+6. After the release branch is stable, create the official release tag, such as `0.10.0`.
-        * If regression test fails, we must fix those bugs and create a new `release/[version]`
+7. Release the python wheel package to pypi.
-          branch from previous release branch.
+8. Update the docker image (More details below).
-    * Modify `python/setup.py.in`, change the version number and change `ISTAGED` to `True`.
+
-    * Publish PaddlePaddle release wheel packages to pypi (see below instructions for detail).
+NOTE:
-    * Update the Docker images (see below instructions for detail).
+
-1. After above step, merge `release/[version]` branch to master and push a tag on the master commit,
+* Bug fixes should happen on the `develop` branch first and then be cherry-picked to the release branch. Avoid developing directly on the release branch.
-   then merge `master` to `develop`.
+
-1. Update the Release Note.          
+* Release branches normally accept only bug fixes. Don't add new features.
-
+
 ***NOTE:***
 * Do ***NOT*** merge commits from develop branch to release branches to keep the release branch contain
  features only for current release, so that we can test on that version.
 * If we want to fix bugs on release branches, we must merge the fix to master, develop and release branch.
 ## Publish Wheel Packages to pypi
@ -97,26 +92,22 @@ You can then checkout the latest pushed tags at https://hub.docker.com/r/paddlep
 ## Branching Model
-We use [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) as our branching model,
+PaddlePaddle uses [Trunk Based Development](https://trunkbaseddevelopment.com/) as our branching model.
-with some modifications:
+
-
+* `develop` branch is used for development. Each commit to the `develop` branch goes through unit tests and model regression tests.
-* `master` branch is the stable branch. Each version on the master branch is tested and guaranteed.
+* `release/[version]` branch is used for each release. The release branch is used for tests, bug fixes and the eventual release.
-* `develop` branch is for development. Each commit on develop branch has passed CI unit test, but no
+* `master` branch has been deprecated for historical reasons.
-  regression tests are run.
+
-* `release/[version]` branch is used to publish each release. Latest release version branches have
+* Developer's feature branch.
-  bugfix only for that version, but no feature updates.
+	* Developer's feature branch should sync with upstream `develop` branch.
-* Developer forks are not required to follow
+	* Developer's feature branch should be forked from upstream `develop` branch.
-  [git-flow](http://nvie.com/posts/a-successful-git-branching-model/)
+	* After feature branch is ready, create a `Pull Request` against the Paddle repo and go through code review.
-  branching model, all forks is like a feature branch.
+	   * In the review process, developers modify their code and push to their own feature branch.
    * Advise: developer fork's develop branch is used to sync up with main repo's develop branch.
    * Advise: developer use it's fork's develop branch to for new branch to start developing.
  * Use that branch on developer's fork to create pull requests and start reviews.
      * developer can push new commits to that branch when the pull request is open.
 * Bug fixes are also started from developers forked repo. And, bug fixes branch can merge to
  `master`, `develop` and `releases`.
 ## PaddlePaddle Regression Test List
 TODO
 ### All Chapters of PaddlePaddle Book
 We need to guarantee that all the chapters of PaddlePaddle Book can run correctly. Including
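The tagging scheme in the new release steps of `releasing_process_en.md` (`0.10.0-rc0`, then `0.10.0-rc1`, then the official `0.10.0`) can be sketched as a small helper. This is an illustrative sketch only: `first_rc_tag`, `next_rc_tag` and `final_tag` are hypothetical names and are not part of any Paddle tooling shown in this commit.

```python
import re

# Matches release-candidate tags such as "0.10.0-rc0", the form used in the
# examples of the release steps.
_RC_TAG = re.compile(r"^(\d+\.\d+\.\d+)-rc(\d+)$")


def first_rc_tag(version):
    """Return the first release-candidate tag for a version (step 2)."""
    return "{}-rc0".format(version)


def next_rc_tag(tag):
    """Return the following release-candidate tag (step 5)."""
    match = _RC_TAG.match(tag)
    if match is None:
        raise ValueError("not a release-candidate tag: {!r}".format(tag))
    version, rc = match.group(1), int(match.group(2))
    return "{}-rc{}".format(version, rc + 1)


def final_tag(tag):
    """Return the official release tag once the branch is stable (step 6)."""
    match = _RC_TAG.match(tag)
    if match is None:
        raise ValueError("not a release-candidate tag: {!r}".format(tag))
    return match.group(1)
```

A release manager's script could call `next_rc_tag` each time steps 3-4 are repeated and `final_tag` when the branch is declared stable.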
--- a/paddle/fluid/API.spec
+++ b/paddle/fluid/API.spec
@ -100,7 +100,7 @@ paddle.fluid.layers.gru_unit ArgSpec(args=['input', 'hidden', 'size', 'param_att
 paddle.fluid.layers.linear_chain_crf ArgSpec(args=['input', 'label', 'param_attr'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.crf_decoding ArgSpec(args=['input', 'param_attr', 'label'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.cos_sim ArgSpec(args=['X', 'Y'], varargs=None, keywords=None, defaults=None)
-paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,))
+paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100))
 paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None))
@ -142,7 +142,7 @@ paddle.fluid.layers.beam_search ArgSpec(args=['pre_ids', 'pre_scores', 'ids', 's
 paddle.fluid.layers.row_conv ArgSpec(args=['input', 'future_context_size', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(None, None))
 paddle.fluid.layers.multiplex ArgSpec(args=['inputs', 'index'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.layer_norm ArgSpec(args=['input', 'scale', 'shift', 'begin_norm_axis', 'epsilon', 'param_attr', 'bias_attr', 'act', 'name'], varargs=None, keywords=None, defaults=(True, True, 1, 1e-05, None, None, None, None))
-paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,))
+paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100))
 paddle.fluid.layers.smooth_l1 ArgSpec(args=['x', 'y', 'inside_weight', 'outside_weight', 'sigma'], varargs=None, keywords=None, defaults=(None, None, None))
 paddle.fluid.layers.one_hot ArgSpec(args=['input', 'depth'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.autoincreased_step_counter ArgSpec(args=['counter_name', 'begin', 'step'], varargs=None, keywords=None, defaults=(None, 1, 1))
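The `API.spec` change above adds an `ignore_index` argument (default `-100`) to `cross_entropy` and `softmax_with_cross_entropy`. The diff does not show the semantics, so the following is a minimal sketch under the assumption that samples whose hard label equals `ignore_index` contribute zero loss. It is plain Python rather than the Paddle implementation, and the function is a hypothetical stand-in.

```python
import math


def cross_entropy(probs, labels, ignore_index=-100):
    """Per-sample hard-label cross entropy.

    probs:  list of per-class probability lists, one row per sample.
    labels: list of integer class ids, one per sample.
    Samples whose label equals ignore_index get a loss of 0.0
    (assumed behavior, not confirmed by the diff).
    """
    losses = []
    for row, label in zip(probs, labels):
        if label == ignore_index:
            # Ignored sample: no contribution to the loss.
            losses.append(0.0)
        else:
            losses.append(-math.log(row[label]))
    return losses
```

Under this assumption, padding positions in a batch can be labeled `-100` and simply drop out of the summed loss.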
--- a/paddle/fluid/framework/CMakeLists.txt
+++ b/paddle/fluid/framework/CMakeLists.txt
@ -56,9 +56,9 @@ else()
  cc_test(mixed_vector_test SRCS mixed_vector_test.cc DEPS place memory device_context tensor)
 endif()
 if (NOT WIN32)
-cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor framework_proto recordio)
+cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor framework_proto recordio version)
 else()
-cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor framework_proto)
+cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor framework_proto version)
 endif (NOT WIN32)
 cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor memory)
@ -116,7 +116,11 @@ cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope gl
 endif(NOT WIN32)
 cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry device_context)
-cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog)
+
+cc_library(version SRCS version.cc)
+cc_test(version_test SRCS version_test.cc DEPS version)
+cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog version)
 cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
 nv_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
--- a/paddle/fluid/framework/details/multi_devices_graph_pass.cc
+++ b/paddle/fluid/framework/details/multi_devices_graph_pass.cc
@ -442,8 +442,7 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilder::ApplyImpl(
  use_gpu = nccl_ctxs_ != nullptr;
 #endif
-  if (use_gpu ||
-      strategy_.reduce_ == BuildStrategy::ReduceStrategy::kAllReduce) {
+  if (use_gpu && strategy_.reduce_ == BuildStrategy::ReduceStrategy::kReduce) {
    // Insert BCast Ops
    for (size_t dev_id = 0; dev_id < bcast_var_name_set.size(); ++dev_id) {
      auto &to_bcast_set = bcast_var_name_set[dev_id];
--- a/paddle/fluid/framework/framework.proto
+++ b/paddle/fluid/framework/framework.proto
@ -16,6 +16,13 @@ syntax = "proto2";
 option optimize_for = LITE_RUNTIME;
 package paddle.framework.proto;
 // Any incompatible changes to ProgramDesc and its dependencies should
 // raise the version defined in version.h.
 //
 // Serialization and deserialization code should be modified in a way
 // that supports old versions following the version and compatibility policy.
 message Version { optional int64 version = 1 [ default = 0 ]; }
 enum AttrType {
  INT = 0;
  FLOAT = 1;
@ -180,4 +187,8 @@ message BlockDesc {
 // for more details.
 // TODO(panyx0718): A model can have multiple programs. Need a
 // way to distinguish them. Maybe ID or name?
-message ProgramDesc { repeated BlockDesc blocks = 1; }
+message ProgramDesc {
  repeated BlockDesc blocks = 1;
  optional Version version = 2;
 }
--- a/paddle/fluid/framework/ir/CMakeLists.txt
+++ b/paddle/fluid/framework/ir/CMakeLists.txt
@ -19,7 +19,7 @@ function(pass_library TARGET DEST)
 endfunction()
 cc_library(node SRCS node.cc DEPS proto_desc)
-cc_library(graph SRCS graph.cc DEPS node)
+cc_library(graph SRCS graph.cc DEPS node pretty_log)
 cc_library(graph_helper SRCS graph_helper.cc DEPS graph)
 cc_library(pass SRCS pass.cc DEPS graph node graph_helper)
 cc_library(graph_traits SRCS graph_traits.cc DEPS graph)
@ -28,6 +28,9 @@ cc_library(graph_pattern_detector SRCS graph_pattern_detector.cc DEPS graph grap
 pass_library(graph_to_program_pass base)
 pass_library(graph_viz_pass base)
 pass_library(fc_fuse_pass inference)
 if(WITH_MKLDNN)
  pass_library(conv_relu_mkldnn_fuse_pass inference)
 endif()
 pass_library(attention_lstm_fuse_pass inference)
 pass_library(infer_clean_graph_pass inference)
 pass_library(fc_lstm_fuse_pass inference)
@ -42,3 +45,6 @@ cc_test(graph_helper_test SRCS graph_helper_test.cc DEPS graph graph_helper op_r
 cc_test(graph_to_program_pass_test SRCS graph_to_program_pass_test.cc DEPS graph_to_program_pass)
 cc_test(test_graph_pattern_detector SRCS graph_pattern_detector_tester.cc DEPS graph_pattern_detector)
 cc_test(test_fc_fuse_pass SRCS fc_fuse_pass_tester.cc DEPS fc_fuse_pass framework_proto)
 if(WITH_MKLDNN)
  cc_test(test_conv_relu_mkldnn_fuse_pass SRCS conv_relu_mkldnn_fuse_pass_tester.cc DEPS conv_relu_mkldnn_fuse_pass)
 endif()
--- a/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc
+++ b/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc
@ -0,0 +1,90 @@
 // Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
 //     http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #include "paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.h"
 #include <string>
 #include <vector>
 #include "paddle/fluid/platform/enforce.h"
 namespace paddle {
 namespace framework {
 namespace ir {
 std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  PADDLE_ENFORCE(graph.get());
  FusePassBase::Init("conv_relu_mkldnn_fuse", graph.get());
  std::unordered_set<Node*> nodes2delete;
  GraphPatternDetector gpd;
  auto* conv_input = gpd.mutable_pattern()
                         ->NewNode("conv_relu_mkldnn_fuse/conv_input")
                         ->AsInput()
                         ->assert_is_op_input("conv2d", "Input");
  patterns::ConvReLU conv_relu_pattern(gpd.mutable_pattern(),
                                       "conv_relu_mkldnn_fuse");
  conv_relu_pattern(conv_input);
  int found_conv_relu_count = 0;
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    VLOG(4) << "handle ConvReLU fuse";
    GET_IR_NODE_FROM_SUBGRAPH(conv_weight, conv_weight,
                              conv_relu_pattern);  // Filter
    GET_IR_NODE_FROM_SUBGRAPH(conv_bias, conv_bias, conv_relu_pattern);  // Bias
    GET_IR_NODE_FROM_SUBGRAPH(conv_out, conv_out, conv_relu_pattern);    // tmp
    GET_IR_NODE_FROM_SUBGRAPH(conv, conv, conv_relu_pattern);  // CONV op
    GET_IR_NODE_FROM_SUBGRAPH(relu_out, relu_out, conv_relu_pattern);  // Out
    GET_IR_NODE_FROM_SUBGRAPH(relu, relu, conv_relu_pattern);  // ReLU op
    // Create a ConvReLU node.
    OpDesc desc;
    std::string conv_relu_i_in = subgraph.at(conv_input)->Name();
    std::string conv_relu_w_in = conv_weight->Name();
    std::string conv_relu_b_in = conv_bias->Name();
    std::string conv_relu_out = relu_out->Name();
    desc.SetInput("Input", std::vector<std::string>({conv_relu_i_in}));
    desc.SetInput("Filter", std::vector<std::string>({conv_relu_w_in}));
    desc.SetInput("Bias", std::vector<std::string>({conv_relu_b_in}));
    desc.SetOutput("Out", std::vector<std::string>({conv_relu_out}));
    desc.SetType("conv2d");
    for (auto& attr : conv->Op()->GetAttrMap()) {
      desc.SetAttr(attr.first, attr.second);
    }
    desc.SetAttr("fuse_relu", true);
    auto conv_relu_node = g->CreateOpNode(&desc);  // OpDesc will be copied.
    GraphSafeRemoveNodes(graph.get(), {conv, relu, conv_out});
    PADDLE_ENFORCE(subgraph.count(conv_input));
    IR_NODE_LINK_TO(subgraph.at(conv_input), conv_relu_node);
    IR_NODE_LINK_TO(conv_weight, conv_relu_node);
    IR_NODE_LINK_TO(conv_bias, conv_relu_node);
    IR_NODE_LINK_TO(conv_relu_node, relu_out);
    found_conv_relu_count++;
  };
  gpd(graph.get(), handler);
  AddStatis(found_conv_relu_count);
  return graph;
 }
 }  // namespace ir
 }  // namespace framework
 }  // namespace paddle
 REGISTER_PASS(conv_relu_mkldnn_fuse_pass,
              paddle::framework::ir::ConvReLUFusePass);
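The rewrite performed by `ConvReLUFusePass` above (replace a `conv2d` whose only output feeds a `relu` with a single `conv2d` carrying `fuse_relu = true`) can be sketched on a flat op list. This is a simplified illustration, not the Paddle IR: ops are plain dicts, variables are strings, and `fuse_conv_relu` is a hypothetical name. Unlike the C++ pattern, it does not require a `Bias` input or check persistability.

```python
def fuse_conv_relu(ops):
    """Fuse each conv2d -> relu pair into one conv2d with fuse_relu=True.

    ops is a list of dicts with keys "type", "inputs", "outputs", "attrs",
    in execution order. Returns (new_ops, number_of_fusions).
    """
    # Map each variable name to the indices of the ops that read it.
    consumers = {}
    for idx, op in enumerate(ops):
        for name in op["inputs"]:
            consumers.setdefault(name, []).append(idx)

    fused = {}       # index of a conv2d op -> its fused replacement
    removed = set()  # indices of relu ops absorbed into a conv2d
    for idx, op in enumerate(ops):
        if op["type"] != "conv2d" or len(op["outputs"]) != 1:
            continue
        users = consumers.get(op["outputs"][0], [])
        # The intermediate output must feed exactly one op, and it must be relu.
        if len(users) != 1 or ops[users[0]]["type"] != "relu":
            continue
        relu = ops[users[0]]
        attrs = dict(op["attrs"])  # copy every conv attribute
        attrs["fuse_relu"] = True  # mark the fused activation
        fused[idx] = {
            "type": "conv2d",
            "inputs": list(op["inputs"]),
            "outputs": list(relu["outputs"]),  # write straight to relu's output
            "attrs": attrs,
        }
        removed.add(users[0])

    new_ops = [fused.get(i, op) for i, op in enumerate(ops) if i not in removed]
    return new_ops, len(fused)
```

Requiring a single consumer of the conv output mirrors the intent of marking `conv_out` as an intermediate node: the variable disappears after fusion, so nothing else may read it.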
--- a/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.h
+++ b/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.h
@ -0,0 +1,39 @@
 // Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
 //     http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #pragma once
 #include "paddle/fluid/framework/ir/fuse_pass_base.h"
 #include "paddle/fluid/framework/ir/graph.h"
 #include "paddle/fluid/framework/ir/graph_pattern_detector.h"
 #include "paddle/fluid/framework/ir/pass.h"
 namespace paddle {
 namespace framework {
 namespace ir {
 /*
 * Fuse the CONV and ReLU to a ConvReLUOp.
 */
 class ConvReLUFusePass : public FusePassBase {
 public:
  virtual ~ConvReLUFusePass() {}
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(std::unique_ptr<ir::Graph> graph) const;
 };
 }  // namespace ir
 }  // namespace framework
 }  // namespace paddle
--- a/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass_tester.cc
+++ b/paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass_tester.cc
@ -0,0 +1,108 @@
 // Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
 //     http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #include "paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.h"
 #include <gtest/gtest.h>
 namespace paddle {
 namespace framework {
 namespace ir {
 void SetOp(ProgramDesc* prog, const std::string& type,
           const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs) {
  auto* op = prog->MutableBlock(0)->AppendOp();
  op->SetType(type);
  if (type == "conv2d") {
    op->SetAttr("use_mkldnn", true);
    op->SetInput("Input", {inputs[0]});
    op->SetInput("Filter", {inputs[1]});
    op->SetInput("Bias", {inputs[2]});
  } else if (type == "relu") {
    op->SetInput("X", inputs);
  }
  op->SetOutput("Out", outputs);
 }
 // a->OP0->b
 // b->OP1->c
 // (c, weights, bias)->conv->f
 // (f)->relu->g
 ProgramDesc BuildProgramDesc() {
  ProgramDesc prog;
  for (auto& v :
       std::vector<std::string>({"a", "b", "c", "weights", "bias", "f", "g"})) {
    auto* var = prog.MutableBlock(0)->Var(v);
    var->SetType(proto::VarType::SELECTED_ROWS);
    if (v == "weights" || v == "bias") {
      var->SetPersistable(true);
    }
  }
  SetOp(&prog, "OP0", std::vector<std::string>({"a"}),
        std::vector<std::string>({"b"}));
  SetOp(&prog, "OP1", std::vector<std::string>({"b"}),
        std::vector<std::string>({"c"}));
  SetOp(&prog, "conv2d", std::vector<std::string>({"c", "weights", "bias"}),
        std::vector<std::string>({"f"}));
  SetOp(&prog, "relu", std::vector<std::string>({"f"}),
        std::vector<std::string>({"g"}));
  return prog;
 }
 TEST(ConvReLUFusePass, basic) {
  auto prog = BuildProgramDesc();
  std::unique_ptr<ir::Graph> graph(new ir::Graph(prog));
  auto pass = PassRegistry::Instance().Get("conv_relu_mkldnn_fuse_pass");
  int original_nodes_num = graph->Nodes().size();
  graph = pass->Apply(std::move(graph));
  int current_nodes_num = graph->Nodes().size();
  // Remove 3 Nodes: CONV, RELU, conv_out
  // Add 1 Node: ConvReLU
  EXPECT_EQ(original_nodes_num - 2, current_nodes_num);
  // Assert conv_relu op in newly generated graph
  int conv_relu_count = 0;
  for (auto* node : graph->Nodes()) {
    if (node->IsOp() && node->Op()->Type() == "conv2d") {
      if (node->Op()->HasAttr("use_mkldnn")) {
        bool use_mkldnn = boost::get<bool>(node->Op()->GetAttr("use_mkldnn"));
        if (use_mkldnn) {
          if (node->Op()->HasAttr("fuse_relu")) {
            bool fuse_relu = boost::get<bool>(node->Op()->GetAttr("fuse_relu"));
            if (fuse_relu) {
              ++conv_relu_count;
            }
          }
        }
      }
    }
  }
  EXPECT_EQ(conv_relu_count, 1);
 }
 }  // namespace ir
 }  // namespace framework
 }  // namespace paddle
 USE_PASS(conv_relu_mkldnn_fuse_pass);
--- a/paddle/fluid/framework/ir/fc_lstm_fuse_pass.cc
+++ b/paddle/fluid/framework/ir/fc_lstm_fuse_pass.cc
@ -51,7 +51,7 @@ int BuildFusion(Graph* graph, const std::string& name_scope, Scope* scope,
    if (with_fc_bias) {
      // Add FC-bias with LSTM-bias and create a new weight
      PADDLE_ENFORCE(scope);
-      const std::string& new_bias_var = name_scope + "_bias.new";
+      const std::string& new_bias_var = patterns::UniqueKey("NewBias");
      auto* bias_var = scope->Var(new_bias_var);
      PADDLE_ENFORCE(bias_var);
      auto* bias_tensor = bias_var->GetMutable<framework::LoDTensor>();
@ -120,7 +120,6 @@ int BuildFusion(Graph* graph, const std::string& name_scope, Scope* scope,
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    GET_IR_NODE_FROM_SUBGRAPH(lstm, lstm, lstm_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(Weight, Weight, lstm_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(Bias, Bias, lstm_pattern);
@ -136,7 +135,7 @@ int BuildFusion(Graph* graph, const std::string& name_scope, Scope* scope,
                   fc_bias);
      // Remove unneeded nodes.
      std::unordered_set<const Node*> marked_nodes(
-          {mul, lstm, elementwise_add});
+          {mul, lstm, elementwise_add, fc_bias});
      GraphSafeRemoveNodes(graph, marked_nodes);
    } else {
      GET_IR_NODE_FROM_SUBGRAPH(fc_out, mul_out, fc_pattern);
--- a/paddle/fluid/framework/ir/graph_pattern_detector.cc
+++ b/paddle/fluid/framework/ir/graph_pattern_detector.cc
@ -21,12 +21,17 @@
 #include "paddle/fluid/framework/ir/graph_traits.h"
 #include "paddle/fluid/framework/ir/graph_viz_pass.h"
 #include "paddle/fluid/platform/enforce.h"
 #include "paddle/fluid/string/pretty_log.h"
 #include "paddle/fluid/string/printf.h"
 namespace paddle {
 namespace framework {
 namespace ir {
 using string::PrettyLogEndl;
 using string::PrettyLog;
 using string::Style;
 size_t PDPattern::id_ = 0UL;
 PDNode* PDPattern::NewNode(const std::string& name) {
@ -83,7 +88,7 @@ void GraphPatternDetector::operator()(Graph* graph,
  ValidateByNodeRole(&subgraphs);
  if (subgraphs.empty()) return;
-  LOG(INFO) << "detect " << subgraphs.size() << " subgraph matches the pattern";
+  PrettyLogEndl(Style::detail(), "---  detect %d subgraphs", subgraphs.size());
  int id = 0;
  for (auto& g : subgraphs) {
    VLOG(3) << "optimizing #" << id++ << " subgraph";
@ -517,6 +522,39 @@ bool VarLinksFromOp(Node* node, const std::string& op_type) {
  return false;
 }
 PDNode* patterns::ConvReLU::operator()(
    paddle::framework::ir::PDNode* conv_input) {
  // Create Operators
  conv_input->assert_is_op_input("conv2d", "Input");
  auto* conv_op = pattern->NewNode(conv_repr())->assert_is_op("conv2d");
  auto* relu_op = pattern->NewNode(relu_repr())->assert_is_op("relu");
  // Create variables
  // Filter
  auto* conv_weight_var = pattern->NewNode(conv_weight_repr())
                              ->AsInput()
                              ->assert_is_persistable_var()
                              ->assert_is_op_input("conv2d", "Filter");
  // Bias
  auto* conv_bias_var = pattern->NewNode(conv_bias_repr())
                            ->AsInput()
                            ->assert_is_persistable_var()
                            ->assert_is_op_input("conv2d", "Bias");
  // intermediate variable, will be removed in the IR after fuse.
  auto* conv_out_var = pattern->NewNode(conv_out_repr())
                           ->AsIntermediate()
                           ->assert_is_only_output_of_op("conv2d")
                           ->assert_is_op_input("relu");
  // output
  auto* relu_out_var = pattern->NewNode(relu_out_repr())
                           ->AsOutput()
                           ->assert_is_op_output("relu");
  conv_op->LinksFrom({conv_input, conv_weight_var, conv_bias_var})
      .LinksTo({conv_out_var});
  relu_op->LinksFrom({conv_out_var}).LinksTo({relu_out_var});
  return relu_out_var;
 }
 PDNode* patterns::FC::operator()(paddle::framework::ir::PDNode* x,
                                 bool with_bias) {
  // Create shared nodes.
--- a/paddle/fluid/framework/ir/graph_pattern_detector.h
+++ b/paddle/fluid/framework/ir/graph_pattern_detector.h
@ -360,6 +360,28 @@ struct PatternBase {
  size_t id_;
 };
 // CONV with ReLU
 // op: conv + relu
 // named nodes:
 // conv_input, conv_weight,
 // conv_bias, conv_out, conv,
 // relu_out, relu
 struct ConvReLU : public PatternBase {
  ConvReLU(PDPattern* pattern, const std::string& name_scope)
      : PatternBase(pattern, name_scope, "conv_relu") {}
  PDNode* operator()(PDNode* conv_input);
  // declare operator node's name
  PATTERN_DECL_NODE(conv);
  PATTERN_DECL_NODE(relu);
  // declare variable node's name
  PATTERN_DECL_NODE(conv_weight);
  PATTERN_DECL_NODE(conv_bias);
  PATTERN_DECL_NODE(conv_out);
  PATTERN_DECL_NODE(relu_out);
 };
 // FC with bias
 // op: mul + elementwise_add
 // named nodes:
--- a/paddle/fluid/framework/lod_tensor.cc
+++ b/paddle/fluid/framework/lod_tensor.cc
@ -21,6 +21,7 @@ limitations under the License. */
 #include "paddle/fluid/framework/framework.pb.h"
 #include "paddle/fluid/framework/lod_tensor.h"
 #include "paddle/fluid/framework/var_type.h"
 #include "paddle/fluid/framework/version.h"
 #include "paddle/fluid/memory/memcpy.h"
 #include "paddle/fluid/memory/memory.h"
@ -251,8 +252,8 @@ void AppendLoD(LoD *lod, const LoD &lod_length) {
 void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
                       const platform::DeviceContext &dev_ctx) {
  {  // the 1st field, uint32_t version for LoDTensor
-    constexpr uint32_t version = 0;
-    os.write(reinterpret_cast<const char *>(&version), sizeof(version));
+    os.write(reinterpret_cast<const char *>(&kCurTensorVersion),
+             sizeof(kCurTensorVersion));
  }
  {
    // the 2nd field, LoD information
@ -281,6 +282,8 @@ void DeserializeFromStream(std::istream &is, LoDTensor *tensor,
    // the 1st field, uint32_t version for LoDTensor
    uint32_t version;
    is.read(reinterpret_cast<char *>(&version), sizeof(version));
+    PADDLE_ENFORCE(framework::IsTensorVersionSupported(version),
+                   "tensor version %u is not supported.", version);
    PADDLE_ENFORCE_EQ(version, 0U, "Only version 0 is supported");
  }
  {
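Review note: the two hunks above write a version number as the first field of the serialized tensor and check it again on load. A minimal sketch of that round trip follows. It assumes a stand-in constant and predicate (`kSketchTensorVersion`, `IsSketchVersionSupported`), because the real `kCurTensorVersion` and `IsTensorVersionSupported` live in `version.h`, which this diff does not show. It also uses an exception where the original uses `PADDLE_ENFORCE`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for kCurTensorVersion; the real value is defined in
// version.h, which is not part of this excerpt.
constexpr uint32_t kSketchTensorVersion = 0;

// Hypothetical stand-in for framework::IsTensorVersionSupported.
inline bool IsSketchVersionSupported(uint32_t version) {
  return version == kSketchTensorVersion;
}

// Write the version as the first field of the stream.
inline void WriteVersion(std::ostream& os) {
  os.write(reinterpret_cast<const char*>(&kSketchTensorVersion),
           sizeof(kSketchTensorVersion));
}

// Read the first field back and reject versions the reader does not know.
inline uint32_t ReadVersion(std::istream& is) {
  uint32_t version = 0;
  is.read(reinterpret_cast<char*>(&version), sizeof(version));
  if (!is || !IsSketchVersionSupported(version)) {
    throw std::runtime_error("tensor version is not supported");
  }
  return version;
}

// Round trip through an in-memory stream.
inline uint32_t RoundTripVersion() {
  std::stringstream ss;
  WriteVersion(ss);
  return ReadVersion(ss);
}
```

Writing a named constant instead of a local literal keeps the writer and the reader's check tied to one definition, which seems to be the intent of the change.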
--- a/paddle/fluid/framework/operator.cc
+++ b/paddle/fluid/framework/operator.cc
@ -464,35 +464,35 @@ class RuntimeInferShapeContext : public InferShapeContext {
      : op_(op), scope_(scope) {}
  bool HasInput(const std::string& name) const override {
-    if (!op_.HasInputs(name)) {
+    // has only one input
+    const auto& ins = op_.Inputs();
+    auto it = ins.find(name);
+    if (it == ins.end()) {
      return false;
    }
-    auto& ins = Inputs(name);
-    size_t length = ins.size();
-    if (length == 0) {
+    const auto& in = it->second;
+    if (in.size() == 0 || in[0] == kEmptyVarName) {
      return false;
    }
-    PADDLE_ENFORCE_EQ(length, 1UL,
+    PADDLE_ENFORCE_EQ(in.size(), 1UL,
                      "Input %s should not have more than one inputs", name);
-    auto ipt = ins[0];
-    auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
-    return var != nullptr;
+    return scope_.FindVar(in[0]) != nullptr;
  }
  bool HasOutput(const std::string& name) const override {
-    if (!op_.HasOutputs(name)) {
+    // has only one output
+    const auto& outs = op_.Outputs();
+    auto it = outs.find(name);
+    if (it == outs.end()) {
      return false;
    }
-    auto& outs = Outputs(name);
-    size_t length = outs.size();
-    if (length == 0) {
+    const auto& out = it->second;
+    if (out.size() == 0 || out[0] == kEmptyVarName) {
      return false;
    }
-    PADDLE_ENFORCE_EQ(length, 1UL,
-                      "Output %s should not have more than one inputs", name);
+    PADDLE_ENFORCE_EQ(out.size(), 1UL,
+                      "Output %s should not have more than one outputs", name);
-    auto ipt = outs[0];
-    auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
-    return var != nullptr;
+    return scope_.FindVar(out[0]) != nullptr;
  }
  bool HasInputs(const std::string& name) const override {
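Review note: the rewritten `HasInput`/`HasOutput` do a single map lookup and treat a missing slot, an empty slot, and a placeholder name the same way. The sketch below mirrors that order of checks under stated assumptions: `VarNameMap`, `kSketchEmptyVarName`, and the `std::set` standing in for the scope are hypothetical, and an exception replaces `PADDLE_ENFORCE_EQ`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins: slot name -> variable names, and a placeholder name.
using VarNameMap = std::map<std::string, std::vector<std::string>>;
const std::string kSketchEmptyVarName = "@EMPTY@";  // assumed value

// Returns true only if the slot exists, holds exactly one real variable
// name, and that variable is present in the (stand-in) scope.
inline bool HasSingleInput(const VarNameMap& ins,
                           const std::set<std::string>& scope_vars,
                           const std::string& name) {
  auto it = ins.find(name);
  if (it == ins.end()) {
    return false;  // the operator has no such slot
  }
  const auto& in = it->second;
  if (in.empty() || in[0] == kSketchEmptyVarName) {
    return false;  // slot is empty or filled with the placeholder
  }
  if (in.size() != 1) {
    throw std::logic_error("Input " + name +
                           " should not have more than one variable");
  }
  return scope_vars.count(in[0]) > 0;
}
```

As in the diff, the placeholder check runs before the size check, so a slot whose first entry is the placeholder returns `false` without reaching the size assertion.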
--- a/paddle/fluid/framework/parallel_executor.cc
+++ b/paddle/fluid/framework/parallel_executor.cc
@ -352,7 +352,10 @@ void ParallelExecutor::FeedAndSplitTensorIntoLocalScopes(
 ParallelExecutor::~ParallelExecutor() {
  if (member_->own_local_scope_) {
    for (size_t i = 1; i < member_->local_scopes_.size(); ++i) {
-      member_->global_scope_->DeleteScope(member_->local_scopes_[i]);
+      Scope *local_scope = member_->local_scopes_[i];
+      if (member_->global_scope_->HasKid(local_scope)) {
+        member_->global_scope_->DeleteScope(local_scope);
+      }
    }
  }
 }
--- a/paddle/fluid/framework/program_desc.cc
+++ b/paddle/fluid/framework/program_desc.cc
@ -15,6 +15,7 @@ limitations under the License. */
 #include "paddle/fluid/framework/program_desc.h"
 #include "paddle/fluid/framework/block_desc.h"
 #include "paddle/fluid/framework/feed_fetch_type.h"
+#include "paddle/fluid/framework/version.h"
 namespace paddle {
 namespace framework {
@ -38,7 +39,10 @@ proto::ProgramDesc *ProgramDesc::Proto() {
  return &desc_;
 }
+int64_t ProgramDesc::Version() const { return desc_.version().version(); }
 ProgramDesc::ProgramDesc() {
+  desc_.mutable_version()->set_version(kCurProgramVersion);
  auto *block = desc_.mutable_blocks()->Add();
  block->set_idx(kRootBlockIndex);
  block->set_parent_idx(kNoneBlockIndex);
--- a/paddle/fluid/framework/program_desc.h
+++ b/paddle/fluid/framework/program_desc.h
@ -57,6 +57,8 @@ class ProgramDesc {
  proto::ProgramDesc *Proto();
+  int64_t Version() const;
  // The output variable of feed_op is referenced as feed_target.
  // This function is used to collect the output variable's name of all
  // feed_ops.
--- a/paddle/fluid/framework/program_desc_test.cc
+++ b/paddle/fluid/framework/program_desc_test.cc
@ -87,8 +87,17 @@ TEST(ProgramDesc, copy_ctor) {
    ASSERT_EQ(op_origin->Inputs(), op_copy->Inputs());
    ASSERT_EQ(op_origin->Outputs(), op_copy->Outputs());
-    ASSERT_EQ(op_copy->Proto()->SerializeAsString(),
-              op_origin->Proto()->SerializeAsString());
+    ASSERT_EQ(op_origin->Proto()->attrs().size(),
+              op_copy->Proto()->attrs().size());
+    for (auto it = op_origin->Proto()->attrs().begin();
+         it != op_origin->Proto()->attrs().end(); ++it) {
+      for (auto it_2 = op_copy->Proto()->attrs().begin();
+           it_2 != op_copy->Proto()->attrs().end(); ++it_2) {
+        if (it->name() == it_2->name()) {
+          ASSERT_TRUE(it_2->SerializeAsString() == it->SerializeAsString());
+        }
+      }
+    }
    if (op->Type() == "op_with_subblock") {
      ASSERT_EQ(1, op->GetBlockAttrId("sub_block"));
--- a/paddle/fluid/framework/rw_lock.h
+++ b/paddle/fluid/framework/rw_lock.h
@ -56,5 +56,76 @@ struct RWLock {
 };
 #endif
+class RWLockGuard {
+ public:
+  enum Status { kUnLock, kWRLock, kRDLock };
+  RWLockGuard(RWLock* rw_lock, Status init_status)
+      : lock_(rw_lock), status_(Status::kUnLock) {
+    switch (init_status) {
+      case Status::kRDLock: {
+        RDLock();
+        break;
+      }
+      case Status::kWRLock: {
+        WRLock();
+        break;
+      }
+      case Status::kUnLock: {
+        break;
+      }
+    }
+  }
+  void WRLock() {
+    switch (status_) {
+      case Status::kUnLock: {
+        lock_->WRLock();
+        status_ = Status::kWRLock;
+        break;
+      }
+      case Status::kWRLock: {
+        break;
+      }
+      case Status::kRDLock: {
+        PADDLE_THROW(
+            "Please unlock read lock first before invoking write lock.");
+        break;
+      }
+    }
+  }
+  void RDLock() {
+    switch (status_) {
+      case Status::kUnLock: {
+        lock_->RDLock();
+        status_ = Status::kRDLock;
+        break;
+      }
+      case Status::kRDLock: {
+        break;
+      }
+      case Status::kWRLock: {
+        PADDLE_THROW(
+            "Please unlock write lock first before invoking read lock.");
+        break;
+      }
+    }
+  }
+  void UnLock() {
+    if (status_ != Status::kUnLock) {
+      lock_->UNLock();
+      status_ = Status::kUnLock;
+    }
+  }
+  ~RWLockGuard() { UnLock(); }
+ private:
+  RWLock* lock_;
+  Status status_;
+};
 }  // namespace framework
 }  // namespace paddle
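Review note: `RWLockGuard` tracks which mode it currently holds, refuses to switch modes without an explicit unlock, and releases the lock in its destructor. The sketch below illustrates that state machine. It is a condensed copy, not the class from the diff: `RecordingRWLock` is a hypothetical lock that only records calls (so no threads are needed), and `std::logic_error` replaces `PADDLE_THROW`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical lock that only records its state; it does not block.
struct RecordingRWLock {
  int readers = 0;
  bool writer = false;
  void RDLock() { ++readers; }
  void WRLock() { writer = true; }
  void UNLock() {
    if (writer) {
      writer = false;
    } else if (readers > 0) {
      --readers;
    }
  }
};

// Condensed guard with the same three states as RWLockGuard in the diff.
class SketchRWLockGuard {
 public:
  enum Status { kUnLock, kWRLock, kRDLock };

  SketchRWLockGuard(RecordingRWLock* lock, Status init)
      : lock_(lock), status_(kUnLock) {
    if (init == kRDLock) RDLock();
    if (init == kWRLock) WRLock();
  }
  ~SketchRWLockGuard() { UnLock(); }

  void RDLock() {
    if (status_ == kWRLock) throw std::logic_error("unlock write lock first");
    if (status_ == kUnLock) {
      lock_->RDLock();
      status_ = kRDLock;
    }
  }
  void WRLock() {
    if (status_ == kRDLock) throw std::logic_error("unlock read lock first");
    if (status_ == kUnLock) {
      lock_->WRLock();
      status_ = kWRLock;
    }
  }
  void UnLock() {
    if (status_ != kUnLock) {
      lock_->UNLock();
      status_ = kUnLock;
    }
  }

 private:
  RecordingRWLock* lock_;
  Status status_;
};

// Read under the guard, then switch to writing by unlocking first.
// The guard's destructor releases the write lock when the function returns.
inline bool ReadThenWrite(RecordingRWLock* lock) {
  SketchRWLockGuard guard(lock, SketchRWLockGuard::kRDLock);
  bool held_read = (lock->readers == 1);
  guard.UnLock();
  guard.WRLock();
  return held_read && lock->writer;
}
```

Calling `WRLock()` while the guard still holds the read lock throws, matching the "Please unlock read lock first" branch in the original.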
--- a/paddle/fluid/framework/scope.cc
+++ b/paddle/fluid/framework/scope.cc
@ -72,6 +72,12 @@ void Scope::DropKids() {
  kids_.clear();
 }
+bool Scope::HasKid(const Scope* scope) const {
+  std::unique_lock<std::mutex> lock(mutex_);
+  auto it = std::find(this->kids_.begin(), this->kids_.end(), scope);
+  return it != this->kids_.end();
+}
 std::vector<std::string> Scope::LocalVarNames() const {
  std::unique_lock<std::mutex> lock(mutex_);
  std::vector<std::string> known_vars;
--- a/paddle/fluid/framework/scope.h
+++ b/paddle/fluid/framework/scope.h
@ -71,6 +71,9 @@ class Scope {
  /// Drop all kids scopes belonged to this scope.
  void DropKids();
+  /// Find if a scope exists in the kid scopes
+  bool HasKid(const Scope* scope) const;
  // enumerate all the variables current contains.
  std::vector<std::string> LocalVarNames() const;
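Review note: `HasKid` lets the `ParallelExecutor` destructor skip local scopes that the global scope no longer owns, presumably because they may already have been dropped elsewhere. A minimal sketch of that guard follows. `SketchScope` and `DeleteIfKid` are hypothetical, and the mutex from the real `Scope` is omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Hypothetical, single-threaded stand-in for framework::Scope.
class SketchScope {
 public:
  SketchScope() = default;
  SketchScope(const SketchScope&) = delete;
  SketchScope& operator=(const SketchScope&) = delete;
  ~SketchScope() { DropKids(); }

  // Create a child scope owned by this scope.
  SketchScope* NewScope() {
    kids_.push_back(new SketchScope);
    return kids_.back();
  }

  // Delete every child scope.
  void DropKids() {
    for (SketchScope* kid : kids_) delete kid;
    kids_.clear();
  }

  // Same lookup as Scope::HasKid in the diff: a linear search by pointer.
  bool HasKid(const SketchScope* scope) const {
    return std::find(kids_.begin(), kids_.end(), scope) != kids_.end();
  }

  // Delete the child only if it is still owned; report whether it was.
  bool DeleteIfKid(SketchScope* scope) {
    if (!HasKid(scope)) return false;
    kids_.remove(scope);
    delete scope;
    return true;
  }

 private:
  std::list<SketchScope*> kids_;
};
```

With this guard, deleting a child after `DropKids()` becomes a no-op instead of a double delete.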