Enable the detection of subgraph composed of grad ops (#21223)
* Add the first implementation of fusion_group op #19621 (#3)
  * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernels using nvrtc. test=develop
  * Call the CUDA driver API to launch the kernel compiled by nvrtc. test=develop
  * Disable for Mac and Windows. test=develop
  * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
  * Refine the CUDA kernel to support large dims. test=develop
  * Add DeviceCodePool to manage all device codes.
  * Add the first implementation of fusion_group op.
  * Add unit-test for fusion_group op.
  * Add the check of results.
  * Add the check of nvrtc in the unit-test. test=develop
  * Add comments to explain the inputs, outputs and features of fusion_group op. test=develop
  * Disable fusion_group op for Mac and Windows. test=develop
  * Make the compiling of device code return a status instead of hanging up. test=develop
  * Add a check for the presence of the CUDA driver library, and do not core dump when failing to call the CUDA driver API.
  * Unify fusion_group_op's input and output names. test=develop
  * Add the check of the CUDA driver library in the unittest. test=develop
* Enable generating code for a given subgraph #21126 (#4)
  * Enable generating code for a given subgraph.
  * Support sorting the subgraph.
  * Remove the rearrangement of expressions because we use the sorted subgraph directly.
  * Enable generating code for a subgraph which is composed of grad ops.
  * Use expression information to check the accuracy in the unittest.
  * Separate load and store from computation expressions. test=develop
  * Improve the loading statements in generated codes. test=develop
  * Remove unused arguments from the formal list. test=develop
* Enable the detection of subgraphs of grad ops.
* Generate code for detected subgraphs in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass, and add a unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in the fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE messages. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for the fused op. test=develop
* Follow review comments. test=develop
parent 50af6b5d79
commit dcfb603897
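The central step the commit message describes is detecting groups of fusable elementwise ops (including their grad ops, e.g. `relu_grad`) in the computation graph. As a rough illustration only — this is a simplified stand-alone sketch with a hypothetical op/edge representation, not Paddle's actual `elementwise_group_detector`, which works on its SSA `Graph`/`Node` classes — the detection idea can be written as a connected-component search restricted to elementwise op types:

```python
# Simplified sketch of elementwise-subgraph detection (hypothetical data
# model; op types taken as plain strings, edges as producer/consumer pairs).
from collections import deque

# A small, illustrative whitelist of fusable elementwise ops and their grads.
ELEMENTWISE = {"relu", "sigmoid", "tanh", "elementwise_add", "elementwise_mul",
               "relu_grad", "sigmoid_grad", "tanh_grad", "elementwise_add_grad"}

def detect_groups(ops, edges):
    """ops: {op_name: op_type}; edges: list of (producer, consumer) op names.
    Returns connected groups (size >= 2) of fusable elementwise ops."""
    adj = {name: set() for name in ops}
    for src, dst in edges:
        # Only connect two ops if both are fusable elementwise ops.
        if ops[src] in ELEMENTWISE and ops[dst] in ELEMENTWISE:
            adj[src].add(dst)
            adj[dst].add(src)
    seen, groups = set(), []
    for name in ops:
        if name in seen or ops[name] not in ELEMENTWISE:
            continue
        # BFS to collect the whole connected component of fusable ops.
        group, queue = [], deque([name])
        seen.add(name)
        while queue:
            cur = queue.popleft()
            group.append(cur)
            for nxt in adj[cur] - seen:
                seen.add(nxt)
                queue.append(nxt)
        if len(group) >= 2:  # a single op is not worth fusing
            groups.append(sorted(group))
    return groups
```

For example, a chain `relu -> sigmoid -> conv2d -> tanh_grad -> relu_grad` yields two groups, split at the non-elementwise `conv2d`. The real pass additionally topologically sorts each subgraph before generating code for it, as the commit notes.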
@@ -1,9 +1,11 @@
-cc_library(code_generator SRCS operation.cc code_generator.cc code_generator_helper.cc DEPS graph)
+cc_library(code_generator
+  SRCS operation.cc code_generator.cc code_generator_helper.cc
+  DEPS graph subgraph_detector)
 if(WITH_GPU)
   cc_test(test_code_generator SRCS code_generator_tester.cc DEPS code_generator device_code lod_tensor graph_viz_pass)
 endif()

 cc_library(fusion_group_pass
   SRCS fusion_group_pass.cc elementwise_group_detector.cc
-  DEPS graph_pattern_detector pass code_generator)
+  DEPS subgraph_detector fuse_pass_base code_generator device_code)
 cc_test(test_fusion_group_pass SRCS fusion_group_pass_tester.cc DEPS fusion_group_pass graph_viz_pass)
File diff suppressed because it is too large
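The suppressed diff here appears to cover the code generator changes. The general technique the commit message describes — emitting a fused elementwise kernel for a subgraph, with load and store statements kept separate from the computation expressions — can be sketched in plain Python as string templating. This is a hypothetical illustration, not Paddle's actual `code_generator`:

```python
# Hypothetical sketch of fused-kernel source generation. Loads, compute,
# and stores are emitted as three separate statement groups, mirroring
# the "separate load and store from computation expressions" change.
def generate_kernel(name, inputs, outputs, exprs):
    """inputs/outputs: variable names; exprs: 'var = expression' strings."""
    args = ", ".join(["int n"] +
                     [f"const float* {v}_ptr" for v in inputs] +
                     [f"float* {v}_ptr" for v in outputs])
    loads = "\n".join(f"    float {v} = {v}_ptr[i];" for v in inputs)
    compute = "\n".join(f"    float {e};" for e in exprs)
    stores = "\n".join(f"    {v}_ptr[i] = {v};" for v in outputs)
    # Grid-stride loop so the kernel supports dims larger than the launch grid.
    return (f'extern "C" __global__ void {name}({args}) {{\n'
            "  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;\n"
            "       i += gridDim.x * blockDim.x) {\n"
            f"{loads}\n{compute}\n{stores}\n  }}\n}}\n")

src = generate_kernel("fused_elementwise_0", ["x", "y"], ["out"],
                      ["tmp = x + y", "out = tanh(tmp)"])
print(src)
```

A string produced this way is what a runtime compiler such as nvrtc would then compile, with the resulting kernel launched through the CUDA driver API, as the commits above describe.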
@@ -0,0 +1,39 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import print_function
+
+import unittest
+import paddle.fluid as fluid
+import paddle.fluid.core as core
+from test_eager_deletion_padding_rnn import RNNConfig, PaddingRNNTestBase
+
+
+class FusionGroupPaddingRNNTest(PaddingRNNTestBase):
+    def set_customed_config(self):
+        self.build_strategy.enable_auto_fusion = True
+
+        # Use CUDA executor
+        if core.is_compiled_with_cuda():
+            self.exe = fluid.Executor(fluid.CUDAPlace(0))
+
+    def test_train_enable_fusion_group(self):
+        rnn_model = "static"
+        config = RNNConfig("test", rnn_model)
+        with fluid.scope_guard(fluid.Scope()):
+            self.train(config, parallel=True, use_program_cache=False)
+
+
+if __name__ == '__main__':
+    unittest.main()
File diff suppressed because it is too large