Paddle

Commit Graph

Author	SHA1	Message	Date
Yiqun Liu	071a702060	Fix the error misjudgment when there are control nodes in graph. (#23943 )	5 years ago
Zeng Jinle	c49791362f	Correct reader device index (#23802 ) * correct reader device index, test=develop * fix async executor scope var initialization, test=develop	5 years ago
joanna.wozna.intel	12ba05ce0c	Add scale-matmul fuse pass (#23734 )	5 years ago
Chen Weihang	532079a222	API (CompiledProgram) error message enhancement (#23559 ) * api compild program error polish, test=develop * fix coverage problem, test=develop * fix details & add unittests, test=develop * add test for coverage, test=develop	5 years ago
chenhaoze	9b06dd8628	Add three passes and api reference of paddle_pass_builder. test=develop (#23741 ) * Add three passes and api reference of paddle_pass_builder.h	5 years ago
joanna.wozna.intel	5ee099ca57	Op-requant squash (#23665 ) * Op-requant squash test=develop * Add matmul to op-requant test test=develop	5 years ago
mozga-intel	3baaee9aab	Remove: NGraph engine from PDPD repository (#23545 ) * Remove the NGraph engine from PDPD repository 1. Each operator was removed from the operator's directory 2. Each test was removed from the unittest directory 3. The parallel executor support was removed from the PDPD 4. The CMake file was removed from the PDPD 5. The NG flags were removed from the repository test=develop * Remove ngraph from: 1. Cmake file 2. Python file test=develop	5 years ago
joanna.wozna.intel	3cb5623dad	Add matmul dequant squash (#23505 ) test=develop	5 years ago
wangchaochaohu	c1187cd6f4	Fp16 refine for fusion group (#23472 )	5 years ago
joanna.wozna.intel	ce08fdcf2b	Add support for INT8 matmul in C-API quantization (#23463 ) * Integrate matmul with cpu_quantize_pass test=develop * Add matmul checking scales test=develop * Change condition of matmul quantization test=develop * Remove redundant var test=develop	5 years ago
wangchaochaohu	d085f79228	fix untime fail for output var stop_gradient=True for fusion group (#23317 )	5 years ago
Kaipeng Deng	d223a24904	Fix inplace_abn compile error on Windows (#23464 ) * fix inplace_abn windows compile error. test=develop	5 years ago
wangchaochaohu	5c60778731	polish the code of fusion group test=develop (#23370 )	5 years ago
Yiqun Liu	bc2981e998	Disable test_code_generator and test_post_training_quantization_mobilenetv1 (#23440 )	5 years ago
joanna.wozna.intel	8c463700e1	Add default pass attributes (#23042 )	5 years ago
Kaipeng Deng	21d95be0db	Add inplace abn op (#22806 ) * add inplace_abn_op. test=develop	5 years ago
Zeng Jinle	3a21980b78	add reader dependency pass, test=develop (#23301 )	5 years ago
wangchaochaohu	d280106007	Add support for attr type Op and add fill_constant Op and scale Op (#23163 ) * add attr support for fusion group and add support for fill_constant and scale Op	5 years ago
Jacek Czaja	2bb1b0e89e	[DNNL] Added MKL-DNN inplace pass for C-API inference (#23315 )	5 years ago
Wojciech Uss	f836c8aa8f	add check for scales and a message (#23119 )	5 years ago
Tao Luo	c00d427d52	simplify the cmake log of ir/CMakeLists.txt (#23262 ) test=develop	5 years ago
Zeng Jinle	bae5930ba1	fix graph attr copy issues, test=develop (#23191 )	5 years ago
Zeng Jinle	acfc9b8a70	Reader sequential and inference partial feed (#22699 ) * sequential reader stage 1, test=develop * fix ut, test=develop * fix iterable=False reset bug, add some logs and polish code, test=develop * inference feed partial data, test=develop * Turn on keep_order=True for test, test=develop * enhance ut to test more cases, test=develop * test commit for reverting * Revert "test commit for reverting", test=develop This reverts commit 80aef42ef52ba1ee79627d6f663a624ec4f12f58. * add ut of merged and unmerged results, test=develop * add more uts for coverages and add en doc of api, test=develop * follow comments, test=develop * change note style, test=develop	5 years ago
Wilber	95b356a069	update embedding_eltwise_layernorm fuse and kernel. test=develop (#23114 ) update embedding_eltwise_layernorm fuse pass and fused kernel, to support multi input	5 years ago
Yiqun Liu	3af4771122	Add the detection and code-generation of sqrt and square in fusion_group (#23095 )	5 years ago
Sylwester Fraczek	abee05a8c8	added mkldnn swish activation (#23041 )	5 years ago
wangchaochaohu	3757e0687c	Add Unittest for backward of fusion group (#22932 ) * add fusion group test for backward and refine code	5 years ago
wangchaochaohu	f0d193a23c	Cast fusion for fusion group (#22876 ) * add support for expression type convert and add cast Op support in fusion group	5 years ago
Wilber	ff3ddbb502	add skip_layernorm pass. test=develop (#22895 ) * add skip_layernorm pass. test=develop	5 years ago
Zhaolong Xing	8d6dc102fe	[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse (#22494 ) * 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop	5 years ago
liu zhengxi	61fef9754b	Fix fc padding bug during inference fusion (#22860 ) * fix fc padding during fusion, test=develop * fix optim model inference after SaveOptimModel, test=develop	5 years ago
wangchaochaohu	ca9e77a8d4	add sum op support for fusion group (#22771 ) * Add the codegen and auto fusion for sum Op in fusion group	5 years ago
tianshuo78520a	433cef03e5	fix typo word (#22784 )	5 years ago
GaoWei8	cdf5f6fb8c	Add an inference interface to disable FC padding (#22097 ) * Add an interface of disabling FC padding * fix bert regression * polish fc padding interface * recover pass function * fix argument error * fix mkldnn error	5 years ago
tianshuo78520a	d2ba91aad1	fix typo words (#22653 )	5 years ago
Yiqun Liu	22bbd54719	Add the support of fp16 in fusion_group (#22239 )	5 years ago
Wilber	9a8203aa25	fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop (#22551 ) 当一个模型中有多个fc_lstm子图的时候，且其中fc共用了同一个persistable的bias，此时不应该将bias节点删除，只将非persistable的节点去除即可。	5 years ago
Zhaolong Xing	8acd745c25	[Ernie GPU Optim]: Fuse three fc to multihtead matmul (#22486 ) * 1. optim multihead matmul: fuse three fc to multihtead matmul test=develop * fix conflict test=develop * fix comments test=develop	5 years ago
Yiqun Liu	dcfb603897	Enable the detection of subgraph composed of grad ops (#21223 ) * Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop	5 years ago
joanna.wozna.intel	17f2c0899f	Add dequant-scale squash (#22409 ) * Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop	5 years ago
Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡 Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
石晓伟	e1b0d7cbb1	remove anakin from code, test=develop (#22420 )	5 years ago
Michał Gallus	269db0d1d1	[DNNL] Fix accuracy in INT8 FC (#22404 ) * Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop	5 years ago
joanna.wozna.intel	3099d9d47c	Restore requantize squash (#22399 )	5 years ago
Yiqun Liu	b7cac50b64	Implement a common python unittest to test the ir passes. (#22209 ) * Implement a common python unittest to test the ir passes. test=develop * Save the results in np.array and support to startup on CPU. test=develop * Fix the unittest. test=develop * Add check_program to check whether the optimized program is different from the origin one. test=develop * Remove the inferface all_ops. test=develop * Add exception test in pass_test. test=develop	5 years ago
lidanqing	895f8da7d6	change std::cout to log(INFO), vlog (#22316 )	5 years ago
Zhen Wang	e40cfb1010	fix the bug of assert_is_op_output. test=develop (#22262 )	5 years ago
Wojciech Uss	d3a6647372	improve placement pass tests code coverage (#22197 )	5 years ago
Zhen Wang	46189b166d	Add bn and relu fuse pass (#22048 ) * add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop	5 years ago
joanna.wozna.intel	5b2e98aa17	Add multiple quantize operators fuse (#22062 )	5 years ago
Yiqun Liu	96980c2244	Polish the PADDLE_ENFORCE in fusion_group pass related codes. (#22144 ) * Polish the PADDLE_ENFORCE in fusion_group pass related codes. test=develop * Correct the unittest because of the change relu_grad's formula. test=develop	5 years ago
liu zhengxi	724b13e459	fix xception precision problem, test=develop (#22124 )	5 years ago
Yiqun Liu	b1401fb74d	Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#22094 ) test=develop	5 years ago
Yiqun Liu	d48320777e	Add the first implememtation of fusion_group op (#19621 ) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop	5 years ago
Michał Gallus	6192108408	[DNNL] 3D Fully-Connected (#21746 )	5 years ago
liu zhengxi	196e20dfbb	Fix multi-threads memory out of bounds error for passes (#21920 ) * fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop * fix attention_lstm_fuse_pass during multi-threads inference, test=develop * fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop * fix fc_lstm_fuse_pass during multi-threads inference, test=develop * fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop	5 years ago
石晓伟	03479469a7	fix multi-thread error of fc_gru_fuse_pass.cc, test=develop (#21841 ) * fix multi-thread error of fc_gru_fuse_pass.cc, test=develop * export FLAGS and GLOG symbols, test=develop	5 years ago
Pei Yang	3e5008ad01	fix trt calib not working bug, test=develop (#21934 )	5 years ago
Aurelius84	51a86d2b6b	Optimize adam speed (#21777 ) * optimize adam speed by removing _finish_update test=develop * fix SparseAdamFunctor param list test=develop * Remove scale_op in expect_list of adam_op test=develop * fix test optimizer loss assert error test=develop * fix test optimizer loss assert error test=develop * modify PADDLE_ENFORCE usage test=develop * fix op_type in lamb_op.cc test=develop * fix errors ostream format bug test=develop * add betaPowOut in ngraph op test=develop * fix ngraph::op api for gcc8 test=develop * clean code test=develop * modify struct into class test=develop * remove code of beta1Tensor in lamb_op test=develop	5 years ago
lidanqing	d3a96632fa	Add fc-dequantize squash in cpu_quantize_squash_pass for ernie model (#21714 ) * fc-dequantize squash test=develop * change according to reviews test=develop * change PADDLE_ENFORCE test=develop * add second test when fc-dequant do not fuse test=develop * change all related PADDLE_ENFORCE test=develop	5 years ago
joanna.wozna.intel	d419b859c0	Add reshape int8 mkldnn op (#21428 ) * Add reshape int8 op test=develop * Change test to CPUPlace test=develop * Correct tests test=develop	5 years ago
Tao Luo	01fa4ead61	fix -Wno-error=sign-compare warning in gcc8 (#21434 ) * fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop	5 years ago
Zeng Jinle	89966525f1	Polish reference count pass (#21324 ) * fix ref_cnt pass, test=develop * add cpp unittests to reference_count_pass, test=develop * follow comments, test=develop	5 years ago
GaoWei8	8493f20ebc	Polish the codes of fc when needs padding (#21378 ) test=develop	5 years ago
Michał Gallus	5d7d548275	INT8 Fully-connected (#17641 ) * Implement Int8 FC * Integrate FC into INT8v2 test=develop * int8 FC: transpose weights before computing scales test=develop * Add support for activation_type string in FC test=develop * Disable MKL-DNN's FC in VGG16 and 19 test=develop * Disable FC quantization when mkldnn FC is disabled test=develop * Solve PADDLE_ENFORCES in FC int8 * Fix Paddle enforces and remove const cast test=develop * Fix style changes test=develop * Fix quantizer_tester test and add fc quantization test=develop * Fix FC test fail on CUDA * Remove unnecessary log from quantize placement pass test=develop * Add Thread ID to FC hash key test=develop * Add comments to MKL-DNN FC Kernel test=develop * Refactor quantizer test=develop * Fix linter issues test=develop * Fix crash in slim googlenet test=develop * Fix PADDLE_ENFORCE messages test=develop	5 years ago
GaoWei8	234060f88f	Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972 ) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop	5 years ago
zhouwei25	345b67b5e2	remove warning LNK4006 and warning LNK4221 (#21226 )	5 years ago
Yiqun Liu	c918788ba9	Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. (#21310 ) * Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. test=develop * Print the subgraph when check failed. test=develop	5 years ago
Chen Weihang	952508527a	Polish some PE code details (#21274 ) * polish code details, test=develop * futher polish hint msg, test=develop	5 years ago
Yiqun Liu	6b1e1f0dda	Enable generating code for a given subgraph. (#21126 ) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop	5 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	5 years ago
Zhaolong Xing	65f7052554	TRT int8: refine trt int8 for dynamic range set (#21112 ) * refine trt int8 for dynamic range set test=develop * refine trt int8 test=develop	5 years ago
Chen Weihang	8da0cd537a	Add examples for error message writing specification - NotFound, OutOfRange, AlreadyExists, PermissionDenied (#21134 ) * add examples for error msg spec, test=develop * change ENFORCE to ENFORCE_*, test=develop add more already exists examples, test=develop	5 years ago
Chen Weihang	8414575b78	Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137 ) * add examples for error spec, test=develop * change ENFORCE to ENFORCE_**, test=develop	5 years ago
WangXi	de5d3ff688	Fix dgc buffer illegal & reuse velocity (#21012 )	5 years ago
Chen Weihang	826254f664	Add pre-condition check for fuse optimizer op pass (#21005 ) * add pre condition check for fuse optimizer op pass, test=develop * add log & set init to zero, test=develop * fix test_fuse_all_reduce_pass failed, test=develop * polish details, test=develop * refine PADDLE_ENFORCE & remove needless VLOG, test=develop * refactor op check method, test=develop	5 years ago
Yiqun Liu	9091f8cdf9	Support generating code for grad_op (#21066 ) * Add the definition of operation in fusion_group. * Use operations in OperationMap to detect fusion_group of elementwise pattern. * Add namespace fusion_group in code_generator. * Use operations recorded in OperationMap to generate code. * Remove implementation codes to .cc file. * Refine Operation and CodeGenerator to make it easier to generate code for grad_op. Refine the unittest for better reuse. * Avoid recording the template's keyword in a array. * Support the generating of code for grad_op and add unittest. test=develop * Remove replaced_element_in_order and use use number instead. test=develop	5 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	5 years ago
Zeng Jinle	878a40f57d	Support NoNeedBufferVarsInference in dygraph backward (#20868 ) * support no need buffer vars in dygraph, test=develop * fix inference compilation error, test=develop * update no_need_buffer_vars_inference, test=develop * add unittests for no_need_buffer_vars_context, test=develop * refine no_need_buffer_vars by return ref, test=develop * polish some codes, test=develop	5 years ago
Wilber	c534149642	fix squared_mat_sub_fuse_pass when elementwise_op input is from persistable param test=develop (#20960 ) fix squared_mat_sub_fuse_pass when elementwise_op input is from persistable param	5 years ago
WangXi	eec4fa9099	And Enforce to fuse pass for DGC doesn't support fuse for now, test=develop (#20935 )	5 years ago
hong	8c4573a3cb	GradMaker for dygraph (#19706 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * optimize grad maker; test=develop * optimize grad maker * test * grad make optim; test=develop * fix unittest bugs; test=develop * add dygraph grad op maker and split_op * grad op maker refactor; test=develop * add dygraph grad maker; test=develop * fix op deformable_conv_v1_op bug; test=develop * fix deformable_conv prroi pool bugs; * fix new op grad op maker bug; test=develop * fix split by ref bug; test=develop * fix dygraph auto prune bug; test=develop * fix test_trace bug; test=develop * fix fused emb seq pool bug; test=develop * remove useless code in op_desc file; test=develop * remove useless code, StrVarBaseNode; test=develop * fix review issues; test=develop * fix rank_loss grad maker; test=develop * remove flag in VarBase; test=develop * fix distributed_notify_op compile bug ; test=develop * fix reshape op double grad; test=develop * fix expand as op; test=develop * add impertive type_defs.h for demo_train; test=develop * fix inference lib cmake; test=develop * fix inference lib; test=develop * fix infernce_lib; test=develop * fix inference cmake; test=develop * fix inference lib; test=develop * fix inference lib; test=develop * remove condition dygraph grad maker, modify local name; test=develop * fix split grad maker bug; test=develop * fix pyramid_op bug; test=develop * change travis time out limit; test=develop * restore travis; test=develop * change timeout limit; test=develop	5 years ago
Yiqun Liu	b5f3be8330	Implement a pass detect fusion group of elementwise op (#19884 ) * Add fusion_group_pass and elementwise pattern. * Rewrite the detector of elementwise group. test=develop * Add a comment in codegen. * Add more unittest cases. test=develop * Move code_generator related code to fusion_group directory. * Correct the including path. * Add the definition of SubGraph and finish the insert of fusion_group op in pass. * Insert graph_vis_pass in tester to visualize the graph for debug.	5 years ago
wangchaochaohu	ba45dce35d	fix codetest for windows make test=develop (#20796 )	5 years ago
石晓伟	48a774c713	fix ts_sort's bug, test=develop (#20720 )	5 years ago
wopeizl	9e5948230e	add support to gcc8, add docker env test=develop (#19807 ) * add support to gcc8, add docker env test=develop	5 years ago
WangXi	cadc6a9704	fix dgc test and bug when not set trainers_endpoints_, test=develop (#20617 )	5 years ago
Pei Yang	443f604c3b	add DisableGlogInfo() to AnalysisConfig, test=develop (#20581 )	5 years ago
zhaoyuchen2018	b8333edef6	Add Multihead matmul fuse pass (#20167 ) * Add multihead fuse pass for ernie opt * Refine softmax test=develop * Refine cuda kernel * Refine cuda version * Refine cmake test=develop * refine header file * refine test case and pass * refine comments	5 years ago
Adam	7faa3e9555	Add ConvTranspose + BatchNorm fuse pass (#20161 ) * Add ConvTranspose + BatchNorm fuse pass test=develop * Add tests for conv+bn and conv_transpose+bn passes test=develop	5 years ago
bingyanghuang	9de6772510	Follow comment of Merged QAT PR 18970 (#19979 ) * Follow Wangzhen's comment in PR 18970, test=develop * Review comments, test=develop * Leave fake quantization around mul test=develop * Replace Fake with Real Quantized Mul test=develop * Fix bug in quantize placement pass Nodes in the graph now have checked type instead of node name when they are to be marked for quantization test=develop	5 years ago
joanna.wozna.intel	f5221ac19f	Disable conv requant squash (#20041 ) * Fix conv2d+dequantize squash for residual fusion test=develop * Disable conv-requant squash test=develop	5 years ago
wangchaochaohu	c9ea317b36	codegen code for reconstruction (#19728 ) * codegen code for reconstruction test=develop * fix the cmake test=develop * fix review advice test=develop	5 years ago
chengduo	101a2b610a	Add dtype for coalesce_tensor_op (#20016 ) Add dtype for coalesce_tensor_op	5 years ago
joanna.wozna.intel	3f1d0234ae	Fix conv2d+dequantize squash for residual fusion (#19545 ) * Fix conv2d+dequantize squash for residual fusion test=develop * Change condition test=develop	6 years ago
Yiqun Liu	3cd985a669	Add a pass to fuse fc+elementwise_add+layernorm (#19776 ) * Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop	6 years ago
Zeng Jinle	3fd3b663a8	fix gc bug in controlflow ops, test=develop (#19827 )	6 years ago
Zeng Jinle	db26de8389	[Bug fix] Disable memory reuse on feeded variables (#19835 ) * fix memory reuse bug on feeding variables, test=develop * add comments to reference count members, test=develop	6 years ago
chengduo	8281497030	Fix warning info of build_strategy (#19805 ) * fix warning info test=develop * fix bug of all_reduce_deps_pass test=develop	6 years ago
Yiqun Liu	c67c8758cb	Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop	6 years ago
chengduo	056fdedde3	Open fuse all reduce option (#19765 ) * Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop	6 years ago
chengduo	e506c99c20	Open fuse broadcast option (#18833 ) * fix vlog level and fuse option type test=develop	6 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
chengduo	5866a7a5fe	Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418 ) * Enable fused_all_reduce_op_handle support GPU and CPU Gradients	6 years ago
wangchaochaohu	ed8f44ea21	codegen for fused elementwise operation (#19520 ) * test=develop codegen for fused elementwise operation * fix test=develop	6 years ago
baojun	a3a4b6e570	Enable ngraph through build_strategy (#19266 ) * enable ngraph throught build_strategy test=develop * add unittest test=develop * put use_ngraph unconditional test=develop * remove paddle_enforce test=develop * remove paddle_enforce test=develop * fix copyright test=develop * limit for ngraph only test=develop	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Yiqun Liu	c5548178b0	A a pass to enable the use of cudnn (#19346 ) * Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop	6 years ago
Yiqun Liu	fcec365d29	Add a pass to replace dropout_op with scale_op when is_test is true (#19297 ) * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop	6 years ago
tangwei12	65c7368400	Fix the correctness of async mode at distributed training (#18863 ) * fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop	6 years ago
joanna.wozna.intel	2e3ec66be0	Add conv dequant squash for int8 (#18905 )	6 years ago
Tao Luo	c82280e445	remove unused conv_elementwise_add2_act_fuse.cc (#19344 ) test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop	6 years ago
liuwei1031	50582071dc	fix compilation issue in windows vs2017 (#19183 ) * fix compilation issue in windows vs2017, test=develop * fix gtest lib not found issue, test=develop	6 years ago
juncaipeng	5368b36512	remove the warning for reminding user to avoid using the OriginProgram method, test=develop (#19244 ) This log information may annoy users who don't need to care about it.	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
joanna.wozna.intel	492a00f53e	Add conv reqantize squash (#18754 ) * Add requantize squash test=develop * Add more precise tests test=develop * REname and REfactor tester test=develop	6 years ago
joanna.wozna.intel	bce72c7fea	Replace Relu with bounded Relu in MobileNetV2 quantization (#18988 ) test=develop	6 years ago
chengduo	e044e84264	open fuse_all_optimizer_ops (#19087 ) test=develop	6 years ago
chengduo	17d62ab220	Enhance fuse optimization op pass (#19010 ) * Enhance fuse optimization op pass test=develop	6 years ago
Zeng Jinle	2175d19993	fix memory_reuse_pass memory_size calculation error, test=develop (#19020 )	6 years ago
Zeng Jinle	7ac748adb4	Open gc by default (#18836 ) * open gc by default, test=develop * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop * fix conditional_block op eager deletion bug, test=develop * add some comments to reviewers, test=develop	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
Zeng Jinle	8008ab4e6b	Remove legacy C++ memory optimization codes (#18834 ) * remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop	6 years ago
chengduo	4140fe11a4	Open fuse optimization ops (#18741 ) * open fuse optimization ops test=develop	6 years ago
Zeng Jinle	a802da650b	Feature/mem opt pass refactor (#18735 ) * first version memory optimize pass, test=develop * remove move_tensor_sharing_pass, test=develop * refine code comments, add unittests, test=develop * turn off memory_optimize by default, test=develop * follow huihuang's comments, test=develop * follow chengduoZH's comments, test=develop * fix grammar error, add const qualifier, fix pass_test exception message, test=develop * follow chengduoZH's comments 2nd, test=develop	6 years ago
Zhaolong Xing	26ae6d49e4	Update trt5 for paddle-trt (#18645 ) * update paddle-trt for: 1. fix bug: when batch > 2, core in split plugin. 2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.) 3. add new attr to dropout. 4. shuffle channel, swish, relu6 support test=develop * 1. fix ci test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Huihuang Zheng	89bc3fd841	Support memory eager deletion on recurrent OP (#17710 ) Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
gongweibao	160ddc980c	Regroup fusion by date type. (#18496 )	6 years ago
chengduo	7453857324	Make fuse_all_reduce_op_pass support mix_precision (#17652 )	6 years ago
Michał Gallus	7023a86c3a	Fix Pooling output scale (#18186 ) * Int8: Fix Pooling output scale test=develop * Update scales quantization for certain operators These include: concat, transpose, pool and reshape. test=develop * Move concat minimum scale finding to quantizer test=develop	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
chengduo	14e1e165df	update alloc_continuous_space_for_grad_pass (#18287 ) test=develop	6 years ago
gongweibao	f5caf3443c	Fix reinitialized ncclid error! (#18025 )	6 years ago
gongweibao	da9143c1cc	Polish codes of old prs. (#17938 )	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
gongweibao	972c54cd70	Fix FLAGS_fuse_parameter_memory_size unit from Bytes to MBytes. (#17924 )	6 years ago
gongweibao	fbbdc9ccad	Add backward and optimizer operator dependency pass. (#17746 )	6 years ago
Yiqun Liu	8fd39f3e99	Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236 ) * Enhance fused_elementwise_activation op. test=develop * Move the api fused_elementwise_activation to contrib. test=develop * Add including files. test=develop * Add the support of sigmoid in fused_elementwise_activetion op. * Update API.spec. test=develop	6 years ago
mozga-intel	5eb81fe595	Capi for a ngraph engine (#17037 )	6 years ago
Jacek Czaja	6d8075ecef	[MKL-DNN] conv_transpose mkldnn bias pass (#17644 ) * - changes to graph detector - Changes to pass - Added ut for new pass - use_pass - Added pass to mkldnn passes - fix to registration - improved verbose messaging for conv bias passes - Lint fixes test=develop * - Lint fixes test=develop	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago

1 2 3 4 5 ...

662 Commits (d038746e1c9c20ddda5c548a3c4d666b88238116)