Paddle

Commit Graph

Author	SHA1	Message	Date
Leo Chen	9f926eb720	Layernorm opt (#29522 ) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: zlsh80826 <zlsh80826@gmail.com>	5 years ago
arlesniak	b781953ef5	[oneDNN] Fix flags use test for #29080 , assert condition more general (#29493 ) * Flags assert condition more general, print output if pattern not found * removed test_flags_use_mkldnn form skip list regarding #29080 descr	5 years ago
tangwei12	ae3f7a7100	add ps table (#29463 ) * add ps table Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178	5 years ago
chalsliu	36ec9456cf	Make PADDLE_ROOT as an environment variable	5 years ago
ShenLiang	d8391a1983	fix error message of gather nd (#29521 )	5 years ago
Zhen Wang	5ac71b36fb	Remove tensor copy in the update_loss_scaling op. (#29426 ) * remove tensor copy in the update_loss_scaling op * not use thrust. * fix some cuda memory access error.	5 years ago
Zhou Wei	e74e1a226c	support deepcopy for Layer/Tensor/Paramerbase (#29387 ) * support deepcopy for Layer/Tensor/Paramerbase * fix some code	5 years ago
joejiong	87e75a77c2	Add tangent operator (#29207 ) As the title	5 years ago
zlsh80826	95e334810a	Softmax vectorization (#29404 ) * vec softmax fw * vec softmax bw * add a message argument for compiler compatibility	5 years ago
wanghuancoder	a136c9cdb8	fix increamental coverage script bug, WITH_INCREMENTAL_COVERAGE to DWITH_INCREMENTAL_COVERAGE, test=develop (#29509 )	5 years ago
Aurelius84	966aa0e387	Fix test_mobile_net random failed on windows GPU(#29480 )	5 years ago
ShenLiang	2ef9e0e23c	Rebuild group automatically in dynamic graph distributed (#29255 ) * add tensor_indices in AssignGroupBySize * add rebuild group in reducer	5 years ago
procr	3a0558339d	support mobilenet for kunlun (#29458 )	5 years ago
Huihuang Zheng	a1909affc6	Fix Unit Test: Add Sleep Time for CUDA Retry (#29442 ) Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is a try to avoid init GPU allocation random failure in unit test.	5 years ago
Leo Chen	e5e522493d	make gelu fp16 computing more robust (#29484 )	5 years ago
LoveAn	8094ac686e	Print ccache/clcache hit rate (#29341 ) * test ccache hit statistics, test=develop * test ccache hit statistics, test=develop * add cache hit statistics, test=develop * fix no percent symbol erro on windows, test=develop * remove switch, test=develop	5 years ago
Zhang Ting	560b432349	Revert "improve elementwise_add_grad perf (#29277 )" (#29464 ) This reverts commit `befd6d5338`.	5 years ago
jakpiase	57a4f16d9e	added internal and external reorders to profiler (#29443 ) * added external reorder to profiler * added external and internal reorders to profiler * added internal and external reorder to profiler * added formatting to int/ext reorder commit * removed unnecessary comment	5 years ago
Pei Yang	2480bdef6c	change hard_swish from plugin to layer (#29177 ) * change hard_swish from plugin to layer * add ut when threshold != scale	5 years ago
taixiurong	ecca6585cd	1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448 ) Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>	5 years ago
LoveAn	03b42d9fa7	fix unittest on windows, test=develop (#29365 )	5 years ago
TTerror	a5fcc4b545	update reduce_sum op on xpu (#29367 ) * update reduce_sum op on xpu * update reduce_sum op on xpu * support running on xpu	5 years ago
Jack Zhou	c7cada8571	Fix gru performace decline in 1.8.5 (#29455 )	5 years ago
Zhang Ting	6296f4ed09	revert cast eigen kernel (#29427 )	5 years ago
Leo Chen	a040c055a5	fix layer_norm accuracy (#29434 )	5 years ago
Zhou Wei	24ba9ed436	fix that parameters'grad has grad var (#29408 )	5 years ago
Leo Chen	4e19ce1df5	refine reshape grad and double grad kernel, use tensor copy async (#29128 )	5 years ago
Shang Zhizhou	225a9c4ed8	Fix unittest (#29412 ) * fix tensorrt unittest precision error * fix unittest precision error. test_trt_subgraph_pass && test_trt_dynamic_shape_transformer_prune	5 years ago
Pei Yang	f860de4af7	support clip op trt converter (#29411 )	5 years ago
Jack Zhou	1dd7b97b66	fix rnn_op bug in cudnn_version>= 8 (#29406 )	5 years ago
LoveAn	671555ed32	Compiling operator libraries with Unity build (#29130 ) * Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop	5 years ago
Zhou Wei	5c9bd0bf7c	print whether has build cache (#29035 )	5 years ago
cc	a623ce044f	Use different name_scope for different conv type, test=develop (#29355 )	5 years ago
yongqiangma	7c508d8668	update unbind norm add CUDAPlace api doc information (#29322 ) * enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop * fix format. test=develop * format fix. test=develop * add lod_rank_table. test=develop * fix format. test=develop * fix doc info. test=develop * fix np error * add unbind dygraph api. test=develop * fix unbind doc.test=develop	5 years ago
chentianyu03	879e913b6d	Make transpose, trace, kron, reshape, sum op support complex type (#29321 ) * add complex64 and complex128 type; add +-/@ and slice opreator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases	5 years ago
卖鱼的哲学	074065e5de	fix expand/uniform_random && concat/transpose to new api on xpu (#29280 ) * fix expand && concat/transpose to new api * update uniform_random_op * update xpu_header	5 years ago
lilong12	1decf4ada6	update, test=develop (#29331 )	5 years ago
QingshuChen	74bf3bed36	support global pooling for kunlun (#29293 ) * test=kunlun	5 years ago
liym27	b10ecd9d3a	[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )	5 years ago
Chen Weihang	9ad800ebb2	Support type promote for basic math ops (quantum required) (#29265 ) * basic impl of type promote * add comment & another testcase * fix complex bugs & support python op promote type * fix failed unittests & polish code * add unittest for coverage * change to only promote complex type * polish code details * polish several comments	5 years ago
tangwei12	8358791607	fix gpu outofrange (#29238 ) * fix gpu emb out of range Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf * fix doc Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf	5 years ago
Leo Chen	b58cfff89d	use has_grad instead of train_mode (#29309 ) * use has_grad instead of train_mode * add vlog for debug * fix ut * fix ut	5 years ago
Zhang Ting	befd6d5338	improve elementwise_add_grad perf (#29277 ) * improve performance of elementwise_sum_grad	5 years ago
Shang Zhizhou	ebf689197d	fix tensorrt output shape error (#29308 ) * fix tensorrt output shape error * fix unittest tensorrt_engine_op_test * fix code style for unitest	5 years ago
Aurelius84	67c700b479	[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421 )	5 years ago
ShenLiang	696dc4bb13	fix the warning of reducer (#29323 )	5 years ago
wangchaochaohu	c4be80f402	polish the code of cumsum and remove some unused code (#29303 )	5 years ago
ShenLiang	c00af94435	fix matmulv2 for windows (#29302 )	5 years ago
wanghuancoder	3765da98c7	add coverage incremental switch, test=develop (#29290 )	5 years ago
Wilber	d68af02c04	fix analysis_config bug. (#29304 )	5 years ago
ShenLiang	0fb18bc214	enforce the matmul_v2 error message (#29297 )	5 years ago
Zhen Wang	9b59a589b1	Remove some useless log. (#29300 )	5 years ago
Leo Chen	13a22a3752	fix shape of tile_grad op (#29289 )	5 years ago
Zhen Wang	be3777a50a	Add pure fp16 training with master weights. (#27712 ) * add the weight decay func for the momentum op * Add the multi_precision function in Momentum Optimizer. * Make sure that the initial value of master weights are same with the fp16 weights. * add static loss scaling. * add the rescale_grad function in the pure fp16 training. * use the original momentum updating method. * Polish some codes, such as variable names. * add docstring for apis. * update the var creation details of _create_master_weight. * not modify codes about imperative momentum updating. * Fix the error of test_dist_sparse_tensor_load_momentum UT. * add unit test for multi precision fp16 training. * add more unit tests for CI. * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT. * For CI Coverage Checking.	5 years ago
Wojciech Uss	6673fb0565	change import math.h to cmath (#29260 )	5 years ago
furnace	7584bb5096	Layer norm fp16 (#29169 ) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: zhiqiu <chenqiuliang@baidu.com>	5 years ago
Shang Zhizhou	c59b4f28a2	fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF (#29275 )	5 years ago
Shang Zhizhou	fc80d2e09c	add compile option WITH_TENSORRT (#29208 ) * add compile option WITH_TENSORRT * add WITH_TENSORRT to ci paddle_buils.sh * add WITH_TENSORRT to paddle_build.sh * change FATAL to WARNING when TensorRT is not found and WITN_TENSORRT=ON, just to pass ci-py3 temporarily	5 years ago
Leo Chen	116305ea4b	Improve performance of elementwise_add grad op (#29187 ) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut	5 years ago
卖鱼的哲学	07c67d5a8b	add deformable_conv op on xpu (#29234 ) * rebase develop * update deformable_conv op on xpu * update deformable_conv op on xpu	5 years ago
Chen Weihang	1de32f823d	Hot fix complle failed in gcc4.8 caused by complex impl (#29254 ) * hot fix complle failed in gcc4.8 * fix failed unittest	5 years ago
GeminiCarrie	642abe2a48	Fix a bug when running on an operating system without "bash." (#29131 ) * Fix a bug when running on an operating system without "bash." * add execution condition * for ci-coverage	5 years ago
ShenLiang	46b73e6cd9	Change the api of DataParallel and Fleet (#29224 )	5 years ago
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implemetation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remote softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unitest * test=kunlun * fix elementwise_grad bug for kunlun *test=kunlun	5 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice opreator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice opreator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	5 years ago
Zhou Wei	c0a991c874	accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429 ) * The leaf tensor concept is exposed and the gradient accumulation of leaf tensor * The leaf tensor concept is exposed and the gradient accumulation of leaf tensor * fix coverage * fix api doc * fix CI unittest * fix CI unittest * fix unitest * empty tensor does’t need inner_var_ * fix some error message	5 years ago
Wilber	74c43ac638	fix lite unit test. (#29233 )	5 years ago
Adam Osewski	4096ff94dc	Small optimizations for conv2d kernel subroutines. (#29188 ) - Make sure that oneDNN memory descriptors are created only once at first iteration.	5 years ago
joanna.wozna.intel	5c61eeef61	Enable all image classification models (#29155 )	5 years ago
Wilber	4fec182d24	[Lite-Subgraph] Fix compile error for lite subgraph. (#29146 )	5 years ago
123malin	b5c6342336	Update ps gpu (#29209 ) * fix paramete prefetch & device guard Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: chengmo <chengmo@baidu.com>	5 years ago
liym27	865a45984f	Check whether there is any inplace operation affecting gradient calculation. (#27901 ) * Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable. * Add a new attribute `_inplace_version` for VarBase. * Raise exception if an inplace operation can result in incorrect gradient computation. * Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation. * For api assign, call _bump_inplace_version() when it's an inplace operation inn dynamic mode. * Use original var_wrapper if the inplace_version is not changed. * Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performane.	5 years ago
chen zhiyu	4056c4f11c	Add unittest in musl build (#29099 ) * add musl docker build script * rm space test=document_fix * fix some docs and types errors test=document_fix * move install of python requirement to docker build * add copyright to docker file. * add extr opts * format docs * add ut test add pip cache * add more args description in readme * add stack backtrace in ctest * fix readme bugs	5 years ago
123malin	03d4665f44	prefetch optimize (#29095 ) * test=develop, optimize async prefetch	5 years ago
WangXi	0c2a51d240	optimizer amp, all use fp16 communication, overlap last comm and compute (#28957 )	5 years ago
Chen Weihang	0b032faeee	Polish unittests details and execution conditions to adapt to MUSL (#29044 ) * fix failed tests in yingchun gived list * add unittests into static_mode_white_list * add enable static * fix dist unittest * skip test_sigmoid_focal_loss_op & add gym * revert no need skip unittests * remove gym	5 years ago
123malin	92817f8005	test=develop, rm pathlib (#28658 ) * test=develop, rm pathlib	5 years ago
Wojciech Uss	4fd4095d1b	Add quantization of multi_gru op and tests (#28615 )	5 years ago
Jack Zhou	bc6033f86b	fix gru gcc7.4 bug for the gru compile	5 years ago
wanghuancoder	0239f79695	Generate code coverage reports only for incremental files (#28508 ) * Generate code coverage reports only for incremental files, test=develop * Generate code coverage reports only for incremental files, test=develop * Generate code coverage reports only for incremental files, test=develop * test for diff python file, test=develop * fix no python diff report, test=develop * add cc test file, test=develop * fix bug in generic.cmake, test=develop * for debug no cc report, test=develp * modify compire branch form test_pr to test, test=develop * fix bug, test=develop * test for h file changed, test=develop * debug for redefinition of argument optimize error, test=develop * close -o3 for test, test=develop * remove -o3 for test, test=develop * remove coverage option for nvcc, test=develop * use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop * reopen -o3, test=develop * remove debug code, test=develop * remove unused code, test=develop	5 years ago
wangchaochaohu	b818429ae7	optimize cumsum OP (#29193 )	5 years ago
ShenLiang	e2d01eb650	Support dynamic graph distributed (#28997 ) * add reducer * refine envent for memorycopy * add concat&split for allreduce * apply concat & split for fuse tensor * fix nccl dep * fix the untest, compile problem and ddp initialize problem * fix untest for mac & add some comments & solve the repeated param in sublayers * fix untest for windows & fix document	5 years ago
lilong12	7e5e9934fe	update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020 ) * update, test=develop	5 years ago
pangyoki	7c8ac064c8	Delete prettytable in condabuild (#29145 ) * update conda_build script with removing opencv * modified filepath * modified some content * Delete Commented-Out Code * delete prettytable in conda_build Co-authored-by: XieYunshen <1084314248@qq.com>	5 years ago
Zhou Wei	e668cb07fb	fix CUDA 11 error on windows (#29101 )	5 years ago
Jack Zhou	085260f3de	Add eigen gru and fix the dropout bug in the rnn	5 years ago
yaoxuefeng	545df287fc	add user_define_dump (#28596 )	5 years ago
Aurelius84	71815637cc	Move gym into unittest/requirements.txt (#29149 )	5 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	5 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * comile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix comile errror when CUDA_ARCH_NAME < Pascal" * fix comile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	5 years ago
Leo Chen	fd3fcb051a	fix typo of flag name (#29154 )	5 years ago
Noel	da71173bc9	Fix ops doc for some ops	5 years ago
Leo Chen	770395cb93	Split train_mode and has_grad for tracer (#29064 ) * split train_mode and has_grad * fix format * fix ci problems * fix sample code	5 years ago
Aurelius84	7ae3cb554a	Polish CUDA Information stdout (#29109 )	5 years ago
chalsliu	7a15e64034	Support precision test for new ut	5 years ago
WangXi	173c22aec2	optimize fast graph executor (#28962 )	5 years ago
Shang Zhizhou	562ded1041	fix unittest trt_dynamic_shape_transformer_prune_test error (#29122 )	5 years ago
Shibo Tao	db41258501	add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. (#29034 )	5 years ago
joanna.wozna.intel	b0d1ac161e	Add bf16 pool2d and unify bf16 unit tests (#29039 ) * Add bf16 pool2d and unify bf16 unit tests * Add change default ops test	5 years ago
joanna.wozna.intel	fddea67445	Fix cpu_bfloat16_pass (#28730 ) * Fix cpu_bfloat16_pass * Add output_format * Fix incorrect SetOutput * Change fromating	5 years ago
Qi Li	2fd16cf6fc	fix win ci failure, test=develop (#29089 ) * fix win ci failure, test=develop * add ci test, test=develop	5 years ago
Chen Weihang	fea0e294ee	Hide the C++ stack by default and add hints (#29042 ) * default not show cpp statck & add hint * fix failed unittest * fix failed unittests	5 years ago
Chen Weihang	b1274ac3d6	set show cpp stack by default, test=document_fix (#29102 )	5 years ago
joejiong	582c0a0468	add uint8 for reshape op (#28996 ) add uint8 for reshape operator	5 years ago
Zhou Wei	8ca0a8a859	fix tensor detach to zero copy (#27921 ) * fix tensor detach to zero copy * fix tensor detach to zero copy	5 years ago
Aurelius84	8af0d85ea4	fix unittest failed on windows GPU (#29072 )	5 years ago
taixiurong	a5aa4dc7a9	add xpu elementwise ops (#29031 )	5 years ago
joejiong	b04c78ef5e	Update pow (#29000 ) Simple code clean up	5 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	5 years ago
Wojciech Uss	7b5a8e46de	Add multi_gru_fuse_pass and tests (#28601 ) * Add multi_gru_fuse_pass and tests * fix date * cleaned up headers	5 years ago
LoveAn	c91bb084f4	Add op benchmark ci pipeline in Paddle repo (#28692 )	5 years ago
Zhou Wei	5e26a15484	Open GPU unitest on windows (#29003 ) * open unittests on windows * open GPU unittest on windows	5 years ago
Leo Chen	3815d7aa40	Upgrade string literals to raw string (#28989 ) * upgrade comment string to raw string * fix string in * fix string with ' ' * revert update on comments * upgrade only necessary * fix sample code checker * fix comments with '''	5 years ago
lilong12	767d0ba267	update, test=develop (#28700 )	5 years ago
Wojciech Uss	991345b368	Add multi_gru_seq_fuse_pass and tests (#28604 ) * Add multi_gru_seq_fuse_pass and tests * fix date * removed unused functions	5 years ago
123malin	fbf9564f6b	[paddle.distributed.fleet] Optimize ParameterServer's Async Mode (#28442 ) * test=develop, optimize global_step	5 years ago
lilong12	f77a78cdee	enable pipeline to run with Executor.run() (#28373 ) * update, test=develop	5 years ago
Thunderbrook	0073f9bdb0	support ps-gpu (#28752 ) * ps gpu transpile * ps gpu * remove op * gps trainer * local ps * add macro * HeterBox * def cuda * tab * code style * style Co-authored-by: Thunderbrook <a754913769#163.com>	5 years ago
Chen Weihang	768dab441e	polish two api doc detail, test=document_fix (#28971 )	5 years ago
furnace	8ff3550658	refactor momentum op to combine weight (#27414 ) * refactor momentum op to combine weight_decay (scale op and sum op)	5 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	5 years ago
chen zhiyu	3d0ff8eebc	optimize musl docker build script (#28974 ) * add musl docker build script * rm space test=document_fix * fix some docs and types errors test=document_fix * move install of python requirement to docker build * add copyright to docker file. * add extr opts * format docs	5 years ago
Pei Yang	994673bf4f	change avg pooling and global pooling to trt layer in dynamic shape mode (#28702 ) * change avg pooling and global pooling to trt layer * add support for static shape global pooling * modify trt errmsg	5 years ago
yaoxuefeng	71c1cd1408	fix truncated_gaussian seed (#28777 )	5 years ago
HappyAngel	de528981e5	fix paddlepredictor build error. test=develop (#28792 )	5 years ago
Wilber	a22ea652cf	fix trt delete_pass bug. (#28763 )	5 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	5 years ago
Chen Weihang	b969c32ab1	fix occupied 0 device memory bug (#28771 )	5 years ago
joejiong	1a532d5133	add uint8 support for squeeze operator (#28734 ) Adding uint8 support for squeeze operator.	5 years ago
wangchaochaohu	8b853b3030	fix the number of perf algo for conv cudnn in exhaustive mode (#28694 )	5 years ago
joanna.wozna.intel	8c0ea4bffe	Add bf16 matmul, fc, elementwise add and mul (#28729 ) * Add bf16 matmul, fc, elementwise add and mul * Correct unit test	5 years ago
Wojciech Uss	efc3b182f0	a fix for the fc_lstm_fuse_pass (#28709 )	5 years ago
Zhou Wei	3b0dd5f620	fix bug that to_tensor not support paddle.Place (#28717 )	5 years ago
yaoxuefeng	08b62f4902	fix shuffle batch op shuffle (#28533 )	5 years ago
taixiurong	d3d1a6b6e0	add kunlun kernel: slice, slice_grad, top_k, cast. test=kunlun (#28542 ) 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api * 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api	5 years ago
Jack Zhou	9362d85e0e	Add LSTM, Simple RNN and GRU CPU kernel (#28577 ) * add lstm, simple rnn op kernel * fix the test_lstm for the rnn op * change func name * fix forward postprocess bug * add gru forward, backward code * remove unittest.skipIf; use a big rnn op instead of combination op * fix input doesn't have gradient bug * add eigen lstm forward, backward Co-authored-by: wawltor <fangzeyang0904@hotmail.com>	5 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unitest * test = kunlun * update xpu unittest * test=kunlun * update xpu unitest * test=kunlun	5 years ago
Zhang Ting	dab4920568	improve performance of cast op (#28727 )	5 years ago
Zhou Wei	3a88acd2ee	open unittests on windows (#28750 )	5 years ago
yaoxuefeng	03f46e3526	fix truncated_gaussian op cuda seed setting (#28678 )	5 years ago
Wilber	04cefeacc5	Disable windows gpu static lib. (#28741 )	5 years ago
Wojciech Uss	04bcc13fac	Add multi_gru op and tests (#28591 ) * Add multi_gru op and tests * removed redundant disable_dygraph()	5 years ago
wanghuancoder	5aec7dbeb0	use forward declarations for framework.pb.h (#28494 ) * use forward declarations for framework.pb.h, test=develop * use forward declarations for framework.pb.h, test=develop	5 years ago
iducn	f1074e3b19	hide the token output to safely (#28716 )	5 years ago
joejiong	32b90b1c2d	add log10 (#28576 ) Add new operator log10	5 years ago
Leo Chen	3d09929b1f	Add check for non-dispensable input (#28666 ) * Add check for non-dispensable input * fix typo	5 years ago
Chen Weihang	7eeb99fe02	Add basic hook classes for dygraph & implement reduce hook (#28584 ) * add base hook classes and reduce hook impl * fix constructor typo * polish comment format * refactor baisc hook class design * polish design details	5 years ago
Guo Sheng	858ffa0c8b	Fix the dropout setting when not initialized in rnn_op. (#28561 ) test=develop	5 years ago
Jacek Czaja	6d8d3d4c22	[oneDNN] Layer norm bf16 kernel (#28619 )	5 years ago
lilong12	80d2024644	bug fix, test=develop (#28674 )	5 years ago
Zhou Wei	bf143652ac	fix lstm OP compile error on windows (#28667 ) * add unittest and check unittest for windows * fix lstm OP compile error on windows	5 years ago
石晓伟	57dab959ca	add datanorm op new scale_w register (#28657 ) Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	5 years ago
cc	65aac81191	Fix fake_quant error when cout > 1024, test=develop (#28603 )	5 years ago
lilong12	b2f7ab6636	bug fix, test=develop (#28648 )	5 years ago
wawltor	8f2656ef5c	fix the gradient bug for the topk v2	5 years ago
wangchaochaohu	a972c33fd7	refine gather OP performance for dynamic mode (#28587 )	5 years ago
joanna.wozna.intel	2cb71c0cde	Add checkpoint to quantize (#28612 ) * Add checkpoint to quantize * Change bfloat16 option	5 years ago
lidanqing	804271cff9	Op version python mkldnn_inplace test (#28354 ) * add mkldnn inplace op version test * update mkldnn_inplace fuse pass * update the inplace test	5 years ago
pangyoki	b889a0cee2	add gaussian_random op_version (#28602 )	5 years ago
YUNSHEN XIE	cf2c42a937	fix exec nightly error on mac (#28567 )	5 years ago
Guo Sheng	110febdc54	Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622 ) * Fix gradients with ignore_idx in softmax_with_cross_entropy. test=develop * Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu. Remove softmax_with_cross_entropy from op_threshold_white_list. test=develop * Fix test_softmax_cross_entropy_op.py. test=develop	5 years ago
Wilber	8b97bb2e1f	Update cmake for arm ft and fix a bug for Predictor dtor. (#28586 )	5 years ago
Leo Chen	f962bd3432	Fix cudnn workspace limit in cudnn-8 (#28611 )	5 years ago
Leo Chen	90805e2df7	Register op_version for new attribute use_addto (#28463 ) * register op_version for addto * upgrade pass capability * change eq to le * change eq to le * fix merge	5 years ago
danleifeng	a24d186814	fix nccl init failed in parallel dygraph mode (#28497 )	5 years ago
Zhou Wei	93c39779b4	open a part of GPU unittest for windows (#28378 ) * open a part of GPU unittest for windows * open a part of GPU unittest for windows	5 years ago
lilong12	ed9dd7c9f0	add send and recv ops (#28590 ) * update, test=develop	5 years ago
Zhong Hui	a829357e4d	register the op version for some ops	5 years ago
Zhou Wei	bf6e7cba7a	updata 2.0 API english doc (#28525 ) * make Numpy version is below 1.19.3 * fix 2.0 doc	5 years ago
YUNSHEN XIE	7b1619e69b	disable test_trt_dynamic_shape_transformer_prune,test=document_fix (#28588 )	5 years ago
Zhou Wei	849467b5aa	fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547 )	5 years ago
Shang Zhizhou	8699f38d08	TRT support for pruned transformer models; fix the bug that TensorRT does not support DeletePass (#28517 ) * skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie	5 years ago
joejiong	08d2413142	add log2 operator (#28319 ) As the title	5 years ago
lidanqing	0fc181dbd0	[Fix bug] If the pass name is not found, IsCompatible should return false (#28475 )	5 years ago
Wilber	1bf4836580	[Inference] Add TryShrinkMemory interface. (#28409 )	5 years ago
wangchaochaohu	c52fe48f6f	fix the GetKernelTypeForVar of input for fluid.gather (#28534 )	5 years ago
wangchaochaohu	d7cfee9b31	Checkout point add (#28488 ) * upgrade pass capability	5 years ago
YUNSHEN XIE	98dc11bb6a	add monitoring for executive ut at night (#28377 ) * add monitoring for executive ut at night * fix some error for paddle_build.bat * fix some error * fix some error in windows * fix some error on windows	5 years ago
Pei Yang	75196cda40	Paddle-TRT int8 support mul op channelwise quant (#28422 ) * paddle-trt support mul channelwise quant * add support for depthwise_conv2d * add errmsg for unsupported op type	5 years ago
zhupengyang	47cbf61dd4	fix softmax unittest float16 random error (#28480 )	5 years ago
Zhou Wei	53e9aa948d	remove diff with develop (#28504 )	5 years ago
YUNSHEN XIE	369605be1d	fix cmake error when execute build_inference_lib (#28503 )	5 years ago
Wilber	645e999afc	fix api_impl test. (#28483 )	5 years ago
YUNSHEN XIE	1e698c600e	fix cmake error when setting ut timeout properity (#28492 )	5 years ago
wangchaochaohu	e14ed71cc2	refine the performance of gather Op (#28458 )	5 years ago
wanghuancoder	e29ab5eacb	clear clcache cache file and reopen clcache (#28384 ) * clear clcache cache file and reopen clcache, test=develop * reopen clcache, test=develop	5 years ago
YUNSHEN XIE	ba0756325a	exec ut no more than 15s 1 (#28439 ) * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix * test for limiting ut exec time as 15S * fix an error caused by cannot find ut * fix some error * can not find test_transformer * fix error caused by ut not run in windows * fix error caused by Compiler Options * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt * setting timeout value to 120s for old ut * add the timeout value setting * fix error caused by ut only run in coverage_ci * add analyzer_transformer_profile_tester * fix some error * fix some error * fix error with inference option * fix error with inference option setting as ON_INFER * add some ut to set timeout * modified some option * fix error * fix some timeout error * fix error * fix error * fix timeout for test_analyzer_bfloat16_resnet50 * fix error * setting timeout properity for some ut * first pr for new ut timeout as 15S	5 years ago
Chen Weihang	155b4f9b6c	Remove selected rows all reduce over height check (#28460 ) * remove slelected rows all reduce over height check * polish unittest	5 years ago
taixiurong	fad4744aa4	fix crash in adam in xpu, *test=kunlun (#28433 )	5 years ago
QingshuChen	6bba8e57b1	fix batch_norm_xpu bug & remove xpusimulator dependence (#28430 ) *test=kunlun	5 years ago
Wilber	ced5c40c41	Update memory release interface. (#28456 )	5 years ago
joanna.wozna.intel	7821759d48	Add bfloat16 softmax and gelu (#28394 ) * Add bfloat16 softmax and gelu * Add pass attr bfloat16_enabled_op_types * Changes from review	5 years ago
iducn	ba0fe0a812	revert the modified shell script (#28453 )	5 years ago
Chen Weihang	c42e656179	Add retry for dygraph parallel socket bind (#28404 ) * add retry for dygraph parallel socket bind * change to loop always * fix writing error	5 years ago
石晓伟	c41fd033e5	check op_version_registry in CI test, test=develop (#28402 )	5 years ago
Jacek Czaja	ca41541472	[oneDNN]Sum bf16 kernel (#28382 ) * - Added sum bf16 oneDNN test=develop * - Fix to UT of sum bf16 test=develop	5 years ago
Chen Weihang	23439b1688	show cpp stack when catch signal (#28415 )	5 years ago
Leo Chen	44a476c2ab	support cuda pinned place (#28416 )	5 years ago
lidanqing	12b9587be5	Add conv_bias pass version python test (#28278 ) * add conv_bias pass version test * update according to reviews	5 years ago
Wilber	05114693cf	[Inference] Memory modification for ShrinkMemory. (#28355 )	5 years ago

18180 Commits (53bb126510aa8bd6aefbc187052d720feb2f03ef)