Paddle

Commit Graph

Author	SHA1	Message	Date
wangchaochaohu	f350aa59ff	Fix the compiler error for half type (#29799 )	4 years ago
wuhuanzhou	27aa15150c	Add approval for PR-CI-OP-benchmark (#29797 ) * Add approval for PR-CI-OP-benchmark, test=develop * dont show token in log, test=document_fix	4 years ago
Huihuang Zheng	1cbb282d77	Add Retry Logic to CublasHandlerHolder Add Retry Logic to CublasHandlerHolder to avoid random unittest failure.	4 years ago
LielinJiang	e5af650b71	Add double grad for conv_transpose (#29706 ) * add double grad for conv_transpose	4 years ago
Leo Chen	224f3bcbb1	format code (#29714 )	4 years ago
LoveAn	2e5b4a216c	Optimize compilation time with Unity Build (#29733 ) * Test compilation time with less parallel count, notest, test=windows_ci * optimize rules of Unity Build, notest, test=windows_ci, test=windows_op * limit parallel counts used only on GPU, test=develop * remove limit of argument /m:8 on Windows, test=develop	4 years ago
Zhang Jun	0c23ba95d8	enable MakeCiper api for inference;test=develop (#29692 )	4 years ago
wangchaochaohu	7b2dc4e6b1	optimization for fp16 elementwise add (#29744 )	4 years ago
chalsliu	27bdbec7fc	Refine precision test print message	4 years ago
chalsliu	e63a68feac	Retry when download failed for precision test	4 years ago
Jacek Czaja	07790ba13e	[oneDNN] Reimplemented elementwise_add grad (#29747 ) * - Reimplemented elementwise_add grad - lint * - fix after review * - Fix to fix after review	4 years ago
Aurelius84	17c8e3adfe	Polish code in gpu_launch_config.h (#29730 )	4 years ago
wangchaochaohu	068d905e1e	fix the shape choose of vectorize for cuda	4 years ago
syyxsxx	7c2affaa26	fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626 ) fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug	4 years ago
石晓伟	8bd2879ef7	update the operator registration for incompatible upgrade, test=develop (#29720 )	4 years ago
chentianyu03	71063b8137	add conj op for complex types (#29527 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number	4 years ago
Wilber	b593d588aa	[Inference] EnableUseGpu has higher priority than flags (#29697 ) * enable_use_gpu has higher priority than FLAGS * update.	4 years ago
WangXi	9cbcc6cadc	fleet sync build strategy, test=develop (#29732 )	4 years ago
wanghuancoder	0c59ad2a1a	Windows generate pdb and dump, for debug (#29628 ) * Windows generate pdb and dump, for debug * fix code style, test=develop * modify cmakelist	4 years ago
Huihuang Zheng	4c4d4ba5e0	Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617 ) Modify CublasHandleHolder from using PADDLE_ENFORCE_CUDA_SUCCESS to PADDLE_RETRY_CUDA_SUCCESS to fix random unittest failure. We checked that the unittest log showed CUDA allocation error at this file, which may due to GPU not enough. We fixed similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.	4 years ago
Chen Weihang	6cfa59de1b	[Complex] Add real & imag op and api for complex tensor (#29672 ) * add complex real op & api & unittest * add imag op & api & unittest * refactor op impl * revert simplify writing due to complile failed * polish details * polish grad op code	4 years ago
Jacek Czaja	9eff1a674f	Added missing format of oneDNN (#29670 )	4 years ago
wangchaochaohu	2e0d1ed00f	delete the code for fp16 optimization because it is not faster than common template code (#29715 )	4 years ago
TTerror	af8ded773a	update activation op on kunlun (#29577 ) * fix expand && concat/transpose to new api * update xpu_header * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * add nearest_interp on kunlun * update error message	4 years ago
ceci3	cc387159f3	add pad and concat double grad (#29549 ) * add constant pad double grad	4 years ago
liuyuhui	f13c3a9cd7	[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337 )	4 years ago
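Y_Xuan	76738504ad	Add ROCm platform support code (#29342 ) * Add ROCm platform support code * Fix some issues * Resolve some ambiguities and add comments * Fix code format * Update code after resolving conflicts * Modify operators.cmake * Fix format * Fix errors * Unify interfaces * Update the date	4 years ago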
Zhang Ting	1e9127f688	improve dropout grad (#29605 ) * improve grad perf	4 years ago
wangchaochaohu	eab44e1f32	refine (#29622 )	4 years ago
WangXi	613c46bc07	fix gen_nccl_id_op_helper compile failed, test=develop (#29614 )	4 years ago
chen zhiyu	f5f8809c1a	1. add python version selection 2.add dynamic flags setting. (#29612 )	4 years ago
YUNSHEN XIE	2926e74326	New UT should not exceed 15s (#29492 ) * added UT should not exceed 15s * fix error * UT limit of 15s is the first to be executed * fix error * fix error with CI_SKIP_CPP_TEST * modfied tiemout setting * fix error	4 years ago
Chen Weihang	f02aece1f0	Add complex dtype op (add) test example (#29603 ) * add op test case for complex * polish code details * add xpu set constant support * fix argument rror * remove useless pyc file	4 years ago
AshburnLee	efea540ca9	Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732 )	4 years ago
lijianshe02	7779768b53	add transpose double grad test=develop (#29600 ) * add transpose double grad test=develop	4 years ago
wangchaochaohu	1b69e528d3	optimize for long width for elementwise (#29602 )	4 years ago
Wilber	78dad78610	fix none-contiguous bug for python api. (#29615 )	4 years ago
Zhou Wei	18f9df0da4	fix cache pip error (#29618 )	4 years ago
ShenLiang	1efef8baed	Fix bug of matmul_v2 for broadcast case (#29599 ) * fix bug of matmul_v2 for broadcast	4 years ago
qingqing01	8d549fc85d	Add clip double grad (#29590 )	4 years ago
wangchaochaohu	ac4bae8ee9	elementwise_add_grad Op optimization (#29575 )	4 years ago
arlesniak	62d4483649	Added verbose oneDNN lib version (#29378 )	4 years ago
lilong12	ff6a145011	update, test=develop (#29559 )	4 years ago
WangXi	467c716963	gen nccl id use socket (#29431 )	4 years ago
tangwei12	0034273b7e	add service (#29560 ) * add service, remove ut on mac * fix heter_profiler & add heter stop method * fix code style	4 years ago
Leo Chen	c0163837a5	Fix compile problem when cuda_arch < 6000 (#29576 ) * fix compile problem when cuda_arch < 6000 * refine code * refine code	4 years ago
QingshuChen	79a41a9ed6	support roi_align & affine_channel for kunlun (#29561 ) * support roi_align & affine_channel for kunlun * minor	4 years ago
Huihuang Zheng	831e9135b9	Fix Windows Unittest (#29543 ) Fix 3 Windows Unittests test_fuse_all_reduce_pass: Paddle cannot run multiple-GPU on Windows so set single visible GPU flag test_feed_data_check_shape_type: Paddle cannot run multiple-GPU on Windows so set single visible GPU flag test_tsm: Winodws GPU size is not enough so decrease batch size and data size.	4 years ago
Jacek Czaja	f6cca62575	[oneDNN] Making ThreadID info in caching key optional (#29272 )	4 years ago
GeminiCarrie	08f24a3108	Fix precision problem (#29567 ) * Fix a bug when running on an operating system without "bash." * add execution condition * for ci-coverage * get cpu information to check the precision problem * Update compilation environment for musl version * update dependencies * remove test code check cpu info remove test code review * update alpine and third_party denpendencies * add newline for ci Code format	4 years ago
Wilber	740c0d58c3	update for xpu ci. (#29568 )	4 years ago
JZ-LIANG	d33d468f02	[Sharding] add hybrid-dp feature (#29518 ) * Sharding add hybrid-dp feature * update sharding in distributed_strategy * update sharding unitest * revise code format for sharding	4 years ago
Leo Chen	1e72e03217	remove duplicated macro (#29563 )	4 years ago
Zhang Ting	6702040e94	improve dropout (#29465 ) * improve drop out * add VectorizedRandomGeneratorWithGenerator * fix bug * modify according to comments	4 years ago
Zhang Ting	30d9589afe	add cast cuda kernel (#29352 )	4 years ago
LoveAn	b5d4a1f33d	Add the strategy of skipping cc/cu test compilation and execution in CI (#29499 ) * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop * fix if error with CI_SKIP_TEST, test=develop * fix add properties to test error on Linux/MAC, test=develop * fix set test properties of test_code_generator error, test=develop * remove test codes and advance judgment of file modification on Linux, test=develop * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix * Add branch judgement on Linux, test=develop	4 years ago
Aurelius84	2a42250699	Polish hash function of executor cache key (#29556 ) * Add more value to calculate hash key * fix size_t * polish code	4 years ago
taixiurong	760d015c14	add xpu ops for training transformer in kunlun (#29539 ) * 1.fix matmul bug 2. add one hot * add xpu error msg	4 years ago
Jacek Czaja	83a693ee55	[oneDNN] Added Unit Test for Multiple instances prediction (#29501 ) * - Added infrastructre for new test - Added UT for Multiple models prediction - cosmetic fixes - lint - lint fixes * - Removed timeout for MMP test	4 years ago
joanna.wozna.intel	0ce6d7fa77	Fix bf16 activations test for softmax and gelu (#29502 ) * Fix bf16 activations test for softmax and gelu * Resolve conflict	4 years ago
Zhong Hui	60bfd308ab	fix p_norm with empty shape (#29500 ) fix p_norm with empty shape (#29500)	4 years ago
Zhou Wei	b9e926b8e5	change the code format (#29550 )	4 years ago
Leo Chen	9f926eb720	Layernorm opt (#29522 ) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: zlsh80826 <zlsh80826@gmail.com>	4 years ago
arlesniak	b781953ef5	[oneDNN] Fix flags use test for #29080 , assert condition more general (#29493 ) * Flags assert condition more general, print output if pattern not found * removed test_flags_use_mkldnn form skip list regarding #29080 descr	4 years ago
tangwei12	ae3f7a7100	add ps table (#29463 ) * add ps table Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178	4 years ago
chalsliu	36ec9456cf	Make PADDLE_ROOT as an environment variable	4 years ago
ShenLiang	d8391a1983	fix error message of gather nd (#29521 )	4 years ago
Zhen Wang	5ac71b36fb	Remove tensor copy in the update_loss_scaling op. (#29426 ) * remove tensor copy in the update_loss_scaling op * not use thrust. * fix some cuda memory access error.	4 years ago
Zhou Wei	e74e1a226c	support deepcopy for Layer/Tensor/Paramerbase (#29387 ) * support deepcopy for Layer/Tensor/Paramerbase * fix some code	4 years ago
joejiong	87e75a77c2	Add tangent operator (#29207 ) As the title	4 years ago
zlsh80826	95e334810a	Softmax vectorization (#29404 ) * vec softmax fw * vec softmax bw * add a message argument for compiler compatibility	4 years ago
wanghuancoder	a136c9cdb8	fix increamental coverage script bug, WITH_INCREMENTAL_COVERAGE to DWITH_INCREMENTAL_COVERAGE, test=develop (#29509 )	4 years ago
Aurelius84	966aa0e387	Fix test_mobile_net random failed on windows GPU(#29480 )	4 years ago
ShenLiang	2ef9e0e23c	Rebuild group automatically in dynamic graph distributed (#29255 ) * add tensor_indices in AssignGroupBySize * add rebuild group in reducer	4 years ago
procr	3a0558339d	support mobilenet for kunlun (#29458 )	4 years ago
Huihuang Zheng	a1909affc6	Fix Unit Test: Add Sleep Time for CUDA Retry (#29442 ) Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is a try to avoid init GPU allocation random failure in unit test.	4 years ago
Leo Chen	e5e522493d	make gelu fp16 computing more robust (#29484 )	4 years ago
LoveAn	8094ac686e	Print ccache/clcache hit rate (#29341 ) * test ccache hit statistics, test=develop * test ccache hit statistics, test=develop * add cache hit statistics, test=develop * fix no percent symbol erro on windows, test=develop * remove switch, test=develop	4 years ago
Zhang Ting	560b432349	Revert "improve elementwise_add_grad perf (#29277 )" (#29464 ) This reverts commit `befd6d5338`.	4 years ago
jakpiase	57a4f16d9e	added internal and external reorders to profiler (#29443 ) * added external reorder to profiler * added external and internal reorders to profiler * added internal and external reorder to profiler * added formatting to int/ext reorder commit * removed unnecessary comment	4 years ago
Pei Yang	2480bdef6c	change hard_swish from plugin to layer (#29177 ) * change hard_swish from plugin to layer * add ut when threshold != scale	4 years ago
taixiurong	ecca6585cd	1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448 ) Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>	4 years ago
LoveAn	03b42d9fa7	fix unittest on windows, test=develop (#29365 )	4 years ago
TTerror	a5fcc4b545	update reduce_sum op on xpu (#29367 ) * update reduce_sum op on xpu * update reduce_sum op on xpu * support running on xpu	4 years ago
Jack Zhou	c7cada8571	Fix gru performace decline in 1.8.5 (#29455 )	4 years ago
Zhang Ting	6296f4ed09	revert cast eigen kernel (#29427 )	4 years ago
Leo Chen	a040c055a5	fix layer_norm accuracy (#29434 )	4 years ago
Zhou Wei	24ba9ed436	fix that parameters'grad has grad var (#29408 )	4 years ago
Leo Chen	4e19ce1df5	refine reshape grad and double grad kernel, use tensor copy async (#29128 )	4 years ago
Shang Zhizhou	225a9c4ed8	Fix unittest (#29412 ) * fix tensorrt unittest precision error * fix unittest precision error. test_trt_subgraph_pass && test_trt_dynamic_shape_transformer_prune	4 years ago
Pei Yang	f860de4af7	support clip op trt converter (#29411 )	4 years ago
Jack Zhou	1dd7b97b66	fix rnn_op bug in cudnn_version>= 8 (#29406 )	4 years ago
LoveAn	671555ed32	Compiling operator libraries with Unity build (#29130 ) * Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop	4 years ago
Zhou Wei	5c9bd0bf7c	print whether has build cache (#29035 )	4 years ago
cc	a623ce044f	Use different name_scope for different conv type, test=develop (#29355 )	4 years ago
yongqiangma	7c508d8668	update unbind norm add CUDAPlace api doc information (#29322 ) * enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop * fix format. test=develop * format fix. test=develop * add lod_rank_table. test=develop * fix format. test=develop * fix doc info. test=develop * fix np error * add unbind dygraph api. test=develop * fix unbind doc.test=develop	4 years ago
chentianyu03	879e913b6d	Make transpose, trace, kron, reshape, sum op support complex type (#29321 ) * add complex64 and complex128 type; add +-/@ and slice opreator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases	4 years ago
卖鱼的哲学	074065e5de	fix expand/uniform_random && concat/transpose to new api on xpu (#29280 ) * fix expand && concat/transpose to new api * update uniform_random_op * update xpu_header	4 years ago
lilong12	1decf4ada6	update, test=develop (#29331 )	4 years ago
QingshuChen	74bf3bed36	support global pooling for kunlun (#29293 ) * test=kunlun	4 years ago
liym27	b10ecd9d3a	[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )	4 years ago
Chen Weihang	9ad800ebb2	Support type promote for basic math ops (quantum required) (#29265 ) * basic impl of type promote * add comment & another testcase * fix complex bugs & support python op promote type * fix failed unittests & polish code * add unittest for coverage * change to only promote complex type * polish code details * polish several comments	4 years ago
tangwei12	8358791607	fix gpu outofrange (#29238 ) * fix gpu emb out of range Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf * fix doc Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf	4 years ago
Leo Chen	b58cfff89d	use has_grad instead of train_mode (#29309 ) * use has_grad instead of train_mode * add vlog for debug * fix ut * fix ut	4 years ago
Zhang Ting	befd6d5338	improve elementwise_add_grad perf (#29277 ) * improve performance of elementwise_sum_grad	4 years ago
Shang Zhizhou	ebf689197d	fix tensorrt output shape error (#29308 ) * fix tensorrt output shape error * fix unittest tensorrt_engine_op_test * fix code style for unitest	4 years ago
Aurelius84	67c700b479	[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421 )	4 years ago
ShenLiang	696dc4bb13	fix the warning of reducer (#29323 )	4 years ago
wangchaochaohu	c4be80f402	polish the code of cumsum and remove some unused code (#29303 )	4 years ago
ShenLiang	c00af94435	fix matmulv2 for windows (#29302 )	4 years ago
wanghuancoder	3765da98c7	add coverage incremental switch, test=develop (#29290 )	4 years ago
Wilber	d68af02c04	fix analysis_config bug. (#29304 )	4 years ago
ShenLiang	0fb18bc214	enforce the matmul_v2 error message (#29297 )	4 years ago
Zhen Wang	9b59a589b1	Remove some useless log. (#29300 )	4 years ago
Leo Chen	13a22a3752	fix shape of tile_grad op (#29289 )	4 years ago
Zhen Wang	be3777a50a	Add pure fp16 training with master weights. (#27712 ) * add the weight decay func for the momentum op * Add the multi_precision function in Momentum Optimizer. * Make sure that the initial value of master weights are same with the fp16 weights. * add static loss scaling. * add the rescale_grad function in the pure fp16 training. * use the original momentum updating method. * Polish some codes, such as variable names. * add docstring for apis. * update the var creation details of _create_master_weight. * not modify codes about imperative momentum updating. * Fix the error of test_dist_sparse_tensor_load_momentum UT. * add unit test for multi precision fp16 training. * add more unit tests for CI. * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT. * For CI Coverage Checking.	4 years ago
Wojciech Uss	6673fb0565	change import math.h to cmath (#29260 )	4 years ago
furnace	7584bb5096	Layer norm fp16 (#29169 ) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: zhiqiu <chenqiuliang@baidu.com>	4 years ago
Shang Zhizhou	c59b4f28a2	fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF (#29275 )	4 years ago
Shang Zhizhou	fc80d2e09c	add compile option WITH_TENSORRT (#29208 ) * add compile option WITH_TENSORRT * add WITH_TENSORRT to ci paddle_buils.sh * add WITH_TENSORRT to paddle_build.sh * change FATAL to WARNING when TensorRT is not found and WITN_TENSORRT=ON, just to pass ci-py3 temporarily	4 years ago
Leo Chen	116305ea4b	Improve performance of elementwise_add grad op (#29187 ) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut	4 years ago
卖鱼的哲学	07c67d5a8b	add deformable_conv op on xpu (#29234 ) * rebase develop * update deformable_conv op on xpu * update deformable_conv op on xpu	4 years ago
Chen Weihang	1de32f823d	Hot fix complle failed in gcc4.8 caused by complex impl (#29254 ) * hot fix complle failed in gcc4.8 * fix failed unittest	4 years ago
GeminiCarrie	642abe2a48	Fix a bug when running on an operating system without "bash." (#29131 ) * Fix a bug when running on an operating system without "bash." * add execution condition * for ci-coverage	4 years ago
ShenLiang	46b73e6cd9	Change the api of DataParallel and Fleet (#29224 )	4 years ago
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implemetation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remote softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unitest * test=kunlun * fix elementwise_grad bug for kunlun *test=kunlun	4 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice opreator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice opreator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	4 years ago
Zhou Wei	c0a991c874	accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429 ) * The leaf tensor concept is exposed and the gradient accumulation of leaf tensor * The leaf tensor concept is exposed and the gradient accumulation of leaf tensor * fix coverage * fix api doc * fix CI unittest * fix CI unittest * fix unitest * empty tensor does’t need inner_var_ * fix some error message	4 years ago
Wilber	74c43ac638	fix lite unit test. (#29233 )	4 years ago
Adam Osewski	4096ff94dc	Small optimizations for conv2d kernel subroutines. (#29188 ) - Make sure that oneDNN memory descriptors are created only once at first iteration.	4 years ago
joanna.wozna.intel	5c61eeef61	Enable all image classification models (#29155 )	4 years ago
Wilber	4fec182d24	[Lite-Subgraph] Fix compile error for lite subgraph. (#29146 )	4 years ago
123malin	b5c6342336	Update ps gpu (#29209 ) * fix paramete prefetch & device guard Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: chengmo <chengmo@baidu.com>	4 years ago
liym27	865a45984f	Check whether there is any inplace operation affecting gradient calculation. (#27901 ) * Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable. * Add a new attribute `_inplace_version` for VarBase. * Raise exception if an inplace operation can result in incorrect gradient computation. * Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation. * For api assign, call _bump_inplace_version() when it's an inplace operation inn dynamic mode. * Use original var_wrapper if the inplace_version is not changed. * Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performane.	4 years ago
chen zhiyu	4056c4f11c	Add unittest in musl build (#29099 ) * add musl docker build script * rm space test=document_fix * fix some docs and types errors test=document_fix * move install of python requirement to docker build * add copyright to docker file. * add extr opts * format docs * add ut test add pip cache * add more args description in readme * add stack backtrace in ctest * fix readme bugs	4 years ago
123malin	03d4665f44	prefetch optimize (#29095 ) * test=develop, optimize async prefetch	4 years ago
WangXi	0c2a51d240	optimizer amp, all use fp16 communication, overlap last comm and compute (#28957 )	4 years ago
Chen Weihang	0b032faeee	Polish unittests details and execution conditions to adapt to MUSL (#29044 ) * fix failed tests in yingchun gived list * add unittests into static_mode_white_list * add enable static * fix dist unittest * skip test_sigmoid_focal_loss_op & add gym * revert no need skip unittests * remove gym	4 years ago
123malin	92817f8005	test=develop, rm pathlib (#28658 ) * test=develop, rm pathlib	4 years ago
Wojciech Uss	4fd4095d1b	Add quantization of multi_gru op and tests (#28615 )	4 years ago
Jack Zhou	bc6033f86b	fix gru gcc7.4 bug for the gru compile fix gru gcc7.4 bug for the gru compile	4 years ago
wanghuancoder	0239f79695	Generate code coverage reports only for incremental files (#28508 ) * Generate code coverage reports only for incremental files, test=develop * Generate code coverage reports only for incremental files, test=develop * Generate code coverage reports only for incremental files, test=develop * test for diff python file, test=develop * fix no python diff report, test=develop * add cc test file, test=develop * fix bug in generic.cmake, test=develop * for debug no cc report, test=develp * modify compire branch form test_pr to test, test=develop * fix bug, test=develop * test for h file changed, test=develop * debug for redefinition of argument optimize error, test=develop * close -o3 for test, test=develop * remove -o3 for test, test=develop * remove coverage option for nvcc, test=develop * use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop * reopen -o3, test=develop * remove debug code, test=develop * remove unused code, test=develop	4 years ago
wangchaochaohu	b818429ae7	optimize cumsum OP (#29193 )	4 years ago
ShenLiang	e2d01eb650	Support dynamic graph distributed (#28997 ) * add reducer * refine envent for memorycopy * add concat&split for allreduce * apply concat & split for fuse tensor * fix nccl dep * fix the untest, compile problem and ddp initialize problem * fix untest for mac & add some comments & solve the repeated param in sublayers * fix untest for windows & fix document	4 years ago
lilong12	7e5e9934fe	update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020 ) * update, test=develop	4 years ago
pangyoki	7c8ac064c8	Delete prettytable in condabuild (#29145 ) * update conda_build script with removing opencv * modified filepath * modified some content * Delete Commented-Out Code * delete prettytable in conda_build Co-authored-by: XieYunshen <1084314248@qq.com>	4 years ago
Zhou Wei	e668cb07fb	fix CUDA 11 error on windows (#29101 )	4 years ago
Jack Zhou	085260f3de	Add eigen gru and fix the dropout bug in the rnn Add eigen gru and fix the dropout bug in the rnn	4 years ago
yaoxuefeng	545df287fc	add user_define_dump (#28596 )	4 years ago
Aurelius84	71815637cc	Move gym into unittest/requirements.txt (#29149 )	4 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	4 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * comile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix comile errror when CUDA_ARCH_NAME < Pascal" * fix comile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	4 years ago
Leo Chen	fd3fcb051a	fix typo of flag name (#29154 )	4 years ago
Noel	da71173bc9	Fix ops doc for some ops Fix ops doc for some ops	4 years ago
Leo Chen	770395cb93	Split train_mode and has_grad for tracer (#29064 ) * split train_mode and has_grad * fix format * fix ci problems * fix sample code	4 years ago
Aurelius84	7ae3cb554a	Polish CUDA Information stdout (#29109 )	4 years ago
chalsliu	7a15e64034	Support precision test for new ut	4 years ago
WangXi	173c22aec2	optimize fast graph executor (#28962 )	4 years ago
Shang Zhizhou	562ded1041	fix unittest trt_dynamic_shape_transformer_prune_test error (#29122 )	4 years ago
Shibo Tao	db41258501	add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. (#29034 )	4 years ago
joanna.wozna.intel	b0d1ac161e	Add bf16 pool2d and unify bf16 unit tests (#29039 ) * Add bf16 pool2d and unify bf16 unit tests * Add change default ops test	4 years ago
joanna.wozna.intel	fddea67445	Fix cpu_bfloat16_pass (#28730 ) * Fix cpu_bfloat16_pass * Add output_format * Fix incorrect SetOutput * Change fromating	4 years ago
Qi Li	2fd16cf6fc	fix win ci failure, test=develop (#29089 ) * fix win ci failure, test=develop * add ci test, test=develop	4 years ago
Chen Weihang	fea0e294ee	Hide the C++ stack by default and add hints (#29042 ) * default not show cpp statck & add hint * fix failed unittest * fix failed unittests	4 years ago
Chen Weihang	b1274ac3d6	set show cpp stack by default, test=document_fix (#29102 )	4 years ago
joejiong	582c0a0468	add uint8 for reshape op (#28996 ) add uint8 for reshape operator	4 years ago
Zhou Wei	8ca0a8a859	fix tensor detach to zero copy (#27921 ) * fix tensor detach to zero copy * fix tensor detach to zero copy	4 years ago
Aurelius84	8af0d85ea4	fix unittest failed on windows GPU (#29072 )	4 years ago
taixiurong	a5aa4dc7a9	add xpu elementwise ops (#29031 )	4 years ago
joejiong	b04c78ef5e	Update pow (#29000 ) Simple code clean up	4 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up remove eigen threadpool for the speed up	4 years ago
Wojciech Uss	7b5a8e46de	Add multi_gru_fuse_pass and tests (#28601 ) * Add multi_gru_fuse_pass and tests * fix date * cleaned up headers	4 years ago
LoveAn	c91bb084f4	Add op benchmark ci pipeline in Paddle repo (#28692 )	4 years ago
Zhou Wei	5e26a15484	Open GPU unitest on windows (#29003 ) * open unittests on windows * open GPU unittest on windows	4 years ago
Leo Chen	3815d7aa40	Upgrade string literals to raw string (#28989 ) * upgrade comment string to raw string * fix string in * fix string with ' ' * revert update on comments * upgrade only necessary * fix sample code checker * fix comments with '''	4 years ago
lilong12	767d0ba267	update, test=develop (#28700 )	4 years ago
Wojciech Uss	991345b368	Add multi_gru_seq_fuse_pass and tests (#28604 ) * Add multi_gru_seq_fuse_pass and tests * fix date * removed unused functions	4 years ago
123malin	fbf9564f6b	[paddle.distributed.fleet] Optimize ParameterServer's Async Mode (#28442 ) * test=develop, optimize global_step	4 years ago
lilong12	f77a78cdee	enable pipeline to run with Executor.run() (#28373 ) * update, test=develop	4 years ago
Thunderbrook	0073f9bdb0	support ps-gpu (#28752 ) * ps gpu transpile * ps gpu * remove op * gps trainer * local ps * add macro * HeterBox * def cuda * tab * code style * style Co-authored-by: Thunderbrook <a754913769#163.com>	4 years ago
Chen Weihang	768dab441e	polish two api doc detail, test=document_fix (#28971 )	4 years ago
furnace	8ff3550658	refactor momentum op to combine weight (#27414 ) * refactor momentum op to combine weight_decay (scale op and sum op)	4 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	4 years ago
chen zhiyu	3d0ff8eebc	optimize musl docker build script (#28974 ) * add musl docker build script * rm space test=document_fix * fix some docs and types errors test=document_fix * move install of python requirement to docker build * add copyright to docker file. * add extr opts * format docs	4 years ago
Pei Yang	994673bf4f	change avg pooling and global pooling to trt layer in dynamic shape mode (#28702 ) * change avg pooling and global pooling to trt layer * add support for static shape global pooling * modify trt errmsg	4 years ago
yaoxuefeng	71c1cd1408	fix truncated_gaussian seed (#28777 )	4 years ago
HappyAngel	de528981e5	fix paddlepredictor build error. test=develop (#28792 )	4 years ago
Wilber	a22ea652cf	fix trt delete_pass bug. (#28763 )	4 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	4 years ago
Chen Weihang	b969c32ab1	fix occupied 0 device memory bug (#28771 )	4 years ago
joejiong	1a532d5133	add uint8 support for squeeze operator (#28734 ) Adding uint8 support for squeeze operator.	4 years ago
wangchaochaohu	8b853b3030	fix the number of perf algo for conv cudnn in exhaustive mode (#28694 )	4 years ago
joanna.wozna.intel	8c0ea4bffe	Add bf16 matmul, fc, elementwise add and mul (#28729 ) * Add bf16 matmul, fc, elementwise add and mul * Correct unit test	4 years ago
Wojciech Uss	efc3b182f0	a fix for the fc_lstm_fuse_pass (#28709 )	4 years ago
Zhou Wei	3b0dd5f620	fix bug that to_tensor not support paddle.Place (#28717 )	4 years ago
yaoxuefeng	08b62f4902	fix shuffle batch op shuffle (#28533 )	4 years ago
taixiurong	d3d1a6b6e0	add kunlun kernel: slice, slice_grad, top_k, cast. test=kunlun (#28542 ) 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api * 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api	4 years ago
Jack Zhou	9362d85e0e	Add LSTM, Simple RNN and GRU CPU kernel (#28577 ) * add lstm, simple rnn op kernel * fix the test_lstm for the rnn op * change func name * fix forward postprocess bug * add gru forward, backward code * remove unittest.skipIf; use a big rnn op instead of combination op * fix input doesn't have gradient bug * add eigen lstm forward, backward Co-authored-by: wawltor <fangzeyang0904@hotmail.com>	4 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unitest * test = kunlun * update xpu unittest * test=kunlun * update xpu unitest * test=kunlun	4 years ago
Zhang Ting	dab4920568	improve performance of cast op (#28727 )	4 years ago


18242 Commits (342d62de60850d1e991b1a23aed360c1d6f78bbf)