alncat
5b59499e57
fixed compilation error on gcc 4.8.x due to the usage of isfinite ( #30733 )
4 years ago
Chengmo
78d37c3f75
【Paddle.Fleet】Fix brpc get hostname ( #30703 )
...
* fix Brpc get hostname
4 years ago
taixiurong
caf3680bbc
fix bugs in transformer predict in xpu place ( #30730 )
...
* transformer predict
* trans bug fix
4 years ago
jakpiase
f8da5536ed
REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30719 )
...
* added external reorder to profiler
* resolved conflict
* added enable_static
* initial version of lstm, not working yet
* added lstm to operators.cmake
* added vanilla lstm mkldnn op
* added peephole weights integration
* minor changes
* added formatting
* added fusion_lstm_mkldnn to static_whitelist
* added formatting
* removed comment
* moved use_peepholes attribute inside is_cached block
* reverted wrong changes
* minor formatting change
* minor changes
* changed stream handling
* minor change
* added datatype to GetExpectedKernelType()
* added reading stream from TLS
4 years ago
liuyuhui
67abfc1588
[Kunlun] fix dead lock for exec_op_count_ ( #30718 )
4 years ago
alncat
5ace20fc3f
modified conv+bn fuse pass to fix wrong mask in mask rcnn ( #30704 )
4 years ago
Tao Luo
824a79d383
Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30661 )" ( #30708 )
...
This reverts commit d834f4e6e8.
4 years ago
lilong12
7fbc68a2c0
update, test=develop ( #30692 )
4 years ago
jakpiase
d834f4e6e8
Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30661 )
...
* added external reorder to profiler
* resolved conflict
* added enable_static
* initial version of lstm, not working yet
* added lstm to operators.cmake
* added vanilla lstm mkldnn op
* added peephole weights integration
* minor changes
* added formatting
* added fusion_lstm_mkldnn to static_whitelist
* added formatting
* removed comment
* moved use_peepholes attribute inside is_cached block
* reverted wrong changes
* minor formatting change
* minor changes
4 years ago
arlesniak
5bf25d1e8b
More precise mkldnn kernel rules in GetExpectedKernelType ( #29840 )
...
* More precise mkldnn kernel choice in GetExpectedKernelType
* Fixes after review
* Refresh develop for CI
* CI experiment
* get back from CI exper
4 years ago
Jacek Czaja
173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op ( #30358 )
4 years ago
Shang Zhizhou
ae0f88a988
add DLA support:C++&&Python api ( #30165 )
...
* add dla
* add dla done
* add python api
Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>
4 years ago
chentianyu03
fb7fbc7a5d
fix abs bug and add abs test case ( #30637 )
...
* add abs test case
* use std::abs to fix abs bug
* fix the abs bug
* fix abs bug
4 years ago
ShenLiang
9514b4aa5f
Fix scatter grad bug ( #30604 )
4 years ago
Pei Yang
cf9bdb9404
extend trt ut timeout threshold ( #30537 )
4 years ago
Thunderbrook
1bebc09253
solve build gpu task core ( #30626 )
...
* build gpu task core
* format
4 years ago
石晓伟
33bf6eb753
revert external gflags, test=develop ( #30623 )
4 years ago
Jacek Czaja
dfdb0359ea
- Disabling oneDNN inplace pass ( #30588 )
4 years ago
TTerror
10271ddfc4
support reduce_max op on kunlun ( #30581 )
...
* support reduce_max op on kunlun
* support reduce_max op on kunlun
* support reduce_max op on kunlun
* support reduce_max op on kunlun
4 years ago
QingshuChen
5013c67644
fix softmax bug for multi_card in kunlun ( #30600 )
4 years ago
wuhuanzhou
7e671c07b6
optimize unity build ( #30195 )
...
* optimize unity build, test=develop
* fix code style error, test=develop
* fix code style error and test /MP settings, test=develop
4 years ago
liuyuhui
e5b0d9e1fc
[Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor ( #30586 )
4 years ago
Zhou Wei
9674e440e2
optimize windows CI, clear tp cache, polish code, improve level of msvc log ( #30579 )
4 years ago
wanghuancoder
90773473a0
use nvtx push pop in timeline ( #30567 )
...
* delete empty line of pybind.cc, test=develop
* use nvtx push pop in timeline, test=develop
* change year, test=develop
* add #ifdef PADDLE_WITH_CUDA, test=develop
* add #ifndef WIN32, test=develop
* is_pushed to is_pushed_, test=develop
4 years ago
chentianyu03
358106fcb0
make abs op support complex types ( #30375 )
...
* rewrite abs op
* rewrite abs op and remove abs in activation
* remove abs register in old codes
* fix abs_grad type error
* fix abs double_grad output name error
* modify abs_grad, abs_grad_grad functor for windows building
* format code style
* fix the bug where the result is nan when the divisor is zero
* add missing abs attr and add abs for float16
4 years ago
Wilber
2d5758c456
update. ( #30585 )
4 years ago
Tao Luo
9dd71c74df
disable test_analyzer_detect ( #30541 )
4 years ago
tangwei12
c9e78a22c5
add trainers for pserver ( #30523 )
...
* add trainers for pserver
Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8
4 years ago
wanghuancoder
d1b25ed9d7
add some RecordEvent, for dygraph timeline ( #30299 )
...
* add some RecordEvent, for dygraph timeline, test=develop
* change GpuMemcpySync to memory::Copy, test=develop
* fix compile problem, test=develop
* fix compile problem, test=develop
* fix, test=develop
* fix, test=develop
4 years ago
YUNSHEN XIE
bbea5a1fa9
The new unit test cannot have the same name as the existing unit test ( #29878 )
...
* check UT Duplicate name
* fix error
* Optimized log display
* modified exit code
4 years ago
liym27
ff25c5b36f
Fix bug: GetAttrValue should deal with attr with attrType vector<double> ( #30536 )
4 years ago
WangXi
572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer ( #30455 )
4 years ago
ykkk2333
549855ac20
add rmsprop_op_xpu test=kunlun ( #30493 )
...
* add rmsprop_op_xpu test=kunlun
* modified rmsprop_op_xpu error code. test=kunlun
4 years ago
Zhou Wei
fb20ec9a4e
fix bug of multicard grad ncclAllReduce ( #30553 )
4 years ago
Zhen Wang
f30d00553a
Fix the compiling error of update_loss_scaling when using cuda9. ( #30538 )
4 years ago
Leo Chen
81217a94d8
unify calling cudaSetDevice ( #30470 )
...
* unify calling cudaSetDevice
* fix compile
4 years ago
pangyoki
00554b3f6b
fix error message of Inplace strategy ( #30520 )
4 years ago
Leo Chen
7043b8cfc6
support layer_norm fp16 in dygraph amp ( #30430 )
...
* support layer_norm fp16 in dygraph amp
* add ut
* refine code
4 years ago
wanghuancoder
59ad6ff3e3
delete empty line of pybind.cc, test=develop ( #30529 )
4 years ago
hutuxian
e207fe6385
Ascend Framework Part2: pybind files ( #30410 )
4 years ago
hutuxian
40ede12631
Ascend Framework Part1: OP & Wrapper ( #30281 )
4 years ago
liuyuhui
843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization ( #30317 )
4 years ago
QingshuChen
8489d4f76f
optimize batch_norm & pool op for kunlun ( #30490 )
4 years ago
wanghuancoder
bd97192274
if pybind.cc changed, generate total report, test=develop ( #30514 )
4 years ago
taixiurong
5e5c2827a3
fix range op crash in dygraph xpu place ( #30469 )
4 years ago
JZ-LIANG
16ba0abc79
Recompute Offload: fixed bug in memcpy ( #30484 )
4 years ago
guofei
11e78ebaa3
Modify the calculation logic of LambOptimizer ( #29313 )
...
* Modify the calculation logic of LambOptimizer
4 years ago
Adam Osewski
c5ffad126c
[oneDNN] Refactor fuse pass helper functions to one place. ( #30460 )
...
* Move pass tester helper functions to single common place.
* Use helper functions in two more fuse pass tests.
4 years ago
Zhang Ting
c9a334e1b3
add VecCastCUDAKernel ( #30296 )
4 years ago
pangyoki
13d757362c
Add Inplace strategy (Output reuse Input Varbase) in dygraph ( #30103 )
...
* add view strategy on squeeze,unsqueeze,reshape,flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* fix test_cross_entropy_loss error because of reshape2
* add inplace strategy
* add elementwise_add sub
* let backward op not use inplace
* grad op do not use inplace
* fix memory increase error and add leaf error message
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
* add unittest and leaf error message
* merge view error
* optimize op_function_generator format and support sum inplace op
* fix format of basic_engine
* fix format for framework
* little change of variable wrapper
* add reshape, squeeze, unsqueeze, scatter api
* add relu elu tanh softmax inplace api
* fix test_squeeze_op unittest
* fix test_relu_op unittest
* fix comment problems
* delete sample code of inplace api
* add reference of grad_pending_nodes in basic_engine
* fix unittest name
* add inplace apis into wlist
* fix error message
* add PADDLE_ENFORCE for set grad op twice
* fix head file error
4 years ago
Yang Zhang
008b0a8b56
Fix float64 bug in layer norm ( #30452 )
...
built-in `rsqrt` is shadowed
4 years ago
石晓伟
715d862868
export global google flags to users, test=develop ( #30448 )
4 years ago
Wojciech Uss
88fc7a7d68
fix cache key for inplaced elementwise ops ( #30404 )
4 years ago
wawltor
3d49882e2c
fix the rnn mask memory bug for out of read ( #30459 )
...
* fix the rnn mask memory bug for out of read
* update the code for the rnn
4 years ago
taixiurong
6a3c8725b0
support transformer v2.0 ( #30381 )
4 years ago
ShenLiang
e85be1b1b2
fix flatten api grad ( #30426 )
4 years ago
yaoxuefeng
6e0da01c61
Heter ps new ( #30198 )
4 years ago
123malin
2a98e9323a
test=develop, add distributed_infer ( #30300 )
...
* test=develop, add distributed_infer
4 years ago
QingshuChen
cf786d22ec
fix bug that can't find mkldnn(kunlun) ( #30394 )
4 years ago
cc
8e3a294045
skip quantizing ops in cpu inference ( #30342 )
...
* skip quantizing ops in cpu inference, test=develop
4 years ago
alncat
7bbf3ac5ab
Added support for inference using quantization aware trained dygraph ( #30288 )
...
* added support for inference using quantization aware trained dygraph
* added support for inference using quantization aware trained dygraph
correct boost get usage
* Delete incorrect warning message (#30196 )
* fix warning and no grad
* clean redundant API alias in 2.0 - part 2 (#30013 )
* delete paddle.nn.functional.assign
* fix dynamic to static error
* just add the op error message for the matmul xpu (#30246 )
add the op error message for the matmul xpu
* Add Static Variable Clone (#30208 )
Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat
* use wget to replace curl to download the lcov file (#30229 )
* use wget to replace curl to download the lcov file
* add cache for lcov
* fix test_pool3d_op timeout issue (#30248 )
* Fix unittests bugs. (#30250 )
* modify error message based on comments (#30189 )
* modify error message based on comments
* edit code according to review.
* Correct spelling according to review.
* Fix bug for 'save multiple method' (#30218 )
* Fix bug for 'save multiple method'
* To pass coverage.
* edit code to pass coverage.
* edit code to pass coverage.
* add unittest for coverage.
* change for coverage.
* edit for coverage.
* added support for inference using quantization aware trained dygraph
* Alias from paddle.fluid.layers.auc to paddle.static.auc (#30206 )
* add alias from fluid.layers.auc to static.auc
* Update __init__.py
* added support for inference using quantization aware trained dygraph
correct boost get usage
* corrected boost get usage
* corrected naming issues and enforcing zero check
* correct paddle enforce message
* added more error checkings
* corrected error report message and optimized code
* corrected findvar usage
* corrected paddle_enforce in scope
* correct error messages
* correct error reporting format
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: Bai Yifan <me@ethanbai.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>
4 years ago
GaoWei8
180877e988
Softmax backward optimize ( #30249 )
...
* softmax backward optimize
4 years ago
Zhou Wei
b1d8ff45d7
running single-GPU unit tests in parallel on Linux/Windows GPU ( #29523 )
4 years ago
Zhang Jun
10a8f3e5c3
fix bug on compiling inference shared lib with crypto;test=develop ( #30269 )
...
* fix bug on compiling inference shared lib with crypto;test=develop
* fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF
* update cmake
* remove unnecessary enforce message
4 years ago
Huihuang Zheng
28e156c27f
Fix Sleep Error in enforce.h ( #30335 )
...
The usleep function in <unistd.h> only accepts arguments less than 1,000,000. The current call can exceed this limit, so it has to be fixed. This PR fixes a random CI error.
4 years ago
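The usleep limit described in the commit above can be worked around by splitting a long delay into whole seconds plus a sub-second remainder. The following is only a minimal sketch of that idea on a POSIX system, not the code from the PR; the names `SplitMicros` and `SleepMicros` are hypothetical.

```cpp
#include <unistd.h>

#include <cstdint>
#include <utility>

// Number of microseconds in one second; usleep arguments should stay below this.
constexpr uint64_t kMicrosPerSecond = 1000000;

// Hypothetical helper: split a delay into {whole seconds, leftover microseconds}.
// The leftover part is always < 1,000,000, so it is safe to pass to usleep.
std::pair<unsigned int, useconds_t> SplitMicros(uint64_t micros) {
  return {static_cast<unsigned int>(micros / kMicrosPerSecond),
          static_cast<useconds_t>(micros % kMicrosPerSecond)};
}

// Hypothetical helper: sleep for an arbitrary number of microseconds.
void SleepMicros(uint64_t micros) {
  const auto parts = SplitMicros(micros);
  unsigned int remaining = parts.first;
  // sleep() returns the unslept seconds if a signal interrupts it, so retry.
  while (remaining > 0) {
    remaining = sleep(remaining);
  }
  if (parts.second > 0) {
    usleep(parts.second);
  }
}
```

For example, `SplitMicros(2500000)` yields 2 seconds and 500000 microseconds, so `usleep` never receives a value of one second or more.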
Leo Chen
3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card ( #30338 )
...
* set expected place in child thread for dataloader
* set device id when set tensor from numpy
* revert tensor_py change
* add compile guard
* fix ci
* fix bug
4 years ago
QingshuChen
2c1bba02e4
optimize memcpy perf for kunlun ( #30291 )
...
* optimize memcpy perf for kunlun
* remove useless unittest for kunlun mean
* minor
4 years ago
ShenLiang
a60f17b89d
Support unused parameters in dynamic graph distributed ( #30224 )
4 years ago
JZ-LIANG
75936d838f
Recompute Offload ( #30233 )
4 years ago
lidanqing
a60893f6b5
correct the allowed dimension size ( #30326 )
4 years ago
Chen Weihang
c8c8f205ba
remove c++ stacktrace hint ( #30325 )
4 years ago
tangwei12
5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix ( #30306 )
...
* add sparse embedding & load vars for 2.0
Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b
* fix hdfs gloo
Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6
* fix gloo hdfs
Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e
* move loadvar/sparse embedding from incubate to static
Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
4 years ago
tangwei12
25f80fd304
Fix/distributed proto ( #29981 )
...
* rename sendrecv.proto to namespace paddle.distributed
* split ps with distributed
4 years ago
Chengmo
d479ae1725
【Paddle.Fleet】Support local save sparse param ( #30175 )
...
* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
4 years ago
Double_V
231501fefc
fix elugradgrad test fail & error message opt ( #30171 )
...
* fix elugradgrad test fail and error message opt
* fix unittest, test=develop
* Update prroi_pool_op.h
fix error message
* opt message,test=develop
* fix ci fail,test=develop
4 years ago
Zhen Wang
fb49ea388e
Fix the accuracy problem of allclose op when using float64 data type in static mode. ( #29890 )
...
* Fix the accuracy problem of allclose op when using float64 data type in static mode.
* Format the code style.
4 years ago
yaoxuefeng
4656525e24
fix datanorm error msg ( #30294 )
4 years ago
furnace
77051cc9f0
add fp16 support for tril_triu op ( #30186 )
4 years ago
石晓伟
efa54629fb
fix header file paths of gflags, commit 3, test=develop ( #30273 )
4 years ago
Chengmo
5b2c15afcd
Fix server.h include device_context ( #30243 )
...
* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
4 years ago
石晓伟
a0ee09148e
enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop ( #30240 )
4 years ago
石晓伟
a66eebab5c
fix header file paths of gflags, commit 4, test=develop ( #30274 )
4 years ago
石晓伟
8c4500ff6d
fix header file paths of gflags, commit 2, test=develop ( #30272 )
4 years ago
liym27
b4989fb744
Support vector<double> as type of op attribute, and op set_value supports vector<double> as value ( #30126 )
4 years ago
wangchaochaohu
8dcae0c55d
register OPMaker and Infer Shape Check for fused_elementwise_add ( #30259 )
4 years ago
AshburnLee
924aac2216
Add tf32 switch for cuDNN ( #29192 )
4 years ago
石晓伟
8ce2482b80
fix header file paths of gflags, commit 1, test=develop ( #30271 )
4 years ago
chentianyu03
c7371b7b20
type promotion for grad ( #30177 )
...
* type promotion for grad
* add type promotion for div op
4 years ago
liym27
3ce878f309
Check the rank of input in kernel of set_value op ( #30147 )
4 years ago
WeiXin
66dc4ac77b
modify error message based on comments ( #30189 )
...
* modify error message based on comments
* edit code according to review.
* Correct spelling according to review.
4 years ago
wawltor
fee424411a
just add the op error message for the matmul xpu ( #30246 )
...
add the op error message for the matmul xpu
4 years ago
GaoWei8
0a21924a8d
optimize softmax forward ( #30217 )
...
* optimize softmax forward
4 years ago
wangchaochaohu
af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) ( #29885 )
4 years ago
zhang wenhui
5932fee60a
enhance error message, test=develop ( #30220 )
4 years ago
pangyoki
da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op ( #29913 )
...
* add view strategy on squeeze,unsqueeze,reshape,flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
4 years ago
Jacek Czaja
4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching ( #30203 )
...
* - Added UT for testing elementwise_mul caching
* lint fixes
4 years ago
Zhen Wang
7f7dfccf20
Support pure fp16 training for AMP API. ( #29544 )
...
* add cast ops before and after unsupported fp16 ops.
* Keep partial net in FP32 pattern.
* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
* Add fp16 support for adam op.
* add multi precision attr for adam.
* Fix the bug of test_multi_precision_fp16_train UT.
* Code format for CI.
* Fix the redefine error about MPTypeTrait on windows.
* fix bugs of the _create_accumulators func in Momentum.
* fix bug when inserting post cast op.
* Add the update_loss_scaling op in allow_set of UnusedVarCheck.
* Update for ci coverage.
* Add some doc for OptimizerWithMixedPrecision.
* Fix the code style.
* Improve the doc of `amp_init`.
* Change for fp16 testing if users have the infer program defined in separate way.
4 years ago
Leo Chen
789743e190
use cuda generator in bernoulli cuda kernel ( #30199 )
4 years ago
Leo Chen
8696335f86
Fix dtype of ungenerated grad var ( #28511 )
...
* fix dtype of ungenerated grad var
* update ut
* refine code
* set default dtype
* fix could_use_cudnn bug
* remove debug code
* re-implement
* fix bug
4 years ago
Wilber
609c022222
shape op support int8 and uint8 tensor ( #30201 )
4 years ago
Wilber
01a287bf0a
fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON ( #30194 )
4 years ago
ruri
e42e1e80dc
Add version checking, test=op_version ( #30129 )
4 years ago
Leo Chen
1f97d61c68
Add callback after TensorCopy ( #30123 )
...
* change to tensor copy sync
* change to tensor copy sync
* make copy_to safe when use TensorCopy
* refine code
* add ut
* add cudapinned garbagecollector
* add testcase: cpu place -> cuda pinned place
4 years ago
Chengmo
528e03fc08
【Paddle.Fleet】Fix tensor table ( #30075 )
...
* add tensor table
4 years ago
Wilber
ade244948c
disable mkldnn inplace pass on windows ( #30164 )
4 years ago
joanna.wozna.intel
907262ee15
Fix analysis predictor test ( #30191 )
...
* Add a necessary condition
* Remove test for white list and add header
4 years ago
lijianshe02
2dc7ee276b
enhance error message of nll_loss op test=develop ( #30125 )
...
* enhance error message of nll_loss op test=develop
4 years ago
Huihuang Zheng
54bf3f5a56
Refine PADDLE_ENFORCE Error Messages. test=develop ( #30149 )
...
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
4 years ago
Chen Weihang
d0fb06b27f
[Complex] Simplify prepared op impl to improve performance ( #30153 )
...
* simplify prepared op impl to improve performance
* fix kunlun compile error
* continue fix kunlun compile error
* only transform diff place when dtype diff
* fix failed unittests
* remove useless file
* polish impl by review comment
4 years ago
123malin
c5b415bfd9
Improve Index select cuda kernel ( #30139 )
...
* test=develop, add index_select_cuda kernel
4 years ago
wangchaochaohu
7dd551e08b
refine the paddle place support using str ( #28769 )
4 years ago
WeiXin
404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t ( #30161 )
4 years ago
Wilber
91a8a25721
enhance error info for py_func ( #30138 )
...
* enhance error info for py_func
* update
4 years ago
weihaoji
b8207af6bc
[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop ( #30122 )
4 years ago
liuyuhui
15fac5e7fa
fix assign_op_xpu concat_op_xpu warning ( #30120 )
4 years ago
Jack Zhou
f5428eca4f
fix enforce msg of sum xpu op ( #30113 )
4 years ago
123malin
198fbdfb60
Add Lookahead and ModelAverage Optimizer ( #30004 )
...
* test=develop, add model_average and lookahead
4 years ago
Leo Chen
adac38c506
add dispensable input for core.ops.reshape2/expand/slice ( #30072 )
...
* add dispensable input 'shape' for core.ops.reshape2
* add dispensable inputs for core.ops.reshape2/expand/slice
* add ut
4 years ago
ShenLiang
becf99d2e8
fix error message ( #30135 )
4 years ago
Zhou Wei
30888ca343
Polish and Optimize the print/repr information of Layer ( #29998 )
...
* Polish and Optimize the print/repr message of all layer
* fix some code format
4 years ago
Zhou Wei
9c99d37906
fix unittest failed on windows ( #29837 )
4 years ago
wangguanzhong
69839f8a9a
fix error message for distribute_fpn_proposals_op ( #30116 )
4 years ago
QingshuChen
8e1c3ddf15
add aarch64 and sunway kunlun lib ( #30027 )
...
* add aarch64 and sunway kunlun lib
* minor
* optimize elementwise_add for kunlun
* update kunlun dependence
* minor
* minor
4 years ago
Shang Zhizhou
05b27695f1
add inference api: DisableTensorRtOps ( #30109 )
...
* snap
* add inference api: DisableTensorRtOPs
* fix code style
* update api to experimental
* update variable name
4 years ago
石晓伟
53bb126510
fix a bug in op_version_registry, test=develop, test=op_version ( #29994 )
4 years ago
xiemoyuan
3e0c492910
Optimize the error message of framework. ( #30134 )
4 years ago
liym27
9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result ( #30003 )
...
1. when slice_item is a slice:
   1) the start of __getitem__ should be std::max(start, 0)
   2) the end of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
4 years ago
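The bound handling listed in the commit above (clamp a slice's start to 0 and its end to dim, and require an integer index to lie in [-dim_len, dim_len)) can be illustrated as follows. This is a sketch under assumptions, not the code from the PR: the helper names are hypothetical, and shifting negative values by dim before clamping is an assumed Python-style convention that the commit message does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>

// Hypothetical helper: normalize a [start, end) slice against a dimension size.
std::pair<int64_t, int64_t> NormalizeSliceBounds(int64_t start, int64_t end,
                                                 int64_t dim) {
  assert(dim >= 0);
  // Assumption: negative values count from the back, as in Python slicing.
  if (start < 0) start += dim;
  if (end < 0) end += dim;
  // The two clamps named in the commit message.
  start = std::max<int64_t>(start, 0);
  end = std::min<int64_t>(end, dim);
  return {start, end};
}

// Hypothetical helper: an integer index must lie in [-dim, dim).
int64_t NormalizeIndex(int64_t index, int64_t dim) {
  if (index < -dim || index >= dim) {
    throw std::out_of_range("index is outside [-dim, dim)");
  }
  return index < 0 ? index + dim : index;
}
```

With dim = 5, a slice of (-2, 10) normalizes to (3, 5), and an index of -1 maps to 4, while an index of 5 is rejected.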
chentianyu03
666e665132
change the kron gradient when complex types ( #29995 )
4 years ago
chentianyu03
a5e422c85d
add trace op_register_version and fix version bug; test=op_version ( #30000 )
...
* add trace op_register_version and fix default bug; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* fix missing the template bug of vector; test=op_version
4 years ago
cc
9f34374b48
Fix the format of raising error in randperm op ( #30108 )
...
* fix the format of raising error in randperm op
4 years ago
liuyuhui
254ad61959
fix xpu pe sync, test=notest ( #30095 )
4 years ago
Thunderbrook
0b8e1fadc5
add topo-aware in heter-ps ( #30087 )
...
* add topo aware
* resource.h
* topo aware
* format
4 years ago
hong
297fff1a79
support dygraph in xpu place ( #30051 )
...
* support dygraph in xpu place; test=develop
* fix cpu/gpu compile error; test=develop
* fix compile error; test=develop
* fix xpu compile error; test=develop
4 years ago
wangchaochaohu
d0a5620575
fix the compiler error when gcc4 cuda9.0 ( #29997 )
4 years ago
WangXi
ee16006b5d
Optimization grad merge performance ( #29784 )
4 years ago
yongqiangma
e891f4da1b
Add p_norm op version info ( #30042 )
...
* p_norm fix op version info. test=develop
4 years ago
tangwei12
7d1c149e09
for inference checkpoint ( #30081 )
...
* for inference checkpoint
Change-Id: I36c979240ffa55bf1ef0c9315402960762af6be4
* for inference checkpoint
Change-Id: I82025365d5b792cbea1ead506df685aecc8ac198
4 years ago
tangwei12
7d4bdff07d
fix large scale memory ( #30035 )
...
* memory holder optimize
Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e
* memory holder optimize
Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f
* fix large scale memory holder
Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f
* fix large scale memory holder
Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655
4 years ago
Shang Zhizhou
08dc5bc27e
fix op version checker of pass bug ( #30028 )
...
* fix op version checker of pass bug
* fix code style
* update pass version
4 years ago
cc
68398abce9
[Inference] zero_copy_tensor supports int8_t ( #30053 )
...
* zero_copy_tensor supports int8_t
4 years ago
whs
1b999d2b5d
Add version checking ( #30040 )
4 years ago
ceci3
85b2f05ab0
register ModifyAttr for instance_norm, test=op_version ( #30065 )
...
* register instance norm, test=op_version
4 years ago
channings
ddcff254db
fix op_register_version for compare ops, test=op_version ( #30007 )
...
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
4 years ago
Wilber
66e16b7e99
update lite subgraph. ( #30056 )
4 years ago
GaoWei8
a64822589f
add REGISTER_OP_VERSION for LSTM ( #30038 )
4 years ago
yinhaofeng
6e93fb92f9
Register op version for linspace,test=op_version ( #30025 )
...
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
4 years ago
123malin
d0056c324d
test=develop, add op_register_version for roll_op ( #30023 )
...
* test=develop, add op_register_version for roll_op
4 years ago
chentianyu03
e012930aa3
complex gradient matmul ( #29966 )
...
* dot op support complex types
* matmul support complex types
* add test case
* matmul broadcast gradient support complex
* move conjFunctor to complex_functor.h
4 years ago
ShenLiang
893d37e5c6
Fix rank_attention op_version, test=op_version ( #30006 )
...
* fix rank_attention, test=op_version
4 years ago
Adam Osewski
13aef97043
operator checkpoints for new attributes. ( #29832 )
...
* Add operator checkpoints for new attributes.
* Fix adding subsequent checkpoint to quantize op.
4 years ago
wangguanzhong
844d8e0c2c
add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version ( #30034 )
4 years ago
cc
c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op ( #30016 )
...
* Add mkldnn nearest_interp and bilinear_interp op
* don't run mkldnn interpolate in default
* add interpolate_mkldnn_pass
4 years ago
chalsliu
c053bf2a57
Revert "register ModifyAttr for instance_norm, test=op_version ( #29938 )"
5 years ago
wawltor
cc2f94620c
add support for the op version check for matmul, test=op_version ( #30011 )
...
* add support for the op version check for matmul, test=op_version
5 years ago
wawltor
b33aaea86c
add the op version check for the elementwise ops, test=op_version ( #30010 )
...
* add the op version check for the elementwise ops, test=op_version
* add the support check for elementwise_ops, test=op_version
5 years ago
Chengmo
4cbcc9b6da
fix momentum op register ( #29941 )
...
* fix momentum op register
5 years ago
hutuxian
7c1f69bdf0
add op_version for flip op [test=op_version] ( #30019 )
5 years ago
ceci3
77c1684397
register ModifyAttr for instance_norm, test=op_version ( #29938 )
...
* upgrade instance_norm, test=op_version
* fix
5 years ago
Leo Chen
47d10c55d5
Enhance debugging ( #30001 )
...
* add debug code
* add place info
* fix compile problem
* add place for output
5 years ago
FlyingQianMM
d42f93e504
add op_register_version for allclose op; test=op_version ( #29968 )
5 years ago
wawltor
8f49f9d5c9
change the elementwise ops version check, test=op_version
...
change the elementwise ops version check, test=op_version
5 years ago
guofei
b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop ( #29957 )
...
Add moving_average_abs_max_scale op_register_version
5 years ago
Thunderbrook
0ca6de171f
add include ( #29952 )
5 years ago
zhangchunle
631d783748
fix bug in windows ci ( #29963 )
5 years ago
Pei Yang
6206b9bc71
fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test ( #29977 )
5 years ago
wangxinxin08
be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version ( #29937 )
5 years ago
石晓伟
958612231f
compile the denormal.cc on aarch64, test=develop ( #29956 )
5 years ago
Guo Sheng
6ac4f0af6a
Register op version for coalesce_tensor. ( #29940 )
...
test=develop
test=op_version
5 years ago
Chen Weihang
a1d9a14e89
support grad accumulated across batch ( #29942 )
5 years ago
cc
6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul ( #29911 )
...
* map matmul/squeeze2+matmul/reshape2+matmul to mul
5 years ago
Huihuang Zheng
d038746e1c
Fix Unix Sleep for Wrong Time. test=develop ( #29953 )
...
PADDLE_RETRY_CUDA_SUCCESS used wrong sleep time so it can cause timeout in unittest. This PR fixed it.
According to the doc at https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html , sleep in unistd.h takes "seconds", usleep takes "microseconds", and Sleep in windows.h takes "milliseconds".
5 years ago
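The unit mismatch described in the commit above (sleep takes seconds, usleep takes microseconds, Windows Sleep takes milliseconds) is easier to avoid when a delay is held as a typed duration and converted only at the call site. The sketch below uses std::chrono for that; the function names are hypothetical, it assumes non-negative delays, and it does not reflect how PADDLE_RETRY_CUDA_SUCCESS is actually implemented.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: value to pass to POSIX sleep(), which takes seconds.
// Rounds up so the wait is never shorter than requested.
inline int64_t ToSleepSeconds(std::chrono::milliseconds delay) {
  return std::chrono::duration_cast<std::chrono::seconds>(
             delay + std::chrono::milliseconds(999))
      .count();
}

// Hypothetical helper: value to pass to POSIX usleep(), which takes microseconds.
inline int64_t ToUsleepMicros(std::chrono::milliseconds delay) {
  return std::chrono::duration_cast<std::chrono::microseconds>(delay).count();
}

// Hypothetical helper: value to pass to Windows Sleep(), which takes milliseconds.
inline int64_t ToWindowsSleepMillis(std::chrono::milliseconds delay) {
  return delay.count();
}
```

A delay of 1500 ms then becomes 2 for sleep, 1500000 for usleep, and 1500 for Sleep, which makes a thousandfold unit error visible at the conversion point.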
YUNSHEN XIE
121658d251
Support xpu ut coverage ( #29892 )
...
* add xpu_coverage function
* xpu coverage ipipe only deal with xpu files
* fix import error
* fix format error
* 'fix format error'
* fix format error
* fix error
* fix format error
* fix format error
5 years ago
Jack Zhou
5a4e42ca9a
add gru op_register_version; test=op_version; ( #29931 )
...
* add gru op_register_version; test=op_version;
* Update fc,mul version;test=op_version;
5 years ago
Wilber
2b1d796cd0
[Inference] Fix 2.0 trt performance regression compared with 1.8. ( #29925 )
5 years ago
Qi Li
913f77a4b7
Register op version for print, test=op_version ( #29945 )
5 years ago
石晓伟
181ea1870b
flush denormals to zero, test=develop ( #29924 )
...
* flush denormals to zero, test=develop
* add comments, test=develop
5 years ago
cc
7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version ( #29923 )
...
* add op version for fake_quant and fake_dequant ops, test=op_version, test=develop
5 years ago
石晓伟
acb5e86363
fix a bug in reset_tensor_array, test=develop ( #29620 )
...
* fix a bug in reset_tensor_array, test=develop
* ci coverage, test=develop
5 years ago
liuyuhui
3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor ( #29926 )
5 years ago
Wilber
332da133a1
Support mips arch ( #29903 )
...
* Support MIPS arch.
5 years ago
LielinJiang
eab0b60e16
Register op version for grid_sampler, test=op_version ( #29916 )
...
* register op version for grid_sampler
5 years ago
liym27
9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor ( #29842 )
...
* Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )"
This reverts commit b10ecd9d3a.
* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
5 years ago
liuyuhui
4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor ( #29574 )
5 years ago
LielinJiang
0f4b218640
Enable bilateral_slice unittest on windows platform ( #29896 )
...
* enable bilateral_slice unittest on windows platform
* reduce max threads
5 years ago
Ren Wei (任卫)
95df0e1447
Add the ipipe log param prefix ( #29545 )
...
* Add the ipipe log param prefix
1. add the prefix;
2. using Colon before the metric values;
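* Add the prefix for Efficiency Cloud (效率云) log metric collection
Not yet verified whether this string replacement works correctly in the Windows bat script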
* Preserve The Old Format Metrics During The Transition Period
Please DELETE the old format metrics log finally.
The period may last for a week.
* ipipe_log_param + ccache and clcache ..
5 years ago
YUNSHEN XIE
2a01756bf3
remove duplicate ut names ( #29809 )
5 years ago
Chen Weihang
a6072055be
[Complex] Handle complex to real after type promotion ( #29855 )
...
* try to add fwd op input dtypes
* refactor base impl
* return tmp_ins after dygraph prepare data
* fix typo found in debug
* polish comment & add complex net test
* revert detail change
* fix unittest failed
* add complex kernel condition control
* fix xpu test failed & polish comment
* polish details by review comments
5 years ago
Chen Weihang
1a304e6c06
[Complex] Add support for complex grad accumulated ( #29889 )
...
* add support for complex grad accumulated
* add unittest for coverage
* update test dtype
* remove useless blank line
5 years ago
taixiurong
c7acad9f2f
support some shape for matmul and cast in xpu place ( #29900 )
...
* support some shape in matmul and cast
* modify matmul
5 years ago
Leo Chen
6b258317cb
fix TransferInplaceBack ( #29830 )
5 years ago
QingshuChen
59b47f3b32
feat: support check_nan_inf for kunlun/xpu device ( #29694 )
...
* feat: support check_nan_inf for kunlun device
* support kunlun stack
* minor
5 years ago
tangwei12
032414ca2a
[Feature] one ps (3/4) ( #29604 )
...
* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>
5 years ago
jakpiase
edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) ( #29772 )
5 years ago
Wilber
2c0a4a3470
call_stack is turned on by default when ON_INFER=ON ( #29798 )
5 years ago
Wilber
ad0b01ffe2
lod operator should not be reused in memory_optimize pass. ( #29828 )
5 years ago
liym27
97e75ad0f5
[setitem] Support Tensor setitem in static mode ( #29708 )
...
1. Type of index: int, slice(step must be 1).
2. Type of value:
(1) int32, int64, float32, bool;
(2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
(3) paddle.Tensor(int32, int64, float32, float64, bool);
5 years ago
YUNSHEN XIE
24ce051a84
remove duplicate ut reload ( #29810 )
...
* remove duplicate ut reload
* remove duplicate ut define in cmakelist
5 years ago
Jacek Czaja
c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching ( #29606 )
5 years ago
Thunderbrook
09b6e71928
heter box ( #29734 )
...
* add heter box
* add trainer, worker, wrapper...
* format
* for ci
* format
* remove boost get
* boost & copyright
* rename
* rename
* format
* format
* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
5 years ago
Jacek Czaja
7b33720c90
[oneDNN] Tensor copy fix to oneDNN tensors ( #29771 )
...
* - Tensor copy fix to oneDNN tensors
* - Fixes after review
5 years ago
123malin
a400b76db7
Roll cuda kernel ( #29655 )
...
* test=develop, optimize roll_op_cuda_kernel
5 years ago
wuhuanzhou
e7ac74c85b
optimize compilation time of argmin/argmax op ( #29595 )
...
* Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop
* remove changes of reduce_op.h, test=develop
5 years ago
Zhou Wei
3f83ec61c2
move running unittest on windows to another file ( #29815 )
5 years ago
chentianyu03
ddfc3d2c2f
change grad elementwise_mul for complex types ( #29757 )
...
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
* delete no need to calculate inputs in dygraph op_test
* delete no need to calculate inputs in dygraph op_test
* modify grad of mul for complex types
* fix the grads of inputs args order not match bug
5 years ago
chentianyu03
2a260d9b0e
change the grad of div when complex types ( #29804 )
...
* change the grad of div when complex types
* fix the grads of inputs args order not match bug
5 years ago
ShenLiang
f65f1caad3
opt sparse allreduce using ncclgather ( #29819 )
5 years ago
TTerror
82aa01c373
add nearest_interp_v2 on kunlun ( #29725 )
...
* add nearest_interp_v2 on kunlun
* add nearest_interp_v2 on kunlun
5 years ago
wangchaochaohu
01c37c8e02
refine the compiler error for half2 operation ( #29816 )
5 years ago
whs
82630408b4
Support double backward rsqrt ( #29589 )
5 years ago
Zhang Ting
b76f5a8489
fix the bug of dropout_grad ( #29813 )
5 years ago
LielinJiang
a94c3cbbf3
register cudnn conv double grad for depthwise conv ( #29807 )
5 years ago
ShenLiang
01e2874a0e
Support multi-stream communication for dynamic graph distributed ( #29525 )
...
* fix fleet for multi-stream
* fix memcpy for ncclid
* use sync to solve move operation
5 years ago
wangchaochaohu
f350aa59ff
Fix the compiler error for half type ( #29799 )
5 years ago
wuhuanzhou
27aa15150c
Add approval for PR-CI-OP-benchmark ( #29797 )
...
* Add approval for PR-CI-OP-benchmark, test=develop
* dont show token in log, test=document_fix
5 years ago
Huihuang Zheng
1cbb282d77
Add Retry Logic to CublasHandleHolder
...
Add Retry Logic to CublasHandleHolder to avoid random unittest failure.
5 years ago
LielinJiang
e5af650b71
Add double grad for conv_transpose ( #29706 )
...
* add double grad for conv_transpose
5 years ago
Leo Chen
224f3bcbb1
format code ( #29714 )
5 years ago
LoveAn
2e5b4a216c
Optimize compilation time with Unity Build ( #29733 )
...
* Test compilation time with less parallel count, notest, test=windows_ci
* optimize rules of Unity Build, notest, test=windows_ci, test=windows_op
* limit parallel counts used only on GPU, test=develop
* remove limit of argument /m:8 on Windows, test=develop
5 years ago
Zhang Jun
0c23ba95d8
enable MakeCipher api for inference;test=develop ( #29692 )
5 years ago
wangchaochaohu
7b2dc4e6b1
optimization for fp16 elementwise add ( #29744 )
5 years ago
chalsliu
27bdbec7fc
Refine precision test print message
5 years ago
chalsliu
e63a68feac
Retry when download failed for precision test
5 years ago
Jacek Czaja
07790ba13e
[oneDNN] Reimplemented elementwise_add grad ( #29747 )
...
* - Reimplemented elementwise_add grad
- lint
* - fix after review
* - Fix to fix after review
5 years ago
Aurelius84
17c8e3adfe
Polish code in gpu_launch_config.h ( #29730 )
5 years ago
wangchaochaohu
068d905e1e
fix the shape choose of vectorize for cuda
5 years ago
syyxsxx
7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug ( #29626 )
...
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug
5 years ago
石晓伟
8bd2879ef7
update the operator registration for incompatible upgrade, test=develop ( #29720 )
5 years ago
chentianyu03
71063b8137
add conj op for complex types ( #29527 )
...
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
5 years ago
Wilber
b593d588aa
[Inference] EnableUseGpu has higher priority than flags ( #29697 )
...
* enable_use_gpu has higher priority than FLAGS
* update.
5 years ago
WangXi
9cbcc6cadc
fleet sync build strategy, test=develop ( #29732 )
5 years ago
wanghuancoder
0c59ad2a1a
Windows generate pdb and dump, for debug ( #29628 )
...
* Windows generate pdb and dump, for debug
* fix code style, test=develop
* modify cmakelist
5 years ago
Huihuang Zheng
4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop ( #29617 )
...
Modify CublasHandleHolder from using PADDLE_ENFORCE_CUDA_SUCCESS to PADDLE_RETRY_CUDA_SUCCESS to fix random unittest failure. The unittest log showed a CUDA allocation error at this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
5 years ago
Chen Weihang
6cfa59de1b
[Complex] Add real & imag op and api for complex tensor ( #29672 )
...
* add complex real op & api & unittest
* add imag op & api & unittest
* refactor op impl
* revert simplified writing due to compile failure
* polish details
* polish grad op code
5 years ago
Jacek Czaja
9eff1a674f
Added missing format of oneDNN ( #29670 )
5 years ago
wangchaochaohu
2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code ( #29715 )
5 years ago
TTerror
af8ded773a
update activation op on kunlun ( #29577 )
...
* fix expand && concat/transpose to new api
* update xpu_header
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* update activation op on kunlun
* add nearest_interp on kunlun
* update error message
5 years ago
ceci3
cc387159f3
add pad and concat double grad ( #29549 )
...
* add constant pad double grad
5 years ago
liuyuhui
f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor ( #29337 )
5 years ago
Y_Xuan
76738504ad
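Add ROCm platform support code ( #29342 )
...
* Add ROCm platform support code
* Fix some issues
* Resolve some ambiguities and add comments
* Fix code formatting
* Code changes after resolving conflicts
* Modify operators.cmake
* Fix formatting
* Fix errors
* Unify interfaces
* Update dates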
5 years ago
Zhang Ting
1e9127f688
improve dropout grad ( #29605 )
...
* improve grad perf
5 years ago
wangchaochaohu
eab44e1f32
refine ( #29622 )
5 years ago
WangXi
613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop ( #29614 )
5 years ago
chen zhiyu
f5f8809c1a
1. add python version selection 2.add dynamic flags setting. ( #29612 )
5 years ago
YUNSHEN XIE
2926e74326
New UT should not exceed 15s ( #29492 )
...
* added UT should not exceed 15s
* fix error
* UT limit of 15s is the first to be executed
* fix error
* fix error with CI_SKIP_CPP_TEST
* modified timeout setting
* fix error
5 years ago
Chen Weihang
f02aece1f0
Add complex dtype op (add) test example ( #29603 )
...
* add op test case for complex
* polish code details
* add xpu set constant support
* fix argument error
* remove useless pyc file
5 years ago
AshburnLee
efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS ( #28732 )
5 years ago
lijianshe02
7779768b53
add transpose double grad test=develop ( #29600 )
...
* add transpose double grad test=develop
5 years ago
wangchaochaohu
1b69e528d3
optimize for long width for elementwise ( #29602 )
5 years ago
Wilber
78dad78610
fix non-contiguous bug for python api. ( #29615 )
5 years ago
Zhou Wei
18f9df0da4
fix cache pip error ( #29618 )
5 years ago