tianshuo78520a
2e93233899
Add WITH_XPU_BKCL in Kunlun-CI ( #30919 )
4 years ago
Qi Li
34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop ( #30774 )
4 years ago
Jacek Czaja
9e527d9956
[oneDNN] Added basic changes for elementwise_add_grad bf16 ( #30925 )
4 years ago
Chengmo
c98f144fbc
add truncated gaussian random ( #30922 )
...
add truncated gaussian random
4 years ago
liuyuhui
4a8b8b4547
[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess ( #30858 )
4 years ago
liym27
39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. ( #30817 )
4 years ago
liuyuhui
bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc ( #30917 )
4 years ago
wanghuancoder
aab3a3012e
add include for heterbox_trainer.cc, develop=test ( #30910 )
4 years ago
taixiurong
24873f4f77
dyngraph ( #30892 )
4 years ago
Adam Osewski
092a2b1413
More UT for LayerNormFuse pass ( #30891 )
...
* Additionally change to not throw error from inside pass.
4 years ago
tianshuo78520a
a80fe67f84
Change cmake/third_party files for CI ( #30833 )
4 years ago
Jacek Czaja
abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel ( #30757 )
4 years ago
joanna.wozna.intel
73cdea01d4
Add bf16 fast performance verification ( #30551 )
...
* Update Xbyak and add bf16 fast performance verification
* Fix formatting
* Change LOG message
* Trigger an update of a new tag
4 years ago
Shang Zhizhou
e6095bc2ce
fix split trt plugin initialize ( #30875 )
...
* fix split trt plugin initialize
* update
4 years ago
WangXi
6e3856d3fb
fix xpu dygraph place ( #30868 )
4 years ago
wanghuancoder
35c5b23f68
use iwyu clean include second time, test=develop ( #30829 )
...
* use iwyu clean include second time, test=develop
4 years ago
cucuzg
ac2e2e6b7f
add clip_by_norm on kunlun, *test=kunlun ( #30862 )
4 years ago
wawltor
b7560a59ab
fix the broadcast for the large second input ( #30818 )
...
fix the broadcast for the large second input
4 years ago
JamesLim
6e1e036a75
Implement cuda kernel for index_sample. ( #30380 )
4 years ago
AshburnLee
666efc2336
Call new cudnn batch norm API regardless of data type and data layout ( #30157 )
4 years ago
QingshuChen
5c8455d6ea
try again if kunlun memory malloc failed ( #30855 )
...
* try again if kunlun memory malloc failed
* minor
4 years ago
石晓伟
2ac4143b6c
support xpu with analysis predictor, test=develop ( #30832 )
...
* support xpu inference with analysis predictor, test=develop
* merge the cmake of the xpu toolchain, test=develop
* add c-apis, test=develop
* fix a bug in extern_xpu, test=develop
4 years ago
liuyuhui
2cb55eff57
fix WITH_XPU_BKCL in CMakeLists.txt ( #30854 )
4 years ago
Adam Osewski
4f066e316e
Layer normalization fuse pass. ( #30721 )
4 years ago
WangXi
b1026f64af
【kunlun】dygraph supports multi xpu card training ( #30671 )
4 years ago
joanna.wozna.intel
04532b8a83
Update Xbyak to v5.81 ( #30809 )
4 years ago
Shang Zhizhou
b909450994
fix trt plugin clone and initialize bugs in TRT7.1+ ( #30709 )
...
* fix trt plugin clone and initialize bugs
* fix unit test error
* enable trt in ci py3
* update unittest timeout
4 years ago
Wilber
b08ae368bb
ci compilation depends on a stable release ( #30755 )
...
* update lite tag
* disable ut
4 years ago
Thunderbrook
cb66c53c2d
dump to cpu ( #30750 )
...
* dump to cpu
* format
* format
* format
4 years ago
Chengmo
d3fac0ea85
fix int64 bug ( #30780 )
...
fix push sparse int64 bug
4 years ago
Qi Li
69875dc42c
[ROCM] update fluid memory for rocm35 (part1), test=develop ( #30758 )
4 years ago
QingshuChen
c35a9880f9
fix malloc L3 failed bug for kunlun ( #30745 )
...
* fix malloc L3 failed bug for kunlun
* minor
4 years ago
WangXi
31ed9c9eed
Fleet distributed strategy support pure fp16 ( #30754 )
4 years ago
Zhen Wang
53d01afed6
Fix the nan bug when passing all zero values into clip_by_norm_op. ( #30777 )
4 years ago
ShenLiang
3858f458ea
rm Singleton of reducer ( #30775 )
4 years ago
Qi Li
f89da4ab45
[ROCM] update fluid platform for rocm35 (part1), test=develop ( #30639 )
...
* [ROCM] update fluid platform for rocm35 (part1), test=develop
* address review comments, test=develop
4 years ago
Wojciech Uss
fc00240575
A fix for oneDNN matmul kernel. Fixes issue #30309 ( #30723 )
4 years ago
lidanqing
46989e889b
Fix python3 incompatibility issues ( #30698 )
...
* solve python3 incompatibility issues
* update checksum
4 years ago
alncat
5b59499e57
fixed compilation error on gcc 4.8.x due to the usage of isfinite ( #30733 )
4 years ago
Chengmo
78d37c3f75
【Paddle.Fleet】Fix brpc get hostname ( #30703 )
...
* fix Brpc get hostname
4 years ago
taixiurong
caf3680bbc
fix bugs in transformer predict in xpu place ( #30730 )
...
* transformer predict
* trans bug fix
4 years ago
jakpiase
f8da5536ed
REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30719 )
...
* added external reorder to profiler
* resolved conflict
* added enable_static
* initial version of lstm, not working yet
* added lstm to operators.cmake
* added vanilla lstm mkldnn op
* added peephole weights integration
* minor changes
* added formatting
* added fusion_lstm_mkldnn to static_whitelist
* added formatting
* removed comment
* moved use_peepholes attribute inside is_cached block
* reverted wrong changes
* minor formatting change
* minor changes
* changed stream handling
* minor change
* added datatype to GetExpectedKernelType()
* added reading stream from TLS
4 years ago
liuyuhui
67abfc1588
[Kunlun] fix dead lock for exec_op_count_ ( #30718 )
4 years ago
alncat
5ace20fc3f
modified conv+bn fuse pass to fix wrong mask in mask rcnn ( #30704 )
4 years ago
Tao Luo
824a79d383
Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30661 )" ( #30708 )
...
This reverts commit d834f4e6e8.
4 years ago
lilong12
7fbc68a2c0
update, test=develop ( #30692 )
4 years ago
jakpiase
d834f4e6e8
Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel ( #30661 )
...
* added external reorder to profiler
* resolved conflict
* added enable_static
* initial version of lstm, not working yet
* added lstm to operators.cmake
* added vanilla lstm mkldnn op
* added peephole weights integration
* minor changes
* added formatting
* added fusion_lstm_mkldnn to static_whitelist
* added formatting
* removed comment
* moved use_peepholes attribute inside is_cached block
* reverted wrong changes
* minor formatting change
* minor changes
4 years ago
arlesniak
5bf25d1e8b
More precise mkldnn kernel rules in GetExpectedKernelType ( #29840 )
...
* More precise mkldnn kernel choice in GetExpectedKernelType
* Fixes after review
* Refresh develop for CI
* CI experiment
* get back from CI exper
4 years ago
Jacek Czaja
173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op ( #30358 )
4 years ago
Shang Zhizhou
ae0f88a988
add DLA support:C++&&Python api ( #30165 )
...
* add dla
* add dla done
* add python api
Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>
4 years ago
chentianyu03
fb7fbc7a5d
fix abs bug and add abs test case ( #30637 )
...
* add abs test case
* use std::abs to fix abs bug
* fix the abs bug
* fix abs bug
4 years ago
ShenLiang
9514b4aa5f
Fix scatter grad bug ( #30604 )
4 years ago
Pei Yang
cf9bdb9404
extend trt ut timeout threshold ( #30537 )
4 years ago
Thunderbrook
1bebc09253
solve build gpu task core ( #30626 )
...
* build gpu task core
* format
4 years ago
石晓伟
33bf6eb753
revert external gflags, test=develop ( #30623 )
4 years ago
Jacek Czaja
dfdb0359ea
- Disabling oneDNN inplace pass ( #30588 )
4 years ago
TTerror
10271ddfc4
support reduce_max op on kunlun ( #30581 )
...
* support reduce_max op on kunlun
* support reduce_max op on kunlun
* support reduce_max op on kunlun
* support reduce_max op on kunlun
4 years ago
QingshuChen
5013c67644
fix softmax bug for multi_card in kunlun ( #30600 )
4 years ago
wuhuanzhou
7e671c07b6
optimize unity build ( #30195 )
...
* optimize unity build, test=develop
* fix code style error, test=develop
* fix code style error and test /MP settings, test=develop
4 years ago
liuyuhui
e5b0d9e1fc
[Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor ( #30586 )
4 years ago
Zhou Wei
9674e440e2
optimize windows CI, clear tp cache, polish code, improve level of msvc log ( #30579 )
4 years ago
wanghuancoder
90773473a0
use nvtx push pop in timeline ( #30567 )
...
* delete empty line of pybind.cc, test=develop
* use nvtx push pop in timeline, test=develop
* change year, test=develop
* add #ifdef PADDLE_WITH_CUDA, test=develop
* add #ifndef WIN32, test=develop
* is_pushed to is_pushed_, test=develop
4 years ago
chentianyu03
358106fcb0
make abs op support complex types ( #30375 )
...
* rewrite abs op
* rewrite abs op and remove abs in activation
* remove abs register in old codes
* fix abs_grad type error
* fix abs double_grad output name error
* modify abs_grad, abs_grad_grad functor for windows building
* format code style
* fix the bug of result is nan when the divisor is zero
* add missing abs attr and add abs for float16
4 years ago
Wilber
2d5758c456
update. ( #30585 )
4 years ago
Tao Luo
9dd71c74df
disable test_analyzer_detect ( #30541 )
4 years ago
tangwei12
c9e78a22c5
add trainers for pserver ( #30523 )
...
* add trainers for pserver
Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8
4 years ago
wanghuancoder
d1b25ed9d7
add some RecordEvent, for dygraph timeline ( #30299 )
...
* add some RecordEvent, for dygraph timeline, test=develop
* change GpuMemcpySync to memory::Copy, test=develop
* fix compile problem, test=develop
* fix compile problem, test=develop
* fix, test=develop
* fix, test=develop
4 years ago
YUNSHEN XIE
bbea5a1fa9
The new unit test cannot have the same name as the existing unit test ( #29878 )
...
* check UT Duplicate name
* fix error
* Optimized log display
* modified exit code
4 years ago
liym27
ff25c5b36f
Fix bug: GetAttrValue should deal with attr with attrType vector<double> ( #30536 )
4 years ago
WangXi
572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer ( #30455 )
4 years ago
ykkk2333
549855ac20
add rmsprop_op_xpu test=kunlun ( #30493 )
...
* add rmsprop_op_xpu test=kunlun
* modified rmsprop_op_xpu error code. test=kunlun
4 years ago
Zhou Wei
fb20ec9a4e
fix bug of multicard grad ncclAllReduce ( #30553 )
4 years ago
Zhen Wang
f30d00553a
Fix the compiling error of update_loss_scaling when using cuda9. ( #30538 )
4 years ago
Leo Chen
81217a94d8
unify calling cudaSetDevice ( #30470 )
...
* unify calling cudaSetDevice
* fix compile
4 years ago
pangyoki
00554b3f6b
fix error message of Inplace strategy ( #30520 )
4 years ago
Leo Chen
7043b8cfc6
support layer_norm fp16 in dygraph amp ( #30430 )
...
* support layer_norm fp16 in dygraph amp
* add ut
* refine code
4 years ago
wanghuancoder
59ad6ff3e3
delete empty line of pybind.cc, test=develop ( #30529 )
4 years ago
hutuxian
e207fe6385
Ascend Framework Part2: pybind files ( #30410 )
4 years ago
hutuxian
40ede12631
Ascend Framework Part1: OP & Wrapper ( #30281 )
4 years ago
liuyuhui
843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization ( #30317 )
4 years ago
QingshuChen
8489d4f76f
optimize batch_norm & pool op for kunlun ( #30490 )
4 years ago
wanghuancoder
bd97192274
if pybind.cc changed, generate total report, test=develop ( #30514 )
4 years ago
taixiurong
5e5c2827a3
fix range op crash in dygraph xpu place ( #30469 )
4 years ago
JZ-LIANG
16ba0abc79
Recompute Offload: fixed bug in memcpy ( #30484 )
4 years ago
guofei
11e78ebaa3
Modify the calculation logic of LambOptimizer ( #29313 )
...
* Modify the calculation logic of LambOptimizer
4 years ago
Adam Osewski
c5ffad126c
[oneDNN] Refactor fuse pass helper functions to one place. ( #30460 )
...
* Move pass tester helper functions to single common place.
* Use helper functions in two more fuse pass tests.
4 years ago
Zhang Ting
c9a334e1b3
add VecCastCUDAKernel ( #30296 )
4 years ago
pangyoki
13d757362c
Add Inplace strategy (Output reuse Input Varbase) in dygraph ( #30103 )
...
* add view strategy on squeeze,unsqueeze,reshape,flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* fix test_cross_entropy_loss error because of reshape2
* add inplace strategy
* add elementwise_add sub
* let backward op not use inplace
* grad op do not use inplace
* fix memory increase error and add leaf error message
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
* add unittest and leaf error message
* merge view error
* optimize op_function_generator format and support sum inplace op
* fix format of basic_engine
* fix format for framework
* little change of variable wrapper
* add reshape, squeeze, unsqueeze, scatter api
* add relu elu tanh softmax inplace api
* fix test_squeeze_op unittest
* fix test_relu_op unittest
* fix comment problems
* delete sample code of inplace api
* add reference of grad_pending_nodes in basic_engine
* fix unittest name
* add inplace apis into wlist
* fix error message
* add PADDLE_ENFORCE for set grad op twice
* fix head file error
4 years ago
Yang Zhang
008b0a8b56
Fix float64 bug in layer norm ( #30452 )
...
built-in `rsqrt` is shadowed
4 years ago
石晓伟
715d862868
export global google flags to users, test=develop ( #30448 )
4 years ago
Wojciech Uss
88fc7a7d68
fix cache key for inplaced elementwise ops ( #30404 )
4 years ago
wawltor
3d49882e2c
fix the rnn mask memory bug for out of read ( #30459 )
...
* fix the rnn mask memory bug for out of read
* update the code for the rnn
4 years ago
taixiurong
6a3c8725b0
support transformer v2.0 ( #30381 )
4 years ago
ShenLiang
e85be1b1b2
fix flatten api grad ( #30426 )
4 years ago
yaoxuefeng
6e0da01c61
Heter ps new ( #30198 )
4 years ago
123malin
2a98e9323a
test=develop, add distributed_infer ( #30300 )
...
* test=develop, add distributed_infer
4 years ago
QingshuChen
cf786d22ec
fix bug that can't find mkldnn(kunlun) ( #30394 )
4 years ago
cc
8e3a294045
skip quantizing ops in cpu inference ( #30342 )
...
* skip quantizing ops in cpu inference, test=develop
4 years ago
alncat
7bbf3ac5ab
Added support for inference using quantization aware trained dygraph ( #30288 )
...
* added support for inference using quantization aware trained dygraph
* added support for inference using quantization aware trained dygraph
correct boost get usage
* Delete incorrect warning message (#30196 )
* fix warning and no grad
* clean redundant API alias in 2.0 - part 2 (#30013 )
* delete paddle.nn.functional.assign
* fix dynamic to static error
* just add the op error message for the matmul xpu (#30246 )
add the op error message for the matmul xpu
* Add Static Variable Clone (#30208 )
Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat
* use wget to replace curl to download the lcov file (#30229 )
* use wget to replace curl to download the lcov file
* add cache for lcov
* fix test_pool3d_op timeout issue (#30248 )
* Fix unittests bugs. (#30250 )
* modify error message based on comments (#30189 )
* modify error message based on comments
* edit code according to review.
* Correct spelling according to review.
* Fix bug for 'save multiple method' (#30218 )
* Fix bug for 'save multiple method'
* To pass coverage.
* edit code to pass coverage.
* edit code to pass coverage.
* add unittest for coverage.
* change for coverage.
* edit for coverage.
* added support for inference using quantization aware trained dygraph
* Alias from paddle.fluid.layers.auc to paddle.static.auc (#30206 )
* add alias from fluid.layers.auc to static.auc
* Update __init__.py
* added support for inference using quantization aware trained dygraph
correct boost get usage
* corrected boost get usage
* corrected naming issues and enforcing zero check
* correct paddle enforce message
* added more error checkings
* corrected error report message and optimized code
* corrected findvar usage
* corrected paddle_enforce in scope
* correct error messages
* correct error reporting format
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: Bai Yifan <me@ethanbai.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>
4 years ago
GaoWei8
180877e988
Softmax backward optimize ( #30249 )
...
* softmax backward optimize
4 years ago
Zhou Wei
b1d8ff45d7
run single-GPU unit tests in parallel on Linux/Windows GPU ( #29523 )
4 years ago
Zhang Jun
10a8f3e5c3
fix bug on compiling inference shared lib with crypto;test=develop ( #30269 )
...
* fix bug on compiling inference shared lib with crypto;test=develop
* fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF
* update cmake
* remove unnecessary enforce message
4 years ago
Huihuang Zheng
28e156c27f
Fix Sleep Error in enforce.h ( #30335 )
...
The usleep function in <unistd.h> only accepts an argument less than 1,000,000. The current call can exceed this limit, so it has to be fixed. This PR fixes a random CI error.
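As a rough illustration of the limit described above, the sketch below splits a long delay into chunks that each stay under 1,000,000 microseconds. The helper name `split_sleep_us` and the chunking approach are assumptions made for this example; the source does not say how the PR itself changed the call in enforce.h.

```python
# Hypothetical helper: not taken from the Paddle source.
USLEEP_LIMIT_US = 1_000_000  # usleep only accepts values below this


def split_sleep_us(total_us, limit_us=USLEEP_LIMIT_US):
    """Split a delay in microseconds into chunks strictly below limit_us."""
    if total_us < 0:
        raise ValueError("total_us must be non-negative")
    max_chunk = limit_us - 1  # largest value still below the limit
    chunks = []
    remaining = total_us
    while remaining > 0:
        step = min(remaining, max_chunk)
        chunks.append(step)
        remaining -= step
    return chunks


if __name__ == "__main__":
    # Each printed value could be passed to a separate usleep-style call.
    print(split_sleep_us(2_500_000))
```

Each chunk could then be handed to its own short sleep call, so no single call exceeds the limit.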
4 years ago
Leo Chen
3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card ( #30338 )
...
* set expected place in child thread for dataloader
* set device id when set tensor from numpy
* revert tensor_py change
* add compile guard
* fix ci
* fix bug
4 years ago
QingshuChen
2c1bba02e4
optimize memcpy perf for kunlun ( #30291 )
...
* optimize memcpy perf for kunlun
* remove useless unittest for kunlun mean
* minor
4 years ago
ShenLiang
a60f17b89d
Support unused parameters in dynamic graph distributed ( #30224 )
4 years ago
JZ-LIANG
75936d838f
Recompute Offload ( #30233 )
4 years ago
lidanqing
a60893f6b5
correct the allowed dimension size ( #30326 )
4 years ago
Chen Weihang
c8c8f205ba
remove c++ stacktrace hint ( #30325 )
4 years ago
tangwei12
5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix ( #30306 )
...
* add sparse embedding & load vars for 2.0
Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b
* fix hdfs gloo
Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6
* fix gloo hdfs
Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e
* move loadvar/sparse embedding from incubate to static
Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
4 years ago
tangwei12
25f80fd304
Fix/distributed proto ( #29981 )
...
* rename sendrecv.proto to namespace paddle.distributed
* split ps with distributed
4 years ago
Chengmo
d479ae1725
【Paddle.Fleet】Support local save sparse param ( #30175 )
...
* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
4 years ago
Double_V
231501fefc
fix elugradgrad test fail & error message opt ( #30171 )
...
* fix elugradgrad test fail and error message opt
* fix unittest,test=develop
* Update prroi_pool_op.h
fix error message
* opt message,test=develop
* fix ci fail,test=develop
4 years ago
Zhen Wang
fb49ea388e
Fix the accuracy problem of allclose op when using float64 data type in static mode. ( #29890 )
...
* Fix the accuracy problem of allclose op when using float64 data type in static mode.
* Format the code style.
4 years ago
yaoxuefeng
4656525e24
fix datanorm error msg ( #30294 )
4 years ago
furnace
77051cc9f0
add fp16 support for tril_triu op ( #30186 )
4 years ago
石晓伟
efa54629fb
fix header file paths of gflags, commit 3, test=develop ( #30273 )
4 years ago
Chengmo
5b2c15afcd
Fix server.h include device_context ( #30243 )
...
* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
4 years ago
石晓伟
a0ee09148e
enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop ( #30240 )
4 years ago
石晓伟
a66eebab5c
fix header file paths of gflags, commit 4, test=develop ( #30274 )
4 years ago
石晓伟
8c4500ff6d
fix header file paths of gflags, commit 2, test=develop ( #30272 )
4 years ago
liym27
b4989fb744
Support vector<double> as type of op attribute and op set_value supports vector<double> as value ( #30126 )
4 years ago
wangchaochaohu
8dcae0c55d
register OPMaker and Infer Shape Check for fused_elementwise_add ( #30259 )
4 years ago
AshburnLee
924aac2216
Add tf32 switch for cuDNN ( #29192 )
4 years ago
石晓伟
8ce2482b80
fix header file paths of gflags, commit 1, test=develop ( #30271 )
4 years ago
chentianyu03
c7371b7b20
type promotion for grad ( #30177 )
...
* type promotion for grad
* add type promotion for div op
4 years ago
liym27
3ce878f309
Check the rank of input in kernel of set_value op ( #30147 )
4 years ago
WeiXin
66dc4ac77b
modify error message based on comments ( #30189 )
...
* modify error message based on comments
* edit code according to review.
* Correct spelling according to review.
4 years ago
wawltor
fee424411a
just add the op error message for the matmul xpu ( #30246 )
...
add the op error message for the matmul xpu
4 years ago
GaoWei8
0a21924a8d
optimize softmax forward ( #30217 )
...
* optimize softmax forward
4 years ago
wangchaochaohu
af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) ( #29885 )
4 years ago
zhang wenhui
5932fee60a
enhance error message, test=develop ( #30220 )
4 years ago
pangyoki
da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op ( #29913 )
...
* add view strategy on squeeze,unsqueeze,reshape,flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
4 years ago
Jacek Czaja
4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching ( #30203 )
...
* - Added UT for testing elementwise_mul caching
* lint fixes
4 years ago
Zhen Wang
7f7dfccf20
Support pure fp16 training for AMP API. ( #29544 )
...
* add cast ops before and after unsupported fp16 ops.
* Keep partial net in FP32 pattern.
* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
* Add fp16 support for adam op.
* add multi precision attr for adam.
* Fix the bug of test_multi_precision_fp16_train UT.
* Code format for CI.
* Fix the redefine error about MPTypeTrait on windows.
* fix bugs of the _create_accumulators func in Momentum.
* fix bug when inserting post cast op.
* Add the update_loss_scaling op in allow_set of UnusedVarCheck.
* Update for ci coverage.
* Add some doc for OptimizerWithMixedPrecision.
* Fix the code style.
* Improve the doc of `amp_init`.
* Change for fp16 testing if users have the infer program defined in separate way.
4 years ago
Leo Chen
789743e190
use cuda generator in bernoulli cuda kernel ( #30199 )
4 years ago
Leo Chen
8696335f86
Fix dtype of ungenerated grad var ( #28511 )
...
* fix dtype of ungenerated grad var
* update ut
* refine code
* set default dtype
* fix could_use_cudnn bug
* remove debug code
* re-implement
* fix bug
4 years ago
Wilber
609c022222
shape op support int8 and uint8 tensor ( #30201 )
4 years ago
Wilber
01a287bf0a
fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON ( #30194 )
4 years ago
ruri
e42e1e80dc
Add version checking, test=op_version ( #30129 )
4 years ago
Leo Chen
1f97d61c68
Add callback after TensorCopy ( #30123 )
...
* change to tensor copy sync
* change to tensor copy sync
* make copy_to safe when use TensorCopy
* refine code
* add ut
* add cudapinned garbagecollector
* add testcase: cpu place -> cuda pinned place
4 years ago
Chengmo
528e03fc08
【Paddle.Fleet】Fix tensor table ( #30075 )
...
* add tensor table
4 years ago
Wilber
ade244948c
disable mkldnn inplace pass on windows ( #30164 )
4 years ago
joanna.wozna.intel
907262ee15
Fix analysis predictor test ( #30191 )
...
* Add a necessary condition
* Remove test for white list and add header
4 years ago
lijianshe02
2dc7ee276b
enhance error message of nll_loss op test=develop ( #30125 )
...
* enhance error message of nll_loss op test=develop
4 years ago
Huihuang Zheng
54bf3f5a56
Refine PADDLE_ENFORCE Error Messages. test=develop ( #30149 )
...
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
4 years ago
Chen Weihang
d0fb06b27f
[Complex] Simplify prepared op impl to improve performance ( #30153 )
...
* simplify prepared op impl to improve performance
* fix kunlun compile error
* continue fix kunlun compile error
* only transform diff place when dtype diff
* fix failed unittests
* remove useless file
* polish impl by review comment
4 years ago
123malin
c5b415bfd9
Improve Index select cuda kernel ( #30139 )
...
* test=develop, add index_select_cuda kernel
4 years ago
wangchaochaohu
7dd551e08b
refine the paddle place support using str ( #28769 )
4 years ago
WeiXin
404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t ( #30161 )
4 years ago
Wilber
91a8a25721
enhance error info for py_func ( #30138 )
...
* enhance error info for py_func
* update
4 years ago
weihaoji
b8207af6bc
[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop ( #30122 )
4 years ago
liuyuhui
15fac5e7fa
fix assign_op_xpu concat_op_xpu warning ( #30120 )
4 years ago
Jack Zhou
f5428eca4f
fix enforce msg of sum xpu op ( #30113 )
4 years ago
123malin
198fbdfb60
Add Lookahead and ModelAverage Optimizer ( #30004 )
...
* test=develop, add model_average and lookahead
4 years ago
Leo Chen
adac38c506
add dispensable input for core.ops.reshape2/expand/slice ( #30072 )
...
* add dispensable input 'shape' for core.ops.reshape2
* add dispensable inputs for core.ops.reshape2/expand/slice
* add ut
4 years ago
ShenLiang
becf99d2e8
fix error message ( #30135 )
4 years ago
Zhou Wei
30888ca343
Polish and Optimize the print/repr information of Layer ( #29998 )
...
* Polish and Optimize the print/repr message of all layer
* fix some code format
4 years ago
Zhou Wei
9c99d37906
fix unittest failed on windows ( #29837 )
4 years ago
wangguanzhong
69839f8a9a
fix error message for distribute_fpn_proposals_op ( #30116 )
4 years ago
QingshuChen
8e1c3ddf15
add aarch64 and sunway kunlun lib ( #30027 )
...
* add aarch64 and sunway kunlun lib
* minor
* optimize elementwise_add for kunlun
* update kunlun dependence
* minor
* minor
4 years ago
Shang Zhizhou
05b27695f1
add inference api: DisableTensorRtOps ( #30109 )
...
* snap
* add inference api: DisableTensorRtOPs
* fix code style
* update api to experimental
* update variable name
4 years ago
石晓伟
53bb126510
fix a bug in op_version_registry, test=develop, test=op_version ( #29994 )
4 years ago
xiemoyuan
3e0c492910
Optimize the error message of framework. ( #30134 )
4 years ago
liym27
9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns a wrong result ( #30003 )
...
1. when slice_item is a slice:
1) the start of __getitem__ should be std::max(start, 0)
2) the end of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
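The rules above can be sketched in Python as follows. The helper names `normalize_slice` and `normalize_index` are hypothetical, and wrapping negative bounds by adding the dimension length before clamping is an assumption borrowed from ordinary Python indexing, not something the commit message states.

```python
# Hypothetical helpers illustrating the bounds rules; not Paddle's code.

def normalize_slice(start, end, dim_len):
    """Clamp slice bounds to [0, dim_len] after wrapping negatives (assumed)."""
    if start < 0:
        start += dim_len  # assumption: negative bounds count from the end
    if end < 0:
        end += dim_len
    start = max(start, 0)    # rule 1.1: start is at least 0
    end = min(end, dim_len)  # rule 1.2: end is at most the dimension length
    return start, end


def normalize_index(index, dim_len):
    """Accept an integer index only if it lies in [-dim_len, dim_len)."""
    if not -dim_len <= index < dim_len:
        # rule 3: report the actual values in the error message
        raise IndexError(
            f"index {index} is out of range [{-dim_len}, {dim_len})"
        )
    return index + dim_len if index < 0 else index


if __name__ == "__main__":
    print(normalize_slice(-2, 10, 5))
    print(normalize_index(-1, 5))
```

Clamping keeps an out-of-range slice valid (possibly empty), while an out-of-range integer index is rejected outright, which mirrors the distinction the commit message draws between the two cases.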
4 years ago
chentianyu03
666e665132
change the kron gradient when complex types ( #29995 )
4 years ago
chentianyu03
a5e422c85d
add trace op_register_version and fix version bug; test=op_version ( #30000 )
...
* add trace op_register_version and fix default bug; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* add trace op_register_version; test=op_version
* fix missing the template bug of vector; test=op_version
4 years ago
cc
9f34374b48
Fix the format of raising error in randperm op ( #30108 )
...
* fix the format of raising error in randperm op
4 years ago
liuyuhui
254ad61959
fix xpu pe sync, test=notest ( #30095 )
4 years ago
Thunderbrook
0b8e1fadc5
add topo-aware in heter-ps ( #30087 )
...
* add topo aware
* resource.h
* topo aware
* format
4 years ago
hong
297fff1a79
support dygraph in xpu place ( #30051 )
...
* support dygraph in xpu place; test=develop
* fix cpu/gpu compile error; test=develop
* fix compile error; test=develop
* fix xpu compile error; testd=develop
4 years ago
wangchaochaohu
d0a5620575
fix the compiler error when gcc4 cuda9.0 ( #29997 )
4 years ago
WangXi
ee16006b5d
Optimization grad merge performance ( #29784 )
4 years ago
yongqiangma
e891f4da1b
Add p_norm op version info ( #30042 )
...
* p_norm fix op version info. test=develop
4 years ago
tangwei12
7d1c149e09
for inference checkpoint ( #30081 )
...
* for inference checkpoint
Change-Id: I36c979240ffa55bf1ef0c9315402960762af6be4
* for inference checkpoint
Change-Id: I82025365d5b792cbea1ead506df685aecc8ac198
4 years ago
tangwei12
7d4bdff07d
fix large scale memory ( #30035 )
...
* memory holder optimize
Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e
* memory holder optimize
Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f
* fix large scale memory holder
Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f
* fix large scale memory holder
Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655
4 years ago
Shang Zhizhou
08dc5bc27e
fix op version checker of pass bug ( #30028 )
...
* fix op version checker of pass bug
* fix code style
* update pass version
4 years ago
cc
68398abce9
[Inference] zero_copy_tensor supports int8_t ( #30053 )
...
* zero_copy_tensor supports int8_t
4 years ago
whs
1b999d2b5d
Add version checking ( #30040 )
4 years ago
ceci3
85b2f05ab0
register ModifyAttr for instance_norm, test=op_version ( #30065 )
...
* register instance norm, test=op_version
4 years ago
channings
ddcff254db
fix op_register_version for compare ops, test=op_version ( #30007 )
...
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
4 years ago
Wilber
66e16b7e99
update lite subgraph. ( #30056 )
4 years ago
GaoWei8
a64822589f
add REGISTER_OP_VERSION for LSTM ( #30038 )
4 years ago
yinhaofeng
6e93fb92f9
Register op version for linspace,test=op_version ( #30025 )
...
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
* Register op version for linspace,test=op_version
4 years ago
123malin
d0056c324d
test=develop, add op_register_version for roll_op ( #30023 )
...
* test=develop, add op_register_version for roll_op
4 years ago
chentianyu03
e012930aa3
complex gradient matmul ( #29966 )
...
* dot op support complex types
* matmul support complex types
* add test case
* matmul broadcast gradient support complex
* move conjFunctor to complex_functor.h
4 years ago
ShenLiang
893d37e5c6
Fix rank_attention op_version, test=op_version ( #30006 )
...
* fix rank_attention, test=op_version
4 years ago
Adam Osewski
13aef97043
operator checkpoints for new attributes. ( #29832 )
...
* Add operator checkpoints for new attributes.
* Fix adding subsequent checkpoint to quantize op.
4 years ago
wangguanzhong
844d8e0c2c
add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version ( #30034 )
4 years ago
cc
c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op ( #30016 )
...
* Add mkldnn nearest_interp and bilinear_interp op
* don't run mkldnn interpolate by default
* add interpolate_mkldnn_pass
4 years ago
chalsliu
c053bf2a57
Revert "register ModifyAttr for instance_norm, test=op_version ( #29938 )"
4 years ago
wawltor
cc2f94620c
add the support the op version check for matmul, test=op_version ( #30011 )
...
* add the support the op version check for matmul, test=op_version
4 years ago
wawltor
b33aaea86c
add the op version check for the elementwise ops, test=op_version ( #30010 )
...
* add the op version check for the elementwise ops, test=op_version
* add the support check for elementwise_ops, test=op_version
4 years ago
Chengmo
4cbcc9b6da
fix momentum op register ( #29941 )
...
* fix momentum op register
4 years ago
hutuxian
7c1f69bdf0
add op_version for flip op [test=op_version] ( #30019 )
4 years ago
ceci3
77c1684397
register ModifyAttr for instance_norm, test=op_version ( #29938 )
...
* upgrade instance_norm, test=op_version
* fix
4 years ago
Leo Chen
47d10c55d5
Enhance debugging ( #30001 )
...
* add debug code
* add place info
* fix compile problem
* add place for output
4 years ago
FlyingQianMM
d42f93e504
add op_register_version for allclose op; test=op_version ( #29968 )
4 years ago
wawltor
8f49f9d5c9
change the elementwise ops version check, test=op_version
...
change the elementwise ops version check, test=op_version
4 years ago
guofei
b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop ( #29957 )
...
Add moving_average_abs_max_scale op_register_version
4 years ago
Thunderbrook
0ca6de171f
add include ( #29952 )
4 years ago
zhangchunle
631d783748
fix bug in windows ci ( #29963 )
4 years ago
Pei Yang
6206b9bc71
fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test ( #29977 )
4 years ago
wangxinxin08
be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version ( #29937 )
4 years ago
石晓伟
958612231f
compile the denormal.cc on aarch64, test=develop ( #29956 )
4 years ago
Guo Sheng
6ac4f0af6a
Register op version for coalesce_tensor. ( #29940 )
...
test=develop
test=op_version
4 years ago
Chen Weihang
a1d9a14e89
support grad accumulated across batch ( #29942 )
4 years ago
cc
6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul ( #29911 )
...
* map matmul/squeeze2+matmul/reshape2+matmul to mul
4 years ago
Huihuang Zheng
d038746e1c
Fix Unix Sleep for Wrong Time. test=develop ( #29953 )
...
PADDLE_RETRY_CUDA_SUCCESS used the wrong sleep time, which could cause timeouts in unit tests. This PR fixes it.
According to https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html , sleep in unistd.h takes seconds and usleep takes microseconds, while Sleep in windows.h takes milliseconds.
4 years ago
YUNSHEN XIE
121658d251
Support xpu ut coverage ( #29892 )
...
* add xpu_coverage function
* xpu coverage ipipe only deal with xpu files
* fix import error
* fix format error
* 'fix format error'
* fix format error
* fix error
* fix format error
* fix format error
4 years ago
Jack Zhou
5a4e42ca9a
add gru op_register_version; test=op_version; ( #29931 )
...
* add gru op_register_version; test=op_version;
* Update fc,mul version;test=op_version;
4 years ago
Wilber
2b1d796cd0
[Inference] Solve 2.0 trt performance reduce compare 1.8. ( #29925 )
4 years ago
Qi Li
913f77a4b7
Register op version for print, test=op_version ( #29945 )
4 years ago
石晓伟
181ea1870b
flush denormals to zero, test=develop ( #29924 )
...
* flush denormals to zero, test=develop
* add comments, test=develop
4 years ago
cc
7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version ( #29923 )
...
* add op version for fake_quant and fake_dequant ops, test=op_version, test=develop
4 years ago
石晓伟
acb5e86363
fix a bug in reset_tensor_array, test=develop ( #29620 )
...
* fix a bug in reset_tensor_array, test=develop
* ci coverage, test=develop
4 years ago
liuyuhui
3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor ( #29926 )
4 years ago
Wilber
332da133a1
Support mips arch ( #29903 )
...
* Support MIPS arch.
4 years ago
LielinJiang
eab0b60e16
Register op version for grid_sampler, test=op_version ( #29916 )
...
* register op version for grid_sampler
4 years ago
liym27
9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor ( #29842 )
...
* Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )"
This reverts commit b10ecd9d3a.
* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
4 years ago
liuyuhui
4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor ( #29574 )
4 years ago
LielinJiang
0f4b218640
Enable bilateral_slice unittest on windows platform ( #29896 )
...
* enable bilateral_slice unittest on windows platform
* reduce max threads
4 years ago
Ren Wei (任卫)
95df0e1447
Add the ipipe log param prefix ( #29545 )
...
* Add the ipipe log param prefix
1. add the prefix;
2. using Colon before the metric values;
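* Add a prefix for collecting efficiency-cloud log metrics
Not yet verified whether this string replacement works correctly in the Windows bat script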
* Preserve The Old Format Metrics During The Transition Period
Please DELETE the old format metrics log finally.
The period may last for a week.
* ipipe_log_param + ccache and clcache ..
4 years ago
YUNSHEN XIE
2a01756bf3
remove duplicate ut names ( #29809 )
4 years ago
Chen Weihang
a6072055be
[Complex] Handle complex to real after type promotion ( #29855 )
...
* try to add fwd op input dtypes
* refactor base impl
* return tmp_ins after dygraph prepare data
* fix typo found in debug
* polish comment & add complex net test
* revert detail change
* fix unittest failed
* add complex kernel condition control
* fix xpu test failed & polish comment
* polish details by review comments
4 years ago
Chen Weihang
1a304e6c06
[Complex] Add support for complex grad accumulated ( #29889 )
...
* add support for complex grad accumulated
* add unittest for coverage
* update test dtype
* remove useless blank line
4 years ago
taixiurong
c7acad9f2f
support some shape for matmul and cast in xpu place ( #29900 )
...
* support some shape in matmul and cast
* modify matmul
4 years ago
Leo Chen
6b258317cb
fix TransferInplaceBack ( #29830 )
4 years ago
QingshuChen
59b47f3b32
feat: support check_nan_inf for kunlun/xpu device ( #29694 )
...
* feat: support check_nan_inf for kunlun device
* support kunlun stack
* minor
4 years ago
tangwei12
032414ca2a
[Feature] one ps (3/4) ( #29604 )
...
* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>
4 years ago
jakpiase
edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) ( #29772 )
4 years ago
Wilber
2c0a4a3470
call_stack is turned on by default when ON_INFER=ON ( #29798 )
4 years ago
Wilber
ad0b01ffe2
lod operator should not be reused in memory_optimize pass. ( #29828 )
4 years ago
liym27
97e75ad0f5
[setitem] Support Tensor setitem in static mode ( #29708 )
...
1. Type of index: int, slice(step must be 1).
2. Type of value:
(1) int32, int64, float32, bool;
(2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
(3) paddle.Tensor(int32, int64, float32, float64, bool);
4 years ago
YUNSHEN XIE
24ce051a84
remove duplicate ut reload ( #29810 )
...
* remove duplicate ut reload
* remove duplicate ut define in cmakelist
4 years ago
Jacek Czaja
c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching ( #29606 )
4 years ago
Thunderbrook
09b6e71928
heter box ( #29734 )
...
* add heter box
* add trainer, worker, wrapper...
* format
* for ci
* format
* remove boost get
* boost & copyright
* rename
* rename
* format
* format
* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
4 years ago
Jacek Czaja
7b33720c90
[oneDNN] Tensor copy fix to oneDNN tensors ( #29771 )
...
* - Tensor copy fix to oneDNN tensors
* - Fixes after review
4 years ago
123malin
a400b76db7
Roll cuda kernel ( #29655 )
...
* test=develop, optimize roll_op_cuda_kernel
4 years ago
wuhuanzhou
e7ac74c85b
optimize compilation time of argmin/argmax op ( #29595 )
...
* Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop
* remove changes of reduce_op.h, test=develop
4 years ago
Zhou Wei
3f83ec61c2
move running unittest on windows to another file ( #29815 )
4 years ago
chentianyu03
ddfc3d2c2f
change grad elementwise_mul for complex types ( #29757 )
...
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
* delete no need to calculate inputs in dygraph op_test
* delete no need to calculate inputs in dygraph op_test
* modify grad of mul for complex types
* fix the grads of inputs args order not match bug
4 years ago
chentianyu03
2a260d9b0e
change the grad of div when complex types ( #29804 )
...
* change the grad of div when complex types
* fix the grads of inputs args order not match bug
4 years ago
ShenLiang
f65f1caad3
opt sparse allreduce using ncclgather ( #29819 )
4 years ago
TTerror
82aa01c373
add nearest_interp_v2 on kunlun ( #29725 )
...
* add nearest_interp_v2 on kunlun
* add nearest_interp_v2 on kunlun
4 years ago
wangchaochaohu
01c37c8e02
refine the compiler error for half2 operation ( #29816 )
4 years ago
whs
82630408b4
Support double backward rsqrt ( #29589 )
4 years ago
Zhang Ting
b76f5a8489
fix the bug of dropout_grad ( #29813 )
4 years ago
LielinJiang
a94c3cbbf3
register cudnn conv double grad for depthwise conv ( #29807 )
4 years ago
ShenLiang
01e2874a0e
Support multi-stream communication for dynamic graph distributed ( #29525 )
...
* fix fleet for multi-stream
* fix memcpy for ncclid
* use sync to solve move operation
4 years ago