Commit Graph

18542 Commits (b48841ba2e7335eaa435a54436ed580d4aef001c)

Author SHA1 Message Date
tianshuo78520a 2e93233899
Add WITH_XPU_BKCL in Kunlun-CI (#30919)
4 years ago
Qi Li 34f1628ce8
[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)
4 years ago
Jacek Czaja 9e527d9956
[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925)
4 years ago
Chengmo c98f144fbc
add truncated gaussian random (#30922)
4 years ago
liuyuhui 4a8b8b4547
[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858)
4 years ago
liym27 39f41cb47f
Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)
4 years ago
liuyuhui bef46ccfc8
[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)
4 years ago
wanghuancoder aab3a3012e
add include for heterbox_trainer.cc, develop=test (#30910)
4 years ago
taixiurong 24873f4f77
dyngraph (#30892)
4 years ago
Adam Osewski 092a2b1413
More UT for LayerNormFuse pass (#30891)
4 years ago
tianshuo78520a a80fe67f84
Change cmake/third_party files for CI (#30833)
4 years ago
Jacek Czaja abfa822650
[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)
4 years ago
joanna.wozna.intel 73cdea01d4
Add bf16 fast performance verification (#30551)
4 years ago
Shang Zhizhou e6095bc2ce
fix split trt plugin initialize (#30875)
4 years ago
WangXi 6e3856d3fb
fix xpu dygraph place (#30868)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
cucuzg ac2e2e6b7f
add clip_by_norm on kunlun, *test=kunlun (#30862)
4 years ago
wawltor b7560a59ab
fix the broadcast for the large second input (#30818)
4 years ago
JamesLim 6e1e036a75
Implement cuda kernel for index_sample. (#30380)
4 years ago
AshburnLee 666efc2336
Call new cudnn batch norm API regardless of data type and data layout (#30157)
4 years ago
QingshuChen 5c8455d6ea
try again if kunlun memory malloc failed (#30855)
4 years ago
石晓伟 2ac4143b6c
support xpu with analysis predictor, test=develop (#30832)
4 years ago
liuyuhui 2cb55eff57
fix WITH_XPU_BKCL in CMakeLists.txt (#30854)
4 years ago
Adam Osewski 4f066e316e
Layer normalization fuse pass. (#30721)
4 years ago
WangXi b1026f64af
【kunlun】dygraph supports multi xpu card training (#30671)
4 years ago
joanna.wozna.intel 04532b8a83
Update Xbyak to v5.81 (#30809)
4 years ago
Shang Zhizhou b909450994
fix trt plugin clone and initialize bugs in TRT7.1+ (#30709)
4 years ago
Wilber b08ae368bb
ci compilation depends on a stable release (#30755)
4 years ago
Thunderbrook cb66c53c2d
dump to cpu (#30750)
4 years ago
Chengmo d3fac0ea85
fix int64 bug (#30780)
4 years ago
Qi Li 69875dc42c
[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758)
4 years ago
QingshuChen c35a9880f9
fix malloc L3 failed bug for kunlun (#30745)
4 years ago
WangXi 31ed9c9eed
Fleet distributed strategy support pure fp16 (#30754)
4 years ago
Zhen Wang 53d01afed6
Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777)
4 years ago
ShenLiang 3858f458ea
rm Singleton of reducer (#30775)
4 years ago
Qi Li f89da4ab45
[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639)
4 years ago
Wojciech Uss fc00240575
A fix for oneDNN matmul kernel. Fixes issue #30309 (#30723)
4 years ago
lidanqing 46989e889b
Fix python3 incompatibility issues (#30698)
4 years ago
alncat 5b59499e57
fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733)
4 years ago
Chengmo 78d37c3f75
【Paddle.Fleet】Fix brpc get hostname (#30703)
4 years ago
taixiurong caf3680bbc
fix bugs in transformer predict in xpu place (#30730)
4 years ago
jakpiase f8da5536ed
REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719)
4 years ago
liuyuhui 67abfc1588
[Kunlun] fix dead lock for exec_op_count_ (#30718)
4 years ago
alncat 5ace20fc3f
modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704)
4 years ago
Tao Luo 824a79d383
Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708)
4 years ago
lilong12 7fbc68a2c0
update, test=develop (#30692)
4 years ago
jakpiase d834f4e6e8
Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)
4 years ago
arlesniak 5bf25d1e8b
More precise mkldnn kernel rules in GetExpectedKernelType (#29840)
4 years ago
Jacek Czaja 173660be7b
[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)
4 years ago
Shang Zhizhou ae0f88a988
add DLA support:C++&&Python api (#30165)
4 years ago
chentianyu03 fb7fbc7a5d
fix abs bug and add abs test case (#30637)
4 years ago
ShenLiang 9514b4aa5f
Fix scatter grad bug (#30604)
4 years ago
Pei Yang cf9bdb9404
extend trt ut timeout threshold (#30537)
4 years ago
Thunderbrook 1bebc09253
solve build gpu task core (#30626)
4 years ago
石晓伟 33bf6eb753
revert external gflags, test=develop (#30623)
4 years ago
Jacek Czaja dfdb0359ea
- Disabling oneDNN inplace pass (#30588)
4 years ago
TTerror 10271ddfc4
support reduce_max op on kunlun (#30581)
4 years ago
QingshuChen 5013c67644
fix softmax bug for multi_card in kunlun (#30600)
4 years ago
wuhuanzhou 7e671c07b6
optimize unity build (#30195)
4 years ago
liuyuhui e5b0d9e1fc
[Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586)
4 years ago
Zhou Wei 9674e440e2
optimize windows CI, clear tp cache,polish code,improve level of msvc log (#30579)
4 years ago
wanghuancoder 90773473a0
use nvtx push pop in timeline (#30567)
4 years ago
chentianyu03 358106fcb0
make abs op support complex types (#30375)
4 years ago
Wilber 2d5758c456
update. (#30585)
4 years ago
Tao Luo 9dd71c74df
disable test_analyzer_detect (#30541)
4 years ago
tangwei12 c9e78a22c5
add trainers for pserver (#30523)
4 years ago
wanghuancoder d1b25ed9d7
add some RecordEvent, for dygraph timeline (#30299)
4 years ago
YUNSHEN XIE bbea5a1fa9
The new unit test cannot have the same name as the existing unit test (#29878)
4 years ago
liym27 ff25c5b36f
Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30536)
4 years ago
WangXi 572c466d19
[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455)
4 years ago
ykkk2333 549855ac20
add rmsprop_op_xpu test=kunlun (#30493)
4 years ago
Zhou Wei fb20ec9a4e
fix bug of multicard grad ncclAllReduce (#30553)
4 years ago
Zhen Wang f30d00553a
Fix the compiling error of update_loss_scaling when using cuda9. (#30538)
4 years ago
Leo Chen 81217a94d8
unify calling cudaSetDevice (#30470)
4 years ago
pangyoki 00554b3f6b
fix error message of Inplace strategy (#30520)
4 years ago
Leo Chen 7043b8cfc6
support layer_norm fp16 in dygraph amp (#30430)
4 years ago
wanghuancoder 59ad6ff3e3
delete empty line of pybing.cc, test=develop (#30529)
4 years ago
hutuxian e207fe6385
Ascend Framework Part2: pybind files (#30410)
4 years ago
hutuxian 40ede12631
Ascend Framework Part1: OP & Wrapper (#30281)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
QingshuChen 8489d4f76f
optimize batch_norm & pool op for kunlun (#30490)
4 years ago
wanghuancoder bd97192274
if pybind.cc changed, generate total report, test=develop (#30514)
4 years ago
taixiurong 5e5c2827a3
fix range op crash in dygraph xpu place (#30469)
4 years ago
JZ-LIANG 16ba0abc79
Recompute Offload: fixed bug in memcpy (#30484)
4 years ago
guofei 11e78ebaa3
Modify the calculation logic of LambOptimizer (#29313)
4 years ago
Adam Osewski c5ffad126c
[oneDNN] Refactor fuse pass helper functions to one place. (#30460)
4 years ago
Zhang Ting c9a334e1b3
add VecCastCUDAKernel (#30296)
4 years ago
pangyoki 13d757362c
Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103)
4 years ago
Yang Zhang 008b0a8b56
Fix float64 bug in layer norm (#30452)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Wojciech Uss 88fc7a7d68
fix cache key for inplaced elementwise ops (#30404)
4 years ago
wawltor 3d49882e2c
fix the rnn mask memory bug for out of read (#30459)
4 years ago
taixiurong 6a3c8725b0
support transformer v2.0 (#30381)
4 years ago
ShenLiang e85be1b1b2
fix flatten api grad (#30426)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
123malin 2a98e9323a
test=develop, add distributed_infer (#30300)
4 years ago
QingshuChen cf786d22ec
fix bug that cann't find mkldnn(kunlun) (#30394)
4 years ago
cc 8e3a294045
skip quantizing ops in cpu inference (#30342)
4 years ago
alncat 7bbf3ac5ab
Added support for inference using quantization aware trained dygraph (#30288)
4 years ago
GaoWei8 180877e988
Softmax backward optimize (#30249)
4 years ago
Zhou Wei b1d8ff45d7
running unit test sigle GPU parallely on Linux/windows GPU (#29523)
4 years ago
Zhang Jun 10a8f3e5c3
fix bug on compiling inference shared lib with crypto;test=develop (#30269)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
JZ-LIANG 75936d838f
Recompute Offload (#30233)
4 years ago
lidanqing a60893f6b5
correct the allowed dimension size (#30326)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
tangwei12 5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
【Paddle.Fleet】Support local save sparse param (#30175)
4 years ago
Double_V 231501fefc
fix elugradgrad test fail & error message opt (#30171)
4 years ago
Zhen Wang fb49ea388e
Fix the accuracy problem of allclose op when using float64 data type in static mode. (#29890)
4 years ago
yaoxuefeng 4656525e24
fix datanorm error msg (#30294)
4 years ago
furnace 77051cc9f0
add fp16 support for tril_triu op (#30186)
4 years ago
石晓伟 efa54629fb
fix header file paths of gflags, commit 3, test=develop (#30273)
4 years ago
Chengmo 5b2c15afcd
Fix server.h include device_context (#30243)
4 years ago
石晓伟 a0ee09148e
enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)
4 years ago
石晓伟 a66eebab5c
fix header file paths of gflags, commit 4, test=develop (#30274)
4 years ago
石晓伟 8c4500ff6d
fix header file paths of gflags, commit 2, test=develop (#30272)
4 years ago
liym27 b4989fb744
Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)
4 years ago
wangchaochaohu 8dcae0c55d
register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
石晓伟 8ce2482b80
fix header file paths of gflags, commit 1, test=develop (#30271)
4 years ago
chentianyu03 c7371b7b20
type promotion for grad (#30177)
4 years ago
liym27 3ce878f309
Check the rank of input in kernel of set_value op (#30147)
4 years ago
WeiXin 66dc4ac77b
modify error message based on comments (#30189)
4 years ago
wawltor fee424411a
just add the op error message for the matmul xpu (#30246)
4 years ago
GaoWei8 0a21924a8d
optimize softmax forward (#30217)
4 years ago
wangchaochaohu af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
4 years ago
zhang wenhui 5932fee60a
enhance error message, test=develop (#30220)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
Zhen Wang 7f7dfccf20
Support pure fp16 training for AMP API. (#29544)
4 years ago
Leo Chen 789743e190
use cuda generator in bernoulli cuda kernel (#30199)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago
Wilber 609c022222
shape op support int8 and uint8 tensor (#30201)
4 years ago
Wilber 01a287bf0a
fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON (#30194)
4 years ago
ruri e42e1e80dc
Add version checking, test=op_version (#30129)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
Wilber ade244948c
disable mkldnn inplace pass on windows (#30164)
4 years ago
joanna.wozna.intel 907262ee15
Fix analysis predictor test (#30191)
4 years ago
lijianshe02 2dc7ee276b
enhance error message of nll_loss op test=develop (#30125)
4 years ago
Huihuang Zheng 54bf3f5a56
Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)
4 years ago
Chen Weihang d0fb06b27f
[Complex] Simplify prepared op impl to improve performance (#30153)
4 years ago
123malin c5b415bfd9
Improve Index select cuda kernel (#30139)
4 years ago
wangchaochaohu 7dd551e08b
refine the paddle place support using str (#28769)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Wilber 91a8a25721
enhance error info for py_func (#30138)
4 years ago
weihaoji b8207af6bc
[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop (#30122)
4 years ago
liuyuhui 15fac5e7fa
fix assign_op_xpu concat_op_xpu warining (#30120)
4 years ago
Jack Zhou f5428eca4f
fix enforce msg of sum xpu op (#30113)
4 years ago
123malin 198fbdfb60
Add Lookahead and ModelAverage Optimizer (#30004)
4 years ago
Leo Chen adac38c506
add dispenable input for core.ops.reshape2/expand/slice (#30072)
4 years ago
ShenLiang becf99d2e8
fix error message (#30135)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
Zhou Wei 9c99d37906
fix unittest failed on windows (#29837)
4 years ago
wangguanzhong 69839f8a9a
fix error message for distribute_fpn_proposals_op (#30116)
4 years ago
QingshuChen 8e1c3ddf15
add aarch64 and sunway kunlun lib (#30027)
4 years ago
Shang Zhizhou 05b27695f1
add inference api: DisableTensorRtOps (#30109)
4 years ago
石晓伟 53bb126510
fix a bug in op_version_registry, test=develop, test=op_version (#29994)
4 years ago
xiemoyuan 3e0c492910
Optimize the error message of framework. (#30134)
4 years ago
liym27 9922bd4125
Fix bug: In dynamic mode, if start or end is negetive, __getitem__ return wrong result(#30003)
4 years ago
chentianyu03 666e665132
change the kron gradient when complex types (#29995)
4 years ago
chentianyu03 a5e422c85d
add trace op_register_version and fix version bug; test=op_version (#30000)
4 years ago
cc 9f34374b48
Fix the formate of raising error in randperm op (#30108)
4 years ago
liuyuhui 254ad61959
fix xpu pe sync, test=notest (#30095)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
hong 297fff1a79
support dygraph in xpu place (#30051)
4 years ago
wangchaochaohu d0a5620575
fix the compiler error when gcc4 cuda9.0 (#29997)
4 years ago
WangXi ee16006b5d
Optimization grad merge performance (#29784)
4 years ago
yongqiangma e891f4da1b
Add p_norm op version info (#30042)
4 years ago
tangwei12 7d1c149e09
for inference checkpoint (#30081)
4 years ago
tangwei12 7d4bdff07d
fix large scale memory (#30035)
4 years ago
Shang Zhizhou 08dc5bc27e
fix op version checker of pass bug (#30028)
4 years ago
cc 68398abce9
[Inference] zero_copy_tensor supports int8_t (#30053)
4 years ago
whs 1b999d2b5d
Add version checking (#30040)
4 years ago
ceci3 85b2f05ab0
register ModifyAttr for instance_norm, test=op_version (#30065)
4 years ago
channings ddcff254db
fix op_register_version for compare ops, test=op_version (#30007)
4 years ago
Wilber 66e16b7e99
update lite subgraph. (#30056)
4 years ago
GaoWei8 a64822589f
add REGISTER_OP_VERSION for LSTM (#30038)
4 years ago
yinhaofeng 6e93fb92f9
Register op version for linspace,test=op_version (#30025)
4 years ago
123malin d0056c324d
test=develop, add op_register_version for roll_op (#30023)
4 years ago
chentianyu03 e012930aa3
complex gradient matmul (#29966)
4 years ago
ShenLiang 893d37e5c6
Fix rank_attention op_version, test=op_version (#30006)
4 years ago
Adam Osewski 13aef97043
operator checkpoints for new attributes. (#29832)
4 years ago
wangguanzhong 844d8e0c2c
add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)
4 years ago
cc c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op (#30016)
4 years ago
chalsliu c053bf2a57
Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"
4 years ago
wawltor cc2f94620c
add the support the op version check for matmul, test=op_version (#30011)
4 years ago
wawltor b33aaea86c
add the op version check for the elementwise ops, test=op_version (#30010)
4 years ago
Chengmo 4cbcc9b6da
fix momentum op register (#29941)
4 years ago
hutuxian 7c1f69bdf0
add op_version for flip op [test=op_version] (#30019)
4 years ago
ceci3 77c1684397
register ModifyAttr for instance_norm, test=op_version (#29938)
4 years ago
Leo Chen 47d10c55d5
Enhance debugging (#30001)
4 years ago
FlyingQianMM d42f93e504
add op_register_version for allclose op; test=op_version (#29968)
4 years ago
wawltor 8f49f9d5c9
change the elementwise ops version check, test=op_version
4 years ago
guofei b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop (#29957)
4 years ago
Thunderbrook 0ca6de171f
add include (#29952)
4 years ago
zhangchunle 631d783748
fix bug in windows ci (#29963)
4 years ago
Pei Yang 6206b9bc71
fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test (#29977)
4 years ago
wangxinxin08 be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)
4 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
4 years ago
Guo Sheng 6ac4f0af6a
Register op version for coalesce_tensor. (#29940)
4 years ago
Chen Weihang a1d9a14e89
support grad accumulated across batch (#29942)
4 years ago
cc 6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911)
4 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
4 years ago
YUNSHEN XIE 121658d251
Support xpu ut coverage (#29892)
4 years ago
Jack Zhou 5a4e42ca9a
add gru op_register_version; test=op_version; (#29931)
4 years ago
Wilber 2b1d796cd0
[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925)
4 years ago
Qi Li 913f77a4b7
Register op version for print, test=op_version (#29945)
4 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
4 years ago
cc 7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version (#29923)
4 years ago
石晓伟 acb5e86363
fix a bug in reset_tensor_array, test=develop (#29620)
4 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
4 years ago
Wilber 332da133a1
Support mips arch (#29903)
4 years ago
LielinJiang eab0b60e16
Register op version for grid_sampler, test=op_version (#29916)
4 years ago
liym27 9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
4 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
4 years ago
Ren Wei (任卫) 95df0e1447
Add the ipipe log param prefix (#29545)
4 years ago
YUNSHEN XIE 2a01756bf3
remove duplicate ut names (#29809)
4 years ago
Chen Weihang a6072055be
[Complex] Handle complex to real after type promotion (#29855)
4 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
4 years ago
taixiurong c7acad9f2f
support some shape for matmul and cast in xpu place (#29900)
4 years ago
Leo Chen 6b258317cb
fix TransferInplaceBack (#29830)
4 years ago
QingshuChen 59b47f3b32
feat: support check_nan_inf for kunlun/xpu device (#29694)
4 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
4 years ago
jakpiase edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772)
4 years ago
Wilber 2c0a4a3470
call_statck is turned on default when ON_INFER=ON (#29798)
4 years ago
Wilber ad0b01ffe2
lod operator should not be reused in memory_optimize pass. (#29828)
4 years ago
liym27 97e75ad0f5
[setitem] Support Tensor setitem in static mode (#29708)
4 years ago
YUNSHEN XIE 24ce051a84
remove duplicate ut reload (#29810)
4 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
4 years ago
Thunderbrook 09b6e71928
heter box (#29734)
4 years ago
Jacek Czaja 7b33720c90
[oneDNN] Tensor copy fix to oneDNN tensors (#29771)
4 years ago
123malin a400b76db7
Roll cuda kernel (#29655)
4 years ago
wuhuanzhou e7ac74c85b
optimize compilation time of argmin/argmax op (#29595)
4 years ago
Zhou Wei 3f83ec61c2
move running unittest on windows to another file (#29815)
4 years ago
chentianyu03 ddfc3d2c2f
change grad elementwise_mul for complex types (#29757)
4 years ago
chentianyu03 2a260d9b0e
change the grad of div when complex types (#29804)
4 years ago
ShenLiang f65f1caad3
opt sparse allreduce using ncclgather (#29819)
4 years ago
TTerror 82aa01c373
add nearest_interp_v2 on kunlun (#29725)
4 years ago
wangchaochaohu 01c37c8e02
refine the compiler error for half2 operation (#29816)
4 years ago
whs 82630408b4
Support double backward rsqrt (#29589)
4 years ago
Zhang Ting b76f5a8489
fix the bug of dropout_grad (#29813)
4 years ago
LielinJiang a94c3cbbf3
register cudnn conv double grad for depthwise conv (#29807)
4 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
4 years ago