Commit Graph

11092 Commits (8c19d7aa2f89a38b3a68e53c73d88af16a3de8ce)

Author SHA1 Message Date
ronnywang 8c19d7aa2f
[ROCM] fix test_conv2d_transpose_op (#31749)
4 years ago
Ouyang Chao a45c8ca69d
fix bug of DepthwiseConvTransposeGradKernel (#31762)
4 years ago
Jacek Czaja 25fc2a1fdb
[oneDNN] Added Elementwise Mul grad fp32/bf16 (#31647)
4 years ago
Chen Weihang 878e117b6d
[CustomOp] Support float16 in custom op (#31725)
4 years ago
ronnywang c9e1d9dc31
[ROCM] fix test_rnn_op (#31735)
4 years ago
zlsh80826 1c67cf0c98
run radix sort of proposals layer on context stream (#31631)
4 years ago
Chen Weihang e429deb0c4
[CustomOp] Support attribute in infershape function (#31713)
4 years ago
Adam Osewski a4a2b77def
[oneDNN] lookup_table op with support for BF16 data type. (#31558)
4 years ago
zlsh80826 c86e771e94
NMS Performance Optimization (#31634)
4 years ago
zlsh80826 50cafa0b0c
remove redundant sync, set collect/dist kernel to context stream, sub_lod memcpy opt (#31641)
4 years ago
ronnywang 420527f0d9
[ROCM] fix layer_norm, norm, p_norm, test_sequence_softmax_op, test_math_op_patch_var_base (#31709)
4 years ago
Chen Weihang 87852616aa
[CustomOp] Support complex dtype in custom op (#31657)
4 years ago
zlsh80826 fe241fd02f
[Paddle-TRT] gather converter (#31640)
4 years ago
zlsh80826 4ea3427865
[Paddle-TRT] support batch axis concatenation when using dynamic shape (#31627)
4 years ago
Zhang Ting 7f50bb7ec1
support NHWC for temporal_shift op (#31642)
4 years ago
Chen Weihang 2fbe9b097a
[CustomOp] Remove Eigen dependencies of float16 (#31669)
4 years ago
yiak c1b1ccfbf5
Update tinyformat.h (#31612)
4 years ago
ronnywang da10c5cf8b
[ROCM] fix softmax_with_cross_entropy_op, test=develop (#31629)
4 years ago
Chen Weihang 027b574a0e
[CustomOp] Remove the dependence of the underlying data types on eigen (#31602)
4 years ago
WangXi 9066b74f58
c_gen_nccl_id add SocketServer to persit server (#31589)
4 years ago
Kaipeng Deng a32e8bf1e7
DataLoader supprot dict str (#31481)
4 years ago
Chen Weihang 30a627aaf3
Normalized function parameter writing (#31588)
4 years ago
Pei Yang cac9635a67
[Paddle-TRT] Fix engine key in trt int8 calibration (#31513)
4 years ago
Shang Zhizhou 50ac7dbfd0
Trt elementwise plugin serialize (#31587)
4 years ago
whs da9dda5c9b
Make CreateProgramDesc more robust (#31543)
4 years ago
Qi Li 3d5aa9d10a
[ROCM] fix conv2d and conv3d op, test=develop (#31553)
4 years ago
Chen Weihang 95cceb2dd7
[CustomOp] Support duplicable op input and output (#31535)
4 years ago
lidanqing 0f1e7e3d52
[Bug fix] Different machine generate different binary file, remove md5 check (#31482)
4 years ago
jiangcheng 9ed6c895f1
optimize range op by place parameters on cpu rather than gpu, test=develop (#30811)
4 years ago
Thunderbrook 3789a69923
solve bug in heter mode (#31531)
4 years ago
chajchaj 6148b87f9d
add softmax_switch for softmax_with_cross_entropy_op, test=develop (#31428)
4 years ago
WangXi 83a2fb1f08
Add collective async wait op (#31463)
4 years ago
lilong12 0205e9f84e
remove the send/recv of tensor size (#31460)
4 years ago
furnace 910f377fa5
Bugfix rocm (#31490)
4 years ago
Qi Li 416e47edef
[ROCM] fix softmax with loss nan in HIP platform, test=develop (#31491)
4 years ago
Shang Zhizhou f57739be35
fix ernie_varlen when cutting head (#31497)
4 years ago
JamesLim 45c7d90564
Optimization of elementwise CUDA kernel (#30801)
4 years ago
Jacek Czaja 23d96cf221
[oneDNN] bumpup onednn 2.2 fixup version (#31473)
4 years ago
wangguanzhong 43d6abf0a5
update conv2d, test=develop (#31480)
4 years ago
wangguanzhong 50af0c2cbb
fix roi_align, test=develop (#31479)
4 years ago
ronnywang e03e46730c
[ROCM] fix gather_op, sigmoid_cross_entropy_with_logits_op, test=develop (#31467)
4 years ago
Qi Li b85c8e03be
[ROCM] fix reduce op, test=develop (#31478)
4 years ago
Jacek Czaja 39a5424ed1
[oneDNN] elementwise add bf16 grad kernel with broadcasting (#31385)
4 years ago
石晓伟 5f6213217b
update zero_copy_tensor_test.cc for build of gcc485, test=develop (#31470)
4 years ago
Qi Li 133a914bd0
[ROCM] fix test_dist_op ci test, test=develop (#31468)
4 years ago
Qi Li f9377965c4
[ROCM] fix dropout and remove hipcub, test=develop (#31455)
4 years ago
石晓伟 bc7632be73
upgrade inference tensor apis, test=develop (#31402)
4 years ago
JamesLim 8491ae9a02
Creating a CUDA function to find the minimum value in warp or block (#31191)
4 years ago
Pei Yang 30717a6cbc
fix trt serialization on windows (#31438)
4 years ago
Pei Yang 1321c47950
add more info in trt engine serialization (#31434)
4 years ago