Leo Chen
9f926eb720
Layernorm opt ( #29522 )
...
* layernorm fw opt
* layernorm bw opt
* fix typo, test=develop
* remove const dim3 for windows CI compatibility
* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
5 years ago
arlesniak
b781953ef5
[oneDNN] Fix flags use test for #29080 , assert condition more general ( #29493 )
...
* Flags assert condition more general, print output if pattern not found
* removed test_flags_use_mkldnn from skip list regarding #29080 descr
5 years ago
tangwei12
ae3f7a7100
add ps table ( #29463 )
...
* add ps table
Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
5 years ago
chalsliu
36ec9456cf
Make PADDLE_ROOT as an environment variable
5 years ago
ShenLiang
d8391a1983
fix error message of gather nd ( #29521 )
5 years ago
Zhen Wang
5ac71b36fb
Remove tensor copy in the update_loss_scaling op. ( #29426 )
...
* remove tensor copy in the update_loss_scaling op
* not use thrust.
* fix some cuda memory access error.
5 years ago
Zhou Wei
e74e1a226c
support deepcopy for Layer/Tensor/ParamBase ( #29387 )
...
* support deepcopy for Layer/Tensor/ParamBase
* fix some code
5 years ago
joejiong
87e75a77c2
Add tangent operator ( #29207 )
...
As the title
5 years ago
zlsh80826
95e334810a
Softmax vectorization ( #29404 )
...
* vec softmax fw
* vec softmax bw
* add a message argument for compiler compatibility
5 years ago
wanghuancoder
a136c9cdb8
fix incremental coverage script bug, WITH_INCREMENTAL_COVERAGE to DWITH_INCREMENTAL_COVERAGE, test=develop ( #29509 )
5 years ago
Aurelius84
966aa0e387
Fix random failure of test_mobile_net on Windows GPU ( #29480 )
5 years ago
ShenLiang
2ef9e0e23c
Rebuild group automatically in dynamic graph distributed ( #29255 )
...
* add tensor_indices in AssignGroupBySize
* add rebuild group in reducer
5 years ago
procr
3a0558339d
support mobilenet for kunlun ( #29458 )
5 years ago
Huihuang Zheng
a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry ( #29442 )
...
Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is an attempt to avoid random GPU allocation failures at initialization in unit tests.
5 years ago
Leo Chen
e5e522493d
make gelu fp16 computing more robust ( #29484 )
5 years ago
LoveAn
8094ac686e
Print ccache/clcache hit rate ( #29341 )
...
* test ccache hit statistics, test=develop
* add cache hit statistics, test=develop
* fix no percent symbol error on windows, test=develop
* remove switch, test=develop
5 years ago
Zhang Ting
560b432349
Revert "improve elementwise_add_grad perf ( #29277 )" ( #29464 )
...
This reverts commit befd6d5338.
5 years ago
jakpiase
57a4f16d9e
added internal and external reorders to profiler ( #29443 )
...
* added external reorder to profiler
* added external and internal reorders to profiler
* added internal and external reorder to profiler
* added formatting to int/ext reorder commit
* removed unnecessary comment
5 years ago
Pei Yang
2480bdef6c
change hard_swish from plugin to layer ( #29177 )
...
* change hard_swish from plugin to layer
* add ut when threshold != scale
5 years ago
taixiurong
ecca6585cd
1. fix elementwise ops' bug 2. fix softmax_with_cross_entropy_op 3. add bilinear_interp_op ( #29448 )
...
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>
5 years ago
LoveAn
03b42d9fa7
fix unittest on windows, test=develop ( #29365 )
5 years ago
TTerror
a5fcc4b545
update reduce_sum op on xpu ( #29367 )
...
* update reduce_sum op on xpu
* update reduce_sum op on xpu
* support running on xpu
5 years ago
Jack Zhou
c7cada8571
Fix gru performance decline in 1.8.5 ( #29455 )
5 years ago
Zhang Ting
6296f4ed09
revert cast eigen kernel ( #29427 )
5 years ago
Leo Chen
a040c055a5
fix layer_norm accuracy ( #29434 )
5 years ago
Zhou Wei
24ba9ed436
fix that parameters' grad has grad var ( #29408 )
5 years ago
Leo Chen
4e19ce1df5
refine reshape grad and double grad kernel, use tensor copy async ( #29128 )
5 years ago
Shang Zhizhou
225a9c4ed8
Fix unittest ( #29412 )
...
* fix tensorrt unittest precision error
* fix unittest precision error. test_trt_subgraph_pass && test_trt_dynamic_shape_transformer_prune
5 years ago
Pei Yang
f860de4af7
support clip op trt converter ( #29411 )
5 years ago
Jack Zhou
1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 ( #29406 )
5 years ago
LoveAn
671555ed32
Compiling operator libraries with Unity build ( #29130 )
...
* Compiling operator libraries with Unity Build on Windows CPU.
* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
* Add option in windows ci script, no_test, test=windows_ci
* Optimize parallel compiling, test=develop
* remove limit of parallel compile and skip some ops in UB, test=develop
* remove changes of header file, test=develop
* remove changes of header file, test=develop
* fix test_eye_op unittest failed, test=develop
* Compiling operator libraries with Unity Build on Linux, test=develop
* set default WITH_UNITY_BUILD=OFF, test=develop
* Move unity build rules into a single file and add comment, test=develop
* optimize parallel compilation, test=develop
* fix undefined reference error on coverage ci, test=develop
5 years ago
Zhou Wei
5c9bd0bf7c
print whether has build cache ( #29035 )
5 years ago
cc
a623ce044f
Use different name_scope for different conv type, test=develop ( #29355 )
5 years ago
yongqiangma
7c508d8668
update unbind norm add CUDAPlace api doc information ( #29322 )
...
* enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop
* fix format. test=develop
* format fix. test=develop
* add lod_rank_table. test=develop
* fix format. test=develop
* fix doc info. test=develop
* fix np error
* add unbind dygraph api. test=develop
* fix unbind doc.test=develop
5 years ago
chentianyu03
879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type ( #29321 )
...
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug of imag part of complex not initialized
* format file
* format code style
* kron support type promotion; modify test cases
5 years ago
卖鱼的哲学
074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu ( #29280 )
...
* fix expand && concat/transpose to new api
* update uniform_random_op
* update xpu_header
5 years ago
lilong12
1decf4ada6
update, test=develop ( #29331 )
5 years ago
QingshuChen
74bf3bed36
support global pooling for kunlun ( #29293 )
...
* test=kunlun
5 years ago
liym27
b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows ( #29267 )
5 years ago
Chen Weihang
9ad800ebb2
Support type promote for basic math ops (quantum required) ( #29265 )
...
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
5 years ago
tangwei12
8358791607
fix gpu outofrange ( #29238 )
...
* fix gpu emb out of range
Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf
* fix doc
Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf
5 years ago
Leo Chen
b58cfff89d
use has_grad instead of train_mode ( #29309 )
...
* use has_grad instead of train_mode
* add vlog for debug
* fix ut
* fix ut
5 years ago
Zhang Ting
befd6d5338
improve elementwise_add_grad perf ( #29277 )
...
* improve performance of elementwise_sum_grad
5 years ago
Shang Zhizhou
ebf689197d
fix tensorrt output shape error ( #29308 )
...
* fix tensorrt output shape error
* fix unittest tensorrt_engine_op_test
* fix code style for unittest
5 years ago
Aurelius84
67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op ( #28421 )
5 years ago
ShenLiang
696dc4bb13
fix the warning of reducer ( #29323 )
5 years ago
wangchaochaohu
c4be80f402
polish the code of cumsum and remove some unused code ( #29303 )
5 years ago
ShenLiang
c00af94435
fix matmulv2 for windows ( #29302 )
5 years ago
wanghuancoder
3765da98c7
add coverage incremental switch, test=develop ( #29290 )
5 years ago
Wilber
d68af02c04
fix analysis_config bug. ( #29304 )
5 years ago
ShenLiang
0fb18bc214
enforce the matmul_v2 error message ( #29297 )
5 years ago
Zhen Wang
9b59a589b1
Remove some useless log. ( #29300 )
5 years ago
Leo Chen
13a22a3752
fix shape of tile_grad op ( #29289 )
5 years ago
Zhen Wang
be3777a50a
Add pure fp16 training with master weights. ( #27712 )
...
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial value of master weights are same with the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
5 years ago
Wojciech Uss
6673fb0565
change import math.h to cmath ( #29260 )
5 years ago
furnace
7584bb5096
Layer norm fp16 ( #29169 )
...
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
5 years ago
Shang Zhizhou
c59b4f28a2
fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF ( #29275 )
5 years ago
Shang Zhizhou
fc80d2e09c
add compile option WITH_TENSORRT ( #29208 )
...
* add compile option WITH_TENSORRT
* add WITH_TENSORRT to ci paddle_build.sh
* add WITH_TENSORRT to paddle_build.sh
* change FATAL to WARNING when TensorRT is not found and WITH_TENSORRT=ON, just to pass ci-py3 temporarily
5 years ago
Leo Chen
116305ea4b
Improve performance of elementwise_add grad op ( #29187 )
...
* pass stop_gradient for cast op
* improve performance of elementwise_add grad
* use tensor copy async
* dygraph branch
* fix dygraph branch
* add ut
5 years ago
卖鱼的哲学
07c67d5a8b
add deformable_conv op on xpu ( #29234 )
...
* rebase develop
* update deformable_conv op on xpu
* update deformable_conv op on xpu
5 years ago
Chen Weihang
1de32f823d
Hot fix compile failed in gcc4.8 caused by complex impl ( #29254 )
...
* hot fix compile failed in gcc4.8
* fix failed unittest
5 years ago
GeminiCarrie
642abe2a48
Fix a bug when running on an operating system without "bash." ( #29131 )
...
* Fix a bug when running on an operating system without "bash."
* add execution condition
* for ci-coverage
5 years ago
ShenLiang
46b73e6cd9
Change the api of DataParallel and Fleet ( #29224 )
5 years ago
QingshuChen
64f29fbb70
update kunlun conv2d/softmax/elementwise implementation ( #29229 )
...
* update conv2d & softmax to new xpu api
* test=kunlun
* remove useless comments
* test=kunlun
* remote softmax xpu op
* test=kunlun
* update kunlun softmax
* test=kunlun
* update xpu unittest
* test=kunlun
* fix elementwise_grad bug for kunlun
*test=kunlun
5 years ago
chentianyu03
8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… ( #29199 )
...
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
5 years ago
Zhou Wei
c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept ( #28429 )
...
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unittest
* empty tensor doesn't need inner_var_
* fix some error message
5 years ago
Wilber
74c43ac638
fix lite unit test. ( #29233 )
5 years ago
Adam Osewski
4096ff94dc
Small optimizations for conv2d kernel subroutines. ( #29188 )
...
- Make sure that oneDNN memory descriptors are created only once at
first iteration.
5 years ago
joanna.wozna.intel
5c61eeef61
Enable all image classification models ( #29155 )
5 years ago
Wilber
4fec182d24
[Lite-Subgraph] Fix compile error for lite subgraph. ( #29146 )
5 years ago
123malin
b5c6342336
Update ps gpu ( #29209 )
...
* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>
5 years ago
liym27
865a45984f
Check whether there is any inplace operation affecting gradient calculation. ( #27901 )
...
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
5 years ago
chen zhiyu
4056c4f11c
Add unittest in musl build ( #29099 )
...
* add musl docker build script
* rm space test=document_fix
* fix some docs and types errors test=document_fix
* move install of python requirement to docker build
* add copyright to docker file.
* add extra opts
* format docs
* add ut test add pip cache
* add more args description in readme
* add stack backtrace in ctest
* fix readme bugs
5 years ago
123malin
03d4665f44
prefetch optimize ( #29095 )
...
* test=develop, optimize async prefetch
5 years ago
WangXi
0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute ( #28957 )
5 years ago
Chen Weihang
0b032faeee
Polish unittests details and execution conditions to adapt to MUSL ( #29044 )
...
* fix failed tests in yingchun given list
* add unittests into static_mode_white_list
* add enable static
* fix dist unittest
* skip test_sigmoid_focal_loss_op & add gym
* revert no need skip unittests
* remove gym
5 years ago
123malin
92817f8005
test=develop, rm pathlib ( #28658 )
...
* test=develop, rm pathlib
5 years ago
Wojciech Uss
4fd4095d1b
Add quantization of multi_gru op and tests ( #28615 )
5 years ago
Jack Zhou
bc6033f86b
fix gru gcc7.4 bug for the gru compile
...
fix gru gcc7.4 bug for the gru compile
5 years ago
wanghuancoder
0239f79695
Generate code coverage reports only for incremental files ( #28508 )
...
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* Generate code coverage reports only for incremental files, test=develop
* test for diff python file, test=develop
* fix no python diff report, test=develop
* add cc test file, test=develop
* fix bug in generic.cmake, test=develop
* for debug no cc report, test=develp
* modify compare branch from test_pr to test, test=develop
* fix bug, test=develop
* test for h file changed, test=develop
* debug for redefinition of argument optimize error, test=develop
* close -o3 for test, test=develop
* remove -o3 for test, test=develop
* remove coverage option for nvcc, test=develop
* use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop
* reopen -o3, test=develop
* remove debug code, test=develop
* remove unused code, test=develop
5 years ago
wangchaochaohu
b818429ae7
optimize cumsum OP ( #29193 )
5 years ago
ShenLiang
e2d01eb650
Support dynamic graph distributed ( #28997 )
...
* add reducer
* refine event for memorycopy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and ddp initialize problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
5 years ago
lilong12
7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. ( #29020 )
...
* update, test=develop
5 years ago
pangyoki
7c8ac064c8
Delete prettytable in condabuild ( #29145 )
...
* update conda_build script with removing opencv
* modified filepath
* modified some content
* Delete Commented-Out Code
* delete prettytable in conda_build
Co-authored-by: XieYunshen <1084314248@qq.com>
5 years ago
Zhou Wei
e668cb07fb
fix CUDA 11 error on windows ( #29101 )
5 years ago
Jack Zhou
085260f3de
Add eigen gru and fix the dropout bug in the rnn
...
Add eigen gru and fix the dropout bug in the rnn
5 years ago
yaoxuefeng
545df287fc
add user_define_dump ( #28596 )
5 years ago
Aurelius84
71815637cc
Move gym into unittest/requirements.txt ( #29149 )
5 years ago
arlesniak
bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes ( #28988 )
5 years ago
Shang Zhizhou
b9e76a0103
detect tensorRT plugin fp16 in runtime ( #27933 )
...
* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
* compile with cuda9
* add some unittest
* notest;test=coverage
* add unittest for trt plugin swish && split
* update ernie unittest
* fix some error message
* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
* fix compile error when CUDA_ARCH_NAME < Pascal
* fix compile error
* update unittest timeout
* compile with cuda9
* update error msg
* fix code style
* add some comments
* add define IF_CUDA_ARCH_SUPPORT_FP16
* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
5 years ago
Leo Chen
fd3fcb051a
fix typo of flag name ( #29154 )
5 years ago
Noel
da71173bc9
Fix ops doc for some ops
...
Fix ops doc for some ops
5 years ago
Leo Chen
770395cb93
Split train_mode and has_grad for tracer ( #29064 )
...
* split train_mode and has_grad
* fix format
* fix ci problems
* fix sample code
5 years ago
Aurelius84
7ae3cb554a
Polish CUDA Information stdout ( #29109 )
5 years ago
chalsliu
7a15e64034
Support precision test for new ut
5 years ago
WangXi
173c22aec2
optimize fast graph executor ( #28962 )
5 years ago
Shang Zhizhou
562ded1041
fix unittest trt_dynamic_shape_transformer_prune_test error ( #29122 )
5 years ago
Shibo Tao
db41258501
add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. ( #29034 )
5 years ago
joanna.wozna.intel
b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests ( #29039 )
...
* Add bf16 pool2d and unify bf16 unit tests
* Add change default ops test
5 years ago
joanna.wozna.intel
fddea67445
Fix cpu_bfloat16_pass ( #28730 )
...
* Fix cpu_bfloat16_pass
* Add output_format
* Fix incorrect SetOutput
* Change formatting
5 years ago
Qi Li
2fd16cf6fc
fix win ci failure, test=develop ( #29089 )
...
* fix win ci failure, test=develop
* add ci test, test=develop
5 years ago
Chen Weihang
fea0e294ee
Hide the C++ stack by default and add hints ( #29042 )
...
* default not show cpp stack & add hint
* fix failed unittest
* fix failed unittests
5 years ago
Chen Weihang
b1274ac3d6
set show cpp stack by default, test=document_fix ( #29102 )
5 years ago
joejiong
582c0a0468
add uint8 for reshape op ( #28996 )
...
add uint8 for reshape operator
5 years ago
Zhou Wei
8ca0a8a859
fix tensor detach to zero copy ( #27921 )
...
* fix tensor detach to zero copy
* fix tensor detach to zero copy
5 years ago
Aurelius84
8af0d85ea4
fix unittest failed on windows GPU ( #29072 )
5 years ago
taixiurong
a5aa4dc7a9
add xpu elementwise ops ( #29031 )
5 years ago
joejiong
b04c78ef5e
Update pow ( #29000 )
...
Simple code clean up
5 years ago
wawltor
b2c8a00745
remove eigen threadpool for the speed up
...
remove eigen threadpool for the speed up
5 years ago
Wojciech Uss
7b5a8e46de
Add multi_gru_fuse_pass and tests ( #28601 )
...
* Add multi_gru_fuse_pass and tests
* fix date
* cleaned up headers
5 years ago
LoveAn
c91bb084f4
Add op benchmark ci pipeline in Paddle repo ( #28692 )
5 years ago
Zhou Wei
5e26a15484
Open GPU unittest on windows ( #29003 )
...
* open unittests on windows
* open GPU unittest on windows
5 years ago
Leo Chen
3815d7aa40
Upgrade string literals to raw string ( #28989 )
...
* upgrade comment string to raw string
* fix string in
* fix string with ' '
* revert update on comments
* upgrade only necessary
* fix sample code checker
* fix comments with '''
5 years ago
lilong12
767d0ba267
update, test=develop ( #28700 )
5 years ago
Wojciech Uss
991345b368
Add multi_gru_seq_fuse_pass and tests ( #28604 )
...
* Add multi_gru_seq_fuse_pass and tests
* fix date
* removed unused functions
5 years ago
123malin
fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode ( #28442 )
...
* test=develop, optimize global_step
5 years ago
lilong12
f77a78cdee
enable pipeline to run with Executor.run() ( #28373 )
...
* update, test=develop
5 years ago
Thunderbrook
0073f9bdb0
support ps-gpu ( #28752 )
...
* ps gpu transpile
* ps gpu
* remove op
* gps trainer
* local ps
* add macro
* HeterBox
* def cuda
* tab
* code style
* style
Co-authored-by: Thunderbrook <a754913769#163.com>
5 years ago
Chen Weihang
768dab441e
polish two api doc detail, test=document_fix ( #28971 )
5 years ago
furnace
8ff3550658
refactor momentum op to combine weight ( #27414 )
...
* refactor momentum op to combine weight_decay (scale op and sum op)
5 years ago
Jacek Czaja
bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor ( #28758 )
5 years ago
chen zhiyu
3d0ff8eebc
optimize musl docker build script ( #28974 )
...
* add musl docker build script
* rm space test=document_fix
* fix some docs and types errors test=document_fix
* move install of python requirement to docker build
* add copyright to docker file.
* add extra opts
* format docs
5 years ago
Pei Yang
994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode ( #28702 )
...
* change avg pooling and global pooling to trt layer
* add support for static shape global pooling
* modify trt errmsg
5 years ago
yaoxuefeng
71c1cd1408
fix truncated_gaussian seed ( #28777 )
5 years ago
HappyAngel
de528981e5
fix paddlepredictor build error. test=develop ( #28792 )
5 years ago
Wilber
a22ea652cf
fix trt delete_pass bug. ( #28763 )
5 years ago
gongweibao
1dad8ceaab
Fix gpu memory allocation bug. ( #28703 )
5 years ago
Chen Weihang
b969c32ab1
fix occupied 0 device memory bug ( #28771 )
5 years ago
joejiong
1a532d5133
add uint8 support for squeeze operator ( #28734 )
...
Adding uint8 support for squeeze operator.
5 years ago
wangchaochaohu
8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode ( #28694 )
5 years ago
joanna.wozna.intel
8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul ( #28729 )
...
* Add bf16 matmul, fc, elementwise add and mul
* Correct unit test
5 years ago
Wojciech Uss
efc3b182f0
a fix for the fc_lstm_fuse_pass ( #28709 )
5 years ago
Zhou Wei
3b0dd5f620
fix bug that to_tensor not support paddle.Place ( #28717 )
5 years ago
yaoxuefeng
08b62f4902
fix shuffle batch op shuffle ( #28533 )
5 years ago
taixiurong
d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun ( #28542 )
...
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
5 years ago
Jack Zhou
9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel ( #28577 )
...
* add lstm, simple rnn op kernel
* fix the test_lstm for the rnn op
* change func name
* fix forward postprocess bug
* add gru forward, backward code
* remove unittest.skipIf; use a big rnn op instead of combination op
* fix input doesn't have gradient bug
* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
5 years ago
QingshuChen
30ef3815b3
adjust kunlun header file ( #28536 )
...
* adjust kunlun header file
*test=kunlun
* update kunlun unittest
*test=kunlun
* update xpu unittest
* test = kunlun
* update xpu unittest
* test=kunlun
* update xpu unittest
* test=kunlun
5 years ago
Zhang Ting
dab4920568
improve performance of cast op ( #28727 )
5 years ago
Zhou Wei
3a88acd2ee
open unittests on windows ( #28750 )
5 years ago
yaoxuefeng
03f46e3526
fix truncated_gaussian op cuda seed setting ( #28678 )
5 years ago
Wilber
04cefeacc5
Disable windows gpu static lib. ( #28741 )
5 years ago
Wojciech Uss
04bcc13fac
Add multi_gru op and tests ( #28591 )
...
* Add multi_gru op and tests
* removed redundant disable_dygraph()
5 years ago
wanghuancoder
5aec7dbeb0
use forward declarations for framework.pb.h ( #28494 )
...
* use forward declarations for framework.pb.h, test=develop
* use forward declarations for framework.pb.h, test=develop
5 years ago
iducn
f1074e3b19
hide the token output for safety ( #28716 )
5 years ago
joejiong
32b90b1c2d
add log10 ( #28576 )
...
Add new operator log10
5 years ago
Leo Chen
3d09929b1f
Add check for non-dispensable input ( #28666 )
...
* Add check for non-dispensable input
* fix typo
5 years ago
Chen Weihang
7eeb99fe02
Add basic hook classes for dygraph & implement reduce hook ( #28584 )
...
* add base hook classes and reduce hook impl
* fix constructor typo
* polish comment format
* refactor basic hook class design
* polish design details
5 years ago
Guo Sheng
858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. ( #28561 )
...
test=develop
5 years ago
Jacek Czaja
6d8d3d4c22
[oneDNN] Layer norm bf16 kernel ( #28619 )
5 years ago
lilong12
80d2024644
bug fix, test=develop ( #28674 )
5 years ago
Zhou Wei
bf143652ac
fix lstm OP compile error on windows ( #28667 )
...
* add unittest and check unittest for windows
* fix lstm OP compile error on windows
5 years ago
石晓伟
57dab959ca
add datanorm op new scale_w register ( #28657 )
...
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
5 years ago
cc
65aac81191
Fix fake_quant error when cout > 1024, test=develop ( #28603 )
5 years ago
lilong12
b2f7ab6636
bug fix, test=develop ( #28648 )
5 years ago
wawltor
8f2656ef5c
fix the gradient bug for the topk v2
...
fix the gradient bug for the topk v2
5 years ago
wangchaochaohu
a972c33fd7
refine gather OP performance for dynamic mode ( #28587 )
5 years ago
joanna.wozna.intel
2cb71c0cde
Add checkpoint to quantize ( #28612 )
...
* Add checkpoint to quantize
* Change bfloat16 option
5 years ago
lidanqing
804271cff9
Op version python mkldnn_inplace test ( #28354 )
...
* add mkldnn inplace op version test
* update mkldnn_inplace fuse pass
* update the inplace test
5 years ago
pangyoki
b889a0cee2
add gaussian_random op_version ( #28602 )
5 years ago
YUNSHEN XIE
cf2c42a937
fix exec nightly error on mac ( #28567 )
5 years ago
Guo Sheng
110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy ( #28622 )
...
* Fix gradients with ignore_idx in softmax_with_cross_entropy.
test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu.
Remove softmax_with_cross_entropy from op_threshold_white_list.
test=develop
* Fix test_softmax_cross_entropy_op.py.
test=develop
5 years ago
Wilber
8b97bb2e1f
Update cmake for arm ft and fix a bug for Predictor dtor. ( #28586 )
5 years ago
Leo Chen
f962bd3432
Fix cudnn workspace limit in cudnn-8 ( #28611 )
5 years ago
Leo Chen
90805e2df7
Register op_version for new attribute use_addto ( #28463 )
...
* register op_version for addto
* upgrade pass capability
* change eq to le
* change eq to le
* fix merge
5 years ago
danleifeng
a24d186814
fix nccl init failed in parallel dygraph mode ( #28497 )
5 years ago
Zhou Wei
93c39779b4
open a part of GPU unittest for windows ( #28378 )
...
* open a part of GPU unittest for windows
* open a part of GPU unittest for windows
5 years ago
lilong12
ed9dd7c9f0
add send and recv ops ( #28590 )
...
* update, test=develop
5 years ago
Zhong Hui
a829357e4d
register the op version for some ops
...
register the op version for some ops
5 years ago
Zhou Wei
bf6e7cba7a
update 2.0 API English doc ( #28525 )
...
* make NumPy version below 1.19.3
* fix 2.0 doc
5 years ago
YUNSHEN XIE
7b1619e69b
disable test_trt_dynamic_shape_transformer_prune,test=document_fix ( #28588 )
5 years ago
Zhou Wei
849467b5aa
fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks ( #28547 )
5 years ago
Shang Zhizhou
8699f38d08
TRT support for pruned transformer models; fix the bug that TensorRT does not support DeletePass ( #28517 )
...
* skip_layernorm_op done
* add unittest
* slice op converter support trt < 6
* skip_layernorm only work in ernie
5 years ago
joejiong
08d2413142
add log2 operator ( #28319 )
...
As the title
5 years ago
lidanqing
0fc181dbd0
[Fix bug] If the pass name is not found, IsCompatible should return false ( #28475 )
5 years ago
Wilber
1bf4836580
[Inference] Add TryShrinkMemory interface. ( #28409 )
5 years ago
wangchaochaohu
c52fe48f6f
fix the GetKernelTypeForVar of input for fluid.gather ( #28534 )
5 years ago
wangchaochaohu
d7cfee9b31
Checkout point add ( #28488 )
...
* upgrade pass capability
5 years ago
YUNSHEN XIE
98dc11bb6a
add monitoring for executive ut at night ( #28377 )
...
* add monitoring for executive ut at night
* fix some error for paddle_build.bat
* fix some error
* fix some error in windows
* fix some error on windows
5 years ago
Pei Yang
75196cda40
Paddle-TRT int8 support mul op channelwise quant ( #28422 )
...
* paddle-trt support mul channelwise quant
* add support for depthwise_conv2d
* add errmsg for unsupported op type
5 years ago
zhupengyang
47cbf61dd4
fix softmax unittest float16 random error ( #28480 )
5 years ago
Zhou Wei
53e9aa948d
remove diff with develop ( #28504 )
5 years ago
YUNSHEN XIE
369605be1d
fix cmake error when execute build_inference_lib ( #28503 )
5 years ago
Wilber
645e999afc
fix api_impl test. ( #28483 )
5 years ago
YUNSHEN XIE
1e698c600e
fix cmake error when setting ut timeout property ( #28492 )
5 years ago
wangchaochaohu
e14ed71cc2
refine the performance of gather Op ( #28458 )
5 years ago
wanghuancoder
e29ab5eacb
clear clcache cache file and reopen clcache ( #28384 )
...
* clear clcache cache file and reopen clcache, test=develop
* reopen clcache, test=develop
5 years ago
YUNSHEN XIE
ba0756325a
exec ut no more than 15s 1 ( #28439 )
...
* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
* test for limiting ut exec time as 15S
* fix an error caused by cannot find ut
* fix some error
* can not find test_transformer
* fix error caused by ut not run in windows
* fix error caused by Compiler Options
* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
* setting timeout value to 120s for old ut
* add the timeout value setting
* fix error caused by ut only run in coverage_ci
* add analyzer_transformer_profile_tester
* fix some error
* fix some error
* fix error with inference option
* fix error with inference option setting as ON_INFER
* add some ut to set timeout
* modified some option
* fix error
* fix some timeout error
* fix error
* fix error
* fix timeout for test_analyzer_bfloat16_resnet50
* fix error
* setting timeout property for some ut
* first pr for new ut timeout as 15S
5 years ago
Chen Weihang
155b4f9b6c
Remove selected rows all reduce over height check ( #28460 )
...
* remove selected rows all reduce over height check
* polish unittest
5 years ago
taixiurong
fad4744aa4
fix crash in adam in xpu, *test=kunlun ( #28433 )
5 years ago
QingshuChen
6bba8e57b1
fix batch_norm_xpu bug & remove xpusimulator dependence ( #28430 )
...
*test=kunlun
5 years ago
Wilber
ced5c40c41
Update memory release interface. ( #28456 )
5 years ago
joanna.wozna.intel
7821759d48
Add bfloat16 softmax and gelu ( #28394 )
...
* Add bfloat16 softmax and gelu
* Add pass attr bfloat16_enabled_op_types
* Changes from review
5 years ago
iducn
ba0fe0a812
revert the modified shell script ( #28453 )
5 years ago
Chen Weihang
c42e656179
Add retry for dygraph parallel socket bind ( #28404 )
...
* add retry for dygraph parallel socket bind
* change to loop always
* fix writing error
5 years ago
石晓伟
c41fd033e5
check op_version_registry in CI test, test=develop ( #28402 )
5 years ago
Jacek Czaja
ca41541472
[oneDNN]Sum bf16 kernel ( #28382 )
...
* - Added sum bf16 oneDNN
test=develop
* - Fix to UT of sum bf16
test=develop
5 years ago
Chen Weihang
23439b1688
show cpp stack when catch signal ( #28415 )
5 years ago
Leo Chen
44a476c2ab
support cuda pinned place ( #28416 )
5 years ago
lidanqing
12b9587be5
Add conv_bias pass version python test ( #28278 )
...
* add conv_bias pass version test
* update according to reviews
5 years ago
Wilber
05114693cf
[Inference] Memory modification for ShrinkMemory. ( #28355 )
5 years ago