phlrain
4b9689379f
fix cudnn lstm; test=develop
7 years ago
phlrain
d1a17cadd4
fix cudnn rnn; test=develop
7 years ago
JiabinYang
4124253796
add mac ci check on import, test=develop
7 years ago
Qiao Longfei
9450048acb
add PADDLE_ENABLE_REMOTE_PREFETCH to enable remote prefetch
...
test=develop
7 years ago
Xin Pan
75939c2059
fix
...
test=develop
7 years ago
Tao Luo
20120d9c97
Merge pull request #14608 from jczaja/prv-conv2d-transpose-mkldnn
...
[MKL-DNN]conv2d transpose
7 years ago
Qiao Longfei
3e45a5a5ec
lookup_table gpu kernel support prefetch
...
test=develop
7 years ago
Zhaolong Xing
d215293c92
Merge pull request #14649 from NHZlX/add_params_sync_pass
...
Add params sync pass
7 years ago
Qiyang Min
055da6e00d
Merge pull request #14656 from velconia/disable_dist_transpiler_ut_in_mac
...
Change pip to correct version when install wheel package
7 years ago
qingqing01
731d45a39a
Enable BatchNorm to use global mean and variance during training ( #14630 )
...
* Enable BatchNorm to use global mean and variance during training
* Update doc and follow comments.
7 years ago
nhzlx
49c28b8c52
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_params_sync_pass
...
test=develop
7 years ago
nhzlx
3c83a2f720
fix comments
7 years ago
Xin Pan
ad6ed5b745
fix py3
...
test=develop
7 years ago
Xin Pan
0cc9ab3dc2
enable API check for readers
...
test=develop
7 years ago
luotao1
4a4daa8ab4
Merge branch 'develop' into has_attr
7 years ago
Qiao Longfei
75eba6108d
Add scope doc ( #14582 )
...
* add doc for scope
* update doc for force_init_on_cpu
test=develop
* follow comment test=develop
* update format test=develop
7 years ago
Tao Luo
ea47685f91
Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum
...
Softmax for inference MKL further changes
7 years ago
Qiao Longfei
3a3cfc2d8d
prefetch support gpu
...
test=develop
7 years ago
minqiyang
fe0dee88d8
Change pip version to correct version when install wheel package
...
test=develop
7 years ago
baojun-nervana
d5ee05e6c3
Replaced VarIsTensor
...
test=develop
7 years ago
baojun-nervana
e6bd53be60
Named to RuntimeInferShape
...
test=develop
7 years ago
Sang Ik Lee
24e70920db
Refactor some build settings.
...
test=develop
7 years ago
baojun-nervana
a29696146c
Added annotation
...
test=develop
7 years ago
Sang Ik Lee
d6125a5eec
Include ngraph in inference demo build.
...
test=develop
7 years ago
baojun-nervana
caf4b937b3
Added RunInferShape
...
test=develop
7 years ago
baojun-nervana
1d19eb2bd4
Implemented ngraph engine
...
test=develop
7 years ago
Qiao Longfei
4b9082a4cd
follow comment
7 years ago
Tao Luo
b4de023ee1
Merge pull request #14636 from Superjomn/fix/word2vec
...
fix word2vec bug
7 years ago
luotao1
fe915901cd
update Opdesc's HasAttr
...
test=develop
7 years ago
chengduo
6776e92846
refine tensor_array_write_read ( #14643 )
...
test=develop
7 years ago
nhzlx
d3e140a572
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_params_sync_pass
...
test=develop
7 years ago
nhzlx
d666c8eb1d
fix benchmark
7 years ago
nhzlx
900fbb83f9
add params sync pass
7 years ago
superjomn
9c665c81ae
update
...
test=develop
7 years ago
Jacek Czaja
48e1b97e8e
- Coding style fixes
...
test=develop
7 years ago
Qiao Longfei
d32de7e6e1
fix code format test=develop
7 years ago
Qiao Longfei
5a660aee7d
update log level in parameter prefetch test=develop
7 years ago
Qiao Longfei
8ebde595c9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
Qiao Longfei
b9d3d75fc4
fix prefetch dependency test=develop
7 years ago
Qiao Longfei
145c535750
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
minqiyang
9d7c3b18c0
Polish code
...
test=develop
7 years ago
minqiyang
2b430adaee
Polish code
...
test=develop
7 years ago
minqiyang
a02ce58f2c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
7 years ago
Jiabin Yang
12e1719f96
Merge pull request #14352 from JiabinYang/enhance_hierachical_sigmod_op
...
Enhance hierarchical sigmoid op
7 years ago
Qiao Longfei
40f68b1349
unit test ready
7 years ago
Qiao Longfei
36e26a53b0
Optimize bilinear tensor product op ( #14485 )
...
* optimize bilinear_tensor_product
* add set zero to set grad to 0.
7 years ago
Tao Luo
4ec9de0122
Merge pull request #14628 from Sand3r-/mgallus/mkldnn-elementwise_mul
...
EltwiseMul: Changes from previous PR
7 years ago
Qiao Longfei
35b79ab865
Merge pull request #13983 from jacquesqiao/add-ctr-reader
...
Add ctr reader
7 years ago
wopeizl
b1dbbb7f88
Merge pull request #14629 from wopeizl/windows/port
...
fix the build issue on manylinux1
7 years ago
Qiao Longfei
da387720d7
fix infer compile test=develop
7 years ago
Jacek Czaja
cf40daee58
- Building fix to softmax for inference
7 years ago
Clementine
6c71c1f8f9
Add activation gelu ( #14569 )
7 years ago
Michal Gallus
9455be0ba5
EltwiseMul: Extract StringToFormat to MKLDNN helper
...
test=develop
7 years ago
peizhilin
351dc78e1c
code style fix
...
test=develop
7 years ago
Jacek Czaja
1540df51cf
- Fix to test_conv2d_transpose_mkldnn for GPU
...
test=develop
7 years ago
JiabinYang
eda069068d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
JiabinYang
a08dc83eb0
remove arg 'non_leaf_num', test=develop
7 years ago
chengduo
6648f5ed6f
add ShareLoD for dropout_grad ( #14616 )
...
test=develop
7 years ago
peizhilin
b6b8626e9c
fix the build issue on manylinux1
7 years ago
Qiao Longfei
18fd2d01b7
update embedding api
7 years ago
JiabinYang
7594787deb
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
JiabinYang
c469334cfb
polish python code and comment, test=develop
7 years ago
Xin Pan
3c77ce3751
Merge pull request #14593 from panyx0718/fix5
...
Protect important header files.
7 years ago
Qiao Longfei
92afbb923c
fix compile problem test=develop
7 years ago
Tao Luo
e8ef14d2a7
Merge pull request #14610 from Superjomn/revert/cache_fix
...
Revert "fix transfer cache thread_local bug (#14581 )"
7 years ago
Qiao Longfei
97cbec9b74
clean code
7 years ago
Qiao Longfei
1edd435da6
fix ci problem test=develop
7 years ago
JiabinYang
87648f8edf
merge develop, test=develop
7 years ago
Yiqun Liu
726f2cefe3
Fix bug of referencing a temporary variable. ( #14614 )
...
test=develop
7 years ago
wopeizl
db9284ecde
Merge pull request #14617 from wopeizl/windows/online
...
Windows/online
7 years ago
JiabinYang
c3c3c0b33c
polish code, test=develop
7 years ago
gongweibao
867c312bc4
Fix allreduce dependency order. ( #14586 )
7 years ago
Jacek Czaja
8bfa1fa9bb
- ASUM MKL integration
7 years ago
phlrain
487ee36aec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
tangwei12
56a4912b76
Make NCE_OP more efficient and support SelectedRows ( #14469 )
...
* Fix truncated normal.
* Fix.
* Make nce support more distribution.
* Fix API.spec.
* Fix python API.
* Fix.
test=develop
* Fix API.spec
test=develop
* Fix sampler.
* Fix order of arguments in python API.
test=develop
* NCE add selectedrows support
* NCE update weighted sampling
* fix bugs in nce_op, and assign_value_op optimized
* fix bugs in nce_op, revert assign_value_op
* nce_op optimize
* nce_op optimize
* nce_op optimize
* add selectedRows test later
test=develop
* add selectedRows supported
* add selectedRows supported
test=develop
* add selectedRows supported
* add nce selectedRows supported, test=develop
* add nce selectedRows supported
* add nce selectedRows supported, test=develop
* fix height in nce, test=develop
* add ut
* add ut, test=develop
* make AutoGrownIndex inline
test=develop
* fix tiny error, test=develop
7 years ago
liuhongyu
1ffe41d722
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
Qiao Longfei
9589babe12
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
liuhongyu
05917c3c79
add cudnn lstm; test=develop
7 years ago
Zeng Jinle
1c48d61442
Merge pull request #14599 from sneaxiy/fix_mac_unittest_bug
...
Fix Mac unittest bug
7 years ago
Qiao Longfei
f35f3fe77a
ctr reader can not be used in windows
...
test=develop
7 years ago
peizhilin
6a85dd3278
Merge remote-tracking branch 'upstream/develop' into windows/build
...
test=develop
7 years ago
peizhilin
38715e6fd0
minor fix
7 years ago
JiabinYang
7389597ce2
Update API.spec, test=develop
7 years ago
peizhilin
511cc9024a
fix for build issue
7 years ago
Qiao Longfei
6bef565dac
clean code test=develop
7 years ago
Qiao Longfei
e7d1f524f3
change log level
...
test=develop
7 years ago
JiabinYang
7e4bd695e6
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
Qiao Longfei
fe54adf70c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
JiabinYang
b10df8bcfa
refine code and add none bias ut, test=develop
7 years ago
Kaipeng Deng
251a1bb0f4
Merge pull request #14588 from heavengate/revert_interpolate
...
fix interpolate_op incompatible. test=develop
7 years ago
Qiao Longfei
668ae9083e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
Qiyang Min
30e47bce8b
Merge branch 'develop' into revert_vlog
7 years ago
tensor-tang
3ae6692a0d
Merge pull request #14512 from tensor-tang/fea/jit/rnn
...
Fea/jit/rnn
7 years ago
superjomn
4babc6b06c
update
...
test=develop
7 years ago
sneaxiy
f3522a11d2
fix mac unittest bug
...
test=develop
7 years ago
Qiao Longfei
87e4edd2ea
fix grad_varname in remote prefetch
7 years ago
Qiyang Min
6232d1f1dd
Merge pull request #14578 from velconia/add_production_dockerfile
...
Add python3.6 and python3.7 support to production generated Dockerfile
7 years ago
superjomn
dc249d3b69
Revert "fix transfer cache thread_local bug ( #14581 )"
...
This reverts commit 5c073a4db2.
7 years ago
Qiao Longfei
d98c59fd2c
support none sliced variable
7 years ago
dengkaipeng
bb489d4cc9
add interp_method default bilinear. test=develop
7 years ago
dengkaipeng
78f563917c
revert interpolate_op to bilinear_interp_op & nearest_interp_op. test=develop
7 years ago
Jacek Czaja
fb24690a58
- conv2d transpose MKL-DNN
...
test=develop
- Added new header for MKLDNN reuse functionality
- Extended conv2d_transpose GetExpectedKernelType for MKL-DNN support
- Buildable conv transpose mkldnn and conv mkldnn using conv template
- Conv2d transpose roughly implemented and buildable
- Added modifications conv2d transpose MKLDNN unit tests
- Fix to UT of conv2d transpose mkldnn op
- Wrong type of MKLDNN primitive was chosen for conv2d transpose
- Hacks for conv2d transpose
- UT enabled
- Replaced copying loop with memcpy
- Draft of passing lambda into AcquireMemory
- Made reorder (IOHW->OIHW) to be called only once
7 years ago
tensor-tang
7a91271436
Merge branch 'develop' into fea/jit/rnn
7 years ago
minqiyang
be04d99fe4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
7 years ago
wopeizl
05b7ee7eeb
Merge pull request #14545 from wopeizl/windows/online
...
Windows/online
7 years ago
JiabinYang
81e145764d
refine code and comments, test=develop
7 years ago
minqiyang
bcaa8a3b67
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_production_dockerfile
...
test=develop
7 years ago
Qiao Longfei
af2f5fc824
fix some bugs
7 years ago
JiabinYang
2f6b529aff
refine code and comments, test=develop
7 years ago
Xin Pan
e32f4c5423
fix
...
test=develop
7 years ago
Xin Pan
3e665862b8
Protect important header files.
...
test=develop
7 years ago
minqiyang
e43f5bc77c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_dist_resnet_ut_in_py36
...
test=develop
7 years ago
minqiyang
53433d7f2e
Revert the changes of VLOG
...
test=develop
7 years ago
tensor-tang
1f0291a51e
add comments and follow comments
...
test=develop
7 years ago
tensor-tang
557229bd39
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
Qiao Longfei
ed9fa4b301
can run
7 years ago
peizhilin
30849d1f20
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
qingqing01
6224e61fd9
Transpose-Flatten-Concat fusion operator. ( #14568 )
...
* Transpose-Flatten-Concat fusion operator.
* Add unit testing and fix bug.
7 years ago
Yan Chunwei
5c073a4db2
fix transfer cache thread_local bug ( #14581 )
7 years ago
Xin Pan
87332bb18d
Merge pull request #14579 from Superjomn/fix/transfer-cache-compile-error
...
fix compile
7 years ago
minqiyang
8b154c172f
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_dist_resnet_ut_in_py36
...
test=develop
7 years ago
Qiao Longfei
686d15c8e0
update grpc_variable_response
7 years ago
Jiabin Yang
13bc7619f5
Merge pull request #14552 from JiabinYang/fix_mac/fix_pinned_memory
...
fix Mac unittest error on reading pinned memory flag
7 years ago
tangwei12
3639d99f99
Fix save and load lookup table/optimizer vars ( #14301 )
...
* fix mkdir conflict
* fix load/save lookup tables
test=develop
* add lookup_table_utils
* fix load optimize vars on pserver
* delete lookup table utils
* fix save and load lookup tables
* fix load optimizer var
* fix load optimizer var, test=develop
* fix python 3 style, test=develop
* move lookup_table_utils to contrib utils
7 years ago
peizhilin
36cd18b549
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
qingqing01
39ec80def4
Remove the memory copy of feeding data in C++ inference API ( #14577 )
...
* Remove the memory copy for feeding data in C++ inference API
* Fix compiling dependency
* Fix compiling in ONLY_CPU mode
7 years ago
peizhilin
b2f8d4183d
Given the different fraction_of_gpu_memory_to_use depends on platform
7 years ago
Qiao Longfei
d827881502
fix pserver and prefetch rpc
7 years ago
peizhilin
1afa9492af
Recover the profiler
7 years ago
Yiqun Liu
bf222f197d
Use sub scope in tensor_array_to_tensor op. ( #14524 )
...
test=develop
7 years ago
superjomn
4b40c0013b
fix compile
...
test=develop
7 years ago
JiabinYang
02d68051db
add sparsed bias grad, test=develop
7 years ago
dzhwinter
840c1b29ad
test=develop ( #14562 )
...
* test=develop
remove code.
* test=develop
7 years ago
Qiao Longfei
5856c2f332
change Var to FindVar
7 years ago
Yu Yang
26af9cf90c
Merge pull request #14565 from chengduoZH/fix_cublas_warp_error
...
Fix cublas warp error
7 years ago
Qiao Longfei
312b7786d9
clean code
7 years ago
Qiao Longfei
2b6c0c09d6
add unit test
7 years ago
Yan Chunwei
923c8e3332
add benchmark for inference ( #14571 )
7 years ago
minqiyang
c92c440fa1
Add python3.6 and python3.7 support to production generated Dockerfile
...
test=develop
7 years ago
Qiao Longfei
47280ef8b4
lookup table op support prefetch
7 years ago
Yan Chunwei
a7188d5bc7
fix executor transfer cache bug ( #14518 )
7 years ago
gongweibao
c1bf9664cd
Add options to disable SO_REUSEPORT of grpc. ( #14269 )
7 years ago
minqiyang
ee73810fd5
Fix API.spec
...
test=develop
7 years ago
Qiao Longfei
4ad5fd8f54
add parameter prefetch
7 years ago
Qiao Longfei
9d276fe8a8
add parameter prefetch
7 years ago
minqiyang
d2045260a5
Change visibilities of variant_visitor of pybind11
...
test=develop
7 years ago
minqiyang
b67229187e
Change to PYBIND11_MODULE because the deprecation of PYBIND11_PLUGIN
...
test=develop
7 years ago
minqiyang
81994e84e0
Change the include files because the version changes of pybind11
...
test=develop
7 years ago
Tao Luo
e90afec47b
Merge pull request #14543 from luotao1/threads
...
add thread related inference api
7 years ago
qingqing01
64ca3d176c
Add bias_attr in sequence_conv_pool API. ( #14553 )
7 years ago
chengduozh
f7847ca6a3
fix cublas warp error
...
test=develop
7 years ago
Zhaolong Xing
e52d90a35e
Merge pull request #14527 from hjchen2/develop
...
Refine split TensorRT plugin
7 years ago
Qiyang Min
4531281386
Merge pull request #14526 from velconia/add_python36and37_to_paddle_build
...
Add python 3.6 and python 3.7 support to paddle build
7 years ago
JiabinYang
47c4e65d60
test=develop
7 years ago
luotao1
116979a40a
refine api name
...
test=develop
7 years ago
luotao1
e66b4c6bff
adjust tester_helper to make multi-instance multi-thread work
...
test=develop
7 years ago
luotao1
a5c4b463c9
add SetMKLDNNThreadId api
7 years ago
luotao1
e21edb26f6
add Set/GetCPUNumThreads api
7 years ago
Qiao Longfei
9851a53478
add prefetch part in pserver
7 years ago
JiabinYang
5cd2fc9fd0
just for test
7 years ago
JiabinYang
42470f14b7
test=develop
7 years ago
peizhilin
445fff24dc
add the bigobj option to NVCC compile
...
fix code style
7 years ago
sabreshao
61c5f13fcf
Fix cmake for AMDGPU platform ( #13801 )
...
* HIP cmake.
Enable whole archive build for pybind library.
Disable two warning.
Rollback to C++11.
Link RCCL to WA gpu kernel loading issue.
Update eigen to fix build failure.
Add more include directories.
Fix O3 build failure.
Update eigen.
fix tensor_util_test segment fault issue
add more macro check in hip.cmake.
we may consider refining hip.cmake to inherit all add_definitions() in parent scope, in the future.
Fix rocRAND load.
Update eigen to fix gru_unit_op and reduce_op.
Add HIP support to testing.
Update eigen to support int16 and int8 in arg min and arg max.
* add rocprim as cub library used by nv implementation
* Reduce build time in rocprim.
* Add rocprim introduction, remove useless cmake code.
* Remove useless flags and format cmake file.
7 years ago
qingqing01
36f08eef3b
CUDA kernel for density_prior_box_op. ( #14513 )
...
* CUDA kernel for density_prior_box_op.
* Support flatten to 2D.
7 years ago
tensor-tang
6a7f83d45d
enable gru jitcode and refine act and lstm jitcode
...
test=develop
7 years ago
tensor-tang
686eaf20ba
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
peizhilin
81bd7eeff4
rollback the format
7 years ago
Qiao Longfei
1f87f263a2
clean code
7 years ago
Qiao Longfei
361cb0e078
lookup remote table can compile
7 years ago
JiabinYang
0fca16847c
temp
7 years ago
JiabinYang
e9be3366a9
test=develop
7 years ago
Zeng Jinle
bfc34ac19f
Merge pull request #14536 from sneaxiy/dlpack_integration
...
Add dlpack support
7 years ago
chengduo
00b9e9a135
Refine cublas to support CUBLAS_TENSOR_OP_MATH ( #13929 )
...
* refine cublas
test=develop
* code refine
* refine cublas
* add GEMME_EX
* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop
* fix CublasCall for cuda version
test=develop
* fix error
test=develop
* fix GEMM_EX to be compatible with gcc 4.8
test=develop
* add GEMM_EX
test=develop
* to be compatible with gcc4.8
test=develop
7 years ago
peizhilin
dfbac60398
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
7c8c9dc9bf
fix unit test cases
7 years ago
tensor-tang
0c5ed5f6fc
enable peephole jitcode
...
test=develop
7 years ago
JiabinYang
3c6102a367
test=develop
7 years ago
Qiao Longfei
7c3ce2952d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
7 years ago
Qiao Longfei
60a4f69b3c
add lookup remote table op
7 years ago
Qiao Longfei
e0b48f7e29
init lookup remote table
7 years ago
tensor-tang
e3b61cf52b
init gru jitcode and fix lstm jitcode
...
test=develop
7 years ago
tensor-tang
0f25446574
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
minqiyang
d68b9ede44
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_python36and37_to_paddle_build
...
test=develop
7 years ago
Dun
ae7d22862b
Group Norm ( #13843 )
...
Add group normalization operator.
7 years ago
hjchen2
1adda8e06c
Add more unit tests for split plugin
...
test=develop
7 years ago
sneaxiy
488610a65a
merge develop
...
test=develop
7 years ago
Jiabin Yang
de2db11735
Merge pull request #14537 from reyoung/feature/fix_macos_ut
...
fix(Cpu): fix cpu compile and unittest
7 years ago
wopeizl
d9a1f3e58e
Windows/online ( #14474 )
...
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
7 years ago
Yu Yang
533c5d5803
fix(Cpu): fix cpu compile and unittest
...
test=develop
7 years ago
sneaxiy
3912545ffe
add dlpack support
...
test=develop
7 years ago
JiabinYang
57a18e32a1
test=develop
7 years ago
peizhilin
bef475c92b
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Tao Luo
5d4d117edc
Merge pull request #14502 from qingqing01/cudnn5_fix
...
Fix compiling with cuDNN v5
7 years ago
Jiabin Yang
f7b55de9e5
Merge branch 'develop' into enhance_hierachical_sigmod_op
7 years ago
Yu Yang
e68c1fcd5a
Merge pull request #14522 from reyoung/feature/fix_op_header_deps
...
fix(Compile): fix depends error when compile op using cub
7 years ago
hjchen2
6eba5bd276
Fix direct copy and refine split ut
...
test=develop
7 years ago
Qiao Longfei
fd290c2580
fix mac compile of analysis
...
test=develop
7 years ago
hjchen2
5857fb3014
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop
...
test=develop
7 years ago
tensor-tang
3562051302
add gru refer code and remove redundant avx code
...
test=develop
7 years ago
JiabinYang
af9a3301da
test=develop
7 years ago
hjchen2
3e3599f3d9
Refine split tensorrt plugin
7 years ago
peizhilin
f10e196fc8
fix build issue
7 years ago
Yu Yang
6a128dea32
Merge pull request #14515 from reyoung/feature/fix_macos_build
...
fix(Macos): fix compile on macos
7 years ago
Zhaolong Xing
ad349e770f
Merge pull request #14452 from NHZlX/fix_avg_pool_trt_bug
...
fix avg pool trt bug
7 years ago
tensor-tang
f913860873
jitkernel lstm refer support peephole
...
test=develop
7 years ago
tensor-tang
2f9b5f2383
Merge branch 'develop' into fea/jit/rnn
7 years ago
JiabinYang
014e50c284
test=develop
7 years ago
minqiyang
255cc1eb65
Add support for Mac build
...
test=develop
7 years ago
minqiyang
c19ff1f3d2
Add python3.6 and python3.7 support in paddle build scripts
...
test=develop
7 years ago
peizhilin
6e66fadb95
clean up the pre-definitions on windows
7 years ago
Yu Yang
3edd32d070
fix(Compile): fix depends error when compile op using cub
...
some operators depend on cub and xxhash by header. The dependency should be declared explicitly rather than declared to pybind.
test=develop
7 years ago
Dang Qingqing
cda60311f9
Fix compiling with cuDNN v5
...
test=develop
7 years ago
peizhilin
67562a6fcd
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
703b26e697
add profiler, parallel_executor back
7 years ago
Tao Luo
1d9b2a453c
Merge pull request #14508 from luotao1/warm_up_multi_thread
...
add warm up in TestMultiThreadPrediction
7 years ago
Yu Yang
b3364d4035
fix(Macos): fix compile on macos
...
test=develop
7 years ago
Yu Yang
a685f305f8
Merge pull request #14479 from reyoung/feature/fix_macos_ut
...
fix(Mac): fix unittest of macos
7 years ago
tensor-tang
10fb4ceefc
Merge pull request #14351 from tpatejko/tpatejko/mkldnn-elementwise_mul
...
[MKLDNN][JIT][AVX512] Elementwise Mul
7 years ago
jerrywgz
13e254faed
refine code, test=develop
7 years ago
tensor-tang
b4c826c548
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
...
test=develop
7 years ago
tensor-tang
ce31deb7e9
refine refer code and add lstm refer code
...
test=develop
7 years ago
jerrywgz
79cec53111
add ignore index for sigmoid cross entropy with logits op, test=develop
7 years ago
nhzlx
e62872df8b
fix conflicts
7 years ago
nhzlx
a4dc1d4292
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_trt
...
test=develop
7 years ago
nhzlx
faeb9b8aa9
fix compile rely problem
7 years ago
chengduo
a8d3aaae2a
print output log warning ( #14497 )
...
test=develop
7 years ago
Tao Luo
eb9b9becdc
add warm up in TestMultiThreadPrediction
...
test=develop
7 years ago
tensor-tang
c2cfb03a72
add lstm jitcode
7 years ago
Tao Luo
5cc7946313
Merge pull request #14499 from luotao1/disable_openblas_test
...
disable two openblas tests temporarily
7 years ago
Houjiang Chen
10ae3ba486
Merge pull request #14493 from hjchen2/develop
...
Implement leaky relu converter from fluid to tensorRT
7 years ago
nhzlx
2a84054372
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_trt
...
test=develop
7 years ago
nhzlx
b742d46520
fix demo ci bug on trt
7 years ago
peizhilin
25adf970b2
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Houjiang Chen
33c65517fd
Update CMakeLists.txt test=develop
7 years ago
Tao Luo
1d3e9bde1e
Merge pull request #14488 from yihuaxu/develop_7a64d48f5_stack_opt
...
Optimize the stack operator
7 years ago
Houjiang Chen
01bda73116
Update CMakeLists.txt
7 years ago
Tao Luo
09ee266f8e
disable two openblas tests temporarily
...
test=develop
7 years ago
hjchen2
2c2a192eb1
Resolve merge conflicts
...
test=develop
7 years ago
Yiqun Liu
8bc1c5d2ab
Implement the Tensorrt plugin for elementwise op ( #14487 )
...
* Initialize the elementwise plugin.
* Implement the basic CUDA kernel of elementwise plugin.
test=develop
7 years ago
tensor-tang
7aa3aff338
Merge pull request #14465 from tensor-tang/fea/jit/exp
...
jitcode act support all size
7 years ago
Tao Luo
1b894e495f
Merge pull request #14437 from jczaja/prv-softmax-mkl
...
Introducing MKL to softmax for inference
7 years ago
peizhilin
3f73c0a70d
fix the build issue on windows
7 years ago
chengduo
a94a7355f0
Refine the GraphNum check ( #14144 )
...
* refine GraphCheck
test=develop
* fix ci fail
test=develop
7 years ago
peizhilin
3a72a634cf
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yihua Xu
a906a361be
Add the macro for NVCC (test=develop)
7 years ago
Yihua Xu
d91740acb1
Revert "Remove the remnant code (test=develop)"
...
This reverts commit be50670348.
7 years ago
Yihua Xu
be50670348
Remove the remnant code (test=develop)
7 years ago
hjchen2
1622cb9937
Fix alpha tensor key
7 years ago
hjchen2
a8c077df7c
Implement leaky relu tensorRT converter
7 years ago
qingqing01
9eefd2c766
Modify some infer-shape about detection operators in compile-time. ( #14483 )
...
* Modify some infer-shape in compile-time.
7 years ago
Tao Luo
cf685f361b
Merge pull request #14458 from tpatejko/tpatejko/mkldnn-skip-connections
...
[WIP] Correcting and extending MKLDNN residual connection fuse pass
7 years ago
Yihua Xu
f4c869d872
Optimize the layer_norm operator with AVX intrinsic function ( #14417 )
...
* Optimize layer_norm operator with AVX intrinsic functions
* Revert the wrong modifications
* Implement the jit kernel for layer_norm operator
* Add math headfile to fix the compile issue (test=develop)
* Add math headfile to fix the compile issue (test=develop)
* Fixed the intrinsic headfile issue (test=develop)
* Fix the conflicts (test=develop)
* Revert for CUDA compiler (test=develop)
* Fixed the cuda dependency (test=develop)
* Fix the macro issues (test=develop)
7 years ago
Houjiang Chen
816b464037
Merge pull request #14486 from hjchen2/develop
...
Fix tensorrt plugin cmake dependency, test=develop
7 years ago
peizhilin
81f750a88c
fix the dependency
7 years ago
peizhilin
ee0fd78c81
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yu Yang
f1a392a5fe
Merge pull request #13804 from sneaxiy/rewrite_allocation
...
Rewrite allocation
7 years ago
Yihua Xu
f418f552df
Merge branch 'develop' into develop_7a64d48f5_stack_opt (test=develop)
7 years ago
peizhilin
8443961a4f
add warp_ctc back
7 years ago
hjchen2
2825685f2a
Fix tensorrt plugin cmake dependency, test=develop
7 years ago
Superjomn
e878a8e885
update
...
test=develop
7 years ago
qingqing01
fd7e643153
Convolution fusion operator. ( #14449 )
...
* Convolution fusion operator.
* Clean code
test=develop
7 years ago
Yu Yang
98bbfc17be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
7 years ago
Yu Yang
7486b0ddec
fix(Mac): fix unittest of macos
...
test=develop
7 years ago
peizhilin
4a6769da84
re-organize the cmake file
7 years ago
dengkaipeng
8ef6280c03
Add operator double support. test=develop
7 years ago
Yu Yang
d424115f9e
Clean code
...
test=develop
7 years ago
peizhilin
1aff40a4c6
exclude warpctc_op on windows
7 years ago
peizhilin
7d51a0e887
disable DSO by default on windows
7 years ago
peizhilin
b967e01cbe
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Wu Yi
d7bd0361cb
fix dist deps ( #14471 )
...
* fix dist deps test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
7 years ago
Yu Yang
b12c77dae2
Fix unittests
...
test=develop
7 years ago
Jacek Czaja
9b0eae3023
- Removing partial specialization of softmax for inference for GPU
...
test=develop
7 years ago
peizhilin
c59d3e83bc
test case fix
7 years ago
peizhilin
a3e952f41d
add the jit back
...
fix compile error on windows
7 years ago
tensor-tang
a19b3225a1
fix jitcode small size
...
test=develop
7 years ago
Jacek Czaja
be80bb4f28
- Fix to GPU
...
test=develop
7 years ago
tensor-tang
4dbdfa60ef
sigmoid and tanh support all size
...
test=develop
7 years ago
tensor-tang
ccb8963705
refine exp jitcode with all size
...
test=develop
7 years ago
peizhilin
1cc23ef67d
merge from paddle:develop
7 years ago
tensor-tang
d3eae8f61b
refine relu and fix addrelu test
7 years ago
tensor-tang
4e67fe6a12
refine act and vxx with all size
7 years ago
tensor-tang
ba3eaed7a7
exp support all size
7 years ago
tensor-tang
1ffce8c0ae
fix build error on noavx
...
test=develop
7 years ago
superjomn
4bf6817cbc
fix gpu load model
...
the parameters will load from CPUPlace, that will keep copying data
between CPU and GPU places.
test=develop
7 years ago
Michal Gallus
c69c41604e
MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults
...
test=develop
7 years ago
Michal Gallus
785066eb8a
MKLDNN elementwise_mul: Check if AVX512 is available
...
test=develop
7 years ago
Michal Gallus
08f63c4d12
MKLDNN elementwise_mul: Lint changes to UT & integration
...
test=develop
7 years ago
Michal Gallus
49b09327f6
MKLDNN elementwise_mul: Reorder on non-nchw input, fallback on non-16 divisible fm
...
test=develop
7 years ago
Michal Gallus
d14858e4ba
MKLDNN elementwise_mul: Parallelize mul
7 years ago
Michal Gallus
ed31936ba1
MKLDNN elementwise_mul: Support NCHW, update UT
7 years ago
Michal Gallus
4e54ab76ec
Add HasAttr method to Operator
7 years ago
Tomasz Patejko
700bcbf74f
MKLDNN elementwise_mul: h and w loops implemented in xbyak
7 years ago
Tomasz Patejko
ad09facafe
MKLDNN elementwise_mul: CPU tests initially refactored. MKLDNN mul test for broadcast added
7 years ago
Tomasz Patejko
2d73ad180a
MKLDNN elementwise_mul: simple xbyak version for AVX512
7 years ago
Tomasz Patejko
213ec37d6a
MKLDNN elementwise_add: simple initial implementation of the operator for MKLDNN format
7 years ago
Wu Yi
a2d9b34417
Refine operator cmake ( #14413 )
...
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
7 years ago
Tomasz Patejko
53da846d1e
MKLDNN residual connections fuse pass: initial implementation of fusion for projection pass
...
test=develop
7 years ago
peizhilin
764f97deac
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
8580b7a130
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
tensor-tang
7f17e561d7
Merge pull request #14423 from tensor-tang/fea/jit/act
...
jitcode act relu, exp, sigmoid, tanh
7 years ago
Jiabin Yang
28bd5b7bad
fix space_to_depth_op unicode problem ( #14430 )
...
* fix space_to_depth_op unicode problem
* test=develop
7 years ago
Jacek Czaja
513bb6c151
Squashing MKL based softmax for inference
...
test=develop
- Added profiling to softmax functors
- MKL based softmax inference op
- Fix to softmax computation via MKL
- cleaning
- Cosmetic fixes to softmax MKL
- Fix to ON_INFER lack of propagation
7 years ago
nhzlx
9b64aac41f
add macro for pool2dDirectCUDAFunctor
...
test=develop
7 years ago
Tomasz Patejko
dbc4fcd722
MKLDNN residual connections fuse pass: unit tests enabled and added
7 years ago
Tomasz Patejko
4224089354
MKLDNN residual connections fuse pass: Maybe removed and boost::optional used where it makes sense
7 years ago
Tomasz Patejko
86fd3b32be
MKLDNN residual connections fuse pass: counting statistics added to the pass
7 years ago
Tomasz Patejko
ee6f778beb
MKLDNN residual connections fuse pass: further refactoring
7 years ago
Tomasz Patejko
7423748e37
MKLDNN residual connections fuse pass:
...
* implements reachability check between identity node and non-identity argument to elementwise_add
* implements handling identity node as x and as y argument to elementwise_add
7 years ago
nhzlx
8f9a8c455a
delete unused test code.
...
test=develop
7 years ago
whs
1722678258
Make nce support more distribution. ( #13549 )
...
* Fix truncated normal.
* Fix.
* Make nce support more distribution.
* Fix API.spec.
* Fix python API.
* Fix.
test=develop
* Fix API.spec
test=develop
* Fix sampler.
* Fix order of arguments in python API.
test=develop
7 years ago
nhzlx
83f8c403a7
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into fix_avg_pool_trt_bug
...
test=develop
7 years ago
nhzlx
b969116988
fix avg pool trt bug and fix cpplint
7 years ago
tensor-tang
1f00723fa3
exp, sigmoid, tanh jitcode support more size
...
test=develop
7 years ago
Yu Yang
19e669a992
Add legacy_allocator
...
test=develop
7 years ago
Zhaolong Xing
2f27c048cc
Merge pull request #14440 from hjchen2/develop
...
Add PRelu tensorRT plugin and Conv2d transpose op converter
7 years ago
Yu Yang
1cb7e7dda2
fix(allocation): fix ut
...
test=develop
7 years ago
Qiyang Min
d971d5b875
Merge pull request #14431 from velconia/fix_expand_op_dim_in_compile_time
...
Fix expand op incorrect infer shape
7 years ago
hjchen2
6a7b995737
Refine commit message to enable ci, test=develop
7 years ago
Wu Yi
b32c13dc20
Add cudnn ctc loss ( #12366 )
...
* add cudnn ctc loss
* wip add test test=develop
* wip
* wip
* done test=develop
* move include cudnn test=develop
* test test=develop
* fix build test=develop
* fix build test=develop
* fix build on cudnn5 test=develop
* fix cudnn5 build test=develop
* fix cudnn5 build test=develop
* merge develop softmax functor change test=develop
7 years ago
peizhilin
d1a1fafc4c
code style
7 years ago
tensor-tang
8cda7b3d20
Merge remote-tracking branch 'ups/develop' into fea/jit/act
...
test=develop
7 years ago
tensor-tang
e2d6eddd32
remove ComputeDeprecated
...
test=develop
7 years ago
peizhilin
6d0d5a76eb
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
162f2d4109
disable the openblas multi-thread on windows since no support
...
adjust the python script
7 years ago
Yan Chunwei
7796f65f89
fix inference on gpu out of mem ( #14414 )
...
* fix inference on gpu out of mem
the transfer logic in operator.cc will keep creating new scopes.
7 years ago
dengkaipeng
f115eb0d1e
enhance api. test=develop
7 years ago
tensor-tang
64f7516aee
fix lrn on mac ( #14426 )
...
* rename and fix blas vsqr
test=develop
* update
7 years ago
Yu Yang
c8f6e70ab4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
7 years ago
hjchen2
413f5948b2
Fix code style
7 years ago
hjchen2
21f33b4274
Complete PRelu plugin and Conv2d transpose op converter
7 years ago
tensor-tang
f65ddff8d1
unify act jitcode of relu, exp, sigmoid and tanh
7 years ago
tensor-tang
6a159071b6
add vtanh jitcode of size 8
7 years ago
tensor-tang
046374bcd1
add vsigmoid jitcode of size 8
7 years ago
minqiyang
560b29ccb7
Polish code
...
test=develop
7 years ago
minqiyang
21d6e8e8c8
Polish code
...
test=develop
7 years ago
minqiyang
50b6e4c6bc
Fix expand grad op infer shape
...
test=develop
7 years ago
Sylwester Fraczek
8a1eeec579
add mkldnn prop_kind phase for inference-only case to pooling and activations ( #14278 )
...
* add is_test to pooling and activations
add prop_kind support for layers activation. conv and pooling
add a pass that sets is_test to true
add transpiler version of is_test pass
test=develop
* patch test and pass
test=develop
* add pass to analyzer.h
test=develop
* add is_test attr description & pass only on mkldnn
in:
activation_op.cc
batch_norm_op.cc
conv_op.cc
dropout_op.cc
lrn_op.cc
pool_op.cc
sequence_pool_op.cc
softmax_op.cc
* fix is_test handling for activation pool and conv
* change description of is_test for all layers again
* remove GetAttr(use_mkldnn) from pass
* rename correct_mkldnn_test_phase to is_test
and remove dependency on MKLDNN
test=develop
* review fix magic number
* two if(..)s into one
* Check is_test once and pass mkldnn forward prop kind
* dereference shared_ptr with * (without get())
test=develop
* add is_test_pass back
test=develop
7 years ago
peizhilin
d1429ac4a5
add recordio support
7 years ago
chengduo
82773477ae
Add selu ( #14415 )
...
* add selu
* use for range
test=develop
* add API
test=develop
* follow comment
test=develop
* update API.spec
test=develop
7 years ago
dengkaipeng
95d5060ddd
fix abs -> fabs error. test=develop
7 years ago
Tao Luo
9d29ebc010
Merge pull request #14306 from sfraczek/sfraczek/test-analyzer-mobilenet
...
add test_analyzer_mobilenet
7 years ago
minqiyang
30147d7f58
Fix expand op incorrect infer shape
...
test=develop
7 years ago
Sylwester Fraczek
d318583eb5
rename mobilenet dir to mobilenet_depthwise_conv
...
test=develop
7 years ago
JiabinYang
ba9ff508e8
temp fix
7 years ago
Yu Yang
e5c4cf6140
Polish allocation
...
Clean allocation->Deleter
test=develop
7 years ago
Yihua Xu
03ccb9a461
Optimize the stack operator
7 years ago
dengkaipeng
2faa2b4048
remove cu file. test=develop
7 years ago
tensor-tang
ee2a7f1b8c
refine exp and fix error on avx
...
test=develop
7 years ago
tensor-tang
1e06a32a0d
add vexp jitcode of size 8
...
test=develop
7 years ago
tensor-tang
2354409601
Merge pull request #14374 from tensor-tang/fea/jit/act
...
add vrelu jitcode
7 years ago