tensor-tang
d651a91138
fix build on win, fix use condition of crf decoding and layer norm and
enhance test precision
test=develop
7 years ago
JiabinYang
bfcb5e5235
test=develop, fix gpu compile error on prefetch, and fix hs/nce ut failed on gpu
7 years ago
tensor-tang
d53c4756ad
clean code and remove unused files
test=develop
7 years ago
tensor-tang
95fb31285c
Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
Xin Pan
cf3a07e8f8
Merge pull request #14878 from panyx0718/imperative
MLP forward backward
7 years ago
peizhilin
9f55f1ff50
use the platform api to decide the specific instruction support or not
test=develop
7 years ago
tensor-tang
c187a7c618
add more impls of lstm and gru and fix build on win
test=develop
7 years ago
heqiaozhi
39f4e9273e
data_norm
test=develop
7 years ago
sneaxiy
74a8e6b032
merge develop
fix conflict
test=develop
7 years ago
Xin Pan
1fe3ac352a
move more and fix while
test=develop
7 years ago
sneaxiy
ae6f46a1a9
rewrite variable type
test=develop
7 years ago
Jacek Czaja
709d9e3cb7
- Added reusing MKL-DNN primitives for Transpose MKL-DNN op
test=develop
7 years ago
peizhilin
0b4f742e8a
fix the build issue
test=develop
7 years ago
peizhilin
da42cf2055
fix build issue when xbyak is disabled on windows
test=develop
7 years ago
tensor-tang
83d075aa79
fix lstm and gru jitcode
test=develop
7 years ago
peizhilin
1cc9d59838
disable xbyak on windows
test=develop
7 years ago
Xin Pan
876993887b
convert more interface to avoid scope
test=develop
7 years ago
tensor-tang
20392be001
Merge remote-tracking branch 'ups/develop' into refine/jit
fix conflicts
test=develop
7 years ago
tensor-tang
f332f589bc
add more impls of sigmoid and vtanh
7 years ago
jerrywgz
dda28b0e68
fix bug in if-else op, test=develop
7 years ago
JiabinYang
4877f5d71f
test=develop, fix compile error under gpu mode
7 years ago
JiabinYang
8515ee3a29
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_prefech_hs
7 years ago
JiabinYang
5ec9b37798
test=develop, fix compile error under gpu mode
7 years ago
heqiaozhi
a94285869b
add API
test=develop
7 years ago
mozga-intel
9035bb81fe
Enable mul operator for a ngraph engine ( #14801 )
* Enable mul operator for a ngraph
test=develop
* Enable activation ops test
test=develop
* Remove unused line
test=develop
7 years ago
tensor-tang
ea259c6363
enable layer norm intrinsic code
7 years ago
gongweibao
b849157e9d
Add size enforce ( #14919 )
7 years ago
heqiaozhi
5c7a8aee07
merge upstream to my develop
test=develop
Merge remote-tracking branch 'upstream/develop' into develop
7 years ago
Jacek Czaja
aa6e9c30be
[MKL-DNN ]Added transpose/transpose2 Op ( #14872 )
* - Added transpose MKLDNN Op
- Few basic UT works
- Added 1D transpose
- implementing generic mem desc for MKLDNN transpose
- Modified transpose op to support more dimensional data, e.g. 5, 6..10
- Added is_test attribute to transpose op
test=develop
* - Added support for MKLDNN::memory::format::any for Transpose MKLDNN op
test=develop
* - Additional transpose mkldnn op correction to mkldnn layout
test=develop
* Cosmetic fixes
test=develop
* - Removed const_cast to obey coding standard
test=develop
7 years ago
heqiaozhi
4f6e9e3ac3
teacher student sigmoid loss
7 years ago
peizhilin
07c7eaabb4
Merge remote-tracking branch 'upstream/develop' into windows/mkl
test=develop
7 years ago
wopeizl
6c66b3d496
Merge pull request #14943 from wopeizl/windows/ctc
add ctc support for windows
7 years ago
Xin Pan
dfcf746ea1
Merge pull request #14904 from panyx0718/clean2
refactor RunImpl
7 years ago
tensor-tang
b1516783ea
enable crf decoding intrinsic code
7 years ago
tensor-tang
4cc7707d28
add crf_decoding and layer norm intrinsic code
7 years ago
tensor-tang
10c340c9a3
fix conflicts
7 years ago
tensor-tang
893957f711
Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
tensor-tang
6648995f53
fix build
7 years ago
JiabinYang
3b7b2e1ded
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_prefech_hs
7 years ago
Xin Pan
6324032602
MLP forward backward
test=develop
7 years ago
peizhilin
19ebd8b4cf
add ctc support for windows
7 years ago
Xin Pan
c89a1fb287
Merge pull request #14879 from panyx0718/clean
clean parallel do
7 years ago
Qiao Longfei
3f3a84b6dc
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multithread-sparse-adam
test=develop
7 years ago
Qiao Longfei
e2d56561e7
Merge pull request #14889 from jacquesqiao/optimize-adam
adam optimizer support lazy mode
7 years ago
sneaxiy
a500dfa579
rewrite ddim
test=develop
7 years ago
JiabinYang
b5fa916413
fix bug after merge reyoung optimization, test=develop
7 years ago
sneaxiy
dc8847af87
add examples and comments
test=develop
7 years ago
peizhilin
fa135bbf52
Fix the mkl build script on windows
test=develop
7 years ago
Xin Pan
70981f5d79
clean
test=develop
7 years ago
Qiao Longfei
e0df9f2346
merge lazy mode
7 years ago
Yu Yang
2803cf5776
Merge pull request #14868 from reyoung/feature/refine_w2v
Feature/refine w2v
7 years ago
Zhaolong Xing
a9fb34fad8
Merge pull request #14903 from NHZlX/add_conv_elementwise_pass
Add conv + elementwiseAdd pass
7 years ago
peizhilin
b601f2de8d
include the mkl fix only
test=develop
7 years ago
Qiyang Min
fd1d2c897e
Merge pull request #14894 from velconia/add_huber_regression_loss_op
Add python interface for huber loss
7 years ago
peizhilin
5a6d7fe2ff
add mkl,ctc support for windows
7 years ago
wopeizl
0f085f0a5a
Merge pull request #14892 from wopeizl/windows/port3
fix script issue
7 years ago
JiabinYang
656040c726
merge reyoung optimization
7 years ago
Qiao Longfei
8936c7913b
add log test=develop
7 years ago
Xin Pan
eaf8ba35b5
change input
test=develop
7 years ago
Xin Pan
840e6729e2
inject context
test=develop
7 years ago
Qiao Longfei
59cf96ec18
add log
7 years ago
wopeizl
fa78fc60be
Merge pull request #14907 from wopeizl/windows/avx
add avx support for windows
7 years ago
Qiao Longfei
fe3995d335
refine code test=develop
7 years ago
tensor-tang
74292f414c
enable eltwise nchw16c mul nc
7 years ago
Qiao Longfei
fd152289fa
clean for range in test=develop
7 years ago
nhzlx
050a68dde3
fix comments
test=develop
7 years ago
shippingwang
7f73c16e42
Add
7 years ago
shippingwang
2dd55b873f
Add shuffle_channel_op
7 years ago
tensor-tang
720b55cbcf
enable crf decoding and layer norm refer code
7 years ago
tensor-tang
64a90b2f1c
use vadd, vaddrelu, lstm and gru jitkernel
7 years ago
Qiao Longfei
1141db8114
update test_adam_op
test=develop
7 years ago
gongweibao
addded48e1
test=develop ( #14898 )
7 years ago
Qiao Longfei
3bd54ed769
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multithread-sparse-adam
7 years ago
Qiao Longfei
96604fda10
fix gpu data
test=develop
7 years ago
nhzlx
fcc93d96d5
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_conv_elementwise_pass
fix conflicts
test=develop
7 years ago
minqiyang
24eb8f038c
Fix bug
test=develop
7 years ago
Yu Yang
740e1626ce
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_w2v
test=develop
7 years ago
Yancey1989
a760a550b0
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
7 years ago
minqiyang
bd0067b26c
Polish code
test=develop
7 years ago
Yu Yang
bacf1d2399
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type
7 years ago
Qiao Longfei
238b24bfa2
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-adam
7 years ago
peizhilin
01dd9061a0
add avx support for windows
test=develop
7 years ago
Qiao Longfei
fcde2b2725
add ForRangeIn
7 years ago
tensor-tang
3713d08d40
enable jitcode gru
7 years ago
tensor-tang
7c1f3ad6eb
enable jitcode lstm
7 years ago
Xin Pan
363bf8a4d8
Merge pull request #14897 from panyx0718/clean2
In most times, const_cast is bad and break interface contract and
7 years ago
nhzlx
388953027e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_conv_elementwise_pass
test=develop
7 years ago
nhzlx
514648665a
fix trt_op test test=develop
7 years ago
Kaipeng Deng
dc76e4b0f1
Merge pull request #14701 from heavengate/adaptive_pool
add adaptive pool2d and pool3d
7 years ago
tensor-tang
80766bcb82
enable act jitcode vexp, vrelu, vsigmoid and vtanh
7 years ago
nhzlx
050e118f3c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_trt_thread_bug
test=develop
7 years ago
nhzlx
96216052d5
1. fix trt multi thread bug
7 years ago
gongweibao
0b1c7d838c
Add brpc serialization support. ( #11430 )
7 years ago
tensor-tang
fd0a954fbf
enable blas jitcode vmul, vadd, vaddrelu, vscal and vaddbias
7 years ago
Yan Chunwei
a985949be9
Fea/fuse conv elementwise add fuse ( #14669 )
7 years ago
tensor-tang
5e97be7ba7
enable jitkernel mkl vexp, vsigmoid and vtanh
7 years ago
minqiyang
5fea8cd478
Add sorted_result parameter to SelectedRows Functor
test=develop
7 years ago
tensor-tang
ae17926987
enable jitkernel mkl vmul, vadd and vscal
7 years ago
tensor-tang
77907a3502
refine benchmark template
7 years ago
Xin Pan
e90b2f104c
In most times, const_cast is bad and break interface contract and
make the code unreadable and make the program unstable.
test=develop
7 years ago
Yancey1989
4a4ccac1d0
update by comment test=develop
7 years ago
tensor-tang
8e785fec8d
clean code and refine tests template
7 years ago
minqiyang
65d355a72c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_huber_regression_loss_op
test=develop
7 years ago
minqiyang
c550e0ce06
Add python interface for huber regression loss
test=develop
7 years ago
peizhilin
23dec78772
fix script issue
test=develop
7 years ago
minqiyang
da796dfe05
Remove BinarySearch from Adam Op
test=develop
7 years ago
Yu Yang
b17444c84c
Fix merge bug
test=develop
7 years ago
Qiao Longfei
c624417c6f
change sparse mode to lazy mode
7 years ago
Qiao Longfei
4035e4bab2
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-adam
7 years ago
Qiao Longfei
fac8702269
adam support multithread
7 years ago
tensor-tang
00d3afbcc9
add gru refer functions, test and benchmark
7 years ago
Qiao Longfei
3dc29b3905
change sparse_update to adam_update
7 years ago
tensor-tang
6eec461725
add lstm peephole benchmark
7 years ago
tensor-tang
bf9302f950
add lstm, peephole refer and test
7 years ago
sneaxiy
f0df62f136
add more unittest case
test=develop
7 years ago
Qiao Longfei
fc6ec6bd14
add sparse mode adam
7 years ago
Yancey1989
c722b1dcb6
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
test=develop
7 years ago
Yu Yang
4ecdb6f486
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type
test=develop
7 years ago
Xin Pan
47ea2534fb
clean parallel do
test=develop
7 years ago
sneaxiy
f6741df462
merge develop
fix bug
test=develop
7 years ago
Yu Yang
7b10bf0e60
Use mkl
7 years ago
Zeng Jinle
1b564bc49a
Merge pull request #14670 from sneaxiy/refactor_eager_deletion
Rewrite eager deletion
7 years ago
SunGaofeng
e3c4b0dace
this is for psroi_pool op, test=develop ( #14796 )
* Add psroi_pool operator.
7 years ago
tensor-tang
bf951fa737
add refer vrelu, videntity, vexp, vsigmoid, vtanh and test and benchmark
7 years ago
Yu Yang
15550a2753
Polish code
7 years ago
sneaxiy
deb0d41cea
fix cmake
fix cmake again
test=develop
7 years ago
Yancey1989
23eb8c4299
fix ci test=develop
7 years ago
Yu Yang
9e0b33d7ad
Merge branch 'feature/tensor_type' into feature/refine_w2v
7 years ago
JiabinYang
50fce87905
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_prefech_hs
7 years ago
Yu Yang
194e66f785
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type
7 years ago
tensor-tang
e9216e82f9
add refer vscal, vaddbias and test and benchmark
7 years ago
sneaxiy
db2daefe50
merge develop
test=develop
7 years ago
sneaxiy
8b9d33fa1e
add unittest and fix bug
add API.spec
test=develop
7 years ago
tensor-tang
a37038880e
fix unit test with double type
7 years ago
tensor-tang
417d031f90
add refer vadd, vaddrelu, vsub and tests and benchmark
7 years ago
Yancey1989
106e285236
add unittest for parallelgraph mode test=develop
7 years ago
JiabinYang
c2e851f7b2
test=develop, remove sparse bias and add prefetch and related tests
7 years ago
Yu Yang
be11375661
Refine code
7 years ago
Yu Yang
8d9401152e
Refine w2v
7 years ago
Tao Luo
66b6e473d0
Merge pull request #14732 from Sand3r-/mgallus/mkldnn-concat
[MKL-DNN] Concat Layer
7 years ago
sneaxiy
0c554a59fa
merge develop
test=develop
7 years ago
tensor-tang
f3250097bc
fix bug and mac compile
7 years ago
tensor-tang
bc0df6a948
make typename tuples
7 years ago
tensor-tang
194ce2e92c
add benchmark
7 years ago
Yibing Liu
6951ef9a55
Fix the gelu backward to avoid nan ( #14857 )
* Fix the gelu backward to avoid nan
test=develop
* Remove unnecessary calls
test=develop
7 years ago
Yu Yang
c00e07cda0
Fix distribute compile
test=develop
7 years ago
Qiao Longfei
3668f07965
Merge pull request #14844 from jacquesqiao/pserver-should-crash
pserver should crash early when it has a problem
7 years ago
sneaxiy
ca84c2ca8f
merge develop
test=develop
7 years ago
sneaxiy
e240ba2918
implement backward
test=develop
7 years ago
sneaxiy
06f8aa5b97
remove while_op support temporarily
test=develop
7 years ago
Yu Yang
81520a24cf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_eigen_tensor
7 years ago
Yu Yang
9bd70a1e04
Change tensor uses proto::VarType::type
test=develop
7 years ago
Yu Yang
eeca721a99
Merge pull request #14842 from reyoung/feature/refine_eigen_tensor
Fix Eigen macro when using GPU
7 years ago
tensor-tang
d538513fce
fix the compile error on mac
7 years ago
tensor-tang
28eb7d840c
test all impls and all inplace cases
7 years ago
Yihua Xu
acc6ae49b1
Fix the issue to run on AVX2 and AVX512F machines ( #14851 )
test=develop
7 years ago
Michal Gallus
92daace55c
MKL-DNN Concat: Fix segfault related to referencing deleted memory primitive
test=develop
7 years ago
tensor-tang
d4cab7d948
use jitkernel in one file
7 years ago
tensor-tang
adc7ba2edd
Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
tensor-tang
900c789a35
use jitcode and use vmul
7 years ago
Qiao Longfei
1870262ba9
pserver should crash early when it has a problem
test=develop
7 years ago
dengkaipeng
a81fabd327
fix doc errors. test=develop
7 years ago
dengkaipeng
cf06e50f1d
add doc for adaptive pool. test=develop
7 years ago
dengkaipeng
266c6856c9
add adaptive pool 2d & 3d. test=develop
7 years ago
dengkaipeng
eab4745965
add adaptive mode for pool.
7 years ago
Yu Yang
7604b1ad51
Fix Eigen macro when using GPU
The macro should be defined by compiler rather than by source.
test=develop
7 years ago
Qiao Longfei
1213e2838f
Merge pull request #14820 from jacquesqiao/fix-split-selected-rows
split selected rows op should always init output selected rows
7 years ago
JiabinYang
c35fdf1581
Merge branch 'add_prefetch_in_nce' of https://github.com/seiriosPlus/Paddle into feature/add_prefech_hs
7 years ago
sneaxiy
7923042365
merge develop
test=develop
7 years ago
tangwei12
59cbf06e2e
fix numel nce and prefetch
test=develop
7 years ago
sneaxiy
8760d23c7d
feature/py_func
7 years ago
tangwei12
33a004a779
fix numel nce and prefetch
7 years ago
zhang wenhui
c4c5f0b8ca
Merge pull request #14771 from frankwhzhang/bpr
add bpr_loss operator
7 years ago
Yancey1989
79082c9459
fix pyreader failed
7 years ago
tensor-tang
53709e7e61
refine names
7 years ago
Qiao Longfei
abf140289f
split selected rows op should always init output selected rows
test=develop
7 years ago
Yancey1989
2dda19f756
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
7 years ago
frankwhzhang
c9a653820b
fix label_pos ,add test_layers.py, test=develop
7 years ago
tangwei12
57557f6774
fix scope in nce and prefetch
7 years ago
frankwhzhang
a672b291e5
fix code style, test=develop
7 years ago
frankwhzhang
ea95f9c335
fix style bug, test=develop
7 years ago
tangwei12
bb2e7f0bbe
add scope in prefetch
7 years ago
gongweibao
f1fb64b17f
Add reduce sparse tensor feature. ( #14757 )
7 years ago
tangwei12
527946df49
add scope in prefetch
7 years ago
Yancey1989
73edf13767
update
7 years ago
Yancey1989
220db4f334
clean code
7 years ago
frankwhzhang
f4cc5881b0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into bpr
7 years ago
frankwhzhang
97de98cd0a
update bpr_loss op code, test=develop
7 years ago
Yihua Xu
3821fc3950
Merge branch 'develop' into develop_4f71a6ee2_conv3d_bias_fusion_mkldnn_impl
test=develop
7 years ago
Yihua Xu
240d974ac5
Clean Code
test=develop
7 years ago
Tao Luo
54fcafb5f6
Merge pull request #14707 from yihuaxu/develop_4f71a6ee2_conv3d_mkldnn_opt
Implement conv3d with mkldnn library
7 years ago
tangwei12
b653ed0516
add prefetch and remove selectedrows of bias
7 years ago
sneaxiy
387bac46b5
refine code
test=develop
7 years ago
Yihua Xu
155328a488
Clean Code
test=develop
7 years ago
Tao Luo
743cb840f1
update with comments
test=develop
7 years ago
Yancey1989
c9de6f1b05
init parallel graph mode
7 years ago
tensor-tang
ce674b685f
add readme doc and complete TODOs
7 years ago
tangwei12
7fa2e821e4
add local scope in nce
7 years ago
Tao Luo
42359e88a4
clean code
test=develop
7 years ago
Tao Luo
923b18877e
Merge branch 'develop' into memory_load
test=develop
7 years ago
Tao Luo
405b2486db
support loading from memory
test=develop
7 years ago
tangwei12
627a6b8bac
add prefetch in nce
7 years ago
frankwhzhang
272f3d3111
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into bpr
7 years ago
tangwei12
4cb0100c8e
add prefetch in nce
7 years ago
frankwhzhang
570d89ec84
add bpr_loss operator , test=develop
7 years ago
Qiao Longfei
05208e1f2b
optimize code
test=develop
7 years ago
qingqing01
549f165b59
Speed conv_fusion_op for identity activation. ( #14744 )
* Refine conv_fusion_op for identity activation.
* Fix unit testing.
test=develop
7 years ago
tensor-tang
fab0ee8757
Merge remote-tracking branch 'ups/develop' into refine/jitkernel
7 years ago
Houjiang Chen
c6b39a0099
Merge pull request #14714 from NHZlX/add_prelu_gpu
add prelu cuda kernel for inference.
7 years ago
tensor-tang
dbe451976b
Merge pull request #14753 from tensor-tang/refine/namespace
remove jit namespace
7 years ago
Jiabin Yang
d9bb55a1f9
Merge pull request #14756 from JiabinYang/fix_hs_op
fix bug in dist train on hs, test=develop
7 years ago
Yihua Xu
65dbc7cca4
Merge branch 'develop' into develop_4f71a6ee2_conv3d_mkldnn_opt
7 years ago
JiabinYang
e05e1d7d88
fix bug in dist train on hs, test=develop
7 years ago
tensor-tang
a1eb21e704
refine names
7 years ago
tensor-tang
b523787f9f
remove jit namespace
test=develop
7 years ago
tensor-tang
191948c933
enable jitcode
7 years ago
tensor-tang
4a93db9288
remove jit namespace
test=develop
7 years ago
Hongyu Liu
8cda28f345
Merge pull request #14733 from phlrain/add_cudnn_5_support
Add cudnn 5 support
7 years ago
Xin Pan
73b4d1aa72
Merge pull request #14742 from panyx0718/infer2
support customized kernel selection
7 years ago
Qiao Longfei
9af76ade4c
fix unused var
7 years ago
Jiabin Yang
21c0f8749e
Merge pull request #14728 from JiabinYang/optimize_hs_op
Optimize hs op
7 years ago
tensor-tang
45bfa70cb8
complete vmul jit kernel
7 years ago
tensor-tang
77236e33fc
init jitkernel
7 years ago
Xin Pan
82d68281c0
follow comments
test=develop
7 years ago
liuhongyu
8b2898e201
fix bug of format; test=develop
7 years ago
Xin Pan
41c28d54c6
allow customize kernel selection
test=develop
7 years ago
liuhongyu
773dc73fbf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_5_support
7 years ago
liuhongyu
8daf67f90f
fix bugs; test=develop
7 years ago
chengduo
04539d4c5d
Fix clip.py ( #14718 )
* expose square
test=develop
* fix activation
test=develop
* Add square API
test=develop
* add necessary op
* code refine
* fix API.spec
test=develop
* fix unit test
test=develop
* add unit test sparse_grad_clip
test=develop
* fix API.spec
test=develop
* remove mac test for test_gradient_clip
test=develop
* remove selectedrows_mul_tensor
test=develop
7 years ago
Michal Gallus
6fdbb365ce
Include MKL-DNN header to concat op only when flag is set
test=develop
7 years ago
Michal Gallus
f2a880421e
Fix style @ concat integration and tests
test=develop
7 years ago
Michal Gallus
738069e491
Refactor MKL-DNN Concat
test=develop
7 years ago
Michal Gallus
208f912512
Implement MKL-DNN Concat
test=develop
7 years ago
liuhongyu
968dd3c078
add cudnn 5 support; test=develop
7 years ago
sneaxiy
e694d0c2e4
fix while_op eager deletion bug
add unittest
test=develop
7 years ago
Qiao Longfei
7b7fe01cae
optimize code
7 years ago
Qiao Longfei
daba57f752
complete ctr_reader
7 years ago
JiabinYang
8c75705984
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize_hs_op
, test=develop
7 years ago
JiabinYang
b387a19410
optimize op with blas
7 years ago
Zeng Jinle
ff4237309a
Merge pull request #14720 from sneaxiy/fix_seq_mask_op_infershape
Fix sequence_mask_op InferShape
7 years ago
Kaipeng Deng
934f13a70a
Merge pull request #14371 from heavengate/yolo_loss
Add YOLOv3 loss operator for YOLOv3 model
7 years ago
sneaxiy
65867d8989
test=develop
7 years ago
sneaxiy
64ad051b9a
merge develop
test=develop
7 years ago
sneaxiy
c47c451a00
fix bug
7 years ago
nhzlx
e7abe6b654
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_prelu_gpu
test=develop
7 years ago
nhzlx
f75815b78c
add prelu gpu inference
7 years ago
Yihua Xu
ea00270fe8
Remove the dims checking when the dim is 3 (test=develop)
7 years ago
jerrywgz
96dc3d8326
Merge pull request #14511 from jerrywgz/ignore_index_for_sigmoid_cross_entropy
add ignore index for sigmoid cross entropy with logits op, test=develop
7 years ago
Yihua Xu
669191c9cc
Implement conv3d with mkldnn library (test=develop)
7 years ago
Hongyu Liu
4f71a6ee2c
Merge pull request #14622 from PaddlePaddle/add_cudnn_lstm
Add cudnn lstm
7 years ago
Yibing Liu
c7382df80f
Print assert failure id in lookup_table_op ( #14698 )
7 years ago
Qiao Longfei
9f53aad13a
add test for read csv data
7 years ago
Qiao Longfei
fbd6f50148
add ReadSvmData
7 years ago
Qiao Longfei
d7c8ebac2e
add datadesc
7 years ago
Qiao Longfei
a05a948d89
update readthread
7 years ago
Qiao Longfei
2cd25794bd
add PlainFileReader
7 years ago
phlrain
4c256ca6be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
phlrain
b65722d3cf
fix unit test; test=develop
7 years ago
Qiao Longfei
7f07dfa1a4
clean code
7 years ago
tangwei12
618f7620e2
add enforce for auc ( #14687 )
* add enforce for AUC, test=develop
7 years ago
phlrain
2770ea1a73
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
chengduozh
3f4aca618f
code refine
test=develop
7 years ago
chengduozh
af8c2cec13
fix operator.cmake
test=develop
7 years ago
chengduozh
679d8fc6fe
rename op name
test=develop
7 years ago
jerrywgz
3df0538940
replace -100 to kIgnoreIndex
7 years ago
Wang Guibao
41e19eb431
AsyncExecutor ( #14627 )
* AsyncExecutor: C++ side
* Google naming conventions
* Rename MultiExecutor to AsyncExecutor
* pybind with async_executor
* Naming convention
* remove some flags and unused code
* add refactored file of async_executor and data_feed
* clear async executor interface and add data feed factory
* split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto
* Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch
* refine async_executor_refactor.cc
* add some files about datafeed
* Revert "add some files about datafeed"
This reverts commit 8ee8133ab841196925a2812b76f18d2812a6701d.
* Interface rework
* add MultiSlotDataFeed
* Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side
* update data_feed for add MultiSlotDataFeed
* update datafeed and async_executor to run bow_net demo
* fix bug that finish_set_filelist failed in multithread
* delete finish_binding_memory_(flag), because it can not be marked under the current interface
* Fix bug
* update async_executor.py for support set_use_slots
* update async_executor.py for support set_use_slots and set set_dense_slots
* fix bug that when the number of files is less than the number of threads, it will fetch nan
* remove redundant code, and make executor exit when set an illegal queue size
* add batch_size check
* add MultiSlotDesc
* Revert "add MultiSlotDesc"
This reverts commit 2e72ebfad364ed6b5dcc75f38ffb2a1fdec83d8e.
* add some checkpoint in DataFeedDesc
* add CheckFile function in MultiSlotDataFeed
* update some error info
* fix deadlock bug
* Fix fetch variable
* Merge error
* fix code style in async_executor
* using one lock blocking queue replace two lock blocking queue because of some bugs
* update code style
* add utest for data_feed
* Fix fetch var
* update utest for data_feed for multithread
* update SetFileList info
* fix bug in utest of data_feed
* Add comments for python
* Add comments for python code
* Fix pybind.cc with new pybind11 version
* add note for DataFeedDesc's set_use_slots function
* Add save_model
* update data_feed_test for multi-type
* add comment for executor_thread_worker
* Remove unused code
* update data_feed_test for generate test data file
* removed unnecessary interfaces and add comments
* c++ style check
* update data_feed.cc
* commit for code style
* commit for code style
* commit for code style
* commit for code style
* Comment away __init__ in async_executor.py
* clang-format fix test=develop
* use PADDLE_THROW instead of exit(-1); use unique_ptr to manage scope var in data_feed_test.cc
* commit for update code style
* commit for update code style
* Add async_executor demo; Remove some methods
test=develop
* commit for update code style
* commit for update code style
* commit for update code style
* update API.spec
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* Fix API.spec
test=develop
* Fix API.spec
test=develop
* Fix windows build error
test=develop
* Fix windows build error
test=develop
* Fix windows build error
test=develop
* Fix windows build error
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix code style
test=develop
* Fix code style
test=develop
* update datafeed
* Fix code style
test=develop
* update data_feed_test for test Tensor test=develop
* Fix code style
test=develop
* Fix windows build failure
test=develop
* Fix code style and windows build failure
test=develop
* Fix PYTHON3.5 build failure
test=develop
* AsyncExecutor API
test=develop
7 years ago
whs
1b9753d109
Make pad2d support for variable paddings. ( #14667 )
* Make pad2d support for variable paddings.
test=develop
* Rename get_paddings and add inline modifier.
test=develop
* Fix comments.
7 years ago
luotao1
bcc90123f0
speedup box_coder_op for multi-threads
test=develop
7 years ago
phlrain
6ce4250172
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
Qiao Longfei
44debca844
Merge pull request #14589 from jacquesqiao/refactor-prefetch
Refactor prefetch
7 years ago
phlrain
bd94ab0ef3
rename op; test=develop
7 years ago
phlrain
92f5be1d82
remove inputvarname in operator; test=develop
7 years ago
phlrain
cf1fe61004
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
phlrain
d1a17cadd4
fix cudnn rnn; test=develop
7 years ago
Tao Luo
20120d9c97
Merge pull request #14608 from jczaja/prv-conv2d-transpose-mkldnn
[MKL-DNN]conv2d transpose
7 years ago
Qiao Longfei
3e45a5a5ec
lookup_table gpu kernel support prefetch
test=develop
7 years ago
qingqing01
731d45a39a
Enable BatchNorm to use global mean and variance during training ( #14630 )
* Enable BatchNorm to use global mean and variance during training
* Update doc and follow comments.
7 years ago
Tao Luo
ea47685f91
Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum
Softmax for inference MKL further changes
7 years ago
Qiao Longfei
3a3cfc2d8d
prefetch support gpu
test=develop
7 years ago
Qiao Longfei
4b9082a4cd
follow comment
7 years ago
chengduo
6776e92846
refine tensor_array_write_read ( #14643 )
test=develop
7 years ago
Jacek Czaja
48e1b97e8e
- Coding style fixes
test=develop
7 years ago
Qiao Longfei
d32de7e6e1
fix code format test=develop
7 years ago
Qiao Longfei
5a660aee7d
update log level in parameter prefetch test=develop
7 years ago
Qiao Longfei
8ebde595c9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
test=develop
7 years ago
Qiao Longfei
b9d3d75fc4
fix prefetch dependency test=develop
7 years ago
Qiao Longfei
145c535750
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
test=develop
7 years ago
minqiyang
9d7c3b18c0
Polish code
test=develop
7 years ago
minqiyang
2b430adaee
Polish code
test=develop
7 years ago
minqiyang
a02ce58f2c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
test=develop
7 years ago
Jiabin Yang
12e1719f96
Merge pull request #14352 from JiabinYang/enhance_hierachical_sigmod_op
Enhance hierarchical sigmoid op
7 years ago
Qiao Longfei
40f68b1349
unit test ready
7 years ago
Qiao Longfei
36e26a53b0
Optimize bilinear tensor product op ( #14485 )
* optimize bilinear_tensor_product
* add set zero to set grad to 0.
7 years ago
Tao Luo
4ec9de0122
Merge pull request #14628 from Sand3r-/mgallus/mkldnn-elementwise_mul
EltwiseMul: Changes from previous PR
7 years ago
Qiao Longfei
35b79ab865
Merge pull request #13983 from jacquesqiao/add-ctr-reader
...
Add ctr reader
7 years ago
Qiao Longfei
da387720d7
fix infer compile test=develop
7 years ago
Jacek Czaja
cf40daee58
- Building fix to softmax for inference
7 years ago
Clementine
6c71c1f8f9
Add activation gelu ( #14569 )
7 years ago
Michal Gallus
9455be0ba5
EltwiseMul: Extract StringToFormat to MKLDNN helper
...
test=develop
7 years ago
Jacek Czaja
1540df51cf
- Fix to test_conv2d_transpose_mkldnn for GPU
...
test=develop
7 years ago
JiabinYang
eda069068d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
JiabinYang
a08dc83eb0
remove arg 'non_leaf_num', test=develop
7 years ago
chengduo
6648f5ed6f
add ShareLoD for dropout_grad ( #14616 )
...
test=develop
7 years ago
JiabinYang
c469334cfb
polish python code and comment, test=develop
7 years ago
Qiao Longfei
92afbb923c
fix compile problem test=develop
7 years ago
Qiao Longfei
97cbec9b74
clean code
7 years ago
Qiao Longfei
1edd435da6
fix ci problem test=develop
7 years ago
JiabinYang
87648f8edf
merge develop, test=develop
7 years ago
wopeizl
db9284ecde
Merge pull request #14617 from wopeizl/windows/online
...
Windows/online
7 years ago
JiabinYang
c3c3c0b33c
polish code, test=develop
7 years ago
Jacek Czaja
8bfa1fa9bb
- ASUM MKL integration
7 years ago
phlrain
487ee36aec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
tangwei12
56a4912b76
Make NCE_OP more efficient and support SelectedRows ( #14469 )
...
* Fix truncated normal.
* Fix.
* Make nce support more distribution.
* Fix API.spec.
* Fix python API.
* Fix.
test=develop
* Fix API.spec
test=develop
* Fix sampler.
* Fix order of arguments in python API.
test=develop
* NCE add selectedrows support
* NCE update weighted sampling
* fix bugs in nce_op, and assign_value_op optimized
* fix bugs in nce_op, revert assign_value_op
* nce_op optimize
* nce_op optimize
* nce_op optimize
* add selectedRows test later
test=develop
* add selectedRows supported
* add selectedRows supported
test=develop
* add selectedRows supported
* add nce selectedRows supported, test=develop
* add nce selectedRows supported
* add nce selectedRows supported, test=develop
* fix height in nce, test=develop
* add ut
* add ut, test=develop
* make AutoGrownIndex inline
test=develop
* fix tiny error, test=develop
7 years ago
liuhongyu
1ffe41d722
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
Qiao Longfei
9589babe12
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
liuhongyu
05917c3c79
add cudnn lstm; test=develop
7 years ago
Qiao Longfei
f35f3fe77a
ctr reader can not be used in windows
...
test=develop
7 years ago
peizhilin
6a85dd3278
Merge remote-tracking branch 'upstream/develop' into windows/build
...
test=develop
7 years ago
peizhilin
38715e6fd0
minor fix
7 years ago
Qiao Longfei
6bef565dac
clean code test=develop
7 years ago
Qiao Longfei
e7d1f524f3
change log level
...
test=develop
7 years ago
JiabinYang
7e4bd695e6
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
Qiao Longfei
fe54adf70c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
JiabinYang
b10df8bcfa
refine code and add none bias ut, test=develop
7 years ago
Kaipeng Deng
251a1bb0f4
Merge pull request #14588 from heavengate/revert_interpolate
...
fix interpolate_op incompatible. test=develop
7 years ago
Qiao Longfei
668ae9083e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
Qiyang Min
30e47bce8b
Merge branch 'develop' into revert_vlog
7 years ago
Qiao Longfei
87e4edd2ea
fix grad_varname in remote prefetch
7 years ago
Qiao Longfei
d98c59fd2c
support none sliced variable
7 years ago
dengkaipeng
bb489d4cc9
add interp_method default bilinear. test=develop
7 years ago
dengkaipeng
78f563917c
revert interpolate_op to bilinear_interp_op & nearest_interp_op. test=develop
7 years ago
Jacek Czaja
fb24690a58
- conv2d transpose MKL-DNN
...
test=develop
- Added new header for MKLDNN reuse functionality
- Extended conv2d_transpose GetExpectedKernelType for MKL-DNN support
- Buildable conv transpose mkldnn and conv mkldnn using conv template
- Conv2d transpose roughly implemented and buildable
- Added modifications conv2d transpose MKLDNN unit tests
- Fix to UT of conv2d transpose mkldnn op
- Wrong type of MKLDNN primitive was chosen for conv2d transpose
- Hacks for conv2d transpose
- UT enabled
- Replaced copying loop with memcpy
- Draft of passing lambda into AcquireMemory
- Made reorder (IOHW->OIHW) to be called only once
7 years ago
tensor-tang
7a91271436
Merge branch 'develop' into fea/jit/rnn
7 years ago
minqiyang
be04d99fe4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
7 years ago
JiabinYang
81e145764d
refine code and comments, test=develop
7 years ago
Qiao Longfei
af2f5fc824
fix some bugs
7 years ago
JiabinYang
2f6b529aff
refine code and comments, test=develop
7 years ago
minqiyang
53433d7f2e
Revert the changes of VLOG
...
test=develop
7 years ago
tensor-tang
1f0291a51e
add comments and follow comments
...
test=develop
7 years ago
tensor-tang
557229bd39
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
Qiao Longfei
ed9fa4b301
can run
7 years ago
peizhilin
30849d1f20
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
qingqing01
6224e61fd9
Transpose-Flatten-Concat fusion operator. ( #14568 )
...
* Transpose-Flatten-Concat fusion operator.
* Add unit testing and fix bug.
7 years ago
Qiao Longfei
686d15c8e0
update grpc_variable_response
7 years ago
tangwei12
3639d99f99
Fix save and load lookup table/optimizer vars ( #14301 )
...
* fix mkdir conflict
* fix load/save lookup tables
test=develop
* add lookup_table_utils
* fix load optimize vars on pserver
* delete lookup table utils
* fix save and load lookup tables
* fix load optimizer var
* fix load optimizer var, test=develop
* fix python 3 style, test=develop
* move lookup_table_utils to contrib utils
7 years ago
peizhilin
36cd18b549
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Qiao Longfei
d827881502
fix pserver and prefetch rpc
7 years ago
Yiqun Liu
bf222f197d
Use sub scope in tensor_array_to_tensor op. ( #14524 )
...
test=develop
7 years ago
JiabinYang
02d68051db
add sparsed bias grad, test=develop
7 years ago
Qiao Longfei
5856c2f332
change Var to FindVar
7 years ago
Qiao Longfei
312b7786d9
clean code
7 years ago
Qiao Longfei
2b6c0c09d6
add unit test
7 years ago
Qiao Longfei
47280ef8b4
lookup table op support prefetch
7 years ago
gongweibao
c1bf9664cd
Add options to disable SO_REUSEPORT of grpc. ( #14269 )
7 years ago
Qiao Longfei
4ad5fd8f54
add parameter prefetch
7 years ago
Qiao Longfei
9d276fe8a8
add parameter prefetch
7 years ago
luotao1
e21edb26f6
add Set/GetCPUNumThreads api
7 years ago
Qiao Longfei
9851a53478
add prefetch part in pserver
7 years ago
JiabinYang
42470f14b7
test=develop
7 years ago
peizhilin
445fff24dc
add the bigobj option to NVCC compile
...
fix code style
7 years ago
qingqing01
36f08eef3b
CUDA kernel for density_prior_box_op. ( #14513 )
...
* CUDA kernel for density_prior_box_op.
* Support flatten to 2D.
7 years ago
tensor-tang
6a7f83d45d
enable gru jitcode and refine act and lstm jitcode
...
test=develop
7 years ago
tensor-tang
686eaf20ba
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
peizhilin
81bd7eeff4
rollback the format
7 years ago
Qiao Longfei
1f87f263a2
clean code
7 years ago
Qiao Longfei
361cb0e078
lookup remote table can compile
7 years ago
JiabinYang
0fca16847c
temp
7 years ago
JiabinYang
e9be3366a9
test=develop
7 years ago
chengduo
00b9e9a135
Refine cublas to support CUBLAS_TENSOR_OP_MATH ( #13929 )
...
* refine cublas
test=develop
* code refine
* refine cublas
* add GEMME_EX
* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop
* fix CublasCall for cuda version
test=develop
* fix error
test=develop
* fix GEMM_EX to be compatible with gcc 4.8
test=develop
* add GEMM_EX
test=develop
* to be compatible with gcc4.8
test=develop
7 years ago
peizhilin
dfbac60398
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
7c8c9dc9bf
fix unit test cases
7 years ago
tensor-tang
0c5ed5f6fc
enable peephole jitcode
...
test=develop
7 years ago
JiabinYang
3c6102a367
test=develop
7 years ago
Qiao Longfei
7c3ce2952d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
7 years ago
Qiao Longfei
60a4f69b3c
add lookup remote table op
7 years ago
Qiao Longfei
e0b48f7e29
init lookup remote table
7 years ago
tensor-tang
e3b61cf52b
init gru jitcode and fix lstm jitcode
...
test=develop
7 years ago
tensor-tang
0f25446574
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
Dun
ae7d22862b
Group Norm ( #13843 )
...
Add group normalization operator.
7 years ago
wopeizl
d9a1f3e58e
Windows/online ( #14474 )
...
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
7 years ago
JiabinYang
57a18e32a1
test=develop
7 years ago
peizhilin
bef475c92b
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Tao Luo
5d4d117edc
Merge pull request #14502 from qingqing01/cudnn5_fix
...
Fix compiling with cuDNN v5
7 years ago
Jiabin Yang
f7b55de9e5
Merge branch 'develop' into enhance_hierachical_sigmod_op
7 years ago
Yu Yang
e68c1fcd5a
Merge pull request #14522 from reyoung/feature/fix_op_header_deps
...
fix(Compile): fix depends error when compile op using cub
7 years ago
tensor-tang
3562051302
add gru refer code and remove redundant avx code
...
test=develop
7 years ago
JiabinYang
af9a3301da
test=develop
7 years ago
Zhaolong Xing
ad349e770f
Merge pull request #14452 from NHZlX/fix_avg_pool_trt_bug
...
fix avg pool trt bug
7 years ago
tensor-tang
f913860873
jitkernel lstm refer support peephole
...
test=develop
7 years ago
tensor-tang
2f9b5f2383
Merge branch 'develop' into fea/jit/rnn
7 years ago
JiabinYang
014e50c284
test=develop
7 years ago
Yu Yang
3edd32d070
fix(Compile): fix depends error when compile op using cub
...
Some operators depend on cub and xxhash through headers. The dependency should be declared explicitly on each operator rather than on pybind.
test=develop
7 years ago
Dang Qingqing
cda60311f9
Fix compiling with cuDNN v5
...
test=develop
7 years ago
peizhilin
67562a6fcd
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
tensor-tang
10fb4ceefc
Merge pull request #14351 from tpatejko/tpatejko/mkldnn-elementwise_mul
...
[MKLDNN][JIT][AVX512] Elementwise Mul
7 years ago
jerrywgz
13e254faed
refine code, test=develop
7 years ago
tensor-tang
b4c826c548
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
...
test=develop
7 years ago
tensor-tang
ce31deb7e9
refine refer code and add lstm refer code
...
test=develop
7 years ago
jerrywgz
79cec53111
add ignore index for sigmoid cross entropy with logits op, test=develop
7 years ago
nhzlx
e62872df8b
fix conflicts
7 years ago
tensor-tang
c2cfb03a72
add lstm jitcode
7 years ago
peizhilin
25adf970b2
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Tao Luo
1d3e9bde1e
Merge pull request #14488 from yihuaxu/develop_7a64d48f5_stack_opt
...
Optimize the stack operator
7 years ago
tensor-tang
7aa3aff338
Merge pull request #14465 from tensor-tang/fea/jit/exp
...
jitcode act support all size
7 years ago
Tao Luo
1b894e495f
Merge pull request #14437 from jczaja/prv-softmax-mkl
...
Introducing MKL to softmax for inference
7 years ago
peizhilin
3a72a634cf
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yihua Xu
a906a361be
Add the macro for NVCC (test=develop)
7 years ago
Yihua Xu
d91740acb1
Revert "Remove the remnant code (test=develop)"
...
This reverts commit be50670348 .
7 years ago
Yihua Xu
be50670348
Remove the remnant code (test=develop)
7 years ago
qingqing01
9eefd2c766
Modify some infer-shape about detection operators in compile-time. ( #14483 )
...
* Modify some infer-shape in compile-time.
7 years ago
Yihua Xu
f4c869d872
Optimize the layer_norm operator with AVX intrinsic function ( #14417 )
...
* Optimize layer_norm operator with AVX intrinsic functions
* Revert the wrong modifications
* Implement the jit kernel for layer_norm operator
* Add math header file to fix the compile issue (test=develop)
* Add math header file to fix the compile issue (test=develop)
* Fixed the intrinsic header file issue (test=develop)
* Fix the conflicts (test=develop)
* Revert for CUDA compiler (test=develop)
* Fixed the cuda dependency (test=develop)
* Fix the macro issues (test=develop)
7 years ago
peizhilin
ee0fd78c81
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yu Yang
f1a392a5fe
Merge pull request #13804 from sneaxiy/rewrite_allocation
...
Rewrite allocation
7 years ago
Yihua Xu
f418f552df
Merge branch 'develop' into develop_7a64d48f5_stack_opt (test=develop)
7 years ago
peizhilin
8443961a4f
add warp_ctc back
7 years ago
qingqing01
fd7e643153
Convolution fusion operator. ( #14449 )
...
* Convolution fusion operator.
* Clean code
test=develop
7 years ago
Yu Yang
98bbfc17be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
7 years ago
peizhilin
4a6769da84
re-organize the cmake file
7 years ago
dengkaipeng
8ef6280c03
Add operator double support. test=develop
7 years ago
peizhilin
1aff40a4c6
exclude warpctc_op on windows
7 years ago
peizhilin
7d51a0e887
disable DSO by default on windows
7 years ago
peizhilin
b967e01cbe
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Wu Yi
d7bd0361cb
fix dist deps ( #14471 )
...
* fix dist deps test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
7 years ago
Jacek Czaja
9b0eae3023
- Removing partial specialization of softmax for inference for GPU
...
test=develop
7 years ago
peizhilin
a3e952f41d
add the jit back
...
fix compile error on windows
7 years ago
tensor-tang
a19b3225a1
fix jitcode small size
...
test=develop
7 years ago
Jacek Czaja
be80bb4f28
- Fix to GPU
...
test=develop
7 years ago
tensor-tang
4dbdfa60ef
sigmoid and tanh support all size
...
test=develop
7 years ago
tensor-tang
ccb8963705
refine exp jitcode with all size
...
test=develop
7 years ago
peizhilin
1cc23ef67d
merge from paddle:develop
7 years ago
tensor-tang
d3eae8f61b
refine relu and fix addrelu test
7 years ago
tensor-tang
4e67fe6a12
refine act and vxx with all size
7 years ago
tensor-tang
ba3eaed7a7
exp support all size
7 years ago
tensor-tang
1ffce8c0ae
fix build error on noavx
...
test=develop
7 years ago
Michal Gallus
c69c41604e
MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults
...
test=develop
7 years ago
Michal Gallus
785066eb8a
MKLDNN elementwise_mul: Check if AVX512 is available
...
test=develop
7 years ago
Michal Gallus
08f63c4d12
MKLDNN elementwise_mul: Lint changes to UT & integration
...
test=develop
7 years ago
Michal Gallus
49b09327f6
MKLDNN elementwise_mul: Reorder on non-nchw input, fallback on non-16 divisible fm
...
test=develop
7 years ago
Michal Gallus
d14858e4ba
MKLDNN elementwise_mul: Parallelize mul
7 years ago
Michal Gallus
ed31936ba1
MKLDNN elementwise_mul: Support NCHW, update UT
7 years ago
Tomasz Patejko
700bcbf74f
MKLDNN elementwise_mul: h and w loops implemented in xbyak
7 years ago
Tomasz Patejko
ad09facafe
MKLDNN elementwise_mul: CPU tests initially refactored. MKLDNN mul test for broadcast added
7 years ago
Tomasz Patejko
2d73ad180a
MKLDNN elementwise_mul: simple xbyak version for AVX512
7 years ago
Tomasz Patejko
213ec37d6a
MKLDNN elementwise_add: simple initial implementation of the operator for MKLDNN format
7 years ago
Wu Yi
a2d9b34417
Refine operator cmake ( #14413 )
...
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
7 years ago
peizhilin
764f97deac
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
8580b7a130
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
tensor-tang
7f17e561d7
Merge pull request #14423 from tensor-tang/fea/jit/act
...
jitcode act relu, exp, sigmoid, tanh
7 years ago
Jiabin Yang
28bd5b7bad
fix space_to_depth_op unicode problem ( #14430 )
...
* fix space_to_depth_op unicode problem
* test=develop
7 years ago
Jacek Czaja
513bb6c151
Squashing MKL based softmax for inference
...
test=develop
- Added profiling to softmax functors
- MKL based softmax inference op
- Fix to softmax computation via MKL
- cleaning
- Cosmetic fixes to softmax MKL
- Fix to ON_INFER lack of propagation
7 years ago