tensor-tang
3eb55f0643
Merge remote-tracking branch 'ups/develop' into refine/op/peephole
7 years ago
tensor-tang
d7ac1cc836
refine seq when bs is large
7 years ago
tensor-tang
9dd5a177a5
refine batch mode and peephole
7 years ago
Qiao Longfei
6e03f7900f
Add centered mode rmsprop ( #13161 )
...
* rmsprop optimizer support v1 mode
* typo
* optimize code
* refine code
* optimize unit test
* update test_rmsprop_op.py
* update formula of rmsprop
* optimize document
* update API.spec for RMSPropOptimizer
* add default value to check_output_with_place equal_nan
7 years ago
Yan Chunwei
9df2d8b5ba
test/add text-classification test ( #13081 )
7 years ago
tensor-tang
f10710b0ca
move seq peephole if out of loop
7 years ago
tensor-tang
2f3b498949
refine fusion seq lstm peephole
7 years ago
tangwei12
d1e2efae6b
reimplement auc in fluid ( #13167 )
...
* reimplement auc in python
* reimplement auc in fluid
* add auc unittest
* replace new auc in layers
* add batch Auc in Fluid
* name formatted
7 years ago
Yu Yang
f57d706aa7
Use double to reduce
7 years ago
tensor-tang
5f586e2223
Merge remote-tracking branch 'ups/develop' into refine/op/fusion_lstm
7 years ago
Brian Liu
04272c0d41
Enable lstm peephole ( #13160 )
...
* Refine fusion lstm op code for better readability
* Enable peephole in fusion lstm op (seq_mode part) and add unit test
* Enable peephole in fused lstm op (batch_mode part)
Set batch_mode as default as well
* Use pre-commit to clean format
* Follow up review comments as well as adding more unit tests for seq mode
7 years ago
fengjiayi
56750e6a3e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_CudnnHolder_bug
7 years ago
Qiao Longfei
cdd14f17f1
fix async mode handle COMPLETE_MESSAGE ( #13212 )
7 years ago
minqiyang
8059445fb5
Fix fake_quantize_op
7 years ago
tensor-tang
78d9ad5712
fusion gru enforce only used
7 years ago
tensor-tang
555083ae2a
enforce only used
7 years ago
fengjiayi
db5e3dd767
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_CudnnHolder_bug
7 years ago
Jiabin Yang
d091dd02a0
fix mac compile error 0903 ( #13184 )
7 years ago
Yu Yang
cda7842e26
Revert "Revert "Add Python Callstacks when Op::Run error ( #12759 )""
...
This reverts commit 1f270275a6.
7 years ago
qingqing01
9557cc218d
Refine and fix some code for faster-rcnn. ( #13135 )
...
* Fix bug in generate_proposals_op.
* Fix data type for RoIs.
* Refine and fix rpn_target_assign_op.
* Add the missing file bbox_util.h
* Rename BoxEncoder to BoxToDelta
7 years ago
fengjiayi
82a1b35b9b
Revert "Revert "Add CudnnHolder and use it in Conv and ConvTranspose op""
...
This reverts commit 151e169eb7.
7 years ago
guochaorong
151e169eb7
Revert "Add CudnnHolder and use it in Conv and ConvTranspose op"
7 years ago
Chen Weihang
3b6090e80b
Merge pull request #12887 from chenwhql/sequence_enumerate_op
...
Feat: add sequence enumerate op
7 years ago
tensor-tang
1cc35f3642
Merge pull request #13118 from tensor-tang/optimize/op/fusion_lstm
...
Optimize fusion lstm batch mode
7 years ago
dzhwinter
6fb28796f5
memory ( #13143 )
7 years ago
dzhwinter
e722f68318
fix windows compile ( #13147 )
7 years ago
dzhwinter
f05520060e
fix style ( #13142 )
7 years ago
dzhwinter
856c26faef
fix elementwise ( #13146 )
7 years ago
fengjiayi
653c8ded7d
Merge pull request #13078 from JiayiFeng/dev_CudnnHolder
...
Add CudnnHolder and use it in Conv and ConvTranspose op
7 years ago
tensor-tang
20659fc905
Merge pull request #13107 from tensor-tang/optimize/op/fusion_gru
...
Optimize fusion gru
7 years ago
tensor-tang
93c034ee51
Merge remote-tracking branch 'ups/develop' into optimize/op/fusion_lstm
7 years ago
tensor-tang
c7adb99ae0
follow comment and refine code
7 years ago
tensor-tang
83f4bc4ecf
follow comment and refine code
7 years ago
tensor-tang
f38905a6e5
Merge remote-tracking branch 'ups/develop' into optimize/op/fusion_gru
7 years ago
tangwei12
fbdd4f8c0f
Merge pull request #13101 from zenghsh3/develop
...
Fix bug of sampling_id op
7 years ago
tensor-tang
9838bacb35
Merge branch 'develop' into optimize/op/fusion_lstm
7 years ago
qingqing01
9bd933d3fb
Improve and fix fake_quantize_op ( #13092 )
...
* Improve and fix fake_quantize_op.
7 years ago
Tao Luo
3fe0575b62
Merge pull request #13148 from dzhwinter/windows/math_compile
...
cuda math port
7 years ago
chenweihang
7ddbbcb0b5
doc: refine API and doc
7 years ago
dzhwinter
34757efb8e
fix windows compile
7 years ago
tensor-tang
c44108803a
refine prelu
7 years ago
chenweihang
b081363bae
Merge branch 'sequence_enumerate_op' of https://github.com/chenwhql/Paddle into sequence_enumerate_op
7 years ago
chenweihang
0b7d82befb
doc: refine English description
7 years ago
dzhwinter
b11332a07b
"fix style" ( #13094 )
7 years ago
dzhwinter
ab1097cd8e
Feature/template ( #13093 )
...
* remove template operator
* "fix compile"
* "fix ci"
* "fix ci"
7 years ago
tensor-tang
80edd7ef29
enable run with fuse pass
7 years ago
fengjiayi
f79ca23115
fix bugs
7 years ago
tensor-tang
a79a77eeb5
refine and clean code
7 years ago
tensor-tang
c459fb5be0
add fusion lstm batch mode
7 years ago
whs
e10aa80f03
Add pad2d op. ( #12950 )
...
* Add pad2d op.
* Add unittest and python api.
* Fix cuda op kernel.
* Fix python api.
* Fix python api.
* Update API.spec.
* Fix python api
7 years ago
tensor-tang
7bdd11d88e
Merge branch 'develop' into optimize/op/fusion_gru
7 years ago
fengjiayi
1f36a4c27c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_CudnnHolder
7 years ago
fengjiayi
b0aca8824d
make CudnnHolder thread safe
7 years ago
tensor-tang
596213906b
add gru seq mode forward
7 years ago
zenghsh3
d7495838b3
refine
7 years ago
zenghsh3
04a05d1d58
merged
7 years ago
zenghsh3
08b73b68c4
fix bug of sampling_id_op
7 years ago
tensor-tang
b0d36c4c3d
add cross vec to speedup gru
7 years ago
tensor-tang
038c16eed2
save intermediate data to out buffer
7 years ago
Xingyuan Bu
0a97d24b41
Faster RCNN Generate Proposal Labels ( #12616 )
...
* Add generate_proposal_labels for Faster-RCNN.
7 years ago
fengjiayi
d5f74b7308
use CudnnHolder in conv_transpose_cudnn_op
7 years ago
fengjiayi
407ff0bdbc
use CudnnHolder in conv_cudnn_op
7 years ago
chengduo
3bd1d22a7d
Enhance fused_elementwise_activation_op ( #12837 )
...
* Enhance the function of fused_elementwise_activation_op
* enhance unit test
* Clean Code And Add Doc
* Add compound functors
* Fix doc and enhance unit test
* define Dx and Dy for d_binary_func
* add mul_scale
* add mul_scale
* add elementwise_mul
* code refine
* code refine
* add doc
* add AsIntermediate
7 years ago
tensor-tang
2d0ddf8c41
refine cpu gru batch mode
7 years ago
tensor-tang
70d3981220
add cpu vec bias sub
7 years ago
jerrywgz
85fe65ae61
modified error info for maxout op
7 years ago
Chen Weihang
b98b744067
Merge branch 'develop' into sequence_enumerate_op
7 years ago
Yan Chunwei
902f19b46a
fea/fuse attention lstm simplify.with fusion lstm.with sequence expand ( #13006 )
7 years ago
Xingyuan Bu
2ad5d91ef8
Faster RCNN Generate Proposals ( #12056 )
...
* Add proposals generation operator for Faster-RCNN.
7 years ago
tensor-tang
89d6d69ce4
Merge pull request #12781 from tensor-tang/feature/op/fusion_gru
...
add fusion gru
7 years ago
tensor-tang
d941192e74
fix gcc53 on cpu vec ( #13020 )
7 years ago
tensor-tang
2328a69157
Merge pull request #13012 from tensor-tang/refine/seq2batch
...
refine seq2batch
7 years ago
Xin Pan
2bb15f437c
Merge pull request #12791 from panyx0718/ir3
...
graph to program pass
7 years ago
Qiao Longfei
a22309afe8
clean useless check code in auc_op ( #13023 )
7 years ago
Yu Yang
8965cee89f
Polish PrintOp ( #12895 )
...
* Polish PrintOp
* Polish PrintOp
* Polish PrintOp
* Refine test_print_op
7 years ago
chengduo
7ad39c4077
Enhance pad_constant_like_op ( #12999 )
...
* enhance pad_constant_like_op
* add API
* add API
7 years ago
qingqing01
0353eddb51
Improve fake_dequantize_op. ( #12877 )
...
* Improve fake_dequantize_op.
* Follow comments.
7 years ago
Qiao Longfei
11e01d9b2d
Scale support selectedrows ( #12960 )
...
* add ScaleOpVarTypeInference for scale op
* scale op support scale selected rows
* optimize code
* use FindVar
* use FindVarRecursive in ScaleOpVarTypeInference
7 years ago
fengjiayi
7b84c580e2
Merge pull request #12824 from JiayiFeng/dev_sequence_padding_op
...
Sequence pad op
7 years ago
tensor-tang
fd4f7c3ab5
refine seq2batch
7 years ago
Wu Yi
0ee6fed05b
Refine dist rpc deps ( #12899 )
...
* refine dist train RPC deps
* clean up
* clean up
* fix ut
* remove input for fetch_barrier
* follow comments
7 years ago
fengjiayi
7e0c9f50ae
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op
7 years ago
Zeng Jinle
599a32641b
Merge pull request #12971 from sneaxiy/unstack_op
...
Add unstack op
7 years ago
Tao Luo
26cac36bfd
Merge pull request #12515 from kbinias/kbinias/bnorm-fwd-reuse
...
Reusing primitives for forward Batch Norm operator
7 years ago
tensor-tang
a481c5e98c
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_expand_concat_fc
7 years ago
tensor-tang
49c31febb5
fix typo and op test
7 years ago
fengjiayi
9cb455fa7d
update function
7 years ago
Krzysztof Binias
fb4b4f8d57
Refactor code
7 years ago
Krzysztof Binias
50d3e6e96b
Reusing primitives for forward Batch Norm operator
7 years ago
Zeng Jinle
ef7bd03a03
Merge pull request #12964 from sneaxiy/fix_concat_sync
...
Fix concat bug
7 years ago
sneaxiy
52a480bb98
Merge develop
7 years ago
tensor-tang
02909335e9
rename fusion seq_concat_fc to fusion seqexpand_concat_fc
7 years ago
Xin Pan
1a67061fee
graph to program pass
...
fix a few other things
7 years ago
qingqing01
1f09bc320c
Support data type int8_t . ( #12841 )
...
* Support int8 type.
7 years ago
chenweihang
00b30b9938
doc: unified infershape format
7 years ago
chenweihang
0c4697f8cd
fix: change to enumerate by sentence
7 years ago
tensor-tang
c45cee0349
refine infershape and forward
7 years ago
sneaxiy
24264bc0b8
Merge develop
7 years ago
dzhwinter
0153c21d83
add unstack_op
7 years ago
tensor-tang
c7c2506733
add forward implementation
7 years ago
jerrywgz
6033c1a278
Add error info & remove data sharing between input and output in rnn_memory_helper_op
7 years ago
chengduo
3e1050a2e8
Add pad_constant_like_op ( #12943 )
...
* Add pad_constant_batch_size_like
* refine pad_op
* optimize memory
7 years ago
dzhwinter
6cc7870517
fix concat synchronization bug
7 years ago
tensor-tang
954b0e113f
init fusion seq expand concat fc op
7 years ago
tensor-tang
c488ee96a7
Merge remote-tracking branch 'ups/develop' into refine/op/fusion_lstm
7 years ago
tensor-tang
e61cf3214d
complete reverse seq
7 years ago
Chen Weihang
4ec12496dd
Merge branch 'develop' into sequence_enumerate_op
7 years ago
tensor-tang
4b28fab8c9
enable more acts
7 years ago
tensor-tang
607c41952e
compute gates
7 years ago
Qiao Longfei
3c58b87b45
fix auc layer and add check for auc op ( #12954 )
...
* fix auc layer and add check for auc op
* use input to check if states are inited
* optimize code
7 years ago
jerrywgz
835573bbf2
add error_info prelu_op
7 years ago
Yibing Liu
c1488b1796
Merge pull request #12940 from sneaxiy/stack_op
...
Speedup stack_op
7 years ago
dzhwinter
eca4563e5d
operators module ( #12938 )
7 years ago
tensor-tang
6be273cbdb
add seq mode lstm
7 years ago
tensor-tang
36363292c3
Merge pull request #12904 from tensor-tang/refine/jit
...
optimize cpu vec activations
7 years ago
jerrywgz
bc7503c85e
modified error_info for maxout_op
7 years ago
Zeng Jinle
d189d4dbab
Merge pull request #12884 from sneaxiy/sequence_mask_op
...
Add sequence_mask_op for DAM model
7 years ago
sneaxiy
3b38e5a4fc
speed up stack_op
7 years ago
tensor-tang
7bdaf09664
Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
Tao Luo
989cc2a4f4
Merge pull request #12913 from luotao1/concat
...
enhance the forward of concat op
7 years ago
Tao Luo
8650f6ffae
Merge pull request #12898 from luotao1/expand
...
remove broadcast in sequence_expand
7 years ago
Qiao Longfei
52948a0b50
Merge pull request #12909 from jacquesqiao/fix-sparse-update-bug
...
fix sparse update bug
7 years ago
tensor-tang
ba943d38e3
make runtime avx act
7 years ago
tensor-tang
3462c29940
refine add bias with avx
7 years ago
tangwei12
ef6445ee39
Merge pull request #12908 from seiriosPlus/fill_constant_selectedrows
...
add SelectedRows support in fill_constant_op
7 years ago
tensor-tang
bb9f98e10d
add inplace test
7 years ago
tensor-tang
f269614bcd
further optimize tanh with avx and mkl
7 years ago
chenweihang
733ea0d29b
adjust infershape details
7 years ago
luotao1
e999c74cff
Merge branch 'develop' into concat
7 years ago
luotao1
b61cf7ac4f
Merge branch 'develop' into expand
7 years ago
luotao1
2b4edacca0
enhance the forward of concat op
7 years ago
Tao Luo
3e3b5f4fda
Merge pull request #12675 from Sand3r-/fix-conv-mkldnn-0.15
...
Update MKLDNN to 0.15, fix convolution integration
7 years ago
tensor-tang
7a4924cd44
further optimize sigmoid with avx and avx512
7 years ago
qiaolongfei
fcf20eed0f
fix sparse update bug
7 years ago
tangwei12
ca22586818
code optimize
...
(cherry picked from commit 587cca7)
7 years ago
Xin Pan
557be6fc58
Merge pull request #12902 from PaddlePaddle/revert-12736
...
Revert "Disable in_place in batch_norm API. (#12736 )"
7 years ago
tensor-tang
6bd89ba5b6
fix typo
7 years ago
Chen Weihang
2969aba14f
Merge branch 'develop' into sequence_enumerate_op
7 years ago
chenweihang
219a2369da
feat: wrap sequence enumerate op
7 years ago
tensor-tang
e3bb98eb38
optimize relu with avx and avx512
7 years ago
guochaorong
1f270275a6
Revert "Add Python Callstacks when Op::Run error ( #12759 )"
...
This reverts commit b2df17003f.
7 years ago
guochaorong
b1fc238694
Revert "Disable in_place in batch_norm API. ( #12736 )"
...
This reverts commit f5d5d7b2d9.
7 years ago
tensor-tang
25976fe736
optimize the sigmoid and tanh
7 years ago
tensor-tang
2eb46c2b06
add cpu vec test
7 years ago
sneaxiy
1083e99520
Merge develop
7 years ago
tensor-tang
f0f06992c1
Merge pull request #12878 from tensor-tang/feature/op/attention_lstm
...
Add attention lstm cpu forward
7 years ago
luotao1
83f4edabe9
remove broadcast in sequence_expand
7 years ago
sneaxiy
5ea7bf88ba
Merge pull request #12872 from sneaxiy/stack_op
...
Add stack_op for DAM model
7 years ago
Tao Luo
ef2da86b4f
Merge pull request #12885 from luotao1/test_ditu_rnn
...
enhance test_analyzer to profile ditu inference demo
7 years ago
sneaxiy
e895c98f0a
add support to max_len is None
7 years ago
fengjiayi
f4a4a4cbd9
add op comment and python layer
7 years ago
tangwei12
acdd95d5ca
bug fix
7 years ago
chenweihang
d2e5395b97
feat: add sequence enumerate op
7 years ago
luotao1
9c7fde45a7
enhance test_analyzer to profile ditu inference demo
7 years ago
chengduo
8ad9055804
Add is_test for while_op ( #12874 )
...
* add is_test for while_op
* Change API
7 years ago
sneaxiy
64464cb1fa
Merge develop
7 years ago
qingqing01
79918a8442
add sequence_mask_op for DAM model
7 years ago
Yu Yang
b2df17003f
Add Python Callstacks when Op::Run error ( #12759 )
...
* Add Python Callstacks when Op::Run error
* Skip op with sub-block
* refactor: refine callstack info's format
* Reshape only support matrix
* Polish Python code
* Fix UT
* Fix Py3
7 years ago
Yu Yang
17fcc4f5d0
Merge pull request #12864 from reyoung/feature/process_lod_grad
...
Feature/process lod grad
7 years ago
tensor-tang
5ca0bb9aad
support more activation type and remove some comments
7 years ago
sneaxiy
ba168bd2d2
modify API.spec
7 years ago
tensor-tang
d9bf73f3ab
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_gru
7 years ago
tensor-tang
dd938d0b94
fix bugs and pass op test
7 years ago
tensor-tang
ec59f0d454
add cpu vec
7 years ago
tensor-tang
cf5ea925c3
fix bugs
7 years ago
tensor-tang
6ed20474d4
refine attention lstm infershape
7 years ago
tensor-tang
508548f897
implement attention lstm cpu forward
7 years ago
tensor-tang
9affc36c89
init attention lstm
7 years ago
tensor-tang
3dd66390b2
add blas vexp
7 years ago
tensor-tang
0ec1f65cf1
fix blas dot and add cblas scal
7 years ago
tensor-tang
a2203d0466
add cblas dot
7 years ago
tensor-tang
f72ab8961e
refine blas gemm
7 years ago
qingqing01
f5d5d7b2d9
Disable in_place in batch_norm API. ( #12736 )
...
* Disable in_place in batch_norm API.
7 years ago
sneaxiy
c73c5ed573
use for_range
7 years ago
Xin Pan
b548ecbc2b
add stack_op
7 years ago
Yu Yang
eb8fd853bc
Fix sequence_softmax_cudnn op
7 years ago
Yu Yang
3768677980
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/process_lod_grad
7 years ago
Yu Yang
2a36ad1a96
Handle LoD for concat & seq_softmax ops
7 years ago
Yu Yang
211d81863d
Process elemwise grad op's lod. mul_op's lod
7 years ago
Yan Chunwei
9ee698e605
enhance/ditu rnn with fc fuse ( #12831 )
...
* make fc fuse work with ditu rnn
* add ditu rnn data download to CMAKE
7 years ago
Xin Pan
78415f326d
Merge pull request #12838 from panyx0718/infer
...
speed up while_op
7 years ago
fengjiayi
ce182d9037
bug fix
7 years ago
Xin Pan
a2c0e52f3e
speed up while_op
7 years ago
tensor-tang
6f78fd7d1e
fuse fc in gru
7 years ago
tensor-tang
300180cc26
init fusion gru op
7 years ago
Zhaolong Xing
21ba32b065
Merge pull request #12843 from NHZlX/fix_ssa_bug_for_trt
...
fix ssa bug with batch_norm and refine the trt
7 years ago
Michał Gallus
cd32ddac12
Fuse Convolution and Eltwise Add into MKLDNN's Conv+Bias ( #12669 )
...
* Fuse Convolution and Eltwise Add into Conv+Bias
* Reduce bias branching at conv_mkldnn_op
* Add MKLDNN build checks for Conv Bias
* Conv-bias: check if bias input exists before assignment
* Conv-bias: Remove Bias dim check from infershape
It was causing conv3d test to crash upon calling HasInput(Bias)
7 years ago
nhzlx
c999895e93
merge develop
7 years ago
nhzlx
276950291a
1. fix ssa bug with batchnorm, 2. refine the trt
7 years ago
Yan Chunwei
896a37b6e3
fea/link ir to inference analysis and fc fuse support ( #12789 )
...
* link IR graph to analysis graph
* add clean code and update
* add infer_clean_pass
* add ir_pass_manager
* support fc fuse execution
* fix ir circle
7 years ago
dzhwinter
e23ddf6ae4
status ( #12764 )
7 years ago
Tao Luo
d04ef276a5
Merge pull request #12745 from tensor-tang/refine/op/elewise_mul
...
Refine elementwise mul cpu forward
7 years ago
tangwei12
cbc6e6eb97
Merge pull request #12247 from seiriosPlus/dis_ckpt_fix
...
add load slice_vars in io.py
7 years ago
Qingsheng Li
3d11d018e0
Fix scatter_op python API ( #12742 )
...
* Fix scatter_op python API and remove inconsistency between implementation and doc
* API spec change
* Change as review comment
7 years ago
Tao Luo
8f9f414a14
Merge pull request #12805 from tensor-tang/fix/op/elewise_add
...
fix SEGV element wise add at debug mode
7 years ago
tensor-tang
e955361267
Merge pull request #12737 from tensor-tang/feature/op/fusion_lstm
...
add fusion lstm
7 years ago
tensor-tang
82bb9170fb
Merge remote-tracking branch 'ups/develop' into fix/op/elewise_add
7 years ago
Chen Weihang
57b34d9196
Merge pull request #12808 from chenwhql/remove_inplace_param_in_squeeze_and_unsqueeze
...
Refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
Yihua Xu
084d4a9e9e
Optimize CRF Decoding with AVX/AVX2/AVX512F instruction ( #12767 )
...
* Optimize CRF decoding with AVX/AVX2 instruction
* Enable the AVX2 flags for compiling
* Clean the code and decrease the count of multiply calculation
* Add the support of AVX512 instruction to optimize CRF Decoding
* Clean the code
* Enable the AVX512f flags for compiling
* Clean the code for the invaluable switch
* Fixed the issue to check AVX512F status
* Clean the code
* Add some explanation of the key points
7 years ago
fengjiayi
34b209cffa
Complete sequence_padding GPU kernel
7 years ago
dzhwinter
00463fdfe3
cudnn windows support ( #12757 )
...
* cudnn windows
* "add comment"
* "windows support"
* "fix cmake error"
7 years ago
qingqing01
c62f68cb94
Fix bug in conditional_block_op. ( #12246 )
...
* Fix bug in conditional_block_op.
* Fix bug and add comments.
* Rename arguments.
7 years ago
chenweihang
bc471b6ac4
refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
tensor-tang
0507f7bc3c
fix SEGV elementwise add at debug mode
7 years ago
tangwei12
ca1e18c04a
Merge pull request #12469 from seiriosPlus/sum_op_dim_fix
...
sum_op selectedRows dim bug fix
7 years ago
Zhaolong Xing
e5674f6dde
Merge pull request #12753 from NHZlX/add_benchmark
...
modify tensorrt engine op from cpu mode to gpu
7 years ago
tensor-tang
b090479409
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
tangwei12
b4f52b01d0
bug fix when all inputs are empty
7 years ago
tangwei12
3efac174ea
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into sum_op_dim_fix
7 years ago
tangwei12
dbb4f0d35d
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into dis_ckpt_fix
7 years ago
Qiao Longfei
fd10669ecb
Add dependency to send recv ( #12760 )
...
Add dependency to send recv
7 years ago
fengjiayi
8d8d48a34f
Complete sequence_pad_op and its CPU kernel. Add unittests
7 years ago
tangwei12
7c12c0f865
add sync in load selectedrows
7 years ago
Michal Gallus
4a7f0698e0
Add consts to new MKLDNN integration
...
Also replace memory types from int64_t to size_t
7 years ago
Michal Gallus
6588d0e039
Update MKLDNN to 0.15, fix conv integration
7 years ago
tangwei12
9f11db4080
add todo in impl
7 years ago
tangwei12
c24a9263ba
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into sum_op_dim_fix
7 years ago
tangwei12
ac9ae97001
code fix
7 years ago
nhzlx
f55e8901c8
merge develop
7 years ago
nhzlx
1600ba86f6
1. change tensorrt op from cpu to gpu
7 years ago
tangwei12
bb9f494740
merge develop
7 years ago
dzhwinter
4069262f0e
Revert ""cherry picked operators changes" ( #12184 )" ( #12747 )
...
This reverts commit bf3c34960f.
7 years ago
Qiao Longfei
653fad08f8
Optimize selected rows for dist lookup table with pthread rwlock ( #12635 )
...
Optimize selected rows for dist lookup table with rwlock
7 years ago
fengjiayi
3c749fae43
update CPU sequence_padding functor
7 years ago
tensor-tang
92890ac258
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
tangwei12
0749c8822d
Merge pull request #12556 from seiriosPlus/samplingIdOp
...
Sampling id op
7 years ago
tensor-tang
a56142c155
optimize elementwise_mul cpu forward
7 years ago
tensor-tang
6644ce79a5
add mklml vmul
7 years ago
tensor-tang
ff92b6ba81
Merge pull request #12531 from tensor-tang/refine/op/gru
...
Refine gru cpu forward
7 years ago
tangwei12
26b228e405
remove assignment and add vlog
7 years ago
tangwei12
125e9166e1
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into sum_op_dim_fix
7 years ago
tensor-tang
a72f68f223
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
tensor-tang
df28a3b452
fix lod and op test
7 years ago
Qingsheng Li
317e18abd2
Remove Data Sharing between input and output in scatter_op ( #12672 )
...
* Remove Data Sharing between input and output in scatter_op
* Removed data sharing in backward op
7 years ago
tensor-tang
f3cd2612ae
refine fc and use the fc compute in fusion_lstm
7 years ago
tangwei12
822496f626
merge cpu and gpu
7 years ago
dzhwinter
bf3c34960f
"cherry picked operators changes" ( #12184 )
...
* "cherry picked operators changes"
* "remove duplicated code"
* "add constant setter"
* "add get expected kernel"
* "fix ci"
* "add fill constant"
7 years ago
tensor-tang
40138c4cd6
add unit test of fusion lstm op
7 years ago
jerrywgz
c108376506
Add three modes for prelu_op ( #12630 )
...
* Add three modes for prelu_op.
7 years ago
tangwei12
9f09d68678
add enforce
7 years ago
gongweibao
d06849305a
parameter dispatcher. ( #12666 )
7 years ago
tensor-tang
852bc6f4aa
refine fusion lstm op doc
7 years ago
tensor-tang
8f9132959e
fuse fc in lstm
7 years ago
tensor-tang
ddb05dffb6
init fusion lstm op
7 years ago
tensor-tang
efc5392d97
Merge pull request #12676 from tensor-tang/refine/op/fc
...
refine fc op
7 years ago
tangwei12
470fb7c5c3
bug fix
7 years ago
tangwei12
60dda7bf9f
add gpu Implementation
7 years ago
tangwei12
4661f5589d
random optimize
7 years ago
Bai Yifan
9333a62792
Add flatten op interface and enhance APIs about detection to support variable-length image. ( #12422 )
...
* add flatten api&enhance detection api
* unify shape_op data type
* update API.spec
7 years ago
tensor-tang
eee38464dc
refine fc op use cpu only
7 years ago