Yu Yang
17fcc4f5d0
Merge pull request #12864 from reyoung/feature/process_lod_grad
Feature/process lod grad
7 years ago
Xin Pan
698c926ce5
copy program and fix op_desc
7 years ago
minqiyang
8b8f6487d9
Add debug info for fetch feed
7 years ago
tensor-tang
4e538db14d
refine jit space
7 years ago
tensor-tang
5ca0bb9aad
support more activation type and remove some comments
7 years ago
sneaxiy
ba168bd2d2
modify API.spec
7 years ago
tensor-tang
dd938d0b94
fix bugs and pass op test
7 years ago
tensor-tang
ec59f0d454
add cpu vec
7 years ago
tensor-tang
cf5ea925c3
fix bugs
7 years ago
tensor-tang
6ed20474d4
refine attention lstm infershape
7 years ago
tensor-tang
508548f897
implement attention lstm cpu forward
7 years ago
tensor-tang
9affc36c89
init attention lstm
7 years ago
tensor-tang
3dd66390b2
add blas vexp
7 years ago
tensor-tang
0ec1f65cf1
fix blas dot and add cblas scal
7 years ago
tensor-tang
a2203d0466
add cblas dot
7 years ago
tensor-tang
f72ab8961e
refine blas gemm
7 years ago
qingqing01
f5d5d7b2d9
Disable in_place in batch_norm API. ( #12736 )
* Disable in_place in batch_norm API.
7 years ago
sneaxiy
c73c5ed573
use for_range
7 years ago
Xin Pan
b548ecbc2b
add stack_op
7 years ago
Yu Yang
eb8fd853bc
Fix sequence_softmax_cudnn op
7 years ago
Yu Yang
3768677980
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/process_lod_grad
7 years ago
Tao Luo
decda738b0
fea/anakin compile with demo ( #12772 )
* anakin support x86
* fix code style
* add anakin ditu cnn demo
* add timer
* add rnn
* fix inference_anakin_cnn/rnn_test compile error
* make anakin_rnn_tester run
* add anakin_enable_op_time option
* update api/CMakeLists.txt
* enlarge the max_batch_size in anakin.config
* update with comments
7 years ago
Yu Yang
2a36ad1a96
Handle LoD for concat & seq_softmax ops
7 years ago
Yu Yang
211d81863d
Process elemwise grad op's lod. mul_op's lod
7 years ago
Yan Chunwei
9ee698e605
enhance/ditu rnn with fc fuse ( #12831 )
* make fc fuse work with ditu rnn
* add ditu rnn data download to CMAKE
7 years ago
Xin Pan
78415f326d
Merge pull request #12838 from panyx0718/infer
speed up while_op
7 years ago
Xin Pan
a2c0e52f3e
speed up while_op
7 years ago
typhoonzero
dd7a79158b
add scope info in graphviz debug
7 years ago
Zhaolong Xing
21ba32b065
Merge pull request #12843 from NHZlX/fix_ssa_bug_for_trt
fix ssa bug with batch_norm and refine the trt
7 years ago
Michał Gallus
cd32ddac12
Fuse Convolution and Eltwise Add into MKLDNN's Conv+Bias ( #12669 )
* Fuse Convolution and Eltwise Add into Conv+Bias
* Reduce bias branching at conv_mkldnn_op
* Add MKLDNN build checks for Conv Bias
* Conv-bias: check if bias input exists before assignment
* Conv-bias: Remove Bias dim check from infershape
It was causing conv3d test to crash upon calling HasInput(Bias)
7 years ago
nhzlx
c999895e93
merge develop
7 years ago
nhzlx
276950291a
1. fix ssa bug with batchnorm, 2. refine the trt
7 years ago
Yan Chunwei
896a37b6e3
fea/link ir to inference analysis and fc fuse support ( #12789 )
* link IR graph to analysis graph
* add clean code and update
* add infer_clean_pass
* add ir_pass_manager
* support fc fuse execution
* fix ir circle
7 years ago
dzhwinter
e23ddf6ae4
status ( #12764 )
7 years ago
Tao Luo
d04ef276a5
Merge pull request #12745 from tensor-tang/refine/op/elewise_mul
Refine elementwise mul cpu forward
7 years ago
tangwei12
cbc6e6eb97
Merge pull request #12247 from seiriosPlus/dis_ckpt_fix
add load slice_vars in io.py
7 years ago
Qiyang Min
72965226e6
Merge pull request #12818 from velconia/fix_python3_CI_job
Fix python3 CI job
7 years ago
minqiyang
656c77e712
Resume cicheck
7 years ago
minqiyang
e1492f19e1
Change the sequence of ci check
7 years ago
tangwei12
44bade8b17
fix api spec
7 years ago
Zhaolong Xing
470335e8c4
Merge pull request #12786 from NHZlX/add_batch_norm_trt_converter
Add batch norm trt converter
7 years ago
Qingsheng Li
3d11d018e0
Fix scatter_op python API ( #12742 )
* Fix scatter_op python API and remove inconsistency between implementation and doc
* API spec change
* Change as review comment
7 years ago
nhzlx
ff052c0e6f
merge develop
7 years ago
nhzlx
c6a5c4b0c0
add comments for execute in ut_helper
7 years ago
minqiyang
50d66a0790
Fix prelu_op
7 years ago
minqiyang
beb93bb901
Fix ut bug for graph_test
Port dist_transpiler new added codes
Port ut for clone desc
7 years ago
Tao Luo
8f9f414a14
Merge pull request #12805 from tensor-tang/fix/op/elewise_add
fix SEGV element wise add at debug mode
7 years ago
tensor-tang
e955361267
Merge pull request #12737 from tensor-tang/feature/op/fusion_lstm
add fusion lstm
7 years ago
tensor-tang
82bb9170fb
Merge remote-tracking branch 'ups/develop' into fix/op/elewise_add
7 years ago
tangwei12
99f74be561
Merge pull request #12802 from seiriosPlus/inference_teeny_mistakes
fix some teeny mistakes
7 years ago
Tao Luo
2ae885e224
Merge pull request #12811 from luotao1/tensorrt_compiler_bug
fix tensorrt compiler bug
7 years ago
Chen Weihang
57b34d9196
Merge pull request #12808 from chenwhql/remove_inplace_param_in_squeeze_and_unsqueeze
Refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
Xin Pan
daf464af68
Merge pull request #12807 from panyx0718/fix
fix program_desc constructor
7 years ago
luotao1
808e5b1748
fix tensorrt compiler bug
7 years ago
Yihua Xu
084d4a9e9e
Optimize CRF Decoding with AVX/AVX2/AVX512F instruction ( #12767 )
* Optimize CRF decoding with AVX/AVX2 instruction
* Enable the AVX2 flags for compiling
* Clean the code and decrease the count of multiply calculation
* Add the support of AVX512 instruction to optimize CRF Decoding
* Clean the code
* Enable the AVX512f flags for compiling
* Clean the code for the invaluable switch
* Fixed the issue to check AVX512F status
* Clean the code
* Add some explanation of the key points
7 years ago
dzhwinter
00463fdfe3
cudnn windows support ( #12757 )
* cudnn windows
* "add comment"
* "windows support"
* "fix cmake error"
7 years ago
Xin Pan
4a4c469f61
add test
7 years ago
qingqing01
c62f68cb94
Fix bug in conditional_block_op. ( #12246 )
* Fix bug in conditional_block_op.
* Fix bug and add comments.
* Rename arguments.
7 years ago
nhzlx
1bf9d9e90c
fix comments
7 years ago
chenweihang
bc471b6ac4
refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
Xin Pan
7473d5f735
fix program_desc constructor
7 years ago
tensor-tang
0507f7bc3c
fix SEGV elementwise add at debug mode
7 years ago
tangwei12
cfb12f09bf
fix some teeny mistakes
7 years ago
Yu Yang
c6af7201e9
Merge pull request #12692 from reyoung/feature/fast_executor
Feature/fast executor
7 years ago
Xin Pan
e525aa232e
Merge pull request #12780 from panyx0718/ir4
fix ProgramToGraph
7 years ago
Tao Luo
7decbaaa13
Merge pull request #12762 from luotao1/anakin_cuda_env
disable anakin when cuda < 8.0 or cudnn < 7.0
7 years ago
nhzlx
324dd16816
merge develop
7 years ago
yuyang18
b8029fd650
Follow comments
7 years ago
tangwei12
ca1e18c04a
Merge pull request #12469 from seiriosPlus/sum_op_dim_fix
sum_op selectedRows dim bug fix
7 years ago
Xin Pan
1d3343240e
fix
7 years ago
nhzlx
144b20c160
add batch norm op converter
7 years ago
nhzlx
14311bb094
merge develop
7 years ago
Zhaolong Xing
e5674f6dde
Merge pull request #12753 from NHZlX/add_benchmark
modify tensorrt engine op from cpu mode to gpu
7 years ago
Zhaolong Xing
310708726b
Merge pull request #12761 from NHZlX/global_pooling_trt
Add support for global pooling for trt
7 years ago
tensor-tang
b090479409
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
nhzlx
1e92baf746
fix comments
7 years ago
Xin Pan
17b88811e0
fix ProgramToGraph
when while_grad, it writes multiple @EMPTY@ with no VarDesc.
7 years ago
tangwei12
b4f52b01d0
bug fix when all inputs are empty
7 years ago
tangwei12
3efac174ea
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into sum_op_dim_fix
7 years ago
tangwei12
dbb4f0d35d
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into dis_ckpt_fix
7 years ago
Qiao Longfei
fd10669ecb
Add dependency to send recv ( #12760 )
Add dependency to send recv
7 years ago
nhzlx
ce7f361a80
fix comments
7 years ago
Xin Pan
a9217031ba
small fix
7 years ago
nhzlx
df9cbabcee
add pool2d test for global_pooling true
7 years ago
dzhwinter
2673798ddb
"fix float16 ShuffleDownSync Bug" ( #12756 )
* "fix bug"
* "add test case"
7 years ago
Yan Chunwei
6fe5547db7
switch NodeAttr to boost::variant ( #12539 )
7 years ago
Chen Weihang
535a6e9206
Merge pull request #12509 from JiabinYang/scripts0802
fix the paddle script that causes 'command not found' error
7 years ago
nhzlx
133ec69625
add batch norm trt converter
7 years ago
tangwei12
7c12c0f865
add sync in load selectedrows
7 years ago
luotao1
413bf9d494
disable anakin when cuda < 8.0 or cudnn < 7.0
7 years ago
Michal Gallus
4a7f0698e0
Add consts to new MKLDNN integration
Also replace memory types from int64_t to size_t
7 years ago
Michal Gallus
6588d0e039
Update MKLDNN to 0.15, fix conv integration
7 years ago
tangwei12
9f11db4080
add todo in impl
7 years ago
tangwei12
40febec402
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into dis_ckpt_fix
7 years ago
tangwei12
c24a9263ba
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into sum_op_dim_fix
7 years ago
Qiao Longfei
03d4c7efd3
add rw lock test ( #12752 )
* add rw lock test
* optimize read_write and write_read test
7 years ago
dzhwinter
f36818d532
"windows testing easier" ( #12739 )
7 years ago
nhzlx
2bdd20be22
add support for global pooling for trt
7 years ago
tangwei12
ac9ae97001
code fix
7 years ago
nhzlx
f55e8901c8
merge develop
7 years ago