sneaxiy
e240ba2918
implement backward
...
test=develop
7 years ago
sneaxiy
06f8aa5b97
remove while_op support temporarily
...
test=develop
7 years ago
Yu Yang
81520a24cf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_eigen_tensor
7 years ago
Yu Yang
9bd70a1e04
Change tensor uses proto::VarType::type
...
test=develop
7 years ago
Yu Yang
eeca721a99
Merge pull request #14842 from reyoung/feature/refine_eigen_tensor
...
Fix Eigen macro when using GPU
7 years ago
tensor-tang
d538513fce
fix the compile error on mac
7 years ago
tensor-tang
28eb7d840c
test all impls and all inplace cases
7 years ago
Yihua Xu
acc6ae49b1
Fix the issue to run on AVX2 and AVX512F machines ( #14851 )
...
test=develop
7 years ago
Michal Gallus
92daace55c
MKL-DNN Concat: Fix segfault related to referencing deleter memory primitive
...
test=develop
7 years ago
tensor-tang
d4cab7d948
use jitkernel in one file
7 years ago
tensor-tang
adc7ba2edd
Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
tensor-tang
900c789a35
use jitcode and use vmul
7 years ago
Qiao Longfei
1870262ba9
pserver should crash early whe has problem
...
test=develop
7 years ago
dengkaipeng
a81fabd327
fix doc errors. test=develop
7 years ago
dengkaipeng
cf06e50f1d
add doc for adaptive pool. test=develop
7 years ago
dengkaipeng
266c6856c9
add adaptive pool 2d & 3d. test=develop
7 years ago
dengkaipeng
eab4745965
add adaptive mode for pool.
7 years ago
Yu Yang
7604b1ad51
Fix Eigen macro when using GPU
...
The macro should be defined by compiler rather than by source.
test=develop
7 years ago
Qiao Longfei
1213e2838f
Merge pull request #14820 from jacquesqiao/fix-split-selected-rows
...
split selected rows op should always init output selected rows
7 years ago
JiabinYang
c35fdf1581
Merge branch 'add_prefetch_in_nce' of https://github.com/seiriosPlus/Paddle into feature/add_prefech_hs
7 years ago
sneaxiy
7923042365
merge develop
...
test=develop
7 years ago
tangwei12
59cbf06e2e
fix numel nce and prefetch
...
test=develop
7 years ago
sneaxiy
8760d23c7d
featue/py_func
7 years ago
tangwei12
33a004a779
fix numel nce and prefetch
7 years ago
zhang wenhui
c4c5f0b8ca
Merge pull request #14771 from frankwhzhang/bpr
...
add bpr_loss operator
7 years ago
Yancey1989
79082c9459
fix pyreader failed
7 years ago
tensor-tang
53709e7e61
refine names
7 years ago
Qiao Longfei
abf140289f
split selected rows op should always init output selected rows
...
test=develop
7 years ago
Yancey1989
2dda19f756
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
7 years ago
frankwhzhang
c9a653820b
fix label_pos ,add test_layers.py, test=develop
7 years ago
tangwei12
57557f6774
fix scope in nce and prefetch
7 years ago
frankwhzhang
a672b291e5
fix code style, test=develop
7 years ago
frankwhzhang
ea95f9c335
fix style bug, test=develop
7 years ago
tangwei12
bb2e7f0bbe
add scope in prefetch
7 years ago
gongweibao
f1fb64b17f
Add reduce sparse tensor feature. ( #14757 )
7 years ago
tangwei12
527946df49
add scope in prefetch
7 years ago
Yancey1989
73edf13767
update
7 years ago
Yancey1989
220db4f334
clean code
7 years ago
frankwhzhang
f4cc5881b0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into bpr
7 years ago
frankwhzhang
97de98cd0a
update bpr_loss op code, test=develop
7 years ago
Yihua Xu
3821fc3950
Merge branch 'develop' into develop_4f71a6ee2_conv3d_bias_fusion_mkldnn_impl
...
test=develop
7 years ago
Yihua Xu
240d974ac5
Clean Code
...
test=develop
7 years ago
Tao Luo
54fcafb5f6
Merge pull request #14707 from yihuaxu/develop_4f71a6ee2_conv3d_mkldnn_opt
...
Implement conv3d with mkldnn library
7 years ago
tangwei12
b653ed0516
add prefetch and remvoe selectedrows of bias
7 years ago
sneaxiy
387bac46b5
refine code
...
test=develop
7 years ago
Yihua Xu
155328a488
Clean Code
...
test=develop
7 years ago
Tao Luo
743cb840f1
update with comments
...
test=develop
7 years ago
Yancey1989
c9de6f1b05
init parallel graph mode
7 years ago
tensor-tang
ce674b685f
add readme doc and complete TODOs
7 years ago
tangwei12
7fa2e821e4
add local scope in nce
7 years ago
Tao Luo
42359e88a4
clean code
...
test=develop
7 years ago
Tao Luo
923b18877e
Merge branch 'develop' into memory_load
...
test=develop
7 years ago
Tao Luo
405b2486db
support loading from memory
...
test=develop
7 years ago
tangwei12
627a6b8bac
add prefetch in nce
7 years ago
frankwhzhang
272f3d3111
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into bpr
7 years ago
tangwei12
4cb0100c8e
add prefetch in nce
7 years ago
frankwhzhang
570d89ec84
add bpr_loss operator , test=develop
7 years ago
qingqing01
549f165b59
Speed conv_fusion_op for identity activation. ( #14744 )
...
* Refine conv_fusion_op for identity activation.
* Fix unit testing.
test=develop
7 years ago
tensor-tang
fab0ee8757
Merge remote-tracking branch 'ups/develop' into refine/jitkernel
7 years ago
Houjiang Chen
c6b39a0099
Merge pull request #14714 from NHZlX/add_prelu_gpu
...
add prelu cuda kernel for inference.
7 years ago
tensor-tang
dbe451976b
Merge pull request #14753 from tensor-tang/refine/namespace
...
remove jit namespace
7 years ago
Jiabin Yang
d9bb55a1f9
Merge pull request #14756 from JiabinYang/fix_hs_op
...
fix bug in dist train on hs, test=develop
7 years ago
Yihua Xu
65dbc7cca4
Merge branch 'develop' into develop_4f71a6ee2_conv3d_mkldnn_opt
7 years ago
JiabinYang
e05e1d7d88
fix bug in dist train on hs, test=develop
7 years ago
tensor-tang
a1eb21e704
refine names
7 years ago
tensor-tang
b523787f9f
remove jit namespace
...
test=develop
7 years ago
tensor-tang
191948c933
enable jitcode
7 years ago
tensor-tang
4a93db9288
remove jit namespace
...
test=develop
7 years ago
Hongyu Liu
8cda28f345
Merge pull request #14733 from phlrain/add_cudnn_5_support
...
Add cudnn 5 support
7 years ago
Xin Pan
73b4d1aa72
Merge pull request #14742 from panyx0718/infer2
...
support customized kernel selection
7 years ago
Jiabin Yang
21c0f8749e
Merge pull request #14728 from JiabinYang/optimize_hs_op
...
Optimize hs op
7 years ago
tensor-tang
45bfa70cb8
complete vmul jit kernel
7 years ago
tensor-tang
77236e33fc
init jitkernel
7 years ago
Xin Pan
82d68281c0
follow comments
...
test=develop
7 years ago
liuhongyu
8b2898e201
fix bug of formate; test=develop
7 years ago
Xin Pan
41c28d54c6
allow customize kernel selection
...
test=develop
7 years ago
liuhongyu
773dc73fbf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_5_support
7 years ago
liuhongyu
8daf67f90f
fix bugs; test=develop
7 years ago
chengduo
04539d4c5d
Fix clip.py ( #14718 )
...
* expose square
test=develop
* fix activation
test=develop
* Add square API
test=develop
* add necessary op
* code refine
* fix API.spec
test=develop
* fix unit test
test=develop
* add unit test sparse_grad_clip
test=develop
* fix API.spec
test=develop
* remove mac test for test_gradient_clip
test=develop
* remove selectedrows_mul_tensor
test=develop
7 years ago
Michal Gallus
6fdbb365ce
Include MKL-DNN header to concat op only when flag is set
...
test=develop
7 years ago
Michal Gallus
f2a880421e
Fix style @ concat integration and tests
...
test=develop
7 years ago
Michal Gallus
738069e491
Refactor MKL-DNN Concat
...
test=develop
7 years ago
Michal Gallus
208f912512
Implement MKL-DNN Concat
...
test=develop
7 years ago
liuhongyu
968dd3c078
add cudnn 5 support; test=develop
7 years ago
sneaxiy
e694d0c2e4
fix while_op eager deletion bug
...
add unittest
test=develop
7 years ago
JiabinYang
8c75705984
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize_hs_op
...
, test=develop
7 years ago
JiabinYang
b387a19410
optimize op with blas
7 years ago
Zeng Jinle
ff4237309a
Merge pull request #14720 from sneaxiy/fix_seq_mask_op_infershape
...
Fix sequence_mask_op InferShape
7 years ago
Kaipeng Deng
934f13a70a
Merge pull request #14371 from heavengate/yolo_loss
...
Add YOLOv3 loss operator for YOLOv3 model
7 years ago
sneaxiy
65867d8989
test=develop
7 years ago
sneaxiy
64ad051b9a
merge develop
...
test=develop
7 years ago
sneaxiy
c47c451a00
fix bug
7 years ago
nhzlx
e7abe6b654
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_prelu_gpu
...
test=develop
7 years ago
nhzlx
f75815b78c
add prelu gpu inference
7 years ago
Yihua Xu
ea00270fe8
Remove the dims checking when the dim is 3 (test=develop)
7 years ago
jerrywgz
96dc3d8326
Merge pull request #14511 from jerrywgz/ignore_index_for_sigmoid_cross_entropy
...
add ignore index for sigmoid cross entropy with logits op, test=develop
7 years ago
Yihua Xu
669191c9cc
Implement conv3d with mkldnn library (test=develop)
7 years ago
Hongyu Liu
4f71a6ee2c
Merge pull request #14622 from PaddlePaddle/add_cudnn_lstm
...
Add cudnn lstm
7 years ago
Yibing Liu
c7382df80f
Print assert failure id in lookup_table_op ( #14698 )
7 years ago
phlrain
4c256ca6be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
phlrain
b65722d3cf
fix uni test; test=develop
7 years ago
tangwei12
618f7620e2
add enforce for auc ( #14687 )
...
* add enforce for AUC, test=develop
7 years ago
phlrain
2770ea1a73
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
chengduozh
3f4aca618f
code refine
...
test=develop
7 years ago
chengduozh
af8c2cec13
fix operator.cmake
...
test=develop
7 years ago
chengduozh
679d8fc6fe
rename op name
...
test=develop
7 years ago
jerrywgz
3df0538940
replace -100 to kIgnoreIndex
7 years ago
Wang Guibao
41e19eb431
AsyncExecutor ( #14627 )
...
* AsyncExecutor: C++ side
* Google naming conventions
* Rename MultiExecutor to AsyncExecutor
* pybind with async_executor
* Naming convention
* remove some flags and unused code
* add refactored file of async_executor and data_feed
* clear async executor interface and add data feed factory
* split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto
* Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch
* refine async_executor_refactor.cc
* add some files about datafeed
* Revert "add some files about datafeed"
This reverts commit 8ee8133ab841196925a2812b76f18d2812a6701d.
* Interface rework
* add MultiSlotDataFeed
* Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side
* update data_feed for add MultiSlotDataFeed
* update datafeed and async_executor to run bow_net demo
* fix bug that finish_set_filelist failed in multithread
* delete finish_binding_memory_(flag), because it can not be marked under the current interface
* Fix bug
* update async_executor.py for support set_use_slots
* update async_executor.py for support set_use_slots and set set_dense_slots
* fix bug that when the number of files is less than the number of threads, it will fetch nan
* remove redundant code, and make executor exit when set a illegal queue size
* add batch_size check
* add MultiSlotDesc
* Revert "add MultiSlotDesc"
This reverts commit 2e72ebfad364ed6b5dcc75f38ffb2a1fdec83d8e.
* add some checkpoint in DataFeedDesc
* add CheckFile function in MultiSlotDataFeed
* update something error info
* fix deaded lock bug
* Fix fetch variable
* Merge error
* fix code style in async_executor
* using one lock blocking queue replace two lock blocking queue because of some bugs
* update code style
* add utest for data_feed
* Fix fetch var
* update utest for data_feed for multithread
* update SetFileList info
* fix bug in utest of data_feed
* Add comments for python
* Add comments for python code
* Fix pybind.cc with new pybind11 version
* add note for DataFeedDesc's set_use_slots function
* Add save_model
* update data_feed_test for multi-type
* add comment for executor_thread_worker
* Remove unused code
* update data_feed_test for generate test data file
* removed unnecessary interfaces and add comments
* c++ style check
* update data_feed.cc
* AsyncExecutor: C++ side
Google naming conventions
Rename MultiExecutor to AsyncExecutor
pybind with async_executor
Naming convention
remove some flags and unused code
add refactored file of async_executor and data_feed
clear async executor interface and add data feed factory
split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto
Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch
refine async_executor_refactor.cc
add some files about datafeed
Revert "add some files about datafeed"
This reverts commit 8ee8133ab841196925a2812b76f18d2812a6701d.
add MultiSlotDataFeed
Interface rework
Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side
update datafeed and async_executor to run bow_net demo
update async_executor.py for support set_use_slots
Fix bug
update async_executor.py for support set_use_slots and set set_dense_slots
fix bug that when the number of files is less than the number of threads, it will fetch nan
remove redundant code, and make executor exit when set a illegal queue size
add MultiSlotDesc
Revert "add MultiSlotDesc"
This reverts commit 2e72ebfad364ed6b5dcc75f38ffb2a1fdec83d8e.
add some checkpoint in DataFeedDesc
Fix fetch variable
fix code style in async_executor
Fix fetch var
add utest for data_feed
Add comments for python
update utest for data_feed for multithread
fix bug in utest of data_feed
Add comments for python code
Fix pybind.cc with new pybind11 version
add note for DataFeedDesc's set_use_slots function
update data_feed_test for multi-type
Add save_model
update data_feed_test for generate test data file
removed unnecessary interfaces and add comments
add comment for executor_thread_worker
Remove unused code
update data_feed.cc
c++ style check
* commit for code style
* commit for code style
* commit for code style
* commit for code style
* Comment away __init__ in async_executor.py
* clang-format fix test=develop
* use PADDLE_THROW instead of exit(-1); use unique_ptr to manage scope var in data_feed_test.cc
* commit for update code style
* commit for update code style
* Add async_executor demo; Remove some methods
test=develop
* commit for update code style
* commit for update code style
* commit for update code style
* update API.spec
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* AsyncExecutor
test=develop
* Fix API.spec
test=develop
* Fix API.spec
test=develop
* Fix windows build error
test=develop
* FIx windows build error
test=develop
* FIx windows build error
test=develop
* FIx windows build error
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix Windows Build
test=develop
* Fix code style
test=develop
* Fix code style
test=develop
* update datafeed
* Fix code style
test=develop
* update data_feed_test for test Tensor test=develop
* Fix code style
test=develop
* Fix windows build failure
test=develop
* Fix code style and windows build failure
test=develop
* Fix PYTHON3.5 build failure
test=develop
* AsyncExecutor API
test=develop
7 years ago
whs
1b9753d109
Make pad2d support for variable paddings. ( #14667 )
...
* Make pad2d support for variable paddings.
test=develop
* Rename get_paddings and add inline modifier.
test=develop
* Fix comments.
7 years ago
luotao1
bcc90123f0
speedup box_coder_op for multi-threads
...
test=develop
7 years ago
phlrain
6ce4250172
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
Qiao Longfei
44debca844
Merge pull request #14589 from jacquesqiao/refactor-prefetch
...
Refactor prefetch
7 years ago
phlrain
bd94ab0ef3
rename op; test=develop
7 years ago
phlrain
92f5be1d82
remove inputvarname in operator; test=develop
7 years ago
phlrain
cf1fe61004
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
phlrain
d1a17cadd4
fix cudnn rnn; test=develop
7 years ago
Tao Luo
20120d9c97
Merge pull request #14608 from jczaja/prv-conv2d-transpose-mkldnn
...
[MKL-DNN]conv2d transpose
7 years ago
Qiao Longfei
3e45a5a5ec
lookup_table gpu kernel support prefetch
...
test=develop
7 years ago
qingqing01
731d45a39a
Enable BatchNorm to use global mean and variane during training ( #14630 )
...
* Enable BatchNorm to use global mean and variane during training
* Update doc and follow comments.
7 years ago
Tao Luo
ea47685f91
Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum
...
Softmax for inference MKL further changes
7 years ago
Qiao Longfei
3a3cfc2d8d
prefetch support gpu
...
test=develop
7 years ago
Qiao Longfei
4b9082a4cd
follow comment
7 years ago
chengduo
6776e92846
refine tensor_array_write_read ( #14643 )
...
test=develop
7 years ago
Jacek Czaja
48e1b97e8e
- Coding style fixes
...
test=develop
7 years ago
Qiao Longfei
d32de7e6e1
fix code format test=develop
7 years ago
Qiao Longfei
5a660aee7d
update log level in parameter prefetch test=develop
7 years ago
Qiao Longfei
8ebde595c9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
Qiao Longfei
b9d3d75fc4
fix prefetch dependency test=develop
7 years ago
Qiao Longfei
145c535750
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
minqiyang
9d7c3b18c0
Polish code
...
test=develop
7 years ago
minqiyang
2b430adaee
Polish code
...
test=develop
7 years ago
minqiyang
a02ce58f2c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
7 years ago
Jiabin Yang
12e1719f96
Merge pull request #14352 from JiabinYang/enhance_hierachical_sigmod_op
...
Enhance hierarchical sigmoid op
7 years ago
Qiao Longfei
40f68b1349
unit test ready
7 years ago
Qiao Longfei
36e26a53b0
Optimize bilinear tensor product op ( #14485 )
...
* optimize bilinear_tensor_product
* add set zero to set grad to 0.
7 years ago
Tao Luo
4ec9de0122
Merge pull request #14628 from Sand3r-/mgallus/mkldnn-elementwise_mul
...
EltwiseMul: Changes from previous PR
7 years ago
Qiao Longfei
35b79ab865
Merge pull request #13983 from jacquesqiao/add-ctr-reader
...
Add ctr reader
7 years ago
Qiao Longfei
da387720d7
fix infer compile test=develop
7 years ago
Jacek Czaja
cf40daee58
- Building fix to softmax for inference
7 years ago
Clementine
6c71c1f8f9
Add activation gelu ( #14569 )
7 years ago
Michal Gallus
9455be0ba5
EltwiseMul: Extract StringToFormat to MKLDNN helper
...
test=develop
7 years ago
Jacek Czaja
1540df51cf
- Fix to test_conv2d_transpose_mkldnn for GPU
...
test=develop
7 years ago
JiabinYang
eda069068d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
JiabinYang
a08dc83eb0
remove arg 'non_leaf_num', test=develop
7 years ago
chengduo
6648f5ed6f
add ShareLoD for dropout_grad ( #14616 )
...
test=develop
7 years ago
JiabinYang
c469334cfb
polish python code and comment, test=develop
7 years ago
Qiao Longfei
92afbb923c
fix compile problem test=develop
7 years ago
Qiao Longfei
97cbec9b74
clean code
7 years ago
Qiao Longfei
1edd435da6
fix ci problem test=develop
7 years ago
JiabinYang
87648f8edf
merge develop, test=develop
7 years ago
wopeizl
db9284ecde
Merge pull request #14617 from wopeizl/windows/online
...
Windows/online
7 years ago
JiabinYang
c3c3c0b33c
polish code, test=develop
7 years ago
Jacek Czaja
8bfa1fa9bb
- ASUM MKL integration
7 years ago
phlrain
487ee36aec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
tangwei12
56a4912b76
Make NCE_OP more efficient and support SelectedRows ( #14469 )
...
* Fix truncated normal.
* Fix.
* Make nce support more distribution.
* Fix API.spec.
* Fix python API.
* Fix.
test=develop
* Fix API.spec
test=develop
* Fix sampler.
* Fix order of arguments in python API.
test=develop
* NCE add selectedrows support
* NCE update weighted sampling
* fix bugs in nce_op, and assign_value_op optimized
* fix bugs in nce_op, revert assign_value_op
* nce_op optimize
* nce_op optimize
* nce_op optimize
* add selectedRows test later
test=develop
* add selectedRows supported
* add selectedRows supported
test=develop
* add selectedRows supported
* add nce selectedRows supported, test=develop
* add nce selectedRows supported
* add nce selectedRows supported, test=develop
* fix height in nce, test=develop
* add ut
* add ut, test=develop
* make AutoGrownIndex inline
test=develop
* fix tinny error, test=develop
7 years ago
liuhongyu
1ffe41d722
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
7 years ago
Qiao Longfei
9589babe12
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
...
test=develop
7 years ago
liuhongyu
05917c3c79
add cudnn lstm; test=develop
7 years ago
Qiao Longfei
f35f3fe77a
ctr reader can not be used in windows
...
test=develop
7 years ago
peizhilin
6a85dd3278
Merge remote-tracking branch 'upstream/develop' into windows/build
...
test=develop
7 years ago
peizhilin
38715e6fd0
minor fix
7 years ago
Qiao Longfei
6bef565dac
clean code test=develop
7 years ago
Qiao Longfei
e7d1f524f3
change log level
...
test=develop
7 years ago
JiabinYang
7e4bd695e6
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into enhance_hierachical_sigmod_op
7 years ago
Qiao Longfei
fe54adf70c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
JiabinYang
b10df8bcfa
refine code and add none bias ut, test=develop
7 years ago
Kaipeng Deng
251a1bb0f4
Merge pull request #14588 from heavengate/revert_interpolate
...
fix interpolate_op incompatible. test=develop
7 years ago
Qiao Longfei
668ae9083e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-ctr-reader
7 years ago
Qiyang Min
30e47bce8b
Merge branch 'develop' into revert_vlog
7 years ago
Qiao Longfei
87e4edd2ea
fix grad_varname in remote prefetch
7 years ago
Qiao Longfei
d98c59fd2c
support none sliced variable
7 years ago
dengkaipeng
bb489d4cc9
add interp_method default bilinear. test=develop
7 years ago
dengkaipeng
78f563917c
revert interpolate_op to bilinear_interp_op & nearest_interp_op. test=develop
7 years ago
Jacek Czaja
fb24690a58
- conv2d transpose MKL-DNN
...
test=develop
- Added new header for MKLDNN reuse functionality
- Extended conv2d_transpose GetExpectedKernelType for MKL-DNN supporrt
- Buildable conv transpose mkldnn and conv mkldnn using conv template
- Conv2d transpose roughlt implemented and buildable
- Added modifications conv2d transpose MKLDNN unit tests
- Fix to UT of conv2d transpose mkldnn op
- Wrong type of MKLDNN primitive was chosen for conv2d transpose
- HAcks for conv2d transpose
- UT enalbed
- Replaced copying loop with memcpy
- Draft of passing lambda into AcquireMemory
- Made reorder (IOHW->OIHW) to be called only once
7 years ago
tensor-tang
7a91271436
Merge branch 'develop' into fea/jit/rnn
7 years ago
minqiyang
be04d99fe4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
7 years ago
JiabinYang
81e145764d
refine code and comments, test=develop
7 years ago
Qiao Longfei
af2f5fc824
fix some bugs
7 years ago
JiabinYang
2f6b529aff
refine code and comments, test=develop
7 years ago
minqiyang
53433d7f2e
Revert the changes of VLOG
...
test=develop
7 years ago
tensor-tang
1f0291a51e
add comments and follow comments
...
test=develop
7 years ago
tensor-tang
557229bd39
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
Qiao Longfei
ed9fa4b301
can run
7 years ago
peizhilin
30849d1f20
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
qingqing01
6224e61fd9
Transpose-Flatten-Concat fusion operator. ( #14568 )
...
* Transpose-Flatten-Concat fusion operator.
* Add unit testing and fix bug.
7 years ago
Qiao Longfei
686d15c8e0
update grpc_variable_response
7 years ago
tangwei12
3639d99f99
Fix save and load lookup table/optimizer vars ( #14301 )
...
* fix mkdir conflict
* fix load/save lookup tables
test=develop
* add lookup_table_utils
* fix load optimize vars on pserver
* delete lookup table utils
* fix save and load lookup tables
* fix load optimizer var
* fix load optimizer var, test=develop
* fix python 3 style, test=develop
* move lookup_table_utils to contrib utils
7 years ago
peizhilin
36cd18b549
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Qiao Longfei
d827881502
fix pserver and prefetch rpc
7 years ago
Yiqun Liu
bf222f197d
Use sub scope in tensor_array_to_tensor op. ( #14524 )
...
test=develop
7 years ago
JiabinYang
02d68051db
add sparsed bias grad, test=develop
7 years ago
Qiao Longfei
5856c2f332
change Var to FindVar
7 years ago
Qiao Longfei
312b7786d9
clean code
7 years ago
Qiao Longfei
2b6c0c09d6
add unit test
7 years ago
Qiao Longfei
47280ef8b4
lookup table op support prefetch
7 years ago
gongweibao
c1bf9664cd
Add options to disable SO_REUSEPORT of grpc. ( #14269 )
7 years ago
Qiao Longfei
4ad5fd8f54
add parameter prefetch
7 years ago
Qiao Longfei
9d276fe8a8
add parameter prefetch
7 years ago
luotao1
e21edb26f6
add Set/GetCPUNumThreads api
7 years ago
Qiao Longfei
9851a53478
add prefetch part in pserver
7 years ago
JiabinYang
42470f14b7
test=develop
7 years ago
peizhilin
445fff24dc
add the bigobj option to NVCC compile
...
fix code style
7 years ago
qingqing01
36f08eef3b
CUDA kernel for density_prior_box_op. ( #14513 )
...
* CUDA kernel for density_prior_box_op.
* Support flatten to 2D.
7 years ago
tensor-tang
6a7f83d45d
enable gru jitcode and refine act and lstm jitcode
...
test=develop
7 years ago
tensor-tang
686eaf20ba
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
peizhilin
81bd7eeff4
rollback the format
7 years ago
Qiao Longfei
1f87f263a2
clean code
7 years ago
Qiao Longfei
361cb0e078
lookup remote table can compile
7 years ago
JiabinYang
0fca16847c
temp
7 years ago
JiabinYang
e9be3366a9
test=develop
7 years ago
chengduo
00b9e9a135
Refine cublas to support CUBLAS_TENSOR_OP_MATH ( #13929 )
...
* refine cublase
test=develop
* code refine
* refine cublas
* add GEMME_EX
* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop
* fix CublasCall for cuda version
test=develop
* fix error
test=develop
* fix GEMM_EX to be compatible with gcc 4.8
test=develop
* add GEMM_EX
test=develop
* to compatiable with gcc4.8
test=develop
7 years ago
peizhilin
dfbac60398
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
7c8c9dc9bf
fix unit test cases
7 years ago
tensor-tang
0c5ed5f6fc
enable peephole jitcode
...
test=develop
7 years ago
JiabinYang
3c6102a367
test=develop
7 years ago
Qiao Longfei
7c3ce2952d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refactor-prefetch
7 years ago
Qiao Longfei
60a4f69b3c
add lookup remote table op
7 years ago
Qiao Longfei
e0b48f7e29
init lookup remote table
7 years ago
tensor-tang
e3b61cf52b
init gru jitcode and fix lstm jitcode
...
test=develop
7 years ago
tensor-tang
0f25446574
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
7 years ago
Dun
ae7d22862b
Group Norm ( #13843 )
...
Add group normalization operator.
7 years ago
wopeizl
d9a1f3e58e
Windows/online ( #14474 )
...
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
7 years ago
JiabinYang
57a18e32a1
test=develop
7 years ago
peizhilin
bef475c92b
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Tao Luo
5d4d117edc
Merge pull request #14502 from qingqing01/cudnn5_fix
...
Fix compling with cuDNN v5
7 years ago
Jiabin Yang
f7b55de9e5
Merge branch 'develop' into enhance_hierachical_sigmod_op
7 years ago
Yu Yang
e68c1fcd5a
Merge pull request #14522 from reyoung/feature/fix_op_header_deps
...
fix(Compile): fix depends error when compile op using cub
7 years ago
tensor-tang
3562051302
add gru refer code and remove redundant avx code
...
test=develop
7 years ago
JiabinYang
af9a3301da
test=develop
7 years ago
Zhaolong Xing
ad349e770f
Merge pull request #14452 from NHZlX/fix_avg_pool_trt_bug
...
fix avg pool trt bug
7 years ago
tensor-tang
f913860873
jitkernel lstm refer support peephole
...
test=develop
7 years ago
tensor-tang
2f9b5f2383
Merge branch 'develop' into fea/jit/rnn
7 years ago
JiabinYang
014e50c284
test=develop
7 years ago
Yu Yang
3edd32d070
fix(Compile): fix depends error when compile op using cub
...
some operators depend on cub and xxhash by header. The dependency should be declared explicitly rather than declared to pybind.
test=develop
7 years ago
Dang Qingqing
cda60311f9
Fix compling with cuDNN v5
...
test=develop
7 years ago
peizhilin
67562a6fcd
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
tensor-tang
10fb4ceefc
Merge pull request #14351 from tpatejko/tpatejko/mkldnn-elementwise_mul
...
[MKLDNN][JIT][AVX512] Elementwise Mul
7 years ago
jerrywgz
13e254faed
refine code, test=develop
7 years ago
tensor-tang
b4c826c548
Merge remote-tracking branch 'ups/develop' into fea/jit/rnn
...
test=develop
7 years ago
tensor-tang
ce31deb7e9
refine refer code and add lstm refer code
...
test=develop
7 years ago
jerrywgz
79cec53111
add ignore index for sigmoid cross entropy with logits op, test=develop
7 years ago
nhzlx
e62872df8b
fix conflicts
7 years ago
tensor-tang
c2cfb03a72
add lstm jitcode
7 years ago
peizhilin
25adf970b2
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Tao Luo
1d3e9bde1e
Merge pull request #14488 from yihuaxu/develop_7a64d48f5_stack_opt
...
Optimize the stack operator
7 years ago
tensor-tang
7aa3aff338
Merge pull request #14465 from tensor-tang/fea/jit/exp
...
jitcode act support all size
7 years ago
Tao Luo
1b894e495f
Merge pull request #14437 from jczaja/prv-softmax-mkl
...
Introducing MKL to softmax for inference
7 years ago
peizhilin
3a72a634cf
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yihua Xu
a906a361be
Add the macro for NVCC (test=develop)
7 years ago
Yihua Xu
d91740acb1
Revert "Remove the remnant code (test=develop)"
...
This reverts commit be50670348
.
7 years ago
Yihua Xu
be50670348
Remove the remnant code (test=develop)
7 years ago
qingqing01
9eefd2c766
Modify some infer-shape about detection operators in compile-time. ( #14483 )
...
* Modify some infer-shape in compile-time.
7 years ago
Yihua Xu
f4c869d872
Optimize the layer_norm operator with AVX intrinsic function ( #14417 )
...
* Optimize layer_norm operator with AVX intrinsic functions
* Revert the wrong modifications
* Implement the jit kernel for layer_norm operator
* Add math headfile to fix the compile issue (test=develop)
* Add math headfile to fix the compile issue (test=develop)
* Fixed the intrinsic headfile issue (test=develop)
* Fix the conflicts (test=develop)
* Revert for CUDA compiler (test=develop)
* Fixed the cuda depency (test=develop)
* Fix the marco issues (test=develop)
7 years ago
peizhilin
ee0fd78c81
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Yu Yang
f1a392a5fe
Merge pull request #13804 from sneaxiy/rewrite_allocation
...
Rewrite allocation
7 years ago
Yihua Xu
f418f552df
Merge branch 'develop' into develop_7a64d48f5_stack_opt (test=develop)
7 years ago
peizhilin
8443961a4f
add warp_ctc back
7 years ago
qingqing01
fd7e643153
Convolution fusion operator. ( #14449 )
...
* Convolution fusion operator.
* Clean code
test=develop
7 years ago
Yu Yang
98bbfc17be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
7 years ago
peizhilin
4a6769da84
re-organize the cmake file
7 years ago
dengkaipeng
8ef6280c03
Add operator double support. test=develop
7 years ago
peizhilin
1aff40a4c6
exclude warpctc_op on windows
7 years ago
peizhilin
7d51a0e887
disable DSO by default on windows
7 years ago
peizhilin
b967e01cbe
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
Wu Yi
d7bd0361cb
fix dist deps ( #14471 )
...
* fix dist deps test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
7 years ago
Jacek Czaja
9b0eae3023
- Removing partial specialization of sotmax for inference for GPU
...
test=develop
7 years ago
peizhilin
a3e952f41d
add the jit back
...
fix compile error on windows
7 years ago
tensor-tang
a19b3225a1
fix jitcode small size
...
test=develop
7 years ago
Jacek Czaja
be80bb4f28
- Fix to GPU
...
test=develop
7 years ago
tensor-tang
4dbdfa60ef
sigmoid and tanh support all size
...
test=develop
7 years ago
tensor-tang
ccb8963705
refine exp jitcode with all size
...
test=develop
7 years ago
peizhilin
1cc23ef67d
merge from paddle:develop
7 years ago
tensor-tang
d3eae8f61b
refine relu and fix addrelu test
7 years ago
tensor-tang
4e67fe6a12
refine act and vxx with all size
7 years ago
tensor-tang
ba3eaed7a7
exp support all size
7 years ago
tensor-tang
1ffce8c0ae
fix build error on noavx
...
test=develop
7 years ago
Michal Gallus
c69c41604e
MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults
...
test=develop
7 years ago
Michal Gallus
785066eb8a
MKLDNN elementwise_mul: Check if AVX512 is available
...
test=develop
7 years ago
Michal Gallus
08f63c4d12
MKLDNN elementwise_mul: Lint changes to UT & integration
...
test=develop
7 years ago
Michal Gallus
49b09327f6
MKLDNN elementwise_mul: Reorder on non-nchw input, fallback on non-16 divisable fm
...
test=develop
7 years ago
Michal Gallus
d14858e4ba
MKLDNN elementwise_mul: Parallelize mul
7 years ago
Michal Gallus
ed31936ba1
MKLDNN elementwise_mul: Support NCHW, update UT
7 years ago
Tomasz Patejko
700bcbf74f
MKLDNN elementwise_mul: h and w loops implemented in xbyak
7 years ago
Tomasz Patejko
ad09facafe
MKLDNN elementwise_mul: CPU tests initially refactored. MKLDNN mul test for broadcast added
7 years ago
Tomasz Patejko
2d73ad180a
MKLDNN elementwise_mul: simple xbyak version for AVX512
7 years ago
Tomasz Patejko
213ec37d6a
MKLDNN elementwise_add: simple initial implementation of the operator for MKLDNN format
7 years ago
Wu Yi
a2d9b34417
Refine operator cmake ( #14413 )
...
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
7 years ago
peizhilin
764f97deac
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
peizhilin
8580b7a130
Merge remote-tracking branch 'upstream/develop' into windows/build
7 years ago
tensor-tang
7f17e561d7
Merge pull request #14423 from tensor-tang/fea/jit/act
...
jitcode act relu, exp, sigmoid, tanh
7 years ago
Jiabin Yang
28bd5b7bad
fix space_to_depth_op unicode problem ( #14430 )
...
* fix space_to_depth_op unicode problem
* test=develop
7 years ago
Jacek Czaja
513bb6c151
Squashing MKL based softmax for inference
...
test=develop
- Added profiling to softmax functors
- MKL based softmax inference op
- Fix to softmax compuation via MKL
- cleaning
- Cosmetic fixes to softmax MKL
- Fix to ON_INFER lack of propagation
7 years ago
nhzlx
9b64aac41f
add macro for pool2dDirectCUDAFunctor
...
test=develop
7 years ago
whs
1722678258
Make nce support more distribution. ( #13549 )
...
* Fix truncated normal.
* Fix.
* Make nce support more distribution.
* Fix API.spec.
* Fix python API.
* Fix.
test=develop
* Fix API.spec
test=develop
* Fix sampler.
* Fix order of arguments in python API.
test=develop
7 years ago
nhzlx
83f8c403a7
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into fix_avg_pool_trt_bug
...
test=develop
7 years ago
nhzlx
b969116988
fxi avg pool trt bug and fix cpplint
7 years ago
tensor-tang
1f00723fa3
exp, sigmoid, tanh jitcode support more size
...
test=develop
7 years ago
Qiyang Min
d971d5b875
Merge pull request #14431 from velconia/fix_expand_op_dim_in_compile_time
...
Fix expand op incorrect infer shape
7 years ago
Wu Yi
b32c13dc20
Add cudnn ctc loss ( #12366 )
...
* add cudnn ctc loss
* wip add test test=develop
* wip
* wip
* done test=develop
* move include cudnn test=develop
* test test=develop
* fix build test=develop
* fix build test=develop
* fix build on cudnn5 test=develop
* fix cudnn5 build test=develop
* fix cudnn5 build test=develop
* merge develop softmax functor change test=develop
7 years ago
tensor-tang
8cda7b3d20
Merge remote-tracking branch 'ups/develop' into fea/jit/act
...
test=develop
7 years ago