Chen Weihang
4a702ef361
Support SelectedRows allreduce in multi-cards imperative mode ( #24690 )
...
* support selectedrows allreduce in multi-cards dygraph, test=develop
* remove useless import modules in unittests, test=develop
* add nccl cmake to get nccl version, test=develop
* add if-condition to compile correctly, test=develop
* add detailed version parsing for old nccl, test=develop
* polish cmake details, test=develop
* fix remove test cmake error, test=develop
* fix cmake condition, test=develop
* change unittest cmake list, test=develop
* fix unittest cmake rule, test=develop, test=framep0
5 years ago
Chen Weihang
d1062d5278
Replace all errors thrown by LOG(FATAL) with PADDLE_THROW ( #24759 )
...
* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop
* remove ci test case, test=develop
* replace all LOG(FATAL) & polish message, test=develop
* fix typo, test=develop
* polish error info detail, test=develop
5 years ago
Zhang Ting
7d0cbfd045
fix negative framework overhead in Profiling Report ( #24850 )
...
* fix negative framework overhead, test=develop
* use overhead summary, test=develop
5 years ago
Chen Weihang
0aed095188
The third time to simplify the C++ error stack ( #24831 )
...
* simplify C++ error stack once again, test=develop
* refactor code to remove string pointer and recursion, test=develop
5 years ago
Adam
b490e41c1d
Add isCached() mechanism for BatchNorm and LRN oneDNN operators ( #24798 )
...
* Add isCached() mechanism for BatchNorm and LRN oneDNN operators
test=develop
* Formatting fix
test=develop
5 years ago
Wilber
f8e370ac7f
[Inference] [unittest] Inference unit tests rely on dynamic libraries ( #24743 )
5 years ago
Zhou Wei
d1047d0a69
add WITH_GPU for cudaerror download ( #24056 )
5 years ago
wangchaochaohu
79caed6667
fix the print error of PE record_event and framework overhead in profiler test=develop ( #24744 )
5 years ago
Adam
56a714a19b
Add isCached() mechanism to oneDNN pooling primitive ( #24724 )
5 years ago
lidanqing
c3c61d34c1
Update PADDLE_ENFORCE in DNNL related ops ( #24333 )
...
* Update PADDLE_ENFORCE in DNNL related ops
test=develop
* Abstract macro of OP_GET_PLACE_CHECK
test=develop
* update according to reviews
* update GET_PLACE_CPU_CHECK
* fix typo
test=develop
* revert macro
test=develop
5 years ago
wangchaochaohu
dbfe5333c5
Add pe profiler Event ( #24611 )
5 years ago
Adam
586b587519
Add isCached() check in Softmax handler ( #24637 )
...
* Update isCached() to be thread friendly
test=develop
* Add isCached() check inside Softmax handler
test=develop
* Fix PaddleEnforce() message
test=develop
5 years ago
Leo Chen
1d03469685
use vector instead of pointer, test=develop ( #24620 )
5 years ago
Jacek Czaja
3292f0ef58
[onednn] elementwise add broadcasting support ( #24594 )
5 years ago
Yiqun Liu
560c815390
Add some check for CUDA Driver API and NVRTC ( #22719 )
...
* Add the check for whether CUDA Driver and NVRTC are available on the runtime system.
* Call cuInit to initialize the CUDA Driver API before any CUDA calls.
test=develop
* Change the behavior when libnvrtc.so can not be found, printing a warning instead of exiting.
test=develop
* Do not initialize CUDA Driver API for windows and macos.
test=develop
* Remove the call of cuInit when entering paddle and enable the test_code_generator.
test=develop
* Add some built-in functions for __half.
test=develop
* Change save_intermediate_out to false in unittest.
test=develop
* Fix erroneous reference to temporary variable when setting the include path for device_code.
test=develop
5 years ago
Adam
dcf17f4813
Add isCached() mechanism to elementwise_add DNNL ( #24563 )
...
* Add isCached() mechanism to elementwise_add
test=develop
* Hide code inside handler
test=develop
5 years ago
pawelpiotrowicz
db2b6b6568
Hide globals & redesign restore PR ( #24279 )
...
test=develop
5 years ago
Chen Weihang
aa0f254fbe
Add macro BOOST_GET to enrich the error information of boost::get ( #24175 )
...
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all case, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
5 years ago
Pei Yang
8c296dea75
fix compile error(cpuid.h not found) on nvidia jetson platforms. test=develop ( #24329 )
5 years ago
Guo Sheng
4a5de14426
Remove cusolver potrfBatched support on Windows. ( #24338 )
...
test=develop
test=win_gpu
5 years ago
Guo Sheng
1fc6cc502a
Fix cusolver loader for Windows ( #24157 )
...
* Fix cusolver loader for Windows in dynamic_loader.cc. test=develop
* Fix missing CUSOLVER_ROUTINE_EACH_R1.
test=gpu
test=develop
* Mark cusolver as unsupported on Windows temporarily. test=develop
* Fix GetCusolverDsoHandle error message. test=develop
5 years ago
石晓伟
17ac6e2580
update the analysis predictor for multi-stream support, test=develop ( #24046 )
...
* update the analysis predictor, test=develop
* update the unit test, test=develop
* no priority set before the interface is determined, test=develop
* interface name generalization, test=develop
5 years ago
Sylwester Fraczek
e1a7a88057
added reshape transpose matmul fuse pass ( #23754 )
5 years ago
Yiqun Liu
ecfddebbef
Add the implementation of inverse ( #23310 )
5 years ago
wangchaochaohu
6bf26ef156
fix warning mac compiler ( #24138 )
5 years ago
Guo Sheng
a8c0fb4e86
Add cholesky_op ( #23543 )
...
* Add cholesky_op forward part. test=develop
* Complete cholesky_op forward part. test=develop
* Add cholesky_op backward part. test=develop
* Complete cholesky_op backward part. test=develop
* Refine cholesky_op error check and docs. test=develop
* Add grad_check unit test for cholesky_op. test=develop
* Fix sample code in cholesky doc. test=develop
* Refine some error messages of cholesky_op. test=develop
* Refine some error messages of cholesky_op. test=develop
* Remove unused input in cholesky_grad. test=develop
* Remove unused input in cholesky_grad. test=develop
* Fix stream for cusolverDnSetStream. test=develop
* Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code.
test=develop
* Add CUSOLVER ERROR in enforce.h
test=develop
* Fix the missing return value in cholesky. test=develop
5 years ago
wangchaochaohu
6ba7c3ac92
Reduce the construction time of profiler functions ( #24117 )
5 years ago
石晓伟
34d7d6aef0
declare the stream::Priority as enum class, test=develop ( #24013 )
5 years ago
Jacek Czaja
c6c65c65c7
[DNNL] Added elementwise_add mkl-dnn inplace ( #23477 )
5 years ago
石晓伟
db6d867383
add boost dependency to cuda_stream ( #24032 )
5 years ago
石晓伟
d2584a7082
New feature: thread local allocator, test=develop ( #23989 )
...
* add the thread_local_allocator, test=develop
* refactor the thread_local_allocator, test=develop
* provides option setting strategy, test=develop
5 years ago
Zhou Wei
7817003795
Optimize the error messages of paddle CUDA API ( #23816 )
...
* Optimize the error messages of paddle CUDA API, test=develop
* fix the error messages of paddle CUDA API, test=develop
* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop
* remove build_ex_string,test=develop
* merge conflict,test=develop
5 years ago
Zhang Ting
b89dd86fb6
Update eigen ( #23203 )
...
* update eigen, test=develop
* remove patches, test=develop
* add definition of -fabi-version, test=develop
* add patch for TensorBlock.h, test=develop
* test windows, test=develop
* only update eigen for Linux, test=develop
* add code comments, test=develop
5 years ago
石晓伟
2d01cc85c4
DeviceContext Split, test=develop ( #23737 )
...
* supports thread-binding stream, test=develop
* avoid using thread_local variables in dtor, test=develop
* modify the stream priority enum, test=develop
5 years ago
guofei
c2a60bb1fa
Correct the wrong name in the flag comment ( #22977 )
...
Correct the name [`FLAGS_sync_nccl_allreduce`](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/flags/others_cn.html#flags-sync-nccl-allreduce) based on the information from our official website.
5 years ago
Yi Liu
14e7041c6d
Fix CUDAHandleHolder destruction problem. ( #23772 )
...
eagerly release cuda resources before the cuda environment is destroyed
test=develop
5 years ago
Michał Gallus
a63bcf9ae7
[DNNL][INT8][FP32] MatMul ( #23395 )
...
* Initial FP32 DNNL MatMul Implementation
* Implement int8 DNNL MatMul
* Unify in-kernel-naming, clean UTs
* MatMul: Introduce op caching
* Final adjustments
test=develop
* Remove dy_graph disablement
test=develop
* Change dnnl header name to new one
test=develop
* Constrain multi head check to prevent failures
test=develop
* Resolve dnnl header problems on MAC CI
* Variable namings to kernel and skip_grad_ci added
test=develop
* Prevent MAC CI from failing
* Prevent windows build from failing
test=develop
* Modify UTs to conform to the rules
* Modify MatMul aux functions namings
test=develop
5 years ago
littletomatodonkey
1c08a2136e
test=develop, add addmm op ( #23384 )
...
add addmm op
5 years ago
Zeng Jinle
674355a097
fix GET_DATA_SAFELY ptr, test=develop ( #23679 )
5 years ago
silingtong123
c6d14bc839
show the exception messages of cpp inference library in msvc ( #23702 )
5 years ago
Tao Luo
e4f1b1c5e1
solve mklml memory leak ( #23557 )
5 years ago
mozga-intel
3baaee9aab
Remove: NGraph engine from PDPD repository ( #23545 )
...
* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop
* Remove ngraph from:
1. Cmake file
2. Python file
test=develop
5 years ago
Zhang Ting
480530c4e3
API(place-related) error message enhancement ( #23515 )
5 years ago
Chen Weihang
16315d3d9e
Delete Ref & VectorRef and add GetDataSafely ( #22997 )
...
* delete invalid check interface Ref & VectorRef, test=develop
* fix vector ref delete error, test=develop
* try the new check interface, test=develop
* change all related code with new check macro, test=develop
* remove static assert, test=develop
* polish detail, test=develop
* skip coverage problem, test=develop
* add new check macro, test=develop
5 years ago
Leo Chen
f297a33285
Dev/fix init flags ( #23465 )
...
* fix init_gflags with 'python -c', test=develop
* add test, test=develop
* use sys.executable instead of python, test=develop
* keep dummy, test=develop
5 years ago
Chen Weihang
7f1ad510bd
Add op inout check macro to simplify error message writing ( #23430 )
...
* add op inout check macro, test=develop
* fix enforce_test, test=develop
5 years ago
Adam
da7c73f847
Delete is_test attribute from activation operators ( #23318 )
...
* Delete is_test from activation operators
test=develop
* Revert unneeded changes
test=develop
5 years ago
石晓伟
5c59d2139e
reverts the commit 23177, test=develop ( #23363 )
5 years ago
Yi Liu
0471476a18
fix nccl comm double free bug ( #23344 )
...
As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.
5 years ago
wangchaochaohu
1ee2a9a424
Profiler refine ( #23294 )
...
* refine output of profiler for child event
5 years ago
Yi Liu
2169e6fb58
Initialize global nccl_comm in PE ( #23275 )
5 years ago
石晓伟
75ebb48a91
supports thread-binding stream, test=develop ( #23177 )
5 years ago
Zeng Jinle
77b4dc80c9
code polish for adding const qualifier, test=develop, test=document_fix ( #23248 )
5 years ago
Zeng Jinle
bba740710d
add cuda resource pool for BufferedReader, test=develop ( #23152 )
5 years ago
Sylwester Fraczek
abee05a8c8
added mkldnn swish activation ( #23041 )
5 years ago
Yi Liu
121b2aed4d
initialize global nccl context in dygraph ( #23037 )
...
initialize global nccl context in dygraph
test=develop
5 years ago
wangchaochaohu
99db0cf762
remove debug log test=develop ( #22994 )
5 years ago
wangchaochaohu
c979c9f2b0
refine the profiler print test=develop ( #22968 )
5 years ago
Zhang Ting
ca9c8b417d
fix compute ratio of profile, test=develop ( #22872 )
5 years ago
wangchaochaohu
dbb0b9b3b6
refine the profiler print ( #22823 )
...
* refine the profiler print test=develop
5 years ago
Zeng Jinle
d41d802ba3
Add flags to limit gpu memory ( #22793 )
...
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
5 years ago
Zhang Ting
72ff5a09c3
fix print bug of profile, test=develop ( #22804 )
5 years ago
wangchaochaohu
8456c3f4dd
polish the profiler_help code ( #22811 )
5 years ago
wangchaochaohu
7578fcbac4
Profile code refine ( #22800 )
...
* add profiler_help.h to refine the code test=develop
5 years ago
Adam
2b80e9a719
Add cpu_info without XBYAK ( #22716 )
5 years ago
Zhang Ting
f97f3f9301
add framework overhead ratio in profile report ( #22590 )
...
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
5 years ago
wangchaochaohu
611411b90e
Fusion group profile support ( #22718 )
...
* add support for the driver api callback and fix the profiler name show bug
5 years ago
tianshuo78520a
d2ba91aad1
fix typo words ( #22653 )
5 years ago
Yiqun Liu
22bbd54719
Add the support of fp16 in fusion_group ( #22239 )
5 years ago
wangchaochaohu
a089072c8b
fix the profile print error ( #22665 )
...
* fix the profile print error test=develop
5 years ago
wangchaochaohu
c65c6ae534
add flag to control profile level in python API ( #22319 )
...
* add python flag to control profile level test=develop
5 years ago
Chen Weihang
fe685cc185
fix enforce test error, test=develop ( #22610 )
5 years ago
Chen Weihang
266106da75
Fix mismatch with plus sign in the line ( #22588 )
...
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
5 years ago
Wilber
de009152a7
Compile without nccl deps. [2/2] ( #22484 )
...
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
LielinJiang
2b1386b2b2
optimize performance of interpolate op ( #22436 )
...
* optimize interpolate op, test=develop
5 years ago
wangchaochaohu
77dd0d97bb
use enum class to replace the usage of enum in some condition test=develop ( #22464 )
5 years ago
Wilber
7bc4b09500
add WITH_NCCL option for cmake. ( #22384 )
...
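Adds a WITH_NCCL option to cmake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.
Adds the PADDLE_WITH_NCCL definition.
NCCL compilation can be turned off for single-machine, single-card use; multi-card use needs NCCL enabled (the default). If NCCL is turned off, only a single card can be used.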
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
Michał Gallus
269db0d1d1
[DNNL] Fix accuracy in INT8 FC ( #22404 )
...
* Enable quantize to reorder to nchw as well
* Correct FC MKL-DNN input dim requirements to accept 3D
* Improve DNNL FC format, error and 3D input handling
test=develop
* Improve error checking in FC
test=develop
* Improve PADDLE_ENFORCE messages in fc-related files
* Remove data layout attribute from obligatory pass args
test=develop
* Fix message in fc_mkldnn_pass to be logically correct
test=develop
5 years ago
wangchaochaohu
621d3e0b66
fix the bug of profile update ( #22207 )
...
* fix the bug of profile update test=develop
5 years ago
石晓伟
ad0dfb17c1
[Feature] Lite subgraph ( #22114 )
5 years ago
Yiqun Liu
96980c2244
Polish the PADDLE_ENFORCE in fusion_group pass related codes. ( #22144 )
...
* Polish the PADDLE_ENFORCE in fusion_group pass related codes.
test=develop
* Correct the unittest because of the change to relu_grad's formula.
test=develop
5 years ago
wangchaochaohu
c3876cf82d
add support for nested profiling event and printing in different level ( #22061 )
...
* add support for nested profiling event and printing in different level
5 years ago
zhaoyuchen2018
3d4f2aa689
Refine stack op to improve xlnet performance, test=develop ( #22142 )
...
stack's wait costs a lot of cpu time; using a cuda kernel to do the memory copy
reduces cpu time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Zeng Jinle
4c2df8e4d4
fix allocator strategy comment, test=develop, test=document_fix ( #22121 )
5 years ago
bingyanghuang
7872d06ff4
Add explanation on conv grad for dims<3 ( #22125 )
5 years ago
Chen Weihang
ba8414d3a5
replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop ( #22109 )
5 years ago
Jacek Czaja
b0b27ff699
[MKL-DNN] Conv grad and Batch Norm grad NHWC support ( #22088 )
5 years ago
Zeng Jinle
9587249442
polish allocator strategy doc, test=develop, test=document_fix ( #22095 )
5 years ago
Zeng Jinle
d9f5d1eb29
ag allocator by default, test=develop ( #21837 )
5 years ago
Jacek Czaja
ad8a9cb82c
[MKL-DNN] Pool & LRN Grad Ops NHWC support ( #21747 )
5 years ago
Yiqun Liu
d48320777e
Add the first implementation of fusion_group op ( #19621 )
...
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop
* Disable for mac and windows.
test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop
* Refine the CUDA kernel to support large dims.
test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test.
test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop
* Disable fusion_group op for mac and windows.
test=develop
* Make the compiling of device code return status instead of hanging up.
test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names.
test=develop
* Add the check of CUDA driver library in unittest.
test=develop
* Refine the calling of PADDLE_ENFORCE.
test=develop
5 years ago
Chen Weihang
2e9082250d
polish default error msg & cublas error hint, test=develop ( #22032 )
5 years ago
Chen Weihang
35ff1568e9
Add error message for cublas initialization failure ( #21995 )
5 years ago
Chen Weihang
fbb42173a9
fix no hint problem when use ENFORCE for cuda, test=develop ( #21994 )
5 years ago
Chen Weihang
1fd1f06f11
Rename paddle throw error macro ( #21657 )
...
* rename paddle throw error macro, test=develop
* fix new error use case, test=develop
5 years ago
Adam
e81f0228df
MKL-DNN 1.0 Update ( #20162 )
...
* MKLDNN v1.0 rebase to Paddle 1.6
test=develop
* Add hacky paddle::string::to_string() implementation
* vectorize<int64-t>() -> vectorize() cleanup
test=develop
* PADDLE_ENFORCE and void_cast fixes
test=develop
* Rebase changes
test=develop
* Cosmetics
test=develop
* Delete MKL from mkldnn.cmake
test=develop
* CMake debug commands
test=develop
* Delete MKLDNN_VERBOSE and rebase fixes
test=develop
* Rebase fixes
test=develop
* Temporarily disable int8 resnet101 vgg16 and vgg19 tests
test=develop
* Add libmkldnn.so.1 to python setup
test=develop
* Add libmkldnn.so.1 to inference_lib cmake after rebase
test=develop
* Post rebase fixes + FC int8 changes
test=develop
* Fix LRN NHWC
test=develop
* Fix NHWC conv3d
test=develop
* Windows build fix + next conv3d fix
test=develop
* Fix conv2d on AVX2 machines
test=develop
5 years ago
Zeng Jinle
97e76cb96d
refine dev_ctx.Wait() exception throw, test=develop ( #21600 )
5 years ago
Huihuang Zheng
b241c7329c
Refine a Warning Which Can Occur Not Only During Init ( #21546 )
...
As the title
5 years ago
wangchaochaohu
932aca162d
Add Branch to avoid CPU profiler warning print ( #21556 )
...
* fix profiler warning message in cpu profile mode test=develop
5 years ago
Pei Yang
122b37ce62
make config option DisableGlogInfo() able to mute all inference logs ( #21318 )
...
* make DisableGlogInfo able to mute all logs in inference.
5 years ago
Zhaolong Xing
c5f0293cf3
NV jetson(nano, tx2, xavier) inference compile support ( #21393 )
...
* add jetson compile support
test=develop
* refine the cmake
test=develop
5 years ago
Huihuang Zheng
a71f53d7ac
Add warning message when GLOG initialization fails. ( #21487 )
...
Add warning message when GLOG initialization fails
5 years ago
Tao Luo
01fa4ead61
fix -Wno-error=sign-compare warning in gcc8 ( #21434 )
...
* fix -Wno-error=sign-compare warning in gcc8
test=develop
* fix warning in distributed codes
test=develop
5 years ago
Jie Fang
5e813b53c5
nhwc optimization for batchnorm ( #21090 )
5 years ago
Jacek Czaja
cd43c4440e
[MKL-DNN] LRN and Pool2d (FWD) NHWC support ( #21375 )
5 years ago
wangchaochaohu
8293f21a52
Profile refine ( #21258 )
...
* fix profile api high version test=develop
5 years ago
wangchaochaohu
e0e205ea2d
fix the profiling bug test=develop ( #21396 )
5 years ago
zhouwei25
345b67b5e2
remove warning LNK4006 and warning LNK4221 ( #21226 )
5 years ago
gongweibao
ed2a185248
optimize nhwc for tensor core in ConvOp and ConvGradOp ( #20597 )
5 years ago
Zeng Jinle
cdb3d27985
Fix warn of gcc8 ( #21205 )
...
* fix warnings of gcc 8 compilation, test=develop
* fix boost::bad_get, test=develop
* refine PADDLE_ENFORCE, test=develop
5 years ago
liuwei1031
d8b6cf2bcd
fix sporadic hang issue on windows ( #21201 )
...
cudaStreamSynchronize randomly hangs when used in a multi-threaded environment; replace it with the cudaStreamQuery API on Windows
5 years ago
zhaoyuchen2018
b93870e696
Improve topk performance. ( #21087 )
...
* Improve topk performance.
give 200000 data to compute topk,
before opt: cost 1s
after opt: cost 0.0028s.
* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Chen Weihang
b3a3e6f60c
change cuda enforce & add example ( #21142 )
5 years ago
Chen Weihang
27fa9c100b
add examples for resource exhausted error, test=develop ( #21140 )
5 years ago
Chen Weihang
edd6680a71
Further simplify the C++ error info stack ( #21093 )
...
* simplify C++ error stack by rewrite Place, test=develop
* polish assignment overload func, test=develop
5 years ago
joanna.wozna.intel
77c2083586
Add transpose2 INT8 for mkl-dnn ( #19424 )
...
* Add transpose2 INT8 for mkl-dnn
test=develop
* Fix test_transpose_int8_mkldnn
test=develop
* Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"
This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing
changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c.
* Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2""
This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90.
* Add template to TransposeMKLDNNHandler
test=develop
* Resolve conflict
test=develop
* Restore get_size and refactor
test=develop
5 years ago
Chen Weihang
7ee25189c3
Enrich the type of error and declare the error type interfaces ( #21024 )
...
* Enrich the type of error and declare the error type interfaces, test=develop
* adjust tests to adapt new form, test=develop
* add inference deps with error_codes.pb.h, test=develop
* restore stack iter start pos, test=develop
* polish code based review comments, test=develop
5 years ago
Adam
3fda695bb0
Add support for asymmetric padding in MKLDNN pool, conv and conv_transpose ( #21062 )
...
* Add asymmetric padding support for mkldnn pooling
test=develop
* Add asymmetric padding support for mkldnn conv
test=develop
* Add asymmetric padding support for mkldnn conv_transpose
test=develop
5 years ago
Zeng Jinle
a710ccc0cb
refine error message of allocator again, test=develop ( #21023 )
5 years ago
wangchaochaohu
7695b713e1
gpu info query refine test=develop ( #20904 )
5 years ago
Chen Weihang
3358455c86
Polish and arrange code in enforce.h ( #20901 )
5 years ago
Chen Weihang
8b59ac3ad0
delete paddle infershape enforce macro ( #20832 )
5 years ago
Chen Weihang
1d1552d106
Make formatted ENFORCE stack adapt to more situations ( #20826 )
...
* Make formatted ENFORCE stack adapt to more situations and polish details, test=develop
* restore template message position, test=develop
5 years ago
Adam
67b59ddb38
Minor MKL-DNN conv int8 performance fixes ( #20753 )
...
test=develop
5 years ago
123malin
95e90aa102
test=develop, add communicator_is_sgd_optimizer flag ( #20677 )
...
* test=develop, communicator_is_sgd_optimizer flags
5 years ago
wopeizl
9e5948230e
add support to gcc8, add docker env test=develop ( #19807 )
...
* add support to gcc8, add docker env test=develop
5 years ago
WangXi
507afa8a8a
Fix dgc nan by stripping nccl from sparseReduce. ( #20630 )
5 years ago
lidanqing
46e93f7c86
Revert "Refactor conv computeINT8" ( #20640 )
...
* Revert "Refactor conv computeINT8 (#19574 )"
This reverts commit 2c32c2d649.
test=develop
* replace PADDLE_ENFORCE
test=develop
5 years ago
Jacek Czaja
a1cd27f13f
[MKL-DNN] Added mkl-dnn cache clearing when creating Executor instance ( #20241 )
...
* - Flushing mkl-dnn cache
test=develop
- Disabled clearing cache for LoadModel
- Added clearing of mkl-dnn cache when Executor is created
test=develop
- Do not clear for GPU places
test=develop
- compilation fix
test=develop
* - Moved clearing of mkl-dnn cache in destructor of executor
test=develop
* - Compilation fix
test=develop
- Reverted conditional clearing of mkl-dnn cache in Executor's
destructor
test=develop
- compilation fix
5 years ago
Zeng Jinle
4922eb6da5
make_conv_workspace_size_configurable, test=develop ( #20662 )
5 years ago
633WHU
12e4be0382
Dlpack support ( #20039 )
...
* support dlpack to tensor and implement python interface test=develop
* add unittest for _to_dlpack and from_dlpack test=develop
5 years ago
Wilber
751812a674
enable cpu machine to run paddle in gpu lib
...
enable cpu machine to run paddle model in gpu lib
5 years ago
Zeng Jinle
1d1d221f26
refine allocator_flag, test=develop, test=document_fix ( #20400 )
5 years ago
danleifeng
425279a57b
Improve elementwise operators performance in same dimensions. ( #19763 )
...
Improve elementwise operators performance in same dimensions
5 years ago
qingqing01
1a3eef026c
Enable users to create custom cpp op outside framework. ( #19256 )
...
* Writing a custom op needs to follow the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
5 years ago
liym27
24010472d4
fix pool2d pool3d,support asymmetric padding and channel_last ( #19739 )
...
* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.
test=develop,test=document_preview
* fix 'tensors' to 'Tensors'. test=develop,test=document_preview
* add test for coverage ValueError. test=develop,test=document_preview
* resolve conflict in test_pool2d. test=develop
5 years ago
Chen Weihang
b916335025
Paddle error message stack shaping and optimization ( #19895 )
...
* shape and optimize paddle error message stack, test=develop
* limit exception type & add unittest, test=develop
* fix multi-platform problem, test=develop
* fix related unittest failures, test=develop
* add doc & fix unittest errors, test=develop
* fix function name error, test=develop
* update tensor test exception msg compare, test=develop
* remove unittest on win32, the dir format is different, test=develop
* remove useless package, test=develop
* add paddle enforce handler unittest, test=develop
* add exception checkout, test=develop
* fix coverage failed, test=develop
* fix op registry test failed, test=develop
* refactor whole pr, test=develop
* remove test in CMakelist, test=develop
* fix coverage, test=develop
5 years ago
joanna.wozna.intel
1d32897c5c
Fix test pool2d int8 mkldnn ( #19976 )
...
* Fix conv2d+dequantize squash for residual fusion
test=develop
* Correct int8 input
test=develop
* Add if exclude or include padding in pool2d mkldnn
test=develop
5 years ago
Zeng Jinle
37f76407b0
fix cuda dev_ctx allocator cmake deps, test=develop ( #19953 )
5 years ago
Jacek Czaja
5b07ca9cdd
- ReImplemented pooling fwd mkldnn ( #19911 )
...
- First implementation of BWD and FWD of pooling mkl-dnn
- Compilation fix
- Fix
- Fix
- Fix
- Fix to crash
- Compilation fix
- Combined AcquireBackward with Fwd
test=develop
5 years ago
chengduo
d7251a8e1e
Delete local execution scopes ( #19749 )
...
* Add RecordHistoryLocalExecScopes
test=develop
5 years ago
Zeng Jinle
c7f36e7c00
Add lock to cudnn handle calls ( #19845 )
...
* refine reallocate of workspace size, test=develop
* add lock to cudnn handle calls, test=develop
5 years ago
Zeng Jinle
b25d1e758d
remove enforce.h file written, test=develop ( #19897 )
5 years ago
Jacek Czaja
619c797a7f
[MKL-DNN] LRN refactoring ( #19798 )
...
- LRN mkl-dnn kernel refactor
test=develop
- compilation fix
- Another compilation fix
- Compilation fix
- another compilation fix
- compilation fix
- Crash fix
- optional LRN mkldnn workspace
- Added mid allocation
- Workaround for tests
- Removed gradient from is_test ut
- Removed mid for inference
- Reverted LRN mid removal for is_test
- PADDLE_ENFORCE adjusted
- Rebase to templatization commit
- Compilation fix
- compilation fix
test=develop
- lint
test=develop
- Fix to crash
- Rebase to recent codebase
- lin
- lint
- compilation fix
5 years ago
lidanqing
2c32c2d649
Refactor conv computeINT8 ( #19574 )
...
* fix conflicts
test=develop
* change mask_bias_reorder
test=develop
* add ComputeMask function to make code clear
test=develop
* change according to reviews
test=develop
* change according to reviews
test=develop
5 years ago
Adam
c7e688921b
Add template functions for Acquire primitive/primitive_desc ( #19867 )
...
* Add template functions for Acquire primitive/primitive_desc
test=develop
* Move acquire primitive descriptor to protected section
test=develop
5 years ago
Zeng Jinle
13ca364ceb
remove some flags and add comments to some flags, test=develop ( #19813 )
5 years ago
Zeng Jinle
5eb381a3e2
refine reallocate of workspace size, test=develop ( #19843 )
5 years ago
Adam
dfdd73cbc0
Add MKLDNNhandlerT templatized class ( #19801 )
...
test=develop
5 years ago
Zeng Jinle
32b1151f5e
reduce default value of cudnn workspace size, test=develop ( #19780 )
5 years ago