Paddle

Commit Graph

Author	SHA1	Message	Date
Adam	68c6160e63	Add oneDNN fusion_gru kernel (#25594 ) * Add oneDNN fusion_gru kernel and fix fc+gru pass test=develop * Formatting changes test=develop * Lint fixes test=develop * Add memory::format_tag::any to GRU weights test=develop * Fix build with CUDA * Fix build with CUDA v2	5 years ago
Zhaolong Xing	358bc06c72	[CUDNN8 support] : support CUDNN8 (#25664 ) * cunn8 support test=develop * fix ci error test=develop	5 years ago
Pei Yang	b717895f64	Fix registering trt plugin (#25744 ) * develop dynamic shape serilization * add test param for gelu * fix bugs * delete redundant comments * debug * fix conflict. test=develop * fix bug. test=develop * add trt dynamic shape serialized support * fix ernie serialized bug test=develop * fix codestyle test=develop * fix bug test=develop * fix bug.test=develop * modify cmakelist test=develop * fix bug test=develop * fix error message. test=develop * fix trt register plugin based on pr#25003 * add trt dynload * fix deserialization bug of not finding plugin registration * refine code style * recover engine key in tensorrt_subgraph_pass * for ci coverage * add unittest for deserialization Co-authored-by: haozech <chenhaoze94@gmail.com>	5 years ago
Chen Weihang	9b5a65b819	refine init signal handler meg dumper (#25911 )	5 years ago
Chen Weihang	d47304e6d9	Refine paddle error stack format (#25790 ) * refine error stack format * polish compile traceback format * polish detail format	5 years ago
Chen Weihang	2469b578f5	Unified paddle error format when catch system signal (#25765 ) * unified signal error format * refine signal error message	5 years ago
Chen Weihang	1b3081b1b4	Simplify BufferedReader to improve DataLoader performance (#25648 ) * simplify buffered reader to improve DataLoader performance * fix 22 failed unittests * fix cuda pinned context condition * fix test_reader_reset failed * fix two failed unittests * change unittest place * polish error messaage * polish cast op GetExpecctedKernelType * remove debug info in unittest	5 years ago
arlesniak	e52df3b125	Added DNNL cache management for DyGraph (#25624 ) * Added DNNL cache management for DyGraph * move FLAGS_use_mkldnn to more general CMakeLists, getu use of the flag in ClearGradients * missing file * Fixes after review * Bringing back original idea of place for 'use_mkldnn' flag to be accessible from platform nad imperative. * Removed duplicate and added docs * Fixes for CI	5 years ago
joanna.wozna.intel	e5bbffa84c	Add NOMINMAX define due to windows.h max/min macro conflict (#25637 ) test=develop	5 years ago
Chen Weihang	a6abd92dfd	Polish install error hint message (#25531 ) * polish install error hint msg, test=develop * fix variable error, test=develop * polish hint messgae again	5 years ago
Jacek Czaja	7dbc441eab	[oneDNN] cache cosmetics improvement (#25576 )	5 years ago
LielinJiang	7129f544f0	Add bilateral_slice op (#25401 ) * add bilateral slice op	5 years ago
GaoWei8	c10dcff12d	refine PADDLE_ENFORCE (#25456 ) * Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop	5 years ago
Chen Weihang	0b54d54fd8	Fix index overflow bug of the CUDA kernel loop increment (#25435 ) * fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop * replace old macro & for condition, test=develop * polish details, test=develop	5 years ago
Chen Weihang	7be285a66f	remove useless property, test=develop (#25461 ) remove useless property	5 years ago
Jacek Czaja	a5d1592f6c	Added missing oneDNN format (#25450 ) test=develop	5 years ago
Chen Weihang	172d4ecb6c	remove WITH_DSO compile option (#25444 )	5 years ago
Zhen Wang	bb45af02ac	add the c++ part of Imperative QAT. test=develop (#25446 )	5 years ago
GaoWei8	ea7e532598	Refine PADDLE_ENFORCE (#25369 ) * refine PADDLE_ENFORCE test=develop	5 years ago
GaoWei8	fb70682f00	fix PADDLE_ENFORCE (#25297 ) * fix PADDLE_ENFORCE and refine the description test=develop	5 years ago
Chen Weihang	5a959f6e6e	Refactor dynamic dso search functions (#25214 ) * refactor dynamic dso search func, test=develop * polish details, test=develop * polish detail based review comments, test=develop * revert string type change, test=develop	5 years ago
Wilber	4c964abdf7	support build on arm. test=develop (#25212 )	5 years ago
Chen Weihang	353ea9e8ad	Add default cudnn lib path (#25175 ) * add default cudnn lib path, test=develop * change default path in func, test=develop * move to linux branch, test=develop * fix var error in other plat, test=develop	5 years ago
Adam	bd0b38e671	Refactor of conv fp32 oneDNN operator (#25137 ) * Refactor of conv fp32 oneDNN operator test=develop * Formatting fix test=develop * Return Enforces test=develop * GetWeights improvements test=develop	5 years ago
Tao Luo	2996315fc9	fix profiler_test on win32 (#25073 ) * remove disable profiler_test on win32 * add log * enlarge the elapsed time * Revert "add log" test=develop	5 years ago
Jacek Czaja	a7944904d3	[oneDNN]elementwise_add and elementwise_mul int8 support (#24984 ) * Start implementing int8 eltwise add test=develop * - Fix to Michal PR * - Fix test=develop * - Lint fixes test=develop * - Added checking if elementwise_mul can be used test=develop * - Added attribs to skip_attrs_set test=develop * - Improved broadcasting test=develop - fixes to compilation - fix - fix - Lint fixes test=develop * - removed redundant condition test=develop Co-authored-by: Michal Gallus <michal.gallus@intel.com>	5 years ago
hutuxian	5822862d8a	Monitor Framework (#24079 ) * Add a StatValue class in the backend to represent a stat. * Add a singleton StatRegistry to maintain the collection of stats. * For the sake of code neatness, we only support type of int and float, which can cover most of the scenarios.	5 years ago
wangchaochaohu	feba131893	fix the segment fault error of profiler in seqseq model test=develop (#24952 )	5 years ago
Zhou Wei	4058e736ff	temporarily disable these unittests failed on windows (#24942 )	5 years ago
Chen Weihang	4a702ef361	Support SelelctedRows allreduce in multi-cards imperative mode (#24690 ) * support selectedrows allreduce in multi-cards dygraph, test=develop * remove useless import modules in unittests, test=develop * add nccl cmake to get nccl version, test=develop * add if-condition to compiled correctly, test=develop * add detail version parseing for old nccl, test=develop * polish camke details, test=develop * fix remove test cmake error, test=develop * fix cmake condition, test=develop * change unittest camke list, test=develop * fix unittest cmake rule, test=develop, test=framep0	5 years ago
Chen Weihang	d1062d5278	Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759 ) * remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop * remove ci test case, test=develop * replace all LOG(FATAL) & polish message, test=develop * fix typo, test=develop * polish error info detail, test=develop	5 years ago
Zhang Ting	7d0cbfd045	fix negative framework overhead in Profiling Report (#24850 ) * fix negative framework overhead, test=develop * use overhead summary, test=develop	5 years ago
Chen Weihang	0aed095188	The third time to simplify the C ++ error stack (#24831 ) * simply C++ error stack once again, test=develop * refactor code remove string pointer and recursive, test=develop	5 years ago
Adam	b490e41c1d	Add isCached() mechanism for BatchNorm and LRN oneDNN operators (#24798 ) * Add isCached() mechanism for BatchNorm and LRN oneDNN operators test=develop * Formatting fix test=develop	5 years ago
Wilber	f8e370ac7f	[Inference] [unittest] Inference unit tests rely on dynamic libraries (#24743 )	5 years ago
Zhou Wei	d1047d0a69	add WITH_GPU for cudaerror download (#24056 )	5 years ago
wangchaochaohu	79caed6667	fix the print error of PE record_event and framework overhead in profiler test=develop (#24744 )	5 years ago
Adam	56a714a19b	Add isCached() mechanism to oneDNN pooling primitive (#24724 )	5 years ago
lidanqing	c3c61d34c1	Update PADDLE_ENFORCE in DNNL related ops (#24333 ) * Update PADDLE_ENFORCE in DNNL related ops test=develop * Abstract macro of OP_GET_PLACE_CHECK test=develop * update according to reviews * update GET_PLACE_CPU_CHECK * fix typo test=develop * revert macro test=develop	5 years ago
wangchaochaohu	dbfe5333c5	Add pe profiler Event (#24611 )	5 years ago
Adam	586b587519	Add isCached() check in Softmax handler (#24637 ) * Update isCached() to be thread freindly test=develop * Add isCached() check inside Softmax handler test=develop * Fix PaddleEnforce() message test=develop	5 years ago
Leo Chen	1d03469685	use vector instead of pointer, test=develop (#24620 )	5 years ago
Jacek Czaja	3292f0ef58	[onednn] elementwise add broadcasting support (#24594 )	5 years ago
Yiqun Liu	560c815390	Add some check for CUDA Driver API and NVRTC (#22719 ) * Add the check for whether CUDA Driver and NVRTC is available for the runtime system. * Call cuInit to initialize the CUDA Driver API before all CUDA callings. test=develop * Change the behavior when libnvrtc.so can not be found, printing a warning instead of exiting. test=develop * Do not initialize CUDA Driver API for windows and macos. test=develop * Remove the call of cuInit when entering paddle and enable the test_code_generator. test=develop * Add some built-in functions for __half. test=develop * Change save_intermediate_out to false in unittest. test=develop * Fix error reference to tempropary variable when seting including path for device_code. test=develop	5 years ago
Adam	dcf17f4813	Add isCached() mechanism to elementwise_add DNNL (#24563 ) * Add isCached() mechanism to elementwise_add test=develop * Hide code inside handler test=develop	5 years ago
pawelpiotrowicz	db2b6b6568	Hide globals & redesign restore PR (#24279 ) test=develop	5 years ago
Chen Weihang	aa0f254fbe	Add macro BOOST_GET to enrich the error information of boost :: get (#24175 ) * add new macro BOOST_GET_SAFELY & unittests, test=develop * add different macro type, test=develop * fix get macro type in executor, test=develop * four macro part change backup * using one macro for all case, test=develop * revert attribute change, test=develop * change to three func to solve gcc4.8 bug, test=develop * polish some details, test=develop	5 years ago
Pei Yang	8c296dea75	fix compile error(cpuid.h not found) on nvidia jetson platforms. test=develop (#24329 )	5 years ago
Guo Sheng	4a5de14426	Remove cusolver potrfBatched support on Windows. (#24338 ) test=develop test=win_gpu	5 years ago
Guo Sheng	1fc6cc502a	Fix cusolver loader for Windows (#24157 ) * Fix cusolver loader for Windows in dynamic_loader.cc. test=develop * Fix missing CUSOLVER_ROUTINE_EACH_R1. test=gpu test=develop * Add unsupprot for cusolver on Windows temporarily. test=develop * Fix GetCusolverDsoHandle error message. test=develop	5 years ago
石晓伟	17ac6e2580	update the analysis predictor for multi-stream support, test=develop (#24046 ) * update the analysis predictor, test=develop * update the unit test, test=develop * no priority set before the inferface determined, test=develop * interface name generalization, test=develop	5 years ago
Sylwester Fraczek	e1a7a88057	added reshape transpose matmul fuse pass (#23754 )	5 years ago
Yiqun Liu	ecfddebbef	Add the implementation of inverse (#23310 )	5 years ago
wangchaochaohu	6bf26ef156	fix warning mac compiler (#24138 )	5 years ago
Guo Sheng	a8c0fb4e86	Add cholesky_op (#23543 ) * Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop	5 years ago
wangchaochaohu	6ba7c3ac92	Reduce the construction time of function about profiler (#24117 )	5 years ago
石晓伟	34d7d6aef0	declare the stream::Priority as enum class, test=develop (#24013 )	5 years ago
Jacek Czaja	c6c65c65c7	[DNNL] Added elementwise_add mkl-dnn inplace (#23477 )	5 years ago
石晓伟	db6d867383	add boost dependency to cuda_stream (#24032 )	5 years ago
石晓伟	d2584a7082	New feature: thread local allocator, test=develop (#23989 ) * add the thread_local_allocator, test=develop * refactor the thread_local_allocator, test=develop * provides option setting strategy, test=develop	5 years ago
Zhou Wei	7817003795	Optimize the error messages of paddle CUDA API (#23816 ) * Optimize the error messages of paddle CUDA API, test=develop * fix the error messages of paddle CUDA API, test=develop * Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop * remove build_ex_string,test=develop * merge conflict,test=develop	5 years ago
Zhang Ting	b89dd86fb6	Update eigen (#23203 ) * update eigen, test=develop * remove patches, test=develop * add definition of -fabi-version, test=develop * add patch for TensorBlock.h, test=develop * test windows, test=develop * only update eigen for Linux, test=develop * add code comments, test=develop	5 years ago
石晓伟	2d01cc85c4	DeviceContext Split, test=develop (#23737 ) * supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop	5 years ago
guofei	c2a60bb1fa	Correct the wrong name in the flag comment (#22977 ) Correct the name [`FLAGS_sync_nccl_allreduce`](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/flags/others_cn.html#flags-sync-nccl-allreduce) based on the information from our official website.	5 years ago
Yi Liu	14e7041c6d	Fix CUDAHandleHolder destruction problem. (#23772 ) eagerly release cuda resources before cuda enviroment destroying test=develop	5 years ago
Michał Gallus	a63bcf9ae7	[DNNL][INT8][FP32] MatMul (#23395 ) * Initial FP32 DNNL MatMul Implementation * Implement int8 DNNL MatMul * Unify in-kernel-naming, clean UTs * MatmuL: Introduce op caching * Final adjustments test=develop * Remove dy_graph disablement test=develop * Change dnnl header name to new one test=develop * Contrain multi head check to prevent fails test=develop * Resolve dnnl header problems on MAC CI * Variable namings to kernel and skip_grad_ci added test=develop * Prevent MAC CI from failing * Prevent windows build from failing test=develop * Modify UTs to conform to the rules * Modify MatMul aux functions namings test=develop	5 years ago
littletomatodonkey	1c08a2136e	test=develop, add addmm op (#23384 ) add addmm op	5 years ago
Zeng Jinle	674355a097	fix GET_DATA_SAFELY ptr, test=develop (#23679 )	5 years ago
silingtong123	c6d14bc839	show the exception messages of cpp inference library in msvc (#23702 )	5 years ago
Tao Luo	e4f1b1c5e1	solve mklml memory leak (#23557 )	5 years ago
mozga-intel	3baaee9aab	Remove: NGraph engine from PDPD repository (#23545 ) * Remove the NGraph engine from PDPD repository 1. Each operator was removed from the operator's directory 2. Each test was removed from the unittest directory 3. The parallel executor support was removed from the PDPD 4. The CMake file was removed from the PDPD 5. The NG flags were removed from the repository test=develop * Remove ngraph from: 1. Cmake file 2. Python file test=develop	5 years ago
Zhang Ting	480530c4e3	API(place-related) error message enhancement (#23515 )	5 years ago
Chen Weihang	16315d3d9e	Delete Ref & VectorRef and add GetDataSafely (#22997 ) * delete invalid check inferface Ref & VectorRef, test=develop * fix vector ref delete error, test=develop * try the new check inferface, test=develop * change all related code with new check macro, test=develop * remove static assert, test=develop * polish detail, test=develop * skip coverage problem, test=develop * add new check macro, test=develop	5 years ago
Leo Chen	f297a33285	Dev/fix init flags (#23465 ) * fix init_gflags with 'python -c', test=develop * add test, test=develop * use sys.executable instead of python, test=develop * keep dummy, test=develop	5 years ago
Chen Weihang	7f1ad510bd	Add op inout check macro to simplify error message writing (#23430 ) * add op inout check macro, test=develop * fix enforce_test, test=develop	5 years ago
Adam	da7c73f847	Delete is_test attribute from activation operators (#23318 ) * Delete is_test from activation operators test=develop * Revent unneeded changes test=develop	5 years ago
石晓伟	5c59d2139e	reverts the commit 23177, test=develop (#23363 )	5 years ago
Yi Liu	0471476a18	fix nccl comm double free bug (#23344 ) As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.	5 years ago
wangchaochaohu	1ee2a9a424	Profiler refine (#23294 ) * refine output of profiler for child event	5 years ago
Yi Liu	2169e6fb58	Initialize global nccl_comm in PE (#23275 )	5 years ago
石晓伟	75ebb48a91	supports thread-binding stream, test=develop (#23177 )	5 years ago
Zeng Jinle	77b4dc80c9	code polish for adding const qualifier, test=develop, test=document_fix (#23248 )	5 years ago
Zeng Jinle	bba740710d	add cuda resource pool for BufferedReader, test=develop (#23152 )	5 years ago
Sylwester Fraczek	abee05a8c8	added mkldnn swish activation (#23041 )	5 years ago
Yi Liu	121b2aed4d	initialize global nccl context in dygraph (#23037 ) initialize global nccl context in dygraph test=develop	5 years ago
wangchaochaohu	99db0cf762	remove debug log test=develop (#22994 )	5 years ago
wangchaochaohu	c979c9f2b0	refine the profiler print test=develop (#22968 )	5 years ago
Zhang Ting	ca9c8b417d	fix compute ratio of profile, test=develop (#22872 )	5 years ago
wangchaochaohu	dbb0b9b3b6	refine the profiler print (#22823 ) * refine the profiler print test=develop	5 years ago
Zeng Jinle	d41d802ba3	Add flags to limit gpu memory (#22793 ) * add recorded cuda memory apis, fix typo, test=develop * add more ut, test=develop * follow comments, test=develop * fix py35 incompatible issues, test=develop	5 years ago
Zhang Ting	72ff5a09c3	fix print bug of profile, test=develop (#22804 )	5 years ago
wangchaochaohu	8456c3f4dd	polish the profiler_help code (#22811 )	5 years ago
wangchaochaohu	7578fcbac4	Profile code refine (#22800 ) * add profiler_help.h to refine the code test=develop	5 years ago
Adam	2b80e9a719	Add cpu_info without XBYAK (#22716 )	5 years ago
Zhang Ting	f97f3f9301	add framework overhead ratio in profile report (#22590 ) * add framework overhead ratio, test=develop * print GpuMemcpy overhead, test=develop	5 years ago
wangchaochaohu	611411b90e	Fusion group profile support (#22718 ) * add support for the driver api callback and fix the profiler name show bug	5 years ago
tianshuo78520a	d2ba91aad1	fix typo words (#22653 )	5 years ago
Yiqun Liu	22bbd54719	Add the support of fp16 in fusion_group (#22239 )	5 years ago
wangchaochaohu	a089072c8b	fix the profile print error (#22665 ) * fix the profile print error test=develop	5 years ago
wangchaochaohu	c65c6ae534	add flag to control profile level in python API (#22319 ) * add python flag to control profile level test=develop	5 years ago
Chen Weihang	fe685cc185	fix enforce test error, test=develop (#22610 )	5 years ago
Chen Weihang	266106da75	Fix mismatch with plus sign in the line (#22588 ) * reproduce match error, test=develop, test=document_fix * fix mismatch error, test=develop, test=document_fix	5 years ago
Wilber	de009152a7	Compile without nccl deps. [2/2] (#22484 ) Compile without nccl deps. [1/2] Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
LielinJiang	2b1386b2b2	optimize performance of interpolate op (#22436 ) * optimize interpolate op, test=develop	5 years ago
wangchaochaohu	77dd0d97bb	use enum class to replace the usage of enum in some condition test=develop (#22464 )	5 years ago
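Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. A single-machine, single-card setup can disable NCCL compilation; multi-card setups need NCCL enabled by default, and if NCCL is disabled only a single card can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago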
Michał Gallus	269db0d1d1	[DNNL] Fix accuracy in INT8 FC (#22404 ) * Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop	5 years ago
wangchaochaohu	621d3e0b66	fix the bug of profile update (#22207 ) * fix the bug of profile update test=develop	6 years ago
石晓伟	ad0dfb17c1	[Feature] Lite subgraph (#22114 )	6 years ago
Yiqun Liu	96980c2244	Polish the PADDLE_ENFORCE in fusion_group pass related codes. (#22144 ) * Polish the PADDLE_ENFORCE in fusion_group pass related codes. test=develop * Correct the unittest because of the change relu_grad's formula. test=develop	6 years ago
wangchaochaohu	c3876cf82d	add support for nested profiling event and printing in different level (#22061 ) * add support for nested profiling event and printing in different level	6 years ago
zhaoyuchen2018	3d4f2aa689	Refine stack op to improve xlnet performance, test=develop (#22142 ) stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
Zeng Jinle	4c2df8e4d4	fix allocator strategy comment, test=develop, test=document_fix (#22121 )	6 years ago
bingyanghuang	7872d06ff4	Add explanation on conv grad for dims<3 (#22125 )	6 years ago
Chen Weihang	ba8414d3a5	replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#22109 )	6 years ago
Jacek Czaja	b0b27ff699	[MKL-DNN] Conv grad and Batch Norm grad NHWC support (#22088 )	6 years ago
Zeng Jinle	9587249442	polish allocator strategy doc, test=develop, test=document_fix (#22095 )	6 years ago
Zeng Jinle	d9f5d1eb29	ag allocator by default, test=develop (#21837 )	6 years ago
Jacek Czaja	ad8a9cb82c	[MKL-DNN] Pool & LRN Grad Ops NHWC support (#21747 )	6 years ago
Yiqun Liu	d48320777e	Add the first implememtation of fusion_group op (#19621 ) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop	6 years ago
Chen Weihang	2e9082250d	polish default error msg & cublas error hint, test=develop (#22032 )	6 years ago
Chen Weihang	35ff1568e9	Add error message for cublas initialize failed (#21995 )	6 years ago
Chen Weihang	fbb42173a9	fix no hint problem when use ENFORCE for cuda, test=develop (#21994 )	6 years ago
Chen Weihang	1fd1f06f11	Rename paddle throw error macro (#21657 ) * rename paddle throw error macro, test=develop * fix new error use case, test=develop	6 years ago
Adam	e81f0228df	MKL-DNN 1.0 Update (#20162 ) * MKLDNN v1.0 rebase to Paddle 1.6 test=develop * Add hacky paddle::string::to_string() implementation * vectorize<int64-t>() -> vectorize() cleanup test=develop * PADDLE_ENFORCE and void_cast fixes test=develop * Rebase changes test=develop * Cosmetics test=develop * Delete MKL from mkldnn.cmake test=develop * CMake debug commands test=develop * Delete MKLDNN_VERBOSE and rebase fixes test=develop * Rebase fixes test=develop * Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop * Add libmkldnn.so.1 to python setup test=develop * Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop * Post rebase fixes + FC int8 changes test=develop * Fix LRN NHWC test=develop * Fix NHWC conv3d test=develop * Windows build fix + next conv3d fix test=develop * Fix conv2d on AVX2 machines test=develop	6 years ago
Zeng Jinle	97e76cb96d	refine dev_ctx.Wait() exception throw, test=develop (#21600 )	6 years ago
Huihuang Zheng	b241c7329c	Refine a Warning Which Can Occur Not Only During Init (#21546 ) As the title	6 years ago
wangchaochaohu	932aca162d	Add Branch to avoid CPU profiler warning print (#21556 ) * fix profiler warning message in cpu profile mode test=develop	6 years ago
Pei Yang	122b37ce62	make config option DisableGlogInfo() able to mute all inference logs (#21318 ) * make DisableGlogInfo able to mute all logs in inference.	6 years ago
Zhaolong Xing	c5f0293cf3	NV jetson(nano, tx2, xavier) inference compile support (#21393 ) * add jeston compile support test=develop * refine the cmake test=develop	6 years ago
Huihuang Zheng	a71f53d7ac	Add warning message when initialize GLOG failed. (#21487 ) Add warning message when initialize GLOG failed	6 years ago
Tao Luo	01fa4ead61	fix -Wno-error=sign-compare warning in gcc8 (#21434 ) * fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop	6 years ago
Jie Fang	5e813b53c5	nhwc optimization for batchnorm (#21090 )	6 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	6 years ago
wangchaochaohu	8293f21a52	Profile refine (#21258 ) * fix profile api high version test=develop	6 years ago
wangchaochaohu	e0e205ea2d	fix the profiling bug test=develop (#21396 )	6 years ago
zhouwei25	345b67b5e2	remove warning LNK4006 and warning LNK4221 (#21226 )	6 years ago
gongweibao	ed2a185248	optimize nhwc for tensor core in ConvOp and ConvGradOp (#20597 )	6 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	6 years ago
liuwei1031	d8b6cf2bcd	fix sporadically hang issue on windows(#21201 ) cudaStreamSynchronize randomly hang when used in multi-thread environment, replace it with cudaStreamQuery API on windows	6 years ago
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087 ) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
Chen Weihang	b3a3e6f60c	change cuda enforce & add example (#21142 )	6 years ago
Chen Weihang	27fa9c100b	add examples for resource exhausted error, test=develop (#21140 )	6 years ago
Chen Weihang	edd6680a71	Further simplify the C++ error info stack (#21093 ) * simplify C++ error stack by rewrite Place, test=develop * polish assignment overload func, test=develop	6 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	6 years ago
Chen Weihang	7ee25189c3	Enrich the type of error and declare the error type interfaces (#21024 ) * Enrich the type of error and declare the error type interfaces, test=develop * adjust tests to adapt new form, test=develop * add inference deps with error_codes.pb.h, test=develop * restore stack iter start pos, test=develop * polish code based review comments, test=develop	6 years ago
Adam	3fda695bb0	Add support for asymetric padding in MKLDNN pool, conv and conv_transpose (#21062 ) * Add asymetric padding support for mkldnn pooling test=develop * Add asymetric padding support for mkldnn conv test=develop * Add asymetric padding support for mkldnn conv_transpose test=develop	6 years ago
Zeng Jinle	a710ccc0cb	refine error message of allocator again, test=develop (#21023 )	6 years ago
wangchaochaohu	7695b713e1	gpu info query refine test=develop (#20904 )	6 years ago
Chen Weihang	3358455c86	Polish and arrange code in enforce.h (#20901 )	6 years ago

1041 Commits (d0a5620575a3ce94e0a7a5a20192e9307b0b9c93)