Paddle

Commit Graph

Author	SHA1	Message	Date
Zhang Ting	72ff5a09c3	fix print bug of profile, test=develop (#22804 )	5 years ago
wangchaochaohu	8456c3f4dd	polish the profiler_help code (#22811 )	5 years ago
wangchaochaohu	7578fcbac4	Profile code refine (#22800 ) * add profiler_help.h to refine the code test=develop	5 years ago
Adam	2b80e9a719	Add cpu_info without XBYAK (#22716 )	5 years ago
Zhang Ting	f97f3f9301	add framework overhead ratio in profile report (#22590 ) * add framework overhead ratio, test=develop * print GpuMemcpy overhead, test=develop	5 years ago
wangchaochaohu	611411b90e	Fusion group profile support (#22718 ) * add support for the driver api callback and fix the profiler name show bug	5 years ago
tianshuo78520a	d2ba91aad1	fix typo words (#22653 )	5 years ago
Yiqun Liu	22bbd54719	Add the support of fp16 in fusion_group (#22239 )	5 years ago
wangchaochaohu	a089072c8b	fix the profile print error (#22665 ) * fix the profile print error test=develop	5 years ago
wangchaochaohu	c65c6ae534	add flag to control profile level in python API (#22319 ) * add python flag to control profile level test=develop	5 years ago
Chen Weihang	fe685cc185	fix enforce test error, test=develop (#22610 )	5 years ago
Chen Weihang	266106da75	Fix mismatch with plus sign in the line (#22588 ) * reproduce match error, test=develop, test=document_fix * fix mismatch error, test=develop, test=document_fix	5 years ago
Wilber	de009152a7	Compile without nccl deps. [2/2] (#22484 ) Compile without nccl deps. [1/2] Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
LielinJiang	2b1386b2b2	optimize performance of interpolate op (#22436 ) * optimize interpolate op, test=develop	5 years ago
wangchaochaohu	77dd0d97bb	use enum class to replace the usage of enum in some condition test=develop (#22464 )	5 years ago
Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡 Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
Michał Gallus	269db0d1d1	[DNNL] Fix accuracy in INT8 FC (#22404 ) * Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop	5 years ago
wangchaochaohu	621d3e0b66	fix the bug of profile update (#22207 ) * fix the bug of profile update test=develop	5 years ago
石晓伟	ad0dfb17c1	[Feature] Lite subgraph (#22114 )	5 years ago
Yiqun Liu	96980c2244	Polish the PADDLE_ENFORCE in fusion_group pass related codes. (#22144 ) * Polish the PADDLE_ENFORCE in fusion_group pass related codes. test=develop * Correct the unittest because of the change relu_grad's formula. test=develop	5 years ago
wangchaochaohu	c3876cf82d	add support for nested profiling event and printing in different level (#22061 ) * add support for nested profiling event and printing in different level	5 years ago
zhaoyuchen2018	3d4f2aa689	Refine stack op to improve xlnet performance, test=develop (#22142 ) stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Zeng Jinle	4c2df8e4d4	fix allocator strategy comment, test=develop, test=document_fix (#22121 )	5 years ago
bingyanghuang	7872d06ff4	Add explanation on conv grad for dims<3 (#22125 )	5 years ago
Chen Weihang	ba8414d3a5	replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#22109 )	5 years ago
Jacek Czaja	b0b27ff699	[MKL-DNN] Conv grad and Batch Norm grad NHWC support (#22088 )	5 years ago
Zeng Jinle	9587249442	polish allocator strategy doc, test=develop, test=document_fix (#22095 )	5 years ago
Zeng Jinle	d9f5d1eb29	ag allocator by default, test=develop (#21837 )	5 years ago
Jacek Czaja	ad8a9cb82c	[MKL-DNN] Pool & LRN Grad Ops NHWC support (#21747 )	5 years ago
Yiqun Liu	d48320777e	Add the first implememtation of fusion_group op (#19621 ) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop	5 years ago
Chen Weihang	2e9082250d	polish default error msg & cublas error hint, test=develop (#22032 )	5 years ago
Chen Weihang	35ff1568e9	Add error message for cublas inItizalize failed (#21995 )	5 years ago
Chen Weihang	fbb42173a9	fix no hint problem when use ENFORCE for cuda, test=develop (#21994 )	5 years ago
Chen Weihang	1fd1f06f11	Rename paddle throw error macro (#21657 ) * rename paddle throw error macro, test=develop * fix new error use case, test=develop	6 years ago
Adam	e81f0228df	MKL-DNN 1.0 Update (#20162 ) * MKLDNN v1.0 rebase to Paddle 1.6 test=develop * Add hacky paddle::string::to_string() implementation * vectorize<int64-t>() -> vectorize() cleanup test=develop * PADDLE_ENFORCE and void_cast fixes test=develop * Rebase changes test=develop * Cosmetics test=develop * Delete MKL from mkldnn.cmake test=develop * CMake debug commands test=develop * Delete MKLDNN_VERBOSE and rebase fixes test=develop * Rebase fixes test=develop * Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop * Add libmkldnn.so.1 to python setup test=develop * Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop * Post rebase fixes + FC int8 changes test=develop * Fix LRN NHWC test=develop * Fix NHWC conv3d test=develop * Windows build fix + next conv3d fix test=develop * Fix conv2d on AVX2 machines test=develop	6 years ago
Zeng Jinle	97e76cb96d	refine dev_ctx.Wait() exception throw, test=develop (#21600 )	6 years ago
Huihuang Zheng	b241c7329c	Refine a Warning Which Can Occur Not Only During Init (#21546 ) As the title	6 years ago
wangchaochaohu	932aca162d	Add Branch to avoid CPU profiler warning print (#21556 ) * fix profiler warning message in cpu profile mode test=develop	6 years ago
Pei Yang	122b37ce62	make config option DisableGlogInfo() able to mute all inference logs (#21318 ) * make DisableGlogInfo able to mute all logs in inference.	6 years ago
Zhaolong Xing	c5f0293cf3	NV jetson(nano, tx2, xavier) inference compile support (#21393 ) * add jeston compile support test=develop * refine the cmake test=develop	6 years ago
Huihuang Zheng	a71f53d7ac	Add warning message when initialize GLOG failed. (#21487 ) Add warning message when initialize GLOG failed	6 years ago
Tao Luo	01fa4ead61	fix -Wno-error=sign-compare warning in gcc8 (#21434 ) * fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop	6 years ago
Jie Fang	5e813b53c5	nhwc optimization for batchnorm (#21090 )	6 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	6 years ago
wangchaochaohu	8293f21a52	Profile refine (#21258 ) * fix profile api high version test=develop	6 years ago
wangchaochaohu	e0e205ea2d	fix the profiling bug test=develop (#21396 )	6 years ago
zhouwei25	345b67b5e2	remove warning LNK4006 and warning LNK4221 (#21226 )	6 years ago
gongweibao	ed2a185248	optimize nhwc for tensor core in ConvOp and ConvGradOp (#20597 )	6 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	6 years ago
liuwei1031	d8b6cf2bcd	fix sporadically hang issue on windows(#21201 ) cudaStreamSynchronize randomly hang when used in multi-thread environment, replace it with cudaStreamQuery API on windows	6 years ago
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087 ) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
Chen Weihang	b3a3e6f60c	change cuda enforce & add example (#21142 )	6 years ago
Chen Weihang	27fa9c100b	add examples for resource exhausted error, test=develop (#21140 )	6 years ago
Chen Weihang	edd6680a71	Further simplify the C++ error info stack (#21093 ) * simplify C++ error stack by rewrite Place, test=develop * polish assignment overload func, test=develop	6 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	6 years ago
Chen Weihang	7ee25189c3	Enrich the type of error and declare the error type interfaces (#21024 ) * Enrich the type of error and declare the error type interfaces, test=develop * adjust tests to adapt new form, test=develop * add inference deps with error_codes.pb.h, test=develop * restore stack iter start pos, test=develop * polish code based review comments, test=develop	6 years ago
Adam	3fda695bb0	Add support for asymetric padding in MKLDNN pool, conv and conv_transpose (#21062 ) * Add asymetric padding support for mkldnn pooling test=develop * Add asymetric padding support for mkldnn conv test=develop * Add asymetric padding support for mkldnn conv_transpose test=develop	6 years ago
Zeng Jinle	a710ccc0cb	refine error message of allocator again, test=develop (#21023 )	6 years ago
wangchaochaohu	7695b713e1	gpu info query refine test=develop (#20904 )	6 years ago
Chen Weihang	3358455c86	Polish and arrange code in enforce.h (#20901 )	6 years ago
Chen Weihang	8b59ac3ad0	delete paddle infershape enforce marco (#20832 )	6 years ago
Chen Weihang	1d1552d106	Make formatted ENFORCE stack adapt to more situations (#20826 ) * Make formatted ENFORCE stack adapt to more situations and polish details, test=develop * restore template message position, test=develop	6 years ago
Adam	67b59ddb38	Minor MKL-DNN conv int8 performance fixes (#20753 ) test=develop	6 years ago
123malin	95e90aa102	test=develop, add communicator_is_sgd_optimizer flag (#20677 ) * test=develop, communicator_is_sgd_optimizer flags	6 years ago
wopeizl	9e5948230e	add support to gcc8, add docker env test=develop (#19807 ) * add support to gcc8, add docker env test=develop	6 years ago
WangXi	507afa8a8a	Fix dgc nan by stripping nccl from sparseReduce. (#20630 )	6 years ago
lidanqing	46e93f7c86	Revert "Refactor conv computeINT8" (#20640 ) * Revert "Refactor conv computeINT8 (#19574)" This reverts commit `2c32c2d649`. test=develop * replace PADDLE_ENFORCE test=develop	6 years ago
Jacek Czaja	a1cd27f13f	[MKL-DNN] Added mkl-dnn cache clearing when creating Executor instance (#20241 ) * - Flushing mkl-dnn cache test=develop - Disabled clearing cache for LoadModel - Added clearing of mkl-dnn cache when Executor is created test=develop - Do not clear for GPU places test=develop - compilation fix test=develop * - Moved clearing of mkl-dnn cache in destructor of executor test=develop * - Compilation fix test=develop - Reverted conditional clearing of mkl-dnn cache in Executors's destructor test=develop - compilation fix	6 years ago
Zeng Jinle	4922eb6da5	make_conv_workspace_size_configurable, test=develop (#20662 )	6 years ago
633WHU	12e4be0382	Dlpack support (#20039 ) * support dlpack to tensor and implement python interface test=develop * add unittest for _to_dlpack and from_dlpack test=develop	6 years ago
Wilber	751812a674	enable cpu machine to run paddle in gpu lib enable cpu machine to run paddle model in gpu lib	6 years ago
Zeng Jinle	1d1d221f26	refine allocator_flag, test=develop, test=document_fix (#20400 )	6 years ago
danleifeng	425279a57b	Improve elementwise operators performance in same dimensions. (#19763 ) Improve elementwise operators performance in same dimensions	6 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	6 years ago
liym27	24010472d4	fix pool2d pool3d,support asymmetric padding and channel_last (#19739 ) * fix pool2d pool3d: 1. support asymmetric padding; 2. support padding algorithm:"SAME" and "VALID"; 3. support channel_last: data_format NHWC and NDHWC; 4. support inferring shape when input with negative dims in compile time; 5. change doc of python API and c++; 6. fix bug in cuda kernel when Attr(adaptive) is true. test=develop,test=document_preview * fix 'tensors' to 'Tensors'. test=develop,test=document_preview * add test for converage ValueError.test=develop,test=document_preview * resolve conflict in test_pool2d. test=develop	6 years ago
Chen Weihang	b916335025	Paddle error message stack shaping and optimization (#19895 ) * shape and optimize paddle error message stack, test=develop * limit exception type & add unittest, test=develop * fix multi-platform problem, test=develop * fix related unnitest failed, test=develop * add doc & fix unittest errors, test=develop * fix function name error, test=develop * update tensor test exception msg compare, test=develop * remove unittest on win32, the dir format is different, test=develop * remove useless package, test=develop * add paddle enforce handler unittest, test=develop * add exception checkout, test=develop * fix coverage failed, test=develop * fix op registry test failed, test=develop * refactor whole pr, test=develop * remove test in CMakelist, test=develop * fix coverage, test=develop	6 years ago
joanna.wozna.intel	1d32897c5c	Fix test pool2d int8 mkldnn (#19976 ) * Fix conv2d+dequantize squash for residual fusion test=develop * Correct int8 input test=develop * Add if exclude or include padding in pool2d mkldnn test=develop	6 years ago
Zeng Jinle	37f76407b0	fix cuda dev_ctx allocator cmake deps, test=develop (#19953 )	6 years ago
Jacek Czaja	5b07ca9cdd	- ReImplemented pooling fwd mkldnn (#19911 ) - First implementation of BWD and FWD of pooling mkl-dnn - Compilation fix - Fix - Fix - Fix - Fix to crash - Compilation fix - Combined AcquireBacward with Fwd test=develop	6 years ago
chengduo	d7251a8e1e	Delete local execution scopes (#19749 ) * Add RecordHistoryLocalExecScopes test=develop	6 years ago
Zeng Jinle	c7f36e7c00	Add lock to cudnn handle calls (#19845 ) * refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop	6 years ago
Zeng Jinle	b25d1e758d	remove enforce.h file written, test=develop (#19897 )	6 years ago
Jacek Czaja	619c797a7f	[MKL-DNN] LRN refactoring (#19798 ) - LRN mkl-dnn kernel refactor test=develop - compilation fix - Another compilation fix - Compilation fix - another compilation fix - compilation fix - Crash fix - optional LRN mkldnn workspace - Added mid allocation - Workaround for tests - Removed gradient from is_test ut - Removed mid for inference - Reverted LRN mid removal for is_test - PADDLE_ENFORCE adjusted - Rebase to templatization commit - Compilation fix - compilation fix test=develop - lint test=develop - Fix to crash - Rebase to recent codebase - lin - lint - compilation fix	6 years ago
lidanqing	2c32c2d649	Refactor conv computeINT8 (#19574 ) * fix conflicts test=develop * change mask_bias_reorder test=develop * add ComputeMask function to make code clear test=develop * change according to reviews test=develop * change according to reviews test=develop	6 years ago
Adam	c7e688921b	Add template functions for Acquire primitive/primitive_desc (#19867 ) * Add template functions for Acquire primitive/primitive_desc test=develop * Move acquire primitive descriptor to protected section test=develop	6 years ago
Zeng Jinle	13ca364ceb	remove some flags and add comments to some flags, test=develop (#19813 )	6 years ago
Zeng Jinle	5eb381a3e2	refine reallocate of workspace size, test=develop (#19843 )	6 years ago
Adam	dfdd73cbc0	Add MKLDNNhandlerT templatized class (#19801 ) test=develop	6 years ago
Zeng Jinle	32b1151f5e	reduce default value of cudnn workspace size, test=develop (#19780 )	6 years ago
Adam	d4413a54bc	Add common CreateKey for mkldnn handlers (#19767 ) test=develop	6 years ago
Yihua Xu	0d6ea52958	Fix the definition issue when used mkl_scsrmm and mkl_dcsrmm functions. (#19774 ) test=develop	6 years ago
Jacek Czaja	9e4c958552	Refactoring activation mkldnn op (#19748 ) test=develop - fix to BWD test=develop	6 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Adam	428b2b9e17	MKLDNN handler cleanup (#19713 ) * MKLDNN handler cleanup * MKLDNN handler cleanup test=develop	6 years ago
XiaoguangHu	27235cf222	Add document annotations for FLAGS that need to be open to external developers test=develop (#19692 ) Add document annotations for FLAGS that need to be open to external developers	6 years ago
Tao Luo	f05d2c519d	paddle::framework::vectorize() templatization [PART3] (#19643 ) * paddle::framework::vectorize() templatization test=develop * update pybind/imperative.cc test=develop * revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop	6 years ago
Yiqun Liu	42b5bec6f9	Integrate NVRTC to support compiling CUDA kernel at runtime (#19422 ) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop	6 years ago
Tao Luo	3ae939e48a	unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631 ) * remove assert.h * change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop * fix tensorrt paddle_enforce test=develop	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Adam	e94b26daf5	using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568 ) * using MKLDNNMemoryFormat = mkldnn::memory::format changes test=develop * PADDLE_ENFORCE update test=develop	6 years ago
Tao Luo	49523ea189	replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586 ) * remove unused PADDLE_ASSERT(_IS_NOT_ERROR) * replace PADDLE_ASSERT with PADDLE_ASSERT_MSG test=develop	6 years ago
zhouwei25	84c728013c	fix the compilation issue on windows caused by mkl_CSRMM (#19533 )	6 years ago
Jacek Czaja	cef95ee30d	[MKL-DNN] Refactoring Softmax (#19312 ) * - First set of modifications - Compilation fixes - compilation fix - Another compilation fix - Moved AcquireSoftmaxPrimitiveDescriptor call into handler - MKL-DNN Softmax PD refactor test=develop - Compilation fix test=develop - another compilation fix - cosmetcis test=develop - Compilation fix - Fix to crash when softmax backward is created * - Fixes after review of softmax refactoring test=develop	6 years ago
Zeng Jinle	0a73f7202a	Add retry_allocator for gpu (#19409 ) * add retry_allocator for gpu, test=develop * follow chengduoZH's comments, test=develop * follow huihuang's comments,test=develop * change f,l in enforce.h to be file,line, test=develop * increase code coverage by adding unittests, test=develop * fix CMakeLists.txt, test=develop	6 years ago
Jacek Czaja	ecd9f330c9	[MKL-DNN] Fix to face model on AVX512 platforms (#19282 ) - Refactor step 1 - Compilation fix - Yet another compilation fix - Even more compilation fix - Lint fixes test=develop - Removed deprectaed PADDLE_ENFORCE occurance test=develop - Candidate fix to BN forward - Lint fixes test=develop - Refactoring in data_layout_transform - compilation fix - Another comppilation fix - Step further into darkness - Yet another compilation fix - Yet another compilation fix - missing header - compilation fix - Added MKLDNN -> Paddle conversion in fetch op test=develop - Compilation fix test=develop - Lint test=develop - Mul fix - Fix to MKLDNN MUL op and Elementwise MUL UT test=develop - Workaround for diffrent weights with groups representation Paddle vs MKL-DNN. test=develop - Candidate fix for 5D convolution with groups - Refactor of fix for conv3d and conv2d in fetch op test=develop - Compilation fix - Still same compilation fix - Compilation fix - Compilation fix - Reverted refactoring of fixes - Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format not NHWC test=develop - minor fix in UT test=develop - Lint fixes test=develop	6 years ago
liuwei1031	d6cb1a4122	add dynamic C runtime support on windows, test=develop (#19502 )	6 years ago
Zeng Jinle	c2c5b1b941	remove signal raise msg, test=develop (#19527 )	6 years ago
Zeng Jinle	caf59d0f3f	Add signal message to stderr (#19421 ) * add signal message to stderr, test=develop * add unittests for ugly SignalHandle, test=develop	6 years ago
Yi Liu	efb05ba258	supports multiple NCCL communicators preserved in NCCLCommContext (#19407 ) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop	6 years ago
wopeizl	b8aa37d529	save the callstack information to file when exception throws test=dev… (#19324 ) * save the callstack information to file when exception throws test=develop	6 years ago
Tao Luo	6527a7df67	replace part of PADDLE_ASSERT to PADDLE_ENFORCE (#19285 ) * replace part of PADDLE_ASSERT to PADDLE_ENFORCE test=develop * remove unused fallback_alloc_size_ * add unit-test of CUDAPinnedAllocator test=develop	6 years ago
Yihua Xu	b920395842	Use sparse matrix to implement fused emb_seq_pool operator (#19064 ) * Implement the operator with sprase matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implematation when using no-linux. test=develop * Ignore the deprecated status for windows test=develop	6 years ago
Zeng Jinle	91a0911ca3	Make PADDLE_ENFORCE_EQ support types that cannot be converted to std::string (#19243 ) * make PADDLE_ENFORCE_EQ support cannot to string types, test=develop * follow huihuang's comments, test=develop	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224 )	6 years ago
Zeng Jinle	002f325dcd	add PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#19211 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
wopeizl	80b7ef6fc8	add tensorrt support for windows (#19084 ) * add tensorrt support for windows	6 years ago
Zhang Ting	c2063217e7	optimize error message for "embedding" and "cross_entropy" OP (#18765 ) * optimize error message, test=develop * optimize error message, test=develop	6 years ago
liuwei1031	a43a763b54	fix warpctc.dll not found issue (#18761 ) * fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop	6 years ago
Zeng Jinle	08fa98f7cc	Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950 ) * fix gpu_info, test=develop * fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop * fix bug again for reserving size, test=develop	6 years ago
Jacek Czaja	5cf2d38594	- Removed passing X from FWD to GRAD via device context (#18911 ) test=develop - Extracted key generation from FWD and GRAD into separate function test=develop - Compilation fix test=develop - another compilation test=develop	6 years ago
Huihuang Zheng	ea6ee76fa9	GPU allocation uses fraction of available memory (#18896 ) GPU allocation uses fraction of available memory, also fix the GetUsed without lock	6 years ago
Jacek Czaja	cfcb96d2df	[MKL-DNN] Fix int8 performance regression (#18758 ) test=develop - optimization of TID to string test=develop	6 years ago
Huihuang Zheng	0d3f16f53e	Try to modify external gflags to solve CI compilation (#18872 )	6 years ago
Huihuang Zheng	cfce4994cf	Merge cuda 9/10 dockerfile with root dockerfile (#18693 ) Also fix a dependency error which may cause compile error	6 years ago
lidanqing	9ecd8ee789	change ComputeINT8 to template version to remove checking dst_datatype code (#18756 ) * change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews * reverse user_residual_memory_p and user_bias_memory_p declaration scope test=develop	6 years ago
Jacek Czaja	95c1816ec0	[MKL-DNN] Extended LRN with reusing via Acquire API (#18675 ) test=develop - compileation fix - Yet another compilation fix - Even yet another compilation fix - Surprise! Again compilation fix - lint fixes test=develop - Fix to workspace acquire of LRN test=develop - Fix to hash of BWD LRN test=develop - fix to lrn BWD PD acquire test=develop - Fixing LRN PD creation test=develop - cosmetic fix in comment test=develop - Fixes after review test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Jacek Czaja	0d8e6c9b8b	MKL-DNN upgrade to 0.20 (#18370 ) test=develop	6 years ago
zhouwei25	772e09560e	Optimize the content of error reporting information, print error code and official document web sites (#18671 ) optimize the error reporting information of cuda related API index on develop: 130ac17 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop	6 years ago
Zeng Jinle	ae58afc546	Feature/auto_growth_allocator (#18561 ) * feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop	6 years ago
liuwei1031	759530966c	print out error code of cudaGetDeviceProperties if failed (#18643 )	6 years ago
Jacek Czaja	71d883b8ef	[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585 ) * - Added partial draft of pooling acquire - Workspace support - compilation fix - Added draft of pooling backward reimplementation - Segfault fix - reverted 'any' for diff_dst crewation in pooling - Lint fixes test=develop - lint fixes test=develop - Further lint fixes test=develop * - Fixes after review test=develop * - Lint fixes test=develop * - Even more lint fixes test=develop	6 years ago
Tao Luo	076f833110	add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580 ) * add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop	6 years ago
gongweibao	c0a82748cf	Polish backwards optimizer dependency codes and use more default values. (#18255 )	6 years ago
Zeng Jinle	be24e5b391	Clean unused code of dim and place (#18565 ) * clean code of dim and place, test=develop * fix failed unittests, test=develop	6 years ago
Jacek Czaja	8869d7f735	Activations MKLDNN ops refactoring (#18191 )	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Physher	0caa08ea40	Add mkldnn int8 mul-op kernel (#17834 )	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
chengduo	55baeceddb	Enhance execution error info (#18482 ) * enhance execution error info test=develop	6 years ago
Tao Luo	3f3112ceb0	add shape_blob for cache mkldnn primitive (#18454 ) test=develop	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
Brian Liu	4bc2987d2f	Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964 ) * Fix bug in quantize kernel which cause crash in vgg16/19 model test=develop * refine the code to reduce verbose code; test=develop * remove useless code; test=develop	6 years ago
Leo Zhao	681d3553f1	Fix potential mkldnn concat/pool/conv kernel issues (#18393 ) 1. some key generation method is not aligned with PR#17965 2. enlarge ptr lifetime to avoid memory release if SetBlob fails otherwise it will get core dump. test=develop	6 years ago
HaoRen	9931bc64f5	add dependecy of collective_helper (#18365 ) * add dependecy of collective_helper * test=develop fix dependecy of collective_helper	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Jacek Czaja	c2efdfd5bc	[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146 ) * - Reusing of reuder used in elementwise_add_mkldnn - Added MKL-DNN sum prim reusing test=develop - Compilation fixes test=develop - Yet another compilation fix test=develop - Yet another compilation fix test=develo - Yet another linking fix test=develop - Final compilation fix test=develop - lint fixes test=develop - Lint fixes test=develop * - Fixes after review test=develop	6 years ago
chengduo	4978db2c10	Remove nccl dep when the number of GPU is 1 (#18158 ) * remove nccl dep when the number of GPU is 1 test=develop	6 years ago
gongweibao	f5caf3443c	Fix reinitialized ncclid error! (#18025 )	6 years ago
Jacek Czaja	84bb45c054	[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965 ) * - removed is_reusing_ * - Added TID to keys for reusing apart from softmax PD * - compilation fix * - Yet another compilation fix * - Batch Norm and Conv adapted * - Fix to softmax MT * - Fixes to MT code of MKL-DNN * - Lint fixes test=develop	6 years ago
hutuxian	969e6378b9	Pipeline Concurrency (#17402 ) Add Pipeline Concurrency Train Mode: - Cpp: pipeline_trainer & section_worker - Python: PipelineOptimizer - Add a new data_feed type: PrivateInstantDataFeed - Add a test demo of pipeline trainer and the test model is gnn - Do not support win32 now	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
Zeng Jinle	3925bd81e8	Fix cuda/cudnn version detection error (#17853 ) * fix cuda/cudnn version detection error, test=develop * fix again, test=develop	6 years ago
chengduo	d1169afaa3	remove InstallFailureSignalHandler (#17828 ) test=develop	6 years ago
Leo Zhao	50326563d5	enable mkldnn primitive reuse for platform reorder (#17826 ) test=develop	6 years ago
wangchaochaohu	c10157a5df	revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753 ) * revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop	6 years ago
chengduo	863c75168c	polish error doc (#17772 ) test=develop	6 years ago
gongweibao	0d561ef442	fix 2dconn test=develop (#17681 )	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
mozga-intel	f2694e122d	[NGraph] Enable assign operator for a ngraph, test=develop (#17437 ) * Enable assign operator for a ngraph, test=develop * Cross_entropy operators needs to be updated	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
mozga-intel	109b5aed5a	[NGraph] Enable reshape operator test=develop (#17512 )	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
qingqing01	97f0ec2357	Fix compiling error with cuDNN 5.1 (#17458 ) test=develop	6 years ago
Zeng Jinle	eab34b2df6	fix_dygraph_mem_leak, test=develop (#17396 )	6 years ago
qingqing01	e32c9888f5	Double backward of conv2d. (#17211 ) * Add conv2d_grad_grad_op * Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simply the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support to fetch empty variables，return None in Python.	6 years ago
zhaoyuchen2018	792443ef23	Refine elementwise kernel. (#16952 ) * Refine elementwise kernel. Add a simple cuda kernel if grad x and y both exist Use 2D block cuda kernel to do broadcast. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
chengduo	db5e74ab95	update assert (#17282 ) test=develop	6 years ago
baojun	7bd1d03ee5	Adding lrn op for ngraph engine (#17189 ) * added lrn op test=develop * Added CreateConstant method test=develop * avoid duplicates test=develop	6 years ago
Tao Luo	ff1661f12a	remove unused FLAGS_warpctc_dir (#17162 ) * remove unused FLAGS_warpctc_dir test=develop * remove FLAGS_warpctc_dir test=develop	6 years ago
Huihuang Zheng	e4a5332416	Fix a typo in gpu_info.cc (#17175 ) test=develop	6 years ago
Huihuang Zheng	b9494058b3	Use CudnnWorkspaceHandle in exhaustive search (#17082 ) 1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop	6 years ago
Zeng Jinle	0c335dcd2c	Make conv cudnn workspace size configurable (#17036 ) * make_conv_cudnn_ws_size_configurable, test=develop * change std::max to std::min test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
xuezhong	742d758747	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_infershape_bug2	6 years ago
xuezhong	5663fbfb0a	fix infershape bug test=develop	6 years ago
Jacek Czaja	87a44b1149	[MKL-DNN] Added reusing of primitive descriptors (fp32) (#16667 ) * - Reuse of conv PD - conv transpose pd reused - Added PD reusing of softmax and Batch Norm - Refactoring and removal of not needed routines of mkl-dnn ops test=develop - Fix to reusing conv test=develop - Lint fixes test=develop - Further lint fixes test=develop - Lint fixes test=develop - lint fixes test=develop - Lint workaround test=develop * - Fix after review on including boost as third party header test=develop * - Fix after review. Name change to something more descriptive test=develop	6 years ago
dongdaxiang	a659b37ace	make lodtensor_printer usable in gpu setting test=develop	6 years ago
Chen Weihang	0b2aec14b6	Revert "Model data cryption link all lib (#16555 )" test=develop This reverts commit `c38c7c5619`.	6 years ago
Chen Weihang	c38c7c5619	Model data cryption link all lib (#16555 ) * link the libwbaes.so into paddle * polish detail, test=develop * try fix mac_pr_ci error, test=develop * add compile option, test=develop * fix ci error, test=develop * ignore failed to find mac lib, test=develop * change cdn to bj, cdn can't get the latest version * trigger ci, test=develop * temporary delete win32 lib linking, test=develop * change https to http, test=develop * turn compile option on to off * turn compile option off to on, test=develop * try lib compiled by gcc4.8, test=develop * update lib version, test=develop * link other lib, test=develop * add setup config * delete false, test=develop * delete no_soname, test=develop * recover so name set * fix, test=develop * adjust make config, test=develop * remove link to wbaes, test=develop * remove useless define, test=develop	6 years ago
guru4elephant	76b49f02ee	Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop Train with pipe reader merge develop	6 years ago
gongweibao	fea91164b7	Fix windows compilation error! (#16546 ) * fix compiled test=develop * follow comments test=develop	6 years ago
dongdaxiang	3a79be6eb3	refine API spec test=develop	6 years ago
dongdaxiang	98dda08a85	fix pull sparse slow problem test=develop	6 years ago
dongdaxiang	93c3c7f9b3	fix dataset testcase problem test=develop	6 years ago
dongdaxiang	d739bab844	fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem test=develop	6 years ago
dongdaxiang	e3107a6ae0	fix windows compile problem test=develop	6 years ago
dongdaxiang	398004ece0	disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer test=develop	6 years ago
dongdaxiang	39362a8415	move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids test=develop	6 years ago
dongdaxiang	a0b59773af	fix code style	6 years ago
dongdaxiang	365be5d559	support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem test=develop	6 years ago
dongdaxiang	dc8cf36e4b	add more example on datagenerator test=develop	6 years ago
dongdaxiang	6bf796df14	refine print fetch list	6 years ago
dongdaxiang	cf1360643f	add printer for fetch variable	6 years ago
Jacek Czaja	2632327429	[MKL-DNN] Tensor modifications revert (#16462 ) * Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)" This reverts commit `13816dd4ac`. Apart from enabling transformer for MKL-DNN * Revert "- MKL-DNN pooling updated to set_prim_desc" This reverts commit `c63f6b2039`. Conflicts: paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc * Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)" test=develop This reverts commit `dec9cf53c8`. * - concat compilation fix - lint test=develop - Lint fixes test=develop - Lint fixes test=develop - Fix Transpose MKLDNN op test=develop	6 years ago
Zeng Jinle	69cb9792ea	Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug Revert "Fix allocator bug"	6 years ago
sneaxiy	5656fa9f7c	fix travis ci test=develop	6 years ago
Zeng Jinle	174d0d0b90	Revert "Fix allocator bug" add include headers to fix travis-ci test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
Zeng Jinle	644e8af4cf	Merge pull request #16424 from sneaxiy/fix_allocator_bug Fix allocator bug	6 years ago
nhzlx	953bdde058	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD test=develop	6 years ago
sneaxiy	2d92b6be98	merge develop test=develop	6 years ago
Zeng Jinle	c64d959343	Merge pull request #16295 from zhhsplendid/zhenghuihuang-dev-2 Add support for init_memory and re-allocate_memory	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
sneaxiy	953214ad97	add more unittest modify allocator strategy remove changes of legacy buddy_allocator test=develop	6 years ago
Wu Yi	b7baeed7bb	fix win gpu build test=develop (#16334 )	6 years ago
zhhsplendid	124f1df481	Add flags for init and re-alloc gpu test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
Wu Yi	6382b62f6b	Collective ops (#15572 ) * wip allreduce in op * wip * wip * wip * wip adding test * wip for conflict with mp mode * fix tests test=develop * fix cpu build test=develop * fix travis clang format test=develop * fix cpu build test=develop * update api.spec test=develop * delete comment test=develop * fix cpplint test=develop * fix test=develop * follow comment test=develop * add file test=develop * fix build test=develop * update test=develop * to be compatible with sync_bn, and fix mp mode in develop test=develop	6 years ago
zhhsplendid	22715487dc	add allocator flags test=develop	6 years ago
sneaxiy	fd23262e0c	merge develop, fix conflict test=develop	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago
sneaxiy	682f2dbf29	merge develop test=develop	6 years ago
sneaxiy	2c4fcaa683	merge develop	6 years ago
chengduo	0979956619	Add memory profiler (#16137 ) test=develop	6 years ago
chengduo	ad80bde824	Revert "Revert "Add Event for TensorCopy"" (#16035 ) * Revert "Revert "Add Event for TensorCopy" (#16022)" This reverts commit `e2da3a5b22`. * use default stream test=develop	6 years ago
sneaxiy	2a639d5c2a	add allocator chain to fix bug test=develop	6 years ago
chengduo	e2da3a5b22	Revert "Add Event for TensorCopy" (#16022 ) * Revert "Add Event for TensorCopy (#15953)" This reverts commit `7235fd662b`. test=develop * fix CI test=develop	6 years ago
chengduo	7235fd662b	Add Event for TensorCopy (#15953 ) Add Event for TensorCopy	6 years ago
Tao Luo	4efdebc6f6	Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt Optimize gelu operation with mkl erf	6 years ago
dzhwinter	225c11a91f	polish cudnn related code and fix bug. (#15164 ) * staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop	6 years ago
xiaolil1	6724be2b0d	INT8 Pool kernel Key Creation Optimization. (#15883 ) * Optimize key creation of INT8 pool kernel to improve the peformance of ResNet-50 and MobileNet, especially for latency. test=develop * Optimize key creation of pool fp32 grad. test=develop	6 years ago
Yihua Xu	7396788694	Optimize gelu operation with mkl erf. test=develop	6 years ago
peizhilin	c6472579c0	test=develop	6 years ago
peizhilin	b5d6e38b05	fix build issue for cudaEvent_t test=develop	6 years ago
wopeizl	3ccd8964a4	Merge pull request #15905 from wopeizl/win/fix_eigen fix build issue on windows for sample prop op	6 years ago
chengduo	8e904d322f	Remove unnecessary dependence for profiler (#15899 ) * refile profiler test=develop * follow comment test=develop	6 years ago
Xin Pan	44e7fcddc5	Merge pull request #15844 from panyx0718/infer add per kernel config and remove const_cast.	6 years ago
Jacek Czaja	dec9cf53c8	[MKL-DNN] MKL-DNN specific Tensor modification (#15429 ) * - Implemented draft of primitive desc keeping in Tensor test=develop - TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented - Added nchw and nc formats setting for sake of compatiblity Fixed unit tests - Worakaround to problem with 5D data in conv - Added 3D and 1D MKL-DNN formats for name handles for tensor test=develop - Fix to UTs test=develop - Conv fp32 op was updated Cosmetic fixes test=develop - tensor mkldnn cosmetics test=develop - Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils * - Lint fixes test=develop * - setting prim dec in Tensor , sets also layout to kMKLDNN test=develop * - Moved creation of prim desc totally out of Tensor test=develop * - Cosmetic fixes adter review test=develop	6 years ago
peizhilin	6ccdb1b947	fix build issue on windows for sample prop op test=develop	6 years ago
Dun	c6bd434ffe	add memset CUPTI && test=develop (#15868 )	6 years ago
Sylwester Fraczek	74672d1aff	Change (smart_ptr.get()) -> smart_ptr reason: dereferencing smart pointer is the same as the underlying pointer test=develop	6 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	6 years ago
chengduo	3b08c9abf4	enhance profiler (#15842 ) test=develop	6 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop	6 years ago
Tao Luo	e3dd6970fc	disable dam temporarily (#15860 ) test=develop	6 years ago
Dun Liang	35a90e06bf	test=develop	6 years ago
Dun Liang	c9080f516b	test=develop	6 years ago
Dun Liang	1c7bb0e40c	test=develop	6 years ago
Xin Pan	5eb87506bc	add per kernel config and remove const_cast. test=develop	6 years ago
Dun	a83e470405	Profiler refine and add CUDA runtime api tracer (#15301 ) * refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop	6 years ago
mozga-intel	13ec2d331b	Enable momentum operator for a ngraph engine (#15673 ) * Enable momentum operator for a ngraph engine test=develop * Update tests test=develop * Unnecessary line of the code as intended was removed test=develop	6 years ago

... 3 4 5 6 7 ...

1051 Commits (81138239db4dbb37cf659ec5688d24ce33f7ab57)