Paddle

Commit Graph

Author	SHA1	Message	Date
fengjiayi	8553ac6a95	fix unittests	7 years ago
fengjiayi	3fab4f65a4	Add EOFException to represent EOF in C++ reader	7 years ago
Yan Chunwei	28172bbb8e	add debug to replacing enforce with GLOG for debug (#11244)	7 years ago
gongweibao	e2b1c5d925	fix code style (#11862)	7 years ago
mozga-intel	b8a04c2fa1	Duplicated code was moved to common function	7 years ago
tensor-tang	e3a96300bb	move SetNumThreads to platform	7 years ago
Tao Luo	2dae8a4631	Merge pull request #11596 from tensor-tang/refine/mklml/dyload enable dynamic load mklml lib on fluid	7 years ago
Yi Wang	2625178add	No NCCL on macOS (#11652) * Make paddle no longer depend on boost * Update enforce.h	7 years ago
Tao Luo	60647c9aa4	Merge pull request #11519 from jczaja/prv-softmax-mkldnn-grad-operator MKLDNN: SoftmaxGrad Op	7 years ago
chengduo	da556ed6d4	enhance ParallelExecutor stable (#11637)	7 years ago
Jacek Czaja	98f3ad3ba1	- MKLDNN Softmax Grad Op - Added hash function inside of MKLDNN softmax op to be used as handle for primitives stroing in a context - Style fixes to softmax mkldnn op - Fixes after review - Coding style - Fix to style - style fixes - style fix - style fixes - Fix to cody style check - Rephrasing a comment fix t obroken merge Fixes to rebase Conflicts: benchmark/fluid/models/machine_translation.py cmake/external/mkldnn.cmake paddle/fluid/operators/softmax_mkldnn_op.cc - Bumped revision of MKL-DNN up to have softmax backward primitive - Added choosing MKLDNN softmax grad operator - First reuse of softmax backward - Reinvented reusing for softmax - Fix to crash in reinvented reuse - Clang format fixes - Clang format fixes - Improved softmax mkldnn reuse mechanism - clang format fixes - Fix to broken merge - Fix	7 years ago
tensor-tang	d5fb8fa778	Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout" This reverts commit `4d8e8ee226`, reversing changes made to `d6a9f005c8`.	7 years ago
Yu Yang	9b3f48d7e6	Merge pull request #11616 from chengduoZH/fix_parallel_exe Enhance Parallel Executor stable	7 years ago
tensor-tang	28a0ef9522	remove usr local lib when dynamic load lib	7 years ago
tensor-tang	90780e22ce	Revert "MKLDNN layout: Support for sum operator"	7 years ago
chengduoZH	c99fca5f90	Add No Mutex	7 years ago
tensor-tang	3e73a7a924	add usr local lib to dynamic search path	7 years ago
tensor-tang	f503f12925	enable dynamic load mklml lib on fluid	7 years ago
mozga-intel	6512be59ec	MKLDNN layout: the code-review changes	7 years ago
tensor-tang	9a25f2895c	update the default cpu memory with MKLDNN	7 years ago
tensor-tang	a8c2ff316f	refine the initial cpu memory flag for mkldnn	7 years ago
Qiyang Min	046bb5c8cb	Fix NCCLBcast hang up bug in Parallel Executor (#11377) * 1. Create buddy allocator in each places before NcclBcast the variables 2. Check the memory usage of ALL gpus rather than the first one * 1. Make NCCLGroupGuard guards only the ncclBcast part, which avoid ncclGroupEnd blocking the exception throwing 2. NOTE the usage of NCCLGroupGuard * Remove the memory usage check of gpus * Fix code style	7 years ago
Xin Pan	d2afd21021	Remove cuptiFinalize. In cupti samples, only cuptiFlush is used. I can't find any places calling cuptiFinalize and this API can error out as not_implemented in some cuda installation.	7 years ago
qiaolongfei	9ebbfa6bbc	fix build on mac	7 years ago
tensor-tang	056dd40475	add initial memory flag in MB for infer	7 years ago
yuyang18	a1254a86ba	Add lock to record_event.	7 years ago
mozga-intel	3ff9ba0e6b	Mkldnn layout (#11040) * Add MKLDNN layout support in Paddle Add MKLDNN layout in Paddle so that MKLDNN friendly memory layout can be used in MKLDNN enabled OP kernel. Before this commit, NCHW is hardcode to be used in all MKLDNN op kernels. As a result, non-optimized execution path is selected in MKLDNN primitive which bring worse performance. Besides framework change, three MKLDNN OP kernels were updated for using new MKLDNN layout. They are conv/pool2d/batch_norm. Other MKLDNN OP kernels need be also updated in similar way to achieve best performance. * Add MKLDNN layout support in activation OP * Don't populate layout from input to output when kMKLDNN in * Refine pool mkldnn op kernel * MKLDNN layout * Remove the inferitance from tensor file * MKLDNN layout: refactoring * Remove additional #define to register new operator * Prepare mkldnn tests to work with layout	7 years ago
Xin Pan	ca2d6d3c66	Merge pull request #11224 from dzhwinter/fix/cudnn fix cudnn version issue	7 years ago
qingqing01	e0a32074bd	Fix PADDLE_ASSERT. (#10981) * Enable assertions in CUDA. * Fix PADDLE_ASSERT.	7 years ago
dzhwinter	44c662b4e1	Merge remote-tracking branch 'origin/develop' into fix/cudnn	7 years ago
Yu Yang	c36dd3b338	Merge pull request #11114 from reyoung/feature/yep Try to speed up parallel executor	7 years ago
dzhwinter	2b9ef7e249	"fix"	7 years ago
dzhwinter	75d8e8ca33	"fix compiled in manylinux"	7 years ago
dzhwinter	4777aec9be	"done"	7 years ago
dzhwinter	7971d4a310	Feature/deterministic (#11205) * "fix deterministic" * "fix ci" * "fix init"	7 years ago
yuyang18	53dab95b75	Static DSO handle	7 years ago
yuyang18	c5115950a8	Use static for dlsym	7 years ago
yuyang18	7cf8b656a2	Remove lock in device context	7 years ago
Xin Pan	7eca286159	Merge pull request #11078 from panyx0718/improve_profiler allow profiler and timeline to work when dev_ctx is nullptr.	7 years ago
gongweibao	4fb7cc7f5e	Move sync_mode device ctx from grpc server (#10881)	7 years ago
Xin Pan	75ea577fd3	allow profiler and timeline to work when dev_ctx is nullptr. Sometimes dev_ctx is not available when RecordEvent.	7 years ago
Xin Pan	f14e579cc3	clean up	7 years ago
Xin Pan	3cb6395688	better profiler and benchmark	7 years ago
Xin Pan	0d598cf9f6	Merge pull request #10822 from panyx0718/dist_opt multi-thread handlerequest	7 years ago
Xin Pan	08e4970e45	follow comments	7 years ago
Xin Pan	b4dd4c048d	multi-thread handlerequest Experiment on vgg flower, 2 trainers, 1ps. more trainer could have more speedup. After: Pass = 0, Iters = 327, Speed = (7.52) img/s Before: Pass = 0, Iters = 385, Speed = (6.77) img/s	7 years ago
Krzysztof Binias	0aa01929c1	Add backward	7 years ago
Tao Luo	85b6bb5886	Merge pull request #10747 from jczaja/prv-mkldnn-pooling-reuse Reuse of pooling mkldnn primitives	7 years ago
dzhwinter	0e4467eee4	"fix compile" (#10657)	7 years ago
Xin Pan	40a2ee9ae8	Merge pull request #10621 from panyx0718/fix_profile Fix a profiler race condition	7 years ago
Jacek Czaja	5f1333058c	- Draft of reuse of pooling mkldnn operator - Finished draft of pooling reusing of operators - Using gethash in PoolGrad added - Removed diagnostic - Added pool mkldnn grad reusing of primitives - Added diagnostic - Removed diagnostic - added dependency to mkldnn data type for pooling mkldnn - Added mkldnn memory data type determining based on template type of op - Compilation warning fix - codying style fixes	7 years ago
yuyang18	dfbe06ccab	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_ninja_build	7 years ago
Xin Pan	94c0a64d62	Fix a profiler race condition In multi-thread condition, EnableProfiler can be called after RecordEvent is constructed. In this case, RecordEvent constructor will not init anything, but RecordEvent destructor will do something since EnableProfiler was called. This PR fixes it.	7 years ago
yuyang18	dc6ce071d4	Polish cmake	7 years ago
yuyang18	7c777dd549	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/exec_strategy	7 years ago
yuyang18	08295f9877	Add build strategy	7 years ago
typhoonzero	7b0c0273f4	update by comments	7 years ago
typhoonzero	f5840d8925	follow comments	7 years ago
typhoonzero	04bde96e4c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op	7 years ago
fengjiayi	2bff03bc1e	fix a compile error (#10488)	7 years ago
chengduoZH	345737d0fe	add sync	7 years ago
typhoonzero	a135fec1fc	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op	7 years ago
typhoonzero	17009d0627	workable version	7 years ago
Xin Pan	dce0732d5e	Merge pull request #10380 from panyx0718/dist_timeline timeline for distributed training	7 years ago
typhoonzero	a529d790b6	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op	7 years ago
typhoonzero	3667578ec2	testing	7 years ago
chengduoZH	d36af62c1e	wrap_shfl_x_sync	7 years ago
typhoonzero	d9320dcd94	complete code	7 years ago
Xin Pan	5a9f17f02b	clean up	7 years ago
Xin Pan	76d8b14bce	Add timeline support for distributed training	7 years ago
chengduo	54797abd53	Merge pull request #10347 from chengduoZH/replace___shfl_with__shfl_sync Wrap __shfl	7 years ago
chengduoZH	e97c1a8ca0	fix __shfl	7 years ago
chengduoZH	0cc635497c	merge develop	7 years ago
Yiqun Liu	6084af47ef	Fix the bug when a input variable of op is dispensable. (#10268) * Fix the bug when a input variable of op is dispensable. * Add HasInputs/Outputs interfaces to OperatorBase. * Remove the unreferenced header file.	7 years ago
chengduo	4fbde42cdf	Fix __shfl_down_sync_ of cross_entropy (#10345) * fix __shfl_down_sync_ of cross_entropy * use reduceSum * "fix ci"	7 years ago
chengduoZH	b8f7fa97b6	replace __shfl with __shfl_sync	7 years ago
chengduoZH	90d73c79c3	fix shfl_sync for CUDA8.0	7 years ago
dzhwinter	eb6f9dd5de	Feature/cuda9 cudnn7 (#10140) * "re-commit " * "picked up" * "fix ci" * "fix pdb hang up issue in cuda 9"	7 years ago
Yu Yang	c02ba51de0	Merge pull request #10191 from reyoung/feature/strict_dynload Make dyload strictly use the same ABI in header	7 years ago
Yu Yang	3d53631bad	Make dyload strictly use the same ABI in header	7 years ago
gongweibao	6171705a2c	Potential bug in paddle/fluid/platform/CMakeLists.txt (#9723) * fix * nv_library * add with_gpu * revert	7 years ago
Tao Luo	44fa823841	Merge pull request #9949 from mozga-intel/mozga-intel/Mul_mkldnn Initial implementation of multiplication operator for MKLDNN	7 years ago
fengjiayi	9f11da5931	Add synchronous TensorCopy and use it in double buffer	7 years ago
mozga-intel	171471eada	Merge branch 'develop' into mozga-intel/Mul_mkldnn	7 years ago
Yu Yang	c3c7b7bd1b	Merge pull request #9928 from reyoung/feature/stablize_code Use mutex to stablize ncclCtxMap	7 years ago
mozga-intel	6e7b883bdd	Initial implementation of multiplication operator for MKLDNN	7 years ago
Tao Luo	038dbb386e	Merge pull request #9958 from luotao1/find_tensorrt auto find tensorrt library and install in user root	7 years ago
Kexin Zhao	64bf3df0f9	add print support to float16 (#9960)	7 years ago
Luo Tao	d4682247e1	auto find tensorrt library	7 years ago
Yan Chunwei	186659798f	add tensorrt build support(#9891)	7 years ago
Yu Yang	093d227a77	Use mutex to stablize ncclCtxMap	7 years ago
Yi Wang	630943c7a7	Update documentation (#9918)	7 years ago
Yi Wang	b48cf1712b	Fix cpplint errors in transform_test.cu (#9915) * Fix cpplint errors with transformer_test.cu * Update	7 years ago
Yi Wang	47609ab2b8	Document transform.h and fix cpplint errors (#9913)	7 years ago
Yu Yang	6b20b35589	Fix Transformer Hang Problem	7 years ago
Yu Yang	c64190ecbb	Polish NCCLHelper	7 years ago
Yu Yang	7483555a81	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/change_int64	7 years ago
qingqing01	129859e732	Support data type int64 in NCCL. (#9818)	7 years ago
Kexin Zhao	7ed457e77a	Fix cuda 7.5 error with cublas GEMM (#9811) * fix gemm error for cuda 7.5 * fix version number	7 years ago
Yu Yang	40e3fe173c	Make cuda_helper.h Pass cpplint	7 years ago

249 Commits (0ec1f65cf110ee4e73a7bfa03456b52111426288)