Paddle

Commit Graph

Author	SHA1	Message	Date
dzhwinter	e23ddf6ae4	status (#12764)	7 years ago
Tao Luo	d04ef276a5	Merge pull request #12745 from tensor-tang/refine/op/elewise_mul Refine elementwise mul cpu forward	7 years ago
dzhwinter	00463fdfe3	cudnn windows support (#12757) * cudnn widndows * "add comment" * "windows support" * "fix cmake error"	7 years ago
dzhwinter	2673798ddb	"fix float16 ShuffleDownSync Bug" (#12756) * "fix bug" * "add test case"	7 years ago
tensor-tang	6644ce79a5	add mklml vmul	7 years ago
tensor-tang	ff92b6ba81	Merge pull request #12531 from tensor-tang/refine/op/gru Refine gru cpu forward	7 years ago
Chen Weihang	1e961b145c	Merge pull request #12591 from chenwhql/enforce_msg_polish polish high frequency enforce error message	7 years ago
Yan Chunwei	0a641ba326	add ratio to profiler (#12701)	7 years ago
tensor-tang	c588c64a76	Merge remote-tracking branch 'ups/develop' into refine/op/gru	7 years ago
chenweihang	da39d84a48	refine by reviewer's advice	7 years ago
tensor-tang	1ab1d03c62	fix missing macro condition	7 years ago
Qiao Longfei	e8fcb71bed	Merge pull request #12620 from jacquesqiao/timeline-support-pure-cpu Timeline support pure cpu	7 years ago
tensor-tang	3bf3e77ac8	Merge remote-tracking branch 'ups/develop' into refine/op/gru	7 years ago
qiaolongfei	5a6c3cd9e0	fix profiler dead lock	7 years ago
tensor-tang	a50889f523	introduce xbyak	7 years ago
qiaolongfei	3f2aa91970	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu	7 years ago
qiaolongfei	e008600b08	optimize code	7 years ago
qiaolongfei	7c649e06c3	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu	7 years ago
Sylwester Fraczek	d74bb6ab9c	fix ut for mkldnn 0.15 - added forcing layout NCHW in mkldnn conv tests	7 years ago
chenweihang	b1dd4149b9	adjust enforce test cases	7 years ago
chenweihang	61052cdbc6	polish high frequency enforce error message	7 years ago
qiaolongfei	954d680b40	fix test_parallel_do.py	7 years ago
tensor-tang	836068569f	Merge remote-tracking branch 'ups/develop' into refine/op/gru	7 years ago
qiaolongfei	1623f1ba4f	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-profiler	7 years ago
qiaolongfei	4c5bcd7859	add guard to profiler	7 years ago
tensor-tang	43cee33a23	add mkl packed gemm	7 years ago
Xin Pan	caf10b474f	make profiler use thread_id from g_thread_id Add a few more RecordEvent. Cleanup	7 years ago
dzhwinter	6d3da458a7	Fix/float16 style (#12446) * "rewrite the test case" * "follow comment"	7 years ago
dzhwinter	39ac9e39c2	float16 type support enhance (#12181) * cherry picked * "cherry picked platform" * "add comment" * "fix ci"	7 years ago
tensor-tang	4f0383f52e	fix unknown flag	7 years ago
tensor-tang	9788e5ab87	add flags to control num_threads	7 years ago
tensor-tang	10a1c2bb86	control omp num_threads	7 years ago
typhoonzero	54e9fd3f61	fix cudnn enforce	7 years ago
qiaolongfei	a6d30a8607	profiler support cpu	7 years ago
Xin Pan	7781297c70	variants	7 years ago
Tao Luo	e568acbee2	Merge pull request #12092 from velconia/add_deps_to_device_ctx Add framework_proto to device context deps	7 years ago
minqiyang	2cc6ca43a0	Add framework_proto to device context deps	7 years ago
Jacek Czaja	fbe25ef510	MKLDNN: Extending Conv MKLDNN op to reuse MKLDNN primitives (#11750) * - Rebase of conv reuse - clag formatter fixes - Fix to conv reuse - Yet another fix - Fix - Fix - clagn format * - comment update	7 years ago
tensor-tang	2e418a5227	fix conflicts	7 years ago
tensor-tang	3df99e72ab	Merge remote-tracking branch 'ups/develop' into refine/set_num_threads fix conflicts	7 years ago
dzhwinter	4ed0b62476	Move fluid::framework::InitDevices into fluid::platform (#11757) * move to platform * "move init from framework to platform" * "remove used init" * "fix ci" * "fix ci" * "fix generic" * "fix ci" * "fix ci" * "fix ci" * "disable fragile test"	7 years ago
dzhwinter	99a99ec7e3	"remove lapack" (#11966)	7 years ago
fengjiayi	ce16b40b04	Merge pull request #11891 from JiayiFeng/dev_eof_exp Add EOFException to represent EOF in C++ reader	7 years ago
Yu Yang	037ce12ee4	Merge pull request #11907 from reyoung/feature/use_dev_ctx_for_op Use std::map for Place <--> DeviceContext	7 years ago
yuyang18	2d0e5592b5	Use std::map for Place <--> DeviceContext	7 years ago
Xin Pan	94cb59ad09	hide utils to legacy	7 years ago
fengjiayi	ed4b2475f5	add an unittest	7 years ago
fengjiayi	8553ac6a95	fix unittests	7 years ago
fengjiayi	3fab4f65a4	Add EOFException to represent EOF in C++ reader	7 years ago
Yan Chunwei	28172bbb8e	add debug to replacing enforce with GLOG for debug (#11244)	7 years ago
gongweibao	e2b1c5d925	fix code style (#11862)	7 years ago
mozga-intel	b8a04c2fa1	Duplicated code was moved to common function	7 years ago
tensor-tang	e3a96300bb	move SetNumThreads to platform	7 years ago
Tao Luo	2dae8a4631	Merge pull request #11596 from tensor-tang/refine/mklml/dyload enable dynamic load mklml lib on fluid	7 years ago
Yi Wang	2625178add	No NCCL on macOS (#11652) * Make paddle no longer depend on boost * Update enforce.h	7 years ago
Tao Luo	60647c9aa4	Merge pull request #11519 from jczaja/prv-softmax-mkldnn-grad-operator MKLDNN: SoftmaxGrad Op	7 years ago
chengduo	da556ed6d4	enhance ParallelExecutor stable (#11637)	7 years ago
Jacek Czaja	98f3ad3ba1	- MKLDNN Softmax Grad Op - Added hash function inside of MKLDNN softmax op to be used as handle for primitives stroing in a context - Style fixes to softmax mkldnn op - Fixes after review - Coding style - Fix to style - style fixes - style fix - style fixes - Fix to cody style check - Rephrasing a comment fix t obroken merge Fixes to rebase Conflicts: benchmark/fluid/models/machine_translation.py cmake/external/mkldnn.cmake paddle/fluid/operators/softmax_mkldnn_op.cc - Bumped revision of MKL-DNN up to have softmax backward primitive - Added choosing MKLDNN softmax grad operator - First reuse of softmax backward - Reinvented reusing for softmax - Fix to crash in reinvented reuse - Clang format fixes - Clang format fixes - Improved softmax mkldnn reuse mechanism - clang format fixes - Fix to broken merge - Fix	7 years ago
tensor-tang	d5fb8fa778	Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout" This reverts commit `4d8e8ee226`, reversing changes made to `d6a9f005c8`.	7 years ago
Yu Yang	9b3f48d7e6	Merge pull request #11616 from chengduoZH/fix_parallel_exe Enhance Parallel Executor stable	7 years ago
tensor-tang	28a0ef9522	remove usr local lib when dynamic load lib	7 years ago
tensor-tang	90780e22ce	Revert "MKLDNN layout: Support for sum operator"	7 years ago
chengduoZH	c99fca5f90	Add No Mutex	7 years ago
tensor-tang	3e73a7a924	add usr local lib to dynamic search path	7 years ago
tensor-tang	f503f12925	enable dynamic load mklml lib on fluid	7 years ago
mozga-intel	6512be59ec	MKLDNN layout: the code-review changes	7 years ago
tensor-tang	9a25f2895c	update the default cpu memory with MKLDNN	7 years ago
tensor-tang	a8c2ff316f	refine the initial cpu memory flag for mkldnn	7 years ago
Qiyang Min	046bb5c8cb	Fix NCCLBcast hang up bug in Parallel Executor (#11377) * 1. Create buddy allocator in each places before NcclBcast the variables 2. Check the memory usage of ALL gpus rather than the first one * 1. Make NCCLGroupGuard guards only the ncclBcast part, which avoid ncclGroupEnd blocking the exception throwing 2. NOTE the usage of NCCLGroupGuard * Remove the memory usage check of gpus * Fix code style	7 years ago
Xin Pan	d2afd21021	Remove cuptiFinalize. In cupti samples, only cuptiFlush is used. I can't find any places calling cuptiFinalize and this API can error out as not_implemented in some cuda installation.	7 years ago
qiaolongfei	9ebbfa6bbc	fix build on mac	7 years ago
tensor-tang	056dd40475	add initial memory flag in MB for infer	7 years ago
yuyang18	a1254a86ba	Add lock to record_event.	7 years ago
mozga-intel	3ff9ba0e6b	Mkldnn layout (#11040) * Add MKLDNN layout support in Paddle Add MKLDNN layout in Paddle so that MKLDNN friendly memory layout can be used in MKLDNN enabled OP kernel. Before this commit, NCHW is hardcode to be used in all MKLDNN op kernels. As a result, non-optimized execution path is selected in MKLDNN primitive which bring worse performance. Besides framework change, three MKLDNN OP kernels were updated for using new MKLDNN layout. They are conv/pool2d/batch_norm. Other MKLDNN OP kernels need be also updated in similar way to achieve best performance. * Add MKLDNN layout support in activation OP * Don't populate layout from input to output when kMKLDNN in * Refine pool mkldnn op kernel * MKLDNN layout * Remove the inferitance from tensor file * MKLDNN layout: refactoring * Remove additional #define to register new operator * Prepare mkldnn tests to work with layout	7 years ago
Xin Pan	ca2d6d3c66	Merge pull request #11224 from dzhwinter/fix/cudnn fix cudnn version issue	7 years ago
qingqing01	e0a32074bd	Fix PADDLE_ASSERT. (#10981) * Enable assertions in CUDA. * Fix PADDLE_ASSERT.	7 years ago
dzhwinter	44c662b4e1	Merge remote-tracking branch 'origin/develop' into fix/cudnn	7 years ago
Yu Yang	c36dd3b338	Merge pull request #11114 from reyoung/feature/yep Try to speed up parallel executor	7 years ago
dzhwinter	2b9ef7e249	"fix"	7 years ago
dzhwinter	75d8e8ca33	"fix compiled in manylinux"	7 years ago
dzhwinter	4777aec9be	"done"	7 years ago
dzhwinter	7971d4a310	Feature/deterministic (#11205) * "fix deterministic" * "fix ci" * "fix init"	7 years ago
yuyang18	53dab95b75	Static DSO handle	7 years ago
yuyang18	c5115950a8	Use static for dlsym	7 years ago
yuyang18	7cf8b656a2	Remove lock in device context	7 years ago
Xin Pan	7eca286159	Merge pull request #11078 from panyx0718/improve_profiler allow profiler and timeline to work when dev_ctx is nullptr.	7 years ago
gongweibao	4fb7cc7f5e	Move sync_mode device ctx from grpc server (#10881)	7 years ago
Xin Pan	75ea577fd3	allow profiler and timeline to work when dev_ctx is nullptr. Sometimes dev_ctx is not available when RecordEvent.	7 years ago
Xin Pan	f14e579cc3	clean up	7 years ago
Xin Pan	3cb6395688	better profiler and benchmark	7 years ago
Xin Pan	0d598cf9f6	Merge pull request #10822 from panyx0718/dist_opt multi-thread handlerequest	7 years ago
Xin Pan	08e4970e45	follow comments	7 years ago
Xin Pan	b4dd4c048d	multi-thread handlerequest Experiment on vgg flower, 2 trainers, 1ps. more trainer could have more speedup. After: Pass = 0, Iters = 327, Speed = (7.52) img/s Before: Pass = 0, Iters = 385, Speed = (6.77) img/s	7 years ago
Krzysztof Binias	0aa01929c1	Add backward	7 years ago
Tao Luo	85b6bb5886	Merge pull request #10747 from jczaja/prv-mkldnn-pooling-reuse Reuse of pooling mkldnn primitives	7 years ago
dzhwinter	0e4467eee4	"fix compile" (#10657)	7 years ago
Xin Pan	40a2ee9ae8	Merge pull request #10621 from panyx0718/fix_profile Fix a profiler race condition	7 years ago
Jacek Czaja	5f1333058c	- Draft of reuse of pooling mkldnn operator - Finished draft of pooling reusing of operators - Using gethash in PoolGrad added - Removed diagnostic - Added pool mkldnn grad reusing of primitives - Added diagnostic - Removed diagnostic - added dependency to mkldnn data type for pooling mkldnn - Added mkldnn memory data type determining based on template type of op - Compilation warning fix - codying style fixes	7 years ago
yuyang18	dfbe06ccab	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_ninja_build	7 years ago
Xin Pan	94c0a64d62	Fix a profiler race condition In multi-thread condition, EnableProfiler can be called after RecordEvent is constructed. In this case, RecordEvent constructor will not init anything, but RecordEvent destructor will do something since EnableProfiler was called. This PR fixes it.	7 years ago

296 Commits (2002e71da825ef102e27f6318523369f893338dc)