296 Commits (2002e71da825ef102e27f6318523369f893338dc)

Author SHA1 Message Date
dzhwinter e23ddf6ae4 status (#12764)
7 years ago
Tao Luo d04ef276a5 Merge pull request #12745 from tensor-tang/refine/op/elewise_mul
7 years ago
dzhwinter 00463fdfe3 cudnn windows support (#12757)
7 years ago
dzhwinter 2673798ddb "fix float16 ShuffleDownSync Bug" (#12756)
7 years ago
tensor-tang 6644ce79a5 add mklml vmul
7 years ago
tensor-tang ff92b6ba81 Merge pull request #12531 from tensor-tang/refine/op/gru
7 years ago
Chen Weihang 1e961b145c Merge pull request #12591 from chenwhql/enforce_msg_polish
7 years ago
Yan Chunwei 0a641ba326 add ratio to profiler (#12701)
7 years ago
tensor-tang c588c64a76 Merge remote-tracking branch 'ups/develop' into refine/op/gru
7 years ago
chenweihang da39d84a48 refine by reviewer's advice
7 years ago
tensor-tang 1ab1d03c62 fix missing macro condition
7 years ago
Qiao Longfei e8fcb71bed Merge pull request #12620 from jacquesqiao/timeline-support-pure-cpu
7 years ago
tensor-tang 3bf3e77ac8 Merge remote-tracking branch 'ups/develop' into refine/op/gru
7 years ago
qiaolongfei 5a6c3cd9e0 fix profiler dead lock
7 years ago
tensor-tang a50889f523 introduce xbyak
7 years ago
qiaolongfei 3f2aa91970 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu
7 years ago
qiaolongfei e008600b08 optimize code
7 years ago
qiaolongfei 7c649e06c3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu
7 years ago
Sylwester Fraczek d74bb6ab9c fix ut for mkldnn 0.15 - added forcing layout NCHW in mkldnn conv tests
7 years ago
chenweihang b1dd4149b9 adjust enforce test cases
7 years ago
chenweihang 61052cdbc6 polish high frequency enforce error message
7 years ago
qiaolongfei 954d680b40 fix test_parallel_do.py
7 years ago
tensor-tang 836068569f Merge remote-tracking branch 'ups/develop' into refine/op/gru
7 years ago
qiaolongfei 1623f1ba4f Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-profiler
7 years ago
qiaolongfei 4c5bcd7859 add guard to profiler
7 years ago
tensor-tang 43cee33a23 add mkl packed gemm
7 years ago
Xin Pan caf10b474f make profiler use thread_id from g_thread_id
7 years ago
dzhwinter 6d3da458a7 Fix/float16 style (#12446)
7 years ago
dzhwinter 39ac9e39c2 float16 type support enhance (#12181)
7 years ago
tensor-tang 4f0383f52e fix unknown flag
7 years ago
tensor-tang 9788e5ab87 add flags to control num_threads
7 years ago
tensor-tang 10a1c2bb86 control omp num_threads
7 years ago
typhoonzero 54e9fd3f61 fix cudnn enforce
7 years ago
qiaolongfei a6d30a8607 profiler support cpu
7 years ago
Xin Pan 7781297c70 variants
7 years ago
Tao Luo e568acbee2 Merge pull request #12092 from velconia/add_deps_to_device_ctx
7 years ago
minqiyang 2cc6ca43a0 Add framework_proto to device context deps
7 years ago
Jacek Czaja fbe25ef510 MKLDNN: Extending Conv MKLDNN op to reuse MKLDNN primitives (#11750)
7 years ago
tensor-tang 2e418a5227 fix conflicts
7 years ago
tensor-tang 3df99e72ab Merge remote-tracking branch 'ups/develop' into refine/set_num_threads
7 years ago
dzhwinter 4ed0b62476 Move fluid::framework::InitDevices into fluid::platform (#11757)
7 years ago
dzhwinter 99a99ec7e3 "remove lapack" (#11966)
7 years ago
fengjiayi ce16b40b04 Merge pull request #11891 from JiayiFeng/dev_eof_exp
7 years ago
Yu Yang 037ce12ee4 Merge pull request #11907 from reyoung/feature/use_dev_ctx_for_op
7 years ago
yuyang18 2d0e5592b5 Use std::map for Place <--> DeviceContext
7 years ago
Xin Pan 94cb59ad09 hide utils to legacy
7 years ago
fengjiayi ed4b2475f5 add an unittest
7 years ago
fengjiayi 8553ac6a95 fix unittests
7 years ago
fengjiayi 3fab4f65a4 Add EOFException to represent EOF in C++ reader
7 years ago
Yan Chunwei 28172bbb8e add debug to replacing enforce with GLOG for debug (#11244)
7 years ago
gongweibao e2b1c5d925 fix code style (#11862)
7 years ago
mozga-intel b8a04c2fa1 Duplicated code was moved to common function
7 years ago
tensor-tang e3a96300bb move SetNumThreads to platform
7 years ago
Tao Luo 2dae8a4631 Merge pull request #11596 from tensor-tang/refine/mklml/dyload
7 years ago
Yi Wang 2625178add No NCCL on macOS (#11652)
7 years ago
Tao Luo 60647c9aa4 Merge pull request #11519 from jczaja/prv-softmax-mkldnn-grad-operator
7 years ago
chengduo da556ed6d4 enhance ParallelExecutor stable (#11637)
7 years ago
Jacek Czaja 98f3ad3ba1 - MKLDNN Softmax Grad Op
7 years ago
tensor-tang d5fb8fa778 Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout"
7 years ago
Yu Yang 9b3f48d7e6 Merge pull request #11616 from chengduoZH/fix_parallel_exe
7 years ago
tensor-tang 28a0ef9522 remove usr local lib when dynamic load lib
7 years ago
tensor-tang 90780e22ce Revert "MKLDNN layout: Support for sum operator"
7 years ago
chengduoZH c99fca5f90 Add No Mutex
7 years ago
tensor-tang 3e73a7a924 add usr local lib to dynamic search path
7 years ago
tensor-tang f503f12925 enable dynamic load mklml lib on fluid
7 years ago
mozga-intel 6512be59ec MKLDNN layout: the code-review changes
7 years ago
tensor-tang 9a25f2895c update the default cpu memory with MKLDNN
7 years ago
tensor-tang a8c2ff316f refine the initial cpu memory flag for mkldnn
7 years ago
Qiyang Min 046bb5c8cb Fix NCCLBcast hang up bug in Parallel Executor (#11377)
7 years ago
Xin Pan d2afd21021 Remove cuptiFinalize.
7 years ago
qiaolongfei 9ebbfa6bbc fix build on mac
7 years ago
tensor-tang 056dd40475 add initial memory flag in MB for infer
7 years ago
yuyang18 a1254a86ba Add lock to record_event.
7 years ago
mozga-intel 3ff9ba0e6b Mkldnn layout (#11040)
7 years ago
Xin Pan ca2d6d3c66 Merge pull request #11224 from dzhwinter/fix/cudnn
7 years ago
qingqing01 e0a32074bd Fix PADDLE_ASSERT. (#10981)
7 years ago
dzhwinter 44c662b4e1 Merge remote-tracking branch 'origin/develop' into fix/cudnn
7 years ago
Yu Yang c36dd3b338 Merge pull request #11114 from reyoung/feature/yep
7 years ago
dzhwinter 2b9ef7e249 "fix"
7 years ago
dzhwinter 75d8e8ca33 "fix compiled in manylinux"
7 years ago
dzhwinter 4777aec9be "done"
7 years ago
dzhwinter 7971d4a310 Feature/deterministic (#11205)
7 years ago
yuyang18 53dab95b75 Static DSO handle
7 years ago
yuyang18 c5115950a8 Use static for dlsym
7 years ago
yuyang18 7cf8b656a2 Remove lock in device context
7 years ago
Xin Pan 7eca286159 Merge pull request #11078 from panyx0718/improve_profiler
7 years ago
gongweibao 4fb7cc7f5e Move sync_mode device ctx from grpc server (#10881)
7 years ago
Xin Pan 75ea577fd3 allow profiler and timeline to work when dev_ctx is nullptr.
7 years ago
Xin Pan f14e579cc3 clean up
7 years ago
Xin Pan 3cb6395688 better profiler and benchmark
7 years ago
Xin Pan 0d598cf9f6 Merge pull request #10822 from panyx0718/dist_opt
7 years ago
Xin Pan 08e4970e45 follow comments
7 years ago
Xin Pan b4dd4c048d multi-thread handlerequest
7 years ago
Krzysztof Binias 0aa01929c1 Add backward
7 years ago
Tao Luo 85b6bb5886 Merge pull request #10747 from jczaja/prv-mkldnn-pooling-reuse
7 years ago
dzhwinter 0e4467eee4 "fix compile" (#10657)
7 years ago
Xin Pan 40a2ee9ae8 Merge pull request #10621 from panyx0718/fix_profile
7 years ago
Jacek Czaja 5f1333058c - Draft of reuse of pooling mkldnn operator
7 years ago
yuyang18 dfbe06ccab Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_ninja_build
7 years ago
Xin Pan 94c0a64d62 Fix a profiler race condition
7 years ago