Commit Graph

383 Commits (d93b2d0365355430f3db723dc3e278851b7a88b4)

Author SHA1 Message Date
qiaolongfei 5a6c3cd9e0 fix profiler dead lock
7 years ago
tensor-tang a50889f523 introduce xbyak
7 years ago
qiaolongfei 3f2aa91970 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu
7 years ago
qiaolongfei e008600b08 optimize code
7 years ago
qiaolongfei 7c649e06c3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into timeline-support-pure-cpu
7 years ago
Sylwester Fraczek d74bb6ab9c fix ut for mkldnn 0.15 - added forcing layout NCHW in mkldnn conv tests
7 years ago
chenweihang b1dd4149b9 adjust enforce test cases
7 years ago
chenweihang 61052cdbc6 polish high frequency enforce error message
7 years ago
qiaolongfei 954d680b40 fix test_parallel_do.py
7 years ago
tensor-tang 836068569f Merge remote-tracking branch 'ups/develop' into refine/op/gru
7 years ago
qiaolongfei 1623f1ba4f Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-profiler
7 years ago
qiaolongfei 4c5bcd7859 add guard to profiler
7 years ago
tensor-tang 43cee33a23 add mkl packed gemm
7 years ago
Xin Pan caf10b474f make profiler use thread_id from g_thread_id
7 years ago
dzhwinter 6d3da458a7 Fix/float16 style (#12446)
7 years ago
dzhwinter 39ac9e39c2 float16 type support enhance (#12181)
7 years ago
tensor-tang 4f0383f52e fix unknown flag
7 years ago
tensor-tang 9788e5ab87 add flags to control num_threads
7 years ago
tensor-tang 10a1c2bb86 control omp num_threads
7 years ago
typhoonzero 54e9fd3f61 fix cudnn enforce
7 years ago
qiaolongfei a6d30a8607 profiler support cpu
7 years ago
Xin Pan 7781297c70 variants
7 years ago
Tao Luo e568acbee2 Merge pull request #12092 from velconia/add_deps_to_device_ctx
7 years ago
minqiyang 2cc6ca43a0 Add framework_proto to device context deps
7 years ago
Jacek Czaja fbe25ef510 MKLDNN: Extending Conv MKLDNN op to reuse MKLDNN primitives (#11750)
7 years ago
tensor-tang 2e418a5227 fix conflicts
7 years ago
tensor-tang 3df99e72ab Merge remote-tracking branch 'ups/develop' into refine/set_num_threads
7 years ago
dzhwinter 4ed0b62476 Move fluid::framework::InitDevices into fluid::platform (#11757)
7 years ago
dzhwinter 99a99ec7e3 "remove lapack" (#11966)
7 years ago
fengjiayi ce16b40b04 Merge pull request #11891 from JiayiFeng/dev_eof_exp
7 years ago
Yu Yang 037ce12ee4 Merge pull request #11907 from reyoung/feature/use_dev_ctx_for_op
7 years ago
yuyang18 2d0e5592b5 Use std::map for Place <--> DeviceContext
7 years ago
Xin Pan 94cb59ad09 hide utils to legacy
7 years ago
fengjiayi ed4b2475f5 add an unittest
7 years ago
fengjiayi 8553ac6a95 fix unittests
7 years ago
fengjiayi 3fab4f65a4 Add EOFException to represent EOF in C++ reader
7 years ago
Yan Chunwei 28172bbb8e add debug to replacing enforce with GLOG for debug (#11244)
7 years ago
gongweibao e2b1c5d925 fix code style (#11862)
7 years ago
mozga-intel b8a04c2fa1 Duplicated code was moved to common function
7 years ago
tensor-tang e3a96300bb move SetNumThreads to platform
7 years ago
Tao Luo 2dae8a4631 Merge pull request #11596 from tensor-tang/refine/mklml/dyload
7 years ago
Yi Wang 2625178add No NCCL on macOS (#11652)
7 years ago
Tao Luo 60647c9aa4 Merge pull request #11519 from jczaja/prv-softmax-mkldnn-grad-operator
7 years ago
chengduo da556ed6d4 enhance ParallelExecutor stable (#11637)
7 years ago
Jacek Czaja 98f3ad3ba1 - MKLDNN Softmax Grad Op
7 years ago
tensor-tang d5fb8fa778 Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout"
7 years ago
Yu Yang 9b3f48d7e6 Merge pull request #11616 from chengduoZH/fix_parallel_exe
7 years ago
tensor-tang 28a0ef9522 remove usr local lib when dynamic load lib
7 years ago
tensor-tang 90780e22ce Revert "MKLDNN layout: Support for sum operator"
7 years ago
chengduoZH c99fca5f90 Add No Mutex
7 years ago
tensor-tang 3e73a7a924 add usr local lib to dynamic search path
7 years ago
tensor-tang f503f12925 enable dynamic load mklml lib on fluid
7 years ago
mozga-intel 6512be59ec MKLDNN layout: the code-review changes
7 years ago
tensor-tang 9a25f2895c update the default cpu memory with MKLDNN
7 years ago
tensor-tang a8c2ff316f refine the initial cpu memory flag for mkldnn
7 years ago
Qiyang Min 046bb5c8cb Fix NCCLBcast hang up bug in Parallel Executor (#11377)
7 years ago
Xin Pan d2afd21021 Remove cuptiFinalize.
7 years ago
qiaolongfei 9ebbfa6bbc fix build on mac
7 years ago
tensor-tang 056dd40475 add initial memory flag in MB for infer
7 years ago
yuyang18 a1254a86ba Add lock to record_event.
7 years ago
mozga-intel 3ff9ba0e6b Mkldnn layout (#11040)
7 years ago
Xin Pan ca2d6d3c66 Merge pull request #11224 from dzhwinter/fix/cudnn
7 years ago
qingqing01 e0a32074bd Fix PADDLE_ASSERT. (#10981)
7 years ago
dzhwinter 44c662b4e1 Merge remote-tracking branch 'origin/develop' into fix/cudnn
7 years ago
Yu Yang c36dd3b338 Merge pull request #11114 from reyoung/feature/yep
7 years ago
dzhwinter 2b9ef7e249 "fix"
7 years ago
dzhwinter 75d8e8ca33 "fix compiled in manylinux"
7 years ago
dzhwinter 4777aec9be "done"
7 years ago
dzhwinter 7971d4a310 Feature/deterministic (#11205)
7 years ago
yuyang18 53dab95b75 Static DSO handle
7 years ago
yuyang18 c5115950a8 Use static for dlsym
7 years ago
yuyang18 7cf8b656a2 Remove lock in device context
7 years ago
Xin Pan 7eca286159 Merge pull request #11078 from panyx0718/improve_profiler
7 years ago
gongweibao 4fb7cc7f5e Move sync_mode device ctx from grpc server (#10881)
7 years ago
Xin Pan 75ea577fd3 allow profiler and timeline to work when dev_ctx is nullptr.
7 years ago
Xin Pan f14e579cc3 clean up
7 years ago
Xin Pan 3cb6395688 better profiler and benchmark
7 years ago
Xin Pan 0d598cf9f6 Merge pull request #10822 from panyx0718/dist_opt
7 years ago
Xin Pan 08e4970e45 follow comments
7 years ago
Xin Pan b4dd4c048d multi-thread handlerequest
7 years ago
Krzysztof Binias 0aa01929c1 Add backward
7 years ago
Tao Luo 85b6bb5886 Merge pull request #10747 from jczaja/prv-mkldnn-pooling-reuse
7 years ago
dzhwinter 0e4467eee4 "fix compile" (#10657)
7 years ago
Xin Pan 40a2ee9ae8 Merge pull request #10621 from panyx0718/fix_profile
7 years ago
Jacek Czaja 5f1333058c - Draft of reuse of pooling mkldnn operator
7 years ago
yuyang18 dfbe06ccab Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_ninja_build
7 years ago
Xin Pan 94c0a64d62 Fix a profiler race condition
7 years ago
yuyang18 dc6ce071d4 Polish cmake
7 years ago
yuyang18 7c777dd549 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/exec_strategy
7 years ago
yuyang18 08295f9877 Add build strategy
7 years ago
typhoonzero 7b0c0273f4 update by comments
7 years ago
typhoonzero f5840d8925 follow comments
7 years ago
typhoonzero 04bde96e4c Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op
7 years ago
fengjiayi 2bff03bc1e fix a compile error (#10488)
7 years ago
chengduoZH 345737d0fe add sync
7 years ago
typhoonzero a135fec1fc Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op
7 years ago
typhoonzero 17009d0627 workable version
7 years ago
Xin Pan dce0732d5e Merge pull request #10380 from panyx0718/dist_timeline
7 years ago
typhoonzero a529d790b6 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gen_nccl_id_op
7 years ago
typhoonzero 3667578ec2 testing
7 years ago
chengduoZH d36af62c1e wrap_shfl_x_sync
7 years ago
typhoonzero d9320dcd94 complete code
7 years ago
Xin Pan 5a9f17f02b clean up
7 years ago
Xin Pan 76d8b14bce Add timeline support for distributed training
7 years ago
chengduo 54797abd53 Merge pull request #10347 from chengduoZH/replace___shfl_with__shfl_sync
7 years ago
chengduoZH e97c1a8ca0 fix __shfl
7 years ago
chengduoZH 0cc635497c merge develop
7 years ago
Yiqun Liu 6084af47ef Fix the bug when a input variable of op is dispensable. (#10268)
7 years ago
chengduo 4fbde42cdf Fix __shfl_down_sync_ of cross_entropy (#10345)
7 years ago
chengduoZH b8f7fa97b6 replace __shfl with __shfl_sync
7 years ago
chengduoZH 90d73c79c3 fix shfl_sync for CUDA8.0
7 years ago
dzhwinter eb6f9dd5de Feature/cuda9 cudnn7 (#10140)
7 years ago
Yu Yang c02ba51de0 Merge pull request #10191 from reyoung/feature/strict_dynload
7 years ago
Yu Yang 3d53631bad Make dyload strictly use the same ABI in header
7 years ago
gongweibao 6171705a2c Potential bug in paddle/fluid/platform/CMakeLists.txt (#9723)
7 years ago
Tao Luo 44fa823841 Merge pull request #9949 from mozga-intel/mozga-intel/Mul_mkldnn
7 years ago
fengjiayi 9f11da5931 Add synchronous TensorCopy and use it in double buffer
7 years ago
mozga-intel 171471eada Merge branch 'develop' into mozga-intel/Mul_mkldnn
7 years ago
Yu Yang c3c7b7bd1b Merge pull request #9928 from reyoung/feature/stablize_code
7 years ago
mozga-intel 6e7b883bdd Initial implementation of multiplication operator for MKLDNN
7 years ago
Tao Luo 038dbb386e Merge pull request #9958 from luotao1/find_tensorrt
7 years ago
Kexin Zhao 64bf3df0f9 add print support to float16 (#9960)
7 years ago
Luo Tao d4682247e1 auto find tensorrt library
7 years ago
Yan Chunwei 186659798f add tensorrt build support(#9891)
7 years ago
Yu Yang 093d227a77 Use mutex to stablize ncclCtxMap
7 years ago
Yi Wang 630943c7a7 Update documentation (#9918)
7 years ago
Yi Wang b48cf1712b Fix cpplint errors in transform_test.cu (#9915)
7 years ago
Yi Wang 47609ab2b8 Document transform.h and fix cpplint errors (#9913)
7 years ago
Yu Yang 6b20b35589 Fix Transformer Hang Problem
7 years ago
Yu Yang c64190ecbb Polish NCCLHelper
7 years ago
Yu Yang 7483555a81 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/change_int64
7 years ago
qingqing01 129859e732 Support data type int64 in NCCL. (#9818)
7 years ago
Kexin Zhao 7ed457e77a Fix cuda 7.5 error with cublas GEMM (#9811)
7 years ago
Yu Yang 40e3fe173c Make cuda_helper.h Pass cpplint
7 years ago
chengduo b1224da8d9 Move reduceSum to elementwise_op_function.h (#9773)
7 years ago
Kexin Zhao 0f38bb4593 add fp16 support to activation op (#9769)
7 years ago
Yi Wang 8dbd9c394e Fix part of the cpplint errors in fluid/platform (#9802)
7 years ago
qingqing01 add367c3f4 Code cleanup in the profiler code. (#9782)
7 years ago
Yi Wang 47a4ec0672 Remove call_once.h (#9764)
7 years ago
Yi Wang b1a5a3cab8 Fix cpplint errors with float16* (#9751)
7 years ago
Yi Wang 25ad6884bb Merge branch 'develop' of http://github.com/paddlepaddle/paddle into cpplint-memory-detail
7 years ago
Yi Wang 67ba884d2a Update CMakeLists
7 years ago
Yi Wang 478055bd9f Update CMakeLists.txt
7 years ago
Yi Wang 535646cf25 Update (#9717)
7 years ago
Yi Wang e185502ebe Fix cpplint errors with paddle/fluid/platform/dynload (#9715)
7 years ago
Yi Wang 0c43a376e2 Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710)
7 years ago
Yi Wang 55ffceaadb Fix cpplint errors paddle/fluid/platform/place.* (#9711)
7 years ago
Yi Wang 809962625f Fix cpplint errors of enforce.* (#9706)
7 years ago
Yi Wang ef4ee22668 Fix cpplint errors with paddle/fluid/platform/cpu_info* (#9708)
7 years ago
Kexin Zhao b2a1c9e8b7 Add float16 support to non-cudnn softmax op on GPU (#9686)
7 years ago
Yi Wang 797a7184ac Unify Fluid code to Google C++ style (#9685)
7 years ago
Kexin Zhao d00bd9eb72 Update the cuda API and enable tensor core for GEMM (#9622)
7 years ago
Lei Wang 09b4a1a361 Build: generate all the build related files into one directory. (#9512)
7 years ago
Kexin Zhao d904b3dd1d Merge pull request #9623 from kexinzhao/enable_cudnn_tensor_core
7 years ago
Kexin Zhao 9ba36604d8 fix cpplint error
7 years ago
Kexin Zhao 187ba08789 enable tensor core for conv cudnn
7 years ago
chengduoZH e099b18045 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_CUDAPinnedPlace
7 years ago
chengduoZH 2514d70ea7 follow comments
7 years ago
Luo Tao 5baa529e0e fix compiler error of profiler_test in ONLY_CPU mode
7 years ago
chengduoZH 58a9f9f781 set the max size of cudapinned memory
7 years ago
Yu Yang 7dcb217e31 Refine allreduce op
7 years ago
Yu Yang c0c2e15920 NCCL AllReduce
7 years ago
chengduoZH ab601c19c3 Add CUDAPinnedPlace
7 years ago
chengduoZH 158d6c4d19 add unit test
7 years ago
chengduoZH 18eb77303d add CUDAPinnedPlace
7 years ago
Yu Yang 50e7e25db3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
7 years ago
Darcy 8090eb6272 added proto_desc to device_tracer's dep list (#9342)
7 years ago
Yu Yang 1d8fe2a220 Enhance device context pool (#9293)
7 years ago
Yu Yang 5c333e4143 Add dctor for dev_ctx
7 years ago
Yu Yang fe7ed285d1 Extract NCCLCtxMap
7 years ago
Kexin Zhao ed2bc194c5 Merge pull request #9176 from kexinzhao/batch_norm_fp16
7 years ago
Yu Yang 6ebc6bf533 ReorganizeCode
7 years ago
Yu Yang 41ad632341 Add NCCL Group Guard
7 years ago
Yu Yang 99fe83a020 Move nccl helper
7 years ago
Yu Yang a0494f8e55 Mutex lock wait
7 years ago
Kexin Zhao d307b5e4a6 Merge remote-tracking branch 'upstream/develop' into elementwise_add_fp16
7 years ago
Kexin Zhao 182da95317 small fix
7 years ago
Kexin Zhao f2bbbb2b66 fix arithmetic operator
7 years ago
Kexin Zhao 18d616ed70 add float16 arithmetic operators on new GPU
7 years ago
Yu Yang 3aa7051b98 Remove DevCtx lock
7 years ago
Yu Yang d3e55fde03 Guard devctx
7 years ago
Yu Yang 0023c3bcf5 Use atomic bool
7 years ago
Kexin Zhao 446d54f5c3 update
7 years ago
Kexin Zhao ffa22a5f90 fix scaling param type
7 years ago
Kexin Zhao e870947cfd fix batch norm fp16 param type
7 years ago
Yu Yang 5e87cd7574 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
7 years ago
qiaolongfei a39c861530 rm unused private field in profiler
7 years ago
Kexin Zhao a13ec3432a fix test error
7 years ago
Kexin Zhao e4de5dc347 add conv2d fp16 support
7 years ago
Xin Pan d284cf88e5 Merge pull request #9037 from panyx0718/develop
7 years ago
dzhwinter 128adf53cb [Speed]implement cudnn sequence softmax cudnn (#8978)
7 years ago
Yu Yang baef1124fb ParallelExecutor And dependency engine
7 years ago
Xin Pan 4840c49b27 Better timeline
7 years ago
QI JUN 7287630e83 Repair nccl op test (#8575)
7 years ago
Kexin Zhao c88f58dbd8 add comment
7 years ago
Kexin Zhao 3b44b849d3 address comments
7 years ago
Kexin Zhao 1998d5afa2 add gpu info func to get compute cap
7 years ago
kexinzhao 90215b7844 Add float16 GEMM math function on GPU (#8695)
7 years ago
Yiqun Liu fecc9a38c6 Add test for nested RecordEvent. (#8773)
7 years ago
Xin Pan a9b9ec45ab Merge pull request #8775 from panyx0718/test2
7 years ago