1051 Commits (81138239db4dbb37cf659ec5688d24ce33f7ab57)

Author SHA1 Message Date
Zhang Ting 72ff5a09c3
fix print bug of profile, test=develop (#22804)
5 years ago
wangchaochaohu 8456c3f4dd
polish the profiler_help code (#22811)
5 years ago
wangchaochaohu 7578fcbac4
Profile code refine (#22800)
5 years ago
Adam 2b80e9a719
Add cpu_info without XBYAK (#22716)
5 years ago
Zhang Ting f97f3f9301
add framework overhead ratio in profile report (#22590)
5 years ago
wangchaochaohu 611411b90e
Fusion group profile support (#22718)
5 years ago
tianshuo78520a d2ba91aad1
fix typo words (#22653)
5 years ago
Yiqun Liu 22bbd54719
Add the support of fp16 in fusion_group (#22239)
5 years ago
wangchaochaohu a089072c8b
fix the profile print error (#22665)
5 years ago
wangchaochaohu c65c6ae534
add flag to control profile level in python API (#22319)
5 years ago
Chen Weihang fe685cc185
fix enforce test error, test=develop (#22610)
5 years ago
Chen Weihang 266106da75
Fix mismatch with plus sign in the line (#22588)
5 years ago
Wilber de009152a7 Compile without nccl deps. [2/2] (#22484)
5 years ago
LielinJiang 2b1386b2b2
optimize performance of interpolate op (#22436)
5 years ago
wangchaochaohu 77dd0d97bb
use enum class to replace the usage of enum in some condition test=develop (#22464)
5 years ago
Wilber 7bc4b09500
add WITH_NCCL option for cmake. (#22384)
5 years ago
Michał Gallus 269db0d1d1
[DNNL] Fix accuracy in INT8 FC (#22404)
5 years ago
wangchaochaohu 621d3e0b66
fix the bug of profile update (#22207)
5 years ago
石晓伟 ad0dfb17c1
[Feature] Lite subgraph (#22114)
5 years ago
Yiqun Liu 96980c2244
Polish the PADDLE_ENFORCE in fusion_group pass related codes. (#22144)
5 years ago
wangchaochaohu c3876cf82d
add support for nested profiling event and printing in different level (#22061)
5 years ago
zhaoyuchen2018 3d4f2aa689
Refine stack op to improve xlnet performance, test=develop (#22142)
5 years ago
Zeng Jinle 4c2df8e4d4
fix allocator strategy comment, test=develop, test=document_fix (#22121)
5 years ago
bingyanghuang 7872d06ff4 Add explanation on conv grad for dims<3 (#22125)
5 years ago
Chen Weihang ba8414d3a5
replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#22109)
5 years ago
Jacek Czaja b0b27ff699 [MKL-DNN] Conv grad and Batch Norm grad NHWC support (#22088)
5 years ago
Zeng Jinle 9587249442
polish allocator strategy doc, test=develop, test=document_fix (#22095)
5 years ago
Zeng Jinle d9f5d1eb29
ag allocator by default, test=develop (#21837)
5 years ago
Jacek Czaja ad8a9cb82c [MKL-DNN] Pool & LRN Grad Ops NHWC support (#21747)
5 years ago
Yiqun Liu d48320777e
Add the first implememtation of fusion_group op (#19621)
5 years ago
Chen Weihang 2e9082250d
polish default error msg & cublas error hint, test=develop (#22032)
5 years ago
Chen Weihang 35ff1568e9 Add error message for cublas inItizalize failed (#21995)
5 years ago
Chen Weihang fbb42173a9
fix no hint problem when use ENFORCE for cuda, test=develop (#21994)
5 years ago
Chen Weihang 1fd1f06f11 Rename paddle throw error macro (#21657)
6 years ago
Adam e81f0228df MKL-DNN 1.0 Update (#20162)
6 years ago
Zeng Jinle 97e76cb96d
refine dev_ctx.Wait() exception throw, test=develop (#21600)
6 years ago
Huihuang Zheng b241c7329c
Refine a Warning Which Can Occur Not Only During Init (#21546)
6 years ago
wangchaochaohu 932aca162d
Add Branch to avoid CPU profiler warning print (#21556)
6 years ago
Pei Yang 122b37ce62
make config option DisableGlogInfo() able to mute all inference logs (#21318)
6 years ago
Zhaolong Xing c5f0293cf3
NV jetson(nano, tx2, xavier) inference compile support (#21393)
6 years ago
Huihuang Zheng a71f53d7ac
Add warning message when initialize GLOG failed. (#21487)
6 years ago
Tao Luo 01fa4ead61
fix -Wno-error=sign-compare warning in gcc8 (#21434)
6 years ago
Jie Fang 5e813b53c5 nhwc optimization for batchnorm (#21090)
6 years ago
Jacek Czaja cd43c4440e [MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375)
6 years ago
wangchaochaohu 8293f21a52
Profile refine (#21258)
6 years ago
wangchaochaohu e0e205ea2d
fix the profiling bug test=develop (#21396)
6 years ago
zhouwei25 345b67b5e2 remove warning LNK4006 and warning LNK4221 (#21226)
6 years ago
gongweibao ed2a185248
optimize nhwc for tensor core in ConvOp and ConvGradOp (#20597)
6 years ago
Zeng Jinle cdb3d27985
Fix warn of gcc8 (#21205)
6 years ago
liuwei1031 d8b6cf2bcd
fix sporadically hang issue on windows(#21201)
6 years ago
zhaoyuchen2018 b93870e696
Improve topk performance. (#21087)
6 years ago
Chen Weihang b3a3e6f60c change cuda enforce & add example (#21142)
6 years ago
Chen Weihang 27fa9c100b
add examples for resource exhausted error, test=develop (#21140)
6 years ago
Chen Weihang edd6680a71
Further simplify the C++ error info stack (#21093)
6 years ago
joanna.wozna.intel 77c2083586 Add transpose2 INT8 for mkl-dnn (#19424)
6 years ago
Chen Weihang 7ee25189c3
Enrich the type of error and declare the error type interfaces (#21024)
6 years ago
Adam 3fda695bb0 Add support for asymetric padding in MKLDNN pool, conv and conv_transpose (#21062)
6 years ago
Zeng Jinle a710ccc0cb
refine error message of allocator again, test=develop (#21023)
6 years ago
wangchaochaohu 7695b713e1
gpu info query refine test=develop (#20904)
6 years ago
Chen Weihang 3358455c86
Polish and arrange code in enforce.h (#20901)
6 years ago
Chen Weihang 8b59ac3ad0 delete paddle infershape enforce marco (#20832)
6 years ago
Chen Weihang 1d1552d106
Make formatted ENFORCE stack adapt to more situations (#20826)
6 years ago
Adam 67b59ddb38 Minor MKL-DNN conv int8 performance fixes (#20753)
6 years ago
123malin 95e90aa102
test=develop, add communicator_is_sgd_optimizer flag (#20677)
6 years ago
wopeizl 9e5948230e
add support to gcc8, add docker env test=develop (#19807)
6 years ago
WangXi 507afa8a8a Fix dgc nan by stripping nccl from sparseReduce. (#20630)
6 years ago
lidanqing 46e93f7c86 Revert "Refactor conv computeINT8" (#20640)
6 years ago
Jacek Czaja a1cd27f13f [MKL-DNN] Added mkl-dnn cache clearing when creating Executor instance (#20241)
6 years ago
Zeng Jinle 4922eb6da5
make_conv_workspace_size_configurable, test=develop (#20662)
6 years ago
633WHU 12e4be0382 Dlpack support (#20039)
6 years ago
Wilber 751812a674
enable cpu machine to run paddle in gpu lib
6 years ago
Zeng Jinle 1d1d221f26
refine allocator_flag, test=develop, test=document_fix (#20400)
6 years ago
danleifeng 425279a57b Improve elementwise operators performance in same dimensions. (#19763)
6 years ago
qingqing01 1a3eef026c
Enable users to create custom cpp op outside framework. (#19256)
6 years ago
liym27 24010472d4 fix pool2d pool3d,support asymmetric padding and channel_last (#19739)
6 years ago
Chen Weihang b916335025 Paddle error message stack shaping and optimization (#19895)
6 years ago
joanna.wozna.intel 1d32897c5c Fix test pool2d int8 mkldnn (#19976)
6 years ago
Zeng Jinle 37f76407b0
fix cuda dev_ctx allocator cmake deps, test=develop (#19953)
6 years ago
Jacek Czaja 5b07ca9cdd - ReImplemented pooling fwd mkldnn (#19911)
6 years ago
chengduo d7251a8e1e
Delete local execution scopes (#19749)
6 years ago
Zeng Jinle c7f36e7c00
Add lock to cudnn handle calls (#19845)
6 years ago
Zeng Jinle b25d1e758d
remove enforce.h file written, test=develop (#19897)
6 years ago
Jacek Czaja 619c797a7f [MKL-DNN] LRN refactoring (#19798)
6 years ago
lidanqing 2c32c2d649 Refactor conv computeINT8 (#19574)
6 years ago
Adam c7e688921b Add template functions for Acquire primitive/primitive_desc (#19867)
6 years ago
Zeng Jinle 13ca364ceb
remove some flags and add comments to some flags, test=develop (#19813)
6 years ago
Zeng Jinle 5eb381a3e2
refine reallocate of workspace size, test=develop (#19843)
6 years ago
Adam dfdd73cbc0 Add MKLDNNhandlerT templatized class (#19801)
6 years ago
Zeng Jinle 32b1151f5e
reduce default value of cudnn workspace size, test=develop (#19780)
6 years ago
Adam d4413a54bc Add common CreateKey for mkldnn handlers (#19767)
6 years ago
Yihua Xu 0d6ea52958 Fix the definition issue when used mkl_scsrmm and mkl_dcsrmm functions. (#19774)
6 years ago
Jacek Czaja 9e4c958552 Refactoring activation mkldnn op (#19748)
6 years ago
Huihuang Zheng 12542320c5
Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Adam 428b2b9e17 MKLDNN handler cleanup (#19713)
6 years ago
XiaoguangHu 27235cf222
Add document annotations for FLAGS that need to be open to external developers test=develop (#19692)
6 years ago
Tao Luo f05d2c519d paddle::framework::vectorize() templatization [PART3] (#19643)
6 years ago
Yiqun Liu 42b5bec6f9
Integrate NVRTC to support compiling CUDA kernel at runtime (#19422)
6 years ago
Tao Luo 3ae939e48a
unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631)
6 years ago
Tao Luo 75d1571995
refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Adam e94b26daf5 using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568)
6 years ago
Tao Luo 49523ea189
replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586)
6 years ago
zhouwei25 84c728013c fix the compilation issue on windows caused by mkl_CSRMM (#19533)
6 years ago
Jacek Czaja cef95ee30d [MKL-DNN] Refactoring Softmax (#19312)
6 years ago
Zeng Jinle 0a73f7202a
Add retry_allocator for gpu (#19409)
6 years ago
Jacek Czaja ecd9f330c9 [MKL-DNN] Fix to face model on AVX512 platforms (#19282)
6 years ago
liuwei1031 d6cb1a4122
add dynamic C runtime support on windows, test=develop (#19502)
6 years ago
Zeng Jinle c2c5b1b941
remove signal raise msg, test=develop (#19527)
6 years ago
Zeng Jinle caf59d0f3f
Add signal message to stderr (#19421)
6 years ago
Yi Liu efb05ba258
supports multiple NCCL communicators preserved in NCCLCommContext (#19407)
6 years ago
wopeizl b8aa37d529
save the callstack information to file when exception throws test=dev… (#19324)
6 years ago
Tao Luo 6527a7df67
replace part of PADDLE_ASSERT to PADDLE_ENFORCE (#19285)
6 years ago
Yihua Xu b920395842 Use sparse matrix to implement fused emb_seq_pool operator (#19064)
6 years ago
Zeng Jinle 91a0911ca3
Make PADDLE_ENFORCE_EQ support types that cannot be converted to std::string (#19243)
6 years ago
Zeng Jinle 708bd9798d
move_flags_to_unified_files_for_management, test=develop (#19224)
6 years ago
Zeng Jinle 002f325dcd
add PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#19211)
6 years ago
Adam b837689e97 Add generalized Conv+Activation MKLDNN fuse pass creation (#19072)
6 years ago
gongweibao 29d8781240
Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
wopeizl 80b7ef6fc8
add tensorrt support for windows (#19084)
6 years ago
Zhang Ting c2063217e7 optimize error message for "embedding" and "cross_entropy" OP (#18765)
6 years ago
liuwei1031 a43a763b54
fix warpctc.dll not found issue (#18761)
6 years ago
Zeng Jinle 08fa98f7cc
Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950)
6 years ago
Jacek Czaja 5cf2d38594 - Removed passing X from FWD to GRAD via device context (#18911)
6 years ago
Huihuang Zheng ea6ee76fa9
GPU allocation uses fraction of available memory (#18896)
6 years ago
Jacek Czaja cfcb96d2df [MKL-DNN] Fix int8 performance regression (#18758)
6 years ago
Huihuang Zheng 0d3f16f53e
Try to modify external gflags to solve CI compilation (#18872)
6 years ago
Huihuang Zheng cfce4994cf
Merge cuda 9/10 dockerfile with root dockerfile (#18693)
6 years ago
lidanqing 9ecd8ee789 change ComputeINT8 to template version to remove checking dst_datatype code (#18756)
6 years ago
Jacek Czaja 95c1816ec0 [MKL-DNN] Extended LRN with reusing via Acquire API (#18675)
6 years ago
chengduo fd3aad6cb3
Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664)
6 years ago
Jacek Czaja 0d8e6c9b8b MKL-DNN upgrade to 0.20 (#18370)
6 years ago
zhouwei25 772e09560e Optimize the content of error reporting information, print error code and official document web sites (#18671)
6 years ago
Zeng Jinle ae58afc546
Feature/auto_growth_allocator (#18561)
6 years ago
liuwei1031 759530966c
print out error code of cudaGetDeviceProperties if failed (#18643)
6 years ago
Jacek Czaja 71d883b8ef [MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585)
6 years ago
Tao Luo 076f833110
add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
Zeng Jinle be24e5b391
Clean unused code of dim and place (#18565)
6 years ago
Jacek Czaja 8869d7f735 Activations MKLDNN ops refactoring (#18191)
6 years ago
Jiabin Yang 667f88f9a6
Fix/gcc 4.8 ubt link error (#18558)
6 years ago
Physher 0caa08ea40 Add mkldnn int8 mul-op kernel (#17834)
6 years ago
Tao Luo fe32879d2a
add mkldnn shapeblob cache clear strategy (#18513)
6 years ago
chengduo 55baeceddb
Enhance execution error info (#18482)
6 years ago
Tao Luo 3f3112ceb0
add shape_blob for cache mkldnn primitive (#18454)
6 years ago
Leo Zhao 8f5fffca0a rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453)
6 years ago
Yi Liu a873fa84ce
supports collective training with programs (#18392)
6 years ago
Brian Liu 4bc2987d2f Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964)
6 years ago
Leo Zhao 681d3553f1 Fix potential mkldnn concat/pool/conv kernel issues (#18393)
6 years ago
HaoRen 9931bc64f5 add dependecy of collective_helper (#18365)
6 years ago
Michał Gallus 8409693272 Reset DeviceContext after quantization warmup (#18182)
6 years ago
HaoRen b7128bac5f supports collective communicated training (#18175)
6 years ago
Jacek Czaja c2efdfd5bc [MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146)
6 years ago
chengduo 4978db2c10
Remove nccl dep when the number of GPU is 1 (#18158)
6 years ago
gongweibao f5caf3443c
Fix reinitialized ncclid error! (#18025)
6 years ago
Jacek Czaja 84bb45c054 [MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965)
6 years ago
hutuxian 969e6378b9
Pipeline Concurrency (#17402)
6 years ago
Zeng Jinle 3ece61f71e
Remove attribute in Allocator::Allocate (#17878)
6 years ago
Zeng Jinle 3925bd81e8
Fix cuda/cudnn version detection error (#17853)
6 years ago
chengduo d1169afaa3
remove InstallFailureSignalHandler (#17828)
6 years ago
Leo Zhao 50326563d5 enable mkldnn primitive reuse for platform reorder (#17826)
6 years ago
wangchaochaohu c10157a5df
revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753)
6 years ago
chengduo 863c75168c
polish error doc (#17772)
6 years ago
gongweibao 0d561ef442
fix 2dconn test=develop (#17681)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
wopeizl 6724a652f3
add __str__ method for tensor and lodtensor to support print test=dev… (#17588)
6 years ago
mozga-intel f2694e122d [NGraph] Enable assign operator for a ngraph, test=develop (#17437)
6 years ago
Zeng Jinle c6189637cd
Fix allocator bug (#16712)
6 years ago
mozga-intel 109b5aed5a [NGraph] Enable reshape operator test=develop (#17512)
6 years ago
guomingz 2281ebf0f3 Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130)
6 years ago
qingqing01 97f0ec2357 Fix compiling error with cuDNN 5.1 (#17458)
6 years ago
Zeng Jinle eab34b2df6
fix_dygraph_mem_leak, test=develop (#17396)
6 years ago
qingqing01 e32c9888f5
Double backward of conv2d. (#17211)
6 years ago
zhaoyuchen2018 792443ef23
Refine elementwise kernel. (#16952)
6 years ago
chengduo db5e74ab95
update assert (#17282)
6 years ago
baojun 7bd1d03ee5 Adding lrn op for ngraph engine (#17189)
6 years ago
Tao Luo ff1661f12a
remove unused FLAGS_warpctc_dir (#17162)
6 years ago
Huihuang Zheng e4a5332416
Fix a typo in gpu_info.cc (#17175)
6 years ago
Huihuang Zheng b9494058b3
Use CudnnWorkspaceHandle in exhaustive search (#17082)
6 years ago
Zeng Jinle 0c335dcd2c
Make conv cudnn workspace size configurable (#17036)
6 years ago
Zeng Jinle 1202d3fc74
Refine model gpu memory (#16993)
6 years ago
gongweibao cbdb8a17b1
Polish DGC code (#16818)
6 years ago
xuezhong 742d758747 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_infershape_bug2
6 years ago
xuezhong 5663fbfb0a fix infershape bug
6 years ago
Jacek Czaja 87a44b1149 [MKL-DNN] Added reusing of primitive descriptors (fp32) (#16667)
6 years ago
dongdaxiang a659b37ace make lodtensor_printer usable in gpu setting
6 years ago
Chen Weihang 0b2aec14b6 Revert "Model data cryption link all lib (#16555)"
6 years ago
Chen Weihang c38c7c5619
Model data cryption link all lib (#16555)
6 years ago
guru4elephant 76b49f02ee
Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop
6 years ago
gongweibao fea91164b7 Fix windows compilation error! (#16546)
6 years ago
dongdaxiang 3a79be6eb3 refine API spec
6 years ago
dongdaxiang 98dda08a85 fix pull sparse slow problem
6 years ago
dongdaxiang 93c3c7f9b3 fix dataset testcase problem
6 years ago
dongdaxiang d739bab844 fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem
6 years ago
dongdaxiang e3107a6ae0 fix windows compile problem
6 years ago
dongdaxiang 398004ece0 disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer
6 years ago
dongdaxiang 39362a8415 move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids
6 years ago
dongdaxiang a0b59773af fix code style
6 years ago
dongdaxiang 365be5d559 support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem
6 years ago
dongdaxiang dc8cf36e4b add more example on datagenerator
6 years ago
dongdaxiang 6bf796df14 refine print fetch list
6 years ago
dongdaxiang cf1360643f add printer for fetch variable
6 years ago
Jacek Czaja 2632327429 [MKL-DNN] Tensor modifications revert (#16462)
6 years ago
Zeng Jinle 69cb9792ea
Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug
6 years ago
sneaxiy 5656fa9f7c fix travis ci
6 years ago
Zeng Jinle 174d0d0b90 Revert "Fix allocator bug"
6 years ago
gongweibao eb83abeac3
Add DGC(Deep Gradient Compression) interface. (#15841)
6 years ago
Zeng Jinle 644e8af4cf
Merge pull request #16424 from sneaxiy/fix_allocator_bug
6 years ago
nhzlx 953bdde058 Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
sneaxiy 2d92b6be98 merge develop
6 years ago
Zeng Jinle c64d959343
Merge pull request #16295 from zhhsplendid/zhenghuihuang-dev-2
6 years ago
nhzlx a1d11bb175 fix ci bug: cudnn handler in multi card
6 years ago
nhzlx 3df7b98a0f Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
sneaxiy 953214ad97 add more unittest
6 years ago
Wu Yi b7baeed7bb fix win gpu build test=develop (#16334)
6 years ago
zhhsplendid 124f1df481 Add flags for init and re-alloc gpu
6 years ago
nhzlx 07dcf2856c git cherry-pick from feature/anakin-engine: update anakin subgraph #16278
6 years ago
Wu Yi 6382b62f6b
Collective ops (#15572)
6 years ago
zhhsplendid 22715487dc add allocator flags
6 years ago
sneaxiy fd23262e0c merge develop, fix conflict
6 years ago
qingqing01 86e912c544 Fix windows compiling (#16230)
6 years ago
qingqing01 8ad672a287
Support sync batch norm. (#16121)
6 years ago
sneaxiy 682f2dbf29 merge develop
6 years ago
sneaxiy 2c4fcaa683 merge develop
6 years ago
chengduo 0979956619
Add memory profiler (#16137)
6 years ago
chengduo ad80bde824
Revert "Revert "Add Event for TensorCopy"" (#16035)
6 years ago
sneaxiy 2a639d5c2a add allocator chain to fix bug
6 years ago
chengduo e2da3a5b22
Revert "Add Event for TensorCopy" (#16022)
6 years ago
chengduo 7235fd662b
Add Event for TensorCopy (#15953)
6 years ago
Tao Luo 4efdebc6f6
Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt
6 years ago
dzhwinter 225c11a91f polish cudnn related code and fix bug. (#15164)
6 years ago
xiaolil1 6724be2b0d INT8 Pool kernel Key Creation Optimization. (#15883)
6 years ago
Yihua Xu 7396788694 Optimize gelu operation with mkl erf.
6 years ago
peizhilin c6472579c0 test=develop
6 years ago
peizhilin b5d6e38b05 fix build issue for cudaEvent_t
6 years ago
wopeizl 3ccd8964a4
Merge pull request #15905 from wopeizl/win/fix_eigen
6 years ago
chengduo 8e904d322f
Remove unnecessary dependence for profiler (#15899)
6 years ago
Xin Pan 44e7fcddc5
Merge pull request #15844 from panyx0718/infer
6 years ago
Jacek Czaja dec9cf53c8 [MKL-DNN] MKL-DNN specific Tensor modification (#15429)
6 years ago
peizhilin 6ccdb1b947 fix build issue on windows for sample prop op
6 years ago
Dun c6bd434ffe
add memset CUPTI && test=develop (#15868)
6 years ago
Sylwester Fraczek 74672d1aff Change *(smart_ptr.get()) -> *smart_ptr
6 years ago
tensor-tang ee2321debd
Revert 15770 develop a6910f900 gelu mkl opt (#15872)
6 years ago
chengduo 3b08c9abf4
enhance profiler (#15842)
6 years ago
Yihua Xu 676995c86c Optimze Gelu with MKL Erf function (#15770)
6 years ago
Tao Luo e3dd6970fc disable dam temporarily (#15860)
6 years ago
Dun Liang 35a90e06bf test=develop
6 years ago
Dun Liang c9080f516b test=develop
6 years ago
Dun Liang 1c7bb0e40c test=develop
6 years ago
Xin Pan 5eb87506bc add per kernel config and remove const_cast.
6 years ago
Dun a83e470405
Profiler refine and add CUDA runtime api tracer (#15301)
6 years ago
mozga-intel 13ec2d331b Enable momentum operator for a ngraph engine (#15673)
6 years ago