3040 Commits (89530384008b023dc1e8c51e5a8e7e710718efff)

Author SHA1 Message Date
Zhaolong Xing c5f0293cf3
NV jetson(nano, tx2, xavier) inference compile support (#21393)
6 years ago
Tao Luo 01fa4ead61
fix -Wno-error=sign-compare warning in gcc8 (#21434)
6 years ago
wangchaochaohu d4776ec027
fix the correctness of memcpy profiling result test=develop (#21458)
6 years ago
Jie Fang 5e813b53c5 nhwc optimization for batchnorm (#21090)
6 years ago
Leo Chen e0c9d856fb
add unused input vars check for OpWithKernel, test=develop (#21169)
6 years ago
Huihuang Zheng 630be31952
Fix Cond Bug for Nested Control Flow (#21340)
6 years ago
Jacek Czaja cd43c4440e [MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375)
6 years ago
Zeng Jinle 6b09b73e17
add explicit conversion to NoNeedBufferVarsFunctor, test=develop (#21430)
6 years ago
hong ac8546701d
Add dygraph execution context (#20157)
6 years ago
Zeng Jinle 09696d5df8
Use system allocator in OpTest (#21335)
6 years ago
Tao Luo c0656dcb1a
remove -Wno-error=sign-compare, make warning as error (#21358)
6 years ago
Zeng Jinle b97fc16d21
fix lod_reset bug, test=develop (#21392)
6 years ago
Zeng Jinle 89966525f1
Polish reference count pass (#21324)
6 years ago
Youwei Song d5ff79e55e Support numpy bridge (enabled by default in dygraph mode) (#20983)
6 years ago
GaoWei8 8493f20ebc Polish the codes of fc when needs padding (#21378)
6 years ago
Michał Gallus 5d7d548275 INT8 Fully-connected (#17641)
6 years ago
GaoWei8 234060f88f Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
6 years ago
zhouwei25 345b67b5e2 remove warning LNK4006 and warning LNK4221 (#21226)
6 years ago
Thunderbrook 9a7832f8be
print table stat info for pslib (#21296)
6 years ago
Dong Daxiang 691ced87c0
Refactor fetch handler (#21264)
6 years ago
Yiqun Liu c918788ba9 Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. (#21310)
6 years ago
Chen Weihang 952508527a
Polish some PE code details (#21274)
6 years ago
Thunderbrook 0d17c1b816
solve pslib core in stop worker (#21263)
6 years ago
Thunderbrook 349e82d669
support general embedding params (#21217)
6 years ago
Yiqun Liu 6b1e1f0dda
Enable generating code for a given subgraph. (#21126)
6 years ago
Zeng Jinle a152315be7
refine Tensor method, test=develop (#21031)
6 years ago
Zeng Jinle cdb3d27985
Fix warn of gcc8 (#21205)
6 years ago
Zhaolong Xing 65f7052554
TRT int8: refine trt int8 for dynamic range set (#21112)
6 years ago
xujiaqi01 23876de55b
fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)
6 years ago
xujiaqi01 9e045170c0
add copy table (#21086)
6 years ago
Chen Weihang 4bd9463630
fix detail error message error, test=develop (#21170)
6 years ago
Chen Weihang 8da0cd537a
Add examples for error message writing specification - NotFound, OutOfRange, AlreadyExists, PermissionDenied (#21134)
6 years ago
Chen Weihang 8414575b78
Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137)
6 years ago
Chen Weihang 7e5f74b825
Add examples for error message writing specification - InvalidArgument (#21132)
6 years ago
WangXi de5d3ff688 Fix dgc buffer illegal & reuse velocity (#21012)
6 years ago
Zeng Jinle d625aaf0c1
remove so many logs of parallel executor, test=develop (#21105)
6 years ago
Yiqun Liu 35f17ae28f
Add the check of lod_level between compile-time and runtime. (#20961)
6 years ago
Chen Weihang 826254f664
Add pre-condition check for fuse optimizer op pass (#21005)
6 years ago
Yiqun Liu 9091f8cdf9
Support generating code for grad_op (#21066)
6 years ago
joanna.wozna.intel 77c2083586 Add transpose2 INT8 for mkl-dnn (#19424)
6 years ago
Chen Weihang 7ee25189c3
Enrich the type of error and declare the error type interfaces (#21024)
6 years ago
Zeng Jinle 5aae595902
fix no_need_buffer_vars_dep, test=develop, test=document_fix (#21007)
6 years ago
xujiaqi01 1d1a07937a
simplify master+patch,remove ins when size != merge_size or has conflict slot (#20913)
6 years ago
Zeng Jinle 878a40f57d
Support NoNeedBufferVarsInference in dygraph backward (#20868)
6 years ago
Wilber c534149642
fix squared_mat_sub_fuse_pass when elementwise_op input is from persistable param test=develop (#20960)
6 years ago
WangXi eec4fa9099 And Enforce to fuse pass for DGC doesn't support fuse for now, test=develop (#20935)
6 years ago
Zeng Jinle b0c0ffb9ae
refine pe when exception raises, test=develop (#20894)
6 years ago
123malin 20cdff0e02
Optimize decay (#20816)
6 years ago
hong 8c4573a3cb
GradMaker for dygraph (#19706)
6 years ago
Thunderbrook 59bcdc8a19
support dump param of model into afs (#20302)
6 years ago
Yiqun Liu 16e4d02675
Refine the cache of program, context and scope in executor. (#18483)
6 years ago
hong ff0886a92a
save load problem fix and new feature add (#20823)
6 years ago
Yiqun Liu 6fcfd32e6c
Check and correct the output's lod_level in DynamicRNN related operators (#19144)
6 years ago
Yiqun Liu b5f3be8330
Implement a pass detect fusion group of elementwise op (#19884)
6 years ago
Huihuang Zheng 95ba4bd2ab
Add shape and type check at read_op (#20754)
6 years ago
Zeng Jinle 98103d3003
remove some unnecessary logs in pe, test=develop (#20848)
6 years ago
Chen Weihang 26cc1fe508
Replace risky GetInputType method with secure IndicateVarDataType interface (#20668)
6 years ago
xujiaqi01 48669aa8f0
fix several sparse table issuses (#20686)
6 years ago
Chen Weihang 1d1552d106
Make formatted ENFORCE stack adapt to more situations (#20826)
6 years ago
Zeng Jinle ac813bbaf4
Add more error debug message to Operator::Run (#20793)
6 years ago
wangchaochaohu ba45dce35d
fix codetest for windows make test=develop (#20796)
6 years ago
zhongpu 72d1d72c09 fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)
6 years ago
石晓伟 48a774c713
fix ts_sort's bug, test=develop (#20720)
6 years ago
wopeizl 9e5948230e
add support to gcc8, add docker env test=develop (#19807)
6 years ago
xujiaqi01 5223b0dd9d
add check nan / inf in downpour worker (#20694)
6 years ago
WangXi 507afa8a8a Fix dgc nan by stripping nccl from sparseReduce. (#20630)
6 years ago
Zeng Jinle 4eeda9d676
fix tensor_util, test=develop (#20699)
6 years ago
Zeng Jinle ab575de725 Fix op run log when memory optimization strategy is enabled (#20695)
6 years ago
Jacek Czaja a1cd27f13f [MKL-DNN] Added mkl-dnn cache clearing when creating Executor instance (#20241)
6 years ago
Zeng Jinle 10505faf4e
polish codes, test=develop (#20672)
6 years ago
Chen Weihang 003f369bb2
Add IndicateVarDataType interface to block tensor is not initialized problem in OP GetExceptedKernelType (#20044)
6 years ago
Chengmo 940c6ff1c8
Fix communicator slow bug & fix communicator stop bug (#20366)
6 years ago
WangXi cadc6a9704 fix dgc test and bug when not set trainers_endpoints_, test=develop (#20617)
6 years ago
Thunderbrook f76a32df4a
dump fix dov vec file num (#20539)
6 years ago
633WHU 12e4be0382 Dlpack support (#20039)
6 years ago
Pei Yang 443f604c3b
add DisableGlogInfo() to AnalysisConfig, test=develop (#20581)
6 years ago
Zeng Jinle a9c8bdad7b
refine pe codes, test=develop (#20479)
6 years ago
Zeng Jinle 76b321872a
fix cuda dev_ctx by event, test=develop (#20553)
6 years ago
zhaoyuchen2018 b8333edef6
Add Multihead matmul fuse pass (#20167)
6 years ago
Adam 7faa3e9555 Add ConvTranspose + BatchNorm fuse pass (#20161)
6 years ago
xujiaqi01 22b80e1246
fix parse content in CreatePreLoadReaders (#20258)
6 years ago
hong fa43e80e19 New save load interface (#20148)
6 years ago
Zeng Jinle c20b11ba11
simplify op_info.h, test=develop (#20195)
6 years ago
hong 0ec2c081d9
update op compatible list; test=develop (#20175)
6 years ago
tangwei12 c9139c3db3
trainer from dataset fetch targets (#19760)
6 years ago
chengduo bfa55c9ddb Add place deps for fused_all_reduce_op_handle (#20077)
6 years ago
Zeng Jinle 5fef859c65
remove map type from var_type_traits.h, test=develop (#20090)
6 years ago
Zeng Jinle 4ad66c779c
fix op_compatiable_compile_error, test=develop (#20076)
6 years ago
qingqing01 1a3eef026c
Enable users to create custom cpp op outside framework. (#19256)
6 years ago
bingyanghuang 9de6772510 Follow comment of Merged QAT PR 18970 (#19979)
6 years ago
石晓伟 01b9d07963
update operator compatible info, test=develop (#19978)
6 years ago
joanna.wozna.intel f5221ac19f Disable conv requant squash (#20041)
6 years ago
wangchaochaohu c9ea317b36
codegen code for reconstruction (#19728)
6 years ago
tangwei12 8f0b3c0516
the integrated communicator (#19849)
6 years ago
Chen Weihang b916335025 Paddle error message stack shaping and optimization (#19895)
6 years ago
chengduo 2450d15b78
disable fuse_all_optimizer_ops (#19966)
6 years ago
chengduo 101a2b610a Add dtype for coalesce_tensor_op (#20016)
6 years ago
Huihuang Zheng 88af4ab650
Add new data layer (#19916)
6 years ago
xujiaqi01 f50e701b3b
fix memory leak in HogwildWorker (#19956)
6 years ago
xujiaqi01 cedc04775c
support change shuffle and train thread num (#19841)
6 years ago
Zeng Jinle cc157d5990
add inplace to assign op, test=develop (#19927)
6 years ago
chengduo 55ce696986
clean tensor array (#19930)
6 years ago
chengduo d7251a8e1e
Delete local execution scopes (#19749)
6 years ago
wopeizl 5452b6a152
remove the useless warning for user to avoid confuse test=develop (#19871)
6 years ago
hong 85b398f171
Add op compatible information (#19910)
6 years ago
Huihuang Zheng e117114289
Set states of recurrent op as dependent vars in prune (#19865)
6 years ago
Zeng Jinle b754700fb5
fix reduce and broadcast to avoid multi-stream, test=develop (#19889)
6 years ago
joanna.wozna.intel 3f1d0234ae Fix conv2d+dequantize squash for residual fusion (#19545)
6 years ago
Huihuang Zheng a35557d8f4
Fix deps of prune (#19876)
6 years ago
Leo Chen 578a2f5da3 fix SplitLodTensor when batch_size = 0, test=develop (#19866)
6 years ago
Yiqun Liu 3cd985a669
Add a pass to fuse fc+elementwise_add+layernorm (#19776)
6 years ago
Zeng Jinle 3f87464e9c
refine executor_gc_helper codes, test=develop (#19814)
6 years ago
Zeng Jinle 3fd3b663a8
fix gc bug in controlflow ops, test=develop (#19827)
6 years ago
Zeng Jinle db26de8389
[Bug fix] Disable memory reuse on feeded variables (#19835)
6 years ago
Thunderbrook 40c66f8df9
rm return in vfork (#19734)
6 years ago
xujiaqi01 6bf298bf09
support preload thread, optimize hdfs log, fix master+patch bug (#19695)
6 years ago
Jiabin Yang cc311bdf95
Feature/add transform data dygraph (#19707)
6 years ago
Zeng Jinle 754fd57ed7
disable memory optimization passes when FLAGS_use_ngraph=True, test=develop (#19778)
6 years ago
chengduo 8281497030
Fix warning info of build_strategy (#19805)
6 years ago
Yiqun Liu c67c8758cb
Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733)
6 years ago
Chen Weihang 00d5375e0c
Add prune_backward function to cover complicated test_program.clone situation (#19772)
6 years ago
Adam d4413a54bc Add common CreateKey for mkldnn handlers (#19767)
6 years ago
chengduo 056fdedde3
Open fuse all reduce option (#19765)
6 years ago
Huihuang Zheng 12542320c5
Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Zeng Jinle 0daa5c9772
Make leaky relu inplacable (#19676)
6 years ago
chengduo e506c99c20
Open fuse broadcast option (#18833)
6 years ago
Yiqun Liu a65c728e5d
Implement the GPU kernel of fc operator (#19687)
6 years ago
chengduo 5866a7a5fe
Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418)
6 years ago
Tao Luo ec9bc1bd9f
paddle::framework::vectorize() templatization (#19730)
6 years ago
Zeng Jinle bb4f8dee83
add logs to left var memory size, test=develop (#19722)
6 years ago
wangguanzhong 25dcd74d34
merge empty lod tensor, test=develop (#19228)
6 years ago
Zeng Jinle 713c05dd60
refine tensor.mutable_data, test=develop (#19680)
6 years ago
hutuxian 1ca6ea0318
fix cmakelist deps (#19668)
6 years ago
Tao Luo bcddbc78d4
remove -Wmaybe-uninitialized warning (#19653)
6 years ago
wangchaochaohu ed8f44ea21
codegen for fused elementwise operation (#19520)
6 years ago
mapingshuo dca9b6c5b0 add feed_var_names to Prune interface (#19589)
6 years ago
Tao Luo 3ae939e48a
unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631)
6 years ago
tensor-tang e3e98ed678
fix scope lock bug on infer (#19624)
6 years ago
Tao Luo 0a46d34538
refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607)
6 years ago
baojun a3a4b6e570 Enable ngraph through build_strategy (#19266)
6 years ago
Adam 8d6d95cc2b paddle::framework::vectorize() templatization (#19611)
6 years ago
Tao Luo 75d1571995
refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Yiqun Liu c5548178b0
A a pass to enable the use of cudnn (#19346)
6 years ago
Adam e94b26daf5 using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568)
6 years ago
gongweibao abaf87be2b
Change backward_guard to optimize_guard to maximize the allreduce overlap. (#19506)
6 years ago
Zeng Jinle 19474019c2
fix fast pe to run highest priority ops first, test=develop (#19575)
6 years ago
Zeng Jinle 0af8549750 fix seg fault of share lod, test=develop (#19573)
6 years ago
hutuxian c756b5d231
Paddlebox Framework (#18982)
6 years ago
Jacek Czaja ecd9f330c9 [MKL-DNN] Fix to face model on AVX512 platforms (#19282)
6 years ago
yaoxuefeng 10ca3f9609
add thread scope stat accurate metrics test=develop (#19480)
6 years ago
Tao Luo 02270b3eb1
remove unused assert.h (#19529)
6 years ago
chengduo e340df013e
Support feed single persistable variable to PE (#19417)
6 years ago
Yiqun Liu fcec365d29
Add a pass to replace dropout_op with scale_op when is_test is true (#19297)
6 years ago
Thunderbrook 1fe468d319
support debug each output of each ins (#19004)
6 years ago
Zeng Jinle 5c8f210ce3
refine inplace inference registry, test=develop (#19032)
6 years ago
chengduo b6d1d8901f
Increase num_iteration_per_drop_scope (#19075)
6 years ago
tangwei12 65c7368400
Fix the correctness of async mode at distributed training (#18863)
6 years ago
joanna.wozna.intel 2e3ec66be0 Add conv dequant squash for int8 (#18905)
6 years ago
Tao Luo c82280e445
remove unused conv_elementwise_add2_act_fuse.cc (#19344)
6 years ago
Leo Chen a9d5fc5142 Enhance OpTest to check the consistency of operators when using and not using inplace (#19101)
6 years ago
Tao Luo e3c68bde78
stronger the error message of tensor's mutable_data (#19303)
6 years ago
Adam 97d1db1874 Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237)
6 years ago
Zhaolong Xing 76c95af000
Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213)
6 years ago
Zeng Jinle 5b6673c44d
merge develop to solve conflict, also fix API doc, test=develop (#18823)
6 years ago
liuwei1031 50582071dc
fix compilation issue in windows vs2017 (#19183)
6 years ago
juncaipeng 5368b36512 remove the warning for reminding user to avoid using the OriginProgram method, test=develop (#19244)
6 years ago
chengduo 8a89ca94ce
Fix REGISTER_OP_WITHOUT_GRADIENT (#19251)
6 years ago
Zeng Jinle 708bd9798d
move_flags_to_unified_files_for_management, test=develop (#19224)
6 years ago
Adam b837689e97 Add generalized Conv+Activation MKLDNN fuse pass creation (#19072)
6 years ago
Yiqun Liu 77572b70cb
Enhance the error message when GrapOpMaker is null. (#19070)
6 years ago
chengduo c70a97f46e Use CUDAPinnedPlace in buffered_reader (#19112)
6 years ago
jiaqi b104ea0684
add get_last_save_xbox_base/get_last_save_xbox (#19122)
6 years ago
joanna.wozna.intel 492a00f53e Add conv reqantize squash (#18754)
6 years ago
joanna.wozna.intel bce72c7fea Replace Relu with bounded Relu in MobileNetV2 quantization (#18988)
6 years ago
chengduo e044e84264
open fuse_all_optimizer_ops (#19087)
6 years ago
gongweibao 29d8781240
Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)
6 years ago
yaoxuefeng 9150cf50fc
add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)
6 years ago
hutuxian 5a80cc8431
Datafeed support reading to cuda place directly. (#19071)
6 years ago
chengduo 17d62ab220
Enhance fuse optimization op pass (#19010)
6 years ago
chengduo 21440b4d69
Add call stack info during compile time (#19067)
6 years ago
jiaqi fc038da749
fix QueueDataset queue size (#19016)
6 years ago
Leo Chen 8f53735437 Fix memory overwriting of tensors returned by executor (#19030)
6 years ago
Zeng Jinle 2175d19993
fix memory_reuse_pass memory_size calculation error, test=develop (#19020)
6 years ago
Zeng Jinle 7ac748adb4
Open gc by default (#18836)
6 years ago
jiaqi 02c370c3dc
support filelist size < trainer num && fix pull dense (#18956)
6 years ago
chengduo e7da0940f9
Disable fuse optimization option (#18924)
6 years ago
石晓伟 ee2f296ef8
Fusion: seqpool_cvm_concat (#18471)
6 years ago
jiaqi 768059b3a0
adjust ins weight according to nid slot (#18784)
6 years ago
Leo Zhao 10eeed93d1 Revert "use static variable to do cache instead of thread local in thread frequent switching case (#18428)" (#18879)
6 years ago
Zeng Jinle 8008ab4e6b
Remove legacy C++ memory optimization codes (#18834)
6 years ago
Thunderbrook 52c1431eee
add clear_model interface in fleetwrapper (#18815)
6 years ago
chengduo 4140fe11a4
Open fuse optimization ops (#18741)
6 years ago
Zeng Jinle a802da650b
Feature/mem opt pass refactor (#18735)
6 years ago
fuyinno4 c167a4b4dd
Fix shrink-dense and add scale-datanorm (#18746)
6 years ago
Zhaolong Xing 26ae6d49e4
Update trt5 for paddle-trt (#18645)
6 years ago
Thunderbrook d8396281ef
add slot to sparse table (#18686)
6 years ago
jiaqi d18aabb472
support patch data, add load_one_table, fix bug (#18509)
6 years ago
chengduo fd3aad6cb3
Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664)
6 years ago
Huihuang Zheng 89bc3fd841
Support memory eager deletion on recurrent OP (#17710)
6 years ago
Zeng Jinle ae58afc546
Feature/auto_growth_allocator (#18561)
6 years ago
guru4elephant d714bf037c
remove async executor and add data_feed.proto to the deps of train demo (#18659)
6 years ago
chengduo a6d468a265
fix PE fetch bug (#18644)
6 years ago
Leo Zhao ff77dea969 not use transferscope cache in cpu case (#18578)
6 years ago
123malin b414645a65
fix #17430: unexpected training behavior with int64-type attr (#18264)
6 years ago
gongweibao c0a82748cf
Polish backwards optimizer dependency codes and use more default values. (#18255)
6 years ago
Zeng Jinle d3003a1620
Feature/buffer_shared_inplace (#17911)
6 years ago
Zeng Jinle be24e5b391
Clean unused code of dim and place (#18565)
6 years ago
Jiabin Yang 667f88f9a6
Fix/gcc 4.8 ubt link error (#18558)
6 years ago
Zhaolong Xing 88b52a27fe
Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532)
6 years ago
Leo Zhao ce38bb5341 use static variable to do cache instead of thread local in thread frequent switching case (#18428)
6 years ago
gongweibao 160ddc980c
Regroup fusion by date type. (#18496)
6 years ago
chengduo 7453857324 Make fuse_all_reduce_op_pass support mix_precision (#17652)
6 years ago
pkpk e9c7e218f2
Nan debugger init (#18401)
6 years ago
Tao Luo d234aa02cd
add transfer_scope_cache unit-test (#18467)
6 years ago
Yi Liu a873fa84ce
supports collective training with programs (#18392)
6 years ago
Michał Gallus 7023a86c3a Fix Pooling output scale (#18186)
6 years ago
jiaqi 93a2b317f7
fix data feed ptr error (#18419)
6 years ago
chengduo 8ed33bf91f
Fix Bug-prone code of PE (#18354)
6 years ago
tangwei12 999d9a59a5
fix communicator with pyreader (#18350)
6 years ago
HaoRen b7128bac5f supports collective communicated training (#18175)
6 years ago
Sylwester Fraczek 9252e8fa08 add int8 mkldnn prior_box (#17242)
6 years ago
chengduo 135a59ed45
update reduce config (#18334)
6 years ago
chengduo 5489216eba
Clean build strategy (#18148)
6 years ago
chengduo 14e1e165df
update alloc_continuous_space_for_grad_pass (#18287)
6 years ago
jiaqi 3f8031e256
dataset (#17973)
6 years ago
chengduo 25f3cd6486
Update execution_strategy option default value (#18183)
6 years ago
chengduo 4978db2c10
Remove nccl dep when the number of GPU is 1 (#18158)
6 years ago
chengduo 24e988a471
Fix bug of scope_buffered_ssa_graph_executor (#18100)
6 years ago
gongweibao f5caf3443c
Fix reinitialized ncclid error! (#18025)
6 years ago
chengduo b5a1c1463d
Update CPU_NUM config (#18059)
6 years ago
hutuxian f1d458daf0
add trainer_desc proto DEPS (#18019)
6 years ago
gongweibao da9143c1cc
Polish codes of old prs. (#17938)
6 years ago
石晓伟 bce259e5bf
Update the Anakin interfaces for content-dnn and MLU (#17890)
6 years ago
hutuxian 969e6378b9
Pipeline Concurrency (#17402)
6 years ago
Zeng Jinle 3ece61f71e
Remove attribute in Allocator::Allocate (#17878)
6 years ago
gongweibao 972c54cd70
Fix FLAGS_fuse_parameter_memory_size unit from Bytes to MBytes. (#17924)
6 years ago
gongweibao dd4cd352c7
Fix sync_batch_norm_op ncclallreduce error! (#17918)
6 years ago
gongweibao fbbdc9ccad
Add backward and optimizer operator dependency pass. (#17746)
6 years ago
wopeizl 453a49b1bc
Make ParallelExecutor support Windows GPU (#17787)
6 years ago
baojun a4c528a31c [NGraph] some ngraph updates to enable bert (#17739)
6 years ago
chengduo 437520474c
fix DropLocalExeScopes (#17829)
6 years ago
Leo Zhao 50326563d5 enable mkldnn primitive reuse for platform reorder (#17826)
6 years ago
chengduo 863c75168c
polish error doc (#17772)
6 years ago
guru4elephant d52391094d
fix prepare context redundant code problem, optimize executor by cach… (#17743)
6 years ago
chengduo 67c8dade58
Add Event in ScopeBuffer Executor (#17667)
6 years ago
Yiqun Liu 8fd39f3e99
Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236)
6 years ago
gongweibao 0d561ef442
fix 2dconn test=develop (#17681)
6 years ago
mozga-intel 5eb81fe595 Capi for a ngraph engine (#17037)
6 years ago
Jacek Czaja 6d8075ecef [MKL-DNN] conv_transpose mkldnn bias pass (#17644)
6 years ago
Sylwester Fraczek 96845d2168 add Concat quantization (#17448)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
Zeng Jinle 4aa931dd85
Code clean of Allocator (#17602)
6 years ago
Zhaolong Xing 61221ebc28
TRT: Support set dynamic range in int8 mode. (#17524)
6 years ago
Michał Gallus 0c39b97b4e [MKL-DNN] Add Fully Connected Op for inference only(#15226)
6 years ago
wopeizl 6724a652f3
add __str__ method for tensor and lodtensor to support print test=dev… (#17588)
6 years ago
Sylwester Fraczek 5b2a3c4b12 Conv concat relu quantization (#17466)
6 years ago
Sylwester Fraczek bccb0ba49a fix quantize_squash_pass segfault when no tensor linked to Bias (#17292)
6 years ago
guru4elephant 7f8bc49d00
polish_executor_and_add_ctx_cache (#17536)
6 years ago
Zeng Jinle c6189637cd
Fix allocator bug (#16712)
6 years ago
Qiao Longfei 58f7695ab2
Async exe support communicator (#17386)
6 years ago
guomingz 2281ebf0f3 Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130)
6 years ago
liuwei1031 c3949f5699
remove two useless flags: enable_subgraph_optimize, memory_optimize_debug, test=develop (#17491)
6 years ago
Tao Luo 32da5e9c3d
remove unused expected_kernel_cache_pass (#17486)
6 years ago
chengduo 5a6ab38013 Add record event And remove CSP (#17447)
6 years ago
Qiao Longfei 728bbaa4e3
add cache_update_mutex_ for operator test=develop (#17124)
6 years ago
guru4elephant 43c9561e9a
add inductive shape index (#17435)
6 years ago
Zeng Jinle 712bfb17cb
fix recurrent_op,test=develop (#17433)
6 years ago
Tao Luo 5babcd02dd
Revert "remove unnecessary prepare_data (#17080)" (#17432)
6 years ago
chengduo e336dc86bb
[Speed] Refine the Executor when the num_thread=1 (#17405)
6 years ago
Zhen Wang 4a1b7fec96
Add setting Scope function for the graph class (#17417)
6 years ago
jiaqi 66d51206b1
add save/load model, shrink table, cvm, config file & fix pull dense bug (#17118)
6 years ago
Tao Luo 68ec0a6f74
make parallel_executor support FLAGS_use_mkldnn (#17341)
6 years ago
chengduo bc833945a4
Add DropLocalExeScopes in ParallelExecutor (#17297)
6 years ago
qingqing01 e32c9888f5
Double backward of conv2d. (#17211)
6 years ago
Zeng Jinle 5e5e7b3305
fix data_type error message (#17312)
6 years ago
guru4elephant 5d6a1fcf16
fix infer_from_dataset and train_from_dataset (#17243)
6 years ago
chengduo 516317cf91
use sync copy (#17291)
6 years ago
Hongyu Liu c3195de522
Fix concat shape check (#17247)
6 years ago
chengduo 04bd413acb
Code Clean: Move all pass to paddle::framework::ir (#17228)
6 years ago
Zeng Jinle 4f8594088d
Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225)
6 years ago
songhao c2e20e2a29 fix build warning like 'comparison between signed and unsigned (#17240)
6 years ago
石晓伟 a72dbe9abf
Cherry-pick benchmark related changes from release/1.4 (#17156)
6 years ago
Zeng Jinle ee2028a110
Add use_cuda to inplace pass (#17205)
6 years ago
chengduo 950aec55fd
It doesn't need sync when fetch_list nit not empty (#17201)
6 years ago
tensor-tang 79ed1c76cd
fix bn fuse vardesc and add model saver (#17143)
6 years ago
Zeng Jinle 4e1bc6e805
Rewrite inplace pass and fix gc bug (#17126)
6 years ago
chengduo 794a195881
fix fuse optimizer ops (#17102)
6 years ago
Tao Luo aca60e9a20
remove unnecessary prepare_data (#17080)
6 years ago
Zeng Jinle 842ded14b0
fix reference_count_pass,test=develop (#17060)
6 years ago
Tao Luo d9cd989825
Merge pull request #17048 from luotao1/fix_runtime_cache_bug
6 years ago
chengduo cc31681687
use fast executor as default (#17044)
6 years ago
chengduo a2be4b4d91
Add fuse momenutum ops (#16745)
6 years ago
luotao1 490e746269 fix runtime_context_cache bug when gpu model has an op runs only on cpu
6 years ago
wopeizl 51a0243a56 fix nccl wrapper on windows
6 years ago
Zeng Jinle 1202d3fc74
Refine model gpu memory (#16993)
6 years ago
Yibing Liu 3c375751f8
Support seq len equal to 0 in sequence ops (#16935)
6 years ago
jiaqi 8bcba3db84
Merge pull request #16896 from xjqbest/develop
6 years ago
guru4elephant bbc6c5714f
Merge pull request #16887 from guru4elephant/add_nccl_context_pybind
6 years ago
gongweibao cbdb8a17b1
Polish DGC code (#16818)
6 years ago
dongdaxiang 2ab2869c2d fix GPU compile error problem
6 years ago
dongdaxiang 466d177d09 add pybind dependency
6 years ago
xjqbest 10991e00a9 fix bug of num > INT_MAX
6 years ago
xjqbest 241120d94d fix bug of num > INT_MAX
6 years ago
xjqbest dac70ad4c5 fix bug of num > INT_MAX
6 years ago
xjqbest 74471397cf fix bug of num > INT_MAX
6 years ago
dongdaxiang b091139049 add nccl wrapper for python API
6 years ago
dongdaxiang fff795e5c8 add nccl_wrapper
6 years ago
乔龙飞 Qiao Longfei 82cff5ec42
Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder
6 years ago
Yibing Liu 4267a81afc
Correct the lod level of compiled time in lod_reset (#16790)
6 years ago
chengduo e9409665f7
Refine Fuse Optimize Ops (#16810)
6 years ago
chengduo d105c06b50
Replace ThreadedExecutor with FastThreadedExecutor (#16650)
6 years ago
Qiao Longfei 1526a3e4da Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Yihua Xu 93cedfdb9c Fix the order while sorting the operators (#16756)
6 years ago
Qiao Longfei afc56949c1 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
liuwei1031 85363848a1
Security issue (#16774)
6 years ago
guru4elephant aa46caf3d9
Merge pull request #16765 from guru4elephant/gpu_dataset_train
6 years ago
dongdaxiang 3c2d236815 remove all warnings
6 years ago
Yiqun Liu 112f16143b
Add an option to enable the cache of expected kernel in train phase. (#16724)
6 years ago
liuwei1031 2e07c19a9c
disable memory_optimize and inpalce strategy by default, test=develop (#16760)
6 years ago
dongdaxiang ea07eb8cd2 remove comment in data_feed.cc
6 years ago
dongdaxiang 05464e7c5c add gpu training for Executor.train_from_dataset
6 years ago
Qiao Longfei 0608f8ca56 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder
6 years ago
Zeng Jinle 9f7b027dce
fix activation grad op desc maker (#16715)
6 years ago
liuwei1031 fdb719a1bf
avoid optimize variable used in subblock, test=develop (#16739)
6 years ago
liuwei1031 a18ef10c87
only use the latest version variable for inplace strategy (#16736)
6 years ago
Tao Luo 5c364cda3c
Merge pull request #16711 from luotao1/has_attr
6 years ago
chengduo 55b15db5af
Add unit test for fuse all_reduce ops (#16699)
6 years ago
luotao1 4098ba29ed reduce hasAttr elapsed time in RunImpl
6 years ago
luotao1 f89a9c5d95 Merge branch 'develop' into has_attr
6 years ago
Tao Luo ad4a1bd13c
Merge pull request #16339 from luotao1/core_opt_choose_kernel
6 years ago
luotao1 6afc97ca6b reduce hasAttr elapsed time in RunImpl
6 years ago
gongweibao 8b793d0efd
Fix DGC bug. (#16697)
6 years ago
Yiqun Liu 3fe8cb0dd7
Enable the runtime_context_cache pass in train phase (#16640)
6 years ago
xjqbest 6a57e8075a remove trainer_id in datafeed and dataset
6 years ago
luotao1 695f2db6a0 update expected_kernel_cache_pass
6 years ago
luotao1 226596a296 Merge branch 'develop' into core_opt_choose_kernel
6 years ago
xjqbest 5e5139283b fix runtime error
6 years ago
xjqbest 271b7147cc fix dataset bug
6 years ago
Zeng Jinle 1c526e1d1a
Fix some grad op desc makers (#16633)
6 years ago
chengduo ea2a2f778a Fix the bug of AllReduceDepPass (#16393)
6 years ago
chengduo b75a69bad6
Add Stream for fetch op handle (#16600)
6 years ago
chengduo 1342e2ea04
Fix the bug of the fast threaded executor (#16514)
6 years ago
gongweibao 423bc515da
fix batch merge bug (#16601)
6 years ago
liuwei1031 bd193781df
fix the bug of reusing different types of variables in memory_optimiz… (#16547)
6 years ago
乔龙飞 Qiao Longfei 21622ca30b
Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator
6 years ago
sneaxiy 10249c0b78 Merge develop
7 years ago
Qiao Longfei 9861a92f6f change the return type of NewTempScope to unique ptr test=develop
7 years ago
Qiao Longfei fb6cc3a1bd follow commnet, optimize code and add comment test=develop
7 years ago
Qiao Longfei adf272bcec Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
guru4elephant 76b49f02ee
Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop
7 years ago
Qiao Longfei baf02328b2 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
Qiao Longfei 9db1a9e128 change log level test=develop
7 years ago
gongweibao a61ed9782e
fix log level test=develop (#16554)
7 years ago
Qiao Longfei 8342f12e31 fix set remote_prefetch test=develop
7 years ago
Qiao Longfei df45c8c538 update nce and hierarchical_sigmoid remote_prefetch
7 years ago
Qiao Longfei a1821a0449 remote remote_prefetch in embedding layer test=develop
7 years ago
dongdaxiang 718ea6dbd5 fix fleet code style
7 years ago
xjqbest 782ab2e2bd add some doc
7 years ago
xjqbest a99c8d0c29 fix client to client communication bug
7 years ago
gongweibao fea91164b7 Fix windows compilation error! (#16546)
7 years ago
Zhaolong Xing 3e6aa498d6
Merge pull request #16526 from NHZlX/refine_trt_anakin
7 years ago
sneaxiy 33473890f3 Merge develop
7 years ago
dongdaxiang ade9337486 fix API.spec
7 years ago
liuwei1031 278debab71
fix comments of 16410, test=develop (#16499)
7 years ago
dongdaxiang 720647e17f rebase current develop and fix conflict
7 years ago
dongdaxiang 98dda08a85 fix pull sparse slow problem
7 years ago
dongdaxiang d739bab844 fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem
7 years ago
dongdaxiang 241d8808be add timer to distributed executor
7 years ago
dongdaxiang 3c73859eec add trainer_desc.proto to distributed executor
7 years ago
dongdaxiang 60b7bf6fa6 add infer_from_dataset for inference
7 years ago
xjqbest 030c7e7e9d fix FillSparseValue error
7 years ago
dongdaxiang 88880d9b69 fix import trainer_desc_pb2 error
7 years ago
dongdaxiang 0030eb2a61 fix distributed building
7 years ago
dongdaxiang ed31874397 undefine rand_r()
7 years ago
dongdaxiang f7e4813804 add WIN32 for rand_r and usleep
7 years ago
dongdaxiang cedbc161da add more _LINUX maroc on data_feed.cc for mac and window compile
7 years ago
dongdaxiang c5980c3566 add _LINUX macro
7 years ago
dongdaxiang 433301fbc2 remove glog in shell.h
7 years ago
dongdaxiang 9e51ad4a65 fix io and fs compile on mac
7 years ago
dongdaxiang 6eca88ac76 fix io and fs compile on mac
7 years ago
dongdaxiang 2708108a08 fix fleet_wrapper compile on windows
7 years ago
dongdaxiang 4ce35815fb fix windows GLOG problem
7 years ago
dongdaxiang e3107a6ae0 fix windows compile problem
7 years ago
dongdaxiang 398004ece0 disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer
7 years ago
dongdaxiang d4514949bf remove local random engine in fleet with rand_r()
7 years ago
dongdaxiang 45eb6f0765 run pre-commit check files and fix code style problem
7 years ago
dongdaxiang d87ba58c14 refine document of python API, make device_worker and trainer's API private
7 years ago
dongdaxiang 5687f234bf fix trainer_desc.proto error
7 years ago
dongdaxiang b95b80bc76 add doc string for executor and update API.spec
7 years ago
dongdaxiang 6be9f719e2 make string_helper dependency work
7 years ago
xjqbest e95cafd9a7 fix code style & add dataset testcase
7 years ago
dongdaxiang ba15d6b164 move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids
7 years ago
xjqbest be74de2c61 fix code style & fix register bug & add release_memory
7 years ago
dongdaxiang a0b59773af fix code style
7 years ago
dongdaxiang f39b323ed7 remove trainer_library in CMakeLists
7 years ago
dongdaxiang 365be5d559 support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem
7 years ago
dongdaxiang 6bf796df14 refine print fetch list
7 years ago
xjqbest 589467f24c fix bug
7 years ago
xjqbest b7940c2918 fix bug of gen_worker_desc and set_filelist, add some doc
7 years ago
dongdaxiang 68d7bf3de5 add fetch var function
7 years ago