Paddle

Commit Graph

Author	SHA1	Message	Date
Zhaolong Xing	c5f0293cf3	NV jetson(nano, tx2, xavier) inference compile support (#21393 ) * add jeston compile support test=develop * refine the cmake test=develop	6 years ago
Tao Luo	01fa4ead61	fix -Wno-error=sign-compare warning in gcc8 (#21434 ) * fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop	6 years ago
wangchaochaohu	d4776ec027	fix the correctness of memcpy profiling result test=develop (#21458 )	6 years ago
Jie Fang	5e813b53c5	nhwc optimization for batchnorm (#21090 )	6 years ago
Leo Chen	e0c9d856fb	add unused input vars check for OpWithKernel, test=develop (#21169 ) * add unused input vars check for OpWithKernel, test=develop * remove unused vars in some ops, test=develop * fix batch_norm, test=develop * add white list, test=develop * add CI check for white list, test=develop * :ove white list to c++, test=develop * solve failure of CI, test=develop * add unittest for unused_var_check, test=develop * refine code, enable check in operator_test, test=develop * skip mkldnn, test=develop * extend white list, test=develop * refine condition of mkldnn, test=develop * fix paddle_build, test=develop * follow comments, test=develop * fix GetExpectedKernelType * add wiki ref to err_msg, test=develop * follow comment, test=develop	6 years ago
Huihuang Zheng	630be31952	Fix Cond Bug for Nested Control Flow (#21340 ) * Commit before merging develop test=develop * Backup after working with Huihuang logs * Commit before deleting Huihuang debug loggings * Commit before debug test=develop * Fix bug commit test=develop * Backup of fixing bugs test=develop * Clean up code test=develop * Fix a bug in sum_op test=develop	6 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	6 years ago
Zeng Jinle	6b09b73e17	add explicit conversion to NoNeedBufferVarsFunctor, test=develop (#21430 )	6 years ago
hong	ac8546701d	Add dygraph execution context (#20157 ) * add_dygraph_execution_context * add dygraph infershape context and execution context; test=develop * fix imperative bug; test=develop * remove inputs outputs interface from execution context, because it have same function with inputNames; test=develop * remove tracer_test ctest; test=develop * fix split op bug; test=develop * fix unitests bug; test=develop * fix distribute test bug; test=develop * fix ngraph compile bug; test=develop * fix grad maker bug; test=develop * fix load op bugs; test=develop * fix operator.cc construct bug; test=develop * remove useless name find in operator; test=develop * add tracer_test; test=develop * fix concat, split bug; test=develop * remove tracer_test unitest; test=develop * fix attribute check bug; test=develop * add test code to fix converage; test=develop * remove useless code, change check backward input in engin; test=develop * unlock var type infer shape;test=develop * add ShareAllLoD api; test=develop * add dygraph infershape context unitest; test=develop * remove increase and decrease lod in dygraph; test=develop * addd override; test=develop * fix increase descrease lod; test=develop * fix paddle_enforce; test=develop * disable lod op dygraph check; test=develop * fix paddle enforce error; test=develop * add comment for op_registry and OperatorBase; test=develop * optimize the comment of op_registry; test=develop * fix format of comment; test=develop * fix format of comment; test=develop * optimize the format of comment; test=develop * optimize the format of the comment; test=develop * optimize comment of op_registry; test=develop	6 years ago
Zeng Jinle	09696d5df8	Use system allocator in OpTest (#21335 ) * use system allocator in unittests, test=develop * fix op bugs, test=develop * fix tensor copy bug when src and dst are the same, test=develop	6 years ago
Tao Luo	c0656dcb1a	remove -Wno-error=sign-compare, make warning as error (#21358 ) * remove -Wno-error=sign-compare, make warning as error test=develop test=document_fix * fix exist compile warning test=develop	6 years ago
Zeng Jinle	b97fc16d21	fix lod_reset bug, test=develop (#21392 )	6 years ago
Zeng Jinle	89966525f1	Polish reference count pass (#21324 ) * fix ref_cnt pass, test=develop * add cpp unittests to reference_count_pass, test=develop * follow comments, test=develop	6 years ago
Youwei Song	d5ff79e55e	Support numpy bridge (enabled by default in dygraph mode) (#20983 ) * add numpy bridge * fix template compile * add unittest, add default test=develop * fix unittest test=develop * fix unittest test=develop * zero_copy=True for to_variable, test=develop * bug fix test=develop * disable deprecated NumPy API test=develop * use better design of NumpyAllocator test=develop * fix Py_None check test=develop * reset c++ tracer when jump out dygraph guard test=develop * refine PADDLE_ENFORCE_xx format test=develop * bug fix of tracer switch test=develop * update decref test=develop	6 years ago
GaoWei8	8493f20ebc	Polish the codes of fc when needs padding (#21378 ) test=develop	6 years ago
Michał Gallus	5d7d548275	INT8 Fully-connected (#17641 ) * Implement Int8 FC * Integrate FC into INT8v2 test=develop * int8 FC: transpose weights before computing scales test=develop * Add support for activation_type string in FC test=develop * Disable MKL-DNN's FC in VGG16 and 19 test=develop * Disable FC quantization when mkldnn FC is disabled test=develop * Solve PADDLE_ENFORCES in FC int8 * Fix Paddle enforces and remove const cast test=develop * Fix style changes test=develop * Fix quantizer_tester test and add fc quantization test=develop * Fix FC test fail on CUDA * Remove unnecessary log from quantize placement pass test=develop * Add Thread ID to FC hash key test=develop * Add comments to MKL-DNN FC Kernel test=develop * Refactor quantizer test=develop * Fix linter issues test=develop * Fix crash in slim googlenet test=develop * Fix PADDLE_ENFORCE messages test=develop	6 years ago
GaoWei8	234060f88f	Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972 ) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop	6 years ago
zhouwei25	345b67b5e2	remove warning LNK4006 and warning LNK4221 (#21226 )	6 years ago
Thunderbrook	9a7832f8be	print table stat info for pslib (#21296 ) * print table stat test=develop * notes test=develop * notes test=develop	6 years ago
Dong Daxiang	691ced87c0	Refactor fetch handler (#21264 ) * fix fetch handler problem and refactor when a user define FetchHandler class, he or she should initialize a handler with variable dict. the key of a variable dict is a user defined name, the value of a variable dict is a Varaible generated from python API. For each fetching, a user should implement handler function in which fetched_result_dict will be available and the user can access the fetched value with user defined keys.	6 years ago
Yiqun Liu	c918788ba9	Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. (#21310 ) * Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. test=develop * Print the subgraph when check failed. test=develop	6 years ago
Chen Weihang	952508527a	Polish some PE code details (#21274 ) * polish code details, test=develop * futher polish hint msg, test=develop	6 years ago
Thunderbrook	0d17c1b816	solve pslib core in stop worker (#21263 ) * general table * add sparse table test=develop * no cvm test=develop * add no_cvm test=develop * add note test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * add key of optimizer test=develop * solve pslib stop core test=develop * barrier test=develop * add notes test=develop	6 years ago
Thunderbrook	349e82d669	support general embedding params (#21217 ) * general table * add sparse table test=develop * no cvm test=develop * add no_cvm test=develop * add note test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * add key of optimizer test=develop	6 years ago
Yiqun Liu	6b1e1f0dda	Enable generating code for a given subgraph. (#21126 ) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop	6 years ago
Zeng Jinle	a152315be7	refine Tensor method, test=develop (#21031 )	6 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	6 years ago
Zhaolong Xing	65f7052554	TRT int8: refine trt int8 for dynamic range set (#21112 ) * refine trt int8 for dynamic range set test=develop * refine trt int8 test=develop	6 years ago
xujiaqi01	23876de55b	fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052 ) * fix cache table bug * add save_paddle_inference_model * fix hdfs util bug * test=develop	6 years ago
xujiaqi01	9e045170c0	add copy table (#21086 ) * copy some feasigns and corresponding embeddings from one sparse table to another * copy all feasigns and corresponding embeddings from one sparse table to another * copy all dense params from one table to another * copy some local vars to other local vars	6 years ago
Chen Weihang	4bd9463630	fix detail error message error, test=develop (#21170 )	6 years ago
Chen Weihang	8da0cd537a	Add examples for error message writing specification - NotFound, OutOfRange, AlreadyExists, PermissionDenied (#21134 ) * add examples for error msg spec, test=develop * change ENFORCE to ENFORCE_*, test=develop add more already exists examples, test=develop	6 years ago
Chen Weihang	8414575b78	Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137 ) * add examples for error spec, test=develop * change ENFORCE to ENFORCE_**, test=develop	6 years ago
Chen Weihang	7e5f74b825	Add examples for error message writing specification - InvalidArgument (#21132 ) * add examples for error msg spec, test=develop * change ENFORCE to ENFORCE_*, test=develop fix error, test=develop	6 years ago
WangXi	de5d3ff688	Fix dgc buffer illegal & reuse velocity (#21012 )	6 years ago
Zeng Jinle	d625aaf0c1	remove so many logs of parallel executor, test=develop (#21105 )	6 years ago
Yiqun Liu	35f17ae28f	Add the check of lod_level between compile-time and runtime. (#20961 ) * Add the check of lod_level between compile-time and runtime. test=develop * Fix bug in check_compile_vs_runtime. test=develop * Fix the check of output when it is dispensiable or intermediate. test=develop * Share lod of x to out in match_matrix_tensor op in compile-time. * Implement GetLoDLevel in InferShapeContext. * Set the default value of check_compile_vs_runtime to False and enable it in test_sequence_pad_op. test=develop * Enable check_compile_vs_runtime in test_match_matrix_tensor. * Add the implementation of SetLoDLevel in InferShapeContext. * Remove the implementation of IncreaseLoDLevel and call Get/SetLoDLevel instead. * Remove the implementation of DecreaseLoDLevel and call Set/GetLoDLevel instead. * Refine some ops and unittests. test=develop * Fix a typo. test=develop * Remove the check of var type, and change int to int32_t. test=develop * Add unittest for Get/SetLoDLevel. test=develop	6 years ago
Chen Weihang	826254f664	Add pre-condition check for fuse optimizer op pass (#21005 ) * add pre condition check for fuse optimizer op pass, test=develop * add log & set init to zero, test=develop * fix test_fuse_all_reduce_pass failed, test=develop * polish details, test=develop * refine PADDLE_ENFORCE & remove needless VLOG, test=develop * refactor op check method, test=develop	6 years ago
Yiqun Liu	9091f8cdf9	Support generating code for grad_op (#21066 ) * Add the definition of operation in fusion_group. * Use operations in OperationMap to detect fusion_group of elementwise pattern. * Add namespace fusion_group in code_generator. * Use operations recorded in OperationMap to generate code. * Remove implementation codes to .cc file. * Refine Operation and CodeGenerator to make it easier to generate code for grad_op. Refine the unittest for better reuse. * Avoid recording the template's keyword in a array. * Support the generating of code for grad_op and add unittest. test=develop * Remove replaced_element_in_order and use use number instead. test=develop	6 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	6 years ago
Chen Weihang	7ee25189c3	Enrich the type of error and declare the error type interfaces (#21024 ) * Enrich the type of error and declare the error type interfaces, test=develop * adjust tests to adapt new form, test=develop * add inference deps with error_codes.pb.h, test=develop * restore stack iter start pos, test=develop * polish code based review comments, test=develop	6 years ago
Zeng Jinle	5aae595902	fix no_need_buffer_vars_dep, test=develop, test=document_fix (#21007 )	6 years ago
xujiaqi01	1d1a07937a	simplify master+patch，remove ins when size != merge_size or has conflict slot (#20913 ) * remove duplicate code and duplicate config of master+patch * drop all ins which has conflict slot or size < merge_size * user only need to set merge size，if ins num of same id is not equal to merge size, just drop these ins * user must make sure master data and patch data has no same slot whose feasigns are both non-zero, otherwise these ins will be dropped. (slot list should still be the same of both master and patch) * test=develop	6 years ago
Zeng Jinle	878a40f57d	Support NoNeedBufferVarsInference in dygraph backward (#20868 ) * support no need buffer vars in dygraph, test=develop * fix inference compilation error, test=develop * update no_need_buffer_vars_inference, test=develop * add unittests for no_need_buffer_vars_context, test=develop * refine no_need_buffer_vars by return ref, test=develop * polish some codes, test=develop	6 years ago
Wilber	c534149642	fix squared_mat_sub_fuse_pass when elementwise_op input is from persistable param test=develop (#20960 ) fix squared_mat_sub_fuse_pass when elementwise_op input is from persistable param	6 years ago
WangXi	eec4fa9099	And Enforce to fuse pass for DGC doesn't support fuse for now, test=develop (#20935 )	6 years ago
Zeng Jinle	b0c0ffb9ae	refine pe when exception raises, test=develop (#20894 )	6 years ago
123malin	20cdff0e02	Optimize decay (#20816 ) * update pserver decay blocks * update distributed notify handler	6 years ago
hong	8c4573a3cb	GradMaker for dygraph (#19706 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * optimize grad maker; test=develop * optimize grad maker * test * grad make optim; test=develop * fix unittest bugs; test=develop * add dygraph grad op maker and split_op * grad op maker refactor; test=develop * add dygraph grad maker; test=develop * fix op deformable_conv_v1_op bug; test=develop * fix deformable_conv prroi pool bugs; * fix new op grad op maker bug; test=develop * fix split by ref bug; test=develop * fix dygraph auto prune bug; test=develop * fix test_trace bug; test=develop * fix fused emb seq pool bug; test=develop * remove useless code in op_desc file; test=develop * remove useless code, StrVarBaseNode; test=develop * fix review issues; test=develop * fix rank_loss grad maker; test=develop * remove flag in VarBase; test=develop * fix distributed_notify_op compile bug ; test=develop * fix reshape op double grad; test=develop * fix expand as op; test=develop * add impertive type_defs.h for demo_train; test=develop * fix inference lib cmake; test=develop * fix inference lib; test=develop * fix infernce_lib; test=develop * fix inference cmake; test=develop * fix inference lib; test=develop * fix inference lib; test=develop * remove condition dygraph grad maker, modify local name; test=develop * fix split grad maker bug; test=develop * fix pyramid_op bug; test=develop * change travis time out limit; test=develop * restore travis; test=develop * change timeout limit; test=develop	6 years ago
Thunderbrook	59bcdc8a19	support dump param of model into afs (#20302 ) * support dump param to afs test=develop * code style test=develop * code style test=develop * dump param test=develop * dump param test=develop * dump param test=develop * dump param test=develop	6 years ago
Yiqun Liu	16e4d02675	Refine the cache of program, context and scope in executor. (#18483 ) * Refine the cache of program, context and scope in executor. test=develop * Refine the unittest test_executor_and_use_program_cache. * Add the test the PaddingRNN with use_program_cache=True. test=develop * Remove a check. test=develop * Refine the unittest to check whether it is correct when setting use_program_cache=True. test=develop	6 years ago
hong	ff0886a92a	save load problem fix and new feature add (#20823 ) * fix persistable; * fix save load bugs; test=develop * fix bug; test=develop * add example for new io api; test=develop * addd example; test=develop	6 years ago
Yiqun Liu	6fcfd32e6c	Check and correct the output's lod_level in DynamicRNN related operators (#19144 ) * Refine the InferShape of ReadFrom and WriteTo op, and add comment to explain why not call ShareLoD for runtime. test=develop * Add comment for ReorderLoDTensorByRank op. * Add comment for lod_tensor_to_tensor_array op to explain why only call DecreaseLoDLevel for compile time. test=develop * ShrinkRNNMemory op should call ShareLoD for compile time. test=develop * Add the implementation of IncreaseLoDLevel and add the compile-time check of lod_level in InferShape of sequence_pool. test=develop * Refine the unittest of DynamicRNN. test=develop * Change PADDLE_ENFORCE to PADDLE_ENFORCE_NE. test=develop	6 years ago
Yiqun Liu	b5f3be8330	Implement a pass detect fusion group of elementwise op (#19884 ) * Add fusion_group_pass and elementwise pattern. * Rewrite the detector of elementwise group. test=develop * Add a comment in codegen. * Add more unittest cases. test=develop * Move code_generator related code to fusion_group directory. * Correct the including path. * Add the definition of SubGraph and finish the insert of fusion_group op in pass. * Insert graph_vis_pass in tester to visualize the graph for debug.	6 years ago
Huihuang Zheng	95ba4bd2ab	Add shape and type check at read_op (#20754 )	6 years ago
Zeng Jinle	98103d3003	remove some unnecessary logs in pe, test=develop (#20848 )	6 years ago
Chen Weihang	26cc1fe508	Replace risky GetInputType method with secure IndicateVarDataType interface (#20668 ) * replace part of the old implementation, test=develop * restore concat op, test=develop * update all ops implemention & delete GetDataTypeOfVar func, test=develop	6 years ago
xujiaqi01	48669aa8f0	fix several sparse table issuses (#20686 ) * no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto. * add find_distributed_lookup_table_grads instead of hard code GRAD * support embedding stop gradient. push sparse has error before fix this.* * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this. * fix pull sparse, skip slots which do not have embedding. * fix collect feasign label info, skip slots which do not have embedding. * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables. * test=develop	6 years ago
Chen Weihang	1d1552d106	Make formatted ENFORCE stack adapt to more situations (#20826 ) * Make formatted ENFORCE stack adapt to more situations and polish details, test=develop * restore template message position, test=develop	6 years ago
Zeng Jinle	ac813bbaf4	Add more error debug message to Operator::Run (#20793 ) * add more err msg, test=develop * add more unittests, test=develop	6 years ago
wangchaochaohu	ba45dce35d	fix codetest for windows make test=develop (#20796 )	6 years ago
zhongpu	72d1d72c09	fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721 )	6 years ago
石晓伟	48a774c713	fix ts_sort's bug, test=develop (#20720 )	6 years ago
wopeizl	9e5948230e	add support to gcc8, add docker env test=develop (#19807 ) * add support to gcc8, add docker env test=develop	6 years ago
xujiaqi01	5223b0dd9d	add check nan / inf in downpour worker (#20694 ) * add check nan / inf in downpour worker during training * test=develop	6 years ago
WangXi	507afa8a8a	Fix dgc nan by stripping nccl from sparseReduce. (#20630 )	6 years ago
Zeng Jinle	4eeda9d676	fix tensor_util, test=develop (#20699 )	6 years ago
Zeng Jinle	ab575de725	Fix op run log when memory optimization strategy is enabled (#20695 )	6 years ago
Jacek Czaja	a1cd27f13f	[MKL-DNN] Added mkl-dnn cache clearing when creating Executor instance (#20241 ) * - Flushing mkl-dnn cache test=develop - Disabled clearing cache for LoadModel - Added clearing of mkl-dnn cache when Executor is created test=develop - Do not clear for GPU places test=develop - compilation fix test=develop * - Moved clearing of mkl-dnn cache in destructor of executor test=develop * - Compilation fix test=develop - Reverted conditional clearing of mkl-dnn cache in Executors's destructor test=develop - compilation fix	6 years ago
Zeng Jinle	10505faf4e	polish codes, test=develop (#20672 )	6 years ago
Chen Weihang	003f369bb2	Add IndicateVarDataType interface to block tensor is not initialized problem in OP GetExceptedKernelType (#20044 ) * add indicate_var_data_type inferface, test=develop * add unittests & polish error message, test=develop * remove needless include, test=develop * extract public function & polish message, test=develop * delete empty var check, test=develop * change data_type to pointer parameter, test=develop * polish details, test=develop	6 years ago
Chengmo	940c6ff1c8	Fix communicator slow bug & fix communicator stop bug (#20366 ) * test=develop,Fix communicator slow bug * test=develop, delete if() in stop_worker() * test=develop * fix UT, test=develop * fix bug in fetch handler, test=develop * fix bug in fetch handler, test=develop * test=develop, fix fetch barrier bug * test=develop, bug fix * test=develop, bug fix * test=develop, fix bug	6 years ago
WangXi	cadc6a9704	fix dgc test and bug when not set trainers_endpoints_, test=develop (#20617 )	6 years ago
Thunderbrook	f76a32df4a	dump fix dov vec file num (#20539 ) * support dump multi file test=develop * dump fix num file test=develop	6 years ago
633WHU	12e4be0382	Dlpack support (#20039 ) * support dlpack to tensor and implement python interface test=develop * add unittest for _to_dlpack and from_dlpack test=develop	6 years ago
Pei Yang	443f604c3b	add DisableGlogInfo() to AnalysisConfig, test=develop (#20581 )	6 years ago
Zeng Jinle	a9c8bdad7b	refine pe codes, test=develop (#20479 )	6 years ago
Zeng Jinle	76b321872a	fix cuda dev_ctx by event, test=develop (#20553 )	6 years ago
zhaoyuchen2018	b8333edef6	Add Multihead matmul fuse pass (#20167 ) * Add multihead fuse pass for ernie opt * Refine softmax test=develop * Refine cuda kernel * Refine cuda version * Refine cmake test=develop * refine header file * refine test case and pass * refine comments	6 years ago
Adam	7faa3e9555	Add ConvTranspose + BatchNorm fuse pass (#20161 ) * Add ConvTranspose + BatchNorm fuse pass test=develop * Add tests for conv+bn and conv_transpose+bn passes test=develop	6 years ago
xujiaqi01	22b80e1246	fix parse content in CreatePreLoadReaders (#20258 ) Fix parse content in CreatePreLoadReaders. Before this fix, if you use dataset.set_parse_content and dataset.preload, parse content didn't work.	6 years ago
hong	fa43e80e19	New save load interface (#20148 ) * add new save load interface; test=develop * add new save interface; test=develop * add save load interface ; * fix save load error; * fix dygraph set dict bug; * add save load unit test; test=develop * fix test_imperative_optimizer bug; test=develop * fix unitest optimizer bug; test=develop * fix code coverage; test=develop * fix converage; test=develop * add document for apis; test=develop * fix unitest error; test=develop * fix save load unit test error; test=develop * fix error message; test=develop * change set_parameter set_optimizer to save_dygraph; test=develop * add load_graph check; test=develop * fix api spec; test=develop	6 years ago
Zeng Jinle	c20b11ba11	simplify op_info.h, test=develop (#20195 )	6 years ago
hong	0ec2c081d9	update op compatible list; test=develop (#20175 )	6 years ago
tangwei12	c9139c3db3	trainer from dataset fetch targets (#19760 ) add executor.FetchHandler for train/infer from the dataset	6 years ago
chengduo	bfa55c9ddb	Add place deps for fused_all_reduce_op_handle (#20077 ) test=develop	6 years ago
Zeng Jinle	5fef859c65	remove map type from var_type_traits.h, test=develop (#20090 )	6 years ago
Zeng Jinle	4ad66c779c	fix op_compatiable_compile_error, test=develop (#20076 )	6 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	6 years ago
bingyanghuang	9de6772510	Follow comment of Merged QAT PR 18970 (#19979 ) * Follow Wangzhen's comment in PR 18970, test=develop * Review comments, test=develop * Leave fake quantization around mul test=develop * Replace Fake with Real Quantized Mul test=develop * Fix bug in quantize placement pass Nodes in the graph now have checked type instead of node name when they are to be marked for quantization test=develop	6 years ago
石晓伟	01b9d07963	update operator compatible info, test=develop (#19978 ) * update operator compatible info, test=develop * revert cmake/version.cmake, test=develop * add unit_tests and fix bugs, test=develop * update ../paddle/fluid/framework/framework.proto, test=develop * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop * update paddle/fluid/framework/version_test.cc, test=develop * add comments and rename interfaces, test=develop	6 years ago
joanna.wozna.intel	f5221ac19f	Disable conv requant squash (#20041 ) * Fix conv2d+dequantize squash for residual fusion test=develop * Disable conv-requant squash test=develop	6 years ago
wangchaochaohu	c9ea317b36	codegen code for reconstruction (#19728 ) * codegen code for reconstruction test=develop * fix the cmake test=develop * fix review advice test=develop	6 years ago
tangwei12	8f0b3c0516	the integrated communicator (#19849 ) * add a base class for the Communicator * add AsyncCommunicator Impl for async distributed training	6 years ago
Chen Weihang	b916335025	Paddle error message stack shaping and optimization (#19895 ) * shape and optimize paddle error message stack, test=develop * limit exception type & add unittest, test=develop * fix multi-platform problem, test=develop * fix related unnitest failed, test=develop * add doc & fix unittest errors, test=develop * fix function name error, test=develop * update tensor test exception msg compare, test=develop * remove unittest on win32, the dir format is different, test=develop * remove useless package, test=develop * add paddle enforce handler unittest, test=develop * add exception checkout, test=develop * fix coverage failed, test=develop * fix op registry test failed, test=develop * refactor whole pr, test=develop * remove test in CMakelist, test=develop * fix coverage, test=develop	6 years ago
chengduo	2450d15b78	disable fuse_all_optimizer_ops (#19966 ) test=develop	6 years ago
chengduo	101a2b610a	Add dtype for coalesce_tensor_op (#20016 ) Add dtype for coalesce_tensor_op	6 years ago
Huihuang Zheng	88af4ab650	Add new data layer (#19916 ) The new "fluid.data" changes old "fluid.layers.data": 1. Add shape and dtype check. 2. Remove "append_batch_size" parameter. We won't offer this in the new data layer because other deep learning platforms don't have this kind of data layer pre-processing. It may confuse users. 3. Remove "stop gradient" parameter because the data layer doesn't do back-propagation TODO： Now data layer feeded by executor is checked, will we want to check the feed data of readers in the future?	6 years ago
xujiaqi01	f50e701b3b	fix memory leak in HogwildWorker (#19956 ) fix memory leak in HogwildWorker, whose ops are explicitly deleted in destructor	6 years ago
xujiaqi01	cedc04775c	support change shuffle and train thread num (#19841 ) * support change shuffle thread num * support change train thread num * fix receive shuffle data of each channel * data norm stop gradient * add check thread_tensor type and root_tensor type when merge metric * remove sleep in shuffle, add config * add config of pslib client to client communication * fix xbox str * add data norm op testcase * add flush in trainer finalize	6 years ago
Zeng Jinle	cc157d5990	add inplace to assign op, test=develop (#19927 )	6 years ago
chengduo	55ce696986	clean tensor array (#19930 ) test=develop	6 years ago
chengduo	d7251a8e1e	Delete local execution scopes (#19749 ) * Add RecordHistoryLocalExecScopes test=develop	6 years ago
wopeizl	5452b6a152	remove the useless warning for user to avoid confuse test=develop (#19871 ) * remove the useless warning for user to avoid confuse test=develop	6 years ago
hong	85b398f171	Add op compatible information (#19910 ) * add op compatible infomation; test=develop * add enum type * add enum type; test=develop	6 years ago
Huihuang Zheng	e117114289	Set states of recurrent op as dependent vars in prune (#19865 ) * Set states of recurrent op as dependent vars in prune of save inference model This PR will fix the save/load inference model problem of RNN models. The reason of the bug is that save_inferenc_model will prune OPs that doesn't contribute to Output. But in recurrent_op, States are not Output, OPs refers States will be pruned. This fix adds States of recurrent_op as dependent var so that OPs referring States won't be pruned.	6 years ago
Zeng Jinle	b754700fb5	fix reduce and broadcast to avoid multi-stream, test=develop (#19889 )	6 years ago
joanna.wozna.intel	3f1d0234ae	Fix conv2d+dequantize squash for residual fusion (#19545 ) * Fix conv2d+dequantize squash for residual fusion test=develop * Change condition test=develop	6 years ago
Huihuang Zheng	a35557d8f4	Fix deps of prune (#19876 ) Add boost as dependency of prune fix #19862	6 years ago
Leo Chen	578a2f5da3	fix SplitLodTensor when batch_size = 0, test=develop (#19866 )	6 years ago
Yiqun Liu	3cd985a669	Add a pass to fuse fc+elementwise_add+layernorm (#19776 ) * Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop	6 years ago
Zeng Jinle	3f87464e9c	refine executor_gc_helper codes, test=develop (#19814 )	6 years ago
Zeng Jinle	3fd3b663a8	fix gc bug in controlflow ops, test=develop (#19827 )	6 years ago
Zeng Jinle	db26de8389	[Bug fix] Disable memory reuse on feeded variables (#19835 ) * fix memory reuse bug on feeding variables, test=develop * add comments to reference count members, test=develop	6 years ago
Thunderbrook	40c66f8df9	rm return in vfork (#19734 ) * rm return in vfork * rm return in vfork test=develop	6 years ago
xujiaqi01	6bf298bf09	support preload thread, optimize hdfs log, fix master+patch bug (#19695 ) * support preload thread * sleep before fleet wrapper exit for pslib core dump * optimize hdfs log * fix master+patch bug	6 years ago
Jiabin Yang	cc311bdf95	Feature/add transform data dygraph (#19707 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * add transform_data to dygraph * test=develop, refoctor name to make it easier to understand * test=develop, refoctor name to make it easier to understand * add test and change input to const ref for safety * test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ * add ut for data transform * refine ut for data_transform * test=develop, fix ut failed on parallel se-resnext * test=develop, change one more PADDLE_ENFORCE * add test_tracer on multiple devices * test=develop, change place to mutable for data transform * test=develop, add transform data on same place test and remove useless log * test=develop, Add to do for data layout and and ut for conv2d with no bias	6 years ago
Zeng Jinle	754fd57ed7	disable memory optimization passes when FLAGS_use_ngraph=True, test=develop (#19778 )	6 years ago
chengduo	8281497030	Fix warning info of build_strategy (#19805 ) * fix warning info test=develop * fix bug of all_reduce_deps_pass test=develop	6 years ago
Yiqun Liu	c67c8758cb	Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop	6 years ago
Chen Weihang	00d5375e0c	Add prune_backward function to cover complicated test_program.clone situation (#19772 )	6 years ago
Adam	d4413a54bc	Add common CreateKey for mkldnn handlers (#19767 ) test=develop	6 years ago
chengduo	056fdedde3	Open fuse all reduce option (#19765 ) * Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop	6 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Zeng Jinle	0daa5c9772	Make leaky relu inplacable (#19676 ) * make leaky relu inplacable, test=develop * force add unittests to pass coverage, test=develop	6 years ago
chengduo	e506c99c20	Open fuse broadcast option (#18833 ) * fix vlog level and fuse option type test=develop	6 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
chengduo	5866a7a5fe	Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418 ) * Enable fused_all_reduce_op_handle support GPU and CPU Gradients	6 years ago
Tao Luo	ec9bc1bd9f	paddle::framework::vectorize() templatization (#19730 ) remove unused accuracy-diff warpctc-cudnn implementation test=develop	6 years ago
Zeng Jinle	bb4f8dee83	add logs to left var memory size, test=develop (#19722 )	6 years ago
wangguanzhong	25dcd74d34	merge empty lod tensor, test=develop (#19228 ) * merge_empty_lod_tensor, test=develop * fix multiclass_nms, test=develop * refine API.spec, test=develop * add unittest case for fetch, test=develop * add lod tensor test, test=develop * return index for multiclass_nms, test=develop * add api for multiclass_nms2 * update API.spc, test=develop * refine api doc, test=develop * fix test_detection.py, test=develop * polish code, test=develop * add more unittest case, test=develop	6 years ago
Zeng Jinle	713c05dd60	refine tensor.mutable_data, test=develop (#19680 )	6 years ago
hutuxian	1ca6ea0318	fix cmakelist deps (#19668 ) fix cmakelist deps: remove unnecessary deps and add proper op deps	6 years ago
Tao Luo	bcddbc78d4	remove -Wmaybe-uninitialized warning (#19653 ) * remove -Wmaybe-uninitialized warning test=develop * remove uninitialized op_handle_ in scale_loss_grad_op_handle.cc test=develop	6 years ago
wangchaochaohu	ed8f44ea21	codegen for fused elementwise operation (#19520 ) * test=develop codegen for fused elementwise operation * fix test=develop	6 years ago
mapingshuo	dca9b6c5b0	add feed_var_names to Prune interface (#19589 ) * Fix bug: add feed_vars to the prune function	6 years ago
Tao Luo	3ae939e48a	unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631 ) * remove assert.h * change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop * fix tensorrt paddle_enforce test=develop	6 years ago
tensor-tang	e3e98ed678	fix scope lock bug on infer (#19624 )	6 years ago
Tao Luo	0a46d34538	refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607 ) test=develop	6 years ago
baojun	a3a4b6e570	Enable ngraph through build_strategy (#19266 ) * enable ngraph throught build_strategy test=develop * add unittest test=develop * put use_ngraph unconditional test=develop * remove paddle_enforce test=develop * remove paddle_enforce test=develop * fix copyright test=develop * limit for ngraph only test=develop	6 years ago
Adam	8d6d95cc2b	paddle::framework::vectorize() templatization (#19611 ) test=develop	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Yiqun Liu	c5548178b0	A a pass to enable the use of cudnn (#19346 ) * Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop	6 years ago
Adam	e94b26daf5	using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568 ) * using MKLDNNMemoryFormat = mkldnn::memory::format changes test=develop * PADDLE_ENFORCE update test=develop	6 years ago
gongweibao	abaf87be2b	Change backward_guard to optimize_guard to maximize the allreduce overlap. (#19506 ) Change backward_guard to optimize_guard to maximize the allreduce overlap	6 years ago
Zeng Jinle	19474019c2	fix fast pe to run highest priority ops first, test=develop (#19575 )	6 years ago
Zeng Jinle	0af8549750	fix seg fault of share lod, test=develop (#19573 )	6 years ago
hutuxian	c756b5d231	Paddlebox Framework (#18982 ) * Support looking up embeddings from BoxPS. * Add a _pull_box_sparse op, for now this op is not exposed to users. * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on. * Add 'BoxPSDataset' in python code. * Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS. * Add UT. * More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982	6 years ago
Jacek Czaja	ecd9f330c9	[MKL-DNN] Fix to face model on AVX512 platforms (#19282 ) - Refactor step 1 - Compilation fix - Yet another compilation fix - Even more compilation fix - Lint fixes test=develop - Removed deprectaed PADDLE_ENFORCE occurance test=develop - Candidate fix to BN forward - Lint fixes test=develop - Refactoring in data_layout_transform - compilation fix - Another comppilation fix - Step further into darkness - Yet another compilation fix - Yet another compilation fix - missing header - compilation fix - Added MKLDNN -> Paddle conversion in fetch op test=develop - Compilation fix test=develop - Lint test=develop - Mul fix - Fix to MKLDNN MUL op and Elementwise MUL UT test=develop - Workaround for diffrent weights with groups representation Paddle vs MKL-DNN. test=develop - Candidate fix for 5D convolution with groups - Refactor of fix for conv3d and conv2d in fetch op test=develop - Compilation fix - Still same compilation fix - Compilation fix - Compilation fix - Reverted refactoring of fixes - Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format not NHWC test=develop - minor fix in UT test=develop - Lint fixes test=develop	6 years ago
yaoxuefeng	10ca3f9609	add thread scope stat accurate metrics test=develop (#19480 ) * add thread scope stat accurate metrics test=develop * fix style * fix style * fix style * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix conflict * fix style * fix style test=develop * fix error test=develop * fix error test=develop	6 years ago
Tao Luo	02270b3eb1	remove unused assert.h (#19529 ) test=develop	6 years ago
chengduo	e340df013e	Support feed single persistable variable to PE (#19417 ) * update executor feed	6 years ago
Yiqun Liu	fcec365d29	Add a pass to replace dropout_op with scale_op when is_test is true (#19297 ) * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop	6 years ago
Thunderbrook	1fe468d319	support debug each output of each ins (#19004 ) * dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop * support debug tensor of each ins test=develop * support debug tensor of each ins test=develop * learning rate * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style test=develop * code style test=develop * unitest * style * style * multi phase * add channel * code style * style * style * unitest * style * define * define test=develop * style test=develop * rm define test=develop * linux * linux test=develop * style test=develop * output format test=develop * windows ci test=develop	6 years ago
Zeng Jinle	5c8f210ce3	refine inplace inference registry, test=develop (#19032 )	6 years ago
chengduo	b6d1d8901f	Increase num_iteration_per_drop_scope (#19075 ) * increase num_iteration_per_drop_scope test=develop * Fix bug of while_op test=develop * fix bug of whileOp test=develop	6 years ago
tangwei12	65c7368400	Fix the correctness of async mode at distributed training (#18863 ) * fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop	6 years ago
joanna.wozna.intel	2e3ec66be0	Add conv dequant squash for int8 (#18905 )	6 years ago
Tao Luo	c82280e445	remove unused conv_elementwise_add2_act_fuse.cc (#19344 ) test=develop	6 years ago
Leo Chen	a9d5fc5142	Enhance OpTest to check the consistency of operators when using and not using inplace (#19101 ) * add pybind interface to get all inplace ops, test=develop * enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop * handle corner cases in op_test, test=develop * support outputs without tensor holder_, like XShape in reshape_op, test=develop * fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop * use reshape_grad instead of reshape in FlattenGradOp, test=develop * fix error debug dims info for variables like XShape, test=develop * change computational order in sum_op to relieve computation difference using inplace, test=develop * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop * follow sneaxiy's comments, test=develop * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop	6 years ago
Tao Luo	e3c68bde78	stronger the error message of tensor's mutable_data (#19303 ) * stronger the error message of tensor's mutable_data test=develop * update error message test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
liuwei1031	50582071dc	fix compilation issue in windows vs2017 (#19183 ) * fix compilation issue in windows vs2017, test=develop * fix gtest lib not found issue, test=develop	6 years ago
juncaipeng	5368b36512	remove the warning for reminding user to avoid using the OriginProgram method, test=develop (#19244 ) This log information may annoy users who don't need to care about it.	6 years ago
chengduo	8a89ca94ce	Fix REGISTER_OP_WITHOUT_GRADIENT (#19251 ) * fix REGISTER_OP_WITHOUT_GRADIENT test=develop	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
Yiqun Liu	77572b70cb	Enhance the error message when GrapOpMaker is null. (#19070 ) * Enhance the error message when GrapOpMaker is null. test=develop * Call Proto() instead of directly using proto_ pointer. test=develop * Rollback to use proto_ directly, because some sepecial grad ops, such some double grad ops, donot have proto. test=develop	6 years ago
chengduo	c70a97f46e	Use CUDAPinnedPlace in buffered_reader (#19112 ) Use CUDAPinnedPlace in buffered_reader	6 years ago
jiaqi	b104ea0684	add get_last_save_xbox_base/get_last_save_xbox (#19122 ) * add get_last_save_xbox_base/get_last_save_xbox * fix fleet_util bug of load paddle model * add doc string in fleet api	6 years ago
joanna.wozna.intel	492a00f53e	Add conv reqantize squash (#18754 ) * Add requantize squash test=develop * Add more precise tests test=develop * REname and REfactor tester test=develop	6 years ago
joanna.wozna.intel	bce72c7fea	Replace Relu with bounded Relu in MobileNetV2 quantization (#18988 ) test=develop	6 years ago
chengduo	e044e84264	open fuse_all_optimizer_ops (#19087 ) test=develop	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
yaoxuefeng	9150cf50fc	add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871 ) * add ctr related metric layer test=develop * add save cache and slots shuffle test=develop * add save cache and slots shuffle test=develop * fix error * fix error * fix style for ci * fix for comments * change SlotsShuffle input to std::strinf for generality * fix style * fix style * fix style * fix style * fix style * fix style * fix stylr * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * change non-const reference to pointer * fix style * fix style * fix style test=develop * fix style test=develop * add return ins num in ctr metric op * change dtype to float in metric_op.py * fix error test=develop * fix style test=develop * fix API spec * fix API spec * fix API spec test=develop * add UT test=develop	6 years ago
hutuxian	5a80cc8431	Datafeed support reading to cuda place directly. (#19071 ) * add a place field in DataFeed to denote which place it will feed data to. * abstract the copy process in CopyToFeedTensor function * add UT for float32 type and for CUDAPlace	6 years ago
chengduo	17d62ab220	Enhance fuse optimization op pass (#19010 ) * Enhance fuse optimization op pass test=develop	6 years ago
chengduo	21440b4d69	Add call stack info during compile time (#19067 ) * Add call stack info during runtime and compile time test=develop * Rename operator_call_stack test=develop * Add unit test test=develop * follow comment test=develop	6 years ago
jiaqi	fc038da749	fix QueueDataset queue size (#19016 ) * fix QueueDataset queue size，set queue size = batch size * 100, to avoid too many instances in channel when training is much slower than reading data.	6 years ago
Leo Chen	8f53735437	Fix memory overwriting of tensors returned by executor (#19030 ) * fix memory overlapping of fetch var (return of executor.run), test=develop * fix wrong usage of ParallelExecutor in op_test, test=develop * remove useless parameter and simplify code * avoid tensor destruct untimely, test=develop * add testcase independent of OpTest, test=develop	6 years ago
Zeng Jinle	2175d19993	fix memory_reuse_pass memory_size calculation error, test=develop (#19020 )	6 years ago
Zeng Jinle	7ac748adb4	Open gc by default (#18836 ) * open gc by default, test=develop * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop * fix conditional_block op eager deletion bug, test=develop * add some comments to reviewers, test=develop	6 years ago
jiaqi	02c370c3dc	support filelist size < trainer num && fix pull dense (#18956 ) * support filelist size < trainer num * pull dense when stop, to make sure local dense params are same as pserver, so save paddle model will save dense model same as pserver * enable QueueDataset train same filelist for serveral times	6 years ago
chengduo	e7da0940f9	Disable fuse optimization option (#18924 ) * Disable fuse optimization test=develop	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
jiaqi	768059b3a0	adjust ins weight according to nid slot (#18784 ) adjust ins weight according to nid slot , user can specify adjust_ins_weight in strategy	6 years ago
Leo Zhao	10eeed93d1	Revert "use static variable to do cache instead of thread local in thread frequent switching case (#18428 )" (#18879 ) This reverts commit `ce38bb5341`. test=develop	6 years ago
Zeng Jinle	8008ab4e6b	Remove legacy C++ memory optimization codes (#18834 ) * remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop	6 years ago
Thunderbrook	52c1431eee	add clear_model interface in fleetwrapper (#18815 ) * dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop	6 years ago
chengduo	4140fe11a4	Open fuse optimization ops (#18741 ) * open fuse optimization ops test=develop	6 years ago
Zeng Jinle	a802da650b	Feature/mem opt pass refactor (#18735 ) * first version memory optimize pass, test=develop * remove move_tensor_sharing_pass, test=develop * refine code comments, add unittests, test=develop * turn off memory_optimize by default, test=develop * follow huihuang's comments, test=develop * follow chengduoZH's comments, test=develop * fix grammar error, add const qualifier, fix pass_test exception message, test=develop * follow chengduoZH's comments 2nd, test=develop	6 years ago
fuyinno4	c167a4b4dd	Fix shrink-dense and add scale-datanorm (#18746 ) Fix FleetWrapper: 1. fix shrink dense: just scale show 2. add datanorm scale: divide datanorm's gradient by batch_size	6 years ago
Zhaolong Xing	26ae6d49e4	Update trt5 for paddle-trt (#18645 ) * update paddle-trt for: 1. fix bug: when batch > 2, core in split plugin. 2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.) 3. add new attr to dropout. 4. shuffle channel, swish, relu6 support test=develop * 1. fix ci test=develop	6 years ago
Thunderbrook	d8396281ef	add slot to sparse table (#18686 ) The change includes 2 things: 1. save delta model and shrink table are control by the same parameter before, now add delete_after_unseen_days to control shrink table. 2. value in sparse table has no slot before, now add slot in sparse table, and add DownpureCtrAccessor to support the new meta. test=develop	6 years ago
jiaqi	d18aabb472	support patch data, add load_one_table, fix bug (#18509 ) （1）support patch data （merge slots of instances of same line id, modify dense layer which changes its size）（2）add fleet load_one_table interface, support load from paddle model and load from pslib model （3）fix push sparse bug which cause push sparse cost more time（about 10% in my testcase）（4）when some slots are not in one of your network (join/update, etc.)，data feed、collect label info、push/pull sparse will skip these slots， instead of throw error. （5）add more debug info in TrainFilesWithProfiler	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Huihuang Zheng	89bc3fd841	Support memory eager deletion on recurrent OP (#17710 ) Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)	6 years ago
Zeng Jinle	ae58afc546	Feature/auto_growth_allocator (#18561 ) * feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop	6 years ago
guru4elephant	d714bf037c	remove async executor and add data_feed.proto to the deps of train demo (#18659 ) * remove async executor and add data_feed.proto to the deps of train demo	6 years ago
chengduo	a6d468a265	fix PE fetch bug (#18644 ) test=develop	6 years ago
Leo Zhao	ff77dea969	not use transferscope cache in cpu case (#18578 ) * not use transferscope cache in cpu case test=develop * adjust variable name and add comments test=develop * use correct format for class member in operator.h * use correct format for class member in operator.cc test=develop	6 years ago
123malin	b414645a65	fix #17430 : int64类型的attr训练非预期 (#18264 ) * fix int64_t * update fill constant op unittest * add empty line	6 years ago
gongweibao	c0a82748cf	Polish backwards optimizer dependency codes and use more default values. (#18255 )	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
Zeng Jinle	be24e5b391	Clean unused code of dim and place (#18565 ) * clean code of dim and place, test=develop * fix failed unittests, test=develop	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
Leo Zhao	ce38bb5341	use static variable to do cache instead of thread local in thread frequent switching case (#18428 )	6 years ago
gongweibao	160ddc980c	Regroup fusion by date type. (#18496 )	6 years ago
chengduo	7453857324	Make fuse_all_reduce_op_pass support mix_precision (#17652 )	6 years ago
pkpk	e9c7e218f2	Nan debugger init (#18401 ) test=develop	6 years ago
Tao Luo	d234aa02cd	add transfer_scope_cache unit-test (#18467 ) test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
Michał Gallus	7023a86c3a	Fix Pooling output scale (#18186 ) * Int8: Fix Pooling output scale test=develop * Update scales quantization for certain operators These include: concat, transpose, pool and reshape. test=develop * Move concat minimum scale finding to quantizer test=develop	6 years ago
jiaqi	93a2b317f7	fix data feed ptr error (#18419 ) fix data feed ptr runtime error, pipeline trainer will core in some cases, so set it nullptr as default value.	6 years ago
chengduo	8ed33bf91f	Fix Bug-prone code of PE (#18354 ) * update pe reduce config test=develop * drop the local_exe_scopes of the previous parallel_executor test=develop	6 years ago
tangwei12	999d9a59a5	fix communicator with pyreader (#18350 ) * add is_runnning in communicator, test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
chengduo	135a59ed45	update reduce config (#18334 ) test=develop	6 years ago
chengduo	5489216eba	Clean build strategy (#18148 ) * clean build_strategy test=develop * DataBalanceOpHandle has been removed test=develop * debug * update build_strategy. test=develop	6 years ago
chengduo	14e1e165df	update alloc_continuous_space_for_grad_pass (#18287 ) test=develop	6 years ago
jiaqi	3f8031e256	dataset (#17973 ) (1) use channel instead of vector/BlockingQueue in Dataset，to keep same with existing implementation, and make code more readable and flexible (dataset single output channel or multi output channel). one previous memory out of limit problem is cause by not release memory after training. (2) add Record because MultiSlotType costs too much memory (80B)，fix memory out of limit problem. (3) add Channel, Archive in paddle/fluid/framework (4) change dataset from shared_ptr to unique_ptr in pybind (5) move create/destroy readers from trainer to dataset (6) move shuffle from datafeed to dataset. dataset holds memory, datafeed is only for load data and feed data to network. (7) fix thread num bug of Dataset when filelist size < thread num (8) support set_queue_num in InMemoryDataset	6 years ago
chengduo	25f3cd6486	Update execution_strategy option default value (#18183 ) * update execution_strategy option default value test=develop * fix doc error test=develop	6 years ago
chengduo	4978db2c10	Remove nccl dep when the number of GPU is 1 (#18158 ) * remove nccl dep when the number of GPU is 1 test=develop	6 years ago
chengduo	24e988a471	Fix bug of scope_buffered_ssa_graph_executor (#18100 ) * fix code bug test=develop	6 years ago
gongweibao	f5caf3443c	Fix reinitialized ncclid error! (#18025 )	6 years ago
chengduo	b5a1c1463d	Update CPU_NUM config (#18059 ) * update CPU_NUM config test=develop	6 years ago
hutuxian	f1d458daf0	add trainer_desc proto DEPS (#18019 )	6 years ago
gongweibao	da9143c1cc	Polish codes of old prs. (#17938 )	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
hutuxian	969e6378b9	Pipeline Concurrency (#17402 ) Add Pipeline Concurrency Train Mode: - Cpp: pipeline_trainer & section_worker - Python: PipelineOptimizer - Add a new data_feed type: PrivateInstantDataFeed - Add a test demo of pipeline trainer and the test model is gnn - Do not support win32 now	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
gongweibao	972c54cd70	Fix FLAGS_fuse_parameter_memory_size unit from Bytes to MBytes. (#17924 )	6 years ago
gongweibao	dd4cd352c7	Fix sync_batch_norm_op ncclallreduce error! (#17918 )	6 years ago
gongweibao	fbbdc9ccad	Add backward and optimizer operator dependency pass. (#17746 )	6 years ago
wopeizl	453a49b1bc	Make ParallelExecutor support Windows GPU (#17787 ) * fix the ParallelExecutor on Windows test=develop * restrict to use one GPU only under windows	6 years ago
baojun	a4c528a31c	[NGraph] some ngraph updates to enable bert (#17739 ) * delay infershape test=develop * fall back subblock to paddle test=develop * fix edge cases test=develop * remove output duplicates test=develop * handle reshape2_grad infershape test=develop	6 years ago
chengduo	437520474c	fix DropLocalExeScopes (#17829 ) test=develop	6 years ago
Leo Zhao	50326563d5	enable mkldnn primitive reuse for platform reorder (#17826 ) test=develop	6 years ago
chengduo	863c75168c	polish error doc (#17772 ) test=develop	6 years ago
guru4elephant	d52391094d	fix prepare context redundant code problem, optimize executor by cach… (#17743 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * cache sub_scope, program, var when use_program_cache=True is set * make fetch_list runable with variables, add more unittest for use_program_cache	6 years ago
chengduo	67c8dade58	Add Event in ScopeBuffer Executor (#17667 ) * add event for fast executor and add threads for scopebuffer executor test=develop	6 years ago
Yiqun Liu	8fd39f3e99	Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236 ) * Enhance fused_elementwise_activation op. test=develop * Move the api fused_elementwise_activation to contrib. test=develop * Add including files. test=develop * Add the support of sigmoid in fused_elementwise_activetion op. * Update API.spec. test=develop	6 years ago
gongweibao	0d561ef442	fix 2dconn test=develop (#17681 )	6 years ago
mozga-intel	5eb81fe595	Capi for a ngraph engine (#17037 )	6 years ago
Jacek Czaja	6d8075ecef	[MKL-DNN] conv_transpose mkldnn bias pass (#17644 ) * - changes to graph detector - Changes to pass - Added ut for new pass - use_pass - Added pass to mkldnn passes - fix to registration - improved verbose messaging for conv bias passes - Lint fixes test=develop * - Lint fixes test=develop	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
Zeng Jinle	4aa931dd85	Code clean of Allocator (#17602 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop * clean code of allocator,test=develop * delete zero_size_allocator.h,test=develop * fix failed unittest,test=develop	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
Sylwester Fraczek	5b2a3c4b12	Conv concat relu quantization (#17466 ) * add conv_concat_relu fuse test=develop * add test code test=develop * added missing include with unordered_map test=develop * review fixes for wojtuss test=develop * remove 'should (not) be fused' comment statements one of them was invalid anyway test=develop	6 years ago
Sylwester Fraczek	bccb0ba49a	fix quantize_squash_pass segfault when no tensor linked to Bias (#17292 ) * fix quantize_squash_pass segfault when there is no tensor linked do Bias input test=develop * add googlenet test test=develop * fix concat CreateKey not using input format test=develop	6 years ago
guru4elephant	7f8bc49d00	polish_executor_and_add_ctx_cache (#17536 ) * polish_executor_and_add_ctx_cache	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
Qiao Longfei	58f7695ab2	Async exe support communicator (#17386 ) Async exe support communicator	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
liuwei1031	c3949f5699	remove two useless flags: enable_subgraph_optimize, memory_optimize_debug, test=develop (#17491 )	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
chengduo	5a6ab38013	Add record event And remove CSP (#17447 ) * add record_event test=develop * remove csp test=develop	6 years ago
Qiao Longfei	728bbaa4e3	add cache_update_mutex_ for operator test=develop (#17124 ) * add cache_update_mutex_ for operator	6 years ago
guru4elephant	43c9561e9a	add inductive shape index (#17435 ) add inductive shape index	6 years ago
Zeng Jinle	712bfb17cb	fix recurrent_op,test=develop (#17433 )	6 years ago
Tao Luo	5babcd02dd	Revert "remove unnecessary prepare_data (#17080 )" (#17432 ) This reverts commit `aca60e9a20`.	6 years ago
chengduo	e336dc86bb	[Speed] Refine the Executor when the num_thread=1 (#17405 ) Refine the Executor when the num_thread=1	6 years ago
Zhen Wang	4a1b7fec96	Add setting Scope function for the graph class (#17417 ) * add set_not_owned function for graph * add scope set. test=develop * add scope_ptr enforce not null before setting.test=develop	6 years ago
jiaqi	66d51206b1	add save/load model, shrink table, cvm, config file & fix pull dense bug (#17118 ) * add save/load model, shrink table, cvm, config file & fix pull dense bug test=develop * fix global shuffle bug, fix pull dense bug, fix release memeory bug, fix shrink error add client flush, add get data size test=develop * fix global shuffle bug test=develop * fix global shuffle bug test=develop * fix code style test=develop * fix code style & modify pslib cmake test=develop * fix error of _role_maker test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix windows compile error of fleet test=develop * fix global shuffle bug * add comment test=develop * update pslib.cmake test=develop * fix fill sparse bug test=develop * fix push sparse bug test=develop	6 years ago
Tao Luo	68ec0a6f74	make parallel_executor support FLAGS_use_mkldnn (#17341 ) * make parallel_executor support FLAGS_use_mkldnn test=develop * add warning when set mkldnn_enabled_op_types_ in non-mkldnn env test=develop	6 years ago
chengduo	bc833945a4	Add DropLocalExeScopes in ParallelExecutor (#17297 ) * reset drop local scope counter test=develop	6 years ago
qingqing01	e32c9888f5	Double backward of conv2d. (#17211 ) * Add conv2d_grad_grad_op * Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simply the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support to fetch empty variables，return None in Python.	6 years ago
Zeng Jinle	5e5e7b3305	fix data_type error message (#17312 ) test=develop	6 years ago
guru4elephant	5d6a1fcf16	fix infer_from_dataset and train_from_dataset (#17243 ) * fix train_from_dataset and infer_from_dataset example * add inductive dim for data_reader, example: shape=[-1, 1], then -1 will be inducted through run-time reading of number of elements	6 years ago
chengduo	516317cf91	use sync copy (#17291 ) test=develop	6 years ago
Hongyu Liu	c3195de522	Fix concat shape check (#17247 ) * fix shape_check; test=develop * fix format; test=develop * fix format; test=develop * fix ddim bug; test=develop * fix c++ format; test=develop * change function name; test=develop	6 years ago
chengduo	04bd413acb	Code Clean: Move all pass to paddle::framework::ir (#17228 ) * move pass to ir * polish code test=develop * fix dependency test=develop	6 years ago
Zeng Jinle	4f8594088d	Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225 ) * add use_cuda to inplace pass,test=develop * add test softmax_with_xe_inplace test,test=develop * fix potential inplace bug test=develop * add more skip vars in mem opt pass,test=develop * follow comment,test=develop * follow comments,move duplicate out arg check to program->graph,test=develop	6 years ago
songhao	c2e20e2a29	fix build warning like 'comparison between signed and unsigned (#17240 ) integer', test=develop	6 years ago
石晓伟	a72dbe9abf	Cherry-pick benchmark related changes from release/1.4 (#17156 ) * cherry-pick commit from `8877054` * cherry-pick commit from `3f0b97d` * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn (cherry picked from commit `8643dbc233`) * Cherry-Pick from 16662 : Anakin subgraph cpu support (cherry picked from commit `7ad182e16c`) * Cherry-pick from 1662, 16797.. : add anakin int8 support (cherry picked from commit `e14ab180fe`) * Cherry-pick from 16813 : change singleton to graph RegistBlock test=release/1.4 (cherry picked from commit `4b9fa42307`) * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2 Support ShuffleNet and MobileNet-v2, test=release/1.4 (cherry picked from commit `a6fb066f90`) * Cherry-pick : anakin subgraph add opt config layout argument #16846 test=release/1.4 (cherry picked from commit `8121b3eccb`) * 1. add shuffle_channel_detect (cherry picked from commit `6efdea8997`) * update shuffle_channel op convert, test=release/1.4 (cherry picked from commit `e4726a066f`) * Modify symbol export rules test=develop	6 years ago
Zeng Jinle	ee2028a110	Add use_cuda to inplace pass (#17205 ) * add use_cuda to inplace pass,test=develop * add test softmax_with_xe_inplace test,test=develop	6 years ago
chengduo	950aec55fd	It doesn't need sync when fetch_list nit not empty (#17201 ) test=develop	6 years ago
tensor-tang	79ed1c76cd	fix bn fuse vardesc and add model saver (#17143 ) * fix bn fuse vardesc and add model saver test=develop * unify save model in test helper test=develop * fix mkdir on windows test=develop * remove magic number use bn bias var desc test=develop	6 years ago
Zeng Jinle	4e1bc6e805	Rewrite inplace pass and fix gc bug (#17126 ) * fix op graph view test=develop * rewrite inplace pass and fix reference count pass bug test=develop * fix unittest failed test=develop * follow comments, test=develop	6 years ago
chengduo	794a195881	fix fuse optimizer ops (#17102 ) test=develop	6 years ago
Tao Luo	aca60e9a20	remove unnecessary prepare_data (#17080 ) test=develop	6 years ago
Zeng Jinle	842ded14b0	fix reference_count_pass,test=develop (#17060 ) test=develop	6 years ago
Tao Luo	d9cd989825	Merge pull request #17048 from luotao1/fix_runtime_cache_bug fix runtime_context_cache bug when gpu model has an op runs only on cpu	6 years ago
chengduo	cc31681687	use fast executor as default (#17044 ) test=develop	6 years ago
chengduo	a2be4b4d91	Add fuse momenutum ops (#16745 ) * Add fuse momenutum ops	6 years ago
luotao1	490e746269	fix runtime_context_cache bug when gpu model has an op runs only on cpu test=develop	6 years ago
wopeizl	51a0243a56	fix nccl wrapper on windows test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
Yibing Liu	3c375751f8	Support seq len equal to 0 in sequence ops (#16935 ) * Support seq len equal to 0 in sequence ops test=develop * Add more test cases * Fix some comments test=develop * Fix py3 error test=develop	6 years ago
jiaqi	8bcba3db84	Merge pull request #16896 from xjqbest/develop fix bug of num > INT_MAX	6 years ago
guru4elephant	bbc6c5714f	Merge pull request #16887 from guru4elephant/add_nccl_context_pybind Add nccl context pybind	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
dongdaxiang	2ab2869c2d	fix GPU compile error problem	6 years ago
dongdaxiang	466d177d09	add pybind dependency test=develop	6 years ago
xjqbest	10991e00a9	fix bug of num > INT_MAX	6 years ago
xjqbest	241120d94d	fix bug of num > INT_MAX	6 years ago
xjqbest	dac70ad4c5	fix bug of num > INT_MAX	6 years ago
xjqbest	74471397cf	fix bug of num > INT_MAX	6 years ago
dongdaxiang	b091139049	add nccl wrapper for python API	6 years ago
dongdaxiang	fff795e5c8	add nccl_wrapper	6 years ago
乔龙飞 Qiao Longfei	82cff5ec42	Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder Add async sparse param update recorder	6 years ago
Yibing Liu	4267a81afc	Correct the lod level of compiled time in lod_reset (#16790 ) test=develop	6 years ago
chengduo	e9409665f7	Refine Fuse Optimize Ops (#16810 ) * fix bug of fuse optimize ops	6 years ago
chengduo	d105c06b50	Replace ThreadedExecutor with FastThreadedExecutor (#16650 ) * replace ThreadedExecutor with FastThreadedExecutor test=develop * Fix Travise CI test=develop * Test FastThreadedSSAGraphExecutor test=develop * refine parallel_ssa_graph_executor.cc test=develop	6 years ago
Qiao Longfei	1526a3e4da	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder test=develop	6 years ago
Yihua Xu	93cedfdb9c	Fix the order while sorting the operators (#16756 ) * Fix the order when sorting operators. test=develop * Enable transfomer compare test item. test=develop * Use set to replace vector. test=develop	6 years ago
Qiao Longfei	afc56949c1	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder	6 years ago
liuwei1031	85363848a1	Security issue (#16774 ) * disable memory_optimize and inpalce strategy by default, test=develop * fix security issue http://newicafe.baidu.com:80/issue/PaddleSec-3/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-8/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-12/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-32/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-35/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-37/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-40/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-43/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-44/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-45/show?from=page test=develop * revert piece.cc, test=develop * adjust api.cc,test=develop	6 years ago
guru4elephant	aa46caf3d9	Merge pull request #16765 from guru4elephant/gpu_dataset_train add gpu training for Executor.train_from_dataset	6 years ago
dongdaxiang	3c2d236815	remove all warnings test=develop	6 years ago
Yiqun Liu	112f16143b	Add an option to enable the cache of expected kernel in train phase. (#16724 ) * Add an option to enable the cache of expected kernel in train phase. test=develop * Change the default value of cache_expected_kernel to true.	6 years ago
liuwei1031	2e07c19a9c	disable memory_optimize and inpalce strategy by default, test=develop (#16760 )	6 years ago
dongdaxiang	ea07eb8cd2	remove comment in data_feed.cc develop=test	6 years ago
dongdaxiang	05464e7c5c	add gpu training for Executor.train_from_dataset test=develop	6 years ago
Qiao Longfei	0608f8ca56	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder	6 years ago
Zeng Jinle	9f7b027dce	fix activation grad op desc maker (#16715 ) test=develop	6 years ago
liuwei1031	fdb719a1bf	avoid optimize variable used in subblock, test=develop (#16739 )	6 years ago
liuwei1031	a18ef10c87	only use the latest version variable for inplace strategy (#16736 ) * bug-fix, test=develop * tweak code, test=develop	6 years ago
Tao Luo	5c364cda3c	Merge pull request #16711 from luotao1/has_attr reduce hasAttr elapsed time in RunImpl	6 years ago
chengduo	55b15db5af	Add unit test for fuse all_reduce ops (#16699 ) * test fuse all_reduce	6 years ago
luotao1	4098ba29ed	reduce hasAttr elapsed time in RunImpl test=develop	6 years ago
luotao1	f89a9c5d95	Merge branch 'develop' into has_attr	6 years ago
Tao Luo	ad4a1bd13c	Merge pull request #16339 from luotao1/core_opt_choose_kernel Cache the chosen kernel of operators	6 years ago
luotao1	6afc97ca6b	reduce hasAttr elapsed time in RunImpl test=develop	6 years ago
gongweibao	8b793d0efd	Fix DGC bug. (#16697 )	6 years ago
Yiqun Liu	3fe8cb0dd7	Enable the runtime_context_cache pass in train phase (#16640 ) * Try to enable the runtime_context_cache pass in train phase. * Put the append of runtime_context_cache pass ahead of multi_dev passes. test=develop	6 years ago
xjqbest	6a57e8075a	remove trainer_id in datafeed and dataset test=develop	6 years ago
luotao1	695f2db6a0	update expected_kernel_cache_pass test=develop	6 years ago
luotao1	226596a296	Merge branch 'develop' into core_opt_choose_kernel	6 years ago
xjqbest	5e5139283b	fix runtime error test=develop	6 years ago
xjqbest	271b7147cc	fix dataset bug test=develop	6 years ago
Zeng Jinle	1c526e1d1a	Fix some grad op desc makers (#16633 ) * fix some grad op desc maker test=develop * fix grad op desc makers test=develop	6 years ago
chengduo	ea2a2f778a	Fix the bug of AllReduceDepPass (#16393 )	6 years ago
chengduo	b75a69bad6	Add Stream for fetch op handle (#16600 ) * expose fuse broadcast ops	6 years ago
chengduo	1342e2ea04	Fix the bug of the fast threaded executor (#16514 ) * Fix the bug of the fast threaded executor. I	6 years ago
gongweibao	423bc515da	fix batch merge bug (#16601 )	6 years ago
liuwei1031	bd193781df	fix the bug of reusing different types of variables in memory_optimiz… (#16547 ) * fix the bug of reusing different types of variables in memory_optimize_pass, test=develop * disable SELECTED_ROWS AND LOD_TENSOR_ARRAY reusage, test=develop	6 years ago
乔龙飞 Qiao Longfei	21622ca30b	Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator Add async ssa graph executor communicator	6 years ago
sneaxiy	10249c0b78	Merge develop test=develop	7 years ago
Qiao Longfei	9861a92f6f	change the return type of NewTempScope to unique ptr test=develop	7 years ago
Qiao Longfei	fb6cc3a1bd	follow commnet, optimize code and add comment test=develop	7 years ago
Qiao Longfei	adf272bcec	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	7 years ago
guru4elephant	76b49f02ee	Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop Train with pipe reader merge develop	7 years ago
Qiao Longfei	baf02328b2	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	7 years ago
Qiao Longfei	9db1a9e128	change log level test=develop	7 years ago
gongweibao	a61ed9782e	fix log level test=develop (#16554 )	7 years ago
Qiao Longfei	8342f12e31	fix set remote_prefetch test=develop	7 years ago
Qiao Longfei	df45c8c538	update nce and hierarchical_sigmoid remote_prefetch test=develop	7 years ago
Qiao Longfei	a1821a0449	remote remote_prefetch in embedding layer test=develop	7 years ago
dongdaxiang	718ea6dbd5	fix fleet code style test=develop	7 years ago
xjqbest	782ab2e2bd	add some doc test=develop	7 years ago
xjqbest	a99c8d0c29	fix client to client communication bug test=develop	7 years ago
gongweibao	fea91164b7	Fix windows compilation error! (#16546 ) * fix compiled test=develop * follow comments test=develop	7 years ago
Zhaolong Xing	3e6aa498d6	Merge pull request #16526 from NHZlX/refine_trt_anakin refine subgraph trt and anakin	7 years ago
sneaxiy	33473890f3	Merge develop test=develop	7 years ago
dongdaxiang	ade9337486	fix API.spec test=develop	7 years ago
liuwei1031	278debab71	fix comments of 16410, test=develop (#16499 ) * fix comments of 16410, test=develop * modify inplace_op_inference_test according to pass interface change, test=develop	7 years ago
dongdaxiang	720647e17f	rebase current develop and fix conflict test=develop	7 years ago
dongdaxiang	98dda08a85	fix pull sparse slow problem test=develop	7 years ago
dongdaxiang	d739bab844	fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem test=develop	7 years ago
dongdaxiang	241d8808be	add timer to distributed executor test=develop	7 years ago
dongdaxiang	3c73859eec	add trainer_desc.proto to distributed executor test=develop	7 years ago
dongdaxiang	60b7bf6fa6	add infer_from_dataset for inference	7 years ago
xjqbest	030c7e7e9d	fix FillSparseValue error test=develop	7 years ago
dongdaxiang	88880d9b69	fix import trainer_desc_pb2 error test=develop	7 years ago
dongdaxiang	0030eb2a61	fix distributed building test=develop	7 years ago
dongdaxiang	ed31874397	undefine rand_r() test=develop	7 years ago
dongdaxiang	f7e4813804	add WIN32 for rand_r and usleep test=develop	7 years ago
dongdaxiang	cedbc161da	add more _LINUX maroc on data_feed.cc for mac and window compile test=develop	7 years ago
dongdaxiang	c5980c3566	add _LINUX macro test=develop	7 years ago
dongdaxiang	433301fbc2	remove glog in shell.h test=develop	7 years ago
dongdaxiang	9e51ad4a65	fix io and fs compile on mac test=develop	7 years ago
dongdaxiang	6eca88ac76	fix io and fs compile on mac test=develop	7 years ago
dongdaxiang	2708108a08	fix fleet_wrapper compile on windows test=develop	7 years ago
dongdaxiang	4ce35815fb	fix windows GLOG problem test=develop	7 years ago
dongdaxiang	e3107a6ae0	fix windows compile problem test=develop	7 years ago
dongdaxiang	398004ece0	disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer test=develop	7 years ago
dongdaxiang	d4514949bf	remove local random engine in fleet with rand_r() test=develop	7 years ago
dongdaxiang	45eb6f0765	run pre-commit check files and fix code style problem test=develop	7 years ago
dongdaxiang	d87ba58c14	refine document of python API, make device_worker and trainer's API private test=develop	7 years ago
dongdaxiang	5687f234bf	fix trainer_desc.proto error	7 years ago
dongdaxiang	b95b80bc76	add doc string for executor and update API.spec test=develop	7 years ago
dongdaxiang	6be9f719e2	make string_helper dependency work test=develop	7 years ago
xjqbest	e95cafd9a7	fix code style & add dataset testcase test=develop	7 years ago
dongdaxiang	ba15d6b164	move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids test=develop	7 years ago
xjqbest	be74de2c61	fix code style & fix register bug & add release_memory test=develop	7 years ago
dongdaxiang	a0b59773af	fix code style	7 years ago
dongdaxiang	f39b323ed7	remove trainer_library in CMakeLists test=develop	7 years ago
dongdaxiang	365be5d559	support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem test=develop	7 years ago
dongdaxiang	6bf796df14	refine print fetch list	7 years ago
xjqbest	589467f24c	fix bug	7 years ago
xjqbest	b7940c2918	fix bug of gen_worker_desc and set_filelist, add some doc	7 years ago
dongdaxiang	68d7bf3de5	add fetch var function test=develop	7 years ago

... 6 7 8 9 10 ...

3040 Commits (89530384008b023dc1e8c51e5a8e7e710718efff)