Paddle

Commit Graph

Author	SHA1	Message	Date
Kaipeng Deng	14625ffe9e	add elementwise mod support float/double. test=develop (#19570 )	6 years ago
Jacek Czaja	5b07ca9cdd	- ReImplemented pooling fwd mkldnn (#19911 ) - First implementation of BWD and FWD of pooling mkl-dnn - Compilation fix - Fix - Fix - Fix - Fix to crash - Compilation fix - Combined AcquireBacward with Fwd test=develop	6 years ago
Zeng Jinle	b1e83b33b0	fix huber loss op attr type, test=develop (#19937 )	6 years ago
Zeng Jinle	cc157d5990	add inplace to assign op, test=develop (#19927 )	6 years ago
Leo Chen	57606205f5	Make OpTest check grad inplace even if forward has no inplace (#19847 ) * make OpTest check grad inplace even if forward has no inplace, test=develop * do not run PE when enable_inplace is False, test=develop * add conv3d cuda kernel for float16 type, test=develop * refactor OpTest for inplace, test=develop * add comments, test=develop	6 years ago
Zhang Ting	cb8f3c03a7	resize Ops support data_layout:channel_last, test=develop, test=document_preview (#19914 )	6 years ago
Kaipeng Deng	3f021781a1	fix softmax CE time limit check failed (#19846 ) * fix softmax ce time limit check failed. test=develop * refine softmax calc. test=develop	6 years ago
石晓伟	30adea0a23	tensor_array_to_tensor_op.cc, test=develop (#19289 )	6 years ago
lvmengsi	4155e62559	add instance norm (#19500 ) * add instance norm op	6 years ago
Adam	cb65439da8	Add support for other axes in MKLDNN softmax op (#19907 ) * Initial, functional commit * Clean commit related files test=develop	6 years ago
Pei Yang	baccd7e2ca	Add TRT input shape check between model and runtime (#19864 ) * add TRT shape check, test=develop * model_input_shape == runtime_input_shape, refine message, test=develop	6 years ago
Aurelius84	fcf53e55ff	support 2-level lod of input in sequence_pool (#19839 ) * support 2-level lod of input in sequence_pool test=develop * fix lod level bug in .cu test=develop	6 years ago
Zhang Ting	93364b45c1	group_norm support data_layout:NHWC, test=develop, test=document_preview (#19614 ) 1. group_norm support data_layout=NHWC 2. modified doc of group_norm	6 years ago
Jacek Czaja	619c797a7f	[MKL-DNN] LRN refactoring (#19798 ) - LRN mkl-dnn kernel refactor test=develop - compilation fix - Another compilation fix - Compilation fix - another compilation fix - compilation fix - Crash fix - optional LRN mkldnn workspace - Added mid allocation - Workaround for tests - Removed gradient from is_test ut - Removed mid for inference - Reverted LRN mid removal for is_test - PADDLE_ENFORCE adjusted - Rebase to templatization commit - Compilation fix - compilation fix test=develop - lint test=develop - Fix to crash - Rebase to recent codebase - lin - lint - compilation fix	6 years ago
Zhang Ting	439d95e157	modified interpolate op to support tensor attribute, test=develop, test=document_preview (#19287 ) modified interpolate_op to support tensor attribute 1. the parameter out_shape of image_resize、resize_nearest/bilinear/trilinear can be a list or a 1-D tensor variable. If a list, each element can be an integer or a tensor variable with shape: [1]. 2. the parameter scale of above Ops can be a 1-D tensor variable. modified document of image_resize, resize_nearest, resize_bilinear, resize_trilinear and add some code example.	6 years ago
Zhang Ting	b38889413d	add crop_tensor_op, test=develop, test=document_preview (#19314 ) add crop_tensor op. The main differences from crop are: 1. If the argument shape is a list, each element is an integer or a tensor variable with shape: [1]. This suits cases where the shape may change each iteration. 2. If the argument shape is a variable, its rank must be 1. In crop op, the rank of shape must be the same as x. offsets can be a list, in which each element is an integer or a tensor variable with shape: [1].	6 years ago
lidanqing	2c32c2d649	Refactor conv computeINT8 (#19574 ) * fix conflicts test=develop * change mask_bias_reorder test=develop * add ComputeMask function to make code clear test=develop * change according to reviews test=develop * change according to reviews test=develop	6 years ago
Adam	c7e688921b	Add template functions for Acquire primitive/primitive_desc (#19867 ) * Add template functions for Acquire primitive/primitive_desc test=develop * Move acquire primitive descriptor to protected section test=develop	6 years ago
Aurelius84	b125e327aa	Remove constraint that last dimension is forced to be 1 in cross_entropy (#19606 ) * Remove constraint that last dimension is forced to be 1 in cross_entropy test=develop * modify labels last dims test=develop	6 years ago
wopeizl	a7c440d303	add precise roi pooling op test=develop (#18960 ) * add precise roi pooling op test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * detail the description test=develop * test=develop * elaborate the doc for return type test=develop * test=develop	6 years ago
Yiqun Liu	3cd985a669	Add a pass to fuse fc+elementwise_add+layernorm (#19776 ) * Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop	6 years ago
wangchaochaohu	47af618f70	Strided slice (#19642 ) * strided_slice op basic function test=develop * test=develop rewrite and fix * fix bug test=develop * fix for the PADDLE_ENFORCE usage * add some unit tests * fix for the api test and copyright and fix test=develop * fix API.spec test=develop * fix API.spec test=develop * add axis parameter test=develop * fix for the build error test=develop * fix python api test=develop * fix the build test=develop * fix build test=develop * fix API spec test=develop * test=develop add some comment and single op test * fix API spec test=develop * fix test=develop * fix test=develop * fix api test=develop * fix api test=develop * fix API.spec test=develop * fix typo test=develop * fix API.spec test=develop * fix API typo test=develop * fix doc and API.spec test=develop	6 years ago
123malin	1bc285a53a	add retry function to try to solve grpc error code 14 (#19661 ) * rpc retry for asycsend/get/prefetch * test=develop, change retry vlog level to 3 * test=develop, set default grpc_retry_times is 3	6 years ago
LielinJiang	6d72a86b14	fix_roi_transform_bug (#19785 )	6 years ago
Zeng Jinle	3fd3b663a8	fix gc bug in controlflow ops, test=develop (#19827 )	6 years ago
Leo Chen	982e61f5ff	Update elementwise double grad to save gpu memory (#19509 ) * update elementwise double grad to save gpu memory, test=develop * update elementwise_mul/div_grad_grad to save memory, test=develop * remove eval function in eigen statement to save memory, test=develop * add unittest for elementwise_div_grad_grad without dout, test=develop * add unittest for elementwise_add_grad_grad without ddx, test=develop * add float16 cuda kernel for elementwise double grad op, test=develop	6 years ago
Adam	dfdd73cbc0	Add MKLDNNhandlerT templatized class (#19801 ) test=develop	6 years ago
Zeng Jinle	cabb9501bd	fix leaky_relu op when alpha is zero, test=develop (#19833 )	6 years ago
chengjuntao	00efd1d8a9	add deformable conv v1 op and cpu version of deformable conv v2 (#18500 ) * add deformable conv v1 op, test=develop	6 years ago
liym27	677e714425	fix pow op, support tensor for argument factor. (#19313 ) improve pow op according to reviews: 1. Delete unnecessary judgement statements in PowGradOpDescMaker; 2. Improve test of test_api; overload GetKernelTypeForVar add stop_gradient=True when attr(factor) is tensor Variable, change examples in API pow. test=develop,test=document_preview	6 years ago
liym27	bd89a27308	add tensor support for argument shape in reshape op; (#19268 ) add support parameter inference when argument shape is a list containing integer and tensor variable; test=develop fix reshape op according to reviews: 1. improve or message; 2. improve test of test_api. test=develop,test=document_preview fix reshape op: Add error message in nn.py, test=develop add stop_gradient=True when attr(shape) is tensor Variable. change examples in API reshape. test=develop,test=document_preview	6 years ago
liym27	88628016b2	add tensor(tensor and tensor in list) support for argument starts and ends in slice op; (#19208 ) add support parameter inference when arguments starts or ends is a list containing integer and tensor variable; test=develop,test=document_preview improve slice op according to review(from hongyu). test=develop fix slice op according to review: infer_flags, test=develop fix slice op: improve overload operator __getitem__ to support attrs(starts and ends) are Variable. test=develop,test=document_preview fix test_slice_op: add TestSliceOp_decs_dim_6 to resolve conflict with test_slice_ngraph_op. test=develop add stop_gradient=True when attr(starts) or attr(ends) is tensor Variable. test=develop,test=document_preview	6 years ago
liym27	e9e3c08777	fix expand op: (#19302 ) 1. add tensor support for argument expand_times in expand op; 2. add support parameter inference when argument expand_times is a list containing integer and tensor variable; improve expand op according to reviews: 1. add doc of ExpandTimes in expand_op.cc; 2. improve the test of test_api. add stop_gradient=True when attr(expand_times) is tensor Variable, change code examples. test=develop,test=document_preview	6 years ago
lvmengsi	b76343c3b7	cpu Conv double grad (#19672 ) * cpu conv_grad_grad	6 years ago
翟飞跃	93c85c930a	Implement FusedEmbeddingSeqPoolGradKernel with cblas_saxpy (#19770 ) * Implement the operator with sparse matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implementation when using non-Linux. test=develop * optimize bp with mkl sparse matrix test=develop * tmp add fused_emb_seq layer * Add the support of padding_idx attribute. test=develop * add padding_idx support test=develop * implement grad refer lego test=develop	6 years ago
Yiqun Liu	c67c8758cb	Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop	6 years ago
zhongpu	52673956de	add kernel for squeeze_op, test=develop (#19656 ) * add kernel for squeeze_op, test=develop * delete comment, test=develop	6 years ago
zhongpu	2a81c3679a	add kernel for unstack_op, test=develop (#19538 ) * add kernel for unstack_op, test=develop * add kernel for unstack_op, test=develop * add kernel for unstack_op, test=develop * adjust the code format, test=develop * modify some comment, test=develop	6 years ago
Kaipeng Deng	99c78b772a	fix softmax axis!=-1. test=develop (#19800 )	6 years ago
Adam	d4413a54bc	Add common CreateKey for mkldnn handlers (#19767 ) test=develop	6 years ago
Aurelius84	8c7e411908	Remove constraint that last dimension is forced to be 1 by adding one_hot_v2 (#19716 ) * add one_hot_v2_op to remove last_dims==1 test=develop * add api unittest code for CI_Coverage test=develop * improve CI_Coverage rate by adding test_with_depth test=develop	6 years ago
Jacek Czaja	9e4c958552	Refactoring activation mkldnn op (#19748 ) test=develop - fix to BWD test=develop	6 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Zeng Jinle	0daa5c9772	Make leaky relu inplacable (#19676 ) * make leaky relu inplacable, test=develop * force add unittests to pass coverage, test=develop	6 years ago
Zeng Jinle	078a678219	refine math_op_patch, test=develop (#19727 )	6 years ago
Jacek Czaja	47f670d58c	- Softmax mkl-dnn refactoring (#19615 ) test=develop - Cosmetic fixes test=develop	6 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
Aurelius84	22301115d0	Remove constraint that last dimension is forced to be 1 in huber_loss op (#19562 ) * Remove constraint that last dimension is forced to be 1 in huber_loss test=develop * add y[rank-1] == 1 when x_rank=y_rank test=develop * modify into contain_unknown_dim test=develop	6 years ago
Tao Luo	ec9bc1bd9f	paddle::framework::vectorize() templatization (#19730 ) remove unused accuracy-diff warpctc-cudnn implementation test=develop	6 years ago
Adam	428b2b9e17	MKLDNN handler cleanup (#19713 ) * MKLDNN handler cleanup * MKLDNN handler cleanup test=develop	6 years ago
Zeng Jinle	1c25c88aba	refine memory usage of some operators, test=develop (#19700 )	6 years ago
wangguanzhong	25dcd74d34	merge empty lod tensor, test=develop (#19228 ) * merge_empty_lod_tensor, test=develop * fix multiclass_nms, test=develop * refine API.spec, test=develop * add unittest case for fetch, test=develop * add lod tensor test, test=develop * return index for multiclass_nms, test=develop * add api for multiclass_nms2 * update API.spec, test=develop * refine api doc, test=develop * fix test_detection.py, test=develop * polish code, test=develop * add more unittest case, test=develop	6 years ago
yaoxuefeng	c6756ed225	fix instag op (#19591 ) * fix instag op * fix instag bug: Some tiny logical error, occurring when ins_tag (2nd input) is multiple. test=develop	6 years ago
zhongpu	5f627488db	add kernel for unsqueeze_op and Add unsqueezed op test, test=develop (#19436 ) * add kernel for unsqueeze_op, test=develop * add kernel for unsqueeze_op, test=develop * add kernel for unsqueeze_op, test=develop	6 years ago
Tao Luo	f05d2c519d	paddle::framework::vectorize() templatization [PART3] (#19643 ) * paddle::framework::vectorize() templatization test=develop * update pybind/imperative.cc test=develop * revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop	6 years ago
hutuxian	1ca6ea0318	fix cmakelist deps (#19668 ) fix cmakelist deps: remove unnecessary deps and add proper op deps	6 years ago
Tao Luo	bcddbc78d4	remove -Wmaybe-uninitialized warning (#19653 ) * remove -Wmaybe-uninitialized warning test=develop * remove uninitialized op_handle_ in scale_loss_grad_op_handle.cc test=develop	6 years ago
wangchaochaohu	4440d7ced0	test=develop cuda realization of label smooth op (#19175 )	6 years ago
chengduo	31c5a5ee26	Remove linear_chain_crf_op.cu (#19645 ) test=develop	6 years ago
wangchaochaohu	ed8f44ea21	codegen for fused elementwise operation (#19520 ) * test=develop codegen for fused elementwise operation * fix test=develop	6 years ago
Chen Weihang	73daa3d6c0	Code Cleanup: delete three useless raw variables in Conv2D (#19644 ) * delete useless raw variables in Conv2D, test=develop * adjust the vars number in test_graph_wrapper to pass unittest, test=develop	6 years ago
123malin	2f037c3189	fix the diff between async mode and async_half mode (#19535 ) * test=develop, communicator merge add => merge average	6 years ago
tangwei12	f45cb1c2ca	fix bug of communicator flag, test=develop (#19635 )	6 years ago
Tao Luo	3ae939e48a	unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631 ) * remove assert.h * change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop * fix tensorrt paddle_enforce test=develop	6 years ago
Leo Chen	af692c9140	update reduce_sum and reduce_mean to save memory, test=develop (#19608 )	6 years ago
Zeng Jinle	710767d894	Enable inplace support for some ops (#19612 ) * enable inplace for affine_channel op, dropout op, test=develop * remove dropout inplace for ngraph fails, test=develop	6 years ago
Tao Luo	d6c85c96dc	paddle::framework::vectorize() templatization (#19627 ) test=develop	6 years ago
danleifeng	8672e15363	elementwise broadcast function enhancement (#19536 ) elementwise broadcast function enhancement	6 years ago
Chen Weihang	8cb54ede8c	Add user-friendly error message in optimizer ops to give a hint about the position sensitive problem of run(startup_program) (#19605 ) * add extra error message hint in optimizer ops * polish format & delete useless change, test=develop * extract init judge from shape compare, test=develop	6 years ago
zhongpu	118bb897cf	add kernel for flatten_op, test=develop (#19472 ) * add kernel for flatten_op, test=develop * add kernel for flatten_op, test=develop * fix the license and remove redundant code, test=develop	6 years ago
Tao Luo	0a46d34538	refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607 ) test=develop	6 years ago
ShenLiang	2cd3fa3e9a	add scatter_nd op and scatter_nd_add op (#19571 ) * add scatter_nd op, test=document_preview test=develop * fixed the document, test=document_preview test=develop * modify the notes, test=document_preview test=develop * remove the ShareDataWith, test=develop	6 years ago
wawltor	364c44422e	Add the support the int64 data type of `scatter_op` input Index(#18804 ) (#19508 ) * test=develop Fix the scatter op bug when use the add mode, and support the int64 data type of scatter_op Index(#18804). * test=develop Remove the PADDLE_ENFORCE and use PADDLE_ENFORCE_EQ * test=develop Remove the fix bug of scatter_add, and just add the support of int64 in scatter_add * test=develop Add the test case for scatter op, the test case just for index int64	6 years ago
Adam	8d6d95cc2b	paddle::framework::vectorize() templatization (#19611 ) test=develop	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Adam	e94b26daf5	using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568 ) * using MKLDNNMemoryFormat = mkldnn::memory::format changes test=develop * PADDLE_ENFORCE update test=develop	6 years ago
baojun	f2ad30c4dd	Some ngraph op and unittest fix (#19515 ) * update ngraph ops test=develop * update unittest test=develop * increase coverage test=develop	6 years ago
Tao Luo	49523ea189	replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586 ) * remove unused PADDLE_ASSERT(_IS_NOT_ERROR) * replace PADDLE_ASSERT with PADDLE_ASSERT_MSG test=develop	6 years ago
gongweibao	abaf87be2b	Change backward_guard to optimize_guard to maximize the allreduce overlap. (#19506 ) Change backward_guard to optimize_guard to maximize the allreduce overlap	6 years ago
gongweibao	57f0f0f2dc	Delete pserver complete file before executor running. (#19468 )	6 years ago
JesseyXujin	4a7e6deb63	add padding in linear_chain_crf op (#19583 ) * add padding in linear_chain_crf op * modify API.spec * add linear_chain_crf_op.cc and linear_chain_crf_op.h * remove useless unit test , test=develop * modify API.spec, test=develop * remove some blanks in nn.py , test=develop * fix some bugs on nn.py and API.spec ,test=develop * fix nn.py, test=develop * fix API.spec ,test=develop * fix bug of CI test in test_linear_chain_crf_op.py * fix bug of CI test in test_linear_chain_crf_op.py, test=develop * remove paddle_enforce, test=develop * remove paddle_enforce, test=develop * remove paddle_enforce, test=develop * remove paddle_enforce, test=develop * remove paddle_enforce, test=develop * remove paddle_enforce, test=develop * modify nn.py, test=develop * fix API.spec, test=develop * fix unittest bug, test=develop	6 years ago
zhouwei25	84c728013c	fix the compilation issue on windows caused by mkl_CSRMM (#19533 )	6 years ago
Jacek Czaja	cef95ee30d	[MKL-DNN] Refactoring Softmax (#19312 ) * - First set of modifications - Compilation fixes - compilation fix - Another compilation fix - Moved AcquireSoftmaxPrimitiveDescriptor call into handler - MKL-DNN Softmax PD refactor test=develop - Compilation fix test=develop - another compilation fix - cosmetics test=develop - Compilation fix - Fix to crash when softmax backward is created * - Fixes after review of softmax refactoring test=develop	6 years ago
hutuxian	c756b5d231	Paddlebox Framework (#18982 ) * Support looking up embeddings from BoxPS. * Add a _pull_box_sparse op, for now this op is not exposed to users. * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on. * Add 'BoxPSDataset' in python code. * Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS. * Add UT. * More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982	6 years ago
Zeng Jinle	5dce1da680	remove reset recordio usage (#19519 )	6 years ago
ShenLiang	85914f7a88	add gather_nd op and unit test (#19366 ) * fixed the code for coverage * fixed the document,test=document_preview test=develop	6 years ago
Jacek Czaja	ecd9f330c9	[MKL-DNN] Fix to face model on AVX512 platforms (#19282 ) - Refactor step 1 - Compilation fix - Yet another compilation fix - Even more compilation fix - Lint fixes test=develop - Removed deprecated PADDLE_ENFORCE occurrence test=develop - Candidate fix to BN forward - Lint fixes test=develop - Refactoring in data_layout_transform - compilation fix - Another compilation fix - Step further into darkness - Yet another compilation fix - Yet another compilation fix - missing header - compilation fix - Added MKLDNN -> Paddle conversion in fetch op test=develop - Compilation fix test=develop - Lint test=develop - Mul fix - Fix to MKLDNN MUL op and Elementwise MUL UT test=develop - Workaround for different weights with groups representation Paddle vs MKL-DNN. test=develop - Candidate fix for 5D convolution with groups - Refactor of fix for conv3d and conv2d in fetch op test=develop - Compilation fix - Still same compilation fix - Compilation fix - Compilation fix - Reverted refactoring of fixes - Adapted test_conv2d_int8_mkldnn so it expects data in NCHW format not NHWC test=develop - minor fix in UT test=develop - Lint fixes test=develop	6 years ago
GaoWei8	e8405e5c61	Modify the dropout op to multi-thread (#19504 ) * Modify the dropout op to multi-thread test=develop * define parallel test=develop	6 years ago
Huihuang Zheng	2916caa2c4	Change ugly PADDLE_ENFORCE_EQ in recurrent_op.cc (#19470 ) test=develop	6 years ago
Liufang Sang	9dde564097	change var name padding_num to padding_value (#19498 )	6 years ago
Aurelius84	5b5379b32a	Add sequence_topk_avg_pooling Op (#19442 ) * add topk_avg_pooling * refine api doc and modify api.spec test=develop	6 years ago
Tao Luo	02270b3eb1	remove unused assert.h (#19529 ) test=develop	6 years ago
lidanqing	ba368bf696	clean up intel labeled TODOs (#19476 ) test=develop	6 years ago
Zeng Jinle	11f2f78458	fix softmax seg fault in AVX, test=develop (#19487 )	6 years ago
Zeng Jinle	5c8f210ce3	refine inplace inference registry, test=develop (#19032 )	6 years ago
chengduo	b6d1d8901f	Increase num_iteration_per_drop_scope (#19075 ) * increase num_iteration_per_drop_scope test=develop * Fix bug of while_op test=develop * fix bug of whileOp test=develop	6 years ago
Double_V	1d0f04315a	fix row_conv_op to force it support lodtensor and tensor input simultaneously, test=develop (#19412 ) Support Tensor input for row_conv_op	6 years ago
tangwei12	65c7368400	Fix the correctness of async mode at distributed training (#18863 ) * fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop	6 years ago
baojun	6421c61ae2	Update ngraph engine for multiple threading (#19155 ) * update for multiple threading test=develop * remove PADDLE_ENFORCE test=develop	6 years ago
Yi Liu	efb05ba258	supports multiple NCCL communicators preserved in NCCLCommContext (#19407 ) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop	6 years ago
Huihuang Zheng	56dd76538c	Delete useless ex-scope in recurrent op (#19426 )	6 years ago
vincentXiyu	482ce818bb	Support Tensor input with padding for warpctc op (#19322 ) * support tensor input with padding for warpctc op * merge with develop * test=develop * modified python API examples test=develop * nn.py is modified for code coverage test=develop * update documents info about warpctc op in API.spec test=develop * add test_warpctc_with_padding in test_layers test=develop * add warning log for cuda_version back to warpctc_op.cc * modify API.spec for warpctc op test=develop * modify API.spec * update warpctc test to new CompiledProgram API test=develop * modify code examples for warpctc op test=develop * modify API.spec for warpctc op test=develop * modify API.spec for warpctc op test=develop	6 years ago
Huihuang Zheng	12d29f4d2a	Change TensorCopy in recurrent_op to ShareDataWith (#19319 )	6 years ago
tangwei12	19dac67e9f	fix distribute transpiler GRPC error code 4, RPC Deadline (#18984 ) * fix sync mode hang in transpiler * remove sync mode in send/recv * replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE	6 years ago
翟飞跃	2e3ee57954	Use sparse matrix to implement FusedEmbeddingSeqPoolGradKernel (#19153 ) * Implement the operator with sparse matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implementation when using non-Linux. test=develop * optimize bp with mkl sparse matrix test=develop	6 years ago
Leo Chen	a9d5fc5142	Enhance OpTest to check the consistency of operators when using and not using inplace (#19101 ) * add pybind interface to get all inplace ops, test=develop * enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop * handle corner cases in op_test, test=develop * support outputs without tensor holder_, like XShape in reshape_op, test=develop * fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop * use reshape_grad instead of reshape in FlattenGradOp, test=develop * fix error debug dims info for variables like XShape, test=develop * change computational order in sum_op to relieve computation difference using inplace, test=develop * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop * follow sneaxiy's comments, test=develop * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop	6 years ago
Aurelius84	0d29cf18f4	Supports diagonal initialization in uniform_random op (#19299 ) * add diag init in Uniform_random op test=develop * modify api.spec test=develop * fix unform_batch_size_like maker test=develop * add diag_num and diag_step assert check test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
wangguanzhong	37428952c6	fix generate mask fpn, test=develop (#19301 )	6 years ago
zhaoyuchen2018	5296294dae	Fix elementwise performance poor issue (#19278 ) For small cases, using a 1D block is better than a 2D block. Refer to this issue: #19275	6 years ago
Yihua Xu	b920395842	Use sparse matrix to implement fused emb_seq_pool operator (#19064 ) * Implement the operator with sparse matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implementation when using non-Linux. test=develop * Ignore the deprecated status for windows test=develop	6 years ago
wangchaochaohu	6e326ca2c6	optimize the realization of cuda dropout (#19136 ) * cuda optimize for dropout * remove tmp swp file * fix compile error test=develop * test=develop optimize the cuda realization of dropout op * remove unused code test=develop * remove tmp file test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memory optim (diff) test=develop * fix ci about PADDLE_ENFORCE fix merge lod infer op ut test=develop	6 years ago
qingqing01	5fc8de449a	Remove warning in batch_norm_op (#19260 )	6 years ago
Aurelius84	78a3d837f8	Add match_matrix_tensor op (#18525 ) * add match_matrix_tensor op test=develop * fix ignore unittest if with_mkl=off test=develop * clean code and rm is_test param test=develop * modify API.spec test=develop * rm useless code in search_compute.h test=develop * modify api.spec test=develop * modify default_grad.spec test=develop * Add API test code test=develop * clean code in search_compute.h * modify PADDLE_ENFORCE and clean search_compute.h test=develop * fix code style test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
zhang wenhui	539c870753	add fl_listen_and_serv &fl_transpiler,test=develop (#19091 ) Add fl_listen_and_serv op for federated learning; fl_distribute_transpiler adds this op to the pserver program. This op just listens on the endpoint and performs sum and scale.	6 years ago
silingtong123	af0fbd9012	change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205 ) * print error code if cuda related API fails	6 years ago
gongweibao	fd4b15a2f6	Unset unittests http_proxy env to avoid timeout. (#19269 ) Unset unittests http_proxy env to avoid timeout.	6 years ago
Kaipeng Deng	2848cb791e	fix temporal_shift OP PADDLE_ENFORCE. test=develop (#19161 ) * fix temporal_shift OP PADDLE_ENFORCE. test=develop * fix HasInput/HasOutput ENFORCE. test=develop	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
Yibing Liu	50b1cab122	Add padding support for crf_decoding (#19057 ) * Add padding support for crf_decoding * Fixes in comupte kernel test=develop * Update API Spec test=develop * Update API.spec test=develop * Avoid using paddle_enforce test=develop * Fix enforce test=develop	6 years ago
chengduo	b5ba801ef0	Fix gather op bug (#19168 ) * fix gather op bug test=develop	6 years ago
Leo Chen	80eab822c1	Remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR() (#19166 ) * remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR(), test=develop * remove SplitIdsOpGradMaker since it is buggy and not tested, update spec file, test=develop	6 years ago
chengduo	c70a97f46e	Use CUDAPinnedPlace in buffered_reader (#19112 ) Use CUDAPinnedPlace in buffered_reader	6 years ago
Jiawei Wang	6ac32d0981	Instag Implementation (#18394 ) * instag lod tensor impl * First PR for instag * First PR for instag * Before adding Selection Rows. * Change name from instag to filter_instag, and upgrade the impl of filter_instag * Change name from instag to filter_instag, and upgrade the impl of filter_instag * Fix yapf error in gradient_checker.py to pass Travis-CI * Fix Filter Instag Grad test=develop * Fix Filter Instag Grad test=develop * 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop * Impl Loss_weight and empty output handler * change Loss Weight datatype to Float32, and add Loss Weight as 2nd output * 1) Support Tensor Input(without LOD) 2) Add Unit test * Filter By Instag Final test=develop * Update API.spec for filter_by_instag test=develop * Update API.spec for filter_by_instag 2 test=develop * Add Filter By Instag Coverage * code format of test_layers.py * code format test_layers.py test=develop * Make API args more readable test=develop * Make API args more readable and pass code format test=develop * Filter By Instag Op, Rename Map to Index Map test=develop * Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop * Filter by instag op: code format of cpp files test=develop * Filter by instag Op: Api spec modification test=develop * Filter by instag Op: Api spec doc id modification test=develop * Filter by instag Op: Api spec and doc preview test=develop test=document_preview * Filter By Instag Op, fix doc error test=document_preview test=develop * Filter By Instag Op, fix doc err and Api spec test=document_preview test=develop * Filter By Instag Op, fix Api spec test=document_preview test=develop * Filter By Instag Op, fix Paddle Enforce deprecated warning test=document_preview test=develop * Filter By Instag Op, fix Paddle Enforce deprecated and code format warning test=document_preview test=develop	6 years ago
huangjun12	20f18930ae	Add hard swish op (new op) (#19001 ) * add hard_swish activation op (new op) test=develop * remove redundancy files * modify document content of HardSwish OP * add API test in test_layers.py * add dynamic_graph for test_hard_swish	6 years ago
joanna.wozna.intel	bce72c7fea	Replace Relu with bounded Relu in MobileNetV2 quantization (#18988 ) test=develop	6 years ago
wangguanzhong	1fc242a7ed	refine infer shape in box decoder and assign op, test=develop (#19118 )	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
Kevin	945f3cf631	fix code too big test=develop (#19111 ) Fix seq_pool failed when input dims is too large. Resolve issue #3023	6 years ago
Zeng Jinle	88f111f885	remove unused inplace act codes, test=develop (#19079 )	6 years ago
ShenLiang	4397cb318e	add eye op, kernel and unitest test=develop (#18980 ) * add eye op,test=document_preview test=develop * fix the API.spec, test=develop * fix the document, test=document_preview test=develop * add unitest for CI coverage, test=develop	6 years ago
Kaipeng Deng	f86fead693	Add trilinear_interp OP (#18711 ) * add trilinear interp. test=develop * fix unittest. test=develop * add python api and test_layers. test=develop * refine API.spec. test=develop * fix format. test=develop * add python API test. test=develop * format code. test=develop * refine code strcuture. test=develop * fix format * fix doc. test=develop * fix converage. test=develop * fix format. test=develop	6 years ago
Zhang Ting	c2063217e7	optimize error message for "embedding" and "cross_entropy" OP (#18765 ) * optimize error message, test=develop * optimize error message, test=develop	6 years ago
Yiqun Liu	a445c33552	Add the check of lod in sequence_softmax kernel. (#18996 ) * Add the check of lod in sequence_softmax kernel. test=develop * Refine the comments. test=develop	6 years ago
Kevin	e681d65515	Add var_conv_2d op (#18518 ) * fix overflow by int32 mul test=develop * fix reference nullptr * fix codestyle test=develop * modify to point in ContextProjectFunctor test=develop * modify to point in ContextProjectFunctor test=develop * modify . to -> test=develop * add var_conv_2d op test=develop * edit api.spec test=develop * ignore unittest if with_mkl=off test=develop * fix python3 division test=develop * fix ignore unittest bug test=develop * remove useless code test=develop * modify api.spec test=develop * modify default_grad.spec test=develop	6 years ago
pawelpiotrowicz	e53f517a44	fix for multithreading test_analyzer_image_classification --num_threads=X (#18265 ) test=develop	6 years ago
Liufang Sang	faf6890b6c	support tensor input for ctc align op (#18887 ) * test=develop support Tensor input for ctc_align_op * test=develop add some comment	6 years ago
hutuxian	b62c4f9b04	fix concat check info typo (#18975 )	6 years ago
Zeng Jinle	7ac748adb4	Open gc by default (#18836 ) * open gc by default, test=develop * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop * fix conditional_block op eager deletion bug, test=develop * add some comments to reviewers, test=develop	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
wawltor	3ab1866ca5	Add the op of unique_with_counts, expand count function of the op unique (#18720 ) * test=develop Add the op of unique_with_counts, the op is calc the unqiue input of data, and output the corresponding indices and count of data. * test=develop Check the input and dtype in the op of unique_with_counts * test=develop test=document_preview update the API.spec for `unique_with_counts`, at the same time, optimize the python api in the op of `unique_with_count` * test=develop test=document_preview Fix some python api problem in the op of `unique_with_counts`, and change the error messsage in this op. * Fix some API problem in the op of `unique_with_counts` test=develop test=document_preview * test=develop test=document_preview Fix the api sample of op `unique_with_counts`, and update api.spec	6 years ago
Jacek Czaja	5cf2d38594	- Removed passing X from FWD to GRAD via device context (#18911 ) test=develop - Extracted key generation from FWD and GRAD into separate function test=develop - Compilation fix test=develop - another compilation test=develop	6 years ago
LielinJiang	22fa4c2d24	Fix depthwise conv gpu kernel bug (#18582 ) * fix depthwise conv gpu kernel bug, test=develop * add more depthwise conv test, test=develop	6 years ago
liuwei1031	0d99690809	fix several security bugs reported by security team (#18831 ) * fix security issue, test=develop * bug fix, test=develop * throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop	6 years ago
Zhaolong Xing	61238d31f7	Trt fp16 support (#18860 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop	6 years ago
chengduo	20859c08e8	[DyGraph] Make multi-card program faster (#18892 ) * update parallel.py test=develop	6 years ago
HaoRen	24f8543106	Add center Loss Op Support (#18681 ) * support center loss * change tensor copy api to high level api tensorcopy * test=develop rewrite the center_loss cuda_kernel to make it faster and add document of the center loss api,also update test function * test=document_preview test=develop update document of center loss * test=document_preview test=develop modify API.spec modify test code remove nouse const_cast	6 years ago
Leo Zhao	86e494eb64	use mkl to accelerate gelu_grad (#18099 ) test=develop	6 years ago
wopeizl	dfd6a62a9a	Optimize the error report information when loadcombine fail to open model files test=develop (#18888 )	6 years ago
baojun	adcfc53b18	upgrade ngraph version and simplify ngraph engine (#18853 ) * upgrade ngraph to v0.24 test=develop * simplify io test=develop	6 years ago
Jacek Czaja	cfcb96d2df	[MKL-DNN] Fix int8 performance regression (#18758 ) test=develop - optimization of TID to string test=develop	6 years ago
danleifeng	e0a2d4dfec	Add elementwise_pow_op backward implementation and the unit test codes of it. (#18848 )	6 years ago
Zeng Jinle	9a8a7a1ddc	fix affine_channel no_need buffer bug, test=develop (#18844 )	6 years ago
Adam	ee02227949	Add LeakyReLU MKLDNN support (#18762 )	6 years ago
lidanqing	b05bdda0cf	remove unused TransposeINT8Op for higher UT coverage (#18791 ) test=develop	6 years ago
Physher	c5f47c2107	fix mul_mkldnn_op build failure (#18816 )	6 years ago
Physher	a5c986301c	clarify MKLDNN INT8 Mul Op attributes (#18685 )	6 years ago
FDInSky	cff5e2c173	fix roi_align_op cpu backward's bug (#18789 ) * test=develop fix cpu roi_align_op backward bug	6 years ago
Bai Yifan	d3ac561d65	fix deformable_conv_op compile error, test=develop (#18793 )	6 years ago
lidanqing	9ecd8ee789	change ComputeINT8 to template version to remove checking dst_datatype code (#18756 ) * change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews * reverse user_residual_memory_p and user_bias_memory_p declaration scope test=develop	6 years ago
JesseyXujin	d9e7b5b5e9	fix bug of swish op formula,test=develop (#18772 )	6 years ago
Bob Zhu	220eef602e	Extend Matmul to support matrix multiplication with multiple heads (#18570 ) * extend matmul op to support multiple head multiplication With the support of multiple head, the multiplication of two big matrixes is split into multiplication of several (head_number) small matrixes. e.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of [6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].	6 years ago
whs	075e1cf78e	Add python API for appending LoD level (#18702 ) * Make lod reset op support for append lod level. * Fix API.spec test=develop * Fix unitest. test=develop * Add python api for lod append. test=develop * Fix API.spec test=develop * Fix format of doc. test=develop * Fix unitest. test=develop * Fix doc. test=develop	6 years ago
Jacek Czaja	95c1816ec0	[MKL-DNN] Extended LRN with reusing via Acquire API (#18675 ) test=develop - compileation fix - Yet another compilation fix - Even yet another compilation fix - Surprise! Again compilation fix - lint fixes test=develop - Fix to workspace acquire of LRN test=develop - Fix to hash of BWD LRN test=develop - fix to lrn BWD PD acquire test=develop - Fixing LRN PD creation test=develop - cosmetic fix in comment test=develop - Fixes after review test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
wangchaochaohu	6b78e00da4	Cudnn convolution reconstruction (#18284 ) * rewrite the conv_op using cudnn_conv_helper * add workspace limit for v7 test=develop * fix test=develop * add half float test=develop * fix test=develop * fix test=develop * revise code style test=develop * fix test=develop	6 years ago
Yi Liu	157211c4e1	supports distributed classification (#18690 ) * supports distributed classification training * update API.spec * fix evenly division in python3 * change "index_range" to "index_num" in shard_index operator test=document_preview test=develop	6 years ago
qingqing01	3429e65aa8	Fix CPU implementation of roi_align_op backward (#18728 )	6 years ago
Tao Luo	bd22453f20	Revert "Add LeakyRelu MKLDNN support (#18656 )" (#18723 ) test=develop	6 years ago
whs	189b08dc0d	Make infer shape of pad2d support for input with negative dims in compile time. (#18695 ) test=develop	6 years ago
Bai Yifan	7e3963f295	add license, test=develop (#18709 )	6 years ago
cjt222	ccf06a48b0	test=develop (#18701 ) add license	6 years ago
wangguanzhong	185b3acea1	fix clip_by_norm doc (#18688 ) * fix clip_by_norm doc, test=develop	6 years ago
Huihuang Zheng	89bc3fd841	Support memory eager deletion on recurrent OP (#17710 ) Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)	6 years ago
Adam	d6b6a337a9	Add LeakyRelu MKLDNN support (#18656 ) test=develop	6 years ago
hutuxian	bb2f5d24a2	hash_op support int64 hash_size (#18674 ) * hash_op support int64 hash_size * add corresponding UT	6 years ago
guru4elephant	5ed713d519	remove ctr reader, all functions are satisfied in dataset (#18672 ) * remove ctr reader, all functions are satisfied in dataset	6 years ago
Yang Zhang	ce1ec33299	Add cuda implementation for `prelu` backward pass (#18633 ) * Add GPU implementation for `prelu` backward pass test=develop * Fix logic error in `prelu` GPU backward and simplify a bit test=develop * Fix `prelu` backward CUDA implementation test=develop CPU version was not used actually, so test passed	6 years ago
Yihua Xu	97549a4f13	[CPU] Fix the compiling issue with AVX512F macro. (#18634 )	6 years ago
baojun	256ba7cbb8	[NGraph] handle dim element 0 of ngraph op (#18568 )	6 years ago
Jacek Czaja	71d883b8ef	[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585 ) * - Added partial draft of pooling acquire - Workspace support - compilation fix - Added draft of pooling backward reimplementation - Segfault fix - reverted 'any' for diff_dst crewation in pooling - Lint fixes test=develop - lint fixes test=develop - Further lint fixes test=develop * - Fixes after review test=develop * - Lint fixes test=develop * - Even more lint fixes test=develop	6 years ago
chengduo	f4ec7d54c8	fix bug of scatter op (#18640 ) test=develop	6 years ago
guru4elephant	ab57d3893e	make auc op compatible with 1 dim (#18551 ) * make auc op compatible with 1 dim	6 years ago
Hongyu Liu	a20b2b43fc	fix cudnn lstm shape bug; test=develop (#18492 )	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
Zeng Jinle	be24e5b391	Clean unused code of dim and place (#18565 ) * clean code of dim and place, test=develop * fix failed unittests, test=develop	6 years ago
Jacek Czaja	8869d7f735	Activations MKLDNN ops refactoring (#18191 )	6 years ago
Yibing Liu	b86234fc0b	Register fp16 for concat_op (#18563 )	6 years ago
Physher	5e1220ef37	fix compile error which caused by gcc4.8 related commit;test=develop (#18567 )	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Physher	0caa08ea40	Add mkldnn int8 mul-op kernel (#17834 )	6 years ago
LielinJiang	24d1c44a0c	Fix roi_perspective_transform_op bug (#18522 ) * fix transform matrix bug, test=develop * modify API.spec	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
zhaoyuchen2018	832d8191ff	Fix topk cannot handle 1D vector bug (#18466 ) * Fix topk cannot handle 1D vector bug Add path to handle 1D vector test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
qingqing01	7ac4818a98	Refine Infershape in activation_op for double_grad. (#18485 ) * Refine Infershape in activation_op for double_grad.	6 years ago
chengduo	7453857324	Make fuse_all_reduce_op_pass support mix_precision (#17652 )	6 years ago
zhoukunsheng	7c6f2350b9	support Tensor input for edit_distance op (#18162 )	6 years ago
zhoukunsheng	26318544d2	support Tensor input for chunk_eval op (#18226 ) * test=develop support Tensor input for chunk_eval op * test=develop fix testcase for chunk_eval op * test=develop fix typos in nn.py	6 years ago
zhoukunsheng	206c44e2a8	add unique kernel and op (#17557 )	6 years ago
zhoukunsheng	71af72b1c2	upgrade hash op to support Tensor and LoDTensor input (#17998 )	6 years ago
zhoukunsheng	d3b3443d10	add ones_like op (#17388 )	6 years ago
zhoukunsheng	67b48d7fe7	add size op (#17412 )	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
chengduo	e0d8c6ac68	Add find_no_grad_vars in backward.py (#17942 ) * add not_been_used_vars to no_grad_set test=develop	6 years ago
LielinJiang	449c7a9f98	Make roi_perspective_transform op return mask and transform matrix (#18371 ) * modify roi_perspective_transform_op to output mask and transform matrix * modify comment * modify comment * modify API.spec * update API.spec * remove no use header, test=develop * resolve conflict	6 years ago
Brian Liu	4bc2987d2f	Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964 ) * Fix bug in quantize kernel which cause crash in vgg16/19 model test=develop * refine the code to reduce verbose code; test=develop * remove useless code; test=develop	6 years ago
Leo Zhao	681d3553f1	Fix potential mkldnn concat/pool/conv kernel issues (#18393 ) 1. some key generation method is not aligned with PR#17965 2. enlarge ptr lifetime to avoid memory release if SetBlob fails otherwise it will get core dump. test=develop	6 years ago
Zeng Jinle	f5641000bb	Add a unittest to inplace elementwise_add (#18385 ) * add_elementwise_add_inplace_test,test=develop * rename file, test=develop	6 years ago
tangwei12	999d9a59a5	fix communicator with pyreader (#18350 ) * add is_runnning in communicator, test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
Jacek Czaja	c2efdfd5bc	[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146 ) * - Reusing of reuder used in elementwise_add_mkldnn - Added MKL-DNN sum prim reusing test=develop - Compilation fixes test=develop - Yet another compilation fix test=develop - Yet another compilation fix test=develo - Yet another linking fix test=develop - Final compilation fix test=develop - lint fixes test=develop - Lint fixes test=develop * - Fixes after review test=develop	6 years ago
qingqing01	9047ac687e	Simplify multi_box_head API in detection.py and remove assign op. (#18310 ) * Simplify multi_box_head API in detection.py and remove assign op.	6 years ago
Yibing Liu	23941e43ec	Update lamb optimizer (#18333 ) * Update lamb optimizer test=develop, test=document_preview * Regenerate api spec test=develop, test=document_preview	6 years ago
tensor-tang	81ec538279	fix softrelu doc (#18324 ) * fix softrelu doc test=develop * update API doc test=develop	6 years ago
Hongyu Liu	df2eee71d8	Sequence mask support tensor (#18249 ) * sequnce mask support max length tensor input; test=develop * add rnn_impl.py; test=develop * add basic gru lstm unittest; test=develop * fix api spec; test=develop * fix sequence_mask op bug; test=develop test=document_preview * change +-x to elmentwise_op; test=develop add mkl flag; test=develop * fix rnn impl bug; test=develop * update api spec; test=develop * fix doc bug; test=develop * fix lstm bugs; test=develop	6 years ago
Qiao Longfei	0e08e91c18	optimize communicator merge sparse gradient test=develop (#18159 ) * optimize communicator merge sparse gradient test=develop * revert multithread selected rows merge add test=develop * follow comment test=develop	6 years ago
Yibing Liu	f57ee3693b	Fix the bug of sequence_unpad op (#18290 ) * Use TensorCopySync for sequence_unpad op test=develop * Fix the tensor memory alloc bug test=develop	6 years ago
chengduo	5489216eba	Clean build strategy (#18148 ) * clean build_strategy test=develop * DataBalanceOpHandle has been removed test=develop * debug * update build_strategy. test=develop	6 years ago
songhao	6b3d96254d	fix some bug when merge sparse embedding parameters, test=develop (#18223 ) 1. fix the bug that out_put_var in SaveSelectedRows would be empty string 2. use merge_sparse_lookup_table to replace sum op for load_persistables_for_inference 3. fix the bug in _clone_var_in_block_ when the var is SELECTED_ROWS.	6 years ago
xiaoting	b58bb80248	set src_idx > 0 for bilinear_interp_op (#18238 ) * set src_idx > 0, test=develop * add unittest and cu, test=develop	6 years ago
Hongyu Liu	cefd0fb598	Fix slice op shape=-1 bug (#18107 ) * fix slice op bug; test=develop * fix variabel test bug; test=develop * remove slice while true; test=develop	6 years ago
翟飞跃	802ea50956	fix spelling errors (#17941 ) * fix spelling errors; test=develop * Update API.spec update md5 * Update API.spec * change the order of api;test=develop	6 years ago
FlyingQianMM	944c3165ec	fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h (#18152 ) * test=develop fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h * test=develop fix wrong code stype in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h	6 years ago
Zeng Jinle	6eec66a1b1	Fix py_reader iterable bug (#18108 ) * fix py_reader iterable bug, test=develop * move data from buffered_reader,test=develop	6 years ago
qingqing01	80d2e66f9e	Update backward appending stragety to support double backward and fix some bug. (#18104 ) * Update backward.py: - If there is no input grad var in all outputs of previous ops, do not append this op into graph. - Only apply this stragety when double backward. * Update some double backward op. * Update sum_op to judge whether a tensor is empty by numel or IsInitialized().	6 years ago
FlyingQianMM	ff83655f7e	add detection output operator for supporting retinanet (#17896 ) * test=develop add detection output for supporting retinanet * test=develop add test_layers.py * test=develop add API.spec * test=develop alter test_retinanet_detection_output.py * test=develop alter round 2 * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=devlop alter detection.py * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=develop alter detection.py * test=develop alter API.spec * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=develop alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py * test=develop alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py * test=develop fix grammer error * test=develop fix grammer error * test=develop fix grammer error * test=develop alter python/paddle/fluid/tests/unittests/test_layers.py * test=develop alter paddle/fluid/API.spec	6 years ago
FlyingQianMM	0aee1f0074	add sigmoid focal loss operator for supporting retinanet (#17895 ) * test=develop add sigmoid_focal_loss for supporting retinanet * test=develop add test_layers * test=develop add API.spc * test=develop alter sigmoid_focal_loss_op.cc * test=develop alter detection.py * test=develop alter API.spec * test=develop alter round 1 * test=develop alter simooid_focal_loss * test=develop alter sigmoid_focal_loss_op.cc * test=develop alter test_layers.py * test=develop alter paddle/fluid/API.spec * test=develop alter sigmoid_focal_loss_op.cu * test=develop alter paddle/fluid/operators/detection/sigmoid_focal_loss_op.cc	6 years ago
FDInSky	9e4b9d9798	Update generate_proposal_labels_op to support CascadeRCNN. (#17200 ) * Update generate_proposal_labels_op to support CascadeRCNN.	6 years ago
FlyingQianMM	9ed2f936f1	add target assign operator for supporting retinanet (#17893 ) * test=develop add target assign for retinanet * test=develop run ci * test=developp add test_layers * test=develop add APi.spec * test=develop alter round 1 * test=develop alter rpn_target_assign_op.cc * test=develop alter test_rpn_target_assign_op.py * test=develop alter rpn_target_assign_op.cc * test=develop alter API.spec * test=develop alter paddle/fluid/operators/detection/rpn_target_assign_op.cc * test=develop alter rpn_target_assign_op.cc * test=develop alter python/paddle/fluid/layers/detection.py * test=develop alter paddle/fluid/API.spec	6 years ago
chengduo	24e988a471	Fix bug of scope_buffered_ssa_graph_executor (#18100 ) * fix code bug test=develop	6 years ago
whs	354643d8d9	Add warning for cudnn warpctc kernel in CUDA9\CUDA10. (#18046 ) test=develop	6 years ago
Yiqun Liu	660c1a65f3	Optimize fused_elewise_activation_grad op. (#18041 ) test=develop	6 years ago
lidanqing	f8ecc3de89	refactor the function ConvFwdPrimitiveDesc (#17897 ) * refractor the function ConvFwdPrimitiveDesc test=develop * change according to review test=develop * use pointer way without boost::optional test=develop * pass vector to function by reference instead of raw vector test=develop * change pointer to shared_ptr test=develop	6 years ago
Wojciech Uss	78e932862c	Added unit test for QAT FP32 & INT8 comparison (#17814 ) * added unit test for QAT FP32 & INT8 comparison test=develop * enabled other models and updated filenames test=develop * added accuracy check and multiple batch handling test=develop * removed quantization_mkldnn_pass.py test=develop * cleanup test=develop * updated model paths test=develop * renamed tests without MKL-DNN test=develop * fix reusing mkldnn pool2d primitive test=develop * add performance measuring test=develop * fix accuracy statistics test=develop * removed non-mkldnn tests test=develop * added conv2d_depthwise->conv2d mkldnn transformation test=develop * format update test=develop * fixed creating key for pool2d grad test=develop * added pass * Fix the accuracy issue while using float precision to get the scale. test=develop * Fix the format issue when 'X' is not nchw. test=develop * removed output comparing and changed number of images test=develop * cmake and comment fix test=develop * updated acc threshold for QAT comparison tests test=develop * added OMP_NUM_THREADS setting test=develop * enable all QAT INT8 tests test=develop * restored upstream version of a file test=develop * modified directory names test=develop	6 years ago
tensor-tang	566bf2ec56	concat op support negative axis (#18045 ) test=develop	6 years ago
Yiqun Liu	7e463c84a6	Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979 ) test=develop	6 years ago
tangwei12	101f74cb19	fix save/load in fleet (#17675 ) * fix save/load in Fleet * add UT framework of Fleet	6 years ago
Guo Sheng	a06b316b94	Fix GetExpectedKernelType of add_position_encoding_op (#17935 ) * Fix the GetExpectedKernelType of add_position_encoding_op. test=develop * Fix the doc of lstm_unit outputs in nn.py. test=develop	6 years ago
wawltor	8eb134c3c1	Fix scatter and gather op when has duplicate index (#17952 ) * test=develop The scatter op has a calc bug when the indices has same index, the scatter op use overwrite mode to calculate the same index, fix this bug by using the accumulate mode to calculate the same index.At the same time, the gather op has the same bug when the op calc the grad. And we use the lib of open-blas and eigen to optimize the time cost in accumulate mode. * test=develop Fix some code format problem, and the same time add the test case in gather and scatter op	6 years ago
lujun	75fcd29220	update load_error_info, test=develop (#18000 ) Repair error prompt: Users are prompted to check whether the model or parameter files are damaged when loading parameters are wrong.	6 years ago
wawltor	2ae8decc90	test=develop (#17984 ) Fix bug in sequence_unpad op, when allocate the output memory do not match actual memory, check memory failed. Fix this bug by allocating the output memeory in correct code position.	6 years ago
cjt222	871af28d6c	add deformable psroi pooling (#17827 ) * add deformable psroi pooling * test=develop * test=develop * test=develop modify format * fix bug * test=develop run ci * test=develop add API.spec * add test_layers.py * run ci again * test=develop run ci again * run ci again * test=develop run ci again * test=develop run ci again * test=develop run ci again * add space between two lines * test=develop add space between two lines * test=develop add space between lines * test=develop modify comment in nn.py * test=develop add space between two lines * test=develop add space between two lines * update API.spec * run ci again * test=develop run ci again * rerun ci * test=develop rerun ci * change input shape * run ci * test=develop run ci * modify format of nn.py * test=develop * test=develop * test=develop update API.spec * test=develop fix API doc * modify API comment * modift API comment * test=develop update API.spec * test=develop modify comment * test=develop modift comment * test=develop modift comment * test=develop update API.spec * test=develop modify comment * test=develop add inference in nn.py * test=develop update API.spec * test=develop resolve confict * test=develop update API.spec	6 years ago
SunGaofeng	40885c225b	add unfold op (new op),test=develop (#17944 ) * add unfold op test=develop * fix divide bug in python3 when calculating output width and height test=develop * add name=None in python api, move redundant code into inline function * try to trigger ci for this code test=develop	6 years ago
Jacek Czaja	84bb45c054	[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965 ) * - removed is_reusing_ * - Added TID to keys for reusing apart from softmax PD * - compilation fix * - Yet another compilation fix * - Batch Norm and Conv adapted * - Fix to softmax MT * - Fixes to MT code of MKL-DNN * - Lint fixes test=develop	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago

4783 Commits (b085ecc25896c0a4aea70bcfff316683a76ec5e4)