Paddle

Commit Graph

Author	SHA1	Message	Date
Yi Liu	efb05ba258	supports multiple NCCL communicators preserved in NCCLCommContext (#19407 ) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop	6 years ago
Huihuang Zheng	56dd76538c	Delete useless ex-scope in recurrent op (#19426 )	6 years ago
vincentXiyu	482ce818bb	Support Tensor input with padding for warpctc op (#19322 ) * support tensor input with padding for warpctc op * merge with develop * test=develop * modified python API examples test=develop * nn.py is modified for code coverage test=develop * update documents info about warpctc op in API.spec test=develop * add test_warpctc_with_padding in test_layers test=develop * add warning log for cuda_version back to warpctc_op.cc * modify API.spec for warpctc op test=develop * modify API.spec * update warpctc test to new CompiledProgram API test=develop * modify code examples for warpctc op test=develop * modify API.spec for warpctc op test=develop * modify API.spec for warpctc op test=develop	6 years ago
Huihuang Zheng	12d29f4d2a	Change TensorCopy in recurrent_op to ShareDataWith (#19319 )	6 years ago
tangwei12	19dac67e9f	fix distribute transpiler GRPC error code 4, RPC Deadline (#18984 ) * fix sync mode hang in transpiler * remove sync mode in send/recv * replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE	6 years ago
翟飞跃	2e3ee57954	Use sparse matrix to implement FusedEmbeddingSeqPoolGradKernel (#19153 ) * Implement the operator with sprase matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implematation when using no-linux. test=develop * optimize bp with mkl sparse matrix test=develop	6 years ago
Leo Chen	a9d5fc5142	Enhance OpTest to check the consistency of operators when using and not using inplace (#19101 ) * add pybind interface to get all inplace ops, test=develop * enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop * handle corner cases in op_test, test=develop * support outputs without tensor holder_, like XShape in reshape_op, test=develop * fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop * use reshape_grad instead of reshape in FlattenGradOp, test=develop * fix error debug dims info for variables like XShape, test=develop * change computational order in sum_op to relieve computation difference using inplace, test=develop * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop * follow sneaxiy's comments, test=develop * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop	6 years ago
Aurelius84	0d29cf18f4	Supports diagonal initialization in uniform_random op (#19299 ) * add diag init in Uniform_random op test=develop * modify api.spec test=develop * fix unform_batch_size_like maker test=develop * add diag_num and diag_step assert check test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
wangguanzhong	37428952c6	fix generate mask fpn, test=develop (#19301 )	6 years ago
zhaoyuchen2018	5296294dae	Fix elementwise performance poor issue (#19278 ) For small case use 1D block is better than 2D block. Refer to this issue: #19275	6 years ago
Yihua Xu	b920395842	Use sparse matrix to implement fused emb_seq_pool operator (#19064 ) * Implement the operator with sprase matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implematation when using no-linux. test=develop * Ignore the deprecated status for windows test=develop	6 years ago
wangchaochaohu	6e326ca2c6	optimize the realization of cuda dropout (#19136 ) * cuda optimie for dropout * remove tmp swp file * fix compile error test=develop * test=develop optimize the cuda realization of dropout op * remove unsed code test=develop * remove tmp file test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop	6 years ago
qingqing01	5fc8de449a	Remove warning in batch_norm_op (#19260 )	6 years ago
Aurelius84	78a3d837f8	Add match_matrix_tensor op (#18525 ) * add matrch_matrix_tensor op test=develop * fix ignore unittest if with_mkl=off test=develop * clean code and rm is_test param test=develop * modify API.spec test=develop * rm useless code in search_compute.h test=develop * modify api.spec test=develop * modify default_grad.spec test=develop * Add API test code test=develop * clean code in search_computer.h * modify PADDLE_ENFORCE and clean search_compute.h test=develop * fix code style test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
zhang wenhui	539c870753	add fl_listen_and_serv &fl_transpiler,test=develop (#19091 ) add fl_listen_and_serv op for Federated_learning and fl_distribute_transpiler add this op to pserver program . This op just listen the endpoint and sum&scale.	6 years ago
silingtong123	af0fbd9012	change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205 ) * print error code if cuda related API fails	6 years ago
gongweibao	fd4b15a2f6	Unset unittests http_proxy env to avoid timeout. (#19269 ) Unset unittests http_proxy env to avoid timeout.	6 years ago
Kaipeng Deng	2848cb791e	fix temporal_shift OP PADDLE_ENFORCE. test=develop (#19161 ) * fix temporal_shift OP PADDLE_ENFORCE. test=develop * fix HasInput/HasOutpu ENFORECE. test=develop	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
Yibing Liu	50b1cab122	Add padding support for crf_decoding (#19057 ) * Add padding support for crf_decoding * Fixes in comupte kernel test=develop * Update API Spec test=develop * Update API.spec test=develop * Avoid using paddle_enforce test=develop * Fix enforce test=develop	6 years ago
chengduo	b5ba801ef0	Fix gather op bug (#19168 ) * fix gather op bug test=develop	6 years ago
Leo Chen	80eab822c1	Remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR() (#19166 ) * remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR(), test=develop * remove SplitIdsOpGradMaker since it is buggy and not tested, update spec file, test=develop	6 years ago
chengduo	c70a97f46e	Use CUDAPinnedPlace in buffered_reader (#19112 ) Use CUDAPinnedPlace in buffered_reader	6 years ago
Jiawei Wang	6ac32d0981	Instag Implemention (#18394 ) * instag lod tensor impl * First PR for instag * First PR for instag * Before adding Selection Rows. * Change name from instag to filter_instag, add upgrade the impl of filter_instag * Change name from instag to filter_instag, add upgrade the impl of filter_instag * Fix yapf error in gradient_checker.py to pass Travis-CI * Fix Filter Instag Grad test=develop * Fix Filter Instag Grad test=develop * 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop * Impl Loss_weight and empty output handler * change Loss Weight datatype to Float32, and add Loss Weight as 2nd output * 1) Support Tensor Input(without LOD) 2) Add Unit test * Filter By Instag Final test=develop * Update API.spec for filter_by_instag test=develop * Update API.spec for filter_by_instag 2 test=develop * Add Filter By Instag Coverage * code format of test_layers.py * code format test_layers.py test=develop * Make API args more readable test=develop * Make API args more readable and pass code format test=develop * Filter By Instag Op, Rename Map to Index Map test=develop * Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop * Filter by instag op: code format of cpp files test=develop * Filter by instag Op: Api spec modification test=develop * Filter by instag Op: Api spec doc id modification test=develop * Filter by instag Op: Api spec and doc preview test=develop test=document_preview * Filter By Instag Op, fix doc erro test=document_preview test=develop * Filter By Instag Op, fix doc err and Api spec test=document_preview test=develop * Filter By Instag Op, fix Api spec test=document_preview test=develop * Filter By Instag Op, fix Paddle Encoforce deprecated warning test=document_preview test=develop * Filter By Instag Op, fix Paddle Encoforce deprecated and code format warning test=document_preview test=develop	6 years ago
huangjun12	20f18930ae	Add hard swish op (new op) (#19001 ) * add hard_swish activation op (new op) test=develop * remove redundancy files * modify document content of HardSwish OP * add API test in test_layers.py * add dynamic_graph for test_hard_swish	6 years ago
joanna.wozna.intel	bce72c7fea	Replace Relu with bounded Relu in MobileNetV2 quantization (#18988 ) test=develop	6 years ago
wangguanzhong	1fc242a7ed	refine infer shape in box decoder and assign op, test=develop (#19118 )	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
Kevin	945f3cf631	fix code too big test=develop (#19111 ) Fix seq_pool failed when input dims is too large. Resolve issue #3023	6 years ago
Zeng Jinle	88f111f885	remove unused inplace act codes, test=develop (#19079 )	6 years ago
ShenLiang	4397cb318e	add eye op, kernel and unitest test=develop (#18980 ) * add eye op,test=document_preview test=develop * fix the API.spec, test=develop * fix the document, test=document_preview test=develop * add unitest for CI coverage, test=develop	6 years ago
Kaipeng Deng	f86fead693	Add trilinear_interp OP (#18711 ) * add trilinear interp. test=develop * fix unittest. test=develop * add python api and test_layers. test=develop * refine API.spec. test=develop * fix format. test=develop * add python API test. test=develop * format code. test=develop * refine code strcuture. test=develop * fix format * fix doc. test=develop * fix converage. test=develop * fix format. test=develop	6 years ago
Zhang Ting	c2063217e7	optimize error message for "embedding" and "cross_entropy" OP (#18765 ) * optimize error message, test=develop * optimize error message, test=develop	6 years ago
Yiqun Liu	a445c33552	Add the check of lod in sequence_softmax kernel. (#18996 ) * Add the check of lod in sequence_softmax kernel. test=develop * Refine the comments. test=develop	6 years ago
Kevin	e681d65515	Add var_conv_2d op (#18518 ) * fix overflow by int32 mul test=develop * fix reference nullptr * fix codestyle test=develop * modify to point in ContextProjectFunctor test=develop * modify to point in ContextProjectFunctor test=develop * modify . to -> test=develop * add var_conv_2d op test=develop * edit api.spec test=develop * ignore unittest if with_mkl=off test=develop * fix python3 division test=develop * fix ignore unittest bug test=develop * remove useless code test=develop * modify api.spec test=develop * modify default_grad.spec test=develop	6 years ago
pawelpiotrowicz	e53f517a44	fix for multithreading test_analyzer_image_classification --num_threads=X (#18265 ) test=develop	6 years ago
Liufang Sang	faf6890b6c	support tensor input for ctc align op (#18887 ) * test=develop support Tensor input for ctc_align_op * test=develop add some comment	6 years ago
hutuxian	b62c4f9b04	fix concat check info typo (#18975 )	6 years ago
Zeng Jinle	7ac748adb4	Open gc by default (#18836 ) * open gc by default, test=develop * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop * fix conditional_block op eager deletion bug, test=develop * add some comments to reviewers, test=develop	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
wawltor	3ab1866ca5	Add the op of unique_with_counts, expand count function of the op unique (#18720 ) * test=develop Add the op of unique_with_counts, the op is calc the unqiue input of data, and output the corresponding indices and count of data. * test=develop Check the input and dtype in the op of unique_with_counts * test=develop test=document_preview update the API.spec for `unique_with_counts`, at the same time, optimize the python api in the op of `unique_with_count` * test=develop test=document_preview Fix some python api problem in the op of `unique_with_counts`, and change the error messsage in this op. * Fix some API problem in the op of `unique_with_counts` test=develop test=document_preview * test=develop test=document_preview Fix the api sample of op `unique_with_counts`, and update api.spec	6 years ago
Jacek Czaja	5cf2d38594	- Removed passing X from FWD to GRAD via device context (#18911 ) test=develop - Extracted key generation from FWD and GRAD into separate function test=develop - Compilation fix test=develop - another compilation test=develop	6 years ago
LielinJiang	22fa4c2d24	Fix depthwise conv gpu kernel bug (#18582 ) * fix depthwise conv gpu kernel bug, test=develop * add more depthwise conv test, test=develop	6 years ago
liuwei1031	0d99690809	fix several security bugs reported by security team (#18831 ) * fix security issue, test=develop * bug fix, test=develop * throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop	6 years ago
Zhaolong Xing	61238d31f7	Trt fp16 support (#18860 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop	6 years ago
chengduo	20859c08e8	[DyGraph] Make multi-card program faster (#18892 ) * update parallel.py test=develop	6 years ago
HaoRen	24f8543106	Add center Loss Op Support (#18681 ) * support center loss * change tensor copy api to high level api tensorcopy * test=develop rewrite the center_loss cuda_kernel to make it faster and add document of the center loss api,also update test function * test=document_preview test=develop update document of center loss * test=document_preview test=develop modify API.spec modify test code remove nouse const_cast	6 years ago
Leo Zhao	86e494eb64	use mkl to accelerate gelu_grad (#18099 ) test=develop	6 years ago
wopeizl	dfd6a62a9a	Optimize the error report information when loadcombine fail to open model files test=develop (#18888 )	6 years ago
baojun	adcfc53b18	upgrade ngraph version and simplify ngraph engine (#18853 ) * upgrade ngraph to v0.24 test=develop * simplify io test=develop	6 years ago
Jacek Czaja	cfcb96d2df	[MKL-DNN] Fix int8 performance regression (#18758 ) test=develop - optimization of TID to string test=develop	6 years ago
danleifeng	e0a2d4dfec	Add elementwise_pow_op backward implementation and the unit test codes of it. (#18848 )	6 years ago
Zeng Jinle	9a8a7a1ddc	fix affine_channel no_need buffer bug, test=develop (#18844 )	6 years ago
Adam	ee02227949	Add LeakyReLU MKLDNN support (#18762 )	6 years ago
lidanqing	b05bdda0cf	remove unused TransposeINT8Op for higher UT coverage (#18791 ) test=develop	6 years ago
Physher	c5f47c2107	fix mul_mkldnn_op build failure (#18816 )	6 years ago
Physher	a5c986301c	clarify MKLDNN INT8 Mul Op attributes (#18685 )	6 years ago
FDInSky	cff5e2c173	fix roi_align_op cpu backward's bug (#18789 ) * test=develop fix cpu roi_align_op backward bug	6 years ago
Bai Yifan	d3ac561d65	fix deformable_conv_op compile error, test=develop (#18793 )	6 years ago
lidanqing	9ecd8ee789	change ComputeINT8 to template version to remove checking dst_datatype code (#18756 ) * change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews * reverse user_residual_memory_p and user_bias_memory_p declaration scope test=develop	6 years ago
JesseyXujin	d9e7b5b5e9	fix bug of swish op formula,test=develop (#18772 )	6 years ago
Bob Zhu	220eef602e	Extend Matmul to support matrix multiplication with multiple heads (#18570 ) * extend matmul op to support multiple head multiplication With the support of multiple head, the multiplication of two big matrixes is split into multiplication of several (head_number) small matrixes. e.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of [6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].	6 years ago
whs	075e1cf78e	Add python API for appending LoD level (#18702 ) * Make lod reset op support for append lod level. * Fix API.spec test=develop * Fix unitest. test=develop * Add python api for lod append. test=develop * Fix API.spec test=develop * Fix format of doc. test=develop * Fix unitest. test=develop * Fix doc. test=develop	6 years ago
Jacek Czaja	95c1816ec0	[MKL-DNN] Extended LRN with reusing via Acquire API (#18675 ) test=develop - compileation fix - Yet another compilation fix - Even yet another compilation fix - Surprise! Again compilation fix - lint fixes test=develop - Fix to workspace acquire of LRN test=develop - Fix to hash of BWD LRN test=develop - fix to lrn BWD PD acquire test=develop - Fixing LRN PD creation test=develop - cosmetic fix in comment test=develop - Fixes after review test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
wangchaochaohu	6b78e00da4	Cudnn convolution reconstruction (#18284 ) * rewrite the conv_op using cudnn_conv_helper * add workspace limit for v7 test=develop * fix test=develop * add half float test=develop * fix test=develop * fix test=develop * revise code style test=develop * fix test=develop	6 years ago
Yi Liu	157211c4e1	supports distributed classification (#18690 ) * supports distributed classification training * update API.spec * fix evenly division in python3 * change "index_range" to "index_num" in shard_index operator test=document_preview test=develop	6 years ago
qingqing01	3429e65aa8	Fix CPU implementation of roi_align_op backward (#18728 )	6 years ago
Tao Luo	bd22453f20	Revert "Add LeakyRelu MKLDNN support (#18656 )" (#18723 ) test=develop	6 years ago
whs	189b08dc0d	Make infer shape of pad2d support for input with negative dims in compile time. (#18695 ) test=develop	6 years ago
Bai Yifan	7e3963f295	add license, test=develop (#18709 )	6 years ago
cjt222	ccf06a48b0	test=develop (#18701 ) add license	6 years ago
wangguanzhong	185b3acea1	fix clip_by_norm doc (#18688 ) * fix clip_by_norm doc, test=develop	6 years ago
Huihuang Zheng	89bc3fd841	Support memory eager deletion on recurrent OP (#17710 ) Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)	6 years ago
Adam	d6b6a337a9	Add LeakyRelu MKLDNN support (#18656 ) test=develop	6 years ago
hutuxian	bb2f5d24a2	hash_op support int64 hash_size (#18674 ) * hash_op support int64 hash_size * add corresponding UT	6 years ago
guru4elephant	5ed713d519	remove ctr reader, all functions are satisfied in dataset (#18672 ) * remove ctr reader, all functions are satisfied in dataset	6 years ago
Yang Zhang	ce1ec33299	Add cuda implementation for `prelu` backward pass (#18633 ) * Add GPU implementation for `prelu` backward pass test=develop * Fix logic error in `prelu` GPU backward and simplify a bit test=develop * Fix `prelu` backward CUDA implementation test=develop CPU version was not used actually, so test passed	6 years ago
Yihua Xu	97549a4f13	[CPU] Fix the compiling issue with AVX512F macro. (#18634 )	6 years ago
baojun	256ba7cbb8	[NGraph] handle dim element 0 of ngraph op (#18568 )	6 years ago
Jacek Czaja	71d883b8ef	[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585 ) * - Added partial draft of pooling acquire - Workspace support - compilation fix - Added draft of pooling backward reimplementation - Segfault fix - reverted 'any' for diff_dst crewation in pooling - Lint fixes test=develop - lint fixes test=develop - Further lint fixes test=develop * - Fixes after review test=develop * - Lint fixes test=develop * - Even more lint fixes test=develop	6 years ago
chengduo	f4ec7d54c8	fix bug of scatter op (#18640 ) test=develop	6 years ago
guru4elephant	ab57d3893e	make auc op compatible with 1 dim (#18551 ) * make auc op compatible with 1 dim	6 years ago
Hongyu Liu	a20b2b43fc	fix cudnn lstm shape bug; test=develop (#18492 )	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
Zeng Jinle	be24e5b391	Clean unused code of dim and place (#18565 ) * clean code of dim and place, test=develop * fix failed unittests, test=develop	6 years ago
Jacek Czaja	8869d7f735	Activations MKLDNN ops refactoring (#18191 )	6 years ago
Yibing Liu	b86234fc0b	Register fp16 for concat_op (#18563 )	6 years ago
Physher	5e1220ef37	fix compile error which caused by gcc4.8 related commit;test=develop (#18567 )	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Physher	0caa08ea40	Add mkldnn int8 mul-op kernel (#17834 )	6 years ago
LielinJiang	24d1c44a0c	Fix roi_perspective_transform_op bug (#18522 ) * fix transform matrix bug, test=develop * modify API.spec	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
zhaoyuchen2018	832d8191ff	Fix topk cannot handle 1D vector bug (#18466 ) * Fix topk cannot handle 1D vector bug Add path to handle 1D vector test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
qingqing01	7ac4818a98	Refine Infershape in activation_op for double_grad. (#18485 ) * Refine Infershape in activation_op for double_grad.	6 years ago
chengduo	7453857324	Make fuse_all_reduce_op_pass support mix_precision (#17652 )	6 years ago
zhoukunsheng	7c6f2350b9	support Tensor input for edit_distance op (#18162 )	6 years ago
zhoukunsheng	26318544d2	support Tensor input for chunk_eval op (#18226 ) * test=develop support Tensor input for chunk_eval op * test=develop fix testcase for chunk_eval op * test=develop fix typos in nn.py	6 years ago
zhoukunsheng	206c44e2a8	add unique kernel and op (#17557 )	6 years ago
zhoukunsheng	71af72b1c2	upgrade hash op to support Tensor and LoDTensor input (#17998 )	6 years ago
zhoukunsheng	d3b3443d10	add ones_like op (#17388 )	6 years ago
zhoukunsheng	67b48d7fe7	add size op (#17412 )	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
chengduo	e0d8c6ac68	Add find_no_grad_vars in backward.py (#17942 ) * add not_been_used_vars to no_grad_set test=develop	6 years ago
LielinJiang	449c7a9f98	Make roi_perspective_transform op return mask and transform matrix (#18371 ) * modify roi_perspective_transform_op to output mask and transform matrix * modify comment * modify comment * modify API.spec * update API.spec * remove no use header, test=develop * resolve conflict	6 years ago
Brian Liu	4bc2987d2f	Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964 ) * Fix bug in quantize kernel which cause crash in vgg16/19 model test=develop * refine the code to reduce verbose code; test=develop * remove useless code; test=develop	6 years ago
Leo Zhao	681d3553f1	Fix potential mkldnn concat/pool/conv kernel issues (#18393 ) 1. some key generation method is not aligned with PR#17965 2. enlarge ptr lifetime to avoid memory release if SetBlob fails otherwise it will get core dump. test=develop	6 years ago
Zeng Jinle	f5641000bb	Add a unittest to inplace elementwise_add (#18385 ) * add_elementwise_add_inplace_test,test=develop * rename file, test=develop	6 years ago
tangwei12	999d9a59a5	fix communicator with pyreader (#18350 ) * add is_runnning in communicator, test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
Jacek Czaja	c2efdfd5bc	[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146 ) * - Reusing of reuder used in elementwise_add_mkldnn - Added MKL-DNN sum prim reusing test=develop - Compilation fixes test=develop - Yet another compilation fix test=develop - Yet another compilation fix test=develo - Yet another linking fix test=develop - Final compilation fix test=develop - lint fixes test=develop - Lint fixes test=develop * - Fixes after review test=develop	6 years ago
qingqing01	9047ac687e	Simplify multi_box_head API in detection.py and remove assign op. (#18310 ) * Simplify multi_box_head API in detection.py and remove assign op.	6 years ago
Yibing Liu	23941e43ec	Update lamb optimizer (#18333 ) * Update lamb optimizer test=develop, test=document_preview * Regenerate api spec test=develop, test=document_preview	6 years ago
tensor-tang	81ec538279	fix softrelu doc (#18324 ) * fix softrelu doc test=develop * update API doc test=develop	6 years ago
Hongyu Liu	df2eee71d8	Sequence mask support tensor (#18249 ) * sequnce mask support max length tensor input; test=develop * add rnn_impl.py; test=develop * add basic gru lstm unittest; test=develop * fix api spec; test=develop * fix sequence_mask op bug; test=develop test=document_preview * change +-x to elmentwise_op; test=develop add mkl flag; test=develop * fix rnn impl bug; test=develop * update api spec; test=develop * fix doc bug; test=develop * fix lstm bugs; test=develop	6 years ago
Qiao Longfei	0e08e91c18	optimize communicator merge sparse gradient test=develop (#18159 ) * optimize communicator merge sparse gradient test=develop * revert multithread selected rows merge add test=develop * follow comment test=develop	6 years ago
Yibing Liu	f57ee3693b	Fix the bug of sequence_unpad op (#18290 ) * Use TensorCopySync for sequence_unpad op test=develop * Fix the tensor memory alloc bug test=develop	6 years ago
chengduo	5489216eba	Clean build strategy (#18148 ) * clean build_strategy test=develop * DataBalanceOpHandle has been removed test=develop * debug * update build_strategy. test=develop	6 years ago
songhao	6b3d96254d	fix some bug when merge sparse embedding parameters, test=develop (#18223 ) 1. fix the bug that out_put_var in SaveSelectedRows would be empty string 2. use merge_sparse_lookup_table to replace sum op for load_persistables_for_inference 3. fix the bug in _clone_var_in_block_ when the var is SELECTED_ROWS.	6 years ago
xiaoting	b58bb80248	set src_idx > 0 for bilinear_interp_op (#18238 ) * set src_idx > 0, test=develop * add unittest and cu, test=develop	6 years ago
Hongyu Liu	cefd0fb598	Fix slice op shape=-1 bug (#18107 ) * fix slice op bug; test=develop * fix variabel test bug; test=develop * remove slice while true; test=develop	6 years ago
翟飞跃	802ea50956	fix spelling errors (#17941 ) * fix spelling errors; test=develop * Update API.spec update md5 * Update API.spec * change the order of api;test=develop	6 years ago
FlyingQianMM	944c3165ec	fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h (#18152 ) * test=develop fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h * test=develop fix wrong code stype in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h	6 years ago
Zeng Jinle	6eec66a1b1	Fix py_reader iterable bug (#18108 ) * fix py_reader iterable bug, test=develop * move data from buffered_reader,test=develop	6 years ago
qingqing01	80d2e66f9e	Update backward appending stragety to support double backward and fix some bug. (#18104 ) * Update backward.py: - If there is no input grad var in all outputs of previous ops, do not append this op into graph. - Only apply this stragety when double backward. * Update some double backward op. * Update sum_op to judge whether a tensor is empty by numel or IsInitialized().	6 years ago
FlyingQianMM	ff83655f7e	add detection output operator for supporting retinanet (#17896 ) * test=develop add detection output for supporting retinanet * test=develop add test_layers.py * test=develop add API.spec * test=develop alter test_retinanet_detection_output.py * test=develop alter round 2 * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=devlop alter detection.py * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=develop alter detection.py * test=develop alter API.spec * test=develop alter retinanet_detection_output * test=develop alter paddle/fluid/API.spec * test=develop alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py * test=develop alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py * test=develop fix grammer error * test=develop fix grammer error * test=develop fix grammer error * test=develop alter python/paddle/fluid/tests/unittests/test_layers.py * test=develop alter paddle/fluid/API.spec	6 years ago
FlyingQianMM	0aee1f0074	add sigmoid focal loss operator for supporting retinanet (#17895 ) * test=develop add sigmoid_focal_loss for supporting retinanet * test=develop add test_layers * test=develop add API.spc * test=develop alter sigmoid_focal_loss_op.cc * test=develop alter detection.py * test=develop alter API.spec * test=develop alter round 1 * test=develop alter simooid_focal_loss * test=develop alter sigmoid_focal_loss_op.cc * test=develop alter test_layers.py * test=develop alter paddle/fluid/API.spec * test=develop alter sigmoid_focal_loss_op.cu * test=develop alter paddle/fluid/operators/detection/sigmoid_focal_loss_op.cc	6 years ago
FDInSky	9e4b9d9798	Update generate_proposal_labels_op to support CascadeRCNN. (#17200 ) * Update generate_proposal_labels_op to support CascadeRCNN.	6 years ago
FlyingQianMM	9ed2f936f1	add target assign operator for supporting retinanet (#17893 ) * test=develop add target assign for retinanet * test=develop run ci * test=developp add test_layers * test=develop add APi.spec * test=develop alter round 1 * test=develop alter rpn_target_assign_op.cc * test=develop alter test_rpn_target_assign_op.py * test=develop alter rpn_target_assign_op.cc * test=develop alter API.spec * test=develop alter paddle/fluid/operators/detection/rpn_target_assign_op.cc * test=develop alter rpn_target_assign_op.cc * test=develop alter python/paddle/fluid/layers/detection.py * test=develop alter paddle/fluid/API.spec	6 years ago
chengduo	24e988a471	Fix bug of scope_buffered_ssa_graph_executor (#18100 ) * fix code bug test=develop	6 years ago
whs	354643d8d9	Add warning for cudnn warpctc kernel in CUDA9\CUDA10. (#18046 ) test=develop	6 years ago
Yiqun Liu	660c1a65f3	Optimize fused_elewise_activation_grad op. (#18041 ) test=develop	6 years ago
lidanqing	f8ecc3de89	refactor the function ConvFwdPrimitiveDesc (#17897 ) * refractor the function ConvFwdPrimitiveDesc test=develop * change according to review test=develop * use pointer way without boost::optional test=develop * pass vector to function by reference instead of raw vector test=develop * change pointer to shared_ptr test=develop	6 years ago
Wojciech Uss	78e932862c	Added unit test for QAT FP32 & INT8 comparison (#17814 ) * added unit test for QAT FP32 & INT8 comparison test=develop * enabled other models and updated filenames test=develop * added accuracy check and multiple batch handling test=develop * removed quantization_mkldnn_pass.py test=develop * cleanup test=develop * updated model paths test=develop * renamed tests without MKL-DNN test=develop * fix reusing mkldnn pool2d primitive test=develop * add performance measuring test=develop * fix accuracy statistics test=develop * removed non-mkldnn tests test=develop * added conv2d_depthwise->conv2d mkldnn transformation test=develop * format update test=develop * fixed creating key for pool2d grad test=develop * added pass * Fix the accuracy issue while using float precision to get the scale. test=develop * Fix the format issue when 'X' is not nchw. test=develop * removed output comparing and changed number of images test=develop * cmake and comment fix test=develop * updated acc threshold for QAT comparison tests test=develop * added OMP_NUM_THREADS setting test=develop * enable all QAT INT8 tests test=develop * restored upstream version of a file test=develop * modified directory names test=develop	6 years ago
tensor-tang	566bf2ec56	concat op support negative axis (#18045 ) test=develop	6 years ago
Yiqun Liu	7e463c84a6	Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979 ) test=develop	6 years ago
tangwei12	101f74cb19	fix save/load in fleet (#17675 ) * fix save/load in Fleet * add UT framework of Fleet	6 years ago
Guo Sheng	a06b316b94	Fix GetExpectedKernelType of add_position_encoding_op (#17935 ) * Fix the GetExpectedKernelType of add_position_encoding_op. test=develop * Fix the doc of lstm_unit outputs in nn.py. test=develop	6 years ago
wawltor	8eb134c3c1	Fix scatter and gather op when has duplicate index (#17952 ) * test=develop The scatter op has a calc bug when the indices has same index, the scatter op use overwrite mode to calculate the same index, fix this bug by using the accumulate mode to calculate the same index.At the same time, the gather op has the same bug when the op calc the grad. And we use the lib of open-blas and eigen to optimize the time cost in accumulate mode. * test=develop Fix some code format problem, and the same time add the test case in gather and scatter op	6 years ago
lujun	75fcd29220	update load_error_info, test=develop (#18000 ) Repair error prompt: Users are prompted to check whether the model or parameter files are damaged when loading parameters are wrong.	6 years ago
wawltor	2ae8decc90	test=develop (#17984 ) Fix bug in sequence_unpad op, when allocate the output memory do not match actual memory, check memory failed. Fix this bug by allocating the output memeory in correct code position.	6 years ago
cjt222	871af28d6c	add deformable psroi pooling (#17827 ) * add deformable psroi pooling * test=develop * test=develop * test=develop modify format * fix bug * test=develop run ci * test=develop add API.spec * add test_layers.py * run ci again * test=develop run ci again * run ci again * test=develop run ci again * test=develop run ci again * test=develop run ci again * add space between two lines * test=develop add space between two lines * test=develop add space between lines * test=develop modify comment in nn.py * test=develop add space between two lines * test=develop add space between two lines * update API.spec * run ci again * test=develop run ci again * rerun ci * test=develop rerun ci * change input shape * run ci * test=develop run ci * modify format of nn.py * test=develop * test=develop * test=develop update API.spec * test=develop fix API doc * modify API comment * modift API comment * test=develop update API.spec * test=develop modify comment * test=develop modift comment * test=develop modift comment * test=develop update API.spec * test=develop modify comment * test=develop add inference in nn.py * test=develop update API.spec * test=develop resolve confict * test=develop update API.spec	6 years ago
SunGaofeng	40885c225b	add unfold op (new op),test=develop (#17944 ) * add unfold op test=develop * fix divide bug in python3 when calculating output width and height test=develop * add name=None in python api, move redundant code into inline function * try to trigger ci for this code test=develop	6 years ago
Jacek Czaja	84bb45c054	[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965 ) * - removed is_reusing_ * - Added TID to keys for reusing apart from softmax PD * - compilation fix * - Yet another compilation fix * - Batch Norm and Conv adapted * - Fix to softmax MT * - Fixes to MT code of MKL-DNN * - Lint fixes test=develop	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
Tao Luo	53fd507bae	fix merge conflict of 'Remove attribute in Allocator::Allocate' and elementwise_add_mkldnn_op (#17949 ) test=develop	6 years ago
jerrywgz	aab4d12c0e	refine GetExpectedKernelType in conat op, test=develop (#17934 )	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
Yibing Liu	33d1e56506	Enable seq_pool op to accept len 0 input (#17284 ) * Enable seq_pool op to accept len 0 input test=develop * Update sequence_pool's api test=develop * Add more unittest cases for seq_pool op test=develop * Remove legacy comments test=develop * Don't use template in op maker test=develop	6 years ago
Yihua Xu	9b5017366a	Fix the format issue when 'X' is not nchw. (#17833 ) test=develop	6 years ago
Hongyu Liu	8062bd510c	Reshape support tensor attribute (#17781 ) * add reshape support tensor; test=develop * fix reshape bug; test=develop * change reshape attribute default value; test=develop * fix reshape input name; test=develop * fix reshape unitest; test=develop * check dim tensor shape; test=develop	6 years ago
Zeng Jinle	0a96ec699c	fix conv v7 workspace size limit error, test=develop (#17902 )	6 years ago
Yihua Xu	14a32bf0c4	Fix the accuracy issue while using float precision to get the scale. (#17884 ) test=develop	6 years ago
gongweibao	fbbdc9ccad	Add backward and optimizer operator dependency pass. (#17746 )	6 years ago
baojun	e2c1b7c354	[NGraph] cache compiled function instead test=develop (#17845 )	6 years ago
Zhaolong Xing	ae576f3c68	fix: when use the load model from memory mode, the RAM occupy is high (#17788 ) test=develop	6 years ago
Zhaolong Xing	5efe8c7287	fix bug: the lod_tensor_to_array op will aplly a new var but not release when dong inference (#17856 ) test=develop	6 years ago
pawelpiotrowicz	39bc8a55a4	[NGraph] Enable ngraph layer_norm operator (#17599 ) * Enable ngraph layer_norm operator test=develop * Disable/Enable cuda, new unit-test test=develop * Fix use_cudnn test=develop * Fixed test_layer test, new funciton is added test=develop * set use_cudnn by default test=develop	6 years ago
baojun	a4c528a31c	[NGraph] some ngraph updates to enable bert (#17739 ) * delay infershape test=develop * fall back subblock to paddle test=develop * fix edge cases test=develop * remove output duplicates test=develop * handle reshape2_grad infershape test=develop	6 years ago
baojun	7611208ab7	[NGraph] added gather_grad to ngraph test=develop (#17646 )	6 years ago
jerrywgz	92d9bdfce2	fix api doc in slice op, test=develop (#17804 )	6 years ago
Hongyu Liu	dfec676270	expand op supprt tensor attribute (#17773 ) * expand support tensor attribute; test=develop * fix bug ; test=develop * fix uni test bug; test=develop * fix copy bug; test=develop * refine expand_times default value; test=develop	6 years ago
Hongyu Liu	82358bfdc1	ont hot support tensor depth (#16972 ) * support some input tensor remain on cpu; test=develop * fix input = none; test=develop * fix unfound bug; test=develop * fix proto None case; test=develop * fix bug; test=develop * fix proto null bug; test=develop * remove conv check; test=develop * fix test bug; test=develop * move fill constant; test=develop * no change in proto; test=develop * fix bug; test=develop * change attr detph name; test=develop * remove remain cpu; test=develop * fix bug; test=develop * merge develop; test=develop * fix one_hot bug; test=develop * fix bug; test=develop * fix bug; test=develop * fix bug; test=develop * fix python api bug; test=develop	6 years ago
Brian Liu	7cfddf22c8	Optimize bilinear interpolate op with OpenMP (#17800 ) Refactor the code to be OpenMP friendly test=develop	6 years ago
wangchaochaohu	c10157a5df	revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753 ) * revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop	6 years ago
mozga-intel	6a6bf597f7	[NGraph] Enable elementwise_div operator test=develop (#17515 ) * Enable elementwise_div operator test=develop * Fix update date test=develop	6 years ago
lidanqing	d7c5c2bd64	Add input format in Transpose GetHash (#17737 ) * fix the bug of mobilenet-ssd INT8 inference without overloading GetHash test=develop * remove the out_grad->format() in TransposeMKLDNNGradOpKernel test=develop	6 years ago
baojun	2c58f1a83c	[NGraph] Added lookup table to ngraph engine test=develop (#17647 )	6 years ago
pawelpiotrowicz	bacc822492	[NGraph] Enable transpose ngraph operator (#17636 ) test=develop	6 years ago
baojun	90eae0b39a	[NGraph] Addded slice op to ngraph test=develop (#17648 )	6 years ago
baojun	2fbaa5c075	[NGraph] added matmul op to ngraph engine test=develop (#17645 )	6 years ago
Bai Yifan	bba57cdd82	Add deformable conv v2 op,test=develop (#17145 ) * unit commits, test=develop * update API.spec, test=develop	6 years ago
YishengCheng	bd15912d65	fix bug for ctr_reader for svm data (#17575 ) * fix bug for ctr_reader test=develop * fix svm data test=develop fix svm data test=develop	6 years ago
Yiqun Liu	8fd39f3e99	Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236 ) * Enhance fused_elementwise_activation op. test=develop * Move the api fused_elementwise_activation to contrib. test=develop * Add including files. test=develop * Add the support of sigmoid in fused_elementwise_activetion op. * Update API.spec. test=develop	6 years ago
Yiqun Liu	2704479bb2	Optimize recurrent_op using Prepare and RunPreparedContext, avoiding create operators in every iter. (#17689 ) test=develop	6 years ago
pawelpiotrowicz	9b99876442	Enable less_than ngraph operator (#17642 ) * Enable less_than ngraph operator test=develop * Added compare unit-tests test=develop * Update: date && removed import test=develop	6 years ago
hutuxian	4ff87c049d	remove useless input 'Softmax@GRAD' from softmax_with_cross_entropy op (#17612 )	6 years ago
pawelpiotrowicz	70a887af63	[NGraph] Add reduce_sum operator for Ngraph (#17450 ) test=develop	6 years ago
baojun	29baca0dd8	add depthwise_conv2d op to ngraph engine (#17454 ) * add depthwise_conv2d test=develop * use cpu for ngraph test=develop	6 years ago
gongweibao	0d561ef442	fix 2dconn test=develop (#17681 )	6 years ago
mozga-intel	ccf9e2327b	[Lite] Enable cast operator test=develop (#17294 )	6 years ago
Yiqun Liu	5782dddad0	Optimize the concat and split kernel for specical cases when the number of inputs/outputs is 2 (#17415 ) * Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2. test=develop * Refine codes. test=develop * Correct the condition. test=develop * Move the define of tmp_data outside the if statement. * Print the cudnn minor version. test=develop * Fix the case when in_num/o_num is 1 in concat/split op. test=develop * Remove const_cast. test=develop	6 years ago
lidanqing	04b6c29ee0	Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570 ) * add INT8 conv+relu6 fuse and enbale mobilentv2 INT8 test test=develop * change fasle and 0.0 to fuse_brelu and brelu_threshold test=develop change the "fuse_relu\|\|fuse_brelu" to "unsigned_output" test=develop * Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18 test=develop * continuous-integration fix test=develop	6 years ago
Tao Luo	962eed6f82	Revert "Enable SQRT operator for the nGraph Bridge (#17549 )" (#17680 ) This reverts commit `f34830e2aa`.	6 years ago
Krzysztof Binias	f34830e2aa	Enable SQRT operator for the nGraph Bridge (#17549 ) * Enable sqrt operator for the nGraph Bridge. test=develop * Update activation_op.h	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
Krzysztof Binias	b1bd483a7d	[NGraph] Enable gelu operator for the nGraph Bridge. (#17547 ) test=develop	6 years ago
chengduo	343017324e	Polish Print Op (#17651 ) * enhance print	6 years ago
Zeng Jinle	4aa931dd85	Code clean of Allocator (#17602 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop * clean code of allocator,test=develop * delete zero_size_allocator.h,test=develop * fix failed unittest,test=develop	6 years ago
Guo Sheng	430e25654b	Fix the usage of out_grad lod in sequence_slice_op. (#17625 ) test=develop	6 years ago
hutuxian	1670db5e86	Gather Op Index Support int64_t datatype (#17610 ) * gather_op support int64_t index by adding a template typename * add UT and rename typename test=develop	6 years ago
mozga-intel	2b83d75bfa	Enable elementwise pow operator for ngraph (#17526 )	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago
Krzysztof Binias	e9216d0602	Enable logical operators for the nGraph Bridge. (#17543 ) test=develop	6 years ago
Yibing Liu	e8990e64f6	Fix trust ratio in lamb (#17614 ) test=develop	6 years ago
chengduo	b5f4d5ed0e	Add broadcast operators (#17503 ) * This PR adds broadcast for multi-process. And it could be used in dynamic graph to broadcast parameters.	6 years ago
Sylwester Fraczek	bccb0ba49a	fix quantize_squash_pass segfault when no tensor linked to Bias (#17292 ) * fix quantize_squash_pass segfault when there is no tensor linked do Bias input test=develop * add googlenet test test=develop * fix concat CreateKey not using input format test=develop	6 years ago
mozga-intel	0d4cbdad91	[NGraph] Enable elementwise mul operator (#17552 )	6 years ago
mozga-intel	f2694e122d	[NGraph] Enable assign operator for a ngraph, test=develop (#17437 ) * Enable assign operator for a ngraph, test=develop * Cross_entropy operators needs to be updated	6 years ago
mozga-intel	cf02cb5e98	Enable elementwise sub operator for ngraph (#17527 )	6 years ago
tensor-tang	7ae461eb13	[CPU] refine cpu softmax bwd (#17534 ) * refine softmax fwd test=develop * refine cpu softmax bwd test=develop * fix batch size test=develop * fix compile issue with gpu test=develop * add value clip	6 years ago
tensor-tang	0600b370ea	[CPU] refine softmax op fwd on CPU (#17522 ) * refine softmax fwd test=develop * fix compile issue wih gpu test=develop * add value clip to avoid exp	6 years ago
mozga-intel	035771512d	Enable elementwise min operator for ngraph (#17521 )	6 years ago
jerrywgz	c1aae8b8d2	Fix GetExpectedKernelType in Concat op (#17459 ) * fix concat op vartype check, test=develop	6 years ago
Qiao Longfei	58f7695ab2	Async exe support communicator (#17386 ) Async exe support communicator	6 years ago
mozga-intel	109b5aed5a	[NGraph] Enable reshape operator test=develop (#17512 )	6 years ago
Krzysztof Binias	43d15b9d96	Enable square operator for the nGraph Bridge. (#17551 ) test=develop	6 years ago
Sevin F. Varoglu	f86f49e779	[NGraph] add increment op to ngraph engine (#16929 ) * add increment op to ngraph engine test=develop * fix style errors test=develop	6 years ago
baojun	8923612b10	NGraph enable parse serialized graph test=develop (#17453 )	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
Yibing Liu	f9796b1249	Add LAMB Optimizer support (#17489 ) * Add LAMB optimizer * Expose LAMB Optimizer's APIs test=develop, test=document_preview * Cleanup code & doc test=develop, test=document_preview * Update lamb optimizer's formula test=develop	6 years ago
mozga-intel	99ab57123c	Enabled ngraph elementwise max operator (#17517 )	6 years ago
Tao Luo	3d19f44a89	remove unused SERIAL compiler option (#17500 ) test=develop	6 years ago
mozga-intel	1eb151752e	Enable abs operator for a ngraph test=develop (#17436 )	6 years ago
liuwei1031	ba70cc499e	fix security bugs : (#17464 ) http://newicafe.baidu.com:80/issue/PaddleSec-33/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-28/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-25/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-24/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-21/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-20/show?from=page test=develop	6 years ago
Zhaolong Xing	ff7f911b4d	add quant_dequant_moving_avg_max_abs op (#17480 ) * add quant_dequant_moving_avg_max_abs op test=develop * add more note for quantdequant op test=develop	6 years ago
Qiao Longfei	287de41c04	Optimize communicator flags (#17494 ) * optimize communicator flag * change flags in init py test=develop	6 years ago
lvmengsi	10b23a72c1	Double backward elementwise div (#17416 ) * double backward, elementwise_div * fix dx empty. test=develop * bug fix (#17392) fix secure bug * Eanble stack operator for a Ngraph, test=develop (#17406) * fix sqrt_grad_grad unittest. test=develop (#17410) * fix sqrt_grad_grad unittest. test=develop * disable sqrt_grad_grad unittest. test=develop * test=develop, fix unittest * test=develop, fix unittest * test=develop, fix unittest * test=develop, fix bug * fix unittest. test=develop * fix unittest dx. test=develop * tmp fix! for test... test=develop * reduce tmp, test=develop * test=develop, reduce tmp * fix broadcast unittest. test=develop * fix format. test=develop * refine code. test=develop * refine code. test=develop * refine GetDoubleGradSafeTensor. test=develop * fix format. test=develop	6 years ago
Zeng Jinle	3d4e8268c6	fix recurrent fwd bug when no backward and scope clear (#17460 )	6 years ago
lvmengsi	977e9fcb27	support elementwise_sub double backward (#17476 ) add elementwise_sub_grad_grad op for backward of backward calculation	6 years ago
chengduo	5a6ab38013	Add record event And remove CSP (#17447 ) * add record_event test=develop * remove csp test=develop	6 years ago
Yan Xu	0217555530	polish parallel dygraph code (#17164 ) * add var grad hook test=develop	6 years ago
Bai Yifan	3a9ae28d32	fix assert,test=develop (#17445 )	6 years ago
zhaoyuchen2018	b02f2aff04	Add conditional compile for gru opt (#17368 ) * improve gru unit performance. refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Add conditional compile for gru opt Not enable gru opt if compute ability < 700 test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
Zeng Jinle	712bfb17cb	fix recurrent_op,test=develop (#17433 )	6 years ago
mozga-intel	6ee6700fac	Eanble stack operator for a Ngraph, test=develop (#17406 )	6 years ago
Krzysztof Binias	0823a7bc8b	Optimize the sequence padding op (#17403 ) test=develop	6 years ago
baojun	1ce7b45b9e	NGraph Added fill_zeros_like op test=develop (#17295 )	6 years ago
baojun	910196524d	NGraph Added dropout and dropout_grad to ngraph test=develop (#17320 )	6 years ago
mozga-intel	b189480734	Ngraph Enable gather operator test=develop (#17296 )	6 years ago
lvmengsi	4ef631013c	Double backward sqrt (#17387 ) * double backward sqrt * refine unittest. test=develop * refine test. test=develop * remove alpha in unittest. test=develop	6 years ago
lvmengsi	5d1ac41b00	Double backward reduce mean (#17372 ) * test=develop, double backward reduce_mean * add comment. test=develop * fix format. test=develop * rename GradGrad -> DoubleGrad. test=develop * fix op_use_default_grad_op_maker.spec. test=develop	6 years ago
jerrywgz	0cae5a36b6	enhance generate mask labels, test=develop (#17380 )	6 years ago
Kaipeng Deng	bd9bef5a4e	add elementwise_add_grad_grad op (#17366 ) * add elementwise_add_grad_grad op. test=develop * use defined GradMaker. test=develop	6 years ago
jerrywgz	1c6d064627	add collect fpn proposals op,test=develop (#16074 ) * add collect fpn proposals op,test=develop	6 years ago
Kaipeng Deng	60be66e2c0	support fc_op double grad (#17317 ) * add double grad for mul_op. test=develop * fix format. test=develop * fix format. test=develop * fix format. test=develop * refine code. test=develop * remove setzero. test=develop * fix dx/dy init bug. test=develop * fix format. test=develop	6 years ago
liuwei1031	0863599323	Fix the uninitialized gru_value.output_value. (#17197 ) test=develop	6 years ago
Yihua Xu	218d8d8f73	Optimize the computing kernel of sequence_reverse operator (#17349 ) * Optimize the computing kernel of sequence_reverse operator. test=develop * Clean code test=develop * Fix for cpplint syntax checking. test=develop * Fix the compile warning issue. test=develop	6 years ago
Yiqun Liu	dcda20233c	Optimize the elementwise op using eigen (#15494 ) * Optimize the elementwise op with CUDA kernels. test=develop * Support setting of attr in op config file. test=develop * Add the support the setting dtype and initializer in config. test=develop * Save workspace. * Add initializer "zeros". test=develop * Fix compiling error. * Support the use of existed file to initailize tensor in op_tester. * Use eigen to optimize the elementwise_add/mul for the case that x and y have the same dims. test=develop	6 years ago
Kaipeng Deng	8bae8590ac	add double grad for elementwise_mul op (#17255 ) * add double grad for elementwise_mul. test=develop * remove comment. test=develop * fix grad sum. test=develop * fix for axis expand. test=develop * add test for axis expand. test=develop	6 years ago
Kaipeng Deng	11d3a38f25	add double grad for square op (#17173 ) * add double grad for square. test=develop * formax code. test=develop * fix for grad sum. test=develop * refine shape. test=develop * refine extract. test=develop	6 years ago
zhoukunsheng	d4b67e1692	Add Where Op(#16793 )	6 years ago

... 3 4 5 6 7 ...

4684 Commits (940c6ff1c815c64572a3b523baf6d0a35628b8d4)