Paddle

Commit Graph

Author	SHA1	Message	Date
songyouwei	99d30bfc36	speedup slice impl (#23340 ) test=develop	5 years ago
Zhaolong Xing	1a6ce8b910	add swish split gelu plugin dynamic support (#23305 ) test=develop	5 years ago
Jacek Czaja	2bb1b0e89e	[DNNL] Added MKL-DNN inplace pass for C-API inference (#23315 )	5 years ago
Yi Liu	0471476a18	fix nccl comm double free bug (#23344 ) As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.	5 years ago
wangchaochaohu	1ee2a9a424	Profiler refine (#23294 ) * refine output of profiler for child event	5 years ago
Leo Chen	488b2387e2	Feature/expand params in auto-generated pybind functions for dygraph operators (#23181 ) * expand parameters, test=develop * support resnet, test=develop * fix resnet, test=develop * support duplicable out, test=develop * support ptb * fix bugs, test=develop * support null input, test=develop * fix bugs, test=develop * fix batchNorm is_test, test=develop * refine code, test=develop * follow comments, test=develop * follow comments, test=develop * follow comments, test=develop * follow comments, test=develop	5 years ago
GaoWei8	20eed5401a	Change fluid.layers.where's C++ operator name (#23250 )	5 years ago
Yi Liu	2169e6fb58	Initialize global nccl_comm in PE (#23275 )	5 years ago
Jacek Czaja	012886df79	[DNNL] Softmax mkldnn op inplace support (#23197 )	5 years ago
石晓伟	75ebb48a91	supports thread-binding stream, test=develop (#23177 )	5 years ago
石晓伟	708ded584e	pause the io_utils_test of int64 and resume after repair, test=develop (#23234 )	5 years ago
Zeng Jinle	babda94c8a	Distinguish public/private global vars (#23269 ) * distinguish public/private vars, test=develop * fix windows issues, test=develop	5 years ago
zhaoyuchen2018	58615a6272	Improve elementwise performance. (#23001 ) * Improve elementwise performance. Elementwise performace is poor as walk into CommonGradBroadcastCUDA, add some new kernels for different data pattern. * Add some cuda kernel to speedup common broadcast cases. test=develop * Add more test cases and fix cuda kernel bug. test=develop * Remove tests as cpu percision fails.test=develop * Refine SplitDims, test=develop * Change file mode, test=develop	5 years ago
Wojciech Uss	f836c8aa8f	add check for scales and a message (#23119 )	5 years ago
Zeng Jinle	8bfd62ffb7	Expose dygraph.grad api (#23124 ) * expose dygraph.grad api, test=develop, test=document_fix * add more parameter in dygraph.grad API, test=develop * add only_inputs=True parameter, test=develop * follow comments, test=develop, test=document_fix * fix typo, test=develop, test=document_fix	5 years ago
Wilber	0129f4b568	Add some inference API comments for AnalysisPredictor (#23242 ) * add inference api doc. test=develop	5 years ago
Tao Luo	c00d427d52	simplify the cmake log of ir/CMakeLists.txt (#23262 ) test=develop	5 years ago
Zeng Jinle	77b4dc80c9	code polish for adding const qualifier, test=develop, test=document_fix (#23248 )	5 years ago
Zhaolong Xing	430b0099c9	[Paddle-TRT]: Ernie Dynamic shape support. (#23138 ) * add dynamic plugin support. test=develop * change emb eltwise layernorm to math function test=develop * add emb eltwise layernorm test=develop * can run dynamic shape ernie test=develop * fix ci test=develop * add ut for trt ernie dynamic test=develop * refine dynamic shape c++ interface. test=develop * fix comments test=develop * fix comments test=develop	5 years ago
xujiaqi01	68ea1ad55b	add clear one table (#23089 ) * add clear_one_table * test=develop	5 years ago
danleifeng	ae3bb16d06	add MaskAucCalculator in paddlebox (#23157 ) * add maskauc in paddlebox; test=develop	5 years ago
liym27	6af480ca33	Support int64 for op assign_value. test=develop (#23179 )	5 years ago
Zeng Jinle	53e6f8e1da	rename macro, test=develop (#23161 )	5 years ago
Zeng Jinle	bba740710d	add cuda resource pool for BufferedReader, test=develop (#23152 )	5 years ago
Zeng Jinle	7d8d50b6cc	rename no_need_buffer_vars macro, test=develop (#23160 )	5 years ago
Liufang Sang	a486a739e1	fix compile error in win gpu (#23196 ) * fix compile error in win gpu test=develop * fix compile error in win gpu test=develop * fix compile error in win gpu test=develop	5 years ago
Zeng Jinle	7ca77a90ac	add Tensor::IsSharedBufferWith method, test=develop (#23175 )	5 years ago
Zeng Jinle	b8886bf122	rename no_need_buffer_vars_macro, test=develop (#23159 )	5 years ago
Zeng Jinle	bae5930ba1	fix graph attr copy issues, test=develop (#23191 )	5 years ago
wangchaochaohu	b721e23b25	transpose cudnn using cudnn v7 api (#19738 ) * refine the transopose conv using v7 to choose algorithm	5 years ago
Pei Yang	46b8d282dc	Add some inference API comments for AnalysisConfig (#23117 ) * add some API comments in paddle_analysis_config.h, test=develop * add some API comments in paddle_analysis_config.h, test=develop	5 years ago
Adam	4f5e4540f8	Improve SGD jit code to work with large data (#23120 )	5 years ago
Liufang Sang	4db031902d	add dequantize_log_op and make pyramid hash support int8 weight (#22548 ) * add dequantize_log_op and make pyramid hash support int8 weight test=develop * add unittest and update pyramid hash op test=develop * remove paddle_enforce test=develop * fix error message test=develop * remove incorrent commit test=develop * fix error message in log_dequantize test=develop * change 2019 to 2020 test=develop * remove useless check_grad test=develop	5 years ago
Zeng Jinle	e5fef8f38a	[Dygraph double grad]Code polish (#23121 ) * fix dygraph double grad, test=develop * fix unpack constructor, test=develop	5 years ago
Zeng Jinle	9258e96094	fix read op comments, test=develop, test=document_fix (#23122 )	5 years ago
Zeng Jinle	acfc9b8a70	Reader sequential and inference partial feed (#22699 ) * sequential reader stage 1, test=develop * fix ut, test=develop * fix iterable=False reset bug, add some logs and polish code, test=develop * inference feed partial data, test=develop * Turn on keep_order=True for test, test=develop * enhance ut to test more cases, test=develop * test commit for reverting * Revert "test commit for reverting", test=develop This reverts commit 80aef42ef52ba1ee79627d6f663a624ec4f12f58. * add ut of merged and unmerged results, test=develop * add more uts for coverages and add en doc of api, test=develop * follow comments, test=develop * change note style, test=develop	5 years ago
Wilber	95b356a069	update embedding_eltwise_layernorm fuse and kernel. test=develop (#23114 ) update embedding_eltwise_layernorm fuse pass and fused kernel, to support multi input	5 years ago
Zeng Jinle	a31d7328b7	Add dygraph double grad implementation (#22939 ) * add double grad implementation for dygraph, test=develop * polish code, add uts, test=develop * fix place bug, test=develop * polish codes, add more uts for coverages, test=develop * add no_grad_set, test=develop * add star gan ut, test=develop * follow comments, test=develop	5 years ago
Yiqun Liu	3af4771122	Add the detection and code-generation of sqrt and square in fusion_group (#23095 )	5 years ago
hutuxian	0c30098f8b	Add need_save_delta parameter to solve OOM (#23097 )	5 years ago
songyouwei	2e2da7124b	high-performance dygraph slice (#22879 ) * move __getitem__ to cpp * bug fix * add type check and gil release * support negative step with omitted ends test=develop * code refine test=develop * bug fix test=develop * slice always return different pyobj test=develop	5 years ago
Sylwester Fraczek	abee05a8c8	added mkldnn swish activation (#23041 )	5 years ago
Zhaolong Xing	8c6fde9e69	fix align error (#23090 ) test=develop	5 years ago
Liufang Sang	915b892a15	Fix div zero in fake quantize op (#22966 ) * fix div zero test=develop * fix div zero test=develop * add hostdevice function test=develop * add eps when is zero test=develop	5 years ago
Yi Liu	121b2aed4d	initialize global nccl context in dygraph (#23037 ) initialize global nccl context in dygraph test=develop	5 years ago
Zhang Ting	880eb04d93	skip PrepareData when it is unnecessary (#22839 ) * remove unnecessary prepare data, test=develop * Op in while block will not skip PrepareData, test=develop	5 years ago
Feiyu Chan	01ab8a0619	add approximation for gelu, test=develop (#22961 ) add approximation for gelu, default value is False (only kernel with eigen is added, remove code for computing gelu with MKLDNN temporarily)	5 years ago
Adam	5842ae6785	Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695 )" (#22985 )	5 years ago
Pei Yang	24db750386	fix trt int8 calib precision bug. test=develop (#23036 )	5 years ago
GaoWei8	1dc1f9270e	Fix lod error of concat op for axis = 0 (#22538 )	5 years ago
yaoxuefeng	660ff18488	fix datsset test=develop (#23043 )	5 years ago
Zhang Ting	714b0076b6	Override GetKernelTypeForVar to avoid device transform, test=develop (#23032 )	5 years ago
wangchaochaohu	112e3edbf6	fix the conv group problem test=develop (#23025 )	5 years ago
Wilber	db40ee86db	fix unittets. test=develop (#23018 )	5 years ago
wangchaochaohu	99db0cf762	remove debug log test=develop (#22994 )	5 years ago
wangchaochaohu	3757e0687c	Add Unittest for backward of fusion group (#22932 ) * add fusion group test for backward and refine code	5 years ago
chengjuntao	63f3ada7b9	fix bug which input shape (#22965 ) * fix bug which input shape, test=develop * add error type,test=develop	5 years ago
Zhang Ting	137d6563fc	add check for assigned data, test=develop (#22960 )	5 years ago
wangchaochaohu	f0d193a23c	Cast fusion for fusion group (#22876 ) * add support for expression type convert and add cast Op support in fusion group	5 years ago
yaoxuefeng	29a7a52d38	Fix instag (#22632 ) * update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * udpate style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * fix compile warning test=develop * add attr default test=develop * add unittest test=develop * fix style test=develop * fix style test=develop * change out_val_ifempty to out_val_if_empty test=develop	5 years ago
wangchaochaohu	c979c9f2b0	refine the profiler print test=develop (#22968 )	5 years ago
Wilber	ff3ddbb502	add skip_layernorm pass. test=develop (#22895 ) * add skip_layernorm pass. test=develop	5 years ago
wawltor	f154d5860f	Speed up the matmul op, use the gemm replace the batch gemm (#22926 ) In the op of gemm, we use the gemm to replace batch gemm, speed up the matmul op	5 years ago
Adam	056edf3929	Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695 )	5 years ago
Zhaolong Xing	8d6dc102fe	[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse (#22494 ) * 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop	5 years ago
guofei	3d8571e884	modify assign op and add unittest of assign op (#22769 ) As the title.	5 years ago
Zeng Jinle	d33c4343e1	Imperative tracer refactoring (#22457 ) * refine grad maker, test=develop * refactor tracer stage 1, test=develop * merge develop to solve conflict third times, test=develop	5 years ago
liu zhengxi	61fef9754b	Fix fc padding bug during inference fusion (#22860 ) * fix fc padding during fusion, test=develop * fix optim model inference after SaveOptimModel, test=develop	5 years ago
tangwei12	ad9c8f6d2d	fix communicator when break under pyreder mode (#22911 ) * fix communicator when breaking under PyReader mode, test=develop * revert some vlog level to 0, test=develop	5 years ago
mapingshuo	5ba9dfc16a	add lookup_table_dequant_op (#22900 ) add lookup_table_dequant_op	5 years ago
zhaoyuchen2018	a020a25797	Fix model int8 quant fail, test=develop (#22891 ) As model fails when enable int8 quant, so disable allocate memory in cpu for small variable.	5 years ago
Zhaolong Xing	dd67d44a50	[Paddle-TRT] : (Part1) Dynamic shape support (#22868 ) * change the ci trt from version 5. to 6.0 * paddle-trt dynamic shape support init * conv+bias or conv+bn dynamic shape support test=develop * modity trt engine opconvert test=develop * fix ci error test=develop	5 years ago
tangwei12	07e13b84cd	remove vlog, test=develop (#22898 )	5 years ago
Zhang Ting	ca9c8b417d	fix compute ratio of profile, test=develop (#22872 )	5 years ago
wangchaochaohu	dbb0b9b3b6	refine the profiler print (#22823 ) * refine the profiler print test=develop	5 years ago
Michał Gallus	0038bfbd1d	Prevent loading of warmup data in analyzer_int8 if enable_int8 is set to false (#22857 )	5 years ago
Chen Weihang	1644926a6c	Polish detail implement of dygraph data loader (#22878 ) * polish detail implement of data loader, test=develop * solve coverage ci problem, test=develop	5 years ago
Wilber	f686310d81	fix concat_mkldnn op. test=develop (#22692 ) fix concat_mkldnn op when encounter extreame conditions.	5 years ago
hong	5191e54494	reduce default attrs for dynamic graph (#22850 ) * reduce default attrs for dynamic graph, test=develop * add some explanations for explicit attr, test=develop * tweak explicit attr comments, test=develop	5 years ago
Zhaolong Xing	1a533ed2de	[BUG]: Multihead matmul op's ouput size should be BxSx(N*H) (#22848 ) test=develop	5 years ago
hong	c736fef93b	dygraph backward engine accelerate (#22808 ) * fix loaded program load bug; test=develop * first version * speed backward engin; test=develop * remove useless code; test=develop * reconvery io.py; test=develop * remove useless code; test=develop * remove useless code; test=develop	5 years ago
Zeng Jinle	d41d802ba3	Add flags to limit gpu memory (#22793 ) * add recorded cuda memory apis, fix typo, test=develop * add more ut, test=develop * follow comments, test=develop * fix py35 incompatible issues, test=develop	5 years ago
石晓伟	1861ca88f1	serialize the PaddleTensor, test=develop (#22810 ) * encapsulate the PaddleTensorToLoDTensor, test=develop * serialize the pd_tensor, test=develop * serialize tensors to file, test=develop	5 years ago
Zhang Ting	72ff5a09c3	fix print bug of profile, test=develop (#22804 )	5 years ago
Zhang Ting	4e8bc02461	add fluid.device_guard to specify the device type for Op (#22254 ) * add fluid.device_guard to specify the device type for Op	5 years ago
石晓伟	ddb9b46fec	change the function in op_teller, test=develop (#22794 ) * change the function in op_teller, test=develop * correct the commit-id, test=develop	5 years ago
Zhen Wang	89cfa49156	Unmerged fetch list (#22635 ) * update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results. * add the unit test for fetch_unmerged. * update ut for multi-card and multi-cpu. * add the error message and the user suggestion in FetchOpHandle. test=develop	5 years ago
wangchaochaohu	8456c3f4dd	polish the profiler_help code (#22811 )	5 years ago
zhongpu	2fd1ec1e3e	fix docker build for paddle openblas, test=develop (#22795 )	5 years ago
Chen Weihang	7d8d573453	Speed up dygraph DataLoader based on shared memory and LoDTensor serialization (#22541 ) * add lodtensor share memory & serialization, test=develop * fix windows compile error, test=develop * deal vartype pickle & fix unittest matching error message, test=develop * update timeout variable name, test=develop * refactor memory map implement, test=develop * clear mmap file discripter when exit unexpectedly, test=develop * remove the child process fd in advance, test=develop * remove mmap fds after Queue.put in child process, test=develop * add hard unittests for register exit func, test=develop * fix python2 compatibility problem in unittest, test=develop * fix exception unittest error, test=develop * polish code based review comment, test=develop	5 years ago
liu zhengxi	324f2b3922	Fix inference c api PD_GetZeroCopyOutput lod (#22768 ) * fix inference c api lod, test=develop * fix capi lod problem and enrich tests, test=develop * delete useless header files and alter const_cast, test=develop	5 years ago
wangchaochaohu	7578fcbac4	Profile code refine (#22800 ) * add profiler_help.h to refine the code test=develop	5 years ago
hutuxian	53a2b68f4e	support customized download command in dataset (#22782 ) * user can call dataset.set_download_cmd to set its customized download cmd * add UT to cover this scenario	5 years ago
wangchaochaohu	ca9e77a8d4	add sum op support for fusion group (#22771 ) * Add the codegen and auto fusion for sum Op in fusion group	5 years ago
tianshuo78520a	433cef03e5	fix typo word (#22784 )	5 years ago
Kaipeng Deng	ebc7ffc300	fix detection_map. test=develop (#22705 )	5 years ago
zhaoyuchen2018	72dde4abde	Refine adam op to improve performance, test=develop (#22346 ) * Refine adam op, test=develop * Fuse kernels together to reduce cpu time. * Refine paddle enforce, test=develop * Remove some comments, test=develop * Refine code,test=develop * Refine cuda kernel, test=develop * Refine code according to comments, test=develop	5 years ago
wangguanzhong	f2d1cd119a	fix lod level, test=develop (#22755 )	5 years ago
FlyingQianMM	79d712346f	Correct CPU gradients of the argsort op (#22739 ) * Correct CPU gradients of the argsort op, form a network to test its forward and backward process, test=develop * fix dynamic threshold error in test_argsort_op, test=develop	5 years ago
Adam	2b80e9a719	Add cpu_info without XBYAK (#22716 )	5 years ago
guofei	ae8b5f11a3	Change ShareDataWith() to TensorCopy() in ref_by_trainer_id (#22717 ) As the title	5 years ago
liu zhengxi	71ab0458e1	Fix pointer and c-api encapsulation (#22663 ) * refine pointer and c-api prototype, test=develop * fix new c api profile bug, test=develop * add unit tests, test=develop	5 years ago
Leo Chen	b2c1be851a	support cond in clone, test=develop (#22657 ) * support cond in clone, test=develop * refine code, test=develop * refine code, test=develop * follow comments, test=develop * refine code, test=develop	5 years ago
Zhang Ting	f97f3f9301	add framework overhead ratio in profile report (#22590 ) * add framework overhead ratio, test=develop * print GpuMemcpy overhead, test=develop	5 years ago
zhouwei25	160d0f1308	fix the CI risk that network cannot be connected (#22736 )	5 years ago
chengjuntao	15c2667143	register fp16 for assign op (#22744 ) * register fp16 for assign op, test=develop * add op test for fp16, test=develop	5 years ago
zhangchunle	882e7f7c3b	Directly getting API.spec for tools/sampcd_processor.py (#22728 )	5 years ago
dyning	1c0653462d	fix generate_mask_labels lod level (#22743 )	5 years ago
GaoWei8	ba140222d6	fix compile&runtime lod_equality of lod_reset (#22737 )	5 years ago
hutuxian	175954d894	PaddleBox Framework Part2 (#22466 ) * Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator. * Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they are not be distributed evenly. * Remove CPU code in Pull/PushSparse and we will add it back when testing it fully. * Fix some known issues: such as copying persistable vars after one epoch running.	5 years ago
ShenLiang	3132681e8a	add partial_sum op in contrib (#22292 ) * add partial_sum_op, test=develop * modify the Paddle Error Message, test=develop * modify the Paddle Error Message, test=develop * modify the bug for python3, test=develop * modify the ut for ci, test=develop * mv to contrib, test=develop * use check_variable_and_dtype, test=develop * fix ci, test=develop * fix conflict, test=dvelop * add partial concat, test=develop * fix the conflict, test=develop * fix the error, test=develop * rm SSE4, test=develop	5 years ago
wangchaochaohu	611411b90e	Fusion group profile support (#22718 ) * add support for the driver api callback and fix the profiler name show bug	5 years ago
ShenLiang	e136661304	add partial_concat op in contrib (#22528 ) * add partial_concat, test=develop * fix the grids and blocks, test=develop * fix the Paddle_Enforce, test=develop * fix the doc of op, test=develop * fix the doc, test=develop * fix the doc of the op, test=develop * replace -1 with None, test=develop	5 years ago
GaoWei8	cdf5f6fb8c	Add an inference interface to disable FC padding (#22097 ) * Add an interface of disabling FC padding * fix bert regression * polish fc padding interface * recover pass function * fix argument error * fix mkldnn error	5 years ago
tianshuo78520a	d2ba91aad1	fix typo words (#22653 )	5 years ago
Yibing Liu	6e7bfe30a6	register fp16 kernel for some ops (#22650 ) (#22696 ) test=develop	5 years ago
tangwei12	66a3150135	SYNC with communicaotor (#22344 ) * add sync communicator and implement	5 years ago
Yiqun Liu	22bbd54719	Add the support of fp16 in fusion_group (#22239 )	5 years ago
flame	d97475d53b	fix CPU C inference API compile bug (#22702 )	5 years ago
Huihuang Zheng	adfa5b8354	Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp (#22673 ) 1. Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp. 2. Also enrich PADDLE_ENFORCE error messages.	5 years ago
flame	74eb82de19	fix go api bug (#22669 )	5 years ago
wangchaochaohu	a089072c8b	fix the profile print error (#22665 ) * fix the profile print error test=develop	5 years ago
lidanqing	d926214535	[UT coverage] improve the mul_mkldnn_op line coverage (#22408 ) * improve the mul_mkldnn_op line coverage test=develop * remove fp32 mul mkldnn kernel test=develop * locally refactoring test=develop * change according to reviews test=develop	5 years ago
wangchaochaohu	c65c6ae534	add flag to control profile level in python API (#22319 ) * add python flag to control profile level test=develop	5 years ago
123malin	00594c1c88	support dumping params/grads in transpiler mode (#22490 )	5 years ago
Zhaolong Xing	a06d75a280	[Paddle-TRT] Refine the error log about runtime batch and max_batch_size. (#22535 ) * fix trt log test=develop * fix comments test=develop	5 years ago
Adam	608447bfd5	Update MKLDNN to v1.2 (#22521 )	5 years ago
Adam	ab610a34ff	transpose_mkldnn code change to meet Paddle standards (#22591 )	5 years ago
Jiawei Wang	8f035fb637	Add TopK Op Grad CPU&GPU Kernel test=develop (#22628 ) * Add TopK Op Grad CPU&GPU Kernel test=develop * Add TopK Op Grad, modify grad op maker test=develop * Add TopK Op Grad, modify grad op maker test=develop * Add TopK Op Grad, modify PADDLE_ENFORCE test=develop * Add TopK Op Grad, modify PADDLE_THROW test=develop * Add TopK Op Grad, modify unittest test=develop * fix ngraph top k op unittest test=develop	5 years ago
Steffy-zxf	90ee366653	update ops's unittest data type from float32 to float64 and shape over 100 (#22544 ) * update ops's unittest of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt 1. update elementwise_pow, elementwise_max and scale's unitests with input data type (float32 -> float64) 2. fix bug that the elementwise_pow doesn't meet threshold requirements with tackling float64 data 3. remove sqrt from op_accuracy_white_list.py 4. update the unittests of elementwise_pow, elementwise_max and elementwise_min ops that their input data shape over 100 5. test=develop * modify the writing style according suggestions test=develop	5 years ago
flame	f7eafca828	remove python inference warning (#22602 )	5 years ago
Chen Weihang	fe685cc185	fix enforce test error, test=develop (#22610 )	5 years ago
Wilber	9a8203aa25	fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop (#22551 ) When a model contains multiple fc_lstm subgraphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.	5 years ago
Chen Weihang	266106da75	Fix mismatch with plus sign in the line (#22588 ) * reproduce match error, test=develop, test=document_fix * fix mismatch error, test=develop, test=document_fix	5 years ago
flame	1d503e6a9e	Golang inference API (#22503 ) * support golang inference	5 years ago
Zhaolong Xing	8acd745c25	[Ernie GPU Optim]: Fuse three fc to multihtead matmul (#22486 ) * 1. optim multihead matmul: fuse three fc to multihtead matmul test=develop * fix conflict test=develop * fix comments test=develop	5 years ago
Yiqun Liu	96770f519e	Disable fusion_group for windows and mac in build_strategy. (#22549 ) test=develop	5 years ago
Zeng Jinle	08033c8634	fix traced layer with non persistable vars, test=develop (#22552 )	5 years ago
Guo Sheng	31b5464632	Add support for dynamic_decode(while) training. (#22231 ) * Add support for dynamic_decode(while) training. test=develop * Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop * Fix test_rnn_decode_api.py. test=develop * Refine docs for apis in rnn.py. test=develop * Adjust outputs of dynamic_decode. test=develop * Remove the force_cpu update in assign_op. test=develop * Remove the force_cpu update in assign_op. test=develop * Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop * Rename _create_array_outof_while as _create_array_out_of_while in rnn.py. test=develop	5 years ago
tangwei12	b0675c8193	fix bug with compiledProgram (#22495 ) * add thread barrier for the compiled program	5 years ago
Wojciech Uss	4cddb43c5c	Add support for Ernie NLP model to the Slim QAT (#22506 ) * a test for Ernie QAT INT8 accuracy check test=develop * Remove NLP comparison test to split PRs test=develop * Fix typo and tabs, delete commented lines test=develop * re-combine the 2 PRs, test=develop Co-authored-by: Michał Gallus <sand3r@interia.eu> Co-authored-by: bingyanghuang <33643817+bingyanghuang@users.noreply.github.com>	5 years ago
Double_V	58d99247f4	support slice double grad, test=develop (#22166 ) * support slice double grad, test=develop * merge two doublegradopmaker to one doublegradopmaker,test=develop * change the shape of slice_OP's unittest, test=develop	5 years ago
hutuxian	1a7962be97	Paddlebox about box_wrapper (#22497 ) Refine PaddleBox Framework, Main functions: * Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC. * Replace FeedPass with new interface: BeginFeedPass & EndFeedPass * Refactor Pull/Push Sparse Function in box_wrapper. * Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct. * Cache copied keys in pull sparse in order to reuse it in push period.	5 years ago
huzhiqiang	9e29d3ebed	【OpPorting Example】DEMO OF FIX COMPILE&RUNTIME LOD_EQUALITY (#22460 )	5 years ago
yaoxuefeng	2235ee1a5e	multi-loss optimization by adding a DownpourOpt worker (#22025 ) * update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * udpate style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop	5 years ago
zhaoyuchen2018	54970444ce	Improve transpose performance with tile sm copy, test=develop (#22311 ) * Refine code, fix select tile error,test=develop * Refine element type and some comments, test=develop * Refine comments and gpu utils, test=develop * Remove some useless condition * Refine floor and ceil, test=develop * refine for loop. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
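Wilber	a90fa54092	Compile without nccl deps. [1/2] (#22509 ) Support compiling without depending on nccl. [1/2] With multiple cards, if Paddle is compiled without the WITH_NCCL switch turned on, the cards cannot communicate, so only a single card can be selected for use. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago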
guofei	3a59a7a11f	Make assign op support LoDTensorArray and modify while_loop API (#22309 ) This PR makes assign op support LoDTensorArray and enable the loop_vars in while_loop to support tuple or list.	5 years ago
Zhaolong Xing	54a325a52f	[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483 ) * add int8 op teller for trt. * refine trt int8 * add int8 op teller for trt. test=develop	5 years ago
zhongpu	5739eeb9fa	add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile for paddle, test=develop (#22504 )	5 years ago
Wilber	de009152a7	Compile without nccl deps. [2/2] (#22484 ) Compile without nccl deps. [1/2] Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
Yiqun Liu	4b2227e958	Fix dismatch of std::max's arguments type on windows. (#22507 ) test=develop	5 years ago
Wilber	870f465887	fix test_fusion_seqpool_concat lod level between compile and runtime (#22488 )	5 years ago
Zhong Hui	a61d09527b	Fix the integer overflow problem of sequence2batch (#22479 ) Fix the integer overflow problem in the op of sequence2batch, change the int32_t to size_t， In the /paddle/fluid/operators/math/sequence2batch.h#L122.	5 years ago
cc	197913ebe1	Add weight quantization in post_training_quanzitaion (#22445 ) * support weight quantization in post_training_quanzitaion, test=develop * add test for weight quantization, test=develop	5 years ago
Yiqun Liu	dcfb603897	Enable the detection of subgraph composed of grad ops (#21223 ) * Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop	5 years ago
Tao Luo	7c9ce097f1	refine reshape_op shape error message (#22480 ) test=develop	5 years ago
LielinJiang	2b1386b2b2	optimize performance of interpolate op (#22436 ) * optimize interpolate op, test=develop	5 years ago
wangchaochaohu	77dd0d97bb	use enum class to replace the usage of enum in some condition test=develop (#22464 )	5 years ago
Yiqun Liu	44b45b9f07	Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456 ) * Add log in memory::Copy for debug purpose. * Change to use context in DeviceContextPool directly in sequence_pooling_test, instead to new one. * Change to use context in DeviceContextPool directly in sequence_padding_test, instead to new one. test=develop * Change the type of second_dim from size_t to int64_t. test=develop	5 years ago
joanna.wozna.intel	17f2c0899f	Add dequant-scale squash (#22409 ) * Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop	5 years ago
mapingshuo	9c4deedbc2	update readme of imdb training demo (#22455 ) * update readme * test=develop	5 years ago
Zhaolong Xing	ceda0b9b1a	[Fix BUG]: Core when multi thread + clone + paddle-trt (#22442 ) * add mutex for trt engine test=develop * add the test for copy_to_cpu test=develop	5 years ago
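Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) Added a WITH_NCCL option to cmake that explicitly specifies whether to compile the NCCL-related code. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. Single-machine, single-GPU setups can disable NCCL compilation; multi-GPU setups need NCCL enabled (the default). If NCCL is disabled, only a single GPU can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago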
Tao Luo	943cb8c664	fix sigmoid cudnn bug (#22439 ) * Sigmoid bug fix, test=develop * fix code format test=develop Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>	5 years ago
xujiaqi01	d51ffe860a	fix copy table bug (#22432 ) * fix copy table bug of lost some feasign * test=develop	5 years ago
Leo Chen	822e5b36ec	Support int16 for Tensor (#22423 ) * add int16 support, test=develop * add test, test=develop * fix typo, test=develop * fix dtype error in slice, test=develop	5 years ago
石晓伟	e1b0d7cbb1	remove anakin from code, test=develop (#22420 )	5 years ago
liu zhengxi	0404e7a985	Update the precision of pad, pad2d, pad_constant_like's unit tests from fp32 to fp64 (#22394 ) * update the ut precision of pad pad2d pad_constant_like from fp32 to fp64, test=develop	5 years ago
xujiaqi01	371f377bea	add GeneralRoleMaker (#22295 ) * add GeneralRoleMaker which is for general usage * test=develop	5 years ago
Michał Gallus	269db0d1d1	[DNNL] Fix accuracy in INT8 FC (#22404 ) * Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop	5 years ago
joanna.wozna.intel	fb3086fd57	[UT coverage]Remove unnecessary transpose op registration (#22402 )	5 years ago
lidanqing	ade5022681	[UT Coverage]Improve sum_mkldnn_op line coverage (#22275 )	5 years ago
joanna.wozna.intel	3099d9d47c	Restore requantize squash (#22399 )	5 years ago
Wojciech Uss	92462e948d	improve elementwise_add_mkldnn_op test code coverage (#22359 )	5 years ago
ceci3	20f30dd604	add benchmark flag for conv_transpose (#22389 )	5 years ago
Leo Chen	b96c7c9a7a	polish code, test=develop (#22380 ) remove unnecessary template.	5 years ago
Chengmo	8f36c39537	Fix GEO-SGD init & send Bug (#22375 ) * test=develop, fix geo Send & Init	5 years ago
zhupengyang	c6f888e5a5	update unittest accuracy to float64 for relu, prelu, maxout (#22273 )	5 years ago
wangchaochaohu	0d8b222b79	Optimize the depthwise op test=develop (#22265 )	5 years ago
Leo Chen	aaa4fe491a	use function instead of lambda, test=develop (#22348 ) * use function instead of lambda, test=develop * follow comments, test=develop	5 years ago
Adam	e7a9f6bbb7	[Bugfix] Preserve shape in inpalce operators (#22360 )	5 years ago
qingqing01	2d20869c94	Fix infer_shape in compling for elementwise_op (#22291 )	5 years ago
Yiqun Liu	b7cac50b64	Implement a common python unittest to test the ir passes. (#22209 ) * Implement a common python unittest to test the ir passes. test=develop * Save the results in np.array and support to startup on CPU. test=develop * Fix the unittest. test=develop * Add check_program to check whether the optimized program is different from the origin one. test=develop * Remove the inferface all_ops. test=develop * Add exception test in pass_test. test=develop	5 years ago
tangwei12	82bc814a57	integrated HALF_ASYNC to communicator (#21869 ) * add half_async in the communicator * fix DistributedStrategy	5 years ago
wangchaochaohu	1e932eccfa	remove unused code test=develop (#22327 )	5 years ago
Leo Chen	3e5744aa65	Remove unused inputs for some operators (#22284 ) * remove unused inputs, test=develop * remove unused inputs, test=develop * update dtype, test=develop * remove unused inputs, test=develop * update op_use_default_grad_op_maker, tese=develop * resolve conflicts, test=develop * follow comments, test=develop * update center_loss_grad, test=develop	5 years ago
zhangchunle	805328e13b	fix typo in error message (#22312 )	5 years ago
lidanqing	895f8da7d6	change std::cout to log(INFO), vlog (#22316 )	5 years ago
石晓伟	8cb04664b9	revert paddle_fluid.map, test=develop (#22236 )	5 years ago
Chen Weihang	35efbe6d95	Speeding up dygraph DataLoader with multiprocessing (#21762 ) * add multiprocess for dygraph data loader, test=develop * polish code & add safe gurad, test=develop * refactor dygraph dataloader & add signal handler, test=develop * fix member initializer compile error on ci, test=develop * fix member initializer compile error one more, test=develop * remove useless config, test=develop * skip windows incompatible problem, test=develop * add unittest for coverage, test=coverage * add more exception unittest case, test=develop * deal with signal handler coverage, test=develop * polish code & add signal handler tests, test=develop * deal with coverage ci problem, test=develop * split data loader test & coverage ci fix, test=develop * remove test_imperative_data_loader_with_exception, test=develop * remove singal process except test case, test=develop * add exception tests again & remove sample list test, test=develop * split normal and exception unittests to diff class, test=develop * polish doc for use_multiprocess effect in static mode, test=develop	5 years ago
Zeng Jinle	9435533adf	remove op_use_default_grad_op_maker.spec, test=develop, test=document_fix (#22300 )	5 years ago
wangchaochaohu	7b76a76495	fix the conda build confilict test=develop (#22279 )	5 years ago
Zeng Jinle	5e601a92ad	polish grad op check (#22290 ) * polish grad op check, test=develop, test=document_fix * keep op_use_default_grad_maker.spec to avoid conflict, test=develop, test=document_fix	5 years ago
Bai Yifan	faba4b116a	Remove disable flag in test_fsp_op.py (#22171 ) * fix fsp_op, test=develop * fix fsp grad op maker, test=develop * update op_use_default_grad_op_maker.spec, test=develop	5 years ago
Zhen Wang	e40cfb1010	fix the bug of assert_is_op_output. test=develop (#22262 )	5 years ago
Wojciech Uss	d3a6647372	improve placement pass tests code coverage (#22197 )	5 years ago
liu zhengxi	07afc29e90	Make api.cc malloc consistent with paddle_api.h for PaddleBuf (#22255 )	5 years ago
silingtong123	4f1da4adcb	remove the useless third_party library from C++ inference library (#22021 ) * remove the useless third_party library from C++ inference library * revert removing the install directory	5 years ago
zhouwei25	549e6de7ac	faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 (#22164 )	5 years ago

16827 Commits (289edf3962f039394452bfccafcd70ce3c3dde0f)