Paddle

Commit Graph

Author	SHA1	Message	Date
GaoWei8	d4dda8628e	optimize fc jit (#21878 ) test=develop	5 years ago
GaoWei8	5af0c7ba89	Modify padding strategy: remove weight copy in fc padding (#21650 ) test=develop	5 years ago
Tao Luo	01fa4ead61	fix -Wno-error=sign-compare warning in gcc8 (#21434 ) * fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop	5 years ago
Tao Luo	c0656dcb1a	remove -Wno-error=sign-compare, make warning as error (#21358 ) * remove -Wno-error=sign-compare, make warning as error test=develop test=document_fix * fix exist compile warning test=develop	5 years ago
GaoWei8	8493f20ebc	Polish the codes of fc when needs padding (#21378 ) test=develop	5 years ago
GaoWei8	234060f88f	Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972 ) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop	5 years ago
Liufang Sang	f0b1518438	add dequantize_abs_max op and modify lookup_table op (#20899 ) * add int8 kernel to lookup_table op and add dequantize op test=develop * change paddle_enforce to paddle_enforce_eq test=develop * change copyright and change some not suitable code test=develop * remove debug log test=develop * replace GetInputType with IndicateVarDataType test=develop * fix EmptyGradMaker test=develop * fix diff between cpu and gpu test=develop * use memcopy when int8_t test=develop	5 years ago
whs	cfdd1fc2cd	Fix warpctc in padding mode. (#21033 )	5 years ago
lilong12	e249d9a3e2	fix the computation for dx (grad for x) for prelu operation. (#20949 ) * set the default value of alpha for prelu to 0.25, test=develop * add the call to __syncthreads(), test=develop * fix the implementation of cpu prelu, test=develop * repair the implementation of element mode prelu, test=develop * modify test_prelu_op.py, test=develop	5 years ago
Chen Weihang	2f27b10331	Add dependency for error_codes.proto (#21084 ) * fix activation_functions deps, test=develop, test=document_fix * add error_codes_proto deps, test=develop, test=document_fix * try delete enforce.h, test=develop, test=document_fix	5 years ago
zhaoyuchen2018	0059404e77	Fix ce ocr_recognition test fails (#20987 ) ocr_recognition fails, so add a path to handle small frame_size. test=develop	5 years ago
Tao Luo	25ffa8445d	refine murmurhash3_x64_128 for bloom_filter (#20996 ) test=develop	5 years ago
zhaoyuchen2018	7f3a445e9a	Fix gru as small frame_size has error. (#20922 ) seems shuffle_sync cannot handle small size test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Zhang Ting	8d1e9f0f7e	maxout supports channel_last input (#20846 ) * maxout support channel_last input, test=develop * modified details of Input(X) and Attr(groups, axis) in doc, test=develop	5 years ago
Zhang Ting	c18f1bd716	fix the bug of conv_transpose:compatible with Anylayout setting, test=develop (#20897 )	5 years ago
zhang wenhui	d428912503	fix select_rows mergeadd bug, test=develop (#20876 )	5 years ago
Aurelius84	aacd16dbb4	add pyramid_hash_op (#20698 )	5 years ago
Pei Yang	e89c16b90d	Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733 ) * fix pool2d trt converter, test=develop * add fix for split op converter, test=develop	5 years ago
qingqing01	01eddc1a04	Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636 ) * Support fp16 in fused_elemwise_activation_op. * Fix unit testing in ONLY-CPU mode.	5 years ago
Zhang Ting	78910480c1	fix conv_transpose's bug: compatible with Anylayout setting, test=develop (#20589 )	5 years ago
liym27	ad60b3b8ac	mv two function in conv op for good code style (#20116 ) * Delete PadFuntion, include padding.h instead. test=develop * move function(IsSymmetricPadding) from conv_cudnn_op.cu/conv_transpose_cudnn_op.cu to padding.h, test=develop	5 years ago
Zhang Ting	cf6919bf6e	conv_transpose supports channel_last input, test=develop, test=document_preview (#20072 )	5 years ago
danleifeng	425279a57b	Improve elementwise operators performance in same dimensions. (#19763 ) Improve elementwise operators performance in same dimensions	5 years ago
liym27	3aa331d97e	fix conv2d and conv3d: (#20042 ) 1.support asymmetric padding; 2.support padding algorithm:"SAME" and "VALID"; 3.support channel_last: data_format NHWC and NDHWC; 4.change doc of python API and c++; test=develop, test=document_preview	5 years ago
liym27	24010472d4	fix pool2d pool3d,support asymmetric padding and channel_last (#19739 ) * fix pool2d pool3d: 1. support asymmetric padding; 2. support padding algorithm:"SAME" and "VALID"; 3. support channel_last: data_format NHWC and NDHWC; 4. support inferring shape when input with negative dims in compile time; 5. change doc of python API and c++; 6. fix bug in cuda kernel when Attr(adaptive) is true. test=develop,test=document_preview * fix 'tensors' to 'Tensors'. test=develop,test=document_preview * add test for converage ValueError.test=develop,test=document_preview * resolve conflict in test_pool2d. test=develop	5 years ago
chengduo	fb2a9cdf83	Add fp16 support for pad and split (#19881 ) * make pad and split support fp16 test=develop	5 years ago
Bob Zhu	c670058a8d	add support of matmul with multiple head even different width and height (#19708 ) * add support of matmul with multiple head even different width and height Original matmul with multiple head supports only the mat_a.width == mat_b.height, in that case, mat_b will be horizontally split. In this patch, we extend the support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height, in this case, mat_b will be vertically split. One example is A is [3, 8], B is [2, 16], head_number is 4. In this case, A will be split as [3, 2], B will be (vertically) split as [2, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16] test=develop * refactor the code of matmul with multiple head even different width and height test=develop	5 years ago
Kaipeng Deng	3f021781a1	fix softmax CE time limit check failed (#19846 ) * fix softmax ce time limit check failed. test=develop * refine softmax calc. test=develop	5 years ago
Aurelius84	fcf53e55ff	support 2-level lod of input in sequence_pool (#19839 ) * support 2-level lod of input in sequence_pool test=develop * fix lod level bug in .cu test=develop	6 years ago
Kaipeng Deng	99c78b772a	fix softmax axis!=-1. test=develop (#19800 )	6 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
123malin	2f037c3189	fix the diff between async mode and async_half mode (#19535 ) * test=develop, communicator merge add => merge average	6 years ago
Tao Luo	3ae939e48a	unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631 ) * remove assert.h * change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop * fix tensorrt paddle_enforce test=develop	6 years ago
Tao Luo	d6c85c96dc	paddle::framework::vectorize() templatization (#19627 ) test=develop	6 years ago
Tao Luo	0a46d34538	refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607 ) test=develop	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Tao Luo	49523ea189	replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586 ) * remove unused PADDLE_ASSERT(_IS_NOT_ERROR) * replace PADDLE_ASSERT with PADDLE_ASSERT_MSG test=develop	6 years ago
zhouwei25	84c728013c	fix the compilation issue on windows caused by mkl_CSRMM (#19533 )	6 years ago
Zeng Jinle	11f2f78458	fix softmax seg fault in AVX, test=develop (#19487 )	6 years ago
Yihua Xu	b920395842	Use sparse matrix to implement fused emb_seq_pool operator (#19064 ) * Implement the operator with sparse matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implementation when using no-linux. test=develop * Ignore the deprecated status for windows test=develop	6 years ago
silingtong123	af0fbd9012	change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205 ) * print error code if cuda related API fails	6 years ago
LielinJiang	22fa4c2d24	Fix depthwise conv gpu kernel bug (#18582 ) * fix depthwise conv gpu kernel bug, test=develop * add more depthwise conv test, test=develop	6 years ago
Bob Zhu	220eef602e	Extend Matmul to support matrix multiplication with multiple heads (#18570 ) * extend matmul op to support multiple head multiplication With the support of multiple head, the multiplication of two big matrices is split into multiplication of several (head_number) small matrices. e.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number as 4, Mat A will be split as 4 matrices of [3, 6] and Mat B will be 4 matrices of [6, 4]. The result of final matrix will be 4 matrices of [3, 4], i.e. [3, 16].	6 years ago
Zeng Jinle	f5641000bb	Add a unittest to inplace elementwise_add (#18385 ) * add_elementwise_add_inplace_test,test=develop * rename file, test=develop	6 years ago
Hongyu Liu	df2eee71d8	Sequence mask support tensor (#18249 ) * sequence mask support max length tensor input; test=develop * add rnn_impl.py; test=develop * add basic gru lstm unittest; test=develop * fix api spec; test=develop * fix sequence_mask op bug; test=develop test=document_preview * change +-x to elmentwise_op; test=develop add mkl flag; test=develop * fix rnn impl bug; test=develop * update api spec; test=develop * fix doc bug; test=develop * fix lstm bugs; test=develop	6 years ago
Yiqun Liu	660c1a65f3	Optimize fused_elewise_activation_grad op. (#18041 ) test=develop	6 years ago
Yiqun Liu	7e463c84a6	Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979 ) test=develop	6 years ago
Yibing Liu	33d1e56506	Enable seq_pool op to accept len 0 input (#17284 ) * Enable seq_pool op to accept len 0 input test=develop * Update sequence_pool's api test=develop * Add more unittest cases for seq_pool op test=develop * Remove legacy comments test=develop * Don't use template in op maker test=develop	6 years ago
Yiqun Liu	8fd39f3e99	Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236 ) * Enhance fused_elementwise_activation op. test=develop * Move the api fused_elementwise_activation to contrib. test=develop * Add including files. test=develop * Add the support of sigmoid in fused_elementwise_activetion op. * Update API.spec. test=develop	6 years ago

702 Commits (310edc0d0c9050f1f01c108655493c1935c00214)