Commit Graph

673 Commits (b34933d9ee3b61dbbd642fd02f244c36d0d14550)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Kaipeng Deng | 99c78b772a | fix softmax axis!=-1. test=develop (#19800) | 6 years ago |
| Huihuang Zheng | 12542320c5 | Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) | 6 years ago |
| Yiqun Liu | a65c728e5d | Implement the GPU kernel of fc operator (#19687) | 6 years ago |
| 123malin | 2f037c3189 | fix the diff between async mode and async_half mode (#19535) | 6 years ago |
| Tao Luo | 3ae939e48a | unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) | 6 years ago |
| Tao Luo | d6c85c96dc | paddle::framework::vectorize() templatization (#19627) | 6 years ago |
| Tao Luo | 0a46d34538 | refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) | 6 years ago |
| Tao Luo | 75d1571995 | refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) | 6 years ago |
| Tao Luo | 49523ea189 | replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586) | 6 years ago |
| zhouwei25 | 84c728013c | fix the compilation issue on windows caused by mkl_CSRMM (#19533) | 6 years ago |
| Zeng Jinle | 11f2f78458 | fix sofmax seg fault in AVX, test=develop (#19487) | 6 years ago |
| Yihua Xu | b920395842 | Use sparse matrix to implement fused emb_seq_pool operator (#19064) | 6 years ago |
| silingtong123 | af0fbd9012 | change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205) | 6 years ago |
| LielinJiang | 22fa4c2d24 | Fix depthwise conv gpu kernel bug (#18582) | 6 years ago |
| Bob Zhu | 220eef602e | Extend Matmul to support matrix multiplication with multiple heads (#18570) | 6 years ago |
| Zeng Jinle | f5641000bb | Add a unittest to inplace elementwise_add (#18385) | 6 years ago |
| Hongyu Liu | df2eee71d8 | Sequence mask support tensor (#18249) | 6 years ago |
| Yiqun Liu | 660c1a65f3 | Optimize fused_elewise_activation_grad op. (#18041) | 6 years ago |
| Yiqun Liu | 7e463c84a6 | Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979) | 6 years ago |
| Yibing Liu | 33d1e56506 | Enable seq_pool op to accept len 0 input (#17284) | 6 years ago |
| Yiqun Liu | 8fd39f3e99 | Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) | 6 years ago |
| Yiqun Liu | 5782dddad0 | Optimize the concat and split kernel for specical cases when the number of inputs/outputs is 2 (#17415) | 6 years ago |
| tensor-tang | 7ae461eb13 | [CPU] refine cpu softmax bwd (#17534) | 7 years ago |
| tensor-tang | 0600b370ea | [CPU] refine softmax op fwd on CPU (#17522) | 7 years ago |
| liuwei1031 | ba70cc499e | fix security bugs : (#17464) | 7 years ago |
| zhaoyuchen2018 | b02f2aff04 | Add conditional compile for gru opt (#17368) | 7 years ago |
| Krzysztof Binias | 0823a7bc8b | Optimize the sequence padding op (#17403) | 7 years ago |
| zhaoyuchen2018 | 8a2caacdbc | improve gru unit performance. (#16338) | 7 years ago |
| Kaipeng Deng | a71d8fdb87 | Softmax_cross_entropy op add axis (#16806) | 7 years ago |
| Yibing Liu | 3c375751f8 | Support seq len equal to 0 in sequence ops (#16935) | 7 years ago |
| Kevin | c474e7ddf5 | fix overflow by int32 mul test=develop (#16794) | 7 years ago |
| Qiao Longfei | faae1b4170 | fix cpplint test=develop | 7 years ago |
| Qiao Longfei | 0a8ff2ecd4 | add cpu_merge_add_multi_noduplicated_test test=develop | 7 years ago |
| Qiao Longfei | 920a960974 | optimize merge add if input rows of all selected rows is not duplicated | 7 years ago |
| Qiao Longfei | baf02328b2 | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator | 7 years ago |
| Kaipeng Deng | 54474637ae | Merge pull request #16057 from heavengate/softmax_axis | 7 years ago |
| Qiao Longfei | 30618409db | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator | 7 years ago |
| dengkaipeng | 90bd038d35 | fix format. test=develop | 7 years ago |
| phlrain | 1580be5d6c | fix sequence pad; test=develop | 7 years ago |
| dengkaipeng | 93701dba50 | add jit kernel for softmax axis. test=develop | 7 years ago |
| dengkaipeng | 6c64182709 | refine softmax kernel. test=develop | 7 years ago |
| phlrain | 802b33489a | remove resize then seq num == 1; test=develop | 7 years ago |
| sneaxiy | 5a92e4c097 | revert revert 16144 | 7 years ago |
| Zeng Jinle | a91964c8fe | Revert "PaddingRNN model memory optimize" | 7 years ago |
| Zeng Jinle | 0b49e43d3a | Merge pull request #16144 from sneaxiy/rnn_mem_opt | 7 years ago |
| sneaxiy | b26e9bd232 | refine code | 7 years ago |
| tensor-tang | 6ff230a624 | Merge remote-tracking branch 'ups/develop' into refine/jit | 7 years ago |
| tensor-tang | 14a764c930 | simplify the jitkernel templates and tests | 7 years ago |
| Yiqun Liu | 5bde120243 | Make parent_idx a dispensable output for beam_search op to support models saved by older paddle version. (#16106) | 7 years ago |
| tensor-tang | 802f362ac4 | unify the kernelfuncs cache and add unit test | 7 years ago |