Paddle

Commit Graph

Author	SHA1	Message	Date
Qiao Longfei	faae1b4170	fix cpplint test=develop	7 years ago
Qiao Longfei	0a8ff2ecd4	add cpu_merge_add_multi_noduplicated_test test=develop	7 years ago
Qiao Longfei	920a960974	optimize merge add if input rows of all selected rows is not duplicated	7 years ago
Qiao Longfei	baf02328b2	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	7 years ago
Kaipeng Deng	54474637ae	Merge pull request #16057 from heavengate/softmax_axis Add attr 'axis' for softmax	7 years ago
Qiao Longfei	30618409db	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator	7 years ago
dengkaipeng	90bd038d35	fix format. test=develop	7 years ago
phlrain	1580be5d6c	fix sequence pad; test=develop	7 years ago
dengkaipeng	93701dba50	add jit kernel for softmax axis. test=develop	7 years ago
dengkaipeng	6c64182709	refine softmax kernel. test=develop	7 years ago
phlrain	802b33489a	remove resize then seq num == 1; test=develop	7 years ago
sneaxiy	5a92e4c097	revert revert 16144 test=develop	7 years ago
Zeng Jinle	a91964c8fe	Revert "PaddingRNN model memory optimize" test=develop	7 years ago
Zeng Jinle	0b49e43d3a	Merge pull request #16144 from sneaxiy/rnn_mem_opt PaddingRNN model memory optimize	7 years ago
sneaxiy	b26e9bd232	refine code test=develop	7 years ago
tensor-tang	6ff230a624	Merge remote-tracking branch 'ups/develop' into refine/jit	7 years ago
tensor-tang	14a764c930	simplify the jitkernel templates and tests test=develop	7 years ago
Yiqun Liu	5bde120243	Make parent_idx a dispensable output for beam_search op to support models saved by older paddle version. (#16106 ) test=develop	7 years ago
tensor-tang	802f362ac4	unify the kernelfuncs cache and add unit test test=develop	7 years ago
Qiao Longfei	fab1b54d99	Merge branch 'add-communicator' of ssh://github.com/jacquesqiao/Paddle into add-async-ssa-graph-executor-communicator	7 years ago
Qiao Longfei	3691a46fa3	improve communicator	7 years ago
Yiqun Liu	87248281f7	Fix error in CUDA kernel of beam_search. (#15957 ) test=develop	7 years ago
Yihua Xu	7396788694	Optimize gelu operation with mkl erf. test=develop	7 years ago
xuezhong	1dad36f6aa	Merge pull request #15609 from xuezhong/add_sample_logits_op add sample_logits and sampled_softmax_with_cross_entropy op	7 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	7 years ago
xuezhong	81870723c6	Merge pull request #15605 from xuezhong/fix_bug_for_lstmp Fix bug for lstmp	7 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop	7 years ago
xuezhong	f2262d7336	update comment test=develop	7 years ago
xuezhong	fb261793b9	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_sample_logits_op test=develop	7 years ago
xuezhong	fb9a6a2bc6	pass test for lstm op test=develop	7 years ago
xuezhong	2ba256df40	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_bug_for_lstmp	7 years ago
peizhilin	061299be87	fix dependency test=develop	7 years ago
xuezhong	4028943125	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_bug_for_lstmp	7 years ago
tensor-tang	a6a1a92ef7	Merge pull request #15586 from tensor-tang/jit/cache refine bert	7 years ago
xuezhong	4c98c2ccc3	remove debug print	7 years ago
xuezhong	58ad40cc15	add sample_logits op	7 years ago
xuezhong	880836329d	add cell clip and proj clip, fix bug for h0	7 years ago
Yiqun Liu	16d54f7f23	Return parent_idx in beam_search op (#15520 ) * Refine beam_search_op to output an extra parent_idx tensor. test=develop * Fix the unittest test_beam_search_op. test=develop * Fix the merging mistake. test=develop	7 years ago
tensor-tang	a18c0d4242	cache fc kernel test=develop	7 years ago
tensor-tang	6e1ee7fb57	cache softmax kernel func test=develop	7 years ago
tensor-tang	d59f733551	refine softmax and use with cache test=develop	7 years ago
Yiqun Liu	3008fa1261	Add the CUDA kernel for beam_search op (#15020 ) * Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the seletced items of beam_search_op's CPU kernel sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop	7 years ago
tangwei12	5cfc40dea8	nce add check sample lables, test=develop (#15463 ) * nce add check sample lables, test=develop	7 years ago
Dun	9f8f0fc2d3	Memory optimization of depthwise conv op and group norm op (#15313 ) * mem opt * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * refine code test=develop * refine code test=develop * refine code test=develop * refine code test=develop * refine with cub test=develop * fix mkldnn test && remove comments && test=develop * polish code && test=develop * add only_forward test && test=develop	7 years ago
zhaozhehao	e2ba9668b4	Tree conv op (#15217 ) * refactor tree2col operator with new memory mechanism test=develop * test=develop * test=develop * Modified API according to panyx0718 test=develop * fix API change according to heavengate test=develop * Modify API comment test=develop	7 years ago
Qiao Longfei	4d15515c40	fix gru_gpu_kernel test=develop	7 years ago
Qiao Longfei	4feae25378	fix build problem test=develop	7 years ago
Qiao Longfei	4c7be265d3	update avx gru grad kernel test=develop	7 years ago
Qiao Longfei	9b16e54064	update gru_grad_op test=develop	7 years ago
Qiao Longfei	e477d789a1	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into gru-add-mode	7 years ago

642 Commits (0508c9869cc58a89d48448740928f649cc97d2be)