Paddle

Commit Graph

Author	SHA1	Message	Date
tensor-tang	cf8c8e72bd	add vtanh and unit test	7 years ago
tensor-tang	b37fe30417	Merge pull request #13690 from wangguibao/fix_cpu_lstm_compute_cc Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so	7 years ago
dzhwinter	26771f41ba	"fix compile error" (#13579 ) * "fix compile error" * "fix ci" * rerun ci test=develop * test=develop rerun ci	7 years ago
tensor-tang	d10a9df7b8	add vaddbias and unit test	7 years ago
tensor-tang	3c8b651187	add vsigmoid avx implementations and unit test	7 years ago
tensor-tang	55e44761fb	refine code and init vsigmoid	7 years ago
wangguibao	1940bc2d83	Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so test=develop	7 years ago
sneaxiy	584c3f048f	fix sparse rmsprop	7 years ago
Dun	161c3e31f7	Optimization of Kernels that related to DeepLabv3+ (#13534 ) * refine reduce by cub * optimize KernelDepthwiseConvFilterGrad * optimize depthwise conv and reduce mean and reduce sum * fix bug: dilation * cuda arch and cuda 8 compatible	7 years ago
tensor-tang	2d0ff6a3c2	add vexp and unit test	7 years ago
tensor-tang	b3c63f40fa	add vscal and unit test	7 years ago
tensor-tang	0987f2b4d9	add vadd unit test	7 years ago
tensor-tang	3d928d4f9d	refine and seepdup	7 years ago
tensor-tang	77fc42d2d1	Merge remote-tracking branch 'ups/develop' into fea/jitkernel	7 years ago
tensor-tang	2937314d8e	refine vmul and test	7 years ago
tensor-tang	6c986e127a	fix macro and add vmul unit test	7 years ago
Yu Yang	0be1582df0	Merge pull request #13525 from reyoung/fix_mixed_vector Fix mixed vector	7 years ago
tensor-tang	8c69764d12	add vmul unit tests	7 years ago
tensor-tang	084893a9a9	add vadd kernel	7 years ago
tensor-tang	eeff268a6c	clean and refine kernels	7 years ago
tensor-tang	dee5d35c20	refine vmul	7 years ago
tensor-tang	92031968d7	init vmul kernel	7 years ago
tensor-tang	b9acbcc8c5	init lstm kernel	7 years ago
tensor-tang	c260bf942d	init jit kernel	7 years ago
Yu Yang	3043f51b3a	Merge pull request #13511 from reyoung/fix_ce Revert "Merge pull request #13431 from chengduoZH/refine_lod"	7 years ago
Yu Yang	f7af695801	Merge pull request #13505 from reyoung/fix_selected_rows_functor_test Fix unstable selected_rows_functor_test.cu	7 years ago
Yu Yang	6d2c6f96f1	Revert "Revert "Merge pull request #13431 from chengduoZH/refine_lod"" This reverts commit `a6c8d6b9a2`.	7 years ago
Yu Yang	a6c8d6b9a2	Revert "Merge pull request #13431 from chengduoZH/refine_lod" This reverts commit `bd79e04667`, reversing changes made to `6b4d290c18`.	7 years ago
Zeng Jinle	7f1e312677	Merge pull request #13456 from sneaxiy/refine_sparse_adam Fix sparse Adam and Gradient clip of SelectedRows	7 years ago
Yu Yang	b5996fa124	Fix unstable selected_rows_functor_test.cu	7 years ago
sneaxiy	a29b4227eb	fix sparse gradient clip	7 years ago
Yihua Xu	87086b1386	Refine activation for GRU operator (#13275 ) * Optimize GRU with AVX instruction * Clean code * Add the Unitest and fix the align issue * Remove the remanent part of the unitest part * Code clean * Fix the parameters length issue for fusion_gru to pass CI * Change the default type as float32	7 years ago
chengduo	d402234ba8	Feature/op_fuse_pass (#12440 ) * Add Preface * Add demo code * Save file * Refine code * seems can work * use elementwise strategy * Use ElementwiseComputeEx * Add comments * extract functions from operator * Refine code * Follow comment * code refine * add op_fuse pass * add backward * code refine * use TopologySortOperations * follow comments * refine IsFusible * code enhance * fix op_fusion_pass * refine code * refine fuse_elemwise_act_op * adjust the input and output * refine logic * add intermediate_edge * disable inplace * follow comments * refine logic * follow comments * Remove the removable IntermediateOut * change strategy * code refine * enable fuse backward * code refine * code refine * rename unit test * follow comments	7 years ago
Yu Yang	2c31ea9293	Merge pull request #13424 from chengduoZH/refine_seq_concat Refine seq_concat	7 years ago
Yu Yang	5996e224fa	Merge pull request #13430 from chengduoZH/refine_seq_pool Refine seq_pool	7 years ago
sneaxiy	b6f61faf13	fix adam	7 years ago
chengduoZH	6534f8527a	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_lod	7 years ago
chengduoZH	24459501fe	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_concat	7 years ago
chengduoZH	f92b07f0b5	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_pool	7 years ago
gongweibao	0c8c0d943f	fix macunittest (#13434 )	7 years ago
chengduoZH	cdb9605bad	refine	7 years ago
chengduoZH	cacf549e8a	refine seq_pool	7 years ago
chengduoZH	e7940141ce	refine seq_concat	7 years ago
tensor-tang	7c8730824a	Merge pull request #13396 from tensor-tang/refine/op/lstm Refine/op/lstm	7 years ago
Tao Luo	40c54db301	Merge pull request #13338 from bingyanghuang/bingyang/seq_pool_memcpy Use memcpy to rewrite the sequence pooling LAST and FIRST mode	7 years ago
tensor-tang	e09cf031a8	refine src and header	7 years ago
bingyanghuang	76553c5a6d	fix travis-ci	7 years ago
tensor-tang	bc9971dd6c	fix deps	7 years ago
tensor-tang	ff858d35ed	fix bug and enable on batch mode as well	7 years ago
tensor-tang	8dea07f209	fix comopile	7 years ago
tensor-tang	612ba41aee	add simple lstm compute	7 years ago
bingyanghuang	83394bab3e	modified by luotao's suggestion	7 years ago
Bai Yifan	faf8ad2436	Add ignore_index in cross_entropy op (#13217 ) * add ignore index * update api.spec * enhance softmax_with_cross_entropy	7 years ago
bingyanghuang	1454cd54aa	pre-commit check	7 years ago
bingyanghuang	7429067ab3	clean code	7 years ago
bingyanghuang	cdbc5e7353	Add some comments	7 years ago
bingyanghuang	53185fde11	Rewrite sequence pooling last and first mode with memcpy and clean code	7 years ago
dzhwinter	379b471ee2	squash commit	7 years ago
dzhwinter	f05520060e	fix style (#13142 )	7 years ago
tensor-tang	f38905a6e5	Merge remote-tracking branch 'ups/develop' into optimize/op/fusion_gru	7 years ago
dzhwinter	34757efb8e	fix windows compile	7 years ago
dzhwinter	dbe90cc0f6	merge develop branch	7 years ago
dzhwinter	ab1097cd8e	Feature/template (#13093 ) * remove template operator * "fix compile" * "fix ci" * "fix ci"	7 years ago
tensor-tang	7bdd11d88e	Merge branch 'develop' into optimize/op/fusion_gru	7 years ago
tensor-tang	b0d36c4c3d	add cross vec to speedup gru	7 years ago
chengduo	3bd1d22a7d	Enhance fused_elementwise_activation_op (#12837 ) * Enhance the function of fused_elementwise_activation_op * enhance unit test * Clean Code And Add Doc * Add compound functors * Fix doc and enhance unit test * define Dx and Dy for d_binary_func * add mul_scale * add mul_scale * add elementwise_mul * code refine * code refine * add doc * add AsIntermediate	7 years ago
tensor-tang	2d0ddf8c41	refine cpu gru batch mode	7 years ago
tensor-tang	70d3981220	add cpu vec bias sub	7 years ago
tensor-tang	d941192e74	fix gcc53 on cpu vec (#13020 )	7 years ago
tensor-tang	2328a69157	Merge pull request #13012 from tensor-tang/refine/seq2batch refine seq2batch	7 years ago
tensor-tang	fd4f7c3ab5	refine seq2batch	7 years ago
fengjiayi	7e0c9f50ae	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op	7 years ago
fengjiayi	9cb455fa7d	update function	7 years ago
Zeng Jinle	ef7bd03a03	Merge pull request #12964 from sneaxiy/fix_concat_sync Fix concat bug	7 years ago
qingqing01	1f09bc320c	Support data type int8_t . (#12841 ) * Support int8 type.	7 years ago
dzhwinter	cd8f3e9ed0	operator module is done	7 years ago
chengduo	3e1050a2e8	Add pad_constant_like_op (#12943 ) * Add pad_constant_batch_size_like * refine pad_op * optimize memory	7 years ago
dzhwinter	6cc7870517	fix concat synchronization bug	7 years ago
dzhwinter	2ec589a24e	float.h fixed	7 years ago
dzhwinter	7dceb8a080	check some operators	7 years ago
dzhwinter	26dbe35c54	add msvc flags and copy lib done	7 years ago
Qiao Longfei	3c58b87b45	fix auc layer and add check for auc op (#12954 ) * fix auc layer and add check for auc op * use input to check if states are inited * optimize code	7 years ago
dzhwinter	d7f98f37a7	more platform is done	7 years ago
dzhwinter	eca4563e5d	operators module (#12938 )	7 years ago
dzhwinter	a94d4f51a8	fix math_function compile	7 years ago
tensor-tang	7bdaf09664	Merge remote-tracking branch 'ups/develop' into refine/jit	7 years ago
tensor-tang	3462c29940	refine add bias with avx	7 years ago
dzhwinter	c1ad52f768	pre-commit	7 years ago
dzhwinter	89f95ea25e	merge develop branch	7 years ago
tensor-tang	bb9f98e10d	add inplace test	7 years ago
tensor-tang	f269614bcd	further optimize tanh with avx and mkl	7 years ago
luotao1	2b4edacca0	enhance the forward of concat op	7 years ago
dzhwinter	34f8c9b6f5	windows port	7 years ago
tensor-tang	7a4924cd44	further optimize sigmoid with avx and avx512	7 years ago
tensor-tang	6bd89ba5b6	fix typo	7 years ago
tensor-tang	e3bb98eb38	optimize relu with avx and avx512	7 years ago
tensor-tang	25976fe736	optimize the sigmoid and tanh	7 years ago
tensor-tang	2eb46c2b06	add cpu vec test	7 years ago
tensor-tang	f0f06992c1	Merge pull request #12878 from tensor-tang/feature/op/attention_lstm Add attention lstm cpu forward	7 years ago
fengjiayi	f4a4a4cbd9	add op comment and python layer	7 years ago
tensor-tang	5ca0bb9aad	support more activation type and remove some comments	7 years ago
tensor-tang	ec59f0d454	add cpu vec	7 years ago
tensor-tang	cf5ea925c3	fix bugs	7 years ago
tensor-tang	3dd66390b2	add blas vexp	7 years ago
tensor-tang	0ec1f65cf1	fix blas dot and add cblas scal	7 years ago
tensor-tang	a2203d0466	add cblas dot	7 years ago
tensor-tang	f72ab8961e	refine blas gemm	7 years ago
Yu Yang	3768677980	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/process_lod_grad	7 years ago
Yu Yang	2a36ad1a96	Handle LoD for concat & seq_softmax ops	7 years ago
fengjiayi	ce182d9037	bug fix	7 years ago
Tao Luo	d04ef276a5	Merge pull request #12745 from tensor-tang/refine/op/elewise_mul Refine elementwise mul cpu forward	7 years ago
fengjiayi	34b209cffa	Complete sequence_padding GPU kernel	7 years ago
tensor-tang	b090479409	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
fengjiayi	8d8d48a34f	Complete sequence_pad_op and its CPU kernel. Add unittests	7 years ago
dzhwinter	4069262f0e	Revert ""cherry picked operators changes" (#12184 )" (#12747 ) This reverts commit `bf3c34960f`.	7 years ago
fengjiayi	3c749fae43	update CPU sequence_padding functor	7 years ago
tensor-tang	92890ac258	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
tensor-tang	6644ce79a5	add mklml vmul	7 years ago
tensor-tang	ff92b6ba81	Merge pull request #12531 from tensor-tang/refine/op/gru Refine gru cpu forward	7 years ago
tensor-tang	a72f68f223	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
tensor-tang	f3cd2612ae	refine fc and use the fc compute in fusion_lstm	7 years ago
dzhwinter	bf3c34960f	"cherry picked operators changes" (#12184 ) * "cherry picked operators changes" * "remove duplicated code" * "add constant setter" * "add get expected kernel" * "fix ci" * "add fill constant"	7 years ago
fengjiayi	a38a8db928	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op	7 years ago
tensor-tang	3bf3e77ac8	Merge remote-tracking branch 'ups/develop' into refine/op/gru	7 years ago
chengduo	7c8b69c700	Feature/op fusion (#12240 ) * Add Preface * Add demo code * Save file * Refine code * seems can work * use elementwise strategy * Use ElementwiseComputeEx * Add comments * extract functions from operator * Refine code * Follow comment * code refine * follow comments * follow comments	7 years ago
tensor-tang	54c95e49f0	fix blas	7 years ago
tensor-tang	8c23f7c4f0	fix blas and use packed weight	7 years ago
tensor-tang	43cee33a23	add mkl packed gemm	7 years ago
tensor-tang	d8d2dbcfac	further optimize im2col using variables	7 years ago
tensor-tang	687a322267	Merge remote-tracking branch 'ups/develop' into refine/im2col	7 years ago
tensor-tang	65d418f060	complete im2col with padding==1 and speedup filter width==1	7 years ago
tensor-tang	52eb86e30f	refine im2col benchmark	7 years ago
tensor-tang	3017f46076	add more test cases	7 years ago
tensor-tang	8d6be4fb5f	refine im2col test and add benchmark	7 years ago
tensor-tang	507c143047	im2col cfo cpu code clean	7 years ago
tensor-tang	4eeed0b5e4	refine width padding and enable core copy	7 years ago
Wu Yi	73fcfc06ec	refine conv cudnn enforce (#12353 ) * refine conv cudnn enforce * update * update all cudnn ops * fix	7 years ago
tensor-tang	e3131e2d73	enable width padding	7 years ago
tensor-tang	92518c519f	reuse sizes saving time	7 years ago
tensor-tang	660df122ce	enable padding!=0 and fill height padding with 0	7 years ago
tensor-tang	d8e00facf7	reuse im_size	7 years ago
tensor-tang	b72befc5cc	reuse copy size	7 years ago
tensor-tang	6788af4bf1	refine test cases	7 years ago
tensor-tang	b163e601b6	add gtest	7 years ago
tensor-tang	aae994fd26	refine im2col no padding	7 years ago
Yan Chunwei	02cf54d331	bugfix lod cpu performance (#12297 )	7 years ago
tensor-tang	fc2b578842	add gemm_warp test	7 years ago
tensor-tang	a916c52579	refine gemm	7 years ago
tensor-tang	961e754c9f	mkl split gemm for better perf	7 years ago
tensor-tang	f0cd493c0d	Merge pull request #11989 from tensor-tang/feature/libxsmm introduce libxsmm	7 years ago
Guo Sheng	da3f766821	Merge pull request #12088 from guoshengCS/complete-hsigmoid Complete hsigmoid_op	7 years ago
guosheng	4ee069fdba	Fix the HierarchicalSigmoidGradOpKernel and refine the codes. Now hsigmoid_op is same with V2 implementation and can pass gradient check.	7 years ago
tensor-tang	1c5d6c5692	disable xsmm with float16	7 years ago
tensor-tang	c9ba51ead8	Merge remote-tracking branch 'ups/develop' into feature/libxsmm	7 years ago
tensor-tang	64a8e6d20e	refine the threshold functions	7 years ago
lemon34	29145e1e31	change im2sequence for ctc batch inference (#11696 ) * change im2sequence for ctc batch inference * Update im2sequence_op.cc * change im2sequence for ctc batch inference * update * change PR by comment * fix ocr test error * fix test_im2sequence * modify the old name to standard name * fix test_layers failed	7 years ago
guosheng	e7a4cfc0ff	complete the hsigmoid_op	7 years ago
guosheng	d695381677	Merge branch 'develop' of https://github.com/PaddlePaddle/paddle into complete-hsigmoid	7 years ago
tensor-tang	6bc1aaaac7	refine the ColMajor replacement	7 years ago
tensor-tang	de856da9a6	fix ColMajor and RowMajor replacement	7 years ago
tensor-tang	21516e5cbe	add unit test of smm	7 years ago
tensor-tang	c3941745b3	add libxsmm_gemm	7 years ago
tensor-tang	7782a4ab53	fix blas build issue	7 years ago
tensor-tang	17987eb3fc	link libxsmm	7 years ago
tensor-tang	3df99e72ab	Merge remote-tracking branch 'ups/develop' into refine/set_num_threads fix conflicts	7 years ago
dzhwinter	4ed0b62476	Move fluid::framework::InitDevices into fluid::platform (#11757 ) * move to platform * "move init from framework to platform" * "remove used init" * "fix ci" * "fix ci" * "fix generic" * "fix ci" * "fix ci" * "fix ci" * "disable fragile test"	7 years ago
dzhwinter	99a99ec7e3	"remove lapack" (#11966 )	7 years ago
Xin Pan	a9086bf320	also move a few other dir to legacy/	7 years ago
tensor-tang	e3a96300bb	move SetNumThreads to platform	7 years ago
tensor-tang	1f09ddf806	Merge remote-tracking branch 'ups/develop' into refine/mklml/dyload	7 years ago
Tao Luo	bfe5dc6312	Merge pull request #11607 from chengduoZH/fix_concat_warning Fix concat compile warning	7 years ago
chengduoZH	804c767107	fix concat warning	7 years ago
tensor-tang	f503f12925	enable dynamic load mklml lib on fluid	7 years ago
fengjiayi	12619fcf90	fix a compile error	7 years ago
qiaolongfei	762160bd8c	fix concat grad kernel	7 years ago
qingqing01	9c90dc9728	Make the CUDA kernel of concat correct and fix unit tests. (#11541 ) * Make the CUDA kernel of concat correct and fix unit tests.	7 years ago
qiaolongfei	ad1ad738d8	add gpu support for concat	7 years ago
qiaolongfei	9c128fe656	concat support data as input	7 years ago
weixing02	ee13b396f2	fix some errors	7 years ago
weixing02	8bd148dc00	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hsigmoid_op	7 years ago
tensor-tang	9169b3b802	Merge pull request #10789 from Xreki/core_fix_openblas_threads Add an interface to set the number of threads for math function, and set the default value to 1 for inference.	7 years ago
guochaorong	04b8d3d03c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into paddle_fix	7 years ago
guochaorong	0fec9469f9	fix some bugs introduced by unfreed memory	7 years ago
weixing02	3e46ec41a9	add hsigmoid	7 years ago
qingqing01	3ba75d4a69	Check label range in cross entropy calculation. (#10954 )	7 years ago
Tomasz Patejko	e43c8f33cd	MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used	7 years ago
yangyaming	10ec329b7d	Refine code.	7 years ago
Liu Yiqun	50ba205d79	Merge branch 'develop' into core_fix_openblas_threads	7 years ago
Liu Yiqun	39eb871ddf	Add an interface to set the number of threads for math function, and set the default value to 1 for inference.	7 years ago
yuyang18	fd2b4b478e	Make tensor support uint8	7 years ago
Yiqun Liu	b7026f79a9	Fix a bug related to dispensable inputs and refine the inference unittest (#10527 ) * Fix a bug related to dispensable inputs and refine the inference unittest. * Fix the use of dispensable inputs in reshape_op. * Polish the enforce statements. * Fix an English writing typo.	7 years ago
yangyaming	0797246704	Enhance sequence_padding functor (CPU and GPU).	7 years ago
yuyang18	66590a0b88	Fix typo in blas_impl.h	7 years ago
yuyang18	27197290dc	matmul support float16/double	7 years ago
Yu Yang	fcd31d6161	Follow comments and polish code names	7 years ago
Yu Yang	0a13d3c67a	Move MatMul to blas_impl.h Rename MatDim to MatDescriptor	7 years ago
Yu Yang	3dd01823a8	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_matmul	7 years ago
Yu Yang	c6a6d87f96	Rewrite Matmul, make code cleaner	7 years ago
fengjiayi	b708ec0ae1	Merge pull request #10412 from JiayiFeng/correct_TensorCopy_misuse Correct tensor copy misuse	7 years ago
Darcy	8f8a4768dc	adding device_context to blas deps list (#10420 ) * adding operator to blas deps list * use device_context instead to solve cycle deps	7 years ago

433 Commits (c75dc885b58000b018414ab442097ee515244b9c)