Paddle

Commit Graph

Author	SHA1	Message	Date
tensor-tang	6447155dac	Merge pull request #13851 from tensor-tang/fea/jitkernel_peephole Fea jitkernel lstm peephole	7 years ago
sneaxiy	4b4af84e67	test=develop	7 years ago
Qiao Longfei	0225957515	change elementwise_add to elementwise_add_to test=develop	7 years ago
Qiao Longfei	b4a32eafdf	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op test=develop	7 years ago
Zeng Jinle	93606c2c2c	Merge pull request #13689 from sneaxiy/sparse_rmsprop Fix sparse rmsprop	7 years ago
sneaxiy	5cedfb60c8	test=develop	7 years ago
Qiao Longfei	936926aadd	code optimize test=develop	7 years ago
Qiyang Min	cab29828a5	Merge pull request #13829 from velconia/accelerate_sequence_pool_op Accelerate SequencePool Op on SUM mode of CPU	7 years ago
Qiao Longfei	c52ccbc109	clean code	7 years ago
Qiao Longfei	6056d04361	optimize blas call	7 years ago
Qiao Longfei	5db7551317	optimize code	7 years ago
Qiao Longfei	eb6d9e3bbe	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op	7 years ago
Qiao Longfei	0170d36c42	fix a bug	7 years ago
Qiyang Min	e37c9e6732	Merge pull request #13828 from velconia/accelerate_selected_rows_functor Accelerate SelectedRows Functors:	7 years ago
Qiao Longfei	86e2e686ee	fix bug	7 years ago
Qiao Longfei	333fd15204	add gpu test for mrege add	7 years ago
Qiao Longfei	ab3e36da80	update MergeAdd for selected_rows_functor.cu	7 years ago
Qiao Longfei	d5c64af24f	change map to unordered_map	7 years ago
Qiao Longfei	005f1923a2	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op	7 years ago
tensor-tang	bcb8ea397d	Merge remote-tracking branch 'ups/develop' into fea/jitkernel_peephole test=develop	7 years ago
tensor-tang	8e182170ba	refine and replace lstm peephole kernel	7 years ago
Dun	5f2e837847	optimize depthwise conv by register memory (#13778 ) * optimize depthwise conv by register memory * test=develop	7 years ago
minqiyang	3f6ec90060	Polish code test=develop	7 years ago
tensor-tang	7ef2699e18	init peephole runtime kernel	7 years ago
minqiyang	0385b0a1ea	Accelerate SequencePool Op on SUM mode test=develop	7 years ago
minqiyang	8ec748cfa0	Accelerate SelectedRows Functors: 1. Accelerate SelectedRows MergeAdd functor 2. Add SelectedRowsSumTo functor to support MergeAdd multiple SelectedRows into one test=develop	7 years ago
Qiao Longfei	38568519f7	optimize code	7 years ago
tensor-tang	3ee8f2c6cf	thread local jit kernels test=develop	7 years ago
tensor-tang	9131a35676	replace the lstm compute with jitkernel test=develop	7 years ago
tensor-tang	b55c247678	add lstm compute unit test	7 years ago
sneaxiy	4c672ab1a2	Merge reyoung:rewrite_allocation	7 years ago
tensor-tang	2a00969165	optimize lstm jitkernel keq8 test=develop	7 years ago
tensor-tang	f2adaf1c3e	add vrelu and lstm kernel test=develop	7 years ago
tensor-tang	e6d8aca3bf	refine code and fix	7 years ago
qiaolongfei	1a59880084	update test_sum_op	7 years ago
qiaolongfei	40d3bd4e81	selected rows merge add support multi input	7 years ago
tensor-tang	ea7dc9cbf6	Merge remote-tracking branch 'ups/develop' into fea/jitkernel test=develop	7 years ago
tensor-tang	2513b2cc4e	fix bug vtanh	7 years ago
tensor-tang	cf8c8e72bd	add vtanh and unit test	7 years ago
tensor-tang	b37fe30417	Merge pull request #13690 from wangguibao/fix_cpu_lstm_compute_cc Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so	7 years ago
dzhwinter	26771f41ba	"fix compile error" (#13579 ) * "fix compile error" * "fix ci" * rerun ci test=develop * test=develop rerun ci	7 years ago
tensor-tang	d10a9df7b8	add vaddbias and unit test	7 years ago
tensor-tang	3c8b651187	add vsigmoid avx implementations and unit test	7 years ago
tensor-tang	55e44761fb	refine code and init vsigmoid	7 years ago
wangguibao	1940bc2d83	Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so test=develop	7 years ago
sneaxiy	584c3f048f	fix sparse rmsprop	7 years ago
Yu Yang	8e3fdc6e65	Fix SetDevice on init	7 years ago
Yu Yang	524f6e9b36	Refine code	7 years ago
Dun	161c3e31f7	Optimization of Kernels that related to DeepLabv3+ (#13534 ) * refine reduce by cub * optimize KernelDepthwiseConvFilterGrad * optimize depthwise conv and reduce mean and reduce sum * fix bug: dilation * cuda arch and cuda 8 compatible	7 years ago
tensor-tang	2d0ff6a3c2	add vexp and unit test	7 years ago
tensor-tang	b3c63f40fa	add vscal and unit test	7 years ago
tensor-tang	0987f2b4d9	add vadd unit test	7 years ago
tensor-tang	3d928d4f9d	refine and seepdup	7 years ago
tensor-tang	77fc42d2d1	Merge remote-tracking branch 'ups/develop' into fea/jitkernel	7 years ago
tensor-tang	2937314d8e	refine vmul and test	7 years ago
tensor-tang	6c986e127a	fix macro and add vmul unit test	7 years ago
Yu Yang	0be1582df0	Merge pull request #13525 from reyoung/fix_mixed_vector Fix mixed vector	7 years ago
tensor-tang	8c69764d12	add vmul unit tests	7 years ago
tensor-tang	084893a9a9	add vadd kernel	7 years ago
tensor-tang	eeff268a6c	clean and refine kernels	7 years ago
tensor-tang	dee5d35c20	refine vmul	7 years ago
tensor-tang	92031968d7	init vmul kernel	7 years ago
tensor-tang	b9acbcc8c5	init lstm kernel	7 years ago
tensor-tang	c260bf942d	init jit kernel	7 years ago
Yu Yang	3043f51b3a	Merge pull request #13511 from reyoung/fix_ce Revert "Merge pull request #13431 from chengduoZH/refine_lod"	7 years ago
Yu Yang	f7af695801	Merge pull request #13505 from reyoung/fix_selected_rows_functor_test Fix unstable selected_rows_functor_test.cu	7 years ago
Yu Yang	6d2c6f96f1	Revert "Revert "Merge pull request #13431 from chengduoZH/refine_lod"" This reverts commit `a6c8d6b9a2`.	7 years ago
Yu Yang	a6c8d6b9a2	Revert "Merge pull request #13431 from chengduoZH/refine_lod" This reverts commit `bd79e04667`, reversing changes made to `6b4d290c18`.	7 years ago
Zeng Jinle	7f1e312677	Merge pull request #13456 from sneaxiy/refine_sparse_adam Fix sparse Adam and Gradient clip of SelectedRows	7 years ago
Yu Yang	b5996fa124	Fix unstable selected_rows_functor_test.cu	7 years ago
sneaxiy	a29b4227eb	fix sparse gradient clip	7 years ago
Yihua Xu	87086b1386	Refine activation for GRU operator (#13275 ) * Optimize GRU with AVX instruction * Clean code * Add the Unitest and fix the align issue * Remove the remanent part of the unitest part * Code clean * Fix the parameters length issue for fusion_gru to pass CI * Change the default type as float32	7 years ago
chengduo	d402234ba8	Feature/op_fuse_pass (#12440 ) * Add Preface * Add demo code * Save file * Refine code * seems can work * use elementwise strategy * Use ElementwiseComputeEx * Add comments * extract functions from operator * Refine code * Follow comment * code refine * add op_fuse pass * add backward * code refine * use TopologySortOperations * follow comments * refine IsFusible * code enhance * fix op_fusion_pass * refine code * refine fuse_elemwise_act_op * adjust the input and output * refine logic * add intermediate_edge * disable inplace * follow comments * refine logic * follow comments * Remove the removable IntermediateOut * change strategy * code refine * enable fuse backward * code refine * code refine * rename unit test * follow comments	7 years ago
Yu Yang	2c31ea9293	Merge pull request #13424 from chengduoZH/refine_seq_concat Refine seq_concat	7 years ago
Yu Yang	5996e224fa	Merge pull request #13430 from chengduoZH/refine_seq_pool Refine seq_pool	7 years ago
sneaxiy	b6f61faf13	fix adam	7 years ago
chengduoZH	6534f8527a	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_lod	7 years ago
chengduoZH	24459501fe	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_concat	7 years ago
chengduoZH	f92b07f0b5	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_pool	7 years ago
gongweibao	0c8c0d943f	fix macunittest (#13434 )	7 years ago
chengduoZH	cdb9605bad	refine	7 years ago
chengduoZH	cacf549e8a	refine seq_pool	7 years ago
chengduoZH	e7940141ce	refine seq_concat	7 years ago
tensor-tang	7c8730824a	Merge pull request #13396 from tensor-tang/refine/op/lstm Refine/op/lstm	7 years ago
Tao Luo	40c54db301	Merge pull request #13338 from bingyanghuang/bingyang/seq_pool_memcpy Use memcpy to rewrite the sequence pooling LAST and FIRST mode	7 years ago
tensor-tang	e09cf031a8	refine src and header	7 years ago
bingyanghuang	76553c5a6d	fix travis-ci	7 years ago
tensor-tang	bc9971dd6c	fix deps	7 years ago
tensor-tang	ff858d35ed	fix bug and enable on batch mode as well	7 years ago
tensor-tang	8dea07f209	fix comopile	7 years ago
tensor-tang	612ba41aee	add simple lstm compute	7 years ago
bingyanghuang	83394bab3e	modified by luotao's suggestion	7 years ago
Bai Yifan	faf8ad2436	Add ignore_index in cross_entropy op (#13217 ) * add ignore index * update api.spec * enhance softmax_with_cross_entropy	7 years ago
bingyanghuang	1454cd54aa	pre-commit check	7 years ago
bingyanghuang	7429067ab3	clean code	7 years ago
bingyanghuang	cdbc5e7353	Add some comments	7 years ago
bingyanghuang	53185fde11	Rewrite sequence pooling last and first mode with memcpy and clean code	7 years ago
dzhwinter	379b471ee2	squash commit	7 years ago
dzhwinter	f05520060e	fix style (#13142 )	7 years ago
tensor-tang	f38905a6e5	Merge remote-tracking branch 'ups/develop' into optimize/op/fusion_gru	7 years ago
dzhwinter	34757efb8e	fix windows compile	7 years ago
dzhwinter	dbe90cc0f6	merge develop branch	7 years ago
dzhwinter	ab1097cd8e	Feature/template (#13093 ) * remove template operator * "fix compile" * "fix ci" * "fix ci"	7 years ago
tensor-tang	7bdd11d88e	Merge branch 'develop' into optimize/op/fusion_gru	7 years ago
tensor-tang	b0d36c4c3d	add cross vec to speedup gru	7 years ago
chengduo	3bd1d22a7d	Enhance fused_elementwise_activation_op (#12837 ) * Enhance the function of fused_elementwise_activation_op * enhance unit test * Clean Code And Add Doc * Add compound functors * Fix doc and enhance unit test * define Dx and Dy for d_binary_func * add mul_scale * add mul_scale * add elementwise_mul * code refine * code refine * add doc * add AsIntermediate	7 years ago
tensor-tang	2d0ddf8c41	refine cpu gru batch mode	7 years ago
tensor-tang	70d3981220	add cpu vec bias sub	7 years ago
tensor-tang	d941192e74	fix gcc53 on cpu vec (#13020 )	7 years ago
tensor-tang	2328a69157	Merge pull request #13012 from tensor-tang/refine/seq2batch refine seq2batch	7 years ago
tensor-tang	fd4f7c3ab5	refine seq2batch	7 years ago
fengjiayi	7e0c9f50ae	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op	7 years ago
fengjiayi	9cb455fa7d	update function	7 years ago
Zeng Jinle	ef7bd03a03	Merge pull request #12964 from sneaxiy/fix_concat_sync Fix concat bug	7 years ago
qingqing01	1f09bc320c	Support data type int8_t . (#12841 ) * Support int8 type.	7 years ago
dzhwinter	cd8f3e9ed0	operator module is done	7 years ago
chengduo	3e1050a2e8	Add pad_constant_like_op (#12943 ) * Add pad_constant_batch_size_like * refine pad_op * optimize memory	7 years ago
dzhwinter	6cc7870517	fix concat synchronization bug	7 years ago
dzhwinter	2ec589a24e	float.h fixed	7 years ago
dzhwinter	7dceb8a080	check some operators	7 years ago
dzhwinter	26dbe35c54	add msvc flags and copy lib done	7 years ago
Qiao Longfei	3c58b87b45	fix auc layer and add check for auc op (#12954 ) * fix auc layer and add check for auc op * use input to check if states are inited * optimize code	7 years ago
dzhwinter	d7f98f37a7	more platform is done	7 years ago
dzhwinter	eca4563e5d	operators module (#12938 )	7 years ago
dzhwinter	a94d4f51a8	fix math_function compile	7 years ago
tensor-tang	7bdaf09664	Merge remote-tracking branch 'ups/develop' into refine/jit	7 years ago
tensor-tang	3462c29940	refine add bias with avx	7 years ago
dzhwinter	c1ad52f768	pre-commit	7 years ago
dzhwinter	89f95ea25e	merge develop branch	7 years ago
tensor-tang	bb9f98e10d	add inplace test	7 years ago
tensor-tang	f269614bcd	further optimize tanh with avx and mkl	7 years ago
luotao1	2b4edacca0	enhance the forward of concat op	7 years ago
dzhwinter	34f8c9b6f5	windows port	7 years ago
tensor-tang	7a4924cd44	further optimize sigmoid with avx and avx512	7 years ago
tensor-tang	6bd89ba5b6	fix typo	7 years ago
tensor-tang	e3bb98eb38	optimize relu with avx and avx512	7 years ago
tensor-tang	25976fe736	optimize the sigmoid and tanh	7 years ago
tensor-tang	2eb46c2b06	add cpu vec test	7 years ago
tensor-tang	f0f06992c1	Merge pull request #12878 from tensor-tang/feature/op/attention_lstm Add attention lstm cpu forward	7 years ago
fengjiayi	f4a4a4cbd9	add op comment and python layer	7 years ago
tensor-tang	5ca0bb9aad	support more activation type and remove some comments	7 years ago
tensor-tang	ec59f0d454	add cpu vec	7 years ago
tensor-tang	cf5ea925c3	fix bugs	7 years ago
tensor-tang	3dd66390b2	add blas vexp	7 years ago
tensor-tang	0ec1f65cf1	fix blas dot and add cblas scal	7 years ago
tensor-tang	a2203d0466	add cblas dot	7 years ago
tensor-tang	f72ab8961e	refine blas gemm	7 years ago
Yu Yang	3768677980	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/process_lod_grad	7 years ago
Yu Yang	2a36ad1a96	Handle LoD for concat & seq_softmax ops	7 years ago
fengjiayi	ce182d9037	bug fix	7 years ago
Tao Luo	d04ef276a5	Merge pull request #12745 from tensor-tang/refine/op/elewise_mul Refine elementwise mul cpu forward	7 years ago
fengjiayi	34b209cffa	Complete sequence_padding GPU kernel	7 years ago
tensor-tang	b090479409	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
fengjiayi	8d8d48a34f	Complete sequence_pad_op and its CPU kernel. Add unittests	7 years ago
dzhwinter	4069262f0e	Revert ""cherry picked operators changes" (#12184 )" (#12747 ) This reverts commit `bf3c34960f`.	7 years ago
fengjiayi	3c749fae43	update CPU sequence_padding functor	7 years ago
tensor-tang	92890ac258	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
tensor-tang	6644ce79a5	add mklml vmul	7 years ago
tensor-tang	ff92b6ba81	Merge pull request #12531 from tensor-tang/refine/op/gru Refine gru cpu forward	7 years ago
tensor-tang	a72f68f223	Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm	7 years ago
tensor-tang	f3cd2612ae	refine fc and use the fc compute in fusion_lstm	7 years ago
dzhwinter	bf3c34960f	"cherry picked operators changes" (#12184 ) * "cherry picked operators changes" * "remove duplicated code" * "add constant setter" * "add get expected kernel" * "fix ci" * "add fill constant"	7 years ago
fengjiayi	a38a8db928	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op	7 years ago
tensor-tang	3bf3e77ac8	Merge remote-tracking branch 'ups/develop' into refine/op/gru	7 years ago
chengduo	7c8b69c700	Feature/op fusion (#12240 ) * Add Preface * Add demo code * Save file * Refine code * seems can work * use elementwise strategy * Use ElementwiseComputeEx * Add comments * extract functions from operator * Refine code * Follow comment * code refine * follow comments * follow comments	7 years ago
tensor-tang	54c95e49f0	fix blas	7 years ago
tensor-tang	8c23f7c4f0	fix blas and use packed weight	7 years ago
tensor-tang	43cee33a23	add mkl packed gemm	7 years ago
tensor-tang	d8d2dbcfac	further optimize im2col using variables	7 years ago
tensor-tang	687a322267	Merge remote-tracking branch 'ups/develop' into refine/im2col	7 years ago
tensor-tang	65d418f060	complete im2col with padding==1 and speedup filter width==1	7 years ago
tensor-tang	52eb86e30f	refine im2col benchmark	7 years ago
tensor-tang	3017f46076	add more test cases	7 years ago
tensor-tang	8d6be4fb5f	refine im2col test and add benchmark	7 years ago
tensor-tang	507c143047	im2col cfo cpu code clean	7 years ago
tensor-tang	4eeed0b5e4	refine width padding and enable core copy	7 years ago
Wu Yi	73fcfc06ec	refine conv cudnn enforce (#12353 ) * refine conv cudnn enforce * update * update all cudnn ops * fix	7 years ago
tensor-tang	e3131e2d73	enable width padding	7 years ago
tensor-tang	92518c519f	reuse sizes saving time	7 years ago
tensor-tang	660df122ce	enable padding!=0 and fill height padding with 0	7 years ago
tensor-tang	d8e00facf7	reuse im_size	7 years ago
tensor-tang	b72befc5cc	reuse copy size	7 years ago
tensor-tang	6788af4bf1	refine test cases	7 years ago
tensor-tang	b163e601b6	add gtest	7 years ago
tensor-tang	aae994fd26	refine im2col no padding	7 years ago
Yan Chunwei	02cf54d331	bugfix lod cpu performance (#12297 )	7 years ago
tensor-tang	fc2b578842	add gemm_warp test	7 years ago
tensor-tang	a916c52579	refine gemm	7 years ago
tensor-tang	961e754c9f	mkl split gemm for better perf	7 years ago
tensor-tang	f0cd493c0d	Merge pull request #11989 from tensor-tang/feature/libxsmm introduce libxsmm	7 years ago
Guo Sheng	da3f766821	Merge pull request #12088 from guoshengCS/complete-hsigmoid Complete hsigmoid_op	7 years ago
guosheng	4ee069fdba	Fix the HierarchicalSigmoidGradOpKernel and refine the codes. Now hsigmoid_op is same with V2 implementation and can pass gradient check.	7 years ago
tensor-tang	1c5d6c5692	disable xsmm with float16	7 years ago
tensor-tang	c9ba51ead8	Merge remote-tracking branch 'ups/develop' into feature/libxsmm	7 years ago
tensor-tang	64a8e6d20e	refine the threshold functions	7 years ago
lemon34	29145e1e31	change im2sequence for ctc batch inference (#11696 ) * change im2sequence for ctc batch inference * Update im2sequence_op.cc * change im2sequence for ctc batch inference * update * change PR by comment * fix ocr test error * fix test_im2sequence * modify the old name to standard name * fix test_layers failed	7 years ago
guosheng	e7a4cfc0ff	complete the hsigmoid_op	7 years ago
guosheng	d695381677	Merge branch 'develop' of https://github.com/PaddlePaddle/paddle into complete-hsigmoid	7 years ago
tensor-tang	6bc1aaaac7	refine the ColMajor replacement	7 years ago
tensor-tang	de856da9a6	fix ColMajor and RowMajor replacement	7 years ago
tensor-tang	21516e5cbe	add unit test of smm	7 years ago
tensor-tang	c3941745b3	add libxsmm_gemm	7 years ago
tensor-tang	7782a4ab53	fix blas build issue	7 years ago
tensor-tang	17987eb3fc	link libxsmm	7 years ago
tensor-tang	3df99e72ab	Merge remote-tracking branch 'ups/develop' into refine/set_num_threads fix conflicts	7 years ago
dzhwinter	4ed0b62476	Move fluid::framework::InitDevices into fluid::platform (#11757 ) * move to platform * "move init from framework to platform" * "remove used init" * "fix ci" * "fix ci" * "fix generic" * "fix ci" * "fix ci" * "fix ci" * "disable fragile test"	7 years ago
dzhwinter	99a99ec7e3	"remove lapack" (#11966 )	7 years ago
Xin Pan	a9086bf320	also move a few other dir to legacy/	7 years ago
tensor-tang	e3a96300bb	move SetNumThreads to platform	7 years ago
tensor-tang	1f09ddf806	Merge remote-tracking branch 'ups/develop' into refine/mklml/dyload	7 years ago
Tao Luo	bfe5dc6312	Merge pull request #11607 from chengduoZH/fix_concat_warning Fix concat compile warning	7 years ago
chengduoZH	804c767107	fix concat warning	7 years ago
tensor-tang	f503f12925	enable dynamic load mklml lib on fluid	7 years ago
fengjiayi	12619fcf90	fix a compile error	7 years ago
qiaolongfei	762160bd8c	fix concat grad kernel	7 years ago
qingqing01	9c90dc9728	Make the CUDA kernel of concat correct and fix unit tests. (#11541 ) * Make the CUDA kernel of concat correct and fix unit tests.	7 years ago
qiaolongfei	ad1ad738d8	add gpu support for concat	7 years ago
qiaolongfei	9c128fe656	concat support data as input	7 years ago
weixing02	ee13b396f2	fix some errors	7 years ago
weixing02	8bd148dc00	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hsigmoid_op	7 years ago
tensor-tang	9169b3b802	Merge pull request #10789 from Xreki/core_fix_openblas_threads Add an interface to set the number of threads for math function, and set the default value to 1 for inference.	7 years ago
guochaorong	04b8d3d03c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into paddle_fix	7 years ago
guochaorong	0fec9469f9	fix some bugs introduced by unfreed memory	7 years ago
weixing02	3e46ec41a9	add hsigmoid	8 years ago
qingqing01	3ba75d4a69	Check label range in cross entropy calculation. (#10954 )	8 years ago
Tomasz Patejko	e43c8f33cd	MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used	8 years ago
yangyaming	10ec329b7d	Refine code.	8 years ago
Liu Yiqun	50ba205d79	Merge branch 'develop' into core_fix_openblas_threads	8 years ago
Liu Yiqun	39eb871ddf	Add an interface to set the number of threads for math function, and set the default value to 1 for inference.	8 years ago
yuyang18	fd2b4b478e	Make tensor support uint8	8 years ago
Yiqun Liu	b7026f79a9	Fix a bug related to dispensable inputs and refine the inference unittest (#10527 ) * Fix a bug related to dispensable inputs and refine the inference unittest. * Fix the use of dispensable inputs in reshape_op. * Polish the enforce statements. * Fix an English writing typo.	8 years ago
yangyaming	0797246704	Enhance sequence_padding functor (CPU and GPU).	8 years ago
yuyang18	66590a0b88	Fix typo in blas_impl.h	8 years ago
yuyang18	27197290dc	matmul support float16/double	8 years ago
Yu Yang	fcd31d6161	Follow comments and polish code names	8 years ago
Yu Yang	0a13d3c67a	Move MatMul to blas_impl.h Rename MatDim to MatDescriptor	8 years ago
Yu Yang	3dd01823a8	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_matmul	8 years ago
Yu Yang	c6a6d87f96	Rewrite Matmul, make code cleaner	8 years ago
fengjiayi	b708ec0ae1	Merge pull request #10412 from JiayiFeng/correct_TensorCopy_misuse Correct tensor copy misuse	8 years ago
Darcy	8f8a4768dc	adding device_context to blas deps list (#10420 ) * adding operator to blas deps list * use device_context instead to solve cycle deps	8 years ago
fengjiayi	0c99cd7bbb	fix errors in sequence_padding_test	8 years ago
Siddharth Goyal	b65282168c	Fix cpplint errors in lstm kernel (#10394 )	8 years ago
fengjiayi	e309f42293	fix errors in concat_test	8 years ago
Yu Yang	0285a2b95d	Merge pull request #10371 from reyoung/refine_code Polish MatMul, clean copy & paste code	8 years ago
Abhinav Arora	c9f55dfafc	Fix CPPLint issues in /math/detail/gru_kernel.h (#10390 ) * Fix CPPLint issyes in gru_kernel.h * Fix CPPLint issyes in gru_kernel.h * Fix Compile error	8 years ago
Yu Yang	ef6ea790dc	Clean and extract blas	8 years ago
Yu Yang	815d888468	Clean MatMul	8 years ago
Yu Yang	bc8160350b	Fix compile	8 years ago
Yu Yang	a6edeb39b3	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_blas	8 years ago
chengduo	4fbde42cdf	Fix __shfl_down_sync_ of cross_entropy (#10345 ) * fix __shfl_down_sync_ of cross_entropy * use reduceSum * "fix ci"	8 years ago
Yu Yang	caa4027d9d	Follow comments	8 years ago
Abhinav Arora	1945b729b6	Fix CPPLint issues with math/sequence_padding (#10317 ) * Fix cpplint issues in sequence_padding * Fix typo in cu file * Fix dependencies of sequence_padding * Add include	8 years ago
chengduo	9bcd9f661b	fix cpplint error (#10329 )	8 years ago
Yu Yang	4db43c6c9f	Naive implement cblas	8 years ago
Yu Yang	60d6348e69	Revert develop	8 years ago
Yu Yang	86af6bdc81	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_blas	8 years ago
Yu Yang	49dedfad17	Polish code and tests	8 years ago
Abhinav Arora	738585476d	Fix more CPPLint issues in fluid/operators/math (#10276 ) * Fix CPPLint issues in lstm_cpu_kernel.h * Fix CPPLint issues in math/math_function_test * Fix CPPLint issues in math/math_function_test * Fix CPPLint issues in math/concat.cc * Fix CPPLint issues in math/concat.cc * Fix CPPLint issues in math/concat.cc * Fix CPPLint issues in math/gru_cpu_kernel * Fix CPPLint issues in math/selected_rows_functor_test.cu * Fix compile error * Fix compile error	8 years ago
dzhwinter	eb6f9dd5de	Feature/cuda9 cudnn7 (#10140 ) * "re-commit " * "picked up" * "fix ci" * "fix pdb hang up issue in cuda 9"	8 years ago
Yu Yang	c888e01660	Refactor GEMM in blas	8 years ago
Abhinav Arora	e735359631	Fix more CPPlint issues in fluid/operators/math (#10249 ) * Fix CPPLint errors * Fix CPPLint errors in sequence2batch * Fix compilation * Fix LSTM op and GRU op * Fix LSTMP op * Fix more cpplint errors in operators/math * Address Code review feedback	8 years ago
fengjiayi	71fa3ca9c4	Merge pull request #10232 from JiayiFeng/fix_unittests Fix unittests	8 years ago
fengjiayi	30f9dc92e5	fix errors	8 years ago
fengjiayi	330fa95cbd	Follow comments	8 years ago
Abhinav Arora	83b1a8f6bf	Pending more CPPLint errors in fluid/operators/math (#10243 ) * Fix CPPLint issue in test_engine * Fix CPPLint errors in operators/math * Fix compilation	8 years ago
fengjiayi	bcf260e1e8	fix several unit tests	8 years ago
Abhinav Arora	f457d5da06	Fix more CPPLint errors (#10218 ) * Fix more CPPLint issues * Fix more CPPLint issues * Fix more CPPLint issues * Fix CPPLint issues in operators/math and operators/reader	8 years ago
Yu Yang	580dad0c2c	Fix compile when there is no mkl	8 years ago
Yu Yang	2a06e307d0	Fix batch_gemm bugs stride should be int64_t, not int	8 years ago
Kexin Zhao	92913027fc	fix unused var error (#9908 )	8 years ago
Kexin Zhao	617e790a59	fix cuda 7.5 compile error (#9885 )	8 years ago
Kexin Zhao	7ed457e77a	Fix cuda 7.5 error with cublas GEMM (#9811 ) * fix gemm error for cuda 7.5 * fix version number	8 years ago
Kexin Zhao	b2a1c9e8b7	Add float16 support to non-cudnn softmax op on GPU (#9686 ) * initial commit * fix error * fix typo and order	8 years ago
Kexin Zhao	d00bd9eb72	Update the cuda API and enable tensor core for GEMM (#9622 ) * change from hgemm to gemmEx * fix cpplint	8 years ago
chengduoZH	e099b18045	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_CUDAPinnedPlace	8 years ago
Yang Yu	af230d9bef	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor	8 years ago
dzhwinter	8425c2c859	Speed/sequence op1 (#9217 ) * "add functors" * "remove old code" * "fix" * "fix ci" * "add details" * "fix ci" * "fix ci" * "fix ci" * "fix ci" * "remove unused code"	8 years ago
Yang Yu	b0775588c0	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor	8 years ago
chengduoZH	ab601c19c3	Add CUDAPinnedPlace	8 years ago
Luo Tao	6332bd1ed8	Merge branch 'develop' into infer_mkl	8 years ago
Yu Yang	50e7e25db3	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor	8 years ago
chengduoZH	aca9180a76	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_concat	8 years ago
chengduoZH	750aff10ce	code refine	8 years ago
chengduoZH	043f47b27f	fix concat op	8 years ago
Luo Tao	ae820a34bc	Merge branch 'develop' into infer_mkl	8 years ago
Tao Luo	9126e626fc	Merge pull request #9165 from ROCmSoftwarePlatform/amd_cmake_01 Demostration of cmake refine for HIP support.	8 years ago
Kexin Zhao	4eaa789730	resolve conflict	8 years ago
Kexin Zhao	ed2bc194c5	Merge pull request #9176 from kexinzhao/batch_norm_fp16 Add float16 support to batch norm operator	8 years ago
Kexin Zhao	70e7122785	initial commit	8 years ago
sabreshao	e50205e744	CMake refine for HIP support. 1. Add option WITH_AMD_GPU. 2. Add cmake/hip.cmake for HIP toolchain. 3. Some external module such as eigen may need HIP port. 4. Add macro hip_library/hip_binary/hip_test to cmake/generic.cmake. 5. Add one HIP source concat.hip.cu as an example. Each .cu may have its corresponding .hip.cu.	8 years ago
Yang yaming	381c6a026d	Merge pull request #9100 from pkuyym/fix-9049 Enhance sequence_expand operator	8 years ago
yangyaming	2f2c5f5e60	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-9049	8 years ago
Xi Chen	9eae086e39	add math_function to softmax's dep list	8 years ago
Yu Yang	9cb8f50302	Complete fetch op	8 years ago
Kexin Zhao	39c676e208	initial commit	8 years ago
xuwei06	ab3543e35e	Fix compilation for gcc5.4 The error is: paddle/fluid/operators/math/concat.cc:47:72: error: invalid initialization of non-const reference of type 'paddle::platform::CPUPlace&' from an rvalue of type 'paddle::platform::CPUPlace' auto& cpu_place = boost::get<platform::CPUPlace>(context.GetPlace()); Should not use reference for cpu_place.	8 years ago
yangyaming	bf3f56e899	Finish adaption for backward.	8 years ago
sabreshao	45c988d86a	Demostration of cmake refine for HIP support. 1. Add option WITH_AMD_GPU. 2. Add cmake/hip.cmake for HIP toolchain. 3. Some external module such as eigen may need HIP port. 4. Add macro hip_library/hip_binary/hip_test to cmake/generic.cmake. 5. Add one HIP source concat.hip.cu as an example. Each .cu may have its corresponding .hip.cu.	8 years ago
Tao Luo	a448fbe9e1	Merge pull request #9134 from putcn/fix-selected-row-dep add math_function to selected_rows_functor dependency list	8 years ago
qingqing01	7c1a0b77a0	Delete the detection_output_op, which had been split into several operators. (#9121 )	8 years ago
Xi Chen	d20c6eb6de	add math_function to selected_rows_functor dependency list	8 years ago
dzhwinter	128adf53cb	[Speed]implement cudnn sequence softmax cudnn (#8978 ) * "add softmax cudnn functor support" * "add testing" * "refine cmakelist" * "sequence softmax forward speed up" * "add softmax grad" * "fix sequence softmax test" * "add double precision' * "fix softmax test" * "add softmax cudnn support" * "fix softmax cudnn test" * "add softmax to nn.py" * "fix compile bug" * "refine cmakelist" * "fix ci" * "fix based on comment" * "fix based on comments" * "fix ci"	8 years ago
Luo Tao	de13f0eb4e	Merge branch 'develop' into infer_mkl	8 years ago
Kexin Zhao	3b44b849d3	address comments	8 years ago
Kexin Zhao	95de7617eb	fix bug	8 years ago
Kexin Zhao	1998d5afa2	add gpu info func to get compute cap	8 years ago
Kexin Zhao	d400b4192d	fix math function arch mismatch for older GPU	8 years ago
kexinzhao	90215b7844	Add float16 GEMM math function on GPU (#8695 ) * test cpu float16 data transform * add isnan etc * small fix * fix containsNAN test error * add data_type transform GPU test * add float16 GPU example * fix error * fix GPU test error * initial commit * fix error * small fix * add more gemm fp16 tests * fix error * add utility function	8 years ago
Luo Tao	bc0cfb2283	remove PADDLE_USE_ATLAS	8 years ago
Luo Tao	49f3f1db07	add back framework_proto depends	8 years ago
Luo Tao	3ddc997182	rename concat_functor to concat, refine CMakeLists based on comments	8 years ago
Luo Tao	1ef97fa7b1	Merge branch 'develop' into math_function	8 years ago
chengduo	84aea8a8a1	Merge pull request #8669 from chengduoZH/feature/concat_op Refine concat_op	8 years ago
kexinzhao	266ccaa843	Integrate float16 into data_type_transform (#8619 ) * test cpu float16 data transform * add isnan etc * small fix * fix containsNAN test error * add data_type transform GPU test * add float16 GPU example * fix error * fix GPU test error * add context wait	8 years ago
chengduoZH	131ec276ed	fix bug for big number; float->double and code refine	8 years ago
chengduoZH	82bd82c186	follow comments and refine code	8 years ago
chengduoZH	00e596edbe	get max threads of GPU	8 years ago
Luo Tao	f67275a920	refine operator/math/CMakeLists.txt, seperate im2col from math_function	8 years ago
chengduoZH	60e7ee0611	refine concat_op	8 years ago
Yi Wang	cfffb1a362	Update tensor_util.h (#8422 ) * Update tensor_util.h * Update with moved TensorDesc * Fix tensur_utils.cu * Update * Update * Update * Update * Make tensor_util.cu a symbolic link	8 years ago
qingqing01	24509f4af9	Fix the grammar in copyright. (#8403 )	8 years ago
Yi Wang	fc374821dd	Correct #include path	8 years ago
Yi Wang	90648f336d	Move file to fluid/; Edit CMakeLists.txt	8 years ago

673 Commits (b34933d9ee3b61dbbd642fd02f244c36d0d14550)