Commit Graph

720 Commits (bda92434db2d75c9c1f29de84005df28c7b1c9e5)

Author | SHA1 | Message | Date
Zeng Jinle | bb4f8dee83 | add logs to left var memory size, test=develop (#19722) | 6 years ago
wangguanzhong | 25dcd74d34 | merge empty lod tensor, test=develop (#19228) | 6 years ago
Tao Luo | bcddbc78d4 | remove -Wmaybe-uninitialized warning (#19653) | 6 years ago
baojun | a3a4b6e570 | Enable ngraph through build_strategy (#19266) | 6 years ago
Zeng Jinle | 19474019c2 | fix fast pe to run highest priority ops first, test=develop (#19575) | 6 years ago
chengduo | b6d1d8901f | Increase num_iteration_per_drop_scope (#19075) | 6 years ago
Zeng Jinle | 708bd9798d | move_flags_to_unified_files_for_management, test=develop (#19224) | 6 years ago
chengduo | e044e84264 | open fuse_all_optimizer_ops (#19087) | 6 years ago
gongweibao | 29d8781240 | Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) | 6 years ago
chengduo | e7da0940f9 | Disable fuse optimization option (#18924) | 6 years ago
Zeng Jinle | 8008ab4e6b | Remove legacy C++ memory optimization codes (#18834) | 6 years ago
chengduo | 4140fe11a4 | Open fuse optimization ops (#18741) | 6 years ago
Zeng Jinle | a802da650b | Feature/mem opt pass refactor (#18735) | 6 years ago
chengduo | fd3aad6cb3 | Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) | 6 years ago
chengduo | a6d468a265 | fix PE fetch bug (#18644) | 6 years ago
gongweibao | c0a82748cf | Polish backwards optimizer dependency codes and use more default values. (#18255) | 6 years ago
Zeng Jinle | d3003a1620 | Feature/buffer_shared_inplace (#17911) | 6 years ago
Zeng Jinle | be24e5b391 | Clean unused code of dim and place (#18565) | 6 years ago
chengduo | 7453857324 | Make fuse_all_reduce_op_pass support mix_precision (#17652) | 6 years ago
tangwei12 | 999d9a59a5 | fix communicator with pyreader (#18350) | 6 years ago
chengduo | 5489216eba | Clean build strategy (#18148) | 6 years ago
chengduo | 24e988a471 | Fix bug of scope_buffered_ssa_graph_executor (#18100) | 6 years ago
gongweibao | f5caf3443c | Fix reinitialized ncclid error! (#18025) | 6 years ago
gongweibao | fbbdc9ccad | Add backward and optimizer operator dependency pass. (#17746) | 6 years ago
chengduo | 437520474c | fix DropLocalExeScopes (#17829) | 6 years ago
chengduo | 67c8dade58 | Add Event in ScopeBuffer Executor (#17667) | 6 years ago
gongweibao | 65bbf950ee | Add multi-ncclcomm and 2D ncclallreduce support. (#17263) | 6 years ago
Qiao Longfei | 58f7695ab2 | Async exe support communicator (#17386) | 6 years ago
Tao Luo | 32da5e9c3d | remove unused expected_kernel_cache_pass (#17486) | 6 years ago
chengduo | 5a6ab38013 | Add record event And remove CSP (#17447) | 6 years ago
chengduo | e336dc86bb | [Speed] Refine the Executor when the num_thread=1 (#17405) | 6 years ago
Tao Luo | 68ec0a6f74 | make parallel_executor support FLAGS_use_mkldnn (#17341) | 6 years ago
chengduo | bc833945a4 | Add DropLocalExeScopes in ParallelExecutor (#17297) | 6 years ago
chengduo | 516317cf91 | use sync copy (#17291) | 6 years ago
chengduo | 04bd413acb | Code Clean: Move all pass to paddle::framework::ir (#17228) | 6 years ago
Zeng Jinle | 4f8594088d | Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) | 6 years ago
Zeng Jinle | ee2028a110 | Add use_cuda to inplace pass (#17205) | 6 years ago
chengduo | 950aec55fd | It doesn't need sync when fetch_list is not empty (#17201) | 6 years ago
Zeng Jinle | 4e1bc6e805 | Rewrite inplace pass and fix gc bug (#17126) | 6 years ago
chengduo | 794a195881 | fix fuse optimizer ops (#17102) | 6 years ago
Zeng Jinle | 842ded14b0 | fix reference_count_pass, test=develop (#17060) | 6 years ago
chengduo | cc31681687 | use fast executor as default (#17044) | 6 years ago
chengduo | a2be4b4d91 | Add fuse momentum ops (#16745) | 6 years ago
Zeng Jinle | 1202d3fc74 | Refine model gpu memory (#16993) | 6 years ago
gongweibao | cbdb8a17b1 | Polish DGC code (#16818) | 6 years ago
Qiao Longfei | 82cff5ec42 | Merge pull request #16762 from jacquesqiao/add-async_sparse_param_update_recorder | 6 years ago
chengduo | e9409665f7 | Refine Fuse Optimize Ops (#16810) | 6 years ago
chengduo | d105c06b50 | Replace ThreadedExecutor with FastThreadedExecutor (#16650) | 6 years ago
Qiao Longfei | afc56949c1 | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async_sparse_param_update_recorder | 6 years ago
Yiqun Liu | 112f16143b | Add an option to enable the cache of expected kernel in train phase. (#16724) | 6 years ago