Paddle

Commit Graph

Author	SHA1	Message	Date
Zhen Wang	89cfa49156	Unmerged fetch list (#22635 ) * update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results. * add the unit test for fetch_unmerged. * update ut for multi-card and multi-cpu. * add the error message and the user suggestion in FetchOpHandle. test=develop	5 years ago
Chen Weihang	7d8d573453	Speed up dygraph DataLoader based on shared memory and LoDTensor serialization (#22541 ) * add lodtensor share memory & serialization, test=develop * fix windows compile error, test=develop * deal vartype pickle & fix unittest matching error message, test=develop * update timeout variable name, test=develop * refactor memory map implement, test=develop * clear mmap file discripter when exit unexpectedly, test=develop * remove the child process fd in advance, test=develop * remove mmap fds after Queue.put in child process, test=develop * add hard unittests for register exit func, test=develop * fix python2 compatibility problem in unittest, test=develop * fix exception unittest error, test=develop * polish code based review comment, test=develop	5 years ago
Leo Chen	b2c1be851a	support cond in clone, test=develop (#22657 ) * support cond in clone, test=develop * refine code, test=develop * refine code, test=develop * follow comments, test=develop * refine code, test=develop	5 years ago
hutuxian	175954d894	PaddleBox Framework Part2 (#22466 ) * Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator. * Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they are not be distributed evenly. * Remove CPU code in Pull/PushSparse and we will add it back when testing it fully. * Fix some known issues: such as copying persistable vars after one epoch running.	5 years ago
wangchaochaohu	c65c6ae534	add flag to control profile level in python API (#22319 ) * add python flag to control profile level test=develop	5 years ago
tangwei12	b0675c8193	fix bug with compiledProgram (#22495 ) * add thread barrier for the compiled program	5 years ago
Wilber	de009152a7	Compile without nccl deps. [2/2] (#22484 ) Compile without nccl deps. [1/2] Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
Yiqun Liu	dcfb603897	Enable the detection of subgraph composed of grad ops (#21223 ) * Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop	5 years ago
Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡 Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
Yiqun Liu	b7cac50b64	Implement a common python unittest to test the ir passes. (#22209 ) * Implement a common python unittest to test the ir passes. test=develop * Save the results in np.array and support to startup on CPU. test=develop * Fix the unittest. test=develop * Add check_program to check whether the optimized program is different from the origin one. test=develop * Remove the inferface all_ops. test=develop * Add exception test in pass_test. test=develop	5 years ago
xujiaqi01	e3a457d34b	add collective communication library in fleet (#22211 ) * add collective communication library in fleet to replace mpi * test=develop	6 years ago
Zhen Wang	46189b166d	Add bn and relu fuse pass (#22048 ) * add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop	6 years ago
Huihuang Zheng	dd4361568e	Add ParallelExecutor Test for Cond API and Fix PE Checks Shape Bug (#22029 )	6 years ago
Huihuang Zheng	557bce77da	Fix Backward Bugs in Conditional Block (#21809 ) The fixed bugs: 1. The condition sub-graph is not pruned 2. When backward graph is extremely simple, the whole backward ops are pruned.	6 years ago
mapingshuo	686f0ecb6a	add `no_need_buffer_slots` interface to pybind (#21575 ) * add no_need_buffer_slots interface to pybind	6 years ago
liym27	9da7e6b4d4	add file check_op_desc.py and add interface to get default value. (#21530 ) * add file check_op_desc.py and add interface to get default value. test=develop * add test for c++ coverage rate. test=develop * Correct typo. test=develop	6 years ago
Zeng Jinle	3a7caf481c	add grad maker assert, test=develop (#21564 )	6 years ago
Leo Chen	cdd46d7e02	Split VarBase from Python Variable for Dygraph (#21359 ) * test=develop, fix docker with paddle nccl problem * don't expose numerous Tensor.set(), test=develop * fix condition, test=develop * fix float16 bug, test=develop * feed should be Tensor or np.array, not Variable or number, test=develop * use forcecast to copy numpy slice to new array, test=develop * remove float16-uint16 hacking, test=develop * add variable method to varbase and refactor to_variable to support return varbase * support kwargs in varbase constructor * add VarBase constructor to support default python args * refine varbase initial method * reset branch * fix ut for change VarBase error info to PaddleEnforce * cherry is parameter change before * overload isinstance to replace too many change of is_variable * rm useless files * rm useless code merged by git * test=develop, fix some ut failed error * test=develop, fix test_graph_wrapper * add some tests, test=develop * refine __getitem__, test=develop * add tests, test=develop * fix err_msg, test=develop	6 years ago
Aurelius84	54382ce497	Add get_all_kernels api of registered data_type in pybind.cc (#21499 ) * add _get_all_register_op_kernels api test=develop * refine usage of check_op_register_type test=develop * add import in core test=develop	6 years ago
Youwei Song	d5ff79e55e	Support numpy bridge (enabled by default in dygraph mode) (#20983 ) * add numpy bridge * fix template compile * add unittest, add default test=develop * fix unittest test=develop * fix unittest test=develop * zero_copy=True for to_variable, test=develop * bug fix test=develop * disable deprecated NumPy API test=develop * use better design of NumpyAllocator test=develop * fix Py_None check test=develop * reset c++ tracer when jump out dygraph guard test=develop * refine PADDLE_ENFORCE_xx format test=develop * bug fix of tracer switch test=develop * update decref test=develop	6 years ago
Zeng Jinle	b9f8ae8494	Add global value getter setter (#21285 ) * add global value getter setter, test=develop * fix error messages, test=develop	6 years ago
Dong Daxiang	691ced87c0	Refactor fetch handler (#21264 ) * fix fetch handler problem and refactor when a user define FetchHandler class, he or she should initialize a handler with variable dict. the key of a variable dict is a user defined name, the value of a variable dict is a Varaible generated from python API. For each fetching, a user should implement handler function in which fetched_result_dict will be available and the user can access the fetched value with user defined keys.	6 years ago
Zeng Jinle	5fdfbe3413	Add friendly dygraph trace API (#21091 ) * friendly trace interface, test=develop * refine TracedLayer, test=develop * add some docs, test=develop	6 years ago
Leo Chen	9974e40787	Update Tensor.set() to support float16 (#19964 ) * don't expose numerous Tensor.set(), test=develop * fix condition, test=develop * fix float16 bug, test=develop * feed should be Tensor or np.array, not Variable or number, test=develop * use forcecast to copy numpy slice to new array, test=develop * remove float16-uint16 hacking, test=develop	6 years ago
Yiqun Liu	16e4d02675	Refine the cache of program, context and scope in executor. (#18483 ) * Refine the cache of program, context and scope in executor. test=develop * Refine the unittest test_executor_and_use_program_cache. * Add the test the PaddingRNN with use_program_cache=True. test=develop * Remove a check. test=develop * Refine the unittest to check whether it is correct when setting use_program_cache=True. test=develop	6 years ago
hong	ff0886a92a	save load problem fix and new feature add (#20823 ) * fix persistable; * fix save load bugs; test=develop * fix bug; test=develop * add example for new io api; test=develop * addd example; test=develop	6 years ago
WangXi	507afa8a8a	Fix dgc nan by stripping nccl from sparseReduce. (#20630 )	6 years ago
633WHU	12e4be0382	Dlpack support (#20039 ) * support dlpack to tensor and implement python interface test=develop * add unittest for _to_dlpack and from_dlpack test=develop	6 years ago
Zeng Jinle	40effc61af	Refine py_reader exit (#20331 ) * refine py_reader exit, test=develop * fix multiprocess_reader exception unittest, test=develop * increase code coverage for legacy fluid.layers.py_reader, test=develop	6 years ago
liu zhengxi	f855a86c93	update the api en doc of BuildStrategy (#20445 ) * update the api en doc of BuildStrategy and its setting, test=develop, test=document_fix * update api.spec, test=develop, test=document_fix * update the en doc of fuse_relu_depthwise_conv, test=develop, test=document_fix	6 years ago
tangwei12	a010d883b4	doc fix, test=develop, test=document_fix (#20239 ) * doc fix, test=develop, test=document_fix	6 years ago
Leo Chen	5a7142ac4e	Update en APIs of LoDTensor (#20115 ) * polish en APIs of LodTensor, test=develop, test=document_dix * polish en APIs of LoDTensor, test=develop, test=document_fix * follow comments, test=develop, test=document_dix	6 years ago
hong	fa43e80e19	New save load interface (#20148 ) * add new save load interface; test=develop * add new save interface; test=develop * add save load interface ; * fix save load error; * fix dygraph set dict bug; * add save load unit test; test=develop * fix test_imperative_optimizer bug; test=develop * fix unitest optimizer bug; test=develop * fix code coverage; test=develop * fix converage; test=develop * add document for apis; test=develop * fix unitest error; test=develop * fix save load unit test error; test=develop * fix error message; test=develop * change set_parameter set_optimizer to save_dygraph; test=develop * add load_graph check; test=develop * fix api spec; test=develop	6 years ago
Leo Chen	f4c56e9f51	Polish en doc of LoDTensorArray, test=document_fix (#19972 ) * Polish en doc of LoDTensorArray, test=develop, test=document_fix * follow comments, test=develop, test=document_dix	6 years ago
Youwei Song	20f68916ed	refine CUDA CPU places en doc (#20243 ) * fix CUDA CPU places, test=document_fix, test=develop * fix CUDAPlace param doc, test=document_fix, test=develop * fix CUDAPlace param doc, test=document_fix, test=develop	6 years ago
tangwei12	c9139c3db3	trainer from dataset fetch targets (#19760 ) add executor.FetchHandler for train/infer from the dataset	6 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	6 years ago
石晓伟	01b9d07963	update operator compatible info, test=develop (#19978 ) * update operator compatible info, test=develop * revert cmake/version.cmake, test=develop * add unit_tests and fix bugs, test=develop * update ../paddle/fluid/framework/framework.proto, test=develop * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop * update paddle/fluid/framework/version_test.cc, test=develop * add comments and rename interfaces, test=develop	6 years ago
Yang Zhang	cde73a7bbf	Expose `mutable_data` as python binding (#19932 ) * Expose `mutable_data` as python binding test=develop * Add test for device pointer binding test=develop * Make test compatible with python 2	6 years ago
Wojciech Uss	4286a6270d	Add support for new QAT models (#18970 ) * Add support for new QAT models test=develop Co-Authored-By: Michał Gallus <michal.gallus@intel.com> Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com> * fixed fps results test=develop * fix top5 accuracy drop problem * updated for new QAT models * skip quantizing average pooling - dirty but working * add missing pass * added missing conv+brelu fuse pass * removed a call to non-existent pass test=develop * renamed pass test=develop * Adjust finding pooling scale to newest QAT models * Remove unnecessary code from quantization_mkldnn_pass * Copy Pooling input scale to output scale in QAT * Refactor & remove unused code in QAT * Incorporate fp32 FC into QAT test=develop * Enable graph drawing with debug flag test=develop * Add tests for QATv2 * Fix paths for QATv2 models test=develop * Add option to save transformed int8 qat model test=develop * Remove redundant lines from qat mkldnn pass test=develop * Delegate disablement of avg pooling to qat test=develop * fix CI bug, test=develop * Follow Wangzhen's Review, test=develop * Update API.spec test=develop * Name False in (is_unsigned, TensorScale) tuple test=develop	6 years ago
Chen Weihang	00d5375e0c	Add prune_backward function to cover complicated test_program.clone situation (#19772 )	6 years ago
chengduo	056fdedde3	Open fuse all reduce option (#19765 ) * Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop	6 years ago
mapingshuo	dca9b6c5b0	add feed_var_names to Prune interface (#19589 ) * Fix bug: add feed_vars to the prune function	6 years ago
hutuxian	c756b5d231	Paddlebox Framework (#18982 ) * Support looking up embeddings from BoxPS. * Add a _pull_box_sparse op, for now this op is not exposed to users. * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on. * Add 'BoxPSDataset' in python code. * Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS. * Add UT. * More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982	6 years ago
Leo Chen	6fb310ae29	Fix bug of getting bool Flags from os.environ (#19349 ) * fix bug of getting bool Flags from os.environ, test=develop * add empty loss_name in CompiledProgram for inplace grad test, test=develop	6 years ago
Leo Chen	a9d5fc5142	Enhance OpTest to check the consistency of operators when using and not using inplace (#19101 ) * add pybind interface to get all inplace ops, test=develop * enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop * handle corner cases in op_test, test=develop * support outputs without tensor holder_, like XShape in reshape_op, test=develop * fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop * use reshape_grad instead of reshape in FlattenGradOp, test=develop * fix error debug dims info for variables like XShape, test=develop * change computational order in sum_op to relieve computation difference using inplace, test=develop * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop * follow sneaxiy's comments, test=develop * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
Zeng Jinle	88f111f885	remove unused inplace act codes, test=develop (#19079 )	6 years ago
Leo Chen	8f53735437	Fix memory overwriting of tensors returned by executor (#19030 ) * fix memory overlapping of fetch var (return of executor.run), test=develop * fix wrong usage of ParallelExecutor in op_test, test=develop * remove useless parameter and simplify code * avoid tensor destruct untimely, test=develop * add testcase independent of OpTest, test=develop	6 years ago
liuwei1031	a43a763b54	fix warpctc.dll not found issue (#18761 ) * fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop	6 years ago
chengduo	20859c08e8	[DyGraph] Make multi-card program faster (#18892 ) * update parallel.py test=develop	6 years ago
Zeng Jinle	8008ab4e6b	Remove legacy C++ memory optimization codes (#18834 ) * remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop	6 years ago
chengduo	292dfbce63	fix build strategy doc (#18725 ) test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Zeng Jinle	ae58afc546	Feature/auto_growth_allocator (#18561 ) * feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop	6 years ago
guru4elephant	d714bf037c	remove async executor and add data_feed.proto to the deps of train demo (#18659 ) * remove async executor and add data_feed.proto to the deps of train demo	6 years ago
gongweibao	c0a82748cf	Polish backwards optimizer dependency codes and use more default values. (#18255 )	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
xsrobin	47e2ef38e9	add "import paddle.fluid as fluid" to examples lack of it	6 years ago
Zeng Jinle	5826b72e06	Refine CUDAPlace error message. (#18343 ) * refine cuda place error msg, test=develop * use LOG(ERROR)+exit(-1), test=develop	6 years ago
chengduo	25f3cd6486	Update execution_strategy option default value (#18183 ) * update execution_strategy option default value test=develop * fix doc error test=develop	6 years ago
Sylwester Fraczek	accb132f0f	fix slim int8 mkldnn multithreading issue (#18009 )	6 years ago
tensor-tang	5c06bff222	combine noavx and avx package (#17889 ) * support avx and noavx core * add catch and give some log test=develop * fix build test=develop * add missing package test=develop * fix pybind name test=develop * fix import error test=develop * conbime noavx core test=develop * add requirements test=develop * fix unkown message test=develop * fix api spec test=develop * refine and clean test=develop * update * pass dist ut * follow comments test=develop * refine scripts test=develop	6 years ago
gongweibao	fbbdc9ccad	Add backward and optimizer operator dependency pass. (#17746 )	6 years ago
wopeizl	453a49b1bc	Make ParallelExecutor support Windows GPU (#17787 ) * fix the ParallelExecutor on Windows test=develop * restrict to use one GPU only under windows	6 years ago
guru4elephant	d52391094d	fix prepare context redundant code problem, optimize executor by cach… (#17743 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * cache sub_scope, program, var when use_program_cache=True is set * make fetch_list runable with variables, add more unittest for use_program_cache	6 years ago
Zeng Jinle	432ac70124	clean code of py_layer in dygraph mode,test=develop (#17661 )	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
guru4elephant	326bf8291a	add Run Prepared Ctx (#17616 ) add Run Prepared Ctx, fix pybind problem	6 years ago
flame	2280f185d7	BuildStrategy api comment (#17348 ) Python examples of fluid.layers.io.double_buffer and some BuildStrategy's methods.	6 years ago
guru4elephant	7f8bc49d00	polish_executor_and_add_ctx_cache (#17536 ) * polish_executor_and_add_ctx_cache	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
Qiao Longfei	92e7d5d7cc	fix distribute doc test=develop (#17318 ) * fix distribute doc	6 years ago
Qiao Longfei	58f7695ab2	Async exe support communicator (#17386 ) Async exe support communicator	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
Yan Xu	0217555530	polish parallel dygraph code (#17164 ) * add var grad hook test=develop	6 years ago
Jiabin Yang	d7df4e5e5b	Fix/Fix memory leak in dygraph (#17394 ) * test=develop, add gradient sort backward strategy * test=develop, fix test by add FLAGS_cudnn_deterministic on new tests * test=develop, fix memory leak in dygraph mode * test=develop, fix memory leak in dygraph mode * test=develop, polish code * test=develop, polish code * test=develop, polish code	6 years ago
Tao Luo	68ec0a6f74	make parallel_executor support FLAGS_use_mkldnn (#17341 ) * make parallel_executor support FLAGS_use_mkldnn test=develop * add warning when set mkldnn_enabled_op_types_ in non-mkldnn env test=develop	6 years ago
Jiabin Yang	4624d7c642	test=develop, add gradient sort backward strategy (#17125 ) * test=develop, add gradient sort backward strategy * test=develop, fix test by add FLAGS_cudnn_deterministic on new tests	6 years ago
chengduo	bc833945a4	Add DropLocalExeScopes in ParallelExecutor (#17297 ) * reset drop local scope counter test=develop	6 years ago
lujun	e388a1fb66	Repair api example (#17221 ) Fix the following API examples: paddle.fluid.scope_guard paddle.fluid.backward.append_backward paddle.fluid.cpu_places paddle.fluid.cuda_pinned_places paddle.fluid.cuda_places paddle.fluid.in_dygraph_mode paddle.fluid.CUDAPlace paddle.fluid.CPUPlace paddle.fluid.CUDAPinnedPlace	6 years ago
chengduo	04bd413acb	Code Clean: Move all pass to paddle::framework::ir (#17228 ) * move pass to ir * polish code test=develop * fix dependency test=develop	6 years ago
Zeng Jinle	f2fa3f7300	fix api doc,test=develop (#17241 )	6 years ago
Zeng Jinle	5dfe2ab9e8	Fix mem leak when converting Tensor to numpy array (#17182 ) * fix mem leak when converting Tensor to numpy array test=develop * remove unused unittest,test=develop * follow comments, test=develop * fix dygraph bug,test=develop	6 years ago
Yan Xu	0b07eef118	ParallelDyGraph with GPU collective mode (#16827 ) implement dygraph.parallel.DataParallel to hook reduce op.	6 years ago
guru4elephant	03d469ad98	Merge pull request #17005 from wopeizl/fix_ncclwrapper_win1 fix nccl wrapper on windows	6 years ago
liuwei1031	a770ce0615	add doc for memory_optimize, test=develop (#17010 ) * add doc for memory_optimize, test=develop * update doc, test=develop * doc update, test=develop	6 years ago
wopeizl	51a0243a56	fix nccl wrapper on windows test=develop	6 years ago
dongdaxiang	b091139049	add nccl wrapper for python API	6 years ago
Yiqun Liu	112f16143b	Add an option to enable the cache of expected kernel in train phase. (#16724 ) * Add an option to enable the cache of expected kernel in train phase. test=develop * Change the default value of cache_expected_kernel to true.	6 years ago
chengduo	55b15db5af	Add unit test for fuse all_reduce ops (#16699 ) * test fuse all_reduce	6 years ago
Yiqun Liu	3fe8cb0dd7	Enable the runtime_context_cache pass in train phase (#16640 ) * Try to enable the runtime_context_cache pass in train phase. * Put the append of runtime_context_cache pass ahead of multi_dev passes. test=develop	6 years ago
Yan Xu	b4c3a6aa0b	[Imperative] implement imperative NCCLParallelContext (#16477 ) add NCCLParallelContext for parallel dygraph	6 years ago
chengduo	b75a69bad6	Add Stream for fetch op handle (#16600 ) * expose fuse broadcast ops	6 years ago
乔龙飞 Qiao Longfei	21622ca30b	Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator Add async ssa graph executor communicator	6 years ago
sneaxiy	10249c0b78	Merge develop test=develop	6 years ago
Qiao Longfei	adf272bcec	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	6 years ago
sneaxiy	33473890f3	Merge develop test=develop	6 years ago
dongdaxiang	f612877797	add incubate for unified API	6 years ago

1 2 3 4 5 ...

443 Commits (f9c97dd7287119f7546c90d9813ec825b3c956d2)