Paddle

Commit Graph

Author	SHA1	Message	Date
Zeng Jinle	67e88424e5	Polish jit trace codes (#21218 ) * polish jit trace codes, test=develop * polish codes again by removing var_id, test=develop	5 years ago
xujiaqi01	9e045170c0	add copy table (#21086 ) * copy some feasigns and corresponding embeddings from one sparse table to another * copy all feasigns and corresponding embeddings from one sparse table to another * copy all dense params from one table to another * copy some local vars to other local vars	5 years ago
Zeng Jinle	5fdfbe3413	Add friendly dygraph trace API (#21091 ) * friendly trace interface, test=develop * refine TracedLayer, test=develop * add some docs, test=develop	5 years ago
WangXi	de5d3ff688	Fix dgc buffer illegal & reuse velocity (#21012 )	5 years ago
Leo Chen	008ed65fd5	Add c++ global current tracer for dygraph (#20882 ) * Add c++ global current tracer for dygraph, test=develop * add tracer property in c++, test=develop * support different place, test=develop * add unittest for tracer, test=develop	5 years ago
Leo Chen	2c3c579b9b	tensor.set() supports array list and remove unused code, test=develop (#20959 )	5 years ago
Leo Chen	9974e40787	Update Tensor.set() to support float16 (#19964 ) * don't expose numerous Tensor.set(), test=develop * fix condition, test=develop * fix float16 bug, test=develop * feed should be Tensor or np.array, not Variable or number, test=develop * use forcecast to copy numpy slice to new array, test=develop * remove float16-uint16 hacking, test=develop	5 years ago
hong	8c4573a3cb	GradMaker for dygraph (#19706 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * optimize grad maker; test=develop * optimize grad maker * test * grad make optim; test=develop * fix unittest bugs; test=develop * add dygraph grad op maker and split_op * grad op maker refactor; test=develop * add dygraph grad maker; test=develop * fix op deformable_conv_v1_op bug; test=develop * fix deformable_conv prroi pool bugs; * fix new op grad op maker bug; test=develop * fix split by ref bug; test=develop * fix dygraph auto prune bug; test=develop * fix test_trace bug; test=develop * fix fused emb seq pool bug; test=develop * remove useless code in op_desc file; test=develop * remove useless code, StrVarBaseNode; test=develop * fix review issues; test=develop * fix rank_loss grad maker; test=develop * remove flag in VarBase; test=develop * fix distributed_notify_op compile bug ; test=develop * fix reshape op double grad; test=develop * fix expand as op; test=develop * add impertive type_defs.h for demo_train; test=develop * fix inference lib cmake; test=develop * fix inference lib; test=develop * fix infernce_lib; test=develop * fix inference cmake; test=develop * fix inference lib; test=develop * fix inference lib; test=develop * remove condition dygraph grad maker, modify local name; test=develop * fix split grad maker bug; test=develop * fix pyramid_op bug; test=develop * change travis time out limit; test=develop * restore travis; test=develop * change timeout limit; test=develop	5 years ago
Yiqun Liu	16e4d02675	Refine the cache of program, context and scope in executor. (#18483 ) * Refine the cache of program, context and scope in executor. test=develop * Refine the unittest test_executor_and_use_program_cache. * Add the test the PaddingRNN with use_program_cache=True. test=develop * Remove a check. test=develop * Refine the unittest to check whether it is correct when setting use_program_cache=True. test=develop	5 years ago
hong	ff0886a92a	save load problem fix and new feature add (#20823 ) * fix persistable; * fix save load bugs; test=develop * fix bug; test=develop * add example for new io api; test=develop * addd example; test=develop	5 years ago
Huihuang Zheng	95ba4bd2ab	Add shape and type check at read_op (#20754 )	5 years ago
Zeng Jinle	8ff6b289bd	[Dygraph to static graph]JIT/Trace (#20775 ) * jit/trace 1st version, test=develop * add more unittests, test=develop	5 years ago
WangXi	507afa8a8a	Fix dgc nan by stripping nccl from sparseReduce. (#20630 )	5 years ago
633WHU	12e4be0382	Dlpack support (#20039 ) * support dlpack to tensor and implement python interface test=develop * add unittest for _to_dlpack and from_dlpack test=develop	5 years ago
Pei Yang	443f604c3b	add DisableGlogInfo() to AnalysisConfig, test=develop (#20581 )	5 years ago
Zeng Jinle	40effc61af	Refine py_reader exit (#20331 ) * refine py_reader exit, test=develop * fix multiprocess_reader exception unittest, test=develop * increase code coverage for legacy fluid.layers.py_reader, test=develop	5 years ago
liu zhengxi	f855a86c93	update the api en doc of BuildStrategy (#20445 ) * update the api en doc of BuildStrategy and its setting, test=develop, test=document_fix * update api.spec, test=develop, test=document_fix * update the en doc of fuse_relu_depthwise_conv, test=develop, test=document_fix	5 years ago
tangwei12	a010d883b4	doc fix, test=develop, test=document_fix (#20239 ) * doc fix, test=develop, test=document_fix	5 years ago
Leo Chen	5a7142ac4e	Update en APIs of LoDTensor (#20115 ) * polish en APIs of LodTensor, test=develop, test=document_dix * polish en APIs of LoDTensor, test=develop, test=document_fix * follow comments, test=develop, test=document_dix	5 years ago
Jiabin Yang	df102f6428	Refine en doc (#20409 ) * test=develop, fix docker with paddle nccl problem * test=develop, refine en_doc for Variable and Program * test=document_fix, fix English doc for Variable and Program * test=document_fix, refine astype code block style * test=document_fix, add example code for Variable properties * test=document_fix, fix BackwardStrategy English Doc * test=document_fix, fix syntax * test=document_fix, refresh API.spec * test=document_fix, refine api spec * test=document_fix, refine api spec	5 years ago
hong	fa43e80e19	New save load interface (#20148 ) * add new save load interface; test=develop * add new save interface; test=develop * add save load interface ; * fix save load error; * fix dygraph set dict bug; * add save load unit test; test=develop * fix test_imperative_optimizer bug; test=develop * fix unitest optimizer bug; test=develop * fix code coverage; test=develop * fix converage; test=develop * add document for apis; test=develop * fix unitest error; test=develop * fix save load unit test error; test=develop * fix error message; test=develop * change set_parameter set_optimizer to save_dygraph; test=develop * add load_graph check; test=develop * fix api spec; test=develop	5 years ago
Leo Chen	f4c56e9f51	Polish en doc of LoDTensorArray, test=document_fix (#19972 ) * Polish en doc of LoDTensorArray, test=develop, test=document_fix * follow comments, test=develop, test=document_dix	5 years ago
Youwei Song	20f68916ed	refine CUDA CPU places en doc (#20243 ) * fix CUDA CPU places, test=document_fix, test=develop * fix CUDAPlace param doc, test=document_fix, test=develop * fix CUDAPlace param doc, test=document_fix, test=develop	5 years ago
tangwei12	c9139c3db3	trainer from dataset fetch targets (#19760 ) add executor.FetchHandler for train/infer from the dataset	5 years ago
Chengmo	728ec1b43d	Add GEO-SGD distribute training algorithm (#20018 ) * refector geo sgd & communicator	5 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	5 years ago
石晓伟	01b9d07963	update operator compatible info, test=develop (#19978 ) * update operator compatible info, test=develop * revert cmake/version.cmake, test=develop * add unit_tests and fix bugs, test=develop * update ../paddle/fluid/framework/framework.proto, test=develop * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop * update paddle/fluid/framework/version_test.cc, test=develop * add comments and rename interfaces, test=develop	5 years ago
tangwei12	8f0b3c0516	the integrated communicator (#19849 ) * add a base class for the Communicator * add AsyncCommunicator Impl for async distributed training	5 years ago
Yang Zhang	cde73a7bbf	Expose `mutable_data` as python binding (#19932 ) * Expose `mutable_data` as python binding test=develop * Add test for device pointer binding test=develop * Make test compatible with python 2	5 years ago
Huihuang Zheng	88af4ab650	Add new data layer (#19916 ) The new "fluid.data" changes old "fluid.layers.data": 1. Add shape and dtype check. 2. Remove "append_batch_size" parameter. We won't offer this in the new data layer because other deep learning platforms don't have this kind of data layer pre-processing. It may confuse users. 3. Remove "stop gradient" parameter because the data layer doesn't do back-propagation TODO： Now data layer feeded by executor is checked, will we want to check the feed data of readers in the future?	5 years ago
Wojciech Uss	4286a6270d	Add support for new QAT models (#18970 ) * Add support for new QAT models test=develop Co-Authored-By: Michał Gallus <michal.gallus@intel.com> Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com> * fixed fps results test=develop * fix top5 accuracy drop problem * updated for new QAT models * skip quantizing average pooling - dirty but working * add missing pass * added missing conv+brelu fuse pass * removed a call to non-existent pass test=develop * renamed pass test=develop * Adjust finding pooling scale to newest QAT models * Remove unnecessary code from quantization_mkldnn_pass * Copy Pooling input scale to output scale in QAT * Refactor & remove unused code in QAT * Incorporate fp32 FC into QAT test=develop * Enable graph drawing with debug flag test=develop * Add tests for QATv2 * Fix paths for QATv2 models test=develop * Add option to save transformed int8 qat model test=develop * Remove redundant lines from qat mkldnn pass test=develop * Delegate disablement of avg pooling to qat test=develop * fix CI bug, test=develop * Follow Wangzhen's Review, test=develop * Update API.spec test=develop * Name False in (is_unsigned, TensorScale) tuple test=develop	5 years ago
xujiaqi01	cedc04775c	support change shuffle and train thread num (#19841 ) * support change shuffle thread num * support change train thread num * fix receive shuffle data of each channel * data norm stop gradient * add check thread_tensor type and root_tensor type when merge metric * remove sleep in shuffle, add config * add config of pslib client to client communication * fix xbox str * add data norm op testcase * add flush in trainer finalize	5 years ago
Zeng Jinle	0436efd6a3	Unify DataLoader APIs (#19305 ) * unify DataLoader APIs, test=develop * integrate iterable CPU Dataset, test=develop add GPU dataset supporting, test=develop * add unittests for dataset, test=develop * add more docs to dataloader apis, test=develop, test=document_preview * refine doc, test=develop * refine doc again, test=develop * increase coverage, test=develop	5 years ago
Jiabin Yang	454254115e	Feature/auto prune in dygraph (#19757 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * test=develop, refoctor name to make it easier to understand * test=develop, refoctor name to make it easier to understand * test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ * test=develop, fix ut failed on parallel se-resnext * test=develop, change one more PADDLE_ENFORCE * support auto prune in dygraph mode * test=develop, support auto prune * test=develop, merge develop conflict * test=develop, fix test_layer and test_tracer ut * test=develop, fix bug which may cause stop_gradient disabled with a list of backward inputs	5 years ago
Pei Yang	9cbc1eff2d	zerocopytensor support uint8, analysis config support profile, analysis predictor support GetInputTensorShape, test=develop (#19822 )	5 years ago
xujiaqi01	6bf298bf09	support preload thread, optimize hdfs log, fix master+patch bug (#19695 ) * support preload thread * sleep before fleet wrapper exit for pslib core dump * optimize hdfs log * fix master+patch bug	5 years ago
Chen Weihang	00d5375e0c	Add prune_backward function to cover complicated test_program.clone situation (#19772 )	5 years ago
chengduo	056fdedde3	Open fuse all reduce option (#19765 ) * Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop	5 years ago
Tao Luo	f05d2c519d	paddle::framework::vectorize() templatization [PART3] (#19643 ) * paddle::framework::vectorize() templatization test=develop * update pybind/imperative.cc test=develop * revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop	6 years ago
Jiabin Yang	e9233d1c1e	Refactor dygraph (#19107 ) * refactor dygraph,test=develop * fix failed unittest,test=develop * polish code,test=develop * check windows ci error,test=develop try to fix windows ci error by np.allclose,test=develop * polish vlog and profiler, test=develop * try to fix preceding ops order,test=develop * test transformer in windows ci, test=develop * use python c-api to speed up tracer.trace,test=develop * test=develop, fix docker with paddle nccl problem * test=develop, add ut for debug string and gradient_accumulator * test=develop, add tests for layer/gradient_accumulator/prepared_op * test=develop, fix complie error for test_prepared_op * test=develop, add more ut for dygraph * test=develop, create API.spec for dygraph api change * test=develop, refoctor name to make it easier to understand * test=develop, refoctor name to make it easier to understand * test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ * test=develop, fix ut failed on parallel se-resnext * test=develop, change one more PADDLE_ENFORCE	6 years ago
mapingshuo	dca9b6c5b0	add feed_var_names to Prune interface (#19589 ) * Fix bug: add feed_vars to the prune function	6 years ago
hutuxian	c756b5d231	Paddlebox Framework (#18982 ) * Support looking up embeddings from BoxPS. * Add a _pull_box_sparse op, for now this op is not exposed to users. * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on. * Add 'BoxPSDataset' in python code. * Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS. * Add UT. * More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982	6 years ago
Thunderbrook	1fe468d319	support debug each output of each ins (#19004 ) * dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop * support debug tensor of each ins test=develop * support debug tensor of each ins test=develop * learning rate * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style test=develop * code style test=develop * unitest * style * style * multi phase * add channel * code style * style * style * unitest * style * define * define test=develop * style test=develop * rm define test=develop * linux * linux test=develop * style test=develop * output format test=develop * windows ci test=develop	6 years ago
Leo Chen	6fb310ae29	Fix bug of getting bool Flags from os.environ (#19349 ) * fix bug of getting bool Flags from os.environ, test=develop * add empty loss_name in CompiledProgram for inplace grad test, test=develop	6 years ago
liu zhengxi	32598ffd8f	Python infer api update and add unit test (#19353 ) * python inference api supports numpy and add unit test, fix unit test fail in test_slim_int8_googlenet and test_slim_int8_mobilenet	6 years ago
Leo Chen	a9d5fc5142	Enhance OpTest to check the consistency of operators when using and not using inplace (#19101 ) * add pybind interface to get all inplace ops, test=develop * enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop * handle corner cases in op_test, test=develop * support outputs without tensor holder_, like XShape in reshape_op, test=develop * fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop * use reshape_grad instead of reshape in FlattenGradOp, test=develop * fix error debug dims info for variables like XShape, test=develop * change computational order in sum_op to relieve computation difference using inplace, test=develop * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop * follow sneaxiy's comments, test=develop * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
Tao Luo	5f5648a8ff	Revert "Python inference API support numpy (#19009 )" (#19160 ) test=develop	6 years ago
flame	b7e1a1d7e7	Python inference API support numpy (#19009 ) test=develop	6 years ago
yaoxuefeng	9150cf50fc	add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871 ) * add ctr related metric layer test=develop * add save cache and slots shuffle test=develop * add save cache and slots shuffle test=develop * fix error * fix error * fix style for ci * fix for comments * change SlotsShuffle input to std::strinf for generality * fix style * fix style * fix style * fix style * fix style * fix style * fix stylr * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * change non-const reference to pointer * fix style * fix style * fix style test=develop * fix style test=develop * add return ins num in ctr metric op * change dtype to float in metric_op.py * fix error test=develop * fix style test=develop * fix API spec * fix API spec * fix API spec test=develop * add UT test=develop	6 years ago
Zeng Jinle	88f111f885	remove unused inplace act codes, test=develop (#19079 )	6 years ago
jiaqi	a99bc64c63	add fleet util, add some interface in hdfs util (#18752 ) * add fleet util (fleet/utils/fleet_util.py): functions for users' convenience * add some interface in hdfs util : hdfs is_file、hdfs cat	6 years ago
Leo Chen	8f53735437	Fix memory overwriting of tensors returned by executor (#19030 ) * fix memory overlapping of fetch var (return of executor.run), test=develop * fix wrong usage of ParallelExecutor in op_test, test=develop * remove useless parameter and simplify code * avoid tensor destruct untimely, test=develop * add testcase independent of OpTest, test=develop	6 years ago
liuwei1031	a43a763b54	fix warpctc.dll not found issue (#18761 ) * fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop	6 years ago
flame	65d987527d	python inference enable_memory_optim(#18817 ) python inference API support enable_memory_optim	6 years ago
Zhaolong Xing	61238d31f7	Trt fp16 support (#18860 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop	6 years ago
chengduo	20859c08e8	[DyGraph] Make multi-card program faster (#18892 ) * update parallel.py test=develop	6 years ago
Zeng Jinle	8008ab4e6b	Remove legacy C++ memory optimization codes (#18834 ) * remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop	6 years ago
Thunderbrook	52c1431eee	add clear_model interface in fleetwrapper (#18815 ) * dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop	6 years ago
chengduo	292dfbce63	fix build strategy doc (#18725 ) test=develop	6 years ago
jiaqi	d18aabb472	support patch data, add load_one_table, fix bug (#18509 ) （1）support patch data （merge slots of instances of same line id, modify dense layer which changes its size）（2）add fleet load_one_table interface, support load from paddle model and load from pslib model （3）fix push sparse bug which cause push sparse cost more time（about 10% in my testcase）（4）when some slots are not in one of your network (join/update, etc.)，data feed、collect label info、push/pull sparse will skip these slots， instead of throw error. （5）add more debug info in TrainFilesWithProfiler	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Zeng Jinle	ae58afc546	Feature/auto_growth_allocator (#18561 ) * feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop	6 years ago
guru4elephant	d714bf037c	remove async executor and add data_feed.proto to the deps of train demo (#18659 ) * remove async executor and add data_feed.proto to the deps of train demo	6 years ago
123malin	b414645a65	fix #17430 : int64类型的attr训练非预期 (#18264 ) * fix int64_t * update fill constant op unittest * add empty line	6 years ago
gongweibao	c0a82748cf	Polish backwards optimizer dependency codes and use more default values. (#18255 )	6 years ago
Zeng Jinle	d3003a1620	Feature/buffer_shared_inplace (#17911 ) * feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
xsrobin	47e2ef38e9	add "import paddle.fluid as fluid" to examples lack of it	6 years ago
lujun	fd6631ef2f	Fix dygraph show style (#18297 ) Fix dygraph show style for FluidDoc.	6 years ago
tangwei12	999d9a59a5	fix communicator with pyreader (#18350 ) * add is_runnning in communicator, test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Zeng Jinle	5826b72e06	Refine CUDAPlace error message. (#18343 ) * refine cuda place error msg, test=develop * use LOG(ERROR)+exit(-1), test=develop	6 years ago
jiaqi	3f8031e256	dataset (#17973 ) (1) use channel instead of vector/BlockingQueue in Dataset，to keep same with existing implementation, and make code more readable and flexible (dataset single output channel or multi output channel). one previous memory out of limit problem is cause by not release memory after training. (2) add Record because MultiSlotType costs too much memory (80B)，fix memory out of limit problem. (3) add Channel, Archive in paddle/fluid/framework (4) change dataset from shared_ptr to unique_ptr in pybind (5) move create/destroy readers from trainer to dataset (6) move shuffle from datafeed to dataset. dataset holds memory, datafeed is only for load data and feed data to network. (7) fix thread num bug of Dataset when filelist size < thread num (8) support set_queue_num in InMemoryDataset	6 years ago
chengduo	25f3cd6486	Update execution_strategy option default value (#18183 ) * update execution_strategy option default value test=develop * fix doc error test=develop	6 years ago
Zeng Jinle	25ab23be28	Fix dygraph mem leak (#18082 ) * fix dygraph mem leak, test=develop * polish msg, test=develop	6 years ago
Sylwester Fraczek	accb132f0f	fix slim int8 mkldnn multithreading issue (#18009 )	6 years ago
tensor-tang	5c06bff222	combine noavx and avx package (#17889 ) * support avx and noavx core * add catch and give some log test=develop * fix build test=develop * add missing package test=develop * fix pybind name test=develop * fix import error test=develop * conbime noavx core test=develop * add requirements test=develop * fix unkown message test=develop * fix api spec test=develop * refine and clean test=develop * update * pass dist ut * follow comments test=develop * refine scripts test=develop	6 years ago
Jiabin Yang	4d5f6937c3	Feature/refine api for dygraph (#17907 ) * WIP * WIP * test=develop, add api doc and example code for dygraph	6 years ago
gongweibao	fbbdc9ccad	Add backward and optimizer operator dependency pass. (#17746 )	6 years ago
wopeizl	453a49b1bc	Make ParallelExecutor support Windows GPU (#17787 ) * fix the ParallelExecutor on Windows test=develop * restrict to use one GPU only under windows	6 years ago
翟飞跃	993c703bcc	INT8 MKL-DNN v2 integrate to slim (#17634 ) * refactor PR 16865 * delete mergetool files * test=develop * test=develop * test=develop * test=develop * create dir for int8 model before call SaveOptimModel * test=develop * mkldnn int8 only support linux; test=develop * refine code; test=develop * remove comment; test=develop * refine code; test=develop * fix bug; test=develop * add exception for mkldnn_post_training_strategy * reuse int8v2 CAPI dataset; test=develop * fix accuracy check bug; test=develop * remove tab * convert files to unix format * test=develop * reduce CI time;test=develop * reduce CI time and refine code;test=develop * refine comment; test=develop * add cmake FLAGS;test=develop * remove predict_num;test=develop	6 years ago
wopeizl	841553e13f	use pyreader to read data in dygraph mode (#17314 ) * use pyreader to read data * add return_list to PyReader to support return value represented as list	6 years ago
Zeng Jinle	674e0ce2d6	Use Python C-API to speed up dygraph trace (#17837 ) * use python api to reduce python time cost, test=develop * fix travis ci, test=develop * fix Py_None error,test=develop	6 years ago
Jiabin Yang	3b70f870e2	Using Smart pointer to optimizer memory usage of dyGraph (#17768 ) * for debug * test=develop, memory optimize for dygraph using shared_ptr * test=develop, fix travis ci showed error * test=develop, fix bug for recurrent usage of varbase * test=develop, init varbase when it need to be Add	6 years ago
guru4elephant	d52391094d	fix prepare context redundant code problem, optimize executor by cach… (#17743 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * cache sub_scope, program, var when use_program_cache=True is set * make fetch_list runable with variables, add more unittest for use_program_cache	6 years ago
Zeng Jinle	432ac70124	clean code of py_layer in dygraph mode,test=develop (#17661 )	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
guru4elephant	326bf8291a	add Run Prepared Ctx (#17616 ) add Run Prepared Ctx, fix pybind problem	6 years ago
flame	2280f185d7	BuildStrategy api comment (#17348 ) Python examples of fluid.layers.io.double_buffer and some BuildStrategy's methods.	6 years ago
guru4elephant	7f8bc49d00	polish_executor_and_add_ctx_cache (#17536 ) * polish_executor_and_add_ctx_cache	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
Qiao Longfei	92e7d5d7cc	fix distribute doc test=develop (#17318 ) * fix distribute doc	6 years ago
Qiao Longfei	58f7695ab2	Async exe support communicator (#17386 ) Async exe support communicator	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
Yan Xu	0217555530	polish parallel dygraph code (#17164 ) * add var grad hook test=develop	6 years ago
Jiabin Yang	d7df4e5e5b	Fix/Fix memory leak in dygraph (#17394 ) * test=develop, add gradient sort backward strategy * test=develop, fix test by add FLAGS_cudnn_deterministic on new tests * test=develop, fix memory leak in dygraph mode * test=develop, fix memory leak in dygraph mode * test=develop, polish code * test=develop, polish code * test=develop, polish code	6 years ago
Zhen Wang	4a1b7fec96	Add setting Scope function for the graph class (#17417 ) * add set_not_owned function for graph * add scope set. test=develop * add scope_ptr enforce not null before setting.test=develop	6 years ago
jiaqi	66d51206b1	add save/load model, shrink table, cvm, config file & fix pull dense bug (#17118 ) * add save/load model, shrink table, cvm, config file & fix pull dense bug test=develop * fix global shuffle bug, fix pull dense bug, fix release memeory bug, fix shrink error add client flush, add get data size test=develop * fix global shuffle bug test=develop * fix global shuffle bug test=develop * fix code style test=develop * fix code style & modify pslib cmake test=develop * fix error of _role_maker test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix code style test=develop * fix windows compile error of fleet test=develop * fix global shuffle bug * add comment test=develop * update pslib.cmake test=develop * fix fill sparse bug test=develop * fix push sparse bug test=develop	6 years ago
Tao Luo	68ec0a6f74	make parallel_executor support FLAGS_use_mkldnn (#17341 ) * make parallel_executor support FLAGS_use_mkldnn test=develop * add warning when set mkldnn_enabled_op_types_ in non-mkldnn env test=develop	6 years ago
Jiabin Yang	4624d7c642	test=develop, add gradient sort backward strategy (#17125 ) * test=develop, add gradient sort backward strategy * test=develop, fix test by add FLAGS_cudnn_deterministic on new tests	6 years ago
chengduo	bc833945a4	Add DropLocalExeScopes in ParallelExecutor (#17297 ) * reset drop local scope counter test=develop	6 years ago
qingqing01	e32c9888f5	Double backward of conv2d. (#17211 ) * Add conv2d_grad_grad_op * Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simply the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support to fetch empty variables，return None in Python.	6 years ago
lujun	e388a1fb66	Repair api example (#17221 ) Fix the following API examples: paddle.fluid.scope_guard paddle.fluid.backward.append_backward paddle.fluid.cpu_places paddle.fluid.cuda_pinned_places paddle.fluid.cuda_places paddle.fluid.in_dygraph_mode paddle.fluid.CUDAPlace paddle.fluid.CPUPlace paddle.fluid.CUDAPinnedPlace	6 years ago
chengduo	04bd413acb	Code Clean: Move all pass to paddle::framework::ir (#17228 ) * move pass to ir * polish code test=develop * fix dependency test=develop	6 years ago
Zeng Jinle	f2fa3f7300	fix api doc,test=develop (#17241 )	6 years ago
石晓伟	a72dbe9abf	Cherry-pick benchmark related changes from release/1.4 (#17156 ) * cherry-pick commit from `8877054` * cherry-pick commit from `3f0b97d` * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn (cherry picked from commit `8643dbc233`) * Cherry-Pick from 16662 : Anakin subgraph cpu support (cherry picked from commit `7ad182e16c`) * Cherry-pick from 1662, 16797.. : add anakin int8 support (cherry picked from commit `e14ab180fe`) * Cherry-pick from 16813 : change singleton to graph RegistBlock test=release/1.4 (cherry picked from commit `4b9fa42307`) * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2 Support ShuffleNet and MobileNet-v2, test=release/1.4 (cherry picked from commit `a6fb066f90`) * Cherry-pick : anakin subgraph add opt config layout argument #16846 test=release/1.4 (cherry picked from commit `8121b3eccb`) * 1. add shuffle_channel_detect (cherry picked from commit `6efdea8997`) * update shuffle_channel op convert, test=release/1.4 (cherry picked from commit `e4726a066f`) * Modify symbol export rules test=develop	6 years ago
Zeng Jinle	c5eeecca7c	Fix tensor_py.h (#17195 ) * fix tensor_py,test=develop * change class name,test=develop	6 years ago
Zeng Jinle	5dfe2ab9e8	Fix mem leak when converting Tensor to numpy array (#17182 ) * fix mem leak when converting Tensor to numpy array test=develop * remove unused unittest,test=develop * follow comments, test=develop * fix dygraph bug,test=develop	6 years ago
Yan Xu	0b07eef118	ParallelDyGraph with GPU collective mode (#16827 ) implement dygraph.parallel.DataParallel to hook reduce op.	6 years ago
guru4elephant	03d469ad98	Merge pull request #17005 from wopeizl/fix_ncclwrapper_win1 fix nccl wrapper on windows	6 years ago
liuwei1031	a770ce0615	add doc for memory_optimize, test=develop (#17010 ) * add doc for memory_optimize, test=develop * update doc, test=develop * doc update, test=develop	6 years ago
qingqing01	ea42e431f8	Speed unit testing. (#16978 ) * Speed affine_channel_op unit testing * Add check in tensor_py * Fix ONLY_CPU Compiling	6 years ago
wopeizl	51a0243a56	fix nccl wrapper on windows test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
guru4elephant	bbc6c5714f	Merge pull request #16887 from guru4elephant/add_nccl_context_pybind Add nccl context pybind	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
dongdaxiang	466d177d09	add pybind dependency test=develop	6 years ago
dongdaxiang	4aa6f679b5	add pybind dependency test=develop	6 years ago
dongdaxiang	b091139049	add nccl wrapper for python API	6 years ago
Yiqun Liu	112f16143b	Add an option to enable the cache of expected kernel in train phase. (#16724 ) * Add an option to enable the cache of expected kernel in train phase. test=develop * Change the default value of cache_expected_kernel to true.	6 years ago
chengduo	55b15db5af	Add unit test for fuse all_reduce ops (#16699 ) * test fuse all_reduce	6 years ago
Yiqun Liu	3fe8cb0dd7	Enable the runtime_context_cache pass in train phase (#16640 ) * Try to enable the runtime_context_cache pass in train phase. * Put the append of runtime_context_cache pass ahead of multi_dev passes. test=develop	6 years ago
guru4elephant	7d653f0aed	Merge pull request #16652 from xjqbest/dataset_merge_develop fix dataset bug	6 years ago
xjqbest	6a57e8075a	remove trainer_id in datafeed and dataset test=develop	6 years ago
Yan Xu	b4c3a6aa0b	[Imperative] implement imperative NCCLParallelContext (#16477 ) add NCCLParallelContext for parallel dygraph	6 years ago
xjqbest	271b7147cc	fix dataset bug test=develop	6 years ago
chengduo	b75a69bad6	Add Stream for fetch op handle (#16600 ) * expose fuse broadcast ops	6 years ago
乔龙飞 Qiao Longfei	21622ca30b	Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator Add async ssa graph executor communicator	6 years ago
sneaxiy	10249c0b78	Merge develop test=develop	6 years ago
Qiao Longfei	adf272bcec	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	6 years ago
xjqbest	9b84e8e66b	fix code style test=develop	6 years ago
xjqbest	a99c8d0c29	fix client to client communication bug test=develop	6 years ago
sneaxiy	33473890f3	Merge develop test=develop	6 years ago
dongdaxiang	720647e17f	rebase current develop and fix conflict test=develop	6 years ago
dongdaxiang	45eb6f0765	run pre-commit check files and fix code style problem test=develop	6 years ago
xjqbest	e95cafd9a7	fix code style & add dataset testcase test=develop	6 years ago
xjqbest	be74de2c61	fix code style & fix register bug & add release_memory test=develop	6 years ago
xujiaqi01	a5b1a0e12b	support multi dataset && add init model && fix bug	6 years ago
dongdaxiang	b7a202aa38	add distributed optimizer factory	6 years ago
dongdaxiang	f612877797	add incubate for unified API	6 years ago
dongdaxiang	317eb0aad3	add incubate for unified API	6 years ago
xujiaqi01	ecfc7df913	add dataset factory && fix style	6 years ago
xujiaqi01	3cea00bd52	store memory data in Dataset && fix bug	6 years ago
dongdaxiang	cc4def6ba5	fix some conflict for compilation	6 years ago
heqiaozhi	9bca1926c1	refactor & fix bug	6 years ago
xjqbest	2e9a836c6f	add DataSet and InMemoryDataFeed, support load data into memory and shuffle data	6 years ago
dongdaxiang	e36bbcc871	fix some typo and CMakefile.txt	6 years ago
xjqbest	824b84d185	add DataSet and InMemoryDataFeed, support load data into memory and shuffle data	6 years ago
dongdaxiang	be757096da	add pybind for fleet	6 years ago
Qiao Longfei	d8974e6da0	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator test=develop	6 years ago
chengduo	1096746cbf	Fuse Adam And SGD ops (#15933 ) * fuse optimizer	6 years ago
sneaxiy	2c836ff914	check default grad maker test=develop	6 years ago
Zeng Jinle	69cb9792ea	Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug Revert "Fix allocator bug"	6 years ago
chengduo	ed61d67c73	Fix the interface of Pass::Apply (#16484 ) * modify the interface of Pass::Allay test=develop * Polish code test=develop * Fix Travis CI test=develop * fix Pass::Apply interface test=develop * Fix Travis CI test=develop	6 years ago
Zeng Jinle	174d0d0b90	Revert "Fix allocator bug" add include headers to fix travis-ci test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
Zeng Jinle	644e8af4cf	Merge pull request #16424 from sneaxiy/fix_allocator_bug Fix allocator bug	6 years ago
Zeng Jinle	c7c6eeb44e	Merge pull request #16409 from sneaxiy/feature/advance_gc Enhance gc to support deleting tensor buffer in advance	6 years ago
wopeizl	c300b1ba69	Tensor index (#16223 ) * extend the slice function for python test=develop	6 years ago
Xin Pan	f8c279b11c	Merge pull request #16454 from panyx0718/imperative2 polish deepCF model to support real dataset	6 years ago
Qiao Longfei	30618409db	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator	6 years ago
chengduo	4f2278f032	Add doc for CPUPlace CUDAPlace CUDAPinPlace (#16442 ) test=develop	6 years ago
sneaxiy	78fb3a62e0	fix env variable settting bug test=develop	6 years ago
sneaxiy	2d92b6be98	merge develop test=develop	6 years ago
Xin Pan	fd24ab47ab	polish test=develop	6 years ago
sneaxiy	a7d0ac50b8	Merge develop	6 years ago
sneaxiy	7000ec85d9	fix some op grad maker fix ctest eager deletion disable bug test=develop	6 years ago
sneaxiy	f8ed2c229e	try to fix ci error test=develop	6 years ago
sneaxiy	c20db6357b	split PR test=develop	6 years ago
sneaxiy	2f54d9f995	Merge develop test=develop	6 years ago
sneaxiy	a93a9eef8f	add op registry type refine gc code test=develop	6 years ago
sneaxiy	953214ad97	add more unittest modify allocator strategy remove changes of legacy buddy_allocator test=develop	6 years ago
chengduo	f26ba5bddd	Fuse AllReduce (#15921 ) * fuse all_reduce test=develop * add fuse_parameter_groups_size test=develop * Polish code test=develop * Fix travis-ci test=develop * Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize test=develop * Add SetGroupAccordingToMemorySize test=develop * fix multi_devices_graph test=develop * reset params_grads test=develop * Polish code test=develop	6 years ago
Tao Luo	7d2740db83	Revert "cache runtime_context"	6 years ago
sneaxiy	fd23262e0c	merge develop, fix conflict test=develop	6 years ago
Qiyang Min	c7f1f3ed0c	Merge pull request #16214 from velconia/imperative_infer_var_type Implement imperative infer var type	6 years ago
Tao Luo	dbb92ee4b1	Merge pull request #16002 from luotao1/runtime_context cache runtime_context	6 years ago
sneaxiy	161b8ddcaa	Merge develop	6 years ago
minqiyang	b40e41fbd1	Polish code style test=develop	6 years ago
Qiyang Min	8e4ad008fb	Merge pull request #16198 from velconia/imperative_train_speed Improve imperative mode training speed	6 years ago
minqiyang	36dce65bb3	Take DataType and VarType apart test=develop	6 years ago
minqiyang	438bca9c3d	Implement Runtime Var Type Inference test=develop	6 years ago
luotao1	1b59bed989	Merge branch 'develop' into runtime_context	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago
minqiyang	7355d41834	1. Add imperative gperf profiler 2. Add binutils 2.27 in manylinux support test=develop	6 years ago
luotao1	b2898c0f57	Merge branch 'develop' into runtime_context test=develop	6 years ago
minqiyang	98dfb492bb	Release GIL lock	6 years ago
sneaxiy	ac0e0f5181	merge develop test=develop	6 years ago
minqiyang	42e96a029f	Accelerate CPU part	6 years ago
sneaxiy	682f2dbf29	merge develop test=develop	6 years ago
sneaxiy	2c4fcaa683	merge develop	6 years ago
luotao1	d94fd97230	add runtime_context_cache_pass test=develop	6 years ago
Yan Xu	30568473ec	fix broadcast on mp mode (#15951 ) * fix broadcast with mp mode * polish code test=develop * fix bcast strategy test=develop * fic cpplint test=develop * fix py3 failed test=develop * fix comment test=develop * update comment test=develop	6 years ago
baojun	e3c37bd564	remove const_cast and refactor ngraph engine code (#15925 ) * remove concast_cast and refactor code test=develop * reduce flag use test=develop	6 years ago
Zhen Wang	ac6ef06ffa	Add the Clone method in Graph. test=develop	6 years ago
Zhen Wang	01eddf125c	Not add graph copy construction method. test=develop	6 years ago

... 2 3 4 5 6 ...

776 Commits (39d5bb6dce23d4bd9c1ef47ad6f2c5e50dba516a)