Paddle

Commit Graph

Author	SHA1	Message	Date
Jacek Czaja	ecd9f330c9	[MKL-DNN] Fix to face model on AVX512 platforms (#19282 ) - Refactor step 1 - Compilation fix - Yet another compilation fix - Even more compilation fix - Lint fixes test=develop - Removed deprectaed PADDLE_ENFORCE occurance test=develop - Candidate fix to BN forward - Lint fixes test=develop - Refactoring in data_layout_transform - compilation fix - Another comppilation fix - Step further into darkness - Yet another compilation fix - Yet another compilation fix - missing header - compilation fix - Added MKLDNN -> Paddle conversion in fetch op test=develop - Compilation fix test=develop - Lint test=develop - Mul fix - Fix to MKLDNN MUL op and Elementwise MUL UT test=develop - Workaround for diffrent weights with groups representation Paddle vs MKL-DNN. test=develop - Candidate fix for 5D convolution with groups - Refactor of fix for conv3d and conv2d in fetch op test=develop - Compilation fix - Still same compilation fix - Compilation fix - Compilation fix - Reverted refactoring of fixes - Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format not NHWC test=develop - minor fix in UT test=develop - Lint fixes test=develop	6 years ago
liuwei1031	d6cb1a4122	add dynamic C runtime support on windows, test=develop (#19502 )	6 years ago
Zeng Jinle	c2c5b1b941	remove signal raise msg, test=develop (#19527 )	6 years ago
Zeng Jinle	caf59d0f3f	Add signal message to stderr (#19421 ) * add signal message to stderr, test=develop * add unittests for ugly SignalHandle, test=develop	6 years ago
Yi Liu	efb05ba258	supports multiple NCCL communicators preserved in NCCLCommContext (#19407 ) * supports multiple NCCL communicators preserved in NCCLCommContext test=develop * add ut for c_comm_init_all operator and fix cuda resource release problem test=develop	6 years ago
wopeizl	b8aa37d529	save the callstack information to file when exception throws test=dev… (#19324 ) * save the callstack information to file when exception throws test=develop	6 years ago
Tao Luo	6527a7df67	replace part of PADDLE_ASSERT to PADDLE_ENFORCE (#19285 ) * replace part of PADDLE_ASSERT to PADDLE_ENFORCE test=develop * remove unused fallback_alloc_size_ * add unit-test of CUDAPinnedAllocator test=develop	6 years ago
Yihua Xu	b920395842	Use sparse matrix to implement fused emb_seq_pool operator (#19064 ) * Implement the operator with sprase matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implematation when using no-linux. test=develop * Ignore the deprecated status for windows test=develop	6 years ago
Zeng Jinle	91a0911ca3	Make PADDLE_ENFORCE_EQ support types that cannot be converted to std::string (#19243 ) * make PADDLE_ENFORCE_EQ support cannot to string types, test=develop * follow huihuang's comments, test=develop	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224 )	6 years ago
Zeng Jinle	002f325dcd	add PADDLE_ENFORCE_CUDA_SUCCESS, test=develop (#19211 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
wopeizl	80b7ef6fc8	add tensorrt support for windows (#19084 ) * add tensorrt support for windows	6 years ago
Zhang Ting	c2063217e7	optimize error message for "embedding" and "cross_entropy" OP (#18765 ) * optimize error message, test=develop * optimize error message, test=develop	6 years ago
liuwei1031	a43a763b54	fix warpctc.dll not found issue (#18761 ) * fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop	6 years ago
Zeng Jinle	08fa98f7cc	Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950 ) * fix gpu_info, test=develop * fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop * fix bug again for reserving size, test=develop	6 years ago
Jacek Czaja	5cf2d38594	- Removed passing X from FWD to GRAD via device context (#18911 ) test=develop - Extracted key generation from FWD and GRAD into separate function test=develop - Compilation fix test=develop - another compilation test=develop	6 years ago
Huihuang Zheng	ea6ee76fa9	GPU allocation uses fraction of available memory (#18896 ) GPU allocation uses fraction of available memory, also fix the GetUsed without lock	6 years ago
Jacek Czaja	cfcb96d2df	[MKL-DNN] Fix int8 performance regression (#18758 ) test=develop - optimization of TID to string test=develop	6 years ago
Huihuang Zheng	0d3f16f53e	Try to modify external gflags to solve CI compilation (#18872 )	6 years ago
Huihuang Zheng	cfce4994cf	Merge cuda 9/10 dockerfile with root dockerfile (#18693 ) Also fix a dependency error which may cause compile error	6 years ago
lidanqing	9ecd8ee789	change ComputeINT8 to template version to remove checking dst_datatype code (#18756 ) * change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews * reverse user_residual_memory_p and user_bias_memory_p declaration scope test=develop	6 years ago
Jacek Czaja	95c1816ec0	[MKL-DNN] Extended LRN with reusing via Acquire API (#18675 ) test=develop - compileation fix - Yet another compilation fix - Even yet another compilation fix - Surprise! Again compilation fix - lint fixes test=develop - Fix to workspace acquire of LRN test=develop - Fix to hash of BWD LRN test=develop - fix to lrn BWD PD acquire test=develop - Fixing LRN PD creation test=develop - cosmetic fix in comment test=develop - Fixes after review test=develop	6 years ago
chengduo	fd3aad6cb3	Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664 ) * support sparse gradients test=develop	6 years ago
Jacek Czaja	0d8e6c9b8b	MKL-DNN upgrade to 0.20 (#18370 ) test=develop	6 years ago
zhouwei25	772e09560e	Optimize the content of error reporting information, print error code and official document web sites (#18671 ) optimize the error reporting information of cuda related API index on develop: 130ac17 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop	6 years ago
Zeng Jinle	ae58afc546	Feature/auto_growth_allocator (#18561 ) * feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop	6 years ago
liuwei1031	759530966c	print out error code of cudaGetDeviceProperties if failed (#18643 )	6 years ago
Jacek Czaja	71d883b8ef	[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585 ) * - Added partial draft of pooling acquire - Workspace support - compilation fix - Added draft of pooling backward reimplementation - Segfault fix - reverted 'any' for diff_dst crewation in pooling - Lint fixes test=develop - lint fixes test=develop - Further lint fixes test=develop * - Fixes after review test=develop * - Lint fixes test=develop * - Even more lint fixes test=develop	6 years ago
Tao Luo	076f833110	add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580 ) * add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop	6 years ago
gongweibao	c0a82748cf	Polish backwards optimizer dependency codes and use more default values. (#18255 )	6 years ago
Zeng Jinle	be24e5b391	Clean unused code of dim and place (#18565 ) * clean code of dim and place, test=develop * fix failed unittests, test=develop	6 years ago
Jacek Czaja	8869d7f735	Activations MKLDNN ops refactoring (#18191 )	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Physher	0caa08ea40	Add mkldnn int8 mul-op kernel (#17834 )	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
chengduo	55baeceddb	Enhance execution error info (#18482 ) * enhance execution error info test=develop	6 years ago
Tao Luo	3f3112ceb0	add shape_blob for cache mkldnn primitive (#18454 ) test=develop	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Yi Liu	a873fa84ce	supports collective training with programs (#18392 ) 1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis	6 years ago
Brian Liu	4bc2987d2f	Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964 ) * Fix bug in quantize kernel which cause crash in vgg16/19 model test=develop * refine the code to reduce verbose code; test=develop * remove useless code; test=develop	6 years ago
Leo Zhao	681d3553f1	Fix potential mkldnn concat/pool/conv kernel issues (#18393 ) 1. some key generation method is not aligned with PR#17965 2. enlarge ptr lifetime to avoid memory release if SetBlob fails otherwise it will get core dump. test=develop	6 years ago
HaoRen	9931bc64f5	add dependecy of collective_helper (#18365 ) * add dependecy of collective_helper * test=develop fix dependecy of collective_helper	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
Jacek Czaja	c2efdfd5bc	[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146 ) * - Reusing of reuder used in elementwise_add_mkldnn - Added MKL-DNN sum prim reusing test=develop - Compilation fixes test=develop - Yet another compilation fix test=develop - Yet another compilation fix test=develo - Yet another linking fix test=develop - Final compilation fix test=develop - lint fixes test=develop - Lint fixes test=develop * - Fixes after review test=develop	6 years ago
chengduo	4978db2c10	Remove nccl dep when the number of GPU is 1 (#18158 ) * remove nccl dep when the number of GPU is 1 test=develop	6 years ago
gongweibao	f5caf3443c	Fix reinitialized ncclid error! (#18025 )	6 years ago
Jacek Czaja	84bb45c054	[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965 ) * - removed is_reusing_ * - Added TID to keys for reusing apart from softmax PD * - compilation fix * - Yet another compilation fix * - Batch Norm and Conv adapted * - Fix to softmax MT * - Fixes to MT code of MKL-DNN * - Lint fixes test=develop	6 years ago
hutuxian	969e6378b9	Pipeline Concurrency (#17402 ) Add Pipeline Concurrency Train Mode: - Cpp: pipeline_trainer & section_worker - Python: PipelineOptimizer - Add a new data_feed type: PrivateInstantDataFeed - Add a test demo of pipeline trainer and the test model is gnn - Do not support win32 now	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
Zeng Jinle	3925bd81e8	Fix cuda/cudnn version detection error (#17853 ) * fix cuda/cudnn version detection error, test=develop * fix again, test=develop	6 years ago
chengduo	d1169afaa3	remove InstallFailureSignalHandler (#17828 ) test=develop	6 years ago
Leo Zhao	50326563d5	enable mkldnn primitive reuse for platform reorder (#17826 ) test=develop	6 years ago
wangchaochaohu	c10157a5df	revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753 ) * revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop	6 years ago
chengduo	863c75168c	polish error doc (#17772 ) test=develop	6 years ago
gongweibao	0d561ef442	fix 2dconn test=develop (#17681 )	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
mozga-intel	f2694e122d	[NGraph] Enable assign operator for a ngraph, test=develop (#17437 ) * Enable assign operator for a ngraph, test=develop * Cross_entropy operators needs to be updated	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
mozga-intel	109b5aed5a	[NGraph] Enable reshape operator test=develop (#17512 )	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Benchmark (FPS) measured on skx-8180 (28 cores): batch size 1: 214.7 with fusion, 53.4 without fusion; batch size 50: 1219.727 with fusion, 137.280 without fusion. test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
qingqing01	97f0ec2357	Fix compiling error with cuDNN 5.1 (#17458 ) test=develop	6 years ago
Zeng Jinle	eab34b2df6	fix_dygraph_mem_leak, test=develop (#17396 )	6 years ago
qingqing01	e32c9888f5	Double backward of conv2d. (#17211 ) * Add conv2d_grad_grad_op * Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simply the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support to fetch empty variables，return None in Python.	6 years ago
zhaoyuchen2018	792443ef23	Refine elementwise kernel. (#16952 ) * Refine elementwise kernel. Add a simple cuda kernel if grad x and y both exist Use 2D block cuda kernel to do broadcast. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
chengduo	db5e74ab95	update assert (#17282 ) test=develop	6 years ago
baojun	7bd1d03ee5	Adding lrn op for ngraph engine (#17189 ) * added lrn op test=develop * Added CreateConstant method test=develop * avoid duplicates test=develop	6 years ago
Tao Luo	ff1661f12a	remove unused FLAGS_warpctc_dir (#17162 ) * remove unused FLAGS_warpctc_dir test=develop * remove FLAGS_warpctc_dir test=develop	6 years ago
Huihuang Zheng	e4a5332416	Fix a typo in gpu_info.cc (#17175 ) test=develop	6 years ago
Huihuang Zheng	b9494058b3	Use CudnnWorkspaceHandle in exhaustive search (#17082 ) 1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop	6 years ago
Zeng Jinle	0c335dcd2c	Make conv cudnn workspace size configurable (#17036 ) * make_conv_cudnn_ws_size_configurable, test=develop * change std::max to std::min test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
xuezhong	742d758747	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_infershape_bug2	6 years ago
xuezhong	5663fbfb0a	fix infershape bug test=develop	6 years ago
Jacek Czaja	87a44b1149	[MKL-DNN] Added reusing of primitive descriptors (fp32) (#16667 ) * - Reuse of conv PD - conv transpose pd reused - Added PD reusing of softmax and Batch Norm - Refactoring and removal of not needed routines of mkl-dnn ops test=develop - Fix to reusing conv test=develop - Lint fixes test=develop - Further lint fixes test=develop - Lint fixes test=develop - lint fixes test=develop - Lint workaround test=develop * - Fix after review on including boost as third party header test=develop * - Fix after review. Name change to something more descriptive test=develop	6 years ago
dongdaxiang	a659b37ace	make lodtensor_printer usable in gpu setting test=develop	6 years ago
Chen Weihang	0b2aec14b6	Revert "Model data cryption link all lib (#16555 )" test=develop This reverts commit `c38c7c5619`.	6 years ago
Chen Weihang	c38c7c5619	Model data cryption link all lib (#16555 ) * link the libwbaes.so into paddle * polish detail, test=develop * try fix mac_pr_ci error, test=develop * add compile option, test=develop * fix ci error, test=develop * ignore failed to find mac lib, test=develop * change cdn to bj, cdn can't get the latest version * trigger ci, test=develop * temporary delete win32 lib linking, test=develop * change https to http, test=develop * turn compile option on to off * turn compile option off to on, test=develop * try lib compiled by gcc4.8, test=develop * update lib version, test=develop * link other lib, test=develop * add setup config * delete false, test=develop * delete no_soname, test=develop * recover so name set * fix, test=develop * adjust make config, test=develop * remove link to wbaes, test=develop * remove useless define, test=develop	6 years ago
guru4elephant	76b49f02ee	Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop Train with pipe reader merge develop	6 years ago
gongweibao	fea91164b7	Fix windows compilation error! (#16546 ) * fix compiled test=develop * follow comments test=develop	6 years ago
dongdaxiang	3a79be6eb3	refine API spec test=develop	6 years ago
dongdaxiang	98dda08a85	fix pull sparse slow problem test=develop	6 years ago
dongdaxiang	93c3c7f9b3	fix dataset testcase problem test=develop	6 years ago
dongdaxiang	d739bab844	fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem test=develop	6 years ago
dongdaxiang	e3107a6ae0	fix windows compile problem test=develop	6 years ago
dongdaxiang	398004ece0	disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer test=develop	6 years ago
dongdaxiang	39362a8415	move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids test=develop	6 years ago
dongdaxiang	a0b59773af	fix code style	6 years ago
dongdaxiang	365be5d559	support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem test=develop	6 years ago
dongdaxiang	dc8cf36e4b	add more example on datagenerator test=develop	6 years ago
dongdaxiang	6bf796df14	refine print fetch list	6 years ago
dongdaxiang	cf1360643f	add printer for fetch variable	6 years ago
Jacek Czaja	2632327429	[MKL-DNN] Tensor modifications revert (#16462 ) * Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)" This reverts commit `13816dd4ac`. Apart from enabling transformer for MKL-DNN * Revert "- MKL-DNN pooling updated to set_prim_desc" This reverts commit `c63f6b2039`. Conflicts: paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc * Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)" test=develop This reverts commit `dec9cf53c8`. * - concat compilation fix - lint test=develop - Lint fixes test=develop - Lint fixes test=develop - Fix Transpose MKLDNN op test=develop	6 years ago
Zeng Jinle	69cb9792ea	Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug Revert "Fix allocator bug"	6 years ago
sneaxiy	5656fa9f7c	fix travis ci test=develop	6 years ago
Zeng Jinle	174d0d0b90	Revert "Fix allocator bug" add include headers to fix travis-ci test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
Zeng Jinle	644e8af4cf	Merge pull request #16424 from sneaxiy/fix_allocator_bug Fix allocator bug	6 years ago
nhzlx	953bdde058	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD test=develop	6 years ago
sneaxiy	2d92b6be98	merge develop test=develop	6 years ago
Zeng Jinle	c64d959343	Merge pull request #16295 from zhhsplendid/zhenghuihuang-dev-2 Add support for init_memory and re-allocate_memory	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
sneaxiy	953214ad97	add more unittest modify allocator strategy remove changes of legacy buddy_allocator test=develop	6 years ago
Wu Yi	b7baeed7bb	fix win gpu build test=develop (#16334 )	6 years ago
zhhsplendid	124f1df481	Add flags for init and re-alloc gpu test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
Wu Yi	6382b62f6b	Collective ops (#15572 ) * wip allreduce in op * wip * wip * wip * wip adding test * wip for conflict with mp mode * fix tests test=develop * fix cpu build test=develop * fix travis clang format test=develop * fix cpu build test=develop * update api.spec test=develop * delete comment test=develop * fix cpplint test=develop * fix test=develop * follow comment test=develop * add file test=develop * fix build test=develop * update test=develop * to be compatible with sync_bn, and fix mp mode in develop test=develop	6 years ago
zhhsplendid	22715487dc	add allocator flags test=develop	6 years ago
sneaxiy	fd23262e0c	merge develop, fix conflict test=develop	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: `build_strategy = fluid.BuildStrategy()`; `build_strategy.sync_batch_norm = True`; `binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(loss_name=loss_mean.name, build_strategy=build_strategy)`	6 years ago
sneaxiy	682f2dbf29	merge develop test=develop	6 years ago
sneaxiy	2c4fcaa683	merge develop	6 years ago
chengduo	0979956619	Add memory profiler (#16137 ) test=develop	6 years ago
chengduo	ad80bde824	Revert "Revert "Add Event for TensorCopy"" (#16035 ) * Revert "Revert "Add Event for TensorCopy" (#16022)" This reverts commit `e2da3a5b22`. * use default stream test=develop	6 years ago
sneaxiy	2a639d5c2a	add allocator chain to fix bug test=develop	6 years ago
chengduo	e2da3a5b22	Revert "Add Event for TensorCopy" (#16022 ) * Revert "Add Event for TensorCopy (#15953)" This reverts commit `7235fd662b`. test=develop * fix CI test=develop	6 years ago
chengduo	7235fd662b	Add Event for TensorCopy (#15953 ) Add Event for TensorCopy	6 years ago
Tao Luo	4efdebc6f6	Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt Optimize gelu operation with mkl erf	6 years ago
dzhwinter	225c11a91f	polish cudnn related code and fix bug. (#15164 ) * staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop	6 years ago
xiaolil1	6724be2b0d	INT8 Pool kernel Key Creation Optimization. (#15883 ) * Optimize key creation of INT8 pool kernel to improve the peformance of ResNet-50 and MobileNet, especially for latency. test=develop * Optimize key creation of pool fp32 grad. test=develop	6 years ago
Yihua Xu	7396788694	Optimize gelu operation with mkl erf. test=develop	6 years ago
peizhilin	c6472579c0	test=develop	6 years ago
peizhilin	b5d6e38b05	fix build issue for cudaEvent_t test=develop	6 years ago
wopeizl	3ccd8964a4	Merge pull request #15905 from wopeizl/win/fix_eigen fix build issue on windows for sample prop op	6 years ago
chengduo	8e904d322f	Remove unnecessary dependence for profiler (#15899 ) * refile profiler test=develop * follow comment test=develop	6 years ago
Xin Pan	44e7fcddc5	Merge pull request #15844 from panyx0718/infer add per kernel config and remove const_cast.	6 years ago
Jacek Czaja	dec9cf53c8	[MKL-DNN] MKL-DNN specific Tensor modification (#15429 ) * - Implemented draft of primitive desc keeping in Tensor test=develop - TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented - Added nchw and nc formats setting for sake of compatiblity Fixed unit tests - Worakaround to problem with 5D data in conv - Added 3D and 1D MKL-DNN formats for name handles for tensor test=develop - Fix to UTs test=develop - Conv fp32 op was updated Cosmetic fixes test=develop - tensor mkldnn cosmetics test=develop - Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils * - Lint fixes test=develop * - setting prim dec in Tensor , sets also layout to kMKLDNN test=develop * - Moved creation of prim desc totally out of Tensor test=develop * - Cosmetic fixes adter review test=develop	6 years ago
peizhilin	6ccdb1b947	fix build issue on windows for sample prop op test=develop	6 years ago
Dun	c6bd434ffe	add memset CUPTI && test=develop (#15868 )	6 years ago
Sylwester Fraczek	74672d1aff	Change (smart_ptr.get()) -> smart_ptr reason: dereferencing smart pointer is the same as the underlying pointer test=develop	6 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	6 years ago
chengduo	3b08c9abf4	enhance profiler (#15842 ) test=develop	6 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop	6 years ago
Tao Luo	e3dd6970fc	disable dam temporarily (#15860 ) test=develop	6 years ago
Dun Liang	35a90e06bf	test=develop	6 years ago
Dun Liang	c9080f516b	test=develop	6 years ago
Dun Liang	1c7bb0e40c	test=develop	6 years ago
Xin Pan	5eb87506bc	add per kernel config and remove const_cast. test=develop	6 years ago
Dun	a83e470405	Profiler refine and add CUDA runtime api tracer (#15301 ) * refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop	6 years ago
mozga-intel	13ec2d331b	Enable momentum operator for a ngraph engine (#15673 ) * Enable momentum operator for a ngraph engine test=develop * Update tests test=develop * Unnecessary line of the code as intended was removed test=develop	6 years ago
Tao Luo	c797a1f050	remove legacy any.cmake	6 years ago
Tao Luo	bd2fa73620	Merge pull request #15794 from sneaxiy/fix-warnings Fix compile warning	6 years ago
tensor-tang	e1c707fe9c	fix warnings (#15790 ) * fix warnings test=develop * fix enforce test test=develop	6 years ago
sneaxiy	9b8e0e2f17	fix enforce_test test=develop	6 years ago
sneaxiy	209b355762	fix many warning test=develop	6 years ago
Zeng Jinle	fc87ef741b	Merge pull request #15687 from sneaxiy/fix_enforce fix enforce	6 years ago
sneaxiy	f0590947c3	fix enforce test=develop	6 years ago
tensor-tang	31fd8ce1e1	Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator Enable batch_norm operator for a ngraph engine	6 years ago
dzhwinter	04e9776aef	add details. test=develop	6 years ago
mozga-intel	1198ccae6b	Enable batch_norm operator for a ngraph engine test=develop	6 years ago
peizhilin	883d22093a	fix the lib_any dependency test=develop	6 years ago
wopeizl	3614dadf23	Merge pull request #15631 from wopeizl/windows/fixci fix ci broken randomly and disable some warnings	6 years ago
peizhilin	061299be87	fix dependency test=develop	6 years ago
baojun	ac4cde009d	Enable accuracy op for ngraph engine (#15592 ) * Added accuracy ngraph op test=develop * fixed name type test=develop	6 years ago
dzhwinter	ce0394bcd0	merge develop branch. test=develop	6 years ago
guoshengCS	b6c3b69af8	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size test=develop	6 years ago
liuwei1031	6e84eb131f	expose peak gpu memory API to python test=develop (#15529 ) * expose peak gpu memory API to python test=develop * add unittest for peak gpu memory monitoring test=develop * add pybind change test=develop * add mutex to gpu mem usage monitor test=develop * update benchmark flag definition file test=develop * tweak unittest for memory monitoring test=develop	6 years ago
guoshengCS	5dfce93101	To make CUDA_LAUNCH_KERNEL_HELPER support large size. test=develop	6 years ago
tensor-tang	8117725852	add jit kernel hsum, hmax and softmax refer code test=develop	6 years ago
sneaxiy	ba4f43fd62	fix compile error in distributed mode test=develop	6 years ago
Yiqun Liu	3008fa1261	Add the CUDA kernel for beam_search op (#15020 ) * Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the seletced items of beam_search_op's CPU kernel sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop	6 years ago
Zeng Jinle	2480a3df7d	Merge pull request #15496 from sneaxiy/lazy_allocator2 Fix bug when user set CUDA_VISIBLE_DEVICES be empty and run CPU-only models	6 years ago
sneaxiy	9c360cc798	test=develop	6 years ago
Xin Pan	58cb18d9d9	Merge pull request #15322 from velconia/imperative_resnet Imperative Resnet	6 years ago
sneaxiy	51227bd447	lazy_allocator test=develop	6 years ago
tangwei12	8b50ad80ff	checkpoint at distributed training (#14854 ) checkpoint for distributed training.	6 years ago
minqiyang	8ce198b2e1	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet test=develop	6 years ago
minqiyang	315b133e67	Add single GPU support to imperative	6 years ago
tensor-tang	3759c1db8c	Merge pull request #14805 from mozga-intel/mozga-intel/element_wise_operator_ngraph Enable element_wise_add operator for a ngraph engine	6 years ago
peizhilin	eea75a1d93	fix issue when type is invalid test=develop	6 years ago
peizhilin	9adb158e5b	Merge remote-tracking branch 'upstream/develop' into debug/support	6 years ago
chengduo	46d01d798e	Revert "Revert "Remove workspace_handle in conv_cudnn (#15186 )"" (#15290 ) test=develop This reverts commit `358e657f68`.	6 years ago
Wojciech Uss	cb2ba58458	Fix performance drop when with MKL-DNN test=develop	6 years ago
chengduozh	c4eced9881	fix thread safe bug test=develop	6 years ago
chengduozh	358e657f68	Revert "Remove workspace_handle in conv_cudnn (#15186 )" test=develop This reverts commit `064512aa47`.	6 years ago
wopeizl	5d9edb4124	Merge pull request #15156 from wopeizl/windows/fixgpuissue fix gpu buils issue on windows test=develop	6 years ago
chengduo	064512aa47	Remove workspace_handle in conv_cudnn (#15186 ) * remove workspace_handle in conv2d_cudnn test=develop * remove workspace_handle test=develop * fix bug test=develop * make test_conv2d_op SERIAL test=develop * save memory in conv_cudnn test=develop * enhance thread safety test=develop * enhance temporary allocator test=develop * Add excess fraction test=develop * follow comments test=develop * fix bug and code refine test=develop * fix memory size check test=develop * rename reuse_tmp_allocation_excess_fraction test=develop	6 years ago
xiaolil1	8f17c714de	Conv int8 residual (#15145 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Enable INT8 Conv with residual fusion OP test=develop * Modify code. test=develop * Modify basic INT8 Conv test=develop * Modify Conv. test=develop * fix style test=develop * Fix style test=develop * Fix test test=develop * Modify code. test=develop * Fix test test=develop	6 years ago
peizhilin	439691f5bd	adjust the shlwapi on windows test=develop	6 years ago
peizhilin	92da467c99	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	6 years ago
peizhilin	c1235c935f	add the enable_debug flag test=develop	6 years ago
Zeng Jinle	e29f10d315	Merge pull request #15207 from sneaxiy/remove_op_handle_lock_and_fix_var Remove op handle lock and fix var	6 years ago
mozga-intel	a42f8f4f6f	Enable element_wise_add operator for a ngraph test=develop	6 years ago
Zeng Jinle	c562be20d9	Merge pull request #15193 from sneaxiy/fix_cudnn_compatible_check Fix cudnn compatible check	6 years ago
peizhilin	1cd95d8a0b	use thread local instance test=develop	6 years ago
sneaxiy	ed409ac9f4	Revert "Revert "Remove op handle lock"" test=develop	6 years ago
peizhilin	d54133ea85	not include the numeric under linux test=develop	6 years ago
peizhilin	a6f5ceee74	add the python callstack for debug support test=develop	6 years ago
Zeng Jinle	dacfaaa966	Revert "Remove op handle lock" test=develop	6 years ago
xiaolil1	c8f101e5da	Conv int8 relu (#15130 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Modify basic INT8 Conv test=develop * fix type test=develop * Modify test test=develop	6 years ago
sneaxiy	9793a0b6a6	fix_cudnn_compatible_check	6 years ago
Zeng Jinle	ccb322d6a5	merge develop	6 years ago
Zeng Jinle	f3a13512fc	Merge pull request #15139 from sneaxiy/remove_op_handle_lock Remove op handle lock	6 years ago
xiaolil1	bbc9336878	Enable basic MKL-DNN INT8 Conv OP (#15124 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Modify basic INT8 Conv test=develop	6 years ago
peizhilin	c919b2f31d	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	6 years ago
peizhilin	fd4f4d0e5f	fix build issue test=develop	6 years ago
Yan Xu	a1e60ab19b	Merge pull request #14791 from Yancey1989/parallel_graph_mode [Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance	6 years ago
peizhilin	9ae50dd07d	fix gpu buils issue on windows test=develop	6 years ago
sneaxiy	d0a8a1e950	remove_op_handle_lock test=develop	6 years ago
Yancey1989	e65436103f	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	6 years ago
sneaxiy	6f06e6cdac	Merge remote origin test=develop	6 years ago
Xin Pan	9186451f60	hide GetTensor test=develop	6 years ago
sneaxiy	d25395fc98	remove tensor core lock test=develop	6 years ago
Yancey1989	82b42e31f0	polish unittest test=develop	6 years ago
Yancey1989	0a885ac12a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	6 years ago
peizhilin	813c2ce539	fix timer test=develop	6 years ago
wopeizl	7ab501264d	Merge pull request #15069 from wopeizl/windows/dsosupport add cuda dso support for windows	6 years ago
guru4elephant	ff739449ab	Merge pull request #15018 from guru4elephant/add_timer Add debug thread function for async executor	6 years ago
Yancey1989	4743c9cd5d	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
wopeizl	719ebe3786	Merge pull request #15070 from wopeizl/windows/testcasefix fix test issues on windows	6 years ago
Qiyang Min	0238a3bb4f	Merge pull request #14972 from velconia/accelerate_lstm Accelerate PADDLE_ENFORCE	6 years ago
Yancey1989	86bb583881	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
peizhilin	01c00b07dd	fix test issues on windows test=develop	6 years ago
peizhilin	1e7f83e60a	add cuda dso support for windows test=develop	6 years ago
Yancey1989	41a64f6a2a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
Wu Yi	856f0da0fe	Fp16 training (#14992 ) * wip * wip * wip * wip for test * add fp16 tests test=develop * fix cpu build test=develop * fix test=develop * fix py3 tests test=develop * fix lr_scheduler dtype test=develop * fix test=dvelop * test fix ci compile test=develop * fix build and merge test=develop * fallback momentumop change to general test=develop * make fp16 lr schedule simple test=develop * fix ut test=develop * fix tests test=develop * remove fp16 learning rate cast test=develop	6 years ago
chengduo	b9fb03cf54	Move GetTensor to tensor_util (#15011 ) * refine tensor test=develop * refine tensor test=develop * fix device_context log test=develop	6 years ago
dongdaxiang	ab2abfc5b2	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	4cb833d2de	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
tensor-tang	f0e02a65ed	Merge pull request #14974 from xiaolil1/quantize Add Quantize OP	6 years ago
dongdaxiang	68a2d1f3d7	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer add timer_test test=develop	6 years ago
dongdaxiang	2e5ebc4594	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	5dfd9c9aa9	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	d0a5159946	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	f9b8168508	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
minqiyang	52b4821a6e	Fix Sprintf problem test=develop	6 years ago
minqiyang	010f657b33	Polish code test=develop	6 years ago
minqiyang	45acfbd011	1. Add specific condition for one or no arg in PADDLE_ENFORCE 2. Add unit test for new enforce feature test=develop	6 years ago
dongdaxiang	2dee8f6cd5	add TrainFilesWithTimer in async_executor	6 years ago
xiaoli.liu@intel.com	d83d0f33fd	extract templated function test=develop	6 years ago
wopeizl	b117a5f208	Merge pull request #14931 from wopeizl/windows/mkl add mkl support for windows	6 years ago
dongdaxiang	cf6188a823	add a linux timer	6 years ago
chengduo	79bd6dfa18	[Feature] Add Temporary Allocator (#14875 ) * Add Temporal Allocator * add Temporay Allocator to DeviceContext test=develop * code refine test=develop * fix mean_iou test=develop * Add DeviceTemporaryAllocator test=develop * fix conv_op bug test=develop * small fix test=develop * code refine test=develop * log refine test=develop * fix unit test test=develop * move double check * refine concat_and_split test=develop * add limit_of_temporary_allocation test=develop * fix name test=develop	6 years ago
minqiyang	e4719eb462	Fix bug in Windows VC 2010 test=develop	6 years ago
minqiyang	5a5c577529	Polish code test=develop	6 years ago
minqiyang	099186cd41	Support one argument PADDLE_ENFORCE test=develop	6 years ago
minqiyang	4af97c6946	Polish code	6 years ago
minqiyang	41b81293ab	Polish code test=develop	6 years ago
peizhilin	9e60c58666	Merge remote-tracking branch 'upstream/develop' into windows/mkl test=develop	6 years ago
minqiyang	bc66401566	Polish code test=develop	6 years ago
minqiyang	53619a79b4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm	6 years ago
peizhilin	b06ce129bc	some not so useful adjust test=develop	6 years ago
minqiyang	679d1a9e0b	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm	6 years ago
Jacek Czaja	709d9e3cb7	- Added reusing MKL-DNN primitives for Transpose MKL-DNN op test=develop	6 years ago

947 Commits (6e5670b8bdb9a4c62a98b69ea6fe33b6ed38065b)