Paddle

Commit Graph

Author	SHA1	Message	Date
hutuxian	969e6378b9	Pipeline Concurrency (#17402 ) Add Pipeline Concurrency Train Mode: - Cpp: pipeline_trainer & section_worker - Python: PipelineOptimizer - Add a new data_feed type: PrivateInstantDataFeed - Add a test demo of pipeline trainer and the test model is gnn - Do not support win32 now	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
Zeng Jinle	3925bd81e8	Fix cuda/cudnn version detection error (#17853 ) * fix cuda/cudnn version detection error, test=develop * fix again, test=develop	6 years ago
chengduo	d1169afaa3	remove InstallFailureSignalHandler (#17828 ) test=develop	6 years ago
Leo Zhao	50326563d5	enable mkldnn primitive reuse for platform reorder (#17826 ) test=develop	6 years ago
wangchaochaohu	c10157a5df	revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753 ) * revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop	6 years ago
chengduo	863c75168c	polish error doc (#17772 ) test=develop	6 years ago
gongweibao	0d561ef442	fix 2dconn test=develop (#17681 )	6 years ago
gongweibao	65bbf950ee	Add multi-ncclcomm and 2D ncclallreduce support. (#17263 )	6 years ago
wopeizl	6724a652f3	add __str__ method for tensor and lodtensor to support print test=dev… (#17588 ) * add __str__ method for tensor and lodtensor to support print test=develop	6 years ago
mozga-intel	f2694e122d	[NGraph] Enable assign operator for a ngraph, test=develop (#17437 ) * Enable assign operator for a ngraph, test=develop * Cross_entropy operators needs to be updated	6 years ago
Zeng Jinle	c6189637cd	Fix allocator bug (#16712 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop	6 years ago
mozga-intel	109b5aed5a	[NGraph] Enable reshape operator test=develop (#17512 )	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As mkldnn supports the conv/relu6 fusion, we implement the fusion via cpass. Because int8 support for this fusion will arrive in MKLDNN v0.20, this PR focuses on the fp32 optimization. Benchmark (FPS) measured on skx-8180 (28 cores): batch size 1: 214.7 with fusion, 53.4 without fusion; batch size 50: 1219.727 with fusion, 137.280 without fusion. test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Embed the parameter definition into the code, which makes the code easier to understand. test=develop	6 years ago
qingqing01	97f0ec2357	Fix compiling error with cuDNN 5.1 (#17458 ) test=develop	6 years ago
Zeng Jinle	eab34b2df6	fix_dygraph_mem_leak, test=develop (#17396 )	6 years ago
qingqing01	e32c9888f5	Double backward of conv2d. (#17211 ) * Add conv2d_grad_grad_op * Extract the cuDNN conv algo searching code into conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simplify the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support fetching empty variables, return None in Python.	6 years ago
zhaoyuchen2018	792443ef23	Refine elementwise kernel. (#16952 ) * Refine elementwise kernel. Add a simple cuda kernel if grad x and y both exist Use 2D block cuda kernel to do broadcast. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * refine code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	6 years ago
chengduo	db5e74ab95	update assert (#17282 ) test=develop	6 years ago
baojun	7bd1d03ee5	Adding lrn op for ngraph engine (#17189 ) * added lrn op test=develop * Added CreateConstant method test=develop * avoid duplicates test=develop	6 years ago
Tao Luo	ff1661f12a	remove unused FLAGS_warpctc_dir (#17162 ) * remove unused FLAGS_warpctc_dir test=develop * remove FLAGS_warpctc_dir test=develop	6 years ago
Huihuang Zheng	e4a5332416	Fix a typo in gpu_info.cc (#17175 ) test=develop	6 years ago
Huihuang Zheng	b9494058b3	Use CudnnWorkspaceHandle in exhaustive search (#17082 ) 1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop	6 years ago
Zeng Jinle	0c335dcd2c	Make conv cudnn workspace size configurable (#17036 ) * make_conv_cudnn_ws_size_configurable, test=develop * change std::max to std::min test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
xuezhong	742d758747	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_infershape_bug2	6 years ago
xuezhong	5663fbfb0a	fix infershape bug test=develop	6 years ago
Jacek Czaja	87a44b1149	[MKL-DNN] Added reusing of primitive descriptors (fp32) (#16667 ) * - Reuse of conv PD - conv transpose pd reused - Added PD reusing of softmax and Batch Norm - Refactoring and removal of not needed routines of mkl-dnn ops test=develop - Fix to reusing conv test=develop - Lint fixes test=develop - Further lint fixes test=develop - Lint fixes test=develop - lint fixes test=develop - Lint workaround test=develop * - Fix after review on including boost as third party header test=develop * - Fix after review. Name change to something more descriptive test=develop	6 years ago
dongdaxiang	a659b37ace	make lodtensor_printer usable in gpu setting test=develop	6 years ago
Chen Weihang	0b2aec14b6	Revert "Model data cryption link all lib (#16555 )" test=develop This reverts commit `c38c7c5619`.	6 years ago
Chen Weihang	c38c7c5619	Model data cryption link all lib (#16555 ) * link the libwbaes.so into paddle * polish detail, test=develop * try fix mac_pr_ci error, test=develop * add compile option, test=develop * fix ci error, test=develop * ignore failed to find mac lib, test=develop * change cdn to bj, cdn can't get the latest version * trigger ci, test=develop * temporary delete win32 lib linking, test=develop * change https to http, test=develop * turn compile option on to off * turn compile option off to on, test=develop * try lib compiled by gcc4.8, test=develop * update lib version, test=develop * link other lib, test=develop * add setup config * delete false, test=develop * delete no_soname, test=develop * recover so name set * fix, test=develop * adjust make config, test=develop * remove link to wbaes, test=develop * remove useless define, test=develop	6 years ago
guru4elephant	76b49f02ee	Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop Train with pipe reader merge develop	6 years ago
gongweibao	fea91164b7	Fix windows compilation error! (#16546 ) * fix compiled test=develop * follow comments test=develop	6 years ago
dongdaxiang	3a79be6eb3	refine API spec test=develop	6 years ago
dongdaxiang	98dda08a85	fix pull sparse slow problem test=develop	6 years ago
dongdaxiang	93c3c7f9b3	fix dataset testcase problem test=develop	6 years ago
dongdaxiang	d739bab844	fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem test=develop	6 years ago
dongdaxiang	e3107a6ae0	fix windows compile problem test=develop	6 years ago
dongdaxiang	398004ece0	disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer test=develop	6 years ago
dongdaxiang	39362a8415	move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids test=develop	6 years ago
dongdaxiang	a0b59773af	fix code style	6 years ago
dongdaxiang	365be5d559	support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem test=develop	6 years ago
dongdaxiang	dc8cf36e4b	add more example on datagenerator test=develop	6 years ago
dongdaxiang	6bf796df14	refine print fetch list	6 years ago
dongdaxiang	cf1360643f	add printer for fetch variable	6 years ago
Jacek Czaja	2632327429	[MKL-DNN] Tensor modifications revert (#16462 ) * Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)" This reverts commit `13816dd4ac`. Apart from enabling transformer for MKL-DNN * Revert "- MKL-DNN pooling updated to set_prim_desc" This reverts commit `c63f6b2039`. Conflicts: paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc * Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)" test=develop This reverts commit `dec9cf53c8`. * - concat compilation fix - lint test=develop - Lint fixes test=develop - Lint fixes test=develop - Fix Transpose MKLDNN op test=develop	6 years ago
Zeng Jinle	69cb9792ea	Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug Revert "Fix allocator bug"	6 years ago
sneaxiy	5656fa9f7c	fix travis ci test=develop	6 years ago
Zeng Jinle	174d0d0b90	Revert "Fix allocator bug" add include headers to fix travis-ci test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
Zeng Jinle	644e8af4cf	Merge pull request #16424 from sneaxiy/fix_allocator_bug Fix allocator bug	6 years ago
nhzlx	953bdde058	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD test=develop	6 years ago
sneaxiy	2d92b6be98	merge develop test=develop	6 years ago
Zeng Jinle	c64d959343	Merge pull request #16295 from zhhsplendid/zhenghuihuang-dev-2 Add support for init_memory and re-allocate_memory	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
sneaxiy	953214ad97	add more unittest modify allocator strategy remove changes of legacy buddy_allocator test=develop	6 years ago
Wu Yi	b7baeed7bb	fix win gpu build test=develop (#16334 )	6 years ago
zhhsplendid	124f1df481	Add flags for init and re-alloc gpu test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
Wu Yi	6382b62f6b	Collective ops (#15572 ) * wip allreduce in op * wip * wip * wip * wip adding test * wip for conflict with mp mode * fix tests test=develop * fix cpu build test=develop * fix travis clang format test=develop * fix cpu build test=develop * update api.spec test=develop * delete comment test=develop * fix cpplint test=develop * fix test=develop * follow comment test=develop * add file test=develop * fix build test=develop * update test=develop * to be compatible with sync_bn, and fix mp mode in develop test=develop	6 years ago
zhhsplendid	22715487dc	add allocator flags test=develop	6 years ago
sneaxiy	fd23262e0c	merge develop, fix conflict test=develop	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago
sneaxiy	682f2dbf29	merge develop test=develop	6 years ago
sneaxiy	2c4fcaa683	merge develop	6 years ago
chengduo	0979956619	Add memory profiler (#16137 ) test=develop	6 years ago
chengduo	ad80bde824	Revert "Revert "Add Event for TensorCopy"" (#16035 ) * Revert "Revert "Add Event for TensorCopy" (#16022)" This reverts commit `e2da3a5b22`. * use default stream test=develop	6 years ago
sneaxiy	2a639d5c2a	add allocator chain to fix bug test=develop	6 years ago
chengduo	e2da3a5b22	Revert "Add Event for TensorCopy" (#16022 ) * Revert "Add Event for TensorCopy (#15953)" This reverts commit `7235fd662b`. test=develop * fix CI test=develop	6 years ago
chengduo	7235fd662b	Add Event for TensorCopy (#15953 ) Add Event for TensorCopy	6 years ago
Tao Luo	4efdebc6f6	Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt Optimize gelu operation with mkl erf	6 years ago
dzhwinter	225c11a91f	polish cudnn related code and fix bug. (#15164 ) * staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop	6 years ago
xiaolil1	6724be2b0d	INT8 Pool kernel Key Creation Optimization. (#15883 ) * Optimize key creation of INT8 pool kernel to improve the performance of ResNet-50 and MobileNet, especially for latency. test=develop * Optimize key creation of pool fp32 grad. test=develop	6 years ago
Yihua Xu	7396788694	Optimize gelu operation with mkl erf. test=develop	6 years ago
peizhilin	c6472579c0	test=develop	6 years ago
peizhilin	b5d6e38b05	fix build issue for cudaEvent_t test=develop	6 years ago
wopeizl	3ccd8964a4	Merge pull request #15905 from wopeizl/win/fix_eigen fix build issue on windows for sample prop op	6 years ago
chengduo	8e904d322f	Remove unnecessary dependence for profiler (#15899 ) * refile profiler test=develop * follow comment test=develop	6 years ago
Xin Pan	44e7fcddc5	Merge pull request #15844 from panyx0718/infer add per kernel config and remove const_cast.	6 years ago
Jacek Czaja	dec9cf53c8	[MKL-DNN] MKL-DNN specific Tensor modification (#15429 ) * - Implemented draft of primitive desc keeping in Tensor test=develop - TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented - Added nchw and nc formats setting for sake of compatibility Fixed unit tests - Workaround to problem with 5D data in conv - Added 3D and 1D MKL-DNN formats for name handles for tensor test=develop - Fix to UTs test=develop - Conv fp32 op was updated Cosmetic fixes test=develop - tensor mkldnn cosmetics test=develop - Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils * - Lint fixes test=develop * - setting prim desc in Tensor also sets layout to kMKLDNN test=develop * - Moved creation of prim desc totally out of Tensor test=develop * - Cosmetic fixes after review test=develop	6 years ago
peizhilin	6ccdb1b947	fix build issue on windows for sample prop op test=develop	6 years ago
Dun	c6bd434ffe	add memset CUPTI && test=develop (#15868 )	6 years ago
Sylwester Fraczek	74672d1aff	Change (smart_ptr.get()) -> smart_ptr reason: dereferencing smart pointer is the same as the underlying pointer test=develop	6 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	6 years ago
chengduo	3b08c9abf4	enhance profiler (#15842 ) test=develop	6 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the special mklml version that includes the vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of macro for NVCC compiler. test=develop	6 years ago
Tao Luo	e3dd6970fc	disable dam temporarily (#15860 ) test=develop	6 years ago
Dun Liang	35a90e06bf	test=develop	6 years ago
Dun Liang	c9080f516b	test=develop	6 years ago
Dun Liang	1c7bb0e40c	test=develop	6 years ago
Xin Pan	5eb87506bc	add per kernel config and remove const_cast. test=develop	6 years ago
Dun	a83e470405	Profiler refine and add CUDA runtime api tracer (#15301 ) * refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop	6 years ago
mozga-intel	13ec2d331b	Enable momentum operator for a ngraph engine (#15673 ) * Enable momentum operator for a ngraph engine test=develop * Update tests test=develop * Unnecessary line of the code as intended was removed test=develop	6 years ago
Tao Luo	c797a1f050	remove legacy any.cmake	6 years ago
Tao Luo	bd2fa73620	Merge pull request #15794 from sneaxiy/fix-warnings Fix compile warning	6 years ago
tensor-tang	e1c707fe9c	fix warnings (#15790 ) * fix warnings test=develop * fix enforce test test=develop	6 years ago
sneaxiy	9b8e0e2f17	fix enforce_test test=develop	6 years ago
sneaxiy	209b355762	fix many warning test=develop	6 years ago
Zeng Jinle	fc87ef741b	Merge pull request #15687 from sneaxiy/fix_enforce fix enforce	6 years ago
sneaxiy	f0590947c3	fix enforce test=develop	6 years ago
tensor-tang	31fd8ce1e1	Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator Enable batch_norm operator for a ngraph engine	6 years ago
dzhwinter	04e9776aef	add details. test=develop	6 years ago
mozga-intel	1198ccae6b	Enable batch_norm operator for a ngraph engine test=develop	6 years ago
peizhilin	883d22093a	fix the lib_any dependency test=develop	6 years ago
wopeizl	3614dadf23	Merge pull request #15631 from wopeizl/windows/fixci fix ci broken randomly and disable some warnings	6 years ago
peizhilin	061299be87	fix dependency test=develop	6 years ago
baojun	ac4cde009d	Enable accuracy op for ngraph engine (#15592 ) * Added accuracy ngraph op test=develop * fixed name type test=develop	6 years ago
dzhwinter	ce0394bcd0	merge develop branch. test=develop	6 years ago
guoshengCS	b6c3b69af8	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size test=develop	6 years ago
liuwei1031	6e84eb131f	expose peak gpu memory API to python test=develop (#15529 ) * expose peak gpu memory API to python test=develop * add unittest for peak gpu memory monitoring test=develop * add pybind change test=develop * add mutex to gpu mem usage monitor test=develop * update benchmark flag definition file test=develop * tweak unittest for memory monitoring test=develop	6 years ago
guoshengCS	5dfce93101	To make CUDA_LAUNCH_KERNEL_HELPER support large size. test=develop	6 years ago
tensor-tang	8117725852	add jit kernel hsum, hmax and softmax refer code test=develop	6 years ago
sneaxiy	ba4f43fd62	fix compile error in distributed mode test=develop	6 years ago
Yiqun Liu	3008fa1261	Add the CUDA kernel for beam_search op (#15020 ) * Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the selected items of beam_search_op's CPU kernel are sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop	6 years ago
Zeng Jinle	2480a3df7d	Merge pull request #15496 from sneaxiy/lazy_allocator2 Fix bug when user set CUDA_VISIBLE_DEVICES be empty and run CPU-only models	6 years ago
sneaxiy	9c360cc798	test=develop	6 years ago
Xin Pan	58cb18d9d9	Merge pull request #15322 from velconia/imperative_resnet Imperative Resnet	6 years ago
sneaxiy	51227bd447	lazy_allocator test=develop	6 years ago
tangwei12	8b50ad80ff	checkpoint at distributed training (#14854 ) checkpoint for distributed training.	6 years ago
minqiyang	8ce198b2e1	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet test=develop	6 years ago
minqiyang	315b133e67	Add single GPU support to imperative	6 years ago
tensor-tang	3759c1db8c	Merge pull request #14805 from mozga-intel/mozga-intel/element_wise_operator_ngraph Enable element_wise_add operator for a ngraph engine	6 years ago
peizhilin	eea75a1d93	fix issue when type is invalid test=develop	6 years ago
peizhilin	9adb158e5b	Merge remote-tracking branch 'upstream/develop' into debug/support	6 years ago
chengduo	46d01d798e	Revert "Revert "Remove workspace_handle in conv_cudnn (#15186 )"" (#15290 ) test=develop This reverts commit `358e657f68`.	6 years ago
Wojciech Uss	cb2ba58458	Fix performance drop when with MKL-DNN test=develop	6 years ago
chengduozh	c4eced9881	fix thread safe bug test=develop	6 years ago
chengduozh	358e657f68	Revert "Remove workspace_handle in conv_cudnn (#15186 )" test=develop This reverts commit `064512aa47`.	6 years ago
wopeizl	5d9edb4124	Merge pull request #15156 from wopeizl/windows/fixgpuissue fix gpu buils issue on windows test=develop	6 years ago
chengduo	064512aa47	Remove workspace_handle in conv_cudnn (#15186 ) * remove workspace_handle in conv2d_cudnn test=develop * remove workspace_handle test=develop * fix bug test=develop * make test_conv2d_op SERIAL test=develop * save memory in conv_cudnn test=develop * enhance thread safety test=develop * enhance temporary allocator test=develop * Add excess fraction test=develop * follow comments test=develop * fix bug and code refine test=develop * fix memory size check test=develop * rename reuse_tmp_allocation_excess_fraction test=develop	6 years ago
xiaolil1	8f17c714de	Conv int8 residual (#15145 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Enable INT8 Conv with residual fusion OP test=develop * Modify code. test=develop * Modify basic INT8 Conv test=develop * Modify Conv. test=develop * fix style test=develop * Fix style test=develop * Fix test test=develop * Modify code. test=develop * Fix test test=develop	6 years ago
peizhilin	439691f5bd	adjust the shlwapi on windows test=develop	6 years ago
peizhilin	92da467c99	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	6 years ago
peizhilin	c1235c935f	add the enable_debug flag test=develop	6 years ago
Zeng Jinle	e29f10d315	Merge pull request #15207 from sneaxiy/remove_op_handle_lock_and_fix_var Remove op handle lock and fix var	6 years ago
mozga-intel	a42f8f4f6f	Enable element_wise_add operator for a ngraph test=develop	6 years ago
Zeng Jinle	c562be20d9	Merge pull request #15193 from sneaxiy/fix_cudnn_compatible_check Fix cudnn compatible check	6 years ago
peizhilin	1cd95d8a0b	use thread local instance test=develop	6 years ago
sneaxiy	ed409ac9f4	Revert "Revert "Remove op handle lock"" test=develop	6 years ago
peizhilin	d54133ea85	not include the numeric under linux test=develop	6 years ago
peizhilin	a6f5ceee74	add the python callstack for debug support test=develop	6 years ago
Zeng Jinle	dacfaaa966	Revert "Remove op handle lock" test=develop	6 years ago
xiaolil1	c8f101e5da	Conv int8 relu (#15130 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Modify basic INT8 Conv test=develop * fix type test=develop * Modify test test=develop	6 years ago
sneaxiy	9793a0b6a6	fix_cudnn_compatible_check	6 years ago
Zeng Jinle	ccb322d6a5	merge develop	6 years ago
Zeng Jinle	f3a13512fc	Merge pull request #15139 from sneaxiy/remove_op_handle_lock Remove op handle lock	6 years ago
xiaolil1	bbc9336878	Enable basic MKL-DNN INT8 Conv OP (#15124 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Modify basic INT8 Conv test=develop	6 years ago
peizhilin	c919b2f31d	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	6 years ago
peizhilin	fd4f4d0e5f	fix build issue test=develop	6 years ago
Yan Xu	a1e60ab19b	Merge pull request #14791 from Yancey1989/parallel_graph_mode [Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance	6 years ago
peizhilin	9ae50dd07d	fix gpu buils issue on windows test=develop	6 years ago
sneaxiy	d0a8a1e950	remove_op_handle_lock test=develop	6 years ago
Yancey1989	e65436103f	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	6 years ago
sneaxiy	6f06e6cdac	Merge remote origin test=develop	6 years ago
Xin Pan	9186451f60	hide GetTensor test=develop	6 years ago
sneaxiy	d25395fc98	remove tensor core lock test=develop	6 years ago
Yancey1989	82b42e31f0	polish unittest test=develop	6 years ago
Yancey1989	0a885ac12a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	6 years ago
peizhilin	813c2ce539	fix timer test=develop	6 years ago
wopeizl	7ab501264d	Merge pull request #15069 from wopeizl/windows/dsosupport add cuda dso support for windows	6 years ago
guru4elephant	ff739449ab	Merge pull request #15018 from guru4elephant/add_timer Add debug thread function for async executor	6 years ago
Yancey1989	4743c9cd5d	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
wopeizl	719ebe3786	Merge pull request #15070 from wopeizl/windows/testcasefix fix test issues on windows	6 years ago
Qiyang Min	0238a3bb4f	Merge pull request #14972 from velconia/accelerate_lstm Accelerate PADDLE_ENFORCE	6 years ago
Yancey1989	86bb583881	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
peizhilin	01c00b07dd	fix test issues on windows test=develop	6 years ago
peizhilin	1e7f83e60a	add cuda dso support for windows test=develop	6 years ago
Yancey1989	41a64f6a2a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
Wu Yi	856f0da0fe	Fp16 training (#14992 ) * wip * wip * wip * wip for test * add fp16 tests test=develop * fix cpu build test=develop * fix test=develop * fix py3 tests test=develop * fix lr_scheduler dtype test=develop * fix test=dvelop * test fix ci compile test=develop * fix build and merge test=develop * fallback momentumop change to general test=develop * make fp16 lr schedule simple test=develop * fix ut test=develop * fix tests test=develop * remove fp16 learning rate cast test=develop	6 years ago
chengduo	b9fb03cf54	Move GetTensor to tensor_util (#15011 ) * refine tensor test=develop * refine tensor test=develop * fix device_context log test=develop	6 years ago
dongdaxiang	ab2abfc5b2	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	4cb833d2de	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
tensor-tang	f0e02a65ed	Merge pull request #14974 from xiaolil1/quantize Add Quantize OP	6 years ago
dongdaxiang	68a2d1f3d7	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer add timer_test test=develop	6 years ago
dongdaxiang	2e5ebc4594	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	5dfd9c9aa9	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	d0a5159946	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
dongdaxiang	f9b8168508	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	6 years ago
minqiyang	52b4821a6e	Fix Sprintf problem test=develop	6 years ago
minqiyang	010f657b33	Polish code test=develop	6 years ago
minqiyang	45acfbd011	1. Add specific condition for one or no arg in PADDLE_ENFORCE 2. Add unit test for new enforce feature test=develop	6 years ago
dongdaxiang	2dee8f6cd5	add TrainFilesWithTimer in async_executor	6 years ago
xiaoli.liu@intel.com	d83d0f33fd	extract templated function test=develop	6 years ago
wopeizl	b117a5f208	Merge pull request #14931 from wopeizl/windows/mkl add mkl support for windows	6 years ago
dongdaxiang	cf6188a823	add a linux timer	6 years ago
chengduo	79bd6dfa18	[Feature] Add Temporary Allocator (#14875 ) * Add Temporary Allocator * add Temporary Allocator to DeviceContext test=develop * code refine test=develop * fix mean_iou test=develop * Add DeviceTemporaryAllocator test=develop * fix conv_op bug test=develop * small fix test=develop * code refine test=develop * log refine test=develop * fix unit test test=develop * move double check * refine concat_and_split test=develop * add limit_of_temporary_allocation test=develop * fix name test=develop	6 years ago
minqiyang	e4719eb462	Fix bug in Windows VC 2010 test=develop	6 years ago
minqiyang	5a5c577529	Polish code test=develop	6 years ago
minqiyang	099186cd41	Support one argument PADDLE_ENFORCE test=develop	6 years ago
minqiyang	4af97c6946	Polish code	6 years ago
minqiyang	41b81293ab	Polish code test=develop	6 years ago
peizhilin	9e60c58666	Merge remote-tracking branch 'upstream/develop' into windows/mkl test=develop	6 years ago
minqiyang	bc66401566	Polish code test=develop	6 years ago
minqiyang	53619a79b4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm	6 years ago
peizhilin	b06ce129bc	some not so useful adjust test=develop	6 years ago
minqiyang	679d1a9e0b	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm	6 years ago
Jacek Czaja	709d9e3cb7	- Added reusing MKL-DNN primitives for Transpose MKL-DNN op test=develop	6 years ago
peizhilin	40a94a138f	remove irrelevant fix for mkl test=develop	6 years ago
mozga-intel	9035bb81fe	Enable mul operator for a ngraph engine (#14801 ) * Enable mul operator for a ngraph test=develop * Enable activation ops test test=develop * Remove unused line test=develop	6 years ago
peizhilin	07c7eaabb4	Merge remote-tracking branch 'upstream/develop' into windows/mkl test=develop	6 years ago
peizhilin	ed5bd5e586	test=develop	6 years ago
peizhilin	19ebd8b4cf	add ctc support for windows	6 years ago
minqiyang	a3fa3f85d7	Polish code test=develop	6 years ago
Yu Yang	2803cf5776	Merge pull request #14868 from reyoung/feature/refine_w2v Feature/refine w2v	6 years ago
peizhilin	b601f2de8d	include the mkl fix only test=develop	6 years ago
peizhilin	5a6d7fe2ff	add mkl,ctc support for windows	6 years ago
wopeizl	0f085f0a5a	Merge pull request #14892 from wopeizl/windows/port3 fix script issue	6 years ago
Zeng Jinle	36a1d021a4	Merge pull request #14927 from sneaxiy/fix_cuda_stream_callback_in_cuda10 Fix stream_callback_manager bug in CUDA 10	6 years ago
wopeizl	fa78fc60be	Merge pull request #14907 from wopeizl/windows/avx add avx support for windows	6 years ago
sneaxiy	2373aeb5e8	fix bug test=develop	6 years ago
minqiyang	aa41ee75a1	Accelerate PADDLE_ENFORCE	6 years ago
peizhilin	41456e1723	Remove the useless definition test=develop	6 years ago
Yu Yang	740e1626ce	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_w2v test=develop	6 years ago
Yancey1989	a760a550b0	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
peizhilin	d519fd6944	test=develop	6 years ago
Yu Yang	bacf1d2399	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type	6 years ago
Yan Chunwei	a985949be9	Fea/fuse conv elementwise add fuse (#14669)	6 years ago
Yancey1989	4a4ccac1d0	update by comment test=develop	6 years ago
peizhilin	23dec78772	fix script issue test=develop	6 years ago
Yancey1989	c722b1dcb6	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	6 years ago
Yu Yang	4ecdb6f486	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type test=develop	6 years ago
Yu Yang	7b10bf0e60	Use mkl	6 years ago
sneaxiy	ca84c2ca8f	merge develop test=develop	6 years ago
Yu Yang	81520a24cf	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_eigen_tensor	6 years ago
Yu Yang	9bd70a1e04	Change tensor uses proto::VarType::type test=develop	6 years ago
Yu Yang	8175983ef9	Merge pull request #14814 from reyoung/feature/gprof Add gperftools supports for PE	6 years ago
Yu Yang	5e60906996	Fix compile error test=develop	6 years ago
Yu Yang	7604b1ad51	Fix Eigen macro when using GPU The macro should be defined by compiler rather than by source. test=develop	6 years ago
sneaxiy	7923042365	merge develop test=develop	6 years ago
Yu Yang	b22d638d8f	Speed up SizeOfType test=develop	6 years ago
Yancey1989	2dda19f756	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	6 years ago
sneaxiy	66182abda6	add cuda cudnn version check test=develop	6 years ago
Zeng Jinle	add98c9e7d	Merge pull request #14745 from sneaxiy/fix_eigen_deallocate Fix eigen deallocate bug	6 years ago
Yancey1989	cb8a24be14	clean code	6 years ago
Tao Luo	54fcafb5f6	Merge pull request #14707 from yihuaxu/develop_4f71a6ee2_conv3d_mkldnn_opt Implement conv3d with mkldnn library	6 years ago
Yancey1989	c9de6f1b05	init parallel graph mode	6 years ago
sneaxiy	0f96c2e80f	fix thread-safety bug test=develop	6 years ago
Yihua Xu	65dbc7cca4	Merge branch 'develop' into develop_4f71a6ee2_conv3d_mkldnn_opt	6 years ago
tensor-tang	4a93db9288	remove jit namespace test=develop	6 years ago
sneaxiy	900765224c	fix deallocate bug test=develop	6 years ago
liuhongyu	773dc73fbf	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_5_support	6 years ago
liuhongyu	8daf67f90f	fix bugs; test=develop	6 years ago
Xin Pan	052cc5f538	Merge pull request #14725 from ZongwuYang/my-cool-stuff My cool stuff	6 years ago
Wu Yi	29d9fb53fc	[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) * wip multi process multi gpu dist training * workable for p2p * update test=develop * change back env name test=develop * fix alloc init * fix cpu build test=devlop * fix mac tests test=develop * refine code * refine test=develop	6 years ago
liuhongyu	968dd3c078	add cudnn 5 support; test=develop	6 years ago
ZongwuYang	1560eb4a6d	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into my-cool-stuff	6 years ago
ZongwuYang	deb04809bd	test=develop Fix the bug that profiler cannot trace the nccl allreduce operator	6 years ago
sneaxiy	35a2578426	fix bug test=develop	6 years ago
sneaxiy	64ad051b9a	merge develop test=develop	6 years ago
sneaxiy	c47c451a00	fix bug	6 years ago
Yihua Xu	669191c9cc	Implement conv3d with mkldnn library (test=develop)	6 years ago
Hongyu Liu	4f71a6ee2c	Merge pull request #14622 from PaddlePaddle/add_cudnn_lstm Add cudnn lstm	6 years ago
Yibing Liu	c7382df80f	Print assert failure id in lookup_table_op (#14698)	6 years ago
sneaxiy	096673f675	refactor eager deletion test=develop	6 years ago
phlrain	cf1fe61004	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm	6 years ago
Tao Luo	20120d9c97	Merge pull request #14608 from jczaja/prv-conv2d-transpose-mkldnn [MKL-DNN]conv2d transpose	6 years ago
Tao Luo	ea47685f91	Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum Softmax for inference MKL further changes	6 years ago
minqiyang	a02ce58f2c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog test=develop	6 years ago
Tao Luo	4ec9de0122	Merge pull request #14628 from Sand3r-/mgallus/mkldnn-elementwise_mul EltwiseMul: Changes from previous PR	6 years ago
Clementine	6c71c1f8f9	Add activation gelu (#14569)	6 years ago
Michal Gallus	9455be0ba5	EltwiseMul: Extract StringToFormat to MKLDNN helper test=develop	6 years ago
Jacek Czaja	8bfa1fa9bb	- ASUM MKL integration	6 years ago
liuhongyu	05917c3c79	add cudnn lstm; test=develop	6 years ago
peizhilin	38715e6fd0	minor fix	6 years ago
Jacek Czaja	fb24690a58	- conv2d transpose MKL-DNN test=develop - Added new header for MKLDNN reuse functionality - Extended conv2d_transpose GetExpectedKernelType for MKL-DNN supporrt - Buildable conv transpose mkldnn and conv mkldnn using conv template - Conv2d transpose roughlt implemented and buildable - Added modifications conv2d transpose MKLDNN unit tests - Fix to UT of conv2d transpose mkldnn op - Wrong type of MKLDNN primitive was chosen for conv2d transpose - HAcks for conv2d transpose - UT enalbed - Replaced copying loop with memcpy - Draft of passing lambda into AcquireMemory - Made reorder (IOHW->OIHW) to be called only once	6 years ago
minqiyang	be04d99fe4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog test=develop	6 years ago
minqiyang	53433d7f2e	Revert the changes of VLOG test=develop	6 years ago
peizhilin	36cd18b549	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
peizhilin	b2f8d4183d	Given the different fraction_of_gpu_memory_to_use depends on platform	6 years ago
Yu Yang	26af9cf90c	Merge pull request #14565 from chengduoZH/fix_cublas_warp_error Fix cublas warp error	6 years ago
chengduozh	f7847ca6a3	fix cublas warp error test=develop	6 years ago
luotao1	e21edb26f6	add Set/GetCPUNumThreads api	6 years ago
peizhilin	445fff24dc	add the bigobj option to NVCC compile fix code style	6 years ago
chengduo	00b9e9a135	Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) * refine cublase test=develop * code refine * refine cublas * add GEMME_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to compatiable with gcc4.8 test=develop	6 years ago
peizhilin	7c8c9dc9bf	fix unit test cases	6 years ago
wopeizl	d9a1f3e58e	Windows/online (#14474 ) * add recordio support * disable the openblas multi-thread on windows since no support adjust the python script * code style * code style test=develop * add create_recordio_file_reader back * fix code style test=develop * fix the gtest.cmake on windows * fix cc_test on windows * fix the win build test=develop * remove fused compile support on windows test=develop * add the jit support test=develop * add the jit support, test=develop * add the jit support, test=develop * add the jit back fix compile error on windows * rollback test=develop * test case fix * disable DSO by default on windows * exclude warpctc_op on windows * exclude the dynload_warpctc out on windows test=develop * fix the scripts error test=develop * disable avx on windows by default test=develop * re-organize the cmake file * disable mkl on windows by default * add warp_ctc back * fix the dependency * fix the dependency * fix the build issue on windows * remove unsupported flag on windows * code style * code style test=develop * fix issue * add profiler, parallel_executor back * clean up the pre-definitions on windows * fix build issue * test=develop	6 years ago
peizhilin	6e66fadb95	clean up the pre-definitions on windows	6 years ago
peizhilin	67562a6fcd	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
peizhilin	703b26e697	add profiler, parallel_executor back	6 years ago
chengduo	a8d3aaae2a	print output log warning (#14497) test=develop	6 years ago
peizhilin	3a72a634cf	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
peizhilin	ee0fd78c81	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
Yu Yang	f1a392a5fe	Merge pull request #13804 from sneaxiy/rewrite_allocation Rewrite allocation	6 years ago
qingqing01	fd7e643153	Convolution fusion operator. (#14449) * Convolution fusion operator. * Clean code test=develop	6 years ago
Yu Yang	98bbfc17be	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation test=develop	6 years ago
peizhilin	c59d3e83bc	test case fix	6 years ago
peizhilin	8580b7a130	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
Wu Yi	b32c13dc20	Add cudnn ctc loss (#12366) * add cudnn ctc loss * wip add test test=develop * wip * wip * done test=develop * move include cudnn test=develop * test test=develop * fix build test=develop * fix build test=develop * fix build on cudnn5 test=develop * fix cudnn5 build test=develop * fix cudnn5 build test=develop * merge develop softmax functor change test=develop	6 years ago
peizhilin	d1a1fafc4c	code style	6 years ago
peizhilin	162f2d4109	disable the openblas multi-thread on windows since no support adjust the python script	6 years ago
Yu Yang	c8f6e70ab4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation test=develop	6 years ago
peizhilin	d1429ac4a5	add recordio support	6 years ago
Yu Yang	0d6718fcbd	Pass compile	6 years ago
peizhilin	be332a13bc	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
Yu Yang	d93b2d0365	Refine code	6 years ago
peizhilin	1a9008c420	code style fix test=develop	6 years ago
tensor-tang	1be85d011d	add mkl vsqr and vpow	6 years ago

947 Commits (6e5670b8bdb9a4c62a98b69ea6fe33b6ed38065b)