yaoxuefeng
660ff18488
fix dataset test=develop ( #23043 )
5 years ago
Zhang Ting
714b0076b6
Override GetKernelTypeForVar to avoid device transform, test=develop ( #23032 )
5 years ago
wangchaochaohu
112e3edbf6
fix the conv group problem test=develop ( #23025 )
5 years ago
Wilber
db40ee86db
fix unittests. test=develop ( #23018 )
5 years ago
wangchaochaohu
99db0cf762
remove debug log test=develop ( #22994 )
5 years ago
wangchaochaohu
3757e0687c
Add Unittest for backward of fusion group ( #22932 )
...
* add fusion group test for backward and refine code
5 years ago
chengjuntao
63f3ada7b9
fix bug which input shape ( #22965 )
...
* fix bug which input shape, test=develop
* add error type,test=develop
5 years ago
Zhang Ting
137d6563fc
add check for assigned data, test=develop ( #22960 )
5 years ago
wangchaochaohu
f0d193a23c
Cast fusion for fusion group ( #22876 )
...
* add support for expression type convert and add cast Op support in fusion group
5 years ago
yaoxuefeng
29a7a52d38
Fix instag ( #22632 )
...
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* fix compile warning test=develop
* add attr default test=develop
* add unittest test=develop
* fix style test=develop
* fix style test=develop
* change out_val_ifempty to out_val_if_empty test=develop
5 years ago
wangchaochaohu
c979c9f2b0
refine the profiler print test=develop ( #22968 )
5 years ago
Wilber
ff3ddbb502
add skip_layernorm pass. test=develop ( #22895 )
...
* add skip_layernorm pass. test=develop
5 years ago
wawltor
f154d5860f
Speed up the matmul op by using gemm in place of batch gemm ( #22926 )
...
In the matmul op, use gemm in place of batch gemm to speed it up
5 years ago
Adam
056edf3929
Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )
5 years ago
Zhaolong Xing
8d6dc102fe
[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse ( #22494 )
...
* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop
* 1. refine fc
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
guofei
3d8571e884
modify assign op and add unittest of assign op ( #22769 )
...
As the title.
5 years ago
Zeng Jinle
d33c4343e1
Imperative tracer refactoring ( #22457 )
...
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to solve conflict third times, test=develop
5 years ago
liu zhengxi
61fef9754b
Fix fc padding bug during inference fusion ( #22860 )
...
* fix fc padding during fusion, test=develop
* fix optim model inference after SaveOptimModel, test=develop
5 years ago
tangwei12
ad9c8f6d2d
fix communicator when breaking under PyReader mode ( #22911 )
...
* fix communicator when breaking under PyReader mode, test=develop
* revert some vlog level to 0, test=develop
5 years ago
mapingshuo
5ba9dfc16a
add lookup_table_dequant_op ( #22900 )
...
add lookup_table_dequant_op
5 years ago
zhaoyuchen2018
a020a25797
Fix model int8 quant fail, test=develop ( #22891 )
...
The model fails when int8 quantization is enabled, so disable allocating
memory on the CPU for small variables.
5 years ago
Zhaolong Xing
dd67d44a50
[Paddle-TRT] : (Part1) Dynamic shape support ( #22868 )
...
* change the ci trt from version 5. to 6.0
* paddle-trt dynamic shape support init
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine opconvert
test=develop
* fix ci error
test=develop
5 years ago
tangwei12
07e13b84cd
remove vlog, test=develop ( #22898 )
5 years ago
Zhang Ting
ca9c8b417d
fix compute ratio of profile, test=develop ( #22872 )
5 years ago
wangchaochaohu
dbb0b9b3b6
refine the profiler print ( #22823 )
...
* refine the profiler print test=develop
5 years ago
Michał Gallus
0038bfbd1d
Prevent loading of warmup data in analyzer_int8 if enable_int8 is set to false ( #22857 )
5 years ago
Chen Weihang
1644926a6c
Polish detail implement of dygraph data loader ( #22878 )
...
* polish detail implement of data loader, test=develop
* solve coverage ci problem, test=develop
5 years ago
Wilber
f686310d81
fix concat_mkldnn op. test=develop ( #22692 )
...
fix concat_mkldnn op when encountering extreme conditions.
5 years ago
hong
5191e54494
reduce default attrs for dynamic graph ( #22850 )
...
* reduce default attrs for dynamic graph, test=develop
* add some explanations for explicit attr, test=develop
* tweak explicit attr comments, test=develop
5 years ago
Zhaolong Xing
1a533ed2de
[BUG]: Multihead matmul op's output size should be BxSx(N*H) ( #22848 )
...
test=develop
5 years ago
hong
c736fef93b
dygraph backward engine accelerate ( #22808 )
...
* fix loaded program load bug; test=develop
* first version
* speed up backward engine; test=develop
* remove useless code; test=develop
* reconvery io.py; test=develop
* remove useless code; test=develop
* remove useless code; test=develop
5 years ago
Zeng Jinle
d41d802ba3
Add flags to limit gpu memory ( #22793 )
...
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
5 years ago
石晓伟
1861ca88f1
serialize the PaddleTensor, test=develop ( #22810 )
...
* encapsulate the PaddleTensorToLoDTensor, test=develop
* serialize the pd_tensor, test=develop
* serialize tensors to file, test=develop
5 years ago
Zhang Ting
72ff5a09c3
fix print bug of profile, test=develop ( #22804 )
5 years ago
Zhang Ting
4e8bc02461
add fluid.device_guard to specify the device type for Op ( #22254 )
...
* add fluid.device_guard to specify the device type for Op
5 years ago
石晓伟
ddb9b46fec
change the function in op_teller, test=develop ( #22794 )
...
* change the function in op_teller, test=develop
* correct the commit-id, test=develop
5 years ago
Zhen Wang
89cfa49156
Unmerged fetch list ( #22635 )
...
* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
5 years ago
wangchaochaohu
8456c3f4dd
polish the profiler_help code ( #22811 )
5 years ago
zhongpu
2fd1ec1e3e
fix docker build for paddle openblas, test=develop ( #22795 )
5 years ago
Chen Weihang
7d8d573453
Speed up dygraph DataLoader based on shared memory and LoDTensor serialization ( #22541 )
...
* add lodtensor share memory & serialization, test=develop
* fix windows compile error, test=develop
* deal vartype pickle & fix unittest matching error message, test=develop
* update timeout variable name, test=develop
* refactor memory map implement, test=develop
* clear mmap file descriptor when exiting unexpectedly, test=develop
* remove the child process fd in advance, test=develop
* remove mmap fds after Queue.put in child process, test=develop
* add hard unittests for register exit func, test=develop
* fix python2 compatibility problem in unittest, test=develop
* fix exception unittest error, test=develop
* polish code based review comment, test=develop
5 years ago
liu zhengxi
324f2b3922
Fix inference c api PD_GetZeroCopyOutput lod ( #22768 )
...
* fix inference c api lod, test=develop
* fix capi lod problem and enrich tests, test=develop
* delete useless header files and alter const_cast, test=develop
5 years ago
wangchaochaohu
7578fcbac4
Profile code refine ( #22800 )
...
* add profiler_help.h to refine the code test=develop
5 years ago
hutuxian
53a2b68f4e
support customized download command in dataset ( #22782 )
...
* user can call dataset.set_download_cmd to set its customized download cmd
* add UT to cover this scenario
5 years ago
wangchaochaohu
ca9e77a8d4
add sum op support for fusion group ( #22771 )
...
* Add the codegen and auto fusion for sum Op in fusion group
5 years ago
tianshuo78520a
433cef03e5
fix typo word ( #22784 )
5 years ago
Kaipeng Deng
ebc7ffc300
fix detection_map. test=develop ( #22705 )
5 years ago
zhaoyuchen2018
72dde4abde
Refine adam op to improve performance, test=develop ( #22346 )
...
* Refine adam op, test=develop
* Fuse kernels together to reduce cpu time.
* Refine paddle enforce, test=develop
* Remove some comments, test=develop
* Refine code,test=develop
* Refine cuda kernel, test=develop
* Refine code according to comments, test=develop
5 years ago
wangguanzhong
f2d1cd119a
fix lod level, test=develop ( #22755 )
5 years ago
FlyingQianMM
79d712346f
Correct CPU gradients of the argsort op ( #22739 )
...
* Correct CPU gradients of the argsort op, form a network to test its forward and backward process, test=develop
* fix dynamic threshold error in test_argsort_op, test=develop
5 years ago
Adam
2b80e9a719
Add cpu_info without XBYAK ( #22716 )
5 years ago
guofei
ae8b5f11a3
Change ShareDataWith() to TensorCopy() in ref_by_trainer_id ( #22717 )
...
As the title
5 years ago
liu zhengxi
71ab0458e1
Fix pointer and c-api encapsulation ( #22663 )
...
* refine pointer and c-api prototype, test=develop
* fix new c api profile bug, test=develop
* add unit tests, test=develop
5 years ago
Leo Chen
b2c1be851a
support cond in clone, test=develop ( #22657 )
...
* support cond in clone, test=develop
* refine code, test=develop
* refine code, test=develop
* follow comments, test=develop
* refine code, test=develop
5 years ago
Zhang Ting
f97f3f9301
add framework overhead ratio in profile report ( #22590 )
...
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
5 years ago
zhouwei25
160d0f1308
fix the CI risk that network cannot be connected ( #22736 )
5 years ago
chengjuntao
15c2667143
register fp16 for assign op ( #22744 )
...
* register fp16 for assign op, test=develop
* add op test for fp16, test=develop
5 years ago
zhangchunle
882e7f7c3b
Directly getting API.spec for tools/sampcd_processor.py ( #22728 )
5 years ago
dyning
1c0653462d
fix generate_mask_labels lod level ( #22743 )
5 years ago
GaoWei8
ba140222d6
fix compile&runtime lod_equality of lod_reset ( #22737 )
5 years ago
hutuxian
175954d894
PaddleBox Framework Part2 ( #22466 )
...
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to denote whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.
5 years ago
ShenLiang
3132681e8a
add partial_sum op in contrib ( #22292 )
...
* add partial_sum_op, test=develop
* modify the Paddle Error Message, test=develop
* modify the Paddle Error Message, test=develop
* modify the bug for python3, test=develop
* modify the ut for ci, test=develop
* mv to contrib, test=develop
* use check_variable_and_dtype, test=develop
* fix ci, test=develop
* fix conflict, test=dvelop
* add partial concat, test=develop
* fix the conflict, test=develop
* fix the error, test=develop
* rm SSE4, test=develop
5 years ago
wangchaochaohu
611411b90e
Fusion group profile support ( #22718 )
...
* add support for the driver api callback and fix the profiler name show bug
5 years ago
ShenLiang
e136661304
add partial_concat op in contrib ( #22528 )
...
* add partial_concat, test=develop
* fix the grids and blocks, test=develop
* fix the Paddle_Enforce, test=develop
* fix the doc of op, test=develop
* fix the doc, test=develop
* fix the doc of the op, test=develop
* replace -1 with None, test=develop
5 years ago
GaoWei8
cdf5f6fb8c
Add an inference interface to disable FC padding ( #22097 )
...
* Add an interface of disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error
5 years ago
tianshuo78520a
d2ba91aad1
fix typo words ( #22653 )
5 years ago
Yibing Liu
6e7bfe30a6
register fp16 kernel for some ops ( #22650 ) ( #22696 )
...
test=develop
5 years ago
tangwei12
66a3150135
SYNC with communicator ( #22344 )
...
* add sync communicator and implement
5 years ago
Yiqun Liu
22bbd54719
Add the support of fp16 in fusion_group ( #22239 )
5 years ago
flame
d97475d53b
fix CPU C inference API compile bug ( #22702 )
5 years ago
Huihuang Zheng
adfa5b8354
Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp ( #22673 )
...
1. Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp.
2. Also enrich PADDLE_ENFORCE error messages.
5 years ago
flame
74eb82de19
fix go api bug ( #22669 )
5 years ago
wangchaochaohu
a089072c8b
fix the profile print error ( #22665 )
...
* fix the profile print error test=develop
5 years ago
lidanqing
d926214535
[UT coverage] improve the mul_mkldnn_op line coverage ( #22408 )
...
* improve the mul_mkldnn_op line coverage
test=develop
* remove fp32 mul mkldnn kernel
test=develop
* locally refactoring
test=develop
* change according to reviews
test=develop
5 years ago
wangchaochaohu
c65c6ae534
add flag to control profile level in python API ( #22319 )
...
* add python flag to control profile level test=develop
5 years ago
123malin
00594c1c88
support dumping params/grads in transpiler mode ( #22490 )
5 years ago
Zhaolong Xing
a06d75a280
[Paddle-TRT] Refine the error log about runtime batch and max_batch_size. ( #22535 )
...
* fix trt log
test=develop
* fix comments
test=develop
5 years ago
Adam
608447bfd5
Update MKLDNN to v1.2 ( #22521 )
5 years ago
Adam
ab610a34ff
transpose_mkldnn code change to meet Paddle standards ( #22591 )
5 years ago
Jiawei Wang
8f035fb637
Add TopK Op Grad CPU&GPU Kernel test=develop ( #22628 )
...
* Add TopK Op Grad CPU&GPU Kernel test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify PADDLE_ENFORCE test=develop
* Add TopK Op Grad, modify PADDLE_THROW test=develop
* Add TopK Op Grad, modify unittest test=develop
* fix ngraph top k op unittest test=develop
5 years ago
Steffy-zxf
90ee366653
update ops's unittest data type from float32 to float64 and shape over 100 ( #22544 )
...
* update ops's unittest of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt
1. update elementwise_pow, elementwise_max and scale's unitests with input data type (float32 -> float64)
2. fix bug that the elementwise_pow doesn't meet threshold requirements with tackling float64 data
3. remove sqrt from op_accuracy_white_list.py
4. update the unittests of elementwise_pow, elementwise_max and elementwise_min ops that their input data shape over 100
5. test=develop
* modify the writing style according to suggestions
test=develop
5 years ago
flame
f7eafca828
remove python inference warning ( #22602 )
5 years ago
Chen Weihang
fe685cc185
fix enforce test error, test=develop ( #22610 )
5 years ago
Wilber
9a8203aa25
fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop ( #22551 )
...
When a model contains multiple fc_lstm subgraphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.
5 years ago
Chen Weihang
266106da75
Fix mismatch with plus sign in the line ( #22588 )
...
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
5 years ago
flame
1d503e6a9e
Golang inference API ( #22503 )
...
* support golang inference
5 years ago
Zhaolong Xing
8acd745c25
[Ernie GPU Optim]: Fuse three fc to multihead matmul ( #22486 )
...
* 1. optim multihead matmul: fuse three fc to multihead matmul
test=develop
* fix conflict
test=develop
* fix comments
test=develop
5 years ago
Yiqun Liu
96770f519e
Disable fusion_group for windows and mac in build_strategy. ( #22549 )
...
test=develop
5 years ago
Zeng Jinle
08033c8634
fix traced layer with non persistable vars, test=develop ( #22552 )
5 years ago
Guo Sheng
31b5464632
Add support for dynamic_decode(while) training. ( #22231 )
...
* Add support for dynamic_decode(while) training. test=develop
* Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop
* Fix test_rnn_decode_api.py. test=develop
* Refine docs for apis in rnn.py. test=develop
* Adjust outputs of dynamic_decode. test=develop
* Remove the force_cpu update in assign_op. test=develop
* Remove the force_cpu update in assign_op. test=develop
* Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop
* Rename _create_array_outof_while as _create_array_out_of_while in rnn.py.
test=develop
5 years ago
tangwei12
b0675c8193
fix bug with compiledProgram ( #22495 )
...
* add thread barrier for the compiled program
5 years ago
Wojciech Uss
4cddb43c5c
Add support for Ernie NLP model to the Slim QAT ( #22506 )
...
* a test for Ernie QAT INT8 accuracy check
test=develop
* Remove NLP comparison test to split PRs
test=develop
* Fix typo and tabs, delete commented lines
test=develop
* re-combine the 2 PRs, test=develop
Co-authored-by: Michał Gallus <sand3r@interia.eu>
Co-authored-by: bingyanghuang <33643817+bingyanghuang@users.noreply.github.com>
5 years ago
Double_V
58d99247f4
support slice double grad, test=develop ( #22166 )
...
* support slice double grad, test=develop
* merge two doublegradopmaker to one doublegradopmaker,test=develop
* change the shape of slice_OP's unittest, test=develop
5 years ago
hutuxian
1a7962be97
Paddlebox about box_wrapper ( #22497 )
...
Refine PaddleBox Framework, Main functions:
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.
5 years ago
huzhiqiang
9e29d3ebed
【OpPorting Example】DEMO OF FIX COMPILE&RUNTIME LOD_EQUALITY ( #22460 )
5 years ago
yaoxuefeng
2235ee1a5e
multi-loss optimization by adding a DownpourOpt worker ( #22025 )
...
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
5 years ago
zhaoyuchen2018
54970444ce
Improve transpose performance with tile sm copy, test=develop ( #22311 )
...
* Refine code, fix select tile error,test=develop
* Refine element type and some comments, test=develop
* Refine comments and gpu utils, test=develop
* Remove some useless condition
* Refine floor and ceil, test=develop
* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Wilber
a90fa54092
Compile without nccl deps. [1/2] ( #22509 )
...
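Support compiling without depending on NCCL. [1/2]
In a multi-card setup, if Paddle is compiled without the WITH_NCCL switch enabled, the cards cannot communicate with each other, so only a single card can be used.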
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
guofei
3a59a7a11f
Make assign op support LoDTensorArray and modify while_loop API ( #22309 )
...
This PR makes the assign op support LoDTensorArray and enables the loop_vars in
while_loop to be a tuple or list.
5 years ago
Zhaolong Xing
54a325a52f
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. ( #22483 )
...
* add int8 op teller for trt.
* refine trt int8
* add int8 op teller for trt.
test=develop
5 years ago
zhongpu
5739eeb9fa
add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile for paddle, test=develop ( #22504 )
5 years ago
Wilber
de009152a7
Compile without nccl deps. [2/2] ( #22484 )
...
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
Yiqun Liu
4b2227e958
Fix mismatch of std::max's argument types on windows. ( #22507 )
...
test=develop
5 years ago
Wilber
870f465887
fix test_fusion_seqpool_concat lod level between compile and runtime ( #22488 )
5 years ago
Zhong Hui
a61d09527b
Fix the integer overflow problem of sequence2batch ( #22479 )
...
Fix the integer overflow problem in the sequence2batch op by changing int32_t to size_t
in /paddle/fluid/operators/math/sequence2batch.h#L122.
5 years ago
cc
197913ebe1
Add weight quantization in post_training_quanzitaion ( #22445 )
...
* support weight quantization in post_training_quanzitaion, test=develop
* add test for weight quantization, test=develop
5 years ago
Yiqun Liu
dcfb603897
Enable the detection of subgraph composed of grad ops ( #21223 )
...
* Add the first implementation of fusion_group op #19621 (#3 )
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop
* Disable for mac and windows.
test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop
* Refine the CUDA kernel to support large dims.
test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test.
test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop
* Disable fusion_group op for mac and windows.
test=develop
* Make the compiling of device code return status instead of hanging up.
test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names.
test=develop
* Add the check of CUDA driver library in unittest.
test=develop
* Enable generating code for a given subgraph. #21126 (#4 )
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions.
test=develop
* Improve the loading statements in generated codes.
test=develop
* Remove unused arguments from formal list.
test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop
* Fix a bug when checking whether the shape of all inputs are the same.
* Add debug information.
* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5 )
test=develop
* Call subgraph_detector in fusion_group pass.
test=develop
* Disable fusion_group when WITH_GPU is OFF.
test=develop
* Refine all PADDLE_ENFORCE message.
test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop
* Follow review comments.
test=develop
5 years ago
Tao Luo
7c9ce097f1
refine reshape_op shape error message ( #22480 )
...
test=develop
5 years ago
LielinJiang
2b1386b2b2
optimize performance of interpolate op ( #22436 )
...
* optimize interpolate op, test=develop
5 years ago
wangchaochaohu
77dd0d97bb
use enum class to replace the usage of enum in some condition test=develop ( #22464 )
5 years ago
Yiqun Liu
44b45b9f07
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test ( #22456 )
...
* Add log in memory::Copy for debug purpose.
* Use the context from DeviceContextPool directly in sequence_pooling_test instead of creating a new one.
* Use the context from DeviceContextPool directly in sequence_padding_test instead of creating a new one.
test=develop
* Change the type of second_dim from size_t to int64_t.
test=develop
5 years ago
joanna.wozna.intel
17f2c0899f
Add dequant-scale squash ( #22409 )
...
* Add dequant scale squash
test=develop
* Correct dequant-scale squash test
test=develop
5 years ago
mapingshuo
9c4deedbc2
update readme of imdb training demo ( #22455 )
...
* update readme
* test=develop
5 years ago
Zhaolong Xing
ceda0b9b1a
[Fix BUG]: Core when multi thread + clone + paddle-trt ( #22442 )
...
* add mutex for trt engine
test=develop
* add the test for copy_to_cpu
test=develop
5 years ago
Wilber
7bc4b09500
add WITH_NCCL option for cmake. ( #22384 )
...
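Added a WITH_NCCL option to cmake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF.
Added the PADDLE_WITH_NCCL definition.
NCCL compilation can be turned off for single-machine, single-card use; multi-card use needs NCCL left on (the default), and with NCCL off only a single card can be used.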
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
Tao Luo
943cb8c664
fix sigmoid cudnn bug ( #22439 )
...
* Sigmoid bug fix, test=develop
* fix code format
test=develop
Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>
5 years ago
xujiaqi01
d51ffe860a
fix copy table bug ( #22432 )
...
* fix copy table bug of lost some feasign
* test=develop
5 years ago
Leo Chen
822e5b36ec
Support int16 for Tensor ( #22423 )
...
* add int16 support, test=develop
* add test, test=develop
* fix typo, test=develop
* fix dtype error in slice, test=develop
5 years ago
石晓伟
e1b0d7cbb1
remove anakin from code, test=develop ( #22420 )
5 years ago
liu zhengxi
0404e7a985
Update the precision of pad, pad2d, pad_constant_like's unit tests from fp32 to fp64 ( #22394 )
...
* update the ut precision of pad pad2d pad_constant_like from fp32 to fp64, test=develop
5 years ago
xujiaqi01
371f377bea
add GeneralRoleMaker ( #22295 )
...
* add GeneralRoleMaker which is for general usage
* test=develop
5 years ago
Michał Gallus
269db0d1d1
[DNNL] Fix accuracy in INT8 FC ( #22404 )
...
* Enable quantize to reorder to nchw as well
* Correct FC MKL-DNN input dim requirements to accept 3D
* Improve DNNL FC format, error and 3D input handling
test=develop
* Improve error checking in FC
test=develop
* Improve PADDLE_ENFORCE messages in fc-related files
* Remove data layout attribute from obligatory pass args
test=develop
* Fix message in fc_mkldnn_pass to be logically correct
test=develop
5 years ago
joanna.wozna.intel
fb3086fd57
[UT coverage]Remove unnecessary transpose op registration ( #22402 )
5 years ago
lidanqing
ade5022681
[UT Coverage]Improve sum_mkldnn_op line coverage ( #22275 )
5 years ago
joanna.wozna.intel
3099d9d47c
Restore requantize squash ( #22399 )
5 years ago
Wojciech Uss
92462e948d
improve elementwise_add_mkldnn_op test code coverage ( #22359 )
5 years ago
ceci3
20f30dd604
add benchmark flag for conv_transpose ( #22389 )
5 years ago
Leo Chen
b96c7c9a7a
polish code, test=develop ( #22380 )
...
remove unnecessary template.
5 years ago
Chengmo
8f36c39537
Fix GEO-SGD init & send Bug ( #22375 )
...
* test=develop, fix geo Send & Init
5 years ago
zhupengyang
c6f888e5a5
update unittest accuracy to float64 for relu, prelu, maxout ( #22273 )
5 years ago
wangchaochaohu
0d8b222b79
Optimize the depthwise op test=develop ( #22265 )
5 years ago
Leo Chen
aaa4fe491a
use function instead of lambda, test=develop ( #22348 )
...
* use function instead of lambda, test=develop
* follow comments, test=develop
5 years ago
Adam
e7a9f6bbb7
[Bugfix] Preserve shape in inplace operators ( #22360 )
5 years ago
qingqing01
2d20869c94
Fix infer_shape in compiling for elementwise_op ( #22291 )
5 years ago
Yiqun Liu
b7cac50b64
Implement a common python unittest to test the ir passes. ( #22209 )
...
* Implement a common python unittest to test the ir passes.
test=develop
* Save the results in np.array and support to startup on CPU.
test=develop
* Fix the unittest.
test=develop
* Add check_program to check whether the optimized program is different from the origin one.
test=develop
* Remove the interface all_ops.
test=develop
* Add exception test in pass_test.
test=develop
5 years ago
tangwei12
82bc814a57
integrated HALF_ASYNC to communicator ( #21869 )
...
* add half_async in the communicator
* fix DistributedStrategy
5 years ago
wangchaochaohu
1e932eccfa
remove unused code test=develop ( #22327 )
5 years ago
Leo Chen
3e5744aa65
Remove unused inputs for some operators ( #22284 )
...
* remove unused inputs, test=develop
* remove unused inputs, test=develop
* update dtype, test=develop
* remove unused inputs, test=develop
* update op_use_default_grad_op_maker, tese=develop
* resolve conflicts, test=develop
* follow comments, test=develop
* update center_loss_grad, test=develop
5 years ago
zhangchunle
805328e13b
fix typo in error message ( #22312 )
5 years ago
lidanqing
895f8da7d6
change std::cout to log(INFO), vlog ( #22316 )
5 years ago
石晓伟
8cb04664b9
revert paddle_fluid.map, test=develop ( #22236 )
5 years ago
Chen Weihang
35efbe6d95
Speeding up dygraph DataLoader with multiprocessing ( #21762 )
...
* add multiprocess for dygraph data loader, test=develop
* polish code & add safe guard, test=develop
* refactor dygraph dataloader & add signal handler, test=develop
* fix member initializer compile error on ci, test=develop
* fix member initializer compile error one more, test=develop
* remove useless config, test=develop
* skip windows incompatible problem, test=develop
* add unittest for coverage, test=coverage
* add more exception unittest case, test=develop
* deal with signal handler coverage, test=develop
* polish code & add signal handler tests, test=develop
* deal with coverage ci problem, test=develop
* split data loader test & coverage ci fix, test=develop
* remove test_imperative_data_loader_with_exception, test=develop
* remove signal process except test case, test=develop
* add exception tests again & remove sample list test, test=develop
* split normal and exception unittests to diff class, test=develop
* polish doc for use_multiprocess effect in static mode, test=develop
5 years ago
Zeng Jinle
9435533adf
remove op_use_default_grad_op_maker.spec, test=develop, test=document_fix ( #22300 )
5 years ago
wangchaochaohu
7b76a76495
fix the conda build conflict test=develop ( #22279 )
5 years ago
Zeng Jinle
5e601a92ad
polish grad op check ( #22290 )
...
* polish grad op check, test=develop, test=document_fix
* keep op_use_default_grad_maker.spec to avoid conflict, test=develop, test=document_fix
5 years ago
Bai Yifan
faba4b116a
Remove disable flag in test_fsp_op.py ( #22171 )
...
* fix fsp_op, test=develop
* fix fsp grad op maker, test=develop
* update op_use_default_grad_op_maker.spec, test=develop
5 years ago
Zhen Wang
e40cfb1010
fix the bug of assert_is_op_output. test=develop ( #22262 )
5 years ago
Wojciech Uss
d3a6647372
improve placement pass tests code coverage ( #22197 )
5 years ago
liu zhengxi
07afc29e90
Make api.cc malloc consistent with paddle_api.h for PaddleBuf ( #22255 )
5 years ago
silingtong123
4f1da4adcb
remove the useless third_party library from C++ inference library ( #22021 )
...
* remove the useless third_party library from C++ inference library
* revert removing the install directory
5 years ago
zhouwei25
549e6de7ac
faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 ( #22164 )
5 years ago
xujiaqi01
e3a457d34b
add collective communication library in fleet ( #22211 )
...
* add collective communication library in fleet to replace mpi
* test=develop
5 years ago
Zhen Wang
f2522e91c4
fix the type error caused by setting bool attr in OpDesc. test=develop ( #22257 )
5 years ago
songyouwei
0ba1d140d4
Add CI check for sequence ops' unittests ( #21615 )
5 years ago
Zeng Jinle
1b76e789cf
remove cuda allocator ctor, test=develop ( #22212 )
5 years ago
Adam
9942d9ed5c
Add caching mechanism to requantize_mkldnn_op ( #22223 )
5 years ago
Wilber
1230c110cb
[fluid-lite] adjust to relative error ( #22232 )
...
- switch the fluid vs. lite precision comparison to relative error
5 years ago
123malin
985bceac53
Bug fix for sparse recorder ( #21969 )
...
* test=develop, bug fix for sparse recorder
5 years ago
Chen Weihang
fc0b21e17b
Polish fetch error message of parallel executor ( #22206 )
...
* polish error message of parallel executor, test=develop
* change PADDLE_ENFORCE, test=develop
5 years ago
Wojciech Uss
2e90c4eb0a
improve mkldnn_quantizer_config test code coverage ( #22216 )
5 years ago
Wilber
5750152e80
support fluid-lite subgraph run resnet test=develop ( #22191 )
...
- add a unit test that runs resnet through the fluid-lite subgraph
- update the git commit id of the Lite dependency
5 years ago
wangchaochaohu
621d3e0b66
fix the bug of profile update ( #22207 )
...
* fix the bug of profile update test=develop
5 years ago
FlyingQianMM
443a713c9e
add backward gradient computation for op argsort ( #22203 )
...
* add backward gradient computation for op argsort test=develop
* use pre-commit test=develop
5 years ago
Zhen Wang
46189b166d
Add bn and relu fuse pass ( #22048 )
...
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some inputs&&outputs bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
5 years ago
zhouwei25
2f3e2a84af
fix ci rule to show Shell variables ( #22177 )
5 years ago
baojun
298ee7d28a
Improve ngraph file line coverage ( #22155 )
5 years ago
zhongpu
d0f0a2520c
test Optimizer in dygraph ( #21949 )
...
* test Optimizer in dygraph, test=develop
* add optest for Optimizer in dygraph, test=develop
* fix adagrad optimizer, test=develop
* fix dpsgd optimizer, test=develop
* fix test_optimizer.py, test=develop
* fix dpsgd optimizer, this op only support cpu, test=develop
* add optest for optimizer, test=develop
* add description for dpsgd, test=develop
* add rmsprop to white_list in unused_var_check.cc, test=develop
* polish code style, test=develop
* polish code style, test=develop
* delete seed attribute for DpsgdOptimizer, test=develop
* change testing to debugging, test=develop
5 years ago
石晓伟
ad0dfb17c1
[Feature] Lite subgraph ( #22114 )
5 years ago
joanna.wozna.intel
5b2e98aa17
Add multiple quantize operators fuse ( #22062 )
5 years ago
Yiqun Liu
96980c2244
Polish the PADDLE_ENFORCE in fusion_group pass related codes. ( #22144 )
...
* Polish the PADDLE_ENFORCE in fusion_group pass related codes.
test=develop
* Correct the unittest because of the change relu_grad's formula.
test=develop
5 years ago
wangchaochaohu
c3876cf82d
add support for nested profiling event and printing in different level ( #22061 )
...
* add support for nested profiling event and printing in different level
5 years ago
Zeng Jinle
c3bcd3c1e2
fix dygraph non zero gpu bug, test=develop ( #22165 )
5 years ago
zhaoyuchen2018
3d4f2aa689
Refine stack op to improve xlnet performance, test=develop ( #22142 )
...
stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy
reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
zhongpu
cf475f95df
Remove FC in dygraph, modify FC to Linear in sample code ( #22082 )
...
* modify fc to linear in sample code, test=develop
* remove FC, test=develop
* remove warnings, test=develop
* drop fluid/imperative/README.md , test=develop
* change fc to linear, test=develop
* polish code style, test=develop
5 years ago
liu zhengxi
64a4044292
add double register op_data_type of pad2d and fix compile error, test=develop ( #22075 )
5 years ago
Liu Xudong
7ba7acd197
Add coverage tools ( #21975 )
...
Add coverage data processing tools.
5 years ago
Double_V
6ea3809143
Support prroi_pool_op with Tensor and LoDTensor rois ( #20649 )
...
1. Add a new input named batch_roi_nums for prroi_pool_op. batch_roi_nums holds the number of RoIs for each image in the batch when rois is a Tensor. When rois is a LoDTensor, this information is stored in rois's LoD.
2. Add grad check to prroi_pool_op and fix the abnormal X grad diff on CPU.
5 years ago
Pei Yang
d8a9b134e3
fix trt instance_norm serialize bug. test=develop ( #22152 )
5 years ago
zhongpu
cc1a9f4238
fix sample code in paddle/fluid/imperative/README.md ( #22141 )
...
* fix sample code, test=develop
* polish code style, test=develop
5 years ago
Zeng Jinle
4c2df8e4d4
fix allocator strategy comment, test=develop, test=document_fix ( #22121 )
5 years ago
bingyanghuang
7872d06ff4
Add explanation on conv grad for dims<3 ( #22125 )
5 years ago
liu zhengxi
724b13e459
fix xception precision problem, test=develop ( #22124 )
5 years ago
Yiqun Liu
b1401fb74d
Remove subgraph_detector from inference/analysis to the common framework/ir directory. ( #22094 )
...
test=develop
5 years ago
Pei Yang
50bee83f71
add TRT support for instance_norm op ( #21928 )
...
* add TRT support for instance_norm op
5 years ago
zhaoyuchen2018
3dbd4087fe
Fix windows build no kernel issue, test=develop ( #22105 )
...
Windows conv_fusion failed because there was no kernel; explicitly declare the lambda
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Chengmo
418abc92f4
Update pyramid related OP ( #21372 )
...
* add special way to add distribute vars, Update Pyramid hash op
5 years ago
bingyanghuang
4b4a9cc88f
fix format in operator.cc ( #22101 )
5 years ago
Feiyu Chan
14aebc7a95
add erf op ( #21785 )
...
* add erf op and python interface.
* add fp16 support for erf op.
* add unittests for erf op and its python interface.
5 years ago
Chen Weihang
ba8414d3a5
replace CUDNN_ENFORCE with PADDLE_ENFORCE_CUDA_SUCCESS, test=develop ( #22109 )
5 years ago
silingtong123
6c20e7c4e6
test=develop, remove unused parameter from class RuntimeInferShapeContext constructors ( #22046 )
5 years ago
Double_V
fab4b0765a
support elu_op double grad ( #21822 )
...
* support elu activation double grad,test=develop
* delete the code commit in .cc,test=develop
* fix relu test unpass, test=develop
* add elu double grad kernel and unit test
* add calculation of dX in elu double grad functor, test=develop
* update the commit code,test=develop
5 years ago
Pei Yang
0a51098a71
Add TRT support for BERT ( #21135 )
...
* add gelu plugin
* align trt bert with gpu
* add support for fused fc with relu
* add unittest for bert trt
5 years ago
Jacek Czaja
b0b27ff699
[MKL-DNN] Conv grad and Batch Norm grad NHWC support ( #22088 )
5 years ago
Huihuang Zheng
dd4361568e
Add ParallelExecutor Test for Cond API and Fix PE Checks Shape Bug ( #22029 )
5 years ago
Zeng Jinle
9587249442
polish allocator strategy doc, test=develop, test=document_fix ( #22095 )
5 years ago
Zeng Jinle
d9f5d1eb29
ag allocator by default, test=develop ( #21837 )
5 years ago
123malin
7fb817d447
add distributed_strategy ( #21710 )
...
* add distributed_strategy
5 years ago
Jacek Czaja
ad8a9cb82c
[MKL-DNN] Pool & LRN Grad Ops NHWC support ( #21747 )
5 years ago
Kaipeng Deng
34c57120eb
polish cross_entropy ENFORCE ( #22056 )
5 years ago
SunAhong1993
7f4abaf2f5
register int/int64_t/float16 in pow/square kernel,test=develop ( #22023 )
...
* register int/int64_t/float16 in pow/square kernel,test=develop
* add abs/square/exp type,test=develop
5 years ago
Leo Chen
3f653c8323
register NoNeedBufferVarsInference for max_pool_grad_op, test=develop ( #22055 )
...
* fix test_conv2d_ngraph for grad diff, test=develop
* register NoNeedBufferVarsInference for max_pool_grad_op, test=develop
* refine error message, test=develop
* fix numpy, test=develop
* disable test conv2d_ngraph_op, test=develop
Co-authored-by: Zhang Ting <709968123@qq.com>
5 years ago
Yiqun Liu
d48320777e
Add the first implementation of fusion_group op ( #19621 )
...
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop
* Disable for mac and windows.
test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop
* Refine the CUDA kernel to support large dims.
test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test.
test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop
* Disable fusion_group op for mac and windows.
test=develop
* Make the compiling of device code return status instead of hanging up.
test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names.
test=develop
* Add the check of CUDA driver library in unittest.
test=develop
* Refine the calling of PADDLE_ENFORCE.
test=develop
5 years ago
Michał Gallus
6192108408
[DNNL] 3D Fully-Connected ( #21746 )
5 years ago
FDInSky
aa2ed0dcc6
fix generate_proposal_labels op ( #21793 )
...
* test=develop fix generate_proposal_labels op
5 years ago
ceci3
95d79b6d00
update error log for batch_norm_grad ( #22017 )
...
* update error information about batch_norm_grad
* update bn,test=develop
5 years ago
Aurelius84
c53b62eb8e
fix integer overflow in match_matrix ( #22036 )
...
* fix integer overflow in match_matrix test=develop
* fix integer overflow in match_matrix test=develop
* fix typo test=develop
5 years ago
Chen Weihang
2e9082250d
polish default error msg & cublas error hint, test=develop ( #22032 )
5 years ago
wangchaochaohu
64baee4144
polish code test=develop ( #22014 )
5 years ago
Chen Weihang
35ff1568e9
Add error message for cublas initialization failure ( #21995 )
5 years ago
Chen Weihang
fbb42173a9
fix no hint problem when use ENFORCE for cuda, test=develop ( #21994 )
5 years ago
zhouwei25
e66f92d1ae
Modify demo_ci to support Windows, prepare for PR_Windows_Inference ( #21873 )
5 years ago
danleifeng
b7697f6218
fix broadcast bug;test=develop ( #21898 )
5 years ago
liu zhengxi
196e20dfbb
Fix multi-threads memory out of bounds error for passes ( #21920 )
...
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop
* fix attention_lstm_fuse_pass during multi-threads inference, test=develop
* fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
5 years ago
zhaoyuchen2018
8859ddd6cf
Refine multihead kernel, align block to 32 ( #21961 )
...
* Refine multihead kernel, align block to 32
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
* Refine log comments
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
silingtong123
fd9b00df4b
test=develop, remove unused variable ( #21974 )
5 years ago
zhoushiyu
cee2ccb078
add shuffle batch op ( #21674 )
...
* add shuffle batch op, test=develop, test=document_preview
* fix size_t conflict and check_output test=develop, test=document_preview
* fix bug test=develop, test=document_preview
* add unittest of shuffle_batch layer test=develop, test=document_preview
* fix py coverage and op input type, test=develop, test=document_preview
* fix py coverage, test=develop
* fix en doc, test=develop
* move to contrib test=develop
* add unique_name test=develop
* invoke shuffle_batch in contrib.layers test=develop
5 years ago
mapingshuo
c3e1954918
make reverse op support negative axis ( #21925 )
...
* make reverse op support negative axis
5 years ago
石晓伟
03479469a7
fix multi-thread error of fc_gru_fuse_pass.cc, test=develop ( #21841 )
...
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop
* export FLAGS and GLOG symbols, test=develop
5 years ago
wangchaochaohu
de9ba01f11
add conda build python script test=develop ( #21943 )
...
* add script for conda package build
5 years ago
Aurelius84
10d6846900
Remove double registered dataType in Pad2d ( #21942 )
...
* fix compile error in CUDA10 test=develop
* remove double in pad2d test=develop
5 years ago
zhouwei25
2df4be5d35
Fix openblas bug to support compile on windows when WITH_MKL=OFF ( #21902 )
...
* Fix openblas to support compile on Windows when WITH_MKL=OFF
5 years ago
hutuxian
27decacb8a
fix aucop stat shape ( #21846 )
...
* fix stat shape back in global auc scenario
* add UT to cover global auc
5 years ago
Pei Yang
3e5008ad01
fix trt calib not working bug, test=develop ( #21934 )
5 years ago
Aurelius84
5cb2c74127
add register op_data_type of pad/expand_as et.al ( #21718 )
...
* add register op_data_type test=develop
* fix register bug in isfinite op test=develop
* rm int int64_t in pad2d gradKernel test=develop
5 years ago
qingqing01
2066745847
Pack imperative/layer into paddle_framework.so ( #21921 )
...
* Pack imperative/layer into paddle_framework.so
5 years ago
hong
30d000f8c2
fix matmul error message; test=develop ( #21885 )
5 years ago
zhouwei25
a01663ca1f
remove patch command and file of cares to improve quality of Paddle Repo ( #21776 )
5 years ago
flame
2bbc0d7d60
python zero copy inference, delete pass ( #21897 )
...
* python zero copy inference
* support delete inference pass
5 years ago
Aurelius84
51a86d2b6b
Optimize adam speed ( #21777 )
...
* optimize adam speed by removing _finish_update test=develop
* fix SparseAdamFunctor param list test=develop
* Remove scale_op in expect_list of adam_op test=develop
* fix test optimizer loss assert error test=develop
* fix test optimizer loss assert error test=develop
* modify PADDLE_ENFORCE usage test=develop
* fix op_type in lamb_op.cc test=develop
* fix errors ostream format bug test=develop
* add betaPowOut in ngraph op test=develop
* fix ngraph::op api for gcc8 test=develop
* clean code test=develop
* modify struct into class test=develop
* remove code of beta1Tensor in lamb_op test=develop
5 years ago
Leo Chen
310edc0d0c
Update layers used in ptb model to use auto-generated op functions in dygraph mode ( #21724 )
...
* update layers, test=develop
* fix input numpy, test=develop
* fix bugs, test=develop
* follow comments, test=develop
* update getitem, test=develop
5 years ago
lidanqing
9dff56e8e2
change qat_performance with mobilenet, change batch_size of qat2_resnet50 ( #21895 )
...
test=develop
5 years ago
FDInSky
6b9fbcf3ad
Update iou_similarity op to support non-normalized bbox ( #21671 )
...
Update iou_similarity op to support non-normalized bbox
5 years ago
guofei
46f9184aff
Modify the while_loop API ( #21844 )
5 years ago
Guo Sheng
7689b6aaa4
Fix default label dim of label_smooth_op. test=develop ( #21862 )
5 years ago
zhouwei25
13e4756f18
change ci check rule of deleting unit-test ( #21876 )
5 years ago
GaoWei8
d4dda8628e
optimize fc jit ( #21878 )
...
test=develop
5 years ago
zhouwei25
013225bb68
fix Execution order of ci_check_unittest, and add it to Linux_py35 ( #21640 )
5 years ago
Chen Weihang
2b941736f3
fix softmax_with_cross_entropy_fix bug, test=develop ( #21810 )
5 years ago
Thunderbrook
c3cf42d0f7
add table id in cache shuffle ( #21585 )
...
* general table
* add sparse table
test=develop
* no cvm
test=develop
* add no_cvm
test=develop
* add note
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* code style
test=develop
* add key of optimizer
test=develop
* solve pslib stop core
test=develop
* barrier
test=develop
* add notes
test=develop
* add table id in cache shuffle
test=develop
* table id
test=develop
* code style
test=develop
5 years ago
Michał Gallus
253e664275
Disable memory opt pass when DNNL is on ( #21826 )
...
* Disable memory opt pass when DNNL is on
* Refine comment above mem optimization pass enablement
test=develop
5 years ago
Chengmo
a86f11b5f5
Speed GEO dense calc & communication ( #21579 )
...
* test=develop, speed dense calc & communication
5 years ago
Wojciech Uss
666c3bb9b0
handle multi-inputs with empty inputs for mkldnn_concat_op ( #21827 )
...
test=develop
5 years ago
Zeng Jinle
aa4d6a5d6c
Add some debug flags to auto growth allocator ( #21766 )
...
* add some debug flags to auto growth allocator, test=develop
* add comments about auto growth, test=develop
5 years ago
guofei
8b7c50f49a
Make While Op could run on GPU place and add while_loop unittest ( #21672 )
...
1. Make while_op accept GPU conditional data
2. Add more complex test cases for while_loop API
5 years ago
WangXi
17299b8d21
fix batch_norm_grad infer shape=0 & add allreduce enforce shape, test=develop ( #21801 )
5 years ago
Huihuang Zheng
557bce77da
Fix Backward Bugs in Conditional Block ( #21809 )
...
The fixed bugs:
1. The condition sub-graph is not pruned
2. When backward graph is extremely simple, the whole backward ops are pruned.
5 years ago
xujiaqi01
0eb4d990c4
fix compiled error when with_pslib=on ( #21769 )
...
* fix compiled error of butil when with_pslib=on and with_testing=on
* test=develop
5 years ago
Huihuang Zheng
0677a1c1c1
Fix That conditional_block_op Doesn't Have InferShape ( #21733 )
5 years ago
zhaoyuchen2018
a5a8d14414
Fix softmax cuda bug ( #21720 )
...
* Fix softmax cuda bug
* Refine multihead log and softmax logic
5 years ago
Kaipeng Deng
943a44492b
yolo_box OP add Attr(clip_bbox). ( #21620 )
...
* yolo_box OP add Attr(clip_bbox). test=develop
5 years ago
Michał Gallus
a5159d8480
Re-enable vgg and resnet101 models download ( #21713 )
...
test=develop
5 years ago