songyouwei
99d30bfc36
speedup slice impl ( #23340 )
test=develop
5 years ago
Zhaolong Xing
1a6ce8b910
add swish split gelu plugin dynamic support ( #23305 )
test=develop
5 years ago
Jacek Czaja
2bb1b0e89e
[DNNL] Added MKL-DNN inplace pass for C-API inference ( #23315 )
5 years ago
Yi Liu
0471476a18
fix nccl comm double free bug ( #23344 )
Because the NCCL communicator is not created by CUDADeviceContext, it should be destroyed by its creator, following RAII best practice.
5 years ago
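The ownership rule behind this fix is that whoever creates the communicator destroys it, and a context that only uses it does not. It can be sketched in Python as below. `CommHandle`, `CommOwner`, and `DeviceContext` are hypothetical stand-ins, not Paddle or NCCL APIs, and a Python context manager plays the role of a C++ destructor.

```python
class CommHandle:
    """Hypothetical stand-in for a communicator handle."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        # A second destroy() models the double free this commit fixes.
        if self.destroyed:
            raise RuntimeError("double free of communicator")
        self.destroyed = True


class CommOwner:
    """Creates the handle and is the only party allowed to destroy it."""

    def __enter__(self):
        self.comm = CommHandle()
        return self.comm

    def __exit__(self, exc_type, exc, tb):
        self.comm.destroy()
        return False  # do not swallow exceptions


class DeviceContext:
    """Borrows the handle but never destroys it."""

    def __init__(self, comm):
        self.comm = comm

    def close(self):
        # Intentionally no self.comm.destroy(): the owner releases it.
        self.comm = None


with CommOwner() as comm:
    ctx = DeviceContext(comm)
    ctx.close()
print(comm.destroyed)  # True: destroyed exactly once, by the owner
```

If the borrowing context also called `destroy()`, the owner's cleanup would raise, which is the Python analogue of the double free.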
wangchaochaohu
1ee2a9a424
Profiler refine ( #23294 )
* refine output of profiler for child event
5 years ago
Leo Chen
488b2387e2
Feature/expand params in auto-generated pybind functions for dygraph operators ( #23181 )
* expand parameters, test=develop
* support resnet, test=develop
* fix resnet, test=develop
* support duplicable out, test=develop
* support ptb
* fix bugs, test=develop
* support null input, test=develop
* fix bugs, test=develop
* fix batchNorm is_test, test=develop
* refine code, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
5 years ago
GaoWei8
20eed5401a
Change fluid.layers.where's C++ operator name ( #23250 )
5 years ago
Yi Liu
2169e6fb58
Initialize global nccl_comm in PE ( #23275 )
5 years ago
Jacek Czaja
012886df79
[DNNL] Softmax mkldnn op inplace support ( #23197 )
5 years ago
石晓伟
75ebb48a91
supports thread-binding stream, test=develop ( #23177 )
5 years ago
石晓伟
708ded584e
pause the io_utils_test of int64 and resume after repair, test=develop ( #23234 )
5 years ago
Zeng Jinle
babda94c8a
Distinguish public/private global vars ( #23269 )
* distinguish public/private vars, test=develop
* fix windows issues, test=develop
5 years ago
zhaoyuchen2018
58615a6272
Improve elementwise performance. ( #23001 )
* Improve elementwise performance.
Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA; add some new kernels for different data patterns.
* Add some CUDA kernels to speed up common broadcast cases. test=develop
* Add more test cases and fix cuda kernel bug. test=develop
* Remove tests as CPU precision fails. test=develop
* Refine SplitDims, test=develop
* Change file mode, test=develop
5 years ago
Wojciech Uss
f836c8aa8f
add check for scales and a message ( #23119 )
5 years ago
Zeng Jinle
8bfd62ffb7
Expose dygraph.grad api ( #23124 )
* expose dygraph.grad api, test=develop, test=document_fix
* add more parameters in dygraph.grad API, test=develop
* add only_inputs=True parameter, test=develop
* follow comments, test=develop, test=document_fix
* fix typo, test=develop, test=document_fix
5 years ago
Wilber
0129f4b568
Add some inference API comments for AnalysisPredictor ( #23242 )
* add inference api doc. test=develop
5 years ago
Tao Luo
c00d427d52
simplify the cmake log of ir/CMakeLists.txt ( #23262 )
test=develop
5 years ago
Zeng Jinle
77b4dc80c9
code polish for adding const qualifier, test=develop, test=document_fix ( #23248 )
5 years ago
Zhaolong Xing
430b0099c9
[Paddle-TRT]: Ernie Dynamic shape support. ( #23138 )
* add dynamic plugin support.
test=develop
* change emb eltwise layernorm to math function
test=develop
* add emb eltwise layernorm
test=develop
* can run dynamic shape ernie
test=develop
* fix ci
test=develop
* add ut for trt ernie dynamic
test=develop
* refine dynamic shape c++ interface.
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
xujiaqi01
68ea1ad55b
add clear one table ( #23089 )
* add clear_one_table
* test=develop
5 years ago
danleifeng
ae3bb16d06
add MaskAucCalculator in paddlebox ( #23157 )
* add maskauc in paddlebox; test=develop
5 years ago
liym27
6af480ca33
Support int64 for op assign_value. test=develop ( #23179 )
5 years ago
Zeng Jinle
53e6f8e1da
rename macro, test=develop ( #23161 )
5 years ago
Zeng Jinle
bba740710d
add cuda resource pool for BufferedReader, test=develop ( #23152 )
5 years ago
Zeng Jinle
7d8d50b6cc
rename no_need_buffer_vars macro, test=develop ( #23160 )
5 years ago
Liufang Sang
a486a739e1
fix compile error in win gpu ( #23196 )
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
5 years ago
Zeng Jinle
7ca77a90ac
add Tensor::IsSharedBufferWith method, test=develop ( #23175 )
5 years ago
Zeng Jinle
b8886bf122
rename no_need_buffer_vars_macro, test=develop ( #23159 )
5 years ago
Zeng Jinle
bae5930ba1
fix graph attr copy issues, test=develop ( #23191 )
5 years ago
wangchaochaohu
b721e23b25
transpose cudnn using cudnn v7 api ( #19738 )
* refine the transpose conv, using v7 to choose the algorithm
5 years ago
Pei Yang
46b8d282dc
Add some inference API comments for AnalysisConfig ( #23117 )
* add some API comments in paddle_analysis_config.h, test=develop
* add some API comments in paddle_analysis_config.h, test=develop
5 years ago
Adam
4f5e4540f8
Improve SGD jit code to work with large data ( #23120 )
5 years ago
Liufang Sang
4db031902d
add dequantize_log_op and make pyramid hash support int8 weight ( #22548 )
* add dequantize_log_op and make pyramid hash support int8 weight test=develop
* add unittest and update pyramid hash op test=develop
* remove paddle_enforce test=develop
* fix error message test=develop
* remove incorrect commit test=develop
* fix error message in log_dequantize test=develop
* change 2019 to 2020 test=develop
* remove useless check_grad test=develop
5 years ago
Zeng Jinle
e5fef8f38a
[Dygraph double grad]Code polish ( #23121 )
* fix dygraph double grad, test=develop
* fix unpack constructor, test=develop
5 years ago
Zeng Jinle
9258e96094
fix read op comments, test=develop, test=document_fix ( #23122 )
5 years ago
Zeng Jinle
acfc9b8a70
Reader sequential and inference partial feed ( #22699 )
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* test commit for reverting
* Revert "test commit for reverting", test=develop
This reverts commit 80aef42ef52ba1ee79627d6f663a624ec4f12f58.
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
5 years ago
Wilber
95b356a069
update embedding_eltwise_layernorm fuse and kernel. test=develop ( #23114 )
update embedding_eltwise_layernorm fuse pass and fused kernel to support multiple inputs
5 years ago
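As commonly used for ERNIE/BERT-style inputs, an embedding-eltwise-layernorm computation looks up one embedding row per id tensor, adds the rows elementwise, then applies layer normalization. The pure-Python sketch below accepts any number of (table, ids) pairs to mirror the "multiple inputs" idea. The function names, the `eps` value, and the omission of the learned scale and bias are assumptions for brevity; this is not Paddle's fused kernel.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no scale/bias)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def emb_eltwise_layernorm(tables, ids_list, eps=1e-5):
    """Sum the looked-up rows of every table per position, then normalize.

    tables[i] is a list of embedding rows; ids_list[i] has one id per position.
    """
    seq_len = len(ids_list[0])
    hidden = len(tables[0][0])
    out = []
    for pos in range(seq_len):
        summed = [0.0] * hidden
        for table, ids in zip(tables, ids_list):
            row = table[ids[pos]]
            summed = [s + r for s, r in zip(summed, row)]
        out.append(layer_norm(summed, eps))
    return out


word = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
position = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
sentence = [[0.0, 0.0, 1.0]]
result = emb_eltwise_layernorm([word, position, sentence],
                               [[0, 1], [0, 1], [0, 0]])
```

Fusing these steps avoids writing each intermediate sum back to memory, which is the usual motivation for such a fused op.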
Zeng Jinle
a31d7328b7
Add dygraph double grad implementation ( #22939 )
* add double grad implementation for dygraph, test=develop
* polish code, add uts, test=develop
* fix place bug, test=develop
* polish codes, add more uts for coverages, test=develop
* add no_grad_set, test=develop
* add star gan ut, test=develop
* follow comments, test=develop
5 years ago
Yiqun Liu
3af4771122
Add the detection and code-generation of sqrt and square in fusion_group ( #23095 )
5 years ago
hutuxian
0c30098f8b
Add need_save_delta parameter to solve OOM ( #23097 )
5 years ago
songyouwei
2e2da7124b
high-performance dygraph slice ( #22879 )
* move __getitem__ to cpp
* bug fix
* add type check and gil release
* support negative step with omitted ends
test=develop
* code refine
test=develop
* bug fix
test=develop
* slice always returns a different pyobj
test=develop
5 years ago
Sylwester Fraczek
abee05a8c8
added mkldnn swish activation ( #23041 )
5 years ago
Zhaolong Xing
8c6fde9e69
fix align error ( #23090 )
test=develop
5 years ago
Liufang Sang
915b892a15
Fix div zero in fake quantize op ( #22966 )
* fix div zero test=develop
* fix div zero test=develop
* add hostdevice function test=develop
* add eps when the divisor is zero test=develop
5 years ago
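A minimal sketch of a divide-by-zero guard, assuming abs-max fake quantization of the form `round(x / scale * bin_cnt)` with `scale = max|x|`. The function name and the `eps` value are assumptions for illustration, not the values used in Paddle's op.

```python
def fake_quantize_abs_max(values, bit_length=8, eps=1e-6):
    """Fake-quantize values to integers using an abs-max scale (sketch)."""
    bin_cnt = (1 << (bit_length - 1)) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        # An all-zero input would otherwise divide by zero; use eps instead.
        scale = eps
    quantized = [round(v / scale * bin_cnt) for v in values]
    return quantized, scale
```

With the guard, an all-zero input quantizes to zeros instead of producing NaN or raising.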
Yi Liu
121b2aed4d
initialize global nccl context in dygraph ( #23037 )
initialize global nccl context in dygraph
test=develop
5 years ago
Zhang Ting
880eb04d93
skip PrepareData when it is unnecessary ( #22839 )
* remove unnecessary prepare data, test=develop
* Op in while block will not skip PrepareData, test=develop
5 years ago
Feiyu Chan
01ab8a0619
add approximation for gelu, test=develop ( #22961 )
Add an approximation option for gelu; its default value is False (only the Eigen kernel is added, and the code for computing gelu with MKLDNN is temporarily removed).
5 years ago
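For reference, the exact GELU and its widely used tanh approximation are shown below using only the Python standard library. The `approximate` flag mirrors the commit's description (default False), but the function is an illustrative sketch, not Paddle's kernel.

```python
import math


def gelu(x, approximate=False):
    """GELU activation; approximate=True uses the tanh formulation."""
    if approximate:
        # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    # Exact form: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(v, gelu(v), gelu(v, approximate=True))
```

The two forms agree closely over typical activation ranges. The tanh form avoids `erf`, which can be cheaper on some backends.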
Adam
5842ae6785
Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )" ( #22985 )
5 years ago
Pei Yang
24db750386
fix trt int8 calib precision bug. test=develop ( #23036 )
5 years ago
GaoWei8
1dc1f9270e
Fix lod error of concat op for axis = 0 ( #22538 )
5 years ago
yaoxuefeng
660ff18488
fix dataset test=develop ( #23043 )
5 years ago
Zhang Ting
714b0076b6
Override GetKernelTypeForVar to avoid device transform, test=develop ( #23032 )
5 years ago
wangchaochaohu
112e3edbf6
fix the conv group problem test=develop ( #23025 )
5 years ago
Wilber
db40ee86db
fix unittests. test=develop ( #23018 )
5 years ago
wangchaochaohu
99db0cf762
remove debug log test=develop ( #22994 )
5 years ago
wangchaochaohu
3757e0687c
Add Unittest for backward of fusion group ( #22932 )
* add fusion group test for backward and refine code
5 years ago
chengjuntao
63f3ada7b9
fix bug with input shape ( #22965 )
* fix bug with input shape, test=develop
* add error type,test=develop
5 years ago
Zhang Ting
137d6563fc
add check for assigned data, test=develop ( #22960 )
5 years ago
wangchaochaohu
f0d193a23c
Cast fusion for fusion group ( #22876 )
* add support for expression type conversion and add cast Op support in fusion group
5 years ago
yaoxuefeng
29a7a52d38
Fix instag ( #22632 )
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* fix compile warning test=develop
* add attr default test=develop
* add unittest test=develop
* fix style test=develop
* fix style test=develop
* change out_val_ifempty to out_val_if_empty test=develop
5 years ago
wangchaochaohu
c979c9f2b0
refine the profiler print test=develop ( #22968 )
5 years ago
Wilber
ff3ddbb502
add skip_layernorm pass. test=develop ( #22895 )
* add skip_layernorm pass. test=develop
5 years ago
wawltor
f154d5860f
Speed up the matmul op by using gemm to replace batch gemm ( #22926 )
In the matmul op, use gemm in place of batch gemm to speed up the op.
5 years ago
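A batched matrix multiply can be replaced by a single gemm when the second operand is a 2-D matrix shared across the batch: an input of shape [B, M, K] can be viewed as [B*M, K] and multiplied once. The commit does not say which cases it covers, so the pure-Python sketch below only illustrates this reshaping idea. The function names are hypothetical.

```python
def gemm(a, b):
    """Plain matrix multiply: a is M x K, b is K x N (lists of lists)."""
    k_dim, n_dim = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n_dim)]
            for row in a]


def batched_matmul(x, y):
    """One gemm call per batch entry: x is B x M x K, y is K x N."""
    return [gemm(xb, y) for xb in x]


def flattened_matmul(x, y):
    """Same result with a single gemm over the (B*M) x K view of x."""
    m = len(x[0])
    flat = [row for xb in x for row in xb]  # stack all batches row-wise
    out = gemm(flat, y)                     # one call instead of B calls
    return [out[i:i + m] for i in range(0, len(out), m)]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # B=2, M=2, K=2
y = [[1, 0, 2], [0, 1, 3]]                # K=2, N=3
assert batched_matmul(x, y) == flattened_matmul(x, y)
```

On a GPU, one larger gemm typically keeps the device busier than many small per-batch calls, which is a plausible source of the speedup.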
Adam
056edf3929
Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )
5 years ago
Zhaolong Xing
8d6dc102fe
[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse ( #22494 )
* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop
* 1. refine fc
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
guofei
3d8571e884
modify assign op and add unittest of assign op ( #22769 )
As the title.
5 years ago
Zeng Jinle
d33c4343e1
Imperative tracer refactoring ( #22457 )
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to resolve conflicts for the third time, test=develop
5 years ago
liu zhengxi
61fef9754b
Fix fc padding bug during inference fusion ( #22860 )
* fix fc padding during fusion, test=develop
* fix optim model inference after SaveOptimModel, test=develop
5 years ago
tangwei12
ad9c8f6d2d
fix communicator when breaking under PyReader mode ( #22911 )
* fix communicator when breaking under PyReader mode, test=develop
* revert some vlog level to 0, test=develop
5 years ago
mapingshuo
5ba9dfc16a
add lookup_table_dequant_op ( #22900 )
add lookup_table_dequant_op
5 years ago
zhaoyuchen2018
a020a25797
Fix model int8 quant fail, test=develop ( #22891 )
Because the model fails when int8 quantization is enabled, disable allocating memory on the CPU for small variables.
5 years ago
Zhaolong Xing
dd67d44a50
[Paddle-TRT] : (Part1) Dynamic shape support ( #22868 )
* change the ci trt from version 5. to 6.0
* paddle-trt dynamic shape support init
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine op converter
test=develop
* fix ci error
test=develop
5 years ago
tangwei12
07e13b84cd
remove vlog, test=develop ( #22898 )
5 years ago
Zhang Ting
ca9c8b417d
fix compute ratio of profile, test=develop ( #22872 )
5 years ago
wangchaochaohu
dbb0b9b3b6
refine the profiler print ( #22823 )
* refine the profiler print test=develop
5 years ago
Michał Gallus
0038bfbd1d
Prevent loading of warmup data in analyzer_int8 if enable_int8 is set to false ( #22857 )
5 years ago
Chen Weihang
1644926a6c
Polish implementation details of dygraph data loader ( #22878 )
* polish implementation details of data loader, test=develop
* solve coverage ci problem, test=develop
5 years ago
Wilber
f686310d81
fix concat_mkldnn op. test=develop ( #22692 )
fix concat_mkldnn op when encountering extreme conditions.
5 years ago
hong
5191e54494
reduce default attrs for dynamic graph ( #22850 )
* reduce default attrs for dynamic graph, test=develop
* add some explanations for explicit attr, test=develop
* tweak explicit attr comments, test=develop
5 years ago
Zhaolong Xing
1a533ed2de
[BUG]: Multihead matmul op's output size should be BxSx(N*H) ( #22848 )
test=develop
5 years ago
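The BxSx(N*H) shape follows from how attention heads are merged. Per-head outputs of shape [B, N, S, H] (batch, heads, sequence, head size) are transposed to [B, S, N, H], and then the last two axes are flattened to N*H. The pure-Python sketch below only tracks that layout change. The input axis order and the helper names are assumptions, not taken from the op's source.

```python
def merge_heads(x):
    """Turn a nested list of shape [B, N, S, H] into [B, S, N*H]."""
    merged = []
    for batch in x:                      # batch: [N, S, H]
        num_heads, seq_len = len(batch), len(batch[0])
        rows = []
        for s in range(seq_len):
            row = []
            for n in range(num_heads):
                row.extend(batch[n][s])  # concatenate head n's H values
            rows.append(row)
        merged.append(rows)
    return merged


def shape(x):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


B, N, S, H = 2, 3, 4, 5
x = [[[[float(b * 1000 + n * 100 + s * 10 + h) for h in range(H)]
       for s in range(S)] for n in range(N)] for b in range(B)]
print(shape(merge_heads(x)))  # [2, 4, 15]
```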
hong
c736fef93b
dygraph backward engine accelerate ( #22808 )
* fix loaded program load bug; test=develop
* first version
* speed up backward engine; test=develop
* remove useless code; test=develop
* recover io.py; test=develop
* remove useless code; test=develop
* remove useless code; test=develop
5 years ago
Zeng Jinle
d41d802ba3
Add flags to limit gpu memory ( #22793 )
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
5 years ago
石晓伟
1861ca88f1
serialize the PaddleTensor, test=develop ( #22810 )
* encapsulate the PaddleTensorToLoDTensor, test=develop
* serialize the pd_tensor, test=develop
* serialize tensors to file, test=develop
5 years ago
Zhang Ting
72ff5a09c3
fix print bug of profile, test=develop ( #22804 )
5 years ago
Zhang Ting
4e8bc02461
add fluid.device_guard to specify the device type for Op ( #22254 )
* add fluid.device_guard to specify the device type for Op
5 years ago
石晓伟
ddb9b46fec
change the function in op_teller, test=develop ( #22794 )
* change the function in op_teller, test=develop
* correct the commit-id, test=develop
5 years ago
Zhen Wang
89cfa49156
Unmerged fetch list ( #22635 )
* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
5 years ago
wangchaochaohu
8456c3f4dd
polish the profiler_help code ( #22811 )
5 years ago
zhongpu
2fd1ec1e3e
fix docker build for paddle openblas, test=develop ( #22795 )
5 years ago
Chen Weihang
7d8d573453
Speed up dygraph DataLoader based on shared memory and LoDTensor serialization ( #22541 )
* add lodtensor share memory & serialization, test=develop
* fix windows compile error, test=develop
* handle VarType pickling & fix unittest matching error message, test=develop
* update timeout variable name, test=develop
* refactor memory map implementation, test=develop
* clear mmap file descriptor when exiting unexpectedly, test=develop
* remove the child process fd in advance, test=develop
* remove mmap fds after Queue.put in child process, test=develop
* add hard unittests for register exit func, test=develop
* fix python2 compatibility problem in unittest, test=develop
* fix exception unittest error, test=develop
* polish code based on review comments, test=develop
5 years ago
liu zhengxi
324f2b3922
Fix inference c api PD_GetZeroCopyOutput lod ( #22768 )
* fix inference c api lod, test=develop
* fix capi lod problem and enrich tests, test=develop
* delete useless header files and alter const_cast, test=develop
5 years ago
wangchaochaohu
7578fcbac4
Profile code refine ( #22800 )
* add profiler_help.h to refine the code test=develop
5 years ago
hutuxian
53a2b68f4e
support customized download command in dataset ( #22782 )
* user can call dataset.set_download_cmd to set its customized download cmd
* add UT to cover this scenario
5 years ago
wangchaochaohu
ca9e77a8d4
add sum op support for fusion group ( #22771 )
* Add the codegen and auto fusion for sum Op in fusion group
5 years ago
tianshuo78520a
433cef03e5
fix typo word ( #22784 )
5 years ago
Kaipeng Deng
ebc7ffc300
fix detection_map. test=develop ( #22705 )
5 years ago
zhaoyuchen2018
72dde4abde
Refine adam op to improve performance, test=develop ( #22346 )
* Refine adam op, test=develop
* Fuse kernels together to reduce cpu time.
* Refine paddle enforce, test=develop
* Remove some comments, test=develop
* Refine code,test=develop
* Refine cuda kernel, test=develop
* Refine code according to comments, test=develop
5 years ago
wangguanzhong
f2d1cd119a
fix lod level, test=develop ( #22755 )
5 years ago
FlyingQianMM
79d712346f
Correct CPU gradients of the argsort op ( #22739 )
* Correct CPU gradients of the argsort op, form a network to test its forward and backward process, test=develop
* fix dynamic threshold error in test_argsort_op, test=develop
5 years ago
Adam
2b80e9a719
Add cpu_info without XBYAK ( #22716 )
5 years ago
guofei
ae8b5f11a3
Change ShareDataWith() to TensorCopy() in ref_by_trainer_id ( #22717 )
As the title
5 years ago
liu zhengxi
71ab0458e1
Fix pointer and c-api encapsulation ( #22663 )
* refine pointer and c-api prototype, test=develop
* fix new c api profile bug, test=develop
* add unit tests, test=develop
5 years ago
Leo Chen
b2c1be851a
support cond in clone, test=develop ( #22657 )
* support cond in clone, test=develop
* refine code, test=develop
* refine code, test=develop
* follow comments, test=develop
* refine code, test=develop
5 years ago
Zhang Ting
f97f3f9301
add framework overhead ratio in profile report ( #22590 )
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
5 years ago
zhouwei25
160d0f1308
fix the CI risk that network cannot be connected ( #22736 )
5 years ago
chengjuntao
15c2667143
register fp16 for assign op ( #22744 )
* register fp16 for assign op, test=develop
* add op test for fp16, test=develop
5 years ago
zhangchunle
882e7f7c3b
Directly getting API.spec for tools/sampcd_processor.py ( #22728 )
5 years ago
dyning
1c0653462d
fix generate_mask_labels lod level ( #22743 )
5 years ago
GaoWei8
ba140222d6
fix compile&runtime lod_equality of lod_reset ( #22737 )
5 years ago
hutuxian
175954d894
PaddleBox Framework Part2 ( #22466 )
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.
5 years ago
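One way to read the DynamicAdjustChannelNum option: when N instances are split across C channels and N is not divisible by C, the remainder is either kept (uneven channels) or discarded. The function and flag names below are hypothetical, and PaddleBox's actual distribution strategy may differ from this round-robin sketch.

```python
def distribute(instances, num_channels, discard_remainder):
    """Round-robin instances into channels, optionally dropping leftovers."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    if discard_remainder:
        usable = len(instances) - len(instances) % num_channels
        instances = instances[:usable]
    channels = [[] for _ in range(num_channels)]
    for i, inst in enumerate(instances):
        channels[i % num_channels].append(inst)
    return channels


data = list(range(10))
print([len(c) for c in distribute(data, 3, discard_remainder=False)])  # [4, 3, 3]
print([len(c) for c in distribute(data, 3, discard_remainder=True)])   # [3, 3, 3]
```

Discarding keeps every channel the same size at the cost of dropping up to C - 1 instances per pass.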
ShenLiang
3132681e8a
add partial_sum op in contrib ( #22292 )
* add partial_sum_op, test=develop
* modify the Paddle Error Message, test=develop
* modify the Paddle Error Message, test=develop
* modify the bug for python3, test=develop
* modify the ut for ci, test=develop
* mv to contrib, test=develop
* use check_variable_and_dtype, test=develop
* fix ci, test=develop
* fix conflict, test=dvelop
* add partial concat, test=develop
* fix the conflict, test=develop
* fix the error, test=develop
* rm SSE4, test=develop
5 years ago
wangchaochaohu
611411b90e
Fusion group profile support ( #22718 )
* add support for the driver api callback and fix the profiler name show bug
5 years ago
ShenLiang
e136661304
add partial_concat op in contrib ( #22528 )
* add partial_concat, test=develop
* fix the grids and blocks, test=develop
* fix the Paddle_Enforce, test=develop
* fix the doc of op, test=develop
* fix the doc, test=develop
* fix the doc of the op, test=develop
* replace -1 with None, test=develop
5 years ago
GaoWei8
cdf5f6fb8c
Add an inference interface to disable FC padding ( #22097 )
* Add an interface of disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error
5 years ago
tianshuo78520a
d2ba91aad1
fix typo words ( #22653 )
5 years ago
Yibing Liu
6e7bfe30a6
register fp16 kernel for some ops ( #22650 ) ( #22696 )
test=develop
5 years ago
tangwei12
66a3150135
SYNC with communicator ( #22344 )
* add sync communicator and implement
5 years ago
Yiqun Liu
22bbd54719
Add the support of fp16 in fusion_group ( #22239 )
5 years ago
flame
d97475d53b
fix CPU C inference API compile bug ( #22702 )
5 years ago
Huihuang Zheng
adfa5b8354
Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp ( #22673 )
1. Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp.
2. Also enrich PADDLE_ENFORCE error messages.
5 years ago
flame
74eb82de19
fix go api bug ( #22669 )
5 years ago
wangchaochaohu
a089072c8b
fix the profile print error ( #22665 )
* fix the profile print error test=develop
5 years ago
lidanqing
d926214535
[UT coverage] improve the mul_mkldnn_op line coverage ( #22408 )
* improve the mul_mkldnn_op line coverage
test=develop
* remove fp32 mul mkldnn kernel
test=develop
* local refactoring
test=develop
* change according to reviews
test=develop
5 years ago
wangchaochaohu
c65c6ae534
add flag to control profile level in python API ( #22319 )
* add python flag to control profile level test=develop
5 years ago
123malin
00594c1c88
support dumping params/grads in transpiler mode ( #22490 )
5 years ago
Zhaolong Xing
a06d75a280
[Paddle-TRT] Refine the error log about runtime batch and max_batch_size. ( #22535 )
* fix trt log
test=develop
* fix comments
test=develop
5 years ago
Adam
608447bfd5
Update MKLDNN to v1.2 ( #22521 )
5 years ago
Adam
ab610a34ff
transpose_mkldnn code change to meet Paddle standards ( #22591 )
5 years ago
Jiawei Wang
8f035fb637
Add TopK Op Grad CPU&GPU Kernel test=develop ( #22628 )
* Add TopK Op Grad CPU&GPU Kernel test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify PADDLE_ENFORCE test=develop
* Add TopK Op Grad, modify PADDLE_THROW test=develop
* Add TopK Op Grad, modify unittest test=develop
* fix ngraph top k op unittest test=develop
5 years ago
Steffy-zxf
90ee366653
update ops' unittest data type from float32 to float64 and shape over 100 ( #22544 )
* update ops' unittests of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt
1. update elementwise_pow, elementwise_max and scale's unittests with input data type (float32 -> float64)
2. fix bug that the elementwise_pow doesn't meet threshold requirements when handling float64 data
3. remove sqrt from op_accuracy_white_list.py
4. update the unittests of elementwise_pow, elementwise_max and elementwise_min ops so that their input data shape is over 100
5. test=develop
* modify the writing style according to suggestions
test=develop
5 years ago
flame
f7eafca828
remove python inference warning ( #22602 )
5 years ago
Chen Weihang
fe685cc185
fix enforce test error, test=develop ( #22610 )
5 years ago
Wilber
9a8203aa25
fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop ( #22551 )
When a model contains multiple fc_lstm subgraphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.
5 years ago
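The rule in this fix can be sketched as a filter over the nodes matched by each fused subgraph: a persistable bias shared by several subgraphs is kept, and only non-persistable nodes are removed. The `Node` type and `nodes_to_remove` helper are hypothetical and are not Paddle's IR pass API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: a name plus a persistable flag."""
    name: str
    persistable: bool


def nodes_to_remove(matched_subgraphs):
    """Collect nodes safe to delete after fusing each matched subgraph.

    Persistable nodes (e.g. a bias shared by several fc ops) are kept,
    because another fused subgraph may still reference them.
    """
    removable = set()
    for subgraph in matched_subgraphs:
        removable.update(n for n in subgraph if not n.persistable)
    return removable


shared_bias = Node("fc_bias", persistable=True)
sub_a = [Node("fc_out_a", False), shared_bias]
sub_b = [Node("fc_out_b", False), shared_bias]
print(sorted(n.name for n in nodes_to_remove([sub_a, sub_b])))
```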
Chen Weihang
266106da75
Fix mismatch with plus sign in the line ( #22588 )
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
5 years ago
flame
1d503e6a9e
Golang inference API ( #22503 )
* support golang inference
5 years ago
Zhaolong Xing
8acd745c25
[Ernie GPU Optim]: Fuse three fc to multihead matmul ( #22486 )
* 1. optim multihead matmul: fuse three fc to multihead matmul
test=develop
* fix conflict
test=develop
* fix comments
test=develop
5 years ago
Yiqun Liu
96770f519e
Disable fusion_group for windows and mac in build_strategy. ( #22549 )
test=develop
5 years ago
Zeng Jinle
08033c8634
fix traced layer with non persistable vars, test=develop ( #22552 )
5 years ago
Guo Sheng
31b5464632
Add support for dynamic_decode(while) training. ( #22231 )
* Add support for dynamic_decode(while) training. test=develop
* Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop
* Fix test_rnn_decode_api.py. test=develop
* Refine docs for apis in rnn.py. test=develop
* Adjust outputs of dynamic_decode. test=develop
* Remove the force_cpu update in assign_op. test=develop
* Remove the force_cpu update in assign_op. test=develop
* Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop
* Rename _create_array_outof_while as _create_array_out_of_while in rnn.py.
test=develop
5 years ago
tangwei12
b0675c8193
fix bug with compiledProgram ( #22495 )
* add thread barrier for the compiled program
5 years ago
Wojciech Uss
4cddb43c5c
Add support for Ernie NLP model to the Slim QAT ( #22506 )
* a test for Ernie QAT INT8 accuracy check
test=develop
* Remove NLP comparison test to split PRs
test=develop
* Fix typo and tabs, delete commented lines
test=develop
* re-combine the 2 PRs, test=develop
Co-authored-by: Michał Gallus <sand3r@interia.eu>
Co-authored-by: bingyanghuang <33643817+bingyanghuang@users.noreply.github.com>
5 years ago
Double_V
58d99247f4
support slice double grad, test=develop ( #22166 )
* support slice double grad, test=develop
* merge two doublegradopmaker to one doublegradopmaker,test=develop
* change the shape of slice_OP's unittest, test=develop
5 years ago
hutuxian
1a7962be97
Paddlebox about box_wrapper ( #22497 )
Refine PaddleBox Framework, Main functions:
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.
5 years ago
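For reference, AUC can be computed exactly in pairwise (Mann-Whitney) form. It is the fraction of positive/negative pairs in which the positive gets the higher score, with ties counted as half. PaddleBox's MetricMsg may use a bucketed approximation instead. The exact technique below is swapped in for illustration only, and the function name is hypothetical.

```python
def pairwise_auc(scores, labels):
    """Exact ROC AUC by comparing every positive with every negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


print(pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

This form costs O(P*N) comparisons. Bucketing scores into fixed thresholds is the usual way to make AUC cheap to accumulate over a stream.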
huzhiqiang
9e29d3ebed
【OpPorting Example】DEMO OF FIX COMPILE&RUNTIME LOD_EQUALITY ( #22460 )
5 years ago
yaoxuefeng
2235ee1a5e
multi-loss optimization by adding a DownpourOpt worker ( #22025 )
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
5 years ago
zhaoyuchen2018
54970444ce
Improve transpose performance with tile sm copy, test=develop ( #22311 )
* Refine code, fix select tile error,test=develop
* Refine element type and some comments, test=develop
* Refine comments and gpu utils, test=develop
* Remove some useless condition
* Refine floor and ceil, test=develop
* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Wilber
a90fa54092
Compile without nccl deps. [1/2] ( #22509 )
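Support compiling without depending on NCCL. [1/2]
With multiple GPUs, if Paddle is compiled without the WITH_NCCL switch turned on, the cards cannot communicate, so only one card can be used.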
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
guofei
3a59a7a11f
Make assign op support LoDTensorArray and modify while_loop API ( #22309 )
This PR makes assign op support LoDTensorArray and enable the loop_vars in
while_loop to support tuple or list.
5 years ago
Zhaolong Xing
54a325a52f
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. ( #22483 )
* add int8 op teller for trt.
* refine trt int8
* add int8 op teller for trt.
test=develop
5 years ago
zhongpu
5739eeb9fa
add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile for paddle, test=develop ( #22504 )
5 years ago
Wilber
de009152a7
Compile without nccl deps. [2/2] ( #22484 )
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
Yiqun Liu
4b2227e958
Fix mismatch of std::max's argument types on Windows. ( #22507 )
test=develop
5 years ago
Wilber
870f465887
fix test_fusion_seqpool_concat lod level between compile and runtime ( #22488 )
5 years ago
Zhong Hui
a61d09527b
Fix the integer overflow problem of sequence2batch ( #22479 )
Fix the integer overflow problem in the sequence2batch op by changing int32_t to size_t in /paddle/fluid/operators/math/sequence2batch.h#L122.
5 years ago
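A 32-bit index overflows when the product of two moderately large sizes exceeds 2^31 - 1. The helper below emulates the two's-complement wraparound that `int32_t` arithmetic typically shows in practice (signed overflow is formally undefined in C/C++). The sizes are made-up numbers, not taken from sequence2batch.

```python
INT32_MAX = 2**31 - 1


def as_int32(value):
    """Wrap an integer to 32-bit two's complement, like a typical int32_t."""
    return (value + 2**31) % 2**32 - 2**31


rows, width = 50_000, 50_000   # hypothetical sizes
offset = rows * width          # 2_500_000_000 exceeds INT32_MAX
print(offset > INT32_MAX)      # True
print(as_int32(offset))        # -1794967296: a negative, invalid offset
```

A 64-bit unsigned `size_t` holds 2,500,000,000 without wrapping, which is why widening the type removes this class of bug.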
cc
197913ebe1
Add weight quantization in post_training_quantization ( #22445 )
* support weight quantization in post_training_quantization, test=develop
* add test for weight quantization, test=develop
5 years ago
Yiqun Liu
dcfb603897
Enable the detection of subgraph composed of grad ops ( #21223 )
* Add the first implementation of fusion_group op #19621 (#3 )
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop
* Disable for mac and windows.
test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop
* Refine the CUDA kernel to support large dims.
test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test.
test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop
* Disable fusion_group op for mac and windows.
test=develop
* Make the compiling of device code return status instead of hanging up.
test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names.
test=develop
* Add the check of CUDA driver library in unittest.
test=develop
* Enable generating code for a given subgraph. #21126 (#4 )
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because the sorted subgraph is used directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions.
test=develop
* Improve the loading statements in generated codes.
test=develop
* Remove unused arguments from formal list.
test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5 )
test=develop
* Call subgraph_detector in fusion_group pass.
test=develop
* Disable fusion_group when WITH_GPU is OFF.
test=develop
* Refine all PADDLE_ENFORCE messages.
test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop
* Follow review comments.
test=develop
5 years ago
Tao Luo
7c9ce097f1
refine reshape_op shape error message ( #22480 )
...
test=develop
5 years ago
LielinJiang
2b1386b2b2
optimize performance of interpolate op ( #22436 )
...
* optimize interpolate op, test=develop
5 years ago
wangchaochaohu
77dd0d97bb
use enum class to replace the usage of enum in some conditions test=develop ( #22464 )
5 years ago
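Commit #22464 does not list which enums were converted. The sketch below uses a hypothetical `PoolingType` to show what a scoped `enum class` changes compared with a plain `enum`.

```cpp
#include <string>

// Hypothetical enum (not one from the Paddle sources). A scoped enum keeps
// its enumerators out of the enclosing namespace and does not convert
// implicitly to int, so accidental comparisons with integers or with
// enumerators of other enums fail to compile.
enum class PoolingType { kMax, kAverage, kSum };

std::string PoolingTypeName(PoolingType type) {
  switch (type) {
    case PoolingType::kMax:
      return "max";
    case PoolingType::kAverage:
      return "average";
    case PoolingType::kSum:
      return "sum";
  }
  return "unknown";  // not reached for valid enumerators
}
```

Code that really needs the integer value must ask for it explicitly, for example `static_cast<int>(PoolingType::kSum)`.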
Yiqun Liu
44b45b9f07
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test ( #22456 )
...
* Add log in memory::Copy for debug purpose.
* Change to use the context in DeviceContextPool directly in sequence_pooling_test, instead of creating a new one.
* Change to use the context in DeviceContextPool directly in sequence_padding_test, instead of creating a new one.
test=develop
* Change the type of second_dim from size_t to int64_t.
test=develop
5 years ago
joanna.wozna.intel
17f2c0899f
Add dequant-scale squash ( #22409 )
...
* Add dequant scale squash
test=develop
* Correct dequant-scale squash test
test=develop
5 years ago
mapingshuo
9c4deedbc2
update readme of imdb training demo ( #22455 )
...
* update readme
* test=develop
5 years ago
Zhaolong Xing
ceda0b9b1a
[Fix BUG]: Core dump when using multi-thread + clone + paddle-trt ( #22442 )
...
* add mutex for trt engine
test=develop
* add the test for copy_to_cpu
test=develop
5 years ago
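Commit #22442 fixes the crash by adding a mutex to the TRT engine. The sketch below is a generic illustration of that idea, not Paddle's engine class. `SharedEngine` and `RunConcurrently` are hypothetical names. A counter stands in for the shared execution state that cloned predictors would otherwise race on.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an engine shared by cloned predictors.
// Every access to the shared state takes the same mutex, so concurrent
// Run() calls from different threads are serialized instead of racing.
class SharedEngine {
 public:
  void Run() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++runs_;  // stands in for work on a shared execution context
  }
  int runs() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return runs_;
  }

 private:
  mutable std::mutex mutex_;
  int runs_ = 0;
};

// Calls Run() from several threads, as cloned predictors might, and returns
// the total number of completed runs.
int RunConcurrently(SharedEngine& engine, int num_threads, int iters) {
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&engine, iters] {
      for (int i = 0; i < iters; ++i) engine.Run();
    });
  }
  for (auto& w : workers) w.join();
  return engine.runs();
}
```

Without the lock, the final count could come out lower than `num_threads * iters`. This is the kind of silent corruption that can also end in a core dump when the shared state is a real engine.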
Wilber
7bc4b09500
add WITH_NCCL option for cmake. ( #22384 )
...
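Added the WITH_NCCL option to CMake to explicitly specify whether to compile the NCCL-related code. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF.
Added the PADDLE_WITH_NCCL definition.
NCCL compilation can be turned off for single-machine, single-GPU use; multi-GPU use requires NCCL, which is on by default. If NCCL is turned off, only a single GPU can be used.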
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
5 years ago
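Commit #22384 introduces the PADDLE_WITH_NCCL definition. Without NCCL, only a single GPU can be used. The sketch below shows how C++ code can branch on that macro. `UsableDeviceCount` is a hypothetical helper, not a Paddle function.

```cpp
// Hypothetical helper (not from the Paddle sources) guarded by the
// PADDLE_WITH_NCCL definition: the multi-GPU path is compiled only when the
// macro is defined, and a build without NCCL falls back to one device.
int UsableDeviceCount(int visible_devices) {
#ifdef PADDLE_WITH_NCCL
  return visible_devices;
#else
  return visible_devices > 0 ? 1 : 0;
#endif
}
```

Built without `-DPADDLE_WITH_NCCL`, `UsableDeviceCount(4)` returns 1. With the macro defined, it returns 4.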
Tao Luo
943cb8c664
fix sigmoid cudnn bug ( #22439 )
...
* Sigmoid bug fix, test=develop
* fix code format
test=develop
Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>
5 years ago
xujiaqi01
d51ffe860a
fix copy table bug ( #22432 )
...
* fix copy table bug that lost some feasigns
* test=develop
5 years ago
Leo Chen
822e5b36ec
Support int16 for Tensor ( #22423 )
...
* add int16 support, test=develop
* add test, test=develop
* fix typo, test=develop
* fix dtype error in slice, test=develop
5 years ago
石晓伟
e1b0d7cbb1
remove anakin from code, test=develop ( #22420 )
5 years ago
liu zhengxi
0404e7a985
Update the precision of pad, pad2d, pad_constant_like's unit tests from fp32 to fp64 ( #22394 )
...
* update the ut precision of pad pad2d pad_constant_like from fp32 to fp64, test=develop
5 years ago
xujiaqi01
371f377bea
add GeneralRoleMaker ( #22295 )
...
* add GeneralRoleMaker which is for general usage
* test=develop
5 years ago
Michał Gallus
269db0d1d1
[DNNL] Fix accuracy in INT8 FC ( #22404 )
...
* Enable quantize to reorder to nchw as well
* Correct FC MKL-DNN input dim requirements to accept 3D
* Improve DNNL FC format, error and 3D input handling
test=develop
* Improve error checking in FC
test=develop
* Improve PADDLE_ENFORCE messages in fc-related files
* Remove data layout attribute from obligatory pass args
test=develop
* Fix message in fc_mkldnn_pass to be logically correct
test=develop
5 years ago
joanna.wozna.intel
fb3086fd57
[UT coverage]Remove unnecessary transpose op registration ( #22402 )
5 years ago
lidanqing
ade5022681
[UT Coverage]Improve sum_mkldnn_op line coverage ( #22275 )
5 years ago
joanna.wozna.intel
3099d9d47c
Restore requantize squash ( #22399 )
5 years ago
Wojciech Uss
92462e948d
improve elementwise_add_mkldnn_op test code coverage ( #22359 )
5 years ago
ceci3
20f30dd604
add benchmark flag for conv_transpose ( #22389 )
5 years ago
Leo Chen
b96c7c9a7a
polish code, test=develop ( #22380 )
...
remove unnecessary template.
5 years ago
Chengmo
8f36c39537
Fix GEO-SGD init & send Bug ( #22375 )
...
* test=develop, fix geo Send & Init
5 years ago
zhupengyang
c6f888e5a5
update unittest precision to float64 for relu, prelu, maxout ( #22273 )
5 years ago
wangchaochaohu
0d8b222b79
Optimize the depthwise op test=develop ( #22265 )
5 years ago
Leo Chen
aaa4fe491a
use function instead of lambda, test=develop ( #22348 )
...
* use function instead of lambda, test=develop
* follow comments, test=develop
5 years ago
Adam
e7a9f6bbb7
[Bugfix] Preserve shape in inplace operators ( #22360 )
5 years ago
qingqing01
2d20869c94
Fix infer_shape at compile time for elementwise_op ( #22291 )
5 years ago
Yiqun Liu
b7cac50b64
Implement a common python unittest to test the ir passes. ( #22209 )
...
* Implement a common python unittest to test the ir passes.
test=develop
* Save the results in np.array and support starting up on CPU.
test=develop
* Fix the unittest.
test=develop
* Add check_program to check whether the optimized program is different from the original one.
test=develop
* Remove the interface all_ops.
test=develop
* Add exception test in pass_test.
test=develop
5 years ago
tangwei12
82bc814a57
integrate HALF_ASYNC into communicator ( #21869 )
...
* add half_async in the communicator
* fix DistributedStrategy
5 years ago
wangchaochaohu
1e932eccfa
remove unused code test=develop ( #22327 )
5 years ago
Leo Chen
3e5744aa65
Remove unused inputs for some operators ( #22284 )
...
* remove unused inputs, test=develop
* remove unused inputs, test=develop
* update dtype, test=develop
* remove unused inputs, test=develop
* update op_use_default_grad_op_maker, test=develop
* resolve conflicts, test=develop
* follow comments, test=develop
* update center_loss_grad, test=develop
5 years ago
zhangchunle
805328e13b
fix typo in error message ( #22312 )
5 years ago
lidanqing
895f8da7d6
change std::cout to LOG(INFO), VLOG ( #22316 )
5 years ago
石晓伟
8cb04664b9
revert paddle_fluid.map, test=develop ( #22236 )
5 years ago
Chen Weihang
35efbe6d95
Speeding up dygraph DataLoader with multiprocessing ( #21762 )
...
* add multiprocess for dygraph data loader, test=develop
* polish code & add safe guard, test=develop
* refactor dygraph dataloader & add signal handler, test=develop
* fix member initializer compile error on ci, test=develop
* fix member initializer compile error once more, test=develop
* remove useless config, test=develop
* skip windows incompatible problem, test=develop
* add unittest for coverage, test=coverage
* add more exception unittest case, test=develop
* deal with signal handler coverage, test=develop
* polish code & add signal handler tests, test=develop
* deal with coverage ci problem, test=develop
* split data loader test & coverage ci fix, test=develop
* remove test_imperative_data_loader_with_exception, test=develop
* remove signal process exception test case, test=develop
* add exception tests again & remove sample list test, test=develop
* split normal and exception unittests into different classes, test=develop
* polish doc for use_multiprocess effect in static mode, test=develop
5 years ago
Zeng Jinle
9435533adf
remove op_use_default_grad_op_maker.spec, test=develop, test=document_fix ( #22300 )
5 years ago
wangchaochaohu
7b76a76495
fix the conda build conflict test=develop ( #22279 )
5 years ago
Zeng Jinle
5e601a92ad
polish grad op check ( #22290 )
...
* polish grad op check, test=develop, test=document_fix
* keep op_use_default_grad_maker.spec to avoid conflict, test=develop, test=document_fix
5 years ago
Bai Yifan
faba4b116a
Remove disable flag in test_fsp_op.py ( #22171 )
...
* fix fsp_op, test=develop
* fix fsp grad op maker, test=develop
* update op_use_default_grad_op_maker.spec, test=develop
5 years ago
Zhen Wang
e40cfb1010
fix the bug of assert_is_op_output. test=develop ( #22262 )
5 years ago
Wojciech Uss
d3a6647372
improve placement pass tests code coverage ( #22197 )
5 years ago
liu zhengxi
07afc29e90
Make api.cc malloc consistent with paddle_api.h for PaddleBuf ( #22255 )
5 years ago
silingtong123
4f1da4adcb
remove the useless third_party library from C++ inference library ( #22021 )
...
* remove the useless third_party library from C++ inference library
* revert removing the install directory
5 years ago
zhouwei25
549e6de7ac
faster build by reducing by-products and linked libraries, and fix compile warning of std=c++11 ( #22164 )
5 years ago