Zeng Jinle
29337f4e17
fix conflict of inference partial feed with gpu parallel ssa graph executor, test=develop ( #23400 )
5 years ago
Pei Yang
7e439780d9
add full paddle_analysis_config.h APIs. ( #23215 )
5 years ago
zhongpu
bfb07aafe8
Revert "Exhaustive search ( #22821 )", test=develop ( #23401 )
...
This reverts commit 48144e4099.
5 years ago
liym27
b7b0b3595b
Add unittest for transformer prediction in dygraph_to_static ( #23207 )
...
* Add unittest for transformer prediction in dygraph_to_static.
* fix bug in fill_constant api.
* Make transpose support size 0. test=develop
5 years ago
xujiaqi01
93ea9dd27a
fix stat var in hogwild worker ( #23367 )
...
* fix stat var in hogwild worker
* test=develop
5 years ago
joanna.wozna.intel
8c463700e1
Add default pass attributes ( #23042 )
5 years ago
zhongpu
48144e4099
Exhaustive search ( #22821 )
...
* use global conv cache; test=develop
* use singleton cache; test=develop
* fix format error; test=develop
* add cudnn helper header; test=develop
* fix header error; test=develop
* fix mac unittest; test=develop
* fix mac unittest; test=develop
* fix file format; test=develop
* fix include file error, test=develop
* remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop
* fix test_elementwise_mul_op_dim, test=develop
Co-authored-by: phlrain <phliuhongyu@126.com>
5 years ago
Adam
da7c73f847
Delete is_test attribute from activation operators ( #23318 )
...
* Delete is_test from activation operators
test=develop
* Revert unneeded changes
test=develop
5 years ago
Kaipeng Deng
21d95be0db
Add inplace abn op ( #22806 )
...
* add inplace_abn_op. test=develop
5 years ago
Yi Liu
821534efd3
add parallel_executor dependency to collective_helper ( #23380 )
...
test=develop
5 years ago
Zeng Jinle
3a21980b78
add reader dependency pass, test=develop ( #23301 )
5 years ago
wangchaochaohu
69e3f99362
refine the error message ( #23212 )
...
* refine the error message of tensor_array_read_write Op
5 years ago
石晓伟
5c59d2139e
reverts the commit 23177, test=develop ( #23363 )
5 years ago
wangchaochaohu
d280106007
Add support for attr type Op and add fill_constant Op and scale Op ( #23163 )
...
* add attr support for fusion group and add support for fill_constant and scale Op
5 years ago
xujiaqi01
3a45767d49
add fleet pslib pull and push sparse op and push dense op ( #23139 )
...
* add fleet pslib pull and push sparse op and push dense op
* test=develop
5 years ago
songyouwei
99d30bfc36
speedup slice impl ( #23340 )
...
test=develop
5 years ago
Zhaolong Xing
1a6ce8b910
add swish split gelu plugin dynamic support ( #23305 )
...
test=develop
5 years ago
Jacek Czaja
2bb1b0e89e
[DNNL] Added MKL-DNN inplace pass for C-API inference ( #23315 )
5 years ago
Yi Liu
0471476a18
fix nccl comm double free bug ( #23344 )
...
As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.
5 years ago
wangchaochaohu
1ee2a9a424
Profiler refine ( #23294 )
...
* refine output of profiler for child event
5 years ago
Leo Chen
488b2387e2
Feature/expand params in auto-generated pybind functions for dygraph operators ( #23181 )
...
* expand parameters, test=develop
* support resnet, test=develop
* fix resnet, test=develop
* support duplicable out, test=develop
* support ptb
* fix bugs, test=develop
* support null input, test=develop
* fix bugs, test=develop
* fix batchNorm is_test, test=develop
* refine code, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
5 years ago
GaoWei8
20eed5401a
Change fluid.layers.where's C++ operator name ( #23250 )
5 years ago
Yi Liu
2169e6fb58
Initialize global nccl_comm in PE ( #23275 )
5 years ago
Jacek Czaja
012886df79
[DNNL] Softmax mkldnn op inplace support ( #23197 )
5 years ago
石晓伟
75ebb48a91
supports thread-binding stream, test=develop ( #23177 )
5 years ago
石晓伟
708ded584e
pause the io_utils_test of int64 and resume after repair, test=develop ( #23234 )
5 years ago
Zeng Jinle
babda94c8a
Distinguish public/private global vars ( #23269 )
...
* distinguish public/private vars, test=develop
* fix windows issues, test=develop
5 years ago
zhaoyuchen2018
58615a6272
Improve elementwise performance. ( #23001 )
...
* Improve elementwise performance.
Elementwise performance is poor when execution walks into CommonGradBroadcastCUDA, so add some new kernels for different data patterns.
* Add some cuda kernel to speedup common broadcast cases. test=develop
* Add more test cases and fix cuda kernel bug. test=develop
* Remove tests as cpu precision fails. test=develop
* Refine SplitDims, test=develop
* Change file mode, test=develop
5 years ago
Wojciech Uss
f836c8aa8f
add check for scales and a message ( #23119 )
5 years ago
Zeng Jinle
8bfd62ffb7
Expose dygraph.grad api ( #23124 )
...
* expose dygraph.grad api, test=develop, test=document_fix
* add more parameter in dygraph.grad API, test=develop
* add only_inputs=True parameter, test=develop
* follow comments, test=develop, test=document_fix
* fix typo, test=develop, test=document_fix
5 years ago
Wilber
0129f4b568
Add some inference API comments for AnalysisPredictor ( #23242 )
...
* add inference api doc. test=develop
5 years ago
Tao Luo
c00d427d52
simplify the cmake log of ir/CMakeLists.txt ( #23262 )
...
test=develop
5 years ago
Zeng Jinle
77b4dc80c9
code polish for adding const qualifier, test=develop, test=document_fix ( #23248 )
5 years ago
Zhaolong Xing
430b0099c9
[Paddle-TRT]: Ernie Dynamic shape support. ( #23138 )
...
* add dynamic plugin support.
test=develop
* change emb eltwise layernorm to math function
test=develop
* add emb eltwise layernorm
test=develop
* can run dynamic shape ernie
test=develop
* fix ci
test=develop
* add ut for trt ernie dynamic
test=develop
* refine dynamic shape c++ interface.
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
xujiaqi01
68ea1ad55b
add clear one table ( #23089 )
...
* add clear_one_table
* test=develop
5 years ago
danleifeng
ae3bb16d06
add MaskAucCalculator in paddlebox ( #23157 )
...
* add maskauc in paddlebox; test=develop
5 years ago
liym27
6af480ca33
Support int64 for op assign_value. test=develop ( #23179 )
5 years ago
Zeng Jinle
53e6f8e1da
rename macro, test=develop ( #23161 )
5 years ago
Zeng Jinle
bba740710d
add cuda resource pool for BufferedReader, test=develop ( #23152 )
5 years ago
Zeng Jinle
7d8d50b6cc
rename no_need_buffer_vars macro, test=develop ( #23160 )
5 years ago
Liufang Sang
a486a739e1
fix compile error in win gpu ( #23196 )
...
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
5 years ago
Zeng Jinle
7ca77a90ac
add Tensor::IsSharedBufferWith method, test=develop ( #23175 )
5 years ago
Zeng Jinle
b8886bf122
rename no_need_buffer_vars_macro, test=develop ( #23159 )
5 years ago
Zeng Jinle
bae5930ba1
fix graph attr copy issues, test=develop ( #23191 )
5 years ago
wangchaochaohu
b721e23b25
transpose cudnn using cudnn v7 api ( #19738 )
...
* refine the transpose conv using v7 to choose algorithm
5 years ago
Pei Yang
46b8d282dc
Add some inference API comments for AnalysisConfig ( #23117 )
...
* add some API comments in paddle_analysis_config.h, test=develop
* add some API comments in paddle_analysis_config.h, test=develop
5 years ago
Adam
4f5e4540f8
Improve SGD jit code to work with large data ( #23120 )
5 years ago
Liufang Sang
4db031902d
add dequantize_log_op and make pyramid hash support int8 weight ( #22548 )
...
* add dequantize_log_op and make pyramid hash support int8 weight test=develop
* add unittest and update pyramid hash op test=develop
* remove paddle_enforce test=develop
* fix error message test=develop
* remove incorrect commit test=develop
* fix error message in log_dequantize test=develop
* change 2019 to 2020 test=develop
* remove useless check_grad test=develop
5 years ago
Zeng Jinle
e5fef8f38a
[Dygraph double grad]Code polish ( #23121 )
...
* fix dygraph double grad, test=develop
* fix unpack constructor, test=develop
5 years ago
Zeng Jinle
9258e96094
fix read op comments, test=develop, test=document_fix ( #23122 )
5 years ago
Zeng Jinle
acfc9b8a70
Reader sequential and inference partial feed ( #22699 )
...
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* test commit for reverting
* Revert "test commit for reverting", test=develop
This reverts commit 80aef42ef52ba1ee79627d6f663a624ec4f12f58.
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
5 years ago
Wilber
95b356a069
update embedding_eltwise_layernorm fuse and kernel. test=develop ( #23114 )
...
update embedding_eltwise_layernorm fuse pass and fused kernel, to support multi input
5 years ago
Zeng Jinle
a31d7328b7
Add dygraph double grad implementation ( #22939 )
...
* add double grad implementation for dygraph, test=develop
* polish code, add uts, test=develop
* fix place bug, test=develop
* polish codes, add more uts for coverages, test=develop
* add no_grad_set, test=develop
* add star gan ut, test=develop
* follow comments, test=develop
5 years ago
Yiqun Liu
3af4771122
Add the detection and code-generation of sqrt and square in fusion_group ( #23095 )
5 years ago
hutuxian
0c30098f8b
Add need_save_delta parameter to solve OOM ( #23097 )
5 years ago
songyouwei
2e2da7124b
high-performance dygraph slice ( #22879 )
...
* move __getitem__ to cpp
* bug fix
* add type check and gil release
* support negative step with omitted ends
test=develop
* code refine
test=develop
* bug fix
test=develop
* slice always return different pyobj
test=develop
5 years ago
Sylwester Fraczek
abee05a8c8
added mkldnn swish activation ( #23041 )
5 years ago
Zhaolong Xing
8c6fde9e69
fix align error ( #23090 )
...
test=develop
5 years ago
Liufang Sang
915b892a15
Fix div zero in fake quantize op ( #22966 )
...
* fix div zero test=develop
* fix div zero test=develop
* add hostdevice function test=develop
* add eps when is zero test=develop
5 years ago
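The "add eps when is zero" step in the entry above can be illustrated with a minimal sketch. It assumes an abs-max quantization scheme; the function name `fake_quantize_abs_max`, its parameters, and the `eps` value are hypothetical and are not taken from the Paddle source.

```python
def fake_quantize_abs_max(values, bit_length=8, eps=1e-6):
    """Hypothetical abs-max fake quantization with a zero-scale guard."""
    # Largest representable magnitude, e.g. 127 for 8 bits.
    bin_cnt = (1 << (bit_length - 1)) - 1
    # Scale is the largest absolute value in the input (0.0 if empty).
    scale = max((abs(v) for v in values), default=0.0)
    # Assumption: substitute a small eps when the scale is zero so the
    # division below cannot fail on an all-zero tensor.
    safe_scale = scale if scale > 0.0 else eps
    quantized = [round(v / safe_scale * bin_cnt) for v in values]
    return quantized, scale


if __name__ == "__main__":
    print(fake_quantize_abs_max([0.0, 0.0]))
    print(fake_quantize_abs_max([-1.0, 0.25]))
```

Without the guard, an all-zero input would raise `ZeroDivisionError` in this sketch; the real operator may handle the case differently.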
Yi Liu
121b2aed4d
initialize global nccl context in dygraph ( #23037 )
...
initialize global nccl context in dygraph
test=develop
5 years ago
Zhang Ting
880eb04d93
skip PrepareData when it is unnecessary ( #22839 )
...
* remove unnecessary prepare data, test=develop
* Op in while block will not skip PrepareData, test=develop
5 years ago
Feiyu Chan
01ab8a0619
add approximation for gelu, test=develop ( #22961 )
...
add approximation for gelu, default value is False (only kernel with eigen is added, remove code for computing gelu with MKLDNN temporarily)
5 years ago
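For context on the entry above, the "approximation for gelu" is commonly the tanh-based formula. The sketch below compares it with the exact erf-based definition using only the Python standard library; whether the Paddle kernel uses exactly this formula is an assumption, and the function names are illustrative.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Widely used tanh approximation of GELU (assumed, not confirmed
    # against the Paddle implementation).
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, gelu_exact(x), gelu_tanh(x))
```

The two functions agree closely over typical activation ranges, which is why the approximation can be offered as an opt-in flag that defaults to False.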
Adam
5842ae6785
Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )" ( #22985 )
5 years ago
Pei Yang
24db750386
fix trt int8 calib precision bug. test=develop ( #23036 )
5 years ago
GaoWei8
1dc1f9270e
Fix lod error of concat op for axis = 0 ( #22538 )
5 years ago
yaoxuefeng
660ff18488
fix dataset test=develop ( #23043 )
5 years ago
Zhang Ting
714b0076b6
Override GetKernelTypeForVar to avoid device transform, test=develop ( #23032 )
5 years ago
wangchaochaohu
112e3edbf6
fix the conv group problem test=develop ( #23025 )
5 years ago
Wilber
db40ee86db
fix unittests. test=develop ( #23018 )
5 years ago
wangchaochaohu
99db0cf762
remove debug log test=develop ( #22994 )
5 years ago
wangchaochaohu
3757e0687c
Add Unittest for backward of fusion group ( #22932 )
...
* add fusion group test for backward and refine code
5 years ago
chengjuntao
63f3ada7b9
fix bug which input shape ( #22965 )
...
* fix bug which input shape, test=develop
* add error type,test=develop
5 years ago
Zhang Ting
137d6563fc
add check for assigned data, test=develop ( #22960 )
5 years ago
wangchaochaohu
f0d193a23c
Cast fusion for fusion group ( #22876 )
...
* add support for expression type convert and add cast Op support in fusion group
5 years ago
yaoxuefeng
29a7a52d38
Fix instag ( #22632 )
...
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* fix compile warning test=develop
* add attr default test=develop
* add unittest test=develop
* fix style test=develop
* fix style test=develop
* change out_val_ifempty to out_val_if_empty test=develop
5 years ago
wangchaochaohu
c979c9f2b0
refine the profiler print test=develop ( #22968 )
5 years ago
Wilber
ff3ddbb502
add skip_layernorm pass. test=develop ( #22895 )
...
* add skip_layernorm pass. test=develop
5 years ago
wawltor
f154d5860f
Speed up the matmul op, use the gemm replace the batch gemm ( #22926 )
...
In the op of gemm, we use the gemm to replace batch gemm, speed up the matmul op
5 years ago
Adam
056edf3929
Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )
5 years ago
Zhaolong Xing
8d6dc102fe
[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse ( #22494 )
...
* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop
* 1. refine fc
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
guofei
3d8571e884
modify assign op and add unittest of assign op ( #22769 )
...
As the title.
5 years ago
Zeng Jinle
d33c4343e1
Imperative tracer refactoring ( #22457 )
...
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to solve conflict a third time, test=develop
5 years ago
liu zhengxi
61fef9754b
Fix fc padding bug during inference fusion ( #22860 )
...
* fix fc padding during fusion, test=develop
* fix optim model inference after SaveOptimModel, test=develop
5 years ago
tangwei12
ad9c8f6d2d
fix communicator when breaking under PyReader mode ( #22911 )
...
* fix communicator when breaking under PyReader mode, test=develop
* revert some vlog level to 0, test=develop
5 years ago
mapingshuo
5ba9dfc16a
add lookup_table_dequant_op ( #22900 )
...
add lookup_table_dequant_op
5 years ago
zhaoyuchen2018
a020a25797
Fix model int8 quant fail, test=develop ( #22891 )
...
As the model fails when int8 quant is enabled, disable allocating memory on CPU for small variables.
5 years ago
Zhaolong Xing
dd67d44a50
[Paddle-TRT] : (Part1) Dynamic shape support ( #22868 )
...
* change the ci trt from version 5. to 6.0
* paddle-trt dynamic shape support init
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine opconvert
test=develop
* fix ci error
test=develop
5 years ago
tangwei12
07e13b84cd
remove vlog, test=develop ( #22898 )
5 years ago
Zhang Ting
ca9c8b417d
fix compute ratio of profile, test=develop ( #22872 )
5 years ago
wangchaochaohu
dbb0b9b3b6
refine the profiler print ( #22823 )
...
* refine the profiler print test=develop
5 years ago
Michał Gallus
0038bfbd1d
Prevent loading of warmup data in analyzer_int8 if enable_int8 is set to false ( #22857 )
5 years ago
Chen Weihang
1644926a6c
Polish detail implement of dygraph data loader ( #22878 )
...
* polish detail implement of data loader, test=develop
* solve coverage ci problem, test=develop
5 years ago
Wilber
f686310d81
fix concat_mkldnn op. test=develop ( #22692 )
...
fix concat_mkldnn op when encountering extreme conditions.
5 years ago
hong
5191e54494
reduce default attrs for dynamic graph ( #22850 )
...
* reduce default attrs for dynamic graph, test=develop
* add some explanations for explicit attr, test=develop
* tweak explicit attr comments, test=develop
5 years ago
Zhaolong Xing
1a533ed2de
[BUG]: Multihead matmul op's output size should be BxSx(N*H) ( #22848 )
...
test=develop
5 years ago
hong
c736fef93b
dygraph backward engine accelerate ( #22808 )
...
* fix loaded program load bug; test=develop
* first version
* speed backward engine; test=develop
* remove useless code; test=develop
* recover io.py; test=develop
* remove useless code; test=develop
* remove useless code; test=develop
5 years ago
Zeng Jinle
d41d802ba3
Add flags to limit gpu memory ( #22793 )
...
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
5 years ago
石晓伟
1861ca88f1
serialize the PaddleTensor, test=develop ( #22810 )
...
* encapsulate the PaddleTensorToLoDTensor, test=develop
* serialize the pd_tensor, test=develop
* serialize tensors to file, test=develop
5 years ago
Zhang Ting
72ff5a09c3
fix print bug of profile, test=develop ( #22804 )
5 years ago
Zhang Ting
4e8bc02461
add fluid.device_guard to specify the device type for Op ( #22254 )
...
* add fluid.device_guard to specify the device type for Op
5 years ago
石晓伟
ddb9b46fec
change the function in op_teller, test=develop ( #22794 )
...
* change the function in op_teller, test=develop
* correct the commit-id, test=develop
5 years ago
Zhen Wang
89cfa49156
Unmerged fetch list ( #22635 )
...
* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
5 years ago
wangchaochaohu
8456c3f4dd
polish the profiler_help code ( #22811 )
5 years ago
zhongpu
2fd1ec1e3e
fix docker build for paddle openblas, test=develop ( #22795 )
5 years ago
Chen Weihang
7d8d573453
Speed up dygraph DataLoader based on shared memory and LoDTensor serialization ( #22541 )
...
* add lodtensor share memory & serialization, test=develop
* fix windows compile error, test=develop
* deal vartype pickle & fix unittest matching error message, test=develop
* update timeout variable name, test=develop
* refactor memory map implement, test=develop
* clear mmap file descriptor when exiting unexpectedly, test=develop
* remove the child process fd in advance, test=develop
* remove mmap fds after Queue.put in child process, test=develop
* add hard unittests for register exit func, test=develop
* fix python2 compatibility problem in unittest, test=develop
* fix exception unittest error, test=develop
* polish code based review comment, test=develop
5 years ago
liu zhengxi
324f2b3922
Fix inference c api PD_GetZeroCopyOutput lod ( #22768 )
...
* fix inference c api lod, test=develop
* fix capi lod problem and enrich tests, test=develop
* delete useless header files and alter const_cast, test=develop
5 years ago
wangchaochaohu
7578fcbac4
Profile code refine ( #22800 )
...
* add profiler_help.h to refine the code test=develop
5 years ago
hutuxian
53a2b68f4e
support customized download command in dataset ( #22782 )
...
* user can call dataset.set_download_cmd to set its customized download cmd
* add UT to cover this scenario
5 years ago
wangchaochaohu
ca9e77a8d4
add sum op support for fusion group ( #22771 )
...
* Add the codegen and auto fusion for sum Op in fusion group
5 years ago
tianshuo78520a
433cef03e5
fix typo word ( #22784 )
5 years ago
Kaipeng Deng
ebc7ffc300
fix detection_map. test=develop ( #22705 )
5 years ago
zhaoyuchen2018
72dde4abde
Refine adam op to improve performance, test=develop ( #22346 )
...
* Refine adam op, test=develop
* Fuse kernels together to reduce cpu time.
* Refine paddle enforce, test=develop
* Remove some comments, test=develop
* Refine code,test=develop
* Refine cuda kernel, test=develop
* Refine code according to comments, test=develop
5 years ago
wangguanzhong
f2d1cd119a
fix lod level, test=develop ( #22755 )
5 years ago
FlyingQianMM
79d712346f
Correct CPU gradients of the argsort op ( #22739 )
...
* Correct CPU gradients of the argsort op, form a network to test its forward and backward process, test=develop
* fix dynamic threshold error in test_argsort_op, test=develop
5 years ago
Adam
2b80e9a719
Add cpu_info without XBYAK ( #22716 )
5 years ago
guofei
ae8b5f11a3
Change ShareDataWith() to TensorCopy() in ref_by_trainer_id ( #22717 )
...
As the title
5 years ago
liu zhengxi
71ab0458e1
Fix pointer and c-api encapsulation ( #22663 )
...
* refine pointer and c-api prototype, test=develop
* fix new c api profile bug, test=develop
* add unit tests, test=develop
5 years ago
Leo Chen
b2c1be851a
support cond in clone, test=develop ( #22657 )
...
* support cond in clone, test=develop
* refine code, test=develop
* refine code, test=develop
* follow comments, test=develop
* refine code, test=develop
5 years ago
Zhang Ting
f97f3f9301
add framework overhead ratio in profile report ( #22590 )
...
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
5 years ago
zhouwei25
160d0f1308
fix the CI risk that network cannot be connected ( #22736 )
5 years ago
chengjuntao
15c2667143
register fp16 for assign op ( #22744 )
...
* register fp16 for assign op, test=develop
* add op test for fp16, test=develop
5 years ago
zhangchunle
882e7f7c3b
Directly getting API.spec for tools/sampcd_processor.py ( #22728 )
5 years ago
dyning
1c0653462d
fix generate_mask_labels lod level ( #22743 )
5 years ago
GaoWei8
ba140222d6
fix compile&runtime lod_equality of lod_reset ( #22737 )
5 years ago
hutuxian
175954d894
PaddleBox Framework Part2 ( #22466 )
...
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.
5 years ago
ShenLiang
3132681e8a
add partial_sum op in contrib ( #22292 )
...
* add partial_sum_op, test=develop
* modify the Paddle Error Message, test=develop
* modify the Paddle Error Message, test=develop
* modify the bug for python3, test=develop
* modify the ut for ci, test=develop
* mv to contrib, test=develop
* use check_variable_and_dtype, test=develop
* fix ci, test=develop
* fix conflict, test=develop
* add partial concat, test=develop
* fix the conflict, test=develop
* fix the error, test=develop
* rm SSE4, test=develop
5 years ago
wangchaochaohu
611411b90e
Fusion group profile support ( #22718 )
...
* add support for the driver api callback and fix the profiler name show bug
5 years ago
ShenLiang
e136661304
add partial_concat op in contrib ( #22528 )
...
* add partial_concat, test=develop
* fix the grids and blocks, test=develop
* fix the Paddle_Enforce, test=develop
* fix the doc of op, test=develop
* fix the doc, test=develop
* fix the doc of the op, test=develop
* replace -1 with None, test=develop
5 years ago
GaoWei8
cdf5f6fb8c
Add an inference interface to disable FC padding ( #22097 )
...
* Add an interface of disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error
5 years ago
tianshuo78520a
d2ba91aad1
fix typo words ( #22653 )
5 years ago
Yibing Liu
6e7bfe30a6
register fp16 kernel for some ops ( #22650 ) ( #22696 )
...
test=develop
5 years ago
tangwei12
66a3150135
SYNC with communicator ( #22344 )
...
* add sync communicator and implement
5 years ago
Yiqun Liu
22bbd54719
Add the support of fp16 in fusion_group ( #22239 )
5 years ago
flame
d97475d53b
fix CPU C inference API compile bug ( #22702 )
5 years ago
Huihuang Zheng
adfa5b8354
Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp ( #22673 )
...
1. Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp.
2. Also enrich PADDLE_ENFORCE error messages.
5 years ago
flame
74eb82de19
fix go api bug ( #22669 )
5 years ago
wangchaochaohu
a089072c8b
fix the profile print error ( #22665 )
...
* fix the profile print error test=develop
5 years ago
lidanqing
d926214535
[UT coverage] improve the mul_mkldnn_op line coverage ( #22408 )
...
* improve the mul_mkldnn_op line coverage
test=develop
* remove fp32 mul mkldnn kernel
test=develop
* locally refactoring
test=develop
* change according to reviews
test=develop
5 years ago
wangchaochaohu
c65c6ae534
add flag to control profile level in python API ( #22319 )
...
* add python flag to control profile level test=develop
5 years ago
123malin
00594c1c88
support dumping params/grads in transpiler mode ( #22490 )
5 years ago
Zhaolong Xing
a06d75a280
[Paddle-TRT] Refine the error log about runtime batch and max_batch_size. ( #22535 )
...
* fix trt log
test=develop
* fix comments
test=develop
5 years ago
Adam
608447bfd5
Update MKLDNN to v1.2 ( #22521 )
5 years ago
Adam
ab610a34ff
transpose_mkldnn code change to meet Paddle standards ( #22591 )
5 years ago
Jiawei Wang
8f035fb637
Add TopK Op Grad CPU&GPU Kernel test=develop ( #22628 )
...
* Add TopK Op Grad CPU&GPU Kernel test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify PADDLE_ENFORCE test=develop
* Add TopK Op Grad, modify PADDLE_THROW test=develop
* Add TopK Op Grad, modify unittest test=develop
* fix ngraph top k op unittest test=develop
5 years ago
Steffy-zxf
90ee366653
update ops' unittest data type from float32 to float64 and shape over 100 ( #22544 )
...
* update ops' unittest of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt
1. update elementwise_pow, elementwise_max and scale's unittests with input data type (float32 -> float64)
2. fix bug that the elementwise_pow doesn't meet threshold requirements when handling float64 data
3. remove sqrt from op_accuracy_white_list.py
4. update the unittests of elementwise_pow, elementwise_max and elementwise_min ops so that their input data shape is over 100
5. test=develop
* modify the writing style according to suggestions
test=develop
5 years ago
flame
f7eafca828
remove python inference warning ( #22602 )
5 years ago
Chen Weihang
fe685cc185
fix enforce test error, test=develop ( #22610 )
5 years ago
Wilber
9a8203aa25
fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop ( #22551 )
...
When a model contains multiple fc_lstm subgraphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.
5 years ago
Chen Weihang
266106da75
Fix mismatch with plus sign in the line ( #22588 )
...
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
5 years ago
flame
1d503e6a9e
Golang inference API ( #22503 )
...
* support golang inference
5 years ago