Zhang Ting
9bc1e0a156
fix the CI random failure for dist op ( #23743 )
5 years ago
Zeng Jinle
54d3b5a1eb
enhance err_msg, test=develop ( #23714 )
5 years ago
Michał Gallus
a63bcf9ae7
[DNNL][INT8][FP32] MatMul ( #23395 )
...
* Initial FP32 DNNL MatMul Implementation
* Implement int8 DNNL MatMul
* Unify in-kernel-naming, clean UTs
* MatMul: Introduce op caching
* Final adjustments
test=develop
* Remove dy_graph disablement
test=develop
* Change dnnl header name to new one
test=develop
* Constrain multi-head check to prevent failures
test=develop
* Resolve dnnl header problems on MAC CI
* Variable namings to kernel and skip_grad_ci added
test=develop
* Prevent MAC CI from failing
* Prevent windows build from failing
test=develop
* Modify UTs to conform to the rules
* Modify MatMul aux functions namings
test=develop
5 years ago
liuwei1031
6c332ad6c6
improve error messages for conv, conv_transpose, cos_sim, group_norm ( #23675 )
...
* improve error messages for conv, conv_transpose, cos_sim, group_norm
5 years ago
zhupengyang
5b3dd80633
Op(prelu) error message enhancement ( #23616 )
5 years ago
Zhang Ting
4773e3f541
add dist op ( #23503 )
...
* add dist op
5 years ago
littletomatodonkey
1c08a2136e
test=develop, add addmm op ( #23384 )
...
add addmm op
5 years ago
zhaoyuchen2018
7b5e23c034
OP(fusion_gru) error message enhancement. test=develop ( #23599 )
...
C++ OP enhancement.
5 years ago
Chengmo
8c0bdde934
Add Tdm sampler op in Contrib ( #23290 )
...
* add tdm sampler op
* fix compile bug
* fix compile bug
* fix compile bug
* fix compile bug
* test=develop, add tdm sampler unittest
* fix tdm sampler unittest
* fix input var name bug
* update tdm sampler unittest
* fix unittest
* fix unittest
* update tdm sampler unittest
* add tdm exe run unittest
* fix infershape
* test=develop, add doc
* test=develop, fix gcc8 compile bug & unittest bug
* test=develop, fix unittest
* test=develop, fix T one & zero
* test=develop, add unittest check
* test=develop, add doc sample code & fix dtype set
* test=develop, fix dtype
* test=develop, fix compile bug
* test=develop, fix unittest
* test=develop, fix unittest
* test=develop, check py3 unittest
* test=develop,fix unittest
* test=develop, fix py3&py2 unittest diff
* test=develop, fix sample code
* test=develop, fix sample code
* test=develop, fix sample code
* test=develop, fix error message
5 years ago
GaoWei8
517929f148
Op (reorder_lod_tensor_by_rank) error message enhancement ( #23552 )
5 years ago
Pei Yang
28f04c6a5e
refine shuffle channel errmsg, test=develop ( #23520 )
5 years ago
Leo Chen
b59426b52a
Enhance error msg of imperative code ( #23572 )
...
* fix init_gflags with 'python -c', test=develop
* enhance error msg related Tracer, test=develop
* refine err msg, test=develop
* follow comments, test=develop
5 years ago
Wilber
1ac9db4354
error message enhancement for fusion_seqpool_concat_op. test=develop ( #23563 )
...
error message enhancement for fusion_seqpool_concat_op
5 years ago
Wilber
286c2e0ede
error message enhancement for py_func op. ( #23565 )
...
error message enhancement for py_func op.
5 years ago
hutuxian
94a3789fd0
Add AfsAPI in PaddleBox ( #23419 )
...
* Involves AfsAPI to resolve slow downloading.
* Mainly used in PaddleBox
5 years ago
liym27
06d4aa4e73
API (BuildStrategy) error message enhancement. ( #23462 )
5 years ago
Zeng Jinle
674355a097
fix GET_DATA_SAFELY ptr, test=develop ( #23679 )
5 years ago
zhongpu
37fcf03af7
Op (Save/Load) error message enhancement, test=develop ( #23650 )
5 years ago
silingtong123
c6d14bc839
show the exception messages of cpp inference library in msvc ( #23702 )
5 years ago
Tao Luo
e4f1b1c5e1
solve mklml memory leak ( #23557 )
5 years ago
Zhen Wang
84cd45f674
Solve the conflict of ops with the same name, test for CI. ( #23573 )
...
* solve the conflict of ops with the same name. test=develop
5 years ago
wangguanzhong
c2f5a3ad34
enhance the error message of roi_align, test=develop ( #23649 )
5 years ago
silingtong123
cec234b1aa
test=develop, error message of tree_conv OP enhancement ( #23574 )
5 years ago
Kaipeng Deng
b465bb0de7
fix adaptive_pool2d/pool3d error message. test=develop ( #23658 )
5 years ago
Zhaolong Xing
f345607115
Refine transpose flatten concat error message ( #23625 )
...
* refine fusion_transpose_flatten_concat_op log
test=develop
* fix ci error
test=develop
5 years ago
Chen Weihang
df538439f5
api build strategy error polish, test=develop ( #23546 )
5 years ago
Zeng Jinle
7f3e0eaad1
refine error msg, test=develop ( #23589 )
5 years ago
Zeng Jinle
ea22515ae4
pimpl to polish code, test=develop ( #23597 )
5 years ago
zhaoyuchen2018
42d67dacb6
OP(minus) error message enhancement. test=develop ( #23621 )
...
C++ error message enhancement.
5 years ago
Huihuang Zheng
a82ce2b1bb
API/OP (ConditionalBlock) error message enhancement ( #23480 )
...
API/OP (ConditionalBlock) error message enhancement (#23480 )
5 years ago
Yiqun Liu
4489f0d304
Op(fetch) error message enhancement. ( #23542 )
5 years ago
Zhen Wang
2cf27260ae
OP(fake_quantize) error message enhancement ( #23550 )
...
* improve error messages of fake_quantize op. test=develop
* update the bit_length error info. test=develop
5 years ago
Zhen Wang
1cf64e00fc
improve error messages of fake_dequantize_op. test=develop ( #23556 )
5 years ago
mozga-intel
3baaee9aab
Remove: NGraph engine from PDPD repository ( #23545 )
...
* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop
* Remove ngraph from:
1. Cmake file
2. Python file
test=develop
5 years ago
wangchaochaohu
81e8fd4a3e
API(fluid.layers.array_length) error message enhancement ( #23547 )
5 years ago
Zhang Ting
1b8fe70e48
fix VLOG, test=develop ( #23327 )
5 years ago
wangguanzhong
6bb8206d03
enhance the error message of box_clip, test=develop ( #23638 )
5 years ago
liym27
8987946fe2
Api/Op (select_input/select_output) error message enhancement. ( #23445 )
5 years ago
Wilber
5f22478a93
error message enhancement for repeated fc. test=develop ( #23562 )
...
error message enhancement for repeated fc
5 years ago
Wilber
a5bdf485d5
fill op error message enhancement. test=develop ( #23560 )
...
fill op error message enhancement
5 years ago
GaoWei8
2c4b57e94b
Op (concat) error message enhancement ( #23523 )
5 years ago
Chen Weihang
45880f604b
API(Program) error message enhancement ( #23519 )
...
* polish api program error message, test=develop
* fix condition error, test=develop
* fix test prune error, test=develop
* fix coverage problem, test=develop
5 years ago
GaoWei8
66cae9157e
Op (lod_reset) error message enhancement ( #23499 )
5 years ago
liym27
dc225ed2fc
OP (tensor_array_read_write) error message enhancement. test=develop ( #23468 )
5 years ago
joanna.wozna.intel
3cb5623dad
Add matmul dequant squash ( #23505 )
...
test=develop
5 years ago
Pei Yang
3d5d217030
Revert "[Paddle-TRT] Add hard_sigmoid and hard_swish support(support MobilenetV3) ( #23536 )", test=develop ( #23642 )
...
This reverts commit cdc6d4e292.
5 years ago
Pei Yang
eb11633611
batch_norm trt converter error message, test=develop ( #23620 )
5 years ago
wangchaochaohu
c1187cd6f4
Fp16 refine for fusion group ( #23472 )
5 years ago
joanna.wozna.intel
ce08fdcf2b
Add support for INT8 matmul in C-API quantization ( #23463 )
...
* Integrate matmul with cpu_quantize_pass
test=develop
* Add matmul checking scales
test=develop
* Change condition of matmul quantization
test=develop
* Remove redundant var
test=develop
5 years ago
GaoWei8
c068512f34
Implement a new C++ operator where and API tensor.where ( #23220 )
5 years ago
石晓伟
9b82e4c183
change the cmake and apis of lite engine, test=develop ( #22934 )
...
* change the cmake and apis of lite engine, test=develop
* change the cmake of lite engine, test=develop
5 years ago
Yiqun Liu
55d0c8fde7
Enhance the error message of feed_op. ( #23526 )
5 years ago
Huihuang Zheng
71b5f1d2b2
OP (recurrent) error message enhancement ( #23481 )
...
* OP (recurrent) error message enhancement
5 years ago
Aurelius84
8674a82c03
Op (Scope) error message enhancement ( #23458 )
...
* Op (Scope) error message enhancement test=develop
5 years ago
Pei Yang
cdc6d4e292
[Paddle-TRT] Add hard_sigmoid and hard_swish support(support MobilenetV3) ( #23536 )
...
* add hard_sigmoid trt op converter
* add hard_swish op converter and plugin. test=develop
5 years ago
Pei Yang
42655ef721
Add full_like op. ( #23364 )
...
* add full_like op. test=develop
* add dygraph support. test=develop
* increase coverage. test=develop
5 years ago
Zhang Ting
480530c4e3
API(place-related) error message enhancement ( #23515 )
5 years ago
guofei
ca7bd2beb1
Add a function to update FLAGS ( #22851 )
...
* Add a function to update FLAGS
test=develop
* Add a function to update FLAGS
test=develop
* expr flags
* Add a function to update FLAGS
test=develop
* distinguish public/private vars, test=develop
* fix windows issues, test=develop
* expr flag
* Add functions to get and set FLAGS
test=develop
* Add functions to get and set FLAGS
test=develop
* Add functions to get and set FLAGS
test=develop
* Add functions to get and set flags
test=develop
* Add functions to get and set FLAGS
test=develop
* Add a function to update FLAGS
test=develop
* Add a function to update FLAGS
test=develop
* Add functions to get and set flags in Paddle
test=develop
Co-authored-by: sneaxiy <sneaxiy@126.com>
5 years ago
wangchaochaohu
d085f79228
fix runtime failure for output var stop_gradient=True for fusion group ( #23317 )
5 years ago
Adam
62aff0a7ac
Add DNNL GELU kernels ( #22426 )
5 years ago
silingtong123
009c049e82
Add randint op API ( #23337 )
...
* add randint op
5 years ago
qingqing01
6162cf2f2e
Make optimizer consistent in dygraph and static-graph and remove some LOG-INFO. ( #23426 )
...
* Make optimizer consistent in dygraph and static-graph and remove some LOG-INFO
5 years ago
wangchaochaohu
29c4fae112
Tensor value support ( #23491 )
...
* add support for value tensor support of fill_constant Op
5 years ago
Chengmo
426912df5a
Add Index sample OP ( #23218 )
...
* add index_sample op
5 years ago
zhangchunle
638d924d89
Op (FusionSquaredMatSub) error message enhancement. ( #23498 )
5 years ago
ShenLiang
c706ff20a3
fix conflict, test=develop ( #23298 )
5 years ago
ShenLiang
5223e2bbc4
Add a new DataFeed named PaddleBoxDataFeed ( #23321 )
...
* add paddleboxdatafeed
* add ifdef linux and boxps
* add unittest for datafeed
* fix unittest of test_paddlebox_datafeed
* fix unittest
* rename function
5 years ago
Chen Weihang
75bd350710
Implement StaticModelRunner to support dygraph fine-tune static graph pre-training model ( #23171 )
...
* static model runner basic implement, test=develop
* add run program op to execute loaded program, test=develop
* refactor static model runner & run program op, test=develop
* reset engine.cc to resolve conflict
* adapt the change of dygraph double grad, test=develop
* refactor impl to solve control flow error, test=develop
* clear debug code, test=develop
* fix ci str compatible error & checkout dygraph grad maker & add example, test=develop
* hide api & add op test, test=develop
* fix run program op test places error, test=develop
* fix program by review comment, test=develop
* delete change var desc name, test=develop
* fix other program by review comment, test=develop
* remove _static_graph_guard, test=develop
* add selectedrows test, test=develop
* remove desc parser, test=develop
* fix detail program, test=develop
* change scope creation & add test, test=develop
5 years ago
cc
9297f49e4b
[OP] Add randperm op ( #23292 )
5 years ago
Kaipeng Deng
d223a24904
Fix inplace_abn compile error on Windows ( #23464 )
...
* fix inplace_abn windows compile error. test=develop
5 years ago
Tao Luo
0b583235f5
Revert "Solve the conflict of ops with the same name. ( #23199 )" ( #23494 )
...
This reverts commit abe3e6906d.
test=develop
5 years ago
wawltor
6577f91b74
Add the sum op to API 2.0, add some parameters for new api
...
* Add the sum op to API 2.0, test=develop
* Fix the import message in common_ops_import
5 years ago
石晓伟
36b82eae0e
refine the doc of paddle_api.h, test=develop ( #23402 )
...
* refine the doc of paddle_api.h, test=develop
* fix documents, test=develop
5 years ago
WuHaobo
c4d0305239
add tril op and triu op ( #23469 )
...
add tril op and triu op
5 years ago
yongqiangma
eb035f24d1
add unbind op ( #23359 )
...
* add unbind op
unbind(tensor, dim=0):
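Description: removes the specified dimension and returns a list of arrays, one for each slice taken along that dimension.
tensor(Tensor) -- the input Tensor
dim(int) -- the dimension to remove
Example: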
Input = [[1,2],
[3,4],
[5,6]]
axis = 0
Output[0] = [1,2]
Output[1] = [3,4]
Output[2] = [5,6]
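The example above can be reproduced with a minimal pure-Python sketch. It only mimics the described semantics on 2-D nested lists; the helper name `unbind_2d` is hypothetical and is not the operator's actual implementation or API.

```python
def unbind_2d(rows, axis=0):
    """Split a 2-D nested list into slices along `axis`, dropping that axis.

    Hypothetical illustration of the unbind semantics described above;
    it does not call or reproduce the real operator.
    """
    if axis == 0:
        # Each row becomes one output slice.
        return [list(row) for row in rows]
    if axis == 1:
        # Each column becomes one output slice.
        return [list(col) for col in zip(*rows)]
    raise ValueError("this sketch only handles 2-D input, so axis must be 0 or 1")


data = [[1, 2],
        [3, 4],
        [5, 6]]

for i, piece in enumerate(unbind_2d(data, axis=0)):
    print("Output[%d] = %s" % (i, piece))
```

With `axis=0` the three rows come back as three separate slices, matching the commit's example; with `axis=1` the sketch returns the two columns instead.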
5 years ago
zhangchunle
fd9b7bdb3d
Op (FusedEmbeddingSeqPool) error message enhancement. ( #23454 )
5 years ago
Chen Weihang
16315d3d9e
Delete Ref & VectorRef and add GetDataSafely ( #22997 )
...
* delete invalid check interface Ref & VectorRef, test=develop
* fix vector ref delete error, test=develop
* try the new check interface, test=develop
* change all related code with new check macro, test=develop
* remove static assert, test=develop
* polish detail, test=develop
* skip coverage problem, test=develop
* add new check macro, test=develop
5 years ago
Zhen Wang
abe3e6906d
Solve the conflict of ops with the same name. ( #23199 )
...
* solve the conflict of ops with the same name. test=develop
5 years ago
wawltor
0b092d05f1
Add the argmax op to API 2.0, and update some parameters
...
* Add the argmax op to API 2.0, test=develop
* Fix the compiler problem in arg_max op, test=develop
* Fix the import message in common_ops_import, test=develop
* Fix the default dtype of arg_min_max, test=develop
5 years ago
Leo Chen
f297a33285
Dev/fix init flags ( #23465 )
...
* fix init_gflags with 'python -c', test=develop
* add test, test=develop
* use sys.executable instead of python, test=develop
* keep dummy, test=develop
5 years ago
Zhaolong Xing
6a23850a3f
add init value to varis in analysis config. ( #23442 )
5 years ago
wawltor
915341e3de
Add the zeros, ones, ones_like, zeros_like for api 2.0, test=develop ( #23471 )
...
Update the new api ops of creation ops to the api 2.0
5 years ago
Zhen Wang
56b50c97f8
Add allclose_op ( #23335 )
...
* Add allclose Op, and its function is analogous to numpy.allclose. It returns True if two tensors are elementwise equal within a tolerance.
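As a rough illustration of "elementwise equal within a tolerance", the sketch below applies the comparison rule documented for numpy.allclose, `|a - b| <= atol + rtol * |b|`, to plain Python lists. The function name `allclose_lists` and its defaults (copied from NumPy's documented defaults) are assumptions for illustration; the commit does not state the operator's own defaults, and NaN handling is left out.

```python
def allclose_lists(a, b, rtol=1e-05, atol=1e-08):
    """Return True if every pair (x, y) satisfies |x - y| <= atol + rtol * |y|.

    Hypothetical helper following the rule documented for numpy.allclose;
    it is not the operator added by this commit.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


print(allclose_lists([1.0, 2.0], [1.0, 2.0000001]))  # differences are within tolerance
print(allclose_lists([1.0, 2.0], [1.0, 2.1]))        # second pair differs too much
```

Note that the rule is asymmetric in `a` and `b`, because the relative term scales with `|b|` only.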
5 years ago
kinghuin
948c57d84b
move sin, sqrt, tanh, atan to paddle.tensor.math and add a new parameter "out" ( #23387 )
...
* sin sqrt tanh atan add out, test=develop
* optimize doc, test=develop
* add dygraph test, test=develop
5 years ago
Chengmo
a2e9af5663
Add Tdm child OP in contrib ( #23241 )
...
* add tdm child op
5 years ago
Wilber
9676ac1c5c
Add flip op. ( #23255 )
...
* add flip op
5 years ago
tianshuo78520a
d8a21ef6f3
test=develop;fix error ( #23467 )
5 years ago
Feiyu Chan
81f1402f6c
Add functional convolutions in paddle.nn.functional ( #23408 )
...
* add functional conv
* add test and doc for function convs, test=develop
* update ConvTransposeOp's InferShape and error message, test=develop
5 years ago
Zhaolong Xing
70782e6379
[Inference doc]: refine paddle_api.h doc ( #23354 )
...
* refine paddle api doc
test=develop
* fix comments
test=develop
5 years ago
Feiyu Chan
bcafe3179a
add MKL computation back to gelu's non-approximate part ( #23420 )
5 years ago
zhongpu
dbfbd7eac4
support Exhaustive search in dygraph ( #23415 )
...
* use global conv cache; test=develop
* use singleton cache; test=develop
* fix format error; test=develop
* add cudnn helper header; test=develop
* fix header error; test=develop
* fix mac unittest; test=develop
* fix mac unittest; test=develop
* fix file format; test=develop
* fix include file error, test=develop
* remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop
* fix test_elementwise_mul_op_dim, test=develop
* fix compile error, test=develop
Co-authored-by: phlrain <phliuhongyu@126.com>
5 years ago
zhaoyuchen2018
01d7ccd4b6
Fix elementwise compile error, test=develop ( #23381 )
...
The elementwise function was used before its definition, which failed in CUDA 8; move the definition ahead.
5 years ago
gongweibao
24a063f6ac
Add fleet checkpoint on local fs and remote fs(such as hdfs) for EDL ( #22586 )
5 years ago
Zeng Jinle
0c23e3ff4d
fix Tracer::NoGrad, test=develop ( #23443 )
5 years ago
channings
a2e10930cf
update linspace, equal operators to API 2.0 ( #23274 )
...
* update linspace, equal operators to API 2.0, test=develop
* equal support higher performance CUDA kernel, test=develop
* update comment of equal&linspace operator, test=develop
* update comment of equal&linspace operator, test=develop
5 years ago
zhaoyuchen2018
4fe9ca6959
improve elementwise performance. ( #23405 )
...
* improve elementwise performance.
* Add contiguous check, test=develop
5 years ago
wangchaochaohu
5c60778731
polish the code of fusion group test=develop ( #23370 )
5 years ago
Leo Chen
a62599a888
[feature] prune program by feed and fetch_list automatically ( #22474 )
...
* prune train program by fetch_list, test=develop
* add unittest for prune, test=develop
* fix pruned feed, test=develop
* support ParallelExecutor and feed prune, test=develop
* add comments, test=develop
* update unittest, test=develop
* update unittests, test=develop
* remove debug code, test=develop
* support cond in clone, test=develop
* support cond in prune, test=develop
* support multiple minimize, test=develop
* support cache, test=develop
* fix _copy_param_info_from, test=develop
* support python2 str, test=develop
* remove debug code, test=develop
* fix bug of caching CompiledProgram, test=develop
* fix multi_device issue, test=develop
* tmp
* support tuple in fetch_list and overriding use_prune, test=develop
* dont use nonlocal in python2, test=develop
* remove nonlocal, test=develop
* code clean, test=develop
* code clean, test=develop
* feed list, test=develop
* test adam, test=develop
* follow comments, test=develop
* reduce duplicate code, test=develop
* update comments, test=develop
5 years ago
Chen Weihang
7f1ad510bd
Add op inout check macro to simplify error message writing ( #23430 )
...
* add op inout check macro, test=develop
* fix enforce_test, test=develop
5 years ago
Yiqun Liu
bc2981e998
Disable test_code_generator and test_post_training_quantization_mobilenetv1 ( #23440 )
5 years ago
Zeng Jinle
29337f4e17
fix conflict of inference partial feed with gpu parallel ssa graph executor, test=develop ( #23400 )
5 years ago
Pei Yang
7e439780d9
add full paddle_analysis_config.h APIs. ( #23215 )
5 years ago
zhongpu
bfb07aafe8
Revert "Exhaustive search ( #22821 )", test=develop ( #23401 )
...
This reverts commit 48144e4099.
5 years ago
liym27
b7b0b3595b
Add unittest for transformer prediction in dygraph_to_static ( #23207 )
...
* Add unittest for transformer prediction in dygraph_to_static.
* fix bug in fill_constant api.
* Make transpose support size 0. test=develop
5 years ago
xujiaqi01
93ea9dd27a
fix stat var in hogwild worker ( #23367 )
...
* fix stat var in hogwild worker
* test=develop
5 years ago
joanna.wozna.intel
8c463700e1
Add default pass attributes ( #23042 )
5 years ago
zhongpu
48144e4099
Exhaustive search ( #22821 )
...
* use global conv cache; test=develop
* use singleton cache; test=develop
* fix format error; test=develop
* add cudnn helper header; test=develop
* fix header error; test=develop
* fix mac unittest; test=develop
* fix mac unittest; test=develop
* fix file format; test=develop
* fix include file error, test=develop
* remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop
* fix test_elementwise_mul_op_dim, test=develop
Co-authored-by: phlrain <phliuhongyu@126.com>
5 years ago
Adam
da7c73f847
Delete is_test attribute from activation operators ( #23318 )
...
* Delete is_test from activation operators
test=develop
* Revert unneeded changes
test=develop
5 years ago
Kaipeng Deng
21d95be0db
Add inplace abn op ( #22806 )
...
* add inplace_abn_op. test=develop
5 years ago
Yi Liu
821534efd3
add parallel_executor dependency to collective_helper ( #23380 )
...
test=develop
5 years ago
Zeng Jinle
3a21980b78
add reader dependency pass, test=develop ( #23301 )
5 years ago
wangchaochaohu
69e3f99362
refine the error message ( #23212 )
...
* refine the error message of tensor_array_read_write Op
5 years ago
石晓伟
5c59d2139e
reverts the commit 23177, test=develop ( #23363 )
5 years ago
wangchaochaohu
d280106007
Add support for attr type Op and add fill_constant Op and scale Op ( #23163 )
...
* add attr support for fusion group and add support for fill_constant and scale Op
5 years ago
xujiaqi01
3a45767d49
add fleet pslib pull and push sparse op and push dense op ( #23139 )
...
* add fleet pslib pull and push sparse op and push dense op
* test=develop
5 years ago
songyouwei
99d30bfc36
speedup slice impl ( #23340 )
...
test=develop
5 years ago
Zhaolong Xing
1a6ce8b910
add swish split gelu plugin dynamic support ( #23305 )
...
test=develop
5 years ago
Jacek Czaja
2bb1b0e89e
[DNNL] Added MKL-DNN inplace pass for C-API inference ( #23315 )
5 years ago
Yi Liu
0471476a18
fix nccl comm double free bug ( #23344 )
...
As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.
5 years ago
wangchaochaohu
1ee2a9a424
Profiler refine ( #23294 )
...
* refine output of profiler for child event
5 years ago
Leo Chen
488b2387e2
Feature/expand params in auto-generated pybind functions for dygraph operators ( #23181 )
...
* expand parameters, test=develop
* support resnet, test=develop
* fix resnet, test=develop
* support duplicable out, test=develop
* support ptb
* fix bugs, test=develop
* support null input, test=develop
* fix bugs, test=develop
* fix batchNorm is_test, test=develop
* refine code, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
* follow comments, test=develop
5 years ago
GaoWei8
20eed5401a
Change fluid.layers.where's C++ operator name ( #23250 )
5 years ago
Yi Liu
2169e6fb58
Initialize global nccl_comm in PE ( #23275 )
5 years ago
Jacek Czaja
012886df79
[DNNL] Softmax mkldnn op inplace support ( #23197 )
5 years ago
石晓伟
75ebb48a91
supports thread-binding stream, test=develop ( #23177 )
5 years ago
石晓伟
708ded584e
pause the io_utils_test of int64 and resume after repair, test=develop ( #23234 )
5 years ago
Zeng Jinle
babda94c8a
Distinguish public/private global vars ( #23269 )
...
* distinguish public/private vars, test=develop
* fix windows issues, test=develop
5 years ago
zhaoyuchen2018
58615a6272
Improve elementwise performance. ( #23001 )
...
* Improve elementwise performance.
Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA, so add some new kernels for different data patterns.
* Add some cuda kernel to speedup common broadcast cases. test=develop
* Add more test cases and fix cuda kernel bug. test=develop
* Remove tests as CPU precision fails. test=develop
* Refine SplitDims, test=develop
* Change file mode, test=develop
5 years ago
Wojciech Uss
f836c8aa8f
add check for scales and a message ( #23119 )
5 years ago
Zeng Jinle
8bfd62ffb7
Expose dygraph.grad api ( #23124 )
...
* expose dygraph.grad api, test=develop, test=document_fix
* add more parameter in dygraph.grad API, test=develop
* add only_inputs=True parameter, test=develop
* follow comments, test=develop, test=document_fix
* fix typo, test=develop, test=document_fix
5 years ago
Wilber
0129f4b568
Add some inference API comments for AnalysisPredictor ( #23242 )
...
* add inference api doc. test=develop
5 years ago
Tao Luo
c00d427d52
simplify the cmake log of ir/CMakeLists.txt ( #23262 )
...
test=develop
5 years ago
Zeng Jinle
77b4dc80c9
code polish for adding const qualifier, test=develop, test=document_fix ( #23248 )
5 years ago
Zhaolong Xing
430b0099c9
[Paddle-TRT]: Ernie Dynamic shape support. ( #23138 )
...
* add dynamic plugin support.
test=develop
* change emb eltwise layernorm to math function
test=develop
* add emb eltwise layernorm
test=develop
* can run dynamic shape ernie
test=develop
* fix ci
test=develop
* add ut for trt ernie dynamic
test=develop
* refine dynamic shape c++ interface.
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
xujiaqi01
68ea1ad55b
add clear one table ( #23089 )
...
* add clear_one_table
* test=develop
5 years ago
danleifeng
ae3bb16d06
add MaskAucCalculator in paddlebox ( #23157 )
...
* add maskauc in paddlebox; test=develop
5 years ago
liym27
6af480ca33
Support int64 for op assign_value. test=develop ( #23179 )
5 years ago
Zeng Jinle
53e6f8e1da
rename macro, test=develop ( #23161 )
5 years ago
Zeng Jinle
bba740710d
add cuda resource pool for BufferedReader, test=develop ( #23152 )
5 years ago
Zeng Jinle
7d8d50b6cc
rename no_need_buffer_vars macro, test=develop ( #23160 )
5 years ago
Liufang Sang
a486a739e1
fix compile error in win gpu ( #23196 )
...
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
* fix compile error in win gpu test=develop
5 years ago
Zeng Jinle
7ca77a90ac
add Tensor::IsSharedBufferWith method, test=develop ( #23175 )
5 years ago
Zeng Jinle
b8886bf122
rename no_need_buffer_vars_macro, test=develop ( #23159 )
5 years ago
Zeng Jinle
bae5930ba1
fix graph attr copy issues, test=develop ( #23191 )
5 years ago
wangchaochaohu
b721e23b25
transpose cudnn using cudnn v7 api ( #19738 )
...
* refine the transpose conv using v7 to choose algorithm
5 years ago
Pei Yang
46b8d282dc
Add some inference API comments for AnalysisConfig ( #23117 )
...
* add some API comments in paddle_analysis_config.h, test=develop
* add some API comments in paddle_analysis_config.h, test=develop
5 years ago
Adam
4f5e4540f8
Improve SGD jit code to work with large data ( #23120 )
5 years ago
Liufang Sang
4db031902d
add dequantize_log_op and make pyramid hash support int8 weight ( #22548 )
...
* add dequantize_log_op and make pyramid hash support int8 weight test=develop
* add unittest and update pyramid hash op test=develop
* remove paddle_enforce test=develop
* fix error message test=develop
* remove incorrect commit test=develop
* fix error message in log_dequantize test=develop
* change 2019 to 2020 test=develop
* remove useless check_grad test=develop
5 years ago
Zeng Jinle
e5fef8f38a
[Dygraph double grad]Code polish ( #23121 )
...
* fix dygraph double grad, test=develop
* fix unpack constructor, test=develop
5 years ago
Zeng Jinle
9258e96094
fix read op comments, test=develop, test=document_fix ( #23122 )
5 years ago
Zeng Jinle
acfc9b8a70
Reader sequential and inference partial feed ( #22699 )
...
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* test commit for reverting
* Revert "test commit for reverting", test=develop
This reverts commit 80aef42ef52ba1ee79627d6f663a624ec4f12f58.
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
5 years ago
Wilber
95b356a069
update embedding_eltwise_layernorm fuse and kernel. test=develop ( #23114 )
...
update embedding_eltwise_layernorm fuse pass and fused kernel, to support multi input
5 years ago
Zeng Jinle
a31d7328b7
Add dygraph double grad implementation ( #22939 )
...
* add double grad implementation for dygraph, test=develop
* polish code, add uts, test=develop
* fix place bug, test=develop
* polish codes, add more uts for coverages, test=develop
* add no_grad_set, test=develop
* add star gan ut, test=develop
* follow comments, test=develop
5 years ago
Yiqun Liu
3af4771122
Add the detection and code-generation of sqrt and square in fusion_group ( #23095 )
5 years ago
hutuxian
0c30098f8b
Add need_save_delta parameter to solve OOM ( #23097 )
5 years ago
songyouwei
2e2da7124b
high-performance dygraph slice ( #22879 )
...
* move __getitem__ to cpp
* bug fix
* add type check and gil release
* support negative step with omitted ends
test=develop
* code refine
test=develop
* bug fix
test=develop
* slice always return different pyobj
test=develop
5 years ago
Sylwester Fraczek
abee05a8c8
added mkldnn swish activation ( #23041 )
5 years ago
Zhaolong Xing
8c6fde9e69
fix align error ( #23090 )
...
test=develop
5 years ago
Liufang Sang
915b892a15
Fix div zero in fake quantize op ( #22966 )
...
* fix div zero test=develop
* fix div zero test=develop
* add hostdevice function test=develop
* add eps when is zero test=develop
5 years ago
Yi Liu
121b2aed4d
initialize global nccl context in dygraph ( #23037 )
...
initialize global nccl context in dygraph
test=develop
5 years ago
Zhang Ting
880eb04d93
skip PrepareData when it is unnecessary ( #22839 )
...
* remove unnecessary prepare data, test=develop
* Op in while block will not skip PrepareData, test=develop
5 years ago
Feiyu Chan
01ab8a0619
add approximation for gelu, test=develop ( #22961 )
...
add approximation for gelu, default value is False (only kernel with eigen is added, remove code for computing gelu with MKLDNN temporarily)
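For context, the sketch below compares the exact GELU, `0.5 * x * (1 + erf(x / sqrt(2)))`, with the widely used tanh-based approximation. The commit message does not say which approximation the kernel uses, so the tanh form and both function names here are assumptions for illustration only.

```python
import math


def gelu_exact(x):
    """Exact GELU using the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh-based GELU approximation (assumed form, not confirmed by the commit)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(value, gelu_exact(value), gelu_tanh(value))
```

The two functions agree closely over typical activation ranges, which is why an approximate mode can be offered as an opt-in flag while the exact form stays the default.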
5 years ago
Adam
5842ae6785
Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )" ( #22985 )
5 years ago
Pei Yang
24db750386
fix trt int8 calib precision bug. test=develop ( #23036 )
5 years ago
GaoWei8
1dc1f9270e
Fix lod error of concat op for axis = 0 ( #22538 )
5 years ago
yaoxuefeng
660ff18488
fix dataset test=develop ( #23043 )
5 years ago
Zhang Ting
714b0076b6
Override GetKernelTypeForVar to avoid device transform, test=develop ( #23032 )
5 years ago
wangchaochaohu
112e3edbf6
fix the conv group problem test=develop ( #23025 )
5 years ago
Wilber
db40ee86db
fix unittests. test=develop ( #23018 )
5 years ago
wangchaochaohu
99db0cf762
remove debug log test=develop ( #22994 )
5 years ago
wangchaochaohu
3757e0687c
Add Unittest for backward of fusion group ( #22932 )
...
* add fusion group test for backward and refine code
5 years ago
chengjuntao
63f3ada7b9
fix bug which input shape ( #22965 )
...
* fix bug which input shape, test=develop
* add error type,test=develop
5 years ago
Zhang Ting
137d6563fc
add check for assigned data, test=develop ( #22960 )
5 years ago
wangchaochaohu
f0d193a23c
Cast fusion for fusion group ( #22876 )
...
* add support for expression type convert and add cast Op support in fusion group
5 years ago
yaoxuefeng
29a7a52d38
Fix instag ( #22632 )
...
* update
* update test=develop
* update compile set test=develop
* update compile set test=develop
* update test=develop
* update test=develop
* update test=develop
* update compile setting test=develop
* update compile setting test=develop
* update run demo test=develop
* update test=develop
* update test=develop
* fix test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update format test=develop
* update format test=develop
* update style test=develop
* update style test=develop
* change style test=develop
* change style test=develop
* change style test=develop
* add dataset unittest test=develop
* update test=develop
* update for record test=develop
* update style for record test=develop
* update for record test=develop
* update for record test=develop
* update for record test=develop
* fix format test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* update test=develop
* fix compile warning test=develop
* add attr default test=develop
* add unittest test=develop
* fix style test=develop
* fix style test=develop
* change out_val_ifempty to out_val_if_empty test=develop
5 years ago
wangchaochaohu
c979c9f2b0
refine the profiler print test=develop ( #22968 )
5 years ago
Wilber
ff3ddbb502
add skip_layernorm pass. test=develop ( #22895 )
...
* add skip_layernorm pass. test=develop
5 years ago
wawltor
f154d5860f
Speed up the matmul op, use the gemm replace the batch gemm ( #22926 )
...
In the op of gemm, we use the gemm to replace batch gemm, speed up the matmul op
5 years ago
Adam
056edf3929
Change ShareDataWith() to TensorCopy() in conv_mkldnn ( #22695 )
5 years ago
Zhaolong Xing
8d6dc102fe
[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse ( #22494 )
...
* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop
* 1. refine fc
test=develop
* fix comments
test=develop
* fix comments
test=develop
5 years ago
guofei
3d8571e884
modify assign op and add unittest of assign op ( #22769 )
...
As the title.
5 years ago
Zeng Jinle
d33c4343e1
Imperative tracer refactoring ( #22457 )
...
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to solve conflict third times, test=develop
5 years ago
liu zhengxi
61fef9754b
Fix fc padding bug during inference fusion ( #22860 )
...
* fix fc padding during fusion, test=develop
* fix optim model inference after SaveOptimModel, test=develop
5 years ago
tangwei12
ad9c8f6d2d
fix communicator when breaking under PyReader mode ( #22911 )
...
* fix communicator when breaking under PyReader mode, test=develop
* revert some vlog level to 0, test=develop
5 years ago
mapingshuo
5ba9dfc16a
add lookup_table_dequant_op ( #22900 )
...
add lookup_table_dequant_op
5 years ago
zhaoyuchen2018
a020a25797
Fix model int8 quant fail, test=develop ( #22891 )
...
The model fails when int8 quant is enabled, so disable allocating memory on cpu
for small variables.
5 years ago
Zhaolong Xing
dd67d44a50
[Paddle-TRT] : (Part1) Dynamic shape support ( #22868 )
...
* change the ci trt from version 5. to 6.0
* paddle-trt dynamic shape support init
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine opconvert
test=develop
* fix ci error
test=develop
5 years ago
tangwei12
07e13b84cd
remove vlog, test=develop ( #22898 )
5 years ago
Zhang Ting
ca9c8b417d
fix compute ratio of profile, test=develop ( #22872 )
5 years ago
wangchaochaohu
dbb0b9b3b6
refine the profiler print ( #22823 )
...
* refine the profiler print test=develop
5 years ago
Michał Gallus
0038bfbd1d
Prevent loading of warmup data in analyzer_int8 if enable_int8 is set to false ( #22857 )
5 years ago
Chen Weihang
1644926a6c
Polish detail implement of dygraph data loader ( #22878 )
...
* polish detail implement of data loader, test=develop
* solve coverage ci problem, test=develop
5 years ago
Wilber
f686310d81
fix concat_mkldnn op. test=develop ( #22692 )
...
fix concat_mkldnn op when encountering extreme conditions.
5 years ago
hong
5191e54494
reduce default attrs for dynamic graph ( #22850 )
...
* reduce default attrs for dynamic graph, test=develop
* add some explanations for explicit attr, test=develop
* tweak explicit attr comments, test=develop
5 years ago
Zhaolong Xing
1a533ed2de
[BUG]: Multihead matmul op's output size should be BxSx(N*H) ( #22848 )
...
test=develop
5 years ago
hong
c736fef93b
dygraph backward engine accelerate ( #22808 )
...
* fix loaded program load bug; test=develop
* first version
* speed up backward engine; test=develop
* remove useless code; test=develop
* recover io.py; test=develop
* remove useless code; test=develop
* remove useless code; test=develop
5 years ago
Zeng Jinle
d41d802ba3
Add flags to limit gpu memory ( #22793 )
...
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
5 years ago
石晓伟
1861ca88f1
serialize the PaddleTensor, test=develop ( #22810 )
...
* encapsulate the PaddleTensorToLoDTensor, test=develop
* serialize the pd_tensor, test=develop
* serialize tensors to file, test=develop
5 years ago
Zhang Ting
72ff5a09c3
fix print bug of profile, test=develop ( #22804 )
5 years ago
Zhang Ting
4e8bc02461
add fluid.device_guard to specify the device type for Op ( #22254 )
...
* add fluid.device_guard to specify the device type for Op
5 years ago
石晓伟
ddb9b46fec
change the function in op_teller, test=develop ( #22794 )
...
* change the function in op_teller, test=develop
* correct the commit-id, test=develop
5 years ago
Zhen Wang
89cfa49156
Unmerged fetch list ( #22635 )
...
* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
5 years ago
wangchaochaohu
8456c3f4dd
polish the profiler_help code ( #22811 )
5 years ago
Chen Weihang
7d8d573453
Speed up dygraph DataLoader based on shared memory and LoDTensor serialization ( #22541 )
...
* add lodtensor share memory & serialization, test=develop
* fix windows compile error, test=develop
* deal vartype pickle & fix unittest matching error message, test=develop
* update timeout variable name, test=develop
* refactor memory map implement, test=develop
* clear mmap file descriptor when exiting unexpectedly, test=develop
* remove the child process fd in advance, test=develop
* remove mmap fds after Queue.put in child process, test=develop
* add hard unittests for register exit func, test=develop
* fix python2 compatibility problem in unittest, test=develop
* fix exception unittest error, test=develop
* polish code based review comment, test=develop
5 years ago
liu zhengxi
324f2b3922
Fix inference c api PD_GetZeroCopyOutput lod ( #22768 )
...
* fix inference c api lod, test=develop
* fix capi lod problem and enrich tests, test=develop
* delete useless header files and alter const_cast, test=develop
5 years ago
wangchaochaohu
7578fcbac4
Profile code refine ( #22800 )
...
* add profiler_help.h to refine the code test=develop
5 years ago
hutuxian
53a2b68f4e
support customized download command in dataset ( #22782 )
...
* user can call dataset.set_download_cmd to set its customized download cmd
* add UT to cover this scenario
5 years ago
wangchaochaohu
ca9e77a8d4
add sum op support for fusion group ( #22771 )
...
* Add the codegen and auto fusion for sum Op in fusion group
5 years ago
tianshuo78520a
433cef03e5
fix typo word ( #22784 )
5 years ago
Kaipeng Deng
ebc7ffc300
fix detection_map. test=develop ( #22705 )
5 years ago
zhaoyuchen2018
72dde4abde
Refine adam op to improve performance, test=develop ( #22346 )
...
* Refine adam op, test=develop
* Fuse kernels together to reduce cpu time.
* Refine paddle enforce, test=develop
* Remove some comments, test=develop
* Refine code,test=develop
* Refine cuda kernel, test=develop
* Refine code according to comments, test=develop
5 years ago
wangguanzhong
f2d1cd119a
fix lod level, test=develop ( #22755 )
5 years ago
FlyingQianMM
79d712346f
Correct CPU gradients of the argsort op ( #22739 )
...
* Correct CPU gradients of the argsort op, form a network to test its forward and backward process, test=develop
* fix dynamic threshold error in test_argsort_op, test=develop
5 years ago
Adam
2b80e9a719
Add cpu_info without XBYAK ( #22716 )
5 years ago
guofei
ae8b5f11a3
Change ShareDataWith() to TensorCopy() in ref_by_trainer_id ( #22717 )
...
As the title
5 years ago
liu zhengxi
71ab0458e1
Fix pointer and c-api encapsulation ( #22663 )
...
* refine pointer and c-api prototype, test=develop
* fix new c api profile bug, test=develop
* add unit tests, test=develop
5 years ago
Leo Chen
b2c1be851a
support cond in clone, test=develop ( #22657 )
...
* support cond in clone, test=develop
* refine code, test=develop
* refine code, test=develop
* follow comments, test=develop
* refine code, test=develop
5 years ago
Zhang Ting
f97f3f9301
add framework overhead ratio in profile report ( #22590 )
...
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
5 years ago
chengjuntao
15c2667143
register fp16 for assign op ( #22744 )
...
* register fp16 for assign op, test=develop
* add op test for fp16, test=develop
5 years ago
dyning
1c0653462d
fix generate_mask_labels lod level ( #22743 )
5 years ago
GaoWei8
ba140222d6
fix compile&runtime lod_equality of lod_reset ( #22737 )
5 years ago
hutuxian
175954d894
PaddleBox Framework Part2 ( #22466 )
...
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.
5 years ago
ShenLiang
3132681e8a
add partial_sum op in contrib ( #22292 )
...
* add partial_sum_op, test=develop
* modify the Paddle Error Message, test=develop
* modify the Paddle Error Message, test=develop
* modify the bug for python3, test=develop
* modify the ut for ci, test=develop
* mv to contrib, test=develop
* use check_variable_and_dtype, test=develop
* fix ci, test=develop
* fix conflict, test=develop
* add partial concat, test=develop
* fix the conflict, test=develop
* fix the error, test=develop
* rm SSE4, test=develop
5 years ago
wangchaochaohu
611411b90e
Fusion group profile support ( #22718 )
...
* add support for the driver api callback and fix the profiler name show bug
5 years ago
ShenLiang
e136661304
add partial_concat op in contrib ( #22528 )
...
* add partial_concat, test=develop
* fix the grids and blocks, test=develop
* fix the Paddle_Enforce, test=develop
* fix the doc of op, test=develop
* fix the doc, test=develop
* fix the doc of the op, test=develop
* replace -1 with None, test=develop
5 years ago
GaoWei8
cdf5f6fb8c
Add an inference interface to disable FC padding ( #22097 )
...
* Add an interface of disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error
5 years ago
tianshuo78520a
d2ba91aad1
fix typo words ( #22653 )
5 years ago
Yibing Liu
6e7bfe30a6
register fp16 kernel for some ops ( #22650 ) ( #22696 )
...
test=develop
5 years ago
tangwei12
66a3150135
SYNC with communicator ( #22344 )
...
* add sync communicator and implement
5 years ago
Yiqun Liu
22bbd54719
Add the support of fp16 in fusion_group ( #22239 )
5 years ago
flame
d97475d53b
fix CPU C inference API compile bug ( #22702 )
5 years ago
Huihuang Zheng
adfa5b8354
Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp ( #22673 )
...
1. Add PADDLE_ENFORCE to Check Sequence Length of RecurrentOp.
2. Also enrich PADDLE_ENFORCE error messages.
5 years ago
flame
74eb82de19
fix go api bug ( #22669 )
5 years ago
wangchaochaohu
a089072c8b
fix the profile print error ( #22665 )
...
* fix the profile print error test=develop
5 years ago
lidanqing
d926214535
[UT coverage] improve the mul_mkldnn_op line coverage ( #22408 )
...
* improve the mul_mkldnn_op line coverage
test=develop
* remove fp32 mul mkldnn kernel
test=develop
* locally refactoring
test=develop
* change according to reviews
test=develop
5 years ago
wangchaochaohu
c65c6ae534
add flag to control profile level in python API ( #22319 )
...
* add python flag to control profile level test=develop
5 years ago
123malin
00594c1c88
support dumping params/grads in transpiler mode ( #22490 )
5 years ago
Zhaolong Xing
a06d75a280
[Paddle-TRT] Refine the error log about runtime batch and max_batch_size. ( #22535 )
...
* fix trt log
test=develop
* fix comments
test=develop
5 years ago
Adam
608447bfd5
Update MKLDNN to v1.2 ( #22521 )
5 years ago
Adam
ab610a34ff
transpose_mkldnn code change to meet Paddle standards ( #22591 )
5 years ago
Jiawei Wang
8f035fb637
Add TopK Op Grad CPU&GPU Kernel test=develop ( #22628 )
...
* Add TopK Op Grad CPU&GPU Kernel test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify grad op maker test=develop
* Add TopK Op Grad, modify PADDLE_ENFORCE test=develop
* Add TopK Op Grad, modify PADDLE_THROW test=develop
* Add TopK Op Grad, modify unittest test=develop
* fix ngraph top k op unittest test=develop
5 years ago
Steffy-zxf
90ee366653
update ops's unittest data type from float32 to float64 and shape over 100 ( #22544 )
...
* update ops's unittest of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt
1. update elementwise_pow, elementwise_max and scale's unittests with input data type (float32 -> float64)
2. fix the bug where elementwise_pow doesn't meet threshold requirements when handling float64 data
3. remove sqrt from op_accuracy_white_list.py
4. update the unittests of the elementwise_pow, elementwise_max and elementwise_min ops so that their input data shape is over 100
5. test=develop
* modify the writing style according to suggestions
test=develop
5 years ago
flame
f7eafca828
remove python inference warning ( #22602 )
5 years ago
Chen Weihang
fe685cc185
fix enforce test error, test=develop ( #22610 )
5 years ago
Wilber
9a8203aa25
fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop ( #22551 )
...
When a model contains multiple fc_lstm sub-graphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.
5 years ago
Chen Weihang
266106da75
Fix mismatch with plus sign in the line ( #22588 )
...
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
5 years ago
flame
1d503e6a9e
Golang inference API ( #22503 )
...
* support golang inference
5 years ago
Zhaolong Xing
8acd745c25
[Ernie GPU Optim]: Fuse three fc to multihead matmul ( #22486 )
...
* 1. optim multihead matmul: fuse three fc to multihead matmul
test=develop
* fix conflict
test=develop
* fix comments
test=develop
5 years ago
Yiqun Liu
96770f519e
Disable fusion_group for windows and mac in build_strategy. ( #22549 )
...
test=develop
5 years ago
Zeng Jinle
08033c8634
fix traced layer with non persistable vars, test=develop ( #22552 )
5 years ago