hutuxian
969e6378b9
Pipeline Concurrency ( #17402 )
...
Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of pipeline trainer and the test model is gnn
- Do not support win32 now
6 years ago
Zeng Jinle
3ece61f71e
Remove attribute in Allocator::Allocate ( #17878 )
...
* remove attribute in Allocator::Allocate, test=develop
* fix travis ci error, test=develop
6 years ago
Zeng Jinle
3925bd81e8
Fix cuda/cudnn version detection error ( #17853 )
...
* fix cuda/cudnn version detection error, test=develop
* fix again, test=develop
6 years ago
chengduo
d1169afaa3
remove InstallFailureSignalHandler ( #17828 )
...
test=develop
6 years ago
Leo Zhao
50326563d5
enable mkldnn primitive reuse for platform reorder ( #17826 )
...
test=develop
6 years ago
wangchaochaohu
c10157a5df
revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) ( #17753 )
...
* revise conv layer cudnn algo choose test=develop
* update for code style test=develop
* update for code style test=develop
6 years ago
chengduo
863c75168c
polish error doc ( #17772 )
...
test=develop
6 years ago
gongweibao
0d561ef442
fix 2dconn test=develop ( #17681 )
6 years ago
gongweibao
65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. ( #17263 )
6 years ago
wopeizl
6724a652f3
add __str__ method for tensor and lodtensor to support print test=dev… ( #17588 )
...
* add __str__ method for tensor and lodtensor to support print test=develop
6 years ago
mozga-intel
f2694e122d
[NGraph] Enable assign operator for a ngraph, test=develop ( #17437 )
...
* Enable assign operator for a ngraph, test=develop
* Cross_entropy operators need to be updated
6 years ago
Zeng Jinle
c6189637cd
Fix allocator bug ( #16712 )
...
* Revert "Revert "Fix allocator bug""
This reverts commit 174d0d0b90.
* Revert "fix travis ci"
This reverts commit 5656fa9f7c.
test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test,test=develop
6 years ago
mozga-intel
109b5aed5a
[NGraph] Enable reshape operator test=develop ( #17512 )
6 years ago
guomingz
2281ebf0f3
Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. ( #17130 )
...
* Relu6 is the bottleneck op for Mobilenet-v2. Since MKL-DNN supports conv/relu6 fusion, we implement this fusion as a pass. INT8 support for this fusion is expected in MKLDNN v0.20, so this PR focuses on the fp32 optimization.
The table below shows the benchmark (FPS) measured on skx-8180 (28 cores)
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280
test=develop
* Fix the format issue
test=develop
* Add the missing nolint comments.
test=develop
* Fix the typos.
test=develop
* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.
test=develop
* Adjust the indentation.
test=develop
* Add the test_conv_brelu_mkldnn_fuse_pass case.
test=develop
* Slightly update the code per Baidu comments.
Embed the parameter definition into the code.
That will make the code easier to understand.
test=develop
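The benchmark table in this commit message reports raw FPS only. As a rough sketch of the speedup it implies, the ratios can be computed as below; the dictionary layout and the `speedup` helper are illustrative names, not part of the PR, and the figures are copied from the table as-is.

```python
# FPS figures copied from the benchmark table in the commit message
# (skx-8180, 28 cores). Keys are batch sizes; the layout is illustrative.
fps = {
    1: {"with_fusion": 214.7, "without_fusion": 53.4},
    50: {"with_fusion": 1219.727, "without_fusion": 137.280},
}


def speedup(row):
    """Return the throughput ratio of the fused run over the unfused run."""
    return row["with_fusion"] / row["without_fusion"]


# Speedup per batch size, derived only from the reported numbers.
speedups = {batch: speedup(row) for batch, row in fps.items()}

for batch, ratio in speedups.items():
    print(f"batch size {batch}: {ratio:.2f}x")
```

On these figures the fusion gives roughly a 4x gain at batch size 1 and close to 9x at batch size 50.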
6 years ago
qingqing01
97f0ec2357
Fix compiling error with cuDNN 5.1 ( #17458 )
...
test=develop
6 years ago
Zeng Jinle
eab34b2df6
fix_dygraph_mem_leak, test=develop ( #17396 )
6 years ago
qingqing01
e32c9888f5
Double backward of conv2d. ( #17211 )
...
* Add conv2d_grad_grad_op
* Extract the cuDNN conv algo searching code in conv_cudnn_helper.h.
- Now use it in conv2d_grad_grad.
- Will simplify the searching code in conv2d and conv2d_grad in next PR.
* Enhance and fix bug in unit testing of gradient_checker.
* Support fetching empty variables; return None in Python.
6 years ago
zhaoyuchen2018
792443ef23
Refine elementwise kernel. ( #16952 )
...
* Refine elementwise kernel.
Add a simple cuda kernel if grad x and y both exist
Use 2D block cuda kernel to do broadcast.
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
* refine code.
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
* refine code.
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
6 years ago
chengduo
db5e74ab95
update assert ( #17282 )
...
test=develop
6 years ago
baojun
7bd1d03ee5
Adding lrn op for ngraph engine ( #17189 )
...
* added lrn op test=develop
* Added CreateConstant method test=develop
* avoid duplicates test=develop
6 years ago
Tao Luo
ff1661f12a
remove unused FLAGS_warpctc_dir ( #17162 )
...
* remove unused FLAGS_warpctc_dir
test=develop
* remove FLAGS_warpctc_dir
test=develop
6 years ago
Huihuang Zheng
e4a5332416
Fix a typo in gpu_info.cc ( #17175 )
...
test=develop
6 years ago
Huihuang Zheng
b9494058b3
Use CudnnWorkspaceHandle in exhaustive search ( #17082 )
...
1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.
test=develop
6 years ago
Zeng Jinle
0c335dcd2c
Make conv cudnn workspace size configurable ( #17036 )
...
* make_conv_cudnn_ws_size_configurable, test=develop
* change std::max to std::min
test=develop
6 years ago
Zeng Jinle
1202d3fc74
Refine model gpu memory ( #16993 )
...
* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop
* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop
* follow comments
test=develop
6 years ago
gongweibao
cbdb8a17b1
Polish DGC code ( #16818 )
6 years ago
xuezhong
742d758747
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_infershape_bug2
6 years ago
xuezhong
5663fbfb0a
fix infershape bug
...
test=develop
6 years ago
Jacek Czaja
87a44b1149
[MKL-DNN] Added reusing of primitive descriptors (fp32) ( #16667 )
...
* - Reuse of conv PD
- conv transpose pd reused
- Added PD reusing of softmax and Batch Norm
- Refactoring and removal of not needed routines of mkl-dnn ops
test=develop
- Fix to reusing conv
test=develop
- Lint fixes
test=develop
- Further lint fixes
test=develop
- Lint fixes
test=develop
- lint fixes
test=develop
- Lint workaround
test=develop
* - Fix after review on including boost as third party header
test=develop
* - Fix after review. Name change to something more descriptive
test=develop
6 years ago
dongdaxiang
a659b37ace
make lodtensor_printer usable in gpu setting
...
test=develop
6 years ago
Chen Weihang
0b2aec14b6
Revert "Model data cryption link all lib ( #16555 )"
...
test=develop
This reverts commit c38c7c5619.
6 years ago
Chen Weihang
c38c7c5619
Model data cryption link all lib ( #16555 )
...
* link the libwbaes.so into paddle
* polish detail, test=develop
* try fix mac_pr_ci error, test=develop
* add compile option, test=develop
* fix ci error, test=develop
* ignore failed to find mac lib, test=develop
* change cdn to bj, cdn can't get the latest version
* trigger ci, test=develop
* temporary delete win32 lib linking, test=develop
* change https to http, test=develop
* turn compile option on to off
* turn compile option off to on, test=develop
* try lib compiled by gcc4.8, test=develop
* update lib version, test=develop
* link other lib, test=develop
* add setup config
* delete false, test=develop
* delete no_soname, test=develop
* recover so name set
* fix, test=develop
* adjust make config, test=develop
* remove link to wbaes, test=develop
* remove useless define, test=develop
6 years ago
guru4elephant
76b49f02ee
Merge pull request #16539 from guru4elephant/train_with_pipe_reader_merge_develop
...
Train with pipe reader merge develop
6 years ago
gongweibao
fea91164b7
Fix windows compilation error! ( #16546 )
...
* fix compiled
test=develop
* follow comments test=develop
6 years ago
dongdaxiang
3a79be6eb3
refine API spec
...
test=develop
6 years ago
dongdaxiang
98dda08a85
fix pull sparse slow problem
...
test=develop
6 years ago
dongdaxiang
93c3c7f9b3
fix dataset testcase problem
...
test=develop
6 years ago
dongdaxiang
d739bab844
fix async_executor problem and remove some unnecessary testcase, fix trainer_desc import problem
...
test=develop
6 years ago
dongdaxiang
e3107a6ae0
fix windows compile problem
...
test=develop
6 years ago
dongdaxiang
398004ece0
disable sys/wait.h to fix windows compile problem, include scope in lodtensor_printer
...
test=develop
6 years ago
dongdaxiang
39362a8415
move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids
...
test=develop
6 years ago
dongdaxiang
a0b59773af
fix code style
6 years ago
dongdaxiang
365be5d559
support win32 flag in io.cc shell.cc, fix code style problem in fleet_wrapper, fix lodtensor_printer_test problem
...
test=develop
6 years ago
dongdaxiang
dc8cf36e4b
add more example on datagenerator
...
test=develop
6 years ago
dongdaxiang
6bf796df14
refine print fetch list
6 years ago
dongdaxiang
cf1360643f
add printer for fetch variable
6 years ago
Jacek Czaja
2632327429
[MKL-DNN] Tensor modifications revert ( #16462 )
...
* Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233 )"
This reverts commit 13816dd4ac.
Apart from enabling transformer for MKL-DNN
* Revert "- MKL-DNN pooling updated to set_prim_desc"
This reverts commit c63f6b2039.
Conflicts:
paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc
* Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429 )"
test=develop
This reverts commit dec9cf53c8.
* - concat compilation fix
- lint
test=develop
- Lint fixes
test=develop
- Lint fixes
test=develop
- Fix Transpose MKLDNN op
test=develop
6 years ago
Zeng Jinle
69cb9792ea
Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug
...
Revert "Fix allocator bug"
6 years ago
sneaxiy
5656fa9f7c
fix travis ci
...
test=develop
6 years ago
Zeng Jinle
174d0d0b90
Revert "Fix allocator bug"
...
add include headers to fix travis-ci
test=develop
6 years ago
gongweibao
eb83abeac3
Add DGC(Deep Gradient Compression) interface. ( #15841 )
6 years ago
Zeng Jinle
644e8af4cf
Merge pull request #16424 from sneaxiy/fix_allocator_bug
...
Fix allocator bug
6 years ago
nhzlx
953bdde058
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
...
test=develop
6 years ago
sneaxiy
2d92b6be98
merge develop
...
test=develop
6 years ago
Zeng Jinle
c64d959343
Merge pull request #16295 from zhhsplendid/zhenghuihuang-dev-2
...
Add support for init_memory and re-allocate_memory
6 years ago
nhzlx
a1d11bb175
fix ci bug: cudnn handler in multi card
...
test=develop
6 years ago
nhzlx
3df7b98a0f
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
sneaxiy
953214ad97
add more unittest
...
modify allocator strategy
remove changes of legacy buddy_allocator
test=develop
6 years ago
Wu Yi
b7baeed7bb
fix win gpu build test=develop ( #16334 )
6 years ago
zhhsplendid
124f1df481
Add flags for init and re-alloc gpu
...
test=develop
6 years ago
nhzlx
07dcf2856c
git cherry-pick from feature/anakin-engine: update anakin subgraph #16278
6 years ago
Wu Yi
6382b62f6b
Collective ops ( #15572 )
...
* wip allreduce in op
* wip
* wip
* wip
* wip adding test
* wip for conflict with mp mode
* fix tests test=develop
* fix cpu build test=develop
* fix travis clang format test=develop
* fix cpu build test=develop
* update api.spec test=develop
* delete comment test=develop
* fix cpplint test=develop
* fix test=develop
* follow comment test=develop
* add file test=develop
* fix build test=develop
* update test=develop
* to be compatible with sync_bn, and fix mp mode in develop test=develop
6 years ago
zhhsplendid
22715487dc
add allocator flags
...
test=develop
6 years ago
sneaxiy
fd23262e0c
merge develop, fix conflict
...
test=develop
6 years ago
qingqing01
86e912c544
Fix windows compiling ( #16230 )
...
test=develop
6 years ago
qingqing01
8ad672a287
Support sync batch norm. ( #16121 )
...
* Support Sync Batch Norm.
* Note, do not enable it in one device.
Usage:
    build_strategy = fluid.BuildStrategy()
    build_strategy.sync_batch_norm = True
    binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)
6 years ago
sneaxiy
682f2dbf29
merge develop
...
test=develop
6 years ago
sneaxiy
2c4fcaa683
merge develop
6 years ago
chengduo
0979956619
Add memory profiler ( #16137 )
...
test=develop
6 years ago
chengduo
ad80bde824
Revert "Revert "Add Event for TensorCopy"" ( #16035 )
...
* Revert "Revert "Add Event for TensorCopy" (#16022 )"
This reverts commit e2da3a5b22.
* use default stream
test=develop
6 years ago
sneaxiy
2a639d5c2a
add allocator chain to fix bug
...
test=develop
6 years ago
chengduo
e2da3a5b22
Revert "Add Event for TensorCopy" ( #16022 )
...
* Revert "Add Event for TensorCopy (#15953 )"
This reverts commit 7235fd662b.
test=develop
* fix CI
test=develop
6 years ago
chengduo
7235fd662b
Add Event for TensorCopy ( #15953 )
...
Add Event for TensorCopy
6 years ago
Tao Luo
4efdebc6f6
Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt
...
Optimize gelu operation with mkl erf
6 years ago
dzhwinter
225c11a91f
polish cudnn related code and fix bug. ( #15164 )
...
* staged.
* polish code
* polish code. test=develop
* polish code. test=develop
* api change. test=develop
* fix default value. test=develop
* fix default value. test=develop
6 years ago
xiaolil1
6724be2b0d
INT8 Pool kernel Key Creation Optimization. ( #15883 )
...
* Optimize key creation of INT8 pool kernel to improve the performance of ResNet-50 and MobileNet, especially for latency.
test=develop
* Optimize key creation of pool fp32 grad.
test=develop
6 years ago
Yihua Xu
7396788694
Optimize gelu operation with mkl erf.
...
test=develop
6 years ago
peizhilin
c6472579c0
test=develop
6 years ago
peizhilin
b5d6e38b05
fix build issue for cudaEvent_t
...
test=develop
6 years ago
wopeizl
3ccd8964a4
Merge pull request #15905 from wopeizl/win/fix_eigen
...
fix build issue on windows for sample prop op
6 years ago
chengduo
8e904d322f
Remove unnecessary dependence for profiler ( #15899 )
...
* refine profiler
test=develop
* follow comment
test=develop
6 years ago
Xin Pan
44e7fcddc5
Merge pull request #15844 from panyx0718/infer
...
add per kernel config and remove const_cast.
6 years ago
Jacek Czaja
dec9cf53c8
[MKL-DNN] MKL-DNN specific Tensor modification ( #15429 )
...
* - Implemented draft of primitive desc keeping in Tensor
test=develop
- TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented
- Added nchw and nc formats setting for sake of compatibility
Fixed unit tests
- Workaround to problem with 5D data in conv
- Added 3D and 1D MKL-DNN formats for name handles for tensor
test=develop
- Fix to UTs
test=develop
- Conv fp32 op was updated
Cosmetic fixes
test=develop
- tensor mkldnn cosmetics
test=develop
- Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils
* - Lint fixes
test=develop
* - setting prim desc in Tensor also sets layout to kMKLDNN
test=develop
* - Moved creation of prim desc totally out of Tensor
test=develop
* - Cosmetic fixes after review
test=develop
6 years ago
peizhilin
6ccdb1b947
fix build issue on windows for sample prop op
...
test=develop
6 years ago
Dun
c6bd434ffe
add memset CUPTI && test=develop ( #15868 )
6 years ago
Sylwester Fraczek
74672d1aff
Change *(smart_ptr.get()) -> *smart_ptr
...
reason: dereferencing smart pointer is the same as the underlying pointer
test=develop
6 years ago
tensor-tang
ee2321debd
Revert 15770 develop a6910f900 gelu mkl opt ( #15872 )
...
* Revert "Optimze Gelu with MKL Erf function (#15770 )"
This reverts commit 676995c86c.
* test=develop
6 years ago
chengduo
3b08c9abf4
enhance profiler ( #15842 )
...
test=develop
6 years ago
Yihua Xu
676995c86c
Optimze Gelu with MKL Erf function ( #15770 )
...
* Optimize for gelu operator
* Set up the low accuracy mode of MKL ERF function.
test=develop
* Only enable MKLML ERF when OS is linux
* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.
test=develop
* Add the CUDA macro to avoid NVCC's compile issue.
test=develop
* Add the TODO comments for mklml library modification.
test=develop
* Clean Code
test=develop
* Add the comment on the macro for the NVCC compiler.
test=develop
6 years ago
Tao Luo
e3dd6970fc
disable dam temporarily ( #15860 )
...
test=develop
6 years ago
Dun Liang
35a90e06bf
test=develop
6 years ago
Dun Liang
c9080f516b
test=develop
6 years ago
Dun Liang
1c7bb0e40c
test=develop
6 years ago
Xin Pan
5eb87506bc
add per kernel config and remove const_cast.
...
test=develop
6 years ago
Dun
a83e470405
Profiler refine and add CUDA runtime api tracer ( #15301 )
...
* refine profiler && add runtime tracer
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* fix bug && test=develop
* add thread id map && test=develop
* test=develop
* testing
* bug fix
* remove cuda event && refine code && test=develop
* test=develop
* test=develop
* test=develop
* fix windows temp file && test=develop
* test=develop
* fix windows bug && test=develop
* fix start up issue && test=develop
* code polish && test=develop
* remove unused code && test=develop
* add some cupti cbid && test=develop
* add FLAGS_multiple_of_cupti_buffer_size && test=develop
* fix compile error && test=develop
* add keyword && test=develop
* fix && test=develop
* code polish && test=develop
6 years ago
mozga-intel
13ec2d331b
Enable momentum operator for a ngraph engine ( #15673 )
...
* Enable momentum operator for a ngraph engine
test=develop
* Update tests
test=develop
* Removed an unnecessary line of code, as intended
test=develop
6 years ago
Tao Luo
c797a1f050
remove legacy any.cmake
6 years ago
Tao Luo
bd2fa73620
Merge pull request #15794 from sneaxiy/fix-warnings
...
Fix compile warning
6 years ago
tensor-tang
e1c707fe9c
fix warnings ( #15790 )
...
* fix warnings
test=develop
* fix enforce test
test=develop
6 years ago
sneaxiy
9b8e0e2f17
fix enforce_test
...
test=develop
6 years ago
sneaxiy
209b355762
fix many warning
...
test=develop
6 years ago
Zeng Jinle
fc87ef741b
Merge pull request #15687 from sneaxiy/fix_enforce
...
fix enforce
6 years ago
sneaxiy
f0590947c3
fix enforce
...
test=develop
6 years ago
tensor-tang
31fd8ce1e1
Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator
...
Enable batch_norm operator for a ngraph engine
6 years ago
dzhwinter
04e9776aef
add details. test=develop
6 years ago
mozga-intel
1198ccae6b
Enable batch_norm operator for a ngraph engine
...
test=develop
6 years ago
peizhilin
883d22093a
fix the lib_any dependency
...
test=develop
6 years ago
wopeizl
3614dadf23
Merge pull request #15631 from wopeizl/windows/fixci
...
fix ci broken randomly and disable some warnings
6 years ago
peizhilin
061299be87
fix dependency
...
test=develop
6 years ago
baojun
ac4cde009d
Enable accuracy op for ngraph engine ( #15592 )
...
* Added accuracy ngraph op test=develop
* fixed name type test=develop
6 years ago
dzhwinter
ce0394bcd0
merge develop branch. test=develop
6 years ago
guoshengCS
b6c3b69af8
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size
...
test=develop
6 years ago
liuwei1031
6e84eb131f
expose peak gpu memory API to python test=develop ( #15529 )
...
* expose peak gpu memory API to python test=develop
* add unittest for peak gpu memory monitoring test=develop
* add pybind change test=develop
* add mutex to gpu mem usage monitor test=develop
* update benchmark flag definition file test=develop
* tweak unittest for memory monitoring test=develop
6 years ago
guoshengCS
5dfce93101
To make CUDA_LAUNCH_KERNEL_HELPER support large size.
...
test=develop
6 years ago
tensor-tang
8117725852
add jit kernel hsum, hmax and softmax refer code
...
test=develop
6 years ago
sneaxiy
ba4f43fd62
fix compile error in distributed mode
...
test=develop
6 years ago
Yiqun Liu
3008fa1261
Add the CUDA kernel for beam_search op ( #15020 )
...
* Refine the beam_search op and test.
* A basic CUDA implementation of beam_search for small batch_size.
* Implement CUDA kernel for beam_search_op.
* Use multiple CUDA threads in the same block to select the top beam.
* Update the python api of beam_search op.
* Enable extend function in CPU kernel of beam_search op.
* Unify the CUDA codes.
test=develop
* Unify the CPU kernel of beam_search op.
* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.
* Update the description of beam_search in API.spec.
* Enable the use of CUDA kernel in beam_search op.
* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
test=develop
* Follow comments.
test=develop
* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop
* Remove the except of is_empty op in PrepareData.
test=develop
6 years ago
Zeng Jinle
2480a3df7d
Merge pull request #15496 from sneaxiy/lazy_allocator2
...
Fix bug when the user sets CUDA_VISIBLE_DEVICES to empty and runs CPU-only models
6 years ago
sneaxiy
9c360cc798
test=develop
6 years ago
Xin Pan
58cb18d9d9
Merge pull request #15322 from velconia/imperative_resnet
...
Imperative Resnet
6 years ago
sneaxiy
51227bd447
lazy_allocator
...
test=develop
6 years ago
tangwei12
8b50ad80ff
checkpoint at distributed training ( #14854 )
...
checkpoint for distributed training.
6 years ago
minqiyang
8ce198b2e1
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet
...
test=develop
6 years ago
minqiyang
315b133e67
Add single GPU support to imperative
6 years ago
tensor-tang
3759c1db8c
Merge pull request #14805 from mozga-intel/mozga-intel/element_wise_operator_ngraph
...
Enable element_wise_add operator for a ngraph engine
6 years ago
peizhilin
eea75a1d93
fix issue when type is invalid
...
test=develop
6 years ago
peizhilin
9adb158e5b
Merge remote-tracking branch 'upstream/develop' into debug/support
6 years ago
chengduo
46d01d798e
Revert "Revert "Remove workspace_handle in conv_cudnn ( #15186 )"" ( #15290 )
...
test=develop
This reverts commit 358e657f68.
6 years ago
Wojciech Uss
cb2ba58458
Fix performance drop when with MKL-DNN
...
test=develop
6 years ago
chengduozh
c4eced9881
fix thread safe bug
...
test=develop
6 years ago
chengduozh
358e657f68
Revert "Remove workspace_handle in conv_cudnn ( #15186 )"
...
test=develop
This reverts commit 064512aa47.
6 years ago
wopeizl
5d9edb4124
Merge pull request #15156 from wopeizl/windows/fixgpuissue
...
fix gpu build issue on windows test=develop
6 years ago
chengduo
064512aa47
Remove workspace_handle in conv_cudnn ( #15186 )
...
* remove workspace_handle in conv2d_cudnn
test=develop
* remove workspace_handle
test=develop
* fix bug
test=develop
* make test_conv2d_op SERIAL
test=develop
* save memory in conv_cudnn
test=develop
* enhance thread safety
test=develop
* enhance temporary allocator
test=develop
* Add excess fraction
test=develop
* follow comments
test=develop
* fix bug and code refine
test=develop
* fix memory size check
test=develop
* rename reuse_tmp_allocation_excess_fraction
test=develop
6 years ago
xiaolil1
8f17c714de
Conv int8 residual ( #15145 )
...
* Enable basic MKL-DNN INT8 Conv OP
test=develop
* Modify test case
test=develop
* Clean unittest code
test=develop
* Fix test
test=develop
* Modify test
test=develop
* Enable MKL-DNN INT8 Conv with Relu Fusion OP
test=develop
* Enable INT8 Conv with residual fusion OP
test=develop
* Modify code.
test=develop
* Modify basic INT8 Conv
test=develop
* Modify Conv.
test=develop
* fix style
test=develop
* Fix style
test=develop
* Fix test
test=develop
* Modify code.
test=develop
* Fix test
test=develop
6 years ago
peizhilin
439691f5bd
adjust the shlwapi on windows
...
test=develop
6 years ago
peizhilin
92da467c99
Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue
6 years ago
peizhilin
c1235c935f
add the enable_debug flag
...
test=develop
6 years ago
Zeng Jinle
e29f10d315
Merge pull request #15207 from sneaxiy/remove_op_handle_lock_and_fix_var
...
Remove op handle lock and fix var
6 years ago
mozga-intel
a42f8f4f6f
Enable element_wise_add operator for a ngraph
...
test=develop
6 years ago
Zeng Jinle
c562be20d9
Merge pull request #15193 from sneaxiy/fix_cudnn_compatible_check
...
Fix cudnn compatible check
6 years ago
peizhilin
1cd95d8a0b
use thread local instance test=develop
6 years ago
sneaxiy
ed409ac9f4
Revert "Revert "Remove op handle lock""
...
test=develop
6 years ago
peizhilin
d54133ea85
not include the numeric under linux test=develop
6 years ago
peizhilin
a6f5ceee74
add the python callstack for debug support test=develop
6 years ago
Zeng Jinle
dacfaaa966
Revert "Remove op handle lock"
...
test=develop
6 years ago
xiaolil1
c8f101e5da
Conv int8 relu ( #15130 )
...
* Enable basic MKL-DNN INT8 Conv OP
test=develop
* Modify test case
test=develop
* Clean unittest code
test=develop
* Fix test
test=develop
* Modify test
test=develop
* Enable MKL-DNN INT8 Conv with Relu Fusion OP
test=develop
* Modify basic INT8 Conv
test=develop
* fix type
test=develop
* Modify test
test=develop
6 years ago
sneaxiy
9793a0b6a6
fix_cudnn_compatible_check
6 years ago
Zeng Jinle
ccb322d6a5
merge develop
6 years ago
Zeng Jinle
f3a13512fc
Merge pull request #15139 from sneaxiy/remove_op_handle_lock
...
Remove op handle lock
6 years ago
xiaolil1
bbc9336878
Enable basic MKL-DNN INT8 Conv OP ( #15124 )
...
* Enable basic MKL-DNN INT8 Conv OP
test=develop
* Modify test case
test=develop
* Clean unittest code
test=develop
* Fix test
test=develop
* Modify test
test=develop
* Modify basic INT8 Conv
test=develop
6 years ago
peizhilin
c919b2f31d
Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue
6 years ago
peizhilin
fd4f4d0e5f
fix build issue test=develop
6 years ago
Yan Xu
a1e60ab19b
Merge pull request #14791 from Yancey1989/parallel_graph_mode
...
[Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance
6 years ago
peizhilin
9ae50dd07d
fix gpu build issue on windows test=develop
6 years ago
sneaxiy
d0a8a1e950
remove_op_handle_lock
...
test=develop
6 years ago
Yancey1989
e65436103f
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
...
test=develop
6 years ago
sneaxiy
6f06e6cdac
Merge remote origin
...
test=develop
6 years ago
Xin Pan
9186451f60
hide GetTensor
...
test=develop
6 years ago
sneaxiy
d25395fc98
remove tensor core lock
...
test=develop
6 years ago
Yancey1989
82b42e31f0
polish unittest test=develop
6 years ago
Yancey1989
0a885ac12a
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
...
test=develop
6 years ago
peizhilin
813c2ce539
fix timer test=develop
6 years ago
wopeizl
7ab501264d
Merge pull request #15069 from wopeizl/windows/dsosupport
...
add cuda dso support for windows
6 years ago
guru4elephant
ff739449ab
Merge pull request #15018 from guru4elephant/add_timer
...
Add debug thread function for async executor
6 years ago
Yancey1989
4743c9cd5d
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
6 years ago
wopeizl
719ebe3786
Merge pull request #15070 from wopeizl/windows/testcasefix
...
fix test issues on windows
6 years ago
Qiyang Min
0238a3bb4f
Merge pull request #14972 from velconia/accelerate_lstm
...
Accelerate PADDLE_ENFORCE
6 years ago
Yancey1989
86bb583881
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
6 years ago
peizhilin
01c00b07dd
fix test issues on windows
...
test=develop
6 years ago
peizhilin
1e7f83e60a
add cuda dso support for windows
...
test=develop
6 years ago
Yancey1989
41a64f6a2a
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
6 years ago
Wu Yi
856f0da0fe
Fp16 training ( #14992 )
...
* wip
* wip
* wip
* wip for test
* add fp16 tests test=develop
* fix cpu build test=develop
* fix test=develop
* fix py3 tests test=develop
* fix lr_scheduler dtype test=develop
* fix test=dvelop
* test fix ci compile test=develop
* fix build and merge test=develop
* fallback momentumop change to general test=develop
* make fp16 lr schedule simple test=develop
* fix ut test=develop
* fix tests test=develop
* remove fp16 learning rate cast test=develop
6 years ago
chengduo
b9fb03cf54
Move GetTensor to tensor_util ( #15011 )
...
* refine tensor
test=develop
* refine tensor
test=develop
* fix device_context log
test=develop
6 years ago
dongdaxiang
ab2abfc5b2
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
dongdaxiang
4cb833d2de
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
tensor-tang
f0e02a65ed
Merge pull request #14974 from xiaolil1/quantize
...
Add Quantize OP
6 years ago
dongdaxiang
68a2d1f3d7
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
add timer_test
test=develop
6 years ago
dongdaxiang
2e5ebc4594
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
dongdaxiang
5dfd9c9aa9
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
dongdaxiang
d0a5159946
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
dongdaxiang
f9b8168508
Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer
...
test=develop
6 years ago
minqiyang
52b4821a6e
Fix Sprintf problem
...
test=develop
6 years ago
minqiyang
010f657b33
Polish code
...
test=develop
6 years ago
minqiyang
45acfbd011
1. Add specific condition for one or no arg in PADDLE_ENFORCE
...
2. Add unit test for new enforce feature
test=develop
6 years ago
dongdaxiang
2dee8f6cd5
add TrainFilesWithTimer in async_executor
6 years ago
xiaoli.liu@intel.com
d83d0f33fd
extract templated function
...
test=develop
6 years ago
wopeizl
b117a5f208
Merge pull request #14931 from wopeizl/windows/mkl
...
add mkl support for windows
6 years ago
dongdaxiang
cf6188a823
add a linux timer
6 years ago
chengduo
79bd6dfa18
[Feature] Add Temporary Allocator ( #14875 )
...
* Add Temporary Allocator
* add Temporary Allocator to DeviceContext
test=develop
* code refine
test=develop
* fix mean_iou
test=develop
* Add DeviceTemporaryAllocator
test=develop
* fix conv_op bug
test=develop
* small fix
test=develop
* code refine
test=develop
* log refine
test=develop
* fix unit test
test=develop
* move double check
* refine concat_and_split
test=develop
* add limit_of_temporary_allocation
test=develop
* fix name
test=develop
6 years ago
minqiyang
e4719eb462
Fix bug in Windows VC 2010
...
test=develop
6 years ago
minqiyang
5a5c577529
Polish code
...
test=develop
6 years ago
minqiyang
099186cd41
Support one argument PADDLE_ENFORCE
...
test=develop
6 years ago
minqiyang
4af97c6946
Polish code
6 years ago
minqiyang
41b81293ab
Polish code
...
test=develop
6 years ago
peizhilin
9e60c58666
Merge remote-tracking branch 'upstream/develop' into windows/mkl
...
test=develop
6 years ago
minqiyang
bc66401566
Polish code
...
test=develop
6 years ago
minqiyang
53619a79b4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm
6 years ago
peizhilin
b06ce129bc
some not so useful adjust
...
test=develop
6 years ago
minqiyang
679d1a9e0b
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into accelerate_lstm
6 years ago
Jacek Czaja
709d9e3cb7
- Added reusing MKL-DNN primitives for Transpose MKL-DNN op
...
test=develop
6 years ago
peizhilin
40a94a138f
remove irrelevant fix for mkl
...
test=develop
6 years ago
mozga-intel
9035bb81fe
Enable mul operator for a ngraph engine ( #14801 )
...
* Enable mul operator for a ngraph
test=develop
* Enable activation ops test
test=develop
* Remove unused line
test=develop
6 years ago
peizhilin
07c7eaabb4
Merge remote-tracking branch 'upstream/develop' into windows/mkl
...
test=develop
6 years ago
peizhilin
ed5bd5e586
test=develop
6 years ago
peizhilin
19ebd8b4cf
add ctc support for windows
6 years ago
minqiyang
a3fa3f85d7
Polish code
...
test=develop
6 years ago
Yu Yang
2803cf5776
Merge pull request #14868 from reyoung/feature/refine_w2v
...
Feature/refine w2v
6 years ago
peizhilin
b601f2de8d
include the mkl fix only
...
test=develop
6 years ago
peizhilin
5a6d7fe2ff
add mkl,ctc support for windows
6 years ago
wopeizl
0f085f0a5a
Merge pull request #14892 from wopeizl/windows/port3
...
fix script issue
6 years ago
Zeng Jinle
36a1d021a4
Merge pull request #14927 from sneaxiy/fix_cuda_stream_callback_in_cuda10
...
Fix stream_callback_manager bug in CUDA 10
6 years ago
wopeizl
fa78fc60be
Merge pull request #14907 from wopeizl/windows/avx
...
add avx support for windows
6 years ago
sneaxiy
2373aeb5e8
fix bug
...
test=develop
6 years ago
minqiyang
aa41ee75a1
Accelerate PADDLE_ENFORCE
6 years ago
peizhilin
41456e1723
Remove the useless definition
...
test=develop
6 years ago
Yu Yang
740e1626ce
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_w2v
...
test=develop
6 years ago
Yancey1989
a760a550b0
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
6 years ago
peizhilin
d519fd6944
test=develop
6 years ago
Yu Yang
bacf1d2399
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type
6 years ago
Yan Chunwei
a985949be9
Fea/fuse conv elementwise add fuse ( #14669 )
6 years ago
Yancey1989
4a4ccac1d0
update by comment test=develop
6 years ago
peizhilin
23dec78772
fix script issue
...
test=develop
6 years ago
Yancey1989
c722b1dcb6
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
...
test=develop
6 years ago
Yu Yang
4ecdb6f486
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/tensor_type
...
test=develop
6 years ago
Yu Yang
7b10bf0e60
Use mkl
6 years ago
sneaxiy
ca84c2ca8f
merge develop
...
test=develop
6 years ago
Yu Yang
81520a24cf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/refine_eigen_tensor
6 years ago
Yu Yang
9bd70a1e04
Change tensor uses proto::VarType::type
...
test=develop
6 years ago
Yu Yang
8175983ef9
Merge pull request #14814 from reyoung/feature/gprof
...
Add gperftools support for PE
6 years ago
Yu Yang
5e60906996
Fix compile error
...
test=develop
6 years ago
Yu Yang
7604b1ad51
Fix Eigen macro when using GPU
...
The macro should be defined by compiler rather than by source.
test=develop
6 years ago
sneaxiy
7923042365
merge develop
...
test=develop
6 years ago
Yu Yang
b22d638d8f
Speed up SizeOfType
...
test=develop
6 years ago
Yancey1989
2dda19f756
Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode
6 years ago
sneaxiy
66182abda6
add cuda cudnn version check
...
test=develop
6 years ago
Zeng Jinle
add98c9e7d
Merge pull request #14745 from sneaxiy/fix_eigen_deallocate
...
Fix eigen deallocate bug
6 years ago
Yancey1989
cb8a24be14
clean code
6 years ago
Tao Luo
54fcafb5f6
Merge pull request #14707 from yihuaxu/develop_4f71a6ee2_conv3d_mkldnn_opt
...
Implement conv3d with mkldnn library
6 years ago
Yancey1989
c9de6f1b05
init parallel graph mode
6 years ago
sneaxiy
0f96c2e80f
fix thread-safety bug
...
test=develop
6 years ago
Yihua Xu
65dbc7cca4
Merge branch 'develop' into develop_4f71a6ee2_conv3d_mkldnn_opt
6 years ago
tensor-tang
4a93db9288
remove jit namespace
...
test=develop
6 years ago
sneaxiy
900765224c
fix deallocate bug
...
test=develop
6 years ago
liuhongyu
773dc73fbf
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_5_support
6 years ago
liuhongyu
8daf67f90f
fix bugs; test=develop
6 years ago
Xin Pan
052cc5f538
Merge pull request #14725 from ZongwuYang/my-cool-stuff
...
My cool stuff
6 years ago
Wu Yi
29d9fb53fc
[Feature] multi process multi gpu dist training, boost v100 performance by 20% ( #14661 )
...
* wip multi process multi gpu dist training
* workable for p2p
* update test=develop
* change back env name test=develop
* fix alloc init
* fix cpu build test=develop
* fix mac tests test=develop
* refine code
* refine test=develop
6 years ago
liuhongyu
968dd3c078
add cudnn 5 support; test=develop
6 years ago
ZongwuYang
1560eb4a6d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into my-cool-stuff
6 years ago
ZongwuYang
deb04809bd
test=develop
...
Fix the bug that profiler cannot trace the nccl allreduce operator
6 years ago
sneaxiy
35a2578426
fix bug
...
test=develop
6 years ago
sneaxiy
64ad051b9a
merge develop
...
test=develop
6 years ago
sneaxiy
c47c451a00
fix bug
6 years ago
Yihua Xu
669191c9cc
Implement conv3d with mkldnn library (test=develop)
6 years ago
Hongyu Liu
4f71a6ee2c
Merge pull request #14622 from PaddlePaddle/add_cudnn_lstm
...
Add cudnn lstm
6 years ago
Yibing Liu
c7382df80f
Print assert failure id in lookup_table_op ( #14698 )
6 years ago
sneaxiy
096673f675
refactor eager deletion
...
test=develop
6 years ago
phlrain
cf1fe61004
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm
6 years ago
Tao Luo
20120d9c97
Merge pull request #14608 from jczaja/prv-conv2d-transpose-mkldnn
...
[MKL-DNN]conv2d transpose
6 years ago
Tao Luo
ea47685f91
Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum
...
Softmax for inference MKL further changes
6 years ago
minqiyang
a02ce58f2c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
6 years ago
Tao Luo
4ec9de0122
Merge pull request #14628 from Sand3r-/mgallus/mkldnn-elementwise_mul
...
EltwiseMul: Changes from previous PR
6 years ago
Clementine
6c71c1f8f9
Add activation gelu ( #14569 )
6 years ago
Michal Gallus
9455be0ba5
EltwiseMul: Extract StringToFormat to MKLDNN helper
...
test=develop
6 years ago
Jacek Czaja
8bfa1fa9bb
- ASUM MKL integration
6 years ago
liuhongyu
05917c3c79
add cudnn lstm; test=develop
6 years ago
peizhilin
38715e6fd0
minor fix
6 years ago
Jacek Czaja
fb24690a58
- conv2d transpose MKL-DNN
...
test=develop
- Added new header for MKLDNN reuse functionality
- Extended conv2d_transpose GetExpectedKernelType for MKL-DNN support
- Buildable conv transpose mkldnn and conv mkldnn using conv template
- Conv2d transpose roughly implemented and buildable
- Added modifications to conv2d transpose MKLDNN unit tests
- Fix to UT of conv2d transpose mkldnn op
- Wrong type of MKLDNN primitive was chosen for conv2d transpose
- Hacks for conv2d transpose
- UT enabled
- Replaced copying loop with memcpy
- Draft of passing lambda into AcquireMemory
- Made reorder (IOHW->OIHW) to be called only once
6 years ago
minqiyang
be04d99fe4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
...
test=develop
6 years ago
minqiyang
53433d7f2e
Revert the changes of VLOG
...
test=develop
6 years ago
peizhilin
36cd18b549
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
peizhilin
b2f8d4183d
Given the different fraction_of_gpu_memory_to_use depends on platform
6 years ago
Yu Yang
26af9cf90c
Merge pull request #14565 from chengduoZH/fix_cublas_warp_error
...
Fix cublas warp error
6 years ago
chengduozh
f7847ca6a3
fix cublas warp error
...
test=develop
6 years ago
luotao1
e21edb26f6
add Set/GetCPUNumThreads api
6 years ago
peizhilin
445fff24dc
add the bigobj option to NVCC compile
...
fix code style
6 years ago
chengduo
00b9e9a135
Refine cublas to support CUBLAS_TENSOR_OP_MATH ( #13929 )
...
* refine cublas
test=develop
* code refine
* refine cublas
* add GEMM_EX
* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop
* fix CublasCall for cuda version
test=develop
* fix error
test=develop
* fix GEMM_EX to be compatible with gcc 4.8
test=develop
* add GEMM_EX
test=develop
* to be compatible with gcc 4.8
test=develop
6 years ago
peizhilin
7c8c9dc9bf
fix unit test cases
6 years ago
wopeizl
d9a1f3e58e
Windows/online ( #14474 )
...
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
6 years ago
peizhilin
6e66fadb95
clean up the pre-definitions on windows
6 years ago
peizhilin
67562a6fcd
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
peizhilin
703b26e697
add profiler, parallel_executor back
6 years ago
chengduo
a8d3aaae2a
print output log warning ( #14497 )
...
test=develop
6 years ago
peizhilin
3a72a634cf
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
peizhilin
ee0fd78c81
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
Yu Yang
f1a392a5fe
Merge pull request #13804 from sneaxiy/rewrite_allocation
...
Rewrite allocation
6 years ago
qingqing01
fd7e643153
Convolution fusion operator. ( #14449 )
...
* Convolution fusion operator.
* Clean code
test=develop
6 years ago
Yu Yang
98bbfc17be
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
6 years ago
peizhilin
c59d3e83bc
test case fix
6 years ago
peizhilin
8580b7a130
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
Wu Yi
b32c13dc20
Add cudnn ctc loss ( #12366 )
...
* add cudnn ctc loss
* wip add test test=develop
* wip
* wip
* done test=develop
* move include cudnn test=develop
* test test=develop
* fix build test=develop
* fix build test=develop
* fix build on cudnn5 test=develop
* fix cudnn5 build test=develop
* fix cudnn5 build test=develop
* merge develop softmax functor change test=develop
6 years ago
peizhilin
d1a1fafc4c
code style
6 years ago
peizhilin
162f2d4109
disable the openblas multi-thread on windows since no support
...
adjust the python script
6 years ago
Yu Yang
c8f6e70ab4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
...
test=develop
6 years ago
peizhilin
d1429ac4a5
add recordio support
6 years ago
Yu Yang
0d6718fcbd
Pass compile
6 years ago
peizhilin
be332a13bc
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
Yu Yang
d93b2d0365
Refine code
6 years ago
peizhilin
1a9008c420
code style fix
...
test=develop
6 years ago
tensor-tang
1be85d011d
add mkl vsqr and vpow
6 years ago