luotao1
9c7fde45a7
enhance test_analyzer to profile ditu inference demo
7 years ago
Tao Luo
decda738b0
fea/anakin compile with demo ( #12772 )
...
* anakin support x86
* fix code style
* add anakin ditu cnn demo
* add timer
* add rnn
* fix inference_anakin_cnn/rnn_test compile error
* make anakin_rnn_tester run
* add anakin_enable_op_time option
* update api/CMakeLists.txt
* enlarge the max_batch_size in anakin.config
* update with comments
7 years ago
Yan Chunwei
9ee698e605
enhance/ditu rnn with fc fuse ( #12831 )
...
* make fc fuse work with ditu rnn
* add ditu rnn data download to CMAKE
7 years ago
Xin Pan
78415f326d
Merge pull request #12838 from panyx0718/infer
...
speed up while_op
7 years ago
Xin Pan
a2c0e52f3e
speed up while_op
7 years ago
Zhaolong Xing
21ba32b065
Merge pull request #12843 from NHZlX/fix_ssa_bug_for_trt
...
fix ssa bug with batch_norm and refine the trt
7 years ago
Michał Gallus
cd32ddac12
Fuse Convolution and Eltwise Add into MKLDNN's Conv+Bias ( #12669 )
...
* Fuse Convolution and Eltwise Add into Conv+Bias
* Reduce bias branching at conv_mkldnn_op
* Add MKLDNN build checks for Conv Bias
* Conv-bias: check if bias input exists before assignment
* Conv-bias: Remove Bias dim check from infershape
It was causing conv3d test to crash upon calling HasInput(Bias)
7 years ago
nhzlx
c999895e93
merge develop
7 years ago
nhzlx
276950291a
1. fix ssa bug with batchnorm, 2. refine the trt
7 years ago
Yan Chunwei
896a37b6e3
fea/link ir to inference analysis and fc fuse support ( #12789 )
...
* link IR graph to analysis graph
* add clean code and update
* add infer_clean_pass
* add ir_pass_manager
* support fc fuse execution
* fix ir circle
7 years ago
dzhwinter
e23ddf6ae4
status ( #12764 )
7 years ago
Tao Luo
d04ef276a5
Merge pull request #12745 from tensor-tang/refine/op/elewise_mul
...
Refine elementwise mul cpu forward
7 years ago
tangwei12
cbc6e6eb97
Merge pull request #12247 from seiriosPlus/dis_ckpt_fix
...
add load slice_vars in io.py
7 years ago
Qiyang Min
72965226e6
Merge pull request #12818 from velconia/fix_python3_CI_job
...
Fix python3 CI job
7 years ago
tangwei12
44bade8b17
fix api spec
7 years ago
Zhaolong Xing
470335e8c4
Merge pull request #12786 from NHZlX/add_batch_norm_trt_converter
...
Add batch norm trt converter
7 years ago
Qingsheng Li
3d11d018e0
Fix scatter_op python API ( #12742 )
...
* Fix scatter_op python API and remove inconsistency between implementation and doc
* API spec change
* Change as review comment
7 years ago
nhzlx
ff052c0e6f
merge develop
7 years ago
nhzlx
c6a5c4b0c0
add comments for execute in ut_helper
7 years ago
minqiyang
beb93bb901
Fix ut bug for graph_test
...
Port dist_transpiler new added codes
Port ut for clone desc
7 years ago
Tao Luo
8f9f414a14
Merge pull request #12805 from tensor-tang/fix/op/elewise_add
...
fix SEGV element wise add at debug mode
7 years ago
tensor-tang
e955361267
Merge pull request #12737 from tensor-tang/feature/op/fusion_lstm
...
add fusion lstm
7 years ago
tensor-tang
82bb9170fb
Merge remote-tracking branch 'ups/develop' into fix/op/elewise_add
7 years ago
tangwei12
99f74be561
Merge pull request #12802 from seiriosPlus/inference_teeny_mistakes
...
fix some teeny mistakes
7 years ago
Tao Luo
2ae885e224
Merge pull request #12811 from luotao1/tensorrt_compiler_bug
...
fix tensorrt compiler bug
7 years ago
Chen Weihang
57b34d9196
Merge pull request #12808 from chenwhql/remove_inplace_param_in_squeeze_and_unsqueeze
...
Refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
Xin Pan
daf464af68
Merge pull request #12807 from panyx0718/fix
...
fix program_desc constructor
7 years ago
luotao1
808e5b1748
fix tensorrt compiler bug
7 years ago
Yihua Xu
084d4a9e9e
Optimize CRF Decoding with AVX/AVX2/AVX512F instruction ( #12767 )
...
* Optimize CRF decoding with AVX/AVX2 instruction
* Enable the AVX2 flags for compiling
* Clean the code and decrease the count of multiply calculation
* Add the support of AVX512 instruction to optimize CRF Decoding
* Clean the code
* Enable the AVX512f flags for compiling
* Clean the code for the invaluable switch
* Fixed the issue to check AVX512F status
* Clean the code
* Add some explanation of the key points
7 years ago
dzhwinter
00463fdfe3
cudnn windows support ( #12757 )
...
* cudnn windows
* "add comment"
* "windows support"
* "fix cmake error"
7 years ago
Xin Pan
4a4c469f61
add test
7 years ago
qingqing01
c62f68cb94
Fix bug in conditional_block_op. ( #12246 )
...
* Fix bug in conditional_block_op.
* Fix bug and add comments.
* Rename arguments.
7 years ago
nhzlx
1bf9d9e90c
fix comments
7 years ago
chenweihang
bc471b6ac4
refactor: remove inplace parameter from squeeze and unsqueeze op
7 years ago
Xin Pan
7473d5f735
fix program_desc constructor
7 years ago
tensor-tang
0507f7bc3c
fix SEGV elementwise add at debug mode
7 years ago
tangwei12
cfb12f09bf
fix some teeny mistakes
7 years ago
Yu Yang
c6af7201e9
Merge pull request #12692 from reyoung/feature/fast_executor
...
Feature/fast executor
7 years ago
Xin Pan
e525aa232e
Merge pull request #12780 from panyx0718/ir4
...
fix ProgramToGraph
7 years ago
Tao Luo
7decbaaa13
Merge pull request #12762 from luotao1/anakin_cuda_env
...
disable anakin when cuda < 8.0 or cudnn < 7.0
7 years ago
nhzlx
324dd16816
merge develop
7 years ago
yuyang18
b8029fd650
Follow comments
7 years ago
tangwei12
ca1e18c04a
Merge pull request #12469 from seiriosPlus/sum_op_dim_fix
...
sum_op selectedRows dim bug fix
7 years ago
Xin Pan
1d3343240e
fix
7 years ago
nhzlx
144b20c160
add batch norm op converter
7 years ago
nhzlx
14311bb094
merge develop
7 years ago
Zhaolong Xing
e5674f6dde
Merge pull request #12753 from NHZlX/add_benchmark
...
modify tensorrt engine op from cpu mode to gpu
7 years ago
Zhaolong Xing
310708726b
Merge pull request #12761 from NHZlX/global_pooling_trt
...
Add support for global pooling for trt
7 years ago
tensor-tang
b090479409
Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
nhzlx
1e92baf746
fix comments
7 years ago