Improve elementwise performance. (#23001)

* Improve elementwise performance.

Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA; add new kernels for different data patterns.

* Add CUDA kernels to speed up common broadcast cases. test=develop

* Add more test cases and fix a CUDA kernel bug. test=develop

* Remove tests because CPU precision fails. test=develop

* Refine SplitDims, test=develop

* Change file mode, test=develop
revert-23830-2.0-beta
zhaoyuchen2018 5 years ago committed by GitHub
parent c524b930e7
commit 58615a6272
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23