The main changes are as follows:
- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
* Simplify gradient check
* Stash
* Extract `apply_backward_pass` to `backward.py`
Rename `apply_backward_pass` to `append_backward_ops`
* Use graph API to check gradient
* Fix CI
* Fix CI
* Fix backward for double precision
* Stash
* Fix CI
* Fix CI
* Ignore GRU test
* Ignore the xe (cross-entropy) op
* Fix CI
* Fix the gradient of softmax with xe (cross-entropy)
The correct equation is IG = OG * d_softmax_with_xe(), where IG is the input gradient and OG is the output gradient
* Fix typo
* Fix merge error
* Disable LRN
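
For reference, the rule IG = OG * d_softmax_with_xe() from the softmax-with-xe fix above can be sketched in plain Python for one sample with a hard label. This is only an illustration, not the operator code touched by this change. The function names (`softmax_with_xe_grad`, `numeric_grad`) and the sample values are hypothetical, and the local derivative is assumed to be the standard `softmax(logits) - one_hot(label)`. The numeric check mirrors the idea of a gradient check by comparing against central differences.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_xe(logits, label):
    # Loss for one sample with a hard label: -log(softmax(logits)[label]).
    return -math.log(softmax(logits)[label])


def softmax_with_xe_grad(logits, label, out_grad):
    # IG = OG * d_softmax_with_xe, with d_softmax_with_xe = softmax - one_hot(label).
    probs = softmax(logits)
    return [out_grad * (p - (1.0 if i == label else 0.0))
            for i, p in enumerate(probs)]


def numeric_grad(logits, label, out_grad, eps=1e-6):
    # Central differences on the loss, scaled by the same upstream gradient.
    grads = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = softmax_with_xe(plus, label) - softmax_with_xe(minus, label)
        grads.append(out_grad * diff / (2 * eps))
    return grads


# Hypothetical sample values, chosen only for illustration.
logits = [0.5, -1.2, 2.0]
label = 2
out_grad = 0.25  # upstream gradient (OG); deliberately not 1.0

analytic = softmax_with_xe_grad(logits, label, out_grad)
numeric = numeric_grad(logits, label, out_grad)
```

Using an upstream gradient other than 1.0 matters here: a backward pass that drops the OG factor would still pass a check that only ever feeds OG = 1.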