* Optimize GRU with AVX instructions
* Clean code
* Add the unit test and fix the alignment issue
* Remove the leftover part of the unit test
* Code clean
* Fix the parameters length issue for fusion_gru to pass CI
* Change the default type to float32
* add lod_tensor util and modify pybind
* refine pybind LoDTensor API and modify LoDTensor and DataFeeder test
* fix test error
* fix detection map op test
* fix reorder_lod_tensor test
* fix seq_concat_op
* fix chunk eval op test
* fix target assign op
* fix warp ctc op
* address comments step 1: revert reset_lod op
* step 2: modify op test
* add warning message
* remove has_valid_lod
* add back has_valid_lod
* address comments
* add exception catching trial