* Add mean_iou op.
* Add unit test for mean_iou op.
* Add optional collections of confusion matrix and mean_iou.
* Fix CUDA kernel.
* Refine code:
  1. Merge the GPU computation into two kernels.
  2. Use a wrong array and a correct array instead of a confusion matrix.
* Add Python API and fix CUDA kernel.
* Fix comments.
* Small fix.
* Small fix.
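
For reviewers, the switch from a confusion matrix to a wrong array and a correct array can be illustrated with a minimal pure-Python sketch. It assumes that "wrong" and "correct" are per-class counters, and that a mismatched prediction counts against both the predicted class and the labeled class. The function name `mean_iou_from_counts` and its signature are hypothetical and are not the op's actual API.

```python
def mean_iou_from_counts(predictions, labels, num_classes):
    """Hypothetical reference helper, not the op's real interface.

    Returns (mean_iou, wrong, correct), where wrong[c] and correct[c]
    are per-class counters (assumed semantics, see lead-in).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    wrong = [0] * num_classes    # false positives + false negatives per class
    correct = [0] * num_classes  # true positives per class

    for pred, label in zip(predictions, labels):
        if pred == label:
            correct[pred] += 1
        else:
            # A mismatch is a false positive for the predicted class
            # and a false negative for the labeled class.
            wrong[pred] += 1
            wrong[label] += 1

    # IoU for class c is correct[c] / (correct[c] + wrong[c]);
    # classes that never appear are skipped in the mean.
    ious = []
    for c in range(num_classes):
        denom = correct[c] + wrong[c]
        if denom > 0:
            ious.append(correct[c] / denom)

    mean_iou = sum(ious) / len(ious) if ious else 0.0
    return mean_iou, wrong, correct


# Example: one mismatch (predicted 1, labeled 2) among four elements.
mean_iou, wrong, correct = mean_iou_from_counts([0, 1, 1, 2], [0, 1, 2, 2], 3)
```

Under these assumptions, two arrays of length num_classes hold everything the metric needs, whereas a confusion matrix needs num_classes x num_classes entries.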