Do not use ctor
* Reduce lines of code.
* We can use a virtual function for Maker now.
* The implementation does not care what the maker holds, so it is easier to
refactor later.
* init
* add some checks
* add dist transpile logic
* add insert op for block
* init change get_pserver_program
* optimize code
* fix a bug
* can run now
* start to do table split
* start to process table gradient
* complete pserver part
* can send_vars now
* revert cpplint
* fix a bug
* optimize code
* move dist test to models
* revert the interface of distribute_transpiler.transpile
* fix prefetch_block
* optimize transpiler code
* add comment to sum_op
* add warning log
* fix comment
* fix test_send_recv
* fix test_send_recv
* fix train with no distributed table
* optimize GetDims