* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove GPU support for remote prefetch
* force word2vec to run on CPU, test=develop
* force dist remote lookup table test to run on CPU, test=develop
* add dist ut for text_classification
* add simnet bow unittest
* add dist ut for simnet bow
* add training data URL for simnet bow
* modify simnet test_reader to train reader
* add test_dist_ctr
* test_dist_ctr can run now
* dense update is good
* add unit test for selected rows
* debug unit test
* fix dist sparse update problem
* Constant args at init
* optimize code
* simnet optimize
* fix DebugStringEx
* optimize sum_op.h
* add ScaleOpVarTypeInference
* clean code
* fix test_dist_transpiler.py
* code optimize
* modify delta
* fix sparse update bug
* dist test use one cpu
* update some data
* remove unused code
* add use cuda config
* unit test fix
* dist_word2vec use CPU
* unit test fix
* code clean
* merge develop
* api spec update
* Revert: api spec update
* replace simnet data with fake
* update dim
* add batch auc
* code clean
* modify print to stderr
* update simnet delta -> 1e-5
* update RUN_STEP
* add use_reader_alloc
* modify delta
* add use_reader_alloc
* fix stderr write
* python3 compatibility, test=develop
* Update dist_text_classification.py, test=develop
* wip
* clean up
* should fix running with memopt
* add ut
* mark lr schedule op role
* hide lr_schedule_guard
* use op_role_var instead of ufind
* unify dist test name
* wip for py3 support
* fix var deref
* fix python3 mem_opt order
* remove comments