* support no need buffer vars in dygraph, test=develop
* fix inference compilation error, test=develop
* update no_need_buffer_vars_inference, test=develop
* add unittests for no_need_buffer_vars_context, test=develop
* refine no_need_buffer_vars by returning a reference, test=develop
* polish some code, test=develop
* replace part of the old implementation, test=develop
* restore concat op, test=develop
* update all op implementations & delete GetDataTypeOfVar func, test=develop
* speed up gc and make softmax_with_cross_entropy_grad in-place
test=develop
* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop
* address review comments
test=develop
* Rewrite the gradient ProtoMaker for affine_channel_op so that it no longer takes Output as an input.
* Add act to the Python API so that the activation can be applied in-place by layer_help.py