* Add benchmarks for PaddlePaddle, TensorFlow, and Caffe
* ConvProjection to reduce memory for GoogLeNet
* Add unit test for ConvProjection.
1. Unit test in test_LayerGrad.
2. Compare ConvProjection with CudnnConvLayer, and compare concat_layer+img_conv_layer with concat_layer_conv_projection.
* Reduce cudnn_conv memory and add benchmark document.
1. Use TmpMatrix as the workspace in cudnn_conv, which greatly reduces GPU memory usage.
2. Add benchmark document.
3. Fix smallnet_mnist_cifar.py in paddle.
* Add job=time and refine cudnn_conv to reduce GPU memory usage and improve speed
* Refine cudnn_conv and shared biases operation in concat_layer and mixed_layer.
* Address review comments
* Address review comments
* Use unique_ptr to prevent memory leaks in CudnnConvLayer.