commit 63de3ec2ba

@@ -0,0 +1 @@
.gitignore
@@ -0,0 +1,20 @@
- repo: https://github.com/Lucas-C/pre-commit-hooks.git
  sha: c25201a00e6b0514370501050cf2a8538ac12270
  hooks:
    - id: remove-crlf
- repo: https://github.com/reyoung/mirrors-yapf.git
  sha: v0.13.2
  hooks:
    - id: yapf
- repo: https://github.com/pre-commit/pre-commit-hooks
  sha: 7539d8bd1a00a3c1bfd34cdb606d3a6372e83469
  hooks:
    - id: check-added-large-files
    - id: check-merge-conflict
    - id: check-symlinks
    - id: detect-private-key
    - id: end-of-file-fixer
- repo: https://github.com/PaddlePaddle/clang-format-pre-commit-hook.git
  sha: 28c0ea8a67a3e2dbbf4822ef44e85b63a0080a29
  hooks:
    - id: clang-formater
@@ -0,0 +1,3 @@
[style]
based_on_style = pep8
column_limit = 80
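A minimal sketch of applying this style programmatically through yapf's Python API; the exact return type of `FormatCode` varies across yapf versions, so treat this as an assumption rather than the project's tooling.

```python
# Reformat a snippet against the [style] section above (pep8-based,
# 80-column limit). Assumes yapf is installed and the file is .style.yapf.
from yapf.yapflib.yapf_api import FormatCode

src = "x = {  'a':37,'b':42}\n"
result = FormatCode(src, style_config='.style.yapf')
print(result)
```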
@@ -0,0 +1,69 @@
# Release v0.9.0

## New Features:

* New Layers (see the sketch after this list)
  * bilinear interpolation layer.
  * spatial pyramid-pool layer.
  * de-convolution layer.
  * maxout layer.
* Support rectangular padding, stride, window and input for the pooling operation.
* Add `--job=time` in trainer, which can be used to print time info without the compile option `-DWITH_TIMER=ON`.
* Expose `cost_weight`/`nce_layer` in `trainer_config_helpers`.
* Add FAQ, concepts, and h-rnn docs.
* Add Bidi-LSTM and DB-LSTM to the quick start demo. @alvations
* Add usage tracking scripts.
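A hypothetical use of two of the new layers through `trainer_config_helpers`; the helper names and argument lists are assumptions based on the v0.9.0 API, and this snippet is illustrative, not part of the release.

```python
# Illustrative sketch of the bilinear interpolation and maxout layers.
from paddle.trainer_config_helpers import *

img = data_layer(name='image', size=3 * 32 * 32)
conv = img_conv_layer(
    input=img, filter_size=3, num_channels=3, num_filters=64,
    stride=1, padding=1)
# upsample the 32x32 feature maps to 64x64
up = bilinear_interp_layer(input=conv, out_size_x=64, out_size_y=64)
# take the max over groups of 2 feature maps (64 -> 32 channels)
mo = maxout_layer(input=conv, num_channels=64, groups=2)
outputs(up, mo)
```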
## Improvements

* Add Travis-CI for Mac OS X. Enable swig unit tests in Travis-CI. Skip Travis-CI when only docs are changed.
* Add code coverage tools.
* Refine the convolution layer to speed it up and reduce GPU memory usage.
* Speed up PyDataProvider2.
* Add Ubuntu deb package build scripts.
* Make Paddle use the git-flow branching model.
* PServer now supports running with no parameter blocks.

## Bug Fixes

* Add zlib link to py_paddle.
* Add input sparse-data checks for sparse layers at runtime.
* Fix a bug in sparse matrix multiplication.
* Fix a floating-point overflow problem in tanh.
* Fix some nvcc compile options.
* Fix a bug in yielding dictionaries in DataProvider.
* Fix SRL hang on exit.

# Release v0.8.0beta.1

New features:

* Mac OS X is supported via source code. #138
  * Both GPU and CPU versions of PaddlePaddle are supported.

* Support CUDA 8.0.

* Enhance `PyDataProvider2` (see the sketch after this list)
  * Add dictionary yield format. `PyDataProvider2` can yield a dictionary whose keys are data_layer names and whose values are the corresponding features.
  * Add `min_pool_size` to control the memory pool in the provider.
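A hypothetical provider illustrating the dictionary yield format and `min_pool_size`; `read_samples` is an assumed helper, not a Paddle API.

```python
# Sketch of the dictionary yield format: keys are data_layer names,
# values are the corresponding features; min_pool_size bounds the pool.
from paddle.trainer.PyDataProvider2 import *

@provider(
    input_types={'data': dense_vector(784), 'label': integer_value(10)},
    min_pool_size=1024)
def process(settings, filename):
    for img, lab in read_samples(filename):  # read_samples is assumed
        yield {'data': img, 'label': lab}
```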
* Add `deb` install package & docker image for no_avx machines.
  * Especially for cloud computing and virtual machines.

* Automatically disable `avx` instructions in cmake when the machine's CPU doesn't support them.

* Add Parallel NN API in trainer_config_helpers.

* Add Travis CI for GitHub.

Bug fixes:

* Fix several bugs in trainer_config_helpers and complete its unit tests.
* Check whether PaddlePaddle is installed when running unit tests.
* Fix bugs on GTX-series GPUs.
* Fix a bug in MultinomialSampler.

More documentation has also been written since the last release.

# Release v0.8.0beta.0

PaddlePaddle v0.8.0beta.0 release. The install package is not stable yet; this is a pre-release version.
@@ -0,0 +1,9 @@
paddle/image/logs
paddle/image/*.pyc
paddle/image/train.list
paddle/rnn/logs
paddle/rnn/*.pyc
paddle/rnn/imdb.pkl
caffe/image/logs
tensorflow/image/logs
tensorflow/rnn/logs
@@ -0,0 +1,168 @@
# Benchmark

Machine:

- CPU: 12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz
- GPU: Tesla K40m
- cuDNN: v5.1
- system: Docker 1.12.1. All platforms are tested in the Docker environment.

Platforms:

- PaddlePaddle: paddledev/paddle:gpu-devel-v0.9.0a0
- TensorFlow: gcr.io/tensorflow/tensorflow:0.11.0rc0-gpu
- Caffe: kaixhin/cuda-caffe

Several convolutional neural networks and recurrent neural networks are used for the tests.

## Image

### Benchmark Model

AlexNet, GoogleNet and a small network used in Caffe.

- [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet): but the group size is one.

- [GoogleNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet): but loss1 and loss2 are removed when benchmarking.

- [SmallNet](https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_quick_train_test.prototxt)


### Single-GPU

- AlexNet: input - 3 * 227 * 227, Time: ms/batch

| BatchSize    | 64  | 128 | 256  | 512  |
|--------------|-----|-----|------|------|
| PaddlePaddle | 195 | 334 | 602  | 1629 |
| TensorFlow   | 223 | 364 | 645  | 1235 |
| Caffe        | 324 | 627 | 1232 | 2513 |

**Notation**

All platforms use cuDNN v5.1. Caffe is slower in this experiment because the workspace limit of its cuDNN convolution interface is 8 * 1024 * 1024 bytes, which is smaller than the limits used by PaddlePaddle and TensorFlow. Note that Caffe will be faster if the workspace limit is increased.

- GoogleNet: input - 3 * 224 * 224, Time: ms/batch

| BatchSize    | 64  | 128  | 256           |
|--------------|-----|------|---------------|
| PaddlePaddle | 613 | 1149 | 2348          |
| TensorFlow   | 644 | 1176 | 2219          |
| Caffe        | 694 | 1364 | out of memory |

- SmallNet: input - 3 * 32 * 32, Time: ms/batch

| BatchSize    | 64     | 128     | 256     | 512    |
|--------------|--------|---------|---------|--------|
| PaddlePaddle | 10.463 | 18.184  | 33.113  | 63.039 |
| TensorFlow   | 9      | 15      | 28      | 59     |
| Caffe        | 9.373  | 16.6606 | 31.4797 | 59.719 |

**Notation**

All the single-GPU experiments in Caffe use `caffe time` to calculate elapsed time, which does not include the parameter-updating time, whereas the PaddlePaddle and TensorFlow experiments do include it. Compared with the total time, this part is relatively small on a single machine, so we can ignore it.

In TensorFlow, an algorithm-searching method is implemented manually instead of using the algorithm-searching interface in cuDNN.

### Multi-GPU: 4 GPUs

- AlexNet, ms / batch

| total-BatchSize | 128 * 4 | 256 * 4 |
|-----------------|---------|---------|
| PaddlePaddle    | 347     | 622     |
| TensorFlow      | 377     | 675     |
| Caffe           | 1229    | 2435    |

For example, if `total-BatchSize = 128 * 4`, the speedup ratio is calculated by

```
time_at_1gpu_batch_128 * 4 / time_at_4gpu_total_batch_512
= (334 * 4) / 347
= 3.85
```
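The same calculation as a small Python helper, using the AlexNet numbers from the tables above:

```python
# Speedup of a 4-GPU run over one GPU processing the same total batch;
# the single-GPU time is measured at the per-GPU batch size (334 ms at
# batch 128 for AlexNet above).
def speedup(time_1gpu_per_gpu_batch, time_4gpu, n_gpus=4):
    return time_1gpu_per_gpu_batch * n_gpus / float(time_4gpu)

print(speedup(334, 347))  # total batch 128 * 4 -> ~3.85
```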
<img src="figs/alexnet-4gpu.png" width="420">

- GoogleNet, ms / batch

| total-BatchSize | 128 * 4 | 256 * 4       |
|-----------------|---------|---------------|
| PaddlePaddle    | 1178    | 2367          |
| TensorFlow      | 1210    | 2292          |
| Caffe           | 2007    | out of memory |

<img src="figs/googlenet-4gpu.png" width="420">


## RNN

We use an LSTM network for text classification in the benchmark.

### Dataset

- [IMDB](http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl)
- Sequence length is 100. PaddlePaddle supports training with variable-length sequences, but TensorFlow needs padding; thus, we also pad the sequence length to 100 in PaddlePaddle to make the comparison fair (see the padding sketch after this list).
- Dictionary size = 30000.
- Peephole connections are used in `lstmemory` by default in PaddlePaddle; they are also enabled in the TensorFlow configuration.
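A hypothetical sketch of that padding step; the exact preprocessing code is not part of this commit, and `pad_id = 0` is an assumption.

```python
# Pad or truncate every IMDB review to a fixed length of 100 token ids,
# as required for the TensorFlow runs.
import numpy as np

def pad_sequence(ids, max_len=100, pad_id=0):
    ids = list(ids)[:max_len]  # truncate long reviews
    return np.pad(
        np.asarray(ids, dtype='int32'), (0, max_len - len(ids)),
        mode='constant', constant_values=pad_id)
```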
### Single-GPU

#### LSTM in Text Classification

We test a `2 lstm layer + fc` network with different hidden sizes and batch sizes; a minimal sketch of this configuration follows.
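A hypothetical PaddlePaddle config for such a network; `simple_lstm`, the embedding size, and the two-class output are illustrative assumptions, not the benchmark's actual settings.

```python
# Hypothetical "2 lstm layer + fc" text classifier sketch; hyperparameters
# follow the dataset section above (dictionary size 30000, hidden 256).
from paddle.trainer_config_helpers import *

dict_size = 30000
hidden_size = 256  # the tables below also cover 512 and 1280

word = data_layer(name='word', size=dict_size)
lbl = data_layer(name='label', size=2)

emb = embedding_layer(input=word, size=hidden_size)
lstm1 = simple_lstm(input=emb, size=hidden_size)
lstm2 = simple_lstm(input=lstm1, size=hidden_size)
last = last_seq(input=lstm2)
prob = fc_layer(input=last, size=2, act=SoftmaxActivation())
outputs(classification_cost(input=prob, label=lbl))
```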
- Batch size = 64, ms / batch

| hidden_size  | 256 | 512 | 1280 |
|--------------|-----|-----|------|
| PaddlePaddle | 83  | 184 | 641  |
| TensorFlow   | 175 | 280 | 818  |

- Batch size = 128, ms / batch

| hidden_size  | 256 | 512 | 1280 |
|--------------|-----|-----|------|
| PaddlePaddle | 110 | 261 | 1007 |
| TensorFlow   | 181 | 361 | 1237 |

- Batch size = 256, ms / batch

| hidden_size  | 256 | 512 | 1280 |
|--------------|-----|-----|------|
| PaddlePaddle | 170 | 414 | 1655 |
| TensorFlow   | 238 | 536 | 1905 |

<img src="figs/rnn_lstm_cls.png" width="600">

#### Seq2Seq

The benchmark of the sequence-to-sequence network will be added later.


### Multi-GPU: 4 GPUs

#### LSTM in Text Classification

- hidden_size = 256, ms / batch

| batch_size   | 256 | 512 |
|--------------|-----|-----|
| PaddlePaddle | 90  | 118 |
| TensorFlow   | 226 | 118 |

- hidden_size = 512, ms / batch

| batch_size   | 256 | 512 |
|--------------|-----|-----|
| PaddlePaddle | 189 | 268 |
| TensorFlow   | 297 | 383 |

<img src="figs/rnn_lstm_4gpus.png" width="420">

#### Seq2Seq

The benchmark of the sequence-to-sequence network will be added later.
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,30 @@
set -e

function test() {
  cfg=$1
  batch=$2
  prefix=$3
  sed -i "/input: \"data\"/{n;s/^input_dim.*/input_dim: $batch/g}" $cfg
  sed -i "/input: \"label\"/{n;s/^input_dim.*/input_dim: $batch/g}" $cfg
  caffe time --model=$cfg --iterations=50 --gpu 0 > logs/$prefix-1gpu-batch${batch}.log 2>&1
}

if [ ! -d "logs" ]; then
  mkdir logs
fi

# alexnet
test alexnet.prototxt 64 alexnet
test alexnet.prototxt 128 alexnet
test alexnet.prototxt 256 alexnet
test alexnet.prototxt 512 alexnet

# googlenet
test googlenet.prototxt 64 googlenet
test googlenet.prototxt 128 googlenet

# small net
test smallnet_mnist_cifar.prototxt 64 smallnet
test smallnet_mnist_cifar.prototxt 128 smallnet
test smallnet_mnist_cifar.prototxt 256 smallnet
test smallnet_mnist_cifar.prototxt 512 smallnet
@@ -0,0 +1,24 @@
#!/bin/bash
set -e

function test() {
  cfg=$1
  batch=$2
  prefix=$3
  batch_per_gpu=`expr ${batch} / 4`
  sed -i "/input: \"data\"/{n;s/^input_dim.*/input_dim: ${batch_per_gpu}/g}" $cfg
  sed -i "/input: \"label\"/{n;s/^input_dim.*/input_dim: ${batch_per_gpu}/g}" $cfg
  sed -i "1c\net : \"${cfg}\"" solver.prototxt
  caffe train --solver=solver.prototxt -gpu 0,1,2,3 > logs/${prefix}-4gpu-batch${batch}.log 2>&1
}

if [ ! -d "logs" ]; then
  mkdir logs
fi

# alexnet
test alexnet.prototxt 512 alexnet
test alexnet.prototxt 1024 alexnet

# googlenet
test googlenet.prototxt 512 googlenet
@@ -0,0 +1,198 @@
name: "mnist/cifar"
input: "data"
input_dim: 128
input_dim: 3
input_dim: 32
input_dim: 32
input: "label"
input_dim: 128
input_dim: 1
input_dim: 1
input_dim: 1
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
@@ -0,0 +1,10 @@
net: "alexnet.prototxt"
base_lr: 0.01
lr_policy: "fixed"
display: 20
max_iter: 200
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/caffe_alexnet_train"
solver_mode: GPU
(Four binary image files added: 82 KiB, 82 KiB, 72 KiB, 115 KiB.)
@@ -0,0 +1,64 @@
#!/usr/bin/env python

from paddle.trainer_config_helpers import *

height = 227
width = 227
num_class = 1000
batch_size = get_config_arg('batch_size', int, 128)

args = {'height': height, 'width': width, 'color': True, 'num_class': num_class}
define_py_data_sources2(
    "train.list", None, module="provider", obj="process", args=args)

settings(
    batch_size=batch_size,
    learning_rate=0.01 / batch_size,
    learning_method=MomentumOptimizer(0.9),
    regularization=L2Regularization(0.0005 * batch_size))

# conv1
net = data_layer('data', size=height * width * 3)
net = img_conv_layer(
    input=net,
    filter_size=11,
    num_channels=3,
    num_filters=96,
    stride=4,
    padding=1)
net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75)
net = img_pool_layer(input=net, pool_size=3, stride=2)

# conv2
net = img_conv_layer(
    input=net, filter_size=5, num_filters=256, stride=1, padding=2, groups=1)
net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75)
net = img_pool_layer(input=net, pool_size=3, stride=2)

# conv3
net = img_conv_layer(
    input=net, filter_size=3, num_filters=384, stride=1, padding=1)
# conv4
net = img_conv_layer(
    input=net, filter_size=3, num_filters=384, stride=1, padding=1, groups=1)

# conv5
net = img_conv_layer(
    input=net, filter_size=3, num_filters=256, stride=1, padding=1, groups=1)
net = img_pool_layer(input=net, pool_size=3, stride=2)

net = fc_layer(
    input=net,
    size=4096,
    act=ReluActivation(),
    layer_attr=ExtraAttr(drop_rate=0.5))
net = fc_layer(
    input=net,
    size=4096,
    act=ReluActivation(),
    layer_attr=ExtraAttr(drop_rate=0.5))
net = fc_layer(input=net, size=1000, act=SoftmaxActivation())

lab = data_layer('label', num_class)
loss = cross_entropy(input=net, label=lab)
outputs(loss)
@@ -0,0 +1,226 @@
#!/usr/bin/env python
from paddle.trainer_config_helpers import *

height = 224
width = 224
num_class = 1000
batch_size = get_config_arg('batch_size', int, 128)

args = {'height': height, 'width': width, 'color': True, 'num_class': num_class}
define_py_data_sources2(
    "train.list", None, module="provider", obj="process", args=args)

settings(
    batch_size=batch_size,
    learning_rate=0.01 / batch_size,
    learning_method=MomentumOptimizer(0.9),
    regularization=L2Regularization(0.0005 * batch_size))


def inception2(name, input, channels, \
        filter1,
        filter3R, filter3,
        filter5R, filter5,
        proj):

    conv1 = name + '_1'
    conv3r = name + '_3r'
    conv3 = name + '_3'
    conv5r = name + '_5r'
    conv5 = name + '_5'
    maxpool = name + '_max'
    convproj = name + '_proj'

    cov1 = img_conv_layer(
        name=conv1,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter1,
        stride=1,
        padding=0)

    cov3r = img_conv_layer(
        name=conv3r,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter3R,
        stride=1,
        padding=0)
    cov3 = img_conv_layer(
        name=conv3,
        input=cov3r,
        filter_size=3,
        num_filters=filter3,
        stride=1,
        padding=1)

    cov5r = img_conv_layer(
        name=conv5r,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter5R,
        stride=1,
        padding=0)
    cov5 = img_conv_layer(
        name=conv5,
        input=cov5r,
        filter_size=5,
        num_filters=filter5,
        stride=1,
        padding=2)

    pool1 = img_pool_layer(
        name=maxpool,
        input=input,
        pool_size=3,
        num_channels=channels,
        stride=1,
        padding=1)
    covprj = img_conv_layer(
        name=convproj,
        input=pool1,
        filter_size=1,
        num_filters=proj,
        stride=1,
        padding=0)

    cat = concat_layer(name=name, input=[cov1, cov3, cov5, covprj])
    return cat


def inception(name, input, channels, \
        filter1,
        filter3R, filter3,
        filter5R, filter5,
        proj):

    cov1 = conv_projection(
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter1,
        stride=1,
        padding=0)

    cov3r = img_conv_layer(
        name=name + '_3r',
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter3R,
        stride=1,
        padding=0)
    cov3 = conv_projection(
        input=cov3r, filter_size=3, num_filters=filter3, stride=1, padding=1)

    cov5r = img_conv_layer(
        name=name + '_5r',
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter5R,
        stride=1,
        padding=0)
    cov5 = conv_projection(
        input=cov5r, filter_size=5, num_filters=filter5, stride=1, padding=2)

    pool1 = img_pool_layer(
        name=name + '_max',
        input=input,
        pool_size=3,
        num_channels=channels,
        stride=1,
        padding=1)
    covprj = conv_projection(
        input=pool1, filter_size=1, num_filters=proj, stride=1, padding=0)

    cat = concat_layer(
        name=name,
        input=[cov1, cov3, cov5, covprj],
        bias_attr=True,
        act=ReluActivation())
    return cat


lab = data_layer(name="label", size=1000)
data = data_layer(name="input", size=3 * height * width)

# stage 1
conv1 = img_conv_layer(
    name="conv1",
    input=data,
    filter_size=7,
    num_channels=3,
    num_filters=64,
    stride=2,
    padding=3)
pool1 = img_pool_layer(
    name="pool1", input=conv1, pool_size=3, num_channels=64, stride=2)

# stage 2
conv2_1 = img_conv_layer(
    name="conv2_1",
    input=pool1,
    filter_size=1,
    num_filters=64,
    stride=1,
    padding=0)
conv2_2 = img_conv_layer(
    name="conv2_2",
    input=conv2_1,
    filter_size=3,
    num_filters=192,
    stride=1,
    padding=1)
pool2 = img_pool_layer(
    name="pool2", input=conv2_2, pool_size=3, num_channels=192, stride=2)

# stage 3
ince3a = inception("ince3a", pool2, 192, 64, 96, 128, 16, 32, 32)
ince3b = inception("ince3b", ince3a, 256, 128, 128, 192, 32, 96, 64)
pool3 = img_pool_layer(
    name="pool3", input=ince3b, num_channels=480, pool_size=3, stride=2)

# stage 4
ince4a = inception("ince4a", pool3, 480, 192, 96, 208, 16, 48, 64)
ince4b = inception("ince4b", ince4a, 512, 160, 112, 224, 24, 64, 64)
ince4c = inception("ince4c", ince4b, 512, 128, 128, 256, 24, 64, 64)
ince4d = inception("ince4d", ince4c, 512, 112, 144, 288, 32, 64, 64)
ince4e = inception("ince4e", ince4d, 528, 256, 160, 320, 32, 128, 128)
pool4 = img_pool_layer(
    name="pool4", input=ince4e, num_channels=832, pool_size=3, stride=2)

# stage 5
ince5a = inception("ince5a", pool4, 832, 256, 160, 320, 32, 128, 128)
ince5b = inception("ince5b", ince5a, 832, 384, 192, 384, 48, 128, 128)
pool5 = img_pool_layer(
    name="pool5",
    input=ince5b,
    num_channels=1024,
    pool_size=7,
    stride=7,
    pool_type=AvgPooling())

# We remove loss1 and loss2 for all systems when testing the benchmark.
# output 1
# pool_o1 = img_pool_layer(name="pool_o1", input=ince4a, num_channels=512, pool_size=5, stride=3, pool_type=AvgPooling())
# conv_o1 = img_conv_layer(name="conv_o1", input=pool_o1, filter_size=1, num_filters=128, stride=1, padding=0)
# fc_o1 = fc_layer(name="fc_o1", input=conv_o1, size=1024, layer_attr=ExtraAttr(drop_rate=0.7), act=ReluActivation())
# out1 = fc_layer(name="output1", input=fc_o1, size=1000, act=SoftmaxActivation())
# loss1 = cross_entropy(name='loss1', input=out1, label=lab, coeff=0.3)

# output 2
# pool_o2 = img_pool_layer(name="pool_o2", input=ince4d, num_channels=528, pool_size=5, stride=3, pool_type=AvgPooling())
# conv_o2 = img_conv_layer(name="conv_o2", input=pool_o2, filter_size=1, num_filters=128, stride=1, padding=0)
# fc_o2 = fc_layer(name="fc_o2", input=conv_o2, size=1024, layer_attr=ExtraAttr(drop_rate=0.7), act=ReluActivation())
# out2 = fc_layer(name="output2", input=fc_o2, size=1000, act=SoftmaxActivation())
# loss2 = cross_entropy(name='loss2', input=out2, label=lab, coeff=0.3)

# output 3
dropout = dropout_layer(name="dropout", input=pool5, dropout_rate=0.4)
out3 = fc_layer(
    name="output3", input=dropout, size=1000, act=SoftmaxActivation())
loss3 = cross_entropy(name='loss3', input=out3, label=lab)

outputs(loss3)
@@ -0,0 +1,26 @@
import io, os
import random
import numpy as np
from paddle.trainer.PyDataProvider2 import *


def initHook(settings, height, width, color, num_class, **kwargs):
    settings.height = height
    settings.width = width
    settings.color = color
    settings.num_class = num_class
    if settings.color:
        settings.data_size = settings.height * settings.width * 3
    else:
        settings.data_size = settings.height * settings.width

    # one dense image vector plus an integer label in [0, num_class)
    settings.slots = [dense_vector(settings.data_size), integer_value(num_class)]


@provider(
    init_hook=initHook, min_pool_size=-1, cache=CacheType.CACHE_PASS_IN_MEM)
def process(settings, file_list):
    for i in xrange(1024):
        img = np.random.rand(1, settings.data_size).reshape(-1, 1).flatten()
        # random.randint is inclusive on both ends, so cap at num_class - 1
        lab = random.randint(0, settings.num_class - 1)
        yield img.astype('float32'), int(lab)
@@ -0,0 +1,51 @@
set -e

function train() {
  cfg=$1
  thread=$2
  bz=$3
  args="batch_size=$3"
  prefix=$4
  paddle train --job=time \
    --config=$cfg \
    --use_gpu=True \
    --trainer_count=$thread \
    --log_period=10 \
    --test_period=100 \
    --config_args=$args \
    > logs/$prefix-${thread}gpu-$bz.log 2>&1
}

if [ ! -f "train.list" ]; then
  echo " " > train.list
fi
if [ ! -d "logs" ]; then
  mkdir logs
fi

#========single-gpu=========#
# alexnet
train alexnet.py 1 64 alexnet
train alexnet.py 1 128 alexnet
train alexnet.py 1 256 alexnet
train alexnet.py 1 512 alexnet

# googlenet
train googlenet.py 1 64 googlenet
train googlenet.py 1 128 googlenet
train googlenet.py 1 256 googlenet

# smallnet
train smallnet_mnist_cifar.py 1 64 smallnet
train smallnet_mnist_cifar.py 1 128 smallnet
train smallnet_mnist_cifar.py 1 256 smallnet
train smallnet_mnist_cifar.py 1 512 smallnet


############################
#========multi-gpus=========#
train alexnet.py 4 512 alexnet
train alexnet.py 4 1024 alexnet

train googlenet.py 4 512 googlenet
train googlenet.py 4 1024 googlenet
Some files were not shown because too many files have changed in this diff.