# Use Case
## Local Training
These command line arguments are commonly used in local training experiments such as image classification and natural language processing.
```
paddle train \
--use_gpu=1/0 \ #1:GPU,0:CPU(default:true)
--config=network_config \
--save_dir=output \
--trainer_count=COUNT \ #(default:1)
--test_period=M \ #(default:1000)
--test_all_data_in_one_period=true \ #(default:false)
--num_passes=N \ #(default:100)
--log_period=K \ #(default:100)
--dot_period=1000 \ #(default:1)
#[--show_parameter_stats_period=100] \ #(default:0)
#[--saving_period_by_batches=200] \ #(default:0)
```
`show_parameter_stats_period` and `saving_period_by_batches` are optional; set them only if your task needs them.
### 1) Pass Command Argument to Network config
`config_args` passes arguments from the command line to the network config.
```
--config_args=generating=1,beam_size=5,layer_num=10 \
```
And `get_config_arg` can be used to parse these arguments in network config as follows:
```
generating = get_config_arg('generating', bool, False)
beam_size = get_config_arg('beam_size', int, 3)
layer_num = get_config_arg('layer_num', int, 8)
```
`get_config_arg`:
```
get_config_arg(name, type, default_value)
```
- name: the name specified in the `--config_args`
- type: value type, bool, int, str, float etc.
- default_value: default value if not set.
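
To make the lookup concrete, here is a minimal sketch of how a `--config_args` string could be split and queried. It is not Paddle's implementation: `parse_config_args` and `get_config_arg_sketch` are hypothetical names, and the accepted spellings of boolean values are an assumption.

```python
def parse_config_args(config_args):
    """Split 'k1=v1,k2=v2' into a dict of raw string values."""
    if not config_args:
        return {}
    return dict(item.split("=", 1) for item in config_args.split(","))


def get_config_arg_sketch(raw_args, name, arg_type, default_value=None):
    """Return raw_args[name] converted to arg_type, or default_value if unset."""
    if name not in raw_args:
        return default_value
    value = raw_args[name]
    if arg_type is bool:
        # bool("0") is True in Python, so booleans are mapped by hand.
        # The accepted spellings below are an assumption of this sketch.
        if value in ("1", "true", "True"):
            return True
        if value in ("0", "false", "False"):
            return False
        raise ValueError("cannot interpret %r as bool" % value)
    return arg_type(value)


raw = parse_config_args("generating=1,beam_size=5,layer_num=10")
generating = get_config_arg_sketch(raw, "generating", bool, False)  # True
beam_size = get_config_arg_sketch(raw, "beam_size", int, 3)         # 5
hidden = get_config_arg_sketch(raw, "hidden", int, 128)             # 128 (not set)
```

Values arrive as strings, so the `type` argument is what turns `"5"` into the integer `5`; an argument that is absent from `--config_args` falls back to its default.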
### 2) Use Model to Initialize Network
Add the following arguments:
```
--init_model_path=model_path
--load_missing_parameter_strategy=rand
```
## Local Testing
Method 1:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--init_model_path=model_path
```
- Use init\_model\_path to specify the model to test.
- This method can test only one model.
Method 2:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--model_list=model.list
```
- Use model_list to specify the models to test.
- This method can test several models. For example, model.list may look like this:
```
./alexnet_pass1
./alexnet_pass2
```
Method 3:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--save_dir=model \
--test_pass=M \
--num_passes=N
```
This method requires models saved by Paddle under paths of the form `model/pass-%05d`. It tests the models from the M-th pass through the (N-1)-th pass. For example, M=12 and N=14 tests `model/pass-00012` and `model/pass-00013`.
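
The pass range in Method 3 can be sketched as follows. This only illustrates the naming rule described above; `passes_under_test` is a hypothetical helper, not part of Paddle, and it assumes the five-digit zero-padded directory names shown in the example.

```python
def passes_under_test(save_dir, test_pass, num_passes):
    """List the model directories for passes test_pass .. num_passes - 1."""
    return ["%s/pass-%05d" % (save_dir, p) for p in range(test_pass, num_passes)]


print(passes_under_test("model", 12, 14))
# ['model/pass-00012', 'model/pass-00013']
```

The upper bound is exclusive, so `--num_passes=N` names the first pass that is not tested.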
## Sparse Training
Sparse training is usually used to accelerate computation when the input is sparse, high-dimensional data. For example, the dictionary dimension of the input data may be 1 million while a single sample contains only a few words. In Paddle, sparse matrix multiplication is used in forward propagation, and sparse updating is performed on the weights after backward propagation.
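
The idea can be illustrated with a toy example in plain Python. This is a conceptual sketch only, not how Paddle implements sparse training: it assumes a binary bag-of-words input represented by the indices of its nonzero entries, and the function names are hypothetical.

```python
def sparse_forward(weights, active_ids):
    """Compute x^T W for a binary input x by summing only the active rows of W."""
    dim = len(weights[0])
    out = [0.0] * dim
    for i in active_ids:
        for j in range(dim):
            out[j] += weights[i][j]
    return out


def sparse_update(weights, active_ids, grad_out, lr):
    """Apply a gradient step to the active rows only; other rows are untouched."""
    for i in active_ids:
        for j, g in enumerate(grad_out):
            weights[i][j] -= lr * g


# A 3-word dictionary with 2 output units; the sample contains words 0 and 2.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(sparse_forward(weights, [0, 2]))   # [6.0, 8.0]
sparse_update(weights, [0, 2], grad_out=[1.0, 1.0], lr=0.5)
print(weights)                           # [[0.5, 1.5], [3.0, 4.0], [4.5, 5.5]]
```

Both steps cost time proportional to the number of words in the sample rather than to the dictionary size, which is where the speedup comes from.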
### 1) Local training
You need to set **sparse\_update=True** in network config. Check the network config documentation for more details.
### 2) Cluster training
Add the following argument for cluster training of a sparse model. At the same time you need to set **sparse\_remote\_update=True** in network config. Check the network config documentation for more details.
```
--ports_num_for_sparse=1 #(default: 0)
```
## parallel_nn
`parallel_nn` enables mixed use of GPUs and CPUs to compute layers. That is, you can configure a network so that a GPU computes some layers and a CPU computes the others. Alternatively, you can split layers across different GPUs, which can **reduce GPU memory** usage or **use parallel computation to accelerate some layers**.
To use these features, specify the device ID in the network config (denoted deviceId) and add the following command line argument:
```
--parallel_nn=true
```
### Case 1: Mixed Use of GPU and CPU
Consider the following example:
```
#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT
#network:
default_device(0)
fc1=fc_layer(...)
fc2=fc_layer(...)
fc3=fc_layer(...,layer_attr=ExtraAttr(device=-1))
```
- default_device(0): sets the default device ID to 0. This means that all layers except those with device=-1 use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layers fc1 and fc2 are computed on the GPU.
- device=-1: use the CPU for layer fc3.
- trainer_count:
    - trainer_count=1: if gpu\_id is not set, the first GPU computes layers fc1 and fc2. Otherwise the GPU with ID gpu\_id is used.
    - trainer_count>1: trainer\_count GPUs compute each layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 use data parallelism to compute layers fc1 and fc2.
### Case 2: Specify Layers in Different Devices
```
#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT
#network:
fc2=fc_layer(input=l1, layer_attr=ExtraAttr(device=0), ...)
fc3=fc_layer(input=l1, layer_attr=ExtraAttr(device=1), ...)
fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)
```
In this case, we assume that there are 4 GPUs in one machine.
- trainer_count=1:
    - Use GPU 0 to compute layer fc2.
    - Use GPU 1 to compute layer fc3.
    - Use the CPU to compute layer fc4.
- trainer_count=2:
    - Use GPUs 0 and 1 to compute layer fc2.
    - Use GPUs 2 and 3 to compute layer fc3.
    - Use the CPU to compute layer fc4 in two threads.
- trainer_count=4:
    - It will fail (recall the assumption of 4 GPUs in the machine), because the argument `allow_only_one_model_on_one_gpu` is true by default.

**Allocation of device ID when `device!=-1`**:
```
(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_
deviceId: specified in layer.
gpu_id: 0 by default.
threadId: thread ID, range: 0,1,..., trainer_count-1
numDevices_: device (GPU) count in machine.
numLogicalDevices_: min(max(deviceId + 1), numDevices_)
```
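
The formula can be evaluated directly with a short Python sketch. It transcribes the expression above and nothing more. The helper names are hypothetical, the maximum in `numLogicalDevices_` is assumed to run over the layers with `device != -1`, and at least one such layer is assumed to exist.

```python
def num_logical_devices(layer_device_ids, num_devices):
    """min(max(deviceId + 1), numDevices_) over the layers placed on a GPU."""
    gpu_device_ids = [d for d in layer_device_ids if d != -1]
    return min(max(d + 1 for d in gpu_device_ids), num_devices)


def allocate_device(device_id, thread_id, logical_devices, num_devices, gpu_id=0):
    """(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_"""
    return (device_id + gpu_id + thread_id * logical_devices) % num_devices


def gpus_per_thread(layer_device_ids, trainer_count, num_devices, gpu_id=0):
    """Map each threadId to the set of physical GPUs its layers land on."""
    logical = num_logical_devices(layer_device_ids, num_devices)
    return {
        t: {allocate_device(d, t, logical, num_devices, gpu_id)
            for d in layer_device_ids if d != -1}
        for t in range(trainer_count)
    }


# Case 2: fc2 -> device 0, fc3 -> device 1, fc4 -> device -1 (CPU), 4 GPUs.
layer_device_ids = [0, 1, -1]
print(gpus_per_thread(layer_device_ids, trainer_count=1, num_devices=4))
# {0: {0, 1}}
print(gpus_per_thread(layer_device_ids, trainer_count=4, num_devices=4))
# {0: {0, 1}, 1: {2, 3}, 2: {0, 1}, 3: {2, 3}}
```

With trainer_count=4, threads 0 and 2 are mapped to the same GPUs, which is consistent with the failure described for Case 2 when `allow_only_one_model_on_one_gpu` is true. The exact check Paddle performs is not reproduced here.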