# Fluid Benchmark
This directory contains several model configurations and tools used to run Fluid benchmarks for local and distributed training.
## Run the Benchmark
To start, run the following command to get the full help message:
```bash
python fluid_benchmark.py --help
```
Currently supported `--model` arguments include:

- mnist
- resnet
  - You can choose a different dataset with `--data_set cifar10` or `--data_set flowers` (see the example after this list).
- vgg
- stacked_dynamic_lstm
- machine_translation
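For example (a hedged illustration combining the flags above; adjust the GPU count to your machine), you can train resnet on cifar10 with multiple GPUs:

```bash
# Local multi-GPU run: the resnet model on the cifar10 dataset, using 4 GPUs.
python fluid_benchmark.py --model resnet --data_set cifar10 --device GPU --gpus 4
```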
- Run the following command to start a benchmark job locally:

  ```bash
  python fluid_benchmark.py --model mnist --device GPU
  ```

  You can choose GPU or CPU training. With GPU training, you can specify `--gpus <gpu_num>` to run multi-GPU training. The parameter server also supports async mode: specify `--async_mode` to train the model asynchronously.
- Run distributed training with parameter servers:
  - See run_fluid_benchmark.sh as an example.
  - Start parameter servers:

    ```bash
    PADDLE_TRAINING_ROLE=PSERVER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device GPU --update_method pserver
    sleep 15
    ```

  - Start trainers:

    ```bash
    PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device GPU --update_method pserver
    ```

- Run distributed training using NCCL2 (a two-node sketch follows this list):

  ```bash
  PADDLE_PSERVER_PORT=7164 PADDLE_TRAINER_IPS=192.168.0.2,192.168.0.3 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device GPU --update_method nccl2
  ```
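For concreteness, here is a hedged sketch of the NCCL2 launch across two nodes; the IP addresses are placeholders for your own cluster:

```bash
# On node 192.168.0.2 (trainer 0); the IPs below are placeholders.
PADDLE_PSERVER_PORT=7164 PADDLE_TRAINER_IPS=192.168.0.2,192.168.0.3 \
PADDLE_CURRENT_IP=192.168.0.2 PADDLE_TRAINER_ID=0 \
python fluid_benchmark.py --model mnist --device GPU --update_method nccl2

# On node 192.168.0.3, run the same command with
# PADDLE_CURRENT_IP=192.168.0.3 and PADDLE_TRAINER_ID=1.
```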
## Prepare the RecordIO file to Achieve Better Performance
Running the following command will generate RecordIO files like "mnist.recordio" under the path and with the `batch_size` you choose. You can use `batch_size=1` so that a reader can later change the batch size at any time using `fluid.batch`.
```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 1)'
```
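As a quick sanity check (a sketch, assuming `prepare_mnist` writes its `*.recordio` output under the directory you pass in, as described above):

```bash
# Create the output directory, convert, and list the generated files.
mkdir -p data
python -c 'from recordio_converter import *; prepare_mnist("data", 1)'
ls data/*.recordio
```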
## Run Distributed Benchmark on Kubernetes Cluster
You may need to build a Docker image before submitting a cluster job to Kubernetes; otherwise you would have to start all of the processes manually on each node, which is not recommended.
To build the Docker image, you need to choose a Paddle "whl" package to run with. You can either download one from http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_en.html or build it yourself. Once you have the "whl" package, put it under the current directory and run:
```bash
docker build -t [your docker image name]:[your docker image tag] .
```
Then push the image to a Docker registry that your Kubernetes cluster can reach.
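For example (the registry, image name, and tag below are placeholders; substitute your own):

```bash
# Build, then push to a registry reachable from the Kubernetes nodes.
docker build -t registry.example.com/fluid-benchmark:latest .
docker push registry.example.com/fluid-benchmark:latest
```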
We provide a script, `kube_gen_job.py`, to generate the Kubernetes YAML files for submitting distributed benchmark jobs to your cluster. To generate a job YAML, run:
```bash
python kube_gen_job.py --jobname myjob --pscpu 4 --cpu 8 --gpu 8 --psmemory 20 --memory 40 --pservers 4 --trainers 4 --entry "python fluid_benchmark.py --model mnist --gpus 8 --device GPU --update_method pserver " --disttype pserver
```
The YAML files are then generated under the directory `myjob`, and you can run:
```bash
kubectl create -f myjob/
```
The job should then start.
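To check on the job (standard kubectl commands; pod names are generated by Kubernetes, so substitute the ones you see):

```bash
# Watch the pods come up, then follow a trainer's log.
kubectl get pods
kubectl logs -f <trainer-pod-name>
```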
## Notes for Running Fluid Distributed Training with NCCL2 and RDMA
Before running NCCL2 distributed jobs, check whether your node has multiple network interfaces. If it does, set the environment variable `export NCCL_SOCKET_IFNAME=eth0` (replacing `eth0` with your actual network device) so NCCL uses the right one.
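For example (standard Linux tooling; `eth0` is a placeholder for your device):

```bash
# List network interfaces, then tell NCCL which one to use.
ip addr show
export NCCL_SOCKET_IFNAME=eth0   # replace eth0 with your actual device
```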
To run high-performance distributed training, your hardware environment must support RDMA-enabled network communication; please check out this note for details.