|
|
@@ -1,38 +1,146 @@
|
|
|
|
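Because we do not support building on operating systems other than Ubuntu, users on other systems, such as CoreOS, CentOS, Mac OS X, or Windows, have to develop inside Docker. So we need to be able to build locally modified source code.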
|
|
|
|
We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md.
|
|
|
|
|
|
|
|
|
|
|
|
We may need two Docker images:
|
|
|
|
I am recording some ideas here, and we should file a PR later.
|
|
|
|
|
|
|
|
|
|
|
|
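1. development image: contains no source code, but includes the development environment (with the necessary tools pre-installed). In other words, Dockerfile.dev needs neither COPY nor RUN git clone. Although this image is independent of the source code, different versions of the source code depend on different third-party libraries, so the image tag should still include the git branch/tag name, for example `paddlepaddle/paddle:dev-0.10.0rc1`. Here 0.10.0rc1 is a branch name, and rc stands for release candidate. After the official release, it becomes a tag named 0.10.0 on the master branch.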
|
|
|
|
## Current Status
|
|
|
|
|
|
|
|
|
|
|
|
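1. production image: contains neither the build environment nor the source code, only the built libpaddle.so and the necessary Python packages. This is the image used to run applications on a Kubernetes cluster. For example, `paddlepaddle/paddle:0.10.0rc1`.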
|
|
|
|
Currently, we have four sets of Dockerfiles:
|
|
|
|
|
|
|
|
|
|
|
|
The process of generating image 2 from image 1 is as follows:
|
|
|
|
1. Kubernetes examples:
|
|
|
|
|
|
|
|
|
|
|
|
1. Develop on the local machine (host). Suppose the source code is in `~/work/paddle`.
|
|
|
|
```
|
|
|
|
|
|
|
|
doc/howto/usage/k8s/src/Dockerfile -- based on the released image, but adds start.sh
|
|
|
|
|
|
|
|
doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh
|
|
|
|
|
|
|
|
doc/howto/usage/k8s/src/k8s_train/Dockerfile -- this duplicates the first one.
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Generate .deb packages:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. In the `docker` directory:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
paddle/scripts/docker/Dockerfile
|
|
|
|
|
|
|
|
paddle/scripts/docker/Dockerfile.gpu
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Document building
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
paddle/scripts/tools/build_docs/Dockerfile -- a subset of the above two sets.
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Goal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We want two Docker images for each version of PaddlePaddle:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. `paddle:<version>-dev`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is a development image that contains only the development tools. It standardizes the build tools and procedure. Users include:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
|
|
|
|
|
|
|
|
- release engineers -- use this to build the official release from a certain branch/tag on GitHub.
|
|
|
|
|
|
|
|
- document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
So the development image must contain not only source code building tools, but also documentation tools:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- gcc/clang
|
|
|
|
|
|
|
|
- nvcc
|
|
|
|
|
|
|
|
- Python
|
|
|
|
|
|
|
|
- sphinx
|
|
|
|
|
|
|
|
- woboq
|
|
|
|
|
|
|
|
- sshd
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
where `sshd` makes it easy for developers to have multiple terminals connecting into the container.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. `paddle:<version>`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is the production image, generated using the development image. This image might have multiple variants:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- GPU/AVX `paddle:<version>-gpu`
|
|
|
|
|
|
|
|
- GPU/no-AVX `paddle:<version>-gpu-noavx`
|
|
|
|
|
|
|
|
- no-GPU/AVX `paddle:<version>`
|
|
|
|
|
|
|
|
- no-GPU/no-AVX `paddle:<version>-noavx`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We'd like to give users the choice of GPU and no-GPU, because the GPU version image is much larger than the no-GPU version.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We'd like to give users the choice of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
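
The naming scheme for the four variants can be captured in a few lines of shell. This is only a sketch: the function name `paddle_image_tag` and the version `0.10.0` are illustrative assumptions, not part of any existing script.

```bash
# Hypothetical helper: derive the production image tag from the build options.
# Usage: paddle_image_tag <version> <gpu: ON|OFF> <avx: ON|OFF>
paddle_image_tag() {
  local version="$1" gpu="$2" avx="$3"
  local tag="paddle:${version}"
  # GPU builds get a "-gpu" suffix; builds without AVX get "-noavx".
  if [ "$gpu" = "ON" ]; then tag="${tag}-gpu"; fi
  if [ "$avx" = "OFF" ]; then tag="${tag}-noavx"; fi
  echo "$tag"
}

paddle_image_tag 0.10.0 ON  ON    # GPU/AVX
paddle_image_tag 0.10.0 ON  OFF   # GPU/no-AVX
paddle_image_tag 0.10.0 OFF ON    # no-GPU/AVX
paddle_image_tag 0.10.0 OFF OFF   # no-GPU/no-AVX
```

The default combination (no GPU, AVX enabled) deliberately gets no suffix, so `paddle:<version>` stays the image most users pull.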
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Dockerfile
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To realize the above goals, we need only one Dockerfile for the development image. We can put it in the root source directory.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Let us go over our daily development procedure to show how developers can use this file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Check out the source code
|
|
|
|
|
|
|
|
|
|
|
|
1. Build our source code with the dev image:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
```bash
|
|
|
|
docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1 /paddle/build.sh
|
|
|
|
git clone https://github.com/PaddlePaddle/Paddle paddle
|
|
|
|
```
|
|
|
|
```
|
|
|
|
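Note that the `-v` option maps the source directory on the host to the `/paddle` directory in the container, so `/paddle/build.sh` in the container is the `build.sh` in the source directory. The command above runs the local `build.sh` to build the local source code. The output goes to `/paddle/build` in the container, which is the `build` subdirectory of the local source directory.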
|
|
|
|
|
|
|
|
|
|
|
|
1. Do something
|
|
|
|
|
|
|
|
|
|
|
|
1. We want the `build.sh` script above to generate a Dockerfile in the `build` subdirectory, so that we can run:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
```bash
|
|
|
|
docker build -t paddle ./build
|
|
|
|
cd paddle
|
|
|
|
|
|
|
|
git checkout -b my_work
|
|
|
|
|
|
|
|
# Edit some files
|
|
|
|
```
|
|
|
|
```
|
|
|
|
to generate our production image.
|
|
|
|
|
|
|
|
|
|
|
|
1. Build/update the development image (if not yet done)
|
|
|
|
1. With this production image, we may want to `docker push` it to our own account on dockerhub.com, and then use it to start local or remote (Kubernetes) jobs:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
```bash
|
|
|
|
docker tag paddle yiwang/paddle:did-some-change
|
|
|
|
docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
|
|
|
|
docker push yiwang/paddle:did-some-change
|
|
|
|
```
|
|
|
|
paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py
|
|
|
|
|
|
|
|
|
|
|
|
1. Build the source code
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
|
|
docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This command maps the source directory on the host into `/paddle` in the container.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
`build.sh` not only builds binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image. We will talk about it later.
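
As a rough illustration of how `build.sh` could consume the `GPU`, `AVX`, and `TEST` variables passed with `docker run -e`, the sketch below turns them into CMake options. The function name `cmake_flags`, the default values, and the `WITH_*` option names are assumptions made for this example and are not taken from the actual script.

```bash
# Hypothetical sketch of the option handling inside build.sh.
# GPU, AVX and TEST arrive as environment variables from `docker run -e ...`;
# the defaults and the CMake option names below are assumptions.
cmake_flags() {
  echo "-DWITH_GPU=${GPU:-OFF} -DWITH_AVX=${AVX:-ON} -DWITH_TESTING=${TEST:-OFF}"
}

# Print the configure command instead of running it, so the sketch can be
# tried without the source tree or a compiler.
echo "cmake /paddle $(cmake_flags)"
```

Falling back to defaults means a plain `docker run -v $PWD:/paddle paddle:dev` would still produce a sensible build configuration.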
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Run on the host (Not recommended)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to run it in a production image.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Run in the development container
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
`build.sh` generates binary files and invokes `make install`. So we can run the built program within the development container. This is convenient for developers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Build a production image
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
On the host, we can use the `$PWD/build/Dockerfile` to generate a production image.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
|
|
docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile .
|
|
|
|
```
|
|
|
|
```
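
To make the idea of a generated production Dockerfile more concrete, here is one way `build.sh` might write it. Everything inside the generated file is an assumption for illustration: the base image, the `build/install` path, and the function name `generate_dockerfile` do not come from the actual script. The only detail tied to the command above is the `BOOK` build argument.

```bash
# Hypothetical sketch: how build.sh might write the production Dockerfile.
# The base image and the copied path are placeholders, not the real values.
generate_dockerfile() {
  local build_dir="$1"
  mkdir -p "$build_dir"
  cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu:16.04
# Matches `--build-arg "BOOK=ON"` on the docker build command line.
ARG BOOK=OFF
# Copy only the build output; no compilers and no source code.
COPY build/install /usr/local
EOF
}

# Write into a temporary directory so the sketch leaves the source tree alone.
out="$(mktemp -d)"
generate_dockerfile "$out"
cat "$out/Dockerfile"
```

Because the build context in the command above is the source root (`.`), paths in `COPY` are written relative to that directory rather than to `build/`.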
|
|
|
|
|
|
|
|
|
|
|
|
Here, paddlectl would be a script we write ourselves, which calls kubectl to start a job on the Kubernetes cluster.
|
|
|
|
1. Run the Paddle Book
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
|
|
docker run -it paddle
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that the default entrypoint of the production image starts the Jupyter server, if we chose to build Paddle Book.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Run on Kubernetes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kubernetes cluster:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
|
|
docker tag paddle me/paddle
|
|
|
|
|
|
|
|
docker push me/paddle
|
|
|
|
|
|
|
|
kubectl ...
|
|
|
|
|
|
|
|
```
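
Note that `docker push` needs the full image name, the same one given to `docker tag`. A small dry-run helper makes that pairing explicit by printing the two commands for review instead of executing them. The function name `publish_commands` is hypothetical, and `me` stands for a Docker Hub account name as in the example above.

```bash
# Hypothetical helper that prints the publish commands rather than running them.
# Usage: publish_commands <account> [tag]   (tag defaults to "latest")
publish_commands() {
  local account="$1" tag="${2:-latest}"
  local image="${account}/paddle:${tag}"
  echo "docker tag paddle ${image}"
  echo "docker push ${image}"
}

publish_commands me
publish_commands me my_work
```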
|
|
|
|
|
|
|
|
|
|
|
|
Background of earlier discussions:
|
|
|
|
For end users, we will provide more convenient tools to run distributed jobs.
|
|
|
|
["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599)
|
|
|
|
|
|
|
|
["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598)
|
|
|
|
|
|
|
|