Merge pull request #866 from gangliao/doc_cn

Integrate doc en/cn into single doc
9 years ago · b0c6331cde
parent 48128e51ba 5848288142
commit b0c6331cde
108 changed files with 1027 additions and 1644 deletions
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -169,5 +169,4 @@ add_subdirectory(paddle)
 add_subdirectory(python)
 if(WITH_DOC)
    add_subdirectory(doc)
-   add_subdirectory(doc_cn)
 endif()
--- a/doc/CMakeLists.txt
+++ b/doc/CMakeLists.txt
@ -7,25 +7,50 @@ if(NOT DEFINED SPHINX_THEME_DIR)
 endif()
 # configured documentation tools and intermediate build results
-set(BINARY_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/_build")
+set(BINARY_BUILD_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_build")
 # Sphinx cache with pickled ReST documents
-set(SPHINX_CACHE_DIR "${CMAKE_CURRENT_BINARY_DIR}/_doctrees")
+set(SPHINX_CACHE_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_doctrees")
-# HTML output directory
+# HTML output director
-set(SPHINX_HTML_DIR "${CMAKE_CURRENT_BINARY_DIR}/html")
+set(SPHINX_HTML_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/html")
 configure_file(
-    "${CMAKE_CURRENT_SOURCE_DIR}/conf.py.in"
+    "${CMAKE_CURRENT_SOURCE_DIR}/conf.py.en.in"
-    "${BINARY_BUILD_DIR}/conf.py"
+    "${BINARY_BUILD_DIR_EN}/conf.py"
    @ONLY)
 sphinx_add_target(paddle_docs
                  html
-                  ${BINARY_BUILD_DIR}
+                  ${BINARY_BUILD_DIR_EN}
-                  ${SPHINX_CACHE_DIR}
+                  ${SPHINX_CACHE_DIR_EN}
                  ${CMAKE_CURRENT_SOURCE_DIR}
-                  ${SPHINX_HTML_DIR})
+                  ${SPHINX_HTML_DIR_EN})
 add_dependencies(paddle_docs
  gen_proto_py)
 # configured documentation tools and intermediate build results
 set(BINARY_BUILD_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/_build")
 # Sphinx cache with pickled ReST documents
 set(SPHINX_CACHE_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/_doctrees")
 # HTML output directory
 set(SPHINX_HTML_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/html")
 configure_file(
    "${CMAKE_CURRENT_SOURCE_DIR}/conf.py.cn.in"
    "${BINARY_BUILD_DIR_CN}/conf.py"
    @ONLY)
 sphinx_add_target(paddle_docs_cn
                  html
                  ${BINARY_BUILD_DIR_CN}
                  ${SPHINX_CACHE_DIR_CN}
                  ${CMAKE_CURRENT_SOURCE_DIR}
                  ${SPHINX_HTML_DIR_CN})
 add_dependencies(paddle_docs_cn
  gen_proto_py)
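Both languages are now built from the same `${CMAKE_CURRENT_SOURCE_DIR}`, so the `paddle_docs` and `paddle_docs_cn` targets differ only in the directory that holds the generated `conf.py` and in their cache and HTML output directories. The sketch below assumes `sphinx_add_target` wraps `sphinx-build` with `-b` (builder), `-d` (doctree cache) and `-c` (config directory). That helper's definition is not part of this diff, and `sphinx_html_command` is a hypothetical name used only for illustration.

```python
import os


def sphinx_html_command(binary_dir, source_dir, lang):
    """Return a sphinx-build argument list mirroring one of the doc targets.

    Assumption: sphinx_add_target runs sphinx-build with -b/-d/-c as below.
    """
    build_dir = os.path.join(binary_dir, lang, "_build")     # BINARY_BUILD_DIR_<LANG>, holds conf.py
    cache_dir = os.path.join(binary_dir, lang, "_doctrees")  # SPHINX_CACHE_DIR_<LANG>
    html_dir = os.path.join(binary_dir, lang, "html")        # SPHINX_HTML_DIR_<LANG>
    return ["sphinx-build", "-b", "html",
            "-d", cache_dir,
            "-c", build_dir,
            source_dir, html_dir]


if __name__ == "__main__":
    # One invocation per language, both reading the same source tree.
    for lang in ("en", "cn"):
        print(" ".join(sphinx_html_command("build/doc", "doc", lang)))
```

Printing both commands makes it easy to compare the `en/` and `cn/` layouts side by side.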
--- a/doc/api/data_provider/dataprovider_cn.rst
+++ b/doc/api/data_provider/dataprovider_cn.rst
--- a/doc/api/data_provider/dataprovider_en.rst
+++ b/doc/api/data_provider/dataprovider_en.rst
--- a/doc/api/data_provider/pydataprovider2_cn.rst
+++ b/doc/api/data_provider/pydataprovider2_cn.rst
@ -15,23 +15,23 @@ MNIST use case
 MNIST is a digit-classification dataset of 70,000 grayscale images. The sample data ``mnist_train.txt`` looks like this:
-..  literalinclude:: mnist_train.txt
+..  literalinclude:: src/mnist_train.txt
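 Each line represents one image and is split into two parts by ``;``. The first part is the image label, a digit from 0 to 9; the second part is the 28*28 pixel grayscale values. The corresponding ``train.list`` simply contains the name of this data file: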
-..  literalinclude:: train.list
+..  literalinclude:: src/train.list
 dataprovider usage
 ++++++++++++++++++
-..  literalinclude:: mnist_provider.dict.py
+..  literalinclude:: src/mnist_provider.dict.py
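 - First, import PaddlePaddle's PyDataProvider2 package.
 - Next, define a Python `Decorator <http://www.learnpython.org/en/Decorators>`_ `@provider`_. It marks the data input function on the following line as a PyDataProvider2 and sets its input_types attribute.
  - `input_types`_: sets what kind of data this PyDataProvider2 returns. Following the ``data_layer`` names in the network configuration, this example explicitly specifies that it returns a 28*28-dimensional dense float vector and a 10-dimensional integer label in [0-9].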
-    ..  literalinclude:: mnist_config.py
+    ..  literalinclude:: src/mnist_config.py
         :lines: 9-10
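  - Note: if the user does not explicitly specify the mapping for the returned data, PaddlePaddle determines the mapping from the order in which the layers are declared. That mapping may be wrong, so setting input_types explicitly is recommended.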
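The bullets above describe a provider that turns each line into named inputs. Below is a minimal, framework-free sketch of that parsing loop. It assumes the pixel values are whitespace-separated, leaves out the `@provider` decorator so it runs without PaddlePaddle, and takes an iterable of lines instead of a file name; the keys `pixel` and `label` stand in for the `data_layer` names.

```python
def process(settings, lines):
    """Sketch of the MNIST provider loop (decorator omitted, lines instead of a file)."""
    for line in lines:
        # "<label>;<28*28 pixel values>"
        label, pixel = line.split(";")
        # split() also drops the trailing newline
        pixels = [float(p) for p in pixel.split()]
        # Dict keys name the data_layer each value feeds, so order does not matter.
        yield {"pixel": pixels, "label": int(label)}
```

Yielding a dict rather than a tuple is what lets the explicit `input_types` mapping work independently of layer declaration order.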
@ -53,7 +53,7 @@ dataprovider usage
 In the network configuration, a single line of code is enough to invoke this PyDataProvider2, for example:
-..  literalinclude:: mnist_config.py
+..  literalinclude:: src/mnist_config.py
     :lines: 1-7
 The training data is ``train.list``, there is no test data, and the PyDataProvider2 invoked is the ``process`` function in the ``mnist_provider`` module.
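Conceptually, this configuration tells the trainer to import the `mnist_provider` module, look up its `process` function, and call it for each data file named in `train.list`. The sketch below models that flow. `resolve_provider` and `iterate_samples` are hypothetical helpers, not PaddlePaddle APIs, and the real trainer may order or shuffle the files differently.

```python
import importlib


def resolve_provider(module, obj):
    """Import `module` and return its attribute `obj` (hypothetical helper)."""
    return getattr(importlib.import_module(module), obj)


def iterate_samples(train_list_lines, provider, settings=None):
    """Call the provider once for every data file listed in train.list."""
    for line in train_list_lines:
        file_name = line.strip()
        if not file_name:          # ignore blank lines in the list
            continue
        for sample in provider(settings, file_name):
            yield sample
```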
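@ -80,7 +80,7 @@ dataprovider usage
 This example uses English sentiment-classification data: a piece of English text is classified as positive or negative sentiment (represented by 0 and 1). The sample data ``sentimental_train.txt`` looks like this: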
-..  literalinclude:: sentimental_train.txt
+..  literalinclude:: src/sentimental_train.txt
 dataprovider usage
 ++++++++++++++++++
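@ -90,7 +90,7 @@ dataprovider usage
 - Here ``input_types`` has the same effect as configuring it in `@provider`_. The input feature in this example is a sequence of word IDs, so the ``integer_value_sequence`` type is used.
 - Store ``dictionary`` in the settings object for use in the ``process`` function. dictionary is the dict object passed in from the network configuration, i.e. a dictionary mapping word strings to word IDs.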
-..  literalinclude:: sentimental_provider.py
+..  literalinclude:: src/sentimental_provider.py
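A standalone sketch of the two bullets above: an init hook keeps the dictionary on `settings`, and `process` maps each word to its ID, which is the kind of value an `integer_value_sequence` input expects. The `Settings` stand-in class, the tab-separated `<label>\t<text>` line format, and `UNK_IDX = 0` for unknown words are all assumptions made so the sketch runs without PaddlePaddle.

```python
UNK_IDX = 0  # assumed ID for words that are not in the dictionary


class Settings(object):
    """Stand-in for the settings object PaddlePaddle passes to the provider."""


def initializer(settings, dictionary, **kwargs):
    # Keep the word -> ID dict so that process() can use it later.
    settings.word_dict = dictionary


def process(settings, lines):
    for line in lines:
        # Assumed line format: "<label>\t<text>"
        label, comment = line.strip().split("\t", 1)
        word_ids = [settings.word_dict.get(w, UNK_IDX) for w in comment.split()]
        yield {"data": word_ids, "label": int(label)}
```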
 Invocation in the network configuration
 ++++++++++++++++++++++++++++++++++++++++
@ -100,7 +100,7 @@ dataprovider usage
 * The external dictionary has to be loaded in the configuration.
 * dictionary is passed as an argument when declaring the DataProvider.
-..  literalinclude:: sentimental_config.py
+..  literalinclude:: src/sentimental_config.py
     :emphasize-lines: 12-14
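Loading the external dictionary in the configuration can be as simple as reading one word per line and numbering the words in order. This is a hedged sketch: the one-word-per-line file format and the `load_dictionary` helper are hypothetical, and the resulting dict is the kind of object the provider receives as its `dictionary` argument.

```python
def load_dictionary(lines):
    """Build a word -> ID dict, numbering words in the order they first appear."""
    word_dict = {}
    for line in lines:
        word = line.strip()
        if word and word not in word_dict:   # skip blanks and duplicates
            word_dict[word] = len(word_dict)
    return word_dict
```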
 Reference
--- a/doc/api/data_provider/pydataprovider2_en.rst
+++ b/doc/api/data_provider/pydataprovider2_en.rst
@ -24,18 +24,18 @@ of 28 x 28 pixels.
 A small sample of the original data is shown below:
-.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_train.txt
+.. literalinclude:: src/mnist_train.txt
 Each line of the data contains two parts, separated by :code:`;`. The first part is
 the label of an image. The second part contains 28x28 pixel float values.
 Just write the path of the above data into train.list. It looks like this:
-.. literalinclude:: ../../../doc_cn/ui/data_provider/train.list
+.. literalinclude:: src/train.list
 The corresponding dataprovider is shown below:
-.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_provider.py
+.. literalinclude:: src/mnist_provider.dict.py
 The first line imports the PyDataProvider2 package.
 The main function is the process function, which has two parameters.
@ -74,7 +74,7 @@ sample by using keywords :code:`yield`.
 Only a few lines of code need to be added to the training configuration file;
 you can take this as an example.
-.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_config.py
+.. literalinclude:: src/mnist_config.py
 Here we specify training data by :code:`train.list`, and no testing data is specified.
 The method that actually provides data is :code:`process`.
@ -83,7 +83,7 @@ User also can use another style to provide data, which defines the
 :code:`data_layer`'s name explicitly when using `yield`. For example,
 the :code:`dataprovider` is shown below.
-.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_provider.dict.py
+.. literalinclude:: src/mnist_provider.dict.py
   :linenos:
 If the user didn't give the :code:`data_layer`'s name, PaddlePaddle will use
@ -121,11 +121,11 @@ negative sentiment (marked by 0 and 1 respectively).
 A small sample of the original data can be found in the path below:
-.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_train.txt
+.. literalinclude:: src/sentimental_train.txt
 The corresponding data provider can be found in the path below:
-.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_provider.py
+.. literalinclude:: src/sentimental_provider.py
 This data provider for a sequential model is a little more complex than the one
 for the MNIST dataset.
@ -143,7 +143,7 @@ initialized. The :code:`on_init` function has the following parameters:
 To pass these parameters into the DataProvider, the following lines should be added
 to the trainer configuration file.
-.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_config.py
+.. literalinclude:: src/sentimental_config.py
 The definition is basically the same as the MNIST example, except:
 * Load dictionary in this configuration
--- a/doc/api/data_provider/src/mnist_config.py
+++ b/doc/api/data_provider/src/mnist_config.py
--- a/doc/api/data_provider/src/mnist_provider.dict.py
+++ b/doc/api/data_provider/src/mnist_provider.dict.py
--- a/doc/api/data_provider/src/mnist_train.txt
+++ b/doc/api/data_provider/src/mnist_train.txt
--- a/doc/api/data_provider/src/sentimental_config.py
+++ b/doc/api/data_provider/src/sentimental_config.py
--- a/doc/api/data_provider/src/sentimental_provider.py
+++ b/doc/api/data_provider/src/sentimental_provider.py
--- a/doc/api/data_provider/src/sentimental_train.txt
+++ b/doc/api/data_provider/src/sentimental_train.txt
--- a/doc/api/data_provider/src/train.list
+++ b/doc/api/data_provider/src/train.list
--- a/doc/api/index_cn.rst
+++ b/doc/api/index_cn.rst
@ -0,0 +1,37 @@
 API
 ===
 DataProvider API
 ----------------
 ..  toctree::
    :maxdepth: 1
    data_provider/dataprovider_cn.rst
    data_provider/pydataprovider2_cn.rst
 ..  _api_trainer_config:
 Model Config API
 ----------------
 ..  toctree::
    :maxdepth: 1
    trainer_config_helpers/optimizers.rst
    trainer_config_helpers/data_sources.rst
    trainer_config_helpers/layers.rst
    trainer_config_helpers/activations.rst 
    trainer_config_helpers/poolings.rst
    trainer_config_helpers/networks.rst
    trainer_config_helpers/evaluators.rst
    trainer_config_helpers/attrs.rst
 Applications API
 ----------------
 ..  toctree::
    :maxdepth: 1
    predict/swig_py_paddle_cn.rst
--- a/doc/api/index_en.rst
+++ b/doc/api/index_en.rst
@ -7,7 +7,7 @@ DataProvider API
 ..  toctree::
    :maxdepth: 1
-    data_provider/index_en.rst
+    data_provider/dataprovider_en.rst
    data_provider/pydataprovider2_en.rst
 ..  _api_trainer_config:
--- a/doc/api/predict/src/predict_sample.py
+++ b/doc/api/predict/src/predict_sample.py
--- a/doc/api/predict/swig_py_paddle_cn.rst
+++ b/doc/api/predict/swig_py_paddle_cn.rst
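@ -34,7 +34,7 @@ PaddlePaddle uses swig to wrap the commonly used prediction interfaces; compiling will
 Below is prediction code that uses an mnist model for handwritten digit recognition. The complete code is at ``src_root/doc/ui/predict/predict_sample.py``. The mnist model can be trained with the demo in the ``src_root\demo\mnist`` directory.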
-..  literalinclude:: ../../../doc/ui/predict/predict_sample.py
+..  literalinclude:: src/predict_sample.py
    :language: python
    :lines: 15-18,121-136
--- a/doc/api/predict/swig_py_paddle_en.rst
+++ b/doc/api/predict/swig_py_paddle_en.rst
@ -13,7 +13,7 @@ Here is a sample python script that shows the typical prediction process for the
 MNIST classification problem. The complete sample code can be found at
 :code:`src_root/doc/ui/predict/predict_sample.py`.
-..  literalinclude:: ./predict_sample.py
+..  literalinclude:: src/predict_sample.py
    :language: python
    :lines: 15-18,90-100,101-104
--- a/doc/conf.py.cn.in
+++ b/doc/conf.py.cn.in
@ -62,7 +62,7 @@ source_suffix = ['.rst', '.md', '.Rmd']
 source_encoding = 'utf-8'
 # The master toctree document.
-master_doc = 'index'
+master_doc = 'index_cn'
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
--- a/doc/conf.py.en.in
+++ b/doc/conf.py.en.in
@ -63,7 +63,7 @@ source_suffix = ['.rst', '.md', '.Rmd']
 source_encoding = 'utf-8'
 # The master toctree document.
-master_doc = 'index'
+master_doc = 'index_en'
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
--- a/doc/faq/index_cn.rst
+++ b/doc/faq/index_cn.rst
@ -1,5 +1,5 @@
 ####################
-PaddlePaddle FAQ
+FAQ
 ####################
 ..  contents::
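@ -33,10 +33,9 @@ PyDataProvider uses asynchronous loading and randomly picks data directly in memory
 memory pool actually determines the shuffle granularity. So if you shrink this memory pool but still need the data to be random,
 it is best to shuffle the data file before each read. Possible code: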
-..  literalinclude:: reduce_min_pool_size.py
+..  literalinclude:: src/reduce_min_pool_size.py
-This can greatly reduce memory usage and may speed up training. For details, see `here
+This can greatly reduce memory usage and may speed up training. For details, see `here <../ui/data_provider/pydataprovider2.html#provider>`_ .
-<../ui/data_provider/pydataprovider2.html#provider>`_ .
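One way to shuffle the data file before each read without holding parsed samples in memory is to shuffle line offsets rather than lines. This is a sketch of that idea, not the contents of `src/reduce_min_pool_size.py`; `shuffled_lines` is a hypothetical helper, and it assumes a UTF-8 text file with one sample per line.

```python
import random


def shuffled_lines(path, rng=None):
    """Yield the lines of `path` in a fresh random order on every call.

    Only one integer offset per line is kept in memory, not the samples.
    """
    rng = rng or random.Random()
    offsets = []
    with open(path, "rb") as f:
        # First scan: remember where each line starts.
        pos = f.tell()
        line = f.readline()
        while line:
            offsets.append(pos)
            pos = f.tell()
            line = f.readline()
        rng.shuffle(offsets)
        # Second scan: jump to each line in shuffled order.
        for off in offsets:
            f.seek(off)
            yield f.readline().decode("utf-8")
```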
 Neuron activation memory
 ++++++++++++++++++++++++
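@ -76,7 +75,7 @@ PaddlePaddle supports many optimization algorithms (Optimizer); different optimization algorithms need
 When using :code:`pydataprovider`, you can shrink the cache pool and also enable in-memory caching, which greatly speeds up data loading.
 Shrinking the :code:`DataProvider` cache pool follows the same principle as shrinking it to reduce memory usage, described earlier.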
-..  literalinclude:: reduce_min_pool_size.py
+..  literalinclude:: src/reduce_min_pool_size.py
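 In addition, the :code:`@provider` interface has a :code:`cache` parameter that controls the caching method. If it is set to :code:`CacheType.CACHE_PASS_IN_MEM`, the data generated in the first :code:`pass` (one pass means going through all the training data once) is cached in memory; in later passes, data is no longer read from the :code:`python` side but directly from the in-memory cache. This also greatly reduces data-loading time.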
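The caching behavior described above can be illustrated without PaddlePaddle. `PassCache` below is a hypothetical class, not the framework's implementation: the first pass pulls samples from the Python generator and stores them, and later passes replay the stored samples. It assumes a whole pass fits in memory.

```python
class PassCache(object):
    """Sketch of pass-level in-memory caching."""

    def __init__(self, make_generator):
        self._make_generator = make_generator  # callable returning a fresh generator
        self._cache = None

    def run_pass(self):
        if self._cache is not None:
            # Later passes: replay from memory, Python generator not called.
            for sample in self._cache:
                yield sample
            return
        samples = []
        for sample in self._make_generator():  # first pass: read from Python
            samples.append(sample)
            yield sample
        self._cache = samples  # cache only a pass that ran to completion
```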
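@ -90,11 +89,11 @@ PaddlePaddle supports sparse training; sparse training requires the training features to be :code:`spa
 The two words before and the two words after a word are used to predict that middle word. The DataProvider for this task is\: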
-..  literalinclude:: word2vec_dataprovider.py
+..  literalinclude:: src/word2vec_dataprovider.py
 The configuration for this task is\:
-..  literalinclude:: word2vec_config.py
+..  literalinclude:: src/word2vec_config.py
 For more on sparse training, see the `sparse training documentation <TBD>`_
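To make the task concrete, the following sketch builds training pairs from a sequence of word IDs, using the two words on each side of a position to predict the word in the middle. `context_samples` is hypothetical and independent of `src/word2vec_dataprovider.py`, whose contents are not shown in this diff; positions without a full window are simply skipped.

```python
def context_samples(word_ids, window=2):
    """Yield (context, target) pairs: `window` words on each side predict the middle word."""
    for i in range(window, len(word_ids) - window):
        context = word_ids[i - window:i] + word_ids[i + 1:i + 1 + window]
        yield context, word_ids[i]
```

For `[1, 2, 3, 4, 5, 6]` this yields `([1, 2, 4, 5], 3)` and `([2, 3, 5, 6], 4)`.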
@ -158,7 +157,7 @@ PaddlePaddle uses the parameter name :code:`name` as the parameter ID; the same name
 Here :code:`hidden_a` and :code:`hidden_b` use the same parameter and bias, and the two inputs of the softmax layer also use the same parameter :code:`softmax_param`.
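Since PaddlePaddle identifies a parameter by its `name`, layers that refer to the same name, like `hidden_a` and `hidden_b` here, end up sharing one parameter. A framework-independent sketch of that rule; `ParameterStore` is a hypothetical class:

```python
class ParameterStore(object):
    """Hypothetical registry in which a parameter is identified only by its name."""

    def __init__(self):
        self._params = {}

    def get(self, name, size):
        # The first request creates the parameter; later requests with the
        # same name return the very same object, so the layers share it.
        param = self._params.setdefault(name, [0.0] * size)
        if len(param) != size:
            raise ValueError("parameter %r already exists with size %d"
                             % (name, len(param)))
        return param
```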
 7. *-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.
-----------------------------------------------------------------------
+---------------------------------------------------------------------------
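 The main cause of this problem is that the :code:`wheel` package used when building the wheel is the latest version,
 while the system's :code:`pip` package is old. The fix is to upgrade :code:`pip` and rebuild PaddlePaddle.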
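The file name in this error follows the PEP 427 wheel naming convention, `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`, and `cp27mu` is the ABI tag of a CPython 2.7 build with pymalloc (`m`) and wide, UCS-4 unicode (`u`). Here is a small stdlib-only sketch for reading those tags off a file name; `parse_wheel_filename` is a hypothetical helper.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # name-version-py-abi-platform, optionally with a build tag after the version
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}
```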
@ -220,7 +219,7 @@ PaddlePaddle uses the parameter name :code:`name` as the parameter ID; the same name
 10. CMake source build: PythonLibs and PythonInterp versions differ
----------------------------------------------------------
+----------------------------------------------------------------------
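 This is caused by a flaw in CMake's current logic for finding Python: if several Python versions are installed, the Python library and the Python interpreter that CMake finds may have different versions, which makes the PaddlePaddle build fail. The correct fix is for
 the user to force a specific Python version, as follows: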
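As a hedged sketch of forcing a specific version, the interpreter you intend to build against can report the values to pin. This assumes the build relies on CMake's classic `FindPythonInterp`/`FindPythonLibs` cache variables `PYTHON_EXECUTABLE`, `PYTHON_INCLUDE_DIR` and `PYTHON_LIBRARY`. `cmake_python_args` is a hypothetical helper, and the library path it derives may need manual adjustment on some platforms, such as macOS framework builds.

```python
import os
import sys
import sysconfig


def cmake_python_args():
    """Collect -D flags that pin CMake to the interpreter running this script."""
    include_dir = sysconfig.get_paths()["include"]
    libdir = sysconfig.get_config_var("LIBDIR") or ""
    ldlibrary = sysconfig.get_config_var("LDLIBRARY") or ""
    return [
        "-DPYTHON_EXECUTABLE=" + sys.executable,
        "-DPYTHON_INCLUDE_DIR=" + include_dir,
        "-DPYTHON_LIBRARY=" + os.path.join(libdir, ldlibrary),
    ]


if __name__ == "__main__":
    print("cmake .. " + " ".join(cmake_python_args()))
```

Run it with the Python you want PaddlePaddle to use and pass the printed flags to `cmake`.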
--- a/doc/faq/src/reduce_min_pool_size.py
+++ b/doc/faq/src/reduce_min_pool_size.py
--- a/doc/faq/src/word2vec_config.py
+++ b/doc/faq/src/word2vec_config.py
--- a/doc/faq/src/word2vec_dataprovider.py
+++ b/doc/faq/src/word2vec_dataprovider.py
--- a/doc/getstarted/basic_usage/index_cn.rst
+++ b/doc/getstarted/basic_usage/index_cn.rst
@ -58,6 +58,7 @@ PaddlePaddle is a deep learning platform that originated at Baidu. This brief introduction
    cost = regression_cost(input= ȳ, label=y)
    outputs(cost)
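 This short configuration shows the basic usage of PaddlePaddle:
 - The first part defines the data input. Normally, PaddlePaddle first gets the data file paths from a file list, then hands them to a user-defined function (such as the `process` function above) that reads and preprocesses them to produce the actual input. Because the input data here is randomly generated and no input files need to be read, an empty list (`empty.list`) is enough.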