Merge pull request #1459 from weisy11/dygraph

update style text doc, add corpus file descriptions
release/2.0-rc1-0
MissPenguin 4 years ago committed by GitHub
commit 631fe2ecca
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -116,9 +116,17 @@ In actual application scenarios, it is often necessary to synthesize pictures in
* `CorpusGenerator`
* `method`: Method of CorpusGenerator; supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
* `language`: Language of the corpus.
* `corpus_file`: Filepath of the corpus.
* `corpus_file`: Filepath of the corpus. The corpus file should be a text file, which is split on line endings (`'\n'`). The corpus generator samples one line each time.
Example of a corpus file:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
```
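The split-then-sample behaviour described for `corpus_file` can be sketched roughly as follows. This is a minimal illustration, not the StyleText implementation: the helper names `load_corpus` and `sample_line` are hypothetical, and dropping blank lines and sampling uniformly are assumptions.

```python
import random


def load_corpus(text):
    """Split corpus text on '\n' and drop blank lines (blank-line handling is an assumption)."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def sample_line(corpus, rng=random):
    """Pick one corpus line at random (uniform sampling is an assumption)."""
    return rng.choice(corpus)


# The same four entries as the example corpus file above.
corpus_text = "PaddleOCR\n飞桨文字识别\nStyleText\n风格文本图像数据合成\n"
corpus = load_corpus(corpus_text)
print(sample_line(corpus))
```

Each call to `sample_line` returns one whole line, so every line of the corpus file should be a complete piece of text that you would want rendered in a single synthesized image.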
We provide a general dataset containing Chinese, English and Korean text (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)). Some examples are given below:
<div align="center">

@ -102,7 +102,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
* `CorpusGenerator`
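* `method`: Corpus generation method; `FileCorpus` and `EnNumCorpus` are currently available. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
* `language`: Language of the corpus.
* `corpus_file`: Filepath of the corpus.
* `corpus_file`: Filepath of the corpus. The corpus file should be a text file. The corpus generator first splits the corpus by line, then randomly selects one line each time.
Example of the corpus file format: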
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
...
```
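Because the example mixes English and Chinese entries, it can help to check that a corpus file reads back line by line as expected before pointing `corpus_file` at it. The sketch below makes that check on a temporary file; the file name `example_corpus.txt` is hypothetical and the UTF-8 encoding is an assumption, since the text above does not state which encoding the tool expects.

```python
import os
import random
import tempfile

lines = ["PaddleOCR", "飞桨文字识别", "StyleText", "风格文本图像数据合成"]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; UTF-8 is an assumption, not documented behaviour.
    corpus_file = os.path.join(tmp_dir, "example_corpus.txt")

    # Write one corpus entry per line.
    with open(corpus_file, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

    # Read the file back and split it by line.
    with open(corpus_file, encoding="utf-8") as f:
        loaded = f.read().splitlines()

# Randomly select one line, mirroring the behaviour described above.
picked = random.choice(loaded)
print(picked)
```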
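Style-Text also provides 50,000 general-scene images in Chinese, English and Korean for use as text style images, making it easier to synthesize text images with rich scenes. Some examples are shown below.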
