* Note: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 1: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 2: Synth-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images in other sizes may behave poorly.
For example, enter the following image and corpus `PaddleOCR`.
@ -116,8 +119,16 @@ In actual application scenarios, it is often necessary to synthesize pictures in
* `CorpusGenerator`:
* `method`:Method of CorpusGenerator,supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used,No other configuration is needed,otherwise you need to set `corpus_file` and `language`.
* `language`:Language of the corpus.
* `corpus_file`: Filepath of the corpus.
* `corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings('\n'). Corpus generator samples one line each time.
Example of corpus file:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
```
We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)), some examples are given below :
@ -130,7 +141,18 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:


You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
The optional parameters of `litmit_type` are [`max`, `min`], and