@ -91,32 +91,31 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker {
"(int64_t). The number of chunks both in Inference and Label on the "
"given mini-batch.");
AddAttr<int>("num_chunk_types",
"(int). The number of chunk type. See below for details.");
AddAttr<std::string>(
"chunk_scheme",
"(string, default IOB). The labeling scheme indicating "
"how to encode the chunks. Must be IOB, IOE, IOBES or plain. See below "
"The number of chunk types. See the description for details.");
AddAttr<std::string>("chunk_scheme",
"The labeling scheme indicating "
"how to encode the chunks. Must be IOB, IOE, IOBES or "
"plain. See the description "
"for details.")
.SetDefault("IOB");
AddAttr<std::vector<int>>("excluded_chunk_types",
"(list<int>) A list including chunk type ids "
"A list including chunk type ids "
"indicating chunk types that are not counted. "
"See below for details.")
"See the description for details.")
.SetDefault(std::vector<int>{});
AddComment(R"DOC(
For some basics of chunking, please refer to
‘Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>’.
'Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>'.
CheckEvalOp computes the precision, recall, and F1-score of chunk detection,
ChunkEvalOp computes the precision, recall, and F1-score of chunk detection,
and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.
Here is an NER example of labeling for these tagging schemes:
Li Ming works at Agricultural Bank of China in Beijing.
IO: I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
IOB: B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
IOE: I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
IOBES: B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC
IO I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
IOB B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
IOE I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
IOBES B-PER E-PER O O B-ORG I-ORG I-ORG E-ORG O S-LOC
There are three chunk types (named entity types), including PER (person), ORG (organization)
and LOC (location), and we can see that the labels have the form <tag type>-<chunk type>.
@ -148,7 +147,7 @@ PER and LOC. To satisfy the above equations, the label map can be like this:
I-LOC 5
O 6
It’s not hard to verify the equations noting that the num of chunk types
It's not hard to verify the equations, noting that the number of chunk types
is 3 and the number of tag types in the IOB scheme is 2. For example, the label
id of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of
I-LOC is 2, which is consistent with the results from the equations.
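
The I-LOC check above can be reproduced with a few lines of C++. This is only a sketch: the equations themselves are not visible in this hunk, so it assumes they are `tag_type = label % num_tag_types` and `chunk_type = label / num_tag_types`, which matches the numbers quoted in the text. `DecodedLabel` and `DecodeLabel` are hypothetical names used for illustration, not part of the operator.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of ChunkEvalOp.
struct DecodedLabel {
  int tag_type;    // index of the tag within the scheme (I is 1 in the IOB example)
  int chunk_type;  // id of the chunk type (LOC is 2 in the example)
};

// Splits a label id into its tag type and chunk type.
// Assumption: tag_type = label % num_tag_types and
//             chunk_type = label / num_tag_types.
inline DecodedLabel DecodeLabel(int label, int num_tag_types) {
  assert(label >= 0 && num_tag_types > 0);
  return DecodedLabel{label % num_tag_types, label / num_tag_types};
}
```

With the IOB scheme (2 tag types), `DecodeLabel(5, 2)` yields tag type 1 and chunk type 2, the values stated for I-LOC. Under the same assumption, the `O` label (id 6) decodes to chunk type 3, which equals the number of chunk types and so falls outside the valid chunk type ids.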