|
|
|
@ -91,32 +91,31 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker {
|
|
|
|
|
"(int64_t). The number of chunks both in Inference and Label on the "
|
|
|
|
|
"given mini-batch.");
|
|
|
|
|
AddAttr<int>("num_chunk_types",
|
|
|
|
|
"(int). The number of chunk type. See below for details.");
|
|
|
|
|
AddAttr<std::string>(
|
|
|
|
|
"chunk_scheme",
|
|
|
|
|
"(string, default IOB). The labeling scheme indicating "
|
|
|
|
|
"how to encode the chunks. Must be IOB, IOE, IOBES or plain. See below "
|
|
|
|
|
"for details.")
|
|
|
|
|
"The number of chunk types. See the description for details.");
|
|
|
|
|
AddAttr<std::string>("chunk_scheme",
|
|
|
|
|
"The labeling scheme indicating "
|
|
|
|
|
"how to encode the chunks. Must be IOB, IOE, IOBES or "
|
|
|
|
|
"plain. See the description "
|
|
|
|
|
"for details.")
|
|
|
|
|
.SetDefault("IOB");
|
|
|
|
|
AddAttr<std::vector<int>>("excluded_chunk_types",
|
|
|
|
|
"(list<int>) A list including chunk type ids "
|
|
|
|
|
"A list including chunk type ids "
|
|
|
|
|
"indicating chunk types that are not counted. "
|
|
|
|
|
"See below for details.")
|
|
|
|
|
"See the description for details.")
|
|
|
|
|
.SetDefault(std::vector<int>{});
|
|
|
|
|
AddComment(R"DOC(
|
|
|
|
|
For some basics of chunking, please refer to
|
|
|
|
|
‘Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>’.
|
|
|
|
|
'Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
CheckEvalOp computes the precision, recall, and F1-score of chunk detection,
|
|
|
|
|
ChunkEvalOp computes the precision, recall, and F1-score of chunk detection,
|
|
|
|
|
and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.
|
|
|
|
|
Here is a NER example of labeling for these tagging schemes:
|
|
|
|
|
|
|
|
|
|
Li Ming works at Agricultural Bank of China in Beijing.
|
|
|
|
|
IO: I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
|
|
|
|
|
IOB: B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
|
|
|
|
|
IOE: I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
|
|
|
|
|
IOBES: B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC
|
|
|
|
|
|
|
|
|
|
Li Ming works at Agricultural Bank of China in Beijing.
|
|
|
|
|
IO I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
|
|
|
|
|
IOB B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
|
|
|
|
|
IOE I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
|
|
|
|
|
IOBES B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC
|
|
|
|
|
|
|
|
|
|
There are three chunk types (named entity types) including PER (person), ORG (organization)
|
|
|
|
|
and LOC (location), and we can see that the labels have the form <tag type>-<chunk type>.
|
|
|
|
@ -124,31 +123,31 @@ and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chun
|
|
|
|
|
Since the calculations actually use label ids rather than labels, extra attention
|
|
|
|
|
should be paid when mapping labels to ids to make ChunkEvalOp work. The key point
|
|
|
|
|
is that the listed equations are satisfied by ids.
|
|
|
|
|
|
|
|
|
|
tag_type = label % num_tag_type
|
|
|
|
|
chunk_type = label / num_tag_type
|
|
|
|
|
|
|
|
|
|
tag_type = label % num_tag_type
|
|
|
|
|
chunk_type = label / num_tag_type
|
|
|
|
|
|
|
|
|
|
where `num_tag_type` is the number of tag types in the tagging scheme, `num_chunk_type`
|
|
|
|
|
is the number of chunk types, and `tag_type` gets its value from the following table.
|
|
|
|
|
|
|
|
|
|
Scheme Begin Inside End Single
|
|
|
|
|
plain 0 - - -
|
|
|
|
|
IOB 0 1 - -
|
|
|
|
|
IOE - 0 1 -
|
|
|
|
|
IOBES 0 1 2 3
|
|
|
|
|
|
|
|
|
|
Scheme Begin Inside End Single
|
|
|
|
|
plain 0 - - -
|
|
|
|
|
IOB 0 1 - -
|
|
|
|
|
IOE - 0 1 -
|
|
|
|
|
IOBES 0 1 2 3
|
|
|
|
|
|
|
|
|
|
Continuing the NER example, assume the tagging scheme is IOB and the chunk types are ORG,
|
|
|
|
|
PER and LOC. To satisfy the above equations, the label map can be like this:
|
|
|
|
|
|
|
|
|
|
B-ORG 0
|
|
|
|
|
I-ORG 1
|
|
|
|
|
B-PER 2
|
|
|
|
|
I-PER 3
|
|
|
|
|
B-LOC 4
|
|
|
|
|
I-LOC 5
|
|
|
|
|
O 6
|
|
|
|
|
B-ORG 0
|
|
|
|
|
I-ORG 1
|
|
|
|
|
B-PER 2
|
|
|
|
|
I-PER 3
|
|
|
|
|
B-LOC 4
|
|
|
|
|
I-LOC 5
|
|
|
|
|
O 6
|
|
|
|
|
|
|
|
|
|
It’s not hard to verify the equations noting that the num of chunk types
|
|
|
|
|
It's not hard to verify the equations, noting that the number of chunk types
|
|
|
|
|
is 3 and the number of tag types in the IOB scheme is 2. For example, the label
|
|
|
|
|
id of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of
|
|
|
|
|
I-LOC is 2, which is consistent with the results from the equations.
|
|
|
|
|