This is the report on the shared task of biomedical entity recognition that was held from March to April 2004. The results were presented at the joint workshop of BioNLP/NLPBA 2004.
The task aims to identify and classify technical terms in the domain of molecular biology that correspond to instances of concepts that are of interest to biologists.
The training data used in the task came from the GENIA corpus, version 3.02. The corpus was formed from a controlled search on MEDLINE using the MeSH terms 'human', 'blood cells' and 'transcription factors'. From this search, 2,000 abstracts were selected and hand-annotated according to a small taxonomy of 48 classes based on a chemical classification. Of these, 36 terminal classes were used to annotate the GENIA corpus. For the shared task, however, we decided to simplify the 36 classes and used only the classes protein, DNA, RNA, cell line and cell type. The first three incorporate several subclasses from the original taxonomy, while the last two are included to make the task realistic for post-processing by a potential template-filling application. The abstracts in the training set were published between 1990 and 1999.
For testing, we used a new annotated collection of MEDLINE abstracts from the GENIA project: 404 abstracts annotated for the same entity classes. Most of the test abstracts were retrieved with the same set of MeSH terms, and their publication years range from 1978 to 2001. To see the effect of publication year, the test set was roughly divided into four subsets: the 1978-1989 set (an older period from the viewpoint of models trained on the training set), the 1990-1999 set (the same period as the training set), the 2000-2001 set (a newer period than the training set) and the S/1998-2001 set (roughly a newer period in a super-domain). The abstracts of this last subset were retrieved with the MeSH terms 'blood cells' and 'transcription factors' only (without 'human'). The S/1998-2001 set includes the whole 2000-2001 set. The following table shows the size of the data sets.
| Data set | # of abstracts | # of sentences | # of tokens |
|---|---|---|---|
| Training set | 2,000 | 20,546 (10.27/abs) | 472,006 (236.00/abs) (22.97/sen) |
| Test set: total | 404 | 4,260 (10.54/abs) | 96,780 (239.55/abs) (22.72/sen) |
| Test set: 1978-1989 | 104 | 991 (9.53/abs) | 22,320 (214.62/abs) (22.52/sen) |
| Test set: 1990-1999 | 106 | 1,115 (10.52/abs) | 25,080 (236.60/abs) (22.49/sen) |
| Test set: 2000-2001 | 130 | 1,452 (11.17/abs) | 33,380 (256.77/abs) (22.99/sen) |
| Test set: S/1998-2001 | 204 | 2,254 (11.05/abs) | 51,628 (253.08/abs) (22.91/sen) |
To reduce the annotation task to a simple linear sequence analysis problem, embedded structures were removed, leaving only the outermost structures (i.e., the longest tag sequences). Consequently, a group of coordinated entities involving ellipsis is annotated as one structure, as in the following example:
... in [lymphocytes] and [T- and B- lymphocyte] count in ...
In this example, "T- and B- lymphocyte" is annotated as one structure but involves two entity names, "T-lymphocyte" and "B-lymphocyte", whereas "lymphocytes" is annotated as one structure and involves exactly one entity name.
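The flattening step can be sketched as follows. This is a minimal illustration, not the procedure used to prepare the GENIA data: spans are assumed to be (start, end, label) token offsets with an exclusive end, the embedded span for "B- lymphocyte" and the label string cell_type are hypothetical, and the IOB2 encoding is just one common way to state the result as a linear sequence-labelling problem.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def outermost_spans(spans: List[Span]) -> List[Span]:
    """Keep only spans that are not strictly contained in another span."""
    kept = []
    for s_start, s_end, label in spans:
        embedded = any(
            o_start <= s_start and s_end <= o_end and (o_start, o_end) != (s_start, s_end)
            for o_start, o_end, _ in spans
        )
        if not embedded:
            kept.append((s_start, s_end, label))
    return sorted(kept)


def to_iob2(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as one IOB2 tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


# "... in [lymphocytes] and [T- and B- lymphocyte] count in ..."
tokens = ["in", "lymphocytes", "and", "T-", "and", "B-", "lymphocyte", "count", "in"]
# The embedded span (5, 7) for "B- lymphocyte" is a hypothetical nested annotation.
spans = [(1, 2, "cell_type"), (3, 7, "cell_type"), (5, 7, "cell_type")]
flat = outermost_spans(spans)
print(list(zip(tokens, to_iob2(len(tokens), flat))))
```

After flattening, the coordinated phrase becomes a single four-token chunk, so a tagger never has to represent the two entity names it contains.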
Results are given as F-scores computed with a modified version of the CoNLL evaluation script, where F = (2PR)/(P+R), P denotes precision and R recall. P is the ratio of the number of correctly found NE chunks to the number of found NE chunks, and R is the ratio of the number of correctly found NE chunks to the number of true NE chunks. The script outputs three sets of F-scores, according to exact boundary matching, right boundary matching and left boundary matching. In right boundary matching, only the right boundaries of entities are considered, without matching the left boundaries; left boundary matching is the reverse.
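A minimal sketch of the three matching modes, assuming chunks are (start, end, label) token spans with an exclusive end and that the class label must agree in every mode. Neither assumption is taken from the modified CoNLL script, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical chunk format: (start, end_exclusive, label).


def boundary_key(chunk, mode):
    """Project a chunk onto the parts that are compared under a matching mode."""
    start, end, label = chunk
    if mode == "exact":
        return (start, end, label)
    if mode == "left":   # left boundary only; right boundary ignored
        return (start, label)
    if mode == "right":  # right boundary only; left boundary ignored
        return (end, label)
    raise ValueError(f"unknown mode: {mode}")


def f_score(p, r):
    """Harmonic mean F = 2PR / (P + R), taken as 0 when P + R is 0."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def evaluate(gold, predicted, mode="exact"):
    """Return (precision, recall, F) for one matching mode."""
    gold_keys = Counter(boundary_key(c, mode) for c in gold)
    pred_keys = Counter(boundary_key(c, mode) for c in predicted)
    correct = sum((gold_keys & pred_keys).values())      # correctly found chunks
    p = correct / len(predicted) if predicted else 0.0   # divided by found chunks
    r = correct / len(gold) if gold else 0.0             # divided by true chunks
    return p, r, f_score(p, r)


gold = [(1, 2, "cell_type"), (3, 7, "cell_type"), (10, 12, "protein")]
pred = [(1, 2, "cell_type"), (5, 7, "cell_type"), (9, 12, "protein")]
for mode in ("exact", "left", "right"):
    print(mode, evaluate(gold, pred, mode))
```

On these toy chunks, exact and left matching each give P = R = F = 1/3, while right matching gives 1.0 because every predicted right boundary is correct. As a check on the formula, f_score(76.0, 69.4) is about 72.55, consistent with the Total F-score of 72.6 reported for [Zho04].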
The following table lists the entity recognition performance of each participating system on each test set. Each cell gives recall / precision / F-score (%).
| System | 1978-1989 set | 1990-1999 set | 2000-2001 set | S/1998-2001 set | Total |
|---|---|---|---|---|---|
| [Zho04] | 75.3 / 69.5 / 72.3 | 77.1 / 69.2 / 72.9 | 75.6 / 71.3 / 73.8 | 75.8 / 69.5 / 72.5 | 76.0 / 69.4 / 72.6 |
| [Fin04] | 66.9 / 70.4 / 68.6 | 73.8 / 69.4 / 71.5 | 72.6 / 69.3 / 70.9 | 71.8 / 67.5 / 69.6 | 71.6 / 68.6 / 70.1 |
| [Set04] | 63.6 / 71.4 / 67.3 | 72.2 / 68.7 / 70.4 | 71.3 / 69.6 / 70.5 | 71.3 / 68.8 / 70.1 | 70.3 / 69.3 / 69.8 |
| [Son04] | 60.3 / 66.2 / 63.1 | 71.2 / 65.6 / 68.2 | 69.5 / 65.8 / 67.6 | 68.3 / 64.0 / 66.1 | 67.8 / 64.8 / 66.3 |
| [Zha04] | 63.2 / 60.4 / 61.8 | 72.5 / 62.6 / 67.2 | 69.1 / 60.2 / 64.7 | 69.2 / 60.3 / 64.4 | 69.1 / 61.0 / 64.8 |
| [Rös04] | 59.2 / 60.3 / 59.8 | 70.3 / 61.8 / 65.8 | 68.4 / 61.5 / 64.8 | 68.3 / 60.4 / 64.1 | 67.4 / 61.0 / 64.0 |
| [Par04] | 62.8 / 55.9 / 59.2 | 70.3 / 61.4 / 65.6 | 65.1 / 60.4 / 62.7 | 65.9 / 59.7 / 62.7 | 66.5 / 59.8 / 63.0 |
| [Lee04] | 42.5 / 42.0 / 42.2 | 52.5 / 49.1 / 50.8 | 53.8 / 50.9 / 52.3 | 52.3 / 48.1 / 50.1 | 50.8 / 47.6 / 49.1 |
| BL | 47.1 / 33.9 / 39.4 | 56.8 / 45.5 / 50.5 | 51.7 / 46.3 / 48.8 | 52.6 / 46.0 / 49.1 | 52.6 / 43.6 / 47.7 |
The baseline model (BL) uses lists of the entities of each class collected from the training set and performs a longest-match search for these entities over the test set. Ties are broken using the frequency of each entity with each class.
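The baseline can be sketched as a greedy left-to-right dictionary lookup. This is an illustration under stated assumptions, not the actual BL implementation: the lexicon maps each entity token sequence seen in training to a count of its classes, a string seen with several classes gets its most frequent class (our reading of the tie-breaking rule), and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Lexicon = Dict[Tuple[str, ...], Counter]


def build_lexicon(entities: Iterable[Tuple[Sequence[str], str]]) -> Lexicon:
    """Count how often each entity token sequence was annotated with each class."""
    lexicon: Lexicon = {}
    for tokens, label in entities:
        lexicon.setdefault(tuple(tokens), Counter())[label] += 1
    return lexicon


def longest_match(tokens: Sequence[str], lexicon: Lexicon) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right longest-match lookup; returns (start, end_exclusive, label) chunks."""
    max_len = max((len(k) for k in lexicon), default=0)
    chunks = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lexicon:
                # Hypothetical tie-break: the class seen most often with this string.
                label = lexicon[candidate].most_common(1)[0][0]
                chunks.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; move on one token
    return chunks


# Hypothetical training annotations.
training = [
    (["IL-2"], "protein"),
    (["IL-2"], "protein"),
    (["IL-2"], "DNA"),
    (["IL-2", "gene"], "DNA"),
    (["T", "cells"], "cell_type"),
]
lexicon = build_lexicon(training)
print(longest_match(["IL-2", "gene", "expression", "in", "T", "cells"], lexicon))
```

Here "IL-2 gene" wins over the shorter "IL-2" because the longer match is tried first, while a bare "IL-2" would be labelled protein (2 occurrences) rather than DNA (1). Counter.most_common keeps first-seen order among equal counts, so an exact frequency tie falls back to the class seen first.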
[Zho04] GuoDong Zhou and Jian Su, "Exploring Deep Knowledge Resources in Biomedical Name Recognition", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Fin04] Jenny Finkel, Shipra Dingare, Huy Nguyen, Malvina Nissim, Gail Sinclair and Christopher Manning, "Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Set04] Burr Settles, "Biomedical Named Entity Recognition Using Conditional Random Fields and Novel Feature Sets", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Son04] Yu Song, Eunju Kim, Gary Geunbae Lee and Byoung-kee Yi, "POSBIOTM-NER in the shared task of BioNLP/NLPBA 2004", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Zha04] Shaojun Zhao, "Name Entity Recognition in Biomedical Text using a HMM model", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Rös04] Marc Rössler, "Adapting an NER-System for German to the Biomedical Domain", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Par04] Kyung-Mi Park, Seon-Ho Kim, Do-Gil Lee and Hae-Chang Rim, "Boosting Lexical Knowledge for Biomedical Named Entity Recognition", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
[Lee04] Chih Lee, Wen-Juan Hou and Hsin-Hsi Chen, "Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).
• last modification made on 14 October 2004 by Jin-Dong Kim.
• workshop homepage : http://www.genisis.ch/~natlang/JNLPBA04/
• shared task homepage : http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm