GENIA Treebank
Seeking linguistic structure of bio-medical text
Overview
Part-of-speech and syntactic tree annotations have been added to the GENIA corpus. The annotation scheme of the GENIA Treebank is based on the Penn Treebank II (PTB) bracketing guidelines (Bies et al., 1995).
Example
Documentation
- Annotation Guidelines
- Tateisi, Yuka and Jun'ichi Tsujii. GENIA Annotation Guidelines for Treebanking. Technical Report (TR-NLP-UT-2006-5). Tsujii Laboratory, University of Tokyo, 2006.
Download
- GENIA Treebank version 1.0
- GENIA_treebank_v1.tar.gz (2,347,058 bytes)
Related Resources
- Mapping table between UID, PMID, and PMCID for the GENIA corpus files.
- UID-PMID-PMCID.lst (374)
- Illes Solt at Budapest University of Technology and Economics has written an XSL transformation to PTB format.
- XML transformation and GENIA treebank in PTB format
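For readers who want to work with the PTB-format version, the following is a minimal sketch of reading one PTB-style bracketed tree in Python. It is not part of the GENIA distribution: the function names `parse_ptb` and `leaves` are hypothetical, the sample sentence is invented for illustration rather than taken from the corpus, and the sketch assumes the converted files use ordinary PTB bracketing with one balanced tree per input string.

```python
def parse_ptb(text):
    """Parse one PTB-style bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. An unlabeled wrapper such as "( (S ...))"
    is returned with an empty-string label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1  # consume "("
        if tokens[pos] in ("(", ")"):
            label = ""  # unlabeled bracket
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return tree


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Invented sample, not taken from the GENIA corpus.
sample = "(S (NP (DT The) (NN protein)) (VP (VBZ binds) (NP (NN DNA))))"
tree = parse_ptb(sample)
print(tree[0])       # root label
print(leaves(tree))  # words in sentence order
```

The nested-tuple representation is chosen only to keep the sketch free of dependencies; a treebank library could be used in its place.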