Genia TreeBank Beta version

Note that the corpus is still under development, and the scheme may change in the next version.

Scheme

The annotation scheme basically follows the "Bracketing Guidelines for Treebank II Style" manual of the Penn Treebank Project. The following are our own rules. Some examples here are written in PTB format.
  1. XML format is used. For non-null elements, the syntactic category is used as the tag name, e.g.
    <NP>a <ADJP>very sweet </ADJP>apple </NP>
    Null elements are marked with childless (empty) elements whose tag is the corresponding category, e.g. <NP NULL="NONE" ref="i10"/>. The NULL attribute specifies the type of the null element.
    Values: NONE for *, ZERO for 0, QSTN for *?*, T for *T*
    Null complementizers are marked with <COMP NULL="ZERO"/>.
  2. All PTB labels except NX and NAC are used. Phrases that would be labeled NX or NAC in PTB are labeled NP instead.
    e.g.
    PTB: (NP:tasty (NX:(NX:California oranges) and (NX:Fuji apples)))
    -->(NP:tasty (NP:(NP:California oranges) and (NP:Fuji apples)))
  3. Some function tags in PTB are not used. Those used are:
  4. Grammatical function tags (case elements) such as SBJ, PRD, and DTV are attached within the tag name, joined to the category name with a hyphen ("-"), e.g.
    <NP-SBJ>We </NP-SBJ>
  5. Other function tags are marked with attributes, as follows.
    Adverbial
    Attribute name: SEM, attribute value: TMP (i.e. SEM="TMP")
    Others
    Attribute name: FCTN, attribute values:
    Cleft ("It is ... that ...") sentence: CLF
    Headline: HLN
    Vocative: VOC
    Topicalized element: TPC
  6. In gapped structures, the complete constituent corresponding to the gap is used as a template and is assigned an id number (id attribute). The corresponding gapped constituent is labeled with the GAP attribute, whose value is the id number of its template, e.g.
    <S><LST>AB</LST> - <NP-SBJ id="i77"><NP>The production </NP><PP>of <NP><NP><NP>human immunodeficiency virus type 1 </NP><PRN>(<NP>HIV-1</NP>) </PRN></NP>progeny </NP></PP></NP-SBJ><VP>was <VP SYN="COOD"><VP>followed <NP NULL="NONE" ref="i77"/><PP id="i104">in <NP>the U937 promonocytic cell line </NP></PP><PP SEM="TMP">after <NP><NP>stimulation </NP><PP>either with <NP SYN="COOD"><NP>retinoic acid </NP>or <NP>PMA</NP></NP></PP></NP></PP></VP>, and <VP><PP GAP="i104">in <NP>purified human <NP SYN="COOD"><NP>monocytes </NP>and <NP>macrophages</NP></NP></NP></PP></VP></VP></VP>.</S>
  7. Coordination is labeled with the SYN attribute, whose value is "COOD".
  8. References are marked with id numbers. The referent has an "id" attribute, and the referring element has a "ref" attribute whose value is identical to the referent's "id" attribute.
  9. When the annotator is unsure of a bracketing or tag, or finds an error in the original text, a comment is inserted using the following attributes:
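The XML conventions in rules 1 and 4 to 8 can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree to parse the gapped-structure sentence from rule 6 and list its function tags, its null elements (with the PTB notation from rule 1 and the text of the referent they point to), its GAP links, and its coordinations. It is not part of the GENIA distribution: the helper names (split_tag, text_of, analyze) are hypothetical, and it assumes the value for a 0 null element is spelled ZERO, as in the <COMP NULL="ZERO"/> example.

```python
import xml.etree.ElementTree as ET

# PTB notation for each value of the NULL attribute (rule 1).
# Assumption: the value for 0 is spelled "ZERO", as in <COMP NULL="ZERO"/>.
NULL_TO_PTB = {"NONE": "*", "ZERO": "0", "QSTN": "*?*", "T": "*T*"}

# The gapped-structure example from rule 6, split across lines for readability.
SENTENCE = (
    '<S><LST>AB</LST> - <NP-SBJ id="i77"><NP>The production </NP>'
    '<PP>of <NP><NP><NP>human immunodeficiency virus type 1 </NP>'
    '<PRN>(<NP>HIV-1</NP>) </PRN></NP>progeny </NP></PP></NP-SBJ>'
    '<VP>was <VP SYN="COOD"><VP>followed <NP NULL="NONE" ref="i77"/>'
    '<PP id="i104">in <NP>the U937 promonocytic cell line </NP></PP>'
    '<PP SEM="TMP">after <NP><NP>stimulation </NP><PP>either with '
    '<NP SYN="COOD"><NP>retinoic acid </NP>or <NP>PMA</NP></NP></PP></NP></PP>'
    '</VP>, and <VP><PP GAP="i104">in <NP>purified human <NP SYN="COOD">'
    '<NP>monocytes </NP>and <NP>macrophages</NP></NP></NP></PP></VP></VP></VP>.</S>'
)


def split_tag(tag):
    """Split a tag such as 'NP-SBJ' into ('NP', 'SBJ'); see rule 4."""
    category, _, function = tag.partition("-")
    return category, function or None


def text_of(element):
    """Words covered by an element, with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def analyze(xml_text):
    """Collect function tags, null elements, GAP links and coordinations."""
    root = ET.fromstring(xml_text)
    # Referents (rule 8) and gap templates (rule 6) both carry an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if "id" in el.attrib}
    result = {"functions": [], "null": [], "gaps": [], "coordinations": []}
    for el in root.iter():
        category, function = split_tag(el.tag)
        if function:  # function tag inside the tag name, e.g. NP-SBJ
            result["functions"].append((category, function))
        for attr in ("SEM", "FCTN"):  # function tags given as attributes (rule 5)
            if attr in el.attrib:
                result["functions"].append((category, el.get(attr)))
        if "NULL" in el.attrib:  # null element: PTB form plus referent text
            referent = by_id.get(el.get("ref"))
            result["null"].append((
                category,
                NULL_TO_PTB.get(el.get("NULL"), "?"),
                text_of(referent) if referent is not None else None,
            ))
        if "GAP" in el.attrib:  # gapped constituent paired with its template
            template = by_id.get(el.get("GAP"))
            result["gaps"].append((
                text_of(el),
                text_of(template) if template is not None else None,
            ))
        if el.get("SYN") == "COOD":  # coordination (rule 7)
            result["coordinations"].append(text_of(el))
    return result


if __name__ == "__main__":
    for key, items in analyze(SENTENCE).items():
        print(key, items)
```

Text between child elements is stored in each child's tail in ElementTree, which is why text_of relies on itertext rather than on the text attribute alone.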

Notes

Coordination, "COOD" attribute

  1. "COOD" is attached to the parent element that spans the whole coordination.
    PTB: (NP:(NP:a cat) and (NP:three dogs)) --> (NP-COOD:(NP:a cat) and (NP:three dogs))

  2. When coordinated phrases share an object or an adverbial phrase, the coordinated part as a whole is bracketed as a separate element with "COOD" attached, to make its extent clear. Examples:

    I buy and quickly eat oranges.
    PTB: I (VP:(VP:buy (NP *RNR*-1)) and (VP:quickly eat (NP *RNR*-1)) (NP-1:oranges)). --> I (VP:(VP-COOD:(VP:buy (NP *RNR*-1)) and (VP:quickly eat (NP *RNR*-1))) (NP-1:oranges)).
    I went out and did shopping yesterday.
    PTB: I (VP:(VP:went out) and (VP:did shopping) (NP-TMP:yesterday)) --> I (VP:(VP-COOD:(VP:went out) and (VP:did shopping)) (NP-TMP:yesterday))
  3. When single-word elements of the same syntactic category are coordinated, the original PTB does not mark the coordination explicitly; the words are bracketed in a flat structure. Here, however, coordinated single-word elements are also bracketed individually and tagged "COOD". Examples:

    PTB: (NP:John and Mary) --> (NP-COOD:(NP:John) and (NP:Mary))
    PTB: She (VP:smiled and then cried). --> She (VP-COOD:(VP:smiled) and then (VP:cried)).
    PTB: I (VP:buy and eat (NP:oranges)). --> I (VP:(VP-COOD:(VP:buy (NP *RNR*-1)) and (VP:eat (NP *RNR*-1))) (NP-1:oranges)).
    PTB: (NP:a (ADJP:beautiful, smart) cat) --> (NP:a (ADJP-COOD:(ADJP:beautiful), (ADJP:smart)) cat)
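To relate the XML markup to the bracketed notation used in these notes, the sketch below renders a GENIA-style XML element as "(CATEGORY:...)", appending "-COOD" when the element has SYN="COOD". The XML inputs are hand-written encodings of three of the examples above (an assumption, not text taken from the corpus), the function names to_brackets and join_parts are hypothetical, and null elements such as *RNR* are not handled.

```python
import xml.etree.ElementTree as ET


def join_parts(parts):
    """Join tokens with spaces, attaching punctuation to the previous token."""
    out = ""
    for part in parts:
        if out and not part.startswith((",", ";", ".")):
            out += " "
        out += part
    return out


def to_brackets(element):
    """Render an element as '(CAT:...)', adding '-COOD' for SYN="COOD"."""
    label = element.tag + ("-COOD" if element.get("SYN") == "COOD" else "")
    parts = []
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.append(to_brackets(child))
        # Words between children (e.g. "and", ",") live in the child's tail.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return "(" + label + ":" + join_parts(parts) + ")"


# Hypothetical XML encodings of three coordination examples from these notes.
EXAMPLES = [
    '<NP SYN="COOD"><NP>John </NP>and <NP>Mary</NP></NP>',
    '<VP SYN="COOD"><VP>smiled </VP>and then <VP>cried</VP></VP>',
    '<NP>a <ADJP SYN="COOD"><ADJP>beautiful</ADJP>, <ADJP>smart</ADJP></ADJP> cat</NP>',
]

if __name__ == "__main__":
    for xml_text in EXAMPLES:
        print(to_brackets(ET.fromstring(xml_text)))
```

Run on these inputs, the output reproduces the right-hand side of the corresponding examples, which illustrates that SYN="COOD" on the parent element carries the same information as the "-COOD" suffix in the bracketed notation.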

Noun Phrase

  1. NAC and NX are not used.
  2. When a noun phrase consists of a sequence of nouns, its internal structure is not necessarily shown.