
The GENIA Ontology

Ontologies have been developed in the biomedical sciences for several applications. Such ontologies include conceptual hierarchies for databases covering diseases and drug names. Construction of a more general ontology (e.g. Gene Ontology, BioCon Knowledge Base of the TAMBIS Project) is being attempted by several groups interested in interconnecting databases under a uniform view.

The GENIA ontology is intended to be a formal model of cell signaling reactions in humans. It is to be used as a basis for thesauri and semantic dictionaries in natural language processing applications.

Another use of the GENIA ontology is to provide the basis for an integrated view of multiple databases, including CSNDB, which was developed at the National Institute of Health Sciences.

The current version of the GENIA ontology, a taxonomy of some entities involved in reactions (Figure 1), was developed as the semantic classification used in the GENIA corpus. The links in the figure point to notes on each class, which serve as scope notes for annotation. The notes also include examples of entities that belong to the class.
+-+-source-+-natural-+-organism-+-multi-cell organism
  |        |         |          +-mono-cell organism
  |        |         |          +-virus
  |        |         +-body part
  |        |         +-tissue
  |        |         +-cell type
  |        |         +-cell component
  |        |         +-other (natural source)
  |        +-artificial-+-cell line
  |                     +-other (artificial source)
  +-substance-+-compound-+-organic-+-amino acid-+-protein-+-protein family or group
  |           |          |         |            |         +-protein complex
  |           |          |         |            |         +-individual protein molecule
  |           |          |         |            |         +-subunit of protein complex
  |           |          |         |            |         +-substructure of protein
  |           |          |         |            |         +-domain or region of protein
  |           |          |         |            +-peptide
  |           |          |         |            +-amino acid monomer
  |           |          |         +-nucleic acid-+-DNA-+-DNA family or group
  |           |          |         |              |     +-individual DNA molecule
  |           |          |         |              |     +-domain or region of DNA
  |           |          |         |              +-RNA-+-RNA family or group
  |           |          |         |              |     +-individual RNA molecule
  |           |          |         |              |     +-domain or region of RNA
  |           |          |         |              +-polynucleotide
  |           |          |         |              +-nucleotide
  |           |          |         +-lipid-+-steroid
  |           |          |         +-carbohydrate
  |           |          |         +-other (organic compounds)
  |           |          +-inorganic
  |           +-atom
  +-other

Figure 1. The current GENIA Ontology
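
For readers who want to work with the taxonomy programmatically, the hierarchy in Figure 1 can be transcribed as a nested data structure. The following is a minimal sketch in Python, not an official GENIA resource: the dictionary layout and the helper names `path_to` and `leaves` are illustrative assumptions, and only the class labels come from the figure.

```python
# Figure 1 transcribed as nested dicts: each key is a class label and each
# value holds its subclasses. The layout and the helper names below are
# illustrative assumptions, not part of the GENIA distribution.
GENIA_TAXONOMY = {
    "source": {
        "natural": {
            "organism": {
                "multi-cell organism": {},
                "mono-cell organism": {},
                "virus": {},
            },
            "body part": {},
            "tissue": {},
            "cell type": {},
            "cell component": {},
            "other (natural source)": {},
        },
        "artificial": {
            "cell line": {},
            "other (artificial source)": {},
        },
    },
    "substance": {
        "compound": {
            "organic": {
                "amino acid": {
                    "protein": {
                        "protein family or group": {},
                        "protein complex": {},
                        "individual protein molecule": {},
                        "subunit of protein complex": {},
                        "substructure of protein": {},
                        "domain or region of protein": {},
                    },
                    "peptide": {},
                    "amino acid monomer": {},
                },
                "nucleic acid": {
                    "DNA": {
                        "DNA family or group": {},
                        "individual DNA molecule": {},
                        "domain or region of DNA": {},
                    },
                    "RNA": {
                        "RNA family or group": {},
                        "individual RNA molecule": {},
                        "domain or region of RNA": {},
                    },
                    "polynucleotide": {},
                    "nucleotide": {},
                },
                "lipid": {"steroid": {}},
                "carbohydrate": {},
                "other (organic compounds)": {},
            },
            "inorganic": {},
        },
        "atom": {},
    },
    "other": {},
}


def path_to(name, tree=GENIA_TAXONOMY, prefix=()):
    """Return the tuple of classes from the top level down to `name`, or None."""
    for label, children in tree.items():
        here = prefix + (label,)
        if label == name:
            return here
        found = path_to(name, children, here)
        if found is not None:
            return found
    return None


def leaves(tree=GENIA_TAXONOMY):
    """Yield every class that has no subclasses."""
    for label, children in tree.items():
        if children:
            yield from leaves(children)
        else:
            yield label
```

Calling `path_to("DNA")` walks down from "substance" to "DNA", which is the chain of superclasses an annotation tool would need in order to treat a DNA term as a nucleic acid or as an organic compound.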

Notes on Classes of The GENIA Ontology

Substance

The entities under the "substance" node are the substances involved in biochemical reactions. In this taxonomy, substances are classified according to their chemical characteristics rather than their biological roles. This is because, for annotation work, the classes should be as mutually exclusive and as stably defined as possible to ease the task. The chemical classification of a substance is largely independent of the biological context in which it appears; it is therefore more stably defined and can easily be extended into other ontologies.

compound

organic

amino acid

An amino acid molecule or a compound that consists of amino acids.

protein

Proteins include protein groups, families, molecules, complexes, and substructures.

protein family or group

A family or a group of proteins, e.g., STATs

protein complex

A protein complex, e.g., RNA polymerase II. The class includes conjugated proteins such as lipoproteins and glycoproteins.

individual protein molecule

An individual member of a group of non-complex proteins, e.g., STAT1, STAT2, STAT3, or a (non-complex) protein not regarded as a member of a particular group.

subunit of protein complex

A monomer in a complex, e.g., RNA polymerase II alpha subunit.

substructure of protein

A secondary structure or a combination of secondary structures, e.g., leucine-zipper, zinc-finger, alpha-helix, beta-sheet, helix-loop-helix

domain or region of protein

A tertiary structure that is supposed to have a particular function, e.g., SH2, SH3.

peptide

A peptide, e.g., peptide hormone, 15 amino acids, 18-20 residue-long peptide fragment

amino acid monomer

An amino acid monomer, e.g., tyrosine, serine, tyr, ser

nucleic acid

A nucleic acid molecule or a compound that consists of nucleic acids.

DNA

DNAs include DNA groups, families, molecules, domains, and regions.

DNA family or group

A family or a group of DNAs, e.g., myc family genes, rel family genes

individual DNA molecule

An individual member of a family or a group of DNAs, e.g., AP-1/c-jun expression vector, AP2 cDNA

domain or region of DNA

A substructure of a DNA molecule that is supposed to have a particular function, such as a gene, e.g., c-jun gene, promoter region, Sp1 site, CA repeat. This class also includes a base sequence that has a particular function.

RNA

RNAs include RNA groups, families, molecules, domains, and regions.

RNA family or group

A family or a group of RNAs, e.g., tRNAs, viral RNA, HIV mRNA

individual RNA molecule

An individual molecule of RNA, e.g., globin mRNA, Oct-T1 transcript

domain or region of RNA

A domain or a region of RNA, e.g., polyA site, alternative splicing site

polynucleotide

Polynucleotides include primers and synthetic DNA fragments.

nucleotide

An individual nucleotide, e.g., guanine, thymidine, uridine, ATP, GTP

lipid

steroid

carbohydrate

other organic compounds

inorganic compounds

atom

Source

Sources are biological locations where substances are found and where their reactions take place, such as human (an organism), liver (a tissue), leukocyte (a cell), membrane (a sub-location of a cell), or HeLa (a cultured cell line). Organisms are further classified into multi-cell organisms, mono-cell organisms other than viruses, and viruses. Within a multi-cell organism, tissues, cells, and sub-locations are interrelated by a 'part-of' relation, but that relation is not shown in Figure 1.
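
The part-of relation mentioned above is separate from the class hierarchy of Figure 1, so it would need its own set of links. The sketch below shows one way such links might be recorded and followed. The specific chain (cell component, cell type, tissue, multi-cell organism) is an assumption made for illustration; the ontology does not publish these links, and the names `PART_OF` and `containers` are hypothetical.

```python
# Hypothetical part-of links between source classes (part -> whole).
# The text only states that these classes are related by part-of; the exact
# chain below is an assumption made for this example.
PART_OF = {
    "cell component": "cell type",
    "cell type": "tissue",
    "tissue": "multi-cell organism",
}


def containers(cls):
    """Return every class that `cls` is transitively part of, nearest first."""
    result = []
    while cls in PART_OF:
        cls = PART_OF[cls]
        result.append(cls)
    return result
```

Keeping these links in a separate mapping leaves the is-a taxonomy untouched, which matches the way Figure 1 omits them.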

natural source

organism

Organisms include multi-cell organisms, mono-cell organisms, and viruses.

multi-cell organism

A multi-cell organism, e.g., human, mouse

mono-cell organism

A mono-cell organism other than viruses, e.g., E. coli, yeast

virus

A virus, e.g., HIV, HTLV, EBV

body part

A body part, e.g., central nervous system, immune system, blood

tissue

A tissue, e.g., peripheral blood, lymphoid tissue, vascular endothelium

cell type

A cell type, e.g., T-lymphocyte, T cell, astrocyte, fibroblast

cell component

A part of cells that has a particular function, e.g., nucleus, cytoplasm

other (natural source)

artificial source

Cultured, immortalized, or otherwise artificially processed sources.

cell line

The class includes cell strains and established cell cultures, e.g., HeLa cell, NIH 3T3, lymphoma line, human bone marrow culture

other (artificial source)

other

In the GENIA corpus, terms that are not categorized as sources or substances may be marked up with <subClassOf resource="GENIA#other_names"/>. These terms represent entities that play important roles in biological reactions but are not yet fully classified in the GENIA ontology. We will collect these terms and classify them to further enhance the ontology.
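
As a rough illustration of how a tool might read this markup, the sketch below parses a single subClassOf element with Python's standard xml.etree.ElementTree module and splits its resource attribute at the "#". How the element is embedded in the corpus files is not described here, so the sketch handles only the element itself; the function name `class_of` is hypothetical.

```python
import xml.etree.ElementTree as ET


def class_of(element_text):
    """Split a subClassOf resource such as 'GENIA#other_names' into
    (ontology, class_name). Only the single element is parsed; the
    surrounding corpus format is deliberately not assumed."""
    elem = ET.fromstring(element_text)
    if elem.tag != "subClassOf":
        raise ValueError("expected a subClassOf element, got " + elem.tag)
    ontology, _, class_name = elem.get("resource", "").partition("#")
    return ontology, class_name


# The element quoted in the text above.
print(class_of('<subClassOf resource="GENIA#other_names"/>'))
```

Terms whose class name comes back as other_names are the ones still waiting to be placed in the taxonomy.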