GENIA Project

Mining literature for knowledge in molecular biology

The GENIA project seeks to automatically extract useful information from texts written by scientists, to help overcome the problems caused by information overload. Although our methods are customized for the molecular biology domain, we intend the basic methods to be generalisable to knowledge acquisition in other scientific and engineering domains.

We are currently working on the key task of extracting event information about protein interactions. This type of information extraction requires combining many sources of knowledge, which we are now developing. These include a parser, an ontology, a thesaurus and domain dictionaries, as well as supervised learning models.

Recent News

(30 January 2010) 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2010)
An updated Call for Papers for BioTxtM 2010 is now online.
(8 January 2010) The GENIA relation annotation
Efforts are ongoing to annotate relations such as part-of on the GENIA corpus.
(5 June 2009) BioNLP'09 Shared Task Workshop
The BioNLP Shared Task workshop was held in conjunction with the BioNLP NAACL-HLT 2009 workshop.
(24 Mar 2009) The GENIA treebank corpus released
1,999 PubMed abstracts have been annotated with Penn Treebank-style syntactic parses.
(9 Feb 2009) Parsed MEDLINE data download service
Syntactic parses of all MEDLINE abstracts are available online.
(9 Jan 2008) The GENIA event corpus released
1,999 PubMed abstracts have been annotated for the mention of molecular events.
(8 Jan 2008) The homepage renewed.
The GENIA project homepage has been renewed.