GENIA Project
Mining literature for knowledge in molecular biology
The GENIA project seeks to automatically extract useful information from texts written by scientists, to help overcome the problems caused by information overload. Although our methods are customized for the molecular biology domain, we intend the basic methods to be generalisable to knowledge acquisition in other scientific and engineering domains.
We are currently working on the key task of extracting event information about protein interactions. This type of information extraction draws on many sources of knowledge, which we are now developing. These include a parser, an ontology, a thesaurus and domain dictionaries, as well as supervised learning models.
Events
- (27 Sep. 2010 - 10 Mar. 2011) BioNLP Shared Task 2011 (BioNLP-ST'11)
Recent News
- (23 Jun. 2011) The GENIA mTOR pathway corpus
- Corpus annotated for pathway events with a focus on Dissociation, in BioNLP Shared Task'11 format.
- (23 Jun. 2011) The GENIA "Exhaustive PTM" corpus
- Corpus annotated for nearly 40 types of protein post-translational modification events, in BioNLP Shared Task'11 EPI task format.
- (26 Jul. 2010) The GENIA DNA methylation corpus
- Corpus annotated for DNA methylation, in BioNLP'09 Shared Task format. Presented in Ohta et al. 2010 (SMBM)
- (12 Oct. 2010) BioDiscourseRelations corpus released
- The BioDiscourseRelations project has released a corpus of 24 full-text articles annotated for discourse structure.
- (15 Jul. 2010) The GENIA Post-Translational Modification annotation
- Corpus annotated for four novel post-translational modification event types, in BioNLP'09 Shared Task format. Presented in Ohta et al. 2010 (BioNLP)
- (15 Jul. 2010) The GENIA T4SS annotation
- Corpus of T4SS domain texts annotated for gene/gene product entities and their high-level process associations, in GENIA Event format. Presented in Pyysalo et al. 2010 (BioNLP)
- (30 Jan. 2010) 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2010)
- An updated Call for Papers of BioTxtM 2010 is now online.
- (8 Jan. 2010) The GENIA relation annotation
- Ongoing efforts to annotate relations such as part-of in the GENIA corpus.
- (5 Jun. 2009) BioNLP'09 Shared Task Workshop
- The BioNLP Shared Task workshop was held in conjunction with the BioNLP workshop at NAACL-HLT 2009.
- (24 Mar. 2009) The GENIA treebank corpus released
- 1,999 PubMed abstracts have been annotated with Penn Treebank-style syntactic parses.
- (9 Feb. 2009) Parsed MEDLINE data download service
- Syntactic parses of all MEDLINE abstracts are available online.
- (9 Jan. 2008) The GENIA event corpus released
- 1,999 PubMed abstracts have been annotated for mentions of molecular events.
- (8 Jan. 2008) Homepage renewed
- The GENIA project homepage has been renewed.