The University of Tokyo
Department of Computer Science

We are working on a joint project, U-Compare.org, in which all of our tools are integrated. Please visit the U-Compare site for details. If you have any further questions or would like to test our beta version, contact Yoshinobu Kano, who is in charge of UIMA-related issues in the Tsujii Lab at the University of Tokyo and is the project lead of the U-Compare project.

The content of this page is obsolete and will be thoroughly revised after the first public release of the U-Compare system.
Use of the software listed on this page is discouraged until that revision is done.

Overview

UIMA, the Unstructured Information Management Architecture introduced by IBM, provides a framework for developing, testing and executing NLP components. UIMA makes components easier to share because each component comes with a well-defined type system and a descriptor. All descriptors are written in XML.

Tsujii Lab's UIMA repository is a collection of UIMA-compliant NLP tools developed by Tsujii Lab. It includes:

Note: The original versions of these components can be downloaded from the Tsujii Lab download pages.

How to install

1. Install Java 1.5.x

Make sure that the Java 1.5 runtime is installed and that $JAVA_HOME points to its directory, e.g. C:/Program Files/Java/jre1.5.x/. (Note: Do not put quotation marks around the path on Windows.)

2. Install Apache UIMA SDK 2.0

Set $UIMA_HOME to the directory where Apache UIMA is installed, e.g. $UIMA_HOME=C:/apache-uima.
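The two environment variables from steps 1 and 2 can be checked before going further. The following is a minimal sketch, not part of the Tsujii Lab package: the class name EnvCheck and the method problemWith are hypothetical, and the only assumption is that both variables should name existing directories and should not be wrapped in quotation marks.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper (not shipped with the package): checks that the
// environment variables named in the install steps point to real directories.
class EnvCheck {

    // Returns null when the variable looks usable, otherwise a short description of the problem.
    static String problemWith(String name, Map<String, String> env) {
        String value = env.get(name);
        if (value == null || value.trim().length() == 0) {
            return name + " is not set";
        }
        // The install notes warn against quoting the path on Windows.
        if (value.startsWith("\"") || value.endsWith("\"")) {
            return name + " contains quotation marks: " + value;
        }
        if (!new File(value).isDirectory()) {
            return name + " does not point to a directory: " + value;
        }
        return null;
    }

    public static void main(String[] args) {
        String[] required = { "JAVA_HOME", "UIMA_HOME" };
        for (int i = 0; i < required.length; i++) {
            String problem = problemWith(required[i], System.getenv());
            System.out.println(problem == null ? required[i] + " looks OK" : problem);
        }
    }
}
```

Passing the environment in as a map, rather than reading it inside the method, keeps the check easy to try out with made-up values.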

3. Download and install UIMA Package
3.1 SOAP-client package

Currently, Tsujii Lab's server supports two UIMA services (remote annotators), listed below:

COMPONENT                      DESCRIPTOR
Enju with GENIA POS tagger     desc/aggregate/SOAPGENIATaggerAndEnju-MoriV.x
Enju with OpenNLP POS tagger   desc/aggregate/SOAPOpenNLPTaggerAndEnju-MoriV.xml

3.2 Local packages

How to use

Each component has its own UIMA component descriptor, which explicitly describes the type system used as well as the input and output types. All the descriptor files are contained in the /desc folder.
There are 5 primitive annotators, whose descriptors are listed below:

COMPONENT     DESCRIPTOR
Enju          desc/primitive/tsujiilab/Enju.xml
MoriV         desc/primitive/tsujiilab/MoriV.xml
GENIATagger   desc/primitive/tsujiilab/GENIATagger.xml
SLTagger      desc/primitive/tsujiilab/SLTagger.xml
NEDetector    desc/primitive/tsujiilab/NEDetector.xml
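Because the descriptors are plain XML, the input and output types a component declares can be read with the XML parser bundled with Java, without starting UIMA. The sketch below assumes the usual layout of a UIMA analysis engine descriptor, in which type names appear as type elements under inputs and outputs. The class DescriptorTypes, the embedded sample and the org.example type names are invented for illustration and are not taken from the Tsujii Lab descriptors.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the input and output types declared in a descriptor.
class DescriptorTypes {

    // Invented sample shaped like a primitive analysis engine descriptor.
    static final String SAMPLE =
          "<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
        + "<primitive>true</primitive>"
        + "<analysisEngineMetaData><capabilities><capability>"
        + "<inputs><type>org.example.Sentence</type></inputs>"
        + "<outputs><type allAnnotatorFeatures=\"true\">org.example.Token</type></outputs>"
        + "</capability></capabilities></analysisEngineMetaData>"
        + "</analysisEngineDescription>";

    // Parses descriptor XML; checked exceptions are wrapped to keep callers short.
    static Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new RuntimeException("could not parse descriptor", e);
        }
    }

    // Collects the text of every <type> element found under the given
    // section name ("inputs" or "outputs").
    static List<String> typesIn(Document doc, String section) {
        List<String> names = new ArrayList<String>();
        NodeList sections = doc.getElementsByTagName(section);
        for (int i = 0; i < sections.getLength(); i++) {
            NodeList types = ((Element) sections.item(i)).getElementsByTagName("type");
            for (int j = 0; j < types.getLength(); j++) {
                names.add(types.item(j).getTextContent().trim());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // With an argument, read that descriptor file; otherwise use the sample.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(SAMPLE.getBytes("UTF-8"));
        Document doc = parse(in);
        System.out.println("inputs:  " + typesIn(doc, "inputs"));
        System.out.println("outputs: " + typesIn(doc, "outputs"));
    }
}
```

Run with a descriptor path as its argument, for example desc/primitive/tsujiilab/GENIATagger.xml, it should print whatever types that file declares, provided the file follows the layout assumed above.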

Moreover, you can combine these primitive annotators with one another or with other available UIMA annotators to form new aggregate annotators. Some examples of aggregate annotators can also be found in the /desc folder.
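As a rough illustration of what such an aggregate looks like, the sketch below prints a skeleton aggregate descriptor that imports two of the primitive descriptors and runs them in a fixed order. It rests on several assumptions: the class AggregateSketch and the aggregate name are hypothetical, the relative import paths assume the new file is saved in a sibling folder of desc/primitive, and a real aggregate may need further metadata (for example capabilities) before it does anything useful. The examples shipped in the /desc folder remain the authoritative reference.

```java
// Hypothetical generator for a skeleton UIMA aggregate descriptor.
class AggregateSketch {

    // Builds the descriptor text; keys[i] names the delegate imported from locations[i].
    // No XML escaping is done, so keys and paths must be plain text.
    static String aggregateXml(String name, String[] keys, String[] locations) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">\n");
        xml.append("  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>\n");
        xml.append("  <primitive>false</primitive>\n");
        xml.append("  <delegateAnalysisEngineSpecifiers>\n");
        for (int i = 0; i < keys.length; i++) {
            xml.append("    <delegateAnalysisEngine key=\"").append(keys[i]).append("\">\n");
            xml.append("      <import location=\"").append(locations[i]).append("\"/>\n");
            xml.append("    </delegateAnalysisEngine>\n");
        }
        xml.append("  </delegateAnalysisEngineSpecifiers>\n");
        xml.append("  <analysisEngineMetaData>\n");
        xml.append("    <name>").append(name).append("</name>\n");
        xml.append("    <flowConstraints>\n");
        xml.append("      <fixedFlow>\n");
        // Delegates run in the order in which their keys are listed here.
        for (int i = 0; i < keys.length; i++) {
            xml.append("        <node>").append(keys[i]).append("</node>\n");
        }
        xml.append("      </fixedFlow>\n");
        xml.append("    </flowConstraints>\n");
        xml.append("  </analysisEngineMetaData>\n");
        xml.append("</analysisEngineDescription>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Assumed layout: the output is saved next to desc/primitive, e.g. in desc/aggregate.
        String[] keys = { "GENIATagger", "Enju" };
        String[] locations = {
            "../primitive/tsujiilab/GENIATagger.xml",
            "../primitive/tsujiilab/Enju.xml"
        };
        System.out.print(aggregateXml("GENIATaggerThenEnju", keys, locations));
    }
}
```

Generating the text from a list of keys keeps the delegate declarations and the flow order in step, since both are produced from the same array.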

Manual

Technical report

License

The license for the Tsujii Lab tools has not been decided yet. Until then, please restrict use to personal and internal purposes. The package includes the Apache AXIS library (Apache license), the UIMA library (Apache license) and the OpenNLP type system (LGPL).


Last updated: May 7th, 2007