
Version 2.3.1 has been available since Dec. 12th, 2008.
An online demo is available!
Enju is a syntactic parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], this parser can effectively analyze the syntactic and semantic structures of English sentences and provide the user with phrase structures and predicate-argument structures. These outputs are especially useful for high-level NLP applications, including information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role.
The main features of the Enju parser are:
This version includes the following additional features.
"mogura -super"
"enju2ptb/convert < ENJU_XML_OUTPUT > PTB_STYLE_OUTPUT"
For any inquiry, contact us.
Binary packages of Enju 2.3.1 are available at the Tsujii Laboratory software download page. Currently, the following packages are available for download.
You can try Enju before downloading it via the online demo. Please contact us if you need a source package.
> cd $DIR
> tar xvzf enju-X.Y-PLATFORM.tar.gz
"enju" will be installed in "$DIR/enju-X.Y/".
> tar xvzf enju-X.Y.tar.gz
> cd enju-X.Y
> ./configure
> make
> make install
"enju" will be installed in "/usr/local/bin/".
If you want to install it somewhere other than "/usr/local/", specify "--prefix". For example,
> ./configure --prefix=$DIR
will install Enju into $DIR ("enju" is installed in "$DIR/bin/").
To parse sentences, pass a file containing one sentence per line to the parser on standard input.
> enju < RAWTEXT > RESULTS
Alternatively, you can use the high-speed parser with the command "mogura":
> mogura < RAWTEXT > RESULTS
These commands work in mostly the same way.
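When the parser is driven from a script, the one-sentence-per-line convention can be enforced before the text reaches standard input. The following is a minimal Python sketch, not part of the Enju distribution: the function names `to_input_text` and `run_parser` are hypothetical, and it assumes that "enju" (or "mogura") is on the PATH.

```python
import subprocess


def to_input_text(sentences):
    """Join sentences so that each one occupies exactly one line."""
    lines = []
    for sentence in sentences:
        # Collapse internal newlines and repeated whitespace; skip empty entries.
        flat = " ".join(sentence.split())
        if flat:
            lines.append(flat)
    return "\n".join(lines) + "\n" if lines else ""


def run_parser(sentences, command="enju", options=()):
    """Feed the sentences to the parser on standard input and return its standard output.

    `command` is assumed to be "enju" or "mogura" and to be found on the PATH.
    """
    result = subprocess.run(
        [command, *options],
        input=to_input_text(sentences),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the parser exits with an error
    )
    return result.stdout
```

For example, `run_parser(["He runs the company."])` would correspond to running "enju < RAWTEXT" on a one-line file.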
If you want to parse tokenized texts with Penn Treebank-style part-of-speech tags, use the "-nt" option:
> enju -nt < TAGGEDTEXT > RESULTS
The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a stand-off format.
> enju -xml < RAWTEXT > RESULTS
> enju -so < RAWTEXT > RESULTS
You can also use Enju as a CGI server.
> enju -cgi PORT_NUMBER
You can then access port PORT_NUMBER with a CGI query
and receive parsing results in XML format.
http://localhost:PORT_NUMBER/cgi-lilfes/enju?sentence=he+runs+the+company
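A client can build this query URL programmatically so that spaces and special characters in the sentence are encoded correctly. The sketch below uses only the Python standard library. The function names are hypothetical, the port number is whatever was passed to "enju -cgi", and decoding the response as UTF-8 is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def enju_cgi_url(sentence, port, host="localhost"):
    """Build the CGI query URL for one sentence (spaces are encoded as '+')."""
    query = urlencode({"sentence": sentence})
    return f"http://{host}:{port}/cgi-lilfes/enju?{query}"


def parse_via_cgi(sentence, port, host="localhost", timeout=30):
    """Send the query to a running 'enju -cgi PORT_NUMBER' server and return the XML text."""
    with urlopen(enju_cgi_url(sentence, port, host), timeout=timeout) as response:
        # Assumption: the server's response can be decoded as UTF-8.
        return response.read().decode("utf-8", errors="replace")
```

With a server started on a hypothetical port 8080, `enju_cgi_url("he runs the company", 8080)` yields the same form of URL as the example above.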
For further details on the output formats, see the manuals and the technical report.
Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire semantic relations among words in an input sentence without the burden of analyzing its deep-syntactic structure.
Parsing examples are shown below. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "run" and the argument "he". Note that the same semantic relations holding among the three words, "he", "run", and "company", are obtained from sentences written in different syntactic structures.
Example 1: "He runs the company"
| ROOT | ROOT | ROOT | ROOT | -1 | ROOT | ROOT | runs | run | VBZ | VB | 1 |
| runs | run | VBZ | VB | 1 | verb_arg12 | ARG1 | He | he | PRP | PRP | 0 |
| runs | run | VBZ | VB | 1 | verb_arg12 | ARG2 | company | company | NN | NN | 3 |
| the | the | DT | DT | 2 | det_arg1 | ARG1 | company | company | NN | NN | 3 |

Example 2: "The company that he runs is small"
| ROOT | ROOT | ROOT | ROOT | -1 | ROOT | ROOT | is | be | VBZ | VB | 5 |
| is | be | VBZ | VB | 5 | verb_arg12 | ARG1 | company | company | NN | NN | 1 |
| is | be | VBZ | VB | 5 | verb_arg12 | ARG2 | small | small | JJ | JJ | 6 |
| small | small | JJ | JJ | 6 | adj_arg1 | ARG1 | company | company | NN | NN | 1 |
| The | the | DT | DT | 0 | det_arg1 | ARG1 | company | company | NN | NN | 1 |
| that | that | IN | IN | 2 | relative_arg1 | ARG1 | company | company | NN | NN | 1 |
| runs | run | VBZ | VB | 4 | verb_arg12 | ARG1 | he | he | PRP | PRP | 3 |
| runs | run | VBZ | VB | 4 | verb_arg12 | ARG2 | company | company | NN | NN | 1 |
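The claim that both sentences share the same relations among "he", "run", and "company" can be checked mechanically. The Python sketch below parses the rows as they are displayed on this page (pipe-delimited, twelve cells per row). The column interpretation (word, base form, POS, base POS, position for the predicate, then predicate type and argument label, then the same five fields for the argument) is inferred from the rows above, and the names `Relation`, `parse_row`, and `semantic_triples` are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    pred_base: str  # base form of the predicate, e.g. "run"
    pred_pos: int   # word position of the predicate
    label: str      # argument label, e.g. "ARG1"
    arg_base: str   # base form of the argument, e.g. "he"
    arg_pos: int    # word position of the argument


def parse_row(row):
    """Parse one pipe-delimited row as displayed on this page."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != 12:
        raise ValueError(f"expected 12 cells, got {len(cells)}: {row!r}")
    return Relation(cells[1], int(cells[4]), cells[6], cells[8], int(cells[11]))


def semantic_triples(rows):
    """Return the set of (predicate base, label, argument base), ignoring ROOT rows."""
    triples = set()
    for row in rows:
        rel = parse_row(row)
        if rel.label != "ROOT":
            triples.add((rel.pred_base, rel.label, rel.arg_base))
    return triples


# Rows for "run" copied from the two examples above.
EXAMPLE_1 = [
    "| runs | run | VBZ | VB | 1 | verb_arg12 | ARG1 | He | he | PRP | PRP | 0 |",
    "| runs | run | VBZ | VB | 1 | verb_arg12 | ARG2 | company | company | NN | NN | 3 |",
]
EXAMPLE_2 = [
    "| runs | run | VBZ | VB | 4 | verb_arg12 | ARG1 | he | he | PRP | PRP | 3 |",
    "| runs | run | VBZ | VB | 4 | verb_arg12 | ARG2 | company | company | NN | NN | 1 |",
]

# Both sentences give the same triples for the predicate "run".
SAME_RELATIONS = semantic_triples(EXAMPLE_1) == semantic_triples(EXAMPLE_2)
```

Comparing base forms rather than surface words is what makes "He" and "he" match across the two sentences.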
Enju can also output both phrase structures and predicate-argument structures in a quasi-XML format. The following pages show the phrase structure and the predicate argument structure for the sentence "It's falling like a stone, said Danny Linger, a pit trader who was standing outside the London International Financial Futures Exchange."
Note: Firefox shows a graphical view, while Internet Explorer shows a bare XML document.
The online demo is available to see how Enju works.
A UIMA web interface for Enju is also available.
Enju includes a parsing model adapted to biomedical text. This model was trained on the GENIA treebank using a domain adaptation method [12,13]. To use this model, specify the option "-genia".
> enju -genia