Enju

Developed at:
The University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 2.3.1 has been available since Dec. 12th, 2008.

Online demo is available!

Overview

Enju is a syntactic parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], the parser analyzes the syntactic and semantic structures of English sentences and provides the user with phrase structures and predicate-argument structures. These outputs are especially useful for high-level NLP applications, including information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role.

The main features of the Enju parser are:

This version includes the following additional features.

For any inquiry, contact us.

How to install Enju

Binary packages of Enju 2.3.1 are available at the Tsujii Laboratory software download page. Currently, the following packages are available for download.

You can try Enju before downloading it via the online demo. Please contact us if you need a source package.

Installation from a binary package

1. Download the latest package for your particular platform (enju-X.Y-PLATFORM.tar.gz).

2. Untar the archive into a directory where you would like to install Enju ($DIR indicates the directory in what follows).

> cd $DIR
> tar xvzf enju-X.Y-PLATFORM.tar.gz

"enju" will be installed in "$DIR/enju-X.Y/".

Installation from a source package

To install Enju, you need: The above packages are included in most Linux distributions.

1. Download the latest source package of Enju (enju-X.Y.tar.gz).

2. Compile and install Enju.

> tar xvzf enju-X.Y.tar.gz
> cd enju-X.Y
> ./configure
> make
> make install

"enju" will be installed in "/usr/local/bin/".

If you want to install it somewhere other than "/usr/local/", specify the "--prefix" option. For example,

> ./configure --prefix=$DIR

will install Enju into $DIR ("enju" is installed in "$DIR/bin/").

How to use Enju

To parse sentences, pass a file with one sentence per line on the standard input.

> enju < RAWTEXT > RESULTS

Alternatively, you can use a high-speed parser with the command "mogura".

> mogura < RAWTEXT > RESULTS

These commands work in mostly the same way.
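The batch invocation above could also be driven from a script. The following is a minimal sketch in Python, assuming that "enju" (or "mogura") is installed and on the PATH. The helper names to_parser_input and parse_sentences are hypothetical and are not part of Enju.

```python
import subprocess


def to_parser_input(sentences):
    """Join sentences into the one-sentence-per-line form the parser reads."""
    # Collapse internal whitespace (including newlines) so that each
    # sentence occupies exactly one line; skip empty entries.
    lines = [" ".join(s.split()) for s in sentences]
    return "".join(line + "\n" for line in lines if line)


def parse_sentences(sentences, command="enju", options=()):
    """Feed the sentences to the parser on standard input; return raw output.

    Assumption: `command` ("enju" or "mogura") is installed and on the PATH.
    """
    result = subprocess.run(
        [command, *options],
        input=to_parser_input(sentences),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, parse_sentences(["He runs the company."], command="mogura") would correspond to the "mogura < RAWTEXT" invocation, and options=("-xml",) to "enju -xml".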

To parse tokenized text with Penn Treebank-style part-of-speech tags, use the "-nt" option.

> enju -nt < TAGGEDTEXT > RESULTS

The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a stand-off format.

> enju -xml < RAWTEXT > RESULTS
> enju -so < RAWTEXT > RESULTS

You can also use Enju as a CGI server.

> enju -cgi PORT_NUMBER

You can access the port PORT_NUMBER with a CGI query and receive parsing results in XML format.

http://localhost:PORT_NUMBER/cgi-lilfes/enju?sentence=he+runs+the+company
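A query URL of this form could be built programmatically. The sketch below uses Python's standard urllib.parse.urlencode, which escapes the sentence and turns spaces into "+", matching the example URL. The function name enju_cgi_url and the port 8080 used in the usage note are illustrative assumptions.

```python
from urllib.parse import urlencode


def enju_cgi_url(port, sentence, host="localhost"):
    """Build the query URL for a server started with `enju -cgi PORT_NUMBER`."""
    # urlencode escapes reserved characters and encodes spaces as "+".
    query = urlencode({"sentence": sentence})
    return f"http://{host}:{port}/cgi-lilfes/enju?{query}"
```

With a server started by "enju -cgi 8080", enju_cgi_url(8080, "he runs the company") gives the URL to request; the response could then be fetched with urllib.request.urlopen.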

For further details on the output formats, see the manuals and the technical report.

Demo and web interface

Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire semantic relations among words in an input sentence without the burden of analyzing its deep-syntactic structure.

Parsing examples are shown below. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "run" and the argument "he". Note that the same semantic relations holding among the three words, "he", "run", and "company", are obtained from sentences written in different syntactic structures.

Sentence 1: He runs the company.

ROOT  ROOT  ROOT  ROOT  -1  ROOT        ROOT  runs     run      VBZ  VB   1
runs  run   VBZ   VB    1   verb_arg12  ARG1  He       he       PRP  PRP  0
runs  run   VBZ   VB    1   verb_arg12  ARG2  company  company  NN   NN   3
the   the   DT    DT    2   det_arg1    ARG1  company  company  NN   NN   3

Sentence 2: The company that he runs is small.

ROOT   ROOT   ROOT  ROOT  -1  ROOT           ROOT  is       be       VBZ  VB   5
is     be     VBZ   VB    5   verb_arg12     ARG1  company  company  NN   NN   1
is     be     VBZ   VB    5   verb_arg12     ARG2  small    small    JJ   JJ   6
small  small  JJ    JJ    6   adj_arg1       ARG1  company  company  NN   NN   1
The    the    DT    DT    0   det_arg1       ARG1  company  company  NN   NN   1
that   that   IN    IN    2   relative_arg1  ARG1  company  company  NN   NN   1
runs   run    VBZ   VB    4   verb_arg12     ARG1  he       he       PRP  PRP  3
runs   run    VBZ   VB    4   verb_arg12     ARG2  company  company  NN   NN   1
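The two outputs can be compared with a short script. The following Python sketch assumes that each output line consists of twelve whitespace-separated fields. The field meanings (surface form, base form, part of speech, base part of speech, and word position, for the predicate and then for the argument) are inferred from the examples above and are not taken from the Enju manual. All names in the code are hypothetical.

```python
from collections import namedtuple

# Field meanings are inferred from the examples, not from the Enju manual.
Word = namedtuple("Word", "surface base pos base_pos position")
Relation = namedtuple("Relation", "predicate category label argument")


def parse_relation(line):
    """Parse one output line, assuming twelve whitespace-separated fields."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}: {line!r}")
    predicate = Word(f[0], f[1], f[2], f[3], int(f[4]))
    argument = Word(f[7], f[8], f[9], f[10], int(f[11]))
    return Relation(predicate, f[5], f[6], argument)


def semantic_triples(output):
    """Return the set of (predicate base, label, argument base) triples."""
    relations = [parse_relation(l) for l in output.splitlines() if l.strip()]
    return {(r.predicate.base, r.label, r.argument.base) for r in relations}


SENTENCE_1 = """\
ROOT ROOT ROOT ROOT -1 ROOT ROOT runs run VBZ VB 1
runs run VBZ VB 1 verb_arg12 ARG1 He he PRP PRP 0
runs run VBZ VB 1 verb_arg12 ARG2 company company NN NN 3
the the DT DT 2 det_arg1 ARG1 company company NN NN 3
"""

SENTENCE_2 = """\
ROOT ROOT ROOT ROOT -1 ROOT ROOT is be VBZ VB 5
is be VBZ VB 5 verb_arg12 ARG1 company company NN NN 1
is be VBZ VB 5 verb_arg12 ARG2 small small JJ JJ 6
small small JJ JJ 6 adj_arg1 ARG1 company company NN NN 1
The the DT DT 0 det_arg1 ARG1 company company NN NN 1
that that IN IN 2 relative_arg1 ARG1 company company NN NN 1
runs run VBZ VB 4 verb_arg12 ARG1 he he PRP PRP 3
runs run VBZ VB 4 verb_arg12 ARG2 company company NN NN 1
"""

# Relations that both sentences have in common.
shared = semantic_triples(SENTENCE_1) & semantic_triples(SENTENCE_2)
```

Here the set shared contains both ("run", "ARG1", "he") and ("run", "ARG2", "company"), which is the observation made above: the relations among "he", "run", and "company" are the same in both sentences despite their different syntactic structures.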

Enju can also output both phrase structures and predicate-argument structures in a quasi-XML format. The following pages show the phrase structure and the predicate-argument structure for the sentence "It's falling like a stone, said Danny Linger, a pit trader who was standing outside the London International Financial Futures Exchange."

Note: Firefox shows a graphical view, while Internet Explorer shows a bare XML document.

An online demo is available to show how Enju works.

A UIMA web interface for Enju is also available.

Documentation

A parsing model for biomedical text

Enju includes a parsing model adapted to biomedical text. The model was trained on the GENIA treebank using a domain adaptation method [12,13]. To use this model, specify the option "-genia".

> enju -genia

Publications

[1] Yusuke Miyao and Jun'ichi Tsujii. 2002. Maximum Entropy Estimation for Feature Forests. In Proceedings of HLT 2002.

[2] Yusuke Miyao and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP) 2003, pp. 285-291.

[3] Yusuke Miyao, Takashi Ninomiya, and Jun'ichi Tsujii. 2004. Corpus-oriented Grammar Development for Acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Proceedings of IJCNLP-04.

[4] Yusuke Miyao and Jun'ichi Tsujii. 2005. Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing. In Proceedings of ACL-2005, pp. 83-90.

[5] Takashi Ninomiya, Takuya Matsuzaki, Yoshimasa Tsuruoka, Yusuke Miyao and Jun'ichi Tsujii. 2006. Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proceedings of EMNLP 2006.

[6] Takashi Ninomiya, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. A log-linear model with an n-gram reference distribution for accurate HPSG parsing. In Proceedings of IWPT 2007.

[7] Yusuke Miyao and Jun'ichi Tsujii. 2008. Feature Forest Models for Probabilistic HPSG Parsing. Computational Linguistics. 34(1). pp. 35-80, MIT Press.

[8] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2003. Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In Proceedings of IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.

[9] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proceedings of IWPT 2005.

[10] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii. 2006. Fast and Scalable HPSG Parsing. Traitement automatique des langues (TAL). 46(2). Association pour le Traitement Automatique des Langues.

[11] Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Efficient HPSG Parsing with Supertagging and CFG-filtering. In Proceedings of IJCAI 2007.

[12] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Adapting a probabilistic disambiguation model of an HPSG parser to a new domain. In Proceedings of IJCNLP 2005.

[13] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Evaluating Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser. In Proceedings of IWPT 2007.

[14] Kenji Sagae, Yusuke Miyao, and Jun'ichi Tsujii. 2007. HPSG Parsing with Shallow Dependency Constraints. In Proceedings of ACL 2007.

[15] Takuya Matsuzaki and Jun'ichi Tsujii. 2008. Comparative Parser Performance Analysis across Grammar Frameworks through Automatic Tree Conversion using Synchronous Grammars. In Proceedings of COLING 2008.

[16] Yusuke Miyao, Rune Saetre, Kenji Sagae, Takuya Matsuzaki, and Jun'ichi Tsujii. 2008. Task-Oriented Evaluation of Syntactic Parsers and Their Representations. In Proceedings of ACL-08:HLT.