AUTHOR'S ABSTRACTS: INVITED SPEAKERS
Latent Variable Models in NLP
Haghighi, Aria,
Generative latent variable models define a joint probability distribution over observed variables as well as hidden, or latent, variables not observed during training. These models allow for rich dependencies amongst observed variables without sacrificing tractability. In recent years, there has been renewed interest in latent variable models for NLP applications ranging from PCFG annotation for parsing (Petrov et al. '06) and phrase-table learning (DeNero et al. '08) to several unsupervised applications (Smith and Eisner '05, Haghighi and Klein '07). In this talk we will discuss these and other NLP applications and why latent variable models are well suited to structured NLP tasks.
Collapsed Variational Inference for Hierarchical Dirichlet Process
Kurihara, Kenichi,
A wide variety of Dirichlet-multinomial 'topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.
Learning to Rank: From Pairwise Approach to Listwise Approach
Liu, Tie-Yan,
This talk is concerned with learning to rank, that is, constructing a model or function for ranking objects. Several methods for learning to rank have been proposed, such as Ranking SVM and RankNet, which take object pairs as 'instances' in learning. We refer to them as the pairwise approach. Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on a list of objects. This talk postulates that learning to rank should adopt the listwise approach, in which lists of objects are used as 'instances' in learning. We will mainly introduce two algorithms: ListNet, in which we propose a listwise loss function based on a probability model, and Relational Ranking, in which we propose a listwise ranking function that takes relations (e.g. similarity, diversity, parent-child) into consideration when ranking objects. Experimental results on information retrieval show that the proposed approach and the two algorithms perform significantly better than the pairwise approach.
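As a rough illustration of what a listwise loss can look like, the sketch below computes a cross entropy between two "top-one" probability distributions, each obtained by applying a softmax to a whole list of scores. The abstract does not spell out the loss, so the formulation, the function names, and the example scores are assumptions made for illustration only.

```python
import math


def top_one_probs(scores):
    """Softmax over a list of scores: probability of each object being ranked first."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(true_scores, predicted_scores):
    """Cross entropy between the top-one distributions of two score lists.

    The whole list is one training instance, in contrast to the pairwise
    approach, where each pair of objects is an instance.
    """
    p_true = top_one_probs(true_scores)
    p_pred = top_one_probs(predicted_scores)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))


# Hypothetical relevance judgments for three documents of one query.
relevance = [3.0, 2.0, 1.0]
well_ordered = [3.0, 2.0, 1.0]
reversed_order = [1.0, 2.0, 3.0]
```

With these example lists, `listwise_loss(relevance, well_ordered)` is smaller than `listwise_loss(relevance, reversed_order)`, since the cross entropy is minimized when the predicted distribution matches the one derived from the judgments.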
Leveraging User Annotations in Sentiment Summarization
McDonald, Ryan,
Online reviews are quickly becoming the de facto standard for measuring the quality of products, local services, and merchants. These reviews are increasingly structured and often include star ratings for a variety of relevant aspects, pros-cons lists, and review helpfulness ratings. How can we leverage such structure to improve the accuracy of our tools and corresponding applications? In this talk I will look at the problem of summarizing sentiment. In particular, I will discuss some novel classifiers and topic models that exploit user-generated aspect ratings and varying levels of context to classify and extract relevant phrases for inclusion in a summary. This is joint work with Ivan Titov.
Present and Future of a Text Modeling
Mochihashi, Daichi,
In this talk, I will first introduce and summarize several probabilistic text models proposed so far, such as LDA, DM, GaP, Pachinko Allocation and so on, and discuss their strengths and weaknesses to show some future directions. As opposed to these "bag-of-words" style text models, n-gram (or ∞-gram) language models are now amenable to Bayesian modeling, especially through the hierarchical Pitman-Yor processes. However, a latent topic extension of n-gram models is not that straightforward. I will describe what I did in the ∞-gram paper to show the problems and the future directions of research.
Nonparametric Bayesian Approach for The Distributional Hypothesis
Sato, Issei,
Semantic knowledge of words for particular domains is increasingly important in text mining, information retrieval and speech recognition. We present an unsupervised approach for automatically discovering semantic relationships among words from a corpus, based on nonparametric Bayesian modeling and the Distributional Hypothesis, which is the basis of statistical semantics. The hypothesis states that words that occur in the same context tend to have a similar meaning. In this study, we take the sequence of N words preceding a word in a corpus, i.e., an N-gram, as the context of that word. We assume that a context generates a semantic latent topic characterized by a word distribution and that the word succeeding the context is generated according to the topic. This generation process can be modeled by a Hierarchical Dirichlet Process (HDP) or a Hierarchical Pitman-Yor Process (HPY). We present methods to extract the semantic relationship between a word and a context using this generation process.
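The notion of context used here can be made concrete with a small sketch that collects, for every sequence of N preceding words, counts of the word that follows. This covers only the data preparation step; it does not implement the HDP or HPY model, and the function name and toy sentence are hypothetical.

```python
from collections import Counter, defaultdict


def context_counts(tokens, n):
    """Map each context (the n words preceding a position) to counts of the next word."""
    counts = defaultdict(Counter)
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])  # the n preceding words
        counts[context][tokens[i]] += 1   # the word succeeding the context
    return counts


# Toy corpus, assumed for illustration only.
tokens = "the cat sat on the mat".split()
counts = context_counts(tokens, 1)
```

In this toy corpus, "cat" and "mat" both follow the context `("the",)`, which is the kind of shared-context evidence the Distributional Hypothesis relies on.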
Two Topics in Statistical NLP: the Jeopardy Model and an M-Estimator
Smith, Noah,
This talk covers two recent advances in statistical parsing.

The first part of the talk presents our work using statistical quasi-synchronous grammars - an elegant tree-to-tree transformation model originally designed for machine translation - in question answering. By modeling loose answer-to-question transformations at the level of bare-bones dependency structure, we achieve notably high performance on a TREC-style answer-selection task (Wang, Smith, and Mitamura, EMNLP-CoNLL 2007).

The second part of the talk turns to a learning problem. Since log-linear ("maximum entropy") models were first applied to NLP at IBM in the 1990s, they have been widely used. Training them, however, is very expensive for models of sequences and trees. We present a novel, generative parameter estimation algorithm for log-linear structure models based on a generalization of maximum likelihood estimation called M-estimation. We compare this method to existing learning algorithms on a shallow parsing task (Smith, Vail, and Lafferty, ACL 2007).

Semi-supervised Structured Output Learnings
Suzuki, Jun,
Both semi-supervised learning and structured output learning are important methodologies for many applications in natural language processing (NLP). In this talk, I first introduce a simple yet powerful semi-supervised discriminative learning framework for structured output variables, which can be viewed as a natural semi-supervised extension of conventional supervised conditional random fields (CRFs) (Lafferty et al., 2001). Then, I explain experiments on real NLP tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition, which are also referred to as sequential labeling and segmentation problems. Experimental results on these tasks show that the proposed semi-supervised learning further improves the state-of-the-art performance provided by supervised learning such as CRFs.
Conditional Random Fields Incorporating Incomplete Annotations
Tsuboi, Yuta,
We address corpus building situations in which we annotate only the important parts of the given data, or in which we cannot resolve label ambiguities by referring to the linguistic context. We propose a parameter estimation method for Conditional Random Fields (CRFs) that enables us to use such partial and ambiguous annotations of structured data. We show promising results for our method on a domain adaptation task for Japanese word segmentation and on a part-of-speech tagging task using ambiguous tags in the Penn Treebank corpus.
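One way to train on partial and ambiguous annotations, sketched below, is to score the set of all label sequences consistent with the annotation rather than a single sequence. The abstract does not give the estimation procedure, so this marginal-likelihood formulation for a linear-chain model, and every name in the code, is an assumption made for illustration.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_partition(emission, transition, allowed):
    """Forward algorithm restricted to label sequences permitted by `allowed`.

    emission[t][y]      : score of label y at position t
    transition[yp][y]   : score of moving from label yp to label y
    allowed[t]          : set of labels permitted at position t
    """
    n_labels = len(transition)
    alpha = [emission[0][y] if y in allowed[0] else NEG_INF for y in range(n_labels)]
    for t in range(1, len(emission)):
        alpha = [
            emission[t][y]
            + logsumexp([alpha[yp] + transition[yp][y] for yp in range(n_labels)])
            if y in allowed[t]
            else NEG_INF
            for y in range(n_labels)
        ]
    return logsumexp(alpha)


def partial_log_likelihood(emission, transition, allowed):
    """Log-probability of the set of label sequences consistent with the annotation."""
    every_label = [set(range(len(transition)))] * len(emission)
    return constrained_log_partition(emission, transition, allowed) - constrained_log_partition(
        emission, transition, every_label
    )


# Toy example: 3 positions, 2 labels, all scores zero (uniform model).
emission = [[0.0, 0.0] for _ in range(3)]
transition = [[0.0, 0.0], [0.0, 0.0]]
```

An unannotated position is represented by allowing every label, an ambiguous one by allowing several labels, and a fully annotated one by a single-element set, so a complete annotation reduces to the usual supervised log-likelihood.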
AUTHOR'S ABSTRACTS: PROJECT PRESENTATION
Towards Framework-Independent Evaluation of Syntactic Parsers
Miyao, Yusuke,
This talk describes practical issues in the framework-independent evaluation of deep and shallow parsers. We focus on the use of two dependency-based syntactic representation formats in parser evaluation. Our approach is to convert the output of parsers into these two formats, and measure the accuracy of the resulting converted output. Through the evaluation of an HPSG parser and Penn Treebank phrase structure parsers, we found that mapping between different representation schemes is a non-trivial task that results in lossy conversions that may obscure important differences between different parsing approaches. We discuss sources of disagreements in the representation of syntactic structures in the two dependency-based formats, indicating possible directions for improved framework-independent parser evaluation.
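Once parser output has been converted into a dependency format, accuracy can be measured by comparing head assignments token by token. The sketch below computes an unlabeled attachment score; the abstract does not say which metric was used, so the metric, the function name, and the head-index encoding (0 for the root) are assumptions.

```python
def attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index matches the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical three-token sentence; each entry is the index of the token's head.
gold = [2, 0, 2]
converted_parser_output = [2, 0, 1]
score = attachment_score(gold, converted_parser_output)
```

A score computed this way reflects conversion errors as well as parsing errors, which is one reason lossy conversions can obscure differences between parsers.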
Dualized L1-regularized Log-Linear Models and Its Application in NLP
Okanohara, Daisuke,
Training an L1-regularized Log-Linear Model (L1-LLM) typically produces a sparse parameter vector, in which many of the parameters are exactly zero. This amounts to feature selection and yields models that are efficient and interpretable. However, training an L1-LLM is difficult because the objective is not differentiable wherever a parameter is zero, so it cannot be optimized directly with standard gradient-based optimization algorithms. We present a dual representation of the L1-LLM, show that the optimization problem can be converted into a dual form, and propose an efficient algorithm to estimate parameters based on the dual representation, together with a method to extract effective combinations of features. We examined the performance of our algorithm using a dependency parsing task, and clarified the characteristics of our method.
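To see why an L1 penalty produces parameters that are exactly zero, consider the one-dimensional problem of minimizing 0.5 * (x - w)^2 + lam * |x|. Its closed-form solution is the soft-thresholding operator sketched below. This is a standard textbook illustration of L1 sparsity, not the dual algorithm presented in the talk, and the function name is hypothetical.

```python
def soft_threshold(w, lam):
    """Minimizer of 0.5 * (x - w)**2 + lam * abs(x) over x, for lam >= 0."""
    if w > lam:
        return w - lam  # shrink positive values toward zero
    if w < -lam:
        return w + lam  # shrink negative values toward zero
    return 0.0          # anything within [-lam, lam] becomes exactly zero


# Hypothetical unregularized weights and penalty strength.
weights = [2.0, 0.3, -0.1, -2.0]
sparse = [soft_threshold(w, 0.5) for w in weights]
```

Weights whose magnitude does not exceed the penalty strength are set to exactly zero rather than merely made small, which is the feature selection effect described above.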
Predicate-argument analysis with DAG parsing
Sagae, Kenji
Most parsing research has focused on syntactic representations based on tree structures. Although trees have several desirable properties from both computational and linguistic perspectives, the structure of linguistic phenomena that go beyond shallow syntax often cannot be fully captured by tree representations. There are linguistic formalisms that overcome this limitation by enriching their tree-based representations with features and unification operations. However, this usually results in additional complexity in parser implementation and in the training of parsing models.

I will present a parsing approach that is nearly as simple as current data-driven dependency parsing approaches, but outputs directed acyclic graphs directly. I will demonstrate the benefits of DAG parsing in predicate-argument analysis, where the advantages of DAGs over trees can be clearly observed.

Comparative parsing evaluation across different grammar frameworks
Matsuzaki, Takuya
The importance and difficulty of framework-independent parser evaluation have recently been recognized among researchers working on lexicalized grammar parsing. Lexicalized grammar parsers output deep analyses, which include the semantic structures of the sentences. However, it is difficult to directly compare deep analyses across different grammar frameworks because they use different representational devices. We present a method for framework-independent parser evaluation in which a shallow representation of the analyses is used as the 'common format' for the evaluation; the deep analyses are first mapped to shallow analyses in the common format by means of a tree converter derived from a parallel treebank and then evaluated against the gold-standard analyses. Experimental results show that we can still observe meaningful differences among the parsers' performances in such a shallow representation.

COPYRIGHT © TSUJIILAB, UNIVERSITY OF TOKYO. ALL RIGHTS RESERVED.