| AUTHOR'S ABSTRACTS: INVITED SPEAKERS |
|
|
|
|
Latent Variable Models in NLP |
|
| Haghighi, Aria,
|
 |
Generative latent variable models define a joint probability distribution
over observed variables as well as hidden, or latent, variables not observed
during training. These models allow for rich dependencies amongst observed
variables without sacrificing tractability. In recent years, there has been
renewed interest in latent variable models for NLP applications, including
PCFG annotation for parsing (Petrov et al. '06), phrase-table
learning (DeNero et al. '08), and several unsupervised
applications (Smith and Eisner '05, Haghighi and Klein '07). In this talk
we will discuss these and other NLP applications and why latent variable
models are well suited to structured NLP tasks.
|
|
|
Collapsed Variational Inference for Hierarchical Dirichlet Process |
|
| Kurihara, Kenichi,
|
 |
A wide variety of Dirichlet-multinomial 'topic' models have found
interesting applications in recent years. While Gibbs sampling
remains an important method of inference in such models, variational
techniques have certain advantages such as easy assessment of
convergence, easy optimization without the need to maintain detailed
balance, a bound on the marginal likelihood, and side-stepping of
issues with topic-identifiability. The most accurate variational
technique thus far, namely collapsed variational latent Dirichlet
allocation, did not deal with model selection nor did it include
inference for hyperparameters. We address both issues by generalizing
the technique, obtaining the first variational algorithm to
deal with the hierarchical Dirichlet process and to deal with
hyperparameters of Dirichlet variables. Experiments show a
significant improvement in accuracy.
|
|
|
|
Learning to Rank: From Pairwise Approach to Listwise Approach
|
|
| Liu, Tie-Yan,
|
 |
The talk is concerned with learning to rank, which is to construct a
model or a function for ranking objects. Several methods for learning
to rank have been proposed such as Ranking SVM and RankNet, which take
object pairs as 'instances' in learning. We refer to them as the
pairwise approach. Although the pairwise approach offers advantages,
it ignores the fact that ranking is a prediction task on a list of
objects. This talk postulates that learning to rank should adopt the
listwise approach in which lists of objects are used as 'instances' in
learning. We will mainly introduce two algorithms: ListNet, in which
we propose a listwise loss function based on a probability model, and
Relational Ranking, in which we propose a listwise ranking function
that takes relations (e.g. similarity, diversity, parent-child) into
consideration when ranking objects. Experimental results on
information retrieval show that the proposed approach and the two
algorithms perform significantly better than the pairwise approach.
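
As a rough illustration of what a listwise loss can look like, the sketch below computes the top-one-probability cross entropy commonly associated with ListNet: both the ground-truth scores and the model scores for one list are turned into distributions with a softmax, and the loss is the cross entropy between them. The function names and the toy scores are assumptions made for this example, not taken from the talk.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(true_scores, predicted_scores):
    """Cross entropy between top-one distributions of one list of objects.

    The whole list is a single training instance, in contrast to the
    pairwise approach, where each object pair is an instance.
    """
    p_true = softmax(true_scores)
    p_pred = softmax(predicted_scores)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))


# Hypothetical relevance judgments for three documents of one query.
relevance = [3.0, 2.0, 1.0]
good_loss = listnet_top_one_loss(relevance, [3.0, 2.0, 1.0])
bad_loss = listnet_top_one_loss(relevance, [1.0, 2.0, 3.0])
```

In this toy case the model scores that preserve the ground-truth order give a lower loss than the reversed scores.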
|
|
Leveraging User Annotations in Sentiment Summarization |
|
| McDonald, Ryan,
|
|
Online reviews are quickly becoming the de facto standard for
measuring the quality of products, local services, and merchants.
These reviews are increasingly structured and often include star
ratings for a variety of relevant aspects, pros-cons lists, and review
helpfulness ratings. How can we leverage such structure to improve the
accuracy of our tools and corresponding applications? In this talk I
will look at the problem of summarizing sentiment. In particular, I
will discuss some novel classifiers and topic models that exploit
user-generated aspect ratings and varying levels of context to classify and
extract relevant phrases for inclusion in a summary. This is joint
work with Ivan Titov.
|
|
|
Present and Future of Text Modeling |
|
| Mochihashi, Daichi,
|
|
In this talk, I will first introduce and summarize several
probabilistic text models proposed so far, such as LDA, DM, GaP, and
Pachinko Allocation, and discuss their strengths and weaknesses to show
some future directions.
In contrast to these "bag-of-words" style text models, n-gram (or ∞-gram)
language models are now amenable to Bayesian modeling, especially through
hierarchical Pitman-Yor processes. However, a latent topic extension of
n-gram models is not straightforward. I will describe what I did in the
∞-gram paper to show the problems and the future directions of research.
|
|
|
Nonparametric Bayesian Approach for The Distributional Hypothesis |
|
| Sato, Issei,
|
|
Semantic knowledge of words for particular domains is increasingly
important in text mining, information retrieval and speech
recognition.
We present an unsupervised approach for automatically discovering
semantic relationships among words from a corpus, based on nonparametric
Bayesian modeling and the Distributional Hypothesis, which is the basis
of statistical semantics. The hypothesis states that words that occur
in the same context tend to have a similar meaning.
In this study, we take the sequence of N words preceding a word in a
corpus, i.e., an N-gram, as the context of that word. We assume that a
context generates a semantic latent topic characterized by a word
distribution and that the word succeeding the context is generated
according to the topic. This generative process can be modeled by a
Hierarchical Dirichlet Process (HDP) or a Hierarchical Pitman-Yor
Process (HPY).
We present methods to extract the semantic relationship between a word
and a context using this generative process.
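
The notion of context used here can be illustrated with a purely count-based sketch: each word is described by how often it follows each N-word context, and two words are compared by the cosine similarity of those counts. This is a simple stand-in for the Distributional Hypothesis only; it is not the HDP or HPY model of the talk, and the function names and toy corpus are assumptions.

```python
import math
from collections import Counter, defaultdict


def context_counts(tokens, n=1):
    """Map each word to a Counter over the N-word contexts preceding it."""
    counts = defaultdict(Counter)
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])  # the N words before position i
        counts[tokens[i]][context] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical): "cat" and "dog" share the context ("the",).
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = context_counts(tokens, n=1)
cat_dog = cosine(counts["cat"], counts["dog"])
cat_sat = cosine(counts["cat"], counts["sat"])
```

Words that follow the same contexts ("cat" and "dog") score high, while words with disjoint contexts ("cat" and "sat") score zero.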
|
|
|
Two Topics in Statistical NLP: the Jeopardy Model and an M-Estimator
|
|
| Smith, Noah,
|
 |
This talk covers two recent advances in statistical parsing.
The first part of the talk presents our work using statistical
quasi-synchronous grammars - an elegant tree-to-tree transformation
model originally designed for machine translation - in question
answering. By modeling loose answer-to-question transformations at
the level of bare-bones dependency structure, we achieve notably high
performance on a TREC-style answer-selection task (Wang, Smith, and Mitamura,
EMNLP-CoNLL 2007).
The second part of the talk turns to a learning problem. Since
log-linear ("maximum entropy") models were first applied to NLP at IBM
in the 1990s, they have been widely used. Training them, however, is
very expensive for models of sequences and trees. We present a novel,
generative parameter estimation algorithm for log-linear structure
models based on a generalization of maximum likelihood estimation
called M-estimation. We compare this method to existing learning
algorithms on a shallow parsing task (Smith, Vail, and Lafferty, ACL 2007).
|
|
|
Semi-supervised Structured Output Learnings |
|
| Suzuki, Jun,
|
|
Both semi-supervised and structured output learning are important
methodologies for many applications in natural language processing
(NLP). In this talk, I first introduce a simple yet powerful
semi-supervised discriminative learning framework for structured
output variables, which can be viewed as a natural semi-supervised
extension of conventional supervised conditional random fields (CRFs)
(Lafferty et al., 2001). Then, I describe experiments on real NLP
tasks, such as part-of-speech tagging, syntactic chunking, and named
entity recognition, which are also referred to as sequential labeling
and segmentation problems. Experimental results on these tasks show
that the proposed semi-supervised learning further improves on the
state-of-the-art performance achieved by supervised learning methods
such as CRFs.
|
|
|
Conditional Random Fields Incorporating Incomplete Annotations |
|
| Tsuboi, Yuta,
|
|
We address corpus-building situations in which we annotate only the important
parts of the given data, or in which we cannot resolve label ambiguities by
referring to the linguistic context. We propose a parameter estimation
method for Conditional Random Fields (CRFs) that enables us to use such
partial and ambiguous annotations of structured data. We show promising
results for our method on a domain adaptation task for Japanese word
segmentation and on a part-of-speech tagging task using ambiguous tags in the
Penn Treebank corpus.
|
|
|
| AUTHOR'S ABSTRACTS: PROJECT PRESENTATION |
|
|
Towards Framework-Independent Evaluation of Syntactic Parsers
|
|
| Miyao, Yusuke,
|
 |
This talk describes practical issues in the framework-independent
evaluation of deep and shallow parsers. We focus on the use of two
dependency-based syntactic representation formats in parser
evaluation. Our approach is to convert the output of parsers into
these two formats, and measure the accuracy of the resulting converted
output. Through the evaluation of an HPSG parser and Penn Treebank
phrase structure parsers, we found that mapping between different
representation schemes is a non-trivial task that results in lossy
conversions that may obscure important differences between different
parsing approaches. We discuss sources of disagreements in the
representation of syntactic structures in the two dependency-based
formats, indicating possible directions for improved
framework-independent parser evaluation.
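
For readers unfamiliar with dependency-based evaluation, the sketch below shows one common way to score converted parser output against a gold standard: unlabeled and labeled attachment scores. The abstract does not say which metric or formats were used, so the metric, the data layout (one `(head_index, label)` pair per token) and the labels are assumptions for illustration only.

```python
def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment scores for one sentence.

    Each argument is a list with one (head_index, label) pair per token;
    head_index 0 denotes the artificial root.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    n = len(gold)
    # Unlabeled: only the head has to match.
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    # Labeled: both head and dependency label have to match.
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las


# Hypothetical three-token sentence.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
wrong_head = [(2, "nsubj"), (0, "root"), (1, "obj")]
wrong_label = [(2, "dep"), (0, "root"), (2, "obj")]

uas_head, las_head = attachment_scores(gold, wrong_head)
uas_label, las_label = attachment_scores(gold, wrong_label)
```

A lossy conversion step would show up in such scores as errors that come from the mapping rather than from the parser itself.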
|
|
|
Dualized L1-regularized Log-Linear Models and Its Application in NLP |
|
| Okanohara, Daisuke,
|
 |
Training an L1-regularized Log-Linear Model (L1-LLM)
typically produces a sparse parameter vector, in which many of the
parameters are exactly zero. This amounts to feature selection, and
the resulting models are efficient and interpretable. However, training
an L1-LLM is difficult, because the objective is not differentiable
wherever a parameter is zero, and it therefore cannot be optimized
directly with gradient-based optimization algorithms.
We present a dual representation of the L1-LLM and show that the
optimization problem can be converted into a dual form. We then propose
an efficient algorithm to estimate parameters based on the dual
representation, and a method to extract effective combinations of features.
We examined the performance of our algorithm using a dependency
parsing task, and clarified the characteristics of our method.
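
The non-differentiability and the exact zeros mentioned in this abstract can be seen in a one-parameter sketch. The code below is not the dual algorithm of the talk; it swaps in the standard soft-thresholding (proximal) operator of the L1 penalty purely to illustrate why the penalty has no single gradient at zero and how exact zeros arise. The names and the constant `c` are assumptions.

```python
def l1_subgradient_interval(w, c):
    """Range of valid subgradients of c * |w| at the point w."""
    if w > 0:
        return (c, c)
    if w < 0:
        return (-c, -c)
    # At w == 0 any slope in [-c, c] is valid, so there is no unique gradient.
    return (-c, c)


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0  # small weights are set exactly to zero


interval_at_zero = l1_subgradient_interval(0.0, 0.5)
shrunk = [soft_threshold(w, 0.5) for w in (1.5, 0.3, -1.5)]
```

Weights whose magnitude is below the threshold become exactly zero, which is the sparsity effect the abstract refers to as feature selection.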
|
|
Predicate-argument analysis with DAG parsing |
|
| Sagae, Kenji
|
 |
Most parsing research has focused on syntactic representations
based on tree structures. Although trees have several desirable
properties from both computational and linguistic perspectives, the
structure of linguistic phenomena that go beyond shallow syntax
often cannot be fully captured by tree representations. There are
linguistic formalisms that overcome this limitation by enriching their
tree-based representations with features and unification operations.
However, this usually results in additional complexity in parser
implementation and training of parsing models.
I will present a parsing approach that is nearly as simple as current
data-driven dependency parsing approaches, but outputs directed
acyclic graphs directly. I will demonstrate the benefits of DAG
parsing in predicate-argument analysis, where the advantages of DAGs
over trees can be clearly observed.
|
|
|
Comparative parsing evaluation across different grammar frameworks |
|
| Matsuzaki, Takuya
|
 |
The importance and difficulty of framework-independent parser evaluation
have recently been recognized among researchers working on lexicalized
grammar parsing.
Lexicalized grammar parsers output deep analyses, which include
the semantic structures of the sentences.
However, it is difficult to directly compare the deep analyses across different
grammar frameworks because they use different representational devices.
We present a method for framework-independent parser evaluation
wherein a shallow representation of the analyses is used as the 'common format'
for the evaluation; the deep analyses are first mapped to shallow analyses in
the common format by means of a tree converter derived from a parallel treebank
and then evaluated against the gold-standard analyses.
Experimental results show that we can still observe meaningful differences in
the parsers' performance in such a shallow representation.
|
|
|