- The Latent Variable Perceptron tool has been uploaded.
- The DPLVM code has been uploaded.
- Xu SUN (孫 栩)
- PhD candidate
- Tsujii Laboratory, Room 615, Faculty of Science Bldg. 7, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033 Japan
- EMAIL: sunxu AT is.s.u-tokyo.ac.jp
- RESEARCH INTERESTS: statistical natural language processing, machine learning, and algorithms.
- Outside of research, I enjoy swimming, cycling, badminton, movies, and travelling.
- Reviewer for: ACM Transactions on Asian Language Information Processing
- DPLVM tool: an implementation of the discriminative probabilistic latent variable model (DPLVM) (Sun et al. EACL 2009, Morency et al. CVPR 2007).
<V0.5 Download>
The DPLVM code is an efficient, open-source implementation of the Discriminative Probabilistic Latent Variable Model (also known as latent-dynamic conditional random fields) for segmenting and labeling sequential data. DPLVM is designed for general-purpose use and can be applied to a variety of sequential labeling NLP tasks, such as Named Entity Recognition, Information Extraction, Text Chunking, and Word Segmentation.
The "data" folder contains an artificial data set that illustrates the format of the training and test files. You can also use it to compare the performance of DPLVM and CRF.
Version 0.5: 1) fixed a bug in main.cpp and a memory leak during decoding; 2) added an artificial data set for testing performance.
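In the DPLVM (Morency et al. 2007), each label y owns a disjoint set of latent states H(y), and the probability of a label sequence is the total probability of the latent paths compatible with it: P(y|x) = sum of P(h|x) over h in H(y_1) x ... x H(y_n), where P(h|x) is a linear-chain CRF over latent states. The following is a minimal sketch under stated assumptions, not the DPLVM code itself: the node and transition scores are fixed toy numbers standing in for learned feature weights, and names such as HIDDEN_OF, log_forward, and label_prob are hypothetical. It computes P(y|x) with a forward pass restricted to each label's latent states and cross-checks it against brute-force enumeration.

```python
import itertools
import math

# Hypothetical toy setup: labels "A" and "B", each owning two latent states.
HIDDEN_OF = {"A": (0, 1), "B": (2, 3)}
# NODE[t][h]: score of latent state h at position t (stands in for feature weights).
NODE = [[0.5, -0.2, 0.1, 0.3],
        [0.0, 0.4, -0.3, 0.2],
        [0.1, 0.1, 0.6, -0.5]]
# TRANS[g][h]: score of moving from latent state g to latent state h.
TRANS = [[0.2, -0.1, 0.0, 0.3],
         [0.1, 0.0, -0.2, 0.1],
         [-0.3, 0.2, 0.1, 0.0],
         [0.0, 0.1, 0.3, -0.1]]


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(node, trans, allowed):
    """Log of the summed exp-scores of all latent paths whose state at
    position t belongs to allowed[t] (forward algorithm in log space)."""
    n = len(trans)
    alpha = [node[0][h] if h in allowed[0] else -math.inf for h in range(n)]
    for t in range(1, len(node)):
        alpha = [logsumexp([alpha[g] + trans[g][h] for g in range(n)]) + node[t][h]
                 if h in allowed[t] else -math.inf
                 for h in range(n)]
    return logsumexp(alpha)


def label_prob(node, trans, hidden_of, labels):
    """P(y|x): share of latent-path mass whose state at t is owned by labels[t]."""
    n = len(trans)
    log_z = log_forward(node, trans, [set(range(n))] * len(node))
    allowed = [set(hidden_of[y]) for y in labels]
    return math.exp(log_forward(node, trans, allowed) - log_z)


def brute_force_prob(node, trans, hidden_of, labels):
    """Same quantity by enumerating every latent path (tiny inputs only)."""
    def score(path):
        return (sum(node[t][h] for t, h in enumerate(path))
                + sum(trans[g][h] for g, h in zip(path, path[1:])))
    paths = list(itertools.product(range(len(trans)), repeat=len(node)))
    z = sum(math.exp(score(p)) for p in paths)
    num = sum(math.exp(score(p)) for p in paths
              if all(h in hidden_of[y] for h, y in zip(p, labels)))
    return num / z


if __name__ == "__main__":
    for labels in itertools.product("AB", repeat=len(NODE)):
        print("".join(labels), round(label_prob(NODE, TRANS, HIDDEN_OF, labels), 4))
```

Because the latent-state sets are disjoint, every latent path projects onto exactly one label sequence, so the probabilities printed for all label sequences sum to 1.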
- Latent variable perceptron tool (Sun et al. IJCAI 2009).
<V0.2 Download>
This is an efficient, open-source implementation of the latent variable perceptron (Sun et al. 2009). It can be viewed as an online counterpart of the DPLVM model (discriminative probabilistic latent variable model) (Morency et al. 2007, Sun et al. 2009), and as an extension of Collins' averaged perceptron with latent variables. Latent variables help to 1) model long-distance dependencies and 2) model hidden information (to some extent, as in EM). The latent perceptron is designed for general-purpose use and can be applied to a variety of sequential labeling NLP tasks, such as Named Entity Recognition, Information Extraction, and Text Chunking. The code works on both Windows XP and Linux.
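The update rule described above can be sketched as follows, as a minimal illustration rather than the released tool: decode the best latent path; if its projection onto labels differs from the gold labels, decode the best latent path restricted to the gold labels' latent states, then move the weights toward that path's features and away from the predicted path's features; finally average the weights as in Collins' perceptron. The indicator features (word/latent-state emissions and latent-state transitions), the zero initialization, and the names viterbi, features, train, and predict are all hypothetical assumptions made for this sketch.

```python
from collections import Counter


def viterbi(words, w, n_states, allowed=None):
    """Best latent path under weights w; allowed[t] optionally restricts
    the latent states usable at position t."""
    if allowed is None:
        allowed = [range(n_states)] * len(words)
    best = {h: w.get(("emit", words[0], h), 0.0) for h in allowed[0]}
    back = []  # back[t-1][h] = best predecessor of state h at position t
    for t in range(1, len(words)):
        scores, ptr = {}, {}
        for h in allowed[t]:
            g = max(best, key=lambda prev: best[prev] + w.get(("trans", prev, h), 0.0))
            scores[h] = (best[g] + w.get(("trans", g, h), 0.0)
                         + w.get(("emit", words[t], h), 0.0))
            ptr[h] = g
        best = scores
        back.append(ptr)
    h = max(best, key=best.get)
    path = [h]
    for ptr in reversed(back):
        h = ptr[h]
        path.append(h)
    return path[::-1]


def features(words, path):
    """Hypothetical indicator counts: (word, state) emissions and state transitions."""
    f = Counter()
    for t, (word, h) in enumerate(zip(words, path)):
        f[("emit", word, h)] += 1
        if t > 0:
            f[("trans", path[t - 1], h)] += 1
    return f


def train(data, hidden_of, n_states, epochs=10):
    """Latent perceptron with naive weight averaging; returns averaged weights."""
    label_of = {h: y for y, states in hidden_of.items() for h in states}
    w, total, steps = {}, {}, 0
    for _ in range(epochs):
        for words, gold in data:
            h_pred = viterbi(words, w, n_states)
            if [label_of[h] for h in h_pred] != list(gold):
                # Best latent path whose states all belong to the gold labels.
                h_gold = viterbi(words, w, n_states, [hidden_of[y] for y in gold])
                delta = features(words, h_gold)
                delta.subtract(features(words, h_pred))
                for k, v in delta.items():
                    w[k] = w.get(k, 0.0) + v
            for k, v in w.items():  # accumulate current weights for averaging
                total[k] = total.get(k, 0.0) + v
            steps += 1
    return {k: v / steps for k, v in total.items()}


def predict(words, w, hidden_of, n_states):
    """Decode the best latent path and project it onto labels."""
    label_of = {h: y for y, states in hidden_of.items() for h in states}
    return [label_of[h] for h in viterbi(words, w, n_states)]


if __name__ == "__main__":
    hidden_of = {"X": (0, 1), "Y": (2, 3)}  # hypothetical: 2 latent states per label
    data = [(["a", "b", "a"], ["X", "Y", "X"]),
            (["b", "b", "a"], ["Y", "Y", "X"])]
    avg = train(data, hidden_of, n_states=4)
    print(predict(["b", "b", "a"], avg, hidden_of, n_states=4))
```

For readability, this sketch adds the full weight vector to the running total after every instance; averaged-perceptron implementations commonly use a lazy-update trick instead. With zero initial weights, ties among a label's latent states are broken by iteration order; how the released tool initializes weights is not stated here.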
- Xu Sun, Naoaki Okazaki, Jun'ichi Tsujii.
Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information.
Oral Paper. Proceedings of The Annual Meeting of the Association for Computational Linguistics and IJCNLP (ACL-IJCNLP'09).
Pages 905-913. Suntec, Singapore. 2009.
<PDF> <bib>
- Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, Jun'ichi Tsujii.
Latent Variable Perceptron Algorithm for Structured Classification.
Oral Paper. Proceedings of The International Joint Conference on Artificial Intelligence (IJCAI'09).
Pages 1236-1242. Los Angeles, USA. 2009.
<PDF> <bib> <Source code> <Similar artificial data>
- Xu Sun, Yaozhong Zhang, Takuya Matsuzaki, Yoshimasa Tsuruoka, Jun'ichi Tsujii.
A Latent Variable Chinese Segmenter with Hybrid Features.
Oral Paper. Proceedings of The North American Chapter of the Association for Computational Linguistics (NAACL-HLT'09).
Pages 56-64. Colorado, USA. 2009.
<PDF> <bib>
- Xu Sun, Jun'ichi Tsujii.
Sequential Labeling with Latent Variables: An Exact Inference Algorithm and An Efficient Approximation.
Full Paper. Proceedings of The European Chapter of the Association for Computational Linguistics (EACL'09).
Pages 772-780. Athens, Greece. 2009.
<PDF> <bib> <Source code>
- Xu Sun, Louis-Philippe Morency, Daisuke Okanohara, Jun'ichi Tsujii.
Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference.
Oral Paper. Proceedings of International Conference on Computational Linguistics (COLING'08).
Pages 841-848. Manchester, UK. 2008.
<PDF> <bib>
- Xu Sun, Houfeng Wang, Bo Wang.
Predicting Chinese abbreviations from definitions: An empirical learning approach using support vector regression.
Regular Paper. Journal of Computer Science and Technology 23(4): 602-611. Springer. July 2008.
<PDF>
- Peng Jin, Xu Sun, Yunfang Wu, Shiwen Yu.
Word Clustering for Collocation-Based Word Sense Disambiguation.
Oral Paper. The Int'l Conf. on Computational Linguistics and Intelligent Text Processing (CICLing'07). Lecture Notes in Computer Science, Springer. Pages 267-274. 2007.
- Xu Sun, Houfeng Wang.
Chinese Abbreviation Identification Using Abbreviation-Template Features and Context Information.
Oral Paper. The Int'l Conf. on the Computer Processing of Oriental Languages (ICCPOL'06). Lecture Notes in Computer Science, Springer. Pages 245-255. 2006.
<PDF>
- Xu Sun, Houfeng Wang, Yu Zhang.
Chinese Abbreviation-Definition Identification: A SVM Approach Using Context Information.
Oral Paper. The Pacific Rim Int'l Conf. on Artificial Intelligence (PRICAI'06). Lecture Notes in Computer Science, Springer. Pages 495-504. 2006.
<PDF>
- Xu Sun.
Latent variable perceptron algorithm: Proof of convergence.
Technical Report (TR-ML-2009-3), Tsujii Laboratory, University of Tokyo, 2009.
To appear.
<link>