| Position | D3 |
| Research Area | Statistical Natural Language Processing, Machine Learning, Data Structures, and Algorithms |
| Publication | English | Japanese |
| Contact | Department of Computer Science, Faculty of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, JAPAN |
| office | Room 615, 7th Building of Faculty of Science |
| e-mail | hillbig at is.s.u-tokyo.ac.jp (replace at with @) |
I am a third-year PhD student working on statistical natural language processing, with a particular interest in methods that use very large corpora. I also study data indexing, algorithms for large data, statistical learning theory, and information theory.
- I will give a talk at ALENEX 2010 (Jan. 2010)
- I gave a talk at SPIRE 2009 (Aug. 2009)
- I gave a poster presentation at NAACL-HLT 2009 (May 2009)
- I gave a poster presentation at SDM 2009 (May 2009) poster (ppt)
- I gave a talk at ESA 2008 (Sep. 2008)
- "Learning Combination Features with L1 Regularization", D. Okanohara and J. Tsujii. In NAACL-HLT. June 2009. pdf ppt
- "Text Categorization with All Substring Features", D. Okanohara and J. Tsujii. In the SIAM International Conference on Data Mining (SDM). April 2009. pdf ppt
- "A discriminative language model with pseudo-negative samples", D. Okanohara and J. Tsujii. In Proc. of ACL 2007. pdf
- "Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition", D. Okanohara, Y. Miyao, Y. Tsuruoka and J. Tsujii. In Proc. of ACL 2006. Sydney, Australia, July 2006. pdf
- "Assigning Polarity Scores to Reviews Using Machine Learning Techniques", D. Okanohara and J. Tsujii. IJCNLP 2005. LNCS 3651. Jeju Island, Korea, Springer-Verlag, October 2005. pdf
- "Conjunctive Filter: Breaking the Entropy Barrier", D. Okanohara and Y. Yoshida. In Proc. of ALENEX 2010. (pdf, pptx (slides), pdf (slides))
- "A Linear-Time Burrows-Wheeler Transform using Induced Sorting", D. Okanohara and K. Sadakane. In Proc. of the 16th String Processing and Information Retrieval Symposium (SPIRE). Aug. 2009. (pdf (draft))
- "An Online Algorithm for Finding the Longest Previous Factors", D. Okanohara and K. Sadakane. In the 16th European Symposium on Algorithms (ESA). Sep. 2008. (ppt, pdf)
- "Practical Entropy-Compressed Rank/Select Dictionary", D. Okanohara and K. Sadakane. In Proc. of ALENEX 2007. New Orleans, Louisiana, January 2007. (ppt, pdf)
- "Partially Decodable Compression with Static PPM", D. Okanohara. In the Data Compression Conference 2005, poster session. Snowbird, UT, USA, March 2005.
- Minise: MIni Search Engine. A compact full-text search engine supporting sequential search and three index types: inverted file index, N-gram index, and suffix arrays.
- Ohmm: Online EM algorithm for Hidden Markov Models
- OLL: Online Machine Learning Library
- Bep: Associative arrays for very large collections (and a minimal perfect hash function library)
- Tx: Succinct Trie Data Structure
(Mitou Software Souzou Jigyou)
- A New Data Compression Algorithm using Word Extraction Method (2002)
- Universal Probabilistic Language Models (2003)
- Document Classification using Context Information (2004-2005)
This software is used at Preferred Infrastructure.
- "Data Compression Handbook", Shuwa System, 2003 (Japanese)
- "Compression Algorithms", C Magazine, Softbank Creative, January 2006 (Japanese)
- Genome4, Bioinformatics Programming Contest, Problem 2 Best Award, 2004
- Exploratory Software Project, Super Creator Award, 2005
- YANS 2006, Best Presentation Award
- President's Prize of the University of Tokyo, 2007 link (Japanese)
- YANS 2007, Best Presentation Award
- IBIS 2008, Prize for Encouragement