Exploring Representations for Semantic-Rich Part of Speech Tagging
- DOI
- 10.2991/iiicec-15.2015.223
- Keywords
- Part-of-Speech (POS); Treebank; Maximum Entropy; N-gram Model
- Abstract
Part-of-speech (POS) tagging is a basic, primary analysis step in many natural language processing (NLP) applications. For English, it is often considered a solved problem: well-established approaches exist, and accuracy is around 97% given sufficient domain-specific training data. However, many NLP applications have very different, specialized requirements, and the POS tagset in use has its own characteristics. These challenges can greatly affect the quality of POS tagging. To address these issues and achieve high tagging accuracy, we investigate the representations that can be applied to improve performance on the POS tagging task. Our experiments show that tagging accuracy degrades significantly when tested with a large semantic and syntactic tagset. In addition, our analysis of the experiments suggests that tokens, rather than POS tags, have the greater effect on tagging accuracy. Our best results were reached by using the most appropriate representations for the POS tagging task.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Weidong Qu
AU - Sicong Yue
PY - 2015/03
DA - 2015/03
TI - Exploring Representations for Semantic-Rich Part of Speech Tagging
BT - Proceedings of the 2015 International Industrial Informatics and Computer Engineering Conference
PB - Atlantis Press
SP - 999
EP - 1002
SN - 2352-538X
UR - https://doi.org/10.2991/iiicec-15.2015.223
DO - 10.2991/iiicec-15.2015.223
ID - Qu2015/03
ER -