A novel approach for building Domain-specific Lexical Repository with Chinese Wikipedia
- DOI
- 10.2991/icemct-15.2015.230
- Keywords
- Domain-specific lexical repository, Domain Corpus, Domain Relatedness, Modified Explicit Semantic Analysis, Chinese-Wikipedia.
- Abstract
A domain ontology is a collection of domain-specific concepts and their interrelationships. It provides an abstract view of the application domain and is used in many areas, such as semantic mining (SM) and natural language processing (NLP). However, constructing a domain ontology manually is labor-intensive and time-consuming, whereas an automatically generated domain-specific lexical repository can serve as an indispensable component for building one. In this paper, we propose a two-stage method for building a domain-specific lexical repository using the dump service of Chinese Wikipedia. The main idea is that only concepts that are strongly semantically related to the multiple roots we choose are incorporated into the repository. First, in what we call the pre-stage, we use the all-pages dump of Chinese Wikipedia (zhwiki-all-pages.xml) to generate a graph of all Wikipedia concepts. Then, in stage one, we select three top-level nodes as roots and traverse the graph generated in the pre-stage with a BFS-like algorithm to form spanning trees, computing a rough domain relatedness for each node at the same time. Finally, in stage two, we combine the novel Modified Explicit Semantic Analysis method with the results of stage one to compute the final domain relatedness. The experimental results show that our method yields a high-quality domain-specific lexical repository.
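As a rough illustration of stage one only, the sketch below runs a breadth-first traversal from several roots over a small concept graph and assigns each reached node a rough relatedness score. The abstract does not give the scoring formula, so the depth-based decay (`decay ** depth`), the `max_depth` cutoff, the rule of keeping the best score across roots, and the function name `rough_relatedness` are all assumptions made for this example, not the authors' method.

```python
from collections import deque


def rough_relatedness(graph, roots, max_depth=3, decay=0.5):
    """Breadth-first traversal from each root over a concept graph.

    graph: dict mapping a concept to a list of its child concepts.
    Score is decay ** depth (an assumed formula, not taken from the paper);
    a node reachable from several roots keeps its highest score.
    """
    scores = {}
    for root in roots:
        depth = {root: 0}          # nodes visited in this root's spanning tree
        queue = deque([root])
        while queue:
            node = queue.popleft()
            d = depth[node]
            score = decay ** d
            if score > scores.get(node, 0.0):
                scores[node] = score
            if d == max_depth:
                continue           # do not expand beyond the depth cutoff
            for child in graph.get(node, ()):
                if child not in depth:
                    depth[child] = d + 1
                    queue.append(child)
    return scores


# Toy graph with hypothetical concept names, for illustration only.
toy_graph = {
    "Computer science": ["Algorithm", "Programming language"],
    "Mathematics": ["Algorithm", "Geometry"],
    "Algorithm": ["Sorting algorithm"],
}
toy_scores = rough_relatedness(
    toy_graph, ["Computer science", "Mathematics"], max_depth=2
)
for concept, score in sorted(toy_scores.items()):
    print(concept, score)
```

In a pipeline like the one described, such rough scores would presumably be combined in stage two with a semantic measure before a concept is admitted to the repository; that combination is not sketched here.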
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Zhijian Ruan
AU  - Xiu Li
PY  - 2015/06
DA  - 2015/06
TI  - A novel approach for building Domain-specific Lexical Repository with Chinese Wikipedia
BT  - Proceedings of the 2015 International Conference on Education, Management and Computing Technology
PB  - Atlantis Press
SP  - 1093
EP  - 1100
SN  - 2352-5398
UR  - https://doi.org/10.2991/icemct-15.2015.230
DO  - 10.2991/icemct-15.2015.230
ID  - Ruan2015/06
ER  -