Normalizing Chinese Address for Internet Applications
- DOI
- 10.2991/iceeecs-16.2016.2
- Keywords
- Set operation; Administrative division; Chinese address; Moving window; Matching degree; Analytical rules
- Abstract
Many Internet applications take addresses as input. However, addresses found on the Internet are usually not normalized and cannot be used directly. In this paper, we propose an Administrative Divisions Extracting Algorithm to normalize Chinese addresses on the Internet. Our approach proceeds as follows: 1) It begins by processing "Road" feature words and then extracts the set of all possible administrative divisions from a Chinese address, using an administrative divisions dictionary and a Moving Window Algorithm. 2) Because Chinese administrative divisions are related hierarchically, the algorithm defines conditional set-operation rules for administrative divisions and applies these set operations to the extracted data set. 3) The algorithm outputs the most complete administrative divisions for the Chinese address. To investigate the feasibility and effectiveness of our approach, we performed experiments on about 250 thousand Chinese addresses extracted from the Internet, testing the effect of adopting the "Road" feature words processing. We also compared the algorithm with current address matching technology. Experimental results show that the accuracy reached 93.51%.
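The three steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy gazetteer (`PARENT`, `ALIASES`), the road suffix list, the rule of skipping a dictionary hit that is immediately followed by a road suffix, and the coverage-based scoring are all assumptions made for illustration, since the abstract does not specify these details.

```python
# Hypothetical toy gazetteer: canonical division -> parent (None for a province).
PARENT = {
    "湖北省": None,
    "武汉市": "湖北省",
    "洪山区": "武汉市",
    "江苏省": None,
    "南京市": "江苏省",
}

# Surface forms (full and abbreviated names) mapped to canonical names.
ALIASES = {name: name for name in PARENT}
ALIASES.update({"湖北": "湖北省", "武汉": "武汉市", "洪山": "洪山区",
                "江苏": "江苏省", "南京": "南京市"})

# Assumed "Road" feature words: a hit followed by one of these is a road name.
ROAD_SUFFIXES = ("路", "街", "大道")
MAX_LEN = max(len(alias) for alias in ALIASES)


def extract_candidates(address):
    """Slide a shrinking window over the address and collect dictionary hits."""
    found = []
    i = 0
    while i < len(address):
        for size in range(min(MAX_LEN, len(address) - i), 0, -1):
            chunk = address[i:i + size]
            if chunk in ALIASES:
                # Skip division names that are really part of a road name.
                if not address[i + size:].startswith(ROAD_SUFFIXES):
                    found.append(ALIASES[chunk])
                i += size
                break
        else:
            i += 1  # no dictionary entry starts here; move the window on
    return found


def chain(name):
    """Return the division and all of its ancestors, top level first."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return path[::-1]


def normalize(address):
    """Pick the hierarchy chain that covers the most extracted candidates."""
    candidates = set(extract_candidates(address))
    best = []
    for name in candidates:
        path = chain(name)
        score = len(candidates & set(path))  # set intersection with the chain
        if (score, len(path)) > (len(candidates & set(best)), len(best)):
            best = path
    return best


print(normalize("武汉洪山区南京路12号"))
```

In this sketch, "南京" inside "南京路" is discarded as a road name, so the address resolves to the Hubei chain rather than to Nanjing, and the missing province is filled in from the hierarchy.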
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Xiaolin Li
AU - Shuang Huang
AU - Tao Lu
AU - Deng Chen
PY - 2016/12
DA - 2016/12
TI - Normalizing Chinese Address for Internet Applications
BT - Proceedings of the 2016 4th International Conference on Electrical & Electronics Engineering and Computer Science (ICEEECS 2016)
PB - Atlantis Press
SP - 5
EP - 9
SN - 2352-538X
UR - https://doi.org/10.2991/iceeecs-16.2016.2
DO - 10.2991/iceeecs-16.2016.2
ID - Li2016/12
ER -