The Analysis of Web Page Information Processing Based on Natural Language Processing
- DOI
- 10.2991/cecs-18.2018.79
- Keywords
- Natural Language Processing, Python, Crawler, Word Segmentation, TF-IDF.
- Abstract
Web pages have grown steadily more complex in structure and larger in content, and as a result many now carry useless or even illegal information. Extracting keywords from web page content while screening out invalid or illegal information has therefore become a focus of attention. This paper crawls web page information and processes it with natural language processing (NLP) techniques, in order to filter out invalid or illegal information and to identify the key information on the page. It concludes that NLP is a reasonable and practical approach for web page applications.
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Yusheng Zhao
PY  - 2018/07
DA  - 2018/07
TI  - The Analysis of Web Page Information Processing Based on Natural Language Processing
BT  - Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018)
PB  - Atlantis Press
SP  - 466
EP  - 469
SN  - 2352-538X
UR  - https://doi.org/10.2991/cecs-18.2018.79
DO  - 10.2991/cecs-18.2018.79
ID  - Zhao2018/07
ER  -
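
The abstract and keywords name a pipeline of page text extraction, word segmentation, and TF-IDF keyword ranking. The following is a minimal sketch of that idea using only the Python standard library, not the paper's implementation. Everything in it is an assumption made for illustration: the inline `PAGES` samples stand in for crawled pages, the regex-based `segment` function stands in for a real word segmenter, and the helper names (`TextExtractor`, `extract_text`, `tf_idf`, `top_keywords`) are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def segment(text):
    """Assumed stand-in for word segmentation: lowercase ASCII words only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs_tokens):
    """Score each term per document as (count / doc length) * ln(N / df)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample pages standing in for crawler output.
PAGES = [
    "<html><body><h1>Python crawler</h1><p>A crawler fetches pages.</p>"
    "<script>var x = 1;</script></body></html>",
    "<html><body><p>Pages contain text and pages contain links.</p></body></html>",
]

scores = tf_idf([segment(extract_text(page)) for page in PAGES])
print(top_keywords(scores[0], k=1))
```

A term that occurs in every page, such as "pages" here, gets an inverse document frequency of ln(1) = 0 and so never ranks as a keyword, which is the property that makes TF-IDF useful for separating page-specific terms from common ones. For Chinese-language pages, `segment` would need to be replaced with a proper word segmenter.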