The Analysis of Web Page Information Processing Based on Natural Language Processing

Yusheng Zhao

doi:10.2991/cecs-18.2018.79

Authors

Yusheng Zhao

Corresponding Author

Yusheng Zhao

Available Online July 2018.

DOI: 10.2991/cecs-18.2018.79
Keywords: Natural Language Processing, Python, Crawler, Word Segmentation, TF-IDF.
Abstract: Web pages have become structurally more complex and carry ever more content, and with that growth comes a large amount of useless and even illegal information. Screening web page information for keywords and filtering out invalid or illegal content have therefore become a focus of attention. This paper crawls web page information and processes it with natural language processing (NLP) techniques in order to filter out invalid or illegal information and to identify the key information in the page. It concludes that NLP is a reasonable and practical approach for web page applications.
Copyright: © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
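
The pipeline named in the abstract and keywords (crawled page, word segmentation, TF-IDF keyword scoring) can be illustrated with a minimal Python sketch. This is not the author's implementation, which is not shown on this page. It assumes the HTML has already been fetched, uses a plain regular-expression tokenizer as a stand-in for a real word segmenter, and uses the textbook TF-IDF weighting tf * log(N / df). All function and class names (`TextExtractor`, `html_to_text`, `tokenize`, `tf_idf`, `top_keywords`) are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup from an already-downloaded page (no network access here)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Assumption: a stand-in for word segmentation that keeps lowercase
    # alphabetic runs only. A real segmenter would replace this function.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        scores.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that occur in every document receive a score of zero under this weighting, which is what pushes page-specific words to the top of the keyword list. Filtering invalid or illegal content would need an additional step, such as a blocklist, that this sketch does not attempt.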

Volume Title: Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018)
Series: Advances in Computer Science Research
Publication Date: July 2018
ISBN: 978-94-6252-571-9
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Yusheng Zhao
PY  - 2018/07
DA  - 2018/07
TI  - The Analysis of Web Page Information Processing Based on Natural Language Processing
BT  - Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018)
PB  - Atlantis Press
SP  - 466
EP  - 469
SN  - 2352-538X
UR  - https://doi.org/10.2991/cecs-18.2018.79
DO  - 10.2991/cecs-18.2018.79
ID  - Zhao2018/07
ER  -