Analysis and Research of Distributed Network Crawler Based on Cloud Computing Hadoop Platform
- DOI
- 10.2991/snce-18.2018.216
- Keywords
- Cloud computing; Network crawler; Hadoop platform; Data processing; Distributed network
- Abstract
Cloud computing is a new model of network service that shifts task processing from the traditional desktop to the network. Hadoop is a Java-based software framework for distributed processing and analysis of data-intensive workloads. A web crawler is a program or script that automatically retrieves web information according to certain rules. This paper presents an analysis and study of a distributed web crawler based on the cloud computing Hadoop platform. Each distributed crawling node can be divided into four parts: a web crawler module, a node information maintenance module, a task allocation module, and a node communication module.
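The abstract names four cooperating modules per crawling node but does not publish an API. The following minimal Java sketch (Java being the language of Hadoop itself) illustrates one plausible way such a node could be organized; all class and method names, the hash-by-host allocation rule, and the single-process simulation are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of one crawling node built from the four modules
// named in the abstract. Names and structure are illustrative, not the
// paper's implementation.
public class CrawlerNode {

    // Network crawler module: fetches pages from a URL frontier.
    static class CrawlerModule {
        private final Queue<String> frontier = new ArrayDeque<>();
        private final Set<String> visited = new HashSet<>();

        void enqueue(String url) {
            if (visited.add(url)) {   // skip URLs already seen
                frontier.offer(url);
            }
        }

        void crawlOnce() {
            String url = frontier.poll();
            if (url != null) {
                // A real implementation would fetch and parse the page here.
                System.out.println("Fetching " + url);
            }
        }
    }

    // Node information maintenance module: tracks this node's load/health.
    static class NodeInfoModule {
        private int pagesCrawled = 0;
        void recordCrawl() { pagesCrawled++; }
        int load() { return pagesCrawled; }
    }

    // Task allocation module: assigns URLs to nodes, e.g. by hashing the
    // host name so the same host is always handled by the same node.
    static class TaskAllocationModule {
        private final int numNodes;
        TaskAllocationModule(int numNodes) { this.numNodes = numNodes; }
        int nodeFor(String host) {
            return Math.floorMod(host.hashCode(), numNodes);
        }
    }

    // Node communication module: exchanges URLs and status with peers.
    static class CommunicationModule {
        void send(int nodeId, String url) {
            // A real implementation would use RPC or a message queue.
            System.out.println("Forwarding " + url + " to node " + nodeId);
        }
    }

    public static void main(String[] args) {
        CrawlerModule crawler = new CrawlerModule();
        NodeInfoModule info = new NodeInfoModule();
        TaskAllocationModule allocator = new TaskAllocationModule(4);
        CommunicationModule comms = new CommunicationModule();

        String url = "http://example.com/";
        int owner = allocator.nodeFor("example.com");
        if (owner == 0) {            // pretend this process is node 0
            crawler.enqueue(url);
            crawler.crawlOnce();
            info.recordCrawl();
            System.out.println("Node load: " + info.load());
        } else {
            comms.send(owner, url);
        }
    }
}
```

Hashing by host is one common way to partition a crawl so that politeness limits per site stay local to a single node; the paper itself does not specify which allocation policy its task allocation module uses.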
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Hongsheng Xu
AU  - Ganglong Fan
AU  - Ke Li
PY  - 2018/05
DA  - 2018/05
TI  - Analysis and Research of Distributed network Crawler based on Cloud Computing Hadoop Platform
BT  - Proceedings of the 8th International Conference on Social Network, Communication and Education (SNCE 2018)
PB  - Atlantis Press
SP  - 1045
EP  - 1049
SN  - 2352-538X
UR  - https://doi.org/10.2991/snce-18.2018.216
DO  - 10.2991/snce-18.2018.216
ID  - Xu2018/05
ER  -