Comparison of Web Scraping Techniques : Regular Expression, HTML DOM and Xpath
- DOI
- 10.2991/icoiese-18.2019.50
- Keywords
- DOM, Regex, Web Scraping, Xpath
- Abstract
Data collection is the initial stage of research. Various data sources on the internet can be used in the research process. The process of extracting data or information from websites is called web scraping. Web scraping methods include Regular Expression (Regex), HTML DOM, and XPath. This study aims to determine the performance of these three methods. The comparison is done by testing each method when retrieving data from the target website, then measuring and comparing the performance of each process. Process time, memory usage, and data consumption are used as measurement parameters in the experiment. The results show that web scraping with the Regex method uses the least memory compared to the HTML DOM and XPath methods, while the HTML DOM method requires the least time and the smallest data consumption compared to the Regex and XPath methods.
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
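The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample page, the function names, and the use of Python's standard library (`re`, `xml.dom.minidom`, `xml.etree.ElementTree`) are assumptions made for illustration, and the study's actual target website and tooling are not stated here. The sketch extracts the same headings with each of the three methods and records process time and peak memory; data consumption is not measured because no network request is made.

```python
import re
import time
import tracemalloc
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (NOT the target website from the study).
SAMPLE_PAGE = (
    "<html><body>"
    "<h2 class='title'>First article</h2>"
    "<h2 class='title'>Second article</h2>"
    "<p>Unrelated text</p>"
    "</body></html>"
)


def scrape_regex(page):
    """Regex method: match the text between <h2 ...> and </h2>."""
    return re.findall(r"<h2[^>]*>(.*?)</h2>", page)


def scrape_dom(page):
    """DOM method: parse the page into a tree and walk it by tag name."""
    doc = xml.dom.minidom.parseString(page)
    return [node.firstChild.data for node in doc.getElementsByTagName("h2")]


def scrape_xpath(page):
    """XPath method: select nodes with a path expression.

    ElementTree supports only a limited subset of XPath.
    """
    root = ET.fromstring(page)
    return [el.text for el in root.findall(".//h2[@class='title']")]


def measure(scraper, page, repeats=100):
    """Run a scraper repeatedly; return (result, elapsed seconds, peak bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(repeats):
        result = scraper(page)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    methods = [("Regex", scrape_regex), ("DOM", scrape_dom), ("XPath", scrape_xpath)]
    for name, scraper in methods:
        result, elapsed, peak = measure(scraper, SAMPLE_PAGE)
        print(f"{name}: {result} in {elapsed:.6f} s, peak {peak} bytes")
```

Timings and memory figures from a sketch like this depend on the parser, the page, and the runtime, so they should not be expected to reproduce the rankings reported in the abstract.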
Cite this article
TY - CONF
AU - Rohmat Gunawan
AU - Alam Rahmatulloh
AU - Irfan Darmawan
AU - Firman Firdaus
PY - 2019/03
DA - 2019/03
TI - Comparison of Web Scraping Techniques : Regular Expression, HTML DOM and Xpath
BT - Proceedings of the 2018 International Conference on Industrial Enterprise and System Engineering (IcoIESE 2018)
PB - Atlantis Press
SP - 283
EP - 287
SN - 2589-4943
UR - https://doi.org/10.2991/icoiese-18.2019.50
DO - 10.2991/icoiese-18.2019.50
ID - Gunawan2019/03
ER -