An Analysis of Characters and Structures of Web Pages Based on Regular Expressions

Xu Lei

doi:10.2991/csss-14.2014.22

Authors

Xu Lei

Corresponding Author

Xu Lei

Available Online June 2014.

DOI: 10.2991/csss-14.2014.22
Keywords: information extraction; HTML; regular expressions
Abstract: This paper introduces a method for analyzing the characters and structure of web pages using regular expressions. From encoding to HTML elements, the characters in each web page are counted one by one. The effectiveness of the tool is demonstrated in experiments on more than one hundred real-world web pages. This work prepares the ground for large-scale web information extraction.
Copyright: © 2014, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
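
The abstract describes counting the characters of a page, from its encoding to its HTML elements, with regular expressions. The paper's own expressions are not reproduced on this page, so the following is only a minimal sketch of that kind of analysis, not the author's tool. The function name analyze_page and all three patterns are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; they are not taken from the paper.
# Declared encoding, e.g. <meta charset="utf-8"> or content="text/html; charset=gbk".
CHARSET_RE = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)
# Opening tags only; captures the element name.
OPEN_TAG_RE = re.compile(r'<\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')
# Any markup run between angle brackets (opening tags, closing tags, simple comments).
ANY_TAG_RE = re.compile(r'<[^>]+>')


def analyze_page(html: str) -> dict:
    """Count characters and elements in an HTML string using regular expressions."""
    charset_match = CHARSET_RE.search(html)
    tag_counts = Counter(name.lower() for name in OPEN_TAG_RE.findall(html))
    markup_chars = sum(len(m.group(0)) for m in ANY_TAG_RE.finditer(html))
    return {
        "charset": charset_match.group(1).lower() if charset_match else None,
        "total_chars": len(html),
        "markup_chars": markup_chars,               # characters inside <...>
        "text_chars": len(html) - markup_chars,     # everything outside <...>
        "tag_counts": dict(tag_counts),
    }


if __name__ == "__main__":
    sample = ('<html><head><meta charset="UTF-8"></head>'
              '<body><p>Hi</p><p>Yo</p></body></html>')
    print(analyze_page(sample))
```

Regular expressions are not a full HTML parser: in this sketch, script bodies and attribute values containing ">" would be miscounted, so the numbers are best read as approximate page statistics rather than an exact parse.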

Volume Title: Proceedings of the 3rd International Conference on Computer Science and Service System
Series: Advances in Intelligent Systems Research
Publication Date: June 2014
ISBN: 978-94-6252-012-7
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Xu Lei
PY  - 2014/06
DA  - 2014/06
TI  - An Analysis of Characters and Structures of Web Pages Based on Regular Expressions
BT  - Proceedings of the 3rd International Conference on Computer Science and Service System
PB  - Atlantis Press
SP  - 98
EP  - 101
SN  - 1951-6851
UR  - https://doi.org/10.2991/csss-14.2014.22
DO  - 10.2991/csss-14.2014.22
ID  - Lei2014/06
ER  -
