Authoring of Personalized Web Page from Heterogeneous Web Pages by Content Extraction and Integration
- DOI
- 10.2991/cnct-16.2017.102
- Keywords
- Authoring of Web pages, Content extraction, Element similarity, CS-DOM tree.
- Abstract
Authoring a personalized Web page by integrating heterogeneous page elements from different sites is a challenging task in Web 2.0 applications. This paper proposes an approach that extracts various partitions or elements (basic HTML elements, CSS definitions, JavaScript source code, etc.) from different Web sites and uses them to author a new page from heterogeneous Web pages. A novel DOM tree model, the CS-DOM tree, is introduced to retrieve the CSS definitions. To ensure that the new Web page stays synchronized with updates to its source pages, a method based on the DOM structure and the context of elements is presented for relocating elements that were retrieved previously. Finally, a similarity calculation algorithm is developed to judge whether a relocated element and the previously retrieved element come from the same position. The proposed method has been applied to develop a personalized portal.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
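The relocation step described in the abstract (finding a previously retrieved element again after the source page has changed, then checking by similarity whether it is the same position) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. The `ElementSignature` fingerprint, the equal weighting of structure and context, and the 0.6 threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ElementSignature:
    """Hypothetical fingerprint of an element (not from the paper)."""
    path: tuple[str, ...]     # tag names from the document root to the element
    context: frozenset[str]   # words from the element and its neighbours


def path_similarity(a, b):
    # Structural part: how much of the root-to-element tag path is shared.
    return SequenceMatcher(None, a, b).ratio()


def context_similarity(a, b):
    # Context part: Jaccard overlap of the surrounding words.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(old, new, path_weight=0.5):
    # Assumed combination: weighted average of structure and context scores.
    structural = path_similarity(old.path, new.path)
    contextual = context_similarity(old.context, new.context)
    return path_weight * structural + (1 - path_weight) * contextual


def relocate(old, candidates, threshold=0.6):
    # Pick the most similar candidate in the updated page, or None if no
    # candidate is similar enough to count as the same position.
    best = max(candidates, key=lambda c: similarity(old, c), default=None)
    if best is not None and similarity(old, best) >= threshold:
        return best
    return None


# Element as it was first retrieved from the source page.
old = ElementSignature(("html", "body", "div", "ul"),
                       frozenset({"news", "sports", "today"}))
# The same list after the source page gained a wrapper <div> and new text.
moved = ElementSignature(("html", "body", "div", "div", "ul"),
                         frozenset({"news", "sports", "latest"}))
# An unrelated element elsewhere in the updated page.
other = ElementSignature(("html", "body", "table"),
                         frozenset({"login", "password"}))

match = relocate(old, [other, moved])
```

In this sketch `match` is the `moved` signature, because it shares most of its tag path and half of its context words with the original, while the unrelated table scores well below the threshold. A real implementation would need to derive the signatures from parsed DOM trees and tune the weights and threshold against actual pages.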
Cite this article
TY - CONF
AU - Wei-gang LI
AU - Ke SUN
AU - Shuo-chen WANG
PY - 2016/12
DA - 2016/12
TI - Authoring of Personalized Web Page from Heterogeneous Web Pages by Content Extraction and Integration
BT - Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
PB - Atlantis Press
SP - 734
EP - 740
SN - 2352-538X
UR - https://doi.org/10.2991/cnct-16.2017.102
DO - 10.2991/cnct-16.2017.102
ID - LI2016/12
ER -