International Journal of Computational Intelligence Systems

Volume 12, Issue 1, November 2018, Pages 299 - 310

A Methodology to Refine Labels in Web Search Results Clustering

Authors
Zaher Salah1, *, Ahmad Aloqaily1, Malak Al-Hassan2, Abdel-Rahman Al-Ghuwairi1
1Prince Al Hussein Bin Abdullah II Faculty for Information Technology, Hashemite University, Zarqa, Jordan
2Department of Business of Information Technology, University of Jordan, Amman, Jordan
*Corresponding author. Email: zahersalah@hotmail.com
Received 25 June 2018, Revised 19 November 2018, Accepted 14 December 2018, Available Online 31 December 2018.
DOI
10.2991/ijcis.2019.125905647
Keywords
Information retrieval; Machine learning; Web search results clustering; Web intelligence
Abstract

Information retrieval systems such as web search engines meet a user's information needs by searching for and retrieving the documents that match the user's query. First, the query is submitted to the web search engine and is assumed to be a good representation of the user's intention and to reflect his or her specific information needs; it should therefore be long enough, discriminative, specific, and unambiguous. Second, the web search engine typically responds to the query by returning a long flat list of web search results, each of which represents a relevant document. That list may contain thousands or millions of web search results, so it is difficult to navigate and to locate a specific document relevant to a specific topic. As a postretrieval process, web search results clustering may address this issue by grouping the web search results into clusters. These clusters are supposed to contain topically related documents and to be labelled with concise, descriptive labels that correctly describe the contents of each cluster. Users can then easily choose the cluster that represents the intended topic and navigate through the relatively few documents inside it. High-quality cluster labelling is crucial because it gives users insight into the clusters' contents, their general structure, and the distribution of topics among the documents in the clusters. This enables the user to preview and navigate the results quickly and easily. To this end, this paper introduces a methodology to enhance the labels of clusters of web search results. The proposed methodology is founded on the idea of taking the existing labels nominated by the original Suffix Tree Clustering (STC) algorithm and adapting these labels and/or clusters so that they become more concise and descriptive. The proposed methodology was applied to the original STC algorithm to produce an enhanced version of the classical STC algorithm.
The enhanced algorithm was tested experimentally, and the clusters and labels it produced were evaluated and compared with those of the classical STC algorithm. For evaluation, the authors used a cluster labelling performance measure that considers five parameters: f1: Comprehensibility, f2: Descriptiveness, f3: Discriminative Power, f4: Uniqueness, and f5: Nonredundancy. The reported results show that the enhanced labels outperformed the original labels and that overall performance improved. The recorded results indicate that: (i) the proposed methodology achieved better performance, with an overall average value of 0.921 for the performance measure used (f6); (ii) the number of clusters decreased from 15 to 9; (iii) the number of duplicated results decreased from 143 to 121; and (iv) the average number of phrases per label increased from 1.67 to 2.00.
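
The three structural figures reported above (number of clusters, number of duplicated results, and average number of phrases per label) can be computed from any overlapping clustering. The following is a minimal sketch, not the authors' implementation. The `Cluster` type, the `summarize` function, and the toy data are hypothetical, and it assumes that every appearance of a search result beyond its first cluster counts as one duplicate; the abstract does not define this count.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Cluster:
    """Hypothetical container for one labelled cluster of search results."""
    label_phrases: List[str]  # phrases that make up the cluster label
    doc_ids: Set[int]         # identifiers of the results in this cluster


def summarize(clusters: List[Cluster]) -> Tuple[int, int, float]:
    """Return (number of clusters, duplicated results, mean phrases per label)."""
    # Count how many clusters each search result appears in.
    occurrences = Counter(d for c in clusters for d in c.doc_ids)
    # Assumption: each appearance beyond the first counts as one duplicate.
    duplicated = sum(n - 1 for n in occurrences.values())
    mean_phrases = sum(len(c.label_phrases) for c in clusters) / len(clusters)
    return len(clusters), duplicated, mean_phrases


# Hypothetical toy data, not taken from the paper.
toy = [
    Cluster(["jaguar car"], {1, 2, 3}),
    Cluster(["jaguar animal", "big cat"], {3, 4}),
    Cluster(["jaguar os", "mac os x"], {4, 5, 6}),
]

n_clusters, duplicated, mean_phrases = summarize(toy)
print(n_clusters, duplicated, round(mean_phrases, 2))
```

In the toy data, results 3 and 4 each appear in two clusters, so two duplicates are counted. Merging overlapping clusters lowers both the cluster count and the duplicate count, which is the direction of change the abstract reports.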

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal
International Journal of Computational Intelligence Systems
Volume-Issue
12 - 1
Pages
299 - 310
Publication Date
2018/12/31
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Zaher Salah
AU  - Ahmad Aloqaily
AU  - Malak Al-Hassan
AU  - Abdel-Rahman Al-Ghuwairi
PY  - 2018
DA  - 2018/12/31
TI  - A Methodology to Refine Labels in Web Search Results Clustering
JO  - International Journal of Computational Intelligence Systems
SP  - 299
EP  - 310
VL  - 12
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.2019.125905647
DO  - 10.2991/ijcis.2019.125905647
ID  - Salah2018
ER  -