Information Extraction from Web as Knowledge Resources for Indonesian Question Answering System
- DOI
- 10.2991/aisr.k.200424.064
- Keywords
- information extraction, Indonesian Question Answering System
- Abstract
Research on Open Domain Question Answering Systems (OD-QAS) generally involves external knowledge that is dynamic and requires high-level representation. Strong external knowledge is one of the keys to the success of a QAS, so intensive research is needed in this area. The Web is one of the largest sources of information that a QAS can use as external knowledge. The main problem, however, is that the Web contains a lot of unstructured data, so a model is needed to extract information from it. The model developed in this research is based on a pipeline architecture and consists of three main processes: pre-processing, information extraction processing, and text processing. The input to the model is a factoid question, and the output is a set of snippets or sentences that contain the target answers. Three search engines, Yahoo!, Bing, and Ask, are used to help find relevant information on the Web. The average precision and deviation values differ only slightly across the search engines. Yahoo! produced the highest total number of true-positive snippets (65), while Bing obtained the best average precision (25.33%).
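The abstract reports three per-engine figures: a total count of true-positive snippets, an average precision, and a deviation. The sketch below shows one way such figures could be computed. It is not the authors' evaluation code. It assumes that precision is computed per question as true-positive snippets divided by retrieved snippets, and that the deviation is a population standard deviation. The function names, engine labels, and counts are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def precision(true_positives: int, retrieved: int) -> float:
    """Fraction of retrieved snippets that contain the target answer."""
    return true_positives / retrieved if retrieved else 0.0


def summarize(per_question):
    """Summarize one engine.

    per_question: list of (true_positives, retrieved) pairs, one per question.
    """
    scores = [precision(tp, n) for tp, n in per_question]
    return {
        "total_true_positives": sum(tp for tp, _ in per_question),
        "average_precision": mean(scores),
        # Assumption: population standard deviation; the abstract does not
        # say which deviation measure was used.
        "deviation": pstdev(scores),
    }


# Hypothetical counts for illustration only; NOT the paper's data.
toy_results = {
    "engine_a": [(4, 20), (2, 20)],  # returns more snippets per question
    "engine_b": [(3, 10), (2, 10)],  # returns fewer snippets per question
}

summary = {name: summarize(rows) for name, rows in toy_results.items()}

for name, stats in summary.items():
    print(name, stats)
```

With these toy counts, `engine_a` has more true positives in total while `engine_b` has the higher average precision. This mirrors how, in the abstract, the engine with the most true-positive snippets is not the one with the best average precision.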
- Copyright
- © 2020, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Abdiansah ABDIANSAH
AU  - Alvi Syahrini UTAMI
PY  - 2020
DA  - 2020/05/06
TI  - Information Extraction from Web as Knowledge Resources for Indonesian Question Answering System
BT  - Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)
PB  - Atlantis Press
SP  - 419
EP  - 425
SN  - 1951-6851
UR  - https://doi.org/10.2991/aisr.k.200424.064
DO  - 10.2991/aisr.k.200424.064
ID  - ABDIANSAH2020
ER  -