Use of Topic Modelling for Improvement of Quality in the Task of Semantic Search of Educational Courses
- DOI
- 10.2991/csit-19.2019.18
- Keywords
- topic modeling, topic filtering
- Abstract
This paper proposes an approach that improves the quality of the original semantic search algorithm for educational course programmes, which is based on vector representations produced by distributional semantics. The approach provides an expert with interpretable topic filtering of the courses in the search results. Probabilistic topic modeling based on additive regularization makes the vector components of text representations interpretable, which allows the expert, during exploratory search, to narrow down the set of relevant documents previously found by the vector model. In our experiments, we study the applied task of searching for educational courses that match current labor market requirements; requirements described in professional standards serve as search queries. Topic filtering is implemented with the open-source library BigARTM. We investigate how the hyperparameters and the choice of regularizers used to construct the topic model affect the quality of semantic search for educational courses with several vector models: word2vec, fastText, and TF-IDF.
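The two-stage idea in the abstract (vector-based retrieval followed by topic filtering) can be illustrated with a minimal sketch. This is not the authors' implementation and does not call BigARTM: the function names, the `min_weight` threshold, and the toy vectors and document-topic distributions are all hypothetical, and the topic distributions are assumed to have been produced beforehand by some topic model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, doc_vecs, top_k):
    """Stage 1: rank documents by cosine similarity to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def topic_filter(results, doc_topics, selected_topics, min_weight=0.3):
    """Stage 2: keep results whose probability mass on the topics
    selected by the expert is at least min_weight (hypothetical threshold)."""
    kept = []
    for doc_id, score in results:
        mass = sum(doc_topics[doc_id][t] for t in selected_topics)
        if mass >= min_weight:
            kept.append((doc_id, score))
    return kept

# Toy data, invented for illustration only.
doc_vecs = {
    "course_ml": [0.9, 0.1, 0.0],
    "course_db": [0.7, 0.6, 0.1],
    "course_art": [0.0, 0.2, 0.9],
}
# Assumed document-topic distributions from a previously trained topic model.
doc_topics = {
    "course_ml": [0.8, 0.1, 0.1],
    "course_db": [0.2, 0.7, 0.1],
    "course_art": [0.05, 0.05, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for a vectorized professional-standard requirement

results = semantic_search(query_vec, doc_vecs, top_k=2)
filtered = topic_filter(results, doc_topics, selected_topics=[0])
```

In this sketch the ranking from the first stage is left unchanged; the filter only removes documents, which mirrors the abstract's description of narrowing down a previously found result set.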
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Ivan Nikolaev
AU - Dmitry Botov
AU - Yuri Dmitrin
AU - Julius Klenin
AU - Andrei Melnikov
PY - 2019/12
DA - 2019/12
TI - Use of Topic Modelling for Improvement of Quality in the Task of Semantic Search of Educational Courses
BT - Proceedings of the 21st International Workshop on Computer Science and Information Technologies (CSIT 2019)
PB - Atlantis Press
SP - 104
EP - 111
SN - 2589-4900
UR - https://doi.org/10.2991/csit-19.2019.18
DO - 10.2991/csit-19.2019.18
ID - Nikolaev2019/12
ER -