The Comparison of Distributive Semantics Models Applied to the Task of Short Job Requirements Clustering for the Russian Labor Market
- DOI
- 10.2991/aisr.k.201029.056
- Keywords
- clustering, vector models, short texts, job vacancies, labour market
- Abstract
In this article we compare different vector models (TF-IDF, word2vec, fastText, LDA, LSI, ARTM) on the task of clustering short texts, using a dataset of job vacancy descriptions in Russian. We propose a two-stage experiment to determine the best model and its hyperparameters based on the quality of the resulting short-text clusters. In the first stage, we investigate how the hyperparameters of each model affect the clusters produced by training a K-means model on each of the vector representations. In particular, we examine in detail how the dimensionality of the output vector representation of each model influences the quality of the final clusters. We also provide an extensive analysis of how various ARTM regularization options affect the clusters learned from the vectors produced by the ARTM algorithm. In the second stage, the models that performed best in the first stage (word2vec, fastText) are analyzed in greater detail. We compare the effectiveness of these models on datasets of different sizes and with different structures of the source fragments (individual parts of vacancy descriptions or their full texts). In our experiments, the highest cluster quality, as measured by the Adjusted Rand Index (ARI), was achieved by word2vec, closely followed by fastText. Finally, we perform a topic analysis of each resulting cluster and evaluate its homogeneity.
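The evaluation loop described in the abstract (turn each short text into a vector, cluster the vectors with K-means, and score the clusters against reference labels with ARI) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it uses only TF-IDF (one of the compared models), whitespace tokenization, a smoothed IDF, a deterministic farthest-point initialization for K-means, and a hypothetical six-phrase English toy corpus standing in for the Russian vacancy data. The function names `tfidf_vectors`, `kmeans` and `adjusted_rand_index` are illustrative.

```python
import math
from collections import Counter

# Illustrative sketch only: tokenization, IDF smoothing, K-means initialization
# and the toy corpus below are assumptions, not the paper's setup.


def tfidf_vectors(docs):
    """Dense, L2-normalized TF-IDF vectors for whitespace-tokenized texts."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (Index - Expected) / (MaxIndex - Expected), via the contingency table."""
    def pairs(x):
        return x * (x - 1) / 2

    n = len(labels_true)
    index = sum(pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(pairs(c) for c in Counter(labels_true).values())
    sum_b = sum(pairs(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy "job requirements" (English stand-ins, not the paper's data).
docs = [
    "python sql pandas",
    "python sql databases",
    "sql python pandas",
    "driving license category",
    "driving license truck",
    "license driving category",
]
gold = [0, 0, 0, 1, 1, 1]
pred = kmeans(tfidf_vectors(docs), k=2)
print(pred, adjusted_rand_index(gold, pred))  # → [0, 0, 0, 1, 1, 1] 1.0
```

To compare other models in the same way, one would replace `tfidf_vectors` with a function that returns, for example, averaged word2vec or fastText word vectors, or LDA/LSI/ARTM topic distributions for each text; the clustering and ARI steps stay unchanged. Because ARI depends only on how items are grouped, it is invariant to how cluster labels are numbered. In practice, library implementations such as scikit-learn's `KMeans` and `adjusted_rand_score` would typically be used instead of these hand-written versions.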
- Copyright
- © 2020, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Ivan Nikolaev
AU  - Ivan Ryazanov
AU  - Dmitry Botov
PY  - 2020
DA  - 2020/11/10
TI  - The Comparison of Distributive Semantics Models Applied to the Task of Short Job Requirements Clustering for the Russian Labor Market
BT  - Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020)
PB  - Atlantis Press
SP  - 295
EP  - 301
SN  - 1951-6851
UR  - https://doi.org/10.2991/aisr.k.201029.056
DO  - 10.2991/aisr.k.201029.056
ID  - Nikolaev2020
ER  -