Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020)

The Comparison of Distributive Semantics Models Applied to the Task of Short Job Requirements Clustering for the Russian Labor Market

Authors
Ivan Nikolaev, Ivan Ryazanov, Dmitry Botov
Corresponding Author
Ivan Nikolaev
Available Online 10 November 2020.
DOI
10.2991/aisr.k.201029.056
Keywords
clustering, vector models, short texts, job vacancies, labour market
Abstract

In this article, we compare different vector models (TF-IDF, word2vec, fastText, LDA, LSI, ARTM) on the task of short text clustering, using a dataset of job vacancy descriptions in Russian. We propose a two-stage experiment to determine the best model and its hyperparameters based on the quality of the resulting short text clusters. In the first stage, we investigate how the hyperparameters of each model affect the clusters produced by training a K-means model on each of the vector representations. In particular, we examine in detail how the size of the output vector representation of each model influences the quality of the final clusters. We also provide an extensive analysis of how various regularization options affect the clusters learned from the vectors produced by the ARTM algorithm. In the second stage, the models that showed the best results in the first stage (word2vec, fastText) are analyzed in greater detail. We compare the effectiveness of these models on datasets of different sizes, as well as on source fragments of different structure (partial elements or full texts of vacancy descriptions). In our experiments, the highest cluster quality (evaluated using the ARI metric) was achieved by word2vec, closely followed by fastText. Finally, we perform a topic analysis of each of the resulting clusters and evaluate their homogeneity.
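
The abstract evaluates cluster quality with the ARI (Adjusted Rand Index) metric. As an illustration only, the following minimal sketch computes ARI from two label lists using the standard contingency-table formula. The function name `adjusted_rand_index` and the toy labels are assumptions made for this example and do not come from the paper, which does not state which implementation the authors used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between a reference labelling and a clustering.

    Illustrative sketch; not the authors' implementation.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Contingency table cells and its row/column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both / in each partition.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2

    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: four job requirements with two reference topics.
reference = ["python", "python", "sql", "sql"]
same_grouping = [1, 1, 0, 0]   # same partition, different cluster ids
mixed_grouping = [0, 1, 0, 1]  # every cluster mixes both topics

ari_same = adjusted_rand_index(reference, same_grouping)
ari_mixed = adjusted_rand_index(reference, mixed_grouping)
```

ARI is invariant to how cluster ids are numbered, so `same_grouping` scores as a perfect match even though its ids are swapped. A value near zero corresponds to chance-level agreement, and negative values indicate agreement below chance.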

Copyright
© 2020, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020)
Series
Advances in Intelligent Systems Research
Publication Date
10 November 2020
ISBN
978-94-6239-265-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Ivan Nikolaev
AU  - Ivan Ryazanov
AU  - Dmitry Botov
PY  - 2020
DA  - 2020/11/10
TI  - The Comparison of Distributive Semantics Models Applied to the Task of Short Job Requirements Clustering for the Russian Labor Market
BT  - Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020)
PB  - Atlantis Press
SP  - 295
EP  - 301
SN  - 1951-6851
UR  - https://doi.org/10.2991/aisr.k.201029.056
DO  - 10.2991/aisr.k.201029.056
ID  - Nikolaev2020
ER  -