Text Age Rating Methods for Digital Libraries
- DOI
- 10.2991/assehr.k.200509.066
- Keywords
- content rating, age restrictions, Russian Age Rating System, text classification, text addressee, textual target audience, machine learning
- Abstract
The addressee plays a major role in communication. Creating a text involves taking into account the features of the target audience that the author addresses. In this article, text addressee detection is considered from the point of view of natural language processing. The task of age classification deserves special attention: its relevance is associated with the development of e-learning systems and digital libraries. Moreover, all information products in Russia must now be marked with an age rating. This article describes the first attempt to solve the automatic age rating prediction task, using Russian texts as an example. In this work, we analyze the main factors affecting the text age rating and propose a first-approximation classifier for determining the age of the textual target audience. Our approach is based on a range of features designed to capture readability, lexical, and topic modeling characteristics. We use these features to train a Linear Support Vector Classifier. We trained and tested our classifier on a dataset of 1200 previews of fiction books in Russian, annotated for age rating by the books’ publishers. Our performance evaluation suggests that the proposed features are a good indicator of text age rating. In future work, we plan to add and evaluate other types of models and linguistic features.
- Copyright
- © 2020, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
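The abstract mentions readability and lexical features without listing them on this page. As a rough illustration only, the sketch below computes three common surface features of this kind (average sentence length, average word length, type-token ratio). The function name `text_features` and the choice of features are assumptions made for this example, not the author's actual feature set.

```python
import re


def text_features(text: str) -> dict[str, float]:
    """Compute a few simple readability/lexical features (illustrative only)."""
    # Split on runs of sentence-final punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # \w+ matches Unicode word characters, so Cyrillic words are tokenized too.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Readability proxy: words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Readability proxy: characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    print(text_features("Кот спит. Кот ест рыбу!"))
```

In a pipeline resembling the one the abstract describes, vectors like these would presumably be combined with topic modeling features and passed to a linear support vector classifier. Which library or feature weighting the author used is not stated here.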
Cite this article
TY - CONF
AU - A.V. Glazkova
PY - 2020
DA - 2020/05/13
TI - Text Age Rating Methods for Digital Libraries
BT - Proceedings of the International Scientific Conference “Digitalization of Education: History, Trends and Prospects” (DETP 2020)
PB - Atlantis Press
SP - 364
EP - 368
SN - 2352-5398
UR - https://doi.org/10.2991/assehr.k.200509.066
DO - 10.2991/assehr.k.200509.066
ID - Glazkova2020
ER -