A Method for Calculating the Similarity of TF-IDF Texts for Synonyms in Biomedical Domains
- DOI
- 10.2991/fmsmt-17.2017.117
- Keywords
- TF-IDF texts, Synonyms, Biomedical domains
- Abstract
Traditional text similarity calculation mostly relies on the TF-IDF method. TF-IDF builds a weighted word-frequency vector for each text and uses the cosine between two vectors as the similarity of the texts. The algorithm is widely used in search engines and information retrieval systems, but its handling of vocabulary is not ideal: the model does not recognize synonyms among domain-specific phrases and treats them as different words when calculating similarity. Taking synonyms in the biomedical field as an example, this paper embeds a synonym recognition function in the TF-IDF model. The method first acquires synonyms for biomedical vocabulary and builds a synonym lexicon. It then identifies synonyms within the TF-IDF model and calculates improved weights for the phrases. Experimental results show that the method effectively improves the precision of text similarity calculation in the biomedical field and is more effective than the traditional TF-IDF text similarity method.
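The abstract describes the approach only at a high level, so the following Python is a minimal sketch rather than the authors' method. It assumes the simplest form of synonym recognition: each synonym is mapped to one canonical term before TF-IDF weights are computed. Under that assumption, it contrasts plain TF-IDF cosine similarity with a synonym-aware variant. The synonym table, the function names, the pre-tokenized input (multiword phrases joined by underscores), and the smoothed IDF formula are all illustrative assumptions. The abstract does not specify the paper's actual lexicon or weighting scheme.

```python
import math
from collections import Counter

# Hypothetical synonym lexicon mapping each variant to a canonical term.
# Illustrative entries only; the paper's real lexicon is not given in the abstract.
SYNONYMS = {
    "tumour": "tumor",
    "neoplasm": "tumor",
    "myocardial_infarction": "heart_attack",
}


def normalize(tokens, synonyms=None):
    """Replace each token with its canonical synonym when one is defined."""
    if synonyms is None:
        return list(tokens)
    return [synonyms.get(t, t) for t in tokens]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses raw term frequency and a smoothed IDF, log((1 + N) / (1 + df)) + 1,
    so a term that occurs in every document still gets a non-zero weight
    (an assumed choice, not taken from the paper).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity(doc_a, doc_b, corpus, synonyms=None):
    """TF-IDF cosine similarity of doc_a and doc_b, with IDF taken over
    corpus plus both documents; synonyms are merged first if a map is given."""
    docs = [normalize(d, synonyms) for d in corpus + [doc_a, doc_b]]
    vecs = tfidf_vectors(docs)
    return cosine(vecs[-2], vecs[-1])


if __name__ == "__main__":
    corpus = [["gene", "expression"], ["protein", "binding"]]
    a = ["neoplasm", "growth", "rate"]
    b = ["tumor", "growth", "rate"]
    print(similarity(a, b, corpus))            # < 1.0: "neoplasm" and "tumor" are distinct terms
    print(similarity(a, b, corpus, SYNONYMS))  # ≈ 1.0: both texts normalize to the same tokens
```

Merging synonyms before weighting is the least invasive way to let the vector model "perceive" synonyms, because the rest of the TF-IDF pipeline stays unchanged. A real system would also need a tokenizer that recognizes multiword biomedical phrases, which this sketch sidesteps by assuming pre-joined tokens.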
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Miao Hao
AU - Ke Fan
PY - 2017/04
DA - 2017/04
TI - A Method for Calculating the Similarity of TF - IDF Texts for Synonyms in Biomedical Domains
BT - Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)
PB - Atlantis Press
SP - 578
EP - 583
SN - 2352-5401
UR - https://doi.org/10.2991/fmsmt-17.2017.117
DO - 10.2991/fmsmt-17.2017.117
ID - Hao2017/04
ER -