Proceedings of the 2023 4th International Conference on Big Data and Social Sciences (ICBDSS 2023)

Similarity analysis method of power unstructured text based on multi-dimensional fusion feature extraction

Authors
Li Yongle1, *, Chen Jiaqi1, Liu Yang1, Sheng Shuang1, Zheng Ling2, Chen Fei2
1Big Data Center of State Grid Corporation of China, Beijing, China
2North China Electric Power University, Beijing, China
*Corresponding author. Email: lyl_d08@163.com
Available Online 27 October 2023.
DOI
10.2991/978-94-6463-276-7_11
Keywords
Deep metric learning; Text feature extraction; MatchPyramid; Pseudo Siamese Network
Abstract

Similarity analysis of unstructured power-industry text is one of the most important tasks in managing unstructured power data. This paper studies feature extraction and similarity analysis for unstructured electric power text, which is keyword-dense and highly domain-specific. To capture the features of such text, a similarity analysis method based on multi-dimensional fusion feature extraction is proposed. The method improves the MatchPyramid model. In the input layer, word vectors generated by a BERT model are used to strengthen semantic relationships. In the matching layer, a matching matrix between the two texts is constructed from the word vectors. In the feature extraction layer, the multi-word feature vectors extracted by the BERT model are fused into a multi-dimensional text feature vector by means of dense connections, yielding a higher-order feature vector for each unstructured text. The higher-order features of the different texts are then fed into a Pseudo Siamese Network for similarity analysis. The method improves both semantic feature extraction and the accuracy of similarity analysis for unstructured text. Experiments show that, compared with the traditional MatchPyramid model, the proposed method improves the feature extraction accuracy of unstructured text by 2.66% and the F1 value by 2.99%.
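
The matching-layer step can be illustrated with a minimal sketch. It assumes cosine similarity as the interaction function between word vectors; the abstract does not state which function the authors use. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for BERT word embeddings, not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def matching_matrix(text_a, text_b):
    """Return M where M[i][j] is the similarity of word i in text_a and word j in text_b."""
    return [[cosine(u, v) for v in text_b] for u in text_a]

# Toy vectors with made-up values (assumption: real inputs would be BERT word vectors).
text_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

M = matching_matrix(text_a, text_b)
for row in M:
    print([round(x, 3) for x in row])
```

The result has one row per word of the first text and one column per word of the second, so it can be treated like a single-channel image by the downstream feature extraction layers.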

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the 2023 4th International Conference on Big Data and Social Sciences (ICBDSS 2023)
Series
Atlantis Highlights in Social Sciences, Education and Humanities
Publication Date
27 October 2023
ISBN
978-94-6463-276-7
ISSN
2667-128X

Cite this article

TY  - CONF
AU  - Li Yongle
AU  - Chen Jiaqi
AU  - Liu Yang
AU  - Sheng Shuang
AU  - Zheng Ling
AU  - Chen Fei
PY  - 2023
DA  - 2023/10/27
TI  - Similarity analysis method of power unstructured text based on multi-dimensional fusion feature extraction
BT  - Proceedings of the 2023 4th International Conference on Big Data and Social Sciences (ICBDSS 2023)
PB  - Atlantis Press
SP  - 86
EP  - 94
SN  - 2667-128X
UR  - https://doi.org/10.2991/978-94-6463-276-7_11
DO  - 10.2991/978-94-6463-276-7_11
ID  - Yongle2023
ER  -