Proceedings of the 2022 2nd International Conference on Computer Technology and Media Convergence Design (CTMCD 2022)

Diversify Keyphrase Generation with Subtopic Content Modeling

Authors
Yanyan Ge (1, 3, *), Peng Yang (1, 2, 3), Wenjun Li (1, 3)
1. School of Computer Science and Engineering, Southeast University, Nanjing, China
2. School of Cyber Science and Engineering, Southeast University, Nanjing, China
3. Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
*Corresponding author. Email: geyanyan@seu.edu.cn
Corresponding Author
Yanyan Ge
Available Online 17 December 2022.
DOI
10.2991/978-94-6463-046-6_33
Keywords
Keyphrase Generation; Seq2Seq; VAE; Graph Clustering
Abstract

Keyphrase generation is the task of automatically creating keyphrases that reflect the core information expressed in a given document. It is inherently a one-to-many problem, because phrases can be predicted from different aspects of the document. In this paper, we explore the diversity of keyphrase generation and propose a novel model that improves the diversity of generated keyphrases by using the hierarchical structure of text content. Specifically, we relate hierarchical content to subtopics, each of which is modeled by a subgraph obtained through graph clustering. In the generation stage, a multi-decoder is adopted to generate keyphrases in parallel, where each decoder corresponds to a subtopic. In addition, to account for different ways of expressing the same content, we introduce a conditional variational autoencoder to enhance wording diversity. Experimental results on a public dataset confirm that our proposed method outperforms state-of-the-art methods on quantitative metrics and improves keyphrase novelty.
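
The subtopic idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the graph is built or which clustering algorithm is used, so everything below is an assumption made for illustration only: sentences are nodes, an edge links two sentences whose Jaccard token overlap reaches a hypothetical `threshold`, and connected components stand in for the graph clustering step. The names `jaccard` and `cluster_sentences` are invented here and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Group sentences into illustrative 'subtopics'.

    Assumption: nodes are sentences and an edge links two sentences whose
    token overlap reaches `threshold`. Connected components are used as a
    simple stand-in for graph clustering; the paper may do this differently.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    neighbors = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    clusters, seen = [], set()
    for start in range(len(sentences)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


sentences = [
    "graph clustering groups related sentences",
    "graph clustering finds related subtopics",
    "the decoder generates keyphrases in parallel",
    "each decoder generates keyphrases for one subtopic",
]
clusters = cluster_sentences(sentences)
print(clusters)
```

In a setup like the one the abstract describes, each resulting cluster would be handed to its own decoder. The sketch stops at the grouping step and does not model the decoders or the conditional variational autoencoder.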

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2022 2nd International Conference on Computer Technology and Media Convergence Design (CTMCD 2022)
Series
Advances in Computer Science Research
Publication Date
17 December 2022
ISBN
978-94-6463-046-6
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Yanyan Ge
AU  - Peng Yang
AU  - Wenjun Li
PY  - 2022
DA  - 2022/12/17
TI  - Diversify Keyphrase Generation with Subtopic Content Modeling
BT  - Proceedings of the 2022 2nd International Conference on Computer Technology and Media Convergence Design (CTMCD 2022)
PB  - Atlantis Press
SP  - 276
EP  - 283
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-046-6_33
DO  - 10.2991/978-94-6463-046-6_33
ID  - Ge2022
ER  -