Video Description Method based on Semantic Information Filtering and Sentence Length Modulation

Xiangqing Wang; Xiaodong Cai; Meixin Zhou; Qingnan Huang

doi:10.2991/978-94-6463-040-4_68

Authors

Xiangqing Wang¹, Xiaodong Cai¹,*, Meixin Zhou¹, Qingnan Huang¹

¹School of Information and Communication, Guilin University of Electronic Technology, Guilin, China

*Corresponding author. Email: caixiaodong@guet.edu.cn

Corresponding Author

Xiaodong Cai

Available Online 27 December 2022.

DOI: 10.2991/978-94-6463-040-4_68
Keywords: Video description; Encoder-decoder; Fusion mechanism; Sentence length modulation; Deep learning
Abstract: Current video description models usually fail to remove spatially redundant information from the video features. In addition, the commonly used loss function is built from the log-probabilities of the correct target words, so long sentences incur large losses. Optimizing this log-likelihood loss therefore tends to produce sentences that are too short, which leaves the description semantically incomplete and lowers its accuracy. This paper proposes a video description method based on semantic information filtering and sentence length modulation to address both problems. First, the model introduces a gated fusion mechanism that screens the semantic features of the video, removing redundant semantic information, reducing its interference with the generated description, and improving the accuracy of the description. Second, a new sentence length modulation loss function is proposed, which modulates the cross-entropy loss with the length of the label sentence. This alleviates the model's tendency to generate short sentences and brings the semantics of the generated description closer to the label, further improving accuracy. Experimental results on the MSVD dataset, which is widely used in this field, show that the proposed method significantly improves the accuracy of the generated video descriptions and that all evaluation metrics are significantly better than those of existing models.
Copyright: © 2023 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
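
The abstract names two components, a gated fusion mechanism and a sentence length modulation loss, without giving their formulas. The sketch below is only an illustration of the general idea, not the authors' implementation. It assumes an element-wise sigmoid gate with hypothetical per-dimension weights, and it assumes the loss is the summed cross-entropy divided by the label length raised to a hypothetical exponent `beta`. The function names and parameters are made up for this example.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (not taken from the paper):
        gate_i  = sigmoid(w_v[i] * visual[i] + w_s[i] * semantic[i] + bias[i])
        fused_i = gate_i * semantic[i] + (1 - gate_i) * visual[i]
    A gate near 0 suppresses the semantic feature in that dimension.
    """
    w_v, w_s = weights
    fused = []
    for i in range(len(visual)):
        gate = sigmoid(w_v[i] * visual[i] + w_s[i] * semantic[i] + bias[i])
        fused.append(gate * semantic[i] + (1.0 - gate) * visual[i])
    return fused


def length_modulated_loss(token_probs, beta=1.0):
    """Cross-entropy over the label words, scaled by the label length.

    token_probs: probability the model assigns to each ground-truth word.
    beta: hypothetical modulation exponent; beta=0 gives the plain summed
          cross-entropy, beta=1 gives the per-word average.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    length = len(token_probs)
    nll = -sum(math.log(p) for p in token_probs)
    return nll / (length ** beta)


# With the plain summed loss, a longer label costs more at equal per-word
# confidence; with beta=1 the two sentences are penalized equally.
short_label = [0.5, 0.5]
long_label = [0.5, 0.5, 0.5, 0.5]
plain_gap = length_modulated_loss(long_label, beta=0.0) - length_modulated_loss(short_label, beta=0.0)
modulated_gap = length_modulated_loss(long_label) - length_modulated_loss(short_label)
```

Under these assumptions, `plain_gap` is positive while `modulated_gap` is zero, which is one way a length-dependent factor can remove the incentive to favor short sentences. The paper's actual gate and modulation term may differ.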

Volume Title: Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
Series: Atlantis Highlights in Computer Sciences
Publication Date: 27 December 2022
ISBN: 978-94-6463-040-4
ISSN: 2589-4900

Cite this article

TY  - CONF
AU  - Xiangqing Wang
AU  - Xiaodong Cai
AU  - Meixin Zhou
AU  - Qingnan Huang
PY  - 2022
DA  - 2022/12/27
TI  - Video Description Method based on Semantic Information Filtering and Sentence Length Modulation
BT  - Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
PB  - Atlantis Press
SP  - 446
EP  - 452
SN  - 2589-4900
UR  - https://doi.org/10.2991/978-94-6463-040-4_68
DO  - 10.2991/978-94-6463-040-4_68
ID  - Wang2022
ER  -