Video Description Method based on Semantic Information Filtering and Sentence Length Modulation
- DOI
- 10.2991/978-94-6463-040-4_68
- Keywords
- Video description; Encoder-decoder; Fusion mechanism; Sentence length modulation; Deep learning
- Abstract
In current video description methods, spatially redundant information in the video features is usually not eliminated effectively. In addition, the commonly used loss function is built from the log-probabilities of the correct target words, so long sentences tend to incur a large loss. Optimizing this log-likelihood loss can therefore produce sentences that are too short, which leaves the description semantically incomplete and lowers its accuracy. To address these problems, this paper proposes a video description method based on semantic information filtering and sentence length modulation. First, the model introduces a gated fusion mechanism that screens the semantic features of the video to remove redundant information, reducing its interference with the generated description and improving accuracy. Second, a new sentence length modulation loss function is proposed, which modulates the cross-entropy loss with the length of the label sentence. This alleviates the model's tendency to generate short sentences and brings the semantics of the generated description closer to the label, further improving accuracy. Experimental results on the widely used MSVD dataset show that the proposed method significantly improves the accuracy of the generated video descriptions, and that all indicators are significantly better than those of existing models.
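The length bias described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual modulation formula, so the code below is an assumption-laden illustration only: it assumes the baseline loss is the summed negative log-probability of the reference words, and it uses a hypothetical modulation that divides this sum by the label length raised to a power `beta`. The function names and the `beta` parameter are invented for illustration and do not come from the paper.

```python
import math


def sentence_nll(token_probs):
    """Summed negative log-likelihood of the reference words.

    token_probs: model probability assigned to each correct target word.
    Assumption: this stands in for the usual cross-entropy caption loss.
    """
    return -sum(math.log(p) for p in token_probs)


def length_modulated_nll(token_probs, beta=1.0):
    """Hypothetical length modulation (NOT the paper's formula).

    Divides the summed loss by len(label) ** beta, so a long label
    sentence is not penalized merely for containing more words.
    """
    label_length = len(token_probs)
    return sentence_nll(token_probs) / (label_length ** beta)


# Two label sentences predicted with the same per-word confidence.
short_sentence = [0.5] * 4
long_sentence = [0.5] * 12

# Without modulation the long sentence costs three times as much,
# even though every word is predicted equally well.
print(sentence_nll(short_sentence), sentence_nll(long_sentence))

# With beta = 1 the two sentences incur the same per-word loss.
print(length_modulated_nll(short_sentence), length_modulated_nll(long_sentence))
```

With `beta = 0` the hypothetical function reduces to the unmodulated sum, so `beta` acts as a knob for how strongly the label length offsets the loss. The paper's own loss may weight length differently.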
- Copyright
- © 2023 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Xiangqing Wang
AU - Xiaodong Cai
AU - Meixin Zhou
AU - Qingnan Huang
PY - 2022
DA - 2022/12/27
TI - Video Description Method based on Semantic Information Filtering and Sentence Length Modulation
BT - Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
PB - Atlantis Press
SP - 446
EP - 452
SN - 2589-4900
UR - https://doi.org/10.2991/978-94-6463-040-4_68
DO - 10.2991/978-94-6463-040-4_68
ID - Wang2022
ER -