Multi Semantic Feature Fusion Framework for Video Segmentation and Description

Rui Liang; Qingxin Zhu

doi:10.2991/icmeit-16.2016.74

Authors

Rui Liang, Qingxin Zhu

Corresponding Author

Rui Liang

Available Online August 2016.

DOI: 10.2991/icmeit-16.2016.74
Keywords: Video Semantic Analysis, Video Segmentation and Description, Deep Learning, Multi Feature Fusion.
Abstract: Making a machine understand a video and describe it in natural language is a difficult task. Real-world videos are much longer than the clips used in research experiments, and each video contains multiple semantic parts. Describing a long video is challenging: it requires controlling the granularity of the video's semantics, excluding redundant information, and giving a complete description. This task is important for video understanding and video retrieval. In this paper, we propose a framework to address these problems. The framework consists of two stages, video segmentation and video description, which are divided into five steps: (1) extract features from the video sequence with pre-trained deep learning models; (2) fuse the different features of each frame into a single feature vector using a weight vector; (3) generate a histogram of similarity (HOS) from the feature vectors of adjacent frames in the sequence; (4) use a threshold t to divide the video into short fragments with different semantics; (5) use LSTM networks that take the frame-sequence features of each fragment as input and output a natural language description for each fragment. Our approach handles long 'in-the-wild' videos and can make them more comprehensible, which is useful for the task of understanding and describing video.
Copyright: © 2016, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
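
The segmentation stage described in the abstract (steps 2 to 4) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, the similarity measure, or how the threshold is applied, so the code below assumes weighted concatenation of L2-normalized features, cosine similarity between adjacent frames, and a cut wherever the similarity drops below t. The function names (`fuse_features`, `adjacent_similarity`, `segment`) and the toy feature arrays are hypothetical; in practice the features would come from pre-trained deep models (step 1), and each resulting fragment would be passed to an LSTM captioner (step 5), which is not shown.

```python
import numpy as np


def fuse_features(feature_sets, weights):
    """Fuse several per-frame feature arrays into one vector per frame.

    feature_sets: list of arrays, each of shape (num_frames, dim_k).
    weights: one scalar per feature set (an assumed form of the weight vector).
    Assumption: each feature set is L2-normalized per frame, scaled by its
    weight, and the results are concatenated.
    """
    scaled = []
    for feats, w in zip(feature_sets, weights):
        feats = np.asarray(feats, dtype=float)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        scaled.append(w * feats / np.maximum(norms, 1e-12))
    return np.concatenate(scaled, axis=1)


def adjacent_similarity(fused):
    """Cosine similarity between each frame and the next one.

    Returns an array of length num_frames - 1. This sequence stands in for
    the similarity values from which the abstract's HOS is built (assumption).
    """
    a, b = fused[:-1], fused[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)


def segment(similarities, t):
    """Split frames into fragments, cutting where similarity falls below t.

    Returns a list of (start, end) frame index pairs, with end exclusive.
    """
    num_frames = len(similarities) + 1
    cuts = [i + 1 for i, s in enumerate(similarities) if s < t]
    bounds = [0] + cuts + [num_frames]
    return list(zip(bounds[:-1], bounds[1:]))


# Toy data: six frames, two hypothetical feature types.
# Frames 0-2 share one appearance, frames 3-5 share another.
feat_a = np.array([[1, 0]] * 3 + [[0, 1]] * 3)
feat_b = np.array([[1, 0, 0]] * 3 + [[0, 0, 1]] * 3)

fused = fuse_features([feat_a, feat_b], weights=[0.5, 0.5])
sims = adjacent_similarity(fused)
segments = segment(sims, t=0.5)
print(segments)
```

On this toy input the only low-similarity transition is between frames 2 and 3, so the sketch yields two fragments covering frames 0 to 2 and 3 to 5. The threshold t controls the granularity: a higher t produces more, shorter fragments.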

Volume Title: Proceedings of the 2016 International Conference on Mechatronics Engineering and Information Technology
Series: Advances in Engineering Research
Publication Date: August 2016
ISBN: 978-94-6252-222-0
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Rui Liang
AU  - Qingxin Zhu
PY  - 2016/08
DA  - 2016/08
TI  - Multi Semantic Feature Fusion Framework for Video Segmentation and Description
BT  - Proceedings of the 2016 International Conference on Mechatronics Engineering and Information Technology
PB  - Atlantis Press
SP  - 388
EP  - 392
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmeit-16.2016.74
DO  - 10.2991/icmeit-16.2016.74
ID  - Liang2016/08
ER  -
