Proceedings of the 2016 International Conference on Mechatronics Engineering and Information Technology

Multi Semantic Feature Fusion Framework for Video Segmentation and Description

Authors
Rui Liang, Qingxin Zhu
Corresponding Author
Rui Liang
Available Online August 2016.
DOI
10.2991/icmeit-16.2016.74
Keywords
Video Semantic Analysis, Video Segmentation and Description, Deep Learning, Multi Feature Fusion.
Abstract

Making a machine understand a video and describe it in natural language is a difficult task. In practice, videos are much longer than the clips used in research experiments, and each video contains multiple semantic parts. Describing a long video is challenging: it requires controlling the granularity of the video's semantics, excluding redundant information, and giving a complete description. This task is important for video understanding and video retrieval. In this paper, we propose a framework to address these problems. The framework consists of two stages, video segmentation and video description, which divide into five steps: first, extract features of the video sequence with pre-trained deep learning models; second, fuse the different features of the same frame into one feature vector using a weight vector; third, generate a histogram of similarity (HOS) from the feature vectors of adjacent frames in the sequence; fourth, use a threshold t to divide the video into short fragments with different semantics; finally, use LSTM networks that take the frame-sequence features of each fragment as input and output a natural language description for each fragment. Our approach handles 'in-the-wild' long videos and can make long videos more comprehensible, which is meaningful for the task of understanding and describing video.
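
The segmentation stage (steps two to four) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or the similarity measure, so this example assumes weighted concatenation of L2-normalised features and cosine similarity between adjacent frames. The function names fuse_features, adjacent_similarity and segment are hypothetical, not taken from the paper. Feature extraction with pre-trained models and the LSTM description stage are omitted.

```python
import numpy as np


def fuse_features(features, weights):
    """Fuse several per-frame feature sets into one vector per frame.

    features: list of arrays, each of shape (num_frames, dim_k)
    weights:  one scalar per feature type (assumed form of the weight vector)
    Assumption: fusion is weighted concatenation of L2-normalised features.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature type")
    scaled = []
    for feat, w in zip(features, weights):
        feat = np.asarray(feat, dtype=float)
        norms = np.linalg.norm(feat, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # avoid division by zero for empty frames
        scaled.append(w * feat / norms)
    return np.concatenate(scaled, axis=1)


def adjacent_similarity(fused):
    """Cosine similarity between each frame and the next one.

    Assumption: cosine similarity stands in for the paper's similarity measure.
    Returns an array of length num_frames - 1.
    """
    a, b = fused[:-1], fused[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    den[den == 0] = 1.0
    return num / den


def segment(similarities, t):
    """Split the frame sequence wherever adjacent similarity drops below t.

    Returns (start, end) frame index pairs with end exclusive.
    """
    cuts = [i + 1 for i, s in enumerate(similarities) if s < t]
    bounds = [0] + cuts + [len(similarities) + 1]
    return list(zip(bounds[:-1], bounds[1:]))
```

In this sketch, each (start, end) pair returned by segment marks one fragment whose frame features would then be passed to the description stage. The threshold t trades off granularity: a higher t yields more, shorter fragments.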

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 International Conference on Mechatronics Engineering and Information Technology
Series
Advances in Engineering Research
Publication Date
August 2016
ISBN
978-94-6252-222-0
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Rui Liang
AU  - Qingxin Zhu
PY  - 2016/08
DA  - 2016/08
TI  - Multi Semantic Feature Fusion Framework for Video Segmentation and Description
BT  - Proceedings of the 2016 International Conference on Mechatronics Engineering and Information Technology
PB  - Atlantis Press
SP  - 388
EP  - 392
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmeit-16.2016.74
DO  - 10.2991/icmeit-16.2016.74
ID  - Liang2016/08
ER  -