Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018)

Multimodal Cross-guided Attention Networks for Visual Question Answering

Authors
Haibin Liu, Shengrong Gong, Yi Ji, Jianyu Yang, Tengfei Xing, Chunping Liu
Corresponding Author
Haibin Liu
Available Online April 2018.
DOI
10.2991/cmsa-18.2018.80
Keywords
visual question answering; attention; cross-guided; gated activation
Abstract

Visual Question Answering (VQA) is an attractive topic combining computer vision with natural language processing. It is more challenging than text-based question answering because of its multimodal nature. The VQA reasoning process requires both effective semantic embedding and fine-grained visual comprehension. Existing approaches predominantly infer answers from visual spatial information, while neglecting important semantic information in questions and the guidance information between images and questions. To remedy this, we imitate the human mechanism of cross-reasoning about visual and textual information and propose a multimodal cross-guided attention network (MCAN) for VQA. MCAN employs a cross-guided joint learning strategy with a gated activation learning method, which can simultaneously capture both rich visual spatial information and significant semantic information. We evaluate the proposed model on two public datasets: the VQA dataset and the COCO-QA dataset. Extensive experiments show state-of-the-art performance on both datasets.

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018)
Series
Advances in Intelligent Systems Research
Publication Date
April 2018
ISBN
978-94-6252-523-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Haibin Liu
AU  - Shengrong Gong
AU  - Yi Ji
AU  - Jianyu Yang
AU  - Tengfei Xing
AU  - Chunping Liu
PY  - 2018/04
DA  - 2018/04
TI  - Multimodal Cross-guided Attention Networks for Visual Question Answering
BT  - Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018)
PB  - Atlantis Press
SP  - 347
EP  - 353
SN  - 1951-6851
UR  - https://doi.org/10.2991/cmsa-18.2018.80
DO  - 10.2991/cmsa-18.2018.80
ID  - Liu2018/04
ER  -