Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering
Z. Wang; X. Liu; L. Chen; L. Wang; Y. Qiao; X. Xie; C. Fowlkes
2018
Conference Date: 2018
Conference Location: United States
Abstract: Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms of incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves the state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice.
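As a rough illustration of the POS-tag guided attention idea mentioned in the abstract, the sketch below shows one way a learned per-tag bias could modulate word-level attention over the question. It is written in PyTorch; all module names, dimensions, and the scalar per-tag bias are assumptions made for illustration and do not reproduce the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class POSTagGuidedAttention(nn.Module):
    """Hypothetical sketch: question-word attention biased by learned POS-tag weights."""
    def __init__(self, vocab_size, num_pos_tags, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # one learned scalar importance per POS tag (e.g. nouns could receive a larger bias)
        self.pos_bias = nn.Embedding(num_pos_tags, 1)
        self.score = nn.Linear(embed_dim, 1)
        self.proj = nn.Linear(embed_dim, hidden_dim)

    def forward(self, word_ids, pos_ids):
        # word_ids, pos_ids: (batch, seq_len)
        words = self.word_embed(word_ids)                        # (B, T, D)
        logits = self.score(words).squeeze(-1)                   # (B, T) content-based scores
        logits = logits + self.pos_bias(pos_ids).squeeze(-1)     # add POS-tag bias per word
        attn = F.softmax(logits, dim=-1)                         # (B, T) attention weights
        question_vec = (attn.unsqueeze(-1) * words).sum(dim=1)   # (B, D) weighted sum
        return self.proj(question_vec)                           # (B, H) question encoding

# usage with random ids; checks shapes only, no trained weights
if __name__ == "__main__":
    model = POSTagGuidedAttention(vocab_size=10000, num_pos_tags=45)
    q = torch.randint(0, 10000, (2, 12))
    tags = torch.randint(0, 45, (2, 12))
    print(model(q, tags).shape)  # torch.Size([2, 512])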
Content Type: Conference Paper
Source URL: http://ir.siat.ac.cn:8080/handle/172644/13696
Collection: 深圳先进技术研究院_集成所
Recommended Citation (GB/T 7714):
Z. Wang, X. Liu, L. Chen, et al. Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering[C]. United States, 2018.