Attention Analysis and Calibration for Transformer in Natural Language Generation
Lu, Yu (2,3); Zhang, Jiajun (2,3); Zeng, Jiali (1); Wu, Shuangzhi (1); Zong, Chengqing (2,3)
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Year: 2022
Volume: 30, Pages: 1927-1938
Keywords: Graphics; Magnetization; Symbols; Magnetostatics; Speech processing; Permeability; Image color analysis; Attention mechanism; interpretability; Transformer; attention calibration
ISSN: 2329-9290
DOI: 10.1109/TASLP.2022.3180678
Corresponding Author: Zhang, Jiajun (jjzhang@nlpr.ia.ac.cn)
Abstract: The attention mechanism has become ubiquitous in neural machine translation, dynamically selecting the relevant context for each translation step. Beyond performance gains, the attention weights assigned to input tokens are often used as an explanation, on the assumption that high-attention tokens contribute more to the prediction. However, many works question whether this assumption holds in text classification by manually manipulating attention weights and observing decision flips. This article extends the question to Transformer-based neural machine translation, which relies heavily on cross-lingual attention to produce accurate translations but remains relatively understudied in this context. We first design a mask perturbation model that automatically assesses each input token's contribution to the model's output. We then test whether the token contributing most to the current translation receives the highest attention weight. We find that it sometimes does not, and that the mismatch depends closely on the entropy of the attention weights, the syntactic role of the currently generated token, and the language pair. We also revisit the discrepancy between attention weights and word alignments from the perspective of unreliable attention weights. These observations motivate us to calibrate the cross-lingual multi-head attention by attaching more attention to indispensable tokens, i.e., those whose removal leads to a dramatic performance drop. Empirical experiments on translation tasks of different scales and on text summarization demonstrate that our calibration methods significantly outperform strong baselines.
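
Note: the abstract describes the mask perturbation model only at a high level. The following is a minimal, hypothetical sketch of the occlusion-style idea behind it, not the paper's actual implementation: each source token is replaced by a mask symbol in turn, and the drop in the model's score for the current prediction is taken as that token's contribution. The function score_fn below is an assumed stand-in for any model that returns the log-probability of the current target token given a (possibly masked) source sequence.

    # Illustrative sketch only: occlusion-style contribution scoring.
    MASK = "<mask>"

    def token_contributions(src_tokens, score_fn):
        """Contribution of each source token, measured as the score drop
        when that token is replaced by a mask symbol."""
        base = score_fn(src_tokens)
        return [
            base - score_fn(src_tokens[:i] + [MASK] + src_tokens[i + 1:])
            for i in range(len(src_tokens))
        ]

    # Toy usage: a fake scorer whose prediction depends only on "nicht".
    def toy_score_fn(tokens):
        return 0.0 if "nicht" in tokens else -2.0

    print(token_contributions(["ich", "weiss", "nicht"], toy_score_fn))
    # -> [0.0, 0.0, 2.0]: masking "nicht" hurts the score most, so it is
    #    the highest-contribution (indispensable) token.

Under this reading, comparing the argmax of such contribution scores with the argmax of the cross-lingual attention weights is the kind of test the abstract refers to, and tokens whose masking causes a dramatic score drop are the "indispensable tokens" that the calibration method attaches more attention to.
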
Funding Projects: Natural Science Foundation of China [62122088]; Natural Science Foundation of China [U1836221]; Natural Science Foundation of China [62006224]
WOS Research Areas: Acoustics; Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Record Number: WOS:000811572000003
Funding Agency: Natural Science Foundation of China
Content Type: Journal Article
Source URL: http://ir.ia.ac.cn/handle/173211/49623
Collection: National Laboratory of Pattern Recognition - Natural Language Processing
Author Affiliations:
1. Dept Tencent Cloud Xiaowei, Beijing 100089, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation:
GB/T 7714: Lu, Yu, Zhang, Jiajun, Zeng, Jiali, et al. Attention Analysis and Calibration for Transformer in Natural Language Generation[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1927-1938.
APA: Lu, Yu, Zhang, Jiajun, Zeng, Jiali, Wu, Shuangzhi, & Zong, Chengqing. (2022). Attention Analysis and Calibration for Transformer in Natural Language Generation. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 30, 1927-1938.
MLA: Lu, Yu, et al. "Attention Analysis and Calibration for Transformer in Natural Language Generation". IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 30 (2022): 1927-1938.