Named Entity Recognition in Chinese Medical Literature Using Pretraining Models
Wang, Yu1,2; Sun, Yining1,2; Ma, Zuchang1; Gao, Lisheng1; Xu, Yang1
Journal: SCIENTIFIC PROGRAMMING
Publication date: 2020-09-09
Volume: 2020
ISSN: 1058-9244
DOI: 10.1155/2020/8812754
Corresponding author: Sun, Yining (ynsun@iim.ac.cn)
Abstract: The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. Named Entity Recognition (NER) is the initial step in extracting this knowledge from unstructured text and presenting it as a Knowledge Graph (KG). However, previous NER approaches have often suffered from a shortage of human-labelled training data. Furthermore, extracting knowledge from Chinese medical literature is a more complex task because Chinese text has no explicit word boundaries between characters. Recently, pretraining models, which obtain representations with prior semantic knowledge from large-scale unlabelled corpora, have achieved state-of-the-art results on a wide variety of Natural Language Processing (NLP) tasks. However, the capabilities of pretraining models have not been fully exploited, and the application of pretraining models other than BERT to specific domains, such as NER in Chinese medical literature, is also of interest. In this paper, we enhance the performance of NER in Chinese medical literature using pretraining models. First, we propose a data augmentation method that replaces words in the training set with synonyms through the Masked Language Model (MLM), which is a pretraining task. Then, we treat NER as the downstream task of the pretraining model and transfer the prior semantic knowledge obtained during pretraining to it. Finally, we conduct experiments to compare the performance of six pretraining models (BERT, BERT-WWM, BERT-WWM-EXT, ERNIE, ERNIE-tiny, and RoBERTa) in recognizing named entities in Chinese medical literature. The effects of feature extraction versus fine-tuning, as well as of different downstream model structures, are also explored. Experimental results demonstrate that the proposed data augmentation method yields meaningful improvements in recognition performance. Moreover, RoBERTa-CRF achieves a higher F1-score than previous methods and the other pretraining models.
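The data augmentation step the abstract describes, replacing words with contextual synonyms proposed by the Masked Language Model, can be sketched as follows. This is a minimal illustration assuming the HuggingFace transformers library and the public bert-base-chinese checkpoint; the masking rate, candidate filtering, and treatment of entity spans are assumptions, not the authors' exact procedure.

# Minimal sketch of MLM-based synonym replacement, assuming the
# HuggingFace `transformers` library and the public `bert-base-chinese`
# checkpoint. Masking rate and candidate filtering are illustrative
# assumptions, not the authors' exact procedure.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

def augment(sentence: str, mask_prob: float = 0.15, top_k: int = 5) -> str:
    """Randomly mask characters and replace them with MLM predictions.

    For NER training data, characters inside labelled entity spans
    should be skipped so the tags stay valid (omitted here for brevity).
    """
    chars = list(sentence)
    for i, original in enumerate(chars):
        if random.random() >= mask_prob:
            continue
        masked = chars.copy()
        masked[i] = fill_mask.tokenizer.mask_token  # "[MASK]"
        for cand in fill_mask("".join(masked), top_k=top_k):
            token = cand["token_str"]
            # Accept the first single-character prediction that differs
            # from the original, i.e. a contextual "synonym".
            if token != original and len(token) == 1:
                chars[i] = token
                break
    return "".join(chars)

print(augment("患者出现发热、咳嗽等临床症状。"))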
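The best-performing configuration reported, RoBERTa-CRF, places a linear emission layer and a CRF on top of the pretrained encoder. Below is a minimal sketch of such a tagger, assuming PyTorch, HuggingFace transformers, and the pytorch-crf package; the checkpoint name (hfl/chinese-roberta-wwm-ext) and the label count are illustrative assumptions, not the authors' exact setup.

# Minimal sketch of a RoBERTa-CRF tagger, assuming PyTorch, HuggingFace
# `transformers`, and the `pytorch-crf` package (`pip install pytorch-crf`).
import torch.nn as nn
from torchcrf import CRF
from transformers import AutoModel

class RobertaCRF(nn.Module):
    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(self.dropout(hidden))
        mask = attention_mask.bool()
        if labels is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)

# e.g. model = RobertaCRF("hfl/chinese-roberta-wwm-ext", num_labels=7),
# where 7 = BIO tags over three entity types plus "O" (an assumption).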
Funding project(s): Major special project of the Anhui Science and Technology Department [18030801133]; Science and Technology Service Network Initiative [KFJ-STS-ZDTP-079]
WOS research area: Computer Science
Language: English
Publisher: HINDAWI LTD
WOS accession number: WOS:000574404000001
Funding organization(s): Anhui Science and Technology Department (major special project); Science and Technology Service Network Initiative
Content type: Journal article
Source URL: http://ir.hfcas.ac.cn:8080/handle/334002/104341
Collection: Hefei Institutes of Physical Science, Chinese Academy of Sciences
Author affiliations:
1. Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Anhui Prov Key Lab Med Phys & Technol, Hefei 230031, Peoples R China
2. Univ Sci & Technol China, Hefei 230026, Peoples R China
Recommended citation:
GB/T 7714: Wang, Yu, Sun, Yining, Ma, Zuchang, et al. Named Entity Recognition in Chinese Medical Literature Using Pretraining Models[J]. SCIENTIFIC PROGRAMMING, 2020, 2020.
APA: Wang, Yu, Sun, Yining, Ma, Zuchang, Gao, Lisheng, & Xu, Yang. (2020). Named Entity Recognition in Chinese Medical Literature Using Pretraining Models. SCIENTIFIC PROGRAMMING, 2020.
MLA: Wang, Yu, et al. "Named Entity Recognition in Chinese Medical Literature Using Pretraining Models." SCIENTIFIC PROGRAMMING 2020 (2020).