Semi-supervised Chinese Word Segmentation based on Bilingual Information
Chen W(陈炜); Xu B(徐波); Chen,Wei
2015-09
Conference date: 2015-9
Conference location: Lisbon, Portugal
Keywords: Chinese Word Segmentation; Semi-supervised; Bilingual
English abstract

This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmenting information of English sentences. The proposed method involves learning three levels of features, namely, character-level, phrase-level and sentence-level, provided by multiple sub-models. We use a sub-model of conditional random fields (CRF) to learn monolingual grammars, a sub-model based on character-based alignment to obtain explicit segmenting knowledge, and another sub-model based on transliteration similarity to detect out-of-vocabulary (OOV) words. Moreover, we propose a sub-model leveraging neural network to ensure the proper treatment of the semantic gap and a phrase-based translation sub-model to score the translation probability of the Chinese segmentation and its corresponding English sentences. A cascaded log-linear model is employed to combine these features to segment bilingual unlabeled data, the results of which are used to justify the original supervised CWS model. The evaluation shows that our method results in superior results compared with those of the state-of-the-art monolingual and bilingual semi-supervised models that have been reported in the literature.
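
The abstract says sub-model scores are combined with a log-linear model. As a rough illustration only, the sketch below shows a generic, single-stage log-linear combination that ranks candidate segmentations by a weighted sum of sub-model scores. It is not the paper's cascaded model: the feature names (`crf`, `alignment`, `translation`), the weights, and the scores are all hypothetical values chosen for the example.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of sub-model scores (treated as log-domain values)."""
    return sum(weights[name] * value for name, value in features.items())


def best_segmentation(candidates, weights):
    """Pick the highest-scoring candidate and return softmax-normalized probabilities."""
    scores = [log_linear_score(feats, weights) for _, feats in candidates]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index][0], probs


# Hypothetical weights and sub-model scores, for illustration only.
weights = {"crf": 1.0, "alignment": 0.5, "translation": 0.5}
candidates = [
    (["北京", "大学"], {"crf": -1.0, "alignment": -2.0, "translation": -3.0}),
    (["北京大学"], {"crf": -1.5, "alignment": -0.5, "translation": -1.0}),
]

best, probs = best_segmentation(candidates, weights)
print(best, probs)
```

In this toy setup the second candidate wins because its alignment and translation scores outweigh its slightly lower CRF score. In practice the weights would be tuned on held-out data rather than set by hand.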


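Proceedings: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Content type: Conference paper
Source URL: [http://ir.ia.ac.cn/handle/173211/11801]
Collection: Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Corresponding author: Chen,Wei
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation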
GB/T 7714
Chen W,Xu B,Chen,Wei. Semi-supervised Chinese Word Segmentation based on Bilingual Information[C]. In:. Lisbon, Portugal. 2015-9.