Semi-supervised Chinese Word Segmentation based on Bilingual Information | |
Chen W (陈炜); Xu B (徐波); Chen, Wei | |
2015-09 | |
Conference date | 2015-09 |
Conference location | Lisbon, Portugal |
Keywords | Chinese Word Segmentation; Semi-supervised; Bilingual |
Abstract (English) | This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmenting information of English sentences. The proposed method involves learning three levels of features, namely character-level, phrase-level and sentence-level, provided by multiple sub-models. We use a conditional random fields (CRF) sub-model to learn monolingual grammars, a sub-model based on character-based alignment to obtain explicit segmenting knowledge, and another sub-model based on transliteration similarity to detect out-of-vocabulary (OOV) words. Moreover, we propose a neural-network sub-model to ensure the proper treatment of the semantic gap, and a phrase-based translation sub-model to score the translation probability of a Chinese segmentation and its corresponding English sentence. A cascaded log-linear model is employed to combine these features to segment bilingual unlabeled data, the results of which are used to justify the original supervised CWS model. The evaluation shows that our method outperforms the state-of-the-art monolingual and bilingual semi-supervised models that have been reported in the literature. |
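Proceedings | Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) |
Content type | Conference paper |
Source URL | [http://ir.ia.ac.cn/handle/173211/11801] |
Collection | Research Center for Digital Content Technology and Services / Auditory Models and Cognitive Computing |
Corresponding author | Chen, Wei |
Author affiliation | Institute of Automation, Chinese Academy of Sciences |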
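Recommended citation (GB/T 7714) | Chen W, Xu B, Chen, Wei. Semi-supervised Chinese Word Segmentation based on Bilingual Information[C]. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Lisbon, Portugal. 2015-9. |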
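The abstract combines the sub-model features with a cascaded log-linear model. As a rough illustration of the general log-linear idea only (not the authors' cascaded model), the sketch below scores candidate segmentations as a weighted sum of sub-model feature values and normalizes the scores with a softmax. The feature names (`crf`, `alignment`, `translit`, `translation`), the feature values, and the weights are all hypothetical assumptions chosen for illustration; in practice weights would be tuned on held-out data.

```python
import math
from typing import Dict, List, Tuple

# Illustrative only: feature names, values, and weights are hypothetical,
# not taken from the paper.


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum_i w_i * h_i; features with no weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_candidates(
    candidates: List[Tuple[str, Dict[str, float]]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Normalize scores into P(y|x) = exp(s_y) / sum_y' exp(s_y'), best first."""
    scores = [(seg, log_linear_score(feats, weights)) for seg, feats in candidates]
    max_score = max(s for _, s in scores)  # shift by the max for numerical stability
    exps = [(seg, math.exp(s - max_score)) for seg, s in scores]
    z = sum(e for _, e in exps)
    return sorted(((seg, e / z) for seg, e in exps), key=lambda p: p[1], reverse=True)


# Hypothetical log-scores from four sub-models for two candidate segmentations.
weights = {"crf": 1.0, "alignment": 0.5, "translit": 0.3, "translation": 0.8}
candidates = [
    ("中 国科 学院", {"crf": -3.0, "alignment": -2.5, "translit": 0.0, "translation": -4.0}),
    ("中国 科学院", {"crf": -1.2, "alignment": -0.5, "translit": 0.0, "translation": -2.0}),
]

ranked = rank_candidates(candidates, weights)
for segmentation, prob in ranked:
    print(f"{segmentation}\t{prob:.3f}")
```

In this framing, adding another sub-model only means adding one more feature key and weight; a cascaded variant, as named in the abstract, would layer such combinations, which this sketch does not attempt.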