A method for improving the accuracy of automatic indexing of Chinese-English mixed documents
ZHAO Yan ; SHI Hui
Journal: Chinese Journal of Library and Information Science
2012-12-25
Volume: 5, Issue: 4, Pages: 77-92
Keywords: Chinese-English mixed documents; String matching; Accuracy of automatic indexing; Cybernetics; Dedicated hepatitis B virus (HBV) database
ISSN: 1674-3393
Corresponding author: Yan Zhao (e-mail: zhaoyan2000@shisu.edu.cn)
Abstract

Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents.
Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feed-back control", and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to evaluate the effect of the proposed method.
Findings: The method distinguishes Chinese and English documents by their grammatical structures and word formation rules. When the method was applied in the three phases of automatic indexing of Chinese-English mixed documents, the results were encouraging: precision increased from 88.54% to 97.10% and recall improved from 97.37% to 99.47%.
Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching is based on a brute-force (BF) approach, indexing efficiency is reduced to some extent.
Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit the indexing of life science documents as well as documents in other subject areas.
Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
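
Illustrative note: the brute-force (BF) pattern matching named under research limitations can be sketched as follows. This is a minimal sketch of the general technique, not the authors' implementation; the function name `bf_find_all`, the sample text, and the search term are hypothetical. It shows why the approach can be slow: for a text of length n and a term of length m, the worst case needs on the order of n × m character comparisons.

```python
from typing import List


def bf_find_all(text: str, pattern: str) -> List[int]:
    """Return every start index at which pattern occurs in text.

    Brute force: try each alignment of the pattern against the text and
    compare character by character until a mismatch or a full match.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    for i in range(n - m + 1):      # each candidate start position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                  # all m characters matched
            hits.append(i)
    return hits


# Hypothetical Chinese-English mixed sample (not taken from the paper).
sample = "乙型肝炎病毒(HBV)感染与HBV DNA复制"
positions = bf_find_all(sample, "HBV")
print(positions)
```

Python strings are indexed by character rather than by byte, so the same loop works for Chinese and English characters alike. Overlapping occurrences are reported as well, because the scan always advances by one position.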

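Subject: Editing and publishing
Original source: http://www.chinalibraries.net
Date made public: 2012-12-11
Content type: Journal article
Source URL: http://ir.las.ac.cn/handle/12502/5628
Collection: Library and Information Center_Journal of Data and Information Science_Chinese Journal of Library and Information Science-2012
Recommended citation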
GB/T 7714: ZHAO Yan, SHI Hui. A method for improving the accuracy of automatic indexing of Chinese-English mixed documents[J]. Chinese Journal of Library and Information Science, 2012, 5(4): 77-92.
APA: ZHAO Yan, & SHI Hui. (2012). A method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Chinese Journal of Library and Information Science, 5(4), 77-92.
MLA: ZHAO Yan, et al. "A method for improving the accuracy of automatic indexing of Chinese-English mixed documents". Chinese Journal of Library and Information Science 5.4 (2012): 77-92.