ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs)
Liang, C.; Wang, G.; Liu, L.; Ji, G. L. (吉国力); Fang, L.; Liu, Y. S.; Carter, K.; Webb, J. S.; Dean, J. F. D.
Journal: BMC Genomics (DOI: http://dx.doi.org/10.1186/1471-2164-8-134)
Date: 2007-05-29
Keywords: HUMAN GENES LIBRARIES PINE CDNA POLYADENYLATION IDENTIFICATION GENERATION
Abstract:
Background: With the advent of low-cost, high-throughput sequencing, the amount of public-domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. These data are widely used for characterizing various genomes. However, their inherent deficiencies also make data quality control and validation a serious challenge, particularly for species without genome sequences.

Description: ConiferEST is an integrated system for data reprocessing, visualization and mining of conifer ESTs. Its current release, Build 1.0, houses 172,229 loblolly pine EST sequence reads. These were obtained by reprocessing raw DNA sequencer traces, downloaded from the NCBI Trace Archive, with our software WebTraceMiner. ConiferEST provides biologists with unique, easy-to-use data visualization and mining tools for a variety of putative sequence features. These include cloning vector segments, adapter sequences, restriction endonuclease recognition sites, polyA and polyT runs, and their corresponding Phred quality values. Based on these putative features, verified sequence features have been identified in silico, such as the 3' and/or 5' termini of cDNA inserts in either the sense or non-sense strand. Interestingly, only 30.03% of the designated 3' ESTs were found to have an authenticated 5' terminus in the non-sense strand (i.e., polyT tails), while fewer than 5.34% of the designated 5' ESTs had a verified 5' terminus in the sense strand. Such previously ignored features provide valuable insight for data quality control and validation of error-prone ESTs. They also make it possible to identify novel functional motifs embedded in large EST datasets. We found that "double-termini adapters" were effective indicators of potential EST chimeras.
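The abstract does not say how ConiferEST detects polyA/polyT runs or which thresholds it applies. The following is a minimal sketch under stated assumptions. It assumes a read arrives as a base string plus a parallel list of integer Phred scores. It scans for maximal homopolymer runs, reports each run's mean quality, and then checks for a high-quality polyT run near the start of the read (the "polyT tail" signal mentioned above). The names `Run`, `find_homopolymer_runs` and `has_leading_polyt` are hypothetical, as are the thresholds `min_len=12`, `window=50` and `min_q=20`. They are illustrative choices, not values taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """A maximal homopolymer run in a read (0-based start, end-exclusive)."""
    base: str
    start: int
    end: int
    mean_quality: float


def find_homopolymer_runs(seq: str, quals: List[int], base: str,
                          min_len: int = 12) -> List[Run]:
    """Return maximal runs of `base` (e.g. 'A' or 'T') at least `min_len` long,
    each annotated with the mean Phred quality of its bases."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    seq, base = seq.upper(), base.upper()
    runs: List[Run] = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] != base:
            i += 1
            continue
        j = i
        while j < n and seq[j] == base:  # extend the run as far as it goes
            j += 1
        if j - i >= min_len:
            runs.append(Run(base, i, j, sum(quals[i:j]) / (j - i)))
        i = j  # resume scanning just after this run
    return runs


def has_leading_polyt(seq: str, quals: List[int], window: int = 50,
                      min_len: int = 12, min_q: float = 20.0) -> bool:
    """Hypothetical check: does a polyT run with mean quality >= min_q start
    within the first `window` bases? All thresholds are assumptions."""
    return any(run.start < window and run.mean_quality >= min_q
               for run in find_homopolymer_runs(seq, quals, "T", min_len))


if __name__ == "__main__":
    read = "GGC" + "T" * 15 + "ACGTACGT"
    scores = [30] * len(read)
    print(find_homopolymer_runs(read, scores, "T"))
    # [Run(base='T', start=3, end=18, mean_quality=30.0)]
    print(has_leading_polyt(read, scores))  # True
```

Exact homopolymer matching keeps the sketch short. Real trace-derived reads often contain miscalled bases inside a tail, so a practical detector would likely tolerate a few mismatches or low-quality positions within a run.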
For all sequences with an in silico verified terminus or termini, we used InterProScan to assign protein domain signatures. The results can be explored in depth through our biologist-friendly web interfaces.

Conclusion: ConiferEST represents a unique and complementary public resource for EST data integration and mining in conifers. It reprocesses raw DNA traces, identifies putative sequence features, and determines and annotates in silico verified features. Seamlessly integrated with other public resources, ConiferEST provides biologists with powerful tools to verify data, visualize abnormalities (including EST chimeras), and explore large EST datasets.
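The abstract reports that "double-termini adapters" were effective indicators of potential EST chimeras but does not define the pattern in detail. The sketch below is a simplified, hypothetical stand-in and not the ConiferEST definition. It flags a read when an exact copy of an adapter sequence sits in the read's interior with at least `min_flank` bases on each side, one situation that could arise if two cDNA inserts were ligated together through an adapter. The adapter string, the helper names `find_all` and `flag_internal_adapter`, and `min_flank=50` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical adapter sequence, used only for illustration.
ADAPTER = "GAATTCGGCACGAG"


def find_all(seq: str, motif: str) -> List[int]:
    """Return every 0-based start position of `motif` in `seq` (overlaps allowed)."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def flag_internal_adapter(seq: str, adapter: str = ADAPTER,
                          min_flank: int = 50) -> bool:
    """Hypothetical chimera heuristic: True if some exact adapter copy has at
    least `min_flank` bases of read sequence on both its left and right."""
    seq, adapter = seq.upper(), adapter.upper()
    for pos in find_all(seq, adapter):
        left = pos                               # bases before the adapter
        right = len(seq) - (pos + len(adapter))  # bases after the adapter
        if left >= min_flank and right >= min_flank:
            return True
    return False


if __name__ == "__main__":
    insert_a = "ACGT" * 20   # 80 bases of made-up insert sequence
    insert_b = "TTGCA" * 16  # 80 bases of made-up insert sequence
    print(flag_internal_adapter(insert_a + ADAPTER + insert_b))  # True
    print(flag_internal_adapter(ADAPTER + insert_a))             # False
```

An adapter at either end of a read is the normal case for a cloned insert, so only interior copies are flagged. A realistic version would also search the reverse complement and allow mismatches rather than relying on exact `str.find` matches.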
Language: English
Content type: Journal article
Source URL: http://dspace.xmu.edu.cn/handle/2288/70722
Collection: Information Technology - Published Papers
Recommended citation
GB/T 7714
Liang, C., Wang, G., Liu, L., et al. ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs)[J]. http://dx.doi.org/10.1186/1471-2164-8-134, 2007.
APA Liang, C., Wang, G., Liu, L., Ji, G. L., Fang, L., ... & 吉国力. (2007). ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs). http://dx.doi.org/10.1186/1471-2164-8-134.
MLA Liang, C., et al. "ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs)." http://dx.doi.org/10.1186/1471-2164-8-134 (2007).