Are Differences in Genomic Data Sets due to True Biological Variants or Errors in Genome Assembly: An Example from Two Chloroplast Genomes | |
Wu, Zhiqiang; Tembrock, Luke R.; Ge, Song | |
Journal | PLOS ONE |
2015 | |
Volume | 10, Issue: 2 |
ISSN | 1932-6203 |
DOI | 10.1371/journal.pone.0118019 |
Document subtype | Article |
English abstract | DNA sequencing has been revolutionized by the development of high-throughput sequencing technologies. Plummeting costs and the massive throughput capacities of second and third generation sequencing platforms have transformed many fields of biological research. Concurrently, new data processing pipelines made rapid de novo genome assemblies possible. However, high quality data are critically important for all investigations in the genomic era. We used chloroplast genomes of one Oryza species (O. australiensis) to compare differences in sequence quality: one genome (GU592209) was obtained through Illumina sequencing and reference-guided assembly and the other genome (KJ830774) was obtained via target enrichment libraries and shotgun sequencing. Based on the whole genome alignment, GU592209 was more similar to the reference genome (O. sativa: AY522330) with 99.2% sequence identity (SI value) compared with the 98.8% SI values in the KJ830774 genome; whereas the opposite result was obtained when the SI values in coding and noncoding regions of GU592209 and KJ830774 were compared. Additionally, the junctions of two single copies and repeat copies in the chloroplast genome exhibited differences. Phylogenetic analyses were conducted using these sequences, and the different data sets yielded dissimilar topologies: phylogenetic replacements of the two individuals were remarkably different based on whole genome sequencing or SNP data and insertions and deletions (indels) data. Thus, we concluded that the genomic composition of GU592209 was heterogeneous in coding and non-coding regions. These findings should impel biologists to carefully consider the quality of sequencing and assembly when working with next-generation data. |
Subject | Multidisciplinary Sciences |
Place of publication | SAN FRANCISCO |
WOS keywords | PHYLOGENETIC ANALYSIS ; GENE ORGANIZATION ; DNA ; SEQUENCE ; INDELS ; DIVERSIFICATION ; IDENTIFICATION ; TRANSFORMATION ; CHALLENGES ; EVOLUTION |
WOS research area | Science Citation Index Expanded (SCI-EXPANDED) |
Language | English |
Publisher | PUBLIC LIBRARY SCIENCE |
WOS accession number | WOS:000349444900251 |
Funding organization | National Natural Science Foundation of China [30990240] |
Content type | Journal article |
Source URL | [http://ir.ibcas.ac.cn/handle/2S10CLM1/26022] |
Collection | State Key Laboratory of Systematic and Evolutionary Botany |
Author affiliations | 1. [Wu, Zhiqiang; Tembrock, Luke R.] Colorado State Univ, Dept Biol, Ft Collins, CO 80523 USA 2. Chinese Acad Sci, Inst Bot, State Key Lab Systemat & Evolutionary Bot, Beijing, Peoples R China |
Recommended citation GB/T 7714 | Wu, Zhiqiang,Tembrock, Luke R.,Ge, Song. Are Differences in Genomic Data Sets due to True Biological Variants or Errors in Genome Assembly: An Example from Two Chloroplast Genomes[J]. PLOS ONE,2015,10(2). |
APA | Wu, Zhiqiang,Tembrock, Luke R.,&Ge, Song.(2015).Are Differences in Genomic Data Sets due to True Biological Variants or Errors in Genome Assembly: An Example from Two Chloroplast Genomes.PLOS ONE,10(2). |
MLA | Wu, Zhiqiang,et al."Are Differences in Genomic Data Sets due to True Biological Variants or Errors in Genome Assembly: An Example from Two Chloroplast Genomes".PLOS ONE 10.2(2015). |