Accurate and efficient protein sequence design through learning concise local environment of residues | |
Huang, Bin; Fan, Tingwen [2]; Wang, Kaiyue [3,4]; Zhang, Haicang [5]; Yu, Chungong [5]; Nie, Shuyu [2,6]; Qi, Yangshuo [2,6]; Zheng, Wei-Mou; Han, Jian [2]; Fan, Zheng [8] | |
Journal | BIOINFORMATICS |
2023 | |
Volume | 39; Issue: 3; Pages: btad122 |
Keywords | COMPUTATIONAL DESIGN PROLINE |
ISSN | 1367-4803 |
DOI | 10.1093/bioinformatics/btad122 |
Abstract (English) | Motivation: Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired. Results: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of the residue's local environment and trains a transformer to learn the correlation between local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein. Availability and implementation: The source code of ProDESIGN-LE is available at . |
Subject areas | Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Mathematical & Computational Biology ; Mathematics |
Language | English |
Content type | Journal article |
Source URL | [http://ir.itp.ac.cn/handle/311006/27921] |
Collection | Institute of Theoretical Physics_ITP Knowledge Output 1978-2010 |
Author affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, SKLP, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100110, Peoples R China; 3. Chinese Acad Sci, Inst Microbiol, Key Lab Microbial Physiol & Metab Engn, State Key Lab Mycol, Beijing 100101, Peoples R China; 4. Beihang Univ, Beijing Adv Innovat Ctr Big Data Based Precis Med, Sch Engn Med, Beijing 100083, Peoples R China; 5. Beihang Univ, Key Lab Big Data based Precis Med, Minist Ind & Informat Technol Peoples Republ China, Beijing 100083, Peoples R China; 6. Zhongke Big Data Acad, Zhengzhou 450046, Henan, Peoples R China; 7. Hebei Univ, Sch Life Sci, Baoding 071002, Hebei, Peoples R China; 8. Chinese Acad Sci, Inst Theoret Phys, Beijing 100190, Peoples R China; 9. Chinese Acad Sci, Inst Microbiol, Inst Ctr Shared Technol & Facil, Beijing 100101, Peoples R China |
Recommended citation (GB/T 7714) | Huang, Bin,Fan, Tingwen,Wang, Kaiyue,et al. Accurate and efficient protein sequence design through learning concise local environment of residues[J]. BIOINFORMATICS,2023,39(3):btad122. |
APA | Huang, Bin.,Fan, Tingwen.,Wang, Kaiyue.,Zhang, Haicang.,Yu, Chungong.,...&Bu, Dongbo.(2023).Accurate and efficient protein sequence design through learning concise local environment of residues.BIOINFORMATICS,39(3),btad122. |
MLA | Huang, Bin,et al."Accurate and efficient protein sequence design through learning concise local environment of residues".BIOINFORMATICS 39.3(2023):btad122. |