1
Zhu H, Terashi G, Farheen F, Nakamura T, Kihara D. AI-based quality assessment methods for protein structure models from cryo-EM. Curr Res Struct Biol 2025; 9:100164. [PMID: 39996138 PMCID: PMC11848767 DOI: 10.1016/j.crstbi.2025.100164] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/16/2024] [Revised: 01/23/2025] [Accepted: 01/29/2025] [Indexed: 02/26/2025]
Abstract
Cryogenic electron microscopy (cryo-EM) has revolutionized structural biology, with an increasing number of structures being determined by cryo-EM each year, many at higher resolutions. However, challenges remain in accurately interpreting cryo-EM maps. Inaccuracies can arise in regions of locally low resolution, where manual model building is more prone to errors. Validation scores for structure models have been developed to assess both the compatibility between map density and the structure, as well as the geometric and stereochemical properties of protein models. Recent advancements have introduced artificial intelligence (AI) into this field. These emerging AI-driven tools offer unique capabilities in the validation and refinement of cryo-EM-derived protein atomic models, potentially leading to more accurate protein structures and deeper insights into complex biological systems.
Affiliation(s)
- Han Zhu
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Farhanaz Farheen
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Structural Biology Research Center, High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki, 305-0801, Japan
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
2
Bekker G, Nagao C, Shirota M, Nakamura T, Katayama T, Kihara D, Kinoshita K, Kurisu G. Protein Data Bank Japan: Improved tools for sequence-oriented analysis of protein structures. Protein Sci 2025; 34:e70052. [PMID: 39969112 PMCID: PMC11837027 DOI: 10.1002/pro.70052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 01/19/2025] [Accepted: 01/23/2025] [Indexed: 02/20/2025]
Abstract
Protein Data Bank Japan (PDBj) is the Asian hub of three-dimensional macromolecular structure data, and a founding member of the worldwide Protein Data Bank. We have accepted, processed, and distributed experimentally determined biological macromolecular structures for over two decades. Although we collaborate with RCSB PDB and BMRB in the United States, PDBe and EMDB in Europe and recently PDBc in China for our data-in activities, we have developed our own unique services and tools for searching, exploring, visualizing and analyzing protein structures. We have recently introduced a new UniProt-integrated portal that provides users with a quick overview of their target protein and shows a recommended structure with integrated data from various internal and external resources. The portal page helps users identify known genomic variations of their protein of interest and provides insights into how these modifications might impact the structure, stability and dynamics of the protein. Furthermore, the portal page helps users select the optimal structure for further analysis. We have also introduced another service to explore proteins using experimental and computational approaches, which gives experimental structural biologists additional insight and helps them design their experimental studies more efficiently. With these new additions, we have enhanced our service portfolio to benefit both experimental and computational structural biologists in their search to interpret protein structures, their dynamics and function.
Affiliation(s)
- Chioko Nagao
  - Institute for Protein Research, Osaka University, Suita, Japan
- Matsuyuki Shirota
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, Tsukuba, Japan
- Toshiaki Katayama
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
- Daisuke Kihara
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, Tsukuba, Japan
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Genji Kurisu
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Protein Research Foundation, Minoh, Japan
3
Bekker GJ, Nagao C, Shirota M, Nakamura T, Katayama T, Kihara D, Kinoshita K, Kurisu G. Protein Data Bank Japan: Computational Resources for Analysis of Protein Structures. J Mol Biol 2025:169013. [PMID: 40133793 DOI: 10.1016/j.jmb.2025.169013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2024] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 03/27/2025]
Abstract
Protein Data Bank Japan (PDBj, https://pdbj.org/) is the Asian hub of three-dimensional macromolecular structure data, and a founding member of the worldwide Protein Data Bank. We have accepted, processed, and distributed experimentally determined biological macromolecular structures for over two decades. Although we collaborate with RCSB PDB and BMRB in the United States, PDBe and EMDB in Europe and recently PDBc in China for our data-in activities, we have developed our own unique services and tools for searching, exploring, visualizing, and analyzing protein structures. We have also developed novel archives for computational data and raw crystal diffraction images. Recently, we introduced the Sequence Navigator Pro service to explore proteins using experimental and computational approaches, which gives experimental structural biologists additional insight and helps them design their experimental studies more efficiently. In addition, we introduced a new UniProt-integrated portal that provides users with a quick overview of their target protein, shows a recommended structure, and integrates data from various internal and external resources. With these new additions, we have enhanced our service portfolio to benefit both experimental and computational structural biologists in their search to interpret protein structures, their dynamics and function.
Affiliation(s)
- Gert-Jan Bekker
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
- Chioko Nagao
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
- Matsuyuki Shirota
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 980-8579, Japan
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan
- Toshiaki Katayama
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Daisuke Kihara
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 980-8579, Japan
- Genji Kurisu
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Protein Research Foundation, Ina 4-1-2, Minoh, Osaka 562-8686, Japan
4
Li S, Terashi G, Zhang Z, Kihara D. Advancing structure modeling from cryo-EM maps with deep learning. Biochem Soc Trans 2025; 53:BST20240784. [PMID: 39927816 DOI: 10.1042/bst20240784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Revised: 01/16/2025] [Accepted: 01/21/2025] [Indexed: 02/11/2025]
Abstract
Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by enabling the determination of biomolecular structures that are challenging to resolve using conventional methods. Interpreting a cryo-EM map requires accurate modeling of the structures of underlying biomolecules. Here, we concisely discuss the evolution and current state of automatic structure modeling from cryo-EM density maps. We classify modeling methods into two categories: de novo modeling methods from high-resolution maps (better than 5 Å) and methods that model by fitting individual structures of component proteins to maps at lower resolution (worse than 5 Å). Special attention is given to the role of deep learning in the modeling process, highlighting how AI-driven approaches are transformative in cryo-EM structure modeling. We conclude by discussing future directions in the field.
Affiliation(s)
- Shu Li
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Zicong Zhang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
5
Farheen F, Terashi G, Zhu H, Kihara D. AI-based methods for biomolecular structure modeling for Cryo-EM. Curr Opin Struct Biol 2025; 90:102989. [PMID: 39864242 PMCID: PMC11793015 DOI: 10.1016/j.sbi.2025.102989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 12/29/2024] [Accepted: 01/04/2025] [Indexed: 01/28/2025]
Abstract
Cryo-electron microscopy (Cryo-EM) has revolutionized structural biology by enabling the determination of macromolecular structures that were challenging to study with conventional methods. Processing cryo-EM data involves several computational steps to derive three-dimensional structures from raw projections. Recent advancements in artificial intelligence (AI) including deep learning have significantly improved the performance of these processes. In this review, we discuss state-of-the-art AI-based techniques used in key steps of cryo-EM data processing, including macromolecular structure modeling and heterogeneity analysis.
Affiliation(s)
- Farhanaz Farheen
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Han Zhu
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
6
Punuru P, Jain A, Kihara D. Secondary Structure Detection and Structure Modeling for Cryo-EM. Methods Mol Biol 2025; 2870:341-355. [PMID: 39543043 DOI: 10.1007/978-1-0716-4213-9_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Rapid advancements in cryogenic electron microscopy (cryo-EM) have revolutionized the field of structural biology by enabling the determination of complex macromolecular structures at unprecedented resolutions. When cryo-EM density maps have a resolution around 3 Å, the atomic structure can be modeled manually. However, as the resolution decreases, analyzing these density maps becomes increasingly challenging. For modeling structures in lower resolution maps, deep learning can be used to identify structural features in the maps to assist in structure modeling. Here, we present a suite of deep learning-based tools developed by our lab that enable structural biologists to work with cryo-EM maps of a wide range of resolutions. For cryo-EM maps at near-atomic resolution (5 Å or better), DeepMainmast automatically models all-atom structures by tracing the main chain from local map features of amino acids and atoms detected by deep learning; DAQ score quantifies map-model fit and indicates potential misassignments in protein models. In intermediate resolution maps (5-10 Å), Emap2sec and Emap2sec+ can accurately detect protein secondary structures and nucleic acids. These tools and more are available at our web server (https://em.kiharalab.org/).
Affiliation(s)
- Pranav Punuru
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Anika Jain
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
7
Baghirov J, Zhu H, Wang X, Kihara D. Protein Secondary Structure and DNA/RNA Detection for Cryo-EM and Cryo-ET Using Emap2sec and Emap2sec+. Methods Mol Biol 2025; 2867:105-120. [PMID: 39576577 DOI: 10.1007/978-1-0716-4196-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Cryo-electron microscopy (cryo-EM) has become a powerful tool for determining the structures of macromolecules, such as proteins and DNA/RNA complexes. While high-resolution cryo-EM maps are increasingly available, there is still a substantial number of maps determined at intermediate or low resolution. These maps present challenges when it comes to extracting structural information. In response, our group has developed two computational methods, Emap2sec and Emap2sec+, to address these challenges and benefit the analysis of cryo-EM maps. In this chapter, we describe how to use the web servers of two of our structure analysis tools for cryo-EM, Emap2sec and Emap2sec+. Both methods identify local structures in medium-resolution EM maps of 5-10 Å to help find and fit protein and DNA/RNA structures in EM maps. Emap2sec identifies the secondary structures of proteins, while Emap2sec+ also identifies DNA/RNA locations in cryo-EM maps. As cryo-electron tomography (cryo-ET) has started to produce data of this resolution, these methods should be useful for cryo-ET, too. Both methods are available in the form of web servers and source code (https://kiharalab.org/emsuites/).
Affiliation(s)
- Javad Baghirov
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Han Zhu
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
8
Sánchez Rodríguez F, Simpkin AJ, Chojnowski G, Keegan RM, Rigden DJ. Using deep-learning predictions reveals a large number of register errors in PDB depositions. IUCrJ 2024; 11:938-950. [PMID: 39387575 PMCID: PMC11533997 DOI: 10.1107/s2052252524009114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 09/17/2024] [Indexed: 10/15/2024]
Abstract
The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map-model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3-5 Å resolution structures in the PDB. Unlike most methods, the application of this approach yields suggested corrections to the register of affected regions, which it is shown, even by limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors such as fold-switching proteins are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, helping to ensure the accuracy of future depositions.
Affiliation(s)
- Filomeno Sánchez Rodríguez
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
  - Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE, United Kingdom
  - Department of Chemistry, York Structural Biology Laboratory, University of York, York, United Kingdom
- Adam J. Simpkin
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Grzegorz Chojnowski
  - European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany
- Ronan M. Keegan
  - UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
- Daniel J. Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
9
Bou‐Abdallah F, Fish J, Terashi G, Zhang Y, Kihara D, Arosio P. Unveiling the stochastic nature of human heteropolymer ferritin self-assembly mechanism. Protein Sci 2024; 33:e5104. [PMID: 38995055 PMCID: PMC11241160 DOI: 10.1002/pro.5104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 06/18/2024] [Accepted: 06/23/2024] [Indexed: 07/13/2024]
Abstract
Despite ferritin's critical role in regulating cellular and systemic iron levels, our understanding of the structure and assembly mechanism of isoferritins, discovered over eight decades ago, remains limited. Unveiling how the composition and molecular architecture of hetero-oligomeric ferritins confer distinct functionality to isoferritins is essential to understanding how the structural intricacies of H and L subunits influence their interactions with cellular machinery. In this study, ferritin heteropolymers with specific H to L subunit ratios were synthesized using a uniquely engineered plasmid design, followed by high-resolution cryo-electron microscopy analysis and deep learning-based amino acid modeling. Our structural examination revealed unique architectural features during the self-assembly mechanism of heteropolymer ferritins and demonstrated a significant preference for H-L heterodimer formation over H-H or L-L homodimers. Unexpectedly, while dimers seem essential building blocks in the protein self-assembly process, the overall mechanism of ferritin self-assembly is observed to proceed randomly through diverse pathways. The physiological significance of these findings is discussed including how ferritin microheterogeneity could represent a tissue-specific adaptation process that imparts distinctive tissue-specific functions to isoferritins.
Affiliation(s)
- Fadi Bou‐Abdallah
  - Department of Chemistry, State University of New York, Potsdam, New York, USA
- Jeremie Fish
  - Department of Electrical & Computer Engineering, Coulter School of Engineering, Clarkson University, Potsdam, New York, USA
- Genki Terashi
  - Department of Biological Sciences and Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Yuanyuan Zhang
  - Department of Biological Sciences and Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Daisuke Kihara
  - Department of Biological Sciences and Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Paolo Arosio
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
10
Terashi G, Wang X, Prasad D, Nakamura T, Kihara D. DeepMainmast: integrated protocol of protein structure modeling for cryo-EM with deep learning and structure prediction. Nat Methods 2024; 21:122-131. [PMID: 38066344 DOI: 10.1038/s41592-023-02099-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/15/2023] [Accepted: 10/22/2023] [Indexed: 12/19/2023]
Abstract
Three-dimensional structure modeling from maps is an indispensable step for studying proteins and their complexes with cryogenic electron microscopy. Although the resolution of determined cryogenic electron microscopy maps has generally improved, there are still many cases where tracing protein main chains is difficult, even in maps determined at a near-atomic resolution. Here we developed a protein structure modeling method, DeepMainmast, which employs deep learning to capture the local map features of amino acids and atoms to assist main-chain tracing. Moreover, we integrated AlphaFold2 with the de novo density tracing protocol to combine their complementary strengths and achieved even higher accuracy than each method alone. Additionally, the protocol is able to accurately assign the chain identity to the structure models of homo-multimers, which is not a trivial task for existing methods.
Affiliation(s)
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Devashish Prasad
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA