Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bitard-Feildel T, Callebaut I. Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017;7:41425. [PMID: 28134276 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022]

Cited by Other Article(s)

Fang A, Zhang Z, Zhou A, Zitnik M. ATOMICA: Learning Universal Representations of Intermolecular Interactions. bioRxiv 2025:2025.04.02.646906. [PMID: 40291688 PMCID: PMC12026499 DOI: 10.1101/2025.04.02.646906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]

Abstract

Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. This siloed approach prevents generalization across biomolecular classes and limits the ability to model interaction interfaces systematically. We introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across diverse biomolecular modalities, including small molecules, metal ions, amino acids, and nucleic acids. ATOMICA uses a self-supervised denoising and masking objective to train on 2,037,972 interaction complexes and generate hierarchical embeddings at the levels of atoms, chemical blocks, and molecular interfaces. The model generalizes across molecular classes and recovers shared physicochemical features without supervision. Its latent space captures compositional and chemical similarities across interaction types and follows scaling laws that improve representation quality with increasing biomolecular data modalities. We apply ATOMICA to construct five modality-specific interfaceome networks, termed ATOMICANets, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. These networks identify disease pathways across 27 conditions and predict disease-associated proteins in autoimmune neuropathies and lymphoma. Finally, we use ATOMICA to annotate the dark proteome (proteins lacking known structure or function) by predicting 2,646 previously uncharacterized ligand-binding sites. These include putative zinc finger motifs and transmembrane cytochrome subunits, demonstrating that ATOMICA enables systematic annotation of molecular interactions across the proteome.

Yang H, Li Q, Stroup EK, Wang S, Ji Z. Widespread stable noncanonical peptides identified by integrated analyses of ribosome profiling and ORF features. Nat Commun 2024;15:1932. [PMID: 38431639 PMCID: PMC10908861 DOI: 10.1038/s41467-024-46240-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/18/2023] [Accepted: 02/18/2024] [Indexed: 03/05/2024]

Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023;91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]

Bruley A, Mornon JP, Duprat E, Callebaut I. Digging into the 3D Structure Predictions of AlphaFold2 with Low Confidence: Disorder and Beyond. Biomolecules 2022;12:1467. [PMID: 36291675 PMCID: PMC9599455 DOI: 10.3390/biom12101467] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 01/12/2023]

Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. eLife 2022;11:e67667. [PMID: 35356891 PMCID: PMC9132574 DOI: 10.7554/elife.67667] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 02/18/2021] [Accepted: 03/30/2022] [Indexed: 12/02/2022]

Affiliation(s)

Chiara Vanni: Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Jacobs University Bremen, Bremen, Germany
Matthew S Schechter: Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Department of Medicine, University of Chicago, Chicago, United States
Silvia G Acinas: Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
Albert Barberán: Department of Environmental Science, University of Arizona, Tucson, United States
Pier Luigi Buttigieg: Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
Emilio O Casamayor: Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research, Blanes, Spain
Tom O Delmont: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
Carlos M Duarte: Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
A Murat Eren: Department of Medicine, University of Chicago, Chicago, United States; Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, United States
Robert D Finn: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
Renzo Kottmann: Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany
Alex Mitchell: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
Pablo Sánchez: Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
Kimmo Siren: Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
Martin Steinegger: School of Biological Sciences, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
Frank Oliver Gloeckner: Jacobs University Bremen, Bremen, Germany; University of Bremen and Life Sciences and Chemistry, Bremen, Germany; Computing Center, Helmholtz Center for Polar and Marine Research, Bremerhaven, Germany
Antonio Fernàndez-Guerra: Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark

Kulkarni P, Bhattacharya S, Achuthan S, Behal A, Jolly MK, Kotnala S, Mohanty A, Rangarajan G, Salgia R, Uversky V. Intrinsically Disordered Proteins: Critical Components of the Wetware. Chem Rev 2022;122:6614-6633. [PMID: 35170314 PMCID: PMC9250291 DOI: 10.1021/acs.chemrev.1c00848] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 12/29/2022]

Papadopoulos C, Chevrollier N, Lopes A. Exploring the Peptide Potential of Genomes. Methods Mol Biol 2022;2405:63-82. [PMID: 35298808 DOI: 10.1007/978-1-0716-1855-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021;31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]

Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021;39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]

Uversky VN. Torches, Candles, Lamps, Lanterns, Flashlights, Spotlights, Night Vision Goggles… You Need Them All to See in Darkness. Proteomics 2020;19:e1900085. [PMID: 30829430 DOI: 10.1002/pmic.201900085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Lamiable A, Bitard-Feildel T, Rebehmed J, Quintus F, Schoentgen F, Mornon JP, Callebaut I. A topology-based investigation of protein interaction sites using Hydrophobic Cluster Analysis. Biochimie 2019;167:68-80. [PMID: 31525399 DOI: 10.1016/j.biochi.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/08/2019] [Accepted: 09/11/2019] [Indexed: 01/20/2023]

Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019;18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]

Perdigão N, Rosa A. Dark Proteome Database: Studies on Dark Proteins. High Throughput 2019;8:ht8020008. [PMID: 30934744 PMCID: PMC6630768 DOI: 10.3390/ht8020008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/15/2019] [Indexed: 12/27/2022]

Faure G, Jézéquel K, Roisné-Hamelin F, Bitard-Feildel T, Lamiable A, Marcand S, Callebaut I. Discovery and Evolution of New Domains in Yeast Heterochromatin Factor Sir4 and Its Partner Esc1. Genome Biol Evol 2019;11:572-585. [PMID: 30668669 PMCID: PMC6394760 DOI: 10.1093/gbe/evz010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 01/20/2019] [Indexed: 12/22/2022]

Bitard-Feildel T, Lamiable A, Mornon J, Callebaut I. Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences. Proteomics 2018;18:e1800054. [PMID: 30299594 PMCID: PMC7168002 DOI: 10.1002/pmic.201800054] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/31/2018] [Revised: 08/29/2018] [Indexed: 12/17/2022]

Kulkarni P, Uversky VN. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. Proteomics 2018;18:e1800061. [DOI: 10.1002/pmic.201800061] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 05/24/2018] [Revised: 09/07/2018] [Indexed: 12/27/2022]

Hoffmann B, Elbahnsi A, Lehn P, Décout JL, Pietrucci F, Mornon JP, Callebaut I. Combining theoretical and experimental data to decipher CFTR 3D structures and functions. Cell Mol Life Sci 2018;75:3829-3855. [PMID: 29779042 PMCID: PMC11105360 DOI: 10.1007/s00018-018-2835-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 10/31/2017] [Revised: 05/04/2018] [Accepted: 05/07/2018] [Indexed: 12/15/2022]

Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018;285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]

Unique function words characterize genomic proteins. Proc Natl Acad Sci U S A 2018;115:6703-6708. [PMID: 29895692 PMCID: PMC6042118 DOI: 10.1073/pnas.1801182115] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]

Abstract

The vast, mostly unknown protein universe can be explored by analyzing protein sequences as a string of domains. A broader coverage can be achieved when these domains, the essential blocks in protein evolution, are detected using sequence profiles. Using clustering to collapse redundant profiles into unique function words (UFWs), we find that over the years 2009–2016, the number of UFWs saturates while the number of sequences matched by a combination of two or more UFWs grows exponentially.

Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
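The clustering step described in this abstract (every pair of profiles in a cluster is mutually redundant, and each cluster is replaced by one representative) can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy "largest clique first" strategy, the function names `largest_clique` and `unique_function_words`, the toy profile labels, and the choice of the alphabetically first member as representative are all assumptions made for illustration.

```python
def largest_clique(adj, nodes):
    """Return one maximum clique among `nodes` (Bron-Kerbosch with pivoting)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it beats the current best.
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot produce a larger clique
        pivot = max(sorted(p | x), key=lambda v: len(adj[v] & p))
        for v in sorted(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return best


def unique_function_words(profiles, redundant_pairs):
    """Group profiles into disjoint cliques of mutually redundant profiles.

    Hypothetical rule: the alphabetically first member represents its cluster.
    """
    adj = {p: set() for p in profiles}
    for a, b in redundant_pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    remaining = set(profiles)
    words = {}
    while remaining:
        # Greedy assumption: peel off the largest remaining clique each round.
        clique = largest_clique(adj, remaining)
        members = sorted(clique)
        words[members[0]] = members
        remaining -= clique
    return words


# Toy data: A, B, C overlap pairwise; D overlaps only C; E overlaps nothing.
result = unique_function_words(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")],
)
print(result)
```

In the toy data, D is not merged with A, B and C because it is redundant with only one of them; requiring all pairs to match is what makes the linkage "full".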

Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017;12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022]

Abstract

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
