1
Robert Kolar M, Mitra D, Kobzarenko V. Efficient discovery of frequently co-occurring mutations in a sequence database with matrix factorization. PLoS Comput Biol 2025; 21:e1012391. [PMID: 40273414 DOI: 10.1371/journal.pcbi.1012391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 04/08/2025] [Indexed: 04/26/2025]
Abstract
We have developed a robust method for efficiently tracking multiple co-occurring mutations in a sequence database. Evolution often hinges on the interaction of several mutations to produce significant phenotypic changes that lead to the proliferation of a variant. However, identifying numerous simultaneous mutations across a vast database of sequences poses a significant computational challenge. Our approach leverages a matrix factorization technique to automatically and efficiently pinpoint subsets of positions where co-mutations occur, appearing in a substantial number of sequences within the database. We validated our method using SARS-CoV-2 receptor-binding domains, comprising approximately seven hundred thousand sequences of the Spike protein, demonstrating superior performance compared to a reasonably exhaustive brute-force method. Furthermore, we explore the biological significance of the identified co-mutational positions (CMPs) and their potential impact on the virus's evolution and functionality, identifying key mutations in Delta and Omicron variants. This analysis underscores the significant role of identified CMPs in understanding the evolutionary trajectory. By tracking the "birth" and "death" of CMPs, we can elucidate the persistence and impact of specific groups of mutations across different viral strains, providing valuable insights into the virus's adaptability and thus possibly aiding vaccine design strategies.
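To make the idea of co-mutational positions concrete, the following is a minimal sketch of the brute-force style of counting that the abstract uses as a comparison baseline. It is not the authors' matrix factorization method. The function names, the 0-based position indexing, and the `min_support` threshold are all assumptions made for illustration, and the sequences are assumed to be pre-aligned to the reference.

```python
from collections import Counter
from itertools import combinations


def mutated_positions(reference, sequence):
    """Return the 0-based positions where an aligned sequence differs from the reference."""
    return frozenset(
        i for i, (ref, obs) in enumerate(zip(reference, sequence)) if ref != obs
    )


def co_mutation_counts(reference, sequences, subset_size=2):
    """Count, for every subset of mutated positions of the given size,
    how many sequences carry mutations at all positions in the subset."""
    counts = Counter()
    for seq in sequences:
        positions = sorted(mutated_positions(reference, seq))
        for subset in combinations(positions, subset_size):
            counts[subset] += 1
    return counts


def frequent_co_mutations(reference, sequences, subset_size=2, min_support=2):
    """Keep only the position subsets seen in at least min_support sequences."""
    counts = co_mutation_counts(reference, sequences, subset_size)
    return {subset: n for subset, n in counts.items() if n >= min_support}


# Toy example (hypothetical data, not taken from the paper).
reference = "ACDEF"
database = ["AGDKF", "AGDKF", "AGDEF", "ACDEF"]
frequent = frequent_co_mutations(reference, database)
```

The number of subsets grows combinatorially with the subset size and the number of mutated positions per sequence, which illustrates why exhaustive counting becomes costly on a database of several hundred thousand sequences.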
Affiliation(s)
- Michael Robert Kolar
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America
- Debasis Mitra
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America
- Valerie Kobzarenko
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America

2
Fung TS, Ryu KW, Thompson CB. Arginine: at the crossroads of nitrogen metabolism. EMBO J 2025; 44:1275-1293. [PMID: 39920310 PMCID: PMC11876448 DOI: 10.1038/s44318-025-00379-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 12/06/2024] [Accepted: 12/10/2024] [Indexed: 02/09/2025]
Abstract
L-arginine is the most nitrogen-rich amino acid, acting as a key precursor for the synthesis of nitrogen-containing metabolites and an essential intermediate in the clearance of excess nitrogen. Arginine's side chain possesses a guanidino group which has unique biochemical properties, and plays a primary role in nitrogen excretion (urea), cellular signaling (nitric oxide) and energy buffering (phosphocreatine). The post-translational modification of protein-incorporated arginine by guanidino-group methylation also contributes to epigenetic gene control. Most human cells do not synthesize sufficient arginine to meet demand and are dependent on exogenous arginine. Thus, dietary arginine plays an important role in maintaining health, particularly upon physiologic stress. How cells adapt to changes in extracellular arginine availability is unclear, mostly because nearly all tissue culture media are supplemented with supraphysiologic levels of arginine. Evidence is emerging that arginine-deficiency can influence disease progression. Here, we review new insights into the importance of arginine as a metabolite, emphasizing the central role of mitochondria in arginine synthesis/catabolism and the recent discovery that arginine can act as a signaling molecule regulating gene expression and organelle dynamics.
Affiliation(s)
- Tak Shun Fung
  - Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Keun Woo Ryu
  - Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Craig B Thompson
  - Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA

3
Pouresmaeil M, Azizi-Dargahlou S. Investigation of CaMV-host co-evolution through synonymous codon pattern. J Basic Microbiol 2024; 64:e2300664. [PMID: 38436477 DOI: 10.1002/jobm.202300664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/20/2024] [Accepted: 02/10/2024] [Indexed: 03/05/2024]
Abstract
Cauliflower mosaic virus (CaMV) has a double-stranded DNA genome and is globally distributed. The phylogenetic tree of 121 CaMV isolates was categorized into two primary groups, with Iranian isolates showing the greatest genetic variation. Nucleotide A had the highest frequency (36.95%) in the CaMV genome, and the dinucleotide odds ratio analysis revealed that TC dinucleotide (1.34 ≥ 1.23) and CG dinucleotide (0.63 ≤ 0.78) are overrepresented and underrepresented, respectively. Relative synonymous codon usage (RSCU) analysis confirmed codon usage bias in CaMV and its hosts. Brassica oleracea and Brassica rapa, among the susceptible hosts of CaMV, showed a codon adaptation index (CAI) value above 0.8. Additionally, relative codon deoptimization index (RCDI) results exhibited the highest degree of deoptimization in Raphanus sativus. These findings suggest that the genes of CaMV underwent codon adaptation with its hosts. Among the CaMV open reading frames (ORFs), genes that produce reverse transcriptase and virus coat proteins showed the highest CAI value of 0.83. These genes are crucial for the creation of new virion particles. The results confirm that CaMV co-evolved with its host to ensure the optimal expression of its genes in the hosts, allowing for easy infection and effective spread. To detect the force behind codon usage bias, effective number of codons (ENC)-plot and neutrality plot analyses were conducted. The results indicated that natural selection is the primary factor influencing CaMV codon usage bias.
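For readers unfamiliar with RSCU, the sketch below computes it under the commonly used definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` and the toy input are illustrative and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU(codon) = count(codon) * n / total, where n is the number of synonymous
    codons for the amino acid and total is their summed count.
    Stop codons, single-codon amino acids (Met, Trp) and amino acids absent
    from the sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_TABLE
    )

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) == 1 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy example: four alanine codons, three of them GCT.
example = rscu("GCTGCTGCTGCC")
```

An RSCU above 1 means a codon is used more often than expected under equal synonymous usage, and a value below 1 means it is used less often. In the toy input, `GCT` is the preferred alanine codon.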
Affiliation(s)
- Mahin Pouresmaeil
  - Faculty of Agriculture and Natural Resources, University of Mohaghegh Ardabili, Ardabil, Iran
- Shahnam Azizi-Dargahlou
  - Agricultural Biotechnology, Seed and Plant Certification and Registration Institute, Ardabil Agricultural and Natural Resources Research Center, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran

4
Lei L, Burton ZF. The 3 31 Nucleotide Minihelix tRNA Evolution Theorem and the Origin of Life. Life (Basel) 2023; 13:2224. [PMID: 38004364 PMCID: PMC10672568 DOI: 10.3390/life13112224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 11/08/2023] [Accepted: 11/16/2023] [Indexed: 11/26/2023]
Abstract
There are no theorems (proven theories) in the biological sciences. We propose that the 3 31 nt minihelix tRNA evolution theorem be universally accepted as one. The 3 31 nt minihelix theorem completely describes the evolution of type I and type II tRNAs from ordered precursors (RNA repeats and inverted repeats). Despite the diversification of tRNAome sequences, statistical tests overwhelmingly support the theorem. Furthermore, the theorem relates the dominant pathway for the origin of life on Earth, specifically, how tRNAomes and the genetic code may have coevolved. Alternate models for tRNA evolution (i.e., 2 minihelix, convergent and accretion models) are falsified. In the context of the pre-life world, tRNA was a molecule that, via mutation, could modify anticodon sequences and teach itself to code. Based on the tRNA sequence, we relate the clearest history to date of the chemical evolution of life. From analysis of tRNA evolution, ribozyme-mediated RNA ligation was a primary driving force in the evolution of complexity during the pre-life-to-life transition. tRNA formed the core for the evolution of living systems on Earth.
Affiliation(s)
- Lei Lei
  - School of Biological Sciences, University of New England, Biddeford, ME 04005, USA
- Zachary Frome Burton
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA

5
Fontecilla-Camps JC. Reflections on the Origin and Early Evolution of the Genetic Code. Chembiochem 2023; 24:e202300048. [PMID: 37052530 DOI: 10.1002/cbic.202300048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 03/01/2023] [Indexed: 04/14/2023]
Abstract
Examination of the genetic code (GeCo) reveals that amino acids coded by (A/U) codons display a large functional spectrum and bind RNA whereas, except for Arg, those coded by (G/C) codons do not. From a stereochemical viewpoint, the clear preference for (A/U)-rich codons to be located at the GeCo half blocks suggests they were specifically determined. Conversely, the overall lower affinity of cognate amino acids for their (G/C)-rich anticodons points to their late arrival to the GeCo. It is proposed that i) initially the code was composed of the eight (A/U) codons; ii) these codons were duplicated when G/C nucleotides were added to their wobble positions, and three new codons with G/C in their first position were incorporated; and iii) a combination of A/U and G/C nucleotides progressively generated the remaining codons.

6
Hallee L, Khomtchouk BB. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 2023; 13:2088. [PMID: 36747072 PMCID: PMC9902438 DOI: 10.1038/s41598-023-28965-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Accepted: 01/27/2023] [Indexed: 02/08/2023]
Abstract
In this study, we investigate how an organism's codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
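As an illustration of the ORF-detection step mentioned in the abstract, here is a minimal sketch of how candidate ORFs might be enumerated before any statistical filtering. It is an assumption-laden simplification, not the authors' model: it scans only the three forward reading frames, starts each candidate at the first `ATG` after the previous stop codon, and does not score candidates by codon usage. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Enumerate candidate ORFs in the three forward reading frames.

    Each ORF runs from the first ATG seen after the previous stop codon up to
    and including the next in-frame stop codon. min_codons is the minimum
    number of codons before the stop, counting the ATG itself.
    Returns (start, end, sequence) tuples using 0-based, end-exclusive indices.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # resume the search after this stop codon
    return orfs


# Toy example: one ORF (ATG AAA TAG) in the third forward frame.
candidates = find_orfs("CCATGAAATAGGG")
```

A classifier of the kind the abstract describes could then accept or reject each candidate by comparing its codon usage distribution with that expected for the organism. The reverse strand and nested start codons are left out here for brevity.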
Affiliation(s)
- Logan Hallee
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19713, USA
- Bohdan B Khomtchouk
  - Department of BioHealth Informatics, Center for Computational Biology and Bioinformatics, Indiana University, Indianapolis, IN, 46202, USA

7
Chirumbolo S, Vella A. Molecules, Information and the Origin of Life: What Is Next? Molecules 2021; 26:1003. [PMID: 33672848 PMCID: PMC7917628 DOI: 10.3390/molecules26041003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Revised: 02/09/2021] [Accepted: 02/10/2021] [Indexed: 12/20/2022]
Abstract
How did life originate, and what is life in its deepest foundation? The texture of life is known to be held together by molecules and their chemical-physical laws, yet a thorough elucidation of these questions remains a puzzling challenge for science. Focusing solely on molecules and their laws has indirectly consolidated, in scientific knowledge, a mechanistic (reductionist) perspective of biology and medicine. This occurred throughout the long historical path of experimental science and subsequently affected the many theses and speculations about the origin of life and its maintenance. Defining what life is calls for a novel epistemology, a ground on which the organization of living systems, whose origin is still questioned via chemistry, physics and even philosophy, may provide a new key to the complex nature of the human being. In this scenario, many issues, such as the role of information and water structure, have long been neglected in the theoretical basis of the origin of life and marginalized as a kind of scenic backstage, while applied science and technology went on considering molecules as the sole leading components of the scene. Water physics and information dynamics may have a much more fundamental role in living systems than previously expected. Can an organism be explained simply by a mechanistic view of its nature, or do we need “something else”? We can probably gain sound foundations about life by changing our prejudicial view of living systems as merely complex, highly ordered machines. In this manuscript we reappraise many fundamental aspects of molecular and chemical biology and read them through a new paradigm, which includes Prigogine’s dissipative structures and informational dissipation (Shannon dissipation).
This would provide readers with insightful clues about how biology and chemistry may be thoroughly revised with reference to new models, such as informational dissipation. We trust they can make a straightforward contribution to elucidating what life is for science. This overview is not simply a philosophical speculation; it aims to deeply affect the way we conceive and describe the foundations of organisms’ life, providing intriguing suggestions for readers in the field.
Affiliation(s)
- Salvatore Chirumbolo
  - Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, 37134 Verona, Italy
  - Correspondence: Tel.: +39-0458027645
- Antonio Vella
  - Verona-Unit of Immunology, Azienda Ospedaliera Universitaria Integrata, 37134 Verona, Italy

8
The origin of the genetic code and origin of ideas. J Theor Biol 2021; 516:110615. [PMID: 33545188 DOI: 10.1016/j.jtbi.2021.110615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/23/2020] [Revised: 01/21/2021] [Accepted: 01/26/2021] [Indexed: 11/22/2022]
Abstract
Inouye et al. (2020) use the observation that Ser is coded in the genetic code by two blocks of codons that differ in more than one base to understand some aspects of the origin of the genetic code organization. I argue instead that this observation per se cannot be used to understand any aspect of the origin of the genetic code unless it is accompanied by other assumptions, which in this specific case concern: (i) the ancestrality of some amino acids, (ii) the hypothesis that the first mRNA to be translated was poly-G, which can be translated into poly-Gly, and (iii) an evolutionary mechanism for the genetic code origin based on the duplication of tRNAs. However, neither the tRNA duplication mechanism nor the existence of poly-G as the first mRNA to be translated is corroborated as a mechanism through which the genetic code would have been structured. For example, the origin of the actual mRNA should have been preceded by the evolution of a proto-mRNA which evidently already coded for more than one amino acid. Therefore, when it evolved from proto-mRNA, the mRNA should already have coded for more than one amino acid. In other words, poly-G as mRNA would most likely never have existed because the first mRNAs already had to code for more than one amino acid. On the contrary, all these assumptions would have been operational if the observations of Inouye et al. (2020) had been discussed within the coevolution theory of the origin of the genetic code, which they were not.

9
Abstract
Diverse models have been advanced for the evolution of the genetic code. Here, models for tRNA, aminoacyl-tRNA synthetase (aaRS) and genetic code evolution were combined with an understanding of EF-Tu suppression of tRNA 3rd anticodon position wobbling. The result is a highly detailed scheme that describes the placements of all amino acids in the standard genetic code. The model describes evolution of 6-, 4-, 3-, 2- and 1-codon sectors. Innovation in column 3 of the code is explained. Wobbling and code degeneracy are explained. Separate distribution of serine sectors between columns 2 and 4 of the code is described. We conclude that very little chaos contributed to evolution of the genetic code and that the pattern of evolution of aaRS enzymes describes a history of the evolution of the code. A model is proposed to describe the biological selection for the earliest evolution of the code and for protocell evolution.
Affiliation(s)
- Lei Lei
  - Department of Biology, University of New England, Biddeford, ME, USA
- Zachary Frome Burton
  - Department of Biochemistry and Molecular Biology, Michigan State University, E. Lansing, MI, USA

10
Profile of Masayori Inouye. Proc Natl Acad Sci U S A 2020; 117:28543-28545. [DOI: 10.1073/pnas.2021565117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]