1. Ray A. Machine learning in postgenomic biology and personalized medicine. Wiley Interdisciplinary Reviews. Data Mining and Knowledge Discovery 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, requires data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
2. Pharmacogenes (PGx-genes): Current understanding and future directions. Gene 2019; 718:144050. [DOI: 10.1016/j.gene.2019.144050] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/12/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 12/14/2022]
3. Prioritization of Variants for Investigation of Genotype-Directed Nutrition in Human Superpopulations. Int J Mol Sci 2019; 20:ijms20143516. [PMID: 31323740 PMCID: PMC6678450 DOI: 10.3390/ijms20143516] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/14/2019] [Revised: 07/04/2019] [Accepted: 07/16/2019] [Indexed: 01/06/2023]
Abstract
Dietary guidelines recommended by key health agencies are generally designed for a global population. However, ethnicity affects human disease and environment-gene interactions, including nutrient intake. Historically, isolated human populations with different genetic backgrounds have adapted to distinct environments with varying food sources. Ethnicity is relevant to the interaction of food intake with genes and disease susceptibility; yet major health agencies generally do not recommend food and nutrients codified by population genotypes and their frequencies. In this paper, we have consolidated published nutrigenetic variants and examined their frequencies in human superpopulations to prioritize these variants for future investigation of population-specific genotype-directed nutrition. The nutrients consumed by individuals interact with their genome and may alter disease risk. Herein, we searched the literature, designed a data model, and manually curated hundreds of papers. The resulting database houses 101 variants that reached significance (p < 0.05), from 35 population studies. Nutrigenetic variants associated with modified nutrient intake have the potential to reduce the risk of colorectal cancer, obesity, metabolic syndrome, type 2 diabetes, and several other diseases. Since many nutrigenetic studies have identified a major variant in some populations, we suggest that superpopulation-specific genotype-directed nutrition modifications be prioritized for future study and evaluation. Genotype-directed nutrition approaches to dietary modification have the potential to reduce disease risk in select human populations.
4. Marano LA, Marcorin L, Castelli EDC, Mendes-Junior CT. Evaluation of MC1R high-throughput nucleotide sequencing data generated by the 1000 Genomes Project. Genet Mol Biol 2017; 40:530-539. [PMID: 28486572 PMCID: PMC5488459 DOI: 10.1590/1678-4685-gmb-2016-0180] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/01/2016] [Accepted: 12/23/2016] [Indexed: 02/06/2023]
Abstract
The advent of next-generation sequencing allows simultaneous processing of several
genomic regions/individuals, increasing the availability and accuracy of whole-genome
data. However, these new approaches may present some errors and bias due to
alignment, genotype calling, and imputation methods. Despite these flaws, data
obtained by next-generation sequencing can be valuable for population and
evolutionary studies of specific genes, such as genes related to how pigmentation
evolved among populations, one of the main topics in human evolutionary biology.
Melanocortin-1 receptor (MC1R) is one of the most studied genes
involved in pigmentation variation. As MC1R has already been
suggested to affect melanogenesis and increase risk of developing melanoma, it
constitutes one of the best models to understand how natural selection acts on
pigmentation. Here we employed a locally developed pipeline to obtain genotype and
haplotype data for MC1R from the raw sequencing data provided by the
1000 Genomes FTP site. We also compared such genotype data to Phase
3 VCF to evaluate its quality and discover any polymorphic sites that may have been
overlooked. In conclusion, either the VCF file or one of the presently described
pipelines could be used to obtain reliable and accurate genotype calling from the
1000 Genomes Phase 3 data.
Affiliation(s)
- Leonardo Arduino Marano
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Letícia Marcorin
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Erick da Cruz Castelli
- Departamento de Patologia, Faculdade de Medicina de Botucatu, Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP), Botucatu, SP, Brazil
- Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
5. Rossier BC, Baker ME, Studer RA. Epithelial sodium transport and its control by aldosterone: the story of our internal environment revisited. Physiol Rev 2015; 95:297-340. [PMID: 25540145 DOI: 10.1152/physrev.00011.2014] [Citation(s) in RCA: 162] [Impact Index Per Article: 16.2] [Indexed: 12/18/2022]
Abstract
Transcription and translation require a high concentration of potassium across the entire tree of life. The conservation of a high intracellular potassium was an absolute requirement for the evolution of life on Earth. This was achieved by the interplay of P- and V-ATPases that can set up electrochemical gradients across the cell membrane, an energetically costly process requiring the synthesis of ATP by F-ATPases. In animals, the control of an extracellular compartment was achieved by the emergence of multicellular organisms able to produce tight epithelial barriers creating a stable extracellular milieu. Finally, the adaptation to a terrestrial environment was achieved by the evolution of distinct regulatory pathways allowing salt and water conservation. In this review we emphasize the critical and dual role of Na(+)-K(+)-ATPase in the control of the ionic composition of the extracellular fluid and the renin-angiotensin-aldosterone system (RAAS) in salt and water conservation in vertebrates. The action of aldosterone on transepithelial sodium transport by activation of the epithelial sodium channel (ENaC) at the apical membrane and that of Na(+)-K(+)-ATPase at the basolateral membrane may have evolved in lungfish before the emergence of tetrapods. Finally, we discuss the implication of RAAS in the origin of the present pandemic of hypertension and its associated cardiovascular diseases.
Affiliation(s)
- Bernard C Rossier
- Department of Pharmacology and Toxicology, University of Lausanne, Lausanne, Switzerland; Division of Nephrology-Hypertension, University of California San Diego, La Jolla, California; and Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, United Kingdom
- Michael E Baker
- Department of Pharmacology and Toxicology, University of Lausanne, Lausanne, Switzerland; Division of Nephrology-Hypertension, University of California San Diego, La Jolla, California; and Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, United Kingdom
- Romain A Studer
- Department of Pharmacology and Toxicology, University of Lausanne, Lausanne, Switzerland; Division of Nephrology-Hypertension, University of California San Diego, La Jolla, California; and Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, United Kingdom
6. Zhang W, Meehan J, Su Z, Ng HW, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H. Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population. BMC Bioinformatics 2014; 15 Suppl 11:S6. [PMID: 25350283 PMCID: PMC4251052 DOI: 10.1186/1471-2105-15-s11-s6] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/27/2023]
Abstract
Background: Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina Hiseq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. Methods: In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. Results: We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean-only SNVs are in genes that are associated with the drug term of adenosine. Conclusion: We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
7. Patnaik SK, Helmberg W, Blumenfeld OO. BGMUT Database of Allelic Variants of Genes Encoding Human Blood Group Antigens. Transfus Med Hemother 2014; 41:346-51. [PMID: 25538536 DOI: 10.1159/000366108] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 04/01/2014] [Accepted: 05/19/2014] [Indexed: 12/30/2022]
Abstract
The Blood group antigen Gene MUTation (BGMUT) database documents variations in genes of human blood group systems. In March 2014, the database, accessible at www.ncbi.nlm.nih.gov/gv/mhc/xslcgi.cgi?cmd=bgmut, listed 1,545 alleles of 44 genes of 34 blood group systems. Besides allelic information, the BGMUT resource also presents comprehensive and current information on blood group systems. This review describes the database and notes its utility for the transfusion medicine and human genetics communities.
Affiliation(s)
- Santosh Kumar Patnaik
- Department of Thoracic Surgery, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY, USA
- Wolfgang Helmberg
- Department of Blood Group Serology and Transfusion Medicine, Medical University of Graz, Graz, Austria
- Olga O Blumenfeld
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA
8. Bonassi S, Taioli E, Vermeulen R. Omics in population studies: a molecular epidemiology perspective. Environmental and Molecular Mutagenesis 2013; 54:455-460. [PMID: 23908054 DOI: 10.1002/em.21805] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/19/2013] [Accepted: 06/19/2013] [Indexed: 06/02/2023]
Abstract
The convergence of striking developments in (bio)-technology, increasing availability of biobanked samples, and advances in biostatistics and bio-informatics allow an optimistic outlook for epidemiological research. In this special issue on Omics in Population Studies: A Molecular Epidemiology Perspective we explore and reflect on the potential of these new developments in both exposure science and clinical research since they provide the essential link between exposure and disease and may enable scientists to improve their understanding of disease origin and progression. As noted in this special issue, this is an exciting time for epidemiology. While cancer and other noncommunicable diseases rise in number worldwide, various new tools can be applied effectively to increase understanding of the underlying causes and potential for progression to improve their prevention and treatment.
Affiliation(s)
- Stefano Bonassi
- Unit of Clinical and Molecular Epidemiology, Area of Systems Approaches and Non Communicable Diseases. IRCCS San Raffaele Pisana, Rome, Italy.
9. Garaffo G, Provero P, Molineris I, Pinciroli P, Peano C, Battaglia C, Tomaiuolo D, Etzion T, Gothilf Y, Santoro M, Merlo GR. Profiling, Bioinformatic, and Functional Data on the Developing Olfactory/GnRH System Reveal Cellular and Molecular Pathways Essential for This Process and Potentially Relevant for the Kallmann Syndrome. Front Endocrinol (Lausanne) 2013; 4:203. [PMID: 24427155 PMCID: PMC3876029 DOI: 10.3389/fendo.2013.00203] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/26/2013] [Accepted: 12/18/2013] [Indexed: 11/28/2022]
Abstract
During embryonic development, immature neurons in the olfactory epithelium (OE) extend axons through the nasal mesenchyme, to contact projection neurons in the olfactory bulb. Axon navigation is accompanied by migration of the GnRH+ neurons, which enter the anterior forebrain and home in the septo-hypothalamic area. This process can be interrupted at various points and lead to the onset of the Kallmann syndrome (KS), a disorder characterized by anosmia and central hypogonadotropic hypogonadism. Several genes have been identified in humans and mice that cause KS or a KS-like phenotype. In mice a set of transcription factors appears to be required for olfactory connectivity and GnRH neuron migration; thus we explored the transcriptional network underlying this developmental process by profiling the OE and the adjacent mesenchyme at three embryonic ages. We also profiled the OE from embryos null for Dlx5, a homeogene that causes a KS-like phenotype when deleted. We identified 20 interesting genes belonging to the following categories: (1) transmembrane adhesion/receptor, (2) axon-glia interaction, (3) scaffold/adapter for signaling, (4) synaptic proteins. We tested some of them in zebrafish embryos: the depletion of five (of six) Dlx5 targets affected axonal extension and targeting, while three (of three) affected GnRH neuron position and neurite organization. Thus, we confirmed the importance of cell-cell and cell-matrix interactions and identified new molecules needed for olfactory connection and GnRH neuron migration. Using available and newly generated data, we predicted/prioritized putative KS-disease genes, by building conserved co-expression networks with all known disease genes in human and mouse. The results show the overall validity of approaches based on high-throughput data and predictive bioinformatics to identify genes potentially relevant for the molecular pathogenesis of KS. A number of candidates that should be tested in future mutation screens will be discussed.
Affiliation(s)
- Giulia Garaffo
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Paolo Provero
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Ivan Molineris
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Patrizia Pinciroli
- Department of Medical Biotechnology Translational Medicine (BIOMETRA), University of Milano, Milano, Italy
- Clelia Peano
- Institute of Biomedical Technology, National Research Council, ITB-CNR, Segrate, Italy
- Cristina Battaglia
- Department of Medical Biotechnology Translational Medicine (BIOMETRA), University of Milano, Milano, Italy
- Institute of Biomedical Technology, National Research Council, ITB-CNR, Segrate, Italy
- Daniela Tomaiuolo
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Talya Etzion
- The George S. Wise Faculty of Life Sciences, Department of Neurobiology, Tel-Aviv University, Tel-Aviv, Israel
- Yoav Gothilf
- The George S. Wise Faculty of Life Sciences, Department of Neurobiology, Tel-Aviv University, Tel-Aviv, Israel
- Massimo Santoro
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Giorgio R. Merlo
- Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- *Correspondence: Giorgio R. Merlo, Department of Molecular Biotechnology and Health Science, University of Torino, Via Nizza 52, Torino 10126, Italy e-mail: