1
Mora‐Márquez F, Nuño JC, Soto Á, López de Heredia U. Missing genotype imputation in non-model species using self-organizing maps. Mol Ecol Resour 2025; 25:e13992. [PMID: 38970328 PMCID: PMC11887599 DOI: 10.1111/1755-0998.13992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 05/30/2024] [Accepted: 06/26/2024] [Indexed: 07/08/2024]
Abstract
Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact in downstream analysis, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms are employed to explore the genotype data and to estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons that cluster similar inputs into close neurons. The method explores genotype datasets to select SNP loci to build binary vectors from the genotypes, and initializes and trains neural networks for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application programmed in Python 3 with a user-friendly GUI to facilitate the whole process. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with alleles at low frequency and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.
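The pipeline described in this abstract (binary vectors built from genotypes, one SOM trained per query, imputation from the resulting clusters) can be illustrated with a minimal sketch. This is not the gtImputation implementation: the one-hot encoding of alt-allele counts, the 1-D map of four neurons, the linear decay schedule, the majority vote, and every function name below are assumptions made for illustration only.

```python
import random
from collections import Counter

def encode(genotypes):
    """One-hot encode alt-allele counts (0, 1, 2) into one flat binary vector."""
    vec = []
    for g in genotypes:
        vec.extend(1.0 if g == k else 0.0 for k in (0, 1, 2))
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, vec):
    """Index of the neuron whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vec))

def train_som(vectors, n_units=4, epochs=50, seed=0):
    """Train a tiny 1-D SOM; learning rate and radius shrink linearly (assumed schedule)."""
    rng = random.Random(seed)
    weights = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr, radius = 0.5 * frac, (n_units / 2) * frac
        for vec in vectors:
            bmu = best_unit(weights, vec)
            for i, w in enumerate(weights):
                if abs(i - bmu) <= radius:  # neurons near the winner also move
                    for j in range(len(w)):
                        w[j] += lr * (vec[j] - w[j])
    return weights

def impute(known, query_flanks):
    """known: list of (flanking_genotypes, target_genotype) pairs for typed samples."""
    vectors = [encode(flanks) for flanks, _ in known]
    weights = train_som(vectors)
    query_unit = best_unit(weights, encode(query_flanks))
    # Vote among the samples that land on the same neuron as the query.
    votes = Counter(t for (_, t), v in zip(known, vectors)
                    if best_unit(weights, v) == query_unit)
    if not votes:  # empty neuron: fall back to the overall most common genotype
        votes = Counter(t for _, t in known)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: three flanking SNPs per sample, one target SNP.
known = [([0, 0, 0], 0)] * 3 + [([2, 2, 2], 2)] * 2
print(impute(known, [0, 0, 0]))
```

In this toy set the query shares its flanking genotypes with the three majority samples, so they map to the same neuron and decide the vote. Real datasets would need the locus selection step the abstract mentions, which this sketch leaves out.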
Affiliation(s)
- Fernando Mora‐Márquez
  - GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Juan Carlos Nuño
  - GI en Especies Leñosas (WooSp), Dpto. Matemática Aplicada, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Álvaro Soto
  - GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Unai López de Heredia
  - GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
2
Mowlaei ME, Li C, Jamialahmadi O, Dias R, Chen J, Jamialahmadi B, Rebbeck TR, Carnevale V, Kumar S, Shi X. STICI: Split-Transformer with integrated convolutions for genotype imputation. Nat Commun 2025; 16:1218. [PMID: 39890780 PMCID: PMC11785734 DOI: 10.1038/s41467-025-56273-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Accepted: 01/08/2025] [Indexed: 02/03/2025] Open
Abstract
Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
Affiliation(s)
- Mohammad Erfan Mowlaei
  - Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Chong Li
  - Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Oveis Jamialahmadi
  - Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, Wallenberg Laboratory, University of Gothenburg, Gothenburg, Sweden
- Raquel Dias
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Junjie Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Benyamin Jamialahmadi
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Timothy Richard Rebbeck
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Vincenzo Carnevale
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Institute for Computational Molecular Science, Temple University, Philadelphia, PA, USA
- Sudhir Kumar
  - Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Xinghua Shi
  - Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
3
Shaheen A, Ye L, Karunaratne C, Seppänen T. Fully-Gated Denoising Auto-Encoder for Artifact Reduction in ECG Signals. Sensors (Basel, Switzerland) 2025; 25:801. [PMID: 39943444 PMCID: PMC11821071 DOI: 10.3390/s25030801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 01/17/2025] [Accepted: 01/27/2025] [Indexed: 02/16/2025]
Abstract
Cardiovascular diseases (CVDs) are the primary cause of death worldwide. For accurate diagnosis of CVDs, robust and efficient ECG denoising is particularly critical in ambulatory cases where various artifacts can degrade the quality of the ECG signal. None of the present denoising methods preserve the morphology of ECG signals adequately for all noise types, especially at high noise levels. This study proposes a novel Fully-Gated Denoising Autoencoder (FGDAE) to significantly reduce the effects of different artifacts on ECG signals. The proposed FGDAE utilizes gating mechanisms in all its layers, including skip connections, and employs Self-organized Operational Neural Network (self-ONN) neurons in its encoder. Furthermore, a multi-component loss function is proposed to learn efficient latent representations of ECG signals and provide reliable denoising with maximal morphological preservation. The proposed model is trained and benchmarked on the QT Database (QTDB), degraded by adding randomly mixed artifacts collected from the MIT-BIH Noise Stress Test Database (NSTDB). The FGDAE showed the best performance on all seven error metrics used in our work in different noise intensities and artifact combinations compared with state-of-the-art algorithms. Moreover, FGDAE provides reliable denoising in extreme conditions and for varied noise compositions. The significantly reduced model size, 61% to 73% reduction, compared with the state-of-the-art algorithm, and the inference speed of the FGDAE model provide evident benefits in various practical applications. While our model performs best compared with other models tested in this study, more improvements are needed for optimal morphological preservation, especially in the presence of electrode motion artifacts.
Affiliation(s)
- Ahmed Shaheen
  - Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, FI-90014 Oulu, Finland
- Liang Ye
  - Department of Information and Communication Engineering, Harbin Institute of Technology, Harbin 150001, China
- Chrishni Karunaratne
  - Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, FI-90014 Oulu, Finland
- Tapio Seppänen
  - Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, FI-90014 Oulu, Finland
4
Naito T, Okada Y. Genotype imputation methods for whole and complex genomic regions utilizing deep learning technology. J Hum Genet 2024; 69:481-486. [PMID: 38225263 PMCID: PMC11422162 DOI: 10.1038/s10038-023-01213-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/23/2023] [Accepted: 12/04/2023] [Indexed: 01/17/2024]
Abstract
The imputation of unmeasured genotypes is essential in human genetic research, particularly in enhancing the power of genome-wide association studies and conducting subsequent fine-mapping. Recently, several deep learning-based genotype imputation methods for genome-wide variants with the capability of learning complex linkage disequilibrium patterns have been developed. Additionally, deep learning-based imputation has been applied to a distinct genomic region known as the major histocompatibility complex, referred to as HLA imputation. Despite their various advantages, the current deep learning-based genotype imputation methods do have certain limitations and have not yet become standard. These limitations include the modest accuracy improvement over statistical and conventional machine learning-based methods. However, their benefits include other aspects, such as their "reference-free" nature, which ensures complete privacy protection, and their higher computational efficiency. Furthermore, the continuing evolution of deep learning technologies is expected to contribute to further improvements in prediction accuracy and usability in the future.
Affiliation(s)
- Tatsuhiko Naito
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita-shi, Osaka, 565-0871, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita-shi, Osaka, 565-0871, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
  - Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, 2-2, Yamadaoka, Suita-shi, Osaka, 565-0871, Japan
  - Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, 2-2, Yamadaoka, Suita-shi, Osaka, 565-0871, Japan
5
Kojima K, Tadaka S, Okamura Y, Kinoshita K. Two-stage strategy using denoising autoencoders for robust reference-free genotype imputation with missing input genotypes. J Hum Genet 2024; 69:511-518. [PMID: 38918526 PMCID: PMC11422160 DOI: 10.1038/s10038-024-01261-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/16/2024] [Accepted: 05/13/2024] [Indexed: 06/27/2024]
Abstract
Widely used genotype imputation methods are based on the Li and Stephens model, which assumes that new haplotypes can be represented by modifying existing haplotypes in a reference panel through mutations and recombinations. These methods use genotypes from SNP arrays as inputs to estimate haplotypes that align with the input genotypes by analyzing recombination patterns within a reference panel, and then infer unobserved variants. While these methods require reference panels in an identifiable form, their public use is limited due to privacy and consent concerns. One strategy to overcome these limitations is to use de-identified haplotype information, such as summary statistics or model parameters. Advances in deep learning (DL) offer the potential to develop imputation methods that use haplotype information in a reference-free manner by handling it as model parameters, while maintaining comparable imputation accuracy to methods based on the Li and Stephens model. Here, we provide a brief introduction to DL-based reference-free genotype imputation methods, including RNN-IMP, developed by our research group. We then evaluate the performance of RNN-IMP against widely used Li and Stephens model-based imputation methods in terms of accuracy (R²), using the 1000 Genomes Project Phase 3 dataset and corresponding simulated Omni2.5 SNP genotype data. Because RNN-IMP is sensitive to missing values in input genotypes, we propose a two-stage imputation strategy: missing genotypes are first imputed using denoising autoencoders; RNN-IMP then processes these imputed genotypes. This approach restores the imputation accuracy that is degraded by missing values, enhancing the practical use of RNN-IMP.
Affiliation(s)
- Kaname Kojima
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Shu Tadaka
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yasunobu Okamura
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-0873, Japan
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-0873, Japan
  - Graduate School of Information Sciences, Tohoku University, 6-3-09 Aza-Aoba, Aramaki, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
  - Institute of Development, Aging and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
6
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
  - Institute of Genomics, University of Tartu, Tartu, Estonia
- Flora Jay
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
7
Yuan M, Hoskens H, Goovaerts S, Herrick N, Shriver MD, Walsh S, Claes P. Hybrid autoencoder with orthogonal latent space for robust population structure inference. Sci Rep 2023; 13:2612. [PMID: 36788253 PMCID: PMC9929087 DOI: 10.1038/s41598-023-28759-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 01/24/2023] [Indexed: 02/16/2023] Open
Abstract
Analysis of population structure and genomic ancestry remains an important topic in human genetics and bioinformatics. Commonly used methods require high-quality genotype data to ensure accurate inference. However, in practice, laboratory artifacts and outliers are often present in the data. Moreover, existing methods are typically affected by the presence of related individuals in the dataset. In this work, we propose a novel hybrid method, called SAE-IBS, which combines the strengths of traditional matrix decomposition-based (e.g., principal component analysis) and more recent neural network-based (e.g., autoencoders) solutions. Namely, it yields an orthogonal latent space enhancing dimensionality selection while learning non-linear transformations. The proposed approach achieves higher accuracy than existing methods for projecting poor quality target samples (genotyping errors and missing data) onto a reference ancestry space and generates a robust ancestry space in the presence of relatedness. We introduce a new approach and an accompanying open-source program for robust ancestry inference in the presence of missing data, genotyping errors, and relatedness. The obtained ancestry space allows for non-linear projections and exhibits orthogonality with clearly separable population groups.
Affiliation(s)
- Meng Yuan
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Seppe Goovaerts
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Noah Herrick
  - Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Mark D Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA, USA
- Susan Walsh
  - Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Peter Claes
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Murdoch Children's Research Institute, Melbourne, VIC, Australia
8
Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics 2023; 39:btac715. [PMID: 36342186 PMCID: PMC9805557 DOI: 10.1093/bioinformatics/btac715] [Citation(s) in RCA: 87] [Impact Index Per Article: 43.5] [Received: 05/05/2022] [Revised: 10/24/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION: Antimicrobial peptides (AMPs) are essential components of therapeutic peptides for innate immunity. Researchers have developed several computational methods to predict potential AMPs from many candidate peptides. With the development of artificial intelligence techniques, protein structures can now be accurately predicted, which is useful for protein sequence and function analysis. Unfortunately, predicted peptide structure information has not yet been applied to AMP prediction to improve predictive performance. RESULTS: In this study, we proposed a computational predictor called sAMPpred-GAT for AMP identification. To the best of our knowledge, sAMPpred-GAT is the first approach based on the predicted peptide structures for AMP prediction. The sAMPpred-GAT predictor constructs graphs based on the predicted peptide structures, sequence information and evolutionary information. The Graph Attention Network (GAT) is then applied to the graphs to learn the discriminative features. Finally, fully connected networks are used as the output module to predict whether the peptides are AMPs or not. Experimental results show that sAMPpred-GAT outperforms the other state-of-the-art methods in terms of AUC, and achieves better or highly comparable performance in terms of the other metrics on the eight independent test datasets, demonstrating that the predicted peptide structure information is important for AMP prediction. AVAILABILITY AND IMPLEMENTATION: A user-friendly webserver of sAMPpred-GAT can be accessed at http://bliulab.net/sAMPpred-GAT and the source code is available at https://github.com/HongWuL/sAMPpred-GAT/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Yan
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Hongwu Lv
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yichen Guo
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Wei Peng
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
9
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Luo Z, Qiu C, Zhao LJ, Su KJ, Tian Q, Shen H, Hong H, Gong P, Shi X, Deng HW, Zhang C. An autoencoder-based deep learning method for genotype imputation. Front Artif Intell 2022; 5:1028978. [PMID: 36406474 PMCID: PMC9671213 DOI: 10.3389/frai.2022.1028978] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/26/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022] Open
Abstract
Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, it remains a challenging task to optimize the learning process in DL-based methods to achieve high imputation accuracy. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop by modifying the training process with a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model has achieved comparable or better performance than the existing SCDA model in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. As for the results of LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at the missing ratio of 20%. In summary, our proposed method for genotype imputation has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
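The evaluation protocol implied by "missing ratio" and "concordance rate" can be sketched as follows: hide a fraction of known genotypes, impute them, and score only the hidden cells. The abstract does not define CR, so it is taken here to be the fraction of masked genotypes recovered exactly. The per-SNP mode imputer is a stand-in baseline, not the authors' autoencoder, and all function names are hypothetical.

```python
import random
from collections import Counter

def mask_genotypes(matrix, missing_ratio, seed=0):
    """Hide a fraction of cells (set to None); return the masked copy and the hidden cells."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    masked_cells = rng.sample(cells, round(missing_ratio * len(cells)))
    masked = [list(row) for row in matrix]
    for i, j in masked_cells:
        masked[i][j] = None
    return masked, masked_cells

def impute_column_mode(masked):
    """Stand-in imputer: fill each gap with the most common observed genotype at that SNP."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled

def concordance_rate(truth, imputed, masked_cells):
    """Fraction of masked cells whose imputed genotype equals the true genotype."""
    hits = sum(truth[i][j] == imputed[i][j] for i, j in masked_cells)
    return hits / len(masked_cells)

# Hypothetical toy matrix: 5 samples (rows) by 4 SNPs (columns), alt-allele counts.
truth = [[0, 1, 2, 0] for _ in range(5)]
masked, cells = mask_genotypes(truth, missing_ratio=0.2)
print(concordance_rate(truth, impute_column_mode(masked), cells))
```

Scoring only the masked cells keeps the observed genotypes from inflating the metric, which matters most at low missing ratios.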
Affiliation(s)
- Meng Song
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
  - College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Zhe Luo
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Chuan Qiu
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Lan Juan Zhao
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Kuan-Jui Su
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Qing Tian
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Hui Shen
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Ping Gong
  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Xinghua Shi
  - Department of Computer & Information Sciences, Temple University, Philadelphia, PA, United States
- Hong-Wen Deng
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
10
Dias R, Evans D, Chen SF, Chen KY, Loguercio S, Chan L, Torkamani A. Rapid, Reference-Free human genotype imputation with denoising autoencoders. eLife 2022; 11:e75600. [PMID: 36148981 PMCID: PMC9555874 DOI: 10.7554/elife.75600] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/16/2021] [Accepted: 09/19/2022] [Indexed: 11/13/2022] Open
Abstract
Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This results in computational resource and privacy-risk barriers to access to cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel, and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least fourfold faster inference run time relative to standard imputation tools.
Affiliation(s)
- Raquel Dias
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, United States
- Doug Evans
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
- Shang-Fu Chen
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
- Kai-Yu Chen
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
- Salvatore Loguercio
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
- Leslie Chan
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
- Ali Torkamani
  - Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
11
Wang S, Kim M, Jiang X, Harmanci AO. Evaluation of vicinity-based hidden Markov models for genotype imputation. BMC Bioinformatics 2022; 23:356. [PMID: 36038834 PMCID: PMC9422108 DOI: 10.1186/s12859-022-04896-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 08/08/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The decreasing cost of DNA sequencing has led to a great increase in our knowledge about genetic variation. While population-scale projects bring important insight into genotype-phenotype relationships, the cost of performing whole-genome sequencing on large samples is still prohibitive. In-silico genotype imputation coupled with genotyping-by-arrays is a cost-effective and accurate alternative for genotyping of common and uncommon variants. Imputation methods compare the genotypes of the typed variants with large population-specific reference panels and estimate the genotypes of untyped variants by making use of the linkage disequilibrium patterns. The most accurate imputation methods are based on the Li-Stephens hidden Markov model (HMM), which treats the sequence of each chromosome as a mosaic of the haplotypes from the reference panel. RESULTS: Here we assess the accuracy of vicinity-based HMMs, where each untyped variant is imputed using the typed variants in a small window around itself (as small as 1 centimorgan). Locality-based imputation has recently been used by machine learning-based genotype imputation approaches. We assess how the parameters of the vicinity-based HMMs impact the imputation accuracy in a comprehensive set of benchmarks and show that vicinity-based HMMs can accurately impute common and uncommon variants. CONCLUSIONS: Our results indicate that locality-based imputation models can be effectively used for genotype imputation. The parameter settings that we identified can be used in future methods and vicinity-based HMMs can be used for re-structuring and parallelizing new imputation methods. The source code for the vicinity-based HMM implementations is publicly available at https://github.com/harmancilab/LoHaMMer.
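The vicinity idea, imputing each untyped variant from only the typed variants in a small window around it, starts with a window lookup that can be sketched as below. This is not code from LoHaMMer. The function name is hypothetical, genetic positions are assumed to be sorted and expressed in centimorgans, and the window is assumed to be centred on the target with the stated total width (the abstract does not say whether 1 cM is a total width or a half-width).

```python
from bisect import bisect_left, bisect_right

def typed_in_window(typed_cm, target_cm, window_cm=1.0):
    """Indices of typed variants within a window centred on the untyped target.

    typed_cm must be sorted in ascending order; both window edges are inclusive.
    """
    half = window_cm / 2.0
    lo = bisect_left(typed_cm, target_cm - half)
    hi = bisect_right(typed_cm, target_cm + half)
    return list(range(lo, hi))

# Hypothetical genetic map positions (cM) of typed variants on one chromosome.
typed_cm = [0.1, 0.6, 0.9, 1.3, 1.45, 2.2]
print(typed_in_window(typed_cm, target_cm=1.0))
```

Because each target only needs its own slice of typed variants, windows can be processed independently, which is the property the abstract points to for parallelizing imputation.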
Affiliation(s)
- Su Wang
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, 04763, Republic of Korea
- Xiaoqian Jiang
  - Center for Secure Artificial Intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Arif Ozgun Harmanci
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
|
12
|
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. Methods Mol Biol 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
Imputation has become standard practice in modern genetic research to increase genome coverage and to improve the accuracy of genomic selection and genome-wide association studies: a large number of samples can be genotyped at lower density (and lower cost) and then imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were originally developed for human genetics and, more recently, for animal and plant genetics, taking into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures of farmed species and their limited effective sizes allow high-density genotypes or sequences to be imputed accurately from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, imputation accuracy, measured by the correct imputation rate or by the correlation between true and imputed genotypes, increases with the relatedness of the individual to be imputed to its more densely genotyped ancestors and with its own genotype density. Higher imputation accuracy in turn raises genomic selection accuracy, whatever the genomic evaluation method. For a given marker density, the most important factors affecting imputation accuracy are the size of the reference population and the relationship between individuals in the reference and target populations.
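The two accuracy measures named in this abstract, the correct imputation rate and the correlation between true and imputed genotypes, can be sketched as follows. This is an illustrative sketch only: it assumes genotypes are coded as allele counts (0, 1, 2), uses the Pearson correlation, and the function names are hypothetical rather than taken from any imputation package.

```python
from math import sqrt

def correct_imputation_rate(true_g, imputed_g):
    """Fraction of genotypes imputed exactly right."""
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_g, imputed_g))
    return matches / len(true_g)

def genotype_correlation(true_g, imputed_g):
    """Pearson correlation between true and imputed allele counts."""
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_g)
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when either vector is constant.
        return float("nan")
    return cov / sqrt(var_t * var_i)

# Hypothetical genotypes at six loci, with one imputation error.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 2, 0, 2]
print(correct_imputation_rate(true_genotypes, imputed_genotypes))
print(genotype_correlation(true_genotypes, imputed_genotypes))
```

The two measures can disagree: at a locus with a rare allele, imputing the common genotype everywhere yields a high correct rate but an undefined or low correlation, which is one reason both are commonly reported.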
|
13
|
Ausmees K, Nettelblad C. A deep learning framework for characterization of genotype data. G3 (Bethesda) 2022; 12:6515290. [PMID: 35078229 PMCID: PMC8896001 DOI: 10.1093/g3journal/jkac020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2021] [Accepted: 01/18/2022] [Indexed: 01/05/2023]
Abstract
Dimensionality reduction is a data transformation technique widely used in various fields of genomics research. The application of dimensionality reduction to genotype data is known to capture genetic similarity between individuals, and is used for visualization of genetic variation, identification of population structure as well as ancestry mapping. Among frequently used methods are principal component analysis, which is a linear transform that often misses more fine-scale structures, and neighbor-graph based methods which focus on local relationships rather than large-scale patterns. Deep learning models are a type of nonlinear machine learning method in which the features used in data transformation are decided by the model in a data-driven manner, rather than by the researcher, and have been shown to present a promising alternative to traditional statistical methods for various applications in omics research. In this study, we propose a deep learning model based on a convolutional autoencoder architecture for dimensionality reduction of genotype data. Using a highly diverse cohort of human samples, we demonstrate that the model can identify population clusters and provide richer visual information in comparison to principal component analysis, while preserving global geometry to a higher extent than t-SNE and UMAP, yielding results that are comparable to an alternative deep learning approach based on variational autoencoders. We also discuss the use of the methodology for more general characterization of genotype data, showing that it preserves spatial properties in the form of decay of linkage disequilibrium with distance along the genome and demonstrating its use as a genetic clustering method, comparing results to the ADMIXTURE software frequently used in population genetic studies.
Affiliation(s)
- Kristiina Ausmees
  - Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Carl Nettelblad
  - Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
14
|
Kim M, Harmanci AO, Bossuat JP, Carpov S, Cheon JH, Chillotti I, Cho W, Froelicher D, Gama N, Georgieva M, Hong S, Hubaux JP, Kim D, Lauter K, Ma Y, Ohno-Machado L, Sofia H, Son Y, Song Y, Troncoso-Pastoriza J, Jiang X. Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation. Cell Syst 2021; 12:1108-1120.e4. [PMID: 34464590 PMCID: PMC9898842 DOI: 10.1016/j.cels.2021.07.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/25/2020] [Revised: 04/21/2021] [Accepted: 07/29/2021] [Indexed: 02/06/2023]
Abstract
Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data are secure while they are in transit, at rest, and in analysis, and they can only be decrypted by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those of the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.
Affiliation(s)
- Miran Kim
  - Department of Computer Science and Engineering and Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Arif Ozgun Harmanci (corresponding author)
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Sergiu Carpov
  - Inpher, EPFL Innovation Park Bâtiment A, 3rd Fl, 1015 Lausanne, Switzerland
  - CEA, LIST, 91191 Gif-sur-Yvette Cedex, France
- Jung Hee Cheon
  - Department of Mathematical Sciences, Seoul National University, Seoul, 08826, Republic of Korea
  - Crypto Lab Inc., Seoul, 08826, Republic of Korea
- Wonhee Cho
  - Department of Mathematical Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Nicolas Gama
  - Inpher, EPFL Innovation Park Bâtiment A, 3rd Fl, 1015 Lausanne, Switzerland
- Mariya Georgieva
  - Inpher, EPFL Innovation Park Bâtiment A, 3rd Fl, 1015 Lausanne, Switzerland
- Seungwan Hong
  - Department of Mathematical Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Duhyeong Kim
  - Department of Mathematical Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Yiping Ma
  - University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California, San Diego, CA, 92093, USA
- Heidi Sofia
  - National Institutes of Health (NIH) - National Human Genome Research Institute, Bethesda, MD, 20892, USA
- Yongsoo Song
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Xiaoqian Jiang (corresponding author)
  - Center for Secure Artificial Intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
|
15
|
Zamanzadeh DJ, Petousis P, Davis TA, Nicholas SB, Norris KC, Tuttle KR, Bui AAT, Sarrafzadeh M. Autopopulus: A Novel Framework for Autoencoder Imputation on Large Clinical Datasets. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:2303-2309. [PMID: 34891747 PMCID: PMC8862635 DOI: 10.1109/embc46164.2021.9630135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The adoption of electronic health records (EHRs) has made patient data increasingly accessible, precipitating the development of various clinical decision support systems and data-driven models to help physicians. However, missing data are common in EHR-derived datasets, which can introduce significant uncertainty or even invalidate the use of a predictive model. Machine learning (ML)-based imputation methods have shown promise in various domains for estimating values and reducing uncertainty to the point that a predictive model can be employed. We introduce Autopopulus, a novel framework that enables the design and evaluation of various autoencoder architectures for efficient imputation on large datasets. Autopopulus implements existing autoencoder methods as well as a new technique that outputs a range of estimated values (rather than point estimates), and demonstrates a workflow that helps users make an informed decision on an appropriate imputation method. To further illustrate Autopopulus' utility, we use it to identify not only which imputation methods most accurately impute a large clinical dataset, but also which imputation methods enable downstream predictive models to achieve the best performance in predicting chronic kidney disease (CKD) progression.
|
16
|
Naito T, Suzuki K, Hirata J, Kamatani Y, Matsuda K, Toda T, Okada Y. A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes. Nat Commun 2021; 12:1639. [PMID: 33712626 PMCID: PMC7955122 DOI: 10.1038/s41467-021-21975-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/24/2020] [Accepted: 02/19/2021] [Indexed: 01/31/2023]
Abstract
Conventional human leukocyte antigen (HLA) imputation methods perform worse for infrequent alleles, which is one of the factors that reduce the reliability of trans-ethnic major histocompatibility complex (MHC) fine-mapping, owing to inter-ethnic heterogeneity in allele frequency spectra. We develop DEEP*HLA, a deep learning method for imputing HLA genotypes. Through validation using the Japanese and European HLA reference panels (n = 1,118 and 5,122, respectively), DEEP*HLA achieves the highest accuracies, with significant superiority for low-frequency and rare alleles. DEEP*HLA is less dependent on distance-dependent linkage disequilibrium decay of the target alleles and might capture complicated region-wide information. We apply DEEP*HLA to type 1 diabetes GWAS data from BioBank Japan (n = 62,387) and UK Biobank (n = 354,459), and successfully disentangle independently associated class I and II HLA variants with shared risk among diverse populations (the top signal at amino acid position 71 of HLA-DRβ1; P = 7.5 × 10^-120). Our study illustrates the value of deep learning in genotype imputation and trans-ethnic MHC fine-mapping.
Affiliation(s)
- Tatsuhiko Naito
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ken Suzuki
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jun Hirata
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Pharmaceutical Discovery Research Laboratories, Teijin Pharma Limited, Hino, Japan
- Yoichiro Kamatani
  - Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Koichi Matsuda
  - Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Tatsushi Toda
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
|
17
|
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Shen H, Gong P, Zhang C, Deng HW. A Review of Integrative Imputation for Multi-Omics Datasets. Front Genet 2020; 11:570255. [PMID: 33193667 PMCID: PMC7594632 DOI: 10.3389/fgene.2020.570255] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 06/07/2020] [Accepted: 09/16/2020] [Indexed: 01/05/2023]
Abstract
Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis for their ability to provide a more holistic view of biological processes, uncover the causal and functional mechanisms for complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented for some omics layers but not all. Since most statistical analyses cannot be applied directly to the incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques which make use of the correlations and shared information among multi-omics datasets are expected to outperform approaches that rely on single-omics information alone, resulting in more accurate results for the subsequent downstream analyses. In this review, we provide an overview of the currently available imputation methods for handling missing values in bioinformatics data with an emphasis on multi-omics imputation. In addition, we also provide a perspective on how deep learning methods might be developed for the integrative imputation of multi-omics datasets.
Affiliation(s)
- Meng Song
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
  - College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Hui Shen
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Ping Gong
  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Hong-Wen Deng
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
|
18
|
Innovating Computational Biology and Intelligent Medicine: ICIBM 2019 Special Issue. Genes (Basel) 2020; 11:437. [PMID: 32316483 PMCID: PMC7231250 DOI: 10.3390/genes11040437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2020] [Accepted: 04/15/2020] [Indexed: 12/03/2022]
Abstract
The International Association for Intelligent Biology and Medicine (IAIBM) is a nonprofit organization that promotes intelligent biology and medical science. It hosts the annual International Conference on Intelligent Biology and Medicine (ICIBM), which was established in 2012. ICIBM 2019 was held from 9 to 11 June 2019 in Columbus, Ohio, USA. Of the 105 original research manuscripts submitted to the conference, 18 were selected for publication in a Special Issue of Genes. The selected manuscripts cover a wide range of current topics in biomedical research, including cancer informatics, transcriptomics, computational algorithms, visualization and tools, deep learning, and microbiome research. In this editorial, we briefly introduce each of the manuscripts and discuss their contribution to the advance of science and technology.
|