1. Araghi S, Nguyen T. A Hybrid Supervised Approach to Human Population Identification Using Genomics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:443-454. [PMID: 31150342 DOI: 10.1109/tcbb.2019.2919501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are one type of genetic variation; each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research has demonstrated that SNPs can be used to identify the source population of an individual. In addition, variations in DNA sequences influence human diseases, so SNP studies are helpful for personalized medicine and treatment. In the literature, unsupervised clustering methods, especially principal component analysis (PCA), have been popular for studying population structure. In this study, we investigate supervised approaches, particularly the LASSO multinomial regression classification method, for identifying an individual's genetic population of origin. We then introduce PCA-LASSO, an extension of the LASSO method that benefits from the advantageous characteristics of both PCA and LASSO regression. Experimental results on the 1000 Genomes Project dataset show that PCA-LASSO predicts an individual's population of origin with high accuracy.
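The abstract does not spell out how PCA and LASSO are combined. One plausible reading, offered here only as a minimal sketch and not as the authors' method, is to project genotypes onto leading principal components and then fit an L1-penalized multinomial regression on those components. The genotypes below are synthetic, and the number of components, the penalty `lam`, and the step size `lr` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 0/1/2 genotypes (NOT real data): each hypothetical population
# gets its own allele frequency at every SNP.
n_pops, n_per_pop, n_snps = 3, 60, 300
freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_snps))
X = np.vstack([rng.binomial(2, freqs[k], size=(n_per_pop, n_snps))
               for k in range(n_pops)]).astype(float)
y = np.repeat(np.arange(n_pops), n_per_pop)

# Random 70/30 train/test split.
idx = rng.permutation(len(y))
train, test = idx[:126], idx[126:]

# Step 1 (assumed): PCA fitted on the training genotypes via SVD.
n_components = 10
mean = X[train].mean(axis=0)
_, s, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
components = Vt[:n_components]
scale = s[:n_components] / np.sqrt(len(train) - 1)  # std of each training score

def project(A):
    """Project genotypes onto the components and rescale to unit variance."""
    return (A - mean) @ components.T / scale

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_l1_multinomial(Z, labels, n_classes, lam=0.01, lr=0.5, n_iter=500):
    """Multinomial logistic regression with an L1 penalty on the weights,
    fitted by proximal gradient descent (gradient step + soft-thresholding)."""
    n, d = Z.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]            # one-hot targets
    for _ in range(n_iter):
        P = softmax(Z @ W + b)
        W = W - lr * (Z.T @ (P - Y) / n)     # gradient step on the log-loss
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)  # L1 proximal step
        b = b - lr * (P - Y).mean(axis=0)    # intercept is left unpenalized
    return W, b

# Step 2 (assumed): LASSO multinomial regression on the component scores.
W, b = fit_l1_multinomial(project(X[train]), y[train], n_pops)
pred = np.argmax(project(X[test]) @ W + b, axis=1)
accuracy = float(np.mean(pred == y[test]))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
print(f"non-zero weights: {int(np.count_nonzero(W))} of {W.size}")
```

Raising `lam` drives more component weights to exactly zero, which is the sparsity property that motivates pairing LASSO with PCA in this sketch. Accuracy on the synthetic data says nothing about performance on the 1000 Genomes Project data.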
2. Vilor-Tejedor N, Alemany S, Cáceres A, Bustamante M, Mortamais M, Pujol J, Sunyer J, González JR. Sparse multiple factor analysis to integrate genetic data, neuroimaging features, and attention-deficit/hyperactivity disorder domains. Int J Methods Psychiatr Res 2018; 27:e1738. [PMID: 30105890 PMCID: PMC6877273 DOI: 10.1002/mpr.1738] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/05/2017] [Revised: 05/17/2018] [Accepted: 06/26/2018] [Indexed: 11/09/2022]
Abstract
OBJECTIVES: We proposed a multivariate cross-sectional framework that combines a variable selection method with multiple factor analysis (MFA) to identify complex, meaningful biological signals related to attention-deficit/hyperactivity disorder (ADHD) symptoms and the hyperactivity/inattention domains. METHODS: The study included 135 children from the general population with genomic and neuroimaging data. ADHD symptoms were assessed using a questionnaire based on ADHD-DSM-IV criteria. In all analyses, the raw sum scores of the hyperactivity and inattention domains and of total ADHD were used. The analytical framework comprised two steps. First, a zero-inflated negative binomial linear model was fitted via penalized maximum likelihood (LASSO-ZINB). Second, the most predictive features obtained with LASSO-ZINB were used as input for the MFA. RESULTS: We observed significant relationships of ADHD symptoms and the hyperactivity and inattention domains with white matter, gray matter regions, and the cerebellum, as well as with loci on chromosome 1. CONCLUSIONS: Multivariate methods can advance the neurobiological characterization of complex diseases, improving statistical power relative to univariate methods and allowing the identification of meaningful biological signals in imaging genetics studies.
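The second step of the framework, MFA on the selected features, can be illustrated with a short sketch. It assumes the LASSO-ZINB selection has already been done (that step is omitted here) and uses the standard MFA construction: standardize each block of variables, divide each block by its first singular value so that no block dominates, then run a global PCA on the concatenated blocks. The block names, block sizes, and random data are hypothetical; only the sample size of 135 echoes the abstract.

```python
import numpy as np

def mfa(blocks, n_components=2):
    """Multiple factor analysis via SVD.

    Each block (samples x variables) is column-standardized and divided by
    its first singular value; a global PCA is then run on the concatenation.
    Returns the sample scores and the global singular values.
    """
    weighted = []
    for B in blocks:
        Bc = (B - B.mean(axis=0)) / B.std(axis=0)        # standardize columns
        first_sv = np.linalg.svd(Bc, compute_uv=False)[0]
        weighted.append(Bc / first_sv)                   # block weight 1 / first singular value
    G = np.hstack(weighted)
    U, s, _ = np.linalg.svd(G, full_matrices=False)      # global PCA
    return U[:, :n_components] * s[:n_components], s

# Hypothetical pre-selected features for 135 synthetic samples.
rng = np.random.default_rng(1)
n = 135
genetic = rng.normal(size=(n, 8))     # stand-in for selected genetic features
imaging = rng.normal(size=(n, 6))     # stand-in for selected neuroimaging features
symptoms = rng.normal(size=(n, 3))    # stand-in for ADHD domain scores

scores, singular_values = mfa([genetic, imaging, symptoms])
print(scores.shape)
```

Dividing by the first singular value is what distinguishes MFA from a plain PCA on the concatenated table: a block with many variables cannot dominate the first global axis simply because of its size.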
Affiliation(s)
- Natàlia Vilor-Tejedor: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain; Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Barcelonabeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Silvia Alemany: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Alejandro Cáceres: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Mariona Bustamante: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain; Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Marion Mortamais: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Jesús Pujol: MRI Research Unit, Hospital del Mar, and Centro de Investigación Biomédica en Red de Salud Mental, CIBERSAM G21, Barcelona, Spain
- Jordi Sunyer: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain; IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Juan R González: ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain