1. Das S, Biswas NK, Basu A. Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data. Nucleic Acids Res 2023; 51:e75. [PMID: 37378434 PMCID: PMC10415152 DOI: 10.1093/nar/gkad539] [Received: 12/09/2022] [Revised: 05/16/2023] [Accepted: 06/27/2023] [Indexed: 06/29/2023]
Abstract
High-throughput sequencing (HTS) has revolutionized science by enabling rapid detection of genomic variants at base-pair resolution. Consequently, it poses the challenging problem of identifying technical artifacts, i.e. hidden non-random error patterns. Understanding the properties of sequencing artifacts holds the key to separating true variants from false positives. Here, we develop Mapinsights, a toolkit that performs quality control (QC) analysis of sequence alignment files and is capable of detecting outliers based on sequencing artifacts of HTS data at a deeper resolution than existing methods. Mapinsights performs a cluster analysis based on novel and existing QC features derived from the sequence alignment for outlier detection. We applied Mapinsights to community standard open-source datasets and identified various quality issues, including technical errors related to sequencing cycles, sequencing chemistry and sequencing libraries, across various orthogonal sequencing platforms. Mapinsights also enables identification of anomalies related to sequencing depth. A logistic regression-based model built on the features of Mapinsights shows high accuracy in detecting 'low-confidence' variant sites. Quantitative estimates and probabilistic arguments provided by Mapinsights can be utilized in identifying errors, bias and outlier samples, and also aid in improving the authenticity of variant calls.
Affiliation(s)
- Subrata Das
  - National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
- Nidhan K Biswas
  - National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
- Analabha Basu
  - National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
2. Kim HL, Li T, Kalsi N, Nguyen HTT, Shaw TA, Ang KC, Cheng KC, Ratan A, Peltier WR, Samanta D, Pratapneni M, Schuster SC, Horton BP. Prehistoric human migration between Sundaland and South Asia was driven by sea-level rise. Commun Biol 2023; 6:150. [PMID: 36739308 DOI: 10.1038/s42003-023-04510-0] [Received: 02/05/2022] [Accepted: 01/20/2023] [Indexed: 02/06/2023]
Abstract
Rapid sea-level rise between the Last Glacial Maximum (LGM) and the mid-Holocene transformed the Southeast Asian coastal landscape, but the impact on human demography remains unclear. Here, we create a paleogeographic map, focusing on sea-level changes during the period spanning the LGM to the present-day and infer the human population history in Southeast and South Asia using 763 high-coverage whole-genome sequencing datasets from 59 ethnic groups. We show that sea-level rise, in particular meltwater pulses 1A (MWP1A, ~14,500-14,000 years ago) and 1B (MWP1B, ~11,500-11,000 years ago), reduced land area by over 50% since the LGM, resulting in segregation of local human populations. Following periods of rapid sea-level rises, population pressure drove the migration of Malaysian Negritos into South Asia. Integrated paleogeographic and population genomic analysis demonstrates the earliest documented instance of forced human migration driven by sea-level rise.
3. Abondio P, Bruno F, Bruni AC, Luiselli D. Rare Amyloid Precursor Protein Point Mutations Recapitulate Worldwide Migration and Admixture in Healthy Individuals: Implications for the Study of Neurodegeneration. Int J Mol Sci 2022; 23:15871. [PMID: 36555510 PMCID: PMC9781461 DOI: 10.3390/ijms232415871] [Received: 11/07/2022] [Revised: 11/30/2022] [Accepted: 12/11/2022] [Indexed: 12/23/2022]
Abstract
Genetic discoveries related to Alzheimer's disease and other dementias have been performed using either large cohorts of affected subjects or multiple individuals from the same pedigree, therefore disregarding mutations in the context of healthy groups. Moreover, a large portion of studies so far have been performed on individuals of European ancestry, with a remarkable lack of epidemiological and genomic data from underrepresented populations. In the present study, 70 single-point mutations on the APP gene in a publicly available genetic dataset that included 2504 healthy individuals from 26 populations were scanned, and their distribution was analyzed. Furthermore, after gametic phase reconstruction, a pairwise comparison of the segments surrounding the mutations was performed to reveal patterns of haplotype sharing that could point to specific cross-population and cross-ancestry admixture events. Eight mutations were detected in the worldwide dataset, with several of them being specific for a single individual, population, or macroarea. Patterns of segment sharing reflected recent historical events of migration and admixture possibly linked to colonization campaigns. These observations reveal the population dynamics of the considered APP mutations in worldwide human groups and support the development of ancestry-informed screening practices for the improvement of precision and personalized approaches to neurodegeneration and dementia.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
  - Laboratory of Molecular Anthropology and Center for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Via Selmi 3, 40126 Bologna, Italy
- Francesco Bruno
  - Regional Neurogenetic Center (CRN), Department of Primary Care, ASP Catanzaro, 88046 Lamezia Terme, Italy
  - Association for Neurogenetic Research (ARN), 88046 Lamezia Terme, Italy
  - Correspondence:
- Amalia Cecilia Bruni
  - Regional Neurogenetic Center (CRN), Department of Primary Care, ASP Catanzaro, 88046 Lamezia Terme, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
4. Tagore D, Majumder PP, Chatterjee A, Basu A. Multiple migrations from East Asia led to linguistic transformation in NorthEast India and mainland Southeast Asia. Front Genet 2022; 13:1023870. [PMID: 36303544 PMCID: PMC9592996 DOI: 10.3389/fgene.2022.1023870] [Received: 08/20/2022] [Accepted: 09/27/2022] [Indexed: 11/13/2022]
Abstract
NorthEast India, with its unique geographic location in the midst of the Himalayas and Bay of Bengal, has served as a passage for the movement of modern humans across the Indian subcontinent and East/Southeast Asia. In this study we look into the population genetics of a unique population called the Khasi, speaking a language (also known as the Khasi language) belonging to the Austroasiatic language family and residing amidst the Tibeto-Burman speakers as an isolated population. The Khasi language belongs to one of the three major broad classifications or phyla of the Austroasiatic language and the speakers of the three sub-groups are separated from each other by large geographical distances. The Khasi speakers are separated from their nearest Austroasiatic language-speaking sub-groups: the “Mundari” sub-family from East and peninsular India and the “Mon-Khmers” in Mainland Southeast Asia. We found the Khasi population to be genetically distinct from other Austroasiatic speakers, i.e. Mundaris and Mon-Khmers, but relatively similar to the geographically proximal Tibeto Burmans. The possible reasons for this genetic-linguistic discordance lie in the admixture history of different migration events that originated from East Asia and proceeded possibly towards Southeast Asia. We found at least two distinct migration events from East Asia. While the ancestors of today’s Tibeto-Burman speakers were affected by both, the ancestors of Khasis were insulated from the second migration event. Correlating the linguistic similarity of Tibeto-Burman and Sino-Tibetan languages of today’s East Asians, we infer that the second wave of migration resulted in a linguistic transition while the Khasis could preserve their linguistic identity.
Affiliation(s)
- Partha P. Majumder
  - National Institute of Biomedical Genomics, Kalyani, India
  - Indian Statistical Institute, Kolkata, India
- Anupam Chatterjee
  - Department of Biotechnology, North-Eastern Hill University, Shillong, India
  - School of Biosciences, Royal Global University, Guwahati, India
- Analabha Basu
  - National Institute of Biomedical Genomics, Kalyani, India
  - *Correspondence: Analabha Basu,
5. Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:6535682. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
6. Hoh BP, Deng L, Xu S. The Peopling and Migration History of the Natives in Peninsular Malaysia and Borneo: A Glimpse on the Studies Over the Past 100 years. Front Genet 2022; 13:767018. [PMID: 35154269 PMCID: PMC8829068 DOI: 10.3389/fgene.2022.767018] [Received: 08/30/2021] [Accepted: 01/07/2022] [Indexed: 12/05/2022]
Abstract
Southeast Asia (SEA) has one of the longest records of modern human habitation outside Africa. Located at the crossroads of mainland and island SEA, Peninsular Malaysia is an important piece of the puzzle of the peopling and migration history of Asia, a question of interest to many anthropologists, archeologists, and population geneticists. This review revisits our understanding of the population genetics of the natives of Peninsular Malaysia and Borneo over the past century, following the chronology of technological advancement: 1) Anthropological and Physical Characterization; 2) Blood Group Markers; 3) Protein Markers; 4) Mitochondrial and Autosomal DNA Markers; and 5) Whole Genome Analysis. We then identify gaps in these studies. In the later part of the review, we elaborate on the challenges of studying the population genetics of natives. Finally, we conclude by reiterating the importance of unveiling the migration history and genetic diversity of the indigenous populations as a stepping stone towards comprehending disease evolution and etiology.
Affiliation(s)
- Boon-Peng Hoh
  - Faculty of Medicine and Health Sciences, UCSI University, UCSI Hospital, Port Dickson, Malaysia
  - *Correspondence: Boon-Peng Hoh,
- Lian Deng
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Shuhua Xu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
  - Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, China
  - Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
7. Tagore D, Aghakhanian F, Naidu R, Phipps ME, Basu A. Publisher Correction to: Insights into the demographic history of Asia from common ancestry and admixture in the genomic landscape of present-day Austroasiatic speakers. BMC Biol 2021; 19:238. [PMID: 34736468 PMCID: PMC8569990 DOI: 10.1186/s12915-021-01174-2] [Indexed: 11/10/2022]
Affiliation(s)
- Debashree Tagore
  - National Institute of Biomedical Genomics, Kalyani, 741251, India
- Farhang Aghakhanian
  - Oklahoma Medical Research Foundation, Genes and Human Disease Program, 825 NE 13th Street, Oklahoma City, OK, 73104, USA
  - Genomics Facility, School of Science, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
- Rakesh Naidu
  - Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Maude E Phipps
  - Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Analabha Basu
  - National Institute of Biomedical Genomics, Kalyani, 741251, India