1
Banes GL, Fountain ED, Karklus A, Fulton RS, Antonacci-Fulton L, Nelson JO. Nine out of ten samples were mistakenly switched by The Orang-utan Genome Consortium. Sci Data 2022; 9:485. [PMID: 35961988 PMCID: PMC9374732 DOI: 10.1038/s41597-022-01602-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Accepted: 06/24/2022] [Indexed: 12/20/2022] [Open Access]
Abstract
The Sumatran orang-utan (Pongo abelii) reference genome was first published in 2011, in conjunction with ten re-sequenced genomes from unrelated wild-caught individuals. Together, these published data have been utilized in almost all great ape genomic studies, plus in much broader comparative genomic research. Here, we report that the original sequencing Consortium inadvertently switched nine of the ten samples and/or resulting re-sequenced genomes, erroneously attributing eight of these to the wrong source individuals. Among them is a genome from the recently identified Tapanuli (P. tapanuliensis) species: thus, this genome was sequenced and published a full six years prior to the species’ description. Sex was wrongly assigned to five known individuals; the numbers in one sample identifier were swapped; and the identifier for another sample most closely resembles that of a sample from another individual entirely. These errors have been reproduced in countless subsequent manuscripts, with noted implications for studies reliant on data from known individuals.
Affiliation(s)
- Graham L Banes
  - Wisconsin National Primate Research Center, University of Wisconsin-Madison, 1220 Capitol Court, Madison, WI, 53715, USA; School of Veterinary Medicine, University of Wisconsin-Madison, 2015 Linden Drive, Madison, WI, 53706, USA; The Orang-utan Conservation Genetics Project, Madison, WI, 53715, USA
- Emily D Fountain
  - Wisconsin National Primate Research Center, University of Wisconsin-Madison, 1220 Capitol Court, Madison, WI, 53715, USA; The Orang-utan Conservation Genetics Project, Madison, WI, 53715, USA
- Alyssa Karklus
  - School of Veterinary Medicine, University of Wisconsin-Madison, 2015 Linden Drive, Madison, WI, 53706, USA; The Orang-utan Conservation Genetics Project, Madison, WI, 53715, USA
- Robert S Fulton
  - McDonnell Genome Institute at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, MO, 63108, USA
- Lucinda Antonacci-Fulton
  - McDonnell Genome Institute at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, MO, 63108, USA
- Joanne O Nelson
  - McDonnell Genome Institute at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, MO, 63108, USA
2
Fountain ED, Zhou LC, Karklus A, Liu QX, Meyers J, Fontanilla IKC, Rafael EF, Yu JY, Zhang Q, Zhu XL, Pei EL, Yuan YH, Banes GL. Cross-Species Application of Illumina iScan Microarrays for Cost-Effective, High-Throughput SNP Discovery. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.629252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] [Open Access]
Abstract
Microarrays can be a cost-effective alternative to high-throughput sequencing for discovering novel single-nucleotide polymorphisms (SNPs). Illumina’s iScan platform dominates the market, but their commercial microarray products are designed for model organisms. Further, the platform outputs data in a proprietary format. This cannot be easily converted to human-readable genotypes or be merged with pre-existing data. To address this, we present and validate a novel pipeline to facilitate data analysis from cross-species application of Illumina microarrays. This facilitates the generation of a compatible VCF from iScan data and the merging of this with a second VCF comprising genotypes derived from other samples and sources. Our pipeline includes a custom script, iScanVCFMerge (presented as a Python package), which we validate using iScan data from three great ape genera. We conclude that cross-species application of microarrays can be a rapid, cost-effective approach for SNP discovery in non-model organisms. Our pipeline surmounts the common challenges of integrating iScan genotypes with pre-existing data.
3
Banes GL, Fountain ED, Karklus A, Huang HM, Jang-Liaw NH, Burgess DL, Wendt J, Moehlenkamp C, Mayhew GF. Genomic targets for high-resolution inference of kinship, ancestry and disease susceptibility in orang-utans (genus: Pongo). BMC Genomics 2020; 21:873. [PMID: 33287706 PMCID: PMC7720378 DOI: 10.1186/s12864-020-07278-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/10/2020] [Accepted: 11/24/2020] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Orang-utans comprise three critically endangered species endemic to the islands of Borneo and Sumatra. Though whole-genome sequencing has recently accelerated our understanding of their evolutionary history, the costs of implementing routine genome screening and diagnostics remain prohibitive. Capitalizing on a tri-fold locus discovery approach, combining data from published whole-genome sequences, novel whole-exome sequencing, and microarray-derived genotype data, we aimed to develop a highly informative gene-focused panel of targets that can be used to address a broad range of research questions.
RESULTS: We identified and present genomic co-ordinates for 175,186 SNPs and 2315 Y-chromosomal targets, plus 185 genes either known or presumed to be pathogenic in cardiovascular (N = 109) or respiratory (N = 43) diseases in humans - the primary and secondary causes of captive orang-utan mortality - or a majority of other human diseases (N = 33). As proof of concept, we designed and synthesized 'SeqCap' hybrid capture probes for these targets, demonstrating cost-effective target enrichment and reduced-representation sequencing.
CONCLUSIONS: Our targets are of broad utility in studies of orang-utan ancestry, admixture and disease susceptibility and aetiology, and thus are of value in addressing questions key to the survival of these species. To facilitate comparative analyses, these targets could now be standardized for future orang-utan population genomic studies. The targets are broadly compatible with commercial target enrichment platforms and can be utilized as published here to synthesize applicable probes.
Affiliation(s)
- Graham L Banes
  - Wisconsin National Primate Research Center, University of Wisconsin-Madison, 1220 Capitol Court, Madison, WI, 53715, USA
- Emily D Fountain
  - Wisconsin National Primate Research Center, University of Wisconsin-Madison, 1220 Capitol Court, Madison, WI, 53715, USA
- Alyssa Karklus
  - School of Veterinary Medicine, University of Wisconsin-Madison, 2015 Linden Drive, Madison, WI, 53706, USA
- Hao-Ming Huang
  - Conservation Genetics Laboratory, Conservation and Research Center, Taipei Zoo, No. 30, Section 2, Xinguang Road, Wenshan District, Taipei City, Taiwan, 11656
- Nian-Hong Jang-Liaw
  - Conservation Genetics Laboratory, Conservation and Research Center, Taipei Zoo, No. 30, Section 2, Xinguang Road, Wenshan District, Taipei City, Taiwan, 11656
- Daniel L Burgess
  - Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA; Polymer Forge, Inc., 504 S Rosa Rd Ste 200, Madison, WI, 53719, USA
- Jennifer Wendt
  - Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA; Promega Corporation, 2800 Woods Hollow Rd, Fitchburg, WI, 53711, USA
- Cynthia Moehlenkamp
  - Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA; Exact Sciences, 441 Charmany Dr, Madison, WI, 53719, USA
- George F Mayhew
  - Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA