1
He C, Washburn JD, Schleif N, Hao Y, Kaeppler H, Kaeppler SM, Zhang Z, Yang J, Liu S. Trait association and prediction through integrative k-mer analysis. The Plant Journal: For Cell and Molecular Biology 2024; 120:833-850. [PMID: 39259496 DOI: 10.1111/tpj.17012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 08/14/2024] [Accepted: 08/22/2024] [Indexed: 09/13/2024]
Abstract
Genome-wide association study (GWAS) with single nucleotide polymorphisms (SNPs) has been widely used to explore genetic controls of phenotypic traits. Alternatively, GWAS can use counts of substrings of length k from longer sequencing reads, k-mers, as genotyping data. Using maize cob and kernel color traits, we demonstrated that k-mer GWAS can effectively identify associated k-mers. Co-expression analysis of kernel color k-mers and genes directly found k-mers from known causal genes. Analyzing complex traits of kernel oil and leaf angle resulted in k-mers from both known and candidate genes. A gene encoding a MADS transcription factor was functionally validated by showing that ectopic expression of the gene led to less upright leaves. Evolution analysis revealed most k-mers positively correlated with kernel oil were strongly selected against in maize populations, while most k-mers for upright leaf angle were positively selected. In addition, genomic prediction of kernel oil, leaf angle, and flowering time using k-mer data resulted in a similarly high prediction accuracy to the standard SNP-based method. Collectively, we showed k-mer GWAS is a powerful approach for identifying trait-associated genetic elements. Further, our results demonstrated the bridging role of k-mers for data integration and functional gene discovery.
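The idea of using k-mer counts as genotyping data can be illustrated with a minimal sketch. It assumes short reads are available as plain strings per sample and collapses each k-mer with its reverse complement (a common convention, not something the abstract states); the function names and toy reads are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads of one sample, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def count_matrix(samples, k):
    """Build a {k-mer: [count per sample]} table, usable as a genotype matrix."""
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    names = sorted(samples)
    return names, {kmer: [per_sample[n][kmer] for n in names] for kmer in all_kmers}


# Toy reads invented for illustration only.
names, matrix = count_matrix({"line_A": ["ACGTA"], "line_B": ["GTAGG"]}, k=3)
```

Each row of the resulting table could then be tested for association with a trait in the same way a SNP genotype would be; normalization for sequencing depth is omitted here.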
Affiliation(s)
- Cheng He
  - Department of Plant Pathology, Kansas State University, Manhattan, Kansas, 66506, USA
- Jacob D Washburn
  - Plant Genetics Research Unit, USDA-ARS, Columbia, Missouri, 65211, USA
- Nathaniel Schleif
  - Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Yangfan Hao
  - Department of Plant Pathology, Kansas State University, Manhattan, Kansas, 66506, USA
- Heidi Kaeppler
  - Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Shawn M Kaeppler
  - Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Zhiwu Zhang
  - Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, 99164, USA
- Jinliang Yang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, Nebraska, 68583-0915, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, Nebraska, 68583, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, Manhattan, Kansas, 66506, USA

2
Dong X, Bai Y, Liao Z, Gritsch D, Liu X, Wang T, Borges-Monroy R, Ehrlich A, Serrano GE, Feany MB, Beach TG, Scherzer CR. Circular RNAs in the human brain are tailored to neuron identity and neuropsychiatric disease. Nat Commun 2023; 14:5327. [PMID: 37723137 PMCID: PMC10507039 DOI: 10.1038/s41467-023-40348-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 11/08/2022] [Accepted: 07/20/2023] [Indexed: 09/20/2023] [Open Access]
Abstract
Little is known about circular RNAs (circRNAs) in specific brain cells and human neuropsychiatric disease. Here, we systematically identify over 11,039 circRNAs expressed in vulnerable dopamine and pyramidal neurons laser-captured from 190 human brains and non-neuronal cells using ultra-deep, total RNA sequencing. 1526 and 3308 circRNAs are custom-tailored to the cell identity of dopamine and pyramidal neurons and enriched in synapse pathways. 29% of Parkinson's and 12% of Alzheimer's disease-associated genes produced validated circRNAs. circDNAJC6, which is transcribed from a juvenile-onset Parkinson's gene, is already dysregulated during prodromal, onset stages of common Parkinson's disease neuropathology. Globally, addiction-associated genes preferentially produce circRNAs in dopamine neurons, autism-associated genes in pyramidal neurons, and cancers in non-neuronal cells. This study shows that circular RNAs in the human brain are tailored to neuron identity and implicates circRNA-regulated synaptic specialization in neuropsychiatric diseases.
Affiliation(s)
- Xianjun Dong
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - Genomics and Bioinformatics Hub, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Yunfei Bai
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - State Key Lab of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Zhixiang Liao
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- David Gritsch
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Xiaoli Liu
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - Department of Neurology, Zhejiang Hospital, Zhejiang, China
- Tao Wang
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, China
- Rebeca Borges-Monroy
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Alyssa Ehrlich
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Mel B Feany
  - Department of Pathology, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA
- Clemens R Scherzer
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
  - Program in Neuroscience, Harvard Medical School, Boston, MA, USA

3
Dong X, Bai Y, Liao Z, Gritsch D, Liu X, Wang T, Borges-Monroy R, Ehrlich A, Serrano GE, Feany MB, Beach TG, Scherzer CR. Circular RNAs in the human brain are tailored to neuron identity and neuropsychiatric disease. bioRxiv: The Preprint Server for Biology 2023:2023.04.01.535194. [PMID: 37066229 PMCID: PMC10103951 DOI: 10.1101/2023.04.01.535194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Little is known about circular RNAs (circRNAs) in specific brain cells and human neuropsychiatric disease. Here, we systematically identified over 11,039 circRNAs expressed in vulnerable dopamine and pyramidal neurons laser-captured from 190 human brains and non-neuronal cells using ultra-deep, total RNA sequencing. 1,526 and 3,308 circRNAs were custom-tailored to the cell identity of dopamine and pyramidal neurons and enriched in synapse pathways. 88% of Parkinson's and 80% of Alzheimer's disease-associated genes produced circRNAs. circDNAJC6, produced from a juvenile-onset Parkinson's gene, was already dysregulated during prodromal, onset stages of common Parkinson's disease neuropathology. Globally, addiction-associated genes preferentially produced circRNAs in dopamine neurons, autism-associated genes in pyramidal neurons, and cancers in non-neuronal cells. This study shows that circular RNAs in the human brain are tailored to neuron identity and implicates circRNA-regulated synaptic specialization in neuropsychiatric diseases.
Affiliation(s)
- Xianjun Dong
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - Genomics and Bioinformatics Hub, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- Yunfei Bai
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Zhixiang Liao
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
- David Gritsch
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
- Xiaoli Liu
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - Department of Neurology, Zhejiang Hospital, Zhejiang, China
- Tao Wang
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Rebeca Borges-Monroy
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
- Alyssa Ehrlich
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - Department of Psychiatry, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Mel B. Feany
  - Department of Pathology, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Clemens R. Scherzer
  - APDA Center for Advanced Parkinson Disease Research, Harvard Medical School, Brigham & Women’s Hospital, Boston, MA, USA
  - Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Program in Neuroscience, Harvard Medical School, Boston, MA, USA

4
Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023; 233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Aquaculture has grown rapidly during the last two decades and offers huge potential to provide nutritional as well as livelihood security. Genomic research has contributed significantly toward the development of beneficial technologies for aquaculture. Existing high-throughput technologies, such as next-generation technologies, generate enormous volumes of data that require extensive analysis using appropriate tools. Bioinformatics is a rapidly evolving science that integrates gene-based information and computational technology to produce new knowledge for the benefit of aquaculture. Bioinformatics provides new opportunities as well as challenges for information and data processing in new-generation aquaculture. Rapid technical advancements have opened up a world of possibilities for using current genomics to improve aquaculture performance. Understanding the genes that govern economically relevant characteristics requires a significant amount of additional research. The data sources include next-generation DNA sequencing, protein sequencing, RNA sequencing gene expression profiles, metabolic pathways, molecular markers, and so on. Appropriate bioinformatics tools are developed to mine the biologically relevant and commercially useful results. The purpose of this scoping review is to present various arms of diverse bioinformatics tools with special emphasis on practical translation to the aquaculture industry.
Affiliation(s)
- Mohd Ashraf Rather
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Deepak Agarwal
  - Institute of Fisheries Post Graduation Studies OMR Campus, Vaniyanchavadi, Chennai, India
- Irfan Ahamd Khan
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Sujit Kumar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Adnan Amin
  - Postgraduate Institute of Fisheries Education and Research Kamdhenu University, Gandhinagar-India University of Kurasthra, India
  - Department of Aquatic Environmental Management, Faculty of Fisheries Rangil- Ganderbel -SKUAST-K, India
- Jitendra Kumar Sundaray
  - ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, Odisha 751002, India
- Tahiya Qadri
  - Division of Food Science and Technology, SKUAST-K, Shalimar, India

5
James SA, Ong HS, Hari R, Khan AM. A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences. BMC Genomics 2021; 22:700. [PMID: 34583643 PMCID: PMC8477458 DOI: 10.1186/s12864-021-07657-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Accepted: 04/28/2021] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the number of records, given the relatively small genome size and their important role as infectious and symbiotic agents. Humans are host to numerous pathogenic diseases, such as those caused by viruses, many of which are responsible for high mortality and morbidity. The interaction between pathogens and humans over evolutionary history has resulted in sharing of sequences, with important biological and evolutionary implications.
Results: This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~0.12%) and human (~0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~11%, ~7%, and ~3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights into Flaviviridae-human interactions.
Conclusion: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as those of viral, bacterial or parasitic origin.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07657-4.
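The core operation behind a share-ome of this kind, finding nonamers present with 100% identity in both a pathogen and a host protein set and mapping them back to their source proteins, can be sketched as a set intersection. This is a minimal illustration under stated assumptions, not the authors' workflow: the function names and toy sequences are hypothetical, and redundancy removal and database handling are omitted.

```python
def kmers(sequence, k=9):
    """Return the set of all overlapping k-mers in one protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def index_kmers(proteins, k=9):
    """Map each k-mer to the ids of the proteins that contain it."""
    index = {}
    for protein_id, sequence in proteins.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(protein_id)
    return index


def shared_kmers(pathogen_proteins, host_proteins, k=9):
    """Return {k-mer: (pathogen ids, host ids)} for k-mers found in both sets."""
    pathogen_index = index_kmers(pathogen_proteins, k)
    host_index = index_kmers(host_proteins, k)
    return {
        kmer: (pathogen_index[kmer], host_index[kmer])
        for kmer in pathogen_index.keys() & host_index.keys()
    }


# Toy sequences invented for illustration only.
pathogen = {"virus_p1": "MKTLLVAGGSAQW", "virus_p2": "PPPPPPPPPPP"}
host = {"human_p1": "AAKTLLVAGGSYY"}
shared = shared_kmers(pathogen, host)
```

Keeping the protein ids alongside each shared nonamer is what allows a small set of shared peptides to be mapped back to many source sequences, as the abstract reports.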
Affiliation(s)
- Stephen Among James
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
  - Department of Biochemistry, Faculty of Science, Kaduna State University, Kaduna, 800211, Nigeria
- Hui San Ong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
- Ranjeev Hari
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
- Asif M Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, 34820, Turkey

6
He C, Lin G, Wei H, Tang H, White FF, Valent B, Liu S. Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences. NAR Genom Bioinform 2020; 2:lqaa075. [PMID: 33575622 PMCID: PMC7671381 DOI: 10.1093/nargab/lqaa075] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/18/2020] [Revised: 08/02/2020] [Accepted: 09/01/2020] [Indexed: 12/25/2022] [Open Access]
Abstract
Genome sequences provide genomic maps with a single-base resolution for exploring genetic contents. Sequencing technologies, particularly long reads, have revolutionized genome assemblies for producing highly continuous genome sequences. However, current long-read sequencing technologies generate inaccurate reads that contain many errors. Some errors are retained in assembled sequences, which are typically not completely corrected by using either long reads or more accurate short reads. The issue commonly exists, but few tools are dedicated to computing error rates or determining error locations. In this study, we developed a novel approach, referred to as k-mer abundance difference (KAD), to compare the inferred copy number of each k-mer indicated by short reads and the observed copy number in the assembly. Simple KAD metrics enable the classification of k-mers into categories that reflect the quality of the assembly. Specifically, the KAD method can be used to identify base errors and estimate the overall error rate. In addition, sequence insertion and deletion as well as sequence redundancy can also be detected. Collectively, KAD is valuable for quality evaluation of genome assemblies and, potentially, provides a diagnostic tool to aid in precise error correction. KAD software has been developed to facilitate public use.
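The comparison described above, between the copy number a k-mer appears to have in the short reads and its observed copy number in the assembly, can be illustrated with a simple log-ratio score. The abstract does not give the KAD formula, so the expression below is an assumption chosen for this sketch (it is zero when reads and assembly agree) and may differ from the published definition; the function names, the `single_copy_depth` parameter, and the tolerance are hypothetical.

```python
import math


def kad_like_score(read_count, single_copy_depth, assembly_copies):
    """Log2 ratio of read-supported abundance to assembly copy number.

    read_count: occurrences of the k-mer in the short reads.
    single_copy_depth: read count expected for a single-copy k-mer
        (for example the mode of the read k-mer count distribution).
    assembly_copies: occurrences of the k-mer in the assembly.
    The +depth and +1 terms keep the score finite when either count is zero.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    return math.log2(
        (read_count + single_copy_depth)
        / (single_copy_depth * (assembly_copies + 1))
    )


def classify(score, tolerance=0.5):
    """Bucket a score; the tolerance is an arbitrary illustrative threshold."""
    if abs(score) <= tolerance:
        return "consistent"
    if score > 0:
        return "under-represented in assembly"
    return "over-represented in assembly"


# A k-mer in the assembly with no read support scores -1, as a base error might.
error_like = kad_like_score(read_count=0, single_copy_depth=30, assembly_copies=1)
```

Under this scoring, a k-mer seen at single-copy depth in the reads but absent from the assembly scores +1, which is the kind of signal a missing sequence would produce.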
Affiliation(s)
- Cheng He
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Guifang Lin
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Hairong Wei
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Haibao Tang
  - Center for Genomics and Biotechnology and Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fujian 350002, China
- Frank F White
  - Department of Plant Pathology, University of Florida, Gainesville, FL 32611-0680, USA
- Barbara Valent
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA

7
Beier S, Ulpinnis C, Schwalbe M, Münch T, Hoffie R, Koeppel I, Hertig C, Budhagatapalli N, Hiekel S, Pathi KM, Hensel G, Grosse M, Chamas S, Gerasimova S, Kumlehn J, Scholz U, Schmutzer T. Kmasker plants - a tool for assessing complex sequence space in plant species. The Plant Journal: For Cell and Molecular Biology 2020; 102:631-642. [PMID: 31823436 DOI: 10.1111/tpj.14645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/25/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 06/10/2023]
Abstract
Many plant genomes display high levels of repetitive sequences. The assembly of these complex genomes using short high-throughput sequence reads is still a challenging task. Underestimation or disregard of repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k-mer counting methods has proved to be reliable. Easy-to-use applications utilizing k-mer counting are in high demand, especially in the domain of plants. We present Kmasker plants, a tool, provided as a command-line and web-based solution, that uses k-mer count information as an assistant throughout the analytical workflow of genome data. Besides its core competence of screening and masking repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species and methods that estimate target specificity of guide RNAs for application of site-directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre-computed indices for 10 of the economically most important cultivated plants. Source code for Kmasker plants has been made publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.
Affiliation(s)
- Sebastian Beier
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Chris Ulpinnis
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, 06120, Halle, Germany
- Markus Schwalbe
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Thomas Münch
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Robert Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Iris Koeppel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Christian Hertig
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Nagaveni Budhagatapalli
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Stefan Hiekel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Krishna M Pathi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Goetz Hensel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Martin Grosse
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Sindy Chamas
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Sophia Gerasimova
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Jochen Kumlehn
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Thomas Schmutzer
  - Department of Natural Sciences III, Institute for Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, 06120, Halle, Germany

8
Khachatryan L, de Leeuw RH, Kraakman MEM, Pappas N, Te Raa M, Mei H, de Knijff P, Laros JFJ. Taxonomic classification and abundance estimation using 16S and WGS - A comparison using controlled reference samples. Forensic Sci Int Genet 2020; 46:102257. [PMID: 32058299 DOI: 10.1016/j.fsigen.2020.102257] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/09/2019] [Revised: 12/30/2019] [Accepted: 01/27/2020] [Indexed: 12/30/2022]
Abstract
The assessment of microbiome biodiversity is the most common application of metagenomics. While 16S sequencing remains standard procedure for taxonomic profiling of metagenomic data, a growing number of studies have clearly demonstrated biases associated with this method. By using Whole Genome Shotgun sequencing (WGS) metagenomics, most of the known restrictions associated with 16S data are alleviated. However, due to the computationally intensive data analyses and higher sequencing costs, WGS-based metagenomics remains a less popular option. Selecting the experiment type that provides a comprehensive, yet manageable amount of information is a challenge encountered in many metagenomics studies. In this work, we created a series of artificial bacterial mixes, each with a different distribution of skin-associated microbial species. These mixes were used to estimate the resolution of two different metagenomic experiments - 16S and WGS - and to evaluate several different bioinformatics approaches for taxonomic read classification. In all test cases, WGS approaches provide much more accurate results, in terms of taxa prediction and abundance estimation, in comparison to those of 16S. Furthermore, we demonstrate that a 16S dataset, analysed using different state-of-the-art techniques and reference databases, can produce widely different results. In light of the fact that most forensic metagenomic analyses are still performed using 16S data, our results are especially important.
Affiliation(s)
- Lusine Khachatryan
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Rick H de Leeuw
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Margriet E M Kraakman
  - Department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands
- Nikos Pappas
  - Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
- Marije Te Raa
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
- Peter de Knijff
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Jeroen F J Laros
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands

9
Parducci L, Alsos IG, Unneberg P, Pedersen MW, Han L, Lammers Y, Salonen JS, Väliranta MM, Slotte T, Wohlfarth B. Shotgun Environmental DNA, Pollen, and Macrofossil Analysis of Lateglacial Lake Sediments From Southern Sweden. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00189] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 11/13/2022] [Open Access]

10
Rowe WPM, Carrieri AP, Alcon-Giner C, Caim S, Shaw A, Sim K, Kroll JS, Hall LJ, Pyzer-Knapp EO, Winn MD. Streaming histogram sketching for rapid microbiome analytics. Microbiome 2019; 7:40. [PMID: 30878035 PMCID: PMC6420756 DOI: 10.1186/s40168-019-0653-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/25/2018] [Accepted: 03/01/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. RESULTS: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed 'histosketch' that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a 'real life' example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. CONCLUSIONS: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space (https://github.com/will-rowe/hulk).
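The Jaccard similarity that the abstract uses to compare samples can be illustrated on exact k-mer sets. The sketch below is only a toy illustration and is not HULK's histosketch, which estimates similarity from a compressed sketch of the k-mer spectrum; the sample names, sequences and the choice of k = 4 are made up for the example.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy "samples"; real inputs would be sequencing reads.
samples = {
    "sample_a": "ACGTACGTGG",
    "sample_b": "ACGTACGTCC",
    "sample_c": "TTTTGGGGCC",
}
k = 4  # illustrative k-mer length, not a value taken from the paper

sets = {name: kmer_set(seq, k) for name, seq in samples.items()}
similarity = {
    (x, y): jaccard(sets[x], sets[y])
    for x, y in combinations(sorted(sets), 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

Exact set comparison like this scales poorly with data size, which is the motivation the abstract gives for replacing it with a fixed-size sketch.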
Affiliation(s)
- Will PM Rowe
- Scientific Computing Department, STFC Daresbury Laboratory, Warrington, UK
- Shabhonam Caim
- Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Alex Shaw
- Department of Medicine, Section of Paediatrics, Imperial College London, London, UK
- Kathleen Sim
- Department of Medicine, Section of Paediatrics, Imperial College London, London, UK
- J. Simon Kroll
- Department of Medicine, Section of Paediatrics, Imperial College London, London, UK
- Lindsay J. Hall
- Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Martyn D. Winn
- Scientific Computing Department, STFC Daresbury Laboratory, Warrington, UK
|
11
|
Kaisers W, Schwender H, Schaal H. Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities. Int J Mol Sci 2018; 19:E3687. [PMID: 30469355 PMCID: PMC6274891 DOI: 10.3390/ijms19113687] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/05/2018] [Accepted: 11/15/2018] [Indexed: 01/14/2023]
Abstract
We apply hierarchical clustering (HC) of DNA k-mer counts on multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, but clustering of preparation groups indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. In order to provide a simple, applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicated the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
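The idea of clustering samples by their k-mer counts can be sketched in a few lines. The following is a minimal illustration in Python, not the seqTools R implementation: the Manhattan distance on relative k-mer frequencies, the single-linkage merge rule, the sample names, the reads and k = 3 are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(reads, k):
    """Relative k-mer frequencies over all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def manhattan(p, q):
    """Manhattan distance between two sparse frequency profiles."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


def single_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order performed."""
    clusters = [frozenset([name]) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            return min(manhattan(profiles[x], profiles[y]) for x in a for y in b)

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical toy samples; real inputs would be reads parsed from Fastq files.
samples = {
    "batch1_rep1": ["ACGTACGT", "ACGTTCGT"],
    "batch1_rep2": ["ACGTACGT", "ACGTACGA"],
    "batch2_rep1": ["GGGCCCGG", "GGCCCGGG"],
}
profiles = {name: kmer_profile(reads, 3) for name, reads in samples.items()}
merges = single_linkage(profiles)

for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

In this toy input the two samples with similar k-mer composition merge first; a tree in which samples group by preparation batch rather than by experimental condition is the pattern the abstract describes as a sign of batch effects.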
Affiliation(s)
- Wolfgang Kaisers
- Department of Anaesthesiology, HELIOS University Hospital Wuppertal, University of Witten/Herdecke, Heusnerstr. 40, 42283 Wuppertal, Germany.
- Institut fur Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany.
- Holger Schwender
- Mathematisches Institut, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany.
- Heiner Schaal
- Institut fur Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany.
|
12
|
Dong X, Liao Z, Gritsch D, Hadzhiev Y, Bai Y, Locascio JJ, Guennewig B, Liu G, Blauwendraat C, Wang T, Adler CH, Hedreen JC, Faull RLM, Frosch MP, Nelson PT, Rizzu P, Cooper AA, Heutink P, Beach TG, Mattick JS, Müller F, Scherzer CR. Enhancers active in dopamine neurons are a primary link between genetic variation and neuropsychiatric disease. Nat Neurosci 2018; 21:1482-1492. [PMID: 30224808 PMCID: PMC6334654 DOI: 10.1038/s41593-018-0223-0] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 09/11/2017] [Accepted: 07/23/2018] [Indexed: 01/07/2023]
Abstract
Enhancers function as DNA logic gates and may control specialized functions of billions of neurons. Here we show a tailored program of noncoding genome elements active in situ in physiologically distinct dopamine neurons of the human brain. We found 71,022 transcribed noncoding elements, many of which were consistent with active enhancers and with regulatory mechanisms in zebrafish and mouse brains. Genetic variants associated with schizophrenia, addiction, and Parkinson's disease were enriched in these elements. Expression quantitative trait locus analysis revealed that Parkinson's disease-associated variants on chromosome 17q21 cis-regulate the expression of an enhancer RNA in dopamine neurons. This study shows that enhancers in dopamine neurons link genetic variation to neuropsychiatric traits.
Affiliation(s)
- Xianjun Dong
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Zhixiang Liao
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- David Gritsch
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Yavor Hadzhiev
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Yunfei Bai
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Joseph J Locascio
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Boris Guennewig
- Sydney Medical School, Brain and Mind Centre, The University of Sydney, Sydney, New South Wales, Australia
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Ganqiang Liu
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Tao Wang
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- John C Hedreen
- Harvard Brain Tissue Resource Center, McLean Hospital, Harvard Medical School, Boston, MA, USA
- Richard L M Faull
- Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Matthew P Frosch
- C.S. Kubik Laboratory for Neuropathology, Massachusetts General Hospital, Boston, MA, USA
- Peter T Nelson
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Patrizia Rizzu
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Antony A Cooper
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Peter Heutink
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- John S Mattick
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Ferenc Müller
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Clemens R Scherzer
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA.
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA.
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
- Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Boston, MA, USA.
- Program in Neuroscience, Harvard Medical School, Boston, MA, USA.
|
13
|
West PT, Probst AJ, Grigoriev IV, Thomas BC, Banfield JF. Genome-reconstruction for eukaryotes from complex natural microbial communities. Genome Res 2018; 28:569-580. [PMID: 29496730 PMCID: PMC5880246 DOI: 10.1101/gr.228429.117] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 07/31/2017] [Accepted: 02/27/2018] [Indexed: 11/24/2022]
Abstract
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k-mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of addition of organic carbon on a geyser-associated microbial community and detected a substantial change of the community metabolism, with selection against almost all candidate phyla bacteria and archaea and for eukaryotes. Near complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities.
Affiliation(s)
- Patrick T West
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Alexander J Probst
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA
- Igor V Grigoriev
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Brian C Thomas
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, California 94720, USA
- Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
|
14
|
Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 2017; 33:574-576. [PMID: 27797770 PMCID: PMC5408915 DOI: 10.1093/bioinformatics/btw663] [Citation(s) in RCA: 259] [Impact Index Per Article: 32.4] [Received: 07/19/2016] [Accepted: 10/17/2016] [Indexed: 11/29/2022]
Abstract
Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. Results: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies. Availability and Implementation: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT. Supplementary information: Supplementary data are available at Bioinformatics online.
|
15
|
Harr B, Karakoc E, Neme R, Teschke M, Pfeifle C, Pezer Ž, Babiker H, Linnenbrink M, Montero I, Scavetta R, Abai MR, Molins MP, Schlegel M, Ulrich RG, Altmüller J, Franitza M, Büntge A, Künzel S, Tautz D. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Sci Data 2016; 3:160075. [PMID: 27622383 PMCID: PMC5020872 DOI: 10.1038/sdata.2016.75] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Received: 02/18/2016] [Accepted: 07/29/2016] [Indexed: 12/20/2022]
Abstract
Wild populations of the house mouse (Mus musculus) represent the raw genetic material for the classical inbred strains in biomedical research and are a major model system for evolutionary biology. We provide whole genome sequencing data of individuals representing natural populations of M. m. domesticus (24 individuals from 3 populations), M. m. helgolandicus (3 individuals), M. m. musculus (22 individuals from 3 populations) and M. spretus (8 individuals from one population). We use a single pipeline to map and call variants for these individuals and also include 10 additional individuals of M. m. castaneus for which genomic data are publicly available. In addition, RNAseq data were obtained from 10 tissues of up to eight adult individuals from each of the three M. m. domesticus populations for which genomic data were collected. Data and analyses are presented via tracks viewable in the UCSC or IGV genome browsers. We also provide information on available outbred stocks and instructions on how to keep them in the laboratory.
Affiliation(s)
- Bettina Harr
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Emre Karakoc
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Rafik Neme
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Meike Teschke
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Christine Pfeifle
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Željka Pezer
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Hiba Babiker
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Miriam Linnenbrink
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Inka Montero
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Rick Scavetta
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Mohammad Reza Abai
- Department of Medical Entomology and Vector Control, School of Public Health, Tehran University of Medical Sciences, Tehran 1417613151, Iran
- Marta Puente Molins
- Laboratorio de Anatomía Animal, Departamento de Biología Animal, Facultad de Ciencias, Universidad de Vigo, 36200 Vigo, Spain
- Mathias Schlegel
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute for Novel and Emerging Infectious Diseases, Südufer 10, 17493 Greifswald-Insel Riems, Germany
- Rainer G Ulrich
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute for Novel and Emerging Infectious Diseases, Südufer 10, 17493 Greifswald-Insel Riems, Germany
- Janine Altmüller
- Cologne Center for Genomics (CCG), University of Cologne, Weyertal 115b, 50931 Cologne, Germany
- Institute of Human Genetics, Universitätsklinik Köln, Kerpener Str. 34, 50931 Köln, Germany
- Marek Franitza
- Cologne Center for Genomics (CCG), University of Cologne, Weyertal 115b, 50931 Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Joseph-Stelzmann-Str. 26, 50931 Cologne, Germany
- Anna Büntge
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Sven Künzel
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
- Diethard Tautz
- Max-Planck Institute for Evolutionary Biology, August-Thienemanstrasse 2, 24306 Plön, Germany
|
16
|
Henden L, Freytag S, Afawi Z, Baldassari S, Berkovic SF, Bisulli F, Canafoglia L, Casari G, Crompton DE, Depienne C, Gecz J, Guerrini R, Helbig I, Hirsch E, Keren B, Klein KM, Labauge P, LeGuern E, Licchetta L, Mei D, Nava C, Pippucci T, Rudolf G, Scheffer IE, Striano P, Tinuper P, Zara F, Corbett M, Bahlo M. Identity by descent fine mapping of familial adult myoclonus epilepsy (FAME) to 2p11.2-2q11.2. Hum Genet 2016; 135:1117-25. [PMID: 27368338 DOI: 10.1007/s00439-016-1700-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/18/2016] [Accepted: 06/21/2016] [Indexed: 02/03/2023]
Abstract
Familial adult myoclonus epilepsy (FAME) is a rare autosomal dominant disorder characterized by adult onset, involuntary muscle jerks, cortical myoclonus and occasional seizures. FAME is genetically heterogeneous with more than 70 families reported worldwide and five potential disease loci. The efforts to identify potential causal variants have been unsuccessful in all but three families. To date, linkage analysis has been the main approach to find and narrow FAME critical regions. We propose an alternative method, pedigree-free identity-by-descent (IBD) mapping, that infers regions of the genome between individuals that have been inherited from a common ancestor. IBD mapping provides an alternative to linkage analysis in the presence of allelic and locus heterogeneity by detecting clusters of individuals who share a common allele. Following IBD mapping, gene prioritization based on gene co-expression analysis can be used to identify the most promising candidate genes. We performed an IBD analysis using high-density single nucleotide polymorphism (SNP) array data followed by gene prioritization on a FAME cohort of ten European families and one Australian/New Zealander family, eight of which had known disease loci. By identifying IBD regions common to multiple families, we were able to narrow the FAME2 locus to a 9.78 megabase interval within 2p11.2-q11.2. We provide additional evidence of a founder effect in four Italian families and allelic heterogeneity with at least four distinct founders responsible for FAME at the FAME2 locus. In addition, we suggest candidate disease genes using gene prioritization based on gene co-expression analysis.
Affiliation(s)
- Lyndal Henden
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Zaid Afawi
- Tel Aviv University Medical School, 69978, Tel Aviv, Israel
- Sara Baldassari
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia
- Francesca Bisulli
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Laura Canafoglia
- Neurophysiopathology and Epilepsy Center, IRCCS Foundation C. Besta Neurological Institute, Milan, Italy
- Giorgio Casari
- Division of Genetics and Cell Biology, Università Vita-Salute San Raffaele, San Raffaele Scientific Institute, Milan, Italy
- Christel Depienne
- Département de Médicine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France
- Laboratoire de diagnostic génétique, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Jozef Gecz
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia
- School of Biological Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia
- Renzo Guerrini
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy
- IRCCS Stella Maris Foundation, Pisa, Italy
- Ingo Helbig
- Department of Neuropediatrics, Christian-Albrechts-University of Kiel and University Medical Center, Kiel, Schleswig-Holstein, Germany
- Departments of Brain and Cognitive Sciences, Physiology and Cell Biology, Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Negev, Israel
- Division of Neurology, The Children's Hospital of Philadelphia, Philadelphia, USA
- Edouard Hirsch
- Medical and Surgical Epilepsy Unit, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Boris Keren
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France
- Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Karl Martin Klein
- Department of Neurology, Epilepsy Center Frankfurt Rhine-Main, Center of Neurology and Neurosurgery, University Hospital, Goethe-University Frankfurt, Frankfurt, Germany
- Department of Neurology, Epilepsy Center Hessen, University Hospitals Giessen and Marburg, Philipps-University Marburg, Marburg, Germany
- Pierre Labauge
- Department of Neurology, Montpellier University, Gui de Chauliac, 34295, Montpellier, Cedex 5, France
- Eric LeGuern
- Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- INSERM, U 1127; CNRS, UMR 7225; INSERM UMR 975; Institut du Cerveau et de la Moelle Epinière; and Département de Génétique et de Cytogénétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux De Paris (AP-HP), Paris, France
- Université Pierre et Marie Curie (Paris 6) (UPMC), UMRS 975, Paris, France
- Laura Licchetta
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Davide Mei
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy
- Caroline Nava
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France
- Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Tommaso Pippucci
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Gabrielle Rudolf
- Département de Médicine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France
- Department of Neurology, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Ingrid Eileen Scheffer
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia
- Florey Institute of Neuroscience and Mental Health, Melbourne, VIC, 3084, Australia
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Melbourne, VIC, 3052, Australia
- Pasquale Striano
- Pediatric Neurology and Muscular Diseases Unit, Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, Gaslini Institute, Genoa, Italy
- Paolo Tinuper
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Federico Zara
- Laboratory of Neurogenetics, Department of Neurosciences, Gaslini Institute, Genoa, Italy
- Mark Corbett
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
|