1
Wang Z, Cheng X, Ma A, Jiang F, Chen Y. Multiplexed food-borne pathogen detection using an argonaute-mediated digital sensor based on a magnetic-bead-assisted imaging transcoding system. Nat Food 2025; 6:170-181. [PMID: 39748032 DOI: 10.1038/s43016-024-01082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 10/31/2024] [Indexed: 01/04/2025]
Abstract
Accurate, sensitive and multiplexed detection of food-borne pathogens is crucial for assessing food safety risks. Here we present a digital DNA-amplification-free nucleic acid detection assay to achieve multiplexed and ultrasensitive detection of three food-borne pathogens. We used mesophilic Clostridium butyricum argonaute and magnetic beads in a digital carrier system (d-MAGIC). Clostridium butyricum argonaute, with its two-guide accurate cleavage activity, precisely targets and cleaves fluorescence-quencher reporters corresponding to different bacteria through a two-step process. The system uses fluorescence-encoded magnetic beads as programmable multi-probes, allowing the simultaneous detection of multiple pathogens and easy data interpretation via artificial intelligence. The method showed a wide detection range (10¹ to 10⁷ CFU ml⁻¹) and a low limit of detection of 6 CFU ml⁻¹ for food-borne pathogens without DNA amplification. Digital nucleic acid testing using d-MAGIC can become a next-generation strategy for accurate and convenient pathogen detection.
Affiliation(s)
- Zhipan Wang
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xinrui Cheng
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan, China
- Aimin Ma
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan, China
- Feng Jiang
- Key Laboratory of Detection Technology of Focus Chemical Hazards in Animal-Derived Food for State Market Regulation, Wuhan, China
- Yiping Chen
- College of Food Science and Technology, Huazhong Agricultural University, Wuhan, China.
- State Key Laboratory of Marine Food Processing and Safety Control, Dalian Polytechnic University, Dalian, China.
2
Katsonis P, Lichtarge O. Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects. Nat Commun 2025; 16:159. [PMID: 39746940 PMCID: PMC11696468 DOI: 10.1038/s41467-024-55066-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 11/27/2024] [Indexed: 01/04/2025]
Abstract
Computational methods for estimating missense variant impact suffer from inconsistent performance across genes, which poses a major challenge for their reliable use in clinical practice. While ensemble scores leverage multiple prediction methods to enhance consistency, the overrepresentation of certain genes in the training data can bias their outcomes. To address this critical limitation, we propose a gene-specific ensemble framework trained on reference computational annotations rather than on clinical or experimental data. Accordingly, we generate Meta-EA ensemble scores that achieve comparable performance to the top individual predicting method for each gene set. Incorporating the effects of splicing and the allele frequency of human polymorphisms further enhances the performance of Meta-EA, achieving an area under the receiver operating characteristic curve of 0.97 for both gene-balanced and imbalanced clinical assessments. In conclusion, this work leverages the wealth of existing variant impact prediction approaches to generate improved estimations for clinical interpretation.
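Meta-EA's headline result is reported as the area under the receiver operating characteristic curve (AUROC). As a minimal illustration of that metric only (not of Meta-EA itself), AUROC equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, with ties counted as one half. The scores and labels below are hypothetical.

```python
from itertools import product


def auroc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 = pathogenic (positive), 0 = benign (negative).
    The pairwise O(n_pos * n_neg) form is chosen for clarity, not speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ensemble scores (higher = more likely pathogenic)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of 9 positive/negative pairs ranked correctly
```

For large variant sets the same quantity is usually obtained from ranks via the Mann-Whitney U statistic (AUROC = U / (n_pos × n_neg)), which avoids the quadratic pair loop.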
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
3
Acharya P, Singh US, Rajamannar V, Muniaraj M, Nayak B, Das A. Genome resequencing and genome-wide polymorphisms in mosquito vectors Aedes aegypti and Aedes albopictus from south India. Sci Rep 2024; 14:22931. [PMID: 39358370 PMCID: PMC11447132 DOI: 10.1038/s41598-024-71484-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 08/28/2024] [Indexed: 10/04/2024]
Abstract
Aedes aegypti and Aedes albopictus mosquitoes spread major vector-borne viral diseases in tropical and sub-tropical regions of the globe. In this study, we sequenced the genomes of Indian Ae. aegypti and Ae. albopictus and mapped the reads to their respective reference genomes. Comparative genomic analyses were performed between our strains and the reference strains. A total of 14,416,484 single nucleotide polymorphisms (SNPs) and 156,487 insertions and deletions (InDels) were found in Ae. aegypti, and 28,940,433 SNPs and 188,987 InDels in Ae. albopictus. Particular emphasis was given to gene families involved in mosquito digestion, development, and innate immunity, which could be putative candidates for vector control. Serine protease cascades and their inhibitors, called serpins, play a central role in these processes. We extracted high-impact variants in genes associated with serine proteases and serpins. This study reports, for the first time, high-coverage genome sequence data of an Indian Ae. albopictus mosquito. The results from this study will provide insights into Indian Aedes-specific polymorphisms and the evolution of immune-related genes in mosquitoes, and can serve as a resource for future comparative genomics and for those pursuing the development of targeted biopesticides for effective mosquito control strategies.
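The abstract reports separate SNP and InDel totals. A minimal sketch of how such totals are commonly tallied from the data lines of a VCF file, classifying each REF/ALT allele pair by length; this is not the authors' pipeline, the records are hypothetical, and symbolic (`<DEL>`), spanning-deletion (`*`) and missing (`.`) ALT alleles are deliberately not handled.

```python
from collections import Counter


def variant_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by allele length (simplified)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "MNP"  # equal-length, multi-base substitution


def count_variant_types(vcf_lines):
    """Tally SNPs/InDels/MNPs over the tab-separated data lines of a VCF."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF is column 4, ALT column 5
        for alt in alts:  # a multi-allelic site counts once per ALT allele
            counts[variant_type(ref, alt)] += 1
    return counts


# Hypothetical records, not data from the study
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t300\t.\tC\tCTT,T\t50\tPASS\t.",
]
print(count_variant_types(vcf))
```

Counting per ALT allele rather than per site is a design choice; some tools count multi-allelic sites once, which would give different totals.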
Affiliation(s)
- Preeti Acharya
- Sambalpur University, Jyoti Vihar, Sambalpur, Odisha, 768019, India
- ICMR-National Institute of Research in Tribal Health, Jabalpur, Madhya Pradesh, India
- Mayilsamy Muniaraj
- ICMR-Vector Control Research Centre Field Station, Madurai, Tamil Nadu, India
- Binata Nayak
- Sambalpur University, Jyoti Vihar, Sambalpur, Odisha, 768019, India.
- Aparup Das
- ICMR-National Institute of Research in Tribal Health, Jabalpur, Madhya Pradesh, India.
4
Susán HK, Orosz G, Zámbó V, Csala M, Kereszturi É. Severity Ranking of Missense and Frameshift Genetic Variants in SCD1 by In Silico and In Vitro Functional Analysis. Nutrients 2024; 16:3259. [PMID: 39408225 PMCID: PMC11478377 DOI: 10.3390/nu16193259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/20/2024]
Abstract
BACKGROUND A considerable proportion of the symptoms associated with excessive dietary intake can be attributed to systemic imbalances in lipid metabolism. The prominent toxicity of saturated fatty acids has been repeatedly demonstrated and sheds light on the protective role of stearoyl-CoA desaturase-1 (SCD1), the key enzyme for fatty acid desaturation. SCD1 protein expression is regulated at the levels of transcription, translation, and degradation. However, the modulating effect of the variability of the human genome must also be taken into account. Therefore, we aimed to ascertain whether natural missense or frameshift mutations in SCD1 (p.H125P, p.M224L, p.A333T, p.R253AfsTer7) could influence the expression, degradation, or function of the enzyme. METHODS In silico and in vitro experiments were conducted to comprehensively evaluate the consequences associated with each genetic variation, with the objective of using the results to propose a risk or severity ranking of SCD1 variants. RESULTS As anticipated, the p.R253AfsTer7 variant was identified as the most deleterious in structural, functional, and quantitative terms. The p.H125P variant also reduced the desaturation capacity of the enzyme in accordance with the predicted structural alterations and augmented degradation resulting from folding complications. This was aggravated by increased mRNA instability and accompanied by mild endoplasmic reticulum stress induction. The p.A333T protein exhibited an intermediate phenotype, whereas p.M224L showed no deleterious effects and even increased the amount of SCD1. CONCLUSIONS In conclusion, the large-scale identification of genetic variations needs to be supplemented with comprehensive functional characterization of these variations to facilitate adequate personalized prevention and treatment of lipid metabolism-related conditions.
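The variant names follow HGVS protein nomenclature with one-letter amino-acid codes: p.H125P is a histidine-to-proline substitution at residue 125, and p.R253AfsTer7 is a frameshift whose first changed residue (Arg253 to Ala) is followed by a stop codon at position 7 of the new reading frame, counting the first changed residue as position 1. A minimal parser for just these two patterns might look like this; it assumes one-letter codes (HGVS more commonly uses three-letter codes such as p.His125Pro) and is not a general HGVS parser.

```python
import re

# Simplified patterns for one-letter HGVS protein notation as written in the abstract.
MISSENSE = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")
FRAMESHIFT = re.compile(r"^p\.([A-Z])(\d+)([A-Z])fs(?:Ter|\*)(\d+)$")


def parse_protein_change(hgvs: str) -> dict:
    """Return the type, position and residues of a missense or frameshift change."""
    m = FRAMESHIFT.match(hgvs)
    if m:
        ref, pos, alt, ter = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
        # "fsTerN" counts the first changed residue as 1, so the stop codon sits
        # at pos + N - 1 and the last translated residue at pos + N - 2.
        return {"type": "frameshift", "ref": ref, "pos": pos, "alt": alt,
                "protein_length": pos + ter - 2}
    m = MISSENSE.match(hgvs)
    if m:
        return {"type": "missense", "ref": m.group(1), "pos": int(m.group(2)),
                "alt": m.group(3)}
    raise ValueError(f"unsupported notation: {hgvs}")


for variant in ["p.H125P", "p.M224L", "p.A333T", "p.R253AfsTer7"]:
    print(variant, parse_protein_change(variant))
```

By this arithmetic, p.R253AfsTer7 yields a truncated product whose last residue is at position 258, which is consistent with the abstract ranking it as the most deleterious variant.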
Affiliation(s)
- Éva Kereszturi
- Department of Molecular Biology, Semmelweis University, H-1085 Budapest, Hungary (H.K.S., G.O., V.Z., M.C.)
5
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable only to genes with associated phenotypes, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability. RESULTS We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Fernando Zhapa-Camacho
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
6
Rioux B, Chong M, Walker R, McGlasson S, Rannikmäe K, McCartney D, McCabe J, Brown R, Crow YJ, Hunt D, Whiteley W. Phenotypes associated with genetic determinants of type I interferon regulation in the UK Biobank: a protocol. Wellcome Open Res 2023; 8:550. [PMID: 38855722 PMCID: PMC11162527 DOI: 10.12688/wellcomeopenres.20385.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/14/2023] [Indexed: 06/11/2024]
Abstract
Background Type I interferons are cytokines involved in innate immunity against viruses. Genetic disorders of type I interferon regulation are associated with a range of autoimmune and cerebrovascular phenotypes. Carriers of pathogenic variants involved in genetic disorders of type I interferons are generally considered asymptomatic. Preliminary data suggests, however, that genetically determined dysregulation of type I interferon responses is associated with autoimmunity, and may also be relevant to sporadic cerebrovascular disease and dementia. We aim to determine whether functional variants in genes involved in type I interferon regulation and signalling are associated with the risk of autoimmunity, stroke, and dementia in a population cohort. Methods We will perform a hypothesis-driven candidate pathway association study of type I interferon-related genes using rare variants in the UK Biobank (UKB). We will manually curate type I interferon regulation and signalling genes from a literature review and Gene Ontology, followed by clinical and functional filtering. Variants of interest will be included based on pre-defined clinical relevance and functional annotations (using LOFTEE, M-CAP and a minor allele frequency <0.1%). The association of variants with 15 clinical and three neuroradiological phenotypes will be assessed with a rare variant genetic risk score and gene-level tests, using a Bonferroni-corrected p-value threshold from the number of genetic units and phenotypes tested. We will explore the association of significant genetic units with 196 additional health-related outcomes to help interpret their relevance and explore the clinical spectrum of genetic perturbations of type I interferon. Ethics and dissemination The UKB has received ethical approval from the North West Multicentre Research Ethics Committee, and all participants provided written informed consent at recruitment. 
This research will be conducted using the UKB Resource under application number 93160. We expect to disseminate our results in a peer-reviewed journal and at an international cardiovascular conference.
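The protocol sets its significance threshold by Bonferroni correction over the number of genetic units and phenotypes tested, and keeps only variants with a minor allele frequency below 0.1%. A worked sketch of both rules, assuming a family-wise alpha of 0.05, assuming the correction counts every unit-by-phenotype test, and using a hypothetical count of 50 genetic units (the real count depends on the curated gene list); the 18 phenotypes are the 15 clinical and 3 neuroradiological outcomes named in the protocol.

```python
def bonferroni_threshold(alpha: float, n_genetic_units: int, n_phenotypes: int) -> float:
    """Family-wise alpha divided by the total number of unit-by-phenotype tests."""
    return alpha / (n_genetic_units * n_phenotypes)


def is_rare(minor_allele_frequency: float, cutoff: float = 0.001) -> bool:
    """MAF < 0.1% filter, expressed as a fraction (0.1% = 0.001)."""
    return minor_allele_frequency < cutoff


N_PHENOTYPES = 15 + 3  # 15 clinical + 3 neuroradiological phenotypes (from the protocol)
N_GENETIC_UNITS = 50   # hypothetical: fixed only after gene curation and filtering
ALPHA = 0.05           # assumed family-wise error rate

threshold = bonferroni_threshold(ALPHA, N_GENETIC_UNITS, N_PHENOTYPES)
print(f"{threshold:.2e}")  # 0.05 / 900
print(is_rare(0.0004), is_rare(0.002))
```

Because the threshold scales inversely with the number of units, curating more genes into the candidate pathway makes each individual test stricter.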
Affiliation(s)
- Bastien Rioux
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland, UK
- Michael Chong
- Population Health Research Institute, McMaster University, Hamilton, Ontario, Canada
- Thrombosis and Atherosclerosis Research Institute, McMaster University, Hamilton, Ontario, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Ontario, Canada
- Rosie Walker
- Department of Psychology, University of Exeter, Exeter, England, UK
- Sarah McGlasson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland, UK
- Kristiina Rannikmäe
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
- Daniel McCartney
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, Scotland, UK
- John McCabe
- School of Medicine, University College Dublin, Dublin, Leinster, Ireland
- Department of Medicine for the Elderly, Mater Misericordiae University Hospital, Dublin, Ireland
- Robin Brown
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, England, UK
- Yanick J. Crow
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, Scotland, UK
- Laboratory of Neurogenetics and Neuroinflammation, Institut Imagine, Université de Paris, Paris, France
- David Hunt
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland, UK
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland, UK
- MRC Population Health Unit, Nuffield Department of Population Health, University of Oxford, Oxford, England, UK
7
Long E, Wan P, Chen Q, Lu Z, Choi J. From function to translation: Decoding genetic susceptibility to human diseases via artificial intelligence. Cell Genom 2023; 3:100320. [PMID: 37388909 PMCID: PMC10300605 DOI: 10.1016/j.xgen.2023.100320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023]
Abstract
While genome-wide association studies (GWAS) have discovered thousands of disease-associated loci, molecular mechanisms for a considerable fraction of the loci remain to be explored. The logical next steps for post-GWAS are interpreting these genetic associations to understand disease etiology (GWAS functional studies) and translating this knowledge into clinical benefits for the patients (GWAS translational studies). Although various datasets and approaches using functional genomics have been developed to facilitate these studies, significant challenges remain due to data heterogeneity, multiplicity, and high dimensionality. To address these challenges, artificial intelligence (AI) technology has demonstrated considerable promise in decoding complex functional datasets and providing novel biological insights into GWAS findings. This perspective first describes the landmark progress driven by AI in interpreting and translating GWAS findings and then outlines specific challenges followed by actionable recommendations related to data availability, model optimization, and interpretation, as well as ethical concerns.
Affiliation(s)
- Erping Long
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Peixing Wan
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Qingyu Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
8
Lee C, Lin J, Prokop A, Gopalakrishnan V, Hanna RN, Papa E, Freeman A, Patel S, Yu W, Huhn M, Sheikh AS, Tan K, Sellman BR, Cohen T, Mangion J, Khan FM, Gusev Y, Shameer K. StarGazer: A Hybrid Intelligence Platform for Drug Target Prioritization and Digital Drug Repositioning Using Streamlit. Front Genet 2022; 13:868015. [PMID: 35711912 PMCID: PMC9197487 DOI: 10.3389/fgene.2022.868015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2022] [Accepted: 04/29/2022] [Indexed: 01/26/2023]
Abstract
Target prioritization is essential for drug discovery and repositioning. Applying computational methods to analyze and process multi-omics data to find new drug targets is a practical approach for achieving this. Despite an increasing number of methods for generating datasets such as genomics, phenomics, and proteomics, attempts to integrate and mine such datasets remain limited in scope. Developing hybrid intelligence solutions that combine human intelligence in the scientific domain and disease biology with the ability to mine multiple databases simultaneously may help augment drug target discovery and identify novel drug-indication associations. We believe that integrating different data sources using a singular numerical scoring system in a hybrid intelligent framework could help to bridge these different omics layers and facilitate rapid drug target prioritization for studies in drug discovery, development or repositioning. Herein, we describe our prototype of the StarGazer pipeline which combines multi-source, multi-omics data with a novel target prioritization scoring system in an interactive Python-based Streamlit dashboard. StarGazer displays target prioritization scores for genes associated with 1844 phenotypic traits, and is available via https://github.com/AstraZeneca/StarGazer.
Affiliation(s)
- Chiyun Lee
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Junxia Lin
- Georgetown University, Washington, DC, United States
- Richard N. Hanna
- Early Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States
- Eliseo Papa
- Research Data and Analytics, R&D IT, AstraZeneca, Cambridge, United Kingdom
- Adrian Freeman
- Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Saleha Patel
- Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Wen Yu
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States
- Monika Huhn
- Biometrics and Information Sciences, BioPharmaceuticals R&D, AstraZeneca, Mölndal, Sweden
- Abdul-Saboor Sheikh
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Keith Tan
- Neuroscience, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Bret R. Sellman
- Discovery Microbiome, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States
- Taylor Cohen
- Discovery Microbiome, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States
- Jonathan Mangion
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Faisal M. Khan
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States
- Yuriy Gusev
- Georgetown University, Washington, DC, United States
- Khader Shameer
- Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, MD, United States (correspondence: Khader Shameer)
9
Urbanek-Trzeciak MO, Kozlowski P, Galka-Marciniak P. miRMut: Annotation of mutations in miRNA genes from human whole-exome or whole-genome sequencing. STAR Protoc 2022; 3:101023. [PMID: 34977675 PMCID: PMC8686061 DOI: 10.1016/j.xpro.2021.101023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
Here, we present the miRMut protocol to annotate mutations found in miRNA genes based on whole-exome sequencing (WES) or whole-genome sequencing (WGS) results. The pipeline assigns mutation characteristics, including miRNA gene IDs (miRBase and MirGeneDB), mutation localization within the miRNA precursor structure, potential RNA-binding motif disruption, the ascription of mutation according to Human Genome Variation Society (HGVS) nomenclature, and miRNA gene characteristics, such as miRNA gene confidence and miRNA arm balance. The pipeline includes creating tabular and graphical summaries. For complete details on the use and execution of this protocol, please refer to Urbanek-Trzeciak et al. (2020).
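One of the characteristics miRMut assigns is where a mutation falls within the miRNA precursor structure. As an illustrative sketch only (not miRMut's code or output format), the function below classifies a genomic position as seed, mature, loop or other precursor sequence, using the common convention that the seed spans nucleotides 2-8 of the mature miRNA. The class name, function name and all coordinates are hypothetical; real coordinates would come from miRBase or MirGeneDB.

```python
from dataclasses import dataclass


@dataclass
class MiRNAPrecursor:
    """Hypothetical 1-based, inclusive genomic coordinates of a pre-miRNA."""
    start: int
    end: int
    mature_5p: tuple  # (start, end) of the 5p mature miRNA
    mature_3p: tuple  # (start, end) of the 3p mature miRNA
    strand: str = "+"


def localize(pos: int, pre: MiRNAPrecursor) -> str:
    """Classify a genomic position relative to the precursor structure."""
    if not pre.start <= pos <= pre.end:
        return "outside precursor"
    for arm, (s, e) in (("5p", pre.mature_5p), ("3p", pre.mature_3p)):
        if s <= pos <= e:
            # 1-based position within the mature miRNA, read 5'->3' on its own strand
            offset = pos - s + 1 if pre.strand == "+" else e - pos + 1
            return f"seed ({arm})" if 2 <= offset <= 8 else f"mature ({arm})"
    left, right = sorted([pre.mature_5p, pre.mature_3p])
    if left[1] < pos < right[0]:
        return "loop"
    return "precursor, outside mature and loop"


# Hypothetical plus-strand precursor
pre = MiRNAPrecursor(1000, 1080, mature_5p=(1005, 1026), mature_3p=(1050, 1071))
for p in (1002, 1010, 1020, 1040, 1051, 2000):
    print(p, localize(p, pre))
```

The strand flag matters because the seed is counted from the 5' end of the mature miRNA; on the minus strand that end has the higher genomic coordinate.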
Affiliation(s)
- Martyna O. Urbanek-Trzeciak
- Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Piotr Kozlowski
- Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Paulina Galka-Marciniak
- Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
10
A method for scoring the cell type-specific impacts of noncoding variants in personal genomes. Proc Natl Acad Sci U S A 2020; 117:21364-21372. [PMID: 32817564 PMCID: PMC7474608 DOI: 10.1073/pnas.1922703117] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 02/01/2023]
Abstract
Here we use the expression and accessibility data from a diverse set of cell types to learn a model for the dependence of the accessibility of a regulatory element on its DNA sequence and TF expression. Using GTEx samples with WGS data, we show that the noncoding variants predicted to affect accessibility are more strongly associated with the expression of nearby genes. To interpret a personal genome, we combine the sequence information with context-specific TF expression to prioritize variants and regulatory elements in any genomic region of interest. This approach should be helpful in the study of risk loci previously identified by GWAS. Results from analysis of height and WGS data from the GTEx project support this hypothesis. A person’s genome typically contains millions of variants which represent the differences between this personal genome and the reference human genome. The interpretation of these variants, i.e., the assessment of their potential impact on a person’s phenotype, is currently of great interest in human genetics and medicine. We have developed a prioritization tool called OpenCausal which takes as inputs 1) a personal genome and 2) a reference context-specific TF expression profile and returns a list of noncoding variants prioritized according to their impact on chromatin accessibility for any given genomic region of interest. We applied OpenCausal to 6,430 samples across 18 tissues derived from the GTEx project and found that the variants prioritized by OpenCausal are highly enriched for eQTLs and caQTLs. We further propose a strategy to integrate the predicted open scores with genome-wide association studies (GWAS) data to prioritize putative causal variants and regulatory elements for a given risk locus (i.e., fine-mapping analysis). 
As an initial example, we applied this method to a GWAS dataset of human height and found that the prioritized putative variants and elements are correlated with the phenotype (i.e., heights of individuals) better than others.
11
Petrini A, Mesiti M, Schubach M, Frasca M, Danis D, Re M, Grossi G, Cappelletti L, Castrignanò T, Robinson PN, Valentini G. parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. Gigascience 2020; 9:giaa052. [PMID: 32444882 PMCID: PMC7244787 DOI: 10.1093/gigascience/giaa052] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/13/2019] [Revised: 10/31/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge number of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.
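parSMURF handles class imbalance by oversampling the rare pathogenic class and undersampling the neutral class. The sketch below illustrates that general idea for a single balanced partition, assuming SMOTE-style interpolation between a minority example and one of its nearest minority neighbours plus uniform random undersampling. It is plain Python, not parSMURF's C++/MPI implementation; it omits the per-partition classifiers, ensemble aggregation, Bayesian tuning and parallelism, and the function names and data are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority example
    and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a uniform random subset of the majority (neutral) class."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Hypothetical 2-D feature vectors: 3 "pathogenic" vs 20 "neutral" examples
pathogenic = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
neutral = [(5.0 + i, 5.0) for i in range(20)]

balanced_pos = pathogenic + smote_like_oversample(pathogenic, n_new=3)
balanced_neg = random_undersample(neutral, n_keep=len(balanced_pos))
print(len(balanced_pos), len(balanced_neg))  # 6 6
```

Interpolating rather than duplicating minority examples keeps synthetic points inside the region already occupied by the rare class, which is the usual motivation for SMOTE-style oversampling over plain replication.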
Affiliation(s)
- Alessandro Petrini
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Marco Mesiti
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Max Schubach
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178 Berlin, Germany
- Charité – Universitätsmedizin Berlin, Chariteplatz 1, 10117 Berlin, Germany
- Marco Frasca
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington (CT) - 06032, United States of America
- Matteo Re
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Giuliano Grossi
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Luca Cappelletti
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- Tiziana Castrignanò
- CINECA, SCAI SuperComputing Applications and Innovation Department, Via dei Tizii 6, 00185 Roma, Italy
- University of Tuscia, Department of Ecological and Biological Sciences (DEB), Largo dell'Università snc, 01100 Viterbo, Italy
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington (CT) - 06032, United States of America
- Giorgio Valentini
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- CINI National Laboratory in Artificial Intelligence and Intelligent Systems - AIIS, Università di Roma, Via Ariosto 25, 00185 Roma, Italy
12
Cai M, Ran D, Zhang X. Advances in identifying coding variants of common complex diseases. JOURNAL OF BIO-X RESEARCH 2019. [DOI: 10.1097/jbr.0000000000000046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
13
Ariza MJ, Pérez-López C, Almagro F, Sánchez-Tévar AM, Muñiz-Grijalvo O, Álvarez-Sala Walter LA, Rioja J, Sánchez-Chaparro MÁ, Valdivielso P. Genetic variants in the LPL and GPIHBP1 genes, in patients with severe hypertriglyceridaemia, detected with high resolution melting analysis. Clin Chim Acta 2019; 500:163-171. [PMID: 31669931 DOI: 10.1016/j.cca.2019.10.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/12/2019] [Revised: 10/02/2019] [Accepted: 10/14/2019] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Pathogenic variants in lipoprotein lipase (LPL) and glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1 (GPIHBP1) have been described in patients with severe hypertriglyceridaemia. We aimed to optimise high resolution melting (HRM) assays to detect the presence of functional variants in these genes. METHODS One hundred and sixteen patients with severe hypertriglyceridaemia were studied. HRM assays were optimised to scan exons and splice junctions in LPL and GPIHBP1. Sanger sequencing was the reference method. Next-generation sequencing (NGS) was performed in five patients, including one with Familial Chylomicronemia syndrome (FCS). RESULTS We identified 15 different variants in LPL and 6 in GPIHBP1. The variants revealed with NGS were also detected with HRM, including a rare premature stop codon in LPL (p.Trp421*) and two LPL pathogenic variants in the patient with FCS (p.His80Arg + p.Gly215Glu). Having multiple functional variant alleles was associated with pancreatitis onset at younger ages and higher baseline triglycerides. CONCLUSIONS Our HRM assays detected the presence of functional gene variants that were confirmed with Sanger and NGS sequencing. The presence of multiple functional variant alleles was associated with differences in the clinical profile. Therefore, these assays represent a reliable, cost-effective tool that can be used to complement the NGS approach for gene scanning.
Affiliation(s)
- María José Ariza
- Department of Medicine and Dermatology, Lipids and Atherosclerosis Laboratory, Centro de Investigaciones Médico Sanitarias (CIMES), Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, C/Marqués de Beccaria n° 3, 29010 Málaga, Spain.
- Carmen Pérez-López
- Internal Medicine Unit, University Hospital Virgen de la Victoria, Campus de Teatinos, S/N, 29010 Málaga, Spain
- Fátima Almagro
- Lipids Unit, Internal Medicine, University Hospital Donostia, San Sebastian, Begiristain Doktorea Pasealekua, 107-115, 20014 Donostia, Gipuzkoa, Spain
- Ana María Sánchez-Tévar
- Department of Medicine and Dermatology, Lipids and Atherosclerosis Laboratory, Centro de Investigaciones Médico Sanitarias (CIMES), Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, C/Marqués de Beccaria n° 3, 29010 Málaga, Spain
- Ovidio Muñiz-Grijalvo
- UCERV-UCAMI, Internal Medicine Department, University Hospital Virgen del Rocío, Av. Manuel Siurot, S/n, 41013 Sevilla, Spain
- Luis Antonio Álvarez-Sala Walter
- Lipids Unit, Internal Medicine, Hospital General Universitario Gregorio Marañón, IiSGM, Calle del Dr. Esquerdo, 46, 28007 Madrid, Spain; Department of Medicine, School of Medicine, Universidad Complutense, Av. Séneca, 2, 28040 Madrid, Spain
- José Rioja
- Department of Medicine and Dermatology, Lipids and Atherosclerosis Laboratory, Centro de Investigaciones Médico Sanitarias (CIMES), Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, C/Marqués de Beccaria n° 3, 29010 Málaga, Spain
- Miguel Ángel Sánchez-Chaparro
- Department of Medicine and Dermatology, Lipids and Atherosclerosis Laboratory, Centro de Investigaciones Médico Sanitarias (CIMES), Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, C/Marqués de Beccaria n° 3, 29010 Málaga, Spain; Internal Medicine Unit, University Hospital Virgen de la Victoria, Campus de Teatinos, S/N, 29010 Málaga, Spain
- Pedro Valdivielso
- Department of Medicine and Dermatology, Lipids and Atherosclerosis Laboratory, Centro de Investigaciones Médico Sanitarias (CIMES), Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, C/Marqués de Beccaria n° 3, 29010 Málaga, Spain; Internal Medicine Unit, University Hospital Virgen de la Victoria, Campus de Teatinos, S/N, 29010 Málaga, Spain
14
Müller H, Jimenez-Heredia R, Krolo A, Hirschmugl T, Dmytrus J, Boztug K, Bock C. VCF.Filter: interactive prioritization of disease-linked genetic variants from sequencing data. Nucleic Acids Res 2017; 45:W567-W572. [PMID: 28520890 PMCID: PMC5570181 DOI: 10.1093/nar/gkx425] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/28/2017] [Accepted: 05/04/2017] [Indexed: 02/07/2023]
Abstract
Next generation sequencing is widely used to link genetic variants to diseases, and it has massively accelerated the diagnosis and characterization of rare genetic diseases. After initial bioinformatic data processing, the interactive analysis of genome, exome, and panel sequencing data typically starts from lists of genetic variants in VCF format. Medical geneticists filter and annotate these lists to identify variants that may be relevant for the disease under investigation, or to select variants that are reported in a clinical diagnostics setting. We developed VCF.Filter to facilitate the search for disease-linked variants, providing a standalone Java program with a user-friendly interface for interactive variant filtering and annotation. VCF.Filter allows the user to define a broad range of filtering criteria through a graphical interface. Common workflows such as trio analysis and cohort-based filtering are pre-configured, and more complex analyses can be performed using VCF.Filter's support for custom annotations and filtering criteria. All filtering is documented in the results file, thus providing traceability of the interactive variant prioritization. VCF.Filter is an open source tool that is freely and openly available at http://vcffilter.rarediseases.at.
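To illustrate what a pre-configured trio filter does, the following hypothetical Python sketch classifies one VCF data line by the genotype (`GT`) calls of a child and both parents. It is not VCF.Filter's Java implementation: the two rules (de novo candidate: the child carries an alternate allele that neither parent carries; recessive candidate: the child is homozygous alternate and both parents are heterozygous), the function names and the label strings are assumptions chosen for the example. A real trio analysis would also weigh read depth, genotype quality and multi-allelic sites.

```python
from typing import Optional, Tuple


def parse_gt(sample: str, fmt: str) -> Optional[Tuple[int, int]]:
    """Return the diploid GT of one VCF sample column, or None if missing or not diploid."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    alleles = fields.get("GT", ".").replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return int(alleles[0]), int(alleles[1])


def n_alt(gt: Tuple[int, int]) -> int:
    """Number of non-reference alleles in a genotype (0, 1 or 2)."""
    return sum(1 for allele in gt if allele > 0)


def classify_trio(line: str, child: int, mother: int, father: int) -> Optional[str]:
    """Classify one tab-separated VCF data line; child/mother/father are
    0-based indices into the sample columns (VCF column 10 onward)."""
    cols = line.rstrip("\n").split("\t")
    fmt, samples = cols[8], cols[9:]  # column 9 is FORMAT, samples follow
    gts = [parse_gt(samples[i], fmt) for i in (child, mother, father)]
    if any(gt is None for gt in gts):
        return None  # missing calls: leave for manual review
    c, m, f = (n_alt(gt) for gt in gts)
    if c >= 1 and m == 0 and f == 0:
        return "de_novo_candidate"
    if c == 2 and m == 1 and f == 1:
        return "recessive_candidate"
    return None


if __name__ == "__main__":
    line = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t0/0:25\t0/0:28"
    print(classify_trio(line, child=0, mother=1, father=2))  # de_novo_candidate
```

Returning `None` for missing genotypes is a deliberately conservative choice for the sketch; an interactive tool would more likely let the user decide how missing calls are treated.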
Affiliation(s)
- Heiko Müller
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria; Fondazione Istituto Italiano di Tecnologia, 16163 Genoa, Italy
- Raul Jimenez-Heredia
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
- Ana Krolo
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
- Tatjana Hirschmugl
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
- Jasmin Dmytrus
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
- Kaan Boztug
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria; CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, 1090 Vienna, Austria; St. Anna Kinderspital and Children's Cancer Research Institute, Department of Pediatrics, Medical University of Vienna, 1090 Vienna, Austria
- Christoph Bock
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria; CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
15
Abstract
Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Philosophical Transactions of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers’ experimental work builds upon years and (collectively) billions of dollars’ worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources’ development and management.
16
Shameer K, Johnson KW, Glicksberg BS, Dudley JT, Sengupta PP. Machine learning in cardiovascular medicine: are we there yet? Heart 2018; 104:1156-1164. [PMID: 29352006 DOI: 10.1136/heartjnl-2017-311198] [Citation(s) in RCA: 240] [Impact Index Per Article: 34.3] [Received: 07/10/2017] [Revised: 12/19/2017] [Accepted: 12/21/2017] [Indexed: 12/11/2022]
Abstract
Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strengths and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine.
Affiliation(s)
- Khader Shameer
- Departments of Medical Informatics and Research Informatics, Northwell Health, Great Neck, New York, USA; Institute for Next Generation Healthcare, Mount Sinai Health System, New York City, New York, USA; Icahn Institute for Genomics and Multiscale Biology, Mount Sinai Health System, New York City, New York, USA; Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York City, New York, USA; Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA; Center for Research Informatics and Innovation, Northwell Health, New Hyde Park, NY, USA
- Kipp W Johnson
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York City, New York, USA; Icahn Institute for Genomics and Multiscale Biology, Mount Sinai Health System, New York City, New York, USA; Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York City, New York, USA; Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Benjamin S Glicksberg
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York City, New York, USA; Icahn Institute for Genomics and Multiscale Biology, Mount Sinai Health System, New York City, New York, USA; Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York City, New York, USA; Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA; Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
- Joel T Dudley
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York City, New York, USA; Icahn Institute for Genomics and Multiscale Biology, Mount Sinai Health System, New York City, New York, USA; Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York City, New York, USA; Icahn School of Medicine at Mount Sinai, Mount Sinai Health System, New York City, New York, USA
- Partho P Sengupta
- Division of Cardiology, West Virginia Heart and Vascular Institute, Morgantown, West Virginia, USA
17
Shameer K, Nayarisseri A, Duran FXR, González-Díaz H. Editorial: Improving Neuropharmacology using Big Data, Machine Learning and Computational Algorithms. Curr Neuropharmacol 2017; 15:1058-1061. [PMID: 29199918 PMCID: PMC5725537 DOI: 10.2174/1570159x1508171114113425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022]
Affiliation(s)
- Khader Shameer
- Institute of Next Generation Healthcare (INGH), Icahn Institute of Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Mount Sinai Health System, USA
- Anuraj Nayarisseri
- Bioinformatics Research Laboratory, Eminent Biosciences, Vijaynagar, Indore-, India
- In silico Research Laboratory, Legene Biosciences, Vijaynagar, Indore-, India
- Humberto González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Biscay, Spain
- IKERBASQUE, Basque Foundation for Science, Spain
18
Cheng SJ, Shi FY, Liu H, Ding Y, Jiang S, Liang N, Gao G. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Res 2017; 45:e82. [PMID: 28158838 PMCID: PMC5449550 DOI: 10.1093/nar/gkx041] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/12/2016] [Accepted: 01/24/2017] [Indexed: 02/07/2023]
Abstract
In genomics, effectively identifying the biological effects of genetic variants is crucial. Current methods handle each variant independently, assuming that each variant acts in a context-free manner. However, variants within the same gene may interfere with each other, producing combinational (compound) rather than individual effects. In this work, we introduce COPE, a gene-centric variant annotation tool that integrates the entire sequential context in evaluating the functional effects of intra-genic variants. Applying COPE to the 1000 Genomes dataset, we identified numerous cases of multiple-variant compound effects that frequently led to false-positive and false-negative loss-of-function calls by conventional variant-centric tools. Specifically, 64 disease-causing mutations were found to be rescued in a specific genomic context, potentially buffering the effects of highly penetrant deleterious mutations. COPE is freely available for academic use at http://cope.cbi.pku.edu.cn.
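The compound-effect problem can be made concrete with a hypothetical two-variant example inside one codon, sketched in Python below. It is not COPE's algorithm, which considers the full gene context; the toy coding sequence, the variant positions, the assumption that both variants lie on the same haplotype (in cis) and the simplified consequence labels are all illustrative. A C>T change at the first base of a CAG (Gln) codon creates TAG, a stop codon, when annotated on its own; combined with an A>G change at the second base, the codon becomes TGG (Trp), so the haplotype is missense rather than loss-of-function.

```python
from typing import Dict, Tuple

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

Variants = Dict[int, Tuple[str, str]]  # 0-based CDS position -> (ref, alt)


def translate(cds: str) -> str:
    """Translate the complete codons of a coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_variants(cds: str, variants: Variants) -> str:
    """Apply SNVs that are assumed to lie on the same haplotype (in cis)."""
    seq = list(cds)
    for pos, (ref, alt) in variants.items():
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


def consequence(cds: str, variants: Variants) -> str:
    """Simplified protein-level consequence of applying `variants` together."""
    ref_prot = translate(cds)
    alt_prot = translate(apply_variants(cds, variants))
    ref_stop = ref_prot.index("*") if "*" in ref_prot else len(ref_prot)
    alt_stop = alt_prot.index("*") if "*" in alt_prot else len(alt_prot)
    if alt_stop < ref_stop:
        return "stop_gained"
    return "synonymous" if alt_prot == ref_prot else "missense"


if __name__ == "__main__":
    cds = "ATGCAGAAATAA"           # toy CDS: Met-Gln-Lys-Stop
    v1 = {3: ("C", "T")}           # CAG -> TAG when annotated alone
    v2 = {4: ("A", "G")}           # CAG -> CGG when annotated alone
    print(consequence(cds, v1))            # stop_gained
    print(consequence(cds, {**v1, **v2}))  # missense (CAG -> TGG, Trp)
```

Annotating `v1` alone (variant-centric) reports a premature stop, while annotating the haplotype as a whole (gene-centric) reports a missense change; this is the kind of false-positive loss-of-function call the abstract describes.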
Affiliation(s)
- Si-Jin Cheng
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Huan Liu
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Yang Ding
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Shuai Jiang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
19
van Ooijen MP, Jong VL, Eijkemans MJC, Heck AJR, Andeweg AC, Binai NA, van den Ham HJ. Identification of differentially expressed peptides in high-throughput proteomics data. Brief Bioinform 2017; 19:971-981. [DOI: 10.1093/bib/bbx031] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/06/2016] [Indexed: 12/25/2022]
Affiliation(s)
- Victor L Jong
- Department of Biostatistics and Research Support, Julius Center, UMC Utrecht, Netherlands
- Marinus J C Eijkemans
- Julius Center for Health Sciences and Primary Care of the University Medical Center Utrecht, Netherlands
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Netherlands
- Arno C Andeweg
- Department of Viroscience, Erasmus MC, CA Rotterdam, Netherlands
- Nadine A Binai
- Biomolecular Mass Spectrometry Group, Utrecht University, Netherlands
20
Uversky VN. Intrinsically disordered proteins in overcrowded milieu: Membrane-less organelles, phase separation, and intrinsic disorder. Curr Opin Struct Biol 2016; 44:18-30. [PMID: 27838525 DOI: 10.1016/j.sbi.2016.10.015] [Citation(s) in RCA: 478] [Impact Index Per Article: 53.1] [Received: 07/28/2016] [Revised: 10/08/2016] [Accepted: 10/25/2016] [Indexed: 12/22/2022]
Abstract
Although the cellular interior is crowded with various biological macromolecules, the distribution of these macromolecules is highly inhomogeneous. Eukaryotic cells contain numerous proteinaceous membrane-less organelles (PMLOs), which are condensed liquid droplets formed as a result of the reversible and highly controlled liquid-liquid phase transitions. The interior of these cellular bodies represents an overcrowded milieu, since their protein concentrations are noticeably higher than those of the crowded cytoplasm and nucleoplasm. PMLOs are different in size, shape, and composition, and almost invariantly contain intrinsically disordered proteins (e.g., eIF4B and TDP43 in stress granules, TTP in P-bodies, RDE-12 in nuage, RNG105 in RNA granules, centrins in centrosomes, NOPP140 in nucleoli, SRSF4 in nuclear speckles, Saf-B in nuclear stress bodies, NOLC1 in Cajal bodies, CBP in PML nuclear bodies, SOX9 in paraspeckles, KSRP in perinucleolar compartment, and hnRNPG and Sam68 in Sam68 nuclear body, to name a few), which indicates that the formation of these phase-separated droplets is crucially dependent on intrinsic disorder. The goal of this review is to show the roles of intrinsic disorder in the magic behind biological liquid-liquid phase transitions that lead to the formation of PMLOs.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation.