Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gruenstaeudl M, Jenke N. PACVr: plastome assembly coverage visualization in R. BMC Bioinformatics 2020;21:207. [PMID: 32448146 PMCID: PMC7245912 DOI: 10.1186/s12859-020-3475-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/06/2019] [Accepted: 03/31/2020] [Indexed: 11/10/2022]

Cited by Other Article(s)

Moffett AS, Falcón-Cortés A, Di Pierro M. Quantifying the influence of genetic context on duplicated mammalian genes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.04.03.647042. [PMID: 40236061 PMCID: PMC11996522 DOI: 10.1101/2025.04.03.647042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2025]

Li X, Nguyen J, Korkut A. Recurrent Composite Markers of Cell Types and States. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2023.07.17.549344. [PMID: 37503180 PMCID: PMC10370072 DOI: 10.1101/2023.07.17.549344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]

Kuluev AR, Matniyazov RT, Kuluev BR, Chemeris DA, Chemeris AV. Complete chloroplast genomes of five Aegilops aucheri Boiss. accessions having different geographical origins. Mitochondrial DNA A DNA Mapp Seq Anal 2025;35:119-125. [PMID: 40074559 DOI: 10.1080/24701394.2025.2476401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 03/03/2025] [Indexed: 03/14/2025]

Li Y, Xiao P, Boadu F, Goldkamp AK, Nirgude S, Cheng J, Hagen DE, Kalish JM, Rivera RM. Beckwith-Wiedemann syndrome and large offspring syndrome involve alterations in methylome, transcriptome, and chromatin configuration. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2023.12.14.23299981. [PMID: 38168424 PMCID: PMC10760283 DOI: 10.1101/2023.12.14.23299981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]

Satas G, Myers MA, McPherson A, Shah SP. Inferring active mutational processes in cancer using single cell sequencing and evolutionary constraints. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.24.639589. [PMID: 40060559 PMCID: PMC11888314 DOI: 10.1101/2025.02.24.639589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2025]

Abstract

Ongoing mutagenesis in cancer drives genetic diversity throughout the natural history of cancers. As the activities of mutational processes are dynamic throughout evolution, distinguishing the mutational signatures of 'active' and 'historical' processes has important implications for studying how tumors evolve. This can aid in understanding mutagenic states at the time of presentation, and in associating active mutational processes with therapeutic resistance. As bulk sequencing primarily captures historical mutational processes, we studied whether ultra-low-coverage single-cell whole-genome sequencing (scWGS), which measures the distribution of mutations across hundreds or thousands of individual cells, could enable the distinction between historical and active mutational processes. While technical challenges and data sparsity have limited mutation analysis in scWGS, we show that these data contain valuable information about dynamic mutational processes. To robustly interpret single nucleotide variants (SNVs) in scWGS, we introduce ArtiCull, a method to identify and remove SNV artifacts by leveraging evolutionary constraints, enabling reliable detection of mutations for signature analysis. Applying this approach to scWGS data from pancreatic ductal adenocarcinoma (PDAC), triple-negative breast cancer (TNBC), and high-grade serous ovarian cancer (HGSOC), we uncover temporal and spatial patterns in mutational processes. In PDAC, we observe a temporal increase in mismatch repair deficiency (MMRd). In cisplatin-treated TNBC patient-derived xenografts, we identify therapy-induced mutagenesis and inactivation of APOBEC3 activity. In HGSOC, we show distinct patterns of APOBEC3 mutagenesis, including late tumor-wide activation in one case and clade-specific enrichment in another. Additionally, we detect a clone-specific increase in SBS17 activity, in a clone previously linked to recurrence. Our findings establish ultra-low-coverage scWGS as a powerful approach for studying active mutational processes that may influence ongoing clonal evolution and therapeutic resistance.

Kumari P, Friedman RZ, Pi L, Curtis SW, Paraiso K, Visel A, Rhea L, Dunnwald M, Patni AP, Mar D, Bomsztyk K, Mathieu J, Ruohola-Baker H, Leslie EJ, White MA, Cohen BA, Cornell RA. Identification of functional non-coding variants associated with orofacial cleft. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.06.01.596914. [PMID: 40027800 PMCID: PMC11870446 DOI: 10.1101/2024.06.01.596914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]

Sant C, Mucke L, Corces MR. CHOIR improves significance-based detection of cell types and states from single-cell data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.01.18.576317. [PMID: 38328105 PMCID: PMC10849522 DOI: 10.1101/2024.01.18.576317] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/09/2024]

Czech E, Millar TR, Tyler W, White T, Elsworth B, Guez J, Hancox J, Jeffery B, Karczewski KJ, Miles A, Tallman S, Unneberg P, Wojdyla R, Zabad S, Hammerbacher J, Kelleher J. Analysis-ready VCF at Biobank scale using Zarr. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.06.11.598241. [PMID: 38915693 PMCID: PMC11195102 DOI: 10.1101/2024.06.11.598241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]

Abstract

Background

Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasises efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. Biobank scale datasets currently available consist of hundreds of thousands of whole genomes and hundreds of terabytes of compressed VCF. Row-wise data storage is fundamentally unsuitable and a more scalable approach is needed.

Results

Zarr is a format for storing multi-dimensional data that is widely used across the sciences, and is ideally suited to massively parallel processing. We present the VCF Zarr specification, an encoding of the VCF data model using Zarr, along with fundamental software infrastructure for efficient and reliable conversion at scale. We show how this format is far more efficient than standard VCF based approaches, and competitive with specialised methods for storing genotype data in terms of compression ratios and single-threaded calculation performance. We present case studies on subsets of three large human datasets (Genomics England: n=78,195; Our Future Health: n=651,050; All of Us: n=245,394) along with whole genome datasets for Norway Spruce (n=1,063) and SARS-CoV-2 (n=4,484,157). We demonstrate the potential for VCF Zarr to enable a new generation of high-performance and cost-effective applications via illustrative examples using cloud computing and GPUs.

Conclusions

Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores, while maintaining compatibility with existing file-oriented workflows.

Affiliation(s)

Eric Czech: Open Athena AI Foundation, Lincoln, New Zealand; Related Sciences, Lincoln, New Zealand
Timothy R. Millar: The New Zealand Institute for Plant & Food Research Ltd, Lincoln, New Zealand; Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
Will Tyler: Independent researcher, Manchester, UK
Tom White: Tom White Consulting Ltd., Manchester, UK
Benjamin Elsworth: Our Future Health, Manchester, UK
Jérémy Guez: Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
Jonny Hancox: NVIDIA Ltd, Reading, UK
Ben Jeffery: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
Konrad J. Karczewski: Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA; Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Alistair Miles: Wellcome Sanger Institute, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Sam Tallman: Genomics England, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Per Unneberg: Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Rafal Wojdyla: Open Athena AI Foundation, Lincoln, New Zealand
Shadi Zabad: School of Computer Science, McGill University, Montreal, QC, Canada
Jeff Hammerbacher: Open Athena AI Foundation, Lincoln, New Zealand; Related Sciences, Lincoln, New Zealand
Jerome Kelleher: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK

Koyyalagunta D, Ganesh K, Morris Q. Inferring cancer type-specific patterns of metastatic spread using Metient. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.07.09.602790. [PMID: 39282311 PMCID: PMC11398359 DOI: 10.1101/2024.07.09.602790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]

Li Q, Nichols C, Welner RS, Chen JY, Ku WS, Yue Z. Toden-E: Topology-Based and Density-Based Ensembled Clustering for the Development of Super-PAG in Functional Genomics using PAG Network and LLM. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.20.619308. [PMID: 39484450 PMCID: PMC11526983 DOI: 10.1101/2024.10.20.619308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]

Abstract

The integrative analysis of gene sets, networks, and pathways is pivotal for deciphering omics data in translational biomedical research. To significantly increase gene coverage and enhance the utility of pathways, annotated gene lists, and gene signatures from diverse sources, we introduced pathways, annotated gene lists, and gene signatures (PAGs) enriched with metadata to represent biological functions. Furthermore, we established PAG-PAG networks by leveraging gene member similarity and gene regulations. However, in practice, high similarity in functional descriptions or gene membership often leads to redundant PAGs, which makes an enriched PAG list fuzzy and hard to interpret. In this study, we developed todenE (topology-based and density-based ensemble) clustering, a pioneering approach that integrates topology-based and density-based clustering methods to detect PAG communities by leveraging the PAG network and large language models (LLMs). In computational genomics annotation, genes can be grouped or clustered through gene relationships and gene functions via guilt by association. Similarly, PAGs can be grouped into higher-level clusters, forming concise functional representations called Super-PAGs. TodenE captures PAG-PAG similarity and encapsulates functional information through LLMs to characterize network-based functional Super-PAGs. In synthetic data, we introduced a metric called the Disparity Index (DI), which measures the connectivity of gene neighbors to gauge clusterability. We compared multiple clustering algorithms to identify the best method for generating performance-driven clusters. In non-simulated data (Gene Ontology), by leveraging transfer learning and LLMs, we formed a language-based similarity embedding. TodenE utilizes this embedding together with the topology-based embedding to generate putative Super-PAGs with superior performance in semantic and gene member inclusiveness.

Turner TC, Pittman FS, Zhang H, Hymel LA, Zheng T, Behara M, Anderson SE, Harrer JA, Link KA, Ahammed MA, Maner-Smith K, Liu X, Yin X, Lim HS, Spite M, Qiu P, García AJ, Mortensen LJ, Jang YC, Willett NJ, Botchwey EA. Improving Functional Muscle Regeneration in Volumetric Muscle Loss Injuries by Shifting the Balance of Inflammatory and Pro-Resolving Lipid Mediators. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.06.611741. [PMID: 39314313 PMCID: PMC11418947 DOI: 10.1101/2024.09.06.611741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]

Sampaio IW, Tassi E, Bellani M, Benedetti F, Nenadic I, Phillips M, Piras F, Yatham L, Bianchi AM, Brambilla P, Maggioni E. A generalizable normative deep autoencoder for brain morphological anomaly detection: application to the multi-site StratiBip dataset on bipolar disorder in an external validation framework. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.04.611239. [PMID: 39282436 PMCID: PMC11398360 DOI: 10.1101/2024.09.04.611239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2024]

Lehle JD, Lin YH, Gomez A, Chavez L, McCarrey JR. Endocrine disruptor-induced epimutagenesis in vitro: Insight into molecular mechanisms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.05.574355. [PMID: 38746310 PMCID: PMC11092511 DOI: 10.1101/2024.01.05.574355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Abstract

Endocrine disrupting chemicals (EDCs) such as bisphenol S (BPS) are xenobiotic compounds that can disrupt endocrine signaling following exposure due to steric similarities to endogenous hormones within the body. EDCs have been shown to induce disruptions in normal epigenetic programming (epimutations) that accompany dysregulation of normal gene expression patterns that appear to predispose disease states. Most interestingly, the prevalence of epimutations following exposure to many different EDCs often persists over multiple subsequent generations, even with no further exposure to the causative EDC. Many previous studies have described both the direct and prolonged effects of EDC exposure in animal models, but many questions remain about molecular mechanisms by which EDCs initially induce epimutations or contribute to the propagation of EDC-induced epimutations either within the exposed generation or to subsequent generations. Additional questions remain regarding the extent to which there may be differences in cell-type specific susceptibilities to various EDCs, and whether this susceptibility is correlative with expression of relevant hormone receptors and/or the location of relevant hormone response elements (HREs) in the genome. To address these questions, we exposed cultured mouse pluripotent (induced pluripotent stem [iPS]), somatic (Sertoli and granulosa), and germ (primordial germ cell like [PGCLC]) cells to BPS and measured changes in DNA methylation levels at the epigenomic level and gene expression at the transcriptomic level. We found that there was indeed a difference in cell-type specific susceptibility to EDC-induced epimutagenesis and that this susceptibility correlated with differential expression of relevant hormone receptors and, in many cases, tended to generate epimutations near relevant HREs within the genome. However, we also found that BPS can induce epimutations in a cell type that does not express relevant receptors and in genomic regions that do not contain relevant HREs, suggesting that both canonical and non-canonical signaling mechanisms can be disrupted by BPS exposure. Most interestingly, we found that when iPS cells were exposed to BPS and then induced to differentiate into PGCLCs, the prevalence of epimutations and differentially expressed genes (DEGs) initially induced in the iPS cells was largely retained in the resulting PGCLCs; however, >90% of the specific epimutations and DEGs were not conserved but were rather replaced by novel epimutations and DEGs following the iPS cell to PGCLC transition. These results are consistent with a unique concept that many EDC-induced epimutations may normally be corrected by germline and/or embryonic epigenetic reprogramming but that, due to disruption of the underlying chromatin architecture induced by the EDC exposure, many novel epimutations may emerge during the reprogramming process as well. Thus, it appears that following exposure to a disruptive agent such as an EDC, a prevalence of epimutations may transcend epigenetic reprogramming even though most individual epimutations are not conserved during this process.

Liu Y, Carbonetto P, Willwerscheid J, Oakes SA, Macleod KF, Stephens M. Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.15.553436. [PMID: 37645713 PMCID: PMC10462040 DOI: 10.1101/2023.08.15.553436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Belica CA, Carpenter MA, Chen Y, Brown WL, Moeller NH, Boylan IT, Harris RS, Aihara H. A real-time biochemical assay for quantitative analyses of APOBEC-catalyzed DNA deamination. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.11.593688. [PMID: 38766133 PMCID: PMC11100776 DOI: 10.1101/2024.05.11.593688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]

Affiliation(s)

Christopher A. Belica: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA, 55455; Institute for Molecular Virology, University of Minnesota, Minneapolis, Minnesota, 55455, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, 55455, USA
Michael A. Carpenter: Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, Texas, 78229, USA; Howard Hughes Medical Institute, University of Texas Health San Antonio, San Antonio, Texas, 78229, USA
Yanjun Chen: Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, Texas, 78229, USA
William L. Brown: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA, 55455; Institute for Molecular Virology, University of Minnesota, Minneapolis, Minnesota, 55455, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, 55455, USA
Nicholas H. Moeller: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA, 55455; Institute for Molecular Virology, University of Minnesota, Minneapolis, Minnesota, 55455, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, 55455, USA
Ian T. Boylan: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA, 55455; Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, 55455, USA
Reuben S. Harris: Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, Texas, 78229, USA; Howard Hughes Medical Institute, University of Texas Health San Antonio, San Antonio, Texas, 78229, USA
Hideki Aihara: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA, 55455; Institute for Molecular Virology, University of Minnesota, Minneapolis, Minnesota, 55455, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, 55455, USA

Tagami D, Bisschop G, Kelleher J. tstrait: a quantitative trait simulator for ancestral recombination graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.13.584790. [PMID: 38559118 PMCID: PMC10980058 DOI: 10.1101/2024.03.13.584790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]

Zeng X, Ding Y, Zhang Y, Uddin MR, Dabouei A, Xu M. DUAL: deep unsupervised simultaneous simulation and denoising for cryo-electron tomography. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.02.583135. [PMID: 38496657 PMCID: PMC10942334 DOI: 10.1101/2024.03.02.583135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]

Nwizu C, Hughes M, Ramseier ML, Navia AW, Shalek AK, Fusi N, Raghavan S, Winter PS, Amini AP, Crawford L. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.11.579839. [PMID: 38405697 PMCID: PMC10888887 DOI: 10.1101/2024.02.11.579839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]

Affiliation(s)

Chibuikem Nwizu: Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Warren Alpert Medical School of Brown University, Providence, RI, USA
Madeline Hughes: Microsoft Research, Cambridge, MA, USA
Michelle L. Ramseier: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Andrew W. Navia: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Alex K. Shalek: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
Nicolo Fusi: Microsoft Research, Cambridge, MA, USA
Srivatsan Raghavan: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Peter S. Winter: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Ava P. Amini: Microsoft Research, Cambridge, MA, USA
Lorin Crawford: Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Microsoft Research, Cambridge, MA, USA; Department of Biostatistics, Brown University, Providence, RI, USA

Wang Z, Zhan Q, Yang S, Mu S, Chen J, Garai S, Orzechowski P, Wagenaar J, Shen L. QOT: Efficient Computation of Sample Level Distance Matrix from Single-Cell Omics Data through Quantized Optimal Transport. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.06.578032. [PMID: 38370767 PMCID: PMC10871252 DOI: 10.1101/2024.02.06.578032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]

Muller E, Shiryan I, Borenstein E. Multi-omic integration of microbiome data for identifying disease-associated modules. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.03.547607. [PMID: 37461534 PMCID: PMC10349976 DOI: 10.1101/2023.07.03.547607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2023]

Abstract

The human gut microbiome is a complex ecosystem with profound implications for health and disease. This recognition has led to a surge in multi-omic microbiome studies, employing various molecular assays to elucidate the microbiome's role in diseases across multiple functional layers. However, despite the clear value of these multi-omic datasets, rigorous integrative analysis of such data poses significant challenges, hindering a comprehensive understanding of microbiome-disease interactions. Perhaps most notably, multiple approaches, including univariate and multivariate analyses, as well as machine learning, have been applied to such data to identify disease-associated markers, namely, specific features (e.g., species, pathways, metabolites) that are significantly altered in disease state. These methods, however, often yield extensive lists of features associated with the disease without effectively capturing the multi-layered structure of multi-omic data or offering clear, interpretable hypotheses about underlying microbiome-disease mechanisms. Here, we address this challenge by introducing MintTea, an intermediate integration-based method for analyzing multi-omic microbiome data. MintTea combines a canonical correlation analysis (CCA) extension, consensus analysis, and an evaluation protocol to robustly identify disease-associated multi-omic modules. Each such module consists of a set of features from the various omics that both shift in concord and collectively associate with the disease. Applying MintTea to diverse case-control cohorts with multi-omic data, we show that this framework is able to capture modules with high predictive power for disease, significant cross-omic correlations, and alignment with known microbiome-disease associations. For example, analyzing samples from a metabolic syndrome (MS) study, we found an MS-associated module comprising a highly correlated cluster of serum glutamate- and TCA cycle-related metabolites, as well as bacterial species previously implicated in insulin resistance. In another cohort, we identified a module associated with late-stage colorectal cancer, featuring Peptostreptococcus and Gemella species and several fecal amino acids, in agreement with these species' reported role in the metabolism of these amino acids and their coordinated increase in abundance during disease development. Finally, comparing modules identified in different datasets, we detected multiple significant overlaps, suggesting common interactions between microbiome features. Combined, this work serves as a proof of concept for the potential benefits of advanced integration methods in generating integrated multi-omic hypotheses underlying microbiome-disease interactions and a promising avenue for researchers seeking systems-level insights into coherent mechanisms governing microbiome-related diseases.

Adelus ML, Ding J, Tran BT, Conklin AC, Golebiewski AK, Stolze LK, Whalen MB, Cusanovich DA, Romanoski CE. Single cell 'omic profiles of human aortic endothelial cells in vitro and human atherosclerotic lesions ex vivo reveals heterogeneity of endothelial subtype and response to activating perturbations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.04.03.535495. [PMID: 37066416 PMCID: PMC10104082 DOI: 10.1101/2023.04.03.535495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]

Abstract

Objective

Endothelial cells (ECs), macrophages, and vascular smooth muscle cells (VSMCs) are major cell types in atherosclerosis progression, and heterogeneity in EC sub-phenotypes is becoming increasingly appreciated. Still, studies quantifying EC heterogeneity across whole transcriptomes and epigenomes in both in vitro and in vivo models are lacking.

Approach and Results

To create an in vitro dataset to study human EC heterogeneity, multiomic profiling concurrently measuring transcriptomes and accessible chromatin in the same single cells was performed on six distinct primary cultures of human aortic ECs (HAECs). To model pro-inflammatory and activating environments characteristic of the atherosclerotic microenvironment in vitro, HAECs from at least three donors were exposed to three distinct perturbations with their respective controls: transforming growth factor beta-2 (TGFB2), interleukin-1 beta (IL1B), and siRNA-mediated knock-down of the endothelial transcription factor ERG (siERG). To form a comprehensive in vivo/ex vivo dataset of human atherosclerotic cell types, meta-analysis of single cell transcriptomes across 17 human arterial specimens was performed. Two computational approaches quantitatively evaluated the similarity in molecular profiles between heterogeneous in vitro and in vivo cell profiles. HAEC cultures were reproducibly populated by 4 major clusters with distinct pathway enrichment profiles: EC1-angiogenic, EC2-proliferative, EC3-activated/mesenchymal-like, and EC4-mesenchymal. Exposure to siERG, IL1B or TGFB2 elicited mostly distinct transcriptional and accessible chromatin responses. EC1 and EC2, the most canonically 'healthy' EC populations, were affected predominantly by siERG; the activated cluster EC3 was most responsive to IL1B; and the mesenchymal population EC4 was most affected by TGFB2. Quantitative comparisons between in vitro and in vivo transcriptomes confirmed EC1 and EC2 as most canonically EC-like, and EC4 as most mesenchymal with minimal effects elicited by siERG and IL1B. Lastly, accessible chromatin regions unique to EC2 and EC4 were most enriched for coronary artery disease (CAD)-associated SNPs from GWAS, suggesting these cell phenotypes harbor CAD-modulating mechanisms.

Conclusion

Primary EC cultures contain markedly heterogeneous cell subtypes defined by their molecular profiles. Surprisingly, the perturbations used here, which have been reported by others to be involved in the pathogenesis of atherosclerosis as well as to induce endothelial-to-mesenchymal transition (EndMT), only modestly shifted cells between subpopulations, suggesting relatively stable molecular phenotypes in culture. Identifying consistently heterogeneous EC subpopulations between in vitro and in vivo models should pave the way for improving in vitro systems while enabling study of the mechanisms governing heterogeneous cell state decisions.

Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]

Shen Y, Yu L, Qiu Y, Zhang T, Kingsford C. Improving Hi-C contact matrices using genome graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.08.566275. [PMID: 37986943 PMCID: PMC10659349 DOI: 10.1101/2023.11.08.566275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]

Dhakal A, Gyawali R, Wang L, Cheng J. CryoTransformer: A Transformer Model for Picking Protein Particles from Cryo-EM Micrographs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.19.563155. [PMID: 37961171 PMCID: PMC10634673 DOI: 10.1101/2023.10.19.563155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2023]

Hristov BH, Noble WS, Bertero A. Systematic identification of inter-chromosomal interaction networks supports the existence of RNA factories. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.21.558852. [PMID: 37790381 PMCID: PMC10542540 DOI: 10.1101/2023.09.21.558852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]

Alston JJ, Soranno A, Holehouse AS. Conserved molecular recognition by an intrinsically disordered region in the absence of sequence conservation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.06.552128. [PMID: 37609146 PMCID: PMC10441348 DOI: 10.1101/2023.08.06.552128] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 08/24/2023]

Hope J, Beckerle T, Cheng PH, Viavattine Z, Feldkamp M, Fausner S, Saxena K, Ko E, Hryb I, Carter R, Ebner T, Kodandaramaiah S. Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.04.543578. [PMID: 37333228 PMCID: PMC10274744 DOI: 10.1101/2023.06.04.543578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]

Abstract

Complex behaviors are mediated by neural computations occurring throughout the brain. In recent years, tremendous progress has been made in developing technologies that can record neural activity at cellular resolution at multiple spatial and temporal scales. However, these technologies are primarily designed for studying the mammalian brain during head fixation - wherein the behavior of the animal is highly constrained. Miniaturized devices for studying neural activity in freely behaving animals are largely confined to recording from small brain regions owing to performance limitations. We present a cranial exoskeleton that assists mice in maneuvering neural recording headstages that are orders of magnitude larger and heavier than the mice, while they navigate physical behavioral environments. Force sensors embedded within the headstage are used to detect the mouse's milli-Newton scale cranial forces which then control the x, y, and yaw motion of the exoskeleton via an admittance controller. We discovered optimal controller tuning parameters that enable mice to locomote at physiologically realistic velocities and accelerations while maintaining natural walking gait. Mice maneuvering headstages weighing up to 1.5 kg can make turns, navigate 2D arenas, and perform a navigational decision-making task with the same performance as when freely behaving. We designed an imaging headstage and an electrophysiology headstage for the cranial exoskeleton to record brain-wide neural activity in mice navigating 2D arenas. The imaging headstage enabled recordings of Ca2+ activity of 1000s of neurons distributed across the dorsal cortex. The electrophysiology headstage supported independent control of up to 4 silicon probes, enabling simultaneous recordings from 100s of neurons across multiple brain regions and multiple days. 
Cranial exoskeletons provide flexible platforms for large-scale neural recording during the exploration of physical spaces, a critical new paradigm for unraveling the brain-wide neural mechanisms that control complex behavior.

Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh WG, Zhang R. Repurposing Non-pharmacological Interventions for Alzheimer's Diseases through Link Prediction on Biomedical Literature. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.05.15.23290002. [PMID: 37292731 PMCID: PMC10246059 DOI: 10.1101/2023.05.15.23290002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]

Abstract

Recently, computational drug repurposing has emerged as a promising method for identifying new pharmaceutical interventions (PI) for Alzheimer's Disease (AD). Non-pharmaceutical interventions (NPI), such as Vitamin E and Music therapy, have great potential to improve cognitive function and slow the progression of AD, but remain largely unexplored. This study predicts novel NPIs for AD through link prediction on our developed biomedical knowledge graph. We constructed a comprehensive knowledge graph containing AD concepts and various potential interventions, called ADInt, by integrating a dietary supplement domain knowledge graph, SuppKG, with semantic relations from SemMedDB. Four knowledge graph embedding models (TransE, RotatE, DistMult and ComplEx) and two graph convolutional network models (R-GCN and CompGCN) were compared to learn the representation of ADInt. R-GCN outperformed the other models on the time slice test set and the clinical trial test set and was used to generate the score tables of the link prediction task. Discovery patterns were applied to generate mechanism pathways for high scoring triples. Our ADInt had 162,213 nodes and 1,017,319 edges. The graph convolutional network model, R-GCN, performed best in both the Time Slicing test set (MR = 7.099, MRR = 0.5007, Hits@1 = 0.4112, Hits@3 = 0.5058, Hits@10 = 0.6804) and the Clinical Trials test set (MR = 1.731, MRR = 0.8582, Hits@1 = 0.7906, Hits@3 = 0.9033, Hits@10 = 0.9848). Among high scoring triples in the link prediction results, we found the plausible mechanism pathways of (Photodynamic therapy, PREVENTS, Alzheimer's Disease) and (Choerospondias axillaris, PREVENTS, Alzheimer's Disease) by discovery patterns and discussed them further. In conclusion, we presented a novel methodology to extend an existing knowledge graph and discover NPIs (dietary supplements (DS) and complementary and integrative health (CIH)) for AD.
We used discovery patterns to find mechanisms for predicted triples to solve the poor interpretability of artificial neural networks. Our method can potentially be applied to other clinical problems, such as discovering drug adverse reactions and drug-drug interactions.
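The link prediction metrics quoted in this abstract (MR, MRR, Hits@k) follow standard definitions over the rank each true triple receives among candidate triples. The sketch below is illustrative only and assumes those standard definitions; the function name `ranking_metrics` and the example ranks are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MR, MRR and Hits@k from 1-based ranks of the true triples.

    `ranks` is a hypothetical input: one rank per test triple, where rank 1
    means the true triple was scored highest among all candidates.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                     # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,   # mean reciprocal rank (higher is better)
    }
    for k in ks:
        # Fraction of test triples ranked within the top k
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks for four test triples, for illustration only
example_ranks = [1, 2, 4, 10]
result = ranking_metrics(example_ranks)
```

Because MR averages raw ranks, a few badly ranked triples can dominate it, whereas MRR and Hits@k are bounded in [0, 1]; this is one reason the three are usually reported together.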

Dhakal A, Gyawali R, Wang L, Cheng J. CryoPPP: A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.21.529443. [PMID: 36865277 PMCID: PMC9980126 DOI: 10.1101/2023.02.21.529443] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/25/2023]

Zhang J, Singh R. Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.24.525447. [PMID: 36747724 PMCID: PMC9900775 DOI: 10.1101/2023.01.24.525447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]

Kinnaman MD, Zaccaria S, Makohon-Moore A, Arnold B, Levine M, Gundem G, Ossa JEA, Glodzik D, Rodríguez-Sánchez MI, Bouvier N, Li S, Stockfisch E, Dunigan M, Cobbs C, Bhanot U, You D, Mullen K, Melchor J, Ortiz MV, O'Donohue T, Slotkin E, Wexler LH, Dela Cruz FS, Hameed M, Glade Bender JL, Tap WD, Meyers PA, Papaemmanuil E, Kung AL, Iacobuzio-Donahue CA. Subclonal somatic copy number alterations emerge and dominate in recurrent osteosarcoma. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522765. [PMID: 36711976 PMCID: PMC9881990 DOI: 10.1101/2023.01.05.522765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]

Affiliation(s)

Michael D Kinnaman Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Simone Zaccaria Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK Computational Cancer Genomics Research Group, University College London Cancer Institute, London, UK
Alvin Makohon-Moore Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA Hackensack Meridian Health Center for Discovery and Innovation, Nutley, NJ, USA (current affiliation) Georgetown University Lombardi Comprehensive Cancer Center, Washington, DC, USA (current affiliation)
Brian Arnold Department of Computer Science, Princeton University, Princeton, NJ, USA Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA
Max Levine Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Isabl, New York, NY, USA (current affiliation)
Gunes Gundem Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Juan E Arango Ossa Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Dominik Glodzik Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA (current affiliation)
M Irene Rodríguez-Sánchez Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Wunderman Thompson Health, New York, NY, USA (current affiliation)
Nancy Bouvier Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA IT and Digital Initiatives, Memorial Sloan Kettering Cancer Center, New York, NY, USA (current affiliation)
Shanita Li Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Emily Stockfisch Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Marisa Dunigan Integrated Genomics Operation Core, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Cassidy Cobbs Integrated Genomics Operation Core, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Umesh Bhanot Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA Precision Pathology Biobanking Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Daoqi You Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Katelyn Mullen Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY, USA
Jerry Melchor Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Michael V Ortiz Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Tara O'Donohue Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Emily Slotkin Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Leonard H Wexler Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Filemon S Dela Cruz Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Meera Hameed Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Julia L Glade Bender Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
William D Tap Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Paul A Meyers Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Elli Papaemmanuil Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Andrew L Kung Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Christine A Iacobuzio-Donahue Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA

Sashittal P, Zhang H, Iacobuzio-Donahue CA, Raphael BJ. ConDoR: Tumor phylogeny inference with a copy-number constrained mutation loss model. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522408. [PMID: 36711528 PMCID: PMC9882003 DOI: 10.1101/2023.01.05.522408] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]

Abstract

Tumors consist of subpopulations of cells that harbor distinct collections of somatic mutations. These mutations range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). While many approaches infer tumor phylogenies using SNVs as phylogenetic markers, CNAs that overlap SNVs may lead to erroneous phylogenetic inference. Specifically, an SNV may be lost in a cell due to a deletion of the genomic segment containing the SNV. Unfortunately, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs. For instance, recent targeted scDNA-seq technologies, such as Mission Bio Tapestri, measure SNVs with high fidelity in individual cells, but yield much less reliable measurements of CNAs. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers and partial information about CNAs in the form of clustering of cells with similar copy-number profiles. This copy-number clustering constrains where loss of SNVs can occur in the phylogeny. We develop ConDoR (Constrained Dollo Reconstruction), an algorithm to infer tumor phylogenies from targeted scDNA-seq data using the constrained k-Dollo model. We show that ConDoR outperforms existing methods on simulated data. We use ConDoR to analyze a new multi-region targeted scDNA-seq dataset of 2153 cells from a pancreatic ductal adenocarcinoma (PDAC) tumor and produce a more plausible phylogeny compared to existing methods that conforms to histological results for the tumor from a previous study. We also analyze a metastatic colorectal cancer dataset, deriving a more parsimonious phylogeny than previously published analyses and with a simpler monoclonal origin of metastasis compared to the original study.

Code availability

Software is available at https://github.com/raphael-group/constrained-Dollo.

Giorgashvili E, Reichel K, Caswara C, Kerimov V, Borsch T, Gruenstaeudl M. Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. FRONTIERS IN PLANT SCIENCE 2022;13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]

Characterization of the complete chloroplast genome of Zephyranthes phycelloides (Amaryllidaceae, tribe Hippeastreae) from Atacama region of Chile. Saudi J Biol Sci 2022;29:650-659. [PMID: 35002462 PMCID: PMC8716934 DOI: 10.1016/j.sjbs.2021.10.035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/21/2022]

Abstract

Sporadic rains in the Atacama Desert reveal a high biodiversity of plant species that only occur there. One of these rare species is the “Red añañuca” (Zephyranthes phycelloides), formerly known as Rhodophiala phycelloides. Many species of Zephyranthes in the Atacama Desert are dangerously threatened, due to massive extraction of bulbs and cutting of flowers. Therefore, studies of the biodiversity of these endemic species, which are essential for their conservation, should be conducted sooner rather than later. Some chloroplast genomes are available for Amaryllidaceae species; however, no complete chloroplast genome is available for any species of Zephyranthes subgenus Myostemma. The aim of the present work was to characterize and analyze the chloroplast genome of Z. phycelloides by NGS sequencing. The chloroplast genome of Z. phycelloides consists of 158,107 bp, with the typical quadripartite structure: a large single copy (LSC, 86,129 bp), a small single copy (SSC, 18,352 bp), and two inverted repeats (IR, 26,813 bp each). One hundred thirty-seven genes were identified: 87 coding genes, 8 rRNA, 38 tRNA and 4 pseudogenes. The number of SSRs was 64 in Z. phycelloides and a total of 43 repeats were detected. The phylogenetic analysis of Z. phycelloides shows a distinct subclade with respect to Z. mesochloa. The average nucleotide variability (Pi) between Z. phycelloides and Z. mesochloa was 0.02000, and seven loci with high variability were identified: psbA, trnS^GCU-trnG^UCC, trnD^GUC-trnY^GUA, trnL^UAA-trnF^GAA, rbcL, psbE-petL and ndhG-ndhI. The differences between the species are furthermore confirmed by the high number of SNPs between these two species. Here, we report for the first time the complete cp genome of one species of the Zephyranthes subgenus Myostemma, which can be used for phylogenetic and population genomic studies.
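The region lengths reported in this abstract can be checked against the stated genome size, since a quadripartite plastome is the sum of the LSC, the SSC, and two copies of the IR. The short check below uses only the figures quoted in the abstract; the variable names are illustrative.

```python
# Region lengths (bp) as quoted in the abstract
lsc = 86_129   # large single copy
ssc = 18_352   # small single copy
ir = 26_813    # one inverted repeat; the genome contains two copies

# Quadripartite structure: LSC + SSC + 2 x IR
total = lsc + ssc + 2 * ir
print(f"Reconstructed genome size: {total:,} bp")
```

The sum equals the reported total of 158,107 bp, so the quoted IR length refers to a single repeat copy rather than to both copies combined.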

Pascual-Díaz JP, Garcia S, Vitales D. Plastome Diversity and Phylogenomic Relationships in Asteraceae. PLANTS (BASEL, SWITZERLAND) 2021;10:plants10122699. [PMID: 34961169 PMCID: PMC8705268 DOI: 10.3390/plants10122699] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/05/2021] [Revised: 12/01/2021] [Accepted: 12/04/2021] [Indexed: 06/14/2023]

Lam MTY, Duttke SH, Odish MF, Le HD, Hansen EA, Nguyen CT, Trescott S, Kim R, Deota S, Chang MW, Patel A, Hepokoski M, Alotaibi M, Rolfsen M, Perofsky K, Warden AS, Foley J, Ramirez SI, Dan JM, Abbott RK, Crotty S, Crotty Alexander LE, Malhotra A, Panda S, Benner CW, Coufal NG. Profiling Transcription Initiation in Peripheral Leukocytes Reveals Severity-Associated Cis-Regulatory Elements in Critical COVID-19. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.08.24.457187. [PMID: 34462742 PMCID: PMC8404884 DOI: 10.1101/2021.08.24.457187] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]

Affiliation(s)

Michael Tun Yin Lam Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA Laboratory of Regulatory Biology, Salk Institute of Biological Studies, La Jolla, CA, USA
Sascha H. Duttke Division of Endocrinology and Metabolism, Department of Medicine, University of California, San Diego, CA, USA
Mazen F. Odish Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Hiep D. Le Laboratory of Regulatory Biology, Salk Institute of Biological Studies, La Jolla, CA, USA
Emily A. Hansen Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA Department of Pediatrics, University of California, San Diego, CA, USA
Celina T. Nguyen Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA
Samantha Trescott Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA Department of Pediatrics, University of California, San Diego, CA, USA
Roy Kim Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA Department of Pediatrics, University of California, San Diego, CA, USA
Shaunak Deota Laboratory of Regulatory Biology, Salk Institute of Biological Studies, La Jolla, CA, USA
Max W. Chang Division of Endocrinology and Metabolism, Department of Medicine, University of California, San Diego, CA, USA
Arjun Patel Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Mark Hepokoski Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Mona Alotaibi Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Mark Rolfsen Internal Medicine Residency Program, Department of Medicine, UC San Diego, CA, USA
Katherine Perofsky Department of Pediatrics, University of California, San Diego, CA, USA Rady Children’s Hospital, San Diego, CA
Anna S. Warden Division of Endocrinology and Metabolism, Department of Medicine, University of California, San Diego, CA, USA
Jennifer Foley Rady Children’s Hospital, San Diego, CA
Sydney I Ramirez Division of Infectious Diseases, Department of Medicine, University of California, San Diego Center for Infectious Diseases and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA
Jennifer M. Dan Division of Infectious Diseases, Department of Medicine, University of California, San Diego Center for Infectious Diseases and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA
Robert K Abbott Center for Infectious Diseases and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA Consortium for HIV/AIDS Vaccine Development (CHVAD), The Scripps Research Institute, La Jolla, CA, USA
Shane Crotty Center for Infectious Diseases and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA
Laura E Crotty Alexander Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Atul Malhotra Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of California, San Diego, CA, USA
Satchidananda Panda Laboratory of Regulatory Biology, Salk Institute of Biological Studies, La Jolla, CA, USA
Christopher W. Benner Division of Endocrinology and Metabolism, Department of Medicine, University of California, San Diego, CA, USA
Nicole G. Coufal Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA Department of Pediatrics, University of California, San Diego, CA, USA Rady Children’s Hospital, San Diego, CA

Mehl T, Gruenstaeudl M. airpg: automatically accessing the inverted repeats of archived plastid genomes. BMC Bioinformatics 2021;22:413. [PMID: 34418956 PMCID: PMC8379869 DOI: 10.1186/s12859-021-04309-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/16/2021] [Accepted: 07/26/2021] [Indexed: 11/10/2022]

Abstract

BACKGROUND

In most flowering plants, the plastid genome exhibits a quadripartite genome structure, comprising a large and a small single copy as well as two inverted repeat regions. Thousands of plastid genomes have been sequenced and submitted to public sequence repositories in recent years. The quality of sequence annotations in many of these submissions is known to be problematic, especially regarding annotations that specify the length and location of the inverted repeats: such annotations are either missing or portray the length or location of the repeats incorrectly. However, many biological investigations employ publicly available plastid genomes at face value and implicitly assume the correctness of their sequence annotations.

RESULTS

We introduce airpg, a Python package that automatically assesses the frequency of incomplete or incorrect annotations of the inverted repeats among publicly available plastid genomes. Specifically, the tool automatically retrieves plastid genomes from NCBI Nucleotide under variable search parameters, surveys them for length and location specifications of inverted repeats, and confirms any inverted repeat annotations through self-comparisons of the genome sequences. The package also includes functionality for automatic identification and removal of duplicate genome records and accounts for taxa that genuinely lack inverted repeats. A survey of the presence of inverted repeat annotations among all plastid genomes of flowering plants submitted to NCBI Nucleotide until the end of 2020 using airpg, followed by a statistical analysis of potential associations with record metadata, highlights that release year and publication status of the genome records have a significant effect on the frequency of complete and equal-length inverted repeat annotations.
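The idea of confirming inverted repeat annotations through self-comparison of the genome sequence can be illustrated with a toy check: two annotated regions form an inverted repeat pair if one is the reverse complement of the other. The sketch below is a simplified illustration under that assumption and is not airpg's implementation; the function names, the 0-based half-open coordinates, and the toy sequences are all hypothetical.

```python
from typing import Tuple

# Translation table for complementing unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat_pair(genome: str, irb: Tuple[int, int], ira: Tuple[int, int]) -> bool:
    """Check whether two annotated regions are exact reverse complements.

    Coordinates are hypothetical 0-based, half-open (start, end) pairs.
    Real plastomes may require tolerance for mismatches; this toy check does not.
    """
    (b_start, b_end), (a_start, a_end) = irb, ira
    if b_end - b_start != a_end - a_start:
        return False  # unequal-length annotations cannot match exactly
    return genome[b_start:b_end] == reverse_complement(genome[a_start:a_end])


# Toy quadripartite layout: LSC + IRb + SSC + IRa (made-up sequences)
lsc, ir, ssc = "ATGCCGTA", "GGATCCAA", "TTTACG"
genome = lsc + ir + ssc + reverse_complement(ir)

correct = is_inverted_repeat_pair(genome, (8, 16), (22, 30))
shifted = is_inverted_repeat_pair(genome, (8, 16), (21, 29))
```

In this toy genome the correctly placed annotation passes the check, while an annotation shifted by one base fails it, which is the kind of location error the abstract describes.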

CONCLUSION

The number of plastid genomes on NCBI Nucleotide has increased dramatically in recent years, and many more genomes will likely be submitted over the next decade. airpg enables researchers to automatically access and evaluate the inverted repeats of these plastid genomes as well as their sequence annotations and, thus, contributes to increasing the reliability of publicly available plastid genomes. The software is freely available via the Python package index at http://pypi.python.org/pypi/airpg .

Mohanta TK, Mishra AK, Khan A, Hashem A, Abd_Allah EF, Al-Harrasi A. Gene Loss and Evolution of the Plastome. Genes (Basel) 2020;11:E1133. [PMID: 32992972 PMCID: PMC7650654 DOI: 10.3390/genes11101133] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 07/16/2020] [Revised: 09/07/2020] [Accepted: 09/14/2020] [Indexed: 12/13/2022]

Abstract

Chloroplasts are unique organelles within plant cells and are responsible for sustaining life forms on the earth due to their ability to conduct photosynthesis. Multiple functional genes within the chloroplast are responsible for a variety of metabolic processes that occur in the chloroplast. Considering its fundamental role in sustaining life on the earth, it is important to identify the level of diversity present in the chloroplast genome, what genes and genomic content have been lost, what genes have been transferred to the nuclear genome, duplication events, and the overall origin and evolution of the chloroplast genome. Our analysis of 2511 chloroplast genomes indicated that the genome size and number of coding DNA sequences (CDS) in the chloroplast genomes of algae are higher relative to other lineages. Approximately 10.31% of the examined species, spanning all the lineages, have lost the inverted repeats (IR) in the chloroplast genome. Genome-wide analyses revealed that the loss of the Rbcl gene in parasitic and heterotrophic plants occurred approximately 56 Ma ago. PsaM, Psb30, ChlB, ChlL, ChlN, and Rpl21 were found to be characteristic signature genes of the chloroplast genome of algae, bryophytes, pteridophytes, and gymnosperms; however, none of these genes were found in the angiosperm or magnoliid lineage, which appeared to have lost them approximately 203-156 Ma ago. A variety of chloroplast-encoded genes were lost across different species lineages throughout the evolutionary process. The Rpl20 gene, however, was found to be the most stable and intact gene in the chloroplast genome and was not lost in any of the analyzed species, suggesting that it is a signature gene of the plastome. Our evolutionary analysis indicated that chloroplast genomes evolved from multiple common ancestors ~1293 Ma ago and have undergone vivid recombination events across different taxonomic lineages.
