1
Seitz W, Kirwan AD, Brčić-Kostić K, Mitrikeski PT, Seitz PK. Visualizing genomic data: The mixing perspective. Biosystems 2023; 224:104839. [PMID: 36690200 DOI: 10.1016/j.biosystems.2023.104839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/17/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
We report on a novel way to visualize genomic data. By considering genome coding sequences, cds, as sets of the N=61 non-stop codons, one obtains a partition of the total number of codons in each cds. Partitions exhibit a statistical property known as mixing character which characterizes how mixed the partition is. Mixing characters have been shown mathematically to exhibit a partial order known as majorization (Ruch, 1975). In previous work (Seitz and Kirwan, 2022) we developed an approach that combined mixing and entropy that is visualized as a scatter plot. If we consider all 1,121,505 partitions of 61 codons, this produces a plot we call the theoretical mixing space, TGMS. A normalization procedure is developed here and applied to real genomic data to produce the genome mixing signature, GMS. Example GMS's of 19 species, including Homo sapiens, are shown and discussed.
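The two ingredients named in this abstract, the count of partitions of 61 and the majorization partial order, can be illustrated with a minimal sketch. It uses the textbook definitions only and is not the authors' code; the function names `count_partitions` and `majorizes` are hypothetical.

```python
def count_partitions(n):
    """Number of integer partitions of n (order of parts ignored)."""
    # ways[k] = number of partitions of k using only the part sizes seen so far
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


def majorizes(a, b):
    """True if partition a majorizes partition b (both must have the same sum).

    a majorizes b when every prefix sum of a, taken in non-increasing order,
    is at least the matching prefix sum of b.
    """
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    if sum(a) != sum(b):
        raise ValueError("partitions must have the same total")
    length = max(len(a), len(b))
    a += [0] * (length - len(a))  # pad the shorter partition with zeros
    b += [0] * (length - len(b))
    sum_a = sum_b = 0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        if sum_a < sum_b:
            return False
    return True


print(count_partitions(61))                # 1121505, the figure quoted in the abstract
print(majorizes([3, 1, 1], [2, 2, 1]))     # True
# A pair that is incomparable in either direction, so the order is only partial:
print(majorizes([3, 3], [4, 1, 1]), majorizes([4, 1, 1], [3, 3]))  # False False
```

When one partition majorizes another, the majorized one is the more evenly spread of the two; pairs such as `[3, 3]` and `[4, 1, 1]` cannot be ranked at all, which is why the relation is a partial rather than a total order.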
Affiliation(s)
- William Seitz
- Texas A&M University at Galveston, Galveston, TX 77553, United States of America
- A D Kirwan
- College of Earth, Ocean and Environment, University of Delaware, Newark, DE, 19716, United States of America
- Krunoslav Brčić-Kostić
- Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb 10000, Croatia
- Petar Tomev Mitrikeski
- Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb 10000, Croatia
- P K Seitz
- University of Texas Medical Branch, Galveston, TX 77555, United States of America
2
Hendriksen M, Francis A. A partial order and cluster-similarity metric on rooted phylogenetic trees. J Math Biol 2020; 80:1265-1290. [PMID: 32067071 DOI: 10.1007/s00285-019-01461-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2019] [Revised: 11/05/2019] [Indexed: 11/30/2022]
Abstract
Metrics on rooted phylogenetic trees are integral to a number of areas of phylogenetic analysis. Cluster-similarity metrics have recently been introduced in order to limit skew in the distribution of distances, and to ensure that trees in the neighbourhood of each other have similar hierarchies. In the present paper we introduce a new cluster-similarity metric on rooted phylogenetic tree space that has an associated local operation, allowing for easy calculation of neighbourhoods, a trait that is desirable for MCMC calculations. The metric is defined by the distance on the Hasse diagram induced by a partial order on the set of rooted phylogenetic trees, itself based on the notion of a hierarchy-preserving map between trees. The partial order we introduce is a refinement of the well-known refinement order on hierarchies. Both the partial order and the hierarchy-preserving maps may also be of independent interest.
Affiliation(s)
- Michael Hendriksen
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
3
Bruggemann R, Carlsen L. Partial Order in Environmental Chemistry. Curr Comput Aided Drug Des 2019; 16:257-269. [PMID: 31038074 DOI: 10.2174/1573409915666190416160350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2018] [Revised: 03/15/2019] [Accepted: 04/01/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND The theory of partial order is a branch of discrete mathematics and is often seen as rather esoteric. However, given a suitable definition of an order relation, partial order theory has some statistical flavor. Here we introduce the application of partial order to environmental chemistry. OBJECTIVE We show that partial order is an instrument with potential for both data exploration and data evaluation. METHODS Partial order theory was applied using four indicators that describe the environmental hazards of chemicals. RESULTS Nineteen organic chemicals found in a monitoring study of the German river Main were taken as an exemplary case. The results indicate that chemicals can pose a high risk to the environment, but the type of risk differs and should not be conceptually merged into a single quantity. CONCLUSION Partial order theory helps to define different regulations and environmental management plans.
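A small sketch may help show what ranking by partial order on several indicators means in practice. It uses the component-wise (product) order, a common choice for indicator profiles, and is an illustration only: the chemical names, the indicator values, and the helper names `dominates` and `maximal_elements` are all hypothetical and are not taken from the study.

```python
def dominates(x, y):
    """True if profile x is at least as high as y on every indicator and differs from y."""
    return x != y and all(a >= b for a, b in zip(x, y))


def maximal_elements(profiles):
    """Names of objects that no other object dominates."""
    return sorted(
        name
        for name, prof in profiles.items()
        if not any(dominates(other, prof) for other in profiles.values())
    )


# Hypothetical hazard profiles: four indicators per chemical, higher means more hazardous.
profiles = {
    "chem_A": (3, 2, 4, 1),
    "chem_B": (2, 2, 3, 1),
    "chem_C": (1, 5, 2, 2),
}

print(dominates(profiles["chem_A"], profiles["chem_B"]))  # True
# chem_A and chem_C are incomparable: each is worse on a different indicator.
print(dominates(profiles["chem_A"], profiles["chem_C"]),
      dominates(profiles["chem_C"], profiles["chem_A"]))  # False False
print(maximal_elements(profiles))  # ['chem_A', 'chem_C']
```

The two maximal elements are both high-hazard, but for different reasons, which is the situation the abstract describes when it warns against merging the indicators into a single quantity.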
Affiliation(s)
- Rainer Bruggemann
- Department of Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Ecohydrology, Berlin, Germany
- Lars Carlsen
- Awareness Center, Linkøpingvej 35, Trekroner, DK-4000 Roskilde, Denmark
4
Zhang GQ, Xing G, Cui L. An efficient, large-scale, non-lattice-detection algorithm for exhaustive structural auditing of biomedical ontologies. J Biomed Inform 2018; 80:106-119. [PMID: 29548711 DOI: 10.1016/j.jbi.2018.03.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/15/2017] [Revised: 03/10/2018] [Accepted: 03/12/2018] [Indexed: 11/17/2022]
Abstract
One of the basic challenges in developing structural methods for systematic audition on the quality of biomedical ontologies is the computational cost usually involved in exhaustive sub-graph analysis. We introduce ANT-LCA, a new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the hierarchical order induced by an ontology. The computation of LCA is a fundamental step for non-lattice approach for ontology quality assurance. Distinct from existing approaches, ANT-LCA only computes LCAs for non-trivial pairs, those having at least one common ancestor. To skip all trivial pairs that may be of no practical interest, ANT-LCA employs a simple but innovative algorithmic strategy combining topological order and dynamic programming to keep track of non-trivial pairs. We provide correctness proofs and demonstrate a substantial reduction in computational time for two largest biomedical ontologies: SNOMED CT and Gene Ontology (GO). ANT-LCA achieved an average computation time of 30 and 3 sec per version for SNOMED CT and GO, respectively, about 2 orders of magnitude faster than the best known approaches. Our algorithm overcomes a fundamental computational barrier in sub-graph based structural analysis of large ontological systems. It enables the implementation of a new breed of structural auditing methods that not only identifies potential problematic areas, but also automatically suggests changes to fix the issues. Such structural auditing methods can lead to more effective tools supporting ontology quality assurance work.
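To make the notion of a lowest common ancestor in an ontology hierarchy concrete, here is a deliberately naive sketch based on ancestor sets. It is not ANT-LCA and has none of its efficiency; the toy hierarchy and the function names `ancestors` and `lowest_common_ancestors` are hypothetical. It only shows why a pair of concepts with more than one lowest common ancestor signals a non-lattice fragment.

```python
def ancestors(parents, node):
    """All proper ancestors of node in a DAG given as a child -> list-of-parents map."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def lowest_common_ancestors(parents, a, b):
    """Common ancestors of a and b that are not proper ancestors of another common ancestor."""
    common = (ancestors(parents, a) | {a}) & (ancestors(parents, b) | {b})
    return {
        c
        for c in common
        if not any(c in ancestors(parents, d) for d in common if d != c)
    }


# Toy hierarchy: concepts "a" and "b" each have the two parents "x" and "y".
parents = {
    "a": ["x", "y"],
    "b": ["x", "y"],
    "x": ["root"],
    "y": ["root"],
    "root": [],
}

# Two lowest common ancestors, so the pair (a, b) has no unique join.
print(sorted(lowest_common_ancestors(parents, "a", "b")))  # ['x', 'y']
print(sorted(lowest_common_ancestors(parents, "x", "y")))  # ['root']
```

This brute-force version recomputes ancestor sets for every pair, which is exactly the kind of cost the abstract says becomes prohibitive on ontologies the size of SNOMED CT.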
Affiliation(s)
- Guo-Qiang Zhang
- Department of Computer Science, University of Kentucky, Lexington, KY, USA; Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA; Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- Guangming Xing
- Department of Computer Science, Western Kentucky University, Bowling Green, KY, USA
- Licong Cui
- Department of Computer Science, University of Kentucky, Lexington, KY, USA; Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
5
Carlsen L, Bruggemann R, Kenessov B. Use of partial order in environmental pollution studies demonstrated by urban BTEX air pollution in 20 major cities worldwide. Sci Total Environ 2018; 610-611:234-243. [PMID: 28803199 DOI: 10.1016/j.scitotenv.2017.08.029] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/22/2017] [Revised: 07/31/2017] [Accepted: 08/03/2017] [Indexed: 06/07/2023]
Abstract
Urban air pollution with benzene, toluene, ethyl benzene and xylenes (BTEX) is a common phenomenon in major cities, where the pollution mainly originates from traffic as well as from residential heating. An attempt to rank cities according to their BTEX air pollution is not necessarily straightforward, as we are faced with several individual pollutants simultaneously. A typical procedure is based on aggregation of data for the single compounds, a process that not only hides important information but is also subject to compensation effects. The present study applies a series of partial ordering tools to circumvent the aggregation. Based on partial ordering, the most important indicators are disclosed, and an average ranking of the cities included in the study is derived. Since air pollution measurements are often subject to significant uncertainties, special attention has been given to the possible effect of uncertainty and/or data noise. Finally, the effect of introducing weight regimes is studied. In a concluding section the gross national income per person (GNI) is brought into play, demonstrating a positive correlation between BTEX air pollution and GNI. The results are discussed in terms of the ability/willingness to combat air pollution in the cities studied. The present study focuses on Almaty, the largest city in Kazakhstan, and compares the data from Almaty to those of 19 other major cities around the world. It is found that the benzene level for Almaty appears peculiarly high. Overall, Almaty is ranked as the 8th most BTEX-polluted city among the 20 cities included in the study.
Affiliation(s)
- Lars Carlsen
- Awareness Center, Linkøpingvej 35, Trekroner, DK-4000 Roskilde, Denmark
- Rainer Bruggemann
- Leibniz - Institute of Freshwater Ecology and Inland Fisheries, Department Ecohydrology, Müggelseedamm 310, D-12587 Berlin, Germany
- Bulat Kenessov
- Al-Farabi Kazakh National University, Faculty of Chemistry and Chemical Technology, Center of Physical Chemical Methods of Research and Analysis, Almaty, Kazakhstan
6
Abstract
BACKGROUND/AIMS Dose-finding trials can be conducted such that patients are first stratified into multiple risk groups before doses are allocated. The risk groups are often completely ordered in that, for a fixed dose, the probability of toxicity is monotonically increasing across groups. In some trials, the groups are only partially ordered. For example, one of several groups in a trial may be known to have the least risk of toxicity for a given dose, but the ordering of the risk among the remaining groups may not be known. The aim of the article is to introduce a method for designing dose-finding trials of cytotoxic agents in completely or partially ordered groups of patients. METHODS This article presents a method for dose-finding that combines previously proposed mathematical models, augmented with results using order restricted inference. The resulting method is computationally convenient and allows for dose-finding in trials with completely or partially ordered groups. Extensive simulations are done to evaluate the performance of the method, using randomly generated dose-toxicity curves where, within each group, the risk of toxicity is an increasing function of dose. RESULTS Our simulations show that the hybrid method, in which order-restricted estimation is applied to parameters of a parsimonious mathematical model, gives results that are similar to previously proposed methods for completely ordered groups. Our method generalizes to a wide range of partial orders among the groups. CONCLUSION The problem of dose-finding in partially ordered groups has not been extensively studied in the statistical literature. The proposed method is computationally feasible, and provides a potential solution to the design of dose-finding studies in completely or partially ordered groups.
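For the completely ordered case, order-restricted estimation is commonly carried out with the pool-adjacent-violators algorithm (PAVA). The sketch below shows that standard technique at a single fixed dose; it is not the hybrid method proposed in the article, it does not handle partial orders, and the toxicity counts and the function name `pava` are hypothetical.

```python
def pava(values, weights):
    """Weighted least-squares fit of a non-decreasing sequence (pool adjacent violators)."""
    blocks = []  # each block holds [pooled mean, total weight, number of groups pooled]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the newest block violates the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical data at one dose: risk groups listed from lowest to highest assumed risk.
toxicities = [1, 3, 2, 4]     # patients with toxicity in each group
patients = [10, 10, 10, 10]   # patients treated in each group
observed = [t / n for t, n in zip(toxicities, patients)]

# Groups 2 and 3 violate the assumed ordering (0.3 > 0.2), so they are pooled.
print(pava(observed, patients))
```

Weighting by the number of patients means that pooled estimates equal the combined toxicity rate of the merged groups, which keeps the fitted probabilities consistent with the assumed ordering across groups.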
Affiliation(s)
- Mark Conaway
- University of Virginia Health System, Charlottesville, VA, USA; Division of Translational Research & Applied Statistics, Department of Public Health Sciences, University of Virginia School of Medicine, University of Virginia, Charlottesville, VA, USA
7
Carlsen L, Bruggemann R, Kenessova O, Erzhigitov E. Evaluation of analytical performance based on partial order methodology. Talanta 2014; 132:285-93. [PMID: 25476310 DOI: 10.1016/j.talanta.2014.09.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/27/2014] [Revised: 08/27/2014] [Accepted: 09/05/2014] [Indexed: 11/20/2022]
Abstract
Classical measurements of performance are typically based on linear scales. However, in analytical chemistry a simple scale may not be sufficient to analyze the analytical performance appropriately. Here partial order methodology can be helpful. Within the context described here, partial order analysis can be seen as an ordinal analysis of data matrices, especially to simplify the relative comparison of objects according to their data profile (the ordered set of values an object has). Hence, partial order methodology offers a unique possibility to evaluate analytical performance. In the present study, data such as those provided by laboratories through interlaboratory comparisons or proficiency testing are used as an illustrative example. However, the presented scheme is likewise applicable to the comparison of analytical methods or simply as a tool for the optimization of an analytical method. The methodology can be applied without presumptions or pretreatment of the analytical data in order to evaluate the analytical performance, taking into account all indicators simultaneously and thus elucidating a "distance" from the true value. In the present illustrative example it is assumed that the laboratories analyze a given sample several times and subsequently report the mean value, the standard deviation and the skewness, which are used simultaneously for the evaluation of the analytical performance. The analyses lead to information concerning (1) a partial ordering of the laboratories, (2) a "distance" to the Reference laboratory and (3) a classification based on the concept of "peculiar points".
Affiliation(s)
- Lars Carlsen
- Awareness Center, Linkøpingvej 35, Trekroner, DK-4000 Roskilde, Denmark; Kazakh British Technical University, Department of Chemical Engineering, Almaty, Kazakhstan
- Rainer Bruggemann
- Leibniz - Institute of Freshwater Ecology and Inland Fisheries, Department Ecohydrology, Müggelseedamm 310, D-12587 Berlin, Germany
8
Bruggemann R, Scherb H, Schramm KW, Cok I, Voigt K. CombiSimilarity, an innovative method to compare environmental and health data sets with different attribute sizes example: eighteen Organochlorine Pesticides in soil and human breast milk samples. Ecotoxicol Environ Saf 2014; 105:29-35. [PMID: 24780230 DOI: 10.1016/j.ecoenv.2014.03.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2013] [Revised: 03/25/2014] [Accepted: 03/27/2014] [Indexed: 06/03/2023]
Abstract
Human health and the health of the environment are entwined. In this paper we underpin this position by presenting a modeling approach named CombiSimilarity, which has been developed by the first author in the software tool PyHasse, which comprises a wide variety of partial ordering tools. A case study of 18 Organochlorine Pesticides (OCPs) detected in soil as well as in human breast milk samples in the Taurus Mountains in Turkey is carried out. Seven soil samples and 44 breast milk samples were measured. We seek to answer the question of whether the contamination pattern in breast milk is associated with the contamination pattern in soil by studying the mutual quantitative relationships of the chemicals involved. We could demonstrate that the concentration profiles of the soil and breast milk pollution are similar. Therefore the hypothesis may be formulated that the concentrations of chemicals in the milk samples are strongly related to the soil contamination. This supports the concept that soil could be a surrogate for human exposure at background locations.
Affiliation(s)
- Rainer Bruggemann
- Leibniz-Institute of Fresh Water Ecology and Inland Fisheries, Berlin, Germany
- Hagen Scherb
- Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Institute of Computational Biology, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
- Karl-Werner Schramm
- Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Molecular EXposomics (MEX), Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; TUM, Wissenschaftszentrum Weihenstephan fuer Ernaehrung und Landnutzung, Department fuer Biowissenschaften, Weihenstephaner Steig 23, 85350 Freising, Germany
- Ismet Cok
- Department of Toxicology, Faculty of Pharmacy, Gazi University, 06330 Ankara, Turkey
- Kristina Voigt
- Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Institute of Computational Biology, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany