26. Sotero RC, Sanchez-Rodriguez LM, Moradi N, Dousty M. Estimation of global and local complexities of brain networks: A random walks approach. Netw Neurosci 2020; 4:575-594. [PMID: 32885116 PMCID: PMC7462425 DOI: 10.1162/netn_a_00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2019] [Accepted: 03/23/2020] [Indexed: 11/29/2022] [Open access]
Abstract
The complexity of brain activity has been observed at many spatial scales and has been proposed to differentiate between mental states and disorders. Here we introduce a new measure of (global) network complexity, constructed as the sum of the complexities of its nodes (i.e., local complexity). The complexity of each node is obtained by comparing the sample entropy of the time series generated by the movement of a random walker on the network resulting from removing the node and its connections, with the sample entropy of the time series obtained from a regular lattice (ordered state) and a random network (disordered state). We studied the complexity of fMRI-based resting-state networks. We found that positively correlated (pos) networks comprising only the positive functional connections have higher complexity than anticorrelated (neg) networks (comprising the negative connections) and the network consisting of the absolute value of all connections (abs). We also observed a significant correlation between complexity and the strength of functional connectivity in the pos network. Our results suggest that the pos network is related to information processing in the brain and that functional connectivity studies should analyze pos and neg networks separately instead of the abs network, as is commonly done.
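The two ingredients named above, a random walker that turns a network into a time series and the sample entropy of that series, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the walker records the strength (summed edge weight) of each visited node, which is our hypothetical choice of observable, and `random_walk_series` and `sample_entropy` are illustrative names. The node-removal step and the comparison against the lattice and random networks are not shown.

```python
import math
import random
import statistics


def random_walk_series(graph, start, n_steps, rng):
    """Random walk on a weighted graph, recording the strength of each visited node.

    graph maps node -> {neighbour: positive weight}. Recording node strength
    is a hypothetical choice of observable.
    """
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    node = start
    series = [strength[node]]
    for _ in range(n_steps):
        neighbours = list(graph[node])
        weights = [graph[node][nb] for nb in neighbours]
        # Move to a neighbour with probability proportional to the edge weight.
        node = rng.choices(neighbours, weights=weights, k=1)[0]
        series.append(strength[node])
    return series


def sample_entropy(series, m=2, r=None):
    """SampEn = ln(B / A), where B (A) counts pairs of templates of length m (m + 1)
    whose Chebyshev distance is at most r."""
    if r is None:
        r = 0.2 * statistics.pstdev(series)  # a common default tolerance
    n = len(series)

    def count_matches(length):
        # The same n - m starting positions are used for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return math.inf if a == 0 or b == 0 else math.log(b / a)


rng = random.Random(0)
ring = {i: {(i - 1) % 10: 1.0, (i + 1) % 10: 1.0} for i in range(10)}
print(sample_entropy(random_walk_series(ring, 0, 200, rng)))  # 0.0: every node has strength 2
```

On a ring lattice every node has the same strength, so the recorded series is constant and its sample entropy is zero; irregular series give larger values.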
27. Harvey JA, Larsen KW. Rattlesnake migrations and the implications of thermal landscapes. Mov Ecol 2020; 8:21. [PMID: 32514356 PMCID: PMC7251723 DOI: 10.1186/s40462-020-00202-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/20/2019] [Accepted: 03/24/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND The importance of thermal resources to terrestrial ectotherms has been well documented but less often considered in larger-scale analyses of habitat use and selection, such as those routinely conducted using standard habitat features such as vegetation and physical structure. Selection of habitat based on thermal attributes may be of particular importance for ectothermic species, especially in colder climates. In Canada, Western Rattlesnakes (Crotalus oreganus) reach their northern limits, with limited time to conduct annual migratory movements between hibernacula and summer habitat. We radio-tracked 35 male snakes departing from 10 different hibernacula. We examined coarse-scale differences in migratory movements across the region, and then compared the route of each snake with thermal landscapes and ruggedness GIS maps generated for different periods of the animals' active season. RESULTS We observed dichotomous habitat use (grasslands versus upland forests) throughout most of the species' northern range, reflected in different migratory movements of male snakes emanating from different hibernacula. Snakes utilizing higher-elevation forests moved further during the course of their annual migrations, and these snakes were more likely to use warmer areas of the landscape. CONCLUSION In addition to thermal benefits, advantages gained from selective migratory patterns may include prey availability and outbreeding. Testing these alternative hypotheses was beyond the scope of this study, and to collect the data to do so will require overcoming certain challenges. Still, insight into migratory differences between rattlesnake populations and the causal mechanism(s) of migrations will improve our ability to assess the implications of landscape change, management, and efficacy of conservation planning. Our findings suggest that such assessments may need to be tailored to individual dens and the migration strategies of their inhabitants. 
Additionally, local and landscape-scale migration patterns, as detected in this study, will have repercussions for snakes under climate-induced shifts in ecosystem boundaries and thermal regimes.
28. Ding Y, Chen B, Lei X, Liao B, Wu FX. Predicting novel CircRNA-disease associations based on random walk and logistic regression model. Comput Biol Chem 2020; 87:107287. [PMID: 32446243 DOI: 10.1016/j.compbiolchem.2020.107287] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/04/2020] [Accepted: 05/09/2020] [Indexed: 12/24/2022]
Abstract
Circular RNAs (circRNAs), a large group of small endogenous noncoding RNA molecules, have been shown to modulate protein-coding genes in the human genome. In recent years, many experimental studies have demonstrated that circRNAs are dysregulated in a number of diseases and can serve as biomarkers for disease diagnosis and prognosis. However, identifying circRNA-disease associations by biological experiments is expensive and time-consuming, and few computational models have been proposed for predicting novel circRNA-disease associations. In this study, we develop a computational model based on random walk and logistic regression (RWLR) to predict circRNA-disease associations. First, a circRNA-circRNA similarity network is constructed by calculating the functional similarity of circRNAs based on circRNA-related gene ontology. Then, a random walk with restart is run on the circRNA similarity network, and the features of each circRNA-disease pair are extracted from the results of the random walk and the circRNA-disease association matrix. Finally, a logistic regression model is used to predict novel circRNA-disease associations. Leave-one-out cross validation (LOOCV), five-fold cross validation (5CV), and ten-fold cross validation (10CV) are used to evaluate the prediction performance of RWLR against the two most recent methods, PWCDA and DWNN-RLS. The experimental results show that RWLR achieves higher AUC values than these two methods under LOOCV, 5CV, and 10CV, which demonstrates that RWLR performs better than the other computational methods. Moreover, case studies illustrate the reliability and effectiveness of RWLR for circRNA-disease association prediction.
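A random walk with restart (RWR) on a similarity network can be sketched as a power iteration. The sketch assumes a column-normalized similarity matrix, a hypothetical restart probability of 0.7, and a hypothetical pair feature (the RWR mass a circRNA sends to circRNAs already linked to a disease); the exact features and parameters used by RWLR are not given in the abstract.

```python
def random_walk_with_restart(sim, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """RWR from one seed node on a similarity matrix given as a list of lists.

    Columns are normalised to sum to 1 (W), then
    p <- (1 - restart) * W p + restart * e_seed is iterated to convergence.
    """
    n = len(sim)
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    w = [[sim[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        new_p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * e[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:  # L1 convergence check
            return new_p
        p = new_p
    return p


def pair_score(rwr_profile, assoc, disease):
    """Hypothetical pair feature: RWR mass the circRNA sends to circRNAs already
    associated with the disease (assoc[k][disease] is 1 or 0)."""
    return sum(p * row[disease] for p, row in zip(rwr_profile, assoc))


# Toy network: circRNAs 0-1-2 in a chain; only circRNA 1 is linked to disease 0.
sim = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
assoc = [[0, 0], [1, 0], [0, 0]]
profile = random_walk_with_restart(sim, seed=0)
print(pair_score(profile, assoc, 0), pair_score(profile, assoc, 1))
```

Because the transition matrix is column-stochastic and the restart vector sums to one, the converged profile is itself a probability distribution over circRNAs.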
29. Wang W, Smith J, Hejase HA, Liu KJ. Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences. Algorithms Mol Biol 2020; 15:7. [PMID: 32322294 PMCID: PMC7164268 DOI: 10.1186/s13015-020-00167-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/21/2019] [Accepted: 04/04/2020] [Indexed: 11/18/2022] [Open access]
Abstract
Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors. To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method. To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods.
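To contrast sequential resampling with the i.i.d. bootstrap, the sketch below resamples alignment columns by a random walk along the sequence, so that neighbouring columns in the resample were also neighbours in the input. It is a deliberately simplified illustration and not the published SERES procedure: the random reversal probability, the bounce at either end, and the function names are hypothetical.

```python
import random


def bootstrap_resample(columns, rng):
    """Standard bootstrap: draw columns independently with replacement (i.i.d.)."""
    return rng.choices(columns, k=len(columns))


def sequential_resample(columns, rng, reverse_prob=0.1):
    """Hypothetical sequential resampling: a walker moves one column at a time,
    reversing direction at random (reverse_prob) or at either end, so adjacent
    columns in the resample are also adjacent in the input."""
    n = len(columns)
    pos = rng.randrange(n)
    direction = rng.choice((-1, 1))
    sample = [columns[pos]]
    while len(sample) < n:
        if rng.random() < reverse_prob:
            direction = -direction
        if not 0 <= pos + direction < n:  # bounce off the first or last column
            direction = -direction
        pos += direction
        sample.append(columns[pos])
    return sample


rng = random.Random(3)
print(bootstrap_resample(list(range(10)), rng))
print(sequential_resample(list(range(10)), rng))
```

The bootstrap output scatters column indices independently, while the walk preserves local order, which is the kind of sequential dependence the i.i.d. assumption discards.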
30. Rezaeinia P, Fairley K, Pal P, Meyer FG, Carter RM. Identifying brain network topology changes in task processes and psychiatric disorders. Netw Neurosci 2020; 4:257-273. [PMID: 32181418 PMCID: PMC7069064 DOI: 10.1162/netn_a_00122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/17/2019] [Accepted: 12/11/2019] [Indexed: 11/04/2022] [Open access]
Abstract
A central goal in neuroscience is to understand how dynamic networks of neural activity produce effective representations of the world. Advances in the theory of graph measures raise the possibility of elucidating network topologies central to the construction of these representations. We leverage a result from the description of lollipop graphs to identify an iconic network topology in functional magnetic resonance imaging data and characterize changes to those networks during task performance and in populations diagnosed with psychiatric disorders. During task performance, we find that task-relevant subnetworks change topology, becoming more integrated by increasing connectivity throughout cortex. Analysis of resting state connectivity in clinical populations shows a similar pattern of subnetwork topology changes; resting scans becoming less default-like with more integrated sensory paths. The study of brain network topologies and their relationship to cognitive models of information processing raises new opportunities for understanding brain function and its disorders.
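The abstract does not state which lollipop-graph result is used. The best-known random-walk property of lollipop graphs (a clique joined to a path) is that they attain the largest possible expected hitting times, of order n³, and that these times are strongly asymmetric. The sketch below computes exact hitting times by solving the standard linear system; `lollipop` and `hitting_times` are hypothetical helpers for illustration only.

```python
from fractions import Fraction


def lollipop(clique_size, path_length):
    """Adjacency sets: a clique on 0..k-1 with a path of path_length edges hanging off node k-1."""
    k = clique_size
    adj = {v: {u for u in range(k) if u != v} for v in range(k)}
    prev = k - 1
    for v in range(k, k + path_length):
        adj[v] = {prev}
        adj[prev].add(v)
        prev = v
    return adj


def hitting_times(adj, target):
    """Expected steps for a simple random walk to first reach target from every node.

    Solves h(target) = 0 and h(v) = 1 + mean of h over v's neighbours exactly,
    using Gauss-Jordan elimination over Fractions.
    """
    nodes = sorted(v for v in adj if v != target)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    rows = []
    for v in nodes:
        # deg(v) * h(v) - sum of h(u) over non-target neighbours u = deg(v)
        row = [Fraction(0)] * (n + 1)
        row[index[v]] = Fraction(len(adj[v]))
        for u in adj[v]:
            if u != target:
                row[index[u]] -= 1
        row[n] = Fraction(len(adj[v]))
        rows.append(row)
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    h = {v: rows[index[v]][n] / rows[index[v]][index[v]] for v in nodes}
    h[target] = Fraction(0)
    return h


g = lollipop(4, 4)  # clique {0, 1, 2, 3}; tail 3-4-5-6-7
print(hitting_times(g, 7)[0], hitting_times(g, 0)[7])  # 67 23
```

For this 4-node clique with a 4-edge tail, the walk needs 67 steps on average to go from a clique vertex to the end of the tail but only 23 to come back. The two sum to the commute time 2mR_eff = 2 × 10 × 4.5 = 90.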
31. Ruiz-Suarez S, Leos-Barajas V, Alvarez-Castro I, Morales JM. Using approximate Bayesian inference for a "steps and turns" continuous-time random walk observed at regular time intervals. PeerJ 2020; 8:e8452. [PMID: 32095333 PMCID: PMC7020826 DOI: 10.7717/peerj.8452] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/29/2019] [Accepted: 12/23/2019] [Indexed: 11/20/2022] [Open access]
Abstract
The study of animal movement is challenging because movement is a process modulated by many factors acting at different spatial and temporal scales. In order to describe and analyse animal movement, several models have been proposed which differ primarily in the temporal conceptualization, namely continuous and discrete time formulations. Naturally, animal movement occurs in continuous time but we tend to observe it at fixed time intervals. To account for the temporal mismatch between observations and movement decisions, we used a state-space model where movement decisions (steps and turns) are made in continuous time. That is, at any time there is a non-zero probability of making a change in movement direction. The movement process is then observed at regular time intervals. As the likelihood function of this state-space model turned out to be intractable yet simulating data is straightforward, we conduct inference using different variations of Approximate Bayesian Computation (ABC). We explore the applicability of this approach as a function of the discrepancy between the temporal scale of the observations and that of the movement process in a simulation study. Simulation results suggest that the model parameters can be recovered if the observation time scale is moderately close to the average time between changes in movement direction. Good estimates were obtained when the scale of observation was up to five times that of the scale of changes in direction. We demonstrate the application of this model to a trajectory of a sheep that was reconstructed in high resolution using information from magnetometer and GPS devices. The state-space model used here allowed us to connect the scales of the observations and movement decisions in an intuitive and easy to interpret way. Our findings underscore the idea that the time scale at which animal movement decisions are made needs to be considered when designing data collection protocols. 
In principle, ABC methods allow one to make inferences about movement processes that are defined in continuous time but expressed in terms of easily interpreted steps and turns.
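The simplest ABC variant, rejection sampling, can be sketched for a toy version of this model: a walker moves at constant speed and, at exponentially distributed times, picks a new heading, while its position is only recorded at regular intervals. The uniform new heading, the two summary statistics (mean observed step length and mean cosine between successive observed steps), the prior range, and the acceptance rule (keep the closest 5% of draws) are all hypothetical choices for illustration, not the settings used in the paper.

```python
import math
import random
import statistics


def simulate_observed_track(turn_rate, n_obs, dt_obs, rng, speed=1.0):
    """Constant-speed walker that draws a new uniform heading at exponential waiting
    times (rate turn_rate); positions are recorded only every dt_obs time units."""
    x = y = t = 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    next_turn = rng.expovariate(turn_rate)
    track = [(x, y)]
    for k in range(1, n_obs + 1):
        t_obs = k * dt_obs
        while next_turn < t_obs:
            # Move straight until the (unobserved) turn, then draw a new heading.
            x += speed * (next_turn - t) * math.cos(heading)
            y += speed * (next_turn - t) * math.sin(heading)
            t = next_turn
            heading = rng.uniform(0.0, 2.0 * math.pi)
            next_turn = t + rng.expovariate(turn_rate)
        x += speed * (t_obs - t) * math.cos(heading)
        y += speed * (t_obs - t) * math.sin(heading)
        t = t_obs
        track.append((x, y))
    return track


def summaries(track):
    """Mean observed step length and mean cosine between successive observed steps."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    cosines = [
        (ax * bx + ay * by) / (la * lb)
        for (ax, ay), (bx, by), la, lb in zip(steps, steps[1:], lengths, lengths[1:])
        if la > 0 and lb > 0
    ]
    return statistics.fmean(lengths), statistics.fmean(cosines)


def abc_rejection(observed, dt_obs, rng, n_draws=1000, keep=50, prior=(0.1, 5.0)):
    """Rejection ABC: draw turn rates from a uniform prior, simulate, and keep the
    draws whose summaries lie closest (Euclidean distance) to the observed ones."""
    target = summaries(observed)
    n_obs = len(observed) - 1
    scored = []
    for _ in range(n_draws):
        rate = rng.uniform(*prior)
        simulated = summaries(simulate_observed_track(rate, n_obs, dt_obs, rng))
        scored.append((math.dist(target, simulated), rate))
    scored.sort()
    return [rate for _, rate in scored[:keep]]


rng = random.Random(7)
observed = simulate_observed_track(1.0, 200, 1.0, rng)  # "true" turn rate = 1.0
posterior = abc_rejection(observed, 1.0, rng)
print(statistics.fmean(posterior), statistics.stdev(posterior))
```

No likelihood is evaluated anywhere: the model is only ever simulated, which is what makes ABC attractive when the likelihood is intractable but simulation is easy.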
32. Rodríguez J, Jattin J, Soracipa Y. Probabilistic temporal prediction of the deaths caused by traffic in Colombia. Mortality caused by traffic prediction. Accid Anal Prev 2020; 135:105332. [PMID: 31838321 DOI: 10.1016/j.aap.2019.105332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2019] [Revised: 07/02/2019] [Accepted: 10/15/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Using probability theory and probabilistic random walks, predictions of the number of cases of a given phenomenon in a given year, such as dengue epidemics, have previously been obtained with precision close to 100%. OBJECTIVE To confirm the applicability of a methodology based on probability and probabilistic random walks for predicting the dynamics of deaths from road traffic injuries in Colombia for 2010. METHODOLOGY A total probability space was constructed to analyse the probabilistic behaviour of the increases and decreases observed in the variation of the lengths of the traffic death rates in Colombia from 2004 to 2009; from it, the most likely event for 2010 was established and used to predict the death rate for that year. RESULTS The methodology predicted a traffic-injury death rate of 14.88 for Colombia in 2010. Compared with the rate of 12.9 reported in national statistics, the prediction achieved a precision of 86.6%. CONCLUSIONS The applicability of the developed methodology to predict the dynamic behaviour of deaths caused by traffic injuries in Colombia for 2010 by means of a probabilistic random walk was confirmed with good precision, suggesting that this methodology could be useful to verify the efficacy of national road safety strategies implemented to reduce mortality rates.
33.
Abstract
The abundance of high-throughput data and technical refinements in graph theory have allowed network analysis to become an effective approach in various medical fields. This chapter introduces co-expression, Bayesian, and regression-based network construction methods, which are the basis of network analysis. Various methods of network topology analysis are explained, along with their unique features and applications in biomedicine. Furthermore, we explain the role of network embedding in reducing the dimensionality of networks and outline several popular algorithms used by researchers today. Recent studies have combined topology analysis and network embedding techniques in different ways, and we outline several of them in the fields of genetic-based disease prediction, drug-target identification, and multi-level omics integration.
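The simplest of the construction methods listed, a hard-thresholded co-expression network, connects two genes when their expression profiles are strongly correlated across samples. The sketch assumes Pearson correlation and a hypothetical cut-off of |r| ≥ 0.8; real analyses often tune the threshold or use weighted networks instead, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def coexpression_network(expression, threshold=0.8):
    """Undirected edges (gene_a, gene_b) -> r for every pair with |r| >= threshold.

    expression maps gene name -> list of expression values across the same samples.
    """
    genes = sorted(expression)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expression[g], expression[h])
            if abs(r) >= threshold:
                edges[(g, h)] = r
    return edges


expression = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],  # perfectly correlated with A
    "C": [5, 4, 3, 2, 1],   # perfectly anti-correlated with A
    "D": [1, 3, 2, 5, 1],   # weakly related to the others
}
print(sorted(coexpression_network(expression)))
```

Keeping the sign of r on each edge lets downstream topology analysis treat positive and negative co-expression separately if needed.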
34. Liu H, Zhang W, Nie L, Ding X, Luo J, Zou L. Predicting effective drug combinations using gradient tree boosting based on features extracted from drug-protein heterogeneous network. BMC Bioinformatics 2019; 20:645. [PMID: 31818267 PMCID: PMC6902475 DOI: 10.1186/s12859-019-3288-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/01/2019] [Accepted: 11/21/2019] [Indexed: 01/30/2023] [Open access]
Abstract
Background Although targeted drugs have contributed to impressive advances in the treatment of cancer patients, their clinical benefits in tumor therapy are greatly limited by the intrinsic and acquired resistance of cancer cells to such drugs. Drug combinations synergistically interfere with protein networks to inhibit the activity of carcinogenic genes more effectively, and therefore play an increasingly important role in the treatment of complex diseases. Results In this paper, we combined the drug similarity network, the protein similarity network, and known drug-protein associations into a drug-protein heterogeneous network. Next, we ran random walk with restart (RWR) on the heterogeneous network using the targets of each drug combination as the initial probability, and took the converged probability distribution as the feature vector of that drug combination. Taking these feature vectors as input, we trained a gradient tree boosting (GTB) classifier to predict new drug combinations. We evaluated performance on the widely used drug combination data set derived from the DCDB database. The experimental results show that our method outperforms seven typical classifiers and traditional boosting algorithms. Conclusions The heterogeneous network-derived features introduced in our method are more informative and richer than the primary ontology features, which results in better performance. In addition, from the perspective of network pharmacology, our method effectively exploits the topological attributes and interactions of drug targets in the overall biological network, which makes it a systematic and reliable approach for drug discovery.
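One plausible way to build the heterogeneous network and the restart vector is sketched below: drug-drug similarities, protein-protein similarities, and drug-protein associations are stacked into one block matrix, columns are normalized, and the restart mass is spread evenly over the targets of the drug combination. The block layout, the uniform restart vector, the restart probability of 0.5, and the function names are assumptions; the abstract does not specify how the subnetworks are weighted.

```python
def heterogeneous_network(drug_sim, protein_sim, drug_protein):
    """Stack drugs (first) and proteins (second) into one block adjacency matrix
    [[S_drug, B], [B^T, S_protein]], where B[i][j] = 1 if drug i targets protein j."""
    n_drugs, n_proteins = len(drug_sim), len(protein_sim)
    top = [list(drug_sim[i]) + list(drug_protein[i]) for i in range(n_drugs)]
    bottom = [[drug_protein[i][j] for i in range(n_drugs)] + list(protein_sim[j])
              for j in range(n_proteins)]
    return top + bottom


def rwr_from_seeds(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """RWR whose restart vector spreads probability evenly over the seed nodes."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    w = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def combination_features(drugs, drug_protein, adjacency):
    """Feature vector for a drug combination: the converged RWR distribution seeded at
    the union of the combination's protein targets (protein j is node n_drugs + j)."""
    n_drugs = len(drug_protein)
    seeds = {n_drugs + j for d in drugs for j, hit in enumerate(drug_protein[d]) if hit}
    return rwr_from_seeds(adjacency, seeds)


drug_sim = [[0, 0.5], [0.5, 0]]
protein_sim = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
drug_protein = [[1, 0, 0], [0, 0, 1]]  # drug 0 -> protein 0, drug 1 -> protein 2
adj = heterogeneous_network(drug_sim, protein_sim, drug_protein)
print(combination_features([0, 1], drug_protein, adj))
```

Each combination yields a fixed-length vector (one entry per drug and protein node), which is the form a downstream classifier such as gradient tree boosting expects.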
35. Ögren M, Jha D, Dobberschütz S, Müter D, Carlsson M, Gulliksson M, Stipp SLS, Sørensen HO. Numerical simulations of NMR relaxation in chalk using local Robin boundary conditions. J Magn Reson 2019; 308:106597. [PMID: 31546178 DOI: 10.1016/j.jmr.2019.106597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 06/10/2023]
Abstract
The interpretation of nuclear magnetic resonance (NMR) data is of interest in a number of fields. In Ögren (2014), local boundary conditions for random walk simulations of NMR relaxation in digital domains were presented. Here, we have applied those boundary conditions to large, three-dimensional (3D) porous media samples. We compared the random walk results with known solutions and then applied the method to highly structured 3D domains from images derived using synchrotron radiation CT scanning of North Sea chalk samples. As expected, there were systematic errors caused by digitization of the pore surfaces; we quantified those errors and, by using linear local boundary conditions, were able to significantly improve the output. We also present a technique for treating numerical data prior to input into the ESPRIT algorithm for retrieving Laplace components of time series from NMR data (commonly called T-inversion).
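Random walk simulations of NMR relaxation typically let walkers diffuse through the pore space and remove (relax) them with some probability whenever they hit a pore wall; the surviving fraction then plays the role of the magnetization M(t). The one-dimensional lattice sketch below illustrates that idea with a single, hypothetical kill probability. It does not implement the local Robin boundary conditions of Ögren (2014), and it does not model the mapping from kill probability to surface relaxivity.

```python
import random


def surviving_magnetization(n_sites, kill_prob, n_walkers, n_steps, rng):
    """Walkers on sites 0..n_sites-1 of a 1D pore. A step that would cross a wall is a
    wall encounter: the walker is removed (relaxed) with probability kill_prob and
    otherwise stays on its boundary site. Returns the surviving fraction after each step."""
    walkers = [rng.randrange(n_sites) for _ in range(n_walkers)]
    history = []
    for _ in range(n_steps):
        survivors = []
        for x in walkers:
            new_x = x + rng.choice((-1, 1))
            if 0 <= new_x < n_sites:
                survivors.append(new_x)
            elif rng.random() >= kill_prob:
                survivors.append(x)  # reflected: not relaxed at this encounter
        walkers = survivors
        history.append(len(walkers) / n_walkers)
    return history


rng = random.Random(5)
decay = surviving_magnetization(20, 0.1, 500, 200, rng)
print(decay[::50])
```

With kill_prob = 0 the magnetization never decays; larger kill probabilities or smaller pores (more wall encounters per step) make it decay faster, which is the surface-to-volume dependence NMR relaxometry exploits.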
36. Yokoi H, Tainaka KI, Sato K. Metapopulation model for a prey-predator system: Nonlinear migration due to the finite capacities of patches. J Theor Biol 2019; 477:24-35. [PMID: 31194986 DOI: 10.1016/j.jtbi.2019.05.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/14/2019] [Revised: 05/29/2019] [Accepted: 05/31/2019] [Indexed: 10/26/2022]
Abstract
Many species live in spatially separated patches, and individuals can migrate between patches through paths. In real ecosystems, the capacities of patches are finite: if a patch is already filled with individuals of some species, migration into that patch is impossible. In the present paper, we deal with a prey-predator system composed of two patches. Each patch contains a limited number of cells, each of which is either empty or occupied by a prey or predator individual. We introduce "swapping migration", defined as the exchange between occupied and empty cells: an individual can migrate only when there are empty cells in the destination patch. We present reaction-migration equations for the prey-predator system in which the migration term is a nonlinear function of the densities. We numerically solve for the equilibrium densities and find that the population dynamics are strongly affected by nonlinear migration. Not only the extinction points but also the responses to environmental changes depend crucially on the patch capacities.
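The swapping idea can be written as mean-field equations for two patches. In the sketch below, which is our hypothetical reading rather than the paper's equations, x_i and y_i are the fractions of cells in patch i holding prey and predators, e_i = 1 - x_i - y_i is the empty fraction, and an individual moves only by swapping with an empty cell in the other patch, giving a net prey flow m(x_2 e_1 - x_1 e_2) into patch 1. The local reactions (prey reproduce into empty cells, predation turns prey cells into predator cells, predators die) and all parameter values are illustrative assumptions.

```python
def derivatives(state, r=1.0, b=2.0, d=0.5, m_prey=0.1, m_pred=0.1):
    """Mean-field reaction-migration equations for two patches (hypothetical form).

    state = (x1, y1, x2, y2): prey and predator cell fractions in patches 1 and 2.
    """
    x1, y1, x2, y2 = state
    e1, e2 = 1.0 - x1 - y1, 1.0 - x2 - y2  # empty-cell fractions
    # Swapping migration: a mover needs an empty cell in the destination patch,
    # so each flow is (density at source) x (empty fraction at destination).
    flow_x = m_prey * (x2 * e1 - x1 * e2)  # net prey flow into patch 1
    flow_y = m_pred * (y2 * e1 - y1 * e2)  # net predator flow into patch 1
    return (
        r * x1 * e1 - b * x1 * y1 + flow_x,  # prey: reproduce into empty cells, eaten
        b * x1 * y1 - d * y1 + flow_y,       # predators: grow by predation, die
        r * x2 * e2 - b * x2 * y2 - flow_x,
        b * x2 * y2 - d * y2 - flow_y,
    )


def integrate(state, dt=0.01, n_steps=10000, **params):
    """Forward-Euler integration; returns the state after n_steps."""
    for _ in range(n_steps):
        rates = derivatives(state, **params)
        state = tuple(s + dt * v for s, v in zip(state, rates))
    return state


print(integrate((0.3, 0.1, 0.2, 0.05)))
```

Expanding x_2 e_1 - x_1 e_2 gives x_2 - x_1 - x_2 y_1 + x_1 y_2, so prey migration also depends on predator densities; with a single species it reduces to ordinary linear exchange between patches. With these illustrative parameters both patches settle at x = y = 0.25, where b x = d and r e = b y.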
37. Nordam T, Nepstad R, Litzler E, Röhrs J. On the use of random walk schemes in oil spill modelling. Mar Pollut Bull 2019; 146:631-638. [PMID: 31426202 DOI: 10.1016/j.marpolbul.2019.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/25/2019] [Revised: 06/25/2019] [Accepted: 07/01/2019] [Indexed: 06/10/2023]
Abstract
In oil spill models, vertical mixing due to turbulence is commonly modelled by a random walk. If the eddy diffusivity varies with depth, failing to account for the derivative of the diffusivity in the random walk scheme leads to incorrect results. Depending on the diffusivity profile, the result may be either over- or underprediction of the amount of surfaced oil. The importance of using consistent random walk schemes has been known for decades in, e.g., the plankton modelling community. However, it appears not to be common knowledge in the oil spill community, with inconsistent random walk schemes appearing even in recent publications. We demonstrate and quantify the error due to an inconsistent random walk, using a simplified oil spill model and two different diffusivity profiles. In the two cases considered, a commonly used inconsistent scheme predicts 54% and 202%, respectively, of the amount of surfaced oil predicted by a consistent scheme.
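The difference between the two kinds of scheme can be reproduced in a few lines. A consistent (Itô/Euler) scheme adds the drift term K'(z)Δt to the random displacement, z_{n+1} = z_n + K'(z_n)Δt + sqrt(2K(z_n)Δt)·R with R standard normal, while an inconsistent scheme omits it. A standard check is the well-mixed condition: particles that start uniformly distributed should stay uniform. The diffusivity profile, time step, reflecting boundaries, and function names below are hypothetical choices; this is not the oil spill model used in the paper.

```python
import math
import random

H = 10.0             # water-column depth [m]; z = 0 at the surface
K0, K1 = 1e-3, 1e-1  # hypothetical diffusivity parameters [m^2/s]


def diffusivity(z):
    """K(z): small at the surface and bottom, largest at mid-depth; K'(0) = K'(H) = 0."""
    return K0 + K1 * math.sin(math.pi * z / H) ** 2


def diffusivity_gradient(z):
    """K'(z) = K1 * (pi / H) * sin(2 pi z / H)."""
    return K1 * (math.pi / H) * math.sin(2.0 * math.pi * z / H)


def reflect(z):
    """Reflect at the surface and bottom (assumes a single step is much smaller than H)."""
    if z < 0.0:
        return -z
    if z > H:
        return 2.0 * H - z
    return z


def step(z, dt, consistent, rng):
    """One random-walk step; the consistent scheme adds the K'(z) * dt drift term."""
    drift = diffusivity_gradient(z) * dt if consistent else 0.0
    noise = math.sqrt(2.0 * diffusivity(z) * dt) * rng.gauss(0.0, 1.0)
    return reflect(z + drift + noise)


def top_layer_fraction(consistent, n_particles=1000, n_steps=300, dt=1.0,
                       layer=1.0, seed=42):
    """Start particles uniformly mixed and return the fraction in the top `layer` metres.

    Under the well-mixed condition this should stay close to layer / H.
    """
    rng = random.Random(seed)
    zs = [rng.uniform(0.0, H) for _ in range(n_particles)]
    for _ in range(n_steps):
        zs = [step(z, dt, consistent, rng) for z in zs]
    return sum(z < layer for z in zs) / n_particles


print(top_layer_fraction(consistent=True))   # expected to stay near 0.1
print(top_layer_fraction(consistent=False))  # expected to be well above 0.1
```

With this profile the inconsistent scheme piles particles up near the surface and the bottom, where K is small (its stationary density is proportional to 1/K), whereas the consistent scheme keeps roughly layer/H = 10% of them in the top metre.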
38. Kim TR, Jeong HH, Sohn KA. Topological integration of RPPA proteomic data with multi-omics data for survival prediction in breast cancer via pathway activity inference. BMC Med Genomics 2019; 12:94. [PMID: 31296204 PMCID: PMC6624183 DOI: 10.1186/s12920-019-0511-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023] [Open access]
Abstract
BACKGROUND The analysis of integrated multi-omics data enables the identification of disease-related biomarkers that cannot be identified from a single omics profile. Although protein-level data reflects the cellular status of cancer tissue more directly than gene-level data, past studies have mainly focused on multi-omics integration using gene-level data as opposed to protein-level data. However, the use of protein-level data (such as mass spectrometry) in multi-omics integration has some limitations. For example, the correlation between the characteristics of gene-level data (such as mRNA) and protein-level data is weak, and it is difficult to detect low-abundance signaling proteins that are used to target cancer. The reverse phase protein array (RPPA) is a highly sensitive antibody-based quantification method for signaling proteins. However, the number of protein features in RPPA data is extremely low compared to the number of gene features in gene-level data. In this study, we present a new method for integrating RPPA profiles with RNA-Seq and DNA methylation profiles for survival prediction based on the integrative directed random walk (iDRW) framework proposed in our previous study. In the iDRW framework, each omics profile is merged into a single pathway profile that reflects the topological information of the pathway. In order to address the sparsity of RPPA profiles, we employ the random walk with restart (RWR) approach on the pathway network. RESULTS Our model was validated using survival prediction analysis for a breast cancer dataset from The Cancer Genome Atlas. Our proposed model exhibited improved performance compared with other methods that utilize pathway information and also out-performed models that did not include the RPPA data utilized in our study. The risk pathways identified for breast cancer in this study were closely related to well-known breast cancer risk pathways. 
CONCLUSIONS Our results indicated that RPPA data is useful for survival prediction for breast cancer patients under our framework. We also observed that iDRW effectively integrates RNA-Seq, DNA methylation, and RPPA profiles, while variation in the composition of the omics data can affect both prediction performance and risk pathway identification. These results suggest that omics data composition is a critical parameter for iDRW.
39. Song J, Peng W, Wang F. A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph. BMC Bioinformatics 2019; 20:238. [PMID: 31088372 PMCID: PMC6518800 DOI: 10.1186/s12859-019-2847-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/14/2019] [Accepted: 04/24/2019] [Indexed: 12/19/2022] [Open access]
Abstract
BACKGROUND Cancer, a worldwide problem, is driven by genomic alterations. With the advent of high-throughput sequencing technology, huge amounts of genomic data are generated every second, offering valuable information about cancer but also posing a major challenge to investigators. A major characteristic of cancer is heterogeneity, and most alterations are thought to be passenger mutations that do not contribute to cancer progression. Hence, identifying the driver genes that confer a selective growth advantage on tumor cells from such large and noisy data remains an urgent task. RESULTS Because previous network-based methods ignore some important biological properties of driver genes and gene interaction networks have low reliability, we propose a random walk method named Subdyquency that integrates subcellular localization, variation frequency, and interactions with other dysregulated genes to improve the prediction accuracy of driver genes. We applied our model to three different cancers: lung, prostate, and breast cancer. The results show that our model can not only identify well-known important driver genes but also prioritize rare, previously unknown driver genes. In addition, compared with other existing methods, our method achieves higher precision, recall, and F-score for most cancer types. CONCLUSIONS The results imply that driver genes tend to have higher variation frequencies and to affect more dysregulated genes in the common significant compartment. AVAILABILITY The source code can be obtained at https://github.com/weiba/Subdyquency .
40. Liang L, Chen V, Zhu K, Fan X, Lu X, Lu S. Integrating data and knowledge to identify functional modules of genes: a multilayer approach. BMC Bioinformatics 2019; 20:225. [PMID: 31046665 PMCID: PMC6498600 DOI: 10.1186/s12859-019-2800-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 04/09/2019] [Indexed: 01/02/2023] [Open access]
Abstract
BACKGROUND Characterizing the modular structure of cellular networks is an important way to identify novel genes for targeted therapeutics. This has been made possible by the rise of high-throughput technology. Unfortunately, computational methods to identify functional modules are limited by the data quality issues of high-throughput techniques. This study aims to integrate knowledge extracted from the literature to further improve the accuracy of functional module identification. RESULTS Our new model and algorithm were applied to both yeast and human interactomes. The predicted functional modules covered over 90% of the proteins in both organisms while maintaining a comparable overall accuracy. We found that combining mRNA expression information with biomedical knowledge greatly improved the performance of functional module identification, outperforming protein interaction networks weighted only with transcriptomic data or only with literature knowledge, as well as the unweighted protein interaction network. Our new algorithm also achieved better performance than some other well-known methods, especially in terms of the positive predictive value (PPV), which indicates the confidence of novel discoveries. CONCLUSION The higher PPV of the multiplex approach suggests that information from both sources has been effectively integrated to reduce false positives. With protein coverage higher than 90%, our algorithm is able to generate more novel biological hypotheses with higher confidence.
41. Kim SY, Jeong HH, Kim J, Moon JH, Sohn KA. Robust pathway-based multi-omics data integration using directed random walks for survival prediction in multiple cancer studies. Biol Direct 2019; 14:8. [PMID: 31036036 PMCID: PMC6489180 DOI: 10.1186/s13062-019-0239-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/16/2018] [Accepted: 04/10/2019] [Indexed: 01/15/2023] [Open access]
Abstract
Background Integrating the rich information from multi-omics data has been a popular approach to survival prediction and biomarker identification in several cancer studies. To facilitate the integrative analysis of multiple genomic profiles, several studies have suggested utilizing pathway information rather than using individual genomic profiles. Methods We have recently proposed an integrative directed random walk-based method utilizing pathway information (iDRW) for more robust and effective genomic feature extraction. In this study, we applied iDRW to multiple genomic profiles for two different cancers, and designed a directed gene-gene graph which reflects the interaction between gene expression and copy number data. In the experiments, the performances of the iDRW method and four state-of-the-art pathway-based methods were compared using a survival prediction model which classifies samples into two survival groups. Results The results show that the integrative analysis guided by pathway information not only improves prediction performance, but also provides better biological insights into the top pathways and genes prioritized by the model in both the neuroblastoma and the breast cancer datasets. The pathways and genes selected by the iDRW method were shown to be related to the corresponding cancers. Conclusions In this study, we demonstrated the effectiveness of a directed random walk-based multi-omics data integration method applied to gene expression and copy number data for both breast cancer and neuroblastoma datasets. We revamped a directed gene-gene graph considering the impact of copy number variation on gene expression and redefined the weight initialization and gene-scoring method. The benchmark comparison of iDRW with four pathway-based methods demonstrated that the iDRW method improved survival prediction performance and jointly identified cancer-related pathways and genes for two different cancer datasets.
Reviewers This article was reviewed by Helena Molina-Abril and Marta Hidalgo.
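The abstract describes weighting genes with a directed random walk over a gene-gene graph and then scoring pathways. The sketch below illustrates only the general idea, assuming a plain random walk with restart (uniform restart vector, walker following edge direction, dangling nodes sent back through the restart vector) and a hypothetical z-score aggregation for the pathway score. iDRW's own weight initialization and gene-scoring rules are redefined in the paper and are not reproduced here; node names and values are invented.

```python
import math


def directed_rwr(nodes, edges, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph.

    nodes: list of node names; edges: (source, target) pairs, walked source -> target.
    Returns a dict mapping each node to its stationary visiting probability.
    """
    out = {v: [] for v in nodes}
    for s, t in edges:
        out[s].append(t)
    p0 = {v: 1.0 / len(nodes) for v in nodes}  # uniform restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            mass = (1.0 - restart) * p[v]
            if out[v]:
                for t in out[v]:  # split the walking mass evenly over out-edges
                    nxt[t] += mass / len(out[v])
            else:  # dangling node: return its mass via the restart distribution
                for u in nodes:
                    nxt[u] += mass * p0[u]
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def pathway_score(members, weights, zscores):
    """Hypothetical aggregation: weight-normalised sum of member-gene z-scores."""
    num = sum(weights[g] * zscores[g] for g in members)
    den = math.sqrt(sum(weights[g] ** 2 for g in members))
    return num / den if den else 0.0


# Toy graph: a copy-number node feeding an expression node (names are made up).
nodes = ["cnv_A", "expr_A", "expr_B"]
edges = [("cnv_A", "expr_A"), ("expr_A", "expr_B")]
w = directed_rwr(nodes, edges)
score = pathway_score(["expr_A", "expr_B"], w, {"expr_A": 1.2, "expr_B": -0.4})
```

Nodes further downstream in the chain accumulate more visiting probability, so in this toy graph `expr_B` receives the largest weight.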
42
Integrating random walk and binary regression to identify novel miRNA-disease association. BMC Bioinformatics 2019; 20:59. [PMID: 30691413 PMCID: PMC6350368 DOI: 10.1186/s12859-019-2640-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/26/2018] [Accepted: 01/18/2019] [Indexed: 02/07/2023]
Abstract
Background: Over the last few decades, a growing body of experimental research has verified the important roles of microRNAs (miRNAs) in the development of complex human diseases. Benefiting from rapid growth both in the availability of miRNA-related data and in analysis methodologies, several computational models have recently been developed to predict disease-related miRNAs efficiently and quickly. Results: In this work, we propose a computational model, Random Walk and Binary Regression-based MiRNA-Disease Association prediction (RWBRMDA). RWBRMDA extracts features for each miRNA by running a random walk with restart on the integrated miRNA similarity network, then applies binary logistic regression to these features to predict potential miRNA-disease associations. RWBRMDA achieved an AUC of 0.8076 in leave-one-out cross-validation. Additionally, we carried out three types of case studies on four complex human diseases. In the first, esophageal cancer and prostate cancer were studied using known miRNA-disease associations in the HMDD v2.0 database; of the top 50 predicted miRNAs, 94% and 90%, respectively, were confirmed by recent experimental reports. In the second, to simulate a new disease with no known related miRNAs, all information on known breast cancer-related miRNAs was removed; 98% of the top 50 predicted miRNAs for breast cancer were confirmed. In the third, lymphoma was used to assess the robustness of RWBRMDA's predictions based on the association records in the HMDD v1.0 database, yielding a verified ratio of 88%. Conclusions: We anticipate that RWBRMDA will benefit future experimental investigations of the relationship between human diseases and miRNAs by generating promising, testable top-ranked miRNAs, significantly reducing the effort and cost of identification.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2640-9) contains supplementary material, which is available to authorized users.
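The feature-extraction step described above (a random walk with restart from each miRNA on a similarity network) can be sketched as follows. This is a minimal illustration, assuming a symmetric, non-negative similarity matrix that is column-normalized into a transition matrix and a hypothetical restart probability of 0.5; RWBRMDA's actual integrated similarity network and parameter values are not given in the abstract, and the similarity values below are made up.

```python
def rwr_profile(sim, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from one seed node of a weighted similarity matrix.

    sim: symmetric list-of-lists of non-negative similarities.
    Returns the steady-state probability vector, usable as the seed's feature vector.
    """
    n = len(sim)
    col = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    # column-normalise so that W[i][j] is the probability of stepping j -> i
    W = [[sim[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]  # restart at the seed only
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy similarity matrix for three miRNAs (values are made up).
sim = [[0.0, 1.0, 0.2],
       [1.0, 0.0, 0.5],
       [0.2, 0.5, 0.0]]
features = [rwr_profile(sim, k) for k in range(len(sim))]
```

Each row of `features` could then serve as input to a binary classifier such as scikit-learn's `LogisticRegression`; how RWBRMDA builds its training labels is not shown here.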
43
Alsmeyer G, Raschel K. The extinction problem for a distylous plant population with sporophytic self-incompatibility. J Math Biol 2019; 78:1841-1874. [PMID: 30683998 DOI: 10.1007/s00285-019-01328-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2018] [Revised: 01/11/2019] [Indexed: 11/28/2022]
Abstract
In this paper, the extinction problem for a class of distylous plant populations is considered within the framework of certain nonhomogeneous nearest-neighbor random walks in the positive quadrant. For these walks, extinction means absorption at one of the axes. Despite connections with some classical probabilistic models (the standard two-type Galton-Watson process and the two-urn model), exact formulae for the absorption probabilities seem difficult to obtain, so one must resort to good approximations. To this end, we develop potential-theoretic tools and provide various sub- and superharmonic functions that, for large initial populations, yield bounds that in particular improve on those that have appeared earlier in the literature.
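Because exact absorption probabilities are hard to obtain, a brute-force Monte Carlo estimate can serve as a sanity check on analytical bounds. The sketch below is not the authors' potential-theoretic method; it simply simulates a nearest-neighbor walk in the quadrant and counts hits on an axis. The transition rule `toy_step` is a hypothetical, position-dependent (hence nonhomogeneous) rule chosen for illustration, not the distylous population model from the paper.

```python
import random


def absorption_probability(x0, y0, step, n_walks=1000, max_steps=1000, seed=0):
    """Monte Carlo estimate of P(walk started at (x0, y0) hits an axis).

    step(x, y, rng) must return a nearest-neighbour move (dx, dy).
    Walks still alive after max_steps are counted as not absorbed,
    so the estimate is biased low for slowly absorbing walks.
    """
    rng = random.Random(seed)
    absorbed = 0
    for _ in range(n_walks):
        x, y = x0, y0
        for _ in range(max_steps):
            if x == 0 or y == 0:  # extinction of one type = absorption at an axis
                break
            dx, dy = step(x, y, rng)
            x, y = x + dx, y + dy
        if x == 0 or y == 0:
            absorbed += 1
    return absorbed / n_walks


def toy_step(x, y, rng):
    """Hypothetical nonhomogeneous rule: pick a type with probability proportional
    to its count, then add or remove one individual of that type with equal chance."""
    delta = 1 if rng.random() < 0.5 else -1
    return (delta, 0) if rng.random() < x / (x + y) else (0, delta)


estimate = absorption_probability(5, 5, toy_step)
```

Because truncated walks are counted as surviving, this estimator gives a lower bound in expectation; increasing `max_steps` tightens it at the cost of run time.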
44
Abstract
Computational prediction of the clinical success or failure of a potential drug target for therapeutic use is a challenging problem. Novel network propagation algorithms that integrate heterogeneous biological networks are proving useful for drug target identification and prioritization. These approaches typically use a network describing relationships between targets, a method to propagate the relevant information through the network, and a method to infer new associations between targets and diseases. Here, we use one such network propagation-based approach, DTINet, which starts with diffusion component analysis of networks of both potential drug targets and diseases. An inductive matrix completion algorithm is then applied to identify novel disease targets based on their network topological similarity to known disease targets with successfully launched drugs. DTINet performed well as assessed by the area under the precision-recall curve (AUPR = 0.88 ± 0.007) and the area under the receiver operating characteristic curve (AUROC = 0.86 ± 0.008). These metrics improved when we combined data from multiple networks in the target space but decreased significantly when we used a more conservative method to define negative controls (AUPR = 0.56 ± 0.007, AUROC = 0.57 ± 0.007). We are optimistic that integrating more relevant and cleaner datasets and networks, carefully calibrating model parameters, and improving the algorithm will increase prediction accuracy. However, we also recognize that predicting drug targets that are likely to succeed is an extremely challenging problem due to its complex nature and the sparsity of known disease targets.
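To make the reported metrics concrete, here is a small sketch of how AUROC and AUPR can be computed from ranked predictions. AUROC is computed in its Mann-Whitney form (the probability that a random positive outscores a random negative), so a value near 0.5, such as the 0.57 obtained with conservative negative controls, means the ranking is only slightly better than chance. The abstract does not say how AUPR was estimated; average precision is assumed here as one common estimator.

```python
def auroc(scores, labels):
    """AUROC as P(random positive scores above random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which positives occur.

    A common estimator of AUPR; ties in score are broken by input order here.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / hits if hits else 0.0


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
roc = auroc(scores, labels)
ap = average_precision(scores, labels)
```

For larger evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` implement the same quantities.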
45
Zhang L, Yu G, Guo M, Wang J. Predicting protein-protein interactions using high-quality non-interacting pairs. BMC Bioinformatics 2018; 19:525. [PMID: 30598096 PMCID: PMC6311908 DOI: 10.1186/s12859-018-2525-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 01/22/2023]
Abstract
BACKGROUND: Identifying protein-protein interactions (PPIs) is of paramount importance for understanding cellular processes. Machine learning-based approaches have been developed to predict PPIs, but their effectiveness remains unsatisfactory. One major reason is that they either choose non-interacting protein pairs (negative samples) at random or heuristically select low-quality non-interacting pairs. RESULTS: To improve PPI prediction, we propose two novel approaches, NIP-SS and NIP-RW, that generate high-quality non-interacting pairs based on sequence similarity and random walk, respectively. Specifically, known PPIs collected from public databases are used as the positive samples. NIP-SS selects the top-m most dissimilar protein pairs as negative examples and controls the degree distribution of the selected proteins to construct the negative dataset. NIP-RW performs a random walk on the PPI network to update the network's adjacency matrix, and then selects protein pairs that are not connected in the updated network as negative samples. Next, we use the auto covariance (AC) descriptor to encode the feature information of amino acid sequences. We then employ deep neural networks (DNNs) to predict PPIs from the extracted features and the positive and negative examples. Extensive experiments show that NIP-SS and NIP-RW generate negative samples of higher quality than existing strategies and thus enable more accurate prediction. CONCLUSIONS: The experimental results show that negative datasets constructed by NIP-SS and NIP-RW reduce bias and generalize well. NIP-SS and NIP-RW can be used as plugins to boost the effectiveness of PPI prediction. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NIP .
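One way to read the NIP-RW step ("random walk on the PPI network to update the adjacency matrix, then select pairs not connected in the updated network") is to accumulate k-step random-walk visiting probabilities and treat pairs that remain at zero as candidate negatives. The sketch below implements that reading under stated assumptions: an undirected, unweighted PPI graph, a hypothetical walk length `steps`, and no degree control. It is an illustration, not the published NIP-RW procedure.

```python
def rw_negative_pairs(n, edges, steps=2):
    """Candidate negative pairs: proteins a random walker cannot connect
    within `steps` steps in either direction.

    n: number of proteins (indexed 0..n-1); edges: undirected interacting pairs.
    """
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    # row-stochastic transition matrix of the random walk
    T = [[adj[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)] for i in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # 0-step walk
    reach = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        P = [[sum(P[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                reach[i][j] += P[i][j]  # accumulate k-step visiting probabilities
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if reach[i][j] == 0.0 and reach[j][i] == 0.0]


# Toy path network 0-1-2-3-4 (made-up proteins).
negatives = rw_negative_pairs(5, [(0, 1), (1, 2), (2, 3), (3, 4)], steps=2)
```

A larger `steps` value makes the "updated network" denser and leaves fewer, more confidently non-interacting pairs as negatives.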
46
Metapopulation model of rock-scissors-paper game with subpopulation-specific victory rates stabilized by heterogeneity. J Theor Biol 2018; 458:103-110. [PMID: 30213665 DOI: 10.1016/j.jtbi.2018.09.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/10/2018] [Revised: 09/07/2018] [Accepted: 09/10/2018] [Indexed: 11/20/2022]
Abstract
Recently, metapopulation models for rock-paper-scissors games have been presented. Each subpopulation is represented by a node on a graph. An individual is rock (R), scissors (S), or paper (P), and migrates randomly among subpopulations. In the present paper, we assume that victory rates differ among subpopulations. To investigate the dynamic state of each subpopulation (node), we numerically solve the reaction-diffusion equations on graphs with two and three nodes. With homogeneous victory rates, each subpopulation has a periodic solution with neutral stability. However, when victory rates are heterogeneous between subpopulations, the solutions approach stable foci. Heterogeneity in victory rates thus promotes the coexistence of species.
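A reaction-diffusion system of this kind can be integrated with a few lines of code. The sketch below assumes a cyclic Lotka-Volterra form for the reactions (R beats S at rate a, S beats P at rate b, P beats R at rate c, each node having its own rates) and linear migration at rate D between every pair of nodes; the paper's exact equations are not given in the abstract, so this form, the forward-Euler scheme, and all parameter values are illustrative assumptions.

```python
def simulate_rps(rates, D, init, dt=0.01, steps=2000):
    """Euler integration of a toy rock-paper-scissors reaction-diffusion system
    on a complete graph of subpopulations.

    rates[k] = (a, b, c): victory rates at node k of R over S, S over P, P over R.
    init[k] = [r, s, p]: initial densities at node k. D: migration rate per link.
    """
    state = [list(x) for x in init]
    n = len(state)
    for _ in range(steps):
        new = []
        for k in range(n):
            r, s, p = state[k]
            a, b, c = rates[k]
            # cyclic dominance: R beats S, S beats P, P beats R (terms sum to zero)
            react = [r * (a * s - c * p), s * (b * p - a * r), p * (c * r - b * s)]
            # random migration: exchange with every other node at rate D
            mig = [D * sum(state[j][m] - state[k][m] for j in range(n) if j != k)
                   for m in range(3)]
            new.append([state[k][m] + dt * (react[m] + mig[m]) for m in range(3)])
        state = new
    return state


# Two nodes with heterogeneous victory rates (values are made up).
hetero = simulate_rps([(1.0, 1.0, 1.0), (1.5, 1.0, 0.5)], D=0.05,
                      init=[[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
```

Both the reaction and migration terms conserve total density, which is a convenient check on the implementation. Forward Euler slowly inflates neutral cycles, so a higher-order integrator would be preferable when comparing homogeneous and heterogeneous cases quantitatively.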
47
Yuan Y, Chen YW, Dong C, Yu H, Zhu Z. Hybrid method combining superpixel, random walk and active contour model for fast and accurate liver segmentation. Comput Med Imaging Graph 2018; 70:119-134. [PMID: 30359946 DOI: 10.1016/j.compmedimag.2018.08.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/30/2017] [Revised: 04/27/2018] [Accepted: 08/27/2018] [Indexed: 10/28/2022]
Abstract
Organ segmentation is an important pre-processing step in surgery planning and computer-aided diagnosis. In this paper, we propose a fast and accurate liver segmentation framework. Our method combines a knowledge-based slice-by-slice random walk (RW) segmentation algorithm (proposed in our previous work) with a superpixel algorithm, the Contrast-enhanced Compact Watershed (CCWS) method, to reduce computing time and memory costs. Compared with the commonly used Simple Linear Iterative Clustering (SLIC), we show that CCWS is more appropriate for liver segmentation. To improve the method's accuracy, we use a modified narrow-band active contour model to refine the initial segmentation. The experiments showed that the superpixel-based slice-by-slice RW segments the entire liver faster, and that the modified active contour model is more precise than the original Chan-Vese model. As a result, the proposed framework segments the entire liver quickly and accurately.
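The core of random walk segmentation is that each unlabeled pixel receives the probability that a random walker starting there reaches a foreground seed before a background seed, with edge weights that make walks across strong intensity edges unlikely. The sketch below shows this standard random walker formulation on a single row of pixels, solved with Gauss-Seidel sweeps. It does not include the authors' knowledge-based slice-by-slice scheme, CCWS superpixels, or active contour refinement; `beta` and the intensity profile are illustrative assumptions.

```python
import math


def random_walker_1d(intensities, fg_seeds, bg_seeds, beta=50.0, sweeps=5000):
    """Probability that a walker from each pixel reaches a foreground seed
    before a background seed (1-D chain of pixels, Gauss-Seidel solver)."""
    n = len(intensities)
    # edge weights: similar neighbouring intensities -> strong coupling
    w = [math.exp(-beta * (intensities[i] - intensities[i + 1]) ** 2) for i in range(n - 1)]
    prob = [0.5] * n
    for i in fg_seeds:
        prob[i] = 1.0
    for i in bg_seeds:
        prob[i] = 0.0
    fixed = set(fg_seeds) | set(bg_seeds)
    for _ in range(sweeps):
        for i in range(n):
            if i in fixed:
                continue
            num = den = 0.0
            if i > 0:
                num += w[i - 1] * prob[i - 1]
                den += w[i - 1]
            if i < n - 1:
                num += w[i] * prob[i + 1]
                den += w[i]
            # harmonic condition: weighted mean of the neighbours' probabilities
            prob[i] = num / den if den else prob[i]
    return prob


profile = [0.9, 0.9, 0.85, 0.2, 0.1, 0.1]  # made-up intensities along one image row
p = random_walker_1d(profile, fg_seeds=[0], bg_seeds=[5])
labels = [1 if v > 0.5 else 0 for v in p]
```

In 2-D or 3-D the same harmonic condition becomes a sparse linear system in the graph Laplacian, which is why grouping pixels into superpixels (fewer graph nodes) reduces time and memory.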
48
Lorenz-Spreen P, Wolf F, Braun J, Ghoshal G, Djurdjevac Conrad N, Hövel P. Tracking online topics over time: understanding dynamic hashtag communities. Computational Social Networks 2018; 5:9. [PMID: 30416936 PMCID: PMC6208799 DOI: 10.1186/s40649-018-0058-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/16/2018] [Accepted: 09/28/2018] [Indexed: 11/10/2022]
Abstract
Background: Hashtags are widely used for communication in online media. As a condensed form of information, they characterize topics and discussions. To analyze them, we apply methods from network science and propose novel tools for tracing their dynamics in time-dependent data. The observations are characterized by bursty increases and decreases in hashtag usage, and these features can be reproduced with a novel model of dynamic rankings. Hashtag communities in time: We build temporal, weighted co-occurrence networks from hashtags. On static snapshots, we infer the community structure using customized methods. On temporal networks, we solve the bipartite matching problem between communities detected at subsequent time steps, taking higher-order memory into account. This yields a matching protocol that is robust to temporal fluctuations and to instabilities of the static community detection. The proposed methodology is broadly applicable, and its outcomes reveal the temporal behavior of online topics. Modeling topic dynamics: We use the size of communities over time as a proxy for online popularity dynamics. We find that the distributions of gains and losses, as well as the inter-event times, are fat-tailed, indicating occasional but large and sudden changes in hashtag usage. Inspired by typical website designs, we propose a stochastic model that incorporates a ranking with respect to a time-dependent prestige score. This causes occasional cascades of rank-shift events and reproduces the observations with good agreement, offering an explanation for the observed dynamics based on characteristic elements of online media.
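The step of matching communities detected in consecutive snapshots can be illustrated with a simple greedy matcher. This sketch assumes communities are sets of hashtags, scores candidate pairs by Jaccard similarity, and uses a hypothetical acceptance threshold. It is deliberately simpler than the paper's protocol: it uses no higher-order memory and it is greedy rather than optimal.

```python
def match_communities(prev, curr, threshold=0.1):
    """Greedy one-to-one matching of communities at time t (prev) to time t+1 (curr).

    Communities are sets of hashtags; candidate pairs are ranked by Jaccard
    similarity and accepted if neither side is already matched.
    Returns {prev_index: curr_index}.
    """
    candidates = []
    for i, a in enumerate(prev):
        for j, b in enumerate(curr):
            overlap = len(a & b)
            if overlap:
                candidates.append((overlap / len(a | b), i, j))
    candidates.sort(reverse=True)  # most similar pairs first
    matched_prev, matched_curr, matches = set(), set(), {}
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in matched_prev or j in matched_curr:
            continue
        matches[i] = j
        matched_prev.add(i)
        matched_curr.add(j)
    return matches


prev = [{"#a", "#b", "#c"}, {"#x", "#y"}]
curr = [{"#x", "#y", "#z"}, {"#a", "#b"}]
matches = match_communities(prev, curr)
```

For an optimal (rather than greedy) assignment on the same similarity matrix, `scipy.optimize.linear_sum_assignment` solves the bipartite matching problem directly.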
49
Boufadel MC, Cui F, Katz J, Nedwed T, Lee K. On the transport and modeling of dispersed oil under ice. Marine Pollution Bulletin 2018; 135:569-580. [PMID: 30301075 DOI: 10.1016/j.marpolbul.2018.07.046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/07/2018] [Revised: 07/09/2018] [Accepted: 07/17/2018] [Indexed: 06/08/2023]
Abstract
Theoretical arguments and numerical investigations were used to understand the transport of oil droplets under ice. The boundary layer (BL) in the water under ice was found to produce a downward velocity that reaches up to 0.2% of the horizontal current speed and is, in general, larger than the rise velocity of 70 μm oil droplets. The eddy diffusivity was found to increase with depth and then decrease gradually. Neglecting the gradient of eddy diffusivity when conducting Lagrangian transport of oil droplets would result in an unphysical spatial distribution. When the downward velocity of water was neglected, oil accumulated at the water-ice interface regardless of the attachment efficiency. The lift force was found to scrape droplets off the ice, especially droplets ≤ 70 μm. These findings suggest that previous oil spill simulations may have overestimated the number of small droplets (≤ 70 μm) at the water-ice interface.
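Why the diffusivity gradient matters can be shown with a one-dimensional random-walk particle tracking sketch. With spatially varying eddy diffusivity K(z), the standard scheme adds dK/dz as a drift term, z_{n+1} = z_n + K'(z_n) Δt + sqrt(2 K(z_n) Δt) ξ with ξ ~ N(0, 1); dropping that term makes particles pile up where K is small. The linear K profile (increasing with depth only), the reflecting boundaries, and all numbers below are hypothetical, and neither the downward water velocity nor droplet rise velocity is modeled.

```python
import math
import random


def track_droplets(correct_gradient, n=1000, steps=400, dt=0.05, seed=1):
    """1-D random-walk particle tracking of droplet depth z in [0, 1] (z = 0 at the ice).

    Hypothetical eddy diffusivity K(z) = 0.0005 + 0.02 z, increasing with depth.
    correct_gradient=True adds the drift dK/dz required for variable diffusivity.
    """
    rng = random.Random(seed)
    K = lambda z: 0.0005 + 0.02 * z
    dKdz = 0.02
    zs = [rng.random() for _ in range(n)]  # start well mixed
    for _ in range(steps):
        for i, z in enumerate(zs):
            drift = dKdz if correct_gradient else 0.0
            z += drift * dt + math.sqrt(2.0 * K(z) * dt) * rng.gauss(0.0, 1.0)
            if z < 0.0:  # reflect at the ice
                z = -z
            if z > 1.0:  # reflect at the bottom of the domain
                z = 2.0 - z
            zs[i] = z
    return zs


def fraction_near_ice(zs, depth=0.5):
    """Fraction of particles shallower than `depth`."""
    return sum(z < depth for z in zs) / len(zs)


with_term = fraction_near_ice(track_droplets(True))
without_term = fraction_near_ice(track_droplets(False))
```

With the gradient term, an initially well-mixed cloud stays roughly well mixed; without it, droplets drift spuriously toward the low-diffusivity region next to the ice, which is the kind of unphysical distribution the abstract warns about.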
50
Kim SY, Kim TR, Jeong HH, Sohn KA. Integrative pathway-based survival prediction utilizing the interaction between gene expression and DNA methylation in breast cancer. BMC Med Genomics 2018; 11:68. [PMID: 30255812 PMCID: PMC6157196 DOI: 10.1186/s12920-018-0389-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022]
Abstract
Background: Integrative analysis of multi-omics data has recently gained much attention. To investigate the interactive effect of gene expression and DNA methylation on cancer, we propose a directed random walk (DRW)-based approach on an integrated gene-gene graph guided by pathway information. Methods: Our approach first extracts a single pathway profile matrix from the gene expression and DNA methylation data by performing the random walk over the integrated graph. We then apply a denoising autoencoder (DA) to the pathway profile to further identify important pathway features and genes. The extracted features are validated in the survival prediction task for breast cancer patients. Results: The proposed method substantially improves survival prediction performance compared with other pathway-based prediction methods, revealing that the combined effect of gene expression and methylation data is well reflected in the integrated gene-gene graph combined with pathway information. Furthermore, our joint analysis of methylation features and the gene expression profile identifies cancer-specific pathways with genes related to breast cancer. Conclusions: In this study, we proposed a DRW-based method on an integrated gene-gene graph with expression and methylation profiles in order to exploit the interactions between them. The results showed that the constructed integrated gene-gene graph successfully reflects the combined effect of methylation features on gene expression profiles. We also found that the features selected by the DA effectively capture topologically important pathways and genes specifically related to breast cancer.
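As a rough illustration of the denoising autoencoder (DA) step applied to a pathway profile matrix, the sketch below trains a tiny DA in pure Python: inputs are corrupted with masking noise and the network learns to reconstruct the clean input. The architecture (one sigmoid hidden layer, linear output), masking noise level, learning rate, and toy profiles are all assumptions. The abstract does not describe the paper's DA configuration or how important pathways are selected from it, so neither is reproduced here.

```python
import math
import random


def train_dae(data, n_hidden=2, noise=0.2, lr=0.1, epochs=300, seed=0):
    """Tiny denoising autoencoder trained by SGD on squared reconstruction error.

    Inputs are corrupted by masking noise (entries zeroed with probability `noise`);
    the target is always the clean input. Returns forward(x) -> (hidden, output).
    """
    rng = random.Random(seed)
    d = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    b2 = [0.0] * d

    def forward(x):
        h = [1.0 / (1.0 + math.exp(-(sum(W1[k][i] * x[i] for i in range(d)) + b1[k])))
             for k in range(n_hidden)]
        y = [sum(W2[i][k] * h[k] for k in range(n_hidden)) + b2[i] for i in range(d)]
        return h, y

    for _ in range(epochs):
        for x in data:
            noisy = [0.0 if rng.random() < noise else v for v in x]  # masking noise
            h, y = forward(noisy)
            err = [y[i] - x[i] for i in range(d)]  # compare with the CLEAN input
            # gradient of 0.5 * sum(err^2) w.r.t. hidden units (uses W2 before update)
            dh = [sum(err[i] * W2[i][k] for i in range(d)) * h[k] * (1.0 - h[k])
                  for k in range(n_hidden)]
            for i in range(d):
                for k in range(n_hidden):
                    W2[i][k] -= lr * err[i] * h[k]
                b2[i] -= lr * err[i]
            for k in range(n_hidden):
                for i in range(d):
                    W1[k][i] -= lr * dh[k] * noisy[i]
                b1[k] -= lr * dh[k]
    return forward


def reconstruction_mse(forward, data):
    """Mean squared reconstruction error over all entries of `data`."""
    total = 0.0
    for x in data:
        _, y = forward(x)
        total += sum((a - b) ** 2 for a, b in zip(y, x))
    return total / (len(data) * len(data[0]))


# Made-up pathway activity profiles for four samples and four pathways.
profiles = [[1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0],
            [0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 0.9, 1.0]]
model = train_dae(profiles)
```

The hidden activations returned by `model(x)[0]` are a compressed representation of each profile; inspecting which input pathways drive them is one possible, hypothetical route to ranking pathway features.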