Reference Citation Analysis

Cited by Other Article(s)

Xu R, Li D, Yang W, Wang G, Li Y. Improving ncRNA family prediction using multi-modal contrastive learning of sequence and structure. Bioinformatics 2024;40:btae640. [PMID: 39460948 PMCID: PMC11639665 DOI: 10.1093/bioinformatics/btae640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 10/15/2024] [Accepted: 10/22/2024] [Indexed: 10/28/2024]

Abstract

MOTIVATION

Recent advancements in high-throughput sequencing technology have significantly increased the focus on non-coding RNA (ncRNA) research within the life sciences. Despite this, the functions of many ncRNAs remain poorly understood. Research suggests that ncRNAs within the same family typically share similar functions, underlining the importance of understanding their roles. There are two primary methods for predicting ncRNA families: biological and computational. Traditional biological methods are not suitable for large-scale data prediction due to the significant human and resource requirements. Concurrently, most existing computational methods either rely solely on ncRNA sequence data or are exclusively based on the secondary structure of ncRNA molecules. These methods fail to fully utilize the rich multimodal information available from ncRNAs, thereby preventing them from learning more comprehensive and in-depth feature representations.

RESULTS

To tackle these problems, we proposed MM-ncRNAFP, a multi-modal contrastive learning framework for ncRNA family prediction. We first used a pre-trained language model to encode the primary sequences of a large mammalian ncRNA dataset. Then, we adopted a contrastive learning framework with an attention mechanism to fuse the secondary structure information obtained by graph neural networks. The MM-ncRNAFP method can effectively fuse multi-modal information. Experimental comparisons with several competitive baselines demonstrated that MM-ncRNAFP can achieve more comprehensive representations of ncRNA features by integrating both sequence and structural information. This integration significantly enhances the performance of ncRNA family prediction. Ablation experiments and qualitative analyses were performed to verify the effectiveness of each component in our model. Moreover, since our model is pre-trained on a large amount of ncRNA data, it has the potential to bring significant improvements to other ncRNA-related tasks.
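The abstract does not state the exact objective used by MM-ncRNAFP. As a minimal sketch of the general idea of contrastive alignment between two modalities, the following assumes a symmetric InfoNCE-style loss over paired sequence and structure embeddings. The function name `info_nce`, the `temperature` value, and the random embeddings are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def info_nce(seq_emb, struct_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (illustrative, not the paper's code).

    Row i of seq_emb and row i of struct_emb are assumed to describe
    the same ncRNA; every other row in the batch acts as a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    g = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    logits = s @ g.T / temperature

    def cross_entropy_diag(l):
        # Numerically stable log-softmax over each row; the matching
        # pair sits on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the sequence-to-structure and structure-to-sequence terms.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))      # hypothetical batch of 8 embeddings
aligned = info_nce(emb, emb)        # every pair matches
shuffled = info_nce(emb, emb[::-1]) # every pair is mismatched
```

In this toy setup the loss is lower when the two modalities are correctly paired than when the pairing is reversed, which is the signal a contrastive objective of this kind would train on.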

AVAILABILITY AND IMPLEMENTATION

MM-ncRNAFP and the datasets are available at https://github.com/xuruiting2/MM-ncRNAFP.

Zhang Y, Wang J, Yu J. PSA: an effective method for predicting horizontal gene transfers through parsimonious phylogenetic networks. Cladistics 2024;40:443-455. [PMID: 38717786 DOI: 10.1111/cla.12578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 07/15/2024]

Hong Y, Wang J. Frin: An Efficient Method for Representing Genome Evolutionary History. Front Genet 2019;10:1261. [PMID: 31867045 PMCID: PMC6909884 DOI: 10.3389/fgene.2019.01261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/18/2019] [Accepted: 11/14/2019] [Indexed: 11/13/2022]

Jamil HM. Optimizing Phylogenetic Queries for Performance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018;15:1692-1705. [PMID: 28858810 DOI: 10.1109/tcbb.2017.2743706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Wang J, Guo M. A review of metrics measuring dissimilarity for rooted phylogenetic networks. Brief Bioinform 2018;20:1972-1980. [DOI: 10.1093/bib/bby062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/22/2018] [Revised: 06/20/2018] [Indexed: 11/14/2022]

Zou Q, Wan S, Zeng X, Ma ZS. Reconstructing evolutionary trees in parallel for massive sequences. BMC Systems Biology 2017;11:100. [PMID: 29297337 PMCID: PMC5751538 DOI: 10.1186/s12918-017-0476-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]

Chen X, Wang C, Tang S, Yu C, Zou Q. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment. BMC Bioinformatics 2017. [PMID: 28646874 PMCID: PMC5483318 DOI: 10.1186/s12859-017-1725-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]

Abstract

Background

Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization has become necessary to keep running time at an acceptable level. Although there is a large body of work on MSA, existing approaches are either insufficient or rely on implicit assumptions that limit their generality. First, the properties of users’ sequences, including dataset size and sequence length, can take arbitrary values and are generally unknown before submission, a fact that previous work has unfortunately ignored. Second, the center star strategy is well suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, on heterogeneous CPU/GPU platforms, prior studies parallelize MSA on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of computing resources by running the workload on both CPU and GPU simultaneously.

Results

This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users’ submitted sequences without any assumptions about them. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn²) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms state-of-the-art software.
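For context on why center sequence selection is costly, here is a minimal sketch of the classical baseline, assuming edit distance as the pairwise score: the center is the sequence with the smallest total distance to all others, which requires comparing every pair. This is not CMSA's bitmap-based algorithm, which the abstract does not describe; the function names `edit_distance` and `select_center` and the example sequences are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def select_center(seqs):
    """Index of the sequence with the smallest summed distance to the rest.

    Every pair of sequences is compared once, so the number of
    comparisons grows quadratically with the number of sequences.
    """
    n = len(seqs)
    totals = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j])
            totals[i] += d
            totals[j] += d
    # Ties are broken in favour of the lowest index.
    return min(range(n), key=totals.__getitem__)

seqs = ["ACGT", "ACGA", "ACGT", "TCGT"]  # hypothetical toy input
center = select_center(seqs)
```

The all-pairs loop in `select_center` is the quadratic step that an improved selection strategy would need to avoid.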

Conclusion

CMSA focuses on multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy. We conclude that harnessing the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. In addition, adopting the co-run computation model can significantly increase utilization of the entire system. The source code is available at https://github.com/wangvsa/CMSA.

Wang J, Guo M. A Metric on the Space of kth-order reduced Phylogenetic Networks. Sci Rep 2017;7:3189. [PMID: 28600511 DOI: 10.1038/s41598-017-03363-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2016] [Accepted: 04/27/2017] [Indexed: 11/09/2022]

Wang J. A Survey of Methods for Constructing Rooted Phylogenetic Networks. PLoS One 2016;11:e0165834. [PMID: 27806124 PMCID: PMC5091748 DOI: 10.1371/journal.pone.0165834] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/06/2016] [Accepted: 10/18/2016] [Indexed: 11/18/2022]

Constructing Phylogenetic Networks Based on the Isomorphism of Datasets. BioMed Research International 2016;2016:4236858. [PMID: 27547759 PMCID: PMC4980496 DOI: 10.1155/2016/4236858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2016] [Accepted: 06/30/2016] [Indexed: 11/18/2022]

Wang J. A Metric on the Space of Partly Reduced Phylogenetic Networks. BioMed Research International 2016;2016:7534258. [PMID: 27419137 PMCID: PMC4935902 DOI: 10.1155/2016/7534258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/30/2016] [Accepted: 05/23/2016] [Indexed: 11/17/2022]

Zou Q, Hu Q, Guo M, Wang G. HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics 2015;31:2475-81. [DOI: 10.1093/bioinformatics/btv177] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 02/03/2015] [Accepted: 03/23/2015] [Indexed: 12/26/2022]