1. Boumajdi N, Bendani H, Belyamani L, Ibrahimi A. TreeWave: command line tool for alignment-free phylogeny reconstruction based on graphical representation of DNA sequences and genomic signal processing. BMC Bioinformatics 2024; 25:367. [PMID: 39604838 PMCID: PMC11600722 DOI: 10.1186/s12859-024-05992-3] [Received: 09/10/2024] [Accepted: 11/18/2024]
Abstract
BACKGROUND Genomic sequence similarity comparison is a crucial research area in bioinformatics. Multiple Sequence Alignment (MSA) is the basic technique used to identify regions of similarity between sequences. Although MSA tools are widely used and highly accurate, they are often limited by computational complexity and by inaccuracies when handling highly divergent sequences, which has led to the development of alignment-free (AF) algorithms. RESULTS This paper presents TreeWave, a novel AF approach to phylogeny inference based on the frequency chaos game representation and the discrete wavelet transform of sequences. We validate our method on various genomic datasets, including complete virus genome sequences, bacterial genome sequences, human mitochondrial genome sequences, and rRNA gene sequences. Compared with classical methods, our tool demonstrates a significant reduction in running time, especially when analyzing large datasets. Judged by normalized Robinson-Foulds distances and Baker's Gamma coefficients, the resulting phylogenetic trees show that TreeWave has classification accuracy similar to that of classical MSA methods. CONCLUSIONS TreeWave is an open-source, user-friendly command line tool for phylogeny reconstruction. It is faster and more scalable, prioritizing computational efficiency while maintaining accuracy. TreeWave is freely available at https://github.com/nasmaB/TreeWave.
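The abstract names two core steps: a frequency chaos game representation (FCGR) and a discrete wavelet transform (DWT). Below is a minimal, stdlib-only Python sketch of that kind of pipeline. Several choices are assumptions made for illustration, not TreeWave's documented settings: the corner layout, the Haar wavelet, keeping only the approximation (LL) sub-band, and Euclidean distance. All function names are hypothetical.

```python
import math

# Assumed CGR corner convention (one common choice, not taken from TreeWave):
# A bottom-left, C top-left, G top-right, T bottom-right; row 0 is the top row.
X_BIT = {"A": 0, "C": 0, "G": 1, "T": 1}
Y_BIT = {"A": 1, "C": 0, "G": 0, "T": 1}


def fcgr(seq, k):
    """Return the 2^k x 2^k frequency CGR matrix (raw k-mer counts) of seq."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(b not in X_BIT for b in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        # The last nucleotide of the k-mer selects the quadrant (most significant
        # bit), mirroring the iterative chaos game construction.
        for b in reversed(kmer):
            x = (x << 1) | X_BIT[b]
            y = (y << 1) | Y_BIT[b]
        grid[y][x] += 1
    return grid


def haar_ll(grid):
    """One level of a 2D orthonormal Haar DWT, keeping only the approximation (LL) band."""
    n = len(grid)
    return [[(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 2.0
             for c in range(0, n, 2)]
            for r in range(0, n, 2)]


def feature_vector(seq, k=4):
    """FCGR -> length-normalized frequencies -> Haar LL band -> flat list."""
    grid = fcgr(seq, k)
    total = sum(map(sum, grid)) or 1
    freqs = [[v / total for v in row] for row in grid]
    return [v for row in haar_ll(freqs) for v in row]


def distance(s1, s2, k=4):
    """Euclidean distance between the wavelet feature vectors of two sequences."""
    return math.dist(feature_vector(s1, k), feature_vector(s2, k))
```

A matrix of pairwise `distance` values could then be passed to a distance-based tree builder such as neighbor joining. That step is omitted here, and the abstract does not say which tree method TreeWave uses.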
Affiliation(s)
- Nasma Boumajdi
  - Laboratory of Biotechnology (MedBiotech), Rabat Medical & Pharmacy School, Bioinova Research Center, Mohammed V University in Rabat, Rabat, Morocco
- Houda Bendani
  - Laboratory of Biotechnology (MedBiotech), Rabat Medical & Pharmacy School, Bioinova Research Center, Mohammed V University in Rabat, Rabat, Morocco
- Lahcen Belyamani
  - Mohammed VI Center for Research and Innovation (CM6), Rabat, Morocco
  - Mohammed VI University of Sciences and Health (UM6SS), Casablanca, Morocco
  - Emergency Department, Military Hospital Mohammed V, Rabat Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Azeddine Ibrahimi
  - Laboratory of Biotechnology (MedBiotech), Rabat Medical & Pharmacy School, Bioinova Research Center, Mohammed V University in Rabat, Rabat, Morocco
2. Taylor AD, Hathaway QA, Kunovac A, Pinti MV, Newman MS, Cook CC, Cramer ER, Starcovic SA, Winters MT, Westemeier-Rice ES, Fink GK, Durr AJ, Rizwan S, Shepherd DL, Robart AR, Martinez I, Hollander JM. Mitochondrial sequencing identifies long noncoding RNA features that promote binding to PNPase. Am J Physiol Cell Physiol 2024; 327:C221-C236. [PMID: 38826135 PMCID: PMC11427107 DOI: 10.1152/ajpcell.00648.2023] [Received: 11/27/2023] [Revised: 05/24/2024] [Accepted: 05/24/2024]
Abstract
Extranuclear localization of long noncoding RNAs (lncRNAs) is poorly understood. Based on machine learning evaluations, we propose a lncRNA-mitochondrial interaction pathway in which polynucleotide phosphorylase (PNPase), through domains that provide specificity for primary sequence and secondary structure, binds nuclear-encoded lncRNAs to facilitate mitochondrial import. Using FVB/NJ mouse and human cardiac tissues, RNA from isolated subcellular compartments (cytoplasmic and mitochondrial) and cross-linked immunoprecipitate (CLIP) with PNPase within the mitochondrion were sequenced on the Illumina HiSeq and MiSeq, respectively. lncRNA sequence and structure were evaluated with supervised machine learning algorithms [classification and regression trees (CART) and support vector machines (SVM)]. In HL-1 cells, quantitative PCR of PNPase CLIP knockout mutants (KH and S1) was performed. In vitro fluorescence assays assessed PNPase RNA-binding capacity, and the results were verified with PNPase CLIP. One hundred twelve (mouse) and 1,548 (human) lncRNAs were identified in the mitochondrion, with Malat1 being the most abundant. Most noncoding RNAs binding PNPase were lncRNAs, including Malat1. When compared against randomly generated sequences of similar length, lncRNA fragments bound to PNPase were stratified by the SVM and CART algorithms. The lncRNAs bound to PNPase were used to create a criterion for binding, and experimental validation revealed increased binding affinity of RNA designed to bind PNPase compared with control RNA. The binding of lncRNAs to PNPase was decreased by knockout of the RNA-binding domains KH and S1.
In conclusion, sequence and secondary structural features identified by machine learning enhance the likelihood of nuclear-encoded lncRNAs binding to PNPase and undergoing import into the mitochondrion.
NEW & NOTEWORTHY Long noncoding RNAs (lncRNAs) are relatively novel RNAs with increasingly prominent roles in regulating gene expression, mainly in the nucleus but more recently in regions such as the mitochondrion. This study explores how lncRNAs interact with polynucleotide phosphorylase (PNPase), a protein that regulates RNA import into the mitochondrion. Machine learning identified several RNA structural features that improved lncRNA binding to PNPase, which may be useful in targeting RNA therapeutics to the mitochondrion.
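The abstract describes contrasting PNPase-bound lncRNA fragments with randomly generated, length-matched sequences using CART and SVM. The sketch below is a toy illustration of that comparison. It builds length-matched random controls and fits a depth-1 CART split (a decision stump chosen by Gini impurity) on a single feature. GC content is a hypothetical stand-in for the study's sequence and secondary-structure features, and all names are illustrative.

```python
import random


def gc_content(seq):
    """Fraction of G and C in an RNA/DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def matched_random_controls(seqs, seed=0):
    """One uniformly random RNA sequence per input, with exactly the same length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGU") for _ in s) for s in seqs]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_stump(values, labels):
    """Depth-1 CART split on one feature: returns (weighted Gini, threshold)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best
```

In use, label bound fragments 1 and their `matched_random_controls` 0, compute a feature such as `gc_content` for each, and pass both lists to `best_stump`. A full CART recurses on each side of the split, and an SVM would replace the single threshold with a learned margin.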
Affiliation(s)
- Andrew D Taylor
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Quincy A Hathaway
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Heart and Vascular Institute, West Virginia University, Morgantown, West Virginia, United States
  - Department of Medical Education, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Amina Kunovac
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mark V Pinti
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - West Virginia University School of Pharmacy, Morgantown, West Virginia, United States
- Mackenzie S Newman
  - Department of Physiology and Pharmacology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Chris C Cook
  - Cardiovascular and Thoracic Surgery, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Evan R Cramer
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Sarah A Starcovic
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Michael T Winters
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- Emily S Westemeier-Rice
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- Garrett K Fink
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Andrya J Durr
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Saira Rizwan
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Danielle L Shepherd
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Aaron R Robart
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Ivan Martinez
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- John M Hollander
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
3. Yin R, Luo Z, Kwoh CK. Exploring the Lethality of Human-Adapted Coronavirus Through Alignment-Free Machine Learning Approaches Using Genomic Sequences. Curr Genomics 2021; 22:583-595. [PMID: 35386190 PMCID: PMC8922323 DOI: 10.2174/1389202923666211221110857] [Received: 08/26/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021]
Abstract
Background A novel coronavirus emerged and rapidly spread worldwide, and the World Health Organization declared a pandemic on March 11, 2020. Coronaviruses have attracted much attention because they can cause a wide variety of infectious diseases in humans, ranging from mild to severe. Detecting the lethality of a human coronavirus is key to estimating its viral toxicity and provides perspectives for treatment. Methods We developed an alignment-free framework that uses machine learning for ultra-fast and highly accurate prediction of the lethality of human-adapted coronaviruses from genomic sequences. We performed extensive experiments with six different feature transformations and machine learning algorithms, combined with digital signal processing, to identify the lethality of possible future novel coronaviruses using existing strains. Results Tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets, the framework achieves an average prediction accuracy of 96.7%. We also provide a preliminary analysis validating the effectiveness of our models on other human coronaviruses. Our framework achieves high prediction performance while remaining alignment-free and relying on RNA sequences alone, without genome annotations or specialized biological knowledge. Conclusion The results demonstrate that, for any novel human coronavirus strain, this study can offer a reliable real-time estimate of its viral lethality.
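To make "alignment-free" concrete, the sketch below represents each genome by its k-mer frequency vector and classifies it with a nearest-centroid rule. It is an assumed, minimal stand-in for one feature-transformation/classifier pairing, not one of the six pipelines evaluated in the study. The class labels used in the test are made up.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Normalized k-mer frequency vector over the fixed ACGT^k vocabulary."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper().replace("U", "T")  # treat RNA genomes with DNA letters
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab) or 1  # k-mers with N etc. are ignored
    return [counts[w] / total for w in vocab]


def train_centroids(seqs, labels, k=3):
    """Average k-mer profile per class label."""
    groups = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(kmer_profile(seq, k))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids, seq, k=3):
    """Assign seq to the class whose centroid is nearest in Euclidean distance."""
    v = kmer_profile(seq, k)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))
```

Because no alignment is computed, the cost grows linearly with genome length. Swapping in a stronger classifier only changes `train_centroids` and `predict`.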
Affiliation(s)
- Rui Yin
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
  - Department of Biomedical Informatics, Harvard University, Boston, MA 02138, USA
- Zihan Luo
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, 430074, China
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
4. Genomic signal processing of microarrays for cancer gene expression and identification using cluster-fuzzy adaptive networking. Soft Comput 2020. [DOI: 10.1007/s00500-020-05068-3]
5. Paredes O, Romo-Vázquez R, Román-Godínez I, Vélez-Pérez H, Salido-Ruiz RA, Morales JA. Frequency spectra characterization of noncoding human genomic sequences. Genes Genomics 2020; 42:1215-1226. [DOI: 10.1007/s13258-020-00980-2] [Received: 01/22/2019] [Accepted: 04/27/2020]
6. Lichtblau D. Alignment-free genomic sequence comparison using FCGR and signal processing. BMC Bioinformatics 2019; 20:742. [PMID: 31888438 PMCID: PMC6937637 DOI: 10.1186/s12859-019-3330-3] [Received: 09/25/2019] [Accepted: 12/17/2019]
Abstract
Background Alignment-free methods of genomic comparison offer the possibility of scaling to large data sets of nucleotide sequences composed of several thousand or more base pairs. Such methods can be used to deduce “nearby” species in a reference data set or to construct phylogenetic trees. Results We describe one such method that gives quite strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first with a Fourier trig transform and then with a Singular Value Decomposition (SVD). This yields vectors of modest length, which are in turn used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets. Conclusions The tandem of FCGR and dimension reduction using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.
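The "Fourier trig transform" step can be illustrated with a 2D discrete cosine transform (DCT-II) applied to an FCGR image, keeping only the low-frequency corner as a shorter feature vector. Treating an orthonormal DCT-II as the Fourier-type trigonometric transform is an assumption and may differ from the paper's exact choice. The subsequent SVD step (for example with `numpy.linalg.svd`) is omitted so the example stays dependency-free.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(x))
        out.append(s * (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)))
    return out


def dct_2d(image):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(list(r)) for r in image]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def low_frequency_features(image, m):
    """Keep the top-left m x m block of coefficients (the lowest frequencies)."""
    coeffs = dct_2d(image)
    return [coeffs[r][c] for r in range(m) for c in range(m)]
```

Because the transform is orthonormal, it preserves total energy. Truncating to the top-left block therefore discards only the high-frequency detail, which is what makes the resulting vectors "of modest length".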
7. Alakus TB, Das B, Turkoglu I. DNA encoding with entropy based numerical mapping technique for phylogenetic analysis. 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) 2019. [DOI: 10.1109/idap.2019.8875937]
8. Farkaš T, Sitarčík J, Brejová B, Lucká M. SWSPM: A Novel Alignment-Free DNA Comparison Method Based on Signal Processing Approaches. Evol Bioinform Online 2019; 15:1176934319849071. [PMID: 31210725 PMCID: PMC6545658 DOI: 10.1177/1176934319849071] [Received: 04/12/2019] [Accepted: 04/12/2019]
Abstract
Computing the similarity between two nucleotide sequences is one of the fundamental problems in bioinformatics. Current methods are based mainly on two major approaches: (1) sequence alignment, which is computationally expensive, and (2) faster but less accurate alignment-free methods based on various statistical summaries, for example, short word counts. We propose a new distance measure based on mathematical transforms from the domain of signal processing. To tolerate large-scale rearrangements in the sequences, the transform is computed across sliding windows. We compare our method with current state-of-the-art alignment-free methods on several data sets. Our method compares favorably in terms of accuracy and outperforms other methods in running time and memory requirements. In addition, it is massively scalable, up to dozens of processing units, without loss of performance due to communication overhead. Source files and sample data are available at https://bitbucket.org/fiitstubioinfo/swspm/src.
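The sliding-window idea can be sketched as follows. Each window is mapped to numbers, the magnitude of its discrete Fourier transform is taken, and every window is compared with its best-matching window anywhere in the other sequence, so rearranged blocks are not penalized. The purine/pyrimidine mapping, window size, step, and best-match averaging are hypothetical choices for illustration, not SWSPM's published distance.

```python
import cmath
import math

# Hypothetical numeric mapping (purine +1, pyrimidine -1); SWSPM's own mapping may differ.
MAPPING = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def windows(seq, size, step):
    """Overlapping windows of length `size`, starting every `step` bases."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def magnitude_spectrum(window):
    """|DFT| of the mapped window for frequencies 0..n/2 (real input, so symmetric)."""
    x = [MAPPING.get(b, 0.0) for b in window.upper()]
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(n // 2 + 1)]


def sliding_window_distance(s1, s2, size=16, step=8):
    """Average best-match distance between window spectra, symmetrized.

    Matching each window to its closest counterpart anywhere in the other
    sequence is what lets block rearrangements go largely unpenalized.
    """
    spec1 = [magnitude_spectrum(w) for w in windows(s1, size, step)]
    spec2 = [magnitude_spectrum(w) for w in windows(s2, size, step)]
    if not spec1 or not spec2:
        raise ValueError("both sequences must be at least one window long")

    def one_way(a, b):
        return sum(min(math.dist(u, v) for v in b) for u in a) / len(a)

    return (one_way(spec1, spec2) + one_way(spec2, spec1)) / 2
```

Each window is processed independently, which is also why this style of method parallelizes well across processing units.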
Affiliation(s)
- Tomáš Farkaš
  - Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia
- Jozef Sitarčík
  - Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia
- Broňa Brejová
  - Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
- Mária Lucká
  - Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia
9. Randhawa GS, Hill KA, Kari L. ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels. BMC Genomics 2019; 20:267. [PMID: 30943897 PMCID: PMC6448311 DOI: 10.1186/s12864-019-5571-y] [Received: 09/04/2018] [Accepted: 02/27/2019]
Abstract
Background Although software tools abound for the comparison, analysis, identification, and classification of genomic sequences, taxonomic classification remains challenging due to the magnitude of the datasets and the intrinsic problems associated with classification. The need exists for an approach and software tool that addresses the limitations of existing alignment-based methods, as well as the challenges of recently proposed alignment-free methods. Results We propose a novel combination of supervised Machine Learning with Digital Signal Processing, resulting in ML-DSP: an alignment-free software tool for ultrafast, accurate, and scalable genome classification at all taxonomic levels. We test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of >97%. A quantitative comparison with state-of-the-art classification software tools is performed on two small benchmark datasets and one large dataset of 4322 vertebrate mtDNA genomes. Our results show that ML-DSP overwhelmingly outperforms the alignment-based software MEGA7 (alignment with MUSCLE or CLUSTALW) in terms of processing time, while having comparable classification accuracies for small datasets and superior accuracies for the large dataset. Compared with the alignment-free software FFP (Feature Frequency Profile), ML-DSP has significantly better classification accuracy and is faster overall. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4710 bacterial genomes into phyla with 95.5% accuracy. Lastly, our analysis shows that the “Purine/Pyrimidine”, “Just-A” and “Real” numerical representations of DNA sequences outperform ten other such numerical representations used in the Digital Signal Processing literature for DNA classification purposes.
Conclusions Due to its superior classification accuracy, speed, and scalability to large datasets, ML-DSP is highly relevant in the classification of newly discovered organisms, in distinguishing genomic signatures and identifying their mechanistic determinants, and in evaluating genome integrity.
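The DSP core of ML-DSP as described above (numerical representation, discrete Fourier transform, and pairwise distances between magnitude spectra) can be sketched in plain Python. Several details are assumptions to check against the paper before relying on them. Sequences are truncated or zero-padded to the median length. The distance is (1 - r)/2, with r the Pearson correlation between magnitude spectra. The numeric values for the three named representations follow common definitions. The supervised-classification stage that would consume these distances is not shown.

```python
import cmath
import math
import statistics

# The three numerical representations named in the ML-DSP abstract. The values are
# common definitions from the GSP literature, stated here as an assumption.
REPRESENTATIONS = {
    "purine_pyrimidine": {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0},
    "just_a": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    "real": {"A": -1.5, "C": 0.5, "G": -0.5, "T": 1.5},
}


def to_signal(seq, rep, length):
    """Map seq to numbers, then truncate or zero-pad it to `length` samples."""
    table = REPRESENTATIONS[rep]
    x = [table.get(b, 0.0) for b in seq.upper()[:length]]
    return x + [0.0] * (length - len(x))


def magnitude_spectrum(x):
    """|DFT| of a real signal (naive O(n^2) transform, fine for a sketch)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)))
            for k in range(n)]


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def pairwise_distances(seqs, rep="purine_pyrimidine"):
    """Distance matrix d = (1 - r) / 2 between magnitude spectra, values in [0, 1]."""
    length = statistics.median_low(len(s) for s in seqs)  # common signal length
    spectra = [magnitude_spectrum(to_signal(s, rep, length)) for s in seqs]
    return [[(1 - pearson(p, q)) / 2 for q in spectra] for p in spectra]
```

Using magnitude spectra makes the comparison insensitive to where a pattern occurs in the sequence, which is one reason no alignment is needed.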
Affiliation(s)
- Gurjit S Randhawa
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
- Kathleen A Hill
  - Department of Biology, University of Western Ontario, London, ON, Canada
- Lila Kari
  - School of Computer Science, University of Waterloo, Waterloo, ON, Canada
10. Mendizabal-Ruiz G, Román-Godínez I, Torres-Ramos S, Salido-Ruiz RA, Vélez-Pérez H, Morales JA. Genomic signal processing for DNA sequence clustering. PeerJ 2018; 6:e4264. [PMID: 29379686 PMCID: PMC5786891 DOI: 10.7717/peerj.4264] [Received: 10/17/2017] [Accepted: 12/24/2017]
Abstract
Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed; they offer the opportunity to apply existing digital signal processing methods to genomic data. One of the most widely used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences based on GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the inspection and analysis of the results and of possible hidden behaviors. Our results support the feasibility of using the proposed method to find and easily visualize interesting features of sets of DNA data.
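Here is a minimal version of the GSP-plus-K-means idea: turn each sequence into a short numeric signal summary, then cluster those vectors with Lloyd's K-means algorithm. The purine/pyrimidine DNA walk and its two summary features are hypothetical choices for illustration, not the representation used in the paper.

```python
import math


def dna_walk(seq):
    """Cumulative purine(+1)/pyrimidine(-1) walk, a simple GSP signal."""
    step = {"A": 1, "G": 1, "C": -1, "T": -1}
    walk, pos = [], 0
    for b in seq.upper():
        pos += step.get(b, 0)
        walk.append(pos)
    return walk


def walk_features(seq):
    """Two length-normalized summaries of the walk: end point and total excursion."""
    w = dna_walk(seq)
    n = len(w)
    return [w[-1] / n, (max(w) - min(w)) / n]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; the first k points seed the centroids (a deterministic,
    arbitrary choice for this sketch; k-means++ seeding is the usual improvement)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In use, call `kmeans([walk_features(s) for s in seqs], k)`. Low-dimensional features like these also make the clusters easy to plot, in the spirit of the visualization the authors describe.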
Affiliation(s)
- Israel Román-Godínez
  - Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
- Sulema Torres-Ramos
  - Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
- Ricardo A Salido-Ruiz
  - Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
- Hugo Vélez-Pérez
  - Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
- J Alejandro Morales
  - Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
11. Mendizabal-Ruiz G, Román-Godínez I, Torres-Ramos S, Salido-Ruiz RA, Morales JA. On DNA numerical representations for genomic similarity computation. PLoS One 2017; 12:e0173288. [PMID: 28323839 PMCID: PMC5360225 DOI: 10.1371/journal.pone.0173288] [Received: 06/27/2016] [Accepted: 02/17/2017]
Abstract
Genomic signal processing (GSP) refers to the use of signal processing for the analysis of genomic data. GSP methods require the transformation or mapping of the genomic data to a numeric representation. To date, several DNA numeric representations (DNRs) have been proposed; however, it is not clear what the properties of each DNR are and how the choice of one will affect the results when a signal processing technique is used to analyze them. In this paper, we present an experimental study of the characteristics of nine of the most frequently used DNRs. The objective of this paper is to evaluate the behavior of each representation when used to measure the similarity of a given pair of DNA sequences.
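The abstract's point that the choice of representation affects similarity can be seen with three common mappings. In this sketch, the numeric values are those usually quoted in the GSP literature, and Euclidean distance is an assumed similarity measure. It is not the paper's experimental protocol.

```python
import math

# Three DNA numerical representations (values as commonly reported; an assumption here).
INTEGER = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}  # electron-ion interaction pseudopotential


def encode_scalar(seq, table):
    """One number per base using a lookup table (assumes only A/C/G/T)."""
    return [table[b] for b in seq.upper()]


def encode_voss(seq):
    """Voss binary indicators: four 0/1 channels (A, C, G, T) concatenated."""
    seq = seq.upper()
    return [1.0 if b == base else 0.0 for base in "ACGT" for b in seq]


def compare(s1, s2):
    """Euclidean distance between two equal-length sequences under each mapping."""
    if len(s1) != len(s2):
        raise ValueError("this sketch compares equal-length sequences only")
    return {
        "integer": math.dist(encode_scalar(s1, INTEGER), encode_scalar(s2, INTEGER)),
        "eiip": math.dist(encode_scalar(s1, EIIP), encode_scalar(s2, EIIP)),
        "voss": math.dist(encode_voss(s1), encode_voss(s2)),
    }


for variant in ("CAAA", "GAAA", "TAAA"):
    print(variant, compare("AAAA", variant))
```

With the integer mapping, an A-to-T substitution counts three times as much as an A-to-C substitution, an artifact of the arbitrary ordering. The Voss indicators give every single-base substitution the same distance, sqrt(2).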
Affiliation(s)
- Gerardo Mendizabal-Ruiz
  - Departamento de Ciencias Computacionales, División de Electrónica y Computación, Universidad de Guadalajara, Guadalajara, Jalisco, México
- Israel Román-Godínez
  - Departamento de Ciencias Computacionales, División de Electrónica y Computación, Universidad de Guadalajara, Guadalajara, Jalisco, México
- Sulema Torres-Ramos
  - Departamento de Ciencias Computacionales, División de Electrónica y Computación, Universidad de Guadalajara, Guadalajara, Jalisco, México
- Ricardo A. Salido-Ruiz
  - Departamento de Ciencias Computacionales, División de Electrónica y Computación, Universidad de Guadalajara, Guadalajara, Jalisco, México
- J. Alejandro Morales
  - Departamento de Ciencias Computacionales, División de Electrónica y Computación, Universidad de Guadalajara, Guadalajara, Jalisco, México
12. Mabrouk MS, Naeem SM, Eldosoky MA. Different genomic signal processing methods for eukaryotic gene prediction: a systematic review. Biomedical Engineering: Applications, Basis and Communications 2017. [DOI: 10.4015/s1016237217300012]
Abstract
The field of bioinformatics has now firmly established itself as a discipline within molecular biology and encompasses a wide variety of branches of knowledge, from structural biology and genomics to gene expression studies. Bioinformatics is the application of computer technology to the management of biological information. Genomic signal processing (GSP) techniques have been applied widely in bioinformatics and will continue to play an essential role in the investigation of biomedical problems. GSP refers to the use of digital signal processing (DSP) methods to analyze genomic data (e.g. DNA sequences). Applications of GSP in bioinformatics, such as the identification of DNA protein-coding regions, the identification of reading frames, and cancer detection, have recently attracted considerable attention. Cancer, known medically as malignant neoplasm, is one of the most dangerous diseases the world faces and has raised the death rate in recent years, so detecting it at an early stage is a promising way to assess and treat this risk. GSP can be used to detect cancerous cells, which are often caused by genetic abnormalities. This systematic review discusses some applications of GSP in bioinformatics in general. In particular, it presents the GSP techniques used for cancer detection, collecting recent results and summarizing what has been achieved so far in this emerging research subject.
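One classic GSP application the review mentions, identifying protein-coding regions, relies on the period-3 property: coding DNA tends to show a spectral peak at frequency N/3. The sketch below scores a whole fragment using Voss indicator sequences and a naive DFT. Real gene finders slide a window along the sequence, and the ratio-to-mean score used here is an assumed, simplified measure.

```python
import cmath
import math


def voss_power(seq, k):
    """S[k] = sum over A, C, G, T of |U_b[k]|^2, using Voss binary indicator sequences."""
    seq = seq.upper()
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * math.pi * k * t / n)
                    for t, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total


def period3_ratio(seq):
    """S[N/3] divided by the mean of S[k] over k = 1..N-1 (N must be a multiple of 3).

    Protein-coding stretches tend to show a peak at N/3 (codon periodicity),
    so a large ratio is a hint, not proof, that the stretch is coding.
    """
    n = len(seq)
    if n < 3 or n % 3:
        raise ValueError("length must be a positive multiple of 3")
    background = sum(voss_power(seq, k) for k in range(1, n)) / (n - 1)
    if background < 1e-12:
        return 0.0  # no non-DC power (e.g. a homopolymer): no periodicity to report
    return voss_power(seq, n // 3) / background
```

A perfectly periodic codon repeat concentrates all non-DC power at N/3 and 2N/3, which yields the maximal ratio. Natural coding sequences show a weaker but still detectable peak.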
Affiliation(s)
- Mai S. Mabrouk
  - Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology (MUST University), Cairo, Egypt
- Safaa M. Naeem
  - Biomedical Engineering Department, Faculty of Engineering, Helwan University, Cairo, Egypt
- Mohamed A. Eldosoky
  - Biomedical Engineering Department, Faculty of Engineering, Helwan University, Cairo, Egypt
13. Borrayo E, Machida-Hirano R, Takeya M, Kawase M, Watanabe K. Principal components analysis--K-means transposon element based foxtail millet core collection selection method. BMC Genet 2016; 17:42. [PMID: 26880119 PMCID: PMC4754896 DOI: 10.1186/s12863-016-0343-z] [Received: 04/19/2015] [Accepted: 02/01/2016]
Abstract
Background Core collections are important tools in genetic resources research and administration. At present, most core collection selection criteria are based on one of the following item characteristics: passport data, genetic markers, or morphological traits, which may lead to inadequate representation of the variability in the complete collection. The development of a comprehensive methodology that includes as much element data as possible has been poorly explored. Using a collection of foxtail millet (Setaria italica subsp. italica (L.) P. Beauv.) as a model, we developed a method for core collection construction based on genotype data and numerical representations of agromorphological traits, thereby improving the selection process. Results Principal component analysis allows the selection of the most informative discriminators among the various elements evaluated, regardless of whether they are genetic or morphological, thereby providing an adequate criterion for subsequent K-means clustering. The core collections of S. italica constructed using only genotype data showed better overall validation scores than the other core collections that we generated. However, the core collection based on both genotype and agromorphological characteristics represented the overall diversity adequately. Conclusions The inclusion of both genotype and agromorphological characteristics as a comprehensive dataset in this methodology ensures that agricultural traits are considered in core collection construction. This approach will be beneficial for genetic resources management and research activities for S. italica as well as other genetic resources. Electronic supplementary material: The online version of this article (doi:10.1186/s12863-016-0343-z) contains supplementary material, which is available to authorized users.
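The PCA-then-K-means pipeline can be sketched in four steps. Standardize the traits, project each accession onto the first principal component (found by power iteration), cluster the scores, and keep the accession nearest each cluster center as a core entry. This is a simplified, hypothetical illustration. The paper uses PCA to select the most informative discriminators, whereas this sketch only projects onto a single component. The trait matrix in the test is made up.

```python
import math


def standardize(rows):
    """Z-score each column (trait); constant columns are only centered."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def first_pc(x, iters=200):
    """Leading eigenvector of X^T X by power iteration (X already standardized)."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(xtx[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w)) or 1.0
        v = [a / norm for a in w]
    return v


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars, seeded with evenly spaced order statistics."""
    ordered = sorted(values)
    centroids = [ordered[round(i * (len(ordered) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new == labels:
            break  # converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def core_collection(rows, k):
    """Indices of one accession per cluster: the one closest to its cluster center."""
    x = standardize(rows)
    pc = first_pc(x)
    scores = [sum(a * b for a, b in zip(r, pc)) for r in x]
    labels, centroids = kmeans_1d(scores, k)
    core = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            core.append(min(members, key=lambda i: abs(scores[i] - centroids[c])))
    return sorted(core)
```

Standardizing first matters because genotype scores and agromorphological measurements live on very different scales. Without it, the first component would simply track whichever trait has the largest units.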
Affiliation(s)
- Ernesto Borrayo
  - Gene Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba City, 305-8571, Ibaraki, Japan
  - Genetic Resources Center, National Institute of Agrobiological Sciences, 2-1-2 Kannodai, Tsukuba City, 305-8602, Ibaraki, Japan
- Ryoko Machida-Hirano
  - Gene Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba City, 305-8571, Ibaraki, Japan
- Masaru Takeya
  - Genetic Resources Center, National Institute of Agrobiological Sciences, 2-1-2 Kannodai, Tsukuba City, 305-8602, Ibaraki, Japan
- Makoto Kawase
  - Gene Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba City, 305-8571, Ibaraki, Japan
- Kazuo Watanabe
  - Gene Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba City, 305-8571, Ibaraki, Japan