1
Yang W, Ji J, Fang G. A metric and its derived protein network for evaluation of ortholog database inconsistency. BMC Bioinformatics 2025; 26:6. [PMID: 39773281 PMCID: PMC11707888 DOI: 10.1186/s12859-024-06023-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 12/24/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND Ortholog prediction, essential for various genomic research areas, faces growing inconsistencies amid the expanding array of ortholog databases. The common strategy of computing consensus orthologs introduces additional arbitrariness, emphasizing the need to examine the causes of such inconsistencies and identify proteins susceptible to prediction errors. RESULTS We introduce the Signal Jaccard Index (SJI), a novel metric rooted in unsupervised genome context clustering, designed to assess protein similarity. Leveraging SJI, we construct a protein network and reveal that peripheral proteins within the network are the primary contributors to inconsistencies in orthology predictions. Furthermore, we show that a protein's degree centrality in the network serves as a strong predictor of its reliability in consensus sets. CONCLUSIONS We present an objective, unsupervised SJI-based network encompassing all proteins, whose topological features elucidate ortholog prediction inconsistencies. The degree centrality (DC) effectively identifies error-prone orthology assignments without relying on arbitrary parameters. Notably, DC is stable, unaffected by species selection, and well-suited for ortholog benchmarking. This approach transcends the limitations of universal thresholds, offering a robust and quantitative framework to explore protein evolution and functional relationships.
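To make the two ingredients named in this abstract concrete, the following is a minimal sketch of a similarity network scored by degree centrality. It uses the plain Jaccard index on sets, not the authors' Signal Jaccard Index, whose construction is not given in the abstract. The protein names, the cluster labels, and the `min_similarity` cutoff are all hypothetical choices made for illustration only; the abstract states that the published method avoids arbitrary parameters, so the cutoff here is a simplification of this sketch.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Plain Jaccard index: shared items divided by all items; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def degree_centrality(features: dict[str, set], min_similarity: float) -> dict[str, float]:
    """Link two proteins when their Jaccard index reaches min_similarity
    (an assumption of this sketch), then return degree / (n - 1) per protein."""
    degree = {name: 0 for name in features}
    for p, q in combinations(features, 2):
        if jaccard(features[p], features[q]) >= min_similarity:
            degree[p] += 1
            degree[q] += 1
    n = len(features)
    return {name: (d / (n - 1) if n > 1 else 0.0) for name, d in degree.items()}


# Hypothetical toy input: each protein maps to made-up context-cluster labels.
toy = {
    "protA": {"c1", "c2", "c3"},
    "protB": {"c1", "c2", "c4"},
    "protC": {"c2", "c3", "c4"},
    "protD": {"c9"},
}
centrality = degree_centrality(toy, min_similarity=0.5)
for name, value in sorted(centrality.items()):
    print(name, round(value, 3))
```

In this toy input, `protD` shares no labels with the other proteins, so it ends up isolated with a centrality of zero. That is the kind of peripheral node the abstract associates with inconsistent orthology assignments.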
Affiliation(s)
- Weijie Yang
- NYU-Shanghai, Shanghai, 200120, China
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
- Jingsi Ji
- NYU-Shanghai, Shanghai, 200120, China
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
- Gang Fang
- NYU-Shanghai, Shanghai, 200120, China
- Department of Biology, New York University, New York, NY, 10003, USA
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
2
Aufiero G, Fruggiero C, D’Angelo D, D’Agostino N. Homoeologs in Allopolyploids: Navigating Redundancy as Both an Evolutionary Opportunity and a Technical Challenge-A Transcriptomics Perspective. Genes (Basel) 2024; 15:977. [PMID: 39202338 PMCID: PMC11353593 DOI: 10.3390/genes15080977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/22/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024] Open
Abstract
Allopolyploidy in plants involves the merging of two or more distinct parental genomes into a single nucleus, a significant evolutionary process in the plant kingdom. Transcriptomic analysis provides invaluable insights into allopolyploid plants by elucidating the fate of duplicated genes, revealing evolutionary novelties and uncovering their environmental adaptations. By examining gene expression profiles, scientists can discern how duplicated genes have evolved to acquire new functions or regulatory roles. This process often leads to the development of novel traits and adaptive strategies that allopolyploid plants leverage to thrive in diverse ecological niches. Understanding these molecular mechanisms not only enhances our appreciation of the genetic complexity underlying allopolyploidy but also underscores their importance in agriculture and ecosystem resilience. However, transcriptome profiling is challenging due to genomic redundancy, which is further complicated by the presence of multiple chromosome sets and the variations among homoeologs and allelic genes. Prior to transcriptome analysis, sub-genome phasing and homoeology inference are essential for obtaining a comprehensive view of gene expression. This review aims to clarify the terminology in this field, identify the most challenging aspects of transcriptome analysis, explain their inherent difficulties, and suggest reliable analytic strategies. Furthermore, bulk RNA-seq is highlighted as a primary method for studying allopolyploid gene expression, focusing on critical steps like read mapping and normalization in differential gene expression analysis. This approach effectively captures gene expression from both parental genomes, facilitating a comprehensive analysis of their combined profiles. Its sensitivity in detecting low-abundance transcripts allows subtle differences between parental genomes to be identified, which is crucial for understanding regulatory dynamics and gene expression balance in allopolyploids.
Affiliation(s)
- Nunzio D’Agostino
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy; (G.A.); (C.F.); (D.D.)
3
Reeve J, Li Q, Lindtke D, Yeaman S. Comparing genome scans among species of the stickleback order reveals three different patterns of genetic diversity. Ecol Evol 2022; 12:e8502. [PMID: 35127027 PMCID: PMC8796908 DOI: 10.1002/ece3.8502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Revised: 12/09/2021] [Accepted: 12/14/2021] [Indexed: 12/03/2022] Open
Abstract
Comparing genome scans among species is a powerful approach for investigating the patterns left by evolutionary processes. In particular, this offers a way to detect candidate genes that drive convergent evolution. We compared genome scan results to investigate whether patterns of genetic diversity and divergence are shared among divergent species within the stickleback order (Gasterosteiformes): the threespine stickleback (Gasterosteus aculeatus), ninespine stickleback (Pungitius pungitius), and tubesnout (Aulorhynchus flavidus). Populations were sampled from the southern and northern edges of each species' range, to identify patterns associated with latitudinal changes in genetic diversity. Weak correlations in genetic diversity (FST and expected heterozygosity) and three different patterns in the genomic landscape were found among these species. Additionally, no candidate genes for convergent evolution were detected. This is a counterexample to the growing number of studies that have shown overlapping genetic patterns, demonstrating that genome scan comparisons can be noisy due to the effects of several interacting evolutionary forces.
Affiliation(s)
- James Reeve
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Present address: Tjärnö Marina Laboratorium, Göteborgs Universitet, Strömstad, Sweden
- Qiushi Li
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Present address: Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Dorothea Lindtke
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Present address: Institute of Plant Sciences, University of Bern, Bern, Switzerland
- Samuel Yeaman
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
4
Kronenfeld JP, Ryon EL, Goldberg D, Lee RM, Yopp A, Wang A, Lee AY, Luu S, Hsu C, Silberfein E, Russell MC, Merchant NB, Goel N. Survival inequity in vulnerable populations with early-stage hepatocellular carcinoma: a United States safety-net collaborative analysis. HPB (Oxford) 2021; 23:868-876. [PMID: 33487553 PMCID: PMC8205960 DOI: 10.1016/j.hpb.2020.11.1150] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/31/2020] [Revised: 11/12/2020] [Accepted: 11/24/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Access to health insurance and curative interventions [surgery/liver-directed-therapy (LDT)] affects survival for early-stage hepatocellular carcinoma (HCC). The aim of this multi-institutional study of high-volume safety-net hospitals (SNHs) and their tertiary-academic-centers (AC) was to identify the impact of type/lack of insurance on survival disparities across hospitals, particularly SNHs whose mission is to minimize insurance-related access-to-care barriers for vulnerable populations. METHODS Early-stage HCC patients (2012-2014) from the US Safety-Net Collaborative were propensity-score matched by treatment at SNH/AC. Overall survival (OS) was the primary outcome. Multivariable Cox proportional-hazard analysis was performed accounting for sociodemographic/clinical parameters. RESULTS Among 925 patients, those with no insurance (NI) had lower rates of curative surgery compared to those with government insurance (GI) and private insurance [PI, (PI-SNH:60.5% vs. GI-SNH:33.1% vs. NI-SNH:13.6%, p < 0.001)], and decreased median OS (PI-SNH:32.1 vs. GI-SNH:22.8 vs. NI-SNH:9.4 months, p = 0.002). On multivariable regression controlling for sociodemographic/clinical parameters, NI-SNH (HR:2.5, 95% CI:1.3-4.9, p = 0.007) was the only insurance type/hospital system combination with significantly worse OS. CONCLUSION NI-SNH patients received less curative treatment than other insurance/hospital types, suggesting that treatment barriers, beyond access-to-care, need to be identified and addressed to achieve survival equity in early-stage HCC for vulnerable populations (NI-SNH).
Affiliation(s)
- Joshua P Kronenfeld
- Division of Surgical Oncology, Department of Surgery, University of Miami Miller School of Medicine, 1120 NW 14th Street, Suite 410, Miami, FL 33136, USA
- Emily L Ryon
- Division of Surgical Oncology, Department of Surgery, University of Miami Miller School of Medicine, 1120 NW 14th Street, Suite 410, Miami, FL 33136, USA
- David Goldberg
- Division of Digestive Health and Liver Disease, Department of Medicine, University of Miami Miller School of Medicine, 1475 NW 12th Ave, Miami, FL 33136, USA
- Rachel M Lee
- Winship Cancer Institute, Division of Surgical Oncology, Department of Surgery, Emory University, 1365-C Clifton Road NE Atlanta, 30322, Georgia
- Adam Yopp
- Division of Surgical Oncology, Department of Surgery, University of Texas Southwestern Medical School, 2201 Inwood Rd 3rd Floor Suite 500, Dallas, TX 75390, USA
- Annie Wang
- Division of Surgical Oncology, Department of Surgery, NYU Langone Health, 160 East 34th Street, 3rd Floor, New York, NY, 10016, USA
- Ann Y Lee
- Division of Surgical Oncology, Department of Surgery, NYU Langone Health, 160 East 34th Street, 3rd Floor, New York, NY, 10016, USA
- Sommer Luu
- Division of Surgical Oncology, Department of Surgery, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Cary Hsu
- Division of Surgical Oncology, Department of Surgery, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Eric Silberfein
- Division of Surgical Oncology, Department of Surgery, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Maria C Russell
- Winship Cancer Institute, Division of Surgical Oncology, Department of Surgery, Emory University, 1365-C Clifton Road NE Atlanta, 30322, Georgia
- Nipun B Merchant
- Division of Surgical Oncology, Department of Surgery, University of Miami Miller School of Medicine, 1120 NW 14th Street, Suite 410, Miami, FL 33136, USA
- Neha Goel
- Division of Surgical Oncology, Department of Surgery, University of Miami Miller School of Medicine, 1120 NW 14th Street, Suite 410, Miami, FL 33136, USA
5
Glover N, Sheppard S, Dessimoz C. Homoeolog Inference Methods Requiring Bidirectional Best Hits or Synteny Miss Many Pairs. Genome Biol Evol 2021; 13:6237894. [PMID: 33871639 PMCID: PMC8214411 DOI: 10.1093/gbe/evab077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 04/12/2021] [Indexed: 12/22/2022] Open
Abstract
Homoeologs are pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization. Bioinformatic methods for accurate homoeology inference are crucial for studying the evolutionary consequences of polyploidization, and homoeology is typically inferred on the basis of bidirectional best hit (BBH) and/or positional conservation (synteny). However, these methods neglect the fact that genes can duplicate and move, both prior to and after the allopolyploidization event. These duplications and movements can result in many-to-many and/or nonsyntenic homoeologs, which thus remain undetected and unstudied. Here, using the allotetraploid upland cotton (Gossypium hirsutum) as a case study, we show that conventional approaches indeed miss a substantial proportion of homoeologs. Additionally, we found that many of the missed pairs of homoeologs are broadly and highly expressed. A gene ontology analysis revealed that a high proportion of the nonsyntenic and non-BBH homoeologs are involved in protein translation and are likely to contribute to the functional repertoire of cotton. Thus, from an evolutionary and functional genomics standpoint, choosing a homoeolog inference method which does not solely rely on 1:1 relationship cardinality or synteny is crucial for not missing these potentially important homoeolog pairs.
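The limitation described in this abstract can be illustrated with a minimal sketch of the bidirectional best hit criterion. It is not the authors' pipeline: the gene identifiers, the similarity scores, and the function names `best_hits` and `bidirectional_best_hits` are hypothetical, and real analyses would take scores from a sequence aligner rather than a hand-written dictionary.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_d: dict[tuple[str, str], float],
    d_vs_a: dict[tuple[str, str], float],
) -> set[tuple[str, str]]:
    """Keep a pair only when each gene is the other's best hit."""
    forward = best_hits(a_vs_d)
    reverse = best_hits(d_vs_a)
    return {(a, d) for a, d in forward.items() if reverse.get(d) == a}


# Hypothetical scores: gene A1 in one subgenome, and D1 plus its duplicate
# D1dup in the other subgenome. All values are made up for illustration.
a_vs_d = {("A1", "D1"): 950.0, ("A1", "D1dup"): 900.0}
d_vs_a = {("D1", "A1"): 950.0, ("D1dup", "A1"): 900.0}

pairs = bidirectional_best_hits(a_vs_d, d_vs_a)
print(pairs)
```

Because `best_hits` keeps one subject per query, the result is restricted to 1:1 pairs. In this toy input the pair (`A1`, `D1dup`) is dropped even though `D1dup` has `A1` as its best hit, which is the one-to-many case the abstract says BBH-based methods miss.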
Affiliation(s)
- Natasha Glover
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Switzerland
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Switzerland; Department of Genetics, Evolution, and Environment, University College London, United Kingdom; Department of Computer Science, University College London, United Kingdom
6
Abstract
Diagnostic processes typically rely on traditional and laborious methods that are prone to human error, resulting in frequent misdiagnosis of diseases. Computational approaches are being increasingly used for more precise diagnosis in clinical pathology, diagnosis of genetic and microbial diseases, and analysis of clinical chemistry data. These approaches are progressively used for improving the reliability of testing, resulting in reduced diagnostic errors. Artificial intelligence (AI)-based computational approaches mostly rely on training sets obtained from patient data stored in clinical databases. However, the use of AI is associated with several ethical issues, including patient privacy and data ownership. The capacity of AI-based mathematical models to interpret complex clinical data frequently leads to data bias and reporting of erroneous results based on patient data. In order to improve the reliability of computational approaches in clinical diagnostics, strategies for reducing data bias and analyzing real-life patient data need to be further refined.
Affiliation(s)
- Mohammed A Alaidarous
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, Majmaah University, Majmaah, Kingdom of Saudi Arabia