1
Sick individuals, sick populations revisited: a test of the Rose hypothesis for type 2 diabetes disparities. BMJ Public Health 2023; 1:e000655. [PMID: 38239263 PMCID: PMC10795613 DOI: 10.1136/bmjph-2023-000655]
Abstract
Introduction: The Rose hypothesis predicts that, since genetic variation is greater within than between populations, genetic risk factors will be associated with individuals' risk of disease but not with population disparities; and, since socioenvironmental variation is greater between than within populations, socioenvironmental risk factors will be associated with population disparities but not with individuals' disease risk. Methods: We used the UK Biobank to test the Rose hypothesis for type 2 diabetes (T2D) ethnic disparities in the UK. Our cohort consists of 26 912 participants from Asian, black and white ethnic groups. Participants were characterised as T2D cases or controls based on the presence or absence of T2D diagnosis codes in electronic health records. T2D genetic risk was measured using a polygenic risk score (PRS), and socioeconomic deprivation was measured with the Townsend Index (TI). The variation of genetic (PRS) and socioeconomic (TI) risk factors within and between ethnic groups was calculated using analysis of variance. Multivariable logistic regression was used to associate PRS and TI with T2D cases, and mediation analysis was used to analyse the effect of PRS and TI on T2D ethnic group disparities. Results: T2D prevalence differs for the Asian (23.34%; OR=5.14, CI=4.68 to 5.65), black (16.64%; OR=3.81, CI=3.44 to 4.22) and white (7.35%; reference) ethnic groups in the UK. Both genetic and socioenvironmental T2D risk factors show greater within-group (w) than between-group (b) variation: PRS w=64.60%, b=35.40%; TI w=71.18%, b=28.19%. Nevertheless, both genetic risk (PRS OR=1.96, CI=1.87 to 2.07) and socioeconomic deprivation (TI OR=1.09, CI=1.08 to 1.10) are associated with individual T2D risk and mediate T2D ethnic disparities (Asian PRS=22.5%, TI=9.8%; black PRS=32.0%, TI=25.3%). Conclusion: A relative excess of within-group versus between-group variation does not preclude T2D risk factors from contributing to T2D ethnic disparities.
Our results support an integrative approach to health disparities research that includes both genetic and socioenvironmental risk factors.
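The within/between apportionment described above corresponds to a one-way analysis of variance: the total sum of squares splits exactly into a within-group part and a between-group part. A minimal sketch, assuming each ethnic group's risk-factor values (PRS or TI) are available as numeric arrays; the function name and the toy values are hypothetical and not taken from the study:

```python
import numpy as np

def apportion_variance(groups):
    """Split the total sum of squares into within- and between-group fractions.

    groups: list of 1-D arrays, one per group (e.g. PRS or TI values).
    Returns (within_fraction, between_fraction), which sum to 1.
    """
    values = np.concatenate(groups)
    grand_mean = values.mean()
    # Total sum of squares around the grand mean
    ss_total = ((values - grand_mean) ** 2).sum()
    # Between-group SS: group size times squared deviation of the group mean
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: deviations of each value from its own group mean
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    return ss_within / ss_total, ss_between / ss_total

within, between = apportion_variance([np.array([1.0, 2.0, 3.0]),
                                      np.array([4.0, 5.0, 6.0])])
print(f"within={within:.2%}, between={between:.2%}")  # within=22.86%, between=77.14%
```

In this toy case the group means differ a lot relative to the spread inside each group, so most of the variance is between groups; the study reports the opposite pattern for PRS and TI.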
2
Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. BMC Global and Public Health 2023; 1:22. [PMID: 38045036 PMCID: PMC10693462 DOI: 10.1186/s44263-023-00025-2] [Received: 05/24/2023] [Accepted: 09/28/2023]
Abstract
Background: Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the USA and has a greater observed prevalence among those who identify as Black or Hispanic. Methods: This study aimed to assess T2D racial and ethnic disparities using All of Us Research Program data and to measure associations between genetic ancestry (GA), socioeconomic deprivation, and T2D. We used the All of Us Researcher Workbench to analyze T2D prevalence and to model its associations with GA and with individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices among participant self-identified race and ethnicity (SIRE) groups. Results: The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n = 2,311), Black (n = 16,282), Hispanic (n = 16,966), and White (n = 50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest levels of socioeconomic deprivation, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation (both iSDI and zSDI) are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with iSDI and zSDI on T2D. Higher levels of iSDI and zSDI are negatively associated with T2D in the Black and Hispanic groups and at high levels of African and Native American ancestry. Conclusions: Socioeconomic deprivation is associated with a higher prevalence of T2D in the Black and Hispanic minority groups, compared to the majority White group.
Nonetheless, socioeconomic deprivation is associated with reduced T2D risk within the Black and Hispanic groups. These paradoxical results have not been reported elsewhere; possible explanations relate to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
3
Population Pharmacogenomics for Health Equity. Genes (Basel) 2023; 14:1840. [PMID: 37895188 PMCID: PMC10606908 DOI: 10.3390/genes14101840] [Received: 07/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023]
Abstract
Health equity means the opportunity for all people and populations to attain optimal health, and it requires intentional efforts to promote fairness in patient treatments and outcomes. Pharmacogenomic variants are genetic differences associated with how patients respond to medications, and their presence can inform treatment decisions. In this perspective, we contend that the study of pharmacogenomic variation within and between human populations, known as population pharmacogenomics, can and should be leveraged in support of health equity. The key observation in support of this contention is that racial and ethnic groups exhibit pronounced differences in the frequencies of numerous pharmacogenomic variants, with direct implications for clinical practice. The use of race and ethnicity to stratify pharmacogenomic risk provides a means to avoid potential harm caused by biases introduced when treatment regimens do not consider genetic differences between population groups, particularly when majority group genetic profiles are assumed to hold for minority groups. We focus on the mitigation of adverse drug reactions as an area where population pharmacogenomics can have a direct and immediate impact on public health.
4
Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. Research Square 2023:rs.3.rs-2976764. [PMID: 37790565 PMCID: PMC10543018 DOI: 10.21203/rs.3.rs-2976764/v1]
Abstract
Background: Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the United States and has a greater observed prevalence among those who identify as Black or Hispanic. Methods: The aims of this study were to determine whether T2D racial and ethnic disparities can be observed in data from the All of Us Research Program and to measure associations of genetic ancestry (GA) and socioeconomic deprivation with T2D. The All of Us Researcher Workbench was used to calculate T2D prevalence and to model T2D associations with GA and with individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices within and between participant self-identified race and ethnicity (SIRE) groups. Results: The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n=2,311), Black (n=16,282), Hispanic (n=16,966), and White (n=50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest median SDI values, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with SDI on T2D. Higher levels of SDI are negatively associated with T2D in the Black and Hispanic groups and at high levels of African and Native American ancestry.
Conclusion: Socioeconomic deprivation is positively associated with the SIRE group T2D disparities observed here but negatively associated with T2D within the Black and Hispanic groups that show the highest T2D prevalence. These results are paradoxical and have not been reported elsewhere. We discuss possible explanations for this paradox related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
5
Race, Ethnicity, and Pharmacogenomic Variation in the United States and the United Kingdom. Pharmaceutics 2023; 15:1923. [PMID: 37514109 PMCID: PMC10383154 DOI: 10.3390/pharmaceutics15071923] [Received: 06/12/2023] [Revised: 06/30/2023] [Accepted: 07/05/2023]
Abstract
The relevance of race and ethnicity to genetics and medicine has long been a matter of debate. An emerging consensus holds that race and ethnicity are social constructs and thus poor proxies for genetic diversity. The goal of this study was to evaluate the relationship between race, ethnicity, and clinically relevant pharmacogenomic variation in cosmopolitan populations. We studied racially and ethnically diverse cohorts of 65,120 participants from the United States All of Us Research Program (All of Us) and 31,396 participants from the United Kingdom Biobank (UKB). Genome-wide patterns of pharmacogenomic variation (6311 drug response-associated variants for All of Us and 5966 variants for UKB) were analyzed with machine learning classifiers to predict participants' self-identified race and ethnicity. Pharmacogenomic variation predicts race/ethnicity with average accuracies of 92.1% for All of Us and 94.3% for UKB. Group-specific prediction accuracies range from 99.0% for the White group in UKB to 92.9% for the Hispanic group in All of Us. Prediction accuracies are substantially lower for individuals who identified with more than one group in All of Us (16.7%) or as Mixed in UKB (70.7%). There are numerous individual pharmacogenomic variants with large allele frequency differences between race/ethnicity groups in both cohorts. Frequency differences for toxicity-associated variants predict hundreds of adverse drug reactions per 1000 treated participants for minority groups in All of Us. Our results indicate that race and ethnicity can be used to stratify pharmacogenomic risk in the US and UK populations and should not be discounted when making treatment decisions.
We resolve the contradiction between the results reported here and the orthodoxy of race and ethnicity as non-genetic, social constructs by emphasizing the distinction between global and local patterns of human genetic diversity, and we stress the current and future limitations of race and ethnicity as proxies for pharmacogenomic variation.
6
The landscape of health disparities in the UK Biobank. Database (Oxford) 2023; 2023:7143539. [PMID: 37114803 PMCID: PMC10132819 DOI: 10.1093/database/baad026] [Received: 09/26/2022] [Revised: 03/01/2023] [Accepted: 04/05/2023]
Abstract
The UK Biobank (UKB), a large-scale biomedical database that includes demographic and electronic health record data for more than half a million ethnically diverse participants, is a potentially valuable resource for the study of health disparities. However, publicly accessible databases that catalog health disparities in the UKB do not exist. We developed the UKB Health Disparities Browser with the aims of (i) facilitating the exploration of the landscape of health disparities in the UK and (ii) directing attention to areas of disparities research that might have the greatest public health impact. Health disparities were characterized for UKB participant groups defined by age, country of residence, ethnic group, sex and socioeconomic deprivation. We defined disease cohorts for UKB participants by mapping participant International Classification of Diseases, Tenth Revision (ICD-10) diagnosis codes to phenotype codes (phecodes). For each of the population attributes used to define population groups, disease percent prevalence values were computed for all groups from phecode case-control cohorts, and the magnitude of the disparities was calculated as both the difference and the ratio of the range of disease prevalence values among groups, in order to identify high- and low-prevalence disparities. We identified numerous diseases and health conditions with disparate prevalence values across population attributes, and we deployed an interactive web browser to visualize the results of our analysis: https://ukbatlas.health-disparities.org. The interactive browser includes overall and group-specific prevalence data for 1513 diseases based on a cohort of >500 000 participants from the UKB. Researchers can browse and sort by disease prevalence and prevalence differences to visualize health disparities for each of the five population attributes, and users can search for diseases of interest by disease names or codes. Database URL: https://ukbatlas.health-disparities.org/.
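The prevalence and disparity-magnitude calculation described above can be sketched in a few lines. This assumes, as our reading of the abstract rather than the browser's actual code, that the "difference" is the highest minus the lowest group prevalence and the "ratio" is the highest divided by the lowest; the input dictionaries and group labels are hypothetical:

```python
def prevalence_disparity(case_counts, group_sizes):
    """Percent prevalence per group plus range-based disparity magnitudes.

    case_counts, group_sizes: dicts keyed by group label (hypothetical inputs).
    Returns (prevalence_by_group, difference, ratio), where difference is
    max - min prevalence and ratio is max / min prevalence (an assumption).
    """
    prevalence = {g: 100.0 * case_counts[g] / group_sizes[g] for g in group_sizes}
    high, low = max(prevalence.values()), min(prevalence.values())
    # Guard against a zero-prevalence group, which would make the ratio undefined
    ratio = high / low if low > 0 else float("inf")
    return prevalence, high - low, ratio

prev, diff, ratio = prevalence_disparity({"A": 30, "B": 10}, {"A": 200, "B": 100})
print(prev, diff, ratio)
```

Reporting both measures is useful because a small absolute difference between two rare conditions can still be a large relative ratio, and vice versa.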
7
Rye: genetic ancestry inference at biobank scale. Nucleic Acids Res 2023; 51:e44. [PMID: 36928108 PMCID: PMC10164567 DOI: 10.1093/nar/gkad149] [Received: 04/15/2022] [Revised: 02/08/2023] [Accepted: 02/21/2023]
Abstract
Biobank projects are generating genomic data for many thousands of individuals. Computational methods are needed to handle these massive data sets, including genetic ancestry (GA) inference tools. Current methods for GA inference do not scale to biobank-size genomic datasets. We present Rye, a new algorithm for GA inference at biobank scale. We compared the accuracy and runtime performance of Rye to the widely used RFMix, ADMIXTURE and iAdmix programs and applied it to a dataset of 488,221 genome-wide variant samples from the UK Biobank. Rye infers GA based on principal component analysis of genomic variant samples from ancestral reference populations and query individuals. The algorithm's accuracy is powered by Metropolis-Hastings optimization, and its speed is provided by non-negative least squares regression. Rye produces highly accurate GA estimates for three-way admixed populations (African, European and Native American) compared to RFMix and ADMIXTURE (R² = 0.998 to 1.00), and shows a 50× runtime improvement compared to ADMIXTURE on the UK Biobank dataset. Rye analysis of UK Biobank samples demonstrates how it can be used to infer GA at both continental and subcontinental levels. We discuss user considerations and options for the use of Rye; the program and its documentation are distributed on the GitHub repository: https://github.com/healthdisparities/rye.
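The non-negative least squares step mentioned above can be illustrated in isolation: express a query individual's principal component (PC) coordinates as a non-negative mixture of reference population PC centroids. This is a minimal sketch under stated assumptions, not Rye's implementation: the array layout, the use of population mean PCs, and the rescaling to sum to 1 are our choices, and the Metropolis-Hastings optimization Rye uses is omitted entirely. The toy numbers are made up:

```python
import numpy as np
from scipy.optimize import nnls

def ancestry_fractions(ref_means, query_pcs):
    """Estimate ancestry fractions for one individual with NNLS.

    ref_means: (n_pcs, n_populations) array; column j holds the mean PC
               coordinates of reference population j (assumed input layout).
    query_pcs: (n_pcs,) PC coordinates of the query individual.
    Returns non-negative weights rescaled to sum to 1.
    """
    weights, _residual_norm = nnls(ref_means, query_pcs)
    total = weights.sum()
    return weights / total if total > 0 else weights

# Toy example: 4 PCs, 3 reference populations (values are hypothetical)
ref = np.array([[1.0, -1.0, 0.0],
                [0.5, 0.5, -1.0],
                [0.2, -0.3, 0.4],
                [0.0, 0.1, 0.3]])
# A query built as a 60/40 mix of the first two populations
query = 0.6 * ref[:, 0] + 0.4 * ref[:, 1]
print(np.round(ancestry_fractions(ref, query), 3))
```

Because each individual is an independent small NNLS problem, this kind of step parallelizes trivially across hundreds of thousands of samples, which is consistent with the abstract's attribution of Rye's speed to NNLS.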
8
Ethnic disparities in mortality and group-specific risk factors in the UK Biobank. PLOS Global Public Health 2023; 3:e0001560. [PMID: 36963080 PMCID: PMC10021328 DOI: 10.1371/journal.pgph.0001560] [Received: 08/23/2022] [Accepted: 01/09/2023]
Abstract
Despite a substantial overall decrease in mortality, disparities among ethnic minorities in developed countries persist. This study investigated mortality disparities and their associated risk factors for the three largest ethnic groups in the United Kingdom: Asian, Black, and White. Study participants were sampled from the UK Biobank (UKB), a prospective cohort enrolled between 2006 and 2010. Genetic, biological sample, health information, and outcome data for UKB participants were downloaded, and data fields were prioritized based on participants with death registry records. The Kaplan-Meier method was used to evaluate survival differences among ethnic groups; survival random forest feature selection followed by Cox proportional hazards modeling was used to identify and estimate the effects of shared and ethnic group-specific mortality risk factors. The White ethnic group showed significantly worse survival probability than the Asian and Black groups. In all three ethnic groups, endoscopy and colonoscopy procedures showed significant protective effects on overall mortality. Asian and Black women showed a lower relative risk of mortality than men, whereas no significant effect of sex was seen for the White group. The strongest ethnic group-specific mortality associations were ischemic heart disease for Asians, COVID-19 for Blacks, and cancers of respiratory/intrathoracic organs for Whites. Mental health-related diagnoses, including substance abuse, anxiety, and depression, were a major risk factor for overall mortality in the Asian group. The effect of mental health on Asian mortality, particularly for digestive cancers, was exacerbated by an observed hesitance to answer mental health questions, possibly related to cultural stigma.
C-reactive protein (CRP) serum levels were associated with both overall and cause-specific mortality due to COVID-19 and digestive cancers in the Black group, where elevated CRP has previously been linked to psychosocial stress due to discrimination. Our results point to mortality risk factors that are group-specific and modifiable, supporting targeted interventions towards greater health equity.
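The Kaplan-Meier method named above estimates the survival curve as a running product of conditional survival probabilities, one factor per observed death time. A minimal pure-Python sketch; the toy follow-up times are hypothetical, and the study's random forest and Cox modeling steps are not shown. Participants censored at the same time as a death are counted as still at risk at that time, which is the usual convention:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns a list of (event_time, survival_probability) steps.
    """
    survival = 1.0
    curve = []
    # Survival only changes at times where at least one death is observed
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Five hypothetical participants: deaths at t=1, 2, 3; censored at t=2 and t=4
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Comparing such curves across ethnic groups (typically with a log-rank test) is how survival differences like those reported above are usually assessed.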
9
Abstract
This study assesses racial and ethnic differences in the overall burden of firearm-related mortality and in the change in firearm-related mortality among youths from 1999 to 2020.
10
Comorbidities and ethnic health disparities in the UK biobank. JAMIA Open 2022; 5:ooac057. [PMID: 36313969 PMCID: PMC9272510 DOI: 10.1093/jamiaopen/ooac057] [Received: 03/28/2022] [Revised: 06/15/2022] [Accepted: 06/24/2022]
Abstract
Objective: The goal of this study was to investigate the relationship between comorbidities and ethnic health disparities in a diverse, cosmopolitan population. Materials and Methods: We used the UK Biobank (UKB), a large prospective cohort study of the UK population. Study participants self-identified with 1 of 5 ethnic groups, and participant comorbidities were characterized using the 31 disease categories captured by the Elixhauser Comorbidity Index. Ethnic disparities in comorbidities were quantified as the extent to which disease prevalence within categories varies across ethnic groups and the extent to which pairs of comorbidities co-occur within ethnic groups. Disease-risk factor comorbidity pairs were identified where one comorbidity is a known risk factor for a co-occurring comorbidity. Results: The Asian ethnic group shows the greatest average number of comorbidities, followed by the Black and then White groups. The Chinese group shows the lowest average number of comorbidities. Comorbidity prevalence varies significantly among the ethnic groups for almost all disease categories, with diabetes and hypertension showing the largest differences across groups. Diabetes and hypertension both show ethnic-specific comorbidities that may contribute to the observed disease prevalence disparities. Discussion: These results underscore the extent to which comorbidities vary among ethnic groups and reveal group-specific disease comorbidities that may underlie ethnic health disparities. Conclusion: The study of comorbidity distributions across ethnic groups can be used to inform targeted group-specific interventions to reduce ethnic health disparities.
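The within-group co-occurrence analysis described above starts from counting how often each pair of comorbidity categories appears together in the same participant. A minimal sketch, assuming each participant in one ethnic group is represented by a set of category labels; the labels below are simplified hypothetical stand-ins, not the exact Elixhauser category names:

```python
from collections import Counter
from itertools import combinations

def comorbidity_pair_counts(participants):
    """Count co-occurring comorbidity pairs within one ethnic group.

    participants: iterable of sets of category labels, one set per participant.
    Pairs are stored in sorted order so (a, b) and (b, a) are counted together.
    """
    counts = Counter()
    for conditions in participants:
        for pair in combinations(sorted(conditions), 2):
            counts[pair] += 1
    return counts

# Hypothetical group of three participants
group = [{"diabetes", "hypertension"},
         {"diabetes", "hypertension", "obesity"},
         {"hypertension"}]
print(comorbidity_pair_counts(group).most_common(1))
# [(('diabetes', 'hypertension'), 2)]
```

Running the same count separately for each ethnic group, and normalizing by group size, gives group-specific co-occurrence rates that can then be compared across groups.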
11
The Apportionment of Pharmacogenomic Variation: Race, Ethnicity, and Adverse Drug Reactions. Medical Research Archives 2022; 10:10.18103/mra.v10i9.2986. [PMID: 36304842 PMCID: PMC9600569 DOI: 10.18103/mra.v10i9.2986]
Abstract
Fifty years ago, Richard Lewontin found that the vast majority of human genetic variation falls within (~85%) rather than between (~15%) racial groups. This result has been replicated numerous times since and is widely taken to support the notion that genetic differences between racial groups are trivial and thus irrelevant for clinical decision-making. The aim of this study was to consider how the apportionment of pharmacogenomic variation within and between racial and ethnic groups relates to risk disparities for adverse drug reactions. We confirmed that the majority of pharmacogenomic variation falls within (97.3%) rather than between (2.78%) the three largest racial and ethnic groups in the United States: Black, Hispanic, and White. Nevertheless, pharmacogenomic variants showing far greater within-group than between-group variation can have high predictive value for adverse drug reactions, particularly for minority racial and ethnic groups. We predicted excess adverse drug reactions for the minority Black and Hispanic groups, compared to the majority White group, and considered these results in light of the apportionment of genetic variation within and between groups. For 85% within-group and 15% between-group variation, 700 excess adverse drug reactions per 1,000 patients are predicted for a recessive effect model and 300 for a dominant model. We found high numbers of predicted Black and Hispanic excess adverse drug reactions for widely prescribed platinum chemotherapy compounds, such as cisplatin and oxaliplatin, as well as for controlled narcotics, including fentanyl and tramadol. Our results indicate that race and ethnicity, while imprecise proxies for genetic diversity, correlate with patterns of pharmacogenomic variation in a way that is clearly relevant to medical treatment decisions. The effects of this variation are particularly pronounced for the Black and Hispanic minority groups, owing to genetic differences from the majority White group.
Treatment decisions that are made based on (assumed) White pharmacogenomic variant frequencies can be harmful for minority groups. Ignoring clinically relevant genetic differences among racial and ethnic groups, however well-intentioned, will exacerbate rather than ameliorate health disparities.
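The excess adverse drug reaction (ADR) figures above come from comparing predicted reaction rates between groups under recessive and dominant effect models. A minimal sketch of that comparison, assuming Hardy-Weinberg genotype frequencies and full penetrance (every susceptible genotype reacts); these simplifications are ours, the function name is hypothetical, and the sketch is not claimed to reproduce the 700 and 300 figures, which depend on modeling details not given in the abstract:

```python
def excess_adrs_per_1000(q_minority, q_majority, model="recessive"):
    """Excess predicted ADRs per 1,000 treated patients in the minority group.

    q_minority, q_majority: frequency of the toxicity-associated allele.
    Assumes Hardy-Weinberg proportions and full penetrance:
      recessive -> only homozygotes react, risk = q**2
      dominant  -> any carrier reacts,     risk = 1 - (1 - q)**2
    """
    def risk(q):
        if model == "recessive":
            return q ** 2
        if model == "dominant":
            return 1 - (1 - q) ** 2
        raise ValueError(f"unknown model: {model}")

    return 1000 * (risk(q_minority) - risk(q_majority))

# Hypothetical allele frequencies: 0.5 in a minority group, 0.1 in the majority
print(excess_adrs_per_1000(0.5, 0.1, "recessive"))
print(excess_adrs_per_1000(0.5, 0.1, "dominant"))
```

With these toy frequencies the recessive model gives about 240 excess reactions per 1,000 patients and the dominant model about 560, showing how the same allele frequency gap translates differently depending on the assumed mode of action.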
12
Effects of genetic ancestry and socioeconomic deprivation on ethnic differences in serum creatinine. Gene 2022; 837:146709. [PMID: 35772650 PMCID: PMC9288982 DOI: 10.1016/j.gene.2022.146709] [Received: 06/17/2022] [Accepted: 06/24/2022]
Abstract
The inclusion of ethnicity in equations for estimating the glomerular filtration rate (eGFR) from serum creatinine levels has been challenged since ethnicity is socially defined and therefore a poor proxy for biological differences. We hypothesized that genetic ancestry (GA) would be more strongly associated with creatinine levels among healthy individuals than self-identified ethnicity. We studied a diverse cohort of 35,590 participants characterized as part of the UK Biobank, grouped by self-reported ethnicity: Black, East Asian, Mixed, Other, South Asian, and White. We used multivariable modeling to test for associations between ethnicity, GA, socioeconomic deprivation, and serum creatinine levels, including covariates for age, sex, height, and body mass index. Model fit comparisons and relative importance analysis were used to compare the effects of ethnicity and GA on creatinine levels. Black ethnicity shows a positive effect on participant serum creatinine levels (β = 9.36 ± 0.38), whereas East Asian (β = -1.80 ± 0.66) and South Asian (β = -0.28 ± 0.36) ethnicity show negative effects on creatinine. Male sex (β = 17.69 ± 0.34) and height (β = 0.13 ± 0.02) also show strong positive associations with creatinine levels, while socioeconomic deprivation (β = -0.04 ± 0.04) shows no significant association. African ancestry has the highest association (β = 13.81 ± 0.52) with creatinine levels. Overall, GA (9.06%) explains significantly more of the variation in creatinine levels than ethnicity (4.96%), with African ancestry (6.36%) alone explaining more of the variation than ethnicity. We found that GA explains more of the variation in serum creatinine levels than socioeconomic deprivation, suggesting the possibility that ethnic differences in creatinine are shaped by genetic rather than social factors.
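The model-fit comparison described above asks how much outcome variance each predictor set explains. A minimal sketch of that step using ordinary least squares R² with NumPy; the toy data are made up, and the study's relative importance analysis, which apportions R² among correlated predictors, needs more machinery than shown here:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with an intercept.

    X: (n,) or (n, k) predictors (e.g. ancestry fractions or ethnicity dummies).
    y: (n,) outcome (e.g. serum creatinine).
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Toy comparison: a continuous predictor vs a coarse 0/1 group indicator
y = np.array([1.0, 3.0, 5.0, 7.0])
print(r_squared([0, 1, 2, 3], y), r_squared([0, 0, 1, 1], y))
```

In the toy data the continuous predictor explains all of the variance, while the binary grouping captures only the difference between group means; this mirrors, in miniature, why a continuous ancestry measure can outperform a categorical label.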
13
Quality of life in caregivers of cancer patients in Colombia. J Clin Oncol 2022. [DOI: 10.1200/jco.2022.40.16_suppl.e24000]
Abstract
Background: It is well known that improvements in caregiver quality of life (QOL) can significantly affect the clinical outcomes of cancer patients, which is why it is essential to study this population. Caregivers have extensive responsibilities, making them vulnerable to emotional, physical, social, and financial distress. Evaluation of cancer patient caregiver QOL in Latin America and validation of QOL metrics, such as the Caregiver Quality of Life Index-Cancer (CQOLC), have not yet been carried out in large populations. We sought to evaluate these characteristics in 5 regions of Colombia. Methods: Cancer patients (n = 165) receiving active treatment and their respective adult caregivers were evaluated. Both caregivers and patients completed a sociodemographic survey, along with the CQOLC, which was translated into Colombian Spanish and validated in a pilot cohort. The CQOLC is composed of five subcategories, with a maximum total score of 140; higher scores indicate better QOL. Internal consistency was determined by Cronbach’s alpha. Results: The patients' median age was 63 years (58.8% female). The primary cancer diagnoses were breast, cervical, and lung cancer, with a median ECOG-PS of 1. Of the patients, 6.3% did not have a caregiver, and their responses were also evaluated. The median caregiver age was 53 years (range 19-75 years; 60% female). Of the caregivers, 4.7% were unemployed and 59% reported financial or psychological distress; 93.8% were family members, most frequently spouses or children. The median CQOLC score was 90 ± 15.2. The median scores for the five subcategories “burden,” “disruptiveness,” “positive adaptation,” “financial concern,” and “other” were 2, 3, 3, 3, and 3, respectively (maximum score of 4). Conclusions: Most patients and caregivers were interested in participating in the study and reported no issues comprehending or answering the questions.
This suggests that the Spanish-translated CQOLC can be adapted for future studies without major problems. Despite most caregivers reporting financial distress and low family income, their overall CQOLC scores were relatively high compared with developed countries such as the USA, potentially due to cross-cultural differences. The higher median score in the “disruptiveness” category suggests that Colombian caregivers feel strongly committed to caregiving and do not feel that this activity interferes significantly with their daily lives, possibly owing to a sense of family responsibility and moral duty. However, the lower median score in the “burden” category indicates that caregivers feel a heavy emotional toll from their caregiving role. Our results can help identify areas of caregiver QOL that need improvement, where government policies can be implemented to benefit both the caregiver and the patient.
Keywords: caregiver burden, quality of life, Colombia, validation, CQOLC, cancer patients
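Cronbach's alpha, used above to assess internal consistency, compares the sum of the individual item variances with the variance of respondents' total scores: alpha = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal NumPy sketch; the study's item-level data are not given, so the respondents-by-items matrix layout and the toy scores are assumptions:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    # Sample variance (ddof=1) of each item, summed over items
    item_variances = scores.var(axis=0, ddof=1).sum()
    # Sample variance of each respondent's total score
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

# Hypothetical answers from four respondents to a two-item scale
print(cronbach_alpha([[2, 3], [4, 3], [3, 5], [5, 4]]))
```

Values near 1 indicate that items move together (high internal consistency); the weakly related toy items above give a low alpha of about 0.23.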
14
Epigenetics and cancer disparities: when nature might be nurture. Oncoscience 2022; 9:23-24. [PMID: 35479648 PMCID: PMC9033022 DOI: 10.18632/oncoscience.555] [Received: 03/14/2022]
15
Association of genetic ancestry and molecular signatures with cancer survival disparities: a pan-cancer analysis. Cancer Res 2022; 82:1222-1233. [DOI: 10.1158/0008-5472.can-21-2105] [Received: 06/30/2021] [Revised: 09/20/2021] [Accepted: 01/18/2022]
16
Correction to: Genetic Ancestry Inference for Pharmacogenomics. Methods Mol Biol 2022; 2547:C1. [PMID: 37794232 DOI: 10.1007/978-1-0716-2573-6_22]
17
Genetic Ancestry Inference for Pharmacogenomics. Methods Mol Biol 2022; 2547:595-609. [PMID: 36068478 PMCID: PMC9486757 DOI: 10.1007/978-1-0716-2573-6_21]
Abstract
Genetic ancestry inference can be used to stratify patient cohorts and to model pharmacogenomic variation within and between populations. We provide a detailed guide to genetic ancestry inference using genome-wide genetic variant datasets, with an emphasis on two widely used techniques: principal components analysis (PCA) and ADMIXTURE analysis. PCA can be used for patient stratification and categorical ancestry inference, whereas ADMIXTURE is used to characterize genetic ancestry as a continuous variable. Visualization methods are critical for interpreting the results of genetic ancestry inference, and we provide instructions for how the results of PCA and ADMIXTURE can be effectively visualized.
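The PCA step described above can be sketched with a singular value decomposition of a mean-centred genotype matrix. This is a minimal illustration under stated assumptions: genotypes are coded as 0/1/2 allele counts, variants are only mean-centred (many pipelines also scale each variant by its expected standard deviation), and the toy matrix is made up. ADMIXTURE is a separate external program and is not shown:

```python
import numpy as np

def genotype_pcs(genotypes, n_components=2):
    """Principal component scores from a genotype matrix.

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts.
    Variants are mean-centred only; per-variant scaling is skipped here.
    """
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    # Scores = left singular vectors scaled by singular values
    return u[:, :n_components] * s[:n_components]

# Two hypothetical groups of two individuals with opposite genotypes
toy = np.array([[0, 0, 2, 2],
                [0, 0, 2, 2],
                [2, 2, 0, 0],
                [2, 2, 0, 0]], dtype=float)
# PC1 places the first two and last two individuals on opposite sides of zero
print(genotype_pcs(toy, n_components=1).ravel())
```

The overall sign of each principal component is arbitrary in an SVD, so plots and downstream stratification should rely on relative positions of individuals rather than on the sign itself.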
18
Comparative transcriptome analysis between patient and endometrial cancer cell lines to determine common signaling pathways and markers linked to cancer progression. Oncotarget 2021; 12:2500-2513. [PMID: 34966482 PMCID: PMC8711572 DOI: 10.18632/oncotarget.28161] [Received: 11/08/2021] [Accepted: 12/10/2021]
Abstract
The rising incidence and mortality of endometrial cancer (EC) in the United States call for an improved understanding of the disease's progression. Current methodologies for diagnosis and treatment rely on cell lines as models for tumor biology. However, because of inherent heterogeneity and the different growth environments of cell lines and tumors, comparative studies have found few parallels in their molecular signatures. As a consequence, the development and discovery of preclinical models and reliable drug targets have been delayed. In this study, we established transcriptome parallels between cell lines and tumors from The Cancer Genome Atlas (TCGA) using optimized normalization methods. We identified genes and signaling pathways associated with regulating the transformation and progression of EC. Specifically, the LXR/RXR activation, neuroprotective role of THOP1 in Alzheimer's disease, and glutamate receptor signaling pathways were mostly downregulated in advanced-stage cancer. While some of these highlighted markers and signaling pathways are commonly found in the central nervous system (CNS), our results suggest a novel function of these genes in the periphery. Finally, our study underscores the value of applying appropriate normalization methods in comparative studies to identify accurate and reliable markers.
19
Comparing Genetic and Socioenvironmental Contributions to Ethnic Differences in C-Reactive Protein. Front Genet 2021; 12:738485. [PMID: 34733313 PMCID: PMC8558394 DOI: 10.3389/fgene.2021.738485] [Received: 07/08/2021] [Accepted: 10/05/2021]
Abstract
C-reactive protein (CRP) is a routinely measured blood biomarker for inflammation. Elevated levels of circulating CRP are associated with response to infection, risk for a number of complex common diseases, and psychosocial stress. The objective of this study was to compare the contributions of genetic ancestry, socioenvironmental factors, and inflammation-related health conditions to ethnic differences in C-reactive protein levels. We used multivariable regression to compare CRP blood serum levels between Black and White ethnic groups from the United Kingdom Biobank (UKBB) prospective cohort study. CRP serum levels are significantly associated with ethnicity in an age and sex adjusted model. Study participants who identify as Black have higher average CRP than those who identify as White, CRP increases with age, and females have higher average CRP than males. Ethnicity and sex show a significant interaction effect on CRP. Black females have higher average CRP levels than White females, whereas White males have higher average CRP than Black males. Significant associations between CRP, ethnicity, and genetic ancestry are almost completely attenuated in a fully adjusted model that includes socioenvironmental factors and inflammation-related health conditions. BMI, smoking, and socioeconomic deprivation all have high relative effects on CRP. These results indicate that socioenvironmental factors contribute more to CRP ethnic differences than genetics. Differences in CRP are associated with ethnic disparities for a number of chronic diseases, including type 2 diabetes, essential hypertension, sarcoidosis, and lupus erythematosus. Our results indicate that ethnic differences in CRP are linked to both socioenvironmental factors and numerous ethnic health disparities.
20
The Impact of Ethnicity and Genetic Ancestry on Disease Prevalence and Risk in Colombia. Front Genet 2021; 12:690366. [PMID: 34650589 PMCID: PMC8507149 DOI: 10.3389/fgene.2021.690366] [Received: 04/02/2021] [Accepted: 08/11/2021]
Abstract
Currently, the vast majority of genomic research cohorts are made up of participants with European ancestry. Genomic medicine will only reach its full potential when genomic studies become more broadly representative of global populations. We are working to support the establishment of genomic medicine in developing countries in Latin America via studies of ethnically and ancestrally diverse Colombian populations. The goal of this study was to analyze the effect of ethnicity and genetic ancestry on observed disease prevalence and predicted disease risk in Colombia. Population distributions of Colombia's three major ethnic groups - Mestizo, Afro-Colombian, and Indigenous - were compared to disease prevalence and socioeconomic indicators. Indigenous and Mestizo ethnicity show the highest correlations with disease prevalence, whereas the effect of Afro-Colombian ethnicity is substantially lower. Mestizo ethnicity is mostly negatively correlated with six high-impact health conditions and positively correlated with seven of eight common cancers; Indigenous ethnicity shows the opposite effect. Malaria prevalence in particular is strongly correlated with ethnicity. Disease prevalence co-varies across geographic regions, consistent with the regional distribution of ethnic groups. Ethnicity is also correlated with regional variation in human development, partially explaining the observed differences in disease prevalence. Patterns of genetic ancestry and admixture for a cohort of 624 individuals from Medellín were compared to disease risk inferred via polygenic risk scores (PRS). African genetic ancestry is most strongly correlated with predicted disease risk, whereas European and Native American ancestry show weaker effects. African ancestry is mostly positively correlated with disease risk, and European ancestry is mostly negatively correlated. 
The relationships between ethnicity and disease prevalence do not show an overall correspondence with the relationships between ancestry and disease risk. We discuss possible reasons for the divergent health effects of ethnicity and ancestry as well as the implication of our results for the development of precision medicine in Colombia.
21
Abstract
We investigated the ancestral origins of four Ecuadorian ethnic groups (Afro-Ecuadorian, Mestizo, Montubio, and the Indigenous Tsáchila) in an effort to gain insight into the relationship between ancestry, culture, and the formation of ethnic identities in Latin America. The observed patterns of genetic ancestry are largely concordant with ethnic identities and historical records of conquest and colonization in Ecuador. Nevertheless, a number of exceptional findings highlight the complex relationship between genetic ancestry and ethnicity in Ecuador. Afro-Ecuadorians show far less African ancestry, and more Native American ancestry, than has been seen for any other Afro-descendant population in the Americas. Mestizos in Ecuador show high levels of Native American ancestry, with substantially less European ancestry, despite the relatively small Indigenous population in the country. The recently recognized Montubio ethnic group is highly admixed, with substantial contributions from all three continental ancestries. The Tsáchila show two distinct ancestry subgroups: most individuals show almost exclusively Native American ancestry, and a smaller group shows a characteristic Mestizo pattern. Considered together with historical data and sociological studies, our results indicate the extent to which ancestry and culture interact, often in unexpected ways, to shape ethnic identity in Ecuador.
22
Vitamin D and socioeconomic deprivation mediate COVID-19 ethnic health disparities. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.09.20.21263865. [PMID: 34611667 PMCID: PMC8491858 DOI: 10.1101/2021.09.20.21263865]
Abstract
Ethnic minorities in developed countries suffer a disproportionately high burden of COVID-19 morbidity and mortality, and COVID-19 ethnic disparities have been attributed to social determinants of health. Vitamin D has been proposed as a modifiable risk factor that could mitigate COVID-19 health disparities. We investigated the relationship between vitamin D and COVID-19 susceptibility and severity using the UK Biobank, a large prospective cohort study of the United Kingdom population. Structural equation modelling was used to evaluate the ability of vitamin D, socioeconomic deprivation, and other known risk factors to mediate COVID-19 ethnic health disparities. Asian ethnicity is associated with higher COVID-19 susceptibility compared with the majority White population, and Asian and Black ethnicity are both associated with higher COVID-19 severity. Socioeconomic deprivation mediates all three ethnic disparities and shows the highest overall signal of mediation of any COVID-19 risk factor. Vitamin supplements, including vitamin D, mediate the Asian disparity in COVID-19 susceptibility, and serum 25-hydroxyvitamin D (calcifediol) levels mediate Asian and Black COVID-19 severity disparities. Several measures of overall health also mediate COVID-19 ethnic disparities, underscoring the importance of comorbidities. Our results support ethnic minorities' use of vitamin D as both a prophylactic and a supplemental therapeutic for COVID-19.
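The structural equation models used in the study are not reproduced here. As a simpler illustration of what it means for a factor to mediate a disparity, the sketch below uses the linear product-of-coefficients decomposition: the total effect c of a group indicator on an outcome splits into a direct effect c' and an indirect effect a·b that runs through a mediator. The simulated variables, effect sizes, the continuous outcome, and the function names are all hypothetical.

```python
import numpy as np

def ols(y, *xs):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation(x, m, y):
    """Product-of-coefficients mediation for one exposure and one mediator."""
    c = ols(y, x)[1]               # total effect of exposure on outcome
    a = ols(m, x)[1]               # effect of exposure on mediator
    _, c_prime, b = ols(y, x, m)   # direct effect, and mediator -> outcome effect
    indirect = a * b
    return {"total": c, "direct": c_prime, "indirect": indirect,
            "proportion_mediated": indirect / c}

# Simulated data (hypothetical): binary group indicator, continuous mediator
# (standing in for e.g. a deprivation score) and a continuous outcome.
rng = np.random.default_rng(1)
n = 5000
x = rng.integers(0, 2, n).astype(float)
m = 0.8 * x + rng.normal(size=n)
y = 0.5 * x + 0.6 * m + rng.normal(size=n)
res = mediation(x, m, y)
print({k: round(float(v), 3) for k, v in res.items()})
```

With ordinary least squares on the same data, total = direct + indirect holds exactly. For binary outcomes such as COVID-19 status this additivity breaks down on the logit scale, which is one reason dedicated SEM or counterfactual mediation methods are used instead.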
23
Socioeconomic deprivation and genetic ancestry interact to modify type 2 diabetes ethnic disparities in the United Kingdom. EClinicalMedicine 2021; 37:100960. [PMID: 34386746 PMCID: PMC8343245 DOI: 10.1016/j.eclinm.2021.100960] [Received: 02/03/2021] [Revised: 05/19/2021] [Accepted: 05/25/2021]
Abstract
BACKGROUND Type 2 diabetes (T2D) is a complex common disease that disproportionately impacts minority ethnic groups in the United Kingdom (UK). Socioeconomic deprivation (SED) is widely considered as a potential explanation for T2D ethnic disparities in the UK, whereas the effect of genetic ancestry (GA) on such disparities has yet to be studied. METHODS We leveraged data from the UK Biobank prospective cohort study, with participants enrolled between 2006 and 2010, to model the relationship between SED (Townsend index), GA (clustering principal components of whole genome genotype data), and T2D status (ICD-10 codes) across the three largest ethnic groups in the UK - Asian, Black, and White - using multivariable logistic regression. FINDINGS The Asian group shows the highest T2D prevalence (17·9%), followed by the Black (11·7%) and White (5·5%) ethnic groups. We find that both SED (OR: 1·11, 95% CI: 1·10-1·11) and non-European GA (OR South Asian versus European: 4·37, 95% CI: 4·10-4·66; OR African versus European: 2·52, 95% CI: 2·23-2·85) are significantly associated with the observed T2D disparities. GA and SED show significant interaction effects on T2D, with SED being a relatively greater risk factor for T2D for individuals with South Asian and African ancestry, compared to those with European ancestry. INTERPRETATION The significant interactions between SED and GA underscore how the effects of environmental risk factors can differ among ancestry groups, suggesting the need for group-specific interventions. FUNDING This work was supported by the National Institutes of Health (NIH) Distinguished Scholars Program (DSP) to LMR and the Division of Intramural Research (DIR) of the National Institute on Minority Health and Health Disparities (NIMHD) at NIH.
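To make the interaction finding concrete, assume a logistic model of the form logit P(T2D) = β0 + β_SED·SED + β_g·G + β_int,g·SED·G. The odds ratio per unit of deprivation within ancestry group g is then exp(β_SED + β_int,g), while in the reference (European) group it is exp(β_SED). The sketch below computes these group-specific odds ratios; the coefficient values are invented for illustration and are not the study's estimates, and the real model includes further covariates.

```python
import math
from typing import Dict

def sed_odds_ratio(beta_sed: float, beta_interaction: Dict[str, float], group: str) -> float:
    """Odds ratio for a one-unit increase in deprivation (SED) within `group`.

    Assumes logit P(T2D) = b0 + b_sed*SED + b_g*G + b_int_g*SED*G, where the
    reference group has no interaction term (b_int = 0).
    """
    return math.exp(beta_sed + beta_interaction.get(group, 0.0))

# Hypothetical log-odds coefficients, chosen only to illustrate the arithmetic.
beta_sed = math.log(1.08)
beta_int = {"South Asian": math.log(1.05), "African": math.log(1.03)}

for g in ("European", "South Asian", "African"):
    print(g, round(sed_odds_ratio(beta_sed, beta_int, g), 3))
```

A positive interaction coefficient means each unit of deprivation raises the odds of T2D more steeply in that group than in the reference group, which is the pattern the abstract reports for South Asian and African ancestry.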
24
Transcriptional Analyses of Acute Exposure to Methylmercury on Erythrocytes of Loggerhead Sea Turtle. TOXICS 2021; 9:70. [PMID: 33805397 PMCID: PMC8066450 DOI: 10.3390/toxics9040070] [Received: 01/13/2021] [Revised: 03/11/2021] [Accepted: 03/17/2021]
Abstract
To identify changes in enzyme activity and gene expression that could serve as biomarkers of methylmercury (MeHg) exposure, we exposed loggerhead turtle erythrocytes (RBCs) to 0, 1, and 5 mg L⁻¹ MeHg and assembled a de novo transcriptome using RNA-seq. The analysis of differentially expressed genes (DEGs) indicated that 79 unique genes were dysregulated (39 upregulated and 44 downregulated genes). The results showed that MeHg altered gene expression patterns in response to the cellular stress it produced, as reflected in cell cycle regulation, lysosomal activity, autophagy, calcium regulation, mitochondrial regulation, apoptosis, and regulation of transcription and translation. The DEG analysis showed a weak response of the antioxidant machinery to MeHg, as evidenced by the fact that genes of the early response to oxidative stress were not dysregulated. The RBCs maintained constitutive expression of proteins that accounted for a substantial part of the defense against reactive oxygen species (ROS) induced by MeHg.
25
Detection of high heteroplasmy in complete loggerhead and hawksbill sea turtles mitochondrial genomes using RNAseq. Mitochondrial DNA A DNA Mapp Seq Anal 2021; 32:106-114. [PMID: 33629889 DOI: 10.1080/24701394.2021.1885389]
Abstract
Sea turtle populations around the world are declining rapidly because of anthropogenic and environmental factors. Among the affected populations are those of hawksbill turtles (Eretmochelys imbricata) and loggerhead turtles (Caretta caretta), which is why greater effort is currently being devoted to monitoring and tracking them. The extent of intragenic heteroplasmic mutations, which are commonly associated with diseases with variable symptoms, has not been analyzed in these species. In this study, heteroplasmy in the complete mitogenome (mtDNA) of three loggerhead turtles and one hawksbill turtle was identified from RNAseq data. Individuals Cc3, Ei1, Cc1 and Cc2 showed heteroplasmic mutations in 0.3, 1.7, 1.8 and 7.1% of their complete mtDNA, respectively. The protein-coding genes with the highest percentage of heteroplasmy were ND4 and ND5 in individual Cc2, at 16% and 38.6%, respectively. Of the tRNA genes, only tRNATyr was heteroplasmic in all four individuals, at 5.63% (Cc1), 25.35% (Ei1 and Cc2) and 49.3% (Cc3). In this study, we identified the critical sites of heteroplasmy in each individual and the genetic variability of their mitogenomes. The data obtained represent a baseline for future projects that evaluate the population status of these species.
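One way to compute per-region heteroplasmy percentages like those above is to call each position heteroplasmic when a second allele is supported by enough RNA-seq reads, and then report the share of called positions in a gene or in the whole mitogenome. This is a sketch under assumed thresholds (minimum depth 10, minor-allele fraction of at least 0.05); the abstract does not state the study's calling criteria, and the helper names and read counts are hypothetical.

```python
from typing import Dict, List

def is_heteroplasmic(counts: Dict[str, int], min_depth: int = 10,
                     min_minor_fraction: float = 0.05) -> bool:
    """Call a site heteroplasmic if a second allele is sufficiently supported.

    counts: read counts per allele at one mtDNA position, e.g. {"A": 90, "G": 10}.
    Thresholds are illustrative assumptions, not the study's criteria.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return False  # too few reads to make a call
    # The second-highest allele count is the minor allele.
    ordered = sorted(counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    return minor / depth >= min_minor_fraction

def percent_heteroplasmic(sites: List[Dict[str, int]], **kwargs) -> float:
    """Percentage of positions in a region (gene or mitogenome) called heteroplasmic."""
    if not sites:
        return 0.0
    calls = [is_heteroplasmic(s, **kwargs) for s in sites]
    return 100.0 * sum(calls) / len(sites)

# Toy pileup for a 4-position region (hypothetical read counts).
region = [
    {"A": 98, "G": 2},    # minor fraction 0.02 -> not heteroplasmic
    {"C": 60, "T": 40},   # minor fraction 0.40 -> heteroplasmic
    {"G": 5},             # depth below threshold -> not called
    {"T": 80, "C": 20},   # minor fraction 0.20 -> heteroplasmic
]
print(percent_heteroplasmic(region))
```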
26
Transcriptome annotation in the cloud: complexity, best practices, and cost. Gigascience 2021; 10:6123656. [PMID: 33511996 DOI: 10.1093/gigascience/giaa163] [Received: 07/02/2020] [Revised: 11/13/2020] [Accepted: 12/23/2020]
Abstract
BACKGROUND The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative provides NIH-funded researchers cost-effective access to commercial cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). These cloud providers offer an alternative for executing large computational biology experiments such as transcriptome annotation, a complex analytical process that requires interrogating multiple biological databases with several advanced computational tools. The core components of annotation pipelines published since 2012 are BLAST sequence alignments against annotated nucleotide or protein sequence databases, run almost exclusively on networked on-premises compute systems. FINDINGS We compare multiple BLAST sequence alignments using AWS and GCP. We prepared several Jupyter Notebooks with all the code required to submit computing jobs to the batch system on each cloud provider. We consider how the number of query transcripts per input file affects cost and processing time. We tested compute instances with 16, 32, and 64 vCPUs on each cloud provider. Four classes of timing results were collected: the total run time, the time to transfer the BLAST databases to the instance's local solid-state drive, the time to execute the CWL script, and the time to create, set up, and release an instance. This study aims to establish an estimate of the cost and compute time needed to execute multiple BLAST runs in a cloud environment. CONCLUSIONS We demonstrate that public cloud providers are a practical, low-cost alternative for executing advanced computational biology experiments. Using our cloud recipes, the BLAST alignments required to annotate a transcriptome with ∼500,000 transcripts can be processed in <2 hours at a compute cost of ∼$200-$250.
In our opinion, for BLAST-based workflows, the choice of cloud platform is not dependent on the workflow but, rather, on the specific details and requirements of the cloud provider. These choices include the accessibility for institutional use, the technical knowledge required for effective use of the platform services, and the availability of open source frameworks such as APIs to deploy the workflow.
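The abstract notes that the number of query transcripts per input file affects cost and run time. A minimal sketch of that preprocessing step, assuming the transcripts are in a FASTA file: records are parsed and grouped into fixed-size chunks, each of which could become one BLAST input file and one batch job. The function names and chunk size are hypothetical; this is not code from the paper's Jupyter Notebooks.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)

def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def chunk_records(records: Iterable[Record], chunk_size: int) -> List[List[Record]]:
    """Group records into lists of at most `chunk_size` transcripts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunks, current = [], []
    for rec in records:
        current.append(rec)
        if len(current) == chunk_size:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

def to_fasta(chunk: List[Record]) -> str:
    """Serialize one chunk back to FASTA text (one line per sequence)."""
    return "".join(f">{h}\n{s}\n" for h, s in chunk)

example = """>t1
ACGT
>t2
GGCC
TTAA
>t3
AAAA
"""
chunks = chunk_records(read_fasta(example.splitlines()), chunk_size=2)
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {len(chunk)} transcripts")
```

For the ∼500,000 transcripts mentioned above, a hypothetical chunk size of 5,000 would yield 100 input files. Larger chunks mean fewer jobs and less instance set-up overhead per transcript, while smaller chunks allow more jobs to run in parallel.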
27
PM4NGS, a project management framework for next-generation sequencing data analysis. Gigascience 2021; 10:giaa141. [PMID: 33410471 PMCID: PMC7788391 DOI: 10.1093/gigascience/giaa141] [Received: 05/06/2020] [Revised: 10/14/2020] [Accepted: 11/16/2020]
Abstract
BACKGROUND FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of executing data analysis pipelines across multiple platforms in scalable ways. FINDINGS Here, we present a project management framework for NGS data analysis called PM4NGS. The framework comprises automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimal Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes, covering the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy for non-bioinformaticians to install, configure, and use on personal computers and laptops. It can run NGS data analyses on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to narrow the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.
28
TPMCalculator: one-step software to quantify mRNA abundance of genomic features. Bioinformatics 2020; 35:1960-1962. [PMID: 30379987 DOI: 10.1093/bioinformatics/bty896] [Received: 03/13/2018] [Revised: 10/02/2018] [Accepted: 10/30/2018]
Abstract
SUMMARY Quantifying RNA sequencing (RNA-seq) abundance with a normalization method that calculates transcripts per million (TPM) is a key step in comparing multiple samples from different experiments. TPMCalculator is one-step software that processes RNA-seq alignments in BAM format and reports TPM values, raw read counts and feature lengths for genes, transcripts, exons and introns. The program models genomic features using the gene transfer format (GTF) file used during alignment and reports the TPM values and raw read counts for each feature. In this paper, we show the correlation between TPM and FPKM values reported by TPMCalculator and RSeQC for 1256 samples from the TCGA-BRCA project. We also show the correlation between raw read counts reported by TPMCalculator, HTSeq and featureCounts. AVAILABILITY AND IMPLEMENTATION TPMCalculator is freely available at https://github.com/ncbi/TPMCalculator. It is implemented in C++14 and supported on Mac OS X, Linux and MS Windows. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
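The standard TPM definition divides each feature's read count by its length in kilobases and then rescales so that all values in a sample sum to one million. FPKM instead divides by both length and the number of mapped reads (in millions). The sketch below implements these textbook formulas for illustration only. It is not TPMCalculator's C++ code, it ignores details such as how reads overlapping several features are assigned, and it assumes the sum of the supplied counts stands in for total mapped reads.

```python
from typing import Dict

def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from raw read counts and feature lengths (bp)."""
    # Length normalization first: reads per kilobase for each feature.
    rpk = {f: counts[f] / (lengths_bp[f] / 1_000) for f in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    # Then depth normalization: rescale so the sample sums to one million.
    return {f: v * 1e6 / total for f, v in rpk.items()}

def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """FPKM, treating each read as a fragment and using summed counts as library size."""
    total_millions = sum(counts.values()) / 1e6
    if total_millions == 0:
        raise ValueError("no reads assigned to any feature")
    return {f: counts[f] / (lengths_bp[f] / 1_000) / total_millions for f in counts}

counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1_000, "geneB": 3_000, "geneC": 500}
tpm_values = tpm(counts, lengths)
fpkm_values = fpkm(counts, lengths)
print(tpm_values)
print(fpkm_values)
```

Because TPM values always sum to one million per sample, each feature's value reflects its share of the sample's length-normalized reads, which is what makes TPM convenient for comparing samples.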
29
Machine Learning Neuroprotective Strategy Reveals a Unique Set of Parkinson Therapeutic Nicotine Analogs. THE OPEN BIOINFORMATICS JOURNAL 2020; 13:1-14. [PMID: 33927788 PMCID: PMC8081347]
Abstract
AIMS To present a novel machine learning computational strategy for predicting the neuroprotective potential of nicotine analogs acting on the behavior of impaired signaling pathways in Parkinson's disease. BACKGROUND Dopaminergic replacement has been used to treat Parkinson's disease (PD), with positive effects on motor symptoms but little effect on disease progression or prevention. Epidemiological studies have shown that nicotine consumption decreases PD prevalence through the activation of neuroprotective mechanisms associated with the overstimulation of signaling pathways (SP) such as PI3K/AKT through nicotinic acetylcholine receptors (e.g., α7 nAChRs) and over-expression of anti-apoptotic genes such as Bcl-2. Nicotine analogs with similar neuroprotective activity but fewer side effects remain a promising field. OBJECTIVE The objective of this study is to develop an interdisciplinary computational strategy to predict the neuroprotective activity of a series of 8 novel nicotine analogs in Parkinson's disease. METHODS We present a computational strategy integrating structural bioinformatics, manual SP reconstruction, and deep learning to predict the potential neuroprotective activity of 8 novel nicotine analogs on the behavior of PI3K/AKT. We performed a protein-ligand analysis between the nicotine analogs and the α7 nAChR using geometrical conformers, characterized the physicochemical properties of the analogs, and developed manually curated neuroprotective datasets to analyze their potential activity. Additionally, we developed a predictive machine-learning model for neuroprotection in PD by integrating a Markov Chain Monte-Carlo transition matrix for the 2 SP with synthetic training datasets of physicochemical properties and a structural dataset. RESULTS Our model predicted the potential neuroprotective activity of seven new nicotine analogs based on the binomial Bcl-2 response regulated by the activation of PI3K/AKT.
CONCLUSION Here, we present a robust novel strategy to assess the neuroprotective potential of biomolecules based on SP architecture. Our theoretical strategy can be further applied to the study of new treatments related to SP deregulation and may ultimately offer new opportunities for therapeutic interventions in neurodegenerative diseases.
30
Banana (Musa acuminata) transcriptome profiling in response to rhizobacteria: Bacillus amyloliquefaciens Bs006 and Pseudomonas fluorescens Ps006. BMC Genomics 2019; 20:378. [PMID: 31088352 PMCID: PMC6518610 DOI: 10.1186/s12864-019-5763-5] [Received: 03/05/2018] [Accepted: 05/02/2019]
Abstract
Background Banana is one of the most important crops in tropical and sub-tropical regions. To meet the demands of international markets, banana plantations require large amounts of chemical fertilizers, which translate into high farming costs and are hazardous to the environment when used excessively. Beneficial free-living soil bacteria that colonize the rhizosphere are known as plant growth-promoting rhizobacteria (PGPR). PGPR affect plant growth directly or indirectly and hold great promise for sustainable agriculture. Results PGPR of the genera Bacillus and Pseudomonas were evaluated in banana cv. Williams. Plants were produced through in vitro culture and inoculated individually with one of two rhizobacteria, Bacillus amyloliquefaciens strain Bs006 or Pseudomonas fluorescens strain Ps006. Control plants without microbial inoculum were also evaluated. The plants were kept in a climate-controlled growth room under conditions that favor plant-microorganism interactions. These interactions were evaluated by transcriptome sequencing at 1, 48 and 96 h after inoculation to identify differentially expressed genes (DEGs) elicited in the plants by the two rhizobacteria. Additionally, droplet digital PCR was performed at 1, 48 and 96 h, and also at 15 and 30 days, to validate the expression patterns of selected DEGs. The banana cv. Williams transcriptome showed differential expression of a large number of genes, of which 22 were experimentally validated. The experimentally validated genes correspond to growth promotion and regulation of specific functions (flowering, photosynthesis, glucose catabolism and root growth) as well as plant defense. This study focused on the analysis of 18 genes involved in growth promotion, defense and response to biotic or abiotic stress.
Conclusions Differences in banana gene expression profiles in response to the rhizobacteria evaluated here (Bacillus amyloliquefaciens Bs006 and Pseudomonas fluorescens Ps006) are influenced by separate bacterial colonization processes and levels, which stimulate distinct groups of genes at different points in time.
31
Abstract
BACKGROUND Modern Latin American populations were formed via genetic admixture among ancestral source populations from Africa, the Americas and Europe. We are interested in studying how combinations of genetic ancestry in admixed Latin American populations may impact genomic determinants of health and disease. For this study, we characterized the impact of ancestry and admixture on genetic variants that underlie health- and disease-related phenotypes in population genomic samples from Colombia, Mexico, Peru, and Puerto Rico. RESULTS We analyzed a total of 347 admixed Latin American genomes along with 1102 putative ancestral source genomes from Africans, Europeans, and Native Americans. We characterized the genetic ancestry, relatedness, and admixture patterns for each of the admixed Latin American genomes, finding a spectrum of ancestry proportions within and between populations. We then identified single nucleotide polymorphisms (SNPs) with anomalous ancestry-enrichment patterns, i.e. SNPs that exist in any given Latin American population at a higher frequency than expected based on the population's genetic ancestry profile. For this set of ancestry-enriched SNPs, we inspected their phenotypic impact on disease, metabolism, and the immune system. All four of the Latin American populations show ancestry-enrichment for a number of shared pathways, yielding evidence of similar selection pressures on these populations during their evolution. For example, all four populations show ancestry-enriched SNPs in multiple genes from immune system pathways, such as the cytokine receptor interaction, T cell receptor signaling, and antigen presentation pathways. We also found SNPs with excess African or European ancestry that are associated with ancestry-specific gene expression patterns and play crucial roles in the immune system and infectious disease responses. 
Genes from both the innate and adaptive immune systems were found to be regulated by ancestry-enriched SNPs with population-specific regulatory effects. CONCLUSIONS Ancestry-enriched SNPs in Latin American populations have a substantial effect on health- and disease-related phenotypes. The concordant impact observed for the same phenotypes across populations points to a process of adaptive introgression, whereby ancestry-enriched SNPs with specific functional utility appear to have been retained in modern populations by virtue of their effects on health and fitness.
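The ancestry-enrichment idea can be framed as a comparison between a SNP's observed allele frequency in an admixed population and the frequency expected from that population's ancestry proportions, i.e. the proportion-weighted average of its frequencies in the African, European and Native American source populations. The sketch below flags SNPs whose observed frequency exceeds that expectation by a fixed margin. The margin, input values and function names are hypothetical, and the study may well have used a formal statistical test rather than a simple threshold.

```python
import numpy as np

# Column order of the source-population frequencies (an assumption for this sketch).
SOURCES = ("African", "European", "Native American")

def expected_frequency(ancestry_props, source_freqs):
    """Expected allele frequency under the population's ancestry mix.

    ancestry_props: shape (k,), ancestry proportions summing to 1.
    source_freqs: shape (n_snps, k), allele frequency in each source population.
    """
    props = np.asarray(ancestry_props, dtype=float)
    if not np.isclose(props.sum(), 1.0):
        raise ValueError("ancestry proportions must sum to 1")
    return np.asarray(source_freqs, dtype=float) @ props

def ancestry_enriched(observed, ancestry_props, source_freqs, min_excess=0.1):
    """Flag SNPs observed at a higher frequency than expected from ancestry."""
    expected = expected_frequency(ancestry_props, source_freqs)
    excess = np.asarray(observed, dtype=float) - expected
    return excess >= min_excess, excess

# Hypothetical admixed population: 10% African, 60% European, 30% Native American.
props = [0.1, 0.6, 0.3]
source_freqs = [[0.5, 0.2, 0.1],   # SNP 1: expected 0.20
                [0.1, 0.4, 0.7]]   # SNP 2: expected 0.46
observed = [0.45, 0.48]
flags, excess = ancestry_enriched(observed, props, source_freqs)
print(flags, excess)
```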
32
Benchmarking computational tools for polymorphic transposable element detection. Brief Bioinform 2018; 18:908-918. [PMID: 27524380 DOI: 10.1093/bib/bbw072] [Received: 04/14/2016]
Abstract
Transposable elements (TEs) are an important source of human genetic variation with demonstrable effects on phenotype. Recently, a number of computational methods for the detection of polymorphic TE (polyTE) insertion sites from next-generation sequence data have been developed. The use of such tools will become increasingly important as the pace of human genome sequencing accelerates. For this report, we performed a comparative benchmarking and validation analysis of polyTE detection tools in an effort to inform their selection and use by the TE research community. We analyzed a core set of seven tools with respect to ease of use and accessibility, polyTE detection performance and runtime parameters. An experimentally validated set of 893 human polyTE insertions was used for this purpose, along with a series of simulated data sets that allowed us to assess the impact of sequence coverage on tool performance. The recently developed tool MELT showed the best overall performance followed by Mobster and then RetroSeq. PolyTE detection tools can best detect Alu insertion events in the human genome with reduced reliability for L1 insertions and substantially lowered performance for SVA insertions. We also show evidence that different polyTE detection tools are complementary with respect to their ability to detect a complete set of insertion events. Accordingly, a combined approach, coupled with manual inspection of individual results, may yield the best overall performance. In addition to the benchmarking results, we also provide notes on tool installation and usage as well as suggestions for future polyTE detection algorithm development.
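Benchmarking against a validated insertion set comes down to matching each predicted polyTE site to a validated site and counting true positives, false positives and false negatives. The sketch below uses greedy one-to-one matching on the same chromosome within a fixed window; the 100 bp window, the toy coordinates and the function names are assumptions for illustration, not the paper's benchmarking code.

```python
from typing import List, Tuple

Site = Tuple[str, int]  # (chromosome, insertion position)

def match_calls(predicted: List[Site], validated: List[Site],
                window: int = 100) -> Tuple[int, int, int]:
    """Greedily match predictions to validated insertions; return (tp, fp, fn)."""
    unmatched = list(validated)
    tp = 0
    for chrom, pos in predicted:
        # Find the closest still-unmatched validated site within the window.
        best, best_dist = None, None
        for i, (v_chrom, v_pos) in enumerate(unmatched):
            dist = abs(v_pos - pos)
            if v_chrom == chrom and dist <= window and (best_dist is None or dist < best_dist):
                best, best_dist = i, dist
        if best is not None:
            unmatched.pop(best)  # each validated site can be matched only once
            tp += 1
    fp = len(predicted) - tp
    fn = len(unmatched)
    return tp, fp, fn

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with hypothetical coordinates.
validated = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
predicted = [("chr1", 1040), ("chr2", 900), ("chr1", 4990)]
tp, fp, fn = match_calls(predicted, validated)
print(tp, fp, fn, precision_recall_f1(tp, fp, fn))
```

Running the same matching on the merged calls of several tools is one simple way to quantify the complementarity the abstract describes, at the cost of having to deduplicate overlapping predictions first.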
33
Workflow and web application for annotating NCBI BioProject transcriptome data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3737827. [PMID: 28605765 PMCID: PMC5467576 DOI: 10.1093/database/bax008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/26/2016] [Accepted: 01/24/2017] [Indexed: 01/08/2023]
Abstract
The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as the Transcriptome Shotgun Assembly Sequence Database (TSA) and the Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to annotate newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. The workflow and web application are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/
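The abstract does not describe how the workflow post-processes its BLAST alignments. The sketch below shows one common step: keeping the best-scoring hit per transcript from BLAST+ tabular output (`-outfmt 6`, default 12 columns). The e-value cutoff and the transcript and subject IDs are illustrative assumptions, not details of the published pipeline.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject per query, ignoring hits above max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        query, score = rec["qseqid"], float(rec["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


# Hypothetical alignments for two assembled transcripts.
example = (
    "TR1\tP12345\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-100\t380\n"
    "TR1\tQ67890\t60.0\t150\t60\t2\t10\t460\t5\t155\t1e-20\t95\n"
    "TR2\tP11111\t30.0\t40\t28\t1\t1\t120\t1\t40\t0.5\t20\n"
)
print(best_hits(example))
```

With the default cutoff, TR2's weak hit (e-value 0.5) is discarded, and TR1 is annotated with its highest-bitscore subject.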
34
Abstract
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN and ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than that seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
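The pairwise comparison above can be sketched as follows. For every gene pair, take the absolute difference in a rate (for example ΔdN) and correlate it with that pair's functional similarity. The abstract does not name the correlation statistic, so Spearman's rank correlation is an assumption here, and the gene names and values are toy numbers. A negative rho means that functionally similar pairs tend to evolve at similar rates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rate_difference_vs_similarity(rates: Dict[str, float],
                                  sgo: Dict[FrozenSet[str], float]) -> float:
    """Correlate |rate_a - rate_b| with functional similarity over all gene pairs."""
    deltas, sims = [], []
    for a, b in combinations(sorted(rates), 2):
        deltas.append(abs(rates[a] - rates[b]))
        sims.append(sgo[frozenset((a, b))])
    return spearman(deltas, sims)


# Toy values, not yeast data.
dn = {"g1": 0.02, "g2": 0.03, "g3": 0.20}
sgo = {frozenset(("g1", "g2")): 0.9, frozenset(("g1", "g3")): 0.1, frozenset(("g2", "g3")): 0.2}
print(rate_difference_vs_similarity(dn, sgo))
```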
35
De novo transcriptome assembly of loggerhead sea turtle nesting of the Colombian Caribbean. GENOMICS DATA 2017; 13:18-20. [PMID: 28649496 PMCID: PMC5472237 DOI: 10.1016/j.gdata.2017.06.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/03/2017] [Revised: 05/31/2017] [Accepted: 06/07/2017] [Indexed: 11/23/2022]
Abstract
The loggerhead sea turtle Caretta caretta is widely distributed in tropical and subtropical oceans. It is an endangered species because anthropogenic and natural factors have decreased its population levels. In this study, RNA sequencing and de novo assembly of genes expressed in blood were performed. The raw FASTQ files have been deposited in NCBI's SRA database under accession number SRX2629512. A total of 5.4 Gb of raw sequence data was obtained, corresponding to 48,257,019 raw reads. The Trinity pipeline was used to perform the de novo assembly, which identified 64,930 transcripts for the female loggerhead turtle transcriptome with an N50 of 1131 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution of this species.
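The assembly is summarised by its N50 (1131 bp). The sketch below computes the N50 under the standard definition: sort the sequence lengths in decreasing order and return the first length at which the running total reaches half of all assembled bases. The lengths used here are toy values, not the Trinity transcripts.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all assembled bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy transcript lengths (6,000 bp in total).
print(n50([2000, 1500, 1000, 800, 500, 200]))
```

Half of the 6,000 toy bases is 3,000. The two longest transcripts together reach 3,500 bp, so the N50 is 1500.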
36
Human population-specific gene expression and transcriptional network modification with polymorphic transposable elements. Nucleic Acids Res 2017; 45:2318-2328. [PMID: 27998931 PMCID: PMC5389732 DOI: 10.1093/nar/gkw1286] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 09/26/2016] [Revised: 12/05/2016] [Accepted: 12/12/2016] [Indexed: 02/07/2023]
Abstract
Transposable element (TE) derived sequences are known to contribute to the regulation of the human genome. The majority of known TE-derived regulatory sequences correspond to relatively ancient insertions, which are fixed across human populations. The extent to which human genetic variation caused by recent TE activity leads to regulatory polymorphisms among populations has yet to be thoroughly explored. In this study, we searched for associations between polymorphic TE (polyTE) loci and human gene expression levels using an expression quantitative trait loci (eQTL) approach. We compared locus-specific polyTE insertion genotypes to B cell gene expression levels among 445 individuals from 5 human populations. Numerous human polyTE loci correspond to both cis and trans eQTL, and their regulatory effects are directly related to cell type-specific function in the immune system. PolyTE loci are associated with differences in expression between European and African population groups, and a single polyTE locus is indirectly associated with the expression of numerous genes via the regulation of the B cell-specific transcription factor PAX5. The polyTE-gene expression associations we found indicate that human TE genetic variation can have important phenotypic consequences. Our results reveal that TE-eQTL are involved in population-specific gene regulation as well as transcriptional network modification.
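An eQTL test of the kind described above is often framed as a linear regression of a gene's expression on genotype dosage, here coded as the number of insertion alleles (0, 1 or 2). The sketch below fits that model by ordinary least squares for one polyTE locus and one gene. It is a simplification: it assumes no covariates (population, sex, batch) and no multiple-testing correction, which a real analysis would include, and the data are toy values.

```python
from typing import Sequence, Tuple


def eqtl_fit(dosage: Sequence[int], expression: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of expression on insertion-allele dosage; returns (slope, intercept, pearson_r)."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx  # expected change in expression per additional insertion allele
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Toy genotypes (0 = no insertion, 2 = homozygous insertion) and expression values.
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(eqtl_fit(dosage, expression))
```

A positive slope would suggest that the insertion allele is associated with higher expression of the gene. Significance would normally be assessed with a t-test on the slope.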
37
Complete mitochondrial genome of the nesting Colombian Caribbean Hawksbill Turtle. MITOCHONDRIAL DNA PART B-RESOURCES 2017; 2:128-129. [PMID: 30370335 PMCID: PMC6200328 DOI: 10.1080/23802359.2017.1292477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/29/2022]
Abstract
The hawksbill turtle, Eretmochelys imbricata (Linnaeus, 1766), is an endangered sea turtle that nests on Colombian Caribbean beaches. In this study, we report the complete mitochondrial DNA sequence of the hawksbill turtle. The entire genome comprises 16,386 base pairs, with a nucleotide composition of T: 25.6%, C: 26.9%, A: 35.4% and G: 12.1%. The mitogenome sequence of the hawksbill turtle will contribute to a better understanding of the population genetics and evolution of sea turtles. The sequence was deposited in the GenBank database under accession number KP221806.
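The nucleotide composition reported above is the share of each base among all positions in the mitogenome. A minimal sketch of that calculation follows. It assumes that ambiguous base codes (such as N) are excluded from the total and that the sequence contains at least one A, C, G or T. The short input is a toy sequence, not KP221806.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Percentage of A, C, G and T among unambiguous bases, rounded to one decimal place."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: round(100 * counts[b] / total, 1) for b in "ACGT"}


print(base_composition("AACGTTTA"))
```

The four reported percentages (25.6 + 26.9 + 35.4 + 12.1) sum to 100.0, which is a quick consistency check on values of this kind.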
38
Population and clinical genetics of human transposable elements in the (post) genomic era. Mob Genet Elements 2017; 7:1-20. [PMID: 28228978 PMCID: PMC5305044 DOI: 10.1080/2159256x.2017.1280116] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/18/2016] [Revised: 01/03/2017] [Accepted: 01/04/2017] [Indexed: 10/26/2022]
Abstract
Recent technological developments in genomics, bioinformatics and high-throughput experimental techniques are providing opportunities to study ongoing human transposable element (TE) activity at an unprecedented level of detail. It is now possible to characterize genome-wide collections of TE insertion sites for multiple human individuals, within and between populations, and for a variety of tissue types. Comparison of TE insertion site profiles between individuals captures the germline activity of TEs and reveals insertion site variants that segregate as polymorphisms among human populations, whereas comparison among tissue types ascertains somatic TE activity that generates cellular heterogeneity. In this review, we provide an overview of these new technologies and explore their implications for population and clinical genetic studies of human TEs. We cover both recently published results on human TE insertion activity and the prospects for future TE studies related to human evolution and health.
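The two comparisons described above reduce to set operations on insertion-site profiles. Comparing individuals separates sites carried by everyone sampled from polymorphic sites. Comparing one individual's tissues against that person's germline profile flags candidate somatic insertions. The sketch below keys sites by a hypothetical (chromosome, position) pair and assumes exact site matching, which real data would not allow. It also assumes at least one individual is given. "Shared by all sampled individuals" is only a proxy for "fixed in the population".

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # hypothetical (chromosome, position) key for an insertion


def classify_germline(profiles: Dict[str, Set[Site]]) -> Tuple[Set[Site], Set[Site]]:
    """Split insertion sites into those shared by all individuals and polymorphic ones."""
    all_sites = set().union(*profiles.values())
    shared = set.intersection(*profiles.values())
    return shared, all_sites - shared


def somatic_candidates(tissues: Dict[str, Set[Site]], germline: Set[Site]) -> Dict[str, Set[Site]]:
    """Per tissue, insertions absent from the individual's germline profile."""
    return {tissue: sites - germline for tissue, sites in tissues.items()}


# Toy profiles for two individuals and two tissues of the first individual.
individuals = {
    "ind1": {("chr1", 100), ("chr2", 50)},
    "ind2": {("chr1", 100), ("chr3", 7)},
}
tissues = {
    "blood": {("chr1", 100), ("chr2", 50)},
    "brain": {("chr1", 100), ("chr2", 50), ("chr5", 999)},
}
print(classify_germline(individuals))
print(somatic_candidates(tissues, individuals["ind1"]))
```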
39
Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 2016; 17:479. [PMID: 27871221 PMCID: PMC5117513 DOI: 10.1186/s12859-016-1354-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2016] [Accepted: 11/11/2016] [Indexed: 11/24/2022]
Abstract
Background Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and with the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. Results Our statistical analysis found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, 38 of them tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in an RM should co-occur more often than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 (= 43 × 42/2) pairwise intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs.
Conclusions Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1354-5) contains supplementary material, which is available to authorized users.
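Deciding whether two gene groups share more genes than chance predicts is commonly done with a hypergeometric (one-sided Fisher's exact) test. The abstract does not name the exact test used, so the sketch below is an assumption, and the universe size and group sizes are toy values. Each of the 903 = C(43, 2) pairwise tests would then need multiple-testing control such as FDR.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, group_a: int, group_b: int, universe: int) -> float:
    """P(X >= overlap) when group_b genes are drawn at random from a universe
    that contains group_a genes of the other group."""
    upper = min(group_a, group_b)
    total = comb(universe, group_b)
    return sum(
        comb(group_a, k) * comb(universe - group_a, group_b - k)
        for k in range(overlap, upper + 1)
    ) / total


# Toy example: two 5-gene groups drawn from 20 genes share 4 genes (about 1.25 expected).
print(hypergeom_upper_tail(4, 5, 5, 20))
print(comb(43, 2))  # number of pairwise intersections among 43 gene groups
```

In the toy case the probability of an overlap of 4 or more is 76/15504 (about 0.0049), so the intersection would count as enriched at conventional thresholds before correction.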
40
Dataset of Arabidopsis plants that overexpress FT driven by a meristem-specific KNAT1 promoter. Data Brief 2016; 8:520-8. [PMID: 27366785 PMCID: PMC4919726 DOI: 10.1016/j.dib.2016.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/11/2016] [Revised: 05/25/2016] [Accepted: 06/02/2016] [Indexed: 11/18/2022]
Abstract
In this dataset we integrated figures comparing leaf number and rosette diameter in three Arabidopsis FT overexpressor lines (AtFTOE) driven by the KNAT1 promoter (“A member of the KNOTTED class of homeodomain proteins encoded by the STM gene of Arabidopsis” [5]) versus wild-type (WT) Arabidopsis plants. The tables also present transcriptomic data obtained by Illumina HiSeq RNA-seq from rosette leaves of AtFTOE line 2.1 versus WT plants, with accession numbers SRR2094583 and SRR2094587 for AtFTOE replicates 1–3 and AtWT control replicates 1–2, respectively. The raw paired-end sequence data are located in the Sequence Read Archive (SRA), the public repository of the National Center for Biotechnology Information of the National Library of Medicine, National Institutes of Health, Bethesda, MD, USA. The differential gene expression analyses are visualized with MapMan and presented in figures. The interpretation and discussion of the obtained data are described in “Transcriptomic analysis of Arabidopsis overexpressing flowering locus T driven by a meristem-specific promoter that induces early flowering” [2].
41
Transcriptomic analysis of Arabidopsis overexpressing flowering locus T driven by a meristem-specific promoter that induces early flowering. Gene 2016; 587:120-31. [PMID: 27154816 DOI: 10.1016/j.gene.2016.04.060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/18/2015] [Revised: 04/18/2016] [Accepted: 04/25/2016] [Indexed: 01/09/2023]
Abstract
Here we analyzed, in leaves, the effect of FT overexpression driven by the meristem-specific KNAT1 gene homolog of Arabidopsis thaliana (Lincoln et al., 1994; Long et al., 1996) on the transcriptomic response during plant development. Our results demonstrated that meristematic FT overexpression generates an early-flowering phenotype, independent of photoperiod, when compared with wild-type (WT) plants. Arabidopsis FT-overexpressor lines (AtFTOE) did not differ significantly from WT lines in either leaf number or rosette diameter up to day 21, when AtFTOE flowered. After this point, AtFTOE plants started flower production and produced no new rosette leaves. In contrast, WT plants remained in the vegetative stage up to day 40, producing 12-14 rosette leaves before flowering. Transcriptomic analysis of rosette leaves by Illumina RNA-seq identified 3652 differentially expressed genes in mature rosette leaves, of which 626 were up-regulated and 3026 down-regulated. Among flowering-related genes, the up-regulated transcripts included transcription factors such as MADS-box genes, which are known flowering markers in the meristem and whose overexpression has been related to meristem identity preservation and the transition from the vegetative to the floral stage. Genes related to sugar transport indicated a higher demand for monosaccharides derived from the hydrolysis of sucrose to glucose and probably fructose, which may also be influenced by the reproductive stage of AtFTOE plants.
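Counts such as "3652 differentially expressed genes, 626 up-regulated and 3026 down-regulated" come from filtering per-gene test results. The sketch below shows such a filter. The |log2 fold change| ≥ 1 and adjusted p < 0.05 cutoffs are common conventions assumed here, not thresholds reported in the abstract, and the gene records are hypothetical.

```python
from typing import Dict, List, Tuple


def split_de_genes(results: List[Tuple[str, float, float]],
                   lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> Dict[str, List[str]]:
    """Split (gene, log2_fold_change, adjusted_p) records into up- and down-regulated lists."""
    up, down = [], []
    for gene, lfc, padj in results:
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return {"up": up, "down": down}


# Hypothetical records: overexpressor vs wild type.
records = [("geneA", 2.3, 0.001), ("geneB", -1.8, 0.01), ("geneC", 0.4, 0.0001), ("geneD", 3.0, 0.2)]
print(split_de_genes(records))
```

The total number of differentially expressed genes is the size of the two lists combined, as in 626 + 3026 = 3652.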
42
HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw014. [PMID: 26989147 PMCID: PMC4795928 DOI: 10.1093/database/baw014] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 09/29/2015] [Accepted: 02/01/2016] [Indexed: 12/15/2022]
Abstract
Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher-order chromatin aggregates. Moreover, histones are intricately involved in chromatin function and provide a means for the dynamic regulation of the genome through specific histone variants and histone post-translational modifications. ‘HistoneDB 2.0 – with variants’ is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations, with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database, obtained using algorithms trained on the curated set. The interactive web site supports various search strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences; and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0
43
Abstract
OBJECTIVE Chocó is a state located on the Pacific coast of Colombia that has a majority Afro-Colombian population. The objective of this study was to characterize the genetic ancestry, admixture and diversity of the population of Chocó, Colombia. METHODOLOGY Genetic variation was characterized for a sample of 101 donors (61 female and 40 male) from the state of Chocó. Genotypes were determined for each individual at 610,545 single nucleotide polymorphisms genome-wide. Haplotypes for the uniparentally inherited mitochondrial DNA (female lineage) and Y chromosome (male lineage) were also determined. These data were used for comparative analyses with a number of worldwide populations, including putative ancestral populations from Africa, the Americas and Europe, along with several admixed American populations. RESULTS The population of Chocó has predominantly African genetic ancestry (75.8%) with approximately equal parts European (13.4%) and Native American (11.1%) ancestry. Chocó shows relatively high levels of three-way genetic admixture, and far higher levels of Native American ancestry, compared to other New World African populations from the Caribbean and the United States. There is a striking pattern of sex-specific ancestry in Chocó, with Native American admixture along the female lineage and European admixture along the male lineage. The population of Chocó is also characterized by relatively high levels of overall genetic diversity compared to both putative ancestral populations and other admixed American populations. CONCLUSION These results suggest a unique genetic heritage for the population of Chocó and underscore the profound human genetic diversity that can be found in the region.
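The abstract reports three-way ancestry proportions but does not say which software estimated them. Dedicated tools such as ADMIXTURE model individual genotypes. The sketch below uses a much simpler stand-in. It chooses mixture weights on a 0.01 grid (constrained to sum to 1) that best reproduce the admixed population's allele frequencies as a weighted mix of three reference populations, using least squares. The population labels and allele frequencies are hypothetical toy values.

```python
from itertools import product
from typing import Dict, Sequence


def estimate_admixture(admixed: Sequence[float],
                       refs: Dict[str, Sequence[float]],
                       step: float = 0.01) -> Dict[str, float]:
    """Grid-search three mixture weights (summing to 1) that minimise the squared
    difference between admixed allele frequencies and the weighted reference mix."""
    names = list(refs)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three reference populations")
    n = round(1 / step)
    best_w, best_err = (0.0, 0.0, 1.0), float("inf")
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # weights must stay non-negative and sum to 1
        w = (i / n, j / n, (n - i - j) / n)
        err = sum(
            (f - sum(wk * refs[name][s] for wk, name in zip(w, names))) ** 2
            for s, f in enumerate(admixed)
        )
        if err < best_err:
            best_w, best_err = w, err
    return dict(zip(names, best_w))


# Toy allele frequencies at four SNPs (hypothetical, not data from the study).
refs = {
    "african": [0.9, 0.1, 0.5, 0.3],
    "european": [0.1, 0.8, 0.2, 0.6],
    "native_american": [0.5, 0.5, 0.9, 0.1],
}
true_w = {"african": 0.7, "european": 0.2, "native_american": 0.1}
admixed = [sum(true_w[p] * refs[p][s] for p in refs) for s in range(4)]
print(estimate_admixture(admixed, refs))
```

Because the toy admixed frequencies were built from known weights, the grid search recovers them exactly. Real estimates need many more SNPs and must account for sampling noise.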
44
Genetic diversity and population structure in Physalis peruviana and related taxa based on InDels and SNPs derived from COSII and IRG markers. ACTA ACUST UNITED AC 2015; 4:29-37. [PMID: 26550601 DOI: 10.1016/j.plgene.2015.09.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
The genus Physalis is common in the Americas and includes several economically important species, among them Physalis peruviana, which produces appetizing edible fruits. We studied the genetic diversity and population structure of P. peruviana and characterized 47 accessions of this species, along with 13 accessions of related taxa, comprising 222 individuals from the Colombian Corporation of Agricultural Research (CORPOICA) germplasm collection, using Conserved Orthologous Sequences (COSII) and Immunity Related Genes (IRGs). In addition, 642 single nucleotide polymorphism (SNP) markers were identified and used for the genetic diversity analysis. A total of 121 alleles were detected in 24 InDel loci, ranging from 2 to 9 alleles per locus, with an average of 5.04 alleles per locus. The average number of alleles for the SNP markers was two. The observed heterozygosity for P. peruviana with InDel and SNP markers (0.48 and 0.59) was higher than the expected heterozygosity (0.30 and 0.41). Interestingly, the observed heterozygosity in related taxa (0.4 and 0.12) was lower than the expected heterozygosity (0.59 and 0.25). The coefficient of population differentiation FST was 0.143 (InDels) and 0.038 (SNPs), showing a relatively low level of genetic differentiation between P. peruviana and related taxa. Higher levels of genetic variation were instead observed within populations based on the AMOVA analysis. Population structure analysis supported the presence of two main groups, and PCA based on SNP markers revealed two distinct clusters in the P. peruviana accessions corresponding to their state of cultivation. In this study, we identified molecular markers useful to detect genetic variation in Physalis germplasm for assisting conservation and crossbreeding strategies.
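The sketch below shows how the summary statistics above are usually computed for one biallelic SNP, with genotypes coded as alternate-allele counts (0, 1, 2):

* observed heterozygosity (Ho);
* Hardy-Weinberg expected heterozygosity (He = 2p(1 - p));
* Nei's G_ST = (H_T - H_S) / H_T as one formulation of F_ST.

The abstract does not state which F_ST estimator was used, so Nei's version is an assumption here, and the two toy populations are hypothetical. The first toy population (Ho = 1.0, He = 0.5) shows the kind of heterozygote excess reported for P. peruviana.

```python
from typing import Sequence


def allele_freq(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from per-individual allele counts (0, 1, 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def observed_het(genotypes: Sequence[int]) -> float:
    """Fraction of heterozygous individuals."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def expected_het(genotypes: Sequence[int]) -> float:
    """Hardy-Weinberg expected heterozygosity 2p(1 - p)."""
    p = allele_freq(genotypes)
    return 2 * p * (1 - p)


def nei_gst(populations: Sequence[Sequence[int]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, weighting populations equally."""
    freqs = [allele_freq(pop) for pop in populations]
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t if h_t else 0.0


# Toy genotypes for two populations at one SNP.
pop1 = [1, 1, 1, 1]
pop2 = [0, 0, 0, 1]
print(observed_het(pop1), expected_het(pop1), nei_gst([pop1, pop2]))
```

Multi-locus values such as those in the abstract are usually obtained by averaging these statistics over loci.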
45
Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. INTERNATIONAL JOURNAL OF BIOINFORMATICS RESEARCH AND APPLICATIONS 2014; 10:384-408. [PMID: 24989859 DOI: 10.1504/ijbra.2014.062991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
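For reference, the original (ungeneralised) Ruzzo-Tompa procedure on a plain score sequence works as follows. Each positive score opens a candidate segment. The list of earlier segments is then searched from the right for the nearest segment whose left cumulative total L_j is smaller than the new segment's L_k. If that segment's right total R_j is below the new segment's R_k, the two are merged and the check repeats. The sketch below is not the paper's graph generalisation with gaps. It also replaces the pointer bookkeeping of the published linear-time version with a simple backward search, so it can be quadratic in the worst case.

```python
from typing import List, Sequence, Tuple


def ruzzo_tompa(scores: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return maximal-scoring segments as (start, end_exclusive, score) tuples."""
    segments: List[List[int]] = []  # each entry: [start, end, L, R] with cumulative totals
    cumulative = 0
    for i, s in enumerate(scores):
        left_total = cumulative
        cumulative += s
        if s <= 0:
            continue  # only positive scores open a new candidate segment
        k = [i, i + 1, left_total, cumulative]
        while True:
            # Find the rightmost earlier segment j with L_j < L_k.
            j = len(segments) - 1
            while j >= 0 and segments[j][2] >= k[2]:
                j -= 1
            if j < 0 or segments[j][3] >= k[3]:
                segments.append(k)  # no merge possible: k becomes a new list entry
                break
            # R_j < R_k: extend k leftward over segments j..end, then re-check.
            k = [segments[j][0], k[1], segments[j][2], k[3]]
            del segments[j:]
    return [(start, end, right - left) for start, end, left, right in segments]


# Example score sequence with three maximal segments.
print(ruzzo_tompa([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
```

For the example sequence, the output contains the segments {4}, {3} and the run 1, 2, -2, 2, -2, 1, 5 (total 7). Per the abstract, the paper's generalisation replaces this index sequence with a weighted, totally ordered graph, so that unfavourable letters can be deleted at a penalty.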
46
Mammalian-wide interspersed repeat (MIR)-derived enhancers and the regulation of human gene expression. Mob DNA 2014; 5:14. [PMID: 25018785 PMCID: PMC4090950 DOI: 10.1186/1759-8753-5-14] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 08/30/2013] [Accepted: 04/10/2014] [Indexed: 11/26/2022]
Abstract
Background Mammalian-wide interspersed repeats (MIRs) are the most ancient family of transposable elements (TEs) in the human genome. The deep conservation of MIRs initially suggested the possibility that they had been exapted to play functional roles for their host genomes. MIRs also happen to be the only TEs whose presence in and around human genes is positively correlated with tissue-specific gene expression. Similar associations between enhancer prevalence within genes and tissue-specific expression, along with the previous implication of MIRs in providing regulatory sequences, suggested a possible link between MIRs and enhancers. Results To test the possibility that MIRs contribute functional enhancers to the human genome, we evaluated the relationship between MIRs and human tissue-specific enhancers in terms of genomic location, chromatin environment, regulatory function, and mechanistic attributes. This analysis revealed MIRs to be highly concentrated in enhancers of the K562 and HeLa human cell types. Significantly more enhancers were found to be linked to MIRs than would be expected by chance, and putative MIR-derived enhancers are characterized by a chromatin environment highly similar to that of canonical enhancers. MIR-derived enhancers show strong associations with gene expression levels, tissue-specific gene expression and tissue-specific cellular functions, including a number of biological processes related to erythropoiesis. MIR-derived enhancers were found to be a rich source of transcription factor binding sites, underscoring one possible mechanistic route for the co-option of MIR sequences as enhancers. There is also tentative evidence to suggest that MIR-enhancer function is related to the transcriptional activity of non-coding RNAs.
Conclusions Taken together, these data reveal enhancers to be an important cis-regulatory platform from which MIRs can exercise a regulatory function in the human genome and help to resolve a long-standing conundrum as to the reason for MIRs’ deep evolutionary conservation.
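The claim that more enhancers overlap MIRs "than would be expected by chance" is often tested by shuffling the enhancer intervals and counting overlaps again. The abstract does not specify the test, so the permutation scheme below is an assumption. So are the simplifications: one chromosome, uniform random placement that keeps each interval's length, no masking of assembly gaps, and toy coordinates.

```python
import random
from typing import List, Tuple

# Half-open [start, end) intervals on a single chromosome (a simplifying assumption).
Interval = Tuple[int, int]


def count_overlapping(queries: List[Interval], targets: List[Interval]) -> int:
    """Number of query intervals that overlap at least one target interval."""
    return sum(any(qs < te and ts < qe for ts, te in targets) for qs, qe in queries)


def permutation_test(enhancers: List[Interval], mirs: List[Interval],
                     chrom_length: int, n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed overlap count and an empirical one-sided p-value from random placement."""
    rng = random.Random(seed)
    observed = count_overlapping(enhancers, mirs)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in enhancers:
            length = end - start
            new_start = rng.randrange(0, chrom_length - length + 1)  # keep interval length
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, mirs) >= observed:
            at_least_as_extreme += 1
    # Add-one correction avoids reporting an empirical p-value of exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy coordinates, not K562 or HeLa enhancer calls.
enhancers = [(100, 200), (1000, 1100), (5000, 5100)]
mirs = [(150, 260), (1050, 1300)]
print(permutation_test(enhancers, mirs, chrom_length=100_000))
```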
47
Identification of immunity related genes to study the Physalis peruviana--Fusarium oxysporum pathosystem. PLoS One 2013; 8:e68500. [PMID: 23844210 PMCID: PMC3701084 DOI: 10.1371/journal.pone.0068500] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 11/07/2012] [Accepted: 05/30/2013] [Indexed: 11/18/2022]
Abstract
The Cape gooseberry (Physalis peruviana L.) is an Andean exotic fruit with high nutritional value and appealing medicinal properties. However, its cultivation faces important phytosanitary problems, mainly due to pathogens such as Fusarium oxysporum, Cercospora physalidis and Alternaria spp. Here we used the Cape gooseberry foliar transcriptome to search for proteins containing conserved domains related to plant immunity, including NBS (Nucleotide Binding Site), CC (Coiled-Coil) and TIR (Toll/Interleukin-1 Receptor). We identified 74 immunity-related gene candidates in P. peruviana with the typical resistance gene (R-gene) architecture, among them 17 receptor-like kinase (RLK) candidates related to PAMP-Triggered Immunity (PTI), and eight TIR-NBS-LRR (TNL) and nine CC-NBS-LRR (CNL) candidates related to Effector-Triggered Immunity (ETI), among others. These candidate genes were categorized by molecular function (98%), biological process (85%) and cellular component (79%) using gene ontology. Some of the most interesting predicted roles were those associated with binding and transferase activity. We designed 94 primer pairs from the 74 immunity-related genes (IRGs) to amplify the corresponding genomic regions in six genotypes that included resistant and susceptible materials. From these, we selected 17 single-band amplicons and sequenced them in 14 F. oxysporum resistant and susceptible genotypes. Sequence polymorphisms were analyzed through a preliminary candidate gene association analysis, which allowed the detection of one SNP at the PpIRG-63 marker revealing a nonsynonymous mutation in the predicted LRR domain, suggesting a functional role in resistance.
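Calling a SNP "nonsynonymous", as for PpIRG-63 above, means that the substitution changes the encoded amino acid. The sketch below classifies a single-nucleotide change under the standard genetic code. It assumes an in-frame coding sequence starting at the first codon and uses a toy sequence, not the PpIRG-63 amplicon. Substitutions that create a stop codon are reported separately as "nonsense".

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt.upper() + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    return "nonsense" if alt_aa == "*" else "nonsynonymous"


# Toy coding sequence: ATG GAA AAA -> Met-Glu-Lys.
cds = "ATGGAAAAA"
print(classify_snv(cds, 5, "G"))  # GAA -> GAG, both Glu
print(classify_snv(cds, 3, "A"))  # GAA -> AAA, Glu -> Lys
```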
48
In silico identification and characterization of the ion transport specificity for P-type ATPases in the Mycobacterium tuberculosis complex. BMC STRUCTURAL BIOLOGY 2012; 12:25. [PMID: 23031689 PMCID: PMC3573892 DOI: 10.1186/1472-6807-12-25] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 06/25/2012] [Accepted: 09/27/2012] [Indexed: 12/30/2022]
Abstract
Background P-type ATPases hydrolyze ATP and use the released energy to transport ions against electrochemical gradients across plasma membranes, making these proteins essential for cell viability. Currently, the distribution and function of these ion transporters in mycobacteria are poorly understood. Results In this study, probabilistic profiles were constructed based on hidden Markov models to identify and classify P-type ATPases in the Mycobacterium tuberculosis complex (MTBC) according to the type of ion transported across the plasma membrane. Topology, hydrophobicity profiles and conserved motifs were analyzed to correlate the amino acid sequences of P-type ATPases with ion transport specificity. Twelve candidate P-type ATPases annotated in the M. tuberculosis H37Rv proteome were identified in all members of the MTBC, and the probabilistic profiles classified them into one of three groups: heavy metal cation transporters, alkali and alkaline earth metal cation transporters, and the beta subunit of a prokaryotic potassium pump. Interestingly, counterparts of the non-catalytic beta subunits of Hydrogen/Potassium and Sodium/Potassium P-type ATPases were not found. Conclusions The high content of heavy metal transporters found in the MTBC suggests that they could play an important role in the ability of M. tuberculosis to survive inside macrophages, where tubercle bacilli face high levels of toxic metals. Finally, the results obtained in this work provide a starting point for experimental studies that may elucidate the ion specificity of the MTBC P-type ATPases and their role in mycobacterial infections.
49
Differences in local genomic context of bound and unbound motifs. Gene 2012; 506:125-34. [PMID: 22692006 DOI: 10.1016/j.gene.2012.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/27/2012] [Accepted: 06/04/2012] [Indexed: 11/25/2022]
Abstract
Understanding gene regulation is a major objective in molecular biology research. Frequently, transcription is driven by transcription factors (TFs) that bind to specific DNA sequences. These motifs are usually short and degenerate, so many copies are likely to occur throughout the genome by random chance. Despite this, TFs bind only a small subset of sites, which prompted our investigation into the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can be used to discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that local sequence context can be strikingly different around motifs that are bound compared to motifs that are unbound. We concluded that there are multiple combinations of genomic features that characterize bound or unbound motifs.
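The point that short motifs occur many times by chance can be made concrete with a back-of-the-envelope calculation. The sketch below computes the expected number of exact matches to one motif, assuming independent, equiprobable bases (A, C, G, T each 1/4). It ignores the AT-richness of the yeast genome, palindromic motifs that would be counted on both strands, and motif degeneracy, which raises the number of matches further. The genome length of about 12 Mb is an approximate figure for S. cerevisiae used only for illustration.

```python
def expected_occurrences(genome_length: int, motif_length: int,
                         both_strands: bool = True) -> float:
    """Expected random matches to one exact motif under a uniform, independent base model."""
    positions = genome_length - motif_length + 1
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_length


# Roughly 12 Mb genome (approximation), exact 6-bp motif, both strands.
print(expected_occurrences(12_000_000, 6))
```

Under these assumptions, an exact 6-bp site is expected by chance nearly 6,000 times, far more than the number of sites a typical TF is observed to bind.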
50
Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 2012; 194:4461-2. [PMID: 22843592 PMCID: PMC3416251 DOI: 10.1128/jb.00871-12] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 05/17/2012] [Accepted: 06/04/2012] [Indexed: 11/20/2022]
Abstract
We report the first genome sequences for six strains of Rhodanobacter species isolated from a variety of soil and subsurface environments. Three of these strains are capable of complete denitrification and the other three are not. However, all six strains contain most of the genes required for the respiration of nitrate to gaseous nitrogen. The nondenitrifying members of the genus lack only the gene for nitrate reduction, the first step in the full denitrification pathway. The data suggest that the environmental role of bacteria from the genus Rhodanobacter should be reevaluated.