1
Silberstein M, Weissbrod O, Otten L, Tzemach A, Anisenia A, Shtark O, Tuberg D, Galfrin E, Gannon I, Shalata A, Borochowitz ZU, Dechter R, Thompson E, Geiger D. A system for exact and approximate genetic linkage analysis of SNP data in large pedigrees. Bioinformatics 2012; 29:197-205. [PMID: 23162081 DOI: 10.1093/bioinformatics/bts658] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The use of dense single nucleotide polymorphism (SNP) data in genetic linkage analysis of large pedigrees is impeded by significant technical, methodological and computational challenges. Here we describe Superlink-Online SNP, a new powerful online system that streamlines the linkage analysis of SNP data. It features a fully integrated flexible processing workflow comprising both well-known and novel data analysis tools, including SNP clustering, erroneous data filtering, exact and approximate LOD calculations and maximum-likelihood haplotyping. The system draws its power from thousands of CPUs, performing data analysis tasks orders of magnitude faster than a single computer. By providing an intuitive interface to sophisticated state-of-the-art analysis tools coupled with high computing capacity, Superlink-Online SNP helps geneticists unleash the potential of SNP data for detecting disease genes.
RESULTS: Computations performed by Superlink-Online SNP are automatically parallelized using novel paradigms, and executed on an unlimited number of private or public CPUs. One novel service is large-scale approximate Markov chain Monte Carlo (MCMC) analysis. The accuracy of the results is reliably estimated by running the same computation on multiple CPUs and evaluating the Gelman-Rubin score to set aside unreliable results. Another service within the workflow is a novel parallelized exact algorithm for inferring maximum-likelihood haplotypes. The reported system enables genetic analyses that were previously infeasible. We demonstrate the system's capabilities through a study of a large complex pedigree affected with metabolic syndrome.
AVAILABILITY: Superlink-Online SNP is freely available for researchers at http://cbl-hap.cs.technion.ac.il/superlink-snp. The system source code can also be downloaded from the system website.
CONTACT: omerw@cs.technion.ac.il
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
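The Gelman-Rubin check mentioned in this abstract can be illustrated with a minimal sketch. It uses the textbook potential scale reduction factor for several independent chains of one scalar quantity (for example, a LOD estimate). The function name `gelman_rubin` and the 1.1 cut-off are assumptions for illustration; the abstract does not state the threshold or the exact variant that Superlink-Online SNP uses.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Return the Gelman-Rubin potential scale reduction factor (R-hat).

    `chains` is a list of equally long lists of samples, one list per
    independent MCMC run of the same scalar quantity.
    """
    n = len(chains[0])                       # samples per chain
    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return sqrt(var_hat / w)


# Hypothetical cut-off: values well above 1 suggest the chains disagree.
R_HAT_THRESHOLD = 1.1

agreeing = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
disagreeing = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(gelman_rubin(agreeing) < R_HAT_THRESHOLD)      # chains agree
print(gelman_rubin(disagreeing) < R_HAT_THRESHOLD)   # chains disagree
```

A result would be set aside when the statistic exceeds the chosen threshold, since the independent runs have then not settled on the same distribution.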
Affiliation(s)
- Mark Silberstein
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel
2
Kruse LV, Nyegaard M, Christensen U, Møller-Larsen S, Haagerup A, Deleuran M, Hansen LG, Venø SK, Goossens D, Del-Favero J, Børglum AD. A genome-wide search for linkage to allergic rhinitis in Danish sib-pair families. Eur J Hum Genet 2012; 20:965-72. [PMID: 22419170 PMCID: PMC3421129 DOI: 10.1038/ejhg.2012.46] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
Allergic rhinitis (AR) is a complex disorder with a polygenic, multifactorial aetiology. Twin studies have found the genetic contribution to be substantial. We collected and clinically characterised a sample consisting of 127 Danish nuclear families with at least two siblings suffering from AR or allergic conjunctivitis including 540 individuals (286 children and 254 parents). A whole-genome linkage scan, using 424 microsatellite markers, was performed on both this sample and an earlier collected sample consisting of 130 families with atopic dermatitis and other atopic disorders. A third sib-pair family sample, which was previously collected and genotyped, was added to the analysis, increasing the total sample size to 357 families consisting of 1508 individuals. In total, 190 families with AR were included. The linkage analysis software Genehunter NPL, Genehunter MOD, and Genehunter Imprinting were used to obtain nonparametric and parametric linkage results. Family-based association analysis of positional candidate SNPs was carried out using the FBAT program. We obtained genome-wide significant linkage to a novel AR locus at 1p13 and suggestive linkage to two novel regions at 1q31-q32 and 20p12, respectively. Family-based association analysis of SNPs in the candidate locus DNND1B/CRB1 at 1q31 showed no significant association and could not explain the linkage signal observed. Suggestive evidence of linkage was also obtained at three AR loci previously reported (2q14-q23, 2q23, and 12p13) and indication of linkage was observed at a number of additional loci. Likely maternal imprinting was observed at 2q23, and possible maternal imprinting at 3q28.
3
Kirichenko AV, Belonogova NM, Aulchenko YS, Axenovich TI. PedStr software for cutting large pedigrees for haplotyping, IBD computation and multipoint linkage analysis. Ann Hum Genet 2009; 73:527-31. [PMID: 19604226 DOI: 10.1111/j.1469-1809.2009.00531.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
We propose an automatic heuristic algorithm for splitting large pedigrees into fragments of no more than a user-specified bit size. The algorithm specifically aims to split large pedigrees where many close relatives are genotyped and to produce a set of sub-pedigrees for haplotype reconstruction, IBD computation or multipoint linkage analysis with the help of the Lander-Green-Kruglyak algorithm. We demonstrate that a set of overlapping pedigree fragments constructed with the help of our algorithm allows fast and effective haplotype reconstruction and detection of an allele's parental origin. Moreover, we compared pedigree fragments constructed with the help of our algorithm and existing programs PedCut and Jenti for multipoint linkage analysis. Our algorithm demonstrated significantly higher linkage power than the algorithm of Jenti and significantly shorter running time than the algorithm of PedCut. The software package PedStr implementing our algorithms is available at http://mga.bionet.nsc.ru/soft/index.html.
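The "bit size" that bounds each fragment is commonly defined, for Lander-Green-type algorithms, as 2n - f, where n is the number of non-founders and f the number of founders. The following is a minimal sketch of that count only, not of the PedStr splitting heuristic itself; the dictionary layout and the name `bit_size` are assumptions made for illustration.

```python
def bit_size(pedigree):
    """Return the pedigree bit size 2n - f.

    `pedigree` maps each individual's ID to a (father, mother) tuple;
    founders are recorded as (None, None).
    """
    founders = sum(
        1 for father, mother in pedigree.values()
        if father is None and mother is None
    )
    non_founders = len(pedigree) - founders
    return 2 * non_founders - founders


# Hypothetical first-cousin-marriage pedigree used as an example.
cousins = {
    "G1": (None, None), "G2": (None, None),   # grandparents
    "X": (None, None), "Y": (None, None),     # married-in founders
    "A": ("G1", "G2"), "B": ("G1", "G2"),     # siblings
    "C1": ("A", "X"), "C2": ("Y", "B"),       # first cousins
    "K": ("C1", "C2"),                        # their child
}
print(bit_size(cousins))  # 5 non-founders, 4 founders -> 6
```

A splitting tool would compare this number for each candidate fragment against the user-specified limit.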
Affiliation(s)
- Anatoly V Kirichenko
- Institute of Cytology & Genetics, Siberian Division, Russian Academy of Sciences, Novosibirsk, 630090 Russia
4
Liu F, Kirichenko A, Axenovich TI, van Duijn CM, Aulchenko YS. An approach for cutting large and complex pedigrees for linkage analysis. Eur J Hum Genet 2008; 16:854-60. [PMID: 18301450 DOI: 10.1038/ejhg.2008.24] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 11/09/2022]
Abstract
Utilizing large pedigrees in linkage analysis is a computationally challenging task. The pedigree size limits applicability of the Lander-Green-Kruglyak algorithm for linkage analysis. A common solution is to split large pedigrees into smaller computable subunits. We present a pedigree-splitting method that, within a user-supplied bit-size limit, identifies subpedigrees having the maximal number of subjects of interest (e.g., patients) who share a common ancestor. We compare our method with the maximum clique partitioning method using a large and complex human pedigree consisting of 50 patients with Alzheimer's disease ascertained from a genetically isolated Dutch population. We show that under a bit-size limit our method can assign more patients to subpedigrees than the clique partitioning method, particularly when splitting deep pedigrees where the subjects of interest are scattered in recent generations and are relatively distantly related via multiple genealogic connections. Our pedigree-splitting algorithm and associated software can facilitate genome-wide linkage scans searching for rare mutations in large pedigrees coming from genetically isolated populations. The software package PedCut implementing our approach is available at http://mga.bionet.nsc.ru/soft/index.html.
Affiliation(s)
- Fan Liu
- Department of Epidemiology & Biostatistics, Erasmus MC, Rotterdam, The Netherlands
5
Axenovich TI, Zorkoltseva IV, Liu F, Kirichenko AV, Aulchenko YS. Breaking loops in large complex pedigrees. Hum Hered 2007; 65:57-65. [PMID: 17898536 DOI: 10.1159/000108937] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/12/2007] [Accepted: 05/22/2007] [Indexed: 11/19/2022]
Abstract
For pedigrees with multiple loops, exact likelihoods could not be computed in an acceptable time frame and thus, approximate methods are used. Some of these methods are based on breaking loops and approximations of complex pedigree likelihoods using the exact likelihood of the corresponding zero-loop pedigree. Due to ignoring loops, this method results in a loss of genetic information and a decrease in the power to detect linkage. To minimize this loss, an optimal set of loop breakers has to be selected. In this paper, we present a graph theory based algorithm for automatic selection of an optimal set of loop breakers. We propose using a total relationship between measured pedigree members as a proxy to power. To minimize the loss of genetic information, we suggest selection of such breakers whose duplication in a pedigree would be accompanied by a minimal loss of total relationship between measured pedigree members. We show that our algorithm compares favorably with other existing loop-breaker selection algorithms in terms of conservation of genetic information, statistical power and CPU time of subsequent linkage analysis. We implemented our method in a software package LOOP_EDGE, which is available at http://mga.bionet.nsc.ru/nlru/.
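To make the notion of a pedigree loop concrete, the sketch below counts independent loops in a marriage-node graph, where each individual and each mating is a node, using the standard cycle-rank formula E - V + C. This only detects how many loops must be broken; it is not the LOOP_EDGE breaker-selection algorithm, and the input layout and the name `count_loops` are assumptions made for illustration.

```python
def count_loops(pedigree):
    """Count independent loops in a pedigree's marriage-node graph.

    `pedigree` maps each individual's ID to a (father, mother) tuple;
    founders are recorded as (None, None).
    """
    nodes = set(pedigree)
    edges = set()
    for child, (father, mother) in pedigree.items():
        if father is None and mother is None:
            continue
        mating = ("mating", father, mother)   # one node per parental pair
        nodes.update((father, mother, mating))
        edges.update({(father, mating), (mother, mating), (mating, child)})

    # Union-find to count connected components.
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(node) for node in nodes})

    # Cycle rank of an undirected graph: E - V + C.
    return len(edges) - len(nodes) + components


# Hypothetical pedigrees: a first-cousin marriage has one inbreeding loop.
cousin_marriage = {
    "G1": (None, None), "G2": (None, None),
    "X": (None, None), "Y": (None, None),
    "A": ("G1", "G2"), "B": ("G1", "G2"),
    "C1": ("A", "X"), "C2": ("Y", "B"),
    "K": ("C1", "C2"),
}
nuclear_family = {"F": (None, None), "M": (None, None), "S": ("F", "M")}
print(count_loops(cousin_marriage), count_loops(nuclear_family))
```

A loop-breaking method would then choose individuals to duplicate until this count reaches zero, ideally losing as little relationship information as possible.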
Affiliation(s)
- Tatiana I Axenovich
- Institute of Cytology and Genetics, Siberian Division of Russian Academy of Sciences, Novosibirsk, Russia.
6
Webb BT, van den Oord E, Akkari A, Wilton S, Ly T, Duff R, Barnes KC, Carlsen K, Gerritsen J, Lenney W, Silverman M, Sly P, Sundy J, Tsanakas J, von Berg A, Whyte M, Blumenthal M, Vestbo J, Middleton L, Helms PJ, Anderson WH, Pillai SG. Quantitative linkage genome scan for atopy in a large collection of Caucasian families. Hum Genet 2006; 121:83-92. [PMID: 17103228 DOI: 10.1007/s00439-006-0285-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 06/29/2006] [Accepted: 10/18/2006] [Indexed: 10/23/2022]
Abstract
Quantitative phenotypes correlated with a complex disorder offer increased power to detect linkage in comparison to affected-unaffected classifications. Asthma is a complex disorder characterized by periods of bronchial obstruction and increased bronchial hyperreactivity. In childhood and early adulthood, asthma is frequently associated also with quantitative measures of atopy. Genome-wide quantitative multipoint linkage analysis was conducted for serum IgE levels and percentage of positive skin prick test (SPT(per)) using three large groups of families originally ascertained for asthma. In this report, 438 and 429 asthma families were informative for linkage using IgE and SPT(per), respectively, which represents 690 independent families. Suggestive linkage (LOD ≥ 2) was found on chromosomes 1, 3, and 8q with maximum LODs of 2.34 (IgE), 2.03 (SPT(per)), and 2.25 (IgE) near markers D1S1653, D3S2322-D3S1764, and D8S2324, respectively. The results from chromosomes 1 and 3 replicate previous reports of linkage. We also replicate linkage to 5q with peak LODs of 1.96 (SPT(per)) and 1.77 (IgE) at or near marker D5S1480. Our results provide further evidence implicating chromosomes 1, 3, and 5q. The current report represents one of the biggest genome scans so far reported for asthma-related phenotypes. This study also demonstrates the utility of increased sample sizes and quantitative phenotypes in linkage analysis of complex disorders.
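For readers unfamiliar with LOD scores, the sketch below computes the simplest textbook case: a two-point LOD for phase-known meioses, i.e. the base-10 log of the likelihood at recombination fraction theta over the likelihood at theta = 0.5 (no linkage). It is a deliberate simplification and not the quantitative multipoint method used in this study; the function name `two_point_lod` is an assumption for illustration.

```python
from math import log10


def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    with k recombinants observed among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses
    likelihood_linked = theta ** k * (1.0 - theta) ** (n - k)
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        # theta = 0 is impossible once any recombinant has been observed.
        return float("-inf")
    return log10(likelihood_linked / likelihood_unlinked)


# Ten meioses, one recombinant, evaluated at the estimate theta = 1/10.
print(round(two_point_lod(1, 10, 0.1), 2))
```

Under this definition a LOD of 2 means the data are 100 times more likely under linkage at the given theta than under no linkage.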
Affiliation(s)
- Bradley T Webb
- Virginia Institute for Psychiatric and Behavioral Genetics, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA, USA
7
Ciullo M, Bellenguez C, Colonna V, Nutile T, Calabria A, Pacente R, Iovino G, Trimarco B, Bourgain C, Persico MG. New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate. Hum Mol Genet 2006; 15:1735-43. [PMID: 16611673 DOI: 10.1093/hmg/ddl097] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Essential hypertension (EH) affects a large proportion of the adult population in Western countries and is a major risk factor for cardiovascular diseases. EH is a multifactorial disease with a complex genetic component. To tackle the complexity of this genetic component, we have initiated a study of Campora, an isolated village in South Italy. A random sample of 389 adults was genotyped for a very dense microsatellite genome scan and phenotyped for EH. Of this sample, 173 affected individuals were all related through a 2,180-member pedigree and could be integrated within a linkage analysis. The complexity of the pedigree prevented its direct use for a non-parametric linkage (NPL) analysis. Therefore, the method proposed by Falchi et al. [2004, Am. J. Hum. Genet., 75, 1015-1031] was used for automatic pedigree-breaking. We identified a new locus for EH on chromosome 8q22-23 and detected linkage with two known loci for EH: 1q42-43 and 4p16. Simulations showed that the linkage with 8q22-23 is highly genome-wide significant, even when accounting for the breaking of the pedigree. An extension to qualitative traits of another pedigree-breaking approach [Pankratz et al., 2001, Genet. Epidemiol., 21 (Suppl. 1), S258-S263] also detected a significant linkage on 8q22-23 using a remarkably different set of sub-pedigrees and helped to refine the location of the linkage signal. This work both identifies a new locus strongly linked to hypertension and shows that the power of linkage analysis can be improved by the appropriate use of efficient pedigree-breaking strategies.
Affiliation(s)
- Marina Ciullo
- Institute of Genetics and Biophysics, A. Buzzati-Traverso, CNR Naples, Italy.
8
Mathias RA, Beaty TH, Bailey-Wilson JE, Bickel C, Stockton ML, Barnes KC. Inheritance of total serum IgE in the isolated Tangier Island population from Virginia: complexities associated with genealogical depth of pedigrees in segregation analyses. Hum Hered 2005; 59:228-38. [PMID: 16093728 DOI: 10.1159/000087123] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/09/2004] [Accepted: 05/12/2005] [Indexed: 11/19/2022]
Abstract
OBJECTIVES: This study was aimed at performing a segregation analysis of total serum immunoglobulin E (tIgE) in an isolated population using maximal genealogical information permitted by current software and computer capacities, while assessing the reliability of the best-fitting model of inheritance for tIgE through simulations.
METHODS: All current Tangier Island, VA, residents (n = 664) belonged to one large extended pedigree (n = 3,501) spanning 13 generations, with an average inbreeding coefficient of 0.009. Phenotype data were obtained on 453 (68.2%) of the residents using a population-based recruitment scheme. Due to computational limitations resulting from the extremely complex pedigree structure, analysis on only two pedigree reconstructions was feasible: a reduced pedigree retaining all phenotyped individuals and their parents as 57 distinct families, and 922 nuclear families.
RESULTS: Familial correlations and heritability calculations reveal a significant genetic component to tIgE in these data (heritability = 26%). The most parsimonious model to explain tIgE distribution indicated by the reduced pedigree structure was a two-distribution Mendelian model. However, larger and non-genetic models could not be rejected. Simulations over 200 replicates, performed to evaluate the reliability of this model, indicated that using restricted genealogical information had minimal impact on results of segregation analyses performed here.
Affiliation(s)
- Rasika A Mathias
- Department of Epidemiology, Bloomberg School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD 21224, USA.
9
Falchi M, Forabosco P, Mocci E, Borlino CC, Picciau A, Virdis E, Persico I, Parracciani D, Angius A, Pirastu M. A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia. Am J Hum Genet 2004; 75:1015-31. [PMID: 15478097 PMCID: PMC1182138 DOI: 10.1086/426155] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Received: 08/13/2004] [Accepted: 09/22/2004] [Indexed: 11/03/2022]
Abstract
A powerful approach to mapping the genes for complex traits is to study isolated founder populations, in which genetic heterogeneity and environmental noise are likely to be reduced and in which extended genealogical data are often available. Using graph theory, we applied an approach that involved sampling from the large number of pairwise relationships present in an extended genealogy to reconstruct sets of subpedigrees that maximize the useful information for linkage mapping while minimizing calculation burden. We investigated, through simulation, the properties of the different sets in terms of bias in identity-by-descent (IBD) estimation and power decrease under various genetic models. We applied this approach to a small isolated population from Sardinia, the village of Talana, consisting of a unique large and complex pedigree, and performed a genomewide search through variance-components linkage analysis for serum lipid levels. We identified a region of significant linkage on chromosome 2 for total serum cholesterol and low-density lipoprotein (LDL) cholesterol. Through higher-density mapping, we obtained an increased linkage for both traits on 2q21.2-q24.1, with a LOD score of 4.3 for total serum cholesterol and of 3.9 for LDL cholesterol. A replication study was performed in an independent and larger set from a genetically differentiated isolated population of the same region of Sardinia, the village of Perdasdefogu. We obtained consistent linkage to the region for total serum cholesterol (LOD score 1.4) and LDL cholesterol (LOD score 2.2), with a level of concordance uncommon for complex traits, and refined the location of the quantitative-trait locus. Interestingly, the 2q21.1-22 region has also been linked to premature coronary heart disease in Finns, and, in the adjacent 2q14 region, significant linkage with triglycerides has been reported in Hutterites.
10
Dyer TD, Blangero J, Williams JT, Göring HH, Mahaney MC. The effect of pedigree complexity on quantitative trait linkage analysis. Genet Epidemiol 2002; 21 Suppl 1:S236-43. [PMID: 11793675 DOI: 10.1002/gepi.2001.21.s1.s236] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Due to the computational difficulties of performing linkage analysis on large complex pedigrees, most investigators resort to simplifying such pedigrees by some ad hoc strategy. In this paper, we suggest an analytical method to compare the power of various pedigree simplification schemes by using the asymptotic distribution of the likelihood-ratio statistic. We applied the method to the large Hutterite pedigree. Our results indicate that the breaking and reduction of inbreeding loops can greatly diminish the power to localize quantitative trait loci. We also present an efficient Monte Carlo method for estimating identity-by-descent allele sharing in large complex pedigrees. This method is used to facilitate a linkage analysis of serum IgE levels in the Hutterites without simplifying the pedigree.
Affiliation(s)
- T D Dyer
- Department of Genetics, Southwest Foundation for Biomedical Research, 7620 NW Loop 410, P.O. Box 760549, San Antonio, TX 78245-0549, USA