1
Wyngaard GA, Skern-Mauritzen R, Malde K, Prendergast R, Peruzzi S. The salmon louse genome may be much larger than sequencing suggests. Sci Rep 2022; 12:6616. [PMID: 35459797 PMCID: PMC9033869 DOI: 10.1038/s41598-022-10585-2] [Received: 07/07/2021] [Accepted: 04/08/2022]
Abstract
The genome size of organisms impacts their evolution and biology and is often assumed to be characteristic of a species. Here we present the first published estimates of genome size of the ecologically and economically important ectoparasite Lepeophtheirus salmonis (Copepoda, Caligidae). Four independent genome assemblies of the North Atlantic subspecies Lepeophtheirus salmonis salmonis, including two chromosome-level assemblies, range from 665 to 790 Mbp. These genome assemblies are congruent in their findings and appear very complete, with Benchmarking Universal Single-Copy Orthologs analyses finding > 92% of expected genes and transcriptome datasets routinely mapping > 90% of reads. However, two cytometric techniques, flow cytometry and Feulgen image analysis densitometry, yield measurements of 1.3-1.6 Gb for the haploid genome. Interestingly, earlier cytometric measurements reported genome sizes of 939 and 567 Mbp in L. salmonis salmonis samples from the Bay of Fundy and Norway, respectively. Available data thus suggest that the genome sizes of salmon lice are variable. Current understanding of eukaryotic genome dynamics suggests that the most likely explanation for such variability involves repetitive DNA, which for L. salmonis makes up ≈ 60% of the genome assemblies.
Affiliation(s)
- Grace A Wyngaard
  - Department of Biology, James Madison University, Harrisonburg, VA, USA
- Ketil Malde
  - Institute of Marine Research, Bergen, Norway
  - Department of Informatics, University of Bergen, Bergen, Norway
- Stefano Peruzzi
  - Department of Arctic Marine Biology, UiT-the Arctic University of Norway, Tromsø, Norway
2
Skern-Mauritzen R, Malde K, Eichner C, Dondrup M, Furmanek T, Besnier F, Komisarczuk AZ, Nuhn M, Dalvin S, Edvardsen RB, Klages S, Huettel B, Stueber K, Grotmol S, Karlsbakk E, Kersey P, Leong JS, Glover KA, Reinhardt R, Lien S, Jonassen I, Koop BF, Nilsen F. The salmon louse genome: Copepod features and parasitic adaptations. Genomics 2021; 113:3666-3680. [PMID: 34403763 DOI: 10.1016/j.ygeno.2021.08.002] [Received: 03/26/2021] [Revised: 07/06/2021] [Accepted: 08/03/2021]
Abstract
Copepods encompass numerous ecological roles including parasites, detritivores and phytoplankton grazers. Nonetheless, copepod genome assemblies remain scarce. Lepeophtheirus salmonis is an economically and ecologically important ectoparasitic copepod found on salmonid fish. We present the 695.4 Mbp L. salmonis genome assembly containing ≈60% repetitive regions and 13,081 annotated protein-coding genes. The genome comprises 14 autosomes and a ZZ-ZW sex chromosome system. Assembly assessment identified 92.4% of the expected arthropod genes. Transcriptomics supported annotation and indicated a marked shift in gene expression after host attachment, including apparent downregulation of genes related to circadian rhythm coinciding with abandoning diurnal migration. The genome shows evolutionary signatures including loss of genes needed for peroxisome biogenesis, presence of numerous FNII domains, and an incomplete heme homeostasis pathway, suggesting that heme proteins are obtained from the host. Despite repeated development of resistance against chemical treatments, L. salmonis exhibits low numbers of many genes involved in detoxification.
Affiliation(s)
- Ketil Malde
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Christiane Eichner
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Michael Dondrup
  - Computational Biology Unit, Department of Informatics, University of Bergen, Thormøhlens Gate 55, 5008 Bergen, Norway
- Tomasz Furmanek
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
- Francois Besnier
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
- Anna Zofia Komisarczuk
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Michael Nuhn
  - EMBL-The European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Sussie Dalvin
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
- Rolf B Edvardsen
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
- Sven Klages
  - Sequencing Core Facility, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Bruno Huettel
  - Max Planck Genome Centre Cologne, Carl von Linné Weg 10, D-50829 Köln, Germany
- Kurt Stueber
  - Max Planck Genome Centre Cologne, Carl von Linné Weg 10, D-50829 Köln, Germany
- Sindre Grotmol
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Egil Karlsbakk
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Paul Kersey
  - EMBL-The European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Jong S Leong
  - Department of Biology, University of Victoria, Victoria, British Columbia V8W 3N5, Canada
- Kevin A Glover
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
- Richard Reinhardt
  - Max Planck Genome Centre Cologne, Carl von Linné Weg 10, D-50829 Köln, Germany
- Sigbjørn Lien
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Oluf Thesens vei 6, 1433 Ås, Norway
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Thormøhlens Gate 55, 5008 Bergen, Norway
- Ben F Koop
  - Department of Biology, University of Victoria, Victoria, British Columbia V8W 3N5, Canada
- Frank Nilsen
  - Institute of Marine Research, Postboks 1870 Nordnes, 5817 Bergen, Norway
  - Sea Lice Research Centre, Department of Biological Sciences, University of Bergen, Thormøhlens Gate 53, 5006 Bergen, Norway
3
Jansson E, Besnier F, Malde K, André C, Dahle G, Glover KA. Genome wide analysis reveals genetic divergence between Goldsinny wrasse populations. BMC Genet 2020; 21:118. [PMID: 33036553 PMCID: PMC7547435 DOI: 10.1186/s12863-020-00921-8] [Received: 01/21/2020] [Accepted: 09/24/2020]
Abstract
Background: Marine fish populations are often characterized by high levels of gene flow and correspondingly low genetic divergence. This makes it challenging to define management units. Goldsinny wrasse (Ctenolabrus rupestris) is a heavily exploited species due to its importance as a cleaner-fish in commercial salmonid aquaculture. However, the population genetic structure of this species is still largely unresolved. Here, full-genome sequencing was used to produce the first genomic reference for this species, to study population-genomic divergence among four geographically distinct populations, and to identify informative SNP markers for future studies.
Results: After construction of a de novo assembly, the genome was estimated to be highly polymorphic and ~600 Mbp in size. 33,235 SNPs were thereafter selected to assess genomic diversity and differentiation among four populations collected from Scandinavia, Scotland, and Spain. Global FST among these populations was 0.015–0.092. Approximately 4% of the investigated loci were identified as putative global outliers, and ~1% within Scandinavia. SNPs showing large divergence (FST > 0.15) were picked as candidate diagnostic markers for population assignment. One hundred seventy-three of the most diagnostic SNPs between the two Scandinavian populations were validated by genotyping 47 individuals from each end of the species’ Scandinavian distribution range. Sixty-nine of these SNPs were significantly (p < 0.05) differentiated (mean FST_173_loci = 0.065, FST_69_loci = 0.140). Using these validated SNPs, individuals were assigned with high probability (≥ 94%) to their populations of origin.
Conclusions: Goldsinny wrasse displays a highly polymorphic genome and substantial population genomic structure. Diversifying selection likely affects population structuring globally and within Scandinavia. The diagnostic loci identified now provide a promising and cost-efficient tool to investigate goldsinny wrasse populations further.
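The FST screen described in this abstract can be illustrated with a minimal Python sketch. It computes Wright's FST for a single biallelic SNP from the allele frequencies in two populations and keeps loci above the 0.15 threshold mentioned in the abstract. The abstract does not say which FST estimator the authors used, so the heterozygosity-based definition, the equal weighting of the two populations, the function names, and the example frequencies are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    # mean expected heterozygosity within populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def candidate_markers(
    freqs: Dict[str, Tuple[float, float]], threshold: float = 0.15
) -> List[str]:
    """Return SNP names whose F_ST exceeds the threshold."""
    return [
        snp for snp, (p1, p2) in freqs.items()
        if fst_biallelic(p1, p2) > threshold
    ]


# Hypothetical allele frequencies, not data from the study.
example = {
    "snp_a": (0.50, 0.52),
    "snp_b": (0.20, 0.80),
    "snp_c": (0.10, 0.30),
}
print(candidate_markers(example))
```

With these made-up frequencies only `snp_b` passes the threshold; a real analysis would also need to account for sample sizes and sequencing error.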
Affiliation(s)
- Eeva Jansson
  - Institute of Marine Research, P. O. Box 1870, Nordnes, 5817, Bergen, Norway
- Francois Besnier
  - Institute of Marine Research, P. O. Box 1870, Nordnes, 5817, Bergen, Norway
- Ketil Malde
  - Institute of Marine Research, P. O. Box 1870, Nordnes, 5817, Bergen, Norway
- Carl André
  - Department of Marine Sciences-Tjärnö, University of Gothenburg, 45296, Strömstad, Sweden
- Geir Dahle
  - Institute of Marine Research, P. O. Box 1870, Nordnes, 5817, Bergen, Norway
- Kevin A Glover
  - Institute of Marine Research, P. O. Box 1870, Nordnes, 5817, Bergen, Norway
  - Institute of Biology, University of Bergen, P. O. Box 7803, 5020, Bergen, Norway
4
Boitsov S, Grøsvik BE, Nesje G, Malde K, Klungsøyr J. Levels and temporal trends of persistent organic pollutants (POPs) in Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus) from the southern Barents Sea. Environ Res 2019; 172:89-97. [PMID: 30782539 DOI: 10.1016/j.envres.2019.02.008] [Received: 10/23/2018] [Revised: 01/08/2019] [Accepted: 02/07/2019]
Abstract
Liver samples of two gadoid species, Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus), sampled in the southern Barents Sea in the period 1992-2015, were studied for the levels of six types of persistent organic pollutants (POPs): polychlorinated biphenyls (PCBs), chlorinated organic pesticides (DDTs, hexachlorocyclohexanes (HCHs), hexachlorobenzene (HCB), trans-nonachlor (TNC)), and polybrominated diphenyl ethers (PBDEs). Higher average levels were found in cod than in haddock. Sampling approximately every third year allowed studies of temporal trends for all the compound groups except PBDEs. Time series are reported for 1992-2015 for Atlantic cod and for 1998-2015 for haddock. Decreasing temporal trends have been modeled in cod for the analyzed POPs for this time period. The decrease seems to be slowing down in the later years. HCB levels showed the least decrease with time among all the contaminants, with the poorest fit to the proposed model. Similar time trends were found in haddock, but the decrease is less apparent due to the shorter time series. The observed time trends of legacy POPs document the effectiveness of efforts during the 1990s to reduce the levels of these contaminants in the marine environment, but call into question the possibility of eliminating them altogether from the marine environment in the foreseeable future.
Affiliation(s)
- Stepan Boitsov
  - Institute of Marine Research, P.O. Box 1870 Nordnes, 5817 Bergen, Norway
- Guri Nesje
  - Institute of Marine Research, P.O. Box 1870 Nordnes, 5817 Bergen, Norway
- Ketil Malde
  - Institute of Marine Research, P.O. Box 1870 Nordnes, 5817 Bergen, Norway
- Jarle Klungsøyr
  - Institute of Marine Research, P.O. Box 1870 Nordnes, 5817 Bergen, Norway
5
Abstract
The age structure of a fish population has important implications for recruitment processes and population fluctuations, and is a key input to fisheries-assessment models. The current method of determining age structure relies on manually reading age from otoliths, and the process is labor intensive and dependent on specialist expertise. Recent advances in machine learning have provided methods that have been remarkably successful in a variety of settings, with potential to automate analysis that previously required manual curation. Machine learning models have previously been successfully applied to object recognition and similar image analysis tasks. Here we investigate whether deep learning models can also be used for estimating the age of otoliths from images. We adapt a pre-trained convolutional neural network designed for object recognition, to estimate the age of fish from otolith images. The model is trained and validated on a large collection of images of Greenland halibut otoliths. We show that the model works well, and that its precision is comparable to documented precision obtained by human experts. Automating this analysis may help to improve consistency, lower cost, and increase the extent of age estimation. Given that adequate data are available, this method could also be used to estimate age of other species using images of otoliths or fish scales.
Affiliation(s)
- Endre Moen
  - Institute of Marine Research, Bergen, Norway
- Alf Harbitz
  - Institute of Marine Research, Bergen, Norway
- Ketil Malde
  - Institute of Marine Research, Bergen, Norway
  - Department of Informatics, University of Bergen, Norway
6
Malde K, Seliussen BB, Quintela M, Dahle G, Besnier F, Skaug HJ, Øien N, Solvang HK, Haug T, Skern-Mauritzen R, Kanda N, Pastene LA, Jonassen I, Glover KA. Whole genome resequencing reveals diagnostic markers for investigating global migration and hybridization between minke whale species. BMC Genomics 2017; 18:76. [PMID: 28086785 PMCID: PMC5237217 DOI: 10.1186/s12864-016-3416-5] [Received: 08/02/2016] [Accepted: 12/12/2016]
Abstract
Background: In the marine environment, where there are few absolute physical barriers, contemporary contact between previously isolated species can occur across great distances, and in some cases, may be inter-oceanic. An example of this can be seen in the minke whale species complex. Antarctic minke whales are genetically and morphologically distinct from the common minke found in the north Atlantic and Pacific oceans, and the two species are estimated to have been isolated from each other for 5 million years or more. Recent atypical migrations from the southern to the northern hemisphere have been documented, and fertile hybrids and back-crossed individuals between the two species have also been identified. However, it is not known whether this represents a contemporary event, potentially driven by ecosystem changes in the Antarctic, or a sporadic occurrence happening over an evolutionary time-scale. We successfully used whole genome resequencing to identify a panel of diagnostic SNPs which now enable us to address this evolutionary question.
Results: A large number of SNPs displaying fixed or nearly fixed allele frequency differences among the minke whale species were identified from the sequence data. Five panels of putatively diagnostic markers were established on a genotyping platform for validation of allele frequencies: two panels (26 and 24 SNPs) separating the two species of minke whale, and three panels (22, 23, and 24 SNPs) differentiating the three subspecies of common minke whale. The panels were validated against a set of reference samples, demonstrating the ability to accurately identify back-crossed whales up to three generations.
Conclusions: This work has resulted in the development of a panel of novel diagnostic genetic markers to address inter-oceanic and global contact among the genetically isolated minke whale species and sub-species. These markers, including a globally relevant genetic reference data set for this species complex, are now openly available for researchers interested in identifying other potential whale hybrids in the world’s oceans. The approach used here, combining whole genome resequencing and high-throughput genotyping, represents a universal approach to develop similar tools for other species and population complexes.
Affiliation(s)
- Ketil Malde
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
  - Department of Informatics, University of Bergen, N-5020, Bergen, Norway
- María Quintela
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
- Geir Dahle
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
- Francois Besnier
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
- Hans J Skaug
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
  - Department of Mathematics, University of Bergen, N-5020, Bergen, Norway
- Nils Øien
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
- Hiroko K Solvang
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
- Tore Haug
  - Institute of Marine Research, PO box 6404, N-9294, Tromsø, Norway
- Naohisa Kanda
  - Institute of Cetacean Research, Toyomi-cho 4-5, Chuo-ku, Tokyo, 104-0055, Japan
  - Japan NUS Co., Ltd, Nishi-Shinjuku Kimuraya Bldg 5F, 7-5-25, Nishi-Shinjuku, 160-0023, Japan
- Luis A Pastene
  - Institute of Cetacean Research, Toyomi-cho 4-5, Chuo-ku, Tokyo, 104-0055, Japan
- Inge Jonassen
  - Department of Informatics, University of Bergen, N-5020, Bergen, Norway
- Kevin A Glover
  - Institute of Marine Research, PO box 1870, Nordnes, N-5817, Bergen, Norway
  - Department of Biology, University of Bergen, N-5020, Bergen, Norway
7
Eichner C, Dalvin S, Skern-Mauritzen R, Malde K, Kongshaug H, Nilsen F. Characterization of a novel RXR receptor in the salmon louse (Lepeophtheirus salmonis, Copepoda) regulating growth and female reproduction. BMC Genomics 2015; 16:81. [PMID: 25765704 PMCID: PMC4333900 DOI: 10.1186/s12864-015-1277-y] [Received: 11/11/2014] [Accepted: 01/22/2015]
Abstract
Background: Nuclear receptors have crucial roles in all metazoan animals as regulators of gene transcription. A wide range of studies have elucidated the molecular and biological significance of nuclear receptors, but there are still a large number of animals for which knowledge is very limited. In the present study we have identified an RXR type of nuclear receptor in the salmon louse (Lepeophtheirus salmonis) (i.e. LsRXR). RXR is one of the two partners of the Ecdysteroid receptor in arthropods, the receptor for the main molting hormone 20-hydroxyecdysone (E20), which has a wide array of effects in arthropods.
Results: Five different LsRXR transcripts were identified by RACE, showing large differences in domain structure. The largest isoforms contained a complete DNA binding domain (DBD) and ligand binding domain (LBD), whereas some variants had an incomplete DBD or none. LsRXR is transcribed in several tissues in the salmon louse including ovary, subcuticular tissue, intestine and glands. Q-PCR shows that the LsRXR mRNA levels vary throughout the L. salmonis life cycle. We also show that the truncated LsRXR transcript comprises about 50% in all examined samples. We used RNAi to knock down the transcription in adult reproducing female lice. This resulted in close to zero viable offspring. We also assessed the LsRXR RNAi effects using an L. salmonis microarray and saw significant effects on transcription in the female lice. Transcription of the major yolk proteins was strongly reduced by knock-down of LsRXR. Genes involved in lipid metabolism and transport were also downregulated. Furthermore, different types of growth processes were upregulated, and many cuticle proteins were present in this group.
Conclusions: The present study demonstrates the significance of LsRXR in adult female L. salmonis and discusses the functional aspects in relation to other arthropods. LsRXR has a unique structure that should be elucidated in the future.
Affiliation(s)
- Christiane Eichner
  - Department of Biology, Sea Lice Research Centre, University of Bergen, Bergen, Norway
- Sussie Dalvin
  - Department of Biology, Sea Lice Research Centre, University of Bergen, Bergen, Norway
  - Institute of Marine Research, Bergen, Norway
- Ketil Malde
  - Institute of Marine Research, Bergen, Norway
- Heidi Kongshaug
  - Department of Biology, Sea Lice Research Centre, University of Bergen, Bergen, Norway
- Frank Nilsen
  - Department of Biology, Sea Lice Research Centre, University of Bergen, Bergen, Norway
8
Besnier F, Kent M, Skern-Mauritzen R, Lien S, Malde K, Edvardsen RB, Taylor S, Ljungfeldt LER, Nilsen F, Glover KA. Human-induced evolution caught in action: SNP-array reveals rapid amphi-atlantic spread of pesticide resistance in the salmon ecotoparasite Lepeophtheirus salmonis. BMC Genomics 2014; 15:937. [PMID: 25344698 PMCID: PMC4223847 DOI: 10.1186/1471-2164-15-937] [Received: 06/10/2014] [Accepted: 10/16/2014]
Abstract
Background: The salmon louse, Lepeophtheirus salmonis, is an ectoparasite of salmonids that causes huge economic losses in salmon farming, and has also been causatively linked with declines of wild salmonid populations. Lice control on farms is reliant upon a few groups of pesticides that have all shown time-limited efficiency due to resistance development. However, to date, this example of human-induced evolution is poorly documented at the population level due to the lack of molecular tools. As such, important evolutionary and management questions, linked to the development and dispersal of pesticide resistance in this parasite, remain unanswered. Here, we introduce the first Single Nucleotide Polymorphism (SNP) array for the salmon louse, which includes 6000 markers, and present a population genomic scan using this array on 576 lice from twelve farms distributed across the North Atlantic.
Results: Our results support the hypothesis of a single panmictic population of lice in the Atlantic and, importantly, revealed very strong selective sweeps on linkage groups 1 and 5. These sweeps included candidate genes potentially connected to pesticide resistance. After genotyping a further 576 lice from 12 full sibling families, a genome-wide association analysis established a highly significant association between the major sweep on linkage group 5 and resistance to emamectin benzoate, the most widely used pesticide in salmonid aquaculture for more than a decade.
Conclusions: The analysis of conserved haplotypes across samples from the Atlantic strongly suggests that emamectin benzoate resistance developed at a single source and rapidly spread across the Atlantic between 1999, when the chemical was first introduced, and 2010, when samples for the present study were obtained. These results provide unique insights into the development and spread of pesticide resistance in the marine environment, and identify a small genomic region strongly linked to emamectin benzoate resistance. Finally, these results have highly significant implications for the way pesticide resistance is considered and managed within the aquaculture industry.
9
Abstract
Background: High-throughput sequencing is a cost-effective method for identifying genetic variation, and it is currently in use on a large scale across the field of biology, including ecology and population genetics. Correctly identifying variable sites and allele frequencies from sequencing data remains challenging, in large part due to artifacts and biases inherent in the sequencing process. Selecting variants that are diagnostic is commonly done using diversity statistics like FST, but these measures are not ideal for the task.
Results: Here, we develop a method that directly calculates the expected amount of information gained from observing each variant site. We then develop and implement a conservative estimator that takes into account uncertainty introduced by sampling bias and sequencing error. This estimator is applied to simulated and real sequencing data, and we discuss how it performs compared to the commonly used existing methods for identifying diagnostic polymorphisms.
Conclusion: The expected information content gives an easy-to-interpret measure for the usefulness of variant sites. The results show that we achieve a clear separation between true variants and noise, allowing us to select candidate sites with a high degree of confidence.
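To make the idea of "expected information gained from observing a variant site" concrete, the sketch below computes the mutual information, in bits, between population of origin and the allele observed at one biallelic site, given the allele frequency in each of two populations. This is only a simplified illustration under stated assumptions (known allele frequencies, equal prior probability for the two populations). It is not the conservative estimator described in the abstract, which also accounts for sampling bias and sequencing error, and the function names are hypothetical.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a biallelic site with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def expected_information(p1: float, p2: float) -> float:
    """Expected bits gained about population membership from one allele.

    Assumes the allele is drawn from population 1 or 2 with equal prior
    probability, and that p1 and p2 are the true allele frequencies.
    Computed as I(population; allele) = H(allele) - H(allele | population).
    """
    p_mix = (p1 + p2) / 2
    h_allele = binary_entropy(p_mix)
    h_given_pop = (binary_entropy(p1) + binary_entropy(p2)) / 2
    return h_allele - h_given_pop


# Hypothetical frequencies: a fixed difference, a partial difference, no difference.
for p1, p2 in [(0.0, 1.0), (0.2, 0.8), (0.3, 0.3)]:
    print(p1, p2, round(expected_information(p1, p2), 3))
```

A site with a fixed difference yields one full bit per observed allele, while a site with identical frequencies yields none, which is what makes the measure straightforward to interpret.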
10
Edvardsen RB, Dalvin S, Furmanek T, Malde K, Mæhle S, Kvamme BO, Skern-Mauritzen R. Gene expression in five salmon louse (Lepeophtheirus salmonis, Krøyer 1837) tissues. Mar Genomics 2014; 18 Pt A:39-44. [PMID: 24999079 DOI: 10.1016/j.margen.2014.06.008] [Received: 03/12/2014] [Revised: 06/17/2014] [Accepted: 06/21/2014]
Abstract
The Atlantic salmon, Salmo salar L., is an important species both for traditional fishery and fish farming. Many Atlantic salmon stocks have been declining, and a suspected main contributor to this decline is the salmon louse (Lepeophtheirus salmonis), a parasitic copepod living off the salmonid host's epidermal tissues and blood. Contributing to the growing body of knowledge on the molecular biology of the salmon louse, we have utilized a microarray containing 11,100 salmon louse genes to study the gene expression patterns in selected tissues. This approach has yielded information about potential functions of the transcripts and tissues. Microarray analyses were performed on subcuticular and frontal (neuronal and gland enriched tissue) tissues, as well as gut, ovary and testes of adult lice. Tissue-specific transcriptomes were evident, allowing us to address main traits of functional partitioning between tissues and providing valuable insight into the biology of the louse. The results furthermore represent an important tool and resource for further experiments.
Affiliation(s)
- Sussie Dalvin
  - Institute of Marine Research, P.O. Box 1870, Nordnes, 5817 Bergen, Norway
- Tomasz Furmanek
  - Institute of Marine Research, P.O. Box 1870, Nordnes, 5817 Bergen, Norway
- Ketil Malde
  - Institute of Marine Research, P.O. Box 1870, Nordnes, 5817 Bergen, Norway
- Stig Mæhle
  - Institute of Marine Research, P.O. Box 1870, Nordnes, 5817 Bergen, Norway
- Bjørn Olav Kvamme
  - Institute of Marine Research, P.O. Box 1870, Nordnes, 5817 Bergen, Norway
11
Abstract
Background: The field of population genetics uses the genetic composition of populations to study the effects of ecological and evolutionary factors, including selection, genetic drift, mating structure, and migration. Until recently, these studies were usually based upon the analysis of relatively few (typically 10–20) DNA markers on samples from multiple populations. In contrast, high-throughput sequencing provides large amounts of data and consequently very high resolution genetic information. Recent technological developments are rapidly making this a cost-effective alternative. In addition, sequencing allows both the direct study of genomic differences between populations and the discovery of single nucleotide polymorphism markers that can subsequently be used in high-throughput genotyping. Much of the analysis in population genetics was developed before large-scale sequencing became feasible. Methods often do not take into account the characteristics of the different sequencing technologies and, consequently, may not always be well suited to this kind of data.
Results: Although the FlowSim suite of tools originally targeted simulation of de novo 454 genomics data, recent developments and enhancements make it suitable also for simulating other kinds of data. We examine its application to population genomics, and provide examples and supplementary scripts and utilities to aid in this task.
Conclusions: Simulation is an important tool to study and develop methods in many fields, and here we demonstrate how to simulate a high-throughput sequencing dataset for population genomics.
Affiliation(s)
- Ketil Malde
  - Institute of Marine Research, Nordnesgaten 50, Bergen, Norway
12
Skern-Mauritzen R, Malde K, Besnier F, Nilsen F, Jonassen I, Reinhardt R, Koop B, Dalvin S, Mæhle S, Kongshaug H, Glover K. How does sequence variability affect de novo assembly quality? J Nat Hist 2013. [DOI: 10.1080/00222933.2012.738833]
13
Abstract
Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences that provide very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, sequences from a particular species, sequences that have known 3D structures, or sequences that have a reliable (curated) function annotation. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher sensitivity alignments against the target database.
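The core step of combining a query-to-intermediate alignment with an intermediate-to-target alignment can be sketched as follows. The abstract does not describe how the combination is implemented, so this is an assumption-laden illustration: alignments are reduced to single ungapped segments, scores are ignored, and the `Segment` type and `compose` function are hypothetical names, not part of any published tool.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """An ungapped aligned block: src[src_start + i] pairs with dst[dst_start + i]."""
    src_start: int
    dst_start: int
    length: int


def compose(query_to_mid: Segment, mid_to_target: Segment) -> Optional[Segment]:
    """Build a transitive query-to-target segment via a shared intermediate.

    Only the part of the intermediate sequence covered by both alignments
    can be carried over; returns None when the two do not overlap there.
    """
    # Overlap of the two alignments in intermediate-sequence coordinates.
    lo = max(query_to_mid.dst_start, mid_to_target.src_start)
    hi = min(
        query_to_mid.dst_start + query_to_mid.length,
        mid_to_target.src_start + mid_to_target.length,
    )
    if hi <= lo:
        return None
    return Segment(
        src_start=query_to_mid.src_start + (lo - query_to_mid.dst_start),
        dst_start=mid_to_target.dst_start + (lo - mid_to_target.src_start),
        length=hi - lo,
    )


# Hypothetical coordinates: query 0-50 aligns to intermediate 100-150,
# and intermediate 120-180 aligns to target 10-70.
print(compose(Segment(0, 100, 50), Segment(120, 10, 60)))
```

In this example the two alignments share intermediate positions 120 to 150, so query positions 20 to 50 are mapped onto target positions 10 to 40. A fuller treatment would handle gaps and derive a score for the transitive alignment.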
Affiliation(s)
- Ketil Malde
- Institute of Marine Research, Bergen, Norway.
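The indirect (transitive) alignment idea described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that an alignment is stored as a list of aligned position pairs; the function name `compose_alignments` and the example coordinates are hypothetical and are not taken from the paper, which does not specify its data structures here.

```python
def compose_alignments(query_to_mid, mid_to_target):
    """Compose two alignments into a transitive one.

    Each alignment is a list of (pos_a, pos_b) pairs of aligned positions.
    A query position is carried over to the target only when the
    intermediate position it aligns to is itself aligned to the target.
    """
    mid_lookup = dict(mid_to_target)  # intermediate position -> target position
    return [(q, mid_lookup[m]) for q, m in query_to_mid if m in mid_lookup]


# Hypothetical example: the query is aligned to a sequence in the large
# database, which in turn is aligned to a sequence in the target database.
query_to_mid = [(0, 5), (1, 6), (2, 7), (3, 9)]
mid_to_target = [(6, 0), (7, 1), (8, 2), (9, 3)]

transitive = compose_alignments(query_to_mid, mid_to_target)
print(transitive)  # [(1, 0), (2, 1), (3, 3)]
```

Query position 0 is dropped because intermediate position 5 has no counterpart in the target. How such composed alignments are scored is not covered by this sketch.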
|
14
|
Abstract
MOTIVATION In recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. The latter application in particular is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the estimation of diversity and composition of a sample. To date, there are several tools that aim to remove both sequencing noise and duplicates. Nevertheless, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information. RESULTS With the novel tool JATAC, we present an approach towards a more accurate duplicate removal by analysing flow values directly. Making use of previous findings on 454 flow data characteristics, we combine read clustering with Bayesian distance measures. Finally, we provide a benchmark with an existing algorithm. AVAILABILITY JATAC is freely available under the General Public License from http://malde.org/ketil/jatac/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Susanne Balzer
- Norwegian Marine Data Centre, Institute of Marine Research, P.O. Box 1870, N-5817 Bergen, Norway
|
15
|
Kleppe L, Edvardsen RB, Kuhl H, Malde K, Furmanek T, Drivenes Ø, Reinhardt R, Taranger GL, Wargelius A. Maternal 3'UTRs: from egg to onset of zygotic transcription in Atlantic cod. BMC Genomics 2012; 13:443. [PMID: 22937762 PMCID: PMC3462720 DOI: 10.1186/1471-2164-13-443] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/19/2012] [Accepted: 08/29/2012] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Zygotic transcription in fish embryos initiates around the time of gastrulation, and all prior development is initiated and controlled by maternally derived messenger RNAs. Atlantic cod egg and embryo viability is variable, and it is hypothesized that the early development depends upon the feature of these maternal RNAs. Both the length and the presence of specific motifs in the 3'UTR of maternal RNAs are believed to regulate expression and stability of the maternal transcripts. Therefore, the aim of this study was to characterize the overall composition and 3'UTR structure of the most common maternal RNAs found in cod eggs and pre-zygotic embryos. RESULTS 22229 Sanger-sequences were obtained from 3'-end sequenced cDNA libraries prepared from oocyte, 1-2 cell, blastula and gastrula stages. Quantitative PCR revealed that EST copy number below 9 did not reflect the gene expression profile. Consequently genes represented by fewer than 9 ESTs were excluded from downstream analyses, in addition to sequences with low-quality gene hits. This provided 12764 EST sequences, encoding 257 unique genes, for further analysis. Mitochondrial transcripts accounted for 45.9-50.6% of the transcripts isolated from the maternal stages, but only 12.2% of those present at the onset of zygotic transcription. 3'UTR length was predicted in nuclear sequences with poly-A tail, which identified 191 3'UTRs. Their characteristics indicated a more complex regulation of transcripts that are abundant prior to the onset of zygotic transcription. Maternal and stable transcripts had longer 3'UTR (mean 187.1 and 208.8 bp) and more 3'UTR isoforms (45.7 and 34.6%) compared to zygotic transcripts, where 15.4% had 3'UTR isoforms and the mean 3'UTR length was 76 bp. Also, diversity and the amount of putative polyadenylation motifs were higher in both maternal and stable transcripts. CONCLUSIONS We report on the most pronounced processes in the maternally transferred cod transcriptome. Maternal stages are characterized by a rich abundance of mitochondrial transcripts. Maternal and stable transcripts display longer 3'UTRs with more variation of both polyadenylation motifs and 3'UTR isoforms. These data suggest that cod eggs possess a complex array of maternal RNAs which likely act to tightly regulate early developmental processes in the newly fertilized egg.
Affiliation(s)
- Lene Kleppe
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Rolf B Edvardsen
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Heiner Kuhl
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, D-14195, Berlin-Dahlem, Germany
- Ketil Malde
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Tomasz Furmanek
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Øyvind Drivenes
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Richard Reinhardt
- Max-Planck Genome centre, MPI fuer Pflanzenzüchtungsforschung, Carl-von-Linné-Weg 10, D-80829, Koeln, Germany
- Geir L Taranger
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
- Anna Wargelius
- Institute of Marine Research, P. O. Box 1870, Nordnesgaten 50, 5817, Bergen, Norway
|
16
|
Sagstad A, Grotmol S, Kryvi H, Krossøy C, Totland GK, Malde K, Wang S, Hansen T, Wargelius A. Identification of vimentin- and elastin-like transcripts specifically expressed in developing notochord of Atlantic salmon (Salmo salar L.). Cell Tissue Res 2011; 346:191-202. [PMID: 22057848 DOI: 10.1007/s00441-011-1262-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/07/2011] [Accepted: 10/05/2011] [Indexed: 11/26/2022]
Abstract
The notochord functions as the midline structural element of all vertebrate embryos, and allows movement and growth at early developmental stages. Moreover, during embryonic development, notochord cells produce secreted factors that provide positional and fate information to a broad variety of cells within adjacent tissues, for instance those of the vertebrae, central nervous system and somites. Due to the large size of the embryo, the salmon notochord is useful to study as a model for exploring notochord development. To investigate factors that might be involved in notochord development, a normalized cDNA library was constructed from a mix of notochords from ∼500 to ∼800 day°. From the 1968 Sanger-sequenced transcripts, 22 genes were identified to be predominantly expressed in the notochord compared to other organs of salmon. Twelve of these genes were found to show expressional regulation around mineralization of the notochord sheath; 11 genes were up-regulated and one gene was down-regulated. Two genes were found to be specifically expressed in the notochord; these genes showed similarity to vimentin (acc. no GT297094) and elastin (acc. no GT297478). In-situ results showed that the vimentin-like transcript was expressed in both chordocytes and chordoblasts, whereas the elastin-like transcript was uniquely expressed in the chordoblasts lining the notochordal sheath. In salmon aquaculture, vertebral deformities are a common problem, and some malformations have been linked to the notochord. The expression of identified transcripts provides further insight into processes taking place in the developing notochord, prior to and during the early mineralization period.
Affiliation(s)
- Anita Sagstad
- Department of Biology, University of Bergen, P.O. Box 7800, NO-5020 Bergen, Norway.
|
17
|
Abstract
Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types. Results: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim. Availability: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim. Contact: susanne.balzer@imr.no
Affiliation(s)
- Susanne Balzer
- Institute of Marine Research, P.O. Box 1870, N-5817 Bergen, Norway.
|
18
|
Star B, Nederbragt AJ, Jentoft S, Grimholt U, Malmstrøm M, Gregers TF, Rounge TB, Paulsen J, Solbakken MH, Sharma A, Wetten OF, Lanzén A, Winer R, Knight J, Vogel JH, Aken B, Andersen O, Lagesen K, Tooming-Klunderud A, Edvardsen RB, Tina KG, Espelund M, Nepal C, Previti C, Karlsen BO, Moum T, Skage M, Berg PR, Gjøen T, Kuhl H, Thorsen J, Malde K, Reinhardt R, Du L, Johansen SD, Searle S, Lien S, Nilsen F, Jonassen I, Omholt SW, Stenseth NC, Jakobsen KS. The genome sequence of Atlantic cod reveals a unique immune system. Nature 2011; 477:207-10. [PMID: 21832995 DOI: 10.1038/nature10342] [Citation(s) in RCA: 527] [Impact Index Per Article: 40.5] [Received: 02/11/2011] [Accepted: 06/28/2011] [Indexed: 01/24/2023]
Abstract
Atlantic cod (Gadus morhua) is a large, cold-adapted teleost that sustains long-standing commercial fisheries and incipient aquaculture. Here we present the genome sequence of Atlantic cod, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared to other sequenced vertebrates. The genome assembly was obtained exclusively by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. The major histocompatibility complex (MHC) II is a conserved feature of the adaptive immune system of jawed vertebrates, but we show that Atlantic cod has lost the genes for MHC II, CD4 and invariant chain (Ii) that are essential for the function of this pathway. Nevertheless, Atlantic cod is not exceptionally susceptible to disease under natural conditions. We find a highly expanded number of MHC I genes and a unique composition of its Toll-like receptor (TLR) families. This indicates how the Atlantic cod immune system has evolved compensatory mechanisms in both adaptive and innate immunity in the absence of MHC II. These observations affect fundamental assumptions about the evolution of the adaptive immune system and its components in vertebrates.
Affiliation(s)
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
|
19
|
Balzer S, Malde K, Lanzen A, Sharma A, Jonassen I. Characteristics of 454 pyrosequencing data--enabling realistic simulation with flowsim. Bioinformatics 2011. [DOI: 10.1093/bioinformatics/btr384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
|
20
|
Abstract
UNLABELLED The SFF file format produced by Roche's 454 sequencing technology is a compact, binary format that contains the flow values that are used for base and quality calling of the reads. Applications, e.g. in metagenomics, often depend on accurate sequence information, and access to flow values is important to estimate the probability of errors. Unfortunately, the programs supplied by Roche for accessing this information are not publicly available. Flower is a program that can extract the information contained in SFF files, and convert it to various textual output formats. AVAILABILITY Flower is freely available under the General Public License.
Affiliation(s)
- Ketil Malde
- The Norwegian Marine Data Centre, Institute of Marine Research, Bergen, Norway.
|
21
|
Abstract
Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. Results: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. Availability: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/ Contact: susanne.balzer@imr.no; ketil.malde@imr.no
Affiliation(s)
- Susanne Balzer
- Institute of Marine Research, University of Bergen, Bergen, Norway.
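To make the notion of simulated flow values concrete, here is a minimal sketch of turning a DNA sequence into a noisy flowgram. It is not Flowsim's implementation: the abstract says Flowsim draws from empirical distributions, whereas this sketch assumes simple Gaussian noise around the homopolymer length. The flow order `TACG`, the noise level `sd`, and the function names are illustrative assumptions.

```python
import random


def simulate_flowgram(seq, flow_order="TACG", n_cycles=4, sd=0.1, rng=None):
    """Return one flow value per nucleotide flow.

    The ideal value of a flow is the length of the homopolymer run it
    incorporates; Gaussian noise (an assumption of this sketch) is added.
    """
    rng = rng or random.Random(0)
    flows, pos = [], 0
    for _ in range(n_cycles):
        for base in flow_order:
            run = 0
            while pos < len(seq) and seq[pos] == base:
                run += 1
                pos += 1
            flows.append(max(0.0, rng.gauss(run, sd)))
    return flows


def call_bases(flows, flow_order="TACG"):
    """Naive base calling: round each flow value to a homopolymer length."""
    return "".join(
        flow_order[i % len(flow_order)] * round(v) for i, v in enumerate(flows)
    )


ideal = simulate_flowgram("TTACG", n_cycles=1, sd=0.0)
print(ideal)              # [2.0, 1.0, 1.0, 1.0]
print(call_bases(ideal))  # TTACG
```

With `sd=0.0` the flowgram is noise-free and the read is recovered exactly. Raising `sd` makes long homopolymer runs increasingly likely to be miscalled, which is the kind of error a flow-level simulator is meant to reproduce.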
|
22
|
Edvardsen RB, Malde K, Mittelholzer C, Taranger GL, Nilsen F. EST resources and establishment and validation of a 16k cDNA microarray from Atlantic cod (Gadus morhua). Comp Biochem Physiol Part D Genomics Proteomics 2010; 6:23-30. [PMID: 20663723 DOI: 10.1016/j.cbd.2010.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 03/04/2010] [Revised: 06/21/2010] [Accepted: 06/22/2010] [Indexed: 11/28/2022]
Abstract
The Atlantic cod, Gadus morhua, is an important species both for traditional fishery and increasingly also in fish farming. The Atlantic cod is also under potential threat from various environmental changes such as pollution and climate change, but the biological impact of such changes is not well known, in particular when it comes to sublethal effects that can be difficult to assert. Modern molecular and genomic approaches have revolutionized biological research during the last decade, and offer new avenues to study biological functions and e.g. the impact of anthropogenic activities at different life-stages for a given organism. In order to develop genomic data and genomic tools for Atlantic cod we conducted a program where we constructed 20 cDNA libraries, and produced and analyzed 44006 expressed sequence tags (ESTs) from these. Several tissues are represented in the multiple cDNA libraries, which differ in either sexual maturation or immunological stimulation. This approach allowed us to identify genes that are expressed in particular tissues, life-stages or in response to specific stimuli, and also gives us information about potential functions of the transcripts. The ESTs were used to construct a 16k cDNA microarray to further investigate the cod transcriptome. Microarray analyses were performed on pylorus, pituitary gland, spleen and testis of sexually maturing male cod. The four different tissues displayed tissue specific transcriptomes demonstrating that the cDNA array is working as expected and will prove to be a powerful tool in further experiments.
|
23
|
Patel S, Malde K, Lanzén A, Olsen RH, Nerland AH. Identification of immune related genes in Atlantic halibut (Hippoglossus hippoglossus L.) following in vivo antigenic and in vitro mitogenic stimulation. Fish Shellfish Immunol 2009; 27:729-738. [PMID: 19751833 DOI: 10.1016/j.fsi.2009.09.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/02/2009] [Revised: 09/03/2009] [Accepted: 09/03/2009] [Indexed: 05/28/2023]
Abstract
To identify and characterize genes and proteins of the Atlantic halibut (Hippoglossus hippoglossus) immune system, six cDNA libraries were constructed from liver, kidney, spleen, peripheral blood, and thymus. Halibut were injected with nodavirus, infectious pancreatic necrosis virus (IPNV), or vibriosis vaccine and tissue samples were collected at various time points. Leukocytes from peripheral blood and spleen from stimulated and mock-injected fish were isolated and further in vitro activated with the mitogens, concanavalin A (Con A) and phorbol myristate acetate (PMA) to facilitate activation and proliferation. A total of 5117 high quality expressed sequence tags (ESTs) were identified and assembled into 781 contigs and 2796 singletons. Amongst these ESTs, 147 different putative immune related genes were identified. Several genes involved in innate and adaptive immune responses such as complement proteins, immunoglobulins, cell surface receptors, and cytokines and chemokines were identified. Of the immune related genes identified in this study, 44% had no match against any of the publicly available sequence data for halibut and thus can be considered as novel identification in halibut species. The approach of combining in vivo antigenic with in vitro mitogen stimulation, in addition to preparation of cDNA libraries from thymus enabled identification of many of the interesting genes including those involved in T-cell receptor complex.
Affiliation(s)
- Sonal Patel
- Institute of Marine Research (IMR), Bergen, Norway.
|
24
|
Abstract
MOTIVATION The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses. RESULTS This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA. AVAILABILITY http://malde.org/~ketil/qaa.
Affiliation(s)
- Ketil Malde
- Institute of Marine Research, Bergen, Norway.
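As a rough illustration of how quality values might enter an alignment score, the sketch below converts a Phred quality to an error probability and computes an expected match/mismatch score. This is an assumption-laden toy, not the scoring scheme derived in the paper: it assumes a plain +1/-1 scoring scheme, that only the query base carries a quality value, and that a miscalled base is equally likely to be any of the three other nucleotides. The names `phred_to_error` and `adjusted_score` are hypothetical.

```python
def phred_to_error(q):
    """Phred quality Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def adjusted_score(a, b, q, match=1.0, mismatch=-1.0):
    """Expected score of aligning called base `a` (quality `q`) to base `b`."""
    e = phred_to_error(q)
    if a == b:
        # Correct call -> true match; erroneous call -> true base differs from b.
        return (1 - e) * match + e * mismatch
    # Correct call -> true mismatch; erroneous call -> the true base is b with
    # probability 1/3 (a match), otherwise still a mismatch.
    return (1 - e) * mismatch + e * (match / 3 + 2 * mismatch / 3)


print(round(adjusted_score("A", "A", 10), 3))  # 0.8
print(round(adjusted_score("A", "A", 40), 3))  # 1.0
```

Under these assumptions a low-quality match contributes less than a high-quality one, and a low-quality mismatch is penalised less, which is the qualitative behaviour the abstract describes. Each score is a constant-time computation per cell, consistent with the small constant-factor overhead mentioned above.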
|
25
|
Abstract
Background Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats from other species (i.e., model organisms). There are few studies that investigate the effectiveness of this approach, or attempt to evaluate the different methods for identifying and masking repeats. Results Using zebrafish and medaka as example organisms, we show that accurate repeat masking is an important factor for obtaining a high quality clustering. Furthermore, we show that masking with standard repeat libraries based on curated genomic information from other species has little or no positive effect on the quality of the resulting EST clustering. Library based repeat masking, which often constitutes a computational bottleneck in the EST analysis pipeline, can therefore be reduced to species specific repeat libraries, or perhaps eliminated entirely. In contrast, substantially improved results can be achieved by applying a repeat library derived from a partial reference clustering (e.g., from mapping sequences against a partially sequenced genome). Conclusion Of the methods explored, we find that the best EST clustering is achieved after masking with repeat libraries that are species specific. In the absence of such libraries, library-less masking gives results superior to the current practice of using cross-species, genome-based libraries.
Affiliation(s)
- Ketil Malde
- Computational Biology Unit, Bergen Centre for Computational Sciences, University of Bergen, Norway.
|
26
|
Abstract
MOTIVATION Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis. RESULTS We present a fast, flexible and library-less method for masking repeats in EST sequences, based on match statistics within the EST collection. The method is not linked to a particular clustering algorithm. Extensive testing on datasets using different clustering methods and a genomic mapping as reference shows that this method gives results that are better than or as good as those obtained using RepeatMasker with a repeat library. AVAILABILITY The implementation of RBR is available under the terms of the GPL from http://www.ii.uib.no/~ketil/bioinformatics CONTACT ketil.malde@bccs.uib.no SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ketil Malde
- Computational Biology Unit, Bergen Centre for Computational Sciences, University of Bergen, Norway.
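The general idea of masking repeats from statistics of the EST collection itself, without an external library, can be sketched as follows. The abstract does not give RBR's actual match statistics, so this example substitutes a much simpler stand-in: it counts exact k-mers across all sequences and masks any k-mer seen more than `max_count` times. The function name, the parameters, and the threshold rule are all assumptions for illustration.

```python
from collections import Counter


def mask_frequent_kmers(seqs, k, max_count):
    """Replace every k-mer occurring more than `max_count` times with N's."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    masked = []
    for s in seqs:
        chars = list(s)
        for i in range(len(s) - k + 1):
            # Look up k-mers in the original sequence, not the masked copy.
            if counts[s[i:i + k]] > max_count:
                chars[i:i + k] = "N" * k
        masked.append("".join(chars))
    return masked


# Hypothetical toy collection: three reads share the stretch AAAAAA.
ests = ["AAAAAACGT", "TTAAAAAAG", "CCAAAAAAT", "GATTACAGG"]
print(mask_frequent_kmers(ests, k=6, max_count=2))
# ['NNNNNNCGT', 'TTNNNNNNG', 'CCNNNNNNT', 'GATTACAGG']
```

A real collection would need a threshold that scales with expression depth, since highly expressed genes also produce frequent k-mers; that calibration is outside the scope of this sketch.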
|
27
|
Schneeberger K, Malde K, Coward E, Jonassen I. Masking repeats while clustering ESTs. Nucleic Acids Res 2005; 33:2176-80. [PMID: 15831790 PMCID: PMC1079970 DOI: 10.1093/nar/gki511] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/20/2004] [Revised: 03/10/2005] [Accepted: 03/28/2005] [Indexed: 11/15/2022] Open
Abstract
A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause in the process of clustering. Unlike traditional methods, repeats are inferred directly from the EST data; we do not rely on any external library of known repeats. This makes the method especially suitable for analysing the ESTs from organisms without good repeat libraries. We demonstrate that the result is very similar to performing standard repeat masking before clustering.
Affiliation(s)
- Ketil Malde
- Department of Informatics, University of Bergen, Bergen, Norway
- Eivind Coward
- Department of Informatics, University of Bergen, Bergen, Norway
- Inge Jonassen
- Computational Biology Unit, University of Bergen, Bergen, Norway
- Department of Informatics, University of Bergen, Bergen, Norway
|
28
|
Abstract
MOTIVATION EST sequences constitute an abundant, yet error prone resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences. RESULTS In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size, and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm, and perform an experiment where the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that our proposed algorithm in a majority of the cases produces consensus of higher quality than the established sequence assemblers and at a competitive speed. AVAILABILITY The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/ CONTACT ketil@ii.uib.no.
Affiliation(s)
- Ketil Malde
- Department of Informatics, University of Bergen, Norway.
|
29
|
Abstract
MOTIVATION Efficient clustering is important for handling the large amount of available EST sequences. Most contemporary methods are based on some kind of all-against-all comparison, resulting in a quadratic time complexity. A different approach is needed to keep up with the rapid growth of EST data. RESULTS A new, fast EST clustering algorithm is presented. Sub-quadratic time complexity is achieved by using an algorithm based on suffix arrays. A prototype implementation has been developed and run on a benchmark data set. The produced clusterings are validated by comparing them to clusterings produced by other methods, and the results are quite promising. AVAILABILITY The source code for the prototype implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bio/.
Affiliation(s)
- Ketil Malde
- Department of Informatics, University of Bergen, HIB, N5020 Norway.
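A toy version of suffix-based clustering shows why sorted suffixes avoid an explicit all-against-all comparison. The sketch assumes that two sequences belong together when they share an exact substring of length at least `k`, and it builds the sorted suffix list naively by sorting strings, which is not sub-quadratic; the paper's actual algorithm and its matching criterion are not given in the abstract. The function name and the example reads are hypothetical.

```python
def cluster_by_shared_substrings(seqs, k):
    """Cluster sequences that share an exact substring of length >= k."""
    # All suffixes of length >= k, each tagged with its sequence index.
    suffixes = sorted(
        (s[i:], idx) for idx, s in enumerate(seqs) for i in range(len(s) - k + 1)
    )
    parent = list(range(len(seqs)))  # union-find forest over sequences

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Suffixes with a common k-prefix form a contiguous run in sorted
    # order, so it is enough to compare each suffix with its neighbour.
    for (s1, i1), (s2, i2) in zip(suffixes, suffixes[1:]):
        if s1[:k] == s2[:k]:
            parent[find(i1)] = find(i2)

    clusters = {}
    for idx in range(len(seqs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


reads = ["ACGTACGTAA", "CGTACGTTTT", "GGGGCCCCGG"]
print(cluster_by_shared_substrings(reads, k=6))  # [[0, 1], [2]]
```

Reads 0 and 1 share the substring `CGTACG` and are merged, while read 2 stays alone. A production implementation would use a proper suffix array construction and handle reverse complements, neither of which is attempted here.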
|
30
|
Norderhaug IN, Sandberg S, Fosså SD, Forland F, Malde K, Kvinnsland S, Traaholt I, Rossiné BK, Førde OH. Health technology assessment and implications for clinical practice: the case of prostate cancer screening. Scand J Clin Lab Invest 2003; 63:331-8. [PMID: 14599155 DOI: 10.1080/00365510310002022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 04/27/2023]
Abstract
We describe an initiative to disseminate evidence from systematic reviews about the clinical effectiveness of prostate cancer screening to general practitioners and urologists in Norway. The Norwegian Centre for Health Technology Assessment invited The Norwegian Medical Association, The Norwegian Cancer Society, The Norwegian Board of Health, The Norwegian Urological Cancer Group and The Norwegian Patient Association to develop and disseminate clinical practice recommendations. The clinical effectiveness of prostate cancer screening has been assessed in nine independent systematic reviews, which are summarized in a joint INAHTA report. The conclusion was that there is no evidence from appropriately designed trials that early detection and treatment of prostate cancer can reduce mortality, morbidity or improve quality of life. The number of prostate-specific antigen (PSA) tests analysed in Norway increased by 47% [corrected] from 1996 to 1999; at the county level the increase ranged from 12 to 48%. On this background we disseminated leaflets with information about PSA and prostate cancer to 4100 general practitioners and specialists in urology. The main message was, i) PSA should not be taken in healthy men, ii) if the test is wanted, the physician is obliged to give information about the possible consequences. Despite efforts to anchor the information campaign within the mentioned organizations, this met with notable opposition from The Norwegian Urological Society. A survey among agencies within the INAHTA network showed that more than half of the countries within this collaboration have implemented guidelines or recommendations on prostate cancer screening. In conclusion, evidence obtained through an international collaboration such as the INAHTA collaboration may be used to develop and implement national guidelines or recommendations.
Affiliation(s)
- I N Norderhaug
- The Norwegian Centre for Health Technology Assessment, SINTEF Unimed, Oslo, Norway.
|
31
|
Malde K, Kvamme O, Ebbing H. [The action for correct fees--storm in a glass of water?]. Tidsskr Nor Laegeforen 1999; 119:3804-7. [PMID: 10574062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023] Open
|
32
|
Mouland G, Bratland B, Fagan M, Malde K, Rygh E, Welander F, Waerdahl E, Ytterdahl T. [Use of new antidepressive agents in general practice]. Tidsskr Nor Laegeforen 1998; 118:4497-500. [PMID: 9889632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023] Open
Abstract
There is much discussion about the use of the new antidepressant drugs. Some advocate a more liberal use while others criticize overprescribing. Our study examines the indication, dosage and length of treatment in five family practices. Records of 208 patients prescribed one of three new antidepressants in 1995 were reviewed. 90% had depression or anxiety while 10% received medication for pain or for other symptoms. Dosage and median length of treatment were in accordance with recommended guidelines. However, a number of patients discontinued their medication within a month of the start of treatment. The study did not examine the reason for this, but side effects or lack of effect are possible explanations. 36 patients were treated for a year or longer, and we raise the question of whether abstinence-like symptoms or the physicians' prescribing practice can explain this finding. The article also examines the increased sale of the new antidepressants in Norway and in Aust-Agder county specifically, the county in Norway with the largest per capita sale of these drugs.
|
33
|
Kvamme OJ, Ebbing H, Malde K. [Is the normal fee used incorrectly?]. Tidsskr Nor Laegeforen 1998; 118:2526-8. [PMID: 9667133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023] Open
|
34
|
Eskerud J, Nordby K, Malde K, Kahn H, Holtedahl KA, Wahl H, Graff-Iversen S. [Special care services--an advantage or a disadvantage?]. Tidsskr Nor Laegeforen 1988; 108:2753-5. [PMID: 3206488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023] Open
|
35
|
Lie H, Malde K, Sorteberg K, Hasvold O, Ekren T, Malm OJ. Long-term fluoride and calcium therapy of postmenopausal osteoporosis. J Oslo City Hosp 1982; 32:147-54. [PMID: 7153813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
|
36
|
Waldum HL, Malde K. [Epidemics of Yersinia enterocolitica infections]. Tidsskr Nor Laegeforen 1975; 95:1578-9, 1593, 1602. [PMID: 1179383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2022] Open
|