1
|
Comparative Genome Microsynteny Illuminates the Fast Evolution of Nuclear Mitochondrial Segments (NUMTs) in Mammals. Mol Biol Evol 2024; 41:msad278. [PMID: 38124445 PMCID: PMC10764098 DOI: 10.1093/molbev/msad278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/16/2023] [Accepted: 12/12/2023] [Indexed: 12/23/2023] Open
Abstract
The escape of DNA from mitochondria into the nuclear genome (nuclear mitochondrial DNA, NUMT) is an ongoing process. Although NUMTs are pervasively observed in eukaryotic genomes, their evolutionary trajectories in a mammal-wide context are poorly understood. The main challenge lies in the orthology assignment of NUMTs across species due to their fast evolution and chromosomal rearrangements over the past 200 million years. To address this issue, we systematically investigated the characteristics of NUMT insertions in 45 mammalian genomes and established a novel, synteny-based method to accurately predict orthologous NUMTs and ascertain their evolution across mammals. With a series of comparative analyses across taxa, we revealed that NUMTs may originate from nonrandom regions in mtDNA, are likely to be found in transposon-rich and intergenic regions, and are unlikely to code for functional proteins. Using our synteny-based approach, we leveraged 630 pairwise comparisons of genome-wide microsynteny and predicted the NUMT orthology relationships across 36 mammals. With the phylogenetic patterns of NUMT presence-and-absence across taxa, we constructed the ancestral state of NUMTs given the mammal tree using a coalescent method. We found support for the ancestral node of Fereuungulata within Laurasiatheria, whose subordinal relationships are still controversial. This study broadens our knowledge of NUMT insertion and evolution in mammalian genomes and highlights the merit of NUMTs as alternative genetic markers in phylogenetic inference.
|
2
|
Neuronal migration prevents spatial competition in retinal morphogenesis. Nature 2023; 620:615-624. [PMID: 37558872 DOI: 10.1038/s41586-023-06392-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2021] [Accepted: 06/30/2023] [Indexed: 08/11/2023]
Abstract
The concomitant occurrence of tissue growth and organization is a hallmark of organismal development [1-3]. This often means that proliferating and differentiating cells are found at the same time in a continuously changing tissue environment. How cells adapt to architectural changes to prevent spatial interference remains unclear. Here, to understand how cell movements that are key for growth and organization are orchestrated, we study the emergence of photoreceptor neurons that occur during the peak of retinal growth, using zebrafish, human tissue and human organoids. Quantitative imaging reveals that successful retinal morphogenesis depends on the active bidirectional translocation of photoreceptors, leading to a transient transfer of the entire cell population away from the apical proliferative zone. This pattern of migration is driven by cytoskeletal machineries that differ depending on the direction: microtubules are exclusively required for basal translocation, whereas actomyosin is involved in apical movement. Blocking the basal translocation of photoreceptors induces apical congestion, which hampers the apical divisions of progenitor cells and leads to secondary defects in lamination. Thus, photoreceptor migration is crucial to prevent competition for space, and to allow concurrent tissue growth and lamination. This shows that neuronal migration, in addition to its canonical role in cell positioning [4], can be involved in coordinating morphogenesis.
|
3
|
MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics 2023; 24:288. [PMID: 37464285 PMCID: PMC10354987 DOI: 10.1186/s12859-023-05385-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 63.0] [Received: 03/03/2023] [Accepted: 06/13/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
|
4
|
A novel nematode species from the Siberian permafrost shares adaptive mechanisms for cryptobiotic survival with C. elegans dauer larva. PLoS Genet 2023; 19:e1010798. [PMID: 37498820 PMCID: PMC10374039 DOI: 10.1371/journal.pgen.1010798] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/17/2023] [Accepted: 05/24/2023] [Indexed: 07/29/2023] Open
Abstract
Some organisms in nature have developed the ability to enter a state of suspended metabolism called cryptobiosis when environmental conditions are unfavorable. This state-transition requires execution of a combination of genetic and biochemical pathways that enable the organism to survive for prolonged periods. Recently, nematode individuals have been reanimated from Siberian permafrost after remaining in cryptobiosis. Preliminary analysis indicates that these nematodes belong to the genera Panagrolaimus and Plectus. Here, we present precise radiocarbon dating indicating that the Panagrolaimus individuals have remained in cryptobiosis since the late Pleistocene (~46,000 years). Phylogenetic inference based on our genome assembly and a detailed morphological analysis demonstrate that they belong to an undescribed species, which we named Panagrolaimus kolymaensis. Comparative genome analysis revealed that the molecular toolkit for cryptobiosis in P. kolymaensis and in C. elegans is partly orthologous. We show that biochemical mechanisms employed by these two species to survive desiccation and freezing under laboratory conditions are similar. Our experimental evidence also reveals that C. elegans dauer larvae can remain viable for longer periods in suspended animation than previously reported. Altogether, our findings demonstrate that nematodes evolved mechanisms potentially allowing them to suspend life over geological time scales.
|
5
|
A high-quality, haplotype-phased genome reconstruction reveals unexpected haplotype diversity in a pearl oyster. DNA Res 2022; 29:dsac035. [PMID: 36351462 PMCID: PMC9646362 DOI: 10.1093/dnares/dsac035] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2022] [Revised: 08/18/2022] [Accepted: 09/12/2022] [Indexed: 07/30/2023] Open
Abstract
Homologous chromosomes in the diploid genome are thought to contain equivalent genetic information, but this common concept has not been fully verified in animal genomes with high heterozygosity. Here we report a near-complete, haplotype-phased genome assembly of the pearl oyster, Pinctada fucata, using high-fidelity (HiFi) long reads and chromosome conformation capture data. This assembly includes 14 pairs of long scaffolds (>38 Mb) corresponding to chromosomes (2n = 28). The accuracy of the assembly, as measured by an analysis of k-mers, is estimated to be 99.99997%. Moreover, the two haplotypes contain 95.2% and 95.9%, respectively, of complete and single-copy BUSCO genes, demonstrating the high quality of the assembly. Transposons comprise 53.3% of the assembly and are a major contributor to structural variations. Despite overall collinearity between haplotypes, one of the chromosomal scaffolds contains megabase-scale non-syntenic regions, which have necessarily gone undetected and unresolved in conventional haplotype-merged assemblies. These regions encode expanded gene families of NACHT, DZIP3/hRUL138-like HEPN, and immunoglobulin domains, multiplying the immunity gene repertoire, which we hypothesize is important for the innate immune capability of pearl oysters. The pearl oyster genome provides insight into remarkable haplotype diversity in animals.
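Assembly accuracy figures such as the k-mer-based 99.99997% above are often restated on the Phred scale, QV = -10 * log10(1 - accuracy). The abstract does not report a QV itself, so the following is only a minimal sketch of that standard conversion; the function names are hypothetical and not taken from the paper.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a per-base accuracy (fraction in [0, 1)) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be a fraction in [0, 1)")
    # Phred definition: QV = -10 * log10(per-base error probability)
    return -10.0 * math.log10(1.0 - accuracy)


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV back to a per-base error probability."""
    return 10.0 ** (-qv / 10.0)


# The accuracy quoted in the abstract, written as a fraction (illustrative use).
reported_accuracy = 0.9999997
print(round(accuracy_to_qv(reported_accuracy), 1))
```

Under this definition, an accuracy of 99.99997% corresponds to roughly three errors per ten million bases.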
|
6
|
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
|
7
|
Abstract
A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.
|
8
|
|
9
|
EASI-FISH for thick tissue defines lateral hypothalamus spatio-molecular organization. Cell 2021; 184:6361-6377.e24. [PMID: 34875226 DOI: 10.1016/j.cell.2021.11.024] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 02/21/2021] [Revised: 08/22/2021] [Accepted: 11/12/2021] [Indexed: 11/17/2022]
Abstract
Determining the spatial organization and morphological characteristics of molecularly defined cell types is a major bottleneck for characterizing the architecture underpinning brain function. We developed Expansion-Assisted Iterative Fluorescence In Situ Hybridization (EASI-FISH) to survey gene expression in brain tissue, as well as a turnkey computational pipeline to rapidly process large EASI-FISH image datasets. EASI-FISH was optimized for thick brain sections (300 μm) to facilitate reconstruction of spatio-molecular domains that generalize across brains. Using the EASI-FISH pipeline, we investigated the spatial distribution of dozens of molecularly defined cell types in the lateral hypothalamic area (LHA), a brain region with poorly defined anatomical organization. Mapping cell types in the LHA revealed nine spatially and molecularly defined subregions. EASI-FISH also facilitates iterative reanalysis of scRNA-seq datasets to determine marker-genes that further dissociated spatial and morphological heterogeneity. The EASI-FISH pipeline democratizes mapping molecularly defined cell types, enabling discoveries about brain organization.
|
10
|
Finding long tandem repeats in long noisy reads. Bioinformatics 2021; 37:612-621. [PMID: 33031558 PMCID: PMC8097686 DOI: 10.1093/bioinformatics/btaa865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/24/2020] [Revised: 09/07/2020] [Accepted: 09/23/2020] [Indexed: 11/13/2022] Open
Abstract
Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10 000 nt or more that can span such repeat expansions, although these long reads have high error rates of 10–20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (<1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time. Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity. Availability and implementation: https://github.com/morisUtokyo/mTR.
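The first stage described above, flagging fixed-size windows whose k-mer frequencies suggest a tandem repeat, can be illustrated with a toy sketch. This is not the mTR implementation: the scoring rule (fraction of k-mer occurrences that belong to a k-mer seen more than once in the window), the function names, and all parameter values are assumptions chosen for illustration, and the de Bruijn graph consensus step is not covered.

```python
from collections import Counter


def repeat_score(window: str, k: int = 5) -> float:
    """Fraction of k-mer occurrences in `window` whose k-mer appears more than once."""
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(kmers)


def candidate_windows(read: str, window: int = 40, step: int = 8,
                      k: int = 5, threshold: float = 0.9):
    """Return (start, score) for each fixed-size window scoring at or above the threshold."""
    hits = []
    for start in range(0, max(len(read) - window, 0) + 1, step):
        score = repeat_score(read[start:start + window], k)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical error-free read: unique flanks around ten copies of an 8-nt unit.
read = "AAAACCCCGGGGTTTT" + "ACGTTGCA" * 10 + "TTTTGGGGCCCCAAAA"
for start, score in candidate_windows(read):
    print(start, round(score, 2))
```

With real noisy reads the threshold would need to be far more tolerant, since errors break many k-mers; the point of the sketch is only that repeated units inflate k-mer multiplicity within a window.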
|
11
|
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
|
12
|
Rapid and ongoing evolution of repetitive sequence structures in human centromeres. SCIENCE ADVANCES 2020; 6:6/50/eabd9230. [PMID: 33310858 PMCID: PMC7732198 DOI: 10.1126/sciadv.abd9230] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/20/2020] [Accepted: 10/30/2020] [Indexed: 06/12/2023]
Abstract
Our understanding of centromere sequence variation across human populations is limited because centromeres consist of extremely long nested repeat structures, called higher-order repeats, that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals, characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.
|
13
|
Abstract
The transition from 'well-marked varieties' of a single species into 'well-defined species', especially in the absence of geographic barriers to gene flow (sympatric speciation), has puzzled evolutionary biologists ever since Darwin [1,2]. Gene flow counteracts the buildup of genome-wide differentiation, which is a hallmark of speciation and increases the likelihood of the evolution of irreversible reproductive barriers (incompatibilities) that complete the speciation process [3]. Theory predicts that the genetic architecture of divergently selected traits can influence whether sympatric speciation occurs [4], but empirical tests of this theory are scant because comprehensive data are difficult to collect and synthesize across species, owing to their unique biologies and evolutionary histories [5]. Here, within a young species complex of neotropical cichlid fishes (Amphilophus spp.), we analysed genomic divergence among populations and species. By generating a new genome assembly and re-sequencing 453 genomes, we uncovered the genetic architecture of traits that have been suggested to be important for divergence. Species that differ in monogenic or oligogenic traits that affect ecological performance and/or mate choice show remarkably localized genomic differentiation. By contrast, differentiation among species that have diverged in polygenic traits is genomically widespread and much higher overall, consistent with the evolution of effective and stable genome-wide barriers to gene flow. Thus, we conclude that simple trait architectures are not always as conducive to speciation with gene flow as previously suggested, whereas polygenic architectures can promote rapid and stable speciation in sympatry.
|
14
|
Practical sensorless aberration estimation for 3D microscopy with deep learning. OPTICS EXPRESS 2020; 28:29044-29053. [PMID: 33114810 PMCID: PMC7679184 DOI: 10.1364/oe.401933] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 05/02/2023]
Abstract
Estimation of optical aberrations from volumetric intensity images is a key step in sensorless adaptive optics for 3D microscopy. Recent approaches based on deep learning promise accurate results at fast processing speeds. However, collecting ground truth microscopy data for training the network is typically very difficult or even impossible, thereby limiting this approach in practice. Here, we demonstrate that neural networks trained only on simulated data yield accurate predictions for real experimental images. We validate our approach on simulated and experimental datasets acquired with two different microscopy modalities and also compare the results to non-learned methods. Additionally, we study the predictability of individual aberrations with respect to their data requirements and find that the symmetry of the wavefront plays a crucial role. Finally, we make our implementation freely available as open source software in Python.
|
15
|
Abstract
Bats possess extraordinary adaptations, including flight, echolocation, extreme longevity and unique immunity. High-quality genomes are crucial for understanding the molecular basis and evolution of these traits. Here we incorporated long-read sequencing and state-of-the-art scaffolding protocols [1] to generate, to our knowledge, the first reference-quality genomes of six bat species (Rhinolophus ferrumequinum, Rousettus aegyptiacus, Phyllostomus discolor, Myotis myotis, Pipistrellus kuhlii and Molossus molossus). We integrated gene projections from our 'Tool to infer Orthologs from Genome Alignments' (TOGA) software with de novo and homology gene predictions as well as short- and long-read transcriptomics to generate highly complete gene annotations. To resolve the phylogenetic position of bats within Laurasiatheria, we applied several phylogenetic methods to comprehensive sets of orthologous protein-coding and noncoding regions of the genome, and identified a basal origin for bats within Scrotifera. Our genome-wide screens revealed positive selection on hearing-related genes in the ancestral branch of bats, which is indicative of laryngeal echolocation being an ancestral trait in this clade. We found selection and loss of immunity-related genes (including pro-inflammatory NF-κB regulators) and expansions of anti-viral APOBEC3 genes, which highlights molecular mechanisms that may contribute to the exceptional immunity of bats. Genomic integrations of diverse viruses provide a genomic record of historical tolerance to viral infection in bats. Finally, we found and experimentally validated bat-specific variation in microRNAs, which may regulate bat-specific gene-expression programs. Our reference-quality bat genomes provide the resources required to uncover and validate the genomic basis of adaptations of bats, and stimulate new avenues of research that are directly relevant to human health and disease [1].
|
16
|
Rod nuclear architecture determines contrast transmission of the retina and behavioral sensitivity in mice. eLife 2019; 8:49542. [PMID: 31825309 PMCID: PMC6974353 DOI: 10.7554/elife.49542] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/20/2019] [Accepted: 12/11/2019] [Indexed: 01/06/2023] Open
Abstract
Rod photoreceptors of nocturnal mammals display a striking inversion of nuclear architecture, which has been proposed as an evolutionary adaptation to dark environments. However, the nature of visual benefits and the underlying mechanisms remain unclear. It is widely assumed that improvements in nocturnal vision would depend on maximization of photon capture at the expense of image detail. Here, we show that retinal optical quality improves 2-fold during terminal development, and that this enhancement is caused by nuclear inversion. We further demonstrate that improved retinal contrast transmission, rather than photon-budget or resolution, enhances scotopic contrast sensitivity by 18–27%, and improves motion detection capabilities up to 10-fold in dim environments. Our findings therefore add functional significance to a prominent exception of nuclear organization and establish retinal contrast transmission as a decisive determinant of mammalian visual perception.
|
17
|
|
18
|
Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat Methods 2018; 15:1090-1097. [PMID: 30478326 DOI: 10.1038/s41592-018-0216-7] [Citation(s) in RCA: 447] [Impact Index Per Article: 74.5] [Received: 06/07/2018] [Accepted: 10/10/2018] [Indexed: 02/05/2023]
Abstract
Fluorescence microscopy is a key driver of discoveries in the life sciences, with observable phenomena being limited by the optics of the microscope, the chemistry of the fluorophores, and the maximum photon exposure tolerated by the sample. These limits necessitate trade-offs between imaging speed, spatial resolution, light exposure, and imaging depth. In this work we show how content-aware image restoration based on deep learning extends the range of biological phenomena observable by microscopy. We demonstrate on eight concrete examples how microscopy images can be restored even if 60-fold fewer photons are used during acquisition, how near isotropic resolution can be achieved with up to tenfold under-sampling along the axial direction, and how tubular and granular structures smaller than the diffraction limit can be resolved at 20-times-higher frame rates compared to state-of-the-art methods. All developed image restoration methods are freely available as open source software in Python, FIJI, and KNIME.
|
19
|
Differential lateral and basal tension drive folding of Drosophila wing discs through two distinct mechanisms. Nat Commun 2018; 9:4620. [PMID: 30397306 PMCID: PMC6218478 DOI: 10.1038/s41467-018-06497-3] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 09/11/2017] [Accepted: 09/05/2018] [Indexed: 12/26/2022] Open
Abstract
Epithelial folding transforms simple sheets of cells into complex three-dimensional tissues and organs during animal development. Epithelial folding has mainly been attributed to mechanical forces generated by an apically localized actomyosin network; however, contributions of forces generated at basal and lateral cell surfaces remain largely unknown. Here we show that a local decrease of basal tension and an increased lateral tension, but not apical constriction, drive the formation of two neighboring folds in developing Drosophila wing imaginal discs. Spatially defined reduction of extracellular matrix density results in local decrease of basal tension in the first fold; fluctuations in F-actin lead to increased lateral tension in the second fold. Simulations using a 3D vertex model show that the two distinct mechanisms can drive epithelial folding. Our combination of lateral and basal tension measurements with a mechanical tissue model reveals how simple modulations of surface and edge tension drive complex three-dimensional morphological changes.
|
20
|
Abstract
In the originally published version of this Article, the sequenced axolotl strain (the homozygous white mutant) was denoted as 'D/D' rather than 'd/d' in Fig. 1a and the accompanying legend, the main text and the Methods section. The original Article has been corrected online.
|
21
|
Biobeam-Multiplexed wave-optical simulations of light-sheet microscopy. PLoS Comput Biol 2018; 14:e1006079. [PMID: 29652879 PMCID: PMC5898703 DOI: 10.1371/journal.pcbi.1006079] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/05/2017] [Accepted: 03/06/2018] [Indexed: 11/19/2022] Open
Abstract
Sample-induced image-degradation remains an intricate wave-optical problem in light-sheet microscopy. Here we present biobeam, an open-source software package that enables simulation of operational light-sheet microscopes by combining data from 10^5–10^6 multiplexed and GPU-accelerated point-spread-function calculations. The wave-optical nature of these simulations leads to the faithful reproduction of spatially varying aberrations, diffraction artifacts, geometric image distortions, adaptive optics, and emergent wave-optical phenomena, and renders image-formation in light-sheet microscopy computationally tractable. Modern microscopes permit the acquisition of high quality images of large fields of view, which is the result of a decade-long development of computer aided optical design. However, this high image quality can only be obtained at the very surface of biological specimens: when trying to penetrate deeper into biological tissues, light scattering by cells rapidly leads to severe image blur, and computers have so far been unable to model the process by which light forms images in such turbid optical environments. We developed software that allows one to simulate how microscopes record images deep inside scattering biological samples. Our software reproduces a wide range of optical effects that underlie image blur in tissues. Hence strategies to improve image quality within three-dimensional samples can now be systematically tested by computers. Specifically, our software reproduces intricate wave-optical effects that have recently been proposed as strategies to gain perfect images even in the most turbid environments. This provides the chance for a new generation of microscopes, in which computer models guide the imaging process to enable highest possible resolution even deep inside biological specimens.
|
22
|
PreMosa: extracting 2D surfaces from 3D microscopy mosaics. Bioinformatics 2018; 33:2563-2569. [PMID: 28383656 DOI: 10.1093/bioinformatics/btx195] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/21/2016] [Accepted: 04/04/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation: A significant focus of biological research is to understand the development, organization and function of tissues. A particularly productive area of study is on single layer epithelial tissues in which the adherens junctions of cells form a 2D manifold that is fluorescently labeled. Given the size of the tissue, a microscope must collect a mosaic of overlapping 3D stacks encompassing the stained surface. Downstream interpretation is greatly simplified by preprocessing such a dataset as follows: (i) extracting and mapping the stained manifold in each stack into a single 2D projection plane, (ii) correcting uneven illumination artifacts, (iii) stitching the mosaic planes into a single, large 2D image and (iv) adjusting the contrast. Results: We have developed PreMosa, an efficient, fully automatic pipeline to perform the four preprocessing tasks above, resulting in a single 2D image of the stained manifold across which contrast is optimized and illumination is even. Notable features are as follows. First, the 2D projection step employs a specially developed algorithm that actually finds the manifold in the stack based on maximizing contrast, intensity and smoothness. Second, the projection step comes first, implying all subsequent tasks are more rapidly solved in 2D. And last, the mosaic melding employs an algorithm that globally adjusts contrasts amongst the 2D tiles so as to produce a seamless, high-contrast image. We conclude with an evaluation using ground-truth datasets and present results on datasets from Drosophila melanogaster wings and Schmidtea mediterranea ciliary components. Availability and Implementation: PreMosa is available under https://cblasse.github.io/premosa. Contact: blasse@mpi-cbg.de or myers@mpi-cbg.de. Supplementary information: Supplementary data are available at Bioinformatics online.
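Step (i) above, mapping a stained surface from a 3D stack into a 2D plane, can be illustrated with a deliberately naive stand-in: for every pixel, pick the z-slice with the highest intensity and record both the value and the chosen depth. PreMosa's own projection also optimizes contrast and smoothness, which this sketch does not attempt; the function name and the nested-list data layout are assumptions made for illustration.

```python
def project_brightest(stack):
    """Project stack[z][y][x] to 2D by taking, per pixel, the brightest z-slice.

    Returns (image, height), where image[y][x] is the selected intensity and
    height[y][x] is the index of the slice it came from (first slice on ties).
    """
    depth = len(stack)
    rows = len(stack[0])
    cols = len(stack[0][0])
    image = [[0] * cols for _ in range(rows)]
    height = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Index of the slice with maximal intensity at this pixel.
            best_z = max(range(depth), key=lambda z: stack[z][y][x])
            height[y][x] = best_z
            image[y][x] = stack[best_z][y][x]
    return image, height


# Hypothetical 2-slice, 2x2 stack used only to show the call.
toy_stack = [
    [[1, 9], [0, 0]],  # z = 0
    [[5, 2], [7, 0]],  # z = 1
]
toy_image, toy_height = project_brightest(toy_stack)
print(toy_image, toy_height)
```

A per-pixel maximum is sensitive to noise and ignores neighbouring pixels, which is one reason a real surface-extraction method would add a smoothness criterion on the height map.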
|
23
|
Cell dynamics underlying oriented growth of the Drosophila wing imaginal disc. Development 2017; 144:4406-4421. [PMID: 29038308 DOI: 10.1242/dev.155069] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 05/19/2017] [Accepted: 10/05/2017] [Indexed: 12/30/2022]
Abstract
Quantitative analysis of the dynamic cellular mechanisms shaping the Drosophila wing during its larval growth phase has been limited, impeding our ability to understand how morphogen patterns regulate tissue shape. Such analysis requires explants to be imaged under conditions that maintain both growth and patterning, as well as methods to quantify how much cellular behaviors change tissue shape. Here, we demonstrate a key requirement for the steroid hormone 20-hydroxyecdysone (20E) in the maintenance of numerous patterning systems in vivo and in explant culture. We find that low concentrations of 20E support prolonged proliferation in explanted wing discs in the absence of insulin, incidentally providing novel insight into the hormonal regulation of imaginal growth. We use 20E-containing media to observe growth directly and to apply recently developed methods for quantitatively decomposing tissue shape changes into cellular contributions. We discover that whereas cell divisions drive tissue expansion along one axis, their contribution to expansion along the orthogonal axis is cancelled by cell rearrangements and cell shape changes. This finding raises the possibility that anisotropic mechanical constraints contribute to growth orientation in the wing disc.
|
24
|
A tunable refractive index matching medium for live imaging cells, tissues and model organisms. eLife 2017; 6. [PMID: 28708059 PMCID: PMC5582871 DOI: 10.7554/elife.27240] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Received: 03/28/2017] [Accepted: 07/13/2017] [Indexed: 11/17/2022] Open
Abstract
In light microscopy, refractive index mismatches between media and sample cause spherical aberrations that often limit penetration depth and resolution. Optical clearing techniques can alleviate these mismatches, but they are so far limited to fixed samples. We present Iodixanol as a non-toxic medium supplement that allows refractive index matching in live specimens and thus substantially improves image quality in live-imaged primary cell cultures, planarians, zebrafish and human cerebral organoids. DOI: http://dx.doi.org/10.7554/eLife.27240.001
Light microscopy is a key tool in biomedical research. For perfect images, light needs to be able to pass through the sample, the material (or “mounting medium”) that holds the sample in place, and finally the image-detecting equipment in a straight line. However, in practice, light rays often deviate away from this line because they move at different speeds in different materials; how much the speed of light changes is related to a property called the refractive index of the material. This is exactly the effect that causes a stick stuck into water to look bent at the water’s surface. In light microscopy, mismatches in refractive index significantly reduce the quality of the images that can be obtained. Live specimens are particularly challenging to image because different specimens have very different refractive indices compared to the mounting medium, which holds specimens in place but must also keep them alive. Although the addition of chemical compounds can theoretically match the refractive index of the mounting medium to that of the specimen, this approach has so far not been practical because such manipulations tend to kill the specimen. An important challenge has therefore been to identify a compound that can adjust, or “tune”, the refractive index of mounting media over a wide range, yet without harming the specimens. Now, Boothe et al. have identified a chemical called Iodixanol as an ideal and easy-to-use supplement for tuning the refractive index of water-based live imaging media. Adding Iodixanol to the mounting media did not appear to have any toxic effects on cell cultures, developing zebrafish embryos or regenerating planarian flatworms. Importantly, Boothe et al. found that Iodixanol significantly improved the quality of the images collected from all of these different specimens. It is important to stress that Iodixanol does not change the refractive index of the sample or cancel out refractive index differences within the sample – so it cannot render opaque specimens transparent. Nevertheless, Iodixanol supplementation is a simple and affordable technique to improve image quality in any live imaging application without having to resort to more expensive and highly specialized microscopes. DOI: http://dx.doi.org/10.7554/eLife.27240.002
|
25
|
Abstract
Characterizing the identity and types of neurons in the brain, as well as their associated function, requires a means of quantifying and comparing 3D neuron morphology. Presently, neuron comparison methods are based on statistics from neuronal morphology such as size and number of branches, which are not fully suitable for detecting local similarities and differences in the detailed structure. We developed BlastNeuron to compare neurons in terms of their global appearance, detailed arborization patterns, and topological similarity. BlastNeuron first compares and clusters 3D neuron reconstructions based on global morphology features and moment invariants, independent of their orientations, sizes, level of reconstruction and other variations. Subsequently, BlastNeuron performs local alignment between any pair of retrieved neurons via a tree-topology driven dynamic programming method. A 3D correspondence map can thus be generated at the resolution of single reconstruction nodes. We applied BlastNeuron to three datasets: (1) 10,000+ neuron reconstructions from a public morphology database, (2) 681 newly and manually reconstructed neurons, and (3) neuron reconstructions produced using several independent reconstruction methods. Our approach was able to accurately and efficiently retrieve morphologically and functionally similar neuron structures from a large morphology database, identify the local common structures, and find clusters of neurons that share similarities in both morphology and molecular profiles.
|
26
|
Automated detection and quantification of single RNAs at cellular resolution in zebrafish embryos. J Cell Sci 2016. [DOI: 10.1242/jcs.186973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
27
|
A platform for brain-wide imaging and reconstruction of individual neurons. eLife 2016; 5:e10566. [PMID: 26796534 PMCID: PMC4739768 DOI: 10.7554/elife.10566] [Citation(s) in RCA: 253] [Impact Index Per Article: 31.6] [Received: 08/03/2015] [Accepted: 11/18/2015] [Indexed: 12/19/2022] Open
Abstract
The structure of axonal arbors controls how signals from individual neurons are routed within the mammalian brain. However, the arbors of very few long-range projection neurons have been reconstructed in their entirety, as axons with diameters as small as 100 nm arborize in target regions dispersed over many millimeters of tissue. We introduce a platform for high-resolution, three-dimensional fluorescence imaging of complete tissue volumes that enables the visualization and reconstruction of long-range axonal arbors. This platform relies on a high-speed two-photon microscope integrated with a tissue vibratome and a suite of computational tools for large-scale image data. We demonstrate the power of this approach by reconstructing the axonal arbors of multiple neurons in the motor cortex across a single mouse brain.
|
28
|
Automated detection and quantification of single RNAs at cellular resolution in zebrafish embryos. Development 2015; 143:540-6. [PMID: 26700682 DOI: 10.1242/dev.128918] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/24/2015] [Accepted: 12/14/2015] [Indexed: 12/25/2022]
Abstract
Analysis of differential gene expression is crucial for the study of cell fate and behavior during embryonic development. However, automated methods for the sensitive detection and quantification of RNAs at cellular resolution in embryos are lacking. With the advent of single-molecule fluorescence in situ hybridization (smFISH), gene expression can be analyzed at single-molecule resolution. However, the limited availability of protocols for smFISH in embryos and the lack of efficient image analysis pipelines have hampered quantification at the (sub)cellular level in complex samples such as tissues and embryos. Here, we present a protocol for smFISH on zebrafish embryo sections in combination with an image analysis pipeline for automated transcript detection and cell segmentation. We use this strategy to quantify gene expression differences between different cell types and identify differences in subcellular transcript localization between genes. The combination of our smFISH protocol and custom-made, freely available, analysis pipeline will enable researchers to fully exploit the benefits of quantitative transcript analysis at cellular and subcellular resolution in tissues and embryos.
|
29
|
|
30
|
|
31
|
Fast, accurate reconstruction of cell lineages from large-scale fluorescence microscopy data. Nat Methods 2014; 11:951-8. [PMID: 25042785 DOI: 10.1038/nmeth.3036] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Received: 01/08/2014] [Accepted: 06/16/2014] [Indexed: 12/19/2022]
Abstract
The comprehensive reconstruction of cell lineages in complex multicellular organisms is a central goal of developmental biology. We present an open-source computational framework for the segmentation and tracking of cell nuclei with high accuracy and speed. We demonstrate its (i) generality by reconstructing cell lineages in four-dimensional, terabyte-sized image data sets of fruit fly, zebrafish and mouse embryos acquired with three types of fluorescence microscopes, (ii) scalability by analyzing advanced stages of development with up to 20,000 cells per time point at 26,000 cells min⁻¹ on a single computer workstation and (iii) ease of use by adjusting only two parameters across all data sets and providing visualization and editing tools for efficient data curation. Our approach achieves on average 97.0% linkage accuracy across all species and imaging modalities. Using our system, we performed the first cell lineage reconstruction of early Drosophila melanogaster nervous system development, revealing neuroblast dynamics throughout an entire embryo.
|
32
|
Thalamocortical input onto layer 5 pyramidal neurons measured using quantitative large-scale array tomography. Front Neural Circuits 2013; 7:177. [PMID: 24273494 PMCID: PMC3824245 DOI: 10.3389/fncir.2013.00177] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 03/08/2013] [Accepted: 10/16/2013] [Indexed: 11/13/2022] Open
Abstract
The subcellular locations of synapses on pyramidal neurons strongly influence dendritic integration and synaptic plasticity. Despite this, there is little quantitative data on spatial distributions of specific types of synaptic input. Here we use array tomography (AT), a high-resolution optical microscopy method, to examine thalamocortical (TC) input onto layer 5 pyramidal neurons. We first verified the ability of AT to identify synapses using parallel electron microscopic analysis of TC synapses in layer 4. We then used large-scale array tomography (LSAT) to measure TC synapse distribution on L5 pyramidal neurons in a 1.00 × 0.83 × 0.21 mm³ volume of mouse somatosensory cortex. We found that TC synapses primarily target basal dendrites in layer 5, but also make a considerable input to proximal apical dendrites in L4, consistent with previous work. Our analysis further suggests that TC inputs are biased toward certain branches and, within branches, synapses show significant clustering with an excess of TC synapse nearest neighbors within 5–15 μm compared to a random distribution. Thus, we show that AT is a sensitive and quantitative method to map specific types of synaptic input on the dendrites of entire neurons. We anticipate that this technique will be of wide utility for mapping functionally relevant anatomical connectivity in neural circuits.
|
33
|
Unsupervised segmentation of noisy electron microscopy images using salient watersheds and region merging. BMC Bioinformatics 2013; 14:294. [PMID: 24090265 PMCID: PMC3852992 DOI: 10.1186/1471-2105-14-294] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/15/2013] [Accepted: 06/19/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Segmenting electron microscopy (EM) images of cellular and subcellular processes in the nervous system is a key step in many bioimaging pipelines involving classification and labeling of ultrastructures. However, fully automated techniques to segment images are often susceptible to noise and heterogeneity in EM images (e.g. different histological preparations, different organisms, different brain regions, etc.). Supervised techniques to address this problem are often helpful but require large sets of training data, which are often difficult to obtain in practice, especially across many conditions. RESULTS We propose a new, principled unsupervised algorithm to segment EM images using a two-step approach: edge detection via salient watersheds followed by robust region merging. We performed experiments to gather EM neuroimages of two organisms (mouse and fruit fly) using different histological preparations and generated manually curated ground-truth segmentations. We compared our algorithm against several state-of-the-art unsupervised segmentation algorithms and found superior performance using two standard measures of under- and over-segmentation error. CONCLUSIONS Our algorithm is general and may be applicable to other large-scale segmentation problems for bioimages.
|
34
|
Abstract
Motivation: Optical flow is a key method used for quantitative motion estimation of biological structures in light microscopy. It has also been used as a key module in segmentation and tracking systems and is considered a mature technology in the field of computer vision. However, most of the research focused on 2D natural images, which are small in size and rich in edges and texture information. In contrast, 3D time-lapse recordings of biological specimens comprise up to several terabytes of image data and often exhibit complex object dynamics as well as blurring due to the point-spread-function of the microscope. Thus, new approaches to optical flow are required to improve performance for such data. Results: We solve optical flow in large 3D time-lapse microscopy datasets by defining a Markov random field (MRF) over super-voxels in the foreground and applying motion smoothness constraints between super-voxels instead of voxel-wise. This model is tailored to the specific characteristics of light microscopy datasets: super-voxels help registration in textureless areas, the MRF over super-voxels efficiently propagates motion information between neighboring cells and the background subtraction and super-voxels reduce the dimensionality of the problem by an order of magnitude. We validate our approach on large 3D time-lapse datasets of Drosophila and zebrafish development by analyzing cell motion patterns. We show that our approach is, on average, 10× faster than commonly used optical flow implementations in the Insight Tool-Kit (ITK) and reduces the average flow end point error by 50% in regions with complex dynamic processes, such as cell divisions. Availability: Source code freely available in the Software section at http://janelia.org/lab/keller-lab. Contact: amatf@janelia.hhmi.org or kellerp@janelia.hhmi.org Supplementary information: Supplementary data are available at Bioinformatics online.
|
35
|
A GAL4-driver line resource for Drosophila neurobiology. Cell Rep 2012; 2:991-1001. [PMID: 23063364 PMCID: PMC3515021 DOI: 10.1016/j.celrep.2012.09.011] [Citation(s) in RCA: 916] [Impact Index Per Article: 76.3] [Received: 08/16/2012] [Revised: 09/14/2012] [Accepted: 09/17/2012] [Indexed: 11/19/2022] Open
Abstract
We established a collection of 7,000 transgenic lines of Drosophila melanogaster. Expression of GAL4 in each line is controlled by a different, defined fragment of genomic DNA that serves as a transcriptional enhancer. We used confocal microscopy of dissected nervous systems to determine the expression patterns driven by each fragment in the adult brain and ventral nerve cord. We present image data on 6,650 lines. Using both manual and machine-assisted annotation, we describe the expression patterns in the most useful lines. We illustrate the utility of these data for identifying novel neuronal cell types, revealing brain asymmetry, and describing the nature and extent of neuronal shape stereotypy. The GAL4 lines allow expression of exogenous genes in distinct, small subsets of the adult nervous system. The set of DNA fragments, each driving a documented expression pattern, will facilitate the generation of additional constructs for manipulating neuronal function.
|
36
|
Abstract
We have developed software for fully automated tracking of vibrissae (whiskers) in high-speed videos (>500 Hz) of head-fixed, behaving rodents trimmed to a single row of whiskers. Performance was assessed against a manually curated dataset consisting of 1.32 million video frames comprising 4.5 million whisker traces. The current implementation detects whiskers with a recall of 99.998% and identifies individual whiskers with 99.997% accuracy. The average processing rate for these images was 8 Mpx/s/cpu (2.6 GHz Intel Core2, 2 GB RAM). This translates to 35 processed frames per second for a 640 px × 352 px video of 4 whiskers. The speed and accuracy achieved enable quantitative behavioral studies where the analysis of millions of video frames is required. We used the software to analyze the evolving whisking strategies as mice learned a whisker-based detection task over the course of 6 days (8148 trials, 25 million frames) and to measure the forces at the sensory follicle that most underlie haptic perception.
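The quoted frame rate follows from the pixel throughput and the frame size. A quick check, assuming that 1 Mpx means 10^6 pixels (the function name is illustrative, not part of the published software):

```python
# Sanity check of the throughput figures quoted in the abstract.
# Assumption: 1 Mpx means 10**6 pixels.
PIXELS_PER_SECOND = 8e6              # 8 Mpx/s per CPU, from the abstract
FRAME_WIDTH, FRAME_HEIGHT = 640, 352  # frame size in pixels, from the abstract


def frames_per_second(px_per_s: float, width: int, height: int) -> float:
    """Frames processed per second at a given pixel throughput."""
    return px_per_s / (width * height)


fps = frames_per_second(PIXELS_PER_SECOND, FRAME_WIDTH, FRAME_HEIGHT)
print(f"{fps:.1f} frames/s")  # roughly 35 frames/s, matching the abstract
```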
|
37
|
Abstract
Motivation: The centrosome is a dynamic structure in animal cells that serves as a microtubule organizing center during mitosis and also regulates cell-cycle progression and sets polarity cues. Automated and reliable tracking of centrosomes is essential for genetic screens that study the process of centrosome assembly and maturation in the nematode Caenorhabditis elegans. Results: We have developed a fully automatic system for tracking and measuring fluorescently labeled centrosomes in 3D time-lapse images of early C. elegans embryos. Using a spinning disc microscope, we monitor the centrosome cycle in living embryos from the 1- up to the 16-cell stage at imaging intervals between 30 and 50 s. After establishing the centrosome trajectories with a novel method involving two layers of inference, we also automatically detect the nuclear envelope breakdown in each cell division and recognize the identities of the centrosomes based on the invariant cell lineage of C. elegans. To date, we have tracked centrosomes in over 500 wild-type and mutant embryos with almost no manual correction required. Availability: The centrosome tracking software along with test data is freely available at http://publications.mpi-cbg.de/itemPublication.html?documentId=4082 Contact: jaensch@mpi-cbg.de
|
38
|
Vibrissa-based object localization in head-fixed mice. J Neurosci 2010; 30:1947-67. [PMID: 20130203 PMCID: PMC6634009 DOI: 10.1523/jneurosci.3762-09.2010] [Citation(s) in RCA: 212] [Impact Index Per Article: 15.1] [Received: 08/03/2009] [Revised: 12/21/2009] [Accepted: 12/24/2009] [Indexed: 11/21/2022] Open
Abstract
Linking activity in specific cell types with perception, cognition, and action requires quantitative behavioral experiments in genetic model systems such as the mouse. In head-fixed primates, the combination of precise stimulus control, monitoring of motor output, and physiological recordings over large numbers of trials is the foundation on which many conceptually rich and quantitative studies have been built. Choice-based, quantitative behavioral paradigms for head-fixed mice have not been described previously. Here, we report a somatosensory absolute object localization task for head-fixed mice. Mice actively used their mystacial vibrissae (whiskers) to sense the location of a vertical pole presented to one side of the head and reported with licking whether the pole was in a target (go) or a distracter (no-go) location. Mice performed hundreds of trials with high performance (>90% correct) and localized to <0.95 mm (<6 degrees of azimuthal angle). Learning occurred over 1-2 weeks and was observed both within and across sessions. Mice could perform object localization with single whiskers. Silencing barrel cortex reduced performance to chance levels. We measured whisker movement and shape for thousands of trials. Mice moved their whiskers in a highly directed, asymmetric manner, focusing on the target location. Translation of the base of the whiskers along the face contributed substantially to whisker movements. Mice tended to maximize contact with the go (rewarded) stimulus while minimizing contact with the no-go stimulus. We conjecture that this may amplify differences in evoked neural activity between trial types.
|
39
|
Segmentation of center brains and optic lobes in 3D confocal images of adult fruit fly brains. Methods 2009; 50:63-9. [PMID: 19698789 PMCID: PMC2841987 DOI: 10.1016/j.ymeth.2009.08.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/11/2009] [Revised: 08/10/2009] [Accepted: 08/13/2009] [Indexed: 11/30/2022] Open
Abstract
Automatic alignment (registration) of 3D images of adult fruit fly brains is often influenced by the significant displacement of the relative locations of the two optic lobes (OLs) and the center brain (CB). In one of our ongoing efforts to produce a better image alignment pipeline of adult fruit fly brains, we consider separating CB and OLs and aligning them independently. This paper reports our automatic method to segregate CB and OLs, in particular under conditions where the signal-to-noise ratio (SNR) is low, the variation of the image intensity is large, and the relative displacement of OLs and CB is substantial. We design an algorithm to find a minimum-cost 3D surface in a 3D image stack to best separate an OL (of one side, either left or right) from CB. This surface is defined as an aggregation of the respective minimum-cost curves detected in each individual 2D image slice. Each curve is defined by a list of control points that best segregate OL and CB. To obtain the locations of these control points, we derive an energy function that includes an image energy term defined by local pixel intensities and two internal energy terms that constrain the curve's smoothness and length. A gradient descent method is used to optimize this energy function. To improve both the speed and robustness of the method, for each stack, the locations of optimized control points in a slice are taken as the initialization prior for the next slice. We have tested this approach on simulated and real 3D fly brain image stacks and demonstrated that this method can reasonably segregate OLs from CBs despite the aforementioned difficulties.
|
40
|
VANO: a volume-object image annotation system. Bioinformatics 2009; 25:695-7. [PMID: 19189978 PMCID: PMC2647838 DOI: 10.1093/bioinformatics/btp046] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 12/09/2008] [Revised: 01/14/2009] [Accepted: 01/16/2009] [Indexed: 11/13/2022] Open
Abstract
Volume-object annotation system (VANO) is a cross-platform image annotation system that enables one to conveniently visualize and annotate 3D volume objects including nuclei and cells. An application of VANO typically starts with an initial collection of objects produced by a segmentation computation. The objects can then be labeled, categorized, deleted, added, split, merged and redefined. VANO has been used to build high-resolution digital atlases of the nuclei of Caenorhabditis elegans at the L1 stage and the nuclei of Drosophila melanogaster's ventral nerve cord at the late embryonic stage. AVAILABILITY Platform-independent executables of VANO, a sample dataset, and a detailed description of both its design and usage are available at research.janelia.org/peng/proj/vano. VANO is open-source for co-development.
|
41
|
Abstract
MOTIVATION Caenorhabditis elegans, a roundworm found in soil, is a widely studied model organism with about 1000 cells in the adult. Producing high-resolution fluorescence images of C. elegans to reveal biological insights is becoming routine, motivating the development of advanced computational tools for analyzing the resulting image stacks. For example, worm bodies usually curve significantly in images. Thus one must 'straighten' the worms if they are to be compared under a canonical coordinate system. RESULTS We develop a worm straightening algorithm (WSA) that restacks cutting planes orthogonal to a 'backbone' that models the anterior-posterior axis of the worm. We formulate the backbone as a parametric cubic spline defined by a series of control points. We develop two methods for automatically determining the locations of the control points. Our experiments show that our approaches effectively straighten both 2D and 3D worm images.
|
42
|
Abstract
We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time- and space-efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.
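To make the notion of transitive reduction concrete, here is a minimal sketch on a toy overlap graph. It is a naive illustration only: it is not the linear expected-time algorithm the abstract refers to, it ignores overlap lengths and read orientation, and the read names and the `transitive_reduction` helper are hypothetical.

```python
from typing import Dict, Set


def transitive_reduction(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop every edge u -> w for which some v gives u -> v and v -> w.

    Naive version for illustration; it is not the linear expected-time
    algorithm described in the abstract, and it ignores overlap lengths.
    """
    reduced = {u: set(targets) for u, targets in graph.items()}
    for u, targets in graph.items():
        for v in targets:
            for w in graph.get(v, ()):
                # Edge u -> w is implied by the path u -> v -> w.
                if w != v and w in targets:
                    reduced[u].discard(w)
    return reduced


# Hypothetical reads a, b, c laid out left to right with pairwise overlaps.
overlaps = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
reduced = transitive_reduction(overlaps)
print(reduced)
```

In this toy case the overlap a -> c is implied by a -> b and b -> c, so only the chain a -> b -> c remains.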
|
43
|
Abstract
Background Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns. Results We have developed algorithms and associated software that extract a feature representation of a gene expression pattern from an ISH image, cluster genes sharing the same spatio-temporal pattern of expression, suggest transcription factor binding (TFB) site motifs for genes that appear to be co-regulated (based on the clustering), and automatically identify the anatomical regions that express a gene given a training set of annotations. In fact, we developed three different feature representations, based on Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), and wavelet functions, each having different merits with respect to the tasks above. For clustering image patterns, we developed a minimum spanning tree method (MSTCUT), and for proposing TFB sites we used standard motif finders on clustered/co-expressed genes with the added twist of requiring conservation across the genomes of 8 related fly species. Lastly, we trained a suite of binary classifiers, one for each anatomical annotation term in a controlled vocabulary or ontology, that operate on the wavelet feature representation. We report the results of applying these methods to the Berkeley Drosophila Genome Project (BDGP) gene expression database. Conclusion Our automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy, despite variations in morphology, orientation, and focal plane, suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.
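The abstract does not spell out how MSTCUT works, so the following is only a generic sketch of minimum-spanning-tree clustering, not the authors' method. It assumes Euclidean distance between feature vectors and a user-chosen cluster count `k`; the `mst_clusters` function and the toy feature vectors are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Label points by running Kruskal's algorithm until k components remain.

    Stopping early is equivalent to building the full minimum spanning tree
    and cutting its k - 1 longest edges.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # All candidate edges, shortest first.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: sq_dist(points[e[0]], points[e[1]]))
    components = len(points)
    for i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, ... in order of first appearance.
    roots = [find(i) for i in range(len(points))]
    relabel = {r: n for n, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


# Hypothetical 2D feature vectors standing in for expression-pattern features.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
labels = mst_clusters(features, k=2)
print(labels)
```

The all-pairs edge list keeps the sketch short but scales quadratically with the number of patterns, so it is suited to small examples only.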
|
44
|
Improved repeat identification and masking in Dipterans. Gene 2006; 389:1-9. [PMID: 17137733 PMCID: PMC1945102 DOI: 10.1016/j.gene.2006.09.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Received: 06/20/2006] [Revised: 09/08/2006] [Accepted: 09/09/2006] [Indexed: 12/23/2022]
Abstract
Repetitive sequences are a major constituent of many eukaryote genomes and play roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability. The identification of repetitive elements has traditionally relied on in-depth, manual curation and computational determination of close relatives based on DNA identity. However, the rapid divergence of repetitive sequence has made identification of repeats by DNA identity difficult even in closely related species. Hence, the presence of unidentified repeats in genome sequences affects the quality of gene annotations and annotation-dependent analyses (e.g. microarray analyses). We have developed an enhanced repeat identification pipeline using two approaches. First, the de novo repeat finding program PILER-DF was used to identify interspersed repetitive elements in several recently finished Dipteran genomes. Repeats were classified, when possible, according to their similarity to known elements described in Repbase and GenBank, and also screened against annotated genes as one means of eliminating false positives. Second, we used a new program called RepeatRunner, which integrates results from both RepeatMasker nucleotide searches and protein searches using BLASTX. Using RepeatRunner with PILER-DF predictions, we masked repeats in thirteen Dipteran genomes and conclude that combining PILER-DF and RepeatRunner greatly enhances repeat identification in both well-characterized and un-annotated genomes.
|
45
|
Interpreting anonymous DNA samples from mass disasters--probabilistic forensic inference using genetic markers. Bioinformatics 2006; 22:e298-306. [PMID: 16873485 DOI: 10.1093/bioinformatics/btl200] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The problem of identifying victims in a mass disaster using DNA fingerprints involves a scale of computation that requires efficient and accurate algorithms. In a typical scenario there are hundreds of samples taken from remains that must be matched to the pedigrees of the alleged victim's surviving relatives. Moreover, the samples are often degraded due to heat and exposure. To develop a competent method for this type of forensic inference problem, the complicated quality issues of DNA typing need to be handled appropriately, the matches between every sample and every family must be considered, and the confidence of matches needs to be provided. RESULTS We present a unified probabilistic framework that efficiently clusters samples, conservatively eliminates implausible sample-pedigree pairings, and handles both degraded samples (missing values) and experimental errors in producing and/or reading a genotype. We present a method that confidently excludes forensically unambiguous sample-family matches from the large hypothesis space of candidate matches, based on posterior probabilistic inference. Due to the high confidentiality of disaster DNA data, simulation experiments are commonly performed and used here for validation. Our framework is shown to be robust to these errors at levels typical in real applications. Furthermore, the flexibility in the probabilistic models makes it possible to extend this framework to include other biological factors such as interdependent markers, mitochondrial sequences, and blood type. AVAILABILITY The software and data sets are available from the authors upon request.
|
46
|
Abstract
MOTIVATION Many signals in biological sequences are based on the presence or absence of base signals and their spatial combinations. One of the best known examples of this is the signal identifying a core promoter--the site at which the basal transcription machinery starts the transcription of a gene. Our goal is a fully automatic pattern recognition system for a family of sequences, which simultaneously discovers the base signals, their spatial relationships and a classifier based upon them. RESULTS In this paper we present a general method for characterizing a set of sequences by their recurrent motifs. Our approach relies on novel probabilistic models for DNA binding sites and modules of binding sites, on algorithms to study them from the data and on a support vector machine that uses the models studied to classify a set of sequences. We demonstrate the applicability of our approach to diverse instances, ranging from families of promoter sequences to a dataset of intronic sequences flanking alternatively spliced exons. On a core promoter dataset our results are comparable with the state-of-the-art McPromoter. On a dataset of alternatively spliced exons we outperform a previous approach. We also achieve high success rates in recognizing cell cycle regulated genes. These results demonstrate that a fully automatic pattern recognition algorithm can meet or exceed the performance of hand-crafted approaches. AVAILABILITY The software and datasets are available from the authors upon request.
|
47
|
Abstract
SUMMARY Repeated elements such as satellites and transposons are ubiquitous in eukaryotic genomes. De novo computational identification and classification of such elements is a challenging problem. Therefore, repeat annotation of sequenced genomes has historically largely relied on sequence similarity to hand-curated libraries of known repeat families. We present a new approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. We describe PILER, a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for Homo sapiens, Arabidopsis thaliana and Drosophila melanogaster. AVAILABILITY The PILER software is freely available at http://www.drive5.com/piler.
|
48
|
Abstract
Fast and exact comparison of large genomic sequences remains a challenging task in biosequence analysis. We consider the problem of finding all epsilon-matches between two sequences, i.e., all local alignments over a given length with an error rate of at most epsilon. We study this problem theoretically, giving an efficient q-gram filter for solving it. Two applications of the filter are also discussed, in particular genomic sequence assembly and BLAST-like sequence comparison. Our results show that the method is 25 times faster than BLAST, while not being heuristic.
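The abstract does not spell out the filter itself, so the following is only a minimal sketch of the general idea behind q-gram filtering, based on the classical q-gram lemma rather than on the authors' algorithm: two windows of length w that differ by at most k errors must share at least w - q + 1 - k*q q-grams, so pairs sharing fewer can be discarded without alignment and without losing any true match. The function names and the fixed-window setup are illustrative assumptions.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all overlapping substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a, b, q):
    """Number of q-grams common to a and b, counted with multiplicity."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def qgram_threshold(w, q, k):
    """q-gram lemma: minimum shared q-grams for windows of length w
    that differ by at most k errors."""
    return w - q + 1 - k * q


def passes_filter(a, b, q, k):
    """Necessary (not sufficient) condition for a and b to match with
    at most k errors. Using the shorter length keeps the bound safe."""
    w = min(len(a), len(b))
    return shared_qgrams(a, b, q) >= qgram_threshold(w, q, k)


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTCGTAC"  # one mismatch relative to a
    print(shared_qgrams(a, b, 3), qgram_threshold(10, 3, 1))
    print(passes_filter(a, b, q=3, k=1))
```

A pair that passes the filter still has to be verified by an actual alignment; the filter only guarantees that no pair within the error bound is rejected, which is what makes such a method exact rather than heuristic.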
|
49
|
Abstract
We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.
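To illustrate what transitive reduction means for an overlap graph, here is a deliberately naive sketch, not the linear expected time algorithm the abstract refers to. It assumes a hypothetical representation in which `adj` maps each read to the set of reads it overlaps into, and it drops an edge u -> w whenever some u -> v and v -> w already imply it. Edge lengths and read orientations, which a real string graph would track, are ignored here.

```python
def reduce_transitive(adj):
    """Remove every edge u -> w that is implied by a two-step path
    u -> v -> w. `adj` maps a node to the set of its successors;
    the input is not modified."""
    redundant = set()
    for u, successors in adj.items():
        for v in successors:
            if v == u:
                continue  # ignore self-loops defensively
            for w in adj.get(v, ()):
                if w != v and w in successors:
                    redundant.add((u, w))
    return {
        u: {v for v in successors if (u, v) not in redundant}
        for u, successors in adj.items()
    }


if __name__ == "__main__":
    # r1 overlaps r2 and r3; r2 overlaps r3, so r1 -> r3 is redundant.
    overlaps = {"r1": {"r2", "r3"}, "r2": {"r3"}, "r3": set()}
    print(reduce_transitive(overlaps))
```

Redundant edges are collected against the original graph before any are removed, so the result does not depend on iteration order. The cost grows with the square of the node degree, which is the overhead a linear expected time method avoids.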
|
50
|
Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005; 437:376-80. [PMID: 16056220 PMCID: PMC1464427 DOI: 10.1038/nature03959] [Citation(s) in RCA: 4967] [Impact Index Per Article: 261.4] [Received: 05/06/2005] [Accepted: 06/10/2005] [Indexed: 02/06/2023]
Abstract
The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
|