GLORIA

GEOMAR Library Ocean Research Information Access

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 ( 2023-05-11), p. 325-334
    Abstract: Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data 1,2 . Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions 3,4 . We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences 5,6 .
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    RVK:
    RVK:
    RVK:
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 2
    In: Cell Reports Methods, Elsevier BV, Vol. 3, No. 8 ( 2023-08), p. 100543-
    Type of Medium: Online Resource
    ISSN: 2667-2375
    Language: English
    Publisher: Elsevier BV
    Publication Date: 2023
    detail.hit.zdb_id: 3091714-1
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 3
    In: Nature, Springer Science and Business Media LLC, Vol. 611, No. 7936 ( 2022-11-17), p. 519-531
    Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society 1,2 . However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals 3,4 . Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome 5 . To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity 6 . Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    RVK:
    RVK:
    RVK:
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2022
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 4
    In: Nature Biotechnology, Springer Science and Business Media LLC
    Type of Medium: Online Resource
    ISSN: 1087-0156 , 1546-1696
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 1494943-X
    detail.hit.zdb_id: 1311932-1
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 5
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 ( 2023-05-11), p. 335-343
    Abstract: The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications 1,2 . Although the resolution of these regions in the first complete assembly of a human genome—the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13)—provided a model of their homology 3 , it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium 4 (HPRC), we find that contigs from all of the SAACs form a community. A variation graph 5 constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination 6,7 . The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations 8 , and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago 9 .
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    RVK:
    RVK:
    RVK:
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 6
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 ( 2023-05-11), p. 312-324
    Abstract: Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals 1 . These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    RVK:
    RVK:
    RVK:
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 7
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 370, No. 6523 ( 2020-12-18)
    Abstract: The rhesus macaque ( Macaca mulatta ) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    RVK:
    RVK:
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2020
    detail.hit.zdb_id: 128410-1
    detail.hit.zdb_id: 2066996-3
    detail.hit.zdb_id: 2060783-0
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 8
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 376, No. 6588 ( 2022-04)
    Abstract: To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function. RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds. RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome). By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant. CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals. Gapless assemblies illuminate centromere evolution. ( Top ) The organization of peri/centromeric satellite repeats. ( Bottom left ) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. ( Bottom right ) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    RVK:
    RVK:
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2022
    detail.hit.zdb_id: 128410-1
    detail.hit.zdb_id: 2066996-3
    detail.hit.zdb_id: 2060783-0
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 9
    In: Genome Research, Cold Spring Harbor Laboratory, Vol. 33, No. 4 ( 2023-04), p. 496-510
    Abstract: There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%] , or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
    Type of Medium: Online Resource
    ISSN: 1088-9051 , 1549-5469
    RVK:
    Language: English
    Publisher: Cold Spring Harbor Laboratory
    Publication Date: 2023
    detail.hit.zdb_id: 1483456-X
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 10
    In: Nature, Springer Science and Business Media LLC, Vol. 585, No. 7823 ( 2020-09-03), p. 79-84
    Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist 1,2 . Here we present a human genome assembly that surpasses the continuity of GRCh38 2 , along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome 3 , we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    RVK:
    RVK:
    RVK:
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2020
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...