GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    In: Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, Vol. 116, No. 46 (2019-11-12), p. 23243-23253
    Abstract: Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are important sources of natural and disease-causing variation, yet they have been problematic to resolve in reference genomes and genotype with short-read technology. We created a framework to model the evolution and instability of STRs and VNTRs in apes. We phased and assembled 3 ape genomes (chimpanzee, gorilla, and orangutan) using long-read and 10x Genomics linked-read sequence data for 21,442 human tandem repeats discovered in 6 haplotype-resolved assemblies of Yoruban, Chinese, and Puerto Rican origin. We define a set of 1,584 STRs/VNTRs expanded specifically in humans, including large tandem repeats affecting coding and noncoding portions of genes (e.g., MUC3A, CACNA1C). We show that short interspersed nuclear element–VNTR–Alu (SVA) retrotransposition is the main mechanism for distributing GC-rich human-specific tandem repeat expansions throughout the genome but with a bias against genes. In contrast, we observe that VNTRs not originating from retrotransposons have a propensity to cluster near genes, especially in the subtelomere. Using tissue-specific expression from human and chimpanzee brains, we identify genes where transcript isoform usage differs significantly, likely caused by cryptic splicing variation within VNTRs. Using single-cell expression from cerebral organoids, we observe a strong effect for genes associated with transcription profiles analogous to intermediate progenitor cells. Finally, we compare the sequence composition of some of the largest human-specific repeat expansions and identify 52 STRs/VNTRs with at least 40 uninterrupted pure tracts as candidates for genetically unstable regions associated with disease.
    Type of Medium: Online Resource
    ISSN: 0027-8424 , 1091-6490
    Language: English
    Publisher: Proceedings of the National Academy of Sciences
    Publication Date: 2019
    detail.hit.zdb_id: 209104-5
    detail.hit.zdb_id: 1461794-8
    SSG: 11
    SSG: 12
  • 2
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 370, No. 6523 (2020-12-18)
    Abstract: The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2020
    detail.hit.zdb_id: 128410-1
    detail.hit.zdb_id: 2066996-3
    detail.hit.zdb_id: 2060783-0
    SSG: 11
  • 3
    In: Nature, Springer Science and Business Media LLC, Vol. 611, No. 7936 (2022-11-17), p. 519-531
    Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2022
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
  • 4
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 (2023-05-11), p. 325-334
    Abstract: Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
  • 5
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 (2023-05-11), p. 312-324
    Abstract: Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
  • 6
    In: Nature, Springer Science and Business Media LLC, Vol. 617, No. 7960 (2023-05-11), p. 335-343
    Abstract: The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome—the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13)—provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium [4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph [5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination [6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations [8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago [9].
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2023
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
  • 7
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 376, No. 6588 (2022-04)
    Abstract: Large, high-identity duplicated sequences—termed segmental duplications (SDs)—are frequently the last regions of genomes to be sequenced and assembled. While the human reference genome provided a roadmap of the SD landscape, >50% of the remaining gaps correspond to regions of complex SDs.
    RATIONALE: SDs are major sources of evolutionary gene innovations and contribute disproportionately to genetic variation within and between ape species. With the complete human genome (T2T-CHM13), researchers have the potential to identify genes and uncover patterns of human genetic variation.
    RESULTS: We identified 51 million base pairs (Mbp) of additional human SD in T2T-CHM13 and now estimate that 7% of the human genome consists of SDs [218 Mbp of 3.1 billion base pairs (Gbp)]. SDs make up two-thirds (45.1 of 68.1 Mbp) of acrocentric short arms, and these SDs are the largest in the human genome (see the figure, panel A). Additionally, 54% of acrocentric SDs are copy number variable or map to different chromosomes among the six individuals examined. A detailed comparison between the current reference genome (GRCh38) and T2T-CHM13 for SD content identifies 81 Mbp of previously unresolved or structurally variable SDs. Short-read whole-genome sequence data from a diversity panel of 268 humans show that human copy number is nine times (59.26 versus 6.55 Mbp) more likely to match T2T-CHM13 rather than GRCh38, including 119 protein-coding genes (see the figure, panel B). Using long-read–sequencing data from 25 human haplotypes, we investigated patterns of human genetic variation identifying significant increases in structural and single-nucleotide diversity. We identified gene-rich regions (e.g., TBC1D3) that vary by hundreds of kilo–base pairs and gene copy number between individuals showing some of the highest genome-wide structural heterozygosity (85 to 90%). Our analysis identified 182 candidate protein-coding genes as well as the complete sequence for structurally variable gene models that were previously unresolved. Among these is the complete gene structure of lipoprotein A (LPA), including the expanded kringle IV repeat domain. Reduced copies of this domain are among the strongest genetic associations with cardiovascular disease, especially among African Americans, and sequencing of multiple human haplotypes identified not only copy number variation but also other forms of rare coding variation potentially relevant to disease risk. Finally, we compared global methylation and expression patterns between duplicated and unique genes. Transcriptionally inactive duplicate genes are more likely to map to hypomethylated genomic regions; however, specifically over the transcription start site we observe an increase in methylation, suggesting that as many as two-thirds of duplicated genes are epigenetically silenced. Additionally, SD genes show a high degree of concordance between methylation profiles and transcription levels, allowing us to define the actively transcribed members of high-identity gene families that are otherwise indistinguishable by coding sequence.
    CONCLUSION: A complete human genome provides a more comprehensive understanding of the organization, expression, and regulation of duplicated genes. Our analysis reveals underappreciated patterns of human genetic diversity and suggests characteristic features of methylation and gene regulation. This resource will serve as a critical baseline for improved gene annotation, genotyping, and previously unknown associations for some of the most dynamic regions of our genome.
    Figure caption: More-complete segmental duplication content improves genotyping. (A) Increase (by a factor of 10) in the number of large (>10 kilo–base pairs) acrocentric segmental duplications (red) in T2T-CHM13 (right) compared with GRCh38 (left). (B) Read-depth genotyping of short-read Illumina whole-genome sequence from a human diversity panel (n = 268) better matches T2T-CHM13 (red) when compared to GRCh38 (blue), irrespective of human population group.
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2022
    detail.hit.zdb_id: 128410-1
    detail.hit.zdb_id: 2066996-3
    detail.hit.zdb_id: 2060783-0
    SSG: 11
  • 8
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 366, No. 6463 (2019-10-18)
    Abstract: Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2019
    detail.hit.zdb_id: 128410-1
    detail.hit.zdb_id: 2066996-3
    detail.hit.zdb_id: 2060783-0
    SSG: 11
  • 9
    In: Nature, Springer Science and Business Media LLC, Vol. 585, No. 7823 (2020-09-03), p. 79-84
    Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2020
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11
  • 10
    In: Nature, Springer Science and Business Media LLC, Vol. 593, No. 7857 (2021-05-06), p. 101-107
    Abstract: The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
    Type of Medium: Online Resource
    ISSN: 0028-0836 , 1476-4687
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2021
    detail.hit.zdb_id: 120714-3
    detail.hit.zdb_id: 1413423-8
    SSG: 11