GLORIA — GEOMAR Library Ocean Research Information Access

Insights into mammalian TE diversity through the curation of 248 genome assemblies

Osmanski, Austin B. ; Paulat, Nicole S. ; Korstian, Jenny ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: An estimated 160 million years have passed since the first placental mammals evolved. These eutherians are categorized into 19 orders consisting of nearly 4000 extant species, with ~70% being bats or rodents. Broad, in-depth, and comparative genomic studies across Eutheria have previously been unachievable because of the lack of genomic resources. The collaboration of the Zoonomia Consortium made available hundreds of high-quality genome assemblies for comparative analysis. Our focus within the consortium was to investigate the evolution of transposable elements (TEs) among placental mammals. Using these data, we identified previously known TEs, described previously unknown TEs, and analyzed the TE distribution among multiple taxonomic levels. RATIONALE The emergence of accurate and affordable sequencing technology has propelled efforts to sequence increasingly more nonmodel mammalian genomes in the past decade. Most of these efforts have traditionally focused on genic regions searching for patterns of selection or variation in gene regulation. The common trend of ignoring or trivializing TE annotation with newly published genomes has resulted in severe lag of TE analyses, leading to extensive undiscovered TE variation. This oversight has neglected an important source of evolution because the accumulation of TEs is attributable to drastic alterations in genome architecture, including insertions, deletions, duplications, translocations, and inversions. Our approach to the Zoonomia dataset was to provide future inquirers accurate and meticulous TE curations and to describe taxonomic variation among eutherians. RESULTS We annotated the TE content of 248 mammalian genome assemblies, which yielded a library of 25,676 consensus TE sequences, 8263 of which were previously unidentified TE sequences (available at https://dfam.org). We affirmed that the largest component of a typical mammalian genome is comprised of TEs (average 45.6%).
Of the 248 assemblies, the lowest genomic percentage of TEs was found in the star-nosed mole (27.6%), and the largest percentage was seen in the aardvark (74.5%), whose increase in TE accumulation drove a corresponding increase in genome size—a correlation we observed across Eutheria. The overall genomic proportions of recently accumulated TEs were roughly similar across most mammals in the dataset, with a few notable exceptions (see the figure). Diversity of recently accumulated TEs is highest among multiple families of bats, mostly driven by substantial DNA transposon activity. Our data also exhibit an increase of recently accumulated DNA transposons among carnivore lineages over their herbivorous counterparts, which suggests that diet may play a role in determining the genomic content of TEs. CONCLUSION The copious TE data provided in this work emanated from the largest comprehensive TE curation effort to date. Considering the wide-ranging effects that TEs impose on genomic architecture, these data are an important resource for future inquiries into mammalian genomics and evolution and suggest avenues for continued study of these important yet understudied genomic denizens. Boxplots depicting the range of recently accumulated TEs among mammals (by proportion of genome). Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had < 0.1%; DNA: 210 of 248 mammals had < 0.1%). ILLUSTRATIONS: BRITTANY ANN HALE

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn1430

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

Chiropterans Are a Hotspot for Horizontal Transfer of DNA Transposons in Mammalia

Paulat, Nicole S ; Storer, Jessica M ; Moreno-Santillán, Diana D ; [et al.]

Oxford University Press (OUP) ; 2023

In: Molecular Biology and Evolution, Vol. 40, No. 5 (2023-05-02)

Abstract: Horizontal transfer of transposable elements (TEs) is an important mechanism contributing to genetic diversity and innovation. Bats (order Chiroptera) have repeatedly been shown to experience horizontal transfer of TEs at what appears to be a high rate compared with other mammals. We investigated the occurrence of horizontally transferred (HT) DNA transposons involving bats. We found over 200 putative HT elements within bats; 16 transposons were shared across distantly related mammalian clades, and 2 other elements were shared with a fish and two lizard species. Our results indicate that bats are a hotspot for horizontal transfer of DNA transposons. These events broadly coincide with the diversification of several bat clades, supporting the hypothesis that DNA transposon invasions have contributed to genetic diversification of bats.

Type of Medium: Online Resource

ISSN: 0737-4038 , 1537-1719

DOI: 10.1093/molbev/msad092

Language: English

Publisher: Oxford University Press (OUP)

Publication Date: 2023

ZDB-ID: 2024221-9

SSG: 12

Evolutionary constraint and innovation across hundreds of placental mammals

Christmas, Matthew J. ; Kaplow, Irene M. ; Genereux, Diane P. ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: A major challenge in genomics is discerning which bases among billions alter organismal phenotypes and affect health and disease risk. Evidence of past selective pressure on a base, whether highly conserved or fast evolving, is a marker of functional importance. Bases that are unchanged in all mammals may shape phenotypes that are essential for organismal health. Bases that are evolving quickly in some species, or changed only in species that share an adaptive trait, may shape phenotypes that support survival in specific niches. Identifying bases associated with exceptional capacity for cellular recovery, such as in species that hibernate, could inform therapeutic discovery. RATIONALE The power and resolution of evolutionary analyses scale with the number and diversity of species compared. By analyzing genomes for hundreds of placental mammals, we can detect which individual bases in the genome are exceptionally conserved (constrained) and likely to be functionally important in both coding and noncoding regions. By including species that represent all orders of placental mammals and aligning genomes using a method that does not require designating humans as the reference species, we explore unusual traits in other species. RESULTS Zoonomia’s mammalian comparative genomics resources are the most comprehensive and statistically well-powered produced to date, with a protein-coding alignment of 427 mammals and a whole-genome alignment of 240 placental mammals representing all orders. We estimate that at least 10.7% of the human genome is evolutionarily conserved relative to neutrally evolving repeats and identify about 101 million significantly constrained single bases (false discovery rate < 0.05). We cataloged 4552 ultraconserved elements at least 20 bases long that are identical in more than 98% of the 240 placental mammals. Many constrained bases have no known function, illustrating the potential for discovery using evolutionary measures.
Eighty percent are outside protein-coding exons, and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Constrained bases tend to vary less within human populations, which is consistent with purifying selection. Species threatened with extinction have few substitutions at constrained sites, possibly because severely deleterious alleles have been purged from their small populations. By pairing Zoonomia’s genomic resources with phenotype annotations, we find genomic elements associated with phenotypes that differ between species, including olfaction, hibernation, brain size, and vocal learning. We associate genomic traits, such as the number of olfactory receptor genes, with physical phenotypes, such as the number of olfactory turbinals. By comparing hibernators and nonhibernators, we implicate genes involved in mitochondrial disorders, protection against heat stress, and longevity in this physiologically intriguing phenotype. Using a machine learning–based approach that predicts tissue-specific cis-regulatory activity in hundreds of species using data from just a few, we associate changes in noncoding sequence with traits for which humans are exceptional: brain size and vocal learning. CONCLUSION Large-scale comparative genomics opens new opportunities to explore how genomes evolved as mammals adapted to a wide range of ecological niches and to discover what is shared across species and what is distinctively human. High-quality data for consistently defined phenotypes are necessary to realize this potential. Through partnerships with researchers in other fields, comparative genomics can address questions in human health and basic biology while guiding efforts to protect the biodiversity that is essential to these discoveries. Comparing genomes from 240 species to explore the evolution of placental mammals.
Our new phylogeny (black lines) has alternating gray and white shading, which distinguishes mammalian orders (labeled around the perimeter). Rings around the phylogeny annotate species phenotypes. Seven species with diverse traits are illustrated, with black lines marking their branch in the phylogeny. Sequence conservation across species is described at the top left. IMAGE CREDIT: K. MORRILL

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn3943

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

Leveraging base-pair mammalian constraint to understand genetic variation and human disease

Sullivan, Patrick F. ; Meadows, Jennifer R. S. ; Gazal, Steven ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge. RATIONALE We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). RESULTS Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold).
The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes. CONCLUSION Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease. Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10^−16). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn2937

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

The functional and evolutionary impacts of human-specific deletions in conserved elements

Xue, James R. ; Mackay-Smith, Ava ; Mouri, Kousuke ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: Deciphering the molecular and genetic changes that differentiate humans from our closest primate relatives is critical for understanding our origins. Although earlier studies have prioritized how newly gained genetic sequences or variations have contributed to evolutionary innovation, the role of sequence loss has been less appreciated. Alterations in evolutionarily conserved regions that are enriched for biological function could be particularly more likely to have phenotypic effects. We thus sought to identify and characterize sequences that have been conserved across evolution, but are then surprisingly lost in all humans. These human-specific deletions in conserved regions (hCONDELs) may play an important role in uniquely human traits. RATIONALE Sequencing advancements have identified millions of genetic changes between chimpanzee and human genomes; however, the functional impacts of the ~1 to 5% difference between our species are largely unknown. hCONDELs are one class of these predominantly noncoding sequence changes. Although large hCONDELs (> 1 kb) have been previously identified, the vast majority of all hCONDELs (95.7%) are small (< 20 base pairs) and have not yet been functionally assessed. We adapted massively parallel reporter assays (MPRAs) to characterize the effects of thousands of these small hCONDELs and uncovered hundreds with functional effects. By understanding the effects of these hCONDELs, we can gain insight into the mechanistic patterns driving evolution in the human genome. RESULTS We identified 10,032 hCONDELs by examining conserved regions across diverse vertebrate genomes and overlapping with confidently annotated, human-specific fixed deletions. We found that these hCONDELs are enriched to delete conserved sequences originating from stem amniotes. Overlap with transcriptional, epigenomic, and phenotypic datasets all implicate neuronal and cognitive functional impacts.
We characterized these hCONDELs using MPRA in six different human cell types, including induced pluripotent stem cell–derived neural progenitor cells. We found that 800 hCONDELs displayed species-specific regulatory effects. Although many hCONDELs perturb transcription factor–binding sites in active enhancers, we estimate that 30% create or improve binding sites, including activators and repressors. Some hCONDELs exhibit molecular functions that affect core neurodevelopmental genes. One hCONDEL removes a single base in an active enhancer in the neurogenesis gene HDAC5, and another deletes six bases in an alternative promoter of PPP2CA, a gene that regulates neuronal signaling. We deeply characterized an hCONDEL in a putative regulatory element of LOXL2, a gene that controls neuronal differentiation. Using genome engineering to reintroduce the conserved chimpanzee sequence into human cells, we confirmed that the human deletion alters transcriptional output of LOXL2. Single-cell RNA sequencing of these cells uncovered a cascade of myelination and synaptic function–related transcriptional changes induced by the hCONDEL. CONCLUSION Our identification of hundreds of hCONDELs with functional impacts reveals new molecular changes that may have shaped our unique biological lineage. These hCONDELs display predicted functions in a variety of biological systems but are especially enriched for function in neuronal tissue. Many hCONDELs induced gains of regulatory activity, a surprising discovery given that deletions of conserved bases are commonly thought to abrogate function. Our work provides a paradigm for the characterization of nucleotide changes shaping species-specific biology across humans or other animals. Human-specific deletions that remove nucleotides from regions highly conserved in other animals (hCONDELs). We assessed 10,032 hCONDELs across diverse, biologically relevant datasets and identified tissue-specific enrichment (top left).
The regulatory impact of hCONDELs was characterized by comparing chimp and human sequences in MPRAs (bottom left). The ability of hCONDELs to either improve or perturb activating and repressing gene-regulatory elements was assessed (top right). The deleted chimpanzee sequence was reintroduced back into human cells, causing a cascade of transcriptional differences for an hCONDEL regulating LOXL2 (bottom right).

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn2253

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

Three-dimensional genome rewiring in loci with human accelerated regions

Keough, Kathleen C. ; Whalen, Sean ; Inoue, Fumitaka ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: Human accelerated regions (HARs) are evolutionarily conserved sequences that acquired an unexpectedly high number of nucleotide substitutions in the human genome since divergence from our common ancestor with chimpanzees. Prior work has established that many HARs are gene regulatory enhancers that function during embryonic development, particularly in neurodevelopment, and that most HARs show signatures of positive selection. However, the events that caused the sudden change in selective pressures on HARs remain a mystery. RATIONALE Because HARs acquired many substitutions in our ancestors after millions of years of extreme constraint across diverse mammals, we reasoned that their conserved roles in regulating development of the brain and other organs must have changed during human evolution. One mechanism that could drive such a functional shift is enhancer hijacking, whereby the target gene repertoire of a noncoding sequence is changed through alterations in three-dimensional genome folding. The regulatory information encoded in a hijacked enhancer would likely need to change to avoid deleterious expression of the altered target gene while also possibly supporting modified expression patterns. Structural variants—large genomic insertions, deletions, and rearrangements—are the greatest sources of sequence differences between the human and chimpanzee genomes, and they have the potential to affect how a region of the genome folds and localizes in the nucleus. We therefore hypothesized that some HARs were generated through enhancer hijacking triggered by nearby human-specific structural variants (hsSVs). RESULTS We leveraged an alignment of hundreds of mammalian genomes plus a Nextflow pipeline that we wrote for automating the detection of lineage-specific accelerated regions to identify 312 high-confidence HARs (zooHARs). 
Through massively parallel reporter assays and machine learning integration of hundreds of epigenomic datasets, we showed that many zooHARs function as neurodevelopmental enhancers and that their human substitutions alter transcription factor binding sites, consistent with previous studies. We further mapped zooHARs to specific cell types and tissues using single-cell open chromatin and gene expression data, and we found that they represent a more diverse set of neurodevelopmental processes than a parallel set of chimpanzee accelerated regions. To test the enhancer hijacking hypothesis, we first examined the three-dimensional neighborhoods of zooHARs using publicly available chromatin capture (Hi-C) data, finding a significant enrichment of zooHARs in domains with hsSVs. This motivated us to use deep learning to predict how hsSVs changed genome folding in the human versus the chimpanzee genomes. We found that 30% of zooHARs occur within 500 kb of an hsSV that substantially alters local chromatin interactions, and we confirmed this association in Hi-C data that we generated in human and chimpanzee neural progenitor cells. Finally, we showed that chromatin domains containing zooHARs and hsSVs are enriched for genes differentially expressed in human versus chimpanzee neurodevelopment. CONCLUSION The origin of many HARs may be explained by human-specific structural variants that altered three-dimensional genome folding, causing evolutionarily conserved enhancers to adapt to different target genes and regulatory domains. Example of HAR enhancer hijacking. The HAR is nearby and regulates gene A, but not gene B, as the chimpanzee genome folds. An insertion in the human genome brings the HAR closer to gene B, causing expression of gene B. The HAR adapts to being in gene B’s regulatory domain through substitutions to previously conserved nucleotides.

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abm1696

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

Integrating gene annotation with orthology inference at scale

Kirilenko, Bogdan M. ; Munegowda, Chetan ; Osipova, Ekaterina ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science, Vol. 380, No. 6643 (2023-04-28)

Abstract: Comparative genomics provides valuable insights into gene function, phylogeny, molecular evolution, and associations between phenotypic and genomic differences. Such analyses require knowledge about which genes originated from a speciation event (orthologs) or from a duplication event (paralogs). Existing methods to detect orthologs in turn require knowledge of the location of genes in the genome (gene annotation), which is itself a challenging problem, resulting in a growing gap between sequenced and annotated genomes. RATIONALE We developed TOGA (Tool to infer Orthologs from Genome Alignments), a genomics method that integrates orthology inference and gene annotation. TOGA takes as input a gene annotation of a reference species (e.g., human, mouse, or chicken) and a whole-genome alignment between the reference and a query genome (e.g., other mammals or birds). It infers orthologous gene loci in the query genome, annotates and classifies orthologous genes, detects gene losses and duplications, and generates protein and codon alignments. Orthology detection relies on the principle that orthologous sequences are generally more similar to each other than to paralogous sequences. Whereas existing methods work with annotated protein-coding sequences, TOGA extends this similarity principle to non-exonic regions (introns and intergenic regions) and uses machine learning to detect orthologous gene loci based on alignments of intronic and intergenic regions. RESULTS We demonstrate that TOGA’s machine learning classifier detects orthologous gene loci with a very high accuracy, and also works for orthologous genes that underwent translocations or inversions. TOGA improves ortholog detection and comprehensively annotates conserved genes, even if transcriptomics data are available. 
Although homology-based methods such as TOGA cannot annotate orthologs of genes that are not present in the reference, we show that reference bias can be effectively counteracted by integrating annotations generated with multiple reference species. TOGA can also be applied to highly fragmented genome assemblies, where genes are often split across scaffolds. By accurately identifying and joining orthologous gene fragments, TOGA annotates entire genes and thus increases the utility of fragmented genomes for comparative analyses. TOGA’s gene classification explicitly distinguishes between genes with missing sequences (indicative of assembly incompleteness) and genes with inactivating mutations (potentially indicative of base errors). We show that this classification provides a superior benchmark for assembly completeness and quality. As genomes are generated at an increasing rate, annotation and orthology inference methods that can handle hundreds or thousands of genomes are needed. TOGA’s reference species methodology scales linearly with the number of query species. By applying TOGA with human and mouse as references to 488 placental mammal assemblies and using chicken as a reference for 501 bird assemblies, we created large comparative resources for mammals and birds that comprise gene annotations, ortholog sets, lists of inactivated genes, and multiple codon alignments. CONCLUSION TOGA provides a general strategy to cope with the annotation and orthology inference bottleneck. We envision three major uses. First, TOGA enables phylogenomic analyses of orthologous genes and screens for gene changes (e.g., selection, loss, and duplication) that are associated with phenotypic differences. Second, TOGA provides annotations of genes that are conserved in newly sequenced genomes, which can be supplemented with transcriptomics data to detect lineage-specific genes or exons. Finally, TOGA’s gene classification provides a powerful genome assembly quality benchmark. 
A different paradigm for orthology inference. Orthologous, but not paralogous, genes have partially aligning intronic and intergenic regions. TOGA uses this principle to infer orthologous gene loci and integrates orthology inference with gene annotation. Using a reference species, TOGA can be applied to hundreds of aligned query genomes to provide rich comparative genomics resources.

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn3107

RVK: TA 1000 ; WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1 ; 2066996-3 ; 2060783-0

SSG: 11

Evolution of the ancestral mammalian karyotype and syntenic regions

Damas, Joana ; Corbo, Marco ; Kim, Jaebum ; [et al.]

Proceedings of the National Academy of Sciences ; 2022

In: Proceedings of the National Academy of Sciences, Vol. 119, No. 40 (2022-10-04)

Abstract: Decrypting the rearrangements that drive mammalian chromosome evolution is critical to understanding the molecular bases of speciation, adaptation, and disease susceptibility. Using 8 scaffolded and 26 chromosome-scale genome assemblies representing 23/26 mammal orders, we computationally reconstructed ancestral karyotypes and syntenic relationships at 16 nodes along the mammalian phylogeny. Three different reference genomes (human, sloth, and cattle) representing phylogenetically distinct mammalian superorders were used to assess reference bias in the reconstructed ancestral karyotypes and to expand the number of clades with reconstructed genomes. The mammalian ancestor likely had 19 pairs of autosomes, with nine of the smallest chromosomes shared with the common ancestor of all amniotes (three still conserved in extant mammals), demonstrating a striking conservation of synteny for ∼320 My of vertebrate evolution. The numbers and types of chromosome rearrangements were classified for transitions between the ancestral mammalian karyotype, descendent ancestors, and extant species. For example, 94 inversions, 16 fissions, and 14 fusions that occurred over 53 My differentiated the therian from the descendent eutherian ancestor. The highest breakpoint rate was observed between the mammalian and therian ancestors (3.9 breakpoints/My). Reconstructed mammalian ancestor chromosomes were found to have distinct evolutionary histories reflected in their rates and types of rearrangements. The distributions of genes, repetitive elements, topologically associating domains, and actively transcribed regions in multispecies homologous synteny blocks and evolutionary breakpoint regions indicate that purifying selection acted over millions of years of vertebrate evolution to maintain syntenic relationships of developmentally important genes and regulatory landscapes of gene-dense chromosomes.

Type of Medium: Online Resource

ISSN: 0027-8424 , 1091-6490

DOI: 10.1073/pnas.2209139119

RVK: TA 1000

RVK: WA 15000

Language: English

Publisher: Proceedings of the National Academy of Sciences

Publication Date: 2022

ZDB-ID: 209104-5

ZDB-ID: 1461794-8

SSG: 11

SSG: 12

Online Resource

Relating enhancer genetic variation across mammals to complex phenotypes using machine learning

Kaplow, Irene M. ; Lawler, Alyssa J. ; Schäffer, Daniel E. ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science Vol. 380, No. 6643 ( 2023-04-28)

In: Science, American Association for the Advancement of Science (AAAS), Vol. 380, No. 6643 ( 2023-04-28)

Abstract: Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abm7993

RVK: TA 1000

RVK: WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1

ZDB-ID: 2066996-3

ZDB-ID: 2060783-0

SSG: 11

Online Resource

Mammalian evolution of human cis-regulatory elements and transcription factor binding sites

Andrews, Gregory ; Fan, Kaili ; Pratt, Henry E. ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2023

In: Science Vol. 380, No. 6643 ( 2023-04-28)

In: Science, American Association for the Advancement of Science (AAAS), Vol. 380, No. 6643 ( 2023-04-28)

Abstract: Mammals, including humans, achieve high levels of organismal complexity largely due to how their proteins are regulated; characterizing the regulatory landscape of the human genome is a longstanding goal of modern biology. Contemporary approaches measure genome-wide biochemical signals, including chromatin accessibility, histone modifications, DNA methylation, and binding of ~1600 transcription factors (TFs) by the human genome. Using these methods, the ENCODE consortium defined almost one million candidate cis-regulatory elements (cCREs). Another approach uses evolutionary conservation to identify potential regulatory regions. We combine these approaches, examining how different functional classes of regulatory elements respond to evolutionary pressures. RATIONALE cCREs tend to be conserved and cCRE classes exhibit varying levels of conservation, suggesting interesting evolutionary dynamics. We examine these dynamics in placental mammals using tools developed by the Zoonomia project: the evolutionary constraint in placental mammals and the reference-free 241-genome alignment. We identify the human cCREs and transcription factor binding sites (TFBSs) conserved in the mammalian lineage, characterize the evolutionary histories of cCREs and TFBSs and identify the driving forces behind their gains and losses and—using biochemical and epigenomic data—assess the likelihood that conserved cCREs and TFBSs are functional in humans and other mammals. RESULTS We explored the ENCODE cCREs derived from epigenomic data and the binding sites of 367 TFs from chromatin immunoprecipitation data. We found a spectrum of mammalian conservation for regulatory elements: on one end lies the highly conserved cCREs and constrained TFBSs, and on the other are primate-specific cCREs and TFBSs overlapping transposable elements (TEs). 
Conserved elements predominate near genes that function in fundamental cellular processes (metabolism, development) and tend to be functional in other mammalian genomes whereas unconstrained elements lie near genes involved in interaction with the environment. We identified ~439 thousand deeply conserved cCREs (47.5% of cCREs and 4% of the human genome) and 2 million TFBSs (0.8% of the human genome) under mammalian constraint. Using a panel of 69 genome-wide association studies, we found that conserved cCREs and constrained TFBSs achieved high heritability enrichment, demonstrating their utility for functional interpretation of human genetic variants. Meanwhile, more than 85% of primate-specific TFBSs—representing more than 20% of all TFBSs—are derived from TEs. Phylogenetic analysis revealed a staggering number of TFBS clusters sharing patterns of presence and absence across primate genomes and enrichment in specific TE families, suggesting that multiple waves of TE insertion spread these TFBSs during primate evolution. CONCLUSION We charted the evolutionary landscapes of cCREs and TFBSs among placental mammals, identifying a subset of elements under purifying selection in the mammalian lineage. These elements are highly enriched in the human genetic variants associated with a panel of diverse, complex traits, with heritability enrichment contributed by both nucleotides under mammalian and nucleotides under primate constraint. Mammalian evolution of the human regulatory landscape. (A) Distribution of human cCREs by the number of genomes they align. (B) Projection of cCREs by alignments to the other 240 mammalian genomes. (C) Project of HNF4A sites (constrained, red; unconstrained, blue). (D) Heritability enrichment for 69 human traits in partitions of TFBSs ordered by evolutionary constraint. (E) Heritability enrichment for human traits by subsets of TFBSs.

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.abn7930

RVK: TA 1000

RVK: WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2023

ZDB-ID: 128410-1

ZDB-ID: 2066996-3

ZDB-ID: 2060783-0

SSG: 11
