GLORIA — GEOMAR Library Ocean Research Information Access

1

Online Resource

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Schneider, Valerie A. ; Graves-Lindsay, Tina ; Howe, Kerstin ; [et al.]

Cold Spring Harbor Laboratory ; 2017

In: Genome Research Vol. 27, No. 5 (2017-05), p. 849-864

Abstract: The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Type of Medium: Online Resource

ISSN: 1088-9051, 1549-5469

DOI: 10.1101/gr.213611.116

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2017

ZDB-ID: 1483456-X

SSG: 12

2

Online Resource

Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

Richards, Stephen ; Liu, Yue ; Bettencourt, Brian R. ; [et al.]

Cold Spring Harbor Laboratory ; 2005

In: Genome Research Vol. 15, No. 1 (2005-01), p. 1-18

Abstract: We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

Type of Medium: Online Resource

ISSN: 1088-9051

DOI: 10.1101/gr.3059305

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2005

ZDB-ID: 1483456-X

SSG: 12

3

Online Resource

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Earl, Dent ; Bradnam, Keith ; St. John, John ; [et al.]

Cold Spring Harbor Laboratory ; 2011

In: Genome Research Vol. 21, No. 12 (2011-12), p. 2224-2241

Abstract: Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

Type of Medium: Online Resource

ISSN: 1088-9051

DOI: 10.1101/gr.126599.111

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2011

ZDB-ID: 1483456-X

SSG: 12

4

Online Resource

Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes

Dolle, Dirk D. ; Liu, Zhicheng ; Cotten, Matthew ; [et al.]

Cold Spring Harbor Laboratory ; 2017

In: Genome Research Vol. 27, No. 2 (2017-02), p. 300-309

Abstract: We are rapidly approaching the point where we have sequenced millions of human genomes. There is a pressing need for new data structures to store raw sequencing data and efficient algorithms for population scale analysis. Current reference-based data formats do not fully exploit the redundancy in population sequencing nor take advantage of shared genetic variation. In recent years, the Burrows–Wheeler transform (BWT) and FM-index have been widely employed as a full-text searchable index for read alignment and de novo assembly. We introduce the concept of a population BWT and use it to store and index the sequencing reads of 2705 samples from the 1000 Genomes Project. A key feature is that, as more genomes are added, identical read sequences are increasingly observed, and compression becomes more efficient. We assess the support in the 1000 Genomes read data for every base position of two human reference assembly versions, identifying that 3.2 Mbp with population support was lost in the transition from GRCh37 with 13.7 Mbp added to GRCh38. We show that the vast majority of variant alleles can be uniquely described by overlapping 31-mers and show how rapid and accurate SNP and indel genotyping can be carried out across the genomes in the population BWT. We use the population BWT to carry out nonreference queries to search for the presence of all known viral genomes and discover human T-lymphotropic virus 1 integrations in six samples in a recognized epidemiological distribution.
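The abstract above rests on the BWT/FM-index idea: reads are concatenated, transformed, and queried by backward search, so asking whether a k-mer (31-mers in the paper) has support in the read set reduces to a count query. The Python sketch below is a toy illustration under stated assumptions, not the authors' population-BWT construction. It builds the BWT by naively sorting rotations (quadratic, tiny inputs only), joins reads with a `#` separator and a single `$` terminator (our choice), and keeps an uncompressed rank table. The names `build_bwt`, `FMIndex`, and `build_read_index` are hypothetical.

```python
from collections import Counter


def build_bwt(text: str) -> str:
    """Naive BWT via sorted cyclic rotations (quadratic; toy inputs only)."""
    assert text.endswith("$") and text.count("$") == 1
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    """Minimal FM-index supporting exact-match counting by backward search."""

    def __init__(self, text: str):
        self.bwt = build_bwt(text)
        counts = Counter(self.bwt)
        # C[ch] = number of characters in the text that sort before ch
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i] = occurrences of ch in bwt[:i] (full, unsampled rank table)
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + (b == ch)

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern in the indexed text."""
        lo, hi = 0, len(self.bwt)  # half-open range of matching sorted rows
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


def build_read_index(reads):
    """Hypothetical helper: index a read collection as one '#'-separated text."""
    return FMIndex("#".join(reads) + "$")


reads = ["ACGTACGT", "TTACGA"]
index = build_read_index(reads)
print(index.count("ACG"))  # 3
print(index.count("GGG"))  # 0
```

Because patterns drawn from A/C/G/T never contain `#` or `$`, a match cannot straddle two reads, so `count(kmer) > 0` answers "is this k-mer present in some read". A realistic index would sample the rank table rather than store a full prefix count per character; the sketch trades memory for clarity.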

Type of Medium: Online Resource

ISSN: 1088-9051, 1549-5469

DOI: 10.1101/gr.211748.116

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2017

ZDB-ID: 1483456-X

SSG: 12

5

Online Resource

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Pruitt, Kim D. ; Harrow, Jennifer ; Harte, Rachel A. ; [et al.]

Cold Spring Harbor Laboratory ; 2009

In: Genome Research Vol. 19, No. 7 (2009-07), p. 1316-1323

Abstract: Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

Type of Medium: Online Resource

ISSN: 1088-9051

DOI: 10.1101/gr.080531.108

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2009

ZDB-ID: 1483456-X

SSG: 12

6

Online Resource

The Atlas Genome Assembly System

Havlak, Paul ; Chen, Rui ; Durbin, K. James ; [et al.]

Cold Spring Harbor Laboratory ; 2004

In: Genome Research Vol. 14, No. 4 (2004-04), p. 721-732

Abstract: Atlas is a suite of programs developed for assembly of genomes by a “combined approach” that uses DNA sequence reads from both BACs and whole-genome shotgun (WGS) libraries. The BAC clones afford advantages of localized assembly with reduced computational load, and provide a robust method for dealing with repeated sequences. Inclusion of WGS sequences facilitates use of different clone insert sizes and reduces data production costs. A core function of Atlas software is recruitment of WGS sequences into appropriate BACs based on sequence overlaps. Because construction of consensus sequences is from local assembly of these reads, only small (<0.1%) units of the genome are assembled at a time. Once assembled, each BAC is used to derive a genomic layout. This “sequence-based” growth of the genome map has greater precision than with non-sequence-based methods. Use of BACs allows correction of artifacts due to repeats at each stage of the process. This is aided by ancillary data such as BAC fingerprint, other genomic maps, and syntenic relations with other genomes. Atlas was used to assemble a draft DNA sequence of the rat genome; its major components including overlapper and split-scaffold are also being used in pure WGS projects.

Type of Medium: Online Resource

ISSN: 1088-9051

DOI: 10.1101/gr.2264004

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2004

ZDB-ID: 1483456-X

SSG: 12

7

Online Resource

An Overview of Ensembl

Birney, Ewan ; Andrews, T. Daniel ; Bevan, Paul ; [et al.]

Cold Spring Harbor Laboratory ; 2004

In: Genome Research Vol. 14, No. 5 (2004-05), p. 925-928

Abstract: Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to “widen” this biological integration to include other model organisms relevant to understanding human biology as they become available; to “deepen” this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.

Type of Medium: Online Resource

ISSN: 1088-9051

DOI: 10.1101/gr.1860604

RVK: XA 10000

Language: English

Publisher: Cold Spring Harbor Laboratory

Publication Date: 2004

ZDB-ID: 1483456-X

SSG: 12
