Keywords:
Human genetics ; Electronic books.
Description / Table of Contents:
This book surveys the statistical aspects of designing, analyzing, and interpreting genome-wide association scans for genetic causes of disease in unrelated subjects. It also covers the bioinformatics and data-handling methods needed to prepare data for analysis.
Type of Medium:
Online Resource
Pages:
1 online resource (344 pages)
Edition:
1st ed.
ISBN:
9781461494430
Series Statement:
Statistics for Biology and Health Series
URL:
https://ebookcentral.proquest.com/lib/geomar/detail.action?docID=1592968
DDC:
599.935
Language:
English
Note:
Intro -- Acknowledgments -- Contents -- Chapter 1: Introduction -- 1.1 Historical Perspective -- 1.2 DNA Basics -- 1.2.1 Organization of Chromosomes -- 1.2.2 Organization of DNA -- 1.2.3 DNA and Protein -- 1.3 Types of Genetic Variation -- 1.3.1 Single-Nucleotide Variants and Polymorphisms -- 1.3.2 Insertions/Deletions -- 1.3.3 Larger Structural Variants -- 1.3.4 Exonic Variation and Disease -- 1.3.5 Non-exonic SNPs and Disease -- 1.3.6 SNP Haplotypes -- 1.3.7 Microsatellites -- 1.3.8 Mitochondrial Variation -- 1.4 Overview of Genotyping Methods -- 1.4.1 SNP Calling -- 1.5 Overview of GWAS Genotype Arrays -- 1.6 Software and Data Resources -- 1.7 Web Resources -- 1.7.1 Basic Genomics -- 1.7.2 GWAS Associations -- 1.7.3 Annotation -- 1.8 Hardware and Operating Systems -- 1.9 Data Example -- 1.9.1 Save Your Work -- References -- Chapter 2: Topics in Quantitative Genetics -- 2.1 Distribution of a Single Diallelic Variant in a Randomly Mixing Population -- 2.1.1 Hardy-Weinberg Equilibrium -- 2.1.2 Random Samples of Unrelated Individuals -- 2.1.3 Joint Distribution Between Relatives of Allele Counts for a Single SNP -- 2.1.3.1 Identity by Descent -- 2.1.4 Coefficients of Kinship and of Inbreeding -- 2.2 Relationship Between Identity by State and Identity by Descent for a Single Diallelic Marker -- 2.3 Estimating IBD Probabilities from Genotype Data -- 2.4 The Covariance Matrix for a Single Allele in Nonrandomly Mixing Populations -- 2.4.1 Hidden Structure and Correlation -- 2.4.1.1 Relationship Between Balding-Nichols' F Parameter and the Fixation Index Fst -- 2.4.2 Effects of Incomplete Admixture on the Covariance Matrix of a Single Variant -- 2.5 Direct Estimation of Differentiation Parameter F from Genotype Data -- 2.5.1 Relatedness Revisited -- 2.5.2 Estimation of Allele Frequencies -- 2.6 Allele Frequency Distributions.
2.6.1 Initial Mutations and Common Ancestors -- 2.6.2 Mutations and the Coalescent -- 2.6.3 Allelic Distribution of Genetic Variants -- 2.6.4 Allele Distributions Under Population Increase and Selection -- 2.7 Recombination and Linkage Disequilibrium -- 2.7.1 Quantification of Recombination -- 2.7.2 Phased Versus Unphased Data and LD Estimation -- 2.7.3 Hidden Population Structure -- 2.7.3.1 Stable Populations -- 2.7.3.2 Out Migration and Population Expansion -- 2.7.3.3 Population Admixture and Hidden Stratification -- 2.7.3.4 Hidden Relatedness Between Subjects -- 2.7.4 Pseudo-LD Induced by Hidden Structure and Relatedness -- 2.8 Covering the Genome for Common Alleles -- 2.8.1 High-Throughput Sequencing -- 2.9 Principal Components Analysis -- 2.9.1 Display of Principal Components for the HapMap Phase 3 Samples -- 2.10 Chapter Summary -- 2.11 Data and Software Exercises -- References -- Chapter 3: An Introduction to Association Analysis -- 3.1 Single Marker Associations -- 3.1.1 Dominant, Recessive, and Co-dominant Effects -- 3.2 Regression Analysis and Generalized Linear Models in Genetic Analysis -- 3.3 Tests of Hypotheses for Genotype Data Using Generalized Linear Models -- 3.3.1 Test of Hypothesis regarding Genotype Effects Testing Using Logistic Regression in Case-Control Analysis -- 3.3.2 Interpreting Regression Equation Coefficients -- 3.4 Summary of Maximum Likelihood Estimation, Wald Tests, Likelihood Ratio Tests, Score Tests, and Sufficient Statistics -- 3.4.1 Properties of Log Likelihood Functions -- 3.4.2 Score Tests -- 3.4.3 Likelihood Ratio Tests -- 3.4.4 Wald Tests -- 3.4.5 Fisher's Scoring Procedure for Finding the MLE -- 3.4.6 Scores and Information for Normal and Binary Regression -- 3.4.7 Score Tests of beta=0 for Linear and Logistic Models -- 3.4.8 Matrix Formulae for Estimators in OLS Regression.
3.5 Covariates, Interactions, and Confounding -- 3.6 Conditional Logistic Regression -- 3.6.1 Breaking the Matching in Logistic Regression of Matched Data -- 3.6.2 Parent Affected-Offspring Design -- 3.7 Case-Only Analyses -- 3.7.1 Case-Only Analyses of Disease Subtype -- 3.7.2 Case-Only Analysis of Gene×Environment and Gene×Gene Interactions -- 3.8 Non-independent Phenotypes -- 3.8.1 OLS Estimation When Phenotypes Are Correlated -- 3.9 Needs of a GWAS Analysis -- 3.9.1 Hardware Requirements for GWAS -- 3.9.2 Software Solutions -- 3.10 The Multiple Comparisons Problem -- 3.11 Behavior of the Bonferroni Correction with Non-Independent Tests -- 3.12 Reliability of Small p-Values -- 3.12.1 Test of a Single Binomial Proportion -- 3.12.2 Test of a Difference in Binomial Proportions -- 3.13 Chapter Summary -- Appendix -- References -- Chapter 4: Correcting for Hidden Population Structure in Single Marker Association Testing and Estimation -- 4.1 Effects of Hidden Population Structure on the Behavior of Statistical Tests for Association -- 4.1.1 Effects on Inference Induced by Correlated Phenotypes -- 4.1.2 Influences of Latent Variables -- 4.1.3 Hidden Structure as a Latent Variable -- 4.1.4 Polygenes, Latent Structure, Hidden Relatedness, and Confounding -- 4.1.5 Hidden Non-mixing Strata -- 4.1.5.1 Eigenvector Analysis -- 4.1.5.2 Varying the Number of Strata -- 4.1.6 Admixture -- 4.1.6.1 Eigen Analysis of the Relationship Matrix for Simple Admixture -- 4.1.7 Polygenes and Cryptic Relatedness -- 4.1.7.1 Effects of Hidden Relatedness -- 4.2 Correcting for the Effects of Hidden Structure and Relatedness -- 4.2.1 Genomic Control -- 4.2.2 Regression-based Adjustment for Leading Principal Components -- 4.2.3 Implementation of Principal Components Adjustment Methods -- 4.2.3.1 Estimation of K.
4.2.3.2 Choosing the Number of Eigenvectors to Include as Adjustment Variables -- 4.2.4 Random Effects Models -- 4.2.4.1 Introduction to Estimation of Random Effects Models -- 4.2.4.2 Software for Genetic Applications -- 4.2.4.3 Estimation of K -- 4.2.4.4 Control of Confounding Using Random Effects Models for Case-Control Data -- 4.2.5 Retrospective Methods -- 4.2.5.1 The Bourgain Test -- 4.2.5.2 An Empirical Bourgain Test -- 4.3 Comparison of Correction Methods by Simulation -- 4.3.1 Comparison of the Mixed Model and Retrospective Approach for Binary (case-control) Outcomes -- 4.3.2 Conclusions -- 4.4 Behavior of the Genomic Control Parameter as Sample Size increases -- 4.5 Removing Related Individuals as Part of Quality Control, Is It Needed? -- 4.6 Chapter Summary -- Data and Software Exercises -- References -- Chapter 5: Haplotype Imputation for Association Analysis -- 5.1 The Role of Haplotypes in Association Testing -- 5.2 Haplotypes, LD Blocks, and Haplotype Uncertainty -- 5.3 Haplotype Frequency Estimation and Imputation -- 5.3.1 Small Numbers of SNPs -- 5.3.2 Haplotype Uncertainty -- 5.4 Haplotype Frequency Estimation for Larger Numbers of SNPs -- 5.4.1 Partition-Ligation EM Algorithm -- 5.4.2 Phasing Large Numbers of SNPs -- 5.5 Regression Analysis Using Haplotypes as Explanatory Variables -- 5.5.1 Expectation Substitution -- 5.5.2 Fitting Dominant, Recessive, or Two Degrees of Freedom Models for the Effect of Haplotypes -- 5.5.2.1 Global Test for Haplotype Effects -- 5.6 Dealing with Uncertainty in Haplotype Estimation in Association Testing -- 5.6.1 Full Likelihood Estimation of Risk Parameters and Haplotype Frequencies -- 5.6.2 Ascertainment in Case-Control Studies -- 5.6.3 Example: Expectation-Substitution Method -- 5.7 Haplotype Analysis Genome-Wide -- 5.7.1 Studies of Homogeneous Non-admixed Populations.
5.7.2 The Four-Gamete Rule for Fast Block Definition -- 5.7.3 Multiple Comparisons in Haplotype Analysis -- 5.8 Multiple Populations -- 5.9 Chapter Summary -- References -- Chapter 6: SNP Imputation for Association Studies -- 6.1 The Role of Imputed SNPs in Association Testing -- 6.2 EM Algorithm and SNP Imputation -- 6.3 Phasing Large Numbers of SNPs for the Reference Panel -- 6.4 Brief Introduction to Hidden Markov Models -- 6.4.1 The Baum-Welch Algorithm -- 6.5 Large-Scale Imputation Using HMMs -- 6.6 Using an HMM to Impute Missing Genotype Data when Both the Reference Panel and Study Genotypes Are Phased -- 6.7 Using an HMM to Phase Reference or Main Study Genotypes -- 6.7.1 Initializing and Updating the Current List of Haplotypes -- 6.8 Practical Issues in Large-Scale SNP Imputation -- 6.8.1 Assessing Imputation Accuracy -- 6.8.2 Imputing Rare SNPs -- 6.8.3 Use of Cosmopolitan Reference Panels -- 6.9 Estimating Relative Risks for Imputed SNPs -- 6.9.1 Expectation Substitution -- 6.10 Chapter Summary -- 6.10.1 Links -- References -- Chapter 7: Design of Large-Scale Genetic Association Studies, Sample Size, and Power -- 7.1 Design Considerations -- 7.2 Sample Size and Power for Studies of Unrelated Subjects -- 7.2.1 Power for Chi-Square Tests -- 7.2.2 Calculation of Non-centrality Parameters for Chi-Square Tests in Generalized Linear Models -- 7.3 QUANTO -- 7.3.1 Use of QUANTO to Compute Power to Detect Main Effects of Genetic Variants in Case-Control Studies -- 7.4 Alternative Designs -- 7.4.1 Sibling Controls -- 7.4.2 Power for Interactions -- 7.4.3 Parent-Affected-Offspring Trios -- 7.4.4 Power for Case-Only Analysis of Interactions -- 7.5 Control for Multiple Comparisons -- 7.5.1 Single Marker Associations -- 7.5.2 More Complex Marker Associations -- 7.5.3 Reliability of Very Small p-Values -- 7.6 Two-Staged Genotyping Designs.
7.6.1 Measured SNP Association Tests.