LandScape: a simple method to aggregate p-values and other stochastic variables without a priori grouping

Carsten Wiuf; Jonatan Schaumburg-Müller Pallesen; Leslie Foldager; Jakob Grove

doi:10.1515/sagmb-2015-0085

Published by De Gruyter June 7, 2016

From the journal Statistical Applications in Genetics and Molecular Biology

Abstract

In many areas of science it is customary to perform many tests, potentially millions, simultaneously. To gain statistical power, it is common to group tests based on a priori criteria, such as predefined regions or sliding windows. However, choosing grouping criteria is not straightforward, and the results may depend on the criteria chosen. Methods that summarize, or aggregate, test statistics or p-values without relying on a priori criteria are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables, based on theoretical considerations and resampling techniques, and show that under certain assumptions the family-wise error rate (FWER) is controlled in the strong sense. The validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures that evaluate test statistics individually. Moreover, because it is agnostic and does not rely on predefined regions, it might be a practical alternative to conventional methods for aggregating p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).
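To make the idea of aggregation without predefined groups concrete, the following is a minimal Python sketch. It is not the authors' LandScape implementation, and the abstract does not specify the algorithm. It assumes a generic reflected random walk over scores derived from p-values, summarizes each excursion above zero by its peak, and assesses the largest peak by permuting positions. The function names, the score `-log10(p) - drift`, the `drift` constant, and the permutation scheme (which presumes exchangeable tests under the null) are all illustrative assumptions.

```python
import math
import random


def excursion_peaks(pvalues, drift=1.0):
    """Aggregate a sequence of p-values into excursions of a reflected walk.

    Illustrative assumption, not the published method: each p-value
    (must be > 0) contributes the score -log10(p) - drift, and the walk
    is reflected at zero. Every stretch where the walk stays above zero
    is one excursion, summarized as (start_index, peak_index, peak_height).
    """
    peaks = []
    height = 0.0
    start = None
    best_height, best_index = 0.0, None
    for i, p in enumerate(pvalues):
        score = -math.log10(p) - drift
        height = max(0.0, height + score)
        if height > 0.0:
            if start is None:
                # A new excursion begins at this position.
                start = i
                best_height, best_index = 0.0, None
            if height > best_height:
                best_height, best_index = height, i
        elif start is not None:
            # The walk returned to zero, so the excursion is closed.
            peaks.append((start, best_index, best_height))
            start = None
    if start is not None:
        # The sequence ended while an excursion was still open.
        peaks.append((start, best_index, best_height))
    return peaks


def max_peak_pvalue(pvalues, drift=1.0, n_perm=1000, seed=0):
    """Permutation p-value for the highest excursion peak.

    Assumes the tests are exchangeable under the global null, which is a
    simplification; correlated tests would need a different resampling scheme.
    """
    rng = random.Random(seed)
    observed = max((h for _, _, h in excursion_peaks(pvalues, drift)), default=0.0)
    shuffled = list(pvalues)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max = max((h for _, _, h in excursion_peaks(shuffled, drift)), default=0.0)
        if null_max >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    example = [0.5, 0.5, 0.001, 0.01, 0.001, 0.5, 0.5, 0.5, 0.5]
    print(excursion_peaks(example))
    print(max_peak_pvalue(example, n_perm=200))
```

Comparing the maximum peak with its permutation distribution is one common way to account for all excursions at once. Whether it matches the significance evaluations described in the article would have to be checked against the full text and the released code.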

Keywords: association mapping; genome scan; multiple testing; random walk

Acknowledgments

The study was supported by grants from the Danish Strategic Research Council (2101-07-0059), the Lundbeck Foundation, Denmark, and the Danish Cancer Society.

Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2015-0085) offers supplementary material, available to authorized users.

Published Online: 2016-6-7

Published in Print: 2016-8-1
