Introduction

Literacy and numeracy are two key learning outcomes of formal education. Given the societal importance of acquiring functional ability in these areas, considerable research has been invested in understanding how these skills develop and why people differ in their ability. Numerous factors influence variation in performance on assessments of literacy and numeracy, including socioeconomic status (SES; Sirin 2005), home environments (Smyth et al. 2010), classroom and teacher (Byrne et al. 2010; Nye et al. 2004), study (Cooper et al. 2006), confidence (Stankov et al. 2014), and genetic variation (Kovas et al. 2007). Genetic variation has generally been found to account for more than half of the variation in literacy and numeracy performance in school students in developed countries (de Zeeuw et al. 2015; Olson et al. 2014). Yet assessing ability at a single time point captures only one aspect of the development of these skills. Students also vary in the rate at which they acquire literacy and numeracy skills. Depending on the skill measured and the initial level of performance, growth trajectories have been found to differ and may either widen or narrow achievement gaps (Pfost et al. 2014; Shin et al. 2013). Variation in growth trajectories has also been linked with SES, cognitive resources, and behavior (Baumert et al. 2012; Morgan et al. 2011). However, relatively few studies have investigated the influence of genetic variation on growth, or the extent of environmental influences after controlling for genetic influences.

In this paper we examine the relative influence of genes and the environment on both stability and growth in literacy and numeracy performance in Australian twins assessed on the National Assessment Program in Literacy and Numeracy (NAPLAN). Each year since 2008, students in Grades 3, 5, 7, and 9 have sat nationwide standardised tests of reading, spelling, grammar and punctuation, writing, and numeracy (Senate Standing Committee on Education and Employment 2014). Student growth is one aspect of the reporting available on these tests, and has been considered a basis on which to judge the value added by schools (Masters et al. 2008). However, the extent of genetic influence on growth should be assessed before growth in scores is assumed to indicate school value. Grasby et al. (2016) found that genes were important contributors to variation in each of the literacy and numeracy domains at each grade level in these Australian tests. Here we test the genetic and environmental etiology of individual differences in both stability and growth of literacy and numeracy.

Past performance is one of the best predictors of achievement in an academic domain; students who achieve highly in one year will typically achieve highly in a later year, and those who score lower in one year will typically score lower in another (Hattie 2008). For reading, genes have been found to be the strongest contributor to this stability in students' relative performance in the early years of school (Byrne et al. 2005; Olson et al. 2011; Samuelsson et al. 2008; Soden et al. 2015), and from middle school to high school (Betjemann et al. 2008; Harlaar et al. 2007; Wadsworth et al. 2001). These studies cover a range of reading skills, including phonological awareness, word reading, spelling, vocabulary, comprehension, and reading fluency. Consistently, many of the same genes influence performance at multiple ages, with genetic correlations ranging from .65 (Harlaar et al. 2007) to 1.0 (Betjemann et al. 2008; Olson et al. 2011; Wadsworth et al. 2001). Similar results have been found for mathematics, with genetic correlations of .62, .68, and .73 reported among participants tested at ages 7, 9, and 10 (Kovas et al. 2007). Across these studies, genes have explained at least 75 % of the phenotypic correlation in performance across time in both reading skills and mathematics.
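The statement that genes explain a given share of a cross-time phenotypic correlation follows from a bivariate ACE decomposition. As a minimal sketch, assuming standardised phenotypes at two occasions, the phenotypic correlation is the sum of three products, sqrt(a1² × a2²) × rA + sqrt(c1² × c2²) × rC + sqrt(e1² × e2²) × rE, and the genetic share is the first term divided by the total. The function name and all input values below are hypothetical illustrations, not estimates from the studies cited.

```python
import math

def decompose_phenotypic_correlation(a2, c2, e2, r_a, r_c, r_e):
    """Split a cross-occasion phenotypic correlation into A, C, and E parts.

    a2, c2, e2 are (occasion 1, occasion 2) standardised variance components;
    for each occasion i, a2[i] + c2[i] + e2[i] is assumed to equal 1 (not checked).
    """
    parts = {
        "A": math.sqrt(a2[0] * a2[1]) * r_a,
        "C": math.sqrt(c2[0] * c2[1]) * r_c,
        "E": math.sqrt(e2[0] * e2[1]) * r_e,
    }
    r_p = sum(parts.values())
    # Proportion of the phenotypic correlation contributed by each source.
    return r_p, {source: value / r_p for source, value in parts.items()}

# Hypothetical inputs (not taken from any study cited above).
r_p, shares = decompose_phenotypic_correlation(
    a2=(0.70, 0.70), c2=(0.10, 0.10), e2=(0.20, 0.20),
    r_a=0.95, r_c=0.80, r_e=0.30,
)
print(f"r_P = {r_p:.3f}; genetic share = {shares['A']:.2f}")
```

With these assumed values the genetic term dominates the phenotypic correlation even though heritability at each occasion is below the genetic share, because the genetic correlation across occasions is much higher than the environmental ones.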

Although genes contribute substantially to correlated performance over time in literacy and numeracy, it does not necessarily follow that variation in growth of performance is primarily due to genetic influences. In one possible scenario, genes maintain students' relative standing in a way that is largely similar across individuals, while environmental factors cause growth to deviate from this trajectory. Such a scenario would produce high phenotypic and genetic correlations between assessments, yet most of the variation in growth would arise from the environment. Although this scenario is theoretically possible, it is more likely that both genetic and environmental factors influence variation in growth of literacy and numeracy.
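The scenario described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are ours rather than the authors': unrelated individuals instead of twins, a single stable "genetic" level that sets each student's standing at both occasions, and growth that depends only on an environmental deviation plus measurement noise. All variances and function names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_stable_genes_env_growth(n=5000, seed=1):
    rng = random.Random(seed)
    level = [rng.gauss(0, 1) for _ in range(n)]         # stable genetic level (hypothetical)
    env_growth = [rng.gauss(0, 0.3) for _ in range(n)]  # environmental deviation in growth
    time1 = [g + rng.gauss(0, 0.2) for g in level]      # score = level + measurement noise
    time2 = [g + 1.0 + d + rng.gauss(0, 0.2)            # mean growth of 1.0 plus deviation
             for g, d in zip(level, env_growth)]
    growth = [b - a for a, b in zip(time1, time2)]
    return pearson(time1, time2), pearson(level, growth)

r_stability, r_level_growth = simulate_stable_genes_env_growth()
print(f"r(time1, time2) = {r_stability:.2f}; r(genetic level, growth) = {r_level_growth:.2f}")
```

Under these assumed variances the expected correlation between occasions is about 1/sqrt(1.04 × 1.13) ≈ .92, while the genetic level is, in expectation, uncorrelated with growth: high stability and environmentally driven growth can coexist.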

Biometric growth curve models have been applied to three twin studies of reading: the Western Reserve Reading Project (WRRP; Logan et al. 2013; Petrill et al. 2010), the International Longitudinal Twin Study (ILTS; Christopher et al. 2013a, b), and the Florida Twin Project on Reading (FTPR; Hart et al. 2013). All of these studies model growth of reading ability from the beginning of school to early or middle school and, where enough measurement occasions were available, showed that the rate of reading growth decelerated over time. The studies differ somewhat in the relative influence of genes and the shared environment. Both genes and the shared environment were significant contributors to variation in growth of oral reading fluency among participants in the FTPR (Hart et al. 2013). Similarly, both genes and the shared environment were significant contributors to variation in growth of word reading among participants in the WRRP (Logan et al. 2013) and among Scandinavian participants in the ILTS (Christopher et al. 2013a). However, the shared environment was not a significant contributor to variation in growth of word reading among Australian or Colorado participants in the ILTS (Christopher et al. 2013a, b). For reading comprehension, only the shared environment substantially and significantly influenced growth among participants in the WRRP (Logan et al. 2013). Among Colorado participants in the ILTS, by contrast, genes, the shared environment, and the unique environment were of comparable size, but only genes and the unique environment were significant (Christopher et al. 2013b). The only study to include a measure of spelling was also conducted on Colorado participants in the ILTS; again, genetic, shared environmental, and unique environmental influences on growth were of comparable size, but only genes and the unique environment were significant (Christopher et al. 2013b).
Across these few studies, genes and the unique environment were generally important contributors to growth in reading, while the shared environment was important in some samples (i.e. WRRP, FTPR, ILTS Scandinavia).

The current study will first build on previous work by assessing the genetic contribution to stability of performance in reading, spelling, grammar and punctuation, writing, and numeracy in students from Grade 3 through to Grade 9. Thus far, genetic influences on stability in reading and spelling have been assessed in Australian children only from preschool to Grade 2 (Byrne et al. 2005, 2007, 2009). Given the substantial contribution of genes to stability in performance in younger Australian children and in students from the USA and the UK, we expect the longitudinal phenotypic correlations in reading and spelling from Grade 3 to Grade 9 to be mostly due to genes. To date, no studies have examined the longitudinal influence of genes on stability in grammar and punctuation or writing performance, although genes are substantial contributors to variation in both of these domains (heritability of .58 for grammar and punctuation and .45 for writing, averaged across all four grades; Grasby et al. 2016). Longitudinal genetic influence on mathematics performance has been reported only for students aged 7 to 10 in the UK (Kovas et al. 2007); this age span is similar to that of our first two grades of testing, and we expect to replicate a strong genetic contribution to stability in numeracy performance.

Secondly, the current study will expand research into the etiology of variation in growth by using biometric growth curve models to assess growth in several achievement domains not previously studied, namely grammar and punctuation, writing, and numeracy. Moreover, our model will assess whether the genetic and environmental influences on growth are similar for females and males. Some phenotypic studies have found that growth differs between reading and numeracy. A compensatory growth pattern has been found in reading, such that poorer readers partially catch up with stronger readers, while a more stable or even fan pattern has been found in numeracy, such that those who initially perform better also improve faster than those who initially performed more poorly (Baumert et al. 2012; Morgan et al. 2011; Shin et al. 2013). Thus, the influence of genes and the environment might plausibly differ across academic domains. In contrast with the studies from the ILTS, WRRP, and FTPR, in which growth was modelled from the first year or two of schooling, our initial year of testing is Grade 3, by which time the rapid initial growth in reading performance has typically slowed (Bloom et al. 2008). Previous biometric growth studies that show substantial influence from the shared environment might do so because of variation in early reading instruction, which will not be captured in the current study. However, the tests in the current study are designed to assess achievement in line with educational curricula (Senate Standing Committee on Education and Employment 2014). We therefore expect continued growth, and variation in growth, over the grades, and it is possible that differences in teaching methods and school factors, which would mostly emerge as shared environmental effects in a twin study, will contribute to growth beyond the initial rapid development in reading.
In summary, for growth we will assess the extent of genetic and environmental influences, whether these influences are similar for females and males, and whether compensatory or fan patterns of growth are due to genes, the environment, or both. In determining the extent to which variation in growth is attributable to environmental influences, we will thus evaluate the appropriateness of attributing variation in growth to teachers and schools.

Methods

Participants

Twins and triplets born from 1993 to 2006 were recruited through the voluntary Australian Twin Registry. For the 34 sets of triplets, a random pair from each set was selected for the analyses; hereafter, all multiple births are referred to as “twins.” Twins were invited to participate if they had sat (or would sit by 2014) a NAPLAN test. Of the 6853 families contacted, 2272 (33 %) consented to participate. For those who consented, state departments provided NAPLAN results for 1949 families, and zygosity information was available for 1940 of these. For the current study, twins who either skipped or repeated grades were excluded, resulting in 1927 twin pairs who sat NAPLAN tests in the same year and at 2-year intervals between tests.

Zygosity was determined with a short questionnaire (Lykken et al. 1990), which classified a sub-sample of twins in this study with 95 % accuracy when compared against parent-reported DNA zygosity results. Our final sample included 865 monozygotic (MZ; 454 female, 411 male) and 1062 dizygotic (DZ; 301 female, 286 male, 475 opposite-sex) twin pairs. Table 1 details the number of twin pairs by sex and zygosity for each test. Because the tests were introduced in 2008, data are not available for all participants at each grade: some participants were too old to have sat NAPLAN in their early grades of school, and others were not yet old enough to have taken the later tests. Thus, most of the longitudinal missing data are missing by design. The number of twin pairs with results at each grade (with the number of those pairs who also had results at each earlier grade in parentheses) was 1178 in Grade 3; 1101 in Grade 5 (773 with Grade 3); 990 in Grade 7 (717 with Grade 5 and 413 with Grade 3); and 809 in Grade 9 (653 with Grade 7, 408 with Grade 5, and 149 with Grade 3). The average age at the time of testing in Grade 3 was 8.6 years.

Table 1 Descriptive statistics, intraclass correlations by sex and zygosity, and heritability estimates for each domain and grade

Materials

National Assessment Program in Literacy and Numeracy

The NAPLAN is a nationwide, standardised assessment introduced in Australia in 2008. Each year, students in Grades 3, 5, 7, and 9 sit tests in reading, writing, language conventions, and numeracy. The test content is based on the “statements of learning for English” and the “statements of learning for mathematics,” which inform state and territory curricula. For each calendar year, test calibration and scaling were based on the Rasch model. Scores for each achievement domain were equated across grades using common items in tests from adjacent grades, and equated with previous years by administering both equating tests and the NAPLAN tests to a sub-sample of students. Raw scores were transformed into scores on a common scale from 0 to 1000. This scaled score spans all years of the test and was designed to measure growth within cohorts (i.e. calendar year) and to allow comparison across cohorts. Technical information and test administration details were obtained from the Australian Curriculum Assessment and Reporting Authority (2015a, b; R. Randall, personal communication, July 10, 2013). Example test papers and writing prompts are available at www.nap.edu.au.
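For readers unfamiliar with the Rasch model, the following generic sketch shows how it links a student's ability (theta) and an item's difficulty (b) to the probability of a correct response, and how ability can be estimated by maximum likelihood when item difficulties are treated as known. It is not ACARA's calibration procedure: the equating across grades and years and the transformation to the 0–1000 reporting scale described above are not reproduced, and the function names and difficulty values are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, max_iter=100, tol=1e-10):
    """Maximum likelihood ability estimate by Newton-Raphson, with item difficulties treated as known."""
    if all(responses) or not any(responses):
        raise ValueError("ML estimate is infinite for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        score = sum(x - p for x, p in zip(responses, probs))  # first derivative of log likelihood
        information = sum(p * (1 - p) for p in probs)         # negative second derivative
        step = score / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical three-item test: two of three items answered correctly.
theta_hat = estimate_theta([1, 1, 0], [-1.0, 0.0, 1.0])
print(round(theta_hat, 3))
```

At the maximum likelihood estimate the expected number correct equals the observed number correct, which is why the raw score is a sufficient statistic for ability under the Rasch model.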

Reading comprehension

The reading comprehension test comprised 7–8 passages. The passages were extracts or adaptations from books, newspaper articles, posters, or poems. Passage length varied from a single brief paragraph of about 100 words to several paragraphs totalling about 450 words. There were 5–8 items relating to each passage. Most items were multiple-choice, with a few short-answer questions in each test. For Grades 3 and 5 there were 35–38 items to be completed in 45–50 min, and for Grades 7 and 9 there were 45–50 items to be completed in 65 min. Cronbach’s alpha was .85 or above for each test in each year, indicating high internal reliability.
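The internal reliability figures reported for each domain are Cronbach's alpha, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), for k items. The sketch below computes it from an item-score matrix; the function name and the responses are hypothetical and are not NAPLAN data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = sum(pvariance(item) for item in zip(*scores))  # variance of each item column
    total_var = pvariance([sum(person) for person in scores])  # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 0/1 responses from five students on three items.
responses = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(responses), 3))
```

Population and sample variances give the same alpha as long as the same convention is used for items and totals, because the common scaling factor cancels in the ratio.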

Spelling

The spelling test presents misspelt words in simple sentences and requires students to identify and correct the spelling errors. For Grades 3 and 5 there were 23–25 items, and for Grades 7 and 9 there were 25–30 items. The spelling test is administered in the same paper as the grammar and punctuation test, and students are given 40–45 min to complete both question sets. For the spelling test, Cronbach’s alpha was .90 or above for each test in each year, indicating high internal reliability.

Grammar and punctuation

The grammar questions ask students to choose the correct word(s) to complete a sentence. In early grades this form of question is used to assess correct tense, pronouns, conjunctions, and verb forms; in later grades, relative pronouns, clauses, and comparative adjectives are also assessed. The punctuation questions ask students to insert or identify punctuation marks at the correct location in a sentence. For all grades there were 23–28 items. Cronbach’s alpha ranged from .71 to .87 (average .80) across tests and years, indicating acceptable internal reliability.

Writing

The writing test consists of a writing stimulus, which provides an idea or topic, to which students respond in a specified writing style (i.e. narrative, informative, or persuasive). For example: “It is cruel to keep animals in cages. What do you think? Do you agree or disagree? Perhaps you can think of ideas for both sides of this topic.” The same prompt and style are used for all grade levels in a given year. Students have 40 min of writing time. Marks are awarded on 10 criteria: audience, text structure, ideas, vocabulary, cohesion, paragraphing, sentence structure, punctuation, spelling, and a final criterion that depends on the specified writing style. For persuasive writing (2011–2014) this criterion was persuasive devices, and for narrative writing (2008–2010) it was character and setting. The maximum score was 47 from 2008 to 2010 and 48 from 2011 to 2014. Cronbach’s alpha, calculated using data pooled across all grades, was .93 or above for each test year, indicating high internal reliability; unfortunately, we were unable to obtain inter-rater reliability information.

Numeracy

The numeracy test assesses five aspects of mathematics. Working mathematically includes problem solving, reasoning, and interpretation. Number includes counting and computation. Algebra, function and pattern includes working with functions and relationships, graphs, equations, and rules. Measurement, chance and data includes working with units, likelihood, and inference. Space includes shape and location. Most items are multiple-choice, with a few short-answer questions in each test. In Grades 7 and 9, students sit both a calculator-allowed and a non-calculator numeracy test. For Grades 3 and 5 there were 35–40 items to be completed in 45–50 min. For Grades 7 and 9 there were 62–64 items across the combined calculator and non-calculator papers, with each paper to be completed in 40 min. Cronbach’s alpha was .84 or above for each test in each year, indicating high internal reliability.

Procedure

Along with consent forms, parents completed a questionnaire detailing the school attended at the time of each NAPLAN test, the home environment, child medical history, and zygosity. After receiving parental consent, the state and territory departments of education provided NAPLAN test results.

The NAPLAN tests are administered in the morning over three consecutive days each year in the second full week of May (approximately 3.5 months into the school year). On the first day the language conventions test (comprising the spelling and the grammar and punctuation domains) is administered and, after a break of at least 20 min, is followed by the writing test. On the second day the reading test is administered. On the third day the numeracy tests are administered; for Grades 7 and 9 the first test permits use of a calculator and the second does not. Within specific constraints, students with disability can be provided with support such as scribing or having numeracy questions read to them. Nationally, 96 % of students participate in the tests.

Analyses

Phenotypic growth curves were estimated with the R package lavaan (Rosseel 2012). Biometric models were estimated using the scaled scores and full information maximum likelihood estimation in OpenMx, which uses all available data (Boker et al. 2011). Sex, age, age-squared, age-by-sex, and cohort effects have been found to covary with mean performance, and sex has been shown to moderate heritability in some of the domains and grades (details in Grasby et al. 2016). Thus, age, age-squared, age-by-sex, and cohort effects were regressed out of the scaled scores. Sex effects were tested within the latent growth curve model for each domain (detailed below).
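Regressing covariates out of the scaled scores can be sketched as an ordinary least squares fit whose residuals are carried forward. This is a minimal sketch assuming numpy, with cohort entered as dummy variables (first cohort as reference) and sex entering only through the age-by-sex term, as listed above. The function name and data are hypothetical, and the original analyses may have differed in details such as centring age or adding the mean back to the residuals.

```python
import numpy as np

def residualize_scores(scores, age, sex, cohort):
    """Return OLS residuals of scores after removing age, age^2, age-by-sex, and cohort effects.

    sex is coded 0 = female, 1 = male; cohort is any label (e.g. test year).
    """
    scores = np.asarray(scores, dtype=float)
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    cohort = np.asarray(cohort)
    levels = np.unique(cohort)
    # One dummy per cohort except the first, which serves as the reference level.
    cohort_dummies = [(cohort == level).astype(float) for level in levels[1:]]
    X = np.column_stack([np.ones_like(age), age, age ** 2, age * sex, *cohort_dummies])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return scores - X @ beta

# Hypothetical data: ten students from three cohorts.
age = np.array([8.1, 8.5, 9.0, 8.7, 8.3, 8.9, 8.6, 8.2, 8.8, 8.4])
sex = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
cohort = np.array([2008, 2008, 2009, 2009, 2010, 2010, 2008, 2009, 2010, 2008])
rng = np.random.default_rng(0)
scores = 300 + 12 * age + 0.5 * age ** 2 + 4 * age * sex + rng.normal(0, 20, size=10)
resid = residualize_scores(scores, age, sex, cohort)
```

Because sex itself is not in the design matrix, a main effect of sex survives in the residuals, consistent with sex effects being tested later within the growth curve model.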

Before estimating genetic, shared environmental, and unique environmental correlations (rA, rC, and rE, respectively), sex effects were tested within a sex-limited correlated factors model (Neale et al. 2006). This model estimated the path loadings and correlations across time separately for each sex. Sex effects were first tested by constraining all parameters to be equal across sex; significance was determined using the likelihood ratio test, which compares the difference in −2 log likelihood between nested models to a χ2 distribution with degrees of freedom equal to the difference in the number of estimated parameters (Rijsdijk and Sham 2002). If sex effects were present, the correlational structure alone was tested for sex effects by equating the correlations across sex while allowing the path estimates to vary; this effectively allowed for quantitative sex differences.
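The likelihood ratio test described above can be sketched as follows. In practice OpenMx's `mxCompare()` or `scipy.stats.chi2.sf` would supply the p-value; to keep the sketch self-contained, the χ2 upper tail is computed here from the series for the regularized lower incomplete gamma function. The function names and the −2LL values in the usage example are hypothetical; the χ2 statistics passed to `chi2_sf` are those reported in the Results.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(m2ll_constrained, npar_constrained, m2ll_free, npar_free):
    """Compare nested models from their -2 log likelihoods and parameter counts."""
    stat = m2ll_constrained - m2ll_free  # difference in -2LL
    df = npar_free - npar_constrained    # difference in estimated parameters
    return stat, df, chi2_sf(stat, df)

# Reproduces a reported p-value: numeracy loadings and correlations equated across sex.
print(round(chi2_sf(87.84, 62), 3))
```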

A phenotypic growth curve was first estimated to determine the shape of growth for each domain. All participants were used in estimating the phenotypic growth curves; non-independence between twins in a pair will bias the standard errors but not the parameter estimates, which are the focus of these phenotypic analyses. Growth was centered at Grade 3 (growth loading fixed to 0), the loading for Grade 5 was fixed to 1, and the loadings for Grade 7 and Grade 9 were estimated (parameters g7 and g9, respectively, in Fig. 1). The estimates from the phenotypic models were used to specify the shape of growth in the biometric growth curve models.
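With loadings of 0 and 1 for Grades 3 and 5 and freely estimated g7 and g9, the growth factor is a latent basis, and the model-implied means and covariances follow the usual structural equation formulas: means = Lambda × kappa and covariance = Lambda × Psi × Lambda' + Theta. The sketch below assumes numpy; the function names and all numeric values (including g7 and g9) are hypothetical, not the estimates in Table 4.

```python
import numpy as np

def growth_loadings(g7, g9):
    """Latent-basis loadings for Grades 3, 5, 7, 9: intercept column of ones, growth column (0, 1, g7, g9)."""
    return np.array([[1.0, 0.0], [1.0, 1.0], [1.0, g7], [1.0, g9]])

def implied_moments(loadings, factor_means, factor_cov, error_vars):
    """Model-implied means (Lambda @ kappa) and covariance (Lambda Psi Lambda' + Theta)."""
    mu = loadings @ factor_means
    sigma = loadings @ factor_cov @ loadings.T + np.diag(error_vars)
    return mu, sigma

# Hypothetical values, chosen only to show a decelerating trajectory.
lam = growth_loadings(g7=1.8, g9=2.4)
mu, sigma = implied_moments(
    lam,
    factor_means=np.array([400.0, 80.0]),
    factor_cov=np.array([[5000.0, -600.0], [-600.0, 400.0]]),
    error_vars=np.array([1200.0, 1000.0, 900.0, 900.0]),
)
print(mu)
```

Because g7 − 1 and g9 − g7 are smaller than the fixed Grade 3 to Grade 5 step of 1, the implied means increase at a decreasing rate; growth is expressed in units of the Grade 3 to Grade 5 change.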

Fig. 1

Biometric latent growth model. The left side of the figure represents Twin 1 and the right side represents Twin 2. I = intercept; G = growth; g7 = Grade 7 growth parameter from the phenotypic model; g9 = Grade 9 growth parameter from the phenotypic model. Each path from the latent additive genetic (A1, A2), shared environmental (C1, C2), and unique environmental (E1, E2) factors was moderated by sex; this moderation is not depicted for lack of space. Correlations between the co-twins' time-specific error variances (u) were equated across grades but were allowed to differ between monozygotic (MZ) and dizygotic (DZ) twins.

The biometric latent growth curve model is depicted in Fig. 1. The model estimates variation in performance at an intercept (I) and in growth (G) (see McArdle et al. 1998 for a detailed discussion). Variation in the intercept and growth, and their covariance, are decomposed into additive genetic (A), shared environmental (C), and unique environmental (E) sources. In line with classic twin design methodology (see Plomin et al. 2013), genetic correlations between MZ twins are fixed to 1 and those between DZ twins to .5; shared environmental correlations are fixed to 1 for all twins. Deviations of scores from those predicted by the growth curve are modelled as time-point-specific error variance (u). The biometric growth curve models applied to the WRRP, FTPR, and ILTS data differ in some respects. In particular, in the growth curve model of the ILTS data the errors were allowed to correlate between twins in a pair (Christopher et al. 2013a, b), and Christopher et al. (2013b) showed that models with correlated errors fit better than models with uncorrelated errors. Given the non-independence of twins within a pair, the errors of co-twins are likely to be somewhat interdependent, for example because of stressful family-level events. Christopher et al. (2013b) contrasted results from both models and showed that the influence of the shared environment on growth in word reading was significant when estimated in the model with uncorrelated errors but negligible when errors were allowed to correlate. Thus, in the ILTS data, the biometric latent growth curves with uncorrelated errors inflated the effect of the shared environment on variation in growth. To reduce this possible source of bias in estimating variance in growth, our model allowed the errors of twins within a pair to correlate. This error correlation (rU) was estimated separately for MZ and DZ twins to allow for possible genetic influences on the errors.
Sex was modelled as a covariate on the latent intercept and growth factors and as a moderator of the A, C, and E variance–covariance structure. For each domain, sex moderation was then dropped from the covariance structure, and the significance of the change in fit was determined using the likelihood ratio test.
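The expected covariance structure implied by this biometric growth model can be sketched by path tracing. Within a person, the intercept and growth factors have covariance SA + SC + SE; across co-twins it is rel × SA + SC, with rel = 1 for MZ and .5 for DZ twins. The time-specific errors add Theta within a person and rU × Theta across co-twins, which assumes equal error variances for Twin 1 and Twin 2. This is a minimal sketch assuming numpy; sex moderation is omitted, and all function names and values are hypothetical.

```python
import numpy as np

def twin_expected_cov(loadings, SA, SC, SE, error_vars, r_u, zygosity):
    """Expected 8 x 8 covariance of two twins' scores at four grades under an ACE growth model.

    SA, SC, SE: 2 x 2 covariance matrices of the intercept and growth factors.
    r_u: cross-twin correlation of the time-specific errors (one value for all grades).
    Sex moderation is omitted from this sketch.
    """
    rel_a = 1.0 if zygosity == "MZ" else 0.5   # additive genetic relatedness
    within = SA + SC + SE                      # factor covariance within a person
    between = rel_a * SA + SC                  # factor covariance across co-twins
    theta = np.diag(error_vars)
    own = loadings @ within @ loadings.T + theta
    cross = loadings @ between @ loadings.T + r_u * theta
    return np.block([[own, cross], [cross.T, own]])

# Hypothetical inputs; loadings use g7 = 1.8 and g9 = 2.4 for illustration only.
lam = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.8], [1.0, 2.4]])
SA = np.array([[3500.0, -400.0], [-400.0, 250.0]])
SC = np.array([[800.0, -100.0], [-100.0, 80.0]])
SE = np.array([[700.0, -100.0], [-100.0, 70.0]])
ev = np.array([1200.0, 1000.0, 900.0, 900.0])
cov_mz = twin_expected_cov(lam, SA, SC, SE, ev, r_u=0.10, zygosity="MZ")
cov_dz = twin_expected_cov(lam, SA, SC, SE, ev, r_u=0.05, zygosity="DZ")
```

Passing a different `r_u` for each zygosity mirrors the separate MZ and DZ error correlations; a larger MZ value lets genetic factors contribute to the deviations from the growth curve.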

Results

Descriptives and longitudinal correlations

Descriptive statistics, intraclass correlations, heritability, shared and unique environmental estimates, and total variance for each grade and domain are reported in Table 1. The change in means over time shows that scores increased at a decreasing rate in each domain. The heritability estimates were substantial and significant in all domains and grades, most shared environmental estimates were small, and the unique environmental estimates (including measurement error) were greatest in the writing domain. Total variance decreased with increasing grade in all domains except writing, in which total variance increased with grade. Within each domain, phenotypic correlations were high among most grades, though correlations across grades for writing were more moderate (see Table 2). Correlations between grades further apart in time were as high as those between adjacent grades.

Table 2 Phenotypic correlations among the grades in each domain

Constraining parameters to be equal across sex in the correlated factors model resulted in no significant loss of fit for reading, χ2(62) = 53.3, p = .778, spelling, χ2(62) = 75.6, p = .115, grammar and punctuation, χ2(62) = 40.3, p = .985, or writing, χ2(62) = 57.9, p = .626. By contrast, for numeracy the correlations could be equated across sex, χ2(50) = 34.99, p = .947, but the path loadings could not also be equated, χ2(62) = 87.84, p = .017. Thus, for numeracy, the correlational structure was similar for females and males, but the relative influence of genes and the environment on numeracy performance differed. Accordingly, in all domains, females and males were combined for analyses of correlations across time.

The genetic correlations within domain and across time were high; 95 % confidence intervals typically included 1, indicating that largely the same genes influenced performance at each grade level (see Table 3). Shared environmental correlations for reading were also large and mostly significant, but for the other domains most estimates had wide confidence intervals and were non-significant. The unique environmental correlations for reading, grammar and punctuation, and numeracy were modest but mostly significant; for numeracy, they tended to be more substantial between the later grades. Spelling had larger unique environmental correlations than the other domains (average .48). Thus, for these domains, some unique environmental factors generally influenced performance at multiple grades. For writing, however, the unique environmental correlations were small and mostly non-significant. Because the unique environmental correlations across grades were mostly modest, much of the unique environmental variance at each grade level may reflect measurement error.

Table 3 Genetic, shared environmental, and unique environmental correlations between grades for each domain

Biometric growth curve

The shape estimates from the phenotypic growth curves were rounded to one decimal place and tested for fit against the observed data. Shape estimates and fit indices are reported in Table 4; fit was very good in each domain (an RMSEA below .01, .05, and .08 indicates excellent, good, and acceptable fit, respectively; MacCallum et al. 1996).
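The RMSEA cut-offs above apply to the root mean square error of approximation, RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))). Some software uses N rather than N − 1 in the denominator. The sketch below computes the index and applies the cut-offs cited in the text; the function names and the χ2, df, and N values are hypothetical, not the values in Table 4.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from a model chi-square (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rmsea_label(value):
    """Classify RMSEA using the cut-offs cited in the text (MacCallum et al. 1996)."""
    if value < 0.01:
        return "excellent"
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "acceptable"
    return "above .08"

# Hypothetical fit statistics.
value = rmsea(chi2=12.0, df=5, n=2001)
print(round(value, 4), rmsea_label(value))
```

A χ2 at or below its degrees of freedom yields an RMSEA of exactly 0, because the numerator is truncated at zero.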

Table 4 Phenotypic growth estimates for Grade 7 and Grade 9 and fit indices for each domain

For the biometric growth curve models, sex could be dropped as a moderator of the covariance structure without significant loss of fit for reading, χ2(18) = 24.06, p = .153, and grammar and punctuation, χ2(18) = 18.35, p = .433. However, sex could not be dropped for spelling, χ2(18) = 40.26, p = .002, writing, χ2(18) = 35.40, p = .008, or numeracy, χ2(18) = 61.54, p < .001, indicating sex differences in the quantitative contribution of genes and the environment to variation in growth of performance in spelling, writing, and numeracy. Significant sex differences were found on the intercept in all domains; because sex was coded 0 for females and 1 for males, the estimates in Table 5 indicate that girls scored higher on the literacy domains and boys scored higher on numeracy. Sex effects on growth indicated that boys' performance grew significantly more steeply than girls' in reading, spelling, and numeracy. Taken together, these sex effects indicate that in reading and spelling, girls scored higher than boys in Grade 3, but this difference diminished over time as boys grew faster. For grammar and punctuation and for writing, girls outperformed boys by a relatively stable margin across grades. For numeracy, boys scored higher than girls in Grade 3, and this difference widened as the grades progressed.
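How an intercept effect and a growth effect of sex combine into a grade-by-grade difference can be made explicit. With sex coded 0 for females and 1 for males, the model-implied male-minus-female difference at each grade is the sex effect on the intercept plus the growth loading for that grade times the sex effect on growth. The sketch below uses hypothetical coefficients and loadings, not the estimates in Table 5.

```python
def sex_difference_by_grade(b_intercept, b_growth, growth_loadings):
    """Model-implied male-minus-female difference at each grade (sex coded 0 = female, 1 = male)."""
    return [b_intercept + lam * b_growth for lam in growth_loadings]

# Hypothetical reading-like pattern: girls ahead in Grade 3, boys growing faster.
grades = [3, 5, 7, 9]
diffs = sex_difference_by_grade(b_intercept=-15.0, b_growth=4.0,
                                growth_loadings=[0.0, 1.0, 1.8, 2.4])
for grade, d in zip(grades, diffs):
    print(f"Grade {grade}: {d:+.1f}")
```

A negative intercept effect combined with a positive growth effect produces the narrowing gap described for reading and spelling; a positive intercept effect combined with a positive growth effect would instead produce the widening gap described for numeracy.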

Table 5 Estimated mean, variance and sex effect on intercept and growth, and error correlations for each domain

Variation in the intercept and growth is reported in Table 5. Because all domains are scored on a 0–1000 scale with similar means across domains at each grade, variances are comparable across domains: writing had noticeably less variation on the intercept than the other domains, and was also the only domain whose variation increased as the grades progressed. For spelling, writing, and numeracy, where sex could not be dropped from the covariance structure, boys had greater variance than girls on both the intercept and growth. The variation in growth was narrower for numeracy than for the literacy domains, indicating that students grew at quite similar rates in NAPLAN numeracy, particularly girls.

Correlations between the errors of co-twins were significant in each domain. Most were small, indicating that the time-point-specific errors are largely uncorrelated between co-twins. However, correlations were larger for MZ than for DZ twins and could not be equated in any domain without a significant loss of model fit, suggesting that some genetic factors contribute to the deviation of scores from those predicted by the growth curve.

Genetic and environmental influences on growth

Standardised A, C, and E variance components of the intercept and growth are reported in Table 6 for each domain, where A1, C1, and E1 represent the intercept and A2, C2, and E2 represent growth; the off-diagonal elements are the standardised A, C, and E covariances between the intercept and growth. On the intercept, genetic influences were significant for each domain, with standardised variance estimates ranging from .56 for numeracy performance among girls to .89 for spelling performance among girls. Shared environmental influences were generally modest and were significant for all domains except spelling among girls, with estimates ranging from .06 for numeracy among boys to .41 for numeracy among girls. The unique environmental influence on the intercept was generally small and significant in each domain, with estimates ranging from .04 for numeracy among girls to .20 for writing among boys.

Table 6 Estimates of standardised A, C, E variance and covariance components of the intercept (A1, C1, E1) and growth (A2, C2, E2) for each domain

For reading, genetic influences on growth were substantial and significant, with a modest influence from the unique environment. Much of the genetic influence on growth was shared with that on the intercept. The genetic correlation (rA) between the intercept and growth was −.76, calculated as the genetic covariance divided by the square root of the product of the genetic variances of the intercept and growth (−.59/√(.80 × .74) = −.76). About half of the unique environmental influences on growth were shared with the intercept (unique environmental correlation, rE = −.54). Negative covariation between intercept and growth indicates that, on average, individuals who scored higher in Grade 3 grew more slowly than individuals who scored lower in Grade 3. The genetic and unique environmental correlations between intercept and growth indicate that many of the genes and about half of the unique environmental factors that lead individuals to score higher in Grade 3 also contribute to less growth.
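The correlation calculation above can be written as a one-line function that applies equally to the A, C, or E component. Standardising the covariance and the two variances by the same phenotypic totals leaves the ratio unchanged, so the standardised estimates from Table 6 can be used directly. The function name is hypothetical. With the rounded values quoted in the text the result is about −.77; the reported −.76 presumably reflects the unrounded estimates.

```python
import math

def component_correlation(cov, var_intercept, var_growth):
    """Correlation between intercept and growth for one variance component (A, C, or E)."""
    return cov / math.sqrt(var_intercept * var_growth)

# Rounded standardised genetic estimates for reading quoted in the text.
r_a = component_correlation(-0.59, 0.80, 0.74)
print(round(r_a, 2))
```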

For spelling, variation in growth for girls was significantly influenced by genes, the shared environment, and the unique environment. The genetic influence was modest, while the shared environment was the most substantial influence. This shared environmental influence on growth was independent of any factors influencing performance in Grade 3, as there was no influence from the shared environment on performance in spelling for girls in Grade 3. For boys, the shared environment was the only significant influence on variation in growth of spelling, and the same factors that influenced growth also influenced spelling in Grade 3 (shared environmental correlation, rC = −.99).

For grammar and punctuation, genetic influences on growth were not significant. The shared environment was the most substantial influence on growth, with a more modest influence from the unique environment. The factors influencing growth partly overlapped with, and were partly independent of, those influencing performance in Grade 3 (rC = −.67 and rE = −.33).

For writing, genetic influences on growth for girls were small but significant. The unique environment was the most substantial influence on growth for girls, with most factors that influenced growth also influencing performance in Grade 3 (rE = −.92). For boys, the shared and unique environments influenced growth to a similar extent, with approximately half of the factors influencing growth also influencing performance in Grade 3 (rC = −.47 and rE = −.42).

For numeracy, genes and the shared environment influenced growth to a similar extent among girls, and these were largely the same factors that influenced variation in Grade 3 (rA = 1.0 and rC = −.89). The positive genetic correlation in numeracy indicates that, unlike in reading, the genes that influenced girls to score higher in Grade 3 also influenced them to grow faster. However, no overall fan effect (a widening spread of scores over time) was evident for girls, because this positive genetic correlation between intercept and growth was counterbalanced by a negative correlation between intercept and growth on the shared environmental factors. No genetic influence on growth in numeracy was evident among boys; instead, both the shared and unique environments were substantial influences. Unlike in the other domains, most of the factors influencing growth in numeracy among boys were independent of those influencing performance in Grade 3 (rC = −.18 and rE = −.09).
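Why a positive genetic correlation need not produce a fan effect can be illustrated with a short Python sketch. When the standardised A, C, and E components each sum to one for the intercept and for growth, the phenotypic intercept-growth correlation is the sum of each component's contribution, √(V_intercept × V_growth) × r. The intercept variances, rA, and rC below are the numeracy values for girls quoted in the text; the growth variances and rE = 0 are hypothetical placeholders, so the output shows only how opposite-signed terms offset each other, not the paper's estimates.

```python
import math


def phenotypic_correlation(components):
    """Sum the contributions of each variance component to the
    intercept-growth correlation.

    components maps a label ("A", "C", "E") to a tuple of
    (standardised intercept variance, standardised growth variance,
     component correlation between intercept and growth).
    Returns (total phenotypic correlation, per-component contributions).
    """
    contributions = {
        name: math.sqrt(v_int * v_growth) * r
        for name, (v_int, v_growth, r) in components.items()
    }
    return sum(contributions.values()), contributions


# Intercept variances, rA and rC: numeracy, girls (from the text).
# Growth variances and rE = 0: HYPOTHETICAL placeholders.
r_p, parts = phenotypic_correlation({
    "A": (0.56, 0.40, 1.00),
    "C": (0.41, 0.40, -0.89),
    "E": (0.04, 0.20, 0.00),
})
for name, value in parts.items():
    print(f"{name}: {value:+.3f}")
print(f"total: {r_p:+.3f}")
```

With these placeholder values the negative shared environmental term cancels most of the positive genetic term, leaving only a small overall intercept-growth correlation.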

Discussion

The purpose of this paper was to examine the relative influence of genes and the environment on longitudinal stability and growth in performance on various measures of literacy and numeracy in Australian school students through the middle years of formal education. Consistent with research into the development of reading skills, genetic factors were the strongest contributor to stability in performance over time, not only in reading, but in all domains. In contrast, the etiology of variation in growth differed by domain. For reading, genetic factors were the strongest influence on growth with some influence from unique environmental factors. For spelling, the shared environment was the strongest influence on growth for both girls and boys, with a smaller influence from genes and the unique environment for girls. For grammar and punctuation, shared environmental factors were again the strongest influence on growth with some influence from unique environmental factors. For writing, unique environmental influences were the strongest influence on growth for girls, while both shared and unique environmental factors were significant for boys. For numeracy, both shared and unique environmental factors were again significant for boys, while genetic and shared environmental factors were substantial influences on growth for girls.

Longitudinal stability

Strong phenotypic correlations across the four grades for reading, spelling, grammar and punctuation, and numeracy indicated a high level of stability in relative performance over time. Correlations for reading, numeracy, and particularly spelling were very high (.73, .77, and .81 respectively, averaged across all grade comparisons), and genes mediated most of this stability in performance (74 % for reading, 81 % for numeracy, and 87 % for spelling). Phenotypic correlations for grammar and punctuation were a little lower (.66 on average) but were also predominantly mediated by genes (76 %). The reported internal reliabilities of the grammar and punctuation test varied more from year to year than those of the other domains, and greater measurement error in this test might account for its slightly lower phenotypic correlations between grades.
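Percentages of a phenotypic correlation "mediated by genes" are conventionally obtained from the bivariate decomposition r_P = √(a²₁a²₂)·rA + √(c²₁c²₂)·rC + √(e²₁e²₂)·rE, with the genetic share equal to the first term divided by r_P. The Python sketch below assumes this decomposition; all inputs are hypothetical and are not the estimates behind the percentages reported here.

```python
import math


def genetic_mediation(a2, c2, e2, r_a, r_c, r_e):
    """Decompose the phenotypic correlation between two grades.

    a2, c2, e2: (grade_1, grade_2) tuples of standardised A, C, E variances.
    r_a, r_c, r_e: genetic, shared and unique environmental correlations
    between the two grades.
    Returns (phenotypic correlation, proportion of it mediated by genes).
    """
    a_term = math.sqrt(a2[0] * a2[1]) * r_a
    c_term = math.sqrt(c2[0] * c2[1]) * r_c
    e_term = math.sqrt(e2[0] * e2[1]) * r_e
    r_p = a_term + c_term + e_term
    return r_p, a_term / r_p


# HYPOTHETICAL inputs: a highly heritable, stable domain
r_p, share = genetic_mediation((0.70, 0.70), (0.10, 0.10), (0.20, 0.20), 0.95, 0.90, 0.20)
print(f"r_P = {r_p:.3f}, genetic share = {share:.0%}")

# HYPOTHETICAL inputs: unique environment as large as A but weakly
# correlated across grades
r_p, share = genetic_mediation((0.45, 0.45), (0.10, 0.10), (0.45, 0.45), 0.95, 0.90, 0.10)
print(f"r_P = {r_p:.3f}, genetic share = {share:.0%}")
```

The second call mirrors the pattern described for writing: a large but time-specific unique environmental component lowers the phenotypic correlation while leaving genes as the main mediator of whatever stability remains.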

Performance in writing was somewhat less stable than in the other domains, with phenotypic correlations among grades of .51 on average. In the grade-specific estimates, the unique environment accounted for as much of the variation in performance as genes. However, small unique environmental correlations between grades and high genetic correlations meant that genes mediated most of the phenotypic correlations in writing (72 % on average). A strong unique environmental influence combined with low unique environmental correlations over time indicates that this substantial influence lacks continuity from one grade to the next. This is consistent with measurement error, but it could also reflect genuine time-specific unique influences on the writing tests. The tests require students to develop a coherent argument or narrative from a prompt, so it is reasonable to expect individuals to produce more creative or inspired work in some years than in others, depending on their personal experiences and interests. The strong genetic mediation of stability in writing performance might stem from some of the foundation skills of writing, such as vocabulary, spelling, grammar, and punctuation. Stability of performance in these skills has been shown, both in previous work (Olson et al. 2011; Samuelsson et al. 2008) and in the current study, to be strongly mediated by genes.

Genetic correlations near unity indicate that, for each domain, essentially the same genes influenced performance across the different grades. For reading, this finding is consistent with results from simplex models of reading assessments in the FTPR (Hart et al. 2013) and in the ILTS and WRRP (Soden et al. 2015). These studies all found that genetic factors at Grade 1 continued to influence performance through all the grades assessed (Grades 4, 5, or 6 depending on the study duration), and none found significant novel genetic influences after Grade 3. Our findings extend this evidence of genetic stability in performance to other literacy domains and to numeracy. In contrast to our results and those from the USA, novel genetic influences at later ages have been found for both reading and math in children assessed at ages 7, 9, and 10 in the UK (Kovas et al. 2007). These differences might be linked to the different forms of assessment employed across the studies: teacher ratings in the UK study, oral reading fluency in the FTPR, and reading comprehension in the ILTS, WRRP, and the current study. Fundamentally, in these NAPLAN data the genetic variation among students in Grade 3 continues to influence performance and contributes to relative stability in performance through to Grade 9.

Growth

In each domain, genetic variation was the strongest contributor to the intercept of the growth function. The heritability of these intercepts was slightly higher than general estimates of heritability for performance in Grade 3, because some of the total variance in Grade 3 is modelled as error variance in the latent growth curve. For reading, girls scored higher than boys in Grade 3, but boys had a faster rate of growth than girls, which reduced the sex difference in mean scores over time. Despite this, there were no significant sex differences in the relative contribution of genes and the environment to variation in growth. Genes were the most substantial contributor to individual differences in growth in reading, and these were largely the same genes that influenced performance in Grade 3. The unique environment had a more modest influence on growth in reading, indicating that specific environmental influences on variation in reading growth from Grade 3, such as different teachers, instruction methods, or interests, play a much smaller role than genes.
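The point about error variance can be made concrete with a small Python sketch. It assumes, for simplicity, that the Grade 3 residual in the growth model carries no genetic variance, and the unstandardised variances are hypothetical; it shows only that removing part of the observed variance into a residual term raises the proportion attributed to genes.

```python
def heritability(a, c, e, residual=0.0):
    """Proportion of variance due to additive genetic effects (A).

    a, c, e: unstandardised A, C, E variances of the latent factor.
    residual: time-specific error variance, ASSUMED here to be non-genetic.
    """
    return a / (a + c + e + residual)


# HYPOTHETICAL unstandardised variances for Grade 3 scores
a, c, e, residual = 60.0, 15.0, 10.0, 15.0
print(f"latent intercept h2: {heritability(a, c, e):.2f}")
print(f"observed Grade 3 h2: {heritability(a, c, e, residual):.2f}")
```

In a fitted model the residual may itself be decomposed into A, C, and E, so the gap between the two estimates would generally be smaller than in this simplified case.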

The negative covariation between the intercept and growth is consistent with a compensatory model of reading development, in which students with poorer initial performance catch up a little to students who initially read better. Moreover, our results show that this gain is largely due to genetic factors. In Pfost et al.’s (2014) review, a compensatory developmental pattern such as this was associated more strongly with constrained reading skills. Constrained skills are skills that are universally mastered, such as letter knowledge or phonics, so individuals differ primarily in their age of acquisition and the time taken to reach mastery (Paris 2005). Interestingly, reading comprehension, which is the skill assessed in these NAPLAN data, is not a constrained reading skill. However, in Pfost et al.’s review, reading comprehension was also associated more with compensatory growth than with an increasing achievement gap. It is possible that the compensatory growth observed in reading comprehension stems from compensatory growth in the constrained reading skills that are precursors to reading comprehension. These precursor, or codependent, skills are substantially influenced by genetic variability in the early years of school (Byrne et al. 2005; Petrill et al. 2007). Given the genetic nature of the compensatory growth in NAPLAN reading, this relative improvement in poorer readers may largely reflect an inherent developmental delay in mastering the necessary reading skills.
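The link between a negative intercept-growth covariance and a narrowing of individual differences can be sketched with the model-implied variance of a linear latent growth model, Var(y_t) = σ²_I + 2λ_t·σ_IS + λ_t²·σ²_S (plus residual variance). The Python sketch assumes linear time scores of 0 to 3 for Grades 3 to 9 and hypothetical variance values; the paper's actual loadings and estimates may differ.

```python
import math


def implied_variances(var_i, var_s, r_is, time_scores, residual=0.0):
    """Model-implied variance of observed scores at each time point in a
    linear latent growth model:
        Var(y_t) = var_i + 2 * t * cov_is + t**2 * var_s + residual
    where cov_is is the intercept-growth covariance."""
    cov_is = r_is * math.sqrt(var_i * var_s)
    return [var_i + 2 * t * cov_is + t ** 2 * var_s + residual for t in time_scores]


# HYPOTHETICAL linear time scores: Grade 3 = 0, Grade 5 = 1, Grade 7 = 2, Grade 9 = 3
times = [0, 1, 2, 3]
for label, r_is in [("negative (compensatory)", -0.6), ("positive (fan-spread)", 0.5)]:
    variances = implied_variances(1.0, 0.04, r_is, times)
    print(label, [round(v, 3) for v in variances])
```

Because the implied variance is quadratic in t, a negative covariance narrows the spread of scores only over a limited range; with these placeholder values it shrinks steadily across the four time points, whereas a positive covariance widens it.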

Results for grammar and punctuation were similar in some respects to those for reading: girls scored higher than boys in Grade 3 and boys had a faster rate of growth than girls, which narrowed the achievement gap between girls and boys over time. Furthermore, there were no significant sex differences in the relative contribution of genes and the environment to growth. As for reading, genes, the shared environment, and the unique environment all made significant contributions to variation in performance at initial testing in Grade 3. Grammar and punctuation also showed compensatory growth, such that the poorer performing students in Grade 3 tended to catch up a little to the higher performing students. However, the influences on growth differed from reading: for grammar and punctuation, variation in growth was attributable to the environment rather than to genes. Therefore, the relative improvement of poorer performers in grammar and punctuation is due to environmental influences, most of which are shared environmental factors. These would include potential effects of the home environment, teachers, and schools.

For the remaining literacy domains, spelling and writing, girls and boys differed significantly in the relative contribution of genes and the environment to variation in growth. As with the other literacy skills, girls scored higher than boys in Grade 3, but there were no significant sex differences in average growth, indicating a relatively stable achievement gap between girls and boys in spelling and writing over time. Although there were small genetic influences on growth in both spelling and writing for girls, the environment was the strongest influence, as it was for grammar and punctuation. For spelling, the shared environment was the most substantial and significant influence on growth. This was also a compensatory growth pattern, suggesting a possible instructional influence on the relative improvement of poorer spellers over time. For writing, unlike the other literacy domains, there was no overall compensatory growth for poorer writers. For girls the principal influence was the unique environment, while for boys both the shared and unique environments were important influences on variation in growth in writing. This sex difference in environmental influence is interesting; however, the confidence intervals around each of these estimates are wide. It is possible that girls and boys respond to their environments, either home or school, in fundamentally different ways that affect the growth of their writing performance. However, given the lack of any other biometric longitudinal analysis of writing, replication would be desirable before drawing firm conclusions about the influence of unique versus shared experiences on sex differences in growth in writing.

In contrast to the literacy domains, boys scored higher than girls in Grade 3 numeracy and had a faster rate of growth, resulting in increasing sex differences over time. At initial testing in Grade 3, shared environmental influences were stronger for girls than for boys, and genetic influences were stronger for boys than for girls. As with writing, there was no evidence of compensatory growth. For girls, genes and the shared environment influenced variation in growth in numeracy about equally, while for boys growth was influenced by the shared and unique environments. Moreover, the environmental influences on growth for boys were predominantly independent of the environmental influences on performance in Grade 3. This independence indicates that the factors influencing boys' growth are either experienced after Grade 3 or only become relevant to performance after Grade 3. These could include the impact of different teachers or classes over time through the middle years of school.

Implications

For the most part, many of the influences on variation in growth are already present and influencing performance in Grade 3. In many cases this is part of a compensatory growth pattern, in which those who performed better in Grade 3 grow more slowly: the higher performing students do not show the same increase in performance as those who performed more poorly in Grade 3. This might reflect a ceiling effect for high performers on these NAPLAN tests, at least in our sample.

The influence of genetic factors on growth in many of the domains suggests that the claim that growth in NAPLAN performance reflects the value added by the school should be tempered. This is particularly important for reading, where variation in growth was strongly influenced by genes. Because of the timing of the NAPLAN tests within the school year and because each student is tested only every second year, growth in performance cannot be considered an accurate measure of a teacher or class effect, although growth has been suggested to reflect a school effect. Most twin pairs attended the same school (95, 98, 95, and 92 % in Grades 3, 5, 7, and 9 respectively), so the effect of variation between schools would predominantly appear as a shared environmental effect. Other than for reading, the shared environment was a significant influence on growth in NAPLAN performance, so our results are consistent with schools influencing growth. Unfortunately, our results cannot confirm this interpretation, because the shared environment includes everything that twins have in common that influences growth in performance, not least the same family environment.

Limitations

The inability to tease apart the influence of the home from that of the school environment, particularly regarding the shared environmental influence on variation in growth in spelling, grammar and punctuation, writing, and numeracy, limits the extent to which these findings can be interpreted. There was considerable overlap between the environmental influences on growth and those on performance in Grade 3, indicating that factors affecting growth are stable for individuals over time or have a long-term influence. However, with Grade 3 as the initial testing time, it is not possible to assess whether any long-term influences on growth come from family factors, perhaps even pre-dating formal schooling, or from the early years of school. Even where there is no substantial overlap in environmental factors between growth and initial performance, as with numeracy among boys, these models cannot identify the specific factors that influence growth or determine whether they are educational in origin.

Conclusions

There were two main goals of this paper: to assess the relative genetic and environmental influences on (a) stability and (b) growth in literacy and numeracy in Australian school students. Genes were the predominant influence on stability in performance in reading, spelling, grammar and punctuation, writing, and numeracy. Phenotypic correlations were high among all grades, and genes were the principal mediator (78 % on average) of this stability in performance. Genes were also the main influence on growth in reading. Many of the same genes that contributed to variation in performance at initial testing also influenced growth in reading, in such a way that those who performed more poorly in Grade 3 closed the achievement gap a little in subsequent grades. For the other literacy domains of spelling, grammar and punctuation, and writing, environmental factors were the principal influences on growth, and in the case of spelling and of grammar and punctuation these environmental influences contributed to a narrowing of the achievement gap over time. In contrast to the literacy domains, boys outperformed girls at initial testing in numeracy and the achievement gap increased over time. For girls, genetic and shared environmental factors influenced variation in growth in numeracy, and these factors were the same as those that influenced initial performance. For boys, by contrast, shared and unique environmental factors influenced variation in growth, and these factors were essentially different from those that influenced initial performance.