
V.J. Cook: L2 Users and English Spelling
Journal of Multilingual and Multicultural Development 18, 6, 474-488, 1997



This paper compares the spelling of adult L2 users of English with that of native L1 users, both children and adults, using data from the 1993 NFER survey of L1 children, from the 1980 Wing and Baddeley corpus, from a UK university EFL test for overseas students and from work by overseas students in England. An overall comparison showed similar error rates in L1 children and L2 adults and a similar distribution of errors both for L1 adults and children and for L2 users across the familiar categories of letter insertion, omission, substitution and transposition, apart from a lower proportion of omission errors for L2 users. More detailed comparisons found that, while some errors were particular to certain groups, such as <l>/<r> and epenthetic <e> for Japanese, others were common to all users, such as consonant doubling, vowels representing schwa and digraph reversals such as <hg>. Many of the errors reflect problems with sound/letter correspondences, some with individual words such as because. Yet overall L2 users can perform at a level equivalent to that of a fifteen-year-old L1 child, which is not the case in most other areas of language.

In the popular view, correct spelling is a sign of education; a spelling mistake is a solecism that betrays carelessness or plebeian origins1. Spelling is thus a crucial factor in the way people present themselves. However iconoclastic or revolutionary their views, political writers would not use anything but standard spelling. Editors of academic journals insist on either British or American spelling even though they would find it unthinkable to dictate that speakers at conferences should speak with a British or an American accent. Spelling features prominently in the National Curriculum in the UK; examination boards are instructed to deduct marks for poor spelling in all subjects.

Effective spelling is then important for users of a second language because of its social overtones, if for no other reason. Yet the amount of attention given to it in research is minimal. Journals seldom publish anything about it; introductions to the field never mention it, whether Ellis’s mammoth introduction to second language acquisition research (Ellis, 1994) or Baker’s introduction to bilingualism (Baker, 1993).


The intention of this paper is then to remedy this lack by comparing the spelling of adult L2 users of English with that of L1 children and L1 adult users. The terms ‘L1 user’ and ‘L2 user’ are preferred here to ‘native speaker’ and ‘L2 learner’ because, on the one hand, of the oddness of speaking about writers as ‘speakers’, and, on the other, of the connotations in ‘L2 learner’ that people are forever learners in a second language.


A starting-point is the ‘standard’ dual-route model of reading aloud, shown in Figure 1, which consists of phonological and visual routes from perceived words to pronunciation. There are many versions of this figure, the one here being adapted from the ‘generic reading aloud model’ depicted in Paap, Noel and Johansen (1992). The phonological route relates written letters to spoken sounds through rules for sound/letter correspondences, such as the correspondence between the letter <n> and the phoneme /n/ in the English words son and bent. The visual route describes how individual words are accessed through a lexical store without passing through phonology, as in words like yacht or though. Users do not employ the same process for all words but shift between the two routes. Even in a sound-based system such as English, frequent words such as wash and heard are accessed through the visual route (Seidenberg, 1985). The phonological route is, however, always available in English as a default. Confronted with a word that we have never encountered before such as radiciflorous or with a newly invented word such as Diageo (a company formed out of Guinness and GrandMet), we can attempt to say it aloud by using the most likely letter/sound correspondences.
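The interplay of the two routes can be made concrete with a toy sketch. It assumes, purely for illustration, that the visual route is a whole-word lookup tried first and that the phonological route is a letter-by-letter fallback; the tiny lexicon, the rule table and the function name read_aloud are hypothetical and are not taken from Paap, Noel and Johansen (1992).

```python
# Toy sketch of a dual-route reader. The lexicon entries and the letter rules
# are illustrative assumptions, not data from the paper.
LEXICON = {"yacht": "jɒt", "though": "ðəʊ"}          # visual route: whole words
LETTER_RULES = {"b": "b", "e": "ɛ", "n": "n", "t": "t"}  # phonological route


def read_aloud(word):
    """Return (route, pronunciation) for a lower-case word."""
    if word in LEXICON:
        # Visual route: the word is retrieved as a single stored item.
        return "visual", LEXICON[word]
    # Phonological route: assemble a pronunciation letter by letter,
    # writing '?' for any letter that has no rule in the toy table.
    sounds = "".join(LETTER_RULES.get(letter, "?") for letter in word)
    return "phonological", sounds


print(read_aloud("yacht"))
print(read_aloud("bent"))
```

A word missing from the lexicon always falls through to the rule table, which mirrors the point that the phonological route remains available as a default for unfamiliar words.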

The dual-route model also serves to distinguish the major types of writing system found in different languages. Character-based scripts like Chinese primarily use the visual route. The character 人 means ‘person’ without reference to its spoken form, which may vary in the different ‘dialects’ of Chinese. The link is between the symbol and its meaning, to a greater or lesser extent. Alphabet-based scripts like Spanish or syllable-based scripts like Japanese kana primarily use the phonological route. The hiragana た stands for the whole syllable ta; the pronunciation of most Spanish words is predictable from the letters. Other languages are not so clearly divided between the two routes. The English script is not entirely sound-based: it relates partly to morphology, as in the constant <s> spelling for the third person present tense “s” in fits, annoys and snatches despite its three variant pronunciations /s/, /z/ and /ɪz/; it also relates partly to the grammar, as in <th> corresponding to /θ/ in content words such as thumb but to /ð/ in grammatical words such as they. And indeed it has a large number of visual examples such as hiccough. For these reasons some researchers see the two routes not as distinct but as a single dimension of ‘orthographic depth’. Languages such as Chinese are orthographically deep, i.e. meaning-based; languages such as Spanish are orthographically shallow, i.e. sound-based; languages like English or German come somewhere in between.


The dual-route model has been linked to second language acquisition research by asking the extent to which L2 users’ knowledge of sound/letter rules and of individual visual items reflects their different L1 systems of spelling and pronunciation. Users of an orthographically deep language might be expected to have more problems with the phonological route in an L2, users of an orthographically shallow language with the visual route. Chinese L1 speakers acquiring Japanese syllabic kana indeed rely more on visual strategies, English users on phonological strategies (Chikamatsu, 1996); speakers of non-sound-based writing systems are better at visual reading tasks than sound-based ones (Brown & Haynes, 1985). Phonological strategies are nevertheless important for everybody, whether their L1s are Arabic or Japanese (Koda, 1987; 1989).


The first aim of the present paper is descriptive: what types of errors do L2 users of English make? This question will be answered by applying familiar spelling categories to samples of users’ written language. The second question is whether all users of English have the same difficulties with spelling regardless of whether they are children or adults, L1 or L2 users; does English present the same problems to all of its users? This means looking at whether both routes are involved in the L2 or whether L2 users tend to prefer only the route they are familiar with. It also involves investigating whether there are characteristic mistakes for speakers of particular L1s, whether derived from their different phonological or writing systems.


The methodology used here is then the analysis of spelling errors made by different types of users of English. Comparable previous work, which will be drawn on here from time to time, includes Bebout (1985), who compared 9-11-year-old English-speaking children with adult Spanish learners of English, and James, Scholfield, Garrett & Griffiths (1993), who looked at English spelling mistakes by 10-year-old Welsh/English bilingual children without comparison with L1 users. Errors in the corpus such as lack of word space <inturn> and incorrect lower case <english> that are unrelated to spelling are discussed elsewhere in the context of the general framework of orthographic knowledge (Cook, in preparation, a).


The sources of data for spelling errors are:

(i) L1 adults’ spelling errors (Cambridge). One source is the corpus of spelling errors given in Wing and Baddeley (1980), which was taken from the complete exam scripts of 40 native English-speaking candidates for Cambridge college entrance examinations, that is to say, adults aged about 18. All mistakes were used here apart from those involving the omission of grammatical inflections, such as plural ‘-s’ atom (atoms), and verb endings, such as find (finding) and form (formed), which are better considered in grammatical terms. This yielded 351 different mistakes, some words having more than one mistake. Adult L1 users of English are shown by ns in the examples.

(ii) L1 children’s errors (NFER). A second source is the National Foundation for Educational Research (NFER) survey (Brooks, Gorman and Kendall, 1993), which took a sample of the first ten lines in essays written by 1492 English children aged 11 and 15 to produce various general figures for English children mentioned below. While this group are called children, the oldest may be only three years or so younger than the ‘adult’ group in (i).

(iii) L2 adults’ spelling errors (EFLtest). One source for L2 adults is a corpus of writing taken from the English assessment test given to 375 overseas students at the University of Essex in England in 1996. Only the first ten lines of the essay questions were sampled, to make the corpus parallel the NFER research. The L1s or countries represented are as follows, with the abbreviations attached to later examples given in brackets: Angola (an) 1; Arabic (ar) 13; Bahasa Malaysia (bm) 7; Bosnian 1; Chinese (ch) 38; Finnish 1; French (f) 6; Georgia (ge) 2; German (g) 6; Greek (gk) 112; Hebrew (h) 5; Hungarian 1; Italian (it) 7; Japanese (j) 37; Korean (k) 3; Lithuanian 1; Mauritian 1; Persian (pe) 1; Portuguese (p) 1; Russian (r) 7; Slovak 1; Spanish (sp) 29; Thai 3; Turkish (t) 3; Ukrainian (u) 1; Vietnamese (v) 2. 85 students did not supply their L1 or nationality, indicated by (?). While Greek students form the largest group at 30%, there is a fair representation of other L1s. In principle all of the students had already satisfied the minimum university entry requirement for 1996 of IELTS level 6, TOEFL 540 or Cambridge Proficiency C. This yielded 381 spelling mistakes.

(iv) L1 and L2 students’ errors (L1 & L2 collections). Errors were collected from students’ work at the University of Essex from L1 and L2 users, both handwritten and typed. This represented a similar range of L1s to that in (iii) (indeed sometimes the same students) with the addition of speakers of Polish (po), Urdu (u), Swedish (sw) and Bengali (b), including one trilingual (poitj). This yielded an additional 338 mistakes, 121 from L1 users, 217 from L2 users. Unlike the earlier corpora, which were produced essentially under ‘exam’ conditions, these errors may have been checked, some indeed by a spelling checker.

The level of education and academic ability of the adult L1 users and the L2 users is therefore more or less equivalent. The functions of writing are similar for both the adult natives (i) and the L2 students (iii), since both were writing examinations in stressful circumstances. Some quantitative comparisons can be made between samples (i), (ii) and (iii). For other purposes, these sources will be combined so that the Cambridge corpus (i) is added to the L1 collection (iv) to get the ‘L1 corpus’, consisting of 472 mistakes, and the EFLtest corpus (iii) is added to the L2 errors (iv) to get the ‘L2 corpus’, consisting of 598 errors. Additional comparisons will be made with other data sources from time to time.

Overall comparisons

Overall comparison of L1 children vs adult L2 users

A straightforward comparison can be made between L1 children learning English spelling, namely the NFER survey (Brooks et al, 1993) (ii), and adult L2 users, namely the EFLtest corpus (iii), both based on samples of the first ten lines of essays.

Error rates

The comparison of the 715 fifteen-year-old L1 children with the 375 L2 users is given in Table 1. The proportion of error-free ten-line samples from L1 children was 39%, from L2 users 40.2%; the proportion of samples with more than six errors was 6% for L1 children, 2% for L2 users; the average number of errors per sample was 1.6 for children, 1.02 for L2 users. In other words the L2 users were broadly similar to L1 children in the proportion of error-free samples, of samples with more than six errors, and of errors per sample. The difference between the groups in terms of error-free samples was not statistically significant (χ² = 0.001, df = 1), but the difference in the numbers of samples with over six errors was significant (χ² = 7.3, df = 1, p < 0.01). The 1.6 errors per ten lines for 15-year-olds incidentally compares favourably with the 4 spelling mistakes in the first ten lines of the manuscript of Keats’ Ode to Autumn (sweeness, furuits, hazle, wam) (Gittings, 1970).
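For readers who want to run this kind of two-group comparison themselves, the following is a minimal sketch of a Pearson chi-square for a 2 x 2 table with one degree of freedom. It assumes no continuity correction, which the text does not specify, and the counts in the usage lines are invented for illustration; the paper reports percentages rather than raw cell counts, so the sketch is not expected to reproduce the published statistics. The function name chi_square_2x2 is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = groups, columns = error-free / not."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Invented counts, for illustration only: 10 of 30 samples in one group and
# 20 of 30 in the other have the property of interest.
statistic = chi_square_2x2(10, 20, 20, 10)
print(f"chi-square = {statistic:.2f}, df = 1")
```

The result is then compared with the critical value for one degree of freedom (3.84 at p = 0.05, 6.63 at p = 0.01).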
Given the heterogeneous nature of the EFLtest sample, it is not productive to separate the L2 users by first language. Looking at error rates, the 18 speakers of Arabic and Hebrew (i.e. Semitic languages with non-Roman alphabets) had an average of 4.44 spelling errors per sample compared to 1.14 for 112 speakers of Greek, 1.11 for 49 speakers of Roman alphabet languages and 0.97 for 76 Chinese and Japanese (i.e. character-based languages). However, the average scores on the multiple choice section of the EFLtest were 46.6% for Semitic speakers, but 65.6% for speakers of character-based languages. That is to say, the difference in numbers of spelling errors is explained by their different levels of English as much as by their L1.

Detailed comparisons of L1 children and L2 users

Error categories

The NFER survey classified errors into the following major categories:

  • insertion of single letters: untill for until
  • omission of a single letter: occuring for occurring
  • substitution of one letter by another: definate for definite
  • transposition of two consecutive letters: freind for friend
  • grapheme substitution, i.e. multiple related changes: thort for thought
  • other mistakes, such as local accent: fought for thought

The data was also analysed into several minor types, such as doubled letters, which will be brought into the discussion later. While these error categories are fairly traditional and the first four are based on non-linguistic criteria, they provide a starting point for comparison purposes.
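Because the first four categories are defined purely by letters, they can be assigned mechanically once the intended word is known. The sketch below is one possible way of doing so and is not the procedure used in the NFER survey; the function name classify_error is hypothetical. Anything that needs more than one single-letter change is returned as 'other', since separating grapheme substitution from the residual category requires phonological judgement that letter comparison cannot supply.

```python
def classify_error(error, target):
    """Assign a misspelling to insertion, omission, substitution or
    transposition; return 'other' if no single-letter change explains it."""
    if error == target:
        return "correct"
    # Skip the part of the two spellings that is identical from the left.
    i = 0
    while i < min(len(error), len(target)) and error[i] == target[i]:
        i += 1
    e_rest, t_rest = error[i:], target[i:]
    if len(error) == len(target) + 1 and e_rest[1:] == t_rest:
        return "insertion"          # one extra letter in the error
    if len(error) + 1 == len(target) and e_rest == t_rest[1:]:
        return "omission"           # one letter missing from the error
    if len(error) == len(target):
        if e_rest[1:] == t_rest[1:]:
            return "substitution"   # one letter replaced by another
        if (len(e_rest) >= 2 and e_rest[0] == t_rest[1]
                and e_rest[1] == t_rest[0] and e_rest[2:] == t_rest[2:]):
            return "transposition"  # two adjacent letters swapped
    return "other"


for error, target in [("untill", "until"), ("occuring", "occurring"),
                      ("definate", "definite"), ("freind", "friend"),
                      ("thort", "thought")]:
    print(error, target, classify_error(error, target))
```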
The data from the EFLtest were analysed into the same major categories, figures for which are presented in Table 2, along with the L1 adult group to be discussed below. The major differences between the L2 users and the L1 children are: the increased proportion of substitution errors by L2 users, 31.7% compared to 19%, and of ‘other’ mistakes, 7.6% compared to 3%; and the diminished proportion of grapheme substitution, 3.7% compared to 19%, and of omission, 31.5% compared with 36%. These differences between the groups are significant (χ² = 103.3, df = 5, p < 0.001). The comparison is here between the L2 users and all the 11- and 15-year-olds rather than the 15-year-olds alone, as these are presented in a comparable form in the NFER report; an analysis in terms of the percentage making no errors in particular categories showed no difference for pupils’ age in this sample (Brooks et al, p.15).

    L1 children versus L1 adults

    The comparison of L1 children with L1 adults may be affected by the fact that errors in the Cambridge L1 sample were collected from whole answers rather than just from the first ten lines. The main difference seen in Table 2 is in grapheme substitution, where children scored 19% compared to adults’ 3.4%. However, grapheme substitution stands out as a different type of mistake since it is based on phonology rather than letters; allocation to this category is thus more subjective than the other two. To compensate, L1 adults slightly increased the proportion of omission by 7.5% and of insertion by 4.9%. Though the methodology is very different, Treiman (1993) found first-grade English-speaking children’s substitution errors running at about twice the rate of omission errors, essentially the reverse of the results for both groups here. Presumably substitution also is used less by the age of 11. The differences between L1 children and L1 adults are statistically significant (χ² = 70.52, df = 5, p < 0.001).

    L2 adults versus L1 adults

    The comparison of L2 users with adult L1 users reveals a lower proportion of omission for L2 users, 31.5% compared with 43.5%, compensated by slightly higher proportions of substitution, transposition, grapheme substitution, and ‘other’. Error rates for the L1 adults could not be calculated as the whole exam script was used rather than the first ten lines. The two groups seem similar, apart from omission. The figures are statistically significant (χ² = 18.7, df = 5, p < 0.005).

    In broad terms these results show that there is little to distinguish L2 users of English from either L1 children or L1 adults in terms of the categories of spelling error used by the NFER study. The error rates of the 15-year-old L1 children and the L2 users are similar; the proportions and types of errors are similar across all three groups. This result differs from Bebout (1985), who found that English-speaking children aged 9-11 had a far higher error rate than Spanish adults; her children were, however, in a younger age-bracket than those in the NFER survey. The features that here distinguish the L2 adults from both L1 children and L1 adults are a lower proportion of omission and a higher amount in the ‘other’ category. Compared to children, L2 users also have a far higher proportion of substitution, 31.7% compared to 19%. L2 users do not then have spelling characteristics that are very distinct from those of L1 adults or L1 children in terms of the categories used here.

    More Detailed Comparisons

    In the previous section the data-collection methods were kept as equivalent as possible to try to get a reasonable comparison between the three groups. This meant analysing the errors in fairly traditional, letter-based categories. This type of analysis is unrevealing when applied to language systems except at a gross level. This section takes four main categories from the last section as a skeleton, that is to say leaving out grapheme substitution and ‘other’ except incidentally, but deals in more detail with sub-categories of errors, turning to the amplified L1 and L2 corpora and varying the order of types to get a smoother order of presentation. Errors are here mostly taken as representative rather than comparable in figures. The intention is to provide a more detailed description of typical L1 and L2 errors and to lead towards questions concerning the involvement of the two routes and their relationship to the user’s L1.


    One of the most numerous major categories for L1 and L2 users is substitution, replacing one letter with another single letter. In the L1 users substitution involves a proportion of 1 consonant mistake to 1.4 vowels, in the L2 users 1 consonant to every 0.8 vowels; that is to say, the L2 users have proportionately more consonant substitutions.

    Vowel substitution

    The majority of vowel substitutions reflect exchanges between <a>, <e> and <i>, as they did in Bebout (1985).

    <a> and <e>. In one direction <a> is often found for <e>, as in catagories bm and machinary ns; in the other <e> stands for <a> in exectly gk and persueded ns.

    <e> and <i>. Again a large group of errors, with <e> wrongly used in examples such as definetely sp and penecillin ns, and <i> wrongly used in biggin h and convinient ns.

    <a> and <i>. The set of errors with <a> includes languistics ar and feasable ns, the set of incorrect <i>’s includes privite gk and imaginitive ns.

    This can be illustrated as exchanges within a triangle of three vowels as in:

    The errors for the six possible exchanges amount to 68.7% of the total vowel substitutions for L2 users and 76.6% for L1 users. The remaining twelve possible vowel‑pair relationships thus account for less than a third of the errors for both groups.
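The share of vowel substitutions that falls inside the triangle can be computed directly from pairs of misspelling and target. The sketch below shows one way to do this; the helper names are hypothetical, and the short list mixes examples quoted in this paper with one substitution outside the triangle (becouse, from the L2 corpus), so the resulting proportion describes only this illustrative list and not the corpus figures of 68.7% and 76.6%.

```python
TRIANGLE = {"a", "e", "i"}


def substituted_pair(error, target):
    """Return (written, intended) for a single-letter substitution,
    or None if the two spellings differ in any other way."""
    if len(error) != len(target):
        return None
    diffs = [(e, t) for e, t in zip(error, target) if e != t]
    return diffs[0] if len(diffs) == 1 else None


def in_triangle(error, target):
    """True if the substitution exchanges two of <a>, <e> and <i>."""
    pair = substituted_pair(error, target)
    return pair is not None and set(pair) <= TRIANGLE


# Examples quoted in the paper, paired with their intended spellings.
examples = [
    ("catagories", "categories"), ("exectly", "exactly"),
    ("definetely", "definitely"), ("convinient", "convenient"),
    ("languistics", "linguistics"), ("privite", "private"),
    ("becouse", "because"),  # <o> for <a>: outside the triangle
]
inside = sum(in_triangle(e, t) for e, t in examples)
print(f"{inside} of {len(examples)} substitutions are within the triangle")
```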

    Vowel substitutions also tend to affect certain combinations of letters, often corresponding to suffixes:

    <-ent>/<-ence> vs <-ant>/<-ance> as in transparant j, utterences ur, irrelevent ns, referance ns

    <de->/<di-> as in defacult ar, devided h, dicided gk, destinct ns, divice ns

    <-ate>/<-ite> as in definately sp gk, definate ns

    Though L1 users had some errors with <-able>/<-ible> as in responsably ns and feasable ns, this type did not occur with the L2 users.

    Why should <a>, <e> and <i> cause the greatest problems? Frequency cannot be the sole cause since <o> comes in the top ten frequent letters in English alongside <a>, <e> and <i> (Gaines, 1940). The obvious reason can be found in the rules of sound/letter correspondences. Unstressed vowels in English are commonly reduced to schwa /ə/, as in the <e> of transparent and the <a> of utterances. So, while it is possible to link all three vowels to a single sound /ə/, in the reverse direction, the sound /ə/ can be spelled in three ways. Looking at the totals for all vowel substitutions, for L2 users 41.2% could be pronounced as schwa, for L1 users 66.2%. The L1 children in the NFER and Bebout research also made frequent mistakes with schwa vowels. Many substitutions then represent the problem of spelling /ə/, proportionately higher for the L1 users. This cannot, however, be the whole explanation since many wrong uses of <a>, <e> and <i> did not involve reduction to unstressed schwa, such as charecteristics bm, perticipate j, and edecation gk.


    Consonants also show a wide range of alternations, comparatively smaller for the L1 users with a total of 21 different pairs, compared to 38 pairs for the L2 users. Most L1 errors appear random slips of pen or finger and are found only once or twice, for example <d> for <k> weeds and <w> for <h> upwill.

    <s>, <c>, <z>, <t>

    The main type of errors for the L2 users is the choice between <s>, <c>, <z> and <t>, as in cources gk, immence j, persent gk, influencial j, spetial poitj, revoluzion gk and amasing gk. This is the notorious sound/letter problem with sibilants, as witnessed by the sprinkling of mistakes from L1 users such as Spanich, existense, prise (price) and eazy. In particular this goes with certain letter combinations:

    <-ns->/<-nc-> as in immence j, existense ns

    <-se>/<-ci->/<-ce> as in prosess ge and prise (price) ns

    <-ti-> as in condicioned sp, contension ns.

    Again, while morpheme boundaries are often involved, this does not cover all instances; instead the letter-pairs function as some kind of orthographic unit. Albrow (1972) divides <-se> and <-ce> between the basic and Romance spelling systems for English; Carney (1994) provides seven main rules for working out the difference between <s> and <c>, motivated by the length of the preceding vowel as in lose, whether the preceding vowel is /u:/ as in spruce, and other factors. Interestingly no mistakes were found here of the type where the letter <c> corresponds to different sounds in derived forms, as in electric /k/ versus electricity /s/.

    <d>, <t>

    A few examples occur of alternation of <d> and <t>, as in Grade Britain gk and kintergarten ch. The only two ns mistakes are irregular past tenses, build and spend, a type not found in the L2 users.

    <b>, <d>, <p>

    Only the Greek speakers in the L2 sample confused <b> with <d> bepent (depend), <d> with <b> descride (describe), <b> with <p> cabable and <p> with <b> propably. Of these, <p> for <b> is found once in the L1 users propably ns. These alternations appear to be a mistake with the mechanics of letter formation: <p> is the vertical mirror of <b>, <d> is the horizontal mirror of <b>. Or they might reflect the relative lack of aspiration in Greek /p/. As we have seen, Greeks are over-represented in the sample so this may not necessarily be a Greek phenomenon alone; indeed it is common in L1 dyslexia in English. Alternation of <b> and <p> has also been found with Arab students and attributed to phonology as Arabic lacks a /p/ (Ibrahim, 1978).

    <l>, <r>

    Only the Japanese speakers are uncertain whether to use <l> or <r> as in walmer, familiality, grobal and sarary. While the obvious explanation is the well-known Japanese inability to distinguish the pronunciation of /l/ and /r/ (Cochrane, 1980), this may not be the full story since, as well as character and syllabic scripts, Japanese also uses romaji, a system (or rather two alternative systems) for writing kana in Roman script (Smith & Schmidt, 1996). Many <l>/<r> words have conventional romaji spellings different from English, for example sarari. Hence the explanation for written <l>/<r> confusions may in part be the alternative writing systems known by Japanese users rather than simply their pronunciation difficulties.

    <y>, <i>

    It seems best to treat all mistakes with <y> together. As a vowel, word-final <y> corresponds to /i/ as in wealthy and syllable-internal <y> to /ai/ as in type or /i/ as in pygmy. Substitution of <i> and <y> is common in both directions, whether the L1 users’ bipass and bycycle or the L2 users’ analise j, g and stydies gk. This is particularly common with the <-ies> spelling when <y> is followed by an inflectional <s> whether in nouns dictionarys ns and essayies p, or verbs implys j. Other problems are: whether to change <i> to <y> at a morpheme break, as in Chomskian ns, identifing ns or studing h; the use of <y> to represent /i/ as the only vowel in the syllable phisical ar and synonim sp; and the conversion of nouns in <y> to verbs discoveried ch.


    Omissions involving double consonants and <e> are described separately below. The letter most frequently omitted by L1 users, amounting to about 20% of the total of omissions, is <r> as in bain, magerine and execise. Since the L1 users undoubtedly have non-rhotic British accents, <r> before a consonant or silence does not correspond to /r/; 14 out of the total of 24 mistakes involve ‘silent’ <r>s of this type, 8 <er>. <r> is rarely omitted by the L2 users, the only examples being carrees sp, coner ch, county (country) ge and intenational ?; indeed some of the L2 learners may have an American rhotic accent.

    The letter most often omitted by the L2 users is <n> in consonant clusters, amounting to about 8% of errors, as in biligualism j and desigs sp; the few L1 mistakes with <n> were mostly with initial <in> as in iterest and iformation.

    Most omissions by both groups involved reducing a pair of consonants to one: <ct> attracs gk, predicable ns, particularly consonant pairs corresponding to a single sound or nothing, <cq> aquir ch, aquisition ns, <ch> scolarship gk or <gh> thougt ns. As can be seen from these examples, many letters that were omitted were not ‘silent’ but corresponded to a sound (even if in a digraph such as <cq>), apart from silent <r> seen above and the forms discussed below. There is therefore no real difference between the types of omissions in L1 and L2 groups to explain why the L2 users should omit proportionately less. While the sample is not balanced in terms of numbers per L1 or per level, the error rate for omission does not seem to vary between L1s.

    Consonant doubling

    Consonant doubling can be treated here as a whole. One aspect is indeed the omission of consonants from doubled letters. Consonant doubling is one of the most complex areas of the English spelling system, involving the length of preceding vowels, Latinate prefixes, word divisions, British versus American English, and so on (Carney, 1994). Hardly surprisingly, both L1 and L2 groups made mistakes with most doubled letters, ranging from <gg> biggin h and aggreement ns to <ff> proffessional bm, sp and profficiency ns. Double <ll> was a particular problem, both groups having unnecessary double letters, as in controll j, allready h, carefull gk, bellow ns and propell ns, and single letters in a double pair, as in filed b and modeled ns. <ss> had similar mistakes in both directions: decission gk, occassion ns, necesity sp and posses ns.

    Consonant doubling seems a built-in problem with the English sound/letter system that affects both groups. The NFER survey also identified doubling as the fourth largest category of minor errors with children. Consonant doubling was the second largest category of errors for Bebout’s Spanish users and fifth largest for L1 users, caused, she claims, by the use of consonant doubling to represent a different spoken consonant in Spanish, <rr>, <ll>, but not in English (Bebout, 1985).
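Doubling errors in either direction can be picked out of a corpus by comparing misspelling and target after every run of a repeated consonant letter has been reduced to a single letter. This is a minimal sketch under that assumption, with a hypothetical function name; it treats <y> as a consonant letter and makes no attempt to model the rules in Carney (1994).

```python
import re

# A consonant letter followed by one or more copies of itself.
_DOUBLED = re.compile(r"([b-df-hj-np-tv-z])\1+")


def is_doubling_error(error, target):
    """True if the misspelling differs from the target only in whether
    consonant letters are written single or double."""
    if error == target:
        return False

    def collapse(word):
        return _DOUBLED.sub(r"\1", word)

    return collapse(error) == collapse(target)


for error, target in [("controll", "control"), ("filed", "filled"),
                      ("occassion", "occasion"), ("biggin", "begin")]:
    print(error, target, is_doubling_error(error, target))
```

The last pair is rejected because biggin also substitutes <i> for <e>, so it would be counted under more than one heading.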

    Omission of <e>

    ‘Silent’ final <e> has several functions. The use for indicating the preceding vowel leads to some word-final <e> omissions by L2 users mad ar, morphem g and softwar gk, and by L1 users befor, determin and moleul. ‘magic e’ was also sixth largest of the minor categories in the NFER survey. Also common are: omission of <e> before <-ly> anfortunatly gk, completly ns, likly ns; and omission of <e> in <fore-> as in forsee gk ns, forsight ns and unforseen ns. Omission of <e> in the past tense <-ed> was sometimes found, as in prefferd h, happend g, ns and invertd ns.


    Most systematic insertions have already been discussed under consonant doubling above. Many seemed random performance errors such as whewre ns, studrents ch and thorough-goping gk.

    One group of sound-based insertions consists of incorrect sound/letter correspondences that coincidentally add a letter. Common from the L2 users is the use of <ie> corresponding to /ai/ in words spelled with <i>, priemary ar and dierect bm; from the L1 users the use of <er> corresponding to /ə/ as in intergated and properganda. To these can be added some mistakes with <ur> that were assigned to grapheme substitution, such as persue bm. Some L1 insertions probably reflected the user’s pronunciation as in vocabularly and idear (indeed the printed text of James et al (1993, p.299) itself contains an uncorrected use of vocabularly, showing the invisibility of this mistake to authors and editors).

    Finally Japanese users appeared to use epenthetic vowels to pad out consonant clusters into a CV structure as in adovocated, courese, Engilish, respecitively and subejects. Silent <e> is also sometimes added as in develope ns and useage ns, ch.


    A few transpositions of pairs of letters were found. The pair that was transposed more than once by the L1 users was <e> and <i>, whether in the famous problem following <c> concieve, percieved, recieved, or in words such as heirarchy. L2 users similarly produced concieved ch as well as foriegner bm, sceince gk and thier bm. Transpositions of <e> and <i> only accounted for 18% of the NFER children’s total. While the frequency in the adult L1 and L2 users seems low (4 examples in the L1 such as acheived and 5 in the L2 such as sceince gk), this problem persists despite its high public profile. The only other pair to be transposed more than once by the L1 users was <sy> following <p> as in pysche and pyschology, again a well-known problem.

    The other transposition categories to be represented more than once for the L2 users were <er>, <ur> and <re> semestres sw, hrut bm, foerigners ch, and <ng> versus <gn> assinged u and congitive j. The confusion of form with from was also common from j, g, gk. Two words exchanged <t> and <e>, resulting in the homophone discreet (discrete) pe and the near-homophone quiet (quite) gk, both well-known confusions. Several of the L2 transpositions consisted of the exchange of the two letters of a digraph with <h>, i.e. <ph> shpere u, <th> lenght gk and <gh> althouhg gk. Bebout (1985) found that a preponderance of transpositions involved <l> or <r>, with a higher proportion of transpositions from her L1 children than in either the NFER sample or the L1 and L2 groups studied here.

    The answer to the question about whether there are differences between the spelling of L1 children, L1 adults and L2 adults is clear: even at closer scrutiny there are still strong resemblances between the L1 errors and the L2 errors. Whichever sub-category of spelling error one picks, it is usually possible to provide an example from both the L1 and the L2 samples. There are indeed some characteristics found in certain L1 groups, such as the Japanese confusion of <l>/<r> or the Greek problem with <p>/ <b> /<d>. But these make up only a small proportion of the errors.


    We can now return to the general question about the extent to which L2 learners make use of the different routes for spelling. As with adult L1 users and L1 children, the vast preponderance of L2 users’ mistakes involves sound/letter correspondences, whether the overall categories of substitution or grapheme substitution, the more precise problems with schwa and <s>/<c>, or even the transfer of pronunciation mistakes to written language such as the Japanese use of <l>/<r> and epenthetic vowels. In addition we can see problems with sound/letter correspondences in the use of homophones such as discreet (discrete) pe, fairs (fares) sp, to (too) g and too (to) gk, and of near-homophones such as fear (fair) gk and break (brick) gk. Such sound-linked errors may be more extensive since the pronunciation of L2 users of English may differ from that of native speakers in significant ways, just as young L1 children often analyse the pronunciation of words differently from adults (Treiman, 1993). Other mistakes based on sound/letter correspondence include agoe pe, knoledge bm, gk, sp and prosess ge. More precise evaluation of the contribution made by the L1 phonological system requires considering the spelling mistakes of a single L1 group in the context of their L1 phonology, as is partly done in Bebout (1985) and Ibrahim (1978).

    The importance of the visual route for knowledge of individual words is difficult to test against corpus data, essentially because by definition it treats words as idiosyncratic items rather than through generalisable rules. The ten words that are most frequently misspelled by L2 users are different forms of because, career, choose, interest, kindergarten, knowledge, necessary, professional, study, and which. Some have many variant misspellings: because is becase gk, h, becaus ge, becouse ?, becuase pe, becuse ar and begause gk; the different versions of study include studrents ch, studys poij, stydies gk, studie ?, studiing ch, studing ch, h, gk and steday ar. Clearly users still have problems with certain words of high frequency in the academic English that they use; they have still not internalised some ‘difficult’ frequent words as individual items.

    The results may be misleading in that, like any analysis of errors, the evidence of spelling errors shows on the one hand what the users cannot do rather than what they can do, and on the other what they have presented rather than the words they have avoided. In terms of spelling errors, the L2 users in the comparison of the EFL test and NFER samples had 40.2% of error-free samples and similar error rates to L1 children. In any other area of language acquisition an L2 user would be extremely satisfied with performance at 15-year-old native level! The results show some L2 problems with the phonological route and with individual words such as study and interest.

    There is also missing information about what can be called ‘orthographic rules’. The standard dual-route model subsumes everything within two processes. Hence it does not deal with regularities in the visual forms. For example in English <ck> occurs only at the end of orthographic syllables as in dock, never at the beginning as in *ckod. Much of the complexity of English spelling is a matter neither of sound/spelling correspondences nor of individual visual forms but of regularities of letter combination. A corpus-based study such as this cannot distinguish knowledge of such orthographic rules from knowledge of sound rules or individual instances; orthographic rules are being investigated in a response time framework elsewhere (author, to appear, b). Some L2 transposition mistakes involving digraphs with <h> may show lack of orthographic rules, such as althouhg gk, as may some L2 omission mistakes such as majr bm and teachg j; these may equally be straightforward performance errors of writing or typing.

    The residual question, as always, is the involvement of the L1. Both Bebout (1985) and James et al (1993) assign a large proportion of the errors to the user’s L1, particularly to the phonological system, though both are concerned with younger children. Bebout (1985) explains consonant doubling mistakes in English as the use of doubling for phonological reasons in Spanish writing. James et al (1993) attribute 38.5% of mistakes to interference from the users’ L1 Welsh, whether from phonological interference in the L2 pronunciation, orthographic interference from Welsh sound/letter rules, or transfer of cognate words. Again it is hard to tease out specific L1 interference from the broadly-based results here, though some points about Greek and Japanese users have emerged. The fact that categories of error are represented so evenly by the L1 children and the different adult L1 and L2 users suggests that transfer from the L1 is less important than had been believed. Earlier research may have been misleading in not including adult controls (Bebout, 1985), or native controls (James et al, 1993) or in not having a wide enough sample of L1s.

    Some brief remarks can be made about the implications for teaching. In the National Curriculum for England and Wales (HMSO, 1990), the targets for spelling start at level 2 with high frequency short words (see, car), ‘common patterns’ (coat, goat) and letter-names, go on at Stage 3 to longer words with ‘common patterns’ (because, after), ‘word families’ (grow, growth) and ‘regular’ patterns for letter strings (-ng, -ion, -ous), and finish at Stage 4 with ‘the main prefixes and suffixes’. While there may be some justification for this from research and theory—the idea of high-frequency words being stored visually (Seidenberg, 1985) or the constancy of morpheme representation (Chomsky & Halle, 1968)—it does not cover the most common L1 error types outlined above, whether the specific characteristics of sound/letter correspondences (substitutions for schwa, say), or the characteristics of orthographic rules (consonant doubling, for example). If the National Curriculum indeed reflects the current system of teaching spelling in the UK, no wonder Prince Charles once said ‘All the letters sent from my office I have to correct myself, and that is because English is taught so bloody badly’.

    In general, the L2 users’ problems with individual words such as because seem best tackled by utilising the system for knowledge of individual word forms to get the students concentrating on the visual forms of a small number of words that are predictable in an academic context, such as knowledge, university and career. Other more widespread problems with the English sound-to-spelling correspondences such as doubling might lend themselves to specific explanation and practice, as provided for example in Digby and Myers (1993) or Abbot (1978).

    The conclusion is then that L2 users are surprisingly good at English spelling. Doubtless this is the tip of the iceberg, representing only those who have made it successfully through the difficult early stages and smoothing over differences between L1s. But it certainly shows that spelling is a unique area of language achievement and needs far more research in its own right to discover what the difficulties may be at earlier stages and how these relate to general issues about how the multi-competent mind uses two writing systems as opposed both to the monolingual use of several systems, as in Japanese users, and to the monolingual use of a single system, as in English users.



    Abbot, E. (1978). ‘Teaching English spelling to adults’. English Language Teaching 32:119-121

    Albrow, K.H. (1972). The English Writing System: notes towards a description. London: Longman

    Baker, C. (1993). Foundations of Bilingual Education and Bilingualism. Clevedon: Multilingual Matters

    Bebout, L. (1985). ‘An error analysis of misspellings made by learners of English as a first and as a second language.’ Journal of Psycholinguistic Research 14: 6, 569-593

    Brooks, G., Gorman, T. & Kendall, L. (1993). Spelling It Out: the spelling abilities of 11- and 15-year-olds. NFER

    Brown, T. & Haynes, M. (1985). ‘Literacy background and reading development in a second language.’ In Carr, T.H. (ed.), The Development of Reading Skills. San Francisco, CA: Jossey-Bass

    Carney, E. (1994). A Survey of English Spelling. Routledge

    Chikamatsu, N. (1996). ‘The effects of L1 orthography on L2 word recognition.’ Studies in Second Language Acquisition 18: 403-432

    Chomsky, N. & Halle, M. (1968). The Sound Pattern of English. Harper & Row

    Cochrane, R. (1980). ‘The acquisition of /r/ and /l/ by Japanese children and adults learning English as a second language.’ Journal of Multilingual and Multicultural Development 1: 331-60

    Cook, V.J. (to appear, a). ‘The components of orthographic knowledge’

    Cook, V.J. (to appear, b). ‘Orthographic structure and dual-process models’

    Digby, C. & Myers, J. (1993). Making Sense of Spelling and Pronunciation. Prentice-Hall

    Ellis, R. (1994). The Study of Second Language Acquisition. OUP

    Gaines, H. (1940). Cryptanalysis. American Cryptogram Association, reprinted by Dover Books 1956

    Gittings, R. (1970). The Odes of Keats and their Earliest Known Manuscripts. London: Heinemann

    Her Majesty’s Stationery Office (HMSO) (1990). The Education (National Curriculum) (Attainment Targets and Programmes of Study in English) (No 2) HMSO

    Ibrahim, M. (1978). ‘Patterns in spelling errors.’ English Language Teaching 32: 207-12

    James, C., Scholfield, P., Garrett, P. & Griffiths, Y. (1993). ‘Welsh bilinguals’ spelling: an Error Analysis.’ Journal of Multilingual and Multicultural Development 14: 4, 287-306

    Koda, K. (1987). ‘Cognitive strategy transfer in second language reading.’ In Devine, J., Carrell, P. & Eskey, D. (eds.), Research in reading in English as a second language. Washington: TESOL

    Koda, K. (1989). ‘Effects of L1 orthographic representation on L2 phonological coding strategies.’ Journal of Psycholinguistic Research 18: 201-222

    Paap, K.R., Noel, R.W. & Johansen, L.S. (1992). ‘Dual-route models of print to sound: red herrings and real horses’, in Frost, R. & Katz, L. (eds.), Orthography, Phonology, Morphology, and Meaning, Elsevier, 293-318

    Seidenberg, M.S. (1985). ‘The time course of phonological code activation in two writing systems.’ Cognition 19: 1-30

    Smith, J.S. & Schmidt, D.L. (1996). ‘Variability in written Japanese: towards a sociolinguistics of script choice.’ Visible Language 30: 46-71

    Treiman, R. (1993). Beginning to Spell. New York: OUP

    Wing, A.M. & Baddeley, A.D. (1980). ‘Spelling errors in handwriting: a corpus and a distributional analysis.’ In Frith, U. (ed.), Cognitive Processes in Spelling. London: Academic Press, 251-285



    1. I am indebted to Philip Scholfield and Denise Chappell for their comments and to the staff and students of the English as a Foreign Language Unit at the University of Essex for their help.