An Experimental Approach to Second Language Learning

Vivian Cook, University of Essex

In V. Cook (ed) (1986), Experimental Approaches to Second Language Learning, Pergamon, pp 3-21

The Reasons for Conducting Second Language (L2) Research

A useful starting point is to ask why researchers are interested in L2 learning. Although they are seldom made explicit, four main orientations can be discovered.

1. To Investigate Second Language Learning Itself

In the United States particularly, and to some extent in Europe, a separate discipline of second language learning research has evolved, which, though related in some ways to psychology, to linguistics, and to language teaching, considers itself an autonomous field of study, under no obligation to subordinate itself to other sciences unless they fit in with its own methodology and aims. To quote Evelyn Hatch, "The basic question that second language research addresses is: how can we describe the process of second language acquisition." (Hatch, 1980, p. 177). Research with this emphasis is mostly carried out by those who come into the study of second language learning from psychology or linguistics. It aims at describing L2 learning in its own terms, and has led to the recent proliferation of "models" of L2 learning, such as the Monitor Model (Krashen, 1981) or the Acculturation Model (Schumann, 1978). For this approach the implications of L2 research for language teaching or for other areas are interesting but are of secondary importance. Probably the papers in the present volume by Meara and by Nation and McLaughlin are closest to this orientation.

2. To Improve Language Teaching

Much L2 research, however, still has an applied goal; it is useful not for itself but for its applications to language teaching. Thus a recent book with the title Foreign and Second Language Learning states its aim as "to examine aspects of learning which might help us to improve teaching" (Littlewood, 1984, p. 2). Many researchers with this orientation have come into the field from language teaching; they argue that, since language teaching is only successful if language learning takes place, the more we know about language learning the better we can teach; real progress in language teaching comes not so much from innovations in syllabus design or teaching techniques as from basing teaching on better information about learning. One of the encouraging aspects of the movement to emphasize comprehension-based teaching methods is not so much the proposed teaching techniques themselves as the impressive body of experimental evidence that has been put forward for them, compared with the almost total lack of evidence for other contemporary models such as communicative language teaching. This approach is represented in this book by the work of Garcia and Winitz and is discussed further in Chapter 2.

3. To Contribute to Wider Issues in Linguistics and the Linguistic Theory of Language Acquisition

Second language learning can also be studied in order to test hypotheses from linguistic theory. White's article in this volume, for example, explores the Chomskyan theory of parameter setting in relation to second language learning. One interpretation of this approach claims that first language acquisition is difficult to study because it confuses the effects of language acquisition with cognitive and social development; these are, however, separated in L2 learning in adults. L2 learning is then a better test case of universals of language acquisition than L1 learning and has importance for linguistic theory (Gass and Ard, 1980). Even if this argument is not fully accepted, linguistics proper nevertheless can gain useful information from L2 learning. Yet, however attractive this is in principle, in recent years L2 learning has had little to say to linguistics or to language acquisition. The main reason is the lack of connection between L2 research and contemporary linguistic theory. Partly the cause is precisely the attempt to make L2 learning an autonomous subject in its own right; straight linguistics and first language learning research were seen as separate areas. Partly, also, it was due to the alienation of L2 researchers from contemporary linguistics; the theories current in the early 1970s appeared remote, mathematical, and not connected with "practical" questions about language learning. With the exception of the select band of researchers, typified by White in this collection, who have recently been examining the relevance of current Chomskyan theory for L2 learning, L2 researchers have made little attempt to make connections with recent developments in either linguistics or first language learning. It is argued elsewhere that this is regrettable (Cook, 1985); over the past decade linguistics and first language learning research have evolved and in many ways have become simpler and closer to practical problems.

4. To Contribute to General Issues in Psychology

Just as on the one side L2 learning can be linked to linguistics, so on the other it can be related to psychology. Many researchers have come into the area from psychology or have adopted psychological research methods; a book such as Research Design and Statistics for Applied Linguistics (Hatch and Farhady, 1982) despite its title has far more connection with psychological research than with linguistics, and indeed has no references at all to the literature of applied or theoretical linguistics. One motivation for L2 research is to settle problems in psychology; "the mere fact of possessing two or more language repertoires poses interesting puzzles regarding the nature of the underlying memory systems, thought processes, productive efficiency, and the like" (Paivio and Begg, 1981, p. 288). For example, in the development of memory children progress from an adding stage in which they repeat each word in a list several times to a combining stage in which they combine it with previous words; is this a matter of gaining cognitive maturity or of becoming familiar with language? Perhaps only L2 learning can tell us; one experiment suggested that L2 learners indeed start with a combining strategy (Cook, 1981a), and consequently that this development is a matter of cognition rather than language. As with linguistics, L2 learning can provide a special testing ground for psychology because of the way it disengages language learning from maturational issues. My own contribution to this volume uses L2 learning to investigate the boundaries between language and cognition; the contribution by Esser and Kossling is also firmly within this psychological approach.

Doubtless other purposes can be achieved by studying people learning second languages. It may contribute to the treatment of mental illness, it may help in teaching deaf children, it may help with political and social problems, it may contribute to our understanding of how the brain develops; and so on. The reason for starting here is not to itemize all the possible orientations but to stress that research is not carried out in a vacuum; the approach, the techniques, the conclusions are coloured by the researcher's purposes and goals and by the assumptions of the research paradigm within which he or she is working. A piece of research has to be evaluated against its overall purpose.

Overall Research Designs in L2 Research

One way of approaching L2 learning is through the concept of time. Language learning represents change taking place through time; time therefore is a central feature of the research design. One possibility is a longitudinal study in which learners are followed over a period of time; samples are gathered at different intervals and progress between them is measured over periods ranging from months to years. Henning Wode, for instance, studied his own four children acquiring English as a second language continuously for 6 months (Wode, 1981). Traute Taeschner (1983) described her own two children simultaneously acquiring German and Italian over several years. A longitudinal study has the advantage that it is directly concerned with observing development, since the same learners are involved throughout; it has practical problems in the sheer length of time that is necessary and in other factors such as the drop-out rate among the learners. Secondly, different learners can be studied at different stages of development, different points of progression through time, which can be considered as if they were cross-sections of the same learners spread out through time. An example of a cross-sectional study is Padilla and Lindholm (1976), who tested Spanish-English bilingual children between the ages of 2 and 6.5 to establish the order of development of interrogatives, negatives, and possessives. Cross-sectional studies have the advantage of requiring less time, since all stages of learning can in effect be studied simultaneously, but also the disadvantage that they do not study the same learners and hence, strictly speaking, they are not concerned with "development" itself. Thirdly, time may be built into the design in the form of an experiment with a starting and a finishing point; the changes in the learners over this period are then evaluated. The period of time used is usually minutes or weeks rather than years. For example, the paper by Nation and McLaughlin in the present volume concerns people who learn an artificial language for a matter of minutes and are then tested on how well they have mastered the implicit grammatical rules. In this case, as in the longitudinal study, in a sense the research uses "real" time in that the period of development corresponds with the period studied. The difficulty with short-term experimental periods is whether they in fact tell one very much about learning over longer periods.

Several ways of categorizing L2 research are doubtless possible. One convenient approach is to divide it into three major methods.

1. Language Elicitation and Analysis

The most straightforward method is to record samples of the learner's language, technically a "corpus", and then to analyse them; this may be called the "observational" method. The outcome of the research is typically the description of the stages through which the learner progresses, couched in terms of syntax, learning strategies, etc. Inevitably, the observational method concentrates primarily on language production, which is readily observable, rather than on comprehension, which is hard to establish from observation alone.

The language samples could in principle be "authentic", that is to say examples of the learner's natural use in real-life situations, such as shops or travel agents, or the by-products of teaching, such as student essays. The common factor in "authentic" language data is that it has not been produced solely for purposes of research but for the purposes of the learner himself, unaffected by the observing experimenter. Alternatively the language samples can be "non-authentic", i.e. specially produced for the research. A typical format is an interview in which the researcher conducts a guided conversation with the learner which is recorded on tape or video. Evelyn Hatch (1978), for instance, uses data from interviews between native speakers and foreign learners. The same technique is also reflected in the more structured use of picture description or storytelling; for example, in the BSM test the experimenter asks questions about seven cartoon pictures (Dulay and Burt, 1973). The overall difficulty with both techniques is the representativeness of the corpus that has been recorded; in some ways it may be untypical of the learner's behaviour; for instance, picture description may lend itself to the present continuous tense and hence under-represent the past tense or the present simple. The use of "non-authentic" samples involves the problem that the technique of collection may influence the data; an interview is structured so that it has the participants in leader and follower roles; it is not surprising that analysis of interviews shows that learners adopt strategies for "following" the conversation, of checking, reacting and so on, rather than for "leading" the conversation, which may be untypical of their conversational behaviour in other situations.

The observational method also necessitates a decision about the proportion of the data to be taken into account. At one extreme, there is the selectivity of "diary" studies; the researcher notes down pieces of language that he or she feels to be important. Inevitably this biases the sample in favour of language items that stand out because they are unusual. We do not notice how often we use the word "off" but we are aware how often we use "uninstantiated". At the other extreme, all the data is used; everything in the corpus is to be analysed by hook or by crook. In between these extremes come variations in which some proportion of the data is studied and the rest rejected, according to a set of explicit principles. Put in another way, the problem is distinguishing competence from performance; the speech of natives contains a fair proportion of accidental performance errors that are not related to their knowledge of the language, their linguistic competence; an L2 learner is subject to similar constraints. To use Pit Corder's terms (Corder, 1967) a corpus contains both performance "mistakes" and competence "errors"; an observational method has to decide what it is going to consider competence, and what performance.

2. Measurement of Learner or Situational Variables and Correlation with Language

The second major method consists of finding two groups of learners who differ in their makeup or in the situation they encounter and seeing how the chosen factor affects their L2 learning. The end product is an account of the differences between the two groups, measured in various ways. This can be called the "difference method". While the learners are tested naturally or artificially, neither they nor the situation are modified by the researcher.

The difference method therefore assumes that L2 learning varies from one person to another and from one situation to another. While all normal human children are commonly believed to acquire their first language successfully, this is often said to be far from the case in a second language; not all L2 learners are equally successful and few achieve the heights of a native child. The difference method assumes that the key to L2 research is to discover the differences in the individual or the situation that cause this. Nevertheless it is in itself a moot point whether some features of L2 learning are common to all learners and so not appropriate to the difference method.

It would be a mammoth task to describe all the varieties of learner difference that have been claimed to be relevant to L2 learning. It may be that age is an important factor: say, older learners are better than younger ones, as discussed in the next chapter. It may be that sex is important; perhaps girls are better than boys at L2 learning, as suggested for example in Burstall et al. (1974). It may be the learners' motivations and attitudes that count: the desire to integrate with the foreign culture or to use the new language within one's own society (Gardner and Lambert, 1972). It may be various personality factors that are crucial: extroverts may be better than introverts (Cohen, 1977); those with a "field-dependent" cognitive style may do better in some circumstances than those with a "field-independent" style, and vice versa (Hansen and Stansfield, 1981). Learners who have learnt two languages may be better at a third, as the paper by Nation and McLaughlin in this volume suggests. There may be a conglomeration of factors amounting to aptitude for learning a second language, discussed by Skehan and by Esser and Kossling in this book. All of these and many more have been advanced as explanations for learner differences; a more complete list can be found in Hammerley (1982). It must also not be forgotten that the relationship between learner variables and L2 learning goes in both directions; learning a second language can affect the learner's makeup as well as vice versa. Thus while some have looked at the beneficial effects of intelligence on L2 learning (Genesee, 1976), others have looked at the beneficial effects of L2 learning on intelligence (Peal and Lambert, 1962).

One category of difference consists of the learner's first language. At one level this is a matter of the overall relationship between the L1 and the L2. Some languages seem more difficult to learn than others; Chinese takes twice as long as French to learn to the same level of competence. But this has to be qualified since it is based on the experiences of English-speaking learners of second languages. For a Japanese learner, Chinese may be twice as easy as English; for a Finn they may be equally difficult. The second paper contributed to this volume by Paul Meara looks at particular problems arising from the use of Spanish learners of English. At another level, specific differences in terms of grammatical mistakes or sequences of learning may be related to the different mother tongues of the learner. Indeed, the importance of L1 involvement in L2 learning has happily occupied L2 researchers as an area of its own. On the one hand, there is Contrastive Analysis, that sets store by comparison of the two languages yielding predictions of learner errors; on the other, there is Error Analysis, that uses L1 transfer as one of several explanations for the peculiarities of learners' languages.

As well as variation between learners there is also variation between situations. One overall difference is between those who acquire the L2 through self-structuring "informal" interaction with native speakers and those who acquire it by structured "formal" teaching in a classroom. Some L2 researchers have seen this difference as crucial and have separated "naturalistic" and "taught" second language learning (Wode, 1981). Others have pointed to the mixed nature of actual classrooms in which both naturalistic and taught learning coexist (Krashen, 1976). A compromise approach to "natural" development of classroom learners is to tackle aspects of language that teachers do not cover in class, such as the difference between "eager to please" and "easy to please". Other situational differences may also be crucial to L2 learning, for example, the difference between immigrant and non-immigrant situations discussed in the next chapter.

Learners and situations do not, however, exist independently from each other. One theoretical position called "interactionism" sees all behaviour as an interaction between the person and the situation; an observation of a person's behaviour reflects what that particular person does in that particular situation, and is true neither of all people nor of all situations (Magnusson, 1974). We cannot say that someone is intrinsically intelligent, for instance, merely that he displays intelligent behaviour in a particular range of situations. Thus in L2 research individual factors cross with situational factors (Cook, 1981b). If we want to study, say, the factor of sex, it would be dangerous to compare boys and girls without specifying the situations within which they are to be compared. A comparison of school-based L2 learning might well demonstrate the superiority of girls whereas a comparison of natural situations might show the reverse; whatever differences there are could be connected with situation, necessitating a four-way comparison of both types of learner in both situations rather than a two-way comparison between girls and boys.
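The four-way comparison described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the cell labels ("girls", "boys", "school", "natural") are hypothetical names chosen for the example, and no claim is made that real data would look like this. The point is simply that a pooled two-way comparison can show almost no difference even when each situation shows a clear one.

```python
from statistics import mean

# Hypothetical proficiency scores, invented purely for illustration.
# Keys are (learner type, situation) cells of the four-way design.
scores = {
    ("girls", "school"): [72, 68, 75, 70],
    ("boys", "school"): [64, 61, 66, 63],
    ("girls", "natural"): [58, 62, 60, 59],
    ("boys", "natural"): [66, 69, 65, 68],
}

cell_means = {cell: mean(values) for cell, values in scores.items()}


def sex_difference(situation):
    """Mean score of girls minus mean score of boys within one situation."""
    return cell_means[("girls", situation)] - cell_means[("boys", situation)]


# Four-way comparison: the girl-boy gap is computed separately per situation.
school_gap = sex_difference("school")
natural_gap = sex_difference("natural")

# Two-way comparison: situations are ignored and all scores are pooled.
pooled_girls = mean(scores[("girls", "school")] + scores[("girls", "natural")])
pooled_boys = mean(scores[("boys", "school")] + scores[("boys", "natural")])
pooled_gap = pooled_girls - pooled_boys

print("gap in school situations: ", school_gap)
print("gap in natural situations:", natural_gap)
print("gap when situations are pooled:", pooled_gap)
```

With these invented figures the gap favours girls in school and boys in natural settings, while the pooled gap is close to zero; a design that ignored situation would miss both effects.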

3. Manipulation of Situation and Measurements of Effects

The two earlier methods accepted the learner's behaviour as it stands. A learning experiment however tries to see how behaviour is affected by something; it measures the differences between before and after. A typical framework compares two groups of learners, who start as similar as possible and are then treated in different ways. One, the control group, is given the minimum of treatment; the other, the experimental group, is given special treatment to test the question the researcher has in mind. Then both groups are given the same test and the differences that arise between them can be ascribed to the different treatments they have had. So a typical piece of language teaching research compares two groups of students who have been taught in different ways, as for instance in the paper by Garcia and Winitz in this volume. Alternatively, the treatment remains the same but the students are different; we see which group does best. Or, indeed, as well as the learner and the situation, the language itself is precisely controlled through a miniature artificial microlanguage (MAL), as in the papers by Esser and Kossling and by Nation and McLaughlin in this volume. Let us call this type of research the "manipulative method".
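The logic of the control and experimental groups can be sketched as a small calculation. The figures below are invented for illustration, and the use of simple gain scores (post-test minus pre-test) is an assumption made for the sketch rather than a procedure taken from any of the studies cited; real research would also ask whether the difference is statistically reliable.

```python
from statistics import mean

# Hypothetical (pre-test, post-test) scores for each learner; invented data.
control = [(40, 46), (38, 41), (45, 50), (42, 47)]
experimental = [(41, 55), (39, 50), (44, 58), (40, 53)]


def mean_pretest(group):
    """Average starting score, used to check the groups begin alike."""
    return mean(pre for pre, _ in group)


def mean_gain(group):
    """Average change between the before and after measurements."""
    return mean(post - pre for pre, post in group)


# The groups should start as similar as possible.
start_difference = mean_pretest(experimental) - mean_pretest(control)

# The difference in gains is what is ascribed to the special treatment.
control_gain = mean_gain(control)
experimental_gain = mean_gain(experimental)
treatment_effect = experimental_gain - control_gain

print("difference at the start:", start_difference)
print("control gain:", control_gain)
print("experimental gain:", experimental_gain)
print("gain ascribed to treatment:", treatment_effect)
```

Subtracting the control group's gain removes whatever improvement would have occurred anyway, which is why the control group is needed at all.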

The classic form of the manipulative method in L2 learning is, then, the comparison of teaching methods. Large scale projects with this basic design were set up for assessing the advantages of, for example, the language laboratory (Green, 1975), and the audiolingual method (Smith, 1970). Smaller scale research on these lines has also been common, such as the studies of listening-based methods looked at in the next chapter. If L2 research is to be applied to language teaching this kind of method seems vital; to take a partisan view, language teaching is forever doomed to swing like a pendulum from one method to another without a basis in scientific evidence, if there is no controlled research into its effectiveness. There are nevertheless many pitfalls; few of the major projects mentioned escaped a barrage of criticisms. One problem is the extent to which overt changes in teaching method actually affect the students' experience of the classroom. Many other factors in the situation may be more crucial. Different classes may be taught by different teachers, whose personality and teaching style affect the issue, despite the fact that they are both using the same teaching method. Or, if one teacher teaches two classes by different methods, there is likely to be a clash between the teacher and either the experimental or the control method; he may have extra enthusiasm for the new method or be fondly attached to the control method. The same personality preferences are of course true of the students. A comparison of different classes in a school also has to take into account any differences between the classes, whether in their selection or some other factor; for instance, the American psychological practice of referring to grades rather than ages in schoolchildren is potentially dangerous if children have been moved ahead or held back a grade for any reason and the grouping is no longer purely differentiated by age. A comparison between different schools raises a host of problems about the possible differences in the social class of the students or in the nature of the institutions themselves. The paper in this book by Peter Skehan discusses whether method comparison needs also to take learner types into account.

Evidence and Measurement

While it may seem obvious that evidence to settle issues in L2 learning should come from L2 learning, this has sometimes been overlooked. The audiolingual method interpreted behaviourist psychology as saying that a second language should be learnt as a set of habits by practice and overlearning. Very little, if any, of the psychological evidence concerned L2 learning; hence the theory was accepted on the grounds of first language acquisition. It may be that first and second language learning are substantially identical and that theories of L1 acquisition carry over to L2 learning by analogy. But this is in itself a research issue to which many people have addressed themselves, with fairly inconclusive results. What is even more unsatisfactory is that the behaviourist theory of first language learning was itself not based on research with children but on analogy with experiments with rats and pigeons. Evidence from L2 learning clearly seems preferable over that drawn from a different field, unless the analogy between them is watertight.

The type of L2 evidence is also crucial. The observational method confines itself to naturally occurring evidence; the other methods take in a wider range. In particular, a decision must be made about the aspects of language or language learning to be studied. For many years the bulk of L2 research has been conducted within a structuralist model of language, though not of language learning, with great emphasis on syntax; the crucial aspects of language were seen as grammatical morphemes, such as possessive "s", and major sentence types, such as questions or imperatives (Padilla and Lindholm, 1976) or grammatical categories such as auxiliary (Cancino et al., 1975); phonology was usually discussed within a phoneme framework. It was comparatively uncommon to come across research into other aspects of syntax, such as sentence embedding (d'Anglejan and Tucker, 1975), or phonology, let alone semantics. Only recently has some work started to be done within more current views of discourse or of syntactic description, represented in this volume by White's paper. In general, L2 learning research established its independence from linguistics by taking for granted the central importance of conventional structuralist grammar. This limits what it has to say, not only to one view about language, but also to certain areas of language within that view. While phonology and semantics have had their devotees, they have not been developed to nearly the same extent as surface grammar. A typical introduction to second language learning such as Language Two (Dulay, Burt and Krashen, 1982) is almost totally concerned with phrase structure, syntax or morphology; phonology is allocated thirty pages; semantics, lexis and vocabulary do not even figure in the index. In principle, nevertheless, the evidence for L2 learning could come from many different theories of language, and from different aspects of language within these theories. The articles in the present volume try to cover a wider area, ranging from pragmatics and generative syntax to vocabulary, inter alia.

Evidence of the learner's behaviour can be gathered from many sources; potentially we are concerned with the full complex picture of L2 learning. Three broad types of evidence can be drawn on.

1. Introspective Data

One possibility is to ask people to examine their own feelings and experiences about L2 learning. At one time introspective study of one's thought processes was of great importance; Wundt, the founding father of psychology, after all, talked about his subjective experiences of listening to metronome beats (Miller, 1964). While introspection fell into disrepute under the behaviourist regime which accepted only "objective" visible evidence, in linguistics it was legitimised by Chomsky's argument that the linguistic intuitions of the native speaker could be formalized through a rigorous form of statement; to quote an early precursor "introspection may make the preliminary survey, but it must be followed by the chain and transit of objective measurement" (Lashley, 1923). One type of introspection commonly used as evidence in linguistics is the native speaker's judgement of whether sentences are grammatical or acceptable, exemplified in White's paper in this book. A recent revival of introspection in psychology is signalled in a paper by Ericsson and Simon (1980) who argue that verbal reports of mental processes are accurate provided that certain conditions are met; namely, that the process involved is related to language, that it does not need extra processing to be available for the individual to report, and that it is not transformed in some way before the point at which the report is made. In L2 learning, people's learning or processing strategies can be established by asking them what they are doing. Cohen and Hosenfeld (1981) distinguish between two types of introspective activity. One is "thinking aloud" - verbalizing about what is going on in one's head as it happens. (An interesting example of this technique is seen in the contribution to this volume by Dechert and Sandrock, which reports an experiment in which a German student gave a running commentary as she translated a passage from English to German.) The second type of activity described by Cohen and Hosenfeld is "self-observation", which can range from semi-analysed discussion to highly abstract accounts of mental processes or motivation.

Further provisos on introspective evidence can be added. The question that learners are asked must be put unambiguously; there is no point in asking "Do you think children or adults learn foreign languages more easily?" if it is not clear whether learning means a conscious knowledge of the rules or an ability to use the language in real communication, to name two possibilities. In other words, since there is no objective check on introspection we have to ensure that the questions are understood by the learners in the way that we intend. Secondly, the accuracy of the answers cannot necessarily be relied on; given the best will in the world the learners may be influenced by their desire to give the answer they think we will like, or they may simply not be very good at making generalizations about their own behaviour - after all, if they were, there would be no need for a discipline of psychology. Introspective data can yield interesting information about people's opinions and these make up an important component in the learner's mental makeup, which affects learning in many ways. For instance, Carol Hosenfeld (1976) has produced interesting studies based on L2 learners' accounts of how they tackle various teaching techniques. But it is clearly dangerous to take such self-observations at their face value as reflecting what actually happens, rather than what the learner feels happens.

2. Natural Data

Another possibility is to use the observational method to collect evidence. The researcher notes what happens, while trying to influence it as little as possible. Then he links it to some measure of the learner's success, for instance the equivalent language of the native speaker. For example, the strategies used by good language learners were studied in part by seeing how L2 learners behaved in classrooms (Naiman et al., 1975). Similarly, conversational strategies have mostly been established by analysing transcripts of learners' behaviour, as for instance in Hatch (1978). The usefulness of natural data depends on three conditions being met:

(a) That the evidence can indeed be observed. One problem with language is that what goes on in the mind is not visible. Observations of speech tell us about the visible tip of the iceberg from which we hope to deduce the invisible.

(b) That it occurs with sufficient frequency. Suppose we wanted to examine the development of the passive voice in L2 learners talking in classrooms. Unless it were the specific teaching point or the type of language was very specialized, the number of passives observed per hour is likely to be rather small. Observations of natural situations can be frustrating because enough examples of any given phenomenon may not occur - another of the well-known problems attached to using a corpus.

(c) That there is a real cause and effect relationship between the behaviour and the factor held responsible. A well-known statistical tall story concerns the status of lice in the New Hebrides (Huff, 1954); the natives used to claim that lice were good for the health because they are never seen on a person who is ill; the evidence completely confirmed this. However, the explanation was that lice prefer to live at the normal human temperature and shun people whose temperature is higher. Rather than causing health, lice avoid illness. One needs to demonstrate that the effects really have the cause ascribed to them, something that is never totally possible with a non-manipulative method, as the people who dismiss the harmful effects of tobacco argue.
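Condition (b) can be made concrete with a rough calculation of how much recording a corpus study would need. The function name and the figures are hypothetical, chosen only to illustrate the arithmetic; actual rates of passives in classroom speech would have to be established empirically.

```python
import math


def hours_needed(target_examples, examples_per_hour):
    """Whole hours of recording needed to expect a given number of examples.

    Both arguments are assumptions supplied by the researcher; the rate
    must be positive for the estimate to make sense.
    """
    if examples_per_hour <= 0:
        raise ValueError("examples_per_hour must be positive")
    return math.ceil(target_examples / examples_per_hour)


# Invented figures: 100 passives wanted, 2 observed per classroom hour.
hours_for_passives = hours_needed(100, 2)
print("hours of recording required:", hours_for_passives)
```

Even a modest target quickly demands many hours of observation when the structure is rare, which is one reason researchers turn to the controlled data discussed next.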

 3. Controlled Data

The data can also be specially created for the research, so to speak. The learner is put into a particular situation that will yield the type of behaviour we want to study. The controlled data that this affords can be put into two categories, according to whether it is directly or indirectly related to language.

 Direct data can come from the non-authentic language gathered in an observational method; for instance, instead of recording the student in a real life situation, he is interviewed in front of a TV camera as was done by Claus Faerch in the PIF project at Copenhagen (Sigurd and Svartvik, 1981, p. 214). This yields data that is easier to obtain and to handle than unorganized recordings, and by adapting the situation or the topic, it can be slanted towards the syntactic structure, or whatever, that we are interested in. Controlled data may also be the result of a comprehension task where the learner has to attend to some particular issue, say relative clauses. This shades over into the many straight orthodox techniques for measuring aspects of the learner's language. The difference between using these for L2 research purposes and as tests is chiefly that L2 research is more interested in the results as indications of language learning in general, say within the difference method, than in the implications for the individual learner as part of the educational process.

Indirect evidence comes from a variety of psychological techniques that provide data that is not about language itself but connected with it by a process of inference. Learners are given a language-related task and aspects of their behaviour other than language are measured. Then the nature of their language is inferred from the measurements. A typical example is reaction-time experiments, in which the time that people take to comprehend sentences is taken as an indication of the level of proficiency in the language; Lambert (1955), for example, showed it was possible to establish the dominant language in bilinguals by finding which had the longer reaction time.
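The inference from reaction times to language can be sketched as follows. This is an illustration only, not Lambert's procedure: the figures are invented for one hypothetical bilingual subject, and the function names are assumptions. The code merely compares mean reaction times; what the difference implies about dominance remains a matter of interpretation.

```python
from statistics import mean


def mean_reaction_times(times_ms):
    """Map each language to the mean of its reaction times (milliseconds)."""
    return {language: mean(values) for language, values in times_ms.items()}


def slower_language(times_ms):
    """Return the language with the longer mean reaction time."""
    means = mean_reaction_times(times_ms)
    return max(means, key=means.get)


# Invented reaction times (ms) for one hypothetical subject.
sample = {
    "French": [620, 580, 640, 600],
    "English": [710, 690, 750, 730],
}
print(mean_reaction_times(sample))
print(slower_language(sample))
```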

One possibility is indeed to use a measure that people have already found fruitful elsewhere; an L2 research experiment often starts from some measure that is used in the study of first language acquisition or of native speaker performance, the main advantage being that it is known to work. For example, a standard controlled test of children's vocabulary development could be adapted to L2 research. Ramsey and Wright (1974) used such a test, the Ammons Picture Vocabulary Test, to measure the effects of age of arrival on the language of immigrants. A borrowed technique can specifically test whether there are similarities or differences from L1 learning or L1 processing; the point is precisely to replicate work from other areas in L2 learning. Or the measure is used to establish something about L2 learning itself, its origin being irrelevant. However, borrowing measures from outside does limit what can be found out about L2 learning to that which is accessible through the methodology of other fields; it misses just those phenomena that are peculiar to L2 learning. Hence there are strong arguments for designing special measures whenever it is felt that L2 learning has some distinctive feature of its own.

Controlled data has the advantage that it yields the information we are looking for. It has the disadvantage of artificiality. An experiment deliberately sets up a situation that is not the everyday world, and this can eliminate relevant real-life factors; the baby may be thrown out with the bathwater. Great caution must be observed in generalizing from controlled and limited data to general, real-world situations; the behaviour that is studied must correspond with something outside the laboratory if it is to have any ultimate relevance. There is, then, a continual tension between "internal validity" (the attempt to make the experiment as rigorous as possible) and "external validity" (the attempt to make it reflect something in the world outside the laboratory).

One step in this direction is to ensure that the results are not just a matter of accident. The way to do this is partly through statistical techniques, descriptions of which can be found in books such as Hatch and Farhady (1982) or Robson (1973). Some techniques attempt to show that the results are reliable, or, to use the technical term, "significant"; does the experiment use an appropriate statistical technique to show that its results can be relied on? Other techniques reveal ways of handling the data, such as the use of cluster analysis in the paper by Peter Skehan in this volume. It is, however, easy to put too much faith in statistics. To quote M. H. Moroney, "It is an easy and a fatal step to think that the accuracy of our arithmetic is equivalent to the accuracy of our knowledge about the problem in hand. We suffer from delusions of accuracy." (Moroney, 1956). This must be particularly true when one is dealing with language, little of which is quantifiable, and the part that is may be the least important.

The Rationale for an Experimental Approach

Paivio and Begg (1981) put the basic rationale for an experimental approach in a nutshell: "An experiment is a controlled look at nature. The experimenter sets up a situation in which the structure of the task is explicit, the nature of the performance being studied is explicit and the question that is being asked is precise." Much L2 research makes use of elements taken from experiments without strictly conforming to this design. Let us define an experimental approach as one that tries to specify the relevant background of the learners, to ask precise questions, either to control the situation in which the data is collected in an appropriate way or to take into account its restrictions, to support its statements with objective measurable evidence, and to argue carefully from these results to wider conclusions. An experimental approach starts from the realization that the real world is a complex bundle of many things; it focuses on a single aspect at a time in order to establish its nature; it brings everything down to a single, precise question. Having posed the question, it looks for evidence that is explicit and objective. From this evidence it argues its way back to the original issue.

An observational method may, then, fit within an experimental approach. Provided that the researcher makes clear the elicitation techniques involved and is aware of the ways they may have shaped his data; provided that he quantifies the data in some way and shows what proportion of it he is accounting for; and provided that his techniques of recording the data are objective and yield the same results whoever applies them, he is clearly providing objective data within a controlled situation. What lies outside the experimental approach is research which uses evidence collected in an unorganized way and presented in an unquantified way to make generalizations about L2 learning. Studies of this kind are powerful precisely because they throw caution to the winds and ignore conventional safeguards. Halliday's account of his own child's development from "diary" studies is, after all, one of the major contributions to the study of child development (Halliday, 1975). In L2 learning, W. F. Leopold (1939-1949) provides a good example of a diary study of his two bilingual children. Such work can provide an initial insight, a brilliant new generalization or theory, or simply some intriguing and unexpected data. It does need, however, to be checked out in due course by research based on methods of sufficient rigour.
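The requirement that recording techniques should yield the same results whoever applies them can itself be checked with figures. One common way, offered here as an illustration rather than as anything the studies cited used, is to have two observers code the same utterances independently and to compute Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The category labels and codings below are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both observers must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two observers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each observer coded at random with the same category rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented codings of four utterances as "question" or "statement".
observer_1 = ["question", "question", "statement", "statement"]
observer_2 = ["question", "question", "statement", "question"]
print(cohen_kappa(observer_1, observer_2))  # 0.5
```

A value near 1 indicates that the coding scheme does not depend on who applies it; a value near 0 indicates agreement no better than chance.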

The choice of question is also vital. Innumerable questions could be asked that would yield impressive results of no value at all. On the one hand the research might demonstrate something that is so obvious it was hardly necessary to prove it; there is little need for example to demonstrate the disastrous effects of jumping out of a plane without a parachute. The difficulty with second language learning, as with other areas, is knowing what is obvious and what needs to be proved. To many it was obvious that young children were better at L2 learning than adults; actual research is far from conclusive, as we see in the next chapter. On the other hand, an experiment can show something that is interesting in itself but has no connection with any wider issue or body of knowledge. A single isolated experiment may lead nowhere: it has to be linked to an overall theory, to have a plausible connection with broader or deeper ideas. Nevertheless, sometimes a one-off experiment is intriguing; an L2 learning example is Lyczak (1979), who shows that people who listen to a language without understanding it for a few hours learn it better when they are subsequently taught it than those who had not listened to it. While this experiment does not relate to other work in L2 learning, potentially it can suggest interesting possibilities.

Overall general questions need, then, to be carefully restricted to yield results within an experimental approach. For instance, one such question is whether second language learning is like first language learning. The difficulty here is that the question takes for granted one of the terms in the comparison, namely "first language learning". Without specifying what theory of first language learning or which aspect of language is involved, or defining how similarity is to be measured, the question is unanswerable. A further general question is "Why are some people better at learning second languages than others?" - again fairly meaningless without any specification of "better": academic knowledge of the language, ability to engage in conversations, or what? Another frequent question is "Is it a good thing to be a bilingual?" Even accepting the questioner's assumptions that bilingualism is in some way peculiar rather than being the normal condition of much of the human race, the question still needs to specify more detail; what kind of bilingualism? What level? Good for what? The answer can only relate to bilingualism within the context of a particular culture and society.

So far as the applications of L2 research to language teaching are concerned, many have fallen into the trap of taking certain narrow conclusions and applying them to general areas without sufficient qualification. One may investigate the order in which eight grammatical morphemes are used by 151 Spanish-speaking children aged 6-8 learning English - a study of one age-group, one mother tongue, one target language, one immigrant situation, one aspect of grammar, tested in one way. Is one entitled to use this, as was indeed done, as a major component in answering the question "Should we teach children syntax?" (Dulay and Burt, 1973)? A true application of L2 research depends on evidence that is appropriate; in other words it must apply to the type of learner, to the aspect of language that is in question, and to the situation in which the learners are placed.

The Ethics of Research

A point that has now to be made concerns the ethics of research. A manipulative approach that changes the world has to decide whether these changes are in the interests of society in general and of the people it is affecting in particular, though other approaches are not immune to ethical problems either. The problem for first language acquisition research is that experiments to settle most of the major issues are ruled out because of their effects on the children who take part in them. A systematic test of language deprivation would be to see what happens when a child is totally deprived of language for 10 years. Psammetichus allegedly isolated two babies till they said their first word, which happened to be the word for bread in Phrygian (Fromkin and Rodman, 1974) - a type of experiment only possible to Pharaohs. Hence the interest of linguists in Genie, a girl whose parents had deprived her of language until her early teens and thus carried out the experiment that they themselves were forbidden (Curtiss, 1977). Suppose in a less dramatic form that a researcher hypothesizes that a particular type of interaction, let's say peekaboo games, is crucial to first language acquisition (Bruner, 1983); an experiment could be designed in which one group of mothers were encouraged to play peekaboo, the other group prevented. But a researcher who sincerely believes in his hypothesis must feel he has done harm to one group of children, who might well have prospered without his interference. Paradoxically only if his hypothesis is of trivial importance could he say that the effects do not matter. For these reasons many questions about the relationship of environment to language learning lend themselves to observation rather than experimental manipulation; at least only the children's own parents are to blame. In first language acquisition a brief syntactic experiment lasting a few minutes may do minimum harm. But some risk is unavoidable. Carey (1978) showed that months after hearing a word a handful of times children knew more or less what it meant; Nelson (1982) advocated a rare-event theory of learning, where exposure of the child for 20 minutes to a syntactic structure for which he is "ready" can change a child from a non-user to a user. In other areas experiments have indeed had to be abandoned because of the effects on the subjects. At a personal level, I was once proposing to administer a syntactic test to children under 5 in which they had to show whether the dog bit the cat or vice versa, a common enough experimental technique; I had to spend a morning explaining to an educational psychologist that this did not have overtones of oral sadism before I was allowed to carry out the experiment!

In second language learning the ethics are less complicated. First of all the effects may not be so serious; a slightly less efficient way of using an L2 may not be a severe handicap to the people that are involved, certainly not the same as marring the child's first language for life. Secondly there is a relationship to teaching. In a sense, language teaching is a large-scale experiment, deliberately structuring the world for the language learner; teachers chop and change continually to get the optimum results. However, it is still true that they are aiming at positive results and, like the first language researcher, cannot responsibly commit themselves to bad teaching simply to demonstrate that it is bad. Research comparing two teaching methods is usually conducted by someone who actually prefers one of them and so is sacrificing the interests of some students who will get, in his view, inferior treatment in favour of the long-term interests of all language students. An interesting example of this is seen in Wesche (1981). She selected matched pairs of students who were both best suited to an "Analytical Approach" and put one member of the pair in an AA class, the other in an AudioVisual class. She found that "the appropriately matched students ... achieved superior scores on three of the four achievement measures of listening comprehension and expression". Half the students were put into a situation that she believed was unsuitable to them; as individuals they had clearly misplaced their trust in the teacher to do his or her best by them. It is a nice point whether the wrong to these individuals is outweighed by the gains to later generations of students.

However, L2 learners can sometimes give their consent to the experiment; they can be put in the picture as to what is happening and as to any risks involved. Yet such discussion with the subjects can never be completely honest. It is impossible to tell a group of students that they are the control group who are supposed to do badly, as Wesche could have done, as it would simply be a self-fulfilling prophecy. Nor is it often possible to tell them exactly what is being tested without altering their behaviour. Quirk and Svartvik, for example, tested ideas of grammaticality not by asking people if they thought sentences were grammatical but by getting them to change the sentences in various ways; at the end of the test many of the subjects thought they had been taking an intelligence test (Quirk and Svartvik, 1966).

Even during the experiment, ethical considerations may have to be taken into account. To take a personal experience from the same cats-and-dogs experiment mentioned earlier, one of the 4-year-olds started telling me what seemed to be a fantasy about her appearance in a court case that week. At this point I turned into a sympathetic adult and abandoned the precise conditions of the test - this was fortunate as it turned out that it was a true account of what had happened to her and it was probably helpful to her to talk it over with someone. Strictly speaking, part of the data had been thrown away. Humanely speaking, little other alternative was possible.

Designing a Research Project Within the Experimental Approach

Finally, let us see how a broad question about language could be turned into the precise form of an experiment. A suitable issue might be the perennial dispute about the relationship between speech and writing. This debate can be carried out in the form of questions.

1. Is speech more important than writing?

The answer "Yes" to this question formed part of the foundation for one school of linguistics, Bloomfieldian structuralism, and for one influential language teaching method, audiolingualism; indeed an affirmative answer is still assumed by most later schools of linguistics and language teaching theories, as shown for example by the assumption common among communicative language teachers that communication should be channelled through speech. What was this affirmative answer based on?

The justification usually presented for the primacy of speech is from history and language development (Bloomfield, 1933); in the prehistory of the human race speech occurs before writing; societies still exist with a spoken but not a written language; in the development of the child, speech invariably precedes writing and is acquired with ease by virtually all children; writing, however, has to be taught and some children acquire it with difficulty. Householder pointed out that this confused two issues - the way language develops and the way it is used by adult speakers (Householder, 1971). Language development is beside the point so far as the relationship of speech to writing in the mature native speaker is concerned, however interesting it may be in its own right; what matters is how they are connected now. On these lines he indeed argued that writing is dominant in adults' usage rather than speech. He was suggesting that the evidence provided an answer to the question "Does speech come before writing in the development of the human race and the human child?" rather than "Is speech more important than writing?" The applicability to second language learning is even more suspect; why should information about children learning their first language or Egyptians inventing hieroglyphics be relevant to people learning a second language? The audiolingualists' agreement with Bloomfield was premature; L2 learning is not directly concerned with either language history or first language acquisition. The relevance of Bloomfield's arguments for second language teaching depends at best on a strong analogy between first language learning and second language learning, which should not be assumed in advance. What is required is evidence to settle a slightly more precise question, namely:

2. Is speech more important than writing in L2 learning?

L2 learning itself has many aspects, each of which might be affected differently by speech and writing; the question needs to specify which aspect we are interested in. One of the crucial axioms of audiolingualism was that a word should always be introduced in speech rather than in writing. Let us assume, then, that vocabulary is the vital aspect involved. While gaining precision, we have inevitably lost generality; it may be that vocabulary learning is a peculiar issue of its own, unlike other aspects of L2 learning. Taking vocabulary acquisition as a particular test case the question can be recast as:

3. Is speech more important than writing in the L2 learning of vocabulary?

There are still uncertainties to be resolved in this version. Most importantly, the question is phrased in terms that apply to all learners. Perhaps, indeed, all people learn second language vocabulary in the same way. Perhaps, on the contrary, there is great individual variation according to some factor in the learner's mental makeup; it might be a matter of cognitive style or motivation; it might be that older learners depend more on visual aids, perhaps. Let us therefore arbitrarily choose to study adults, thus eliminating at least some of the possible variations.

Question 3 also lacks specification of the mother tongue and the second language involved. Quite possibly the relationship between the two languages will have some effect on the learning of vocabulary. In the case of speech and writing, one particular problem is the type of written script; the relative importance of speech and writing may depend upon whether the script is ideographic, like Chinese, or semi-phonetic, like English. A further problem is the question of literacy, regardless of script; a learner who can read may bring certain preconceptions to second language learning, such as, say, the overall importance of the written language. Let us then decide arbitrarily to study educated French learners of English. The question now reads:

4. Is speech more important than writing in the learning of English vocabulary by educated French adult learners?

The precise situation of the learners could also affect the result. Learners in classrooms are treated in a bewildering variety of ways, each of which is in some way involved in the question of speech versus writing; learners outside classrooms meet speech and writing in different proportions according to their jobs, to whether they are in their own countries, and so on. Let us decide to concentrate on one particular type of situation, say, technical schools in France. The question now reads:

5. Is speech more important than writing in the learning of English vocabulary by educated French adult learners being taught in technical schools in France?

In terms of an experimental approach the next step is to set up two groups of learners so that the only difference between them is that between speech and writing. To operationalise question 5 we need two situations where only vocabulary learning is involved and where the only difference between them is oral versus written presentation - the controlled situation of Paivio and Begg (op. cit.). In this case we might consider a straight experiment of short duration in which two randomly chosen groups at the technical school are taught vocabulary in the different ways and their success assessed. Or we might take two otherwise equivalent classes in the technical school and require one to be taught orally, the other through writing; clearly this already introduces all kinds of complications - will the students in the two groups actually be equivalent? Will the teachers handle both methods equally well? And so on. Even if this ideal cannot be achieved, at least we should be aware of the other factors that might cause the effects we are concerned with. Let us assume then that we set up an experimental situation in which two groups of learners are taught vocabulary in speech and in writing respectively.

6. Is a group of educated French adult learners of English in technical schools in France who are taught orally better at learning vocabulary than an otherwise identical group who are taught through writing?
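The "two randomly chosen groups" mentioned above can be formed mechanically, so that the experimenter's preferences play no part in deciding who is taught in which way. The following is a minimal sketch in Python; the learner numbers, the group labels and the function name are invented for illustration.

```python
import random


def assign_groups(learner_ids, seed=None):
    """Randomly split learners into an 'oral' and a 'written' group.

    A fixed seed makes the allocation reproducible, so that it can be
    reported and checked afterwards.
    """
    rng = random.Random(seed)
    shuffled = list(learner_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"oral": shuffled[:half], "written": shuffled[half:]}


# Hypothetical class of 21 learners, numbered 1 to 21.
groups = assign_groups(range(1, 22), seed=1)
print(len(groups["oral"]), len(groups["written"]))  # 10 11
```

Random allocation does not guarantee that the two groups are identical, only that any differences between them are due to chance rather than to the researcher's choices.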

The next problem to consider is how we can measure whether one group is better than the other. Again our decision will reflect our model of language and our view of what is important: Range of vocabulary? Ease of use? Speed of reaction? Fluency of expression? And so on. We might collect specimens of authentic data from the learners and subject them to an analysis of the vocabulary. Or we might decide to invent our own test to cover the particular point we are interested in. Or we might use an existing test, say, the English Picture Vocabulary Test (EPVT), a standard test of vocabulary development. But the point is to have a yardstick that will impartially measure the relevant aspects of vocabulary in both groups. If we adopt the EPVT, the question now reads:

7. Are the scores on the EPVT of a group of educated French adult learners of English in technical schools in France who are taught orally better than those of an otherwise identical group who are taught through writing?

Finally we still need to quantify the word "better": how much do one group's scores need to differ from the other's to be called "better"? Partly, this is a matter of statistics and levels of significance; we need to be certain that the results are not sheer chance and this can be done through various mathematical tools. Our design and system of measuring must then permit the use of such tools. Partly, it is a matter of common sense and interpretation; do we feel that the difference we find is actually important to language learning, whatever the statistics may say? Hence the ultimate form of our question is:

8. Are the scores on the EPVT of a group of educated French adult learners of English in technical schools in France who are taught orally significantly better than those of an otherwise identical group who are taught through writing?
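To show what "significantly better" might mean in practice, here is a minimal sketch of one of the "various mathematical tools": a one-sided permutation test on the difference between the two groups' mean scores. It is chosen here because it needs few assumptions and can be written in a few lines; it is not a method prescribed in this chapter, and the scores, the function name and the number of resamples are all invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(oral, written, n_resamples=5000, seed=0):
    """Estimate how often chance alone gives a mean difference at least as
    large as the one observed (oral minus written)."""
    rng = random.Random(seed)
    observed = mean(oral) - mean(written)
    pooled = list(oral) + list(written)
    n_oral = len(oral)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Re-deal the pooled scores into two groups of the original sizes.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_oral]) - mean(pooled[n_oral:])
        if diff >= observed:
            at_least_as_large += 1
    # The +1 counts the observed arrangement itself, so p is never zero.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Invented test scores for two small hypothetical groups.
oral_scores = [10, 11, 12, 13, 14]
written_scores = [1, 2, 3, 4, 5]
p = permutation_p_value(oral_scores, written_scores)
print(mean(oral_scores) - mean(written_scores), p)
```

A small value of p suggests that the difference is unlikely to be sheer chance. Whether a difference of that size matters for language learning is the separate question of common sense and interpretation raised above.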

This quest has tried to narrow down the research question so that it can receive a reliable answer. Instead of the overall, untestable proposition about speech and writing there is now a precise, answerable question. The technique of breaking a problem down into a single testable issue has a long history in science as the "method of detail" (Pacey, 1974). Galileo, for instance, tackled the problem of the strength of ships' hulls by reducing it to the strength of a single beam. The decomposition of a problem into elements is also often discussed as one of the benefits of teaching computing to children (O'Shea and Self, 1983). It is not then an approach unique to second language learning research.


References

d'Anglejan, A. and Tucker, G. R. (1975) The acquisition of complex English structures by adult language learners, Language Learning, 25/2.

Bloomfield, L. (1933) Language.

Bruner, J. (1983) Child's Talk, O.U.P.

Burstall, C., Jamieson, M., Cohen, S. and Hargreaves, M. (1974) Primary French in the Balance, Slough, NFER.

Cancino, H., Rosansky, E. J. and Schumann, J. H. (1975) The acquisition of the English auxiliaries by native Spanish speakers, TESOL Quarterly, 9:4, 421-430.

Carey, S. (1978) The child as word learner, in M. Halle, J. Bresnan, and G. A. Miller (eds.), Linguistic Theory and Psychological Reality, M.I.T.

Cohen, A. (1977) Successful second language learners: a review of research literature, Balshaunt Shimushit, I.

Cohen, A. and Hosenfeld, C. (1981) Some uses of mentalistic data in second language research, Language Learning, 31/2.

Cook, V. J. (1981a) Some uses for second language learning research, in H. Winitz (ed.), Native Language and Foreign Language Acquisition, Annals of the New York Academy of Sciences, 379, 251-258.

Cook, V. J. (1981b) Second language acquisition from an interactionist viewpoint, ISB-U, 6/1, 1981-82.

Cook, V. J. (1985) Chomsky's universal grammar and second language learning, Applied Linguistics, 6.

Corder, S. P. (1967) The significance of learner's errors, IRAL, V/4.

Curtiss, S. (1977) Genie: a psycholinguistic study of a modern-day "Wild child", New York, Academic Press.

Dulay, H. C. and Burt, M. K. (1973) Should we teach children syntax? Language Learning, 23/2, 245-258.

Dulay, H. C., Burt, M. and Krashen, S. (1982) Language Two, O.U.P./N. Y.

Ericsson, K. A. and Simon, H. A. (1980) Verbal reports as data, Psychological Review, 87, 3, 215-251.

Fromkin, V. and Rodman, R. (1978) An Introduction to Language, New York.

Gardner, R. C. and Lambert, W. E. (1972) Attitudes and Motivation in Second Language Learning, Newbury House.

Gass, S. and Ard, J. (1980) L2 data: their relevance for language universals, TESOL Quarterly, XIV/4, 443-452.

Genesee, F. (1976) The role of intelligence in L2 learning, Language Learning, XXVI/2, 267-280.

Green, P. S. (ed.) (1975) The Language Laboratory in Schools, Oliver and Boyd, Edinburgh and New York.

Halliday, M. A. K. (1975) Learning How to Mean, Edward Arnold.

Hammerley, H. (1982) Synthesis in Language Teaching, Blaine, Washington; Second Language Publications.

Hansen, J. and Stansfield, C. (1981) The relationship of field dependent-independent cognitive styles to foreign language achievement, Language Learning, 31.

Hatch, E. (1978) Discourse analysis and second language acquisition, in E. Hatch (ed.), Second Language Acquisition, Newbury House.

Hatch, E. (1980) Second language acquisition-avoiding the problem, in S. Felix (ed.), Second Language Development, Gunter Narr, Tubingen.

Hatch, E. and Farhady, H. (1982) Research Design and Statistics for Applied Linguistics, Newbury House.

Hosenfeld, C. (1976) Learning about learning: discovering our students' strategies, Foreign Language Annals, 9/2, 117-129.

Householder, F. W. (1971) Linguistic Speculations, C.U.P.

Huff, D. (1954) How to Lie with Statistics, Gollancz.

Krashen, S. D. (1976) Formal and informal linguistic environments in language acquisition and language learning, TESOL Quarterly, 10/2.

Krashen, S. D. (1981) Second Language Acquisition and Second Language Learning, Pergamon, Oxford.

Lambert, W. E. (1955) Measurement of the linguistic dominance of bilinguals, J. Ab. Soc. Psych., 50, 197-200.

Lashley, K. (1923) The behaviouristic interpretation of consciousness, Psych. Review, 10, 329-353.

Leopold, W. F. (1939-49) Speech Development of a Bilingual Child: A Linguist's Record, Northwestern University Press.

Littlewood, W. (1984) Foreign and Second Language Learning, C.U.P.

Lyczak, R. A. (1979) The effects of exposures to a language on subsequent learning, Language and Speech, 22.

Magnusson, D. (1974) The individual in the situation, Studia Psychologica, XVI, 124-132.

Miller, G. A. (1964) Psychology: the Science of Mental Life, Hutchinson.

Moroney, M. (1956) Facts from Figures, Penguin.

Naiman, N., Frohlich, M. and Stern, H. H. (1975) The Good Language Learner, OISE, Toronto.

Nelson, K. (1982) Toward a rare-event cognitive comparison theory of syntax acquisition, in P. Dale and D. Ingram (eds.), Child Language: An International Perspective, University Park Press, Baltimore, 229-240.

O'Shea, T. and Self, J. (1983) Learning and Teaching with Computers, Harvester Press.

Pacey, A. (1974) The Maze of Ingenuity, Allen Lane.

Padilla, A. M. and Lindholm, K. J. (1976) Development of interrogative, negative and possessive forms in the speech of young Spanish/English bilinguals, Bilingual Review, III/2, 122-152.

Paivio, A. and Begg, I. (1981) Psychology of Language, Prentice Hall.

Peal, E. and Lambert, W. E. (1962) The relation of bilingualism to intelligence, Psychological Monographs: General and Applied, 76, 27.

Quirk, R. and Svartvik, J. (1966) Investigating Linguistic Acceptability, Mouton.

Ramsey, C. A. and Wright, E. N. (1974) Age and second language learning, J. Soc. Psych., 94, 51-121.

Robson, C. (1973) Experiment, Design and Statistics in Psychology, Penguin.

Schumann, J. H. (1978) The acculturation model for second language acquisition, in Gingras, R. C. (ed.), Second Language Acquisition and Foreign Language Teaching, Center for Applied Linguistics.

Sigurd, B. and Svartvik, J. (eds.) (1981) Studia Linguistica, 35, 1/2.

Smith, P. D. (1970) A Comparison of the Cognitive and Audiolingual Approaches to Foreign Language Instruction: The Pennsylvania Project, Center for Curriculum Development.

Taeschner, T. (1983) The Sun is Feminine, Springer-Verlag.

Wesche, M. B. (1981) Language aptitude measures in streaming, matching students with methods and diagnosis of learning problems, in K. C. Diller (ed.), Individual Differences and Universals in Language Learning Aptitude, Newbury House.

Wode, H. (1981) Learning a Second Language, Gunter Narr-Verlag, Tubingen.