The Sound System of Language
[From Inside Language (1997), which is now out of print: this is the pre-publication doc file. Sorry
the pages don't work very well in Dreamweaver; some of the IPA symbols have been given as jpgs to try and get them right, though it makes it look a bit spotty. A glossary of phonological terms is available.
And of course the content is now rather dated!]
Why do people speaking English and French sound so different when the languages look so similar in writing, apart from a few accents? This chapter looks at some of the properties of sound systems that languages have in common and some of the ways in which they differ. It starts by discussing the ways in which languages use the pitch variation of intonation, goes on to the mechanisms by which speech sounds are produced, and then turns to how sounds are organised in speech. Its theme is the diverse ways in which languages make use of the same resources for producing speech.
The pronunciation of English is taken as a starting point. However, this decision immediately raises the problem of selecting one out of the many accents of English. Most textbooks choose the variety of English called RP, originally taken from Received Pronunciation, now mostly known simply by the initials RP on their own. Usually RP is thought of as the accent of educated British speakers of English, which does not vary according to region within England, though it differs from other world accents such as General American English. RP accent is then a different concept from the standard language of English, which has a consistent grammar and vocabulary almost everywhere it is spoken, apart from a few well-known local peculiarities.
Choosing RP as a reference point is not to suggest that more than a small minority of people in England speak it, on one estimate 3%. In general the majority of educated speakers in England nowadays have a modified RP with some regional forms. The Today news programme on BBC Radio 4 for instance has a range of English speakers each day. In a recording of a typical programme, although about two-thirds of the speakers were within the general bounds of RP accent, most of them had some regional forms. The continuity announcer for instance said fastest with the short a sound of lass /æ/ rather than the long ah sound of last /a:/, showing she probably comes from north of an east-west line from Wales to the Wash, rather than from the South. The weatherman pronounced throughout with an initial f, showing a feature often associated with a London accent. Variations in English accent will be dealt with in Chapter Seven.
1. Sounds and Meanings
This section looks at whether there is a logical link between the sounds of speech and meanings or whether the relationship is purely arbitrary. In a sense there is no reason why a particular speech sound should convey a particular meaning; a rose by any other name would indeed smell as sweet. But could Romeo in fact be called Fred? Juliet Gertie? Or Gertrude Tracey? Partly the naturalness of the names is due to our long familiarity with the characters these portray. What could Gussie Finknottle be called instead of Gussie Finknottle? Anna Karenina? James Bond? Irma Prunesqualor? John F. Kennedy? Why do rose-growers prefer to name roses Ingrid Bergman or Queen Elizabeth rather than Olive Oyl? However, the sounds in a handful of words do fit their meanings better than chance would suggest. Take the following two objects:
Object A Object B
Fig. 1 Names for arbitrary shapes
Suppose that in an unknown language one of these objects is called a pling, the other a plung: which is which? 71% of the Inside Language panel thought Object A was a plung, only 25% calling it a pling (Q4). Something about the vowel u conveys a sense of large and dark (plung); something about the vowel i conveys a sense of small and light (pling).
Several English words with low-frequency vowels like u or a indeed suggest something large or dark: huge, large, vast, enormous, humongous. Others with high-frequency vowels such as i mean something small or light: little, teeny, wee, titchy, mini, pygmy, Lilliput, thin, itsy-bitsy teeny-weeny polka-dot bikini. In French small is petit, big is grand; in Greek mikro and megalo; in Spanish chico and gordo; in Dyirbal, an Australian Aboriginal language, /midi/ and /bulgan/; and in Mandarin Chinese xiǎo and dà.
John Ohala has explained this phenomenon through the Frequency Code. Vowels with high frequencies such as i go with small size and sharpness (pling). Vowels with low frequencies such as u go with large size and softness (plung). The Frequency Code hypothesis claims that low sounds in general go with aggressiveness and assertion of power, not just vowels; Margaret Thatcher is believed to have had speech lessons to deepen her voice. The Frequency Code applies across languages, and indeed across species; dogs threaten with a low-pitched growl, but submit with a high-pitched yelp. However, in practical terms, the Frequency Code difference between high and low vowels works for only a handful of words in each language. It is after all contradicted by the vowels in the very words big and small themselves!
Not only individual sounds but also certain combinations of sounds tend to go together with certain meanings. The sn combination in English often suggests something to do with breathing noises and the nose: sniff, sneeze, snout, snot, sneer, snuff, snorkel, snooze, snore, snicker, snivel, snort, snuffle. A person who goes round with their nose in the air might be snooty, snub people, and a bit of a snob. Other combinations in English that suggest particular associations are:
A group of ion words is popular in the rhymes of reggae songs, revolution, generation, appreciation, consideration, nation, though it is not clear why this sound combination should be attractive. Indeed academic speakers often share this preference; a recent talk emphasised marketisation, negotiation and theorization.
Some idea of which sounds English speakers consider pleasant and unpleasant can be obtained from the names that science-fiction writers invent for aliens. Classical hostile aliens have short aggressive names like Kryptons, Vatch, Rull, Glotch or Perks. Neutral-seeming or friendly aliens have polysyllabic names like Alaree, animaloids, Osnomians, Voltiscians and Eladeldi.
One of the areas in which speech sounds might be expected to be closest to their meaning is when they are linked to actual noises. Here is a sample of familiar domestic and barnyard noises as portrayed in different languages:
Language   cats       dogs       sheep      cows     roosters
English:   meow       woof woof  baa        moo      cockadoodloo
Japanese:  nyanya     oue-oue    mee        moo      kokekokou
Persian:   meyu       wag-wag    ba? ba?    mâ mâ    gogoligogo
Hokkien:   meow       wo-wo      meeeh      hmoo     kok-kok
Thai:      meow       bog-bog    bae        mor      ek-i-ek-ek
Greek:     yiau       jav jav    bee        muu      cucuricuu
Spanish:   miau       guau-guau  mee        muu      quíquiriqúi
Dholuo:    ywak       guu guu    meeem      boo      kokorioko
French:    miaou      ouah ouah  bêêêh      meuh     cocorico
Korean:    yow-ong    mong mong  meh-eh-eh  um-meh   cork-eeyo
German:    miau-miau  wau-wau    baa-baa    muh-muh  kikeriki
Note: as these are written in the Roman alphabet rather than a phonetic script, they do not represent the spoken sounds adequately, only the written forms.
Source: mostly Essex University students and staff
2. Intonation and its function
[For examples of tones see Youtube]
However, despite the emotional overtones conveyed by a handful of speech sounds, the aspect of sound that is most associated with emotion is the intonation pattern, the way that the pitch of the voice rises or falls. Most people probably do not even consider intonation to be part of the sound system of language; it seems the natural way of speaking and conveying one's attitudes. Yet this resource is used in very different ways in different languages, which can cause serious misunderstandings between their speakers. This section looks at part of the intonation system of English, comparing it with the system in languages like Chinese.
The nuclear tones of English
This is called a high-fall tone. A high-fall tone often makes the speaker sound interested and involved in what he or she is saying:
A: Would you like a coffee?
B: Yes!
This might be an incredulous repetition:
A: Well he proposed and I said yes.
B: Yes?
This often sounds definite and serious:
A: Am I all right, doctor?
A low-rise on the other hand starts low and rises to the middle of the speaker's range, as in yes:
This sounds cool and perhaps indifferent:
A: Can you help me?
English has two more tones that change pitch direction rather than going continuously up or down. One is the sceptical-sounding fall-rise.
A: Do you agree?
B: Yes. There may be
The other is the enthusiastic-sounding rise-fall:
A: Do you want to go to the Bahamas for Christmas?
Finally there is a level mid or high tone, which occurs less frequently than the others and is typically used for calling people, as in cooee.
c o o e e
This might be a mother calling a child from the garden.
Needless to say, a full account of English needs many other features of intonation than these seven tones. In particular it needs to describe the system for choosing which syllable to put the tone on, such as the difference between SUSAN liked Joan, Susan LIKED Joan, and Susan liked JOAN. Nor is there agreement that seven tones is the actual number required, some linguists reducing the seven tones to a two-way contrast between rise and fall, some elaborating them with additional patterns, such as the fall-rise-fall or the mid fall.
The ways in which speakers use these tones are even harder to pin down. A particular tone often goes with a particular grammatical type of sentence. Jones with a high-fall is an answer to a question:
A: Who's that?
Jones with a low-rise is a polite checking question:
A: Jones? You're next.
Sometimes, however, the tone forces the other person to reply in a particular way, as in tag questions such as did he? or aren't you? added to sentences in conversation. You're Peter, aren't you? with a high-fall on the tag aren't you invites the person addressed to agree. You're Peter, aren't you? with a high-rise on the tag aren't you leaves it open to them to agree or disagree.
On Tuesday I went to London
has a low rise on Tuesday at the end of the phrase but a high fall on London at the end of the sentence.
Intonation across languages
When learning another language, the emotional overtones of intonation often create difficulties. Speakers use intonation without thinking and it seems to them the natural way of expressing their emotions. Though there are similarities across languages such as the rise/fall distinction, each language has an intonation system of its own. Certain intonation patterns are dangerous in that they convey something different in another language.
A person learning a second language may produce a perfectly plausible intonation for the new language but convey the wrong emotion. An overseas student used to say Good bye to me with a dramatic high-fall on bye suggesting that she had been mortally offended. Similarly thank you with a low rise may suggest that L2 speakers are casual and offhand when it is simply the normal polite intonation for saying thank you in their own language.
While there are many other attributes to national stereotypes, intonation certainly makes a contribution. To the English, for example, Germans are serious, Italians are excitable, reflecting how they would react to an English person who spoke with their characteristic intonation patterns: high pitch by Italians, low falls by Germans. An English newspaper finds, for instance, that Arnold Schwarzenegger speaks in a flat Austrian monotone. In these cases the listener is interpreting the intonation as conveying the emotion that an English speaker would intend rather than as the transfer of features from a different intonation system. Unfortunately native speakers of a language do not interpret a mistake in intonation as a foreigner's mistake but regard it as a deliberate attempt to convey a particular emotion. Despite having taught intonation, I only realised Good bye with an emphatic high fall on bye was an intonation mistake when the student said it every week.
For the most part, English intonation adds an overtone to a word rather than forming an integral part of it. High fall Yes is yes plus polite interest; Jones is Jones plus polite query; low rise aren't you is aren't you plus demand for agreement. The meaning of the word itself does not change with the intonation pattern; yes is still yes whether it has a high-rise or a low-fall.
The function of intonation in tone languages like Yoruba or Chinese is very different. In the Songjiang dialect of Chinese, for example, the syllable di can be pronounced with three different tones. di with a high-fall means low; di with a level tone means bottom; di with a high rise means emperor. The same sequence of sounds is three distinct words, depending on the tone used. The Chinese tone is an integral part of the word which distinguishes it from other words, rather than something that is tacked on to give a grammatical or emotional extra.
The fact that intonation in tone languages like Chinese shows differences between words rather than emotions and attitudes inevitably poses problems when its speakers come to learn a language such as English. Chinese learners may have difficulty in using intonation to convey emotional overtones in English and hence their English may sound unvarying in emotion. Going the other way, English speakers may easily say the wrong Chinese word if they use the wrong tone: a person who went to a shop and asked for li zi (rising tone) would get pears; li zi (fall rise) would get plums, and li zi (falling) chestnuts. [Note some Chinese students tell me this example doesn't work, though it was supplied by a Chinese teacher]
3. Producing the sounds of speech
This section outlines the complex ways in which speech sounds are made, starting with vowels and going on to consonants. As some parts are unavoidably dense, it can partly be used as a reference source for later chapters rather than read straight through.
Even in languages with a writing system based on sounds, the letters of the alphabet reflect the sounds of speech rather poorly. English has a single "th" spelling in thing and then but two different th sounds. It is therefore necessary to use a phonetic script that reflects the different sounds of speech more accurately than the usual written forms. So the English word thing is transcribed with a phonetic symbol for the th sound /θ/, plus one for the vowel /ɪ/, and one for the final nasal /ŋ/, i.e. /θɪŋ/. The word then, however, starts with the other th sound /ð/, followed by an /e/, followed by a different nasal /n/, i.e. /ðen/.
The symbols of phonetic transcription are usually distinguished from normal writing by being put within slanted lines or square brackets: /ki:/ versus key or /men/ versus men. Since the reference point here is mostly English, the phonetic symbols are those conventionally used by linguists for describing English, found with slight variation in most dictionaries and guides to English pronunciation. One minor difficulty is that British and American linguists have alternative symbols for some of the sounds of English, for example /ɑ/ versus /ɒ/ for the vowel of hot.
The symbols for English ultimately relate to the phonetic alphabet laid down by the International Phonetic Association (IPA), which was devised a hundred years ago to provide a means of writing down the sounds of all languages in a consistent fashion and has been revised many times since. The figures for the languages of the world in this chapter are based on those calculated by Ian Maddieson from a sample of 317 representative languages in the UCLA Phonological Segment Inventory Database (UPSID). The percentages for particular features refer to this sample rather than to all the languages known to exist.
As well as having smoothly-flowing air, a vowel involves voice produced by the vocal cords in the throat. These are flaps in the larynx which open and close rapidly during speech to let out puffs of air, producing a basic vibrating noise, in much the same way that a saxophone reed is vibrated by blowing through it. How fast the vocal cords vibrate affects the pitch of the sound; the individual vibrations can be felt in very slow speech. This sub-section deals with pure vowels which have a continuous sound; diphthongs in which the sound changes are described in later sections.
The two dimensions of tongue position
The sound produced by vibration is modified by the size and shape of the air spaces through which it then passes. A baritone saxophone produces a deeper note than a soprano saxophone because its internal air space is far larger. The characteristic sound can be changed within limits by making temporary adjustments to the permanent air space; saxophone players alter the length of the air space inside the saxophone tube with their fingering to change the note that comes out.
In the same way, speaking modifies the space inside the mouth by altering the position of the tongue. When the tongue is towards the front of the mouth, the empty space takes a particular overall shape, thus affecting the sound. The sounds produced with the tongue towards the front of the mouth are called front vowels, for instance /e/ (men) and /æ/ (man). When the tongue is moved towards the back of the mouth, a different shaped air space is created, producing back vowels, such as /u:/ (loot) or /ɒ/ (lot). In between the front and back positions of the tongue come central sounds, such as the /ɜ:/ vowel (bird). All vowels vary in a dimension from the front to the back of the mouth.
The height of the tongue also affects the air space in the mouth. The /ɪ/ in sit, the /e/ in set, and the /æ/ in sat are all front sounds but differ in height. When the tongue is raised towards the roof of the mouth, a high vowel is produced like /ɪ/ (sit) and /u:/ (room). When the tongue is lowered towards the bottom of the mouth, low vowels are produced such as /æ/ (sat). In between come mid vowels such as /e/ (Ben) or /ɜ:/ (firm). Much of the variation in vowels amounts to changes in the position of the tongue in the two dimensions from front to back and from high to low.
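The two dimensions can be treated as a small lookup table. The following is only a sketch: it covers just the vowels of sit, set, sat, bird and loot described above, and the names VOWELS and vowels_where are invented for the illustration rather than taken from any standard library.

```python
# Tongue position for the English vowels named in the text above.
# Each entry is (frontness, height); the labels follow the chapter's prose.
VOWELS = {
    "ɪ":  ("front", "high"),    # sit
    "e":  ("front", "mid"),     # set, men
    "æ":  ("front", "low"),     # sat, man
    "ɜ:": ("central", "mid"),   # bird, firm
    "u:": ("back", "high"),     # loot, room
}

def vowels_where(frontness=None, height=None):
    """Return the vowel symbols matching the requested tongue position."""
    return [
        symbol
        for symbol, (front, high) in VOWELS.items()
        if (frontness is None or front == frontness)
        and (height is None or high == height)
    ]

print(vowels_where(frontness="front"))   # the front vowels, high to low
print(vowels_where(height="high"))       # the high vowels, front and back
```

Asking for the front vowels picks out the vowels of sit, set and sat, which differ only in height; asking for the high vowels picks out those of sit and loot, which differ only in frontness.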
All vowels can be located somewhere within this two-dimensional space. Often, to make the description easier, the two dimensions are each divided into three areas and the front of the mouth is slanted to correspond better to the shape at the front of the mouth:
To make the locations within this space precise, the different points on the perimeter are assigned Cardinal vowels, rather like the points of the compass. The Cardinal vowels are theoretical rather than having an actual existence in any language. The most extreme close and front vowel that the human mouth could possibly produce is Cardinal [i], the most extreme close and back (and rounded) vowel Cardinal [u], the extreme front and open is Cardinal [a], and the extreme back and open is Cardinal [ɑ]. Other cardinal vowels are provided for each of the reference points on the perimeter and for rounded versus unrounded vowels. Thus any vowel in any language can be located with reference to this grid. English /i:/ bee for example is near to cardinal [i] while English /ɒ/ dog is at the back, a fraction above cardinal [ɑ]. The following diagram also gives the approximate positions of the RP /u:/ (moon), and /æ/ (pat). The RP pure vowels are given in full in the diagram in the box.
This diagram is as abstract as representing the solar system as rings round the Sun.
Counting pure vowels rather than diphthongs, eighteen languages in UPSID have only three vowels, mostly conforming to this triangular pattern of /i/, /u/, and /a/, for example Arabic, Greenland Inuit, and Dyirbal, an Australian Aboriginal language. The vast majority of languages incorporate these three vowels within their sound system; 91.5% have an /i/ sound, 88% an /a/, and 83.9% an /u/. People who were asked to distinguish /i/ from /u/ in an experiment only made two mistakes out of ten thousand attempts, suggesting that these two sounds are indeed as different as they could possibly be.
Other languages with a five vowel system of similar types are Japanese, Spanish, Zulu, and Basque.
Lip shape in vowels
Changing the shape of the lips is another way of modifying the sound that comes out. English front vowels like /i:/ (see) are made with unrounded lips, while back vowels like /u:/ (ooze) require the lips to be rounded. Though there is no logical reason why back vowels should be rounded and front vowels should not, front vowels are in fact unrounded in 94% of languages, back vowels rounded in 93.5%. Even by the age of four months, babies are able to tell that /i/ requires spread lips, and /u/ rounded lips.
The sounds of speech also differ in terms of how long they take to say, their 'length'. A long sound is indicated in phonetic script by a following colon ":". In a few languages consonants differ in terms of length. For example in Slovak vrch /vr̩x/ (summit) has a short /r/ but vŕšok /vr̩:ʃok/ (hill) has a long /r:/. In Finnish short /l/ as in /ʋeli/ (brother) differs from long /l:/ as in /ʋel:i/ (gruel).
More commonly, languages use length to distinguish vowels. The /i:/ of bean is long while the /ɪ/ of bin is short; the /u:/ of moon is long but the /ʊ/ of wood short. Length effectively doubles the number of potential vowels by having pairs of long and short at a particular point in the vowel space. A five vowel triangular system becomes a ten vowel system by using length as an additional factor, for example in Hawaiian.
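The doubling is simple arithmetic, sketched here in a few lines. The five vowel qualities chosen are an assumption for the illustration (the familiar i, e, a, o, u set), and the colon is the length marker used in this chapter.

```python
from itertools import product

# Assumed five-vowel triangular system; the symbols are illustrative.
BASE_VOWELS = ["i", "e", "a", "o", "u"]

# Pair every vowel quality with a short ("") and a long (":") version.
inventory = [vowel + mark for vowel, mark in product(BASE_VOWELS, ["", ":"])]

print(inventory)
print(len(inventory))   # five qualities times two lengths
```

Each quality appears twice, once short and once long, so five vowel positions yield ten contrasting vowels without any new tongue position being needed.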
Other factors are often tied in with length. In the long /i:/ of English beat, the tongue position is also slightly higher and fronter than in its short counterpart, the /ɪ/ of bit, and the muscles of the lips and tongue are slightly more tense. Similar slight differences are found in other long/short pairs such as the long tense /ɔ:/ in dawn versus the short relaxed /ɒ/ in don, one reason why different symbols are used for the two vowels /i:/ beat and /ɪ/ bit as well as the length marker ":". That is to say, long vowels tend also to be said more tensely than short vowels, and to have slightly different tongue positions.
Diphthongs are a type of vowel in which the tongue moves from one vowel position to another while the vowel is being produced. The vowel sound is not the same at the beginning as at the end. The method of describing diphthongs is to state their starting point and the destination towards which they move (but do not necessarily reach).
In the English /ɔɪ/ of toy the starting point of the tongue is the back mid position, the destination the front high /ɪ/ position. In the English /əʊ/ of go, however, the tongue starts centrally and moves back and up towards the /ʊ/ position.
Because diphthongs involve movement, it is impossible to produce them continuously; the listener ends up hearing only the second vowel. RP has seven or eight diphthongs, depending on whether a speaker pronounces words like poor with a diphthong /ʊə/ or with the same /ɔ:/ sound as in paw.
While the figures for diphthongs in the world's languages are not very certain, the commonest seem to be /ei/ and /au/, the rarest /XX/. The language with the highest number is !Xu with no fewer than 22.
English Vowels around the World
The following chart comparing the RP vowels with other accents of English is based on John Wells, Accents of English. It gives the pronunciation for various test words having pure vowels in the different accents; it disguises many differences in pronunciation, particularly the effects of /r/, to be discussed in Chapter Seven.
Consonants differ from vowels because the lips or tongue disturb the stream of air rather than letting it flow out smoothly. Since they are produced by obstructing the air, this section describes where the obstruction is (lips, teeth, and so on), what forms the obstruction (tongue, lips etc), and the manner in which it is made. In common with vowels, consonants may, but do not have to, use voice from the vocal cords and may be said with tense or relaxed muscles.
One method of producing a consonant is to interrupt the flow of air from the mouth by blocking it for a brief moment. Consonants such as /b/ in brain and /k/ in crane are known as stop or plosive consonants because the flow of air is stopped and then explodes abruptly out from behind the obstruction. English plosive consonants block the air at three different places, shown in the following diagram of the mouth.
Fig. 2. Reference points for the production of consonants
When the air is temporarily blocked by both lips, the consonants produced are the voiced /b/ (lab) or the voiceless /p/ (lap). When it is the tip of the tongue that blocks the air by contacting the ridge behind the teeth (the alveolar ridge), the consonants are the voiced /d/ (dime) or voiceless /t/ (time). Placing the back of the tongue against the back of the roof of the mouth (the soft palate or velum) produces the voiced sound /g/ (get) or the voiceless /k/ (kid).
Fricatives and other consonants
The second major method of producing consonants is to let the air escape through a narrow gap rather than blocking it completely, producing fricative sounds. English fricatives involve three of the same places of contact, the lips and teeth /v/ (live) and /f/ (life), the back of the teeth /ð/ (this) and /θ/ (thick), and the teeth ridge /z/ (rise) and /s/ (sip). In addition the fricatives /ʒ/ (garage) and /ʃ/ (fish) use the teeth ridge, but differ from /z/ and /s/ in that the tongue is further back and lets the air escape over a larger area. Because of their distinctive hissing noise, these four are sometimes grouped together as sibilants.
Vowels too can be nasalised if some air escapes through the nose at the same time as through the mouth. In Bengali, the triangular seven-vowel system is doubled at each position by a nasalised counterpart. Nasalisation occurs occasionally in isolated words of English, such as the final nasalised vowel in my own pronunciation of restaurant. In French such vowels are far more frequent. Syllable-final /n/ was lost in many French words but replaced by nasalisation of the final vowel of the syllable: fin (end), son (his), rien (nothing).
So it is not just the English plosives that make use of the same four points but also the fricatives and nasals. All the points of contact have two or more of the sounds that are possible, divided into pairs of voiced and voiceless, apart from the nasals.
The four columns of the diagram conceal further possible sounds that do not happen to occur in English. English /f/ and /v/ involve the bottom lip and the upper teeth (labiodental) rather than both lips together (bilabial). The English fricatives /f/ and /v/ do not match the lip plosives /p/ and /b/ in the same way that the fricatives /s/ and /z/ match the plosives /t/ and /d/. So, not surprisingly, some languages have bilabial fricatives. Greek for instance has a voiced fricative /β/ involving both lips in words like biblio (book).
The other possibility concealed by the English chart is the existence of a fifth column for sounds made on the roof of the mouth itself, the palate. Many languages have such palatal consonants. Greek has a palatal fricative /ç/ in words like /stiço/ (ghost). French has a palatal nasal /ɲ/ in /aɲo/ agneau (lamb). Irish has both a fricative palatal /ç/ in oíche (night) and a nasal palatal in words such as /aɲir/ Ainir (maiden).
Putting all these together, English has 24 consonants, close to the average 22.8 for a language. The proportion of consonants to vowels is 1.27 to one, slightly low compared to the world average of 2.5 to one; that is to say, proportionately English has rather more vowels than the average.
4. Air and speech
So far this chapter has taken it for granted that the air for speech is produced by the lungs breathing out. In order for speech to be regular, this lung air has to come out at a fairly constant pressure, regulated by complex muscles in the diaphragm and the rib-cage. Otherwise speech would be high in pitch just after speakers breathe in and would tail away to low pitch as they go on; this effect can easily be seen if you try to read a long sentence aloud on one breath.
No languages seem to use indrawn breath for normal speech. There are, however, occasions when it is used to disguise the speaker's voice. Suitors serenading their loves are said to use indrawn air in some parts of the Philippines and in German-speaking Switzerland in order to preserve their anonymity. English has a minor non-speech use of indrawn breath in the sound made when one burns oneself accidentally.
The lungs are not the only source of moving air. Southern African languages such as Nama and Zulu use sounds called clicks produced by the tongue sucking air into the mouth; they will be familiar to listeners to music from this region such as the Xhosa click songs of Miriam Makeba. English has some marginal non-speech clicks, for instance the giddyup noise made to horses or the tut-tut noise of disapproval.
The point of a tongue-twister is to confuse the language system in the mind by repeating related sounds over and over again.
Mrs Pipple Popple popped a pebble in poor Polly Pepper's eye.
Charlie chooses cheese and cherries.
Old oily Ollie oils oily automobiles.
He ran from the Indies to the Andes in his undies.
Rubber baby buggy bumpers.
Shave a cedar shingle thin.
This thistle seems like that thistle.
Unique New York!
Miss Ruth's red roof thatch.
Any noise annoys an oyster but a noisy noise annoys an oyster most.
Tongue-twisters in different languages:
Tres tristes tigres trillaron trigo en un trigal. (Spanish: Three sad tigers threshed wheat in a wheat field)
Nama-mugi, nama-gome, nama-tamago. (Japanese: raw wheat, raw rice, raw eggs)
Le ver vert va vers le verre vert. (French: the green grub goes to the green glass)
Nie pieprz wieprza pieprzem. (Polish: do not pepper the hog with pepper)
Un limon, mezzo limon (Italian: one lemon, half a lemon)
Surrealistic aphorisms by Marcel Duchamp
Abominable fourrures abdominales. (abominable abdominal furs)
My niece is cold because my knees are cold.
Étrangler l'étranger. (strangle the stranger)
Examples invented for a competition by nine-year-old children in Ardleigh, a village in Essex:
The stranger strangles Susey with some long stretchy string. Tongue twisters give me blisters.
Bob and Bill brought bits.
My monkey mistakes my mum's messy mixture for a monkey.
Trees with green leaves.
Clearly not all the children have understood how a tongue-twister works.
Timing of voice in consonants
Voice has been treated thus far in a simplified fashion as either on or off: either the vocal cords vibrate, as in // (ship), or they do not, as in /t/ (tip). A crucial factor, however, is the moment at which voicing starts during the production of the consonant.
Overlap of English /g/ and Spanish /k/
Consequently Spanish does not have the wide tolerance in voice onset time (VOT) for /g/ allowed in English. An English person may take a Spanish /k/ to be a /g/; a Spanish person may take some English /g/s to be /k/s, seen in the overlap in the following diagram. The two languages both use voice to distinguish sounds but they use it differently, just as a Hong Kong dollar is worth less than a Singapore dollar. The languages have settled on different ways of making the voice distinction.
This voiceless burst of air is in a sense accidental in English. A /p/ before a vowel will have an aspirated puff of air after it, as in pit; a /p/ following /s/ may not, as in spit. Using too much aspiration or too little will not interfere with the meaning of the sentence. However, in languages such as Hindi there are two different sounds, one with, one without aspiration: /pʰəl/ (fruit) and /pəl/ (moment). Potentially there is a three-way distinction between early VOT, 0 VOT, and late VOT, rather than the two-way choice of late and early VOT found in Spanish. Thai and Burmese also have three distinct sounds at the dental position.
The idea that speakers of a language divide up sounds into either/or distinctions leads to a general characteristic of languages: speakers perceive speech sounds as distinct categories rather than as continuous variation. A sound is either a /g/ or a /k/, never something in between. Experiments have tested how people perceive synthesised sounds that gradually increase in VOT. It is not that there are two areas within which people are certain of which sound is involved and a grey area in the middle where they are uncertain. Instead they are committed to one sound up to the particular point at which they switch to the other, even if they differ over the location of this point. Though VOT is a continuous scale, it has a cut-off point where the listener has an either/or choice. One characteristic of human beings seems to be that they cannot hear intermediate types of speech sounds but force them into one or other of the categories of the language. This ability is called categorical perception, that is to say, perceiving sounds as discrete categories rather than as a continuous variation. Like a piano defining 88 notes from a vast range, the human mind perceives sounds as separate items.
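The either/or behaviour described above can be pictured with a small sketch in Python. It is only an illustration: the boundary values are hypothetical round numbers chosen to reproduce the mismatch between English and Spanish listeners, not measurements, and the names BOUNDARY_MS and perceive_velar are invented for the example.

```python
# Hypothetical category boundaries in milliseconds of VOT.
# These are illustrative assumptions, not experimental values.
BOUNDARY_MS = {"english": 30.0, "spanish": 0.0}


def perceive_velar(vot_ms, language):
    """Return the category a listener of this language reports.

    The answer flips at a single cut-off point: there is no
    intermediate category between /g/ and /k/.
    """
    boundary = BOUNDARY_MS[language]
    return "/k/" if vot_ms >= boundary else "/g/"


# A synthesised continuum running from early to late voicing.
for vot in range(-60, 81, 20):
    print(vot, perceive_velar(vot, "english"), perceive_velar(vot, "spanish"))
```

On these assumed boundaries a sound with a VOT of 20 milliseconds is a /g/ to the English listener and a /k/ to the Spanish listener, although the sound itself is the same.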
5. Combining sounds into syllables
Suppose you had to decide on a name for a new washing powder. A computer produces a list of possible names: Mrah, Bliff, Bnill. Which would you choose?
Though each of the sounds in these words is English, only one of the words is in fact possible, namely Bliff. For this is the only one of the three that conforms to the rules for making English syllables. The differences between the words are not in the actual sounds, all of which are possible in English, but in how they are combined. The crucial element for combining sounds together is the syllable. This section looks at the ways in which syllables are constructed, which vary from one language to another. Syllables have centres, usually vowels; they have beginnings and endings, usually consonants. Sounds vary between the more vowel-like ones that occur in the centre of the syllable, where the airstream is least obstructed, and the more consonant-like ones that occur at the beginning or end, where it is most obstructed; this scale is technically called the sonority hierarchy.
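The washing powder test can be sketched as a few lines of Python. This is a rough illustration under stated assumptions: it works on spelling rather than sounds, the list SAMPLE_ONSETS is a small and deliberately incomplete sample of English two-consonant beginnings, and the function names are invented for the example.

```python
VOWEL_LETTERS = set("aeiou")

# A small, incomplete sample of two-consonant syllable beginnings
# found in English (an assumption made for this sketch).
SAMPLE_ONSETS = {
    "bl", "br", "cl", "cr", "dr", "fl", "fr", "gl", "gr",
    "pl", "pr", "sl", "sm", "sn", "sp", "st", "tr",
}


def initial_cluster(word):
    """Return the consonant letters that come before the first vowel letter."""
    word = word.lower()
    for i, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[:i]
    return word


def possible_onset(word):
    """Check the start of the word against the sample of allowed clusters."""
    cluster = initial_cluster(word)
    # Words starting with a vowel or a single consonant letter are let through;
    # longer clusters must appear in the sample list.
    return len(cluster) <= 1 or cluster in SAMPLE_ONSETS


for name in ["Mrah", "Bliff", "Bnill"]:
    print(name, possible_onset(name))
```

The point of the sketch is that the check never asks whether m, r, b or n are English sounds; it only asks whether the combination at the start of the syllable is allowed.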
6. Sounds and phonemes
The vague word sound has been used up to this point to talk about speech. But there are different levels of speech sound. One level is the actual description of the sounds themselves as sheer sound, studied in the branch of linguistics called phonetics. The next level is the sound system of a particular language. English or French or Japanese use a small selection of the sounds possible in human languages, the subject matter of phonology. The present section then looks at this next level, namely how particular languages use sounds within their own phonological systems.
7. Alternatives to speech sounds
Spoken sounds are only one of the means through which language can be expressed. There are forms of language that do not involve sounds produced by the vocal organs. The most obvious is written language, whether using an alphabet based on sounds or a character system based on meanings, as seen in Chapter Five. In Zaire, however, there are drum languages in which the sounds are conveyed on a wooden drum called a boungu tuned to give two notes, Low (male) and High (female), when hit on different sides. Any word can be converted into a sequence of High and Low notes, rather like the Long and Short of Morse Code, and broadcast for up to seven miles on a still night. Thus in Kele a word such as sango (father) is a sequence of two High notes HH; nyango (mother) is a Low followed by a High LH; and wana (child) is a Low followed by a High. To arrive at the drum expression for orphan means adding some grammatical words:
English: child has no father nor mother
Kele: wana ati la sango la nyango
Drums: H L L H L H H L L H
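The conversion from words to drum notes can be sketched in Python. This is a minimal illustration, not a description of how drummers work: the table TONE_PATTERNS holds only sango and nyango, whose patterns are given above, the two-word phrase is made up simply to show the mechanism, and the function name is invented for the example.

```python
# Tone patterns for two Kele words as given in the text.
# Any other word would need its pattern looked up in the source.
TONE_PATTERNS = {"sango": "HH", "nyango": "LH"}


def to_drum_notes(phrase):
    """Replace each word with its sequence of High and Low notes."""
    notes = []
    for word in phrase.split():
        if word not in TONE_PATTERNS:
            raise KeyError(f"no tone pattern recorded for {word!r}")
        notes.extend(TONE_PATTERNS[word])
    return " ".join(notes)


print(to_drum_notes("sango nyango"))  # H H L H
```

Only the tone pattern survives the conversion; the consonants and vowels of the words are not transmitted at all.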
A further alternative to speech sounds is whistling, which is used to communicate over distances of up to 5 kilometres across thinly populated country, for example by shepherds or by hunters, in parts of the globe ranging from Mexico to Burma to the Canaries. Whistle languages do not convert speech sounds to high and low notes, but substitute particular notes for each vowel, with consonants given by transitions between the vowels. Both drumming and whistling convert spoken language into a different medium rather than being an independent form in their own right. In other words, they are like Morse code or shorthand in being parasitic on spoken language.
A true alternative to speech is, however, found in the languages used by the deaf, which involve gestures rather than sounds and are capable of communicating ideas as complex, through structures as complex, as any other human language. Take two signs from British Sign Language (BSL). The sign for woman is the index finger of the right hand stroking the right cheek; the sign for England is the two hands in front of the chest with the two index fingers stretched out horizontally moving to and fro, from left to right. These gestures are just as difficult to describe in words as the sounds of speech. For the gestures of deaf language are organised in the same way as the sounds of speech. Just as the organ making the speech sounds, such as the tongue, needs to be specified, so does the shape of the hand, with 51 different handshapes possible in BSL. Then, as for plosives and fricatives and diphthongs, the types of movement need to be described, some 37 for BSL. As with the vowel space inside the mouth, the location where the sign is made needs to be specified, including in BSL nine positions on the face and four on the neck and trunk. Sometimes the same sign has different meanings if produced at a different level, just as a /p/ is different from a /k/.
Thus sign language has all the normal possibilities of the phonological system of human languages. Sign languages should not then be confused with natural gesture systems based on mime. Many deaf language signs may have originated in natural gestures: the BSL sign for bird is the finger and thumb of the right hand opening and closing at nose level, clearly representing a beak. Most signs have, however, become purely arbitrary; the sign for England mentioned above, for example, is a remote descendant of a finger-spelling sign rather than any recognisable shape. Sometimes fanciful origins for signs have been devised. The BSL cheek-stroking sign for woman has been explained variously as curls on a woman's cheek, bonnet strings, and soft cheek. Yet a hundred years ago the sign was stroking the lips, showing that none of these explanations can be right.
While there may be some visual links between some signs and what they mean, these are not much closer than those between natural sounds and the sounds of speech. Indeed otherwise there would not be large differences between the different sign languages of the world, whether Chinese Sign Language, British Sign Language, or French Sign Language. Even within a single country such as France or England there are strong dialect differences. Sign users from different regions may not understand each other completely. Deaf members of a theatre audience in Manchester, for example, complained that they could not understand the BSL interpreter of a play because he was not using the signs current in that city.
This chapter has then shown that the sound system of a language consists on the one hand of particular intonation patterns, on the other of a certain number of phonemes. The actual sounds are limited by what the organs of speech can do and by universal factors such as distinctive features and sonority. Even when languages have the same sounds, they use them in specific ways according to their own systems.
It is the meaningful contrasts between the sounds that are important - High Rise John versus High Fall John, or got versus cot - not the sheer sounds themselves.
Sounds and meanings
A collection of articles on sound symbolism is: Hinton, L., Nichols, J. & Ohala, J. (1994), Sound Symbolism, CUP, which has a paper by John Ohala putting forward the Frequency Code.
Intonation and its functions
A standard introduction to intonation is: Cruttenden, A. (1986), Intonation, CUP. English intonation in particular is covered in: O'Connor, J.D., & Arnold, G.F. (1973), Intonation of Colloquial English, Longman, 2nd Edition. Rises in New Zealand intonation are described in: Britain, D., & Newman, J. (1992), High rising terminals in New Zealand English, Journal of the International Phonetic Association, 22, 1-11.
Air and speech
The original article on VOT is: Liberman, A.M., Cooper, F.S., Shankweiler, D.S. & Studdert-Kennedy, M. (1967), Perception of the speech code, Psychological Review, 74, 431-461.
Combining sounds into syllables
Epenthetic vowels in L2 learners are discussed in: Broselow, E. (1988), Prosodic Phonology and the Acquisition of a Second Language, in S. Flynn and W. O'Neil (eds.), Linguistic Theory in Second Language Acquisition, Kluwer, Dordrecht.
Sounds and phonemes
Estuary English is described in a popular book: Coggle, P. (1993), Do You Speak Estuary?, Bloomsbury, London. English pronunciation exercises can be found in: Baker, A. (1981), Ship or Sheep?, CUP; Hill, L.A. (1961), Drills and Tests in the English Sounds, Longman.
Alternatives to speech sounds
The source for drum language is: Carrington, J. (1947), Talking Drums of Africa, London. British Sign Language is described in: Woll, B., Kyle, J. and Deuchar, M. (199X), Perspectives on British Sign Language and Deafness, CUP; Kyle, J.G., and Woll, B. (1985), Sign Language, CUP. Hand gestures themselves are covered in: McNeill, D. (1992), Hand and Mind, University of Chicago. Whistle languages can be found in: Busnel, R.G., and Classe, A. (1976), Whistled Languages, Berlin, Springer; Thomas, A. (1995), Whistled languages, e-mail summary THOMAS@arts.uoguelph.ca.