What is a word?

It may seem perfectly obvious what a word is. ‘beer’ is a word, ‘reeb’ is not. ‘beer’ has a meaning, ‘reeb’ does not. So a word is a sequence of letters (or sounds) with meaning. Is that all there is to it?

For over a hundred years, researchers looking at language meaning have used an element called a morpheme – the smallest element of a sentence that has a meaning. Sometimes this may be a word: ‘beer’ is a morpheme as it has a meaning, ‘reeb’ is not as it has no meaning. Sometimes a morpheme may be smaller than a word: ‘beers’ and ‘beery’ have different meanings from ‘beer’, so the elements ‘s’ and ‘y’ are also morphemes.

 - units able to occur on their own

What makes ‘beer’ a word is partly that it can stand on its own – ‘What’ll you have?’ ‘Beer’ – but ‘s’ or ‘y’ have to be added to something else, ‘beers/beery’, ‘flour/floury’ etc. Nobody could ask a question to which ‘s’ or ‘y’ is the answer (except of course in technical discussions of language). One working definition of a word is then the smallest unit of language that can be said on its own – the minimal free form. Since ‘s’ and ‘y’ are bound to a preceding word, just like ‘-ish’ in ‘beerish’ or ‘-ly’ in ‘beerily’, they are bound morphemes, not words. If you were compiling a dictionary, it would, however, be highly laborious to hunt for occasions on which every word in the dictionary had occurred on its own. In practice this definition is not terribly useful. And it does not cover the important category of function words like ‘of’ and ‘my’, which can hardly be said to occur on their own yet which we feel are words.

 - units that can’t be split up

In English you can add to the beginning of a word, ‘happy>unhappy’, or to the end, ‘happy>happiness’, but you can’t usually add anything to the middle; a word is uninterruptible. The only exceptions are certain exclamatory remarks such as ‘absobloominglutely’, where ‘blooming’ has appeared in the middle of ‘absolutely’ – add your favourite swear-word inside ‘fan…….tastic’ and ‘in……credible’. Doug Coupland coined some words in this way such as ‘emallgration’; Homer Simpson talks of a ‘saxamophone’. See infixes for more examples.

 - items listed in a dictionary

A word is also something that can be listed in a dictionary: you can look up ‘beer’ but you can’t look up ‘s’ (though it may depend on the dictionary, some of which will certainly have an entry for the meanings of ‘-y’). When we talk about the words of a language, it is usually this list in a dictionary that we have in mind.

There are still problems. Dictionaries actually have ‘entries’ rather than words. ‘man’ is one entry, hence it is one word. But this does not mean that it has one meaning; the Oxford English Dictionary (OED) has seventeen main meanings for ‘man’ as a noun, including the expected ‘a human being (irrespective of sex or age)’ but also ‘one of the pieces used in chess’ and ‘a cairn or pile of stones marking a summit or prominent point of a mountain’. If the principle is one word = one meaning, how many words ‘man’ are there? Sometimes people treat these multiple meanings as ‘homonyms’, different words with the same sounds or letters that have different meanings, like ‘bank’ (of a river) versus ‘bank’ (type of business). Sometimes they see them as extensions from a ‘central meaning’: in some subtle way the meaning of ‘human being’ extends to anything that looks like a human being, such as a cairn of stones. So the list of separate entries in the dictionary may only be a rough guide to the number of words.

 - chunks divided by spaces or sounds

A word is also a chunk of language that can be chopped out from a sentence. It is of course possible to speak with clear pauses between the words: ‘We – shall – overcome’. But normally this style is reserved for beginning readers and Daleks. If you listen to someone speaking, it is hard to detect pauses in between the words – ‘weshallovercome’; listen to a foreign language and you probably have no idea where words begin and end. In speech, pauses are used to show grammatical divisions, hesitations and so on, rarely divisions between words.

Writing of course is very different. Words stand out because they have spaces in between them; we know that there are three words in ‘We shall overcome’ because there are two spaces: a word is ‘a sequence of letters without any spaces’. This definition works admirably for writing and distinguishes between words and bound morphemes which are not separated by spaces. Probably this is the working definition most people use: if it has a space before and a space after, it’s a word.
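The space-based definition is simple enough to state as a few lines of Python. This is only a sketch of the working definition above, not a serious tokenizer; the function name `written_words` and the list of punctuation marks stripped from the edges are assumptions made for the illustration.

```python
def written_words(text):
    """Split text into 'words' in the space-based sense:
    sequences of characters with no whitespace inside them."""
    chunks = text.split()  # split on any run of whitespace
    # Strip punctuation from the edges so that 'overcome.' counts as 'overcome'.
    stripped = [chunk.strip(".,;:!?'\"()‘’") for chunk in chunks]
    return [word for word in stripped if word]


print(written_words("We shall overcome"))  # ['We', 'shall', 'overcome']
print(written_words("weshallovercome"))    # ['weshallovercome']
```

Two spaces give three words, while bound morphemes such as the ‘s’ of ‘beers’ are never split off. Take the spaces away, as in the second call, and the definition can find only one ‘word’.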

The snag is that the convention of putting spaces between words only came in around the 8th century AD; English had got along without it for some time already. Nor does it apply to other languages: Chinese has spaces between characters, not between words; Thai and Inuit are written in alphabetic scripts that do not separate words with spaces. If words need spaces, then none of these languages have words. Nor can this definition apply to the spoken language, where the equivalent pauses are potential rather than invariable; defining the units of the spoken language through units of the written language is, according to most, putting the cart before the horse. Pauses may indeed potentially isolate words, but it would take a lifetime of listening to catch the appropriate pauses for each word in the language.

Having sorted out where words start and end, the other problem, even within English, is what counts as a word. ‘carpet’ is certainly a word and fits all the above definitions. So is ‘carpets’ a different word? Is ‘to carpet someone’ a different word? A ‘carpeter’? Are ‘carpeting’ and ‘carpeted’ still different words? But what about ‘recarpet’ and ‘uncarpeted’? ‘carpet-bag’ and ‘red-carpeted’? If we are counting the number of occurrences of ‘carpet’ in English texts, do we have to add up all of these nouns, verbs and adjectives? Do we draw the line at compounds like ‘red-carpet’? Or are we even stricter and exclude forms that have suffixes like ‘carpeter’ or prefixes like ‘recarpet’? Indeed I thought I had made these two forms up – the spell-checker in Word rejected them – till I checked on Google and found 9,000 web-pages with ‘carpeter’ and 20,000 for ‘recarpet’. This is then why it is so difficult to set a figure on how many words someone knows or how many words there are in a language: it all depends on what you count as a word.
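How much the answer depends on the counting rule can be seen in a small Python sketch. The sample sentence, the three counting policies and all the names here are invented for the illustration; they are not an established method, just one way of making the choices above explicit.

```python
import re

# Invented sample text containing several 'carpet' forms.
TEXT = ("A carpeter laid the carpet. Two carpets later, the red-carpeted "
        "hall was recarpeted; the stairs stayed uncarpeted.")

# Lower-case the text and keep hyphenated compounds together as one token.
TOKENS = re.findall(r"[a-z]+(?:-[a-z]+)*", TEXT.lower())

# Assumed list of inflected forms of 'carpet' (noun and verb).
INFLECTED = {"carpet", "carpets", "carpeted", "carpeting"}


def count_carpet(tokens):
    """Count 'carpet' under three increasingly generous policies."""
    return {
        # Strictest: only the bare form itself.
        "exact": sum(token == "carpet" for token in tokens),
        # Bare form plus its inflections, but no affixes or compounds.
        "inflected": sum(token in INFLECTED for token in tokens),
        # Most generous: any token containing 'carpet' at all.
        "any_form": sum("carpet" in token for token in tokens),
    }


print(count_carpet(TOKENS))
# {'exact': 1, 'inflected': 2, 'any_form': 6}
```

The same seventeen-token text yields one, two or six occurrences of ‘carpet’, depending only on where the line is drawn.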

One solution is not to count words but word families. A word family is defined by Paul Nation as ‘a headword, its inflected forms, and its closely related derived forms’. So included with the headword ‘book’ as a noun are the inflected form ‘books’ and the derived form ‘bookish’; ‘book’ as a verb has the inflected forms ‘books’, ‘booked’, ‘booking’ and the derived forms ‘bookable’ and ‘booker’. So there are two ‘book’ families with a limited number of relations.
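The two ‘book’ families can be written down as a small data structure. The sketch below is a hypothetical Python representation of Nation’s definition; the class `WordFamily` and the helper `families_of` are invented for the illustration, and the membership lists contain only the forms mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class WordFamily:
    """A headword with its inflected and closely related derived forms."""
    headword: str
    part_of_speech: str
    inflected: set = field(default_factory=set)
    derived: set = field(default_factory=set)

    def members(self):
        """Every form that counts as belonging to this family."""
        return {self.headword} | self.inflected | self.derived


BOOK_NOUN = WordFamily("book", "noun", {"books"}, {"bookish"})
BOOK_VERB = WordFamily("book", "verb",
                       {"books", "booked", "booking"},
                       {"bookable", "booker"})


def families_of(form, families):
    """Return labels of the families that list the given form."""
    return [f"{fam.headword} ({fam.part_of_speech})"
            for fam in families if form in fam.members()]


FAMILIES = [BOOK_NOUN, BOOK_VERB]
print(families_of("books", FAMILIES))    # ['book (noun)', 'book (verb)']
print(families_of("bookish", FAMILIES))  # ['book (noun)']
print(families_of("booklet", FAMILIES))  # []
```

A form such as ‘books’ falls into both families, while a form that nobody has listed belongs to none: the membership lists have to be decided by hand.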

But is a ‘booklet’ part of the noun family ‘book’? Is a ‘bookie’ part of the verb family ‘book’? Each person comes to a different decision, and that is before you even start thinking about whether the elements in a word family have to share a common meaning: is the book which is a ‘printed treatise’ the same family as the book that is ‘a telephone directory’, or ‘words to which a musical is set’, or ‘the total of charges against a person’, or ‘a record of bets’? And the meaning affects which of the related forms can be used: ‘bookish’ can’t be used for musicals; ‘bookies’ take bets; ‘bookers’ book tickets, etc.

