Common English words
A crucial aspect of much study of words is their frequency: how often they occur. Often the question is not whether a word exists but how often it is used; immunosurveillance may be in the OED, but it does not occur once in the BNC. Modern computers have made establishing word frequency quite easy, as this example shows. The technique is to assemble a ‘corpus’ of English texts from books and other sources, now usually running to hundreds of millions of words, and then to search it for occurrences of a word or phrase using a program called a concordancer, which usually shows a span of words before and after the target word. Corpora range from the British National Corpus (BNC) to the Collins Birmingham University International Language Database (COBUILD), but it is easy to construct your own and to search it with an easily obtainable concordancer such as WordSmith. Even Google can be used: feed in immunosurveillance and it lists 58,900 pages; feed in phone and it lists 933 million. Of course this counts only pages, not the words themselves, as a word may be used many times on a single page.
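The counting and searching described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular concordancer works: the two-sentence corpus is made up, and the function names (`tokenize`, `word_frequencies`, `pages_containing`, `concordance`) are hypothetical. It also shows the difference between counting occurrences of a word and counting the texts ("pages") that contain it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_frequencies(texts):
    """Count how often each word occurs across all the texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def pages_containing(texts, target):
    """Count the texts that contain the target at least once."""
    return sum(1 for text in texts if target in tokenize(text))


def concordance(texts, target, window=3):
    """Return each hit as (left context, word, right context)."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == target:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits


# A tiny made-up corpus, purely for illustration.
corpus = [
    "The cat sat on the mat and the dog sat by the door.",
    "I saw the cat in the garden, and it saw me.",
]

freqs = word_frequencies(corpus)
print("occurrences of 'the':", freqs["the"])
print("texts containing 'the':", pages_containing(corpus, "the"))

# Print each occurrence of 'cat' with two words of context on either side.
for left, word, right in concordance(corpus, "cat", window=2):
    print(f"{left:>12} [{word}] {right}")
```

A real corpus would be read from files rather than typed in, and a real concordancer offers far more (phrase search, sorting by context, collocation statistics), but the principle is the same.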
One interesting finding is that different sources differ very little over which words are most frequent. The following list compares the most frequent words from the BNC (a wide-ranging source), from the writing of seven-year-old children, from the narrative parts of Jane Austen’s novels and from Japanese learners of English.
| | BNC | 7-year-olds’ writing | Jane Austen | Japanese learners |
|---|---|---|---|---|
| 1. | the | and | the | I |
| 2. | of | the | to | to |
| 3. | and | a | and | the |
| 4. | a | I | of | you |
| 5. | in | to | a | and |
| 6. | to | was | her | a |
| 7. | it | it | I | my |
| 8. | is | he | was | in |
| 9. | was | we | in | it |
| 10. | I | in | it | for |
As can be seen, there is very little difference between these: the is in the top three for all of them; and, a, to, in, I and it are in all the lists, and was is in all but one. Whoever you are, whatever you are writing about, you are going to be using the same highly frequent words. Yet your own guesses at the three commonest words in English were probably not all structure words. Structure words like of and the glue the nouns and verbs of English together (see Content and Structure Words). The top 100 words are all structure words bar four: time, people, new and way.
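The overlap between the four lists can be checked mechanically. The sketch below simply copies the four top-ten columns from the table above into a dictionary (the variable names are hypothetical) and counts how many of the lists each word appears in.

```python
from collections import Counter

# The four top-ten lists, copied from the table above.
top_ten = {
    "BNC": ["the", "of", "and", "a", "in", "to", "it", "is", "was", "I"],
    "7-year-olds' writing": ["and", "the", "a", "I", "to", "was", "it", "he", "we", "in"],
    "Jane Austen": ["the", "to", "and", "of", "a", "her", "I", "was", "in", "it"],
    "Japanese learners": ["I", "to", "the", "you", "and", "a", "my", "in", "it", "for"],
}

# Number of lists each word appears in (each word occurs once per list).
list_counts = Counter(word for words in top_ten.values() for word in words)

# Words shared by every list, and words missing from exactly one list.
in_all = sorted((w for w, n in list_counts.items() if n == len(top_ten)), key=str.lower)
in_all_but_one = sorted((w for w, n in list_counts.items() if n == len(top_ten) - 1), key=str.lower)

print("in all lists:", in_all)
print("in all but one:", in_all_but_one)
```

The same counting works for any number of lists, so further sources could be added to the dictionary without changing the rest of the code.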
Here are the most common content words from the BNC:
| | Nouns | Verbs | Adjectives |
|---|---|---|---|
| 1. | time | say | new |
| 2. | people | know | good |
| 3. | way | get | old |
| 4. | year | go | different |
| 5. | government | see | local |
| 6. | day | make | small |
| 7. | man | think | great |
| 8. | world | take | social |
| 9. | work | come | important |
| 10. | life | use | national |
Again it is unlikely that you had all of these right. Off-the-cuff guesses might include man and day, but who would have guessed government and world? These frequent words are quite different from those used in, say, teaching English to non-native speakers, which tends to start from concrete, visualisable words like train and banana rather than abstractions like year or work.
Main source: British National Corpus (BNC)
Formerly at homepage.ntlworld.com/vivian.c, which is defunct