Frequencies for English Punctuation Marks

Vivian Cook

These figures are based on a writing system corpus some 459 thousand words long. It comprises three novels of different types (276 thousand words), selections of articles from two newspapers (55 thousand), one bureaucratic report (94 thousand), and assorted academic papers on language topics (34 thousand). More information is in Cook, V.J. (2013), ‘Standard punctuation and the punctuation of the street’, in M. Pawlak and L. Aronin (eds.), Essential Topics in Applied Linguistics and Multilingualism, Springer International Publishing Switzerland, 267-290.

Score per 1000 running words

Mark                               Average
. full stop                           65.3
, comma                               61.6
; semi-colon                           3.2
: colon                                3.4
! exclamation                          3.3
? question                             5.6
’ apostrophe/single quotation         24.3
“ double quotation                    26.7
- hyphen                              15.3
TOTAL                                208.7
Score = number of occurrences divided by the total number of words in thousands; .05 is rounded up.
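
The score calculation can be sketched in Python as follows. This is only a minimal illustration, assuming the note means the raw count divided by the corpus size in thousands of words, given to one decimal place with halves rounded up. The function name `per_thousand` is hypothetical, and the example figures are the Pickwick full-stop and comma counts from the table of text types on this page.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_thousand(count: int, total_words: int) -> Decimal:
    """Occurrences per 1000 running words, to one decimal place.

    Assumption: halves are rounded up (one reading of ".05 rounded up").
    """
    thousands = Decimal(total_words) / Decimal(1000)
    score = Decimal(count) / thousands
    return score.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Pickwick: 19,364 full stops and 32,618 commas in 296,000 words.
full_stop_score = per_thousand(19_364, 296_000)
comma_score = per_thousand(32_618, 296_000)
print(full_stop_score, comma_score)  # 65.4 110.2
```

Some of the word totals look rounded (296,000 and 161,000, for instance), so scores recomputed this way need not match every published figure exactly.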

 

Raw counts and frequency per 1000 words in various text types

Mark              Pickwick  Sons & Lovers  Mysterious  Death world  Guardian  NY Times    CEFR  Ling Papers  Average        COCA
Total words        296,000        161,000      57,106       58,095    11,940    12,084  93,808       33,726
. count             19,364         14,515       4,650        4,783       522       760   4,644        2,269           21,304,861
. per 1000            65.4           90.1        81.6         82.5      43.5      63.3    49.4         66.7     65.3
, count             32,618         12,492       4,282        2,420       570       776   4,810        2,258           23,849,941
, per 1000           110.2           77.6        75.1         41.7      47.5      64.7    51.2         66.4     61.6
; count              3,469            697          77            6        12        12     356          106              782,692
; per 1000            11.1            4.3         1.4          0.1         1         1     3.8          3.1      3.2
: count                144            273         132            5        43        41     552          321            1,360,298
: per 1000             0.5            1.7         2.3          0.1       3.6       3.4     5.9          9.4      3.4
! count              1,527          1,973         413           75         0                27            5              368,622
! per 1000             5.2           12.3         7.2          1.3                         0.3          0.1      3.3
? count              1,897          1,335         838          314        13         6     809            1            1,620,720
? per 1000             6.4            8.3        14.7          5.4       1.1       0.5     8.6          0.0      5.6
’ count             24,688          4,259       1,104        1,225       175       175     384          362              953,027
’ per 1000            83.4           26.5        19.4         21.1      14.6      14.6     4.1         10.6     24.3
“ count              1,560         10,430       4,437        2,105       271       262     239           44                   na
“ per 1000             5.3           64.8        77.8         36.3      22.6      21.9     2.5          1.3     26.7
- count              8,297          3,109       1,457        1,006       133       154   1,081          420              392,700
- per 1000            28.0           11.9        25.6         17.3      11.1      12.8    11.5         12.4     15.3
Total per 1000       315.5          297.5       305.1        205.8       145     182.2   186.5        236.7    208.7
 

Source: same corpus, compared with COCA (Corpus of Contemporary American English)
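
The "Total per 1000" row appears to be the column sum of the nine per-1000 scores. A small Python check of that reading, using the Pickwick and Average columns as transcribed from the table above; the names `column_total`, `PICKWICK` and `AVERAGE` are illustrative only.

```python
from decimal import Decimal

# Per-1000 scores in the order . , ; : ! ? ’ “ -
PICKWICK = ["65.4", "110.2", "11.1", "0.5", "5.2", "6.4", "83.4", "5.3", "28.0"]
AVERAGE = ["65.3", "61.6", "3.2", "3.4", "3.3", "5.6", "24.3", "26.7", "15.3"]


def column_total(scores: list[str]) -> Decimal:
    """Sum one column of per-1000 scores exactly, avoiding float rounding."""
    return sum((Decimal(s) for s in scores), Decimal("0"))


pickwick_total = column_total(PICKWICK)
average_total = column_total(AVERAGE)
print(pickwick_total, average_total)  # 315.5 208.7
```

Both sums agree with the totals given in the table.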

 

Punctuation Mark Percentages
Source: Meyer's (1987) analysis of the Brown Corpus

Commas               47%
Full stops           45%
Dashes                2%
Parentheses           2%
Semicolons            2%
Question marks        1%
Colons                1%
Exclamation marks     1%
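
For a rough comparison with percentages of this kind, the average per-1000 scores from the first table on this page can be re-expressed as shares of all punctuation marks counted. The sketch below is an illustration only: the names `AVERAGE_PER_1000` and `percentage_shares` are hypothetical, and the two sets of categories differ (Meyer's list includes dashes and parentheses but not apostrophes, quotation marks or hyphens), so the resulting shares are not directly equivalent.

```python
# Average scores per 1000 running words, from the first table on this page.
AVERAGE_PER_1000 = {
    "full stop": 65.3,
    "comma": 61.6,
    "semi-colon": 3.2,
    "colon": 3.4,
    "exclamation": 3.3,
    "question": 5.6,
    "apostrophe/single quotation": 24.3,
    "double quotation": 26.7,
    "hyphen": 15.3,
}


def percentage_shares(scores: dict[str, float]) -> dict[str, float]:
    """Express each mark's score as a percentage of the summed scores."""
    total = sum(scores.values())
    return {mark: 100 * score / total for mark, score in scores.items()}


shares = percentage_shares(AVERAGE_PER_1000)
# Print the marks from most to least frequent.
for mark, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{mark:<30}{share:5.1f}%")
```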

Frequencies for Ngrams
NB: commas are missing because Ngrams uses them as dividers.

 

Frequencies for comma distribution

Elements in a series (words, phrases, clauses etc)         20.3%
Sentence-initial elements (words, phrases, clauses etc)    20.2%
Sentence-final elements (phrases, clauses)                  5.0%
Non-restrictive phrases or clauses                         17.3%
Appositives                                                26.1%
Interrupters                                                6.6%
Quotations                                                  4.5%

Source: Bayraktar et al. (1998), based on the Wall Street Journal

Google Ngram historical frequencies for English 1500-2000 AD

[Graphs: colon (:), double quotation (" "), full stop (.), question mark (?), exclamation mark (!), semi-colon (;)]
