Lists of words and their frequencies have long been available for teachers and learners of language. For example, Thorndike (1921, 1932) and Thorndike and Lorge (1944) produced word frequency books with counts of word occurrences in texts used in the education of American children. Michael West's General Service List of English Words (1953) was primarily aimed at foreign learners of English. More recently, with the aid of efficient computer software and very large bodies of language data (called corpora), researchers have been able to provide more sophisticated frequency counts from both written text and transcribed speech. One important feature of the frequencies presented in this series is that they are derived from recently collected language data. The earlier lists for English included samples from, for example, Austen's Pride and Prejudice and Defoe's Robinson Crusoe, and thus could no longer represent present-day language in any sense.
Frequency data derived from a large representative corpus of a language brings students closer to language as it is used in real life, as opposed to textbook language (which often distorts the frequencies of features in a language; see Ljung, 1990). The information in these dictionaries is presented in a number of formats to allow users to access the data in different ways. If, for example, you would rather focus on verbs than simply drill down through the word frequency list, the part of speech index will allow you to concentrate on just the most frequent verbs. Given that verbs typically account for 20 percent of all words in a language, this may be a good strategy. A focus on function words may be equally rewarding: 60 percent of speech in English is composed of a mere 50 function words. The series also provides information of use to the language teacher. The idea that frequency information may have a role to play in syllabus design is not new (see, for example, Sinclair and Renouf, 1988). However, to date it has been difficult for those teaching languages other than English to use frequency information in syllabus design because of a lack of data.
Frequency information should not be studied to the exclusion of other contextual and situational knowledge about language use, and we may even doubt the validity of frequency information derived from large corpora. It is interesting to note, however, that Alderson (2007) found that corpus frequencies may not match a native speaker's intuitions about word frequency, and that estimates of word frequencies collected from language experts varied widely. Thus corpus-derived frequencies are still the best current estimate of a word's importance that a learner will come across. Around the time of the construction of the first machine-readable corpora, Halliday (1971: 344) stated that "a rough indication of frequencies is often just what is needed." Our aim in this series is to provide estimates of word frequencies that are as accurate as possible.
Paul Rayson and Mark Davies
Lancaster and Provo, 2008
References
Alderson, J.C. (2007) Judging the frequency of English words. Applied Linguistics, 28(3): 383-409.
Gardner, D. (2007) Validating the construct of "word" in applied corpus-based vocabulary research: A critical survey. Applied Linguistics, 28: 241-265.
Halliday, M.A.K. (1971) Linguistic functions and literary style. In S. Chatman (ed.) Style: A Symposium. Oxford University Press, Oxford, 330-365.
Ljung, M. (1990) A Study of TEFL Vocabulary. Almqvist & Wiksell International, Stockholm.
Nation, I.S.P. (1990) Teaching and Learning Vocabulary. Heinle & Heinle, Boston.
Sinclair, J.M., and Renouf, A. (1988) A lexical syllabus for language learning. In R. Carter and M. McCarthy (eds) Vocabulary and Language Teaching. Longman, London, 140-158.
Thorndike, E.L. (1921) Teacher's Word Book. Columbia Teachers College, New York.
Thorndike, E.L. (1932) A Teacher's Word Book of 20,000 Words. Columbia Teachers College, New York.
Thorndike, E.L. and Lorge, I. (1944) The Teacher's Word Book of 30,000 Words. Columbia Teachers College, New York.
West, M. (1953) A General Service List of English Words. Longman, London.
Acknowledgments
We are indebted to a number of students from Brigham Young University who helped with this project: Athelia Graham, Andrea Bowden, Amy Heaton, Tim Wallace, Tim Heaton, Kyle Jepson, Timothy Hewitt, Mikkel Davis, Jared Garrett, Teresa Martin, Billy Wilson, and Dave Ogden, as well as several student employees at Brigham Young University's English Language Center. Special thanks go to Brigham Young University's English Language Center, the College of Humanities, the Department of Linguistics and English Language, and the Data-Based Research Group for their financial support.
Abbreviations
The following are the part of speech codes for the 5,000 headwords in the dictionary.
Code  No. of words  Explanation      Examples
a     11            article          the, a, your
c     38            conjunction      if, because, whereas
d     34            determiner       this, most, either
e     1             existential      there
g     1             genitive         '
i     96            preposition      with, instead, except
j     839           adjective        shy, risky, tender
m     36            number           seven, fifth, two-thirds
n     2558          noun             bulb, tolerance, slot
p     46            pronoun          we, somebody, mine
r     333           adverb           up, seldom, fortunately
t     1             to + infinitive  to
u     12            interjection     yeah, hi, wow
v     992           verb             modify, scan, govern
x     2             negation         not, n't
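As a quick arithmetic check on the counts above, the short Python sketch below sums the number of headwords per code and reports each code's share of the total. The names POS_COUNTS and pos_shares are illustrative only and are not part of the dictionary or its data files. Note that these are shares of the headword list, not of running text.

```python
# Headword counts per part-of-speech code, copied from the table above.
# POS_COUNTS and pos_shares are hypothetical names used for illustration.
POS_COUNTS = {
    "a": 11, "c": 38, "d": 34, "e": 1, "g": 1,
    "i": 96, "j": 839, "m": 36, "n": 2558, "p": 46,
    "r": 333, "t": 1, "u": 12, "v": 992, "x": 2,
}


def pos_shares(counts):
    """Return each code's share of all headwords, as a percentage."""
    total = sum(counts.values())
    return {code: 100 * n / total for code, n in counts.items()}


total = sum(POS_COUNTS.values())
shares = pos_shares(POS_COUNTS)

print("Total headwords:", total)
# List the four largest categories, biggest first.
for code in sorted(shares, key=shares.get, reverse=True)[:4]:
    print(code, round(shares[code], 1))
```

The counts add up to the 5,000 headwords stated above; nouns make up just over half of the list, and verbs just under a fifth.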
Introduction
The value of this frequency dictionary of English
"I don't know that word." "What does that word mean?" "How is that word used?" These are some of the most common pleas for help by language learners—and justifiably so.
Not knowing enough words, or the right words, is often the root cause of miscommunication, the inability to read and write well, and a host of related problems. This fundamental need is compounded by the fact that there are simply so many words to know in any language, but especially in English, which may contain well over two million distinct words (Crystal, 1995), a number that is growing fast. Thirty years ago, who would have thought that we would be "surfing" in our own homes, or that "chips" would be good things to have inside our equipment, or that we would be excited "to google this" and "to google that"?
Without belaboring the obvious, it is little wonder that learners, teachers, researchers, materials developers, and many others are interested in bringing some sense of priority and direction to what could easily become vocabulary chaos. Our frequency dictionary is designed for this very purpose. We wanted to know which of the vast number of English words to start with, and we also wanted to know which other words these words "hang out with": their neighbors (or collocates), which provide crucial information about the meaning and use of these words. Perhaps even more importantly, we wanted to know this for our current day, not for some English of the past, when punch cards were used to program computers and surfing was only done at the beach. In short, we offer A Frequency Dictionary of Contemporary American English with the hope that it will benefit those who are trying to learn our current mother tongue, as well as those who wish to assist them.