As a final introductory note, we might mention that if you find this dictionary valuable and would like a similar electronic version (fewer collocates, but more of other features), feel free to visit http://www.americancorpus.org/dictionary.
What is in this dictionary?
This frequency dictionary is designed to meet the needs of a wide range of language students and teachers, as well as those who are interested in the computational processing of English. The main index contains the 5,000 most common words in American English, starting with such basic words as "the" and "of," and quickly progressing through to more intermediate and advanced words. Because the dictionary is based on the actual frequency of words in a large 385-million-word corpus of many different types of English texts (spoken, fiction, magazines, newspapers, and academic), the user can feel comfortable that these words are very likely to be encountered in the "real world."
In addition to providing a listing of the most frequent 5,000 words, the entries provide other information that should be of great use to the language learner. Each entry shows the main collocates for the word, grouped by part of speech and in order of frequency. These collocates provide important and useful insight into the meaning and usage of the word, following the idea that "you can tell a lot about a word by the other words that it hangs out with." The entries also show where each of the collocates occurs with regard to the head word (before, after, or both), which indicates whether it is a subject, object, and so on. Finally, the entries indicate whether the words are more common in one genre of English (e.g. spoken or academic) than in the others.
Aside from the main frequency listing, there are also indexes that sort the entries by alphabetical order and part of speech. The alphabetical index can be of great value to students who, for example, want to look up a word from a short story or newspaper article, and see how common the word is in general. The part of speech indexes could be of benefit to students who want to focus selectively on verbs, nouns, or some other part of speech. Finally, there are a number of thematically related lists (clothing, foods, emotions, etc.) as well as comparisons of vocabulary across genres and over time, all of which should enhance the learning experience. The expectation, then, is that this frequency dictionary will significantly support the efforts of a wide range of students and teachers who are involved in the acquisition and teaching of English vocabulary.
Comparison to other frequency dictionaries of English
Historically, most frequency dictionaries (also referred to as word books and word lists) have been created to meet educational needs, with many designed specifically for foreign- and second-language learners of English. Prominent among these are: The Teacher's Word Book of 30,000 Words (Thorndike and Lorge, 1944), based on 4.5 million words from general English texts, magazines, and juvenile books; The General Service List of English Words (West, 1953), a list of the 2,000 highest frequency words (with semantic distinctions and counts) based on visual inspections by semanticists of 5 million words from various sources (encyclopedias, magazines, textbooks, novels, etc.); the Brown Corpus list (Francis and Kucera, 1982), based on 1 million words of written American English; and its British English counterpart, the LOB corpus list (Johansson and Hofland, 1989).
For many purposes, these latter two replaced the older lists of Thorndike and Lorge. Additionally, there are several more specialized school lists, such as: the American Heritage Word Frequency Book (Carroll, Davies, and Richman, 1971), based on 5 million running words of written school English (grades 3 through 9); the Academic Word List (Coxhead, 2000), 570 academic word families based on 3.5 million running words of academic texts; and the very early A Basic Vocabulary of Elementary School Children (Rinsland, 1945), based on 6 million running words of actual children's writing samples.
A great debt is owed to the pioneering scholars who generated these and other frequency lists to facilitate English vocabulary learning, research, and description. Building on these earlier efforts, A Frequency Dictionary of Contemporary American English addresses several vocabulary needs in the field of English language education. First, and perhaps most obvious, it is based on contemporary American English, thus making it more ecologically valid in educational and research settings where American English is the target, and where many are still relying on the nearly 30-year-old Brown Corpus (Francis and Kucera, 1982) for frequency information about American English vocabulary. (Note: the actual texts for the Brown Corpus were from 1961.) Second, unlike the Brown Corpus (1 million words of written English only), the frequency counts in this dictionary are based on a very large and balanced corpus of both written and spoken materials (385 million words from five major genres), thus adding confidence that the highest frequency words have indeed been determined and properly ranked, and that these words have a high degree of utility across major genres of importance to English language learners (spoken, fiction, newspapers, magazines, and academic).
Third, the inclusion of collocates (by part of speech) for each of the 5,000 high-frequency node words adds a semantic richness to the dictionary that is often lacking when only the forms of words are tallied without consideration of their potential meanings (Gardner, 2007). The tightness of some of these node-collocate relationships (big deal, bad habit, make sense, trash talk, etc.) also highlights the phrasal nature of many English vocabulary items (Cowie, 1998). Such collocational knowledge is a crucial component of what it means to know a word (Nation, 2001) and has also been recognized as a characteristic difference between native and non-native language abilities (Nesselhauf, 2005). Therefore, language learners and their teachers should benefit from the rich semantic and pragmatic information the collocates provide, thus taking us one step closer to Read's (2000) call for new high-frequency word lists that are based on large electronic corpora, but which also account for the many meanings that language learners need to negotiate. Although semantic frequency is not fully realized in this dictionary, the collocates do provide some support for semantic interpretations, and will certainly aid in determining which meanings of a word form to teach or learn.
Finally, the 30 call-out boxes in this dictionary are packed with useful vocabulary information for language learners and their teachers, including words that make up many of the basic semantic sets of the language (animals, body, clothing, colors, emotions, family, food, etc.), words that characterize a specific genre of the language (spoken, fiction, academic, etc.), words that are new to American English, words that tend to be characteristically American or British, productive suffixes and the actual content words they are found in (nouns and adjectives), and the highest frequency phrasal verbs of American English. (Compare with Gardner and Davies, 2007, which lists the highest frequency phrasal verbs of British English.) These and other call-out boxes in the dictionary can be used for self-study, teaching, assessment, materials development, and research purposes.