These n-grams are based on the largest
publicly available, genre-balanced corpus
of English -- the
520 million word Corpus of
Contemporary American English (COCA). With this n-gram data (2-,
3-, 4-, and 5-word sequences, with their frequencies), you can carry
out powerful queries offline -- without needing to access the
corpus via the web interface. Short sample:
frequency | word1 | word2 | word3
1419 | much | the | same
461 | much | more | likely
432 | much | better | than
266 | much | more | difficult
235 | much | of | the
226 | much | more | than
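As a rough illustration of the kind of offline query the sample supports, the sketch below loads the six rows shown above and looks up every 3-gram that begins with a given word sequence. The tab-separated layout and the helper names `load_ngrams` and `starting_with` are assumptions made for this example; the actual file format of the downloads may differ.

```python
import io

# The six sample rows from the table above. The tab-separated layout is an
# assumption for illustration, not the documented format of the data files.
SAMPLE = (
    "1419\tmuch\tthe\tsame\n"
    "461\tmuch\tmore\tlikely\n"
    "432\tmuch\tbetter\tthan\n"
    "266\tmuch\tmore\tdifficult\n"
    "235\tmuch\tof\tthe\n"
    "226\tmuch\tmore\tthan\n"
)


def load_ngrams(handle):
    """Read (frequency, words) pairs from a tab-separated text stream."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), tuple(fields[1:])))
    return rows


def starting_with(ngrams, *prefix):
    """Return the n-grams whose first words equal prefix, most frequent first."""
    hits = [(freq, words) for freq, words in ngrams
            if words[:len(prefix)] == prefix]
    return sorted(hits, reverse=True)


ngrams = load_ngrams(io.StringIO(SAMPLE))
for freq, words in starting_with(ngrams, "much", "more"):
    print(freq, " ".join(words))
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle, assuming the file follows the same column order as the sample.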
A few more examples (from among an
unlimited number of n-grams) might be:
NOUN + NOUN sequences
three-word strings with a preposition in the middle position
VERB + the + NOUN sequences
two-word strings, where the words begin or end with certain letters
like + word + word
(potential) phrasal verbs: VERB + ADV particle
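Queries of the "like + word + word" or "words that begin or end with certain letters" kind can be sketched with a small slot-based matcher. The example below works on word forms only and reuses the sample rows from this page, with "much" standing in for "like". The part-of-speech examples would need tagged data, which is not shown here. The names `matches` and `query` are hypothetical helpers, not part of any distributed tool.

```python
import re

# Rows from the short sample on this page: (frequency, word1, word2, word3).
NGRAMS = [
    (1419, "much", "the", "same"),
    (461, "much", "more", "likely"),
    (432, "much", "better", "than"),
    (266, "much", "more", "difficult"),
    (235, "much", "of", "the"),
    (226, "much", "more", "than"),
]


def matches(words, pattern):
    """Return True if every slot in pattern accepts the word at that position.

    A slot is None (any word), a string (exact match), or a compiled
    regular expression that must match the whole word.
    """
    if len(words) != len(pattern):
        return False
    for word, slot in zip(words, pattern):
        if slot is None:
            continue
        if isinstance(slot, str):
            if word != slot:
                return False
        elif slot.fullmatch(word) is None:
            return False
    return True


def query(ngrams, pattern):
    """Return the rows whose words satisfy pattern, in their original order."""
    return [row for row in ngrams if matches(row[1:], pattern)]


# "much + word + word": a fixed first word followed by any two words.
any_after_much = query(NGRAMS, ["much", None, None])

# Third word begins with "th"; the second word is unconstrained.
th_final = query(NGRAMS, ["much", None, re.compile(r"th.*")])

# Second word ends in "er".
er_middle = query(NGRAMS, [None, re.compile(r".*er"), None])

for row in th_final:
    print(row[0], " ".join(row[1:]))
```

Keeping each slot independent means the same function handles fixed words, wildcards, and letter patterns; a part-of-speech slot could be added in the same way if the rows carried tags.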
The data is available in several different formats:
1 | Free lists | 1 million most frequent 2, 3, 4, and 5-grams
2 | Inexpensive data sets | All n-grams that occur three times or more: 6.2 million 2-grams, 11.9 million 3-grams, and 8.3 million 4-grams
3 | All 2, 3, and 4-grams | Up to 155 million distinct strings -- searchable by word form and part of speech (as above), and also lemma
If you're interested in the frequency
of single words (including frequency by genre and sub-genre), or in collocates (all
words near a given word), you might look at
www.wordfrequency.info.
gipoco.com
is neither affiliated with the authors of this page nor responsible
for its contents. This is a safe-cache copy of the original web site.