
N-grams data

Corpus of Contemporary American English


These n-grams are based on the largest publicly available, genre-balanced corpus of English: the 520-million-word Corpus of Contemporary American English (COCA). With this data (2-, 3-, 4-, and 5-word sequences, each with its frequency), you can carry out powerful queries offline, without needing to access the corpus through the web interface.
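The frequency attached to each n-gram is the number of times that exact word sequence occurs in the corpus. A minimal Python sketch of how such counts can be derived from a tokenized text, assuming simple whitespace tokenization; the toy sentence and the `count_ngrams` helper are hypothetical, and this is not necessarily how the COCA n-grams were compiled:

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    # zip over n progressively shifted copies of the list yields each window of length n
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(windows)

# Hypothetical toy text; a real corpus would be tokenized far more carefully
tokens = "much the same as much the same".split()
trigrams = count_ngrams(tokens, 3)

for gram, freq in trigrams.most_common():
    print(freq, " ".join(gram))
```

Running it prints `2 much the same` first, since that sequence occurs twice in the toy text; the other trigrams occur once each.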

Short sample:

frequency   word1   word2    word3
1419        much    the      same
461         much    more     likely
432         much    better   than
266         much    more     difficult
235         much    of       the
226         much    more     than
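The sample rows above can be loaded and queried offline with a few lines of Python. This sketch assumes a tab-separated file with the frequency in the first column followed by one column per word, mirroring the layout of the sample; the actual column order and delimiter of the distributed files may differ, and `load_ngrams` and `query` are hypothetical helpers:

```python
import csv
import io

# Assumed tab-separated layout mirroring the sample: frequency, word1, word2, word3
SAMPLE = """1419\tmuch\tthe\tsame
461\tmuch\tmore\tlikely
432\tmuch\tbetter\tthan
266\tmuch\tmore\tdifficult
235\tmuch\tof\tthe
226\tmuch\tmore\tthan
"""

def load_ngrams(handle):
    """Yield (frequency, words) tuples from a tab-separated n-gram file."""
    # QUOTE_NONE keeps quote characters inside n-grams as literal text
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row:
            yield int(row[0]), tuple(row[1:])

def query(ngrams, *pattern):
    """Return matching n-grams, highest frequency first; None matches any word."""
    return sorted(
        ((freq, words) for freq, words in ngrams
         if len(words) == len(pattern)
         and all(p is None or p == w for p, w in zip(pattern, words))),
        reverse=True,
    )

rows = list(load_ngrams(io.StringIO(SAMPLE)))
for freq, words in query(rows, "much", "more", None):
    print(freq, " ".join(words))
```

With `None` as a wildcard, the pattern `("much", "more", None)` returns the three "much more ..." rows, highest frequency first.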

A few more examples of the kinds of sequences you could extract:

  NOUN + NOUN sequences
  three-word strings with a preposition in the middle position
  VERB + the + NOUN sequences
  two-word strings where the words begin or end with certain letters
  like + word + word
  (potential) phrasal verbs: VERB + ADV particle
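The word-form queries in the list above reduce to checking each slot of an n-gram against a constraint. A minimal Python sketch, assuming n-grams are held as tuples of words; the `matches` helper and the toy n-grams are hypothetical, and part-of-speech patterns such as NOUN + NOUN would additionally need tagged data:

```python
def matches(words, *slots):
    """True if each word satisfies its slot: a string (exact match),
    None (any word), or a callable predicate."""
    if len(words) != len(slots):
        return False
    for word, slot in zip(words, slots):
        if slot is None:
            continue  # wildcard slot
        if callable(slot):
            if not slot(word):
                return False
        elif slot != word:
            return False
    return True

# Hypothetical toy n-grams; real data would come from the n-gram files
ngrams = [
    ("like", "a", "charm"),
    ("like", "it", "or"),
    ("quite", "like", "that"),
    ("black", "box"),
    ("blue", "sky"),
]

# like + word + word
like_phrases = [g for g in ngrams if matches(g, "like", None, None)]

# two-word strings: first word begins with "bl", second ends with "x"
bl_x = [g for g in ngrams
        if matches(g, lambda w: w.startswith("bl"), lambda w: w.endswith("x"))]

print(like_phrases)
print(bl_x)
```

Allowing a slot to be a predicate keeps one matcher general enough for exact words, wildcards, and prefix or suffix tests.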

The data is available in three formats:

1. Free lists

The 1 million most frequent 2-, 3-, 4-, and 5-grams.

2. Inexpensive data sets

All n-grams that occur three times or more: 6.2 million 2-grams, 11.9 million 3-grams, and 8.3 million 4-grams.

3. All 2-, 3-, and 4-grams

Up to 155 million distinct strings, searchable by word form and part of speech (as above) as well as by lemma.
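The formats above differ in a frequency cutoff (three occurrences or more) and in whether lemma search is available. Both operations can be sketched in Python over a hypothetical layout in which every slot carries a word form, lemma, and part-of-speech tag; the `Token` fields, the `VERB`/`PART`/`PRON` tag names, and the frequencies are assumptions for illustration, not the actual file format or real COCA counts:

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    lemma: str
    pos: str  # hypothetical coarse tags; the real data's tagset may differ

# Toy 2-grams with made-up frequencies (not real COCA counts)
NGRAMS = [
    (12, (Token("gave", "give", "VERB"), Token("up", "up", "PART"))),
    (7, (Token("gives", "give", "VERB"), Token("up", "up", "PART"))),
    (5, (Token("give", "give", "VERB"), Token("it", "it", "PRON"))),
    (2, (Token("giving", "give", "VERB"), Token("in", "in", "PART"))),
]

def prune(ngrams, min_freq=3):
    """Drop n-grams below a frequency cutoff (e.g. 'three times or more')."""
    return [(freq, toks) for freq, toks in ngrams if freq >= min_freq]

def find(ngrams, lemma=None, pos=None, next_pos=None):
    """Find 2-grams whose first token has the given lemma and POS and whose
    second token has the given POS; None means no constraint on that field."""
    hits = []
    for freq, (first, second) in ngrams:
        if lemma is not None and first.lemma != lemma:
            continue
        if pos is not None and first.pos != pos:
            continue
        if next_pos is not None and second.pos != next_pos:
            continue
        hits.append((freq, first.word, second.word))
    return hits

# Potential phrasal verbs built on the lemma "give": VERB + particle
print(find(NGRAMS, lemma="give", pos="VERB", next_pos="PART"))
print(find(prune(NGRAMS), lemma="give", pos="VERB", next_pos="PART"))
```

Because the query matches on lemma, inflected forms such as "gave up" and "gives up" come back from a single search, while the cutoff removes the low-frequency "giving in".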

If you are interested in the frequency of single words (including frequency by genre and sub-genre) or in collocates (all words near a given word), see www.wordfrequency.info.