corpus.byu.edu

corpora, size, queries = better resources, more insight


Created by Mark Davies, BYU. Overview, search types, looking at variation, corpus-based resources.

The most widely used online corpora: more than 130,000 distinct researchers, teachers, and students use them each month.
 

English

| Corpus | # words | Language/dialect | Time period | Compare |
|---|---|---|---|---|
| Hansard Corpus (British Parliament) (NEW) | 1.6 billion | British | 1803-2005 | Info |
| Wikipedia Corpus (with virtual corpora) | 1.9 billion | English | -2014 | Info |
| Global Web-Based English (GloWbE) | 1.9 billion | 20 countries | 2012-13 | |
| Corpus of Contemporary American English (COCA) | 520 million | American | 1990-2015 | * * * * * |
| Corpus of Historical American English (COHA) | 400 million | American | 1810-2009 | * * |
| TIME Magazine Corpus | 100 million | American | 1923-2006 | |
| Corpus of American Soap Operas | 100 million | American | 2001-2012 | * |
| British National Corpus (BYU-BNC)* | 100 million | British | 1980s-1993 | * * |
| Strathy Corpus (Canada) | 50 million | Canadian | 1970s-2000s | |

Other languages

| Corpus | # words | Language/dialect | Time period | Compare |
|---|---|---|---|---|
| Corpus del Español (see also...) | 100 million | Spanish | 1200s-1900s | * |
| Corpus do Português (see also...) | 45 million | Portuguese | 1300s-1900s | |

N-grams

| Corpus | # words | Language/dialect | Time period | Compare |
|---|---|---|---|---|
| Google Books: American English | 155 billion | American | 1500s-2000s | * |
| Google Books: British English | 34 billion | British | 1500s-2000s | |
| Google Books: One Million Books | 89 billion | Am/Br | 1500s-2000s | |
| Google Books: Spanish | 45 billion | Spanish | 1500s-2000s | |

* BYU-BNC is our architecture and interface to the BNC, which is distributed by IT Services (formerly OUCS) at Oxford University on behalf of the BNC Consortium.
