Concordancing


Some Concordancers and Corpora on the web

Students who have achieved the threshold level of English can discover for themselves a great deal about how the language works, especially by using resources such as those linked below. These resources also provide a lot more information than dictionaries can: in fact, modern dictionaries are created using such resources and are a distillation of the data.

Go to my fledgling site on Corpus Linguistics 

Corpus-based Language Study, a portal that includes links to software and various tools as well as to online articles and some other language tools suitable for teachers and students of English.

The Sketch Engine

The Sketch Engine (SkE), from Lexical Computing Ltd, is above all a concordancing program. It is a new Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. Register here to use the sample data, which includes the British National Corpus. You may use the website at no cost for demonstration, research and teaching purposes only. Word sketches automate some of the findings of both concordance and collocation searches: you get statistical data about how a word combines with both lexical and grammar words, and from there you can see concordances of specific combinations. The system was conceived by Adam Kilgarriff and created by Pavel Rychlý.

Registration includes access to WebBootCat, a user-friendly, web-based tool for building corpora instantly from publicly accessible documents on the web. It is particularly useful for small, short-term projects such as translating and preparing topic-based teaching material. The corpora you create are then searchable via the Word Sketch Engine.

See also:
Click here for some examples of how the various search windows and data summaries can be used.
Click here for information and examples about using Corpus Query Language with the BNC.

Cobuild

An Introduction to Concordancing through the Collins Cobuild Corpus Sampler: my ten-session tutorial teaching some aspects of using concordancers and collocations in language study. The Collins Cobuild Corpus Sampler is a demo version of Cobuild's mighty site, which gives you an opportunity to try it out before subscribing. With the query skills taught in my "Introduction" site, you can derive a great deal of benefit from this program.

Just the Word

Just the Word, developed by Pete Whitelock of Sharp (UK), gives you a detailed description of the company which a word keeps in modern-day English ... It combines the advantages of thesaurus and dictionary. Read the easy-to-use Getting Started, and get started. Enter a word and it returns significant data on collocation and colligation, which we need when learning vocabulary thoroughly. This data is especially valuable in that it is preprocessed: we would normally go through many separate steps to arrive at it. Furthermore, when you click on a word combination in the list, concordances appear in the so-called KWIC format (key word in context), which allows you to observe further features, such as the domains and genres in which the combination is used.
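For readers who have not met the KWIC format before, here is a minimal sketch in Python of how such a display can be built. It is only an illustration and not how Just the Word itself works; the function name kwic, the sample text and the width parameter are my own assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with the key word aligned in a central column."""
    # Collapse line breaks and runs of spaces so each hit fits on a single row.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so every key word starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical sample text, invented for this illustration.
sample = ("She took a short break after lunch. Waves break on the shore. "
          "Try not to break the glass.")

for line in kwic(sample, "break", width=20):
    print(line)
```

Because the key word always sits in the same column, you can scan down the page and compare what comes immediately before and after it in each line.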

VIEW

VIEW: Variation in English Words and Phrases, by Mark Davies of Brigham Young University, USA. This website allows you to search the 100-million-word British National Corpus and to use "anchors" and "targets" for fuzzy matches, for example, all nouns somewhere near "break" (v), adjectives near "woman", verbs near "way", and nouns near "small". Perhaps the most distinctive aspect of the corpus is the ability to find the frequency of words and phrases in any combination of registers that you define (spoken, academic, poetry, medical, etc). In addition, you can compare between registers, for example, verbs that are more common in legal or medical texts, phrases like [I * that] that are more common in conversation than in non-fiction texts, nouns near "break" (v) that are found primarily in academic writings, etc.
[These notes are based on Mark Davies' description of his VIEW.]
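The idea of finding words "somewhere near" an anchor can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the VIEW implementation: it counts every word within a fixed span of the anchor and ignores part of speech, which a tagged corpus such as the BNC would let you filter on. The function name collocates and the sample text are hypothetical.

```python
import re
from collections import Counter

def collocates(text, node, span=4):
    """Count the words that occur within `span` tokens either side of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Take the words before and after the node, leaving out the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text, invented for this illustration.
sample = ("They break the rules. We break the ice. "
          "Do not break the rules again.")

for word, frequency in collocates(sample, "break", span=2).most_common(3):
    print(word, frequency)
```

With real corpus data the raw counts are dominated by grammar words such as "the", which is one reason why tools like VIEW let you restrict the target to a part of speech.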

Other corpora available from Brigham Young can be found here.

BYU Corpus of American English

The corpus is composed of more than 360 million words in nearly 150,000 texts, including 20 million words each year from 1990 to 2007. For each year (and therefore overall, as well), the corpus is evenly divided between the five genres of spoken, fiction, popular magazines, newspapers, and academic journals. The texts come from a variety of sources. More info here ...
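The figures above can be checked with a little arithmetic. The sketch below uses only the round numbers quoted in the description, so the results are approximations (the corpus is said to contain "more than" 360 million words); the variable names are my own.

```python
# Round figures taken from the description of the corpus.
words_per_year = 20_000_000
years = range(1990, 2008)  # 1990 to 2007 inclusive, i.e. 18 years
genres = ["spoken", "fiction", "popular magazines", "newspapers", "academic journals"]

total_words = words_per_year * len(years)
words_per_genre_per_year = words_per_year // len(genres)
words_per_genre_overall = total_words // len(genres)

print(f"Years covered:           {len(years)}")
print(f"Total words (approx.):   {total_words:,}")
print(f"Per genre, per year:     {words_per_genre_per_year:,}")
print(f"Per genre, overall:      {words_per_genre_overall:,}")
```

So each genre contributes roughly 4 million words a year, or about 72 million words over the whole period.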

CLT

The Compleat Lexical Tutor (CLT) is a complex website created by Tom Cobb from the Université du Québec à Montréal. It contains a large number of resources for studying and testing yourself (column one), for researching language (column two), and for teachers to create interactive online resources for their students (column three). To open the concordancer, click on “Concordance” in the middle column. It seems to be powered by the Virtual Language Centre but has more options. As well as entering your search and output requirements, you must also choose a corpus: halfway down, you can choose All of the Above, which gives a general corpus of 4 million words. In the output, you can click on the node to see the full sentence of the concordance line. These pages also link to WordNet.

There is a single-page website here with some tips and recommendations.

Another tool provided by CLT is Hypertext, which makes a hypertext version of a text you select. Go to The Aviator to see an example of a text that can be read with the assistance of CLT's dictionary and concordancer. They are the first and last in the list of eight activities based on this text.

There is also Multiconc, which makes a page of concordances for a set of words that you enter. Click here for an example highlighting some words whose colligations learners tend to "misacquire".

PIE

Exploring Words and Phrases from the British National Corpus is an online resource developed by William H. Fletcher of the US Naval Academy. It searches the British National Corpus to provide examples of the chunks of language that words occur in. It aims to provide a simple yet powerful interface for studying words and phrases up to six words long, appropriate for both experienced researchers and novice users. Searching for one word returns a random sample of up to fifty full sentences from the BNC; they are not in KWIC format. “Phrases in English” (PIE) allows you to select the part of speech. For example, you can search for fast as a noun, verb, adverb or adjective. You will need to read at least "What can PIE do now?"
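To show what "phrases up to six words long" means in practice, here is a small Python sketch that counts every sequence of one to six consecutive words in a text. It is my own illustration under simple assumptions (lower-cased text, no part-of-speech tags) and not Fletcher's method; the function name phrases and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def phrases(text, max_len=6):
    """Count every run of 1 to `max_len` consecutive words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        # Slide a window of n words across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample sentence, invented for this illustration.
sample = "as a matter of fact it is a matter of time"

counts = phrases(sample)
# Show only the multi-word chunks that occur more than once.
for phrase, frequency in counts.items():
    if frequency > 1 and " " in phrase:
        print(phrase, frequency)
```

In a corpus of 100 million words the recurring chunks are far more revealing than in a single sentence, but the principle is the same.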

BNC

The British National Corpus: the Simple Search of BNC-World is a sampler program. Click here for its specifications. You can search for parts of speech using the tags, which can be found here.

WebCorp

WebCorp is a tool which extracts instances of language use from Web texts, providing a range of filtering and formatting options (including KWIC). Because it searches the whole web, it can be a bit slow, although it is still under development. The Advanced mode offers many possibilities for refining your search, although searching by part of speech and lemmatising are not possible. It also provides a word list generator.

Glossanet

Glossanet consults the current edition of the newspaper(s) that you select, applies your word query and sends you concordances in an email every day. For an example of the daily output, click here.

KWIC Finder

Also from William Fletcher (PIE above), this downloadable program builds KWIC concordances using the web as its corpus. It can therefore be used for any language.

MICASE

The Michigan Corpus of Academic Spoken English currently contains 152 transcripts (totaling 1,848,364 words). It can be browsed using ten search categories of speaker and speech event attributes, e.g., academic discipline, gender, first language. It can also be searched, with the results returned in standard KWIC concordance format.

VLC

The Virtual Language Centre includes a concordancer in the middle of a rather cluttered page, and many other useful things for language study and teaching as can be seen and heard at their Preview page. Its concordancer does not give many search or output options. 

Other corpora, other languages

This site contains links to corpora in many languages and to other English corpora not included on this page, but it hasn't been updated since 2000.

David Lee's Devoted to Corpora webpage has many links to useful resources and is kept up to date.

 

                                               

Last updated 25.02.2008.