|
Students who have achieved the threshold level of English can discover
a great deal about how the language works for themselves,
especially using such resources as those linked below. These resources also
provide a lot more information than dictionaries can: in fact, modern
dictionaries are created using such resources and are a distillation of the
data.
Go to my fledgling site on Corpus Linguistics,
Corpus-based Language Study, a portal that includes links to software and
various tools as well as to online articles and some other language tools
suitable for teachers and students of English.
The Sketch Engine
The SkE, from Lexical Computing Ltd, is above all a
concordancing program. The Sketch Engine is a new
Corpus Query System incorporating word sketches, grammatical relations,
and a distributional thesaurus. Register here to use the sample
data, which includes the British National Corpus.
You may use the website at no cost for demonstration, research and teaching purposes only.
Wordsketches is an online resource that automates some of the findings of both concordance
and collocation searches. You get statistical
data about how a word combines with both lexical and grammar words, and from
there you can see concordances of specific combinations. It was conceived by Adam
Kilgarriff and created by Pavel Rychlý.
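To give a feel for what a collocation search counts, here is a minimal Python sketch. It is an illustration only and not how Wordsketches itself is implemented: it simply counts which words fall within a few tokens of a chosen node word. The function name `collocates`, the window size and the sample sentences are all my own assumptions.

```python
from collections import Counter
import re


def collocates(text, node, window=3):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    # Crude tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbours to the left and right, excluding the node itself.
            counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts


# Invented sample text, not taken from any corpus.
sample = (
    "She made a strong case for reform. "
    "The lawyer made a strong argument. "
    "He drinks strong tea every morning."
)
top = collocates(sample, "strong", window=2)
print(top.most_common(5))
```

A real tool goes further than raw counts: it weighs each pair against how frequent the words are on their own, and it groups the results by grammatical relation. Note also that this sketch ignores sentence boundaries, so a word from the next sentence can be counted as a neighbour.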
Registration includes access to WebBootCat,
a user-friendly, web-based tool for building corpora instantly
from publicly accessible documents on the web. It is particularly useful for
small, short-term projects such as translating and preparing topic-based
teaching material. The corpora you create are then searchable via the Word
Sketch Engine.
See also:
Click here for some examples of how the various search windows and data summaries can be used.
Click here for information and examples about using Corpus Query Language with the BNC.
Cobuild
An Introduction to Concordancing through the Collins Cobuild Corpus Sampler: my ten-session
tutorial teaching some aspects of using concordancers and collocations in language
study. The Collins Cobuild Corpus Sampler is a demo version of Cobuild's mighty site, which
provides you with an opportunity to try it out before subscribing. With the
query skills taught in my "Introduction" site, you can derive a great deal of
benefit from this program.
Just the Word
Just the Word was developed by Pete Whitelock of Sharp (UK). This gives you a detailed description of the
company which a word keeps in
modern-day English ... It combines the advantages of thesaurus and
dictionary. Read the easy-to-use Getting Started, and get started. Enter
a word and it returns significant data on collocations and colligation, which we
need when learning vocabulary thoroughly. This data is especially valuable in that it
is preprocessed: we normally go through many separate steps to arrive at
this. Furthermore, when you click on a word combination in the list,
concordances appear in the so-called KWIC format (key word in context), which
allows you to observe further features, such as the domains and genres in which
the combination is used.
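For readers who have not met the KWIC format before, the following Python sketch shows the basic idea: every occurrence of the search word is printed in the same column, with a fixed amount of context on either side. It is a rough illustration under my own assumptions (the function name `kwic`, the context width and the sample text are invented) and not the code behind any of the sites listed here.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per hit, with the keyword aligned in the same column."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Collapse line breaks and repeated spaces so the context reads as one line.
    flat = " ".join(text.split())
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword starts at column width + 1.
        lines.append(f"{left:>{width}} {m.group()} {right}")
    return lines


# Invented sample text.
sample = (
    "Heavy rain fell all night. The rain in the hills was heavier, "
    "and rain gauges overflowed."
)
hits = kwic(sample, "rain", width=20)
for line in hits:
    print(line)
```

Reading down the aligned column is what makes patterns to the left and right of the word easy to spot.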
VIEW
VIEW: Variation in English Words and Phrases, by Mark Davies of Brigham
Young University, USA. This website allows you to
search the 100 million word British National Corpus and to use
"anchors" and "targets" for fuzzy matches, for example, all
nouns somewhere near "break" (v), adjectives near "woman",
verbs near "way", and nouns near "small". Perhaps the most
distinctive aspect of the corpus is the ability to find the frequency of words and
phrases in any combination of registers that you define (spoken, academic,
poetry, medical, etc). In addition, you can compare between registers, for
example, verbs that are more common in legal or medical texts, phrases like [I *
that] that are more common in conversation than in non-fiction texts, nouns near
"break" (v) that are found primarily in academic writings, etc.
[These notes are based on Mark Davies' description of
his VIEW.]
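Comparing registers only makes sense if the counts are normalised, because the registers differ in size. A minimal Python sketch of the usual "per million words" calculation follows. The figures are invented for illustration and are not taken from the BNC or from VIEW, and the names `per_million` and `registers` are my own.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


# Hypothetical raw counts and register sizes, made up for this example.
registers = {
    "conversation": {"hits": 420, "size": 4_000_000},
    "non-fiction": {"hits": 150, "size": 15_000_000},
}

rates = {name: per_million(r["hits"], r["size"]) for name, r in registers.items()}
ratio = rates["conversation"] / rates["non-fiction"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per million words")
print(f"conversation is {ratio:.1f} times as frequent as non-fiction")
```

With these made-up figures the phrase has more raw hits in conversation and, once sizes are taken into account, is also far more frequent there per million words.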
Other corpora available from
Brigham Young can be found here.
BYU Corpus of American English
The corpus is composed of more than 360 million
words in nearly 150,000 texts, including 20 million words each year from
1990-2007. For each year (and therefore overall, as well), the corpus is evenly
divided between the five genres of spoken, fiction, popular magazines,
newspapers, and academic journals. The texts come from a variety of sources.
More info here ...
CLT
The Compleat Lexical Tutor (CLT) is a complex
website created by Tom Cobb from the Université du Québec, Montréal. It
contains a large number of resources for studying and testing yourself (column
one), for researching language (column two), and for teachers to create
interactive online resources for their students (column three). To
open the concordancer, click on “Concordance” in the middle column.
It seems to be powered by the Virtual Language Centre but has more options: as
well as entering your search and output requirements, you must also choose a
corpus. Halfway down, you can choose All of the Above, which gives a
general corpus of 4 million words. From the output, you can click on the node,
which gives you the full sentence of the concordance. These pages also link to
Wordnet.
There is a single-page website here with some tips and
recommendations.
Another tool provided by CLT is Hypertext,
which makes a hypertext version of a text you select. Go to The
Aviator to see an example of a text that can be read with the
assistance of CLT's dictionary and concordancer. These are the first and last
in the list of eight activities based on this text.
There is also Multiconc, which makes a page of concordances for a set of words that you enter.
Click here for an example
highlighting some words whose colligations learners tend to "misacquire".
PIE
Exploring Words and Phrases from the British National Corpus is an online resource developed by William H. Fletcher of the US Naval Academy. It searches the British National Corpus to provide examples of chunks of language that words occur in.
It aims to provide a
simple yet powerful interface for studying words and phrases up to six words
long, appropriate for both experienced researchers and novice users. Searching
for one word returns a random sample of up to fifty full sentences from the BNC;
they are not in KWIC format. “Phrases in English” allows you to
select the part of speech, so you can, for example, search for
fast as a noun, verb, adverb or adjective. You will need to read at least
"What can PIE do now?"
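The idea of studying "phrases up to six words long" can be sketched in a few lines of Python: take every run of one to six consecutive words in a text and count how often each run occurs. This is only an illustration of the general technique, not PIE's own implementation, and the function names `ngrams` and `phrase_counts` and the sample text are my own.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(text, max_n=6):
    """Count all word sequences from one word up to max_n words long."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Invented sample text.
sample = "at the end of the day at the end of the week"
counts = phrase_counts(sample)

# Show only the phrases of three or more words that occur more than once.
for phrase, freq in counts.items():
    if freq > 1 and len(phrase) >= 3:
        print(freq, " ".join(phrase))
```

On a corpus the size of the BNC the same counting yields the recurrent chunks that learners need to notice, although a real tool also stores part-of-speech tags alongside each word.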
BNC
The British National Corpus: the Simple Search of BNC-World
is a sampler program. Click here for its specifications. You can search for parts of speech using the
tags, which can be found here.
Webcorp
WebCorp
is a tool which extracts instances of language use from Web texts,
providing a range of filtering and formatting options (including KWIC). Because
it searches the whole web, it can be
a bit slow, although it is still under development.
The Advanced mode offers many possibilities for refining your search,
although part-of-speech searching and lemmatising are not possible. It also provides a Word
list generator.
Glossanet
Glossanet
consults the current edition of the newspaper(s) that you select, applies your word
query and sends you concordances in an email every day. For an
example of the daily output, click here.
KWIC Finder
Also from William Fletcher (PIE above), this downloadable
program builds KWIC concordances using the web as its corpus. It can
therefore be used for any language.
MICASE
The Michigan Corpus of Academic Spoken English currently contains 152 transcripts
(totaling 1,848,364 words). It can be browsed with ten search
categories of speaker and speech event attributes, e.g., academic discipline,
gender, first language. It can also be searched, which returns results in
standard KWIC concordance format.
VLC
The Virtual Language Centre includes
a concordancer in the middle of a rather cluttered page, and many other useful things
for language study and teaching as can be seen and heard at their Preview
page. Its concordancer does not
give many search or output options.
Other corpora, other languages
This site contains links to corpora in many languages and other English corpora not
included on this page. But it hasn't been updated since 2000.
David Lee's Devoted to Corpora webpage has many links to useful resources and is kept up
to date.
|