Using Corpus Query Language for complex searches

These notes explain how to build detailed searches of the British National Corpus (BNC) with the Word Sketch Engine. Links are given in the last section, Word Sketch Engine Links.
  1. Operators
    1. Capitals
  2. Corpus Query Language
    1. A word
    2. Part of speech
    3. Lemma
    4. Combining elements
    5. Strings of elements
    6. How to allow space between elements
    7. How to exclude elements 
    8. Searching for punctuation 
    9. BNC Anomalies
      1. One word equals two
      2. Two words (and more) equal one
  3. Word Sketch Engine Links
Operators

The following operators can be used in all fields: lemma, phrase, word form, CQL.

full stop A full stop before, within or after a word stands for any single character in that position.
b.g gives big bag beg bog bug (click on Node Forms in Frequency to obtain the list)
asterisk An asterisk after a full stop stands for any number of characters (including none) in place of the full stop.
.*ship finds words ending with ship
oxy.* finds words beginning with oxy
.*interest finds words ending with interest
bl.*ng finds words that start with bl and end with ng
.*advantag.* finds words containing advantag
question mark A question mark after a bracketed item makes that item optional.
blond(e)? finds blond and blonde
vertical bar struggle|battle|fight searches for any of these items
backslash To search for a literal full stop, put a backslash before the full stop.
Dr\. finds Dr.
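
The operators above behave like ordinary regular expressions applied to a whole word. As a rough illustration only, the sketch below tries the same patterns with Python's re module on a small invented word list. Using Python here is an assumption made for demonstration; it is not how the Word Sketch Engine itself works, and the helper name find is hypothetical.

```python
import re

# An invented word list for illustration; the real searches run over the BNC.
WORDS = ["big", "bag", "bog", "bring", "blessing", "blond", "blonde",
         "friendship", "oxygen", "disadvantaged", "Dr.", "Drs"]

def find(pattern, words=WORDS):
    """Return the words that the pattern matches from first to last character."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(find(r"b.g"))           # ['big', 'bag', 'bog']
print(find(r".*ship"))        # ['friendship']
print(find(r"oxy.*"))         # ['oxygen']
print(find(r"bl.*ng"))        # ['blessing']
print(find(r".*advantag.*"))  # ['disadvantaged']
print(find(r"blond(e)?"))     # ['blond', 'blonde']
print(find(r"Dr\."))          # ['Dr.']  (escaped full stop is literal)
print(find(r"Dr."))           # ['Dr.', 'Drs']  (unescaped full stop is a wildcard)
```

The last two lines show why the backslash matters: without it, the full stop matches any character.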


Capitals

If you type Doctor in the lemma field, you get only the capitalised form Doctor.
However, doctor in the lemma field gives all possibilities, upper and lower case.
To find lower-case doctor only, type it into the word form field and choose Match Case.


Corpus Query Language

In the Word Sketch Engine, queries are generated automatically when you type words into the lemma, phrase, left context and other fields. Typing your own queries gives you greater control over what you search for.

Each element of a query is enclosed in square brackets: [ ] and you can type a long string of elements.
Specific search items, usually words and tags, are enclosed in quotation marks: " "

Here is an example of a query that includes many of the elements that are illustrated below. It searches the BNC for the lemma "bias" followed by either "towards" or "toward", then one to three words of any kind, then a noun.

This is typed into the CQL field: [lemma = "bias"] [word = "towards|toward"] []{1,3}[tag= "NN."]
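
Because every query follows the same pattern of bracketed elements, the text can also be assembled by a small script. The sketch below is only an illustration: the helper names token and gap are hypothetical, and it assumes that spacing between and inside elements is not significant, as the varied spacing in the examples in these notes suggests.

```python
def token(**attrs):
    """Build one query element, e.g. token(lemma="bias") -> [lemma="bias"]."""
    return "[" + " & ".join(f'{name}="{value}"' for name, value in attrs.items()) + "]"

def gap(low, high):
    """Build a run of unspecified words, e.g. gap(1, 3) -> []{1,3}."""
    return "[]{%d,%d}" % (low, high)

# Rebuild the example query: bias + towards/toward + one to three words + noun.
query = " ".join([
    token(lemma="bias"),
    token(word="towards|toward"),
    gap(1, 3),
    token(tag="NN."),
])
print(query)  # [lemma="bias"] [word="towards|toward"] []{1,3} [tag="NN."]
```

The printed string can then be pasted into the CQL field.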

A word

Create a query searching for a particular word. Use lower case.
[word = "untoward"]

To search for more than one word, use vertical bar.
[word = "amid|amidst"]
[word = "struggle|battle|fight"]


Part of speech

Create a query searching for a particular part of speech (POS). Use upper case.
A list of POS can be found here.      
[tag = "   "]

Note: parts of tags can be substituted with a full stop. All verb tags, for example, start with V.
The second element is B for the verb to be, H for to have, D for to do, M for modals and V for lexical verbs.
The third element is B for base form, D for past tense, G for ing form, I for infinitive, N for past participle, Z for third person singular.
For example:
[tag = "V.."] searches for all verbs in all forms
[tag = "VV."]  searches for all lexical verbs in all forms
[tag = "VD."] searches for all forms of the verb to do
[tag = "V.N"] searches for the past participle forms of all verbs
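
To see which tags each of these patterns picks out, the sketch below matches them against a partial list of BNC verb tags plus two non-verb tags. It uses Python's re module purely as an illustration; the list is incomplete and the function name matching_tags is hypothetical.

```python
import re

# A subset of BNC verb tags plus two non-verb tags (NN1, AJ0), for illustration.
TAGS = ["VBB", "VBD", "VBN", "VDB", "VDD", "VDN", "VHD", "VHN",
        "VM0", "VVB", "VVD", "VVG", "VVN", "VVZ", "NN1", "AJ0"]

def matching_tags(pattern):
    """Return the tags that the pattern matches in full."""
    return [t for t in TAGS if re.fullmatch(pattern, t)]

print(matching_tags("VV."))  # ['VVB', 'VVD', 'VVG', 'VVN', 'VVZ']
print(matching_tags("VD."))  # ['VDB', 'VDD', 'VDN']
print(matching_tags("V.N"))  # ['VBN', 'VDN', 'VHN', 'VVN']
print(len(matching_tags("V..")))  # 14: every verb tag in the list, none of the others
```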


Lemma

Create a query searching for a particular lemma ...
[lemma = "impact"]

... or particular lemmas.
[lemma = "struggle|battle|fight"]


Combining elements

"impact" is both a noun and a verb. To search for a lemma with a specific POS, join the two conditions with an ampersand inside the same brackets.
[lemma = "impact" & tag = "V.."]
[lemma = "criterion" & tag = "NN2"] finds the noun criterion in the plural
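
The ampersand means that both conditions must hold for the same word. The sketch below imitates this on a few hand-tagged words. The data is invented and the function name matches is hypothetical; it is meant as an illustration of the logic only.

```python
import re

# Hand-tagged words, invented for illustration (word form, lemma, BNC-style tag).
TOKENS = [
    {"word": "impacts", "lemma": "impact", "tag": "NN2"},
    {"word": "impacted", "lemma": "impact", "tag": "VVD"},
    {"word": "criteria", "lemma": "criterion", "tag": "NN2"},
    {"word": "criterion", "lemma": "criterion", "tag": "NN1"},
]

def matches(token, **conditions):
    """True only when every attribute=pattern condition matches the whole value."""
    return all(re.fullmatch(p, token[a]) for a, p in conditions.items())

# Like [lemma = "impact" & tag = "V.."]
verbs = [t["word"] for t in TOKENS if matches(t, lemma="impact", tag="V..")]
# Like [lemma = "criterion" & tag = "NN2"]
plurals = [t["word"] for t in TOKENS if matches(t, lemma="criterion", tag="NN2")]

print(verbs)    # ['impacted']
print(plurals)  # ['criteria']
```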

Be careful to

Strings of elements

What prepositions follow impact?
[lemma = "impact"] [tag = "PRP"]

What preposition follows the noun impact?
[lemma = "impact" & tag = "N.."] [tag = "PRP"]

What prepositions follow these near synonyms?
[lemma = "struggle|battle|fight"] [tag = "PRP"]


How to allow space between elements

Words often appear between your target elements. For example, nouns are often preceded by determiners and adjectives, phrasal and delexical verb groups often have other elements between their components, and all sorts of phrases and structures permit variation.
The empty brackets allow any one word to appear in between.
[lemma = ""][] [lemma = "approach"]
The number between the braces {} indicates the number of words permitted in between. This query asks for exactly three words between make and success.
[lemma = "make"][]{3}[lemma ="success"]
Using {1,3} gives the range - from one to three. This query asks for one or two or three words between let and down.
[lemma = "let"][]{1,3}[word ="down"]
This query asks for one to five words separating whether and or not.
[word = "whether"][]{1,5}[word ="or"][word ="not"]
This query asks for one to three words between approach and a noun (singular or plural), which is followed by an infinitive with to.
[lemma ="approach"] []{1,3}[tag="NN."] [tag = "TO0"][tag = "VVI"]  
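
To make the effect of the braces concrete, the sketch below is a very small matcher that walks through a list of words and allows a gap of a given size between elements. It is a simplified illustration and not the Word Sketch Engine's own method: the names tok, gap, match_at and search are hypothetical, the sentences are invented, and the sketch simply takes the shortest gap that works.

```python
import re

def tok(**conds):
    """One query element; every attribute pattern must match, like [a="x" & b="y"]."""
    return (conds, 1, 1)

def gap(low, high):
    """Between low and high unspecified words, like []{low,high}."""
    return ({}, low, high)

def match_at(tokens, i, query):
    """Return the end index if the query matches starting at position i, else None."""
    if not query:
        return i
    conds, low, high = query[0]
    n = 0  # how many words this element has consumed so far
    while True:
        if n >= low:
            end = match_at(tokens, i + n, query[1:])
            if end is not None:
                return end
        if n == high or i + n >= len(tokens):
            return None
        t = tokens[i + n]
        if not all(re.fullmatch(p, t[a]) for a, p in conds.items()):
            return None
        n += 1

def search(tokens, query):
    """Return (start, end) index pairs for every start position that matches."""
    hits = []
    for i in range(len(tokens)):
        end = match_at(tokens, i, query)
        if end is not None:
            hits.append((i, end))
    return hits

def sentence(pairs):
    """Turn (word, lemma) pairs into the dictionaries the matcher expects."""
    return [{"word": w, "lemma": l} for w, l in pairs]

# Like [lemma = "let"][]{1,3}[word = "down"]
query = [tok(lemma="let"), gap(1, 3), tok(word="down")]

s1 = sentence([("She", "she"), ("let", "let"), ("the", "the"),
               ("whole", "whole"), ("team", "team"), ("down", "down")])
s2 = sentence([("They", "they"), ("let", "let"), ("down", "down"),
               ("everyone", "everyone")])

print(search(s1, query))  # [(1, 6)]: three words between let and down
print(search(s2, query))  # []: no word between let and down, so no match
```

The second sentence shows that {1,3} requires at least one intervening word.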


How to exclude elements 

The exclamation mark preceding the equals sign means does not equal. The following query will find fast as a noun, verb and adverb, but not as an adjective.
[lemma="fast" & tag != "AJ0"]
The next example finds dream followed by anything but about.
[lemma="dream"] [word !="about"]
The next example finds all forms of break followed by exactly five words and then smile, but not as a verb.
[lemma = "break"] []{5} [lemma="smile" & tag !="V.."]


Searching for punctuation 

As some punctuation marks serve as query operators, they must be escaped with a backslash \. The first example here searches for which preceded by a comma; the second for which not preceded by a comma.
[word = "\,"][word = "which"]
[word != "\,"][word = "which"]

BNC Anomalies

One word equals two (and more)

You can search for don't as a lemma but it returns the noun form as in the do's and don'ts.  To search for contractions, ..... The full list of these forms and their tags can be found at the BNC site.

Two words (and more) equal one

Many multi-word units (MWU) have been tagged as single words, for example in case, every so often, out of touch with. This affects your results if, for example, you look for the preposition preceding case or touch by searching for those key words: the MWU counts as one word, so such a search will not find it. Try it and see!
The full list of these MWUs and their tags can be found at the BNC site.


Word Sketch Engine Links

If you are not already logged in, click on Cancel on the log in window and you will be taken to the relevant registration page.
