Corpus Managers and their effective implementation
This thesis deals with corpus managers, software tools for text
processing. A corpus is understood here as a large
collection of texts in electronic form. It serves as a source of
empirical language data, i.e. words, their meanings, and the contexts
in which they occur. Corpora can be employed in many fields of
linguistics (morphology, syntax, semantics, stylistics,
sociolinguistics, etc.), and corpus managers are the primary tools
for exploring them.
In this work we describe and explain which services a corpus manager
should and can offer. We describe the individual features from the
user's viewpoint together with the respective implementation
problems. For the key operations of a corpus manager we present
algorithms and data structures that guarantee fast performance with
minimal requirements on main memory and disk space.
Our results are already being used to build a new, faster, and more
efficient corpus manager.