The problem discussed in this paper arose during the planning stage of a project to construct the IPI PAN corpus of written Polish. The corpus is planned to contain at least 75-100 million words and to be annotated structurally and morphosyntactically in line with the recommendations of the Corpus Encoding Standard Guidelines (Ide et al. 1996). It will comprise several subcorpora divided by text genre (e.g., literary texts, dialogue transcripts), a balanced reference subcorpus intended to be representative of modern standard Polish, and a hand-verified subcorpus for training the morphosyntactic tagger. In what follows, we briefly report on a class of design problems related to the use of so-called stand-off morphosyntactic and structural annotation, which the Corpus Encoding Standard advocates.