-We present a modified three-way search methodology for the source retrieval subtask.
-TODO: Add more detail.
-
-For the text alignment subtask, we use an approach similar to the one we used in PAN 2012.
-We detect common features of various types between the suspicious and source
-documents. We experimented with a wider range of feature types; the best
-results were achieved by combining sorted word 4-grams with unsorted stop-word
-8-grams. From the common features we compute valid intervals, which map
-passages of the suspicious document to passages of the source document,
-such that these passages are covered ``densely enough'' by the corresponding
-common features. For PAN 2013, we have modified the post-processing phase:
-because the algorithm had access to the whole corpus of source and
-suspicious documents at once, we could process the documents in one
-batch and perform global post-processing, resolving overlapping
-detections not only within a given pair of suspicious and source documents,
-but also across all detections for a given suspicious document.
-The modifications brought a significant improvement compared to PAN 2012
-on the training corpus, and the results on the competition corpus
-are similar enough to suggest that these improvements hold in general.
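
The two feature types named above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the stop-word list `STOP_WORDS`, and the helper names `tokenize`, `ngram_features`, `extract_features`, and `common_features` are all assumptions made for illustration. The sketch only finds common features and their character offsets; computing valid intervals and post-processing are not shown.

```python
import re

# Hypothetical stop-word list; the list actually used is not given in the text.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
              "with", "he", "be", "on", "i", "that", "by", "at", "you", "are"}


def tokenize(text):
    """Lowercase the text and return (word, character offset) pairs."""
    return [(m.group(0), m.start()) for m in re.finditer(r"\w+", text.lower())]


def ngram_features(tokens, n, sort_words):
    """Map each n-gram key to the character offsets where it starts."""
    features = {}
    for i in range(len(tokens) - n + 1):
        window = [word for word, _ in tokens[i:i + n]]
        if sort_words:
            # Sorting makes the feature insensitive to word order in the window.
            window = sorted(window)
        features.setdefault(" ".join(window), []).append(tokens[i][1])
    return features


def extract_features(text):
    """Combine sorted word 4-grams with unsorted stop-word 8-grams."""
    tokens = tokenize(text)
    stop_tokens = [t for t in tokens if t[0] in STOP_WORDS]
    features = {}
    for key, offsets in ngram_features(tokens, 4, True).items():
        features[("w4", key)] = offsets
    for key, offsets in ngram_features(stop_tokens, 8, False).items():
        features[("s8", key)] = offsets
    return features


def common_features(suspicious, source):
    """Return features present in both documents with their offsets in each."""
    a = extract_features(suspicious)
    b = extract_features(source)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}
```

Tagging each feature with its type (`"w4"` or `"s8"`) keeps the two feature spaces separate, so a word 4-gram can never collide with a stop-word 8-gram that happens to produce the same string.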