+We have represented each feature with the 32 highest-order bits of its
+MD5 digest. This is only a performance optimization, targeted for
+larger systems. The number of features in a document pair is several orders
+of magnitude lower than $2^{32}$, so the probability of hash function
+collision is low. For pair-wise comparison, it would be feasible to compare
+the features directly instead of their MD5 sums.
+
+Each feature has also the offset and length attributes.
+Offset is taken as the offset of the first word in a given feature,
+and length is the offset of the last character in a given feature
+minus the offset of the feature itself.
+
+\subsection{Common Features}
+
+For further processing, we have taken into account only the features
+present both in source and suspicious document. For each such
+{\it common feature}, we have created the list of
+$(\makebox{offset}, \makebox{length})$ pairs for the source document,
+and a similar list for the suspicious document. Note that a given feature
+can occur multiple times both in source and suspicious document.
+
+\subsection{Valid Intervals}
+
+To detect a plagiarized passage, we need to find a set of common features,
+which map to a dense-enough interval both in the source and suspicious
+document. In our previous work, we have presented the algorithm
+for discovering these {\it valid intervals} \cite{Kasprzak2009a}.
+A similar approach is used also in \cite{stamatatos2011plagiarism}.
+Both of these algorithms use the features of a single type, which
+allows to use the ordering of features as a measure of distance.
+
+When we use features of different types, there is no natural ordering
+of them: for example, a stop-word 8-gram can span multiple sentences,
+which can contain several word 5-grams. The assumption of both of the
+above algorithms, that the last character of the previous feature
+is before the last character of the current feature, is broken.
+
+We have modified the algorithm for computing valid intervals with
+multi-feature detection to use character offsets