%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
-\title{Improving plagiarism detection}
+\title{Neco Simonovo and Feature Type Selection for Pairwise Document Comparison}
%%% Please do not remove the subtitle.
\subtitle{Notebook for PAN at CLEF 2013}
This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition
on uncovering plagiarism, authorship, and social software misuse.
We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance.
-Next, we show changes in selected feature for text alignement which led to plagdet score improvement.
-The results of source retrieval show, that presented approach is adaptable in real-world plagiarism situations.
-Improved results for text alignment achieved in the competition overall third place.
+The results show that the presented approach is adaptable to real-world plagiarism situations.
+For the detailed comparison task, we discuss feature type selection
+and global postprocessing. We have significantly improved the pairwise comparison
+results, and further optimizations are still possible.
\end{abstract}
\section{Introduction}
In the PAN 2013 competition on plagiarism detection, we participated in both the Source Retrieval
-and the Text Alignment subtask. In both tasks we adapted methodology used in PAN 2012.
+and the Text Alignment subtask. In both tasks we adapted the methodology used in PAN 2012\footnote{%
+See \cite{pan2012} for an overview of the PAN 2012 plagiarism detection campaign.}~\cite{suchomel_kas_12}.
Section~\ref{source_retr} describes our querying approach for source retrieval, in which we used three different
types of queries. We present a new type of query based on text paragraphs.
Query execution was controlled by the query type and by preliminary similarities
discovered during the searches.
-In section~\ref{text_alignment} we present modified common text feature fot text alignment.
-We also compare performance of both the previous and the modified algorithms.
-
+In Section~\ref{text_alignment} we describe our approach to the text alignment
+(pairwise comparison) subtask. We briefly introduce our system
+and then discuss the feature types usable for pairwise comparison,
+including an evaluation of their feasibility for this purpose. We then describe
+the global (corpus-wide) optimizations used, and finally we discuss
+the results achieved and directions for further development.
\input{simon-source_retrieval}
\input{yenya-text_alignment}
Unfortunately, the ChatNoir search engine does not support phrasal search, so the
evaluated results may be considerably distorted as a consequence.
+In the text alignment subtask, we have achieved a significant improvement
+over our system from PAN 2012. Further development in this
+area is still possible. For a real-world system, however, a completely
+different set of parameters and heuristics would be needed, because
+the plagdet score and the structure of the competition corpus
+differ too much from real-world conditions.
+
\bibliographystyle{splncs03}
\begin{raggedright}
\bibliography{pan13-notebook}