X-Git-Url: https://www.fi.muni.cz/~kas/git//home/kas/public_html/git/?a=blobdiff_plain;f=pan13-paper%2Fpan13-notebook.tex;h=1d1330065b5e6b618dcf3cd86229f914d72d23c2;hb=6eefa7da2975121c65c1e7044098647f905af4cb;hp=a6a711656247111e8eabf7c95e05c38c1fbf4a99;hpb=14ecfe62bce797cf4a4dba67481fccce2bba24aa;p=pan13-paper.git

diff --git a/pan13-paper/pan13-notebook.tex b/pan13-paper/pan13-notebook.tex
index a6a7116..1d13300 100755
--- a/pan13-paper/pan13-notebook.tex
+++ b/pan13-paper/pan13-notebook.tex
@@ -7,7 +7,7 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{document}

-\title{Neco Simonovo and Feature Type Selection for Pairwise Document Comparison}
+\title{Diverse Queries and Feature Type Selection for Pairwise Document Comparison}
 %%% Please do not remove the subtitle.
 \subtitle{Notebook for PAN at CLEF 2013}

@@ -22,8 +22,8 @@ This paper describes approaches used for the Plagiarism Detection task in PAN 20
 on uncovering plagiarism, authorship, and social software misuse. We present
 a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance.
 The results show that the presented approach is adaptable to real-world plagiarism situations.
-For the detailed comparison task, we discuss feature type selection,
-global postprocessing. We have significantly improved the pairwise comparison
+For the Detailed Comparison task, we discuss feature type selection
+and global postprocessing. We significantly improved the pairwise comparison
 results with even further optimizations possible.
 \end{abstract}

@@ -36,9 +36,10 @@ Section~\ref{source_retr} describes querying approach for source retrieval, wher
 types of queries. We present a new type of query based on text paragraphs.
 The query execution was controlled by its type and by preliminary similarities discovered during the searches.
-In Section~\ref{text_alignment} we describe our approach for the text alignment
+Section~\ref{text_alignment} describes our approach for the text alignment
 (pairwise comparison) subtask. We briefly introduce our system,
-and then we discuss the feature types, which are usable for pairwise comparison,including the evaluation of their feasibility for this purpose. We then describe
+and then we discuss the feature types, which are usable for pairwise comparison,
+including the evaluation of their feasibility for this purpose. We then describe
 the global (corpus-wide) optimizations used, and finally we discuss
 the results achieved and further development.

@@ -47,8 +48,12 @@ the results achieved and further development.

 \section{Conclusions}

-
-Unfortunately the ChatNoir search engine does not support phrasal search, therefore it
+We introduced a querying strategy with a snippet similarity measure, which proved to be
+competitive. In the source retrieval subtask, the strategy achieved the second-best ratio
+of recall to the number of used queries.
+We focused our queries on selected parts of the text
+and on parts with no discovered external similarities.
+Unfortunately, the ChatNoir search engine currently does not support phrasal search; therefore it
 is possible that the evaluated results are distorted as a result.

 In the text alignment subtask, we have achieved a significant improvement