First draft

diff --git a/paper.tex b/paper.tex

index 9a907578daf0f480748324a59f8fa2bea28f2dd8..dd3c3b7eb3bca282660dc46085e1476516aead19 100755 (executable)
--- a/paper.tex
+++ b/paper.tex
@@ -7,6 +7,7 @@
  \usepackage{algorithm}
  \usepackage{algorithmic}
  \usepackage{amssymb}
+\usepackage{multirow}
  
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \begin{document}
@@ -33,17 +34,22 @@ Briefly describe the main ideas of your approach.
  
  Due to the increasing ease of plagiarism, plagiarism detection has become a necessity for many institutions,
  especially for universities, where modern learning methods include e-learning and vast document sources are available online.
-In the Information System of Masaryk University there is also an antiplagiarism tool which is based upon the same principles as are shown in this paper.
+%In the Information System of Masaryk University~\cite{ismu} there is also an antiplagiarism tool which is based upon the same principles as are shown in this paper.
  The core methods for automatic plagiarism detection, which also work in practice on extensive collections of documents,
  are based on computing document similarities. To compute a similarity,
  we need both the original and the plagiarized document.
-The most straightforward method is to use an online search engine in order to enrich
-document base with potential plagiarized documents and evaluate the amount of plagiarism by detailed document comparison. 
-In this paper we introduce a method which has been used in PAN 2012 competition\footnote{\url{http://pan.webis.de/}}
-in plagiarism detection.
-In the first section we described our aproach to retrieve candidate documents for detailed document comparison from online sources.
+%The most straightforward method is to use an online search engine in order to enrich
+%document base with potential plagiarized documents and evaluate the amount of plagiarism by detailed document comparison. 
+%In this paper we introduce a method which has been used in PAN 2012 competition\footnote{\url{http://pan.webis.de/}}
+%in plagiarism detection.
+In the first section we introduce methods for candidate document retrieval from online sources, which took part in
+the PAN 2012 plagiarism detection competition\footnote{\url{http://pan.webis.de/}}.
+The task was to retrieve a set of candidate source documents that may have served as originals to plagiarize from.
+The PAN 2012 candidate document retrieval test corpus comprised 32 text documents, each containing at least one plagiarism case.
+The documents were approximately 30 KB in size; the smallest was 18 KB and the largest 44 KB.
+
+In the second section we describe our approach to detailed document comparison.
   
-The next section describes used methods of computation document similarities.
  We also discuss the performance ...
  
  
@@ -53,14 +59,16 @@ We also discuss the performance ...
  
  \section{Conclusions}
  
-We have presented methods for candidate document retrieval which has led to
-discovery the decent amount of plagiarism with minimizing the number of used queries.   
+We present methods for candidate document retrieval that lead to the
+discovery of a decent amount of plagiarism while minimizing the number of queries used.
+The proposed methods are applicable in general to any type of text input, with no a priori information about the input document.
+In the PAN 2012 competition the proposed methods detected a similar amount of plagiarism while using
+only a small fraction of the queries used by the others.
+ 
+
+   
  
-We have created three main types of queries: keywords based, intrinsic plagiarism based and headers based.
-....
-%We distinguish two properties of queries: positionable, conditionally executable  
  
-....
  \bibliographystyle{splncs03}
  \begin{raggedright}
  \bibliography{paper}