X-Git-Url: https://www.fi.muni.cz/~kas/git//home/kas/public_html/git/?a=blobdiff_plain;f=paper.tex;fp=paper.tex;h=92e4a39a8646e4c8841563496cc0c2d4b076c511;hb=f60109308b38984bfe852f58aea209863e482ff3;hp=e098f4ae4ceb6da19f51cf9b6876d7e38a20b561;hpb=156645b9e5fe38063870bb9e6447a7167ab44a28;p=pan12-paper.git

diff --git a/paper.tex b/paper.tex
index e098f4a..92e4a39 100755
--- a/paper.tex
+++ b/paper.tex
@@ -8,7 +8,7 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \begin{document}
-\title{Your Title}
+\title{Three Way Search Engine Queries with Multi-feature Document Comparison for Plagiarism Detection}
 %%% Please do not remove the subtitle.
 \subtitle{Notebook for PAN at CLEF 2012}
 
@@ -28,8 +28,20 @@ Briefly describe the main ideas of your approach.
 
 %The notebooks shall contain a full write-up of your approach, including all details necessary to reproduce your results.
 
-Due to the increasing ease of plagirism the plagiarism detection has nowdays become a need for many instutisions. Especially for universities where modern learning methods include e-learning and a vast document sources are online available.
-
+Due to the increasing ease of plagiarism, plagiarism detection has nowadays become a necessity for many institutions.
+This is especially true of universities, where modern learning methods include e-learning and vast document sources are available online.
+The Information System of Masaryk University also contains an antiplagiarism tool based on the same principles as those presented in this paper.
+The core methods for automatic plagiarism detection, which also work in practice on extensive collections of documents,
+are based on computing document similarities. In order to compute a similarity,
+we need to possess both the original and the plagiarized document.
+The most straightforward method is to use an online search engine to enrich
+the document base with potential source documents and then to evaluate the amount of plagiarism by detailed document comparison.
+In this paper we introduce a method which was used in the PAN 2012 competition\footnote{\url{http://pan.webis.de/}}
+on plagiarism detection.
+In the first section we describe our approach to retrieving candidate documents for detailed document comparison from online sources.
+
+The next section describes the methods used for computing document similarities.
+We also discuss the performance ...
 
 
 
@@ -38,8 +50,14 @@ Due to the increasing ease of plagirism the plagiarism detection has nowdays bec
 
 \section{Conclusions}
 
-Tady napsat zaver
+We have presented methods for candidate document retrieval which have led to
+the discovery of a decent amount of plagiarism while minimizing the number of queries used.
+
 
+We have created three main types of queries: keyword-based, intrinsic-plagiarism-based, and header-based.
+....
+%We distinguish two properties of queries: positionable and conditionally executable.
+....
 \bibliographystyle{splncs03}
 \begin{raggedright}
 \bibliography{paper}