X-Git-Url: https://www.fi.muni.cz/~kas/git//home/kas/public_html/git/?a=blobdiff_plain;f=pan13-paper%2Fsimon-source_retrieval.tex;h=2777f3777f544a469697bf2305e6226e1fa9d287;hb=b2a29fbd0610261c2e1ba0738b181a9a98ed01ee;hp=4370a1de5ae5f1a2bedd226e9167c0c149823621;hpb=ebba97ad24be305e65ceb7cfdbb34d54d9a6bfba;p=pan13-paper.git

diff --git a/pan13-paper/simon-source_retrieval.tex b/pan13-paper/simon-source_retrieval.tex
index 4370a1d..2777f37 100755
--- a/pan13-paper/simon-source_retrieval.tex
+++ b/pan13-paper/simon-source_retrieval.tex
@@ -5,7 +5,7 @@ large corpus. Those candidate documents are usually further compared in detail w
 suspicious document.
 In PAN 2013 source retrieval subtask the main goal was to identify web pages
 which have been used as a source of plagiarism for test corpus creation.
-The test corpus contained 58 documents each discussing only one theme.
+The test corpus contained 58 documents each discussing one topic only.
 Those documents were created intentionally by semiprofessional writers, thus they featured
 nearly realistic plagiarism cases~\cite{plagCorpus}.
 Resources were looked up in the ClueWeb\footnote{\url{http://lemurproject.org/clueweb09.php/}} corpus.