From: Simon Suchomel <xsuchom1@anxur.fi.muni.cz>
Date: Thu, 19 Sep 2013 13:19:02 +0000 (+0200)
Subject: First version of Simon's finished part
X-Git-Tag: 20130920-vytisteno~9
X-Git-Url: https://www.fi.muni.cz/~kas/git//home/kas/public_html/git/?p=pan13-paper.git;a=commitdiff_plain;h=b92a122ec0f3aca815db768cbd5ff1cde427cd38

First version of Simon's finished part
---
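
Note on the Selecting text added to poster.tex below: the 2-tuple measure it describes (neighbouring word pairs shared by a snippet and the suspicious document) could be sketched roughly as follows. This is only an illustrative sketch, not the code used for the paper. The function names, the lowercased `\w+` tokenization, counting each distinct pair once, and the `>=` comparison against the threshold are all assumptions that the poster does not specify.

```python
import re


def word_pairs(text):
    """Return the set of neighbouring word pairs (2-tuples) in text.

    Assumption: words are lowercased runs of word characters.
    """
    words = re.findall(r"\w+", text.lower())
    return set(zip(words, words[1:]))


def shared_pairs(snippet, suspicious):
    """Count distinct neighbouring word pairs present in both texts."""
    return len(word_pairs(snippet) & word_pairs(suspicious))


def should_download(snippet, suspicious, threshold):
    """Hypothetical download decision: fetch when enough pairs are shared."""
    return shared_pairs(snippet, suspicious) >= threshold
```

The threshold itself would still have to be chosen from the curves in the snippet graph, as the poster text explains.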

diff --git a/pan13-poster/img/document_awfc.pdf b/pan13-poster/img/document_awfc.pdf
index 71e5eff..0a48308 100755
Binary files a/pan13-poster/img/document_awfc.pdf and b/pan13-poster/img/document_awfc.pdf differ
diff --git a/pan13-poster/img/document_keywords.pdf b/pan13-poster/img/document_keywords.pdf
new file mode 100755
index 0000000..f60baf6
Binary files /dev/null and b/pan13-poster/img/document_keywords.pdf differ
diff --git a/pan13-poster/img/document_paragraphs.pdf b/pan13-poster/img/document_paragraphs.pdf
new file mode 100755
index 0000000..38c4372
Binary files /dev/null and b/pan13-poster/img/document_paragraphs.pdf differ
diff --git a/pan13-poster/img/queryprocess.pdf b/pan13-poster/img/queryprocess.pdf
new file mode 100755
index 0000000..e6d8a1a
Binary files /dev/null and b/pan13-poster/img/queryprocess.pdf differ
diff --git a/pan13-poster/poster.tex b/pan13-poster/poster.tex
index 42987ac..5e3c9a0 100755
--- a/pan13-poster/poster.tex
+++ b/pan13-poster/poster.tex
@@ -116,41 +116,32 @@
 
 
 \begin{multicols}{2}\setlength{\columnseprule}{0pt}
-
-
 \section{Introduction}
-
+%
 PAN 2013 LOrem ipsum Lorem ipsum Lorem ipsumLorem ipsumLorem ipsumLorem ipsumLorem ipsum 
-
-
+%
 \vfill
 \columnbreak
-
+%
 \begin{figure}
  \centering
-  \includegraphics[width=0.8\textwidth]{img/source_retrieval_process.pdf}
+  \includegraphics[width=0.6\textwidth]{img/source_retrieval_process.pdf}
   \caption{Plagiarism discovery process.}
   \label{fig:process}
 \end{figure} 
-
-
 \end{multicols}
-
-
-
 \begin{multicols}{2}
-
 %\rm
-
 %%% Introduction
 \section{Querying}
 Querying means to effectively utilize the search engine in order to retrieve as many relevant
 documents as possible with the minimum amount of queries.
 %We consider the resulting document relevantif it shares some of text characteristics with the suspicious document.
-In real-world queries as such represent appreciable cost, therefore their minimization should be one of the top priorities. \\
-\subsection{Types of Queries}
-From the suspicious document, there were three diverse types of queries extracted.
-\subsubsection{Keywords Based Queries}
+In real-world use, queries represent an appreciable cost; therefore, their minimization should be one of the top priorities.
+%\subsection{Types of Queries}
+Three diverse types of queries were extracted from the suspicious document.\\
+\begin{minipage}{0.55\linewidth}
+\subsection{Keywords Based Queries}
 \begin{ytemize}
 \item TF--IDF base automated keywords extraction;
 \item 5-token long; 
@@ -158,9 +149,15 @@ From the suspicious document, there were three diverse types of queries extracte
 \item Non-positional;
 \item Non-phrasal.
 \end{ytemize}
-
+\end{minipage}
+\begin{minipage}{0.45\linewidth}
+\begin{figure}[h]
+ %\centering
+  \includegraphics[width=1\linewidth]{img/document_keywords.pdf}
+\end{figure}
+\end{minipage}
 \begin{minipage}{0.55\linewidth}
-\subsubsection{Intrinsic Plagiarism Based Queries}
+\subsection{Intrinsic Plagiarism Based Queries}
 \begin{ytemize}
 \item Averaged Word Frequency Class based chunking~\cite{AWFC};
 \item Random sentence selection from the chunk;
@@ -175,16 +172,35 @@ From the suspicious document, there were three diverse types of queries extracte
   \includegraphics[width=1\linewidth]{img/document_awfc.pdf}
 \end{figure}
 \end{minipage}
-
-\subsubsection{Paragraph Based Queries}
+\begin{minipage}{0.55\linewidth}
+\subsection{Paragraph Based Queries}
 \begin{ytemize}
 \item Longest sentences from miscellaneous paragraphs;
 \item Deterministic;
 \item Positional;
 \item Phrasal.
 \end{ytemize}
+\end{minipage}
+\begin{minipage}{0.45\linewidth}
+\begin{figure}[h]
+ %\centering
+  \includegraphics[width=1\linewidth]{img/document_paragraphs.pdf}
+\end{figure}
+\end{minipage}
+
+\begin{figure}[h]
+ \centering
+  \includegraphics[width=0.8\linewidth]{img/queryprocess.pdf}
+   \caption{Stepwise query execution process.}
+\end{figure}
 
 \section{Selecting}
+Document snippets were used to decide whether to download a document for text alignment.
+We used a 2-tuple measure, which counts how many neighbouring word pairs occur in both the snippet and the suspicious document.
+The performance of this measure is shown in Figure~\ref{fig:snippet_graph}.
+Given this measure, a download threshold must be set so as to maximize the number of discovered similarities
+and minimize the total number of downloads.
+A profitable threshold is the one at which the distance between those two curves is the largest.
 \begin{figure}
   \centering
   \includegraphics[width=0.8\textwidth]{img/snippets_graph.pdf}
@@ -192,6 +208,7 @@ From the suspicious document, there were three diverse types of queries extracte
   \label{fig:snippet_graph}
 \end{figure}
 
+
 %
 % Yenyova cast
 %