1st version of Simon's finished part

diff --git a/pan13-poster/poster.tex b/pan13-poster/poster.tex

index c3ab1c57b5ad05ceb3e72724b69833b96c2be5cb..5e3c9a095b4e30283e5c9b63b798d7325becc268 100755 (executable)
--- a/pan13-poster/poster.tex
+++ b/pan13-poster/poster.tex
@@ -7,6 +7,8 @@
  \usepackage{bera}\r
  \usepackage[utf8]{inputenc}\r
  %\usepackage{fancybullets}\r
+%\usepackage{floatflt}\r
+%\usepackage{graphics}\r
  \r
  \definecolor{BoxCol}{rgb}{0.9,0.9,1}\r
  % uncomment for light blue background to \section boxes \r
@@ -114,38 +116,32 @@
  \r
  \r
  \begin{multicols}{2}\setlength{\columnseprule}{0pt}\r
-\r
-\r
  \section{Introduction}\r
+%\r
  PAN 2013 LOrem ipsum Lorem ipsum Lorem ipsumLorem ipsumLorem ipsumLorem ipsumLorem ipsum \r
-\r
-\r
-\r
+%\r
+\vfill\r
+\columnbreak\r
+%\r
  \begin{figure}\r
   \centering\r
-  \includegraphics[width=0.8\textwidth]{img/source_retrieval_process.pdf}\r
+  \includegraphics[width=0.6\textwidth]{img/source_retrieval_process.pdf}\r
    \caption{Plagiarism discovery process.}\r
    \label{fig:process}\r
  \end{figure} \r
-\r
-\r
  \end{multicols}\r
-\r
-\r
-\r
  \begin{multicols}{2}\r
-\r
  %\rm\r
-\r
  %%% Introduction\r
  \section{Querying}\r
  Querying means effectively utilizing the search engine in order to retrieve as many relevant\r
  documents as possible with the minimum number of queries.\r
  %We consider the resulting document relevant if it shares some text characteristics with the suspicious document.\r
-In real-world queries as such represent appreciable cost, therefore their minimization should be one of the top priorities. \\\r
-\subsection{Types of Queries}\r
-From the suspicious document, there were three diverse types of queries extracted.\r
-\subsubsection{Keywords Based Queries}\r
+In real-world settings, queries represent an appreciable cost; their minimization should therefore be one of the top priorities. \r
+%\subsection{Types of Queries}\r
+Three diverse types of queries were extracted from the suspicious document.\\\r
+\begin{minipage}{0.55\linewidth}\r
+\subsection{Keywords Based Queries}\r
  \begin{ytemize}\r
  \item TF--IDF based automated keyword extraction;\r
  \item 5-token long; \r
@@ -153,7 +149,15 @@ From the suspicious document, there were three diverse types of queries extracte
  \item Non-positional;\r
  \item Non-phrasal.\r
  \end{ytemize}\r
-\subsubsection{Intrinsic Plagiarism Based Queries}\r
+\end{minipage}\r
+\begin{minipage}{0.45\linewidth}\r
+\begin{figure}[h]\r
+ %\centering\r
+  \includegraphics[width=1\linewidth]{img/document_keywords.pdf}\r
+\end{figure}\r
+\end{minipage}\r
+\begin{minipage}{0.55\linewidth}\r
+\subsection{Intrinsic Plagiarism Based Queries}\r
  \begin{ytemize}\r
  \item Averaged Word Frequency Class based chunking~\cite{AWFC};\r
  \item Random sentence selection from the chunk;\r
@@ -161,15 +165,42 @@ From the suspicious document, there were three diverse types of queries extracte
  \item Positional;\r
  \item Phrasal.\r
  \end{ytemize}\r
-\subsubsection{Paragraph Based Queries}\r
+\end{minipage}\r
+\begin{minipage}{0.45\linewidth}\r
+\begin{figure}[h]\r
+ %\centering\r
+  \includegraphics[width=1\linewidth]{img/document_awfc.pdf}\r
+\end{figure}\r
+\end{minipage}\r
+\begin{minipage}{0.55\linewidth}\r
+\subsection{Paragraph Based Queries}\r
  \begin{ytemize}\r
  \item Longest sentences from miscellaneous paragraphs;\r
  \item Deterministic;\r
  \item Positional;\r
  \item Phrasal.\r
  \end{ytemize}\r
+\end{minipage}\r
+\begin{minipage}{0.45\linewidth}\r
+\begin{figure}[h]\r
+ %\centering\r
+  \includegraphics[width=1\linewidth]{img/document_paragraphs.pdf}\r
+\end{figure}\r
+\end{minipage}\r
+\r
+\begin{figure}[h]\r
+ \centering\r
+  \includegraphics[width=0.8\linewidth]{img/queryprocess.pdf}\r
+   \caption{Stepwise query execution process.}\r
+\end{figure}\r
  \r
  \section{Selecting}\r
+Document snippets were used to decide whether to download a document for text alignment.\r
+We used a 2-tuple measure, which indicates how many neighbouring word pairs occur in both the snippet and the suspicious document.\r
+The performance of this measure is shown in Figure~\ref{fig:snippet_graph}.\r
+Given this measure, a threshold for the download decision needs to be set in order to maximize the number of discovered similarities\r
+and minimize the total number of downloads.\r
+A profitable threshold is one that corresponds to the largest distance between those two curves.\r
  \begin{figure}\r
    \centering\r
    \includegraphics[width=0.8\textwidth]{img/snippets_graph.pdf}\r
@@ -177,8 +208,17 @@ From the suspicious document, there were three diverse types of queries extracte
    \label{fig:snippet_graph}\r
  \end{figure}\r
  \r
+\r
+%\r
+% Yeny's part\r
+%\r
+\r
  \section{Text Alignment}\r
  \r
+%\r
+% Shared part\r
+%\r
+\r
  \section{Conclusion}\r
  \r
  Some conclusion
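
A minimal sketch of the snippet-based selection described in the Selecting section added by this diff, written in Python purely for illustration. The function names, the tokenization, and the labelled `is_source` input are hypothetical. Reading "those two curves" as the share of similarities still discovered versus the share of documents downloaded is also an assumption; none of this is taken from the poster or from the authors' code.

```python
import re


def word_pairs(text):
    """Return the set of neighbouring word pairs (2-tuples) in `text`."""
    # Assumption: lowercase word tokens; the poster does not specify tokenization.
    words = re.findall(r"\w+", text.lower())
    return set(zip(words, words[1:]))


def shared_pairs(snippet, suspicious):
    """Count the word pairs that occur in both the snippet and the document."""
    return len(word_pairs(snippet) & word_pairs(suspicious))


def best_threshold(scores, is_source):
    """Pick the threshold with the largest gap between the two curves.

    scores[i] is the shared-pair count of candidate i; is_source[i] says
    whether downloading it would reveal a similarity (hypothetical labels).
    """
    total_sources = sum(is_source) or 1
    best, best_gap = 0, float("-inf")
    for t in sorted(set(scores)):
        picked = [src for s, src in zip(scores, is_source) if s >= t]
        found = sum(picked) / total_sources    # similarities still discovered
        downloads = len(picked) / len(scores)  # documents downloaded
        if found - downloads > best_gap:
            best, best_gap = t, found - downloads
    return best
```

With labelled candidates, `best_threshold` returns the lowest score cut-off at which the gap between discovered similarities and downloads is widest; a candidate would then be downloaded only if `shared_pairs` reaches that cut-off.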