source retrieval, querying, search engine, snippet, URL download, plagiarism detection, pairwise document comparison, plagiarized passage detection, common features, valid intervals