The report FIMU-RS-2008-04
Distributed System for Discovering Similar Documents
A full version of the paper presented at the ICEIS 2008 conference (www.iceis.org). July 2008, 14 pages.
One of the drawbacks of e-learning methods such as Web-based submission
and evaluation of students' papers and essays is that it has become easier
for students to plagiarize the work of other people.
In this paper we present a computer-based system for discovering
similar documents, which has been in use at Masaryk University in Brno
since August 2006, and which will also be used
in the forthcoming Czech national archive of graduate theses. We
focus on the practical aspects of this system: achieving near real-time response
to newly imported documents, and the computational feasibility of handling large
sets of documents on commodity hardware. We also discuss the possibilities
and problems of parallelizing this system to run on a distributed
cluster of computers.