Jan Pazdziora: Perl

Czech version of this page

Perl modules

XBase / DBD::XBase for dbf files

The DBD::XBase distribution (version 0.241) contains the XBase module, which provides read and write access to dbf files, including memo information in dbt and fpt files. The distribution also includes the DBI driver DBD::XBase, which provides an SQL interface.

Distribution Cstools with cstocs

The Cstools distribution (version 3.42) contains the well-known character set conversion program cstocs, and the module Cz::Cstocs, which makes it easy to do charset conversions directly in Perl scripts without spawning an external process. Also included is the Cz::Sort module, which provides functions for Czech collation in Perl scripts without the need for locale settings. It is based on the conversion table from the csr program by Petr Olšák.

Docclient / docserver for Word and Excel

Docserver (version 1.12) provides conversion of proprietary formats using remote calls of native Windows applications. You need a Windows box with Word and/or Excel; run docserver on it, and then you can run conversions to plain text, HTML or other formats from anywhere, using either Perl code or command line scripts. The results, however, are only as good as what those native applications produce.

Principles of the solution as presented at YAPC::Europe 2000 (also available in PostScript).

DBIx::ShowCaller

When DBIx::ShowCaller (version 0.80) is used to connect to the database instead of a direct DBI->connect, it adds Perl caller information to SQL commands. This makes it easier to debug the system using, for example, the command cache and logs of the database server. The SQL commands look like

/* /www/scripts/script.pl at line 25 */
select name from products where id = ?
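The idea of tagging SQL with its call site can be sketched with Perl's built-in caller function. This is only an illustration of the technique, not the module's actual implementation; the helper name annotate_sql is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBIx::ShowCaller): prepend an SQL
# comment naming the file and line from which the helper was called.
sub annotate_sql {
    my ($sql) = @_;
    # caller(0) describes the call site of this sub: package, file, line.
    my (undef, $file, $line) = caller(0);
    return "/* $file at line $line */\n$sql";
}

my $sql = annotate_sql('select name from products where id = ?');
print "$sql\n";
```

The annotated string can then be passed to prepare as usual. A real wrapper would also have to make sure the file name cannot contain the sequence that closes the comment.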

Fixup Compress::Bzip2

I fixed the Compress::Bzip2 1.00 module to work with current (1.0.2+) versions of the bzip2 library; fetch the Compress-Bzip2-1.0002.tar.bz2 distribution.

RayApp

A module with which Perl scripts running in the mod_perl environment do not generate any markup output (like HTML), but return data that is embedded into a data structure description (DSD). The result can be processed on the server, for example with XSLT, or sent to the client.

MyConText / DBIx::FullTextSearch — fulltext indexing using MySQL

This module is currently on CPAN and SourceForge as DBIx::FullTextSearch. I haven't been participating in the development lately.

A flexible module for indexing word occurrences in documents. Anything can be a document: a file, a record in MySQL, a Web page. MyConText provides methods for building a complex index and search solution. It is not an application, but a tool for creating such an application. The main advantage lies in the fact that additional modules can fine-tune the basic behaviour. For example, a lemmatizer, an email header parser or an HTML parser are specified as Perl code. It is the developer of the final solution who codes the behaviour; this module only provides the interface.

Tie::STDERR

Module Tie::STDERR (version 0.26) catches output that goes to standard error and sends it via email, or writes it to a file or to a process. The email is sent, or the program run, only if there actually is some output, as cron does.
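The underlying mechanism, a tied filehandle, can be sketched with core Perl alone. The class name StderrCollector below is hypothetical and this is not the module's own code; it only shows how output printed to STDERR can be collected and acted on when non-empty.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only: it collects whatever is
# printed to the handle it is tied to. Ignores $, and $\ for brevity.
package StderrCollector;

sub TIEHANDLE {
    my ($class) = @_;
    my $buffer = '';
    return bless \$buffer, $class;
}

sub PRINT {
    my ($self, @items) = @_;
    $$self .= join '', @items;
    return 1;
}

sub PRINTF {
    my ($self, $format, @args) = @_;
    $$self .= sprintf $format, @args;
    return 1;
}

sub content { my ($self) = @_; return $$self }

package main;

tie *STDERR, 'StderrCollector';
print STDERR "something went wrong\n";
printf STDERR "%s=%d\n", 'code', 2;

my $captured = tied(*STDERR)->content;
untie *STDERR;

# Act only when something was actually written, as cron does.
print "would report:\n$captured" if length $captured;
```

In a real setup the final step would send the collected text by email or write it to a file instead of printing it.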

CGI::BuildPage

In the Faculty administration of FI and Information system of MU projects, we used CGI scripts with a Perl module that automatically finishes the HTML output with visual elements like icons, colors, logos or titles. The interface is the same as that of CGI.pm, but the HTML code that CGI.pm returns from method calls is here stored inside the $query object. The object also has some additional methods that support, for example, two-column output. Here we present an export version called CGI::BuildPage (version 0.95).

Font::TFM

Module Font::TFM (version 0.130) provides access to information stored in TeX font metric (TFM) files.

TeX::DVI

Module TeX::DVI (version 0.110) makes it easy to produce output in DVI (DeVice Independent) format. The distribution also contains the modules TeX::DVI::Parse and TeX::DVI::Print, which parse DVI input and print its content (similar to dvitype) and also make it easy to assign callbacks to objects in the file.

TeX::Hyphen

Module TeX::Hyphen (version 0.140) creates an object that returns hyphenation information based on TeX hyphenation patterns.

Chaining of mod_perl handlers in Apache server

Module Apache::OutputChain (version 0.11) provides a way of implementing filtering in the Apache server using mod_perl handlers. Alternatively, you may want to check the Apache::Filter module on CPAN, or use Apache 2+ and mod_perl 1.99+ and their native filtering features.

Access to spellchecking library by Pavel Ševeček

Module Lspell (version 0.30) provides access to Ševeček's lspell library directly from Perl scripts.

Czech support for GD library

The GD::Latin2 distribution (version 0.54) contains font definitions and a patch that adds support for Czech texts to the libgd library. Also included is the program bdftogd, which was used to generate the fonts and can be used to add further encodings.

ISO-8859-2 fonts are now directly in the GD distribution. I'm leaving the package here as an example and description of ways of changing gdlib character sets. The distribution includes a script to print the patterns and its output.

My diploma work — typographic module

The main part of my diploma work consisted of the module BGP.pm, which implemented basic typographical data types (box, glue, penalty) and a typesetting algorithm. The above TeX modules were also used.

License

These modules are free software; you can distribute and modify them under the same terms as Perl itself: either the GNU General Public License or the Artistic License. This is the primary source; copies are available from CPAN.

Czech Perl mailing list

The Czech Perl mailing list runs on the list server listserv@muni.cz. It is also available in net-news as cz.comp.lang.perl. I've passed the administration over to Yenya Kasprzak, but until he issues a new charter, the original Meta-FAQ of the list (in Czech) is still valid.

CPAN

The basic and only archive of Perl software, modules and documentation, CPAN, is a good start in any search for Perl answers or solutions. One of the mirrors is at ftp.fi.muni.cz/pub/perl/. A basic overview is also in my article about CPAN (in Czech) for Linuxové noviny.

Tutorial at EurOpen

In May 2004, I led a tutorial about Perl, under the title Perl: the correct, standard and clean way (in Czech only). It contains both a mandatory intro to the language features and more interesting topics, including Unicode and UTF-8 and handling Czech data, DBI, and mod_perl.

PV005: Computer network services

For the last couple of years I have been invited to spend some time in the PV005 Computer network services course talking about Perl and its applications. The main stress is put on live demonstrations of the topics, but rather oldish lecture notes are available (in Czech only), together with slides in PostScript and the second part about Perl modules.

In the lecture I show some examples of using Perl and modules from CPAN for building network applications. The scripts shown here should only be used as examples, not as complete solutions. Before running them on a production machine, check all security aspects. For example, with the httpd, what would happen if someone sent you two dots to get to the parent directory?
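One way to guard against the two-dots problem is to normalize the requested path before touching the file system. The following is a minimal sketch under stated assumptions: the helper name safe_path and the document root /var/www are hypothetical, and symbolic links inside the document root are not considered.

```perl
use strict;
use warnings;

# Hypothetical helper: map a request path onto a document root and
# refuse anything that would climb out of it. Returns undef on refusal.
sub safe_path {
    my ($docroot, $request_path) = @_;

    # Decode %XX escapes first, so that %2e%2e is seen as "..".
    $request_path =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return if $request_path =~ /\0/;    # NUL bytes are never legitimate

    my @clean;
    for my $segment (split m{/+}, $request_path) {
        next if $segment eq '' or $segment eq '.';
        if ($segment eq '..') {
            return unless @clean;       # would escape the document root
            pop @clean;
            next;
        }
        push @clean, $segment;
    }
    return join '/', $docroot, @clean;
}

for my $request ('/index.html', '/a/../b.html', '/../etc/passwd') {
    my $file = safe_path('/var/www', $request);
    printf "%-18s %s\n", $request, defined $file ? $file : 'REFUSED';
}
```

Refusing the request outright, rather than silently trimming the extra dots, keeps suspicious requests visible in the logs.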

The simplest way of downloading a document from the Web is using LWP::Simple. A simple solution for recursive downloads from a server is also provided by lwp-rget. If we want to process the HTML document we received, we should use LWP::UserAgent and either write our own regular expression for the relevant data or use existing modules, for example for listing the links or further postprocessing (conversion to absolute URLs).

We can use Perl for parsing httpd logs and producing statistics or for CGI scripts (here you can check its behaviour).
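As a rough illustration of log parsing, the sketch below counts hits per status code and per path and sums the transferred bytes. It assumes the Common Log Format; the sample lines are made up for the example and are not taken from the lecture scripts.

```perl
use strict;
use warnings;

# Made-up sample lines in Common Log Format (an assumption about the input).
my @log_lines = (
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +0200] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:01 +0200] "GET /missing.html HTTP/1.0" 404 209',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 +0200] "GET /index.html HTTP/1.0" 304 -',
);

my (%hits_by_status, %hits_by_path);
my $total_bytes = 0;

for my $line (@log_lines) {
    # host ident user [time] "method path protocol" status bytes
    my ($host, $time, $method, $path, $status, $bytes) =
        $line =~ m{^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)}
        or next;    # skip lines that do not match the format

    $hits_by_status{$status}++;
    $hits_by_path{$path}++;
    $total_bytes += $bytes eq '-' ? 0 : $bytes;    # "-" means no body sent
}

printf "status %s: %d\n", $_, $hits_by_status{$_} for sort keys %hits_by_status;
printf "path %s: %d\n",   $_, $hits_by_path{$_}   for sort keys %hits_by_path;
print  "total bytes: $total_bytes\n";
```

For a real log the loop would read from a filehandle instead of an array, which keeps memory use constant regardless of log size.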

In Perl, we can write network applications directly: a simple server that repeats what is sent to it, another version without fork, and a client example (but beware of buffering). We can even write an http server (also a version without fork) or a proxy that converts HTTP requests to HTTPS, so we can easily print the whole communication.
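A server that repeats what is sent to it, together with a matching client, might look roughly like the following. This is a sketch using the core IO::Socket::INET module, not the lecture's own scripts; fork is used here only so that server and client fit into one self-terminating script, and the port is chosen by the operating system.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on localhost; LocalPort => 0 lets the OS pick a free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
) or die "Cannot listen: $!";
my $port = $server->sockport;

my $pid = fork();
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the echo server. It handles one connection and exits.
    my $conn = $server->accept or die "accept failed: $!";
    $conn->autoflush(1);
    while (my $line = <$conn>) {
        print {$conn} $line;    # send each line straight back
    }
    close $conn;
    exit 0;
}

# Parent: the client.
close $server;
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "Cannot connect: $!";

# Buffering: make sure the line leaves immediately. Recent IO::Socket
# versions turn autoflush on by default; being explicit does no harm.
$client->autoflush(1);

print {$client} "hello\n";
my $reply = <$client>;
close $client;
waitpid $pid, 0;

print "server replied: $reply";
```

If the client wrote without a trailing newline, the server's line-based read would keep waiting, which is another form of the buffering trap mentioned above.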