There are three ways to convert between Czech encodings:

Convert using cstocs

On the faculty machines, the cstools module provides the cstocs program for converting between the individual Czech encodings.

Usage is as follows:

Add the cstools module: module add cstools

Suppose we have a document prace.txt in the Windows 1250 encoding that we want to read on a UNIX system that uses ISO-8859-2. Then just write
cstocs 1250 il2 <prace.txt >prace.il2.txt
and the file, now in ISO-8859-2, is saved as prace.il2.txt.

Which encodings does cstocs support? Here is part of the manual page (the whole page is available with the command man cstocs):

AVAILABLE ENCODINGS
     ascii
          This is 7-bit ASCII encoding.  Can  be  used  to  strip
          diacritic from characters.

     il1  ISO-8859-1 (West European languages)

     il2  ISO-8859-2 (East European languages)

     cork Cork (T1) encoding used by TeX's DC fonts and by  LaTeX
          2e.

     kam  Kamenicky encoding (it  was  one  of  the  most  popuar
          encodings in Czech/Slovak language space).

     koi  It is KOI8-cs encoding (very old).

     vga  Encoding used by standard IBM PC vga cards.

     pc2  PC-Latin2 encoding, supported by M$-DOS.

     1250 Encoding used by czech M$-Windows.

Since cstocs works with standard input and output, it can also be used in a pipeline on the command line:

skript_generujici_text | cstocs il2 1250 | lpr -Ptlj4 

Alternatively, iconv can be used for transcoding, as described in the next section.

Convert using iconv

On the faculty machines, the iconv program is available in the base installation. It converts between individual encodings; to list the encodings it supports, use:

iconv -l
Using iconv is similar to using cstocs. Here is its help text (see also man iconv):
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.

 Input/Output format specification:
  -f, --from-code=NAME       encoding of original text
  -t, --to-code=NAME         encoding for output

 Information:
  -l, --list                 list all known coded character sets

 Output control:
  -c                         omit invalid characters from output
  -o, --output=FILE          output file
  -s, --silent               suppress warnings
      --verbose              print progress information

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
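
The cstocs example above can be reproduced with iconv. The following is a minimal sketch, assuming an iconv that knows the names CP1250 and ISO-8859-2 (check with iconv -l); the sample file contents are made up for illustration.

```shell
# Create a small sample file in Windows-1250 (contents are illustrative only).
# In Windows-1250 the letter "š" is byte 0x9A, written here as octal \232.
printf '\232i\232ka\n' > prace.txt

# -f names the source encoding, -t the target encoding.
iconv -f CP1250 -t ISO-8859-2 prace.txt > prace.il2.txt

# Show the converted bytes in octal; in ISO-8859-2 "š" is byte 0xB9 (octal 271).
od -An -b prace.il2.txt
```

Note that iconv names encodings differently from cstocs: CP1250 rather than 1250, and ISO-8859-2 rather than il2.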

Alternatively, the sed scripts described in the next section can be used for transcoding.

Scripts by L. Škarvada

The following sed command files, stored in the /packages/share/CHARSETS directory, convert text between different encodings:

cork.ascii      isolat1.ascii   isolat2.koi8    koi8.ascii      unix.dos
cork.isolat1    isolat1.cork    isolat2.pclat2  koi8.isolat2
cork.isolat2    isolat2.ascii   kam.ascii       koi8.kam
dos.unix        isolat2.cork    kam.isolat2     pclat2.ascii
hex-cork.kam    isolat2.kam     kam.pclat2      pclat2.isolat2

For example, they can be used to convert from the Kamenický encoding to ISO 8859-2 as follows:

cat soubor.kam | sed -f /packages/share/CHARSETS/kam.isolat2 > soubor.isolat2

To save typing, you can add the following function to your ~/.kshrc file:

          cnv () { if [ -z "$4" ]
                    then sed -f /packages/share/CHARSETS/$1.$2 $3
                    else sed -f /packages/share/CHARSETS/$1.$2 $3 >$4
                   fi
                 }
and then write only:
cnv kam isolat2 soubor.kam soubor.isolat2
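
The cnv function above passes whatever it is given straight to sed, so a mistyped encoding name produces only a sed error about a missing file. A slightly more defensive variant might look like the sketch below. It is not the author's version: the CHARSETS_DIR variable, the usage message, and the return codes are assumptions added for illustration.

```shell
# Assumption: CHARSETS_DIR is a hypothetical override; it defaults to the
# directory named in the text.
CHARSETS_DIR=${CHARSETS_DIR:-/packages/share/CHARSETS}

cnv() {
    # Require at least the source and target encoding names.
    if [ $# -lt 2 ]; then
        echo "usage: cnv FROM TO [INPUT [OUTPUT]]" >&2
        return 2
    fi

    cnv_script="$CHARSETS_DIR/$1.$2"

    # Report a missing conversion explicitly instead of relying on sed's error.
    if [ ! -r "$cnv_script" ]; then
        echo "cnv: no command file for $1 -> $2 ($cnv_script)" >&2
        return 1
    fi

    # ${3:+"$3"} expands to the quoted input name only when one was given,
    # so sed still reads standard input when INPUT is omitted.
    if [ -z "$4" ]; then
        sed -f "$cnv_script" ${3:+"$3"}
    else
        sed -f "$cnv_script" ${3:+"$3"} > "$4"
    fi
}
```

It is called exactly like the original, for example cnv kam isolat2 soubor.kam soubor.isolat2, and also works at the end of a pipeline when the file arguments are left out.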

Information on the individual encodings is available here.

The author of the scripts is Libor Škarvada, libor (at) fi.muni.cz.