Czech character encoding in documents
Every web document is, in essence, plain text written in one particular character encoding (also called a character set or charset). The usual encoding for English documents is ISO-8859-1; for Czech documents it is ISO-8859-2 or the similar WINDOWS-1250. UTF-8 is a widely used universal encoding. You must ensure that the encoding in which the document is actually written matches the encoding the visitor's web browser uses to decode it. If the two disagree, anyone viewing your site will see more or less incomprehensible text.
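The effect of such a mismatch can be reproduced in a few lines of Python. This is only an illustrative sketch: the sample sentence is made up, and the choice of WINDOWS-1250 as the wrongly assumed encoding is an assumption picked from the encodings named above.

```python
# Czech sample text (hypothetical) with letters that differ between encodings.
text = "Příliš žluťoučký kůň"

# The document is actually written (saved) in ISO-8859-2 ...
raw_bytes = text.encode("iso-8859-2")

# ... but the browser is assumed to decode it as WINDOWS-1250.
garbled = raw_bytes.decode("windows-1250")

# Decoding with the matching encoding restores the original text.
restored = raw_bytes.decode("iso-8859-2")

print(garbled)   # some letters are replaced by other characters
print(restored)  # identical to the original text
```

Because the two encodings agree on many letters, only part of the text is damaged, which is why the result is "more or less" incomprehensible rather than entirely unreadable.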
When interpreting a downloaded web document, the browser determines the encoding in which the document is written from the following sources (in priority order):
- the Content-Type HTTP header
- the meta tag in the source text of the document
- the default encoding preset in the browser settings
If the server sends a Content-Type header that specifies the encoding, the browser treats it as authoritative. If no such header is present, the browser looks for a meta tag in the source text of the document. If the tag is absent or not in the required form, the browser's default setting is used.
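The priority order described above can be sketched as a small Python function. It is a simplified illustration, not the algorithm real browsers implement: the function names, the regular expressions, and the browser default of windows-1250 are all assumptions made for this example.

```python
import re
from typing import Optional


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if there is one."""
    if not header_value:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header_value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(body: bytes) -> Optional[str]:
    """Look for a charset declaration in a meta tag near the start of the source."""
    # Assumption: the encoding is ASCII-compatible, so the tag is readable
    # before the real encoding is known.
    head = body[:1024].decode("ascii", errors="replace")
    match = re.search(
        r'<meta[^>]*charset\s*=\s*["\']?([^\s"\'/>;]+)', head, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def detect_encoding(content_type: Optional[str], body: bytes,
                    browser_default: str = "windows-1250") -> str:
    """Apply the three sources in priority order: header, meta tag, default."""
    return (
        charset_from_content_type(content_type)
        or charset_from_meta(body)
        or browser_default
    )


page = b'<html><head><meta charset="utf-8"></head><body>Ahoj</body></html>'
print(detect_encoding("text/html; charset=ISO-8859-2", page))  # header wins
print(detect_encoding("text/html", page))                      # meta tag is used
```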
The web server on Aisa (or www.fi.muni.cz) always sends this HTTP header. The character set it reports is determined by the following (again in priority order):
- the file extension of the document stored on Aisa
- the AddDefaultCharset directive in the relevant .htaccess file
- the default web server setting (ISO-8859-2)
The first point applies only if the directory (on Aisa) that contains the document has the Unix permission r for others. A document whose URL is to be https://www.fi.muni.cz/~xnovak99/stranka.html can then be stored on the server under the name stranka.html.kodovani, where kodovani (Czech for "encoding") stands for the name of the encoding; the HTTP header sent to the viewing browser will then declare that encoding.
The AddDefaultCharset directive has the syntax
AddDefaultCharset encoding
Placed on a line of its own in the .htaccess file in the directory with your web documents, it causes the HTTP header sent to the browser to declare the encoding given in the directive. Instead of a valid encoding, the parameter may be the word Off, which turns off the encoding declaration in the HTTP header altogether (i.e. the browser falls through to the 2nd or 3rd point of the first list above).
It follows from the above, among other things, that a meta tag defining the encoding in the page itself is taken into account only if you specify AddDefaultCharset Off in .htaccess.
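The server-side decision described in this section can be modelled roughly as follows. This is a sketch of the stated priority order only, not the server's actual implementation: the function name, its parameters, and the set of recognised encoding extensions are hypothetical.

```python
from typing import Optional

SERVER_DEFAULT = "ISO-8859-2"  # default server setting named in the text

# Hypothetical set of file extensions the server maps to encodings (assumption).
KNOWN_CHARSET_EXTENSIONS = {"utf-8", "iso-8859-2", "windows-1250"}


def header_charset(stored_name: str,
                   dir_readable_by_others: bool,
                   add_default_charset: Optional[str]) -> Optional[str]:
    """Return the charset declared in the HTTP header, or None if none is declared."""
    # 1. Encoding extension of the stored file; it counts only if the
    #    directory has the Unix permission r for others.
    extension = stored_name.rsplit(".", 1)[-1].lower()
    if dir_readable_by_others and extension in KNOWN_CHARSET_EXTENSIONS:
        return extension
    # 2. AddDefaultCharset from .htaccess; "Off" removes the declaration,
    #    so the browser falls back to the meta tag or its own default.
    if add_default_charset is not None:
        if add_default_charset.lower() == "off":
            return None
        return add_default_charset
    # 3. Default web server setting.
    return SERVER_DEFAULT


print(header_charset("stranka.html.utf-8", True, "windows-1250"))  # extension wins
print(header_charset("stranka.html", True, "Off"))                 # None: meta tag decides
print(header_charset("stranka.html", True, None))                  # server default
```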
Server-supported language versions of documents
The request for a particular page that the web browser sends to the server also contains an ordered list of the languages in which the browser prefers to receive the page. (The user sets this preference list in the browser.) On the server, the individual language versions of a page are stored as separate files in a single directory, each marked with an additional language extension. For example, https://www.fi.muni.cz/~xnovak99/stranka.html can be represented on disk by the files stranka.html.cs, stranka.html.en and stranka.html.pl (Czech, English and Polish versions).
If the server has several language versions of the requested document, it returns the available variant that best matches the browser's preferences. If none of the preferred language variants is available, the server returns the Czech version.
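The selection rule can be illustrated with a short Python sketch (Python 3.9 or later). It parses an Accept-Language value, which is the HTTP request header that carries the browser's language preferences, and picks the first available variant, falling back to Czech. The function names are hypothetical, and reducing a tag such as en-US to its primary language en is a simplification, not necessarily what the server does.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality; ties keep the order given in the header.
    return [tag for _, _, tag in sorted(weighted)]


def choose_variant(document: str, available: set[str],
                   accept_language: str, fallback: str = "cs") -> str:
    """Pick the stored file for a document from the available language extensions."""
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # "en-US" is treated as "en" (simplification)
        if primary in available:
            return f"{document}.{primary}"
    return f"{document}.{fallback}"  # no preferred language available: Czech


languages = {"cs", "en", "pl"}
print(choose_variant("stranka.html", languages, "pl,en;q=0.8,cs;q=0.5"))
print(choose_variant("stranka.html", languages, "de,fr;q=0.9"))
```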
This behavior is suppressed (on the server side) if at least one of the following conditions holds:
- a file whose name exactly matches the document name in the URL (i.e. without a language extension) exists
- the directory in which language versions are not to be considered contains a .htaccess file with the line Options -MultiViews
- the directory in which language versions are not to be considered does not have the Unix permission r for others
For example, if the directory contains the files stranka.html.cs, stranka.html.en and stranka.html, a request for the document stranka.html always returns the content of the file stranka.html.
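The three suppression conditions can be expressed as a single check. The sketch below only restates the rules listed above; the function name and the way the directory contents and .htaccess lines are passed in are assumptions made for illustration.

```python
def negotiation_applies(document: str,
                        directory_files: set,
                        htaccess_lines: list,
                        dir_readable_by_others: bool) -> bool:
    """Return True if the server would choose among language variants."""
    # Condition 1: a file named exactly like the requested document exists.
    if document in directory_files:
        return False
    # Condition 2: .htaccess contains an Options line with -MultiViews.
    for line in htaccess_lines:
        tokens = line.split()
        if tokens and tokens[0].lower() == "options" \
                and "-multiviews" in (token.lower() for token in tokens[1:]):
            return False
    # Condition 3: the directory lacks the Unix permission r for others.
    if not dir_readable_by_others:
        return False
    return True


files = {"stranka.html.cs", "stranka.html.en", "stranka.html"}
print(negotiation_applies("stranka.html", files, [], True))  # False: exact file exists
```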
Combining language and encoding variants
It is not possible to combine the two mechanisms by means of file extensions (e.g. stranka.html.cs.utf-8 or stranka.html.utf-8.cs).
Because extensions can then distinguish only the language versions, the encoding of each file can be specified individually only with a meta tag in the document source (with AddDefaultCharset Off set in .htaccess).
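One way to keep the meta tag and the real encoding of each file in agreement is to generate both from the same value. The following Python sketch does that; the helper name, the page texts and the chosen encodings are hypothetical, and the HTML skeleton is deliberately minimal.

```python
from html import escape


def render_page(title: str, body_text: str, encoding: str) -> bytes:
    """Build a page whose meta tag names the encoding its bytes are written in."""
    html = (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={encoding}">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{escape(body_text)}</p>\n"
        "</body>\n</html>\n"
    )
    # The same encoding name is used for the declaration and for the bytes.
    return html.encode(encoding)


# Contents for stranka.html.cs and stranka.html.en, each in its own encoding.
czech_page = render_page("Stránka", "Vítejte na mé stránce.", "iso-8859-2")
english_page = render_page("Page", "Welcome to my page.", "utf-8")
```

The resulting bytes would be written to the language variants in binary mode, so that no further re-encoding takes place on the way to disk.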