PHC
The development of speech synthesis and recognition systems and other branches of natural language processing, together with the creation and analysis of linguistically oriented corpora, has led to a growing need for standardised phonetic data. In this project we propose the PHC (PHonetic Corpus) format, designed especially for such applications. The format is supported by a library of utilities (a conversion program, editing subroutines, etc.) intended to make data in this format easier and more comfortable to use. Internally, the format is used in developing the syllable-based speech synthesizer DEMOSTHENES.
The basic ideas of the proposed format are as follows:
Because of the heterogeneity of possible applications, a free structure was chosen for the format header.
A data file in PHC format consists of a header and a data part.
Bytes 1-3 contain the characters ‘p’, ‘h’, ‘c’, which identify the PHC format.
The rest of the header may contain specifications of the form:
#keyword=value^
and comments as free text between specifications.
Keywords (specifications) are either reserved or private. Reserved keywords have a specific meaning defined in the format specification and consist of lowercase letters. Private keywords consist of capital letters and are expected to be explained in comments. Private specifications are intended for cases where no suitable reserved specification is available.
The header is terminated by the character sequence ‘#’ ‘#’.
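The header rules above can be illustrated with a short Python sketch. It is not part of the PHC utility library. It assumes an ASCII header, keywords made of letters only, and that the first ‘#’ ‘#’ after the magic bytes ends the header. The function name parse_phc_header and the keywords rate and SPEAKER are hypothetical and chosen only for illustration.

```python
import re

# One specification: '#', a keyword, '=', a value, then the closing '^'.
# Assumption: keywords contain letters only and values never contain '^'.
SPEC = re.compile(rb"#([A-Za-z]+)=([^^]*)\^")


def parse_phc_header(blob: bytes):
    """Split a PHC byte string into reserved specs, private specs and data."""
    # Bytes 1-3 must be the characters 'p', 'h', 'c'.
    if blob[:3] != b"phc":
        raise ValueError("missing 'phc' magic bytes")

    # Assumption: the first '##' after the magic bytes terminates the header.
    end = blob.find(b"##", 3)
    if end == -1:
        raise ValueError("header terminator '##' not found")

    reserved, private = {}, {}
    # Text between specifications is free comment and is simply skipped.
    for match in SPEC.finditer(blob[3:end]):
        key = match.group(1).decode("ascii")
        value = match.group(2).decode("ascii")
        if key.islower():
            reserved[key] = value      # reserved keyword: small letters
        elif key.isupper():
            private[key] = value       # private keyword: capital letters
        else:
            raise ValueError(f"mixed-case keyword: {key!r}")

    data = blob[end + 2:]              # everything after '##' is the data part
    return reserved, private, data


# Hypothetical example file: two specifications, a comment, two data bytes.
example = b"phc#rate=16000^ free comment #SPEAKER=f01^##\x00\x01"
reserved, private, data = parse_phc_header(example)
```

Returning reserved and private specifications in separate dictionaries keeps the distinction between the two keyword classes visible to the caller; a real reader would also have to decide how to treat keywords that appear more than once.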
Contact
Pavel Fryda
Faculty of Informatics, Botanicka 68a,
60200 Brno, Czech Republic
E-mail:
fryda@fi.muni.cz
Ivan Kopecek
Faculty of Informatics, Botanicka 68a,
60200 Brno, Czech Republic
E-mail:
kopecek@fi.muni.cz
WWW: http://www.fi.muni.cz/~kopecek/
Team (in alphabetical order)
Jan Dvorak, Ivan Cicha, Pavel Fryda, Tomas Hudec, Ivan Kopecek, Pavel Matyasko, Tomas Novotny, Ales Vitek
References:
http://www.fi.muni.cz/~kopecek/pub.htm