Tue, 27 May 2014
MPEG Transport Stream
Today I investigated why some files with the
do not have their MIME type detected. One such file starts with the following bytes:
$ od -tx1 file.mts | head -n 1
0000000 00 00 00 00 47 40 00 10 00 00 b0 11 00 00 c1 00
According to the current
/usr/share/magic from Fedora 20,
it is quite similar to the following entry:
0 belong&0xFF5FFF10 0x47400010
>188 byte 0x47 MPEG transport stream data
Also, the shared-mime-info package contains something similar:
<match type="big32" value="0x47400010" mask="0xff4000df" offset="0"/>
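To make the comparison concrete, here is a minimal Python sketch of how I read these two rules: a big-endian 32-bit word at a given offset is ANDed with the mask and compared with the expected value. The helper names `masked_match` and `file_magic_rule`, and the `base` parameter, are my own inventions for illustration; the real rules are fixed at offset 0. The `(word & mask) == (value & mask)` comparison is a simplification that happens to give the same result for both entries.

```python
import struct

def masked_match(data: bytes, offset: int, value: int, mask: int) -> bool:
    """Compare the big-endian 32-bit word at `offset` with `value` under `mask`."""
    if len(data) < offset + 4:
        return False
    (word,) = struct.unpack_from(">I", data, offset)
    return (word & mask) == (value & mask)

def file_magic_rule(data: bytes, base: int = 0) -> bool:
    """Sketch of the /usr/share/magic entry; `base` is hypothetical (the real rule uses 0)."""
    return (masked_match(data, base, 0x47400010, 0xFF5FFF10)
            and len(data) > base + 188
            and data[base + 188] == 0x47)

# The first 16 bytes of my file, as printed by od(1) above.
header = bytes.fromhex("00000000474000100000b0110000c100")

print(masked_match(header, 0, 0x47400010, 0xFF5FFF10))  # file(1) mask at offset 0
print(masked_match(header, 4, 0x47400010, 0xFF5FFF10))  # same mask at offset 4
print(masked_match(header, 0, 0x47400010, 0xFF4000DF))  # shared-mime-info mask at offset 0
print(masked_match(header, 4, 0x47400010, 0xFF4000DF))  # same mask at offset 4
```

With the header from my file, both masks fail at offset 0 and succeed at offset 4.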
Note that both databases expect the 0x47 byte at the very beginning of the
file, not after four NUL bytes as in my example. Yet
can play these files, and
ffprobe(1) detects them as "mpegts"
with an audio and a video stream. Looking into the
I have discovered it does horrible things in order to detect a file format.
For example, for
mpegts, it scans the file for a 0x47 byte
at an offset divisible by four, and then evaluates some other conditions.
Each format's probe function returns a score, and the format with the
highest score wins. Ugly as hell, but probably needed
for handling real-world data files.
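The score-based approach can be sketched roughly as follows. This is not the actual code behind ffprobe(1), only my simplified illustration of the idea: try every start offset divisible by four within the first packet, count how many consecutive 188-byte packets begin with the 0x47 sync byte, and turn the best run into a score. The names `count_sync_run`, `probe_mpegts` and `guess_format`, the 0 to 100 scale, and the limit of 8 packets are all assumptions of mine.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def count_sync_run(data: bytes, start: int, limit: int) -> int:
    """Count consecutive packets (up to `limit`) that begin with the sync byte."""
    run = 0
    pos = start
    while run < limit and pos < len(data) and data[pos] == SYNC_BYTE:
        run += 1
        pos += TS_PACKET_SIZE
    return run

def probe_mpegts(data: bytes, step: int = 4, limit: int = 8) -> int:
    """Return a score from 0 to 100; higher means more likely a transport stream."""
    best = 0
    # Try each candidate start offset divisible by `step` within the first packet.
    for start in range(0, min(TS_PACKET_SIZE, len(data)), step):
        best = max(best, count_sync_run(data, start, limit))
    return 100 * best // limit

def guess_format(data: bytes, probes: dict) -> "str | None":
    """Run every probe and return the name of the highest-scoring format, if any."""
    scores = {name: probe(data) for name, probe in probes.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] > 0 else None

# Synthetic stream: four NUL bytes followed by eight minimal packets.
packet = bytes([SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
stream = bytes(4) + packet * 8
print(probe_mpegts(stream))
print(guess_format(stream, {"mpegts": probe_mpegts}))
```

Unlike a fixed-offset magic rule, a probe like this tolerates the four leading NUL bytes, at the price of scanning more data and occasionally guessing wrong.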
So, what should I do next? Should I submit a patch to
shared-mime-info so that it also accepts the magic number at offset 4?
Are we getting to the point where the already-complicated language
of the /usr/share/magic file is not powerful enough?