Tue, 27 May 2014
MPEG Transport Stream
Today I have investigated why some files with the
do not have their MIME type detected. One such file starts with the following bytes:
$ od -tx1 file.mts | head -n 1
0000000 00 00 00 00 47 40 00 10 00 00 b0 11 00 00 c1 00
According to the current
/usr/share/magic from Fedora 20,
it is quite similar to the following entry:
0 belong&0xFF5FFF10 0x47400010
>188 byte 0x47 MPEG transport stream data
Also, the shared-mime-info package contains something similar:
<match type="big32" value="0x47400010" mask="0xff4000df" offset="0"/>
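To make the two masked comparisons concrete, here is a minimal Python sketch that replays them against the 16 bytes from the od(1) output. It is not how file(1) or shared-mime-info are implemented: the helper name masked_match is made up for illustration, the sketch assumes the mask is ANDed with the file data before comparing, and it ignores the second-level ">188 byte 0x47" test because only the first 16 bytes of the file are known here.

```python
import struct

# First 16 bytes of the sample file, copied from the od(1) output.
HEAD = bytes.fromhex("00000000" "47400010" "0000b011" "0000c100")

EXPECTED = 0x47400010


def masked_match(data, offset, mask, expected):
    """Hypothetical helper: compare a big-endian 32-bit word under a mask."""
    if offset + 4 > len(data):
        return False
    (word,) = struct.unpack_from(">I", data, offset)
    return (word & mask) == (expected & mask)


RULES = {
    "magic": 0xFF5FFF10,             # mask from the /usr/share/magic entry
    "shared-mime-info": 0xFF4000DF,  # mask from the <match> element
}

if __name__ == "__main__":
    for name, mask in RULES.items():
        for offset in (0, 4):
            hit = masked_match(HEAD, offset, mask, EXPECTED)
            print(f"{name}: offset {offset}: {'match' if hit else 'no match'}")
```

Offset 4 is tried only to show what the same rules would do if they looked four bytes further into the file.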
Note that both rules expect the 0x47 byte to be at the beginning of the
file, not after four NUL bytes as in my example. Yet
can play these files, and
ffprobe(1) detects them as "mpegts"
with an audio and video stream. Looking into the
I have discovered it does horrible things in order to detect a file format.
For example, for
mpegts, it scans the file for a 0x47 byte
at an offset divisible by four, and then evaluates some other conditions.
The probe function returns a score, and the file format with the greatest
score is selected. Ugly as hell, but probably needed
for handling real-world data files.
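A rough Python sketch of this kind of score-based probing follows. It is not FFmpeg's actual code: the function name probe_sync_byte, the scoring scheme (the number of consecutive packets that start with 0x47), and the candidate packet sizes are all assumptions for illustration. The size 188 comes from the magic entry; 192 is tried because the sample file has four extra bytes in front of the first 0x47. The sketch also tries every start offset rather than only those divisible by four.

```python
SYNC_BYTE = 0x47


def probe_sync_byte(data, packet_sizes=(188, 192), max_packets=16):
    """Return (score, offset, packet_size) for the best-looking layout.

    The score is the number of consecutive packets, starting at `offset`,
    that begin with the sync byte. A score of 0 means nothing was found.
    """
    best = (0, None, None)
    for size in packet_sizes:
        # The first sync byte has to show up within one packet length.
        for offset in range(min(size, len(data))):
            score = 0
            pos = offset
            while (score < max_packets and pos < len(data)
                   and data[pos] == SYNC_BYTE):
                score += 1
                pos += size
            if score > best[0]:
                best = (score, offset, size)
    return best


def make_stream(prefix_len, count):
    """Build synthetic test data: `prefix_len` zero bytes before each
    188-byte packet, repeated `count` times."""
    packet = bytes([SYNC_BYTE, 0x40, 0x00, 0x10]) + bytes(184)
    return (bytes(prefix_len) + packet) * count


if __name__ == "__main__":
    print(probe_sync_byte(make_stream(0, 8)))  # plain 188-byte packets
    print(probe_sync_byte(make_stream(4, 8)))  # 4 extra bytes per packet
```

On the synthetic data with four extra bytes per packet, the best candidate is offset 4 with a packet size of 192, which a single fixed-offset magic rule cannot express.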
So, what should I do next? Should I submit a patch to
shared-mime-info to also accept the magic number at offset 4?
Are we getting to the point where the already-complicated language
of the /usr/share/magic file is not powerful enough?
4 replies for this story:
file suffers from a lack of tests, and the parser rules are horrible. The Fedora maintainer is pondering a complete rewrite because of some fundamental insufficiencies in the language (e.g. the early fork between text and binary formats). There are always funny bug reports filed against file (like today's one [https://bugzilla.redhat.com/show_bug.cgi?id=1101404]). I think you can submit your patch and look forward to various regressions :)
Ondřej Caletka wrote:
As MPEG TS consists of almost independent 188-byte packets, there is no room for a good header. AFAIK the only way to detect TS is to seek for the 0x47 packet start mark and then check that this mark repeats every 188 bytes. However, for TS stored in a file, there is no reason why 0x47 shouldn't be the first byte. Maybe your file is not a raw MPEG TS stream but an MPEG TS stream augmented with packet timestamps.
This is likely a Blu-Ray video, where the 188-byte pathets are prepended with 4 byte timecodes. http://en.wikipedia.org/wiki/MPEG_transport_stream#Use_in_digital_video_cameras
packets, of course :)