Yenya's World

Tue, 27 May 2014

MPEG Transport Stream

Today I have investigated why some files with the .MTS extension do not have their MIME type detected. One such file starts with the following bytes:

$ od -tx1 file.mts | head -n 1
0000000 00 00 00 00 47 40 00 10 00 00 b0 11 00 00 c1 00

According to the current /usr/share/magic from Fedora 20, it is quite similar to the following entry:

0       belong&0xFF5FFF10       0x47400010
>188    byte                    0x47            MPEG transport stream data

Also, the shared-mime-info package contains something similar:

<match type="big32" value="0x47400010" mask="0xff4000df" offset="0"/>

Note that both entries expect the 0x47 byte at the very beginning of the file, not after four NUL bytes as in my example. Yet mplayer(1) can play these files, and ffprobe(1) detects them as "mpegts" with an audio and a video stream. Looking into the ffmpeg source, I have discovered that it does horrible things in order to detect a file format. For example, for mpegts, it scans the file for a 0x47 byte at an offset divisible by four, and then evaluates some other conditions. Each format's probe function returns a score, and the file format with the highest score is chosen. Ugly as hell, but probably needed for handling real-world data files.
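
The mismatch can be checked by hand. The following Python sketch applies the two masks quoted above to the 16 sample bytes from the od(1) output, once at offset 0 and once at offset 4. The helper name matches_at and the rule labels are made up for this illustration, and the comparison (mask the big-endian 32-bit word, then compare with the value) is my reading of how both tools treat such a rule. The second line of the file(1) entry (0x47 at offset 188) is not checked here, because only 16 bytes are available.

```python
import struct

# First 16 bytes of file.mts, copied from the od(1) output above.
SAMPLE = bytes.fromhex("00 00 00 00 47 40 00 10 00 00 b0 11 00 00 c1 00")

# (label, mask, expected value) taken from the two entries quoted above.
# The labels are only for printing.
RULES = [
    ("file(1) magic", 0xFF5FFF10, 0x47400010),
    ("shared-mime-info", 0xFF4000DF, 0x47400010),
]


def matches_at(data, offset, mask, value):
    """Return True if the big-endian 32-bit word at `offset`,
    ANDed with `mask`, equals `value`."""
    if offset + 4 > len(data):
        return False
    (word,) = struct.unpack_from(">I", data, offset)
    return (word & mask) == value


if __name__ == "__main__":
    for label, mask, value in RULES:
        for offset in (0, 4):
            print(label, "offset", offset, matches_at(SAMPLE, offset, mask, value))
```

With these bytes, both rules fail at offset 0 and succeed at offset 4, which is consistent with the file being recognized by ffprobe(1) but not by the magic databases.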

So, what should I do next? Should I submit a patch to file(1) and shared-mime-info so that they also accept the magic number at offset 4? Are we getting to the point where the already-complicated language of the /usr/share/magic file is not powerful enough?

2 replies for this story:

petr_p wrote:

file suffers from a lack of tests, and the parser rules are horrible. The Fedora maintainer is pondering a complete rewrite because of some fundamental insufficiencies in the language (e.g. the early fork between text and binary formats). There are always funny bug reports against file (like today's one [https://bugzilla.redhat.com/show_bug.cgi?id=1101404]). I think you can submit your patch and look forward to various regressions :)

Ondřej Caletka wrote:

As MPEG TS consists of almost independent 188-byte packets, there is no room for a good header. AFAIK the only way to detect TS is to seek for the 0x47 packet start mark and then check that this mark repeats every 188 bytes. However, for a TS stored in a file, there is no reason why 0x47 shouldn't be the first byte. Maybe your file is not a raw MPEG TS stream but an MPEG TS stream augmented with packet timestamps.
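
The check described in this comment can be sketched in a few lines of Python. This is only an illustration, not how file(1) or ffmpeg do it: the function name find_ts_layout and its parameters are hypothetical, and the 192-byte packet size is an assumption that follows the guess about per-packet timestamps (a 4-byte prefix in front of each 188-byte packet).

```python
SYNC = 0x47  # MPEG TS packet start mark


def find_ts_layout(data, packet_sizes=(188, 192), min_packets=5):
    """Return (offset, packet_size) for the first layout in which the
    sync byte repeats at every packet boundary, or None if none fits.

    packet_sizes: 188 is the raw TS packet length; 192 is an assumed
    length for packets carrying a 4-byte timestamp prefix.
    min_packets: how many consecutive sync bytes must be seen before
    the layout is accepted (an arbitrary threshold for this sketch).
    """
    for size in packet_sizes:
        for offset in range(size):
            positions = range(offset, len(data), size)
            if len(positions) < min_packets:
                continue
            if all(data[p] == SYNC for p in positions):
                return offset, size
    return None


if __name__ == "__main__":
    # Synthetic data: ten packets, each with a 4-byte zero prefix.
    packet = bytes([SYNC]) + b"\xff" * 187
    prefixed = (b"\x00" * 4 + packet) * 10
    print(find_ts_layout(prefixed))
```

On the synthetic input above, the sync byte lines up only at offset 4 with a 192-byte stride, which is the layout the first bytes of the file in the post suggest. Real payloads can contain 0x47 by accident, so a practical detector would want more packets and further header checks.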

About:

Yenya's World: Linux and beyond - Yenya's blog.

Jan "Yenya" Kasprzak