diplomat.utils.frame_store_fmt

Contains two utility classes for reading and writing the DIPLOMAT Frame Store format. With this format, a video can be processed with DIPLOMAT first, and predictions can be run on the stored probability map data later. The specification of the format follows.

DIPLOMAT FRAME STORE BINARY FORMAT (all multi-byte fields are little-endian)
['DLFS'] -> DipLomat (or Deep Learning) Frame Store - 4 Bytes (file magic)

Header:
    ['DLFH'] -> Diplomat Header
    [num_frames] - The number of frames. 8 Bytes (long unsigned integer)
    [num_bp] - The number of body parts contained per frame. 4 Bytes (unsigned integer)
    [frame_height] - The height of a frame. 4 Bytes (unsigned integer)
    [frame_width] - The width of a frame. 4 Bytes (unsigned integer)
    [frame_rate] - The frame rate, in frames per second. 8 Bytes (double float)
    [stride] - The upscaling multiplier from the current frame size back to the original video size. 8 Bytes (double float)
    [orig_video_height] - The original video height. 4 Bytes (unsigned integer)
    [orig_video_width] - The original video width. 4 Bytes (unsigned integer)
    [crop_y1] - The y offset of the cropped box; set to the maximum value (0xFFFFFFFF) to indicate no cropping. 4 Bytes (unsigned integer)
    [crop_x1] - The x offset of the cropped box; set to the maximum value (0xFFFFFFFF) to indicate no cropping. 4 Bytes (unsigned integer)
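Because every field has a fixed width and there is no padding, the file magic and header can be handled with a single Python `struct` format string. The sketch below is written against the layout above and is not the DLFSReader/DLFSWriter API; the names `pack_header`, `unpack_header`, and `NO_CROP` are hypothetical, and it assumes the DLFH chunk immediately follows the DLFS magic.

```python
import struct

# Hypothetical helper layout following the spec: 'DLFS' magic, 'DLFH' tag,
# then the header fields. '<' means little-endian with no alignment padding.
HEADER_FMT = "<4s4sQIIIddIIII"
HEADER_FIELDS = (
    "num_frames", "num_bp", "frame_height", "frame_width", "frame_rate",
    "stride", "orig_video_height", "orig_video_width", "crop_y1", "crop_x1",
)
NO_CROP = 0xFFFFFFFF  # maximum 4-byte unsigned value, meaning "no cropping"


def pack_header(**fields) -> bytes:
    """Serialize the file magic plus the DLFH header chunk."""
    values = [fields[name] for name in HEADER_FIELDS]
    return struct.pack(HEADER_FMT, b"DLFS", b"DLFH", *values)


def unpack_header(data: bytes) -> dict:
    """Parse the file magic and the DLFH chunk from the start of ``data``."""
    magic, tag, *values = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != b"DLFS" or tag != b"DLFH":
        raise ValueError("not a DIPLOMAT frame store header")
    return dict(zip(HEADER_FIELDS, values))


# Usage: a 64x80 map with stride 8 corresponds to a 512x640 source video.
raw = pack_header(num_frames=100, num_bp=2, frame_height=64, frame_width=80,
                  frame_rate=30.0, stride=8.0, orig_video_height=512,
                  orig_video_width=640, crop_y1=NO_CROP, crop_x1=NO_CROP)
print(len(raw), unpack_header(raw)["frame_rate"])
```

The magic and header together take 60 bytes: 8 bytes of tags plus 52 bytes of fields.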

Bodypart Names:
    ['DBPN'] -> Diplomat Body Part Names
    (num_bp entries):
        [bp_len] - The length of the body part name. 2 Bytes (unsigned short)
        [DATA of length bp_len] - The UTF-8 encoded name of the body part.
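The body part names are length-prefixed strings, and the chunk carries no count of its own, so a reader takes num_bp from the header. A minimal sketch, assuming `bp_len` counts UTF-8 bytes rather than characters (the specification does not say which); `pack_bodypart_names` and `unpack_bodypart_names` are hypothetical helpers, not part of DIPLOMAT:

```python
import struct


def pack_bodypart_names(names: list[str]) -> bytes:
    """Build a DBPN chunk: each name is a '<H' length followed by UTF-8 bytes."""
    out = bytearray(b"DBPN")
    for name in names:
        encoded = name.encode("utf-8")
        out += struct.pack("<H", len(encoded))  # assumed: length in bytes
        out += encoded
    return bytes(out)


def unpack_bodypart_names(data: bytes, num_bp: int) -> list[str]:
    """Parse a DBPN chunk; num_bp comes from the DLFH header."""
    if data[:4] != b"DBPN":
        raise ValueError("missing DBPN tag")
    pos, names = 4, []
    for _ in range(num_bp):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        names.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return names


chunk = pack_bodypart_names(["nose", "tail_base", "épaule"])
print(unpack_bodypart_names(chunk, 3))
```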

Frame Lookup Chunk:
    ['FLUP'] -> Frame LookUP
    (num_frames entries):
        [frame_offset_ptr] -> The offset of frame i into the FDAT chunk, excluding the chunk signature. 8 Bytes (unsigned long)
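The lookup table gives random access: a reader can jump straight to frame i without decompressing the frames before it. In this hypothetical sketch, `fdat_pos` is the file position of the 'FDAT' tag; because the stored offsets exclude the 4-byte chunk signature, they are added to `fdat_pos + 4`.

```python
import struct


def unpack_frame_lookup(data: bytes, pos: int, num_frames: int) -> list[int]:
    """Read a FLUP chunk at ``pos``: the tag, then num_frames '<Q' offsets."""
    if data[pos:pos + 4] != b"FLUP":
        raise ValueError("missing FLUP tag")
    return list(struct.unpack_from(f"<{num_frames}Q", data, pos + 4))


def frame_position(fdat_pos: int, offset: int) -> int:
    """Absolute file position of a frame, given where the 'FDAT' tag starts.

    Offsets exclude the 4-byte 'FDAT' signature, so they count from fdat_pos + 4.
    """
    return fdat_pos + 4 + offset


table = b"FLUP" + struct.pack("<3Q", 0, 120, 250)
offsets = unpack_frame_lookup(table, 0, 3)
print(offsets, frame_position(1000, offsets[1]))
```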

Frame data block:
    ['FDAT'] -> Frame DATa
    Frame entries (num_frames entries):
        Each sub-frame entry (num_bp entries):

            Single Byte: 000000[offsets_included][sparse_fmt]:
                [sparse_fmt] - Single bit, whether the sparse format is used. See the difference in storage below.
                [offsets_included] - Single bit, whether offset data is included. See the difference in storage below.
            [data_length] - The length of the compressed/uncompressed frame data. 8 Bytes (long unsigned integer)

                DATA (compressed in the zlib format; it must be decompressed first). Its layout depends on the [sparse_fmt] flag:

                If [sparse_fmt] is 0, each map is stored as an array of 4-byte floats, row by row, with entries written as prob(x, y):
                    prob(1, 1), prob(2, 1), prob(3, 1), ....., prob(x, 1)
                    prob(1, 2), prob(2, 2), prob(3, 2), ....., prob(x, 2)
                    .....................................................
                    prob(1, y), prob(2, y), prob(3, y), ....., prob(x, y)
                Each map therefore holds frame_height * frame_width values.
                if [offsets_included] == 1:
                    Two more maps with the same layout follow, storing the y and x offsets within each block of
                    pixels, which are used when converting coordinates back to the original video:
                        off_y(1, 1), off_y(2, 1), off_y(3, 1), ....., off_y(x, 1)
                        off_y(1, 2), off_y(2, 2), off_y(3, 2), ....., off_y(x, 2)
                        .........................................................
                        off_y(1, y), off_y(2, y), off_y(3, y), ....., off_y(x, y)

                        off_x(1, 1), off_x(2, 1), off_x(3, 1), ....., off_x(x, 1)
                        off_x(1, 2), off_x(2, 2), off_x(3, 2), ....., off_x(x, 2)
                        .........................................................
                        off_x(1, y), off_x(2, y), off_x(3, y), ....., off_x(x, y)
                Otherwise ([sparse_fmt] is 1), the data is stored in the sparse format below.

                Sparse Frame Format (num_bp entries):
                    [num_entries] - The number of sparse entries in the frame. 8 Bytes (long unsigned integer)
                    [arr y] - Array of num_entries 4-byte unsigned integers storing the y coordinates of the probabilities.
                    [arr x] - Array of num_entries 4-byte unsigned integers storing the x coordinates of the probabilities.
                    [probs] - Array of num_entries 4-byte floats storing the probabilities at the x and y coordinates above.
                    if [offsets_included] == 1:
                        [off y] - Array of 4-byte floats storing the y offset within the block of pixels.
                        [off x] - Array of 4-byte floats storing the x offset within the block of pixels.
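Decoding one body-part entry combines the flag byte, the length field, and one of the two payload layouts above. The sketch below is illustrative only: it assumes `data_length` is the number of compressed bytes that follow (the specification says "compressed/uncompressed"), assumes the offset arrays in the sparse format also hold num_entries values, uses 0-based indices where the specification writes 1-based prob(x, y), and all function names are hypothetical.

```python
import struct
import zlib

SPARSE_BIT = 0b01   # [sparse_fmt]: lowest bit of the flag byte
OFFSETS_BIT = 0b10  # [offsets_included]: second-lowest bit


def write_entry(flags: int, raw_payload: bytes) -> bytes:
    """Encode one entry: flag byte, data_length, zlib-compressed payload.

    Assumption: data_length is the size of the compressed bytes.
    """
    compressed = zlib.compress(raw_payload)
    return struct.pack("<BQ", flags, len(compressed)) + compressed


def decode_dense(payload: bytes, height: int, width: int, has_offsets: bool):
    """Split a dense payload into row-major float32 maps: prob[, off_y, off_x]."""
    size = height * width
    n_maps = 3 if has_offsets else 1
    values = struct.unpack(f"<{size * n_maps}f", payload)
    # Within each map, column x of row y (0-based) sits at index y * width + x.
    return [values[i * size:(i + 1) * size] for i in range(n_maps)]


def decode_sparse(payload: bytes, has_offsets: bool):
    """Split a sparse payload into arrays: ys, xs, probs[, off_y, off_x]."""
    (n,) = struct.unpack_from("<Q", payload, 0)
    codes = ["I", "I", "f"] + (["f", "f"] if has_offsets else [])
    arrays, pos = [], 8
    for code in codes:
        arrays.append(struct.unpack_from(f"<{n}{code}", payload, pos))
        pos += 4 * n  # every element is 4 bytes wide
    return arrays


def read_entry(data: bytes, pos: int, height: int, width: int):
    """Decode the entry at ``pos``; return ((kind, arrays), next_pos)."""
    flags = data[pos]
    (length,) = struct.unpack_from("<Q", data, pos + 1)
    start = pos + 9  # 1 flag byte + 8-byte data_length
    payload = zlib.decompress(data[start:start + length])
    has_offsets = bool(flags & OFFSETS_BIT)
    if flags & SPARSE_BIT:
        result = ("sparse", decode_sparse(payload, has_offsets))
    else:
        result = ("dense", decode_dense(payload, height, width, has_offsets))
    return result, start + length


# A 2x3 dense probability map, then a sparse entry with two points.
dense = struct.pack("<6f", 0.0, 0.5, 1.0, 0.25, 0.75, 0.125)
sparse = struct.pack("<Q2I2I2f", 2, 5, 7, 1, 3, 0.5, 0.25)
blob = write_entry(0, dense) + write_entry(SPARSE_BIT, sparse)
first, nxt = read_entry(blob, 0, 2, 3)
second, _ = read_entry(blob, nxt, 2, 3)
print(first, second)
```

Returning flat tuples keeps the sketch free of dependencies; in practice one would more likely view the same bytes as a NumPy array of shape (frame_height, frame_width).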

Frame Terminating/Ending Chunk:
    This chunk is optional, but when included it allows this file to be appended to the end of another file
    (in most cases a video file). Readers should first check for the header above; if the file does not start
    with "DLFS", they should seek to the end of the file and look for a DLFE chunk.

    ['DLFE'] -> Deep Learning Frame End
    [file_size] -> The size of the file, excluding this chunk (that is, the whole file size minus 12 bytes). 8 Bytes (long unsigned integer)
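Locating a frame store that has been appended to another file only requires the last 12 bytes. A minimal sketch, assuming `file_size` is the length of the frame store data that precedes the trailer, so the store begins at `len(data) - 12 - file_size`; `make_trailer` and `find_frame_store` are hypothetical names:

```python
import struct

TRAILER_FMT = "<4sQ"  # 'DLFE' tag + 8-byte file_size = 12 bytes


def make_trailer(store: bytes) -> bytes:
    """Build a DLFE chunk recording the size of the frame store before it."""
    return struct.pack(TRAILER_FMT, b"DLFE", len(store))


def find_frame_store(data: bytes) -> int:
    """Return the offset of the 'DLFS' magic, or raise ValueError.

    Checks the start of the data first, then falls back to the DLFE trailer.
    """
    if data[:4] == b"DLFS":
        return 0
    if len(data) < 12:
        raise ValueError("too short for a DLFE trailer")
    tag, size = struct.unpack_from(TRAILER_FMT, data, len(data) - 12)
    start = len(data) - 12 - size  # assumed meaning of file_size
    if tag != b"DLFE" or start < 0 or data[start:start + 4] != b"DLFS":
        raise ValueError("no DIPLOMAT frame store found")
    return start


store = b"DLFS" + b"\x00" * 20
video_with_store = b"VIDEO" * 10 + store + make_trailer(store)
print(find_frame_store(video_with_store))
```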

Classes

DLFSConstants()

Stores constants for the DIPLOMAT Frame Store format.

DLFSReader(file)

A DIPLOMAT Frame Store Reader.

DLFSWriter(file, header[, threshold, ...])

A DIPLOMAT Frame Store Writer.