README.md

lintel

Lintel is a Python module that can be used to decode videos, and return a byte array of all of the frames in the video, using the FFmpeg C interface directly.

Lintel was created for the purpose of developing machine learning algorithms using video datasets such as the NTU RGB+D, Kinetics and Charades action recognition datasets.

The foremost advantages of Lintel are:

Lintel provides a simple and fast Python interface to video decoding that can be dropped into existing machine learning training scripts.
By decoding video on the fly in a data processing pipeline, input pipelines can be made to circumvent an I/O bottleneck that would become a problem if data were stored as frames in an encoded image format such as JPEG.
Decoding videos on the fly using the FFmpeg C API provides a high degree of control over the input. For example, video can be decoded at dynamic framerates, with no loss in efficiency.

An example of this control is in the implementation of loadvid_frame_nums, where a list of frame indices is passed to indicate which specific frames are to be decoded.
By using the FFmpeg C API directly, as opposed to piping input to the ffmpeg command line tool, a number of issues and complications surrounding interfacing the ffmpeg command line tool from a performance-intensive machine learning application are completely avoided.

Pre-requisites

Python 3 is required. To use Python 2, I believe a (potentially small) amount of code would have to be changed in lintel/py_ext/lintelmodule.c.

A version of FFmpeg that supports the avcodec_send_packet() and avcodec_receive_frame() API for decoding video is required. For example, FFmpeg version 3.3.6 should work fine, as should downloading and installing the latest development version of FFmpeg.

If the version of FFmpeg distributed with your system is too old, see the Installing FFmpeg from Source section below to install a newer version.

Installation

Run the following to install a locally editable version of the library, with pip:

pip3 install --editable . --user

Testing Lintel

After installing, run:

lintel_test --filename <video-filename> --width <width> --height <height>

Pass criteria: decoded frames from the video should show up without distortion, decoding each clip in < 500ms.

Usage in a data processing pipeline

The lintel.loadvid interface can be used in a Python input pipeline as follows:

def _load_video_to_4darray(video, dataset, should_random_seek, fps_cap):
    """Decodes a video to a 4D numpy array, supporting random seeking.

    Args:
        video: Encoded video.
        dataset: Dataset meta-info, e.g., width and height.
        should_random_seek: If set to `True`, then `lintel.loadvid` will start
            decoding from a uniformly random seek point in the video (with
            enough space to decode the requested number of frames).

            The seek distance will be returned, so that if the label of the
            data depends on the timestamp, then the label can be dynamically
            set.
        fps_cap: The _maximum_ framerate that will be captured from the video.
            Excess frames will be dropped, i.e., if `fps_cap` is 30 for a video
            with a 60 fps framerate, every other frame will be dropped.

    Returns:
        A tuple (frames, seek_distance) where `frames` is a 4-D numpy array
        loaded from the byte array returned by `lintel.loadvid`, and
        `seek_distance` is the number of seconds into `video` that decoding
        started from.
    """
    video, seek_distance = lintel.loadvid(
        video,
        should_random_seek=should_random_seek,
        width=dataset.width,
        height=dataset.height,
        num_frames=dataset.num_frames,
        fps_cap=fps_cap)
    video = np.frombuffer(video, dtype=np.uint8)
    video = np.reshape(
        video, newshape=(dataset.num_frames, dataset.height, dataset.width, 3))

    return video, seek_distance

The lintel.loadvid_frame_nums API can be used similarly:

def _load_frame_nums_to_4darray(video, dataset, frame_nums):
    """Decodes a specific set of frames from `video` to a 4D numpy array.

    Args:
        video: Encoded video.
        dataset: Dataset meta-info, e.g., width and height.
        frame_nums: Indices of specific frame indices to decode, e.g.,
            [1, 10, 30, 35] will return four frames: the first, 10th, 30th and
            35 frames in `video`. Indices must be in strictly increasing order.

    Returns:
        A numpy array, loaded from the byte array returned by
        `lintel.loadvid_frame_nums`, containing the specified frames, decoded.
    """
    decoded_frames = lintel.loadvid_frame_nums(video,
                                               frame_nums=frame_nums,
                                               width=dataset.width,
                                               height=dataset.height)
    decoded_frames = np.frombuffer(decoded_frames, dtype=np.uint8)
    decoded_frames = np.reshape(
        decoded_frames,
        newshape=(dataset.num_frames, dataset.height, dataset.width, 3))

    return decoded_frames

Installing FFmpeg from Source

It may be necessary to compile FFmpeg from source, e.g. if there is no way to get the development FFmpeg files from the package manager. To do so, nasm, x264 and FFmpeg must all be installed.

Note in the following I assume that you have created a directory $HOME/.local for local installations, and that $HOME/.local/include, $HOME/.local/bin and $HOME/.local/lib are in your CPATH, PATH and LD_LIBRARY_PATH (as well as LIBRARY_PATH) environment variables, respectively.

Download and install nasm:

wget http://www.nasm.us/pub/nasm/releasebuilds/2.13.01/nasm-2.13.01.tar.bz2

tar xvjf nasm-2.13.01.tar.bz2 && cd nasm-2.13.01

./configure --prefix=$HOME/.local/

make -j$(nproc) && make install

Download and install x264:

git clone git://git.videolan.org/x264.git && cd x264

./configure --enable-static --enable-shared --prefix=$HOME/.local

make -j$(nproc) && make install

Download and install FFmpeg:

git clone https://github.com/FFmpeg/FFmpeg.git && cd FFmpeg

./configure --enable-shared --enable-gpl --enable-libx264 --enable-pic --enable-runtime-cpudetect --cc="gcc -fPIC" --prefix=$HOME/.local

- make -j$(nproc) && make install

Citing

If you find Lintel useful for an academic publication, then please use the following BibTeX to cite it:

@misc{lintel,
  author = {Duke, Brendan},
  title = {Lintel: Python Video Decoding},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dukebw/lintel}},
}

GitHub - dukebw/lintel: A Python module to decode video frames directly, using t...

README.md

lintel

Pre-requisites

Installation

Testing Lintel

Usage in a data processing pipeline

Installing FFmpeg from Source

Citing

Recommend

新时代区块链研究院 - 一所专注于区块链技术培训和区块链技术人才推荐的研究院 - NEXT

26日0点:京东人民文学出版社67周年社庆图书特惠每满150减50，满减+用券，最高满30...

MI 小米 Note 3 全网通智能手机 6GB+128GB 亮黑色 1999元包邮_苏宁易购优惠

促销活动:天猫国际官方直营全品类大促 188-100元、299-40元、599-80元、1899-300元优...

转行找 Python 工作好难简历就是个头疼的事,不造假 HR 初选都过不了,造假写几年工作...

看到 v2 上都是吐槽某东的，话说某东这么差，大家为什么还要去用吗？ - V2EX

发现每天在车上浪费了好多时间，你们是怎么利用的？ - V2EX

京东购物 15 块的运费你们怎么解决？ - V2EX

索尼的 2018 年 OLED 电视系列售价由 US$2,800 起跳

JavaScript：面试频繁出现的几个易错点

About Joyk