Video Frame Rate Enhancement: RIFE and its Architecture

RIFE is a technique for increasing video frame rate. It uses deep learning to predict how pixels move between consecutive frames and then generates the intermediate frames. In this article, I briefly explain the IFNet architecture and RIFE's privileged distillation method.

The paper is linked below; please refer to it if you are curious.

Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Real-time video frame interpolation (VFI) is very useful in video processing, media players, and display devices. We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for VFI. To realize a high-quality flow-based VFI method, RIFE uses a neural network named IFNet that can estimate the intermediate flows end-to-end with much faster speed. A privileged distillation scheme is designed for stable IFNet training and improve the overall performance. RIFE does not rely on pre-trained optical flow models and can support arbitrary-timestep frame interpolation with the temporal encoding input. Experiments demonstrate that RIFE achieves state-of-the-art performance on several public benchmarks. Compared with the popular SuperSlomo and DAIN methods, RIFE is 4--27 times faster and produces better results. Furthermore, RIFE can be extended to wider applications thanks to temporal encoding. The code is available at https://github.com/megvii-research/ECCV2022-RIFE.

Overview of RIFE

RIFE stands for “Real-Time Intermediate Flow Estimation” and can estimate optical flow quickly and with good accuracy. To achieve accurate flow estimation, it proposes an architecture called IFNet together with a special distillation scheme. (Optical flow is a field of vectors describing how each pixel moves between frames.)

Using the estimated optical flow, an intermediate frame can be synthesized, which makes it possible to increase the frame rate of a video.

Intermediate Frame Estimation Pipeline of RIFE (3-1)

The following pipeline estimates an intermediate frame from two consecutive frames.

  1. Intermediate Flow Estimation (IFNet)
    • Estimates the optical flows and a fusion map (per-pixel weights that decide how much information to take from each of the two frames)
  2. Generation of the Intermediate Frame
    • Generates the intermediate frame from the output of IFNet
    • The formula is given in section 3-1 of the paper
  3. Quality Improvement of the Intermediate Frame (RefineNet)
    • Passes the generated intermediate frame through a model that improves image quality (RefineNet)
    • RefineNet itself is a model proposed in a different paper
    • It is not the core contribution of the paper, but the paper reports that using it improves accuracy
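
To make step 2 concrete, here is a minimal NumPy sketch of the fusion step. As I understand section 3-1 of the paper, the intermediate frame is Î_t = M ⊙ W(I_0, F_t→0) + (1 − M) ⊙ W(I_1, F_t→1), where W is backward warping and M is the fusion map. The function names `backward_warp` and `synthesize_middle_frame`, the (H, W, C) array layout, the border clamping, and the optional `residual` argument standing in for RefineNet's output are my own assumptions for illustration, not the official implementation.

```python
import numpy as np

def backward_warp(img, flow):
    """Bilinearly sample img at (x + u, y + v) for every target pixel (x, y).

    img:  (H, W, C) float array
    flow: (H, W, 2) array; flow[..., 0] is the x offset and flow[..., 1] the
          y offset, pointing from the target frame into img.
    """
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, clamped to the image border (an assumption)
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]  # horizontal interpolation weight, (H, W, 1)
    wy = (sy - y0)[..., None]  # vertical interpolation weight, (H, W, 1)
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def synthesize_middle_frame(img0, img1, flow_t0, flow_t1, mask, residual=None):
    """Fuse the two warped frames: M * W(I0, F_t->0) + (1 - M) * W(I1, F_t->1)."""
    warped0 = backward_warp(img0, flow_t0)
    warped1 = backward_warp(img1, flow_t1)
    m = mask[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over channels
    merged = m * warped0 + (1 - m) * warped1
    if residual is not None:
        # Hypothetical stand-in for RefineNet: add a residual, keep values in [0, 1]
        merged = np.clip(merged + residual, 0.0, 1.0)
    return merged
```

With zero flows and a fusion map of 0.5 everywhere, the output is simply the average of the two input frames; the learned flows and fusion map are what let RIFE do better than this naive blend.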

Architecture of IFNet (3-2)

IFNet is structured as shown below: it stacks CNN-based IFBlocks that use residual connections.

Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi, Shuchang Zhou, 2020, Real-Time Intermediate Flow Estimation for Video Frame Interpolation, https://arxiv.org/abs/2011.06294
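
IFNet refines its estimate coarse-to-fine: each IFBlock works at a higher resolution than the previous one and predicts a residual update to the current flow and fusion map. The NumPy sketch below shows only that stacking structure. The blocks are hypothetical placeholder callables rather than real CNNs, the scales (4, 2, 1), the nearest-neighbour resizing, and the channel layout are my assumptions, and the actual model also feeds each block the frames warped by the current flow, which is omitted here for brevity.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array (stand-in for bilinear resizing)."""
    h, w = x.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[rows][:, cols]

def ifnet_coarse_to_fine(img0, img1, blocks, scales=(4, 2, 1)):
    """Structural sketch of IFNet's block stacking (not the real network).

    blocks: list of hypothetical callables standing in for the CNN IFBlocks.
            Each maps an (h, w, 11) input to an (h, w, 5) update:
            4 channels of flow residual (F_t->0, F_t->1) and 1 of mask residual.
    """
    h, w = img0.shape[:2]
    flow = np.zeros((h, w, 4))        # F_t->0 in channels 0-1, F_t->1 in 2-3
    mask_logit = np.zeros((h, w, 1))  # fusion map before the sigmoid
    for block, k in zip(blocks, scales):
        sh, sw = h // k, w // k
        # Inputs: both frames (6 ch), current flow (4 ch), current mask (1 ch)
        x = np.concatenate([img0, img1, flow, mask_logit], axis=-1)
        x_small = resize_nearest(x, sh, sw)
        x_small[..., 6:10] /= k       # flow vectors shrink with the image
        update = block(x_small)       # (sh, sw, 5)
        update_full = resize_nearest(update, h, w)
        # Residual connection: add the update, rescaling flow to full resolution
        flow = flow + update_full[..., :4] * k
        mask_logit = mask_logit + update_full[..., 4:5]
    mask = 1.0 / (1.0 + np.exp(-mask_logit[..., 0]))  # sigmoid -> weights in (0, 1)
    return flow, mask
```

The point of this structure is that later, finer blocks only have to correct what the coarser blocks got wrong, instead of estimating large motions from scratch at full resolution.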

Privileged Distillation of IFNet (3-3)

RIFE's privileged distillation works as illustrated in the figure below, taken from the paper.

Only the teacher model receives the ground-truth intermediate frame as input, in addition to the two frames that the student model receives. Because it can see the actual intermediate frame, the teacher estimates optical flow more accurately. Distilling this result into the student helps the student learn more effectively, which is expected to improve accuracy.

Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi, Shuchang Zhou, 2020, Real-Time Intermediate Flow Estimation for Video Frame Interpolation, https://arxiv.org/abs/2011.06294
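
The distillation term can be sketched as follows, assuming the student's flow estimate after every IFBlock is pulled toward the teacher's flow using a per-pixel Euclidean distance. The function names and the (H, W, 4) flow layout are hypothetical, the exact norm and weighting in the paper may differ, and the full training objective also contains reconstruction losses that are not shown here.

```python
import numpy as np

def flow_l2(student_flow, teacher_flow):
    """Mean per-pixel Euclidean distance between two (H, W, 4) flow fields
    holding F_t->0 (channels 0-1) and F_t->1 (channels 2-3), summed over both."""
    total = 0.0
    for c in (slice(0, 2), slice(2, 4)):  # F_t->0, then F_t->1
        diff = student_flow[..., c] - teacher_flow[..., c]
        total += np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return total

def distillation_loss(student_flows, teacher_flow):
    """Sum the distance from every intermediate student estimate (one per
    IFBlock) to the teacher's flow. The teacher is a fixed target here; in a
    deep learning framework its output would be detached so that this loss
    does not update the teacher."""
    return sum(flow_l2(f, teacher_flow) for f in student_flows)
```

Applying the loss to every block's output, not only the last one, gives the early, coarse blocks a direct training signal as well.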

Evaluation (Chapter 4)

The effectiveness of RIFE is verified with metrics such as PSNR, SSIM, and IE (interpolation error). See Chapter 4 of the paper for the exact definitions.
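
For reference, PSNR and IE are easy to compute. This is a minimal NumPy sketch assuming both images are float arrays of the same shape; IE is taken here as the root-mean-square pixel difference, as in the Middlebury benchmark, and published IE values are typically computed on 0-255 pixel values. SSIM is more involved, and libraries such as scikit-image provide it (`skimage.metrics.structural_similarity`).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).
    max_val is the largest possible pixel value (1.0 or 255)."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def interpolation_error(pred, target):
    """IE: root-mean-square pixel difference (lower is better)."""
    diff = np.asarray(pred, float) - np.asarray(target, float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

For example, a prediction of 0.5 everywhere against an all-zero target (pixel range 0 to 1) gives an MSE of 0.25, so PSNR is 10·log10(4) ≈ 6.02 dB and IE is 0.5.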

Conclusion

RIFE is somewhat dated, but it achieves good accuracy with a relatively simple pipeline, which makes it one of the easier deep-learning frame interpolation techniques to use. Someone has implemented it as an extension for ComfyUI, which I often use when testing video generation, so give it a try if you are interested.

ComfyUI extension implementing RIFE: https://github.com/Fannovel16/ComfyUI-Frame-Interpolation