paper - Video Compression through Image Interpolation

Ann Liu
Aug 20, 2018

This post covers an ECCV 2018 paper from Chao-Yuan Wu (The University of Texas at Austin).

The paper proposes an end-to-end deep learning codec that builds on recent progress in deep image interpolation and generation. It outperforms H.261 and MPEG-4 Part 2 and performs on par with H.264.

The figure below compares the output of the codecs; the proposed method shows far fewer block artifacts than the others.

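In short, the anchor frames (key frames) are encoded first with a standard deep image compression method. The remaining frames between the anchor frames are then reconstructed by interpolation using the paper's method. Going one step further, the image interpolation is done in a hierarchical manner, which reduces how frequently anchor frames have to be encoded.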

The paper compares against the state-of-the-art and standard video codecs HEVC, H.264, MPEG-4 Part 2, and H.261, and tests on standard uncompressed datasets: Video Trace Library (VTL) and Ultra Video Group.

Video Trace Library : http://trace.eas.asu.edu/yuv/index.html

Ultra Video : http://ultravideo.cs.tut.fi/#testsequences

Finally, in the experiments, the results are evaluated with MS-SSIM (multi-scale structural similarity) and PSNR (peak signal-to-noise ratio).

https://zhuanlan.zhihu.com/p/37813759 introduces the relevant formulas.
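To make the PSNR metric concrete, here is a minimal sketch using the usual definition, PSNR = 10 * log10(MAX^2 / MSE). The function name `psnr`, the representation of a frame as a list of pixel rows, and the default peak value of 255 (8-bit images) are assumptions made for this illustration; they are not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Each frame is a list of rows of pixel intensities (an assumption made
    to keep the sketch dependency-free).
    """
    ref_pixels = [p for row in reference for p in row]
    dis_pixels = [p for row in distorted for p in row]
    if len(ref_pixels) != len(dis_pixels) or not ref_pixels:
        raise ValueError("frames must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(ref_pixels, dis_pixels)) / len(ref_pixels)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


original = [[10, 20], [30, 40]]
decoded = [[11, 21], [31, 41]]  # every pixel is off by one
print(round(psnr(original, decoded), 2))
```

A higher PSNR means the decoded frame is closer to the original. MS-SSIM is more involved (it compares local structure at several scales) and is not sketched here.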

Video Compression through Interpolation: Method

First, the video I-frames are encoded with the compression algorithm of Toderici et al. [1] from CVPR 2017, as shown in figure (a).

Interpolation network

The I-frames need no further encoding at this stage; the remaining R-frames are interpolated with a blind interpolation network.

The context network C extracts feature maps at several resolutions. The different scales are produced by U-Net up-convolutions, and f1, f2 are the features extracted from the original I1, I2 respectively.

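The interpolation network D takes the f1, f2 produced by C and feeds them into a decoder to reconstruct the R-frames. If an R-frame contains important information that cannot be obtained from I1 and I2, this approach runs into problems, so motion-compensated interpolation is used to strengthen it.

The paper mentions block motion estimation and optical flow. Block motion estimation is easier to compress, while optical flow carries more detail and is harder to compress. The residual motion information is used to warp every f (I have not fully figured out this step yet, for example whether the features are simply concatenated), and the warped context features are fed into the decoder. Motion compensation is added to improve image generation, not as a goal of motion estimation in its own right.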
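As a rough illustration of what "warping a feature map with block motion vectors" can mean, here is a minimal sketch. It is not the paper's implementation: the function name `warp_with_block_motion`, the single-channel feature map stored as a list of rows, the integer motion vectors, and the clamping at the borders are all assumptions made for this example.

```python
def warp_with_block_motion(feature, motion, block_size):
    """Warp a 2D feature map using one (dy, dx) vector per block.

    feature:    list of rows (a single channel).
    motion:     motion[by][bx] = (dy, dx), integer offsets for the block
                with block-row index by and block-column index bx.
    block_size: side length of the square blocks, in pixels.
    """
    height, width = len(feature), len(feature[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = motion[y // block_size][x // block_size]
            # Sample from the displaced location, clamped to the map borders.
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            warped[y][x] = feature[src_y][src_x]
    return warped


# A 4x4 feature map whose value encodes its position (row * 4 + column).
f1 = [[row * 4 + col for col in range(4)] for row in range(4)]
# Four 2x2 blocks, each sampling one pixel to the right.
shift_right = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]
warped_f1 = warp_with_block_motion(f1, shift_right, block_size=2)
print(warped_f1[0])
```

Because each block shares one vector, the motion field is just a small grid of integer pairs, which matches the note above that block motion is easier to compress than dense optical flow.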

The framework can learn variable-rate compression: a frame close to the reference frames of the interpolation network can be coded with fewer bits, while a frame further away needs more bits.

Hierarchical interpolation

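Some frames are interpolated first. In the figure above, the green cat in the middle is generated first, and that middle cat is then used together with a blue cat to interpolate further frames. Error propagation makes the codec ineffective once the hierarchy goes beyond three levels.

Each pair of temporal offsets needs its own network (Ma,b, where a is the distance to the preceding reference frame and b the distance to the following one), so at most 2^3 = 8 frames can be interpolated between two reference frames. M21 and M12 are flip models: the two input reference images only need to be swapped. Additionally combining M33 and M66 raises the number of generated frames from 8 to 12 (in fact I think the counts should be 7 and 11).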
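The frame counts can be checked with a small scheduling sketch. The function names and the tuple layout `(frame, left_ref, right_ref, level)` are made up for this illustration, and the way M66, M33, M12 and M21 are arranged in `gop12_schedule` is my assumption about how the models combine, not something stated explicitly above.

```python
def binary_schedule(left, right, levels):
    """Midpoint interpolation between two reference frames.

    Returns (frame, left_ref, right_ref, level) tuples in decoding order.
    """
    schedule = []
    segments = [(left, right)]
    for level in range(1, levels + 1):
        next_segments = []
        for a, b in segments:
            if b - a < 2:
                continue  # no frame left between the two references
            mid = (a + b) // 2
            schedule.append((mid, a, b, level))
            next_segments += [(a, mid), (mid, b)]
        segments = next_segments
    return schedule


def gop12_schedule():
    """Assumed three-level layout for I-frames at positions 0 and 12."""
    schedule = [(6, 0, 12, 1)]                  # M66
    schedule += [(3, 0, 6, 2), (9, 6, 12, 2)]   # M33
    for start in (0, 3, 6, 9):
        schedule.append((start + 1, start, start + 3, 3))  # M12
        schedule.append((start + 2, start, start + 3, 3))  # M21 (flipped M12)
    return schedule


print(len(binary_schedule(0, 8, levels=3)))
print(len(gop12_schedule()))
```

Under these assumptions, the number of interpolated frames is the interval length minus one, because the two endpoints are I-frames. That is consistent with the remark above that the counts are 7 and 11 rather than 8 and 12.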

Bitrate optimization:

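Each level is encoded at a different bitrate, and because of error propagation the bitrate chosen at one level affects the levels below it. The paper uses beam search: enumerate the m candidate bitrates at the first level, then extend each one with the m candidate bitrates of the next frame, giving m^2 combinations, and keep only the best m combinations for the next stage. The complexity is O(L*m^2), which saves a lot of computation compared with trying every combination across all L levels.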
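The search described above can be sketched as follows. This is only an illustration: `beam_search_bitrates` and `toy_cost` are hypothetical names, and the toy cost (total bits plus a distortion term in which errors from earlier levels carry over to later ones) is invented to stand in for the real rate-distortion measurement, which would require running the codec.

```python
def beam_search_bitrates(num_levels, rates, cost, beam_width=None):
    """Pick one bitrate per level, keeping only the best partial choices.

    cost(combo) scores a tuple of bitrates for the first len(combo) levels
    (lower is better). Returns (best_combo, number_of_cost_evaluations).
    """
    if beam_width is None:
        beam_width = len(rates)  # keep m combinations, as described above
    beam = [()]
    evaluations = 0
    for _ in range(num_levels):
        candidates = []
        for prefix in beam:
            for rate in rates:
                combo = prefix + (rate,)
                candidates.append((cost(combo), combo))
                evaluations += 1
        candidates.sort(key=lambda item: item[0])
        beam = [combo for _, combo in candidates[:beam_width]]
    return beam[0], evaluations


def toy_cost(combo):
    """Hypothetical cost: bits spent plus weighted, accumulated distortion."""
    total_bits = sum(combo)
    carried_error = 0.0
    distortion = 0.0
    for rate in combo:
        carried_error += 1.0 / rate   # this level's error adds to inherited error
        distortion += carried_error
    return total_bits + 4.0 * distortion


best, evaluations = beam_search_bitrates(3, (1, 2, 4), toy_cost)
print(best, evaluations)
```

With L levels and m candidate rates, the first level costs m evaluations and every later level costs at most m * m, so the total stays within L * m^2, whereas exhaustive search would need m^L. Beam search is a heuristic, so in general it is not guaranteed to return the globally best combination.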

Experiments:

Ablation study -

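Most notably, motion-compensated interpolation improves substantially over vanilla interpolation, and the final model combined with entropy coding gives the best coding performance. For a compression problem, the focus is on how well a method performs at low bitrates.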

Motion -

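This experiment compares motion vectors (H.264's algorithm) with optical flow (OpenCV implementation). Here flow* is the upper bound obtained by assuming that flow compression is lossless. Finding such a flow compression is not the focus of the paper, however, so block motion estimation remains the main choice.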

Entropy Coding -