The evolution of display technologies and the ever-increasing popularity of video content have led to growing demand for video compression to save on storage and bandwidth costs.
Compression works by exploiting the similarity between video frames. With a typical video running at 30 frames per second, consecutive frames are nearly identical, so a compression algorithm only needs to encode the information that actually changes from one frame to the next.
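This temporal-redundancy idea can be illustrated with a toy example (not the paper's method): if only a small region of a frame changes, the residual between consecutive frames is almost entirely zero and is therefore far cheaper to encode than the full frame.

```python
import numpy as np

# Toy illustration of temporal redundancy: consecutive frames are nearly
# identical, so encoding only the difference (the residual) is much
# cheaper than encoding each frame from scratch.
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
frame2 = frame1.copy()
frame2[10:20, 10:20] += 5  # only a small 10x10 region changes

residual = frame2.astype(np.int16) - frame1.astype(np.int16)
changed = np.count_nonzero(residual)
total = residual.size
print(f"{changed}/{total} pixels changed")  # 100/4096
```

Only 100 of 4096 pixels carry new information here, which is exactly the sparsity that inter-frame coding exploits.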
Video compression is all about finding the right trade-off between image quality and file size. Videos must be served to millions of customers, and not all of them have the network capacity to stream at the highest visual quality.
Although current video compression methods save significant bandwidth, their progress still relies on traditional hand-engineering. The latest video codec, the Versatile Video Codec (VVC), still shares core components with codecs designed two decades ago.
Video compression is also a problem that can be tackled with neural networks. Neural video compression has been gaining momentum recently, and recent methods achieve rate-distortion performance on par with traditional video codecs.
Despite achieving impressive compression performance, neural video compression methods struggle to produce “realistic” output. They can generate output close to the input video, but it lacks realism. For example, if you examine the hair of people in a video compressed by a neural model, you can see that it looks slightly off.
The point of adding a realism constraint to neural networks is to ensure that the output is indistinguishable from real images while staying close to the input video. The main challenge is ensuring that the network generalizes well to unseen content.
This is the problem that this paper attempts to solve. The authors carefully build a generative neural video compression method that excels at detail synthesis and propagation. This is achieved by using a Generative Adversarial Network (GAN) and paying close attention to the design of the GAN loss function.
In video compression, certain frames are designated as key frames (I-frames) and serve as the basis for reconstructing subsequent frames. Higher bit rates are allocated to these frames, so they carry more detail. The same holds for the proposed method, which reconstructs dependent frames (P-frames) from the available I-frame using a three-step strategy.
First, it synthesizes key details in the I-frame, which serve as a base for upcoming frames. This is done using a combination of convolutional neural network (CNN) and GAN components. The discriminator in the GAN component is responsible for ensuring that the reconstructed I-frame contains realistic detail.
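To make the discriminator's role concrete, here is a hedged sketch of one common adversarial objective, the hinge loss; the paper's exact loss formulation may differ, but the idea is the same: the discriminator scores reconstructions against real frames and pushes the decoder toward realistic detail.

```python
import numpy as np

# Sketch of a hinge-style GAN objective (an assumption for illustration;
# not necessarily the paper's exact loss).

def d_hinge_loss(real_logits, fake_logits):
    # The discriminator wants real frames scored above +1
    # and reconstructed frames scored below -1.
    return (np.maximum(0.0, 1.0 - real_logits).mean()
            + np.maximum(0.0, 1.0 + fake_logits).mean())

def g_hinge_loss(fake_logits):
    # The generator (the compression decoder) wants its
    # reconstructions to be scored as real.
    return -fake_logits.mean()

real = np.array([2.0, 0.5])    # discriminator scores on real frames
fake = np.array([-2.0, 0.5])   # scores on reconstructed frames
print(d_hinge_loss(real, fake))  # 1.0
print(g_hinge_loss(fake))        # 0.75
```

In the full method this adversarial term is combined with distortion and rate terms, so the decoder balances realism against fidelity and bit cost.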
Second, the synthesized details are propagated to subsequent frames. The optical flow network UFlow is used to predict motion between frames. The P-frame component consists of two autoencoders, one for optical flow and one for residual information. Together, these two parts propagate the details from the previous step as faithfully as possible.
Finally, another autoencoder determines when new details must be synthesized. Since new content can appear in P-frames, existing details may become irrelevant, and propagating them would degrade visual quality. When that happens, the network has to synthesize new details instead, which is handled by the residual autoencoder component.
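The overall P-frame reconstruction can be sketched as "warp the previous reconstruction with the predicted flow, then add a decoded residual." The toy code below uses an integer-pixel warp for simplicity; real codecs (including this one) use sub-pixel bilinear warping and learned decoders, so this is only an illustration of the structure, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of P-frame reconstruction: recon = warp(prev, flow) + residual.

def warp(frame, flow):
    # flow holds per-pixel (dy, dx) displacements; rounded to integers
    # here for simplicity (real systems interpolate sub-pixel motion).
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - np.rint(flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(xs - np.rint(flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

prev = np.arange(16, dtype=np.float64).reshape(4, 4)  # previous reconstruction
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0                  # content moved one pixel to the right
residual = np.full((4, 4), 0.5)     # decoded correction signal

recon = warp(prev, flow) + residual
```

The third step of the method decides, per region, whether this propagated content is still valid or whether the residual path should synthesize fresh detail instead.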
The authors highlight two components as critical to this method. The first is conditioning the residual generator on a latent obtained from the warped previous reconstruction. The second is refining the flow predicted by the optical flow network. The proposed method is evaluated both objectively and subjectively, and in both cases it outperforms existing neural video compression methods.
This was a summary of the paper “Neural Video Compression using GANs for Detail Synthesis and Propagation” from the Google Research group. You can check out the links below if you are interested in more details.
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'Neural Video Compression using GANs for Detail Synthesis and Propagation'. All Credit For This Research Goes To Researchers on This Project. Check out the paper.
Akram Cetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Turkey. He wrote his master's thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and works as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.