Abstract

Today’s neural networks for action recognition often use a pixel-based representation of videos as input for classification. The disadvantage of this representation is the high redundancy between frames and the resulting complexity of the networks. One possible way to reduce this complexity is to perform classification directly in the compressed video domain, since video encoders already remove redundancies to optimize videos for storage and transmission. Motivated by this, in this paper I aim to find a suitable approach that uses components of compressed videos for action recognition while reducing the overall complexity of the network, and I investigate whether temporal relations can be described with motion vectors. Experimental results show that the developed network does not perform as well as expected, since the branch of the network that takes motion vectors as input is unable to learn useful temporal features.

Documentation

All experiments are implemented with Keras; the code is available at https://gitlab.ldv.ei.tum.de/EmoVid/video-representation
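
To make the two-branch idea from the abstract concrete, below is a minimal Keras sketch of such a model. This is an illustration only, not the architecture from the repository: the input shapes, layer sizes, class count, and the assumption of per-macroblock motion vectors stacked over several P-frames are all hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 10          # hypothetical number of action classes
NUM_MV_FRAMES = 11        # hypothetical number of stacked P-frames

# Inputs: one decoded I-frame plus the stacked motion-vector fields
# (dx, dy per 16x16 macroblock) of the following P-frames.
iframe_in = layers.Input(shape=(224, 224, 3), name="iframe")
mv_in = layers.Input(shape=(14, 14, 2 * NUM_MV_FRAMES), name="motion_vectors")

# Spatial branch: a small CNN over the decoded key frame.
x = layers.Conv2D(32, 3, activation="relu")(iframe_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Temporal branch: a lightweight CNN over the motion-vector stack,
# intended to capture motion patterns without decoding full frames.
y = layers.Conv2D(64, 3, padding="same", activation="relu")(mv_in)
y = layers.Conv2D(64, 3, padding="same", activation="relu")(y)
y = layers.GlobalAveragePooling2D()(y)

# Late fusion of both branches, followed by the classifier head.
z = layers.Concatenate()([x, y])
z = layers.Dense(128, activation="relu")(z)
out = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = Model(inputs=[iframe_in, mv_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The point of this layout is that the motion-vector grid is far smaller than the decoded frames, so the temporal branch stays cheap compared with a pixel-based two-stream network, which is the complexity reduction the abstract argues for.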

