Robust video action recognition is a highly challenging task because of its complexity, and effectively extracting robust spatio-temporal features is the key to solving it. In this paper, we propose using a bidirectional long short-term memory network (Bi-LSTM) as the main framework to capture bidirectional spatio-temporal features of video sequences. First, to strengthen the feature representation, multi-layer convolutional neural network (CNN) features are used in place of traditional hand-crafted features. These multi-layer convolutional features combine low-level shape information with high-level semantic information and can therefore capture rich spatial information. Next, the extracted convolutional features are fed into the Bi-LSTM, which contains two LSTM layers running in opposite directions: the forward layer captures the evolution of the video from beginning to end, and the backward layer models that evolution in the reverse direction. Finally, the representations from the two directions are fused and passed to a Softmax layer to obtain the final classification result. Experimental results on the UCF101 and HMDB51 datasets show that the proposed method achieves good action recognition performance.
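The pipeline above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions the abstract does not state: per-frame features are formed by global-average-pooling the feature maps of one low and one high convolutional layer and concatenating them; random arrays stand in for real CNN activations; and the two directions are fused by concatenating the forward LSTM's last hidden state with the backward LSTM's hidden state at the first frame before a linear layer and softmax. All names (`fuse_multilayer`, `LSTMCell`, `BiLSTMClassifier`) are hypothetical, the weights are untrained, and this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()


def fuse_multilayer(low_map, high_map):
    """Hypothetical fusion: global-average-pool each (C, H, W) map, then concatenate."""
    return np.concatenate([low_map.mean(axis=(1, 2)), high_map.mean(axis=(1, 2))])


class LSTMCell:
    """Single-direction LSTM layer with untrained random weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Rows: input gate, forget gate, output gate, candidate cell (hidden_dim each).
        self.W = rng.normal(0.0, scale, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.H = hidden_dim

    def run(self, xs):
        """Process xs of shape (T, input_dim) in order; return hidden states (T, hidden_dim)."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        outputs = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
            g = np.tanh(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)


class BiLSTMClassifier:
    def __init__(self, input_dim, hidden_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)  # reads frames first -> last
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)  # reads frames last -> first
        self.W_out = rng.normal(0.0, 0.1, size=(num_classes, 2 * hidden_dim))
        self.b_out = np.zeros(num_classes)

    def predict_proba(self, frames):
        h_f = self.fwd.run(frames)
        # Run on the reversed clip, then flip back so index t is aligned with frame t.
        h_b = self.bwd.run(frames[::-1])[::-1]
        # Each direction's state after it has seen the whole clip:
        # forward at the last frame, backward at the first frame.
        fused = np.concatenate([h_f[-1], h_b[0]])
        return softmax(self.W_out @ fused + self.b_out)


# Usage with random stand-ins for CNN activations (16 frames, 101 classes as in UCF101).
rng = np.random.default_rng(42)
frames = np.stack([
    fuse_multilayer(rng.normal(size=(64, 28, 28)), rng.normal(size=(256, 7, 7)))
    for _ in range(16)
])  # shape (16, 320)
model = BiLSTMClassifier(input_dim=frames.shape[1], hidden_dim=32, num_classes=101)
probs = model.predict_proba(frames)
print(probs.shape, round(float(probs.sum()), 6))  # (101,) 1.0
```

Concatenating the two final states is only one plausible reading of "fused into Softmax"; averaging per-frame softmax scores, or pooling the concatenated forward and backward states over time, would be equally consistent with the abstract. Because the weights are random, the predicted class is meaningless until the model is trained.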