Describing Videos by Exploiting Temporal Structure

Note: the dataset preparation described in this post also applies to the following two papers.

 Attention-based LSTM with Semantic Consistency for Video Captioning

 Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

The authors provide the code on GitHub, and the experiments can be reproduced by downloading the datasets listed in their README. However, only the preprocessed MSVD data is provided; to test on other datasets you have to build the data yourself, and the authors do not provide code for that. Based on my own experiments, this post gives my code for building the data (using MSR-VTT as the example), shared here for mutual reference.

1. Convert the videos into frames

import os

video_path = '/data/MSRVTTClips/train-video/'
frame_path = '/data/MSRVTTFrames/'

for video in os.listdir(video_path):
    # MSR-VTT clips are named video0.mp4 ... video9999.mp4; the scripts in
    # steps 2 and 3 index videos as vid1 ... vid10000, so shift the index by one.
    num = int(video.split('video')[-1].split('.mp4')[0]) + 1
    out_dir = frame_path + 'vid' + str(num)
    os.mkdir(out_dir)
    # %04d gives zero-padded frame names (frame-0001.jpg, ...) that match the
    # reader in step 2.
    os.system("ffmpeg -i " + video_path + video + " " + out_dir + "/frame-%04d.jpg")
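Before running the feature extraction, it is worth checking that every video produced a non-empty frame folder. The sketch below is just my own quick sanity check and only assumes the frame_path used above:

import os

frame_path = '/data/MSRVTTFrames/'
counts = []
for d in sorted(os.listdir(frame_path)):
    n = len(os.listdir(os.path.join(frame_path, d)))
    counts.append(n)
    if n == 0:
        print('empty folder: ' + d)
print('videos: {}, min/max frames: {}/{}'.format(len(counts), min(counts), max(counts)))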
2. Extract frame features

 

import os
import sys

# Make pycaffe importable
caffe_root = '/home/caffe_cudnn/python/'
sys.path.insert(0, caffe_root)

import numpy as np
import caffe

gpu_id = 2
caffe.set_device(gpu_id)
caffe.set_mode_gpu()

# ResNet-152, features taken from the pool5 layer (2048-d per frame)
layer_num = 152
extract_from_layer = 'pool5'
model_def = "/home/caffe_cudnn/models/resnet/ResNet-" + str(layer_num) + "-deploy.prototxt"
pretrained_model = "/home/caffe_cudnn/models/resnet/ResNet-" + str(layer_num) + "-model.caffemodel"
batch_size = 1
folder_path = '/data/MSRVTTFrames/'
save_path = '/data/msrvtt/resnet' + str(layer_num) + '/'
mean_file = "/home/caffe_cudnn/models/resnet/ResNet_mean.npy"

net = caffe.Net(model_def, pretrained_model, caffe.TEST)

# Standard Caffe preprocessing: HWC->CHW, RGB->BGR, [0,1]->[0,255], mean subtraction
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255)
transformer.set_mean('data', np.reshape(np.load(mean_file), (3, 224, 224)))

# For each video, run every frame through the network and stack the pool5
# features into a (num_frames, 2048) array saved as vidN.npy
for i in range(1, 10001):
    video_path = os.path.join(folder_path, 'vid' + str(i) + '/')
    feature = []
    for idx in range(1, len(os.listdir(video_path)) + 1):
        frame = caffe.io.load_image(video_path + 'frame-' + str(idx).zfill(4) + '.jpg')
        net.blobs['data'].data[0] = transformer.preprocess('data', frame)
        temp = net.forward()
        feat = net.blobs[extract_from_layer].data[0].copy()
        feat = np.reshape(feat, (2048,))
        feature.append(feat)
    feature = np.asarray(feature)
    np.save(save_path + 'vid' + str(i) + '.npy', feature)
    print(video_path)
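After extraction, every vidN.npy should contain one 2048-d pool5 vector per frame, i.e. an array of shape (num_frames, 2048). A quick check along these lines (vid1 is only an example; the path follows save_path above):

import numpy as np

feat = np.load('/data/msrvtt/resnet152/vid1.npy')
print(feat.shape)   # expected: (num_frames, 2048)
print(feat.dtype)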
3. Build CAP.pkl, worddict.pkl and the split files

import json
import nltk
import pickle
import collections
import random
from collections import Counter

# Path to the MSR-VTT annotation json (this path is my own setup; point it at
# the annotation file that ships with MSR-VTT)
anno_json_path = '/data/msrvtt/videodatainfo.json'

with open(anno_json_path, 'r') as f:
    anno_json = json.load(f)
anno_data = anno_json['sentences']
sentences = anno_data

# Build the word-frequency dictionary over all captions
counter = Counter()
ncaptions = len(sentences)
for i, row in enumerate(sentences):
    caption = row['caption']
    # split directly on spaces:
    # tokens = caption.lower().split(' ')
    # use nltk for tokenization instead:
    tokens = nltk.tokenize.word_tokenize(caption.lower())
    counter.update(tokens)
    if i % 10000 == 0:
        print('[{}/{}] tokenized the captions.'.format(i, ncaptions))

with open('/data/msrvtt/worddict.pkl', 'w') as f:
    pickle.dump(counter, f)

# Group the captions by video; MSR-VTT names videos video0 ... video9999,
# which are mapped to vid1 ... vid10000 here
temp = {}
for j in range(1, 10001):
    temp['vid' + str(j)] = []
for i in range(len(sentences)):
    tmp = {}
    tmp['caption'] = sentences[i]['caption']
    tmp['cap_id'] = sentences[i]['sen_id']
    tmp['image_id'] = 'vid' + str(int(sentences[i]['video_id'].split('video')[-1]) + 1)
    tmp['tokenized'] = ' '.join(nltk.tokenize.word_tokenize(sentences[i]['caption'].lower()))
    temp['vid' + str(int(sentences[i]['video_id'].split('video')[-1]) + 1)].append(tmp)

# Renumber the caption ids of each video so that they start from 0
tp = {}
for j in range(1, 10001):
    tp['vid' + str(j)] = []
for k in range(1, 10001):
    tmp = temp['vid' + str(k)]
    min_id = min(tmp[i]['cap_id'] for i in range(len(tmp)))
    for m in range(len(tmp)):
        tmp[m]['cap_id'] -= min_id
        tmp[m]['cap_id'] = str(tmp[m]['cap_id'])
    tp['vid' + str(k)] = tmp

d = collections.OrderedDict()
for i in range(1, 10001):
    d['vid' + str(i)] = tp['vid' + str(i)]
with open('/data/msrvtt/CAP.pkl', 'w') as f:
    pickle.dump(d, f)

# Train / valid / test splits: 6513 / 497 / 2990 videos, 20 captions each
tmp = []
for i in range(1, 6514):
    for j in range(20):
        tmp.append('vid' + str(i) + '_' + str(j))
random.shuffle(tmp)
with open('/data/msrvtt/train.pkl', 'w') as f:
    pickle.dump(tmp, f)

tmp = []
for i in range(6514, 7011):
    for j in range(20):
        tmp.append('vid' + str(i) + '_' + str(j))
random.shuffle(tmp)
with open('/data/msrvtt/valid.pkl', 'w') as f:
    pickle.dump(tmp, f)

tmp = []
for i in range(7011, 10001):
    for j in range(20):
        tmp.append('vid' + str(i) + '_' + str(j))
random.shuffle(tmp)
with open('/data/msrvtt/test.pkl', 'w') as f:
    pickle.dump(tmp, f)
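Finally, I load the generated pickle files back to make sure they look like the MSVD data expected by the author's code. This is only a sketch under the paths used above and, like the scripts themselves, it is written for Python 2:

import pickle

with open('/data/msrvtt/CAP.pkl', 'r') as f:
    cap = pickle.load(f)
with open('/data/msrvtt/worddict.pkl', 'r') as f:
    worddict = pickle.load(f)
with open('/data/msrvtt/train.pkl', 'r') as f:
    train = pickle.load(f)

print(len(cap))        # 10000 videos
print(cap['vid1'][0])  # one caption entry: caption, cap_id, image_id, tokenized
print(len(worddict))   # vocabulary size (word -> frequency)
print(train[:5])       # entries like 'vid123_4' (video id + caption id)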








