BMN (Boundary-Matching Network) Code Walkthrough

2023-09-19 01:34:44

Code: https://github.com/JJBOY/BMN-Boundary-Matching-Network
JJBOY wrote this code by adapting the source of BSN, the paper authors' previous work.

Most of my notes on the code are written in the comments.

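The entry point is the main function in main.py, which runs a different function depending on the command-line arguments. We start with the case opt["mode"] == "train".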

BMN_Train(opt)

This function first creates a BMN object, so we begin with the BMN class (in models.py).

BMN(opt)

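The constructor takes a parameter dictionary opt, sets the hyperparameters, calls _get_interp1d_mask(), and then builds the network layers.

First, _get_interp1d_mask(). This function builds the sampling mask weight W ∈ R^(N×T×D×T) from the paper. It enumerates all possible proposals, expands the interval of each one, and then calls _get_interp1d_bin_mask to generate the corresponding w_(i,j) ∈ R^(N×T) for each proposal.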

_get_interp1d_mask(…)

    def _get_interp1d_mask(self):
        # generate sample mask for each boundary-matching pair
        mask_mat = []
        for end_index in range(self.tscale):  # the video feature sequence has length tscale (100)
            mask_mat_vector = []
            for start_index in range(self.tscale):  # tscale iterations
                if start_index <= end_index:
                    # Enumerate every segment: each one is sampled at num_sample points,
                    # and each of those points is averaged over num_sample_perbin sub-points
                    p_xmin = start_index
                    p_xmax = end_index + 1  # why +1?
                    center_len = float(p_xmax - p_xmin) + 1  # length; why "center", and why +1?
                    # Extend the interval on each side by prop_boundary_ratio (0.5) of center_len;
                    # the paper uses a ratio of 0.25
                    sample_xmin = p_xmin - center_len * self.prop_boundary_ratio
                    sample_xmax = p_xmax + center_len * self.prop_boundary_ratio
                    p_mask = self._get_interp1d_bin_mask(  # shape: (tscale, num_sample) = (100, 32)
                        sample_xmin, sample_xmax, self.tscale, self.num_sample, self.num_sample_perbin)
                else:
                    p_mask = np.zeros([self.tscale, self.num_sample])
                mask_mat_vector.append(p_mask)
            # print(len(mask_mat_vector))  # tscale arrays of shape (tscale, num_sample)
            mask_mat_vector = np.stack(mask_mat_vector, axis=2)  # (tscale, num_sample, tscale)
            mask_mat.append(mask_mat_vector)
        # print(len(mask_mat))  # tscale arrays of shape (tscale, num_sample, tscale)
        mask_mat = np.stack(mask_mat, axis=3)  # (tscale, num_sample, tscale, tscale)
        mask_mat = mask_mat.astype(np.float32)  # W ∈ R^(N×T×D×T), stored with shape [100, 32, 100, 100]
        # nn.Parameter is a subclass of torch.Tensor, meant to hold the trainable parameters of an nn.Module.
        # Unlike a plain torch.Tensor, an nn.Parameter is registered automatically as a module parameter,
        # i.e. it is included in the parameters() iterator; plain tensors held by the module are not.
        # requires_grad defaults to True for nn.Parameter (trainable), the opposite of the torch.Tensor default.
        # torch.Tensor is an alias for the default tensor type (torch.FloatTensor).
        # A tensor can be built from a Python list or sequence.
        # view returns a tensor with the same data but a different shape;
        # -1 means that dimension is inferred from the total size and the other dimension.
        self.sample_mask = nn.Parameter(torch.Tensor(mask_mat).view(self.tscale, -1), requires_grad=False)
        # torch.Size([100, 320000])
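To make the interval arithmetic concrete, here is a small standalone sketch of how the sampling range of one proposal is expanded. The helper name expanded_range is hypothetical; the formula mirrors the loop body above, with prop_boundary_ratio = 0.5 as noted in the comments.

```python
def expanded_range(start_index, end_index, prop_boundary_ratio=0.5):
    """Return (sample_xmin, sample_xmax) for one proposal, mirroring the loop body."""
    p_xmin = start_index
    p_xmax = end_index + 1
    center_len = float(p_xmax - p_xmin) + 1
    sample_xmin = p_xmin - center_len * prop_boundary_ratio
    sample_xmax = p_xmax + center_len * prop_boundary_ratio
    return sample_xmin, sample_xmax


# A proposal covering feature indices 10..19
print(expanded_range(10, 19))  # (4.5, 25.5)
# A proposal at the very start of the video extends past index 0
print(expanded_range(0, 3))    # (-2.5, 6.5)
```

Sample positions that fall outside [0, tscale - 1] are handled by the bounds checks in _get_interp1d_bin_mask.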

_get_interp1d_bin_mask(…)

    def _get_interp1d_bin_mask(self, seg_xmin, seg_xmax, tscale, num_sample, num_sample_perbin):
        # generate sample mask for a boundary-matching pair
        # num_sample is the number of sample points; num_sample_perbin is the number of sub-points
        # each sample point is split into, for 32 * 3 = 96 points in total.
        # Each sample point's w(i,j) is built from its own sub-points.
        plen = float(seg_xmax - seg_xmin)  # length after expansion
        plen_sample = plen / (num_sample * num_sample_perbin - 1.0)  # spacing between sub-points
        total_samples = [seg_xmin + plen_sample * ii for ii in range(num_sample * num_sample_perbin)]  # all sub-points
        p_mask = []
        # build w(i,j) for each of the num_sample sample points from its sub-points
        for idx in range(num_sample):
            bin_samples = total_samples[idx * num_sample_perbin:(idx + 1) * num_sample_perbin]  # sub-points of this sample point
            bin_vector = np.zeros([tscale])  # size = tscale??
            for sample in bin_samples:  # follows the construction of w(i,j,n)[t] in the paper
                sample_upper = math.ceil(sample)  # round up
                sample_decimal, sample_down = math.modf(sample)  # returns (fractional part, integer part)
                if int(sample_down) <= (tscale - 1) and int(sample_down) >= 0:
                    bin_vector[int(sample_down)] += 1 - sample_decimal
                if int(sample_upper) <= (tscale - 1) and int(sample_upper) >= 0:
                    bin_vector[int(sample_upper)] += sample_decimal
            bin_vector = 1.0 / num_sample_perbin * bin_vector  # average over the sub-points
            p_mask.append(bin_vector)  # ends up as num_sample vectors of length tscale (100), i.e. (num_sample, 100)
        p_mask = np.stack(p_mask, axis=1)  # stacking along axis=1 gives shape (100, num_sample)
        return p_mask  # w(i,j) ∈ R^(N×T), stored with shape [100, 32]
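The inner loop is a linear interpolation: a fractional sample position splits its weight between the two neighbouring integer indices. The sketch below isolates that step for a single position, using a hypothetical helper interp_weights and a tiny tscale so the result is easy to read.

```python
import math


def interp_weights(sample, tscale):
    """Split one fractional sample position between its two neighbouring indices."""
    weights = [0.0] * tscale
    sample_upper = math.ceil(sample)
    sample_decimal, sample_down = math.modf(sample)  # (fractional part, integer part)
    if 0 <= int(sample_down) <= tscale - 1:
        weights[int(sample_down)] += 1 - sample_decimal
    if 0 <= int(sample_upper) <= tscale - 1:
        weights[int(sample_upper)] += sample_decimal
    return weights


print(interp_weights(2.25, 5))  # [0.0, 0.0, 0.75, 0.25, 0.0]
print(interp_weights(3.0, 5))   # [0.0, 0.0, 0.0, 1.0, 0.0]
# The upper neighbour (index 5) is out of range, so only the lower half survives
print(interp_weights(4.5, 5))   # [0.0, 0.0, 0.0, 0.0, 0.5]
```

Averaging num_sample_perbin such vectors gives one column of the per-proposal mask.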

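Back in the BMN constructor, read the layer definitions together with forward(). The layers differ slightly from Table 1 of the paper in three ways: (1) in the second convolution of x_1d_b in the Base Module, the paper reduces the channel dimension to 128, whereas the code keeps it at 256; (2) the Proposal Evaluation Module has an extra x_1d_p; (3) x_2d_p in the Proposal Evaluation Module has one more "nn.Conv2d(self.hidden_dim_2d, self.hidden_dim_2d, kernel_size=3, padding=1), nn.ReLU(inplace=True)" group than the paper. The comments in forward show how the tensor shape changes as data flows through the network.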

    def forward(self, x):  # x: torch.Size([8, 400, 100])
        base_feature = self.x_1d_b(x)  # torch.Size([8, 256, 100])
        start = self.x_1d_s(base_feature).squeeze(1)  # squeeze(1) removes dimension 1 when its size is 1
        end = self.x_1d_e(base_feature).squeeze(1)  # torch.Size([8, 1, 100]) becomes torch.Size([8, 100])
        confidence_map = self.x_1d_p(base_feature)  # torch.Size([8, 256, 100]), S(F) ∈ R^(C×T)
        confidence_map = self._boundary_matching_layer(confidence_map)  # torch.Size([8, 256, 32, 100, 100])
        confidence_map = self.x_3d_p(confidence_map).squeeze(2)  # torch.Size([8, 512, 1, 100, 100]) becomes torch.Size([8, 512, 100, 100])
        confidence_map = self.x_2d_p(confidence_map)  # torch.Size([8, 2, 100, 100])
        return confidence_map, start, end
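_boundary_matching_layer itself is not shown here. Given the flattened sample_mask of shape (T, N*T*T), one plausible implementation is a single matrix product followed by a reshape. The NumPy sketch below illustrates that shape arithmetic with small hypothetical sizes and a random stand-in for the mask; it is an assumption about how the layer works, not a copy of the repository code.

```python
import numpy as np

# Hypothetical small sizes: batch, channels, temporal scale, samples per proposal
B, C, T, N = 2, 3, 5, 4

rng = np.random.default_rng(0)
features = rng.standard_normal((B, C, T)).astype(np.float32)  # S(F): (B, C, T)
sample_mask = rng.random((T, N, T, T)).astype(np.float32)     # stand-in for W: (T, N, T, T)

flat_mask = sample_mask.reshape(T, -1)            # (T, N*T*T), like view(self.tscale, -1)
bm_features = np.matmul(features, flat_mask)      # (B, C, N*T*T)
bm_features = bm_features.reshape(B, C, N, T, T)  # (B, C, N, T, T)

print(bm_features.shape)  # (2, 3, 4, 5, 5)
```

With B = 8, C = 256, T = 100 and N = 32 this gives the [8, 256, 32, 100, 100] shape noted in the forward comments.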

That covers the BMN class. Now back to the BMN_Train function.

    def BMN_Train(opt):
        model = BMN(opt)  # create the BMN object first
        model = torch.nn.DataParallel(model, device_ids=[0]).cuda()  # multi-GPU wrapper (this machine has a single GPU)
        # filter drops parameters with requires_grad == False, i.e. those that need no gradient
        optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=opt["training_lr"],
                               weight_decay=opt["weight_decay"])

        # The first argument of DataLoader can be a map-style or an iterable-style dataset.
        # VideoDataSet is map-style (it must provide __getitem__() and __len__()).
        train_loader = torch.utils.data.DataLoader(VideoDataSet(opt, subset="train"),
                                                   batch_size=opt["batch_size"], shuffle=True,
                                                   num_workers=8, pin_memory=True)
        # num_workers sets how many subprocesses load data; 0 (the default) loads everything in the main process.
        # With pin_memory=True the data loader copies tensors into CUDA pinned memory before returning them.
        test_loader = torch.utils.data.DataLoader(VideoDataSet(opt, subset="validation"),
                                                  batch_size=opt["batch_size"], shuffle=False,
                                                  num_workers=8, pin_memory=True)

        # learning-rate schedule: every step_size epochs, lr = lr * gamma
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=opt["step_size"], gamma=opt["step_gamma"])
        bm_mask = get_mask(opt["temporal_scale"])  # [tscale, tscale] tensor: upper triangle all 1, lower triangle all 0
        for epoch in range(opt["train_epochs"]):
            # warning: calling scheduler.step() before optimizer.step() will skip the first value
            # of the learning rate schedule; changing the order may make results impossible to reproduce
            scheduler.step()
            train_BMN(train_loader, model, optimizer, epoch, bm_mask)
            test_BMN(test_loader, model, epoch, bm_mask)
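The StepLR schedule multiplies the learning rate by gamma every step_size epochs. As a rough illustration, its closed form can be sketched in plain Python. The values 0.001, 7 and 0.1 are hypothetical stand-ins for opt["training_lr"], opt["step_size"] and opt["step_gamma"], and step_lr is a helper made up for this sketch, not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` scheduler steps under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)


# The rate stays flat within a window of step_size epochs, then drops by a factor of gamma
for epoch in (0, 6, 7, 14):
    print(epoch, step_lr(0.001, epoch, step_size=7, gamma=0.1))
```

Because scheduler.step() runs at the top of each loop iteration, in recent PyTorch versions (1.1 and later, as far as I know) loop epoch e trains with the rate for e + 1 scheduler steps, which is what the warning in the comment refers to.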

VideoDataSet, which is passed to DataLoader, is a key class, so we look at its code next.

VideoDataSet(data.Dataset)

Constructor

    def __init__(self, opt, subset="train"):
        # some attribute assignments omitted here
        self._getDatasetDict()  # pick the entries of anno_database that belong to the current subset (self.subset); dict: {id: annotation, ...}
        # r(tn) = [tn − df/2, tn + df/2], where df = tn − t(n−1) is the temporal interval between two locations.
        # There are 100 temporal locations and df = 1 in index units (temporal_gap after normalization).
        # Left and right edges of the region around each candidate location, normalized to [0, 1]:
        self.anchor_xmin = [self.temporal_gap * (i - 0.5) for i in range(self.temporal_scale)]
        # [-0.005, 0.005, 0.015, 0.025, 0.035, ..., 0.925, 0.935, 0.9450000000000001, 0.9550000000000001, 0.965, 0.975, 0.985]
        self.anchor_xmax = [self.temporal_gap * (i + 0.5) for i in range(self.temporal_scale)]
        # [0.005, 0.015, 0.025, 0.035, 0.045, ..., 0.935, 0.9450000000000001, 0.9550000000000001, 0.965, 0.975, 0.985, 0.995]
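The anchor lists are easier to read with a small temporal scale. The sketch below recomputes them with temporal_scale = 4 (a hypothetical value chosen so every number is exact in floating point); make_anchors is a helper invented for the illustration.

```python
def make_anchors(temporal_scale):
    """Return the left and right edges of the region around each temporal location."""
    temporal_gap = 1.0 / temporal_scale
    anchor_xmin = [temporal_gap * (i - 0.5) for i in range(temporal_scale)]
    anchor_xmax = [temporal_gap * (i + 0.5) for i in range(temporal_scale)]
    return anchor_xmin, anchor_xmax


xmin, xmax = make_anchors(4)
print(xmin)  # [-0.125, 0.125, 0.375, 0.625]
print(xmax)  # [0.125, 0.375, 0.625, 0.875]
```

Each region is temporal_gap wide and consecutive regions tile the axis without gaps, since xmax[i] equals xmin[i + 1].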

Here self._getDatasetDict() picks the entries of anno_database that belong to the current subset (self.subset), giving a dict of the form {id: annotation, ...}.

_getDatasetDict()

    def _getDatasetDict(self):
        anno_df = pd.read_csv(self.video_info_path)  # DataFrame, shape (19228, 7); holds the id, the subset, etc.
        anno_database = load_json(self.video_anno_path)  # dict, len 19228; holds the id, the annotations, etc.
        self.video_dict = {}
        for i in range(len(anno_df)):  # copy the entries of anno_database that belong to self.subset into self.video_dict
            video_name = anno_df.video.values[i]  # i-th value of the "video" column, i.e. the id/name
            video_info = anno_database[video_name]  # annotation for this id
            video_subset = anno_df.subset.values[i]  # i-th value of the "subset" column
            if self.subset in video_subset:  # e.g. if "train" in "training":
                self.video_dict[video_name] = video_info
        self.video_list = list(self.video_dict.keys())  # list of ids
        print("%s subset video numbers: %d" % (self.subset, len(self.video_list)))
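Note that the filter is a substring test, not an equality test, which is why subset="train" matches rows labelled "training". A minimal sketch with made-up records (the ids and the select_subset helper are hypothetical):

```python
# Hypothetical records standing in for rows of the video info CSV
records = [
    ("v_001", "training"),
    ("v_002", "validation"),
    ("v_003", "testing"),
    ("v_004", "training"),
]


def select_subset(records, subset):
    """Keep the ids whose subset label contains `subset` as a substring."""
    return [name for name, label in records if subset in label]


print(select_subset(records, "train"))       # ['v_001', 'v_004']
print(select_subset(records, "validation"))  # ['v_002']
```

An equality test (subset == label) would return nothing for "train" with these labels.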

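----------------------------------------

As noted in the comments of BMN_Train above, the first argument of DataLoader can be a map-style or an iterable-style dataset, and VideoDataSet is map-style (it must provide __getitem__() and __len__()). When self.mode == "train", the __getitem__() implemented in VideoDataSet returns the feature data, the confidence map, the start scores and the end scores; otherwise it returns the index and the feature data.
__getitem__() mainly relies on two functions, _load_file() and _get_train_label(), which produce the feature data and the labels respectively. _load_file() only reads and converts data and has nothing noteworthy, so we focus on _get_train_label() below.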

def _get_train_label(self, index, anchor_xmin, anchor_xmax)

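The function first reads some metadata and computes the effective duration corrected_second as feature frames / total frames * total duration. However, I noticed that for some videos feature_frame > video_frame.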
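As a rough illustration of this step, the sketch below computes the effective duration and the clamped normalization used for the labels. The function names and the frame counts are hypothetical; only the two formulas come from the code being discussed.

```python
def corrected_duration(feature_frame, video_frame, video_second):
    """Duration in seconds actually covered by the extracted features."""
    return float(feature_frame) / video_frame * video_second


def normalize(t_second, corrected_second):
    """Convert a timestamp in seconds to a fraction of the video, clamped to [0, 1]."""
    return max(min(1, t_second / corrected_second), 0)


# Hypothetical video: 2048 frames over 64 s, features cover 1536 frames
corrected = corrected_duration(1536, 2048, 64.0)
print(corrected)                   # 48.0
print(normalize(24.0, corrected))  # 0.5
print(normalize(60.0, corrected))  # 1 (clamped)

# When feature_frame > video_frame the effective duration exceeds the real one
print(corrected_duration(2100, 2048, 64.0))  # 65.625
```

In the feature_frame > video_frame case, every normalized timestamp comes out slightly smaller than it would with the real duration.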

        # change the measurement from second to percentage
        gt_bbox = []  # (start, end) pairs of the action instances in this video
        for j in range(len(video_labels)):
            tmp_info = video_labels[j]
            # Normalize to [0, 1] (seconds become a fraction of the video). As long as the timestamp
            # does not exceed the effective duration, tmp_* = tmp_info['segment'][x] / corrected_second
            tmp_start = max(min(1, tmp_info['segment'][0] / corrected_second), 0)
            tmp_end = max(min(1, tmp_info['segment'][1] / corrected_second), 0)
            gt_bbox.append([tmp_start, tmp_end])

        # generate R_s and R_e
        gt_bbox = np.array(gt_bbox)  # shape (n, 2), n = number of segments
        gt_xmins = gt_bbox[:, 0]  # 1-D array of length n
        gt_xmaxs = gt_bbox[:, 1]
        # for a ground-truth action instance φg=(ts,te) with duration dg = te−ts
        # we denote its starting and ending regions as rS=[ts−dg/10,ts+dg/10] and rE=[te−dg/10,te+dg/10]
        # The code below instead uses a fixed length gt_len_small = 0.03, i.e. each boundary point is
        # extended by 0.015 on each side, which does not quite match the paper
        gt_lens = gt_xmaxs - gt_xmins
        gt_len_small = 3 * self.temporal_gap  # np.maximum(self.temporal_gap, self.boundary_ratio * gt_lens)
        # np.stack turns the tuple of two 1-D arrays into an (n, 2) array, n = number of segments
        gt_start_bboxs = np.stack((gt_xmins - gt_len_small / 2, gt_xmins + gt_len_small / 2), axis=1)
        gt_end_bboxs = np.stack((gt_xmaxs - gt_len_small / 2, gt_xmaxs + gt_len_small / 2), axis=1)

        gt_iou_map = np.zeros([self.temporal_scale, self.temporal_scale])
        for i in range(self.temporal_scale):  # enumerate all candidate intervals and compute the IoU with every ground-truth interval
            for j in range(i, self.temporal_scale):
                # arguments: candidate left edge, candidate right edge, ground-truth starts, ground-truth ends
                gt_iou_map[i, j] = np.max(  # np.max takes the maximum of the returned 1-D array
                    iou_with_anchors(i * self.temporal_gap, (j + 1) * self.temporal_gap, gt_xmins, gt_xmaxs))
        gt_iou_map = torch.Tensor(gt_iou_map)

        # IoA (IoR in the paper) between the region around each candidate location
        # and the expanded regions around the ground-truth boundaries
        match_score_start = []
        for jdx in range(len(anchor_xmin)):
            # arguments: candidate region left edge, candidate region right edge,
            # left edges and right edges of all expanded ground-truth start regions
            match_score_start.append(np.max(
                ioa_with_anchors(anchor_xmin[jdx], anchor_xmax[jdx], gt_start_bboxs[:, 0], gt_start_bboxs[:, 1])))
        match_score_end = []
        for jdx in range(len(anchor_xmin)):
            # arguments: candidate region left edge, candidate region right edge,
            # left edges and right edges of all expanded ground-truth end regions
            match_score_end.append(np.max(
                ioa_with_anchors(anchor_xmin[jdx], anchor_xmax[jdx], gt_end_bboxs[:, 0], gt_end_bboxs[:, 1])))
        match_score_start = torch.Tensor(match_score_start)  # torch.Size([100])
        match_score_end = torch.Tensor(match_score_end)  # torch.Size([100])
        return match_score_start, match_score_end, gt_iou_map
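The difference between the paper's boundary regions and the fixed-width ones in the code can be seen with one example. The sketch below uses a hypothetical ground-truth instance and a helper boundary_regions invented for the comparison; temporal_gap = 1/100 matches the 100 temporal locations mentioned earlier.

```python
def boundary_regions(t_start, t_end, half_width):
    """Return the start and end regions of half-width `half_width` around each boundary."""
    start_region = (t_start - half_width, t_start + half_width)
    end_region = (t_end - half_width, t_end + half_width)
    return start_region, end_region


temporal_gap = 1.0 / 100
t_start, t_end = 0.25, 0.75          # hypothetical ground-truth instance, d_g = 0.5

paper_half = (t_end - t_start) / 10  # paper: d_g / 10, about 0.05 here
code_half = 3 * temporal_gap / 2     # code: gt_len_small / 2, about 0.015

print(boundary_regions(t_start, t_end, paper_half))
print(boundary_regions(t_start, t_end, code_half))
```

With the fixed width, the region no longer scales with the instance: actions longer than 0.15 of the video get narrower regions than the paper prescribes, and shorter ones get wider regions.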

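The functions that compute the interval IoU and IoA (called IoR in the paper, though the computation appears to be the same) live in utils.py. Only iou_with_anchors() is shown below, because the code of ioa_with_anchors is almost identical.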

iou_with_anchors(anchors_min, anchors_max, box_min, box_max)

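    def iou_with_anchors(anchors_min, anchors_max, box_min, box_max):
        """Compute the IoU between the proposal interval (anchors_min, anchors_max) and the ground-truth intervals.

        box_min is the array of ground-truth start points, box_max the array of ground-truth end points.
        """
        len_anchors = anchors_max - anchors_min
        # The intersection of two intervals (s1, e1) and (s2, e2) is the interval between
        # the larger start max(s1, s2) and the smaller end min(e1, e2).
        # If min(e1, e2) <= max(s1, s2), the intervals do not intersect.
        # np.maximum compares two arrays element-wise and keeps the larger value.
        int_xmin = np.maximum(anchors_min, box_min)  # the larger start
        int_xmax = np.minimum(anchors_max, box_max)  # the smaller end
        inter_len = np.maximum(int_xmax - int_xmin, 0.)  # intersection length; a negative value means no overlap, so use 0
        union_len = len_anchors + box_max - box_min - inter_len  # union = sum of the two lengths - intersection
        jaccard = np.divide(inter_len, union_len)
        return jaccard  # 1-D array whose length is the number of ground-truth segments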
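Since ioa_with_anchors is not shown, here is a scalar sketch of the two measures side by side. It assumes that IoA differs from IoU only in the denominator, dividing the intersection by the anchor length instead of the union; the function names iou and ioa are made up for the illustration.

```python
def iou(a_min, a_max, b_min, b_max):
    """Intersection over union of two 1-D intervals."""
    inter = max(min(a_max, b_max) - max(a_min, b_min), 0.0)
    union = (a_max - a_min) + (b_max - b_min) - inter
    return inter / union


def ioa(a_min, a_max, b_min, b_max):
    """Intersection over the length of the first interval (the anchor); assumed definition."""
    inter = max(min(a_max, b_max) - max(a_min, b_min), 0.0)
    return inter / (a_max - a_min)


# Anchor [0, 2] against box [1, 5]: intersection 1, union 5, anchor length 2
print(iou(0.0, 2.0, 1.0, 5.0))  # 0.2
print(ioa(0.0, 2.0, 1.0, 5.0))  # 0.5
# Disjoint intervals score 0
print(iou(0.0, 1.0, 2.0, 3.0))  # 0.0
```

Under this assumption, IoA reaches 1 whenever the anchor lies entirely inside the box, regardless of how large the box is.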

That covers the code related to the VideoDataSet class. Once a DataLoader has been built from a VideoDataSet object, data can be fetched as follows.

for n_iter, (input_data, label_confidence, label_start, label_end) in enumerate(data_loader):

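Training repeatedly fetches data this way and feeds the feature data input_data into the network, whose forward pass outputs the confidence map, the start scores and the end scores. These are passed to bmn_loss_func together with the ground-truth confidence map, start scores and end scores to compute the loss. Backpropagation and the optimizer step then update the weights (just a few torch calls).

The loss function, BMN_inference (proposal generation) and BMN_post_processing (proposal filtering) need little comment, so they are skipped here.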
