基于Excel的神经网络工具箱（之二）——前向传播及后向传播的算法VBA代码实现

2023-10-26 05:03:13

基于Excel的神经网络工具箱（之二）——ANN前向传播FP及反向传播算法BP的VBA算法实现

神经网络数据结构
前向传播算法
- 单层感知器的感知结果计算
- 感知器结果的前向传播
反向传播算法的实现
- 误差和梯度的计算
- 梯度从后往前的传播
- 随机梯度下降算法和标准梯度下降算法
- Levenberg Marquardt算法
- SCG算法
- 相关矩阵操作函数
- 卷积网络的前向和反向传播算法

本系列文章的其他部分链接如下：

基于Excel的神经网络工具箱（开篇）——ANNToolbox介绍
基于Excel的神经网络工具箱（之一）—— DNN神经网络数据结构的算法实现

神经网络数据结构

在上一篇文章中，我已经给出了ANN Toolbox中的完整神经网络数据结构的相关代码，包括数据结构定义、基本的单元、层、神经网络的创建、初始化、操作、删除以及单元计算的算法，详情可以参见前一篇文章。

前向传播算法

前向传播是ANN的“感知”过程。简单地讲，ANN的第一层与我们的输入数据紧密相连，这一层的每一个单元都与输入数据完全连接，而每一个单元是否激活就取决于单元的每一个连接权值和输入的加权平均，加上偏置bias值。公式如下：
$A(\sum_i^N x_i * w_i + b)$
其中 $A (x)$ 为单元的激活函数
在ANN Toolbox中，代码如下：

Private Function calcUnit(unit As NNUnit, vec() As Double, f As Byte, Optional d As Byte = 1) As Double
' calculates the output of a unit, returns single double result, this function is used for FULL-CONNECTION layer
' f as predefined type of unit activation function
' d as dimension of unit weights array
' calculation is feasible only when dimension and size of vec is the same as those of unit weights, _but it is not verified before calculation
Dim wc As Long, vc As Long, dimension As Byte
Dim i As Long, j As Long, k As Long
Dim m As Long, n As Long, o As Long
Dim result As DoubleOn Error GoTo errorHandlerWith unitresult = 0Select Case dCase 1For i = 1 To UBound(.w, 1)result = result + .w(i) * vec(i)NextCase 2For i = 1 To UBound(.w, 1)For j = 1 To UBound(.w, 2)result = result + .w(i, j) * vec(i, j)NextNextCase 3For i = 1 To UBound(.w, 1)For j = 1 To UBound(.w, 2)For k = 1 To UBound(.w, 3)result = result + .w(i, j, k) * vec(i, j, k)NextNextNextEnd Selectresult = result + .bresult = UTFunction(f, result)End WithcalcUnit = resultExit Function
errorHandler:calcUnit = 0MsgBox "Calculating Unit Error!"
End Function

单层感知器的感知结果计算

上面的代码计算 $\sum_i^N x_i * w_i + b$ 的值，并把结果传入TUFunction()函数中，这是单元的激活函数，计算最终的激活值：

Private Function UTFunction(fType As Byte, X As Double) As Double
' this function calculates the Unit Transfering Function
' to prevent from overflow error, input x should be larger than 700If X > -700 And X < 700 ThenSelect Case fTypeCase UTFLOGUTFunction = 1 / (1 + Exp(-1 * X))Case UTFTanH'UTFunction = Tanh(x)UTFunction = (1 - Exp(-1 * X)) / (1 + Exp(-1 * X))Case UTFReLUTFunction = max(0, X)Case UTFLkyReLUTFunction = max(0, X) + 0.01 * min(0, X)Case UTFLinUTFunction = XCase UTFSoftMax' softmax is calculated later when results of all units are calculated, here only gives an intermediate resultUTFunction = Exp(X)End SelectElseIf X < -700 Then 'if x<-700 exp function will return "overflow" error, then value should be calculated manuallySelect Case fTypeCase UTFLOGUTFunction = 0Case UTFTanH'UTFunction = Tanh(x)UTFunction = -1Case UTFReLUTFunction = 0Case UTFLkyReLUTFunction = 0.01 * XCase UTFLinUTFunction = XCase UTFSoftMaxUTFunction = 0End SelectElseIf X > 700 ThenSelect Case fTypeCase UTFLOGUTFunction = 1Case UTFTanH'UTFunction = Tanh(x)UTFunction = 1Case UTFReLUTFunction = XCase UTFLkyReLUTFunction = XCase UTFLinUTFunction = XCase UTFSoftMaxUTFunction = 1E+200End SelectEnd If
End Function

感知器结果的前向传播

第一层感知器从输入接受到信号后，运用激活函数计算出第一层的输出，这时会将自己的输出传入下一层，下一层感知器运用完全相同的法则，计算下一层的激活值，并且逐级传递，一直到最后的输出层：
这个过程，在ANN Toolbox中使用了NNFP()函数实现：

Public Function NNFP(net As NN, vec() As Double, output() As MultiArr) As Long
' get foreward calculation result - to calculate one step the output of a NN given input vector, output output vector of all layers
' vec() is the input vector, function returns the classification of final output if size of output is larger than 1Dim vecSize As Long, outputSize As Long
Dim curUnit As NNUnit, utf As Byte, num As Double
Dim i As Long, j As Long, n As Long, m As LongvecSize = UBound(vec)With net'''' set up the structure of multi array output()n = .hlayercount + 1ReDim output(0 To n)For i = 0 To n''  gets the number of units for each layerSelect Case iCase 0m = vecSize         ' input is stored in output(0)Case nm = UBound(.OutLayer.units)Case Elsem = UBound(.hLayer(i).units)End SelectReDim output(i).d(1 To m)Next'' start calculation of neural network outputsIf vecSize = .inSize Thenoutput(0).d = vecFor i = 1 To n                                  'calculate output layer by layer, including last layer = outlayer, and store result of each layer in the multi arrayIf i < n ThenoutputSize = UBound(.hLayer(i).units)   'output size = either the unit count of a hidden layer, or the NNUnit count of output layerutf = .hLayer(i).utf                    'get the unit transfering function for the layerElseoutputSize = UBound(.OutLayer.units)utf = .OutLayer.utfEnd IfFor j = 1 To outputSizeIf i <= .hlayercount ThencurUnit = .hLayer(i).units(j)ElsecurUnit = .OutLayer.units(j)End Ifoutput(i).d(j) = calcUnit(curUnit, output(i - 1).d, utf)NextNextEnd If' findout the largest value in output vector, and return the class represented by the largest _calculate output value of outlayer if outlayer is softmaxj = 1If outputSize > 1 ThenIf .OutLayer.utf = UTFSoftMax Thennum = 0For i = 1 To outputSizenum = num + output(n).d(i)If output(n).d(j) < output(n).d(i) Then j = iNext' calculate the value of softmax, overflow if num = 0If num <> 0 ThenFor i = 1 To outputSizeoutput(n).d(i) = output(n).d(i) / numNextEnd IfElseFor i = 2 To outputSizeIf output(n).d(j) < output(n).d(i) Then j = iNextEnd IfEnd IfNNFP = jEnd With
End Function

这个函数仅适用于全连接网络的计算，因为CNN的卷积计算需要用到不同的逻辑，因此CNNFP的算法稍有不同。但思想是一致的，根据输入层的数据，逐级从前到后计算所有神经元的激活值，并输出到下一层。对CNN来说，不同的是数据维度不同。

反向传播算法的实现

神经网络完成一次前向传播后，就已经完成了输出了。如果是训练好的网络，它的权值已经高度适应相关的输入信号，能够在特定的神经元中产生特定的激活信号，最终在输出层产生有“意义”的输出，这时候我们是不需要后向传播或反向传播算法的。

反向传播算法用在神经元的训练上。如果是没有经过训练的神经网络，它的输出将会是无意义的随机值。这时，我们的训练方法是所谓的“有监督学习”。意思是说，我们在给神经网络特定的输入时，让神经网络首先产生随机的输出，但同时，我们直接告诉神经网络“正确”的输出是什么。通过计算网络实际输出和“正确”输出之间的差值（我们称为误差 $\delta$ ），我们能够推导出神经网络每一层的神经元距离应该输出正确值的权值的误差。不过这个误差是从最后一层（输出层）一级级反推出来的：首先计算输出的误差，反推出倒数第二层的误差、再反推出倒数第三层的误差……最后得出整个网络的误差。

得到整个网络的误差后，我们会按照这个误差的方向“稍微”调整一下各个神经元的权值，再进行下一轮的“学习”

这就是神经网络学习的过程，因为误差是沿着数据输入的路径反向传播的，因此得名反向传播算法。

误差和梯度的计算

ANN Toolbox中的误差和梯度的计算非常依赖与NNGrad()函数，在这个函数中，首先计算网络输出值与目标值之间的差

但是，计算出差值之后，还需要计算输出层每个单元的偏导数，这需要用到激活函数的求导函数，这个函数定义为dUTFunction()

Private Function dUTFunction(fType As Byte, X As Double) As Double
' this function calculates the derived function of the unit transfering function
' for logistic sigmoid and hyperbolic tangent functions the input X = f(x)Select Case fTypeCase UTFLOGdUTFunction = X * (1 - X)Case UTFTanHdUTFunction = (1 + X) * (1 - X)Case UTFReL     ' according to Excel manual calc, d-function of ReLU is 1'If X > 0 ThendUTFunction = 1'Else'    dUTFunction = 0'End IfCase UTFLkyReLIf X > 0 ThendUTFunction = 1ElsedUTFunction = 0.01End IfCase UTFLindUTFunction = 1Case UTFSoftMaxdUTFunction = 1Case Else   ' for MaxPool and AvgPooldUTFunction = 1End Select
End Function

利用上面的函数对单元求导，以这个方向作为梯度下降的方向，计算梯度，下面是NNGrad()函数的定义

Private Function NNGrad(net As NN, Gradient() As TripleArr, Jacobian() As TripleArr, Funcw() As TripleArr) As Double
' another important function that calculates the error value and then gradient of a net
' gradient of a net is the basis of almost all training algoriths
' gradient is calculated and transferred by ref directly to the input parameter Gradient()
' Jacobian is calculated and passed by-Ref directly to the input parameter Jacobian()
' Error function is passed by-Ref directly to the input parameter ErrorFunction
' total errorvalue is returned as function valueDim delta() As Double, vec() As Double, net0 As NN
Dim output() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, target() As Double
Dim b As Double, num As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, sampleCount As Long
Dim layerSize As Long, layercount As Long, curSample As Long, inSize As Long
Dim preLayer As layer
Dim curLayer As layer
Dim curDelta As MultiArr
Dim preDelta As MultiArr        ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As ByteDim unitSize As Longlayercount = getLayerCount(net)With netoutSize = UBound(.OutLayer.units)sampleCount = trainSampleCount + validationSampleCountlabelSize = targetSampleSizevecSize = .inSizeFor curSample = 1 To sampleCount   ' calculate the grad for all samples''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' section 1: alculate the grad of outlayervec = InputData(curSample).d        ' read the input vec from input vec settarget = targetData(curSample).d    'read the target from target setReDim curDelta.d(1 To labelSize)utf = .OutLayer.utfNNFP net, vec, output        ' calculate the outputs of netFor i = 1 To outSize            ' calculates the error of output by calculating value of target - output for each output unitout = output(layercount).d(i)derive(layercount).d(i) = dUTFunction(utf, out)tar = target(i)num = (tar - out)Funcw(layercount).m(i).d(curSample) = numcurDelta.d(i) = num * derive(layercount).d(i)      ' delta is calculated via dfunction of utfJacobian(layercount).m(i).d(curSample, 0) = derive(layercount).d(i)For k = 1 To UBound(.OutLayer.units(1).w)Jacobian(layercount).m(i).d(curSample, k) = derive(layercount).d(i) * output(layercount - 1).d(k)Next kNext iIf layercount > 1 Then          ' there are hidden layers, delta should be calculated layer by layer''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Section 2: calculate the delta value of hidden layerspreLayer = .OutLayer                    ''''' let prelayer = outlayer'to calculate delta of current layer, the whole previous layer should be used to calculatepreDelta = curDelta       ''''' let predelta = delta of outlayer'''''''''''''''''''''For i = layercount - 1 To 1 Step -1     ' calculate from the last hidden layer back to the first hidden layercurLayer = .hLayer(i)utf = curLayer.utflayerSize = UBound(curLayer.units)ReDim curDelta.d(1 To layerSize)For j = 1 To layerSize              ' calculate delta for all units in curlayer based on previous layercur = 0                         '  prepare to calculate the sum productFor k = 1 To UBound(preDelta.d) ' delta for each units is based on the sum of product of predelta and weight of the unitcur = cur + preDelta.d(k) * preLayer.units(k).w(j)Next kFuncw(i).m(j).d(curSample) = curout = output(i).d(j)derive(i).d(j) = dUTFunction(utf, out)num = cur * derive(i).d(j)      ' delta value for each level is determined by the d function of utf of each layercurDelta.d(j) = numJacobian(i).m(j).d(curSample, 0) = derive(i).d(j) 'biasFor k = 1 To UBound(.hLayer(i).units(1).w)Jacobian(i).m(j).d(curSample, k) = derive(i).d(j) * output(i - 1).d(k)NextNext jpreLayer = curLayerpreDelta = curDeltaNext iEnd IfNext curSamplenum = 0For i = 1 To layercount     ' calculate gradientIf i < layercount ThenlayerSize = UBound(.hLayer(i).units)ElselayerSize = outSizeEnd IfFor j = 1 To layerSizeGradient(i).m(j).d = mMulti(Jacobian(i).m(j).d, Funcw(i).m(j).d, True, , , 1)num = num + mMulti(Funcw(i).m(j).d, Funcw(i).m(j).d, True, , 1, 1)NextNextEnd WithNNGrad = num / validationSampleCountEnd Function

梯度从后往前的传播

每一层的梯度算出来后，向前反向传播，并接着计算前一层的梯度，这是通过NNBP()函数实现的：

Public Function NNBP(net As NN, vec() As Double, target() As Double, Ate As Double, Optional m As Double = 0, Optional ClsCor As Byte = 0) As Double
' This function provides stochastic gradiant decendant algorithms with weiths update base on one sample _of training vector
' This function returns total error value before NNBP
' complete one step FP and BP updates of all weights, return updated NN
' vec() as input vector, label() as label of input vector, Ate as learning speed, net as net
Dim delta() As Double, previousDelta() As Double
Dim output() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double
Dim b As Double, num As Double, ErrorValue As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, layercount As Long, outClass As Long, tgtClass As Long
Dim layerSize As Long
Dim preLayer As layer
Dim curLayer As layer
Dim layerDelta() As MultiArr ' stores all delta value in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr   ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As Bytelayercount = getLayerCount(net)ErrorValue = 0                      ' calculate the total errorReDim layerDelta(1 To layercount)   ' set the layer delta structureIf m > 0 ThendeltaW = allZeroNNEnd IfWith netoutSize = UBound(.OutLayer.units)labelSize = UBound(target)vecSize = UBound(vec)If labelSize = outSize Then         ' checks if the size of label vector is the same as size of output vector''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of outlayerReDim curDelta.d(1 To labelSize)utf = .OutLayer.utfoutClass = NNFP(net, vec, output)       ' calculate the outputs of nettgtClass = 1tar = 0For i = 1 To outSize            ' calculates the error of output by calculating value of target - output for each output unitout = output(.hlayercount + 1).d(i)If target(i) > tar Then tgtClass = itar = target(i)num = (tar - out)ErrorValue = ErrorValue + num * num     ' calculate total error valuecurDelta.d(i) = num * dUTFunction(utf, out)      ' delta is calculated via dfunction of utfNextIf outClass = tgtClass Then ClsCor = 1ErrorValue = ErrorValue / 2layerDelta(layercount) = curDeltaIf layercount > 1 Then          ' there are hidden layers, delta should be calculated layer by layer''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of hidden layerspreLayer = .OutLayer                    ''''' let prelayer = outlayer'to calculate delta of current layer, the whole previous layer should be used to calculatepreDelta = layerDelta(layercount)       ''''' let predelta = delta of outlayer'''''''''''''''''''''For i = layercount - 1 To 1 Step -1     ' calculate from the last hidden layer back to the first hidden layercurLayer = .hLayer(i)utf = curLayer.utflayerSize = UBound(curLayer.units)ReDim curDelta.d(1 To layerSize)For j = 1 To layerSize              ' calculate delta for all units in curlayer based on previous layercur = 0                         '  prepare to calculate the sum productFor k = 1 To UBound(preDelta.d) ' delta for each units is based on the sum of product of predelta and weight of the unitcur = cur + preDelta.d(k) * preLayer.units(k).w(j)Nextout = output(i).d(j)num = cur * dUTFunction(utf, out) ' delta value for each level is determined by the d function of utf of each layercurDelta.d(j) = numNextpreLayer = curLayerpreDelta = curDeltalayerDelta(i) = curDeltaNext'''''''''''' All deltas for each unit in all layers have been calculated'''''''''''' to update all weightsinputs = vec                             ' input is to be used to update the first layerFor i = 1 To layercount - 1             ' update weights for each hidden layerlayerSize = UBound(.hLayer(i).units)For j = 1 To layerSize              ' for each units in the layer, updateunits with Ate*delta* inputtempIn = inputsb = layerDelta(i).d(j) * Ate    ' update the bias valueIf m > 0 ThenupdateUnit deltaW.hLayer(i).units(j), tempIn, 1, bupdateUnit deltaW.hLayer(i).units(j), previousDeltaW.hLayer(i).units(j).w, previousDeltaW.hLayer(i).units(j).b, m.hLayer(i).units(j) = updateUnit(.hLayer(i).units(j), previousDeltaW.hLayer(i).units(j).w, previousDeltaW.hLayer(i).units(j).b)End IfIf m = 0 Then .hLayer(i).units(j) = updateUnit(.hLayer(i).units(j), tempIn, 1, b) ' b as ratio of updating' to save time, temp net is not used if momentum =0Nextinputs = output(i).dNextlayerSize = UBound(.OutLayer.units) ' prepare the outlayer and update weights for each out unitsFor j = 1 To layerSize   ' for each units in the outlayer, updateunits with Ate*delta* inputtempIn = inputsb = layerDelta(layercount).d(j) * AteIf m > 0 ThenupdateUnit deltaW.OutLayer.units(j), tempIn, 1, bupdateUnit deltaW.OutLayer.units(j), previousDeltaW.OutLayer.units(j).w, previousDeltaW.OutLayer.units(j).b, m.OutLayer.units(j) = updateUnit(.OutLayer.units(j), deltaW.OutLayer.units(j).w, deltaW.OutLayer.units(j).b)End IfIf m = 0 Then .OutLayer.units(j) = updateUnit(.OutLayer.units(j), tempIn, 1, b)Next' update of all weights done, copy current delta weights to previous delta weights for next round updatesIf m > 0 Then previousDeltaW = deltaWEnd IfElseMsgBox "labelsize does not equal to outsize"End IfEnd WithNNBP = ErrorValue
End Function

随机梯度下降算法和标准梯度下降算法

上面的函数首先调用一次前向传播NNFP()计算网络的输出，然后计算输出和目标值（也就是”正确答案“）之间的差异，并计算导数。逐级传递到第一层。即算完成后，逐级更新所有单元的连接权值。完成一次权值更新。

上面的函数在每次更新权值之前只进行一次前向传播，也就是说，只进行一组数据的学习，检查到误差值后，就立即更新权值，这种方法被称为“Stochastic Gradient Descent"随机梯度下降算法，很多情况下一组数据可能覆盖面太窄，因此可以让网络对比如说100组数据进行学习，计算这一百组数据的实际输出和目标值之间的平均误差，并用这个平均误差来更新权值。这种做法是“Standard Gradient Descent”标准梯度下降算法。这种算法在NNBP2()函数中体现：

Public Function NNBP2(net As NN, vset() As MultiArr, tgtset() As MultiArr, Ate As Double, Optional m As Double = 0) As Double
' This function provides one epoch of standard gradiant decenant algorithm that calculata delta value of weights base on _the whole test vector set and updates weights at the summed delta weights
' This function returns total error value before NNBP2 epoch
' vset() is a multi array that contains multiple input vectors in multiple rows, one vector in each item
' tgtset() is a multi array that contains targets of input vector sets, one target() in each item
' net is the network to be trained, Ate is learning speed
' M is the parameter momentum
' the essential of batch or standard gradiant decendant is to repeat back propagation on the same _net work and sum all delta weights, and then updates all weights in one time. thus net0 will be initialized as an "all 0" _and receives all the weights update during NNBP of the net, and the net will be updated after all samples are calculated
Dim delta() As Double, previousDelta() As Double, vec() As Double
Dim output() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, target() As Double
Dim b As Double, num As Double, ErrorValue As Double, totalError As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, sampleCount As Long
Dim layerSize As Long, layercount As Long, curSample As Long
Dim preLayer As layer
Dim curLayer As layer
Dim layerDelta() As MultiArr ' stores all delta values in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr   ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As Byte
Dim net0 As NNlayercount = getLayerCount(net)totalError = 0                      ' set starting value of total errorReDim layerDelta(1 To layercount)   ' set the layer delta structurenet0 = allZeroNN                    ' set net0 to be an "all 0" net work to receive sum of weightsWith netoutSize = UBound(.OutLayer.units)sampleCount = UBound(vset)labelSize = targetSampleSizevecSize = .inSizeIf labelSize = outSize And vecSize = inputSampleSize Then     ' checks if the size of label vector is the same as size of output vectorFor curSample = 1 To sampleCount   ' calculate the delta value for all samples before updating weights''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of outlayerErrorValue = 0                      ' set starting value of errorvaluevec = InputData(curSample).d        ' read the input vec from input vec settarget = targetData(curSample).d    'read the target from target setReDim curDelta.d(1 To labelSize)utf = .OutLayer.utfNNFP net, vec, output        ' calculate the outputs of netFor i = 1 To outSize            ' calculates the error of output by calculating value of target - output for each output unitout = output(.hlayercount + 1).d(i)tar = target(i)num = (tar - out)If curSample > sampleCount - validationSampleCount Then _ErrorValue = ErrorValue + num * num    ' error value is recorded only for validation samplescurDelta.d(i) = num * dUTFunction(utf, out)      ' delta is calculated via dfunction of utfNextErrorValue = ErrorValue / 2layerDelta(layercount) = curDeltaIf layercount > 1 Then          ' there are hidden layers, delta should be calculated layer by layer''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of hidden layerspreLayer = .OutLayer                    ''''' let prelayer = outlayer'to calculate delta of current layer, the whole previous layer should be used to calculatepreDelta = layerDelta(layercount)       ''''' let predelta = delta of outlayer'''''''''''''''''''''For i = layercount - 1 To 1 Step -1     ' calculate from the last hidden layer back to the first hidden layer'preDelta = curDelta                'to calculate delta of current layer, that of previous layer is requiredcurLayer = .hLayer(i)utf = curLayer.utflayerSize = UBound(curLayer.units)ReDim curDelta.d(1 To layerSize)For j = 1 To layerSize              ' calculate delta for all units in curlayer based on previous layercur = 0                         '  prepare to calculate the sum productFor k = 1 To UBound(preDelta.d) ' delta for each units is based on the sum of product of predelta and weight of the unitcur = cur + preDelta.d(k) * preLayer.units(k).w(j)Next kout = output(i).d(j)num = cur * dUTFunction(utf, out) ' delta value for each level is determined by the d function of utf of each layercurDelta.d(j) = numNext jpreLayer = curLayerpreDelta = curDeltalayerDelta(i) = curDeltaNext iEnd If'''''''''''' delta calculated from one sample is completed'''''''''''' a temperary weight array will be updated to sum all weight changeinputs = vec                            ' input is to be used to update the first layerFor i = 1 To layercount - 1             ' update weights for each hidden layerlayerSize = UBound(.hLayer(i).units)For j = 1 To layerSize              ' for each units in the layer, updateunits with Ate*delta* inputtempIn = inputsFor k = 1 To UBound(inputs)num = tempIn(k) * layerDelta(i).d(j) * AtetempIn(k) = numNextb = layerDelta(i).d(j) * Ate    ' update the bias valuenet0.hLayer(i).units(j) = updateUnit(net0.hLayer(i).units(j), tempIn, b) 'in NNBP2, net0 is updated for each sampleNextReDim inputs(1 To layerSize)ReDim tempIn(1 To layerSize)inputs = output(i).dNextlayerSize = UBound(.OutLayer.units) ' prepare the outlayer and update weights for each out unitsFor j = 1 To layerSize   ' for each units in the outlayer, updateunits with Ate*delta* inputtempIn = inputsFor k = 1 To UBound(inputs)num = tempIn(k) * layerDelta(layercount).d(j) * Ate  ' get the delta * input for the out layertempIn(k) = numNextb = layerDelta(layercount).d(j) * Atenet0.OutLayer.units(j) = updateUnit(net0.OutLayer.units(j), tempIn, b)  'in NNBP2, net0 is updated for each sampleNexttotalError = totalError + ErrorValueNext curSample''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''complete of summing weights into net0, next step: add net0 weights to net'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''For i = 1 To layercount - 1 ' add all hiddenlayerslayerSize = UBound(.hLayer(i).units)For j = 1 To layerSizeupdateUnit .hLayer(i).units(j), net0.hLayer(i).units(j).w, net0.hLayer(i).units(j).bIf m > 0 Then updateUnit .hLayer(i).units(j), previousDeltaW.hLayer(i).units(j).w, _previousDeltaW.hLayer(i).units(j).b, m' update one more time according to previous update of weightsNext jNext ilayerSize = UBound(.OutLayer.units)For j = 1 To layerSizeupdateUnit .OutLayer.units(j), net0.OutLayer.units(j).w, net0.OutLayer.units(j).bIf m > 0 Then updateUnit .OutLayer.units(j), previousDeltaW.OutLayer.units(j).w, _previousDeltaW.OutLayer.units(j).b, m' update one more time according to previous update of weightsNextpreviousDeltaW = net0        ' let previous delta weights which will be used in next iterateElseMsgBox "labelsize does not equal to outsize"End IfEnd WithIf sampleCount > validationSampleCount Then sampleCount = validationSampleCountNNBP2 = totalError / sampleCount
End Function

Levenberg Marquardt算法

随机梯度下降和梯度下降算法都是所谓爬山法的变体。两种算法的特点都是“小步快跑”，需要对网络的权值进行反复的调整，每次调整一点点（通常学习速率在0.02～0.1之间），因此可能需要很长的时间才能实现网络的收敛。而在早期计算性能还不够的时候，人们开发出了几种更加有效的算法，能够极大地提升学习效率，Levenberg Marquardt就是其中一种。在网络不算太大的时候，LM算法可能只需要2～5个回合就能达到最佳精度的收敛，而梯度下降法可能需要1000个回合以上。

不过这并不能说明LM是万能的。它虽然需要的训练回合数少，但是每一个回合的计算复杂度相当高，在网络规模较小的时候，能够有几十倍到几百倍的效率提升，但是只要网络复杂度提升，它的效率优势很快就会被抵消。

LM算法的实现在NNLM()和PrepLM()这两个函数中：

PrepLM()用于计算LM算法所需要的几个关键中间变量。可以看到这里的中间变量都适用MultiArr数据结构

Public Function prepLM(net As NN, vset() As MultiArr) As Double
' prepare triple array matrices that are used for NNLM training and initialize them before training
Dim layerSize As Long, layercount As Long, unitSize As Long, sampleCount As Long
Dim i As Long, j As LongWith netlayercount = getLayerCount(net)sampleCount = trainSampleCount + validationSampleCountReDim Jacob(1 To layercount)ReDim A(1 To layercount)ReDim grad(1 To layercount)ReDim hlm(1 To layercount)ReDim fw(1 To layercount)ReDim fwnew(1 To layercount)ReDim id(1 To layercount)'initialize all multiarrsReDim lamda(1 To layercount)ReDim v(1 To layercount)ReDim r(1 To layercount)ReDim derive(1 To layercount)For i = 1 To layercount' redim Jacobians and JJs layer by layerIf i < layercount Then  ' initialize J and JJ for hidden layerslayerSize = UBound(.hLayer(i).units)  ' unitcount = number of unitsElse                    ' initialize J and JJ for outlayerlayerSize = UBound(.OutLayer.units)  ' unitcount = number of unitsEnd IfReDim Jacob(i).m(1 To layerSize)ReDim A(i).m(1 To layerSize)ReDim grad(i).m(1 To layerSize)ReDim hlm(i).m(1 To layerSize)ReDim fw(i).m(1 To layerSize)ReDim fwnew(i).m(1 To layerSize)ReDim id(i).m(1 To layerSize)ReDim lamda(i).d(1 To layerSize)ReDim v(i).d(1 To layerSize)ReDim r(i).d(1 To layerSize)ReDim derive(i).d(1 To layerSize)For j = 1 To layerSize    ' initialize Jacobian and JJ for each unitIf i < layercount ThenunitSize = .hLayer(i).unitSize1ElseunitSize = .OutLayer.unitSize1End If' 2 dimensional array to store Jacobian, delta w are stored in rows, _b(0, n) is used for bias value also as w0 in order to facilitate array multiplicationReDim Jacob(i).m(j).d(1 To sampleCount, 0 To unitSize)ReDim A(i).m(j).d(0 To unitSize, 0 To unitSize)   ' JJ as result of J(T) * J is smaller in sizeReDim fw(i).m(j).d(1 To sampleCount)ReDim fwnew(i).m(j).d(1 To sampleCount)ReDim grad(i).m(j).d(0 To unitSize)      ' grad is a vector for each unitReDim hlm(i).m(j).d(0 To unitSize)ReDim id(i).m(j).d(0 To unitSize, 0 To unitSize)id(i).m(j).d = mIdentity(0, unitSize)         'initialize identity matrixlamda(i).d(j) = 0.1                         'initializa all lamdav(i).d(j) = 2r(i).d(j) = 0derive(i).d(j) = 0NextNextEnd With
End FunctionPublic Function NNLM(net As NN, vset() As MultiArr, tgtset() As MultiArr) As Double
' This function provides one epoch of neural network training with Levenberg Marquardt algorithm
' This function returns total error value of the NN output, the total error will be used to decide when training stops
' vset() is a multi array that contains multiple input vectors in multiple rows, one vector in each item
' tgtset() is a multi array that contains targets of input vector sets, one target() in each item
' net is the network whose weights are adjusted according to the data and its error
' lamda is the damping parameter used in the function to define next updates of weights, and is adjusted according to _the output of updated net. Parameter Lamda is passed in byRef, thus this function changes the value of Lamda _directly and updated Lamda will be used in the next iteration or epoch. Lamda is defined as a module level variable
' the Levenberg Marquardts algorithms uses the approximation of second dirivtives of the mean square error when possible, _combined with gradient descendant algorithm normally provides much faster speed of convergence and much less epochs. the _LM algorithm is among the most efficient algorithms of nerwork trainingDim delta() As Double, vec() As Double
Dim output() As MultiArr, output0() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, target() As Double
Dim b As Double, num As Double, ErrorValue As Double, totalError As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, sampleCount As Long
Dim layerSize As Long, layercount As Long, curSample As Long, inSize As Long
Dim preLayer As layer, aInv As Variant
Dim curLayer As layer
'Dim layerDelta() As MultiArr    ' all delta values in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr        ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As Byte
Dim net0 As NNlayercount = getLayerCount(net)totalError = 0                      ' set starting value of total errorReDim layerDelta(1 To layercount)   ' set the layer delta structurenet0 = net                          ' next new net is updated based on current netWith nettotalError = NNGrad(net, grad, Jacob, fw)''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''complete of calculating of all Jacobian for units'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' section 3, calculate of hlmFor i = 1 To layercountIf i < layercount ThenlayerSize = UBound(.hLayer(i).units)ElselayerSize = UBound(.OutLayer.units)End IfFor j = 1 To layerSize' to calculate JJ for layer(i) unit(j)A(i).m(j).d = mMulti(Jacob(i).m(j).d, Jacob(i).m(j).d, True)        ' get J(T) * JA(i).m(j).d = mAdd(A(i).m(j).d, id(i).m(j).d, , lamda(i).d(j))      ' get J(T) * J + Lamda * IdA(i).m(j).d = mInv(A(i).m(j).d)                                    ' get (J(T) * J + Lamda * Id) ^ -1grad(i).m(j).d = mMulti(Jacob(i).m(j).d, fw(i).m(j).d, True, , , 1)     ' get J(T) * fwhlm(i).m(j).d = mMulti(A(i).m(j).d, grad(i).m(j).d, , , , 1)' calculate the result of (J(T) * J + Lamda * I)^-1 * J(T) * grad' update the net for result testingIf i < layercount ThenupdateUnit net0.hLayer(i).units(j), hlm(i).m(j).d, hlm(i).m(j).d(0)ElseupdateUnit net0.OutLayer.units(j), hlm(i).m(j).d, hlm(i).m(j).d(0)End IfNext jNext i'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Section 4: calculate New f(w) and F(w) with updated net0NNFw net0, fwnew'''''''''''''''''''''''''''''''''''''''''' Section 5: calculate the gain ratio q value: q = F(w)-F(w new) / 0.5 * hlm (lamda * hlm - grad)' gain ratio is defined as the ratio between actual F(w) change to estimated change' calculate F(w) and F(w new), in which F(w) = 0.5 * f(w)T * f(w)For i = 1 To layercount ' calculate F(w)-F(w new) layer by layerIf i < layercount ThenlayerSize = UBound(.hLayer(i).units)ElselayerSize = UBound(.OutLayer.units)End IfFor j = 1 To layerSize' calculate estimated ratio changegrad(i).m(j).d = mAdd(hlm(i).m(j).d, grad(i).m(j).d, lamda(i).d(j), -1, 1)num = mMulti(hlm(i).m(j).d, grad(i).m(j).d, True, , 1, 1)num = (mMulti(fw(i).m(j).d, fw(i).m(j).d, True, , 1, 1) - _mMulti(fwnew(i).m(j).d, fwnew(i).m(j).d, True, , 1, 1)) / -num' gain ratio = Fw - Fwnew / 0.5 * hlm ... => 0.5* fwT*fw - 0.5*fwnewT*fwnew / 0.5 * hlm ...' => 2 * (0.5*fwT*fw - 0.5*fwnewT*fwnew) / hlm ... => (fwT*fw - fwnewT*fwnew) / hlm ...r(i).d(j) = num' update lamda and net according to gain ratioIf num > 0 Then' accept the step and update current unit of Net and LamdaIf i < layercount Then.hLayer(i).units(j) = net0.hLayer(i).units(j)Else.OutLayer.units(j) = net0.OutLayer.units(j)End If' update current lamdaIf num > 10 Then num = 10lamda(i).d(j) = lamda(i).d(j) * max(0.33333, 1 - (2 * num - 1) ^ 3)v(i).d(j) = 2Else' current unit of net will NOT be updated, step Discarded, update LamdaIf lamda(i).d(j) < 10000 Then lamda(i).d(j) = lamda(i).d(j) * v(i).d(j)If v(i).d(j) < 1000 Then v(i).d(j) = v(i).d(j) * 2End IfNextNext' all units of nn are updated, calculate latest total errorEnd WithNNLM = totalError / validationSampleCount
End Function

关于Levenberg Marquardt算法的详细介绍，可以查阅更多资料：

SCG算法

Scaled Conjugate Gradient比例共轭梯度法是另一种提高神经网络训练效率的算法。SCG算法的实现在PrepCG和NNCG两个函数中。
SCG算法在某些情况下性能接近LM，能够在15～30个回合内实现收敛，不过其优点在于每一个回合的计算开销相对LM来说较低。

Public Function prepCG(net As NN, vset() As MultiArr, tgtset() As MultiArr) As Double
' prepare variables needed for scaled conjugate gradient training, and calculate _gradient for the first iterateDim delta() As Double, vec() As Double, net0 As NN
Dim output() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, target() As Double
Dim b As Double, num As Double, ErrorValue As Double, totalError As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, sampleCount As Long
Dim layerSize As Long, layercount As Long, curSample As Long, inSize As Long
Dim preLayer As layer
Dim curLayer As layer
'Dim layerDelta() As MultiArr    ' all delta values in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr        ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As ByteDim unitSize As LongWith netlayercount = getLayerCount(net)sampleCount = trainSampleCount + validationSampleCountReDim vectorS(1 To layercount)ReDim vectorP(1 To layercount)ReDim vectorR(1 To layercount)ReDim NewVectorR(1 To layercount)ReDim Jacob(1 To layercount)ReDim fw(1 To layercount)ReDim fwnew(1 To layercount)ReDim grad(1 To layercount)ReDim SCGSuccess(1 To layercount)ReDim SCGFound(1 To layercount)'initialize all multiarrsReDim lamda(1 To layercount)ReDim lamdaBar(1 To layercount)ReDim sigma(1 To layercount)ReDim alpha(1 To layercount)ReDim scalarMu(1 To layercount)ReDim scalarDelta(1 To layercount)ReDim scalarDeltaC(1 To layercount)ReDim derive(1 To layercount)For i = 1 To layercount' redim Jacobians and JJs layer by layerIf i < layercount Then  ' initialize J and JJ for hidden layerslayerSize = UBound(.hLayer(i).units)  ' unitcount = number of unitsElse                    ' initialize J and JJ for outlayerlayerSize = UBound(.OutLayer.units)  ' unitcount = number of unitsEnd IfReDim vectorS(i).m(1 To layerSize)ReDim vectorP(i).m(1 To layerSize)ReDim vectorR(i).m(1 To layerSize)ReDim NewVectorR(i).m(1 To layerSize)ReDim Jacob(i).m(1 To layerSize)ReDim fw(i).m(1 To layerSize)ReDim fwnew(i).m(1 To layerSize)ReDim grad(i).m(1 To layerSize)ReDim SCGSuccess(i).d(1 To layerSize)ReDim SCGFound(i).d(1 To layerSize)ReDim lamda(i).d(1 To layerSize)ReDim lamdaBar(i).d(1 To layerSize)ReDim sigma(i).d(1 To layerSize)ReDim alpha(i).d(1 To layerSize)ReDim scalarMu(i).d(1 To layerSize)ReDim scalarDelta(i).d(1 To layerSize)ReDim scalarDeltaC(i).d(1 To layerSize)ReDim derive(i).d(1 To layerSize)For j = 1 To layerSize    ' initialize Jacobian and JJ for each unitIf i < layercount ThenunitSize = .hLayer(i).unitSize1ElseunitSize = .OutLayer.unitSize1End If' 2 dimensional array to store Jacobian, delta w are stored in rows, _b(0, n) is used for bias value also as w0 in order to facilitate array multiplicationReDim vectorS(i).m(j).d(0 To unitSize)ReDim vectorP(i).m(j).d(0 To unitSize)ReDim vectorR(i).m(j).d(0 To unitSize)ReDim NewVectorR(i).m(j).d(0 To unitSize)ReDim Jacob(i).m(j).d(1 To sampleCount, 0 To unitSize)ReDim fw(i).m(j).d(1 To sampleCount)ReDim fwnew(i).m(j).d(1 To sampleCount)ReDim grad(i).m(j).d(0 To unitSize)' initial all scalarslamda(i).d(j) = 0.5lamdaBar(i).d(j) = 0sigma(i).d(j) = 0.01SCGSuccess(i).d(j) = 1      ' success = True : 1 / False : 0, initially set success factor = trueSCGFound(i).d(j) = 1        ' found = gradient of the unit, = 0 then target foundscalarMu(i).d(j) = 0'scalarDelta(i).d(j) = 0'scalarDeltaC(i).d(j) = 0NextNextEnd With' section 2, calculate grad for the initial net to facilitate the algorithm''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''NNGrad net, vectorP, Jacob, fwvectorR = vectorPEnd FunctionPublic Function NNCG(net As NN, vset() As MultiArr, tgtset() As MultiArr, epochs As Long) As Double
' This function provides one epoch of neural network training with Scaled Conjugated Gradient algorithm
' This function returns total error value (Mean Square Error) before the adjustment of weights
' vset() is a multi array that contains multiple input vectors in multiple rows, one vector in each item
' tgtset() is a multi array that contains targets of input vector sets, one target() in each item
' net is the network whose weights are adjusted according to the data and its error
' epochs is the current iteration epoch for SCG algorithm to determine if the conjugate vector should be recalculated
' the Conjugated gradient descendant algorithm adjusts the weights of network using second derivitives of the cost function _so that allows the convergence to a local minimizer much faster than standard and stochastic gradient descendant algorithms' codes for conjugated gradient descendant training
Const initSigma As Double = 0.01Dim delta() As Double, vec() As Double, pp As Double
Dim output() As MultiArr, output0() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, target() As Double, success As Byte
Dim b As Double, num As Double, ErrorValue As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, sampleCount As Long
Dim layerSize As Long, layercount As Long, curSample As Long, inSize As Long
Dim preLayer As layer
Dim curLayer As layer
'Dim layerDelta() As MultiArr    ' all delta values in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr        ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, utf As Byte
Dim net0 As NN''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'' section 0, the gradient for first epoch is calculated already in prepCG()
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'' section 1, if success then update net0 to calculate the second order information
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''layercount = getLayerCount(net)ErrorValue = 0                      ' set starting value of total errornet0 = netWith net0outSize = UBound(.OutLayer.units)sampleCount = trainSampleCount + validationSampleCountlabelSize = targetSampleSizevecSize = .inSizesuccess = 0         ' identify if any unit is successfuly updatedFor i = 1 To layercountIf i < layercount ThenlayerSize = UBound(.hLayer(i).units)ElselayerSize = outSizeEnd IfFor j = 1 To layerSize' calculate sigmaIf SCGFound(i).d(j) > 1E-100 ThenIf SCGSuccess(i).d(j) = 1 Then      ' new net is updated only if success = truesigma(i).d(j) = initSigma / Sqr(mMulti(vectorP(i).m(j).d, vectorP(i).m(j).d, True, , 1, 1))'update unitsuccess = 1                     ' one of the unit is successfuly updated, second order information can be calcIf i < layercount ThenupdateUnit .hLayer(i).units(j), vectorP(i).m(j).d, vectorP(i).m(j).d(0), sigma(i).d(j)ElseupdateUnit .OutLayer.units(j), vectorP(i).m(j).d, vectorP(i).m(j).d(0), sigma(i).d(j)End IfEnd IfEnd IfNextNextEnd WithIf success = 1 ThenNNGrad net0, grad, Jacob, fwnew'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' section 3, calculate second order information: sigma(k) = sigma / grad(T) * grad and update new weights'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''For i = 1 To layercountWith netIf i < layercount ThencurLayer = .hLayer(i)ElsecurLayer = .OutLayerEnd IfEnd WithlayerSize = UBound(curLayer.units)For j = 1 To layerSize' to calculate grad for layer(i) unit(j)If SCGFound(i).d(j) > 1E-100 ThenvectorS(i).m(j).d = mAdd(grad(i).m(j).d, vectorR(i).m(j).d, 1 / sigma(i).d(j), -1 / sigma(i).d(j), 1)scalarDelta(i).d(j) = mMulti(vectorP(i).m(j).d, vectorS(i).m(j).d, True, , 1, 1)num = scalarDelta(i).d(j)End IfNextNextEnd Ifnet0 = netWith net0For i = 1 To layercountIf i < layercount ThencurLayer = .hLayer(i)ElsecurLayer = .OutLayerEnd IflayerSize = UBound(curLayer.units)For j = 1 To layerSizeIf SCGFound(i).d(j) > 1E-100 Then'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''section 3.1: find optimal eta with line search method' start line search'pp = mMulti(vectorP(i).m(j).d, vectorP(i).m(j).d, True, , 1, 1)vectorS(i).m(j).d = mAdd(vectorS(i).m(j).d, vectorP(i).m(j).d, _, lamda(i).d(j) - lamdaBar(i).d(j), 1) 'calculate SscalarDelta(i).d(j) = scalarDelta(i).d(j) + (lamda(i).d(j) - lamdaBar(i).d(j)) * ppscalarDelta(i).d(j) = scalarDelta(i).d(j)'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''section 3.2: if scalardelta is smaller than 0 then make it positive definiteIf scalarDelta(i).d(j) <= 0 Thennum = 2 * scalarDelta(i).d(j) / ppvectorS(i).m(j).d = mAdd(vectorS(i).m(j).d, vectorP(i).m(j).d, , lamda(i).d(j) - num, 1)lamdaBar(i).d(j) = 2 * lamda(i).d(j) - numscalarDelta(i).d(j) = lamda(i).d(j) * pp - scalarDelta(i).d(j)scalarDelta(i).d(j) = scalarDelta(i).d(j)End If'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''section 3.3: calculate step size eta: alpha = pT*r/delta mind that p*r <> p*pscalarMu(i).d(j) = mMulti(vectorP(i).m(j).d, vectorR(i).m(j).d, True, , 1, 1)alpha(i).d(j) = scalarMu(i).d(j) / scalarDelta(i).d(j)alpha(i).d(j) = alpha(i).d(j)'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''section 3.4: update weights with calculated step size to prepare for comparisonIf i < layercount ThenupdateUnit .hLayer(i).units(j), vectorP(i).m(j).d, vectorP(i).m(j).d(0), alpha(i).d(j)ElseupdateUnit .OutLayer.units(j), vectorP(i).m(j).d, vectorP(i).m(j).d(0), alpha(i).d(j)End IfEnd IfNext jNext iEnd With'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Section 4: Calculate E(w new) and then calculate Sk = E(w) - E(wnew)''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''NNFw net0, fwnewWith netFor i = 1 To layercount ' calculate F(w)-F(w new) layer by layerIf i < layercount ThencurLayer = .hLayer(i)ElsecurLayer = .OutLayerEnd IflayerSize = UBound(curLayer.units)For j = 1 To layerSizeIf SCGFound(i).d(j) > 1E-100 Then' calculate estimated comparison parametersnum = mMulti(fwnew(i).m(j).d, fwnew(i).m(j).d, True, , 1, 1)num = mMulti(fw(i).m(j).d, fw(i).m(j).d, True, , 1, 1) - numnum = 2 * scalarDelta(i).d(j) * numnum = num / scalarMu(i).d(j) ^ 2num = numIf num >= 0 Then' accept the step and update current unit of Net and LamdaIf i < layercount Then.hLayer(i).units(j) = net0.hLayer(i).units(j)Else.OutLayer.units(j) = net0.OutLayer.units(j)End If' update current lamdalamdaBar(i).d(j) = 0SCGSuccess(i).d(j) = 1      ' success = trueIf num >= 0.75 Then lamda(i).d(j) = lamda(i).d(j) / 2'the algorithm will be restarted if k mod N = zero and this is a parameter _passed from calling sub processElseIf num < 0 Then' current unit of net will NOT be updated, step Discarded, update LamdalamdaBar(i).d(j) = lamda(i).d(j)SCGSuccess(i).d(j) = 0      ' success = falseElseIf num < 0.25 Then lamda(i).d(j) = lamda(i).d(j) * 4End IfEnd IfNextNextEnd WithNNCG = NNGrad(net, NewVectorR, Jacob, fw)       ' get gradient of net for next iterationFor i = 1 To layercount ' calculate F(w)-F(w new) layer by layerIf i < layercount ThencurLayer = net.hLayer(i)ElsecurLayer = net.OutLayerEnd IflayerSize = UBound(curLayer.units)For j = 1 To layerSizevectorR(i).m(j).d = NewVectorR(i).m(j).dSCGFound(i).d(j) = mMulti(vectorR(i).m(j).d, vectorR(i).m(j).d, True, , 1, 1)If epochs Mod (curLayer.unitSize1 + 1) = 0 Then' restart algorithm, reset start direction P, N is different layer by layervectorP(i).m(j).d = vectorR(i).m(j).dElse                ' calculate next conjugate direction P with factor beta'calculate beta and beta = numnum = mMulti(NewVectorR(i).m(j).d, NewVectorR(i).m(j).d, True, , 1, 1)num = num - mMulti(NewVectorR(i).m(j).d, vectorR(i).m(j).d, True, , 1, 1)num = num / scalarMu(i).d(j)vectorP(i).m(j).d = mAdd(vectorR(i).m(j).d, vectorP(i).m(j).d, , num, 1)End IfNextNextEnd Function

卷积网络的前向和反向传播算法

在ANN Toolbox中针对CNN仅实现了最简单的FP前向传播和反向传播算法，这部分算法放在CNNFP()和CNNBP()两个函数中。不过，因为CNN的复杂性，VBA的计算速度已经跟不上了，对于超过MNIST手写数字规模的CNN来说，训练时间已经非常长了。

Public Function CNNFP(net As NN, vec() As Double, output() As MultiArr) As Long
' get feed forward calculation result of a convolutional neural network
' vec() is the input vector, vec() is a 3-Dimensional array, representing 1/2-D data with one or multi channel
' output result is stored in a Multiarray
' the function returns classification of output data if output is a class typeDim inRows As Long, inCols As Long, inChannels As Long, kRows As Long, kCols As Long
Dim i As Long, j As Long, k As Long, h As Long, m As Long, n As Long, o As Long, ii As Long, jj As Long
Dim curLayer As layer, utf As Byte, curUnit As Long
Dim num As Double, num2 As DoubleinRows = UBound(vec, 1)inCols = UBound(vec, 2)inChannels = UBound(vec, 3)With net'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' section 1, prepare output structure' output is defined as a multi array that contains a 3D array for each layern = .hlayercount + 1ReDim output(0 To n)    'output(0) stores input data'ReDim output(0).d(1 To inRows, 1 To inCols, 1 To inChannels)For i = 1 To nIf i < n ThencurLayer = .hLayer(i)ElsecurLayer = .OutLayerEnd IfWith curLayerReDim output(i).d(1 To .outSize1, 1 To .outSize2, 1 To .outSize3)' the outputs of all CNN layers are 3-DEnd WithNext'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' output structure defined, transfer inputdata to output(0)' invec is a multi dimensional array and output(0) is a multi array, value has to be transfered iteratively' input data of CNN is always a 3-D arrayoutput(0).d = vec''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate layer by layer the output according to type of each layerFor h = 1 To n ' n = .hlayercount + 1If h < n ThencurLayer = .hLayer(h)ElsecurLayer = .OutLayerEnd Ifnum = 0With curLayerutf = .utfinRows = UBound(output(h - 1).d, 1)inCols = UBound(output(h - 1).d, 2)inChannels = UBound(output(h - 1).d, 3)Select Case .layerTypeCase NNFullConLayerFor curUnit = 1 To UBound(.units)For k = 1 To inChannels           ' for each channelFor i = 1 To inRows   ' for each item in the channel (input is 2-D)For j = 1 To inColsnum = num + output(h - 1).d(i, j, k) * .units(curUnit).w(i, j, k)NextNextNextnum = num + .units(curUnit).bnum = UTFunction(utf, num)output(h).d(curUnit, 1, 1) = num    ' output of all layers are 3-D, but fullcon layer will only use _the first dimensionnum = 0Next curUnitCase NNConvLayer        'For curUnit = 1 To UBound(.units)kRows = .unitSize1kCols = .unitSize2      ' kchannel = unitsize3 = inchannels' IMPROVEMENT is POSSIBLE' next section can be put in the mConv function' NO! separating in mConv function is NOT possible, because the output of convolution of two arrays is a _2-D array, and this 2_D array can not be directly copied to one channel of a 3D arrayFor i = 1 To inRows - kRows + 1 ' for each item in the channel (input is 2-D)For j = 1 To inCols - kCols + 1For k = 1 To inChannelsFor ii = 0 To kRows - 1For jj = 0 To kCols - 1num = num + output(h - 1).d(i + ii, j + jj, k) * .units(curUnit).w(ii + 1, jj + 1, k)Next jjNext iiNext knum = num + .units(curUnit).bnum = UTFunction(utf, num)  ' * activation function of ReLU or Sigmoid is needed in CNNoutput(h).d(i, j, curUnit) = numnum = 0Next jNext iNext curUnitCase NNPoolLayer        ' there's only one unit which is always emptykRows = .unitSize1       ' pooling rate on rows and columns of input mapkCols = .unitSize2       ' kchannel = unitsize3 = inchannelsFor i = 1 To inRows Step kRows  ' for every kRows X kCols block in the input arrayFor j = 1 To inCols Step kColsFor k = 1 To inChannelsSelect Case .utfCase UTFMaxP    ' this is the definition of max poolFor ii = 0 To kRows - 1For jj = 0 To kCols - 1num2 = output(h - 1).d(i + ii, j + jj, k)If num < num2 Then num = num2Next jjNext iiCase UTFAvgP    ' this is the definition of average poolingFor ii = 0 To kRows - 1For jj = 0 To kCols - 1num2 = num2 + output(h - 1).d(i + ii, j + jj, k)Next jjNext iinum = num2 / kRows * kColsEnd Selectoutput(h).d((i + 1) / kRows, (j + 1) / kCols, k) = num ' activation function is not needednum = 0num2 = 0Next kNext jNext iEnd SelectEnd WithNext hj = 1' get the category of output if outsize > 2 (categorization problem), _also calculate softmax value if outlayer.utf = utfSoftMaxIf .OutLayer.outSize1 > 1 ThenIf .OutLayer.utf = UTFSoftMax Thennum = 0For i = 1 To .OutLayer.outSize1num = num + output(n).d(i, 1, 1)If output(n).d(j, 1, 1) < output(n).d(i, 1, 1) Then j = iNext' calculate the value of softmax, overflow if num = 0If num <> 0 ThenFor i = 1 To .OutLayer.outSize1output(n).d(i, 1, 1) = output(n).d(i, 1, 1) / numNextEnd IfElseFor i = 1 To .OutLayer.outSize1If output(n).d(j, 1, 1) < output(n).d(i, 1, 1) Then j = iNextEnd IfEnd IfCNNFP = jEnd With
End FunctionPublic Function CNNBP(net As NN, vec() As Double, target() As Double, Ate As Double, Optional m As Double = 0, Optional ClsCor As Byte = 0) As Double
' to update the weights and bias of given net according to the difference between target and output value
' ClsCur is a identifier that indicates if previous output classification equals to target classification, 0 means not equal, 1 means equal
' this function uses stochastic gradient decendant algorithm, learning speed Ate and momentum are given as parametersDim output() As MultiArr, out As Double, tar As Double, cur As Double
Dim inputs() As Double, tempIn() As Double, dUnit As NNUnit
Dim b As Double, num As Double, ErrorValue As Double
Dim vecSize As Long, labelSize As Long, outSize As Long, layercount As Long
Dim vecSize2 As Long, vecSize3 As Long, outSize2 As Long, outSize3 As Long
Dim layerSize As Long, layerSize2 As Long, layerSize3 As Long
Dim kRow As Long, kCol As Long, curRow As Long, curCol As Long, dRow As Long, dCol As Long, theRow As Long, theCol As Long
Dim preLayer As layer
Dim curLayer As layer
Dim layerDelta() As MultiArr, expndDelta() As Double ' stores all delta value in a group of deltas
Dim curDelta As MultiArr
Dim preDelta As MultiArr   ' data type to store all deltas before weights in a net is updated
Dim i As Long, j As Long, k As Long, l As Long, ii As Long, jj As Long, kk As Long, utf As Byte
Dim outClass As Long, tgtClass As Longlayercount = getLayerCount(net)ErrorValue = 0                      ' calculate the total errorReDim layerDelta(1 To layercount)   ' set the layer delta structureIf m > 0 Then deltaW = allZeroNN    ' deltaW is previous delta w of netWith net.OutLayeroutSize = .outSize1outSize2 = .outSize2outSize3 = .outSize3vecSize = UBound(vec, 1)vecSize2 = UBound(vec, 2)vecSize3 = UBound(vec, 3)''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of outlayerReDim curDelta.d(1 To outSize, 1 To outSize2, 1 To outSize3)utf = .utfoutClass = CNNFP(net, vec, output)       ' calculate the outputs of nettgtClass = 1tar = 0For i = 1 To outSize            ' calculates the error of output by calculating value of target - output for each output unitFor j = 1 To outSize2For k = 1 To outSize3out = output(layercount).d(i, j, k)If target(i) > tar Then tgtClass = itar = target(i)num = (tar - out)ErrorValue = ErrorValue + num * num     ' calculate total error valuecurDelta.d(i, j, k) = num * dUTFunction(utf, out)    ' delta is calculated via dfunction of utfNextNextNextIf outClass = tgtClass Then ClsCor = 1ErrorValue = ErrorValue / 2layerDelta(layercount) = curDeltaEnd WithIf layercount > 1 Then          ' there are hidden layers, delta should be calculated layer by layer and according to layer type''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' start to calculate the delta value of hidden layers depending on their layer typespreLayer = net.OutLayer                    ''''' let prelayer = outlayer'to calculate delta of current layer, the whole previous layer should be used to calculatepreDelta = layerDelta(layercount)       ''''' let predelta = delta of outlayer'''''''''''''''''''''With preLayerFor i = layercount - 1 To 1 Step -1     ' calculate from the last hidden layer back to the first hidden layercurLayer = net.hLayer(i)utf = curLayer.utfoutSize = curLayer.outSize1outSize2 = curLayer.outSize2outSize3 = curLayer.outSize3ReDim curDelta.d(1 To outSize, 1 To outSize2, 1 To outSize3)Select Case .layerType      ' delta calculation is different for different typesCase NNFullConLayer                 ' delta from full connection layerFor ii = 1 To outSize              ' calculate delta for all units in curlayer based on previous layerFor jj = 1 To outSize2For kk = 1 To outSize3cur = 0                         '  prepare to calculate the sum productFor k = 1 To UBound(preDelta.d) ' delta for each units is based on the sum of product of predelta and weight of the unitcur = cur + preDelta.d(k, 1, 1) * preLayer.units(k).w(ii, jj, kk)Nextout = output(i).d(ii, jj, kk)num = cur * dUTFunction(utf, out) ' delta value for each level is determined by the d function of utf of each layercurDelta.d(ii, jj, kk) = numNextNextNextCase NNConvLayer                    ' delta from convolutional layer is convolution of current weights on previous deltakRow = .unitSize1kCol = .unitSize2dRow = .outSize1dCol = .outSize2expndDelta = mExpand(preDelta.d, kRow - 1)  ' expand the delta array, all channels will be expanded in the same timeFor kk = 1 To outSize3For ii = 1 To outSizeFor jj = 1 To outSize2cur = 0For j = 1 To UBound(.units) ' per prev channel, add the convolution of prev w and prev delta' sumproduct calculationFor k = kRow To 1 Step -1For l = kCol To 1 Step -1cur = cur + .units(j).w(k, l, kk) * expndDelta(ii + kRow - k, jj + kCol - l, j)NextNextNextout = output(i).d(ii, jj, kk)num = cur * dUTFunction(utf, out) ' delta value for each level is determined by the d function of utf of each layercurDelta.d(ii, jj, kk) = numNextNextNextCase NNPoolLayer                    ' delta from pooling layer is copied directlykRow = .unitSize1kCol = .unitSize2For kk = 1 To outSize3For ii = 1 To outSize / kRowcurRow = ii * kRowFor jj = 1 To outSize2 / kColnum = preDelta.d(ii, jj, kk)curCol = jj * kColFor k = 0 To kRow - 1For l = 0 To kCol - 1curDelta.d(curRow - k, curCol - l, kk) = numNextNextNextNextNextEnd SelectpreLayer = curLayerpreDelta = curDeltalayerDelta(i) = curDeltaNextEnd With'''''''''''' All deltas for each unit in all layers have been calculated'''''''''''' to update all weightsinputs = vecFor i = 1 To layercount - 1             ' update weights for each hidden layerWith net.hLayer(i)layerSize = UBound(.units)''''calculate the input size in 3 D for conv layer weight updatingvecSize = .unitSize1 + .outSize1 - 1vecSize2 = .unitSize2 + .outSize2 - 1vecSize3 = .unitSize3 + .outSize3 - 1For j = 1 To layerSize              ' for each units in the layer, updateunits according to layer typetempIn = inputsSelect Case .layerTypeCase NNFullConLayer             ' unit = delta * input while input size = unit sizeb = layerDelta(i).d(j, 1, 1) * Ate  ' update the bias value.units(j) = updateUnit(.units(j), tempIn, 1, b, 3)  ' in this case b = delta*ate = ratio of weight updateCase NNConvLayer'b = sum of deltadUnit.b = .units(j).b   'temp unit used to update .units(j)dUnit.w = .units(j).wb = 0For curRow = 1 To .outSize1For curCol = 1 To .outSize2b = b + layerDelta(i).d(curRow, curCol, j)NextNextdUnit.b = b'w = rot of delta conv on rot of inputFor kk = 1 To .unitSize3         ' for each channel of unit(j) in layer(i)For ii = 1 To .unitSize1     ' for each row in channel(kk)For jj = 1 To .unitSize2 ' for each col in row(ii)' the value is sumproduct of rot delta and rot inputcur = 0 ' get ready to sum all weightsFor curRow = .outSize1 To 1 Step -1     ' for each row in the conv area = outsize = delta sizeFor curCol = .outSize2 To 1 Step -1theRow = vecSize - .outSize1 + curRow - ii + 1theCol = vecSize2 - .outSize2 + curCol - jj + 1cur = cur + layerDelta(i).d(curRow, curCol, j) * tempIn(theRow, theCol, kk)NextNextdUnit.w(ii, jj, kk) = curNextNextNext.units(j) = updateUnit(.units(j), dUnit.w, dUnit.b, Ate, 3)Case NNPoolLayer' no weights to be updated in pooling layersEnd SelectNextinputs = output(i).dEnd WithNextEnd IfWith net.OutLayerlayerSize = UBound(.units) ' prepare the outlayer and update weights for each out unitsFor j = 1 To layerSize   ' for each units in the outlayer, updateunits with Ate*delta* inputtempIn = inputsb = layerDelta(layercount).d(j, 1, 1) * Ate.units(j) = updateUnit(.units(j), tempIn, 1, b, 3)Next' update of all weights done, copy current delta weights to previous delta weights for next round updatesIf m > 0 Then previousDeltaW = deltaWEnd WithCNNBP = ErrorValue
End Function

本文来自互联网用户投稿，文章观点仅代表作者本人，不代表本站立场，不承担相关法律责任。如若转载，请注明出处。 如若内容造成侵权/违法违规/事实不符，请点击【内容举报】进行投诉反馈！

标签：技术

上一篇 > sqlite java excel,将SQLite中的数据转换为Excel表
下一篇 > RabbitMQ的可视化管理页面简介

Duilib中list控件支持ctrl和shif多行选中的实现

[ICML2015]Batch Normalization:Accelerating Deep Network Training by Reducing Internal Covariate Shif

win10系统微软输入法于eclipse ctrl+shif+f冲突间接处理办法

Codeforces Round #259 (Div. 2) B. Little Pony and Sort by Shif

读LDD3，内存映射与DMA--PAGE_SHIF…

VMware虚拟机安装XP【要先分区，再设置BOOT 启动CD，shif+上移】

更换iBus五笔的左与右Shif

sublime ctrl+shif+f 没用解决办法

idea 对 ctrl + z 的撤销是 ctrl + shif + z

计算机最早的设计师应用于,计算机应用基础选择题doc.doc

win10自带截图神器：Win+Shift+S

Python基础之文件目录操作

python简述目录_Python基础之文件目录操作(示例代码)

tp5 如何做数据采集

任务2-7(服务器字体+阿里巴巴矢量库)

html标签（1)：h1~h6,p,br,pre,hr

TI 电量计介绍与芯片选型指南

几款TI电源芯片简介

TI DSP芯片C2000系列读取FLASH数据

德州仪器(Ti)平台嵌入式开发基础

TI三相电机智能栅极驱动芯片特点分类

省选模拟（12.08） T3 圈圈圈圈圈圈圈圈

Hadoop生态圈技术栈（上）

大数据开发基础入门与项目实战（三）Hadoop核心及生态圈技术栈之6.Impala交互式查询

小猿圈之Linux下Mysql 操作命令

大数据Hadoop生态圈常用面试题

大数据开发基础入门与项目实战（三）Hadoop核心及生态圈技术栈之4.Hive DDL、DQL和数据操作

备战Noip2018模拟赛11（B组）T3 Monogatari 物语

【智能优化算法-圆圈搜索算法】基于圆圈搜索算法Circle Search Algorithm求解单目标优化问题附matlab代码

NYOJ 78 圈水池

递归问题跑道汽车绕圈问题 Python实现

Hadoop生态圈（三）：MapReduce