UFLDL Study Notes


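I recently started working through Andrew Ng's deep learning tutorial (UFLDL). Partly to keep myself motivated and partly to have something to review later, I am keeping study notes, one part per chapter, all collected in this single post so they are easy to look up. I will keep updating it~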



function [f,g] = linear_regression(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The target value for each example.  y(j) is the target for example j.
  %

  m=size(X,2);  % number of examples
  n=size(X,1);  % feature dimension

  f=0;
  g=zeros(size(theta));

  %
  % TODO:  Compute the linear regression objective by looping over the examples in X.
  %        Store the objective function value in 'f'.
  %
  % TODO:  Compute the gradient of the objective with respect to theta by looping over
  %        the examples in X and adding up the gradient for each example.  Store the
  %        computed gradient in 'g'.

  %%% YOUR CODE HERE %%%

  % Objective: half the sum of squared errors over all examples
  for j = 1:m
      f = f + 0.5*(theta'*X(:,j)-y(j))^2;
  end

  % Gradient: accumulate each example's contribution to every coordinate
  for i = 1:n
      for j = 1:m
          g(i) = g(i) + X(i,j)*(theta'*X(:,j)-y(j));
      end
  end


Optimization took 128.640734 seconds.  % it took this long because I was printing values inside the loop
RMS training error: 4.843147
RMS testing error: 4.151706



function [f,g] = logistic_regression(theta, X,y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %

  m=size(X,2);  % number of training images
  n=size(X,1);  % number of pixels per image + 1

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the objective function by looping over the dataset and summing
  %        up the objective values for each example.  Store the result in 'f'.
  %
  % TODO:  Compute the gradient of the objective by looping over the dataset and summing
  %        up the gradients (df/dtheta) for each example. Store the result in 'g'.
  %
  %%% YOUR CODE HERE %%%

  % Objective: negative log-likelihood summed over all examples
  for j = 1:m
      h = 1/(1+exp(-theta'*X(:,j)));  % predicted probability for example j
      f = f - ( y(j)*log(h) + (1-y(j))*log(1-h) );
  end

  % Gradient: sum of X(i,j) * (prediction - label) over all examples
  for i = 1:n
      for j = 1:m
          g(i) = g(i) + X(i,j)*(1/(1+exp(-theta'*X(:,j)))-y(j));
      end
  end


Optimization took 7874.049756 seconds.  % I waited so long the flowers wilted
Training accuracy: 100.0%
Test accuracy: 100.0%



function [f,g] = linear_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The target value for each example.  y(j) is the target for example j.
  %
  m=size(X,2);

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the linear regression objective function and gradient
  %        using vectorized code.  (It will be just a few lines of code!)
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %
  %%% YOUR CODE HERE %%%

  y_hat = theta'*X;                % 1-by-m row of predictions
  f = 0.5 * sum((y_hat - y).^2);   % half the sum of squared errors
  g = X*(y_hat - y)';              % (n-by-m)*(m-by-1) gives an n-by-1 gradient


Optimization took 0.108650 seconds.
RMS training error: 4.650101
RMS testing error: 4.856230

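That really does save a lot of time and effort. The i, j subscripts and the transposes can make your head spin, though. When you actually write the code, you can step through it in the debugger to inspect your data, then adjust your indices and decide whether a transpose is needed (the whole point is just to make the matrix dimensions conform for multiplication). It also helps to remember what each commonly used variable means within an exercise: throughout this tutorial, m is the number of examples and n is the feature dimension.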
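To make the point about shapes concrete, here is a small sketch on made-up data (the toy sizes and values are assumptions, not taken from the exercise). It prints the dimensions of each factor and checks that the looped and vectorized linear regression gradients agree.

```matlab
% Toy data (hypothetical): n = 3 features, m = 5 examples, one example per column
n = 3; m = 5;
X = reshape(1:n*m, n, m) / 10;    % n-by-m, X(i,j) = i'th coordinate of example j
y = (1:m) / 2;                    % 1-by-m row of targets
theta = [0.1; -0.2; 0.3];         % n-by-1 column vector

% Inspect shapes before multiplying: the inner dimensions must match
disp(size(theta'));               % 1-by-n
disp(size(X));                    % n-by-m
y_hat = theta' * X;               % (1-by-n)*(n-by-m) gives 1-by-m, same shape as y

% Looped gradient, one coordinate and one example at a time
g_loop = zeros(n, 1);
for i = 1:n
    for j = 1:m
        g_loop(i) = g_loop(i) + X(i,j) * (theta' * X(:,j) - y(j));
    end
end

% Vectorized gradient: (n-by-m)*(m-by-1) gives n-by-1
g_vec = X * (y_hat - y)';

fprintf('max difference: %g\n', max(abs(g_loop - g_vec)));
```

The printed difference should be zero or at the level of floating-point rounding; anything larger usually means a transpose or an index is in the wrong place.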

Below is the vectorized code for logistic regression:

function [f,g] = logistic_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the logistic regression objective function and gradient
  %        using vectorized code.  (It will be just a few lines of code!)
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %
  %%% YOUR CODE HERE %%%

  h = sigmoid(theta'*X);                       % 1-by-m row of predicted probabilities
  f = -sum(y.*log(h) + (1-y).*log(1 - h));     % negative log-likelihood
  g = X*(h - y)';                              % n-by-1 gradient


Optimization took 3.064685 seconds.
Training accuracy: 100.0%
Test accuracy: 100.0%
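The vectorized code calls sigmoid, which is not defined in this post; presumably it comes from the exercise's helper files. Below is a minimal sketch of what such a helper is assumed to compute, checked against the per-example expression used in the looped version. The toy data is made up.

```matlab
% Assumed definition of the helper: element-wise logistic function
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy data (hypothetical): 2 features, 4 examples, binary labels
X = [1 1 1 1; -2 -1 1 2];         % first row acts as the intercept feature
y = [0 0 1 1];
theta = [0.5; 1.0];

% Vectorized objective, as in logistic_regression_vec
h = sigmoid(theta' * X);
f_vec = -sum(y .* log(h) + (1 - y) .* log(1 - h));

% Looped objective, as in logistic_regression
f_loop = 0;
for j = 1:size(X, 2)
    hj = 1 / (1 + exp(-theta' * X(:,j)));
    f_loop = f_loop - (y(j) * log(hj) + (1 - y(j)) * log(1 - hj));
end

fprintf('f_vec = %.6f, f_loop = %.6f\n', f_vec, f_loop);
```

The two values should match up to rounding, which is a quick way to convince yourself that the vectorized form computes the same objective as the loop.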




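function average_error = grad_check(fun, theta0, num_checks, varargin)

  delta=1e-3;
  sum_error=0;

  fprintf(' Iter       i             err');
  fprintf('           g_est               g               f\n')

  for i=1:num_checks
    T = theta0;
    j = randsample(numel(T),1);  % pick a random index into theta
    T0=T; T0(j) = T0(j)-delta;   % theta(j-): theta with its j'th element decreased by delta
    T1=T; T1(j) = T1(j)+delta;   % theta(j+): theta with its j'th element increased by delta

    [f,g] = fun(T, varargin{:});
    f0 = fun(T0, varargin{:});   % J(theta(j-))
    f1 = fun(T1, varargin{:});   % J(theta(j+))

    g_est = (f1-f0) / (2*delta); % central-difference estimate of dJ/dtheta(j)
    err = abs(g(j) - g_est);     % named err to avoid shadowing the built-in error()

    % columns: iteration, theta index, absolute error, estimated gradient,
    % analytic gradient, objective value (ordered to match the header)
    fprintf('% 5d  % 6d % 15g % 15f % 15f % 15f\n', ...
            i,j,err,g_est,g(j),f);

    sum_error = sum_error + err;
  end

  average_error=sum_error/num_checks;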


average_error = grad_check(@linear_regression_vec,theta,30,train.X,train.y);
fprintf('The Average error is :%f\n',average_error);


 Iter       i             err            g_est                g                f
    1      14      8.0571e-06   1418640.687110   1418640.687102  14517559.734147
    2       3     3.73228e-07   1100385.922200   1100385.922200  14517559.734147
    3       4     2.48384e-06   1236106.996470   1236106.996473  14517559.734147
    4      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
    5      14      8.0571e-06   1418640.687110   1418640.687102  14517559.734147
    6      10      6.0685e-06   1118680.054414   1118680.054408  14517559.734147
    7      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
    8      10      6.0685e-06   1118680.054414   1118680.054408  14517559.734147
    9      14      8.0571e-06   1418640.687110   1418640.687102  14517559.734147
   10      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   11      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
   12       2     1.97807e-06    436767.013214    436767.013212  14517559.734147
   13      14      8.0571e-06   1418640.687110   1418640.687102  14517559.734147
   14      14      8.0571e-06   1418640.687110   1418640.687102  14517559.734147
   15      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   16       1     3.02999e-06    106041.865458    106041.865461  14517559.734147
   17       5     1.42339e-06      6344.599333      6344.599332  14517559.734147
   18       9      3.8307e-06    389421.210472    389421.210468  14517559.734147
   19       7     3.66173e-06    660532.159808    660532.159812  14517559.734147
   20       5     1.42339e-06      6344.599333      6344.599332  14517559.734147
   21       4     2.48384e-06   1236106.996470   1236106.996473  14517559.734147
   22       9      3.8307e-06    389421.210472    389421.210468  14517559.734147
   23       7     3.66173e-06    660532.159808    660532.159812  14517559.734147
   24      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   25      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   26      12     2.83984e-06   1978417.905024   1978417.905027  14517559.734147
   27       5     1.42339e-06      6344.599333      6344.599332  14517559.734147
   28      12     2.83984e-06   1978417.905024   1978417.905027  14517559.734147
   29       5     1.42339e-06      6344.599333      6344.599332  14517559.734147
   30      10      6.0685e-06   1118680.054414   1118680.054408  14517559.734147
The Average error is :0.000004
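The estimate g_est in grad_check is a central difference: (J(theta + delta*e_j) - J(theta - delta*e_j)) / (2*delta), where e_j is the unit vector along coordinate j. The sketch below applies the same idea to a function whose gradient is known in closed form (the quadratic is a made-up example, not part of the exercise). It also prints a relative error, which can be easier to judge than an absolute one when the gradient entries are in the millions, as they are in the output above.

```matlab
% Hypothetical test function: J(theta) = 0.5 * theta' * A * theta
A = [4 1; 1 3];                       % symmetric, so the gradient is A * theta
J = @(theta) 0.5 * theta' * A * theta;
grad_J = @(theta) A * theta;

theta0 = [1; -2];
delta = 1e-3;
g = grad_J(theta0);                   % analytic gradient at theta0

for j = 1:numel(theta0)
    e = zeros(size(theta0)); e(j) = 1;            % unit vector along coordinate j
    g_est = (J(theta0 + delta*e) - J(theta0 - delta*e)) / (2*delta);
    abs_err = abs(g(j) - g_est);
    rel_err = abs_err / max(abs(g(j)), eps);      % guard against division by zero
    fprintf('j=%d  g=%g  g_est=%g  abs_err=%g  rel_err=%g\n', ...
            j, g(j), g_est, abs_err, rel_err);
end
```

For a quadratic the central difference is exact apart from rounding, so both errors should come out tiny; for other objectives the error also depends on the choice of delta.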


T = theta0;
[f,g] = fun(T, varargin{:});

Softmax Regression


function [f,g] = softmax_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);  % number of examples
  n=size(X,1);  % feature dimension

  % theta is a vector;  need to reshape to n x (num_classes-1).
  theta=reshape(theta, n, []);
  num_classes=size(theta,2)+1;

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
  %%% YOUR CODE HERE %%%

  % Indicator matrix: entry (k,j) is 1 when y(j) == k. The explicit size keeps it
  % num_classes-by-m even if the highest class is missing from y.
  indicator = full(sparse(y, 1:m, 1, num_classes, m));
  theta = [theta,zeros(n,1)];  % restore the full theta by appending a zero column for the last class

  a = exp(theta'*X);
  p = bsxfun(@rdivide,a,sum(a));  % class probabilities, each column sums to 1
  l = log(p);

  % f = -sum(indictor*log(p));  % abandoned: this produced a matrix that was too large
  f = -indicator(:)'*l(:);

  g = -X * (indicator-p)';
  g = g(:,1:end-1);  % drop the column of the last class, which is fixed at zero

  g=g(:); % make gradient a vector for minFunc


Optimization took 91.072469 seconds.
Training accuracy: 94.4%
Test accuracy: 92.2%
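Two steps in the softmax code are easy to misread: building the indicator matrix with sparse, and normalizing each column of exp(theta'*X) into probabilities. The sketch below runs both on made-up data (3 classes, 4 examples; all values are hypothetical). It also subtracts each column's maximum before exponentiating, a common numerical-stability trick that is not in the code above and does not change the resulting probabilities.

```matlab
% Toy setup (hypothetical): n = 2 features, m = 4 examples, 3 classes
X = [1 2 3 4; 1 0 1 0];
y = [1 3 2 3];                        % class labels in 1..num_classes
m = size(X, 2);
num_classes = 3;
theta = [0.1 -0.3 0; 0.2 0.4 0];      % n-by-num_classes, last column fixed at 0

% Indicator matrix: entry (k,j) is 1 exactly when y(j) == k
indicator = full(sparse(y, 1:m, 1, num_classes, m));
disp(indicator);

% Class probabilities; subtracting the column max avoids overflow in exp
z = theta' * X;                       % num_classes-by-m scores
z = bsxfun(@minus, z, max(z, [], 1));
a = exp(z);
p = bsxfun(@rdivide, a, sum(a, 1));   % each column now sums to 1

% Objective and gradient, in the same form as softmax_regression_vec
f = -sum(sum(indicator .* log(p)));
g = -X * (indicator - p)';            % n-by-num_classes
g = g(:, 1:end-1);                    % drop the column of the fixed class

disp(sum(p, 1));                      % column sums of the probabilities
```

Each column of the indicator matrix has a single 1 in the row given by that example's label, so multiplying element-wise by log(p) picks out the log-probability of the correct class for every example.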






