Linear Decoders Deep Learning and Unsupervised Feature Learning Tutorial Solutions
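Consider a three-layer sparse autoencoder network. In a sparse autoencoder, the output layer satisfies:

$$z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}, \qquad a^{(3)} = f(z^{(3)})$$

The formula shows that the output a3 (written $a^{(3)}$ here) is the output of the function f. In an ordinary sparse autoencoder, f is usually the sigmoid function, whose range is (0,1), so the values of a3 also lie between 0 and 1.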
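The range restriction can be seen numerically. The following is a minimal MATLAB/Octave sketch, not part of the exercise code: the variable names `z`, `a3`, `target` and `residual` are made up for illustration. It evaluates a sigmoid output unit over a wide range of pre-activations and measures how close it can get to a target value outside (0,1).

```matlab
% Sigmoid activation, as used in the sparse autoencoder exercises.
sigmoid = @(z) 1 ./ (1 + exp(-z));

z  = linspace(-20, 20, 401);   % hypothetical pre-activations z3
a3 = sigmoid(z);               % corresponding output activations

fprintf('min a3 = %g, max a3 = %g\n', min(a3), max(a3));

% A target outside (0,1), for example a preprocessed value of -0.5,
% can never be matched: the residual always exceeds 0.5.
target   = -0.5;
residual = min(abs(a3 - target));
fprintf('smallest |a3 - target| = %g\n', residual);
```

However large or small the pre-activation becomes, the activation stays strictly inside (0,1), so the reconstruction error for such a target cannot be driven to zero.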
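In addition, the output layer of the sparse model should reproduce the input features as closely as possible, that is, ideally a3 = x1 ($a^{(3)} = x$). It follows that x1 must also lie between 0 and 1, so the data fed into the network has to be rescaled to [0,1] first. Some domains satisfy this condition, for example the MNIST digit recognition task used in the earlier experiments. In other domains it does not hold: data that has been PCA whitened, for instance, is not guaranteed to lie between 0 and 1. This is the motivation for the linear decoder.

A linear decoder keeps the sigmoid activation in the hidden layer but uses a linear activation in the output layer, for example the simplest linear function, the identity. The output layer then satisfies:

$$\hat{x} = a^{(3)} = z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}$$

An autoencoder made of a sigmoid or tanh hidden layer and a linear output layer is called a linear decoder.

Changing the activation function of the output units also changes their gradient. Recall that the error term of each output unit was defined as:

$$\delta_i^{(3)} = \frac{\partial}{\partial z_i^{(3)}} \, \frac{1}{2} \left\| y - \hat{x} \right\|^2 = -(y_i - \hat{x}_i) \cdot f'(z_i^{(3)})$$

where y = x is the desired output, $\hat{x}$ is the output of the autoencoder, and f is the activation function. Because the output layer uses f(z) = z, we have f'(z) = 1, and the formula simplifies to:

$$\delta_i^{(3)} = -(y_i - \hat{x}_i)$$

When backpropagation is then used to compute the error term of the hidden layer:

$$\delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} \right) \bullet f'(z^{(2)})$$

the hidden layer still uses a sigmoid or tanh activation f, so $f'(z^{(2)})$ in this formula is still the derivative of the sigmoid or tanh function.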
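The two error terms can be checked on a toy network. The sketch below is an illustration under simplifying assumptions, not the exercise solution: it leaves out the weight decay and sparsity terms, uses made-up deterministic data and sizes, and compares one analytic gradient entry with a central finite difference.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Deterministic toy data and parameters (all sizes are arbitrary).
x  = 2 * reshape(cos(1:20), 4, 5);       % 4 features, 5 samples, values in [-2, 2]
W1 = 0.3 * reshape(sin(1:12), 3, 4);     % hidden-layer weights (3 x 4)
W2 = 0.3 * reshape(cos(1:12), 4, 3);     % output-layer weights (4 x 3)
b1 = [0.1; -0.2; 0.05];
b2 = [0; 0.1; -0.1; 0.2];
m  = size(x, 2);

% Forward pass: sigmoid hidden layer, linear output layer.
a2 = sigmoid(bsxfun(@plus, W1 * x, b1));
a3 = bsxfun(@plus, W2 * a2, b2);         % a3 = z3, no squashing

% Error terms: f'(z3) = 1 at the output, sigmoid derivative in the hidden layer.
delta3 = -(x - a3);
delta2 = (W2' * delta3) .* a2 .* (1 - a2);

W2grad = delta3 * a2' / m;
W1grad = delta2 * x'  / m;

% Reconstruction cost as a function of W1 only, for the finite-difference check.
cost = @(W) sum(sum((bsxfun(@plus, W2 * sigmoid(bsxfun(@plus, W * x, b1)), b2) - x).^2)) / (2 * m);

h = 1e-5;
E = zeros(size(W1));
E(2, 3) = h;                             % perturb a single weight
numeric = (cost(W1 + E) - cost(W1 - E)) / (2 * h);

fprintf('analytic %.8f, numeric %.8f\n', W1grad(2, 3), numeric);
```

The inputs here deliberately fall outside [0,1]; with a linear output layer the same formulas apply without any rescaling of the data.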
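Thus, when computing the gradient in code, only the formula for the output-layer error term has to change. It becomes:

$$\delta^{(3)} = -(y - \hat{x})$$

In the sparse autoencoder code, the sparsity penalty term is still added inside the hidden-layer error term, exactly as before.

Experimental steps

1. Initialize the parameters and write sparseAutoencoderLinearCost.m, the function that computes the cost of the linear decoder and its gradient. It is mostly a small modification of sparseAutoencoderCost.m. Then check that the gradient implementation is correct.
2. Load the data and preprocess the raw data with ZCA whitening.
3. Learn the features: train the whole linear decoder network with L-BFGS to obtain the network weights optTheta.
4. Visualize the features learned by the first layer.

linearDecoderExercise.m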
%% CS294A/CS294W Linear Decoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear decoder exercise. For this exercise, you will only need to modify
%  the code in sparseAutoencoderLinearCost.m. You will not need to modify
%  any code in this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageChannels = 3;     % number of channels (rgb, so 3)

patchDim   = 8;        % patch dimension
numPatches = 100000;   % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units
outputSize  = visibleSize;   % number of output units
hiddenSize  = 400;           % number of hidden units

sparsityParam = 0.035; % desired average activation of the hidden units.
lambda = 3e-3;         % weight decay parameter
beta = 5;              % weight of sparsity penalty term

epsilon = 0.1;         % epsilon for ZCA whitening

%%======================================================================
%% STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder,
%          and check gradients
%  You should copy sparseAutoencoderCost.m from your earlier exercise
%  and rename it to sparseAutoencoderLinearCost.m.
%  Then you need to rename the function from sparseAutoencoderCost to
%  sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder
%  uses a linear decoder instead. Once that is done, you should check
%  your gradients to verify that they are correct.

% NOTE: Modify sparseAutoencoderCost first!

% To speed up gradient checking, we will use a reduced network and some
% dummy patches

debugHiddenSize = 5;
debugvisibleSize = 8;
patches = rand([8 10]);  % 10 random samples, each an 8-dimensional column vector with entries in [0,1]
theta = initializeParameters(debugHiddenSize, debugvisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugvisibleSize, debugHiddenSize, ...
                                           lambda, sparsityParam, beta, ...
                                           patches);

% Check gradients
numGrad = computeNumericalGradient( @(x) sparseAutoencoderLinearCost(x, debugvisibleSize, debugHiddenSize, ...
                                                  lambda, sparsityParam, beta, ...
                                                  patches), theta);

% Use this to visually compare the gradients side by side
disp([numGrad grad]);

diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually less than 1e-9.
disp(diff);

assert(diff < 1e-9, 'Difference too large. Check your gradient computation again');

% NOTE: Once your gradients check out, you should run step 0 again to
%       reinitialize the parameters

%%======================================================================
%% STEP 2: Learn features on small patches
%  In this step, you will use your sparse autoencoder (which now uses a
%  linear decoder) to learn features on small patches sampled from related
%  images.

%% STEP 2a: Load patches
%  In this step, we load 100k patches sampled from the STL10 dataset and
%  visualize them. Note that these patches have been scaled to [0,1]

load stlSampledPatches.mat  % this file defines the variable patches

displayColorNetwork(patches(:, 1:100));

%% STEP 2b: Apply preprocessing
%  In this sub-step, we preprocess the sampled patches, in particular,
%  ZCA whitening them.
%
%  In a later exercise on convolution and pooling, you will need to replicate
%  exactly the preprocessing steps you apply to these patches before
%  using the autoencoder to learn features on them. Hence, we will save the
%  ZCA whitening and mean image matrices together with the learned features
%  later on.

% Subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);  % note: this is the mean of each feature (row) over all patches
% Why average over each row here, when earlier exercises averaged over each
% column (each sample)? Those exercises used grayscale images; these are color
% patches, and a per-column mean would average across the three channels.
patches = bsxfun(@minus, patches, meanPatch);  % zero the mean of every feature

% Apply ZCA whitening
sigma = patches * patches' / numPatches;  % covariance matrix
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';  % ZCA whitening matrix
patches = ZCAWhite * patches;

displayColorNetwork(patches(:, 1:100));

%% STEP 2c: Learn features
%  You will now use your sparse autoencoder (with linear decoder) to learn
%  features on the preprocessed patches. This should take around 45 minutes.

theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/

options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                              theta, options);

% Save the learned features and the preprocessing matrices for use in
% the later exercise on convolution and pooling
fprintf('Saving learned features and preprocessing matrices...\n');
save('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');
fprintf('Saved\n');

%% STEP 2d: Visualize learned features

W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
figure;
% Why (W*ZCAWhite)'? Each sample x is whitened before it enters the network,
% so the hidden layer effectively applies W*ZCAWhite to x. Each row of
% W*ZCAWhite is the transform of one hidden unit, whereas displayColorNetwork
% shows each column as one small image patch, hence the transpose.
displayColorNetwork( (W*ZCAWhite)');

sparseAutoencoderLinearCost.m
function [cost,grad,features] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                            lambda, sparsityParam, beta, data)
% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Copy sparseAutoencoderCost in sparseAutoencoderCost.m from your
%   earlier exercise onto this file, renaming the function to
%   sparseAutoencoderLinearCost, and changing the autoencoder to use a
%   linear decoder.
% -------------------- YOUR CODE HERE --------------------

% Computes the cost function of the linear decoder and its gradient.
% visibleSize:   number of input units
% hiddenSize:    number of hidden units
% lambda:        weight decay coefficient
% sparsityParam: sparsity parameter
% beta:          weight of the sparsity penalty term
% data:          training set
% theta:         parameter vector containing W1, W2, b1, b2

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Loss and gradient variables (your code needs to compute these values)
m = size(data, 2);  % number of samples

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the loss for the Sparse Autoencoder and gradients
%                W1grad, W2grad, b1grad, b2grad
%
%  Hint: 1) data(:,i) is the i-th example
%        2) your computation of loss and gradients should match the size
%        above for loss, W1grad, W2grad, b1grad, b2grad

% z2 = W1 * x + b1
% a2 = f(z2)
% z3 = W2 * a2 + b2
% h_Wb = a3 = z3   (linear decoder: identity activation at the output)

z2 = W1 * data + repmat(b1, [1, m]);
a2 = sigmoid(z2);
z3 = W2 * a2 + repmat(b2, [1, m]);
a3 = z3;

rhohats = mean(a2,2);
rho = sparsityParam;
KLsum = sum(rho * log(rho ./ rhohats) + (1-rho) * log((1-rho) ./ (1-rhohats)));

squares = (a3 - data).^2;
squared_err_J = (1/2) * (1/m) * sum(squares(:));                 % squared-error term
weight_decay_J = (lambda/2) * (sum(W1(:).^2) + sum(W2(:).^2));   % weight decay term
sparsity_J = beta * KLsum;                                       % sparsity penalty term

cost = squared_err_J + weight_decay_J + sparsity_J;              % value of the cost function

% delta3 = -(data - a3) .* fprime(z3);
% but fprime(z3) = 1 for the linear decoder
delta3 = -(data - a3);
beta_term = beta * (- rho ./ rhohats + (1-rho) ./ (1-rhohats));
delta2 = ((W2' * delta3) + repmat(beta_term, [1,m]) ) .* a2 .* (1-a2);

W2grad = (1/m) * delta3 * a2' + lambda * W2;    % gradient of W2
b2grad = (1/m) * sum(delta3, 2);                % gradient of b2
W1grad = (1/m) * delta2 * data' + lambda * W1;  % gradient of W1
b1grad = (1/m) * sum(delta2, 2);                % gradient of b1

%-------------------------------------------------------------------
% Convert weights and bias gradients to a compressed form
% This step will concatenate and flatten all your gradients to a vector
% which can be used in the optimization method.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% We are giving you the sigmoid function, you may find this function
% useful in your computation of the loss and the gradients.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

displayColorNetwork.m
function displayColorNetwork(A)

% display receptive field(s) or basis vector(s) for image patches
%
% A         the basis, with patches as column vectors

% In case the midpoint is not set at 0, we shift it dynamically
if min(A(:)) >= 0
    A = A - mean(A(:));  % shift to zero mean
end

cols = round(sqrt(size(A, 2)));  % number of patches per row of the big image

channel_size = size(A,1) / 3;
dim = sqrt(channel_size);        % number of pixels per row (or column) of a patch
dimp = dim+1;
rows = ceil(size(A,2)/cols);     % number of patches per column of the big image
B = A(1:channel_size,:);                    % R channel values
C = A(channel_size+1:channel_size*2,:);     % G channel values
D = A(2*channel_size+1:channel_size*3,:);   % B channel values
B=B./(ones(size(B,1),1)*max(abs(B)));       % normalize each patch
C=C./(ones(size(C,1),1)*max(abs(C)));
D=D./(ones(size(D,1),1)*max(abs(D)));
% Initialization of the image
I = ones(dim*rows+rows-1,dim*cols+cols-1,3);

%Transfer features to this image matrix
for i=0:rows-1
  for j=0:cols-1

    if i*cols+j+1 > size(B, 2)
        break
    end

    % This sets the patch
    I(i*dimp+1:i*dimp+dim,j*dimp+1:j*dimp+dim,1) = ...
         reshape(B(:,i*cols+j+1),[dim dim]);
    I(i*dimp+1:i*dimp+dim,j*dimp+1:j*dimp+dim,2) = ...
         reshape(C(:,i*cols+j+1),[dim dim]);
    I(i*dimp+1:i*dimp+dim,j*dimp+1:j*dimp+dim,3) = ...
         reshape(D(:,i*cols+j+1),[dim dim]);

  end
end

I = I + 1;  % map the range of I from [-1,1] to [0,2]
I = I / 2;  % map the range of I from [0,2] to [0,1]
imagesc(I);
axis equal  % equal aspect ratio, so every axis has uniform tick spacing
axis off    % hide all axis labels, ticks and background

end

References

Exercise: Learning color features with Sparse Autoencoders
Deep learning 17 (Linear Decoders, Convolution and Pooling)
Linear Decoders
Andrew Ng's open course