BatchNorm Explained
1 How BatchNorm Works
BatchNorm standardizes the data in each input mini-batch, making the distribution of the network's inputs more stable.
During training, if the distribution of a layer's inputs shifts heavily from one iteration to the next, the data jitters a lot and the weights must change drastically to keep up, so the network struggles to converge. BatchNorm normalizes the data, reducing this jitter between batches, which speeds up training and accelerates convergence.
BatchNorm Computation Steps
Input: let a mini-batch be $\mathcal{B} = \{x_{1...m}\}$, with $\gamma, \beta$ as learnable parameters.
First compute the mean of $\mathcal{B}$:
$$\mu_\mathcal{B} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_i$$
Then compute the variance of $\mathcal{B}$:
$$\sigma^2_\mathcal{B} \leftarrow \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_\mathcal{B})^2$$
Normalize the data:
$$\hat{x}_i \leftarrow \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma^2_\mathcal{B} + \epsilon}}$$
where $\epsilon$ prevents a zero variance from causing a division-by-zero error; a typical value is 1e-5.
Finally, scale and shift the normalized data:
$$y_i \leftarrow \gamma \hat{x}_i + \beta$$
where $\gamma$ and $\beta$ are learned during training.
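As a quick sanity check of the four steps above, here is a minimal NumPy sketch on a toy single-feature batch (the values of `x`, `gamma`, and `beta` are made up for illustration):

```python
import numpy as np

# Toy mini-batch: m = 4 samples of one feature
x = np.array([1.0, 2.0, 3.0, 4.0])
gamma, beta, eps = 1.0, 0.0, 1e-5

mu = x.mean()                           # mini-batch mean
var = x.var()                           # mini-batch (uncorrected) variance
x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
y = gamma * x_hat + beta                # scale and shift

print(x_hat.mean(), x_hat.var())        # ~0 and ~1: the normalized statistics
```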
2 BatchNorm Code Implementation
```python
import numpy as np


def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Forward pass for batch normalization.

    During training the sample mean and (uncorrected) sample variance are
    computed from minibatch statistics and used to normalize the incoming data.
    During training we also keep an exponentially decaying running mean of the
    mean and variance of each feature, and these averages are used to normalize
    data at test-time.

    At each timestep we update the running averages for mean and variance using
    an exponential decay based on the momentum parameter:

    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var

    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)

    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.ones(D, dtype=x.dtype))

    cache = None
    if mode == 'train':
        # Per-feature statistics of the current mini-batch
        sample_mean = x.mean(axis=0)
        sample_var = x.var(axis=0)

        # Update the exponential moving averages used at test time
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var

        # Normalize, then scale and shift
        std = np.sqrt(sample_var + eps)
        x_centered = x - sample_mean
        x_norm = x_centered / std
        out = gamma * x_norm + beta

        cache = (x_norm, x_centered, std, gamma)
    elif mode == 'test':
        # Normalize with the running statistics accumulated during training
        x_norm = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_norm + beta
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    # Store the updated running means back into bn_param
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var

    return out, cache
```
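A minimal usage sketch of the function above (the layer width, batch size, and random data are made up for illustration): run a few training batches so the running statistics warm up, then switch to test mode.

```python
np.random.seed(0)
D = 3
bn_param = {'mode': 'train', 'momentum': 0.9}
gamma, beta = np.ones(D), np.zeros(D)

# A few training steps: each call updates running_mean / running_var in bn_param.
for _ in range(50):
    x = np.random.randn(32, D) * 2.0 + 5.0   # batches drawn with mean ~5
    out, cache = batchnorm_forward(x, gamma, beta, bn_param)

# Test time: normalize with the accumulated running statistics.
bn_param['mode'] = 'test'
x_test = np.random.randn(8, D) * 2.0 + 5.0
out_test, _ = batchnorm_forward(x_test, gamma, beta, bn_param)
print(bn_param['running_mean'])  # approaches the data mean (~5) as batches accumulate
```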
3 Why Use a Moving Average?
At the start of training we cannot know the mean and variance of the entire training set.
Even if we ran a full forward pass over the whole training set before training to obtain the mean and variance, those statistics would change as the model parameters change. So instead we estimate the mean and variance of the whole training set with a moving average.
4 The Moving Average in BN
During training, the moving averages are updated once per batch, as shown below.
Initial values: moving_mean = 0 and moving_var = 1, which corresponds to a standard normal distribution; in theory they could be initialized to any value. A typical momentum is 0.9.
```python
moving_mean -= (moving_mean - batch_mean) * (1 - momentum)
moving_var -= (moving_var - batch_var) * (1 - momentum)
```
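The subtraction form above is algebraically identical to the interpolation form used in `batchnorm_forward`; a minimal sketch verifying this with one update step (all values made up):

```python
momentum = 0.9
moving_mean, batch_mean = 0.0, 5.0

# Subtraction form (as written above)
a = moving_mean - (moving_mean - batch_mean) * (1 - momentum)

# Interpolation form (as used in batchnorm_forward)
b = momentum * moving_mean + (1 - momentum) * batch_mean

assert abs(a - b) < 1e-12   # both yield 0.5 after one update
print(a, b)
```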