Straight to the code:
import numpy as np
from numpy import linalg as lina

x = np.random.rand(10, 5)            # generate a random sample matrix (10 samples, 5 features)
x -= x.mean(axis=0)                  # center each feature; see detailed note 1
C = x.T.dot(x)                       # compute the covariance matrix (up to a constant factor)
lam, v = lina.eig(C)                 # eigendecomposition; the columns of v are the eigenvectors
new_index = np.argsort(lam)[::-1]    # indices sorting the eigenvalues in descending order; see detailed note 2
A = -v[:, new_index]                 # projection matrix A (an eigenvector's sign is arbitrary; the minus sign matches sklearn here)
w = x.dot(A)                         # compute the transformed features
r = lam[new_index]/lam.sum()         # explained-variance ratio of every component

Test it:
w[:, :2]      # first 2 of the new features
>>> array([[-0.3939524518, -0.4184678305],
           [-0.5907434013,  0.2033346207],
           [-0.4585388051, -0.111367225 ],
           [ 0.4552495673, -0.0405062598],
           [-0.2335902798, -0.4260334862],
           [ 0.4523182771,  0.039755097 ],
           [ 0.0902288594,  0.1869543779],
           [ 0.089419155 ,  0.7656098218],
           [ 0.7645053936, -0.3353675658],
           [-0.1748963144,  0.1360884499]])
r             # explained-variance ratio of each component
>>> array([0.4026073116, 0.2589988934, 0.2088275432, 0.0902665298,
           0.0392997221])
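The steps above can be bundled into one small self-contained function. This is a sketch under my own naming (`pca_numpy`, the fixed seed); it also swaps `eig` for `np.linalg.eigh`, which is the better fit for a symmetric matrix:

```python
import numpy as np

def pca_numpy(x):
    """PCA via eigendecomposition of the scatter matrix of centered data."""
    x = x - x.mean(axis=0)            # center each feature
    C = x.T.dot(x)                    # scatter matrix (covariance up to a constant factor)
    lam, v = np.linalg.eigh(C)        # eigh: C is symmetric, eigenvalues are real
    order = np.argsort(lam)[::-1]     # sort eigenvalues in descending order
    A = v[:, order]                   # projection matrix (columns = principal axes)
    w = x.dot(A)                      # transformed features (scores)
    r = lam[order] / lam.sum()        # explained-variance ratio per component
    return w, r

rng = np.random.default_rng(0)
x = rng.random((10, 5))
w, r = pca_numpy(x)
print(r.sum())                        # the ratios sum to 1
```

Because the ratios are eigenvalues divided by their total, they always sum to 1 and come out in descending order.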
Compare with the scikit-learn implementation:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(x)                        # x is still the same x as above
pca.explained_variance_ratio_
>>> array([0.4026073116, 0.2589988934])   # ratios of the first 2 components (identical)
pca.transform(x)                  # the dimensionality-reducing transform (identical)
>>> array([[-0.3939524518, -0.4184678305],
           [-0.5907434013,  0.2033346207],
           [-0.4585388051, -0.111367225 ],
           [ 0.4552495673, -0.0405062598],
           [-0.2335902798, -0.4260334862],
           [ 0.4523182771,  0.039755097 ],
           [ 0.0902288594,  0.1869543779],
           [ 0.089419155 ,  0.7656098218],
           [ 0.7645053936, -0.3353675658],
           [-0.1748963144,  0.1360884499]])
Compute the difference between the transformed features:
w1 = w[:, 0]
w2 = pca.transform(x)[:, 0]
((w1-w2)**2).sum()
>>> 2.2980196096428498e-30
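The agreement above relied on the hard-coded minus sign in `A = -v[:, new_index]`, but an eigenvector's sign is arbitrary, so a robust comparison should align the sign of each component first. A minimal sketch (the seed and variable names are my own):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
x = rng.random((10, 5))
x = x - x.mean(axis=0)

# NumPy route: eigendecomposition of the scatter matrix
lam, v = np.linalg.eigh(x.T.dot(x))
order = np.argsort(lam)[::-1]
w = x.dot(v[:, order])[:, :2]

# sklearn route
w_sk = PCA(n_components=2).fit_transform(x)

# flip each column's sign to match sklearn before comparing
signs = np.sign((w * w_sk).sum(axis=0))
print(np.allclose(w * signs, w_sk))
```

This way the check passes regardless of which sign convention the eigensolver happens to return.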
And the difference in the explained-variance ratios:
pca.explained_variance_ratio_ - r[:2]
>>> array([ 1.1102230246e-16, -1.6653345369e-16])

Detailed annotations: https://blog.csdn.net/cauchy7203/article/details/107421996
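As a final sanity check on the decomposition itself: keeping only the top-k scores and projecting back gives an approximation of x, and the squared reconstruction error equals the sum of the discarded eigenvalues. A pure-NumPy sketch (seed and names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random((10, 5))
x = x - x.mean(axis=0)

lam, v = np.linalg.eigh(x.T.dot(x))   # eigendecomposition of the scatter matrix
order = np.argsort(lam)[::-1]
A = v[:, order]

k = 2
w_k = x.dot(A[:, :k])                 # keep only the first k scores
x_hat = w_k.dot(A[:, :k].T)           # project back into the original space

err = ((x - x_hat) ** 2).sum()
print(np.isclose(err, lam[order][k:].sum()))  # error equals the discarded variance
```

This identity is exactly why picking components by descending eigenvalue minimizes the reconstruction error for a given k.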