The original exercise asks for the watermelon dataset to be clustered with the K-means method. I spent some time looking and still could not find where watermelon dataset 4.0 lives, so I went straight to the example that ships with sklearn and worked through it instead, which shows the effect of K-means even better.

#!/usr/bin/python
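# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Load the watermelon CSV (the file must exist at this path).
# X and y are not used by the K-means example that follows.
with open(r'c:\quant\watermelon.csv', 'r') as file1:
    data = [line.strip('\n').split(',') for line in file1]
data = np.array(data)
#X = [[float(raw[-7]),float(raw[-6]),float(raw[-5]),float(raw[-4]),float(raw[-3]), float(raw[-2])] for raw in data[1:,1:-1]]

# Skip the header row; the last column is the label, the two before it are the features.
X = [[float(raw[-3]), float(raw[-2])] for raw in data[1:]]
y = [1 if raw[-1] == '1' else 0 for raw in data[1:]]
X = np.array(X)
y = np.array(y)

print(__doc__)  # prints None here, because this script has no module docstring

from time import time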
import numpy as np
import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

np.random.seed(42)

digits = load_digits()
data = scale(digits.data)
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target

sample_size = 300

print("n_digits: %d, \t n_samples %d, \t n_features %d"
      % (n_digits, n_samples, n_features))
# Ten distinct classes in total.

print(79 * '_')
print('% 9s    time  inertia    homo   compl  v-meas     ARI     AMI  silhouette'
      % 'init')


def bench_k_means(estimator, name, data):
    # Fit the estimator, then print the fit time, the inertia and the clustering metrics.
    t0 = time()
    estimator.fit(data)
    print('% 9s   %.2fs    %i   %.3f   %.3f   %.3f   %.3f   %.3f    %.3f'
          % (name, (time() - t0), estimator.inertia_,
             metrics.homogeneity_score(labels, estimator.labels_),
             metrics.completeness_score(labels, estimator.labels_),
             metrics.v_measure_score(labels, estimator.labels_),
             metrics.adjusted_rand_score(labels, estimator.labels_),
             metrics.adjusted_mutual_info_score(labels, estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean',
                                      sample_size=sample_size)))

# Homogeneity and completeness measure how uniform and how complete the clusters are.
# The V-measure is their harmonic mean; the larger it is, the better the clustering.
bench_k_means(KMeans(init='k-means++', n_clusters=n_digits, n_init=10),
              name="k-means++", data=data)

bench_k_means(KMeans(init='random', n_clusters=n_digits, n_init=10),
              name="random", data=data)

# in this case the seeding of the centers is deterministic, hence we run the
# kmeans algorithm only once with n_init=1
pca = PCA(n_components=n_digits).fit(data)
bench_k_means(KMeans(init=pca.components_, n_clusters=n_digits, n_init=1),
              name="PCA-based",
              data=data)
print(79 * '_')

###############################################################################
# Visualize the results on PCA-reduced data

reduced_data = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)
kmeans.fit(reduced_data)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
# Plot the centroids as a white X
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='x', s=169, linewidths=3,
            color='w', zorder=10)
plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
          'Centroids are marked with white cross')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
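plt.show()

Text output of the run:

n_digits: 10, 	 n_samples 1797, 	 n_features 64
_______________________________________________________________________________
     init    time  inertia    homo   compl  v-meas     ARI     AMI  silhouette
k-means++   0.21s    69432   0.602   0.650   0.625   0.465   0.598    0.146
   random   0.20s    69694   0.669   0.710   0.689   0.553   0.666    0.147
PCA-based   0.02s    71820   0.673   0.715   0.693   0.567   0.670    0.150

We can see that the PCA-based run takes far less time, and its V-measure is also slightly higher than that of the other two methods. Note that in this run the PCA components are only used to seed the initial centers, and the algorithm runs once (n_init=1) instead of ten times, which accounts for the shorter time; the data itself is only reduced to two dimensions for the plot.

Figure output: (image not preserved; the plot shows the K-means clusters on the PCA-reduced digits data, with the centroids marked by white crosses)

Reposted from: https://www.cnblogs.com/zhusleep/p/5648244.html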
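
To make the homo, compl and v-meas columns a bit more concrete, here is a minimal pure-Python sketch of how the three scores relate. It is not the sklearn implementation. The function names (`entropy`, `conditional_entropy`, `harmonic_mean`, `homogeneity_completeness_v`) are my own, and I am assuming the usual entropy-based definitions: homogeneity is 1 - H(C|K)/H(C), completeness is 1 - H(K|C)/H(K), and the V-measure is their harmonic mean.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from joint label counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g])
                for (g, _), c in joint.items())


def harmonic_mean(a, b):
    """Harmonic mean of two scores; defined as 0 when both are 0."""
    return 0.0 if a + b == 0 else 2.0 * a * b / (a + b)


def homogeneity_completeness_v(true_labels, cluster_labels):
    """Return (homogeneity, completeness, V-measure) for one clustering."""
    h_true = entropy(true_labels)
    h_clus = entropy(cluster_labels)
    # Homogeneity: each cluster should contain members of a single class.
    homo = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, cluster_labels) / h_true)
    # Completeness: all members of a class should land in the same cluster.
    compl = 1.0 if h_clus == 0 else (
        1.0 - conditional_entropy(cluster_labels, true_labels) / h_clus)
    return homo, compl, harmonic_mean(homo, compl)


if __name__ == "__main__":
    # Recompute v-meas from the rounded homo/compl values in the table above.
    # The inputs are already rounded, so the last digit is only approximate.
    rows = [("k-means++", 0.602, 0.650),
            ("random", 0.669, 0.710),
            ("PCA-based", 0.673, 0.715)]
    for name, homo, compl in rows:
        print("% 9s   %.3f" % (name, harmonic_mean(homo, compl)))
```

Calling `homogeneity_completeness_v(labels, estimator.labels_)` on the digits labels should track the three sklearn scores closely, though I have not compared the two implementations digit for digit.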