I. Objective
Implement the K-means algorithm in Python.
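II. Principle
(1) Choose K initial cluster centers (for example, at random);
(2) For each sample point, compute its distance to each of the K cluster centers and assign the point to the cluster whose center is nearest; this assignment is repeated in every iteration, for up to n iterations;
(3) In each iteration, update the center (centroid) of every cluster, for example by taking the mean of the points assigned to it;
(4) Repeat steps (2) and (3). Once the K cluster centers move very little between iterations (a threshold can be set), the algorithm is considered to have reached a stable state and the iteration ends.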
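To make steps (2) to (4) concrete, here is a minimal sketch of a single iteration on toy data. The four points, the two starting centers, and the threshold `tol` are illustrative assumptions, not part of the experiment's dataset.

```python
import numpy as np

# Toy data and starting centers (assumed for illustration only, K = 2)
points = np.array([[1.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

# Step (2): squared Euclidean distance from every point to every center.
# Broadcasting produces an array of shape (n_points, K).
diff = points[:, None, :] - centers[None, :, :]
sq_dist = (diff ** 2).sum(axis=2)
labels = sq_dist.argmin(axis=1)  # index of the nearest center per point

# Step (3): move each center to the mean of the points assigned to it.
new_centers = np.array(
    [points[labels == j].mean(axis=0) for j in range(len(centers))]
)

# Step (4): compare the largest center displacement against a threshold.
tol = 1e-4  # assumed threshold
shift = np.linalg.norm(new_centers - centers, axis=1).max()
converged = shift < tol

print(labels)       # cluster index of each point
print(new_centers)  # updated centers
print(converged)    # whether the centers have stopped moving
```

Here the first two points fall into cluster 0 and the last two into cluster 1, the centers move to (1, 1.5) and (4.5, 4), and the displacement is still far above `tol`, so another iteration would follow.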
III. Python Packages
(1) numpy
(2) matplotlib (used by the code below for plotting)
IV. Experiment
The dataset is as follows:
[3.13257748 4.08653576]
[2.8486827 4.48815431]
[3.40882487 4.14138275]
[3.06977634 4.31563331]
[3.14381702 4.10147438]
[2.67195731 3.6464033 ]
[2.53242806 4.40165829]
[3.43557873 3.70279658]
[2.62401582 3.54597948]
[3.10216656 4.19867393]
[3.77207532 2.58221923]
[3.92348801 2.72714337]
[4.2845745 3.42431606]
[3.80646856 2.73666636]
[3.9872807 3.13824138]
[4.09143306 3.39424484]
[4.31901806 3.08375654]
[3.58912334 2.91815208]
[4.09898341 3.00657741]
[4.2702863 3.20399911]
Cluster this dataset with the K-means algorithm (k = 2).
Code:
import numpy as np
import matplotlib.pyplot as plt

# Dataset: 20 two-dimensional sample points
data = [[3.13257748, 4.08653576], [2.8486827, 4.48815431],
        [3.40882487, 4.14138275], [3.06977634, 4.31563331],
        [3.14381702, 4.10147438], [2.67195731, 3.6464033],
        [2.53242806, 4.40165829], [3.43557873, 3.70279658],
        [2.62401582, 3.54597948], [3.10216656, 4.19867393],
        [3.77207532, 2.58221923], [3.92348801, 2.72714337],
        [4.2845745, 3.42431606], [3.80646856, 2.73666636],
        [3.9872807, 3.13824138], [4.09143306, 3.39424484],
        [4.31901806, 3.08375654], [3.58912334, 2.91815208],
        [4.09898341, 3.00657741], [4.2702863, 3.20399911]]
Data = np.array(data)

# Plot the raw data before clustering
plt.scatter(Data[:, 0], Data[:, 1], color='green', s=200)
plt.show()

# Initial cluster centers, chosen by hand
new_x1 = np.array([3.5, 3.5])
new_x2 = np.array([4.0, 2.5])

for step in range(100):
    temp1 = []  # points assigned to center 1
    temp2 = []  # points assigned to center 2
    for point in Data:
        # Squared Euclidean distance from the point to each center
        dis_1 = np.sum((point - new_x1) ** 2)
        dis_2 = np.sum((point - new_x2) ** 2)
        # Assign the point to the nearer center
        if dis_1 < dis_2:
            temp1.append(point)
        else:
            temp2.append(point)
    temp1 = np.array(temp1)
    temp2 = np.array(temp2)
    # Update each center to the mean of the points assigned to it
    old_x1, old_x2 = new_x1, new_x2
    new_x1 = temp1.mean(axis=0)
    new_x2 = temp2.mean(axis=0)
    # Stop early once the centers no longer move
    if np.allclose(new_x1, old_x1) and np.allclose(new_x2, old_x2):
        break

# Plot the two resulting clusters in different colors
plt.scatter(temp1[:, 0], temp1[:, 1], color='green', s=200)
plt.scatter(temp2[:, 0], temp2[:, 1], color='red', s=200)
plt.show()
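The script above is written for exactly two clusters. As a sketch of how the same steps might be generalized to any K, the helper below takes the starting centers as an argument and stops once the largest center displacement drops below a threshold. The names `kmeans` and `nearest_center`, the parameter `tol`, and the empty-cluster handling are assumptions made for this illustration, not part of the original experiment.

```python
import numpy as np

def nearest_center(points, centers):
    """Return, for each point, the index of its nearest center."""
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

def kmeans(points, init_centers, max_iter=100, tol=1e-6):
    """Hypothetical helper: K-means for any K, given starting centers."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        labels = nearest_center(points, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:  # centers have stopped moving
            break
    # Recompute labels so they match the returned centers
    return nearest_center(points, centers), centers

data = np.array([
    [3.13257748, 4.08653576], [2.8486827, 4.48815431],
    [3.40882487, 4.14138275], [3.06977634, 4.31563331],
    [3.14381702, 4.10147438], [2.67195731, 3.6464033],
    [2.53242806, 4.40165829], [3.43557873, 3.70279658],
    [2.62401582, 3.54597948], [3.10216656, 4.19867393],
    [3.77207532, 2.58221923], [3.92348801, 2.72714337],
    [4.2845745, 3.42431606], [3.80646856, 2.73666636],
    [3.9872807, 3.13824138], [4.09143306, 3.39424484],
    [4.31901806, 3.08375654], [3.58912334, 2.91815208],
    [4.09898341, 3.00657741], [4.2702863, 3.20399911],
])

# Same starting centers as the script above
labels, centers = kmeans(data, init_centers=[[3.5, 3.5], [4.0, 2.5]])
print(labels)
print(centers)
```

With these starting centers, the first ten samples end up in one cluster and the last ten in the other. Passing three or more rows as `init_centers` would run the same procedure for a larger K.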