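Python basics:
The count() method of a Python string returns how many times a substring occurs in it; the optional arguments give the start and end positions of the search. Lists also have a count() method, which takes a single value and returns how many elements equal it. That list version is the one createTree calls on classList further down.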
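To make the two count() variants concrete, here is a small illustrative sketch. The sample string and list are made up for the example and are not part of the decision tree code.

```python
# str.count: occurrences of a substring, optionally restricted to [start, end)
text = "banana"
total_a = text.count("a")          # counts over the whole string
sliced_a = text.count("a", 2, 5)   # counts only inside text[2:5], i.e. "nan"

# list.count: number of elements equal to the argument (no start/end arguments)
class_list = ["yes", "yes", "no"]
yes_count = class_list.count("yes")
all_same_class = class_list.count(class_list[0]) == len(class_list)

print(total_a, sliced_a, yes_count, all_same_class)  # 3 1 2 False
```

The last expression is the same test createTree uses to decide whether every row already belongs to one class.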
Example (deleting list elements with del):
    >>> a = [-1, 3, 'aa', 85]  # define a list
    >>> a
    [-1, 3, 'aa', 85]
    >>> del a[0]  # delete the element at index 0
    >>> a
    [3, 'aa', 85]
    >>> del a[2:4]  # delete the slice from index 2 up to, but not including, index 4
    >>> a
    [3, 'aa']
    >>> del a  # delete the whole list (the name a no longer exists)
    >>> a
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'a' is not defined
    >>>
Building a first basic decision tree:
(1) Code that builds the decision tree (represented as a nested dictionary):
    # trees.py
    # chooseBestFeatureToSplit and splitDataSet are not shown in this section
    import operator

    def majorityCnt(classList):
        # Majority vote: return the class name that occurs most often
        classCount = {}
        for vote in classList:
            if vote not in classCount:
                classCount[vote] = 0
            classCount[vote] += 1
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]

    def createTree(dataSet, labels):
        # Build the decision tree
        # Class labels of every row (list comprehensions: see the previous section)
        classList = [example[-1] for example in dataSet]
        # list.count() counts matching elements: stop if every row has the same class
        if classList.count(classList[0]) == len(classList):
            return classList[0]
        # Only the class column is left: fall back to a majority vote
        if len(dataSet[0]) == 1:
            return majorityCnt(classList)
        bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the best feature to split on
        bestFeatLabel = labels[bestFeat]
        # Nested dict keyed by bestFeatLabel. Each myTree[bestFeatLabel][value] is either
        # a subtree (a dict returned by the recursion) or a leaf, i.e. the classList[0] or
        # majorityCnt(classList) returned by the two if statements above.
        myTree = {bestFeatLabel: {}}
        del labels[bestFeat]  # remove the chosen feature label (this mutates the caller's list)
        featValues = [example[bestFeat] for example in dataSet]
        uniqueVals = set(featValues)  # distinct values of the chosen feature
        for value in uniqueVals:
            # Lists are passed by reference, so hand each recursive call its own copy
            # to keep sibling branches from seeing each other's deletions
            subLabels = labels[:]
            myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
        return myTree

    # Test code:
    # CreateDataSet.py
    def createDataSet():
        dataSet = [[1, 1, 0, 'maybe'],
                   [1, 1, 0, 'yes'],
                   [1, 1, 1, 'yes'],
                   [1, 0, 1, 'maybe'],
                   [0, 1, 0, 'no'],
                   [0, 1, 0, 'no']]
        labels = ['no surfacing', 'flippers', 'maybe']
        return dataSet, labels

    # Test script
    import CreateDataSet
    import trees
    myDat, labels = CreateDataSet.createDataSet()
    myTree = trees.createTree(myDat, labels)
    print(myTree)
    # Result: {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe', 1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
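createTree calls splitDataSet and chooseBestFeatureToSplit, which are not shown in this post. The following is only a minimal sketch of what they might look like, assuming the usual ID3 approach of picking the feature with the highest information gain. The helper name calcShannonEnt and all three function bodies are assumptions for illustration, not the original code.

```python
from collections import Counter
from math import log2


def calcShannonEnt(dataSet):
    """Shannon entropy (in bits) of the class column, the last item of each row."""
    counts = Counter(row[-1] for row in dataSet)
    total = len(dataSet)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def splitDataSet(dataSet, axis, value):
    """Rows whose feature `axis` equals `value`, with that feature column removed."""
    return [row[:axis] + row[axis + 1:] for row in dataSet if row[axis] == value]


def chooseBestFeatureToSplit(dataSet):
    """Index of the feature with the largest information gain (-1 if no split helps)."""
    baseEntropy = calcShannonEnt(dataSet)
    bestGain, bestFeature = 0.0, -1
    for i in range(len(dataSet[0]) - 1):  # the last column is the class label
        newEntropy = 0.0
        for value in {row[i] for row in dataSet}:
            subset = splitDataSet(dataSet, i, value)
            newEntropy += len(subset) / len(dataSet) * calcShannonEnt(subset)
        gain = baseEntropy - newEntropy
        if gain > bestGain:
            bestGain, bestFeature = gain, i
    return bestFeature


dataSet = [[1, 1, 0, 'maybe'], [1, 1, 0, 'yes'], [1, 1, 1, 'yes'],
           [1, 0, 1, 'maybe'], [0, 1, 0, 'no'], [0, 1, 0, 'no']]
print(chooseBestFeatureToSplit(dataSet))  # 0, i.e. 'no surfacing'
print(splitDataSet(dataSet, 0, 0))        # [[1, 0, 'no'], [1, 0, 'no']]
```

With helpers like these in the same module as createTree, the test data should split on 'no surfacing' first, which agrees with the root of the printed result.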
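(2) Code for plotting the tree (omitted to save space, since it only draws the dictionary produced above):
Some problems you may run into along the way, mostly caused by differences between Python 2.x and 3.x: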
#Clearly you’re passing in d.keys() to your shuffle function.
# Probably this was written with python2.x (when d.keys() returned a list). With python3.x, d.keys() returns a dict_keys object which behaves a lot more like a set than a list.
# As such, it can’t be indexed.
#The solution is to pass list(d.keys()) (or simply list(d)) to shuffle.
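Alternatively, for an explanation in Chinese, see the CSDN post on: firstStr = myTree.keys()[0]
In Python 3 that line raises TypeError: 'dict_keys' object is not subscriptable. Write firstStr = list(myTree.keys())[0] instead.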
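The difference is easy to reproduce. This short sketch assumes Python 3 and uses a small sample tree; the variable names are made up for the example.

```python
my_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

try:
    first_str = my_tree.keys()[0]         # Python 2 idiom; fails on Python 3
except TypeError as exc:
    error_message = str(exc)              # "'dict_keys' object is not subscriptable"

first_via_list = list(my_tree.keys())[0]  # the fix from the quoted answer
first_via_iter = next(iter(my_tree))      # same key, without building a list

print(first_via_list, first_via_iter)     # no surfacing no surfacing
```

Either form works for the tree nodes here, because each node dictionary has exactly one key.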
(3) Testing the algorithm: classifying with the decision tree:
    def classify(inputTree, featLabels, testVec):
        # Feature tested at this node, e.g. 'flippers'
        firstStr = list(inputTree.keys())[0]
        # Children of this node, a dict, e.g. {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}
        secondDict = inputTree[firstStr]
        # Position of firstStr in featLabels, used to pick the matching value from testVec
        featIndex = featLabels.index(firstStr)
        for key in secondDict.keys():  # follow the branch whose key equals the test value
            if testVec[featIndex] == key:
                if isinstance(secondDict[key], dict):  # subtree: keep descending
                    classLabel = classify(secondDict[key], featLabels, testVec)
                else:
                    classLabel = secondDict[key]  # leaf: this is the class
        # Raises UnboundLocalError if testVec holds a value with no branch in the tree
        return classLabel
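To show how classification walks the nested dictionary, here is an iterative sketch run against the tree printed in part (1). The function name classify_iterative and its default parameter are hypothetical; this is a variant for illustration, not the recursive classify from this post.

```python
def classify_iterative(tree, feat_labels, test_vec, default=None):
    """Walk the nested-dict tree until a leaf (any non-dict value) is reached."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))                 # the single key of this node
        value = test_vec[feat_labels.index(feature)]
        branches = node[feature]
        if value not in branches:                  # value with no branch in the tree
            return default
        node = branches[value]
    return node


tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe', 1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
feat_labels = ['no surfacing', 'flippers', 'maybe']  # the full, unmodified label list

print(classify_iterative(tree, feat_labels, [1, 1, 1]))  # yes
print(classify_iterative(tree, feat_labels, [0, 1, 0]))  # no
print(classify_iterative(tree, feat_labels, [1, 0, 1]))  # maybe
print(classify_iterative(tree, feat_labels, [2, 0, 0]))  # None
```

Note that createTree deletes entries from the labels list it receives, so the feature labels used for classification should come from a fresh call to createDataSet or from a copy made before building the tree.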
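(4) Storing and loading the decision tree:
The main issue here involves pickle:
Pickle files are binary data files, so they must be opened in 'rb' mode for reading and 'wb' mode for writing, never in text mode.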
    import pickle

    def storeTree(inputTree, filename):
        # Pickle files are binary data files, so always open the file with 'wb'
        # when writing. Don't try to use a text mode here.
        with open(filename, 'wb') as fw:
            pickle.dump(inputTree, fw)

    def grabTree(filename):
        # Likewise, always open the file with 'rb' when loading.
        with open(filename, 'rb') as fr:
            return pickle.load(fr)
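A quick round trip shows both the working binary mode and the failure in text mode. This is a sketch assuming Python 3; the sample tree and the file name classifierStorage.txt are made up for the example.

```python
import os
import pickle
import tempfile

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
path = os.path.join(tempfile.mkdtemp(), 'classifierStorage.txt')  # hypothetical file name

# Binary mode: the round trip works
with open(path, 'wb') as fw:
    pickle.dump(tree, fw)
with open(path, 'rb') as fr:
    restored = pickle.load(fr)

# Text mode: pickle writes bytes, so Python 3 rejects the write
try:
    with open(path, 'w') as fw:
        pickle.dump(tree, fw)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

print(restored == tree)   # True
print(text_mode_error)
```

The restored object is an ordinary nested dict, so it can be passed straight to the classification function.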