Machine Learning in Action: A First Look at the Decision Tree (ID3) Algorithm and Its Python Code (Part 2) - Alibaba Cloud Developer Community

2023-02-01 174 Published from Heilongjiang

Building a decision tree recursively in Python:

Python basics:

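The count() method:

The code below calls count() on a list: list.count(x) returns the number of elements in the list that are equal to x. (Strings also have a count() method; it counts occurrences of a substring and accepts optional start and end positions for the search.)

Example of the del statement, which createTree uses to remove a label from a list: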

>>> a = [-1, 3, 'aa', 85] # define a list
>>> a
[-1, 3, 'aa', 85]
>>> del a[0] # delete the element at index 0
>>> a
[3, 'aa', 85]
>>> del a[2:4] # delete the slice from index 2 up to, but not including, index 4
>>> a
[3, 'aa']
>>> del a # delete the variable a itself
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>>
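To complement the del example, here is a minimal sketch of how list.count() can detect a "pure" set of class labels, which is the check createTree performs. The class_list values are illustrative and not taken from the article's data set.

```python
# Illustrative class labels (hypothetical, not the article's data set).
class_list = ['yes', 'yes', 'no']

# list.count(x) returns how many elements of the list are equal to x.
first_count = class_list.count(class_list[0])

# The labels are "pure" only when every label equals the first one.
is_pure = first_count == len(class_list)
print(first_count, is_pure)  # 2 False
```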

Building a first basic decision tree:

(1) Code that produces the decision tree (represented as a dictionary):

# trees.py
# chooseBestFeatureToSplit and splitDataSet come from the previous part and must be defined in this module.
import operator

def majorityCnt(classList):  # return the class name that occurs most often (majority vote)
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet, labels):  # build the decision tree
    classList = [example[-1] for example in dataSet]  # class labels of all samples; list comprehensions are covered in the previous part
    if classList.count(classList[0]) == len(classList):  # list.count(x): every sample has the same class, so stop splitting
        return classList[0]
    if len(dataSet[0]) == 1:  # only the class column is left: fall back to a majority vote
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the best feature to split on
    bestFeatLabel = labels[bestFeat]
    # The tree is a nested dict keyed by bestFeatLabel. Each myTree[bestFeatLabel][value] is either
    # a subtree (a dict built recursively) or a leaf returned by one of the two if statements above
    # (classList[0] or majorityCnt(classList)).
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]  # remove the label of the chosen feature; note that this modifies the caller's list
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)  # distinct values of the chosen feature
    for value in uniqueVals:
        subLabels = labels[:]  # lists are passed by reference, so recurse on a copy to avoid changing labels
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree

# Test code (CreateDataSet.py):
def createDataSet():
    dataSet = [[1, 1, 0, 'maybe'],
               [1, 1, 0, 'yes'],
               [1, 1, 1, 'yes'],
               [1, 0, 1, 'maybe'],
               [0, 1, 0, 'no'],
               [0, 1, 0, 'no']]
    labels = ['no surfacing', 'flippers', 'maybe']
    return dataSet, labels

# Test script:
import CreateDataSet
import trees
myDat, labels = CreateDataSet.createDataSet()
myTree = trees.createTree(myDat, labels)
print(myTree)
# Result: {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe', 1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
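createTree relies on chooseBestFeatureToSplit and splitDataSet, which this part does not show. The following is a minimal sketch of what they might look like, assuming the standard ID3 criterion (largest information gain, with Shannon entropy); the helper calcShannonEnt and the exact bodies are assumptions and may differ from the previous part's code.

```python
from math import log2

def calcShannonEnt(dataSet):
    """Shannon entropy of the class labels (last column of each row)."""
    labelCounts = {}
    for row in dataSet:
        labelCounts[row[-1]] = labelCounts.get(row[-1], 0) + 1
    total = len(dataSet)
    return -sum((c / total) * log2(c / total) for c in labelCounts.values())

def splitDataSet(dataSet, axis, value):
    """Rows whose feature `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in dataSet if row[axis] == value]

def chooseBestFeatureToSplit(dataSet):
    """Index of the feature with the largest information gain (-1 if none helps)."""
    numFeatures = len(dataSet[0]) - 1  # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestGain, bestFeature = 0.0, -1
    for i in range(numFeatures):
        newEntropy = 0.0
        for value in set(row[i] for row in dataSet):
            subset = splitDataSet(dataSet, i, value)
            newEntropy += len(subset) / len(dataSet) * calcShannonEnt(subset)
        gain = baseEntropy - newEntropy
        if gain > bestGain:
            bestGain, bestFeature = gain, i
    return bestFeature

# The article's test data: three features followed by the class label.
dataSet = [[1, 1, 0, 'maybe'], [1, 1, 0, 'yes'], [1, 1, 1, 'yes'],
           [1, 0, 1, 'maybe'], [0, 1, 0, 'no'], [0, 1, 0, 'no']]
print(chooseBestFeatureToSplit(dataSet))  # 0, the 'no surfacing' column
```

With these definitions the first split falls on column 0, which agrees with 'no surfacing' being the root of the printed tree.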

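(2) Code for plotting the tree (it only draws the dictionary produced above, so it is omitted here to save space):

Some problems you may run into along the way, mostly caused by differences between Python 2.x and 3.x: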

firstStr = myTree.keys()[0]  # works in Python 2; in Python 3 it raises TypeError: 'dict_keys' object is not subscriptable

# This line was written for Python 2.x, where d.keys() returned a list. In Python 3.x, d.keys() returns a dict_keys object, which behaves a lot more like a set than a list.

# As such, it can't be indexed.

# The solution is to index list(d.keys()) (or simply list(d)) instead, for example firstStr = list(myTree.keys())[0].

For an explanation in Chinese, see the CSDN post about firstStr = myTree.keys()[0].
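A short sketch of the Python 3 fix, using the tree printed by the test code earlier as a literal. Both forms below are standard Python; which one to use is a matter of taste.

```python
# The tree printed by the test code earlier in the article, written as a literal.
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe',
          1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}

firstStr = list(myTree.keys())[0]  # copy the keys into a list, then index it
sameKey = next(iter(myTree))       # alternative that avoids building a list
print(firstStr, sameKey == firstStr)  # no surfacing True
```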

(3) Testing the algorithm by using the decision tree:

def classify(inputTree, featLabels, testVec):
    firstStr = list(inputTree.keys())[0]  # feature tested at this node, e.g. 'no surfacing'
    secondDict = inputTree[firstStr]  # branches of this node, a dict, e.g. {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}
    featIndex = featLabels.index(firstStr)  # position of firstStr in featLabels, used to pick the matching value from testVec
    classLabel = None  # stays None if testVec holds a value the tree has no branch for
    for key in secondDict.keys():  # follow the branch that matches the value in testVec
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):  # internal node: keep descending
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:  # leaf: this is the class label
                classLabel = secondDict[key]
    return classLabel
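A usage sketch for classify. Because createTree deletes entries from the labels list it receives, featLabels here must be the full, unmodified list (for example from a fresh createDataSet() call, or by passing labels[:] to createTree). The classify below is a condensed restatement for a self-contained demo, not the article's exact code; unlike the original, it raises KeyError when the tree has no branch for a value.

```python
def classify(inputTree, featLabels, testVec):
    """Condensed restatement of classify, for a runnable demo."""
    firstStr = list(inputTree)[0]              # feature tested at this node
    secondDict = inputTree[firstStr]           # branches keyed by feature value
    featIndex = featLabels.index(firstStr)     # column of that feature in testVec
    subtree = secondDict[testVec[featIndex]]   # KeyError on an unseen value
    if isinstance(subtree, dict):              # internal node: keep descending
        return classify(subtree, featLabels, testVec)
    return subtree                             # leaf: the class label

# The tree printed by the test code earlier in the article.
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe',
          1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
featLabels = ['no surfacing', 'flippers', 'maybe']  # full, unmodified label list

print(classify(myTree, featLabels, [1, 1, 1]))  # yes
print(classify(myTree, featLabels, [1, 0, 1]))  # maybe
print(classify(myTree, featLabels, [0, 1, 0]))  # no
```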

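(4) Storing and loading the decision tree:

The main problem here involves pickle:

Pickle files are binary data files, so the file must be opened in 'rb' mode for reading and 'wb' mode for writing, not in text mode.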

import pickle

def storeTree(inputTree, filename):
    # Pickle files are binary data files, so always open the file in 'wb' mode when writing. Don't use text mode here.
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)

def grabTree(filename):
    # Pickle files are binary data files, so always open the file in 'rb' mode when loading. Don't use text mode here.
    with open(filename, 'rb') as fr:
        return pickle.load(fr)
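A round-trip sketch for the two functions, assuming Python 3. The file name classifierStorage.pkl and the temporary directory are illustrative choices, not part of the original code.

```python
import os
import pickle
import tempfile

def storeTree(inputTree, filename):
    with open(filename, 'wb') as fw:   # binary write mode is required for pickle
        pickle.dump(inputTree, fw)

def grabTree(filename):
    with open(filename, 'rb') as fr:   # binary read mode is required for pickle
        return pickle.load(fr)

# The tree printed by the test code earlier in the article.
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe',
          1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'classifierStorage.pkl')  # hypothetical file name
    storeTree(myTree, path)
    restored = grabTree(path)

print(restored == myTree)  # True
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.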
