A Brief Overview of the ID3 and C4.5 Decision Tree Classification Algorithms

Let's begin with the ID3 decision tree:
The ID3 algorithm tries to obtain the largest information gain when growing the decision tree. The information gain is defined as

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$

where $I$ is the information entropy of a given sample set,
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
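
For example, a set of 14 samples with 9 in one class and 5 in the other has entropy of about 0.940 bits. Here is a minimal Python sketch of this computation (the function name `entropy` and the string labels are illustrative choices, not from the original text):

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy I(s1, ..., sm) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))  # ~0.940 bits
```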

$E(A)$ is the information entropy of the subsets classified by attribute $A = (a_1, a_2, \ldots, a_v)$,
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s}\, I(s_{1j}, s_{2j}, \ldots, s_{mj})$$

Moreover, $p_i$ is the probability of a sample belonging to class $C_i$, which can be estimated as $p_i = s_i / |S|$, and $p_{ij}$ is the probability of a sample belonging to class $C_i$ given attribute value $A = a_j$, i.e. $p_{ij} = s_{ij} / |S_j|$.
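
Putting these definitions together, here is a hedged sketch of $\mathrm{Gain}(A)$ for samples stored as plain Python dicts; the dict layout and the `label` key are assumptions of this example, and it reuses the `entropy` helper above:

```python
def information_gain(samples, attribute, label_key="label"):
    """Gain(A) = I(s1, ..., sm) - E(A), where E(A) is the weighted
    entropy of the subsets S_j induced by the values a_j of A."""
    total = len(samples)
    subsets = {}
    for sample in samples:
        subsets.setdefault(sample[attribute], []).append(sample[label_key])
    e_a = sum(len(labels) / total * entropy(labels)
              for labels in subsets.values())
    return entropy([sample[label_key] for sample in samples]) - e_a
```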
The ID3 algorithm can be summarized as follows (a sketch appears after this list):
  1. For every attribute A, calculate its information gain Gain(A).
  2. Pick the attribute with the largest Gain(A) as the root node or internal node.
  3. Remove the chosen attribute A, and for every value a_j of A, grow the next node on the corresponding subset.
  4. Repeat steps 1~3 until each subset contains only one label/class C_i.
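
A minimal recursive sketch of these four steps, reusing `entropy` and `information_gain` from above; the nested-dict tree representation, the majority-vote fallback, and the toy weather rows are all invented for this illustration:

```python
def id3(samples, attributes, label_key="label"):
    """Returns a class label (leaf) or a dict {attribute: {value: subtree}}."""
    labels = [s[label_key] for s in samples]
    if len(set(labels)) == 1:           # step 4: pure subset -> leaf
        return labels[0]
    if not attributes:                  # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # steps 1-2: pick the attribute with the largest information gain
    best = max(attributes, key=lambda a: information_gain(samples, a, label_key))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # step 3: grow one branch per value a_j of the chosen attribute
    for value in {s[best] for s in samples}:
        subset = [s for s in samples if s[best] == value]
        tree[best][value] = id3(subset, remaining, label_key)
    return tree

data = [
    {"outlook": "sunny", "windy": False, "label": "no"},
    {"outlook": "sunny", "windy": True, "label": "no"},
    {"outlook": "overcast", "windy": False, "label": "yes"},
    {"outlook": "rainy", "windy": False, "label": "yes"},
    {"outlook": "rainy", "windy": True, "label": "no"},
]
print(id3(data, ["outlook", "windy"]))
```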

ID3 is an old machine learning algorithm, created in 1979 and based on information entropy; however, it has several problems:

  1. ID3 prefers attributes with more values, even though such an attribute often turns out not to be the optimal one (demonstrated below).
  2. ID3 has to calculate the information entropy of every value of every attribute. It therefore tends to grow many levels and branches that cover very little probability mass, and as a result it tends to overfit, generalizing poorly to the test set.
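
Problem 1 can be made concrete with a hedged illustration: attach a unique `id` attribute (invented here for the demonstration) to each of the toy samples above. Every subset then contains a single sample, so $E(A) = 0$ and the gain is maximal, even though `id` is useless for prediction:

```python
data_with_id = [dict(row, id=i) for i, row in enumerate(data)]
print(information_gain(data_with_id, "id"))       # ~0.971 = I(S), the maximum
print(information_gain(data_with_id, "outlook"))  # ~0.571
```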

C4.5 decision tree
The C4.5 algorithm uses the Gain Ratio instead of the Gain to select attributes.

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}$$

where $\mathrm{Gain}(S, A)$ is nothing more than $\mathrm{Gain}(A)$ in ID3, and $\mathrm{SplitInfo}(S, A)$ is defined as
$$\mathrm{SplitInfo}(S, A) = \sum_{i=1}^{c} \frac{|s_i|}{|S|} \log_2\!\left(\frac{|S|}{|s_i|}\right)$$

in which $s_1$ through $s_c$ are the sample subsets obtained by splitting $S$ on the $c$ values of attribute $A$.
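
Under the same assumptions as the earlier snippets (dict-shaped samples and the helpers defined above), the gain ratio can be sketched as follows; note that $\mathrm{SplitInfo}(S, A) = -\sum_i \frac{|s_i|}{|S|} \log_2 \frac{|s_i|}{|S|}$, which is the form used in the code:

```python
def gain_ratio(samples, attribute, label_key="label"):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    total = len(samples)
    sizes = Counter(sample[attribute] for sample in samples).values()
    split_info = -sum(n / total * math.log2(n / total) for n in sizes)
    if split_info == 0:                 # attribute has a single value
        return 0.0
    return information_gain(samples, attribute, label_key) / split_info

# SplitInfo penalizes many-valued attributes: dividing by log2(5) ~ 2.32
# shrinks the "id" gain of ~0.971 to ~0.418, while "outlook" only drops
# from ~0.571 to ~0.375.
print(gain_ratio(data_with_id, "id"))       # ~0.418
print(gain_ratio(data_with_id, "outlook"))  # ~0.375
```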