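Study notes for the Developer Academy course [Python Data Analysis Library Pandas Quick Start: Handling Missing Values with Other Markers]. The notes follow the course closely to help readers pick up the material quickly.
Course link: https://developer.aliyun.com/learning/course/607/detail/8861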
Handling missing values with other markers
Processing steps
1. Replace "?" with np.nan
df.replace(to_replace="?", value=np.nan)
2. Follow the usual steps for handling np.nan missing values
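The two steps above can be sketched on a small made-up DataFrame. The column names `a` and `b` and the values are hypothetical, chosen only for illustration, and dropping rows is just one way to carry out the second step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "?" marks a missing value in column "b"
df = pd.DataFrame({"a": [1, 2, 3], "b": ["4", "?", "6"]})

# Step 1: replace the "?" marker with np.nan
df_nan = df.replace(to_replace="?", value=np.nan)

# Step 2: handle np.nan as usual, here by dropping incomplete rows
df_clean = df_nan.dropna()

print(df_nan)
print(df_clean)
```

Note that `replace` returns a new DataFrame here, so `df` itself still contains the "?" marker.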
Example of handling missing values
Step 1: Read the data
In:
import numpy as np
import pandas as pd

path = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
# Column names; the data file has 11 columns, so "Mitoses" and "Class" are included to match it
name = ["Sample code number", "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape", "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin", "Normal Nucleoli", "Mitoses", "Class"]
data = pd.read_csv(path, names=name)
data
Step 2: Replace
In:
data_new = data.replace(to_replace="?", value=np.nan)
# The cells that held "?" are now NaN
data_new.head()
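One way to confirm that the replacement took effect is to count the NaN values in each column. The sketch below uses a few inline rows in the same comma-separated layout instead of downloading the file; the rows and the shortened column names are made up for illustration and are not the course data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample rows; the real file has more columns and rows
raw = io.StringIO("1000025,5,1,2\n1057013,8,?,7\n1096800,6,?,7\n")
sample = pd.read_csv(raw, names=["id", "thickness", "bare_nuclei", "chromatin"])

# Same replacement as in the step above
sample_new = sample.replace(to_replace="?", value=np.nan)

# Number of missing values in each column after the replacement
missing_counts = sample_new.isnull().sum()
print(missing_counts)
```

Only the column that contained "?" should report a nonzero count.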
Step 3: Drop the missing values
In:
data_new.dropna(inplace=True)
data_new.isnull().any()
# Every column returns False, so no missing values remain
Out:
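As a possible alternative to the replace step (not part of the course example), `pd.read_csv` accepts an `na_values` argument, so the "?" marker can be treated as missing while the file is read. The sketch below is a minimal illustration on made-up inline rows with hypothetical column names.

```python
import io

import pandas as pd

# Hypothetical sample rows; "?" marks a missing value
raw = io.StringIO("1000025,5,1,2\n1057013,8,?,7\n1096800,6,3,7\n")

# na_values="?" tells read_csv to parse "?" as NaN
sample = pd.read_csv(
    raw,
    names=["id", "thickness", "bare_nuclei", "chromatin"],
    na_values="?",
)

# Drop the rows that contain NaN, as in step 3
sample_clean = sample.dropna()
print(sample_clean)
```

With this approach the affected column is parsed as numeric at read time, whereas after `replace` it still holds strings alongside NaN.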