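Before deploying a model to production, its performance needs to be tested. The usual approach is to split the raw dataset prepared for training into training data (70% to 80%) and test data (20% to 30%): the model is fitted on the training data, and its performance is measured on the test data, which it has not seen during training.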
train_test_split
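① Manual split
import numpy as np
# X and Y are assumed to be NumPy arrays with the same number of samples
np.random.seed(666)  # fix the seed so the shuffle is reproducible
shuffle_index = np.random.permutation(len(X))  # shuffled sample indices
split = int(len(shuffle_index) * 0.8)  # the first 80% go to training
train_index = shuffle_index[:split]
test_index = shuffle_index[split:]
Train_X = X[train_index]
Train_Y = Y[train_index]
Test_X = X[test_index]
Test_Y = Y[test_index]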
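The manual split can also be wrapped in a reusable function. The following is only a minimal sketch: the function name manual_train_test_split, its parameters, and the toy arrays are hypothetical and chosen for illustration, and it uses np.random.default_rng instead of np.random.seed so that the global random state is left untouched.

```python
import numpy as np

def manual_train_test_split(X, Y, test_ratio=0.2, seed=None):
    """Shuffle the sample indices, then slice them into train and test parts."""
    assert len(X) == len(Y), "X and Y must have the same number of samples"
    rng = np.random.default_rng(seed)        # local generator (an assumption, not the original seed call)
    shuffle_index = rng.permutation(len(X))  # shuffled sample indices
    split = int(len(X) * (1 - test_ratio))   # size of the training part
    train_index = shuffle_index[:split]
    test_index = shuffle_index[split:]
    return X[train_index], X[test_index], Y[train_index], Y[test_index]

# Toy data: 10 samples with 2 features each; row i is [2i, 2i+1] and its label is i
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

Train_X, Test_X, Train_Y, Test_Y = manual_train_test_split(X, Y, test_ratio=0.2, seed=666)
print(Train_X.shape, Test_X.shape)
```

Because X and Y are indexed with the same shuffled indices, every feature row stays paired with its own label after the split.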
② train_test_split in scikit-learn
from sklearn.model_selection import train_test_split
Train_X, Test_X, Train_Y, Test_Y = train_test_split(X, Y, test_size=0.2, random_state=666)
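To see what the call returns, here is a small self-contained run. It is only an illustrative sketch: the toy arrays X and Y are made up for this example, and it assumes scikit-learn is installed. It checks that test_size=0.2 holds out 20% of the samples and that a fixed random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each, labels 0..9
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

Train_X, Test_X, Train_Y, Test_Y = train_test_split(X, Y, test_size=0.2, random_state=666)
print(Train_X.shape, Test_X.shape)

# Calling again with the same random_state gives the same split
_, Test_X_again, _, Test_Y_again = train_test_split(X, Y, test_size=0.2, random_state=666)
same_split = bool(np.array_equal(Test_Y, Test_Y_again))
print(same_split)
```

Note the order of the returned values: both feature arrays come first, then both label arrays.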
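Evaluating classification performance
Classification accuracy: the fraction of test-set predictions that match the true labels, computed as sum(Y_predict == Test_Y) / len(Test_Y). scikit-learn also provides a ready-made function for this: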
from sklearn.metrics import accuracy_score
accuracy_score(Test_Y, Y_predict)
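As a quick check, the manual formula and accuracy_score can be compared on a tiny example. The labels and predictions below are made up for illustration and do not come from a real model.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for 5 test samples
Test_Y = np.array([0, 1, 1, 0, 1])
Y_predict = np.array([0, 1, 0, 0, 1])

# Manual accuracy: number of matching predictions divided by the number of samples
manual_acc = np.sum(Y_predict == Test_Y) / len(Test_Y)

# The same metric computed by scikit-learn
sk_acc = accuracy_score(Test_Y, Y_predict)

print(manual_acc, sk_acc)
```

Four of the five predictions match the true labels, so both computations give an accuracy of 0.8.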