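Contents
Building a financial credit scorecard model for risk control on the GiveMeSomeCredit dataset with feature engineering and logistic regression (LoR)
Analyzing the label field: class counts for SeriousDlqin2yrs
Analyzing three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse
Analyzing a single field: DebtRatio and its relationship with MonthlyIncome and SeriousDlqin2yrs
Analyzing a single field: NumberOfOpenCreditLinesAndLoans
Analyzing a single field: NumberRealEstateLoansOrLines
# 2.5.1 Converting bins to WOE values for the selected features with the WOE function
# 3.2 Model evaluation: computing AUC, plotting the ROC curve, and printing the confusion matrix
# 4.1.1 Deriving the two scale parameters A and B: formulas for the scorecard scale parameters, derived from two assumptions
# 4.1.2 Designing the scorecard rule table: building score_card_rule from scale B, the WOE encoding of each bin, and the model coefficients
# 4.2.1 Randomly selecting 12 samples (6 good and 6 bad), computing each sample's total score, and comparing it with the label to validate the model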
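Building a financial credit scorecard model for risk control on the GiveMeSomeCredit dataset with feature engineering and logistic regression (LoR)
1. Loading the dataset
Viewing summary information for the dataset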
       Unnamed: 0  ...  NumberOfDependents
    0           1  ...                 2.0
    1           2  ...                 1.0
    2           3  ...                 0.0
    3           4  ...                 0.0
    4           5  ...                 0.0

    [5 rows x 12 columns]
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 150000 entries, 0 to 149999
    Data columns (total 12 columns):
     #   Column                                Non-Null Count   Dtype
    ---  ------                                --------------   -----
     0   Unnamed: 0                            150000 non-null  int64
     1   SeriousDlqin2yrs                      150000 non-null  int64
     2   RevolvingUtilizationOfUnsecuredLines  150000 non-null  float64
     3   age                                   150000 non-null  int64
     4   NumberOfTime30-59DaysPastDueNotWorse  150000 non-null  int64
     5   DebtRatio                             150000 non-null  float64
     6   MonthlyIncome                         120269 non-null  float64
     7   NumberOfOpenCreditLinesAndLoans       150000 non-null  int64
     8   NumberOfTimes90DaysLate               150000 non-null  int64
     9   NumberRealEstateLoansOrLines          150000 non-null  int64
     10  NumberOfTime60-89DaysPastDueNotWorse  150000 non-null  int64
     11  NumberOfDependents                    146076 non-null  float64
    dtypes: float64(4), int64(8)
    memory usage: 13.7 MB
    None
              Unnamed: 0  ...  NumberOfDependents
    count  150000.000000  ...       146076.000000
    mean    75000.500000  ...            0.757222
    std     43301.414527  ...            1.115086
    min         1.000000  ...            0.000000
    25%     37500.750000  ...            0.000000
    50%     75000.500000  ...            0.000000
    75%    112500.250000  ...            1.000000
    max    150000.000000  ...           20.000000
2. Feature engineering: data analysis and processing
# 2.1 Analyzing and handling missing values
    [8 rows x 12 columns]
                                      Column  Number_of_Null_Values  Proportion
    0                             Unnamed: 0                      0    0.000000
    1                       SeriousDlqin2yrs                      0    0.000000
    2   RevolvingUtilizationOfUnsecuredLines                      0    0.000000
    3                                    age                      0    0.000000
    4   NumberOfTime30-59DaysPastDueNotWorse                      0    0.000000
    5                              DebtRatio                      0    0.000000
    6                          MonthlyIncome                  29731    0.198207
    7        NumberOfOpenCreditLinesAndLoans                      0    0.000000
    8                NumberOfTimes90DaysLate                      0    0.000000
    9           NumberRealEstateLoansOrLines                      0    0.000000
    10  NumberOfTime60-89DaysPastDueNotWorse                      0    0.000000
    11                    NumberOfDependents                   3924    0.026160
    Unnamed: 0                              0
    SeriousDlqin2yrs                        0
    RevolvingUtilizationOfUnsecuredLines    0
    age                                     0
    NumberOfTime30-59DaysPastDueNotWorse    0
    DebtRatio                               0
    MonthlyIncome                           0
    NumberOfOpenCreditLinesAndLoans         0
    NumberOfTimes90DaysLate                 0
    NumberRealEstateLoansOrLines            0
    NumberOfTime60-89DaysPastDueNotWorse    0
    NumberOfDependents                      0
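The report above (null count and proportion per column, followed by an all-zero count after filling) can be reproduced along the following lines. This is a minimal sketch on a toy frame, not the post's own code: the helper name `missing_report` is hypothetical, and median imputation is an assumption, since the post does not show which fill strategy was used.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its null count and null proportion."""
    nulls = df.isnull().sum()
    return pd.DataFrame({
        "Column": nulls.index,
        "Number_of_Null_Values": nulls.values,
        "Proportion": nulls.values / len(df),
    })


# Toy frame standing in for df_train (values are illustrative only).
toy = pd.DataFrame({
    "MonthlyIncome": [5400.0, np.nan, 3000.0, 9000.0],
    "NumberOfDependents": [0.0, 2.0, np.nan, 1.0],
})
report = missing_report(toy)

# Assumed imputation strategy: fill each column with its own median.
filled = toy.fillna(toy.median())

print(report)
print(filled.isnull().sum())
```

The median is less sensitive than the mean to extreme incomes, which is why it is a common choice for a column such as MonthlyIncome; a model-based imputation would be an equally valid alternative.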
# 2.2 Analyzing each field one by one
Analyzing the label field: class counts for SeriousDlqin2yrs
    Default Rate: 0.06684
    count    150000.000000
    mean          6.048438
    std         249.755371
    min           0.000000
    25%           0.029867
    50%           0.154181
    75%           0.559046
    max       50708.000000
    Name: RevolvingUtilizationOfUnsecuredLines, dtype: float64
    [[0, 0.06684], [1, 0.37177950868783705], [2, 0.14555256064690028], [3, 0.09931506849315068], [4, 0.08679245283018867], [5, 0.07874015748031496], [6, 0.07692307692307693], [7, 0.0778688524590164], [8, 0.07407407407407407], [9, 0.07053941908713693], [10, 0.07053941908713693], [11, 0.07053941908713693], [12, 0.06666666666666667], [13, 0.058823529411764705], [14, 0.058823529411764705], [15, 0.05531914893617021], [16, 0.05531914893617021], [17, 0.05531914893617021], [18, 0.05531914893617021], [19, 0.05555555555555555]]
    Proportion of Defaulters with Total Amount of Money Owed Not Exceeding Total Credit Limit: 0.05991996127598361
    Proportion of Defaulters with Total Amount of Money Owed Not Exceeding or Equal to 13 times of Total Credit Limit:
    0.06685273968029273
Analyzing the age field
    count    150000.000000
    mean         52.295207
    std          14.771866
    min           0.000000
    25%          41.000000
    50%          52.000000
    75%          63.000000
    max         109.000000
    Name: age, dtype: float64
Analyzing three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse
    0     141662
    1       5243
    2       1555
    3        667
    4        291
    5        131
    6         80
    7         38
    8         21
    9         19
    10         8
    11         5
    12         2
    13         4
    14         2
    15         2
    17         1
    96         5
    98       264
    Name: NumberOfTimes90DaysLate, dtype: int64
    0     142396
    1       5731
    2       1118
    3        318
    4        105
    5         34
    6         16
    7          9
    8          2
    9          1
    11         1
    96         5
    98       264
    Name: NumberOfTime60-89DaysPastDueNotWorse, dtype: int64
    0     126018
    1      16033
    2       4598
    3       1754
    4        747
    5        342
    6        140
    7         54
    8         25
    9         12
    10         4
    11         1
    12         2
    13         1
    96         5
    98       264
    Name: NumberOfTime30-59DaysPastDueNotWorse, dtype: int64
           NumberOfTimes90DaysLate  ...  NumberOfTime30-59DaysPastDueNotWorse
    count               269.000000  ...                            269.000000
    mean                 97.962825  ...                             97.962825
    std                   0.270628  ...                              0.270628
    min                  96.000000  ...                             96.000000
    25%                  98.000000  ...                             98.000000
    50%                  98.000000  ...                             98.000000
    75%                  98.000000  ...                             98.000000
    max                  98.000000  ...                             98.000000

    [8 rows x 3 columns]
    {'98,98,98': 263, '96,96,96': 4}
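The value counts above show the same 96 and 98 entries in all three delinquency counters, and the final dictionary tallies how often the three columns carry the same code in one row. A sketch of how such a tally might be produced is given below; the toy data, the `late_cols` list, and the choice to filter on the 90-day column are assumptions made for illustration.

```python
from collections import Counter

import pandas as pd

late_cols = [
    "NumberOfTimes90DaysLate",
    "NumberOfTime60-89DaysPastDueNotWorse",
    "NumberOfTime30-59DaysPastDueNotWorse",
]

# Toy frame standing in for df_train (values are illustrative only).
toy = pd.DataFrame({
    late_cols[0]: [0, 98, 96, 2, 98],
    late_cols[1]: [1, 98, 96, 0, 98],
    late_cols[2]: [0, 98, 96, 3, 98],
})

# Rows whose 90-day counter holds one of the suspicious codes (assumed: 96 or 98).
suspicious = toy[toy[late_cols[0]].isin([96, 98])]

# Count each combination of values across the three columns.
combo_counts = Counter(
    ",".join(str(v) for v in row)
    for row in suspicious[late_cols].values.tolist()
)
print(dict(combo_counts))
```

If the codes always co-occur, as the printed dictionary suggests, they are more plausibly placeholder values than real counts, which is worth settling before binning.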
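Analyzing a single field: DebtRatio and its relationship with MonthlyIncome and SeriousDlqin2yrs
    # Assumes df_DR = df_train['DebtRatio'] and df_DR95 is its upper-quantile cutoff, both defined earlier
    temp = df_train[(df_DR > df_DR95) & (df_train['SeriousDlqin2yrs'] == df_train['MonthlyIncome'])]
    temp.to_csv('20220314temp.csv')  # save the matching rows for manual inspection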
    count    150000.000000
    mean        353.005076
    std        2037.818523
    min           0.000000
    25%           0.175074
    50%           0.366508
    75%           0.868254
    max      329664.000000
    Name: DebtRatio, dtype: float64
    2449.0
               DebtRatio  MonthlyIncome  SeriousDlqin2yrs
    count    7494.000000    7494.000000       7494.000000
    mean     4417.958367    5126.905791          0.055111
    std      7875.314649    1183.339377          0.228212
    min      2450.000000       0.000000          0.000000
    25%      2893.250000    5400.000000          0.000000
    50%      3491.000000    5400.000000          0.000000
    75%      4620.000000    5400.000000          0.000000
    max    329664.000000    5400.000000          1.000000
    331
    5400.0    7115
    0.0        347
    1.0         32
    Name: MonthlyIncome, dtype: int64
    Number of people who owe around 2449 or more times what they own and have same values for MonthlyIncome and SeriousDlqin2yrs: 331
    3489.024999999994
               DebtRatio  MonthlyIncome  SeriousDlqin2yrs
    count    3750.000000     3750.00000       3750.000000
    mean     5917.488000     5133.60320          0.064267
    std     10925.524011     1169.58239          0.245260
    min      3490.000000        0.00000          0.000000
    25%      3957.250000     5400.00000          0.000000
    50%      4619.000000     5400.00000          0.000000
    75%      5789.500000     5400.00000          0.000000
    max    329664.000000     5400.00000          1.000000
    164
    5400.0    3565
    0.0        173
    1.0         12
    Name: MonthlyIncome, dtype: int64
    Number of people who owe around 3490 or more times what they own and have same values for MonthlyIncome and SeriousDlqin2yrs: 164
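Analyzing a single field: MonthlyIncome
Analyzing a single field: NumberOfOpenCreditLinesAndLoans
Analyzing a single field: NumberRealEstateLoansOrLines
Analyzing a single field: NumberOfDependents
# 2.3 Binning the data
Every field except the label is binned.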
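As an illustration of the binning step, the sketch below cuts age into the right-closed intervals that appear later in the WOE output, from (-inf, 25.0] up to (70.0, inf]. Using `pd.cut` with these explicit edges is an assumption; the post does not show which function produced its bins, and the sample ages are made up.

```python
import numpy as np
import pandas as pd

# Bin edges taken from the age intervals printed in the WOE output.
age_edges = [-np.inf, 25, 40, 50, 60, 70, np.inf]

# Illustrative ages only, not rows from the dataset.
ages = pd.Series([21, 25, 26, 40, 55, 65, 109], name="age")

# pd.cut builds right-closed intervals by default, e.g. 25 falls in (-inf, 25].
bin_age = pd.cut(ages, bins=age_edges)

print(pd.DataFrame({"age": ages, "bin_age": bin_age}))
```

The RevolvingUtilizationOfUnsecuredLines intervals, such as (-0.001, 0.0192] and (0.699, 50708.0], look like quantile-based bins, for which `pd.qcut` would be the natural counterpart, though that is also a guess.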
# 2.4 Feature selection with the IV (information value) method
    bin_DebtRatio cal_IV: 0.0595
    bin_MonthlyIncome cal_IV: 0.0562
    bin_RevolvingUtilizationOfUnsecuredLines cal_IV: 1.0596
    bin_NumberOfOpenCreditLinesAndLoans cal_IV: 0.048
    bin_NumberRealEstateLoansOrLines cal_IV: 0.0121
    bin_age cal_IV: 0.2404
    bin_NumberOfDependents cal_IV: 0.0145
    bin_NumberOfTime30-59DaysPastDueNotWorse cal_IV: 0.4924
    bin_NumberOfTime60-89DaysPastDueNotWorse cal_IV: 0.2666
    bin_NumberOfTimes90DaysLate cal_IV: 0.4916
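For reference, IV is the sum over bins of (bad share - good share) × WOE, where WOE is the log ratio of the two shares. The sketch below uses the sign convention WOE = ln(bad share / good share), which is inferred from the printed WOE tables (riskier bins have positive WOE) rather than stated in the post; the function name `woe_iv` and the bin counts are hypothetical.

```python
import math


def woe_iv(bins):
    """Compute per-bin WOE and the total IV.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumes every bin has at least one good and one bad sample;
    an empty class would make the logarithm undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        bad_share = bad / total_bad
        good_share = good / total_good
        woe = math.log(bad_share / good_share)  # > 0 means riskier than average
        woes.append(woe)
        iv += (bad_share - good_share) * woe
    return woes, iv


# Made-up counts for three bins, ordered from low risk to high risk.
woes, iv = woe_iv([(900, 30), (800, 70), (300, 100)])
print([round(w, 4) for w in woes], round(iv, 4))
```

In the output above, the five features kept in section 2.5.1 are exactly those with an IV above 0.2, while every dropped feature has an IV below 0.06.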
# 2.5 Computing WOE values
# 2.5.1 Converting bins to WOE values for the selected features with the WOE function
woe_cols: ['woe_bin_age', 'woe_bin_RevolvingUtilizationOfUnsecuredLines', 'woe_bin_NumberOfTime30-59DaysPastDueNotWorse', 'woe_bin_NumberOfTime60-89DaysPastDueNotWorse', 'woe_bin_NumberOfTimes90DaysLate']
    ------------- age
    <class 'pandas.core.frame.DataFrame'> df……final
        features             bin       woe
    0        age    (40.0, 50.0]  0.228343
    1        age    (25.0, 40.0]  0.469547
    5        age     (70.0, inf] -1.132145
    6        age    (50.0, 60.0] -0.084782
    15       age    (60.0, 70.0] -0.689003
    19       age    (-inf, 25.0]  0.562024


    ------------- RevolvingUtilizationOfUnsecuredLines
    <class 'pandas.core.frame.DataFrame'> df……final
                                    features               bin       woe
    0   RevolvingUtilizationOfUnsecuredLines  (0.699, 50708.0]  1.242254
    2   RevolvingUtilizationOfUnsecuredLines    (0.271, 0.699]  0.053164
    3   RevolvingUtilizationOfUnsecuredLines   (0.0832, 0.271] -0.866502
    11  RevolvingUtilizationOfUnsecuredLines  (-0.001, 0.0192] -1.286617
    14  RevolvingUtilizationOfUnsecuredLines  (0.0192, 0.0832] -1.447382


    ------------- NumberOfTime30-59DaysPastDueNotWorse
    <class 'pandas.core.frame.DataFrame'> df……final
                                       features          bin       woe
    0      NumberOfTime30-59DaysPastDueNotWorse   (1.0, 2.0]  1.616726
    1      NumberOfTime30-59DaysPastDueNotWorse  (-inf, 1.0] -0.257826
    13     NumberOfTime30-59DaysPastDueNotWorse   (2.0, 3.0]  2.027495
    183    NumberOfTime30-59DaysPastDueNotWorse   (3.0, 4.0]  2.336869
    191    NumberOfTime30-59DaysPastDueNotWorse   (4.0, 5.0]  2.436786
    251    NumberOfTime30-59DaysPastDueNotWorse   (6.0, 7.0]  2.710383
    423    NumberOfTime30-59DaysPastDueNotWorse   (9.0, inf]  2.846431
    1052   NumberOfTime30-59DaysPastDueNotWorse   (5.0, 6.0]  2.750685
    6909   NumberOfTime30-59DaysPastDueNotWorse   (7.0, 8.0]  1.882503
    10822  NumberOfTime30-59DaysPastDueNotWorse   (8.0, 9.0]  1.943128


    ------------- NumberOfTime60-89DaysPastDueNotWorse
    <class 'pandas.core.frame.DataFrame'> df……final
                                       features          bin       woe
    0      NumberOfTime60-89DaysPastDueNotWorse  (-inf, 1.0] -0.097990
    186    NumberOfTime60-89DaysPastDueNotWorse   (1.0, 2.0]  2.643431
    423    NumberOfTime60-89DaysPastDueNotWorse   (4.0, 5.0]  3.115848
    1146   NumberOfTime60-89DaysPastDueNotWorse   (2.0, 3.0]  2.901978
    1733   NumberOfTime60-89DaysPastDueNotWorse   (9.0, inf]  2.829466
    2406   NumberOfTime60-89DaysPastDueNotWorse   (3.0, 4.0]  3.121783
    6664   NumberOfTime60-89DaysPastDueNotWorse   (5.0, 6.0]  3.734887
    16642  NumberOfTime60-89DaysPastDueNotWorse   (6.0, 7.0]  2.859419
    23964  NumberOfTime60-89DaysPastDueNotWorse   (7.0, 8.0]  2.636275
    68976  NumberOfTime60-89DaysPastDueNotWorse   (8.0, 9.0]  2.636275


    ------------- NumberOfTimes90DaysLate
    <class 'pandas.core.frame.DataFrame'> df……final
                         features          bin       woe
    0     NumberOfTimes90DaysLate  (-inf, 1.0] -0.176674
    13    NumberOfTimes90DaysLate   (2.0, 3.0]  2.947611
    186   NumberOfTimes90DaysLate   (1.0, 2.0]  2.632416
    1298  NumberOfTimes90DaysLate   (4.0, 5.0]  3.183915
    1713  NumberOfTimes90DaysLate   (3.0, 4.0]  3.344926
    1733  NumberOfTimes90DaysLate   (9.0, inf]  2.821100
    2910  NumberOfTimes90DaysLate   (8.0, 9.0]  3.665894
    3400  NumberOfTimes90DaysLate   (5.0, 6.0]  3.041740
    3929  NumberOfTimes90DaysLate   (6.0, 7.0]  4.124352
    5684  NumberOfTimes90DaysLate   (7.0, 8.0]  3.552566
# 2.5.2 Inspecting the one-to-one mapping between each bin and its WOE value
Variable | Binning | Score |
NumberOfTime30-59DaysPastDueNotWorse | (-inf, 1.0] | 11 |
NumberOfTime30-59DaysPastDueNotWorse | (1.0, 2.0] | -70 |
NumberOfTime30-59DaysPastDueNotWorse | (2.0, 3.0] | -87 |
NumberOfTime30-59DaysPastDueNotWorse | (3.0, 4.0] | -101 |
NumberOfTime30-59DaysPastDueNotWorse | (4.0, 5.0] | -105 |
NumberOfTime30-59DaysPastDueNotWorse | (5.0, 6.0] | -119 |
NumberOfTime30-59DaysPastDueNotWorse | (6.0, 7.0] | -117 |
NumberOfTime30-59DaysPastDueNotWorse | (7.0, 8.0] | -81 |
NumberOfTime30-59DaysPastDueNotWorse | (8.0, 9.0] | -84 |
NumberOfTime30-59DaysPastDueNotWorse | (9.0, inf] | -123 |
NumberOfTime60-89DaysPastDueNotWorse | (-inf, 1.0] | 2 |
NumberOfTime60-89DaysPastDueNotWorse | (1.0, 2.0] | -66 |
NumberOfTime60-89DaysPastDueNotWorse | (2.0, 3.0] | -73 |
NumberOfTime60-89DaysPastDueNotWorse | (3.0, 4.0] | -78 |
NumberOfTime60-89DaysPastDueNotWorse | (4.0, 5.0] | -78 |
NumberOfTime60-89DaysPastDueNotWorse | (5.0, 6.0] | -94 |
NumberOfTime60-89DaysPastDueNotWorse | (6.0, 7.0] | -72 |
NumberOfTime60-89DaysPastDueNotWorse | (7.0, 8.0] | -66 |
NumberOfTime60-89DaysPastDueNotWorse | (8.0, 9.0] | -66 |
NumberOfTime60-89DaysPastDueNotWorse | (9.0, inf] | -71 |
NumberOfTimes90DaysLate | (-inf, 1.0] | 7 |
NumberOfTimes90DaysLate | (1.0, 2.0] | -107 |
NumberOfTimes90DaysLate | (2.0, 3.0] | -120 |
NumberOfTimes90DaysLate | (3.0, 4.0] | -137 |
NumberOfTimes90DaysLate | (4.0, 5.0] | -130 |
NumberOfTimes90DaysLate | (5.0, 6.0] | -124 |
NumberOfTimes90DaysLate | (6.0, 7.0] | -168 |
NumberOfTimes90DaysLate | (7.0, 8.0] | -145 |
NumberOfTimes90DaysLate | (8.0, 9.0] | -150 |
NumberOfTimes90DaysLate | (9.0, inf] | -115 |
RevolvingUtilizationOfUnsecuredLines | (-0.001, 0.0192] | 71 |
RevolvingUtilizationOfUnsecuredLines | (0.0192, 0.0832] | 80 |
RevolvingUtilizationOfUnsecuredLines | (0.0832, 0.271] | 48 |
RevolvingUtilizationOfUnsecuredLines | (0.271, 0.699] | -3 |
RevolvingUtilizationOfUnsecuredLines | (0.699, 50708.0] | -69 |
age | (-inf, 25.0] | -19 |
age | (25.0, 40.0] | -16 |
age | (40.0, 50.0] | -8 |
age | (50.0, 60.0] | 3 |
age | (60.0, 70.0] | 23 |
age | (70.0, inf] | 38 |
# 2.6 Splitting the dataset: holding out 25% as the validation set
    bad_rate: 0.06688333333333334
    X_train.shape: (120000, 5)
# 3. Logistic regression modeling
# 3.1 Building the model
    LoR_Score: 0.9368266666666667
    LoRC_pred_proba [0.0121424  0.15221691 0.02248172 ... 0.0528182  0.0121424  0.0952767 ]
    LoRC_coef_lists_
    [0.46051155 0.76869053 0.59104431 0.36452944 0.56621256]
# 3.2 Model evaluation: computing AUC, plotting the ROC curve, and printing the confusion matrix
    Auc_Score: 0.8226466762033763
    [[34827   200]
     [ 2169   304]]
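The confusion matrix is worth unpacking, because the 0.9368 accuracy reported in section 3.1 can be recomputed from it and says little about how many defaulters are caught. The sketch below assumes the scikit-learn layout, with rows as actual classes, columns as predicted classes, and class 0 first; that layout matches the roughly 6.6% share of actual positives in the matrix.

```python
# Confusion matrix as printed above: rows = actual, columns = predicted (assumed layout).
cm = [[34827, 200],
      [2169, 304]]

(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tn + tp) / total   # share of all validation samples classified correctly
recall = tp / (tp + fn)        # share of actual defaulters that the model flags
precision = tp / (tp + fp)     # share of flagged samples that really default

print(total)                   # 37500
print(round(accuracy, 4))      # 0.9368
print(round(recall, 4))        # 0.1229
print(round(precision, 4))     # 0.6032
```

At the default 0.5 probability threshold, only about 12% of the actual defaulters are flagged, so AUC and a score-based cutoff are more informative than accuracy on data this imbalanced.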
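# 4. Model inference
# 4.1 Designing the scorecard rule table
# 4.1.1 Deriving the two scale parameters A and B: formulas for the scorecard scale parameters, derived from two assumptions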
650 72.13
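The two printed values are A = 650 and B = 72.13. In the usual scorecard scaling, Score = A - B·ln(odds) with odds = p(bad)/p(good), and the two assumptions are (1) a base score P0 at base odds θ0 and (2) a fixed number of points, PDO, by which the score drops when the odds double. Subtracting the two resulting equations gives B = PDO / ln 2 and A = P0 + B·ln θ0. The sketch below uses P0 = 650, θ0 = 1, and PDO = 50; these inputs are an assumption chosen because they reproduce the printed numbers, not values confirmed by the post, and the function name is hypothetical.

```python
import math


def scorecard_scale(base_score: float, base_odds: float, pdo: float):
    """Solve for A and B in Score = A - B * ln(odds).

    Assumption 1: Score = base_score when odds = base_odds.
    Assumption 2: doubling the odds lowers the score by pdo points.
    """
    b = pdo / math.log(2)
    a = base_score + b * math.log(base_odds)
    return a, b


# Hypothetical inputs that happen to reproduce the printed scale.
A, B = scorecard_scale(base_score=650, base_odds=1.0, pdo=50)
print(round(A), round(B, 2))  # 650 72.13
```

Other input combinations can yield the same A, so only B pins down one input unambiguously: a PDO of 50.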
# 4.1.2 Designing the scorecard rule table: building score_card_rule from scale B, the WOE encoding of each bin, and the model coefficients
index | Variable | Binning | Score |
0 | age | (40.0, 50.0] | -8 |
1 | age | (25.0, 40.0] | -16 |
2 | age | (70.0, inf] | 38 |
3 | age | (50.0, 60.0] | 3 |
4 | age | (60.0, 70.0] | 23 |
5 | age | (-inf, 25.0] | -19 |
6 | RevolvingUtilizationOfUnsecuredLines | (0.699, 50708.0] | -69 |
7 | RevolvingUtilizationOfUnsecuredLines | (0.271, 0.699] | -3 |
8 | RevolvingUtilizationOfUnsecuredLines | (0.0832, 0.271] | 48 |
9 | RevolvingUtilizationOfUnsecuredLines | (-0.001, 0.0192] | 71 |
10 | RevolvingUtilizationOfUnsecuredLines | (0.0192, 0.0832] | 80 |
11 | NumberOfTime30-59DaysPastDueNotWorse | (1.0, 2.0] | -69 |
12 | NumberOfTime30-59DaysPastDueNotWorse | (-inf, 1.0] | 11 |
13 | NumberOfTime30-59DaysPastDueNotWorse | (2.0, 3.0] | -86 |
14 | NumberOfTime30-59DaysPastDueNotWorse | (3.0, 4.0] | -100 |
15 | NumberOfTime30-59DaysPastDueNotWorse | (4.0, 5.0] | -104 |
16 | NumberOfTime30-59DaysPastDueNotWorse | (6.0, 7.0] | -116 |
17 | NumberOfTime30-59DaysPastDueNotWorse | (9.0, inf] | -121 |
18 | NumberOfTime30-59DaysPastDueNotWorse | (5.0, 6.0] | -117 |
19 | NumberOfTime30-59DaysPastDueNotWorse | (7.0, 8.0] | -80 |
20 | NumberOfTime30-59DaysPastDueNotWorse | (8.0, 9.0] | -83 |
21 | NumberOfTime60-89DaysPastDueNotWorse | (-inf, 1.0] | 3 |
22 | NumberOfTime60-89DaysPastDueNotWorse | (1.0, 2.0] | -70 |
23 | NumberOfTime60-89DaysPastDueNotWorse | (4.0, 5.0] | -82 |
24 | NumberOfTime60-89DaysPastDueNotWorse | (2.0, 3.0] | -76 |
25 | NumberOfTime60-89DaysPastDueNotWorse | (9.0, inf] | -74 |
26 | NumberOfTime60-89DaysPastDueNotWorse | (3.0, 4.0] | -82 |
27 | NumberOfTime60-89DaysPastDueNotWorse | (5.0, 6.0] | -98 |
28 | NumberOfTime60-89DaysPastDueNotWorse | (6.0, 7.0] | -75 |
29 | NumberOfTime60-89DaysPastDueNotWorse | (7.0, 8.0] | -69 |
30 | NumberOfTime60-89DaysPastDueNotWorse | (8.0, 9.0] | -69 |
31 | NumberOfTimes90DaysLate | (-inf, 1.0] | 7 |
32 | NumberOfTimes90DaysLate | (2.0, 3.0] | -120 |
33 | NumberOfTimes90DaysLate | (1.0, 2.0] | -108 |
34 | NumberOfTimes90DaysLate | (4.0, 5.0] | -130 |
35 | NumberOfTimes90DaysLate | (3.0, 4.0] | -137 |
36 | NumberOfTimes90DaysLate | (9.0, inf] | -115 |
37 | NumberOfTimes90DaysLate | (8.0, 9.0] | -150 |
38 | NumberOfTimes90DaysLate | (5.0, 6.0] | -124 |
39 | NumberOfTimes90DaysLate | (6.0, 7.0] | -168 |
40 | NumberOfTimes90DaysLate | (7.0, 8.0] | -145 |
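The points in this table appear consistent with points = round(-B × coefficient × WOE) for each bin. The sketch below checks that for the age rows, assuming B = 50 / ln 2 (about 72.13) and that the first entry of LoRC_coef_lists_ is the coefficient of woe_bin_age, following the order of woe_cols; both are inferences from the printed output rather than statements from the post.

```python
import math

B = 50 / math.log(2)      # about 72.13; assumes PDO = 50
coef_age = 0.46051155     # first model coefficient, assumed to belong to woe_bin_age

# WOE values for age as printed in section 2.5.1.
age_woe = {
    "(-inf, 25.0]": 0.562024,
    "(25.0, 40.0]": 0.469547,
    "(40.0, 50.0]": 0.228343,
    "(50.0, 60.0]": -0.084782,
    "(60.0, 70.0]": -0.689003,
    "(70.0, inf]": -1.132145,
}

# A positive WOE (riskier bin) lowers the score, hence the minus sign.
age_points = {bin_: round(-B * coef_age * woe) for bin_, woe in age_woe.items()}

for bin_, points in age_points.items():
    print(bin_, points)
```

The resulting points match the six age rows of the rule table, from -19 for the youngest bin to 38 for the oldest.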
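# 4.2 Computing each sample's scorecard score with scale A
# 4.2.1 Randomly selecting 12 samples (6 good and 6 bad), computing each sample's total score, and comparing it with the label to validate the model
# 4.2.2 Computing each sample's scorecard score with scale A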
index | age | RevolvingUtilizationOfUnsecuredLines | NumberOfTime30-59DaysPastDueNotWorse | NumberOfTime60-89DaysPastDueNotWorse | NumberOfTimes90DaysLate | score |
44377 | 55 | 0.081686933 | 0 | 0 | 0 | 754 |
25143 | 47 | 0.9999999 | 0 | 0 | 1 | 594 |
67429 | 54 | 0.015170898 | 0 | 0 | 0 | 745 |
66689 | 26 | 0.252252252 | 0 | 0 | 0 | 703 |
42656 | 40 | 0.916334661 | 1 | 0 | 0 | 586 |
81903 | 65 | 0.091477937 | 0 | 0 | 0 | 742 |
index | age | RevolvingUtilizationOfUnsecuredLines | NumberOfTime30-59DaysPastDueNotWorse | NumberOfTime60-89DaysPastDueNotWorse | NumberOfTimes90DaysLate | score |
111052 | 30 | 0.9999999 | 0 | 4 | 2 | 386 |
30582 | 30 | 0.9999999 | 0 | 0 | 0 | 586 |
23677 | 43 | 0.68756082 | 0 | 0 | 0 | 660 |
87669 | 27 | 0.9999999 | 0 | 1 | 1 | 586 |
46920 | 50 | 0.442370466 | 0 | 0 | 0 | 660 |
78952 | 48 | 0.40781316 | 0 | 0 | 0 | 660 |
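The totals in these two tables can be reproduced by adding the points of each feature's bin from the rule table in section 4.1.2 to the scale A = 650. The sketch below is one way to do the lookup, not the post's own code: the `RULES` structure, the `total_score` helper, and the use of `bisect_left` on the upper bin edges (which matches right-closed intervals) are all illustrative choices.

```python
from bisect import bisect_left

A = 650  # scale parameter printed in section 4.1.1

# For each feature: (upper edges of all bins except the last, points per bin in order).
RULES = {
    "age": ([25, 40, 50, 60, 70], [-19, -16, -8, 3, 23, 38]),
    "RevolvingUtilizationOfUnsecuredLines": (
        [0.0192, 0.0832, 0.271, 0.699], [71, 80, 48, -3, -69]),
    "NumberOfTime30-59DaysPastDueNotWorse": (
        list(range(1, 10)), [11, -69, -86, -100, -104, -117, -116, -80, -83, -121]),
    "NumberOfTime60-89DaysPastDueNotWorse": (
        list(range(1, 10)), [3, -70, -76, -82, -82, -98, -75, -69, -69, -74]),
    "NumberOfTimes90DaysLate": (
        list(range(1, 10)), [7, -108, -120, -137, -130, -124, -168, -145, -150, -115]),
}


def total_score(sample: dict) -> int:
    """Add the points of the bin each feature value falls into to the scale A."""
    score = A
    for feature, (edges, points) in RULES.items():
        # bisect_left returns the first bin whose upper edge is >= the value.
        score += points[bisect_left(edges, sample[feature])]
    return score


names = list(RULES)
sample_44377 = dict(zip(names, [55, 0.081686933, 0, 0, 0]))
sample_25143 = dict(zip(names, [47, 0.9999999, 0, 0, 1]))
sample_111052 = dict(zip(names, [30, 0.9999999, 0, 4, 2]))

print(total_score(sample_44377))   # 754
print(total_score(sample_25143))   # 594
print(total_score(sample_111052))  # 386
```

Sample 44377, for instance, collects 3 + 80 + 11 + 3 + 7 = 104 points on top of 650, which gives the 754 shown in the first table.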
# 4.3 Comparing test-sample scores with their labels to design a review strategy
    44377 754.0 --------- Accept directly!
    age                                     47.0
    RevolvingUtilizationOfUnsecuredLines     1.0
    NumberOfTime30-59DaysPastDueNotWorse     0.0
    NumberOfTime60-89DaysPastDueNotWorse     0.0
    NumberOfTimes90DaysLate                  1.0
    score                                  594.0
    Name: 25143, dtype: float64
    25143 594.0 --------- Manual review!
    age                                     54.000000
    RevolvingUtilizationOfUnsecuredLines     0.015171
    NumberOfTime30-59DaysPastDueNotWorse     0.000000
    NumberOfTime60-89DaysPastDueNotWorse     0.000000
    NumberOfTimes90DaysLate                  0.000000
    score                                  745.000000
    Name: 67429, dtype: float64
    67429 745.0 --------- Accept directly!
    age                                     26.000000
    RevolvingUtilizationOfUnsecuredLines     0.252252
    NumberOfTime30-59DaysPastDueNotWorse     0.000000
    NumberOfTime60-89DaysPastDueNotWorse     0.000000
    NumberOfTimes90DaysLate                  0.000000
    score                                  703.000000
    Name: 66689, dtype: float64
    66689 703.0 --------- Accept directly!
    age                                     40.000000
    RevolvingUtilizationOfUnsecuredLines     0.916335
    NumberOfTime30-59DaysPastDueNotWorse     1.000000
    NumberOfTime60-89DaysPastDueNotWorse     0.000000
    NumberOfTimes90DaysLate                  0.000000
    score                                  586.000000
    Name: 42656, dtype: float64
    42656 586.0 --------- Manual review!
    age                                     65.000000
    RevolvingUtilizationOfUnsecuredLines     0.091478
    NumberOfTime30-59DaysPastDueNotWorse     0.000000
    NumberOfTime60-89DaysPastDueNotWorse     0.000000
    NumberOfTimes90DaysLate                  0.000000
    score                                  742.000000
    Name: 81903, dtype: float64
    81903 742.0 --------- Accept directly!
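The output shows two outcomes, direct acceptance and manual review, decided by score. A cutoff rule of that kind might look like the sketch below. The cutoff of 650 is hypothetical: the post does not state the threshold, and any value above 594 and up to 703 would reproduce the six decisions printed above. The function name `review_decision` is also an assumption.

```python
def review_decision(score: float, accept_at: float = 650) -> str:
    """Map a scorecard score to a review outcome using a single cutoff.

    accept_at is a hypothetical threshold, not the one used in the post.
    """
    return "Accept directly!" if score >= accept_at else "Manual review!"


# Scores of the six samples printed above, keyed by sample index.
scores = {44377: 754, 25143: 594, 67429: 745, 66689: 703, 42656: 586, 81903: 742}

decisions = {idx: review_decision(score) for idx, score in scores.items()}
for idx, score in scores.items():
    print(idx, float(score), "---------", decisions[idx])
```

In practice the cutoff would be chosen from the score distributions of good and bad samples on the validation set, trading approval rate against expected default rate.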