DataScience: Building a financial credit scorecard model for risk control on the GiveMeSomeCredit dataset with feature engineering and logistic regression (LoR)



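Contents

Building a financial credit scorecard model for risk control on the GiveMeSomeCredit dataset with feature engineering and logistic regression (LoR)

1. Load the dataset

View summary information for the dataset

2. Feature engineering: data analysis and processing

# 2.1. Missing-value analysis and handling

# 2.2. Field-by-field analysis

Label field: class counts for SeriousDlqin2yrs

The age field

Three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse

Single field: DebtRatio and its relationship to MonthlyIncome and SeriousDlqin2yrs

Single field: MonthlyIncome

Single field: NumberOfOpenCreditLinesAndLoans

Single field: NumberRealEstateLoansOrLines

Single field: NumberOfDependents

# 2.3. Binning

# 2.4. Feature selection with information value (IV)

# 2.5. Computing WOE values

# 2.5.1. Converting the bins of the selected features to WOE values with a WOE function

# 2.5.2. The one-to-one mapping between bins and WOE values

# 2.6. Splitting the dataset: holding out 25% as the validation set

# 3. Logistic regression modeling

# 3.1. Building the model

# 3.2. Model evaluation: AUC, ROC curve, and confusion matrix

# 4. Model inference

# 4.1. Designing the scorecard rule table

# 4.1.1. Finding the two scale parameters A and B: deriving their formulas from two assumptions

# 4.1.2. Designing the scorecard rule table: score_card_rule from scale B, the WOE encoding of each bin, and the model coefficients

# 4.2. Computing sample scorecard scores with scale A

# 4.2.1. Randomly selecting 12 samples (6 good, 6 bad), computing each total score, and comparing with the label to check the model

# 4.2.2. Computing sample scorecard scores with scale A

# 4.3. Comparing test-sample scores with their labels to design a review strategy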


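Building a financial credit scorecard model for risk control on the GiveMeSomeCredit dataset with feature engineering and logistic regression (LoR)

1. Load the dataset

View summary information for the dataset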

   Unnamed: 0  ...  NumberOfDependents
0           1  ...                 2.0
1           2  ...                 1.0
2           3  ...                 0.0
3           4  ...                 0.0
4           5  ...                 0.0

[5 rows x 12 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 12 columns):
 #   Column                                Non-Null Count   Dtype
---  ------                                --------------   -----
 0   Unnamed: 0                            150000 non-null  int64
 1   SeriousDlqin2yrs                      150000 non-null  int64
 2   RevolvingUtilizationOfUnsecuredLines  150000 non-null  float64
 3   age                                   150000 non-null  int64
 4   NumberOfTime30-59DaysPastDueNotWorse  150000 non-null  int64
 5   DebtRatio                             150000 non-null  float64
 6   MonthlyIncome                         120269 non-null  float64
 7   NumberOfOpenCreditLinesAndLoans       150000 non-null  int64
 8   NumberOfTimes90DaysLate               150000 non-null  int64
 9   NumberRealEstateLoansOrLines          150000 non-null  int64
 10  NumberOfTime60-89DaysPastDueNotWorse  150000 non-null  int64
 11  NumberOfDependents                    146076 non-null  float64
dtypes: float64(4), int64(8)
memory usage: 13.7 MB
None
          Unnamed: 0  ...  NumberOfDependents
count  150000.000000  ...       146076.000000
mean    75000.500000  ...            0.757222
std     43301.414527  ...            1.115086
min         1.000000  ...            0.000000
25%     37500.750000  ...            0.000000
50%     75000.500000  ...            0.000000
75%    112500.250000  ...            1.000000
max    150000.000000  ...           20.000000
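
The listing above has the shape of pandas `head()`, `info()`, and `describe()` output. The sketch below shows those three calls under stated assumptions: the article does not name its input file, so a tiny in-memory CSV (made-up rows, reduced column set) stands in for it, and `df_train` is borrowed from a later snippet in the article.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file, which the article does not name.
# With a local copy of the data this would be pd.read_csv("<path to the CSV>").
csv_text = """,SeriousDlqin2yrs,age,MonthlyIncome,NumberOfDependents
1,1,45,9120.0,2.0
2,0,40,2600.0,1.0
3,0,38,,0.0
"""
df_train = pd.read_csv(io.StringIO(csv_text))

print(df_train.head())      # first rows; the unnamed index column becomes "Unnamed: 0"
df_train.info()             # dtypes and non-null counts per column
print(df_train.describe())  # count/mean/std/min/quartiles/max per numeric column
```

The trailing `None` in the article's listing is what appears when the return value of `info()` is itself passed to `print`.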

2. Feature engineering: data analysis and processing

# 2.1. Missing-value analysis and handling

[8 rows x 12 columns]
                                  Column  Number_of_Null_Values  Proportion
0                             Unnamed: 0                      0    0.000000
1                       SeriousDlqin2yrs                      0    0.000000
2   RevolvingUtilizationOfUnsecuredLines                      0    0.000000
3                                    age                      0    0.000000
4   NumberOfTime30-59DaysPastDueNotWorse                      0    0.000000
5                              DebtRatio                      0    0.000000
6                          MonthlyIncome                  29731    0.198207
7        NumberOfOpenCreditLinesAndLoans                      0    0.000000
8                NumberOfTimes90DaysLate                      0    0.000000
9           NumberRealEstateLoansOrLines                      0    0.000000
10  NumberOfTime60-89DaysPastDueNotWorse                      0    0.000000
11                    NumberOfDependents                   3924    0.026160
Unnamed: 0                              0
SeriousDlqin2yrs                        0
RevolvingUtilizationOfUnsecuredLines    0
age                                     0
NumberOfTime30-59DaysPastDueNotWorse    0
DebtRatio                               0
MonthlyIncome                           0
NumberOfOpenCreditLinesAndLoans         0
NumberOfTimes90DaysLate                 0
NumberRealEstateLoansOrLines            0
NumberOfTime60-89DaysPastDueNotWorse    0
NumberOfDependents                      0
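
The first table reports null counts and proportions per column; the second shows that no nulls remain after handling. The article does not show how the gaps were filled. The sketch below assumes median imputation, which is only a guess (the later DebtRatio output shows many rows with MonthlyIncome equal to 5400, which would be consistent with a constant fill). The toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_train (values are made up for illustration).
df = pd.DataFrame({
    "MonthlyIncome": [9120.0, np.nan, 3042.0, 5400.0, np.nan],
    "NumberOfDependents": [2.0, 1.0, np.nan, 0.0, 0.0],
})

# Per-column null counts and proportions, like the first table above.
null_table = pd.DataFrame({
    "Column": df.columns,
    "Number_of_Null_Values": df.isnull().sum().values,
    "Proportion": df.isnull().mean().values,
})
print(null_table)

# Assumed imputation: fill each column with its own median.
df_filled = df.fillna(df.median())
print(df_filled.isnull().sum())  # every count is now zero
```

Other choices (a model-based fill, or a dedicated "missing" bin during binning) are equally possible and would change the downstream WOE values.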

# 2.2. Field-by-field analysis

Label field: class counts for SeriousDlqin2yrs

Default Rate: 0.06684
count    150000.000000
mean          6.048438
std         249.755371
min           0.000000
25%           0.029867
50%           0.154181
75%           0.559046
max       50708.000000
Name: RevolvingUtilizationOfUnsecuredLines, dtype: float64

[[0, 0.06684], [1, 0.37177950868783705], [2, 0.14555256064690028], [3, 0.09931506849315068], [4, 0.08679245283018867], [5, 0.07874015748031496], [6, 0.07692307692307693], [7, 0.0778688524590164], [8, 0.07407407407407407], [9, 0.07053941908713693], [10, 0.07053941908713693], [11, 0.07053941908713693], [12, 0.06666666666666667], [13, 0.058823529411764705], [14, 0.058823529411764705], [15, 0.05531914893617021], [16, 0.05531914893617021], [17, 0.05531914893617021], [18, 0.05531914893617021], [19, 0.05555555555555555]]
Proportion of Defaulters with Total Amount of Money Owed Not Exceeding Total Credit Limit: 0.05991996127598361
Proportion of Defaulters with Total Amount of Money Owed Not Exceeding or Equal to 13 times of Total Credit Limit:
0.06685273968029273

The age field

count    150000.000000
mean         52.295207
std          14.771866
min           0.000000
25%          41.000000
50%          52.000000
75%          63.000000
max         109.000000
Name: age, dtype: float64

Three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse

0     141662
1       5243
2       1555
3        667
4        291
5        131
6         80
7         38
8         21
9         19
10         8
11         5
12         2
13         4
14         2
15         2
17         1
96         5
98       264
Name: NumberOfTimes90DaysLate, dtype: int64
0     142396
1       5731
2       1118
3        318
4        105
5         34
6         16
7          9
8          2
9          1
11         1
96         5
98       264
Name: NumberOfTime60-89DaysPastDueNotWorse, dtype: int64
0     126018
1      16033
2       4598
3       1754
4        747
5        342
6        140
7         54
8         25
9         12
10         4
11         1
12         2
13         1
96         5
98       264
Name: NumberOfTime30-59DaysPastDueNotWorse, dtype: int64
       NumberOfTimes90DaysLate  ...  NumberOfTime30-59DaysPastDueNotWorse
count               269.000000  ...                            269.000000
mean                 97.962825  ...                             97.962825
std                   0.270628  ...                              0.270628
min                  96.000000  ...                             96.000000
25%                  98.000000  ...                             98.000000
50%                  98.000000  ...                             98.000000
75%                  98.000000  ...                             98.000000
max                  98.000000  ...                             98.000000

[8 rows x 3 columns]
{'98,98,98': 263, '96,96,96': 4}
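
The value counts show the same 96 and 98 entries in all three past-due columns, and the final dictionary tallies how often the three columns carry the same code together. One way to produce such a tally is sketched below; the cutoff of 90 and the toy rows are assumptions for illustration, not the article's code.

```python
import pandas as pd

cols = [
    "NumberOfTimes90DaysLate",
    "NumberOfTime60-89DaysPastDueNotWorse",
    "NumberOfTime30-59DaysPastDueNotWorse",
]

# Toy data (made up): two ordinary rows plus three rows carrying 96/98 codes.
df = pd.DataFrame(
    [[0, 0, 1], [2, 0, 0], [98, 98, 98], [98, 98, 98], [96, 96, 96]],
    columns=cols,
)

# Keep rows where any of the three counts exceeds a hypothetical cutoff of 90.
suspicious = df[(df[cols] > 90).any(axis=1)]

# Count each combination of the three values as an "a,b,c" string key.
combo_counts = (
    suspicious[cols].astype(str).agg(",".join, axis=1).value_counts().to_dict()
)
print(combo_counts)
```

Because the codes always appear together, they behave like sentinel values rather than genuine counts, which is why the later binning places them in the open-ended `(9.0, inf]` bin.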

Single field: DebtRatio and its relationship to MonthlyIncome and SeriousDlqin2yrs

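df_DR = df_train['DebtRatio']    # assumed definition, inferred from the variable name
df_DR95 = df_DR.quantile(0.95)   # assumed definition; matches the 2449.0 printed in the output
temp = df_train[(df_DR > df_DR95) & (df_train['SeriousDlqin2yrs'] == df_train['MonthlyIncome'])]  # label equals income (both 0 or both 1)
temp.to_csv('20220314temp.csv')  # save the suspicious rows for inspection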

count    150000.000000
mean        353.005076
std        2037.818523
min           0.000000
25%           0.175074
50%           0.366508
75%           0.868254
max      329664.000000
Name: DebtRatio, dtype: float64
2449.0
            DebtRatio  MonthlyIncome  SeriousDlqin2yrs
count    7494.000000    7494.000000       7494.000000
mean     4417.958367    5126.905791          0.055111
std      7875.314649    1183.339377          0.228212
min      2450.000000       0.000000          0.000000
25%      2893.250000    5400.000000          0.000000
50%      3491.000000    5400.000000          0.000000
75%      4620.000000    5400.000000          0.000000
max    329664.000000    5400.000000          1.000000
331
5400.0    7115
0.0        347
1.0         32
Name: MonthlyIncome, dtype: int64
Number of people who owe around 2449 or more times what they own and have same values for MonthlyIncome and SeriousDlqin2yrs: 331
3489.024999999994
            DebtRatio  MonthlyIncome  SeriousDlqin2yrs
count    3750.000000     3750.00000       3750.000000
mean     5917.488000     5133.60320          0.064267
std     10925.524011     1169.58239          0.245260
min      3490.000000        0.00000          0.000000
25%      3957.250000     5400.00000          0.000000
50%      4619.000000     5400.00000          0.000000
75%      5789.500000     5400.00000          0.000000
max    329664.000000     5400.00000          1.000000
164
5400.0    3565
0.0        173
1.0         12
Name: MonthlyIncome, dtype: int64
Number of people who owe around 3490 or more times what they own and have same values for MonthlyIncome and SeriousDlqin2yrs: 164

Single field: MonthlyIncome

Single field: NumberOfOpenCreditLinesAndLoans

Single field: NumberRealEstateLoansOrLines

Single field: NumberOfDependents

# 2.3. Binning

Every field except the label is binned.
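
The binning code itself is not shown, but the bin labels printed later in the article (for example `(40.0, 50.0]` for age and `(9.0, inf]` for the past-due counts) suggest fixed-edge cuts for some fields and quantile cuts for others. A minimal sketch under that reading; the `bin_` column prefix follows the article's output, the choice of five quantiles is an assumption, and the toy values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22, 35, 47, 55, 65, 80],
    "NumberOfTimes90DaysLate": [0, 1, 2, 5, 9, 98],
    "RevolvingUtilizationOfUnsecuredLines": [0.0, 0.05, 0.2, 0.5, 0.9, 1.5],
})

# Fixed edges read off the bin labels printed later in the article.
age_edges = [-np.inf, 25, 40, 50, 60, 70, np.inf]
count_edges = [-np.inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, np.inf]

# pd.cut uses right-closed intervals by default, i.e. (left, right].
df["bin_age"] = pd.cut(df["age"], bins=age_edges)
df["bin_NumberOfTimes90DaysLate"] = pd.cut(
    df["NumberOfTimes90DaysLate"], bins=count_edges
)

# Equal-frequency bins; q=5 is an assumption matching the five bins shown later.
df["bin_RevolvingUtilizationOfUnsecuredLines"] = pd.qcut(
    df["RevolvingUtilizationOfUnsecuredLines"], q=5, duplicates="drop"
)
print(df.filter(like="bin_"))
```

Open-ended outer edges (`-inf`, `inf`) keep the sentinel values 96 and 98 and any unseen extremes inside a valid bin.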

# 2.4. Feature selection with information value (IV)

bin_DebtRatio cal_IV:  0.0595
bin_MonthlyIncome cal_IV:  0.0562
bin_RevolvingUtilizationOfUnsecuredLines cal_IV:  1.0596
bin_NumberOfOpenCreditLinesAndLoans cal_IV:  0.048
bin_NumberRealEstateLoansOrLines cal_IV:  0.0121
bin_age cal_IV:  0.2404
bin_NumberOfDependents cal_IV:  0.0145
bin_NumberOfTime30-59DaysPastDueNotWorse cal_IV:  0.4924
bin_NumberOfTime60-89DaysPastDueNotWorse cal_IV:  0.2666
bin_NumberOfTimes90DaysLate cal_IV:  0.4916
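
For reference, information value sums, over the bins of a feature, the difference between the bad and good shares multiplied by the log of their ratio. The sketch below is not the article's `cal_IV`; it is a plain-Python version under the assumption that WOE is defined as ln(bad share / good share), and the 0.1 cutoff is hypothetical (the article does not state its threshold, though 0.1 reproduces the five features kept in section 2.5.1).

```python
import math

def information_value(bins):
    """bins: list of (good_count, bad_count) tuples, one per bin."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for good, bad in bins:
        good_share = good / total_good   # share of all good customers in this bin
        bad_share = bad / total_bad      # share of all bad customers in this bin
        woe = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe
    return iv

# Made-up counts: the second bin holds a disproportionate share of the bads.
print(information_value([(900, 30), (800, 70)]))

# IV values as printed in the article, filtered with a hypothetical 0.1 cutoff.
iv_values = {
    "DebtRatio": 0.0595, "MonthlyIncome": 0.0562,
    "RevolvingUtilizationOfUnsecuredLines": 1.0596,
    "NumberOfOpenCreditLinesAndLoans": 0.048,
    "NumberRealEstateLoansOrLines": 0.0121, "age": 0.2404,
    "NumberOfDependents": 0.0145,
    "NumberOfTime30-59DaysPastDueNotWorse": 0.4924,
    "NumberOfTime60-89DaysPastDueNotWorse": 0.2666,
    "NumberOfTimes90DaysLate": 0.4916,
}
selected = [name for name, iv in iv_values.items() if iv >= 0.1]
print(selected)
```

A bin with zero goods or zero bads makes the logarithm undefined, so real implementations usually merge such bins or add a small smoothing constant.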

# 2.5. Computing WOE values

# 2.5.1. Converting the bins of the selected features to WOE values with a WOE function

woe_cols: ['woe_bin_age', 'woe_bin_RevolvingUtilizationOfUnsecuredLines', 'woe_bin_NumberOfTime30-59DaysPastDueNotWorse', 'woe_bin_NumberOfTime60-89DaysPastDueNotWorse', 'woe_bin_NumberOfTimes90DaysLate']
------------- age
<class 'pandas.core.frame.DataFrame'> df……final
    features           bin       woe
0       age  (40.0, 50.0]  0.228343
1       age  (25.0, 40.0]  0.469547
5       age   (70.0, inf] -1.132145
6       age  (50.0, 60.0] -0.084782
15      age  (60.0, 70.0] -0.689003
19      age  (-inf, 25.0]  0.562024


------------- RevolvingUtilizationOfUnsecuredLines
<class 'pandas.core.frame.DataFrame'> df……final
                                 features               bin       woe
0   RevolvingUtilizationOfUnsecuredLines  (0.699, 50708.0]  1.242254
2   RevolvingUtilizationOfUnsecuredLines    (0.271, 0.699]  0.053164
3   RevolvingUtilizationOfUnsecuredLines   (0.0832, 0.271] -0.866502
11  RevolvingUtilizationOfUnsecuredLines  (-0.001, 0.0192] -1.286617
14  RevolvingUtilizationOfUnsecuredLines  (0.0192, 0.0832] -1.447382


------------- NumberOfTime30-59DaysPastDueNotWorse
<class 'pandas.core.frame.DataFrame'> df……final
                                    features          bin       woe
0      NumberOfTime30-59DaysPastDueNotWorse   (1.0, 2.0]  1.616726
1      NumberOfTime30-59DaysPastDueNotWorse  (-inf, 1.0] -0.257826
13     NumberOfTime30-59DaysPastDueNotWorse   (2.0, 3.0]  2.027495
183    NumberOfTime30-59DaysPastDueNotWorse   (3.0, 4.0]  2.336869
191    NumberOfTime30-59DaysPastDueNotWorse   (4.0, 5.0]  2.436786
251    NumberOfTime30-59DaysPastDueNotWorse   (6.0, 7.0]  2.710383
423    NumberOfTime30-59DaysPastDueNotWorse   (9.0, inf]  2.846431
1052   NumberOfTime30-59DaysPastDueNotWorse   (5.0, 6.0]  2.750685
6909   NumberOfTime30-59DaysPastDueNotWorse   (7.0, 8.0]  1.882503
10822  NumberOfTime30-59DaysPastDueNotWorse   (8.0, 9.0]  1.943128


------------- NumberOfTime60-89DaysPastDueNotWorse
<class 'pandas.core.frame.DataFrame'> df……final
                                    features          bin       woe
0      NumberOfTime60-89DaysPastDueNotWorse  (-inf, 1.0] -0.097990
186    NumberOfTime60-89DaysPastDueNotWorse   (1.0, 2.0]  2.643431
423    NumberOfTime60-89DaysPastDueNotWorse   (4.0, 5.0]  3.115848
1146   NumberOfTime60-89DaysPastDueNotWorse   (2.0, 3.0]  2.901978
1733   NumberOfTime60-89DaysPastDueNotWorse   (9.0, inf]  2.829466
2406   NumberOfTime60-89DaysPastDueNotWorse   (3.0, 4.0]  3.121783
6664   NumberOfTime60-89DaysPastDueNotWorse   (5.0, 6.0]  3.734887
16642  NumberOfTime60-89DaysPastDueNotWorse   (6.0, 7.0]  2.859419
23964  NumberOfTime60-89DaysPastDueNotWorse   (7.0, 8.0]  2.636275
68976  NumberOfTime60-89DaysPastDueNotWorse   (8.0, 9.0]  2.636275


------------- NumberOfTimes90DaysLate
<class 'pandas.core.frame.DataFrame'> df……final
                      features          bin       woe
0     NumberOfTimes90DaysLate  (-inf, 1.0] -0.176674
13    NumberOfTimes90DaysLate   (2.0, 3.0]  2.947611
186   NumberOfTimes90DaysLate   (1.0, 2.0]  2.632416
1298  NumberOfTimes90DaysLate   (4.0, 5.0]  3.183915
1713  NumberOfTimes90DaysLate   (3.0, 4.0]  3.344926
1733  NumberOfTimes90DaysLate   (9.0, inf]  2.821100
2910  NumberOfTimes90DaysLate   (8.0, 9.0]  3.665894
3400  NumberOfTimes90DaysLate   (5.0, 6.0]  3.041740
3929  NumberOfTimes90DaysLate   (6.0, 7.0]  4.124352
5684  NumberOfTimes90DaysLate   (7.0, 8.0]  3.552566

# 2.5.2. The one-to-one mapping between bins and WOE values

Variable Binning Score
NumberOfTime30-59DaysPastDueNotWorse (-inf, 1.0] 11
NumberOfTime30-59DaysPastDueNotWorse (1.0, 2.0] -70
NumberOfTime30-59DaysPastDueNotWorse (2.0, 3.0] -87
NumberOfTime30-59DaysPastDueNotWorse (3.0, 4.0] -101
NumberOfTime30-59DaysPastDueNotWorse (4.0, 5.0] -105
NumberOfTime30-59DaysPastDueNotWorse (5.0, 6.0] -119
NumberOfTime30-59DaysPastDueNotWorse (6.0, 7.0] -117
NumberOfTime30-59DaysPastDueNotWorse (7.0, 8.0] -81
NumberOfTime30-59DaysPastDueNotWorse (8.0, 9.0] -84
NumberOfTime30-59DaysPastDueNotWorse (9.0, inf] -123
NumberOfTime60-89DaysPastDueNotWorse (-inf, 1.0] 2
NumberOfTime60-89DaysPastDueNotWorse (1.0, 2.0] -66
NumberOfTime60-89DaysPastDueNotWorse (2.0, 3.0] -73
NumberOfTime60-89DaysPastDueNotWorse (3.0, 4.0] -78
NumberOfTime60-89DaysPastDueNotWorse (4.0, 5.0] -78
NumberOfTime60-89DaysPastDueNotWorse (5.0, 6.0] -94
NumberOfTime60-89DaysPastDueNotWorse (6.0, 7.0] -72
NumberOfTime60-89DaysPastDueNotWorse (7.0, 8.0] -66
NumberOfTime60-89DaysPastDueNotWorse (8.0, 9.0] -66
NumberOfTime60-89DaysPastDueNotWorse (9.0, inf] -71
NumberOfTimes90DaysLate (-inf, 1.0] 7
NumberOfTimes90DaysLate (1.0, 2.0] -107
NumberOfTimes90DaysLate (2.0, 3.0] -120
NumberOfTimes90DaysLate (3.0, 4.0] -137
NumberOfTimes90DaysLate (4.0, 5.0] -130
NumberOfTimes90DaysLate (5.0, 6.0] -124
NumberOfTimes90DaysLate (6.0, 7.0] -168
NumberOfTimes90DaysLate (7.0, 8.0] -145
NumberOfTimes90DaysLate (8.0, 9.0] -150
NumberOfTimes90DaysLate (9.0, inf] -115
RevolvingUtilizationOfUnsecuredLines (-0.001, 0.0192] 71
RevolvingUtilizationOfUnsecuredLines (0.0192, 0.0832] 80
RevolvingUtilizationOfUnsecuredLines (0.0832, 0.271] 48
RevolvingUtilizationOfUnsecuredLines (0.271, 0.699] -3
RevolvingUtilizationOfUnsecuredLines (0.699, 50708.0] -69
age (-inf, 25.0] -19
age (25.0, 40.0] -16
age (40.0, 50.0] -8
age (50.0, 60.0] 3
age (60.0, 70.0] 23
age (70.0, inf] 38

# 2.6. Splitting the dataset: holding out 25% as the validation set

bad_rate:  0.06688333333333334
X_train.shape: (120000, 5)
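
A 25% hold-out is commonly done with scikit-learn's `train_test_split`. The sketch below uses random stand-in data; `stratify` and `random_state` are assumptions added here to keep the bad rate similar in both parts and to make the split repeatable, and the article does not say whether it used them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))               # stand-in for the five WOE columns
y = (rng.random(1000) < 0.067).astype(int)   # roughly the bad rate reported above

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print("bad_rate: ", y.mean())
print("X_train.shape:", X_train.shape)
```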

# 3. Logistic regression modeling

# 3.1. Building the model

LoR_Score: 0.9368266666666667
LoRC_pred_proba [0.0121424  0.15221691 0.02248172 ... 0.0528182  0.0121424  0.0952767 ]
LoRC_coef_lists_
 [0.46051155 0.76869053 0.59104431 0.36452944 0.56621256]
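
The three printed quantities correspond naturally to `score`, `predict_proba`, and `coef_` on a scikit-learn `LogisticRegression`. A minimal sketch on synthetic data follows; the name `LoRC` mirrors the printed labels, while the default hyperparameters and the data-generating coefficients are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X_train = rng.normal(size=(n, 5))   # stand-in for WOE-encoded features
# Synthetic labels drawn from a known logistic model (made-up coefficients).
logit = -2.6 + X_train @ np.array([0.5, 0.8, 0.6, 0.4, 0.6])
y_train = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

LoRC = LogisticRegression()
LoRC.fit(X_train, y_train)

pred_proba = LoRC.predict_proba(X_train)[:, 1]   # P(label == 1) per row
print("LoR_Score:", LoRC.score(X_train, y_train))  # mean accuracy on the data passed in
print("LoRC_pred_proba", pred_proba)
print("LoRC_coef_lists_", LoRC.coef_[0])           # one coefficient per feature
```

With WOE-encoded inputs defined as ln(bad share / good share), positive coefficients like those printed above are what one would expect: a higher WOE raises the predicted default probability.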

# 3.2. Model evaluation: AUC, ROC curve, and confusion matrix

Auc_Score: 0.8226466762033763
[[34827   200]
 [ 2169   304]]
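
To make the two metrics concrete, the sketch below computes a confusion matrix and a pairwise AUC in plain Python, then reads accuracy and recall off the matrix printed above. It assumes the scikit-learn layout `[[TN, FP], [FN, TP]]`; the helper names are hypothetical, and in practice `sklearn.metrics.roc_auc_score` and `confusion_matrix` would be the usual choice.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def auc_score(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(y_true, scores))                                      # 0.75
print(confusion_matrix_2x2(y_true, [int(s >= 0.5) for s in scores]))  # [[2, 0], [1, 1]]

# Reading the matrix printed above, assuming the layout [[TN, FP], [FN, TP]].
tn, fp, fn, tp = 34827, 200, 2169, 304
accuracy = (tn + tp) / (tn + fp + fn + tp)
recall = tp / (tp + fn)
print(round(accuracy, 4), round(recall, 4))   # 0.9368 0.1229
```

The accuracy recovered from the matrix matches the `LoR_Score` printed in section 3.1, while recall on the bad class is only about 12% at the default 0.5 threshold. On data this imbalanced, AUC is the more informative of the reported numbers.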

# 4. Model inference

# 4.1. Designing the scorecard rule table

# 4.1.1. Finding the two scale parameters A and B: deriving their formulas from two assumptions

650 72.13
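
The two printed numbers are A = 650 and B ≈ 72.13. In the usual scorecard scaling, score = A - B·ln(odds), and the two assumptions are a base score at some base odds plus a fixed number of points (PDO) lost when the odds double. The article does not state its inputs; the values below are hypothetical and were chosen because they reproduce the printed output (B = 50 / ln 2 ≈ 72.13).

```python
import math

def scorecard_scales(base_score, base_odds, pdo):
    """Solve score = A - B*ln(odds) from two anchor points.

    Assumption 1: odds == base_odds      -> score == base_score
    Assumption 2: odds == 2 * base_odds  -> score == base_score - pdo
    """
    B = pdo / math.log(2)
    A = base_score + B * math.log(base_odds)
    return A, B

# Hypothetical inputs chosen because they reproduce the printed "650 72.13".
A, B = scorecard_scales(base_score=650, base_odds=1.0, pdo=50)
print(round(A), round(B, 2))   # 650 72.13
```

Here odds means bad-to-good odds, p / (1 - p), so higher default odds lower the score.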

# 4.1.2. Designing the scorecard rule table: score_card_rule from scale B, the WOE encoding of each bin, and the model coefficients

Variable Binning Score
0 age (40.0, 50.0] -8
1 age (25.0, 40.0] -16
2 age (70.0, inf] 38
3 age (50.0, 60.0] 3
4 age (60.0, 70.0] 23
5 age (-inf, 25.0] -19
6 RevolvingUtilizationOfUnsecuredLines (0.699, 50708.0] -69
7 RevolvingUtilizationOfUnsecuredLines (0.271, 0.699] -3
8 RevolvingUtilizationOfUnsecuredLines (0.0832, 0.271] 48
9 RevolvingUtilizationOfUnsecuredLines (-0.001, 0.0192] 71
10 RevolvingUtilizationOfUnsecuredLines (0.0192, 0.0832] 80
11 NumberOfTime30-59DaysPastDueNotWorse (1.0, 2.0] -69
12 NumberOfTime30-59DaysPastDueNotWorse (-inf, 1.0] 11
13 NumberOfTime30-59DaysPastDueNotWorse (2.0, 3.0] -86
14 NumberOfTime30-59DaysPastDueNotWorse (3.0, 4.0] -100
15 NumberOfTime30-59DaysPastDueNotWorse (4.0, 5.0] -104
16 NumberOfTime30-59DaysPastDueNotWorse (6.0, 7.0] -116
17 NumberOfTime30-59DaysPastDueNotWorse (9.0, inf] -121
18 NumberOfTime30-59DaysPastDueNotWorse (5.0, 6.0] -117
19 NumberOfTime30-59DaysPastDueNotWorse (7.0, 8.0] -80
20 NumberOfTime30-59DaysPastDueNotWorse (8.0, 9.0] -83
21 NumberOfTime60-89DaysPastDueNotWorse (-inf, 1.0] 3
22 NumberOfTime60-89DaysPastDueNotWorse (1.0, 2.0] -70
23 NumberOfTime60-89DaysPastDueNotWorse (4.0, 5.0] -82
24 NumberOfTime60-89DaysPastDueNotWorse (2.0, 3.0] -76
25 NumberOfTime60-89DaysPastDueNotWorse (9.0, inf] -74
26 NumberOfTime60-89DaysPastDueNotWorse (3.0, 4.0] -82
27 NumberOfTime60-89DaysPastDueNotWorse (5.0, 6.0] -98
28 NumberOfTime60-89DaysPastDueNotWorse (6.0, 7.0] -75
29 NumberOfTime60-89DaysPastDueNotWorse (7.0, 8.0] -69
30 NumberOfTime60-89DaysPastDueNotWorse (8.0, 9.0] -69
31 NumberOfTimes90DaysLate (-inf, 1.0] 7
32 NumberOfTimes90DaysLate (2.0, 3.0] -120
33 NumberOfTimes90DaysLate (1.0, 2.0] -108
34 NumberOfTimes90DaysLate (4.0, 5.0] -130
35 NumberOfTimes90DaysLate (3.0, 4.0] -137
36 NumberOfTimes90DaysLate (9.0, inf] -115
37 NumberOfTimes90DaysLate (8.0, 9.0] -150
38 NumberOfTimes90DaysLate (5.0, 6.0] -124
39 NumberOfTimes90DaysLate (6.0, 7.0] -168
40 NumberOfTimes90DaysLate (7.0, 8.0] -145

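# 4.2. Computing sample scorecard scores with scale A

# 4.2.1. Randomly selecting 12 samples (6 good, 6 bad), computing each total score, and comparing with the label to check the model

# 4.2.2. Computing sample scorecard scores with scale A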

        age  RevolvingUtilizationOfUnsecuredLines  NumberOfTime30-59DaysPastDueNotWorse  NumberOfTime60-89DaysPastDueNotWorse  NumberOfTimes90DaysLate  score
44377    55                           0.081686933                                     0                                     0                        0    754
25143    47                             0.9999999                                     0                                     0                        1    594
67429    54                           0.015170898                                     0                                     0                        0    745
66689    26                           0.252252252                                     0                                     0                        0    703
42656    40                           0.916334661                                     1                                     0                        0    586
81903    65                           0.091477937                                     0                                     0                        0    742

        age  RevolvingUtilizationOfUnsecuredLines  NumberOfTime30-59DaysPastDueNotWorse  NumberOfTime60-89DaysPastDueNotWorse  NumberOfTimes90DaysLate  score
111052   30                             0.9999999                                     0                                     4                        2    386
30582    30                             0.9999999                                     0                                     0                        0    586
23677    43                            0.68756082                                     0                                     0                        0    660
87669    27                             0.9999999                                     0                                     1                        1    586
46920    50                           0.442370466                                     0                                     0                        0    660
78952    48                            0.40781316                                     0                                     0                        0    660
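
The totals in these tables equal A = 650 plus the five bin scores from the rule table in section 4.1.2; the model intercept does not appear to be added. The sketch below encodes that rule table and recomputes two of the samples. The helper names and the list-based encoding are illustrative, not the article's code.

```python
A = 650  # base scale from section 4.1.1

# (upper edge, score) pairs for the interval features, from the rule table.
AGE = [(25, -19), (40, -16), (50, -8), (60, 3), (70, 23), (float("inf"), 38)]
UTIL = [(0.0192, 71), (0.0832, 80), (0.271, 48), (0.699, -3), (float("inf"), -69)]

# Scores for the count bins (-inf, 1], (1, 2], ..., (8, 9], (9, inf], in that order.
LATE_30_59 = [11, -69, -86, -100, -104, -117, -116, -80, -83, -121]
LATE_60_89 = [3, -70, -76, -82, -82, -98, -75, -69, -69, -74]
LATE_90 = [7, -108, -120, -137, -130, -124, -168, -145, -150, -115]

def interval_score(value, table):
    """Return the score of the first bin whose upper edge is >= value."""
    for upper, score in table:
        if value <= upper:
            return score

def count_score(count, scores):
    """Counts 0 and 1 share the first bin; counts above 9 share the last."""
    return scores[0 if count <= 1 else min(count - 1, 9)]

def total_score(age, util, n30, n60, n90):
    return (A + interval_score(age, AGE) + interval_score(util, UTIL)
            + count_score(n30, LATE_30_59) + count_score(n60, LATE_60_89)
            + count_score(n90, LATE_90))

print(total_score(55, 0.081686933, 0, 0, 0))   # 754, sample 44377
print(total_score(30, 0.9999999, 0, 4, 2))     # 386, sample 111052
```

Because bins are right-closed, an age of exactly 40 falls in `(25.0, 40.0]` and a single late payment still falls in `(-inf, 1.0]`, which is why samples 42656 and 87669 both total 586.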

# 4.3. Comparing test-sample scores with their labels to design a review strategy

44377 754.0 --------- Accept directly!
age                                      47.0
RevolvingUtilizationOfUnsecuredLines      1.0
NumberOfTime30-59DaysPastDueNotWorse      0.0
NumberOfTime60-89DaysPastDueNotWorse      0.0
NumberOfTimes90DaysLate                   1.0
score                                   594.0
Name: 25143, dtype: float64
25143 594.0 --------- Manual review!
age                                      54.000000
RevolvingUtilizationOfUnsecuredLines      0.015171
NumberOfTime30-59DaysPastDueNotWorse      0.000000
NumberOfTime60-89DaysPastDueNotWorse      0.000000
NumberOfTimes90DaysLate                   0.000000
score                                   745.000000
Name: 67429, dtype: float64
67429 745.0 --------- Accept directly!
age                                      26.000000
RevolvingUtilizationOfUnsecuredLines      0.252252
NumberOfTime30-59DaysPastDueNotWorse      0.000000
NumberOfTime60-89DaysPastDueNotWorse      0.000000
NumberOfTimes90DaysLate                   0.000000
score                                   703.000000
Name: 66689, dtype: float64
66689 703.0 --------- Accept directly!
age                                      40.000000
RevolvingUtilizationOfUnsecuredLines      0.916335
NumberOfTime30-59DaysPastDueNotWorse      1.000000
NumberOfTime60-89DaysPastDueNotWorse      0.000000
NumberOfTimes90DaysLate                   0.000000
score                                   586.000000
Name: 42656, dtype: float64
42656 586.0 --------- Manual review!
age                                      65.000000
RevolvingUtilizationOfUnsecuredLines      0.091478
NumberOfTime30-59DaysPastDueNotWorse      0.000000
NumberOfTime60-89DaysPastDueNotWorse      0.000000
NumberOfTimes90DaysLate                   0.000000
score                                   742.000000
Name: 81903, dtype: float64
81903 742.0 --------- Accept directly!
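
The output above maps each score to an action, but the cutoffs are not given. The sketch below shows the general shape of such a rule with hypothetical thresholds (700 and 500) that happen to agree with the six printed decisions. The rejection tier is an assumption, since only two outcomes appear in the article's output.

```python
def review_decision(score, accept_at=700, reject_below=500):
    """Map a scorecard total to an action. Both cutoffs are hypothetical."""
    if score >= accept_at:
        return "Accept directly!"
    if score < reject_below:
        return "Reject directly!"
    return "Manual review!"

# Scores taken from the output above.
for sample_id, score in [(44377, 754), (25143, 594), (66689, 703), (42656, 586)]:
    print(sample_id, float(score), "---------", review_decision(score))
```

In practice the cutoffs would be chosen from the score distribution of good and bad samples, trading approval volume against expected bad rate.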

