A full walkthrough of developing a general credit risk scorecard model with the LoR (logistic regression) algorithm on a credit card dataset
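Related articles
ML/LoR: A full walkthrough, using the scorecardpy framework, of developing a general credit risk scorecard model with logistic regression on a credit card dataset
ML/LoR: A full walkthrough, using the scorecardpy framework, of developing a general credit risk scorecard model with logistic regression on a credit card dataset (code implementation)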
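# 1. Define the dataset
Load the German credit dataset, which classifies debtors described by a set of attributes as good or bad credit risks.
Dataset: UCI Machine Learning Repository: Data Set
# 1.1 Preview the data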
index | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |
# 1.2 Summarize variable types, counts, and related information
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 21 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   status.of.existing.checking.account  1000 non-null  category
     1   duration.in.month  1000 non-null  int64
     2   credit.history  1000 non-null  category
     3   purpose  1000 non-null  object
     4   credit.amount  1000 non-null  int64
     5   savings.account.and.bonds  1000 non-null  category
     6   present.employment.since  1000 non-null  category
     7   installment.rate.in.percentage.of.disposable.income  1000 non-null  int64
     8   personal.status.and.sex  1000 non-null  category
     9   other.debtors.or.guarantors  1000 non-null  category
     10  present.residence.since  1000 non-null  int64
     11  property  1000 non-null  category
     12  age.in.years  1000 non-null  int64
     13  other.installment.plans  1000 non-null  category
     14  housing  1000 non-null  category
     15  number.of.existing.credits.at.this.bank  1000 non-null  int64
     16  job  1000 non-null  category
     17  number.of.people.being.liable.to.provide.maintenance.for  1000 non-null  int64
     18  telephone  1000 non-null  category
     19  foreign.worker  1000 non-null  category
     20  creditability  1000 non-null  object
    dtypes: category(12), int64(7), object(2)
    memory usage: 84.0+ KB
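# 2. Data preprocessing
# 2.1 Variable filtering
Use the var_filter function to filter variables by missing rate, IV, identical-value rate, and similar criteria, specifying the target variable y.
    var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,
               identical_limit=0.95, var_rm=None, var_kp=None,
               return_rm_reason=False, positive='bad|1')
    '''
    Purpose: drop a variable when its IV is below iv_limit (0.02), its missing rate exceeds missing_limit (95%), or its identical-value rate (excluding nulls) exceeds identical_limit (95%).
    Parameters (see the function source for details):
    var_rm: variables to force-remove, default None;
    var_kp: variables to force-keep, default None;
    return_rm_reason: whether to return the reason for each removal, default False;
    positive: the value(s) that mark bad samples, default "bad|1".
    '''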
index | age.in.years | other.debtors.or.guarantors | savings.account.and.bonds | credit.amount | installment.rate.in.percentage.of.disposable.income | status.of.existing.checking.account | credit.history | present.employment.since | purpose | housing | property | other.installment.plans | duration.in.month | creditability |
0 | 67 | none | unknown/ no savings account | 1169 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | real estate | none | 6 | 0 |
1 | 22 | none | ... < 100 DM | 5951 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | real estate | none | 48 | 1 |
2 | 49 | none | ... < 100 DM | 2096 | 2 | no checking account | critical account/ other credits existing (not at this bank) | 4 <= ... < 7 years | education | own | real estate | none | 12 | 0 |
3 | 45 | guarantor | ... < 100 DM | 7882 | 2 | ... < 0 DM | existing credits paid back duly till now | 4 <= ... < 7 years | furniture/equipment | for free | building society savings agreement/ life insurance | none | 42 | 0 |
4 | 53 | none | ... < 100 DM | 4870 | 3 | ... < 0 DM | delay in paying off in the past | 1 <= ... < 4 years | car (new) | for free | unknown / no property | none | 24 | 1 |
5 | 35 | none | unknown/ no savings account | 9055 | 2 | no checking account | existing credits paid back duly till now | 1 <= ... < 4 years | education | for free | unknown / no property | none | 36 | 0 |
6 | 53 | none | 500 <= ... < 1000 DM | 2835 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | furniture/equipment | own | building society savings agreement/ life insurance | none | 24 | 0 |
7 | 35 | none | ... < 100 DM | 6948 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (used) | rent | car or other, not in attribute Savings account/bonds | none | 36 | 0 |
8 | 61 | none | ... >= 1000 DM | 3059 | 2 | no checking account | existing credits paid back duly till now | 4 <= ... < 7 years | radio/television | own | real estate | none | 12 | 0 |
9 | 28 | none | ... < 100 DM | 5234 | 4 | 0 <= ... < 200 DM | critical account/ other credits existing (not at this bank) | unemployed | car (new) | own | car or other, not in attribute Savings account/bonds | none | 30 | 1 |
10 | 25 | none | ... < 100 DM | 1295 | 3 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... < 1 year | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 12 | 1 |
11 | 24 | none | ... < 100 DM | 4308 | 3 | ... < 0 DM | existing credits paid back duly till now | ... < 1 year | business | rent | building society savings agreement/ life insurance | none | 48 | 1 |
12 | 22 | none | ... < 100 DM | 1567 | 1 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 12 | 0 |
13 | 60 | none | ... < 100 DM | 1199 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | car (new) | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
14 | 28 | none | ... < 100 DM | 1403 | 2 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 15 | 0 |
15 | 32 | none | 100 <= ... < 500 DM | 1282 | 4 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
16 | 53 | none | unknown/ no savings account | 2424 | 4 | no checking account | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | building society savings agreement/ life insurance | none | 24 | 0 |
17 | 25 | none | unknown/ no savings account | 8072 | 2 | ... < 0 DM | no credits taken/ all credits paid back duly | ... < 1 year | business | own | car or other, not in attribute Savings account/bonds | bank | 30 | 0 |
18 | 44 | none | ... < 100 DM | 12579 | 4 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... >= 7 years | car (used) | for free | unknown / no property | none | 24 | 1 |
19 | 31 | none | 500 <= ... < 1000 DM | 3430 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 0 |
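The filtering rule described above can be sketched in plain Python. This only illustrates the three thresholds and is not scorecardpy's implementation: the helper `removal_reasons` is hypothetical, the IV is assumed to be precomputed, the column is assumed non-empty, and the identical-value rate is taken here as the share of the most frequent value among non-null rows.

```python
from collections import Counter

def removal_reasons(values, iv, iv_limit=0.02, missing_limit=0.95, identical_limit=0.95):
    """Return the reasons a variable would be dropped (empty list means keep).

    Hypothetical helper: `values` is one column as a list (None = missing),
    `iv` is that column's precomputed information value.
    """
    n = len(values)
    non_null = [v for v in values if v is not None]
    missing_rate = (n - len(non_null)) / n
    # Assumed definition: share of the most frequent value among non-null rows.
    if non_null:
        identical_rate = Counter(non_null).most_common(1)[0][1] / len(non_null)
    else:
        identical_rate = 1.0

    reasons = []
    if iv < iv_limit:
        reasons.append("iv < iv_limit")
    if missing_rate > missing_limit:
        reasons.append("missing rate > missing_limit")
    if identical_rate > identical_limit:
        reasons.append("identical rate > identical_limit")
    return reasons

keep = removal_reasons(["a", "b", "a", None], iv=0.13)
drop = removal_reasons(["yes"] * 99 + ["no"], iv=0.01)
print(keep)
print(drop)
```

A variable survives only when all three checks pass, which is why the second column is flagged twice.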
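# 2.2 WOE binning of variables
# T1. Automatic binning with the woebin() function
    woebin(dt, y, x=None,
           var_skip=None, breaks_list=None, special_values=None,
           stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
           # min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
           positive="bad|1", no_cores=None, print_step=0, method="tree",
           ignore_const_cols=True, ignore_datetime_cols=True,
           check_cate_num=True, replace_blank=True,
           save_breaks_list=None, **kwargs)
    '''
    Purpose: generate optimal bins for numeric and categorical variables; method="tree" selects decision-tree binning and method="chimerge" selects chi-square (ChiMerge) binning.
    Parameters (see the function source for details):
    var_skip: variables to skip during binning;
    breaks_list: list of cut points, default None. If given, the variables are binned at these cut points;
    special_values: values to place in their own bins, default None;
    count_distr_limit: minimum share of samples per bin; the accepted range is typically 0.01-0.2, default 0.05;
    stop_limit: binning stops when the IV gain ratio falls below stop_limit, or the chi-square value falls below qchisq(1-stop_limit, 1). The accepted range is typically 0-0.5, default 0.1;
    bin_num_limit: an integer, the maximum number of bins;
    positive: the label of the positive (bad) class, default "bad|1";
    no_cores: number of CPU cores used for parallel computation;
    print_step: a non-negative integer; the signature above defaults to 0. If print_step>0, variable names are printed every print_step-th iteration. If print_step=0 or no_cores>1, nothing is printed;
    method: binning method, "tree" (decision tree) or "chimerge" (chi-square), default "tree";
    ignore_const_cols: whether to ignore constant columns, default True;
    ignore_datetime_cols: whether to ignore datetime columns, default True;
    check_cate_num: whether to check that a categorical variable has no more than 50 distinct values, default True. Too many levels slow down binning;
    replace_blank: whether to replace blank values with None, default True.
    '''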
data_df_woebin['age.in.years']
index | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
0 | age.in.years | [-inf,26.0) | 190 | 0.19 | 110 | 80 | 0.421052632 | 0.528844129 | 0.057921024 | 0.130498542 | 26 | FALSE |
1 | age.in.years | [26.0,28.0) | 101 | 0.101 | 74 | 27 | 0.267326733 | -0.160930367 | 0.002528906 | 0.130498542 | 28 | FALSE |
2 | age.in.years | [28.0,35.0) | 257 | 0.257 | 172 | 85 | 0.3307393 | 0.14245464 | 0.005359008 | 0.130498542 | 35 | FALSE |
3 | age.in.years | [35.0,37.0) | 79 | 0.079 | 67 | 12 | 0.151898734 | -0.872488109 | 0.048610052 | 0.130498542 | 37 | FALSE |
4 | age.in.years | [37.0,inf) | 373 | 0.373 | 277 | 96 | 0.257372654 | -0.212371454 | 0.016079553 | 0.130498542 | inf | FALSE |
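The woe, bin_iv, and total_iv columns in the table above can be reproduced from the good and bad counts alone. The sketch below assumes the convention WOE = ln(bad share / good share), which matches the printed values; the function name `woe_iv` is hypothetical, and the code assumes no bin has a zero good or bad count.

```python
import math

# (bin label, good count, bad count), copied from the age.in.years table above.
bins = [
    ("[-inf,26.0)", 110, 80),
    ("[26.0,28.0)", 74, 27),
    ("[28.0,35.0)", 172, 85),
    ("[35.0,37.0)", 67, 12),
    ("[37.0,inf)", 277, 96),
]

def woe_iv(bins):
    """Return per-bin (label, woe, bin_iv) rows and the total IV."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows = []
    for label, good, bad in bins:
        bad_share = bad / total_bad      # share of all bad samples in this bin
        good_share = good / total_good   # share of all good samples in this bin
        woe = math.log(bad_share / good_share)
        bin_iv = (bad_share - good_share) * woe
        rows.append((label, woe, bin_iv))
    total_iv = sum(bin_iv for _, _, bin_iv in rows)
    return rows, total_iv

rows, total_iv = woe_iv(bins)
for label, woe, bin_iv in rows:
    print(f"{label:>12}  woe={woe:+.6f}  bin_iv={bin_iv:.6f}")
print(f"total_iv={total_iv:.6f}")
```

With this sign convention a positive WOE marks a bin with a higher than average share of bad samples, such as the youngest age group here.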
# T2. Manual binning with a custom breaks_list parameter
data_df_woebin_DIY['age.in.years']
index | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
0 | age.in.years | [-inf,25.0) | 149 | 0.149 | 88 | 61 | 0.409395973 | 0.48083491 | 0.037321948 | 0.086291678 | 25 | FALSE |
1 | age.in.years | [25.0,35.0) | 399 | 0.399 | 268 | 131 | 0.328320802 | 0.131508203 | 0.007076394 | 0.086291678 | 35 | FALSE |
2 | age.in.years | [35.0,45.0) | 251 | 0.251 | 193 | 58 | 0.231075697 | -0.354949318 | 0.029241063 | 0.086291678 | 45 | FALSE |
3 | age.in.years | [45.0,inf) | 201 | 0.201 | 151 | 50 | 0.248756219 | -0.257958971 | 0.012652273 | 0.086291678 | inf | FALSE |
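The bins in the table above are left-closed and right-open, with cut points at 25, 35, and 45. As a minimal sketch of how a raw age maps to its bin and WOE value, the following uses only the standard library; `bin_index` and `bin_label` are hypothetical helpers, not scorecardpy functions.

```python
from bisect import bisect_right

breaks = [25, 35, 45]  # manual cut points for age.in.years, read off the table above
# WOE per bin, copied from the table above (ordered from lowest to highest bin).
woe_by_bin = [0.48083491, 0.131508203, -0.354949318, -0.257958971]

def bin_index(value, breaks):
    """Index of the left-closed, right-open bin that contains value."""
    return bisect_right(breaks, value)

def bin_label(idx, breaks):
    """Render the bin as an interval string in the same style as the table."""
    edges = [float("-inf")] + [float(b) for b in breaks] + [float("inf")]
    return f"[{edges[idx]},{edges[idx + 1]})"

for age in (22, 25, 35, 67):
    i = bin_index(age, breaks)
    print(age, bin_label(i, breaks), woe_by_bin[i])
```

A value equal to a cut point falls into the bin that starts at that cut point, so an age of 25 lands in [25.0,35.0).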
# 2.3 Visualize the binned variables and check for monotonicity
Plot the count distribution and bad probability of each variable's bins.
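Besides reading the plots, monotonicity of the bad probability can be checked numerically. The helper `is_monotonic` below is a hypothetical illustration, applied to the badprob columns of the two age.in.years binning tables shown earlier.

```python
def is_monotonic(values):
    """True if values never increase or never decrease along the bin order."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# badprob per bin for age.in.years, copied from the automatic and manual binning tables.
auto_badprob = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_badprob = [0.409395973, 0.328320802, 0.231075697, 0.248756219]

print("automatic bins monotonic:", is_monotonic(auto_badprob))
print("manual bins monotonic:", is_monotonic(manual_badprob))
print("manual bins without the last one:", is_monotonic(manual_badprob[:3]))
```

Neither binning is strictly monotonic for age; the manual bins only break the trend in the last bin, which may suggest merging the two oldest groups.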
# 2.4 Apply the WOE transformation to the variables
index | creditability | savings.account.and.bonds_woe | housing_woe | age.in.years_woe | other.debtors.or.guarantors_woe | purpose_woe | credit.amount_woe | credit.history_woe | installment.rate.in.percentage.of.disposable.income_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | status.of.existing.checking.account_woe | duration.in.month_woe |
0 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | -0.461034959 | 0.614203978 | -1.312186389 |
1 | 1 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | -0.461034959 | 0.614203978 | 1.134979933 |
2 | 0 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | -0.733740578 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
3 | 0 | 0.271357844 | 0.472604411 | -0.257958971 | 0.005115101 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | 0.028573372 | 0.614203978 | 0.524524468 |
4 | 1 | 0.271357844 | 0.472604411 | -0.257958971 | -0.000525072 | 0.279920067 | 0.390539458 | 0.085157808 | -0.064538521 | -0.121178625 | 0.032103245 | 0.586082361 | 0.614203978 | 0.108688306 |
5 | 0 | -0.762140052 | 0.472604411 | -0.354949318 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.586082361 | -1.176263223 | 0.524524468 |
6 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
7 | 0 | 0.271357844 | 0.40444522 | -0.354949318 | -0.000525072 | -0.805625164 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.524524468 |
8 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
9 | 1 | 0.271357844 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | -0.733740578 | 0.157300289 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
10 | 1 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | 0.033661283 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | -0.346624608 |
11 | 1 | 0.271357844 | 0.40444522 | 0.48083491 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.028573372 | 0.614203978 | 1.134979933 |
12 | 0 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
13 | 1 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.034191365 | 0.614203978 | 0.108688306 |
14 | 0 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
15 | 1 | 0.13955188 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | 0.033661283 | 0.088318617 | 0.157300289 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.108688306 |
16 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
17 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | 1.234070835 | -0.190472769 | 0.477550835 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
18 | 1 | 0.271357844 | 0.472604411 | -0.354949318 | -0.000525072 | -0.805625164 | 1.170071253 | 0.088318617 | 0.157300289 | -0.121178625 | -0.235566071 | 0.586082361 | 0.614203978 | 0.108688306 |
19 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.034191365 | -1.176263223 | 0.108688306 |
# 3. Model training
# 3.1 Split the dataset
The output of train2woe is shown below.
index | age.in.years_woe | credit.amount_woe | credit.history_woe | creditability | duration.in.month_woe | housing_woe | installment.rate.in.percentage.of.disposable.income_woe | other.debtors.or.guarantors_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | purpose_woe | savings.account.and.bonds_woe | status.of.existing.checking.account_woe |
0 | -0.257958971 | 0.033661283 | -0.733740578 | 0 | -1.312186389 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | -0.461034959 | -0.410062817 | -0.762140052 | 0.614203978 |
1 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | -0.461034959 | -0.410062817 | 0.271357844 | 0.614203978 |
2 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | 0.279920067 | 0.271357844 | -1.176263223 |
6 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | 0.279920067 | -0.762140052 | -1.176263223 |
7 | -0.354949318 | 0.390539458 | 0.088318617 | 0 | 0.524524468 | 0.40444522 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | 0.034191365 | -0.805625164 | 0.271357844 | 0.614203978 |
8 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | -0.410062817 | -0.762140052 | -1.176263223 |
11 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | 0.40444522 | -0.064538521 | -0.000525072 | -0.121178625 | 0.431137463 | 0.028573372 | 0.279920067 | 0.271357844 | 0.614203978 |
13 | -0.257958971 | 0.033661283 | -0.733740578 | 1 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | 0.279920067 | 0.271357844 | 0.614203978 |
16 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | -0.410062817 | -0.762140052 | -1.176263223 |
18 | -0.354949318 | 1.170071253 | 0.088318617 | 1 | 0.108688306 | 0.472604411 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.586082361 | -0.805625164 | 0.271357844 | 0.614203978 |
19 | 0.131508203 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | -0.410062817 | -0.762140052 | -1.176263223 |
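The gaps in the index of the table above (0, 1, 2, 6, 7, ...) show that rows were sampled at random while keeping their original order. A minimal sketch of such a split follows; the function `split_rows`, the 70/30 ratio, and the seed are assumptions chosen for illustration, and the split is not stratified by the target.

```python
import random

def split_rows(rows, ratio=0.7, seed=186):
    """Randomly split rows into train and test parts, preserving original order."""
    rng = random.Random(seed)           # fixed seed makes the split reproducible
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(round(len(rows) * ratio))
    train_idx = sorted(idx[:cut])       # sort so each part keeps the source order
    test_idx = sorted(idx[cut:])
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(1000))                # stand-in for the 1000 records
train, test = split_rows(rows)
print(len(train), len(test))
```

Fixing the seed matters here because the WOE tables, the model, and the later stability check all depend on which rows end up in each part.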
# 3.2 Separate the independent and dependent variables
# 3.3 Build, train, and predict with a logistic regression model
coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
0. 0.23090027 0.7965086 0.22792681 1.07066195 0.83836441
0.72843684]]
intercept_: [-0.83437247]
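With the fitted coef_ and intercept_ printed above, the predicted bad probability of a record is the logistic function of the weighted sum of its WOE values. The sketch below applies this to row 0 of the train2woe table; it assumes the coefficients follow the column order of that table with creditability removed, and `predict_bad_probability` is a hypothetical helper.

```python
import math

# Coefficients and intercept as printed above.
coef = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772, 1.06219811,
        0.0, 0.23090027, 0.7965086, 0.22792681, 1.07066195, 0.83836441, 0.72843684]
intercept = -0.83437247

# WOE values of row 0 in the train2woe table, creditability dropped,
# in table column order (assumed to match the order of coef_).
row0 = [-0.257958971, 0.033661283, -0.733740578, -1.312186389, -0.194156014,
        0.157300289, -0.000525072, -0.121178625, -0.235566071, -0.461034959,
        -0.410062817, -0.762140052, 0.614203978]

def predict_bad_probability(woe_row, coef, intercept):
    """Logistic regression prediction: sigmoid of the linear predictor."""
    z = intercept + sum(c * x for c, x in zip(coef, woe_row))
    return 1.0 / (1.0 + math.exp(-z))

p = predict_bad_probability(row0, coef, intercept)
print(f"predicted bad probability for row 0: {p:.4f}")
```

The coefficient of exactly 0 for the seventh variable means it contributes nothing to the prediction.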
# 3.4 Model evaluation
Evaluate the model with the perf_eva function.
    perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"],
             show_plot=True, positive="bad|1", seed=186)
    '''
    Purpose: evaluate model performance with KS, AUC, lift curves, and PR curves. plot_type accepts "ks", "lift", "roc", and "pr".
    The perf_eva() function can ...
    '''
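To make the two headline metrics concrete, here is a small standard-library sketch of KS and AUC. It is an illustration of the definitions, not the way perf_eva computes them internally (which may, for example, group the scores first); the function `ks_and_auc` and the toy data are hypothetical.

```python
def ks_and_auc(labels, preds):
    """labels: 1 = bad, 0 = good; preds: predicted bad probabilities."""
    bad = [p for y, p in zip(labels, preds) if y == 1]
    good = [p for y, p in zip(labels, preds) if y == 0]

    # KS: largest gap between the cumulative bad and good capture rates
    # over every distinct score threshold.
    ks = 0.0
    for t in sorted(set(preds)):
        bad_rate = sum(p >= t for p in bad) / len(bad)
        good_rate = sum(p >= t for p in good) / len(good)
        ks = max(ks, abs(bad_rate - good_rate))

    # AUC: probability that a random bad sample scores above a random
    # good sample, counting ties as one half.
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    auc = wins / (len(bad) * len(good))
    return ks, auc

labels = [0, 0, 1, 1]
preds = [0.1, 0.4, 0.35, 0.8]
ks, auc = ks_and_auc(labels, preds)
print(f"KS={ks:.2f}  AUC={auc:.2f}")
```

The pairwise AUC loop is quadratic in the sample size, which is fine for 1000 records but would need a rank-based formula on large data.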
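# 4. Model deployment and monitoring
# 4.1 Model inference: compute credit scores
Use the scorecard function to map probabilities to scorecard points. The result includes each customer's final score and the score of each individual variable.
    scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
    '''
    Purpose: map probabilities to scorecard points.
    Parameters:
    bins: binning information, the result returned by woebin().
    model: the fitted model object.
    points0: base points, default 600.
    odds0: the target odds at points0, default 1/19 (1:19).
    pdo: points to double the odds, default 50.
    basepoints_eq0: if True, the base points are distributed across the variables.
    '''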
print('card_dict_age.in.years \n',card_dict['age.in.years'])
print('card_dict_credit.amount \n',card_dict['credit.amount'])
print('card_dict_credit.history \n',card_dict['credit.history'])
print('card_dict_duration.in.month \n',card_dict['duration.in.month'])
print('card_dict_housing \n',card_dict['housing'])
    card_dict_age.in.years
        variable  bin  points
    10  age.in.years  [-inf,25.0)  -12.0
    11  age.in.years  [25.0,35.0)  -3.0
    12  age.in.years  [35.0,45.0)  9.0
    13  age.in.years  [45.0,inf)  6.0
    card_dict_credit.amount
        variable  bin  points
    31  credit.amount  [-inf,1400.0)  -2.0
    32  credit.amount  [1400.0,1800.0)  41.0
    33  credit.amount  [1800.0,4000.0)  15.0
    34  credit.amount  [4000.0,9200.0)  -22.0
    35  credit.amount  [9200.0,inf)  -66.0
    card_dict_credit.history
        variable  bin  points
    17  credit.history  no credits taken/ all credits paid back duly%,...  -51.0
    18  credit.history  existing credits paid back duly till now  -4.0
    19  credit.history  delay in paying off in the past  -4.0
    20  credit.history  critical account/ other credits existing (not ...  30.0
    card_dict_duration.in.month
        variable  bin  points
    23  duration.in.month  [-inf,8.0)  85.0
    24  duration.in.month  [8.0,16.0)  22.0
    25  duration.in.month  [16.0,34.0)  -7.0
    26  duration.in.month  [34.0,44.0)  -34.0
    27  duration.in.month  [44.0,inf)  -74.0
    card_dict_housing
        variable  bin  points
    42  housing  rent  -20.0
    43  housing  own  10.0
    44  housing  for free  -23.0
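The points printed above follow from the usual scorecard scaling: with factor B = pdo / ln(2) and offset A = points0 + B * ln(odds0), each bin receives round(-B * coef * woe) points. The sketch below reproduces the age.in.years and housing points from the coefficients and WOE values shown earlier; the helper names are hypothetical, and the pairing of each coefficient with its variable assumes coef_ follows the train2woe column order.

```python
import math

def scaling(points0=600, odds0=1 / 19, pdo=50):
    """Return offset A and factor B of the score = A - B * log-odds mapping."""
    b = pdo / math.log(2)
    a = points0 + b * math.log(odds0)
    return a, b

def bin_points(coef, woes, b):
    """Points of each bin of one variable: round(-B * coef * woe)."""
    return [round(-b * coef * w) for w in woes]

a, b = scaling()

# age.in.years: coefficient 0.34206044, WOE per manual bin (low to high).
age_points = bin_points(0.34206044, [0.48083491, 0.131508203, -0.354949318, -0.257958971], b)
# housing: coefficient 0.67956772, WOE for rent, own, for free.
housing_points = bin_points(0.67956772, [0.40444522, -0.194156014, 0.472604411], b)
# Base points implied by the intercept under the same mapping.
base_points = round(a - b * (-0.83437247))

print("age.in.years:", age_points)
print("housing:", housing_points)
print("base points:", base_points)
```

Because WOE is positive for riskier bins and the coefficients are positive, the minus sign makes riskier bins lose points, so a higher total score means lower risk.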
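# 4.2 Online model evaluation: score stability (PSI)
Use the scorecard_ply() function to compute credit scores for the train and test datasets.
    scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True,
                  var_kp=None)
    '''
    Purpose: compute credit scores from the result of `scorecard`, i.e. convert the data into scorecard points.

    dt: the original data.
    card: the scorecard generated by `scorecard`.
    only_total_score: boolean, default True. If True, the output contains only the total credit score; if False, it contains the total score and the score of each variable.
    print_step: a non-negative integer; the signature above defaults to 0. If print_step>0, variable names are printed every print_step-th iteration. If print_step=0, nothing is printed.
    replace_blank_na: boolean, whether to replace blank values with NA, default True. This should match the setting used in woebin.
    var_kp: names of variables to force-keep, such as an id column. Default None.
    '''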
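Once both datasets have scores, stability is commonly summarized with the population stability index, PSI = sum over score bands of (actual share - expected share) * ln(actual share / expected share). The following is a minimal sketch with a hypothetical `psi` helper, made-up scores, and a single assumed band edge at 500; it assumes every band is non-empty in both samples.

```python
import math

def psi(expected_scores, actual_scores, edges):
    """Population stability index over score bands split at the given edges."""
    def shares(scores):
        counts = [0] * (len(edges) + 1)
        for s in scores:
            # Band index = number of edges the score has reached or passed.
            counts[sum(s >= e for e in edges)] += 1
        return [c / len(scores) for c in counts]

    expected = shares(expected_scores)
    actual = shares(actual_scores)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

# Hypothetical total scores for the train (expected) and test (actual) sets.
train_scores = [400, 420, 440, 460, 480, 520, 540, 560, 580, 600]
test_scores = [410, 430, 450, 470, 510, 530, 550, 570, 590, 610]

value = psi(train_scores, test_scores, edges=[500])
print(f"PSI = {value:.4f}")
```

A PSI of 0 means the two score distributions are identical across the bands; larger values indicate that the scored population has shifted.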