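ML - LoR: A full walkthrough of building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, using the scorecardpy framework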



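Contents

A full walkthrough of building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset

# 1. Define the dataset

# 1.1 Preview some of the data

# 1.2 Summarize variable types, counts, and other information

# 2. Data preprocessing

# 2.1 Variable filtering

# 2.2 WoE binning analysis

# T1. Automatic binning with the woebin() function

# T2. Manual binning with a custom breaks_list argument

# 2.3 Visualize the binned variables to check for monotonicity

# 2.4 Apply the WoE binning transformation to the variables

# 3. Model training

# 3.1 Split the dataset

# 3.2 Separate independent and dependent variables

# 3.3 Build, train, and predict: the logistic regression model

# 3.4 Model evaluation

# 4. Model deployment and monitoring

# 4.1 Model inference: computing credit scores

# 4.2 Online model evaluation: score stability with PSI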



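A full walkthrough of building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset

# 1. Define the dataset

# Load the German credit dataset: credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks.

Dataset: UCI Machine Learning Repository: Data Set

# 1.1 Preview some of the data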

status.of.existing.checking.account duration.in.month credit.history purpose credit.amount savings.account.and.bonds present.employment.since installment.rate.in.percentage.of.disposable.income personal.status.and.sex other.debtors.or.guarantors present.residence.since property age.in.years other.installment.plans housing number.of.existing.credits.at.this.bank job number.of.people.being.liable.to.provide.maintenance.for telephone foreign.worker creditability
0 ... < 0 DM 6 critical account/ other credits existing (not at this bank) radio/television 1169 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 real estate 67 none own 2 skilled employee / official 1 yes, registered under the customers name yes good
1 0 <= ... < 200 DM 48 existing credits paid back duly till now radio/television 5951 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 real estate 22 none own 1 skilled employee / official 1 none yes bad
2 no checking account 12 critical account/ other credits existing (not at this bank) education 2096 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated none 3 real estate 49 none own 1 unskilled - resident 2 none yes good
3 ... < 0 DM 42 existing credits paid back duly till now furniture/equipment 7882 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated guarantor 4 building society savings agreement/ life insurance 45 none for free 1 skilled employee / official 2 none yes good
4 ... < 0 DM 24 delay in paying off in the past car (new) 4870 ... < 100 DM 1 <= ... < 4 years 3 male : divorced/separated none 4 unknown / no property 53 none for free 2 skilled employee / official 2 none yes bad
5 no checking account 36 existing credits paid back duly till now education 9055 unknown/ no savings account 1 <= ... < 4 years 2 male : divorced/separated none 4 unknown / no property 35 none for free 1 unskilled - resident 2 yes, registered under the customers name yes good
6 no checking account 24 existing credits paid back duly till now furniture/equipment 2835 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 1 skilled employee / official 1 none yes good
7 0 <= ... < 200 DM 36 existing credits paid back duly till now car (used) 6948 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 35 none rent 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes good
8 no checking account 12 existing credits paid back duly till now radio/television 3059 ... >= 1000 DM 4 <= ... < 7 years 2 male : divorced/separated none 4 real estate 61 none own 1 unskilled - resident 1 none yes good
9 0 <= ... < 200 DM 30 critical account/ other credits existing (not at this bank) car (new) 5234 ... < 100 DM unemployed 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 28 none own 2 management/ self-employed/ highly qualified employee/ officer 1 none yes bad
10 0 <= ... < 200 DM 12 existing credits paid back duly till now car (new) 1295 ... < 100 DM ... < 1 year 3 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 25 none rent 1 skilled employee / official 1 none yes bad
11 ... < 0 DM 48 existing credits paid back duly till now business 4308 ... < 100 DM ... < 1 year 3 male : divorced/separated none 4 building society savings agreement/ life insurance 24 none rent 1 skilled employee / official 1 none yes bad
12 0 <= ... < 200 DM 12 existing credits paid back duly till now radio/television 1567 ... < 100 DM 1 <= ... < 4 years 1 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 22 none own 1 skilled employee / official 1 yes, registered under the customers name yes good
13 ... < 0 DM 24 critical account/ other credits existing (not at this bank) car (new) 1199 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 60 none own 2 unskilled - resident 1 none yes bad
14 ... < 0 DM 15 existing credits paid back duly till now car (new) 1403 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 28 none rent 1 skilled employee / official 1 none yes good
15 ... < 0 DM 24 existing credits paid back duly till now radio/television 1282 100 <= ... < 500 DM 1 <= ... < 4 years 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 32 none own 1 unskilled - resident 1 none yes bad
16 no checking account 24 critical account/ other credits existing (not at this bank) radio/television 2424 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 2 skilled employee / official 1 none yes good
17 ... < 0 DM 30 no credits taken/ all credits paid back duly business 8072 unknown/ no savings account ... < 1 year 2 male : divorced/separated none 3 car or other, not in attribute Savings account/bonds 25 bank own 3 skilled employee / official 1 none yes good
18 0 <= ... < 200 DM 24 existing credits paid back duly till now car (used) 12579 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 2 unknown / no property 44 none for free 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes bad
19 no checking account 24 existing credits paid back duly till now radio/television 3430 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 31 none own 1 skilled employee / official 2 yes, registered under the customers name yes good

# 1.2 Summarize variable types, counts, and other information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                                                    Non-Null Count  Dtype   
---  ------                                                    --------------  -----   
0   status.of.existing.checking.account                       1000 non-null   category
1   duration.in.month                                         1000 non-null   int64   
2   credit.history                                            1000 non-null   category
3   purpose                                                   1000 non-null   object
4   credit.amount                                             1000 non-null   int64   
5   savings.account.and.bonds                                 1000 non-null   category
6   present.employment.since                                  1000 non-null   category
7   installment.rate.in.percentage.of.disposable.income       1000 non-null   int64   
8   personal.status.and.sex                                   1000 non-null   category
9   other.debtors.or.guarantors                               1000 non-null   category
10  present.residence.since                                   1000 non-null   int64   
11  property                                                  1000 non-null   category
12  age.in.years                                              1000 non-null   int64   
13  other.installment.plans                                   1000 non-null   category
14  housing                                                   1000 non-null   category
15  number.of.existing.credits.at.this.bank                   1000 non-null   int64   
16  job                                                       1000 non-null   category
17  number.of.people.being.liable.to.provide.maintenance.for  1000 non-null   int64   
18  telephone                                                 1000 non-null   category
19  foreign.worker                                            1000 non-null   category
20  creditability                                             1000 non-null   object
dtypes: category(12), int64(7), object(2)
memory usage: 84.0+ KB

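# 2. Data preprocessing

# 2.1 Variable filtering

# Use var_filter to screen variables by missing rate, IV, and identical-value rate, with the target variable y specified

var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,
               identical_limit=0.95, var_rm=None, var_kp=None,
               return_rm_reason=False, positive='bad|1')
'''
Purpose: a variable is dropped when its IV is below iv_limit (0.02), its missing rate is above missing_limit (95%), or its identical-value rate (excluding missing values) is above identical_limit (95%).
Parameters (see the function source for the full list):
var_rm: variables to force-remove, empty by default;
var_kp: variables to force-keep, empty by default;
return_rm_reason: whether to return the reason each variable was removed, default False;
positive: the value that marks a bad sample, default "bad|1".
'''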
age.in.years other.debtors.or.guarantors savings.account.and.bonds credit.amount installment.rate.in.percentage.of.disposable.income status.of.existing.checking.account credit.history present.employment.since purpose housing property other.installment.plans duration.in.month creditability
0 67 none unknown/ no savings account 1169 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own real estate none 6 0
1 22 none ... < 100 DM 5951 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own real estate none 48 1
2 49 none ... < 100 DM 2096 2 no checking account critical account/ other credits existing (not at this bank) 4 <= ... < 7 years education own real estate none 12 0
3 45 guarantor ... < 100 DM 7882 2 ... < 0 DM existing credits paid back duly till now 4 <= ... < 7 years furniture/equipment for free building society savings agreement/ life insurance none 42 0
4 53 none ... < 100 DM 4870 3 ... < 0 DM delay in paying off in the past 1 <= ... < 4 years car (new) for free unknown / no property none 24 1
5 35 none unknown/ no savings account 9055 2 no checking account existing credits paid back duly till now 1 <= ... < 4 years education for free unknown / no property none 36 0
6 53 none 500 <= ... < 1000 DM 2835 3 no checking account existing credits paid back duly till now ... >= 7 years furniture/equipment own building society savings agreement/ life insurance none 24 0
7 35 none ... < 100 DM 6948 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years car (used) rent car or other, not in attribute Savings account/bonds none 36 0
8 61 none ... >= 1000 DM 3059 2 no checking account existing credits paid back duly till now 4 <= ... < 7 years radio/television own real estate none 12 0
9 28 none ... < 100 DM 5234 4 0 <= ... < 200 DM critical account/ other credits existing (not at this bank) unemployed car (new) own car or other, not in attribute Savings account/bonds none 30 1
10 25 none ... < 100 DM 1295 3 0 <= ... < 200 DM existing credits paid back duly till now ... < 1 year car (new) rent car or other, not in attribute Savings account/bonds none 12 1
11 24 none ... < 100 DM 4308 3 ... < 0 DM existing credits paid back duly till now ... < 1 year business rent building society savings agreement/ life insurance none 48 1
12 22 none ... < 100 DM 1567 1 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 12 0
13 60 none ... < 100 DM 1199 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years car (new) own car or other, not in attribute Savings account/bonds none 24 1
14 28 none ... < 100 DM 1403 2 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years car (new) rent car or other, not in attribute Savings account/bonds none 15 0
15 32 none 100 <= ... < 500 DM 1282 4 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 24 1
16 53 none unknown/ no savings account 2424 4 no checking account critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own building society savings agreement/ life insurance none 24 0
17 25 none unknown/ no savings account 8072 2 ... < 0 DM no credits taken/ all credits paid back duly ... < 1 year business own car or other, not in attribute Savings account/bonds bank 30 0
18 44 none ... < 100 DM 12579 4 0 <= ... < 200 DM existing credits paid back duly till now ... >= 7 years car (used) for free unknown / no property none 24 1
19 31 none 500 <= ... < 1000 DM 3430 3 no checking account existing credits paid back duly till now ... >= 7 years radio/television own car or other, not in attribute Savings account/bonds none 24 0
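The three filtering rules described for var_filter can be sketched in plain Python. This is only an illustration of the rules as stated in the docstring above, not scorecardpy's implementation; the helper names (`missing_rate`, `identical_rate`, `drop_reasons`) are hypothetical, and the IV is assumed to be computed elsewhere.

```python
from collections import Counter

def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def identical_rate(values):
    """Share of the most frequent value among the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)

def drop_reasons(values, iv, iv_limit=0.02, missing_limit=0.95,
                 identical_limit=0.95):
    """Return the list of rules (hypothetical labels) that a variable violates."""
    reasons = []
    if iv < iv_limit:
        reasons.append("iv")
    if missing_rate(values) > missing_limit:
        reasons.append("missing")
    if identical_rate(values) > identical_limit:
        reasons.append("identical")
    return reasons

# A nearly constant column is flagged even if its IV were acceptable.
nearly_constant = ["yes"] * 97 + ["no"] * 3
# A balanced column with a very low IV is flagged on IV alone.
balanced = ["a", "b"] * 50

print(drop_reasons(nearly_constant, iv=0.5))
print(drop_reasons(balanced, iv=0.01))
print(drop_reasons(balanced, iv=0.13))
```

A variable is kept only when the returned list is empty; var_rm and var_kp would then override this decision in either direction.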

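# 2.2 WoE binning analysis

# T1. Automatic binning with the woebin() function

woebin(dt, y, x=None,
           var_skip=None, breaks_list=None, special_values=None,
           stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
# min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
           positive="bad|1", no_cores=None, print_step=0, method="tree",
           ignore_const_cols=True, ignore_datetime_cols=True,
           check_cate_num=True, replace_blank=True,
           save_breaks_list=None, **kwargs)
'''
Purpose: generates optimal binning results for numeric and categorical variables; method="tree" selects decision-tree binning and method="chimerge" selects chi-square (ChiMerge) binning.
Parameters (see the function source for the full list):
var_skip: variables to skip during binning;
breaks_list: list of break points, empty by default. If provided, binning uses these break points;
special_values: values that should be placed in their own bin, empty by default;
count_distr_limit: minimum share of samples per bin; the usual acceptable range is 0.01-0.2, default 0.05;
stop_limit: binning stops when the IV growth rate falls below stop_limit, or when the chi-square value falls below qchisq(1-stop_limit, 1). The usual acceptable range is 0-0.5, default 0.1;
bin_num_limit: an integer giving the maximum number of bins;
positive: the label that marks positive samples, default "bad|1";
no_cores: number of CPU cores used for parallel computation;
print_step: a non-negative number. The documented default is 1, although the signature above sets it to 0. If print_step>0, the variable name is printed on each iteration. If print_step=0 or no_cores>1, nothing is printed;
method: binning method, either "tree" (decision tree) or "chimerge" (chi-square), default "tree";
ignore_const_cols: whether to ignore constant columns, default True;
ignore_datetime_cols: whether to ignore datetime columns, default True;
check_cate_num: whether to check if a categorical variable has more than 50 distinct values, default True. Too many distinct values slows down binning;
replace_blank: whether to replace blank values with None, default True.
'''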

data_df_woebin['age.in.years']

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,26.0) | 190 | 0.19 | 110 | 80 | 0.421052632 | 0.528844129 | 0.057921024 | 0.130498542 | 26 | FALSE |
| 1 | age.in.years | [26.0,28.0) | 101 | 0.101 | 74 | 27 | 0.267326733 | -0.160930367 | 0.002528906 | 0.130498542 | 28 | FALSE |
| 2 | age.in.years | [28.0,35.0) | 257 | 0.257 | 172 | 85 | 0.3307393 | 0.14245464 | 0.005359008 | 0.130498542 | 35 | FALSE |
| 3 | age.in.years | [35.0,37.0) | 79 | 0.079 | 67 | 12 | 0.151898734 | -0.872488109 | 0.048610052 | 0.130498542 | 37 | FALSE |
| 4 | age.in.years | [37.0,inf) | 373 | 0.373 | 277 | 96 | 0.257372654 | -0.212371454 | 0.016079553 | 0.130498542 | inf | FALSE |
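The woe and bin_iv columns for age.in.years can be reproduced from the good and bad counts alone. The sketch below assumes the convention WoE = ln(bad share / good share) and bin IV = (bad share - good share) × WoE, which agrees with the figures printed for this variable; the function name `woe_iv_table` is hypothetical.

```python
import math

def woe_iv_table(bins):
    """bins: list of (label, good_count, bad_count) tuples.

    Returns (rows, total_iv), where each row carries the bin's WoE and IV.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows = []
    for label, good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(bad_share / good_share)
        bin_iv = (bad_share - good_share) * woe
        rows.append({"bin": label, "woe": woe, "bin_iv": bin_iv})
    total_iv = sum(row["bin_iv"] for row in rows)
    return rows, total_iv

# Good/bad counts copied from the automatic binning result for age.in.years.
age_bins = [
    ("[-inf,26.0)", 110, 80),
    ("[26.0,28.0)", 74, 27),
    ("[28.0,35.0)", 172, 85),
    ("[35.0,37.0)", 67, 12),
    ("[37.0,inf)", 277, 96),
]

rows, total_iv = woe_iv_table(age_bins)
for row in rows:
    print(row["bin"], round(row["woe"], 6), round(row["bin_iv"], 6))
print("total_iv", round(total_iv, 6))
```

A positive WoE marks a bin with a higher share of bad samples than good samples, so the youngest bin carries the most risk under this sign convention.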

# T2. Manual binning with a custom breaks_list argument

data_df_woebin_DIY['age.in.years']

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,25.0) | 149 | 0.149 | 88 | 61 | 0.409395973 | 0.48083491 | 0.037321948 | 0.086291678 | 25 | FALSE |
| 1 | age.in.years | [25.0,35.0) | 399 | 0.399 | 268 | 131 | 0.328320802 | 0.131508203 | 0.007076394 | 0.086291678 | 35 | FALSE |
| 2 | age.in.years | [35.0,45.0) | 251 | 0.251 | 193 | 58 | 0.231075697 | -0.354949318 | 0.029241063 | 0.086291678 | 45 | FALSE |
| 3 | age.in.years | [45.0,inf) | 201 | 0.201 | 151 | 50 | 0.248756219 | -0.257958971 | 0.012652273 | 0.086291678 | inf | FALSE |

# 2.3 Visualize the binned variables to check for monotonicity

Plot the count distribution and the bad probability of each variable's bins.
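Besides reading the plots, monotonicity of the bad probability can be checked numerically. The sketch below uses the badprob values from the two age.in.years binning results; the helper `is_monotonic` is hypothetical, and merging the last two manual bins is only one illustrative adjustment, not a step taken in the original workflow.

```python
def is_monotonic(values):
    """True if the sequence never increases or never decreases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# badprob per bin, copied from the automatic and manual binning tables.
auto_badprob = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_badprob = [0.409395973, 0.328320802, 0.231075697, 0.248756219]

# Merging the manual bins [35,45) and [45,inf): bad = 58 + 50, count = 251 + 201.
merged_badprob = manual_badprob[:2] + [(58 + 50) / (251 + 201)]

print(is_monotonic(auto_badprob))
print(is_monotonic(manual_badprob))
print(is_monotonic(merged_badprob))
```

Neither binning is monotonic as printed, while the merged three-bin version is, which is the kind of adjustment the plots are meant to suggest.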

# 2.4 Apply the WoE binning transformation to the variables

creditability savings.account.and.bonds_woe housing_woe age.in.years_woe other.debtors.or.guarantors_woe purpose_woe credit.amount_woe credit.history_woe installment.rate.in.percentage.of.disposable.income_woe other.installment.plans_woe present.employment.since_woe property_woe status.of.existing.checking.account_woe duration.in.month_woe
0 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 -0.461034959 0.614203978 -1.312186389
1 1 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 -0.461034959 0.614203978 1.134979933
2 0 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 -0.733740578 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
3 0 0.271357844 0.472604411 -0.257958971 0.005115101 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 -0.394415272 0.028573372 0.614203978 0.524524468
4 1 0.271357844 0.472604411 -0.257958971 -0.000525072 0.279920067 0.390539458 0.085157808 -0.064538521 -0.121178625 0.032103245 0.586082361 0.614203978 0.108688306
5 0 -0.762140052 0.472604411 -0.354949318 -0.000525072 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.586082361 -1.176263223 0.524524468
6 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
7 0 0.271357844 0.40444522 -0.354949318 -0.000525072 -0.805625164 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 0.524524468
8 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
9 1 0.271357844 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 -0.733740578 0.157300289 -0.121178625 0.431137463 0.034191365 0.614203978 0.108688306
10 1 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 0.033661283 0.088318617 -0.064538521 -0.121178625 0.431137463 0.034191365 0.614203978 -0.346624608
11 1 0.271357844 0.40444522 0.48083491 -0.000525072 0.279920067 0.390539458 0.088318617 -0.064538521 -0.121178625 0.431137463 0.028573372 0.614203978 1.134979933
12 0 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
13 1 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.034191365 0.614203978 0.108688306
14 0 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
15 1 0.13955188 -0.194156014 0.131508203 -0.000525072 -0.410062817 0.033661283 0.088318617 0.157300289 -0.121178625 0.032103245 0.034191365 0.614203978 0.108688306
16 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
17 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 1.234070835 -0.190472769 0.477550835 0.431137463 0.034191365 0.614203978 0.108688306
18 1 0.271357844 0.472604411 -0.354949318 -0.000525072 -0.805625164 1.170071253 0.088318617 0.157300289 -0.121178625 -0.235566071 0.586082361 0.614203978 0.108688306
19 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.034191365 -1.176263223 0.108688306
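The transformation replaces each raw value with the WoE of the bin it falls into. A minimal sketch for age.in.years, assuming the manual break points 25, 35, 45 and left-closed intervals as shown in the manual binning result; `age_to_woe` is a hypothetical helper, not a scorecardpy function.

```python
from bisect import bisect_right

# Break points and per-bin WoE values from the manual binning of age.in.years.
AGE_BREAKS = [25, 35, 45]
AGE_WOE = [0.48083491, 0.131508203, -0.354949318, -0.257958971]

def age_to_woe(age):
    """Map a raw age to its bin's WoE; bins are left-closed, e.g. [25,35)."""
    return AGE_WOE[bisect_right(AGE_BREAKS, age)]

# Ages of the first six rows of the filtered data.
ages = [67, 22, 49, 45, 53, 35]
print([age_to_woe(age) for age in ages])
```

The result matches the age.in.years_woe column of the first six transformed rows, including the boundary cases 45 and 35, which land in the bin that starts at the break point.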

# 3. Model training

# 3.1 Split the dataset

The output of train2woe is shown below.

age.in.years_woe credit.amount_woe credit.history_woe creditability duration.in.month_woe housing_woe installment.rate.in.percentage.of.disposable.income_woe other.debtors.or.guarantors_woe other.installment.plans_woe present.employment.since_woe property_woe purpose_woe savings.account.and.bonds_woe status.of.existing.checking.account_woe
0 -0.257958971 0.033661283 -0.733740578 0 -1.312186389 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 -0.461034959 -0.410062817 -0.762140052 0.614203978
1 0.48083491 0.390539458 0.088318617 1 1.134979933 -0.194156014 -0.190472769 -0.000525072 -0.121178625 0.032103245 -0.461034959 -0.410062817 0.271357844 0.614203978
2 -0.257958971 -0.258307464 -0.733740578 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 0.279920067 0.271357844 -1.176263223
6 -0.257958971 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.028573372 0.279920067 -0.762140052 -1.176263223
7 -0.354949318 0.390539458 0.088318617 0 0.524524468 0.40444522 -0.190472769 -0.000525072 -0.121178625 0.032103245 0.034191365 -0.805625164 0.271357844 0.614203978
8 -0.257958971 -0.258307464 0.088318617 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 -0.410062817 -0.762140052 -1.176263223
11 0.48083491 0.390539458 0.088318617 1 1.134979933 0.40444522 -0.064538521 -0.000525072 -0.121178625 0.431137463 0.028573372 0.279920067 0.271357844 0.614203978
13 -0.257958971 0.033661283 -0.733740578 1 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.034191365 0.279920067 0.271357844 0.614203978
16 -0.257958971 -0.258307464 -0.733740578 0 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.028573372 -0.410062817 -0.762140052 -1.176263223
18 -0.354949318 1.170071253 0.088318617 1 0.108688306 0.472604411 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.586082361 -0.805625164 0.271357844 0.614203978
19 0.131508203 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.034191365 -0.410062817 -0.762140052 -1.176263223
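The non-contiguous row indices above (0, 1, 2, 6, 7, ...) are what a random split leaves in the training set. A minimal sketch of a seeded random split over row indices; the 70/30 ratio and the seed value 186 are assumptions chosen for illustration, and `split_indices` is a hypothetical helper that will not reproduce the exact rows shown.

```python
import random

def split_indices(n_rows, ratio=0.7, seed=186):
    """Randomly split row indices into train and test sets (sorted)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * ratio)
    return sorted(indices[:cut]), sorted(indices[cut:])

train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
print(train_idx[:10])
```

Fixing the seed makes the split reproducible, so the WoE transformation and the model are always evaluated on the same held-out rows.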

# 3.2 Separate independent and dependent variables

# 3.3 Build, train, and predict: the logistic regression model

coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
  0.         0.23090027 0.7965086  0.22792681 1.07066195 0.83836441
  0.72843684]]

intercept_: [-0.83437247]
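With coef_ and intercept_ in hand, the predicted bad probability of a row is the logistic function of the weighted sum of its WoE values. The sketch below assumes the coefficients follow the column order of the train2woe table with creditability removed; that ordering is an assumption, not something the output states.

```python
import math

# Fitted parameters copied from the output above.
COEF = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772,
        1.06219811, 0.0, 0.23090027, 0.7965086, 0.22792681,
        1.07066195, 0.83836441, 0.72843684]
INTERCEPT = -0.83437247

def predict_bad_probability(woe_values):
    """Logistic regression prediction: sigmoid(intercept + sum(coef * woe))."""
    linear = INTERCEPT + sum(c * w for c, w in zip(COEF, woe_values))
    return 1.0 / (1.0 + math.exp(-linear))

# Row 0 of train2woe, in the assumed column order (creditability dropped).
row0 = [-0.257958971, 0.033661283, -0.733740578, -1.312186389, -0.194156014,
        0.157300289, -0.000525072, -0.121178625, -0.235566071, -0.461034959,
        -0.410062817, -0.762140052, 0.614203978]

p_neutral = predict_bad_probability([0.0] * len(COEF))  # all WoE values at 0
p_row0 = predict_bad_probability(row0)
print(round(p_neutral, 4), round(p_row0, 4))
```

An applicant whose WoE values are all zero gets a probability determined by the intercept alone, which sits close to the 30% bad rate of the full dataset; row 0, with mostly negative WoE values, scores well below that.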

# 3.4 Model evaluation

Evaluate the model with the perf_eva function.

perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"],
 show_plot=True, positive="bad|1", seed=186)
'''
Purpose: evaluates model performance with KS, AUC, Lift curves, and PR curves. plot_type accepts ks, lift, roc, pr.
The perf_eva() function can, from ...
'''
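The two default metrics, KS and AUC, can be computed by hand on a small sample to make their meaning concrete. This is a plain-Python sketch on made-up labels and predictions, not perf_eva's implementation; `auc_and_ks` is a hypothetical helper and label 1 is taken to mean a bad sample.

```python
def auc_and_ks(labels, preds):
    """Return (AUC, KS) for binary labels (1 = bad) and predicted probabilities."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    # AUC: share of (bad, good) pairs where the bad sample scores higher;
    # ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    # KS: largest gap between the cumulative bad rate and good rate
    # over all candidate thresholds.
    ks = 0.0
    for threshold in sorted(set(preds)):
        tpr = sum(p >= threshold for p in pos) / len(pos)
        fpr = sum(p >= threshold for p in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))
    return auc, ks

# Illustrative data only, not taken from the model above.
labels = [1, 1, 0, 1, 0, 0]
preds = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc, ks = auc_and_ks(labels, preds)
print(auc, ks)
```

The pairwise AUC loop is quadratic in the sample size, which is fine for a demonstration but not for production data.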

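# 4. Model deployment and monitoring

# 4.1 Model inference: computing credit scores

Use the scorecard function to map probabilities to scorecard points. The result covers both each customer's total score and the points contributed by each variable.

scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
'''
Purpose: maps probabilities to scorecard points.
Parameters:
bins: binning information, as returned by woebin().
model: the fitted model object.
xcolumns: names of the x variables used in the model.
points0: target points, default 600.
odds0: target odds of being bad, p/(1-p), default 1/19.
pdo: points to double the odds, default 50.
basepoints_eq0: if True, the base points are spread across the variables.
'''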
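The mapping from WoE and coefficients to points can be sketched with the usual scorecard scaling: b = pdo / ln 2, a = points0 + b × ln(odds0), bin points = -b × coef × WoE, base points = a - b × intercept. This is assumed here rather than quoted from scorecardpy, and the helper names are hypothetical; the age.in.years coefficient is taken to be the first entry of coef_.

```python
import math

def scaling_constants(points0=600, odds0=1 / 19, pdo=50):
    """Return (a, b) of the linear scaling score = a - b * log_odds."""
    b = pdo / math.log(2)
    a = points0 + b * math.log(odds0)
    return a, b

def bin_points(coef, woe, b):
    """Points contributed by one bin of one variable."""
    return round(-b * coef * woe)

def base_points(intercept, a, b):
    """Points every applicant starts from (assumed formula)."""
    return round(a - b * intercept)

a, b = scaling_constants()

# Assumption: the first coefficient in coef_ belongs to age.in.years_woe.
age_coef = 0.34206044
age_woe = {
    "[-inf,25.0)": 0.48083491,
    "[25.0,35.0)": 0.131508203,
    "[35.0,45.0)": -0.354949318,
    "[45.0,inf)": -0.257958971,
}
age_points = {label: bin_points(age_coef, woe, b) for label, woe in age_woe.items()}
print(age_points)
print(base_points(-0.83437247, a, b))
```

Under these assumptions the bin points for age.in.years come out as -12, -3, 9, and 6, matching the card printed for that variable; the base points value is not shown in the original output and should be treated as illustrative.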

print('card_dict_age.in.years \n',card_dict['age.in.years'])
print('card_dict_credit.amount \n',card_dict['credit.amount'])
print('card_dict_credit.history \n',card_dict['credit.history'])
print('card_dict_duration.in.month \n',card_dict['duration.in.month'])
print('card_dict_housing \n',card_dict['housing'])

card_dict_age.in.years 
         variable          bin  points
10  age.in.years  [-inf,25.0)   -12.0
11  age.in.years  [25.0,35.0)    -3.0
12  age.in.years  [35.0,45.0)     9.0
13  age.in.years   [45.0,inf)     6.0
card_dict_credit.amount 
          variable              bin  points
31  credit.amount    [-inf,1400.0)    -2.0
32  credit.amount  [1400.0,1800.0)    41.0
33  credit.amount  [1800.0,4000.0)    15.0
34  credit.amount  [4000.0,9200.0)   -22.0
35  credit.amount     [9200.0,inf)   -66.0
card_dict_credit.history 
           variable                                                bin  points
17  credit.history  no credits taken/ all credits paid back duly%,...   -51.0
18  credit.history           existing credits paid back duly till now    -4.0
19  credit.history                    delay in paying off in the past    -4.0
20  credit.history  critical account/ other credits existing (not ...    30.0
card_dict_duration.in.month 
              variable          bin  points
23  duration.in.month   [-inf,8.0)    85.0
24  duration.in.month   [8.0,16.0)    22.0
25  duration.in.month  [16.0,34.0)    -7.0
26  duration.in.month  [34.0,44.0)   -34.0
27  duration.in.month   [44.0,inf)   -74.0
card_dict_housing 
    variable       bin  points
42  housing      rent   -20.0
43  housing       own    10.0
44  housing  for free   -23.0
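Scoring an applicant amounts to looking up each variable's bin in the card and adding up the points. The sketch below hard-codes four of the cards printed above and sums only those variables, so it yields a partial score: base points and the remaining variables are left out, and the data structures and `partial_score` are hypothetical.

```python
from bisect import bisect_right

# Numeric cards: (break points, points per bin), bins are left-closed.
NUMERIC_CARD = {
    "age.in.years": ([25, 35, 45], [-12, -3, 9, 6]),
    "credit.amount": ([1400, 1800, 4000, 9200], [-2, 41, 15, -22, -66]),
    "duration.in.month": ([8, 16, 34, 44], [85, 22, -7, -34, -74]),
}
# Categorical cards: category -> points.
CATEGORICAL_CARD = {
    "housing": {"rent": -20, "own": 10, "for free": -23},
}

def partial_score(applicant):
    """Sum the points of the variables covered by the cards above."""
    total = 0
    for name, (breaks, points) in NUMERIC_CARD.items():
        total += points[bisect_right(breaks, applicant[name])]
    for name, mapping in CATEGORICAL_CARD.items():
        total += mapping[applicant[name]]
    return total

# First two rows of the dataset, restricted to the four variables.
row0 = {"age.in.years": 67, "credit.amount": 1169,
        "duration.in.month": 6, "housing": "own"}
row1 = {"age.in.years": 22, "credit.amount": 5951,
        "duration.in.month": 48, "housing": "own"}
print(partial_score(row0), partial_score(row1))
```

Row 0 (labelled good) collects far more points than row 1 (labelled bad), driven mostly by the short 6-month duration against the 48-month one.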

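# 4.2 Online model evaluation: score stability with PSI

# Use the scorecard_ply() function to compute credit scores for the train and test datasets

scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True,
 var_kp=None)
'''
Purpose: computes credit scores from the result of `scorecard`, converting each record into scorecard points.

dt: the original data.
card: the scorecard generated by `scorecard`.
only_total_score: boolean, default True. If True, the output contains only the total credit score; if False, it contains the total score and the score of each variable.
print_step: a non-negative integer. The documented default is 1, although the signature above sets it to 0. If print_step>0, variable names are printed on every print_step-th iteration. If print_step=0, nothing is printed.
replace_blank_na: boolean. Replaces blank values with NA. Default True. This argument should match the setting used in woebin.
var_kp: names of variables to force-keep, such as an id column. Default None.
'''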
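Once train and test scores are available, the population stability index compares how the two sets distribute over the same score bands: PSI = Σ (actual share - expected share) × ln(actual share / expected share). The sketch below works on made-up band counts and is not scorecardpy's own PSI routine; `psi` is a hypothetical helper.

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index over matching score bands.

    Assumes every band is non-empty in both samples; empty bands would need
    smoothing before taking the logarithm.
    """
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    value = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = expected / expected_total
        actual_share = actual / actual_total
        value += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return value

# Illustrative counts per score band (low, medium, high); not real results.
train_counts = [200, 300, 500]
test_counts = [75, 75, 150]
print(round(psi(train_counts, test_counts), 6))
```

A value near zero means the score distribution has barely moved between the two samples; identical distributions give exactly zero.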

