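ML with LoR: Developing a General-Purpose Credit Risk Scorecard Model with Logistic Regression (LoR) on a Credit Card Dataset, a Full Walkthrough Using the scorecardpy Framework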



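Contents

Full walkthrough: developing a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset

# 1. Define the dataset

# 1.1 Preview part of the data

# 1.2 Summarize the type and count of every variable

# 2. Data preprocessing

# 2.1 Variable filtering

# 2.2 WoE binning of variables

# T1. Automatic binning with the woebin() function

# T2. Manual binning with a custom breaks_list parameter

# 2.3 Visualizing the binned variables to check for monotonicity

# 2.4 Applying the WoE transformation to the variables

# 3. Model training

# 3.1 Split the dataset

# 3.2 Separate the independent and dependent variables

# 3.3 Build, train, and predict: a logistic regression model

# 3.4 Model evaluation

# 4. Deploy and monitor the model

# 4.1 Model inference: computing credit scores

# 4.2 Evaluating the deployed model: score stability (PSI)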

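Related articles

ML with LoR: Developing a General-Purpose Credit Risk Scorecard Model with Logistic Regression (LoR) on a Credit Card Dataset, a Full Walkthrough Using the scorecardpy Framework

ML with LoR: Developing a General-Purpose Credit Risk Scorecard Model with Logistic Regression (LoR) on a Credit Card Dataset, a Full Walkthrough Using the scorecardpy Framework (code implementation)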

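Full walkthrough: developing a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset

# 1. Define the dataset

# Load the German credit dataset: credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks.

Dataset: UCI Machine Learning Repository: Data Set

# 1.1 Preview part of the data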

status.of.existing.checking.account duration.in.month credit.history purpose credit.amount savings.account.and.bonds present.employment.since installment.rate.in.percentage.of.disposable.income personal.status.and.sex other.debtors.or.guarantors present.residence.since property age.in.years other.installment.plans housing number.of.existing.credits.at.this.bank job number.of.people.being.liable.to.provide.maintenance.for telephone foreign.worker creditability
0 ... < 0 DM 6 critical account/ other credits existing (not at this bank) radio/television 1169 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 real estate 67 none own 2 skilled employee / official 1 yes, registered under the customers name yes good
1 0 <= ... < 200 DM 48 existing credits paid back duly till now radio/television 5951 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 real estate 22 none own 1 skilled employee / official 1 none yes bad
2 no checking account 12 critical account/ other credits existing (not at this bank) education 2096 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated none 3 real estate 49 none own 1 unskilled - resident 2 none yes good
3 ... < 0 DM 42 existing credits paid back duly till now furniture/equipment 7882 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated guarantor 4 building society savings agreement/ life insurance 45 none for free 1 skilled employee / official 2 none yes good
4 ... < 0 DM 24 delay in paying off in the past car (new) 4870 ... < 100 DM 1 <= ... < 4 years 3 male : divorced/separated none 4 unknown / no property 53 none for free 2 skilled employee / official 2 none yes bad
5 no checking account 36 existing credits paid back duly till now education 9055 unknown/ no savings account 1 <= ... < 4 years 2 male : divorced/separated none 4 unknown / no property 35 none for free 1 unskilled - resident 2 yes, registered under the customers name yes good
6 no checking account 24 existing credits paid back duly till now furniture/equipment 2835 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 1 skilled employee / official 1 none yes good
7 0 <= ... < 200 DM 36 existing credits paid back duly till now car (used) 6948 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 35 none rent 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes good
8 no checking account 12 existing credits paid back duly till now radio/television 3059 ... >= 1000 DM 4 <= ... < 7 years 2 male : divorced/separated none 4 real estate 61 none own 1 unskilled - resident 1 none yes good
9 0 <= ... < 200 DM 30 critical account/ other credits existing (not at this bank) car (new) 5234 ... < 100 DM unemployed 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 28 none own 2 management/ self-employed/ highly qualified employee/ officer 1 none yes bad
10 0 <= ... < 200 DM 12 existing credits paid back duly till now car (new) 1295 ... < 100 DM ... < 1 year 3 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 25 none rent 1 skilled employee / official 1 none yes bad
11 ... < 0 DM 48 existing credits paid back duly till now business 4308 ... < 100 DM ... < 1 year 3 male : divorced/separated none 4 building society savings agreement/ life insurance 24 none rent 1 skilled employee / official 1 none yes bad
12 0 <= ... < 200 DM 12 existing credits paid back duly till now radio/television 1567 ... < 100 DM 1 <= ... < 4 years 1 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 22 none own 1 skilled employee / official 1 yes, registered under the customers name yes good
13 ... < 0 DM 24 critical account/ other credits existing (not at this bank) car (new) 1199 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 60 none own 2 unskilled - resident 1 none yes bad
14 ... < 0 DM 15 existing credits paid back duly till now car (new) 1403 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 28 none rent 1 skilled employee / official 1 none yes good
15 ... < 0 DM 24 existing credits paid back duly till now radio/television 1282 100 <= ... < 500 DM 1 <= ... < 4 years 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 32 none own 1 unskilled - resident 1 none yes bad
16 no checking account 24 critical account/ other credits existing (not at this bank) radio/television 2424 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 2 skilled employee / official 1 none yes good
17 ... < 0 DM 30 no credits taken/ all credits paid back duly business 8072 unknown/ no savings account ... < 1 year 2 male : divorced/separated none 3 car or other, not in attribute Savings account/bonds 25 bank own 3 skilled employee / official 1 none yes good
18 0 <= ... < 200 DM 24 existing credits paid back duly till now car (used) 12579 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 2 unknown / no property 44 none for free 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes bad
19 no checking account 24 existing credits paid back duly till now radio/television 3430 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 31 none own 1 skilled employee / official 2 yes, registered under the customers name yes good

# 1.2 Summarize the type and count of every variable

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                                                    Non-Null Count  Dtype   
---  ------                                                    --------------  -----   
0   status.of.existing.checking.account                       1000 non-null   category
1   duration.in.month                                         1000 non-null   int64   
2   credit.history                                            1000 non-null   category
3   purpose                                                   1000 non-null   object  
4   credit.amount                                             1000 non-null   int64   
5   savings.account.and.bonds                                 1000 non-null   category
6   present.employment.since                                  1000 non-null   category
7   installment.rate.in.percentage.of.disposable.income       1000 non-null   int64   
8   personal.status.and.sex                                   1000 non-null   category
9   other.debtors.or.guarantors                               1000 non-null   category
10  present.residence.since                                   1000 non-null   int64   
11  property                                                  1000 non-null   category
12  age.in.years                                              1000 non-null   int64   
13  other.installment.plans                                   1000 non-null   category
14  housing                                                   1000 non-null   category
15  number.of.existing.credits.at.this.bank                   1000 non-null   int64   
16  job                                                       1000 non-null   category
17  number.of.people.being.liable.to.provide.maintenance.for  1000 non-null   int64   
18  telephone                                                 1000 non-null   category
19  foreign.worker                                            1000 non-null   category
20  creditability                                             1000 non-null   object  
dtypes: category(12), int64(7), object(2)
memory usage: 84.0+ KB

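# 2. Data preprocessing

# 2.1 Variable filtering

# Use the var_filter function to filter variables by missing rate, IV, and identical-value rate, and specify the target variable y

var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,
           identical_limit=0.95, var_rm=None, var_kp=None,
           return_rm_reason=False, positive='bad|1')
'''
Purpose: a variable is dropped when its IV is below iv_limit (0.02), its missing rate is above missing_limit (95%), or its identical-value rate (excluding missing values) is above identical_limit (95%).
Parameters (see the function's source for the full list):
var_rm: variables to force-remove; default None.
var_kp: variables to force-keep; default None.
return_rm_reason: whether to return the reason each variable was removed; default False (not returned).
positive: the value that marks bad samples; default "bad|1".
'''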
age.in.years other.debtors.or.guarantors savings.account.and.bonds credit.amount installment.rate.in.percentage.of.disposable.income status.of.existing.checking.account credit.history present.employment.since purpose housing property other.installment.plans duration.in.month creditability
0 67 none unknown/ no savings account 1169 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own real estate none 6 0
1 22 none ... < 100 DM 5951 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own real estate none 48 1
2 49 none ... < 100 DM 2096 2 no checking account critical account/ other credits existing (not at this bank) 4 <= ... < 7 years education own real estate none 12 0
3 45 guarantor ... < 100 DM 7882 2 ... < 0 DM existing credits paid back duly till now 4 <= ... < 7 years furniture/equipment for free building society savings agreement/ life insurance none 42 0
4 53 none ... < 100 DM 4870 3 ... < 0 DM delay in paying off in the past 1 <= ... < 4 years car (new) for free unknown / no property none 24 1
5 35 none unknown/ no savings account 9055 2 no checking account existing credits paid back duly till now 1 <= ... < 4 years education for free unknown / no property none 36 0
6 53 none 500 <= ... < 1000 DM 2835 3 no checking account existing credits paid back duly till now ... >= 7 years furniture/equipment own building society savings agreement/ life insurance none 24 0
7 35 none ... < 100 DM 6948 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years car (used) rent car or other, not in attribute Savings account/bonds none 36 0
8 61 none ... >= 1000 DM 3059 2 no checking account existing credits paid back duly till now 4 <= ... < 7 years radio/television own real estate none 12 0
9 28 none ... < 100 DM 5234 4 0 <= ... < 200 DM critical account/ other credits existing (not at this bank) unemployed car (new) own car or other, not in attribute Savings account/bonds none 30 1
10 25 none ... < 100 DM 1295 3 0 <= ... < 200 DM existing credits paid back duly till now ... < 1 year car (new) rent car or other, not in attribute Savings account/bonds none 12 1
11 24 none ... < 100 DM 4308 3 ... < 0 DM existing credits paid back duly till now ... < 1 year business rent building society savings agreement/ life insurance none 48 1
12 22 none ... < 100 DM 1567 1 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 12 0
13 60 none ... < 100 DM 1199 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years car (new) own car or other, not in attribute Savings account/bonds none 24 1
14 28 none ... < 100 DM 1403 2 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years car (new) rent car or other, not in attribute Savings account/bonds none 15 0
15 32 none 100 <= ... < 500 DM 1282 4 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 24 1
16 53 none unknown/ no savings account 2424 4 no checking account critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own building society savings agreement/ life insurance none 24 0
17 25 none unknown/ no savings account 8072 2 ... < 0 DM no credits taken/ all credits paid back duly ... < 1 year business own car or other, not in attribute Savings account/bonds bank 30 0
18 44 none ... < 100 DM 12579 4 0 <= ... < 200 DM existing credits paid back duly till now ... >= 7 years car (used) for free unknown / no property none 24 1
19 31 none 500 <= ... < 1000 DM 3430 3 no checking account existing credits paid back duly till now ... >= 7 years radio/television own car or other, not in attribute Savings account/bonds none 24 0
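
The three filtering rules described above can be sketched in plain Python. This is only an illustration of the thresholds, not scorecardpy's implementation: the helper names (`missing_rate`, `identical_rate`, `removal_reasons`) and the sample columns are hypothetical, and the IV is assumed to have been computed elsewhere.

```python
from collections import Counter

def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def identical_rate(values):
    """Share of the most frequent value among the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)

def removal_reasons(iv, values, iv_limit=0.02, missing_limit=0.95,
                    identical_limit=0.95):
    """Return the rules a column violates; an empty list means 'keep'."""
    reasons = []
    if iv < iv_limit:
        reasons.append("iv")
    if missing_rate(values) > missing_limit:
        reasons.append("missing")
    if identical_rate(values) > identical_limit:
        reasons.append("identical")
    return reasons

# Hypothetical columns, for illustration only.
print(removal_reasons(0.01, ["a", "b", "a", "b"]))          # low IV
print(removal_reasons(0.30, ["a"] * 99 + ["b"]))            # nearly constant
print(removal_reasons(0.30, [None] * 97 + ["a", "b", "c"])) # mostly missing
```

In the table above, 13 predictors plus creditability remain after filtering, and creditability has been recoded to 0/1.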

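# 2.2 WoE binning of variables

# T1. Automatic binning with the woebin() function

woebin(dt, y, x=None,
       var_skip=None, breaks_list=None, special_values=None,
       stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
       # min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
       positive="bad|1", no_cores=None, print_step=0, method="tree",
       ignore_const_cols=True, ignore_datetime_cols=True,
       check_cate_num=True, replace_blank=True,
       save_breaks_list=None, **kwargs)
'''
Purpose: generates optimal bins for both numeric and categorical variables; method="tree" selects decision-tree binning and method="chimerge" selects chi-square (ChiMerge) binning.
Parameters (see the function's source for the full list):
var_skip: variables to skip during binning.
breaks_list: list of break points; default None. If provided, binning uses these break points.
special_values: values that should be placed in their own bins; default None.
count_distr_limit: minimum share of observations per bin; the generally accepted range is 0.01-0.2, default 0.05.
stop_limit: binning stops when the growth rate of IV falls below stop_limit, or when the chi-square value is below qchisq(1-stop_limit, 1). The generally accepted range is 0-0.5, default 0.1.
bin_num_limit: an integer, the maximum number of bins.
positive: the label of the positive (bad) class; default "bad|1".
no_cores: number of CPU cores used for parallel computation.
print_step: a non-negative integer (0 in the signature above). If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0 or no_cores>1, nothing is printed.
method: binning method, "tree" (decision tree) or "chimerge" (chi-square); default "tree".
ignore_const_cols: whether to ignore constant columns; default True.
ignore_datetime_cols: whether to ignore datetime columns; default True.
check_cate_num: whether to check if a categorical variable has more than 50 distinct values; default True. Too many distinct values slow down binning.
replace_blank: whether to replace blank values with None; default True.
'''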

data_df_woebin['age.in.years']

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,26.0) | 190 | 0.19 | 110 | 80 | 0.421052632 | 0.528844129 | 0.057921024 | 0.130498542 | 26 | FALSE |
| 1 | age.in.years | [26.0,28.0) | 101 | 0.101 | 74 | 27 | 0.267326733 | -0.160930367 | 0.002528906 | 0.130498542 | 28 | FALSE |
| 2 | age.in.years | [28.0,35.0) | 257 | 0.257 | 172 | 85 | 0.3307393 | 0.14245464 | 0.005359008 | 0.130498542 | 35 | FALSE |
| 3 | age.in.years | [35.0,37.0) | 79 | 0.079 | 67 | 12 | 0.151898734 | -0.872488109 | 0.048610052 | 0.130498542 | 37 | FALSE |
| 4 | age.in.years | [37.0,inf) | 373 | 0.373 | 277 | 96 | 0.257372654 | -0.212371454 | 0.016079553 | 0.130498542 | inf | FALSE |
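
To make the woe and bin_iv columns concrete, here is a small sketch that recomputes them from the good/bad counts in the table above. It assumes the usual convention woe = ln(bad share / good share) and bin_iv = (bad share - good share) * woe, which appears to match the printed values; the function name `woe_iv` is my own.

```python
import math

# (bin label, good count, bad count) copied from the age.in.years table above.
age_bins = [
    ("[-inf,26.0)", 110, 80),
    ("[26.0,28.0)", 74, 27),
    ("[28.0,35.0)", 172, 85),
    ("[35.0,37.0)", 67, 12),
    ("[37.0,inf)", 277, 96),
]

def woe_iv(bins):
    """Return per-bin (label, woe, bin_iv) rows and the total IV."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows = []
    for label, good, bad in bins:
        bad_share = bad / total_bad      # share of all bad samples in this bin
        good_share = good / total_good   # share of all good samples in this bin
        woe = math.log(bad_share / good_share)
        rows.append((label, woe, (bad_share - good_share) * woe))
    return rows, sum(bin_iv for _, _, bin_iv in rows)

rows, total_iv = woe_iv(age_bins)
for label, woe, bin_iv in rows:
    print(f"{label:<12} woe={woe:+.6f} bin_iv={bin_iv:.6f}")
print(f"total_iv={total_iv:.6f}")
```

A positive WoE marks a bin with a higher-than-average share of bad samples, which is why the youngest bin carries the largest positive value.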

# T2. Manual binning with a custom breaks_list parameter

data_df_woebin_DIY['age.in.years']

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,25.0) | 149 | 0.149 | 88 | 61 | 0.409395973 | 0.48083491 | 0.037321948 | 0.086291678 | 25 | FALSE |
| 1 | age.in.years | [25.0,35.0) | 399 | 0.399 | 268 | 131 | 0.328320802 | 0.131508203 | 0.007076394 | 0.086291678 | 35 | FALSE |
| 2 | age.in.years | [35.0,45.0) | 251 | 0.251 | 193 | 58 | 0.231075697 | -0.354949318 | 0.029241063 | 0.086291678 | 45 | FALSE |
| 3 | age.in.years | [45.0,inf) | 201 | 0.201 | 151 | 50 | 0.248756219 | -0.257958971 | 0.012652273 | 0.086291678 | inf | FALSE |

# 2.3 Visualizing the binned variables to check for monotonicity

Visualize the count distribution and bad probability of each variable's bins.
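
Besides reading the plots, monotonicity can be checked numerically. The sketch below uses the badprob values of age.in.years from the two binning tables above; the helper names `is_monotonic` and `direction_changes` are hypothetical and not part of scorecardpy.

```python
def is_monotonic(values):
    """True if the sequence never rises or never falls."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def direction_changes(values):
    """Count how often the trend flips between rising and falling."""
    signs = [1 if b > a else -1 for a, b in zip(values, values[1:]) if a != b]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# badprob per bin, in bin order, copied from the tables above.
auto_badprob = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_badprob = [0.409395973, 0.328320802, 0.231075697, 0.248756219]

for name, probs in [("automatic", auto_badprob), ("manual", manual_badprob)]:
    print(name, "monotonic:", is_monotonic(probs),
          "direction changes:", direction_changes(probs))
```

On these numbers neither binning is strictly monotonic, but the manual bins change direction only once, at the last bin, while the automatic bins zigzag.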

# 2.4 Applying the WoE transformation to the variables

creditability savings.account.and.bonds_woe housing_woe age.in.years_woe other.debtors.or.guarantors_woe purpose_woe credit.amount_woe credit.history_woe installment.rate.in.percentage.of.disposable.income_woe other.installment.plans_woe present.employment.since_woe property_woe status.of.existing.checking.account_woe duration.in.month_woe
0 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 -0.461034959 0.614203978 -1.312186389
1 1 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 -0.461034959 0.614203978 1.134979933
2 0 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 -0.733740578 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
3 0 0.271357844 0.472604411 -0.257958971 0.005115101 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 -0.394415272 0.028573372 0.614203978 0.524524468
4 1 0.271357844 0.472604411 -0.257958971 -0.000525072 0.279920067 0.390539458 0.085157808 -0.064538521 -0.121178625 0.032103245 0.586082361 0.614203978 0.108688306
5 0 -0.762140052 0.472604411 -0.354949318 -0.000525072 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.586082361 -1.176263223 0.524524468
6 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
7 0 0.271357844 0.40444522 -0.354949318 -0.000525072 -0.805625164 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 0.524524468
8 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
9 1 0.271357844 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 -0.733740578 0.157300289 -0.121178625 0.431137463 0.034191365 0.614203978 0.108688306
10 1 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 0.033661283 0.088318617 -0.064538521 -0.121178625 0.431137463 0.034191365 0.614203978 -0.346624608
11 1 0.271357844 0.40444522 0.48083491 -0.000525072 0.279920067 0.390539458 0.088318617 -0.064538521 -0.121178625 0.431137463 0.028573372 0.614203978 1.134979933
12 0 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
13 1 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.034191365 0.614203978 0.108688306
14 0 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
15 1 0.13955188 -0.194156014 0.131508203 -0.000525072 -0.410062817 0.033661283 0.088318617 0.157300289 -0.121178625 0.032103245 0.034191365 0.614203978 0.108688306
16 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
17 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 1.234070835 -0.190472769 0.477550835 0.431137463 0.034191365 0.614203978 0.108688306
18 1 0.271357844 0.472604411 -0.354949318 -0.000525072 -0.805625164 1.170071253 0.088318617 0.157300289 -0.121178625 -0.235566071 0.586082361 0.614203978 0.108688306
19 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.034191365 -1.176263223 0.108688306
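
The transformation replaces each raw value with the WoE of the bin it falls into. A minimal sketch for age.in.years, assuming the manual break points 25, 35, 45 and left-closed intervals as shown in the manual-binning table; the function name `age_to_woe` is my own.

```python
from bisect import bisect_right

# Break points and per-bin WoE values from the manual-binning table:
# [-inf,25), [25,35), [35,45), [45,inf)
AGE_BREAKS = [25, 35, 45]
AGE_WOE = [0.48083491, 0.131508203, -0.354949318, -0.257958971]

def age_to_woe(age):
    """Map a raw age to the WoE of its left-closed bin."""
    # bisect_right sends a value equal to a break point into the bin on its right.
    return AGE_WOE[bisect_right(AGE_BREAKS, age)]

# Ages of the first six rows of the filtered data.
for age in [67, 22, 49, 45, 53, 35]:
    print(age, age_to_woe(age))
```

The results line up with the age.in.years_woe column of the first six rows in the table above.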

# 3. Model training

# 3.1 Split the dataset

The output of train2woe is shown below.

age.in.years_woe credit.amount_woe credit.history_woe creditability duration.in.month_woe housing_woe installment.rate.in.percentage.of.disposable.income_woe other.debtors.or.guarantors_woe other.installment.plans_woe present.employment.since_woe property_woe purpose_woe savings.account.and.bonds_woe status.of.existing.checking.account_woe
0 -0.257958971 0.033661283 -0.733740578 0 -1.312186389 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 -0.461034959 -0.410062817 -0.762140052 0.614203978
1 0.48083491 0.390539458 0.088318617 1 1.134979933 -0.194156014 -0.190472769 -0.000525072 -0.121178625 0.032103245 -0.461034959 -0.410062817 0.271357844 0.614203978
2 -0.257958971 -0.258307464 -0.733740578 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 0.279920067 0.271357844 -1.176263223
6 -0.257958971 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.028573372 0.279920067 -0.762140052 -1.176263223
7 -0.354949318 0.390539458 0.088318617 0 0.524524468 0.40444522 -0.190472769 -0.000525072 -0.121178625 0.032103245 0.034191365 -0.805625164 0.271357844 0.614203978
8 -0.257958971 -0.258307464 0.088318617 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 -0.410062817 -0.762140052 -1.176263223
11 0.48083491 0.390539458 0.088318617 1 1.134979933 0.40444522 -0.064538521 -0.000525072 -0.121178625 0.431137463 0.028573372 0.279920067 0.271357844 0.614203978
13 -0.257958971 0.033661283 -0.733740578 1 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.034191365 0.279920067 0.271357844 0.614203978
16 -0.257958971 -0.258307464 -0.733740578 0 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.028573372 -0.410062817 -0.762140052 -1.176263223
18 -0.354949318 1.170071253 0.088318617 1 0.108688306 0.472604411 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.586082361 -0.805625164 0.271357844 0.614203978
19 0.131508203 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.034191365 -0.410062817 -0.762140052 -1.176263223
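
The article does not show the split call itself, so the following is only a sketch of a seeded, stratified split in plain Python. The ratio of 0.7 and the seed are assumptions for illustration, not values taken from the source, and `split_indices` is a hypothetical helper.

```python
import random

def split_indices(labels, ratio=0.7, seed=0):
    """Split row indices into train/test, keeping the class mix in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                   # reproducible for a fixed seed
        cut = round(len(idx) * ratio)
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

# 700 good (0) and 300 bad (1) samples, as in the German credit data.
labels = [0] * 700 + [1] * 300
train_idx, test_idx = split_indices(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in train_idx), "bad samples in train")
```

Keeping the original row index after the split, as the train2woe output above does, makes it easy to trace a training row back to the raw data.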

# 3.2 Separate the independent and dependent variables

# 3.3 Build, train, and predict: a logistic regression model

coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
  0.         0.23090027 0.7965086  0.22792681 1.07066195 0.83836441
  0.72843684]]

intercept_: [-0.83437247]
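
To see what these numbers do, the sketch below turns one WoE row into a predicted bad probability. It assumes the coefficients follow the column order of the train2woe table with creditability removed; that ordering is my assumption, although it is consistent with the scorecard points shown later in the article.

```python
import math

# Coefficients and intercept copied from the output above.
COEF = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772, 1.06219811,
        0.0, 0.23090027, 0.7965086, 0.22792681, 1.07066195, 0.83836441,
        0.72843684]
INTERCEPT = -0.83437247

# WoE values of row 0 from the train2woe table, creditability column removed.
# Assumption: the coefficients are in this same column order.
ROW0_WOE = [-0.257958971, 0.033661283, -0.733740578, -1.312186389,
            -0.194156014, 0.157300289, -0.000525072, -0.121178625,
            -0.235566071, -0.461034959, -0.410062817, -0.762140052,
            0.614203978]

def bad_probability(woe_row, coef=COEF, intercept=INTERCEPT):
    """Logistic regression: sigmoid of intercept + sum(coef * woe)."""
    logit = intercept + sum(c * w for c, w in zip(coef, woe_row))
    return 1.0 / (1.0 + math.exp(-logit))

p_bad = bad_probability(ROW0_WOE)
print(f"P(bad) for row 0: {p_bad:.4f}")
```

Row 0 is labelled good (creditability 0) and gets a low bad probability. Under the same ordering assumption, the coefficient that is exactly 0 belongs to other.debtors.or.guarantors_woe.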

# 3.4 Model evaluation

Evaluate the model with the perf_eva function.

perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"],
         show_plot=True, positive="bad|1", seed=186)
'''
Purpose: evaluates model performance with KS, AUC, the Lift curve, and the PR curve. plot_type accepts ks, lift, roc, and pr.
The perf_eva() function can ...
'''
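
For readers who want to see what KS and AUC measure, here is a plain-Python sketch. It is not how perf_eva computes them internally; the function names and the four-sample data are hypothetical, with labels using 1 for bad and 0 for good.

```python
def ks_statistic(labels, scores):
    """Largest gap between the cumulative bad rate and cumulative good rate.

    Samples are visited from highest to lowest score; tied scores are not
    grouped, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    for i in order:
        if labels[i] == 1:
            cum_bad += 1
        else:
            cum_good += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks

def auc(labels, scores):
    """Probability that a random bad sample outscores a random good one."""
    bad = [s for y, s in zip(labels, scores) if y == 1]
    good = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0
               for b in bad for g in good)
    return wins / (len(bad) * len(good))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # predicted bad probabilities
print("KS :", ks_statistic(labels, scores))
print("AUC:", auc(labels, scores))
```

The pairwise AUC is quadratic in the sample size, which is fine for 1000 rows but not for large portfolios.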

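# 4. Deploy and monitor the model

# 4.1 Model inference: computing credit scores

Use the scorecard function to map predicted probabilities to scorecard points. The result includes each customer's total score and the score contributed by each individual variable.

scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
'''
Purpose: maps probabilities to scorecard points.
Parameters:
bins: binning information, the result returned by woebin().
model: the fitted model object.
points0: target points, default 600.
odds0: target odds, p/(1-p) where p is the bad probability, default 1/19.
pdo: points to double the odds, default 50.
basepoints_eq0: if True, the base points are distributed across the variables.
'''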

print('card_dict_age.in.years \n',card_dict['age.in.years'])

print('card_dict_credit.amount \n',card_dict['credit.amount'])

print('card_dict_credit.history \n',card_dict['credit.history'])

print('card_dict_duration.in.month \n',card_dict['duration.in.month'])

print('card_dict_housing \n',card_dict['housing'])

card_dict_age.in.years 
         variable          bin  points
10  age.in.years  [-inf,25.0)   -12.0
11  age.in.years  [25.0,35.0)    -3.0
12  age.in.years  [35.0,45.0)     9.0
13  age.in.years   [45.0,inf)     6.0
card_dict_credit.amount 
          variable              bin  points
31  credit.amount    [-inf,1400.0)    -2.0
32  credit.amount  [1400.0,1800.0)    41.0
33  credit.amount  [1800.0,4000.0)    15.0
34  credit.amount  [4000.0,9200.0)   -22.0
35  credit.amount     [9200.0,inf)   -66.0
card_dict_credit.history 
           variable                                                bin  points
17  credit.history  no credits taken/ all credits paid back duly%,...   -51.0
18  credit.history           existing credits paid back duly till now    -4.0
19  credit.history                    delay in paying off in the past    -4.0
20  credit.history  critical account/ other credits existing (not ...    30.0
card_dict_duration.in.month 
              variable          bin  points
23  duration.in.month   [-inf,8.0)    85.0
24  duration.in.month   [8.0,16.0)    22.0
25  duration.in.month  [16.0,34.0)    -7.0
26  duration.in.month  [34.0,44.0)   -34.0
27  duration.in.month   [44.0,inf)   -74.0
card_dict_housing 
    variable       bin  points
42  housing      rent   -20.0
43  housing       own    10.0
44  housing  for free   -23.0
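
The points above can be reproduced by hand with the common scorecard scaling b = pdo / ln(2), a = points0 + b * ln(odds0), and points per bin = round(-b * coef * woe). This is a sketch under that assumption rather than a copy of scorecardpy's code; it also assumes the first coefficient belongs to age.in.years and the fifth to housing. The base points are not printed in the article, so the value computed here is not checked against it.

```python
import math

def scaling(points0=600, odds0=1 / 19, pdo=50):
    """Return (a, b) so that score = a - b * logit(bad probability)."""
    b = pdo / math.log(2)
    a = points0 + b * math.log(odds0)
    return a, b

def bin_points(coef, woe_values, b):
    """Points contributed by each bin of one variable."""
    return [round(-b * coef * woe) for woe in woe_values]

a, b = scaling()

# Coefficients from the model output; WoE values from the binning results.
age_points = bin_points(0.34206044,
                        [0.48083491, 0.131508203, -0.354949318, -0.257958971], b)
housing_points = bin_points(0.67956772,
                            [0.40444522, -0.194156014, 0.472604411], b)  # rent, own, for free
base_points = round(a - b * (-0.83437247))  # intercept_ from the model output

print("age.in.years:", age_points)
print("housing     :", housing_points)
print("base points :", base_points)
```

With this scaling, a customer whose bad odds equal odds0 scores exactly points0, and every doubling of the bad odds lowers the score by pdo points.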

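# 4.2 Evaluating the deployed model: score stability (PSI)

# Use the scorecard_ply() function to compute credit scores for the train and test datasets

scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True,
              var_kp=None)
'''
Purpose: computes credit scores from the scorecard produced by `scorecard`.

dt: the original data.
card: the scorecard generated by `scorecard`.
only_total_score: boolean, default True. If True, the output contains only the total credit score; if False, it contains the total score and the score of each variable.
print_step: a non-negative integer (0 in the signature above). If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0, nothing is printed.
replace_blank_na: boolean, default True. Replaces blank values with NA. This should match the setting used in woebin.
var_kp: names of variables to force-keep, such as an id column; default None.
'''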
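
Once train and test scores are available, the PSI compares how they are distributed across score bands. The sketch below uses the standard formula PSI = sum((actual share - expected share) * ln(actual share / expected share)); the band counts are made up for illustration, and the function assumes every band is non-empty in both samples.

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index between two binned score distributions."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = expected / expected_total
        a_share = actual / actual_total
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total

# Hypothetical counts per score band (low, middle, high scores).
train_counts = [140, 210, 350]   # shares 0.20, 0.30, 0.50
test_counts = [75, 75, 150]      # shares 0.25, 0.25, 0.50

print(f"PSI = {psi(train_counts, test_counts):.4f}")
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable, though the cutoff is a convention and should be set for the portfolio at hand.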

