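ML/LoR: How to Develop a General-Purpose Credit Risk Scorecard Model with the LoR (Logistic Regression) Algorithm on a Credit Card Dataset: A Full Walkthrough with the scorecardpy Framework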



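Contents

End-to-end walkthrough: developing a general-purpose credit risk scorecard with LoR logistic regression on a credit card dataset

# 1. Define the dataset

# 1.1 Inspect a sample of the data

# 1.2 Summarize variable types, counts and related information

# 2. Data preprocessing

# 2.1 Variable screening

# 2.2 WOE binning analysis

# T1. Automatic binning with woebin()

# T2. Manual binning with a custom breaks_list

# 2.3 Visualize the binned variables and check for monotonicity

# 2.4 Apply the WOE transformation to the variables

# 3. Model training

# 3.1 Split the dataset

# 3.2 Separate the features and the target

# 3.3 Build, train and predict: the logistic regression model

# 3.4 Model evaluation

# 4. Deploy and monitor the model

# 4.1 Inference: compute credit scores

# 4.2 Online evaluation: score stability (PSI)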



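End-to-end walkthrough: developing a general-purpose credit risk scorecard with LoR logistic regression on a credit card dataset

# 1. Define the dataset

Load the German Credit dataset: credit data in which each debtor, described by a set of attributes, is classified as a good or a bad credit risk.

Dataset source: UCI Machine Learning Repository (German Credit data set).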

# 1.1 Inspect a sample of the data

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
| 2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
| 3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
| 4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
| 5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
| 6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
| 7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
| 8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
| 9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
| 10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
| 14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
| 15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
| 16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
| 17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
| 18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
| 19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |

# 1.2 Summarize variable types, counts and related information

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                                                    Non-Null Count  Dtype   
---  ------                                                    --------------  -----   
 0   status.of.existing.checking.account                       1000 non-null   category
 1   duration.in.month                                         1000 non-null   int64   
 2   credit.history                                            1000 non-null   category
 3   purpose                                                   1000 non-null   object
 4   credit.amount                                             1000 non-null   int64   
 5   savings.account.and.bonds                                 1000 non-null   category
 6   present.employment.since                                  1000 non-null   category
 7   installment.rate.in.percentage.of.disposable.income       1000 non-null   int64   
 8   personal.status.and.sex                                   1000 non-null   category
 9   other.debtors.or.guarantors                               1000 non-null   category
 10  present.residence.since                                   1000 non-null   int64   
 11  property                                                  1000 non-null   category
 12  age.in.years                                              1000 non-null   int64   
 13  other.installment.plans                                   1000 non-null   category
 14  housing                                                   1000 non-null   category
 15  number.of.existing.credits.at.this.bank                   1000 non-null   int64   
 16  job                                                       1000 non-null   category
 17  number.of.people.being.liable.to.provide.maintenance.for  1000 non-null   int64   
 18  telephone                                                 1000 non-null   category
 19  foreign.worker                                            1000 non-null   category
 20  creditability                                             1000 non-null   object
dtypes: category(12), int64(7), object(2)
memory usage: 84.0+ KB
```

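# 2. Data preprocessing

# 2.1 Variable screening

Use var_filter() to screen variables by missing rate, information value (IV), identical-value rate and similar criteria, specifying the target variable y.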

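```python
var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,
           identical_limit=0.95, var_rm=None, var_kp=None,
           return_rm_reason=False, positive='bad|1')
'''
Purpose: drop a variable if its information value (IV) is below iv_limit (0.02),
its missing rate exceeds missing_limit (95%), or its identical-value rate
(excluding missing values) exceeds identical_limit (95%).
Other parameters (see the function's docstring for details):
var_rm: variables to force-remove, default None;
var_kp: variables to force-keep, default None;
return_rm_reason: whether to also return the reason each variable was removed, default False;
positive: label value(s) that mark a bad sample, default 'bad|1'.
'''
```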
Filtered data (first 20 rows). Seven of the original 20 variables were dropped, and the target creditability is now coded 1 = bad, 0 = good:

| | age.in.years | other.debtors.or.guarantors | savings.account.and.bonds | credit.amount | installment.rate.in.percentage.of.disposable.income | status.of.existing.checking.account | credit.history | present.employment.since | purpose | housing | property | other.installment.plans | duration.in.month | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67 | none | unknown/ no savings account | 1169 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | real estate | none | 6 | 0 |
| 1 | 22 | none | ... < 100 DM | 5951 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | real estate | none | 48 | 1 |
| 2 | 49 | none | ... < 100 DM | 2096 | 2 | no checking account | critical account/ other credits existing (not at this bank) | 4 <= ... < 7 years | education | own | real estate | none | 12 | 0 |
| 3 | 45 | guarantor | ... < 100 DM | 7882 | 2 | ... < 0 DM | existing credits paid back duly till now | 4 <= ... < 7 years | furniture/equipment | for free | building society savings agreement/ life insurance | none | 42 | 0 |
| 4 | 53 | none | ... < 100 DM | 4870 | 3 | ... < 0 DM | delay in paying off in the past | 1 <= ... < 4 years | car (new) | for free | unknown / no property | none | 24 | 1 |
| 5 | 35 | none | unknown/ no savings account | 9055 | 2 | no checking account | existing credits paid back duly till now | 1 <= ... < 4 years | education | for free | unknown / no property | none | 36 | 0 |
| 6 | 53 | none | 500 <= ... < 1000 DM | 2835 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | furniture/equipment | own | building society savings agreement/ life insurance | none | 24 | 0 |
| 7 | 35 | none | ... < 100 DM | 6948 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (used) | rent | car or other, not in attribute Savings account/bonds | none | 36 | 0 |
| 8 | 61 | none | ... >= 1000 DM | 3059 | 2 | no checking account | existing credits paid back duly till now | 4 <= ... < 7 years | radio/television | own | real estate | none | 12 | 0 |
| 9 | 28 | none | ... < 100 DM | 5234 | 4 | 0 <= ... < 200 DM | critical account/ other credits existing (not at this bank) | unemployed | car (new) | own | car or other, not in attribute Savings account/bonds | none | 30 | 1 |
| 10 | 25 | none | ... < 100 DM | 1295 | 3 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... < 1 year | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 12 | 1 |
| 11 | 24 | none | ... < 100 DM | 4308 | 3 | ... < 0 DM | existing credits paid back duly till now | ... < 1 year | business | rent | building society savings agreement/ life insurance | none | 48 | 1 |
| 12 | 22 | none | ... < 100 DM | 1567 | 1 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 12 | 0 |
| 13 | 60 | none | ... < 100 DM | 1199 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | car (new) | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
| 14 | 28 | none | ... < 100 DM | 1403 | 2 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 15 | 0 |
| 15 | 32 | none | 100 <= ... < 500 DM | 1282 | 4 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
| 16 | 53 | none | unknown/ no savings account | 2424 | 4 | no checking account | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | building society savings agreement/ life insurance | none | 24 | 0 |
| 17 | 25 | none | unknown/ no savings account | 8072 | 2 | ... < 0 DM | no credits taken/ all credits paid back duly | ... < 1 year | business | own | car or other, not in attribute Savings account/bonds | bank | 30 | 0 |
| 18 | 44 | none | ... < 100 DM | 12579 | 4 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... >= 7 years | car (used) | for free | unknown / no property | none | 24 | 1 |
| 19 | 31 | none | 500 <= ... < 1000 DM | 3430 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 0 |
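The three screening rules of var_filter() are simple to state in code. Below is a minimal plain-Python sketch of them, not scorecardpy's implementation: it treats every distinct value as its own bin when computing IV, and replacing an empty good/bad cell by 0.5 (to keep the logarithm finite) is an assumption of this sketch. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def identical_rate(values):
    """Share of the most frequent value among the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)


def information_value(values, labels):
    """IV with each distinct value as one bin; labels use 1 = bad, 0 = good."""
    bad = Counter(v for v, y in zip(values, labels) if y == 1)
    good = Counter(v for v, y in zip(values, labels) if y == 0)
    n_bad, n_good = sum(bad.values()), sum(good.values())
    iv = 0.0
    for v in set(values):
        # an empty cell is replaced by 0.5 (sketch assumption) so the log stays finite
        p_bad = (bad[v] or 0.5) / n_bad
        p_good = (good[v] or 0.5) / n_good
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def var_filter_sketch(columns, labels, iv_limit=0.02,
                      missing_limit=0.95, identical_limit=0.95):
    """Return (kept variable names, {removed variable name: reason})."""
    kept, removed = [], {}
    for name, values in columns.items():
        if missing_rate(values) > missing_limit:
            removed[name] = "missing rate"
        elif identical_rate(values) > identical_limit:
            removed[name] = "identical rate"
        elif information_value(values, labels) < iv_limit:
            removed[name] = "info value"
        else:
            kept.append(name)
    return kept, removed


labels = [1, 0, 1, 0, 1, 0, 1, 0]
columns = {
    "informative": ["x", "y"] * 4,       # separates bad from good perfectly
    "constant": ["c"] * 8,               # a single value
    "all_missing": [None] * 8,           # nothing recorded
    "noise": ["p", "p", "q", "q"] * 2,   # same mix of values in both classes
}
kept, removed = var_filter_sketch(columns, labels)
print(kept)     # ['informative']
print(removed)  # {'constant': 'identical rate', 'all_missing': 'missing rate', 'noise': 'info value'}
```

The checks run in the same order as the prose lists them; in this sketch a variable is reported under the first rule it violates.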

# 2.2 WOE binning analysis

# T1. Automatic binning with woebin()

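```python
woebin(dt, y, x=None,
       var_skip=None, breaks_list=None, special_values=None,
       stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
       # min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
       positive="bad|1", no_cores=None, print_step=0, method="tree",
       ignore_const_cols=True, ignore_datetime_cols=True,
       check_cate_num=True, replace_blank=True,
       save_breaks_list=None, **kwargs)
'''
Purpose: generate optimal bins for numeric and categorical variables;
method="tree" selects decision-tree binning and method="chimerge" selects ChiMerge (chi-square) binning.
Parameters (see the function's docstring for details):
var_skip: variables to exclude from binning;
breaks_list: list of break points, default None. If given, variables are binned at these break points;
special_values: values to be placed in separate bins, default None;
count_distr_limit: minimum share of samples in each bin, typically 0.01-0.2, default 0.05;
stop_limit: with tree binning, splitting stops when the IV gain ratio falls below stop_limit;
    with chimerge, merging stops once the chi-square statistic of every pair of adjacent bins
    exceeds qchisq(1 - stop_limit, 1). Typical range 0-0.5, default 0.1;
bin_num_limit: integer, the maximum number of bins, default 8;
positive: label value(s) of the positive (bad) samples, default "bad|1";
no_cores: number of CPU cores used for parallel computation;
print_step: non-negative integer, default 0. If print_step > 0, the variable name is printed
    every print_step iterations; if print_step = 0 or no_cores > 1, nothing is printed;
method: binning method, "tree" (decision tree) or "chimerge" (chi-square), default "tree";
ignore_const_cols: whether to ignore constant columns, default True;
ignore_datetime_cols: whether to ignore datetime columns, default True;
check_cate_num: whether to check that categorical variables have no more than 50 distinct values,
    default True; too many categories slow the binning down;
replace_blank: whether to replace blank values with None, default True.
'''
```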

Automatic binning result for age.in.years (`data_df_woebin['age.in.years']`):

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,26.0) | 190 | 0.19 | 110 | 80 | 0.421052632 | 0.528844129 | 0.057921024 | 0.130498542 | 26 | FALSE |
| 1 | age.in.years | [26.0,28.0) | 101 | 0.101 | 74 | 27 | 0.267326733 | -0.160930367 | 0.002528906 | 0.130498542 | 28 | FALSE |
| 2 | age.in.years | [28.0,35.0) | 257 | 0.257 | 172 | 85 | 0.3307393 | 0.14245464 | 0.005359008 | 0.130498542 | 35 | FALSE |
| 3 | age.in.years | [35.0,37.0) | 79 | 0.079 | 67 | 12 | 0.151898734 | -0.872488109 | 0.048610052 | 0.130498542 | 37 | FALSE |
| 4 | age.in.years | [37.0,inf) | 373 | 0.373 | 277 | 96 | 0.257372654 | -0.212371454 | 0.016079553 | 0.130498542 | inf | FALSE |
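The woe and bin_iv columns follow the usual definitions with bad as the positive class. Plugging the good/bad counts from the table into the minimal sketch below reproduces its woe and total_iv values, which suggests that, at least for this variable, the numbers come from exactly these formulas without smoothing. The function name woe_iv is hypothetical.

```python
import math


def woe_iv(goods, bads):
    """Per-bin WOE and IV from good/bad counts, treating 'bad' as the positive class.

    woe_i = ln((bad_i / total_bad) / (good_i / total_good))
    iv_i  = (bad_i / total_bad - good_i / total_good) * woe_i
    """
    total_good, total_bad = sum(goods), sum(bads)
    woes, ivs = [], []
    for g, b in zip(goods, bads):
        p_bad, p_good = b / total_bad, g / total_good
        w = math.log(p_bad / p_good)
        woes.append(w)
        ivs.append((p_bad - p_good) * w)
    return woes, ivs, sum(ivs)


# good/bad counts of the automatic age.in.years bins from the table above
goods = [110, 74, 172, 67, 277]
bads = [80, 27, 85, 12, 96]
woes, ivs, total_iv = woe_iv(goods, bads)
print([round(w, 4) for w in woes])  # [0.5288, -0.1609, 0.1425, -0.8725, -0.2124]
print(round(total_iv, 4))           # 0.1305
```

A positive WOE marks a bin whose share of bad customers exceeds its share of good ones, so the youngest bin [-inf,26.0) is the riskiest here.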

# T2. Manual binning: pass a custom breaks_list

Manual binning result for age.in.years (`data_df_woebin_DIY['age.in.years']`):

| | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age.in.years | [-inf,25.0) | 149 | 0.149 | 88 | 61 | 0.409395973 | 0.48083491 | 0.037321948 | 0.086291678 | 25 | FALSE |
| 1 | age.in.years | [25.0,35.0) | 399 | 0.399 | 268 | 131 | 0.328320802 | 0.131508203 | 0.007076394 | 0.086291678 | 35 | FALSE |
| 2 | age.in.years | [35.0,45.0) | 251 | 0.251 | 193 | 58 | 0.231075697 | -0.354949318 | 0.029241063 | 0.086291678 | 45 | FALSE |
| 3 | age.in.years | [45.0,inf) | 201 | 0.201 | 151 | 50 | 0.248756219 | -0.257958971 | 0.012652273 | 0.086291678 | inf | FALSE |
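The breaks column of the manual result implies break points [25, 35, 45] for age.in.years; in scorecardpy, breaks_list is passed as a dict mapping each variable name to its break points, e.g. {'age.in.years': [25, 35, 45]}. The sketch below (hypothetical helper names) shows how such break points define left-closed, right-open bins and counts the ages of the 20 sample rows from 2.1 into them.

```python
import math
from bisect import bisect_right


def bin_labels(breaks):
    """Interval labels in the same [lo,hi) style as the table above."""
    edges = [-math.inf] + [float(b) for b in breaks] + [math.inf]
    return [f"[{lo},{hi})" for lo, hi in zip(edges[:-1], edges[1:])]


def bin_index(x, breaks):
    # bisect_right sends a value equal to a break point to the bin on its right,
    # matching left-closed intervals such as [25.0,35.0)
    return bisect_right(breaks, x)


breaks = [25, 35, 45]
labels = bin_labels(breaks)
ages = [67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24, 22, 60, 28, 32, 53, 25, 44, 31]
counts = [0] * len(labels)
for age in ages:
    counts[bin_index(age, breaks)] += 1
print(dict(zip(labels, counts)))
# {'[-inf,25.0)': 3, '[25.0,35.0)': 6, '[35.0,45.0)': 3, '[45.0,inf)': 8}
```

Compared with the automatic result, the manual breaks give fewer, wider bins and a lower total IV (0.086 versus 0.130); the trade-off is bins that are easier to explain.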

# 2.3 Visualize the binned variables and check for monotonicity

Plot the count distribution and the bad probability of each variable's bins.
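scorecardpy can draw these charts with woebin_plot(bins). For a quick numerical answer to the same question, the hypothetical helper below classifies the badprob sequence of a variable's bins; applied to the two age.in.years binnings above, neither turns out to be monotonic.

```python
def badprob_trend(badprobs):
    """Classify a sequence of per-bin bad rates as increasing, decreasing or neither."""
    steps = [b - a for a, b in zip(badprobs, badprobs[1:])]
    if all(s > 0 for s in steps):
        return "increasing"
    if all(s < 0 for s in steps):
        return "decreasing"
    return "not monotonic"


auto_bins = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_bins = [0.409395973, 0.328320802, 0.231075697, 0.248756219]
print(badprob_trend(auto_bins))    # not monotonic
print(badprob_trend(manual_bins))  # not monotonic

# merging the last two manual bins: (58 + 50) bad out of (251 + 201) customers
merged = manual_bins[:2] + [(58 + 50) / (251 + 201)]
print(badprob_trend(merged))       # decreasing
```

In the manual binning only the last bin breaks the decreasing trend, so merging [35.0,45.0) with [45.0,inf) (a bad rate of 108/452, about 0.239) would be one way to obtain a monotonic variable.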

# 2.4 Apply the WOE transformation to the variables

| | creditability | savings.account.and.bonds_woe | housing_woe | age.in.years_woe | other.debtors.or.guarantors_woe | purpose_woe | credit.amount_woe | credit.history_woe | installment.rate.in.percentage.of.disposable.income_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | status.of.existing.checking.account_woe | duration.in.month_woe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | -0.461034959 | 0.614203978 | -1.312186389 |
| 1 | 1 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | -0.461034959 | 0.614203978 | 1.134979933 |
| 2 | 0 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | -0.733740578 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
| 3 | 0 | 0.271357844 | 0.472604411 | -0.257958971 | 0.005115101 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | 0.028573372 | 0.614203978 | 0.524524468 |
| 4 | 1 | 0.271357844 | 0.472604411 | -0.257958971 | -0.000525072 | 0.279920067 | 0.390539458 | 0.085157808 | -0.064538521 | -0.121178625 | 0.032103245 | 0.586082361 | 0.614203978 | 0.108688306 |
| 5 | 0 | -0.762140052 | 0.472604411 | -0.354949318 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.586082361 | -1.176263223 | 0.524524468 |
| 6 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
| 7 | 0 | 0.271357844 | 0.40444522 | -0.354949318 | -0.000525072 | -0.805625164 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.524524468 |
| 8 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
| 9 | 1 | 0.271357844 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | -0.733740578 | 0.157300289 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
| 10 | 1 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | 0.033661283 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | -0.346624608 |
| 11 | 1 | 0.271357844 | 0.40444522 | 0.48083491 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.028573372 | 0.614203978 | 1.134979933 |
| 12 | 0 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
| 13 | 1 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.034191365 | 0.614203978 | 0.108688306 |
| 14 | 0 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
| 15 | 1 | 0.13955188 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | 0.033661283 | 0.088318617 | 0.157300289 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.108688306 |
| 16 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
| 17 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | 1.234070835 | -0.190472769 | 0.477550835 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
| 18 | 1 | 0.271357844 | 0.472604411 | -0.354949318 | -0.000525072 | -0.805625164 | 1.170071253 | 0.088318617 | 0.157300289 | -0.121178625 | -0.235566071 | 0.586082361 | 0.614203978 | 0.108688306 |
| 19 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.034191365 | -1.176263223 | 0.108688306 |
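In scorecardpy this step is done by woebin_ply(), which replaces each raw value with the WOE of the bin it falls into. A minimal sketch of that lookup for one numeric and one categorical variable, using WOE values copied from the tables above (the constant and function names are hypothetical):

```python
from bisect import bisect_right

# manual age.in.years bins: break points and the WOE of each resulting bin
AGE_BREAKS = [25, 35, 45]
AGE_WOE = [0.48083491, 0.131508203, -0.354949318, -0.257958971]

# housing is categorical, so its WOE is a plain dictionary lookup
HOUSING_WOE = {"rent": 0.40444522, "own": -0.194156014, "for free": 0.472604411}


def age_to_woe(age):
    # left-closed bins: a value equal to a break point belongs to the bin on its right
    return AGE_WOE[bisect_right(AGE_BREAKS, age)]


# raw values of rows 0, 1 and 3 of the dataset
rows = [{"age.in.years": 67, "housing": "own"},
        {"age.in.years": 22, "housing": "own"},
        {"age.in.years": 45, "housing": "for free"}]
for row in rows:
    print(age_to_woe(row["age.in.years"]), HOUSING_WOE[row["housing"]])
```

The printed pairs match the age.in.years_woe and housing_woe columns of rows 0, 1 and 3 in the table above.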

# 3. Model training

# 3.1 Split the dataset

The train2woe output is shown below:

| | age.in.years_woe | credit.amount_woe | credit.history_woe | creditability | duration.in.month_woe | housing_woe | installment.rate.in.percentage.of.disposable.income_woe | other.debtors.or.guarantors_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | purpose_woe | savings.account.and.bonds_woe | status.of.existing.checking.account_woe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.257958971 | 0.033661283 | -0.733740578 | 0 | -1.312186389 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | -0.461034959 | -0.410062817 | -0.762140052 | 0.614203978 |
| 1 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | -0.461034959 | -0.410062817 | 0.271357844 | 0.614203978 |
| 2 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | 0.279920067 | 0.271357844 | -1.176263223 |
| 6 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | 0.279920067 | -0.762140052 | -1.176263223 |
| 7 | -0.354949318 | 0.390539458 | 0.088318617 | 0 | 0.524524468 | 0.40444522 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | 0.034191365 | -0.805625164 | 0.271357844 | 0.614203978 |
| 8 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | -0.410062817 | -0.762140052 | -1.176263223 |
| 11 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | 0.40444522 | -0.064538521 | -0.000525072 | -0.121178625 | 0.431137463 | 0.028573372 | 0.279920067 | 0.271357844 | 0.614203978 |
| 13 | -0.257958971 | 0.033661283 | -0.733740578 | 1 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | 0.279920067 | 0.271357844 | 0.614203978 |
| 16 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | -0.410062817 | -0.762140052 | -1.176263223 |
| 18 | -0.354949318 | 1.170071253 | 0.088318617 | 1 | 0.108688306 | 0.472604411 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.586082361 | -0.805625164 | 0.271357844 | 0.614203978 |
| 19 | 0.131508203 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | -0.410062817 | -0.762140052 | -1.176263223 |
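scorecardpy provides split_df() for this step. As an illustration of the idea rather than a reproduction of the indices above, the sketch below performs a stratified split: it samples the same share of good and bad rows into the training set so that both sets keep the overall bad rate. The function name is hypothetical, and ratio=0.7 and seed=186 are illustrative values.

```python
import random


def stratified_split(labels, ratio=0.7, seed=186):
    """Return sorted (train, test) row indices, sampling `ratio` of each class into train."""
    rng = random.Random(seed)
    train = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        train.extend(idx[:round(len(idx) * ratio)])
    train.sort()
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return train, test


# the binning tables above count 700 good (0) and 300 bad (1) customers
labels = [0] * 700 + [1] * 300
train, test = stratified_split(labels)
print(len(train), len(test))                       # 700 300
print(sum(labels[i] for i in train) / len(train))  # 0.3
```

Stratifying keeps the 30% bad rate identical in both sets, which makes train and test metrics such as KS and PSI easier to compare.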

# 3.2 Separate the features and the target

# 3.3 Build, train and predict: the logistic regression model

```
coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
  0.         0.23090027 0.7965086  0.22792681 1.07066195 0.83836441
  0.72843684]]
intercept_: [-0.83437247]
```
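For a binary logistic regression, the predicted probability of a bad outcome is 1 / (1 + exp(-(intercept + sum of coef_i * x_i))); if the model is scikit-learn's LogisticRegression, as the coef_ and intercept_ attributes suggest, this is what predict_proba(X)[:, 1] returns. The sketch below evaluates it by hand for rows 0 and 1 of train2woe. It assumes the coefficients follow the train2woe column order with creditability removed (the scorecard points in 4.1 are consistent with that order). Note that other.debtors.or.guarantors_woe has a coefficient of exactly 0, so it does not affect the prediction.

```python
import math

# ASSUMED feature order: the train2woe columns with the target removed
FEATURES = [
    "age.in.years_woe", "credit.amount_woe", "credit.history_woe",
    "duration.in.month_woe", "housing_woe",
    "installment.rate.in.percentage.of.disposable.income_woe",
    "other.debtors.or.guarantors_woe", "other.installment.plans_woe",
    "present.employment.since_woe", "property_woe", "purpose_woe",
    "savings.account.and.bonds_woe", "status.of.existing.checking.account_woe",
]
COEF = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772, 1.06219811,
        0.0, 0.23090027, 0.7965086, 0.22792681, 1.07066195, 0.83836441, 0.72843684]
INTERCEPT = -0.83437247


def prob_bad(row):
    """Logistic regression probability of label 1 (bad) for a dict of WOE values."""
    z = INTERCEPT + sum(c * row[f] for f, c in zip(FEATURES, COEF))
    return 1.0 / (1.0 + math.exp(-z))


# rows 0 (good) and 1 (bad) of train2woe, in FEATURES order
row0 = dict(zip(FEATURES, [-0.257958971, 0.033661283, -0.733740578, -1.312186389,
                           -0.194156014, 0.157300289, -0.000525072, -0.121178625,
                           -0.235566071, -0.461034959, -0.410062817, -0.762140052,
                           0.614203978]))
row1 = dict(zip(FEATURES, [0.48083491, 0.390539458, 0.088318617, 1.134979933,
                           -0.194156014, -0.190472769, -0.000525072, -0.121178625,
                           0.032103245, -0.461034959, -0.410062817, 0.271357844,
                           0.614203978]))
p0, p1 = prob_bad(row0), prob_bad(row1)
print(round(p0, 3), round(p1, 3))
```

Row 0, a good customer, receives a far lower probability of being bad than row 1, a bad customer, which is the ordering the scorecard later turns into points.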

# 3.4 Model evaluation

Evaluate the model with perf_eva():

```python
perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"],
         show_plot=True, positive="bad|1", seed=186)
'''
Purpose: evaluate model performance with KS, AUC (ROC), lift and PR curves;
plot_type can include "ks", "lift", "roc" and "pr".
'''
```
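perf_eva() reports KS and AUC alongside its plots. As a cross-check, both statistics can be computed directly from labels (1 = bad) and predicted probabilities of bad. The sketch below uses the textbook definitions rather than scorecardpy's code: KS is the largest gap between the cumulative bad and good rates as the cut-off moves from high to low scores, and AUC is the probability that a randomly chosen bad case scores higher than a randomly chosen good one, with ties counting one half. The pairwise AUC is O(n_bad × n_good) and meant for illustration only.

```python
def ks_stat(labels, scores):
    """KS: maximum gap between cumulative bad and good rates over all cut-offs."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    ordered = sorted(zip(scores, labels), reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, y) in enumerate(ordered):
        cum_bad += y
        cum_good += 1 - y
        # evaluate only after the last of a run of tied scores
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def auc_stat(labels, scores):
    """AUC as P(score of a bad case > score of a good case), ties counting 1/2."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(ks_stat(labels, scores), auc_stat(labels, scores))  # 0.5 0.75
```

Both functions expect higher scores to mean "more likely bad", i.e. the model's probability of bad rather than the final scorecard points, where the direction is reversed.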

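# 4. Deploy and monitor the model

# 4.1 Inference: compute credit scores

Use scorecard() to map the model's predicted probabilities onto a points scale and build the scorecard. From the card, each customer gets a total score as well as the points contributed by each individual variable.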

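```python
scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
'''
Purpose: map the model's predicted probabilities to scorecard points.
Parameters:
bins: binning information, i.e. the result returned by woebin();
model: the fitted model object;
xcolumns: names of the model's input (x) variables;
points0: base points, default 600;
odds0: target odds (bad:good, i.e. p/(1-p)) at points0, default 1/19;
pdo: points to double the odds, default 50;
basepoints_eq0: if True, the base points are distributed evenly across the variables.
'''
```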
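The scaling parameters correspond to the standard linear points formula: with B = pdo / ln 2 and A = points0 + B * ln(odds0), a score of A - B * ln(odds) equals points0 at odds0 and drops by pdo every time the bad:good odds double. Because ln(odds) = intercept + sum of coef_i * woe_i, each bin receives -B * coef_i * woe_i points, plus base points of A - B * intercept. The sketch below assumes this is how the card is built; with the coefficients from 3.3 (in the assumed train2woe column order) and the WOE values from 2.2 and 2.4, the rounded results reproduce the age.in.years and housing points printed below. All names in the sketch are hypothetical.

```python
import math


def scaling(points0=600, odds0=1/19, pdo=50):
    """Return (A, B) such that score = A - B * ln(odds), with odds = P(bad) / P(good)."""
    b = pdo / math.log(2)
    a = points0 + b * math.log(odds0)
    return a, b


def score_from_prob(p_bad, a, b):
    """Total score for a predicted probability of bad."""
    return a - b * math.log(p_bad / (1 - p_bad))


A, B = scaling()
INTERCEPT = -0.83437247
COEF_AGE, COEF_HOUSING = 0.34206044, 0.67956772

age_woe = [0.48083491, 0.131508203, -0.354949318, -0.257958971]
housing_woe = {"rent": 0.40444522, "own": -0.194156014, "for free": 0.472604411}

print([round(-B * COEF_AGE * w) for w in age_woe])  # [-12, -3, 9, 6]
print({k: round(-B * COEF_HOUSING * w) for k, w in housing_woe.items()})
# {'rent': -20, 'own': 10, 'for free': -23}
print(round(A - B * INTERCEPT))                     # base points of the card
print(round(score_from_prob(1 / 20, A, B)))         # 600: p = 1/20 means odds of 1/19
```

Because the points are additive, a customer's total score is the base points plus one bin's points per variable, which is what scorecard_ply() returns when only_total_score=True.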

```python
# card_dict is the dict returned by scorecard(); each value is one variable's points table
print('card_dict_age.in.years \n', card_dict['age.in.years'])
print('card_dict_credit.amount \n', card_dict['credit.amount'])
print('card_dict_credit.history \n', card_dict['credit.history'])
print('card_dict_duration.in.month \n', card_dict['duration.in.month'])
print('card_dict_housing \n', card_dict['housing'])
```

```
card_dict_age.in.years 
        variable          bin  points
10  age.in.years  [-inf,25.0)   -12.0
11  age.in.years  [25.0,35.0)    -3.0
12  age.in.years  [35.0,45.0)     9.0
13  age.in.years   [45.0,inf)     6.0
card_dict_credit.amount 
         variable              bin  points
31  credit.amount    [-inf,1400.0)    -2.0
32  credit.amount  [1400.0,1800.0)    41.0
33  credit.amount  [1800.0,4000.0)    15.0
34  credit.amount  [4000.0,9200.0)   -22.0
35  credit.amount     [9200.0,inf)   -66.0
card_dict_credit.history 
          variable                                                bin  points
17  credit.history  no credits taken/ all credits paid back duly%,...   -51.0
18  credit.history           existing credits paid back duly till now    -4.0
19  credit.history                    delay in paying off in the past    -4.0
20  credit.history  critical account/ other credits existing (not ...    30.0
card_dict_duration.in.month 
             variable          bin  points
23  duration.in.month   [-inf,8.0)    85.0
24  duration.in.month   [8.0,16.0)    22.0
25  duration.in.month  [16.0,34.0)    -7.0
26  duration.in.month  [34.0,44.0)   -34.0
27  duration.in.month   [44.0,inf)   -74.0
card_dict_housing 
    variable       bin  points
42  housing      rent   -20.0
43  housing       own    10.0
44  housing  for free   -23.0
```

# 4.2 Online evaluation: score stability (PSI)

Use scorecard_ply() to compute credit scores for the train and test sets:

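```python
scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True,
              var_kp=None)
'''
Purpose: compute credit scores from the card produced by scorecard().
dt: the original data;
card: the scorecard generated by scorecard();
only_total_score: logical, default True. If True, the output contains only the total
    credit score; if False, it contains the total score and each variable's score;
print_step: non-negative integer, default 0. If print_step > 0, the variable name is printed
    every print_step iterations; if print_step = 0, nothing is printed;
replace_blank_na: logical, replace blank values with NA, default True.
    It should match the setting used in woebin();
var_kp: names of variables to keep in the output, such as an ID column, default None.
'''
```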
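With train and test scores in hand, PSI compares their distributions over a shared set of score bins: PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i), where expected_i and actual_i are the shares of the training and the test (or online) scores in bin i. scorecardpy offers perf_psi() for this. The plain-Python sketch below only shows the calculation itself; the bin edges and scores are hypothetical, and the small floor for empty bins is an assumption that keeps the logarithm finite. A commonly quoted rule of thumb reads PSI < 0.1 as stable, 0.1 to 0.25 as a moderate shift and > 0.25 as a significant shift.

```python
import math
from bisect import bisect_right


def bin_shares(scores, breaks, floor=1e-6):
    """Share of scores in each bin defined by `breaks`; empty bins get a small floor."""
    counts = [0] * (len(breaks) + 1)
    for s in scores:
        counts[bisect_right(breaks, s)] += 1
    return [max(c / len(scores), floor) for c in counts]


def psi(expected_scores, actual_scores, breaks):
    """Population stability index of actual_scores relative to expected_scores."""
    e = bin_shares(expected_scores, breaks)
    a = bin_shares(actual_scores, breaks)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


breaks = [550, 600]  # hypothetical score bin edges
train_scores = [500, 520, 540, 560, 580, 600, 620, 640]
same_mix = [505, 525, 545, 565, 585, 605, 625, 645]
shifted = [610, 620, 630, 640, 650, 660, 670, 680]
print(psi(train_scores, same_mix, breaks))        # 0.0
print(psi(train_scores, shifted, breaks) > 0.25)  # True
```

In practice the bin edges are fixed from the training scores (for example score deciles) and then reused unchanged for every monitoring period, so that successive PSI values stay comparable.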

