Case Study | LightGBM Optimization for an Imbalanced Binary Classification Problem (with Code)

1. Introduction

The data for this case study come from the Kaggle competition "Santander Customer Satisfaction". It is an imbalanced binary classification problem, and the objective is to maximize AUC (the area under the ROC curve). The competition has since ended.

Competition link:

https://www.kaggle.com/c/santander-customer-satisfaction 

2. Modeling Approach

This article uses Microsoft's open-source LightGBM algorithm for classification, which runs extremely fast. The steps are:

  • Read the data;
  • Parallel execution: the lightgbm package parallelizes via its own parameters, so the doParallel and foreach packages are not needed;
  • Feature selection: the mlr package is used to keep the features accounting for 99% of the cumulative chi.squared score;
  • Tuning: adjust the parameters of lgb.cv step by step, iterating until the results are satisfactory;
  • Prediction: build the LightGBM model with the tuned parameters and output predictions. The AUC of this run is 0.833386 (a cross-validation estimate rather than a leaderboard submission), which exceeds the first-place Private Leaderboard score (0.829072).

3. The LightGBM Algorithm

LightGBM's mathematical formulation is not published in full, so it is not described here; see the GitHub project page if needed.

LightGBM project page:

https://github.com/Microsoft/LightGBM
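
At the time of writing, the R package had to be compiled from the GitHub sources; on a current R installation it can be installed straight from CRAN (a side note, not part of the original post):

install.packages('lightgbm') ## CRAN release (since v3.0); when this article was written, the package had to be built from the GitHub repository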

Reading the Data

 

options(java.parameters = "-Xmx8g") ## needed for feature selection below; must be set before loading the packages

 

library(readr)
lgb_tr1 <- read_csv("C:/Users/Administrator/Documents/kaggle/scs_lgb/train.csv")
lgb_te1 <- read_csv("C:/Users/Administrator/Documents/kaggle/scs_lgb/test.csv")

Data Exploration

1. Set up parallel execution

 

library(dplyr)
library(mlr)
library(parallelMap)
parallelStartSocket(2)
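## note: parallelMap parallelizes the mlr steps below (e.g. the feature filtering);
## lightgbm itself is parallelized separately via its num_threads parameter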

2. First look at each column

 

summarizeColumns(lgb_tr1) %>% View()

3. Handle missing values

Impute missing values (by mean, for both integer and numeric columns):
 

imp_tr1 <- impute(
   as.data.frame(lgb_tr1),
   classes = list(
       integer = imputeMean(),
       numeric = imputeMean()
   )
)
imp_te1 <- impute(
   as.data.frame(lgb_te1),
   classes = list(
       integer = imputeMean(),
       numeric = imputeMean()
   )
)

After imputation:

 

summarizeColumns(imp_tr1$data) %>% View()

4. Check the class ratio in the training data – the classes are imbalanced

 

table(lgb_tr1$TARGET)
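
The negative-to-positive ratio, which later motivates the weight search range, can be computed directly (a small addition; the value 24.27261 is quoted in a code comment further below):

class_counts <- table(lgb_tr1$TARGET)
class_counts[1] / class_counts[2] ## ~24.27 negatives per positive, hence weight is searched over [1, 30]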

5. Drop constant columns

 

lgb_tr2 <- removeConstantFeatures(imp_tr1$data)
lgb_te2 <- removeConstantFeatures(imp_te1$data)

6. Keep only the columns shared by the training and test sets

 

tr2_name <- data.frame(tr2_name = colnames(lgb_tr2))
te2_name <- data.frame(te2_name = colnames(lgb_te2))
tr2_name_inner <- tr2_name %>%
   inner_join(te2_name, by = c('tr2_name' = 'te2_name'))
TARGET = data.frame(TARGET = lgb_tr2$TARGET)
lgb_tr2 <- lgb_tr2[, c(tr2_name_inner$tr2_name[2:dim(tr2_name_inner)[1]])]
lgb_te2 <- lgb_te2[, c(tr2_name_inner$tr2_name[2:dim(tr2_name_inner)[1]])]
lgb_tr2 <- cbind(lgb_tr2, TARGET)
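
For reference, base R's intersect() expresses the same column alignment more compactly (an equivalent sketch, not the author's code; like the original it assumes the first shared column is the ID):

shared_cols <- intersect(colnames(lgb_tr2), colnames(lgb_te2))[-1] ## drop the ID column
lgb_tr2 <- cbind(lgb_tr2[, shared_cols], TARGET = lgb_tr2$TARGET)
lgb_te2 <- lgb_te2[, shared_cols]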

Notes:

1) Because LightGBM is used, the data are not standardized;

2) LightGBM is extremely efficient; with under 1 GB of data it runs very fast even without feature selection, but features are filtered here to speed things up further;

3) Features are filtered directly, without engineering derived variables, because the real-world meaning of the features is unknown and generating variables blindly is unwise.

Feature Selection – Chi-Squared Test

 

library(lightgbm)

1. First pass at the weight value (refined further below)

 

grid_search <- expand.grid(
   weight = seq(1, 30, 2)
   ## table(lgb_tr1$TARGET)[1] / table(lgb_tr1$TARGET)[2] = 24.27261,
   ## hence weight is searched over [1, 30]
)

lgb_rate_1 <- numeric(length = nrow(grid_search))

 

set.seed(0)

 

for(i in 1:nrow(grid_search)){
   ## use the grid value rather than the loop index (the grid steps by 2)
   lgb_weight <- (lgb_tr2$TARGET * grid_search[i, 'weight'] + 1) /
       sum(lgb_tr2$TARGET * grid_search[i, 'weight'] + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr2[, 1:300]),
       label = lgb_tr2$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc'
   )
   # cross-validation
   lgb_tr2_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       learning_rate = .1,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   lgb_rate_1[i] <- unlist(lgb_tr2_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr2_mod$record_evals$valid$auc$eval))]
}
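
The last line of the loop digs the final cross-validated AUC out of record_evals; since the same extraction recurs in every tuning loop below, a small helper keeps it readable (a sketch written against the structure used above):

## AUC of the last (possibly early-stopped) iteration of an lgb.cv result
last_auc <- function(cv_mod) {
   auc_path <- unlist(cv_mod$record_evals$valid$auc$eval)
   auc_path[length(auc_path)]
}
## usage: lgb_rate_1[i] <- last_auc(lgb_tr2_mod)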

 

library(ggplot2)
grid_search$perf <- lgb_rate_1
ggplot(grid_search,aes(x = weight, y = perf)) +
   geom_point()

The plot shows that AUC is not very sensitive to the weight, peaking at weight = 5.
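
A word on the weight vector used in the loop: (TARGET * w + 1) / sum(TARGET * w + 1) gives every positive case the raw weight w + 1 and every negative case the raw weight 1, normalized to sum to one. A tiny worked example with hypothetical labels:

w <- 5
y <- c(1, 0, 0) ## one positive, two negatives
(y * w + 1) / sum(y * w + 1) ## 0.750 0.125 0.125 -- the positive carries 6/8 of the total weight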

2. Feature selection

1) Select features

 

lgb_tr2$TARGET <- factor(lgb_tr2$TARGET)
lgb.task <- makeClassifTask(data = lgb_tr2, target = 'TARGET')
lgb.task.smote <- oversample(lgb.task, rate = 5)
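## note: mlr's oversample() performs plain random oversampling of the minority class
## (despite the variable name, this is not SMOTE); rate = 5 mirrors the weight found above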
fv_time <- system.time(
   fv <- generateFilterValuesData(
       lgb.task.smote,
       method = c('chi.squared')
       ## information gain or the chi-squared test both work here; random-forest
       ## importance is not recommended, as it is extremely slow
       ## IV-value filtering is also worth trying
       ## feature engineering caps the target metric (AUC here); the filtering
       ## method itself can be treated as a hyperparameter
   )
)

2) Plot the filter values

 

# plotFilterValues(fv)
plotFilterValuesGGVIS(fv)

3) Keep the features covering 99% of cumulative chi.squared (LightGBM is efficient enough that a generous number of variables can be retained)

Note: the X in "top X% of chi.squared" can itself be treated as a hyperparameter.

 

fv_data2 <- fv$data %>%
   arrange(desc(chi.squared)) %>%
   mutate(chi_gain_cul = cumsum(chi.squared) / sum(chi.squared))

fv_data2_filter <- fv_data2 %>% filter(chi_gain_cul <= 0.99)
dim(fv_data2_filter) ## roughly halves the number of predictors
fv_feature <- fv_data2_filter$name
lgb_tr3 <- lgb_tr2[, c(fv_feature, 'TARGET')]
lgb_te3 <- lgb_te2[, fv_feature]

4) Write out the data

 

write_csv(lgb_tr3, 'C:/users/Administrator/Documents/kaggle/scs_lgb/lgb_tr3_chi.csv')
write_csv(lgb_te3, 'C:/users/Administrator/Documents/kaggle/scs_lgb/lgb_te3_chi.csv')

The Algorithm

 

lgb_tr <- rxImport('C:/Users/Administrator/Documents/kaggle/scs_lgb/lgb_tr3_chi.csv')
lgb_te <- rxImport('C:/Users/Administrator/Documents/kaggle/scs_lgb/lgb_te3_chi.csv')
## tip: read lgb_te only at prediction time, to save memory
library(lightgbm)
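
rxImport comes from RevoScaleR (Microsoft R, the "commercial R" mentioned in the closing notes); on plain open-source R, the readr call used earlier is a drop-in replacement (same paths assumed):

## lgb_tr <- readr::read_csv('C:/Users/Administrator/Documents/kaggle/scs_lgb/lgb_tr3_chi.csv')
## lgb_te <- readr::read_csv('C:/Users/Administrator/Documents/kaggle/scs_lgb/lgb_te3_chi.csv')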

1. Tune the weight parameter

 

grid_search <- expand.grid(
   weight = 1:30
)

perf_weight_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * i + 1) / sum(lgb_tr$TARGET * i + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc'
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       learning_rate = .1,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_weight_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

library(ggplot2)
grid_search$perf <- perf_weight_1
ggplot(grid_search,aes(x = weight, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at weight = 4 and declines thereafter.

2. Tune the learning_rate parameter

 

grid_search <- expand.grid(
   learning_rate = 2 ^ (-(8:1))
)

perf_learning_rate_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_learning_rate_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_learning_rate_1
ggplot(grid_search,aes(x = learning_rate, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at learning_rate = 2^(-5), but the values over 2^(-(6:3)) differ only marginally, so learning_rate = .125 is chosen for speed.

3. Tune the num_leaves parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = seq(50, 800, 50)
)

perf_num_leaves_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_num_leaves_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_num_leaves_1
ggplot(grid_search,aes(x = num_leaves, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at num_leaves = 650.

4. Tune the min_data_in_leaf parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   min_data_in_leaf = 2 ^ (1:7)
)

perf_min_data_in_leaf_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       min_data_in_leaf = grid_search[i, 'min_data_in_leaf']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_min_data_in_leaf_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_min_data_in_leaf_1
ggplot(grid_search,aes(x = min_data_in_leaf, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC is insensitive to min_data_in_leaf, so it is left unchanged.

5. Tune the max_bin parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin = 2 ^ (5:10)
)

perf_max_bin_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_max_bin_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_max_bin_1
ggplot(grid_search,aes(x = max_bin, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at max_bin = 2^10; the value is fine-tuned next.

6. Fine-tune the max_bin parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin = 100 * (6:15)
)

perf_max_bin_2 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_max_bin_2[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_max_bin_2
ggplot(grid_search,aes(x = max_bin, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at max_bin = 1000.

7. Tune the min_data_in_bin parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 2 ^ (1:9)
   
)

perf_min_data_in_bin_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_min_data_in_bin_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_min_data_in_bin_1
ggplot(grid_search,aes(x = min_data_in_bin, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at min_data_in_bin = 8, though the variation is minimal; 8 is carried forward.

8. Tune the feature_fraction parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = seq(.5, 1, .02)
   
)

perf_feature_fraction_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_feature_fraction_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_feature_fraction_1
ggplot(grid_search,aes(x = feature_fraction, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at feature_fraction = .62 and holds steady over [.60, .62]; it declines from .64 onward.

9. Tune the min_sum_hessian parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = seq(0, .02, .001)
)

perf_min_sum_hessian_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_min_sum_hessian_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_min_sum_hessian_1
ggplot(grid_search,aes(x = min_sum_hessian, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at min_sum_hessian = 0.005; values in [0.002, 0.005] are advisable, with a decline beyond 0.005.

10. Tune the lambda parameters

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = seq(0, .01, .002),
   lambda_l2 = seq(0, .01, .002)
)

perf_lamda_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_lamda_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_lamda_1
ggplot(data = grid_search, aes(x = lambda_l1, y = perf)) +
   geom_point() +
   facet_wrap(~ lambda_l2, nrow = 5)

The plot suggests lambda_l1 = 0 and lambda_l2 = 0.

11. Tune the drop_rate parameter

(Note: drop_rate and max_drop are DART parameters; LightGBM appears to apply them only when boosting = 'dart' is set, which the params below do not do.)

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = seq(0, 1, .1)
)

perf_drop_rate_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_drop_rate_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_drop_rate_1
ggplot(data = grid_search, aes(x = drop_rate, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at drop_rate = 0.2, with 0, .2, and .5 all performing well; variation over [0, 1] is small.

12. Tune the max_drop parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = seq(1, 10, 2)
)

perf_max_drop_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 4 + 1) / sum(lgb_tr$TARGET * 4 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_max_drop_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_max_drop_1
ggplot(data = grid_search, aes(x = max_drop, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC peaks at max_drop = 5, with little variation over [1, 10].

Second Round of Tuning

1. Tune the weight parameter

 

grid_search <- expand.grid(
   learning_rate = .125,
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_weight_2 <- numeric(length = 20) ## 20 weight values are tried below (the grid above has a single row)

for(i in 1:20){
   lgb_weight <- (lgb_tr$TARGET * i + 1) / sum(lgb_tr$TARGET * i + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[1, 'learning_rate'],
       num_leaves = grid_search[1, 'num_leaves'],
       max_bin = grid_search[1, 'max_bin'],
       min_data_in_bin = grid_search[1, 'min_data_in_bin'],
       feature_fraction = grid_search[1, 'feature_fraction'],
       min_sum_hessian = grid_search[1, 'min_sum_hessian'],
       lambda_l1 = grid_search[1, 'lambda_l1'],
       lambda_l2 = grid_search[1, 'lambda_l2'],
       drop_rate = grid_search[1, 'drop_rate'],
       max_drop = grid_search[1, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_weight_2[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

library(ggplot2)
ggplot(data.frame(num = 1:length(perf_weight_2), perf = perf_weight_2), aes(x = num, y = perf)) +
   geom_point() +
   geom_smooth()

The plot shows AUC stabilizes once weight >= 3, with the maximum at weight = 7.

2. Tune the learning_rate parameter

 

grid_search <- expand.grid(
   learning_rate = seq(.05, .5, .03),
   num_leaves = 650,
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_learning_rate_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_learning_rate_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_learning_rate_1
ggplot(data = grid_search, aes(x = learning_rate, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at learning_rate = .11.

3. Tune the num_leaves parameter

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = seq(100, 800, 50),
   max_bin=1000,
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_num_leaves_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_num_leaves_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_num_leaves_1
ggplot(data = grid_search, aes(x = num_leaves, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at num_leaves = 200.

4. Tune the max_bin parameter

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = seq(100, 1500, 100),
   min_data_in_bin = 8,
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_max_bin_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_max_bin_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_max_bin_1
ggplot(data = grid_search, aes(x = max_bin, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at max_bin = 600; 400 and 800 are also acceptable.

5. Tune the min_data_in_bin parameter

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = seq(5, 50, 5),
   feature_fraction = .62,
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_min_data_in_bin_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_min_data_in_bin_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_min_data_in_bin_1
ggplot(data = grid_search, aes(x = min_data_in_bin, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at min_data_in_bin = 45; 25 is also acceptable.

6. Tune the feature_fraction parameter

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = 45,
   feature_fraction = seq(.5, .9, .02),
   min_sum_hessian = .005,
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_feature_fraction_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_feature_fraction_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_feature_fraction_1
ggplot(data = grid_search, aes(x = feature_fraction, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at feature_fraction = .54; .56 and .58 also perform well.

7. Tune the min_sum_hessian parameter

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = 45,
   feature_fraction = .54,
   min_sum_hessian = seq(.001, .008, .0005),
   lambda_l1 = 0,
   lambda_l2 = 0,
   drop_rate = .2,
   max_drop = 5
)

perf_min_sum_hessian_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_min_sum_hessian_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_min_sum_hessian_1
ggplot(data = grid_search, aes(x = min_sum_hessian, y = perf)) +
   geom_point() +
   geom_smooth()

Conclusion: AUC is maximal at min_sum_hessian = 0.0065; 0.003 and 0.0055 are acceptable.

8. Tune the lambda parameters

 

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = 45,
   feature_fraction = .54,
   min_sum_hessian = 0.0065,
   lambda_l1 = seq(0, .001, .0002),
   lambda_l2 = seq(0, .001, .0002),
   drop_rate = .2,
   max_drop = 5
)

perf_lambda_1 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   
   # parameters
   params <- list(
       objective = 'binary',
       metric = 'auc',
       learning_rate = grid_search[i, 'learning_rate'],
       num_leaves = grid_search[i, 'num_leaves'],
       max_bin = grid_search[i, 'max_bin'],
       min_data_in_bin = grid_search[i, 'min_data_in_bin'],
       feature_fraction = grid_search[i, 'feature_fraction'],
       min_sum_hessian = grid_search[i, 'min_sum_hessian'],
       lambda_l1 = grid_search[i, 'lambda_l1'],
       lambda_l2 = grid_search[i, 'lambda_l2'],
       drop_rate = grid_search[i, 'drop_rate'],
       max_drop = grid_search[i, 'max_drop']
   )
   # cross-validation
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_lambda_1[i] <- unlist(lgb_tr_mod$record_evals$valid$auc$eval)[length(unlist(lgb_tr_mod$record_evals$valid$auc$eval))]
}

grid_search$perf <- perf_lambda_1
ggplot(data = grid_search, aes(x = lambda_l1, y = perf)) +
   geom_point() +
   facet_wrap(~ lambda_l2, nrow = 5)

Conclusion: lambda correlates negatively with AUC overall; lambda_l1 = .0002 and lambda_l2 = .0004 are chosen.

9. Tune the drop_rate parameter

(The code for this step appeared only as an image in the original post; it follows the same grid-search pattern as step 11 of the first round.)

Conclusion: AUC is maximal at drop_rate = .4; .15 and .25 are acceptable.
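
A sketch of the missing loop, following the pattern of the preceding steps (the drop_rate grid is an assumption; the original grid is not recoverable from the post):

grid_search <- expand.grid(
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = 45,
   feature_fraction = .54,
   min_sum_hessian = .0065,
   lambda_l1 = .0002,
   lambda_l2 = .0004,
   drop_rate = seq(0, 1, .05), ## assumed grid; the conclusion cites .15, .25 and .4
   max_drop = 5
)

perf_drop_rate_2 <- numeric(length = nrow(grid_search))

for(i in 1:nrow(grid_search)){
   lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)
   lgb_train <- lgb.Dataset(
       data = data.matrix(lgb_tr[, 1:148]),
       label = lgb_tr$TARGET,
       free_raw_data = FALSE,
       weight = lgb_weight
   )
   ## fold the whole grid row into params alongside objective and metric
   params <- c(list(objective = 'binary', metric = 'auc'), as.list(grid_search[i, ]))
   lgb_tr_mod <- lgb.cv(
       params,
       data = lgb_train,
       nrounds = 300,
       stratified = TRUE,
       nfold = 10,
       num_threads = 2,
       early_stopping_rounds = 10
   )
   perf_drop_rate_2[i] <- last_auc(lgb_tr_mod) ## helper defined earlier
}

The max_drop step below mirrors this sketch with drop_rate fixed at .4 and a grid over max_drop.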

10. Tune the max_drop parameter

(The code for this step was likewise an image in the original post; see the sketch above.)

Conclusion: max_drop = 14 (the value used in the final model below; the original text mistakenly repeats the drop_rate conclusion here).

Prediction

1. Weights

 

lgb_weight <- (lgb_tr$TARGET * 7 + 1) / sum(lgb_tr$TARGET * 7 + 1)

2. Training dataset

 

lgb_train <- lgb.Dataset(
   data = data.matrix(lgb_tr[, 1:148]),
   label = lgb_tr$TARGET,
   free_raw_data = FALSE,
   weight = lgb_weight
)

3. Training

 

# parameters
params <- list(
   objective = 'binary',  ## not set in the original post; without it LightGBM defaults to regression
   metric = 'auc',
   learning_rate = .11,
   num_leaves = 200,
   max_bin = 600,
   min_data_in_bin = 45,
   feature_fraction = .54,
   min_sum_hessian = 0.0065,
   lambda_l1 = .0002,
   lambda_l2 = .0004,
   drop_rate = .4,
   max_drop = 14
)

 

# model
lgb_mod <- lightgbm(
   params = params,
   data = lgb_train,
   nrounds = 300,
   early_stopping_rounds = 10,
   num_threads = 2
)

 

# prediction
lgb.pred <- predict(lgb_mod, data.matrix(lgb_te))

4. Results

 

lgb.pred2 <- matrix(unlist(lgb.pred), ncol = 1)
lgb.pred3 <- data.frame(lgb.pred2)

5. Output

 

write.csv(lgb.pred3, "C:/Users/Administrator/Documents/kaggle/scs_lgb/lgb.pred1_tr.csv")
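
The file written above contains only the raw probabilities; a Kaggle submission also needs the ID column (a sketch, assuming the test IDs are still available in lgb_te1$ID):

submission <- data.frame(ID = lgb_te1$ID, TARGET = lgb.pred)
write_csv(submission, 'C:/Users/Administrator/Documents/kaggle/scs_lgb/submission.csv')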

Note: some advice for readers still in school:

1. In coursework, the data sets used to test machine-learning algorithms are usually small, so most algorithms and most R functions are viable; for random forests, say, the randomForest package is fine, and with somewhat more data you can parallelize it. At GB scale, however, even a parallelized randomForest cannot cope and memory overflows; the functions in the commercial R distributions are recommended instead.

2. Coursework focuses on theory, and its test data are fairly clean; real-world data structures are generally messier.


Originally published: 2018-03-11

Author: 苏高生 (Su Gaosheng)

This article comes from Yunqi Community partner "Big Data Digest" (大数据文摘); follow the "大数据文摘" WeChat official account for more.
