Industrial Steam Prediction with PaddlePaddle


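I. Industrial Steam Volume Prediction


1. About the competition


The Tianchi Newcomer Practice Competition is a practice track for people who are new to data science. It uses classic competition problems as learning scenarios and provides detailed introductory tutorials that walk you through data mining step by step. Tianchi hopes the newcomer competitions will become a popular hands-on data course at universities and help more students acquire data skills.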



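2. Background


The basic principle of thermal power generation is as follows: burning fuel heats water to produce steam, the steam pressure drives a turbine, and the turbine in turn drives a generator that produces electricity. In this chain of energy conversions, the key factor affecting generation efficiency is the combustion efficiency of the boiler, that is, how effectively the burning fuel heats water into high-temperature, high-pressure steam. Many factors affect boiler combustion efficiency. They include the boiler's adjustable parameters, such as fuel feed rate, primary and secondary air, induced-draft air, return-material air, and feedwater flow, as well as the boiler's operating conditions, such as bed temperature, bed pressure, furnace temperature and pressure, and superheater temperature.


3. Problem description


The data are desensitized readings collected by boiler sensors at minute-level frequency. The task is to predict the amount of steam produced from the boiler's operating conditions.

Data description

The data are split into training data (train.txt) and test data (test.txt). The 38 fields "V0" through "V37" are the feature variables, and "target" is the target variable. Participants train a model on the training data and predict the target variable for the test data. The ranking is based on the MSE (mean squared error) of the predictions.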
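
Since the ranking metric is MSE, it can help to have it written out. The following is a minimal sketch of the metric in NumPy; the function name `mean_squared_error` and the three label/prediction values are made up for illustration and do not come from the competition data.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up values, purely for illustration
example_labels = [0.2, 0.5, 0.9]
example_preds = [0.1, 0.5, 1.1]
example_mse = mean_squared_error(example_labels, example_preds)
print(example_mse)
```

Lower is better, and because the errors are squared, a few large misses weigh more than many small ones.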


II. Data Processing


1. Reading the data


import pandas as pd
df=pd.read_csv("data/data178496/zhengqi_train.txt",sep='\t')
df.head()


V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
0 0.566 0.016 -0.143 0.407 0.452 -0.901 -1.812 -2.360 -0.436 -2.114 ... 0.136 0.109 -0.615 0.327 -4.627 -4.789 -5.101 -2.608 -3.508 0.175
1 0.968 0.437 0.066 0.566 0.194 -0.893 -1.566 -2.360 0.332 -2.114 ... -0.128 0.124 0.032 0.600 -0.843 0.160 0.364 -0.335 -0.730 0.676
2 1.013 0.568 0.235 0.370 0.112 -0.797 -1.367 -2.360 0.396 -2.114 ... -0.009 0.361 0.277 -0.116 -0.843 0.160 0.364 0.765 -0.589 0.633
3 0.733 0.368 0.283 0.165 0.599 -0.679 -1.200 -2.086 0.403 -2.114 ... 0.015 0.417 0.279 0.603 -0.843 -0.065 0.364 0.333 -0.112 0.206
4 0.684 0.638 0.260 0.209 0.337 -0.454 -1.073 -2.086 0.314 -2.114 ... 0.183 1.078 0.328 0.418 -0.843 -0.215 0.364 -0.280 -0.028 0.384

5 rows × 39 columns

df.isnull().sum()
V0        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
V29       0
V30       0
V31       0
V32       0
V33       0
V34       0
V35       0
V36       0
V37       0
target    0
dtype: int64
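# Read the test set and stack it under the training set.
# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
df_test=pd.read_csv("data/data178496/zhengqi_test.txt",sep='\t')
df_merge=pd.concat([df,df_test])
df_merge.head()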


V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
0 0.566 0.016 -0.143 0.407 0.452 -0.901 -1.812 -2.360 -0.436 -2.114 ... 0.136 0.109 -0.615 0.327 -4.627 -4.789 -5.101 -2.608 -3.508 0.175
1 0.968 0.437 0.066 0.566 0.194 -0.893 -1.566 -2.360 0.332 -2.114 ... -0.128 0.124 0.032 0.600 -0.843 0.160 0.364 -0.335 -0.730 0.676
2 1.013 0.568 0.235 0.370 0.112 -0.797 -1.367 -2.360 0.396 -2.114 ... -0.009 0.361 0.277 -0.116 -0.843 0.160 0.364 0.765 -0.589 0.633
3 0.733 0.368 0.283 0.165 0.599 -0.679 -1.200 -2.086 0.403 -2.114 ... 0.015 0.417 0.279 0.603 -0.843 -0.065 0.364 0.333 -0.112 0.206
4 0.684 0.638 0.260 0.209 0.337 -0.454 -1.073 -2.086 0.314 -2.114 ... 0.183 1.078 0.328 0.418 -0.843 -0.215 0.364 -0.280 -0.028 0.384

5 rows × 39 columns


2. Normalizing the data


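# Min-max scale every column (features and target) to the range [0, 1].
# The min and max are taken over the merged rows; NaN values are skipped by pandas.
columns = df_merge.columns
print(columns)
for column in columns:
    col = df_merge[column]
    col_min = col.min()
    col_max = col.max()
    normalized = (col - col_min) / (col_max - col_min)
    df_merge[column] = normalized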
Index(['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'V29', 'V30',
       'V31', 'V32', 'V33', 'V34', 'V35', 'V36', 'V37', 'target'],
      dtype='object')
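# Drop rows containing NaN. The test rows have no target value, so this removes them
# and leaves only the 2888 training rows.
df_merge.dropna(axis=0,inplace=True)
df_merge.shape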
(2888, 39)
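
The loop above rescales each column in place, which overwrites the original minimum and maximum. If predictions need to be reported in the original units later, one option is to keep those two numbers around. The sketch below shows the round trip; the helper names `fit_min_max`, `scale`, and `unscale` and the sample values are hypothetical, not part of the original notebook.

```python
import numpy as np

def fit_min_max(values):
    """Return the (min, max) of the raw values, ignoring NaN."""
    values = np.asarray(values, dtype=float)
    return float(np.nanmin(values)), float(np.nanmax(values))

def scale(values, lo, hi):
    """Map raw values to [0, 1] using the stored min and max."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale(): map values on the [0, 1] scale back to raw units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Made-up target values, purely for illustration
raw_target = np.array([-2.0, 0.0, 1.0, 3.0])
lo, hi = fit_min_max(raw_target)          # keep these for later
scaled_target = scale(raw_target, lo, hi)
restored_target = unscale(scaled_target, lo, hi)
print(scaled_target, restored_target)
```

Calling `min()` and `max()` on a column that has already been scaled returns 0 and 1, so the statistics have to be captured before the scaling step.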


3. Correlation


df_merge.corr()


V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V29 V30 V31 V32 V33 V34 V35 V36 V37 target
V0 1.000000 0.908607 0.463643 0.409576 0.781212 -0.327028 0.189267 0.141294 0.794013 0.077888 ... 0.302145 0.156968 0.675003 0.050951 0.056439 -0.019342 0.138933 0.231417 -0.494076 0.873212
V1 0.908607 1.000000 0.506514 0.383924 0.657790 -0.227289 0.276805 0.205023 0.874650 0.138849 ... 0.147096 0.175997 0.769745 0.085604 0.035129 -0.029115 0.146329 0.235299 -0.494043 0.871846
V2 0.463643 0.506514 1.000000 0.410148 0.057697 -0.322417 0.615938 0.477114 0.703431 0.047874 ... -0.275764 0.175943 0.653764 0.033942 0.050309 -0.025620 0.043648 0.316462 -0.734956 0.638878
V3 0.409576 0.383924 0.410148 1.000000 0.315046 -0.206307 0.233896 0.197836 0.411946 -0.063717 ... 0.117610 0.043966 0.421954 -0.092423 -0.007159 -0.031898 0.080034 0.324475 -0.229613 0.512074
V4 0.781212 0.657790 0.057697 0.315046 1.000000 -0.233959 -0.117529 -0.052370 0.449542 -0.031816 ... 0.659093 0.022807 0.447016 -0.026186 0.062367 0.028659 0.100010 0.113609 -0.031054 0.603984
V5 -0.327028 -0.227289 -0.322417 -0.206307 -0.233959 1.000000 -0.028995 0.081069 -0.182281 0.038810 ... -0.175836 -0.074214 -0.121290 -0.061886 -0.132727 -0.105801 -0.075191 0.026596 0.404799 -0.314676
V6 0.189267 0.276805 0.615938 0.233896 -0.117529 -0.028995 1.000000 0.917502 0.468233 0.450096 ... -0.467980 0.188907 0.546535 0.144550 0.054210 -0.002914 0.044992 0.433804 -0.404817 0.370037
V7 0.141294 0.205023 0.477114 0.197836 -0.052370 0.081069 0.917502 1.000000 0.389987 0.446611 ... -0.311363 0.170113 0.475254 0.122707 0.034508 -0.019103 0.111166 0.340479 -0.292285 0.287815
V8 0.794013 0.874650 0.703431 0.411946 0.449542 -0.182281 0.468233 0.389987 1.000000 0.100672 ... -0.011091 0.150258 0.878072 0.038430 0.026843 -0.036297 0.179167 0.326586 -0.553121 0.831904
V9 0.077888 0.138849 0.047874 -0.063717 -0.031816 0.038810 0.450096 0.446611 0.100672 1.000000 ... -0.221623 0.293026 0.121712 0.289891 0.115655 0.094856 0.141703 0.129542 -0.112503 0.139704
V10 0.298443 0.310120 0.346006 0.321262 0.141129 0.054060 0.415660 0.310982 0.419703 0.120208 ... -0.105042 -0.036705 0.560213 -0.093213 0.016739 -0.026994 0.026846 0.922190 -0.045851 0.394767
V11 -0.295420 -0.197317 -0.256407 -0.100489 -0.162507 0.863890 -0.147990 -0.064402 -0.146689 -0.114374 ... -0.084938 -0.153304 -0.084298 -0.153126 -0.095359 -0.053865 -0.032951 0.003413 0.459867 -0.263988
V12 0.751830 0.656186 0.059941 0.306397 0.927685 -0.306672 -0.087312 -0.036791 0.420557 -0.011889 ... 0.666775 0.028866 0.441963 -0.007658 0.046674 0.010122 0.081963 0.112150 -0.054827 0.594189
V13 0.185144 0.157518 0.204762 -0.003636 0.075993 -0.414517 0.138367 0.110973 0.153299 -0.040705 ... 0.008235 0.027328 0.113743 0.130598 0.157513 0.116944 0.219906 -0.024751 -0.379714 0.203373
V14 -0.004144 -0.006268 -0.106282 -0.232677 0.023853 -0.015671 0.072911 0.163931 0.008138 0.118176 ... 0.056814 -0.004057 0.010989 0.106581 0.073535 0.043218 0.233523 -0.086217 0.010553 0.008424
V15 0.314520 0.164702 -0.224573 0.143457 0.615704 -0.195037 -0.431542 -0.291272 0.018366 -0.199159 ... 0.951314 -0.111311 0.011768 -0.104618 0.050254 0.048602 0.100817 -0.051861 0.245635 0.154020
V16 0.347357 0.435606 0.782474 0.394517 0.023818 -0.044543 0.847119 0.752683 0.680031 0.193681 ... -0.342210 0.154794 0.778538 0.041474 0.028878 -0.054775 0.082293 0.551880 -0.420053 0.536748
V17 0.044722 0.072619 -0.019008 0.123900 0.044803 0.348211 0.134715 0.239448 0.112053 0.167310 ... 0.004855 -0.010787 0.150118 -0.051377 -0.055996 -0.064533 0.072320 0.312751 0.045842 0.104605
V18 0.148622 0.123862 0.132105 0.022868 0.136022 -0.190197 0.110570 0.098691 0.093682 0.260079 ... 0.053958 0.470341 0.079718 0.411967 0.512139 0.365410 0.152088 0.019603 -0.181937 0.170721
V19 -0.100294 -0.092673 -0.161802 -0.246008 -0.205729 0.171611 0.215290 0.158371 -0.144693 0.358149 ... -0.205409 0.100133 -0.131542 0.144018 -0.021517 -0.079753 -0.220737 0.087605 0.012115 -0.114976
V20 0.462493 0.459795 0.298385 0.289594 0.291309 -0.073232 0.136091 0.089399 0.412868 0.116111 ... 0.016233 0.086165 0.326863 0.050699 0.009358 -0.000979 0.048981 0.161315 -0.322006 0.444965
V21 -0.029285 -0.012911 -0.030932 0.114373 0.174025 0.115553 -0.051806 -0.065300 -0.047839 -0.018681 ... 0.157097 -0.077945 0.053025 -0.159128 -0.087561 -0.053707 -0.199398 0.047340 0.315470 -0.010063
V22 -0.105643 -0.102421 -0.212023 -0.291236 -0.028534 0.146545 -0.068158 0.077358 -0.097908 0.098401 ... 0.053349 -0.039953 -0.108088 0.057179 -0.019107 -0.002095 0.205423 -0.130607 0.099282 -0.107813
V23 0.231136 0.222574 0.065509 0.081374 0.196530 -0.158441 0.069901 0.125180 0.174124 0.380050 ... 0.116122 0.363963 0.129783 0.367086 0.183666 0.196681 0.635252 -0.035949 -0.187582 0.226331
V24 -0.324959 -0.233556 0.010225 -0.237326 -0.529866 0.275480 0.072418 -0.030292 -0.136898 -0.008549 ... -0.642370 0.033532 -0.202097 0.060608 -0.134320 -0.095588 -0.243738 -0.041325 -0.137614 -0.264815
V25 -0.200706 -0.070627 0.481785 -0.100569 -0.444375 0.045551 0.438610 0.316744 0.173320 0.078928 ... -0.575154 0.088238 0.201243 0.065501 -0.013312 -0.030747 -0.093948 0.069302 -0.246742 -0.019373
V26 -0.125140 -0.043012 0.035370 -0.027685 -0.080487 0.294934 0.106055 0.160566 0.015724 0.128494 ... -0.133694 -0.057247 0.062879 -0.004545 -0.034596 0.051294 0.085576 0.064963 0.010880 -0.046724
V27 0.733198 0.824198 0.726250 0.392006 0.412083 -0.218495 0.474441 0.424185 0.901100 0.114315 ... -0.032772 0.208074 0.790239 0.095127 0.030135 -0.036123 0.159884 0.226713 -0.617771 0.812585
V28 0.035119 0.077346 0.229575 0.159039 -0.044620 -0.042210 0.093427 0.058800 0.122050 -0.064595 ... -0.154572 0.054546 0.123403 0.013142 -0.024866 -0.058462 -0.080237 0.061601 -0.149326 0.100080
V29 0.302145 0.147096 -0.275764 0.117610 0.659093 -0.175836 -0.467980 -0.311363 -0.011091 -0.221623 ... 1.000000 -0.122817 -0.004364 -0.110699 0.035272 0.035392 0.078588 -0.099309 0.285581 0.123329
V30 0.156968 0.175997 0.175943 0.043966 0.022807 -0.074214 0.188907 0.170113 0.150258 0.293026 ... -0.122817 1.000000 0.114318 0.695725 0.083693 -0.028573 -0.027987 0.006961 -0.256814 0.187311
V31 0.675003 0.769745 0.653764 0.421954 0.447016 -0.121290 0.546535 0.475254 0.878072 0.121712 ... -0.004364 0.114318 1.000000 0.016782 0.016733 -0.047273 0.152314 0.510851 -0.357785 0.750297
V32 0.050951 0.085604 0.033942 -0.092423 -0.026186 -0.061886 0.144550 0.122707 0.038430 0.289891 ... -0.110699 0.695725 0.016782 1.000000 0.105255 0.069300 0.016901 -0.054411 -0.162417 0.066606
V33 0.056439 0.035129 0.050309 -0.007159 0.062367 -0.132727 0.054210 0.034508 0.026843 0.115655 ... 0.035272 0.083693 0.016733 0.105255 1.000000 0.719126 0.167597 0.031586 -0.062715 0.077273
V34 -0.019342 -0.029115 -0.025620 -0.031898 0.028659 -0.105801 -0.002914 -0.019103 -0.036297 0.094856 ... 0.035392 -0.028573 -0.047273 0.069300 0.719126 1.000000 0.233616 -0.019032 -0.006854 -0.006034
V35 0.138933 0.146329 0.043648 0.080034 0.100010 -0.075191 0.044992 0.111166 0.179167 0.141703 ... 0.078588 -0.027987 0.152314 0.016901 0.167597 0.233616 1.000000 0.025401 -0.077991 0.140294
V36 0.231417 0.235299 0.316462 0.324475 0.113609 0.026596 0.433804 0.340479 0.326586 0.129542 ... -0.099309 0.006961 0.510851 -0.054411 0.031586 -0.019032 0.025401 1.000000 -0.039478 0.319309
V37 -0.494076 -0.494043 -0.734956 -0.229613 -0.031054 0.404799 -0.404817 -0.292285 -0.553121 -0.112503 ... 0.285581 -0.256814 -0.357785 -0.162417 -0.062715 -0.006854 -0.077991 -0.039478 1.000000 -0.565795
target 0.873212 0.871846 0.638878 0.512074 0.603984 -0.314676 0.370037 0.287815 0.831904 0.139704 ... 0.123329 0.187311 0.750297 0.066606 0.077273 -0.006034 0.140294 0.319309 -0.565795 1.000000

39 rows × 39 columns

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')
# Heatmap of the correlation matrix
plt.figure(figsize=(20,12))
sns.heatmap(df_merge.corr(), annot=True)

[Figure: heatmap of the correlation matrix of V0-V37 and target]
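
A 39 × 39 matrix is hard to read at a glance. In the table above, V0 (0.873), V1 (0.872), V8 (0.832), and V27 (0.813) have the largest correlations with target. One way to pull out such a ranking directly is sketched below on a small synthetic frame; the column names `strong`, `weak`, and `noise` and the generated data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)

# Synthetic stand-in for the real data: three features with different
# amounts of noise around the same underlying signal.
toy = pd.DataFrame({
    "strong": signal + 0.1 * rng.normal(size=n),
    "weak": signal + 2.0 * rng.normal(size=n),
    "noise": rng.normal(size=n),
    "target": signal,
})

# Correlation of every feature with the target, ranked by absolute value
corr_with_target = toy.corr()["target"].drop("target")
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked)
```

On the real data, the same two lines applied to `df_merge` would give the ranking for V0 through V37.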


4. Splitting the dataset


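# Split the merged frame back by position: the first len(df) rows are the training data.
# Because the test rows were already dropped by dropna above, df_test is empty here.
df=df_merge.iloc[:df.shape[0],:]
df_test=df_merge.iloc[df.shape[0]:,:]
df.shape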
(2888, 39)
from sklearn.model_selection import train_test_split
# Hold out 25% of the training rows for validation
train,test=train_test_split(df,test_size=0.25,random_state=2023)


III. Building the Model


Build a fully connected neural network

import paddle
import paddle.nn as nn
# Define the network (dynamic graph mode)
class Net(paddle.nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = paddle.nn.Linear(38, 1000)
        self.fc2 = paddle.nn.Linear(1000, 100)
        self.fc3 = paddle.nn.Linear(100, 50)
        self.fc4 = paddle.nn.Linear(50, 1)
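    # Forward pass. There are no activation functions between the layers,
    # so the four Linear layers together are mathematically one linear map.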
    def forward(self, inputs):
        y = self.fc1(inputs)
        y = self.fc2(y)
        y = self.fc3(y)
        pred = self.fc4(y)
        return pred
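
The network above applies its Linear layers one after another with nothing in between. A stack like that can always be rewritten as a single affine layer, which the NumPy sketch below checks numerically. The layer sizes (4 → 8 → 1) and random weights are made up for illustration, and the ReLU variant is shown only for contrast, not as the author's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked affine layers with made-up small sizes: 4 -> 8 -> 1
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 4))            # a batch of 5 inputs
stacked = (x @ W1 + b1) @ W2 + b2      # layer 1 followed by layer 2

# The same map written as one affine layer
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b

# With a nonlinearity in between, the collapse no longer applies
relu_stacked = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

print(np.allclose(stacked, single))
```

In Paddle, a nonlinearity such as `paddle.nn.functional.relu` between the layers would be the usual way to let the extra layers add capacity.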


IV. Training the Model


Train the fully connected network

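model=Net()
# The loss is computed inside the loop with F.square_error_cost (squared error),
# which suits regression; a cross-entropy loss object is not needed here.
# Optimizer
opt = paddle.optimizer.Adam(learning_rate=0.1,parameters= model.parameters())
import paddle.nn.functional as F
EPOCH_NUM = 1000   # number of passes over the training data (outer loop)
BATCH_SIZE = 256  # mini-batch size
import numpy as np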
# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training rows at the start of each epoch.
    # sample() returns a new DataFrame, so the result must be assigned back.
    train = train.sample(frac=1)
    # Split the training data into mini-batches of BATCH_SIZE rows
    mini_batches = [train[k:k+BATCH_SIZE] for k in range(0, len(train), BATCH_SIZE)]
    # Inner loop over mini-batches
    for iter_id, mini_batch in enumerate(mini_batches):
        x = np.array(mini_batch.iloc[:, :-1]) # features of the current batch
        y = np.array(mini_batch.iloc[:, -1:]) # labels of the current batch
        # Convert the NumPy arrays to Paddle tensors
        features = paddle.to_tensor(x,dtype='float32')
        y = paddle.to_tensor(y,dtype='float32')
        # Forward pass
        predicts = model(features)
        # Compute the loss
        loss = F.square_error_cost(predicts, label=y)
        avg_loss = paddle.mean(loss)
        if iter_id%20==0:
            print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))
        # Backward pass: compute the gradient of every parameter
        avg_loss.backward()
        # Update the parameters by one step at the configured learning rate
        opt.step()
        # Clear the gradients before the next iteration
        opt.clear_grad()
# Save the model parameters to the file LR_model.pdparams
paddle.save(model.state_dict(), 'LR_model.pdparams')
print("Model saved; parameters are stored in LR_model.pdparams")
Model saved; parameters are stored in LR_model.pdparams
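
One detail worth checking in a loop like the one above is the per-epoch shuffle: `DataFrame.sample` returns a new, shuffled frame and leaves the original untouched. The sketch below illustrates this on a tiny frame and then slices the shuffled copy into mini-batches; the frame contents, the batch size of 4, and the fixed `random_state` are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"x": range(6)})

# Calling sample() and discarding the result does not reorder the frame
frame.sample(frac=1, random_state=0)
order_after_discard = frame["x"].tolist()

# Assigning the result gives a shuffled copy with the same rows
shuffled = frame.sample(frac=1, random_state=0)

# Slice the shuffled copy into mini-batches of 4 rows (the last one is shorter)
batches = [shuffled.iloc[k:k + 4] for k in range(0, len(shuffled), 4)]
print(order_after_discard, [len(b) for b in batches])
```

Dropping `random_state` gives a different order on every call, which is what per-epoch shuffling normally wants.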


V. Prediction


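# df_merge was normalized in place earlier, so target_max is 1 and target_min is 0 here,
# and the "denormalization" below leaves the values on the [0, 1] scale.
# To recover the original units, capture the target min and max before normalizing.
target=df_merge['target']
target_max=target.max()
target_min=target.min()
# Load the saved parameters from file
model_dict = paddle.load('LR_model.pdparams')
model.load_dict(model_dict)
model.eval()
# Use the held-out split for evaluation
one_data = np.array(test.iloc[:, :-1]) # features of the held-out rows
label = np.array(test.iloc[:, -1:]) # labels of the held-out rows
# Convert the features to a Paddle tensor
one_data = paddle.to_tensor(one_data,dtype='float32')
predict = model(one_data)
predict=predict.numpy()
# Map the predictions back from the normalized scale
predict = predict* (target_max - target_min) + target_min
# Map the labels back from the normalized scale
label = label * (target_max - target_min) + target_min
for i in range(10):
    print("Inference result is {}, the corresponding label is {}".format(predict[i], label[i]))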
Inference result is [0.18284929], the corresponding label is [0.12307417]
Inference result is [0.3642674], the corresponding label is [0.46524543]
Inference result is [0.382918], the corresponding label is [0.46291652]
Inference result is [0.50540245], the corresponding label is [0.52185597]
Inference result is [0.63235503], the corresponding label is [0.62683626]
Inference result is [0.6009079], the corresponding label is [0.59620208]
Inference result is [0.6699288], the corresponding label is [0.74435686]
Inference result is [0.51734024], the corresponding label is [0.55499821]
Inference result is [0.4891268], the corresponding label is [0.59638123]
Inference result is [0.40018553], the corresponding label is [0.49390899]

