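I. Industrial Steam Volume Prediction

1. Competition Overview

The Tianchi Newcomer Practice Competitions are hands-on practice tracks for people who are new to data work. They use classic competition problems as learning scenarios and provide detailed introductory tutorials that walk you through data mining step by step. Tianchi hopes these competitions will become a popular hands-on data course at universities and help more students build data skills.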

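2. Background

The basic principle of thermal power generation is as follows: burning fuel heats water to produce steam, the steam pressure drives a turbine, and the turbine in turn drives a generator that produces electricity. In this chain of energy conversions, the key factor for generating efficiency is the combustion efficiency of the boiler, that is, how effectively burning fuel heats water into high-temperature, high-pressure steam. Many factors affect boiler combustion efficiency. They include adjustable boiler parameters, such as fuel feed rate, primary and secondary air, induced draft, return-material air, and feedwater flow, as well as boiler operating conditions, such as bed temperature, bed pressure, furnace temperature and pressure, and superheater temperature.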

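3. Problem Description

Given desensitized data collected from boiler sensors (sampled at minute-level frequency), predict the amount of steam produced from the boiler's operating conditions.

Data description

The data is split into training data (train.txt) and test data (test.txt). The 38 fields "V0" through "V37" are the feature variables, and "target" is the target variable. Participants train a model on the training data and predict the target variable for the test data; rankings are based on the MSE (mean squared error) of the predictions.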
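For reference, the ranking metric can be written out explicitly. The symbols are notation chosen for this note rather than taken from the competition page: $n$ is the number of test samples, $y_i$ is the true steam value of sample $i$, and $\hat{y}_i$ is the model's prediction for it.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```

A lower MSE means the predictions are closer to the true values, and large individual errors are penalized more heavily because each error is squared.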

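II. Data Processing

1. Reading the Data

import pandas as pd
# The training file is tab-separated: 38 feature columns (V0-V37) plus 'target'
df=pd.read_csv("data/data178496/zhengqi_train.txt",sep='\t')
df.head()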


	V0	V1	V2	V3	V4	V5	V6	V7	V8	V9	...	V29	V30	V31	V32	V33	V34	V35	V36	V37	target
0	0.566	0.016	-0.143	0.407	0.452	-0.901	-1.812	-2.360	-0.436	-2.114	...	0.136	0.109	-0.615	0.327	-4.627	-4.789	-5.101	-2.608	-3.508	0.175
1	0.968	0.437	0.066	0.566	0.194	-0.893	-1.566	-2.360	0.332	-2.114	...	-0.128	0.124	0.032	0.600	-0.843	0.160	0.364	-0.335	-0.730	0.676
2	1.013	0.568	0.235	0.370	0.112	-0.797	-1.367	-2.360	0.396	-2.114	...	-0.009	0.361	0.277	-0.116	-0.843	0.160	0.364	0.765	-0.589	0.633
3	0.733	0.368	0.283	0.165	0.599	-0.679	-1.200	-2.086	0.403	-2.114	...	0.015	0.417	0.279	0.603	-0.843	-0.065	0.364	0.333	-0.112	0.206
4	0.684	0.638	0.260	0.209	0.337	-0.454	-1.073	-2.086	0.314	-2.114	...	0.183	1.078	0.328	0.418	-0.843	-0.215	0.364	-0.280	-0.028	0.384

5 rows × 39 columns

df.isnull().sum()

V0        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
V29       0
V30       0
V31       0
V32       0
V33       0
V34       0
V35       0
V36       0
V37       0
target    0
dtype: int64

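import pandas as pd
df_test=pd.read_csv("data/data178496/zhengqi_test.txt",sep='\t')

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
# The test rows carry no 'target' value, so that column is NaN for them after merging.
df_merge=pd.concat([df, df_test])

df_merge.head()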


	V0	V1	V2	V3	V4	V5	V6	V7	V8	V9	...	V29	V30	V31	V32	V33	V34	V35	V36	V37	target
0	0.566	0.016	-0.143	0.407	0.452	-0.901	-1.812	-2.360	-0.436	-2.114	...	0.136	0.109	-0.615	0.327	-4.627	-4.789	-5.101	-2.608	-3.508	0.175
1	0.968	0.437	0.066	0.566	0.194	-0.893	-1.566	-2.360	0.332	-2.114	...	-0.128	0.124	0.032	0.600	-0.843	0.160	0.364	-0.335	-0.730	0.676
2	1.013	0.568	0.235	0.370	0.112	-0.797	-1.367	-2.360	0.396	-2.114	...	-0.009	0.361	0.277	-0.116	-0.843	0.160	0.364	0.765	-0.589	0.633
3	0.733	0.368	0.283	0.165	0.599	-0.679	-1.200	-2.086	0.403	-2.114	...	0.015	0.417	0.279	0.603	-0.843	-0.065	0.364	0.333	-0.112	0.206
4	0.684	0.638	0.260	0.209	0.337	-0.454	-1.073	-2.086	0.314	-2.114	...	0.183	1.078	0.328	0.418	-0.843	-0.215	0.364	-0.280	-0.028	0.384

5 rows × 39 columns

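2. Min-Max Normalization

columns = df_merge.columns
print(columns)
# Rescale every column to [0, 1]: (x - min) / (max - min).
# Note that 'target' is rescaled as well; min() and max() skip NaN values.
for column in columns:
    col = df_merge[column]
    col_min = col.min()
    col_max = col.max()
    normalized = (col - col_min) / (col_max - col_min)
    df_merge[column] = normalized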

Index(['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'V29', 'V30',
       'V31', 'V32', 'V33', 'V34', 'V35', 'V36', 'V37', 'target'],
      dtype='object')

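# Drop every row that contains a NaN. The test rows have no 'target' value,
# so this removes all of them and leaves only the 2888 labeled training rows.
df_merge.dropna(axis=0,inplace=True)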

df_merge.shape

(2888, 39)
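The normalization above overwrites each column, so the original minimum and maximum are lost unless they are recorded first. A minimal sketch in plain Python of min-max scaling that keeps those two numbers, so the transform can be inverted later, might look like this. The helper names `fit_min_max`, `scale`, and `unscale` are hypothetical, and the sample values are the first five `target` values from the `df.head()` output above.

```python
def fit_min_max(values):
    """Return the (min, max) pair needed to scale and later unscale a column."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values from [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert scale(): map values from [0, 1] back onto [lo, hi]."""
    return [s * (hi - lo) + lo for s in scaled]


# First five target values shown in the df.head() output
raw_target = [0.175, 0.676, 0.633, 0.206, 0.384]

# Record min and max BEFORE overwriting the data with its scaled version
lo, hi = fit_min_max(raw_target)
scaled = scale(raw_target, lo, hi)
restored = unscale(scaled, lo, hi)

print(scaled)
print(restored)
```

Keeping `lo` and `hi` for the target column is what makes it possible to report predictions in the original steam units after training on scaled data.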

3. Correlation Analysis

df_merge.corr()


	V0	V1	V2	V3	V4	V5	V6	V7	V8	V9	...	V29	V30	V31	V32	V33	V34	V35	V36	V37	target
V0	1.000000	0.908607	0.463643	0.409576	0.781212	-0.327028	0.189267	0.141294	0.794013	0.077888	...	0.302145	0.156968	0.675003	0.050951	0.056439	-0.019342	0.138933	0.231417	-0.494076	0.873212
V1	0.908607	1.000000	0.506514	0.383924	0.657790	-0.227289	0.276805	0.205023	0.874650	0.138849	...	0.147096	0.175997	0.769745	0.085604	0.035129	-0.029115	0.146329	0.235299	-0.494043	0.871846
V2	0.463643	0.506514	1.000000	0.410148	0.057697	-0.322417	0.615938	0.477114	0.703431	0.047874	...	-0.275764	0.175943	0.653764	0.033942	0.050309	-0.025620	0.043648	0.316462	-0.734956	0.638878
V3	0.409576	0.383924	0.410148	1.000000	0.315046	-0.206307	0.233896	0.197836	0.411946	-0.063717	...	0.117610	0.043966	0.421954	-0.092423	-0.007159	-0.031898	0.080034	0.324475	-0.229613	0.512074
V4	0.781212	0.657790	0.057697	0.315046	1.000000	-0.233959	-0.117529	-0.052370	0.449542	-0.031816	...	0.659093	0.022807	0.447016	-0.026186	0.062367	0.028659	0.100010	0.113609	-0.031054	0.603984
V5	-0.327028	-0.227289	-0.322417	-0.206307	-0.233959	1.000000	-0.028995	0.081069	-0.182281	0.038810	...	-0.175836	-0.074214	-0.121290	-0.061886	-0.132727	-0.105801	-0.075191	0.026596	0.404799	-0.314676
V6	0.189267	0.276805	0.615938	0.233896	-0.117529	-0.028995	1.000000	0.917502	0.468233	0.450096	...	-0.467980	0.188907	0.546535	0.144550	0.054210	-0.002914	0.044992	0.433804	-0.404817	0.370037
V7	0.141294	0.205023	0.477114	0.197836	-0.052370	0.081069	0.917502	1.000000	0.389987	0.446611	...	-0.311363	0.170113	0.475254	0.122707	0.034508	-0.019103	0.111166	0.340479	-0.292285	0.287815
V8	0.794013	0.874650	0.703431	0.411946	0.449542	-0.182281	0.468233	0.389987	1.000000	0.100672	...	-0.011091	0.150258	0.878072	0.038430	0.026843	-0.036297	0.179167	0.326586	-0.553121	0.831904
V9	0.077888	0.138849	0.047874	-0.063717	-0.031816	0.038810	0.450096	0.446611	0.100672	1.000000	...	-0.221623	0.293026	0.121712	0.289891	0.115655	0.094856	0.141703	0.129542	-0.112503	0.139704
V10	0.298443	0.310120	0.346006	0.321262	0.141129	0.054060	0.415660	0.310982	0.419703	0.120208	...	-0.105042	-0.036705	0.560213	-0.093213	0.016739	-0.026994	0.026846	0.922190	-0.045851	0.394767
V11	-0.295420	-0.197317	-0.256407	-0.100489	-0.162507	0.863890	-0.147990	-0.064402	-0.146689	-0.114374	...	-0.084938	-0.153304	-0.084298	-0.153126	-0.095359	-0.053865	-0.032951	0.003413	0.459867	-0.263988
V12	0.751830	0.656186	0.059941	0.306397	0.927685	-0.306672	-0.087312	-0.036791	0.420557	-0.011889	...	0.666775	0.028866	0.441963	-0.007658	0.046674	0.010122	0.081963	0.112150	-0.054827	0.594189
V13	0.185144	0.157518	0.204762	-0.003636	0.075993	-0.414517	0.138367	0.110973	0.153299	-0.040705	...	0.008235	0.027328	0.113743	0.130598	0.157513	0.116944	0.219906	-0.024751	-0.379714	0.203373
V14	-0.004144	-0.006268	-0.106282	-0.232677	0.023853	-0.015671	0.072911	0.163931	0.008138	0.118176	...	0.056814	-0.004057	0.010989	0.106581	0.073535	0.043218	0.233523	-0.086217	0.010553	0.008424
V15	0.314520	0.164702	-0.224573	0.143457	0.615704	-0.195037	-0.431542	-0.291272	0.018366	-0.199159	...	0.951314	-0.111311	0.011768	-0.104618	0.050254	0.048602	0.100817	-0.051861	0.245635	0.154020
V16	0.347357	0.435606	0.782474	0.394517	0.023818	-0.044543	0.847119	0.752683	0.680031	0.193681	...	-0.342210	0.154794	0.778538	0.041474	0.028878	-0.054775	0.082293	0.551880	-0.420053	0.536748
V17	0.044722	0.072619	-0.019008	0.123900	0.044803	0.348211	0.134715	0.239448	0.112053	0.167310	...	0.004855	-0.010787	0.150118	-0.051377	-0.055996	-0.064533	0.072320	0.312751	0.045842	0.104605
V18	0.148622	0.123862	0.132105	0.022868	0.136022	-0.190197	0.110570	0.098691	0.093682	0.260079	...	0.053958	0.470341	0.079718	0.411967	0.512139	0.365410	0.152088	0.019603	-0.181937	0.170721
V19	-0.100294	-0.092673	-0.161802	-0.246008	-0.205729	0.171611	0.215290	0.158371	-0.144693	0.358149	...	-0.205409	0.100133	-0.131542	0.144018	-0.021517	-0.079753	-0.220737	0.087605	0.012115	-0.114976
V20	0.462493	0.459795	0.298385	0.289594	0.291309	-0.073232	0.136091	0.089399	0.412868	0.116111	...	0.016233	0.086165	0.326863	0.050699	0.009358	-0.000979	0.048981	0.161315	-0.322006	0.444965
V21	-0.029285	-0.012911	-0.030932	0.114373	0.174025	0.115553	-0.051806	-0.065300	-0.047839	-0.018681	...	0.157097	-0.077945	0.053025	-0.159128	-0.087561	-0.053707	-0.199398	0.047340	0.315470	-0.010063
V22	-0.105643	-0.102421	-0.212023	-0.291236	-0.028534	0.146545	-0.068158	0.077358	-0.097908	0.098401	...	0.053349	-0.039953	-0.108088	0.057179	-0.019107	-0.002095	0.205423	-0.130607	0.099282	-0.107813
V23	0.231136	0.222574	0.065509	0.081374	0.196530	-0.158441	0.069901	0.125180	0.174124	0.380050	...	0.116122	0.363963	0.129783	0.367086	0.183666	0.196681	0.635252	-0.035949	-0.187582	0.226331
V24	-0.324959	-0.233556	0.010225	-0.237326	-0.529866	0.275480	0.072418	-0.030292	-0.136898	-0.008549	...	-0.642370	0.033532	-0.202097	0.060608	-0.134320	-0.095588	-0.243738	-0.041325	-0.137614	-0.264815
V25	-0.200706	-0.070627	0.481785	-0.100569	-0.444375	0.045551	0.438610	0.316744	0.173320	0.078928	...	-0.575154	0.088238	0.201243	0.065501	-0.013312	-0.030747	-0.093948	0.069302	-0.246742	-0.019373
V26	-0.125140	-0.043012	0.035370	-0.027685	-0.080487	0.294934	0.106055	0.160566	0.015724	0.128494	...	-0.133694	-0.057247	0.062879	-0.004545	-0.034596	0.051294	0.085576	0.064963	0.010880	-0.046724
V27	0.733198	0.824198	0.726250	0.392006	0.412083	-0.218495	0.474441	0.424185	0.901100	0.114315	...	-0.032772	0.208074	0.790239	0.095127	0.030135	-0.036123	0.159884	0.226713	-0.617771	0.812585
V28	0.035119	0.077346	0.229575	0.159039	-0.044620	-0.042210	0.093427	0.058800	0.122050	-0.064595	...	-0.154572	0.054546	0.123403	0.013142	-0.024866	-0.058462	-0.080237	0.061601	-0.149326	0.100080
V29	0.302145	0.147096	-0.275764	0.117610	0.659093	-0.175836	-0.467980	-0.311363	-0.011091	-0.221623	...	1.000000	-0.122817	-0.004364	-0.110699	0.035272	0.035392	0.078588	-0.099309	0.285581	0.123329
V30	0.156968	0.175997	0.175943	0.043966	0.022807	-0.074214	0.188907	0.170113	0.150258	0.293026	...	-0.122817	1.000000	0.114318	0.695725	0.083693	-0.028573	-0.027987	0.006961	-0.256814	0.187311
V31	0.675003	0.769745	0.653764	0.421954	0.447016	-0.121290	0.546535	0.475254	0.878072	0.121712	...	-0.004364	0.114318	1.000000	0.016782	0.016733	-0.047273	0.152314	0.510851	-0.357785	0.750297
V32	0.050951	0.085604	0.033942	-0.092423	-0.026186	-0.061886	0.144550	0.122707	0.038430	0.289891	...	-0.110699	0.695725	0.016782	1.000000	0.105255	0.069300	0.016901	-0.054411	-0.162417	0.066606
V33	0.056439	0.035129	0.050309	-0.007159	0.062367	-0.132727	0.054210	0.034508	0.026843	0.115655	...	0.035272	0.083693	0.016733	0.105255	1.000000	0.719126	0.167597	0.031586	-0.062715	0.077273
V34	-0.019342	-0.029115	-0.025620	-0.031898	0.028659	-0.105801	-0.002914	-0.019103	-0.036297	0.094856	...	0.035392	-0.028573	-0.047273	0.069300	0.719126	1.000000	0.233616	-0.019032	-0.006854	-0.006034
V35	0.138933	0.146329	0.043648	0.080034	0.100010	-0.075191	0.044992	0.111166	0.179167	0.141703	...	0.078588	-0.027987	0.152314	0.016901	0.167597	0.233616	1.000000	0.025401	-0.077991	0.140294
V36	0.231417	0.235299	0.316462	0.324475	0.113609	0.026596	0.433804	0.340479	0.326586	0.129542	...	-0.099309	0.006961	0.510851	-0.054411	0.031586	-0.019032	0.025401	1.000000	-0.039478	0.319309
V37	-0.494076	-0.494043	-0.734956	-0.229613	-0.031054	0.404799	-0.404817	-0.292285	-0.553121	-0.112503	...	0.285581	-0.256814	-0.357785	-0.162417	-0.062715	-0.006854	-0.077991	-0.039478	1.000000	-0.565795
target	0.873212	0.871846	0.638878	0.512074	0.603984	-0.314676	0.370037	0.287815	0.831904	0.139704	...	0.123329	0.187311	0.750297	0.066606	0.077273	-0.006034	0.140294	0.319309	-0.565795	1.000000

39 rows × 39 columns

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')
# Heatmap of the pairwise correlation matrix
plt.figure(figsize=(20,12))
sns.heatmap(df_merge.corr(), annot=True)
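One common use of a correlation table is to rank features by how strongly they correlate with the target. The sketch below does that in plain Python using the `target` column of the correlation table printed above. The 0.5 cutoff is an arbitrary assumption made for illustration; the article itself trains on all 38 features and does not apply any such filter.

```python
# Pearson correlation of each feature with 'target', copied from the table above
corr_with_target = {
    "V0": 0.873212, "V1": 0.871846, "V2": 0.638878, "V3": 0.512074,
    "V4": 0.603984, "V5": -0.314676, "V6": 0.370037, "V7": 0.287815,
    "V8": 0.831904, "V9": 0.139704, "V10": 0.394767, "V11": -0.263988,
    "V12": 0.594189, "V13": 0.203373, "V14": 0.008424, "V15": 0.154020,
    "V16": 0.536748, "V17": 0.104605, "V18": 0.170721, "V19": -0.114976,
    "V20": 0.444965, "V21": -0.010063, "V22": -0.107813, "V23": 0.226331,
    "V24": -0.264815, "V25": -0.019373, "V26": -0.046724, "V27": 0.812585,
    "V28": 0.100080, "V29": 0.123329, "V30": 0.187311, "V31": 0.750297,
    "V32": 0.066606, "V33": 0.077273, "V34": -0.006034, "V35": 0.140294,
    "V36": 0.319309, "V37": -0.565795,
}

THRESHOLD = 0.5  # hypothetical cutoff, chosen only for this example

# Rank by absolute correlation so strong negative relationships count too
ranked = sorted(corr_with_target, key=lambda name: abs(corr_with_target[name]), reverse=True)
selected = [name for name in ranked if abs(corr_with_target[name]) >= THRESHOLD]

print(selected)
```

Ranking by absolute value matters here because V37 is negatively correlated with the target yet still carries a fairly strong linear signal.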

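4. Splitting the Dataset

# Split the merged frame back into its labeled and unlabeled parts by position.
# Because the earlier dropna() removed every row without a target, df_test is empty here.
df=df_merge.iloc[:df.shape[0],:]
df_test=df_merge.iloc[df.shape[0]:,:]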

df.shape

(2888, 39)

from sklearn.model_selection import train_test_split
# Hold out 25% of the labeled rows for validation; the fixed seed makes the split reproducible
train,test=train_test_split(df,test_size=0.25,random_state=2023)
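To make the split sizes concrete, here is a dependency-free sketch of a shuffled holdout split over row indices. It is an illustration only: the helper `holdout_split` is hypothetical and does not reproduce the exact permutation that `train_test_split` generates for the same seed, but it yields the same partition sizes for 2888 rows and a 25% holdout.

```python
import math
import random


def holdout_split(n_rows, test_size, seed):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(2888, 0.25, seed=2023)
print(len(train_idx), len(test_idx))  # 2166 722
```

Every row lands in exactly one of the two lists, so the validation score is computed only on rows the model never saw during training.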

III. Model Construction

Build a fully connected neural network

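import paddle
import paddle.nn as nn
# Define the network as a dynamic-graph Layer
class Net(paddle.nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        # 38 input features -> 1000 -> 100 -> 50 -> 1 output (predicted steam volume)
        self.fc1 = paddle.nn.Linear(38, 1000)
        self.fc2 = paddle.nn.Linear(1000, 100)
        self.fc3 = paddle.nn.Linear(100, 50)
        self.fc4 = paddle.nn.Linear(50, 1)
    # Forward pass. No activation function is applied between the layers,
    # so the network as a whole is still a linear function of its inputs.
    def forward(self, inputs):
        y = self.fc1(inputs)
        y = self.fc2(y)
        y = self.fc3(y)
        pred = self.fc4(y)
        return pred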
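Because the layers are stacked without activation functions, the four Linear layers compose into a single linear map. The sketch below checks this for two small layers in plain Python. All weights and the input are made-up numbers for illustration, and the helper functions are hypothetical stand-ins for what `paddle.nn.Linear` computes.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


# Hypothetical weights: layer 1 maps 2 -> 3 values, layer 2 maps 3 -> 1 value
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [1.0, 0.0, -2.0]
W2 = [[2.0, 1.0, -1.0]]
b2 = [4.0]

x = [5.0, -3.0]

# Two linear layers applied one after the other
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
single = add(matvec(W, x), b)

print(stacked, single)
```

Placing a nonlinearity such as ReLU between the layers would break this equivalence and let the network represent nonlinear relationships; whether that improves the score on this dataset is not something the article tests.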

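IV. Model Training

Instantiate the model, loss function, and optimizer

model=Net()
# Steam volume is a continuous target, so a regression loss (MSE) fits here, not cross-entropy.
# The training loop below computes the same squared-error loss through F.square_error_cost.
loss_func = paddle.nn.MSELoss()
# Optimizer
opt = paddle.optimizer.Adam(learning_rate=0.1,parameters= model.parameters())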

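import paddle.nn.functional as F
import numpy as np

EPOCH_NUM = 1000   # number of passes over the training set (outer loop)
BATCH_SIZE = 256   # number of rows per mini-batch

# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training rows at the start of each epoch.
    # DataFrame.sample returns a new frame, so the result has to be kept.
    shuffled = train.sample(frac=1)
    # Split the shuffled data into mini-batches of BATCH_SIZE rows each
    mini_batches = [shuffled[k:k+BATCH_SIZE] for k in range(0, len(shuffled), BATCH_SIZE)]
    # Inner loop over mini-batches
    for iter_id, mini_batch in enumerate(mini_batches):
        x = np.array(mini_batch.iloc[:, :-1])  # features of the current batch
        y = np.array(mini_batch.iloc[:, -1:])  # labels of the current batch
        # Convert the NumPy arrays to Paddle dynamic-graph tensors
        features = paddle.to_tensor(x,dtype='float32')
        y = paddle.to_tensor(y,dtype='float32')
        # Forward pass
        predicts = model(features)
        # Squared-error loss, averaged over the batch
        loss = F.square_error_cost(predicts, label=y)
        avg_loss = paddle.mean(loss)
        if iter_id%20==0:
            print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))
        # Backward pass: compute the gradient of every parameter
        avg_loss.backward()
        # Update the parameters with one optimizer step at the configured learning rate
        opt.step()
        # Clear the gradients before the next iteration
        opt.clear_grad()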
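The batching logic of the loop can be looked at in isolation. The following sketch, in plain Python, shuffles row indices once per call and yields them in chunks. The generator name `iter_minibatches` is hypothetical, and the row count of 2166 assumes the 75% training share of the 2888 labeled rows.

```python
import random


def iter_minibatches(n_rows, batch_size, seed=None):
    """Yield lists of row indices covering every row exactly once, in random order."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    for start in range(0, n_rows, batch_size):
        yield order[start:start + batch_size]


batches = list(iter_minibatches(2166, 256, seed=0))
print(len(batches), len(batches[-1]))  # 9 118
```

With only nine batches per epoch, a logging condition such as `iter_id % 20 == 0` fires once per epoch, on the first batch.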

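# Save the model parameters to a file named LR_model.pdparams
paddle.save(model.state_dict(), 'LR_model.pdparams')
print("Model saved successfully; parameters are stored in LR_model.pdparams")

Model saved successfully; parameters are stored in LR_model.pdparams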

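V. Model Prediction

# Note: df_merge['target'] was already min-max scaled in section II.2, so at this point
# target_min is 0 and target_max is 1, and the inverse transform further down leaves the
# values on the normalized 0-1 scale. To recover original units, record the raw min and
# max of the target before scaling.
target=df_merge['target']
target_max=target.max()
target_min=target.min()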

# Load the saved parameters from the given file path
model_dict = paddle.load('LR_model.pdparams')
model.load_dict(model_dict)
model.eval()
# Evaluate on the held-out split
one_data = np.array(test.iloc[:, :-1])  # held-out features
label = np.array(test.iloc[:, -1:])     # held-out labels
# Convert the features to a dynamic-graph tensor
one_data = paddle.to_tensor(one_data,dtype='float32')
predict = model(one_data)
predict=predict.numpy()

# Invert the min-max scaling of the predictions
predict = predict* (target_max - target_min) + target_min
# Invert the min-max scaling of the labels
label = label * (target_max - target_min) + target_min
for i in range(10):
    print("Inference result is {}, the corresponding label is {}".format(predict[i], label[i]))

Inference result is [0.18284929], the corresponding label is [0.12307417]
Inference result is [0.3642674], the corresponding label is [0.46524543]
Inference result is [0.382918], the corresponding label is [0.46291652]
Inference result is [0.50540245], the corresponding label is [0.52185597]
Inference result is [0.63235503], the corresponding label is [0.62683626]
Inference result is [0.6009079], the corresponding label is [0.59620208]
Inference result is [0.6699288], the corresponding label is [0.74435686]
Inference result is [0.51734024], the corresponding label is [0.55499821]
Inference result is [0.4891268], the corresponding label is [0.59638123]
Inference result is [0.40018553], the corresponding label is [0.49390899]
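Since the competition ranks by MSE, it can help to summarize printed pairs like these with that metric. The sketch below computes it in plain Python over the ten prediction and label pairs shown above. It covers only those ten rows, so it is a rough spot check and not the score on the full held-out split, and the values are on the scale in which they were printed.

```python
# The ten (prediction, label) pairs printed above
predictions = [0.18284929, 0.3642674, 0.382918, 0.50540245, 0.63235503,
               0.6009079, 0.6699288, 0.51734024, 0.4891268, 0.40018553]
labels = [0.12307417, 0.46524543, 0.46291652, 0.52185597, 0.62683626,
          0.59620208, 0.74435686, 0.55499821, 0.59638123, 0.49390899]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


mse = mean_squared_error(labels, predictions)
print("MSE over the ten printed pairs: {:.4f}".format(mse))
```

Running the same function over every row of the held-out split would give a validation estimate that can be compared across model variants.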
