COVID-19 Cases Prediction (Regression) (Part 1)


Objectives:

  • Solve a regression problem with deep neural networks (DNN).
  • Understand basic DNN training tips.
  • Familiarize yourself with PyTorch.

Task Description

  • COVID-19 Cases Prediction
  • Source: Delphi group @ CMU
  • A daily survey conducted via Facebook since April 2020.


Searching for the original data and using it for training is forbidden.



  • Given survey results from the past 5 days in a specific U.S. state, predict the percentage of new positive test cases on the 5th day.


Data


The surveys were conducted via Facebook every day in every state. Survey topics: symptoms, COVID-19 testing, social distancing, mental health, demographics, economic effects, …

  • States (37, encoded as one-hot vectors)
  • COVID-like illness (4): cli, ili, …
  • Behavior Indicators (8): wearing_mask, travel_outside_state, …
  • Mental Health Indicators (3): anxious, depressed, …
  • Tested Positive Cases (1): tested_positive (this is what we want to predict)
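The group sizes listed above determine how wide one input row is. The sketch below does that arithmetic under one assumption that the text implies but does not spell out: each of the daily indicators is reported once per day for all 5 days, and only the last day's tested_positive is held out as the target. Extra columns such as a row id are not counted. The constant names are hypothetical.

```python
# Group sizes taken from the feature list above.
NUM_STATES = 37  # one-hot state columns
DAILY_INDICATORS = {
    "covid_like_illness": 4,
    "behavior": 8,
    "mental_health": 3,
    "tested_positive": 1,
}
NUM_DAYS = 5  # assumption: every indicator appears once per day

per_day = sum(DAILY_INDICATORS.values())         # indicators reported each day
total_columns = NUM_STATES + per_day * NUM_DAYS  # includes the target column
input_dim = total_columns - 1                    # drop the 5th day's tested_positive

print(per_day, total_columns, input_dim)  # 16 117 116
```

Under these assumptions a model would take 116 inputs per sample. If the CSV carries additional columns, the real count differs accordingly.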


Data – One-hot Vector

  • One-hot vectors:

   Vectors in which exactly one element equals one and all others are zero. They are usually used to encode discrete values.

For details on one-hot vectors, see the blog post: One-Hot
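As a minimal illustration of the definition above, the following sketch encodes a discrete label as a one-hot vector in plain Python. The function name one_hot and the three placeholder state names are hypothetical and are not taken from the dataset.

```python
def one_hot(index, num_classes):
    """Return a list with 1.0 at position `index` and 0.0 everywhere else."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in [0, num_classes)")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical category names; the real data uses 37 states.
states = ["state_a", "state_b", "state_c"]
state_to_index = {name: i for i, name in enumerate(states)}

encoded = one_hot(state_to_index["state_b"], len(states))
print(encoded)  # [0.0, 1.0, 0.0]
```

With 37 states, the same idea yields 37 columns per row, exactly one of which is 1.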


Evaluation Metric

  • Mean Squared Error (MSE)
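MSE is the average of the squared differences between predictions and targets: MSE = (1/N) * sum over i of (pred_i - target_i)^2. The sketch below computes it in plain Python; the function name and the sample numbers are made up for illustration.

```python
def mean_squared_error(preds, targets):
    """Average of squared differences between two equal-length sequences."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if len(preds) == 0:
        raise ValueError("inputs must not be empty")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


# Made-up values: the squared errors are 1, 0, and 4, so the mean is 5/3.
preds = [2.0, 4.0, 6.0]
targets = [1.0, 4.0, 8.0]
mse = mean_squared_error(preds, targets)
print(mse)
```

In PyTorch, nn.MSELoss with its default reduction='mean' computes the same quantity on tensors.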


Download data

If the Google Drive links below do not work, you can download the data from Kaggle and upload it manually to the workspace.

!gdown --id '1kLSW_-cW2Huj7bh84YTdimGBOJaODiOS' --output covid.train.csv
!gdown --id '1iiI5qROrAhZn-o4FPqsE97bMzDEFvIdg' --output covid.test.csv
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1kLSW_-cW2Huj7bh84YTdimGBOJaODiOS
To: /content/covid.train.csv
100% 2.49M/2.49M [00:00<00:00, 238MB/s]
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1iiI5qROrAhZn-o4FPqsE97bMzDEFvIdg
To: /content/covid.test.csv
100% 993k/993k [00:00<00:00, 137MB/s]

Import packages

# Numerical Operations
import math
import numpy as np
# Reading/Writing Data
import pandas as pd
import os
import csv
# For Progress Bar
from tqdm import tqdm
# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split
# For plotting learning curve
from torch.utils.tensorboard import SummaryWriter

Some Utility Functions

You do not need to modify this part.

def same_seed(seed): 
    '''Fixes random number generator seeds for reproducibility.'''
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def train_valid_split(data_set, valid_ratio, seed):
    '''Split provided training data into training set and validation set'''
    valid_set_size = int(valid_ratio * len(data_set)) 
    train_set_size = len(data_set) - valid_set_size
    train_set, valid_set = random_split(data_set, [train_set_size, valid_set_size], generator=torch.Generator().manual_seed(seed))
    return np.array(train_set), np.array(valid_set)

def predict(test_loader, model, device):
    model.eval() # Set your model to evaluation mode.
    preds = []
    for x in tqdm(test_loader):
        x = x.to(device)                        
        with torch.no_grad():                   
            pred = model(x)                     
            preds.append(pred.detach().cpu())   
    preds = torch.cat(preds, dim=0).numpy()  
    return preds
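To see why train_valid_split passes a seeded generator to random_split, the sketch below repeats the same split twice on a small made-up array and checks that both runs select the same rows. The helper split_indices and the toy data are hypothetical; only random_split and torch.Generator come from PyTorch.

```python
import numpy as np
import torch
from torch.utils.data import random_split

# Toy data: 10 made-up rows with 2 columns each.
data = np.arange(20, dtype=np.float32).reshape(10, 2)

valid_ratio = 0.2
valid_size = int(valid_ratio * len(data))
train_size = len(data) - valid_size


def split_indices(seed):
    """Return (train_indices, valid_indices) chosen by a seeded random_split."""
    generator = torch.Generator().manual_seed(seed)
    train_subset, valid_subset = random_split(
        data, [train_size, valid_size], generator=generator
    )
    return list(train_subset.indices), list(valid_subset.indices)


first = split_indices(seed=0)
second = split_indices(seed=0)

print(len(first[0]), len(first[1]))  # 8 2
print(first == second)               # True: same seed, same split
```

Keeping the seed fixed means the validation set stays the same across runs, so validation losses from different experiments remain comparable.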

Dataset

class COVID19Dataset(Dataset):
    '''
    x: Features.
    y: Targets. If None, the dataset is used for prediction and returns features only.
    '''
    def __init__(self, x, y=None):
        if y is None:
            self.y = y
        else:
            self.y = torch.FloatTensor(y)
        self.x = torch.FloatTensor(x)
    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        else:
            return self.x[idx], self.y[idx]
    def __len__(self):
        return len(self.x)
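The sketch below shows how such a dataset behaves inside a DataLoader, with and without targets. The class is restated in compact form only so that the snippet runs on its own; the feature values, targets, and batch size are made up.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class COVID19Dataset(Dataset):
    """Compact restatement of the class above, for a standalone example."""

    def __init__(self, x, y=None):
        self.x = torch.FloatTensor(x)
        self.y = None if y is None else torch.FloatTensor(y)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        return self.x[idx], self.y[idx]

    def __len__(self):
        return len(self.x)


# Made-up data: 5 samples with 3 features each.
x = [[0.1, 0.2, 0.3] for _ in range(5)]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

# With targets, each batch is a (features, targets) pair.
train_loader = DataLoader(COVID19Dataset(x, y), batch_size=2, shuffle=False)
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)  # torch.Size([2, 3]) torch.Size([2])

# Without targets, each batch contains features only.
test_loader = DataLoader(COVID19Dataset(x), batch_size=2, shuffle=False)
xb_only = next(iter(test_loader))
print(xb_only.shape)  # torch.Size([2, 3])
```

The features-only form is what the predict utility expects, since it iterates over the loader and passes each batch straight to the model.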

Neural Network Model

Try out different model architectures by modifying the class below.

class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions. 
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1)
        )
    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1) # (B, 1) -> (B)
        return x
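When changing the architecture, a quick shape check on random input catches dimension mismatches early. The sketch below rebuilds the same layer stack with nn.Sequential and runs one forward pass; the input_dim value of 116 and the batch size of 4 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

input_dim = 116  # assumed feature count; use the real width of your data

# Same layer sizes as the model above.
layers = nn.Sequential(
    nn.Linear(input_dim, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

batch = torch.randn(4, input_dim)  # 4 random samples
out = layers(batch)                # shape (B, 1)
squeezed = out.squeeze(1)          # shape (B,), matching the targets

print(out.shape, squeezed.shape)   # torch.Size([4, 1]) torch.Size([4])

# Count trainable parameters to gauge model size.
num_params = sum(p.numel() for p in layers.parameters())
print(num_params)
```

The squeeze step matters because the MSE loss compares predictions with targets of shape (B,); leaving the output as (B, 1) would trigger broadcasting instead of an element-wise comparison.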





