Objectives:
- Solve a regression problem with deep neural networks (DNN).
- Understand basic DNN training tips.
- Familiarize yourself with PyTorch.
Task Description
- COVID-19 Cases Prediction
- Source: Delphi group @ CMU
- A daily survey conducted via Facebook since April 2020.
Searching for the original data and using it for training is forbidden.
- Given survey results from the past 5 days in a specific U.S. state, predict the percentage of new tested positive cases on the 5th day.
Data
Surveys were conducted via Facebook (every day, in every state). Survey topics: symptoms, COVID-19 testing, social distancing, mental health, demographics, economic effects, …
- States (37, encoded as one-hot vectors)
- COVID-like illness (4)
  - cli, ili, …
- Behavior Indicators (8)
  - wearing_mask, travel_outside_state, …
- Mental Health Indicators (3)
  - anxious, depressed, …
- Tested Positive Cases (1)
  - tested_positive (this is what we want to predict)
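As a rough sanity check on the input size, the counts in the list above can be added up. This is only a sketch: it assumes that each of the 5 days contributes all 16 indicators (4 + 8 + 3 + 1), that the state one-hot vector appears once per sample, and that the 5th day's tested_positive value is the target. The exact column layout of the CSV files is not described here, so treat the totals as an estimate.

```python
# Counts taken from the feature list above.
n_states = 37            # one-hot encoded state
n_illness = 4            # COVID-like illness indicators
n_behavior = 8           # behavior indicators
n_mental = 3             # mental health indicators
n_tested_positive = 1    # tested_positive

n_days = 5               # assumption: every day contributes all indicators
per_day = n_illness + n_behavior + n_mental + n_tested_positive

total_columns = n_states + per_day * n_days
n_inputs = total_columns - 1   # assumption: day-5 tested_positive is the target

print(per_day, total_columns, n_inputs)  # 16 117 116
```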
Data – One-hot Vector
- One-hot vectors:
Vectors in which exactly one element is one and all the others are zero. They are usually used to encode discrete values.
For details on one-hot vectors, see the blog post: One-Hot
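A minimal sketch of one-hot encoding with NumPy is shown here. The number of categories (5) and the category indices are made up for illustration; in this task the same idea would apply to the 37 states.

```python
import numpy as np

# Hypothetical setup: 5 categories, and three samples with category ids 0, 3, 1.
num_classes = 5
category_ids = np.array([0, 3, 1])

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing the identity matrix with the ids encodes every sample at once.
one_hot = np.eye(num_classes, dtype=np.float32)[category_ids]
print(one_hot)

# The position of the single 1 in each row recovers the original category id.
recovered = one_hot.argmax(axis=1)
print(recovered)  # [0 3 1]
```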
Evaluation Metric
- Mean Squared Error (MSE)
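MSE is the average of the squared differences between predictions and targets: MSE = (1/N) * sum over i of (pred_i - target_i)^2. The sketch below computes it by hand with NumPy on made-up numbers. In PyTorch, `nn.MSELoss()` with its default mean reduction computes the same quantity.

```python
import numpy as np

# Hypothetical predictions and ground-truth values, for illustration only.
preds = np.array([2.0, 4.0, 6.0])
targets = np.array([1.0, 4.0, 8.0])

# Square each error, then average over all samples.
squared_errors = (preds - targets) ** 2   # [1.0, 0.0, 4.0]
mse = squared_errors.mean()               # 5 / 3

print(f"MSE = {mse:.4f}")  # MSE = 1.6667
```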
Download data
If the Google Drive links below do not work, you can download the data from Kaggle and upload it manually to the workspace.
!gdown --id '1kLSW_-cW2Huj7bh84YTdimGBOJaODiOS' --output covid.train.csv
!gdown --id '1iiI5qROrAhZn-o4FPqsE97bMzDEFvIdg' --output covid.test.csv
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1kLSW_-cW2Huj7bh84YTdimGBOJaODiOS
To: /content/covid.train.csv
100% 2.49M/2.49M [00:00<00:00, 238MB/s]
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1iiI5qROrAhZn-o4FPqsE97bMzDEFvIdg
To: /content/covid.test.csv
100% 993k/993k [00:00<00:00, 137MB/s]
Import packages
# Numerical Operations
import math
import numpy as np

# Reading/Writing Data
import pandas as pd
import os
import csv

# For Progress Bar
from tqdm import tqdm

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split

# For plotting learning curve
from torch.utils.tensorboard import SummaryWriter
Some Utility Functions
You do not need to modify this part.
def same_seed(seed):
    '''Fixes random number generator seeds for reproducibility.'''
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def train_valid_split(data_set, valid_ratio, seed):
    '''Split provided training data into training set and validation set'''
    valid_set_size = int(valid_ratio * len(data_set))
    train_set_size = len(data_set) - valid_set_size
    train_set, valid_set = random_split(data_set, [train_set_size, valid_set_size], generator=torch.Generator().manual_seed(seed))
    return np.array(train_set), np.array(valid_set)

def predict(test_loader, model, device):
    model.eval()  # Set your model to evaluation mode.
    preds = []
    for x in tqdm(test_loader):
        x = x.to(device)
        with torch.no_grad():
            pred = model(x)
            preds.append(pred.detach().cpu())
    preds = torch.cat(preds, dim=0).numpy()
    return preds
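The seeded split can be illustrated on toy data. This sketch mirrors the logic of `train_valid_split` but is not the notebook's own function: the helper name `split`, the toy array, and the ratio are all hypothetical. It also selects rows through the `indices` attribute of the returned subsets instead of calling `np.array` on them, which is simply an alternative way to get plain arrays back.

```python
import numpy as np
import torch
from torch.utils.data import random_split

# Hypothetical toy data: 10 rows, 3 columns, each row starts with a unique value.
data = np.arange(30, dtype=np.float32).reshape(10, 3)

valid_ratio = 0.2
valid_size = int(valid_ratio * len(data))   # 2 rows for validation
train_size = len(data) - valid_size         # 8 rows for training

def split(seed):
    # A generator seeded with the same value yields the same permutation.
    gen = torch.Generator().manual_seed(seed)
    train_subset, valid_subset = random_split(
        data, [train_size, valid_size], generator=gen
    )
    # Each Subset stores the selected row indices; use them to slice the array.
    return data[train_subset.indices], data[valid_subset.indices]

train_a, valid_a = split(seed=0)
train_b, valid_b = split(seed=0)

print(train_a.shape, valid_a.shape)      # (8, 3) (2, 3)
print(np.array_equal(train_a, train_b))  # True: same seed, same split
```

Because the generator is created inside the call, the split does not depend on the global random state, so it stays fixed even if other code draws random numbers first.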
Dataset
class COVID19Dataset(Dataset):
    '''
    x: Features.
    y: Targets, if none, do prediction.
    '''
    def __init__(self, x, y=None):
        if y is None:
            self.y = y
        else:
            self.y = torch.FloatTensor(y)
        self.x = torch.FloatTensor(x)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        else:
            return self.x[idx], self.y[idx]

    def __len__(self):
        return len(self.x)
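A short usage sketch may help show how this dataset behaves inside a `DataLoader`. The class is repeated so the snippet runs on its own; the toy arrays, the batch size of 4, and the loader names are assumptions chosen for illustration, not values from the assignment.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class COVID19Dataset(Dataset):
    '''Same dataset as above: returns (x, y) pairs, or x alone when y is None.'''
    def __init__(self, x, y=None):
        self.y = None if y is None else torch.FloatTensor(y)
        self.x = torch.FloatTensor(x)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        return self.x[idx], self.y[idx]

    def __len__(self):
        return len(self.x)

# Hypothetical toy data: 10 samples with 4 features each.
x = np.random.rand(10, 4).astype(np.float32)
y = np.random.rand(10).astype(np.float32)

train_dataset = COVID19Dataset(x, y)   # with targets: training / validation
test_dataset = COVID19Dataset(x)       # without targets: prediction

train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=4, shuffle=False)

x_batch, y_batch = next(iter(train_loader))  # a (features, targets) pair
t_batch = next(iter(test_loader))            # features only

print(x_batch.shape, y_batch.shape, t_batch.shape)
```

With targets, each batch unpacks into two tensors. Without them, the loader yields a single feature tensor, which is the form the `predict` helper iterates over.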
Neural Network Model
Try out different model architectures by modifying the class below.
class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B)
        return x
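When changing the architecture, a quick forward pass on random input is a cheap way to check the dimensions. The sketch below repeats the model so it runs on its own; the input dimension of 10 and the batch size of 32 are arbitrary illustrative values, not the assignment's real feature count.

```python
import torch
import torch.nn as nn

class My_Model(nn.Module):
    '''Same baseline architecture as above.'''
    def __init__(self, input_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, x):
        x = self.layers(x)
        return x.squeeze(1)  # (B, 1) -> (B)

input_dim = 10    # hypothetical feature count
batch_size = 32   # hypothetical batch size

model = My_Model(input_dim)
batch = torch.randn(batch_size, input_dim)

with torch.no_grad():
    out = model(batch)

# Weights plus biases: (10*16 + 16) + (16*8 + 8) + (8*1 + 1) = 321
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)   # torch.Size([32])
print(n_params)    # 321
```

The `squeeze(1)` call matters for the loss: it gives the output shape (B), the same shape as a 1-D target tensor, so an MSE loss compares predictions and targets element by element instead of broadcasting (B, 1) against (B).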