BBSD
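For BBSD, we use the classifier's softmax outputs for black-box drift detection. The method is based on Detecting and Correcting for Label Shift with Black Box Predictors. The ResNet classifier was trained on data standardised per instance, so we need to rescale the data in the same way before passing it to the detector.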
    X_ref_bbsds = scale_by_instance(X_ref)
    X_h0_bbsds = scale_by_instance(X_h0)
    X_c_bbsds = [scale_by_instance(X_c[i]) for i in range(n_corr)]
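To make the rescaling step concrete, here is a minimal NumPy sketch of per-instance standardisation. The function name `standardise_per_instance` and the `eps` value are illustrative assumptions; this is not necessarily how `scale_by_instance` is implemented, only the general idea of giving every image zero mean and unit standard deviation.

```python
import numpy as np

def standardise_per_instance(x, eps=1e-12):
    """Hypothetical stand-in: standardise each image in a (N, H, W, C) batch
    using that image's own mean and standard deviation."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)  # one mean per image
    std = x.std(axis=(1, 2, 3), keepdims=True)    # one std per image
    return (x - mean) / (std + eps)               # eps guards against division by zero

# Small random batch with the same layout as the CIFAR-style images used here.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 255, size=(4, 32, 32, 3)).astype(np.float32)
x_scaled = standardise_per_instance(x_demo)
```

After scaling, each image has mean close to 0 and standard deviation close to 1, regardless of its original brightness or contrast.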
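Next we initialise the drift detector. Here we use the output of the softmax layer to detect drift, but other hidden layers can be extracted as well by setting "layer" to the index of the desired hidden layer in the model: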
    from functools import partial
    from alibi_detect.cd import MMDDrift
    from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

    # define preprocessing function: layer=-1 selects the last (softmax) layer of clf
    preprocess_fn = partial(preprocess_drift, model=HiddenOutput(clf, layer=-1), batch_size=128)

    # initialise drift detector
    cd = MMDDrift(X_ref_bbsds, backend='tensorflow', p_val=.05,
                  preprocess_fn=preprocess_fn, n_permutations=100)

    make_predictions(cd, X_h0_bbsds, X_c_bbsds, corruption)
Output:

    No corruption
    Drift? No!
    p-value: 0.440
    Time (s) 3.072

    Corruption type: gaussian_noise
    Drift? Yes!
    p-value: 0.000
    Time (s) 7.701

    Corruption type: motion_blur
    Drift? Yes!
    p-value: 0.000
    Time (s) 7.754

    Corruption type: brightness
    Drift? Yes!
    p-value: 0.000
    Time (s) 7.760

    Corruption type: pixelate
    Drift? Yes!
    p-value: 0.000
    Time (s) 7.732

Again, drift is flagged only on the perturbed data.
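The p-values above come from a permutation test on the MMD statistic. The following NumPy sketch shows the general idea under simplifying assumptions: an RBF kernel, a median-distance bandwidth heuristic, and an unbiased MMD² estimate. All function names are illustrative, and this is not the library's implementation.

```python
import numpy as np

def mmd2_from_kernel(k, n):
    """Unbiased MMD^2 estimate from a kernel matrix over [x; y], with len(x) == n."""
    m = k.shape[0] - n
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))  # exclude diagonal
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))  # exclude diagonal
    return term_xx + term_yy - 2.0 * k_xy.mean()

def mmd_permutation_test(x, y, n_permutations=100, seed=0):
    """Return (p_value, observed_mmd2) for two samples of feature vectors."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y])
    n = len(x)
    # Pairwise squared distances and an assumed median-based bandwidth.
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    sigma = np.sqrt(0.5 * np.median(sq_dist[sq_dist > 0]))
    k = np.exp(-sq_dist / (2.0 * sigma ** 2))
    observed = mmd2_from_kernel(k, n)
    # Shuffle the pooled sample and count how often the statistic is at least as large.
    exceed = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(z))
        if mmd2_from_kernel(k[np.ix_(idx, idx)], n) >= observed:
            exceed += 1
    return exceed / n_permutations, observed

rng = np.random.default_rng(1)
x_ref_demo = rng.normal(0.0, 1.0, size=(100, 5))
x_same = rng.normal(0.0, 1.0, size=(100, 5))     # same distribution as the reference
x_shifted = rng.normal(1.5, 1.0, size=(100, 5))  # mean-shifted distribution
p_same, _ = mmd_permutation_test(x_ref_demo, x_same)
p_shifted, _ = mmd_permutation_test(x_ref_demo, x_shifted)
```

Drift is flagged when the p-value falls below the chosen threshold (`p_val=.05` in the detector above). With 100 permutations the p-value has a resolution of 0.01, which is why strongly shifted data is reported as 0.000.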
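Detecting drift with the PyTorch backend
We can do the same thing with the PyTorch backend. To illustrate this, we use a randomly initialised encoder as the preprocessing step: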
    import torch
    import torch.nn as nn

    # set random seed and device
    seed = 0
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)

Output:

    cuda
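Because our PyTorch encoder expects images in (batch size, channels, height, width) format, we transpose the data.

Note: image arrays are commonly stored in (h, w, c) order, i.e. (height, width, channels). PyTorch instead expects image tensors in (batch, channel, h, w) order, i.e. (number of images, channels, height, width).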
    def permute_c(x):
        # move the channel axis from last to second position: NHWC -> NCHW
        return np.transpose(x.astype(np.float32), (0, 3, 1, 2))

    # X_ref    ---> (5000, 32, 32, 3)
    # X_ref_pt ---> (5000, 3, 32, 32)
    X_ref_pt = permute_c(X_ref)
    X_h0_pt = permute_c(X_h0)
    X_c_pt = [permute_c(xc) for xc in X_c]
    print(X_ref_pt.shape, X_h0_pt.shape, X_c_pt[0].shape)

Output:

    (5000, 3, 32, 32) (5000, 3, 32, 32) (10000, 3, 32, 32)
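A quick way to convince yourself that the transpose moves axes without scrambling pixels is to compare individual entries before and after. The toy array below is an illustrative assumption (a small 2 x 4 x 5 x 3 batch); it also shows why `reshape` is not a substitute for `transpose` here.

```python
import numpy as np

# Toy NHWC batch: 2 images, height 4, width 5, 3 channels.
x_nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# Correct conversion: reorder the axes to NCHW.
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))

# Incorrect conversion: reshape gives the right shape but mixes up the pixels.
x_wrong = x_nhwc.reshape(2, 3, 4, 5)

# The value at (image 1, row 2, column 3, channel 0) must end up at
# (image 1, channel 0, row 2, column 3) after the transpose.
same_pixel = x_nhwc[1, 2, 3, 0] == x_nchw[1, 0, 2, 3]
```

Both `x_nchw` and `x_wrong` have shape (2, 3, 4, 5), but only the transposed array keeps each pixel value attached to the right image, channel and position.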
    from alibi_detect.cd.pytorch import preprocess_drift

    # define encoder; encoding_dim (the size of the encoder output) must already be defined
    encoder_net = nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=0),
        nn.ReLU(),
        nn.Conv2d(64, 128, 4, stride=2, padding=0),
        nn.ReLU(),
        nn.Conv2d(128, 512, 4, stride=2, padding=0),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(2048, encoding_dim)
    ).to(device).eval()

    # define preprocessing function
    preprocess_fn = partial(preprocess_drift, model=encoder_net, device=device, batch_size=512)

    # initialise drift detector
    cd = MMDDrift(X_ref_pt, backend='pytorch', p_val=.05,
                  preprocess_fn=preprocess_fn, n_permutations=100)

    make_predictions(cd, X_h0_pt, X_c_pt, corruption)

Output:

    No corruption
    Drift? No!
    p-value: 0.730
    Time (s) 0.478

    Corruption type: gaussian_noise
    Drift? Yes!
    p-value: 0.000
    Time (s) 1.104

    Corruption type: motion_blur
    Drift? Yes!
    p-value: 0.000
    Time (s) 1.066

    Corruption type: brightness
    Drift? Yes!
    p-value: 0.000
    Time (s) 1.065

    Corruption type: pixelate
    Drift? Yes!
    p-value: 0.000
    Time (s) 1.066
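The `2048` input size of the final linear layer in the encoder follows from the convolution arithmetic. The sketch below applies the standard output-size formula for a convolution without dilation to the three layers; the helper name `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # input images are 32 x 32
for _ in range(3):  # three Conv2d layers, each with kernel 4, stride 2, padding 0
    size = conv_out(size, kernel=4, stride=2, padding=0)

# The last convolution has 512 output channels, so Flatten produces:
flat_features = 512 * size * size
```

The spatial size shrinks from 32 to 15 to 6 to 2, so the flattened feature vector has 512 × 2 × 2 = 2048 entries. If the input resolution or any convolution parameter changes, the first argument of `nn.Linear` has to change accordingly.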
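The drift detector will try to use the GPU if one is available and falls back to the CPU otherwise. We can also specify the device explicitly. Let's compare the GPU-accelerated run with a CPU run: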
    device = torch.device('cpu')
    preprocess_fn = partial(preprocess_drift, model=encoder_net.to(device),
                            device=device, batch_size=512)

    cd = MMDDrift(X_ref_pt, backend='pytorch', preprocess_fn=preprocess_fn, device='cpu')

    make_predictions(cd, X_h0_pt, X_c_pt, corruption)

Output:

    No corruption
    Drift? No!
    p-value: 0.670
    Time (s) 14.282

    Corruption type: gaussian_noise
    Drift? Yes!
    p-value: 0.000
    Time (s) 32.061

    Corruption type: motion_blur
    Drift? Yes!
    p-value: 0.000
    Time (s) 32.060

    Corruption type: brightness
    Drift? Yes!
    p-value: 0.000
    Time (s) 32.459

    Corruption type: pixelate
    Drift? Yes!
    p-value: 0.000
    Time (s) 35.935
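Note: on this run the GPU is roughly 30 times faster than the CPU (between about 29x and 34x, depending on the corruption).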
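The speedup can be checked directly from the timings reported in the two runs. The snippet below copies those numbers into two dictionaries (the variable names are illustrative) and divides the CPU time by the GPU time for each case.

```python
# Timings in seconds, copied from the GPU and CPU outputs reported above.
gpu_times = {
    "no_corruption": 0.478,
    "gaussian_noise": 1.104,
    "motion_blur": 1.066,
    "brightness": 1.065,
    "pixelate": 1.066,
}
cpu_times = {
    "no_corruption": 14.282,
    "gaussian_noise": 32.061,
    "motion_blur": 32.060,
    "brightness": 32.459,
    "pixelate": 35.935,
}

# Speedup factor per case: how many times faster the GPU run was.
speedup = {name: cpu_times[name] / gpu_times[name] for name in gpu_times}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

These are single-run wall-clock timings, so the exact factors will vary with hardware and from run to run.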
As with the TensorFlow implementation, the PyTorch backend can also use the hidden-layer outputs of a pretrained model as the preprocessing step:

    from alibi_detect.cd.pytorch import HiddenOutput