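K-S detector
Initialization
We now initialize the drift detector. From here on, the detector works the same way as for other modalities such as images. See the image example or the K-S detector documentation for more information on each possible parameter.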
    from functools import partial
    from alibi_detect.cd import KSDrift
    from alibi_detect.cd.tensorflow import preprocess_drift
    # older releases expose these under alibi_detect.utils.saving
    from alibi_detect.saving import save_detector, load_detector

    # define preprocessing function
    preprocess_fn = partial(preprocess_drift, model=uae, tokenizer=tokenizer,
                            max_len=max_len, batch_size=32)

    # initialize detector with the reference dataset X_ref
    cd = KSDrift(X_ref, p_val=.05, preprocess_fn=preprocess_fn, input_shape=(max_len,))

    # we can also save/load an initialised detector
    filepath = 'my_path'  # change to directory where detector is saved
    save_detector(cd, filepath)
    cd = load_detector(filepath)
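As a rough illustration of what a feature-wise K-S detector does with `p_val`, the sketch below runs one two-sample K-S test per feature with `scipy.stats.ks_2samp` and compares each p-value against a Bonferroni-corrected threshold. It is not alibi-detect's implementation: the helper `ks_drift`, the synthetic Gaussian data and the 32-feature shape are assumptions made for this example.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(x_ref, x, p_val=0.05):
    """Feature-wise two-sample K-S tests with a Bonferroni-corrected threshold.

    Hypothetical helper for illustration only.
    """
    n_features = x_ref.shape[1]
    # one univariate K-S test per feature (column)
    p_vals = np.array(
        [ks_2samp(x_ref[:, j], x[:, j]).pvalue for j in range(n_features)]
    )
    # Bonferroni: divide the significance level by the number of tests
    threshold = p_val / n_features
    # flag drift if any single feature falls below the corrected threshold
    is_drift = int((p_vals < threshold).any())
    return is_drift, p_vals, threshold


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(500, 32))               # stand-in for encoded reference data
x_shifted = rng.normal(loc=3.0, size=(500, 32))  # every feature shifted by 3 std devs

is_drift_same, p_same, threshold = ks_drift(x_ref, x_ref)
is_drift_shift, p_shift, _ = ks_drift(x_ref, x_shifted)
print('identical data -> drift:', bool(is_drift_same))
print('shifted data   -> drift:', bool(is_drift_shift))
print('threshold per feature:', threshold)
```

The point of the correction is that 32 simultaneous tests at level 0.05 would produce false alarms far more often than 5% of the time, so each feature is held to the stricter level 0.05 / 32.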
Detecting drift
Let's first check whether drift is flagged on a sample from the training set that is similar to the reference data.
    preds_h0 = cd.predict(X_h0)
    labels = ['No!', 'Yes!']
    print('Drift? {}'.format(labels[preds_h0['data']['is_drift']]))
    print('p-value: {}'.format(preds_h0['data']['p_val']))
Output:

    Drift? No!
    p-value: [0.31356168 0.18111965 0.60991895 0.43243074 0.6852314  0.722555
     0.28769323 0.18111965 0.50035924 0.9134755  0.40047103 0.79439443
     0.79439443 0.722555   0.5726548  0.1640792  0.9540582  0.60991895
     0.5726548  0.5726548  0.31356168 0.40047103 0.6852314  0.34099194
     0.5726548  0.07762147 0.79439443 0.09710453 0.5726548  0.79439443
     0.7590978  0.26338065]
Next, detect drift on the imbalanced and perturbed datasets:
    # imbalanced datasets
    for k, v in X_imb.items():
        preds = cd.predict(v)
        print('% negative sentiment {}'.format(k * 100))
        print('Drift? {}'.format(labels[preds['data']['is_drift']]))
        print('p-value: {}'.format(preds['data']['p_val']))
        print('')
Output:

    % negative sentiment 10.0
    Drift? Yes!
    p-value: [4.32430744e-01 4.00471032e-01 5.46463318e-02 7.76214674e-02
     1.08282514e-01 1.12110768e-02 6.91903234e-02 2.82894098e-03
     8.59294355e-01 6.47557259e-01 1.33834302e-01 7.94394433e-01
     4.28151786e-02 2.87693232e-01 6.09918952e-01 1.33834302e-01
     2.40603596e-01 9.71045271e-02 7.76214674e-02 9.35580969e-01
     2.87693232e-01 2.92505771e-02 4.00471032e-01 6.09918952e-01
     2.87693232e-01 5.06567594e-04 1.64079204e-01 6.09918952e-01
     1.33834302e-01 2.19330013e-01 7.94394433e-01 2.56591532e-02]

    % negative sentiment 90.0
    Drift? Yes!
    p-value: [7.36993998e-02 1.37563676e-01 5.86588383e-02 5.07961273e-01
     8.37696046e-02 8.80799629e-03 1.23670578e-01 1.76981179e-04
     3.21924835e-01 1.20594716e-02 8.43600273e-01 4.08206195e-01
     1.69703156e-01 5.79056978e-01 6.32701874e-01 4.48510349e-02
     5.07465303e-01 6.64306164e-04 5.23085408e-02 3.78374875e-01
     6.65342569e-01 4.06090707e-01 6.21288121e-01 5.85612692e-02
     5.87646782e-01 7.55570829e-03 8.99188042e-01 1.18489005e-02
     6.68586135e-01 1.01421457e-02 7.97733963e-02 1.73885196e-01]
    # perturbed datasets
    for w, probas in X_word.items():
        for p, v in probas.items():
            preds = cd.predict(v)
            print('Word: {} -- % perturbed: {}'.format(w, p))
            print('Drift? {}'.format(labels[preds['data']['is_drift']]))
            print('p-value: {}'.format(preds['data']['p_val']))
            print('')
Output:

    Word: fantastic -- % perturbed: 1.0
    Drift? No!
    p-value: [0.8879386  0.01711409 0.2406036  0.9134755  0.21933001 0.04281518
     0.03778438 0.28769323 0.3699725  0.996931   0.8879386  0.43243074
     0.01121108 0.6852314  0.99870795 0.996931   0.93558097 0.99365413
     0.02246371 0.60991895 0.8879386  0.34099194 0.09710453 0.8879386
     0.1338343  0.06155144 0.85929435 0.99365413 0.07762147 0.07762147
     0.9882611  0.85929435]

    Word: fantastic -- % perturbed: 5.0
    Drift? Yes!
    p-value: [1.29345525e-02 1.69780876e-14 1.52437299e-11 5.72654784e-01
     1.85489473e-08 1.88342838e-17 6.14975981e-09 4.28151786e-02
     5.62237052e-13 2.13202584e-05 4.28151786e-02 1.97469308e-09
     0.00000000e+00 1.48931602e-02 9.68870163e-01 1.29345525e-02
     2.63380647e-01 1.08282514e-01 1.04535818e-26 4.28151786e-02
     2.13202584e-05 3.47411038e-14 1.09291570e-20 1.08282514e-01
     5.68982140e-18 1.69780876e-14 1.64079204e-01 4.00471032e-01
     3.12689441e-34 3.89208371e-27 2.86525619e-06 1.71956726e-05]

    Word: good -- % perturbed: 1.0
    Drift? Yes!
    p-value: [3.40991944e-01 9.80161786e-01 1.08282514e-01 9.98707950e-01
     1.48338065e-01 9.35580969e-01 7.59097815e-01 9.88261104e-01
     8.87938619e-01 6.47557259e-01 9.68870163e-01 7.94394433e-01
     8.69054198e-02 9.99999642e-01 9.96931016e-01 5.72654784e-01
     9.99870896e-01 4.32430744e-01 9.99870896e-01 2.92505771e-02
     9.13475513e-01 9.13475513e-01 4.65766221e-01 9.35580969e-01
     8.87938619e-01 9.98707950e-01 9.80161786e-01 9.99972701e-01
     7.59097815e-01 1.34916729e-04 9.96931016e-01 9.68870163e-01]

    Word: good -- % perturbed: 5.0
    Drift? Yes!
    p-value: [6.1319246e-16 8.5929435e-01 8.4248814e-24 5.3605431e-01
     6.1410643e-10 1.9951835e-01 2.9080641e-04 3.6997250e-01
     2.4072561e-04 3.3837957e-10 9.5405817e-01 8.6666952e-04
     5.2673625e-28 1.4893160e-02 9.7104527e-02 5.3955968e-11
     1.6407920e-01 6.1410643e-10 7.2255498e-01 2.5362303e-18
     7.9439443e-01 1.7943768e-06 1.5330249e-07 2.0378644e-03
     1.4563050e-03 2.1933001e-01 1.9626908e-02 6.4755726e-01
     1.4790693e-09 0.0000000e+00 1.9626908e-02 3.1356168e-01]

    Word: bad -- % perturbed: 1.0
    Drift? No!
    p-value: [0.8879386  0.21933001 0.12050407 0.9540582  0.9134755  0.9540582
     0.99870795 0.9540582  0.7590978  0.40047103 0.9801618  0.7590978
     0.02925058 0.996931   0.9995433  0.79439443 0.26338065 0.04281518
     0.93558097 0.14833806 0.50035924 0.82795686 0.18111965 0.43243074
     0.99365413 0.9882611  0.9801618  0.99870795 0.96887016 0.10828251
     0.07762147 0.9882611 ]

    Word: bad -- % perturbed: 5.0
    Drift? Yes!
    p-value: [7.04859247e-08 5.78442112e-12 7.08821891e-21 1.33834302e-01
     7.13247118e-06 3.69972497e-01 9.68870163e-01 1.81119651e-01
     2.13202584e-05 3.47411038e-14 5.00359237e-01 1.97830971e-07
     9.82534992e-39 1.03241683e-03 1.96269080e-02 2.92505771e-02
     8.76041099e-07 8.49670826e-18 1.08282514e-01 3.38379574e-10
     8.07501343e-25 5.37760343e-07 2.79573150e-17 2.40344345e-03
     1.99518353e-01 7.59097815e-01 8.69054198e-02 3.32311448e-03
     2.15581372e-12 3.95873130e-15 1.95523170e-16 5.72654784e-01]

    Word: horrible -- % perturbed: 1.0
    Drift? Yes!
    p-value: [2.63380647e-01 9.98707950e-01 9.98707950e-01 9.88261104e-01
     6.47557259e-01 8.59294355e-01 9.96931016e-01 9.13475513e-01
     3.50604125e-04 9.99870896e-01 9.99870896e-01 6.09918952e-01
     1.33834302e-01 9.80161786e-01 9.35580969e-01 9.88261104e-01
     9.71045271e-02 4.00471032e-01 6.85231388e-01 1.81119651e-01
     4.65766221e-01 9.80161786e-01 8.69054198e-02 9.96931016e-01
     9.99870896e-01 6.91903234e-02 9.80161786e-01 9.99972701e-01
     9.93654132e-01 5.32228360e-03 1.20504074e-01 7.22554982e-01]

    Word: horrible -- % perturbed: 5.0
    Drift? Yes!
    p-value: [1.6978088e-14 8.8793862e-01 2.8769323e-01 5.7265478e-01
     1.3491673e-04 1.7114086e-02 4.3243074e-01 1.1211077e-02
     8.5801831e-33 3.5060412e-04 8.6905420e-02 6.1497598e-09
     1.4797455e-32 1.3383430e-01 1.7244401e-03 2.6338065e-01
     1.4117470e-08 3.5060412e-04 5.7140245e-15 4.9547091e-14
     5.9822431e-37 8.9143086e-06 8.4967083e-18 3.1356168e-01
     8.7604110e-07 3.9584363e-20 1.4833806e-01 1.7244401e-03
     1.1053569e-12 0.0000000e+00 1.3007273e-15 2.9250577e-02]
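The 1% results can look puzzling at first: "good" is flagged while "fantastic" is not, even though "fantastic" has more features with p-values under 0.05. The sketch below re-checks the two printed p-value vectors against a Bonferroni-corrected threshold of 0.05 / 32. It assumes the detector was left at what I understand to be its default multiple-testing correction (Bonferroni), since no correction argument was passed when it was created; `bonferroni_decision` is a hypothetical helper, not a library function.

```python
import numpy as np

P_VAL = 0.05

# p-values copied from the K-S output above (1% perturbation)
p_fantastic = np.array([
    0.8879386, 0.01711409, 0.2406036, 0.9134755, 0.21933001, 0.04281518,
    0.03778438, 0.28769323, 0.3699725, 0.996931, 0.8879386, 0.43243074,
    0.01121108, 0.6852314, 0.99870795, 0.996931, 0.93558097, 0.99365413,
    0.02246371, 0.60991895, 0.8879386, 0.34099194, 0.09710453, 0.8879386,
    0.1338343, 0.06155144, 0.85929435, 0.99365413, 0.07762147, 0.07762147,
    0.9882611, 0.85929435,
])
p_good = np.array([
    3.40991944e-01, 9.80161786e-01, 1.08282514e-01, 9.98707950e-01,
    1.48338065e-01, 9.35580969e-01, 7.59097815e-01, 9.88261104e-01,
    8.87938619e-01, 6.47557259e-01, 9.68870163e-01, 7.94394433e-01,
    8.69054198e-02, 9.99999642e-01, 9.96931016e-01, 5.72654784e-01,
    9.99870896e-01, 4.32430744e-01, 9.99870896e-01, 2.92505771e-02,
    9.13475513e-01, 9.13475513e-01, 4.65766221e-01, 9.35580969e-01,
    8.87938619e-01, 9.98707950e-01, 9.80161786e-01, 9.99972701e-01,
    7.59097815e-01, 1.34916729e-04, 9.96931016e-01, 9.68870163e-01,
])


def bonferroni_decision(p_vals, p_val=P_VAL):
    """Return (is_drift, threshold) under a Bonferroni correction (assumed)."""
    threshold = p_val / len(p_vals)
    return int((p_vals < threshold).any()), threshold


for word, p in [('fantastic', p_fantastic), ('good', p_good)]:
    is_drift, threshold = bonferroni_decision(p)
    print('{}: min p-value={:.3g}, threshold={:.3g}, '
          'features below 0.05={}, drift={}'.format(
              word, p.min(), threshold, int((p < P_VAL).sum()), bool(is_drift)))
```

Under this assumption the decision hinges on the single smallest p-value: one feature for "good" falls below 0.05 / 32, while none of the "fantastic" features do, which matches the printed "Yes!" and "No!".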
MMD TensorFlow detector
Initialization
Again, see the image example or the MMD detector documentation for more information on each possible parameter.
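    from alibi_detect.cd import MMDDrift

    # initialize the MMD detector with the same reference data and preprocessing
    cd = MMDDrift(X_ref, p_val=.05, preprocess_fn=preprocess_fn,
                  n_permutations=100, input_shape=(max_len,))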
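To give a feel for what `n_permutations` controls, here is a minimal NumPy sketch of an MMD permutation test. It is a simplified stand-in rather than the library's code: the Gaussian RBF kernel, the median-distance bandwidth, the helper names and the synthetic data are all assumptions of this example.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    d2 = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # guard against tiny negative round-off


def mmd2(k, n):
    """Unbiased MMD^2 from a joint kernel matrix; the first n rows are the reference."""
    m = k.shape[0] - n
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    # exclude diagonal terms within each sample for the unbiased estimate
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def mmd_permutation_test(x_ref, x, n_permutations=100, seed=0):
    rng = np.random.default_rng(seed)
    z = np.concatenate([x_ref, x])
    n = len(x_ref)
    d2 = sq_dists(z, z)
    # bandwidth from the median pairwise distance (assumed heuristic)
    sigma = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    observed = mmd2(k, n)
    permuted = np.empty(n_permutations)
    for i in range(n_permutations):
        # shuffle which rows count as "reference" and recompute the statistic
        idx = rng.permutation(len(z))
        permuted[i] = mmd2(k[np.ix_(idx, idx)], n)
    # p-value: share of permuted statistics at least as large as the observed one
    return float((permuted >= observed).mean()), observed


rng = np.random.default_rng(1)
x_ref = rng.normal(size=(100, 8))
p_same, _ = mmd_permutation_test(x_ref, rng.normal(size=(100, 8)))
p_shift, _ = mmd_permutation_test(x_ref, rng.normal(loc=2.0, size=(100, 8)))
print('same distribution p-value:', p_same)
print('shifted distribution p-value:', p_shift)
```

Because the p-value is a fraction of 100 permutations, it can only take multiples of 0.01, and a value of 0.0 simply means no permuted statistic reached the observed one. This is consistent with values such as 0.6, 0.01 and 0.0 in the MMD outputs in this section.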
Detecting drift
H0 dataset:
    preds_h0 = cd.predict(X_h0)
    labels = ['No!', 'Yes!']
    print('Drift? {}'.format(labels[preds_h0['data']['is_drift']]))
    print('p-value: {}'.format(preds_h0['data']['p_val']))
Output:

    Drift? No!
    p-value: 0.6
Imbalanced datasets:
    for k, v in X_imb.items():
        preds = cd.predict(v)
        print('% negative sentiment {}'.format(k * 100))
        print('Drift? {}'.format(labels[preds['data']['is_drift']]))
        print('p-value: {}'.format(preds['data']['p_val']))
        print('')
Output:

    % negative sentiment 10.0
    Drift? Yes!
    p-value: 0.01

    % negative sentiment 90.0
    Drift? Yes!
    p-value: 0.0
Perturbed datasets:
    for w, probas in X_word.items():
        for p, v in probas.items():
            preds = cd.predict(v)
            print('Word: {} -- % perturbed: {}'.format(w, p))
            print('Drift? {}'.format(labels[preds['data']['is_drift']]))
            print('p-value: {}'.format(preds['data']['p_val']))
            print('')
Output:

    Word: fantastic -- % perturbed: 1.0
    Drift? No!
    p-value: 0.09

    Word: fantastic -- % perturbed: 5.0
    Drift? Yes!
    p-value: 0.0

    Word: good -- % perturbed: 1.0
    Drift? No!
    p-value: 0.71

    Word: good -- % perturbed: 5.0
    Drift? Yes!
    p-value: 0.0

    Word: bad -- % perturbed: 1.0
    Drift? No!
    p-value: 0.38

    Word: bad -- % perturbed: 5.0
    Drift? Yes!
    p-value: 0.0

    Word: horrible -- % perturbed: 1.0
    Drift? No!
    p-value: 0.18

    Word: horrible -- % perturbed: 5.0
    Drift? Yes!
    p-value: 0.0